Working for a company like Smile Digital Health means supporting our mandate for #BetterGlobalHealth. We strive towards this goal every day, and the results can be seen in the impact of our innovative health data platform and data management solutions, which are used in over 20 countries. We were #19 on Deloitte's Technology Fast 50 Ranking for 2024!
Smile Digital Health makes it easy for healthcare stakeholders to collect and exchange data with our leading FHIR-based data liberation platform.
At its heart, the Smile platform enables people and organizations to better manage healthcare data. We help generate and liberate structured healthcare data to ensure effective delivery across care teams and health systems, bringing #BetterGlobalHealth to patients every day!
Apply today and find plenty of reasons to SMILE!
The Cloud Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of production-grade services deployed across multiple cloud vendors and infrastructure platforms for Smile Digital Health, its clients, and partners. This role designs and automates performance testing frameworks, integrates them into CI/CD pipelines, and uses observability tools to proactively detect and resolve bottlenecks. Working closely with engineering, product, and security teams, the SRE ensures systems meet strict SLAs for performance and availability while driving continuous optimization across multiple cloud platforms.
Responsibilities:
- Collaborate with our Security Operations teams to help define and implement best practices around Cloud Service Provider configuration for Azure and other cloud providers
- Develop, implement, and coordinate a multi-tenant approach to service offerings such as databases, container platforms, authentication, certificates, and product registries
- Design and maintain performance testing strategies, frameworks, and environments in the cloud
- Develop and maintain cost/utilization tracking and attribution processes for all Cloud Service Providers
- Create documentation around Cloud Service Provider offerings detailing use cases, best practices, and implementation details
- Develop and maintain technical relationships with our core Cloud Service Providers
- Implement and maintain a secure and scalable infrastructure platform for delivering Cloud Services applications
- Ensure that internal and external SLAs are met or exceeded, and that system-centric KPIs are continuously monitored and improved
- Create tools for automating deployment, monitoring and operations of the overall platform
- Participate in an on-call rotation to provide application support, incident management, and troubleshooting
- Provide ongoing maintenance and support for internal tools, and improve system health and reliability
- Assist customers with on-site deployments when needed
- Implement and manage observability tools (logging, metrics, tracing) for performance insights; OpenTelemetry (OTel) and the Grafana stack preferred
- Compliance with organizational policies, procedures, and practices (including but not limited to security policies) is an ongoing requirement of the employment or contractual agreement
- Accountable for ensuring that all working hours are accurately reported in the time tracking system on a daily or weekly basis, that most (if not all) hours are tracked as billable, and that the project management tool in the time tracking system is properly and fully utilized
- Tracking and reporting of billable hours is a critical aspect of project management and delivery to our customers, and is a major area of accountability
- Comply with privacy, security, and confidentiality policies. Hold all confidential information in trust and strict confidence, and ensure it is used only for the purposes required to fulfill employment obligations and is never used for any other purpose or disclosed to any third party
Qualifications:
- Demonstrated expertise with cloud service providers and their implementation and configuration best practices, preferably managing Azure on behalf of multiple teams at a company that delivers SaaS products
- Experience with Kubernetes, OpenShift, Kafka, and the Elastic Stack
- Proven experience working with microservices architectures, with a strong focus on Java-based services
- Experience applying chaos engineering practices to evaluate and enhance system resiliency
- Skilled in troubleshooting performance issues, including analyzing time consumption, allocating resources, and recommending optimizations
- Familiar with performance testing methodologies and tools to assess system behavior under load
- Proven experience with security and compliance best practices (SOC 2, HIPAA, ISO 27001) and knowledge of how to implement controls that support high-velocity software delivery teams
- Proficiency in Terraform, Ansible, or Chef
- Expertise in troubleshooting, support escalation, on-call process optimization, and knowledge documentation
- Passionate about infrastructure as code, automation, and developing solutions that help developers move quickly and safely
- Familiarity with infrastructure management and operations lifecycle concepts and ecosystem
- Experience operating and maintaining production systems in a Linux and public cloud environment
- Prior experience working on high-performance or distributed systems, although we strive to hire at a variety of experience levels
- Working knowledge of industry best practices for information security
- Previous experience building or maintaining a large-scale cloud service
- Proven ability to prioritize and track multiple projects in parallel
- Proven ability to be highly responsive and customer-focused
Benefits:
- Remote Work Environment
- Flexible Time Away From Work Policy including PTO, Personal and Sick Days
- Competitive Salary and Health/Medical Benefits
- RRSP/TFSA/401K Employee Contribution
- Life and Disability
- Employee Assistance Program
- FHIR Study Program and Skillsoft Learning
- Super HAPI Fun Club
We welcome and encourage candidates of all backgrounds to apply. Candidates are encouraged to inform us if they wish to discuss or require accommodations during interviews or while working at Smile.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
- Posted: Nov 20, 2025
- Type: Full-time
- Level: Not Applicable
- Location: Toronto
- Company: Smile Digital Health
- Industries: Transportation, Logistics, Supply Chain, Storage
- Categories: Engineering, Information Technology