Cloud Site Reliability Engineer

Smile Digital Health

Canada · Full-time · Not Applicable

Working for a company like Smile Digital Health means supporting our mandate for #BetterGlobalHealth. We strive towards this goal every day, and the results can be seen in the impact of our innovative health data platform and data management solutions, which are used in over 20 countries. We were #19 on Deloitte's Technology Fast 50 Ranking for 2024!

Smile Digital Health makes it easy for healthcare stakeholders to collect and exchange data with our leading FHIR-based data liberation platform.

At its heart, the Smile platform enables people and organizations to better manage healthcare data. We help generate and liberate structured healthcare data to ensure effective delivery across care teams and health systems, bringing #BetterGlobalHealth to patients every day!

Apply today and find plenty of reasons to SMILE!

The Cloud Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of production-grade services deployed across multiple cloud vendors and infrastructure platforms for Smile Digital Health, its clients, and partners. This role designs and automates performance testing frameworks, integrates them into CI/CD pipelines, and uses observability tools to proactively detect and resolve bottlenecks. Working closely with engineering, product, and security teams, the SRE ensures systems meet strict SLAs for performance and availability while driving continuous optimization across multiple cloud platforms.

Responsibilities:

Collaborate with our Security Operations teams to help define and implement best practices around Cloud Service Provider configuration for Azure and other cloud providers
Develop, implement and coordinate a multi-tenant approach around service offerings for DB, Container platform, Authentication, Certificates, and Product Registries
Design and maintain performance testing strategies, frameworks, and environments in the cloud
Develop and maintain cost/utilization tracking and attribution processes for all Cloud Service Providers
Create documentation around Cloud Service Provider offerings detailing use cases, best practices, and implementation details
Develop and maintain technical relationships with our core Cloud Service Providers
Implement and maintain a secure and scalable infrastructure platform for delivering Cloud Services applications
Ensure that internal and external SLAs meet or exceed expectations, and ensure that system-centric KPIs are continuously monitored and improved
Create tools for automating deployment, monitoring and operations of the overall platform
Participate in an on-call rotation to provide application support, incident management, and troubleshooting
Provide ongoing maintenance and support of internal tools, and improve system health and reliability
Assist customers with on-site deployments when needed
Implement and manage observability tools (logging, metrics, tracing) for performance insights; OpenTelemetry (OTel) and the Grafana stack preferred
Ongoing compliance with organizational policies, procedures and practices (such as but not limited to security policies) is a requirement of the employment or contractual agreement
Accountable for ensuring that all working hours are accurately reported in the time tracking system on a daily or weekly basis, that the majority of (if not all) hours are tracked as billable, and that the project management tool in the time tracking system is properly and fully utilized
Tracking and reporting of billable hours is a critical aspect of project management and delivery to our customers and this is a major area of accountability
Comply with the privacy, security and confidentiality policies. Hold all confidential information in trust and strict confidence and ensure that it shall be used only for the purposes required to fulfill employment obligations, and shall not be used for any other purpose, or disclosed to any third party

Requirements:

Demonstrated expertise with cloud service providers and best practices around implementation and configuration, preferably managing Azure on behalf of multiple teams for a company that delivers SaaS products
Experience with Kubernetes, OpenShift, Kafka, and the Elastic Stack
Proven experience working with microservices architecture, with a strong focus on Java-based services
Experience in applying chaos engineering practices to evaluate and enhance system resiliency
Skilled in troubleshooting performance issues, including analyzing time consumption, allocating resources, and recommending optimizations
Familiar with performance testing methodologies and tools to assess system behavior under load
Proven experience with Security and Compliance (SOC 2, HIPAA, ISO 27001) best practices and how to implement controls that support high-velocity software delivery teams
Proficiency in Terraform, Ansible or Chef
Expertise in troubleshooting, support escalation, on-call process optimization and documenting knowledge
Passionate about infrastructure as code, automation, and developing solutions that help developers move quickly and safely
Familiarity with infrastructure management and operations lifecycle concepts and ecosystem
Experience operating and maintaining production systems in a Linux and public cloud environment
Prior experience working with high-performance or distributed systems, although we strive to hire at a variety of experience levels
Working knowledge of industry best practices regarding information security
Previous experience building or maintaining a large-scale Cloud service
Proven ability to prioritize and track multiple projects in parallel
Proven ability to be highly responsive and customer-focused

Some of the benefits we offer:

Remote Work Environment
Flexible Time Away From Work Policy including PTO, Personal and Sick Days
Competitive Salary and Health/Medical Benefits
RRSP/TFSA/401K Employee Contribution
Life and Disability
Employee Assistance Program
FHIR Study Program and Skillsoft Learning
Super HAPI Fun Club

Smile's core values include respect, inclusion, embracing our differences, and celebrating shared values because our people are the foundation of our success. We are big on creating a sense of belonging and empowering each other to bring our authentic selves to work. We are dedicated to fostering a workplace that values diversity, equity, and inclusion.

We welcome and encourage candidates of all backgrounds to apply. Candidates are encouraged to inform us if they wish to discuss or require accommodations during interviews or while working at Smile.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Key Skills

Ranked by relevance

cloud, artificial intelligence, infrastructure as code, microservices, kubernetes, responsive, terraform, ai tools, ansible, grafana, kafka, hipaa, linux, java, saas, cicd, ai

Posted: Nov 20, 2025
Type: Full-time
Level: Not Applicable
Location: Toronto
Company: Smile Digital Health

Industries

Transportation Logistics Supply Chain Storage
