Site Reliability Engineer

Socium - Teams Done Differently

United Arab Emirates · Contract · Mid-Senior

Job Title: Senior Site Reliability Engineer (SRE)

Location: Abu Dhabi, UAE

Work Setup: Onsite

Contract Duration: 6-Month Rolling Contract

General Description

We are seeking a highly experienced Senior Site Reliability Engineer (SRE) to support and optimize mission-critical cloud and on-premises platforms across Azure and air-gapped Kubernetes environments.

This role is responsible for ensuring the reliability, scalability, availability, and security of modern application platforms running across Azure Kubernetes Service (AKS) and self-managed Rancher RKE2 clusters. The ideal candidate will have strong expertise in Kubernetes operations, GitOps-driven deployments, infrastructure automation, monitoring and observability, and incident management within highly secure and complex enterprise environments.

The successful candidate will work closely with engineering, security, and operations teams to support high-availability systems and continuously improve operational resilience across connected and disconnected environments.

Key Responsibilities

Ensure reliability, availability, and performance of services running across AKS and air-gapped Kubernetes (Rancher RKE2) environments while meeting strict SLAs and operational requirements.
Maintain scalable, resilient, and secure Kubernetes platforms including ingress controllers, storage layers, and stateful workloads.
Automate deployments and operational processes using Python, Go, Bash, Terraform, Bicep, and Ansible.
Implement and manage GitOps workflows using ArgoCD and Kustomize across cloud and on-premises environments.
Operate and optimize CI/CD pipelines using Azure DevOps and GitHub Actions.
Manage container supply chains for connected and disconnected environments, including private registry mirroring and image scanning.
Monitor infrastructure and application performance using Azure Monitor, Prometheus, Grafana, and OpenTelemetry.
Proactively identify, troubleshoot, and resolve platform and application issues to minimize service disruption.
Lead incident response activities, root cause analysis, and post-incident reviews while driving permanent corrective actions.
Develop and enforce operational best practices related to reliability, security, compliance, and platform governance.
Collaborate with development, platform, infrastructure, and security teams to improve system architecture and operational maturity.
Participate in on-call rotations supporting critical production systems.
Utilize ITSM processes and tools for incident, problem, and change management.
Support Agile, Scrum, and ITIL-aligned operational practices and assist with audit and compliance requirements.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field.
Minimum of 10 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles.
Strong expertise in Microsoft Azure cloud environments, networking, and security.
Hands-on experience with Azure Kubernetes Service (AKS), Rancher RKE2 or equivalent air-gapped Kubernetes platforms, and Docker and Kubernetes ecosystem technologies.
Strong scripting and automation experience using Python, Go, and Bash.
Strong Infrastructure as Code (IaC) experience using Terraform, Bicep, and Ansible.
Experience with GitOps methodologies and tools including ArgoCD and Kustomize.
Strong CI/CD pipeline experience using Azure DevOps and/or GitHub Actions.
Experience with monitoring and observability tools including Azure Monitor, Prometheus, Grafana, and OpenTelemetry.
Proven experience managing production incidents, troubleshooting distributed systems, and performing root cause analysis.
Strong understanding of high-availability systems, operational resilience, and enterprise security practices.
Excellent communication, stakeholder management, and collaboration skills.

Preferred Skills

Experience supporting air-gapped or highly regulated enterprise environments.
Knowledge of container security, image scanning, and private registry management.
Familiarity with enterprise compliance and audit processes.
Experience working in Agile, Scrum, and ITIL-based operational environments.
Exposure to large-scale enterprise modernization or cloud transformation programs.

Key Skills

Ranked by relevance

Kubernetes, cloud, DevOps, Prometheus, Grafana, Scrum, ITIL, CI/CD, infrastructure as code, incident response, Terraform, storage, Python, Bicep, Bash

Posted: May 23, 2026
Type: Contract
Level: Mid-Senior
Location: Abu Dhabi
Company: Socium - Teams Done Differently

Industries

Financial Services
