Job Summary
We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems. The SRE will work closely with engineering, DevOps, and product teams to build highly available systems, automate operations, and improve system observability while maintaining service level objectives (SLOs).
Key Responsibilities
Reliability & Operations
- Ensure high availability, reliability, and performance of production systems.
- Define, monitor, and manage SLIs, SLOs, and SLAs.
- Lead incident response, root cause analysis (RCA), and post-incident reviews.
- Implement proactive monitoring and alerting to prevent outages.
- Automate repetitive operational tasks using scripting and infrastructure-as-code.
- Improve system reliability through engineering solutions rather than manual intervention.
- Reduce toil by building tools, automation, and self-healing systems.
- Design and manage scalable infrastructure on cloud platforms (AWS / Azure / GCP).
- Manage containerized workloads using Docker and Kubernetes.
- Implement and maintain CI/CD pipelines for safe and frequent deployments.
- Build and maintain observability solutions using tools such as Prometheus, Grafana, ELK / OpenSearch, Datadog, and New Relic.
- Track system performance and error budgets, and carry out capacity planning.
- Ensure reliability practices align with security standards.
- Participate in on-call rotations and ensure secure system operations.
- Collaborate with security teams to implement secure infrastructure practices.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Strong experience in Linux/Unix system administration.
- Proficiency in at least one scripting or programming language: Python, Go, Bash, or Java.
- Experience with cloud platforms (AWS / Azure / GCP).
- Hands-on experience with Kubernetes and container orchestration.
- Knowledge of networking fundamentals (TCP/IP, DNS, load balancing).
- Experience with monitoring, alerting, and incident management.
- Experience implementing SRE best practices based on Google's SRE principles.
- Knowledge of Terraform, Ansible, or CloudFormation.
- Experience with service mesh (Istio, Linkerd).
- Understanding of chaos engineering tools (Gremlin, Chaos Mesh).
- Experience in fintech, banking, or high-availability systems.
Key Skills
Ranked by relevance
cloud
aws
incident response
high availability
kubernetes
terraform
ansible
docker
devops
istio
bash
cicd
dns
- Posted: Jan 15, 2026
- Type: Contract
- Level: Entry
- Location: Dubai
- Company: Dicetek LLC
- Industries: IT Services, IT Consulting
- Categories: Engineering, Information Technology