Kinetic Business Solutions

DevOps Engineer

United Arab Emirates · Full-time · Mid-Senior

Kinetic has partnered with a leading technology company that is hiring an MLOps Engineer to be based in Abu Dhabi.


***Please take the time to read the job description. You must meet all the criteria set out below for your application to be considered. We check all applications, and suitable candidates will be contacted within 5 working days. If you are not contacted by us within that time, please consider your application unsuccessful on this occasion.***


The main responsibilities will include, but are not limited to:

  • Build, deploy, monitor, and manage large-scale AI infrastructure based on HGX H200 nodes.
  • Operate and manage Kubernetes or OpenShift clusters for multi-node orchestration.
  • Deploy and manage LLMs and other AI models for inference using Triton Inference Server or custom endpoints.
  • Automate CI/CD pipelines for model packaging, serving, retraining, and rollback using GitLab CI or ArgoCD.
  • Set up model and infrastructure monitoring systems (Prometheus, Grafana, NVIDIA DCGM).
  • Implement model drift detection, performance alerting, and inference logging.
  • Manage model checkpoints, reproducibility controls, and rollback strategies.
  • Track deployed model versions using MLflow or equivalent registry tools.
  • Implement secure access controls for model endpoints and data artifacts.
  • Collaborate with the AI/Data Engineer to integrate and deploy fine-tuned datasets.
  • Ensure high availability, performance, and observability of all AI services in production.


To be successful you will need to meet the following:

  • 10+ years of overall experience in solution operations.
  • Minimum of 3 years of experience in DevOps, MLOps, or AI/ML infrastructure roles.
  • Proven experience with Kubernetes or OpenShift in production environments, preferably certified.
  • Experience with CI/CD automation tools on OpenShift or Kubernetes.
  • Hands-on experience with model registry systems (e.g., MLflow, Kubeflow).
  • Experience with monitoring tools (e.g., Prometheus, Grafana) and GPU workload optimization.
  • Strong scripting skills (Python, Bash) and Linux system administration knowledge.
  • Familiarity with deploying and scaling PyTorch or TensorFlow models for inference.
  • Applicants should be available for face-to-face interviews in the location mentioned above.





Key Skills

Ranked by relevance

AI, Kubernetes, Prometheus, Grafana, MLflow, CI/CD, system administration, high availability, TensorFlow, GitLab CI, PyTorch, Python, DevOps, server, GitLab, Linux, MLOps, Bash
Posted: May 12, 2025
Type: Full-time
Level: Mid-Senior
Location: Abu Dhabi

Industries

Technology Information Media

Categories

Engineering
