We are looking for a DevOps/SRE Engineer to strengthen our team!
Requirements
- Minimum 5 years of experience in a DevOps and/or Site Reliability Engineering role
- Strong hands-on experience with Linux system administration
- Extensive experience deploying, operating, and scaling Kubernetes in both cloud and bare-metal environments
- Deep expertise and practical experience with at least one major cloud provider (preferably Google Cloud Platform)
- Experience with ML inference on GPU/CPU is a strong plus
- Proven experience implementing SRE practices and building observability stacks using Grafana, Prometheus, and Loki
- Strong adherence to GitOps, Infrastructure as Code (IaC), and CI/CD principles
- Advanced expertise in Terraform, Ansible, and Python
- Comfortable working in high-uncertainty environments: we are building a new product, requirements evolve quickly, and the ability to rapidly learn new technologies and patterns is essential
- Proactive mindset: ability to look beyond DevOps tasks and actively debug and understand the product
- Strategic thinking: ability to choose technologies and architectural approaches based on long-term goals rather than short-term compromises
Responsibilities
- Deploy, operate, and evolve a microservices-based platform running in Kubernetes clusters across AWS, GCP, and on-prem (Rancher)
- Operate and support GPU-based ML inference services (Triton Inference Server, vLLM) deployed on RunPod, Scaleway, and Nebius
- Build and maintain Docker images for all microservices and ensure a stable service lifecycle
- Maintain and scale development and production Kubernetes clusters, and actively participate in deployment debugging, incident investigation, and performance troubleshooting
- Develop, maintain, and evolve custom Helm charts for each service
- Design and operate CI/CD pipelines using GitHub (code and pipelines) and GitLab for on-prem customer deployments
- Ensure platform compliance with SOC 2 requirements and actively contribute to improving security and compliance processes
- Manage cluster access via NetBird VPN, implementing role-based access control using group policies
- Deploy and manage infrastructure using IaC practices with Terraform and Ansible
- Develop and continuously improve observability systems:
  - Grafana & Prometheus for metrics
  - ELK stack for centralized log storage and analysis
- Continuously optimize infrastructure in the areas of IaC, IAM, Observability, and CI/CD
- Work with a technology stack that includes Python, Kubernetes, Linux, Docker, GitHub CI/CD, PostgreSQL, ClickHouse, Kafka, Superset, Terraform, and Ansible
What We Offer
- The team has built award-winning AI products for tech corporations: devices, voice assistants, and other products that are actually in the world
- Cutting-edge tech stack: Speech Technologies, NLP, Generative AI (LLMs, diffusion models), voice-first agentic architecture with privacy-first and on-premises deployment
- High engineering bar and real ownership: the team cares about what actually works in production, not what looks good in a demo, and you'll see the impact of your work directly
- Fast career progression: a senior-heavy team and a high volume of real problems mean you grow faster than you would anywhere else
- Startup pace with enterprise stability: real clients, real revenue, no bureaucracy
- Fully remote
- 21 vacation days + public holidays + 5 sick days
- Private English lessons via Preply
- Participation in Employee Stock Ownership Plan (ESOP)
Key Skills
Ranked by relevance
kubernetes
terraform
cloud
cicd
microservices
prometheus
docker
devops
linux
ai
infrastructure as code
postgresql
ansible
grafana
storage
python
server
gitlab
kafka
aws
gcp
vpn
- Posted: May 16, 2026
- Type: Full-time
- Level: Not Applicable
- Location: Greater Buenos Aires
- Company: Acclaim AI
Industries
Technology
Information
Internet
Categories
Engineering
Information Technology