DevOps (Cloud-AI)
W2 Contract
Pay Rate: $55 - $65 per hour
Location: Cupertino, CA - Remote Role
Job Summary:
We are looking for a highly motivated DevOps / Site Reliability Engineer to support large-scale Kubernetes-based infrastructure and platform operations. This role is focused on building, automating, and operating highly reliable systems that power critical engineering platforms and services.
Duties and Responsibilities:
- Design, build, automate, and support scalable Kubernetes-based platforms and services
- Operate and troubleshoot production environments running at scale
- Develop automation and tooling to improve operational efficiency and reliability
- Monitor platform health, performance, and availability using observability tooling
- Troubleshoot infrastructure, application, and networking issues across distributed systems
- Work closely with engineering teams to improve deployment, reliability, and scalability practices
- Participate in operational support, incident response, and root cause analysis
- Improve CI/CD workflows and deployment automation
- Drive operational excellence through documentation, automation, and process improvements
- Take ownership of projects and independently drive deliverables to completion
Requirements and Qualifications:
- Strong hands-on experience with Kubernetes platforms such as EKS, GKE, AKS, or similar
- Experience running and supporting applications on Kubernetes at scale
- Strong understanding of containerized infrastructure and distributed systems
- Experience with monitoring and observability tools, preferably Grafana and Prometheus
- Experience with CI/CD pipelines and deployment automation
- Experience with Splunk logging, log analysis, and troubleshooting
- Strong scripting and automation experience using Python and/or Golang
- Experience troubleshooting production systems under pressure
- Strong communication and collaboration skills
- Self-starter mentality with strong ownership and accountability
Preferred Qualifications:
- Experience operating Ray clusters/services
- Strong networking and troubleshooting experience
- Experience with cloud infrastructure and platform services
- Experience with Infrastructure as Code and automation frameworks
- Experience supporting high-scale production systems
- Familiarity with SRE principles and operational best practices
Desired Skills and Experience:
Kubernetes, DevOps, Site Reliability Engineering, SRE, Cloud Infrastructure, EKS, GKE, AKS, Kubernetes Platforms, Kubernetes Operations, Containerized Infrastructure, Distributed Systems, Platform Engineering, Infrastructure Automation, Automation Tooling, Production Support, Production Troubleshooting, Monitoring, Observability, Grafana, Prometheus, Splunk, Log Analysis, CI/CD Pipelines, Deployment Automation, Python, Golang, Networking Troubleshooting, Incident Response, Root Cause Analysis, Infrastructure as Code, Automation Frameworks, Ray Clusters, Ray Services, High-Scale Production Systems, Operational Excellence, Documentation, Reliability Engineering, Scalability, Platform Services, Engineering Collaboration, Ownership, Accountability
Bayside Solutions, Inc. is not able to sponsor any candidates at this time. Additionally, candidates for this position must qualify as a W2 candidate.
Bayside Solutions, Inc. may collect your personal information during the position application process. Please reference Bayside Solutions, Inc.'s CCPA Privacy Policy at www.baysidesolutions.com.
- Posted: May 14, 2026
- Type: Contract
- Level: Mid-Senior
- Location: Cupertino
- Company: Bayside Solutions