Member of Technical Staff – AI Inference Platform

Lyceum

Switzerland · Full-time · Associate

Your mission

You will make Lyceum's AI inference platform reliable, secure, and scalable - ensuring it performs under pressure as we grow to thousands of concurrent users. While others on the team expand what the platform can do, your job is to make sure it keeps working, fails gracefully, and gets faster over time.

Your focus

Scalability: Architect and implement the systems that allow our inference platform to scale to thousands of concurrent users. This includes request routing, load balancing, autoscaling, and resource scheduling across GPU clusters.

Reliability and observability: Build robust monitoring, alerting, and incident response tooling. Design for graceful degradation, automatic recovery, and minimal downtime.

Performance engineering: Profile and optimise the full inference path from request ingestion through model execution to response delivery. Identify and eliminate bottlenecks at every layer.

Infrastructure evolution: Evaluate and integrate open-source inference frameworks and tooling (Dynamo, vLLM, Triton, etc.) where they improve throughput, latency, or stability of the serving stack.

Your KPIs

Platform uptime and availability (SLA adherence)
P50/P95/P99 latency and throughput under load
Time-to-detection and time-to-resolution for incidents
Scalability milestones (concurrent users, requests per second, GPU utilisation)

Your profile

We consider candidates from diverse backgrounds, with a deep love for technical challenges and the desire to take on ownership beyond what's reasonably expected.

Requirements

3+ years of experience in backend, infrastructure, or systems engineering
Strong proficiency in Go and Python
Experience building or operating a model serving platform or ML platform
Solid understanding of systems performance - profiling, benchmarking, and optimising latency and throughput
Familiarity with observability tooling (Prometheus, Grafana, OpenTelemetry, or similar)
Understanding of security fundamentals - network isolation, authentication, encryption, multi-tenancy

Nice to have

Experience with NVIDIA Dynamo or similar inference orchestration/routing frameworks
Hands-on experience with GPU serving infrastructure (vLLM, Triton, TensorRT-LLM)
Experience with Kubernetes in a production environment (deployment, networking, resource management)
Experience operating at scale (10k+ RPS, multi-region, multi-cluster)
Comfortable working on-call or in incident response when things break

Why us?

Outstanding team: Work with some of the best engineers in the world, coming from hedge funds, big tech, AI startups and top universities.
Once-in-a-lifetime opportunity: Early-stage company in the fastest-growing market in the world
Ownership: Shape how European AI companies access GPU compute
European mission: Build sovereign, GDPR-compliant AI infrastructure for the next generation of deep-tech

Key Skills

Ranked by relevance

AI, incident response, Kubernetes, Prometheus, Grafana, GDPR, SLA
