Site Reliability Engineer – Mistral AI

Location: London, UK (Paris also possible)
Visa Sponsorship: Available (Skilled Worker)

About Mistral AI

At Mistral AI, we believe AI should simplify tasks, save time, and enhance learning and creativity. Our high-performance, open-source AI models and solutions integrate seamlessly into daily work. Our platform includes le Chat, the AI assistant for life and work. We are a collaborative, low-ego, and innovative team with offices in France, the USA, the UK, Germany, and Singapore. Learn more about our culture: Mistral AI Careers.

Role Summary

We’re seeking an experienced Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our platform and customer-facing applications. You’ll work closely with software engineers and AI/ML teams to maintain top-tier operational standards.

Responsibilities

Operations (50%)

Build and maintain scalable, fault-tolerant infrastructure for web services and ML workloads
Ensure high availability of platforms, training environments, and HPC clusters
Troubleshoot production issues, participate in on-call rotations, and perform root-cause analysis
Implement monitoring, alerting, and incident response systems
Manage CI/CD pipelines, containerization, orchestration, and logging systems

Development (50%)

Automate infrastructure deployment and orchestration using Kubernetes, Terraform, Flux, etc.
Collaborate with researchers to enable safe, reproducible model-training experiments
Build a cloud-agnostic platform bridging science and infrastructure
Develop workflows, dashboards, scripts, and APIs to improve system reliability and performance
Ensure security compliance and document best practices
Contribute to open-source projects and technical publications

Requirements

Master’s in Computer Science, Engineering, or a related field
7+ years in DevOps/SRE roles with experience in highly available distributed systems
Strong knowledge of cloud computing, monitoring, CI/CD, containerization (Docker/Kubernetes), scripting (Python/Go/Bash), and infrastructure-as-code (Terraform/CloudFormation)
Hands-on experience with observability tools (Prometheus, Grafana, ELK, Datadog)
Excellent problem-solving and communication skills, and the ability to thrive in a startup environment

Bonus Skills

Experience in AI/ML environments
High-performance computing (HPC) systems and workload managers (Slurm)
Familiarity with AI-oriented platforms like Fluidstack, CoreWeave, or Vast

Location & Remote Work

Primarily based in Paris HQ, with flexibility for remote work
Remote candidates may be considered in France, the UK, Germany, the Netherlands, Spain, or Italy, with required office visits for onboarding and monthly collaboration

What We Offer

Competitive salary + equity
Health insurance, transport & sport allowances, meal vouchers
Private pension plan & generous parental leave
Visa sponsorship

Apply Now

Join a pioneering AI company shaping the future of technology