Senior Site Reliability Engineer
Join our global force of 500+ innovators, blending the latest in tech with the greatest in soundtracking, from our Stockholm HQ to offices in London, New York, Los Angeles, Berlin, Oslo, and Seoul. We’re an industry leader with a startup mentality. We take what we do seriously, but we don’t take ourselves too seriously. Creating and collaborating to transform the sound of streaming, content, and culture. Come join us—and let the world feel your work.
We are looking for an experienced Site Reliability Engineer (SRE) with a proven track record of driving reliability and scalability in dynamic, growth-oriented environments. In this central role, you will bring deep expertise to architect and implement resilient systems, mentor teams on operational excellence, and embed best practices across our infrastructure. Your experience in navigating similar journeys will be instrumental as we elevate our cloud platforms, enhance observability, and set new benchmarks for scalability and performance. Join us in building a world-class, robust infrastructure for the future!
How you will make an impact
Champion a Platform Engineering mindset: Developers own their own infrastructure, and we provide the tooling and platform to make it effortless.
Architect and build scalable, resilient, and observable infrastructure, working closely with development teams to enable seamless delivery and robust testing.
Foster a collaborative environment through code reviews, pair programming, and mob programming.
Lead Kubernetes and container management, simplifying operations, optimizing performance, and promoting best practices.
Coach engineering teams on cloud-native principles, promoting continuous learning and knowledge sharing.
What we're looking for
Platform Engineering: Play a key role in creating a developer-first platform that streamlines workflows and accelerates productivity.
Product mindset: Track record of building great products with a user-centric approach to technology choices.
Architectural knowledge: Deep understanding of modern web architectures, system design, and engineering principles for building scalable and reliable platforms.
Cloud Native: We are fully cloud-native in GCP (Google Cloud Platform), running our services on GKE with managed databases. Hands-on experience with GCP and managing multiple Kubernetes clusters is essential.
Observability skills: Expertise in troubleshooting distributed systems, building dashboards (e.g., Grafana), and instrumenting metrics, logs, and traces for monitoring and performance analysis.
CI/CD proficiency: Strong experience with CI/CD and GitOps operational frameworks. We use GitHub Actions and ArgoCD, and we manage all our infrastructure as code using Terraform.
Programming skills: Strong programming skills in Go or TypeScript, the primary languages used within our team.
Mentorship: Experience in mentoring and guiding engineering teams, promoting operational excellence through documentation and best practices.
It would also be music to our ears if you have
Experience with Progressive Delivery and Canary Releases using tools like Argo Rollouts.
Expertise in managing an observability stack (we run Thanos, OpenTelemetry with Grafana Tempo, Loki, Grafana, and Prometheus).
Proficiency in defining and tracking Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to enhance system reliability.
Equal Opportunity Employer
We believe that bringing people together from different backgrounds, experiences and perspectives makes for a healthy workplace, a more successful business and a better world. We value diversity and encourage everyone to come and soundtrack the world with us.
Application
Do you want to be a part of our fantastic team? Please apply, in English, by clicking the link below.
We embrace a hybrid work setup that offers the freedom and flexibility to balance your personal and professional life. At the same time, we value the collaboration, creativity, and connection that come from spending the majority of our time together in the office. This approach allows us to innovate as a team while giving you the autonomy to work in a way that suits you best.
- Department: Tech & Data Analytics
- Locations: HQ (Stockholm)
- Remote status: Hybrid Remote
- Employment type: Full-time
About Epidemic Sound
Join our global force of 500+ innovators, blending the latest in tech with the greatest in soundtracking, from our Stockholm HQ to offices in London, New York, Los Angeles, Berlin, Oslo, and Seoul. We’re an industry leader with a startup mentality. We take what we do seriously, but we don’t take ourselves too seriously. Creating and collaborating to transform the sound of streaming, content, and culture. Come join us—and let the world feel your work.