Range
Senior Infrastructure & Operations Engineer (Kubernetes / Platform Reliability)
Munich, Germany · New York, NY, USA · Lisbon, Portugal · London, UK · Remote
Posted on Apr 12, 2026
We’re looking for a senior infrastructure and operations engineer to own and evolve our platform reliability. You’ll design, operate, and maintain our Kubernetes-based infrastructure, build reliable monitoring and alerting pipelines, and ensure our systems remain stable under real-world load and failure conditions. This is a hands-on role for someone who has deep experience running production systems at scale and who focuses on making infrastructure predictable and stable. You’ll work across Kubernetes, networking, CI/CD, Cloudflare, and observability to create a platform engineers can trust.
Design, deploy, and maintain production Kubernetes clusters. Own cluster reliability, upgrades, security, and performance. Build and operate monitoring, logging, and alerting pipelines. Ensure full-stack observability across infrastructure and services. Design and maintain CI/CD pipelines that are fast, reproducible, and safe. Improve deployment strategies (rollouts, canaries, rollbacks). Automate infrastructure provisioning and configuration. Investigate and resolve production incidents. Improve system resilience, redundancy, and recovery strategies. Define SLOs/SLIs and track reliability targets. Optimize and maintain our Cloudflare setup (caching, routing, security, edge behavior). Work closely with engineering teams to improve operational practices. Identify and remove single points of failure.
Senior-level experience operating production infrastructure. Deep, hands-on expertise with Kubernetes (cluster internals, networking, storage, security). Strong networking fundamentals (TCP/IP, routing, DNS, TLS, load balancing). Experience debugging distributed systems and network-related issues. Experience optimizing CDN and edge setups, including Cloudflare. Strong experience building monitoring and observability systems. Experience with metrics, logs, traces, and alerting pipelines. Experience designing reliable CI/CD pipelines. Strong Linux fundamentals. Experience with infrastructure as code and automation. Ability to debug issues across the entire stack. Experience handling incidents and conducting postmortems.
Experience with multi-cluster or multi-region setups. Experience with high-throughput or data-heavy systems. Experience with Elasticsearch or large-scale data infrastructure. Experience with service meshes. Experience with cost optimization and capacity planning. Experience in regulated or reliability-focused environments.
You assume infrastructure will fail and design accordingly. You prioritize reliability, visibility, and recoverability. You build systems that engineers trust in production. You automate carefully and deliberately. You are calm and methodical during incidents. You focus on long-term stability over short-term fixes. You document and standardize important processes.
Hardening Kubernetes clusters for high availability and safe upgrades. Debugging network latency or connectivity issues across services. Optimizing Cloudflare caching, routing, and edge security rules. Building monitoring pipelines that provide reliable signals. Designing alerting that is actionable and low-noise. Improving deployment reliability and rollback safety. Removing single points of failure in production systems. Ensuring observability across all services and data pipelines.
Join one of the fastest-growing sectors in Web3 as stablecoins reach mass adoption. Competitive compensation with meaningful equity upside. Strong potential for growth and leadership opportunities. Remote-first culture with bi-yearly international off-sites. Opportunities for global conference travel and ecosystem engagement. Health and wellness benefits.
A short introduction and your background. Examples of infrastructure or platform work you’ve led. Any public write-ups, repositories, or talks (if available). We’re particularly interested in engineers who have built and operated Kubernetes platforms, improved network reliability, and optimized CDN/edge setups such as Cloudflare in production.