Site Reliability Engineer
Job Description
Latent is building the intelligence infrastructure for American healthcare. Our products are already helping hospitals and clinics dramatically increase workflow output, speed up patient access to medications, and boost provider revenue. Our flagship multi-modal search and question-answering platform analyzes EHR data to surface the most relevant information, reducing operational overhead and improving care delivery.
We’re a small, mission-driven team backed by General Catalyst, Conviction, and YC, tackling some of healthcare’s hardest technical challenges. If you’re passionate about reliability, scalability, and performance in a high-stakes domain, we’d love to meet you.
About the Role
As a Site Reliability Engineer at Latent, you’ll design, build, and operate the core infrastructure that powers our AI-driven healthcare platform. You will own the reliability and scalability of our cloud services, build monitoring and alerting pipelines, and enable rapid iteration with high confidence in safety and uptime. You’ll be instrumental in evolving our infrastructure to support real-time, fault-tolerant ML applications in production.
This is a high-impact, high-ownership role based full-time onsite in our San Francisco office.
What You’ll Do
- Design and maintain scalable, resilient, and secure infrastructure across cloud platforms (e.g., AWS)
- Improve observability across our systems with robust metrics, tracing, and alerting (e.g., Datadog)
- Build automation to eliminate toil, reduce deployment friction, and accelerate developer velocity
- Own our infrastructure-as-code stack (e.g., Terraform, Kubernetes) to ensure reproducibility and control
- Help scale real-time and batch infrastructure for medical LLM inference and data pipelines
- Build incident response systems, participate in on-call rotations, and drive postmortem culture
- Optimize CI/CD pipelines and deployment flows for performance, security, and safety
- Collaborate with engineering, ML, and compliance teams to uphold HIPAA and data privacy standards
You Might Be a Fit If You…
- Have 5+ years of experience in infrastructure, DevOps, or site reliability engineering roles
- Are proficient with cloud-native systems (e.g., AWS), Bash, and tools like Docker, Kubernetes, and Terraform
- Have built and maintained production systems with strong uptime and security guarantees
- Have experience setting up scalable monitoring, tracing, and alerting systems
- Thrive in fast-moving, high-ownership, zero-to-one environments and can make pragmatic trade-offs under uncertainty
- (Bonus) Have experience supporting ML or data infrastructure (e.g., model serving, feature stores, vector databases)
Compensation
The expected salary range for this role is $140,000 to $240,000 annually, in addition to equity and comprehensive benefits. Compensation packages vary based on factors including experience and expertise. If your compensation expectations fall outside this range, we still encourage you to apply.
Why You Should Join Us
- Backed by top investors: General Catalyst, Conviction, and YC
- Tight-knit, world-class team with a deep sense of mission
- Huge greenfield opportunity with significant ownership and room for growth
- Competitive salary and equity compensation, with the equity upside of an early-stage startup and the product-market fit of a later-stage company
- Excellent benefits, including flexible health, dental, and vision coverage plans
- Paid parental leave
- Lunch and dinner provided at the office
- Unlimited PTO