Site Reliability Engineer

Latent • San Francisco, California, United States • Full-time

$140,000 per year

Automation, DevOps, Infrastructure, Cloud Services, Healthcare Technology, Monitoring, Site Reliability Engineering

Job Description

Latent is building the intelligence infrastructure for American healthcare. Our products are already helping hospitals and clinics dramatically increase workflow output, speed up patient access to medications, and boost provider revenue. Our flagship multi-modal search and question-answering platform analyzes EHR data to surface the most relevant information, reducing operational overhead and improving care delivery.

We’re a small, mission-driven team backed by General Catalyst, Conviction, and YC, tackling some of healthcare’s hardest technical challenges. If you’re passionate about reliability, scalability, and performance in a high-stakes domain, we’d love to meet you.

About the Role

As a Site Reliability Engineer at Latent, you’ll design, build, and operate the core infrastructure that powers our AI-driven healthcare platform. You will own the reliability and scalability of our cloud services, build monitoring and alerting pipelines, and enable rapid iteration with high confidence in safety and uptime. You’ll be instrumental in evolving our infrastructure to support real-time, fault-tolerant ML applications in production.

This is a high-impact, high-ownership role based full-time onsite in our San Francisco office.

What You’ll Do

Design and maintain scalable, resilient, and secure infrastructure across cloud platforms (e.g., AWS)
Improve observability across our systems with robust metrics, tracing, and alerting (e.g., Datadog)
Build automation to eliminate toil, reduce deployment friction, and accelerate developer velocity
Own our infrastructure-as-code stack (e.g., Terraform, Kubernetes) to ensure reproducibility and control
Help scale real-time and batch infrastructure for medical LLM inference and data pipelines
Build incident response systems, participate in on-call rotations, and drive postmortem culture
Optimize CI/CD pipelines and deployment flows for performance, security, and safety
Collaborate with engineering, ML, and compliance teams to uphold HIPAA and data privacy standards

You Might Be a Fit If You…

Have 5+ years of experience in infrastructure, DevOps, or site reliability engineering roles
Are proficient with cloud-native systems (e.g., AWS), Bash, and tools like Docker, Kubernetes, and Terraform
Have built and maintained production systems with strong uptime and security guarantees
Have experience setting up scalable monitoring, tracing, and alerting systems
Thrive in fast-moving, high-ownership, zero-to-one environments and can make pragmatic trade-offs under uncertainty
(Bonus) Have experience supporting ML or data infrastructure (e.g., model serving, feature stores, vector databases)

Compensation

The expected salary range for this role is $140,000 to $240,000 annually, in addition to equity and comprehensive benefits. Compensation packages vary based on factors including experience and expertise. If your compensation expectations fall outside this range, we still encourage you to apply.

Why You Should Join Us

Backed by top investors: General Catalyst, Conviction, and YC
Tight-knit, world-class team with a deep sense of mission
Huge greenfield opportunity with significant ownership and room for growth
Competitive salary and equity compensation, with the equity upside of an early-stage startup and the product-market fit of a later-stage company
Excellent benefits, including a choice of health, dental, and vision coverage plans
Paid parental leave
Lunch and dinner provided at the office
Unlimited PTO