Site Reliability Engineer

Great Question • No location specified • Full-time

Job Description

🚀 About Us

We’re a product-focused startup with a tight-knit team of 14 engineers building tools that help teams make better decisions through great research. We’re pragmatic, fast-moving, and obsessed with product quality.

As we grow, our infrastructure needs to grow with us. That means better observability, stronger systems, faster deploys—and smarter decisions about cloud spend. We’re hiring someone who can take ownership of this and lay the foundation for long-term platform health.

🎯 What You’ll Do

You’ll be the first dedicated DevOps/Infra hire with end-to-end ownership of platform health, reliability, and scalability. You’ll partner directly with our engineering team to improve our systems, reduce toil, and make infra a product in its own right.

Your scope will include:

  • Observability & Reliability

    • Define and maintain service SLOs, dashboards, and alerts

    • Improve incident detection and response

    • Establish best practices around reliability and error budgets

  • Infrastructure

    • Maintain and improve Terraform-managed infrastructure

    • Lead our migration of staging infrastructure to AWS

    • Scale systems to handle growth and changing workloads

  • Developer Experience & CI/CD

    • Increase pipeline reliability

    • Speed up deploy cycles and improve rollback confidence

  • Database Performance

    • Help identify and fix slow queries and optimize indexes

    • Support product teams with performance diagnostics

  • Cloud Cost Management

    • Monitor and optimize cloud spend

    • Build visibility and tooling to help teams make cost-aware decisions

💡 You Might Be a Great Fit If You...

  • Have 4–8+ years of experience in DevOps, SRE, or Infrastructure roles

  • Have hands-on AWS experience (EC2, RDS, VPCs, etc.)

  • Are confident with Terraform, GitHub Actions, Docker, and PostgreSQL

  • Have a track record of improving observability and reducing incident response times

  • Have worked in high-autonomy, high-ownership environments

  • Are cost-conscious and can identify waste in infra and cloud spend

  • Love building high-leverage tools for engineers: infra as a product

📈 Growth Path

This is a foundational hire. Today, the role is fully IC, but there’s clear runway to grow into:

  • Platform leadership (tech lead or manager)

  • Head of Infra/SRE if we expand the team

  • Principal engineer focused on scale, reliability, and platform strategy

You’ll have support and visibility from leadership, and the freedom to chart your path as the company grows.

⚙️ Our Stack

  • Cloud: AWS

  • Infra-as-code: Terraform

  • CI/CD: GitHub Actions

  • Containers: Docker, lightweight Kubernetes

  • Monitoring: Datadog, Sentry

  • Database: PostgreSQL, Redis

  • App: Rails, React, Sidekiq

✨ Why This Role?

  • Impact: You’ll shape the systems and culture of how we build and run software.

  • Trust: High autonomy and low process—make smart decisions, move fast.

  • People: No egos, just a team that values thoughtfulness, speed, and care.

  • Growth: Opportunity to grow with the company in whichever direction excites you.


Company Information

Location: Oakland, California, United States

Type: Hybrid