Senior Site Reliability Engineer

Apple • San Diego, California, United States • Full-time

$120,000 per year

Kubernetes • Cloud Infrastructure • Distributed Systems • Site Reliability Engineering • Satellite Communications

Job Description

Apple is looking for a Senior Engineer with systems and software engineering experience to join our Satellite Communications Group SRE team. The SRE team builds, monitors, and maintains large-scale, highly resilient systems that enable our customers to access communications services via satellite. You'll contribute to distributed systems, architecture design, and cloud infrastructure (as code!) for critical and unique customer-facing Apple services. This is a rare opportunity to build and control the entire end-to-end infrastructure from the beginning, along with all supporting components such as provisioning, monitoring, deployment, and software tools platforms, within a team with a no-ops culture.

Description

At Apple, we strive every single day to craft products that enrich people’s lives. Our successes are the result of skilled domain experts working in an environment that encourages creativity, collaboration, and rethinking of old problems in new ways! As a member of the Satellite Connectivity Group, you will work on the satellite network that enables connectivity to iPhone when off the grid without cellular or Wi-Fi coverage. Every day, Apple customers use Emergency SOS via satellite to access emergency assistance when they need help and have no other means to communicate. You will have the unique and rewarding opportunity to shape this and other critical services for the benefit and safety of millions of Apple device users. You will build and run an Apple service-enabling platform that millions of customers may rely on every day. You’ll also build and run the infrastructure that powers those services, with an emphasis on building, not just operating or implementing. We’re looking for people who like to solve problems using software rather than shell prompts as we scale Apple’s services for customers around the world. Help us build the Apple experience on a global scale!

Minimum Qualifications

Deep understanding of distributed systems principles, including consistency, fault tolerance, and scalability
Strong familiarity with consensus algorithms (e.g., Raft, Paxos, Zab)
Experience building and operating multi-cluster, highly available services
Experience with Temporal/Cadence/Windmill or other durable execution platforms
Understanding of zero-trust application architecture
Proven experience building and optimizing real-time and batch data processing pipelines using technologies such as Kafka, Spark, Flink, and Beam
Kubernetes experience, including cluster management as well as application deployment and configuration
Experience with IoT/Edge device compute and infrastructure
Experience or interest in RF, cellular, and satellite communications (Bluetooth, GPS, Wi-Fi, LTE/5G)
Experience with modern web-scale services including servers, VIPs, load balancers, and proxies
Experience working with monitoring and metrics platforms like Splunk and Prometheus
Education: Engineering or technical BS is a positive but not required

Preferred Qualifications

Experience supporting environments with thousands of servers and critical uptime requirements
Able to write software tools & services needed to build and operate a large-scale platform
Proficient with Puppet
Experience with IP network design and architecture; Cisco, Juniper, or Arista routing and switching hardware & configuration
Kubernetes experience, including cluster management as well as application deployment and configuration
Experience with modern web-scale services including servers, VIPs, load balancers, and proxies
Experience working with monitoring and metrics platforms like Splunk and Prometheus
Education: Technical engineering BS is a positive but not required