
Senior Data Engineer, Tools & Infrastructure

Apple · San Diego, California, United States · Full-time
$120,000 per year

Job Description

Come join a creative engineering team devoted to making our products more durable through data-driven insights. We're looking for a senior data engineer to work with software and hardware engineers to unlock the power of our hardware test data. In this role, you'll architect and build scalable data pipelines and analytics infrastructure to help take our department's capabilities to the next level. You'll work with several hardware and software engineering teams throughout Apple to design robust data solutions, implement cloud-native data platforms, and iterate based on evolving requirements. The data infrastructure you build will power analytics reviewed at an executive level and will directly influence the design of future products. This is a hands-on work environment where engineers are expected to be self-motivated, proficient with a wide range of data technologies, and able to support several concurrent data initiatives while ensuring the reliability and performance of our critical data warehouse systems.

Description


In this role you'll architect, implement, and optimize data solutions for a small, highly effective engineering team while providing technical leadership on data strategy, identifying new data opportunities within the Reliability department, proposing organization-wide data platforms, and building new analytics:

- Design and build scalable data pipelines and ETL processes to ingest, transform, and manage hardware test data from multiple sources (a minimal sketch of such a pipeline follows this list)
- Lead the migration of legacy data warehouse components to modern cloud-native solutions while ensuring minimal downtime, data integrity, and a seamless transition for end users
- Build and integrate cutting-edge AI/ML solutions that drive decision-making, automate workflows, and surface novel insights from complex datasets
- Design and optimize data warehouse schemas, dimensional models, and indexing strategies for large-scale hardware test datasets, ensuring efficient storage and high-performance query execution
- Develop and maintain robust, fault-tolerant data processing workflows to handle high-volume test data ingestion, transformation, and validation with appropriate error handling and recovery mechanisms
- Implement comprehensive data quality frameworks, validation rules, and monitoring systems to ensure the accuracy, completeness, and reliability of critical metrics and analytics
- Continuously analyze and optimize data warehouse performance through query tuning, resource allocation, cost management, and capacity planning to support growing test data volumes
- Partner with reliability engineers, cross-functional software teams, and business stakeholders to understand data requirements and deliver analytics-ready datasets that enable data-driven insights and decision making
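By way of illustration, below is a minimal sketch of the kind of scheduled ingestion pipeline described in the first responsibility above, written as an Apache Airflow DAG. The DAG name, task logic, schedule, and data sources are hypothetical assumptions for illustration only, not a description of Apple's actual systems.

```python
# Minimal sketch of a daily hardware-test-data ingestion DAG.
# All names (hw_test_results_daily, source paths, etc.) are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_test_results(**context):
    """Pull the previous day's hardware test results from a source system."""
    # Placeholder: in practice this might read from object storage or an API.
    print(f"Extracting results for {context['ds']}")


def validate_and_load(**context):
    """Apply basic data-quality checks, then load into the warehouse."""
    # Placeholder: schema/row-count checks followed by a warehouse MERGE/COPY.
    print(f"Validating and loading results for {context['ds']}")


with DAG(
    dag_id="hw_test_results_daily",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # daily incremental load (Airflow 2.4+ style)
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_test_results)
    load = PythonOperator(task_id="validate_and_load", python_callable=validate_and_load)
    extract >> load  # simple linear dependency: extract, then validate/load
```

In a production pipeline the placeholder tasks would be replaced with real extraction, validation, and warehouse-load logic, and retries/alerting would be tuned to the data's reliability requirements.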

Minimum Qualifications


- B.S. in Computer Science, Software Engineering, Computer Engineering, or a related field with 5-7+ years of related experience
- Hands-on experience designing, building, and maintaining modern data lake architectures (e.g., Apache Iceberg) and cloud-native or hybrid data warehouses (e.g., Snowflake), using distributed compute engines such as Apache Spark to process and transform large-scale datasets for analytical and AI/ML uses
- Deep SQL expertise, including advanced query optimization techniques (partitioning, clustering, vectorized execution) for high-performance analytics on large tables
- Solid understanding of multi-cloud networking concepts (VPCs, subnets, routing, firewalls, Transit Gateway, Direct Connect/VPN) and demonstrated ability to design and troubleshoot secure, low-latency cross-cloud data pipelines
- Expertise developing robust, scalable ETL pipelines using orchestration tools such as Apache Airflow, with a solid grasp of batch and streaming ingestion patterns (incremental loads, change data capture, event-driven architectures)
- Proven ability to analyze and remediate performance bottlenecks in distributed processing, such as tuning Spark executors, optimizing shuffle, and adjusting resource allocations for cost efficiency (see the sketch after this list)
- Experience integrating data engineering pipelines into machine-learning workflows (feeding feature stores, preparing training datasets, operationalizing model outputs) and exposure to feature store frameworks (Feast) and container-native processing frameworks (Kubeflow, MLflow)
- Experience provisioning and managing cloud infrastructure via Infrastructure-as-Code (Terraform, CloudFormation), containerized deployments, and orchestration on Kubernetes (EKS, GKE, AKS), including autoscaling Spark clusters
- Comfortable with CI/CD pipelines tailored for data engineering (automated testing and deployment of SQL/Spark jobs) and skilled at instrumenting data pipelines with comprehensive logging and metrics
- Effective communicator in team environments with diverse technical backgrounds
- Thrives in fast-paced, evolving environments, quickly pivoting priorities while maintaining data integrity and reliability
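As a rough illustration of the Spark tuning and partitioning skills listed above, the snippet below shows hypothetical executor and shuffle settings together with a date-partitioned write that supports partition pruning in downstream queries. All paths, column names, and resource values are placeholder assumptions, not recommended settings for any real cluster.

```python
# Hedged sketch: executor/shuffle tuning plus a date-partitioned write.
# Bucket names, columns, and resource sizes are illustrative assumptions only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hw-test-etl")
    # Executor and shuffle tuning (values depend on cluster size and data volume)
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .config("spark.sql.shuffle.partitions", "400")  # control shuffle parallelism
    .config("spark.sql.adaptive.enabled", "true")   # adaptive query execution
    .getOrCreate()
)

# Read raw test results (hypothetical path and schema)
raw = spark.read.parquet("s3://example-bucket/raw/hw_test_results/")

# Incremental filter plus deduplication for one load window
daily = raw.filter(raw.test_date == "2024-01-01").dropDuplicates(["unit_id", "test_id"])

# Write partitioned by date so analytical queries can prune partitions
(
    daily.write.mode("overwrite")
    .partitionBy("test_date")
    .parquet("s3://example-bucket/curated/hw_test_results/")
)
```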

Preferred Qualifications


- M.S. in Computer Science, Software Engineering, Computer Engineering, or a related field
- Passion for quality and attention to detail; proactive in researching emerging technologies (cloud-native lakehouse services, new open-source formats) and integrating them into production
- Experience building software that leverages ML and GenAI to create innovative solutions for business needs and to enhance organizational and development workflows

Company Information

Location: Cupertino, CA

Type: Hybrid

Badges: Changemaker, Flexible Culture