At T-Mobile, we invest in YOU! Our Total Rewards Package ensures that employees get the same big love we give our customers. All team members receive a competitive base salary and compensation package - this is Total Rewards. Employees enjoy multiple wealth-building opportunities through our annual stock grant, employee stock purchase plan, 401(k), and access to free, year-round money coaches. That's how we're UNSTOPPABLE for our employees!
Are you ready to join the Un-carrier movement?
The Site Reliability Engineer at T-Mobile is instrumental in enhancing system reliability and resilience, ensuring our digital infrastructure operates seamlessly. By automating processes and reducing manual efforts, they minimize operational incidents and streamline software development and deployment. Their proficiency in programming, scripting languages, and incident response management fortifies our systems against disruptions. Through continuous learning and adaptation to new technologies, they drive innovation and maintain system robustness. Their contributions are vital to the stability and performance of T-Mobile's digital operations, directly impacting our service quality and operational efficiency.
We pride ourselves on encouraging a culture of innovation, advocating for agile methodologies, and promoting transparency in all that we do. Join us in embodying the spirit of the 'Un-carrier' and make a tangible impact! If you're up to the challenge, apply today!
Job Responsibilities:
- Automates processes to enhance system reliability and resilience
  - Own the reliability, scalability, and uptime of business-critical systems and services.
  - Build self-healing systems that automatically detect and recover from faults.
  - Implement redundancy, failover, and chaos testing strategies to validate system resilience.
  - Apply and refine SLOs/SLIs to measure and improve reliability outcomes.
  - Develop runbooks, automation scripts, and incident workflows that improve recovery time.
  - Integrate reliability best practices into the software development lifecycle.
  - Collaborate closely with Platform, Cloud, and Development teams to design for reliability.
- Minimizes operational incidents through proactive monitoring and maintenance
  - Maintain and improve service-level objectives (SLOs) and error budgets for production systems.
  - Develop and enhance observability platforms (Prometheus, Grafana, CloudWatch, Azure Monitor) for deep visibility into system health.
  - Build and tune alerting systems to detect anomalies before they impact users.
  - Analyze system metrics, logs, and traces to identify performance bottlenecks.
  - Lead capacity planning and performance optimization efforts across multiple environments.
- Streamlines software development and deployment processes
  - Partner with operations and development teams to ensure service stability across hybrid infrastructure.
  - Design and implement automation for infrastructure provisioning and environment management using Terraform, Ansible, and Python.
  - Optimize CI/CD pipelines (GitLab CI/CD, Jenkins) to accelerate deployments while maintaining stability.
- Develops scripts and tools to reduce manual efforts in operational tasks
  - Maintain reusable automation modules and operational tooling for reliability improvements.
- Manages incident response to ensure rapid recovery and minimal disruption
  - Lead incident response and root cause analysis to minimize downtime and prevent recurrence.
  - Contribute to post-incident reviews and reliability roadmaps to drive continuous improvement.
- Adapts to new technologies to maintain and enhance system robustness
  - Stay current with new SRE methodologies, tools, and infrastructure technologies.
Education and Work Experience:
- Bachelor's Degree in Computer Science or Engineering (Required)
- Master's/Advanced Degree in Computer Science or Data Science (Preferred)
- 2-4 years developing and maintaining CI/CD pipelines for software deployment (Required)
- 2-4 years implementing and managing cloud-native platforms and solutions (Required)
- 2-4 years guiding and mentoring teams in reliability engineering practices (Required)
Knowledge, Skills and Abilities:
- Problem Solving - Ability to identify, analyze, and resolve system reliability issues. (Required)
- Scripting Languages - Proficiency in scripting languages such as Python or Bash to automate tasks and processes. (Required)
- Incident Response Management - Skilled in managing and responding to system incidents to minimize downtime and impact. (Required)
Licenses and Certifications:
- Certified Kubernetes Administrator (CKA) - Certification that validates the ability to use Kubernetes, which is crucial for automating deployment, scaling, and operations of application containers across clusters of hosts. (Preferred)
- AWS Certified DevOps Engineer - Certification that demonstrates an individual's expertise in provisioning, operating, and managing distributed application systems on the AWS platform. (Preferred)
- Site Reliability Engineering (SRE) Foundation Certification - Certification that provides a foundational understanding of the SRE philosophy, practices, and tools to enhance the reliability and performance of systems. (Preferred)
- At least 18 years of age
- Legally authorized to work in the United States
Travel:
- Travel Required (Yes/No): No
DOT Regulated:
- DOT Regulated Position (Yes/No): No
- Safety Sensitive Position (Yes/No): No
Base Pay Range: $84,900 - $153,200
Corporate Bonus Target: 15%
The pay range above is the general base pay range for a successful candidate in the role. The successful candidate's actual pay will be based on various factors, such as work location, qualifications, and experience, so the actual starting pay will vary within this range.
At T-Mobile, employees in regular, non-temporary roles are eligible for an annual bonus or a periodic sales incentive or bonus, depending on their role. Most Corporate employees are eligible for a year-end bonus that is based on company and/or individual performance and set at a percentage of the employee's eligible earnings in the prior year. Certain positions in Customer Care are eligible for monthly bonuses based on individual and/or team performance.
At T-Mobile, our benefits exemplify the spirit of One Team, Together! A big part of how we care for one another is working to ensure our benefits evolve to meet the needs of our team members. Full and part-time employees have access to the same benefits when eligible. We cover all of the bases, offering medical, dental and vision insurance, a flexible spending account, 401(k), employee stock grants, employee stock purchase plan, paid time off and up to 12 paid holidays - which total about 4 weeks for new full-time employees and about 2.5 weeks for new part-time employees annually - paid parental and family leave, family building benefits, back-up care, enhanced family support, childcare subsidy, tuition assistance, college coaching, short- and long-term disability, voluntary AD&D coverage, voluntary accident coverage, voluntary life insurance, voluntary disability insurance, and voluntary long-term care insurance. We don't stop there - eligible employees can also receive mobile service & home internet discounts, pet insurance, and access to commuter and transit programs! To learn about T-Mobile's amazing benefits, check out www.t-mobilebenefits.com.
Never stop growing! As part of the T-Mobile team, you know the Un-carrier doesn't have a corporate ladder - it's more like a jungle gym of possibilities! We love helping our employees grow in their careers, because it's that shared drive to aim high that drives our business and our culture forward. By applying for this career opportunity, you're living our values while investing in your career growth - and we applaud it. You're unstoppable!
T-Mobile USA, Inc. is an Equal Opportunity Employer. All decisions concerning the employment relationship will be made without regard to age, race, ethnicity, color, religion, creed, sex, sexual orientation, gender identity or expression, national origin, religious affiliation, marital status, citizenship status, veteran status, the presence of any physical or mental disability, or any other status or characteristic protected by federal, state, or local law. Discrimination, retaliation or harassment based upon any of these factors is wholly inconsistent with how we do business and will not be tolerated.
Talent comes in all forms at the Un-carrier. If you are an individual with a disability and need reasonable accommodation at any point in the application or interview process, please let us know by emailing ApplicantAccommodation@t-mobile.com or calling 1-844-873-9500. Please note, this contact channel is not a means to apply for or inquire about a position and we are unable to respond to non-accommodation related requests.