Sr. Site Reliability Engineer

United States, Iowa, Urbandale
Apr 21, 2026
Overview

Placement Type: Temporary
Salary: $72.48-75.48 hourly, DOE
Start Date: May 11, 2026

Aquent Studios is looking for a Senior Site Reliability Engineer to join its team, working on projects for one of its top clients, a globally recognized leader dedicated to shaping the future of its industry through cutting-edge technology and innovation. This organization is committed to excellence, leveraging advanced solutions to deliver unparalleled quality and efficiency.

Are you a visionary engineer passionate about building resilient, high-performing systems that power critical operations? In this pivotal role, you will directly influence the stability, scalability, and security of our core applications, ensuring seamless operations and accelerating our technological advancements. Your expertise will be instrumental in safeguarding our infrastructure and enhancing our ability to deliver groundbreaking products and services.

About the Role

We are looking for a highly skilled and proactive engineer to drive the reliability and performance of our sophisticated application ecosystem. You will be at the forefront of designing and implementing solutions that prevent outages, optimize system health, and respond effectively to complex incidents. This role offers a unique opportunity to lead critical initiatives, mentor peers, and contribute to a culture of continuous improvement and innovation.

What You Will Do

* Develop and implement creative solutions to enhance application resiliency, reliability, and security.
* Proactively identify potential system failures and automate robust solutions for failure prevention and rapid recovery.
* Lead as an Incident Commander during critical system outages, coordinating recovery efforts for large and complex systems.
* Act as a technical advisor, leading the resolution of complex, organization-wide problems related to system resiliency, reliability, and security by organizing and leveraging necessary resources.
* Drive the development and implementation of Service Level Objectives (SLOs) across various system components.
* Evaluate, design, and implement enhancements to improve the cost-efficiency, quality, performance, and security of software applications.
* Design and implement solutions for comprehensive metric gathering related to the cost, quality, performance, and security of software applications.
* Continuously expand and maintain expert knowledge of industry trends and technical innovations within the discipline.
* Create and maintain essential support and playbook documentation.
* Collaborate closely with fellow engineers and product teams to ensure features meet business requirements and operational standards.
* Lead post-mortem analyses of incidents to identify root causes, extract lessons learned, and implement improvements in automation, recovery processes, and documentation.

Required Skills & Experience

* Deep and broad understanding of technology, with a proven ability to develop creative solutions for complex technical challenges.
* Demonstrated experience in anticipating product failures and automating solutions for failure modes and recovery.
* Strong leadership experience, including acting as an Incident Commander for large or complex system failures.
* Expertise in solving complex, organization-wide problems related to resiliency, reliability, and security.
* Proven ability to lead the development of Service Level Objectives (SLOs).
* Experience evaluating and implementing enhancement design solutions for software applications, focusing on cost, quality, performance, and security.
* Proficiency in designing and implementing solutions for gathering metrics related to application cost, quality, performance, and security.
* Strong collaborative skills, with experience working effectively with engineering and product teams.
* Experience leading problem post-mortems and driving continuous improvement.
* Hands-on experience with AWS Cloud Services.
* Proficiency with Kubernetes.
* Experience with Datadog for monitoring and observability.
* Expertise in Terraform for infrastructure as code.
* Proficiency in one or more programming languages such as Java, Scala, JavaScript, .NET, Go, or Python.