Sr Principal Site Reliability Engineer
UKG
About the role
About UKG Site Reliability Engineering
At UKG, Site Reliability Engineers are pivotal team members possessing a breadth of knowledge encompassing all aspects of service delivery. We develop software solutions to enhance, harden, and support our service delivery processes. This includes building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation.
We have a passion for learning and evolving with current technology trends, striving to innovate and relentlessly pursuing a flawless customer experience. We operate with an 'automate everything' mindset, which helps us bring immense value to our customers by deploying services with incredible speed, consistency, and availability.
Primary/Essential Duties and Key Responsibilities
• Engage in and improve the lifecycle of services from conception to End-of-Life (EOL), including system design consulting and capacity planning.
• Define and implement standards and best practices related to system architecture, service delivery, metrics, and the automation of operational tasks.
• Support services, product, and engineering teams by providing common tooling and frameworks that increase availability and improve incident response.
• Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis.
• Collaborate closely with engineering professionals across the organization to deliver reliable services.
• Identify and eliminate operational toil by treating operational challenges as software engineering problems.
• Actively participate in incident response, including on-call responsibilities.
• Partner with stakeholders to influence and help drive the best possible technical and business outcomes.
• Guide junior team members and serve as a champion for Site Reliability Engineering.
Qualifications
• Degree in engineering or a related technical discipline, and 10+ years of experience in SRE.
• Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java).
• Knowledge of cloud-based applications and containerization technologies.
• Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing.
• Ability to analyze the company's current technologies and engineering practices, and to develop steps and processes to improve and expand upon them.
• Working experience with industry-standard tools such as Terraform and Ansible.
(Experience, Education, Certification, License and Training)
• Must have hands-on experience working in engineering or cloud roles.
• Experience with public cloud platforms (e.g., GCP, AWS, Azure).
• Experience configuring and maintaining applications and systems infrastructure.
• Experience with distributed system design and architecture.
• Experience building and managing CI/CD pipelines.