SRE (Site Reliability Engineer)
Supersourcing
About the role
As a Site Reliability Engineer at the company, you will play a crucial role in managing the challenges of scale specific to Digitization. You will use your expertise in coding, algorithms, complexity analysis, and large-scale system design to deliver scalable, reliable, durable, and secure applications for both internal users and customers. Your focus will be on ensuring a customer-first approach to application development while driving technical innovation to meet and exceed our customers' needs.
**Responsibilities:**
- Collaborate with the Site Reliability Engineering team, Development team, and other partner teams to maintain application reliability, efficiency, and performance to meet customer requirements. This includes ensuring the operation of services is reliable, scalable, and automated.
- Plan and execute projects aimed at enhancing system reliability, efficiency, and performance.
- Work closely with development teams during feature launches to deliver reliable and scalable functionality to customers.
- Develop a deep understanding of production infrastructure to diagnose distributed systems issues and suggest system improvements.
- Manage operations, service level objectives (SLO), service level agreements (SLA), metrics reporting, and progress tracking.
- Be on-call to respond to and manage incidents effectively.
- Implement observability practices including alarms, monitoring, and synthetics, along with error management strategies.
**Qualifications:**
- Bachelor's degree in Computer Science or a related engineering field.
- Minimum of 8 years of experience in the IT industry.
- Strong proficiency in:
  - Java, Spring Boot, Node.js, microservices, RDBMS, NoSQL
  - AWS services: EC2, S3, Lambda, IAM, ECS, EKS, SQS, Kinesis
  - Observability tools like Splunk and New Relic
  - Infrastructure as Code using Terraform
  - APIs and event-driven approaches
  - Security patterns
  - Unix/Linux systems administration; familiarity with Docker is essential.
- Extensive experience in analyzing and troubleshooting large-scale distributed systems, with a quick response to high-severity customer impacts.
- Ability to debug, optimize code, and automate routine tasks.
- Proficient in modern software engineering practices and tools such as Agile and DevOps.
- Excellent communication skills and the ability to simplify complex technical concepts for easy understanding.