Site Reliability Engineer

Uplers

Kowdoor · On-site Full-time 5d ago

About the role

Experience: 5+ years

Salary: Confidential (based on experience)

Expected Notice Period: 15 Days

Shift: (GMT+05:30) Asia/Kolkata (IST)

Opportunity Type: Hybrid

Placement Type: Full-time contract for 12 months (40 hrs a week/160 hrs a month)

(*Note: This is a requirement for one of Uplers' clients - SC)

What do you need for this opportunity?

Must have skills required:

EKS, Helm, automation, Cloud Security, Service Reliability Management, SRE principles, Terraform, AWS, GitHub

SC is looking for:

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our growing SRE team and play a critical role in shaping the future of our industry-leading Palette platform.

As an SRE, you will operate at the intersection of software engineering and operations, owning the reliability, scalability, and operational excellence of our runtime platforms. You will design and build automation-driven solutions that enable self-service infrastructure, improve system resilience, and ensure a world-class customer experience.

This role is ideal for engineers who thrive in ambiguity, enjoy solving complex problems incrementally, and are energized by building scalable systems in a fast-moving, collaborative environment.

Key Responsibilities

Technical Leadership & Culture

• Foster a culture that values technical excellence, accountability, collaboration, and empathy.
• Lead by example in operational rigor, engineering best practices, and continuous improvement.

Automation & Self-Service Enablement

• Design and implement automation, tools, and workflows that enable self-service provisioning and accelerate engineering velocity.
• Build scalable systems for environment configuration, container orchestration, and infrastructure lifecycle management.

Platform & Infrastructure Engineering

• Contribute to the design, configuration, and deployment of platform, networking, and cloud infrastructure.
• Actively manage and prioritize critical Kubernetes infrastructure initiatives.

Reliability, Resilience & Observability

• Enhance and automate failover capabilities and resiliency against fault conditions.
• Implement comprehensive testing, logging, monitoring, and alerting using ELK and related observability tooling.
• Ensure all systems are auditable, with automated mechanisms to supply compliance evidence.

Service Reliability Management

• Define, implement, and continuously refine Service Level Indicators (SLIs) and Service Level Objectives (SLOs) aligned with customer impact and business priorities.
• Act as Incident Commander for high-severity incidents, driving clear decision-making and effective cross-functional coordination.

Risk & Operations Management

• Proactively identify, assess, and mitigate operational risks.
• Balance reliability improvements with delivery speed and business objectives.

What Success Looks Like

You will excel in this role if you:

• Thrive in environments with evolving requirements.
• Break down complex challenges into iterative, measurable improvements.
• Embrace a test-and-learn mindset and continuously refine solutions based on outcomes.
• Demonstrate strong ownership, independence, and a bias toward action.
• Collaborate effectively across distributed teams and time zones.

Qualifications

We recognize that no candidate meets every requirement. The following qualifications help guide our assessment:

• 5+ years of experience delivering SRE-focused projects involving automation, systems administration, and operational excellence.
• Bachelor's degree in Computer Science or a related field (or equivalent practical experience).
• Strong understanding of cloud security best practices and compliance standards.
• Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform.
• Advanced experience with AWS, EKS, Helm, and Git (GitHub).
• Proven ability to define and implement AWS architectural best practices, including security, performance, reliability, and cost optimization.
• Familiarity with SRE principles; relevant certifications (e.g., SRE Foundation) are a plus.
• Excellent written and verbal communication skills.
• Ability to manage multiple initiatives and respond effectively to escalations.
• Experience working with stakeholders across all levels, including executive leadership.
• Comfort collaborating with remote teams across the U.S. and internationally.

The Hiring Process

Our engineering interview process typically includes three to four stages:

• Initial screening interview
• Two to three technical interviews, including hands-on assessments
• Final round focused on team fit and deeper discussions

How to apply for this opportunity?

• Step 1: Click Apply and register or log in on our portal.
• Step 2: Complete the screening form and upload your updated resume.
• Step 3: Increase your chances of getting shortlisted and meeting the client for the interview!

About Uplers:

Our goal is to make hiring reliable, simple, and fast. Our role is to help all our talents find and apply for relevant contractual onsite opportunities and progress in their careers. We will support you with any grievances or challenges you may face during the engagement.

(Note: There are many more opportunities apart from this on the portal. Depending on the assessments you clear, you can apply for them as well).

So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!
