DevOps Engineer / Site Reliability Engineer - Remote working (Permanent)
Uplers
About the role
Experience: 2+ years
Salary : INR 2500000-3500000 / year (based on experience)
Shift : (GMT+05:30) Asia/Kolkata (IST)
Opportunity Type : Remote
Placement Type : Full Time Permanent position (Payroll and Compliance to be managed by: Lyzr)
Skills : FinOps, AWS, Python, PowerShell, Bash, DevOps, SRE, System Engineering
At Lyzr AI, this role sits at the heart of platform reliability and scale. You will own the availability, security, and performance of mission-critical AI systems powering our customers, ensuring they run flawlessly at all times. Acting as the final escalation point, you'll blend deep technical expertise with SRE principles to build resilient, automated, and cost-efficient cloud infrastructure.
Troubleshoot and resolve complex issues across infrastructure, application code, and networking layers.
Incident Management: Lead Root Cause Analysis (RCA) processes for outages, driving permanent fixes and architectural changes to prevent recurrence.
Maintain strict IAM policies and security groups.
Build and maintain comprehensive monitoring, logging, and alerting frameworks (CloudWatch, Prometheus, Datadog) to ensure early detection of anomalies.
Define and maintain backup/restore processes and routine maintenance windows with minimal downtime.
Eliminate Toil: Apply SRE principles to automate repetitive operational tasks, reducing manual intervention.
Develop automation tools and manage infrastructure using Terraform or CloudFormation, along with scripting in Python, Go, or Bash.
Performance Tuning: Optimize application runtime parameters, database queries, and system kernel settings for maximum throughput.
Cloud & Cost Optimization (FinOps)
AWS Management: Architect and manage extensive AWS services—EC2, EKS/ECS, RDS, S3, Lambda, VPC, and Route53.
Cost Efficiency: Actively monitor cloud spend and drive cost optimization initiatives, including rightsizing instances, managing Reserved/Spot instances, and identifying idle resources to reduce waste.
Capacity Planning: Collaborate with engineering teams to forecast infrastructure needs, ensuring we scale to meet demand without over-provisioning.
Experience: 2-5 years in SRE, DevOps, or Systems Engineering roles with a strong focus on AWS.
Cloud Proficiency: Expert-level knowledge of AWS core services and architecture standards.
Strong proficiency in Python or Shell/Bash for automation.
Cost Tools: Experience with AWS Cost Explorer, Trusted Advisor, or third-party tools (e.g., CloudHealth) to drive financial efficiency.
Monitoring: Hands-on experience with tools like Grafana, Prometheus, ELK Stack, or Splunk.
Experience in Hybrid Cloud environments (AWS + On-Prem/Data Center).
Knowledge of container orchestration (Kubernetes/EKS).
Understanding of database administration and replication (PostgreSQL, MySQL, or DynamoDB).
Step 3: Increase your chances to get shortlisted & meet the client for the Interview!
Our role will be to help all our talents find and apply for relevant contractual onsite opportunities and progress in their career. So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today.