System Operation Engineer (SOE)

Uplers

India · On-site · Full-time · Mid Level · ₹2400k – ₹2500k/yr · Today

About the role

Position Details

  • Experience: 3+ years
  • Salary: INR 2,400,000‑2,500,000 / year (based on experience)
  • Expected Notice Period: 30 Days
  • Shift: (GMT+05:30) Asia/Kolkata (IST)
  • Opportunity Type: Office
  • Placement Type: Full Time Permanent position (Payroll and Compliance to be managed by: A cloud-based field service management SaaS platform)

Note: This is a requirement for one of Uplers' clients – a cloud‑based field service management SaaS platform.

Must-Have Skills

  • Cloud Build
  • GitLab CI
  • Infrastructure as Code
  • Cloud Infrastructure Management
  • Container & Orchestration Operations
  • Linux System Administration
  • Monitoring & Observability
  • Jenkins

Role

System Operation Engineer (SOE)

Role Overview

We are looking for a skilled and proactive System Operation Engineer (SOE) to join our Infrastructure & Operations team. In this role, you will be responsible for maintaining the stability, scalability, and security of our production systems across on‑premise and cloud environments. You will work closely with development, DevOps, and platform teams to streamline operational workflows, automate repetitive tasks, and ensure maximum system availability. The ideal candidate brings a strong Linux foundation, hands‑on cloud experience, a deep understanding of containerisation technologies, and a passion for building reliable, automated infrastructure.

Key Responsibilities

  • Linux System Administration – Manage, configure, and maintain Linux‑based servers (RHEL / CentOS / RockyLinux / Ubuntu). Perform OS hardening, patch management, performance tuning, and capacity planning.
  • Cloud Infrastructure Management – Provision, manage, and optimise cloud resources primarily on GCP (or equivalent AWS/Azure). Oversee VPCs, IAM, compute, storage, networking, and billing hygiene.
  • Container & Orchestration Operations – Deploy, manage, and monitor Docker containerised workloads on Kubernetes clusters (GKE/EKS). Handle cluster upgrades, pod scheduling, resource limits, and service mesh configuration.
  • Monitoring & Observability – Set up and maintain monitoring, alerting, and dashboards using tools such as Prometheus, Grafana, Datadog, CloudWatch, or Stackdriver. Define SLOs/SLAs and create actionable runbooks.
  • Infrastructure as Code & Automation – Write and maintain Terraform modules and Ansible playbooks for automated provisioning, configuration management, and drift remediation.
  • CI/CD Pipeline Support – Support and maintain CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, Cloud Build). Collaborate with Dev teams to streamline build, test, and deployment workflows.
  • On‑Call Rotation – Participate in on‑call rotations. Lead incident triage, root‑cause analysis, and post‑mortem documentation. Drive mean‑time‑to‑recovery (MTTR) improvements.
  • Security & Compliance – Ensure systems adhere to security best practices — vulnerability scanning, certificate management, secrets rotation, and compliance with internal/external audit requirements.

Required Skills & Qualifications

Linux Administration (Strong)

  • Deep expertise in RHEL / CentOS / RockyLinux / Ubuntu
  • Process, memory & I/O performance tuning
  • File systems, LVM, disk management
  • Shell scripting (Bash / Python)
  • User management, ACLs & PAM
  • Systemd, cron, networking (iptables / nftables)
  • SSL/TLS certificate lifecycle management

Cloud Engineering (Strong)

  • GCP preferred (Compute, GKE, Cloud SQL, GCS)
  • AWS / Azure experience also valued
  • VPC design, subnets, firewall rules
  • IAM roles, service accounts & policies
  • Cloud‑native monitoring & logging
  • Cost optimisation & resource tagging

Docker / Kubernetes (Strong)

  • Docker image build, registry, security scanning
  • Kubernetes cluster operations (GKE / EKS / self‑hosted)
  • Helm chart creation & management
  • Namespaces, RBAC, network policies
  • HPA / VPA / Cluster Autoscaler
  • StatefulSets, PVCs, storage classes

Monitoring Systems (Strong)

  • Prometheus + Alertmanager + Grafana
  • ELK / EFK stack for log aggregation
  • Datadog / New Relic / Dynatrace
  • Cloud‑native: Stackdriver / CloudWatch
  • Synthetic monitoring & uptime checks
  • SLO / SLA dashboards & PagerDuty integration

Good To Have Skills

  • Ansible / Terraform – Infrastructure as Code, configuration management & provisioning automation
  • CI/CD Pipelines – Jenkins, GitLab CI, GitHub Actions, Cloud Build (2+ years DevOps exposure preferred)
  • Scripting Languages – Python, Bash, Go (for automation, tooling, and integration scripts)
  • Database Operations – PostgreSQL, MySQL, Redis (basic DBA tasks, backups, and replication monitoring)

Core Technology Stack

Linux, GCP, Datadog, AWS, Azure, New Relic, Docker, Cloud Build, Kubernetes, Istio, Helm, Redis, Terraform, PostgreSQL

Soft Skills & Professional Attributes

  • Problem‑Solving Mindset: Ability to diagnose complex system issues under pressure, think analytically, and develop long‑term preventive solutions.
  • Communication Skills: Articulate technical findings clearly to both technical peers and non‑technical stakeholders. Strong documentation habits.
  • Team Collaboration: Works effectively in cross‑functional teams — partnering with Dev, QA, Security, and Business teams.
  • Ownership & Accountability: Takes end‑to‑end ownership of assigned systems and services. Self‑driven with high accountability.
  • Continuous Learning: Stays current with emerging tools, cloud services, and infrastructure best practices. Proactively upskills.
  • Agile Adaptability: Comfortable working in Agile/Scrum environments with sprint planning, retrospectives, and iterative delivery.

How to Apply

  1. Click Apply! and register or log in on our portal.
  2. Complete the screening form and upload an updated resume.
  3. Increase your chances of getting shortlisted and meeting the client for the interview!

About Uplers

Our goal is to make hiring reliable, simple, and fast. Our role is to help all our talent find and apply for relevant contractual onsite opportunities and progress in their careers. We will support you with any grievances or challenges you may face during the engagement.

(Note: There are many more opportunities apart from this on the portal. Depending on the assessments you clear, you can apply for them as well.)

So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!

Skills

Ansible, Bash, Cloud Build, Cloud Infrastructure Management, Container & Orchestration Operations, Datadog, Docker, GCP, GitLab CI, Go, Grafana, Helm, Infrastructure as Code, Istio, Jenkins, Kubernetes, Linux System Administration, Monitoring & Observability, New Relic, PostgreSQL, Prometheus, Python, Redis, RHEL, Terraform
