System Operation Engineer (SOE)

Uplers

India · On-site · Full-time · Mid Level · ₹2400k – ₹2500k/yr · Today

About the role

Position Details

Experience: 3+ years
Salary: INR 2,400,000‑2,500,000 / year (based on experience)
Expected Notice Period: 30 Days
Shift: (GMT+05:30) Asia/Kolkata (IST)
Opportunity Type: Office
Placement Type: Full-time permanent position (payroll and compliance to be managed by the client, a cloud-based field service management SaaS platform)

Note: This is a requirement for one of Uplers' clients, a cloud-based field service management SaaS platform.

Must Have Skills Required

Cloud Build
GitLab CI
Infrastructure as Code
Cloud Infrastructure Management
Container & Orchestration Operations
Linux System Administration
Monitoring & Observability
Jenkins

Role

System Operation Engineer (SOE)

ROLE OVERVIEW
We are looking for a skilled and proactive System Operation Engineer (SOE) to join our Infrastructure & Operations team. In this role, you will be responsible for maintaining the stability, scalability, and security of our production systems across on-premises and cloud environments. You will work closely with development, DevOps, and platform teams to streamline operational workflows, automate repetitive tasks, and maximise system availability. The ideal candidate brings a strong Linux foundation, hands-on cloud experience, a deep understanding of containerisation technologies, and a passion for building reliable, automated infrastructure.

Key Responsibilities

Linux System Administration – Manage, configure, and maintain Linux‑based servers (RHEL / CentOS / RockyLinux / Ubuntu). Perform OS hardening, patch management, performance tuning, and capacity planning.
Cloud Infrastructure Management – Provision, manage, and optimise cloud resources primarily on GCP (or equivalent AWS/Azure). Oversee VPCs, IAM, compute, storage, networking, and billing hygiene.
Container & Orchestration Operations – Deploy, manage, and monitor Docker containerised workloads on Kubernetes clusters (GKE/EKS). Handle cluster upgrades, pod scheduling, resource limits, and service mesh configuration.
Monitoring & Observability – Set up and maintain monitoring, alerting, and dashboards using tools such as Prometheus, Grafana, Datadog, CloudWatch, or Stackdriver. Define SLOs/SLAs and create actionable runbooks.
Infrastructure as Code & Automation – Write and maintain Terraform modules and Ansible playbooks for automated provisioning, configuration management, and drift remediation.
CI/CD Pipeline Support – Support and maintain CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, Cloud Build). Collaborate with Dev teams to streamline build, test, and deployment workflows.
On‑Call Rotation – Participate in on‑call rotations. Lead incident triage, root‑cause analysis, and post‑mortem documentation. Drive mean‑time‑to‑recovery (MTTR) improvements.
Security & Compliance – Ensure systems adhere to security best practices, including vulnerability scanning, certificate management, secrets rotation, and compliance with internal/external audit requirements.

Required Skills & Qualifications

Linux Administration

STRONG – Deep expertise in RHEL / CentOS / RockyLinux / Ubuntu
Process, memory & I/O performance tuning
File systems, LVM, disk management
Scripting (Bash / Python)
User management, ACLs & PAM
Systemd, cron, networking (iptables / nftables)
SSL/TLS certificate lifecycle management

Cloud Engineering

STRONG – GCP preferred (Compute, GKE, Cloud SQL, GCS)
AWS / Azure experience also valued
VPC design, subnets, firewall rules
IAM roles, service accounts & policies
Cloud‑native monitoring & logging
Cost optimisation & resource tagging

Docker / Kubernetes

STRONG – Docker image build, registry, security scanning
Kubernetes cluster operations (GKE / EKS / self‑hosted)
Helm chart creation & management
Namespaces, RBAC, network policies
HPA / VPA / Cluster Autoscaler
StatefulSets, PVCs, storage classes

Monitoring Systems

STRONG – Prometheus + Alertmanager + Grafana
ELK / EFK stack for log aggregation
Datadog / New Relic / Dynatrace
Cloud‑native: Stackdriver / CloudWatch
Synthetic monitoring & uptime checks
SLO / SLA dashboards & PagerDuty integration

Good To Have Skills

Ansible / Terraform – Infrastructure as Code, configuration management & provisioning automation
CI/CD Pipelines – Jenkins, GitLab CI, GitHub Actions, Cloud Build (2+ years DevOps exposure preferred)
Scripting Languages – Python, Bash, Go (for automation, tooling, and integration scripts)
Database Operations – PostgreSQL, MySQL, Redis (basic DBA tasks, backups, and replication monitoring)

Core Technology Stack

Linux, GCP, Datadog, AWS, Azure, New Relic, Docker, Cloud Build, Kubernetes, Istio, Helm, Redis, Terraform, PostgreSQL

Soft Skills & Professional Attributes

Problem‑Solving Mindset: Ability to diagnose complex system issues under pressure, think analytically, and develop long‑term preventive solutions.
Communication Skills: Articulate technical findings clearly to both technical peers and non‑technical stakeholders. Strong documentation habits.
Team Collaboration: Works effectively in cross-functional teams, partnering with Dev, QA, Security, and Business teams.
Ownership & Accountability: Takes end‑to‑end ownership of assigned systems and services. Self‑driven with high accountability.
Continuous Learning: Stays current with emerging tools, cloud services, and infrastructure best practices. Proactively upskills.
Agile Adaptability: Comfortable working in Agile/Scrum environments with sprint planning, retrospectives, and iterative delivery.

How to Apply

Step 1: Click Apply, then register or log in on our portal.
Step 2: Complete the screening form and upload an updated resume.
Step 3: Improve your chances of being shortlisted and meet the client for the interview.

About Uplers

Our goal is to make hiring reliable, simple, and fast. Our role is to help our talent find and apply for relevant contractual onsite opportunities and progress in their careers. We will support you with any grievances or challenges you may face during the engagement.

(Note: There are many more opportunities on the portal besides this one. Depending on the assessments you clear, you can apply for them as well.)

So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!

Skills

Ansible, Bash, Cloud Build, Cloud Infrastructure Management, Container & Orchestration Operations, Datadog, Docker, GCP, GitLab CI, Go, Grafana, Helm, Infrastructure as Code, Istio, Jenkins, Kubernetes, Linux System Administration, Monitoring & Observability, New Relic, PostgreSQL, Prometheus, Python, Redis, RHEL, Terraform
