DevOps Engineer - Google Cloud Platform
SYEDYN SOLUTIONS PRIVATE LIMITED
About the role
This role calls for strong production-grade Python development (version 3.8+); hands-on experience building and deploying serverless applications on Google Cloud Platform (particularly Cloud Run, Cloud Functions, Pub/Sub, Eventarc, IAM, and Secret Manager); practical expertise in implementing monitoring, alerting, distributed tracing, and event-driven observability; solid application-layer interaction with Oracle databases (efficient querying, connection pooling and retry logic, basic PL/SQL invocation for remediation tasks, transaction management, and performance-aware access patterns); and familiarity with event-driven architectures, API integrations, containerization, and modern CI/CD practices.
The primary focus of the role is to design and implement intelligent, serverless automation on GCP that responds to Instana observability signals. This includes developing automated remediation workflows, triggered directly from Instana alerts, for common Oracle database issues such as high tablespace usage, excessive session counts, low ratios of DB CPU time to DB time, and related performance degradations. Beyond these Oracle-specific remediations, further remediation capabilities will be developed as requirements emerge from project discovery.
Key Responsibilities
- Develop and maintain Cloud Run services and Cloud Functions in GCP to execute automated remediation tasks
- Integrate these serverless components with the Instana observability platform to trigger remediation actions based on alerts, anomalies, and performance incidents (via webhooks, custom events, or payload processing)
- Build Python‑based remediation logic specifically targeting common Oracle database issues detectable through Instana, implementing safe, controlled corrective measures (e.g., cleanup procedures, session management, advisory executions) from the application layer
- Develop Python‑based ingestion pipelines to import relevant metrics, events, and topology data from SolarWinds and Turbonomic into Instana, enabling a consolidated, unified dashboard view across these platforms
- Extend Instana‑based triggering and remediation logic to incorporate signals and enriched context from SolarWinds and Turbonomic data sources, allowing automated responses to issues surfaced through the combined observability dataset
- Design and implement additional remediation workflows for other systems or components as determined during project discovery phases
- Ensure reliable application‑side interactions with Oracle databases and other relevant systems to support remediation execution
- Enhance observability capabilities, including custom metrics, tracing, and logging, to validate remediation effectiveness and prevent recurrence
- Collaborate with observability, platform, and database teams to define, test, and refine automation playbooks for high‑reliability remediation in production
Highly Desirable
- Direct experience with the Instana observability platform, including configuration of health rules, alert payloads, webhook integrations, custom event handling, or custom data source ingestion
- Prior development of automated remediation or self‑healing workflows using observability signals (particularly for databases or infrastructure components)
- Basic knowledge of the SolarWinds, Instana, and Turbonomic platforms
- Experience integrating or ingesting data from SolarWinds, Turbonomic, or similar IT operations management / resource optimization platforms
Nice to Have
- Experience with asynchronous Python in serverless contexts (asyncio, FastAPI, concurrent.futures)
- Knowledge of additional GCP eventing and orchestration tools for reliable, auditable automation
- Familiarity with Instana's custom data ingestion mechanisms (e.g., OpenTelemetry, custom agents, or API‑based ingestion)
Requirements
- Strong production-grade Python development (version 3.8+)
- Hands-on experience building and deploying serverless applications on Google Cloud Platform (particularly Cloud Run, Cloud Functions, Pub/Sub, Eventarc, IAM, and Secret Manager)
- Practical expertise in implementing monitoring, alerting, distributed tracing, and event-driven observability
- Solid application-layer interaction with Oracle databases (efficient querying, connection pooling and retry logic, basic PL/SQL invocation for remediation tasks, transaction management, and performance-aware access patterns)
- Familiarity with event-driven architectures, API integrations, containerization, and modern CI/CD practices