
Senior DevOps Engineer

Total Wine & More

On-site | Full-time | Senior | $122k – $165k/yr | Posted 3w ago

About the role

Total Wine & More is seeking a Senior DevOps Engineer to join our Technology team in Bethesda, MD or Boca Raton, FL. In this role, you will run Kubernetes on AKS and GKE with a strong focus on security and reliability. You'll use Argo Workflows to orchestrate data pipelines and scheduled jobs. You'll build, maintain, and improve CI/CD with Jenkins and GitHub Actions. You'll grow Backstage into a practical internal developer platform that empowers engineers with golden paths, reusable templates, and self‑service options. You'll operate Kafka, CockroachDB (Postgres‑compatible), Couchbase, and Elasticsearch for performance, resilience, and cost‑efficiency. You'll help lead observability with Prometheus for metrics, Grafana for dashboards, and Tempo for tracing. You'll collaborate with teams building in C#/.NET, Go, and Node.js, and help drive our AI initiative by identifying practical use cases and integrating tools that deliver measurable impact. You will report to the Sr. Manager, Platform Engineering.

Responsibilities

  • Own multi‑cloud Kubernetes platforms on Azure AKS and Google Cloud GKE—design cluster topology, networking, RBAC, and policies for secure, scalable, cost‑efficient operation.
  • Build and evolve the Internal Developer Platform with Backstage—service catalog, golden‑path templates, scorecards, and self‑service scaffolding to standardize app onboarding.
  • Engineer CI/CD at scale using Jenkins and GitHub Actions—pipelines‑as‑code, environment promotion, secrets management, and artifact provenance.
  • Orchestrate batch/data workflows with Argo Workflows (not CI/CD)—multi‑tenant DAGs, resource quotas, artifact/versioning strategy, and guardrails.
  • Operate and tune stateful services—Kafka, CockroachDB (Postgres), Couchbase, Elasticsearch—including capacity planning, replication, backup/restore, and DR.
  • Establish end‑to‑end observability—Prometheus metrics, Grafana dashboards, Grafana Tempo tracing, SLOs/error budgets, actionable alerting, and on‑call runbooks.
  • Build platform tooling & automation in Go, Node.js, and C#/.NET—CLIs, controllers/operators, APIs, and integrations that improve developer experience.
  • Drive security, compliance, and reliability practices—image/signing & SBOMs, secrets management, network policies, least privilege, cost monitoring, incident response, and postmortems.

Requirements

  • 5‑8 years of experience preferred
  • Multi‑cloud Kubernetes experience (AKS/GKE)
  • CI/CD (Jenkins, GitHub Actions)
  • Backstage experience
  • Argo Workflows
  • Distributed data ops: Kafka, CockroachDB, Couchbase, Elasticsearch—tuning, backup/restore, DR
  • Observability: Prometheus, Grafana, Tempo
  • Platform development: Go, Node.js, C#/.NET

Benefits

  • Paid Time Off (PTO)
  • Generous store discounts
  • Health care plans (medical, prescription, dental, vision)
  • 401(k), HSA, FSA, Pre‑tax commuter benefits
  • Disability & life insurance coverage
  • Paid parental leave
  • Pet insurance
  • Critical illness and accident insurance
  • Discounted home and auto insurance
  • College tuition assistance
  • Career development & product training
  • Consumer classes
  • & More!

About Total Wine & More

Total Wine & More is the country's largest independent retailer of fine wine, beer and spirits, and we continue to grow our footprint year over year. Total Wine offers exciting and unique career opportunities across the country and in our corporate office. Our strength is our people. We have a commitment to training and career growth, all in an environment that values new ideas and teamwork. If you share our entrepreneurial spirit and a passion for providing best‑in‑class customer experience, take a moment to apply or learn more at https://careers.totalwine.com/!

Compensation

Pay Range: $122,200 – $165,000 Annually

Total Wine & More considers several factors when establishing compensation. Estimated salaries determined by third parties have not been validated by Total Wine & More. Compensation may vary based on a number of factors including, but not limited to, market location, job‑related knowledge, skills and/or experience.

Equal Opportunity

Total Wine & More is an equal opportunity employer and all qualified applicants will receive consideration for employment without discrimination based on race, color, religion, national origin, sex, sexual orientation, age, marital status, veteran status, disability, or any other characteristic protected by applicable law. Total Wine & More makes reasonable accommodations during all aspects of the employment process, including during the interview process. Total Wine & More is a Drug Free Workplace.

The information provided above indicates the general nature and level of work required of the position and is not a comprehensive list of all responsibilities or qualifications. The benefits list highlights only some of the benefits offered to team members; eligibility requirements apply to certain benefits.

Skills

AKS, Argo Workflows, Backstage, C#, .NET, CockroachDB, Couchbase, Elasticsearch, GKE, GitHub Actions, Go, Grafana, Jenkins, Kafka, Kubernetes, Node.js, Prometheus, PostgreSQL, Tempo
