
Platform Engineer - Big Data

Noblesoft Technologies

Hybrid · Senior

About the role

Role

Senior Platform Engineer - Big Data (AWS | EMR | EKS)

Location

  • Rockville, MD
  • Tysons Corner, VA
  • Woodbridge, NJ
  • Jersey City, NJ

3 days onsite per week

Duration

6 months (long-term extensions possible)

Notes

  • Build and modernize a large‑scale AWS big data platform (EMR, S3, Athena, Trino) supporting enterprise analytics
  • Help drive platform evolution toward cloud‑native, containerized workloads on AWS EKS (Kubernetes)
  • Work at the intersection of software engineering, big data, and platform engineering — not ETL‑only
  • Design and operate Spark‑based data workloads, optimizing performance, reliability, and cost
  • Implement CI/CD and Infrastructure as Code (Terraform / CloudFormation) for data platforms
  • Ideal for engineers with a strong backend or platform background who’ve grown into big data

Overview

We are seeking a Senior Platform Engineer with deep Big Data experience to help design, operate, and modernize a large‑scale data platform on AWS. This role goes beyond traditional ETL or pipeline development — it is focused on building and evolving the underlying data platform that supports analytics, reporting, and future AI/ML use cases.

The current environment is built primarily on AWS EMR and S3, with a strong query layer using Athena and Trino. The team is actively modernizing the platform and evaluating AWS EKS (Kubernetes) as part of a shift toward more cloud‑native, containerized data workloads.

This role is ideal for an engineer with a software or platform engineering background who has moved into big data, rather than for a pure ETL developer.

Key Responsibilities

  • Design, build, and operate scalable big data platforms on AWS, with S3 as the core data lake.
  • Develop and optimize Spark‑based workloads on EMR, including performance tuning and cost optimization.
  • Support and enhance federated query engines such as Athena and Trino for large‑scale analytics.
  • Contribute to the modernization of the data platform, including evaluation and adoption of Kubernetes/EKS for data services and workloads.
  • Build and operate data services and platform components using containerized deployments (Docker + EKS).
  • Implement and maintain Infrastructure as Code using Terraform and/or CloudFormation.
  • Design and support CI/CD pipelines for data and platform workloads.
  • Partner with data engineers, analytics teams, and stakeholders to ensure the platform is reliable, performant, and extensible.
  • Monitor and troubleshoot platform issues across clusters, pipelines, and query engines using CloudWatch and related tooling.
  • Continuously evaluate new technologies and propose improvements to the overall data architecture.

Required Qualifications

  • 8+ years of experience in Big Data, Platform Engineering, or Data Engineering roles.
  • Strong hands‑on experience with AWS, including:
    • EMR
    • S3
    • Athena
    • AWS Glue / Glue Data Catalog
  • Solid experience with Spark (PySpark or Scala) and distributed data processing.
  • Strong SQL skills, particularly with large datasets (Athena, Trino, Presto, etc.).
  • Experience with Docker and containerized applications.
  • Working knowledge of Kubernetes, with exposure to AWS EKS strongly preferred.
  • Experience implementing CI/CD pipelines (Jenkins, GitHub Actions, or similar).
  • Infrastructure as Code experience using Terraform and/or CloudFormation.
  • Strong scripting and programming skills (Python preferred).
  • Ability to think at a platform and architecture level, not just task execution.

Nice to Have

  • Experience running Spark on Kubernetes (EKS).
  • Trino/Presto performance tuning experience.
  • Experience preparing data platforms for AI/ML workloads.
  • Observability tooling experience (CloudWatch, Grafana, Prometheus).
  • Background as a software engineer before moving into big data.

Skills

Athena, AWS, AWS CloudFormation, AWS Glue, AWS Glue Data Catalog, AWS EKS, AWS EMR, AWS S3, CloudWatch, Docker, EMR, Grafana, Hadoop, Infrastructure as Code, Jenkins, Kubernetes, Prometheus, Python, PySpark, S3, Scala, Spark, Terraform, Trino
