Platform Engineer - Big Data
Noblesoft Technologies
About the role
Role
Senior Platform Engineer - Big Data (AWS | EMR | EKS)
Location
- Rockville, MD
- Tysons Corner, VA
- Woodbridge, NJ
- Jersey City, NJ
3 days onsite per week
Duration
6 months (with possibility of long‑term extensions)
Notes
- Build and modernize a large‑scale AWS big data platform (EMR, S3, Athena, Trino) supporting enterprise analytics
- Help drive platform evolution toward cloud‑native, containerized workloads on AWS EKS (Kubernetes)
- Work at the intersection of software engineering, big data, and platform engineering — not ETL‑only
- Design and operate Spark‑based data workloads, optimizing performance, reliability, and cost
- Implement CI/CD and Infrastructure as Code (Terraform / CloudFormation) for data platforms
- Ideal for engineers with a strong backend or platform background who’ve grown into big data
Overview
We are seeking a Senior Platform Engineer with deep Big Data experience to help design, operate, and modernize a large‑scale data platform on AWS. This role goes beyond traditional ETL or pipeline development — it is focused on building and evolving the underlying data platform that supports analytics, reporting, and future AI/ML use cases.
The current environment is built primarily on AWS EMR and S3, with a strong query layer using Athena and Trino. The team is actively modernizing the platform and evaluating AWS EKS (Kubernetes) as part of a shift toward more cloud‑native, containerized data workloads.
This role is ideal for an engineer with a software or platform engineering background who has moved into big data, rather than for a pure ETL developer.
Key Responsibilities
- Design, build, and operate scalable big data platforms on AWS, with S3 as the core data lake.
- Develop and optimize Spark‑based workloads on EMR, including performance tuning and cost optimization.
- Support and enhance federated query engines such as Athena and Trino for large‑scale analytics.
- Contribute to the modernization of the data platform, including evaluation and adoption of Kubernetes/EKS for data services and workloads.
- Build and operate data services and platform components using containerized deployments (Docker + EKS).
- Implement and maintain Infrastructure as Code using Terraform and/or CloudFormation.
- Design and support CI/CD pipelines for data and platform workloads.
- Partner with data engineers, analytics teams, and stakeholders to ensure the platform is reliable, performant, and extensible.
- Monitor and troubleshoot platform issues across clusters, pipelines, and query engines using CloudWatch and related tooling.
- Continuously evaluate new technologies and propose improvements to the overall data architecture.
Required Qualifications
- 8+ years of experience in Big Data, Platform Engineering, or Data Engineering roles.
- Strong hands‑on experience with AWS, including:
  - EMR
  - S3
  - Athena
  - AWS Glue / Glue Data Catalog
- Solid experience with Spark (PySpark or Scala) and distributed data processing.
- Strong SQL skills, particularly with large datasets (Athena, Trino, Presto, etc.).
- Experience with Docker and containerized applications.
- Working knowledge of Kubernetes, with exposure to AWS EKS strongly preferred.
- Experience implementing CI/CD pipelines (Jenkins, GitHub Actions, or similar).
- Infrastructure as Code experience using Terraform and/or CloudFormation.
- Strong scripting and programming skills (Python preferred).
- Ability to think at a platform and architecture level, not just task execution.
Nice to Have
- Experience running Spark on Kubernetes (EKS).
- Trino/Presto performance tuning experience.
- Experience preparing data platforms for AI/ML workloads.
- Observability tooling experience (CloudWatch, Grafana, Prometheus).
- Background as a software engineer before moving into big data.