Data Engineer

Varite

Malvern · On-site · Full-time · Lead · $64–$66/hr · Posted 1 week ago

About the role

We are seeking an experienced Tech Lead Data Engineer (15+ years) with a strong background in Java, AWS, Python, PySpark, and event-driven architectures. You will design and build scalable batch and streaming data pipelines, optimize cloud data platforms, and deliver high-quality, reliable datasets that support analytics, reporting, and machine learning workloads.

Key Responsibilities

  • Architect, build, and maintain event-driven data pipelines using AWS services such as Kinesis, MSK/Kafka, Lambda, Step Functions, SQS/SNS, and Glue/EMR.
  • Develop ETL/ELT workflows using Python and PySpark, ensuring performance, scalability, and cost efficiency (see the first sketch after this list).
  • Implement and optimize Spark-based data transformations, partitioning strategies, and data processing frameworks.
  • Design and manage data lake and warehouse structures using S3, Glue Catalog, Athena, and/or Redshift.
  • Build streaming solutions with checkpointing, stateful transformations, idempotency, and schema evolution (see the streaming sketch after this list).
  • Ensure high standards of data quality, observability, monitoring, and alerting (CloudWatch, Datadog, etc.).
  • Implement data security best practices including IAM, encryption (KMS), networking, and governance.
  • Create reusable frameworks, internal libraries, and CI/CD pipelines for automated deployments.
  • Collaborate with data scientists, analysts, and business teams to deliver well-modeled, reliable datasets.
  • Lead design reviews, mentor junior engineers, and contribute to engineering best practices.
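
For illustration only, here is a minimal PySpark batch ETL sketch in the spirit of the workflow and partitioning items above; it is not the employer's pipeline, and the bucket paths, dataset, and column names are hypothetical placeholders.

# Minimal batch ETL sketch: read raw JSON from S3, clean it, and write
# Parquet partitioned by event date so Athena/Glue can prune partitions.
# All paths and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

raw = spark.read.json("s3://example-raw-bucket/orders/")

cleaned = (
    raw
    .filter(F.col("order_id").isNotNull())            # drop malformed records
    .withColumn("event_date", F.to_date("event_ts"))  # derive the partition column
    .dropDuplicates(["order_id"])                     # coarse dedup for idempotent reruns
)

(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/orders/")
)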
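
And a minimal PySpark Structured Streaming sketch for the checkpointing and idempotency item, assuming a hypothetical Kafka/MSK topic and S3 sink (the spark-sql-kafka connector must be on the classpath); bounded deduplication under a watermark is one common route to idempotent delivery.

# Minimal streaming sketch: consume a Kafka/MSK topic, deduplicate within
# a watermark, and write to S3 with a checkpoint so restarts resume cleanly.
# Broker address, topic, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "orders")
    .load()
    .select(
        F.col("value").cast("string").alias("body"),
        F.col("timestamp").alias("event_ts"),
    )
)

deduped = (
    events
    .withWatermark("event_ts", "10 minutes")  # bound the dedup state
    .dropDuplicates(["body", "event_ts"])     # idempotency within the watermark
)

query = (
    deduped.writeStream
    .format("parquet")
    .option("path", "s3://example-stream-bucket/orders/")
    .option("checkpointLocation", "s3://example-stream-bucket/_checkpoints/orders/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()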

Required Qualifications

  • 15+ years of professional experience in Data Engineering.
  • Strong expertise in Python and PySpark for large-scale data processing.
  • Advanced hands-on experience with AWS (S3, Glue, EMR, Lambda, Step Functions, Kinesis/MSK, DynamoDB, Athena, Redshift).
  • Deep experience building event-driven and streaming data pipelines.
  • Strong SQL experience for analytical and ETL workloads.
  • Hands-on experience with workflow orchestration tools such as Airflow or Step Functions (see the DAG sketch after this list).
  • Experience with CI/CD, Git, and Infrastructure-as-Code (Terraform or CloudFormation).
  • Strong understanding of distributed systems, Spark performance tuning, data modeling, and cloud cost optimization.
  • Knowledge of data security, encryption, networking, and compliance best practices in cloud environments.
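
As a rough sketch of the orchestration qualification above, a minimal Airflow 2.x DAG in Python. The DAG id, schedule, and callable are hypothetical; in practice the task would typically submit the Spark job to EMR or Glue (for example via boto3) rather than run it inside the Airflow worker.

# Minimal Airflow DAG sketch: one daily task with retries. Everything
# here (DAG id, schedule, callable) is a hypothetical placeholder.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_orders_etl(**context):
    # In a real pipeline this would submit the PySpark job to EMR/Glue,
    # not run Spark inside the worker process.
    print(f"Running orders ETL for {context['ds']}")

with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow >= 2.4 syntax
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    PythonOperator(
        task_id="run_orders_etl",
        python_callable=run_orders_etl,
    )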

Soft Skills

  • Strong design and architectural understanding.
  • Excellent communication and stakeholder-interaction skills.
  • Ability to work in a globally distributed team.

Skills

AWS · Cloud Computing · Data Architecture · Event-Driven Pipelines · Git · Glue · IAM · Java · Kinesis · Lambda · MSK/Kafka · Python · PySpark · Redshift · S3 · Spark · Step Functions · Terraform · CloudFormation
