
Data & Software Engineer

VTG

McLean · On-site · Full-time · Today

About the role

Overview

We are seeking a Data & Software Engineer to work with a small team building complex data flows for a custom application. The successful candidate will have advanced Python programming skills, familiarity with Java, an understanding of data security, privacy, governance, and compliance principles, and a demonstrated history of building production data pipelines and ETL workflows at scale. The candidate must have experience with the following:

What will you do?

  • Building end-to-end data pipelines in Python
  • Using orchestration tools to deploy data pipelines, including configuring and updating Spark jobs
  • Containerizing and deploying applications in cloud environments such as AWS
  • Working with MySQL and PostgreSQL, including performance tuning, schema design, and query optimization for complex analytical workloads
  • Using industry-standard tools for source control and infrastructure as code (Git, IaC tooling, etc.)
  • Working with data catalogs, tracking data lineage, and handling a variety of data formats, including geospatial
  • Using Bash scripting for automation and data-processing tasks
  • Integrating AI/ML services and models
  • Working with stakeholders to understand data requirements, assess feasibility, and design appropriate solutions with minimal oversight
  • Applying strong problem-solving and debugging skills to data-quality issues, pipeline failures, and performance bottlenecks
  • Drawing on a background in large-scale data migration or platform modernization efforts
  • Contributing to data engineering documentation, best practices, and design patterns
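For illustration only, the extract-transform-load work these bullets describe can be sketched in miniature with the Python standard library (SQLite and an in-memory CSV stand in for a production warehouse and an S3 feed; all names here are invented for the sketch, not part of the role):

```python
import csv
import io
import sqlite3

# Extract: read raw records from an in-memory CSV (standing in for an
# S3 object or upstream feed). Row 3 has a missing amount on purpose.
RAW = "order_id,amount\n1,19.99\n2,5.00\n3,\n"

def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop malformed rows and normalize types -- the kind of
    data-quality handling the role calls out."""
    clean = []
    for row in rows:
        if row["amount"]:
            clean.append((int(row["order_id"]), float(row["amount"])))
    return clean

def load(records, conn):
    """Write cleaned records to a relational store
    (SQLite standing in for PostgreSQL/MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW)), conn)
    n, total = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
    print(n, round(total, 2))  # 2 24.99
```

In production this shape would be scaled out with PySpark transformations, scheduled by an orchestrator such as Airflow, and deployed in containers.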

Do you have what it takes?

  • Active TS/SCI with polygraph required.
  • Bachelor's degree in Computer Science, Engineering, Finance, or a related technical field, or equivalent practical experience.
  • Minimum of 5 years' experience with:
    • Apache Spark & PySpark
    • Advanced Python skills (including Pandas & NumPy)
    • Docker, Podman
    • AWS S3, Lambda & Step Functions
    • Apache Iceberg, Airflow, etc.
    • SQL (with Trino)
    • NoSQL, DynamoDB
    • Unity Catalog OSS, Apache Polaris
    • Apache Superset
    • Terraform or CloudFormation
    • OpenLineage
    • H3, PostGIS




Skills

AWS Lambda, AWS S3, AWS Step Functions, Airflow, AI/ML, Apache Iceberg, Apache Polaris, Apache Spark, Apache Superset, Bash, CloudFormation, Docker, DynamoDB, Git, H3, IaC, Java, MySQL, NoSQL, OpenLineage, Pandas, PostGIS, PostgreSQL, Podman, Python, PySpark, SQL, Terraform, Trino, Unity Catalog OSS
