Data & Software Engineer
VTG
McLean · On-site · Full-time · Today
About the role
We are seeking a Data & Software Engineer to work with a small team building complex data flows for a custom application. The successful candidate will have advanced Python programming skills, familiarity with Java, an understanding of data security, privacy, governance, and compliance principles, and a demonstrated history of building production data pipelines and ETL workflows at scale. Candidates must have experience in the areas listed below.
What will you do?
- Building end-to-end data pipelines leveraging Python
- Using orchestration tools to deploy data pipelines, including configuring and updating Spark jobs
- Containerizing and deploying applications in cloud environments such as AWS
- Working with MySQL and PostgreSQL, including performance tuning, schema design, and query optimization for complex analytical workloads
- Leveraging industry-standard tools for version control and infrastructure management (Git, IaC tooling, etc.)
- Working with data catalogs, tracking data lineage, and handling a variety of data formats, including geospatial
- Using Bash scripting for automation and data-processing tasks
- Integrating AI/ML services and models
- Working with stakeholders to understand data requirements, assess feasibility, and design appropriate solutions with minimal oversight
- Applying strong problem-solving and debugging skills to data quality issues, pipeline failures, and performance bottlenecks
- Drawing on a background in large-scale data migration or platform modernization efforts
- Contributing to data engineering documentation, best practices, and design patterns
Do you have what it takes?
- Active TS/SCI clearance with polygraph required
- Bachelor's degree in Computer Science, Engineering, Finance, or a related technical field, or equivalent practical experience
- Minimum of 5 years' experience with:
- Apache Spark & PySpark
- Advanced Python skills (including Pandas & NumPy)
- Docker, Podman
- AWS S3, Lambda & Step Functions
- Apache Iceberg, Airflow, etc.
- SQL (with Trino)
- NoSQL, DynamoDB
- Unity Catalog OSS, Apache Polaris
- Apache Superset
- Terraform or CloudFormation
- OpenLineage
- H3, PostGIS
Skills
AWS Lambda · AWS S3 · AWS Step Functions · Airflow · AI/ML · Apache Iceberg · Apache Polaris · Apache Spark · Apache Superset · Bash · CloudFormation · Docker · DynamoDB · Git · H3 · IaC · Java · MySQL · NoSQL · OpenLineage · Pandas · PostGIS · PostgreSQL · Podman · Python · PySpark · SQL · Terraform · Trino · Unity Catalog OSS