Data Engineer (PySpark)

LONG FINCH TECHNOLOGIES

Mississauga · On-site · Full-time · 4w ago

About the role

Python & PySpark Data Engineer Overview

We are looking for a Data Engineer with strong expertise in Python and PySpark to design, build, and optimize scalable data pipelines. You will work with large datasets, distributed systems, and cloud platforms to enable data-driven decision-making.

Key Responsibilities

1. Data Pipeline Development

Design and build ETL/ELT pipelines using Python and PySpark
Process large-scale structured and unstructured data
Ensure high performance and reliability of data workflows

2. Big Data Processing

Use Apache Spark (especially PySpark) for distributed data processing
Optimize Spark jobs (partitioning, caching, joins, etc.)
Handle batch and near real-time data processing

3. Data Integration

Ingest data from multiple sources: APIs, databases, flat files, streaming systems
Work with tools like Apache Kafka for real-time pipelines
Ensure data consistency and integrity

4. Data Modeling & Storage

Design scalable data models (star/snowflake schemas)
Work with:
- Data lakes (e.g., Amazon S3)
- Data warehouses (e.g., Snowflake, Amazon Redshift)

5. Performance Optimization

Tune SQL queries and Spark jobs
Optimize memory usage and job execution time
Implement efficient partitioning and indexing strategies

6. Cloud & DevOps

Work on cloud platforms like:
- Amazon Web Services
- Microsoft Azure
- Google Cloud Platform
Build CI/CD pipelines for data workflows
Use containerization tools like Docker

7. Data Quality & Governance

Implement validation checks and monitoring
Ensure data accuracy, lineage, and governance
Work with logging and alerting systems

Required Skills

Core Technical Skills

Strong programming in Python
Expertise in PySpark / Apache Spark
Advanced SQL knowledge
Experience with distributed computing

Big Data & Tools

Hands-on with:
- Hadoop ecosystem
- Apache Hive
- Apache Airflow

Data Engineering Concepts

ETL/ELT design
Data warehousing & modeling
Batch vs streaming architectures

Cloud & Storage

Experience with cloud data services (S3, BigQuery, ADLS, etc.)
Understanding of data lake architecture

Preferred / Nice-to-Have Skills

Real-time processing (Kafka, Spark Streaming)
Knowledge of Delta Lake or Apache Iceberg
Experience with Databricks
Basic understanding of machine learning pipelines
Familiarity with DevOps tools (CI/CD, Terraform)

Skills

Apache Airflow, Apache Hive, Apache Kafka, Apache Spark, Amazon Redshift, Amazon S3, BigQuery, Cloud, Data Lakes, Data Warehousing, Databricks, Delta Lake, Docker, ETL, Google Cloud Platform, Hadoop, Microsoft Azure, Python, PySpark, SQL, Snowflake
