Data Engineer (PYSPARK)

LONG FINCH TECHNOLOGIES

Mississauga · On-site · Full-time · 4w ago

About the role

Python & PySpark Data Engineer Overview

We are looking for a Data Engineer with strong expertise in Python and PySpark to design, build, and optimize scalable data pipelines. You will work with large datasets, distributed systems, and cloud platforms to enable data-driven decision-making.

Key Responsibilities

1. Data Pipeline Development

  • Design and build ETL/ELT pipelines using Python and PySpark
  • Process large-scale structured and unstructured data
  • Ensure high performance and reliability of data workflows

2. Big Data Processing

  • Use Apache Spark (especially PySpark) for distributed data processing
  • Optimize Spark jobs (partitioning, caching, joins, etc.)
  • Handle batch and near real-time data processing

3. Data Integration

  • Ingest data from multiple sources: APIs, databases, flat files, streaming systems
  • Work with tools like Apache Kafka for real-time pipelines
  • Ensure data consistency and integrity

4. Data Modeling & Storage

  • Design scalable data models (star/snowflake schemas)
  • Work with:
    • Data lakes (e.g., Amazon S3)
    • Data warehouses (e.g., Snowflake, Amazon Redshift)

5. Performance Optimization

  • Tune SQL queries and Spark jobs
  • Optimize memory usage and job execution time
  • Implement efficient partitioning and indexing strategies

6. Cloud & DevOps

  • Work on cloud platforms like:
    • Amazon Web Services
    • Microsoft Azure
    • Google Cloud Platform
  • Build CI/CD pipelines for data workflows
  • Use containerization tools like Docker

7. Data Quality & Governance

  • Implement validation checks and monitoring
  • Ensure data accuracy, lineage, and governance
  • Work with logging and alerting systems

Required Skills

Core Technical Skills

  • Strong programming in Python
  • Expertise in PySpark / Apache Spark
  • Advanced SQL knowledge
  • Experience with distributed computing

Big Data & Tools

  • Hands-on with:
    • Hadoop ecosystem
    • Apache Hive
    • Apache Airflow

Data Engineering Concepts

  • ETL/ELT design
  • Data warehousing & modeling
  • Batch vs streaming architectures

Cloud & Storage

  • Experience with cloud data services (S3, BigQuery, ADLS, etc.)
  • Understanding of data lake architecture

Preferred / Nice-to-Have Skills

  • Real-time processing (Kafka, Spark Streaming)
  • Knowledge of Delta Lake or Apache Iceberg
  • Experience with Databricks
  • Basic understanding of machine learning pipelines
  • Familiarity with DevOps tools (CI/CD, Terraform)

Skills

Apache Airflow, Apache Hive, Apache Kafka, Apache Spark, Amazon Redshift, Amazon S3, BigQuery, Cloud, Data Lakes, Data Warehousing, Databricks, Delta Lake, Docker, ETL, Google Cloud Platform, Hadoop, Microsoft Azure, Python, PySpark, SQL, Snowflake
