Data Engineer (PySpark)
LONG FINCH TECHNOLOGIES
Mississauga · On-site · Full-time · 4w ago
About the role
Python & PySpark Data Engineer Overview
We are looking for a Data Engineer with strong expertise in Python and PySpark to design, build, and optimize scalable data pipelines. You will work with large datasets, distributed systems, and cloud platforms to enable data-driven decision-making.
Key Responsibilities
1. Data Pipeline Development
- Design and build ETL/ELT pipelines using Python and PySpark
- Process large-scale structured and unstructured data
- Ensure high performance and reliability of data workflows
2. Big Data Processing
- Use Apache Spark (especially PySpark) for distributed data processing
- Optimize Spark jobs (partitioning, caching, joins, etc.)
- Handle batch and near real-time data processing
3. Data Integration
- Ingest data from multiple sources: APIs, databases, flat files, streaming systems
- Work with tools like Apache Kafka for real-time pipelines
- Ensure data consistency and integrity
4. Data Modeling & Storage
- Design scalable data models (star/snowflake schemas)
- Work with:
  - Data lakes (e.g., Amazon S3)
  - Data warehouses (e.g., Snowflake, Amazon Redshift)
5. Performance Optimization
- Tune SQL queries and Spark jobs
- Optimize memory usage and job execution time
- Implement efficient partitioning and indexing strategies
6. Cloud & DevOps
- Work on cloud platforms like:
  - Amazon Web Services
  - Microsoft Azure
  - Google Cloud Platform
- Build CI/CD pipelines for data workflows
- Use containerization tools like Docker
7. Data Quality & Governance
- Implement validation checks and monitoring
- Ensure data accuracy, lineage, and governance
- Work with logging and alerting systems
Required Skills
Core Technical Skills
- Strong programming in Python
- Expertise in PySpark / Apache Spark
- Advanced SQL knowledge
- Experience with distributed computing
Big Data & Tools
- Hands-on with:
  - Hadoop ecosystem
  - Apache Hive
  - Apache Airflow
Data Engineering Concepts
- ETL/ELT design
- Data warehousing & modeling
- Batch vs streaming architectures
Cloud & Storage
- Experience with cloud data services (S3, BigQuery, ADLS, etc.)
- Understanding of data lake architecture
Preferred / Nice-to-Have Skills
- Real-time processing (Kafka, Spark Streaming)
- Knowledge of Delta Lake or Apache Iceberg
- Experience with Databricks
- Basic understanding of machine learning pipelines
- Familiarity with DevOps tools (CI/CD, Terraform)
Skills
Apache Airflow, Apache Hive, Apache Kafka, Apache Spark, Amazon Redshift, Amazon S3, BigQuery, Cloud, Data Lakes, Data Warehousing, Databricks, Delta Lake, Docker, ETL, Google Cloud Platform, Hadoop, Microsoft Azure, Python, PySpark, SQL, Snowflake