Senior Data Engineer
WellnessLiving
Thornhill · On-site · Full-time · Senior · 1w ago
About the role
Responsibilities
- Monolith-to-Microservices Data Transition: Lead the decomposition of monolithic database structures into domain-aligned schemas that enable service independence and ownership.
- Pipeline Development & Migration: Build and optimize ETL/ELT workflows using Python, PySpark/Spark, AWS Glue, and dbt, including schema/data mapping and transformation from on-prem and cloud legacy systems into data lake and warehouse environments.
- Domain Data Modeling: Define logical and physical domain-driven data models (star/snowflake schemas, data marts) to serve cross-functional needs across BI, operations, streaming, and ML.
- Legacy Systems Integration: Design strategies for extracting, validating, and restructuring data from legacy systems with embedded logic and incomplete normalization.
- Database Management: Administer, optimize, and scale SQL (MySQL, Aurora, Redshift) and NoSQL (MongoDB) platforms to meet high-availability and low-latency needs.
- Cloud & Serverless ETL: Leverage AWS Glue Catalog, Crawlers, Lambda, and S3 to manage and orchestrate modern, cost-efficient data pipelines.
- Data Governance & Compliance: Enforce best practices around cataloging, lineage, retention, access control, and security, ensuring compliance with GDPR, CCPA, PIPEDA, and internal standards.
- Monitoring & Optimization: Implement observability (CloudWatch, logs, metrics) and performance tuning across Spark, Glue, and Redshift workloads.
- Stakeholder Collaboration: Work with architects, analysts, product managers, and data scientists to define, validate, and prioritize requirements.
- Documentation & Mentorship: Maintain technical documentation (data dictionaries, migration guides, schema specs) and mentor junior engineers in engineering standards.
Required Qualifications
- Experience: 5+ years in data engineering with a proven record in modernizing legacy data systems and driving large-scale migration initiatives.
- Cloud ETL Expertise: Proficient in AWS Glue, Apache Spark/PySpark, and modular transformation frameworks like dbt.
- Data Modeling: Strong grasp of domain-driven design, bounded contexts, and BI-friendly modeling approaches (star/snowflake/data vault).
- Data Migration: Experience with full lifecycle migrations including schema/data mapping, reconciliation, and exception handling.
- Databases: SQL (MySQL, Aurora, Redshift) and NoSQL (MongoDB, DocumentDB)
- Programming: Strong Python skills for data wrangling, pipeline automation, and API interactions.
- Data Architecture: Hands-on with data lakes, warehousing strategies, and hybrid cloud data ecosystems.
- Compliance & Security: Track record implementing governance, data cataloging, encryption, retention, lineage, and RBAC.
- DevOps Practices: Git, CI/CD pipelines, Docker, and test automation for data pipelines.
Preferred Qualifications
- Experience with streaming data platforms such as Kafka or Kinesis, or CDC tools such as Debezium
- Familiarity with orchestration platforms like Airflow or Prefect
- Background in analytics, data modeling for AI/ML pipelines, or ML-ready data preparation
- Understanding of cloud-native data services (AWS Glue, Redshift, Snowflake, BigQuery, etc.)
- Degree in Computer Science, Engineering, or equivalent field
- Strong written and verbal communication skills
- Self-starter with ability to navigate ambiguity and legacy system complexity
- Exposure to generative AI, LLM fine-tuning, or feature store design is a plus
Please note that only those selected for an interview will be contacted. We appreciate you taking the time to apply and look forward to reviewing your application.
Skills
AWS Glue · Aurora · CI/CD · CloudWatch · Docker · Git · Lambda · MongoDB · MySQL · NoSQL · Python · PySpark · Redshift · S3 · Spark · SQL · dbt