
Senior Data Engineer

Wellnessliving

Thornhill · On-site · Full-time · Senior

About the role

Responsibilities

  • Monolith-to-Microservices Data Transition: Lead the decomposition of monolithic database structures into domain-aligned schemas that enable service independence and ownership.
  • Pipeline Development & Migration: Build and optimize ETL/ELT workflows using Python, PySpark/Spark, AWS Glue, and dbt, including schema/data mapping and transformation from on-prem and cloud legacy systems into data lake and warehouse environments.
  • Domain Data Modeling: Define logical and physical domain-driven data models (star/snowflake schemas, data marts) to serve cross-functional needs: BI, operations, streaming, and ML.
  • Legacy Systems Integration: Design strategies for extracting, validating, and restructuring data from legacy systems with embedded logic and incomplete normalization.
  • Database Management: Administer, optimize, and scale SQL (MySQL, Aurora, Redshift) and NoSQL (MongoDB) platforms to meet high-availability and low-latency needs.
  • Cloud & Serverless ETL: Leverage AWS Glue Catalog, Crawlers, Lambda, and S3 to manage and orchestrate modern, cost-efficient data pipelines.
  • Data Governance & Compliance: Enforce best practices around cataloging, lineage, retention, access control, and security, ensuring compliance with GDPR, CCPA, PIPEDA, and internal standards.
  • Monitoring & Optimization: Implement observability (CloudWatch, logs, metrics) and performance tuning across Spark, Glue, and Redshift workloads.
  • Stakeholder Collaboration: Work with architects, analysts, product managers, and data scientists to define, validate, and prioritize requirements.
  • Documentation & Mentorship: Maintain technical documentation (data dictionaries, migration guides, schema specs) and mentor junior engineers in engineering standards.

Required Qualifications

  • Experience: 5+ years in data engineering with a proven record in modernizing legacy data systems and driving large-scale migration initiatives.
  • Cloud ETL Expertise: Proficient in AWS Glue, Apache Spark/PySpark, and modular transformation frameworks like dbt.
  • Data Modeling: Strong grasp of domain-driven design, bounded contexts, and BI-friendly modeling approaches (star/snowflake/data vault).
  • Data Migration: Experience with full lifecycle migrations including schema/data mapping, reconciliation, and exception handling.
  • Databases: SQL (MySQL, Aurora, Redshift) and NoSQL (MongoDB, DocumentDB)
  • Programming: Strong Python skills for data wrangling, pipeline automation, and API interactions.
  • Data Architecture: Hands-on with data lakes, warehousing strategies, and hybrid cloud data ecosystems.
  • Compliance & Security: Track record implementing governance, data cataloging, encryption, retention, lineage, and RBAC.
  • DevOps Practices: Git, CI/CD pipelines, Docker, and test automation for data pipelines.

Preferred Qualifications

  • Experience with streaming data platforms such as Kafka or Kinesis, or CDC tools such as Debezium
  • Familiarity with orchestration platforms like Airflow or Prefect
  • Background in analytics, data modeling for AI/ML pipelines, or ML-ready data preparation
  • Understanding of cloud-native data services (AWS Glue, Redshift, Snowflake, BigQuery, etc.)
  • Degree in Computer Science, Engineering, or equivalent field
  • Strong written and verbal communication skills
  • Self-starter with ability to navigate ambiguity and legacy system complexity
  • Exposure to generative AI, LLM fine-tuning, or feature store design is a plus

Please note that only those selected for an interview will be contacted. We appreciate you taking the time and look forward to reviewing your application.

Skills

AWS Glue, Aurora, CI/CD, CloudWatch, Docker, Git, Lambda, MongoDB, MySQL, NoSQL, Python, PySpark, Redshift, S3, Spark, SQL, dbt
