Data Engineer (GCP)

Women Innovators In Tech

Remote · Canada · Full-time · Mid Level · $71k – $141k/yr

About the role

We’re looking for a skilled Data Engineer to design, build, and optimize scalable, cloud‑native data pipelines on Google Cloud Platform (GCP). The role involves extensive work with Apache Airflow, Spark, Python, and Scala to develop high‑performance data solutions supporting analytics, streaming, and generative AI initiatives.

Key Responsibilities

  • Develop, automate, and maintain batch and streaming ETL pipelines using Apache Airflow, Apache Spark, Python, and Scala (an illustrative sketch follows this list).
  • Build and manage cloud‑based data ecosystems on GCP (BigQuery, Bigtable, Dataproc, Pub/Sub, Cloud Storage, IAM, VPC).
  • Design and optimize SQL and NoSQL data models for data lakes and warehouses (BigQuery, MongoDB, Snowflake).
  • Write complex SQL queries for advanced data transformation, aggregation, and analytics optimization within BigQuery or equivalent platforms.
  • Apply modern Test Driven Development (TDD) methodologies for big data pipelines, ensuring test automation across Airflow workflows, Spark jobs, and transformation logic.
  • Apply data mesh and data‑as‑a‑product principles to enable reusable and domain‑driven datasets.
  • Implement real‑time ingestion with Kafka Connect and process streaming data using Spark Streaming, Apache Flink, or similar technologies.
  • Optimize data performance, scalability, and cost efficiency across GCP components.
  • Ensure PCI and PII data are handled in compliance with standards such as GDPR, PCI DSS, SOX, and CCPA.
  • Integrate generative AI tools (e.g., OpenAI, Gemini, and Anthropic LLMs) to improve data quality checks and analytics.
  • Collaborate with stakeholders, data scientists, and full‑stack engineers to deliver trusted, documented, and reusable data products.
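
For illustration only (not part of the original posting): a minimal sketch of the kind of batch pipeline described in the first bullet, assuming an Airflow 2.x deployment with the Google provider package installed, an existing Dataproc cluster, and placeholder project, bucket, and dataset names. It submits a PySpark transform to Dataproc and loads the Parquet output into BigQuery.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    # Placeholder identifiers -- illustrative only, not from the posting.
    PROJECT_ID = "example-project"
    REGION = "us-central1"
    CLUSTER = "example-dataproc-cluster"
    BUCKET = "example-staging-bucket"

    with DAG(
        dag_id="daily_orders_etl",
        schedule_interval="@daily",
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        # Submit a PySpark transformation job to the existing Dataproc cluster.
        transform = DataprocSubmitJobOperator(
            task_id="spark_transform",
            project_id=PROJECT_ID,
            region=REGION,
            job={
                "reference": {"project_id": PROJECT_ID},
                "placement": {"cluster_name": CLUSTER},
                "pyspark_job": {"main_python_file_uri": f"gs://{BUCKET}/jobs/transform_orders.py"},
            },
        )

        # Load the transformed Parquet files from Cloud Storage into BigQuery.
        load = GCSToBigQueryOperator(
            task_id="load_to_bigquery",
            bucket=BUCKET,
            source_objects=["output/orders/*.parquet"],
            destination_project_dataset_table=f"{PROJECT_ID}.analytics.orders",
            source_format="PARQUET",
            write_disposition="WRITE_TRUNCATE",
        )

        transform >> load

In practice the same DAG pattern extends to streaming ingestion (e.g., Pub/Sub or Kafka Connect feeding a Spark Structured Streaming job), with Airflow handling orchestration, retries, and alerting.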

Required Qualifications

  • Bachelor’s or Master’s in Computer Science, Data Engineering, or related field.
  • 5+ years of hands‑on experience with large‑scale data engineering in cloud environments.
  • Advanced skills in Python, Scala, the Spark ecosystem, and SQL for building data pipelines.
  • Strong GCP expertise (BigQuery, Bigtable, Dataproc, Pub/Sub, IAM, VPC).
  • Proficiency in SQL/NoSQL modeling and data architecture for cloud data lakes.
  • Familiarity with streaming frameworks (Kafka, Flume).
  • Experience handling sensitive data and ensuring regulatory compliance.
  • Working knowledge of Docker, CI/CD, and modern DevOps practices for data platforms.

Preferred Qualifications

  • Experience with Infrastructure as Code (IaC) tools such as Terraform or Ansible.
  • Contributions to open‑source projects or internal developer tooling.
  • Prior experience building Customer Data Platforms (CDPs) in‑house.
  • Experience with AI-assisted developer tools (e.g., IntelliJ plug-ins backed by OpenAI or Anthropic models, Codex CLI, Windsurf).

Job Details

  • Job Type: Full‑time
  • Location: Canada / Remote
  • Experience Required: 5 years
  • Salary: $70,503.61 – $140,613.77 per year

Skills

Apache Airflow, Apache Spark, BigQuery, Bigtable, CI/CD, Cloud Storage, Dataproc, Docker, Flume, GCP, GDPR, GenAI, Gemini, IAM, Kafka, Kafka Connect, LLMs, MongoDB, NoSQL, OpenAI, Pub/Sub, Python, Scala, Spark Streaming, SQL, Snowflake, Spark, TDD, Terraform, VPC
