
Senior Data Engineer (Hadoop + GCP Dataproc)

Recutify Inc.

Canada · On-site · Full-time · Senior · 2w ago

About the role

Position: Senior Data Engineer (Hadoop + GCP Dataproc)

Location: Toronto

Mode of Work: Hybrid (4 days a week)

Role Overview:

We are looking for an experienced Senior Data Engineer with strong expertise in the Hadoop ecosystem and Google Cloud Platform, particularly GCP Dataproc. The ideal candidate will have hands-on experience in modernizing data platforms, optimizing large-scale data processing workloads, and migrating Hive-based workloads to BigQuery.

Required Qualifications:

  • 10+ years of experience working within the Hadoop ecosystem, with deep expertise in Hive, Hive Metastore, and GCP Dataproc.
  • Strong hands-on experience with Dataproc Serverless, BigQuery, Google Cloud Storage (GCS), Cloud Composer, and Cloud Logging/Monitoring.
  • Solid understanding of table formats (Parquet, ORC), partitioning/bucketing strategies, and query performance optimization.
  • Proven experience migrating Hive datasets, tables, and SQL queries to BigQuery, including handling syntax differences, functions, UDFs, and performance tuning.
  • Strong knowledge of Security/IAM best practices, service accounts, and network/security controls; ability to operate within strict organization-level policies.

Nice-to-Have Skills:

  • Familiarity with Dataproc Metastore vs. standalone Hive Metastore, and experience with Glue or other metadata/catalog services.
  • Exposure to data quality frameworks and lineage tools (e.g., OpenLineage, Collibra).
  • A FinOps mindset, including experience managing quotas, reservations, and cost governance for Dataproc and BigQuery workloads.

Requirements

  • 10+ years of experience working within the Hadoop ecosystem
  • deep expertise in Hive, Hive Metastore, and GCP Dataproc
  • hands-on experience with Dataproc Serverless, BigQuery, Google Cloud Storage (GCS), Cloud Composer, and Cloud Logging/Monitoring
  • solid understanding of table formats (Parquet, ORC), partitioning/bucketing strategies, and query performance optimization
  • proven experience migrating Hive datasets, tables, and SQL queries to BigQuery
  • strong knowledge of Security/IAM best practices, service accounts, and network/security controls
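The Hive-to-BigQuery migration experience asked for above involves, among other things, handling function-name differences between HiveQL and BigQuery Standard SQL (e.g. Hive's NVL and collect_list map to BigQuery's IFNULL and ARRAY_AGG). A minimal illustrative sketch of that one sub-task — the mapping table and `translate_functions` helper are hypothetical examples, not part of any real migration tool, and a real migration would use a proper SQL parser rather than regex substitution:

```python
import re

# Hive function name -> BigQuery Standard SQL equivalent
# (a small illustrative subset; many more differences exist in practice)
HIVE_TO_BQ = {
    "NVL": "IFNULL",            # NVL(x, y)        -> IFNULL(x, y)
    "COLLECT_LIST": "ARRAY_AGG",  # collect_list(x) -> ARRAY_AGG(x)
}

def translate_functions(hive_sql: str) -> str:
    """Rewrite known Hive function names to their BigQuery equivalents."""
    pattern = r"\b(" + "|".join(HIVE_TO_BQ) + r")\s*\("

    def repl(match: re.Match) -> str:
        # Look up the canonical (uppercase) Hive name, keep the "(".
        return HIVE_TO_BQ[match.group(1).upper()] + "("

    return re.sub(pattern, repl, hive_sql, flags=re.IGNORECASE)

print(translate_functions("SELECT NVL(name, 'n/a'), collect_list(id) FROM t"))
# -> SELECT IFNULL(name, 'n/a'), ARRAY_AGG(id) FROM t
```

Real-world migrations also have to account for argument-order and type differences, UDFs, and partitioning semantics, which is why the role calls for hands-on tuning experience rather than mechanical translation alone.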

Responsibilities

  • modernizing data platforms
  • optimizing large-scale data processing workloads
  • migrating Hive-based workloads to BigQuery

Skills

Hadoop ecosystem · Google Cloud Platform · GCP Dataproc · Hive · Hive Metastore · BigQuery · Dataproc Serverless · Google Cloud Storage (GCS) · Cloud Composer · Cloud Logging/Monitoring · Parquet · ORC · Security/IAM best practices · service accounts · network/security controls
