Data Engineer - Databricks (gn)

BLACKBULL INTERNATIONAL GmbH

On-site | Full-time | Mid Level | Posted 3w ago

About the role

Purpose of job

The role is responsible for designing, building and operating data products and reusable data preparation components on a Databricks-based Data and AI Platform, while acting as a technical enabler for internal platform users through expert guidance and advanced-level support. The position ensures adherence to security, privacy and regulatory requirements via a compliance-by-design approach, maintains alignment with established best practices for data pipelines and data quality, and drives continuous platform improvement through the integration of new features and standardized, reusable pipelines.

Tasks

  • Develop and maintain a library of modular, reusable pipeline components — covering data intake, processing, verification and enrichment — to produce consistently structured, AI/ML-ready datasets from both structured and unstructured sources.
  • Architect and run dependable data workflows on Databricks, pulling from a wide range of internal and external origins to produce clean, validated data assets available for AI, ML and reporting purposes.
  • Govern the lifecycle of layered data assets across maturity tiers, upholding quality and timeliness standards while maintaining purpose-specific, analytics- and model-ready output datasets.
  • Build and maintain transformation workflows that derive meaningful predictive attributes from raw data, alongside a centrally managed attribute repository with clear versioning, ownership and service-level commitments.
  • Work alongside AI and ML engineers to establish and maintain data supply chains for retrieval-augmented generation systems, covering content segmentation, vector representation updates and index synchronization.
  • Embed governance controls — covering permissions, traceability, data lifecycle management, encryption and audit trails — into platform design to meet both internal policies and external regulatory requirements.

Required:

  • At least 3 years of hands-on experience designing and running large-scale data pipelines and data products on Databricks (batch and/or streaming), preferably in regulated or governance-heavy environments, ideally in the financial services industry.
  • Advanced proficiency in Databricks, Spark, Python and SQL, complemented by sound software engineering practices such as CI/CD workflows and infrastructure-as-code tooling (e.g., Terraform) on Azure.
  • Solid grasp of data engineering principles relevant to AI/ML contexts, including data modelling, quality assurance, feature collaboration and reproducibility, as well as an understanding of how data characteristics influence model behaviour.
  • Strong command of modern data architecture patterns — including Lakehouse principles, layered data organisation and data product thinking — and the ability to put them into practice within a governed enterprise setting.
  • Degree in computer science or a comparable discipline.
  • Strong analytical mindset paired with structured planning, clear documentation and the ability to effectively transfer knowledge to colleagues.

Preferred:

  • Professional background in the financial sector, ideally within asset management, combined with international work experience and relevant Databricks certifications.

Skills

Azure, CI/CD, Databricks, Infrastructure-as-code, Python, Spark, SQL, Terraform
