Data Engineer - Databricks (gn)

BLACKBULL INTERNATIONAL GmbH

On-site | Full-time | Mid Level | 2mo ago

About the role

Purpose of job

The role is responsible for designing, building and operating data products and reusable data preparation components on a Databricks-based Data and AI Platform, while acting as a technical enabler for internal platform users through expert guidance and advanced-level support. The position ensures adherence to security, privacy and regulatory requirements via a compliance-by-design approach, maintains alignment with established best practices for data pipelines and data quality, and drives continuous platform improvement through the integration of new features and standardized, reusable pipelines.

Tasks

Develop and maintain a library of modular, reusable pipeline components — covering data intake, processing, verification and enrichment — to produce consistently structured, AI/ML-ready datasets from both structured and unstructured sources.
Architect and run dependable data workflows on Databricks, pulling from a wide range of internal and external origins to produce clean, validated data assets available for AI, ML and reporting purposes.
Govern the lifecycle of layered data assets across maturity tiers, upholding quality and timeliness standards while maintaining purpose-specific, analytics- and model-ready output datasets.
Build and maintain transformation workflows that derive meaningful predictive attributes from raw data, alongside a centrally managed attribute repository with clear versioning, ownership and service-level commitments.
Work alongside AI and ML engineers to establish and maintain data supply chains for retrieval-augmented generation systems, covering content segmentation, vector representation updates and index synchronization.
Embed governance controls — covering permissions, traceability, data lifecycle management, encryption and audit trails — into platform design to meet both internal policies and external regulatory requirements.

Required:

At least 3 years of hands-on experience designing and running large-scale data pipelines and data products on Databricks (batch and/or streaming), preferably in regulated or governance-heavy environments, ideally in the Financial Services industry.
Advanced proficiency in Databricks, Spark, Python and SQL, complemented by sound software engineering practices such as CI/CD workflows and infrastructure-as-code tooling (e.g., Terraform) on Azure.
Solid grasp of data engineering principles relevant to AI/ML contexts, including data modelling, quality assurance, feature collaboration and reproducibility, as well as an understanding of how data characteristics influence model behaviour.
Strong command of modern data architecture patterns — including Lakehouse principles, layered data organisation and data product thinking — and the ability to put them into practice within a governed enterprise setting.
Degree in computer science or a comparable discipline.
Strong analytical mindset paired with structured planning, clear documentation and the ability to effectively transfer knowledge to colleagues.

Preferred:

Professional background in the financial sector, ideally within asset management, combined with international work experience and relevant Databricks certifications.

Skills

Azure, CI/CD, Databricks, Infrastructure-as-code, Python, Spark, SQL, Terraform
