Data Engineer NLP

statworx

Germany · On-site · Full-time · Senior · Today

About the role

About Us

statworx is a leading consulting and development company for data and AI based in Frankfurt am Main. We offer strategic consulting for medium-sized companies and global corporations. We develop innovative data & AI solutions for every business area and function. We empower people at all skill levels with our data & AI training formats. In short: we have supported companies in all aspects of digital transformation for more than 10 years, in over 1,000 data & AI projects, and for over 100 clients from almost all industries.

Our AI Development department serves as a catalyst for Data & AI Transformation. We rely on a holistic approach that ranges from the initial evaluation of AI maturity, through the conception and elaboration of the data and AI solution, to the practical implementation and scaling of AI solutions. Through our in-depth expertise in Data Engineering, Data Science, and Machine Learning, we ensure that our clients derive maximum benefit from their data.

Your Tasks

Focus: Data pipelines and provisioning for NLP and LLM-based applications

  • You connect classic Data Engineering with modern NLP approaches – especially in the context of Large Language Models (LLMs), Embeddings, Knowledge Graphs, Retrieval-Augmented Generation (RAG), and Text-to-SQL applications.
  • You conceptualize, develop, and operate modern data architectures that form the basis for advanced NLP applications – from knowledge management systems to semantic search solutions and RAG use cases.
  • You work closely with our clients, understand their business requirements and data processes, and turn these into customized, scalable data and AI solutions.
  • You implement scalable data pipelines and infrastructures to efficiently provide, transform, and version large amounts of structured and unstructured data.
  • You ensure data quality, security, and governance across the entire value chain and establish best practices for handling sensitive data in AI projects.
  • You are responsible for the setup and operation of scalable data infrastructures in cloud environments and automate deployments as well as monitoring systems to ensure reliability and availability.
  • You advise our clients and internal teams strategically on data architectures, technologies, tools, and best practices, acting as a reliable sparring partner.
  • You support junior colleagues, actively share your knowledge within the team, and contribute to the further development of the statworx Data Engineering community through workshops, blog posts, or internal talks.

Your Profile

  • You have successfully completed a Master's degree – e.g., in Computer Science (Business Informatics) or a comparable field of study.
  • You have at least five years of relevant professional experience in Data Engineering or Data Architecture.
  • You have a deep understanding of modern data architectures (Data Lakes, Lakehouses, Data Warehouses) and are well-versed in ETL/ELT processes and data modeling.
  • Ideally, you have experience in building data infrastructures for NLP applications – especially in the context of LLMs, Retrieval-Augmented Generation (RAG), Semantic Layers, and Knowledge Graphs.
  • Practical experience with Text-to-SQL systems or developing interfaces between natural language and databases is a plus.
  • You have experience with cloud platforms (Azure, AWS, or GCP) and modern data platforms like Databricks or Snowflake.
  • You are familiar with Infrastructure-as-Code (e.g., Terraform, Pulumi) and CI/CD workflows (e.g., GitHub Actions, GitLab CI, Azure DevOps).

Skills

AWS, Azure, Databricks, Data Engineering, Data Science, Data Warehouses, ETL, GCP, GitLab CI, GitHub Actions, Infrastructure-as-Code, Knowledge Graphs, LLM, Machine Learning, NLP, Pulumi, RAG, Snowflake, Terraform, Text-to-SQL, Data Transformation