Data Engineer

Gini Talent

US · On-site · Full-time · Mid Level · 1w ago

About the role

We are looking for an experienced Data Engineer to join our Knowledge AI (KAI) team, focusing on building scalable data infrastructure for AI and NLP-driven products.

In this role, you will design, build, and maintain robust data pipelines and architectures for processing and analyzing large-scale structured and unstructured datasets. You will work closely with AI/ML engineers to support the development and deployment of data-intensive solutions powered by NLP, large language models (LLMs), and advanced analytics.

Key Responsibilities

  • Design, build, and maintain scalable data pipelines for ingestion, transformation, and processing of large structured and unstructured datasets
  • Develop and optimize ETL/ELT workflows for data cleaning, normalization, enrichment, and versioning
  • Manage and optimize data storage solutions, including data lakes, warehouses, and vector databases
  • Collaborate with AI/ML teams to ensure efficient data access for model training and inference
  • Implement data quality, validation, and monitoring processes to ensure reliability and consistency
  • Optimize data workflows for performance, scalability, and cost-efficiency
  • Build and maintain data APIs and services to enable seamless data access across teams
  • Work with Docker and Kubernetes (K8s) to run scalable data processing environments
  • Troubleshoot data-related production issues and improve system reliability
  • Document data architecture, pipelines, and processes clearly

Skills & Experience

  • Strong experience in Python for data engineering (e.g., Pandas, PySpark)
  • Advanced knowledge of SQL and experience with large-scale data processing
  • Solid understanding of data modeling, data warehousing, and data lake architectures
  • Hands-on experience with ETL/ELT processes and data pipeline orchestration tools (e.g., Airflow)
  • Experience working with cloud platforms, preferably MS Azure
  • Familiarity with big data technologies (e.g., Spark, distributed systems) is a plus
  • Experience with unstructured data processing (text, multimodal data) is highly desirable
  • Knowledge of NLP / LLM workflows and data preparation for AI models is a plus
  • Hands-on experience with Docker and Kubernetes
  • Experience with vector databases is a plus
  • Understanding of data governance, security, and best practices

Why Join Us?

  • Work on cutting-edge AI, NLP, and data engineering projects
  • Be part of a highly skilled and collaborative team
  • Build data systems that directly power impactful AI solutions
  • Opportunity to work with large-scale, complex datasets and modern technologies

Skills

Airflow · Docker · Kubernetes · LLM · MS Azure · NLP · Pandas · Python · PySpark · SQL · Spark
