Data Engineer

Gini Talent

US · On-site · Full-time · Mid Level · Posted 1mo ago

About the role
We are looking for an experienced Data Engineer to join our Knowledge AI (KAI) team, focusing on building scalable data infrastructure for AI and NLP-driven products.

In this role, you will be responsible for designing, building, and maintaining robust data pipelines and architectures that enable the processing and analysis of large-scale structured and unstructured datasets. You will work closely with AI/ML engineers to support the development and deployment of data-intensive solutions powered by NLP, large language models (LLMs), and advanced analytics.

Key Responsibilities

Design, build, and maintain scalable data pipelines for ingestion, transformation, and processing of large structured and unstructured datasets
Develop and optimize ETL/ELT workflows for data cleaning, normalization, enrichment, and versioning
Manage and optimize data storage solutions, including data lakes, warehouses, and vector databases
Collaborate with AI/ML teams to ensure efficient data access for model training and inference
Implement data quality, validation, and monitoring processes to ensure reliability and consistency
Optimize data workflows for performance, scalability, and cost-efficiency
Build and maintain data APIs and services to enable seamless data access across teams
Use Docker and Kubernetes (K8s) to build and run scalable data processing environments
Troubleshoot data-related production issues and improve system reliability
Document data architecture, pipelines, and processes clearly

Skills & Experience

Strong experience in Python for data engineering (e.g., Pandas, PySpark)
Advanced knowledge of SQL and experience with large-scale data processing
Solid understanding of data modeling, data warehousing, and data lake architectures
Hands-on experience with ETL/ELT processes and data pipeline orchestration tools (e.g., Airflow)
Experience working with cloud platforms, preferably MS Azure
Familiarity with big data technologies (e.g., Spark, distributed systems) is a plus
Experience with unstructured data processing (text, multimodal data) is highly desirable
Knowledge of NLP / LLM workflows and data preparation for AI models is a plus
Hands-on experience with Docker and Kubernetes
Experience with vector databases is a plus
Understanding of data governance, security, and best practices

Why Join Us?

Work on cutting-edge AI, NLP, and data engineering projects
Be part of a highly skilled and collaborative team
Build data systems that directly power impactful AI solutions
Opportunity to work with large-scale, complex datasets and modern technologies

Skills

Airflow, Docker, Kubernetes, LLM, MS Azure, NLP, Pandas, Python, PySpark, SQL, Spark

Similar roles

MCP Engineer / AI Backend Engineer

Senior Database Engineer

Team Leads