PySpark Data Engineer
ValueLabs
About the Role
We are looking for a PySpark engineer, based in Chennai, with banking domain experience.
The role is for a Data Engineer with 7+ years of experience who designs, builds, and maintains scalable data pipelines to collect, store, and process large datasets. The engineer collaborates with data scientists to optimize data flow, ensure data quality, and implement security measures, and safeguards data cleanliness, reliability, and security through validation checks and access controls. Key skills include SQL and Oracle. The role also involves working with project teams to provide clean, accessible data as required, delivery experience in Risk and Compliance areas, and Python development as a secondary skill.
Responsibilities
- Design, build, and maintain scalable data pipelines to collect, store, and process large datasets.
- Collaborate with data scientists to optimize data flow, ensure data quality, and implement security measures.
- Ensure data cleanliness, reliability, and security by implementing validation checks and access controls.
- Work with project teams to provide clean, accessible data as per requirement.
- Provide delivery support for Risk and Compliance areas.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
- 6+ years of experience as a Data Engineer, with a strong focus on PySpark and the Cloudera Data Platform.
Technical Skills
- PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization techniques.
- Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and HBase.
- Data Warehousing: Knowledge of data warehousing concepts, ETL best practices, and experience with SQL‑based tools (e.g., Hive, Impala).
- Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools.
- Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration frameworks.
- Scripting and Automation: Strong scripting skills in Linux.
Soft Skills
- Strong analytical and problem‑solving skills.
- Excellent verbal and written communication abilities.
- Ability to work independently and collaboratively in a team environment.
- Attention to detail and commitment to data quality.
How to Apply
Interested candidates should send their resume to jobs.bfs2@valuelabs.com.