PySpark Data Engineer
ValueLabs
About the Role
We are looking for a PySpark engineer, based in Chennai, with banking domain experience.
The role is for a Data Engineer with 7+ years of experience who designs, builds, and maintains scalable data pipelines to collect, store, and process large datasets. The engineer collaborates with data scientists to optimize data flow, ensure data quality, and implement security measures, and safeguards data cleanliness, reliability, and security through validation checks and access controls. Key skills include SQL and Oracle. The role also involves working with project teams to provide clean, accessible data as required, delivery experience in Risk and Compliance areas, and Python development as a secondary skill.
Responsibilities
- Design, build, and maintain scalable data pipelines to collect, store, and process large datasets.
- Collaborate with data scientists to optimize data flow, ensure data quality, and implement security measures.
- Ensure data cleanliness, reliability, and security by implementing validation checks and access controls.
- Work with project teams to provide clean, accessible data as per requirement.
- Provide delivery support for Risk and Compliance areas.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
- 6+ years of experience as a Data Engineer, with a strong focus on PySpark and the Cloudera Data Platform.
Technical Skills
- PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization techniques.
- Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and HBase.
- Data Warehousing: Knowledge of data warehousing concepts, ETL best practices, and experience with SQL‑based tools (e.g., Hive, Impala).
- Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools.
- Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration frameworks.
- Scripting and Automation: Strong scripting skills in Linux.
Soft Skills
- Strong analytical and problem‑solving skills.
- Excellent verbal and written communication abilities.
- Ability to work independently and collaboratively in a team environment.
- Attention to detail and commitment to data quality.
How to Apply
Interested candidates should send their resume to jobs.bfs2@valuelabs.com.