Lead Data Engineer - PySpark, Redshift, Airflow, AWS

Zortech Solutions

Valhalla · On-site Contract Lead 1mo ago

About the role

Candidate should have 12+ years of experience in data engineering. Must have strong work experience with the onshore-offshore model.

Designing, creating, testing and maintaining complete data management & processing systems.
Candidate needs an in-depth understanding of how data pipelines are built, including:
Typical challenges with fetching data from various sources.
How incremental/CDC data flows are handled.
How data quality is ensured.
How data profiling is done.
Should be able to design and document data models at various levels.
Working closely with the stakeholders.
Building highly scalable, robust & fault-tolerant systems.
Discovering data acquisition opportunities.
Finding ways to extract value from existing data.
Improving data quality, reliability & efficiency of the individual components & the complete system.

Hands-on experience with PySpark, Redshift (SQL) and Airflow at minimum.
Strong hands-on command of the required tech skills, flexibility, and the right attitude to play the lead role.
Knowledge of the Hadoop ecosystem and the frameworks within it: HDFS, YARN, MapReduce, Apache Pig, Hive, Flume, Sqoop, ZooKeeper, Oozie, Impala and Kafka.
Must have experience with SQL-based technologies (e.g. MySQL, Oracle DB) and NoSQL technologies (e.g. Cassandra, MongoDB).
Should have Python/Scala/Java programming skills.
Problem-solving mindset and experience working in an agile environment.

Airflow, Apache Pig, Cassandra, Flume, HDFS, Hive, Hadoop, Impala, Java, Kafka, MapReduce, MongoDB, MySQL, NoSQL, Oracle DB, Oozie, Python, PySpark, Redshift, Scala, SQL, Sqoop, YARN, ZooKeeper

