Job Title

Samprasoft

McLean · On-site · Full-time · Senior · 2mo ago

About the role

Bachelor’s degree in Computer Science, Engineering, Data Science or a related quantitative field.
5-6 years of relevant experience in the design and development of data pipelines to process large volumes and varieties of data (structured and unstructured data, writing code for parallel processing, XMLs, JSONs, PDFs).
Hands-on programming experience in Hadoop, Spark, Python and SQL for data processing and analysis.
Demonstrated ability to manage competing demands, prioritize work, and manage customer expectations.
Strong verbal and written communication skills.

Advanced Python, SQL and Spark; very good familiarity with Big Data technologies like Hadoop, Sqoop, Hive and Ambari
Prior experience working with AWS and Snowflake technologies
Unix shell scripting, Autosys batch scheduling

Cleanse, manipulate and analyze large datasets (structured and unstructured data – XMLs, JSONs, PDFs) using the Hadoop platform.
Develop Python, PySpark and Spark scripts to filter, cleanse, map and aggregate data.
Build dashboards in R/Shiny for end-user consumption.
Manage and implement data processes (Data Quality reports).
Develop data profiling, deduplication and matching logic for analysis.
Use Python, PySpark and Spark for data ingestion.
Develop programs on the Big Data platform using Hadoop.
Present ideas and recommendations to management on the best use of Hadoop and other technologies.

Ambari · AWS · Autosys · BigData · Hadoop · Hive · JSON · PDF · Python · RS · Sqoop · Shiny · Snowflake · Spark · SQL · Unix Shell script · XML
