Job Title

Samprasoft

McLean · On-site · Full-time · Senior · 4w ago

About the role

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, Data Science, or a related quantitative field.
  • 5-6 years of relevant experience designing and developing data pipelines that process large volumes and varieties of data (structured and unstructured data such as XMLs, JSONs, and PDFs), including writing code for parallel processing.
  • Hands-on programming experience in Hadoop, Spark, Python and SQL for data processing and analysis.
  • Demonstrated ability to manage competing demands, prioritize work, and manage customer expectations.
  • Strong verbal and written communication skills.

Required Technical Skills

  • Advanced Python, SQL, and Spark; strong familiarity with Big Data technologies such as Hadoop, Sqoop, Hive, and Ambari
  • Prior experience working with AWS and Snowflake technologies
  • Unix shell scripting and Autosys batch scheduling

Responsibilities

  • Cleanse, manipulate, and analyze large datasets (structured and unstructured data: XMLs, JSONs, PDFs) using the Hadoop platform.
  • Develop Python, PySpark, and Spark scripts to filter, cleanse, map, and aggregate data.
  • Build dashboards in R/Shiny for end-user consumption.
  • Manage and implement data processes (data quality reports).
  • Develop data profiling, deduplication, and matching logic for analysis.
  • Use Python, PySpark, and Spark for data ingestion.
  • Develop programs for Big Data processing on the Hadoop platform.
  • Present ideas and recommendations to management on the best use of Hadoop and other technologies.

Skills

Ambari, AWS, Autosys, BigData, Hadoop, Hive, JSON, PDF, Python, RS, Sqoop, Shiny, Snowflake, Spark, SQL, Unix Shell script, XML
