Freelance Data Engineer / ML Engineer (Public Health Analytics)

ThreatXIntel

India · On-site · Full-time · Posted today

About the role

Role Overview

We are looking for a highly skilled freelance Data Engineer / Machine Learning Engineer to build an end-to-end data pipeline and predictive analytics system that models life expectancy from public health and socio-economic data.

The ideal candidate has strong experience in data engineering, big data processing, and machine learning, and can work with real-world datasets to derive actionable insights.

Key Responsibilities

Data Engineering

  • Build scalable data pipelines using Python, SQL, and Apache Spark
  • Ingest data from APIs and public datasets (Census, healthcare, etc.)
  • Design multi-layer architecture:
    • Bronze (raw data)
    • Silver (cleaned data)
    • Gold (aggregated/feature-ready data)
  • Perform data transformation, joins, and aggregation at regional/community level

Feature Engineering

  • Develop key health indicators such as:
    • Mortality rates
    • Poverty & unemployment rates
    • Healthcare provider density
    • Food accessibility metrics
  • Build composite indices like:
    • Economic Hardship Index
    • Health Access Index
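
For candidates wondering what a composite index might look like in practice, here is a minimal sketch. The posting does not define how the Economic Hardship Index is computed, so everything below is an assumption: the function names are hypothetical, the inputs are made-up rates, and the method (averaging z-scores of poverty and unemployment) is just one common way to combine indicators on different scales.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All communities share the same value, so none stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def economic_hardship_index(poverty_rates, unemployment_rates):
    """Hypothetical index: equal-weight average of the two standardized rates."""
    z_poverty = z_scores(poverty_rates)
    z_unemployment = z_scores(unemployment_rates)
    return [(p + u) / 2 for p, u in zip(z_poverty, z_unemployment)]


# Made-up rates (percent) for three hypothetical communities.
poverty = [10.0, 20.0, 30.0]
unemployment = [4.0, 6.0, 8.0]
index = economic_hardship_index(poverty, unemployment)
```

A higher value here means more hardship relative to the other communities in the same dataset. Equal weighting is an illustrative choice; the actual project may call for different indicators or weights.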

Machine Learning

  • Develop predictive models using scikit-learn (Random Forest, Regression)
  • Evaluate models using:
    • R² Score
    • RMSE
  • Perform feature importance analysis to identify key drivers of life expectancy
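
As a rough illustration of the two evaluation metrics named above, the sketch below computes RMSE and R² by hand from their standard definitions. The helper names and the life expectancy figures are made up for the example. In a real project these values would more likely come from scikit-learn's `r2_score` and `mean_squared_error` in `sklearn.metrics`.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error: typical prediction error in the target's units."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up life expectancy values (years) for three hypothetical regions.
actual = [70.0, 75.0, 80.0]
predicted = [71.0, 75.0, 79.0]

error_years = rmse(actual, predicted)
fit_quality = r2(actual, predicted)
```

RMSE is expressed in years, so it reads directly as a typical prediction error, while R² reports the share of variance in life expectancy that the model explains.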

Simulation & Insights

  • Build an interactive Life Expectancy Simulator
  • Enable scenario-based analysis (e.g., impact of poverty reduction, healthcare improvements)
  • Provide recommendations for policy and intervention strategies

Visualization & Reporting

  • Create dashboards using Power BI and Streamlit
  • Develop geospatial visualizations using Folium
  • Highlight disparities across communities and generate insights reports

Required Skills & Experience

  • Strong experience in:
    • Python, SQL
    • Data Engineering & ETL pipelines
    • Big Data tools (Spark, Databricks)
  • Hands-on experience with:
    • Machine Learning (scikit-learn)
    • Feature engineering & model evaluation
  • Experience working with:
    • Public datasets / APIs
    • Data modeling & transformations
  • Good understanding of:
    • Data pipelines (Bronze/Silver/Gold architecture)
    • Statistical analysis and predictive modeling

Nice to Have

  • Experience in public health / healthcare analytics
  • Knowledge of geospatial data analysis
  • Experience building interactive dashboards or simulators
  • Exposure to cloud platforms (AWS / Azure / GCP)

Skills

Apache Spark, AWS, Azure, Databricks, Folium, GCP, Python, Power BI, scikit-learn, SQL, Streamlit
