Freelance Data Engineer / ML Engineer (Public Health Analytics)
ThreatXIntel
India · On-site · Full-time · Posted today
About the role
Role Overview
We are looking for a highly skilled Freelance Data Engineer / Machine Learning Engineer to build an end-to-end data pipeline and predictive analytics system focused on life expectancy modeling using public health and socio-economic data.
The ideal candidate should have strong experience in data engineering, big data processing, and machine learning, with the ability to work on real-world datasets and derive actionable insights.
Key Responsibilities
Data Engineering
- Build scalable data pipelines using Python, SQL, and Apache Spark
- Ingest data from APIs and public datasets (Census, healthcare, etc.)
- Design multi-layer architecture:
  - Bronze (raw data)
  - Silver (cleaned data)
  - Gold (aggregated/feature-ready data)
- Perform data transformation, joins, and aggregation at regional/community level
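The Bronze/Silver/Gold layering above can be illustrated with a minimal, plain-Python sketch. It assumes raw Bronze records arrive as string-valued dicts with hypothetical fields `tract_id`, `community`, `deaths`, and `population`; the cleaning rules and the crude death-rate feature are illustrative choices, not requirements stated in the posting. In the Spark setting the posting describes, the same steps would map roughly onto DataFrame operations such as `dropDuplicates` and `groupBy(...).agg(...)`.

```python
from collections import defaultdict

# Hypothetical Bronze schema: every raw record is a dict of strings.
REQUIRED = ("tract_id", "community", "deaths", "population")


def to_silver(bronze_rows):
    """Bronze -> Silver: trim text, cast counts to int, drop unusable rows,
    and keep only the first valid record per tract_id (re-ingested duplicates)."""
    seen = set()
    silver = []
    for raw in bronze_rows:
        # Treat missing keys and None the same as empty strings.
        vals = {k: "" if raw.get(k) is None else str(raw.get(k)).strip() for k in REQUIRED}
        if not all(vals.values()):
            continue  # incomplete record
        if not (vals["deaths"].isdigit() and vals["population"].isdigit()):
            continue  # counts that cannot be typed as non-negative integers
        if int(vals["population"]) == 0 or vals["tract_id"] in seen:
            continue  # avoid division by zero later; skip duplicate tracts
        seen.add(vals["tract_id"])
        silver.append({
            "tract_id": vals["tract_id"],
            "community": vals["community"],
            "deaths": int(vals["deaths"]),
            "population": int(vals["population"]),
        })
    return silver


def to_gold(silver_rows):
    """Silver -> Gold: aggregate tracts to one feature-ready row per community."""
    totals = defaultdict(lambda: {"deaths": 0, "population": 0})
    for row in silver_rows:
        totals[row["community"]]["deaths"] += row["deaths"]
        totals[row["community"]]["population"] += row["population"]
    return {
        community: {
            **t,
            # Crude (not age-adjusted) death rate per 100,000 residents.
            "death_rate_per_100k": t["deaths"] * 100_000 / t["population"],
        }
        for community, t in totals.items()
    }
```

Keeping each layer as a separate function mirrors the idea that Silver can be rebuilt from Bronze, and Gold from Silver, without re-ingesting the source data.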
Feature Engineering
- Develop key health indicators such as:
  - Mortality rates
  - Poverty & unemployment rates
  - Healthcare provider density
  - Food accessibility metrics
- Build composite indices like:
  - Economic Hardship Index
  - Health Access Index
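The posting names these indicators and indices but not their formulas. One common construction, sketched here under that assumption, expresses counts as rates per 100,000 residents, min-max scales each indicator across communities, and averages them with equal weights onto a 0-100 scale. The indicator names and equal weighting are hypothetical; z-score standardization or weighted averages are equally plausible choices.

```python
def rate_per(count, population, per=100_000):
    """Rate of `count` events per `per` residents (e.g. deaths per 100,000)."""
    return count * per / population


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(table, indicators, invert=()):
    """Equal-weighted mean of min-max scaled indicators, on a 0-100 scale.

    table:  {community: {indicator: value}}
    invert: indicators that point the opposite way to the index (e.g. distance
            to the nearest grocery store in an index where higher means better
            access); they are flipped so every input pulls in the same direction.
    """
    communities = list(table)
    columns = []
    for name in indicators:
        scaled = min_max([table[c][name] for c in communities])
        if name in invert:
            scaled = [1.0 - s for s in scaled]
        columns.append(scaled)
    return {
        c: 100.0 * sum(col[i] for col in columns) / len(columns)
        for i, c in enumerate(communities)
    }


# Usage: a hypothetical hardship index built from two higher-is-worse rates.
table = {
    "A": {"poverty_rate": 10.0, "unemployment_rate": 4.0},
    "B": {"poverty_rate": 30.0, "unemployment_rate": 12.0},
    "C": {"poverty_rate": 20.0, "unemployment_rate": 8.0},
}
print(composite_index(table, ["poverty_rate", "unemployment_rate"]))
# {'A': 0.0, 'B': 100.0, 'C': 50.0}
```

Min-max scaling keeps the index easy to read (0 is the best-off community, 100 the worst-off), at the cost of being sensitive to outliers at either end.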
Machine Learning
- Develop predictive models using scikit-learn (Random Forest, Regression)
- Evaluate models using:
  - R² Score
  - RMSE
- Perform feature importance analysis to identify key drivers of life expectancy
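R², RMSE, and a permutation-based feature importance can each be written in a few lines, which makes the evaluation step concrete. This is a dependency-free sketch; in practice scikit-learn provides `sklearn.metrics.r2_score`, `sklearn.metrics.mean_squared_error`, and `sklearn.inspection.permutation_importance`. The generic `predict` callable and the toy data below are assumptions for illustration.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R² when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Usage with a toy model that depends only on feature 0.
def toy_model(row):
    return 2.0 * row[0]


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 9.0], [6.0, 2.0]]
y = [2.0 * row[0] for row in X]
print(r2_score(y, [toy_model(r) for r in X]))  # 1.0
print(permutation_importance(toy_model, X, y))  # feature 1 scores 0.0
```

Permutation importance is shown here instead of a Random Forest's built-in `feature_importances_`, which are impurity-based and can overstate high-cardinality features; either could feed the "key drivers" analysis.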
Simulation & Insights
- Build an interactive Life Expectancy Simulator
- Enable scenario-based analysis (e.g., impact of poverty reduction, healthcare improvements)
- Provide recommendations for policy and intervention strategies
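At its core, a scenario-based simulator re-runs a trained model on modified inputs and reports the change in predicted life expectancy. The sketch below assumes a generic `predict` function that maps a dict of features to a prediction (a fitted scikit-learn model would need a thin wrapper) and expresses scenarios as relative changes, for example `{"poverty_rate": -0.10}` for a 10% reduction. `simulate_scenario`, the feature names, and the toy coefficients are all hypothetical.

```python
def simulate_scenario(predict, communities, changes):
    """Compare baseline and scenario predictions for each community.

    predict:     callable mapping a feature dict to a predicted value
    communities: {community: {feature: value}}
    changes:     {feature: relative change}, e.g. -0.10 for a 10% reduction
    """
    results = {}
    for name, features in communities.items():
        scenario = dict(features)  # copy so the input data is not mutated
        for feature, rel in changes.items():
            scenario[feature] = features[feature] * (1.0 + rel)
        baseline = predict(features)
        projected = predict(scenario)
        results[name] = {
            "baseline": baseline,
            "scenario": projected,
            "delta": projected - baseline,
        }
    return results


# Usage with a hypothetical linear model (illustrative coefficients only).
def toy_model(f):
    return 80.0 - 0.25 * f["poverty_rate"] + 0.02 * f["providers_per_100k"]


communities = {
    "A": {"poverty_rate": 20.0, "providers_per_100k": 100.0},
    "B": {"poverty_rate": 35.0, "providers_per_100k": 40.0},
}
print(simulate_scenario(toy_model, communities, {"poverty_rate": -0.10}))
```

Two caveats shape how such a simulator is read: tree ensembles such as Random Forests cannot extrapolate beyond the range of their training data, so scenarios that push features outside observed values should be flagged, and the deltas reflect associations the model learned rather than causal effects of an intervention.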
Visualization & Reporting
- Create dashboards using Power BI and Streamlit
- Develop geospatial visualizations using Folium
- Highlight disparities across communities and generate insights reports
Required Skills & Experience
- Strong experience in:
  - Python, SQL
  - Data Engineering & ETL pipelines
  - Big Data tools (Spark, Databricks)
- Hands-on experience with:
  - Machine Learning (scikit-learn)
  - Feature engineering & model evaluation
- Experience working with:
  - Public datasets / APIs
  - Data modeling & transformations
- Good understanding of:
  - Data pipelines (Bronze/Silver/Gold architecture)
  - Statistical analysis and predictive modeling
Nice to Have
- Experience in public health / healthcare analytics
- Knowledge of geospatial data analysis
- Experience building interactive dashboards or simulators
- Exposure to cloud platforms (AWS / Azure / GCP)
Skills
Apache Spark, AWS, Azure, Databricks, Folium, GCP, Python, Power BI, scikit-learn, SQL, Streamlit