Data Engineer Lead
Cactus Communications
About the role
Overview:
CACTUS is a remote-first organization, and we embrace an "accelerate from anywhere" culture. You may be required to travel to our Mumbai office based on business requirements or for company/team events.
We are looking for a Data Engineering Lead to architect and manage the large-scale data foundations that power our analytics and AI systems. In this role, you will design robust data pipelines, implement enterprise-grade architectures, and ensure seamless integration across diverse Digital India platforms while maintaining strict governance and security standards. If you are a technical leader who thrives on building scalable ingestion frameworks and optimizing high-performance data environments, this role offers the opportunity to drive strategic data initiatives at a national scale.
Responsibilities:
- Architect and implement scalable data pipelines to support AI/ML models and analytical workloads.
- Define data lake and data warehouse architectures compliant with company standards.
- Implement data ingestion frameworks for real-time and batch processing of datasets.
- Establish robust metadata management, lineage tracking, and data governance frameworks.
- Optimize data storage, compression, and retrieval strategies for large-scale AI applications.
- Collaborate with AI/ML teams to ensure clean, high-quality, and versioned datasets for model training and inference.
- Integrate APIs and connectors to unify data from multiple digital platforms.
- Maintain compliance with data retention, privacy, and security policies.
Requirements:
- B.Tech / M.Tech / M.S. in Computer Science, Data Engineering, or a related field.
- Professional certifications in cloud data platforms (AWS, Azure, GCP) are desirable.
- 8–12 years of professional experience in data engineering or data platform architecture.
- At least 4–5 years designing and managing large-scale data pipelines for analytics or AI systems.
- 5–7 years designing and implementing enterprise data architectures for analytics and AI/ML use cases.
- Experience working with structured, semi-structured, and unstructured data from diverse sources.
Technical Competencies:
- Cloud Services: AWS Redshift, Glue, S3; Azure Synapse; GCP BigQuery and Dataflow.
- Big Data Technologies: Apache Spark, Hadoop, Kafka, Airflow, Databricks, Snowflake, and dbt for data transformation and orchestration.
- Programming: Python, Scala, Java, SQL.
- Databases: PostgreSQL, MongoDB, Cassandra, Elasticsearch.
- Data Management: ETL/ELT design, schema evolution, DataOps, CI/CD pipelines.
- Infrastructure: Docker, Kubernetes, Terraform, Jenkins for data automation.
- Governance: Data cataloging, access control, encryption, and compliance logging.
About Cactus:
Established in 2002, CACTUS (cactusglobal.com) is a leading technology company that specializes in expert services and AI-driven products that improve how research gets funded, published, communicated, and discovered. Its flagship brand Editage offers a comprehensive suite of researcher solutions, including expert services and cutting-edge AI products like Mind the Graph, Paperpal, and R Discovery. With offices in Princeton, London, Singapore, Beijing, Shanghai, Seoul, Tokyo, and Mumbai, and a global workforce of over 3,000 experts, CACTUS is a pioneer in workplace best practices and has been consistently recognized as a great place to work.