Data Engineer/Scientist
Lightium
About Lightium
Lightium is building next-generation photonic integrated circuits on thin-film lithium niobate (TFLN). As we scale, we are looking for a data engineer/scientist to help us analyze and visualize our technical data, and to help scale the infrastructure that turns that data into insight.
Position Summary
This is a hands-on, build-things role. You will work closely with the Head of Data and our technical characterization team to design and implement data pipelines, transformations, and analytical models that connect our lab, manufacturing, and enterprise systems into a coherent data platform. We want someone who is comfortable writing production Python, can think clearly about data modelling, and is excited to apply machine learning and AI tooling to real scientific and operational problems. If you are a recent graduate or an industry expert who is hungry to build at scale and learn fast, this is the role for you.
Responsibilities
Data Engineering & Pipeline Development
- Design, build, and maintain scalable data pipelines that ingest data from lab instruments, PLM (Aras Innovator), ERP (Oracle NetSuite), MES, and other operational systems.
- Develop and manage ELT/ETL transformations using Python and DBT, applying software engineering best practices: version control, testing, modularity, and documentation.
- Work with Apache Iceberg and cloud object storage (AWS S3 or GCP GCS) to build and manage a scalable data lake that supports both batch and incremental processing patterns.
- Build and operate distributed data processing workflows using Apache Spark for large-scale transformation, aggregation, and feature engineering tasks.
- Implement data quality checks, schema validation, and pipeline monitoring to ensure that data flowing through the platform is reliable, traceable, and fit for purpose.
- Manage and evolve the data warehouse layer (table design, partitioning strategies, naming conventions, and access controls) to support growing analytical workloads.
Data Modelling & Transformation
- Translate raw data from diverse sources (instrument outputs, optical or SEM images, process logs, ERP exports, metrology files) into clean, well-structured analytical datasets.
- Define and maintain DBT models that implement business logic, process metrics, and cross-system joins in a version-controlled, testable way.
- Work with domain experts in integrated photonics and RF across characterization, process engineering, and operations to understand data semantics and ensure models accurately reflect physical reality.
- Document data lineage, transformation logic, and model definitions so that downstream users can trust and understand what they are working with.
Dashboarding, Reporting & Manufacturing Visibility
- Design and build dashboards that give process and design engineers, fab operations staff, and leadership real-time visibility into wafer yield, process control metrics, layer-by-layer performance, and device characterization trends.
- Develop scheduled and on-demand reports that surface actionable manufacturing insights (yield excursions, parametric drift, lot genealogy, and cross-run comparisons) without manual data wrangling.
- Build self-service data access tools and well-documented datasets so that engineers can answer their own questions quickly.
- Work with characterization and process teams to define the KPIs, control charts, real-time data visualizations, and statistical process control (SPC) views that matter most for day-to-day fab decision-making.
- Maintain and continuously improve the reporting layer as new process steps, measurement types, and device generations are introduced.
Manufacturing Analytics, Yield & Process Optimization
- Build statistical models and machine learning pipelines focused on wafer-level yield analysis.
- Develop process optimization models that correlate upstream process parameters with downstream device performance, supporting design-of-experiment (DOE) analysis and root cause investigation.
- Build anomaly detection systems that flag out-of-control process conditions early, before they propagate into yield loss or device failures downstream.
- Build, evaluate, and iterate on models in Python using libraries such as scikit-learn or PyTorch.
- Stay current with advances in scientific ML and semiconductor process analytics; apply emerging methods where they offer genuine improvement over existing approaches.
Collaboration & Documentation
- Work closely with the Head of Data, characterization engineers, and process teams to understand data needs and translate them into well-scoped engineering work.
- Maintain clear documentation of pipeline logic, model definitions, dataset schemas, and known data quality issues.
- Communicate progress, blockers, and findings clearly in writing; contribute to a culture of transparency and knowledge sharing across the data and engineering teams.
What You’ll Bring
Experience and Skills
- A recent graduate or industry expert with a degree in computer science, data science, statistics, physics, engineering, or a related quantitative field.
- Background in a physical science such as physics, photonics, engineering, or chemistry.
- Strong Python skills: you write clean, maintainable code and are comfortable working in a shared codebase with version control.
- Solid SQL fundamentals: you can write complex queries, reason about query performance, and design a sensible schema.
- Familiarity with cloud data platforms (AWS or GCP preferred) including object storage, managed compute, and cloud-native data services.
- Experience with data warehousing, Apache Iceberg, and cloud object storage (AWS S3 or GCP GCS).
- Some hands-on experience with data pipeline tooling (DBT, Dagster, or similar).
- A working understanding of machine learning fundamentals: you have trained models, evaluated them properly, and thought carefully about overfitting, leakage, and generalization.
- Curiosity about LLMs and AI agents.
- A rigorous, detail-oriented mindset: you take data quality seriously.
Startup Mentality
- A driven, results-oriented mindset with a passion for innovation and problem-solving. You are proactive, take ownership of your successes and failures, and thrive on delivering tangible outcomes in a dynamic environment.
Communication
- Excellent written and verbal communication skills. Fluency in English is required.
What We Offer
- Competitive compensation and benefits package, including an employee stock option plan (ESOP), fully covered Pillar Two contributions, and generous vacation.
- A unique opportunity to lead a cutting-edge manufacturing initiative in the rapidly growing field of photonics.
- A collaborative and innovative work environment where your contributions will shape the future of telecommunications, datacom, and beyond.
- Become a key player in an early-phase startup and join a young, motivated, and energetic team. You will have the unique opportunity to create, grow, and develop your professional skills at a fast pace.
Final Thoughts
We'd love to hear from you if you're passionate about building new tech from the ground up and thrive in a collaborative and result-driven environment.
Lightium is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.