
Senior Big Data Engineer - NXCALS Pipelines & Spark

Confidential

Geneva · On-site

About the role

At CERN, the European Organisation for Nuclear Research, physicists and engineers are probing the fundamental structure of the universe. Using the world's largest and most complex scientific instruments, they study the basic constituents of matter - fundamental particles that are made to collide together at close to the speed of light. The process gives physicists clues about how particles interact, and provides insights into the fundamental laws of nature.

Job Description

Introduction

You will play a major role in the evolution of non-relational data stores and big data platforms, based on technologies such as Hadoop and Spark. You will apply your software engineering expertise to large and long-lived data platforms, high-throughput ingestion pipelines, performance-critical access patterns, and demanding reliability requirements. Your work will directly support the operation, monitoring, and analysis of particle accelerator systems through the management of multi-petabyte datasets accumulated over many years.

Functions

  • Drive the evolution of the CERN Accelerator Archival system (NXCALS).
  • Design and develop the core components of the system, including ingestion pipelines (ETL), metadata services, data compaction mechanisms, data extraction algorithms, and APIs.
  • Collaborate with different user communities to define and promote best practices for using NXCALS in the development of control applications for the CERN Control Centre.
  • Work closely with the CERN IT department to select and validate the evolution of the underlying storage technologies (e.g. HDFS, ClickHouse).
  • Contribute to the operation, maintenance, and user support of the system.
  • Keep watch on relevant big-data technologies and assess their applicability to NXCALS.
  • Mentor and technically support junior software engineers contributing to these activities.
  • Contribute to the development of other Controls data engineering platforms according to overall priorities.
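The data compaction mentioned above can be illustrated with a minimal, hypothetical sketch (plain Python, not the NXCALS API): many small time-stamped record batches are merged into fixed time buckets, the pattern Hadoop-style archives commonly use to avoid accumulating millions of small files. All names here are illustrative assumptions.

```python
from collections import defaultdict

def compact(records, bucket_seconds=3600):
    """Group (timestamp, payload) records into fixed time buckets.

    Illustrative only: archival systems like NXCALS compact small
    ingestion files into larger time-partitioned files; this mimics
    that grouping step in memory.
    """
    buckets = defaultdict(list)
    for ts, payload in records:
        # Floor the timestamp to the start of its bucket.
        bucket = int(ts // bucket_seconds) * bucket_seconds
        buckets[bucket].append((ts, payload))
    # Sort each bucket by timestamp so downstream extraction can
    # scan or binary-search within a partition.
    return {b: sorted(v) for b, v in buckets.items()}

records = [(3601.0, "a"), (10.0, "b"), (3700.0, "c")]
out = compact(records)
# t=3601 and t=3700 land in the 3600 s bucket; t=10 in bucket 0.
```

In a real pipeline this grouping would run as a distributed Spark job over files on HDFS rather than over an in-memory list, but the partitioning logic is the same idea.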

Qualifications

  • Master's degree, or equivalent relevant experience, in Computer Science or a related field.

Experience

  • Extensive experience in Java development using the Spring ecosystem.
  • Solid knowledge of big-data technologies, including Hadoop, HDFS or Apache Ozone, HBase or ClickHouse, Apache Spark, and Kafka.
  • Working knowledge of Python, including SDK development and PySpark.
  • Proven experience with modern software engineering practices, including quality assurance, CI/CD, and DevOps methodologies.
  • Experience in the design, deployment, and operation of complex, high-availability systems.

Technical competencies

  • Knowledge of programming techniques and languages (Java and Python).
  • Architecture and design of ICT systems.
  • Identification and selection of relevant emerging ICT technologies.
  • Conceptualising, designing and developing user experiences and interfaces.
  • Design of databases/repositories.

Behavioural competencies

  • Solving Problems: adopting a pragmatic approach; understanding the value of adopting generic rather than 'gold‑plated' technical solutions.
  • Working in Teams: contributing to promoting a positive atmosphere in the team through an optimistic and constructive attitude; addressing issues.
  • Demonstrating Flexibility: adapting quickly and resourcefully to shifting priorities and requirements.
  • Achieving Results: defining clear objectives, milestones and deliverables before initiating work/project.
  • Demonstrating Accountability: working conscientiously and reliably; delivering on promises.

Language skills

  • Spoken and written English, with a commitment to learn French.

Additional Information

Eligibility and closing date

Diversity has been an integral part of CERN's mission since its foundation and is an established value of the Organisation. Employing a diverse workforce is central to our success. We welcome applications from all Member States and Associate Member States.

This vacancy will be filled as soon as possible, and applications should normally reach us no later than the closing date.

Skills

Apache Spark, CI/CD, ClickHouse, DevOps, Hadoop, HBase, HDFS, Java, Kafka, NXCALS, Python, PySpark, Spring
