1st EuroHPC malleability hackathon

Research on dynamic resource utilization for "traditional HPC workloads" is a common topic across several EuroHPC research projects (ADMIRE, DEEP-SEA, REGALE, TIME-X).

This hackathon will bring EuroHPC developers (and special invited guests) together for a joint effort around this topic.

General information

  • Where? Université Grenoble Alpes (a.k.a. the green capital of the Alps)

  • Registration deadline: December 20th
    Extended deadline: January 6th 2023

  • When? 23rd - 27th January 2023

  • Venue:

    • 700 Av. Centrale, 38400 Saint-Martin-d'Hères, France

    • Room location:

      • 23rd January: Room 106

      • 24th January: Room 106

      • 25th January: Room 106

      • 26th January: Room 406 (Attention! Different room!)

      • 27th January: Room 106

  • Ice-breaker:

    • Tuesday 24th January

    • Place: l'Épicurian, 1 Pl. aux Herbes, 38000 Grenoble

    • Time: 20:00

Registration

Deadline is December 20th (extended deadline: January 6th 2023).

We would like to make this a strongly collaborative hacking event. Participation is therefore only possible if you also contribute by hacking together with others. To make the best use of the limited time, we would like to gather hacking modules (see below) and their collaborators in advance.

There are two ways to participate:

  • Joining an existing module: Please describe in your email which module you would like to contribute to and what this contribution would look like (what you can do for others and what others can do for you).

  • Suggesting a new module: If you would like to work on a new module, please provide a brief description of it.

For participation, please use the registration website.

Main hacking modules

These modules will be updated on a rolling basis.

  • Module #PMIx-SLURM: PMIx <=> SLURM dynamic resources with high priority queue

    • Brief description: Supporting dynamic resource utilization with PMIx using the SLURM high priority queue.

    • Hackers: Sergio Iserte (SLURM), Dominik Huber (PMIx), Isaías Comprés (SLURM), Martin Schreiber (Numerics)

  • Module #DMRLib-MPISessions: DMRLib <=> MPI Sessions

    • Brief description: Make DMRLib also use MPI Sessions

    • Hackers: Sergio Iserte (DMRLib developer), Dominik Huber (PMIx), Martin Schreiber (Numerics)

  • Module #AppSupportLib

    • Brief description:

      • Develop a first prototype of a software support layer/library for applications

      • Supporting handling of dynamic resources with MPI Sessions, SLURM, Flux, etc.

    • Hackers: Sergio Iserte (DMRLib developer), Martin Schreiber (Numerics)

  • Module #JobAllocationGrammar [DONE]

    • Brief description:

    • Hackers: Jean-Baptiste Besnard (ParaTools SAS), Martin Schreiber (Numerics), Isaías Comprés (Slurm expert)

  • Module #Monitoring

    • Brief description:

      • Job tracking (resource blaming): store the job-to-resource correspondence in traces/logs

      • Job clustering: how to match multiple instances of jobs, including at various scales (going beyond the argv array)

      • Look at how to leverage the rich MPI tools interface (MPI-T) to expose portability and collect some application info

    • Hackers: Jean-Baptiste Besnard (ParaTools SAS), Isaías Comprés (Slurm expert), Pierre-François Dutot (REGALE)

  • Module #DMRlib-reconf: Extend DMRlib with novel reconfiguration techniques

    • Brief description:

      • Implement new spawning methods in DMRlib

      • Adopt new reconfiguration policies for DMRlib

    • Hackers: Sergio Iserte (DMRLib developer), Iker Martín (MPI Developer), Martin Schreiber (Numerics)

  • Module #Collocation: Collocation of HPC and Big Data workloads

    • Brief description:

      • Implement or improve Bebida collocation tool support for OAR and Slurm

      • Add support for execution of Big Data workloads with deadline guarantees

    • Hackers: Michael Mercier (Bebida developer), Adrien Faure (OAR developer), Olivier Richard (OAR developer), Pierre-François Dutot (REGALE)

  • Module #BatSim/Sched: Simulation of jobs with dynamic resources

    • Brief description:

      • Explore ways to use BatSim for running jobs with dynamic resources

      • Explore ways to incorporate experimental schedulers

    • Hackers: Adrien Faure (REGALE), Martin Schreiber (TIME-X)

  • Module #[Please provide a module identifier during the registration]

    • Brief description: [Please provide a description of your module during the registration]

    • Hackers: [Please provide a potential collaborator for this topic]

Schedule

Preliminary; will be updated on a rolling basis.

23rd January 2023 (Monday):

  • 9-9:30: Welcome

  • 9:30 - 12: Short presentations

    • Presentation by Dominik Huber

    • Presentation by Isaías Comprés

    • Presentation by Jean-Baptiste Besnard

  • 13-17: Hacking sessions

24th January 2023 (Tuesday):

25th January 2023 (Wednesday):

  • 9-12: Hacking sessions & Talks

    • Presentation by Michael Mercier

  • 13-17: Hacking sessions & Talks

    • Presentation by Adrien Faure

26th January 2023 (Thursday):

  • 9-12: Hacking sessions

  • 13-17: Hacking sessions & Talk

    • Tutorial by Adrien Faure about Nix

27th January 2023 (Friday):

  • 9-12: Final presentations and discussions


We plan to have mini-talks given by the participants on the first day as well as mini-presentations to summarize the hacking efforts for each module at the end of this meeting.

Registered people

  • Jean-Baptiste Besnard (ParaTools SAS, ADMIRE)

  • Alberto Cascajo (Universidad Carlos III de Madrid, ADMIRE)

  • Isaías Comprés (Technical University of Munich, DEEP-SEA)

  • Pierre-François Dutot (LIG, REGALE) [missing hacking project]

  • Adrien Faure (LIG/UGA, REGALE)

  • Dominik Huber (Technical University of Munich, TIME-X)

  • Sergio Iserte (BSC, DEEP-SEA)

  • Iker Martín (BSC, DEEP-SEA)

  • Michael Mercier (RYAX Technologies, REGALE)

  • Daniel Milroy (LLNL, external expert) [Keynote speaker, remote talk on Flux, participation via Zoom in selected hacking groups]

  • Olivier Richard (LIG/UGA, REGALE)

  • Martin Schreiber (Université Grenoble Alpes / Technical Univ. Munich, TIME-X)

Organizers

  • Martin Schreiber: martin.schreiber AT univ-grenoble-alpes.fr

  • Pierre-François Dutot: pierre-francois.dutot AT univ-grenoble-alpes.fr

  • Annie Simon