How did you initially get interested in science?
I’m one of those people who’s wanted to be a scientist since I was about five. I think it might have been after seeing a Space Shuttle launch on TV.
What is your favorite place at the Lab?
The 50C patio is close to where I work, and you get an amazing view of the bay, as you do from many places at the Lab.
Most memorable moment at the Lab?
The SLAM, of course 😄
What are your hobbies or interests outside the Lab?
I find the less well-known programming languages really interesting. I’ve been looking at Raku recently, and it has a lot of really nice features that you don’t find elsewhere.
High Energy Physics has brought the world many technologies, from PET scans to the World Wide Web. Today the big collider experiments, like the ATLAS experiment at CERN, are at the forefront of Big Data. ATLAS, for example, collects tens of petabytes of data every year. We’re now working toward our High Luminosity upgrade, and when it’s completed we expect to collect an order of magnitude more data.
Until now we’ve relied on a computing model called High Throughput Computing, in which we use many thousands of individual machines, not unlike your desktop, to run all of our computing on CPUs. With the coming avalanche of data we can no longer afford to do this. We simply must make use of High Performance Computers like Perlmutter, here at LBL.
But there’s a big difference between High Throughput Computing and High Performance Computing. Where High Throughput Computing runs many independent tasks on individual machines, High Performance Computers are designed to run gigantic simulations on thousands of machines all working together, using both CPUs and GPUs.
I have been adapting ATLAS’s software so that it runs well on High Performance Computers.
First, I’m using an industry-standard library to distribute events (our basic unit of work) across the machines. This ensures no machine sits idle waiting for the laggards to catch up: the fastest machines are given the most work. We can now do this with an overhead of less than a thousandth of a percent.
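The library isn’t named here, but this kind of pull-based scheduling is commonly built on MPI. Purely as an illustration, here is a minimal manager/worker sketch using mpi4py, with made-up event counts and a placeholder where the real event processing would go; the actual ATLAS machinery is more involved.

```python
# Hypothetical sketch of dynamic event distribution with MPI (mpi4py).
# Rank 0 acts as a "manager" that hands out event ranges on demand, so a
# fast worker simply comes back for more and nothing waits on the slowest node.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

N_EVENTS = 100_000          # total events to process (placeholder)
CHUNK = 100                 # events handed out per request (placeholder)
WORK_TAG, STOP_TAG = 1, 2

if rank == 0:
    next_event = 0
    active = comm.Get_size() - 1
    status = MPI.Status()
    while active > 0:
        # A worker announces it is ready by sending its rank.
        comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        worker = status.Get_source()
        if next_event < N_EVENTS:
            chunk = (next_event, min(next_event + CHUNK, N_EVENTS))
            comm.send(chunk, dest=worker, tag=WORK_TAG)
            next_event = chunk[1]
        else:
            comm.send(None, dest=worker, tag=STOP_TAG)
            active -= 1
else:
    status = MPI.Status()
    while True:
        comm.send(rank, dest=0)                      # ask for work
        chunk = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == STOP_TAG:
            break
        first, last = chunk
        for event in range(first, last):
            pass                                     # process the event here
```

Because workers request more events as soon as they finish, the fastest machines naturally end up processing the most events, which is why no one ends up waiting on the laggards.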
I’ve also developed a method that allows us to use all of our CPUs and GPUs simultaneously. Until now we’ve had to leave some CPUs idle while we used the GPU, because the next step to run on the CPU depends on what’s currently being calculated on the GPU. My method lets us start additional events to fill in those idle CPUs.
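As a purely illustrative sketch (stand-in functions, not ATLAS code): launch the GPU-dependent stage of each event asynchronously, and spend the wait doing the CPU-only stages of additional events instead of blocking.

```python
# Illustrative only: overlap a (stand-in) GPU stage with CPU-only work on
# other events, so CPU cores don't sit idle while a GPU result is pending.
from concurrent.futures import ThreadPoolExecutor
import time

def gpu_stage(event):
    time.sleep(0.05)              # stand-in for an asynchronous GPU kernel
    return event * 2

def cpu_prepare(event):
    return event + 1              # stand-in for CPU work with no GPU dependency

def cpu_finalize(prepared_event, gpu_result):
    return (prepared_event, gpu_result)   # CPU step that needs the GPU result

events = range(8)
with ThreadPoolExecutor() as pool:
    # Launch the GPU stages asynchronously ...
    gpu_futures = {e: pool.submit(gpu_stage, e) for e in events}
    # ... and fill the otherwise-idle CPUs with extra events' CPU-only work.
    prepared = {e: cpu_prepare(e) for e in events}
    # Finish each event only once its GPU dependency has resolved.
    results = [cpu_finalize(prepared[e], gpu_futures[e].result()) for e in events]

print(results)
```

The point of the sketch is the ordering: the CPU-only work of later events runs while earlier events' GPU results are still in flight, rather than afterwards.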
Ultimately, my work will allow ATLAS to make efficient use of our HPC allocations, so we can cope with the mountain of data we expect from the High Luminosity upgrade. In the process, I’ve developed a model for using HPCs that can also be applied to many other fields with similar Big Data needs.