To create training sets for the NNNs, you can use your favorite nucleosynthesis or stellar evolution code. We used bbq, which is a wrapper that enables direct use of the nucleosynthesis solver of the stellar evolution code MESA.
So how do I produce the training sets?
To create your own training sets, you will need to run nucleosynthesis computations on a large scale.
Create a grid of your parameter space of interest
The first step is to decide on your parameter space regime. Think about the temperatures, densities and compositions (or electron fractions) typical of your physical scenario of interest, and how you want to sample the parameter space. Have you figured this out? Great! Now you can generate a grid representing this parameter space.
In the zip folder you downloaded from Zenodo, go to
$NuclearNeuralNetworks/python_scripts_for_analysis/GenerateTrainingSets/
and open the Python script called GridGenerator.py. It generates a grid of log temperatures, log densities and electron fractions (in this order), sampled quasi-randomly between the supplied minimum and maximum values using the Sobol algorithm. Each row in the grid corresponds to a different combination of these parameters. You can change the boundaries of the parameter space and control the size of the grid by changing pointsNum. Creating the grid doesn't require a lot of computational power, so you can easily do it on your own laptop (a short sketch of this kind of sampling is given below).
You can find the grid we created to generate our training sets in
$NuclearNeuralNetworks/training_sets/logT_9.2_to_9.9_logRho_7_to_9_Ye_0.45_to_0.5.txt
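To give a feel for what this step produces, here is a minimal sketch of Sobol sampling over (logT, logRho, Ye) using scipy's quasi-Monte Carlo module. The boundaries below match the grid above, but the actual GridGenerator.py may differ in details such as scrambling, seeding, or output format.

```python
import numpy as np
from scipy.stats import qmc

# Minimal sketch of quasi-random Sobol sampling over (logT, logRho, Ye).
pointsNum = 2**20                     # grid size, i.e. how many bbq runs you will launch
lower = [9.2, 7.0, 0.45]              # minimum logT, logRho, Ye
upper = [9.9, 9.0, 0.50]              # maximum logT, logRho, Ye

sampler = qmc.Sobol(d=3, scramble=True, seed=0)
grid = qmc.scale(sampler.random(pointsNum), lower, upper)

# One row per (logT, logRho, Ye) combination, in that column order.
np.savetxt("logT_9.2_to_9.9_logRho_7_to_9_Ye_0.45_to_0.5.txt", grid)
```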
Run bbq based on this grid
Now let's assume you have a grid ready, and that you already installed bbq according to the instructions here. To run bbq, you will first need to create inlist files that contain all the information bbq needs to perform a nucleosynthesis calculation, like the temperature and density of a burning zone, the initial composition, the nuclear reaction network you want to use, the number of timesteps you want to perform, etc. Since training the NNNs requires a very large number of training sets (around a million bbq runs in our case!), generating the inlist files manually is not feasible. We wrote a Python script that prepares the inlists based on our grid of parameters. It also prepares the initial compositions needed for the bbq run by assigning a random composition to each electron fraction in the grid according to the chosen nuclear reaction network (a schematic example of this step is shown below). The script is called bbqCreateDatabaseFromGivenComp.py, and you can find it in
$NuclearNeuralNetworks/python_scripts_for_analysis/GenerateTrainingSets/
To run it, you will need the previously generated grid of parameters, and a file that lists the isotopes contained in your chosen nuclear reaction networks, found in
$NuclearNeuralNetworks/python_scripts_for_analysis/GenerateTrainingSets/isoLists/
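As an illustration of what assigning a random composition to a target electron fraction involves, here is a small sketch. It is not the logic of bbqCreateDatabaseFromGivenComp.py itself, and the six-isotope network below is a hypothetical stand-in for a real isoLists file; it only shows one way to draw mass fractions that sum to 1 and reproduce a given Ye.

```python
import numpy as np

# Hypothetical mini-network: name -> (Z, A). A real run would read the full
# isotope list from the isoLists/ file of the chosen reaction network.
ISOTOPES = {
    "he4":  (2, 4),
    "si28": (14, 28),
    "cr56": (24, 56),
    "fe54": (26, 54),
    "fe56": (26, 56),
    "ni56": (28, 56),
}

def random_composition(ye_target, rng):
    """Draw random mass fractions that sum to 1 and give exactly ye_target."""
    names = list(ISOTOPES)
    f = np.array([z / a for z, a in ISOTOPES.values()])  # Z/A of each isotope
    x = rng.dirichlet(np.ones(len(names)))                # random mass fractions, sum to 1

    # Rescale the proton-rich (f >= ye_target) and neutron-rich (f < ye_target)
    # groups so that sum(x) = 1 and sum(f * x) = ye_target.
    hi = f >= ye_target
    lo = ~hi
    A = np.array([[x[hi].sum(),       x[lo].sum()],
                  [(f * x)[hi].sum(), (f * x)[lo].sum()]])
    a, b = np.linalg.solve(A, [1.0, ye_target])
    x[hi] *= a
    x[lo] *= b
    return dict(zip(names, x))

rng = np.random.default_rng(0)
comp = random_composition(0.48, rng)   # e.g. one Ye value taken from the grid
```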
While bbq runs are not computationally expensive (each run takes between 2 and 5 minutes) and you can easily perform them on your laptop, generating the amount of bbq outputs needed to train the NNNs in a reasonable amount of time requires the use of a cluster. Luckily, we only need one cpu per bbq run, so we can send different runs to different cpus and computational nodes. We execute the bbq runs with GNU Parallel using the script submit_GNU_jobs.slurm found in
$NuclearNeuralNetworks/python_scripts_for_analysis/GenerateTrainingSets/
where we can control the number of parallel jobs we send through #SBATCH --array and the number of cpus per job through #SBATCH --ntasks. We determine how many times to repeat the task of executing a single bbq run in each job through TASK_ID. If we set #SBATCH --array=1-256 and TASK_ID=4100, for instance, we will execute 256 × 4100 = 1,049,600 bbq runs to serve as training sets for the NNNs.
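Conceptually, each array job needs to know which rows of the parameter grid it is responsible for. The sketch below shows one hypothetical way to do that bookkeeping; submit_GNU_jobs.slurm may split the work differently.

```python
import os

# Hypothetical mapping of SLURM array jobs to grid rows.
TASK_ID = 4100                                                # bbq runs executed per array job
array_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))    # 1..256 when --array=1-256

first_row = (array_id - 1) * TASK_ID            # first grid row handled by this job
for row in range(first_row, first_row + TASK_ID):
    pass  # build the inlist for this grid row and launch a single bbq run on one cpu
```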
The output of each bbq run we execute is a text file named in the format output_T_9.201_rho_7.666_ye_0.4875.txt, where what comes after T_ and rho_ are the logarithms of the temperature and density, respectively, and what comes after ye_ is the electron fraction. Each output file corresponds to a different combination of these parameters taken from the previously generated grid file. It contains the age, timesteps, nuclear energy generation term, neutrino loss term and abundance of each isotope in the nuclear reaction network for each of the recorded ages of the stellar zone.
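Because the run parameters are encoded in the filename, they can be recovered directly from it when post-processing the outputs, for example:

```python
import re

# Recover (logT, logRho, Ye) from a bbq output filename such as
# output_T_9.201_rho_7.666_ye_0.4875.txt
pattern = re.compile(r"output_T_(?P<logT>[\d.]+)_rho_(?P<logRho>[\d.]+)_ye_(?P<ye>[\d.]+)\.txt")

match = pattern.match("output_T_9.201_rho_7.666_ye_0.4875.txt")
logT, logRho, ye = (float(match.group(k)) for k in ("logT", "logRho", "ye"))
print(logT, logRho, ye)   # 9.201 7.666 0.4875
```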
Since the entire raw data from our bbq outputs amounts to 4TB (it includes nucleosynthesis calculations for the parameter space we explore in Grichener et al. 2025, with many more timesteps and a wider range of densities), we don't share it on Zenodo. But if you have the storage space, feel free to contact me and I will happily send you everything I have.
If you want to use a tool other than bbq to generate your training sets, you will need to modify our script to include the relevant parameters your code requires, or write your own script that receives the parameter grid as input, generates your desired initial compositions and executes the runs.
Assemble the relevant data from the bbq outputs into a csv file
So after running submit_GNU_jobs.slurm (or a script of your own making), you are left with around a million output files. Since the timestep is not currently an input of the NNNs, you can only train them on one timestep at a time, so you only need one line from each bbq output file. You could in principle look for this line in each of the files while training, but that would be time-consuming and messy. You might also want to perform the training on your own workstation (if the cluster you are using doesn't have a GPU, for instance), and transferring 4TB of data is a real headache. An easy way to overcome this obstacle is to gather all the information you need in a much smaller csv file. We do this using the script buildDatabaseFromBBQoutputForComps_plus_Eps.py found in
$NuclearNeuralNetworks/python_scripts_for_analysis/GenerateTrainingSets/
This script goes over the bbq outputs and gathers all the data for a given age (training timestep) by copying the row at index ageIndex from all the bbq output files into a csv file. You can run the script on a cluster using the file submit_slurm_job_single_task.sh found in the same directory.
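As a rough sketch of what this gathering step amounts to: the file name pattern comes from the bbq outputs described above, but the column layout is assumed here to be whitespace-separated with a header line, and the real script knows the actual format.

```python
import glob
import pandas as pd

# Take the row at index ageIndex from every bbq output file and stack the rows
# into one csv. Column layout of the bbq output is assumed, not guaranteed.
ageIndex = 10
rows = []
for path in glob.glob("output_T_*_rho_*_ye_*.txt"):
    table = pd.read_csv(path, sep=r"\s+")   # one bbq output file
    rows.append(table.iloc[ageIndex])       # the single training timestep we keep

pd.DataFrame(rows).to_csv("training_set_ageIndex_10.csv", index=False)
```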
Then you just download this file to your laptop or workstation, and you don't need the cluster anymore.
Normalize the energy terms
We divide the energy terms in the previously created csv files so that their values are closer in magnitude to the isotopic abundances and the training process is more efficient. We use the script NormalizeEps.py found in
$NuclearNeuralNetworks/python_scripts_for_analysis/GenerateTrainingSets/
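Schematically, the normalization is just a rescaling of the energy columns. In the sketch below the column names and the scale factor are placeholders, not the values NormalizeEps.py actually uses.

```python
import pandas as pd

# Rescale the energy terms so they are of the same order as the abundances.
SCALE = 1.0e18   # placeholder normalization factor (assumption, not the actual value)

df = pd.read_csv("training_set_ageIndex_10.csv")
for col in ("eps_nuc", "eps_neu"):   # assumed names for the energy-term columns
    df[col] = df[col] / SCALE
df.to_csv("training_set_ageIndex_10_normalized.csv", index=False)
```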
If you are interested in the same parameter space and timesteps as us, you can find our database files (where the energy terms have already been normalized) in
$NuclearNeuralNetworks/training_sets/mesa_${net}/
where net=mesa_80 or net=mesa_151.