This week we ranked our research group preferences. Mine are:
LZ Dark Matter Experiment - I think this sounds like a cool data analysis project. It's going to use Python[1] and ROOT. ROOT is a data analysis framework built at CERN specifically for physics. I'm a big fan of Python and learned a lot about it in PHYS165, so I'm excited to do some real research with it.
IceCube - Neutrinos are really cool and I'm pretty sure most IceCube members use Python. I know a little about it because one of my professors from last semester works on IceCube.
CMS LHC Group - This is another particle physics option that sounds interesting. The research project will be on "searching for new physics beyond the standard model in the known W and Z boson physics."
Computing - This isn't necessarily research in physics, but learning about the computing grid and other related computer science topics seems like a great learning experience.
I was assigned to the LZ Dark Matter group. We did some research into what we think dark matter is, and we also learned some details about the LZ Dark Matter Experiment itself. All we could do before meeting Professor Bhatti was get an overview of the experiment and the motivation behind it (to detect dark matter).
Class was cancelled on 2/18 due to snow.
We first got to meet with Professor Bhatti on 2/24, when he gave us a presentation on dark matter and one source of indirect evidence for it: the study of galactic rotation curves.
Professor Bhatti also emailed us instructions on how to set everything up on the server and run some code he gave us. We had some trouble, so we sent him an email. Here's what to do to get it all set up and working:
1. Make a directory in your home directory using the "mkdir name" command.
2. Copy the prof's files into the directory you just made using the "cp source_path/* destination_path" command (the asterisk is a wildcard that grabs every file in the source directory).
3. Type "bash" to enter the bash shell.
4. Change into the directory with the copied files (use "cd directory").
5. Type "source setup.sh".
6. Now that the ROOT environment is set up, there were two ways for us to run the code the prof gave us: either enter ROOT by typing "root" and then run the commands from the .C file line by line, or enter "root the_dot_C_file".
7. The code analyzes a simulated data set, generates some simple plots, and puts them in a .root file.
8. To view these plots, enter "TBrowser T" in the ROOT interface to open a browser. You can then open the .root file in this browser and select any plot you want to see.
With that, the setup was complete and we had some simple plots of the simulated data. The problem we faced was that we had no idea how the code worked, as the only comments in it, besides commented-out code, were "LOOP" and "Monte Carlo." I was personally also struggling here because I had never used ROOT before, and the last time I touched anything C-like was a year ago. This was obviously a problem, so in our next meeting we asked Professor Bhatti to walk us through the code since we were having trouble deciphering it. After he walked us through it, the code made much more sense, and we started fiddling with it to make our own plots.
There was a period when we were waiting for feedback and trying to fix our setup problems, so I decided to set up a virtual machine (VM) to practice my terminal skills and get familiar with ROOT. I honestly had a lot of trouble with this, so here are some links that helped:
https://root.cern.ch/
https://root.cern/get_started/
https://root.cern/get_started/courses/
https://root.cern/tutorials/
https://root.cern/manual/first_steps_with_root/
https://root.cern/install/
https://root.cern/install/dependencies/
https://root.cern.ch/releases/release-62206/
https://iscinumpy.gitlab.io/post/root-conda/
The software I used for the VM was Oracle's VirtualBox. I installed one of the newer versions of Ubuntu, which was probably a mistake because it's unclear whether ROOT is stable on the newer Ubuntu releases (though I haven't had any issues). It's pretty easy to find a YouTube tutorial on setting up a VM. One thing to note: give your VM more storage than the default amount (>5 GB), or else you will max out the storage midway through installing ROOT and have to start the whole process over. I know this from experience, sadly.
Something you might want to do is set up a shared folder between your PC and your VM. This makes it possible to quickly transfer files to and from the VM. There are also YouTube videos on how to do this.
Another problem I faced was that the display on my VM was TINY. Resizing the window did nothing, but changing the display settings inside the VM to a higher resolution fixed it.
Here's a screenshot of what my VM looks like with ROOT and TBrowser open:
[1] We didn't actually use any Python. Everything we did was in C++.
Overall, we were off to a slow start during the first handful of weeks for a variety of reasons: we had trouble getting everything set up, and we only first met with Professor Bhatti a few weeks into the semester. However, after we sorted everything out, we made a lot of progress on making simple plots of the simulated data we were given. Professor Bhatti also gave us different data sets to plot and compare. The data set we had been using so far was the WIMP set, which contained simulated WIMP events; we were also given sets of simulated background events and simulated H3 (tritium) events. To switch between data sets, we changed a couple of lines of the code where paths point to the data's location on the cluster.
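For example, the relevant lines looked something like this (these paths are made up; the real ones point at the cluster's storage):

    // the data set is selected by a hard-coded input path, so switching
    // sets just means swapping which line is active (paths are placeholders)
    TFile *f = TFile::Open("/data/sim/wimp_events.root");      // WIMP set
    // TFile *f = TFile::Open("/data/sim/background.root");    // background set
    // TFile *f = TFile::Open("/data/sim/h3_tritium.root");    // H3 (tritium) set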
The code started to make a lot of sense, both in how it's structured and how it works line by line. Basically, there is one giant loop over the events, and each event has a ton of data associated with it. The code dissects each event and makes some calculations, then uses the values extracted from each event to fill histograms. After all the events are analyzed, the histograms are written out to a .root file.
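A minimal sketch of that structure, with made-up file, tree, and branch names (the real code does much more per event):

    #include "TFile.h"
    #include "TTree.h"
    #include "TH1D.h"

    void analyze() {
        TFile *in = TFile::Open("wimp_events.root");     // placeholder name
        TTree *tree = (TTree*)in->Get("events");         // placeholder name

        double s1;                                       // one per-event quantity
        tree->SetBranchAddress("s1", &s1);

        TH1D *hS1 = new TH1D("hS1", "S1 distribution;S1;events", 100, 0.0, 100.0);

        // the "one giant loop": visit every event, calculate, fill histograms
        Long64_t n = tree->GetEntries();
        for (Long64_t i = 0; i < n; ++i) {
            tree->GetEntry(i);
            hS1->Fill(s1);
        }

        // write the histograms out to a .root file
        TFile *out = TFile::Open("plots.root", "RECREATE");
        hS1->Write();
        out->Close();
    }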
I'm not on campus this semester, so I experienced quite a bit of lag when using TBrowser remotely to view the plots. To get around this, I downloaded the .root files, put them in my VM, and viewed them from there. This was a LOT smoother, as I didn't have to wait a few seconds every time I used the scroll wheel in TBrowser or clicked something.
At this point I should probably mention that I was using MobaXterm instead of PuTTY (usually PuTTY is the preferred SSH tool). The reason is that MobaXterm makes it much easier to download and upload files from the server: it has a drag-and-drop feature, which makes things a lot more painless.
In our meetings with Professor Bhatti, we went over more of the theory side of dark matter and more about the experiment itself.
Here's a code snippet as an example of what I mean when I say calculations are done and histograms are filled:
We were able to generate a lot of simple plots for each of the data sets. Analyzing the .root files was a lot easier with the VM I had set up. Here's a screenshot of two TBrowsers open in my VM:
After getting familiar with the code, generating some simple plots, and making comparisons, it was time for us to write a lot of code on our own and try to get more meaningful results from more complicated plots. We talked with Professor Bhatti about what our final goal should be, and he told us to recreate some of the plots from a paper on the sensitivity of the detector, the same paper we had read and been following throughout the semester. One of the plots we already had, Zach was going to do another, and I was going to do the final one.
The plot I had to make was the log(S2) vs. S1 plot. It shows the mean values for the WIMP and electronic recoil (ER) events, along with their 10% and 90% lines; the red lines are for the WIMP data set and the blue lines for the ER events. The points plotted are the ER background events, for which we used the H3 data set. The point of this plot is to see how the WIMP and ER events are distributed, and to define a cut of the data that keeps as many of the WIMP events as possible (we want WIMPs to analyze, because we think dark matter is a WIMP) while letting in few background events. The paper said they could get an ER discrimination rate of 99.5%. The cut we made was everything below the mean line of the WIMP data, which means we kept about 50% of the WIMP events. The fraction of ER background present in this cut could then be calculated by counting the number of H3 events below the WIMP mean line and dividing by the total number of H3 events.
The paper's plot for this looked like this:
To recreate this plot, I followed a lot of steps. The first step was to divide the x-axis into bins and, in each bin, calculate the mean of the points' log(S2) values. Here's what that plot looked like for the WIMP data set (I used 100 bins for this one, but later lowered the bin count because I wanted a smoother fit):
To get points for the 10% and 90% lines, I calculated the RMS in each bin and used the mean plus/minus the RMS as the points for those lines.
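ROOT's TProfile does exactly this bin-by-bin mean, and constructing it with the "s" option makes GetBinError() return the bin's RMS spread instead of the error on the mean. A minimal sketch (the binning and names here are illustrative, not the exact values I used):

    // profile of log(S2) vs S1: each x-bin stores the mean of its y-values;
    // the "s" option makes GetBinError() return the bin's RMS spread
    TProfile *prof = new TProfile("prof", "log(S2) vs S1;S1;log(S2)",
                                  50, 0.0, 100.0, "s");
    // (prof gets filled inside the event loop, shown further down)

    for (int b = 1; b <= prof->GetNbinsX(); ++b) {
        double x    = prof->GetBinCenter(b);
        double mean = prof->GetBinContent(b);   // bin-by-bin mean
        double rms  = prof->GetBinError(b);     // RMS, thanks to "s"
        // (x, mean - rms) and (x, mean + rms) give points for the two lines
    }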
Getting a fit through these points was very difficult because there was not much documentation and there were few examples online. Unfortunately, there is no Stack Overflow for ROOT, so I had to make do with what examples I could find and old public message logs from the early 2000s. I tried all of ROOT's default fit templates and found that the fifth-degree polynomial looked best. Here's one of the fits shown below:
Now, onto how I actually generated these plots with code. I had to define some variables and do some calculations:
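Roughly, the additions looked like this (the branch names are placeholders, not the ones from the professor's tree):

    // per-event quantities for the log(S2) vs S1 plot
    double s1, s2;
    tree->SetBranchAddress("s1", &s1);   // placeholder branch names
    tree->SetBranchAddress("s2", &s2);

    // ... inside the loop over events ...
    double logS2 = TMath::Log10(s2);     // the y-axis quantity
    hLogS2vsS1->Fill(s1, logS2);         // 2D histogram of log(S2) vs S1
    prof->Fill(s1, logS2);               // the TProfile from earlier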
I also had to run the code for each data set (WIMP and H3).
Once I had the .root files generated by the code above, I copied them to my VM and wrote a script to do the analysis:
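In outline, the script did something like the following (the file and object names are placeholders, not the real ones):

    // skeleton of the analysis script: load both outputs, pull out the
    // profiles, fit them, and draw each on its own canvas
    void compare() {
        TFile *fWimp = TFile::Open("wimp_plots.root");
        TFile *fH3   = TFile::Open("h3_plots.root");

        TProfile *pWimp = (TProfile*)fWimp->Get("prof");
        TProfile *pH3   = (TProfile*)fH3->Get("prof");

        pWimp->Fit("pol5");   // fifth-degree polynomial, as described above
        pH3->Fit("pol5");

        TCanvas *cWimp = new TCanvas("cWimp", "WIMP");
        pWimp->Draw();
        TCanvas *cH3 = new TCanvas("cH3", "H3");
        pH3->Draw();
    }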
You might notice that this code makes separate canvases. That's because I was having trouble writing these plots to the same canvas in different colors; despite my efforts, I couldn't get it to work. I found a fairly inelegant workaround: I used the skills I learned from my meme lord days, put screenshots into paint.net (the program, not the website), and merged the plots together. To change the colors of the H3 data set, I selected the exact shade of red used in the plots and replaced it with a blue color I picked.
Here are the steps if you were wondering how I did this:
1. I took screenshots of each of the plots in the VM where each canvas was scaled to the same size
2. I added each screenshot to its own layer in paint.net
3. I used the "color picker" tool to replace the reds of the H3 plots with blues. More information on this here: https://www.youtube.com/watch?v=SjMiHBBXPc4
4. For each layer, I then went to Layer -> Layer Properties -> Blend Mode and changed the blend mode from "normal" to "multiply".
5. Finally, I saved the file as a .png, which flattened the layers on top of each other.
The final plot I made looks like this:
Now, I know: it's a complicated, ugly process. But I needed results that were both presentable and correct, and this method worked just fine for that. It was a long process, but I don't know how much more time I would have spent trying to get the same results using ROOT. If I have time, I'll redo it in ROOT. I was planning on asking Professor Bhatti about this, but the last meeting we had planned for the semester was cancelled.
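If I do redo it, the standard ROOT recipe for overlaying two plots in different colors on one canvas looks something like this (I haven't verified it on our exact objects, and the graph names are placeholders):

    // usual ROOT approach: draw the first plot with its axes, then draw
    // the second one into the same pad without the "A" (axes) option
    TCanvas *c = new TCanvas("c", "log(S2) vs S1");
    gWimp->SetLineColor(kRed);    // gWimp, gH3: placeholder graph names
    gWimp->Draw("AL");            // "A" = draw axis frame, "L" = line
    gH3->SetLineColor(kBlue);
    gH3->Draw("L");               // no "A": reuse the existing axes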
Although the code itself may seem simple and short, I spent a lot of time trying to get things to work. My unfamiliarity with C++ and ROOT caused a lot of complications, but in the end I learned a lot about both and was able to reach the end goal of recreating the log(S2) vs. S1 plot from the paper. Some references I used were:
https://root.cern.ch/root/htmldoc/guides/users-guide/Histograms.html
https://root.cern.ch/root/roottalk/roottalk01/3421.html
https://root.cern/manual/histograms/
https://root.cern/doc/master/classTH2D.html
https://root.cern.ch/doc/master/classTAttMarker.html
https://root.cern.ch/doc/master/classTAttMarker.html#a1f93a0d68673e698e9808ab1da414c46
https://root-forum.cern.ch/t/least-squares-fit/40554
https://root.cern/root/html534/TGraph.html
https://root.cern.ch/doc/master/group__tutorial__fit.html
https://root.cern.ch/doc/master/fithist_8C.html
https://root-forum.cern.ch/t/combining-two-histograms/17003/2
https://root.cern.ch/root/roottalk/roottalk05/2574.html
https://root-forum.cern.ch/t/question-about-adding-two-graphs-on-the-same-canvas/27740/2
https://root.cern.ch/doc/master/classTHistPainter.html#HP01
https://root.cern.ch/doc/master/classTGraphPainter.html#GP01
https://root.cern.ch/doc/master/group__tutorial__hist.html
https://root.cern.ch/doc/master/hstack_8C.html
https://root.cern.ch/doc/master/fit2d_8C.html
I was nearly at my wit's end trying to figure out how to do the fit because my searches weren't turning up anything useful; adding keywords like "cern" and "root" to my searches fixed that. Just as I had nearly given up and was about to resort to something like a linear spline interpolation (connect the dots), or to copying the data into Python (my language of choice) and solving the problem there, I found out that ROOT has a built-in fit function. I played around with it and got it to work in ROOT.
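For anyone else stuck on the same thing, the basic pattern turned out to be short (gMean here is a placeholder for the graph of band points):

    // fit a fifth-degree polynomial to the points; "pol5" is one of
    // ROOT's built-in fit templates
    gMean->Fit("pol5");
    TF1 *curve = gMean->GetFunction("pol5");   // grab the fitted function
    double y = curve->Eval(30.0);              // evaluate the fit anywhere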
With the log(S2) vs. S1 plot complete, we could define a cut on the data and count the fraction of background events present in it. The cut Professor Bhatti suggested was everything below the mean line of the WIMP data. This retains about 50% of the WIMP data with only 0.68% of the background present. The 0.68% was calculated by counting the number of H3 points below the WIMP mean line and dividing by the total number of H3 points plotted. This fraction is comparable to the paper's result, an ER discrimination rate of 99.5% (i.e., 0.5% leakage).
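The counting itself is only a few lines. A sketch, where fWimpMean is the fitted WIMP mean curve and the branch names are the same placeholders as before:

    // fraction of H3 (ER) events that fall below the WIMP mean line
    double s1, s2;
    h3Tree->SetBranchAddress("s1", &s1);   // placeholder branch names
    h3Tree->SetBranchAddress("s2", &s2);

    Long64_t below = 0, total = h3Tree->GetEntries();
    for (Long64_t i = 0; i < total; ++i) {
        h3Tree->GetEntry(i);
        if (TMath::Log10(s2) < fWimpMean->Eval(s1))   // below the mean line?
            ++below;
    }
    double leakage = double(below) / double(total);   // came out to ~0.0068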
This semester, I came in knowing almost nothing about dark matter or ROOT. I knew a little C from taking CMSC216 a year ago, but to be honest, I really didn't like that class. I know I didn't write much on the theory of dark matter in my logbook, but I learned a lot of interesting physics related to it. I also learned a lot about C++ and ROOT, because I had to understand how they worked in order to complete our analysis. Other things I learned about were VMs, the command line, and image editing. Overall, although this project was frustrating at times, I'm glad I was able to work on it: I learned a lot of cool things and gained a lot of useful skills.