HONR269L: Dark Matter Research
Jan. 29: Assigned to work on the LZ dark matter research project, which aims to better understand the dynamics and interactions of the mysterious dark matter. We will be working alongside Professor Bhatti on simulations and on developing software to analyze data for the experiment. The project will utilize ROOT and Python.
Feb. 5: Set up to meet with Professor Bhatti from 5-6pm on Wednesdays. This was our first meeting, and he gave a presentation describing the theoretical background of the existence and known properties of dark matter. Evidence for dark matter’s existence can be seen in galaxies through tools such as gravitational lensing and x-rays, and it explains phenomena such as the orbital speeds of galactic bodies. The experiment that we will be working on is based on the hypothesis that dark matter is a WIMP (weakly interacting massive particle), and the LZ detector attempts to observe the interactions of these WIMPs with xenon nuclei. Since the detector is not yet complete, there is no physical data for us to work with; therefore, we will be working with simulated data based on theoretical calculations and previous measurements. We will discuss project goals, a timeline, and more specifics about the experiment at our next meeting.
Feb. 7: Began working on our research. Professor Bhatti gave us an article to read to give us a better understanding of the work that we will be doing, as well as instructions for how we should set up our Tier-3 server. Unfortunately, we encountered permission errors when retrieving his files on the T3 server, involving the shell and Linux distributions on the cluster. We met with Professor Bhatti to solve it, but we were unable to fix the issue. We will continue trying to figure it out on Sunday.
Feb. 9: Met as a team and worked out some of the computing issues. Eli figured out how to handle the permission issues on Saturday and helped the rest of us get up to date with the right files and code. In addition, I read over the scientific article on the LZ detector that Professor Bhatti gave us to further understand what the detector will do and what types of events we are going to be simulating. Below is a screenshot of the steps to log in to my Tier-3 cluster, get Professor Bhatti’s files, and access ROOT.
In addition to finishing the setup of the T3 cluster, I checked out some of the code to understand what types of events we will be simulating and the types of graphs we will use.
Feb. 11: Unfortunately I encountered an issue with logging into the T3 cluster. I emailed Sam and he was able to help me reset my password and login again. After getting it back up I got the updated files and began running the code that Professor Bhatti gave us to generate example plots.
Feb 12: Met again with Professor Bhatti. This time we went into much more detail about the actual detector and the types of events we will be simulating. We discussed some of the mathematical properties of WIMPs and how events are processed in the detector, such as the S1 and S2 signals. In addition, we went over the code together to pinpoint the best way to run the simulations, and talked about how we should review the code and think about how we can get the most meaningful results. As a group we decided that we would all go over the code and discuss whatever we came up with.
Feb 16: Got together and discussed technical details about the code. Discussed how to manipulate the number of events generated, how the plots are produced, the pipeline used to generate these plots, and what the contents of the important plots were. Compiled a list of questions on this topic to ask Professor Bhatti at our next meeting.
Feb 19: Now that we all had a solid understanding of the code, we met with Professor Bhatti, who filled in the gaps in our knowledge of what the graphs represent. We went over the process for generating these graphs, and discussed future graphs that we would like to produce later in this research. In order to run the code we needed to run the following commands:
This sets up ROOT and then runs the file containing the code that runs the events. The make_H3_plots file contains the location where you can adjust the number of simulated events.
Feb 23: Spent a lot of the week playing around with the code and understanding what each line affects. Generated lots of plots based on simulation data (26 different PNGs in total) and worked on understanding what the plots represent. Below are a few example plots.
Feb 26: Now that we all were able to generate the graphs and feel more comfortable working with the code, it was finally time to begin reproducing the first graph from the paper! The graph below is what we are trying to produce with our simulated data. The graph displays H3 decay in both ER (blue) and NR (red). Our goal was to take our data (middle graph above) containing the ER data, produce the NR data, and adjust the current code to reformat the data we had to look like the graph below. We actually struggled quite a bit with this: it was hard to locate the exact lines of code that generated this data, to produce the NR data, and to overlay the quantile lines. This week we will continue working out these issues so that we can produce the graph.
Mar 1: Today we met as a group to try to finish generating the above plot with our data. We discovered that it is quite a challenge in ROOT to merge the 2 files containing the different data (ER and NR). To make the plot, we want to overlay both of the existing plots we have, but doing this seems to require a very tedious redefinition of the parameters in one of the files. We also had trouble shifting the quantile lines. It seems that the current quantiles are generated by calling values from a stored matrix, and since our data will be displayed differently than Professor Bhatti’s initial graph, we need to generate new quantile lines. We are not sure whether we just calculate them or get them from another file. We will hopefully get an answer by tonight, and if not we will make sure to resolve it all by Wednesday.
Mar 4: Eli was able to solve the problem we were having of not being able to overlay the graphs or insert the correct quantile lines. In order for me to get the edits he made to the code, I just copied the new files he created from his directory, and was able to successfully run them and generate the graph below: <insert graph>. In addition, you can see some of the important edits made to the code. <Insert pics of edits>
Lastly, working with emacs was a bit tedious since editing and managing multiple files can be tough, so Eli was able to mount my directory on the cluster to my computer using <insert command>, and now I am able to edit with my local text editor, either Sublime or VS Code! The next steps are to apply cuts to the graph so we can remove some of the extraneous noise and to plot only the mean quantile lines for both the ER and NR data. After doing this we will be able to compare our generated plots with others, such as seeing how tritium decay would differ.
Mar 8: I worked for a while trying to just plot the mean quantile lines of both the ER data and NR data on the same plot. It was no problem to get it from the ER data since we were already plotting quantile lines, so I just kept the 50th percentile line and removed others. For the NR data, since we overlaid the plots, we got the data from a separate file, which we needed to get the quantile lines from. I spent some time working through the code to understand how the original quantile lines were plotted, and from there I tried to do a similar thing with the NR data. However, there was an error which I was unable to solve, causing the mean quantile line not to display. I will work on fixing the issue later this week.
Mar 9: We got everything working! Delina and Eli were able to solve the overlay of the plots, and Eli helped get the mean quantile line for the NR data. To quickly go over what these plots represent: the S1Area_phd plot represents the recoil energy of both the S1 and logS2 data after preprocessing. Because some energy is absorbed in the nucleus, NR data tends to show lower energy levels, as can be seen from the mean quantile lines. The S2 signal is much higher since it is read after passing through many photomultipliers to strengthen the signal. The raw S1 and S2 graphs show the raw data from both the H3 data and the WIMP data, allowing us to compare their raw count frequencies. From here our next steps are to apply the cuts found in the LZ paper. These cuts remove background noise, such as the data from the border of the detector, which is contaminated by the radiation the detector walls give off. We will need to find out how to apply these cuts in our code, which Eli is editing to make it a bit more organized and readable, making the task of implementing the cuts much easier.
Mar 29: After the extended break we got back to working on our research. Unfortunately, the coronavirus has hit my neighborhood pretty badly and my father seems to have it. It’s been pretty crazy in the house, which has slowed down my contributions a bit. Now that we had the complete overlaid graphs with all the data, our job was to apply the necessary cuts. Today we applied the spatial cuts, involving a cut on both the radius and along the x,y axes. These cuts are necessary to reduce the amount of background noise coming from radon, muons, radioactive material from the walls of the detector, and other sources. The cuts along the x,y axes and the radius were made by changing lines 65 and 66 in the .cc file, and we cut 30mm off the radius, as specified in the paper. The resulting graph can be seen below:
This graph shows the same data as the previous graph, just now with the spatial cuts applied, so we are working with a smaller region of interest. From here we will continue applying the other necessary cuts, and we are waiting to hear back from Professor Bhatti on whether the applied cuts produce the expected results.
Apr 1: Today was the first day of making the slides and recording the presentation. Our group had a Zoom meeting where we all discussed our progress and created the slides. Recording worked well over Zoom, and I think this system will work for future meetings and recordings as well.
Apr 12: Over the past 2 weeks we have made a decent amount of progress. Unfortunately, due to my situation at home, with both my father and sister getting coronavirus (they have recovered now), and with the Jewish holiday of Passover, I have had much less time for research than I would have wanted. Nevertheless, I successfully ran the Baccarat analysis as Prof. Bhatti told us to, using these commands:
In ROOT I then typed BaccRootAnalysis("BG");, which produced a .root file containing this plot of interest:
This plot depicts the position of decay/radioactivity that deposited energy in the detector, including both simulated and reconstructed events. We can use this plot to determine what R2-Z cuts to make, trying to exclude events close to the detector wall. I’m not exactly sure what the sources of these background events are, so I will read up on why they occur soon.
During this time John made some progress on finding information about drift time, which we would like to make cuts on. He found that the mean drift time for 1000 events is .000111 seconds, which gives us an approximation of where to cut, but we would like the minimum rather than the mean so we can make a more accurate cut. John compiled a list of questions which he will send to Prof. Bhatti to help us further understand this issue.
Apr 13: I guess it wasn’t enough for it to be crazy in my home, so my computer decided to make things even worse. Today when trying to do more research I booted into my Linux OS, which decided to completely stop working upon booting up. I spent about 2 hours troubleshooting, but nothing fixed the issue. In order to continue research (I log into the cluster from my Linux OS), I got everything I need working on Windows. Now I can access the cluster and work from my Windows environment until this issue is fixed (should be fixed by Friday).
Apr 14: Today I read up on the sources of background radiation, as outlined in figure 3 in section 4 of the paper we are working from. Here is the figure:
Here are the notes I took on the paper to understand the sources of background:
Determination of background rates in detector
ER and NR counts in 5.6 tonne Xenon, before and after preliminary cuts were made
Preliminary cuts include position, drift time, SS… described in 3c
Focus on ROI of 1.5-6.5keV for ER and 6-30keV for NR over 1000 day live run
Total counts are expected to be 1131 ER, 1.03 NR
Apply “discrimination” for ER at 99.5%, leaving count of 5.66, and 50% for NR, leaving .52
What is discrimination and how is it done?
Why different for ER and NR?
I think whatever is outlined in the LZ documentation is acceptable
Sources of events
Radon is main for total, while atmospheric neutrinos for NR
Trace radioactivity most prevalent from gamma-emitting isotopes
LZ is going to measure radioactivity of detector materials using 2,000 radio-assays
Should reduce the contribution of trace radioactivity in bg
Surface contaminants
Dust and Rn daughters cause contamination
2 processes that generate from plate-out
Neutron release into xenon
Ions from Pb subchain
Important to get accurate position reconstruction
Cause for 4cm radial cut
LZ set target for occurrences
Rn daughters are mobile on surfaces (mainly beta emitters), and a limit was placed on this mobility
Xenon
Radioisotopes coming from LXe cannot be shielded and account for largest proportion of bg
Naked beta emission
No associated gamma ray
Projections made by … result in an estimated 1.53uBq/kg
Dust estimations from SURF yield .28uBq/kg
1.8uBq/kg total, adding .09 based on LUX
Kr and Ar get dispersed, so make sure to do extensive purification of Xe
Beta emitters, leading to ER events
Laboratory and cosmogenic
Neutrons from muon-induced interactions cause bg
Estimated thru simulations of muon transport thru rock
Hydrogenous shielding reduces this to an insignificant contribution
Cosmogenic activation contributes approx 2.7 +- .5 mBq/kg
Physics
3 main sources, 2 accounting for ER, 1 NR
Xe and solar neutrinos seem to account for much of the bg for ER
Non-standard
Very little expected from MS of gamma rays
Other sources also amount to very little contamination
From here I will be working on using my understanding of these sources of background to apply further cuts to our data (as well as working to create figure 2 from the paper).
Apr 19: Good news, I got my computer fixed this weekend, so now it should be much smoother sailing (except that no external windows like emacs or TBrowser will open). Today we focused on completing the R squared vs drift time plot, which has been causing some trouble. We were encountering an error when trying to add drift time into the BG_DataRQ_V2.cc file:
drift_time = singleScatter->driftTime_ns;
h_ZvsRSqr_corrA->Fill(recoR2,drift_time);
This was giving an error saying the variable was not a type specifier for all declarations, so we went into the header file, and after much searching, we realized that this line:
h_ZvsRSqr_corrA = new TH2D("CorrZvsRSqr_corrA", "CorrZvsRSqr_corrA",1000,-50,100,250,-50,200);
was placed outside the function, which was causing issues. Along with this there were some small fixes to the code before we actually got it to run (make_plots had the wrong path, combining x and y to get r squared…), but eventually we got it to run with all the events, and this was the resulting graph:
Now that we finished that task we will make a graph without cuts and draw lines to show where the r squared cuts will be in order to see where these cuts lie. In addition, I will begin working on recreating figure 2 from the paper.
Apr 22: I have been working a bit to recreate figure 2 from the paper, which looks like this:
I’ve added an additional graph to the .cc file, in addition to declaring it at the start of the .hh file. Unfortunately, when running it I ran into a bunch of path issues. I realized that some of my code was pointing to the wrong files to generate the desired graph, and I fixed all of those up. When it came to testing, for some reason TBrowser was not working on my computer, so John tested it instead. When he ran it, the plot came up empty, and I believe this is due to my inputting incorrect bounds. Hopefully when I fix the bounds it will work.
May 2: I wasn’t able to do much research in the past couple of weeks since unfortunately I got coronavirus. Thankfully I’m better now, but with the paper, poster, and presentation due very shortly, I do not believe we will have time to complete figure 2. We have just finished making a ratio plot, as seen below. This plot allows us to make a cut at the median WIMP signal, which reduces the ER background significantly. From here we will concentrate mainly on concluding our research in the poster, paper, and presentations.
HONR268N: Logbook
Homework 2: 9/10/19
In this homework we got introduced to the Linux command structure and got practice testing different useful commands. Here are some of the commands:
pwd - Prints the directory you are currently in.
ls <dirPath> - Lists the files in whatever directory you indicate.
ls alone will automatically use the current directory.
You can also use the -l option so each item gets its own line.
mkdir <dirPath>/<dirName> - Makes a new directory with the chosen name in the desired location.
cd <dirPath> - Changes the directory to the indicated path (cd .. to navigate back 1 directory and cd ~ to navigate to the home directory).
rm/rmdir - rmdir will remove an empty directory, while rm will remove anything you specify.
find <dirPath> -name <pattern> - Finds items in the directory that match the specified pattern.
which <command> - Prints the path of the executable file that matches the command name.
whoami - Displays the current user of the computer.
echo <input> - Echoes the input given to the screen.
touch <fileName> - Creates a new file with whatever fileName is given.
less <fileName> - Lets you view the file and scroll through its content. Press ‘q’ to exit.
There are many more commands than these, but these are the pretty basic ones that we have been working with so far. Here is a screenshot of my terminal playing with these commands:
File extensions:
There are a few different file extensions that I familiarized myself with for the class:
.txt - Plain text stored in ASCII format
.sh or .csh - Linux scripts, which are grouping of linux commands that are performed sequentially
.x or .exe - Executable files, which are compiled code ready to run.
Emacs:
Emacs is a text-editor that we will be using throughout the class. In this homework I got to use emacs a bit and get a feel for how to operate it. Below are some useful commands:
emacs -nw <fileName> - Opens the file in terminal mode.
To exit and save in this mode press [Ctrl]-x [Ctrl]-c, and then type ‘y’ to save changes.
emacs <fileName> - Opens a GUI for emacs where you can edit the file.
emacs <fileName> & - Same as the previous, except you can enter more commands into the terminal (the terminal does not keep the process in the foreground; it opens the emacs GUI and returns).
To save your work in the GUI, simply navigate to file → save as, and choose what you want to save it as.
To search for a pattern in your code, press [Ctrl]-s and then type the desired pattern.
Homework 3: 9/23/19
In this homework I got very familiar with shell scripts and how to use the commands.
Use “#” to insert a comment
It is very important, though, to start the script with #!/bin/tcsh. This allows you to use all the commands associated with tcsh, and it must be the first line of the script!
Can input our past Linux commands such as touch, echo… in the script.
Use set ARGS = $1 to allow an argument to be given when running the script
Here is the commented script that I wrote:
The use of shell scripts is that they allow you to run terminal commands automatically and with variable arguments
To run the script, do tcsh <filename> <argument>
Can use chmod +x <filename> to change the permissions on the file input, and make it an executable file
This allows you to run your file without starting with tcsh
The difference between ARG and $ARG is that ARG is just a string while $ARG indicates it’s retrieving the value of a variable
Check homework #3 for the rest of the commands
Homework #4:
In this homework, I learned how to write c++ scripts, including declaration of variables, loops, and how to use the debugger.
Ex: #include <iostream>
using namespace std;
int main() {
cout <<"Hello World!" << endl; //Print hello world to screen followed by end line (endl)
return 0; //Exit the program
}
This program will print “hello world” in the terminal, go to the next line, and then exit.
#include <iostream> → Tells the compiler to include the iostream library (for functions such as cout)
using namespace std; → Allows you to write just the function name instead of std::<name> (i.e. cout instead of std::cout)
int main() → Begins the function where your code starts executing
cout << "Hel…" << endl; → Tells the command line to output the text in the middle and create a new line at the end.
return 0; → Ends the program
In order to compile the program we need to go back to the terminal and write g++ <programName>
This translates the c++ code into a file that the computer can actually read
./a.out → After compiling, a new file will be made named a.out (you can change the name with the -o option; a.out is the default), and inputting this command will run your program
Note: after every change you need to recompile the code or else the last compiled code will run
find /usr/include/c++ -name "4*" → Finds the version of c++ downloaded. They are from the 4th release and are stored in the specified folder
g++ -dumpspecs → List the specifications of the g++ compiler
Some header files included: (header files contain functions to use in programs)
ls /usr/include/c++/4.4.4/ → Lists the files in the 4.4.4 folder in the c++ folder
ls /usr/include/math.h → Returns the file path for math.h
ls /usr/include/bits/*math*.h → Returns header files with ‘math’ in the name
More <filename> → displays the contents of a file in the terminal window
g++ --help → Will display the available options you can use with g++
Variables code:
In this code I test out how to set variables, including ints and doubles. I also test outputting the variables to the console, as can be seen by the cout commands
Operation with numbers:
Here I learned how to do operations with variables, such as incrementing and decrementing a number.
++ increments by 1 and -- decrements by 1, as seen by the output.
Non-numeric variables:
Here I learned how to use boolean statements in c++ and used them in some examples
I declared a bool named prop, set it equal to a logical comparison statement, and printed the results
Interesting to note that when the statement is true, c++ returns 1 and when false, it returns 0. This is because the computer only knows 0 and 1, so it uses those values to represent true/false
Loops:
Here I learned while loops and used them to print out the numbers 1-10, decrementing 1 each time.
Here is the same loop as before, just in a for-loop style. Each loop style has different situations where it should be used (a for loop makes sense for this).
Here was an example combining what I learned in the previous examples, using both for and while loops
Additionally, here we have nested loops, which are loops inside of loops. This runs one cycle of the outer loop, goes through the entire inner loop, then repeats until the outer loop is finished.
You can see how the results get longer each time: as n gets bigger, the inner loop runs more times, outputting more numbers
Debug
Debugging is a way to test your code to find the errors
There are many ways to debug, including cout statements, setting a variable to true to enable debug output (what is done in the code to the right), or using debugger programs such as gdb
My code sets a variable named idebug to false at the start, then I change it to true when I want to debug, and then in the loop it will output n so I can debug the code.
This is useful since it only prints statements when you set your debug variable to true
Practical practice of loops and conditionals: 10/3/19
In order to get more comfortable coding in C++, especially with loops and conditionals, I decided I would do a mini project of estimating pi. I have done this before in python and I thought doing it in C++ would be good practice.
The idea behind the calculation is that you have a unit square of side length 1 with a quarter circle of radius 1 inscribed inside. The area of the square is 1 and the quarter circle’s area is pi/4 (pi * 1^2 / 4). This gives the relation (area of quarter circle)/(area of square) = (# points inside circle)/(# points total), which rearranges to pi = 4 * (# in circle / # total) for a large quantity of points. In order to code this I needed to generate random points in the square and see whether they lie inside or outside the circle, which I can check with the condition x^2 + y^2 <= 1. To the side is my code, which approximated pi as 3.14134, a relatively accurate estimate. To accomplish this I needed a for loop to iterate over randomly generated points and an if statement to check whether each point lies within the circle. Additionally, I needed to look up a couple of other parts, such as generating random numbers. This was overall a fun mini-project that helped give me a better grasp of how to code in C++.
Homework 5: 10/3/19
In this homework I further familiarized myself with logic statements and was introduced to the concept of pointers in C++.
‘If’ conditional
Here there are nested loops: the outer loop iterates, and on each of its iterations the inner loop is fully executed.
Additionally, there is an if statement, checking whether n > 5 and m > 50. If both of those conditions are met, the code will output the values of m and n.
Throughout the loops I decrement m and n, and reset m as 100 after the loop finishes (it might have made more sense to use for loops so I wouldn’t need to reset the value of m).
Pointers
A pointer is a way to access the compartment where the computer stores the variable’s information
A pointer ‘points’ to the location of the object in memory, and can retrieve both the location and the information stored in that compartment.
The most important aspect of pointers is that they allow for highly efficient memory management, so as not to use too much memory in computationally expensive tasks.
Declared a variable named i and set a pointer named p assigned to it
* operator gets the value of the information stored in the compartment
& operator gets the memory address
Program 1:
Program 2:
In both programs 1 and 2, two variables are declared and printed. However, in program 2 the variable p is a pointer to i, which means they are directly connected. Therefore, when i is updated, so is the value p points to, and vice versa. That is why in program 1 only one variable updates, while in program 2 both show the same updated value. This is one of the major applications of pointers.
In this example I learned the ‘new’ construct, which tells the computer to set aside an empty spot in memory for a value and hands back its address for p to hold
This is very similar to declaring a variable and then setting a pointer to it, but here the only way to access the information is by using the * operator
Here is my own code using the skills I learned in this homework
It begins by declaring a pointer of type double set to pi. p2 is then made as a pointer to p1, so therefore when I change the value of p1, p2’s value is changed as well
Homework 6: 10/10/19
In this homework I got a much deeper understanding of C++ and learned some more of its functionalities such as arrays, reading/writing files, generating random numbers, classes, and using it to solve real-world questions
Arrays:
Arrays are used to store bunches of related data of the same type in a single variable, with each element accessible by calling its index
Declaration of arrays is similar to normal declaration, where you indicate the type and then in brackets you put the number of elements
The elements are declared in {}, with a comma separating each element
You call an element of the array by calling the variableName[n], with n being the index of the desired element (Note: 1st element is numbered 0)
Additionally, arrays can be initialized like LL, where the number of elements is the first number multiplied by the second number, where LL is a 2-d array (2 columns, 3 rows)
Accessing a value from a 2-d array works by first calling the desired row number then the desired column
Commenting code:
It is important to comment code so other people can easily understand what your code does
Syntax for adding a single line comment is //, and everything on that line after that will be commented (can add \ to tell it to comment the next line too)
Multiline comment syntax begins with /* and ends the comment at */
Variable names:
It is important to give meaningful names to your variables
This allows other people reading your code to have a clear understanding of what the variable is used for in the code
Reading data from a file
When working in large teams, often code can get messy if everyone is working on the same file
That is why being able to read in other files and use the information inside that file, or writing data to a file, can be very useful
This allows the use of text, functions, classes, and more from an array of sources
The above code demonstrates the writing of a text file using the fstream class
The steps are: include the necessary header, initialize the stream, open the file, write the content, then close it
In the terminal you can see that it created the file and that the contents are what we expect
Combining code from different files:
You are able to write multiple files and use all the relevant code from each one
Main program:
Dot product:
Vectors:
The above demonstrates using 3 different files in a single code
The main script includes the dot product file so that it can use the dot_prod method from it
The main code then opens the file stream, declares 2 arrays, and stores the data from the vectors.txt file in those arrays
Lastly, the code calls the dot_prod function with the vectors received from the txt file, and the dot product is calculated and displayed in the terminal
Output:
I then updated this code to use the last line of the txt file as a scalar of the vectors, and created a new program called scalarmult.cpp to do the scalar multiplication on a vector
Updated main code:
New program for scalar multiplication:
In the main code I included the new program so I can use the scalar_mult function and added a new value called ‘scalar’ to the main program
‘scalar’ reads in the last line of the vectors.txt file to get the value
Call scalar_mult() function on each vector to get the scaled vector
Output:
Random numbers:
Random number generation is an important part of Monte Carlo simulations
This code will generate a random number n times (you input n), and plot it on a histogram.
Output:
Running the code without the ‘&’ character generates the same initial random number every time. This is because inew is then just a copied variable and not a reference, so the change inside the function does not affect the value of inew outside. However, when inew is passed by reference with the ‘&’ character, the value changed inside the function persists, so the next time the function is called it sees the updated stored value, allowing new random numbers to be generated.
With n bytes you can store 2^(8n) distinct values, since each bit has 2 options (0 or 1) and there are 8 bits per byte. The largest representable value is 2^(8n) - 1, since we start counting from 0.
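The counting argument can be checked with a short Python snippet (value_range is a name made up for illustration):

```python
# Quick check of the counting argument: n bytes give 2**(8*n)
# distinct values, running from 0 through 2**(8*n) - 1.
def value_range(n_bytes):
    n_values = 2 ** (8 * n_bytes)   # each of the 8*n bits has 2 options
    return n_values, n_values - 1   # (count of values, largest value)

print(value_range(1))  # (256, 255): one byte holds 0..255
print(value_range(2))  # (65536, 65535)
```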
Printing the random output:
The code produces these outputs. Notice that the first 10 outputted numbers are the same across runs, since this method generates pseudo-random numbers: the sequence looks random, but it is reproduced exactly if nothing else is changed.
In the example to the left I changed the seed, and now you can see that the first 10 numbers have changed. The seed is a way of standardizing the random numbers you get in a reproducible way: for this project, anyone who uses the same seed will produce the same sequence of random numbers
If you change the number of loop iterations, you will see a different distribution. In the first example I used 1,000 iterations, which gives a roughly equal distribution across all numbers; in the second I used 100,000 iterations, which is much closer to uniform. This makes sense by the law of large numbers: the more times you repeat an action with a given probability, the closer you get to the expected distribution. Here all numbers in the range are equally likely, so the more iterations I run, the closer the result should be to an equal distribution across the numbers.
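Both behaviors can be sketched in Python, with random.seed standing in for the seed mechanism used in the ROOT code:

```python
import random
from collections import Counter

# Fixed seed reproduces the same sequence exactly
random.seed(42)
first = [random.randint(0, 9) for _ in range(10)]
random.seed(42)
second = [random.randint(0, 9) for _ in range(10)]
print(first == second)  # True: same seed, same sequence

# Law of large numbers: with many draws of a uniform value,
# the counts approach an equal split across the 10 outcomes
random.seed(42)
counts = Counter(random.randint(0, 9) for _ in range(100_000))
print(min(counts.values()), max(counts.values()))  # both near 10,000
```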
The general method for producing random numbers is to call the rand function and then map the result onto the desired range (for example with the modulo operator or by scaling). For example, the following code produces a random number between 0 and 100:
Continuation of homework 6: 10/17/19
Calorimeter
This assignment helped me understand what the ‘resolution’ of a graph meant and how it relates to particle physics experiments and discoveries
secretparameters.txt file → 91 15 0.005
This file sets the parameters for the generated graph including mean value, resolution…
Resolutions.C:
In this code I needed to add a closing bracket. Initially I put it at the very end, but then the graph was saving and updating after each iteration, causing many iterations to take a very long time to run. I therefore moved the bracket so the graph only updates and saves after all iterations are complete, which greatly increased the speed.
Another common issue was that many people named the secretparameters file secretparameters.tx instead of .txt, since there was a typo on the webpage
To the right is the output when I have N set to 1. This produces a single randomly generated mass in accordance with the secretparameters file. Since there is only a single observation, no real guess can be made for the true mass of the mother particle
Here I ran the same code, just now with N = 10. Now we can see a bit more of a pattern. You can guess the true mass would probably be around 80-120 from these values, but there is still too little information to make a good guess.
I ran the code again with N = 100, and again we see a similar output. I can make a better guess that the true mass is probably around 90ish now, but 100 data points is still too little information to get a really clear sense of the true mass.
When I run it with N = 1000 we finally start seeing a nice Gaussian distribution. Here you can guess the mass of the mother particle is approximately 90, and almost definitely will fall in a range between 87 - 93.
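The effect of increasing N can be sketched in Python; the mean of 91 and width of 3 below are stand-in values, not the actual secretparameters:

```python
import random

# Sketch of the resolution exercise: draw N masses from a Gaussian.
# The mean of 91 and width of 3 are made-up stand-ins for the
# secretparameters values; sample_masses is an assumed name.
def sample_masses(n, mean=91.0, width=3.0, seed=0):
    rng = random.Random(seed)
    return [rng.gauss(mean, width) for _ in range(n)]

for n in (1, 10, 100, 1000):
    masses = sample_masses(n)
    # The sample mean settles toward the true mass as N grows
    print(n, sum(masses) / n)
```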
Now we change the first number in secretparameters.txt to 1 (instead of 91). Below are the outputs:
With just 1 point we generated a negative number, and since negative values are not displayed on the graph, it just saves an empty graph. The more points we include, the clearer the pattern becomes: from the last graph you can tell it is a Gaussian distribution, but much of the left side is cut off since the plot does not display negative results.
Homework 7: 10/17/19
MadGraph
First I opened Firefox and signed up for an account to download MadGraph, and a zip file was downloaded onto my VM
I then unzipped the file, created a new directory, moved the unzipped files there, and started MadGraph
Here I ran the command that will generate a new process with the specified particles and displayed the processes
Here is the display of particles and multi particles from our process
You can add more processes through the following command:
To get the output of the processes to allow us to calculate the cross-sections run:
Generate the events using:
Results shown from the madgraph processes:
Unzipping the file from the run:
I then downloaded lhe2root.py, ran the command to convert the output to ROOT format, then ran the root command to see the final file (10/24/19)
Homework 8: 10/31/19
In this homework we worked with ROOT to rediscover the Z boson!
First I downloaded the file from this link in my VM and unpacked the .tgz file with tar -xvzf <filename>
Then I created a new directory called Higgs_analysis and moved the files there
For the next step I opened 1 of the 4 unpacked ROOT files and ran the command MakeClass(“HiggsAnalysis”) in root, which created 2 new files, HiggsAnalysis.C and HiggsAnalysis.h (a header file for the C++ code)
Code
The next step is to run the code in the HiggsAnalysis.C file. This code creates a TLorentzVector for each of 2 leptons, which in this case are electrons. The goal is to combine the TLorentzVectors (done behind the scenes by the TLorentzVector class) in such a way that plotting the invariant mass of their combination gives the invariant mass of the Z boson. This code essentially is a way to ‘rediscover’ the Z boson. The code is below:
Running these commands in a ROOT session then ran the code and produced the output below:
This plot is a rediscovery of the Z boson! You can clearly see that the combination of the leptons produced a particle with an invariant mass of 91 GeV, which is the mass of the Z boson. The plot also shows noise in the combination, seen as the bump on the left of the graph. In the homework we do this calculation again, but now with 4 leptons, combining them to form 2 Z bosons and combining the Z bosons to rediscover the Higgs!
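The invariant-mass combination behind this plot can be sketched in Python; the four-vectors below are made-up numbers chosen so two massless back-to-back electrons reconstruct to roughly the Z mass, not real detector data:

```python
import math

# Sketch of the invariant-mass idea: add the two lepton four-vectors
# (E, px, py, pz in GeV) and compute m^2 = E^2 - |p|^2. The input
# values are illustrative, not detector data.
def invariant_mass(p1, p2):
    E = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    return math.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

# Two back-to-back 45.5 GeV electrons (masses neglected) reconstruct
# to roughly the 91 GeV mass of the Z boson:
e1 = (45.5, 0.0, 0.0, 45.5)
e2 = (45.5, 0.0, 0.0, -45.5)
print(invariant_mass(e1, e2))  # 91.0
```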
Machine learning with TMVA!
*Note: Since the code is very long I am not including it in the logbook
We used this code to perform multivariate data analysis in ROOT
This ran whichever ML models we specified, and created files displaying the results
This code tries to relate 4 random variables that we created, and uses ML models to make cuts so that, hopefully, it can identify which category each data point of new data belongs to. Below is a picture taken during the training process
The code produced several output files
We can open the TMVA.root file with the TBrowser T command in ROOT, and inside it we see many examples of the results from the ML models
In addition, we can open the correlation matrix and draw it with the colz option, which displays the correlation between the variables
We want to feed variables that are not highly correlated into the ML model: correlated variables carry redundant information, and with fewer, more independent variables the model should learn more easily
We will replicate this process using variables such as the 4 electrons’ masses, phis, pTs… and use that to predict the Higgs (and Z boson)!
Homework 9: 11/13/19
In this homework we finally started coding with Python! This homework was an intro to many of the concepts we already learned in C++, transferred into Python syntax, in addition to working with Google Colab and some of Python’s external libraries.
CPU vs GPU
The CPU is the ‘brain’ of the computer and handles very general tasks, while the GPU has many cores for specialized tasks such as graphics or matrix operations, and can do thousands of operations in parallel.
1st code:
Here we began by declaring new variables, doing a mathematical operation on the variables, and printing the result
2nd code:
We move on to a more useful task, solving for the area of a circle, by defining pi (I estimated it using the first 100 digits) and a diameter, and then writing the formula to calculate the area.
3rd code:
This code showed the difference between 2 attempts at changing values
The first method doesn’t work: after I set a = b, all the data in a is the data in b. When I then set b = a, a is already the same as b, so the initial data in a that was meant to copy over is lost
In the 2nd method we declare a temporary variable that holds onto the data in a, re-assigns the data in a to be b, then assigns b to the data in the temporary variable
The 2nd method is the right way to accomplish the goal
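The two approaches can be written as small Python functions (the names bad_swap and good_swap are my own, and the inputs are example values):

```python
# The broken and working swap attempts side by side.
def bad_swap(a, b):
    a = b      # a's original value is lost here
    b = a      # b just gets its own value back
    return a, b

def good_swap(a, b):
    temp = a   # hold onto a's value first
    a = b
    b = temp
    return a, b

print(bad_swap(1, 2))   # (2, 2): the swap failed
print(good_swap(1, 2))  # (2, 1): the swap worked
```

Python also allows the tuple-swap shorthand a, b = b, a, which does the same thing without a named temporary.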
1st code:
Here I am using the same concept as in the last code: creating new temporary variables to store the initial data. After that is done I can re-assign everything as it should be with no error
Functions
Functions are pieces of code that will accomplish a certain task given specified parameters. They are important since often we will want to do the same task, and a function gives us a way to reuse the same code to accomplish the task
2nd code:
This is an example of a code that will take a certain number and round it to 2 decimal places.
3rd code:
This is an even more general version of the last function, which will take a number and the amount of digits you want to round it to, and it will perform that operation and return the rounded number.
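A minimal sketch of this generalized function (round_to is an assumed name, not necessarily the homework’s):

```python
# Generalized rounding: take a number and the number of digits to
# keep, and return the rounded result. round_to is an assumed name.
def round_to(number, digits=2):
    # Python's built-in round already accepts a digit count
    return round(number, digits)

print(round_to(3.14159))     # 3.14
print(round_to(3.14159, 4))  # 3.1416
```

Note that for exact ties Python’s round uses banker’s rounding (ties go to the nearest even digit).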
Built in functions:
1st code:
The first code essentially is a display of many built-in functions that python contains which can be very useful in many cases.
Python has all sorts of built-in functions that can be used on variables, strings, lists, and many more things, which is one of the reasons why python is so popular.
2nd code:
This code is an example of using the built-in functions (specifically max), and how it can be useful for solving certain tasks.
Booleans and conditionals
Booleans and conditionals are an essential part of all languages, since they provide a way to determine whether a condition is true so that we can perform different operations based on the result
1st code:
This is an example of some of the comparators such as ==, >, and !=. These are the core of boolean and conditional statements since they are the operators that will compare the variables whichever way you need
2nd code:
This is an example of putting together the concepts of the built-in functions and boolean statements. Here I write a statement that will check whether the square of the minimum value times 10 in my list a is greater than the sum of all the elements in my list.
Since it returned false, I know that this condition is not met
Above is an example of the same code being run 3 times with different input variables. You can see how the results differ between each code since the values of a and b were changed.
This shows the power of conditionals, being able to perform separate tasks depending on the result of a certain boolean expression.
We now shift to using these conditionals in functions
Code 1:
This code will take in 2 strings and will test to see which is larger
You can see based on the results, the conditionals will make the function return different values based on this condition, as shown in the output.
Code 2:
This code is a function giving the requirements necessary to make a cake. The logic outlines what is necessary, and if you call the function and meet the required conditions, it returns true, signifying that you can make a cake; otherwise it returns false.
Modulo
Here we use the modulo operator, which returns the remainder after a division is done.
It is a very useful tool to check whether a number is even or odd, which is what I use it for in the example above.
In my code I take the list and check whether the absolute value of the lowest number is odd by using the built-in functions and modulo operator, and if the number is odd I return ‘odd’, otherwise I return ‘even.’
Here I also use what is known as a ternary operator, allowing me to write the if/else logic in a single line of code.
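A sketch of this odd/even check, with a made-up list and an assumed function name parity_of_min:

```python
# Take the smallest value in a list, apply abs and the modulo
# operator, and return 'odd' or 'even' using a one-line ternary.
# parity_of_min and the sample lists are assumptions.
def parity_of_min(values):
    return 'odd' if abs(min(values)) % 2 == 1 else 'even'

print(parity_of_min([4, -3, 8]))  # min is -3, abs is 3 -> odd
print(parity_of_min([4, -2, 8]))  # min is -2, abs is 2 -> even
```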
Lists
Here I make a function that returns the 3rd element of an array (if it exists).
I also made a 2d array, which is an array with 2 dimensions, and each value can be retrieved by first specifying the row then the column, as shown by the output of my code and the diagram below.
This type of array can be very useful in certain scenarios, such as making a game board like chess, which can be represented as an 8x8 2d array.
Here is another example of a 2d array, and this one stores information about soccer teams in order from best to worst (higher the row the worse the team).
My function returns the captain of the losing team by indexing the last row (pass in -1 as index and it will return the last item), and the 2nd item (index 1), which is where the captain is stored.
My use of the -1 allows this to extend to more general cases where you can have lists of 100’s of teams and the function will still return the captain of the worst team.
This is an example of rearranging the items in an array, with the idea that a new item in Mario Kart was created that swaps the positions of the racers in 1st and last place.
I use the concept of making a temporary variable to store the value of the last racer (again I use index -1 to get last), and then do the reassignment.
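Both ideas can be sketched together; the team and racer lists below are placeholders, not the homework’s actual data:

```python
# Index -1 reaches the last row no matter how many teams there are;
# index 1 is where the captain is stored. The data is made up.
def losing_captain(teams):
    return teams[-1][1]

teams = [["Team A", "Alice"], ["Team B", "Bob"], ["Team C", "Carol"]]
print(losing_captain(teams))  # Carol

# First/last swap using a temporary variable, as in the Mario example
def swap_first_last(racers):
    temp = racers[-1]          # preserve the last racer
    racers[-1] = racers[0]
    racers[0] = temp
    return racers

print(swap_first_last(["Mario", "Peach", "Bowser"]))
```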
Loops
Code 1:
This is a basic example of a for loop in Python. We are using the loop to iterate over an array, which means that at each iteration the loop variable holds the value of the nth element of the list. This is different from C++, where we used a variable i to count the current index and retrieved the value by calling my_list[i].
Since the return False comes after the loop, if the logic statement is triggered even once (1 value in the array is a 7), the function will return True.
Code 2:
This is an example of a while loop in Python. The syntax is very similar to C++, and the code is a very simple function that adds 1 to a number until it reaches a value of 10 or greater.
Code 3:
Here is a program I wrote, which is pretty similar to code 1, except here the logic statement uses modulo to check if the number is divisible by 7 and adds 1 to a counter called frequency, which it returns after the loop has iterated over all the elements of the list.
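A sketch of this counting loop, with an assumed function name and sample list:

```python
# Iterate directly over the elements (no index variable as in C++)
# and tally multiples of 7. count_sevens and the list are assumptions.
def count_sevens(numbers):
    frequency = 0
    for element in numbers:
        if element % 7 == 0:
            frequency += 1
    return frequency

print(count_sevens([7, 3, 14, 20, 21]))  # 3
```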
Dictionaries
Dictionaries are an important data structure, where each key maps to a certain value
Code 1:
This is a basic example of a dictionary. It shows how you initialize one using curly braces and a colon to separate each key and value. It also demonstrates one of the methods you can call on dictionaries, the ‘get’ method, which returns the value at the given key
Code 2:
This code was lots of fun. We began by defining the values for each suit, and then we made that into a dictionary of a deck
You can follow the code to see how I implemented each of the given instructions. Here I became more comfortable working with arrays, familiarizing myself with how to add and remove data and access certain data… Most importantly, this gave me the important skill of using the given link or other resources to figure out how to do these tasks in Python
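A minimal sketch of the deck idea, with abbreviated card values and an assumed structure rather than the homework’s exact code:

```python
# Suit names map to lists of cards, and dictionary methods like get
# retrieve them. The values and structure are illustrative only.
values = ['2', '3', '4', 'J', 'Q', 'K', 'A']   # abbreviated values
deck = {suit: [v + suit for v in values]
        for suit in ['hearts', 'spades']}

print(deck.get('hearts'))      # all heart cards
card = deck['spades'].pop()    # remove the last spade
deck['spades'].append(card)    # add it back
print(len(deck['spades']))     # still 7
```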
External libraries and plotting
Importing and making good use of the vast number of available libraries is one of the major benefits of Python and a major reason why it is so successful. The libraries often contain efficient code that can make many programming tasks much easier. Here we were introduced to some of the important libraries for data analysis and machine learning, namely math, matplotlib, and numpy. There are many more, such as pandas, seaborn, and TensorFlow, that are commonly used, and it is important to know how to work with these libraries
Code 1:
Here I wrote a simple example of some of the functionality of the math library. It has common mathematical values such as pi stored, and useful functions like log, gcd, and cos, as demonstrated.
Code 2:
In this code we use matplotlib to make a plot. We defined our data points, our graph, and the axis names, and it was as simple as calling a function from matplotlib to create a nice graph. That is the power of these libraries
Above is some code that uses some functionality from the numpy library to generate data and create a graph with more than just 2 variables. This demonstrates how easy it is to generate cool data using numpy functions such as randn and arange, and how matplotlib can incorporate more variables by using new dimensions such as color and size.
For my plot I decided to make a Möbius strip since they’re awesome objects! Using some math and some extra tools within matplotlib (and some help from the internet), I was able to plot a complex object like a Möbius strip in just a handful of lines of code. Without the libraries this would take many more lines of code, not look as nice, and be much less extendable. To me, this is a great display of how powerful it can be to make good use of the Python libraries
Example with real dataset
This is an example of plotting with real data taken from the Dimuon dataset. We collect the data by using pandas to read the csv file from the GitHub page, then we do some operations on the dataset, such as looking at the first couple of rows, adding a couple of new columns, and plotting a histogram of the total energy in 10 bins within the range 0-120. Here you can see how easy it is to quickly get a grasp of a dataset and do some analysis.
Homework 9½: 11/14/19
We finally got to writing deep learning networks! We worked with one of the most popular datasets for image recognition, the MNIST dataset, and wrote a neural network that would attempt to classify these images to their corresponding value. Below is a commented script containing the code: