HONR269L Logbook (Spring 2021): --------------------
Projects Summaries:
Kaustubh Agashe (Top Quark Studies):
One very interesting question that physicists ask today concerns the Planck-weak hierarchy; that is, why is there such a huge disparity between the masses of the lightest and heaviest particles, and between the strengths of the strongest and weakest fundamental forces? This question is unanswered, and many experiments seek to address it. The top quark, as one of the heaviest known particles, is a very strong candidate for shedding light on the strange physics of extremely heavy subatomic particles. By surveying the top quark's properties, and by searching for irregularities in its production and decay modes, physicists hope to gain some insight into the Planck-weak hierarchy.
IceCube Observatory (Neutrinos):
The IceCube observatory is a very large cosmic particle detector out in Antarctica dedicated to the detection of neutrinos. Neutrinos are ghostly particles that rarely interact with regular matter at all. Detecting them, then, let alone determining where they're coming from and how they were produced, is a tricky business.
The IceCube data analysts use signals detected from over a cubic kilometer of instrumented ice to look for collisions between neutrinos from space and the atoms of the ice, calculating angles and energies to trace the neutrinos back to various kinds of astronomical sources in order to better understand the physics of these particles.
LZ Next Generation (Dark Matter):
The universe is made up of a large amount of massive stuff, and yet we don't know what the majority of it is! Dark matter is a mysterious phenomenon that causes galaxies to appear to have more mass than can be accounted for by the matter we can see in the EM spectrum. The LZ experiment plans to use xenon-based detectors to look for interactions between the xenon and WIMPs, or weakly-interacting massive particles, in order to place constraints both on how much dark matter exists in our solar neighborhood and on what exactly dark matter is. By providing higher sensitivity than any previous dark matter search ever conducted, LZ hopes to be the definitive experiment: either WIMPs will be detected in our own neighborhood, or they won't be and some new insight will be needed to explain the phenomenon.
Week of 3-4-2021 Update: I have chosen to work on the LZ dark matter experiment with Professor Bhatti, and we've already had our first meeting to start discussing things. First, some explanation of the LZ experiment: it is a dark matter detector that essentially tries to detect very weak interactions between dark matter particles and xenon atoms, the xenon all sitting in a big tank. These interactions produce two kinds of signals, S1 (prompt scintillation light) and S2 (light produced when ionization electrons are drifted up out of the liquid), and by looking at the ratio of the two you can (usually) tell which kind of event you're seeing in the detector. You can also tell some other things, for example how many times a particle interacted with the detector, which is useful because dark matter should only interact once in the detector, given how weakly it interacts.
Now, as for what we're doing this semester: as I understand it, we're going to be using ROOT a lot to try to recreate some results from a paper on the sensitivity of the LZ experiment, and we'll also be digging into some C++ code under the hood to do it. We started looking at the code last week. From what I can tell, each event basically has its own data structure, carrying a bunch of information about how the particle came in and what energy it had, along with what the detector saw from it. The code then loops through all such events, extracting information where needed and making calculations and plots, and eventually writes out a bunch of histograms at the end of the loop. We're going to start trying to make some plots of our own tomorrow using these files, mostly by modifying "WIMP_DataRQ_V3.cc" and the code in it.
Week of 3-11-2021 Update: We've done a good bit of work this week. The code is a bit hard to understand, especially because it's entirely uncommented, but our professor asked us to start by running the analysis on multiple different datasets (i.e., generate the plots for a "background" dataset, an "H3" dataset, and the original dataset we started with). That turned out to be pretty easy to do. There's a line near the top that sets a string variable, inputDir, which we can change to point at any other dataset we want. In doing so, you also need to change the output: there's another string near the top, "outRootFileName", that you should change so you don't overwrite any of your previous work while doing new analysis. Once we did this, it was as easy as re-running the code to get new results. We inspect these results using a TBrowser within ROOT, which lets us look through all of the plots generated by the code. The biggest problem we're having right now is that there are some plots we want to make that aren't already generated by the code, which means of course that we need to add code to generate them. But when we try to copy the lines that generate other histograms and write our own, ours usually come out empty or don't get generated at all. It's a pretty strange problem - we're going to keep looking into it.
Week of 3-25-2021 Update: Some big progress on the front of generating plots. Along with the .cc file there is also a .hh file with more C++ code. The way it works is that the .cc code calls a function, "BookHistograms()", which actually lives in the .hh file; this is the key function that creates new histograms in the first place and readies them to be filled with data. We weren't looking at the .hh file at all, so we didn't see any of that until recently. When we were trying to make histograms ourselves, they came out empty because we never booked them in BookHistograms(), so the program never knew where to put the new data.
With that solved, we started analyzing the plots. The first one we looked at was "ParentPositionXYAll", which can be made without adding any code. It looks basically like a bunch of dots in a circle, which makes sense: from a top-down view of the detector, you would expect a basically even distribution of events within the detector. We also looked at "ParentPositionRSqrZAll", which plots horizontal distance from the center of the detector against vertical position, and this also shows a basically even distribution. One thing we're going to look into is how to make different kinds of plots, like heatmaps or color plots, within ROOT - I wasn't able to find any easy way to do this. One last exciting plot we looked at, for the H3 set, was "RawS1Photons_logRawS2Photons_ss", with the final "ss" indicating single-scatter events only. This plot is very exciting because it is essentially the big plot our group is trying to recreate, and it seems to match what the original paper published very well: it has the same shape and lies along roughly the same line. From here, we need to find a way to add mean lines and confidence intervals to these plots so we can get some numbers out of them, and we still have a couple more plots to generate by hand.
Week of 4-1-2021 Update: We're having some strange trouble getting the background dataset to work properly. Its energy plots all come out very messed up: a bunch of events sit right at zero, which shouldn't happen - every event should have some parent energy. The background dataset definitely has data in it, since we were able to produce position plots for it and those looked as expected. But whenever we try to plot anything to do with energy, it comes out full of zeroes. We also tried going into the code and skipping any zero-energy datapoints before they reached the end of the loop and got plotted, using a "continue" statement inside an if statement that said if(parentEnergy < 0.06), but even this didn't work. Our current guess is that the parent energy gets calculated in more than one place, so we can't just filter it out once in the loop. It's a very thorny problem because we aren't having these issues with the other two datasets, the H3 and original sets. Another thing we don't really know how to explain is that the H3 dataset, which is supposed to be simulated data as far as we can tell, matches what we'd expect from background much better when we plot raw S1 vs. log raw S2.
HONR268N Logbook (Fall 2020): -----------------------
9-16-2020: The previous week's Wednesday class was spent primarily exploring commands for Linux. A summary of some highlights is listed below:
pwd - print current directory
ls - print subdirectories and files of current directory
(a quick note on some symbols: . is a shortcut for current directory, .. is a shortcut for parent directory, ~ is a shortcut for home directory. So ls ~ will list subs and files in the home directory, no matter where you are)
ls -l lists things with more detail (file sizes, owners, etc.)
cd - change directory (eg. cd ~ takes you to home)
mkdir - make a directory -- rmdir - delete an empty directory -- rm - delete a file -- rm -r - delete a whole directory, and everything in it (BE CAREFUL)
echo - prints what you tell it to print back into the terminal
(you can use the up arrow to access earlier commands that you typed. You can use TAB as a sort of "auto-complete" on file names and commands)
[command name] --help - displays some info on how to use the command (Google is more helpful for simple commands though)
find [directory] -name "[pattern]" - searches the directory for files whose names match the given pattern (note: -name actually takes shell-style glob patterns with * wildcards, not full regular expressions)
(the * character, as in "file*", acts as a wildcard that can be anything. So find . -name "file*" will find any file in directory . with a name that starts with "file", and then anything after that (like file1 or fileSecret))
touch - makes a file with nothing in it (probably useful for scripts)
more - read text in files (eg. more log.txt)
cat - print text in files to console
(The > and >> characters can be used to send the output of a console command into a text file. For example, echo "test" > log.txt will put test, the output of echo "test", into a file called log.txt. If you use >, this will overwrite whatever was in the file to begin with. If you use >>, the new text is simply concatenated onto whatever was there already, and nothing is deleted)
(The | character is a "pipe", sending the output of one command to be the input of another command. For example, echo "test" | more runs echo "test", whose output, test, gets piped into more, which prints test to the console. On the other hand, (echo "test" > log.txt) | more will output nothing, because (echo "test" > log.txt) sends its output into log.txt rather than to the console, so nothing is piped into more and nothing is read out to the console)
cp - copies a file from one place to another. So cp log.txt ~ sends a copy of log.txt to home
mv - moves a file from one place to another.
grep - finds all lines with specified text within a file, as in grep "3" log.txt (returns all lines of log.txt with a 3 in them)
chmod - a command with many different flags that changes the read, write, and execution permissions of a specified file. Can only be done on your own files
(use ctrl + c to kill a command)
9-21-2020: We have begun to learn about tcsh scripting, including writing simple scripts. When you have a script "testScript.tcsh", then to run it you should run the command
tcsh ./testScript.tcsh <argument 1> <argument 2> . . .
./testScript.tcsh is a path pointing to your script (. means current directory, remember). The arguments can be accessed in your script with lines like set X = $1, set Y = $2, etc.
A tcsh script is basically a list of standard Linux commands run in sequence, meaning that all the standard commands (echo, touch, > and >>, |, etc.) will be useful for tcsh scripting.
The command "emacs" allows you to edit text documents, and tcsh scripts, as in emacs firstScript.tcsh & to edit a tcsh script. Your script should start with:
#!/bin/tcsh
This "shebang" line, which should start every tcsh script, tells the system to use tcsh to interpret the rest of the file.
You can then use standard commands of linux, along with variables $1, $2 (for passed arguments) and set X (to make variables) in order to start making a script that does something useful. One thing to note is that your script has to actually "output" something, whether by altering a file or by printing something to the console. If your script goes and searches a directory for some files, but you don't tell the script to tell you where it found those files (with "echo" or the > operator) then your script will do everything silently and you will gain nothing from it.
One more tip, and then I'll log some scripts I made for homework: if you're stuck trying to do something or you don't know how to use a command, Google it. By Googling examples of how a command is used, you'll figure out the general pattern of how to use it much faster than by trial and error. If you don't know how to use the "cp" command, for example, a simple Google search will show you that you have to say cp, then a file you want copied, then a directory to which you want it copied.
9-23-2020:
9-30-2020: For Homework 4, we have started learning some C++ coding (which, as a Java person, I am much more comfortable with than all this tcsh stuff). I have attached my 5 homework solutions here, with every line commented.
Rather than insert all those images though, I instead placed them all in a Google Drive folder. The link is here, and should stay open for several years at least: https://drive.google.com/drive/folders/11r14fHGiTmJHa2t1uLihDIuNSTK0_Pk6?usp=sharing
For the physics homework, we were asked to research why muons travel farther in a detector than most other particles. Essentially, it is because muons leave collisions inside particle accelerators with very high momenta, which means relativistic time dilation applies. In the lab frame, the muon's internal clock runs much slower than the observer's, and thus the muon can travel much farther before decaying than one might classically expect given its lifetime. In addition, the high momentum means the muons are hard to stop, and they penetrate far into the detector layers.
Reference: http://www.physics.smu.edu/cooley/phy3305/lectures/muon_slides.pdf
10-14-2020: Homework 5 asks us to use if statements and while loops to make something. I decided to create a program that outputs the numbers from 2 to numsToTest and determines whether each of these integers is prime or composite. The code is copied below; it uses a very basic form of trial division, testing each number n by dividing it by every candidate divisor from 2 up to n/2:
#include <iostream>
using namespace std;

int main() {
    int numsToTest = 100;  //Upper bound on numbers to test
    int n = 2;             //The current number being tested
    bool prime = true;     //Stores whether it's prime or not
    while(n <= numsToTest) {
        for(int i = 2; i <= n / 2; i++) {
            if(n % i == 0) {
                prime = false;  //Not prime if it has a divisor
                break;          //End the loop, we don't need to go further
            }
        }
        if(prime) {  //If the number weren't prime, prime would be false here
            cout << n << " is prime." << endl;
        } else {
            cout << n << " is composite." << endl;
        }
        n++;           //Move to next number
        prime = true;  //Reset prime
    }  //end while loop
}
We are also asked to modify a script and make it do something different. I'll explain some of the problem solving process here:
First, the new problem asks us to make the script find files modified in the last month. Since the day is irrelevant, I deleted all the lines that had anything to do with the day (tmpday and today are the relevant variables). Next, we need a way to figure out whether a file was modified in the current month. Examining the output of the command ls -lat in your own home directory, you may notice that the date of last modification is printed in the ls -lat output. So by piping "ls -lat" into a grep for "$month", we can find every line in the ls -lat output that has the current month somewhere in it. By grepping for "$month " specifically, with a space afterwards, we eliminate the off chance that a file name has, for example, "Oct" somewhere in it and throws off our search. But this would also show us files that were modified long ago, such as a file modified Oct 2018. We notice that older files (more than about six months old) are listed with the year, whereas more recent files are printed with a timestamp, like 10:58. So we can insert another grep for ':', a single colon, which would usually only appear as part of a timestamp. Thus, the following script will search your directory for files modified in the last month.
#!/bin/tcsh
set month = `date | awk '{print $2}'`
echo $month
ls -lat | grep -E ':' | grep -E "$month "
11-1-2020: Each question of HW 6 will be taken in turn:
Q: How are the elements distributed in LL[][]? The output tells us that LL[0][0] = 1, LL[0][1] = 2, and LL[0][2] = 3, so the first row, "row 0", holds the numbers 1 to 3 in order (in C++, the first index conventionally selects the row and the second the column, and a 2D array is stored row by row). The output then suggests that the second row, "row 1", holds 3, 4, and 5 - but the program in fact has a bug, where the bounds of the two loops are swapped; if you correct this, you find the second row is correctly 4, 5, and 6 instead. So elements assigned to a 2D array from a flat list fill the first row completely, and then are progressively placed in the later rows.
Q: What does '&' do in the random number generating code? What happens if it is removed? The & makes the parameter a reference, which changes how the variable is passed into the function. Normally, a function won't directly manipulate the caller's variables: instead, it receives a copy of each argument's value and manipulates the copies as needed (pass by value). So with getRanFlat(int inew), we create a variable inew, and the function gets only a copy of it, meaning inew itself never actually gets set to a random value. With getRanFlat(int& inew), we allow getRanFlat to manipulate our variable inew directly (pass by reference), and in particular to set its value to a random number. If you run the program without the ampersand, you end up getting only zeroes for your random numbers, because inew stays at zero and is never changed the way getRanFlat is supposed to change it.
Q: Why can we count only up to 2^(8n) - 1 if we count with n bytes of memory? Suppose we have n bytes, and thus 8n distinct bits, of memory to manipulate. We wish to create some function f from bitstrings to integers which the computer will use to interpret memory as a number. f must be one-to-one, for the simple reason that it must be unambiguous at read time which integer a bitstring represents, and unambiguous at write time which bitstring to write for a given integer. With n bytes, or 8n bits, simple combinatorics shows there are 2^(8n) distinct bitstrings that can be written. Suppose we also want f not to "skip" any integers in its assignment of bitstrings: it must assign a distinct bitstring to each integer starting at 0, with no unassigned integers between two assigned ones. Then the greatest number that can be assigned a bitstring is 2^(8n) - 1, because it takes all 2^(8n) distinct bitstrings to cover the integers from 0 to 2^(8n) - 1, leaving nothing to assign to 2^(8n).
Q: What do you notice about these "pseudo-random" numbers? What happens if the seed changes? Repeated runs reveal that the program actually puts out the same set of "random" numbers every single time you run it, because a computer can't create true randomness out of a deterministic algorithm. You can change the starting seed, though (or the generator's constants a and c), to make a different set of random numbers come up. If you run the loop of random numbers many more than 10 times, say 10,000 times, you'll notice that the histogram slowly flattens into a nearly uniform distribution. This is the Law of Large Numbers at work: since every number has an equal chance of being generated, as you generate more and more of them, differences in the occurrences of each number become relatively smaller, and the frequencies get closer and closer to each other.
11-14-2020: The CMS detector has several major components. The beam pipe at the center is surrounded by many cylindrical layers of detectors with various purposes. The innermost is the silicon tracker, which is used to help locate the exact vertex (or collision point) in any given run of data. We then have the electromagnetic calorimeter, which as the name suggests is used to stop and measure the energies of electrons and photons leaving the barrel. Beyond that is the hadron calorimeter, the part used to detect hadrons (particles composed of quarks). Outside of that is the solenoid, the superconducting magnet whose field bends the paths of charged particles so that their momenta can be measured. Then the final major layer, around the magnet, is the muon chambers, used specifically to detect muons, which travel a very long way through normal matter.
The buttons along the top of the screen that have the three axes labelled with x, y, and z will align you with those axes, showing you the view down the beam pipe or perpendicular to the detector.
Electron tracks are shown in light green; muon tracks in red; missing energy in purple; jets are represented by orange cones coming from the vertex.
When looking at the Hto4l_120-130GeV.ig file, we can start looking at the first events. In the first one, I see the following:
1. 4 electron tracks
2. 2 muon tracks
3. 4 photons
4. 18 (?) jets (hard to count because they overlap - in fact the program says there are 49)
5. 1 missing energy track
6. You have to zoom way in to see the vertices; I count 7 of them
We can then move to the second event, which has:
1. 2 electron tracks
2. 4 muon tracks
3. No photons
4. 16 jets
5. 1 missing energy track
6. 11 vertices
Important to note: The program sometimes glitches and doesn't display all of the vertices or tracks at once. You should wiggle the camera around to make sure that there aren't any disappearing and reappearing vertices that you might not be counting.
12-1-2020: For HW 8, we're investigating how to rediscover the Higgs. Pasted here are two histograms we computed:
These are relatively easy to compute using the HiggsAnalysis.C file (that is the file created by our MakeClass("HiggsAnalysis") command - the command generates a new class, in this case a self-contained analysis program, for us to use). One thing to note is that you can change the file name your results are saved under by altering the line where the output file is opened - where it says "Dielectron_MC.root", you can switch that to something else. That way your old results are not overwritten.
Also, try making the above histogram on the el3, el4 second Z candidate. It will give you a different result, which is interesting! The reason is that the particles are listed in order of pt in the program.
12-8-2020: The CPU is a general-purpose central processing unit used to do number crunching largely sequentially. A Graphics Processing Unit is a specialized, highly parallel computing device that can do thousands of computations at once, making it very useful for certain applications like graphics and machine learning.
https://colab.research.google.com/drive/14j8ClmZV_3NJ4NZevf5fgTzkqv3mdPGG?usp=sharing This link will take you to a Colab notebook of mine that implements all of the Python code examples of homework 9.