1/26/17
First class of second semester.
2/15
Met with Dr. Blaufuss today for the first time.
-Python should run on Mac without running a virtual machine
-NumPi library for calculations/cuts using Python
-Still need to read up more to determine what to study
2/28
-Set up NumPy and Matplotlib (It's not NumPi)
In order to actually use NumPy with Python, the library first has to be downloaded onto your computer
-Go to the downloaded NumPy folder in Terminal / Command Prompt and run the command python setup.py install
-Then a wall of processes floods the screen as all of the appropriate stuff is built and installed for ~2-5 minutes
Matplotlib is then downloaded after NumPy. The Matplotlib website provides a long list of binary distributions to download, but I took advantage of the method of downloading "for the lazy"
-Run the following commands in Terminal, in order:
curl -O https://bootstrap.pypa.io/get-pip.py
python get-pip.py
pip install matplotlib
-Installing Matplotlib does not take nearly as long
A Note from the future:
Once pip is installed, any future python libraries can be easily installed with the command
pip install <library name>
pip also has various other commands, such as pip list, which lists all installed python libraries.
3/1
-Accessed the sample data with the Python shell application IDLE
-In order to access the data, I had to use relative file paths (./filename instead of ~/Documents/filename), which required me to actually find which folder IDLE was operating from
-To do so, I imported the os module and used os.getcwd()
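As a quick sketch, the check looks like this (run from inside IDLE or the python command line):

```python
import os

# Print the directory that Python is currently operating from, so that
# relative paths like ./filename can be resolved against it.
print(os.getcwd())
```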
Note from future: Python can also be run as a command line by typing python in Terminal. I found this to be much more convenient, as I would be doing all of the rest of my work in Terminal anyways.
3/8
- Successfully managed to make simple plots from the data we were provided
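A minimal sketch of the kind of plot I mean, with random stand-in values since I'm not reproducing the actual data file here:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt

# Random stand-ins for event coordinates; the real values came from
# the sample data we were provided.
ra = np.random.uniform(0.0, 2.0 * np.pi, 1000)       # right ascension
dec = np.arcsin(np.random.uniform(-1.0, 1.0, 1000))  # declination

plt.scatter(ra, dec, s=2)
plt.xlabel("RA (rad)")
plt.ylabel("Dec (rad)")
plt.savefig("events.png")
```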
3/16
- A largely uneventful week. Juan met up with Paul, one of the students from last year, to talk about the legacy code that they had made
3/22
- Spring Break. I ended up not doing any work while off campus.
3/31
- Juan and I took the code from last semester, as posted on Paul's page, edited it to work without IPython (which automatically imports several of the modules that we use), and ran it for 2500 trials.
One of the weirder errors that we encountered was with typecasting. At several points in the code, the value of a variable "lamb" is set to itself plus "lambn". This should be achieved quite simply with the line:
lamb += lambn
But that produces the below error.
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
However, by instead using the line
lamb = lambn + lamb
we did not encounter any problems, since out-of-place addition creates a brand new float64 array rather than forcing the float result back into lamb's original integer array.
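The whole issue can be reproduced in a few lines, assuming lamb started out as an integer array (which matches the error message):

```python
import numpy as np

# Reproducing the casting error: an int64 array cannot absorb float64
# values in place under NumPy's 'same_kind' casting rule.
lamb = np.zeros(3, dtype=np.int64)
lambn = np.ones(3, dtype=np.float64)

try:
    lamb += lambn  # in-place ufunc add: raises the TypeError above
except TypeError as err:
    print("in-place add failed:", err)

# Out-of-place addition allocates a new float64 array and rebinds the
# name lamb to it, so no cast back to int64 is ever attempted.
lamb = lambn + lamb
print(lamb.dtype)
```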
4/6
- We began talking about what to do next. Erik recommended looking at specific points in the sky based on the research done by the HAWC collaboration
4/13
-Talked about the HAWC paper. Erik accidentally sent us the wrong paper initially (oops), so we initially thought we were looking at 12 points when we were actually looking at 40
-However, some of these points are close enough to each other that they will be placed in the same bin, effectively decreasing the number of points that we will be looking at
We will need to alter the code that we already have to specifically look at these points. Doing so will increase the significance of our p-values, since we are looking at a subsection of the sky instead of all of it
There are two methods for evaluating the p-values of a subset of points. The first is the method that we used previously: looking at the p-value of each individual point. The second is to 'stack' the expected and actual values, treating the whole subset as a single point
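A rough sketch of the two methods, using scipy.stats.poisson and made-up expectation values (the real ones come from the legacy code's expectation array):

```python
import numpy as np
from scipy.stats import poisson

# Placeholder per-bin expectations and observed counts at the bins
# of interest; these numbers are invented for illustration.
expected = np.array([2.1, 0.8, 1.5])
actual = np.array([4, 1, 2])

# Method 1: one p-value per point, P(X >= actual) at each bin.
# sf(k, mu) gives P(X > k), so pass actual - 1 to include equality.
per_point = poisson.sf(actual - 1, expected)

# Method 2: stack the bins, summing expected and actual counts and
# treating the subset as a single effective point.
stacked = poisson.sf(actual.sum() - 1, expected.sum())
```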
4/20
- We talked with Erik to make sure that our updated code actually did what it was supposed to. Last weekend I took all of the data points from the HAWC paper and put them into a numpy array, where the declinations can be accessed by ['dec'] and the right ascension can be accessed by ['ra']
- We took last year's code as a base and altered it to specifically look at the bins that contained point sources from the HAWC paper
-We first defined a secondary skymap of all zeros and added a 1 to any bin that contained a HAWC source
-We initially had a problem where only two of the points would be entered into the skymap, as the HAWC data was listed in degrees but the healpy skymap bins positions based on radians. We chose to convert the HAWC data to radians in the code, not in the initial array
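The array looks roughly like this (the coordinates below are placeholders, not the real HAWC values):

```python
import numpy as np

# Structured array holding the source positions in degrees, so that
# fields can be accessed by name as sources['dec'] and sources['ra'].
sources = np.array([(11.1, 290.3), (-17.9, 277.5)],
                   dtype=[('dec', 'f8'), ('ra', 'f8')])

# healpy expects angles in radians (e.g. when calling hp.ang2pix), so
# the degree values are converted in the code, leaving the array as-is.
dec_rad = np.radians(sources['dec'])
ra_rad = np.radians(sources['ra'])
```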
4/27
-A whirlwind of a week. We thought that we had all of the code working correctly. Juan and I sent the code to Blaufuss, then realized that we didn't include code to actually save the stacked p-values. Upon realizing that was the case, I changed the code to only save stacked p-vals (since the regular method worked correctly), only to find that the array was saving all ones, implying that the poisson distribution was so far off that it was giving a 0% probability. After making a few changes, it instead gave a value of 0, making the code crash upon attempting to graph the histogram of the log of the results.
So what went wrong?
Essentially, I was trying to oversimplify the process of calculating the stacked p-value. I was taking the values of the shuffled skymap at the relevant bins as the expected values, and the number of events in the unshuffled bin as the actual. First I miscoded the latter, trying to call a specific value from the data instead of summing up the number of points at that bin. Then I realized that I should be using the expectation array that was created in the legacy code, but I initially did not use it correctly, resulting in the value being off from what it should be by a factor of about three. It was at that point that I realized I needed to essentially copy the code from the other p-value calculations and manipulate it to cumulatively sum multiple actual and expected values before calling the poisson distribution function.
Final week
- I went in to talk with Erik one last time about what exactly the p-value injection code does, as I was still somewhat unsure of what the code was doing and whether that was impeding my progress in updating it.
Beforehand, I had altered the code to presumably produce values, but I wasn't sure what the code was recording, much less whether it was correct. Even more confusing to me was the fact that the stacked p-value injection took 8 hours to run 50 trials while the minimum p-value code took 10 minutes to run 100. Once I fixed a tabbing error in the stacked injection, I instead ran into an error that I have yet to solve.
So the injection code is based on the unscrambled p-value of the data points. We take this p-value and set it as a threshold. We then look for the number of additional events at a point of interest that are required in order to produce a more significant p-value than the unscrambled one 90% of the time. From this number of events, we can compare against the sensitivity of the detector to officially determine the statistical significance of the points of interest.
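A toy version of that logic, where every number (the background expectation, the threshold, the trial counts) is a made-up stand-in for what the real code uses:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

def trial_pvalue(expected, n_injected):
    # Observed counts: a background fluctuation plus injected events.
    observed = rng.poisson(expected) + n_injected
    return poisson.sf(observed - 1, expected)

expected = 3.0    # assumed background expectation at the point
threshold = 0.01  # stands in for the unscrambled p-value

# Scan upward until at least 90% of trials produce a p-value at least
# as significant as the unscrambled threshold.
for n_injected in range(50):
    trials = np.array([trial_pvalue(expected, n_injected)
                       for _ in range(1000)])
    if np.mean(trials <= threshold) >= 0.9:
        break
print(n_injected)
```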
The stacked injection code had an error where it would produce an array of length 124 at a point where it should have a single value. Worse, it later sets the same variable to an array of length 128, preventing the values from being summed together. A command exists to reshape arrays, but as I am unaware of why these arrays are being made, reshaping may still provide values that are not correct in the slightest.
The other injection code does not have similar problems, but instead provides a p-value of 0 each time, meaning that somewhere the calculation is broken.
***************
Post-mortem
***************
Unfortunately, we were unable to do everything that we wanted to do by the end of the semester, though I will try to see if I can make additional progress in the future. Probably the biggest obstacle for me was my lack of a complete understanding of what the code was doing for a majority of the semester. I understood the code well enough to explain what it could do to others, but I had never fully understood the way that the expectation array calculated the expected values at a point. Below I will provide the code that we have worked on for the semester.