This page contains research notes and documentation on work for Icecube in the Spring Semester of 2016. A presentation of nice pictures and whatnot can be found here.
IceCube is a research collaboration of about 300 physicists (roughly a tenth the size of the CMS collaboration). It consists of an array of detectors at the South Pole that sense incoming high energy particles (specifically neutrinos). The detector array takes up a cubic kilometer of ice, extending from 1450 to 2450 meters below the surface. It also has a surface array (IceTop) that collects data on the cosmic rays hitting the surface of the ice. It took seven years (from 2004 to 2010) to build the whole detector; each hole took about 48 hours and over 18,000 liters of fuel to drill, using a 25,000 pound water hose. In the few years that the observatory has been operating, IceCube has confirmed the existence of high energy cosmic neutrinos, studied neutrino oscillations, searched for sources of high energy cosmic neutrinos, searched for dark matter, and performed incredibly detailed studies of dust impurities in the Antarctic ice.
Neutrinos interact only through the weak force, so they rarely interact with matter at all; this is why the detector needs such a large volume of ice. They are everywhere (the cosmic neutrino background is at about 1.9 K), but they only interact by exchanging a Z or W boson (the Z is neutral, the W is charged). When the exchange involves a W boson, a neutrino can create its partner charged lepton (an electron, muon, or tau) moving incredibly fast. If that lepton is going faster than the speed of light in ice (about .76c), it emits Cherenkov radiation, which is picked up by the DOMs (digital optical modules) spread throughout the ice. The DOMs contain photomultiplier tubes and transmit the data, with a time stamp, back to computers to be recorded. Based on the propagation of the particle and its radiation, we can then work back to the original momentum of the neutrino.
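As a quick sanity check on those numbers, here is a back-of-the-envelope calculation in python; the only input is the refractive index of South Pole ice, roughly 1.31 (an approximate value I'm assuming, not something taken from the IceCube data):

import numpy as np

n_ice = 1.31                      # approximate refractive index of deep glacial ice
v_threshold = 1.0 / n_ice         # Cherenkov threshold speed in units of c
print(v_threshold)                # ~0.76, the ".76c" quoted above

# for a relativistic lepton (beta ~ 1) the light comes out on a cone at
# cos(theta) = 1/(n*beta), which in ice is about 40 degrees
print(np.degrees(np.arccos(1.0 / n_ice)))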
We met with Erik Blaufuss to discuss ideas for our work this semester. He suggested we work on either reconstructing particle tracks or binning events by origin location. The first is important to every part of the experiment: you can't do anything if you can't identify what the particles in your detector are doing. A great resource on track reconstruction can be found here (section 4.2.2). The second, binning events by origin, is very useful in the search for point sources of neutrinos. The idea of looking for point sources is that we want to know more about what types of objects in the universe are capable of making these high energy particles. A recent paper on the status of point source searches can be found here, and a paper on the methods involved in the search can be found here. Additionally, we gained access to the IceCube internal wiki, whose link is here.
Also, here's a fun paper that IceCube published following the detection of the first gravitational waves. It briefly reports that IceCube and ANTARES (a similar neutrino observatory in the Mediterranean) did not see any neutrino events in the region of sky where LIGO saw its gravitational waves.
We set up a weekly time to meet with Erik. The next meeting is on the 25th, so we did not have a chance to meet him this week. I looked into different map projections to figure out how they work so I will be able to plot histograms of the celestial sphere. One common choice is the Mollweide projection (more math can be found here). WMAP used it to plot the cosmic microwave background and look for anisotropy, much as IceCube plots events to look for point sources. I couldn't find the specific projection IceCube uses most often, though I suspect it is the Mollweide (which preserves area well but distorts shape). More important, though, is the coordinate system behind the projection, and I don't yet know exactly which one they use. Neither the Earth's equator nor the Galactic plane is lined up with the skymaps (which they call "equatorial skymaps" here).
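To get a feel for the projection before we have real data, here is a minimal matplotlib sketch; the points are random placeholders, and the built-in "mollweide" projection wants longitude in radians from -pi to pi and latitude from -pi/2 to pi/2:

import numpy as np
import matplotlib.pyplot as plt

# fake isotropic sky points, just to see what the projection does to them
lon = np.random.uniform(-np.pi, np.pi, 1000)
lat = np.arcsin(np.random.uniform(-1.0, 1.0, 1000))   # uniform on the sphere

ax = plt.subplot(111, projection="mollweide")
ax.scatter(lon, lat, s=1)
ax.grid(True)
plt.savefig("mollweide_test.png")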
We finally got data from Erik Blaufuss (dropbox link here). It's a numpy file called "IC86_exp_all.npy" which I believe contains all of the IceCube data from 2012. To access it we have to use numpy in python, so we decided to use VirtualBox, install a lightweight Ubuntu, and put our python install on that. There are also several other modules we decided to add to get things rolling. We are actually using IPython because it provides a better and more convenient environment than plain python. We also want PyTables to handle the large tables of data we will get, and matplotlib to graph all that data. Once everything is installed, we need to be able to plot a histogram of the events recorded at each zenith and azimuth angle.
Note: I had trouble getting a 64-bit virtual machine started; it turns out the solution was to go into my computer's BIOS and enable the hardware virtualization setting, and then it worked.
The commands to install these packages are:
sudo apt-get install ipython
sudo apt-get install python-numpy python-scipy
sudo apt-get install python-matplotlib
Then we can type "ipython --pylab" and get started doing things in python! Here are some commands we need:
data = np.load("./IC86_exp_all.npy") # this loads the npy file into data so we can manipulate it
import matplotlib.pyplot as plt # this lets us reference pyplot as plt
h1 = plt.hist(data["zenith"]) # plot a distribution of the zenith angles in this data, though we really need more bins, so let's do:
h1 = plt.hist(data["zenith"], bins=100) # much better!
h1 = plt.hist(cos(data["zenith"]), bins=100) # it's kind of hard to think in radians, this is easier
This is what we get. The -1 corresponds to going straight up (if you are standing at the South Pole) through the Earth; 1 corresponds to coming straight down. The difference in shape between the two sides comes from different backgrounds: on the left (up-going) the counts drop because the Earth absorbs more neutrinos, and on the right (down-going) they drop because we have to cut out a lot of atmospheric muon background.
Skymaps from IceCube
Coordinate System
Erik drew us a picture that describes how the zenith and azimuth angles as well as the right ascension and declination work in this data.
h1 = plt.hist2d(cos(data["zenith"]), data["azimuth"], bins=100) # now we can do some 2d histogram plotting
This is what we see. Horizontally we see the same distribution as above (that's the zenith angle), but vertically the distribution is much more homogeneous. That means the number of neutrinos the detector sees depends very little on which direction, tangent to the surface at the South Pole, you look. The exception is a few spikes on the down-going side. Let's take a closer look.
h1 = plt.hist(data["azimuth"],bins=100)//let's look at the azimuthal histogram
This is interesting. We see 6 very distinct spikes in the number of events, evenly spaced in azimuth angle. This only happens to the down-going data, as we saw above. My guess is that the detector's hexagonal grid puts some directional asymmetry into the data. I wonder if they should have cut this out of the final data, but they know what they're doing.
Continuing from that histogram of the azimuth: the reason we get those 6 bumps is that tracks that go right down a row of DOMs get better sensitivity than others. This is why we use right ascension instead of azimuth: azimuth is fixed to the orientation of the detector, while right ascension stays constant relative to the stars as IceCube rotates underneath them. So when we plot right ascension (via "h1 = plt.hist(data["ra"],bins=100)") we get:
and that looks much more uniform. Then we can plot the 2d histogram again in these coordinates (via "h1 = plt.hist2d(cos(data["zenith"]),data["ra"],bins=100)") and get this:
Now that looks more true to life. The next step is to plot this in a Mollweide projection. We can use healpy (sudo apt-get install python-healpy), which does a good job of indexing the sky so that each pixel covers an equal area of sky.
You know, this code is getting a bit complicated. We should make it so we can do all this in one step. I hear shell scripts coming on! To make one, we type "touch scriptname.sh" and then edit it with "emacs scriptname.sh".
Fun emacs shortcuts:
C-<char> means press Ctrl and that character at the same time; M-<char> means press Alt (Meta) and that character.
C-x C-c ends the session
C-g quits a command
C-v is page down, M-v is page up, C-l (the letter L) centers the screen on the cursor
C-p (previous) moves up one line, C-n (next) moves down one line, C-b (back) moves back one space, C-f (forward) moves forward one space. These replace arrow keys
M-f and M-b move whole words at a time forwards or backwards
C-a and C-e move to the beginning and end of a line. M-a and M-e move to the beginning and end of a sentence
Here I wrote a shell script which calls a python program to do everything we just talked about in one line: I type "./2dhist_radec.sh" and it runs. Admittedly this is not super elegant. It also has the problem that I can't get the histogram window to stay displayed; it appears then disappears immediately, so I worked around it by saving a png of the plot and then opening it with eog ("sudo apt-get install eog"). Oh, and you'll need to make the script executable with "chmod 777 <scriptname.sh>".
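For reference, the python side of that script amounts to something like this (a sketch with guessed file names, not the exact attached code); the .sh wrapper then just needs to run python on it and open the png with eog:

# 2dhist_radec.py (name is a guess): make the RA vs cos(zenith) histogram and save it
import numpy as np
import matplotlib
matplotlib.use("Agg")              # render without a display so the window can't flash and vanish
import matplotlib.pyplot as plt

data = np.load("./IC86_exp_all.npy")
plt.hist2d(np.cos(data["zenith"]), data["ra"], bins=100)
plt.xlabel("cos(zenith)")
plt.ylabel("right ascension (rad)")
plt.savefig("radec_hist.png")      # the shell script then opens this png with eog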
Of course, this still has the problem that we are plotting onto a rectangle, and the sky is a sphere. We like spheres. Let's plot more sphere-ey. Luckily healpy is a great library for plotting Mollweide projections of data. I used Erik's code here to do this (we can get syntax highlighting in emacs by typing "M-x python-mode"). That code looks like:
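(Erik's actual snippet isn't reproduced here; in spirit it does something like the following sketch of mine. I am assuming the declination column is called "dec" the same way right ascension is called "ra", that both are in radians like the zenith column, and I picked nside=32 so the map has the 12288 pixels mentioned further down.)

import numpy as np
import healpy as hp
import matplotlib.pyplot as plt

data  = np.load("./IC86_exp_all.npy")
nside = 32
npix  = hp.nside2npix(nside)       # 12*32^2 = 12288 equal-area pixels

# healpy wants (theta, phi) with theta measured down from the north celestial pole
theta = np.pi/2.0 - data["dec"]
phi   = data["ra"]
pix   = hp.ang2pix(nside, theta, phi)

skymap = np.bincount(pix, minlength=npix).astype(float)   # event counts per pixel
hp.mollview(skymap, title="IC86 events per pixel")
plt.savefig("skymap.png")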
and creates:
Now we can consider the p-value of each pixel: that is, how unlikely the background alone is to produce what we see there. We use the data itself to estimate how many events we expect in a pixel at a given declination, then compare that expectation to what we actually see in each pixel, and calculate the p-value assuming the counts follow a Poisson distribution.
Here is the code that does that:
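(This is my rough reconstruction of the core, not the exact code; it continues from the skymap sketch above, with n[i] holding the expected count for pixel i.)

import numpy as np
import healpy as hp
from scipy.stats import poisson

# data, nside, npix, skymap as defined in the skymap sketch above
ndecbins = 100
dechist, decedges = np.histogram(data["dec"], bins=ndecbins)   # events per declination band

pixdec = np.pi/2.0 - hp.pix2ang(nside, np.arange(npix))[0]     # declination of each pixel center

n = np.zeros(npix)            # expected (background) counts per pixel
pvalmap = np.zeros(npix)
for i in range(npix):
    b = int(np.clip(np.searchsorted(decedges, pixdec[i]) - 1, 0, ndecbins - 1))
    # scale the band count by (pixel solid angle)/(band solid angle)
    n[i] = dechist[b] * (4*np.pi/npix) / (2*np.pi*(np.sin(decedges[b+1]) - np.sin(decedges[b])))
    # chance of seeing at least the observed count from background alone
    pvalmap[i] = poisson.sf(skymap[i] - 1, n[i])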
The line that calculates n[i] is admittedly confusing, because we have to normalize the histogram to account for the number of pixels in a bin of declination. The math behind this horrid line is here:
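In short (my reconstruction of that normalization): the expected count in a pixel is the count in its declination band scaled by the ratio of solid angles,

n[i] = (events in the band) * (4*pi / N_pix) / (2*pi * (sin(dec_top) - sin(dec_bottom)))

since one pixel covers 4*pi/N_pix steradians while the band between declinations dec_bottom and dec_top covers 2*pi*(sin(dec_top) - sin(dec_bottom)) steradians. A band near a pole holds far fewer pixels than one near the equator, so the raw histogram count can't be used directly.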
And we get this graph of log(p-value) vs log(frequency):
And this is the map of p-values (notice that there are no obvious patterns).
Week seven was a bit light on work because Erik (our mentor) left for a conference; we completed most of the work during week eight.
We learned a better way to run a python script. Once you have started IPython ("ipython --pylab"), you can run your script with "%run -i <scriptname>.py" and it runs inside the interpreter, so whatever variables (or plots) were made inside the script are then available in the workspace. So if pval.py makes a histogram called h1, we can look at it by running "h1.draw()" from inside python.
One other edit I made to the p-value code: I also looked at the nearest neighbors of a given pixel and added them into the consideration of whether there is a point source there. This is to account for the fact that the angular uncertainty of a given event is on the order of the width of a pixel, so we want the p-value test to reflect that. This required two additions to the code:
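Schematically, the two additions look like this (my sketch, reusing skymap, n, and pvalmap from above; healpy's get_all_neighbours does the neighbour lookup):

# Addition 1: look up the 8 HEALPix neighbours of every pixel
neighbours = hp.get_all_neighbours(nside, np.arange(npix))   # shape (8, npix); -1 marks a missing neighbour

# Addition 2: test the whole 3x3 patch against 9x the central pixel's expectation
for i in range(npix):
    patch = neighbours[:, i]
    patch = patch[patch >= 0]
    observed = skymap[i] + skymap[patch].sum()
    expected = 9.0 * n[i]            # the shortcut that causes the striping discussed below
    pvalmap[i] = poisson.sf(observed - 1, expected)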
A couple of effects arose from this. First, there are a few more very low p-values, as can be seen in the first graph below. Second, near the equator and at a couple of other declinations, there are noticeable strips of lower p-values. I think this is because the expectation changes sharply between declination bins at those lines, so when we look at a patch of pixels that straddles one of those shifts, the Poisson distribution sees a dramatic divergence from what it expects. This is seen in the second graph.
The next thing we have to do is run a trials-corrected p-value test on the data. To do that, we need to know what the distribution of p-values looks like in a sample of data that we know contains no point sources. The easiest way to get one is to scramble the actual data in right ascension, so that any real point sources are washed out, and see what the p-value distribution looks like.
With these edits, the code becomes:
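(Sketched here with my own names, not the exact edit; the scramble itself is only a couple of lines.)

# give every event a new, random right ascension, which destroys any real point source
scrambled = data.copy()
scrambled["ra"] = np.random.uniform(0.0, 2*np.pi, len(scrambled))
# then rebuild the skymap and p-value map from `scrambled` exactly as before,
# and pile the p-values from every scramble into one histogram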
This produces a p-value distribution for data with no point sources; after scrambling in right ascension ten times and adding all the p-values into one histogram, it looks like this:
And scrambling 1000 times gives this distribution:
So, on the agenda: we need to fix the expectation calculation when looking at nearest-neighbor pixels, and we need to turn the histogram of p-values into a probability distribution, which means normalizing it so that it has an integrated area of one.
To fix the nearest-neighbors calculation, we need to sum the expectations of each pixel in the patch instead of multiplying the expected value for the central pixel by 9 (8 neighbors plus the pixel itself). We can do this with the following code:
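Roughly (same caveats as the earlier sketches, reusing neighbours, skymap, and n):

for i in range(npix):
    patch = neighbours[:, i]
    patch = patch[patch >= 0]
    observed = skymap[i] + skymap[patch].sum()
    expected = n[i] + n[patch].sum()     # sum each pixel's own expectation instead of 9*n[i]
    pvalmap[i] = poisson.sf(observed - 1, expected)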
This produces a new map that no longer has the statistical artifacts from rapid changes in expectation between declinations, as can clearly be seen in the skymaps below: the left-hand map shows the problem and the right-hand one shows what the revised code does.
And the p-value distribution looks like this:
OK, now we can implement that in the p-value trials-correction code and then normalize the resulting histogram. This can be accomplished by adding "normed=True" to the plt.hist line that creates the histogram. Our distribution of p-values from data scrambled in right ascension, after 1000 trials, looks like this:
One interesting note is that the distribution is much closer to linear (on this log scale) than it looked before we accounted for nearest neighbors, so I suspect that one effect of the nearest-neighbors calculation is to smooth out the p-values.
The first thing we're going to try this week is to change how we get the expected value for the Poisson distribution in each declination band. The old way was to make a histogram with 100 bins in declination and normalize it with some messy trig to get a rough number of events per pixel area in each band; then, to find the expectation for a given pixel later, you had to round its declination to the nearest bin. That whole approach is overly complicated. What we will do now is get the expectation by using healpy to find all the pixels at a given declination, adding up the events in those pixels, and dividing by the number of pixels in that band. This gives a more accurate expectation and makes looking it up later much simpler. The changes to the code are here:
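My rough version of those changes looks like this (healpy's rings are convenient here because every pixel in a ring sits at the same declination; skymap and npix as before):

# group pixels by ring: pixels with identical theta share a declination
theta, phi = hp.pix2ang(nside, np.arange(npix))
rings = np.round(theta, 10)

n = np.zeros(npix)
for t in np.unique(rings):
    inring = (rings == t)
    n[inring] = skymap[inring].sum() / float(inring.sum())   # average events per pixel in this ring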
Which gives me this skymap of p-values:
And the distribution of p-values looks like this:
So this new technique gives more low p-values than normal. However this brings us to a rather big problem that we've just realized we have to deal with...
These p-values are way too low for what we expect to see. Given that there are papers saying there are no point sources, and we know there are no point sources in this data, a p-value of 10^-13 should only happen about once in ten trillion pixels. This map only has 12288 pixels, so we shouldn't see p-values much below about 10^-4. What this means is that we are somehow underestimating the expected background at a significant number of points; the expectation only needs to be two or three times too low to cause this. Where is the error coming from?
We tried summing the expectation over all pixels and comparing it to the total number of events in the data set; these should match if our expectations are correct. We get 121089.3 from the expectations and 122691 from the data. That difference can be explained by events in the polar regions that we don't account for. So no help there.
Next we can look at the distribution of event counts per pixel in a given declination band and see whether it looks properly Poisson-ey to us.
We made these plots and compared them to the Poisson distribution we use to describe the event counts at that declination and, as you can see below (these are just 3 of the 124 declination distributions we made), the fits are very believable:
Next we looked at how individual points in the sky were being analyzed and, voila! After several hours of debugging, we have a solution. What was the problem, you ask? A simple one: the default log function in python (numpy's, under pylab) is the natural log. We did not know that. We know that now. That means the fix is simple:
When we plot the histogram, instead of plotting "-log(pvalmap)" we convert the log to base 10, "-log(pvalmap)/log(10)", and boom, the problem goes away. Now the most significant p-value we see is about 10^-5.8, which is much more reasonable for 12,000 calculations.
Now that that's solved, we need to do one more big thing: start injecting signal into our scrambled datasets to see how many neutrinos we need to see from a point source before it becomes statistically significant. To do this, we need a distribution of the most significant p-values from a bunch of scrambles (for the trials correction), and we need code that gives us the p-value for an injection of x neutrinos at declination y.
We plotted the most significant p-value from each of 1000 scrambles and got this distribution:
We can see that random fluctuations in the background never give us a p-value below about 10^-11.
So now that we know we're calculating the p-values correctly, we need to figure out how many neutrinos from a hypothetical point source we'd need to see above the background before it shows up as the most significant point in the sky 50% of the time (that will be our discovery potential). We will only look at 7 or so declinations, for simplicity and because the background doesn't change too drastically with declination. So basically we're going to write code that scrambles all our data in right ascension, adds a few extra events to one pixel, and checks whether that pixel ends up as the most significant point on the map with high enough significance to be reliably distinguishable. Then we repeat this a whole bunch of times for each of the 7 declinations with different numbers of added events.
So how much is significant? After 1000 background scrambles to build a trials-correction distribution (see above), we find that the most significant p-value produced by random background was 1.713e-11. That means the 0.1% (roughly 3 sigma) cutoff is about 1.7e-11, so if our injection point is both the most significant point on the map and its p-value is below that 1.7e-11, we count the injection as a success.
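In code, that threshold is just a quantile of the best-p-value-per-scramble distribution; assuming the 1000 minima are collected in an array called min_pvals (my name, not from the attached code), it is one line:

# 0.1st percentile of the smallest p-value per scramble = trials-corrected ~3 sigma cutoff
threshold = np.percentile(min_pvals, 0.1)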
Next we wrote some code ("injection_pval.py") that repeatedly adds extra events to scrambled maps and builds up a plot of "what percentage of injections pass the threshold" (shown by color) against the number of injected events on the x axis and the declination on the y axis.
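The real injection_pval.py isn't reproduced here; its skeleton is roughly the following, where make_pvalmap stands in for the p-value machinery sketched above and the declinations, injection counts, and trial counts are all placeholders:

# for a handful of test declinations and injection sizes, count how often the
# injected source both wins the map and beats the trials-corrected threshold
test_decs = np.radians([-60.0, -30.0, -10.0, 0.0, 10.0, 30.0, 60.0])   # placeholder declinations
max_inj   = 20
ntrials   = 100                                                        # placeholder trial count
success_fraction = np.zeros((len(test_decs), max_inj))

for d, dec in enumerate(test_decs):
    target = hp.ang2pix(nside, np.pi/2.0 - dec, 0.0)        # the pixel we inject into
    for ninj in range(1, max_inj + 1):
        wins = 0
        for trial in range(ntrials):
            scrambled = data.copy()
            scrambled["ra"] = np.random.uniform(0.0, 2*np.pi, len(scrambled))
            # make_pvalmap: hypothetical wrapper around the p-value code sketched above,
            # with ninj extra events dropped into the target pixel
            pvalmap = make_pvalmap(scrambled, inject_pixel=target, inject_n=ninj)
            if pvalmap.argmin() == target and pvalmap[target] < threshold:
                wins += 1
        success_fraction[d, ninj - 1] = wins / float(ntrials)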
Awesome! It is interesting to note that this closely resembles the distribution of events in declination we got at the very beginning, which makes sense: our test statistic is based on a Poisson distribution, whose standard deviation is set by the number of events in a bin. Funny how things work out. Now that we have that, we need to convert "number of events we need to see in IceCube" into a neutrino flux, since that is a more universal value that anyone can use. We got conversion factors from Erik (they vary with declination and depend on the detector's cross-sectional area as a function of neutrino energy and on the event-selection cuts). Then we used those values to produce a discovery potential for IceCube at the 3 sigma level (we'd do 5 sigma, but that requires orders of magnitude more computer time).
This is very similar in shape to IceCube's own results after seven years of data and a much more sophisticated analysis (they are about 100 times more sensitive, and their curve is for a significance 2 sigma above ours, but they have 7 times more data and much more sophisticated algorithms).
For just three months of work by 3 undergraduates, I'd say we did pretty good!
Attached at the bottom are our final code, a zip with all our final plots, and a pdf of our poster and (eventually) our paper. Unfortunately the data we used is not publicly available. All publicly available data can be found here: http://icecube.wisc.edu/science.
This is the "about us" page on the Icecube public webpage. It has general information about who they are and what they do. http://icecube.wisc.edu/about/overview
This is a link to the main Icecube page for the UMD part of the collaboration. http://www.icecube.umd.edu/
These are some cool facts about Icecube. http://icecube.wisc.edu/about/facts
This is a page with research highlights of the experiment. http://icecube.wisc.edu/science/highlights
This has some great information on neutrinos: https://icecube.wisc.edu/outreach/neutrinos