-----------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------HONR269L--------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------
February 3rd, Sunday:
Samuel Howard and I decided on a meeting time with our professor for this research in IceCube, Erik Blaufuss. We will meet at 2:30 on Fridays.
February 6th, Wednesday:
Dr. Blaufuss told us to look at previous logbooks. I found that a lot of them used Python, and they made skymaps of where the neutrinos could have possibly come from, based on their point source searches. A lot of their work was confusing, since I do not know the technicalities of a lot of this yet.
February 7th, Thursday:
Sam and I discussed what we maybe wanted to do. He suggested something with machine learning, since he finds that really interesting, and he is a Computer Science major. I also find that interesting, although I do not know that much about it.
February 8th, Friday:
We met with Dr. Blaufuss for the first time. We discussed our ideas of a possible machine learning approach with him, and we talked about what we found in the previous logbooks. It seems like we have two options we could pursue: do similar work to previous years' teams, choosing a different part of the sky to search, or do a new type of work, with machine learning, in which it will be less known what to do.
If we were to pursue the machine learning approach, we would also have to decide if we wanted to use a Windows-based system or a Virtual Machine. Dr. Blaufuss said that Windows would probably be better, since it would be directly on our computers, but if we went with that it would be uncharted territory for all three of us.
February 11th, Monday:
Dr. Blaufuss emailed us with some papers to read before our next meeting.
One gave a general overview of how IceCube works: when a charged particle travels through the ice faster than light itself travels in ice, it gives off Cherenkov radiation. This is basically the equivalent of a sonic boom for light.
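To make the "fast enough" condition concrete, here is a minimal sketch. It assumes a refractive index of about 1.31 for ice (an approximate value, not taken from the paper) and uses the standard Cherenkov relation cos(theta) = 1 / (n * beta), where beta is the particle speed divided by the speed of light in vacuum.

```python
import math

N_ICE = 1.31  # assumed approximate refractive index of ice

def cherenkov_threshold_beta(n):
    """Smallest v/c at which a charged particle emits Cherenkov light."""
    return 1.0 / n

def cherenkov_angle_deg(beta, n):
    """Opening angle of the light cone, from cos(theta) = 1 / (n * beta)."""
    if beta * n <= 1.0:
        raise ValueError("particle is below the Cherenkov threshold")
    return math.degrees(math.acos(1.0 / (n * beta)))

threshold = cherenkov_threshold_beta(N_ICE)
angle = cherenkov_angle_deg(1.0, N_ICE)  # particle moving at essentially c
print(threshold, angle)
```

With this assumed index, a particle has to move at roughly three quarters of the vacuum speed of light before it emits any Cherenkov light.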
February 15th, Friday:
Sam and I decided that we wanted to pursue machine learning as opposed to the previous years' point source searches. We discussed this with Dr. Blaufuss, and he looked up online sites to get us started. Amazon Web Services (AWS) will be a good place to go if we want to do some cloud computing, or need more storage for data. He set up a classroom and requested access for the two of us to create AWS Educate accounts there. In addition, he told us to make an account for Enthought Canopy and to get scikit-learn.
Dr. Blaufuss suggested that we find papers on neutrino oscillations and neural networks.
February 20th, W:
I got an invitation today to join AWS. I had to confirm my email with them and create a password for my account.
February 25th, M:
We had to have a meeting on a Monday this week, because Dr. Blaufuss will be out of town on Friday. We talked about multiple types of cloud computing that we could use, mainly either Google Cloud or AWS. If we start to use this, we will have to decide which one to use.
We went over how to use AWS during the meeting. Dr. Blaufuss had tried it out some, and there are multiple resources and tutorials on there that assume you have a full account, as opposed to the Educate account that we have. So, he told us to add different Roles on the site so that we can use them. Later this week I will do this myself.
We also discussed beginning to use our computers to start using some machine learning. Dr. Blaufuss suggested using public census data; he will email us a link to the data and a tutorial on how to use machine learning with it.
Another thing we have to decide is how in-depth we want to go with our research. We can either go in depth in one thing--like boosted decision trees or cloud computing--or learn and use the basics of multiple things.
By next week, we should go through one full analysis of data: training the computer with data, using it to predict aspects of something, and quantifying how accurately the computer was able to predict these. In the census data that we will use, the computer will predict a characteristic of people based on certain other demographics.
After we work with public data sets and get a feel for how it works, we can start using these skills and apply them to neutrino analysis.
March 1st, F:
I have started working on the tutorial Dr. Blaufuss sent us on how to train the computers to learn census data characteristics. We are using the Enthought Canopy program to do this on our computers, since the data files are not too large for our laptops to handle. The tutorial is found here: https://medium.com/district-data-labs/building-a-classifier-from-census-data-18f996c4d7cf and it is outlined with different subparts, which I will show.
Data Ingestion:
We created a function to download the census data from web files. At first I tried to do exactly the code on the tutorial, but I got this error:
So, I looked it up, and it turns out the tutorial was written for an older version of Python. All I had to do was add a "b" to the "w" part so that it read "wb". This is the code:
--------
import os
import requests
CENSUS_DATASET = (
"http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
"http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names",
"http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test",
)
def download_data(path='data', urls=CENSUS_DATASET):
    if not os.path.exists(path):
        os.mkdir(path)
    for url in urls:
        response = requests.get(url)
        name = os.path.basename(url)
        with open(os.path.join(path, name), 'wb') as f:
            f.write(response.content)

download_data()
---------
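The error message itself is not reproduced above, so the following is only a sketch of a likely cause: in Python 3, response.content is a bytes object, and a file opened in text mode ("w") only accepts str. The payload and file name below are made up for illustration and no download is performed.

```python
import os
import tempfile

payload = b"39, State-gov, 77516\n"  # stand-in for response.content (bytes)
path = os.path.join(tempfile.mkdtemp(), "adult.data")

# Text mode expects str, so writing bytes raises TypeError in Python 3.
try:
    with open(path, "w") as f:
        f.write(payload)
    text_mode_error = None
except TypeError as err:
    text_mode_error = err

# Binary mode accepts the bytes unchanged.
with open(path, "wb") as f:
    f.write(payload)

with open(path, "rb") as f:
    round_trip = f.read()

print(type(text_mode_error).__name__, round_trip == payload)
```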
Data Exploration:
In exploring the data, we used Pandas, and had to supply names for the header row to the computer, since the data did not contain that already. This is the code:
---------
import pandas as pd
import seaborn as sns
names = [
'age',
'workclass',
'fnlwgt',
'education',
'education-num',
'marital-status',
'occupation',
'relationship',
'race',
'sex',
'capital-gain',
'capital-loss',
'hours-per-week',
'native-country',
'income',
]
data = pd.read_csv('data/adult.data', names=names)
data.head()
---------
Here's the output it gives us. The head command in the last line of the code tells it to show us the first five rows of the data, now labeled with the header row that we supplied.
In this tutorial, we are trying to get the computer to predict the income of the people in the census. To help us visualize the actual data more, we can create plots that compare two characteristics. The orange bars show the number of people in that category with an income above 50 thousand dollars, and the blue bars show people with an income below that or equal to it. There is a simple code for this...
---------
sns.countplot(y='occupation', hue='income', data=data,)
---------
...that we can then change the "y" part to whatever characteristic we want to look at, and it will make the corresponding graph. Here's the output for occupation and race.
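The same counts that the plot draws can also be read off as a table. This is a minimal sketch using pandas crosstab on a tiny made-up sample (not the real census rows), with the same column names as above.

```python
import pandas as pd

# Tiny made-up sample with the same column names as the census data
toy = pd.DataFrame({
    "occupation": ["Sales", "Sales", "Tech-support", "Sales", "Tech-support"],
    "income": ["<=50K", ">50K", "<=50K", "<=50K", ">50K"],
})

# Rows are occupations, columns are income classes, cells are counts
counts = pd.crosstab(toy["occupation"], toy["income"])
print(counts)
```

Swapping "occupation" for another column name gives the table for that characteristic, in the same way as changing the "y" part of the plot.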
Later, I will continue this tutorial.
March 4th, M:
We had to meet with Dr. Blaufuss on a Monday again because he will be out of town starting tomorrow. In this meeting, we discussed a new idea that Dr. Blaufuss proposed for us to use machine learning for. At IceCube, they have an alert system that sends out a notification whenever a significant neutrino event occurs. The machine that sends out these alerts is trained to notify people when there is a cascade and a line happening right after one another in time. Usually, this is when an event occurs within the detector. Yet, sometimes the machine sends out an alert when there are two unrelated things: the line does not come out of the cascade and is just another random particle. They then have to manually send out an alert to disregard the previous alert. Using machine learning, we could try to train a machine to recognize the pattern in this, and to know when the line comes out of the cascade and when it does not. This would require a pattern-recognizing machine learning algorithm. It would also require both "good" samples--lines coming out of cascades--and "bad" samples--the false alarm alerts. We would need enough of these to train the machine, and it will be difficult for Dr. Blaufuss to find enough of the "bad" data samples if we were to pursue this.
In case we do decide to pursue this though, since it does seem interesting, Dr. Blaufuss told us to investigate different machine learning algorithms and what different approaches we could take to machine learning and classifying. Specifically for me, I plan to read into more of how machine learning actually works, since I do not fully grasp the fundamentals of that yet; I did not know much programming at all until this year. Dr. Blaufuss recommended that I just start with what I don't know (even if that's just the basics), figure it out, and go from there.
March 9th, Sa:
I watched a few videos on machine learning to check that I understood the concept thoroughly. I realized that machine learning comes in many different types and shows up in many places we don't realize, like Google Search giving us the best suggestions based on what we want, or the workings of self-driving cars.
I worked on the census data tutorial more. It does not actually take that long to do, but I am trying to really understand each step as I do it. After data exploration in the census tutorial comes data management.
Data Management:
We had to organize the data to be more readable by Scikit Learn. Using this code...
------
import json
meta = {
    'target_names': list(data.income.unique()),
    'feature_names': list(data.columns),
    'categorical_features': {
        column: list(data[column].unique())
        for column in data.columns
        if data[column].dtype == 'object'
    },
}

with open('data/meta.json', 'w') as f:
    json.dump(meta, f, indent=2)
------
...we made a meta.json file, which told the computer which part of the data was our target value. In this case, it's the people's incomes.
In the tutorial, the author uses a "readme" file within their code that I am not sure what it does. This was their own file though, and when I commented out the lines containing it, the code still worked. Here is the next code we used, with the readme parts commented out.
------
# Bunch is importable from sklearn.utils; the older sklearn.datasets.base
# path used in the tutorial no longer exists in recent scikit-learn versions
from sklearn.utils import Bunch

def load_data(root='data'):
    # Load the meta data from the file
    with open(os.path.join(root, 'meta.json'), 'r') as f:
        meta = json.load(f)
    names = meta['feature_names']

    # Load the readme information
    #with open(os.path.join(root, 'README.md'), 'r') as f:
    #    readme = f.read()

    # Load the training and test data, skipping the bad row in the test data
    train = pd.read_csv(os.path.join(root, 'adult.data'), names=names)
    test = pd.read_csv(os.path.join(root, 'adult.test'), names=names, skiprows=1)

    # Remove the target from the categorical features
    meta['categorical_features'].pop('income')

    # Return the bunch with the appropriate data chunked apart
    return Bunch(
        data = train[names[:-1]],
        target = train[names[-1]],
        data_test = test[names[:-1]],
        target_test = test[names[-1]],
        target_names = meta['target_names'],
        feature_names = meta['feature_names'],
        categorical_features = meta['categorical_features'],
        #DESCR = readme,
    )

dataset = load_data()
------
I wasn't really sure what bunch was, so I looked it up. It seems to be an object that acts as a dictionary, and it is used a lot with scikit-learn. It is a container object, used for datasets.
The code above basically splits the data into target and data variables. This makes them ready to be used by scikit-learn.
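As a small illustration of the dictionary-like behavior, here is a sketch with made-up values (not the census data). It assumes a scikit-learn version where Bunch can be imported from sklearn.utils.

```python
from sklearn.utils import Bunch

toy = Bunch(
    data=[[39, 40], [50, 13]],          # made-up feature rows
    target=["<=50K", ">50K"],           # made-up labels
    feature_names=["age", "hours-per-week"],
)

# Dictionary-style and attribute-style access reach the same object
same_object = toy["data"] is toy.data
keys = sorted(toy.keys())
print(same_object, keys)
```

This is why the tutorial can later write dataset.data and dataset.target instead of indexing with strings.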
March 11th, M:
I also realized that I do not really know what scikit-learn is either. So I went to the main scikit-learn website and tried to learn what it is. Scikit-learn is a library (like numpy and other libraries that we used first semester) specifically for Python. It is actually built ON numpy, scipy, and matplotlib, presumably using some of their functions and data types too. It is used for machine learning, since it includes code for analyzing and using data. It can classify data, predict the attributes of data (which is what happens at the end of the census tutorial), cluster similar data objects together, reduce the number of random variables, choose a good model based on comparisons with others, and transform raw data such as text into data ready for machine learning (which is what we did in the previous census tutorial step).
March 13th, W:
I started to look into more of the different types of machine learning, to see what our options are for our project.
March 18th, M:
Today I looked at the handwriting tutorial on Scikit Learn, found here: https://scikit-learn.org/stable/tutorial/basic/tutorial.html#introduction
At the beginning of this tutorial, I learned more about machine learning. There are two categories: supervised learning and unsupervised learning. In supervised learning, one part of the training data is the target data, which we want to predict in later data sets. Basically, you are trying to figure out how the input variable and the output variable are connected. Then, when we have new input data we will be able to predict the output data. Supervised learning also has two subgroups: regression and classification. Classification is when the output variable/target data is of a categorical form--like predicting whether something is going to be either blue or red. Regression is when the output variable/target data is a value--like predicting the exact income of somebody. The census data tutorial above is a classification problem because it does not predict their incomes, just whether they earn either more or less than 50k. With unsupervised learning, there is no correct answer that machines can learn from. Instead, they are supposed to find their own patterns and interesting aspects of the data.
Data sets are always in the form of a 2D array. The data has to be put in this form to be used by scikit-learn.
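A minimal sketch of what that 2D form looks like, using made-up numbers: each row is one sample and each column is one feature, so even a single feature has to be reshaped into a column before fitting. The decision tree here is just a convenient example classifier.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

ages = np.array([25, 32, 47, 51])   # one feature, four samples (1D)
labels = np.array([0, 0, 1, 1])     # made-up target values

X = ages.reshape(-1, 1)             # shape (4, 1): rows = samples, columns = features
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)

prediction = clf.predict(np.array([[49]]))  # new data must be 2D as well
print(X.shape, prediction)
```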
March 26th, Tu:
I decided to start meeting with Sam so that our work was more similar and I could ask him any questions I had, since he knows more about computing than I do.
Today we met and completed a task Dr. Blaufuss had wanted us to do in an email he sent on Sunday.
We are starting to apply our machine learning knowledge to the actual IceCube data. Dr. Blaufuss found some data that we can use, but it is very large and not accessible to the public. So, he wanted us to make sure we could read it with our Enthought Canopy first, and then we would figure out how to transfer the data from him to us. He gave us a set of around 5 or 6 data points that we could download and then make the computer read. He provided us with a code we could use to accomplish this, once we downloaded the data:
--------
import cPickle as pickle

with (open("coinc.pkl", "rb")) as openfile:
    while True:
        try:
            event = pickle.load(openfile)
            ## do something with this event (a dictionary)
            print event
        except EOFError:
            break
--------
When I met with Sam, he had already run the code and gotten it to work. He had to download the file, make sure it was in the same place as his directory, and just change a minor detail in the code (which I will show later), and then it worked.
I downloaded the data, and moved it from my Downloads folder into my Users/Rebecca folder, since that is where my computing directory is located. However, when I copied the code into Canopy, it gave me unusual errors that Sam did not have. The first one was saying that I needed parentheses after the print function. Looking this up, we found that this was a requirement if using Python 3. We realized that we were using two different Python types--somehow I was using Python 3.5, and Sam was on 2.7. It seemed to us that I was still using Python 2, though, since that was what I had selected on Canopy, as shown below.
However, it seems that this selection does not mean exactly what we had thought, and we are still not sure exactly what it means. So, we looked up how to change which version of Python to use in Canopy. Following the instructions there, it worked. In the main "Welcome to Canopy" window, I went to Edit, and then Preferences. Under the "Python" tab, I selected Create New Environment. From here, I named a new environment and selected the EDM Bundle file of Python 2.7 for it. Once I had done this, it took about half an hour to download. Then I restarted Canopy and chose the new environment from the drop-down list of EDM Environments. Once I had done all this, the code got the normal error that Sam said he had gotten. It was the error "could not convert string to float". Sam had previously looked this up, and found that we just need to change the rb in the code to solely r. Once I had done this, the code ran successfully.
--------
import cPickle as pickle

with (open("coinc.pkl", "r")) as openfile:
    while True:
        try:
            event = pickle.load(openfile)
            ## do something with this event (a dictionary)
            print event
        except EOFError:
            break
--------
The output of this code is a dictionary. It says {'Hits': all the data points, with an array of three numbers each--the string number, DOM number on the string, and time, 'Coincident': True or False}. There are 5 or 6 of these in the output.
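For comparison, here is a sketch of the same read loop written for Python 3, where the pickle module replaces cPickle and print needs parentheses. The events and the file name are made up, but they follow the layout described above. If a file was written by Python 2, pickle.load may also need an encoding argument; that case is not shown here.

```python
import os
import pickle
import tempfile

# Made-up events in the layout described above:
# each hit is [string number, DOM number on that string, time]
fake_events = [
    {"Hits": [[1, 5, 9800.0], [1, 6, 9850.0]], "Coincident": False},
    {"Hits": [[40, 12, 10020.0], [77, 3, 14500.0]], "Coincident": True},
]

path = os.path.join(tempfile.mkdtemp(), "fake_events.pkl")
with open(path, "wb") as f:
    for event in fake_events:
        pickle.dump(event, f)  # one pickle per event, written back to back

def read_events(filename):
    """Yield events one at a time until the file runs out."""
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

loaded = list(read_events(path))
for event in loaded:
    print(event["Coincident"], len(event["Hits"]))
```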
March 29th, F:
Today we met with Dr. Blaufuss. In the meeting, we told him that we were able to read the small amount of data he sent us. He showed us that he had a lot more data for us--thousands of actual good events and around 400 coincident events, since these are rarer. In their IceCube computer environment, Dr. Blaufuss has a simulator that shows us the detection of particles in IceCube. Blue colors are later in time, while red colors are earlier. The size of the ball corresponds to the charge and energy of the event. Dr. Blaufuss will email us pictures of clear events and coincident events that we looked at during the meeting. In addition, he will add the larger sample of data to Dropbox, since it is too large to send via email, and we will begin applying different types of machine learning to this.
Also within the meeting, we discussed any questions we had about the data and other subjects. As stated earlier, there is a number in the data that corresponds to time. However, this number is usually around 10,000, so Sam and I had no idea what it meant. Dr. Blaufuss described how, because of the speed of neutrinos, scientists decided that looking at a window of 20,000 nanoseconds was a good amount. When the detector detects a significant event, the 0 value of this window is placed so that the event happens in the middle of the window. This way, they can look at what happened before and after the event. That is why a lot of the data was around 10,000.
Within some of the reconstructed event pictures, there is a band in the detector cube where no DOMs are lit. Dr. Blaufuss explained that this was because of dust within the ice, from a volcanic eruption. The dust absorbs more light, so it usually causes a gap in the data. However, sometimes the particles go close enough to a string that this light is still detected, since it is more intense right at the path of the particle.
We made plans for the upcoming week to start to use machine learning with the data that Dr. Blaufuss will send us. First we will do this without the time data, and later once we are more skilled with what we are doing, we will try to add in the time data to see if that helps. Dr. Blaufuss also will email us the geometry of the IceCube strings and DOMs in case we think that will help the computer too, but he thinks that the computer can figure that out by itself.
March 30th, Sa:
These are the emailed pictures.
These two are notable, single events:
While these two are coincident events:
April 2nd, Tu:
Today Sam and I met together to begin working with the data that Dr. Blaufuss had sent us. Our plan is to base our machine learning off of the census data steps that we previously completed. Here is the code we have so far:
-------
# import cpickle for reading and writing data
import cPickle as pickle
#
# import libraries for handling data
import pandas as pd
import seaborn as sns

coinc_test = open("coinc_test.pkl", "r")
coinc_test1 = pickle.load(coinc_test)
print coinc_test1['Coincident']
print coinc_test1['Hits'][0][0]
-------
In order to use machine learning, we will have to convert the dictionaries, that our data came as, into pandas data structures. We thought about how to do this, and ended up finding a website that seemed straightforward and useful: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html .
April 5th, F:
Today we met with Dr. Blaufuss, since it is Friday. It was a very short meeting this week, since we had not really done much that we could actually show him. We had mainly just acquired the data, learned how to read it, and planned our next steps. Dr. Blaufuss did add a little to our code, though. At first, we were only actually using one event, when we had wanted to use all of them. So Dr. Blaufuss added the line in the code that I will show later, in which .append was used.
April 8th, M:
Sam and I met again, to proceed more with the machine learning steps and discuss what we had done over the weekend. Sam shared a code he had come up with, that changed the data into a data array. He was able to accomplish this with code for the first event. Our plan with this code was to make a data array, full of 1s and 0s. The rows would be each event. The columns would correspond to each DOM. Since there are 86 strings and 60 DOMs on each string, we numbered the DOMs 1-5160, with 1-60 on string number 1, and so on. To make it simple, we are neglecting the time value for right now. A 1 in the array will mean that that certain DOM was hit. A 0 will mean it was not. Hopefully, once we tell the machine whether the events were significant or not, it can use this data to train itself.
We were able to successfully make the code have an array of 1s and 0s. Then, we wanted a column at the end to show whether each event was significant or not. So we put a boolean at the end that says that. We already know what it is, because the sets of data given to us were each labeled. The machine then should be able to use these labels accordingly, and print out its own once we test with the testing events.
Here is the end result. The actual array is very large, this is just a preview of the end of it:
This array of events has 394 events in total. There are 5161 columns because of the 5160 total DOMs, plus the boolean column at the end.
Here is the full code that we came up with:
With this code, we first made all the values in the array zero. Then we went through the data and put a one everywhere there was a hit. The data that was given to us was just the hit data, not the non-hits.
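The approach described above can be sketched as follows. This is not our actual code, only a minimal reconstruction of the idea: it assumes 86 strings of 60 DOMs each (which gives the 5160 total), string and DOM numbers that both start at 1, and a made-up event. The function names are hypothetical.

```python
import numpy as np

N_STRINGS = 86
DOMS_PER_STRING = 60
N_DOMS = N_STRINGS * DOMS_PER_STRING  # 5160

def dom_column(string, dom):
    """0-based column for a (string, DOM) pair that are both numbered from 1."""
    return (string - 1) * DOMS_PER_STRING + (dom - 1)

def event_to_row(event):
    """One row: 5160 hit flags followed by the Coincident label."""
    row = np.zeros(N_DOMS + 1, dtype=np.uint8)   # start with all zeros
    for string, dom, _time in event["Hits"]:
        row[dom_column(string, dom)] = 1         # mark every DOM that was hit
    row[N_DOMS] = int(event["Coincident"])       # label goes in column 5160
    return row

fake_event = {"Hits": [[1, 1, 9800.0], [2, 1, 9900.0], [86, 60, 10100.0]],
              "Coincident": True}
row = event_to_row(fake_event)
print(row.shape, int(row.sum()))
```

Stacking one such row per event gives the array with 5161 columns described above.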
April 10th, W:
Today we transferred this code to work with all four data sets that we have. The above code is specific to the Coincident Test sample, the smallest one. So, using find and replace, we just changed all the names of files within the code to correspond to the Coincident Train, Single Test, and Single Train data (when we used find and replace coincident to single, we had to cancel the part that changed 'Coincident' to 'Singlecident'). This was successful.
However, when we tried to run the code for the Single Train data set, Sam's computer froze for two hours. It was too much for our personal laptops to handle. We are thinking of making the code better and more efficient, or using a GPU or computer that we could hopefully gain access to through IceCube and Dr. Blaufuss. For now, we just reduced the size of the data. We made the Single Train data set consist of only 2000 events, and the Single Test data set consist of only 1000 events. From this, we were able to run the code. We then pickled the data, so that we can access it later by unpickling it. So now all four data sets are in pickle files, in an array.
April 15th, M:
We met with Dr. Blaufuss today and showed him our code. We also told him how the data set was too big, and how we had reduced it. He pointed out that we are making the computer hold all the events in its memory at once when it runs the code. Instead, we should make it go through each event and then move on to the next one, so that only one is in its memory at one time. That way, we will have enough memory to use all of the events in the data set. This is what Dr. Blaufuss did within his code, which we have in our large code in the part where it opens the IceCube dataset file (labeled by a commented line). It adds each of the events on once it is done, and then takes them out of its immediate memory.
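A minimal sketch of this one-event-at-a-time idea, under the same assumptions about the data layout as before (hits as [string, DOM, time], numbered from 1, 60 DOMs per string). The events are made up and held in an in-memory buffer instead of a real file, and the helper names are hypothetical. Each event dictionary is converted to a compact row right away, so only the small rows are kept, not the full event dictionaries.

```python
import io
import pickle
import numpy as np

N_COLUMNS = 5161  # 5160 DOM columns plus the label column

def iter_events(fileobj):
    """Yield one unpickled event at a time; stop at end of file."""
    while True:
        try:
            yield pickle.load(fileobj)
        except EOFError:
            return

def to_row(event):
    """Convert one event into a row of hit flags plus the label."""
    row = np.zeros(N_COLUMNS, dtype=np.uint8)
    for string, dom, _time in event["Hits"]:
        row[(string - 1) * 60 + (dom - 1)] = 1
    row[-1] = int(event["Coincident"])
    return row

# In-memory stand-in for the real pickle file, holding two made-up events
buffer = io.BytesIO()
pickle.dump({"Hits": [[1, 1, 9800.0]], "Coincident": False}, buffer)
pickle.dump({"Hits": [[3, 2, 10050.0]], "Coincident": True}, buffer)
buffer.seek(0)

# Only the current event dictionary is alive inside the loop
rows = [to_row(event) for event in iter_events(buffer)]
frame = np.vstack(rows)
print(frame.shape)
```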
April 16th, Tu:
We started applying machine learning to our files. Here is the code so far:
------------
# This file is for applying machine learning to IceCube Neutrino data
# import libraries for handling data
import pandas as pd
import numpy
# Obtain the DataFrames
coinc_test_events_frame = pd.read_pickle("coinc_test_events_frame.pkl")
coinc_train_events_frame = pd.read_pickle("coinc_train_events_frame.pkl")
single_test_events_frame = pd.read_pickle("single_test_events_frame.pkl")
single_train_events_frame = pd.read_pickle("single_train_events_frame.pkl")
#print coinc_test_events_frame
#print coinc_train_events_frame
#print single_test_events_frame
#print single_train_events_frame
#######################################
#
# Model Build
#
# Import DummyClassifier from scikit-learn
from sklearn.dummy import DummyClassifier
# Create a Dummy Classifier
clf = DummyClassifier()
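# untouched versions (.copy() is needed; plain assignment would only alias
# the same frame, and the .pop calls below would then change both)
untouched_coinc_train_events_frame = coinc_train_events_frame.copy()
untouched_single_train_events_frame = single_train_events_frame.copy()
untouched_coinc_test_events_frame = coinc_test_events_frame.copy()
untouched_single_test_events_frame = single_test_events_frame.copy()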
# single and coinc targets
coinc_target = coinc_train_events_frame.pop(5160)
single_target = single_train_events_frame.pop(5160)
# entire target and data
train_events_frame = single_train_events_frame.append(coinc_train_events_frame)
target = single_target.append(coinc_target)
# train the clf classifier using .fit(x, y)
clf.fit(train_events_frame, target)
# entire test frame
test_events_frame = single_test_events_frame.append(coinc_test_events_frame)
y_true = test_events_frame.pop(5160)
y_pred = clf.predict(test_events_frame)
#print y_pred
# Classification Report
from sklearn.metrics import classification_report
# execute classification report
print "Classification Report"
print classification_report(y_true, y_pred)
################################
# Import MLPClassifier from scikit-learn
from sklearn.neural_network import MLPClassifier
# Create a MLP classifier
#clf = MLPClassifier(activation="logistic", solver ="adam", learning_rate = "adaptive")
mlpclf = MLPClassifier()
# train the mlpclf classifier using .fit(x, y) as described above
mlpclf.fit(train_events_frame, target)
# Now use the classifier to predict the labels for the test events
mlp_y_pred = mlpclf.predict(test_events_frame)
#print mlp_y_pred
#
# Classification Reports
#
# import library for assessing machine learning models
from sklearn.metrics import classification_report
# execute classification report
print "MLP Classification Report"
print classification_report(y_true, mlp_y_pred)
------------
The code has three parts. The first part just gains access to the pickled files--the data arrays--that we made in the previous code. Then, after the line of #s, we are using the Dummy Classifier. First we made untouched versions of our dataframes, in case we need those. Then we took out the boolean from the end of the data frames, and made this its own variable, the target. We put the single and coincident parts together by using ".append". We predicted the boolean for the test data set, using the training from the train data set, and compared our results to the actual boolean of the test set. In the next part of the code, after the next line of #s, we did the same thing but with an actual classifier, the MLP classifier. We got this code from the digits data set tutorial on scikit-learn, and changed various names and parts so that it could be used with our data.
This code was able to give us this output, when we ran it twice on Sam's computer:
The first report in each run is from the dummy classifier, which is basically just what would happen if it selected labels at random. The second reports are from an actual type of classifier; right now we are using MLP. We can see that it worked somewhat better than the dummy classifier, since the percentages are higher. However, it is still not very accurate, especially with the Coincident: True samples. We would like more Coincident data, but IceCube does not have any. We plan to try to get the code to be able to use more of the Single data, and hopefully this will help make the predictions more accurate.
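The role of the dummy classifier as a baseline can be sketched on made-up data. In this sketch the label simply copies the first feature, so a real classifier can learn it perfectly while a random guesser cannot be relied on. The strategy="stratified" setting (random guesses that follow the class proportions) is an assumption chosen to match the "random" description above, and a decision tree stands in for the real classifier to keep the example fast.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import recall_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)

# Made-up data: 3 binary features; the label simply copies feature 0
X_train = rng.randint(0, 2, size=(200, 3))
y_train = X_train[:, 0]
X_test = rng.randint(0, 2, size=(100, 3))
y_test = X_test[:, 0]

baseline = DummyClassifier(strategy="stratified", random_state=0)
baseline.fit(X_train, y_train)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Recall on the positive class: what fraction of true 1s were found
baseline_recall = recall_score(y_test, baseline.predict(X_test))
tree_recall = recall_score(y_test, tree.predict(X_test))
print(baseline_recall, tree_recall)
```

A classifier is only useful to the extent that it beats this kind of baseline, which is the comparison the two reports above are making.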
April 24th, W:
This week, we made our code process each event one after the other, so that all of the events are not stored in memory at once. This way, we can use more data. We based this new version of the code on Dr. Blaufuss' code that he gave us to read the events. This code uses a While True loop instead of a For loop, so we did the same in changing our code. However, we still have not been able to use the full amount of data, and the computers are being very slow.
This is our old code that we had, that stored all the events in memory at once, and therefore crashed Sam's computer for two hours:
We put all the outer For loops into this one While True loop, and the inner For loop is still a For loop within this. Here is the new code:
With this code, we were able to use larger sample sizes. However, using this new code within the machine learning part of the full code, the computer still got stuck. We have been running different iterations of that code, and it still gets stuck. We also tried different numbers of samples. It seems to work at 20000 samples at least, which is much greater than the original 1000 we had. We hope to figure out what is wrong soon and get larger samples, since this does not make the machine learning very successful.
April 30th, Tu:
In the past week, we have tried various different machine learning tactics, to see which one is the best. In addition to the MLP Classifier, we used a decision tree, which seemed to produce better results. In addition, we added the time values, to see if they would help. We had thought this would require a non-numeric value for the unhit DOMs, since 0 could be mistaken for an actual time value. However, scikit-learn does not accept NaN, so we tried 0 anyway. This did not produce good results. We then tried -1, and this did help. For all of these changes, we just switched out simple lines of code. To use the decision tree, we just used that code instead of the MLP code. To input time, we changed the 1 that we had put into the dataframe into the third value from the data (the time value), and we changed the 0, unhit, to -1. We ended up with four results, each including the Dummy, MLP, and Decision Tree classifiers.
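The time encoding described above can be sketched like this. It is a reconstruction under assumptions, not our actual code: hits are [string, DOM, time] with numbering from 1, hit times are positive, and if a DOM is hit more than once the earliest time is kept (that last choice is purely an assumption for the sketch).

```python
import numpy as np

N_DOMS = 5160
UNHIT = -1.0  # placeholder for DOMs with no hit; real hit times are assumed positive

def event_to_time_row(event):
    """One row of hit times, with -1 wherever a DOM was not hit."""
    row = np.full(N_DOMS, UNHIT)
    for string, dom, time in event["Hits"]:
        col = (string - 1) * 60 + (dom - 1)
        # Assumption: if a DOM is hit more than once, keep the earliest time
        if row[col] == UNHIT or time < row[col]:
            row[col] = time
    return row

fake_event = {"Hits": [[1, 1, 9800.0], [1, 1, 9700.0], [2, 3, 10400.0]],
              "Coincident": False}
row = event_to_time_row(fake_event)
print(row[0], row[62], int((row == UNHIT).sum()))
```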
First we have the original code, with 0s and 1s in the dataframe, with 1000 single events to train it:
Then we put in more data--training it with 20000 single events:
Then we added time values, and computed it with both less and more data again.
Less data:
More data:
We are not sure why these results are the way they are. However, we can look at them and try to analyze it. The best scenario for predicting Coincident False events was using more data, without time values. However, the best scenario for predicting Coincident True events was using less data, with time values.
May 8th, W:
We are done with our research for this semester. There are many ways to continue this; here are three that we would have tried, had we had more time:
1. We could tune the settings of the classifiers, instead of using the scikit-learn defaults. For instance, we can set the maximum depth of the decision trees, which will give us different results. For neural networks, we can change the activation, which is a function used to transform the values coming from each neuron. Some options for this are identity, logistic, tanh, and relu. We do not know exactly how to do this yet or which one will be better.
2. We could try more classifiers, since so far we have only had time to try MLPs and Decision Trees.
3. We could try regression instead of classification. Regression gives a value instead of a category. With this, we could make the computer predict the probability, as a number, of the events being Coincident True.
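To make the activation options in item 1 more concrete, here is a small sketch of the usual textbook definitions of the four functions named there. This only shows what each function does to a single number; it is not scikit-learn's own code, and it says nothing about which one would work best for our data.

```python
import math


def identity(x):
    # leaves the value unchanged
    return x


def logistic(x):
    # squashes any value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # squashes any value into the range (-1, 1)
    return math.tanh(x)


def relu(x):
    # keeps positive values and turns negative values into 0
    return max(0.0, x)


ACTIVATIONS = {"identity": identity, "logistic": logistic, "tanh": tanh, "relu": relu}

# Apply each activation to a few sample neuron outputs
for name, function in ACTIVATIONS.items():
    print(name, [round(function(x), 3) for x in (-2.0, 0.0, 2.0)])
```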
-----------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------HONR268N--------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------
Useful commands and shortcuts (HW 2)
-cmsenv puts you into the cms environment
-you can do shift>ctrl>v to paste into the terminal
-if you're writing a command but forgot to do something earlier, do ctrl>u to cut the line, type the thing you need to do and run it, and then ctrl>y brings back what you were in the middle of writing
-you can open a terminal in a specific place by right clicking on that directory
-ctrl c force stops a command!
-man intro shows you some useful commands and introduction to it
-cd changes the directory
-ls lists what's in the directory; ls -l shows a longer list; adding -h (as in ls -lh) makes the file sizes more human readable
-mkdir makes a directory
-pwd shows what current directory you are in (print working directory)
-history lists all the previous commands you have done
-cal with the month and year as numbers afterwards presents that month
-cat filename displays what is in a file
-The > operator writes the word into the text file given, replacing all other text in the file. The >> operator adds the word into a new line of the text file, and leaves what is already there.
-mv filename filenamenew replaces the name of a file!
-when you click somewhere and start typing, it puts things BEFORE the cursor
-CTRL S SEARCHES IN EMACS
9/25/18
I got a VirtualBox error whenever I tried to open it, saying that the interface "virtualbox host-only ethernet adapter" is not a host-only adapter interface (E_FAIL 0x80004005). I think this may have been caused by updating my computer or by my computer shutting down with everything still running. To fix this, I went to settings>network, where adapter 2 was checked but was not attached to any specific adapter. In the "attached to" dropdown I selected "not attached" instead, and that fixed the problem.
HW 3
-scripts need a first line of the form #!/path/to/interpreter (for example #!/bin/tcsh) so the system knows which program should run them
-putting text in single quotes stops any variables inside it from being substituted, so if you want variables in a command you have to use double quotes -- aka, single quotes take exactly what you typed literally, not interpreting anything
-set ARG=$1 means that we set a variable called ARG to the first argument we type after the script name when running it ($2 would be the second argument)
HW 4
all of these are within src:
also, this is a good site to reference to learn about types of loops and other things: http://www.cplusplus.com/doc/tutorial/control/
OGMAIN.CPP
#include <iostream> ---> this tells the compiler to include the iostream library, which defines the input/output commands we use below (like cout and endl)
using namespace std;
int main() { --> main is where the program starts running
cout <<"Hello World!" << endl; //Print hello world to screen followed by end line (endl)
--> the cout command tells it to display the output. Hello World is what we tell it to display. The endl ends the line, so whatever is printed next starts on a new line. The semicolon marks the end of each statement
return 0; //Exit the program --> ends main and tells the system that the program finished without errors
}
output:
Hello World!
COMMANDS
g++ main.cpp --> compiles the C++ source file into an executable program; since we did not give it a name, the program is called a.out
./a.out --> executes it
TEST.CPP
#include <iostream> --> tells the compiler to include the iostream library, which defines the input/output commands like cout and endl
using namespace std;
int main() {
cout << "hello world" << endl; --> displays hello world
int i=2; --> makes the variable i equal to 2; the word int says that it is an integer (a whole number, usually stored in 32 bits)
cout << "i = " <<i<<endl; --> displays "i = " and then the next i in this code tells it to display the actual value of i, not the specific letter since that is not in quotes; then ends the line
double a=3.3; --> makes the variable a equal to 3.3; since it's double it has 64 bits
cout << "a = " <<a<<endl; --> displays a=WhatItEquals
int j = a*i; -->makes the variable j equal to a times i
cout << "a*i = "<<j<<endl; --> since j is just an int, it outputs 6 instead of 6.6. If we were to change it to double, then it would output 6.6.
return 0; --> tells that it's the end of the command
}
output:
hello world
i = 2
a = 3.3
a*i = 6
NUMBERS.CPP
#include <iostream>
using namespace std;
int main() {
int n=10; --> the initial n is ten
cout << "n is "<<n<<endl;
n--; --> makes the n variable go down an integer (it becomes 9)
cout<<"n is now "<<n<<endl;
n++; --> makes the n variable go up an integer (becomes 10 again)
cout<<"n is now "<<n<<endl;
return 0;
}
output:
n is 10
n is now 9
n is now 10
TRUEFALSE.CPP
#include <iostream>
using namespace std;
int main() {
bool prop;
prop = (5>1); --> checks whether 5 is greater than 1 (true) and stores the answer in prop
cout<<"prop is "<<prop<<endl; --> since the prop is true, it will display prop is 1 (it displays "prop is", like we told it to in quotes, and then the prop not in quotes tells it to display the true or false value in the form of a 1 or 0)
prop = (1>5); --> checks whether 1 is greater than 5 (false)
cout<<"prop is "<<prop<<endl; --> since the prop is false, it will display prop is 0
prop = (1 != 5); --> checks whether 1 is not equal to 5 (true)
cout << "prop is " <<prop<<endl; --> since the prop is true, it will display prop is 1
return 0;
}
output:
prop is 1
prop is 0
prop is 1
LOOP.CPP
#include <iostream>
using namespace std;
int main() {
int n=10; --> the initial n is 10
while(n>0) { --> the loop keeps running as long as n is greater than 0, so the last value displayed is 1; once n gets down to 0, the condition of the loop is no longer met and the loop stops
cout<<"n is "<<n<<endl; --> displays what n is
n--; --> subtracts one from n each time
} --> ends the looping command
return 0;
}
output:
n is 10
n is 9
n is 8
n is 7
n is 6
n is 5
n is 4
n is 3
n is 2
n is 1
FORLOOP.CPP
#include <iostream>
using namespace std;
int main() {
// when we declare a for loop, we also initialize the loop variable,
// specify the exit condition, and tell the program how to modify the
// loop variable at the end of each loop
for (int n=10; n>0; n--) { --> a FOR loop--does the same thing as loop.cpp did, just all together in one line
cout<<"n is "<<n<<endl;
} --> here the loop ends
// in a for loop, the loop variable (in this case, 'n') only exists in
// the loop. we are not able to call 'n' from out here
// uncomment the following line and see for yourself
// cout<<"n outside the loop: "<<n; when I uncommented this, g++ did not accept my code because n was not defined outside of the loop
return 0;
}
output:
same as above (when the line is commented)
PRACTICE1.CPP
#include <iostream>
using namespace std;
int main() {
int n=0, m=0; -->starts both the n and m variables at 0
while(n<10) { --> while n is less than 10 it does all the things enclosed in the brackets
// this is the slow (or outer) loop
cout << "n is " << n << ": "; --> displays what number n is, followed by a colon
m=0;
while(m<=n) { --> while m is less than or equal to n it does the following commands
// this is the fast (or inner) loop
// in this loop, the slow loop variable (n) is a constant
// this loop must run to completion before the slow loop
// can progress (during every iteration of the slow loop!)
cout << m;
m++; --> increases m by 1 until it stops meeting the m<=n condition
}
// now the fast loop has finished and the slow loop can
// continue with the current iteration
cout << endl; --> makes it go to the next line in the output
n++; --> after it increased m by one a bunch, it goes back to the outer loop, increases n by 1, and does it all again until the outer loops condition stops being met
}
return 0 ;
}
gives the output:
n is 0: 0
n is 1: 01
n is 2: 012
n is 3: 0123
n is 4: 01234
n is 5: 012345
n is 6: 0123456
n is 7: 01234567
n is 8: 012345678
n is 9: 0123456789
MY NEW VERSION, WITH A FOR LOOP (PRACTICE2.CPP)
#include <iostream>
using namespace std;
int main() {
for (int n=0, m=0; n<10; n++) { --> the initialization is executed, then the condition is checked. if it is met, it continues to do the command a line below
cout << "n is " <<n << ": "; --> tells it to display what n is followed by a colon (displays directly what is in quotes, and for the n not in quotes it displays the value of n
for (int m=0; m<=n; m++) { -->another command, still before it "goes back up" to the previous for command
cout << m; --> displays the m as long as the m<=n condition is met, then goes back up and adds 1 to m, then goes way back up and finishes the first for command by adding 1 to n. Then it loops and does all this again
}
cout << endl;
}
return 0 ;
}
gives same output!
HW 5
notes: &p is the address of p, *p makes it find the data within that address
LOGICSTATEMENT.CPP
#include <iostream>
using namespace std;
int main() {
int n = 10; --> starts n at 10
while (n>=10) { --> while n is greater than or equal to 10:
if(n>5) { --> if n is greater than five, do the following command
cout<<"n is "<<n<<endl; -->display n is what it is (says the content in quotes directly, then after that it says n, but since it's not in quotes it substitutes in the actual value of n
}
else { --> otherwise (in this case though, never, since all values greater than or equal to 10 are greater than 5)
cout<<"n = "<<n<<endl; --> display what n is (just like above)
n--; --> subtract one from n
}
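return 0; --> since this return is inside the while loop, the program exits at the end of the first pass, which is why only one line is displayed (without it, the loop would run forever, since n never changes while n>5)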
}
}
output:
n is 10
POINTERS.CPP
#include <iostream>
using namespace std;
int main() {
int i = 10;
cout << "The memory address of i is " << &i << "\n"; --> \n ends the line (but make sure the slash is going in this direction!!!)
cout << "The data stored at memory address " << &i << " is " << i << "\n";
int* p = &i;
cout << "The value of p is " << p << "\n";
cout << "We say that p 'points at' the memory location referenced by address " << p << "\n";
cout << "The data stored at memory address " << p << " is " << *p << "\n";
return 0;
}
PROGRAM1.CPP
#include <iostream>
using namespace std;
int main(){
int i = 10; --> declares i to be 10
int j = i; --> declares j to be the value of i at that moment---if i changes, j will still just be 10 because that is what i was when we said int j = i
cout << "i= " << i << " and j= " << j << "\n";
i=5;
cout << "i= " << i << " and j= " << j << "\n";
j=1;
cout << "i= " << i << " and j= " << j << "\n";
return 0;
}
PROGRAM2.CPP
#include <iostream>
using namespace std;
int main(){
int i = 10;
int* p = &i; --> points to the data inside the "mailbox" of i -- the * after int means that the p is a pointer not just a variable; also, if we wanted to declare a pointer to a double, we just say double* p =
cout << "i= " << i << " and *p= " << *p << "\n"; --> *p is the data that p points to
i=5; --> when i changes, so does *p, because both are referencing the same data
cout << "i= " << i << " and *p= " << *p << "\n";
*p=1; --> similarly, when *p is changed, so is i
cout << "i= " << i << " and *p= " << *p << "\n";
return 0;
}
output:
i= 10 and *p= 10
i= 5 and *p= 5
i= 1 and *p= 1
NEWPOINTER.CPP
#include <iostream>
using namespace std;
int main(){
int* p = new int(5); --> puts the data of 5 into some random available memory location, and then the p variable points to this location
cout << "p points at address " << p << "\n";
cout << "The data stored in address " << p << " is " << *p << "\n";
*p = 10; --> when we change the value of *p, since p points at the memory location with the 5 in it, the data inside that memory location is what gets changed
cout << "Now the data stored in address " << p << " is " << *p << "\n";
return 0;
}
output:
p points at address 0x1bd7010
The data stored in address 0x1bd7010 is 5
Now the data stored in address 0x1bd7010 is 10
HW5CODE.CPP
#include <iostream>
using namespace std;
int main() {
int n = 5;
cout << "we start off with n equal to 5." << endl;
while (n<=45) {
if (n<21) {
n=n*2; --> at first, I tried just saying n*2, but it does not actually multiply it by 2 if you do that. This is what you need to do; you can also do "n*=2" as a shortcut way to do it
cout << "here, when n (before being multiplied) is less than 21, we multiply it by two: " <<endl<< "n is " <<n<<endl;
}
else {
n++;
cout << "otherwise, aka once n rises above 21, we just add 1 to n each time: " <<endl<< "n is " <<n<<endl;
}
}
cout << "the original while statement was while n is less than or equal to 45, so once n goes above 45 the commands stop being executed" <<endl;
return 0;
}
output:
we start off with n equal to 5.
here, when n (before being multiplied) is less than 21, we multiply it by two:
n is 10
here, when n (before being multiplied) is less than 21, we multiply it by two:
n is 20
here, when n (before being multiplied) is less than 21, we multiply it by two:
n is 40
otherwise, aka once n rises above 21, we just add 1 to n each time:
n is 41
otherwise, aka once n rises above 21, we just add 1 to n each time:
n is 42
otherwise, aka once n rises above 21, we just add 1 to n each time:
n is 43
otherwise, aka once n rises above 21, we just add 1 to n each time:
n is 44
otherwise, aka once n rises above 21, we just add 1 to n each time:
n is 45
otherwise, aka once n rises above 21, we just add 1 to n each time:
n is 46
the original while statement was while n is less than or equal to 45, so once n goes above 45 the commands stop being executed
HW 6
notes: use descriptive variable names!
doubles are stored in a binary form of scientific notation (floating point), and ints are stored as plain binary whole numbers, so ints can't hold decimals
exit root by typing ".quit"
ARRAY.CPP - in src
#include <iostream>
using namespace std;
int main() {
int ii[3] = {1,2,3}; --> the 3 is the number of elements (columns), and the 1, 2, and 3 are the values stored in the array
int j=0;
while (j<3) {
cout <<" ii of "<<j<<" is "<<ii[j]<<endl; --> ii of 0 is the first number in the array, 1 is the second, 2 is the third
j++;
}
int LL[2][3] = {1,2,3,4,5,6}; -->a new integer LL is introduced, with 2 ROWS and 3 COLUMNS (listed as the numbers from left to right in the first row and then from left to right in the second)
j=0; --> j, from earlier, is now made to equal 0
int k; --> k is introduced as an integer but does not have a value yet
while (j<2) {
k=0; --> k is initialized at 0
while (k<3) { --> does the following cout statement plus the increase of k by 1, as long as k is less than 3
cout<<" LL of "<<j<<" "<<k<<" is "<<LL[j][k]<<endl; --> outputs the number in row 0, column 0 first, then row (the j) 0, column 1, as k increases
k++;
}
j++; --> once k, the column number, gets to 3, the inner loop ends and it comes down here and increases j; then k starts at 0 again, and it outputs all three numbers in row number 1 instead of row 0
}
return 0;
}
output:
ii of 0 is 1
ii of 1 is 2
ii of 2 is 3
LL of 0 0 is 1
LL of 0 1 is 2
LL of 0 2 is 3
LL of 1 0 is 4
LL of 1 1 is 5
LL of 1 2 is 6
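The way the six numbers in LL[2][3] = {1,2,3,4,5,6} fill the rows from left to right can be sketched in Python (the language used later in these notes) with a flat list and a little arithmetic. The names flat and element are made up for this illustration.

```python
ROWS, COLS = 2, 3
flat = [1, 2, 3, 4, 5, 6]  # same order as the initializer list for LL


def element(row, col):
    # row 0 uses positions 0..2 of the flat list, row 1 uses positions 3..5
    return flat[row * COLS + col]


for row in range(ROWS):
    for col in range(COLS):
        print(" LL of", row, col, "is", element(row, col))
```

The printed lines match the output of array.cpp above: the first three numbers land in row 0 and the last three in row 1.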
COMMENTLINES.CPP
#include <iostream>
using namespace std;
/************************************************\
* *
* Arrays * --> when making multi line comments, they have to start with /* and end with */
* This program demonstrates arrays *
* *
\************************************************/
int main() {
// a loop to demonstrate 1D arrays --> single line comments just start with // and end when the line ends
int ii[3] = {1,2,3};
int j=0;
while (j<3) {
cout <<" ii of "<<j<<" is "<<ii[j]<<endl;
j++;
}
// a loop to demonstrate 2D arrays
int LL[2][3] = {1,2,3,4,5,6};
j=0;
int k;
while (j<2) {
k=0; // do not forget to initialize k here
while (k<3) {
cout<<" LL of "<<j<<" "<<k<<" is "<<LL[j][k]<<endl;
k++;
}
j++;
}
return 0;
}
READINGDATA.CPP
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ofstream myfile; --> makes the variable "myfile" of the type ofstream [output file stream]
myfile.open("example.txt"); --> opens and makes the file example.txt -- since it's output file it's automatically created
myfile<<"write some junk."; --> puts "write some junk" into this file
myfile.close(); --> closes the file
return 0;
}
there is no visible output, but the file "example.txt" has been created, and inside are the words "write some junk"
VECTORS.TXT
1 2 3
4 5 6
7
DOTPROD.CPP
/* DOTPROD.CPP */
#include <iostream>
using namespace std;
double dot_prod(double v1[3],double v2[3]) { --> defines a function called dot_prod that returns a double. The stuff in parentheses is what the function needs to be able to work (it needs two double arrays each with 3 columns/elements - within the function, these values are called v1 and v2 (like the A and B of an algebraic function))
double dotdot; --> makes a double variable called dotdot
dotdot = v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2];
cout<<" The dot product is "<<dotdot<<endl;
return 0;
}
MAIN.CPP
/* MAIN.CPP */
#include <iostream>
#include <fstream>
// include the program dotprod.cpp so that we can find the dot_prod function
#include "dotprod.cpp"
using namespace std;
int main () {
// declare the vectors
double vector1[3]; --> vector1 is made, to have 3 columns
double vector2[3]; --> vector2 is made, also with 3 columns
// open the input file
ifstream infile; --> an object called infile [inputfile] is created, of the type ifstream (this lets you read from a file)
infile.open("vectors.txt"); --> opens the vectors.txt file
// store the input in the vectors and print the vectors for the user
infile>>vector1[0]>>vector1[1]>>vector1[2]; --> reads the input file and puts the first three terms into the first 3 columns of vector 1
cout<<" Vector 1 is ("<<vector1[0]<<","<<vector1[1]<<","<<vector1[2]<<")"<<endl;
infile>>vector2[0]>>vector2[1]>>vector2[2]; --> keeps reading the input file and puts the next three terms into the 3 columns of vector 2
cout<<" Vector 2 is ("<<vector2[0]<<","<<vector2[1]<<","<<vector2[2]<<")"<<endl;
// close the input file
infile.close(); --> closes vectors.txt
// call the dot_prod function from dotprod.cpp
dot_prod(vector1,vector2);
return 0;
}
output:
Vector 1 is (1,2,3)
Vector 2 is (4,5,6)
The dot product is 32
Here is where I added the scalar multiple to the main.cpp program:
SCALARMULT.CPP
#include <iostream>
using namespace std;
double scalar_mult(double v1[3], double sc) {
double product1;
double product2;
double product3;
product1 = v1[0]*sc;
product2 = v1[1]*sc;
product3 = v1[2]*sc;
cout<<" The scalar multiple is ("<<product1<<","<<product2<<","<<product3<<")"<<endl;
return 0;
}
MAIN.CPP
/* MAIN.CPP - EDITED */
#include <iostream>
#include <fstream>
// include the program dotprod.cpp so that we can find the dot_prod function
#include "dotprod.cpp"
#include "scalarmult.cpp"
using namespace std;
int main () {
// declare the vectors and the scalar
double vector1[3];
double vector2[3];
double scalar ;
// open the input file
ifstream infile;
infile.open("vectors.txt");
// store the input in the vectors and print the vectors for the user
infile>>vector1[0]>>vector1[1]>>vector1[2];
cout<<" Vector 1 is ("<<vector1[0]<<","<<vector1[1]<<","<<vector1[2]<<")"<<endl;
infile>>vector2[0]>>vector2[1]>>vector2[2];
cout<<" Vector 2 is ("<<vector2[0]<<","<<vector2[1]<<","<<vector2[2]<<")"<<endl;
infile>>scalar;
//make the next line of the infile (the 7) the data for scalar
cout<<" The scalar is "<<scalar<< endl;
// close the input file
infile.close();
// call the dot_prod function from dotprod.cpp
dot_prod(vector1,vector2);
//call the scalar_mult function from scalarmult.cpp
scalar_mult(vector1,scalar);
scalar_mult(vector2,scalar);
return 0;
}
output:
Vector 1 is (1,2,3)
Vector 2 is (4,5,6)
The scalar is 7
The dot product is 32
The scalar multiple is (7,14,21)
The scalar multiple is (28,35,42)
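In the C++ functions above, the results are printed inside the functions and 0 is returned. A different design, sketched here in Python with the same function names, is to return the result and let the caller decide what to print. This is only one possible way to arrange it, not the version from the homework.

```python
def dot_prod(v1, v2):
    # multiply the matching elements and add the products together
    return sum(x * y for x, y in zip(v1, v2))


def scalar_mult(v, sc):
    # multiply every element of the vector by the scalar
    return [x * sc for x in v]


# Same numbers as in vectors.txt
vector1 = [1.0, 2.0, 3.0]
vector2 = [4.0, 5.0, 6.0]
scalar = 7.0

print(" The dot product is", dot_prod(vector1, vector2))
print(" The scalar multiple is", scalar_mult(vector1, scalar))
print(" The scalar multiple is", scalar_mult(vector2, scalar))
```

Returning the values means they can be stored in variables and reused in later calculations, instead of only appearing on the screen.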
RANDOMNUMBERS.CPP
#include <iostream>
#include <math.h>
using namespace std;
// this function is the actual random number generator
// this code is stolen from the book numerical recipes for fortran
// it relies on random generated from overflow of memory locations
// and is a pseudo random number generator
const int a = 7141;
const int c = 54773;
const int mmod=256200;
double getFlatRandom(int& inew) {
double mranflat = 0.;
inew = inew%mmod;
double aa = double(inew)/double(mmod);
mranflat=aa;
inew = a*inew+c;
return mranflat;
}
// in this code, we will call the pseudo-random number generator and learn some things
// about its properties by filling and then displaying a histogram
int main() {
int num;
cout << "Enter the number of loop iterations: ";
cin >> num;
int inew = 2345; // This is the "seed" for the random number generator
// we will put the results from the call into a histogram that we can look at, to learn some of its
// properties. This histogram has 10 bins.
int histo[10] = {0,0,0,0,0,0,0,0,0,0};
double atmp; //
// call the random number generator num times and fill a histogram
for (int i = 0; i<num; i++){
atmp=getFlatRandom(inew); // call the random number generator
histo[int(atmp*10)]++; // increment the histogram bin the number falls within
}
// print the histogram to the screen
for (int i = 0; i<10; i++){
cout<<i<<": ";
for (int j=0; j<int((double(100*histo[i])/double(num))+0.5); j++) {
cout << "=";
}
cout << endl;
}
return 0;
}
I placed in a cout statement to learn more about how this is just pseudo-random numbers--each time, the numbers come in the same order.
if (i<10) {
cout <<int(atmp*10)<<endl; //int(atmp*10) is the histogram bin (0 through 9) that this random number falls into
} --> this whole thing was placed in the loop that fills the histogram, before the part where it prints the histogram to the screen
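The "same order each time" behavior can be checked with a short Python sketch that copies the arithmetic of getFlatRandom (same a, c, and mmod constants and the same seed of 2345). The helper names make_generator and first_bins are made up for this sketch.

```python
A = 7141
C = 54773
MMOD = 256200


def make_generator(seed):
    """Return a function that gives the next pseudo-random number in [0, 1)."""
    state = seed

    def next_value():
        nonlocal state
        state = state % MMOD       # same step as inew = inew%mmod
        value = state / MMOD       # same step as double(inew)/double(mmod)
        state = A * state + C      # same step as inew = a*inew+c
        return value

    return next_value


def first_bins(seed, count=10):
    """Histogram bins (0 through 9) of the first few numbers for a given seed."""
    generator = make_generator(seed)
    return [int(generator() * 10) for _ in range(count)]


print(first_bins(2345))  # the same list every time the program runs
print(first_bins(2345))  # identical to the line above
print(first_bins(2346))  # a different seed gives a different list
```

Because every number is computed from the previous one, the seed fixes the whole sequence, which is why it is called pseudo-random.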
ACTUALRANDOMNUMBERS.CPP
#include <stdio.h>
#include <stdlib.h> /* srand, rand */
#include <time.h>
#include <iostream> --> I had to add this line, so that it knows what the cout command does
using namespace std;
int main(){
cout << "Random Number" << rand()%100; /* This will print a random number between [0,100). */ -->it always gives 83--if we want a different number, we have to change the seed
return 0;
}
output:
Random Number83
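The program above never calls srand, and in C that means rand() acts as if the seed were 1 on every run, which is why the number never changes. The same idea can be sketched with Python's random module. Python uses a different generator, so the numbers will not match the 83 above, and the helper name draw is made up for this sketch.

```python
import random
import time


def draw(seed, count=3):
    # a generator created with a given seed always gives the same numbers
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(count)]


print(draw(1))               # the same list on every run
print(draw(1))               # identical to the line above
print(draw(time.time_ns()))  # seeded from the clock, so it changes between runs
```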
HW7
in class, we worked on Madgraph and generated particle collision events, using some of these commands:
cd CMSSW_5_3_32/src
cmsenv
mkdir MCProduction --> made a new directory for us to do our work in
mv ~/Downloads/MG5_aMC_v2.6.0.tar.gz MCProduction/ --> moved our newly downloaded file into this MCProduction directory -->NOTE: the slash is AFTER the MCProduction, meaning the file will go into that directory. There is no need for a / before the MCProduction (that would point to something completely different, at the top of the file system), since MCProduction is a directory within the place where you are typing
cd MCProduction/
tar -xvzf MG5_aMC_v2.6.0.tar.gz --> here, we unzip the file
./bin/mg5_aMC --> this command starts madgraph; the ./ at the beginning means to run the program found at this path, starting from the current directory
within madgraph:
MG5_aMC>generate p p > t t~ --> this generated a process
HW8
to comment eVERYTHING, you can do ctrl cc or something like that
if you want to copy your thing using shift ctrl c, do the cat command
HIGGSANALYSIS.C (in src, Higgs mass)
#define HiggsAnalysis_cxx
#include "HiggsAnalysis.h"
#include <TH2.h>
#include <TStyle.h>
#include <TCanvas.h>
void HiggsAnalysis::Loop()
{
// In a ROOT session, you can do:
// Root > .L HiggsAnalysis.C
// Root > HiggsAnalysis t
// Root > t.GetEntry(12); // Fill t data members with entry number 12
// Root > t.Show(); // Show values of entry 12
// Root > t.Show(16); // Read and show values of entry 16
// Root > t.Loop(); // Loop on all entries
//
// This is the loop skeleton where:
// jentry is the global entry number in the chain
// ientry is the entry number in the current Tree
// Note that the argument to GetEntry must be:
// jentry for TChain::GetEntry
// ientry for TTree::GetEntry and TBranch::GetEntry
//
// To read only selected branches, Insert statements like:
// METHOD1:
// fChain->SetBranchStatus("*",0); // disable all branches
// fChain->SetBranchStatus("branchname",1); // activate branchname
// METHOD2: replace line
// fChain->GetEntry(jentry); //read all branches
//by b_branchname->GetEntry(ientry); //read only this branch
if (fChain == 0) return;
Long64_t nentries = fChain->GetEntriesFast();
Long64_t nbytes = 0, nb = 0;
TFile* output = TFile::Open("Dielectron_MC.root", "RECREATE"); // "RECREATE" would produce a new root file with name Dielectron_MC.root every time you run the code
TH1F* Z_ee = new TH1F("Z_ee", "Di-electron candidate invariant mass", 200, 0, 200); --> tells it how to range and title the histogram
TH1F* H_zz = new TH1F("H_zz", "ZZ candidate invariant mass", 200, 0, 300);
double el1mt = 0.0;
double el1pt = 0.0;
double el1eta = 0.0;
for (Long64_t jentry=0; jentry<nentries;jentry++) {
Long64_t ientry = LoadTree(jentry);
if (ientry < 0) break;
nb = fChain->GetEntry(jentry); nbytes += nb;
// if (Cut(ientry) < 0) continue;
TLorentzVector el1, el2, el3, el4; --> makes these four variables in the TLorentzVector class
el1.SetPtEtaPhiM(f_lept1_pt, f_lept1_eta, f_lept1_phi, 0.0); -->sets the characteristics of these variables
el2.SetPtEtaPhiM(f_lept2_pt, f_lept2_eta, f_lept2_phi, 0.0);
el3.SetPtEtaPhiM(f_lept3_pt, f_lept3_eta, f_lept3_phi, 0.0);
el4.SetPtEtaPhiM(f_lept4_pt, f_lept4_eta, f_lept4_phi, 0.0);
TLorentzVector zCandidate =el1 + el2 ; --> makes another variable, the zCandidate
TLorentzVector zCandidate2 =el3 + el4;
TLorentzVector Higgs =zCandidate + zCandidate2;
Z_ee->Fill(zCandidate.M()); --> fills the histogram with all the different values of the variable in parentheses, when it runs the loop
H_zz->Fill(Higgs.M());
el1mt = el1.Mt();
cout << el1mt << endl;
}
Z_ee->Write();
H_zz->Write();
output->Close();
}
to use the code:
-go into root, by doing root -l outputakaanyofthefilesinthedirectory
-do the 3 ROOT commands from the comment block at the top of Loop() (.L HiggsAnalysis.C, then HiggsAnalysis t, then t.Loop();), within root
-once it runs this, type .quit to quit root
-now go back into root, but into the new thing it created: root -l Dielectron_MC.root
-within root, do TBrowser t to open the object browser and then under root files, click on the dielectron part, and you can see your histograms there for you to click on
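What SetPtEtaPhiM, the + between vectors, and M() do in the analysis code can be sketched by hand in Python, using the standard relations between (pt, eta, phi, mass) and (E, px, py, pz). The function names and the two sample electrons are made up for this sketch (they are not events from the data), and ROOT's own implementation may handle edge cases differently.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) from transverse momentum, pseudorapidity, and angle."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def add(*vectors):
    # adding four-vectors means adding each component separately
    return tuple(sum(components) for components in zip(*vectors))


def invariant_mass(vector):
    energy, px, py, pz = vector
    m_squared = energy * energy - (px * px + py * py + pz * pz)
    return math.sqrt(max(m_squared, 0.0))  # guard against tiny negative rounding errors


# Two made-up massless electrons flying back to back, each with pt = 45.6
el1 = four_vector(45.6, 0.0, 0.0)
el2 = four_vector(45.6, 0.0, math.pi)
z_candidate = add(el1, el2)
print(invariant_mass(z_candidate))  # about 91.2
```

The Higgs candidate in the code is built the same way, by adding all four lepton vectors before taking the mass.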
HW9
-in python, when you set something equal to another thing (like a=b), only the first thing (a) is changed
-once you run a piece of code, everything defined in it still exists when you run later pieces of code
-TABS (indentation) define when you are in or out of a loop/function
-it only recognizes stuff that is in the same tab as it. if you are outside a function and want to print something you defined above inside that function, it won't work
-within functions, use RETURN to give back the output
-if confused about functions, see the second thing under the built in functions topic
- the = assigns a value to a variable, while the == means "is equal to" or "is" (like a true or false thing)
-lists are in [these brackets]
-when you use built in functions, be sure to put ( BEFORE you put the [list] or something. you know it works if it provides the end thing for you, otherwise it's probably wrong
-use ELIF for a second condition that should only be checked when the IF above it was not satisfied; a final ELSE then runs only if NONE of the conditions above it were satisfied, not just one
-IF STATEMENTS NEED TO HAVE COLONS AFTER THEM
-IN LISTS -1 IS THE LAST ELEMENT AND -2 IS THE SECOND TO LAST (and remember btw, 0 is the first and 1 is the second)
-you can directly put values into functions, even if they are not stored in variables
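The difference between ELIF and a second IF from the notes above can be seen in a small sketch. The function names are made up, and the messages are collected in a list so the two versions are easy to compare.

```python
def compare_with_elif(a, b):
    messages = []
    if a > b:
        messages.append("a is greater")
    elif a < b:  # only checked when the first condition was False
        messages.append("b is greater")
    else:        # runs only when BOTH conditions above were False
        messages.append("equal")
    return messages


def compare_with_two_ifs(a, b):
    messages = []
    if a > b:
        messages.append("a is greater")
    if a < b:    # a separate if, always checked
        messages.append("b is greater")
    else:        # belongs only to the second if
        messages.append("equal")
    return messages


print(compare_with_elif(20, 10))     # ['a is greater']
print(compare_with_two_ifs(20, 10))  # ['a is greater', 'equal']
```

With two separate ifs, the else only looks at the second condition, so the wrong "equal" message shows up whenever a is greater than b.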
pandas=10 --> makes the variable pandas equal to 10
zebras=20
total_animals_in_zoo=pandas+zebras -->the variable total_animals_in_zoo is set equal to these two variables added together
print(total_animals_in_zoo) --> prints the value of that variable
output: 30
pi=3.14159 #Approx. --> makes the variable pi, equal to 3.14159
diameter=10
area=pi*(diameter/2)*(diameter/2) --> makes the area equal to the variables multiplied by each other, forming the equation pi*rsquared. note--^2 does not square a number in python; exponents are written with ** instead, as in (diameter/2)**2
print(area)
output: 78.53975
#####Method 1########
a = [1, 2, 3]
b = ['cat', 'dog', 'fish']
a=b --> first, a is set equal to b--so now a says cat dog fish
b=a --> b is set equal to the a, after a has already been turned into cat dog fish
print('Method 1') --> prints just the words Method 1, since this is in single quotes
print(a)
print(b)
#####Method 2########
a = [1, 2, 3]
b = ['cat', 'dog', 'fish']
temp = a --> now, temp is set to the a that is there currently--1 2 3
a = b --> a is set equal to b--so a now reads as cat dog fish
b = temp --> then b is set equal to temp--since temp was made while a was still 1 2 3, b is now set equal to 1 2 3
print('Method 2')
print(a)
print(b)
output:
Method 1
['cat', 'dog', 'fish']
['cat', 'dog', 'fish']
Method 2
['cat', 'dog', 'fish']
[1, 2, 3]
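Method 2 works because temp remembers the old a. Python also has a shortcut that swaps two variables in one line without a temp variable. A minimal sketch with the same two lists:

```python
a = [1, 2, 3]
b = ['cat', 'dog', 'fish']

# the right side (b, a) is put together first, using the old values,
# and only then are the two values assigned to a and b
a, b = b, a

print(a)  # ['cat', 'dog', 'fish']
print(b)  # [1, 2, 3]
```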
dog_legs=0
human_legs=4
goldfish_legs=2
four=human_legs -->made a temporary variable four to store the value of 4 that was originally in the human_legs variable
human_legs=goldfish_legs -->set the human_legs equal to the value of 2 that is in the goldfish_legs
goldfish_legs=dog_legs -->set the goldfish_legs equal to the value of 0 that is in the dog_legs
dog_legs=four -->set the value of dog_legs equal to the value of 4 that is in four (since 4 is no longer in the human_legs variable, since this one was changed)
print('dog_legs:')
print(dog_legs)
print('human_legs:')
print(human_legs)
print('goldfish_legs:')
print(goldfish_legs)
output:
dog_legs:
4
human_legs:
2
goldfish_legs:
0
FUNCTIONS:
def round_to_two_places(num): --> def is used to define functions. The function round_to_two_places is defined below, in the tabbed lines. This function is a function of one variable, a number num
    return round(num, 2) --> the function just returns/outputs the number that comes from this command--the 'round' command, which takes in a number called num and rounds it to 2 decimal places, specified by the 2
print(round_to_two_places(4.325456463)) --> prints the output of the function when 4.325456463 is put as the input
output: 4.33
def round_to_n_places(num,n): --> the function takes in two named values, separated by a comma
    return round(num,n) --> this is the output of the function (what you tell it to RETURN)!!!! so if you want to print this later, this is what it will print
print(round_to_n_places(5.12462356,5)) -->importantly, this is no longer tabbed! that way it is outside the function.
output: 5.12462
BUILT IN FUNCTIONS:
a=[1,5435,324,3,645365,23,4,5665,435,31, -543]
b=-10
c='cat'
print(c)
print(abs(b)) #absolute value
print(min(a)) #min value
print(max(a)) #max value
print(sum(a)) #sum
print(len(c)) #length of a string
print(sorted(a)) #sorts a list
output:
cat --> the thing that is variable c
10 -->the absolute value of the number in b
-543 -->the minimum value of the list of numbers in a
645365
656743 --> the total sum when all the numbers of a are added together
3 --> the length of the string in c -- it has 3 letters -->NOTE: if you do len(a), it outputs 11 because there are 11 numbers separated by commas in a -->NOTE AGAIN: if you change c to be 'cat', 'dog' it will say that its length is TWO because now there are two items
[-543, 1, 3, 4, 23, 31, 324, 435, 5435, 5665, 645365] -->a, sorted from least to greatest
a=[3243,645645,54346] --> here we defined two lists as variables a and b
b=[7,3,2]
def product_of_maxes(list1, list2): --> we defined a function that takes in two variables called list1 and list2. when we call the function, we have to give it the two things we want to sub into these variables
    max1=max(list1) --> makes a new variable within the function, called max1, that takes the max value of the list we put in as list1
    max2=max(list2)
    return max1*max2 --> makes the output of the function the product of the two variables we made within this function
print(product_of_maxes(a,b)) --> back outside the function, we use what we defined above. we tell it to print the output of the function, subbing a and b into the variables list1 and list2 of the function
output:
4519515
BOOLEANS AND CONDITIONS:
print(3==10) --> note that we use the == since this is saying "is equal to" (or essentially just "is") not like...stating an equation. so since it's not equal to, it prints out false
print(len('abc')==3) --> prints true or false if the length of "abc" is 3--in this case it is, since abc has three letters
print(max([10,40,53]) < 25) --> prints true or false if the max number in that list is less than 25--since the max is 53, this is false
print (5 != 3) --> prints true or false if 5 does not equal 3
output:
False
True
False
True
print(max([5,56,3])>20) -->prints true or false if the max of this list is greater than 20--true, since the max is 56
print(min([55,54,53.1])<27) --> prints true or false if the min of this list is less than 27--false, since the minimum is 53.1
print(sum([5,10,11]) != 26) --> prints true or false if the sum of the numbers in this list does not equal 26--it does equal 26, so this is false
output:
True
False
False
a=10
b=20
if(a>b): -->if the value of a is greater than b, we tell it to print these words
    print(a, 'is greater than', b)
elif(a<b): -->elif--else, if a is less than b, we tell it to print that b is greater than a
    print(b, 'is greater than', a)
else: -->else -- if neither of these conditions is satisfied, we tell it to print that they are equal -->NOTE: the else belongs to the if/elif chain right above it. so, if we were to change the above "elif" to just "if", the else would pair only with that second "if": when a>b it would print the first message but ALSO print "they're equal", because the second condition (a<b) is not satisfied
    print("they're equal")
output:
20 is greater than 10
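To test the note about else, here is a sketch comparing the two versions side by side. The function names are made up, and I collect the messages in a list instead of printing them so they are easy to compare:

```python
def compare_with_elif(a, b):
    messages = []
    if a > b:
        messages.append('a is greater')
    elif a < b:
        messages.append('b is greater')
    else:
        messages.append('equal')
    return messages

def compare_with_two_ifs(a, b):
    messages = []
    if a > b:
        messages.append('a is greater')
    if a < b:                      # a separate if, so the else below pairs with this one only
        messages.append('b is greater')
    else:
        messages.append('equal')
    return messages

print(compare_with_elif(20, 10))
print(compare_with_two_ifs(20, 10))
```

With 20 and 10, the second version reports both 'a is greater' and 'equal', which is the mistake described in the note.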
def longer_string(string1, string2): --> defines the function longer_string, of the variables string1 and string2
    if(len(string1) > len(string2)): -->if the length of string1 is greater than the length of string2...
        return string1 -->....output the string1
    elif(len(string1) < len(string2)):
        return string2
    else: -->if neither of these is satisfied, return nothing -- a bare return gives back None (so if you ask it to print the function when this is the case, it prints "None")
        return
print(longer_string('abc', 'jfkdla;s'))
print(longer_string('abcfdsafdasg', 'jfkdla;s'))
print(longer_string('abc', 'def'))
output:
jfkdla;s
abcfdsafdasg
None
def can_make_cake(eggs, flour, milk, almond_milk): --> defines the function, with four variables named after the ingredients
    #I can make cake if I have some kind of milk AND flour AND eggs
    return (milk or almond_milk) and flour and eggs > 0 --> only eggs is compared to 0 here, since > is evaluated before and/or; milk, almond_milk, and flour are just checked for being "truthy" (True, or a nonzero number) -- and only ONE of the types of milk has to be truthy, since there is an or
print('Can I make cake?')
#What is the following line implying?
print(can_make_cake(10, True, False, True)) --> this line implies that we have 10 eggs, an amount of flour, no milk, and an amount of almond_milk
#redo the above call to the function with 0 eggs:
print(can_make_cake(0, True, False, True)) --> here, we have no eggs (we could have also put in False as the eggs variable, as well as numbers for the other variables)
output:
Can I make cake?
True
False
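A sketch to check how the return line is grouped. The second function, can_make_cake_explicit, is my own made-up version that converts everything to True/False first:

```python
def can_make_cake(eggs, flour, milk, almond_milk):
    # same expression as in the exercise: only eggs is compared to 0
    return (milk or almond_milk) and flour and eggs > 0

def can_make_cake_explicit(eggs, flour, milk, almond_milk):
    # hypothetical variant that always returns a real boolean
    has_milk = bool(milk) or bool(almond_milk)
    return has_milk and bool(flour) and eggs > 0

print(can_make_cake(10, True, False, True))
print(can_make_cake(10, 0, False, True))           # gives back the 0 that was passed as flour
print(can_make_cake_explicit(10, 0, False, True))  # gives back False
```

Because "and" hands back the first falsy value it finds, the original version can return 0 instead of False when a number is passed in for flour.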
MODULUS OPERATOR (%)
the mod OPERATOR gives the remainder when you divide one number by another -- so 10%3 is 1 -- this is a good tool to use to determine if a number is even or odd (n%2 is 1 for odd numbers and 0 for even numbers)
a = [-35423,-5432654,-352435,53252]
def abs_of_min_is_odd(list1): --> defined this function, of one variable
    minimum=min(list1) --> made a new variable called minimum, that finds the min of the input
    absv=abs(minimum) --> made a new variable called absv that uses the built in abs command to find the absolute value of that min
    odd=absv%2 --> made a new variable called odd that uses the modulus operator and finds the remainder when you divide the absv by 2
    if(odd > 0): -->if the variable odd is greater than 0 (aka just 1 for this) it will do the below thing -->NOTE that "if" statements need to have a colon after them or else they don't work
        return "odd" -->if above, the output of the function is the word odd
    else: --> otherwise, it does what is below
        return "even" -->the output is the word even if the variable odd is 0
print(abs_of_min_is_odd(a)) --> prints the output of this function when the variable a, that we defined above the function, is inputted
output:
even
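A small sketch of the % operator on its own, including negative numbers. The helper name is_odd is made up. In Python the result of % takes the sign of the divisor, so n%2 is 0 or 1 even when n is negative:

```python
def is_odd(n):
    # remainder 1 after dividing by 2 means the number is odd
    return n % 2 == 1

remainder = 10 % 3
print(remainder)
print(is_odd(-35423), is_odd(-5432654))
```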
LISTS
remember--lists are indexed starting at 0 (in matlab it's at 1!!)
def select_third_element(my_list): --> defines a function, with the variable my_list
    if(len(my_list) < 3): --> if the length of the input is less than 3, do the following
        return None --> no output
    else: --> otherwise -- if the length is 3 or more, do the following
        return my_list[2] #remember index 0, so third element is index 2 --> the output is the THIRD element of the list input
foo = ['a', 'b', 'c', 'd'] --> a new variable is made outside the function, its a list called foo
print(select_third_element(foo)) --> print the output of the function when foo is put as the input
list1=[1,2,3] --> a new variable is made
list2=[4,5,6]
list_of_lists=[list1,list2] --> a new LIST VARIABLE -- made up of the two lists
print(foo[1]) --> prints out the element number 1 of the foo list
print(list_of_lists[1][2]) --> the first number is that of the element of the list_of_lists, and then within that element/list, the second number tells it which element of THAT list to go to
print(list_of_lists[0]) --> so here, it just prints out the 0th element of the variable -- the full list1
print(list_of_lists[1]) --> this is the same as above, except its element number 1 -- the second list, list2
print(list_of_lists[0][1]) --> here, the 0 tells it to look at the 0th element--list1--and then the 1 tells it to look at the element number 1 within that 0th element
print(list_of_lists[0][2]) -->now it prints out element number 2, still within the 0th element (list1)
print(list_of_lists[1][0]) --> now it has switched to the element number 1 of the list_of_lists, which is list2, and then it prints out the 0th element of that list
print(foo[-1]) -->LOOK!!!!! negative numbers go to the end of the list, starting with -1
output:
c
b
6
[1, 2, 3]
[4, 5, 6]
2
3
4
d
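A sketch to convince myself how negative indices line up with normal ones (using the same foo list and a list of lists like the one above):

```python
foo = ['a', 'b', 'c', 'd']
list_of_lists = [[1, 2, 3], [4, 5, 6]]

last_by_negative = foo[-1]            # counts back from the end
last_by_length = foo[len(foo) - 1]    # same element, found with a normal index
second_to_last = foo[-2]
last_of_last = list_of_lists[-1][-1]  # last element of the last inner list

print(last_by_negative, last_by_length, second_to_last, last_of_last)
```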
real_madrid=['zidane', 'ramos', 'marcelo']
barcelona=['valverde', 'messi', 'pique']
teams = [real_madrid, barcelona]
def losing_team_captain(team_list):
    return team_list[-1][0] -->the -1 refers to the losing team--the last team in the "teams" variable. the 0 refers to the first element in that team--their captain
print(losing_team_captain(teams))
output:
valverde
standings=['mario', 'bowser', 'luigi','peach'] -->defines this new list variable
def purple_shell(racers): --> defines the new function of one variable
    firstplace=racers[0] --> a new variable is made, equal to the first (0) element in the input list
    lastplace=racers[-1] --> a new variable is made equal to the last (-1!!) element in the input list
    racers[0]=lastplace -->now, the input list is changed--its first (0) element is changed to be what is in the variable lastplace (this uses racers, not standings, so the function works on whatever list is passed in)
    racers[-1]=firstplace --> here its last element is changed to be what is in the variable firstplace
    return racers --> the output of the function is the input list, which has now been changed in place
print(purple_shell(standings)) --> prints the output of the function when the list standings is put in as the input
output:
['peach', 'bowser', 'luigi', 'mario']
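Since the function changes the list that was passed in, the original standings list is changed too. A sketch of a version that leaves the original alone, assuming that is what is wanted (purple_shell_copy is a made-up name):

```python
def purple_shell_copy(racers):
    swapped = list(racers)  # shallow copy, so the caller's list is untouched
    swapped[0], swapped[-1] = swapped[-1], swapped[0]  # swap first and last
    return swapped

standings = ['mario', 'bowser', 'luigi', 'peach']
new_standings = purple_shell_copy(standings)
print(new_standings)
print(standings)
```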
LOOPS:
def list_contains_seven(my_list):
    for element in my_list: --> for all the elements in the variable, do the following
        if element ==7: --> since this is a LOOP, it will run this for all the elements
            return True --> if it does find an element that is equal to 7, it will output true and that is the end of the function
    return False --> once it has run the loop for all the elements and not found anything, it will return False
print(list_contains_seven([6,14,5,7])) --> look--you can directly put stuff into functions, even if they are not specific variables!!
print(list_contains_seven([6,14,5,9]))
output:
True
False
def count_to_10(num): --> makes a new function of one variable
    while(num<10): --> while the input is less than 10, do the following
        num=num+1 --> makes the number equal to the number plus one--aka, adds one to the number
        print(num) --> prints the number
count_to_10(0) -->puts zero into the function--so it will count from 1 (since 0 goes in and gets one added to it) to 10 and then stop the loop
count_to_10(5) --> puts 5 into the function--so it will count from 6 to 10
count_to_10(11) --> puts 11 into the function--so it will not print anything, since 11 is greater than 10
output:
1
2
3
4
5
6
7
8
9
10
6
7
8
9
10
numbers=[1,21,7,-14,49,74,700] --> called numbers instead of list, since naming a variable list would hide the built-in list
def seven_counter(my_list):
    counter=0 --> makes a variable that will serve as our counter
    for element in my_list:
        if element%7==0: --> if the element is divisible by 7 (since the remainder is 0), do the following
            counter=counter+1 --> add one to the counter (so then it adds one each time this is true)
    print(counter) --> prints the final counter number after the loop has been run for all the elements
seven_counter(numbers)
output: 5
DICTIONARIES:
capitals={'United States': 'DC', 'France': 'Paris', 'Mexico': 'DF'} --> the keys are the words before the colons, and the values are the words after----note the curly braces { } used for dictionaries
populations={'UMD':40000, 'Towson': 30000, 'UMBC': 20000}
print(populations.get('UMD')) -->gets the value after the key UMD, in the dictionary variable populations
output: 40000
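A sketch comparing .get with square brackets. 'JHU' is just a made-up key that is not in the dictionary:

```python
populations = {'UMD': 40000, 'Towson': 30000, 'UMBC': 20000}

umd = populations['UMD']                       # brackets also look up a key
missing = populations.get('JHU')               # .get gives None for a missing key
missing_with_default = populations.get('JHU', 0)  # or a default value if one is given

try:
    populations['JHU']                         # brackets raise KeyError for a missing key
    raised = False
except KeyError:
    raised = True

print(umd, missing, missing_with_default, raised)
```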
############################DO NOT TOUCH THIS CODE#########################
deck={} #represents a deck of cards where the keys are the suits and values are lists of card values
hearts=[]
clubs=[]
diamonds=[]
spades=[]
values=['ace', 'two', 'three', 'four', 'five', 'six','seven', 'eight', 'nine', 'jack', 'king', 'queen']
def create_cards(suit):
    for value in values:
        suit.append(value)
create_cards(hearts)
create_cards(clubs)
create_cards(diamonds)
create_cards(spades)
##################################Add your code here according to the comments ##########################################
#Use the strings 'h', 's', 'c', 'd' as keys, and add the lists of values to the dictionary.
keys = ['h','s','c','d']
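deck = dict.fromkeys(keys,values) #note: every key gets the same values list here (not the separate hearts/spades/clubs/diamonds lists made above)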
#Print a view of dictionary (key, value) pairs
print(deck.items())
#Print a view of all of the keys
print(deck.keys())
#Print a view of all of the values
print(deck.values())
#Remove all of the spades from the deck
deck.pop('s')
#Add a new entry to the dictionary with key 'j' and values 'joker1' and 'joker2'
deck.setdefault('j',['joker1', 'joker2'])
print(deck)
#Clear the dictionary
deck.clear()
print(deck)
output:
dict_items([('h', ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen']), ('s', ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen']), ('c', ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen']), ('d', ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen'])])
dict_keys(['h', 's', 'c', 'd'])
dict_values([['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen'], ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen'], ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen'], ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen']])
{'h': ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen'], 'c': ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen'], 'd': ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen'], 'j': ['joker1', 'joker2']}
{}
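One thing to watch with dict.fromkeys when the value is a list: every key points at the same list object. A sketch with a shortened, made-up card list:

```python
values = ['ace', 'two', 'three']

shared = dict.fromkeys(['h', 's'], values)
shared['h'].append('extra')   # also shows up under 's', since it is one list

base = ['ace', 'two', 'three']
separate = {suit: list(base) for suit in ['h', 's']}  # each key gets its own copy
separate['h'].append('extra')

print(shared)
print(separate)
```

The output above is not affected by this, since nothing is appended to a single suit, but it would matter if one suit's cards were changed later.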
EXTERNAL LIBRARIES:
import math as m #I use the as m part so when I refer to the library I can just type m instead of math. Its just an abbreviation. -->imports the math external library
print(m.pi, m.log(32, 2)) --> outputs a rounded value of pi --> also, the first number for log is what you find the log of, and the second number is the base
print(m.gcd(75348597,979531683)) --> this finds the greatest common divisor of the two numbers
print(m.cos(10)) --> finds the cosine of 10, when 10 is in radians
output:
3.141592653589793 5.0
3
-0.8390715290764524
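A sketch with smaller numbers that are easier to check by hand, plus the degrees version of the cosine (m.radians converts degrees to radians):

```python
import math as m

log_result = m.log(32, 2)           # log base 2 of 32
gcd_result = m.gcd(12, 18)          # largest number that divides both
cos_radians = m.cos(10)             # 10 is treated as radians
cos_degrees = m.cos(m.radians(10))  # cosine of 10 degrees

print(log_result, gcd_result, cos_radians, cos_degrees)
```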
import matplotlib -->imports the external library matplotlib
import matplotlib.pyplot as plt -->imports this external library, abbreviated by plt
import numpy as np -->imports the numpy external library, abbreviated by np
# Data for plotting
t = np.arange(0.0, 2.0, 0.01) -->the variable t, begins at 0, goes to 2, and goes by increments of .01
s = 1 + np.sin(2 * np.pi * t) -->the variable s, is equal to 1 + the sin of 2*pi*t (t being the variable, and pi being the value of pi---note that we had to get pi from the np library here)
fig, ax = plt.subplots() --> makes a figure (fig) with one set of axes (ax) to plot on
ax.plot(t, s) --> makes a plot, w the variable t on the horizontal and s on the vertical
ax.set(xlabel='time (s)', ylabel='voltage (mV)', --> sets the axis labels on the graph
title='About as simple as it gets, folks')
ax.grid()
plt.show()
output:
import matplotlib -->imports the external library matplotlib
import matplotlib.pyplot as plt
import numpy as np
data = {'a': np.arange(50), --> entry a is the integers 0 through 49 (these become the x values)
        'c': np.random.randint(0, 50, 50), -->makes 50 random integers, with 0 as the low, 50 as the high (not included), and 50 as the size
        'd': np.random.randn(50)} --> returns 50 samples from the standard normal distribution
data['b'] = data['a'] + 10 * np.random.randn(50) -->makes the y values: entry a plus some random noise
data['d'] = np.abs(data['d']) * 100 --> makes the d values positive and scales them up, since they are used as the marker sizes
plt.scatter('a', 'b', c='c', s='d', data=data) -->c sets the color of each point and s sets the marker size; the strings are looked up as keys in data
plt.xlabel('entry a') --> sets the x axis label to the text 'entry a'
plt.ylabel('entry b') --> sets the y axis label to the text 'entry b'
plt.show() -->shows the plot
output:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0.0, 4.7, .1) --> makes the x values go from 0 to 4.7 in increments of .1
y = 3*x + np.square(x) --> makes the y values equal to 3x + x^2
plt.plot(x,y) --> makes the plot of (x,y)
plt.xlabel('x values') --> makes the axis labels
plt.ylabel('y values')
plt.show()
output:
# imports some software packages we'll use
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
inline_rc = dict(mpl.rcParams)
# a hashtag tells the program "don't read the rest of the line"
# That way we can write "comments" to humans trying to figure out what the code does
two_u = pd.read_csv('https://github.com/adamlamee/HEP-data/raw/master/Double_Muon_Run2011A.csv')
# two_e = pd.read_csv('https://github.com/adamlamee/HEP-data/raw/master/Double_Electron_Run2011A.csv')
# one_u = pd.read_csv('https://github.com/adamlamee/HEP-data/raw/master/Single_Muon_Run2011A.csv')
# one_e = pd.read_csv('https://github.com/adamlamee/HEP-data/raw/master/Single_Electron_Run2011A.csv')
data = two_u
# The .head(n) command displays the first n rows of a file.
data.head(3)
# The .shape command displays the (number of rows , number of columns) in a file.
data.shape
output:
(475465, 20)
# You can specify a column by dataset.columnName (e.g., two_u.E1)
# This makes a new column called "totalE" and fills it with (E1 + E2) for each event
data['totalE'] = data.E1 + data.E2
# This makes a new column called "Esquared" and fills it with E1^2 for each event
data['Esquared'] = data.E1**2
# makes the histogram
plt.hist(data.totalE, bins=10, range=[0,120], log=False)
plt.title("Dimuon Events")
plt.xlabel("x-axis label")
plt.ylabel("number of events")
output:
Text(0,0.5,'number of events')