2/7-2/13 (Second Semester logbook)
Our project is to work on the future circular electron positron calorimeter. The purpose of the project is to use simulations to develop a new calorimeter specialized for measuring the energy of hadrons. The fundamental goal is to decrease the resolution (sigma) as small as possible. The resolution is determined by the formula
σ/E=s/√E⊕n/E⊕c
Luckily, the largest term, c is controllable by designing a better calorimeter.
Detection has an "active" scintillator part, and a "passive" material part. The passive part creates the particle shower and the active part measures the number of particles created.
Putting more scintillators makes the measurement more accurate, but introduces the possibility of measuring the same particle twice. For the passive material, atoms with high atomic masses create more interactions, and increase the reliability of showers being produced in an area it can easily be detected.
Ideally, a calorimeter also has a low temporal resolution, meaning it can distinguish between showers created by one particle or two.
Simulation is done on the Tier 3 computing cluster with the GEANT library (Though it seems to work on CMS VM too?).
2/14-2/20
The next step is to run a sample simulation to get used to using GEANT. The procedure to run the test is here:
mkdir junk
cd junk
cp /home/eno/dualReadout/cepc_calotiming/g4env.sh .
source g4env.sh
cp -r /cvmfs/geant4.cern.ch/geant4/10.5/share/examples/basic/B4/B4a .
mkdir build
cd build
cmake -DGeant4_DIR=/cvmfs/geant4.cern.ch/geant4/10.5/x86_64-slc6-gcc63-opt/lib64/GEANT4-10.5.0 ../B4a
make
./exampleB4a
Xm
(be a little patient… the x11 connection is slow)
Next to the “session” prompt in the window it pops up
/run/beamOn 1
It generates a (mostly) useless root file, and produces the following picture, representing a particle shower being made and detected:
When multiplying the initial energy by 10 from 50MeV to 500MeV, the shower gets more intense. The rays exiting the calorimeter demonstrate the constant error term C, which grows at these higher energies. Out of the total 500MeV, 461MeV was detected (92%)
The general flow of using GEANT 4:
2/21-2/27
Continuing with the tutorial, we are given a sample calorimeter to get an idea of what kind of output to expect. We perform a run with 1000 1GeV electrons to see the average deposited energy and resolution. The code by default records energy deposited and free path (Mean free path) in Eabs and Labs histograms.
When running at 1Gev, the mean energy deposited is 905MeV with a resolution of 19.69MeV. This means it should use an ajustment factor of 1.104 (for electrons) and the resolution is boosted to 21.74MeV. This factor means not only should detection be linear, but as much energy as possible should be absorbed.
2/28-3/6
This week, we moved to trying to combine the simulations together, and make a resolution/energy by energy plot. An expected plot would start high at low energies, and drop to the constant term for higher energies. The program used to generate the plot was used previously to generate this graph:
A similar file should be generated as the output of running two new programs: CC_Analyzer, and res. However, the Analyzer is not specified to work with the output of the root files generated in the tutorial, so it has to be customized. So far, The main tree has been changed so it is reading the histograms correctly, but it has an incorrect list of histograms to read as input.
The correct path is B4>row_wise_branch>{Eabs, Egap, Labs, Lgap}
Also, the Analyzer file uses a strange system where local variables are loaded by the callee through pointers. There is never a reason to do this. It almost looks as bad as the code I write.
In the following picture, the file component is inaccurately labeled as "Tree"
Ignoring this analysis file for the time being, I altered res.C to accept input directly from the B4 histograms, but the output doesn't quite match. The problem is not in the simulation because opening the histograms in root shows a standard deviation value in line with what is expected from the prediction curve, but the line generated graphically does not match.
3/7-3/27
Not much was accomplished over this period due to getting prepared for and used to https://en.wikipedia.org/wiki/2019%E2%80%9320_coronavirus_pandemic
Other than that, I found that the reason for the wacky data was the fact that the analysis phase was incomplete. The most important information is held in the Egap histogram, which measures the energy deposited in the scintillator, whereas Eabs is deposited in the passive material. The RMS of this detector is much higher than previously stated because I accidentally switched the two histograms. The real way to calculate RMS is first taking the average value of Etrue / Egap and multiplying each Egap entry by that number. The standard deviation of the resultant distribution is the resolution.
After realizing this, I went back to work on the analyzer code, commented out most of the references to unused histograms, and focused on Egap and Eabs, as those are the only available datapoints. Egap approximates Etrue well enough for large detectors, but it's not something you can measure in a real experiment. Unfortunately, the root file I was working with had a weird structure which made it hard to parse the original event tuples without getting either no data or a null pointer error. I am still working on this problem.
3/28-4/3
The bug causing loss of access to the histograms was fixed. Apparently accessing subfolders required use of a method I did not know about. The same simulations as before were run at 20~80MEV and were successfully analyzed. Unfortunately, when it came to fitting a gaussian distribution, the mean was calculated to be negative several times. This is not because the simulation generated negative energy, but rather because the graph followed more of a power law with several events clustering at 0 energy. This made it look like the tail end of a distribution that continued into the negatives. Simply increasing the energy to 1~4GeV was enough to alleviate this problem. The following is a sample histogram from 3GeV. "ajusted" means it represents the real energy detected multiplied by a correction factor.
A gaussian curve can fit this with a reasonable mean, so it can be plotted into a RMS / Energy plot like the one before. The axis has been changed a bit to represent higher energies.
so at 3GeV, the resolution is 25% or about 750MeV
4/3 - present
There is another plot that can be used to analyze the effectiveness of estimating energy using a calorimeter. That is the estimate / real energy plot. Up until now, each histogram had its entries corrected by a factor meant to center it around the true energy. However, practically speaking, this factor is only calculated once for a single, usually high energy, then applied to all energy levels. This means that the calorimeter could be inherently better at detecting high energy particles (which they usually are), and those factors do not apply well to lower energies. I ran a few simulations and the results matched expectations: Electrons, being easy to detect and measure, did not have much variation, even when the correction factor is limited. However, pions did have some variation, even when the calorimeter was made 10 times the original thickness.
(please ignore the trend lines. They are completely useless and I don't know how to get rid of them)
It seemed that most of the variation came below 40GeV, so I decided to focus new simulations specifically on that range. Unfortunately, I once again run into the problem that low energies cause gaussian curves to fit poorly, resulting in a spike on the left. Other than that, the pions specifically show more variation than the electrons. I am confused about how the statistically anomaly seemed to apply more to the electrons than the pions. There also seemed to be a degree of overfitting as there should be no bump between 20 and 50Gev. This was simply the range that was not sampled.
It turned out that the bug in the previous code was caused by the code I used to assign means. At 2Gev, 3Gev was assigned to the mean, etc. This was fixed, but a new problem cropped up with electrons from 30-49GeV (once again, a programming error) however, that part of the graph is not extremely important as the electron energy is very predictable, and the pions are accurate.
Pions do seem to have an issue when below 4Gev, but they do mostly satisfy the predicted upward trend.
[I put a huge thing about how all these experiments were redone with dualtoy, but google decided to delete that for me and I'm not retyping it all so sorry if this is tiny with only 2 graphs.]
we redid all the resolution things for dualtoy (can be found on Dr. Eno's github) and found that it had a resolution worse than that of B4a because its interaction length ratio was like 1:200 ( bad) instead of 1:5 (better). The results from cherenkov radiation in the quartz did not have any better resolution. We also graphed a plot of scintillator energy vs cherenkov photons and found they were both correlated, likely because of initial shower size.
___________________________________________________________________________________________________________________________
I am used to using linux, so I know most of these commands (but ls -l was a bit new to me).
++ adds 1 to the variable and -- subtracts 1. ++n adds 1 to n then uses its value while n++ uses the value of n first. C++ has its name because it is more than just C (C+1).
Code running from HW4:
#include <iostream>
using namespace std;
int main() {
cout <<"Hello World!" << endl; //Print hello world to screen followed by end line (endl)
//variables
int i=2;//put the number 2 on the stack for later use
cout << "i = "<<i<<endl; //print i = 2 with a newline
double a=3.3;//put the double precision value 3.3 onto the stack
cout << "a = " <<a<<endl; //print a = 3.3
int j = a*i; //makes 6.6 then cast to the int 6
cout << "a*i = "<<j<<endl;// print a*i = 6
int n=10;//set n to 10
cout << "n is "<<n<<endl;//prints n is 1
n--;//decrease n to 9
cout<<"n is now "<<n<<endl;//prints n is now 9
n++;//increase n to 10
cout<<"n is now "<<n<<endl;//print n is now 10
//non-numeric variables
bool prop;//new boolean value
prop = (5>1);//set prop to true (because 5 is more than 1)
cout<<"prop is "<<prop<<endl;//print prop is true
prop = (1>5);//set prop to false because 5 is not less than 1
cout<<"prop is "<<prop<<endl;//print prop is false
prop = (1 != 5);//prop is true because 1 is not 5
cout << "prop is " <<prop<<endl;//print prop is true
n=10;//set n (loop counter) to 10
while(n>0) {//this loop prints "n is 10" "n is 9" "n is 8" ... until "n is 1" then stops.
cout<<"n is "<<n<<endl;
n--;
}
{
//rewriting of the while loop code
for(int n=0;n<10;++n){
//slow loop
cout<< "n is "<<n<<": ";
for(int m=0;m<=n;m++){
//inner loop
cout<<m;
//this bit will write n is 0: 0 then n is 1: 01 then n is 2: 012, counting up to 9: 0123456789
}
cout<<endl;
}
}
//debugging
int idebug=1;
int super_secret_value = 0x4f;//who knows what this is?
if(idebug)
{
cout<<super_secret_value<<endl;//oh, it's 79
}
return 0;//end program
}
Result when running:
Hello World!
i = 2
a = 3.3
a*i = 6
n is 10
n is now 9
n is now 10
prop is 1
prop is 0
prop is 1
n is 10
n is 9
n is 8
n is 7
n is 6
n is 5
n is 4
n is 3
n is 2
n is 1
n is 0: 0
n is 1: 01
n is 2: 012
n is 3: 0123
n is 4: 01234
n is 5: 012345
n is 6: 0123456
n is 7: 01234567
n is 8: 012345678
n is 9: 0123456789
79
HW5:
#include <iostream>
using namespace std;
int main() {
int n = 10;
while (n>=10) { //only runs while n is 10 or above
if(n>5) {
cout<<"n is "<<n<<endl;//prints "n is 10" once
}
else {
cout<<"n = "<<n<<endl;//never runs
n--;
}
return 0;//immediately quit after first loop
}
}
The first bit of code simply prints "n is 10" because the decrement is inside the else statement, the check is >=, and the return is inside the while loop. If these statements are moved out, and the check is changed to >=0 it prints:
n is 10
n is 9
n is 8
n is 7
n is 6
n = 5
n = 4
n = 3
n = 2
n = 1
n = 0
Pointers:
#include <iostream>
using namespace std;
int main() {
int i = 10;//add variable i onto the stack
cout << "The memory address of i is " << &i << "\n";// &i gets the address of i (near the top of the stack)
cout << "The data stored at memory address " << &i << " is " << i << "\n"; //state the data stored at (some hexcode) is 10
int* p = &i;// declare pointer p which points at i, and puts p on the stack
cout << "The value of p is " << p << "\n";// prints p is (some hexcode)
cout << "We say that p 'points at' the memory location referenced by address " << p << "\n";
cout << "The data stored at memory address " << p << " is " << *p << "\n"; //*p evaluates to 10
return 0;
}
/*PROGRAM 1*/
#include <iostream>
using namespace std;
int main(){
int i = 10;
int j = i;//copy the value of i (10) into j
cout << "i= " << i << " and j= " << j << "\n";// i= 10 and j= 10. two different variables with the same value
i=5;
cout << "i= " << i << " and j= " << j << "\n";//i= 5 and j= 10. the value of i changing does not affect j.
j=1;
cout << "i= " << i << " and j= " << j << "\n";//i= 5 and j= 1. the value of j changing does not affect i.
return 0;
}
/*PROGRAM 2*/
#include <iostream>
using namespace std;
int main(){
int i = 10;
int* p = &i;
cout << "i= " << i << " and *p= " << *p << "\n";// i= 10 and *p= 10. they seem to act the same
i=5;
cout << "i= " << i << " and *p= " << *p << "\n";// i= 5 and *p= 5. they still act the same
*p=1;
cout << "i= " << i << " and *p= " << *p << "\n";// i= 1 and *p= 1 they still act the same
return 0;
}
Why is the behaviour of Program1 different than that of Program2?
Program 1 creates two independent variables while Program 2 creates 2 aliases for the same variable.
New:
New puts the variable into the heap instead of on the stack.
#include <iostream>
using namespace std;
int main(){
int* p = new int(5);// put int 5 into the heap, assign its pointer to p.
cout << "p points at address " << p << "\n";//p points at (some hexcode)
cout << "The data stored in address " << p << " is " << *p << "\n";// the data stored in p is 5
*p = 10;// reassign the heap entry to 10
cout << "Now the data stored in address " << p << " is " << *p << "\n";// now the data stored is 10
return 0;
}
Understanding test:
#include <iostream>
using namespace std;
int main(){
//To test your understanding
//try writing a similar code.
//In your code, use the 'new'
//construct to create a new pointer,
//'p1', and point it at a new
//double with value '3.14'. Then,
double data = 3.14;
double* p1 = &data;
//declare another pointer, 'p2',
//and point it at the same double
//as p1. Print the address pointed
double* p2 = p1;
//to by each pointer, and the
//value when the * operator is used
//on each pointer. Finally,
cout<<"p1 is "<<p1<<" and *p1 is "<<*p1<<endl;
cout<<"p2 is "<<p2<<" and *p2 is "<<*p2<<endl;
//multiply the value of the original
//double by 2 (you'll have to use
//the * operator for this) and
//print the values of *p1 and *p2 again.
data=data*2;
cout<<"p1 is "<<p1<<" and *p1 is "<<*p1<<endl;
cout<<"p2 is "<<p2<<" and *p2 is "<<*p2<<endl;
return 0;}
p1 is 0x7ffcf18530c8 and *p1 is 3.14
p2 is 0x7ffcf18530c8 and *p2 is 3.14
p1 is 0x7ffcf18530c8 and *p1 is 6.28
p2 is 0x7ffcf18530c8 and *p2 is 6.28
HW6:
#include <iostream>
using namespace std;
int main() {
// a loop to demonstrate 1D arrays
int ii[3] = {1,2,3};
int j=0;
while (j<3) {
cout <<" ii of "<<j<<" is "<<ii[j]<<endl;//prints each element of the array in turn
j++;//increase j by 1 (could be done as a for loop)
}
// a loop ot demonstrate 2D arrays
int LL[2][3] = {1,2,3,4,5,6};//2 rows 3 columns
j=0;
int k;
while (j<2) {
k=0; // do not forget to initialize k here
while (k<3) {
cout<<" LL of "<<j<<" "<<k<<" is "<<LL[j][k]<<endl;
k++;//print the elements like [ 1 2 3 ]
// [ 4 5 6 ]
}
j++;
}
return 0;
}
#include <iostream>
#include <fstream> // include the filestream operations
using namespace std;// remove std:: prefix
int main() {
ofstream myfile;// create output file stream
myfile.open("example.txt");// create output file named example.txt in current directory
myfile<<"write some junk.";// write "write some junk." to the file stream
myfile.close();// save changes and close the file.
return 0;//exit program
}
HOMEWORK 6:
The demonstration code is exactly the same as the code from homework 5.
How are these lemenst distributed in rows and colums?
The distribution starts with filling the first row. once it's full, it moves on to the second row.
Note: you can, and in some cases should, use braces to specify rows and columns. Fins a simple example on google.
If you have a jagged array where the lengths of the rows are not always equal, you have to specify them using {{},{},{}} syntax.
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ofstream myfile;//create output file stream
myfile.open("example.txt");//name the file example.txt
myfile<<"write some junk.";//output some junk to the file
myfile.close();//flush buffer and save file
return 0;//return without error
}
/* DOTPROD.CPP */
#include <iostream>
using namespace std;
double dot_prod(double v1[3],double v2[3]) {//take 2 3-dimensional vectors
double dotdot;
dotdot = v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2];//multiply each element and add results
cout<<" The dot product is "<<dotdot<<endl;
return 0;
}
Scalarmult File:
#include <iostream>
/* SCALARMULT.CPP */
void scalar_mult(double vec[3], double scalar) {
std::cout<<"The scalar multiple is ("<<
vec[0]*scalar<<","<<//multiply scalar by each element
vec[1]*scalar<<","<<//print the new vector
vec[2]*scalar<<")"<<std::endl;
}
Main file:
/* MAIN.CPP */
#include <iostream>
#include <fstream>
// include the program dotprod.cpp so that we can find the dot_prod function
#include "dotprod.cpp"
#include "scalarmult.cpp"
using namespace std;
int main () {
// declare the vectors
double vector1[3];
double vector2[3];
// open the input file
ifstream infile;
infile.open("vectors.txt");
// store the input in the vectors and print the vectors for the user
infile>>vector1[0]>>vector1[1]>>vector1[2];
cout<<" Vector 1 is ("<<vector1[0]<<","<<vector1[1]<<","<<vector1[2]<<")"<<endl;
infile>>vector2[0]>>vector2[1]>>vector2[2];
cout<<" Vector 2 is ("<<vector2[0]<<","<<vector2[1]<<","<<vector2[2]<<")"<<endl;
double scale;//declare scalar
infile>>scale;//read scalar from file
cout<<"The scalar is "<<scale<<endl;//print scalar
// close the input file
infile.close();
// call the dot_prod function from dotprod.cpp
dot_prod(vector1,vector2);
scalar_mult(vector1,scale);//multiply the vectors by their scalar
scalar_mult(vector2, scale);//both vectors
return 0;
}
Simulation:
#include "Riostream.h"
#include <iostream>
#include <fstream>
/*
some code to help us understand why and how resolutions affect our ability to measure the
mass of a particle
*/
// main routine
void resolutions(){
// set debug mode
bool idebug=true;
// set up the display
gROOT->Reset();
gROOT->SetStyle("Plain");
gStyle->SetOptStat(0);
gStyle->SetOptFit(11);
gStyle->SetErrorX(0);
c1 = new TCanvas("c1","Our data",200,10,700,500);
c1->SetFillColor(0);
c1->SetBorderMode(0);
c1->SetBorderSize(1);
c1->SetFrameBorderMode(0);
// book some histograms
TH1F *histo1 = new TH1F("histo1","mass",1000,0.,200.);
// setup random number generator
gRandom->SetSeed();
// get the secret parameters
ifstream myfile;
myfile.open("secretparameters.txt");
double mass, resE, resA;
myfile>>mass>>resE>>resA;
myfile.close();
// make N bosons at rest
// each boson will decay to 2 back-to-back daughter particles
// they will be back-to-back by conservation of momentum, since the momentum of the mother was zero
// assume that the mass of the daughter particles is small compared to the mother mass so we can
// assume that their energy will be large compared to their mass and we can treat them as massless.
// and thus their energy is equal to the magnitude of their momenta
// by conservations of energy, their energy
// work in 2D. Can always choose my coordinate system so that the 2 daughters are in the same plane
int N=8000;
double etrue1,etrue2,phitrue1,phitrue2; // true energy of the 2 daughters
double e1,px1,py1,phi1; // smeared 4-momenta of daughter 1
double e2,px2,py2,phi2; // smeared 4-momenta of daughter 2
double masssmeared;
for(int i=0;i<N;i++) {
// set true energy
etrue1=mass/2.;
etrue2=mass/2.;
if(idebug) cout<<"etrue "<<etrue1<<" "<<etrue2<<endl;
// choose phi for daughter 1 and daughter 2
phitrue1=2*TMath::Pi()*gRandom->Rndm();
phitrue2 = phitrue1+TMath::Pi();
if(idebug) cout<<"phitrue "<<phitrue1<<" "<<phitrue2<<endl;
// smear true energy with resolution of detector
e1=etrue1+resE*gRandom->Gaus(0.,1.);
e2=etrue2+resE*gRandom->Gaus(0.,1.);
if(idebug) cout<<"e "<<e1<<" "<<e2<<endl;
//smear angles with resolution of the detector
phi1=phitrue1+resA*gRandom->Gaus(0.,1.);
phi2=phitrue2+resA*gRandom->Gaus(0.,1.);
if(idebug) cout<<"phi "<<phi1<<" "<<phi2<<endl;
//calculate 4 momenta after smearing
px1=e1*cos(phi1);
py1=e1*sin(phi1);
px2=e2*cos(phi2);
py2=e2*sin(phi2);
if(idebug) cout<<"pxs "<<px1<<" "<<py1<<" "<<px2<<" "<<py2<<endl;
// calculate smeared mass
masssmeared=sqrt((e1+e2)*(e1+e2) - (px1+px2)*(px1+px2) - (py1+py2)*(py1+py2));
if(idebug) cout<<"masssmeared "<<masssmeared<<endl;
histo1->Fill(masssmeared);
histo1->Draw("");
c1->Update();
}//add the close brace here.
c1->SaveAs("c1.gif");
}
Transverse momentum histogram:
Homework 7:
Top quark histogram:
Homework 8:
#define HiggsAnalysis_cxx
#include "HiggsAnalysis.h"
#include <TH2.h>
#include <TStyle.h>
#include <TCanvas.h>
void HiggsAnalysis::Loop()
{
// To read only selected branches, Insert statements like:
// METHOD1:
// fChain->SetBranchStatus("*",0); // disable all branches
// fChain->SetBranchStatus("branchname",1); // activate branchname
// METHOD2: replace line
// fChain->GetEntry(jentry); //read all branches
//by b_branchname->GetEntry(ientry); //read only this branch
if (fChain == 0) return;
Long64_t nentries = fChain->GetEntriesFast();
Long64_t nbytes = 0, nb = 0;
TFile* output = TFile::Open("Dielectron_MC.root", "RECREATE"); // "RECREATE" would produce a new root file with name Dielectron_MC.root every time you run the code
TH1F* Z_ee = new TH1F("Z_ee", "Di-electron candidate invariant mass", 200, 0, 200);//histogram of first Z boson
TH1F* H_zz = new TH1F("H_zz", "ZZ candidate invariant mass", 200, 0, 300);// histogram of original higgs
TH1F* Z2_ee = new TH1F("Z2_ee", "Di-electron candidate2 invariant mass",200,0,200);//histogram of second Z boson
double el1mt = 0.0;
double el1pt = 0.0;
double el1eta = 0.0;
for (Long64_t jentry=0; jentry<nentries;jentry++) {
Long64_t ientry = LoadTree(jentry);
if (ientry < 0) break;
nb = fChain->GetEntry(jentry); nbytes += nb;
// if (Cut(ientry) < 0) continue;
TLorentzVector el1, el2, el3, el4;// capture 4 electrons from H->ZZ->(e+e-)(e+e-)
el1.SetPtEtaPhiM(f_lept1_pt, f_lept1_eta, f_lept1_phi, 0.0);
el2.SetPtEtaPhiM(f_lept2_pt, f_lept2_eta, f_lept2_phi, 0.0);
el3.SetPtEtaPhiM(f_lept3_pt, f_lept3_eta, f_lept3_phi, 0.0);
el4.SetPtEtaPhiM(f_lept4_pt, f_lept4_eta, f_lept4_phi,0.0);
TLorentzVector zCandidate = el1 + el2;//add together the mass of the first electron pair
TLorentzVector zCandidate2 = el3+el4;//add together the mass of the second electron pair
TLorentzVector hCandidate = zCandidate+zCandidate2;//add together mass of Z bosons to get higgs
Z_ee->Fill(zCandidate.M());//record first Z boson
Z2_ee->Fill(zCandidate2.M());//record second Z boson
H_zz->Fill(hCandidate.M());//record mass of Higgs
el1mt = el1.Mt();
cout << hCandidate.M() << endl;//print higgs mass
}
Z_ee->Write();//save all 3 histograms to the file.
Z2_ee->Write();
H_zz->Write();
output->Close();
}
(Machine learning thingy)
Homework 9.5 code:
import tensorflow as tf #import all the libraries needed to run
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10,10) # Make the figures a bit bigger (try changi
from keras.datasets import cifar10
from keras.datasets import mnist #nist database of handwritten digits. A dataset of 6
from keras.models import Sequential #model consists of sequential layers
from keras.layers import Dense, Dropout, Activation # Dropout consists in randomly se
from keras.utils import np_utils
nb_classes = 10 #10 classes (from 0 to 9) # the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print("X_train original shape", X_train.shape)
print("y_train original shape", y_train.shape)
for i in range(9):
plt.subplot(3,3,i+1)#display 9 sample images including their classification
plt.imshow(X_train[i], cmap='gray', interpolation='none')#show image in greyscale
plt.title("Class {}".format(y_train[i]))#show classification
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')#train with floating point data (0%-100%)
X_test = X_test.astype('float32')
X_train /= 255 # try commenting these two lines and see if you can train the netwrk
X_test /= 255
print("Training matrix shape", X_train.shape)
print("Testing matrix shape", X_test.shape)
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
model = Sequential()# create neural network
model.add(Dense(512, input_shape=(784,)))#add input layer to hidden layer
model.add(Activation('relu')) # An "activation" is just a non-linear function applied to # of the layer above. Here, with a "rectified linear unit" # we clamp all values below 0 to 0.
model.add(Dropout(0.2)) # Dropout helps protect the model from memorizing or "overfitt
model.add(Dense(512)) #create a second hidden layer
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))# create an output layer which picks from each digit 0-9
model.add(Activation('softmax')) #turn output layer into probability map
model.compile(loss="categorical_crossentropy", optimizer="adam")# create AI model
model.fit(X_train, Y_train,batch_size=128, epochs=4,verbose=1,validation_data=(X_test, Y_test))#provide the data set to the AI
score = model.evaluate(X_test, Y_test, verbose=1)#get loss
predicted_classes = model.predict_classes(X_test)#get predicted digits
correct_indices = np.nonzero(predicted_classes == y_test)[0]#count number of correct digit guesses
incorrect_indices = np.nonzero(predicted_classes != y_test)[0]# count number of incorrect digit guesses
plt.figure()
for i, correct in enumerate(correct_indices[:9]):#get first 9 correct digits
plt.subplot(3,3,i+1)
plt.imshow(X_test[correct].reshape(28,28), cmap='gray', interpolation='none')#display the images associated with the correct guesse
plt.title("Predicted {}, Class {}".format(predicted_classes[correct], y_test[correct]))
plt.figure()
for i, incorrect in enumerate(incorrect_indices[:9]):#get first 9 incorrect digits
plt.subplot(3,3,i+1)
plt.imshow(X_test[incorrect].reshape(28,28), cmap='gray', interpolation='none')#display the images that tricked the AI
plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], y_test[incorrect]))#display the predicted digit and the real digit side-by-side
(x_train, y_train), (x_test, y_test) = cifar10.load_data()#this is a bunch of pictures. If this is loaded instead of mnist, it will categorize the pictures
titanic = pd.read_csv("https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/carData/TitanicSurvival.csv")#open data table about titanic survivors
titanic.head(5)#read first five entries and print them to the screen.
Homework 9 code:
pandas=10 #declare pandas (type int) to be 10.
zebras=20 #declare zebras (type int) to be 20.
total_animals_in_zoo=pandas+zebras# add the pandas and zebras, get 30
print(total_animals_in_zoo)#print 30
pi=3.14159 #Approx.
diameter=10
######YOUR CODE HERE####
radius= diameter/2 #find radius
area = radius*radius*pi#circle area formula
print(area)#output area to screen
########################
#####Method 1########
a = [1, 2, 3] #set a={1,2,3}
b = ['cat', 'dog', 'fish']# set b to list of animals
a=b#now, both a and b are lists of animals, and the original numbers list is gone
b=a#this does nothing
print('Method 1')#start method 1
print(a)#prints a list of animals
print(b)#also prints a list of animals
#####Method 2########
a = [1, 2, 3]#set a to numbers
b = ['cat', 'dog', 'fish']#and b to animals
temp = a#create a temporary copy of the numbers list
a = b#set a to animals (doesn't completely delete the numbers list)
b = temp#set b to the temporary copy of the numbers list
print('Method 2')#start method 2
print(a)#prints animals
print(b)#prints numbers (as expected)
dog_legs=0
human_legs=4
goldfish_legs=2
######YOUR CODE HERE###########
temp=dog_legs#hold on to the value 0
dog_legs=human_legs#give dogs 4 legs
human_legs=goldfish_legs#give humans 2 legs
goldfish_legs=temp#give goldfish stored 0 legs
print("dog legs: ",dog_legs)#prints 4
print("human legs: ",human_legs)#prints 2
print("goldfish legs: ",goldfish_legs)#and prints 0, as expected
def round_to_two_places(num):#declare function that takes a number
return round(num, 2)#return itself, rounded to two places
print(round_to_two_places(4.325456463))#print 4.33, rounding up
def round_to_n_places(num, places):#declare general rounding function with optional number of places
#####Your Code Here############
return round(num,places)#return num rounded to (places) digits
print(round_to_n_places(pi,1))#test rounding pi to various digit places
print(round_to_n_places(pi,2))#3.14
print(round_to_n_places(pi,3))#3.142
print(round_to_n_places(pi,4))#3.1416
a=[1,5435,324,3,645365,23,4,5665,435,31, -543] #a is a list with a bunch of numbers
b=-10#b is negative, abs(b) should be positive
c='cat'#cat is 3 long
print(c)#prints "cat"
print(abs(b)) #absolute value
print(min(a)) #min value
print(max(a)) #max value
print(sum(a)) #sum
print(len(c)) #length of a string
print(sorted(a)) #sorts a list
a=[3243,645645,54346]#a=list1
b=[7,3,2]#b=list2
def product_of_maxes(list1, list2):#the function takes 2 list arguments
########YOUR CODE HERE#############
max1 = max(list1)#find first max
max2 = max(list2)#find second max
return max1*max2#return the producs
print(product_of_maxes(a,b))#prints 7*645645 or 4519515
print(3==10)# check equality between 3 and 10 (False)
print(len('abc')==3)#check length of 'abc' and compare to 3 (True)
print(max([10,40,53]) < 25) #find max (53) and compare to 25 (False)
print (5 != 3)#test if 5 is not equal to 3 (True)
#####YOUR CODE HERE###########
#write code that evaluates? ok then
x=eval("1==3 or 4>2")# reduces to (False or True), which evaluates to True
print(x)#print True
a=10#a is now 10
b=20#b is now 20
if(a>b):#fails because 20>10
print(a, 'is greater than', b)#prints a>b if a is indeed bigger than b
elif(a<b):#proceeds because 20>10
print(b, 'is greater than', a)#prints b>a if b is bigger than a
else:#handle other scenarios
print("they're equal")#print they're equal when neither is larger
a=40#test for the opposite
if(a>b):#succeeds because 40>20
print(a, 'is greater than', b)#prints a>b if a is indeed bigger than b
elif(a<b):#fails the first if activated
print(b, 'is greater than', a)#prints b>a if b is bigger than a
else:#handle other scenarios
print("they're equal")#print they're equal when neither is larger
b=40#b test for equality
if(a>b):#fails because 40==40
print(a, 'is greater than', b)#prints a>b if a is indeed bigger than b
elif(a<b):#fails because 40==40
print(b, 'is greater than', a)#prints b>a if b is bigger than a
else:#handle other scenarios
print("they're equal")#print they're equal when neither is larger
def longer_string(string1, string2):#function of 2 strings
if(len(string1) > len(string2)):#run if string1 is longer
return string1#return the longer string
elif(len(string1) < len(string2)):#run if string2 is longer
return string2#return the longer string
else:#when they're the same size
return#don't return either
print(longer_string('abc', 'jfkdla;s'))#print jfkdla;s
print(longer_string('abcfdsafdasg', 'jfkdla;s'))#print abcfdsafdasg
print(longer_string('abc', 'def'))#print None (they're the same length)
def can_make_cake(eggs, flour, milk, almond_milk):#define cake-making function taking ingredients as arguments
#I can make cake if I have some kind of milk AND flour AND eggs
return (milk or almond_milk) and flour and eggs > 0#
print('Can I make cake?')#print the practice introduction
#What is the following line implying?
print(can_make_cake(10, True, False, True))#print true
#redo the above call to the function with 0 eggs
### your code here ###
print(can_make_cake(0,True,False,True))#print false because we need more than 0 eggs
a = [-35423,-5432654,-352435,53252]
def abs_of_min_is_odd(thelist):#take a single list argument
###Your logic here###
isodd = [False,True][abs(min(a))%2]#when abs(min(a)) is even, gets the first element, which is False. When abs(min(a)) is odd, gets True
print(["even","odd"][isodd]) #prints even because abs(min) is 5432654, which is even
abs_of_min_is_odd(a)#call the function with a
def select_third_element(my_list):#
if(len(my_list) < 3):#if the list is too small to have a third element
return None#return None (python equivalent of null)
else:#otherwise
return my_list[2] #remember index 0, so third element is index 2
foo = ['a', 'b', 'c', 'd']# practically the same as foo="abcd"
print(select_third_element(foo))#prints c
list1=[1,2,3]#create rows of a matrix
list2=[4,5,6]
list_of_lists=[list1,list2]#create the double-indexed matrix
print(list_of_lists[1][2])#pick row 2, element 3, which is a 6
print(list_of_lists[0])#pick row 1, or list1 and print [1,2,3]
print(list_of_lists[1])#pick row 2, or list2 and print [4,5,6]
print(list_of_lists[0][1])#pick cell from row 1, column 2 (2)
print(list_of_lists[0][2])#pick cell from row 1, column 3 (3)
print(list_of_lists[1][0])#pick cell from row 2, column 1 (4)
real_madrid=['zidane', 'ramos', 'marcelo']#obviously the best team
barcelona=['valverde', 'messi', 'pique']#obviously the worst team
teams = [real_madrid, barcelona]#put the teams in a list
def losing_team_captain(team_list):#create function to identify the sacrifice... I mean losing team's captain
#######Your code here###########
losing_team=team_list[len(team_list)-1]#pick from the end of the list
print(losing_team[1])#print captain of losing team (messi)
losing_team_captain(teams)#call function with the teams
standings=['mario', 'bowser', 'luigi','peach']#good job mario, you're winning. And peach sucks
def purple_shell(racers):#uh, oh, I think peach is gonna get her revenge
#######YOUR CODE HERE##########
back=racers[len(racers)-1]#hold on to a copy of the last place
racers[len(racers)-1]=racers[0]#bring the frontrunner to last place
racers[0]=back #the former last-place person is now in front
purple_shell(standings)#make peach use a purple shell
print(standings)#print the updated standings with peach in front and mario at the back
def list_contains_seven(my_list):#function identifying sevens
for element in my_list:#go through the list
if element ==7:#if there is a seven
return True#return true
return False#return false when there are no sevens :(
print(list_contains_seven([6,14,5,7]))#print True
print(list_contains_seven([6,14,5,9]))#print False
def count_to_10(num):#counts to 10, starting from num+1
while(num<10):#so long as num is less than to
num=num+1#icrease num
print(num)#then print it (this can print 10)
count_to_10(0)#print 1,2...10
count_to_10(5)#prints 6,7...10
count_to_10(11)#prints nothing
list=[1,21,7,-14,49,74,700]#this has a lot of numbers divisible by seven (there are 5)
def seven_counter(my_list):#count the number of multiples of seven
return sum([b%7==0 for b in my_list])#sum up the occurences of multiples of seven
print(seven_counter(list))#prints 5
############################DO NOT TOUCH THIS CODE#########################
deck={} #represents a deck of cards where the keys are the suits and values are lists of card values
hearts=[] # represents all the hearts in a deck
clubs=[] # represents the clubs in a deck
diamonds=[] # represents the diamonds in a deck
spades=[]# represents the spades in a deck
values=['ace', 'two', 'three', 'four', 'five', 'six','seven', 'eight', 'nine', 'jack', 'king', 'queen'] #creates a list of possible card values
def create_cards(suit):#function that copies the contens of values into the list given as an argument
for value in values:#for each element
suit.append(value)#add it to the given list
create_cards(hearts)#create all the hearts
create_cards(clubs)#create all the clubs
create_cards(diamonds)#create all the diamonds
create_cards(spades)#create all the spades
##################################Add your code here according to the comments ##########################################
#Use the strings 'h', 's', 'c', 'd' as keys, and add the lists of values to the dictionary.
deck['h']=hearts#assign hearts
deck['s']=spades#assign spades
deck['c']=clubs#assign clubs
deck['d']=diamonds#assign diamonds
#Print a view of dictionary (key, value) pairs
print(deck)
#Print a view of all of the keys
print(deck.keys())
#Print a view of all of the values
print(deck.values())
#Remove all of the spades from the deck
deck.pop('s')
print(deck)
#Add a new entry to the dictionary with key 'j' and values 'joker1' and 'joker2'
deck['j']=['joker1','joker2']
print(deck)
#Clear the dictionary
deck.clear()
print(deck)
import math as m #I use the as m part so when I refer to the library I can just type m instead of math. Its just an abbreviation.
print(m.pi, m.log(32, 2))#print 3.1415... then print 5
print(m.gcd(75348597,979531683))#such factor, much aglorithm, wow
print(m.cos(10))#cos 10 radians is about -0.84. it's pretty close to -sin(1)
#Here's a more advanced example.
import matplotlib#importing math & graphics libraries
import matplotlib.pyplot as plt
import numpy as np
# Data for plotting
t = np.arange(0.0, 2.0, 0.01)#give t as an interval from 0 to 2 with a step of .01 between
s = 1 + np.sin(2 * np.pi * t) # is s a list variable? awesome! I guess np.sin responds with a custom math object which overrides the + operator. This is a bit much to expect us to understand, but it's cool nonetheless.
fig, ax = plt.subplots()# retrieving easy-to-use plot functionality
ax.plot(t, s)#plot the graph of s given t as a range
ax.set(xlabel='time (s)', ylabel='voltage (mV)',#label t with time and s with voltage (a sine wave)
title='About as simple as it gets, folks')#yeah, right
ax.grid()#enable grid display so we know the coordinate system
plt.show()#display the plot
#in case you were wondering, I wrote every single comment in this homework before running the code.
data = {'a': np.arange(50),#set data to be a dictionary with 3 numpy variables. a=[0..49]
'c': np.random.randint(0, 50, 50),#c= random from 0 to 50
'd': np.random.randn(50)}#d= some random double centered at 0
data['b'] = data['a'] + 10 * np.random.randn(50)#add a new variable, b=(index)+ a random number
data['d'] = np.abs(data['d']) * 100#reassign d to its absolute value
plt.scatter('a', 'b', c='c', s='d', data=data)#draw a scatter plot, color=c, size=d
plt.xlabel('entry a')#give the horizontal axis label a (the value which ranges from 0-49)
plt.ylabel('entry b')#label the vertical axis b (linear plus a random factor)
plt.show()#display the plot
#now I have to make my own numpy thing? if you say so...
t=np.arange(0,7,.01)#parametric plot, ranging t from 0-7
xx=[]#array of x points
yy=[]#and y points
x0=0#placement of x
y0=0#and y
for tt in t:#integrals are done best in a loop
x0+=m.cos(2*m.sin(tt))/100#increase x
y0+=m.sin(2*m.sin(tt))/100#increase y
xx.append(x0)#add x to xlist
yy.append(y0)#add y to ylist
x0=0#reset x for negative side
y0=0#reset y for negative side
xx=xx[::-1]#reverse list to fix discontinuity
yy=yy[::-1]#reverse list to fix discontinuity
for tt in t:#run through the negative plot
x0-=m.cos(2*m.sin(tt))/100#decrease x
y0+=m.sin(2*m.sin(tt))/100#increase y
xx.append(x0)#add x to xlist
yy.append(y0)#add y to ylist
fig, ax = plt.subplots()# retrieving easy-to-use plot functionality
ax.plot(np.array(xx), np.array(yy))#plot the graph of s given t as a range
ax.set(title='awesome dragon curves')#give it a righteous title
ax.grid()#enable grid display so we know the coordinate system
plt.show()#display the plot
(images)
(the better sine waves)