Famous Opening Quote
"Hello, how are you doing today?"
-Annabelle
Logbook:
September 13th, 2018
Bash practice:
echo $0
returns
bash
echo $SHELL
returns
/bin/bash
These commands tell you which shell the terminal is running and the path to its binary (useful if you need to know where it lives).
pwd : print working directory; not used often, as the prompt usually shows it to the left of where you enter commands
ls : prints all the files and subdirectories of your working directory (ls stands for "list")
ls -l : a more detailed version of ls, showing things like size and last-modified time
mkdir : makes a directory (folder) with the given name in the current folder
cd : change the current directory to the one listed
-can take absolute paths from root (/) or home (~)
-can take relative paths from the current directory
-can also take .. to move up out of the current folder
rmdir : removes the directory with the given name
ls -CFx /usr/bin : lists the contents of /usr/bin, which holds most of the commands available from the terminal
ls -l /bin : a detailed listing of /bin, the core system binaries (I recognize the echo, su, and mount commands)
HW3: Shell Scripts
chmod allows you to change the permissions on a file, for example giving anyone the ability to run an executable bash script (chmod 755 script) while allowing no one but the owner to change it.
ls -l allows you to see the file permissions of all the files in a directory.
The following program takes one input and writes the result of find ~ -name "*<the input>*" to the test.txt file:
#!/bin/bash
arg=$1
find ~ -name "*$arg*" > test.txt
To do a find and replace command using bash do the following:
In file named wordreplace:
#!/bin/bash
arg1=$1
arg2=$2
sed -i -e "s/$arg1/$arg2/g" test.txt
What this does is replace every occurrence of arg1 with arg2 in test.txt (the double quotes matter: the shell only expands the variables inside the sed expression when double-quoted). For the exercise this was run with:
Command:
./wordreplace was is
HW4 : C++
(This section is a little bare as I already know C from CMSC 216)
(So just some basic stuff, plus comments on the differences between C and C++)
#include <iostream> // includes the library for terminal output (and file streams)
using namespace std; // pulls in the std namespace, so we can write cout instead of std::cout

int main() { // main function, run when the executable is run
    cout << "Hello World!" << endl; // print hello world with a newline (\n works too)
    return 0; // exit the program and report that it ran correctly
}
find /usr/include/c++ -name "4*"
This shows the installed C++ header versions from GCC: 4.4.4 and 4.4.7.
I am using 4.4.7.
G++; increments the variable G by 1, same as G += 1;
G--; decrements the variable G by 1, same as G -= 1;
Booleans, declared bool, hold true and false values. (They are 0 and 1 in the background.)
while (boolean_or_condition) {
    // code to be executed repeatedly
}
Just like c syntax, which I have a lot of experience with from CMSC 216.
Differences to C:
Main method declaration: in C it is int main(void) {}; in C++ it is int main() {}. In essence, you don't need to specify void when you take no arguments.
Output: C uses printf and scanf for console output and input; C++ has these too, but usually uses cout << and cin >> instead.
For loops: in C++ you can declare variables inside the for loop header, for(int i=0; i<10; i++), while in C you have to write int i=0; for(i=0; i<10; i++).
“Your code is not doing what it should. Maybe it is even crashing. What do you do? Do you give up? Do you start randoming changing things? Of course ” -HW 4
On the danger of misleading quotes.
g++ compiles both .cpp and .c files; the difference is that g++ treats both as C++, while gcc compiles .c files as C.
g++ works like gcc in that -o filename code.cpp names the executable filename rather than the default a.out.
HW5:
Pointers work in C++ just like in C.
However, unlike C, C++ has a keyword called new.
The new keyword dynamically allocates an object and returns a pointer to it, letting you create the pointed-to value without first declaring a named variable.
Used:
int * p = new int(5);
Now p would point to an integer with the value of 5.
HW 6: October 30th
#include <iostream>
#include <fstream>
using namespace std;

int main() {
    ofstream myfile;
    myfile.open("example.txt");
    myfile << "write some junk.";
    myfile.close();
    return 0;
}
What this does is write "write some junk." into example.txt.
Basically, this works by replacing cout with another IOStream, in this case a file stream: you can use the same syntax as printing output to the console to add content to a file.
Computing
#include <iostream>
#include <math.h>
using namespace std;

// this function is the actual random number generator
// the code is adapted from the book Numerical Recipes for Fortran
// it is a pseudo-random number generator that relies on the
// integer overflow of its state variable
const int a = 7141;
const int c = 54773;
const int mmod = 256200;

double getFlatRandom(int& inew) {
    double mranflat = 0.;
    inew = inew % mmod;
    double aa = double(inew) / double(mmod);
    mranflat = aa;
    inew = a * inew + c;
    return mranflat;
}

// in this code, we call the pseudo-random number generator and learn some things
// about its properties by filling and then displaying a histogram
int main() {
    int num;
    cout << "Enter the number of loop iterations: ";
    cin >> num;
    int inew = 2345; // this is the "seed" for the random number generator
    // we put the results from the calls into a histogram that we can look at,
    // to learn some of the generator's properties. The histogram has 10 bins.
    int histo[10] = {0,0,0,0,0,0,0,0,0,0};
    double atmp;
    // call the random number generator num times and fill the histogram
    for (int i = 0; i < num; i++) {
        atmp = getFlatRandom(inew);  // call the random number generator
        histo[int(atmp*10)]++;       // increment the histogram bin the number falls within
    }
    // print the histogram to the screen
    for (int i = 0; i < 10; i++) {
        cout << i << ": ";
        for (int j = 0; j < int((double(100*histo[i])/double(num)) + 0.5); j++) {
            cout << "=";
        }
        cout << endl;
    }
    return 0;
}
By removing the &, inew is passed by value rather than by reference, so getFlatRandom is called with the same state every time and keeps returning the same "random" number.
getFlatRandom works through integer overflow, which I know well.
The numbers are called pseudo-random because the same seed and the same number of iterations always produce the same sequence. The distribution looks terrible for small numbers of iterations, but with 10000 iterations the generator is fairly flat; there are still better ones.
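As an aside, a generator of this form is called a linear congruential generator. Here is a minimal Python sketch of the same scheme (constants copied from the C++ above); returning the new state plays the role of the C++ reference parameter:
a, c, mmod = 7141, 54773, 256200   # same constants as the C++ code

def get_flat_random(inew):
    # returns (a value in [0, 1), the next state); same seed, same sequence
    inew = inew % mmod
    return inew / mmod, a * inew + c

state = 2345                       # the seed
for _ in range(3):
    value, state = get_flat_random(state)
    print(value)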
Calorimeter:
N=10000
Physics:
The LEP tunnel (google it) has a circumference of 27 km. Assuming the maximum field strength the best magnet experts can currently produce for a large bore magnet is 8.4 Tesla, what will the energy of the protons in the LHC beams be?
1 T = 1 J/(A·m²); 27 km = 27000 m
pc ≈ 0.3·q·B·r, with pc in GeV, B in tesla, r in meters, and q in units of the proton charge
pc = 0.3 × (27000/2π) × 8.4 GeV ≈ 10829 GeV ≈ 10.8 TeV (q = 1 for a proton)
E² = (pc)² + (mc²)², with mc² = 938.272 × 10⁶ eV = 0.938 GeV
E ≈ 10.8 TeV (the proton mass is negligible at this energy)
What if the tunnel were along the earth's equator? What would the maximum energy be?
pc = 0.3 × (40075000/2π) × 8.4 GeV ≈ 16073 TeV
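As a sanity check, the same calculation in Python (using pc ≈ 0.3·B[T]·r[m] GeV for unit charge, and mc² ≈ 0.938 GeV for the proton):
import math

B = 8.4                                  # tesla
mp = 0.938272                            # GeV, proton rest energy mc^2
for label, circumference in [("LEP tunnel", 27000.0), ("equator", 40075000.0)]:
    r = circumference / (2 * math.pi)    # meters
    pc = 0.3 * B * r                     # GeV
    E = math.sqrt(pc**2 + mp**2)         # GeV; the mass term is negligible here
    print(label, round(E / 1000, 1), "TeV")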
HW 7: November 12th
Done in class.
In essence we just installed MadGraph and used it a bit, which was handled through very detailed instructions on the HW page.
Write a brief description of major parts of the CMS detector. Beam pipe, tracker, calorimeter, solenoid, muon tracker. What are the physical dimensions of each part?
Beam pipe: 6.3 cm in diameter
tracker: 1.1 meter radius
Calorimeter, about 1.5 m radius
Solenoid about 1 meter radius
Muon tracker: 4+ meters
How do you change the display to show a view along the beam pipe?
Which color tracks are electrons?
Green
Which color tracks are muons?
Long red
Which color is missing energy represented with?
A purple dashed line
How are jets depicted in this visualization?
Yellow tubes
If a track has clockwise curvature (when looking at the x-y view, with x along the horizontal axis), is it positively or negatively charged?
It is positively charged
Study at least 2 events from each category in the files available with the display, and specify how many objects of each kind are seen in each event:
Electron.ig:Events/Run_146644/Event_718940510 [1 of 25]
Electrons: 1
Muons: 0
Photons: 1
Jets: 8
missing energy: 1
Vertices: 0
Electron.ig:Events/Run_146644/Event_718948926 [2 of 25]
Electrons: 1
Muons: 0
Photons: 1
Jets : 14
missing energy: 1
Vertices: 0
Mu.ig:Events/Run_146436/Event_90625265 [1 of 25]
Electrons: 0
Muons:2
Photons: 0
Jets: 0
missing energy: 0
Vertices: 0
Mu.ig:Events/Run_146436/Event_90626440 [2 of 25]
Electrons: 0
Muons: 2
Photons: 0
Jets : 15
missing energy: 0
Vertices: 0
4lepton.ig:Events/Run_178424/Event_666626491 [1 of 3]
Electrons: 0
Muons: 4
Photons: 0
Jets: 30+
missing energy: 1
Vertices: 0
4lepton.ig:Events/Run_193575/Event_400912970 [2 of 3]
Electrons: 4
Muons: 1
Photons: 4
Jets : 30+
missing energy:2
Vertices: 0
diphoton.ig:Events/Run_199319/Event_641436592 [1 of 10]
Electrons: 0
Muons:2
Photons: 2
Jets: 15+
missing energy: 1
Vertices: 0
diphoton.ig:Events/Run_199699/Event_336259924 [2 of 10]
Electrons: 0
Muons: 0
Photons: 2
Jets : 15+
missing energy:1
Vertices: 0
MinimumBias.ig:Events/Run_148031/Event_441572355 [1 of 25]
Electrons: 0
Muons:0
Photons: 0
Jets: 12
missing energy: 0
Vertices: 0
MinimumBias.ig:Events/Run_148031/Event_441584555 [2 of 25]
Electrons: 0
Muons: 0
Photons: 0
Jets : 0
missing energy:0
Vertices: 0
HW 8: November 12th
HW9
CODE:
pandas=10# no semicolons
zebras=20# declare vars like all other languages
total_animals_in_zoo=pandas+zebras
print(total_animals_in_zoo)#print stuff to console
pi=3.1415926535 #Approx.
diameter=10
area = pi*diameter*diameter/4
print(area)
a = [1, 2, 3]#list of ints
b = ['cat', 'dog', 'fish']# list of strings
a=b
b=a#this part fails, as the values of a have already been overwritten
print('Method 1')
print(a)#unlike other languages this will actually print all elements in the list
print(b)
a = [1, 2, 3]
b = ['cat', 'dog', 'fish']
temp = a#here a is saved so it is not overwritten
a = b
b = temp
print('Method 2')
print(a)
print(b)
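Worth noting (my addition): Python can also swap without any temp variable, using tuple unpacking.
a = [1, 2, 3]
b = ['cat', 'dog', 'fish']
a, b = b, a   # the right side is evaluated first, then unpacked
print('Method 3')
print(a)      # ['cat', 'dog', 'fish']
print(b)      # [1, 2, 3]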
dog_legs=0
human_legs=4
goldfish_legs=2
#code
temp = dog_legs
dog_legs=human_legs
human_legs = goldfish_legs
goldfish_legs=temp
#def is keyword to indicate function
def round_to_two_places(num):#function, if, for, and while headers end with a :
    return round(num, 2)#the function name is longer than the call it wraps
print(round_to_two_places(4.325456463))
def round_to_n_places(num, places):
    return round(num,places)#AMAZING
a=[1,5435,324,3,645365,23,4,5665,435,31, -543]
b=-10
c='cat'
print(c)
print(abs(b)) #absolute value
print(min(a)) #min value
print(max(a)) #max value
print(sum(a)) #sum
print(len(c)) #length of a string
print(sorted(a)) #sorts a list
a=[3243,645645,54346]
b=[7,3,2]
def product_of_maxes(a,b):
    return max(a)*max(b)
print(3==10)
print(len('abc')==3)
print(max([10,40,53]) < 25)
print (5 != 3)
#prints False, True, False, True
print(min([1,2,3])!=1)
print(max([4,5,6])<7)
print(sum([1,2,3,4])==10)
#prints False, True, True
a=10#b>a
b=20
#a=20#a>b
#b=10
#a=10#a=b
#b=10
if(a>b):
    print(a, 'is greater than', b)
elif(a<b):
    print(b, 'is greater than', a)
else:
    print("they're equal")
def longer_string(string1, string2):
    if(len(string1) > len(string2)):
        return string1
    elif(len(string1) < len(string2)):
        return string2
    else:
        return
print(longer_string('abc', 'jfkdla;s'))
print(longer_string('abcfdsafdasg', 'jfkdla;s'))
print(longer_string('abc', 'def'))
#prints jfkdla;s, abcfdsafdasg, and then None (equal lengths return nothing)
def can_make_cake(eggs, flour, milk, almond_milk):
    #I can make cake if I have some kind of milk AND flour AND eggs
    return (milk or almond_milk) and flour and eggs > 0
print('Can I make cake?')
#What is the following line implying?
print(can_make_cake(10, True, False, True))
# you can make a cake with 10 eggs, flour, and almond_milk, will print True
print(can_make_cake(0, True, False, True))
# you can't make a cake with no eggs, flour, and almond_milk, will print False
a = [-35423,-5432654,-352435,53252]
def abs_of_min_is_odd(a):
    if(abs(min(a))%2==0):
        print("Even")
    else:
        print("Odd")
print(abs_of_min_is_odd(a))#the function prints instead of returning, so this also prints None
def select_third_element(my_list):
    if(len(my_list) < 3):
        return None
    else:
        return my_list[2] #remember index 0, so third element is index 2
foo = ['a', 'b', 'c', 'd']
print(select_third_element(foo))#prints c
list1=[1,2,3]
list2=[4,5,6]
list_of_lists=[list1,list2]#[[1,2,3],[4,5,6]]
print(list_of_lists[1][2])#prints 6
print(list_of_lists[0])#prints [1,2,3]
print(list_of_lists[1])#prints [4,5,6]
print(list_of_lists[0][1])#prints 2
print(list_of_lists[0][2])#prints 3
print(list_of_lists[1][0])#prints 4
#like base 3: the two indices act like two base-3 digits
real_madrid=['zidane', 'ramos', 'marcelo']
barcelona=['valverde', 'messi', 'pique']
teams = [real_madrid, barcelona]
def losing_team_captain(team_list):
    return team_list[0][1]
print(losing_team_captain(teams))
standings=['mario', 'bowser', 'luigi','peach']
def purple_shell(racers):
    temp=racers[0]
    racers[0]=racers[len(racers)-1]
    racers[len(racers)-1]=temp
print(purple_shell(standings))#the function returns nothing, so this prints None
list=[1,21,7,-14,49,74,700]#note: this name shadows the built-in list type
def seven_counter(listy):
    r=0.0
    for i in listy:
        if i%7==0:
            r+=1
    return r/len(listy)
print(seven_counter(list))
deck={} #represents a deck of cards where the keys are the suits and values are lists of card values
hearts=[]
clubs=[]
diamonds=[]
spades=[]
values=['ace', 'two', 'three', 'four', 'five', 'six','seven', 'eight', 'nine', 'jack', 'king', 'queen']
def create_cards(suit):
    for value in values:
        suit.append(value)
create_cards(hearts)
create_cards(clubs)
create_cards(diamonds)
create_cards(spades)
##################################Add your code here according to the comments ##########################################
#Use the strings 'h', 's', 'c', 'd' as keys, and add the lists of values to the dictionary.
deck['h']=hearts
deck['s']=spades
deck['c']=clubs
deck['d']=diamonds
#Print a view of dictionary (key, value) pairs
for i in deck:
    for j in deck[i]:
        print(j+" of "+i)
#Print a view of all of the keys
print(deck.keys())
#Print a view of all of the (key, value) pairs; deck.values() would give just the values
print(deck.items())
#Remove all of the spades from the deck
del deck['s']
#Add a new entry to the dictionary with key 'j' and values 'joker1' and 'joker2'
deck['j']=['joker1','joker2']
#Clear the dictionary
deck.clear()
import math as m
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
data = {'a': np.arange(50),
'c': np.random.randint(0, 50, 50),
'd': np.random.randn(50)}
data['b'] = data['a'] + 10 * np.random.randn(50)
data['d'] = np.abs(data['d']) * 100
plt.scatter('a', 'b', c='c', s='d', data=data)
plt.xlabel('entry a')
plt.ylabel('entry b')
plt.show()# this outputs a scatter plot of the randomly generated data
# NOW MY OWN
data = {'a': np.arange(100),
'c': np.random.randint(10,30,100),
'd': data['c']}
data['b'] = np.square(data['a']-50)+np.random.randint(-20,20,100)
data['d'] = np.abs(data['d']) * 10
plt.scatter('a', 'b', c='c', s='d', data=data)
plt.xlabel('entry a')
plt.ylabel('entry b')
plt.show()
# imports some software packages we'll use
import pandas as pd
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
inline_rc = dict(mpl.rcParams)
# a hashtag tells the program "don't read the rest of the line"
# That way we can write "comments" to humans trying to figure out what the code does
two_u = pd.read_csv('https://github.com/adamlamee/HEP-data/raw/master/Double_Muon_Run2011A.csv')
# two_e = pd.read_csv('https://github.com/adamlamee/HEP-data/raw/master/Double_Electron_Run2011A.csv')
# one_u = pd.read_csv('https://github.com/adamlamee/HEP-data/raw/master/Single_Muon_Run2011A.csv')
# one_e = pd.read_csv('https://github.com/adamlamee/HEP-data/raw/master/Single_Electron_Run2011A.csv')
data = two_u
# The .head(n) command displays the first n rows of a file.
data.head(3)
# The .shape command displays the (number of rows , number of columns) in a file.
data.shape
# You can specify a column by dataset.columnName (e.g., two_u.E1)
# This makes a new column called "totalE" and fills it with (E1 + E2) for each event
data['totalE'] = data.E1 + data.E2
# This makes a new column called "Esquared" and fills it with E1^2 for each event
data['Esquared'] = data.E1**2
# makes the histogram
plt.hist(data.totalE, bins=10, range=[0,120], log=False)
plt.title("Dimuon Events")
plt.xlabel("x-axis label")
plt.ylabel("number of events")
OUTPUT:
30
78.5398163375
Method 1
['cat', 'dog', 'fish']
['cat', 'dog', 'fish']
Method 2
['cat', 'dog', 'fish']
[1, 2, 3]
4.33
cat
10
-543
645365
656743
3
[-543, 1, 3, 4, 23, 31, 324, 435, 5435, 5665, 645365]
False
True
False
True
False
True
True
20 is greater than 10
jfkdla;s
abcfdsafdasg
None
Can I make cake?
True
False
Even
None
c
6
[1, 2, 3]
[4, 5, 6]
2
3
4
ramos
None
0.7142857142857143
ace of h
two of h
three of h
four of h
five of h
six of h
seven of h
eight of h
nine of h
jack of h
king of h
queen of h
ace of s
two of s
three of s
four of s
five of s
six of s
seven of s
eight of s
nine of s
jack of s
king of s
queen of s
ace of c
two of c
three of c
four of c
five of c
six of c
seven of c
eight of c
nine of c
jack of c
king of c
queen of c
ace of d
two of d
three of d
four of d
five of d
six of d
seven of d
eight of d
nine of d
jack of d
king of d
queen of d
dict_keys(['h', 's', 'c', 'd'])
dict_items([('h', ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen']), ('s', ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen']), ('c', ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen']), ('d', ['ace', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'jack', 'king', 'queen'])])
Text(0,0.5,'number of events')
HW10:
For HW10 we used Google Deep Dream to create some cool images.
Code here:
https://github.com/keras-team/keras/blob/master/examples/deep_dream.py
The image I ran the code on is below:
Then, after running the code in Colab, I ended up with:
Which certainly looks incredibly terrifying.
### ### ### ### ### ### ### ### ### ###
Beginning of Honr269L Logbook
### ### ### ### ### ### ### ### ### ###
Week 1:
In the first week I am just getting back into the college routine.
Sent an email to Bill Dorland after class on Thursday.
Friday was a snow day.
Week 2:
Got a response the next Wednesday and set up a meeting for Friday.
At Friday's meeting I met with Bill Dorland and Matt Landreman about the project.
Bill told me about how Matt has created several million fusion magnetic-containment designs, generated through a brute-force search.
The "stellarator" we saw last semester is one such example of a reactor design, but one that was designed by hand.
All of these designs are sitting as mostly unprocessed data, and my project will be figuring out how to use machine learning to see if any of the reactor designs cluster usefully around particular qualities.
Week 3:
On Monday all 3 of us met again. The main purpose of the meeting was for Matt to show us the format the data is in and what he did to generate it.
He generated the curve that defines each design using a small Fourier series, with a small discrete choice of constants for each level of the series.
The Fourier series creates a continuous curve, which is then evaluated for certain physical properties, all of which are logged in a format called NetCDF.
How to Read NetCDF:
Manually, from a terminal, using the command ncdump
In Python, using the netCDF4 library:
ex:
from netCDF4 import Dataset
dataset = Dataset('../data/quasisymmetry_out.20190105-01-012_nfp1_stellSym.nc')
print(dataset.dimensions.keys())
print(dataset.variables.keys())
sz = dataset.dimensions['N_scan'].size
for i in range(0, sz):
    print(dataset.variables['iotas'][i])
This prints out all the dimension sizes used in the file and what each stored variable is called, and then accesses and prints each element of the "iotas" variable from the data file Matt gave me.
Next, on Wednesday, I met with Bill again and we talked about ML and how I was going to start with it:
Start with something simple and make sure it works, so the result is going to make sense
Planning to use TensorFlow
Bill suggested using k-means clustering
Week 4:
Did not meet with Bill Dorland or Matt this week; Bill was not in on Monday, and school was canceled due to snow on Wednesday.
In the meantime I have been reading up on TensorFlow; much of it is above me for right now.
What I am trying to learn about is k-means clustering, and what I have determined is that it is not the right ML algorithm. What we want is to draw circles in the data that indicate clusters; however, k-means, to the best of my ability to understand, only separates data radially.
But I want an algorithm that can pick out circles like below.
So while I learned a bit about TensorFlow tensors, and how to move data from NetCDF into a tensor, I have not made progress towards something like the image above (which I made by taking an image provided by Matt and editing it in Paint).
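To check this, here is a small test using scikit-learn (my own sketch; the parameters are illustrative): k-means run on two concentric rings splits them with a straight boundary instead of finding the rings.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_circles

# two concentric rings of points
X, ring = make_circles(n_samples=200, factor=0.3, noise=0.05)
pred = KMeans(n_clusters=2, n_init=10).fit_predict(X)
# each k-means cluster mixes points from both rings, since the boundary
# between two k-means clusters is always a straight line
print(np.bincount(pred[ring == 0]), np.bincount(pred[ring == 1]))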
Week 5:
On Monday, we met and decided that because k-means doesn't work,
my goal should be to do research and find 3 possible models that might work better.
Model 1: Expectation-Maximization (EM)
EM is used to find (local) maximum-likelihood parameters of a statistical model in cases where the equations cannot be solved directly.
It generates new random data using the model and compares it to the actual data.
Requires knowing a form of the model already: polynomial, log, exponential.
Not necessarily clustering, but still useful for looking at the data; the catch is that it requires knowing something about what the data should look like beforehand, which is very hard.
Model 2: Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
A clustering algorithm with two picked parameters:
Radius
Min points
It picks a point at random; if the point has fewer than min-points neighbors within the radius, it is marked as noise and discarded. Otherwise it is a core point (red in the left picture) and begins a cluster, and the same test is applied to every point within the radius until all reachable points are connected. Edge points (yellow in the left picture) are points within the radius of a core point that are not core points themselves; they join the border of the cluster but do not extend it. (A quick scikit-learn sketch appears after the pros and cons below.)
Pros:
Handles any shape and number of groupings
Easy to implement
Cons:
Maybe not the best fit, given the density variations and holes in the data
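A minimal sketch of DBSCAN using scikit-learn (the eps and min_samples values here are illustrative, not tuned for the stellarator data):
import numpy as np
from sklearn.cluster import DBSCAN

np.random.seed(0)
X = np.vstack([np.random.normal(0, 0.3, (50, 2)),   # one dense blob
               np.random.normal(3, 0.3, (50, 2)),   # another dense blob
               np.random.uniform(-2, 5, (5, 2))])   # a few sparse outliers
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(set(labels))  # cluster ids; label -1 marks points discarded as noise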
Lastly,
Model 3: Affinity Propagation
Alternates updating two matrices: responsibility and availability.
(r(i,k) is element i,k of the responsibility matrix, which measures how well-suited point k is to serve as the exemplar, i.e. center, for point i)
(a(i,k) is element i,k of the availability matrix, which measures how appropriate it would be for point i to pick point k as its exemplar)
(s(i,k) is the "distance" between points i and k)
After the updates have converged, the points where r(i,i) + a(i,i) > 0 are the cluster centers (exemplars).
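For reference, the standard update rules (from Frey and Dueck's formulation, which is what the Wikipedia article gives) are:
r(i,k) ← s(i,k) − max over k′ ≠ k of [ a(i,k′) + s(i,k′) ]
a(i,k) ← min( 0, r(k,k) + Σ over i′ ∉ {i,k} of max(0, r(i′,k)) ), for i ≠ k
a(k,k) ← Σ over i′ ≠ k of max(0, r(i′,k))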
Pros:
Any number of groupings, in any position relative to each other
Fairly easy to implement?
Cons:
May not handle the holes in the data as well as DBSCAN (I think)
I will work on implementing model three over the weekend:
Week 6:
Found a library called scikit-learn, which contains example code for affinity propagation.
Combined with the example code, here is what I came up with:
from sklearn.cluster import AffinityPropagation
from sklearn import metrics
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn.datasets.samples_generator import make_blobs
import random as r
import numpy as np
from netCDF4 import Dataset
import time
# #############################################################################
# Generate sample data
def weighted(n):
    x = 0.0
    for i in range(0, n):
        x += r.random()
    return x / n

Y = []
for i in range(0, 100):
    Y += [[weighted(16), weighted(16)]]
for i in range(0, 100):
    Y += [[2 + weighted(6), weighted(6)]]
Y = np.array(Y)
The above imports the necessary libraries and generates random data around (0,0) and (2,0); the weighted function gives a random distribution that is more bell-shaped the higher the input (it averages n uniform draws), and uniform at 1.
Below is the affinity propagation code from scikit-learn:
# Compute Affinity Propagation (DATAUSED is the array being clustered, e.g. Y above)
af = AffinityPropagation(convergence_iter=5000000).fit(DATAUSED)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_
n_clusters_ = len(cluster_centers_indices)
"""
print('Estimated number of clusters: %d' % n_clusters_)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f"
% metrics.adjusted_rand_score(labels_true, labels))
print("Adjusted Mutual Information: %0.3f"
% metrics.adjusted_mutual_info_score(labels_true, labels))
print("Silhouette Coefficient: %0.3f"
% metrics.silhouette_score(X, labels, metric='sqeuclidean'))
"""
# #############################################################################
Below, the final data is plotted with colors according to the cluster each point belongs to; the commented-out code additionally marks the cluster centers and draws lines from each center to all of the points in its cluster.
plt.close('all')
plt.figure(1)
plt.clf()
plt.xlabel(xvar)  # xvar/yvar name the variables being plotted (set where the data is sampled)
plt.ylabel(yvar)
colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    class_members = labels == k
    cluster_center = DATAUSED[cluster_centers_indices[k]]
    plt.plot(DATAUSED[class_members, 0], DATAUSED[class_members, 1], col + '.')
    #plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
    #         markeredgecolor='k', markersize=14)
    #for x in X[class_members]:
    #    plt.plot([cluster_center[0], x[0]], [cluster_center[1], x[1]], col)
plt.title('Points: ' + str(howmany) + '. Estimated number of clusters: %d' % n_clusters_)
plt.savefig('/home/annabelle/School/FusionStuff/' + str((int)(time.time()*100000)) + '.png', bbox_inches='tight')
plt.show()
Below is an example output of this code:
Next, I fed the data from Matt into the clustering algorithm, though as the data set is very large I only took a very small random selection from it.
The code clusters well, especially with small numbers of points; as the number of points increases, it can fail to converge the last cluster, as seen above in the bottom left.
After getting some good examples with the scikit-learn library, I will write my own implementation.
Week 7:
Initially I just translated the Wikipedia definition of the algorithm directly into Python; however, instead of converging, my two matrices were either unchanging or oscillating between two values.
I did some research and read http://academics.smcvt.edu/jtrono/Papers/SMCClustering%20Paper_PatrickRedmond.pdf.
There I found the fix: average each new matrix entry with its old value using a damping factor, which slows convergence but stops the oscillation.
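In symbols, with damping factor λ (I used λ = 1/2), the responsibility update becomes r_new(i,k) = λ·r_old(i,k) + (1 − λ)·[ s(i,k) − max over k′ ≠ k of ( a(i,k′) + s(i,k′) ) ], and the availability update is damped the same way.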
It took a lot of work, and throwing out the code and starting again once, but in the end I got the code below:
import matplotlib.pyplot as plt
from itertools import cycle
import random as r
import numpy as np
from netCDF4 import Dataset
import time,math,sys
dataset = Dataset('../data/quasisymmetry_out.20190105-01-019_nfp5_stellSym.nc')
sz = dataset.dimensions['N_scan'].size
Z = []
howmany = 150
xvar = 'iotas'
yvar = 'max_curvatures'
for i in range(0, howmany):
    rand = (int)(r.random() * sz)
    Z += [[dataset.variables[xvar][rand], dataset.variables[yvar][rand]]]
print("Data Read")
DATAUSED = Z
####################################################
s = [[0] * howmany for i in range(0, howmany)]
r = [[0] * howmany for i in range(0, howmany)]  # note: this rebinds r, which was the random module above
a = [[0] * howmany for i in range(0, howmany)]
The above section reads in the data to be processed, sets up the three matrices, and imports libraries, similar to before.
###########SETTING UP S ################################
def dist(x, y):
    dx = math.pow(x[0] - y[0], 2)
    dy = math.pow(x[1] - y[1], 2)
    return -math.sqrt(dx + dy)

for i in range(0, howmany):
    for k in range(0, howmany):
        s[i][k] = dist(DATAUSED[i], DATAUSED[k])
for k in range(0, howmany):
    s[k][k] = np.median(s)
This pre-calculates the distance between every pair of points, since it is needed many times; instead of recomputing it each time, it is calculated once and stored. The diagonal s(k,k) is set to the median similarity, which acts as each point's "preference" and controls roughly how many clusters form.
def updateR():
    newr = [[0] * howmany for i in range(0, howmany)]
    for i in range(0, howmany):
        for k in range(0, howmany):
            maxof = -sys.maxsize
            for kbar in range(0, howmany):
                if kbar != k:
                    maxof = max(maxof, s[i][kbar] + a[i][kbar])
            newr[i][k] = ((s[i][k] - maxof) + (r[i][k])) / 2
    return newr
This is a direct translation of the definition of the responsibility matrix from Wikipedia, with the damping average added; the paper linked above also contains pseudocode.
def updateA():
    newa = [[0] * howmany for i in range(0, howmany)]
    for i in range(0, howmany):
        for k in range(0, howmany):
            sum = 0
            if i != k:
                for ibar in range(0, howmany):
                    if ibar != i and ibar != k:
                        sum += max(0, r[ibar][k])
                newa[i][k] = ((min(0, r[k][k] + sum)) + a[i][k]) / 2
            else:
                for ibar in range(0, howmany):
                    if ibar != k:
                        sum += max(0, r[ibar][k])
                newa[k][k] = (sum + a[k][k]) / 2
    return newa
Likewise, this is a direct translation of the definition of the availability matrix from Wikipedia, again with damping.
def iterate(n):
    global r, a, s
    # printar(s)  # printar was a small debugging helper for dumping a matrix (definition not shown)
    i = 0
    while i < n:
        if i % 2 == 0:
            r = updateR()
        else:
            a = updateA()
        i += 1
        if i % 10 == 0:
            print(i)

iterate(5000)
This is the main loop of the program. Currently it is set to run a fixed number of times, but the algorithm could also run until the number of clusters does not change over some number of iterations.
def getExemplars():
    e = []
    for k in range(0, howmany):
        if (a[k][k] + r[k][k] > 0):
            e += [k]
    return e

e = getExemplars()
q = []
for i in e:
    q += [DATAUSED[i]]
Exemp = np.array(q)
print("clusters : "+str(Exemp))
def members(exemplars):
    allpointshash = [[] for item in range(0, howmany)]
    for i in range(0, howmany):
        max2 = r[i][0] + a[i][0]
        kmax = 0
        for j in range(0, howmany):
            if r[i][j] + a[i][j] > max2:
                max2 = r[i][j] + a[i][j]
                kmax = j
        allpointshash[kmax] += [i]
    return allpointshash

mem = members(getExemplars())
# printar(mem)  # debugging helper (definition not shown)
print(mem)
Together, these two functions figure out from the two matrices which points are centers, and, for each point that is not a center, which cluster it belongs to.
colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
DATAUSED2=np.array(DATAUSED)
print(DATAUSED2.size)
############### PLOTTING ######################################
plt.close('all')
plt.figure(1)
plt.clf()
clus=0
plt.xlabel(xvar)
plt.ylabel(yvar)
for k, col in zip(range(len(mem)), colors):  # draw the clusters
    if mem[k] != []:
        clus += 1
        plt.plot(DATAUSED[k][0], DATAUSED[k][1], col + ".")  # cluster center
        for p in mem[k]:
            plt.plot([DATAUSED[p][0], DATAUSED[k][0]], [DATAUSED[p][1], DATAUSED[k][1]], col)  # lines between
            plt.plot(DATAUSED[p][0], DATAUSED[p][1], col + ".")  # other points
plt.title('Points: ' + str(howmany) + ". Clusters: " + str(len(Exemp)))
plt.savefig('~/FusionStuff/' + str((int)(time.time() * 100000)) + '.png', bbox_inches='tight')
plt.show()
#############################################################################
Leading to the final part, which, like before, graphs the result using matplotlib.
This code works excellently, producing the following graphs; my version is slower, but it does not suffer from the failure to cluster the final section that the scikit-learn algorithm had.
(The first is from before I added color.)
I did not meet on Monday as Bill was busy, but on Wednesday we talked about goals over break, and the plan was to do several things:
Make the algorithm faster by pre-calculating the max and sum of columns and rows
Figure out how long the algorithm really needs to run until the clustering is "good enough" (see the sketch after this list)
Begin looking at the newer data Matt has, and potentially use some code he has provided me to view 3D models of several designs, to understand the data better
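For the second goal, a rough sketch of the stopping rule (step() is a hypothetical stand-in for one damped updateR()/updateA() pass; getExemplars is the function from my code above):
stable, last = 0, None
while stable < 50:                     # stop once the exemplar set is
    step()                             # unchanged for 50 passes in a row
    current = frozenset(getExemplars())
    stable = stable + 1 if current == last else 0
    last = current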
Since being back from break:
Monday:
Informed Bill of the progress I made over break: improving my algorithm from O(n^3) to O(n^2) per iteration by pre-calculating the max and sum of columns and rows (sketched below).
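A numpy sketch of the idea (my own illustration, not the exact code): for the responsibility update, the inner max over k′ ≠ k only ever needs each row's largest and second-largest value of s + a, so those can be computed once per row instead of once per entry.
import numpy as np

def update_r_fast(s, a, r, lam=0.5):
    # s, a, r are n x n numpy arrays; lam is the damping factor
    m = s + a
    n = len(m)
    rows = np.arange(n)
    top = np.argmax(m, axis=1)        # column index of each row's max
    first = m[rows, top]              # each row's max
    m2 = m.copy()
    m2[rows, top] = -np.inf
    second = m2.max(axis=1)           # each row's second-largest value
    best = np.tile(first[:, None], (1, n))
    best[rows, top] = second          # excludes k itself from the max
    return lam * r + (1 - lam) * (s - best)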
Wednesday:
I was very busy with several tests, so did not meet with Bill
Monday:
Bill was sick, did not meet
Wednesday:
Met for a short time and arranged a longer working meeting for Sunday to improve affinity propagation, in order to hopefully have it deal with millions of data points at once. If we could not accomplish that, then we would switch to a new algorithm. As hw, Bill had me prepare
SUNDAY:
First I explained to Bill, in detail, all of the parts of affinity propagation, through the pseudocode of the algorithm's definition and through the code of my improvements.
What he noticed is that the first matrix, the similarity matrix, looks very much like a negative-definite adjacency matrix, which he had seen used with other things in quantum mechanics. Bill's guess was that the other two matrices are a way of approximating eigenvalues and eigenvectors.
So after some introduction/review of eigenvalues and eigenvectors, Bill set out for me to investigate whether eigenvalues or eigenvectors can be used to find clusters for some very small sets of data. Calculating eigenvalues and eigenvectors can be done in n log n, but it is very difficult to make any manual sense of what to do with those values and vectors on large sets of data.
On Monday, I met again but didn't have much to report; after today's meeting I have written some Mathematica code to generate random points in clusters, calculate the similarity matrix (plus some value along the diagonal), and then calculate the eigenvalues and eigenvectors.
While perhaps not the most efficient code, my code is below:
close = 6
r[] := RandomReal[{-1, 1}]/close (*one uniform offset; RandomReal[{-1,1}] draws a single real in [-1,1]*)
points = {
{1, 1}, {1 + r[], 1 + r[]}, {1 + r[], 1 + r[]},
{-1 + r[], -1 + r[]}, {-1 + r[], -1 + r[]}, {-1, -1},
{-1 + r[], 2 + r[]}, {-1 + r[], 2 + r[]}, {-1 + r[], 2 + r[]},
{2 + r[], -2 + r[]}, {2 + r[], -2 + r[]}, {2 + r[], -2 + r[]}
}
num = Length[points]; (*less than 26*)
(*set up a matrix of letter pairs*)
dists = Outer[List, Take[Alphabet[], num], Take[Alphabet[], num]] // Flatten[#, 1] &;
(*replace the elements of the matrix with the points*)
g[a_, b_] := Replace[a, First[b] -> Last[b], {2}];
dists = Last[FoldList[g, dists, MapThread[List, {Take[Alphabet[], num], points}]]];
(*replace the points with the respective distances*)
f[a_] := -1*EuclideanDistance[First[a], Last[a]];
dists = Map[f, dists, {1}];
dists = ArrayReshape[dists, {Length[points], Length[points]}];
(*add the diagonal in*)
mat = dists + IdentityMatrix[num]*0.1;
MatrixForm[mat];
eval = Eigenvalues[mat] // N
evec = Eigenvectors[mat] // N
(*try to compute the centers by taking the dot of the eigenvectors with the points; works sometimes*)
h[a_, b_] := a + First[b]*Last[b];
center[evector_] := Last[FoldList[h, 0, MapThread[List, {evector, points}]]]
ListPlot[points]
centers = Map[center, evec]
ListPlot[Take[centers*Abs[eval], 4]]
What I have found so far is that there are large eigenvalues equal in number to the number of clusters, and the rest of the eigenvalues are small in comparison. (Bill predicted this would be the case, and I have indeed observed it.)
I have also attempted to see whether I can get clusters out of the eigenvalues, and so far I have not been entirely successful. Plotting centers*|eigenvalues| is interesting, as it seems to give a fairly stable result (one that does not change with the small fluctuations in the randomly generated points).
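To double-check the observation outside of Mathematica, here is a quick numpy version of the same experiment (same construction: negative distance matrix plus 0.1 on the diagonal), printing the spectrum to see how many eigenvalues separate from the bulk:
import numpy as np

np.random.seed(1)
centers = [(1, 1), (-1, -1), (-1, 2), (2, -2)]
pts = np.vstack([c + 0.15 * np.random.randn(3, 2) for c in centers])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
mat = -d + 0.1 * np.eye(len(pts))
print(np.sort(np.linalg.eigvalsh(mat))[::-1])  # sorted eigenvalues, largest first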
Example points:
Centers*|eigenvalues|:
Certainly the points in this diagram for centers*|eigenvalues| are not cluster centers, but the fact that they remain constant over small iterations and different sets of data is very interesting.
Upon further investigation, plotting eval*centers/Map[Norm, centers] gives:
Which really does look like cluster centers, not exactly in magnitude but in direction!
On a different set of data:
eval*centers/Map[Norm, centers]
which looks about right as well, with two obvious clusters and then some other ones in the center
Next Week:
I updated my Mathematica code to plot in color, and also to assign points to clusters based on the centers found.
An example clustering is provided below:
What I have found is:
I don't have control over the number of clusters it splits into
It is not perfect; the vectors I generate do not point exactly where the clusters are
Is taking a random dot product with the points just going to tell you roughly where the points are?
Overall, interesting but not super useful, so what I did next was research how spectral clustering is usually done.
I found the following source: "Segmentation Using Eigenvectors: A Unifying View" by Yair Weiss.
It was very helpful; it told me several things:
Instead of using the biggest eigenvectors, use the second-smallest
Which would be a problem, as approximation methods for eigenvalues and eigenvectors go from biggest to smallest
But it also gives a way to calculate the second-smallest eigenvalue from the first two biggest ones
I have not been able to confirm this fact in Mathematica.
Instead of clustering on the coordinates of the points, cluster on the values in the eigenvector itself (toy example below)
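A toy numpy version of that last idea (my sketch of the general technique, using the graph-Laplacian form of spectral clustering rather than the paper's exact construction): the sign of the second-smallest eigenvector of the Laplacian splits two well-separated blobs.
import numpy as np

np.random.seed(2)
pts = np.vstack([0.2 * np.random.randn(10, 2) + c for c in [(0, 0), (3, 0)]])
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-d2)                    # affinity matrix (Gaussian kernel)
L = np.diag(W.sum(axis=1)) - W     # unnormalized graph Laplacian
vals, vecs = np.linalg.eigh(L)     # eigh sorts eigenvalues in ascending order
fiedler = vecs[:, 1]               # the second-smallest eigenvector
print(fiedler > 0)                 # its sign separates the two blobs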
Week after that:
Matt Landreman sent us an invite to a talk on spectral clustering by David Bindel, in Kirwan Hall 3206, Tuesday, April 30, 2:00 pm, and Wednesday, May 1, 2:00 pm.
However, I had an appointment on Monday, so I only had time to meet with Bill briefly.
We decided to both go to the talk, as it is on spectral clustering, which is what we are focusing on right now.
Also, Bill provided me two papers about the skeletonization of a matrix to read over and understand:
"A Highly Accurate Solver for Stiff Ordinary Differential Equations" by Dan Kushnir and Vladimir Rokhlin
and
"On the Compression of Low Rank Matrices" by H. Cheng, Z. Gimbutas, P. G. Martinsson, and V. Rokhlin
On Wednesday Bill was ill, so we did not meet