Python

https://study.com/academy/lesson/pearson-correlation-coefficient-formula-example-significance.html

https://gramener.com/retail/clothing-sales

https://docs.python.org/3/tutorial/index.html

http://interactivepython.org/runestone/static/pythonds/index.html

http://greenteapress.com/thinkpython2/html/index.html

http://worrydream.com/refs/Crowe-HistoryOfVectorAnalysis.pdf

https://en.wikipedia.org/wiki/Transformation_matrix

https://www.khanacademy.org/math/precalculus/precalc-matrices/row-echelon-and-gaussian-elimination/v/matrices-reduced-row-echelon-form-2

https://gist.github.com/anonymous/7d888663c6ec679ea65428715b99bfdd

https://www.mathbootcamps.com/proof-every-matrix-transformation-is-a-linear-transformation/

https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues

https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection

https://www.slideshare.net/21_venkat/decision-tree-53154033


https://www.thispersondoesnotexist.com/

http://www.whichfaceisreal.com/




Confusion matrix (rows = actual, columns = predicted):

             Predicted No   Predicted Yes
Actual No    TN             FP
Actual Yes   FN             TP

Precision (Positive Predictive Value) = TP / (TP + FP)


Recall, also called Sensitivity or True Positive Rate (TPR), is the fraction of actual 'Yes' cases predicted correctly:

Recall = Sensitivity = TPR = TP / (TP + FN)


Specificity = TN / (TN + FP)


Accuracy = (TN + TP) / (TN + TP + FN + FP)


False Positive Rate (FPR) = FP / (TN + FP) = 1 - Specificity
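A quick check of all five formulas in plain Python, using made-up illustrative counts:

# hypothetical counts from a confusion matrix
TN, FP, FN, TP = 50, 10, 5, 35

precision   = TP / (TP + FP)                   # positive predictive value
recall      = TP / (TP + FN)                   # sensitivity / TPR
specificity = TN / (TN + FP)
accuracy    = (TN + TP) / (TN + TP + FN + FP)
fpr         = FP / (TN + FP)                   # equals 1 - specificity

print(precision, recall, specificity, accuracy, fpr)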


Weight of Evidence: WOE = ln(% of good / % of bad), where % of good = good in the bucket / total good, and % of bad = bad in the bucket / total bad.

Information Value: IV = sum over buckets of (good in the bucket / total good - bad in the bucket / total bad) * WOE
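A minimal pandas sketch of this computation, assuming a hypothetical frame with one row per observation, a 'bucket' column, and a binary 'good' flag:

import numpy as np
import pandas as pd

# hypothetical data: each bucket must contain both goods and bads
data = pd.DataFrame({'bucket': ['A', 'A', 'A', 'B', 'B', 'B'],
                     'good':   [1,   1,   0,   0,   0,   1]})

grp = data.groupby('bucket')['good'].agg(good='sum', total='count')
grp['bad'] = grp['total'] - grp['good']

pct_good = grp['good'] / grp['good'].sum()   # good in the bucket / total good
pct_bad  = grp['bad'] / grp['bad'].sum()     # bad in the bucket / total bad

grp['woe'] = np.log(pct_good / pct_bad)      # WOE per bucket
iv = ((pct_good - pct_bad) * grp['woe']).sum()   # IV sums the per-bucket terms
print(grp)
print(iv)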


# multiples of 6 for the even numbers in 1..10
list1 = [n*6 for n in range(1,11) if n%2==0]   # [12, 24, 36, 48, 60]

# map/filter version: square only the integers in a_list
list(map(lambda e: e**2, filter(lambda e: isinstance(e, int), a_list)))

# equivalent list comprehension
squared_ints = [e**2 for e in a_list if isinstance(e, int)]
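For example, with a made-up mixed list, both forms give the same result:

# hypothetical mixed list
a_list = [1, 'two', 3, 4.0, 5]

print(list(map(lambda e: e**2, filter(lambda e: isinstance(e, int), a_list))))   # [1, 9, 25]
print([e**2 for e in a_list if isinstance(e, int)])                              # [1, 9, 25]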

# total sales per region
master_df.groupby('Region').Sales.sum()

# round profit to 1 decimal place
master_df['Profit'] = master_df['Profit'].apply(lambda x: round(x, 1))

# number of profitable orders by region and customer segment
master_df.pivot_table(values='is_profitable', index='Region', columns='Customer_Segment', aggfunc='sum')

# inner join on the common key 'Ship_id'
df_3 = pd.merge(df_2, shipping_df, how='inner', on='Ship_id')

# stack two dataframes vertically
pd.concat([df1, df2], axis=0)

# read a tab-separated file with a non-default encoding
companies = pd.read_csv("companies.txt", sep="\t", encoding="ISO-8859-1")
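A self-contained toy run of merge and concat, with hypothetical frames standing in for the superstore data:

import pandas as pd

orders = pd.DataFrame({'Ship_id': [1, 2], 'Sales': [100, 250]})
shipping_df = pd.DataFrame({'Ship_id': [1, 2], 'Ship_Mode': ['Air', 'Road']})

# inner join keeps only Ship_id values present in both frames
merged = pd.merge(orders, shipping_df, how='inner', on='Ship_id')

# concat along axis 0 stacks frames with the same columns
stacked = pd.concat([orders, orders], axis=0)

print(merged)
print(stacked)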

import pymysql

# create a connection object 'conn'
conn = pymysql.connect(host="localhost",       # your host, localhost for your local machine
                       user="root",            # your username, usually "root" for localhost
                       passwd="yourpassword",  # your password
                       db="world")             # name of the database; world comes built in with MySQL

# create a cursor object c
c = conn.cursor()

# execute a query using c.execute
c.execute("select * from city;")

# fetch all rows of the result as a list of tuples
all_rows = c.fetchall()
# to get only the first row, use c.fetchone() instead

# close the connection when done
conn.close()

import requests, bs4

# getting HTML from the Google Play web page
url = "https://play.google.com/store/apps/details?id=com.facebook.orca&hl=en"
req = requests.get(url)

# create a bs4 object
# to avoid warnings, provide "html5lib" explicitly
soup = bs4.BeautifulSoup(req.text, "html5lib")
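A small follow-up sketch of what the soup object supports (assuming the fetched page has a <title> tag and anchor links):

# page title and all hyperlinks on the page
print(soup.title.text)
links = [a.get('href') for a in soup.find_all('a')]
print(len(links))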

import PyPDF2

# reading the pdf file
pdf_object = open('animal_farm.pdf', 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_object)

# number of pages in the PDF file
print(pdf_reader.numPages)

# get a certain page (pages are zero-indexed)
page_object = pdf_reader.getPage(5)

# extract text from the page_object
print(page_object.extractText())

# count missing values per column
df.isnull().sum()

# does each column contain at least one missing value? (axis=0 is the default)
df.isnull().any()
df.isnull().any(axis=0)

# does each row contain at least one missing value?
df.isnull().any(axis=1)

# rows where every value is missing, and how many such rows there are
df.isnull().all(axis=1)
df.isnull().all(axis=1).sum()

# count missing values per row
df.isnull().sum(axis=1)

# percentage of missing values per column, rounded to 2 decimals
round(100*(df.isnull().sum()/len(df.index)), 2)

# rows with more than 5 missing values, and how many there are
df[df.isnull().sum(axis=1) > 5]
len(df[df.isnull().sum(axis=1) > 5].index)

# keep only rows with at most 5 missing values
df = df[df.isnull().sum(axis=1) <= 5]

# drop rows where 'Price' is missing
df = df[~np.isnan(df['Price'])]

# impute missing 'Lattitude' values with the column mean
# (the column name is spelled 'Lattitude' in the dataset)
df.loc[np.isnan(df['Lattitude']), ['Lattitude']] = df['Lattitude'].mean()

# convert 'Car' to a categorical type and inspect its frequencies
df['Car'] = df['Car'].astype('category')
df['Car'].value_counts()

import matplotlib.pyplot as plt
import seaborn as sns

# mean sales by year (rows) and month (columns)
year_month = pd.pivot_table(df, values='Sales', index='year', columns='month', aggfunc='mean')

# plot the pivot table as a heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(year_month, cmap="YlGnBu")
plt.show()

For a normally distributed variable (the 68-95-99.7 rule):

~68% probability of the variable lying within 1 standard deviation of the mean

~95% probability of the variable lying within 2 standard deviations of the mean

~99.7% probability of the variable lying within 3 standard deviations of the mean

P(-0.5 < Z < 0.5) = P(Z < 0.5) - P(Z < -0.5)

P(Z <= 2.33) + P(Z > 2.33) = 1, so P(Z > 2.33) = 1 - P(Z < 2.33)

P(Z>-3) = 1 - P(Z<-3)

P(-2.2 < Z < 1.8) = P(Z < 1.8) - P(Z < -2.2)
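These identities (and the 68-95-99.7 rule above) can be checked numerically with scipy's standard normal CDF:

from scipy.stats import norm

print(norm.cdf(0.5) - norm.cdf(-0.5))   # P(-0.5 < Z < 0.5)  ~ 0.3829
print(1 - norm.cdf(2.33))               # P(Z > 2.33)        ~ 0.0099
print(norm.cdf(1.8) - norm.cdf(-2.2))   # P(-2.2 < Z < 1.8)  ~ 0.9502
print(norm.cdf(1) - norm.cdf(-1))       # ~ 0.6827, the 68% rule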


Deep Learning

https://learning.oreilly.com/library/view/deep-learning-illustrated/9780135116821/

https://resources.oreilly.com/live-training/deep-learning

https://github.com/the-deep-learners/TensorFlow-LiveLessons

https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html