Pandas
What is pandas?
pandas allows you to work with large, even very large, datasets
Intro video I watched: https://www.youtube.com/watch?v=vmEHCJofslg
to install, type in
pip install pandas
working with pandas in a Jupyter notebook
import pandas as pd
dataframe = pd.read_csv('name of data') #a bare file name works if the file is in the same directory as the notebook
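A small sketch for trying read_csv without a real file. The CSV text, its column names and its values are made up for illustration; io.StringIO just stands in for a file on disk.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a real file (assumption, not real data).
csv_text = "Name,HP,Speed\nA,45,45\nB,39,65\n"

# read_csv accepts a file path or a file-like object such as StringIO.
dataframe = pd.read_csv(io.StringIO(csv_text))

print(dataframe.shape)   # (2, 3): two rows, three columns
print(dataframe.head())  # first rows of the table
```

With a real file you would pass the path instead, e.g. the file name alone if it sits next to the notebook.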
(notes from https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python)
In general, you could say that the Pandas DataFrame consists of three main components: the data, the index, and the columns.
Firstly, the DataFrame can contain data that is:
a Pandas DataFrame
a Pandas Series: a one-dimensional labeled array capable of holding any data type with axis labels or index. An example of a Series object is one column from a DataFrame.
a NumPy ndarray, which can be a structured or record array
a two-dimensional ndarray
dictionaries of one-dimensional ndarrays, lists, dictionaries or Series.
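To make the list above concrete, here is a rough sketch that builds the same small table from a few of those input types. The column names "a" and "b" and the numbers are made up.

```python
import numpy as np
import pandas as pd

# From a two-dimensional ndarray (column names supplied by hand).
from_2d = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# From a dictionary of lists.
from_dict = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

# From a dictionary of Series.
from_series = pd.DataFrame({"a": pd.Series([1, 3]), "b": pd.Series([2, 4])})

# One column of a DataFrame is itself a Series.
col = from_dict["a"]
print(type(col))
```

All three DataFrames hold the same values; only the way the data was handed to pd.DataFrame differs.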
Note the difference between np.ndarray and np.array(). The former is an actual data type, while the latter is a function to make arrays from other data structures.
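A quick check of that distinction, as a minimal sketch (the list [1, 2, 3] is just an example input):

```python
import numpy as np

# np.array() is the function: it builds an array from a list.
arr = np.array([1, 2, 3])

# np.ndarray is the type of the object that comes back.
print(type(arr))                    # <class 'numpy.ndarray'>
print(isinstance(arr, np.ndarray))  # True
```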
https://www.youtube.com/watch?v=OYZNk7Z9s6I
In pandas, the columns and the index (aka row labels) are the names along the two axes of the data frame: the index labels the rows and the columns label the columns.
The index and columns are always there in pandas; they are not optional.
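A small sketch showing that labels exist even when none are given. The values and the labels "r1", "r2", "x", "y" are made up.

```python
import pandas as pd

# No labels given: pandas fills in default integer labels for both axes.
df_default = pd.DataFrame([[1, 2], [3, 4]])
print(df_default.index)    # default row labels 0, 1
print(df_default.columns)  # default column labels 0, 1

# Labels given explicitly.
labeled = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["x", "y"])
print(labeled)
```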
difference between loc, iloc, ix (note: ix was removed in pandas 1.0, so use loc or iloc)
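A minimal sketch of loc versus iloc: loc selects by label, iloc by integer position. The frame below, with row labels "x", "y", "z", is made up so that labels and positions are clearly different.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Name": ["a", "b", "c"], "HP": [10, 20, 30]},
    index=["x", "y", "z"],
)

by_label = df_demo.loc["y", "HP"]   # row labelled "y", column labelled "HP"
by_position = df_demo.iloc[1, 1]    # second row, second column

# Slices behave differently: loc includes the end label, iloc excludes the end position.
label_slice = df_demo.loc["x":"y"]  # rows "x" and "y"
pos_slice = df_demo.iloc[0:1]       # only the first row

print(by_label, by_position)              # 20 20
print(len(label_slice), len(pos_slice))   # 2 1
```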
print(df.columns)                               #reading headers
print(df['Name'])                               #read each column
print(df['Name'][0:5])                          #can specify if only want 0-5 etc
print(df.Name)                                  #can also do like df.Name
print(df[['Name', 'Speed', 'HP']])              #looking at specific things
print(df.iloc[1])                               #iloc is integer location
print(df.iloc[1:4])                             #can do multiple rows
print(df.iloc[2,1])                             #read specific location (row, column)
index_list = df.index.tolist()                  #converting a df's index to list
column_name_list = df["column_name"].tolist()   #converting a column to list
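To try these selections without the original dataset, here is a sketch on a tiny made-up frame. The column names follow the ones used in these notes; the values are invented.

```python
import pandas as pd

df_small = pd.DataFrame(
    {"Name": ["a", "b", "c"], "Speed": [1, 2, 3], "HP": [10, 20, 30]}
)

one_col = df_small["Name"]             # single brackets give a Series
two_cols = df_small[["Name", "HP"]]    # a list of names gives a DataFrame
cell = df_small.iloc[2, 1]             # row 2, column 1: Speed of the third row

names = df_small["Name"].tolist()      # column as a plain Python list
index_list = df_small.index.tolist()   # index as a plain Python list

print(type(one_col).__name__, type(two_cols).__name__)  # Series DataFrame
print(cell, names, index_list)
```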
#sometimes we want to go through each row of the data
for index, row in df.iterrows(): #iterate through rows
    print(index, row)
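A sketch of what iterrows hands back on each pass: the row's index label and the row itself as a Series, so single values can be read by column name. The two rows reuse names and ages from the friends example in these notes; df_rows and seen are names made up for this sketch.

```python
import pandas as pd

df_rows = pd.DataFrame({"name": ["Jone", "amy"], "age": [20, 10]})

seen = []
for index, row in df_rows.iterrows():
    # row is a Series; values are looked up by column name.
    seen.append((index, row["name"], row["age"]))

print(seen)
```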
friends = [{'name' : 'Jone', 'age' : 20, 'job': 'student'},
{'name' : 'amy', 'age' : 10, 'job': 'none'},
{'name' : 'Nate', 'age' : 30, 'job': 'teacher'}]
df = pd.DataFrame(friends)
df = df[['name', 'age', 'job']]
df.head()
#format is a list of dicts, one dict per row: [{column: value, ...}, {column: value, ...}, ...]
for a detailed example see: https://www.youtube.com/watch?v=hmYdzvmcTD8
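A possible variation on the friends example, as a sketch: instead of reordering the columns afterwards, the column order can be passed to pd.DataFrame directly through its columns argument. The name df_friends is made up here.

```python
import pandas as pd

friends = [
    {"name": "Jone", "age": 20, "job": "student"},
    {"name": "amy", "age": 10, "job": "none"},
    {"name": "Nate", "age": 30, "job": "teacher"},
]

# Each dict becomes one row; columns= fixes the column order up front.
df_friends = pd.DataFrame(friends, columns=["name", "age", "job"])

print(df_friends.head())
```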