Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures.
Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data.
Time Series functionality.
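A few of the features listed above can be sketched in a handful of lines. This is a minimal illustration, not a complete tour; the `sales` and `regions` frames and their values are hypothetical and invented only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
    "units": [10, np.nan, 5, 7],
})
regions = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Pune"],
    "region": ["North", "West", "West"],
})

# Integrated handling of missing data: replace NaN with 0.
sales["units"] = sales["units"].fillna(0)

# Group by a column and aggregate.
units_per_city = sales.groupby("city")["units"].sum()

# Merge (join) two frames on a shared key column.
merged = sales.merge(regions, on="city", how="left")

# Label-based slicing on the grouped result (both endpoints are included).
subset = units_per_city.loc["Delhi":"Mumbai"]

print(units_per_city)
print(merged)
print(subset)
```

Note that label-based slices with `.loc` include the end label, unlike ordinary Python slices.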
Pandas deals with the following three data structures:
Series - 1D labeled homogeneous array, size immutable.
DataFrame - General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel - General 3D labeled, size-mutable array (deprecated in pandas 0.20 and removed in 0.25).
A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). It can be created using the following constructor:
pandas.Series(data, index, dtype, copy)
data : The data, which can take various forms such as an ndarray, a list, or a constant.
index : Index values must be hashable and the same length as data. Non-unique values are allowed. Default np.arange(n) if no index is passed.
dtype : The data type. If None, the data type is inferred.
copy : Whether to copy the input data. Default False.
import numpy as np
import pandas as pd
s1 = pd.Series(np.random.randint(low=0, high=100, size=3))  # NumPy generates three random integers in [0, 100)
print(s1)
0    92
1    18
2     6
dtype: int64
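The `index` and `dtype` parameters described above can be sketched as follows. This is a small illustration under assumed inputs; the labels and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Explicit labels instead of the default 0..n-1 index, with a forced dtype.
s2 = pd.Series([10, 20, 30], index=["a", "b", "c"], dtype="float64")

# Label-based and position-based access.
first_by_label = s2["a"]
last_by_position = s2.iloc[-1]

# A scalar (constant) is repeated for each index label.
s3 = pd.Series(5, index=["x", "y", "z"])

# A dict supplies both the labels (keys) and the data (values).
s4 = pd.Series({"low": 1, "high": 9})

print(s2)
print(s3)
print(s4)
```

When a constant is passed as `data`, an index is needed so that pandas knows how many times to repeat the value.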
A DataFrame can be created using the following constructor:
pandas.DataFrame( data, index, columns, dtype, copy)
A DataFrame can be created from:
Lists
dict
Series
NumPy ndarrays
Another DataFrame
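Each of the input types listed above can be passed to the constructor directly. The sketch below uses hypothetical names and values chosen only for illustration.

```python
import numpy as np
import pandas as pd

# From a list of lists (column names are supplied separately).
df_from_lists = pd.DataFrame([["Asha", 31], ["Ravi", 25]], columns=["Name", "Age"])

# From a dict of equal-length lists (keys become column names).
df_from_dict = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [31, 25]})

# From a dict of Series (rows are aligned on the Series index).
df_from_series = pd.DataFrame({
    "Age": pd.Series([31, 25], index=["Asha", "Ravi"]),
    "Score": pd.Series([88.5], index=["Asha"]),  # "Ravi" has no score -> NaN
})

# From a 2D NumPy ndarray.
df_from_ndarray = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# From another DataFrame (copy=True makes the new frame independent).
df_from_df = pd.DataFrame(df_from_dict, copy=True)

print(df_from_series)
```

In the dict-of-Series case, labels missing from one Series are filled with NaN, which is the data alignment behavior mentioned among the features.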
The parameters of the constructor are as follows:
data : The data, which can take various forms such as an ndarray, Series, map, lists, dict, constants, or another DataFrame.
index : Row labels for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
columns : Column labels for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed.
dtype : Data type to force on the columns. Only a single dtype is allowed; if None, the type is inferred.
copy : Whether to copy the input data. Default False.
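The effect of the `index`, `columns`, and `dtype` parameters can be sketched as follows. The array contents and the labels `r1`..`r3`, `x`, and `y` are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2], [3, 4], [5, 6]])

# No labels passed: rows and columns both default to 0..n-1.
df_default = pd.DataFrame(data)

# Explicit row labels, column labels, and a forced dtype.
df_labeled = pd.DataFrame(
    data,
    index=["r1", "r2", "r3"],
    columns=["x", "y"],
    dtype="float64",
)

print(df_default)
print(df_labeled)
print(df_labeled.loc["r2", "y"])  # look up a cell by row and column label
```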
df = pd.DataFrame({"Name": ["Dharan", "Vinay", "Kumar", "ram", "raj"],
                   "Age": list(np.random.randint(low=10, high=60, size=5))})