Tableau Prep works well for the ETL (extract, transform, load) process because it is easy to visualize, understand, and get started with.
But I will use Python here, since it is now the language I use most often.
Have fun torturing the data.
Some tabular data simply has no value (NaN in Python)
Some data marks missing values with other expressions, such as "none", "don't have value", 9999, etc.
The data could not be collected
There were problems when fetching the data
The values are just randomly missing
Typically, we just want to get rid of the missing data before the analysis step. Dropping the data is usually the most convenient way, but it loses useful information.
There are other methods for dealing with missing data; they are introduced here by data type.
Sometimes missing data does not show up as NaN, so you should be really familiar with the dataset.
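As a minimal sketch of the point above, the snippet below converts dataset-specific placeholders into real NaN values so that pandas can count them. The toy DataFrame, the column names, and the choice of "none" and 9999 as placeholders are assumptions made up for illustration; a real dataset needs its own inspection.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "none" and 9999 stand in for missing values.
df = pd.DataFrame({
    "city": ["Taipei", "none", "Tokyo", None],
    "temp": [25.0, 9999, np.nan, 18.5],
})

# Map the placeholders to NaN, column by column, so pandas recognizes them.
clean = df.replace({"city": {"none": np.nan}, "temp": {9999: np.nan}})

# Count the missing values per column after the cleanup.
missing_counts = clean.isna().sum()
print(missing_counts)
```

Replacing per column keeps a legitimate 9999 in some other column from being wiped out by accident.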
Missing categorical variables
Drop the rows if the share of missing values is small
Fill NA with the mode (this also works for numeric variables, but it helps more with categorical data)
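The two options for categorical data could look roughly like this. The "color" column and its values are invented for the example, and what counts as a "small" share of missing values is a judgment call that the code does not decide for you.

```python
import pandas as pd

# Hypothetical categorical column with two missing entries.
df = pd.DataFrame({"color": ["red", "blue", None, "red", None, "green"]})

# Check how much is missing before choosing a strategy.
missing_share = df["color"].isna().mean()

# Option 1: drop the rows where the value is missing.
dropped = df.dropna(subset=["color"])

# Option 2: fill with the most frequent value (the mode).
mode_value = df["color"].mode(dropna=True).iloc[0]
filled = df["color"].fillna(mode_value)

print(f"missing share: {missing_share:.2f}, mode: {mode_value}")
```

Note that `mode()` can return several values when there is a tie; taking `.iloc[0]` picks the first one, which is an arbitrary choice.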
Missing numeric variables
Drop the rows if the share of missing values is small
Fill NA with the mean or median, depending on the distribution
Fill NA using regression or other methods
If it is time series data, the previous value, the last value, or a moving average could be useful, but I will not include this part in the code example.
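Here is a rough sketch of the numeric options, leaving the time series case out. The columns "income", "size", and "price" are hypothetical, and the regression step uses a simple straight-line fit from `numpy.polyfit` as a stand-in for whatever model actually suits the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "income" is skewed by one large value,
# and "price" depends roughly linearly on "size".
df = pd.DataFrame({
    "income": [30.0, 32.0, np.nan, 35.0, 200.0],
    "size": [50.0, 60.0, 70.0, 80.0, 90.0],
    "price": [100.0, 120.0, np.nan, 160.0, 180.0],
})

# Mean vs. median: with a skewed distribution the median is less
# affected by the outlier, so it is often the safer fill value.
income_mean = df["income"].fillna(df["income"].mean())
income_median = df["income"].fillna(df["income"].median())

# Regression: fit price ~ size on the rows where price is known,
# then use the fitted line only where price is missing.
known = df.dropna(subset=["price"])
slope, intercept = np.polyfit(known["size"], known["price"], deg=1)
predicted = slope * df["size"] + intercept
price_filled = df["price"].fillna(predicted)

print(income_mean.iloc[2], income_median.iloc[2], price_filled.iloc[2])
```

In the toy data the outlier pulls the mean far above the median, which is why the distribution should be checked before picking one.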
string/category
int
float
boolean
Time Stamp
Some ranking data is recognized as numeric data
Some numeric data has type issues and is not recognized as numeric
Identify the categorical variables
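The type problems above might be handled along these lines. All column names and values are made up, and the assumption that "amount" is broken by thousands separators and an "n/a" string is only one example of a type issue.

```python
import pandas as pd

# Hypothetical data with typical type problems.
df = pd.DataFrame({
    "rank": [1, 2, 3],                   # a ranking, read in as integers
    "amount": ["1,200", "350", "n/a"],   # numbers stored as text
    "grade": ["A", "B", "A"],            # plain strings
    "joined": ["2021-01-05", "2021-02-10", "2021-03-15"],
})

# Treat the ranking as an ordered category instead of a number.
df["rank"] = pd.Categorical(df["rank"], categories=[1, 2, 3], ordered=True)

# Strip the thousands separator, then convert; unparseable text becomes NaN.
df["amount"] = pd.to_numeric(
    df["amount"].str.replace(",", "", regex=False), errors="coerce"
)

# Convert the remaining columns to suitable types.
df["grade"] = df["grade"].astype("category")
df["joined"] = pd.to_datetime(df["joined"])

# List the categorical variables.
categorical_cols = df.select_dtypes(include="category").columns.tolist()
print(df.dtypes)
print(categorical_cols)
```

Using `errors="coerce"` silently turns bad values into NaN, so it is worth counting the missing values again afterwards.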
If you are familiar with SQL, you know that there are four kinds of join; they work the same way in the pandas merge.
The picture on the right shows the relationship between the two datasets when you join them.
The join type usually depends on your goal.
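To make the four join types concrete, this small sketch merges two invented tables on a shared "id" column with each value of the `how` parameter and compares the row counts. The tables and their contents are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical tables that share some, but not all, ids.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cid"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

# Run the same merge with each of the four join types.
joined = {
    how: left.merge(right, on="id", how=how)
    for how in ["inner", "left", "right", "outer"]
}

# Compare how many rows each join keeps.
row_counts = {how: len(frame) for how, frame in joined.items()}
print(row_counts)
```

The inner join keeps only the ids found in both tables, while the outer join keeps every id and fills the gaps with NaN.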
pandas
merge
Pivot and pivot table
melt
stack and unstack
Groupby
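The reshaping tools listed above can be tried on one small table. This is only a sketch: the "store", "month", and "sales" columns are hypothetical, and each call shows the simplest use of the function rather than all of its options.

```python
import pandas as pd

# Hypothetical sales data in long format.
long = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "sales": [10, 20, 30, 40],
})

# pivot: long to wide, one row per store and one column per month.
wide = long.pivot(index="store", columns="month", values="sales")

# pivot_table: like pivot, but aggregates when there are duplicates.
table = long.pivot_table(index="store", values="sales", aggfunc="sum")

# melt: wide back to long.
back_to_long = wide.reset_index().melt(
    id_vars="store", var_name="month", value_name="sales"
)

# stack moves the columns into the row index; unstack reverses it.
stacked = wide.stack()
unstacked = stacked.unstack()

# groupby: aggregate by group.
totals = long.groupby("store")["sales"].sum()

print(wide)
print(totals)
```

`pivot` raises an error if an index and column pair appears more than once, which is when `pivot_table` with an aggregation function is the better fit.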