working-with-kaggle-data
Example
https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques
Download Dataset
Upload it to the Files section in Colab
Code Reference
https://www.kaggle.com/code/gusthema/house-prices-prediction-using-tfdf
Add this at the top of the notebook
!pip install tensorflow_decision_forests
The path should point to the file you uploaded
train_file_path = "./train.csv"
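Before training, it can help to confirm that the path really points at the uploaded file. A minimal sketch using only the standard library; `peek_csv` is a hypothetical helper, not part of the referenced notebook:

```python
import csv
from pathlib import Path


def peek_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) of a CSV file."""
    p = Path(path)
    if not p.is_file():
        # Uploads from the Colab Files panel typically land in the working directory.
        raise FileNotFoundError(f"{p} not found; check the name shown in the Files panel")
    with p.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


train_file_path = "./train.csv"  # same path as in the note above
if Path(train_file_path).is_file():
    header, rows = peek_csv(train_file_path)
    print(f"{len(header)} columns, starting with {header[:5]}")
```

If the file is missing, the error message says so directly instead of failing later inside the training code.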
Note
"By default the Random Forest Model is configured to train classification tasks. Since this is a regression problem, we will specify the type of the task (tfdf.keras.Task.REGRESSION) as a parameter here. "
Extra Notes
Configure the model
https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/RandomForestModel
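Configuration happens through the constructor arguments of `RandomForestModel`. A minimal sketch that sets the regression task from the note above together with two common hyperparameters. It assumes `tensorflow_decision_forests` is already installed and that the label column is `SalePrice`, as in the House Prices dataset; `train_regression_forest` is a hypothetical helper name:

```python
def train_regression_forest(df, label="SalePrice", num_trees=300, max_depth=16):
    """Train a TF-DF Random Forest for regression on a pandas DataFrame."""
    # Imported inside the function so it can be defined before the pip cell runs.
    import tensorflow_decision_forests as tfdf

    task = tfdf.keras.Task.REGRESSION
    # The dataset conversion needs the task too, so the label is kept numeric.
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label=label, task=task)
    model = tfdf.keras.RandomForestModel(
        task=task,
        num_trees=num_trees,  # 300 is the library default
        max_depth=max_depth,  # 16 is the library default
    )
    model.fit(train_ds)
    return model
```

In a notebook this could be called as `model = train_regression_forest(pd.read_csv(train_file_path))`, after which `model.make_inspector().evaluation()` reports the model's self-evaluation.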
OOB (out-of-bag) evaluation
https://developers.google.com/machine-learning/decision-forests/out-of-bag
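The idea behind out-of-bag evaluation can be illustrated without any ML library: each tree is trained on a bootstrap sample (drawn with replacement), so some examples are never seen by that tree and can serve as its validation set. A toy sketch; the function name and the sizes are made up for illustration:

```python
import random


def bootstrap_split(n_examples, rng):
    """Draw a bootstrap sample of indices; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_examples) for _ in range(n_examples)]
    out_of_bag = sorted(set(range(n_examples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000
in_bag, oob = bootstrap_split(n, rng)
# For large n, roughly 1/e (about 37%) of the examples end up out of bag.
print(f"out-of-bag fraction: {len(oob) / n:.2f}")
```

In a random forest, each example is then scored only by the trees for which it was out of bag, which gives an evaluation without a separate validation split.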
Discover
Open the spreadsheet
https://docs.google.com/spreadsheets/d/1h9XHoZLzsjKQfjVhG4ayYWk7-EJOXOS1Kz9V2zRBNVM/edit?usp=sharing
The dataset on Kaggle
https://www.kaggle.com/code/pratik1120/penguin-dataset-eda-classification-and-clustering
Recall
Which features are most useful for prediction?
Look at the regression model for year. Is it a good prediction?
(Make year a classification target and compare.)
Note that Conf. is calculated as described in
https://ydf.readthedocs.io/en/latest/tutorial/classification/
Can we turn the numeric regression model into a classification model?
(See the mass example.)
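One way to approach the year and mass questions is to change the label rather than the model: a numeric column becomes a classification target once its values are mapped to discrete classes. A minimal sketch; the thresholds, class names, and sample values below are made-up illustrations, not values taken from the spreadsheet:

```python
from bisect import bisect_right


def to_class_labels(values):
    """Treat each distinct number as its own class, e.g. 2007 -> "2007"."""
    return [str(v) for v in values]


def bin_values(values, edges, names):
    """Map each number to the name of the interval it falls in.

    edges must be sorted, and names needs len(edges) + 1 entries.
    """
    if len(names) != len(edges) + 1:
        raise ValueError("need exactly one more name than edges")
    return [names[bisect_right(edges, v)] for v in values]


years = [2007, 2008, 2009]
print(to_class_labels(years))

masses_g = [3250, 3750, 4675]  # illustrative body masses in grams
print(bin_values(masses_g, edges=[3500, 4500], names=["light", "medium", "heavy"]))
```

With string labels, a learner treats the column as categorical, so it predicts one of the actual classes rather than a fractional value between them.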
Review Problem framing
https://developers.google.com/machine-learning/problem-framing/ml-framing