Post date: Dec 30, 2014 8:38:35 AM
Pandas is a great tool for data analysis: it represents data as a data frame and provides tons of awesome tools for working with it. It feels like you are using R in Python, except it's even better!
Scikit-learn is arguably the best machine learning package for Python, not only because it is very well structured and designed, but also because of its large spectrum of models and its great support community. The problem is that scikit-learn is built on NumPy, so its models are written against NumPy arrays and do not work seamlessly with pandas data frames.
There are at least 2 ways to bridge the gap:
I implemented a short demo of both approaches in an IPython notebook here. The demo uses a decision tree classifier as an example.
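For readers without the notebook at hand, here is a minimal sketch of one common way to bridge the gap: pull the feature and label columns out of the data frame as NumPy arrays and hand those to the classifier. This is an assumption about what such a bridge can look like, not necessarily either of the approaches in the notebook. The toy data and the column names (height, weight, label) are made up for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; the column names are invented for this sketch.
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190, 200],
    "weight": [50, 55, 65, 80, 90, 100],
    "label": [0, 0, 0, 1, 1, 1],
})

feature_cols = ["height", "weight"]

# Convert the relevant columns to plain NumPy arrays.
# (Older pandas versions expose the same data through the .values attribute.)
X = df[feature_cols].to_numpy()
y = df["label"].to_numpy()

# Fit a decision tree on the arrays, as in any scikit-learn workflow.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Write the predictions back into the data frame so the rest of the
# analysis can continue in pandas.
df["predicted"] = clf.predict(X)
```

Keeping the list of feature columns in one place makes it easy to map results, such as clf.feature_importances_, back to column names afterwards.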
Useful resources