Tools for Data Scientists

1) Python

2) R

3) Shell scripts

4) SQL (sqldf package in R, or Sequel Pro)

5) Mirador for quick data visualization and statistics (also Caravel from Airbnb, Trifacta Data Wrangler, Plotly, Lyra, Shiny for R, Bokeh for Python)

Also TensorFlow in the browser

6) Weka for quick machine learning, and Orange for quick machine learning with visual programming (also the rattle package in R)

7) Machine learning

8) High performance computing and distributed computing (Spark, Hadoop, Scala)

9) Slack (for team communication)

10) GitHub or Bitbucket (for version control and issue tracking)

11) NoSQL, Scala/Spark and Hadoop

12) Trello for time management

13) Data sharing (AWS or Dropbox)

14) Reproducible data science using Jupyter notebooks

15) Developer documentation using Doxygen

16) Docker for containerized data science

17) Virtual machines (VirtualBox), running Ubuntu Linux in the VM

Download the Ubuntu ISO file from the Ubuntu website

18) Configuration as code (Airflow)

19) Automated machine learning (TPOT, auto-sklearn)

20) Reproducible research using knitr and rmarkdown
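
As a small illustration of item 4 (SQL over tabular data), here is a minimal sketch in Python, the first tool on the list. It assumes Python's built-in sqlite3 module as a stand-in for sqldf or Sequel Pro; the staff table, its columns, the sample rows, and the function name are all made up for the example.

```python
import sqlite3

# Hypothetical sample data: (name, department, salary) rows.
rows = [
    ("Ada", "research", 120),
    ("Grace", "engineering", 110),
    ("Alan", "research", 100),
]


def mean_salary_by_department(records):
    """Load records into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE staff (name TEXT, department TEXT, salary REAL)"
        )
        # Parameterized inserts keep the data separate from the SQL text.
        conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", records)
        cursor = conn.execute(
            "SELECT department, AVG(salary) FROM staff "
            "GROUP BY department ORDER BY department"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for department, mean_salary in mean_salary_by_department(rows):
        print(department, mean_salary)
```

The same GROUP BY query should carry over to sqldf in R with little change, since sqldf uses SQLite as its default backend.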