Data Science
Distributed Computing Framework
- Apache Spark : a fast and general engine for large-scale data processing
- Apache Hadoop : a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
- Apache HBase : an open-source, distributed, versioned, non-relational database
- Apache Hive : data warehouse software that facilitates querying and managing large datasets residing in distributed storage
- Apache Phoenix : a high-performance relational database layer over HBase for low-latency applications
- Apache Pig : a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs
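The "simple programming models" mentioned for Hadoop refer chiefly to MapReduce: a map step turns each input record into key/value pairs, the framework groups those pairs by key (the shuffle), and a reduce step combines each group. The sketch below simulates that flow for a word count in a single Python process. It does not use Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and only illustrate the model.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated word in one record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group intermediate values by key, the step a real framework
    # performs across the cluster between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases; sorting keys makes the output deterministic.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

Run on the three sample lines, this prints `{'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}`. In a real cluster the map and reduce calls would run in parallel on different machines; only the per-record and per-key functions are written by the user.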
General-Purpose Library
- pandas : an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language
Data Mining
- Weka : a collection of machine learning algorithms for data mining tasks
- RapidMiner
- KNIME
- ELKI : Environment for Developing KDD-Applications Supported by Index-Structures
Data Visualization
- gnuplot : a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms
- VTK : the Visualization Toolkit, an open-source, freely available software system for 3D computer graphics, image processing and visualization
- MathGL : a library for making high-quality scientific graphics under Linux and Windows
- PLplot : a cross-platform software package for creating scientific plots
- OxyPlot : a cross-platform plotting library for .NET
- Google Charts
- JFreeChart : a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications
- RGraph : an open-source HTML5 charting library that draws interactive charts with JavaScript and the HTML5 canvas tag
- Raphael : a small JavaScript library that simplifies working with vector graphics on the web
- Graphviz : open-source graph visualization software
- Gephi : an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs
- Cytoscape : an open source software platform for visualizing complex networks and integrating these with any type of attribute data
- Tulip : an information visualization framework dedicated to the analysis and visualization of relational data