R Lab

Background

R is a programming language and software environment created by Robert Gentleman and Ross Ihaka at the University of Auckland in New Zealand. It is used in many fields for analyzing statistical data, presenting it graphically, and reporting on it. Development is continued by the R Core Team (originally called the R Development Core Team). R is free software released under the GNU General Public License, and precompiled binaries are available for download. It runs on many operating systems, including Windows, Linux and macOS.

The name comes from the first letter of the first names of its two creators, Robert and Ross. It is also a play on the name of the S language, which was developed at Bell Laboratories.

R was created in the 1990s as an alternative implementation of the S language. It appears in programming language popularity rankings such as RedMonk, TIOBE and PYPL, which are indices rather than languages. R is an important tool for machine learning and statistics, as well as numerical analysis.

Introduction

R (sometimes referred to as R Analytics) is a free, open-source programming language used for heavy statistical computing. It was built specifically for statistical analysis and data mining, and it is widely used for both.

More specifically, it is used not only to analyze data, but also to create software and applications that can reliably perform statistical analysis.

In addition to the standard statistical tools, R includes graphics facilities for plotting data. As such, it can be used for a wide range of analytical modeling, including classical statistical tests, linear and nonlinear modelling, data clustering, time-series analysis and more.
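
As a rough illustration of the kinds of analysis listed above, the following sketch runs one example of each using only functions and datasets that ship with a standard R installation (`mtcars`, `iris`, `AirPassengers`). The choice of datasets and variables is an assumption made for this example, not part of the lab itself.

```r
# Illustrative sketch using datasets bundled with base R.

# Classical statistical test: Welch two-sample t-test comparing fuel
# economy (mpg) between automatic (am = 0) and manual (am = 1) cars.
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Linear modelling: regress fuel economy on car weight.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Data clustering: k-means with three clusters on the scaled iris
# measurements. k-means starts from random centers, so fix the seed.
set.seed(42)
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 10)
print(table(km$cluster, iris$Species))

# Time-series analysis: autocorrelation of the monthly AirPassengers series.
ac <- acf(AirPassengers, plot = FALSE)
print(head(ac$acf))
```

Each call returns an ordinary R object (`tt`, `fit`, `km`, `ac`) that can be inspected further with `summary()` or `str()`.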

Importance

R is important in data science because of its versatility in statistics. It is typically chosen when a task calls for specialized statistical analysis, whether on a single machine or in a distributed computing environment.

R is also well suited to exploratory work. It can be used for most kinds of analysis because it offers many tools and is highly extensible. It can also be used as part of big data solutions.

The following highlights show why R is important for data science:

  • Data analysis software: R is data analysis software. Data scientists use it for statistical analysis, predictive modeling and visualization.

  • Statistical analysis environment: R provides a complete environment for statistical analysis, and statistical methods are easy to implement in it. Much new research in statistical analysis and modeling is done using R, so new techniques often become available in R first.

  • Open source: R is open-source technology, so its source code is freely available and it can be integrated with other applications.

  • Community support: R is supported by a rapidly growing community that includes leading statisticians and data scientists from around the world.

Most development of the R language is therefore done with data science and statistics in mind. As a result, R has become a common default choice for data science applications and data science professionals.
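
The three uses named in the highlights above (statistical analysis, predictive modeling and visualization) can be sketched in a few lines of base R. This is a minimal example under stated assumptions: the `mtcars` dataset, the 24/8 train/test split, and the choice of predictors are all picked here for illustration.

```r
# Statistical analysis: mean fuel economy (mpg) per cylinder count.
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(by_cyl)

# Predictive modeling: fit a linear model on 24 randomly chosen cars
# and predict mpg for the remaining 8 (an arbitrary split for this sketch).
set.seed(1)
train_idx <- sample(nrow(mtcars), 24)
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]
model <- lm(mpg ~ wt + hp, data = train)
pred  <- predict(model, newdata = test)

# Root mean squared error on the held-out cars.
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)

# Visualization: observed vs. predicted mpg, written to a temporary PDF
# so the example also works in a non-interactive session.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(test$mpg, pred,
     xlab = "Observed mpg", ylab = "Predicted mpg",
     main = "Held-out predictions")
abline(0, 1, lty = 2)  # reference line where predicted equals observed
invisible(dev.off())
```

In an interactive session, the `pdf()` and `dev.off()` calls can be dropped so that the plot appears directly in the graphics window.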
