Note: It is recommended to use <- for assignment rather than the = sign.
> x = 1
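To illustrate the note above, here is a minimal sketch of how the two operators differ. At the top level both assign, but inside a function call = names an argument while <- still assigns. The variable names (y, z, m1, m2) are made up for this example.

```r
# Top-level assignment: both forms create a variable
x <- 1
y = 2

# Inside a call, `=` matches an argument name and creates no variable
m1 <- mean(x = c(1, 2, 3))   # `x` is mean()'s argument; the global x is still 1

# Inside a call, `<-` assigns first, then passes the value on
m2 <- mean(z <- c(4, 5, 6))  # creates z as a side effect

print(x)
print(z)
```

This difference is one reason <- is usually preferred for assignment: it always means the same thing.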
Assign x a vector of integers from 1 to 10
> x = 1:10
Assign x a vector of integers from 10 to 1
> x = 10:1
To work on a different scale, multiply the sequence by a constant. The : operator is evaluated before *, so this gives 0.01, 0.02, ..., 0.10
> x = 1:10 * 0.01
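The result of the line above depends on operator precedence. The sketch below compares the two readings; the names scaled and grouped are only for illustration.

```r
# `:` binds tighter than `*`, so this is (1:10) * 0.01
scaled <- 1:10 * 0.01

# Parentheses change the meaning: this is 1:(0.1), a sequence
# that starts at 1 and cannot step down past 0.1
grouped <- 1:(10 * 0.01)

print(scaled)
print(length(grouped))
```

When in doubt, adding explicit parentheses around the range makes the intent clear.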
Use the seq function - generates a vector of numbers from 0 through 100 in steps of 10
> x = seq(0, 100, by = 10)
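A few related ways to build sequences, shown as a small sketch. All functions are from base R; the variable names are made up for this example.

```r
# Same sequence as above: step size given with `by`
a <- seq(0, 100, by = 10)

# Fix the number of points instead of the step size
b <- seq(0, 1, length.out = 5)

# Integer index sequences, handy in loops
idx1 <- seq_len(5)                    # 1 2 3 4 5
idx2 <- seq_along(c("a", "b", "c"))   # 1 2 3

print(length(a))
print(b)
```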
Create a vector from an arbitrary sequence of numbers
> x = c(1, 5, 8, 5, 10)
Prompt the user to enter numbers at the console (an empty line ends the input)
> x = scan()
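Because scan() with no arguments waits for console input, it is awkward to use in a script. A minimal sketch of a non-interactive variant, assuming the input is already available as a string (the sample values are made up):

```r
# Pass the input as text instead of typing it at the console
nums <- scan(text = "3 1 4 1 5", quiet = TRUE)

# Read character tokens rather than numbers
words <- scan(text = "red green blue", what = character(), quiet = TRUE)

print(nums)
print(words)
```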
There are packages available to read data from Excel spreadsheets and similar formats. However, whenever possible, export your data to .csv format and then use the read.csv function
> sfpd = read.csv("data/credit.csv")
Take a look at the other similar functions: read.table and read.csv2.
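The sketch below shows read.csv and read.csv2 on a small round trip through temporary files, so it runs without the data/credit.csv file. The data frame contents and column names are made up for this example.

```r
# Build a small data frame and write it to a temporary CSV file
df <- data.frame(id = 1:3,
                 amount = c(10.5, 20.25, 30),
                 grade = c("A", "B", "A"))
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# read.csv assumes "," as the separator and "." as the decimal mark
back <- read.csv(path, stringsAsFactors = FALSE)

# read.csv2 assumes ";" as the separator and "," as the decimal mark
path2 <- tempfile(fileext = ".csv")
write.csv2(df, path2, row.names = FALSE)
back2 <- read.csv2(path2, stringsAsFactors = FALSE)

str(back)
unlink(c(path, path2))  # remove the temporary files
```

read.table is the general function behind both; read.csv and read.csv2 differ mainly in their default separator and decimal mark.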
Option 1: Reading well-formatted data directly from a URL
> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data"
> housing = read.csv(url, header = FALSE, sep = "")
Option 2: More granular control of the data (getURL comes from the RCurl package)
> library(RCurl)
> raw = getURL("https://raw.githubusercontent.com/einext/data/master/Olympics2016.csv")
> olympics = read.csv(textConnection(raw), header=TRUE)
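Once the file contents are held in a character string, read.csv can parse them either through a text connection or through its text argument. A minimal offline sketch, with a made-up string standing in for the downloaded text:

```r
# A small CSV in a character string (contents are made up for this example)
csv_text <- "country,gold,silver\nA,3,1\nB,0,2\n"

# Two ways to parse in-memory text
via_connection <- read.csv(textConnection(csv_text), header = TRUE)
via_text       <- read.csv(text = csv_text, header = TRUE)

print(via_text)
print(all.equal(via_connection, via_text))
```

Holding the raw text first gives you a chance to inspect or clean it before parsing.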
Descriptions of the built-in datasets in R are available at the following link
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
In R, you can list the available datasets with
> data()
Load a dataset into the R session
> data("iris")
View the help page for a dataset
> ?iris
Remove all objects from the current R session (workspace)
> rm(list = ls())
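rm(list = ls()) removes every object in the workspace, which is not always what you want. The sketch below shows more selective removal. It works inside a separate environment (the name scratch and the object names are made up) so that running it does not touch your own workspace.

```r
# A scratch environment standing in for the workspace
scratch <- new.env()
assign("a", 1, envir = scratch)
assign("b", 2, envir = scratch)
assign("keep", 3, envir = scratch)

# Remove a single object by name
rm("a", envir = scratch)
before <- ls(scratch)

# Remove everything except the objects you want to keep
rm(list = setdiff(ls(scratch), "keep"), envir = scratch)
after <- ls(scratch)

print(before)
print(after)
```

Dropping the envir argument applies the same calls to the workspace itself.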
Here is a nice index of data (updated 2/12/2016)
Other data formats and sources
To load JSON data, you can use the rjson package
To load data from an RDBMS, you can use the RODBC package
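A minimal sketch of parsing JSON with rjson, assuming the package is installed. The JSON string is made up for this example, and the code skips the parsing step if rjson is not available. RODBC is not shown because it needs a configured database connection.

```r
# A small JSON document held in a string (contents are made up)
json_text <- '{"name": "iris", "rows": 150, "columns": ["Sepal.Length", "Species"]}'

if (requireNamespace("rjson", quietly = TRUE)) {
  # fromJSON turns a JSON object into a named R list
  parsed <- rjson::fromJSON(json_text)
  print(parsed$rows)
  print(parsed$columns)
} else {
  message("rjson is not installed; run install.packages('rjson') first")
}
```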
Once you have data in the R session, a few common exploration steps are shown below
View the number of records, number of columns, column types and sample values from each column
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
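Alongside str, a couple of other base R calls are often useful for a first look at the structure. A short sketch (the variable names are made up for this example):

```r
# First few rows of the data
head(iris, 3)

# Class of each column, as a named character vector
col_types <- sapply(iris, class)
print(col_types)

# Names of the numeric columns only
num_cols <- names(iris)[sapply(iris, is.numeric)]
print(num_cols)
```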
Summary of each column. For a numeric column it shows the minimum, maximum, mean and quartiles; for a categorical column it shows the frequency of each level.
> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
Summarize a single numeric column
> summary(iris$Petal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.600   4.350   3.758   5.100   6.900
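The individual pieces of a numeric summary can also be computed directly. A small sketch using base R and stats functions (the variable names are made up for this example):

```r
pl <- iris$Petal.Length

# Arbitrary percentiles instead of the fixed quartiles
q <- quantile(pl, probs = c(0.1, 0.5, 0.9))

rng <- range(pl)   # minimum and maximum
iqr <- IQR(pl)     # 3rd quartile minus 1st quartile
s   <- sd(pl)      # standard deviation

print(q)
print(rng)
print(iqr)
```

With the quartiles shown above (1.6 and 5.1), the interquartile range comes out as 3.5.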
Find unique values of a categorical column
> unique(iris$Species)
[1] setosa versicolor virginica
Levels: setosa versicolor virginica
Find the frequency of each value in a categorical column
> table(iris$Species)
    setosa versicolor  virginica
        50         50         50
Find the proportion of each value in a categorical column
> prop.table(table(iris$Species))
    setosa versicolor  virginica
 0.3333333  0.3333333  0.3333333
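Two common follow-ups to a frequency table, sketched with base R and stats functions: showing proportions as rounded percentages, and summarizing a numeric column within each category. The variable names are made up for this example.

```r
counts <- table(iris$Species)

# Proportions as percentages, rounded to one decimal place
pct <- round(100 * prop.table(counts), 1)
print(pct)

# Mean petal length within each species
means <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)
print(means)
```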
Find the number of records
> nrow(iris)
[1] 150
Find the number of columns
> ncol(iris)
[1] 5
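Both counts can be retrieved in one call with dim. The sketch below also shows the column names and a quick check for missing values; the variable names are made up for this example.

```r
d <- dim(iris)       # rows first, then columns
print(d)

nms <- names(iris)   # column names
print(nms)

# Number of rows with no missing values in any column
n_complete <- sum(complete.cases(iris))
print(n_complete)
```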