Learn R Programming: Getting Started with R Language Cheatsheet

R is a programming language designed for statistical computing and graphics. It is free, open-source, and available on Linux, macOS, and Windows. R is an essential language in data science and is widely used by statisticians and data scientists.

As R is a community-driven language and software platform, it thrives and improves on user contributions. In this article, we will present R cheat sheets that are organized in the following manner:

1. Data processing and transformation

For any kind of analysis, input/output and transformation of data are core tasks. R is a robust platform with many features that we will cover in the following sections.

a) Data handling

To extract and load data for any kind of analysis, R provides pretty powerful and easy-to-use utility functions. Some of these are listed, as follows:

  • read.csv(<file_name>): This imports a standard .csv file
  • write.csv(<object_name>,<file_name>): This exports to a .csv file
  • data(<dataset_name>): This loads R’s built-in dataset
  • head(<object>): This prints the first few entries of the data imported
  • names(<object>): This lists variables in an object
  • read.table(<file_name>): This reads a delimited text file into a data frame
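
As a minimal sketch of the functions above, the following round-trips the built-in mtcars dataset through a temporary CSV file. The tmp path and the cars variable are illustrative names, not part of the original cheat sheet.

```r
# Load a built-in dataset, export it to CSV, and import it again.
data(mtcars)

tmp <- tempfile(fileext = ".csv")          # hypothetical temporary path
write.csv(mtcars, tmp, row.names = FALSE)  # export without row names

cars <- read.csv(tmp)                      # import into a data frame
head(cars, 3)                              # first three rows
names(cars)                                # column (variable) names
```

Passing row.names = FALSE keeps the exported file free of an extra unnamed column, so the imported data frame has the same columns as the original.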

b) Basic data types

Data types form the basic constructs for R—or any other language as a matter of fact. What makes R special is an extended list of basic data types to handle varied data types. These are as follows:

  • numeric (integer and double) and character: These are data types that are available in R
  • factor: This allows you to store categorical data; the separate complex data type is used for complex numbers
  • is.<data_type> and as.<data_type>: These are used to check data types and type conversion, respectively
  • length(<variable>): This gives you the number of elements in a variable; use nchar(<variable>) to count the characters in each string
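
A short illustration of type checks and conversions may help here. The variable names are made up for the example, and nchar() is included only to contrast it with length().

```r
x <- c(1.5, 2, 3)
is.numeric(x)                  # TRUE: x is a numeric (double) vector

s <- as.character(x)           # convert numbers to strings
is.character(s)                # TRUE

f <- factor(c("low", "high", "low"))
lv <- levels(f)                # distinct categories, sorted alphabetically

z <- complex(real = 1, imaginary = 2)
is.complex(z)                  # TRUE

words <- c("apple", "kiwi")
n_elements <- length(words)    # number of elements in the vector
n_chars <- nchar(words)        # number of characters in each string
```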

2. Data structures

R provides many data structures out of the box, which we discuss in the following subsections.

a) Vectors

This is the most basic data structure in R. It is similar to a mathematical vector. The following are ways to interact with a vector in R:

  • r[1]: This accesses an element using square brackets. Indexing begins at 1.
  • r[r > 100]: This selects the elements of r that are greater than 100; vectors accept logical expressions as indices.
  • r[5:10]: This selects a subrange. The given example returns the elements at indices 5 through 10.
  • r[-1]: This returns all elements except the first.
  • factor(x): This converts a vector x to factor.
  • which.max(x) and which.min(x): These return the index of the (first) maximum and minimum value of x, respectively.
  • rev(x): This reverses the elements of x.
  • table(x): This gives you the frequency table for elements of the x vector.
  • match(a,b): This returns, for each element of a, the position of its first match in b; elements with no match give NA.
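
The vector operations above can be tried on a small example. The values in r are arbitrary and chosen only so that the results are easy to check by hand.

```r
r <- c(50, 120, 75, 300, 10)

first   <- r[1]          # 50: indexing starts at 1
big     <- r[r > 100]    # 120 300: logical index
middle  <- r[2:4]        # 120 75 300: subrange
no_head <- r[-1]         # 120 75 300 10: everything except the first element

i_max <- which.max(r)    # 4: position of the largest value
i_min <- which.min(r)    # 5: position of the smallest value
backwards <- rev(r)      # 10 300 75 120 50

counts <- table(c("a", "b", "a"))             # frequency of each distinct value
pos <- match(c("b", "z"), c("a", "b", "c"))   # 2 NA: "z" has no match
```

Note that which.max() and which.min() give positions, so r[which.max(r)] is the way to get the largest value itself (max(r) does the same directly).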

b) Arrays and matrices

R supports multidimensional arrays. A matrix is a two-dimensional array. The following are access patterns for these data structures:

  • array(<vector>,<vector_dimensions>): This generates an array from an input vector
  • %o%: This gives you the outer product of two arrays
  • x[a,b,c]: This accesses an element of a three-dimensional array; the indices for each dimension are comma-separated inside square brackets
  • matrix(<vector>,nrow=r,ncol=c): This generates an r X c matrix with values from <vector>
  • t(<matrix>): This is the transpose of a matrix
  • diag(<matrix>): This gives the diagonal of a matrix
  • colSums(<matrix>) and rowSums(<matrix>): These calculate the sums of the columns and rows of a matrix, respectively
  • colMeans(<matrix>) and rowMeans(<matrix>): These calculate the means of the columns and rows of a matrix, respectively
  • %*%: This is a matrix multiplication operator
  • lower.tri(<matrix>): This returns a logical matrix that is TRUE in the lower triangle; use <matrix>[lower.tri(<matrix>)] to extract the values
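
As a rough sketch of these access patterns, the example below builds a small matrix and a three-dimensional array. All object names are illustrative.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column:
                                       #   1 3 5
                                       #   2 4 6
tm <- t(m)                             # 3 x 2 transpose

cs <- colSums(m)                       # 3 7 11
rs <- rowSums(m)                       # 9 12
rm_ <- rowMeans(m)                     # 3 4

p <- m %*% t(m)                        # 2 x 2 matrix product
d <- diag(p)                           # diagonal of p
low <- p[lower.tri(p)]                 # values below the diagonal

o <- 1:2 %o% 1:3                       # outer product, a 2 x 3 matrix

a <- array(1:24, dim = c(2, 3, 4))     # three-dimensional array
last <- a[2, 3, 4]                     # element at row 2, column 3, layer 4
```

The function names are case-sensitive: colSums and rowMeans work, whereas colsum and rowmeans do not exist in base R.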

c) Lists

A list is an ordered collection of named or unnamed objects, which may or may not be homogeneous. Lists are recursive data structures; that is, a list's element can itself be a list. A list can be manipulated using the following:

  • list(<object_1>,<object_2>,…): This generates a list of objects that are separated by a comma
  • L[[i]]: Double square brackets access the element at the ith index of the list
  • length(<list>): This returns the count of the topmost elements of a list
  • L$<name>: The $ operator accesses the element of list L called <name>; this is the same as L[["<name>"]]
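
A small, hypothetical list shows how these accessors relate to each other. The element names are invented for the example.

```r
# A list mixing a string, a numeric vector, and a nested list.
L <- list(name = "Ada", scores = c(90, 85), nested = list(a = 1))

by_index <- L[[2]]          # second element: 90 85
by_name  <- L$scores        # same element, accessed by name
by_key   <- L[["scores"]]   # equivalent to L$scores

n_top <- length(L)          # 3: only top-level elements are counted
inner <- L$nested$a         # 1: lists can contain lists
```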

d) Data frames

Data frames are tabular structures that can have columns of different data types and attributes. A data frame may contain components of the numeric, character, factor, or list types, or it may contain other data frames. The following utilities help in manipulating data frames:

  • data.frame(col1=<object1>,col2=<object2>,…): This generates a data frame with n columns or components, which have values from corresponding objects
  • attach(<data.frame>): This exposes components of a data frame in a search path for easy access
  • merge(x,y): This combines two data frames based on their common columns or row names
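
To illustrate, the sketch below builds two small data frames and merges them on their shared id column. The data and the names people, orders, and merged are made up for this example.

```r
people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
orders <- data.frame(id = c(1, 1, 3), amount = c(10, 20, 5))

# merge() joins on the columns the two data frames have in common ("id").
# By default only rows with a match in both are kept, so id 2 is dropped.
merged <- merge(people, orders)

merged$amount          # a column is accessed with $
nrow(merged)           # 3 rows: two orders for Ann, one for Cy
```

Accessing columns with $ avoids the name clashes that attach() can cause when a workspace variable has the same name as a column.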

e) General utilities

Apart from the utilities and the other constructs that we just discussed, R provides a rich set of general utilities to make data analysis even easier. Check out the following utilities:

  • c(1:5): This is a generic function that concatenates values. The given example would generate a vector with values 1 to 5.
  • rep(<value>,<count>): This generates a vector with repeating <value> elements of the <count> size.
  • seq(from,to): This generates a sequence vector starting with from and ending with to. You can also specify increments; the default is 1.
  • sort(c(10,9,8,7)): This returns a sorted vector 7,8,9,10.
  • order(c(10,9,1,2)): This returns the indices that sort the vector in ascending order, 3,4,2,1.
  • rank(c(10,5,6,9)): This returns the rank order of elements as 4,1,2,3.
  • summary(<object>): This has summary details, such as min, max, mean, median, and so on, for the object.
  • choose(n,k): This returns the number of ways to choose k items from n (the binomial coefficient).
  • na.omit(x): This removes all the missing values (NAs) from x.
  • na.fail(x): This errors out if x contains even a single missing value.
  • unique(x): This returns only distinct or unique values of x. This works with vectors and data frames.
  • paste(…): This converts objects to strings and concatenates them.
  • substr(cv,start,stop): This extracts substrings of the cv character vector from the start to the stop position.
  • grep(ptrn,cv): This searches for the ptrn pattern in the cv vector and returns the indices of the matching elements.
  • gsub(ptrn,rep,cv): This replaces every match of the ptrn pattern with the rep replacement in the cv vector.
  • tolower and toupper: This converts character vector elements to lowercase and uppercase, respectively.
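
Several of these utilities can be checked quickly at the console. The following is a minimal sketch with arbitrary inputs; note that order() and rank() expect a single vector, so the values are wrapped in c().

```r
s1 <- seq(1, 10, by = 3)           # 1 4 7 10
r1 <- rep("a", 3)                  # "a" "a" "a"

v <- c(10, 9, 1, 2)
sorted <- sort(v)                  # 1 2 9 10
ord    <- order(v)                 # 3 4 2 1: v[ord] is the sorted vector
rnk    <- rank(c(10, 5, 6, 9))     # 4 1 2 3

n_ways <- choose(5, 2)             # 10 ways to pick 2 items from 5

x <- c(3, NA, 3, 7)
clean    <- as.numeric(na.omit(x)) # 3 3 7: missing value removed
distinct <- unique(x)              # 3 NA 7

joined <- paste("R", "lang", sep = "-")                # "R-lang"
part   <- substr("statistics", 1, 4)                   # "stat"
hits   <- grep("an", c("banana", "cherry", "mango"))   # 1 3
swapped <- gsub("a", "A", "banana")                    # "bAnAnA"
upper  <- toupper("r")                                 # "R"
```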

3. Math and modeling

R has a rich set of inbuilt functions and packages to perform mathematical and modeling operations.

a) Math and modeling utilities

As R is a statistical language, it provides a rich set of mathematical functions that are available right out of the box (while more can be added using additional libraries or packages):

  • sum(x): This is the sum of the elements of x.
  • cumsum(x): This calculates the cumulative sum of the elements of x.
  • diff(x): This is the pair-wise difference between the elements of vector x.
  • prod(x): This is the product of the elements of x.
  • mean(x) and median(x): These are the mean and median of x, respectively.
  • var(x,y): This is the covariance between x and y; with a single argument, var(x) is the variance of x. It works with matrices and data frames as well. var(x, y) is the same as cov(x, y).
  • quantile(x,probs): This returns the quantile breakup of x for given probabilities.
  • sd(x): This is the standard deviation for x.
  • weighted.mean(x,w): This returns the weighted mean of x using the w weight vector.
  • cor(x,y): This is the linear correlation between x and y.
  • round(x,n): This rounds the elements of x to n digits.
  • log(a,b): This calculates the log of a for base b.
  • sin, cos, tan, asin, acos, atan, and so on: These are Trigonometric functions.
  • exp(x): This exponentiates each element of the x vector.
  • scale(m): This centers or scales the elements of an m numeric matrix.
  • union(x,y), intersect(x,y), and is.element(e,x): These are Set functions that are also available.
  • Conj(c): This returns the conjugate of the c complex number.
  • rnorm, rpois, rgamma, rexp, rcauchy, rt, and so on: These generate random samples from the Gaussian, Poisson, Gamma, Exponential, Cauchy, and Student's t distributions.
  • fft(x): This calculates Fast Fourier Transform of the elements of x.
  • apply(m,MARGIN,FUNC): This applies the FUNC function over the rows (MARGIN = 1) or columns (MARGIN = 2) of the m matrix.
  • lapply(l,FUNC): This applies the FUNC function to each element of the l list and returns a list.
  • optim(params, func, mtds): This is a general-purpose optimizer that minimizes the func function, starting from the initial params values, using the mtds method.
  • lm(frml): This fits a linear model on the frml formula. This is used for regression and covariance analysis. Also, check glm for generalized linear models.
  • nls(fml): This fits nonlinear least squares estimates for nonlinear models.
  • spline(x,y): This performs cubic spline interpolation of the points given by x and y.
  • predict(fit,[…]): This is a generic function that produces predictions from a fitted model, optionally on new input data.
  • df.residual(fit): This returns the residual degrees of freedom of fit.
  • coef, residuals, and deviance: These return coefficients, residuals, and deviance of models fitted.
  • logLik(fit): This calculates the log likelihood of the model fitted.
  • aov(frml): This performs analysis of variance model calculations on frml.
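  • anova(fit,[…]): This computes analysis of variance (or deviance) tables for one or more fitted models. The capitalized Anova function comes from the add-on car package.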
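
The following sketch exercises a few of these functions, ending with a simple linear regression on the built-in mtcars dataset. The choice of mpg ~ wt as the formula is only an illustration, not something the cheat sheet prescribes.

```r
x <- c(2, 4, 6, 8)
total   <- sum(x)          # 20
running <- cumsum(x)       # 2 6 12 20
steps   <- diff(x)         # 2 2 2
product <- prod(x)         # 384
avg     <- mean(x)         # 5
spread  <- var(x)          # sample variance: 20 / 3
lg      <- log(8, 2)       # log of 8 in base 2

m <- matrix(1:6, nrow = 2)
row_totals <- apply(m, 1, sum)                       # sum over rows: 9 12
means <- lapply(list(a = 1:3, b = 4:6), mean)        # list of means: 2 and 5

# Fit miles per gallon as a linear function of weight.
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(fit)                                   # intercept and slope
res_df <- df.residual(fit)                           # 32 observations - 2 coefficients
pred <- predict(fit, newdata = data.frame(wt = 3))   # prediction for a new car
tab <- anova(fit)                                    # analysis of variance table
```

Supplying newdata to predict() is what turns it from returning fitted values for the training data into scoring unseen observations.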

b) Math and modeling packages

The following is a list of popular and mature sets of packages, which enhance the power of R:

  • arules: This is association rule mining
  • cluster, fpc, mclust: This is clustering and classification
  • DMwR, dprep, Rlof: These are for outlier detection
  • multicore, snow: These are for parallel processing
  • nlme: This is for linear and nonlinear mixed-effects models
  • TraMineR: This is for sequential pattern mining
  • party and rpart: These are recursive partitioning, decision trees, and survival analysis
  • nnet: This is neural networks
  • kernlab and e1071: These are for support vector machines, PCA, Naive Bayes, fuzzy clustering, and so on
  • stats, ast, forecast: These are for time series analysis
  • RgoogleMaps, ggmap, plotKML, and spdep: These are for spatial analysis
  • sna, network, and igraph: These are for social network analysis
  • tm, lda, topicmodels, RTextTools, and tau: These are for text mining

4. Plotting

Statistical analysis and data science are way too difficult without graphs and visualization. R has a rich set of utilities and libraries for plotting. Let’s have a look at a few of these:

  • plot(y): This plots the values of y on the y axis ordered by indices on the x axis.
  • plot(x,y): This plots values on the x and y axis, respectively.
  • barplot(x): This is a bar plot of the values of x.
  • hist(x): This is a histogram of frequencies of the elements of x.
  • pie(x): This is a pie chart for the elements of x.
  • boxplot(x): This is a boxplot for the elements of x.
  • plot.ts(x): This is a plot with respect to time.
  • mosaicplot(x): This is a mosaic plot of a contingency table; with shade = TRUE, the cells are colored by the residuals of a log-linear model.
  • contour(x,y,z): This is a contour plot, where x and y must be vectors and z must be a matrix with length(x) rows and length(y) columns.
  • qqplot(x,y): This is a quantile-quantile (Q-Q) plot of the quantiles of y against those of x.
  • abline(a,b): This draws a line with the a intercept and the b slope. This can also be used to draw horizontal (h=), vertical (v=), and regression lines.
  • rect(x1,y1,x2,y2): This draws a rectangle, based on the bottom-left (x1,y1) and top-right (x2,y2) coordinates.
  • polygon(x,y): This draws a polygon, connecting the elements of x and y.
  • xlim, ylim: These arguments set the x and y limits of a graph.
  • col: This argument sets the line or symbol color.
  • text(), title(), and legend(): These are for text, title, and legends on a graph.
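
As a minimal sketch, the base graphics calls above can be combined into one figure. The example writes to a temporary PDF file (the out path is a hypothetical name) so that it also runs in a non-interactive session; the data are synthetic.

```r
out <- tempfile(fileext = ".pdf")   # hypothetical output file
pdf(out)                            # open a PDF graphics device

x <- 1:10
y <- 2 * x + 1

# Scatter plot with explicit axis limits and point color.
plot(x, y, xlim = c(0, 12), ylim = c(0, 25), col = "blue",
     main = "Base graphics sketch")

abline(a = 1, b = 2, col = "red")   # line with intercept 1 and slope 2
abline(h = 10, lty = 2)             # dashed horizontal line at y = 10
rect(2, 2, 4, 6)                    # rectangle from (2, 2) to (4, 6)
text(8, 5, "a label")               # text at the point (8, 5)
legend("topleft", legend = c("points", "line"),
       col = c("blue", "red"), pch = c(1, NA), lty = c(NA, 1))

dev.off()                           # close the device and write the file
```

In an interactive session, the pdf() and dev.off() calls can be dropped so that the plot appears in the default graphics window.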

Plotting packages

Let’s now take a look at some plotting packages.

  • ggplot2: This is the de facto graphics grammar for R
  • ggvis: This is a rich and powerful plotting library
  • googleVis: This brings the power of Google Visualization APIs to R
  • lattice: This is specialized for multivariate data
  • iplots: These are interactive plots
