R (programming language)
Paradigms | Multi-paradigm: procedural, object-oriented, functional, reflective, imperative, array[1] |
---|---|
Designed by | Ross Ihaka and Robert Gentleman |
Developer | R Core Team |
First appeared | August 1993 |
Stable release | 4.3.3[2]
/ 29 February 2024 |
Typing discipline | Dynamic |
Platform | arm64 and x86-64 |
License | GNU GPL v2[3] |
Filename extensions | |
Website | www |
Influenced by | |
Influenced | |
Julia[7] | |
|
R is a programming language for statistical computing and data visualization. It has been adopted in the fields of data mining, bioinformatics, and data analysis.[8]
The core R language is augmented by a large number of extension packages, containing reusable code, documentation, and sample data.
R software is open-source and free software. It is licensed by the GNU Project and available under the GNU General Public License.[3] It is written primarily in C, Fortran, and R itself. Precompiled executables are provided for various operating systems.
As an interpreted language, R has a native command line interface. Moreover, multiple third-party graphical user interfaces are available, such as RStudio—an integrated development environment—and Jupyter—a notebook interface.
History[edit]
R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory statistics at the University of Auckland.[9] The language was inspired by the S programming language, with most S programs able to run unaltered in R.[6] The language was also inspired by Scheme's lexical scoping, allowing for local variables.[1]
The name of the language, R, comes from being both an S language successor as well as the shared first letter of the authors, Ross and Robert.[10] In August 1993, Ihaka and Gentleman posted a binary of R on StatLib — a data archive website. At the same time, they announced the posting on the s-news mailing list.[11] On December 5, 1997, R became a GNU project when version 0.60 was released.[12] On February 29, 2000, the first official 1.0 version was released.[13]
Packages[edit]
R packages are collections of functions, documentation, and data that expand R.[14] For example, packages add report features such as RMarkdown, knitr and Sweave. Easy package installation and use have contributed to the language's adoption in data science.[15]
The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Fritz Leisch to host R's source code, executable files, documentation, and user-created packages.[16] Its name and scope mimic the Comprehensive TeX Archive Network and the Comprehensive Perl Archive Network.[16] CRAN originally had three mirrors and 12 contributed packages.[17] As of December 2022, it has 103 mirrors[18] and 18,976 contributed packages.[19] Packages are also available on repositories R-Forge, Omegahat, and GitHub.
The Task Views on the CRAN website lists packages in fields such as finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics.
The Bioconductor project provides packages for genomic data analysis, complementary DNA, microarray, and high-throughput sequencing methods.
Packages add the capability to implement various statistical techniques such as linear, generalized linear and nonlinear modeling, classical statistical tests, spatial analysis, time-series analysis, and clustering.
The tidyverse package is organized to have a common interface around accessing and processing data contained in the data frame data structure, a two-dimensional table of rows and columns. Each function in the package is designed to couple together all the other functions in the package.[14]
Installing a package occurs only once. To install tidyverse:[14]
> install.packages( "tidyverse" )
To instantiate the functions, data, and documentation of a package, execute the library()
function. To instantiate tidyverse:[a]
> library( tidyverse )
Interfaces[edit]
R comes installed with a command line console. Available for installation are various integrated development environments (IDE). IDEs for R include R.app (OSX/macOS only), Rattle GUI, R Commander, RKWard, RStudio, and Tinn-R.
General purpose IDEs that support R include Eclipse via the StatET plugin and Visual Studio via R Tools for Visual Studio.
Editors that support R include Emacs, Vim via the Nvim-R plugin, Kate, LyX via Sweave, WinEdt (website), and Jupyter (website).
Scripting languages that support R include Python (website), Perl (website), Ruby (source code), F# (website), and Julia (source code).
General purpose programming languages that support R include Java via the Rserve socket server, and .NET C# (website).
Statistical frameworks which use R in the background include Jamovi and JASP.
Community[edit]
The R Core Team was founded in 1997 to maintain the R source code. The R Foundation for Statistical Computing was founded in April 2003 to provide financial support. The R Consortium is a Linux Foundation project to develop R infrastructure.
The R Journal is an open access, academic journal which features short to medium-length articles on the use and development of R. It includes articles on packages, programming tips, CRAN news, and foundation news.
The R community hosts many conferences and in-person meetups. These groups include:
- UseR!: an annual international R user conference (website)
- Directions in Statistical Computing (DSC) (website)
- R-Ladies: an organization to promote gender diversity in the R community (website)
- SatRdays: R-focused conferences held on Saturdays (website)
- R Conference (website)
- posit::conf (formerly known as rstudio::conf) (website)
Implementations[edit]
The main R implementation is written primarily in C, Fortran, and R itself. Other implementations include:
- pretty quick R (pqR), by Radford M. Neal, attempts to improve memory management.
- Renjin is an implementation of R for the Java Virtual Machine.
- CXXR and Riposte[20] are implementations of R written in C++.
- Oracle's FastR is an implementation of R, built on GraalVM.
- TIBCO Software, creator of S-PLUS, wrote TERR — an R implementation to integrate with Spotfire.[21]
Microsoft R Open (MRO) was a R implementation. As of 30 June 2021, Microsoft started to phase out MRO in favor of the CRAN distribution.[22]
Commercial support[edit]
Although R is an open-source project, some companies provide commercial support:
- Revolution Analytics provides commercial support for Revolution R.
- Oracle provides commercial support for the Big Data Appliance, which integrates R into its other products.
- IBM provides commercial support for in-Hadoop execution of R.
Examples[edit]
Basic syntax[edit]
The following examples illustrate the basic syntax of the language and use of the command-line interface. (An expanded list of standard language features can be found in the R manual, "An Introduction to R".[23])
In R, the generally preferred assignment operator is an arrow made from two characters <-
, although =
can be used in some cases.[24]
> x <- 1:6 # Create a numeric vector in the current environment
> y <- x^2 # Create vector based on the values in x.
> print(y) # Print the vector’s contents.
[1] 1 4 9 16 25 36
> z <- x + y # Create a new vector that is the sum of x and y
> z # Return the contents of z to the current environment.
[1] 2 6 12 20 30 42
> z_matrix <- matrix(z, nrow=3) # Create a new matrix that turns the vector z into a 3x2 matrix object
> z_matrix
[,1] [,2]
[1,] 2 20
[2,] 6 30
[3,] 12 42
> 2*t(z_matrix)-2 # Transpose the matrix, multiply every element by 2, subtract 2 from each element in the matrix, and return the results to the terminal.
[,1] [,2] [,3]
[1,] 2 10 22
[2,] 38 58 82
> new_df <- data.frame(t(z_matrix), row.names=c('A','B')) # Create a new data.frame object that contains the data from a transposed z_matrix, with row names 'A' and 'B'
> names(new_df) <- c('X','Y','Z') # Set the column names of new_df as X, Y, and Z.
> print(new_df) # Print the current results.
X Y Z
A 2 6 12
B 20 30 42
> new_df$Z # Output the Z column
[1] 12 42
> new_df$Z==new_df['Z'] && new_df[3]==new_df$Z # The data.frame column Z can be accessed using $Z, ['Z'], or [3] syntax and the values are the same.
[1] TRUE
> attributes(new_df) # Print attributes information about the new_df object
$names
[1] "X" "Y" "Z"
$row.names
[1] "A" "B"
$class
[1] "data.frame"
> attributes(new_df)$row.names <- c('one','two') # Access and then change the row.names attribute; can also be done using rownames()
> new_df
X Y Z
one 2 6 12
two 20 30 42
Structure of a function[edit]
One of R's strengths is the ease of creating new functions.[25] Objects in the function body remain local to the function, and any data type may be returned.
Create a function:
# The input parameters are x and y.
# The function returns a linear combination of x and y.
f <- function(x, y) {
z <- 3 * x + 4 * y
# this return() statement is optional
return(z)
}
Usage output:
> f(1, 2)
[1] 11
> f(c(1,2,3), c(5,3,4))
[1] 23 18 25
> f(1:3, 4)
[1] 19 22 25
Modeling and plotting[edit]
The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.
# Create x and y values
x <- 1:6
y <- x^2
# Linear regression model y = A + B * x
model <- lm(y ~ x)
# Display an in-depth summary of the model
summary(model)
# Create a 2 by 2 layout for figures
par(mfrow = c(2, 2))
# Output diagnostic plots of the model
plot(model)
Output:
Residuals:
1 2 3 4 5 6 7 8 9 10
3.3333 -0.6667 -2.6667 -2.6667 -0.6667 3.3333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.3333 2.8441 -3.282 0.030453 *
x 7.0000 0.7303 9.585 0.000662 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared: 0.9583, Adjusted R-squared: 0.9478
F-statistic: 91.88 on 1 and 4 DF, p-value: 0.000662
Mandelbrot set[edit]
This Mandelbrot set example highlights the use of complex numbers. It models the first 20 iterations of the equation z = z2 + c
, where c
represents different complex constants.
Install the package that provides the write.gif()
function beforehand:
install.packages("caTools")
R Source code:
library(caTools)
jet.colors <-
colorRampPalette(
c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
"white", "#FF7F00", "red", "#7F0000"))
dx <- 1500 # define width
dy <- 1400 # define height
C <-
complex(
real =
rep(
seq(-2.2, 1.0, length.out = dx), each = dy),
imag = rep(seq(-1.2, 1.2, length.out = dy),
dx))
# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)
# initialize output 3D array
X <- array(0, c(dy, dx, 20))
Z <- 0
# loop with 20 iterations
for (k in 1:20) {
# the central difference equation
Z <- Z^2 + C
# capture the results
X[, , k] <- exp(-abs(Z))
}
write.gif(
X,
"Mandelbrot.gif",
col = jet.colors,
delay = 100)
See also[edit]
- Comparison of numerical-analysis software
- Comparison of statistical packages
- List of numerical-analysis software
- List of statistical software
- Rmetrics
Further reading[edit]
- Wickham, Hadley; Çetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for data science: import, tidy, transform, visualize, and model data (2nd ed.). Beijing Boston Farnham Sebastopol Tokyo: O'Reilly. ISBN 978-1-4920-9740-2.
- Gagolewski, Marek (2024). Deep R Programming. doi:10.5281/ZENODO.7490464. ISBN 978-0-6455719-2-9.
External links[edit]
- R Technical Papers
- Free Software Foundation
- R FAQ
- Big Book of R, curated list of R-related programming books
- Books Related to R - R Project, partially annotated list of books that are related to S or R and may be useful to the R user community
Portal[edit]
Notes[edit]
- ^ This displays to standard error a listing of all the packages that tidyverse depends upon. It may also display two errors showing conflict. The errors may be ignored.
References[edit]
- ^ a b c Morandat, Frances; Hill, Brandon; Osvald, Leo; Vitek, Jan (11 June 2012). "Evaluating the design of the R language: objects and functions for data analysis". European Conference on Object-Oriented Programming. 2012: 104–131. doi:10.1007/978-3-642-31057-7_6. Retrieved 17 May 2016 – via SpringerLink.
- ^ Peter Dalgaard (29 February 2024). "R 4.3.3 is released". Retrieved 1 March 2024.
- ^ a b "R - Free Software Directory". directory.fsf.org. Retrieved 26 January 2024.
- ^ "R scripts". mercury.webster.edu. Retrieved 17 July 2021.
- ^ "R Data Format Family (.rdata, .rda)". Loc.gov. 9 June 2017. Retrieved 17 July 2021.
- ^ a b Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 3.3 What are the differences between R and S?. Archived from the original on 28 December 2022. Retrieved 27 December 2022.
- ^ "Introduction". The Julia Manual. Archived from the original on 20 June 2018. Retrieved 5 August 2018.
- ^ Giorgi, Federico M.; Ceraolo, Carmine; Mercatelli, Daniele (27 April 2022). "The R Language: An Engine for Bioinformatics and Data Science". Life. 12 (5): 648. Bibcode:2022Life...12..648G. doi:10.3390/life12050648. PMC 9148156. PMID 35629316.
- ^ Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 12. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022.
We set a goal of developing enough of a language to teach introductory statistics courses at Auckland.
- ^ Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 2.13 What is the R Foundation?. Archived from the original on 28 December 2022. Retrieved 28 December 2022.
- ^ Ihaka, Ross. "R: Past and Future History" (PDF). p. 4. Archived (PDF) from the original on 28 December 2022. Retrieved 28 December 2022.
- ^ Ihaka, Ross (5 December 1997). "New R Version for Unix". stat.ethz.ch. Archived from the original on 12 February 2023. Retrieved 12 February 2023.
- ^ Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 18. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022.
- ^ a b c Wickham, Hadley; Cetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. p. xvii. ISBN 978-1-492-09740-2.
- ^ Chambers, John M. (2020). "S, R, and Data Science". The R Journal. 12 (1): 462–476. doi:10.32614/RJ-2020-028. ISSN 2073-4859.
The R language and related software play a major role in computing for data science. ... R packages provide tools for a wide range of purposes and users.
- ^ a b Hornik, Kurt (2012). "The Comprehensive R Archive Network". WIREs Computational Statistics. 4 (4): 394–398. doi:10.1002/wics.1212. ISSN 1939-5108. S2CID 62231320.
- ^ Kurt Hornik (23 April 1997). "Announce: CRAN". r-help. Wikidata Q101068595..
- ^ "The Status of CRAN Mirrors". cran.r-project.org. Retrieved 30 December 2022.
- ^ "CRAN - Contributed Packages". cran.r-project.org. Retrieved 29 December 2022.
- ^ Talbot, Justin; DeVito, Zachary; Hanrahan, Pat (1 January 2012). "Riposte: A trace-driven compiler and parallel VM for vector code in R". Proceedings of the 21st international conference on Parallel architectures and compilation techniques. ACM. pp. 43–52. doi:10.1145/2370816.2370825. ISBN 9781450311823. S2CID 1989369.
- ^ Jackson, Joab (16 May 2013). TIBCO offers free R to the enterprise. PC World. Retrieved 20 July 2015.
- ^ "Looking to the future for R in Azure SQL and SQL Server". 30 June 2021. Retrieved 7 November 2021.
- ^ "An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics" (PDF). Retrieved 3 January 2021.
- ^ R Development Core Team. "Assignments with the = Operator". Retrieved 11 September 2018.
- ^ Kabacoff, Robert (2012). "Quick-R: User-Defined Functions". statmethods.net. Retrieved 28 September 2018.
- R (programming language)
- Array programming languages
- Cross-platform free software
- Data mining and machine learning software
- Data-centric programming languages
- Dynamically typed programming languages
- Free plotting software
- Free statistical software
- Functional languages
- GNU Project software
- Literate programming
- Numerical analysis software for Linux
- Numerical analysis software for macOS
- Numerical analysis software for Windows
- Programming languages created in 1993
- Science software
- Statistical programming languages