Introduction

Data Types & Probability Distributions

Francisco Rowe

2020-08-31

This session is part of Introduction to Statistical Learning in R. It introduces the structure, tools and key concepts that we will use throughout the course. While this first session may seem to revolve around basic ideas, these ideas provide the foundation for the rest of the course. Let’s start with a general definition of Statistics.

Introduction – Data Types & Probability Distributions by Francisco Rowe is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Statistics = Descriptive Statistics + Inferential Statistics
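As a rough illustration of the two halves of this definition, the sketch below uses the mtcars dataset that ships with R. Treating its fuel-economy column as a sample from a wider population is an assumption made purely for illustration. Descriptive statistics summarise the data we have; inferential statistics, here a one-sample t-test confidence interval via t.test(), use that sample to say something beyond it.

```r
# mpg is fuel economy in miles per gallon for the 32 cars in mtcars.
mpg <- mtcars$mpg

# Descriptive statistics: summarise the observed data.
descriptive <- c(mean = mean(mpg), sd = sd(mpg), median = median(mpg))
descriptive

# Inferential statistics: a 95% confidence interval for the population mean,
# assuming (for illustration only) that these cars are a random sample.
inference <- t.test(mpg, conf.level = 0.95)
inference$conf.int
```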

1 Introducing R

R is a freely available language and environment for statistical computing and graphics, providing a wide variety of statistical and graphical techniques. It is widely used in academia and industry. R offers a wider array of functionality than traditional statistics packages such as SPSS: it is composed of core (base) functionality and can be extended through packages hosted on CRAN. CRAN is a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R.
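A minimal sketch of how packages are typically installed and loaded. The package name ggplot2 is only an illustrative choice, and the install line is commented out so the example also runs without an internet connection.

```r
# Install a package from CRAN once per machine (needs internet access).
# "ggplot2" is just an example package name.
# install.packages("ggplot2")

# Load a package in each new R session with library().
# stats is part of base R, so it is always installed.
library(stats)

# search() lists the packages currently attached to the session.
search()
```

Installing happens once per machine, whereas library() has to be called in every new session that uses the package.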

Commands are sent to R using either the terminal / command line or the R Console, which is installed with R on Windows and OS X. On Linux there is no equivalent of the console; however, third-party solutions exist. On your own machine, R can be installed from CRAN.

RStudio is normally used to write and run R code. RStudio is an integrated development environment (IDE) for R and provides a more user-friendly front-end than the one that comes with R.

To run R or RStudio, double-click the R or RStudio icon. Throughout this course, we will be using RStudio:

Fig. 1. RStudio features.

If you would like to know more about the various features of RStudio, watch this video.

2 Setting the working directory

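Before we start any analysis, make sure to set the working directory, that is, the path to the folder where we are working. We can easily do that with setwd(). In the line below, replace the placeholder with the path to the folder where you have placed this file and where the data folder lives. Note that setwd() expects a folder, not a file.

# setwd('path/to/your/folder')  # placeholder: replace with your own folder
# setwd('.')                    # '.' refers to the current folder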

Note: It is good practice to avoid spaces when naming folders and files; use underscores or dots instead.

You can check your current working directory by typing:

getwd()
## [1] "/Users/franciscorowe/Dropbox/Francisco/Research/github_projects/sl/code"
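The sketch below shows setwd() and getwd() working together. It moves to R's per-session temporary folder only because that folder is guaranteed to exist on any machine; in practice you would use your own project folder. setwd() returns the previous directory, but here it is simply stored beforehand with getwd() so it can be restored.

```r
# Store the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Move to R's temporary folder (chosen only because it always exists).
setwd(tempdir())
tmp_wd <- getwd()
tmp_wd

# List the files in the new working directory.
list.files()

# Return to the original working directory.
setwd(old_wd)
```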

3 R as a calculator

3.1 The Console window

The Console window provides a means of entering commands for immediate execution.

To demonstrate, we will use the Console window to introduce the use of R as a simple calculator.

In the Console window, type a sum, for example 20 + 10, and hit Enter to see the result.

3.2 Mathematical operators

Some of the basic arithmetic operators and mathematical functions available in R are:

20 / 10
## [1] 2
20 * 10
## [1] 200
20 + 10
## [1] 30
20 - 10
## [1] 10
20^10
## [1] 1.024e+13
sqrt(20)
## [1] 4.472136
log(20)
## [1] 2.995732
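A few further operators and functions complete the basic toolkit. Note that log() computes the natural logarithm unless a base argument is given.

```r
# Integer division: how many whole times 3 fits into 20.
20 %/% 3
## [1] 6

# Modulo: the remainder of that division.
20 %% 3
## [1] 2

# Logarithms in other bases.
log10(100)
## [1] 2
log(8, base = 2)
## [1] 3

# exp() is the inverse of the natural logarithm.
exp(log(20))
## [1] 20
```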

3.3 Operator precedence

R uses operator precedence: some mathematical operations, such as multiplication, are carried out before lower-priority operations, such as addition. Use parentheses () around the operations you want R to perform first.

log(20+10*(4/2))
## [1] 3.688879
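The effect of precedence is easiest to see by comparing the same numbers with and without parentheses:

```r
# Multiplication is evaluated before addition...
20 + 10 * 2
## [1] 40

# ...unless parentheses change the order.
(20 + 10) * 2
## [1] 60

# Exponentiation binds more tightly than the unary minus.
-2^2
## [1] -4
(-2)^2
## [1] 4
```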

4 R Scripts and Notebooks

An R script is a series of commands that you can execute in one go, which saves time: you do not have to repeat the same steps every time you want to run the same process on a different dataset. An R script is just a plain text file containing R commands.
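To make the idea concrete, the sketch below writes a two-line script to a temporary file and runs it in one go with source(). The temporary file and the variable names are illustrative only; normally you would save the script in your project folder.

```r
# Write a tiny two-line script to a temporary .R file (illustration only).
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "mpg <- mtcars$mpg",
  "avg_mpg <- mean(mpg)"
), script_path)

# source() runs every command in the script, top to bottom.
source(script_path)
avg_mpg
```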

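To create an R script in RStudio, open a new script file via File > New File > R Script, write some code in it and run it. For example, typing the name of the built-in mtcars dataset and running that line prints the dataset: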

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
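Printing mtcars in full fills the screen. A few base R functions give a more compact first look at a dataset; a brief sketch:

```r
# Show only the first six rows.
head(mtcars)

# Number of rows (cars) and columns (variables).
dim(mtcars)
## [1] 32 11

# A compact overview of each variable and its type.
str(mtcars)
```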

An R Notebook is an R Markdown document with descriptive text and code chunks that can be executed independently and interactively, with output visible immediately beneath a code chunk; see Xie, Allaire, and Grolemund (2019).

Xie, Yihui, JJ Allaire, and Garrett Grolemund. 2019. R Markdown: The Definitive Guide. CRC Press, Taylor & Francis, Chapman & Hall Book. https://bookdown.org/yihui/rmarkdown/.

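To create an R Notebook in RStudio, go to File > New File > R Notebook. The new document starts with a YAML metadata header, as shown in Fig. 2.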

Fig. 2. YAML metadata for notebooks.

To insert a new code chunk, you can:

  1. use the Insert command on the editor toolbar;
  2. use the keyboard shortcut Ctrl + Alt + I (Windows/Linux) or Cmd + Option + I (Mac); or
  3. type the chunk delimiters ```{r} and ```.

In a code chunk you can write code that produces text output, tables and graphics. You can control these outputs via chunk options, which are provided inside the curly brackets, e.g.