Why R

Presented by: Andy Choens, MSW

URLs To Remember:

  1. http://choens.github.io/why-r
  2. https://github.com/choens/why-r/tree/gh-pages
  3. http://cran.r-project.org/

Bad Reasons To Learn / Use R


  • Some guy with a beard likes it.
  • Someone on the Internet said it was great.
  • SAS isn't doing what you want it to. (Operator error?)

Good Reasons to Learn / Use R


  • Ecosystem
  • Literate Programming
  • Open Source / Open Science
  • Job Growth
  • Everyone Writes About R

Ecosystem

Packages

The capabilities of R are extended through user-created packages, which allow specialized statistical techniques, graphical devices (ggplot2), import/export capabilities, reporting tools (knitr, Sweave), etc. These packages are developed primarily in R, and sometimes in Java, C, C++ and Fortran.
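A minimal sketch of how packages are usually pulled in from CRAN, assuming a working internet connection for anything not already installed. The helper name `ensure_package` and the choice of mirror are illustrative assumptions, not part of the original talk.

```r
# Hypothetical helper: install a package from CRAN only if it is missing,
# then report whether its namespace can be loaded.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call never touches the network.
ensure_package("stats")

# Packages named in this talk; uncomment to install any that are missing.
# for (pkg in c("ggplot2", "knitr", "dplyr")) ensure_package(pkg)
```

Once a package is installed, `library(ggplot2)` attaches it for the current session.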

Popular Package Places

To Infinity & Beyond!

Source: r4stats.com/articles/popularity/

Packages Examples

dplyr

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

ggplot2

An implementation of the grammar of graphics in R. It combines the advantages of both base and lattice graphics: conditioning and shared axes are handled automatically, and you can still build up a plot step by step from multiple data sources. It also implements a sophisticated multidimensional conditioning system and a consistent interface to map data to aesthetic attributes.

sqlutils

This package provides utilities for working with a library of SQL files.
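To give the dplyr description some shape, here is a small sketch that summarises the built-in mtcars data by cylinder count. It assumes dplyr is installed; the column names `cars` and `mean_mpg` are made up for this example.

```r
library(dplyr)

data(mtcars)

# Average fuel economy and number of cars for each cylinder class,
# sorted from most to least economical.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    cars     = n(),
    mean_mpg = mean(mpg)
  ) %>%
  arrange(desc(mean_mpg))

print(by_cyl)
```

Each verb takes a data frame and returns a data frame, which is what makes the pipeline read top to bottom.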

Package Craziness

kobe-package

The tuna Regional Fisheries Management Organisations (tRFMOs) use a common framework for providing scientific advice, i.e. the Kobe II Framework. This is based on maintaining fishing mortality below FMSY and stock biomass above BMSY. This package provides methods for summarising results from stock assessments and Management Strategy Evaluations in the Kobe format.

ChIPpeakAnno

The package includes functions to retrieve the sequences around the peak, obtain enriched Gene Ontology (GO) terms, find the nearest gene, exon, miRNA or custom features such as most conserved elements and other transcription factor binding sites supplied by users. Starting with 2.0.5, new functions have been added for finding the peaks with bi-directional promoters with summary statistics (peaksNearBDP), for summarizing the occurrence of motifs in peaks (summarizePatternInPeaks) and for adding other IDs to annotated peaks or enrichedGO (addGeneIDs). This package leverages the biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest and stat packages.

Commercial Support

Programming Languages

  • C
  • Java
  • JMP
  • Mathematica
  • MATLAB
  • Python
  • SAS
  • SPSS
  • Statistica
  • Tableau

Database Vendors

  • Hadoop
  • Oracle
  • PostgreSQL
  • Vertica

Business Intelligence

  • Alteryx
  • Jaspersoft
  • Oracle Business Intelligence Enterprise Edition
  • Pentaho
  • SAP (and SAP HANA)

Interface Choices

Interface - RStudio

Interface - Emacs / ESS

Interface - Eclipse / StatET

Interface - JASP

Interface - Custom Dashboard

Literate Programming

MTCARS

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973 - 1974 models).

MTCARS - Just A Glimpse

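library(knitr)
data(mtcars)
kable(head(mtcars))

|                   | mpg  | cyl | disp | hp  | drat | wt    | qsec  | vs | am | gear | carb |
|:------------------|-----:|----:|-----:|----:|-----:|------:|------:|---:|---:|-----:|-----:|
| Mazda RX4         | 21.0 | 6   | 160  | 110 | 3.90 | 2.620 | 16.46 | 0  | 1  | 4    | 4    |
| Mazda RX4 Wag     | 21.0 | 6   | 160  | 110 | 3.90 | 2.875 | 17.02 | 0  | 1  | 4    | 4    |
| Datsun 710        | 22.8 | 4   | 108  | 93  | 3.85 | 2.320 | 18.61 | 1  | 1  | 4    | 1    |
| Hornet 4 Drive    | 21.4 | 6   | 258  | 110 | 3.08 | 3.215 | 19.44 | 1  | 0  | 3    | 1    |
| Hornet Sportabout | 18.7 | 8   | 360  | 175 | 3.15 | 3.440 | 17.02 | 0  | 0  | 3    | 2    |
| Valiant           | 18.1 | 6   | 225  | 105 | 2.76 | 3.460 | 20.22 | 1  | 0  | 3    | 1    |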

Source: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html
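The description of the dataset can be checked directly from the R console. This is a small base-R sketch that needs no extra packages; which columns to summarise is an arbitrary choice made for this example.

```r
data(mtcars)

# Rows are cars, columns are variables.
print(dim(mtcars))

# Column names: mpg plus the ten design and performance variables.
print(names(mtcars))

# Summary statistics for fuel economy and engine displacement.
print(summary(mtcars[, c("mpg", "disp")]))
```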

R Markdown

---
title: Minimal Markdown Example
author: Andy Choens, MSW
---

This is a simple, minimal example of a report written in R Markdown. The purpose of this report is twofold:

1. To enlighten you about the relationship between engine displacement and gas mileage.
2. To provide an accessible introduction to Literate Programming.

The source of the data is the 1974 Motor Trend US magazine.

```{r echo=FALSE}

## Init R and Config
library(ggplot2)
data(mtcars)

```

Insert a Graph

# Scatterplot

Displacement is the independent variable. Gas mileage is the dependent variable. Correlation does not imply causation, except for this time when it does.

```{r echo=FALSE}

ggplot(data = mtcars, aes(x=disp, y=mpg)) + geom_point(shape=1) + geom_smooth(method=lm)

```
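The line drawn by `geom_smooth(method=lm)` in the chunk above is an ordinary linear fit of mpg on disp. As a rough sketch, the same relationship can be put into numbers with base R; the variable names `fit` and `r` are chosen only for this example.

```r
data(mtcars)

# Fit the straight line that geom_smooth(method = lm) draws:
# mpg as a linear function of displacement.
fit <- lm(mpg ~ disp, data = mtcars)

# Intercept and slope of the fitted line.
print(coef(fit))

# Pearson correlation between displacement and gas mileage.
r <- cor(mtcars$disp, mtcars$mpg)
print(r)

# Share of the variance in mpg explained by displacement.
print(summary(fit)$r.squared)
```

A negative slope means gas mileage falls as displacement grows, which is the pattern the scatterplot shows.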

Transparent, Reproducible

The Return of SHINY

Open Source / Open Science

Free as in freedom, not beer.

Open Source Is Everywhere:

FOSS:

Firefox, Adium, Java

Based On:

Chrome, Safari, Android

Even @ DOH:

FOSS:

FileZilla, SQL Workbench/J, 7-Zip

Based On:

Vertica, SAS

Why?

The Four Freedoms

  1. The freedom to run the program for any purpose.
  2. The freedom to study how the program works, and change it to make it do what you wish.
  3. The freedom to redistribute copies so you can help your neighbor.
  4. The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits.

Real: Code, Science, People

This ain't Fishy Science

Open Data or Open Science?

  • Hint: The presenter wants the latter.
  • data.ny.gov
  • Open Data is more powerful when it comes with open, reproducible, transparent analysis
  • Added benefit: Open science comes with no additional privacy concerns.

Job Growth

Jobs


Who Wants A Job?


Comparative Number of Jobs

Growth over Time

Indeed.com Query

R !"R D" !"A R" !"H R" !"R N" !toys !kids !" R Walgreen" !walmart !"HVAC R" !"R Bard"
and ( "biostatistics" or "data analysis" or "data analyst" or "epidemiologist"
or "healthcare analysis" or "healthcare analyst"
or "statistical"
)
,SAS !"system administrator" !"school age" !sata !firmware !scsi !raid !samsung !scandinavian !sonar !nurse
and ( "biostatistics" or "data analysis" or "data analyst" or "epidemiologist"
or "healthcare analysis" or "healthcare analyst"
or "statistical"
)

Everyone Writes About R

Lots of Articles About . . . . R!

  • New York Times: Data Analysts Captivated by R's Power (January 6, 2009)
  • New York Times: R You Ready For R? (January 8, 2009)
  • Fast Company: Why The R Programming Language Is Good For Business (May 5, 2014)
  • Fast Company: The 9 Best Programming Languages For Crunching Data (May 5, 2014)
  • Fast Company: Where Data And Creativity Meet: Confessions Of A Quant, Madison Avenue's "Hitman" (~2 years ago)
  • Revolution Analytics: R Is Still Hot and Getting Hotter (January 14, 2015)

Questions