Vectors

  • Run code, learn more
  • Duration: ~45 Minutes
  • Followed by: A Break!
Photo of the Titanic

Learning Objectives

  • What Is A Vector?
  • How To Make One
  • How To Delete One
  • Vector Indexing (How To Subset)
  • Work With Two Vectors
  • Apply Functions To A Vector

What is a vector?

  • Congrats, you have already worked with vectors!
    • (We forgot to tell you)
  • Understanding vectors is CRITICAL to becoming a useR

## This is a vector, length == 1.
captain <- "John Smith"

## This is a vector, length == 2.
captain <- c("John", "Smith")

## This is a vector too, length == 5.
age <- c(22, 38, 26, 35, 35)

## What does c stand for?
?c
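You can check the comments above by asking R directly. This is a minimal sketch using only base R: `length()` counts the elements and `class()` reports the type.

```r
## A single string is still a vector, just of length 1.
captain <- "John Smith"
length(captain)   # 1

## c() combines two strings into a length-2 vector.
captain <- c("John", "Smith")
length(captain)   # 2

## Numbers combined with c() give a numeric vector.
age <- c(22, 38, 26, 35, 35)
length(age)       # 5
class(age)        # "numeric"
```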
    

Combine

  • Use the function c() to combine things
  • Works for numerics, characters, booleans, dates, etc.
  • All arguments are coerced to a common type
  • Each vector is an object
    • (Everything is an object in R)
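The "coerced to a common type" rule is easiest to see by mixing types on purpose. This is a short sketch; `mixed`, `nums`, and `dates` are just illustrative names.

```r
## Mixing a number, a string and a boolean: everything becomes character.
mixed <- c(1, "two", TRUE)
class(mixed)    # "character"
mixed           # "1" "two" "TRUE"

## Mixing numbers and booleans: TRUE/FALSE become 1/0.
nums <- c(1.5, TRUE, FALSE)
nums            # 1.5 1.0 0.0

## Dates combine too, as long as every element is a Date.
dates <- c(as.Date("1912-04-10"), as.Date("1912-04-15"))
class(dates)    # "Date"
```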

Photo of Captain Smith
Your Turn!


## You can copy/paste this from the slide.
## These will be used again, don't delete.
age <- c(22,38,26,35,35,NA,54,2,27,14)
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")

## Check that you have them with this code:
length(age)
length(gender)
    

The answer to both length() commands should be 10

STOP!

If you didn't succeed with the last slide, SAY SOMETHING!

Visualize A Vector

A vector is like a column in Excel

     A     B     C
 1   age
 2   22
 3   38
 4   26
 5   35
 6   35
 7   NA
 8   54
 9   2
10   27
11   14

Note: NA isn't the string "NA"; it is R's marker for a missing value
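One way to see the difference is `is.na()`, which flags missing values but treats the text "NA" as an ordinary string. A minimal sketch with the same `age` vector:

```r
age <- c(22, 38, 26, 35, 35, NA, 54, 2, 27, 14)

## is.na() flags the missing value (position 6).
is.na(age)
which(is.na(age))   # 6

## The string "NA" is ordinary text, not a missing value.
is.na("NA")         # FALSE
is.na(NA)           # TRUE
```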

Apply A Function

  • Some functions act on each element of the vector (per item)
  • Some functions act on the vector as a whole (in total)

## Per Item
age + 1
            

[1] 23 39 27 36 36 NA 55  3 28 15
            

## In Total
table(gender)
            

gender
female   male 
     5      5 
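A couple more examples of each kind, as a sketch: per-item functions return one value per element (10 in, 10 out), while whole-vector functions return a summary. `age_months` and `gender_upper` are illustrative names.

```r
age <- c(22, 38, 26, 35, 35, NA, 54, 2, 27, 14)
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")

## Per item: one result for each element; the NA stays NA.
age_months <- age * 12
length(age_months)     # 10
gender_upper <- toupper(gender)
gender_upper           # "MALE" "FEMALE" "FEMALE" ...

## In total: one answer about the whole vector.
length(gender)         # 10
unique(gender)         # "male" "female"
```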
            

NA

  • NA is the bane of our existence
  • na.rm = TRUE is our friend
  • Example: mean()

Bad

## This runs, but the result is NA.
mean(age)
            

[1] NA

## Yeah, that's not right.
            
Good

## This works just fine.
mean(age, na.rm = TRUE)
            

[1] 28.11111

## That's more like it.
            

HELP!

(You might need a tugboat)

Note the lack of smoke coming from stack #4

Some Functions

Mathematical
  • mean(): Average
  • sd(): Standard Deviation
  • var(): Variance
  • min(): Smallest
  • max(): Biggest
Useful
  • length(): # of items
  • table(): count of each distinct value
  • summary(): You tell me
  • plot(): You tell me
  • hist(): You tell me
  • barplot(): You tell me
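If you want a starting point for the exercise, here is one possible warm-up on the same vectors. It is only a sketch; it leaves the `summary()` output and the plots for you to read. `age_sd` and `age_var` are illustrative names.

```r
age <- c(22, 38, 26, 35, 35, NA, 54, 2, 27, 14)
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")

## Spread: sd() is the square root of var().
age_sd  <- sd(age, na.rm = TRUE)
age_var <- var(age, na.rm = TRUE)
all.equal(age_sd^2, age_var)   # TRUE

## Smallest and biggest ages.
min(age, na.rm = TRUE)         # 2
max(age, na.rm = TRUE)         # 54

## What do these show you?
summary(age)
hist(age)
barplot(table(gender))
```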

Photo of Captain Smith
Your Turn!


## Take a few minutes to try out
## a few of these functions.
## Remember: use ? to get help with any function (e.g. ?sd).
    

Index[1]

  • AKA Subsetting / Filtering
  • Let's do this together

## Use the square bracket operator to select which
## entries in a vector to return.
## Remember: There are 10 entries in age and gender.

## But maybe we only want one of them.
gender[10]

## Or maybe we only want some of them.
age[3:5]
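The square brackets accept more than a single number or a range. A sketch of three other forms: a vector of positions, a negative index (which drops a position, one way to remove an element), and an index past the end.

```r
age <- c(22, 38, 26, 35, 35, NA, 54, 2, 27, 14)
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")

## Pick specific positions with c().
age[c(1, 10)]        # 22 14

## A negative index drops that position instead.
age[-6]              # every age except the NA in position 6
length(age[-6])      # 9

## Asking past the end returns NA rather than an error.
gender[11]           # NA
```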
    

Index[2]

STOP ME IF THIS DOESN'T MAKE SENSE!


## Perhaps we only want to return the males.
gender[gender == "male"]

## More usefully, this works across vectors: the ages of the males.
age[gender == "male"]
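It helps to look at the condition on its own. `gender == "male"` is a logical vector with one TRUE/FALSE per passenger, and the brackets keep the positions that are TRUE. A sketch; `is_male` is an illustrative name.

```r
age <- c(22, 38, 26, 35, 35, NA, 54, 2, 27, 14)
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")

is_male <- gender == "male"
is_male          # TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE

## Keep the ages where is_male is TRUE.
age[is_male]     # 22 35 NA 54 2
```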
    

Visualize 2 Vectors

In this case, the two vectors are in the same order: the first age belongs to the first gender, the second to the second, and so on . . .

age   gender
22    male
38    female
26    female
35    female
35    male
NA    male
54    male
2     male
27    female
14    female

Index[3]


## Make a new variable
age_of_men <- age[gender == "male"]

## Or use it in a function
mean(age[gender == "female"])
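The same call on the men needs `na.rm = TRUE`, because one male passenger's age is missing. A short sketch:

```r
age <- c(22, 38, 26, 35, 35, NA, 54, 2, 27, 14)
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")

age_of_men <- age[gender == "male"]
mean(age_of_men)                   # NA
mean(age_of_men, na.rm = TRUE)     # 28.25
mean(age[gender == "female"])      # 28
```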
    

Photo of Captain Smith
Your Turn!


## Using the following logical (Boolean) vector:
survived <- c(FALSE,TRUE,TRUE,TRUE,FALSE,
              FALSE,FALSE,FALSE,TRUE,TRUE)

## How many men survived the sinking of the Titanic?
## How many women?
    

Answer on the next slide!

Photo of Captain Smith
Your Turn!


## How many men survived the sinking of the Titanic?
survived_men <- survived[gender=="male"]
table(survived_men)

## Extra credit for anyone who did it this way.
sum(survived_men)


## Try this to see why sum() works: TRUE counts as 1, FALSE as 0.
survived_men == 1
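The women follow the same pattern. This sketch also shows the TRUE/FALSE to 1/0 conversion that makes `sum()` count survivors. `survived_women` is an illustrative name.

```r
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")
survived <- c(FALSE,TRUE,TRUE,TRUE,FALSE,
              FALSE,FALSE,FALSE,TRUE,TRUE)

## How many women survived?
survived_women <- survived[gender == "female"]
sum(survived_women)          # 5

## TRUE is stored as 1 and FALSE as 0.
as.integer(c(TRUE, FALSE))   # 1 0
```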
    

Actual Passengers

Name                              Age   Gender
Mr. Owen Braund                   22    male
Mrs. Florence Briggs Thayer       38    female
Miss. Laina Heikkinen             26    female
Mrs. Lily May Peel                35    female
Mr. William Allen                 35    male
Mr. James Moran                   NA    male
Mr. Timothy McCarthy              54    male
Master. Gosta Palsson             2     male
Mrs. Elisabeth Vilhelmina Berg    27    female
Mrs. Adele Achem                  14    female
Photo of the Grand Staircase

Statistical Testing

  • Statistical tests are functions in R:
    • T-test: t.test() (Welch's by default; var.equal = TRUE gives Student's)
    • Chi-squared test: chisq.test()
  • You already know everything you need to run them

Statistical Testing


t.test(age[gender=="male"], age[gender=="female"])
    

Welch Two Sample t-test

data:  age[gender == "male"] and age[gender == "female"]
t = 0.021341, df = 3.8789, p-value = 0.984
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -32.67896  33.17896
sample estimates:
mean of x mean of y 
    28.25     28.00 
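The output above is Welch's test, which is what `t.test()` runs by default. The sketch below shows the classic Student's version (`var.equal = TRUE`) and a chi-squared test of gender against survival. With only 10 passengers the expected counts are below 5, so R will warn that the chi-squared approximation may be incorrect. `student_t`, `surv_table`, and `chi_sq` are illustrative names.

```r
age <- c(22, 38, 26, 35, 35, NA, 54, 2, 27, 14)
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")
survived <- c(FALSE,TRUE,TRUE,TRUE,FALSE,
              FALSE,FALSE,FALSE,TRUE,TRUE)

## Student's t-test: assumes equal variances in the two groups.
student_t <- t.test(age[gender == "male"], age[gender == "female"],
                    var.equal = TRUE)
student_t

## Chi-squared test on a 2x2 table of gender by survival.
surv_table <- table(gender, survived)
surv_table
chi_sq <- chisq.test(surv_table)
chi_sq
```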
    

One More Data Type

  • Factor
  • Superficially similar to a character string
  • Stored as integer codes plus a set of levels, which can save memory
  • Explicitly marks a variable as categorical

Make A Factor


gender
[1] "male"   "female" "female" "female" "male"   "male"   "male"   "male"  
 [9] "female" "female"
        

as.factor(gender)
 [1] male   female female female male   male   male   male   female female
Levels: female male
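A few base functions let you look inside a factor. In this sketch, `gender_f` and `gender_f2` are illustrative names; note that `as.factor()` sorts the levels alphabetically, and `factor(..., levels = )` lets you choose the order.

```r
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")

gender_f <- as.factor(gender)
levels(gender_f)       # "female" "male"
nlevels(gender_f)      # 2

## Under the hood each entry is an integer code pointing at a level.
as.integer(gender_f)   # 2 1 1 1 2 2 2 2 1 1

## Choose the level order yourself.
gender_f2 <- factor(gender, levels = c("male", "female"))
levels(gender_f2)      # "male" "female"
```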
        

Take A Break!

Titanic in Cobh Harbour, County Cork, Ireland
