## This is a vector, length == 1.
captain <- "John Smith"
## This is a vector, length == 2.
captain <- c("John", "Smith")
## This is a vector too, length == 5.
age <- c(22, 38, 26, 35, 35)
## What does c stand for?
?c
c() stands for "combine": it combines values into a vector.
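c() can also join whole vectors end to end, which is one way the ten-entry age vector used below could be built from two shorter pieces. This is a minimal sketch; the names first_five, last_five and mixed are made up for illustration.

```r
## Hypothetical pieces, named only for this example.
first_five <- c(22, 38, 26, 35, 35)
last_five <- c(NA, 54, 2, 27, 14)
## c() combines the two vectors into one, keeping their order.
age <- c(first_five, last_five)
length(age)
## [1] 10
## A vector holds a single type, so mixing a number with text turns both into text.
mixed <- c(22, "male")
class(mixed)
## [1] "character"
```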
## You can copy/paste this from the slide.
## These will be used again, don't delete.
age <- c(22,38,26,35,35,NA,54,2,27,14)
gender <- c("male","female","female","female","male",
"male","male","male","female","female")
## Check that you have them with this code:
length(age)
length(gender)
The answer to both length() commands should be 10.
If you didn't succeed with the last slide, SAY SOMETHING!
A vector is like a column in Excel
|   | A | B | C |
|---|---|---|---|
| 1 | age | | |
| 2 | 22 | | |
| 3 | 38 | | |
| 4 | 26 | | |
| 5 | 35 | | |
| 6 | 35 | | |
| 7 | NA | | |
| 8 | 54 | | |
| 9 | 2 | | |
| 10 | 27 | | |
| 11 | 14 | | |
Note: NA isn't a string; it is R's special marker for a missing value.
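Because NA is a special value rather than text, you test for it with is.na(), not with ==. A short sketch using the age vector from above:

```r
age <- c(22,38,26,35,35,NA,54,2,27,14)
## Comparing anything with NA gives NA, so == cannot find missing values.
NA == NA
## [1] NA
## is.na() returns TRUE for each missing entry; sum() counts the TRUEs.
sum(is.na(age))
## [1] 1
## The quoted text "NA" is an ordinary two-letter string, not a missing value.
is.na("NA")
## [1] FALSE
```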
## Per Item
age + 1
[1] 23 39 27 36 36 NA 55 3 28 15
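Per-item arithmetic works with any operator, and two vectors of the same length are combined position by position. A small sketch; age_in_months is a made-up name:

```r
age <- c(22,38,26,35,35,NA,54,2,27,14)
## Every entry is multiplied by 12; the NA stays NA.
age_in_months <- age * 12
age_in_months[1:3]
## [1] 264 456 312
## Same-length vectors are added entry by entry.
c(1, 2, 3) + c(10, 20, 30)
## [1] 11 22 33
```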
## In Total
table(gender)
gender
female male
5 5
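The result of table() can be stored and indexed by name, and prop.table() turns the counts into proportions. A minimal sketch; the name counts is illustrative:

```r
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")
counts <- table(gender)
## Pull out one count by its level name.
counts[["female"]]
## [1] 5
## Each count divided by the total (here 5 / 10 for both).
prop.table(counts)
```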
na.rm = TRUE is our friend
mean()
## This will not work.
mean(age)
[1] NA
## Yeah, that's not right.
## This works just fine.
mean(age, na.rm = TRUE)
[1] 28.11111
## That's more like it.
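The same na.rm = TRUE argument is accepted by other summary functions such as sum() and max(). Dropping the NA yourself with is.na() gives the same mean; a quick sketch:

```r
age <- c(22,38,26,35,35,NA,54,2,27,14)
sum(age, na.rm = TRUE)
## [1] 253
max(age, na.rm = TRUE)
## [1] 54
## !is.na(age) keeps only the entries that are not missing.
mean(age[!is.na(age)])
## [1] 28.11111
```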
(You might need a tugboat)
## Get help on any function by putting ? before its name (foo stands in for the name).
?foo
Note the lack of smoke coming from stack #4
mean()
: Average

sd()
: Standard Deviation

var()
: Variance

min()
: Smallest

max()
: Biggest

length()
: Number of items

table()
: Count of each distinct value

summary()
: You tell me

plot()
: You tell me

hist()
: You tell me

barplot()
: You tell me
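A few of these on the age vector. Like mean(), sd(), var() and min() need na.rm = TRUE here because of the NA; length() does not, and it counts the NA as an entry.

```r
age <- c(22,38,26,35,35,NA,54,2,27,14)
sd(age, na.rm = TRUE)
var(age, na.rm = TRUE)
min(age, na.rm = TRUE)
## [1] 2
length(age)
## [1] 10
## The standard deviation is the square root of the variance.
all.equal(sd(age, na.rm = TRUE)^2, var(age, na.rm = TRUE))
## [1] TRUE
```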
## Take a few minutes and spend some
## time using a few of these functions.
## Remember: use ? to get help on how to use a function.
## Use the square bracket operator to select which
## entries in a vector to return.
## Remember: There are 10 entries in age and gender.
## But maybe we only want one of them.
gender[10]
## Or maybe we only want some of them.
age[3:5]
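Square brackets also take a vector of positions, and a negative position drops an entry instead of selecting it. A short sketch:

```r
age <- c(22,38,26,35,35,NA,54,2,27,14)
## Positions 1, 3 and 10, in that order.
age[c(1, 3, 10)]
## [1] 22 26 14
## Everything except position 6 (the NA).
age[-6]
## [1] 22 38 26 35 35 54  2 27 14
```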
STOP ME IF THIS DOESN'T MAKE SENSE!
## Perhaps we only want to return the males.
gender[gender == "male"]
## More usefully - this works across vectors.
age[gender == "male"]
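Under the hood, gender == "male" is itself a logical vector with one TRUE or FALSE per entry, and that vector is what selects entries from age. A sketch that stores it under the made-up name is_male:

```r
age <- c(22,38,26,35,35,NA,54,2,27,14)
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")
is_male <- gender == "male"
## which() lists the positions that are TRUE.
which(is_male)
## [1] 1 5 6 7 8
## The same logical vector picks those positions out of age.
age[is_male]
## [1] 22 35 NA 54  2
```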
In this case, these vectors are in the same order, so entry 1 of age and entry 1 of gender describe the same person:
age | gender |
---|---|
22 | male |
38 | female |
26 | female |
35 | female |
35 | male |
NA | male |
54 | male |
2 | male |
27 | female |
14 | female |
## Make a new variable
age_of_men <- age[gender == "male"]
## Or use it in a function
mean(age[gender == "female"])
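To get the mean for every group at once, base R's tapply() splits one vector by another and applies a function to each piece; extra arguments such as na.rm = TRUE are passed on to that function. A minimal sketch; tapply() and the name group_means are not part of the original slides:

```r
age <- c(22,38,26,35,35,NA,54,2,27,14)
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")
## One mean per gender; na.rm = TRUE is handed on to mean().
group_means <- tapply(age, gender, mean, na.rm = TRUE)
group_means[["female"]]
## [1] 28
group_means[["male"]]
## [1] 28.25
```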
## Using the following logical (TRUE/FALSE) vector
survived <- c(FALSE,TRUE,TRUE,TRUE,FALSE,
FALSE,FALSE,FALSE,TRUE,TRUE)
## How many men survived the sinking of the Titanic?
## How many women?
Answer on the next slide!
## How many men survived the sinking of the Titanic?
survived_men <- survived[gender=="male"]
table(survived_men)
## Extra credit for anyone who did it this way.
sum(survived_men)
## Try this to see why sum() works: TRUE counts as 1 and FALSE as 0.
survived_men == 1
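The same steps answer the question about the women, and table() given two vectors counts every combination at once. A sketch; survived_women is a made-up name:

```r
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")
survived <- c(FALSE,TRUE,TRUE,TRUE,FALSE,
              FALSE,FALSE,FALSE,TRUE,TRUE)
survived_women <- survived[gender == "female"]
sum(survived_women)
## [1] 5
## Rows are gender, columns are FALSE/TRUE for survived.
table(gender, survived)
```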
Name | Age | Gender |
---|---|---|
Mr. Owen Braund | 22 | male |
Mrs. Florence Briggs Thayer | 38 | female |
Miss. Laina Heikkinen | 26 | female |
Mrs. Lily May Peel | 35 | female |
Mr. William Allen | 35 | male |
Mr. James Moran | NA | male |
Mr. Timothy McCarthy | 54 | male |
Master. Gosta Palsson | 2 | male |
Mrs. Elisabeth Vilhelmina Berg | 27 | female |
Mrs. Adele Achem | 14 | female |
t.test()
chisq.test()
t.test(age[gender=="male"], age[gender=="female"])
Welch Two Sample t-test
data: age[gender == "male"] and age[gender == "female"]
t = 0.021341, df = 3.8789, p-value = 0.984
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-32.67896 33.17896
sample estimates:
mean of x mean of y
28.25 28.00
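chisq.test() is listed above but not shown. Here is a hedged sketch of calling it on the gender-by-survival counts. With only ten passengers the expected counts are below 5, so R warns that the chi-squared approximation may be incorrect; read the result as a demonstration of the call, not a finding. For a 2x2 table R applies Yates' continuity correction by default.

```r
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")
survived <- c(FALSE,TRUE,TRUE,TRUE,FALSE,
              FALSE,FALSE,FALSE,TRUE,TRUE)
## Cross-tabulate, then test whether survival is independent of gender.
result <- chisq.test(table(gender, survived))
## The returned object stores the pieces of the printed report.
result$statistic
result$p.value
```

fisher.test() is the usual alternative when counts are this small.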
gender
[1] "male" "female" "female" "female" "male" "male" "male" "male"
[9] "female" "female"
as.factor(gender)
[1] male female female female male male male male female female
Levels: female male
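A factor stores each entry as a number that points into its list of levels. By default the levels are sorted alphabetically; factor() lets you set the order yourself, which changes the order in which table() reports them. A short sketch; gender_f and gender_f2 are made-up names:

```r
gender <- c("male","female","female","female","male",
            "male","male","male","female","female")
gender_f <- as.factor(gender)
levels(gender_f)
## [1] "female" "male"
## The first three entries (male, female, female) as level numbers.
as.integer(gender_f)[1:3]
## [1] 2 1 1
## Choose the level order explicitly.
gender_f2 <- factor(gender, levels = c("male", "female"))
levels(gender_f2)
table(gender_f2)
```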
Titanic in Cobh Harbour, County Cork, Ireland