. . . We have much to discuss.
data()
Anscombe's Quartet is built into R
data(anscombe)
anscombe
x1 x2 x3 x4 y1 y2 y3 y4
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
3 13 13 13 8 7.58 8.74 12.74 7.71
4 9 9 9 8 8.81 8.77 7.11 8.84
5 11 11 11 8 8.33 9.26 7.81 8.47
6 14 14 14 8 9.96 8.10 8.84 7.04
7 6 6 6 8 7.24 6.13 6.08 5.25
8 4 4 4 19 4.26 3.10 5.39 12.50
9 12 12 12 8 10.84 9.13 8.15 5.56
10 7 7 7 8 4.82 7.26 6.42 7.91
11 5 5 5 8 5.68 4.74 5.73 6.89
attributes(anscombe)
summary(anscombe)
str(anscombe)
View(anscombe)
head(anscombe)
tail(anscombe)
Set | Independent Variable | Dependent Variable |
---|---|---|
Set 1 | x1 | y1 |
Set 2 | x2 | y2 |
Set 3 | x3 | y3 |
Set 4 | x4 | y4 |
Anscombe's Quartet is a synthetic data set. The usual distinction between rows (observations) and columns (variables) in a data frame does not really apply here. There is no meaningful relationship between x1 and x3, for instance; each x column pairs only with its matching y column. To use the data, we need to access individual columns.
Question: How can we access all the values in a given column?
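One possible answer, sketched with base R only. The variable names `by_dollar`, `by_double`, and `by_matrix` are made up for this illustration.

```r
data(anscombe)

# Three equivalent ways to pull one column out as a plain numeric vector
by_dollar <- anscombe$x1        # dollar-sign access by name
by_double <- anscombe[["x1"]]   # double-bracket access by name
by_matrix <- anscombe[, "x1"]   # matrix-style access: all rows, one column

# All three hold the same 11 values
identical(by_dollar, by_double) && identical(by_double, by_matrix)

# Single brackets with one index keep the data frame structure instead
class(anscombe["x1"])
```

The `$` form is the one used throughout the rest of this walkthrough.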
colMeans(anscombe)
x1 x2 x3 x4 y1 y2 y3 y4
9.000000 9.000000 9.000000 9.000000 7.500909 7.500909 7.500000 7.500909
cbind(x1 = sd(anscombe$x1),
x2 = sd(anscombe$x2),
x3 = sd(anscombe$x3),
x4 = sd(anscombe$x4)
)
x1 x2 x3 x4
[1,] 3.316625 3.316625 3.316625 3.316625
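Typing out one `sd()` call per column works, but a data frame is also a list of columns, so `sapply()` can apply the same function to every column at once. This is a small sketch of that alternative; `sds` is just an illustrative name.

```r
data(anscombe)

# Apply sd() to each column of the data frame; the result is a named vector
sds <- sapply(anscombe, sd)

# Show the standard deviations for all eight columns, x and y alike
round(sds, 6)
```

Unlike `colMeans()`, base R has no built-in `colSds()`, which is why the `sapply()` idiom is common here.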
cor.test(x=anscombe$x1, y=anscombe$y1)
Pearson's product-moment correlation
data: anscombe$x1 and anscombe$y1
t = 4.2415, df = 9, p-value = 0.00217
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4243912 0.9506933
sample estimates:
cor
0.8164205
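The test above covers Set 1 only. As a rough sketch, the same correlation can be computed for all four x/y pairs by building the column names from the set number. `cor()` returns just the estimate, without the test statistics that `cor.test()` reports, and `cors` is an illustrative name.

```r
data(anscombe)

# For each set i, correlate column x<i> with column y<i>
cors <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
names(cors) <- paste0("set", 1:4)

# The four estimates agree to about three decimal places
round(cors, 3)
```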
plot(anscombe$x1, anscombe$y1, main="Anscombe: Set 1",
xlab="x1", ylab="y1"
)
?cor.test
m1 <- lm(formula=y1~x1, data=anscombe)
m1
Call:
lm(formula = y1 ~ x1, data = anscombe)
Coefficients:
(Intercept) x1
3.0001 0.5001
?formula
attributes(m1)
summary(m1)
str(m1)
View(m1)
head(m1)
tail(m1)
summary(m1)
Call:
lm(formula = y1 ~ x1, data = anscombe)
Residuals:
Min 1Q Median 3Q Max
-1.92127 -0.45577 -0.04136 0.70941 1.83882
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.0001 1.1247 2.667 0.02573 *
x1 0.5001 0.1179 4.241 0.00217 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.237 on 9 degrees of freedom
Multiple R-squared: 0.6665, Adjusted R-squared: 0.6295
F-statistic: 17.99 on 1 and 9 DF, p-value: 0.00217
attributes(m1)
$names
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
$class
[1] "lm"
attributes(summary(m1))
$names
[1] "call" "terms" "residuals" "coefficients"
[5] "aliased" "sigma" "df" "r.squared"
[9] "adj.r.squared" "fstatistic" "cov.unscaled"
$class
[1] "summary.lm"
m1$coefficients
(Intercept) x1
3.0000909 0.5000909
summary(m1)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.0000909 1.1247468 2.667348 0.025734051
x1 0.5000909 0.1179055 4.241455 0.002169629
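Because the summary object is a list, individual numbers from the printed report can be pulled out by name rather than copied by hand. A minimal sketch, assuming the model is fitted with `data = anscombe` so that the slope row is named `x1`; `s1` and `slope_p` are illustrative names.

```r
data(anscombe)
m1 <- lm(y1 ~ x1, data = anscombe)
s1 <- summary(m1)

# Components of the summary, matching lines of the printed report
s1$r.squared   # "Multiple R-squared"
s1$sigma       # "Residual standard error"

# The coefficient table is a matrix, indexed by row and column name
slope_p <- s1$coefficients["x1", "Pr(>|t|)"]
slope_p
```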
## Same scatterplot, adds the linear model
plot(anscombe$x1, anscombe$y1, main="Anscombe: Set 1 w/ Model in Red", xlab="x1", ylab="y1")
abline(m1, col="red")
png("anscombe-1.png")
plot(anscombe$x1, anscombe$y1, main="Anscombe: Set 1 w/ Model in Red", xlab="x1", ylab="y1")
abline(m1, col="red")
dev.off()
qqplot(anscombe$x1,anscombe$y1)
abline(m1, col="red")
plot(m1)
m1
Call:
lm(formula = y1 ~ x1, data = anscombe)
Coefficients:
(Intercept) x1
3.0001 0.5001
8.0011 = 10 * .5001 + 3.0001
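The hand calculation above uses the rounded coefficients. As a quick check, the same prediction can be made from the stored coefficients and compared with `predict()`. The names `by_hand` and `by_predict` are illustrative.

```r
data(anscombe)
m1 <- lm(y1 ~ x1, data = anscombe)

# Intercept plus slope times x, using the unrounded coefficients
by_hand <- unname(coef(m1)["(Intercept)"] + coef(m1)["x1"] * 10)

# The same prediction via predict(); newdata must have a column named x1
by_predict <- unname(predict(m1, newdata = data.frame(x1 = 10)))

# Both routes give the same value, close to the rounded 8.0011
all.equal(by_hand, by_predict)
by_hand
```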
p1 <- data.frame(x1=anscombe$x1+30, y1=NA)
p1$y1 <- predict(object=m1, newdata=p1)
plot(rbind(anscombe[,c(1,5)], p1))
abline(m1, col="red")
mGood <- lm(formula=y1~x1, data=anscombe)
mBad <- lm(formula=anscombe$y1~anscombe$x1)
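Both calls fit the same line, so the difference only shows up later. The sketch below illustrates one practical consequence: `predict()` matches `newdata` columns against the variable names in the formula, and the name `anscombe$x1` never matches a column called `x1`. The data frame `new_x` is made up for this example, and `suppressWarnings()` hides the warning R would otherwise print for the second model.

```r
data(anscombe)
mGood <- lm(formula = y1 ~ x1, data = anscombe)
mBad  <- lm(formula = anscombe$y1 ~ anscombe$x1)

# The fitted numbers agree, but the coefficient names differ
names(coef(mGood))
names(coef(mBad))

# Three hypothetical new x values to predict for
new_x <- data.frame(x1 = c(20, 25, 30))

good_pred <- predict(mGood, newdata = new_x)
bad_pred  <- suppressWarnings(predict(mBad, newdata = new_x))

length(good_pred)  # 3: one prediction per new row
length(bad_pred)   # 11: newdata was ignored, original x values reused
```

This is also why the coefficient is labelled `anscombe$x1` rather than `x1` when a model is fitted the second way.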
Titanic in Cobh Harbour, County Cork, Ireland