Learning Objectives
- What is it?
- How to download?
- How to open?
- Structure:
  - Data Management
  - Data Analysis
What is it?
- Take Five: a nightly game from the New York State Lottery.
- Each night, five numbers are chosen "at random".
- Each number is between 1 and 39.
- The winning numbers for every draw since 1992 are available online at the New York State Open Data site.
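What does "at random" mean here? A minimal sketch of one night's draw under a fair game, assuming the five numbers are drawn uniformly from 1 to 39 with no repeats within a draw. That no-repeat rule is our assumption about the game, not something stated on these slides.

```r
# Sketch of one night's draw under a fair game.
# ASSUMPTION: five distinct numbers, each from 1 to 39, all equally likely.
set.seed(42)  # make the example reproducible
one_draw <- sort(sample(1:39, size = 5, replace = FALSE))
print(one_draw)
```

The chi-squared test later in this deck asks whether the real draws look like they came from a process like this one.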
Objectives
- Download the data from Open Data.
- Use the chi-squared test to check whether the winning numbers are evenly distributed across 1 to 39.
- With over 36,000 winning numbers drawn since 1992, the test has very high power: it can detect even small departures from an even distribution.
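For reference, here is the chi-squared goodness-of-fit statistic, assuming every number from 1 to 39 should be equally likely. O_i is how many times number i was drawn, E_i is its expected count, and N is the total number of winning numbers; these symbols are ours, not from the original slides.

```latex
\chi^2 = \sum_{i=1}^{39} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = \frac{N}{39},
\qquad \text{df} = 39 - 1 = 38
```

With N = 36,000, each number's expected count is 36,000 / 39 ≈ 923, far above the usual rule of thumb of at least 5 per cell. One caveat: the five numbers in a single draw cannot repeat, so the counts are not exactly multinomial. Their covariances shrink by a factor of (39 - 5) / (39 - 1) = 34/38, so under a fair game the statistic behaves roughly like (34/38) times a chi-squared with 38 df, which makes the standard test slightly conservative.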
Project Code
- Download it to your local computer and unzip it on your Desktop.
New Project
Let's make a new project!
- File -> New Project
- Three options:
  - New Directory
  - Existing Directory
  - Version Control
- Select the second option, Existing Directory.
- Select the folder you just unzipped on your Desktop.
- Let us know now if you can't get this to work.
Documentation
- I keep saying R is an ecosystem.
- This is a good example.
- Open the README.md file in RStudio. Remember, you can open files from the bottom-right pane.
- See how that Markdown renders to this: LINK!
- Proficient R users are (almost) universally polyglots.
- We use lots of languages and tools.
- The learning curve _is_ steep. But it _is_ worth it.
Working Directory
- Because RStudio sets the working directory to the project's root folder, we can use relative paths in our code.
- I'll show you what that means in just a minute.
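A minimal sketch of what "relative paths" buys you. It builds a stand-in project folder in a temporary directory (the name `take_five_project` is hypothetical, not the real folder), moves the working directory there the way RStudio does when it opens a project, and then finds `data-raw/take_five.R` without spelling out the full path.

```r
# Hypothetical stand-in for the unzipped project folder, built in a temp
# directory so the example runs anywhere.
project_root <- file.path(tempdir(), "take_five_project")
dir.create(file.path(project_root, "data-raw"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(project_root, "data-raw", "take_five.R")))

# Opening a project in RStudio does the equivalent of this setwd() call for you.
old_wd <- setwd(project_root)

# A relative path is resolved against the working directory (the project root).
found <- file.exists("data-raw/take_five.R")
print(found)

# Put the working directory back the way it was.
setwd(old_wd)
```

The same relative path works on anyone's computer, wherever they unzipped the project, which is why we avoid paths like `C:/Users/...` in shared code.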
App Tokens
- New York State Open Data runs on a platform called Socrata.
- To download data, you need an APP TOKEN.
- To get one, go here.
- You will need an account. Sorry.
- Let's walk through this together.
Research Question
Are the winning Take Five numbers selected randomly?
Your Turn!
Team up with a neighbor (or two). Can you run this code?
- You don't have to understand all of it.
- READMEs are there to help!
- You WILL need your APP TOKEN. The code will not run without it.
- Download Code Here!
- Can you answer our research question from the last slide?
Where did you start?
Did you create take_five.Rda?
Did you answer the research question?
We will walk through this together.
Where should we start?
Why would we want to separate data management from data analysis?
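One way to picture the separation: the data-management script does the slow download and cleanup once and saves the result, and the analysis script just loads it. A minimal sketch with made-up toy rows and a temporary path standing in for the real `take_five.Rda`; we are assuming, not confirming, that the project's scripts hand data off this way.

```r
# Made-up toy rows standing in for the cleaned data (not real draws).
take_five <- data.frame(
  draw_date = c("2016-01-01", "2016-01-01"),
  winning_numbers = c("3", "17"),
  stringsAsFactors = FALSE
)
rda_path <- file.path(tempdir(), "take_five.Rda")

# Data management step (e.g. at the end of the data-raw script): save once.
save(take_five, file = rda_path)

# Data analysis step (e.g. at the top of an analysis script): load, no re-download.
analysis_env <- new.env()
load(rda_path, envir = analysis_env)
print(analysis_env$take_five)
```

Rerunning the analysis then never touches the network or your APP TOKEN.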
Start: data-raw/take_five.R
APP TOKEN
Insert YOUR APP TOKEN into the code.
## GET DATA ====================================================================
## To learn more about using the Socrata API with this data set:
## https://dev.socrata.com/foundry/data.ny.gov/hh4x-xmbw
library(RSocrata)  # provides read.socrata()
## Replace YOUR_APP_TOKEN with your own token:
take_five <- read.socrata("https://data.ny.gov/resource/hh4x-xmbw.json?$$app_token=YOUR_APP_TOKEN")
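If you'd rather not paste your token into a script you might share, one option is to read it from an environment variable. A minimal sketch: the variable name `SOCRATA_APP_TOKEN` is our hypothetical choice, and the actual download is left commented out because it needs network access and a real token.

```r
# Read the token from an environment variable instead of hard-coding it.
# SOCRATA_APP_TOKEN is a hypothetical name; set it in ~/.Renviron with a line like
# SOCRATA_APP_TOKEN=abc123
app_token <- Sys.getenv("SOCRATA_APP_TOKEN")
if (!nzchar(app_token)) {
  message("SOCRATA_APP_TOKEN is not set; the download will not work without a token.")
}

# Build the same URL as above, with the token appended as $$app_token.
base_url <- "https://data.ny.gov/resource/hh4x-xmbw.json"
url <- paste0(base_url, "?$$app_token=", app_token)

# take_five <- RSocrata::read.socrata(url)
```

RSocrata's `read.socrata()` also accepts an `app_token` argument, which avoids building the query string by hand.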
Data Munging
Can anyone explain this part of take_five.R?
## Splits the " "-separated winning numbers so each number gets its own row
## in a single column, called winning_numbers:
library(dplyr)  # %>%, mutate(), select(), distinct()
library(tidyr)  # unnest()
take_five <-
  take_five %>%
  mutate(winning_numbers = strsplit(winning_numbers, split = " ")) %>%
  unnest(winning_numbers) %>%
  select(draw_date, winning_numbers, bonus) %>%
  distinct()
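Here is a base-R sketch of the same reshaping on two made-up draws (toy values, not real results, and without the `bonus` column), so you can see the before and after: one row per draw becomes one row per winning number.

```r
# Two made-up draws, one row each, numbers stored as one " "-separated string.
draws <- data.frame(
  draw_date = c("2016-01-01", "2016-01-02"),
  winning_numbers = c("03 07 12 25 39", "01 02 18 30 33"),
  stringsAsFactors = FALSE
)

# strsplit() turns each string into a character vector (a list of vectors).
pieces <- strsplit(draws$winning_numbers, split = " ")

# "Unnest": repeat each date once per number and flatten the numbers.
long <- data.frame(
  draw_date = rep(draws$draw_date, lengths(pieces)),
  winning_numbers = unlist(pieces),
  stringsAsFactors = FALSE
)

# Like distinct(): drop exact duplicate rows.
long <- unique(long)
print(long)
```

Note the numbers are still character strings after splitting; convert with `as.integer()` before doing arithmetic or counting by value.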
Any other questions about data-raw/take_five.R?
No? Then open explore.R and run it.
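We haven't shown what explore.R contains, so here is an independent sketch of the chi-squared goodness-of-fit test on simulated fair draws (not the real data). It assumes each draw is five distinct numbers from 1 to 39; with the real munged data you would count `as.integer(take_five$winning_numbers)` instead.

```r
# Simulated fair draws, NOT the real Take Five results.
set.seed(1992)
n_draws <- 1000
# ASSUMPTION: each draw is 5 distinct numbers from 1 to 39, chosen uniformly.
winning_numbers <- as.vector(replicate(n_draws, sample(1:39, 5)))

# factor() with explicit levels keeps numbers that never came up (count 0).
counts <- table(factor(winning_numbers, levels = 1:39))

# Chi-squared goodness-of-fit test against equal probabilities of 1/39.
result <- chisq.test(counts, p = rep(1 / 39, 39))
print(result)
```

A small p-value would suggest the numbers are not evenly distributed; a large one means the data are consistent with a fair draw, which is not the same as proving it fair.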