Take Five

  • Duration: ~30 Minutes
Photo of the Titanic

Learning Objectives

  • What is it?
  • How to download?
  • How to open?
  • Structure:
    • Data Management
    • Data Analysis

What is it?

  • Take Five: A nightly game from the New York State Lottery.
  • Nightly, five numbers are chosen "at random".
  • Each is between 1 and 39.
  • The data for all winning numbers, starting in 1992, are available online at the New York State Open Data site.
NYS Lotto Take Five logo

Objectives

  • Download the data from Open Data.
  • Use the chi-squared test to see whether the winning numbers are evenly distributed across 1 to 39.
  • With over 36,000 winning numbers chosen since 1992, the test has very high statistical power.
  • Project Code
  • Download to your local computer & unzip on Desktop.
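
As a rough sketch of the test named above, the code below runs a chi-squared goodness-of-fit test on simulated draws, not on the real lottery data. The seed, the number of draws, and the names `draws`, `counts`, and `result` are assumptions made for illustration. Numbers within one draw are sampled without replacement, so the counts are not strictly independent; treat the p-value as approximate.

```r
## Simulated stand-in for the lottery data (NOT the real winning numbers).
set.seed(1992)
n_draws <- 1000

## Each draw picks 5 distinct numbers between 1 and 39.
draws <- unlist(lapply(seq_len(n_draws), function(i) sample(1:39, size = 5)))

## Count how often each number came up; factor() keeps numbers with zero hits.
counts <- table(factor(draws, levels = 1:39))

## With a one-dimensional table and no 'p' argument, chisq.test() tests
## against equal probabilities for every category.
result <- chisq.test(counts)
print(result)
```

A small p-value would suggest the numbers are not evenly distributed; a large one means the data are consistent with an even distribution.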

New Project

Let's make a new project!

  • File -> New Project
  • Three options
    • New Directory
    • Existing Directory
    • Version Control
  • Select the second option (Existing Directory).
  • Select the folder you just created on your desktop.
  • Let us know now if you can't get this to work.
RStudio: New Project

Documentation

  • I keep saying R is an ecosystem.
  • This is a good example.
  • Open the README.md file with RStudio.
    Remember, you can open files in the bottom right pane.
  • See how that Markdown renders to this: LINK!
  • Proficient R users are (almost) universally polyglots.
  • We use lots of languages and tools.
  • The learning curve _is_ steep. But it _is_ worth it.

Working Directory

  • Because RStudio sets the current working directory to the root folder of the project, we can use relative paths in our code.
  • I'll show you what that means in just a minute.
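
A minimal sketch of what a relative path looks like in practice, assuming the project contains the `data-raw/take_five.R` script used later in this deck. The variable name `rel_path` is only for illustration.

```r
## Inside an RStudio project, this prints the project's root folder.
getwd()

## A relative path is resolved against the working directory,
## so it contains no machine-specific prefix such as a user's home folder.
rel_path <- file.path("data-raw", "take_five.R")
print(rel_path)

## TRUE only when the working directory is the project root and the file exists.
file.exists(rel_path)

## Show the absolute path this relative path would resolve to.
normalizePath(rel_path, mustWork = FALSE)
```

Because the path never mentions where the project sits on disk, the same code runs unchanged on anyone's computer once they open the project.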

App Tokens

  • New York State Open Data runs on a platform called Socrata.
  • To download data, you need an APP TOKEN.
  • To get one, you must go here.
  • You will need an account. Sorry.
  • Let's walk through this together.

Research Question

Are the winning Take Five numbers selected randomly?

Photo of Captain Smith

Your Turn!

Team up with a neighbor (or two). Can you run this code?

  • You don't have to understand all of it.
  • READMEs are there to help!
  • You WILL need your APP TOKEN.
    Code will not run without it.
  • Download Code Here!
  • Can you answer our research question from the last slide?

Where did you start?


Did you create take_five.Rda?


Did you answer the research question?


We will walk through this together.

Where should we start?


Why would we want to separate data management?

Start: data-raw/take_five.R

APP TOKEN

Insert YOUR APP TOKEN into the code.


## GET DATA ====================================================================
## To learn more about using the Socrata API with this data set:
## https://dev.socrata.com/foundry/data.ny.gov/hh4x-xmbw
library(RSocrata)  # provides read.socrata()
## Replace YOUR_APP_TOKEN with your own token before running:
take_five <- read.socrata("https://data.ny.gov/resource/hh4x-xmbw.json?$$app_token=YOUR_APP_TOKEN")
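
If you would rather not paste your token into a script you might share, one possible approach is to read it from an environment variable. This is only a sketch: the variable name `SOCRATA_APP_TOKEN` and the helper `take_five_url()` are hypothetical and are not part of the project code.

```r
## Hypothetical helper: builds the download URL for a given token.
take_five_url <- function(token) {
  stopifnot(is.character(token), length(token) == 1, nzchar(token))
  paste0("https://data.ny.gov/resource/hh4x-xmbw.json?$$app_token=", token)
}

## Sys.getenv() returns "" when the variable is not set.
## SOCRATA_APP_TOKEN is an assumed name; pick whatever name you like.
app_token <- Sys.getenv("SOCRATA_APP_TOKEN")

if (nzchar(app_token)) {
  ## Only attempt the download when a token is available.
  take_five <- RSocrata::read.socrata(take_five_url(app_token))
} else {
  message("No token found. Set SOCRATA_APP_TOKEN before downloading.")
}
```

One way to set the variable is a line such as `SOCRATA_APP_TOKEN=your-token` in your `.Renviron` file, followed by a restart of R.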
    

Data Munging

Can anyone explain this part of take_five.R?


## Splits the space-separated winning_numbers string so that each winning number
## gets its own row (one row per draw date and number):
take_five <-
    take_five %>%
    transform(winning_numbers=strsplit(winning_numbers, split=" ")) %>%
    unnest(winning_numbers) %>%
    select(draw_date, winning_numbers, bonus) %>%
    distinct()
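
To see what this reshaping does, here is a base R sketch of the same idea on a made-up two-row data frame. It is not the pipeline from `take_five.R`, it needs no packages, and the dates and numbers in `toy` are invented for illustration.

```r
## Made-up data in the same shape as the download: one row per draw,
## with the five winning numbers packed into one space-separated string.
toy <- data.frame(
  draw_date       = c("2024-01-01", "2024-01-02"),
  winning_numbers = c("03 11 19 27 35", "05 11 20 31 39"),
  bonus           = c("07", "12"),
  stringsAsFactors = FALSE
)

## Step 1: split each string on spaces; the result is a list with one
## character vector of five numbers per draw.
parts <- strsplit(toy$winning_numbers, split = " ")

## Step 2: repeat the other columns once per number and flatten the list,
## which gives one row per draw date and winning number.
long <- data.frame(
  draw_date       = rep(toy$draw_date, times = lengths(parts)),
  winning_numbers = unlist(parts),
  bonus           = rep(toy$bonus, times = lengths(parts)),
  stringsAsFactors = FALSE
)

## Step 3: drop exact duplicate rows, as distinct() does in the pipeline.
long <- unique(long)
print(long)
```

Two draws of five numbers each become ten rows, which is the long format a frequency count per number needs.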
    

Any other questions about data-raw/take_five.R?


No? Then open explore.R and run it.