Blood Sugars

Using Nightscout CGM Data


Get CGM Data


Andy Choens

About

We store all of Karen’s CGM / Blood Glucose data in a Nightscout database. The database is a MongoDB (Mongo) instance running on a cloud server controlled by our family. I intend to use this data to better understand her diabetes. In order to do so, I have to learn how to use Mongo.

Objectives Of This Post

This post is a documented code example to help myself and others use Mongo and R for analytics. My objectives for this post include:

  1. Import Nightscout data (JSON) to an R data frame.
  2. Briefly examine / QA the Nightscout data.
  3. Export the Nightscout data as a CSV file.

I will post some analytic code soon.

Code, Data, Tools

Code: The .Rmd file used to create this post is available on GitHub.

Data: Although I will not publish the user name and password to the Mongo database, we are publishing some of Karen’s Nightscout data publicly via GitHub. Feel free to use this data for research or learning purposes. The data set generated by the example code from this document can be found in the data sub-folder on the project page.

Data Warning: Our current rig has been in nearly continuous use since June 21, 2015. Data collected prior to June 2015 is less consistent due to limitations with the previous rig and mistakes we made figuring out how to use it. I do not recommend using data collected prior to June 2015 for analysis.

Reproducibility Warning: File names and content are subject to periodic updates. If you need a stable copy for long-term use, download one of these datasets, or open a bug on the project page.

Tools: I used the RoboMongo application to view Mongo JSON data directly in the database server. Like R, it is FOSS and is available for Linux, Mac and Windows. Other tools used include R and Emacs running on Fedora 22.

R Packages

There are three R packages able to query a Mongo database: RMongo, rmongodb, and mongolite.

First, I skimmed the package documentation for RMongo and rmongodb. RMongo looked easier to use, so I tried it first. Unfortunately, it did not work: RMongo cannot connect to a Mongo server using a username, password, or custom port. All three are required to connect to a Mongolab server, which is what we use to store Karen’s CGM data. This is unfortunate because RMongo appears to have a fairly R-centric API, which could make it easier to use than rmongodb.

The second package, rmongodb, is more complicated to use, but it works. It imports the data one entry at a time by looping over a cursor. Looping over a cursor object is an algorithm commonly used in web development but not in R, which tends to avoid explicit loops. Using a cursor in R feels unnecessarily complicated and may be slow if your dataset is large. Most of the example code you will find for rmongodb queries simple collections and is simplistic compared to this example. I find the additional complexity required to convert Mongo data into a data frame frustrating, and structural differences between JSON objects and native R data structures add further complexity to the code below.

Note: I am hopeful that mongolite will simplify this code. In the meantime, this is a workable solution for importing data from a Nightscout database. I only learned about mongolite this week and have not yet tried to use it. I will write a separate post to discuss the results of that effort.
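For comparison, here is a rough sketch of what the same import might look like with mongolite, based on its documented `mongo()` constructor. As noted above, I have not actually tried this against our server, and the connection URL below is purely illustrative:

```r
## Untested sketch: importing the entries collection with mongolite.
## Substitute your own host, port, and credentials in the URL.
library(mongolite)

con <- mongo(collection = "entries",
             db  = "lade_k_nightscout",
             url = "mongodb://user:password@xyz.mongolab.com:123456/lade_k_nightscout")

## find() returns a data frame directly -- no cursor loop, no NULL handling.
entries <- con$find()
```

If this works as advertised, it would replace the cursor loop and all of the vector bookkeeping shown later in this post.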

Import Nightscout Data (JSON) To An R Data Frame

This R code has three dependencies: rmongodb, dplyr, and pander. dplyr and pander should be familiar to anyone with literate programming experience in R. I will focus my comments on the use of rmongodb.

The file, passwords.R, is not part of the public repo for security reasons. This may be frustrating because the example code herein cannot be run immediately on your own system, and for that I apologize. As written, this code is only runnable if you have access to a Nightscout server. If you do, you can adapt the code in the passwords.example.R file to connect to your own database.

    ## passwords.R -----------------------------------------------------------------
    ## Defines the variables I don't want to post to GitHub. (Sorry)
    ## ns is short for Nightscout.
    ## ns_host = URL for the host server (xyz.Mongolab.com:123456)
    ## ns_db   = Name of the Nightscout database (lade_k_nightscout)
    ## ns_user = Admin User Name (Not admin)
    ## ns_pw   = Admin Password (Not Admin)
    source("passwords.R")

    ## Required Packages -----------------------------------------------------------
    library(rmongodb)  ## For importing the data.
    library(dplyr)     ## For QA / data manipulation.
    library(pander)    ## For nice looking tables, etc.

Loading the rmongodb package, version 1.8.0, returns the following warning:

WARNING!

There are some quite big changes in this version of rmongodb. mongo.bson.to.list, mongo.bson.from.list (which are workhorses of many other rmongofb high-level functions) are rewritten. Please, TEST IT BEFORE PRODUCTION USAGE. Also there are some other important changes, please see NEWS file and release notes at https://github.com/mongosoup/rmongodb/releases/

In spite of the drama, it worked fine. Mongo stores data in an object called a collection, which is similar to a table in a traditional RDBMS. However, unlike a table, the structure of a collection is not defined prior to use. There are several other differences, which you can learn about by reading the introduction tutorial which is written by actual Mongo experts.

Collections

The following code chunk returns a list of all the collections in the Nightscout database. The “entries” collection is the only one we are interested in today. The database name, lade_k_nightscout, is prepended to each collection name.

    ## Open a connection to Mongo --------------------------------------------------
    con <- mongo.create(host     = ns_host,
                        username = ns_user,
                        password = ns_pw,
                        db       = ns_db
                        )

    ## Make sure we have a connection ----------------------------------------------
    if(mongo.is.connected(con) == FALSE) stop("Mongo connection has been terminated.")

    ns_collections <- mongo.get.database.collections(con, ns_db)
    pandoc.list(ns_collections)
  • lade_k_nightscout.entries
  • lade_k_nightscout.devicestatus
  • lade_k_nightscout.treatments

Blank Vectors

The next code chunk produces the empty vectors needed to hold the Nightscout data before it is turned into a data frame. When importing data from an RDBMS, it is normal practice to place the imported data directly into an R data frame. When importing data from Mongo, the data must first be placed into vectors. This is further complicated by the fact that records can have different numbers of fields, so we have to handle the NULL values in R rather than in the database. Complexities like this make this code much more complicated than a simple ‘select * from foo;’ query in an RDBMS.
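The per-field NULL handling in the import loop further below can be factored into a small helper. This is my own convenience function, not part of rmongodb; it simply wraps the `is.null()` check that otherwise has to be repeated for every field:

```r
## Hypothetical helper: return a field's value from a BSON record, or NA if the
## field is absent from that record.
bson_value_or_na <- function(record, field) {
    val <- mongo.bson.value(record, field)
    if (is.null(val)) NA else val
}

## Example use inside the cursor loop:
##   device[i] <- bson_value_or_na(cval, "device")
```

I have left the loop itself written out long-hand below, as in the original, so each field assignment is explicit.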

    ## Make sure we still have a connection ----------------------------------------
    if(mongo.is.connected(con) == FALSE) stop("Mongo connection has been terminated.")

    ## Collections Variables -------------------------------------------------------
    ## Yeah, I just hard-coded these. Sue me.
    ns_entries <- "lade_k_nightscout.entries"

    ## Mongo Variables -------------------------------------------------------------
    ## ns_count:  Total number of records in entries.
    ## ns_cursor: A cursor variable capable of returning the value of all fields in
    ##            a single row of the entries collection.
    ##
    ns_count  <- mongo.count(con, ns_entries)
    ns_cursor <- mongo.find(con, ns_entries)

    ## R Vectors to hold Nightscout data -------------------------------------------
    ## If you don't define the variable type, you tend to get characters.
    device     <- vector("character", ns_count)
    date       <- vector("numeric",   ns_count)
    dateString <- vector("character", ns_count)
    sgv        <- vector("integer",   ns_count)
    direction  <- vector("character", ns_count)
    type       <- vector("character", ns_count)
    filtered   <- vector("integer",   ns_count)
    unfiltered <- vector("integer",   ns_count)
    rssi       <- vector("integer",   ns_count)
    noise      <- vector("integer",   ns_count)
    mbg        <- vector("numeric",   ns_count)
    slope      <- vector("numeric",   ns_count)
    intercept  <- vector("numeric",   ns_count)
    scale      <- vector("numeric",   ns_count)

As of 2015-07-29 the “entries” collection contains `r format(ns_count, big.mark=",")` records. That is a lot of data about a single person. The following code chunk imports the records in ‘entries’ and places the data into the vectors produced above.

Import Nightscout Data

The ‘ns_cursor’ variable is a cursor. Looping over a cursor to get entry-specific output is an approach that should be familiar to web developers. The cursor loop feels odd because R programming usually avoids loops like the plague, but it works and appears to be the preferred way of importing data from Mongo.

    ## Get the CGM Data, with a LOOP -----------------------------------------------
    i <- 1

    while(mongo.cursor.next(ns_cursor)) {

        ## Get the values of the current record.
        cval <- mongo.cursor.value(ns_cursor)

        ## Place the values of the record into the appropriate location in the vectors.
        ## Must catch NULLs for each record or the vectors will have different lengths
        ## when we are done.
        device[i]     <- if( is.null(mongo.bson.value(cval, "device")) )     NA else mongo.bson.value(cval, "device")
        date[i]       <- if( is.null(mongo.bson.value(cval, "date")) )       NA else mongo.bson.value(cval, "date")
        dateString[i] <- if( is.null(mongo.bson.value(cval, "dateString")) ) NA else mongo.bson.value(cval, "dateString")
        sgv[i]        <- if( is.null(mongo.bson.value(cval, "sgv")) )        NA else mongo.bson.value(cval, "sgv")
        direction[i]  <- if( is.null(mongo.bson.value(cval, "direction")) )  NA else mongo.bson.value(cval, "direction")
        type[i]       <- if( is.null(mongo.bson.value(cval, "type")) )       NA else mongo.bson.value(cval, "type")
        filtered[i]   <- if( is.null(mongo.bson.value(cval, "filtered")) )   NA else mongo.bson.value(cval, "filtered")
        unfiltered[i] <- if( is.null(mongo.bson.value(cval, "unfiltered")) ) NA else mongo.bson.value(cval, "unfiltered")
        rssi[i]       <- if( is.null(mongo.bson.value(cval, "rssi")) )       NA else mongo.bson.value(cval, "rssi")
        noise[i]      <- if( is.null(mongo.bson.value(cval, "noise")) )      NA else mongo.bson.value(cval, "noise")
        mbg[i]        <- if( is.null(mongo.bson.value(cval, "mbg")) )        NA else mongo.bson.value(cval, "mbg")
        slope[i]      <- if( is.null(mongo.bson.value(cval, "slope")) )      NA else mongo.bson.value(cval, "slope")
        intercept[i]  <- if( is.null(mongo.bson.value(cval, "intercept")) )  NA else mongo.bson.value(cval, "intercept")
        scale[i]      <- if( is.null(mongo.bson.value(cval, "scale")) )      NA else mongo.bson.value(cval, "scale")

        ## Increment the vector index; the while() condition advances the cursor.
        i <- i + 1
    }

    ## Data Clean Up ---------------------------------------------------------------
    ## Nightscout stores dates as Unix epoch time in milliseconds; as.POSIXct()
    ## expects seconds, hence the division by 1000.
    date <- as.POSIXct(date/1000, origin = "1970-01-01")

    ## Builds the data.frame -------------------------------------------------------
    entries <- as.data.frame(list( device     = device,
                                   date       = date,
                                   dateString = dateString,
                                   sgv        = sgv,
                                   direction  = direction,
                                   type       = type,
                                   filtered   = filtered,
                                   unfiltered = unfiltered,
                                   rssi       = rssi,
                                   noise      = noise,
                                   mbg        = mbg,
                                   slope      = slope,
                                   intercept  = intercept,
                                   scale      = scale
                                  )
                             )

Mongo allows each record to have a different number of data elements; not all records include an ‘mbg’ element, for example. Furthermore, NULLs must be handled by the client. Querying a collection with a complicated data structure requires some trial and error.
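The division by 1000 in the clean-up step is because Nightscout stores the date field as Unix epoch time in milliseconds, while `as.POSIXct()` expects seconds. A small illustration, using a made-up timestamp:

```r
## A millisecond epoch timestamp, as Nightscout stores them (illustrative value).
ms <- 1435881600000

## Divide by 1000 to get seconds before converting.
as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
## 1435881600 seconds after the epoch is 2015-07-03 00:00:00 UTC
```

Without the division, R would interpret the milliseconds as seconds and produce dates tens of thousands of years in the future.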

Briefly Examine / QA Nightscout Data

The next code chunk does some very minimal QA on the “entries” data frame. If the data frame has 0 rows, the script stops. Otherwise, it returns a table with some basic metadata about the imported data set.

    if(dim(entries)[1] == 0) stop("Entries variable contains no rows.")

    entries %>%
        summarize(
            "N Rows"    = n(),
            "N Days"    = length(unique( format.POSIXct(.$date, format="%F") )),
            "First Day" = min( format.POSIXct(.$date, format="%F") ),
            "Last Day"  = max( format.POSIXct(.$date, format="%F") )
            ) %>%
        pander()
 N Rows   N Days   First Day    Last Day
-------- -------- ------------ ------------
 14384      74     2015-01-30   2015-07-29

The following code chunk returns the number of records saved each day between June 20 and July 5. Each record is an independent interstitial glucose reading, uploaded via the rig to the Nightscout database. Assuming everything is working, the sensor takes a new reading every five minutes, for an expected average of 288 records per day (12 readings per hour × 24 hours).

    entries %>%
        filter(date >= "2015-06-20" & date <= "2015-07-05") %>%
        group_by( "Date" = format.POSIXct(.$date, format="%F") ) %>%
        summarize("N Entries" = n() ) %>%
        pander()
    Date       N Entries
------------ -----------
 2015-06-21          15
 2015-06-22         279
 2015-06-23         261
 2015-06-24         275
 2015-06-25         286
 2015-06-26         237
 2015-06-27          85
 2015-06-28         130
 2015-06-29         282
 2015-06-30         285
 2015-07-01         276
 2015-07-02         283
 2015-07-03         256
 2015-07-04         283

Nightscout users refer to the hardware and software used to manage their data as a ‘rig’. The previous table is interesting because it shows the variability in the amount of data collected via the rig over a two-week period. Several days have far fewer than the expected number of records.

  • June 21: I set up Karen’s current rig on the evening of the 21st. Because it was late, Karen’s rig only saved 15 records on that ‘day’.
  • June 27: Karen believes she replaced her CGM sensor on June 27. There was a period of several hours of no data after the old sensor failed and an additional time gap while she was syncing the new sensor to her CGM. As a result of those gaps, there are only 85 records on the 27th.
  • June 28: The small number of entries on the 28th is a mystery. For some reason the sensor was not communicating with the receiver correctly or the rig failed to upload the data to the server. We don’t know which.

The other days are fairly consistent, but the table does demonstrate how the number of entries recorded can vary dramatically. The inconsistency in the quantity of data will have an impact on our ability to use Nightscout data to predict her future blood glucose levels.
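One way to quantify this variability is to compare each day’s record count against the expected 288. This is my own sketch, not part of the original analysis; the 80% cut-off is an arbitrary threshold for flagging incomplete days:

```r
## Sketch: per-day completeness as a share of the expected 288 readings,
## keeping only days that are missing more than 20% of their readings.
entries %>%
    group_by( "Date" = format.POSIXct(.$date, format="%F") ) %>%
    summarize( "N Entries"    = n(),
               "Completeness" = round(n() / 288 * 100, 1) ) %>%
    filter( Completeness < 80 ) %>%
    pander()
```

A table like this makes it easy to decide which days to exclude before modeling.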

Export Nightscout Data As A CSV File

The final code chunk exports a date-stamped CSV file. Feel free to download the file locally if you would like to use it.

I’ll try to add a new data set periodically to the public data. Old data sets will remain frozen, for reproducibility purposes, but may disappear at some point in the future. Don’t expect there to be more than 5 data sets in the data folder.

    ## Saves the data as a CSV file ------------------------------------------------
    ## You are welcome to use the data stored publicly in the data folder.
    file_name <- paste("data/entries-", Sys.Date(), ".csv", sep="")
    write.csv(entries, file_name, row.names = FALSE)

    ## Clean up the session and good-bye.
    rm(list=ls())
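If you download one of these CSV files, note that write.csv stores the POSIXct date column as plain text. A quick sketch of restoring it on import; the file name here is only an example of the date-stamped pattern used above:

```r
## Sketch: re-importing an exported data set.
## The file name is illustrative; use whichever dated file you downloaded.
entries <- read.csv("data/entries-2015-07-29.csv", stringsAsFactors = FALSE)

## write.csv stored the POSIXct column as text; convert it back.
entries$date <- as.POSIXct(entries$date)
```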