Introduction

The most recent Morbidity and Mortality Weekly Report, dated May 2, 2014, from the Centers for Disease Control and Prevention included a report by Yoon et al. (2014) on potentially preventable deaths from the 5 leading causes of death among people under the age of 80. In this post, I use interactive bar charts and choropleths to visualize the state-wise statistics. The charts are built with googleVis and served through RStudio's shiny server platform. This post was generated using slidify, and the code necessary to recreate it can be found on github. The code for the accompanying shiny app can also be found on github.

Data

The report mentions that in 2010, the top 5 causes of death accounted for approximately 63% of all deaths: diseases of the heart, cancer, chronic lower respiratory disease, cerebrovascular diseases (stroke), and unintentional injuries. For their analysis, the authors used mortality data from the National Vital Statistics System for 2008-2010. Please read the report for the caveats associated with the data and the assumptions underlying the procedures used. The report also discusses the implications of the findings, and its discussion section is well worth a read.

Retrieve Data

This section of the R code retrieves data from CDC's report.

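library(XML)
# Page containing the MMWR report tables
URL = "http://www.cdc.gov/mmwr/preview/mmwrhtml/mm6317a1.htm?s_cid=mm6317a1_w"
# readHTMLTable() returns a list with one data frame per HTML table;
# the name `tables` avoids reusing the name of base R's table() function
tables = readHTMLTable(URL)
statewise = tables[[1]]  # first of two tables on that page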

Data Cleaning and Manipulations

Let's clean the dataset by doing the following.

  1. Changing column names
  2. Removing the top 3 rows and the bottom 2 rows

Let's also check the structure of the data.

colnames(statewise) = c("State", "HeartDiseasesObserved", "HeartDiseasesExpected", 
    "HeartDiseasesPreventable", "CancerDiseasesObserved", "CancerDiseasesExpected", 
    "CancerDiseasesPreventable", "ChroniclowerrespiratoryDiseasesObserved", 
    "ChroniclowerrespiratoryDiseasesExpected", "ChroniclowerrespiratoryDiseasesPreventable", 
    "CerebrovascularDiseasesObserved", "CerebrovascularDiseasesExpected", "CerebrovascularDiseasesPreventable", 
    "UnintentionalinjuriesObserved", "UnintentionalinjuriesExpected", "UnintentionalinjuriesPreventable")
statewise = statewise[-(1:3), ]
statewise = statewise[-(52:53), ]
str(statewise)
## 'data.frame':    51 obs. of  16 variables:
##  $ State                                     : Factor w/ 56 levels "Abbreviation: DC = District of Columbia.\r\n\t\t\t\t\t\t\t\t*\tExpected deaths are the lowest three-state average age-specific "| __truncated__,..: 2 3 4 5 6 7 8 11 9 12 ...
##  $ HeartDiseasesObserved                     : Factor w/ 54 levels "1,007","1,080",..: 43 31 29 24 22 20 17 49 46 12 ...
##  $ HeartDiseasesExpected                     : Factor w/ 54 levels "1,063","1,194",..: 22 33 29 8 14 20 15 42 32 12 ...
##  $ HeartDiseasesPreventable                  : Factor w/ 54 levels "0","1,089","1,092",..: 29 10 49 6 39 9 32 26 40 35 ...
##  $ CancerDiseasesObserved                    : Factor w/ 54 levels "1,054","1,304",..: 44 45 41 30 27 32 29 3 46 23 ...
##  $ CancerDiseasesExpected                    : Factor w/ 54 levels "1,006","1,112",..: 33 39 43 22 27 31 24 1 37 20 ...
##  $ CancerDiseasesPreventable                 : Factor w/ 51 levels "0","1,059","1,126",..: 21 14 44 7 30 16 41 35 18 40 ...
##  $ ChroniclowerrespiratoryDiseasesObserved   : Factor w/ 53 levels "1,016","1,035",..: 15 17 12 3 45 7 42 29 47 41 ...
##  $ ChroniclowerrespiratoryDiseasesExpected   : Factor w/ 53 levels "1,004","1,148",..: 42 43 1 32 28 38 34 12 44 24 ...
##  $ ChroniclowerrespiratoryDiseasesPreventable: Factor w/ 51 levels "0","1,013","1,117",..: 2 30 39 42 4 35 1 47 1 12 ...
##  $ CerebrovascularDiseasesObserved           : Factor w/ 52 levels "1,003","1,119",..: 5 49 45 41 36 38 31 18 12 28 ...
##  $ CerebrovascularDiseasesExpected           : Factor w/ 52 levels "1,015","1,108",..: 34 37 46 26 21 32 28 7 36 16 ...
##  $ CerebrovascularDiseasesPreventable        : Factor w/ 49 levels "1,527","1,783",..: 36 12 39 16 1 44 27 31 21 43 ...
##  $ UnintentionalinjuriesObserved             : Factor w/ 52 levels "1,010","1,013",..: 21 34 25 6 47 10 49 29 18 44 ...
##  $ UnintentionalinjuriesExpected             : Factor w/ 52 levels "1,074","1,093",..: 49 17 4 38 42 50 43 19 14 29 ...
##  $ UnintentionalinjuriesPreventable          : Factor w/ 51 levels "0","1,027","1,054",..: 4 23 5 46 12 39 25 16 38 30 ...
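
The two-step row removal above can look odd at first: the second call drops positions 52 and 53 because those are where the last two rows sit once the top 3 are gone. Here is a minimal sketch of the same idea on a toy data frame (the `toy` data and its values are made up for illustration), computing the trailing positions from `nrow()` instead of hard-coding them.

```r
# Toy data frame standing in for the scraped table (values are made up).
toy = data.frame(id = 1:10, label = letters[1:10])

# Drop the first 3 rows; 7 rows remain, at positions 1 to 7.
toy = toy[-(1:3), ]

# The last two rows now sit at positions 6 and 7, not 9 and 10,
# so compute them from the current number of rows.
n = nrow(toy)
toy = toy[-((n - 1):n), ]

toy$id
```

Computing the positions from `nrow()` is one way to keep the second step correct even if the number of leading rows removed changes.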

Let's convert the count columns from factors to numeric values and view the data using a googleVis table. Entries in this table can be sorted by clicking on a column header.

# Columns 2 to 16 hold counts stored as factors with comma separators (e.g. "1,007")
for (i in 2:16) {
    # factor -> character, strip the commas, then character -> numeric
    statewise[, i] = as.numeric(gsub(",", "", as.character(statewise[, i])))
}

library(googleVis)
plot(gvisTable(statewise,options=list(height=400, width=800)))
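
To see why the conversion goes through character strings rather than calling `as.numeric()` on the factor directly, here is a small sketch on a made-up factor (the vector `x` is hypothetical and not part of the CDC data).

```r
# A factor holding comma-formatted counts, similar to the scraped columns.
x = factor(c("1,007", "954", "12,300"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers printed in the labels.
codes = as.numeric(x)

# Convert to character, strip the commas, then convert to numeric.
values = as.numeric(gsub(",", "", as.character(x)))
values
```

Here `codes` contains small integers that index the factor levels, while `values` contains the actual counts.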

Instead of dealing with the raw numbers of potentially preventable deaths, we compute, for each of the 5 causes of death, the percentage of observed deaths that were potentially preventable. We then compute the average of these 5 percentages for each state.

statewise$PercentageHeartDiseasesPreventable = round(statewise$HeartDiseasesPreventable * 
    100/statewise$HeartDiseasesObserved, 2)
statewise$PercentageCancerDiseasesPreventable = round(statewise$CancerDiseasesPreventable * 
    100/statewise$CancerDiseasesObserved, 2)
statewise$PercentageChroniclowerrespiratoryDiseasesPreventable = round(statewise$ChroniclowerrespiratoryDiseasesPreventable * 
    100/statewise$ChroniclowerrespiratoryDiseasesObserved, 2)
statewise$PercentageCerebrovascularDiseasesPreventable = round(statewise$CerebrovascularDiseasesPreventable * 
    100/statewise$CerebrovascularDiseasesObserved, 2)
statewise$PercentageUnintentionalinjuriesPreventable = round(statewise$UnintentionalinjuriesPreventable * 
    100/statewise$UnintentionalinjuriesObserved, 2)

statewise$PercentageAveragePreventableDeaths = round((statewise$PercentageHeartDiseasesPreventable + 
    statewise$PercentageCancerDiseasesPreventable + statewise$PercentageChroniclowerrespiratoryDiseasesPreventable + 
    statewise$PercentageCerebrovascularDiseasesPreventable + statewise$PercentageUnintentionalinjuriesPreventable)/5, 
    2)

save(statewise, file = "statewise.Rda")
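
As a quick check of the arithmetic, here is a sketch of the same calculation on a toy data frame with two hypothetical states and two causes (all names and numbers are made up). It uses a loop over cause names and `rowMeans()` in place of the spelled-out assignments above, which is one possible way to shorten the code, not the method used in this post.

```r
# Toy counts for two hypothetical states and two causes (made-up numbers).
toy = data.frame(
  State = c("A", "B"),
  HeartObserved = c(1000, 2000),
  HeartPreventable = c(250, 300),
  CancerObserved = c(800, 1600),
  CancerPreventable = c(100, 400)
)

causes = c("Heart", "Cancer")
for (cause in causes) {
  observed = toy[[paste0(cause, "Observed")]]
  preventable = toy[[paste0(cause, "Preventable")]]
  # Percentage of observed deaths that were potentially preventable.
  toy[[paste0("Percentage", cause, "Preventable")]] = round(preventable * 100 / observed, 2)
}

# Unweighted average of the per-cause percentages for each state.
pct_cols = paste0("Percentage", causes, "Preventable")
toy$PercentageAveragePreventableDeaths = round(rowMeans(toy[, pct_cols]), 2)
toy
```

For state A, the percentages are 25 and 12.5, giving an average of 18.75. Note that this is a simple mean of percentages, so each cause counts equally regardless of how many deaths it accounts for.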

Visualizations

Let's now plot bar charts and choropleths using googleVis within the shiny server environment. The application is hosted on RStudio's shinyapps.io server. Before we do that, we make the following modifications to the dataset.

  1. Convert it into long form, so that all columns besides State are collapsed into a single column, with a new column for the corresponding value.
  2. Reorder the levels of that new column so that the average percentage and the per-cause percentages come first.

library(reshape2)
statewisemelt = melt(statewise, id = "State")
statewisemelt$variable = factor(statewisemelt$variable, levels(statewisemelt$variable)[c(21, 
    16:20, 1:15)])
save(statewisemelt, file = "statewisemelt.Rda")
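
To make the two steps concrete, here is a minimal sketch on a toy wide table (the `wide` data frame and its values are made up). It reorders the levels by name rather than by position, which is an alternative to the index vector used above and may be less brittle if columns are added later.

```r
library(reshape2)

# Toy wide table: one row per state, one column per measure (made-up values).
wide = data.frame(
  State = c("A", "B"),
  Observed = c(1000, 2000),
  Preventable = c(250, 300),
  PercentagePreventable = c(25, 15)
)

# Long form: one row per State/measure pair, with columns
# State, variable (the former column name), and value.
long = melt(wide, id.vars = "State")

# Move the percentage measure to the front of the factor levels by name.
first = "PercentagePreventable"
long$variable = factor(long$variable,
                       levels = c(first, setdiff(levels(long$variable), first)))

levels(long$variable)
```

The long table has 6 rows (2 states times 3 measures), and the percentage measure now comes first among the levels, so it would be listed first wherever the levels drive the ordering.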

As mentioned previously, this application is hosted on RStudio's shinyapps.io platform. The code necessary to recreate the post can be found on github, and the code for the shiny app below can also be found on github. You can hover over either the bars of the bar chart or the map to get the corresponding values. A quick update: If this app doesn't show up, an alternate app hosted on RStudio's glimmer server can be found here. The code for that alternate glimmer app can be found here.