Recently, in conversation, a relative mentioned that heart complications seem to be getting more and more treatable, and cited several heart-attack victims she knew who had recovered from events that would once have almost certainly been death sentences. I thought it might be interesting to try to visualize this decline in mortality over the years, and to see whether other illnesses show similar declines. I’ve also heard women and minorities talk about their struggles with the healthcare system, so I thought it would be illuminating to see how trends in mortality differ by race and gender.

To start with, I went to the CDC compressed mortality dataset and queried the 1999-2016 database. Each database organizes cause of death according to the International Classification of Diseases (ICD), which has been revised several times over the years. Older data is available, but the classification system differs more the further back I go, and I’d have to find some way of equating the versions. Fortunately for my purposes, the 1999 to 2016 data, which uses the 10th revision of the ICD, should be plenty. I queried the system for information organized by ICD sub-chapter, year, age, race, and gender.

For this analysis, I’m not actually interested in examining the data by age, but I still need to query the age data in order to create age-adjusted death rates, which the CDC recommends when comparing multiple populations. The reason is that some causes of death are far more common in certain age groups. Various illnesses are common among older adults but rare among children, or vice versa. Other conditions are more or less dangerous depending on age. For example, chickenpox is considered fairly harmless for children, but can cause severe complications in adults. If we had a population with relatively few children but lots of older adults, we might expect to see a much higher rate of death from chickenpox, even if infections were exactly as common as in a more child-heavy group. To account for this, age-adjusted mortality rates take a weighted average of the death rates for each age group, which transforms each population into a standardized population where the proportions of each age group are the same. Currently the CDC uses the estimated population for the year 2000 as its standardized population. The estimates and the related age weights can be found here.
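To make the weighting concrete, here is a minimal sketch of an age-adjusted rate computed as a weighted sum of age-specific rates. Everything in it is an assumption for illustration: the two populations, the three coarse age groups, the death counts, and the weights in `std_weights` are made up and are not the CDC's year 2000 figures.

```r
# Hypothetical age-specific data for two populations (all numbers are made up)
toy <- data.frame(
  population = rep(c("A", "B"), each = 3),
  age_group  = rep(c("0-19", "20-64", "65+"), times = 2),
  deaths     = c(10, 100, 900, 5, 50, 1800),
  persons    = c(300000, 600000, 100000, 150000, 550000, 300000),
  stringsAsFactors = FALSE
)

# Hypothetical standard-population weights (not the CDC's); they sum to 1
std_weights <- c("0-19" = 0.29, "20-64" = 0.58, "65+" = 0.13)

# Age-specific death rate per 100,000 for every row
toy$rate <- toy$deaths / toy$persons * 100000

# Look up the standard weight that belongs to each row's age group
toy$weight <- unname(std_weights[as.character(toy$age_group)])

# Age-adjusted rate: weighted sum of the age-specific rates per population
adjusted <- tapply(toy$rate * toy$weight, toy$population, sum)

# Crude rate for comparison: total deaths over total persons
crude <- tapply(toy$deaths, toy$population, sum) /
  tapply(toy$persons, toy$population, sum) * 100000

round(rbind(crude, adjusted), 1)
```

In this toy data, population B is older and has the higher crude rate (185.5 versus 101 per 100,000), yet its age-adjusted rate is lower than A's (roughly 84 versus 128), because its age-specific rates are equal or lower in every group.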

Unfortunately, the CDC Wonder query system won’t output any query that produces more than 75,000 results. In order to get all the data I wanted, I had to submit a separate query for each individual age range, meaning I actually have 14 datasets to work with.

Data Cleaning

Before I start analyzing the data, I need to do some basic data cleaning, and some formatting as well. Fortunately for this analysis, I’m just using data that was output by a single query system, and I was able to remove a lot of unwanted input at the time of the query, so there shouldn’t be too much cleaning needed.

To start with, I’ll have to merge my 14 datasets into a single dataframe. After that, I want to get a clear idea of what I’m working with, so I’m also going to ask for the dimensions of the dataframe as well as its basic structure.

library(tidyverse)
library(knitr)
library(kableExtra)

# Load data by age

df_1     <- read.delim("data/mortality_1999-2016_1.txt")
df_1_4   <- read.delim("data/mortality_1999-2016_1-4.txt")
df_5_9   <- read.delim("data/mortality_1999-2016_5-9.txt")
df_10_14 <- read.delim("data/mortality_1999-2016_10-14.txt")
df_15_19 <- read.delim("data/mortality_1999-2016_15-19.txt")
df_20_24 <- read.delim("data/mortality_1999-2016_20-24.txt")
df_25_34 <- read.delim("data/mortality_1999-2016_25-34.txt")
df_35_44 <- read.delim("data/mortality_1999-2016_35-44.txt")
df_45_54 <- read.delim("data/mortality_1999-2016_45-54.txt")
df_55_64 <- read.delim("data/mortality_1999-2016_55-64.txt")
df_65_74 <- read.delim("data/mortality_1999-2016_65-74.txt")
df_75_84 <- read.delim("data/mortality_1999-2016_75-84.txt")
df_85    <- read.delim("data/mortality_1999-2016_85+.txt")

# Merge into single dataframe

df <- rbind(df_1, 
            df_1_4, 
            df_5_9, 
            df_10_14, 
            df_15_19, 
            df_20_24, 
            df_25_34, 
            df_35_44,
            df_45_54,
            df_55_64,
            df_65_74,
            df_75_84,
            df_85)

dim(df)
## [1] 138420     14
str(df)
## 'data.frame':    138420 obs. of  14 variables:
##  $ Notes               : chr  "" "" "" "" ...
##  $ ICD.Sub.Chapter     : chr  "Intestinal infectious diseases" "Intestinal infectious diseases" "Intestinal infectious diseases" "Intestinal infectious diseases" ...
##  $ ICD.Sub.Chapter.Code: chr  "A00-A09" "A00-A09" "A00-A09" "A00-A09" ...
##  $ Year                : int  1999 1999 1999 1999 2000 2000 2000 2000 2000 2001 ...
##  $ Year.Code           : int  1999 1999 1999 1999 2000 2000 2000 2000 2000 2001 ...
##  $ Age.Group           : chr  "< 1 year" "< 1 year" "< 1 year" "< 1 year" ...
##  $ Age.Group.Code      : chr  "1" "1" "1" "1" ...
##  $ Gender              : chr  "Female" "Female" "Male" "Male" ...
##  $ Gender.Code         : chr  "F" "F" "M" "M" ...
##  $ Race                : chr  "Black or African American" "White" "Black or African American" "White" ...
##  $ Race.Code           : chr  "2054-5" "2106-3" "2054-5" "2106-3" ...
##  $ Deaths              : int  2 4 3 12 1 3 1 4 8 2 ...
##  $ Population          : int  297942 1452001 307443 1528776 302401 1447232 84431 312650 1524372 316708 ...
##  $ Crude.Rate          : chr  "0.7 (Unreliable)" "0.3 (Unreliable)" "1.0 (Unreliable)" "0.8 (Unreliable)" ...

The only immediate issues I see are that several columns have redundant information. Most of them are also currently character strings, whereas I’d prefer them to be factors. From the look of the first few entries in the character string columns, there’s no immediately apparent reason why they can’t be simply converted into factors. I’ll take a closer look later, but for now no problems are jumping out for those columns.
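As a rough sketch of what that conversion could look like, the snippet below turns every character column of a small stand-in data frame into a factor. The `toy` data frame and its values are illustrative only, and this is one possible approach rather than the code used later in the analysis.

```r
# Toy data frame standing in for a few columns of df (values are illustrative)
toy <- data.frame(
  Gender = c("Female", "Female", "Male", "Male"),
  Race   = c("Black or African American", "White",
             "Black or African American", "White"),
  Deaths = c(2L, 4L, 3L, 12L),
  stringsAsFactors = FALSE
)

# Find the character columns, then convert each of them to a factor
chr_cols <- vapply(toy, is.character, logical(1))
toy[chr_cols] <- lapply(toy[chr_cols], factor)

str(toy)
levels(toy$Gender)
```

One thing to keep in mind with this approach is that any stray value in a column, such as an empty string, becomes its own factor level, so it is worth checking `levels()` after converting.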

One potentially larger issue is that I’d like the crude death rate to be numeric rather than a character string. Because that column contains non-numeric text, such as the cells marked “Unreliable”, it won’t convert cleanly to a numeric value. Normally I’d want to remove those strings or split them off into a different column. However, I don’t need to do that for this dataframe, because I don’t actually want this column of crude death rates at all. The crude rate is calculated by the formula “(Deaths * 100,000) / Population”. This means that the crude rates I have right now only apply to this specific grouping of the data. If, for example, I wanted to examine the change in death rate across race and cause of death, but not across gender, I’d have to calculate new crude rates using the combined deaths and population for both men and women. In fact, the only reason I included the crude rate in my query is that it’s mandatory output for the CDC Wonder query system. So, instead of trying to convert this column to numeric, I can simply remove it from the dataframe later on, and then calculate the death rates for the groups I’m interested in when I need them.
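Here is a small sketch of that regrouping, assuming made-up rows shaped like the query output for a single cause of death. The `toy` data frame and its numbers are hypothetical; the only thing taken from the text is the crude rate formula.

```r
# Toy rows shaped like the query output, for one cause of death (made-up numbers)
toy <- data.frame(
  Year       = c(1999, 1999, 1999, 1999),
  Race       = c("White", "White",
                 "Black or African American", "Black or African American"),
  Gender     = c("Female", "Male", "Female", "Male"),
  Deaths     = c(40, 60, 15, 25),
  Population = c(1000000, 1000000, 250000, 250000),
  stringsAsFactors = FALSE
)

# Drop the gender split: sum deaths and population within each year and race
by_race <- aggregate(cbind(Deaths, Population) ~ Year + Race,
                     data = toy, FUN = sum)

# Recompute the crude rate per 100,000 for the new grouping
by_race$Crude.Rate <- by_race$Deaths * 100000 / by_race$Population
by_race
```

The sketch sums population across genders within one cause of death. Since the same population figure appears in the row for every cause, population would need different handling when combining causes rather than demographic groups.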

The fact that some of the rates are marked as unreliable might look like a problem for the validity of my analysis, but it isn’t. The CDC flags a crude rate as unreliable when it is calculated from 20 or fewer deaths, because the sample is too small for the rate to be meaningful. The death counts in each row are so low only because I split each population into 14 different age groups. When I calculate the age-adjusted death rates, those age groups will be combined, so each age-adjusted death rate will rest on a decent sample size.
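A quick illustration of that point, using the threshold wording from the CDC caveat (a numerator of 20 or less). The death counts below are hypothetical and stand in for one race, gender, cause, and year split across a few age groups.

```r
# Hypothetical death counts for a single group, split by age group
deaths_by_age <- c("< 1 year" = 2, "1-4 years" = 1, "5-9 years" = 3,
                   "45-54 years" = 12, "65-74 years" = 18)

# Threshold from the CDC caveat: rates based on 20 or fewer deaths are flagged
unreliable_threshold <- 20

# Each age-specific count would be flagged on its own
flag_each <- deaths_by_age <= unreliable_threshold

# The combined count across age groups clears the threshold
total_deaths <- sum(deaths_by_age)
flag_total <- total_deaths <= unreliable_threshold

flag_each
total_deaths
flag_total
```

Every individual count here falls at or under the threshold, while the combined total of 36 deaths does not.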

As far as the redundant columns go, for several fields, such as the ICD sub-chapter, gender, year, age group, and race, the query system automatically outputs both a plain English version of the data and a reference code. This means I have a lot of columns with perfectly correlated information. I’ll deal with those in a moment, but before I do I want to make sure I don’t have any data cells marked as “NA” in the dataframe.

apply(df, 2, function(x) sum(is.na(x)))
##                Notes      ICD.Sub.Chapter ICD.Sub.Chapter.Code 
##                    0                    0                    0 
##                 Year            Year.Code            Age.Group 
##                  756                  756                    0 
##       Age.Group.Code               Gender          Gender.Code 
##                   60                    0                    0 
##                 Race            Race.Code               Deaths 
##                    0                    0                  756 
##           Population           Crude.Rate 
##                  756                    0

It looks like I have several NA values. The number of NA values in Year, Year.Code, Deaths, and Population is the same in each case, so it’s a fair bet that the same rows contain the NA values for all four. There’s also a much smaller number of NA values in Age.Group.Code. I’ll subset both groups of NA values and examine them to see what the cause is.
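The guess that the same rows are affected can also be checked directly. The sketch below does this on a tiny made-up data frame with a similar NA pattern; `toy`, `same_rows`, and `age_subset_of_year` are hypothetical names, and the same checks could be run against `df` itself.

```r
# Toy data frame with an NA pattern like the one described (made-up values)
toy <- data.frame(
  Notes          = c("", "", "Dataset: Compressed Mortality, 1999-2016",
                     "Query Parameters:"),
  Year           = c(1999L, 2000L, NA, NA),
  Year.Code      = c(1999L, 2000L, NA, NA),
  Age.Group.Code = c("1", "1", NA, ""),
  Deaths         = c(2L, 4L, NA, NA),
  Population     = c(297942L, 1452001L, NA, NA),
  stringsAsFactors = FALSE
)

# Logical matrix marking the missing cells in the four matching columns
na_cols <- c("Year", "Year.Code", "Deaths", "Population")
na_flags <- is.na(toy[na_cols])

# TRUE if every row is missing either all four fields or none of them
same_rows <- all(rowSums(na_flags) %in% c(0, length(na_cols)))

# TRUE if every row missing Age.Group.Code is also missing Year
age_subset_of_year <- all(is.na(toy$Year[is.na(toy$Age.Group.Code)]))

same_rows
age_subset_of_year
```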

age_group_na <- subset(df, subset = is.na(df$Age.Group.Code))
year_na <- subset(df, subset = is.na(df$Year))

# Show the rows with a missing age group code in a scrollable HTML table
kable(age_group_na, format = "html") %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "400px")
Notes ICD.Sub.Chapter ICD.Sub.Chapter.Code Year Year.Code Age.Group Age.Group.Code Gender Gender.Code Race Race.Code Deaths Population Crude.Rate
9352 NA NA NA NA NA
9353 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA NA
9354 Query Parameters: NA NA NA NA NA
9355 Title: mortality_1999-2016_<1 NA NA NA NA NA
9356 Age Group: < 1 year NA NA NA NA NA
9357 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA NA
9358 Show Totals: False NA NA NA NA NA
9359 Show Zero Values: False NA NA NA NA NA
9360 Show Suppressed: False NA NA NA NA NA
9361 Calculate Rates Per: 100,000 NA NA NA NA NA
9362 NA NA NA NA NA
9363 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA NA
9364 NA NA NA NA NA
9365 Query Date: Sep 13, 2022 7:12:58 PM NA NA NA NA NA
9366 NA NA NA NA NA
9367 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA NA
9368 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA NA
9369 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA NA
9370 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA NA
9371 7:12:58 PM NA NA NA NA NA
9372 NA NA NA NA NA
9373 Messages: NA NA NA NA NA
9374 1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows. NA NA NA NA NA
9375 NA NA NA NA NA
9376 Caveats: NA NA NA NA NA
9377 1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information: NA NA NA NA NA
9378 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA NA
9379 1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal NA NA NA NA NA
9380 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA NA
9381 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA NA
9382 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA NA
9383 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA NA
9384 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA NA
9385 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA NA
9386 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA NA
9387 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA NA
9388 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA NA
9389 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA NA
9390 1. The population figures used in the calculation of death rates for the age group ‘under 1 year’ are the estimates of the NA NA NA NA NA
9391 resident population that is under one year of age. More information: http://wonder.cdc.gov/wonder/help/cmf.html#Age Group. NA NA NA NA NA
9392 1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups, NA NA NA NA NA
9393 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA NA
9394 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA NA
9395 1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the NA NA NA NA NA
9396 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA NA
9397 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA NA
9398 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA NA
9399 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA NA
9400 1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information: NA NA NA NA NA
9401 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA NA
9402 1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other NA NA NA NA NA
9403 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA NA
9404 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA NA
9405 1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death NA NA NA NA NA
9406 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA NA
9407 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA NA
9408 1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code NA NA NA NA NA
9409 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA NA
9410 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA NA
9411 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA NA
# Show the rows with a missing year in a scrollable HTML table
kable(year_na, format = "html") %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "400px")
Notes ICD.Sub.Chapter ICD.Sub.Chapter.Code Year Year.Code Age.Group Age.Group.Code Gender Gender.Code Race Race.Code Deaths Population Crude.Rate
9352 NA NA NA NA NA
9353 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA NA
9354 Query Parameters: NA NA NA NA NA
9355 Title: mortality_1999-2016_<1 NA NA NA NA NA
9356 Age Group: < 1 year NA NA NA NA NA
9357 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA NA
9358 Show Totals: False NA NA NA NA NA
9359 Show Zero Values: False NA NA NA NA NA
9360 Show Suppressed: False NA NA NA NA NA
9361 Calculate Rates Per: 100,000 NA NA NA NA NA
9362 NA NA NA NA NA
9363 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA NA
9364 NA NA NA NA NA
9365 Query Date: Sep 13, 2022 7:12:58 PM NA NA NA NA NA
9366 NA NA NA NA NA
9367 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA NA
9368 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA NA
9369 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA NA
9370 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA NA
9371 7:12:58 PM NA NA NA NA NA
9372 NA NA NA NA NA
9373 Messages: NA NA NA NA NA
9374 1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows. NA NA NA NA NA
9375 NA NA NA NA NA
9376 Caveats: NA NA NA NA NA
9377 1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information: NA NA NA NA NA
9378 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA NA
9379 1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal NA NA NA NA NA
9380 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA NA
9381 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA NA
9382 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA NA
9383 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA NA
9384 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA NA
9385 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA NA
9386 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA NA
9387 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA NA
9388 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA NA
9389 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA NA
9390 1. The population figures used in the calculation of death rates for the age group ‘under 1 year’ are the estimates of the NA NA NA NA NA
9391 resident population that is under one year of age. More information: http://wonder.cdc.gov/wonder/help/cmf.html#Age Group. NA NA NA NA NA
9392 1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups, NA NA NA NA NA
9393 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA NA
9394 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA NA
9395 1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the NA NA NA NA NA
9396 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA NA
9397 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA NA
9398 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA NA
9399 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA NA
9400 1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information: NA NA NA NA NA
9401 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA NA
9402 1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other NA NA NA NA NA
9403 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA NA
9404 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA NA
9405 1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death NA NA NA NA NA
9406 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA NA
9407 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA NA
9408 1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code NA NA NA NA NA
9409 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA NA
9410 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA NA
9411 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA NA
16420 NA NA NA NA
16421 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
16422 Query Parameters: NA NA NA NA
16423 Title: mortality_1999-2016_1-4 NA NA NA NA
16424 Age Group: 1-4 years NA NA NA NA
16425 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
16426 Show Totals: False NA NA NA NA
16427 Show Zero Values: False NA NA NA NA
16428 Show Suppressed: False NA NA NA NA
16429 Calculate Rates Per: 100,000 NA NA NA NA
16430 NA NA NA NA
16431 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
16432 NA NA NA NA
16433 Query Date: Sep 13, 2022 7:15:39 PM NA NA NA NA
16434 NA NA NA NA
16435 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
16436 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
16437 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
16438 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
16439 7:15:39 PM NA NA NA NA
16440 NA NA NA NA
16441 Messages: NA NA NA NA
16442 1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows. NA NA NA NA
16443 NA NA NA NA
16444 Caveats: NA NA NA NA
16445 1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information: NA NA NA NA
16446 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA
16447 1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal NA NA NA NA
16448 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA
16449 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA
16450 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA
16451 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA
16452 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA
16453 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA
16454 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA
16455 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA
16456 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA
16457 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA
16458 1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups, NA NA NA NA
16459 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA
16460 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA
16461 1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the NA NA NA NA
16462 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA
16463 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA
16464 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA
16465 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA
16466 1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information: NA NA NA NA
16467 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA
16468 1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other NA NA NA NA
16469 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA
16470 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA
16471 1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death NA NA NA NA
16472 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA
16473 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA
16474 1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code NA NA NA NA
16475 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA
16476 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA
16477 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA
21906 NA NA NA NA
21907 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
21908 Query Parameters: NA NA NA NA
21909 Title: mortality_1999-2016_5-9 NA NA NA NA
21910 Age Group: 5-9 years NA NA NA NA
21911 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
21912 Show Totals: False NA NA NA NA
21913 Show Zero Values: False NA NA NA NA
21914 Show Suppressed: False NA NA NA NA
21915 Calculate Rates Per: 100,000 NA NA NA NA
21916 NA NA NA NA
21917 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
21918 NA NA NA NA
21919 Query Date: Sep 13, 2022 7:18:14 PM NA NA NA NA
21920 NA NA NA NA
21921 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
21922 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
21923 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
21924 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
21925 7:18:14 PM NA NA NA NA
21926 NA NA NA NA
21927 Messages: NA NA NA NA
21928 1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows. NA NA NA NA
21929 NA NA NA NA
21930 Caveats: NA NA NA NA
21931 1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information: NA NA NA NA
21932 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA
21933 1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal NA NA NA NA
21934 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA
21935 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA
21936 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA
21937 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA
21938 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA
21939 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA
21940 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA
21941 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA
21942 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA
21943 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA
21944 1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups, NA NA NA NA
21945 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA
21946 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA
21947 1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the NA NA NA NA
21948 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA
21949 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA
21950 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA
21951 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA
21952 1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information: NA NA NA NA
21953 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA
21954 1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other NA NA NA NA
21955 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA
21956 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA
21957 1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death NA NA NA NA
21958 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA
21959 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA
21960 1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code NA NA NA NA
21961 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA
21962 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA
21963 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA
27869 NA NA NA NA
27870 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
27871 Query Parameters: NA NA NA NA
27872 Title: mortality_1999-2016_10-14 NA NA NA NA
27873 Age Group: 10-14 years NA NA NA NA
27874 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
27875 Show Totals: False NA NA NA NA
27876 Show Zero Values: False NA NA NA NA
27877 Show Suppressed: False NA NA NA NA
27878 Calculate Rates Per: 100,000 NA NA NA NA
27879 NA NA NA NA
27880 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
27881 NA NA NA NA
27882 Query Date: Sep 13, 2022 7:20:31 PM NA NA NA NA
27883 NA NA NA NA
27884 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
27885 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
27886 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
27887 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
27888 7:20:31 PM NA NA NA NA
27889 NA NA NA NA
27890 Messages: NA NA NA NA
27891 1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows. NA NA NA NA
27892 NA NA NA NA
27893 Caveats: NA NA NA NA
27894 1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information: NA NA NA NA
27895 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA
27896 1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal NA NA NA NA
27897 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA
27898 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA
27899 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA
27900 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA
27901 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA
27902 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA
27903 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA
27904 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA
27905 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA
27906 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA
27907 1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups, NA NA NA NA
27908 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA
27909 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA
27910 1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the NA NA NA NA
27911 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA
27912 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA
27913 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA
27914 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA
27915 1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information: NA NA NA NA
27916 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA
27917 1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other NA NA NA NA
27918 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA
27919 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA
27920 1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death NA NA NA NA
27921 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA
27922 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA
27923 1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code NA NA NA NA
27924 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA
27925 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA
27926 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA
35124 NA NA NA NA
35125 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
35126 Query Parameters: NA NA NA NA
35127 Title: mortality_1999-2016_15-19 NA NA NA NA
35128 Age Group: 15-19 years NA NA NA NA
35129 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
35130 Show Totals: False NA NA NA NA
35131 Show Zero Values: False NA NA NA NA
35132 Show Suppressed: False NA NA NA NA
35133 Calculate Rates Per: 100,000 NA NA NA NA
35134 NA NA NA NA
35135 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
35136 NA NA NA NA
35137 Query Date: Sep 13, 2022 7:23:35 PM NA NA NA NA
35138 NA NA NA NA
35139 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
35140 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
35141 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
35142 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
35143 7:23:35 PM NA NA NA NA
43605 NA NA NA NA
43606 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
43607 Query Parameters: NA NA NA NA
43608 Title: mortality_1999-2016_20-24 NA NA NA NA
43609 Age Group: 20-24 years NA NA NA NA
43610 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
43611 Show Totals: False NA NA NA NA
43612 Show Zero Values: False NA NA NA NA
43613 Show Suppressed: False NA NA NA NA
43614 Calculate Rates Per: 100,000 NA NA NA NA
43615 NA NA NA NA
43616 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
43617 NA NA NA NA
43618 Query Date: Sep 13, 2022 7:26:38 PM NA NA NA NA
43619 NA NA NA NA
43620 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
43621 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
43622 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
43623 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
43624 7:26:38 PM NA NA NA NA
54907 NA NA NA NA
54908 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
54909 Query Parameters: NA NA NA NA
54910 Title: mortality_1999-2016_25-34 NA NA NA NA
54911 Age Group: 25-34 years NA NA NA NA
54912 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
54913 Show Totals: False NA NA NA NA
54914 Show Zero Values: False NA NA NA NA
54915 Show Suppressed: False NA NA NA NA
54916 Calculate Rates Per: 100,000 NA NA NA NA
54917 NA NA NA NA
54918 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
54919 NA NA NA NA
54920 Query Date: Sep 13, 2022 4:22:07 PM NA NA NA NA
54921 NA NA NA NA
54922 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
54923 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
54924 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
54925 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
54926 4:22:07 PM NA NA NA NA
67559 NA NA NA NA
67560 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
67561 Query Parameters: NA NA NA NA
67562 Title: mortality_1999-2016_35-44 NA NA NA NA
67563 Age Group: 35-44 years NA NA NA NA
67564 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
67565 Show Totals: False NA NA NA NA
67566 Show Zero Values: False NA NA NA NA
67567 Show Suppressed: False NA NA NA NA
67568 Calculate Rates Per: 100,000 NA NA NA NA
67569 NA NA NA NA
67570 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
67571 NA NA NA NA
67572 Query Date: Sep 13, 2022 4:06:23 PM NA NA NA NA
67573 NA NA NA NA
67574 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
67575 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
67576 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
67577 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
67578 4:06:23 PM NA NA NA NA
81333 NA NA NA NA
81334 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
81335 Query Parameters: NA NA NA NA
81336 Title: mortality_1999-2016_45-54 NA NA NA NA
81337 Age Group: 45-54 years NA NA NA NA
81338 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
81339 Show Totals: False NA NA NA NA
81340 Show Zero Values: False NA NA NA NA
81341 Show Suppressed: False NA NA NA NA
81342 Calculate Rates Per: 100,000 NA NA NA NA
81343 NA NA NA NA
81344 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
81345 NA NA NA NA
81346 Query Date: Sep 13, 2022 4:03:48 PM NA NA NA NA
81347 NA NA NA NA
81348 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
81349 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
81350 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
81351 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
81352 4:03:48 PM NA NA NA NA
81353 NA NA NA NA
81354 Messages: NA NA NA NA
81355 1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows. NA NA NA NA
81356 NA NA NA NA
81357 Caveats: NA NA NA NA
81358 1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information: NA NA NA NA
81359 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA
81360 1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal NA NA NA NA
81361 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA
81362 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA
81363 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA
81364 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA
81365 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA
81366 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA
81367 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA
81368 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA
81369 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA
81370 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA
81371
  1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups,
NA NA NA NA
81372 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA
81373 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA
81374
  1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the
NA NA NA NA
81375 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA
81376 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA
81377 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA
81378 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA
81379
  1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information:
NA NA NA NA
81380 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA
81381
  1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other
NA NA NA NA
81382 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA
81383 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA
81384
  1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death
NA NA NA NA
81385 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA
81386 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA
81387
  1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code
NA NA NA NA
81388 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA
81389 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA
81390 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA
95555 NA NA NA NA
95556 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
95557 Query Parameters: NA NA NA NA
95558 Title: mortality_1999-2016_55-64 NA NA NA NA
95559 Age Group: 55-64 years NA NA NA NA
95560 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
95561 Show Totals: False NA NA NA NA
95562 Show Zero Values: False NA NA NA NA
95563 Show Suppressed: False NA NA NA NA
95564 Calculate Rates Per: 100,000 NA NA NA NA
95565 NA NA NA NA
95566 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
95567 NA NA NA NA
95568 Query Date: Sep 13, 2022 4:01:53 PM NA NA NA NA
95569 NA NA NA NA
95570 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
95571 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
95572 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
95573 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
95574 4:01:53 PM NA NA NA NA
95575 NA NA NA NA
95576 Messages: NA NA NA NA
95577
  1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows.
NA NA NA NA
95578 NA NA NA NA
95579 Caveats: NA NA NA NA
95580
  1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information:
NA NA NA NA
95581 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA
95582
  1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal
NA NA NA NA
95583 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA
95584 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA
95585 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA
95586 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA
95587 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA
95588 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA
95589 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA
95590 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA
95591 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA
95592 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA
95593
  1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups,
NA NA NA NA
95594 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA
95595 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA
95596
  1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the
NA NA NA NA
95597 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA
95598 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA
95599 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA
95600 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA
95601
  1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information:
NA NA NA NA
95602 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA
95603
  1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other
NA NA NA NA
95604 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA
95605 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA
95606
  1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death
NA NA NA NA
95607 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA
95608 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA
95609
  1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code
NA NA NA NA
95610 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA
95611 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA
95612 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA
109990 NA NA NA NA
109991 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
109992 Query Parameters: NA NA NA NA
109993 Title: mortality_1999-2016_65-74 NA NA NA NA
109994 Age Group: 65-74 years NA NA NA NA
109995 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
109996 Show Totals: False NA NA NA NA
109997 Show Zero Values: False NA NA NA NA
109998 Show Suppressed: False NA NA NA NA
109999 Calculate Rates Per: 100,000 NA NA NA NA
110000 NA NA NA NA
110001 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
110002 NA NA NA NA
110003 Query Date: Sep 13, 2022 3:59:35 PM NA NA NA NA
110004 NA NA NA NA
110005 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
110006 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
110007 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
110008 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
110009 3:59:35 PM NA NA NA NA
110010 NA NA NA NA
110011 Messages: NA NA NA NA
110012
  1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows.
NA NA NA NA
110013 NA NA NA NA
110014 Caveats: NA NA NA NA
110015
  1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information:
NA NA NA NA
110016 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA
110017
  1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal
NA NA NA NA
110018 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA
110019 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA
110020 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA
110021 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA
110022 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA
110023 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA
110024 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA
110025 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA
110026 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA
110027 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA
110028
  1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups,
NA NA NA NA
110029 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA
110030 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA
110031
  1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the
NA NA NA NA
110032 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA
110033 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA
110034 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA
110035 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA
110036
  1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information:
NA NA NA NA
110037 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA
110038
  1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other
NA NA NA NA
110039 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA
110040 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA
110041
  1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death
NA NA NA NA
110042 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA
110043 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA
110044
  1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code
NA NA NA NA
110045 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA
110046 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA
110047 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA
124533 NA NA NA NA
124534 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
124535 Query Parameters: NA NA NA NA
124536 Title: mortality_1999-2016_75-84 NA NA NA NA
124537 Age Group: 75-84 years NA NA NA NA
124538 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
124539 Show Totals: False NA NA NA NA
124540 Show Zero Values: False NA NA NA NA
124541 Show Suppressed: False NA NA NA NA
124542 Calculate Rates Per: 100,000 NA NA NA NA
124543 NA NA NA NA
124544 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
124545 NA NA NA NA
124546 Query Date: Sep 13, 2022 3:57:48 PM NA NA NA NA
124547 NA NA NA NA
124548 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
124549 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
124550 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
124551 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
124552 3:57:48 PM NA NA NA NA
124553 NA NA NA NA
124554 Messages: NA NA NA NA
124555
  1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows.
NA NA NA NA
124556 NA NA NA NA
124557 Caveats: NA NA NA NA
124558
  1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information:
NA NA NA NA
124559 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA
124560
  1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal
NA NA NA NA
124561 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA
124562 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA
124563 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA
124564 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA
124565 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA
124566 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA
124567 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA
124568 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA
124569 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA
124570 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA
124571
  1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups,
NA NA NA NA
124572 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA
124573 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA
124574
  1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the
NA NA NA NA
124575 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA
124576 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA
124577 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA
124578 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA
124579
  1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information:
NA NA NA NA
124580 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA
124581
  1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other
NA NA NA NA
124582 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA
124583 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA
124584
  1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death
NA NA NA NA
124585 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA
124586 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA
124587
  1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code
NA NA NA NA
124588 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA
124589 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA
124590 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA
138363 NA NA NA NA
138364 Dataset: Compressed Mortality, 1999-2016 NA NA NA NA
138365 Query Parameters: NA NA NA NA
138366 Title: mortality_1999-2016_85+ NA NA NA NA
138367 Age Group: 85+ years NA NA NA NA
138368 Group By: ICD Sub-Chapter; Year; Age Group; Gender; Race NA NA NA NA
138369 Show Totals: False NA NA NA NA
138370 Show Zero Values: False NA NA NA NA
138371 Show Suppressed: False NA NA NA NA
138372 Calculate Rates Per: 100,000 NA NA NA NA
138373 NA NA NA NA
138374 Help: See http://wonder.cdc.gov/wonder/help/cmf.html for more information. NA NA NA NA
138375 NA NA NA NA
138376 Query Date: Sep 13, 2022 3:54:39 PM NA NA NA NA
138377 NA NA NA NA
138378 Suggested Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics NA NA NA NA
138379 System, Mortality: Compressed Mortality File 1999-2016 on CDC WONDER Online Database, released June 2017. Data are from the NA NA NA NA
138380 Compressed Mortality File 1999-2016 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics NA NA NA NA
138381 jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Sep 13, 2022 NA NA NA NA
138382 3:54:39 PM NA NA NA NA
138383 NA NA NA NA
138384 Messages: NA NA NA NA
138385
  1. Rows with zero Deaths are hidden. Use Quick Options above to show zero rows.
NA NA NA NA
138386 NA NA NA NA
138387 Caveats: NA NA NA NA
138388
  1. Death rates are flagged as Unreliable when the rate is calculated with a numerator of 20 or less. More information:
NA NA NA NA
138389 http://wonder.cdc.gov/wonder/help/cmf.html#Unreliable. NA NA NA NA
138390
  1. About national population figures: population figures for 1999 are from the 1990-1999 series of bridged-race intercensal
NA NA NA NA
138391 estimates of the July 1 resident population; population figures for 2000 and 2010 are bridged-race April 1 census counts; NA NA NA NA
138392 population figures for 2001-2009 are from the revised 2000-2009 series of bridged-race intercensal estimates of the July 1 NA NA NA NA
138393 resident population; population figures for 2011 are bridged-race postcensal estimates of the July 1 resident population, from NA NA NA NA
138394 the Vintage 2011 series released by NCHS on July 18, 2012; population figures for 2012 are bridged-race postcensal estimates of NA NA NA NA
138395 the July 1 resident population, from the Vintage 2012 series released by NCHS on June 13, 2013; population figures for 2013 are NA NA NA NA
138396 bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2013 series released by NCHS on June 26, NA NA NA NA
138397 2014; population figures for 2014 are bridged-race postcensal estimates of the July 1 resident population, from the Vintage 2014 NA NA NA NA
138398 series released by NCHS on June 30, 2015. and population figures for 2015 are bridged-race postcensal estimates of the July 1 NA NA NA NA
138399 resident population, from the Vintage 2015 series released by NCHS on June 28, 2016. More information: NA NA NA NA
138400 http://wonder.cdc.gov/wonder/help/cmf.html#Population Information. NA NA NA NA
138401
  1. Deaths of persons with Age “Not Stated” are included in “All” counts and rates, but are not distributed among age groups,
NA NA NA NA
138402 so are not included in age-specific counts, age-specific rates or in any age-adjusted rates. More information: NA NA NA NA
138403 http://wonder.cdc.gov/wonder/help/cmf.html#Not Stated. NA NA NA NA
138404
  1. Information included on the death certificate about the race and Hispanic ethnicity of the decedent is reported by the
NA NA NA NA
138405 funeral director as provided by an informant, often the surviving next of kin, or, in the absence of an informant, on the basis NA NA NA NA
138406 of observation. Race and ethnicity information from the census is by self-report. To the extent that race and Hispanic origin NA NA NA NA
138407 are inconsistent between these two data sources, death rates will be biased. More information: NA NA NA NA
138408 http://wonder.cdc.gov/wonder/help/cmf.html#Racial Differences. NA NA NA NA
138409
  1. As of April 3, 2017, the underlying cause of death has been revised for 125 deaths in 2014. More information:
NA NA NA NA
138410 http://wonder.cdc.gov/wonder/help/cmf.html#2014-Revision. NA NA NA NA
138411
  1. Circumstances in California resulted in unusually high death counts for the ICD-10 cause of death code R99, “Other
NA NA NA NA
138412 ill-defined and unspecified causes of mortality” for deaths occurring in years 2000 and 2001. Caution should be used in NA NA NA NA
138413 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#California-Reporting-Anomalies. NA NA NA NA
138414
  1. Circumstances in Georgia for the years 2008-2009 have resulted in unusually high death counts for the ICD-10 cause of death
NA NA NA NA
138415 code R99, “Other ill-defined and unspecified causes of mortality.” Caution should be used in interpreting these data. More NA NA NA NA
138416 information: http://wonder.cdc.gov/wonder/help/cmf.html#Georgia-Reporting-Anomalies. NA NA NA NA
138417
  1. Circumstances in New Jersey for the year 2009 have resulted in unusually high death counts for the ICD-10 cause of death code
NA NA NA NA
138418 R99, “Other ill-defined and unspecified causes of mortality” and therefore unusually low death counts in other ICD-10 codes, NA NA NA NA
138419 most notably R95, “Sudden Infant Death Syndrome” and X40-X49, “Unintentional poisoning.” Caution should be used in NA NA NA NA
138420 interpreting these data. More information: http://wonder.cdc.gov/wonder/help/cmf.html#New-Jersey-Reporting-Anomalies. NA NA NA NA

It looks like the Notes column of each query I made contains information about the dataset and the query used to generate it. Every other column is either empty or NA for those rows. For some reason the Age.Group.Code column came back as NA for one query but blank for all the others, which is why it had only 60 NA values. This means I can simply remove every row with NA values from the main dataframe, as none of them contain any data. It’s a little concerning that this happened, and it makes me wonder if I accidentally changed something on one of my 14 queries. Fortunately, the Notes fields for these rows include a description of each query, so I can verify that the structure of the oddball query is the same as all the others. However, the oddball query is for the <1 year age group. The query system allows further infant age ranges to be examined inside the <1 group, so my best guess is that this extra layer of age groups is what caused the minor difference in how the query system output the data here.

Regardless, it looks like I should be able to simply delete these rows, as they contain no actual data. They are still useful, since they describe the exact nature of each query and include some of the documentation for the dataset. Conveniently, having just subset those rows into a new dataframe, I’ll still have access to them if I need them, even after removing them from the main dataframe.

df <- subset(df, subset = !is.na(df$Year))
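One way to double-check that rows dropped like this really carry no data is to test every column other than Notes for NA or empty values. The sketch below runs on a small made-up dataframe rather than the real export; `toy`, `notes_rows`, and `is_blank` are hypothetical names used only for illustration.

```r
# Toy stand-in for the CDC export: two data rows and two notes-only rows.
toy <- data.frame(
  Notes  = c("", "", "---", "Dataset: Compressed Mortality, 1999-2016"),
  Year   = c(1999, 2000, NA, NA),
  Gender = c("Female", "Male", "", NA),
  Deaths = c(12, 15, NA, NA),
  stringsAsFactors = FALSE
)

# Rows that a filter on Year would remove.
notes_rows <- toy[is.na(toy$Year), ]

# TRUE for a cell that is either NA or an empty string.
is_blank <- function(x) is.na(x) | x == ""

# Check every column except Notes in the rows being removed.
other_cols <- setdiff(names(notes_rows), "Notes")
all_blank <- all(sapply(notes_rows[other_cols], function(col) all(is_blank(col))))
all_blank
```

If `all_blank` comes back `FALSE` on real data, some of the rows being dropped hold values outside the Notes column and deserve a closer look before deletion.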

With the NA values taken care of, I also want to check to see if there are any other empty cells in the dataframe.

apply(df, 2, function(x) sum(x == ""))
##                Notes      ICD.Sub.Chapter ICD.Sub.Chapter.Code 
##               137664                    0                    0 
##                 Year            Year.Code            Age.Group 
##                    0                    0                    0 
##       Age.Group.Code               Gender          Gender.Code 
##                    0                    0                    0 
##                 Race            Race.Code               Deaths 
##                    0                    0                    0 
##           Population           Crude.Rate 
##                    0                    0

There are 137,664 empty cells in the Notes column. I know from earlier that the dataframe used to have 138,420 rows, and I just deleted 756, so the dataframe is currently 137,664 rows long. In other words, every cell in the Notes column is empty, so I can safely delete the entire column later on when I prune the dataframe down.
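The arithmetic above can be written out as a quick check. This is only a sketch: the three figures are copied from the text, and `toy` is a made-up dataframe showing how the same comparison could be made directly against a dataframe.

```r
# Figures quoted in the text above.
rows_before  <- 138420  # rows before the notes-only rows were dropped
rows_removed <- 756     # notes-only rows that were removed
empty_notes  <- 137664  # empty Notes cells reported by the apply() call

# Rows remaining after the deletion should equal the empty Notes count.
rows_after <- rows_before - rows_removed
rows_after == empty_notes

# The same idea on a dataframe: TRUE when every Notes cell is empty.
toy <- data.frame(Notes = c("", "", ""), Year = c(1999, 2000, 2001))
all_notes_empty <- sum(toy$Notes == "") == nrow(toy)
all_notes_empty
```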

Now that the data appears to be fairly clean, I actually want to add a new column. For my analysis, I know that I’m going to want to first look at ICD chapter, rather than ICD sub-chapter. That way I can get a general idea of what the data looks like, then more narrowly examine areas of interest when they come up. Unfortunately, the CDC query system only allows the data to be grouped by five different fields, and I used all five on sub-chapter, year, race, gender, and age. Fortunately, the system the ICD uses for chapter and sub-chapter codes is simple. The individual cause of death codes are a letter and two digits ranging from A00 to Z99. The code for a chapter is simply a range of values (e.g. “A00-B99”). The code for a sub-chapter is the same, just with a narrower range (e.g. “A00-A09”, “A16-A19”, etc.). I already have the sub-chapter ranges, so in order to get a column of chapter ranges, I just need to make a new column and fill each cell with the ICD chapter code that encompasses the sub-chapter range.
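Because every code is one letter followed by two digits, ordinary string comparison is enough to tell whether a sub-chapter sits inside a chapter. Here is a minimal sketch of that idea; `in_chapter` is a hypothetical helper written for illustration and is not part of the analysis itself.

```r
# Hypothetical helper: does a sub-chapter range start inside a chapter range?
# Both arguments look like "A00-B99". Codes are a letter plus two digits,
# so comparing them as strings puts them in the right order.
in_chapter <- function(sub_code, chapter_code) {
  sub_start <- substr(sub_code, 1, 3)      # e.g. "A16"
  ch_start  <- substr(chapter_code, 1, 3)  # e.g. "A00"
  ch_end    <- substr(chapter_code, 5, 7)  # e.g. "B99"
  sub_start >= ch_start & sub_start <= ch_end
}

in_chapter("A16-A19", "A00-B99")  # TRUE
in_chapter("D37-D48", "C00-D48")  # TRUE
in_chapter("D50-D53", "C00-D48")  # FALSE
```

Only the start of the sub-chapter range is compared here, which is enough as long as no sub-chapter straddles two chapters.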

To start, I want to define a new dataframe that includes both the chapter codes and short versions of the titles for each chapter in the ICD-10. Fortunately the WHO maintains a webpage with details on the ICD-10 that I can use to get this information. Once I have the dataframe with the ICD chapter codes, I’m going to join the codes to the main dataframe. Most R functions for joining dataframes that I know of don’t support the kind of inequality join I need to do this, so I’m going to load in the “sqldf” library and write an SQL join query instead.

# Make ICD Chapter and Chapter Code dataframe
icd_key <- data.frame(
  ICD.Chapter.Code = c(
    "A00-B99",
    "C00-D48",
    "D50-D89",
    "E00-E90",
    "F00-F99",
    "G00-G99",
    "H00-H59",
    "H60-H95",
    "I00-I99",
    "J00-J99",
    "K00-K93",
    "L00-L99",
    "M00-M99",
    "N00-N99",
    "O00-O99",
    "P00-P96",
    "Q00-Q99",
    "R00-R99",
    "S00-T98",
    "V01-Y98",
    "Z00-Z99",
    "U00-U99"
  ),
  ICD.Chapter = c(
    "Parasitic/Infectious Diseases",
    "Neoplasms",
    "Blood Diseases",
    "Endocrine Diseases",
    "Mental Disorders",
    "Nervous System Diseases",
    "Eye Diseases",
    "Ear Diseases",
    "Circulatory Diseases",
    "Respiratory Diseases",
    "Digestive Diseases",
    "Skin Diseases",
    "Musculoskeletal Diseases",
    "Genitourinary Diseases",
    "Pregnancy/Childbirth",
    "Perinatal Conditions",
    "Congenital Malformations",
    "Not Classified Elsewhere",
    "Consequences of External Causes",
    "External Causes",
    "Healthcare Factors",
    "Special Purposes"
  )
)

# Join ICD Chapter Codes to dataframe

library(sqldf)

df <- sqldf("SELECT df.*, icd_key.'ICD.Chapter.Code' 
      FROM df 
      JOIN icd_key ON 
        LEFTSTR(df.'ICD.Sub.Chapter.Code', 3) >= 
          LEFTSTR(icd_key.'ICD.Chapter.Code', 3)
        AND LEFTSTR(df.'ICD.Sub.Chapter.Code', 3) <= 
            RIGHTSTR(icd_key.'ICD.Chapter.Code', 3)")
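
The same range match can also be sketched without SQL. The snippet below is a minimal base R illustration on a handful of codes, not the join used above. The names `chapters`, `sub_chapters`, and `find_chapter` are hypothetical, and it assumes every code is a letter followed by two digits, so that plain string comparison orders the codes correctly.

```r
# Toy stand-ins for the real data; only the code strings matter here
chapters     <- c("A00-B99", "C00-D48", "D50-D89")
sub_chapters <- c("A16-A19", "C00-C97", "D37-D48", "D50-D53")

# First and last code of every chapter range
chapter_start <- substr(chapters, 1, 3)
chapter_end   <- substr(chapters, 5, 7)

# Return the chapter whose range contains the first code of a sub-chapter
find_chapter <- function(sub_code) {
  first_code <- substr(sub_code, 1, 3)
  hit <- which(first_code >= chapter_start & first_code <= chapter_end)
  if (length(hit) == 1) chapters[hit] else NA_character_
}

matched <- vapply(sub_chapters, find_chapter, character(1), USE.NAMES = FALSE)
print(data.frame(sub_chapters, matched))
```

Because the chapter ranges don’t overlap, each sub-chapter should land in exactly one chapter. Anything that fails to match comes back as NA, which makes gaps easy to spot.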

Next I want to make some changes to the data structure. I want the age group, gender, race, ICD chapter and ICD sub-chapter data to be factors, rather than character strings. Before I make any conversions, I want to make sure there aren’t any unexpected values in any of those columns that will wind up being coerced to NA if I try to convert to factor, or give me weird factor levels I didn’t expect.

unique(df$ICD.Chapter.Code)
##  [1] "A00-B99" "C00-D48" "D50-D89" "E00-E90" "F00-F99" "G00-G99" "H00-H59"
##  [8] "H60-H95" "I00-I99" "J00-J99" "K00-K93" "L00-L99" "M00-M99" "N00-N99"
## [15] "P00-P96" "Q00-Q99" "R00-R99" "V01-Y98" "U00-U99" "O00-O99"
unique(df$ICD.Sub.Chapter.Code)
##   [1] "A00-A09" "A16-A19" "A20-A28" "A30-A49" "A50-A64" "A70-A74" "A80-A89"
##   [8] "A90-A99" "B00-B09" "B15-B19" "B20-B24" "B25-B34" "B35-B49" "B50-B64"
##  [15] "B99-B99" "C00-C97" "D10-D36" "D37-D48" "D50-D53" "D55-D59" "D60-D64"
##  [22] "D65-D69" "D70-D76" "D80-D89" "E00-E07" "E10-E14" "E15-E16" "E20-E34"
##  [29] "E40-E46" "E50-E64" "E65-E68" "E70-E88" "F01-F09" "F10-F19" "F30-F39"
##  [36] "F40-F48" "F70-F79" "F80-F89" "F90-F98" "G00-G09" "G10-G14" "G20-G25"
##  [43] "G30-G31" "G35-G37" "G40-G47" "G50-G58" "G60-G64" "G70-G72" "G80-G83"
##  [50] "G90-G98" "H10-H11" "H15-H21" "H25-H27" "H30-H35" "H40-H40" "H43-H44"
##  [57] "H53-H54" "H65-H74" "H80-H83" "H90-H93" "I05-I09" "I10-I15" "I20-I25"
##  [64] "I26-I28" "I30-I51" "I60-I69" "I70-I78" "I80-I89" "I95-I99" "J00-J06"
##  [71] "J09-J18" "J20-J22" "J30-J39" "J40-J47" "J60-J70" "J80-J84" "J85-J86"
##  [78] "J90-J94" "J96-J98" "K00-K14" "K20-K31" "K35-K38" "K40-K46" "K50-K52"
##  [85] "K55-K63" "K65-K66" "K70-K76" "K80-K86" "K90-K92" "L00-L08" "L20-L30"
##  [92] "L50-L53" "L80-L98" "M00-M25" "M30-M35" "M40-M54" "M60-M79" "M80-M94"
##  [99] "M95-M99" "N00-N07" "N10-N15" "N17-N19" "N20-N23" "N25-N28" "N30-N39"
## [106] "N40-N50" "N70-N76" "N80-N98" "P00-P04" "P05-P08" "P10-P15" "P20-P29"
## [113] "P35-P39" "P50-P61" "P70-P74" "P76-P78" "P80-P83" "P90-P96" "Q00-Q07"
## [120] "Q10-Q18" "Q20-Q28" "Q30-Q34" "Q35-Q37" "Q38-Q45" "Q50-Q56" "Q60-Q64"
## [127] "Q65-Q79" "Q80-Q89" "Q90-Q99" "R00-R09" "R10-R19" "R20-R23" "R25-R29"
## [134] "R40-R46" "R47-R49" "R50-R68" "R70-R79" "R90-R94" "R95-R99" "V01-V99"
## [141] "W00-X59" "X85-Y09" "Y10-Y34" "Y35-Y36" "Y40-Y84" "A65-A69" "A75-A79"
## [148] "B65-B83" "B90-B94" "F20-F29" "F60-F69" "H49-H52" "H55-H57" "I00-I02"
## [155] "L40-L44" "L55-L59" "L60-L75" "U00-U49" "Y85-Y89" "D00-D09" "F50-F59"
## [162] "F99-F99" "H00-H05" "L10-L13" "X60-X84" "B85-B89" "H46-H47" "O00-O07"
## [169] "O10-O16" "O20-O29" "O30-O48" "O60-O75" "O95-O99" "O85-O92" "N60-N64"
## [176] "H60-H61" "R80-R82" "R30-R39" "R83-R89"
unique(df$Age.Group.Code)
##  [1] "1"     "1-4"   "5-9"   "10-14" "15-19" "20-24" "25-34" "35-44" "45-54"
## [10] "55-64" "65-74" "75-84" "85+"
unique(df$Gender.Code)
## [1] "F" "M"
unique(df$Race.Code)
## [1] "2054-5" "2106-3" "A-PI"   "1002-5"

It doesn’t look like there are any nasty surprises hidden in the data, so I can go ahead and convert these columns to factors.

factors <- c("ICD.Chapter.Code",
             "ICD.Sub.Chapter.Code",
             "Age.Group.Code",
             "Gender.Code",
             "Race.Code")

df[,factors] <- lapply(df[,factors], factor)

The next thing I want to fix is that I have too many different sub-groups for age. The output above shows 13 different age groups, but there are only 11 age group weights to use when doing the age adjustment. This is because the age weights use the groups “5-14 years” and “15-24 years”, while my dataset has the groups “5-9 years”, “10-14 years”, “15-19 years”, and “20-24 years”. This means I have four age groups that I want to collapse into two age groups, summing together the deaths and population. To start, I’ll rename the age groups in my dataframe to match the weights used to calculate age adjustments.

df$Age.Group <- gsub("(5-9 years)|(10-14 years)", "5-14 years", df$Age.Group)
df$Age.Group <- gsub("(15-19 years)|(20-24 years)", "15-24 years", df$Age.Group)
df$Age.Group.Code <- fct_collapse(
  df$Age.Group.Code,
  `5-14` = c("5-9", "10-14"), 
  `15-24` = c("15-19", "20-24")
)

Now my factor levels match the ones I need, but I have two different “5-14” and “15-24” rows for each group. I’ll sum those rows together shortly, but it’ll be a simpler operation if I remove the redundant columns from my dataset first. In general, I want to keep the “.Code” columns, because their version of the data is much shorter, which makes it easier to type, and makes the code easier to read (with the exception of the Year.Code column, which is identical to the regular Year column in every way). However, when visualizing the data, I’ll want more descriptive labels, closer to what’s in the plain English columns. Instead of deleting the plain English columns outright, I’m going to set up a relational structure by making several new dataframes that match the “.Code” version of each factor level to the plain English version. That way I can reference the plain English description of each variable level whenever I need it, but it doesn’t clog up the dataframe, or require overly long strings to filter the data.

gender_key <- unique(df[, c("Gender.Code", "Gender")])
age_key <- unique(df[, c("Age.Group.Code", "Age.Group")])
icd_sub_key <- unique(
  x = df[, c("ICD.Chapter.Code", "ICD.Sub.Chapter.Code", "ICD.Sub.Chapter")]
)
race_key <- unique(df[, c("Race.Code", "Race")])

# Sort keys alphabetically where not already alphabetical

race_key <- arrange(race_key, Race.Code)
icd_key <- arrange(icd_key, ICD.Chapter.Code)
icd_sub_key <- arrange(icd_sub_key, ICD.Sub.Chapter.Code)
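
To show how a key like this gets used later, here is a small sketch of looking up plain English labels from codes with `match()`. The `race_key_toy` dataframe and the `codes_in_plot` vector are stand-ins made up for this example, and the sketch assumes the codes are stored as character strings (a factor column would need `as.character()` first).

```r
# Stand-in for race_key: one row per code, with its plain English label
race_key_toy <- data.frame(
  Race.Code = c("1002-5", "2054-5", "2106-3", "A-PI"),
  Race = c(
    "American Indian or Alaska Native",
    "Black or African American",
    "White",
    "Asian or Pacific Islander"
  ),
  stringsAsFactors = FALSE
)

# Codes as they might appear in a filtered dataframe (hypothetical)
codes_in_plot <- c("A-PI", "2106-3", "A-PI")

# match() gives the row of the key for each code, whatever order the key is in
plot_labels <- race_key_toy$Race[match(codes_in_plot, race_key_toy$Race.Code)]
print(plot_labels)
```

Looking labels up by code, rather than relying on the key and the factor levels being sorted the same way, keeps the labels correct even if the key is reordered.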

Now that I’ve made sure the meaning of each column is recorded somewhere, I can go ahead and delete the redundant columns. I also noted earlier that I wanted to remove the Notes and Crude.Rate columns. The Year.Code column is the same as the Year column, so I certainly don’t need both. “Year” is a lot easier to type than “Year.Code”, so I’ll be removing the latter. I’d also like the ICD.Chapter.Code column to appear as the first column, rather than the last column as it currently does.

# Remove unwanted columns
df <- subset(df, select = -c(1, 2, 5, 6, 8, 10, 14))

# Put ICD.Chapter.Code first

df <- select(df, ICD.Chapter.Code, everything())

Finally, I need to sum across the 5-14 and 15-24 age groups that I created earlier.

df <- df %>% 
  group_by(
    ICD.Chapter.Code,
    ICD.Sub.Chapter.Code,
    Year,
    Age.Group.Code,
    Gender.Code,
    Race.Code
  ) %>%
  summarize_all(sum) %>%
  ungroup()
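
As a sanity check on the idea, the collapse-then-sum step can be sketched on a tiny made-up table using only base R. The `toy` dataframe and `collapse_map` lookup are hypothetical and the numbers are invented. The real pipeline above uses `fct_collapse()` and `summarize_all()` instead.

```r
# Invented rows for a single cause, year, race, and gender
toy <- data.frame(
  Age.Group.Code = c("5-9", "10-14", "15-19", "20-24", "25-34"),
  Deaths         = c(2, 3, 10, 12, 20),
  Population     = c(1000, 1100, 1200, 1300, 2500),
  stringsAsFactors = FALSE
)

# Map each narrow age group onto the wider group used by the age weights
collapse_map <- c(
  "5-9" = "5-14", "10-14" = "5-14",
  "15-19" = "15-24", "20-24" = "15-24"
)
to_collapse <- toy$Age.Group.Code %in% names(collapse_map)
toy$Age.Group.Code[to_collapse] <-
  unname(collapse_map[toy$Age.Group.Code[to_collapse]])

# Sum deaths and population within each (now wider) age group
collapsed <- aggregate(
  cbind(Deaths, Population) ~ Age.Group.Code,
  data = toy,
  FUN = sum
)
print(collapsed)
```

The five input rows become three, and the deaths and population of each merged pair are added together rather than averaged.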

Now that my dataframe is clean and easy to work with, I can start having some fun with data!

Exploratory Analysis

As I mentioned earlier, I want to look at age-adjusted death rates, rather than just the crude death rate. Calculating the age-adjusted mortality rate itself is fairly simple: multiply the crude death rate for each individual age group by that group’s age weight, then add the results together. When I queried the data, I could have actually had the age-adjusted mortality rates included but, as with the crude rate, I want to define the groups of interest myself and calculate their specific age-adjusted rates.
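
A toy calculation may make the weighted sum concrete. Everything in this sketch is invented for illustration: two age groups instead of eleven, made-up weights, and two hypothetical populations that share identical age-specific death rates but have very different age structures.

```r
# Hypothetical standard-population weights for two age groups (sum to 1)
weights <- c(young = 0.6, old = 0.4)

# Age-specific death rates per 100,000, identical in both populations
rate <- c(young = 10, old = 200)

# Population A is mostly young, population B is mostly old
pop_a <- c(young = 90000, old = 10000)
pop_b <- c(young = 20000, old = 80000)

# Crude rate: total deaths over total population, per 100,000
crude_rate <- function(pop) {
  deaths <- rate * pop / 100000
  sum(deaths) / sum(pop) * 100000
}
crude_a <- crude_rate(pop_a)
crude_b <- crude_rate(pop_b)

# Age-adjusted rate: age-specific rates weighted by the standard population
adjusted_rate <- sum(weights * rate)

print(c(crude_a = crude_a, crude_b = crude_b, adjusted = adjusted_rate))
```

The crude rates differ widely only because of who lives in each population, while the age-adjusted rate is the same for both, since the underlying age-specific rates are identical.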

The first thing I want to do is add the age weights to the age group dataframe I made earlier. In fact, because the age group and age group code are so similar, and I’m not planning to report on any age values, the main reason I made this dataframe to begin with was to have a place to put these age weights. The only other reason to have it is as a reminder that the age group code “1” actually means “<1”.

age_key$Age.Weight <- c(0.013818, # <1
                        0.055317, # 1-4
                        0.145565, # 5-14
                        0.138646, # 15-24
                        0.135573, # 25-34
                        0.162613, # 35-44
                        0.134834, # 45-54
                        0.087247, # 55-64
                        0.066037, # 65-74
                        0.044842, # 75-84
                        0.015508  # >=85
)
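
The assignment above is positional, so it relies on the rows of `age_key` already being in youngest-to-oldest order. A slightly more defensive variant, sketched here with a hypothetical named vector `age_weights` and a deliberately shuffled stand-in key `toy_key`, attaches each weight by its age group code and confirms that the weights sum to one.

```r
# Same weights as above, named by age group code
age_weights <- c(
  "1"     = 0.013818,
  "1-4"   = 0.055317,
  "5-14"  = 0.145565,
  "15-24" = 0.138646,
  "25-34" = 0.135573,
  "35-44" = 0.162613,
  "45-54" = 0.134834,
  "55-64" = 0.087247,
  "65-74" = 0.066037,
  "75-84" = 0.044842,
  "85+"   = 0.015508
)

# The weights describe a full population, so they should add up to 1
weights_sum <- sum(age_weights)

# Stand-in key whose rows are NOT in age order (codes kept as character)
toy_key <- data.frame(
  Age.Group.Code = c("85+", "1", "5-14"),
  stringsAsFactors = FALSE
)

# Look up each weight by name, so row order no longer matters
toy_key$Age.Weight <- unname(age_weights[toy_key$Age.Group.Code])
print(toy_key)
```

If the code column is a factor, as it is in the main dataframe, it would need wrapping in `as.character()` before being used as an index, since indexing with a factor uses its integer codes rather than its labels.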

If I’m going to make a lot of different figures based on different groupings of people, I’m also going to want a way to easily and quickly calculate the age-adjusted mortality rates by group. That means I need to write a function that outputs just that. It should allow me to work out the age-adjusted mortality rates for the populations I want in just a few commands and pipe the results into a figure.

# Function takes the name of the dataframe, then the name(s) of any grouping 
# variables of interest
age_adjust <- function(X, ...){
  
  # Age.Group.Code needs to be included alongside the specific groups listed in 
  # the function call.
  X <- group_by(X, ..., Age.Group.Code)
  
  X <- summarise(X, Population = sum(unique(Population)), Deaths = sum(Deaths))

  # Calculate crude rate for each age group
  X$Crude.Rate <- (X$Deaths * 100000) / X$Population
   
  # Add in the age weights from the age_key dataframe
  X <- merge(
    x = X, 
    y = age_key[, c("Age.Group.Code", "Age.Weight")], 
    by = "Age.Group.Code"
  )

  # Multiply crude rates by age weight and sum across age groups for finished 
  # product
  X$Age.Adjustment <- X$Age.Weight * X$Crude.Rate

  X <- group_by(X, ...)

  X <- summarise(X, Crude.Rate.Adjusted = sum(Age.Adjustment))
  
  return(X)
}
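
One step in the function that may look odd is `sum(unique(Population))`. As far as can be told from the data layout, the population for a given year, age, race, and gender is repeated on every cause-of-death row, so a plain `sum()` would count the same people once per cause. The sketch below shows the difference on invented numbers. The `toy_rows` dataframe is hypothetical.

```r
# Invented rows: two causes of death for each of two gender groups.
# The population is repeated on every row belonging to the same group.
toy_rows <- data.frame(
  Sub.Chapter = c("I20-I25", "I60-I69", "I20-I25", "I60-I69"),
  Gender.Code = c("F", "F", "M", "M"),
  Deaths      = c(30, 10, 40, 15),
  Population  = c(50000, 50000, 48000, 48000),
  stringsAsFactors = FALSE
)

# Naive sum counts each population once per cause of death
naive_population <- sum(toy_rows$Population)

# Summing the unique values counts each population only once
unique_population <- sum(unique(toy_rows$Population))

# Crude rate per 100,000 for both causes and both genders combined
toy_crude <- sum(toy_rows$Deaths) * 100000 / unique_population

print(c(naive = naive_population, unique = unique_population))
```

The shortcut does assume that no two distinct subgroups happen to share exactly the same population count, since `unique()` would then drop one of them. That seems unlikely with real census-scale counts, but it is worth keeping in mind.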

What are the leading causes of death from 1999 to 2016?

The first thing I want to know is what the top overall causes of death are. For this, I’m not going to use the age-adjusted rates. I just want to get an idea of what the most common causes of death are across this entire time-span. To do that, I’ll just sum the deaths by cause across the dataframe. Because there are over a hundred sub-chapters in the ICD, I just want to look at the main chapters at first to get a general picture.

total_deaths <- df %>%
  left_join(icd_key, by = c("ICD.Chapter.Code" = "ICD.Chapter.Code")) %>%
  group_by(ICD.Chapter.Code, ICD.Chapter) %>%
  summarise(Deaths = sum(Deaths)) %>%
  arrange(desc(Deaths))

kable(total_deaths) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "600px")
ICD.Chapter.Code  ICD.Chapter                      Deaths
I00-I99           Circulatory Diseases           15251874
C00-D48           Neoplasms                      10504677
J00-J99           Respiratory Diseases            4355028
V01-Y98           External Causes                 3281894
G00-G99           Nervous System Diseases         2379786
E00-E90           Endocrine Diseases              1853280
F00-F99           Mental Disorders                1761991
K00-K93           Digestive Diseases              1648674
A00-B99           Parasitic/Infectious Diseases   1198825
N00-N99           Genitourinary Diseases          1120420
R00-R99           Not Classified Elsewhere         612842
M00-M99           Musculoskeletal Diseases         248895
P00-P96           Perinatal Conditions             238935
Q00-Q99           Congenital Malformations         182947
D50-D89           Blood Diseases                   177261
L00-L99           Skin Diseases                     74236
O00-O99           Pregnancy/Childbirth              14267
U00-U99           Special Purposes                   2931
H60-H95           Ear Diseases                       1240
H00-H59           Eye Diseases                        829

Diseases of the circulatory system and neoplasms are the only chapters in the tens of millions of deaths, each more than double the count for the next leading cause, respiratory diseases. This isn’t too surprising, as those are the chapters that include heart disease and cancer. The least common causes of death include diseases of the eye and ear, deaths from codes reserved for special purposes, pregnancy and childbirth, and skin diseases. I’m actually quite surprised to see pregnancy and childbirth as low on the list as they are, given how dangerous childbirth has historically been.

This table also gives me some information on how I want to break down my figures. Looking at every ICD chapter by year would be too much information to show on a line graph and have it still be readable, so I already knew I’d want to split them across multiple figures. Now I know that I probably want to graph neoplasms and circulatory diseases separately from other causes. As a general rule of thumb, I like to keep the scale of a figure inside the same order of magnitude.
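
The rule of thumb can be made mechanical by bucketing chapters on the number of digits in their death counts. This is only a sketch of the idea: `deaths_toy` copies a few totals from the table above, and the variable names are hypothetical.

```r
# A few totals copied from the table above
deaths_toy <- c(
  "I00-I99" = 15251874,
  "C00-D48" = 10504677,
  "J00-J99" = 4355028,
  "N00-N99" = 1120420,
  "R00-R99" = 612842,
  "H00-H59" = 829
)

# floor(log10(x)) is the order of magnitude: 7 for tens of millions,
# 6 for millions, and so on
magnitude <- floor(log10(deaths_toy))

# Group the chapter codes that would share a figure
same_scale <- split(names(deaths_toy), magnitude)
print(same_scale)
```

Chapters that end up in the same bucket can share a y-axis without the smaller series being flattened against the bottom of the plot.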

Has the death rate for leading causes of death changed over time?

This whole project was started by a discussion of how heart disease has become more treatable, so naturally I want to delve deeper into the death rate for circulatory disease immediately. Neoplasms works well as a cause of death to graph alongside it, even if it weren’t the only other one in the same order of magnitude. Treating cancer is extremely difficult, and the subject of a massive amount of research, so it’ll be good to see how they compare next to each other.

death_cause <- c("C00-D48", "I00-I99")

filter(df, ICD.Chapter.Code %in% death_cause) %>%
  age_adjust(ICD.Chapter.Code, Year) %>%
  ggplot(aes(x = Year, y = Crude.Rate.Adjusted, color = ICD.Chapter.Code)) +
  geom_line() +
  geom_point() +
  ylab("Age-Adjusted Death Rate (per 100,000)") +
  labs(title = "Top Causes of Death Over Time") +
  scale_color_discrete(
    labels = icd_key$ICD.Chapter[icd_key$ICD.Chapter.Code %in% death_cause],
    name = "Cause of Death"
  )

There’s an overall decline for both causes, but it’s much steeper for circulatory diseases than neoplasms. The decline for neoplasms is impressively linear. I wouldn’t normally expect to see such a steady and consistent decline in real-world data. By contrast the slope on the circulatory disease appears less consistent. Just eyeballing it, I’d say it’s probably linear, but there are some potential hints of a logarithmic curve to it, meaning the decline in the death rate for circulatory disease might be starting to flatten out. I don’t think I have data across enough years to get meaningful results out of trying to fit both functions to the data and seeing which performs better. In another analysis it might be worth querying some of the databases that use older versions of the ICD and trying to equate their measures to get a bigger sample size.

Is the decline in leading causes of death the same across gender and race?

Earlier I’d mentioned race and gender as potential factors, so it may be interesting to see if these factors interact with the data.

labels_top <- c("Neoplasms", "Circulatory Diseases")
names(labels_top) <- death_cause

filter(df, ICD.Chapter.Code %in% death_cause) %>%
  age_adjust(ICD.Chapter.Code, Year, Race.Code, Gender.Code) %>%
  ggplot(
    aes(
      x = Year, 
      y = Crude.Rate.Adjusted, 
      color = Race.Code, 
      shape = Gender.Code, 
      linetype = Gender.Code
    )
  ) +
  geom_line() +
  geom_point() +
  facet_wrap(
    facets = ~ICD.Chapter.Code, 
    labeller = labeller(ICD.Chapter.Code = labels_top)
  ) +
  ylab("Age-Adjusted Death Rate (per 100,000)") +
  labs(title = "Top Causes of Death Over Time by Race and Gender") +
  scale_color_discrete(labels = race_key$Race, name = "Race") +
  scale_shape_discrete(labels = gender_key$Gender, name = "Gender") +
  scale_linetype_manual(
    labels = gender_key$Gender, 
    name = "Gender", 
    values = c(1, 3)
  ) +
  guides(colour=guide_legend(override.aes=list(shape=NA)))

There are differences in death rate based on race, with Black or African American people having the highest death rates for both neoplasms and circulatory disease, and Asian or Pacific Islander people having the lowest. In general, the death rates are also higher for men than for women. The overall reduction over time is fairly similar for each group, though, indicating that while there is definitely an inequality in death rates, minority groups have not benefited noticeably less from the decline than white people. In fact, the slopes for neoplasms appear to be steeper for groups that had higher initial death rates. This may indicate that for neoplasms the amount of inequality is starting to lessen over time, though it would help to see more recent data in order to see if this is a lasting trend.

For circulatory diseases, the shape of the data is starting to look a lot more logarithmic now, with a fairly clear trend towards the decline leveling off. While the death rate for Black or African American men has declined sharply, the curve flattens out dramatically around 2011, keeping it well above the other groups.

Have common causes of death changed over time?

Eight of the ICD chapters showed a total number of deaths between one and ten million, so next I want to take a general look at all of those.

death_cause <- total_deaths$ICD.Chapter.Code[
  which(
    x = total_deaths$Deaths > 999999 & total_deaths$Deaths < 10000000
  )
]

filter(df, ICD.Chapter.Code %in% death_cause) %>%
  age_adjust(ICD.Chapter.Code, Year) %>%
  ggplot(aes(x = Year, y = Crude.Rate.Adjusted, color = ICD.Chapter.Code)) +
  geom_line() +
  geom_point(aes(shape = ICD.Chapter.Code)) + 
  ylab("Age-Adjusted Death Rate (per 100,000)") +
  labs(title = "Common Causes of Death Over Time") +
  scale_color_discrete(
    labels = icd_key$ICD.Chapter[icd_key$ICD.Chapter.Code %in% death_cause],
    name = "Cause of Death"
  ) +
  scale_linetype_discrete(
    labels = icd_key$ICD.Chapter[icd_key$ICD.Chapter.Code %in% death_cause],
    name = "Cause of Death"
  ) +
  scale_shape_manual(
    labels = icd_key$ICD.Chapter[icd_key$ICD.Chapter.Code %in% death_cause],
    name = "Cause of Death",
    values = c(15, 16, 17, 15, 16, 17, 15, 16)
  )

Most of these are fairly steady over time. Interestingly, deaths due to mental disorders show a steady increase up to about 2013, after which they start to decline. Deaths from nervous system diseases, by contrast, increase after that point. The other point of interest is that deaths from external causes show a sudden, massive uptick between 2014 and 2015 and continue to rise on this new, very steep slope into 2016.

Why is there a tradeoff between deaths from mental disorders and nervous system diseases?

There’s no guarantee these are related, but the line between a mental disorder, as classified by the ICD 10, and a nervous system disease is a very fine one. In fact, several forms of dementia, such as dementia from Alzheimer’s or Parkinson’s disease, are listed under mental disorders but with a reference to codes in the nervous system diseases chapter for the underlying illnesses themselves. The divergence also appears immediately after the publication of the 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) in 2013. My guess would be that changes in the DSM-5 caused a shift in reporting, where deaths were more often classified as being due to nervous system disorders rather than due to dementia stemming from those disorders. In general, dementia from organic causes, such as nervous system diseases, is in the F01-F09 sub-chapter of the ICD 10. The corresponding codes for nervous system diseases include Alzheimer’s disease (G30), Pick’s disease (G31), Huntington’s disease (G10), and Parkinson’s disease (G20), so the sub-chapters we want there are G10-G14, G20-G25, and G30-G31.

death_cause <- c("F01-F09","G10-G14", "G20-G25", "G30-G31")

filter(df, ICD.Sub.Chapter.Code %in% death_cause) %>%
  age_adjust(ICD.Chapter.Code, ICD.Sub.Chapter.Code, Year) %>%
  ggplot(aes(x = Year, y = Crude.Rate.Adjusted, color = ICD.Sub.Chapter.Code)) +
  geom_line() +
  geom_point() +
  ylab("Age-Adjusted Death Rate (per 100,000)") +
  labs(
    title = "Deaths from Organic Mental Disorders and Nervous System Disorders"
  ) +
  scale_color_discrete(name = "ICD Sub-Chapters")

The pattern of interest definitely seems to be related specifically to sub-chapter G30-G31, which includes Alzheimer’s disease and Pick’s disease. As in the overall graph by chapter, around 2013 the number of deaths in the ICD sub-chapter that includes dementia from nervous system diseases starts to decline, while at the same time deaths from Alzheimer’s and Pick’s disease start to rise. That definitely lines up with the idea that there wasn’t a major change in actual causes of death after 2013, just a change in how deaths were classified. The reason the pattern is seen in the Alzheimer’s and Pick’s disease group, but not the others, is probably simply due to Alzheimer’s disease being overwhelmingly more common, and thus making up the majority of the data.
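
One way to probe the reclassification idea is to add the two rates together: if deaths merely moved from one category to the other, the combined rate should stay smooth even where the individual series bend. The sketch below uses invented numbers purely to illustrate the check. They are NOT values from the CDC data, and the variable names are hypothetical.

```r
# Made-up age-adjusted rates for illustration only (not CDC values)
years        <- 2011:2016
rate_f01_f09 <- c(20, 21, 22, 19, 17, 15)  # rises, then falls after 2013
rate_g30_g31 <- c(25, 25, 26, 30, 33, 36)  # jumps after 2013

# If deaths were only relabelled, the total should show no break in 2013
combined <- rate_f01_f09 + rate_g30_g31

# Year-on-year changes: erratic for one series, steady for the total
change_f01_f09  <- diff(rate_f01_f09)
change_combined <- diff(combined)

print(data.frame(
  year = years[-1],
  change_f01_f09,
  change_combined
))
```

In these invented numbers the dementia series reverses direction while the combined series keeps creeping upward by a point or two a year, which is the signature a pure relabelling would be expected to leave. Running the same sum on the real sub-chapter rates would show whether the actual data behaves this way.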

How have deaths from external causes changed over time?

External causes is a broad category that includes essentially anything that isn’t a disease. Because of that, I suspect this is an area where we’re particularly likely to see differences based on things like socio-economic status, so I want to start by breaking the data down by race and gender.

labels_gender <- c(`F` = "Female", `M` = "Male")


filter(df, ICD.Chapter.Code == "V01-Y98") %>%
  age_adjust(ICD.Chapter.Code, Year, Race.Code, Gender.Code) %>%
  ggplot(aes(x = Year, y = Crude.Rate.Adjusted, color = Race.Code)) +
  facet_wrap(~Gender.Code, labeller = labeller(Gender.Code = labels_gender))+
  geom_line() +
  geom_point() +
  ylab("Age-Adjusted Death Rate (per 100,000)") +
  labs(title = "Deaths from External Causes by Race and Gender") +
  scale_color_discrete(labels = race_key$Race, name = "Race")

There are some differences due to race, most notably a dramatically lower rate of death via external cause for the Asian or Pacific Islander group. In terms of gender, the massive uptick in death rate for external causes seems to be accounted for almost entirely by the male groups, except for the Asian or Pacific Islander male group.

What causes the increase in deaths from external causes for men after 2014?

To try and work out why there’s a sudden increase in death from external causes for men, I want to break down the data by gender and sub-chapter to try and see what specific cause is behind this.

filter(df, ICD.Chapter.Code == "V01-Y98") %>%
  age_adjust(ICD.Sub.Chapter.Code, Year, Gender.Code) %>%
  ggplot(aes(x = Year, y = Crude.Rate.Adjusted, color = ICD.Sub.Chapter.Code)) +
  facet_wrap(~Gender.Code, labeller = labeller(Gender.Code = labels_gender))+
  geom_line() +
  geom_point() +
  ylab("Age-Adjusted Death Rate (per 100,000)") +
  labs(title = "Deaths from External Causes by Category and Gender") +
  scale_color_discrete(
    labels = str_wrap(
      icd_sub_key$ICD.Sub.Chapter[icd_sub_key$ICD.Chapter.Code == "V01-Y98"], 
      20
    ), 
    name = "Death Cause"
  )

Unfortunately the dramatic uptick in male deaths falls into the “Other” category, which contains well over a hundred unique causes of death, including falls, accidents involving machinery, animal attacks, and radiation exposure. With such a broad range, there’s no telling if the sudden spike in death rate for men in this category is due to a single factor or multiple external factors. I’ll likely do another analysis later on that dives into just the external cause of death information to get more answers.

Of additional interest is the decrease in death rate for men due to transport accidents after 2007. I wasn’t able to find any immediately obvious explanation for this, but it might be worth further investigation to determine whether any laws, safety updates, or policy changes contributed to the decline, so that similar measures can be used to further reduce the death rate. There is a slight dip for women as well in the same time period, but it is much less noticeable. This is likely due to women having a much lower death rate from transport accidents overall. Any interventions or changes in transportation safety would have had a larger impact on those more likely to suffer accidents in the first place.

Summary

From this dataset, it’s clear that there is still a great deal of racial and gender inequality in causes of death. This is especially evident in racial inequality for circulatory diseases and neoplasms. However, there is some slight evidence to show that the inequality is starting to become less severe when it comes to death due to neoplasms. There has been a clear reduction in death rate due to neoplasms and circulatory disease for all groups, though the decline in circulatory disease may be starting to level out. Overall death rates are highest for Black or African American people, and lowest for people who are Asian or Pacific Islanders. This is in line with a 2002 report by the Population Reference Bureau and the trends do not appear to have changed since that time.

There has also been a rise in the Alzheimer’s and Pick’s disease death rates, likely due primarily to Alzheimer’s disease. While the rise in deaths due to Alzheimer’s disease appears to have jumped dramatically after 2013, it’s likely due to deaths that were formerly classified as being due to dementia now being classified as death due to Alzheimer’s disease. With this in mind, the increase in death due to Alzheimer’s disease appears to be a fairly steady trend.

Finally, there is a dramatic uptick in deaths due to external causes after 2014. This uptick is almost entirely accounted for by men who are not Asian or Pacific Islanders dying from causes classified as “other”. Inspection of more detailed data will be needed to try and tease out the trends behind this. Additionally, there was a dramatic drop in death rate from transport accidents after 2007 for men. The exact cause is unknown, but potentially of further interest. If the drop was due to a new policy of some kind, replication of that policy could further lower accident rates.

Overall, while this analysis does answer the original question it set out to examine, I feel as if it’s raised more questions than answers, which I consider to be about the perfect combination. There’s little more satisfying than finding that the answers to your questions come with new questions, and new depths to explore, so I look forward to coming back to this area and trying to see not only what answers I can come up with, but what new questions as well.