R is wonderful for spatial statistics but not so for making beautiful maps. Displaying data on a map in R is straightforward, but making the presentation publication quality can be tedious. Until now. Martijn Tennekes’s tmap package simplifies things alot. And tmap functions accept the new simple features class (sf) of Edzer Pebesma, really sweetening the deal.
Let’s see how this works. We start with getting census data using Kyle Walker’s awesome tidycensus package, which provides an interface to the U.S. Census Bureau’s decennial Census and five-year American Community APIs. The data are returned as data frames (tidyverse-ready) with a simple feature
geometry column. You need your own API key. Get it from http://api.census.gov/data/key_signup.html.
## Loading tidyverse: ggplot2 ## Loading tidyverse: tibble ## Loading tidyverse: tidyr ## Loading tidyverse: readr ## Loading tidyverse: purrr ## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats ## lag(): dplyr, stats
library(tidycensus) census_api_key("YOUR KEY HERE")
Our recent research on determinants of tornado casualties has led us to look closely at the proportion of elderly white people in the Mid South. Here we get the number of 65-74 yr-old women (white only) at the state level. The argument
sumfile = "acs5" says get the data from the 5-year American Community Survey. By default,
tidycensus functions return tidy data frames in which rows represent unit-variable combinations; for a wide data frame with Census variable names in the columns, set
output = "wide" in the function call.
elderly.df = get_decennial(geography = "state", variables = "B01001A_029E", sumfile = "acs5", year = 2010, output = "wide") %>% rename(Women = B01001A_029E) head(elderly.df)
## # A tibble: 6 x 3 ## Women NAME GEOID ## <dbl> <chr> <chr> ## 1 151516 Alabama 01 ## 2 11177 Alaska 02 ## 3 217121 Arizona 04 ## 4 104742 Arkansas 05 ## 5 810243 California 06 ## 6 135227 Colorado 08
The function returns a data frame with three columns in wide format.
GEOID is an identifier for the geographical unit associated with the row and
NAME is a descriptive name of the geographical unit. The variable name is changed to
We can then use
ggplot to visualize the data with a dot chart.
elderly.df %>% ggplot(aes(x = Women/1000, reorder(NAME, Women))) + geom_point() + xlab("Number of Women (65-74) [thousands]") + ylab("")
geometry = TRUE in the
get_decennial function we get a ‘simple feature’ geometry column in the data frame. Here polygons representing county boundaries (
geography = "county"). We get the number of elderly (age 65-74) white women and the total number of white women by county. We use the
mutate function to compute a percentage.
elderly.df = get_decennial(geography = "county", variables = c("B01001A_029E", "B01001_026E"), sumfile = "acs5", year = 2010, output = "wide", geometry = TRUE) %>% rename(Elderly = B01001A_029E, Total = B01001_026E) %>% mutate(Percent = Elderly/Total * 100) head(elderly.df)
## # A tibble: 6 x 6 ## GEOID Elderly Total NAME geometry ## <chr> <dbl> <dbl> <chr> <S3: sfc_MULTIPOLYGON> ## 1 01029 648 7426 Cleburne County, Alabama <S3: sfc_MULTIPOLYGON> ## 2 01031 1783 24609 Coffee County, Alabama <S3: sfc_MULTIPOLYGON> ## 3 01037 417 5804 Coosa County, Alabama <S3: sfc_MULTIPOLYGON> ## 4 01039 1798 19386 Covington County, Alabama <S3: sfc_MULTIPOLYGON> ## 5 01041 488 7191 Crenshaw County, Alabama <S3: sfc_MULTIPOLYGON> ## 6 01045 1757 25194 Dale County, Alabama <S3: sfc_MULTIPOLYGON> ## # ... with 1 more variables: Percent <dbl>
The data frame contains the county borders (
geometry column) as multipolygons in well-known text (WKT) format, so we could go directly to the plotting package. However, the data contains information from U.S. territories that are not relevant to our interests.
We first need a map domain. Here we use the composite counties (Alaska and Hawaii inserted) from Bob Rudis’s nice
albersusa package available on github. Make sure you have the
devtools package installed.
#devtools::install_github("hrbrmstr/albersusa") library(albersusa) cty_sf = counties_sf("aeqd") #plot(cty_sf["census_area"])
plot method for simple features plots the attribute selected (here
census_area). This is the map domain we want.
Join data with spatial data and make a choropleth map.
cty_sf$GEOID = as.character(cty_sf$fips) cty_elderly = left_join(as.data.frame(cty_sf), as.data.frame(elderly.df), by = "GEOID") cty_elderly$geometry = cty_elderly$geometry.x cty_elderly = sf::st_as_sf(cty_elderly) #plot(cty_elderly["Percent"])
While we can tool around with basic R graphics to add legends, scale bar, etc, this requires lots of time-sucking trial-and-error.
Instead, the tmap package is a flexible, layer-based, and easy to use approach to create thematic maps including choropleths and bubble maps. It is based on the grammar of graphics, and resembles the syntax of ggplot2.
library(tmap) us_sf = usa_sf("aeqd") tm_shape(cty_elderly) + tm_polygons("Percent", border.col = NULL, title = "% White Women 65-74", palette = "Purples") + tm_shape(us_sf) + tm_borders() + tm_compass() + tm_scale_bar() + tm_format_NLD(legend.position = c("left", "bottom"), attr.position = c("right", "bottom"))
This plot has two groups of layers from the shape objects
us_sf (state borders). Each layer is called with
tm_shape. The shape objects can have different projections and can also cover different areas (bounding boxes).
The order of the layers dictates the plotting order. We first plot the percent white women by county and then add the state borders. The color ramp is specified using the palette names from the RColorBrewer package.
The compass and scale bar are added as separate layers. The layout format is specified by the
tm_format_NLD function as a set of prespecified options from the
Finally, we can plot total number of white women and the percentage of elderly white women for the states of Arkansas and Missouri, only.
tm_shape(cty_elderly[cty_elderly$state == "Arkansas" | cty_elderly$state == "Missouri", ]) + tm_polygons(c("Elderly", "Percent"), title = c("Elderly White Women", "Percentage of Total"), palette = list("Greens", "Purples")) + tm_scale_bar() + tm_compass() + tm_format_NLD(legend.position = c("left", "bottom"), attr.position = c("right", "top"))
The plot page can be rescaled to better position the compass, scale bar, and legend.
The format of the tmap map objects (meoms) are like those of the ggplot2 geometric objects (geoms) making it easy to get to a publication-quality map. The fine details can be worked out in production.
More information? See: https://cran.r-project.org/web/packages/tmap/vignettes/tmap-nutshell.html