This is a tutorial on how to map spatial data with R. In this
tutorial, census data will be obtained using two different methods and
then mapped to show the housing density in Travis County - Texas,
emphasizing Downtown Austin in the years 1990 and 2000. This tutorial is
in two parts. In part one, census data will be downloaded from the NHGIS website and then mapped. The
second part of this tutorial follows an easier method of census data
extraction using the tidycensus package
and how to map the obtained data. The Census Geography used in this
tutorial is Block Groups.
Objectives
This tutorial assumes that the learner is familiar with or has basic knowledge of Geographic Information Systems. Now let’s begin! :)
The first step to mapping the census data obtained from the NHGIS
website is to look at the dataset and then process it for mapping. This
section aims to import the downloaded dataset into R and make them clean
enough for mapping. The dplyr tool is required to help with
easy data manipulation. This function forms part of the tidyverse package,
thus, while it can be installed and opened individually, installing and
opening the full tidyverse is ideal for this tutorial. The
sf package is
useful for handling spatial vector data and will be useful in this
tutorial. This tutorial assumes that all the required packages have
already been installed in R. If any package has not been installed, use
the install.packages() function to install it.
Click on the download links above this text to download the datasets used in the first part of this tutorial.
#Open the required libraries
library(tidyverse)
library(sf)
#Import the data and store it under a new name
Data1990<-read.csv("INSERT YOUR FILE PATH HERE")
#Have a glance at the data
head(Data1990)
## GISJOIN YEAR STUSAB ANRCA AIANHHA RES_ONLYA TRUSTA AIANCC RES_TRSTA
## 1 G480001095011 1990 TX NA NA NA NA NA NA
## 2 G480001095012 1990 TX NA NA NA NA NA NA
## 3 G480001095013 1990 TX NA NA NA NA NA NA
## 4 G480001095021 1990 TX NA NA NA NA NA NA
## 5 G480001095022 1990 TX NA NA NA NA NA NA
## 6 G480001095023 1990 TX NA NA NA NA NA NA
## BLOCKA BLCK_GRPA TRACTA CD101A C_CITYA CMSA COUNTY COUNTYA CTY_SUBA
## 1 NA 1 9501 NA NA 99 Anderson 1 NA
## 2 NA 2 9501 NA NA 99 Anderson 1 NA
## 3 NA 3 9501 NA NA 99 Anderson 1 NA
## 4 NA 1 9502 NA NA 99 Anderson 1 NA
## 5 NA 2 9502 NA NA 99 Anderson 1 NA
## 6 NA 3 9502 NA NA 99 Anderson 1 NA
## COUSUBCC DIVISIONA MSA_CMSAA PLACEA PLACECC PLACEDC PMSAA REGIONA STATE
## 1 NA 7 9999 NA NA NA 9999 3 Texas
## 2 NA 7 9999 NA NA NA 9999 3 Texas
## 3 NA 7 9999 NA NA NA 9999 3 Texas
## 4 NA 7 9999 NA NA NA 9999 3 Texas
## 5 NA 7 9999 NA NA NA 9999 3 Texas
## 6 NA 7 9999 NA NA NA 9999 3 Texas
## STATEA URBRURALA URB_AREAA CD103A AREALAND AREAWAT ANPSADPI FUNCSTAT INTPTLAT
## 1 48 NA NA NA 165167 4035 BG 1 S 31989858
## 2 48 NA NA NA 15451 37 BG 2 S 32061100
## 3 48 NA NA NA 231306 232 BG 3 S 31974332
## 4 48 NA NA NA 288742 2468 BG 1 S 31969331
## 5 48 NA NA NA 102001 243 BG 2 S 31874321
## 6 48 NA NA NA 114305 1286 BG 3 S 31816593
## INTPTLNG PSADC ESA001
## 1 -95493659 NA 504
## 2 -95499077 NA 548
## 3 -95618887 NA 493
## 4 -95754825 NA 607
## 5 -95817188 NA 227
## 6 -95784733 NA 401
Fortunately, the data used in this tutorial is structured with little processing to be done. From the table, the Total Number of Housing Units is assigned to column ESA001. The GISJOIN Column is a character variable that gives a unique ID to each row and it is very useful for Spatial Joins. The TRACTA Column provides the census tract code for each county and it is very useful for selecting places within each County.
#Now let's clean the data and filter out Travis from the COUNTY variable
#Also, select the needed variables and then rename the Total Housing Units
Data1990<-Data1990|>filter(COUNTY=="Travis")|>
select(GISJOIN, COUNTY, TRACTA, ESA001)|>
rename(HousingUnits=ESA001)
#Now import the shapefile
Shape1990<-read_sf("INSERT YOUR FILE PATH HERE")
#Take a look at the shapefile
head(Shape1990)
## Simple feature collection with 6 features and 8 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -4985513 ymin: 3283276 xmax: -3806974 ymax: 3838300
## Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
## # A tibble: 6 × 9
## FIPSSTCO TRACT GROUP STFID GISJOIN GISJOIN2 SHAPE_AREA SHAPE_LEN
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 02013 990000 1 020139900001 G020013099001 0200130… 9.73e9 2054992.
## 2 02013 990000 2 020139900002 G020013099002 0200130… 1.19e9 966449.
## 3 02013 990000 3 020139900003 G020013099003 0200130… 6.60e9 1423147.
## 4 02013 990000 4 020139900004 G020013099004 0200130… 6.89e8 422220.
## 5 02013 990099 4 020139900994 G02001309900… 0200130… 2.48e2 71.9
## 6 02016 961900 4 020169619004 G020016096194 0200160… 5.73e9 2433284.
## # ℹ 1 more variable: geometry <MULTIPOLYGON [m]>
#Now join the dataset to the shapefile using GISJOIN
Data1990<-merge(Data1990, Shape1990, by="GISJOIN")
#Take a glance
head(Data1990)
## GISJOIN COUNTY TRACTA HousingUnits FIPSSTCO TRACT GROUP STFID
## 1 G48045300001011 Travis 101 412 48453 000101 1 484530001011
## 2 G48045300001012 Travis 101 370 48453 000101 2 484530001012
## 3 G48045300001013 Travis 101 766 48453 000101 3 484530001013
## 4 G48045300001014 Travis 101 172 48453 000101 4 484530001014
## 5 G48045300001015 Travis 101 235 48453 000101 5 484530001015
## 6 G48045300001021 Travis 102 710 48453 000102 1 484530001021
## GISJOIN2 SHAPE_AREA SHAPE_LEN geometry
## 1 48045300001011 966215.4 5581.166 MULTIPOLYGON (((-168128.1 -...
## 2 48045300001012 469290.9 3368.958 MULTIPOLYGON (((-168190.1 -...
## 3 48045300001013 525051.4 4426.543 MULTIPOLYGON (((-168017.8 -...
## 4 48045300001014 536481.6 3523.410 MULTIPOLYGON (((-167857.5 -...
## 5 48045300001015 630604.9 3650.997 MULTIPOLYGON (((-168539.5 -...
## 6 48045300001021 3091207.1 9995.464 MULTIPOLYGON (((-168997.9 -...
The main objective of this part of the tutorial is to map the Housing Density for the State of Texas Using 1990 Block Group Census Data. The first part of this tutorial is focused on the importation, cleaning, and mapping of census data from NHGIS. Now that we have the data in a nicely structured format, it is time to make simple spatial calculations with R. From the data, it can be seen that the simple feature type is Multipolygon (a set of polygons). Knowing this, we can easily calculate the total land area in each census block group and then calculate the residential or housing density for Travis County at the block group level.
#First, let's convert the data into sf object and then use the appropriate projection for the State of Texas
Data1990<-st_as_sf(Data1990)
#Now find the best CRS for the dataset and then apply it to the dataset. The crsuggest library will help with this.
library(crsuggest)
suggest_crs(Data1990)
## # A tibble: 10 × 6
## crs_code crs_name crs_type crs_gcs crs_units crs_proj4
## <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 6588 NAD83(2011) / Texas South Cent… project… 6318 us-ft +proj=lc…
## 2 6587 NAD83(2011) / Texas South Cent… project… 6318 m +proj=lc…
## 3 3674 NAD83(NSRS2007) / Texas South … project… 4759 us-ft +proj=lc…
## 4 3673 NAD83(NSRS2007) / Texas South … project… 4759 m +proj=lc…
## 5 32140 NAD83 / Texas South Central project… 4269 m +proj=lc…
## 6 32040 NAD27 / Texas South Central project… 4267 us-ft +proj=lc…
## 7 2919 NAD83(HARN) / Texas South Cent… project… 4152 us-ft +proj=lc…
## 8 2847 NAD83(HARN) / Texas South Cent… project… 4152 m +proj=lc…
## 9 2278 NAD83 / Texas South Central (f… project… 4269 us-ft +proj=lc…
## 10 6580 NAD83(2011) / Texas Centric La… project… 6318 m +proj=lc…
#Project using the NAD83(2011) / Texas South Central
Data1990<-Data1990|>st_transform(6587)
#Now that the dataset has been converted to sf objects successfully and projected, let's calculate the area for each block group
#Getting the area will help with calculating the housing density.
#The Units package will be useful for setting the unit for our calculations.
library(units)
#Calculate the area and store it under a new column
Data1990$AreaPerSqKm<-round(set_units(st_area(Data1990$geometry), km^2), digits = 2)
#Now let's calculate the housing Density
Data1990$HousingDensity<-Data1990$HousingUnits/Data1990$AreaPerSqKm
#Let's check if our calculations and projection worked as desired
Data1990|>select(HousingDensity, AreaPerSqKm)|> head(5)
## Simple feature collection with 5 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 718794.5 ymin: 4274595 xmax: 720665.8 ymax: 4278177
## Projected CRS: NAD83(2011) / Texas South Central
## HousingDensity AreaPerSqKm geometry
## 1 424.7423 [1/km^2] 0.97 [km^2] MULTIPOLYGON (((720048.3 42...
## 2 787.2340 [1/km^2] 0.47 [km^2] MULTIPOLYGON (((720023.4 42...
## 3 1445.2830 [1/km^2] 0.53 [km^2] MULTIPOLYGON (((720242.5 42...
## 4 318.5185 [1/km^2] 0.54 [km^2] MULTIPOLYGON (((720407.7 42...
## 5 373.0159 [1/km^2] 0.63 [km^2] MULTIPOLYGON (((719648.6 42...
#Hurray! It worked.
Now that we have the needed variables from our calculations, it is
now time to visualize or map the census data. In this part, we are going
to map our data using the tmap package.
The Classification method used here is manual interval. In
tmap, the breaks can be passed through breaks in
the polygon layer as a vector.
#Open tmap
library(tmap)
#Create the map and store it as an Object
#The classification method used is manual interval.
Map1<-tm_shape(Data1990)+
tm_polygons(col = "HousingDensity",
palette="Greens", title="Housing Density (SqKm)",
breaks=c(0, 1000, 2000, 3000, 4000, 5000),
labels=c("0 to 999", "1000 to 1999","2000 to 2999", "3000 to 3999", "4000 to 5000"))+
tm_layout(frame = FALSE,
legend.outside = TRUE)
#View Map
Map1
Hurray! We have created our first map for this tutorial in a few easy steps. It is now time to locate Downtown Austin from Travis County and then and then create an inset map showing the housing density in the Downtown Austin Neighborhood.
To create the inset map, we have to first locate the Downtown Austin Neighborhood from the dataset and then create a subset. A simple way to locate the city of Austin from our dataset is to use the Tract code. From our data, the tract code for each census geography is stored under the TRACTA column. Now the question is; how can we find the tract code for Travis County?
While a simple search will do the magic, the Census Geocoder Website is very efficient. Note that, tract codes change with time, and thus will strongly recommend checking the updated tract codes on the Geocoder website provided above. For this tutorial, refer to the Census Bureau Link Here to view the census tracts for the year 1990.
#The census tract codes for Downtown Austin in 1990 are 7 and 11.
#The neighborhood had two census tracts in 1990. Now lets create a subset from the dataset.
Downtown1990<-Data1990[Data1990$TRACTA%in%c("7", "11"), ]
#View the subset
view(Downtown1990)
#It worked!
In this step, we are going to create a bounding box to mark the boundary of the neighborhood. We can do this by creating a simple feature list column from our subset and then adding it to Map1.
#Create bounding box for the neighborhood
BoundingBox<-st_bbox(Downtown1990)|>st_as_sfc()
#Now add the bounding box to Map1 and Store it as Map2, also add a compass to the map
Map2<-Map1+tm_shape(BoundingBox)+tm_polygons(border.col = "orangered", lwd=1.5, alpha = 0)+
tm_compass(type = "8star", size = 3, position = c("right","top"))
#Add a Map title
Map2<-Map2+tm_layout(title = "Housing Density in Travis County(1990)-\nEmphasis on Downtown Austin")
#View Map
Map2
Yay! We created our second map for this tutorial. We are making Progress!
Finally, let’s create the inset map to be added to the final map.
#Create a map from the Downtown1990 dataset with similar breaks as that of Map1
Map3<-tm_shape(Downtown1990, bbox = BoundingBox)+
tm_polygons(col = "HousingDensity",
palette="Greens",
breaks=c(0, 1000, 2000, 3000, 4000, 5000),
labels=c("0 to 999", "1000 to 1999","2000 to 2999", "3000 to 3999", "4000 to 5000"))+
tm_layout(legend.show = FALSE, frame.lwd = 2, frame = "orangered")
#View the Map
Map3
Now let’s add Map3 as an inset map to Map2. Here,
we will convert the maps to Grid Graphic Objects. The reason for doing
this is to enable us to draw the plots and store them as a single plot.
In this tutorial, we will convert the plots to grob using
the tm_grob function since we are working with tmaps. After
that, we will use functions from the cowplot package
to draw the grobs (Graphic Objects).
#Open cowplot
library(cowplot)
#Export the plots to grobs
#Note the aspect ratios generated for each grob. This will help with placements and adjustments.
Map2Grob<-tmap_grob(Map2)
Map3Grob<-tmap_grob(Map3)
#Now draw the plots and store it as a single object using ggdraw
FinalMap<-ggdraw()+
draw_grob(Map2Grob)+
draw_grob(Map3Grob, x=0.25, height = 0.4, y=0.1)
#View the Map
FinalMap
Hurray! We have created our final map for part one of this tutorial. We can now export the map and use it as needed. This marks the end of Part One of this tutorial. See you in Part Two! :)
It’s nice to see you again :) I am glad to see you make progress!
In this part of the tutorial, We are going to create a similar map just like the one we made in part one. This means we are going to create a housing density map for Travis County-Texas with an emphasis on Downtown Austin. However, this part differs from that of the previous with regards to data importation, year, and mapping package.
Here, we are going to download our census data from the United States
Census Bureau using the tidycensus package and then map the
census data using ggplot2. The aim of this part of the
tutorial is learning alternatives to accessing census data and mapping
them. R provides various alternatives to mapping spatial data and it is
almost impossible to cover all in this tutorial. However, we will cover
most of them in upcoming tutorials.
The tidycensus package is designed to provide persons who are
interested in working with census data easy access to pre-processed
census data in R. Take a look at this tidycensus package Link
to learn the package’s syntax. Before we proceed, we will need an API
Key to access the data. Click Here to
request the Census Data API Key.
Now let’s download the required data using tidycensus. Here, we will be using decennial data for the year 2000. We will be using summary file 1 from the 2000 decennial census since we are working with demographic data.
#Let's take a look at the variables for the year 1990 and then Identify the the name of the variable needed for our project
library(tidycensus)
Variables<-load_variables(year = 2000, "sf1")
#From the table we can see that the total housing unit variable is named 'H001001'. We will be using this to download our data.
Data2000<-get_decennial(geography = "block group", variables = c(HousingUnits="H001001"), year=2000, state = "TX",
county = "Travis", key = "INSERT YOUR CENSUS API KEY HERE",
sumfile = "sf1", geometry = TRUE)
#Now let's structure our data in a nice way.
Data2000<-Data2000|>pivot_wider(names_from = variable, values_from = value)
From the code above, we downloaded our data using tidycensus while
setting geometry to true to enable us to map
our data. The data downloaded from the tidycensus package is
pre-prepared and thus, there is little or nothing to do with regard to
data cleaning for this tutorial.
Now let’s map our data using ggplot2.
#Calculate the area and store it under a new column
Data2000$AreaPerSqKm<-round(set_units(st_area(Data2000$geometry), km^2), digits = 2)
#Now let's calculate the housing Density
Data2000$HousingDensity<-Data2000$HousingUnits/Data2000$AreaPerSqKm
#Now let's drop the units. This is to enable easy mapping with geom_sf
Data2000$HousingDensity<-drop_units(Data2000$HousingDensity)
#Project using the same projection used in Part 1
Data2000<-Data2000|>st_transform(6587)
Notice that we did not convert the data to sf features. That is because the data from the tidycensus package is pre-processed for mapping.
#Create a map for Housing Density in Travis County using ggplot2 and then store it as an object
Gmap1<-ggplot(data = Data2000)+
geom_sf(aes(fill=HousingDensity))+
scale_fill_distiller(palette = 4, direction = 1)+theme_map()
Gmap1
#Applying theme_map() from the cowplot package as a layer removes all the grids from the plot.
#Alternatively, you can remove the grids manually. This method is shown as follows;
Gmap1<-ggplot(data = Data2000)+
geom_sf(aes(fill=HousingDensity))+
theme_minimal()+
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank())+
scale_fill_distiller(palette = "Greens",
breaks=c(1000, 2000, 3000, 4000, 5000),
limits=c(0, 5200), direction = 1)+
labs(fill = "Housing Density(SqKm)")
Gmap1
#Now let's add some elements to our map. We can do this by installing ggspatial
library(ggspatial)
Gmap1<-Gmap1+annotation_north_arrow(location="tr", style = north_arrow_nautical())
#View Map
Gmap1
Notice how we passed the breaks through geom_sf(). Now let’s create
an inset map and then add it the map. Click
This Link to learn how ggspatial works.
Click Here to view the Census Tract Outline Map for Travis County for the year 2000.
#The census tract codes for Downtown Austin in 2000 are 7 and 11.
#The neighborhood had two census tracts in 2000. Now lets create a subset from the dataset.
#Since the column is character(string), we will select using str_detect()
Downtown2000<-Data2000[str_detect(Data2000$NAME, "Census Tract 7|Census Tract 11"), ]
#Now let's create a bounding box
BoundingBox2<-st_bbox(Downtown2000)|>st_as_sfc()
#Add the bounding box to the map
Gmap2<-Gmap1+geom_sf(data = BoundingBox2, color="orangered", alpha=0, lwd=0.5)+
theme(legend.position = c(0.1, 0.15))
#Add a map title
Gmap2<-Gmap2+labs(title="Housing Density in Travis County(2000) - Emphasis on Downtown Austin")
#View Map
Gmap2
#Map for Downtown Austin
Gmap3<-ggplot(data = Downtown2000)+
geom_sf(aes(fill=HousingDensity))+
theme_minimal()+
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
legend.position = "none",
panel.border = element_rect(color = "orangered",
fill = NA,
size = 0.7))+
scale_fill_distiller(palette = "Greens",
breaks=c(1000, 2000, 3000, 4000, 5000),
limits=c(0, 5200), direction = 1)
#View Map
Gmap3
#Now draw the two maps using cowplot.
#We will first convert the plots to grobs using ggplotGrob since we are working with ggplot2
GmapGrob1<-ggplotGrob(Gmap2)
GmapGrob2<-ggplotGrob(Gmap3)
FinalMap2<-ggdraw()+
draw_grob(GmapGrob1)+
draw_grob(GmapGrob2, x=0.37, height = 0.4)
#View Map
FinalMap2
Comparing the map in part one to the map in part two, it can be seen that there has been a significant increase in housing or residential density from the year 1990 to 2000 across various block groups. For this tutorial, we can compare the two maps as a form of time series analysis. This is because the census tracts and block groups for the Downtown Austin Neighborhood remained the same from 1990 to 2000. However, in situations where the census tracts are not the same or changed, it is required to make all the tracts the same before a time series analysis can be made. This method of converting different census tracts into one that looks the same across time is a form of “Spatial Interpolation”. In upcoming tutorials, we will work with datasets that require us to use this method.
Yay! We did it! We have achieved all the goals set for this tutorial.
In this tutorial, we downloaded census data using the tidycensus
package, imported census data from the NHGIS website, and mapped census
data using ggplot2 and the
tmap (Thematic Map) packages in R for a simple spatial
analysis. I hope you found this tutorial helpful.
Steven Manson, Jonathan Schroeder, David Van Riper, Katherine Knowles, Tracy Kugler, Finn Roberts, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 18.0 [dataset]. Minneapolis, MN: IPUMS. 2023. http://doi.org/10.18128/D050.V18.0
IPUMS NHGIS, University of Minnesota, www.nhgis.org.
END OF TUTORIAL