8  Adding Labels and Text

8.1 Goals

  • Learn about adding text and labels to figures.
  • Introduce the {geomtextpath} package.

8.3 Highlighting moments in time

As noted above, proponents of the long peace theory claim that World War II (or thereabouts) was a key turning point in the history of international conflict. They say that conflicts after World War II were less prevalent (with major power conflict non-existent) and that the conflicts that did erupt were less deadly. Let’s focus on the first of these claims using some {peacesciencer} data. In the code below I open up the {tidyverse} and {peacesciencer}. I then create a country-year dataset that is populated with indicators for whether countries were involved in, or started, militarized interstate disputes (MIDs) with other countries in a given year.

## open packages 
library(tidyverse)
library(peacesciencer)

## create state-year data 
create_stateyears(
  subset_years = 1816:2010
) |> ## populate with conflict indicators
  add_gml_mids() -> dt

I’ll start by making a graph that shows conflict initiation over time. To do this, I group the data by year and then calculate the mean of the MID initiation indicator. Because the indicator is 1 when a country starts a MID and 0 otherwise, its mean is the share of countries that started a MID in that year. This gives me a collapsed dataset where each row is a year and where there are two columns, one for the year and the other for the MID initiation rate. Rather than save this summarized data as an object, I give it directly to ggplot() using the pipe operator. I then make a line plot that shows the MID initiation rate by year. I add a few additional touches, like making the y-axis show percentages, adding a title and subtitle, and turning off the axis titles. Notice that the title tells the audience my interpretation of the data: it looks like MID initiations experience a decline (albeit modest) after World War II.

dt |>
  group_by(year) |>
  summarize(
    mid_init_rate = mean(gmlmidonset_init)
  ) |>
  ggplot() +
  aes(x = year, y = mid_init_rate) +
  geom_line() +
  labs(
    x = NULL,
    y = NULL,
    title = "The conflict rate appears to decline after WWII",
    subtitle = "% of countries starting MIDs over time"
  ) +
  scale_y_continuous(
    labels = scales::percent
  )
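To see why averaging the indicator gives a rate, here is a minimal sketch on made-up data. The toy data frame and its column names are hypothetical and stand in for the real country-year data. I use base R's aggregate() here only to keep the example free of package dependencies; it does the same job as group_by() and summarize() above.

```r
## toy country-year data (made up): 1 = the country started a MID that year
toy <- data.frame(
  year        = c(1900, 1900, 1900, 1900, 1901, 1901, 1901, 1901),
  started_mid = c(1, 0, 0, 0, 1, 1, 0, 0)
)

## the mean of a 0/1 column is the share of rows equal to 1,
## so the yearly mean is the yearly initiation rate
toy_rates <- aggregate(started_mid ~ year, data = toy, FUN = mean)
toy_rates
```

One of four toy countries starts a MID in 1900 and two of four do in 1901, so the rates come out to 0.25 and 0.5.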

The above graph is fine, but it would be better if it explicitly highlighted World War II in time. As long as my audience knows that World War II started in 1939 and ended in 1945, they can just look at the x-axis, find those years, and determine whether they agree with my interpretation of the data. But making them do that work will annoy some people in my audience, and that’s never good. Thankfully, I have some ready-to-use tools in the {ggplot2} package to help my audience out.

In the below code I produce a graph just like the one above, but I add two additional layers. First, I use geom_vline() to draw a vertical line at 1945. Second, I use annotate() to include a “WWII Ends” label close to the vertical line to make it clear that I want to highlight when WWII ends.

dt |>
  group_by(year) |>
  summarize(
    mid_init_rate = mean(gmlmidonset_init)
  ) |>
  ggplot() +
  aes(x = year, y = mid_init_rate) +
  geom_line() +
  geom_vline(
    xintercept = 1945,
    color = "steelblue",
    linewidth = 1
  ) +
  annotate(
    geom = "text",
    x = 1945,
    y = 0.4,
    label = "WWII Ends",
    color = "steelblue",
    hjust = -0.1
  ) +
  labs(
    x = NULL,
    y = NULL,
    title = "The conflict rate appears to decline after WWII",
    subtitle = "% of countries starting MIDs over time"
  ) +
  scale_y_continuous(
    labels = scales::percent
  )
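The hjust value in annotate() controls where the label sits relative to its x position, which is why a small negative value nudges the text just to the right of the line. The sketch below uses a made-up toy dataset (not the MID data) to compare a few values side by side.

```r
library(ggplot2)

## hypothetical toy data, only so the plot has something to draw
toy <- data.frame(
  year = 1940:1950,
  rate = seq(0.30, 0.40, length.out = 11)
)

p_hjust <- ggplot(toy) +
  aes(x = year, y = rate) +
  geom_line() +
  geom_vline(xintercept = 1945, color = "steelblue") +
  ## hjust = 0: the label's left edge sits exactly at x = 1945
  annotate("text", x = 1945, y = 0.39, label = "hjust = 0", hjust = 0) +
  ## hjust = -0.1: pushed right by 10% of the label's own width
  annotate("text", x = 1945, y = 0.37, label = "hjust = -0.1", hjust = -0.1) +
  ## hjust = 1: the label's right edge sits exactly at x = 1945
  annotate("text", x = 1945, y = 0.35, label = "hjust = 1", hjust = 1)

p_hjust
```

Because the offset scales with the width of the label, a longer label moves further for the same hjust value.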

I can customize the vertical line to have it cover the whole World War II period, too. In the code below I center the line on the midpoint between 1939 and 1945, make it wider using the linewidth option (older versions of {ggplot2} called this size), and make it semi-transparent using alpha. I also change the label to just say “WWII” since I’m not just highlighting when World War II ended but the years it was ongoing.

dt |>
  group_by(year) |>
  summarize(
    mid_init_rate = mean(gmlmidonset_init)
  ) |>
  ggplot() +
  aes(x = year, y = mid_init_rate) +
  geom_line() +
  geom_vline(
    xintercept = (1939 + 1945) / 2,
    color = "steelblue",
    linewidth = 6,
    alpha = 0.4
  ) +
  annotate(
    geom = "text",
    x = 1945,
    y = 0.4,
    label = "WWII",
    color = "steelblue",
    hjust = -0.1
  ) +
  labs(
    x = NULL,
    y = NULL,
    title = "The conflict rate appears to decline after WWII",
    subtitle = "% of countries starting MIDs over time"
  ) +
  scale_y_continuous(
    labels = scales::percent
  )
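One caveat with a thick vertical line is that line width in {ggplot2} is measured in millimeters, not years, so how much of the x-axis it covers depends on the size of the figure. If the band needs to span exactly 1939 to 1945, one alternative is a shaded rectangle drawn with annotate(). The sketch below shows the idea on made-up toy data; it is an alternative approach, not the one used in this chapter.

```r
library(ggplot2)

## hypothetical toy data standing in for the yearly MID initiation rate
toy <- data.frame(
  year = 1930:1955,
  rate = seq(0.20, 0.45, length.out = 26)
)

p_band <- ggplot(toy) +
  aes(x = year, y = rate) +
  ## shaded band from 1939 to 1945 in data units;
  ## -Inf and Inf stretch it over the full height of the panel
  annotate(
    geom = "rect",
    xmin = 1939, xmax = 1945,
    ymin = -Inf, ymax = Inf,
    fill = "steelblue",
    alpha = 0.4
  ) +
  ## draw the data after the band so the line sits on top
  geom_line() +
  annotate(
    geom = "text",
    x = 1945, y = 0.4,
    label = "WWII",
    color = "steelblue",
    hjust = -0.1
  )

p_band
```

The band's edges stay at 1939 and 1945 no matter how the figure is resized.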

Using a combination of geom_vline() and annotate() works just fine in this example, but I can accomplish something very similar with less code using the {geomtextpath} package. To install it, run install.packages("geomtextpath") in the R console. Then load it with library(geomtextpath) to access a range of new geom functions that provide text or label versions of some basic {ggplot2} geoms. In the code below, I use geom_textvline() and specify that I want a vertical line at 1945 and that I want to layer on top of the line the text “WWII Ended.”

library(geomtextpath)
dt |>
  group_by(year) |>
  summarize(
    mid_init_rate = mean(gmlmidonset_init)
  ) |>
  ggplot() +
  aes(x = year, y = mid_init_rate) +
  geom_line() +
  geom_textvline(
    xintercept = 1945,
    label = "WWII Ended",
    color = "steelblue",
    hjust = 0.8,
    linewidth = 1
  ) +
  labs(
    x = NULL,
    y = NULL,
    title = "The conflict rate appears to decline after WWII",
    subtitle = "% of countries starting MIDs over time"
  ) +
  scale_y_continuous(
    labels = scales::percent
  )

I can also make the line wider to highlight the range of years that WWII was ongoing, as I do in the code below. I update a few additional things in the graph as well. First, I put the vertical line behind the line showing the rate of MID initiation by adding it to the plot before geom_line(). Second, I make the label bold and use vjust to shift it off the line and to the right.

dt |>
  group_by(year) |>
  summarize(
    mid_init_rate = mean(gmlmidonset_init)
  ) |>
  ggplot() +
  aes(x = year, y = mid_init_rate) +
  geom_textvline(
    xintercept = (1939 + 1945) / 2,
    label = "WWII",
    color = "steelblue",
    hjust = 0.8,
    linewidth = 6,
    fontface = "bold",
    vjust = 2
  ) +
  geom_line() +
  labs(
    x = NULL,
    y = NULL,
    title = "The conflict rate appears to decline after WWII",
    subtitle = "% of countries starting MIDs over time"
  ) +
  scale_y_continuous(
    labels = scales::percent
  )

We could distinguish between pre- and post-World War II trends in other ways. In the example below, I combine a scatter plot with a smooth plot, but I throw a few parlor tricks into the mix. First, I use geom_textsmooth() instead of the usual geom_smooth() so that I can add labels to the fitted regression lines. Second, I specify method = "lm" to indicate that it should fit a linear model and I specify formula = y ~ 1 to indicate that the model should include only an intercept. This forces the smoothed layer to effectively report an overall average. Finally, I group the smoothed layer by whether the year is after 1945 so that I can show a separate average for each period. I also add some labels to tell my audience what the reported lines represent. Looking at the data in this way brings into sharp relief the fact that the share of countries starting conflicts is higher after World War II than before, in direct contradiction with the long peace theory.

dt |>
  group_by(year) |>
  summarize(
    mid_init_rate = mean(gmlmidonset_init)
  ) |>
  ggplot() +
  aes(x = year, y = mid_init_rate) +
  geom_point(color = "gray") +
  geom_textsmooth(
    aes(
      group = year > 1945,
      label = ifelse(
        year > 1945, 
        "Post-1945 Average", 
        "Pre-1945 Average"
      )
    ),
    method = "lm",
    formula = y ~ 1,
    color = "red3",
    fontface = "bold.italic"
  ) +
  geom_textvline(
    xintercept = 1945,
    label = "WWII Ended",
    color = "steelblue",
    hjust = 0.8,
    linewidth = 2
  ) +
  labs(
    x = NULL,
    y = NULL,
    title = "The conflict rate appears to worsen after WWII",
    subtitle = "% of countries starting MIDs over time"
  ) +
  scale_y_continuous(
    labels = scales::percent
  )
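The claim that an intercept-only linear model just reports an average can be checked directly. Here is a minimal sketch in base R on made-up numbers; the toy data frame and the object names are hypothetical, and the values are not taken from the MID data.

```r
## toy yearly rates (made up), split at 1945 the same way as the plot
toy <- data.frame(
  year = c(1943, 1944, 1945, 1946, 1947, 1948),
  rate = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60)
)
toy$period <- ifelse(toy$year > 1945, "Post-1945", "Pre-1945")

## intercept-only models, one per period
fit_pre  <- lm(rate ~ 1, data = subset(toy, period == "Pre-1945"))
fit_post <- lm(rate ~ 1, data = subset(toy, period == "Post-1945"))

## plain group means for comparison
period_means <- tapply(toy$rate, toy$period, mean)

coef(fit_pre)   # matches period_means["Pre-1945"]
coef(fit_post)  # matches period_means["Post-1945"]
period_means
```

Each fitted intercept equals the mean of its period, which is what the two horizontal lines in the plot are showing.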