DRAFT: ESA compliance with remotely sensed data

Overview

Section 7 compliance monitoring analyses

This vignette captures the analyses in a forthcoming manuscript (preprint + peer-reviewed paper) evaluating the utility of remotely sensed data for monitoring compliance with section 7 of the US Endangered Species Act. The section 7 consultation data - work types, coordinates, etc. - are from the TAILS database from the US Fish and Wildlife Service (FWS); see Malcom and Li 2015, PNAS for more information.

Data Summary

The dataset includes 364 consultations randomly selected from among over 44,000 consultations recorded with coordinates by FWS from 2008-2015.

Top of the data

head(dat)

dim(dat)

## [1] 364  35

Formal and informal consultation

Although FWS’s consultation data shows that formal consultations only make up ~7% of all consultations, actions only undergo formal consultation if an ESA- listed species is likely to be harmed. That is, we are generally more interested in actions that have undergone formal consultation. The 182 formal consultations we investigated are therefore a “denser” sampling of all formal consultations than the 182 samples of informal consultations.

Geography

The locations of the consultations in our sample generally reflected the distribution of consultations observed by Malcom and Li (2015): formal consultations were more likely to come from the western US, informal consultations more likely from the East. The bias in which offices record coordinates is apparent with the lack of formal consultations from some states, like Colorado (n ~ 500 formal consultations, but rarely records coordinates) and Washington (n ~ 500 formal consultations, high data recording rate).

### A static map in case we want one for Supp. Info.
#
# to_map <- dplyr::filter(dat, dat$lat_dec_deg.x > 20 & dat$lat_dec_deg.x < 60)
# map <- qmplot(
#     data = to_map,
#     x = long_dec_deg.x, 
#     y = lat_dec_deg.x, 
#     maptype = "toner-lite",
#     color = formal_in,
#     size = I(2),
#     alpha = I(0.5),
#     extent = "normal",
#     zoom = 5) +
#   theme_bw() +
#   labs(x = "Longitude",
#        y = "Latitude") +
#   scale_color_viridis(discrete = TRUE, name = "Type")
# ggsave(plot = map, "inst/figs/map.pdf")

cols <- substr(
  viridis(3), 0, 7
)

dat$formal_colors <- ifelse(
  dat$formal_in == "formal",
  cols[1],
  cols[3]
)

tags$div(
  style = "background-color: #e9e9e9",
  leaflet(dat) %>%
    setView(lng=-95, lat=38, zoom = 4) %>%
    addProviderTiles("Stamen.TonerLite") %>%
    mapOptions(zoomToLimits = "first") %>%
    addCircleMarkers(
      lng= ~long_dec_deg.x, 
      lat= ~lat_dec_deg.x,
      stroke = FALSE,
      fillOpacity = 0.5,
      fillColor = ~formal_colors,
      popup = ~activity_code
    )
)

Other variables summaries

Top lead agencies

head(sort(table(as.character(dat$lead_agency)), decreasing = TRUE), 10)

## 
##                       Army Corps of Engineers 
##                                            90 
##                       -- OTHER: CONSULTANT -- 
##                                            53 
##                Federal Highway Administration 
##                                            40 
##                                Forest Service 
##                                            16 
##                      Bureau of Indian Affairs 
##                                            15 
##                     Fish and Wildlife Service 
##                                            12 
##             Federal Communications Commission 
##                                            11 
## California Dept. of Transportation District 6 
##                                            10 
##      Community Planning and Development (HUD) 
##                                            10 
## California Dept. of Transportation District 3 
##                                             9

Top 10 lead ES offices

head(sort(table(as.character(dat$ESOffice.x)), decreasing = TRUE), 10)

## 
##            SACRAMENTO FISH AND WILDLIFE OFFICE 
##                                             69 
##  PENNSYLVANIA ECOLOGICAL SERVICES FIELD OFFICE 
##                                             50 
## SOUTH FLORIDA ECOLOGICAL SERVICES FIELD OFFICE 
##                                             26 
##            WASHINGTON FISH AND WILDLIFE OFFICE 
##                                             23 
##               VENTURA FISH AND WILDLIFE OFFICE 
##                                             21 
##   MISSISSIPPI ECOLOGICAL SERVICES FIELD OFFICE 
##                                             18 
##     TENNESSEE ECOLOGICAL SERVICES FIELD OFFICE 
##                                             17 
##                 IDAHO FISH AND WILDLIFE OFFICE 
##                                             11 
##   BLOOMINGTON ECOLOGICAL SERVICES FIELD OFFICE 
##                                              9 
##       RALEIGH ECOLOGICAL SERVICES FIELD OFFICE 
##                                              9

Fiscal years

table(dat$FY)

## 
## 2008 2009 2010 2011 2012 2013 2014 2015 
##   70   65   62   44   39   33   34   17

Action Observability

Majority of actions not found

Before starting to explore the remotely sensed data, two authors (TK and JM) independently scored each action type for expected observability, with:

0: not expected to be observered;
0.5: may be observed or unknown whether to expect; and
1: expected to be observed.

After independent scoring, we met and reconciled the expected scores. The authors - primarily TK - then attempted to find all 364 consultations in aerial images using Google Earth and the coordinates provided by FWS.

First, the overall expected and observed observability for formal and informal consultations:

expected <- aggregate(reconcile ~ formal_in, 
                      data = dat, 
                      FUN = mean, na.rm = TRUE)
names(expected)[2] <- "mean"
expected$OE <- "Exp"
observed <- aggregate(action_found ~ formal_in, 
                      data = dat, 
                      FUN = mean, na.rm = TRUE)
names(observed)[2] <- "mean"
observed$OE <- "Obs"
overall <- rbind(expected, observed)
overall <- select(overall, OE, formal_in, mean)
names(overall) <- c("Obs/Exp", "Formal/Informal", "Mean")
kable(overall, digits = 3)

Obs/Exp	Formal/Informal	Mean
Exp	formal	0.538
Exp	informal	0.657
Obs	formal	0.389
Obs	informal	0.397

Next, we break the categories down to identify which classes changed between the expected and observed stages, e.g., if “maybe”s were, in reality, not observable:

values <- c("No", "Maybe", "Yes")
for_exp <- table(filter(dat, formal_in == "formal")$reconcile)
for_obs <- table(filter(dat, formal_in == "formal")$action_found)
inf_exp <- table(filter(dat, formal_in == "informal")$reconcile)
inf_obs <- table(filter(dat, formal_in == "informal")$action_found)
new <- data_frame(
  OE = c(rep("expected", 6), rep("observed", 6)),
  `in/formal` = c(rep("formal", 3), rep("informal", 3),
           rep("formal", 3), rep("informal", 3)),
  observed = rep(values, 4),
  frequency = c(for_exp, inf_exp, for_obs, inf_obs)
)

ggplot(data = new, aes(x = observed, y = frequency)) +
  geom_bar(stat = "identity") +
  labs(x = "Action observability",
       y = "# consultations") +
  facet_grid(OE ~ `in/formal`) +
  theme_hc()

The “Maybe” categories for expected were mostly “No” when observed, for both formal and informal consultations.

new

Observability by Work Category

High variation among work categories

Determining the types of work that are most amenable to monitoring from the standpoint of observability - whether the effects of the action are observable in aerial or satellite imagery - is a key challenge. Before showing observability rate, we the sample size by work type and by formal/informal consultation:

n_obs <- as_data_frame(table(dat$work_category, dat$formal_in))
names(n_obs) <- c("work_cat", "formal_in", "n")

marg <- ggplot(n_obs, aes(x = n, y = work_cat)) +
  geom_lollipop(horizontal = TRUE, point.size = 3) +
  facet_grid(. ~ formal_in) +
  labs(x = NULL, y = "Sample size") +
  theme_hc()
marg

# Remove the y-axis labels to make compound fig and save for the ms.
# marg <- marge +
#   theme(axis.title.y=element_blank(),
#         axis.text.y=element_blank(),
#         axis.ticks.y=element_blank())
# ggsave("inst/figs/work_cat_counts.pdf", height = 9.75, width = 3.25)

And now the mean observability by work category and formal/informal consultation:

type_obs <- aggregate(
  action_found ~ work_category + formal_in,
  data = dat,
  FUN = mean, na.rm = TRUE
)
names(type_obs) <- c("category", "in/formal", "observability")

fig <- ggplot(type_obs, aes(y = category, x = `in/formal`, fill = observability)) +
  geom_tile(color = "white", size = 0.1) +
  labs(x = "Consultation Type", y = "") +
  scale_y_discrete(limits = levels(type_obs$category)) +
  scale_fill_viridis(name="Mean Observability") +
  theme_hc()
# fig
# ggsave("inst/figs/work_cat_obs.pdf", height = 9.75, width = 6.5)

# Use ggplotly for this HTML vignette:
ggplotly(fig)

There has been a slight decrease in observability over time, but that decrease is not statistically significant:

cor.test(dat$action_found, dat$FY)

## 
##  Pearson's product-moment correlation
## 
## data:  dat$action_found and dat$FY
## t = -0.91304, df = 358, p-value = 0.3618
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1508101  0.0554384
## sample estimates:
##         cor 
## -0.04819961

Expected Volume of Consultations

We expect ~40% overall observability if all coordinates recorded…

get_observabilities <- function(dat, formal_in) {
  type_mean <- tapply(dat$action_found,
                      INDEX = dat$work_type,
                      FUN = mean, na.rm = TRUE)
  type_median <- tapply(dat$action_found,
                      INDEX = dat$work_type,
                      FUN = median, na.rm = TRUE)
  if (formal_in == "formal") {
      type_count <- tapply(dat$N_formal,
                           INDEX = dat$work_type,
                           FUN = mean, na.rm = TRUE)
  } else {
      type_count <- tapply(dat$N_consultations,
                           INDEX = dat$work_type,
                           FUN = mean, na.rm = TRUE)
  }
  expect_to_see_all <- type_mean * type_count

  # And these are the results at the highest level:
  tot_num_formal <- sum(type_count, na.rm = TRUE)
  exp_num_see <- sum(expect_to_see_all, na.rm = TRUE)
  obs_rate <- round(exp_num_see / tot_num_formal, 2)
  in_set <- round(tot_num_formal, 0)
  exp_obs <- round(exp_num_see, 0)
  return(c(obs_rate, in_set, exp_obs))
}

rows <- c("Observability | work type", 
          "# consultations in set",
          "# consultations expect to see effects")
exp_df <- data_frame(
  value = rows,
  formal = get_observabilities(formal, "formal"),
  informal = get_observabilities(informal, "informal")
)
exp_df

Note: The number of consultations in each set is the number in the work categories we examined, not the total number of consultations with coordinates. There is a long tail of work categories with few consultations.

…but work type matters a lot

Formal consultations:

make_scatter_df <- function(dat, cons_dat, formal_in) {
  type_mean <- tapply(dat$action_found,
                      INDEX = dat$work_type,
                      FUN = mean, na.rm = TRUE)
  type_median <- tapply(dat$action_found,
                      INDEX = dat$work_type,
                      FUN = median, na.rm = TRUE)
  if (formal_in == "formal") {
      type_count <- tapply(dat$N_formal,
                           INDEX = dat$work_type,
                           FUN = mean, na.rm = TRUE)
  } else {
      type_count <- tapply(dat$N_consultations,
                           INDEX = dat$work_type,
                           FUN = mean, na.rm = TRUE)
  }
  expect_to_see_all <- type_mean * type_count

  tmp_dat <- data.frame(type_count = as.vector(type_count),
                        type_mean = as.vector(type_mean),
                        work = names(type_count))
  work_cat_type <- data.frame(cat = cons_dat$work_category,
                              work = as.character(cons_dat$work_type))
  work_cat_type$uniq <- duplicated(work_cat_type$work)
  work_cat_type <- work_cat_type[work_cat_type$uniq == FALSE, ]
  tmp_dat <- inner_join(tmp_dat, work_cat_type, by = "work")
  return(tmp_dat)
}
form_obs_dat <- make_scatter_df(formal, form_cons, "formal")

tags$div(
  plot_ly(data = form_obs_dat,
          type = "scatter",
          mode = "markers",
          y = form_obs_dat$type_count,
          x = round(form_obs_dat$type_mean, 2),
          text = paste("Work type:", 
                       form_obs_dat$work, 
                       "<br>Work category:", 
                       form_obs_dat$cat),
          marker = list(
            color = substr(
              viridis(n = length(unique(form_obs_dat$cat))), 0, 7
            ),
            opacity = 0.6,
            size = 20)
          ) %>%
  layout(yaxis = list(title = "# actions"),
         xaxis = list(title = "Prop. observed"))
)

tags$br()

tags$br()

Informal consultations:

inform_obs_dat <- suppressWarnings(
  make_scatter_df(informal, inform_cons, "informal")
)

tags$div(
  plot_ly(data = inform_obs_dat,
          type = "scatter",
          mode = "markers",
          y = inform_obs_dat$type_count,
          x = round(inform_obs_dat$type_mean, 2),
          text = paste("Work type:", 
                       inform_obs_dat$work, 
                       "<br>Work category:", 
                       inform_obs_dat$cat),
          marker = list(
            color = substr(
              viridis(n = length(unique(inform_obs_dat$cat))), 0, 7
            ),
            opacity = 0.6,
            size = 20)
          ) %>%
  layout(yaxis = list(title = "# actions"),
         xaxis = list(title = "Prop. observed"))
)

Types of Change

Natural habitat to development is dominant type of change

We recorded the type of habitat change observed at each found action site to evaluate the types of change that are consulted on:

trans_hab <- function(x) {
    if (is.na(x) | is.null(x)) NA
    else if (x == 1) "natural -> natural"
    else if (x == 2) "natural -> agriculture"
    else if (x == 3) "natural -> development"
    else if (x == 4) "agriculture -> development"
    else "development -> development"
}

tmp <- sapply(dat$hab_chg, trans_hab)
conv_type <- table(tmp, dat$formal_in)
conv_type <- data_frame(
  conversion = row.names(conv_type),
  formal = conv_type[, 1],
  informal = conv_type[, 2]
)
conv_type

Imagery Dates

Some images are old, but most are recent

Knowing the distribution of the oldest average aerial images is useful for anticipating how far we can “turn the clock back” to track habitat changes at consultation sites.

summary(dat$earliest_date, na.rm = T)

##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "1939-12-01" "1993-03-01" "1994-02-01" "1992-09-24" "1995-02-01" 
##         Max.         NA's 
## "2005-03-01"          "3"

ggplot(dat, aes(earliest_date)) +
    geom_histogram(colour="white") +
    labs(x = "Earliest Aerial Image Date") +
    theme_hc()