library(tidyverse)
library(spatstat)
library(rstanarm)
This page is part of a code repository (https://github.com/l-a-yates/ravens) to reproduce the analyses in the manuscript “Roadkill islands: carnivore extinction shifts seasonal use of roadside carrion by generalist avian scavenger” by Fielding, Matthew; Buettel, Jessie C; Brook, Barry; Stojanović, Dejan; Yates, Luke (2021). The data are available from the accompanying repository https://doi.org/10.6084/m9.figshare.14036807.v1.
Here we provide a simple worked example of the model-fitting process for Poisson point-process models. We make use of the package spatstat
to generate a quadrature scheme using a small subset of the data from the manuscript. The scheme permits numerical approximation of the model likelihood using a generalised linear model (glm). We demonstrate model fitting and model comparison using both maximum-likelihood and Bayesian approaches. These approaches can be applied to other transect data and point patterns.
Load the data file, a spatstat::hyperframe
object:
data <- readRDS("data/ravens_KF_hyper.rds"); data
## Hyperframe:
## site linpp X_F
## K-F K-F (lpp) (linim)
The hyperframe contains a linear point pattern object (lpp) and the distance-to-farmland covariate image for the route K-F (see RavRK_0_prepare.R for data pre-processing). First, we extract the linear point pattern:
KF_linpp <- data$linpp[[1]]; KF_linpp
## Point pattern on linear network
## 229 points
## 3 columns of marks: 'season', 'type' and 'hab'
## Linear network with 301 vertices and 300 lines
## Enclosing window: polygonal boundary
## enclosing rectangle: [238542.21, 253543.18] x [5576951, 5582352] units
and choose a season:
seas <- "spring"
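Other seasons can be chosen; the labels present in the data can be listed from the marks (a minimal sketch; the available labels depend on the data):
unique(marks(KF_linpp)$season)  # season labels available in the point pattern marks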
Next, we split the point pattern into raven observations and roadkill observations, and we define the two spatial covariates: X_F (distance to open farmland) and X_R (distance to the nearest roadkill):
ravens_lpp <- subset.lpp(KF_linpp, season == seas & type == "FR", select = F); plot(ravens_lpp)
roadkill_lpp <- subset.lpp(KF_linpp, season == seas & type == "RK", select = F); plot(roadkill_lpp)
X_F <- data$X_F[[1]]; plot(X_F)
X_R <- as.linim(distfun.lpp(roadkill_lpp)); plot(X_R)
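As a quick check on the subsetting (a minimal sketch), we can count the points retained in each pattern for the chosen season:
npoints(ravens_lpp)   # number of raven observations (type "FR") in the chosen season
npoints(roadkill_lpp) # number of roadkill observations (type "RK") in the chosen season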
We use the spatstat model-fitting function lppm to generate a model frame from the full model, i.e. the model containing both spatial covariates:
m.spat <- lppm(ravens_lpp ~ 1 + X_F + X_R, eps = 15)
m.spat %>% model.frame.lppm %>% as_tibble
## # A tibble: 1,227 x 4
## .mpl.Y X_F X_R `(weights)`
## <dbl> <dbl> <dbl> <dbl>
## 1 0.133 5.28 167. 7.5
## 2 0.133 1.20 19.2 7.5
## 3 0.133 10.8 45.4 7.5
## 4 0.133 11.8 105. 7.5
## 5 0.200 549. 1278. 5.00
## 6 0.133 8.43 49.0 7.5
## 7 0.133 24.0 17.3 7.5
## 8 0.201 12.3 9.95 4.99
## 9 0.203 22.4 145. 4.92
## 10 0.133 19.3 608. 7.5
## # … with 1,217 more rows
This generates a numerical quadrature scheme which includes a point for each observation together with added ‘dummy points’ spaced eps = 15 units apart along the network. The role of the quadrature scheme is to enable numerical evaluation of the model likelihood, which involves (numerical) integration of the intensity function along the length of each route. Within the model frame, each row corresponds to a quadrature point: the weight ((weights)) is the length of route associated with the point, and the response (.mpl.Y) is the density (0 for a dummy point, 1/weight for an observation). The covariate values are given in the remaining two columns.
We rename the columns and add a presence/absence transformation of the response, which we call y2 (0 for a dummy point, 1 for an observation).
mf <- m.spat %>% model.frame.lppm %>% rename(y1 = .mpl.Y, w = `(weights)`) %>% mutate(y2 = as.numeric(y1>0))
mf %>% as_tibble
## # A tibble: 1,227 x 5
## y1 X_F X_R w y2
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.133 5.28 167. 7.5 1
## 2 0.133 1.20 19.2 7.5 1
## 3 0.133 10.8 45.4 7.5 1
## 4 0.133 11.8 105. 7.5 1
## 5 0.200 549. 1278. 5.00 1
## 6 0.133 8.43 49.0 7.5 1
## 7 0.133 24.0 17.3 7.5 1
## 8 0.201 12.3 9.95 4.99 1
## 9 0.203 22.4 145. 4.92 1
## 10 0.133 19.3 608. 7.5 1
## # … with 1,217 more rows
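As a quick sanity check on the quadrature scheme (a minimal sketch, assuming the objects created above), the weights should sum to approximately the total length of the network, and the product of weight and density recovers the 0/1 indicator y2:
sum(mf$w)                       # total quadrature weight: approximately the network length
volume(domain(ravens_lpp))      # total length of the K-F linear network (spatstat)
all.equal(mf$y1 * mf$w, mf$y2)  # weight x density is 1 at observations, 0 at dummy points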
A Poisson point process is characterised by an inhomogeneous intensity function \(\lambda=\lambda(x,y)\), with the underlying assumption that events (observations) are independent. The intensity (Poisson parameter) is typically modelled as a log-linear function of selected covariates; for example: \[\mbox{log}\,\lambda = \beta_0 + \beta_1X_1 + \beta_2X_2\] where \(X_k = X_k(x,y),\,\,k = 1,2\) are spatial covariates. For an introduction to the theory of point-process models, we recommend “Spatial point patterns: methodology and applications” by Baddeley, A., Rubak, E. & Turner, R. (2015).
Using a quadrature scheme, a Poisson point-process model can be approximated as a glm. Indeed, this is the default method used by spatstat::lppm
. There are two equivalent ways to specify the glm:
1. Use w as prior weights with response y1 (0 or 1/w)
2. Use log(w) as an offset with response y2 (0 or 1)
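Both variants approximate the same quantity. Under the Berman–Turner quadrature approximation (Baddeley et al. 2015), the point-process log-likelihood is \[\mbox{log}\, L = \sum_{i} \mbox{log}\,\lambda(x_i) - \int_L \lambda(u)\, du \approx \sum_{j} w_j\left(y_j\, \mbox{log}\,\lambda_j - \lambda_j\right),\] where the sum over \(j\) runs over the quadrature points, \(w_j\) are the quadrature weights and \(y_j = z_j/w_j\), with \(z_j = 1\) for an observation and \(z_j = 0\) for a dummy point. The right-hand side has the form of a weighted Poisson log-likelihood with response \(y_j\) and prior weights \(w_j\) (variant 1); writing \(w_j\lambda_j\) as a Poisson mean with offset \(\mbox{log}\, w_j\) and response \(z_j\) gives variant 2, which differs only by a constant that does not depend on the model parameters.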
We compare the two glm variants to the lppm
fit for a (homogeneous) intercept-only model:
m0.lppm <- lppm(ravens_lpp ~ 1, eps = 15)
m0.glm.1 <- glm(y1 ~ 1, weights = w, family = poisson, data = mf)
m0.glm.2 <- glm(y2 ~ 1, offset = log(w), family = poisson, data = mf)
The model m0.glm.1 throws a ‘non-integer’ warning for each non-zero observation (suppressed here), but all three fits give identical coefficient estimates:
m0.lppm
## Point process model on linear network
## Stationary Poisson process
## Intensity: 0.00180262
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -6.318514 0.1924501 -6.695709 -5.941319 *** -32.83196
## Original data: ravens_lpp
## Linear network with 301 vertices and 300 lines
## Enclosing window: polygonal boundary
## enclosing rectangle: [238542.21, 253543.18] x [5576951, 5582352] units
m0.glm.1
##
## Call: glm(formula = y1 ~ 1, family = poisson, data = mf, weights = w)
##
## Coefficients:
## (Intercept)
## -6.319
##
## Degrees of Freedom: 1226 Total (i.e. Null); 1226 Residual
## Null Deviance: 243.8
## Residual Deviance: 243.8 AIC: Inf
m0.glm.2
##
## Call: glm(formula = y2 ~ 1, family = poisson, data = mf, offset = log(w))
##
## Coefficients:
## (Intercept)
## -6.319
##
## Degrees of Freedom: 1226 Total (i.e. Null); 1226 Residual
## Null Deviance: 243.8
## Residual Deviance: 243.8 AIC: 299.8
The AIC reported for the glm m0.glm.2 is \(299.8\) (for m0.glm.1 it is Inf, owing to the non-integer response), which differs from the lppm value
AIC(m0.lppm)
## [1] 397.1998
This is because the dummy points and the observation points contribute differently to the point-process log-likelihood used by lppm. A modified AIC calculation is easily implemented for both of the glm variants (see Baddeley et al. 2015, eq. (9.56), p. 347):
AIC_adj_1 <- function(fit) -2*(-1*(deviance(fit)/2 + sum(log(fit$prior.weights[(fit$y>0)])) + sum(fit$y>0)) - length(coef(fit)))
AIC_adj_1(m0.glm.1)
## [1] 397.1998
AIC_adj_2 <- function(fit) -2*(-0.5*(deviance(fit) + 2*sum((fit$offset+1)*fit$y)) - length(coef(fit)))
AIC_adj_2(m0.glm.2)
## [1] 397.1998
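Written out, the two adjustment functions above compute the same quantity, \[\mbox{AIC}_{\mbox{adj}} = D + 2\sum_{i \in \mbox{data}}\left(\mbox{log}\, w_i + 1\right) + 2k,\] where \(D\) is the glm deviance, the sum runs over the observation (non-dummy) points, and \(k\) is the number of fitted coefficients; the additional terms restore the contribution of the observation points to the point-process log-likelihood.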
However, since these modifications to the usual definition of AIC depend only on the weights and the values of the response variable, they do not influence model selection provided a fixed quadrature scheme is used for all models.
For the remainder of these examples, and for all analyses in the manuscript, we use the offset form of the glm with the response y2.
We fit the full (heterogeneous) model, which includes both spatial covariates:
m1.lppm <- lppm(ravens_lpp ~ 1 + X_F + X_R, eps = 15); m1.lppm
## Point process model on linear network
## Nonstationary Poisson process
##
## Log intensity: ~1 + X_F + X_R
##
## Fitted trend coefficients:
## (Intercept) X_F X_R
## -5.4091512363 -0.0008618054 -0.0030179110
##
## Estimate S.E. CI95.lo CI95.hi Ztest
## (Intercept) -5.4091512363 0.2630419417 -5.924703968 -4.8935985041 ***
## X_F -0.0008618054 0.0009053904 -0.002636338 0.0009127272
## X_R -0.0030179110 0.0009842679 -0.004947041 -0.0010887813 **
## Zval
## (Intercept) -20.5638356
## X_F -0.9518605
## X_R -3.0661479
## Original data: ravens_lpp
## Linear network with 301 vertices and 300 lines
## Enclosing window: polygonal boundary
## enclosing rectangle: [238542.21, 253543.18] x [5576951, 5582352] units
m1.glm <- glm(y2 ~ 1 + X_F + X_R, offset = log(w), family = poisson, data = mf); m1.glm
##
## Call: glm(formula = y2 ~ 1 + X_F + X_R, family = poisson, data = mf,
## offset = log(w))
##
## Coefficients:
## (Intercept) X_F X_R
## -5.4091512 -0.0008618 -0.0030179
##
## Degrees of Freedom: 1226 Total (i.e. Null); 1224 Residual
## Null Deviance: 243.8
## Residual Deviance: 226.1 AIC: 286.1
AIC(m1.lppm);AIC_adj_2(m1.glm)
## [1] 383.492
## [1] 383.492
Model comparison using AIC suggests the full model should be selected over the homogeneous model.
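The fitted intensity of the selected model can also be visualised along the network; a minimal sketch using spatstat’s predict method for lppm objects (the object name and plot title are illustrative):
lambda_hat <- predict(m1.lppm)  # fitted intensity as a pixel image on the linear network (linim)
plot(lambda_hat, main = "Fitted raven intensity (spring)")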
Bayesian estimation with rstan
Estimation in a Bayesian framework is straightforward thanks to rstan and the front-end package rstanarm. Using the default (weakly informative) priors, we fit both the intercept-only model and the full model:
m.stan.0 <- stan_glm(y2~ 1, offset = log(w), family = poisson, data = mf, cores = 4); m.stan.0
## stan_glm
## family: poisson [log]
## formula: y2 ~ 1
## observations: 1227
## predictors: 1
## ------
## Median MAD_SD
## (Intercept) -6.3 0.2
##
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
m.stan.1 <- stan_glm(y2~ 1 + X_F + X_R, offset = log(w), family = poisson, data = mf, cores = 4); m.stan.1
## stan_glm
## family: poisson [log]
## formula: y2 ~ 1 + X_F + X_R
## observations: 1227
## predictors: 3
## ------
## Median MAD_SD
## (Intercept) -5.4 0.3
## X_F 0.0 0.0
## X_R 0.0 0.0
##
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
Plots of the marginal posterior distributions are easily generated:
m.stan.1 %>% plot("mcmc_dens")
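Numerical summaries of the posterior are also available; for example, 95% central credible intervals for the coefficients via rstanarm’s posterior_interval (a minimal sketch):
posterior_interval(m.stan.1, prob = 0.95)  # 95% posterior credible intervals for the coefficients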
Model comparison can be performed using approximate leave-one-out cross validation, built into rstanarm
via the package loo
.
loo_c <- loo_compare(loo(m.stan.0, cores = 10), loo(m.stan.1, cores = 10)); loo_c
## elpd_diff se_diff
## m.stan.1 0.0 0.0
## m.stan.0 -6.1 5.4
Leave-one-out cross validation, here based on predictive log densities and approximated using Pareto-smoothed importance sampling (PSIS), is an estimate of the information-theoretic Kullback-Leibler (KL) discrepancy. In this way, it is similar to AIC (up to a factor of \(-2\)), which is also an estimate of the KL discrepancy, although cross validation is a more flexible estimator requiring fewer assumptions than AIC. Importantly, cross validation can be applied to compare hierarchical models with differing effect structures, provided the full set of group-level parameters is conditioned on to ensure the conditional independence of the test data points.
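The reliability of the PSIS approximation can be checked via the Pareto \(k\) diagnostics reported by loo; a minimal sketch (refitting loo for the full model only; \(k < 0.7\) is the usual rule of thumb):
loo_1 <- loo(m.stan.1, cores = 4)  # PSIS-LOO with Pareto k diagnostics
print(loo_1)                       # reports elpd_loo and a summary of the Pareto k values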
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rstanarm_2.21.1 Rcpp_1.0.6 spatstat_2.0-1
## [4] spatstat.linnet_2.1-1 spatstat.core_2.0-0 rpart_4.1-15
## [7] nlme_3.1-152 spatstat.geom_2.0-1 spatstat.data_2.1-0
## [10] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.5
## [13] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
## [16] tibble_3.1.1 ggplot2_3.3.3 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.2.1 plyr_1.8.6
## [4] igraph_1.2.6 splines_4.0.5 crosstalk_1.1.1
## [7] rstantools_2.1.1 inline_0.3.17 digest_0.6.27
## [10] htmltools_0.5.1.1 rsconnect_0.8.17 fansi_0.4.2
## [13] magrittr_2.0.1 tensor_1.5 modelr_0.1.8
## [16] RcppParallel_5.0.3 matrixStats_0.58.0 xts_0.12.1
## [19] spatstat.sparse_2.0-0 prettyunits_1.1.1 colorspace_2.0-0
## [22] rvest_1.0.0 haven_2.4.0 xfun_0.22
## [25] callr_3.5.1 crayon_1.4.1 jsonlite_1.7.2
## [28] lme4_1.1-26 survival_3.2-10 zoo_1.8-9
## [31] glue_1.4.2 polyclip_1.10-0 gtable_0.3.0
## [34] V8_3.4.0 pkgbuild_1.2.0 rstan_2.21.2
## [37] abind_1.4-5 scales_1.1.1 DBI_1.1.1
## [40] miniUI_0.1.1.1 xtable_1.8-4 stats4_4.0.5
## [43] StanHeaders_2.21.0-7 DT_0.18 htmlwidgets_1.5.3
## [46] httr_1.4.2 threejs_0.3.3 ellipsis_0.3.1
## [49] pkgconfig_2.0.3 loo_2.4.1 sass_0.3.1
## [52] dbplyr_2.1.1 deldir_0.2-10 utf8_1.2.1
## [55] tidyselect_1.1.0 rlang_0.4.10 reshape2_1.4.4
## [58] later_1.1.0.1 munsell_0.5.0 cellranger_1.1.0
## [61] tools_4.0.5 cli_2.4.0 generics_0.1.0
## [64] broom_0.7.6 ggridges_0.5.3 evaluate_0.14
## [67] fastmap_1.1.0 yaml_2.2.1 goftest_1.2-2
## [70] processx_3.5.1 knitr_1.32 fs_1.5.0
## [73] mime_0.10 xml2_1.3.2 compiler_4.0.5
## [76] bayesplot_1.8.0 shinythemes_1.2.0 rstudioapi_0.13
## [79] curl_4.3 spatstat.utils_2.1-0 reprex_2.0.0
## [82] statmod_1.4.35 bslib_0.2.4 stringi_1.5.3
## [85] highr_0.8 ps_1.6.0 lattice_0.20-41
## [88] Matrix_1.3-2 nloptr_1.2.2.2 markdown_1.1
## [91] shinyjs_2.0.0 vctrs_0.3.6 pillar_1.6.0
## [94] lifecycle_1.0.0 jquerylib_0.1.3 httpuv_1.5.5
## [97] R6_2.5.0 promises_1.2.0.1 gridExtra_2.3
## [100] codetools_0.2-18 boot_1.3-27 colourpicker_1.1.0
## [103] MASS_7.3-53.1 gtools_3.8.2 assertthat_0.2.1
## [106] withr_2.4.1 shinystan_2.5.0 mgcv_1.8-35
## [109] parallel_4.0.5 hms_1.0.0 grid_4.0.5
## [112] minqa_1.2.4 rmarkdown_2.7 shiny_1.6.0
## [115] lubridate_1.7.10 base64enc_0.1-3 dygraphs_1.1.1.6