# Chapter 18 Spatial Regression

Even though it may be tempting to focus on interpreting the map pattern of an area support response variable of interest, the pattern may largely derive from covariates (and their functional forms), as well as the respective spatial footprints of the variables in play. Spatial autoregressive models in two dimensions began without covariates and with clear links to time series (Whittle 1954). Extensions included tests for spatial autocorrelation in linear model residuals, and models applying the autoregressive component to the response or the residuals, where the latter matched the tests for residuals (Cliff and Ord 1972; Cliff and Ord 1973). These “lattice” models of areal data typically express the dependence between observations using a graph of neighbours in the form of a contiguity matrix.

A division has grown up, possibly unhelpfully, between scientific fields using conditional autoregressive (CAR) models (Besag 1974), and simultaneous autoregressive models (SAR) (Ord 1975; Hepple 1976). Although CAR and SAR models are closely related, these fields have found it difficult to share experience of applying similar models, often despite referring to key work summarising the models (Ripley 1981; Ripley 1988; Cressie 1993). Ripley gives the SAR variance as (1981, 89):

${\rm Var}_S = \sigma^ 2(I-\lambda W_S)^{-1} (I-\lambda W_S^{\rm T})^{-1}$ where $$\lambda$$ is a spatial autocorrelation parameter and $$W_S$$ is a nonsingular matrix that represents spatial dependence. The CAR variance is:

${\rm Var}_C = \sigma^ 2(I-\lambda W_C)^{-1}$ where and $$W_C$$ is a symmetric and strictly positive definite matrix

More recent books expounding the theoretical bases for modelling with areal data simply point out the similarities in relevant chapters (Gaetan and Guyon 2010; Lieshout 2019); the interested reader is invited to consult these sources for background information and examples using the functions described below.

Of course, handling a spatial correlation structure in a generalised least squares model or a (generalised) linear or nonlinear mixed effects model such as those provided in the nlme and many other packages does not have to use a graph of neighbours (Pinheiro and Bates 2000). These models are also spatial regression models, using functions of the distance between observations, and fitted variograms to model the spatial autocorrelation present; such models have been held to yield a clearer picture of the underlying processes (Wall 2004), building on geostatistics. For example, the glmmTMB package successfully uses this approach to spatial regression (Brooks et al. 2017). Here we will only consider spatial regression using spatial weights, chiefly as implemented in the spatialreg package recently split out of the spdep package which had grown unnecessarily large, covering too many aspects of spatial dependence.

## 18.1 Spatial regression with spatial weights

Spatial autoregression models using spatial weights matrices were described in some detail using maximum likelihood estimation some time ago (Cliff and Ord 1973; Cliff and Ord 1981). A family of models were elaborated in spatial econometric terms extending earlier work, and in many cases using the simultaneous autoregressive framework and row standardization of spatial weights (Anselin 1988). The simultaneous and conditional autoregressive frameworks can be compared, and both can be supplemented using case weights to reflect the relative importance of different observations (Waller and Gotway 2004).

Here we shall use the Boston housing data set, which has been restructured and furnished with census tract boundaries (Bivand 2017). The original data set used 506 census tracts and a hedonic model to try to estimate willingness to pay for clean air. The response was constructed from counts of ordinal answers to a 1970 census question about house value; the response is left and right censored in the census source. The key covariate was created from a calibrated meteorological model showing the annual nitrogen oxides (NOX) level for a smaller number of model output zones. The numbers of houses responding also varies by tract and model output zone. There are several other covariates, some measured at the tract level, some by town only, where towns broadly correspond to the air pollution model output zones.

library(sf)
boston_506 <- st_read(system.file("shapes/boston_tracts.shp", package="spData"))
#> Reading layer boston_tracts' from data source /home/edzer/R/x86_64-pc-linux-gnu-library/3.6/spData/shapes/boston_tracts.shp' using driver ESRI Shapefile'
#> Simple feature collection with 506 features and 36 fields
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: -71.5 ymin: 42 xmax: -70.6 ymax: 42.7
#> geographic CRS: NAD27
nb_q <- spdep::poly2nb(boston_506)
lw_q <- spdep::nb2listw(nb_q, style="W")

We can start by reading in the 506 tract data set from spData, and creating a contiguity neighbour object and from that again a row standardized spatial weights object. If we examine the median house values, we find that they have been assigned as missing values, and that 17 tracts are affected.

table(boston_506$censored) #> #> left no right #> 2 489 15 summary(boston_506$median)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
#>    5600   16800   21000   21749   24700   50000      17

Next, we can subset to the remaining 489 tracts with non-censored house values, and the neighbour object to match. The neighbour object now has one observation with no neighbours.

boston_489 <- boston_506[!is.na(boston_506$median),] nb_q_489 <- spdep::poly2nb(boston_489) lw_q_489 <- spdep::nb2listw(nb_q_489, style="W", zero.policy=TRUE) The NOX_ID variable specifies the upper level aggregation, letting us aggregate the tracts to air pollution model output zones. We can create aggregate neighbour and row standardized spatial weights objects, and aggregate the NOX variable taking means, and the CHAS Charles River dummy variable for observations on the river. agg_96 <- list(as.character(boston_506$NOX_ID))
boston_96 <- aggregate(boston_506[, "NOX_ID"], by=agg_96, unique)
nb_q_96 <- spdep::poly2nb(boston_96)
lw_q_96 <- spdep::nb2listw(nb_q_96)
boston_96$NOX <- aggregate(boston_506$NOX, agg_96, mean)$x boston_96$CHAS <- aggregate(as.integer(boston_506$CHAS)-1, agg_96, max)$x

The response is aggregated using the weightedMedian() function in matrixStats, and midpoint values for the house value classes. Counts of houses by value class were punched to check the published census values, which can be replicated using weightedMedian() at the tract level. Here we find two output zones with calculated weighted medians over the upper census question limit of USD 50,000, and remove them subsequently as they also are affected by not knowing the appropriate value to insert for the top class by value.

nms <- names(boston_506)
ccounts <- 23:31
for (nm in nms[c(22, ccounts, 36)]) {
boston_96[[nm]] <- aggregate(boston_506[[nm]], agg_96, sum)$x } br2 <- c(3.50, 6.25, 8.75, 12.50, 17.50, 22.50, 30.00, 42.50, 60.00)*1000 counts <- as.data.frame(boston_96)[, nms[ccounts]] f <- function(x) matrixStats::weightedMedian(x=br2, w=x, interpolate=TRUE) boston_96$median <- apply(counts, 1, f)
is.na(boston_96$median) <- boston_96$median > 50000
summary(boston_96$median) #> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's #> 9009 20417 23523 25263 30073 49496 2 Before subsetting, we aggregate the remaining covariates by weighted mean using the tract population counts punched from the census (Bivand 2017). boston_94 <- boston_96[!is.na(boston_96$median),]
nb_q_94 <- spdep::subset.nb(nb_q_96, !is.na(boston_96median)) lw_q_94 <- spdep::nb2listw(nb_q_94, style="W") We now have two data sets each at the lower, census tract level and the upper, air pollution model output zone level, one including the censored observations, the other excluding them. The original model related the log of median house values by tract to the square of NOX values, including other covariates usually related to house value by tract, such as aggregate room counts, aggregate age, ethnicity, social status, distance to downtown and to the nearest radial road, a crime rate, and town-level variables reflecting land use (zoning, industry), taxation and education (Bivand 2017). This structure will be used here to exercise issues raised in fitting spatial regression models, including the presence of multiple levels. form <- formula(log(median) ~ CRIM + ZN + INDUS + CHAS + I((NOX*10)^2) + I(RM^2) + AGE + log(DIS) + log(RAD) + TAX + PTRATIO + I(BB/100) + log(I(LSTAT/100))) Before moving to presentations of issues raised in fitting spatial regression models, it is worth making a few further points. A recent review of spatial regression in a spatial econometrics setting is given by Kelejian and Piras (2017); note that their usage is to call the spatial coefficient of the lagged response $$\lambda$$ and that of the lagged residuals $$\rho$$, the reverse of other usage (Anselin 1988; LeSage and Pace 2009); here we use $$\rho_{\mathrm{Lag}}$$ for the spatial coefficient in the spatial lag model, and $$\rho_{\mathrm{Err}}$$ for the spatial error model. One interesting finding is that relatively dense spatial weights matrices may downweight model estimates, suggesting that sparser weights are preferable (Smith 2009). Another useful finding is that the presence of residual spatial autocorrelation need not bias the estimates of variance of regression coefficients, provided that the covariates themselves do not exhibit spatial autocorrelation (Smith and Lee 2012). In general, however, the footprints of the spatial processes of the response and covariates may not be aligned, and if covariates and the residual are autocorrelated, it is likely that the estimates of variance of regression coefficients will be biassed downwards if attempts are not made to model the spatial processes. In trying to model these spatial processes, we may choose to model the spatial autocorrelation in the residual with a spatial error model (SEM). ${\mathbf y} = {\mathbf X}{\mathbf \beta} + {\mathbf u}, \qquad {\mathbf u} = \rho_{\mathrm{Err}} {\mathbf W} {\mathbf u} + {\mathbf \varepsilon},$ where $${\mathbf y}$$ is an $$(N \times 1)$$ vector of observations on a response variable taken at each of $$N$$ locations, $${\mathbf X}$$ is an $$(N \times k)$$ matrix of covariates, $${\mathbf \beta}$$ is a $$(k \times 1)$$ vector of parameters, $${\mathbf u}$$ is an $$(N \times 1)$$ spatially autocorrelated disturbance vector, $${\mathbf \varepsilon}$$ is an $$(N \times 1)$$ vector of independent and identically distributed disturbances and $$\rho_{\mathrm{Err}}$$ is a scalar spatial parameter. If the processes in the covariates and the response match, we should find little difference between the coefficients of a least squares and a SEM, but very often they diverge, suggesting that a Hausman test for this condition should be employed (Pace and LeSage 2008). This may be related to earlier discussions of a spatial equivalent to the unit root and cointegration where spatial processes match (Fingleton 1999). A model with a spatial process in the response only is termed a spatial lag model (SLM, often SAR - spatial autoregressive) (LeSage and Pace 2009). ${\mathbf y} = \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y} + {\mathbf X}{\mathbf \beta} + {\mathbf \varepsilon},$ where $$\rho_{\mathrm{Lag}}$$ is a scalar spatial parameter. Work reviewed by Mur and Angulo (2006) on the Durbin model; the Durbin model adds the spatially lagged covariates to the covariates included in the spatial lag model, giving a spatial Durbin model (SDM) with different processes in the response and covariates: ${\mathbf y} = \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y} + {\mathbf X}{\mathbf \beta} + {\mathbf W}{\mathbf X}{\mathbf \gamma} + {\mathbf \varepsilon},$ where $${\mathbf \gamma}$$ is a $$(k' \times 1)$$ vector of parameters. $$k'$$ defines the subset of the intercept and covariates, often $$k' = k-1$$ when using row standardised spatial weights and omitting the spatially lagged intercept. This permits the spatial processes to be viewed and tested for as a Common Factor (Burridge 1981; Bivand 1984). The inclusion of spatially lagged covariates lets us check whether the same spatial process is manifest in the response and the covariates (SEM), whether they are different processes, or whether no process is detected. The Common Factor is present when $${\mathbf \gamma} = - \rho_{\mathrm{Lag}} {\mathbf \beta}$$: ${\mathbf y} = \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y} + {\mathbf X}{\mathbf \beta} - \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf X} {\mathbf \beta} + {\mathbf \varepsilon}, \qquad ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W}){\mathbf y} = ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W}) {\mathbf X}{\mathbf \beta} + {\mathbf \varepsilon},$ where $${\mathbf I}$$ is the $$N \times N$$ identity matrix, and $${\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W}$$ is the Common Factor: $\qquad {\mathbf y} = {\mathbf X}{\mathbf \beta} + ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1} {\mathbf \varepsilon}, \qquad {\mathbf y} = {\mathbf X}{\mathbf \beta} + {\mathbf u}, \qquad {\mathbf u} = \rho_{\mathrm{Err}} {\mathbf W} {\mathbf u} + {\mathbf \varepsilon},$ If we extend this family with processes in the covariates and the residual, we get a spatial error Durbin model (SDEM). If it is chosen to admit a spatial process in the residuals in addition to a spatial process in the response, again two models are formed, a general nested model (GNM) nesting all the others, and a model without spatially lagged covariates (SAC, also known as SARAR - Spatial AutoRegressive-AutoRegressive model). If neither the residuals nor the residual are modelled with spatial processes, spatially lagged covariates may be added to a linear model, as a spatially lagged X model (SLX) (Elhorst 2010; Bivand 2012; LeSage 2014; Halleck Vega and Elhorst 2015). We can write the general nested model (GNM) as: ${\mathbf y} = \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y} + {\mathbf X}{\mathbf \beta} + {\mathbf W}{\mathbf X}{\mathbf \gamma} + {\mathbf u}, \qquad {\mathbf u} = \rho_{\mathrm{Err}} {\mathbf W} {\mathbf u} + {\mathbf \varepsilon},$ This may be constrained to the double spatial coefficient model SAC/SARAR by setting $${\mathbf \gamma} = 0$$, to the spatial Durbin (SDM) by setting $$\rho_{\mathrm{Err}} = 0$$, and to the error Durbin model (SDEM) by setting $$\rho_{\mathrm{Lag}} = 0$$. Imposing more conditions gives the spatial lag model (SLM) with $${\mathbf \gamma} = 0$$ and $$\rho_{\mathrm{Err}} = 0$$, the spatial error model (SEM) with $${\mathbf \gamma} = 0$$ and $$\rho_{\mathrm{Lag}} = 0$$, and the spatially lagged X model (SLX) with $$\rho_{\mathrm{Lag}} = 0$$ and $$\rho_{\mathrm{Err}} = 0$$. Although making predictions for new locations for which covariates are observed was raised as an issue some time ago, it has many years to make progress in reviewing the possibilities (Bivand 2002; Goulard, Laurent, and Thomas-Agnan 2017). The prediction methods for SLM, SDM, SEM, SDEM, SAC and GNM models fitted with maximum likelihood were contributed as a Google Summer of Coding project by Martin Gubri. This work, and work on similar models with missing data (Suesse 2018) is also relevant for exploring censored median house values in the Boston data set. Work on prediction also exposed the importance of the reduced form of these models, in which the spatial process in the response interacts with the regression coefficients in the SLM, SDM, SAC and GNM models. The consequence of these interactions is that a unit change in a covariate will only impact the response as the value of the regression coefficient if the spatial coefficient of the lagged response is zero. Where it is non-zero, global spillovers, impacts, come into play, and these impacts should be reported rather than the regression coefficients (LeSage and Pace 2009; Elhorst 2010; Bivand 2012; LeSage 2014; Halleck Vega and Elhorst 2015). Local impacts may be reported for SDEM and SLX models, using linear combination to calculate standard errors for the total impacts of each covariate (sums of coefficients on the covariates and their spatial lags). This can be seen from the GNM data generation process: $({\mathbf I} - \rho_{\mathrm{Err}} {\mathbf W})({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W}){\mathbf y} = ({\mathbf I} - \rho_{\mathrm{Err}} {\mathbf W})({\mathbf X}{\mathbf \beta} + {\mathbf W}{\mathbf X}{\mathbf \gamma}) + {\mathbf \varepsilon},$ re-writing: ${\mathbf y} = ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1}({\mathbf X}{\mathbf \beta} + {\mathbf W}{\mathbf X}{\mathbf \gamma}) + ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1}({\mathbf I} - \rho_{\mathrm{Err}} {\mathbf W})^{-1}{\mathbf \varepsilon}.$ There is interaction between the $$\rho_{\mathrm{Lag}}$$ and $${\mathbf \beta}$$ (and $${\mathbf \gamma}$$ if present) coefficients. This can be seen from the partial derivatives: $$\partial y_i / \partial x_{jr} = (({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1} ({\mathbf I} \beta_r + {\mathbf W} \gamma_r))_{ij}$$. This dense matrix $$S_r({\mathbf W}) = (({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1} ({\mathbf I} \beta_r + {\mathbf W} \gamma_r))$$ expresses the direct impacts (effects) on its principal diagonal, and indirect impacts in off-diagonal elements. Current work in the spatialreg package is focused on refining the handling of spatially lagged covariates using a consistent Durbin= argument taking either a logical value or a formula giving the subset of covariates to add in spatially lagged form. There is a speculation that some covariates, for example some dummy variables, should not be added in spatially lagged form. This then extends to handling these included spatially lagged covariates appropriately in calculating impacts. This work applies to cross-sectional models fitted using MCMC or maximum likelihood, and will offer facilities to spatial panel models. It is worth mentioning the almost unexplored issues of functional form assumptions, for which flexible structures are useful, including spatial quantile regression presented in the McSpatial package (McMillen 2013). There are further issues with discrete response variables, covered by some functions in McSpatial, and in the spatialprobit and ProbitSpatial packages (Wilhelm and Matos 2013; Martinetti and Geniaux 2017); the MCMC implementations of the former are based on LeSage and Pace (2009). Finally, Wagner and Zeileis (2019) show how an SLM model may be used in the setting of recursive partitioning, with an implementation using spatialreg::lagsarlm() in the lagsarlmtree package. ## 18.2 Estimators The review of cross-sectional maximum likelihood and generalized method of moments (GMM) estimators in spatialreg and sphet for spatial econometrics style spatial regression models by Bivand and Piras (2015) is still largely valid. In the review, estimators in these R packages were compared with alternative implementations available in other programming languages elsewhere. The review did not cover Bayesian spatial econometrics style spatial regression. More has changed with respect to spatial panel estimators described in Millo and Piras (2012), but will not be covered here. ### 18.2.1 Maximum likelihood For models with single spatial coefficients (SEM and SDEM using errorsarlm(), SLM and SDM using lagsarlm()), the methods initially described by Ord (1975) are used. The following table shows the functions that can be used to estimate the models described above using maximum likelihood. model model name maximum likelihood estimation function SEM spatial error errorsarlm(..., Durbin=FALSE, ...) SEM spatial error spautolm(..., family="SAR", ...) SDEM spatial Durbin error errorsarlm(..., Durbin=TRUE, ...) SLM spatial lag lagsarlm(..., Durbin=FALSE, ...) SDM spatial Durbin lagsarlm(..., Durbin=TRUE, ...) SAC spatial autoregressive combined sacsarlm(..., Durbin=FALSE, ...) GNM general nested sacsarlm(..., Durbin=TRUE, ...) The estimating functions errorsarlm() and lagsarlm() take similar arguments, where the first two, formula= and data= are shared by most model estimating functions. The third argument is a listw spatial weights object, while na.action= behaves as in other model estimating functions if the spatial weights can reasonably be subsetted to avoid observations with missing values. The weights= argument may be used to provide weights indicating the known degree of per-observation variability in the variance term - this is not available for lagsarlm(). library(spatialreg) #> Loading required package: Matrix #> #> Attaching package: 'Matrix' #> The following objects are masked from 'package:tidyr': #> #> expand, pack, unpack #> #> Attaching package: 'spatialreg' #> The following objects are masked from 'package:spdep': #> #> anova.sarlm, as_dgRMatrix_listw, as_dsCMatrix_I, as_dsCMatrix_IrW, #> as_dsTMatrix_listw, as.spam.listw, bptest.sarlm, can.be.simmed, #> cheb_setup, coef.gmsar, coef.sarlm, coef.spautolm, coef.stsls, #> create_WX, deviance.gmsar, deviance.sarlm, deviance.spautolm, #> deviance.stsls, do_ldet, eigen_pre_setup, eigen_setup, eigenw, #> errorsarlm, fitted.gmsar, fitted.ME_res, fitted.sarlm, #> fitted.SFResult, fitted.spautolm, get.ClusterOption, #> get.coresOption, get.mcOption, get.VerboseOption, #> get.ZeroPolicyOption, GMargminImage, GMerrorsar, griffith_sone, #> gstsls, Hausman.test, HPDinterval.lagImpact, impacts, intImpacts, #> Jacobian_W, jacobianSetup, l_max, lagmess, lagsarlm, lextrB, #> lextrS, lextrW, lmSLX, logLik.sarlm, logLik.spautolm, LR.sarlm, #> LR1.sarlm, LR1.spautolm, LU_prepermutate_setup, LU_setup, #> Matrix_J_setup, Matrix_setup, mcdet_setup, MCMCsamp, ME, mom_calc, #> mom_calc_int2, moments_setup, powerWeights, predict.sarlm, #> predict.SLX, print.gmsar, print.ME_res, print.sarlm, #> print.sarlm.pred, print.SFResult, print.spautolm, print.stsls, #> print.summary.gmsar, print.summary.sarlm, print.summary.spautolm, #> print.summary.stsls, residuals.gmsar, residuals.sarlm, #> residuals.spautolm, residuals.stsls, sacsarlm, SE_classic_setup, #> SE_interp_setup, SE_whichMin_setup, set.ClusterOption, #> set.coresOption, set.mcOption, set.VerboseOption, #> set.ZeroPolicyOption, similar.listw, spam_setup, spam_update_setup, #> SpatialFiltering, spautolm, spBreg_err, spBreg_lag, spBreg_sac, #> stsls, subgraph_eigenw, summary.gmsar, summary.sarlm, #> summary.spautolm, summary.stsls, trW, vcov.sarlm, Wald1.sarlm args(errorsarlm) #> function (formula, data = list(), listw, na.action, weights = NULL, #> Durbin, etype, method = "eigen", quiet = NULL, zero.policy = NULL, #> interval = NULL, tol.solve = .Machinedouble.eps, trs = NULL,
#>     control = list())
#> NULL
args(lagsarlm)
#> function (formula, data = list(), listw, na.action, Durbin, type,
#>     method = "eigen", quiet = NULL, zero.policy = NULL, interval = NULL,
#>     tol.solve = .Machine$double.eps, trs = NULL, control = list()) #> NULL The Durbin= argument replaces the earlier type= and etype= arguments, and if not given is taken as FALSE. If given, it may be FALSE, TRUE in which case all spatially lagged covariates are included, or a one-sided formula specifying which spatially lagged covariates should be included. The method= argument gives the method for calculating the log determinant term in the log likelihood function, and defaults to "eigen", suitable for moderately sized data sets. The interval= argument gives the bounds of the domain for the line search using stats::optimize() used for finding the spatial coefficient. The tol.solve() argument, passed through to base::solve(), was needed to handle data sets with differing numerical scales among the coefficients which hindered inversion of the variance-covariance matrix; the default value in base::solve() used to be much larger. The control= argument takes a list of control values to permit more careful adjustment of the running of the estimation function. The spautolm() function also fits spatial regressions with the spatial process in the residuals, and takes a possibly poorly named family= argument, taking the values of "SAR" for the simultaneous autoregressive model (like errorsarlm()), "CAR" for the conditional autoregressive model, and "SMA" for the spatial moving average model. args(spautolm) #> function (formula, data = list(), listw, weights, na.action, #> family = "SAR", method = "eigen", verbose = NULL, trs = NULL, #> interval = NULL, zero.policy = NULL, tol.solve = .Machine$double.eps,
#>     llprof = NULL, control = list())
#> NULL

The sacsarlm() function may take second spatial weights and interval arguments if the spatial weights used to model the two spatial processes in the SAC and GNM specifications differ. By default, the same spatial weights are used. By default, stats::nlminb() is used for numerical optimization, using a heuristic to choose starting values.

args(sacsarlm)
#> function (formula, data = list(), listw, listw2 = NULL, na.action,
#>     Durbin, type, method = "eigen", quiet = NULL, zero.policy = NULL,
#>     tol.solve = .Machine$double.eps, llprof = NULL, interval1 = NULL, #> interval2 = NULL, trs1 = NULL, trs2 = NULL, control = list()) #> NULL Where larger data sets are used, a numerical Hessian approach is used to calculate the variance-covariance matrix of coefficients, rather than an analytical asymptotic approach. Apart from spautolm() which returns an "spautolm" object, the model fitting functions return "sarlm" objects. Standard methods for fitted models are provided, such as summary(): args(summary.sarlm) #> function (object, correlation = FALSE, Nagelkerke = FALSE, Hausman = FALSE, #> adj.se = FALSE, ...) #> NULL The Nagelkerke= argument permits the return of a value approximately corresponding to a coefficient of determination, although the summary method anyway provides the value of stats::AIC() because a stats::logLik() method is provided for "sarlm" and "spautolm" objects. If the "sarlm" object is a SEM or SDEM, the Hausman test may be performed by setting Hausman=TRUE to see whether the regression coefficients are sufficiently like least squares coefficients, indicating absence of mis-specification from that source. As an example, we may fit SEM and SDEM to the 94 and 489 observation Boston data sets, and present the Hausman test results: eigs_489 <- eigenw(lw_q_489) SDEM_489 <- errorsarlm(form, data=boston_489, listw=lw_q_489, Durbin=TRUE, zero.policy=TRUE, control=list(pre_eig=eigs_489)) SEM_489 <- errorsarlm(form, data=boston_489, listw=lw_q_489, zero.policy=TRUE, control=list(pre_eig=eigs_489)) cbind(data.frame(model=c("SEM", "SDEM")), rbind(broom::tidy(Hausman.test(SEM_489)), broom::tidy(Hausman.test(SDEM_489))))[,1:4] #> model statistic p.value parameter #> 1 SEM 52.0 2.83e-06 14 #> 2 SDEM 48.7 6.48e-03 27 Here we are using the control= list argument to pass through pre-computed eigenvalues for the default "eigen" method. Both test results for the 489 tract data set suggest that the regression coefficients do differ, perhaps that the footprints of the spatial processes do not match. Likelihood ratio tests of the spatial models against their least squares equivalents show that spatial process(es) are present, but we find that neither SEM not SDM are adequate representations. cbind(data.frame(model=c("SEM", "SDEM")), rbind(broom::tidy(LR1.sarlm(SEM_489)), broom::tidy(LR1.sarlm(SDEM_489))))[,c(1, 4:6)] #> model statistic p.value parameter #> 1 SEM 198 0 1 #> 2 SDEM 159 0 1 For the 94 air pollution model output zones, the Hausman tests find little difference between coefficients, but this is related to the fact that the SEM and SDEM models add little to least squares or SLX using likelihood ratio tests. eigs_94 <- eigenw(lw_q_94) SDEM_94 <- errorsarlm(form, data=boston_94, listw=lw_q_94, Durbin=TRUE, control=list(pre_eig=eigs_94)) SEM_94 <- errorsarlm(form, data=boston_94, listw=lw_q_94, control=list(pre_eig=eigs_94)) cbind(data.frame(model=c("SEM", "SDEM")), rbind(broom::tidy(Hausman.test(SEM_94)), broom::tidy(Hausman.test(SDEM_94))))[, 1:4] #> model statistic p.value parameter #> 1 SEM 15.66 0.335 14 #> 2 SDEM 9.21 0.999 27 cbind(data.frame(model=c("SEM", "SDEM")), rbind(broom::tidy(LR1.sarlm(SEM_94)), broom::tidy(LR1.sarlm(SDEM_94))))[,c(1, 4:6)] #> model statistic p.value parameter #> 1 SEM 2.593 0.107 1 #> 2 SDEM 0.216 0.642 1 We can use spatialreg::LR.sarlm() to apply a likelihood ratio test between nested models, but here choose lmtest::lrtest(), which gives the same results, preferring models including spatially lagged covariates. broom::tidy(lmtest::lrtest(SEM_489, SDEM_489)) #> Warning in tidy.anova(lmtest::lrtest(SEM_489, SDEM_489)): The following column #> names in ANOVA output were not recognized or transformed: X.Df, LogLik #> Warning: Unknown or uninitialised column: 'term'. #> # A tibble: 2 x 5 #> X.Df LogLik df statistic p.value #> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 16 273. NA NA NA #> 2 29 311. 13 74.4 1.23e-10 broom::tidy(lmtest::lrtest(SEM_94, SDEM_94)) #> Warning in tidy.anova(lmtest::lrtest(SEM_94, SDEM_94)): The following column #> names in ANOVA output were not recognized or transformed: X.Df, LogLik #> Warning: Unknown or uninitialised column: 'term'. #> # A tibble: 2 x 5 #> X.Df LogLik df statistic p.value #> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 16 59.7 NA NA NA #> 2 29 81.3 13 43.2 0.0000421 The SLX model is fitted using least squares, and also returns a log likelihood value, letting us test whether we need a spatial process in the residuals. args(lmSLX) #> function (formula, data = list(), listw, na.action, weights = NULL, #> Durbin = TRUE, zero.policy = NULL) #> NULL In the tract data set we obviously do: SLX_489 <- lmSLX(form, data=boston_489, listw=lw_q_489, zero.policy=TRUE) broom::tidy(lmtest::lrtest(SLX_489, SDEM_489)) #> Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was of #> class "SLX", updated model is of class "sarlm" #> Warning in tidy.anova(lmtest::lrtest(SLX_489, SDEM_489)): The following column #> names in ANOVA output were not recognized or transformed: X.Df, LogLik #> Warning: Unknown or uninitialised column: 'term'. #> # A tibble: 2 x 5 #> X.Df LogLik df statistic p.value #> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 28 231. NA NA NA #> 2 29 311. 1 159. 1.55e-36 but in the output zone case we do not. SLX_94 <- lmSLX(form, data=boston_94, listw=lw_q_94) broom::tidy(lmtest::lrtest(SLX_94, SDEM_94)) #> Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was of #> class "SLX", updated model is of class "sarlm" #> Warning in tidy.anova(lmtest::lrtest(SLX_94, SDEM_94)): The following column #> names in ANOVA output were not recognized or transformed: X.Df, LogLik #> Warning: Unknown or uninitialised column: 'term'. #> # A tibble: 2 x 5 #> X.Df LogLik df statistic p.value #> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 28 81.2 NA NA NA #> 2 29 81.3 1 0.216 0.642 This outcome is sustained also when we use the counts of house units by tract and output zones as weights: SLX_94w <- lmSLX(form, data=boston_94, listw=lw_q_94, weights=units) SDEM_94w <- errorsarlm(form, data=boston_94, listw=lw_q_94, Durbin=TRUE, weights=units, control=list(pre_eig=eigs_94)) broom::tidy(lmtest::lrtest(SLX_94w, SDEM_94w)) #> Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was of #> class "SLX", updated model is of class "sarlm" #> Warning in tidy.anova(lmtest::lrtest(SLX_94w, SDEM_94w)): The following column #> names in ANOVA output were not recognized or transformed: X.Df, LogLik #> Warning: Unknown or uninitialised column: 'term'. #> # A tibble: 2 x 5 #> X.Df LogLik df statistic p.value #> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 28 97.5 NA NA NA #> 2 29 98.0 1 0.917 0.338 ### 18.2.2 Generalized method of moments The estimation methods used for fitting SLM and the spatially lagged response part of SARAR models are based on instrumental variables, using the spatially lagged covariates (usually including lags of lags too) as instruments for the spatially lagged response (Piras 2010). This, and the use of spatially lagged covariates in the moment conditions for models including a spatial process in the residuals, means that including the spatially lagged covariates in the initial model is challenging, and functions in the sphet package do not provide a Durbin= argument. This makes it harder to accommodate data with multiple spatial process footprints. However, as Kelejian and Piras show in their recent review (2017), these approaches have other benefits, such as being able to instrument variables suspected of suffering from endogeneity or measurement error. Let us first compare the ML and GMM estimates of the SEM regression coefficients for rescaled squared NOX values. We use the sphet::spreg() wrapper function, and fit a SEM model. Extracting the matching rows from the summary objects for these models, we can see that the values are not dissimilar, despite the difference in estimation methods. library(sphet) SEM1_94 <- spreg(form, data=boston_94, listw=lw_q_94, model="error") res <- rbind(summary(SEM_94)$Coef["I((NOX * 10)^2)",],
summary(SEM1_94)$CoefTable["I((NOX * 10)^2)",]) rownames(res) <- c("ML", "GMM") res #> Estimate Std. Error z value Pr(>|z|) #> ML -0.00956 0.00261 -3.66 0.000247 #> GMM -0.00885 0.00267 -3.31 0.000930 Using sphet::spreg(), we can instrument the rescaled squared NOX variable, dropping it first from the formula, next creating the rescaled squared NOX variable as a column in the "sf" object, extracting a matrix of coordinates from the centroids of the output zones, and creating a one-sided instrument formula from a second order polynomial in the coordinates (here improperly, as they are not projected) and mean distances to downtown and radial roads. The endog= argument takes a one-sided formula for the variables to be modelled in the first stage of the model. Had we re-run the original air pollution model many times under slightly varying scenarios, we could have used an ensemble of NOX loadings to yield its distribution by output zone. Because this is not possible, we assume that the measurement error can be captured by using selected instruments. Unfortunately, the NOX regression coefficient estimate from the second stage has fallen substantially in absolute size, although the sign is unchanged. formiv <- update(form, . ~ . - I((NOX*10)^2)) boston_94$NOX2 <- (boston_94$NOX*10)^2 suppressWarnings(ccoords <- st_coordinates(st_centroid(st_geometry(boston_94)))) iform <- formula(~poly(ccoords, degree=2) + DIS + RAD) SEM1_94iv <- spreg(formiv, data=boston_94, listw=lw_q_94, endog = ~NOX2, instruments=iform, model="error") summary(SEM1_94iv)$CoefTable["NOX2",]
#>   Estimate Std. Error    t-value   Pr(>|t|)
#>   -0.00195    0.00454   -0.43063    0.66674

Handling measurement error in this or similar ways is one of the benefits of GMM estimation methods, although here the choice of instruments was somewhat arbitrary.

### 18.2.3 Markov chain Monte Carlo

(draft - a comparative piece is being written for submission in about two months)

The Spatial Econometrics Library is part of the extensive Matlab code repository at https://www.spatial-econometrics.com/ and documented in LeSage and Pace (2009). The Google Summer of Coding project in 2011 by Abhirup Mallik mentored by Virgilio Gómez-Rubio yielded translations of some of the model fitting functions for SEM, SDEM, SLM, SDM, SAC and GNM from the Matlab code. These have now been added to spatialreg as spBreg_err(), spBreg_lag() and spBreg_sac() with Durbin= arguments to handle the inclusion of spatially lagged covariates. As yet, heteroskedastic disturbances are not accommodated. The functions return "mcmc" objects as specified in the coda package, permitting the use of tools from coda for handling model output. The default burnin is 500 draws, followed by default 2000 draws returned, but these values and many others may be set through the control= list argument. Fitting the SDEM model for the output zones takes about an order of magnitude longer than using ML, but there is more work to do subsequently, and this difference scales more in the number of samples than covariates or observations.

system.time(SDEM_94B <- spBreg_err(form, data=boston_94, listw=lw_q_94, Durbin=TRUE))
#>    user  system elapsed
#>   1.925   0.004   1.946
system.time(SDEM_489B <- spBreg_err(form, data=boston_489, listw=lw_q_489, Durbin=TRUE, zero.policy=TRUE))
#>    user  system elapsed
#>   2.663   0.004   2.668

Most time in the ML case using eigenvalues is taken by log determinant setup and optimization, and by dense matrix asymptotic standard errors if chosen (always chosen with default "eigen" log determinant method):

t(errorsarlm(form, data=boston_94, listw=lw_q_94, Durbin=TRUE)$timings[,2]) #> set_up eigen_set_up eigen_opt coefs eigen_hcov eigen_se #> [1,] 0.005 0.006 0.007 0.004 0.002 0.001 t(errorsarlm(form, data=boston_489, listw=lw_q_489, Durbin=TRUE, zero.policy=TRUE)$timings[,2])
#>      set_up eigen_set_up eigen_opt coefs eigen_hcov eigen_se
#> [1,]   0.01        0.051     0.023 0.006       0.04    0.052

while in the MCMC case, the default use of Spatial Econometric Toolbox gridded log determinants and obviously sampling takes most time:

t(attr(SDEM_94B, "timings")[ , 3])
#>      set_up SE_classic_set_up complete_setup sampling finalise
#> [1,]  0.005             0.268          0.036     1.57    0.067
t(attr(SDEM_489B, "timings")[ , 3])
#>      set_up SE_classic_set_up complete_setup sampling finalise
#> [1,]  0.014             0.484          0.063     2.05    0.055

However, as we will see shortly, inference from model impacts may need sampling (where the spatially lagged response is included in the model), and in the case of MCMC models, the samples have already been drawn.

## 18.3 Implementation details

We will briefly touch on selected implementation details that applied users of spatial regression models would be wise to review. The handling of the log determinant term applies to all such users, while impacts are restricted to those employing spatial econometrics style models either including the spatially lagged response or including spatially lagged covariates.

### 18.3.1 Handling the log determinant term

It has been known for over twenty years that the sparse matrix representation of spatial weights overcomes the difficulties of fitting models with larger numbers of observations using maximum likelihood and MCMC where the log determinant term comes into play (R. K. Pace and Barry 1997a; R. K. Pace and Barry 1997b; R. K. Pace and Barry 1997c; R. K. Pace and Barry 1997d). During the development of these approaches in model fitting functions in spatialreg, use was first made of C code also used in the S-PLUS SpatialStats module (Kaluzny et al. 1998), then SparseM which used a compressed sparse row form very similar to "nb" and "listw" objects. This was followed by the use of spam and Matrix methods, both of which mainly use compressed sparse column representations. Details are provided in Bivand, Hauke and Kossowski (2013).

The domain of the spatial coefficient(s) is given by the interval= argument to model fitting functions, and returned in the fitted object:

SEM_94$interval #>  -1.53 1.00 This case is trivial, because the upper bound is unity by definition, because of the use of row standardization. The interval is the inverse of the range of the eigenvalues of the weights matrix: 1/range(eigs_94) #>  -1.53 1.00 Finding the interval within which to search for the spatial coefficient is trivial for smaller data sets, but more complex for larger ones. It is possible to use heuristics implemented in lextrW() (Griffith, Bivand, and Chun 2015): 1/c(lextrW(lw_q_94)) #> lambda_n lambda_1 #> -1.53 1.00 or RSpectra::eigs() after coercion to a Matrix package compressed sparse column representation: W <- as(lw_q_94, "CsparseMatrix") 1/Re(c(RSpectra::eigs(W, k=1, which="SR")$values,
RSpectra::eigs(W, k=1, which="LR")$values)) #>  -1.53 1.00 Why are we extracting the real part of the values returned by eigs()? Since nb_q_94 is symmetric, the row-standardized object lw_q_94 is symmetric by similarity, a result known since Ord (1975); consequently, we can take the real part without concern. Had the underlying neighbour relationships not been symmetric, we should be more careful. The baseline log determinant term as given by Ord (1975) for a coefficient value proposed in sampling or during numerical optimization; this extract matches the "eigen" method (with or without control=list(pre_eig=...)"): coef <- 0.5 sum(log(1 - coef * eigs_94)) #>  -2.87 Using sparse matrix functions from Matrix, the LU decomposition can be used for asymmetric matrices; this extract matches the "LU" method: I <- Diagonal(nrow(boston_94)) LU <- lu(I - coef * W) dU <- abs(diag(slot(LU, "U"))) sum(log(dU)) #>  -2.87 and Cholesky decomposition for symmetric matrices, with similar.listw() used to handle asymmetric weights that are similar to symmetric. The default value od super allows the underlying Matrix code to choose between supernodal or simplicial decomposition; this extract matches the "Matrix_J" method: W <- as(similar.listw(lw_q_94), "CsparseMatrix") super <- as.logical(NA) cch <- Cholesky((I - coef * W), super=super) c(2 * determinant(cch, logarithm = TRUE)$modulus)
#>  -2.87

The "Matrix" and "spam_update" methods are to be preferred as they pre-compute the fill-reducing permutation of the decomposition since the weights do not change for different values of the coefficient.

Maximum likelihood model fitting functions in spatialreg and splm use jacobianSetup() to populate env= environment with intermediate objects needed to find log determinants during optimization. Passing environments to objective functions is efficient because they are passed by reference rather than value. The con= argument passes through the populated control list, containing default values unless the key-value pairs were given in the function call (pre_eig= is extracted separately). The which= argument is 1 by default, but will also take 2 in SAC and GNM models. HSAR uses mcdet_setup() to set up Monte Carlo approximation terms.

args(jacobianSetup)
#> function (method, env, con, pre_eig = NULL, trs = NULL, interval = NULL,
#>     which = 1)
#> NULL

For each value of coef, the do_ldet() function returns the log determinant, using the values stored in environment env=:

args(do_ldet)
#> function (coef, env, which = 1)
#> NULL

As yet the Bayesian models are limited to control argument ldet_method="SE_classic" at present, using "LU" to generate a coarse grid of control argument nrho=200L log determinant values in the interval, spline interpolated to a finer grid of length control argument interpn=2000L, from which griddy Gibbs samples are drawn. It is hoped to add facilities to choose alternative methods in the future. This would offer possibilities to move beyond griddy Gibbs, but using gridded log determinant values seems reasonable at present.

### 18.3.2 Impacts

Global impacts have been seen as crucial for reporting results from fitting models including the spatially lagged response (SLM, SDM, SAC. GNM) for over ten years (LeSage and Pace 2009). Extension to other models including spatially lagged covariates (SLX, SDEM) has followed (Elhorst 2010; Bivand 2012; Halleck Vega and Elhorst 2015). In order to infer from the impacts, linear combination may be used for SLX and SDEM models. For SLM, SDM, SAC and GNM models fitted with maximum likelihood or GMM, the variance-covariance matrix of the coefficients is available, and can be used to make random draws from a multivariate Normal distribution with mean set to coefficient values and variance to the estimated variance-covariance matrix. For these models fitted using Bayesian methods, draws are already available. In the SDEM case, the draws on the regression coefficients of the unlagged covariates represent direct impacts, and draws on the coefficients of the spatially lagged covariates represent indirect impacts, and their by-draw sums the total impacts.

Impacts are calculated using model object class specific impacts() methods, here taking the method for "sarlm" objects as an example. In the sphet package, the impacts method for "gstsls" uses the spatialreg impacts() framework, as does the splm package for "splm" fitted model objects. impacts() methods require either a tr= - a vector of traces of the power series of the weights object typically computed with trW() or a listw= argument. If listw= is given, dense matrix methods are used. The evalues= argument is experimental, does not yet work for all model types, and takes the eigenvalues of the weights matrix. The R= argument gives the number of samples to be taken from the fitted model. The Q= permits the decomposition of impacts to components of the power series of the weights matrix (LeSage and Pace 2009).

args(impacts.sarlm)
#> function (obj, ..., tr = NULL, R = NULL, listw = NULL, evalues = NULL,
#>     useHESS = NULL, tol = 1e-06, empirical = FALSE, Q = NULL)
#> NULL

The summary method for the output of impacts() methods where inference from samples was requested by default uses the summary() method for "mcmc" objects defined in the coda package. It can instead report just matrices of standard errors, z-values and p-values by setting zstats= and short= to TRUE.

args(summary.lagImpact)
#> function (object, ..., zstats = FALSE, short = FALSE, reportQ = NULL)
#> NULL

Since sampling is not required for inference for SLX and SDEM models, linear combination is used for models fitted using maximum likelihood; results are shown here for the air pollution variable only. The literature has not yet resolved the question of how to report model output, as each covariate is now represented by three impacts. Where spatially lagged covariates are included, two coefficients are replaced by three impacts.

sum_imp_94_SDEM <- summary(impacts(SDEM_94))
rbind(Impacts=sum_imp_94_SDEM$mat[5,], SE=sum_imp_94_SDEM$semat[5,])
#>           Direct Indirect   Total
#> Impacts -0.01276 -0.01845 -0.0312
#> SE       0.00235  0.00472  0.0053

The impacts from the same model fitted by MCMC are very similar:

sum_imp_94_SDEM_B <- summary(impacts(SDEM_94B))
rbind(Impacts=sum_imp_94_SDEM_B$mat[5,], SE=sum_imp_94_SDEM_B$semat[5,])
#>          Direct Indirect    Total
#> Impacts -0.0128  -0.0176 -0.03038
#> SE       0.0025   0.0056  0.00661

as also are those from the SLX model. In the SLX and SDEM models, the direct impacts are the consequences for the response of changes in air pollution in the same observational entity, and the indirect (local) impacts are the consequences for the response of changes in air pollution in neighbouring observational entities.

sum_imp_94_SLX <- summary(impacts(SLX_94))
rbind(Impacts=sum_imp_94_SLX$mat[5,], SE=sum_imp_94_SLX$semat[5,])
#>          Direct Indirect    Total
#> Impacts -0.0128 -0.01874 -0.03151
#> SE       0.0028  0.00556  0.00611

In contrast to local indirect impacts in SLX and SDEM models, global indirect impacts are found in models including the spatially lagged response. For purposes of exposition, let us fit an SLM:

SLM_489 <- lagsarlm(form, data=boston_489, listw=lw_q_489, zero.policy=TRUE)

Traces of the first m= matrices of the power series in the spatial weights are pre-computed to speed up inference from samples from the fitted model, and from existing MCMC samples (LeSage and Pace 2009). The traces can also be used in other contexts too, so their pre-computation may be worthwhile anyway. The type= argument is "mult" by default, but may also be set to "MC" for Monte Carlo simulation or "moments" using a space-saving looping algorithm.

args(trW)
#> function (W = NULL, m = 30, p = 16, type = "mult", listw = NULL,
#>     momentsSymmetry = TRUE)
#> NULL
W <- as(lw_q_489, "CsparseMatrix")
tr_489 <- trW(W)
str(tr_489)
#>  num [1:30] 0 90.8 29.4 42.1 26.5 ...
#>  - attr(*, "timings")= Named num [1:2] 0.176 0.177
#>   ..- attr(*, "names")= chr [1:2] "user.self" "elapsed"
#>  - attr(*, "type")= chr "mult"
#>  - attr(*, "n")= int 489

In this case, the spatial process in the response is not strong, so the global indirect impacts (here for the air pollution variable) are weak.

SLM_489_imp <- impacts(SLM_489, tr=tr_489, R=2000)
SLM_489_imp_sum <- summary(SLM_489_imp, short=TRUE, zstats=TRUE)
res <- rbind(Impacts=sapply(SLM_489_imp$res, "[", 5), SE=SLM_489_imp_sum$semat[5,])
colnames(res) <- c("Direct", "Indirect", "Total")
res
#>           Direct  Indirect    Total
#> Impacts -0.00593 -1.01e-05 -0.00594
#> SE       0.00106  1.00e-04  0.00107

Of more interest is trying to reconstruct the direct and total impacts using dense matrix methods; the direct global impacts are the mean of the diagonal of the dense impacts matrix, and the total global impacts are the sum of all matrix elements divided by the number of observations. The direct impacts agree, but the total impacts differ slightly.

coef_SLM_489 <- coef(SLM_489)
IrW <- Diagonal(489) - coef_SLM_489 * W
S_W <- solve(IrW)
S_NOX_W <- S_W %*% (diag(489) * coef_SLM_489)
c(Direct=mean(diag(S_NOX_W)), Total=sum(S_NOX_W)/489)
#>   Direct    Total
#> -0.00593 -0.00594

This bare-bones approach corresponds to using the listw= argument, and as expected gives the same output.

sapply(impacts(SLM_489, listw=lw_q_489), "[", 5)
#>    direct  indirect     total
#> -5.93e-03 -1.01e-05 -5.94e-03

The experimental evalues= approach which is known to be numerically exact by definition gives the same results as the matrix power series trace approach, so the slight difference may be attributed to the consequences of inverting the spatial process matrix.

sapply(impacts(SLM_489, evalues=eigs_489), "[", 5)
#>    direct  indirect     total
#> -5.93e-03 -1.01e-05 -5.94e-03

Impacts are crucial to the interpretation of Durbin models including spatially lagged covariates and models including the spatially lagged response. Tools to calculate impacts and their inferential bases are now available, and should be employed, but as yet some implementation details are under development and ways of presenting results in tabular form have not reached maturity.

### 18.3.3 Predictions

We will use the predict() method for "sarlm" objects to double-check impacts, here for the pupil-teacher ratio (PTRATIO). The method was re-written by Martin Gubri based on Goulard, Laurent and Thomas-Agnan (2017). The pred.type= argument specifies the prediction strategy among those presented in the article.

args(predict.sarlm)
#> function (object, newdata = NULL, listw = NULL, pred.type = "TS",
#>     all.data = FALSE, zero.policy = NULL, legacy = TRUE, legacy.mixed = FALSE,
#>     power = NULL, order = 250, tol = .Machine$double.eps^(3/5), #> spChk = NULL, ...) #> NULL First we’ll increment PTRATIO by one to show that, using least squares, the mean difference between predictions from the incremented new data and fitted values is equal to the regression coefficient. nd_489 <- boston_489 nd_489$PTRATIO <- nd_489$PTRATIO + 1 OLS_489 <- lm(form, data=boston_489) fitted <- predict(OLS_489) nd_fitted <- predict(OLS_489, newdata=nd_489) all.equal(unname(coef(OLS_489)), mean(nd_fitted - fitted)) #>  TRUE In models including the spatially lagged response, and when the spatial coefficient in different from zero, this is not the case in general, and is why we need impacts() methods. The difference here is not great, but neither is it zero, and needs to be handled. fitted <- predict(SLM_489) nd_fitted <- predict(SLM_489, newdata=nd_489, listw=lw_q_489, pred.type="TS", zero.policy=TRUE) all.equal(unname(coef_SLM_489), mean(nd_fitted - fitted)) #>  "Mean relative difference: 0.00178" In the Boston tracts data set, 17 observations of median house values, the response, are censored. Using these as an example and comparing some pred.type= variants for the SDEM model and predicting out-of-sample, we can see that there are differences, suggesting that this is a fruitful area for study. There have been a number of alternative proposals for handling missing variables (Gómez-Rubio, Bivand, and Rue 2015; Suesse 2018). Another reason for increasing attention on prediction is that it is fundamental for machine learning approaches, in which prediction for validation and test data sets drives model specification choice. The choice of training and other data sets with dependent spatial data remains an open question, and is certainly not as simple as with independent data. Here, we’ll list the predictions for the censored tract observations using three different prediction types, taking the exponent to get back to the USD median house values. nd <- boston_506[is.na(boston_506$median),]
t0 <- exp(predict(SDEM_489, newdata=nd, listw=lw_q, pred.type="TS", zero.policy=TRUE))
suppressWarnings(t1  <- exp(predict(SDEM_489, newdata=nd, listw=lw_q, pred.type="KP2",
zero.policy=TRUE)))
suppressWarnings(t2  <- exp(predict(SDEM_489, newdata=nd, listw=lw_q, pred.type="KP5",
zero.policy=TRUE)))
data.frame(fit_TS=t0[,1], fit_KP2=c(t1), fit_KP5=c(t2),
censored=boston_506$censored[as.integer(attr(t0, "region.id"))]) #> fit_TS fit_KP2 fit_KP5 censored #> 13 23912 29477 28147 right #> 14 28126 27001 28516 right #> 15 30553 36184 32476 right #> 17 18518 19621 18878 right #> 43 9564 6817 7561 left #> 50 8371 7196 7383 left #> 312 51477 53301 54173 right #> 313 45921 45823 47095 right #> 314 44196 44586 45361 right #> 317 43427 45707 45442 right #> 337 39879 42072 41127 right #> 346 44708 46694 46108 right #> 355 48188 49068 48911 right #> 376 42881 45883 44966 right #> 408 44294 44615 45670 right #> 418 38211 43375 41914 right #> 434 41647 41690 42398 right ## 18.4 Markov random field and multilevel models with spatial weights There is a large literature in spatial epidemiology using CAR and ICAR models in spatially structured random effects. These extend to multilevel models, in which the spatially structured random effects may apply at different levels of the model (Bivand et al. 2017). In order to try out some of the variants, we need to remove the no-neighbour observations from the tract level, and from the model output zone aggregated level, in two steps as reducing the tract level induces a no-neighbour outcome at the model output zone level. boston_94a <- aggregate(boston_489[,"NOX_ID"], list(boston_489$NOX_ID), unique)
nb_q_94a <- spdep::poly2nb(boston_94a)
NOX_ID_no_neighs <- boston_94a$NOX_ID[which(spdep::card(nb_q_94a) == 0)] boston_487 <- boston_489[is.na(match(boston_489$NOX_ID, NOX_ID_no_neighs)),]
boston_93 <- aggregate(boston_487[, "NOX_ID"], list(ids = boston_487$NOX_ID), unique) row.names(boston_93) <- as.character(boston_93$NOX_ID)
nb_q_93 <- spdep::poly2nb(boston_93, row.names=unique(as.character(boston_93$NOX_ID))) The lme4 package lets us add an IID unstructured random effect at the model output zone level: library(lme4) MLM <- lmer(update(form, . ~ . + (1 | NOX_ID)), data=boston_487, REML=FALSE) copying the random effect into the "sf" object for mapping below. boston_93$MLM_re <- ranef(MLM)[][,1]

Two packages, hglm and HSAR, offer SAR upper level spatially structured random effects, and require the specification of a sparse matrix mapping the upper level enities onto lower level entities, and sparse binary weights matrices:

library(Matrix)
suppressMessages(library(MatrixModels))
Delta <- as(model.Matrix(~ -1 + as.factor(NOX_ID), data=boston_487, sparse=TRUE),
"dgCMatrix")
M <- as(spdep::nb2listw(nb_q_93, style="B"), "CsparseMatrix")

The extension of hglm to sparse spatial setting extended its facilities (Alam, Rönnegård, and Shen 2015), and also permits the modelling of discrete responses. First we fit an IID random effect:

suppressPackageStartupMessages(library(hglm))
y_hglm <- log(boston_487$median) X_hglm <- model.matrix(lm(form, data=boston_487)) suppressWarnings(HGLM_iid <- hglm(y=y_hglm, X=X_hglm, Z=Delta)) followed by a SAR model at the upper level (corresponding to a spatial error (SEM) model), which reports the spatially structured random effect without fully converging, so coefficient standard errors are not available: suppressWarnings(HGLM_sar <- hglm(y=y_hglm, X=X_hglm, Z=Delta, rand.family=SAR(D=M))) boston_93$HGLM_re <- unname(HGLM_iid$ranef) boston_93$HGLM_ss <- HGLM_sar$ranef[,1] The HSAR package is restricted to the Gaussian response case, and fits an upper level SEM using MCMC; if W= is a lower level weights matrix, it will also fit a lower level SLM (Dong and Harris 2015; Dong et al. 2015): library(HSAR) suppressWarnings(HSAR <- hsar(form, data=boston_487, W=NULL, M=M, Delta=Delta, burnin=500, Nsim=2500, thinning=1)) boston_93$HSAR_ss <- HSAR$Mus[1,] The R2BayesX package provides flexible support for structured additive regression models, including spatial multilevel models. The models include an IID unstructured random effect at the upper level using the "re" specification (Umlauf et al. 2015); we choose the "MCMC"method: suppressPackageStartupMessages(library(R2BayesX)) BX_iid <- bayesx(update(form, . ~ . + sx(NOX_ID, bs="re")), family="gaussian", data=boston_487, method="MCMC", iterations=12000, burnin=2000, step=2, seed=123) boston_93$BX_re <- BX_iid$effects["sx(NOX_ID):re"][]$Mean

and the "mrf" (Markov random field) spatially structured random effect specification based on a graph derived from converting a suitable "nb" object for the upper level. The "region.id" attribute of the "nb" object needs to contain values corresponding the the indexing variable.

RBX_gra <- nb2gra(nb_q_93)
BX_mrf <- bayesx(update(form, . ~ . + sx(NOX_ID, bs="mrf", map=RBX_gra)),
family="gaussian", data=boston_487, method="MCMC",
iterations=12000, burnin=2000,step=2, seed=123)
boston_93$BX_ss <- BX_mrf$effects["sx(NOX_ID):mrf"][]$Mean In a very similar way, mgcv::gam() can take an "mrf" term using a suitable "nb" object for the upper level. In this case the "nb" object needs to have the contents of the "region.id" attribute copied as the names of the neighbour list components, and the indexing variable needs to be a factor (Wood 2017) (the "REML" method of bayesx() gives the same result here): library(mgcv) names(nb_q_93) <- attr(nb_q_93, "region.id") boston_487$NOX_ID <- as.factor(boston_487$NOX_ID) GAM_MRF <- gam(update(form, . ~ . + s(NOX_ID, bs="mrf", xt=list(nb=nb_q_93))), data=boston_487, method="REML") boston_93$GAM_ss <- aggregate(predict(GAM_MRF, type="terms", se=FALSE)[,14],
list(boston_487$NOX_ID), mean)$x

In the cases of hglm(), bayesx() and gam(), we could also model discrete responses without further major difficulty, and bayesx() and gam() also facilitate the generalization of functional form fitting for included covariates.

res <- rbind(iid_lmer=summary(MLM)$coefficients[6, 1:2], iid_hglm=summary(HGLM_iid)$FixCoefMat[6, 1:2],
iid_BX=BX_iid$fixed.effects[6, 1:2], sar_hsar=c(HSAR$Mbetas[1, 6], HSAR$SDbetas[1, 6]), mrf_BX=BX_mrf$fixed.effects[6, 1:2],
mrf_GAM=c(summary(GAM_MRF)$p.coeff, summary(GAM_MRF)$se))

Unfortunately, the coefficient estimates for the air pollution variable for these multilevel models are not helpful. All remain negative, but the inclusion of the model output zone level effects, be they IID or spatially structured, suggest that it is hard to disentangle the influence of the scale of observation from that of covariates observed at that scale.

suppressPackageStartupMessages(library(ggplot2))
df_res <- as.data.frame(res)
names(df_res) <- c("mean", "sd")
limits <- aes(ymax = mean + qnorm(0.975)*sd, ymin=mean + qnorm(0.025)*sd)
df_res\$model <- row.names(df_res)
p <- ggplot(df_res, aes(y=mean, x=model)) + geom_point() + geom_errorbar(limits) + geom_hline(yintercept = 0, col="#EB811B") + coord_flip()
p + ggtitle("NOX coefficients and error bars") + theme(plot.background = element_rect(fill = "transparent",colour = NA), legend.background = element_rect(colour = NA, fill = "transparent")) Figure 18.1: Polish municipality types 2015

This map shows that the model output zone level IID random effects are very similar across the three model fitting functions reported.

library(tmap)
tm_shape(boston_93) + tm_fill(c("MLM_re", "HGLM_re", "BX_re"), midpoint=0, title="IID")  + tm_facets(free.scales=FALSE) + tm_borders(lwd=0.3, alpha=0.4) + tm_layout(panel.labels=c("MLM", "HGLM", "BX")) Figure 18.2: IID random effects

The spatially structured SAR and MRF random effects (MRF term in the gam() case) are also very similar, with the MRF somewhat less smoothed than the SAR values.

tm_shape(boston_93) + tm_fill(c("HGLM_ss", "HSAR_ss", "BX_ss", "GAM_ss"), midpoint=0, title="SS")  + tm_facets(free.scales=FALSE) + tm_borders(lwd=0.3, alpha=0.4) + tm_layout(panel.labels=c("HGLM SAR", "HSAR SAR", "BX MRF", "GAM MRF"))` Figure 18.3: Spatially structured random effects

Although there is still a great need for more thorough comparative studies of model fitting functions for spatial regression, there has been much progress over recent years.

### References

Alam, Moudud, Lars Rönnegård, and Xia Shen. 2015. “Fitting Conditional and Simultaneous Autoregressive Spatial Models in Hglm.” The R Journal 7 (2): 5–18. https://doi.org/10.32614/RJ-2015-017.

Anselin, L. 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers.

Besag, Julian. 1974. “Spatial Interaction and the Statistical Analysis of Lattice Systems.” Journal of the Royal Statistical Society. Series B (Methodological) 36: pp. 192–236.

Bivand, R. S. 1984. “Regression Modeling with Spatial Dependence: an Application of Some Class Selection and Estimation Methods.” Geographical Analysis 16: 25–37.

Bivand, R. 2002. “Spatial Econometrics Functions in R: Classes and Methods.” Journal of Geographical Systems 4: 405–21.

Bivand, Roger. 2017. “Revisiting the Boston Data Set — Changing the Units of Observation Affects Estimated Willingness to Pay for Clean Air.” REGION 4 (1): 109–27. doi:10.18335/region.v4i1.107.

Bivand, Roger S. 2012. “After ’Raising the Bar’: Applied Maximum Likelihood Estimation of Families of Models in Spatial Econometrics.” Estadística Española 54: 71–88.

Bivand, Roger S., and Gianfranco Piras. 2015. “Comparing Implementations of Estimation Methods for Spatial Econometrics.” Journal of Statistical Software 63 (1): 1–36. doi:10.18637/jss.v063.i18.

Bivand, Roger S., Jan Hauke, and Tomasz Kossowski. 2013. “Computing the Jacobian in Gaussian Spatial Autoregressive Models: An Illustrated Comparison of Available Methods.” Geographical Analysis 45 (2): 150–79. doi:10.1111/gean.12008.

Bivand, Roger S., Zhe Sha, Liv Osland, and Ingrid Sandvig Thorsen. 2017. “A Comparison of Estimation Methods for Multilevel Models of Spatially Structured Data.” Spatial Statistics. https://doi.org/10.1016/j.spasta.2017.01.002.

Brooks, Mollie E., Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Maechler, and Benjamin M. Bolker. 2017. “glmmTMB Balances Speed and Flexibility Among Packages for Zero-Inflated Generalized Linear Mixed Modeling.” The R Journal 9 (2): 378–400. https://journal.r-project.org/archive/2017/RJ-2017-066/index.html.

Burridge, P. 1981. “Testing for a Common Factor in a Spatial Autoregression Model.” Environment and Planning A 13: 795–800.

Cliff, A. D., and J. K. Ord. 1973. Spatial Autocorrelation. London: Pion.

Cliff, A. 1981. Spatial Processes. London: Pion.

Cliff, A., and J.K. Ord. 1972. “Testing for Spatial Autocorrelation Among Regression Residuals.” Geographical Analysis 4: 267–84.

Cressie, N. A. C. 1993. Statistics for Spatial Data. New York:Wiley.

Dong, G., and R. Harris. 2015. “Spatial Autorgressive Models for Geographically Hierarchical Data Structures.” Geographical Analysis 47 (2): 173–91.

Dong, G., R. Harris, K. Jones, and J. Yu. 2015. “Multilevel Modeling with Spatial Interaction Effects with Application to an Emerging Land Market in Beijing, China.” PLOS One 10 (6). doi:doi:10.1371/journal.pone.0130761.

Elhorst, J. Paul. 2010. “Applied Spatial Econometrics: Raising the Bar.” Spatial Economic Analysis 5: 9–28.

Fingleton, B. 1999. “Spurious spatial regression: Some Monte Carlo results with a spatial unit root and spatial cointegration.” Journal of Regional Science 9: 1–19.

Gaetan, Carlo, and Xavier Guyon. 2010. Spatial Statistics and Modeling. New York: Springer.

Gómez-Rubio, Virgilio, Roger Bivand, and Håvard Rue. 2015. “A New Latent Class to Fit Spatial Econometrics Models with Integrated Nested Laplace Approximations.” Procedia Environmental Sciences 27: 116–18. doi:https://doi.org/10.1016/j.proenv.2015.07.119.

Goulard, Michel, Thibault Laurent, and Christine Thomas-Agnan. 2017. “About Predictions in Spatial Autoregressive Models: Optimal and Almost Optimal Strategies.” Spatial Economic Analysis 12 (2-3). Routledge: 304–25. doi:10.1080/17421772.2017.1300679.

Griffith, Daniel, Roger Bivand, and Yongwan Chun. 2015. “Implementing Approximations to Extreme Eigenvalues and Eigenvalues of Irregular Surface Partitionings for Use in Sar and Car Models.” Procedia Environmental Sciences 26: 119–22. doi:https://doi.org/10.1016/j.proenv.2015.05.013.

Halleck Vega, Solmaria, and J. Paul Elhorst. 2015. “The SLX Model.” Journal of Regional Science 55 (3): 339–63. doi:10.1111/jors.12188.

Hepple, Leslie W. 1976. “A Maximum Likelihood Model for Econometric Estimation with Spatial Series.” In Theory and Practice in Regional Science, edited by I. Masser, 90–104. London Papers in Regional Science. London: Pion.

Kaluzny, S.P., S.C. Vega, T.P. Cardoso, and A.A. Shelly. 1998. S+SpatialStats. New York, NY: Springer.

Kelejian, Harry, and Gianfranco Piras. 2017. Spatial Econometrics. London: Academic Press.

LeSage, J. P. 2014. “What Regional Scientists Need to Know About Spatial Econometrics.” Review of Regional Studies 44: 13–32. https://journal.srsa.org/ojs/index.php/RRS/article/view/44.1.2.

LeSage, James P., and Kelley R. Pace. 2009. Introduction to Spatial Econometrics. Boca Raton, FL: CRC Press.

Lieshout, M. N. M. van. 2019. Theory of Spatial Statistics. Boca Raton, FL: Chapman; Hall/CRC.

Martinetti, Davide, and Ghislain Geniaux. 2017. “Approximate Likelihood Estimation of Spatial Probit Models.” Regional Science and Urban Economics 64: 30–45. doi:https://doi.org/10.1016/j.regsciurbeco.2017.02.002.

McMillen, D. P. 2013. Quantile Regression for Spatial Data. Heidelberg: Springer-Verlag.

Millo, Giovanni, and Gianfranco Piras. 2012. “splm: Spatial Panel Data Models in R.” Journal of Statistical Software 47 (1): 1–38.

Mur, Jesús, and Ana Angulo. 2006. “The Spatial Durbin Model and the Common Factor Tests.” Spatial Economic Analysis 1 (2). Routledge: 207–26. doi:10.1080/17421770601009841.

Ord, J. K. 1975. “Estimation Methods for Models of Spatial Interaction.” Journal of the American Statistical Association 70 (349): 120–26.

Pace, R. K., and R. P. Barry. 1997a. “Fast CARs.” Journal of Statistical Computation and Simulation 59 (2): 123–45.

Pace, R. 1997b. “Quick computation of spatial autoregressive estimators.” Geographics Analysis 29 (3): 232–47.

Pace, R. 1997c. “Sparse spatial autoregressions.” Statistics & Probability Letters 33 (3): 291–97.

Pace, R. 1997d. “Fast spatial estimation.” Applied Economics Letters 4 (5): 337–41.

Pace, RK, and JP LeSage. 2008. “A Spatial Hausman Test.” Economics Letters 101: 282–84.

Pinheiro, Jose C., and Douglas M. Bates. 2000. Mixed-Effects Models in S and S-Plus. New York: Springer.

Piras, Gianfranco. 2010. “sphet: Spatial Models with Heteroskedastic Innovations in R.” Journal of Statistical Software 35 (1): 1–21.

Ripley, B. D. 1981. Spatial Statistics. New York: Wiley.

Ripley, B. 1988. Statistical Inference for Spatial Processes. Cambridge: Cambridge University Press.

Smith, T. E., and K. L. Lee. 2012. “The effects of spatial autoregressive dependencies on inference in ordinary least squares: a geometric approach.” Journal of Geographical Systems 14 (January): 91–124. doi:10.1007/s10109-011-0152-x.

Smith, Tony E. 2009. “Estimation Bias in Spatial Models with Strongly Connected Weight Matrices.” Geographical Analysis 41 (3): 307–32. doi:10.1111/j.1538-4632.2009.00758.x.

Suesse, Thomas. 2018. “Marginal Maximum Likelihood Estimation of Sar Models with Missing Data.” Computational Statistics & Data Analysis 120: 98–110. doi:https://doi.org/10.1016/j.csda.2017.11.004.

Umlauf, Nikolaus, Daniel Adler, Thomas Kneib, Stefan Lang, and Achim Zeileis. 2015. “Structured Additive Regression Models: An R Interface to BayesX.” Journal of Statistical Software 63 (21): 1–46. http://www.jstatsoft.org/v63/i21/.

Wagner, Martin, and Achim Zeileis. 2019. “Heterogeneity and Spatial Dependence of Regional Growth in the EU: A Recursive Partitioning Approach.” German Economic Review 20 (1): 67–82. doi:10.1111/geer.12146.

Wall, M. M. 2004. “A Close Look at the Spatial Structure Implied by the CAR and SAR Models.” Journal of Statistical Planning and Inference 121: 311–24.

Waller, Lance A., and Carol A. Gotway. 2004. Applied Spatial Statistics for Public Health Data. Hoboken, NJ: John Wiley & Sons.

Whittle, P. 1954. “On Stationary Processes in the Plane.” Biometrika 41 (3-4): 434–49. doi:10.1093/biomet/41.3-4.434.

Wilhelm, Stefan, and Miguel Godinho de Matos. 2013. “Estimating Spatial Probit Models in R.” The R Journal 5 (1): 130–43. doi:10.32614/RJ-2013-013.

Wood, S.N. 2017. Generalized Additive Models: An Introduction with R. 2nd ed. Chapman; Hall/CRC.