Spatial Data Science
with applications in R
Data science is concerned with finding answers to questions on the basis of available data, and communicating that effort. Besides showing the results, this communication involves sharing the data used, but also exposing the path that led to the answers in a comprehensive and reproducible way. It also acknowledges the fact that available data may not be sufficient to answer questions, and that any answers are conditional on the data collection or sampling protocols employed.
This book introduces and explains the concepts underlying spatial data: points, lines, polygons, rasters, coverages, geometry attributes, data cubes, reference systems, as well as higher-level concepts including how attributes relate to geometries and how this affects analysis. The relationship of attributes to geometries is known as support, and changing support also changes the characteristics of attributes. Some data generation processes are continuous in space and may be observed everywhere. Others are discrete and observed in tessellated containers. In modern spatial data analysis, tessellated methods are often used for all data, cutting across the traditional partition into point process, geostatistical and lattice models. It is support (and the understanding of support) that underlies the importance of spatial representation. The book is aimed at data scientists who want to get a grip on using spatial data in their analysis. To exemplify how to do things, it uses R.
It is often thought that spatial data boils down to having observations’ longitude and latitude in a dataset, and treating these just like any other variable. This carries the risk of missed opportunities and meaningless analyses. For instance,
- coordinate pairs really are pairs, and lose much of their meaning when treated independently
- rather than having point locations, observations are often associated with spatial lines, areas, or grid cells
- spatial distances between observations are often not well represented by straight-line distances, but by great circle distances, distances through networks, or by measuring the effort it takes getting from A to B
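As a minimal illustration of the first and last points, the base-R sketch below compares a naive Euclidean distance computed on raw longitude and latitude values with a great-circle distance computed with the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km; the function names `great_circle_km` and `naive_degrees` are hypothetical helpers written for this example, not part of any package introduced in this book.

```r
# Hypothetical helper: great-circle (haversine) distance in km between
# two lon/lat points, assuming a spherical Earth with mean radius 6371 km.
great_circle_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing a slightly above 1
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Hypothetical helper: the naive approach, treating longitude and latitude
# as independent x/y variables; the result is in "degrees", not a length.
naive_degrees <- function(lon1, lat1, lon2, lat2) {
  sqrt((lon2 - lon1)^2 + (lat2 - lat1)^2)
}

# Two pairs of points, each 10 degrees of longitude apart:
# one pair on the equator, one pair at 60 degrees north.
equator <- great_circle_km(0, 0, 10, 0)
north60 <- great_circle_km(0, 60, 10, 60)
naive_equator <- naive_degrees(0, 0, 10, 0)
naive_north60 <- naive_degrees(0, 60, 10, 60)

cat(sprintf("great circle: equator %.0f km, 60N %.0f km\n", equator, north60))
cat(sprintf("naive: equator %.1f, 60N %.1f (degrees)\n",
            naive_equator, naive_north60))
```

Both pairs are 10 units apart when the coordinates are treated as ordinary variables, yet the pair at 60°N is only about half as far apart as the pair on the equator, because meridians converge towards the poles.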
We introduce the concepts behind spatial data, coordinate reference systems and spatial analysis, and present a number of packages: sf (Pebesma 2018; E. Pebesma 2021c), stars (E. Pebesma 2021d), s2 (Dunnington, Pebesma, and Rubak 2021) and lwgeom (E. Pebesma 2021b), as well as a number of tidyverse (Wickham 2021) extensions and a number of spatial analysis and visualisation packages that can be used with these packages: gstat (E. Pebesma and Graeler 2021), spdep (Bivand 2022), spatialreg (R. Bivand and Piras 2021), spatstat (Baddeley, Turner, and Rubak 2021), tmap (Tennekes 2021) and mapview (Appelhans et al. 2021).
We thank all GitHub contributors (t.b.d.), Sahil Bhandari, Claus Wilke, Jakub Nowosad, the SDSWR class of summer 2021, and all sf and stars authors.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Appelhans, Tim, Florian Detsch, Christoph Reudenbach, and Stefan Woellauer. 2021. Mapview: Interactive Viewing of Spatial Data in R. https://github.com/r-spatial/mapview.
Baddeley, Adrian, Rolf Turner, and Ege Rubak. 2021. Spatstat: Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests. http://spatstat.org/.
Bivand, Roger. 2022. Spdep: Spatial Dependence: Weighting Schemes, Statistics. https://CRAN.R-project.org/package=spdep.
Bivand, Roger, and Gianfranco Piras. 2021. Spatialreg: Spatial Regression Analysis. https://CRAN.R-project.org/package=spatialreg.
Dunnington, Dewey, Edzer Pebesma, and Ege Rubak. 2021. S2: Spherical Geometry Operators Using the S2 Geometry Library.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
Pebesma, Edzer. 2021b. Lwgeom: Bindings to Selected Liblwgeom Functions for Simple Features. https://github.com/r-spatial/lwgeom/.
Pebesma, Edzer. 2021c. Sf: Simple Features for R.
Pebesma, Edzer. 2021d. Stars: Spatiotemporal Arrays, Raster and Vector Data Cubes.
Pebesma, Edzer, and Benedikt Graeler. 2021. Gstat: Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation. https://github.com/r-spatial/gstat/.
Tennekes, Martijn. 2021. Tmap: Thematic Maps. https://github.com/mtennekes/tmap.
Wickham, Hadley. 2021. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.