# Chapter 7 Introduction to sf and stars

This chapter introduces R packages sf and stars. sf provides a table format for simple features, where feature geometries are carried in a list-column. R package stars was written to support raster and vector datacubes (Chapter 6), and has raster data stacks and feature time series as special cases. sf first appeared on CRAN in 2016, stars in 2018. Development of both packages received support from the R Consortium as well as strong community engagement. The packages were designed to work together.

All functions operating on sf or stars objects start with st_, making it easy to recognize them or to search for them when using command line completion.

## 7.1 Package sf

Intended to succeed and replace R packages sp, rgeos and the vector parts of rgdal, R package sf (Pebesma 2018) was developed to move spatial data analysis in R closer to standards-based approaches seen in the industry and open source projects, to build upon more modern versions of the open source geospatial software stack (figure 1.6), and to allow for integration of R spatial software with the tidyverse if desired.

To do so, R package sf provides simple features access (J. Herring and others 2011), natively, to R. It provides an interface to several tidyverse packages, in particular to ggplot2, dplyr and tidyr. It can read and write data through GDAL, execute geometrical operations using GEOS (for projected coordinates) or s2geometry (for ellipsoidal coordinates), and carry out coordinate transformations or conversions using PROJ. External C++ libraries are interfaced using Rcpp (Eddelbuettel 2013).

Package sf represents sets of simple features in sf objects, a sub-class of a data.frame or tibble. sf objects contain at least one geometry list-column of class sfc, which for each element contains the geometry as an R object of class sfg. A geometry list-column acts as a variable in a data.frame or tibble, but has a more complex structure than e.g. numeric or character variables. Following the convention of PostGIS, all operations (functions, method) that operate on sf objects or related start with st_.

An sf object has the following meta-data:

• the name of the (active) geometry column, held in attribute sf_column
• for each non-geometry variable, the attribute-geometry relationship (section 5.1), held in attribute agr

An sfc geometry list-column has the following meta-data:

• the coordinate reference system held in attribute crs
• the bounding box held in attribute bbox
• the precision held in attribute precision
• the number of empty geometries held in attribute n_empty

These attributes may best be accessed or set by using functions like st_bbox, st_crs, st_set_crs, st_agr, st_set_agr, st_precision, and st_set_precision.

### 7.1.1 Creation

One could create an sf object from scratch e.g. by

# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: 7.22 ymin: 52.2 xmax: 7.44 ymax: 52.4
# Geodetic CRS:  WGS 84
#   elev marker              geom
# 1 33.2   Id01 POINT (7.35 52.4)
# 2 52.1   Id02 POINT (7.22 52.2)
# 3 81.2   Id03 POINT (7.44 52.2)

Figure 7.1 gives an explanation of the components printed. Rather than creating objects from scratch, spatial data in R are typically read from an external source, which can be:

• an external file
• a request to a web service
• a dataset held in some form in another R package

The next section introduces reading from files; section 8.1 discusses handling of datasets too large to fit into working memory.

Reading datasets from an external “data source” (file, web service, or even string) is done using st_read:

# [1] "/home/edzer/R/x86_64-pc-linux-gnu-library/4.0/sf/gpkg/nc.gpkg"
# Reading layer nc.gpkg' from data source
#   /home/edzer/R/x86_64-pc-linux-gnu-library/4.0/sf/gpkg/nc.gpkg'
#   using driver GPKG'
# Simple feature collection with 100 features and 14 fields
# Geometry type: MULTIPOLYGON
# Dimension:     XY
# Bounding box:  xmin: -84.3 ymin: 33.9 xmax: -75.5 ymax: 36.6

Here, the file name and path file is read from the sf package, which has a different path on every machine, and hence is guaranteed to be present on every sf installation.

Command st_read has two arguments: the data set name (dsn) and the layer. In the example above, the geopackage (GPKG) file contains only a single layer that is being read. If it had contained multiple layers, then the first layer would have been read and a warning would have been emitted. The available layers of a data set can be queried by

# Driver: GPKG
# Available layers:
#   layer_name geometry_type features fields crs_name
# 1    nc.gpkg Multi Polygon      100     14    NAD27

Simple feature objects can be written with st_write, as in

# [1] "/tmp/Rtmpj9NN5m/file59b02560fbe34.gpkg"
# Writing layer layer_nc' to data source
#   /tmp/Rtmpj9NN5m/file59b02560fbe34.gpkg' using driver GPKG'
# Writing 100 features with 14 fields and geometry type Multi Polygon.

where the file format (GPKG) is derived from the file name extension.

### 7.1.3 Subsetting

A very common operation is to subset objects; base R can use [ for this. The rules that apply to data.frame objects also apply to sf objects, e.g. that records 2-5 and columns 3-7 are selected by

but with a few additional features, in particular:

• the drop argument is by default FALSE meaning that the geometry column is always selected, and an sf object is returned; when it is set to TRUE and the geometry column not selected, it is dropped and a data.frame is returned
• selection with a spatial (sf, sfc or sfg) object as first argument leads to selection of the features that spatially intersect with that object (see next section); other predicates than intersects can be chosen by setting parameter op to a function such as st_covers or or any other binary predicate function listed in section 3.2.2

### 7.1.4 Binary predicates

Binary predicates like st_intersects, st_covers etc (section 3.2.2) take two sets of features or feature geometries and return for all pairs whether the predicate is TRUE or FALSE. For large sets this would potentially result in a huge matrix, typically filled mostly with FALSE values and for that reason a sparse representation is returned by default:

# Sparse geometry binary predicate list of length 5, where the predicate
# was `intersects'
#  1: 1, 2
#  2: 1, 2, 3
#  3: 2, 3
#  4: 4, 7
#  5: 5, 6

Figure 7.2 shows how the intersections of the first five with the first seven counties can be understood. We can transform the sparse logical matrix into a dense matrix by

#       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
# [1,]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
# [2,]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
# [3,] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
# [5,] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE

The number of counties that each of nc5 intersects with is

# [1] 2 3 2 2 2

and the other way around, the number of counties in nc5 that intersect with each of the counties in nc7 is

# [1] 2 3 2 1 1 1 1

The object i is of class sgbp (sparse geometrical binary predicate), and is a list of integer vectors, with each element representing a row in the logical predicate matrix holding the column indices of the TRUE values for that row. It further holds some metadata like the predicate used, and the total number of columns. Methods available for sgbp objects include

#  [1] as.data.frame as.matrix     coerce        dim           initialize
#  [6] Ops           print         show          slotsFromS3   t
# see '?methods' for accessing help and source code

where the only Ops method available is !, the negation operation.

### 7.1.5 tidyverse

The tidyverse is a collection of data science packages that work together, described e.g. in (Wickham and Grolemund 2017; Wickham et al. 2019). Package sf has tidyverse-style read and write functions, read_sf and write_sf, which return a tibble rather than a data.frame, do not print any output, and overwrite existing data by default.

Further tidyverse generics with methods for sf objects include filter, select, group_by, ungroup, mutate, transmute, rowwise, rename, slice, summarise, distinct, gather, pivot_longer, spread, nest, unnest, unite, separate, separate_rows, sample_n, and sample_frac. Most of these methods simply manage the metadata of sf objects, and make sure the geometry remains present. In case a user wants the geometry to be removed, one can use st_drop_geometry() or simply coerce to a tibble or data.frame before selecting:

# # A tibble: 3 × 1
#   BIR74
#   <dbl>
# 1  1091
# 2   487
# 3  3188

The summarise method for sf objects has two special arguments:

• do_union (default TRUE) determines whether grouped geometries are unioned on return, so that they form a valid geometry
• is_coverage (default FALSE) in case the geometries grouped form a coverage (do not have overlaps), setting this to TRUE speeds up the unioning

The distinct method selects distinct records, where st_equals is used to evaluate distinctness of geometries.

filter can be used with the usual predicates; when one wants to use it with a spatial predicate, e.g. to select all counties less than 50 km away from Orange county, one could use

# [1] 17

(where we use dplyr::filter rather than filter to avoid confusion with filter from base R.)

Figure 7.3 shows the results of this analysis, and in addition a buffer around the county borders; note that this buffer serves for illustration, it was not used to select the counties.

## 7.2 Spatial joins

In regular (left, right or inner) joins, joined records from a pair of tables are reported when one or more selected attributes match (are identical) in both tables. A spatial join is similar, but the criterion to join records is not equality of attributes but a spatial predicate. This leaves a wide variety of options in order to define spatially matching records, using binary predicates listed in section 3.2.2. The concepts of “left”, “right”, “inner” or “full” joins remain identical to the non-spatial join as the options for handling records that have no spatial match.

When using spatial joins, each record may have several matched records, yielding a large result table. A way to reduce this complexity may be to select from the matching records the one with the largest overlap with the target geometry. An example of this is shown (visually) in figure 7.4; this is done using st_join with argument largest = TRUE.

Another way to reduce the result set is to use aggregate after a join, all matching records, and union their geometries; see section 5.4.

### 7.2.1 Sampling, gridding, interpolating

Several convenience functions are available in package sf, some of which will be discussed here. Function st_sample generates a sample of points randomly sampled from target geometries, where target geometries can be point, line or polygon geometries. Sampling strategies can be (completely) random, regular or (with polygons) triangular. Chapter 11 explains how spatial sampling (or point pattern simulation) methods available in package spatstat are interfaced through st_sample.

Function st_make_grid creates a square, rectangular or hexagonal grid over a region, or points with the grid centers or corners. It was used to create the rectangular grid in figure 7.4.

Function st_interpolate_aw “interpolates” area values to new areas, as explained in section 5.3, both for intensive and extensive variables.

## 7.3 Package stars

Although package sp has always had limited support for raster data, over the last decade R package raster (Hijmans 2022a) has clearly been dominant as the prime package for powerful, flexible and scalable raster analysis. The raster data model of package raster (and its successor, terra (Hijmans 2022b)) is that of a 2D regular raster, or a set of raster layers (a “raster stack”). This aligns with the classical static “GIS world view”, where the world is modelled as a set of layers, each representing a different theme. A lot of data available today however is dynamic, and comes as time series of rasters or raster stacks. A raster stack does not meaningfully reflect this, requiring the user to keep a register of which layer represents what.

Also, the raster package, and its successor terra do an excellent job in scaling computations up to data sizes no larger than the local storage (the computer’s hard drives), and doing this fast. Recent datasets however, including satellite imagery, climate model or weather forecasting data, often no longer fit in local storage (chapter 8). Package spacetime (Pebesma 2012) addresses the analysis of time series of vector geometries or raster grid cells, but does not extend to higher-dimensional arrays.

Here, we introduce package stars for analysing raster and vector data cubes. The package:

• allows for representing dynamic (time varying) raster stacks
• aims at being scalable, also beyond local disk size
• provides a strong integration of raster functions in the GDAL library
• in addition to regular grids handles rotated, sheared, rectilinear and curvilinear rasters (figure 1.5)
• provides a tight integration with package sf
• also handles array data with non-raster spatial dimensions, the vector data cubes
• follows the tidyverse design principles

Vector data cubes include for instance time series for simple features, or spatial graph data such as potentially dynamic origin-destination matrices. The concept of spatial vector and raster data cubes was explained in chapter 6.

### 7.3.1 Reading and writing raster data

Raster data typically are read from a file. We use a dataset containing a section of Landsat 7 scene, with the 6 30m-resolution bands (bands 1-5 and 7) for a region covering the city of Olinda, Brazil. We can read the example GeoTIFF file holding a regular, non-rotated grid from the package stars:

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif     1      54     69 68.9      86  255
# dimension(s):
#      from  to  offset delta            refsys point values x/y
# x       1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y       1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# band    1   6      NA    NA                NA    NA   NULL

where we see the offset, cell size, coordinate reference system, and dimensions. The dimension table reports the following for each dimension:

• from: the starting index
• to: the ending index
• offset: the dimension value at the start (edge) of the first pixel
• delta: the cell size; negative delta values indicate that pixel index increases with decreasing dimension values
• refsys: the reference system
• point: boolean, indicating whether cell values have point support or cell support
• values: a list with values or labels associated with each of the dimension’s values
• x/y: an indicator whether a dimension is associated with a spatial raster x- or y-axis

For regular, rotated or sheared grids or other regularly discretized dimensions (e.g. time), offset and delta are not NA and values is NULL; for other cases, offset and delta are NA and values contains one of:

• the sequence of values, or intervals, in case of a rectlinear spatial raster or irregular time dimension
• in case of a vector data cube, geometries associated with the spatial dimension
• in case of a curvilinear raster, the matrix with coordinate values for each raster cell
• in case of a discrete dimension, the band names or labels associated with the dimension values

The object r is of class stars and is a simple list of length one, holding a three-dimensional array:

# [1] 1
# [1] "array"
#    x    y band
#  349  352    6

and in addition holds an attribute with a dimensions table with all the metadata required to know what the array dimensions refer to, obtained by

We can get the spatial extent of the array by

#    xmin    ymin    xmax    ymax
#  288776 9110729  298723 9120761

Raster data can be written to local disk using write_stars:

where again the data format (in this case, GeoTIFF) is derived from the file extension. As for simple features, reading and writing uses the GDAL library; the list of available drivers for raster data is obtained by

### 7.3.2 Subsetting stars data cubes

Data cubes can be subsetted using [, or using tidyverse verbs. The first options, [ uses for the arguments: attributes first, followed by dimension. This means that r[1:2, 101:200, , 5:10] selects from r attributes 1-2, index 101-200 for dimension 1, and index 5-10 for dimension 3; omitting dimension 2 means that no subsetting takes place. For attributes, attributes names or logical vectors can be used. For dimensions, logical vectors are not supported; Selecting discontinuous ranges supported when it is a regular sequence. By default, drop is FALSE, when set to TRUE dimensions with a single value are dropped:

#    x    y band
#  100   50    1
#   x   y
# 100  50

For selecting particular ranges of dimension values, one can use filter (after loading dplyr):

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif     5      51     63 64.3      75  242
# dimension(s):
#      from  to  offset delta            refsys point values x/y
# x       1  35  289004  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y       1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# band    1   6       1     1                NA    NA   NULL

which changes the offset of the $$x$$ dimension. Particular cube slices can also be obtained with slice, e.g.

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif    21      49     63 64.4      77  255
# dimension(s):
#   from  to  offset delta            refsys point values x/y
# x    1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y    1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]

which drops the singular dimension. mutate can be used on stars objects to add new arrays as functions of existing ones, transmute drops existing ones.

### 7.3.3 Cropping

Further subsetting can be done using spatial objects of class sf, sfc or bbox, e.g. when using the sample raster,

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
# L7_ETMs.tif    22      54     66 67.7    78.2  174 2184
# dimension(s):
#      from  to  offset delta            refsys point values x/y
# x     157 193  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y     159 194 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# band    1   6      NA    NA                NA    NA   NULL
selects the circular center region with a diameter of 500 metre, for the first band shown in figure 7.5,

where we see that pixels outside the spatial object are assigned NA values. This object still has dimension indexes relative to the offset and delta values of r; we can reset these to a new offset with

#      from to  offset delta            refsys point values x/y
# x       1 37  293222  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y       1 36 9116258 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# band    1  6      NA    NA                NA    NA   NULL

By default, the resulting raster is cropped to the extent of the selection object; an object with the same dimensions as the input object is obtained with

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.   NA's
# L7_ETMs.tif    22      54     66 67.7    78.2  174 731280
# dimension(s):
#      from  to  offset delta            refsys point values x/y
# x       1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y       1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# band    1   6      NA    NA                NA    NA   NULL

Cropping a stars object can alternatively be done directly with st_crop, as in

### 7.3.4 Redimensioning stars objects

Package stars uses package abind (Plate and Heiberger 2016) for a number of array manipulations. One of them is aperm which transposes an array by permuting it. A method for stars objects is provided, and

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif     1      54     69 68.9      86  255
# dimension(s):
#      from  to  offset delta            refsys point values x/y
# band    1   6      NA    NA                NA    NA   NULL
# x       1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y       1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]

permutes the order of dimensions of the resulting object.

Attributes and dimensions can be swapped, using split and merge:

# stars object with 2 dimensions and 6 attributes
# attribute(s):
#     Min. 1st Qu. Median Mean 3rd Qu. Max.
# X1    47      67     78 79.1      89  255
# X2    32      55     66 67.6      79  255
# X3    21      49     63 64.4      77  255
# X4     9      52     63 59.2      75  255
# X5     1      63     89 83.2     112  255
# X6     1      32     60 60.0      88  255
# dimension(s):
#   from  to  offset delta            refsys point values x/y
# x    1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y    1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#    Min. 1st Qu. Median Mean 3rd Qu. Max.
# X     1      54     69 68.9      86  255
# dimension(s):
#            from  to  offset delta            refsys point    values x/y
# x             1 349  288776  28.5 SIRGAS 2000 / ... FALSE      NULL [x]
# y             1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE      NULL [y]
# attributes    1   6      NA    NA                NA    NA X1,...,X6

split distributes the band dimension over 6 attributes of a 2-dimensional array, merge reverses this operation. st_redimension can be used for more generic operations, such as splitting a single array dimension over two new dimensions:

# stars object with 4 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif     1      54     69 68.9      86  255
# dimension(s):
#    from  to  offset delta            refsys point values
# X1    1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL
# X2    1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL
# X3    1   3      NA    NA                NA    NA   NULL
# X4    1   2      NA    NA                NA    NA   NULL

### 7.3.5 Extracting point samples, aggregating

A very common use case for raster data cube analysis is the extraction of values at certain locations, or computing aggregations over certain geometries. st_extract extracts point values. We will do this for a few randomly sampled points over the bounding box of r:

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif    12    41.8     63   61    80.5  145
# dimension(s):
#          from to offset delta            refsys point
# geometry    1 20     NA    NA SIRGAS 2000 / ...  TRUE
# band        1  6     NA    NA                NA    NA
#                                                     values
# geometry POINT (293002 9115516),...,POINT (290941 9114128)
# band                                                  NULL

which results in a vector data cube with 20 points and 6 bands.

Another way of extracting information from data cubes is by aggregating it. One way of doing is by spatial aggregation, e.g. to values for spatial polygons or lines (section 6.4). We can for instance compute the maximum pixel value for each band for each of the circles shown in figure 1.3d by

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif    73    94.2    117  121     142  205
# dimension(s):
#          from to offset delta            refsys point
# geometry    1  3     NA    NA SIRGAS 2000 / ... FALSE
# band        1  6     NA    NA                NA    NA
#                                                                                                  values
# geometry POLYGON ((291330 9114499, 2..., POLYGON ((290519 9119219, 2..., POLYGON ((292193 9116038, 2...
# band                                                                                               NULL

which gives a data cube with 3 geometries and 6 bands. Aggregation over a temporal dimension is done by passing a time variable as the second argument to aggregate, as

• a set of time stamps indicating the start of time intervals or
• a time period like "weeks", "5 days" or "years"

### 7.3.6 Predictive models

The typical model prediction workflow in R is as follows:

• use a data.frame with response and predictor variables (covariates)
• create a model object based on this data.frame
• call predict with this model object and the data.frame with target predictor variable values

Package stars provides a predict method for stars objects that essentially wraps the last step, by creating the data.frame, calling the predict method for that, and reconstructing a stars object with the predicted values.

We will illustrate this with a trivial two-class example mapping land from sea in the example Landsat data set, using the sample points extracted above, shown in figure 7.6.

From this figure, we read “by eye” that the points 8, 14, 15, 18 and 19 are on water, the others on land. Using a linear discriminant (“maximum likelihood”) classifier, we find model predictions as shown in figure 7.7.

Here, we used the MASS:: prefix to avoid loading MASS, as that would mask select from dplyr. Another way would be to load MASS and unload it later on with detach().

We also see that the layer plotted in figure 7.7 is a factor variable, with class labels.

### 7.3.7 Plotting raster data

We can use the base plot method for stars objects, where the plot created with plot(r) is shown in figure 7.8.

is shown in figure 7.8. The default color scale uses grey tones, and stretches these such that color breaks correspond to data quantiles over all bands (“histogram equalization”). A more familiar view is the RGB or false color composite shown in figure 7.9.

Further details and options are given in Chapter 9.

### 7.3.8 Analysing raster data

Element-wise mathematical functions (section 6.3.2) on stars objects are just passed on to the arrays. This means that we can call functions and create expressions:

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif     0    3.99   4.23 4.12    4.45 5.54
# dimension(s):
#      from  to  offset delta            refsys point values x/y
# x       1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y       1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# band    1   6      NA    NA                NA    NA   NULL
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif     1      62   77.5 77.1    94.9  266
# dimension(s):
#      from  to  offset delta            refsys point values x/y
# x       1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y       1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# band    1   6      NA    NA                NA    NA   NULL

or even mask out certain values:

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.   NA's
# L7_ETMs.tif    50      64     75   79      90  255 149170
# dimension(s):
#      from  to  offset delta            refsys point values x/y
# x       1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y       1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# band    1   6      NA    NA                NA    NA   NULL

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif     0      54     69   63      86  255
# dimension(s):
#      from  to  offset delta            refsys point values x/y
# x       1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y       1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]
# band    1   6      NA    NA                NA    NA   NULL

Dimension-wise, we can apply functions to selected array dimensions (section 6.3.3) of stars objects similar to how apply does this to arrays. For instance, we can compute for each pixel the mean of the 6 band values by

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#       Min. 1st Qu. Median Mean 3rd Qu. Max.
# mean  25.5    53.3   68.3 68.9      82  255
# dimension(s):
#   from  to  offset delta            refsys point values x/y
# x    1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y    1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]

A more meaningful function would e.g. compute the NDVI (normalized differenced vegetation index):

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#         Min. 1st Qu.  Median    Mean 3rd Qu.  Max.
# ndvi  -0.753  -0.203 -0.0687 -0.0643   0.187 0.587
# dimension(s):
#   from  to  offset delta            refsys point values x/y
# x    1 349  288776  28.5 SIRGAS 2000 / ... FALSE   NULL [x]
# y    1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE   NULL [y]

Alternatively, one could have defined

which is more convenient if the number of bands is large, but which is also much slower than ndvi as it needs to be called for every pixel whereas ndvi can be called once for all pixels, or for large chunks of pixels. The mean for each band over the whole image is computed by

#   band mean
# 1    1 79.1
# 2    2 67.6
# 3    3 64.4
# 4    4 59.2
# 5    5 83.2
# 6    6 60.0

the result of which is small enough to be printed here as a data.frame. In these two examples, entire dimensions disappear. Sometimes, this does not happen (section 6.3.2); we can for instance compute the three quartiles for each band

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif    32    60.8   66.5 69.8    78.8  112
# dimension(s):
#          from to offset delta refsys point        values
# quantile    1  3     NA    NA     NA    NA 25%, 50%, 75%
# band        1  6     NA    NA     NA    NA          NULL

and see that this creates a new dimension, quantile, with three values. Alternatively, the three quantiles over the 6 bands for each pixel are obtained by

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif     4      55   69.2 67.2    81.2  255
# dimension(s):
#          from  to  offset delta            refsys point        values x/y
# quantile    1   3      NA    NA                NA    NA 25%, 50%, 75%
# x           1 349  288776  28.5 SIRGAS 2000 / ... FALSE          NULL [x]
# y           1 352 9120761 -28.5 SIRGAS 2000 / ... FALSE          NULL [y]

### 7.3.9 Curvilinear rasters

There are several reasons why non-regular rasters occur (figure 1.5). For one, when the data is Earth-bound, a regular raster does not fit the Earth’s surface, which is curved. Other reasons are:

• when we convert or transform a regular raster data into another coordinate reference system, it will become curvilinear unless we resample (warp; section 7.7); warping always goes at the cost of some loss of data and is not reversible
• observation may lead to irregular rasters; e.g. for satellite swaths, we may have a regular raster in the direction of the satellite (not aligned with $$x$$ or $$y$$), and rectilinear in the direction perpendicular to that (e.g. if the sensor discretizes the viewing angle in equal sections)

### 7.3.10 GDAL utils

The GDAL library is typically shipped with a number of executable binaries, the GDAL command line utilities for data translation and processing. A package like gdalUtils (Greenberg and Mattiuzzi 2020) provides R functions that actually call these binaries, using R’s system() call mechanism. This requires that these binaries are properly installed, and findable by R, which is something that an R package by itself cannot guarantee.

Several of these utilities (all except for those written in Python) are also available as C functions in the GDAL library, through the “GDAL Algorithms C API”. If an R package like sf that links to the GDAL library uses these C API algorithms, it means that the user no longer needs to install any GDAL binary command line utilities in addition to the R package.

Package sf allows calling these C API algorithms through function gdal_utils(), where the first argument is the name of the utility (stripped from the gdal prefix):

• info prints information on GDAL (raster) datasets
• warp warps a raster to a new raster, possibly in another CRS
• rasterize rasterizes a vector dataset
• translate translates a raster file to another format
• vectortranslate (for ogr2ogr) translates a vector file to another format
• buildvrt creates a virtual raster tile (a raster created from several files)
• demprocessing does varios processing steps of digital elevation models (dems)
• nearblack converts nearly black/white borders to black
• grid creates a regular grid from scattered data
• mdiminfo prints information on a multidimensional array
• mdimtranslate translates a multidimensional array into another format

These utilities work on files, and not not directly on sf or stars objects. However, stars_proxy objects are essentially pointers to files, and other objects can be written to file. Several of these utilities are (always or optionally) used, e.g. by st_mosaic, st_warp or st_write.

## 7.4 Vector data cube examples

### 7.4.1 Example: aggregating air quality time series

Air quality data from package spacetime were obtained from the airBase European air quality data base. Daily average PM$$_{10}$$ values were downloaded for rural background stations in Germany, 1998-2009.

We can create a stars object from the air matrix, the dates Date vector and the stations SpatialPoints objects by

# space  time
#    70  4383
# stars object with 2 dimensions and 1 attribute
# attribute(s):
#       Min. 1st Qu. Median Mean 3rd Qu. Max.   NA's
# PM10     0    9.92   14.8 17.7      22  274 157659
# dimension(s):
#         from   to     offset  delta            refsys point
# station    1   70         NA     NA +proj=longlat ...  TRUE
# time       1 4383 1998-01-01 1 days              Date FALSE
#                                          values
# station POINT (9.59 53.7),...,POINT (9.45 49.2)
# time                                       NULL

We can see from figure 7.10 that the time series are quite long, but also have large missing value gaps. Figure 7.11 shows the spatial distribution measurement stations and mean PM$$_{10}$$ values.

We can aggregate these station time series to area means, mostly as a simple exercise. For this, we use the aggregate method for stars objects

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#       Min. 1st Qu. Median Mean 3rd Qu. Max.  NA's
# PM10  1.08    10.9   15.3 17.9    21.8  172 25679
# dimension(s):
#          from   to     offset  delta            refsys point
# geometry    1   16         NA     NA +proj=longlat ... FALSE
# time        1 4383 1998-01-01 1 days              Date FALSE
#                                                                     values
# geometry MULTIPOLYGON (((9.65 49.8, ...,...,MULTIPOLYGON (((10.8 51.6, ...
# time                                                                  NULL

and we can now for instance show the maps for six arbitrarily chosen days (figure 7.12), using

or create a time series plot of mean values for a single state (figure 7.13) by

### 7.4.2 Example: Bristol origin-destination datacube

The data used for this example come from Lovelace, Nowosad, and Muenchow (2019), and concern origin-destination (OD) counts: the number of persons going from region A to region B, by transportation mode. We have feature geometries for the 102 origin and destination regions, shown in figure 7.14.

and the OD counts come in a table with OD pairs as records, and transportation mode as variables:

# # A tibble: 6 × 7
#   o         d           all bicycle  foot car_driver train
#   <chr>     <chr>     <dbl>   <dbl> <dbl>      <dbl> <dbl>
# 1 E02002985 E02002985   209       5   127         59     0
# 2 E02002985 E02002987   121       7    35         62     0
# 3 E02002985 E02003036    32       2     1         10     1
# 4 E02002985 E02003043   141       1     2         56    17
# 5 E02002985 E02003049    56       2     4         36     0
# 6 E02002985 E02003054    42       4     0         21     0

We see that many combinations of origin and destination are implicit zeroes, otherwise these two numbers would have been similar:

# [1] 10404
# [1] 2910

We will form a three-dimensional vector datacube with origin, destination and transportation mode as dimensions. For this, we first “tidy” the bristol_od table to have origin (o), destination (d), transportation mode (mode), and count (n) as variables, using pivot_longer:

# # A tibble: 6 × 4
#   o         d         mode           n
#   <chr>     <chr>     <chr>      <dbl>
# 1 E02002985 E02002985 bicycle        5
# 2 E02002985 E02002985 foot         127
# 3 E02002985 E02002985 car_driver    59
# 4 E02002985 E02002985 train          0
# 5 E02002985 E02002987 bicycle        7
# 6 E02002985 E02002987 foot          35

Next, we form the three-dimensional array a, filled with zeroes:

# [1] 102 102   4

We see that the dimensions are named with the zone names (o, d) and the transportation mode name (mode). Every row of bristol_tidy denotes an array entry, and we can use this to to fill the non-zero entries of the bristol_tidy table with their appropriate value (n):

To be sure that there is not an order mismatch between the zones in bristol_zones and the zone names in bristol_tidy, we can get the right set of zones by:

(It happens that the order is already correct, but it is good practice to not assume this).

Next, with zones and modes we can create a stars dimensions object:

#      from  to offset delta refsys point
# o       1 102     NA    NA WGS 84 FALSE
# d       1 102     NA    NA WGS 84 FALSE
# mode    1   4     NA    NA     NA FALSE
#                                                                 values
# o    MULTIPOLYGON (((-2.51 51.4,...,...,MULTIPOLYGON (((-2.55 51.5,...
# d    MULTIPOLYGON (((-2.51 51.4,...,...,MULTIPOLYGON (((-2.55 51.5,...
# mode                                                 bicycle,...,train

and finally build or stars object from a and d:

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#    Min. 1st Qu. Median Mean 3rd Qu. Max.
# N     0       0      0  4.8       0 1296
# dimension(s):
#      from  to offset delta refsys point
# o       1 102     NA    NA WGS 84 FALSE
# d       1 102     NA    NA WGS 84 FALSE
# mode    1   4     NA    NA     NA FALSE
#                                                                 values
# o    MULTIPOLYGON (((-2.51 51.4,...,...,MULTIPOLYGON (((-2.55 51.5,...
# d    MULTIPOLYGON (((-2.51 51.4,...,...,MULTIPOLYGON (((-2.55 51.5,...
# mode                                                 bicycle,...,train

We can take a single slice through from this three-dimensional array, e.g. for zone 33 (figure 7.14), by odm[,,33], and plot it with

the result of which is shown in figure 7.15. Subsetting this way, we take all attributes (there is only one: N) since the first argument is empty, we take all origin regions (second argument empty), we take destination zone 33 (third argument), and all transportation modes (fourth argument empty, or missing).

We plotted this particular zone because it has the largest number of travelers as its destination. We can find this out by summing all origins and travel modes by destination:

# [1] 33

Other aggregations we can carry out include: total transportation by OD (102 x 102):

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#      Min. 1st Qu. Median Mean 3rd Qu. Max.
# sum     0       0      0 19.2      19 1434
# dimension(s):
#   from  to offset delta refsys point
# o    1 102     NA    NA WGS 84 FALSE
# d    1 102     NA    NA WGS 84 FALSE
#                                                              values
# o MULTIPOLYGON (((-2.51 51.4,...,...,MULTIPOLYGON (((-2.55 51.5,...
# d MULTIPOLYGON (((-2.51 51.4,...,...,MULTIPOLYGON (((-2.55 51.5,...

Origin totals, by mode:

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#      Min. 1st Qu. Median Mean 3rd Qu. Max.
# sum     1    57.5    214  490     771 2903
# dimension(s):
#      from  to offset delta refsys point
# o       1 102     NA    NA WGS 84 FALSE
# mode    1   4     NA    NA     NA FALSE
#                                                                 values
# o    MULTIPOLYGON (((-2.51 51.4,...,...,MULTIPOLYGON (((-2.55 51.5,...
# mode                                                 bicycle,...,train

Destination totals, by mode:

# stars object with 2 dimensions and 1 attribute
# attribute(s):
#      Min. 1st Qu. Median Mean 3rd Qu.  Max.
# sum     0      13    104  490     408 12948
# dimension(s):
#      from  to offset delta refsys point
# d       1 102     NA    NA WGS 84 FALSE
# mode    1   4     NA    NA     NA FALSE
#                                                                 values
# d    MULTIPOLYGON (((-2.51 51.4,...,...,MULTIPOLYGON (((-2.55 51.5,...
# mode                                                 bicycle,...,train

Origin totals, summed over modes:

Destination totals, summed over modes (we had this):

We plot o and d together after joining them by

the result of which is shown in figure 7.16.

There is something to say for the argument that such maps give the wrong message, as both amount (color) and polygon size give an impression of amount. To take out the amount in the count, we can compute densities (count / km$$^2$$), by

shown in figure 7.17. Another way to normalize these totals would be to divide them by population size.

### 7.4.3 Tidy array data

The tidy data paper (Wickham 2014b) may suggest that such array data should be processed not as an array, but in a long table where each row holds (region, class, year, value), and it is always good to be able to do this. For primary handling and storage however, this is often not an option, because:

• a lot of array data are collected or generated as array data, e.g. by imagery or other sensory devices, or e.g. by climate models
• it is easier to derive the long table form from the array than vice versa
• the long table form requires much more memory, since the space occupied by dimension values is $$O(nmp)$$, rather than $$O(n+m+p)$$
• when missing-valued cells are dropped, the long table form loses the implicit indexing of the array form

To put this argument to the extreme, consider for instance that all image, video and sound data are stored in array form; few people would make a real case for storing them in a long table form instead. Nevertheless, R packages like tsibble take this approach, and have to deal with ambiguous ordering of multiple records with identical time steps for different spatial features and index them, which is solved for both automatically by using the array form – at the cost of using dense arrays, in package stars.

Package stars tries to follow the tidy manifesto to handle array sets, and has particularly developed support for the case where one or more of the dimensions refer to space, and/or time.

## 7.5 raster-to-vector, vector-to-raster

Section 1.3 already showed some examples of raster-to-vector and vector-to-raster conversions, this section will add some code details and examples.

### 7.5.1 vector-to-raster

st_as_stars is meant as a method to transform objects into stars objects. However, not all stars objects are raster objects, and the method for sf objects creates a vector data cube with the geometry as its spatial (vector) dimension, and attributes as attributes. When given a feature geometry (sfc) object, st_as_stars will rasterize it, as shown in section 7.7, and in figure 7.18.

# [1] "/home/edzer/R/x86_64-pc-linux-gnu-library/4.0/sf/gpkg/nc.gpkg"

Here, st_as_stars can be parameterized to control cell size, number of cells, and/or extent. The cell values returned are 0 for cells with center point outside the geometry and 1 for cell with center point inside or on the border of the geometry. Rasterizing existing features is done using st_rasterize, as also shown in figure 1.4:

# stars object with 2 dimensions and 3 attributes
# attribute(s):
#      SID74           SID79            name
#  Min.   : 0      Min.   : 0      Sampson :  655
#  1st Qu.: 3      1st Qu.: 3      Columbus:  648
#  Median : 5      Median : 6      Robeson :  648
#  Mean   : 8      Mean   :10      Bladen  :  604
#  3rd Qu.:10      3rd Qu.:13      Wake    :  590
#  Max.   :44      Max.   :57      (Other) :30952
#  NA's   :30904   NA's   :30904   NA's    :30904
# dimension(s):
#   from  to   offset      delta refsys point values x/y
# x    1 461 -84.3239  0.0192484  NAD27 FALSE   NULL [x]
# y    1 141  36.5896 -0.0192484  NAD27 FALSE   NULL [y]

Similarly, line and point geometries can be rasterized, as shown in figure 7.19.

## 7.6 Coordinate transformations and conversions

### 7.6.1st_crs

Spatial objects of class sf or stars contain a coordinate reference system that can be get or replaced with st_crs, or be set or replaced in a pipe with st_set_crs. Coordinate reference systems can be set with an EPSG code, like st_crs(4326) which will be converted to st_crs('EPSG:4326'), or with a PROJ.4 string like "+proj=utm +zone=25 +south", a name like “WGS84”, or a name preceded by an authority like “OGC:CRS84”; alternatives include a coordinate reference system definition in WKT, WKT-2 (section 2.5) or PROJJSON.

The object returned contains two fields:

• wkt with the WKT-2 representation
• input with the user input, if any, or a human readable description of the coordinate reference system, if available

### 7.6.6 Axis order

As mentioned in section 2.5, the definition of EPSG:4326,

# GEOGCRS["WGS 84",
#     ENSEMBLE["World Geodetic System 1984 ensemble",
#         MEMBER["World Geodetic System 1984 (Transit)"],
#         MEMBER["World Geodetic System 1984 (G730)"],
#         MEMBER["World Geodetic System 1984 (G873)"],
#         MEMBER["World Geodetic System 1984 (G1150)"],
#         MEMBER["World Geodetic System 1984 (G1674)"],
#         MEMBER["World Geodetic System 1984 (G1762)"],
#         MEMBER["World Geodetic System 1984 (G2139)"],
#         ELLIPSOID["WGS 84",6378137,298.257223563,
#             LENGTHUNIT["metre",1]],
#         ENSEMBLEACCURACY[2.0]],
#     PRIMEM["Greenwich",0,
#         ANGLEUNIT["degree",0.0174532925199433]],
#     CS[ellipsoidal,2],
#         AXIS["geodetic latitude (Lat)",north,
#             ORDER[1],
#             ANGLEUNIT["degree",0.0174532925199433]],
#         AXIS["geodetic longitude (Lon)",east,
#             ORDER[2],
#             ANGLEUNIT["degree",0.0174532925199433]],
#     USAGE[
#         SCOPE["Horizontal component of 3D system."],
#         AREA["World."],
#         BBOX[-90,-180,90,180]],
#     ID["EPSG",4326]]

indicates that the first axis is associated with latitude and the second with longitude; this is also the case for a number of other ellipsoidal coordinate reference systems. Although this is how the authority (EPSG) prescribes this, it is not how most datasets are currently stored! As most other software, package sf by default ignores this, and interprets ellipsoidal coordinates as (longitude, latitude) by default. If however data needs to be read e.g. from a WFS service that wants to be compliant to the authority, one can set

to globally instruct sf, when calling GDAL and PROJ routines, that authority compliance (latitude, longitude order) is assumed. It is anticipated that problems may happen in case of authority compliance, e.g. with plotting data. At the time of writing this, the plot method for sf objects respects the axis order flag and will swap coordinates using the transformation pipeline "+proj=pipeline +step +proj=axisswap +order=2,1" before plotting them, but e.g. geom_sf() in ggplot2 has not been modified to do this. As mentioned earlier, the ambiguity of EPSG:4326 is resolved by replacing it with OGC:CRS84.

## 7.7 Transforming and warping rasters

When using st_transform on a raster data set, as e.g. in

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max.
# L7_ETMs.tif     1      54     69 68.9      86  255
# dimension(s):
#      from  to offset delta refsys point                          values x/y
# x       1 349     NA    NA WGS 84 FALSE [349x352] -34.9165,...,-34.8261 [x]
# y       1 352     NA    NA WGS 84 FALSE  [349x352] -8.0408,...,-7.94995 [y]
# band    1   6     NA    NA     NA    NA                            NULL
# curvilinear grid

we see that a curvilinear is created, which means that for every grid cell the coordinates are computed in the new CRS, which no longer form a regular grid. Plotting such data is extremely slow, as small polygons are computed for every grid cell and then plotted. The advantage of this is that no information is lost: grid cell values remain identical after the projection.

When we start with a raster on a regular grid and want to obtain a regular grid in a new coordinate reference system, we need to warp the grid: we need to recreate a grid at new locations, and use some rule to assign values to new grid cells. Rules can involve using the nearest value, or using some form of interpolation. This operation is not lossless and not invertible.

The best options for warping is to specify the target grid as a stars object. When only a target CRS is specified, default options for the target grid are picked that may be completely inappropriate for the problem at hand. An example workflow that uses only a target CRS is

#      from  to   offset        delta refsys point values x/y
# x       1 350 -34.9166  0.000259243 WGS 84    NA   NULL [x]
# y       1 352 -7.94982 -0.000259243 WGS 84    NA   NULL [y]
# band    1   6       NA           NA     NA    NA   NULL

which creates a pretty close raster, but then the transformation is also relatively modest. For a workflow that creates a target raster first, here with exactly the same number of rows and columns as the original raster one could use:

# stars object with 3 dimensions and 1 attribute
# attribute(s):
#              Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
# L7_ETMs.tif     1      54     69 68.9      86  255 6180
# dimension(s):
#      from  to   offset        delta refsys point values x/y
# x       1 349 -34.9166  0.000259666 WGS 84    NA   NULL [x]
# y       1 352 -7.94982 -0.000258821 WGS 84    NA   NULL [y]
# band    1   6       NA           NA     NA    NA   NULL

## 7.8 Exercises

Use R to solve the following exercises.

1. Find the names of the nc counties that intersect LINESTRING(-84 35,-78 35); use [ for this, and as an alternative use st_join() for this.
2. Repeat this after setting sf_use_s2(FALSE), and compute the difference (hint: use setdiff()), and color the counties of the difference using color ‘#88000088’.
3. Plot the two different lines in a single plot; note that R will plot a straight line always straight in the projection currently used; st_segmentize can be used to add points on straight line, or on a great circle for ellipsoidal coordinates.
4. NDVI, normalized differenced vegetation index, is computed as (NIR-R)/(NIR+R), with NIR the near infrared and R the red band. Read the L7_ETMs.tif file into object x, and distribute the band dimensions over attributes by split(x, "band"). Then, add attribute NDVI to this object by using an expression that uses the NIR (band 4) and R (band 3) attributes directly.
5. Compute NDVI for the L7_ETMs.tif image by reducing the band dimension, using st_apply and an a function ndvi = function(x) { (x[4]-x[3])/(x[4]+x[3]) }. Plot the result, and write the result to a GeoTIFF.
6. Use st_transform to transform the stars object read from L7_ETMs.tif to EPSG:4326. Print the object. Is this a regular grid? Plot the first band using arguments axes=TRUE and border=NA, and explain why this takes such a long time.
7. Use st_warp to warp the L7_ETMs.tif object to EPSG:4326, and plot the resulting object with axes=TRUE. Why is the plot created much faster than after st_transform?
8. Using a vector representation of the raster L7_ETMs, plot the intersection with a circular area around POINT(293716 9113692) with radius 75 m, and compute the area-weighted mean pixel values for this circle. Compare the area-weighted values with those obtained by aggregate using the vector data, and by aggregate using the raster data, using exact=FALSE (default) and exact=FALSE. Explain the differences.

### References

Eddelbuettel, Dirk. 2013. Seamless R and C++ Integration with Rcpp. Springer.

Greenberg, Jonathan Asher, and Matteo Mattiuzzi. 2020. GdalUtils: Wrappers for the Geospatial Data Abstraction Library (Gdal) Utilities. https://CRAN.R-project.org/package=gdalUtils.

Herring, John, and others. 2011. “Opengis Implementation Standard for Geographic Information-Simple Feature Access-Part 1: Common Architecture [Corrigendum].”

Hijmans, Robert J. 2022a. Raster: Geographic Data Analysis and Modeling. https://rspatial.org/raster.

Hijmans, Robert J. 2022b. Terra: Spatial Data Analysis. https://rspatial.org/terra/.

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. Chapman; Hall/CRC. https://geocompr.robinlovelace.net/.

Pebesma, Edzer. 2012. “spacetime: Spatio-Temporal Data in R.” Journal of Statistical Software 51 (7): 1–30. https://www.jstatsoft.org/v51/i07/.

Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.

Plate, Tony, and Richard Heiberger. 2016. Abind: Combine Multidimensional Arrays. https://CRAN.R-project.org/package=abind.

Wickham, Hadley. 2014b. “Tidy Data.” Journal of Statistical Software 59 (1). https://www.jstatsoft.org/article/view/v059i10.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://joss.theoj.org/papers/10.21105/joss.01686.

Wickham, Hadley, and Garret Grolemund. 2017. R for Data Science. O’Reilly. http://r4ds.had.co.nz/.