Interpolation - R Spatial
Interpolation
Introduction
Almost any variable of interest has spatial autocorrelation. That can be a problem in
statistical tests, but it is a very useful feature when we want to predict values at locations
where no measurements have been made, as we can generally safely assume that values
at nearby locations will be similar. There are several spatial interpolation techniques. We
show some of them in this chapter.
Precipitation in California
We will be working with precipitation data for California. If you have not yet done so, first
install the rspatial package to get the data. You may need to install the devtools package first.
if (!require("rspatial")) devtools::install_github('rspatial/rspatial')
## Loading required package: rspatial
library(rspatial)
d <- sp_data('precipitation')
head(d)
## ID NAME LAT LONG ALT JAN FEB MAR APR MAY JUN
## 1 ID741 DEATH VALLEY 36.47 -116.87 -59 7.4 9.5 7.5 3.4 1.7 1.0
## 2 ID743 THERMAL/FAA AIRPORT 33.63 -116.17 -34 9.2 6.9 7.9 1.8 1.6 0.4
## 3 ID744 BRAWLEY 2SW 32.96 -115.55 -31 11.3 8.3 7.6 2.0 0.8 0.1
## 4 ID753 IMPERIAL/FAA AIRPORT 32.83 -115.57 -18 10.6 7.0 6.1 2.5 0.2 0.0
## 5 ID754 NILAND 33.28 -115.51 -18 9.0 8.0 9.0 3.0 0.0 1.0
## 6 ID758 EL CENTRO/NAF 32.82 -115.67 -13 9.8 1.6 3.7 3.0 0.4 0.0
## JUL AUG SEP OCT NOV DEC
## 1 3.7 2.8 4.3 2.2 4.7 3.9
## 2 1.9 3.4 5.3 2.0 6.3 5.5
## 3 1.9 9.2 6.5 5.0 4.8 9.7
## 4 2.4 2.6 8.3 5.4 7.7 7.3
## 5 8.0 9.0 7.0 8.0 7.0 9.0
## 6 3.0 10.8 0.2 0.0 3.3 1.4
https://rspatial.org/analysis/4-interpolation.html 1/16
02/09/2019 Interpolation — R Spatial
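# annual precipitation: sum of the twelve monthly columns (JAN to DEC);
# this definition is an assumption, needed because 'prec' is mapped below
d$prec <- rowSums(d[, 6:17])
library(sp)
dsp <- SpatialPoints(d[,4:3], proj4string=CRS("+proj=longlat +datum=NAD83"))
dsp <- SpatialPointsDataFrame(dsp, d)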
CA <- sp_data("counties")
# define groups for mapping
cuts <- c(0,200,300,500,1000,3000)
# set up a palette of interpolated colors
blues <- colorRampPalette(c('yellow', 'orange', 'blue', 'dark blue'))
pols <- list("sp.polygons", CA, fill = "lightgray")
spplot(dsp, 'prec', cuts=cuts, col.regions=blues(5), sp.layout=pols, pch=20, cex=2)
Transform longitude/latitude to planar coordinates, using the commonly used coordinate
reference system for California (“Teale Albers”) to assure that our interpolation results will
align with other data sets we have. In the code that follows, dta refers to the transformed
stations and cata to the transformed counties.
We are going to interpolate (estimate for unsampled locations) the precipitation values. The
simplest way would be to take the mean of all observations. We can consider that a
“Null-model” that we can compare other approaches to. We’ll use the Root Mean Square Error
(RMSE) as the evaluation statistic.
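As a minimal sketch of this evaluation step, the snippet below defines an RMSE helper and computes the error of a mean-only Null-model. The station values are made up for illustration. The names RMSE and null are chosen to match how they are used later in the chapter, but the definitions themselves are assumptions and not necessarily the original code.

```r
# Root Mean Square Error between observed and predicted values
RMSE <- function(observed, predicted) {
  sqrt(mean((predicted - observed)^2, na.rm = TRUE))
}

# Hypothetical annual precipitation totals (mm) at five stations
obs <- c(120, 340, 560, 210, 980)

# Null-model: predict the overall mean at every location
null <- RMSE(obs, rep(mean(obs), length(obs)))
null

# Relative improvement of some other model over the Null-model
# (model_rmse is a made-up number for illustration)
model_rmse <- 200
improvement <- 1 - (model_rmse / null)
improvement
```

A value of improvement close to 1 means the model removes most of the Null-model error; a value near 0 means it does no better than predicting the mean everywhere.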
Proximity polygons
Proximity polygons can be used to interpolate categorical variables. Another term for this is
“nearest neighbour” interpolation.
library(dismo)
v <- voronoi(dta)
plot(v)
ca <- aggregate(cata)
vca <- intersect(v, ca)
spplot(vca, 'prec', col.regions=rev(get_col_regions()))
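Much better. These are polygons. We can ‘rasterize’ the results; the rasterized result appears
as vr in later code. The chunk below evaluates the approach with 5-fold cross-validation.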
set.seed(5132015)
kf <- kfold(nrow(dta))
rmse <- rep(NA, 5)
for (k in 1:5) {
test <- dta[kf == k, ]
train <- dta[kf != k, ]
v <- voronoi(train)
p <- extract(v, test)
rmse[k] <- RMSE(test$prec, p$prec)
}
rmse
## [1] 199.0686 187.8069 166.9153 191.0938 238.9696
mean(rmse)
## [1] 196.7708
1 - (mean(rmse) / null)
## [1] 0.5479875
Question 1: Describe what each step in the code chunk above does.
Question 2: How does the proximity-polygon approach compare to the NULL model?
Question 3: You would not typically use proximity polygons for rainfall data. For what kind of
data would you use them?
We can use the gstat package for nearest neighbour interpolation that considers multiple
(here five) neighbours. First we fit a model. ~1 means “intercept only”. In the case of spatial
data, that means that only the ‘x’ and ‘y’ coordinates are used. We set the maximum number of
points to 5, and the “inverse distance power” idp to zero, such that all five neighbours are
equally weighted.
library(gstat)
gs <- gstat(formula=prec~1, locations=dta, nmax=5, set=list(idp = 0))
nn <- interpolate(r, gs)
## [inverse distance weighted interpolation]
nnmsk <- mask(nn, vr)
plot(nnmsk)
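To make the effect of nmax=5 with idp=0 concrete, here is a minimal base R sketch that averages the values of the k nearest stations with equal weights. It is an illustration only, not how gstat implements it; the function name knn_mean and the station data are hypothetical.

```r
# Hypothetical stations on a line, with one observed value each
xy <- cbind(x = c(0, 1, 2, 3, 4, 5), y = c(0, 0, 0, 0, 0, 0))
z  <- c(10, 20, 30, 40, 50, 60)

# Equal-weight mean of the k nearest stations to a target location
knn_mean <- function(target, xy, z, k = 5) {
  d <- sqrt((xy[, 1] - target[1])^2 + (xy[, 2] - target[2])^2)
  nearest <- order(d)[seq_len(min(k, length(z)))]
  mean(z[nearest])
}

p5 <- knn_mean(c(0.4, 0), xy, z, k = 5)  # mean of the five closest stations
p1 <- knn_mean(c(0.4, 0), xy, z, k = 1)  # value of the single closest station
p5
p1
```

With k = 1 the prediction is simply the value of the closest station, which is what proximity polygons give.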
Cross validate the result. Note that we can use the predict method to get predictions for
the locations of the test points.
A more commonly used method is “inverse distance weighted” interpolation. The only
difference with the nearest neighbour approach is that points that are further away get less
weight in predicting a value at a location.
library(gstat)
gs <- gstat(formula=prec~1, locations=dta)
idw <- interpolate(r, gs)
## [inverse distance weighted interpolation]
idwr <- mask(idw, vr)
plot(idwr)
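The weighting idea can be sketched in a few lines of base R. The function idw_predict and the three stations below are hypothetical, and power = 2 is used only as a common choice; this is not the gstat implementation.

```r
# Inverse distance weighted prediction at one target location
idw_predict <- function(target, xy, z, power = 2) {
  d <- sqrt((xy[, 1] - target[1])^2 + (xy[, 2] - target[2])^2)
  # a target that coincides with a station takes that station's value
  if (any(d == 0)) return(mean(z[d == 0]))
  w <- 1 / d^power           # weight shrinks as distance grows
  sum(w * z) / sum(w)        # weighted mean of the observations
}

# Hypothetical stations at distances 1, 2 and 4 from the target
xy <- cbind(x = c(1, 2, 4), y = c(0, 0, 0))
z  <- c(10, 20, 40)
target <- c(0, 0)

p0 <- idw_predict(target, xy, z, power = 0)  # all stations weighted equally
p2 <- idw_predict(target, xy, z, power = 2)  # nearby stations dominate
p0
p2
```

Raising the power pulls the prediction toward the value of the nearest station; a power of zero reduces the method to a plain average.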
Question 4: IDW generated rasters tend to have a noticeable artefact. What is that?
Question 5: Inspect the arguments used for, and make a map of, the IDW model below.
What other name could you give to this method (IDW with these parameters)? Why?
Data preparation
We use the airqual dataset to interpolate ozone levels for California (averages for
1980-2009). Use the variable OZDLYAV (unit is parts per billion). Original data source.
Read the data file. To get numbers that are easier to read, I multiply OZDLYAV by 1000.
library(rspatial)
x <- sp_data("airqual")
x$OZDLYAV <- x$OZDLYAV * 1000
Create a SpatialPointsDataFrame and transform to Teale Albers. Note the units=km, which
was needed to fit the variogram.
library(sp)
coordinates(x) <- ~LONGITUDE + LATITUDE
proj4string(x) <- CRS('+proj=longlat +datum=NAD83')
TA <- CRS("+proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +datum=NAD83 +units=km +ellps=GRS80")
library(rgdal)
aq <- spTransform(x, TA)
Fit a variogram
library(gstat)
gs <- gstat(formula=OZDLYAV~1, locations=aq)
v <- variogram(gs, width=20)
head(v)
## np dist gamma dir.hor dir.ver id
## 1 1010 11.35040 34.80579 0 0 var1
## 2 1806 30.63737 47.52591 0 0 var1
## 3 2355 50.58656 67.26548 0 0 var1
## 4 2619 70.10411 80.92707 0 0 var1
## 5 2967 90.13917 88.93653 0 0 var1
## 6 3437 110.42302 84.13589 0 0 var1
plot(v)
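To clarify what the np, dist and gamma columns in the output above represent, here is a rough base R sketch of an empirical semivariogram: for each distance bin, gamma is the sum of squared differences between paired values divided by twice the number of pairs. The four points and the bin width are hypothetical, and gstat's own binning details may differ.

```r
# Hypothetical points on a line and their observed values
xy <- cbind(x = c(0, 1, 2, 3), y = c(0, 0, 0, 0))
z  <- c(1, 3, 2, 5)

# All unique point pairs, their separation distance and squared difference
D      <- as.matrix(dist(xy))
pairs  <- which(upper.tri(D), arr.ind = TRUE)
h      <- D[pairs]
sqdiff <- (z[pairs[, 1]] - z[pairs[, 2]])^2

# Group pairs into distance bins of a fixed width
width <- 1
bin <- droplevels(cut(h, breaks = seq(0, max(h) + width, by = width)))

# Per bin: number of pairs, mean distance, semivariance
ev <- data.frame(
  np    = as.vector(tapply(sqdiff, bin, length)),
  dist  = as.vector(tapply(h, bin, mean)),
  gamma = as.vector(tapply(sqdiff, bin, function(s) sum(s) / (2 * length(s))))
)
ev
```

Bins at short distances usually have smaller gamma when the variable is spatially autocorrelated; with only four toy points the pattern here is not meaningful.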
plot(v, fve)
Ordinary kriging
# variance
ok <- brick(kp)
ok <- mask(ok, ca)
names(ok) <- c('prediction', 'variance')
plot(ok)
Let’s use gstat again to do IDW interpolation. The basic approach first.
library(gstat)
idm <- gstat(formula=OZDLYAV~1, locations=aq)
idp <- interpolate(r, idm)
## [inverse distance weighted interpolation]
idp <- mask(idp, ca)
plot(idp)
We can find good values for the IDW parameters (distance decay and number of neighbours)
through optimization. For simplicity’s sake I do not do that k times here. The optim function
may be a bit hard to grasp at first. But the essence is simple. You provide a function that
returns a value that you want to minimize (or maximize) given a number of unknown
parameters. You provide initial values for these parameters, and optim then searches for
the optimal values (for which the function returns the lowest number).
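A small base R sketch of this pattern follows. It tunes only the distance-decay power (not the number of neighbours) by minimizing a leave-one-out RMSE on made-up station data. The function loo_idw_rmse and the data are hypothetical, and a one-dimensional search with method = "Brent" is used here as a simplification.

```r
# Hypothetical station coordinates (km) and observed values
xy <- cbind(x = c(0, 10, 20, 30, 5, 15, 25, 35),
            y = c(0, 5, 0, 5, 20, 25, 20, 25))
z  <- c(10, 12, 15, 19, 11, 14, 18, 21)

rmse <- function(observed, predicted) sqrt(mean((predicted - observed)^2))

# The function to minimize: leave-one-out IDW error for a given power.
# Each station is predicted from all the other stations.
loo_idw_rmse <- function(power, xy, z) {
  D <- as.matrix(dist(xy))
  pred <- sapply(seq_along(z), function(i) {
    w <- 1 / D[i, -i]^power
    sum(w * z[-i]) / sum(w)
  })
  rmse(z, pred)
}

# optim needs an initial value and, for method "Brent", finite bounds
opt <- optim(par = 2, fn = loo_idw_rmse, xy = xy, z = z,
             method = "Brent", lower = 0, upper = 10)
opt$par    # power with the lowest leave-one-out RMSE found
opt$value  # the RMSE at that power
```

The extra arguments xy and z are passed through optim to the function being minimized; only the first argument (power) is varied.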
library(fields)
m <- Tps(coordinates(aq), aq$OZDLYAV)
tps <- interpolate(r, m)
tps <- mask(tps, idw)
plot(tps)
Cross-validate
Cross-validate the three methods (IDW, Ordinary kriging, TPS) and add an RMSE-weighted
ensemble model.
library(dismo)
nfolds <- 5
k <- kfold(aq, nfolds)
ensrmse <- tpsrmse <- krigrmse <- idwrmse <- rep(NA, nfolds)
for (i in 1:nfolds) {
# hold out fold i for testing and train on the remaining folds
test <- aq[k==i,]
train <- aq[k!=i,]
m <- gstat(formula=OZDLYAV~1, locations=train, nmax=opt$par[1], set=list(idp=opt$par[2]))
p1 <- predict(m, newdata=test, debug.level=0)$var1.pred
idwrmse[i] <- RMSE(test$OZDLYAV, p1)
m <- gstat(formula=OZDLYAV~1, locations=train, model=fve)
p2 <- predict(m, newdata=test, debug.level=0)$var1.pred
krigrmse[i] <- RMSE(test$OZDLYAV, p2)
m <- Tps(coordinates(train), train$OZDLYAV)
p3 <- predict(m, coordinates(test))
tpsrmse[i] <- RMSE(test$OZDLYAV, p3)
w <- c(idwrmse[i], krigrmse[i], tpsrmse[i])
weights <- w / sum(w)
ensemble <- p1 * weights[1] + p2 * weights[2] + p3 * weights[3]
ensrmse[i] <- RMSE(test$OZDLYAV, ensemble)
}
rmi <- mean(idwrmse)
rmk <- mean(krigrmse)
rmt <- mean(tpsrmse)
rms <- c(rmi, rmt, rmk)
rms
## [1] 8.041305 8.307235 7.930799
rme <- mean(ensrmse)
rme
## [1] 7.858051
We can use the RMSE scores to make a weighted ensemble. Let’s look at the maps.
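The weighting arithmetic can be sketched with the three mean RMSE values printed above (in the order IDW, TPS, kriging). The loop above uses weights proportional to RMSE, which gives the largest weight to the model with the largest error. An alternative, shown here purely as an assumption and not as the chapter's method, is to weight by the inverse of RMSE so that the better model counts more. The prediction values are made up.

```r
# Mean RMSE per method, taken from the cross-validation output above
rms <- c(idw = 8.041305, tps = 8.307235, krig = 7.930799)

# Weights proportional to RMSE (as in the loop above)
w_prop <- rms / sum(rms)

# Alternative (assumption): weights proportional to 1 / RMSE
w_inv <- (1 / rms) / sum(1 / rms)

# Hypothetical predictions of the three methods at one location
pred <- c(idw = 50, tps = 54, krig = 49)

ens_prop <- sum(pred * w_prop)
ens_inv  <- sum(pred * w_inv)
round(rbind(w_prop, w_inv), 4)
c(ens_prop = ens_prop, ens_inv = ens_inv)
```

Because the three RMSE values are close to each other, both sets of weights are near one third and the two ensembles differ only slightly.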
Question 7: Show where the largest differences exist between IDW and OK.
Question 8: Show where the difference between IDW and OK is within the 95% confidence
limit of the OK prediction.
Question 9: Can you describe the pattern we are seeing, and speculate about what is
causing it?