# Introduction

mgwrsar package estimates linear and local linear models with spatial autocorrelation of the following forms:

$y=\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (GWR)$

$y=\beta_c X_c+\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR)$

$y=\lambda Wy+\beta_c X_c+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(0,k,0))$

$y=\lambda Wy+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(0,0,k))$

$y=\lambda Wy+\beta_c X_c+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(0,k_c,k_v))$

$y=\lambda(u_i,v_i) Wy+\beta_c X_c+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(1,k,0))$

$y=\lambda(u_i,v_i)Wy+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(1,0,k))$

$y=\lambda(u_i,v_i)Wy+\beta_cX_c+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(1,k_c,k_v))$

For more details on the estimation method, see Geniaux, G. and Martinetti, D. (2017). A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Regional Science and Urban Economics. (https://doi.org/10.1016/j.regsciurbeco.2017.04.001)

The estimation of the previous model can be done using the MGWRSAR wrapper function which requires a dataframe and a matrix of coordinates. Note that:

• When the model implies spatial autocorrelation, a row normalized spatial weight matrix must be provided.
• 2SLS and Best 2SLS method can be used. When model imply local regressions, a bandwidth and a kernel type must be provided.
• When the model implies mixed local regression, the name of stationary covariates must be provided.
• Optimal bandwidth can be estimated using bandwidths_mgwrsar function.

In addition to the ability of considering spatial autocorrelation in GWR/MGWR like models, MGWRSAR function introduces several useful technics for estimating local regression with space coordinates:

• it uses RCCP and RCCPeigen code that speed up computations and allows parallel computing (doParallel package),
• it allows to drop out variables with not enough local variance in local regression, which allows to consider dummies in GWR/MGWR framework without trouble.
• it allows to drop out local outliers in local regression.
• it allows to consider additional variable for kernel, including time (asymmetric kernel) and categorical variables (see Li and Racine 2010). Experimental version.

# Using mgwrsar package

The MGWRSAR function requires a dataframe and a matrix of coordinates, and eventually a spatial weight matrix if model include spatiale dependence. The package includes a simulated data example which is used for this vignette.

library(mgwrsar)
data(data_mgwrsar)
coord=as.matrix(mydata[,c("x_lat","y_lon")])
## Creating a spatial weigth matrix (sparce dgCMatrix) of 8 nearest neighbors
W=KNN(coord,8)

## Estimation

Estimation of a GWR with a gauss kernel with and without parallel computation:

### with parallel computing
#ptm1<-proc.time()
#model_GWR<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, #fixed_vars=NULL,kernels=c('gauss'),H=0.13, Model = 'GWR',control=list(SE=TRUE,doMC=TRUE,ncore=4))
#(proc.time()-ptm1)[3]

### without parallel computing
ptm1<-proc.time()
model_GWR<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.13, Model = 'GWR',control=list(SE=TRUE,doMC=FALSE))
(proc.time()-ptm1)[3]
#> elapsed
#>   1.004

Summary and plot of mgwrsar object using leaflet:

summary_mgwrsar(model_GWR)
#> Call:
#> MGWRSAR(formula = "Y_gwr~X1+X2+X3", data = mydata, coord = coord,
#>     fixed_vars = NULL, kernels = c("gauss"), H = 0.13, Model = "GWR",
#>     control = list(SE = TRUE, doMC = FALSE))
#> Model: GWR
#> Kernels function: gauss
#> Kernels type: GD
#> bandwidth: 0.13
#> Number of data points: 1000
#> Varying parameters: Intercept X1 X2 X3
#>         Intercept       X1       X2      X3
#> Min.     -2.63740  0.16880 -0.37306 -1.7588
#> 1st Qu.  -1.46100  0.38590  1.52729 -1.3070
#> Median   -1.08812  0.51409  1.94806 -1.0194
#> Mean     -0.97597  0.48879  1.94778 -1.0289
#> 3rd Qu.  -0.62307  0.59454  2.58871 -0.7439
#> Max.      1.06100  0.79570  3.37755 -0.3083
#> Effective degrees of freedom: 955.0674
#> AIC 135.0056
#> Residual sum of squares: 64.0684
plot_mgwrsar(model_GWR,type='B_coef',var='X2')
plot_mgwrsar(model_GWR,type='t_coef',var='X2')

Estimation of a GWR with spgwr package:

library(spgwr)
#> To access larger datasets in this package, install the spDataLarge
#> package with: install.packages('spDataLarge')
#> NOTE: This package does not constitute approval of GWR
#> as a method of spatial analysis; see example(gwr)
mydataSP=mydata
coordinates(mydataSP)=c('x_lat','y_lon')
ptm1<-proc.time()
model_spgwr <- gwr(Y_gwr~X1+X2+X3, data=mydataSP, bandwidth=0.13,hatmatrix=TRUE)
(proc.time()-ptm1)[3]
#> elapsed
#>  15.977
head(model_spgwr$SDF@data$gwr.e)
#> [1] -0.1585824 -0.1861144 -0.4450174 -0.1767796 -0.2014330  0.2670349
model_spgwr
#> Call:
#> gwr(formula = Y_gwr ~ X1 + X2 + X3, data = mydataSP, bandwidth = 0.13,
#>     hatmatrix = TRUE)
#> Kernel function: gwr.Gauss
#> Fixed bandwidth: 0.13
#> Summary of GWR coefficient estimates at data points:
#>                  Min.  1st Qu.   Median  3rd Qu.     Max.  Global
#> X.Intercept. -2.63740 -1.46100 -1.08812 -0.62307  1.06100 -1.4845
#> X1            0.16880  0.38590  0.51409  0.59454  0.79570  0.5000
#> X2           -0.37306  1.52729  1.94806  2.58871  3.37755  2.5481
#> X3           -1.75876 -1.30704 -1.01941 -0.74386 -0.30828 -1.0762
#> Number of data points: 1000
#> Effective number of parameters (residual: 2traceS - traceS'S): 62.85371
#> Effective degrees of freedom (residual: 2traceS - traceS'S): 937.1463
#> Sigma (residual: 2traceS - traceS'S): 0.2614678
#> Effective number of parameters (model: traceS): 44.93259
#> Effective degrees of freedom (model: traceS): 955.0674
#> Sigma (model: traceS): 0.2590031
#> Sigma (ML): 0.2531174
#> AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): 186.462
#> AIC (GWR p. 96, eq. 4.22): 135.0056
#> Residual sum of squares: 64.0684
#> Quasi-global R2: 0.9813492

all(abs(model_GWR$residuals-model_spgwr$SDF@data$gwr.e)<0.00000000001) #> [1] TRUE Estimation of a GWR with leave-one out CV, using an adaptive bisquare kernel (based on 20 nearest neighbors) and a local Outlier Detection/Removal procedure: model_GWR<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('bisq_knn'),H=20, Model = 'GWR',control=list(isgcv=TRUE,remove_local_outlier=TRUE,outv=0.01)) summary_mgwrsar(model_GWR) #> Call: #> MGWRSAR(formula = "Y_gwr~X1+X2+X3", data = mydata, coord = coord, #> fixed_vars = NULL, kernels = c("bisq_knn"), H = 20, Model = "GWR", #> control = list(isgcv = TRUE, remove_local_outlier = TRUE, #> outv = 0.01)) #> Model: GWR #> Kernels function: bisq_knn #> Kernels type: GD #> bandwidth: 20 #> Number of data points: 1000 #> Varying parameters: Intercept X1 X2 X3 #> Intercept X1 X2 X3 #> Min. -4.223718 0.001589 -1.182832 -2.0699 #> 1st Qu. -1.448438 0.347360 1.026678 -1.2974 #> Median -0.773579 0.537315 1.660816 -1.0162 #> Mean -0.797981 0.497172 1.665602 -1.0071 #> 3rd Qu. -0.173053 0.640754 2.456682 -0.7263 #> Max. 2.267588 0.911069 4.103331 -0.0933 #> Residual sum of squares: 16.54259 Estimation of a MGWR with stationary intercept using an adapative gaussian kernel (based on 20 nearest neighbors) + leave-one out CV estimation  model_MGWR<-MGWRSAR(formula = 'Y_mgwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=c('Intercept'),kernels=c('gauss_adapt'),H=20, Model = 'MGWR',control=list(SE=TRUE)) summary_mgwrsar(model_MGWR) #> Call: #> MGWRSAR(formula = "Y_mgwr~X1+X2+X3", data = mydata, coord = coord, #> fixed_vars = c("Intercept"), kernels = c("gauss_adapt"), #> H = 20, Model = "MGWR", control = list(SE = TRUE)) #> Model: MGWR #> Kernels function: gauss_adapt #> Kernels type: GD #> bandwidth: 20 #> Number of data points: 1000 #> Constant parameters: Intercept #> -0.5563375 #> Varying parameters: X1 X2 X3 #> X1 X2 X3 #> Min. 0.17788 0.53761 -1.4822 #> 1st Qu. 0.37102 1.25611 -1.1637 #> Median 0.50101 1.62152 -1.0894 #> Mean 0.48714 1.62199 -1.0462 #> 3rd Qu. 0.58890 2.05394 -0.9554 #> Max. 0.82570 2.71433 -0.5656 #> Effective degrees of freedom: 924.9957 #> AIC -272.9127 #> Residual sum of squares: 41.38677 ### leave-one out CV estimation model_MGWR<-MGWRSAR(formula = 'Y_mgwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars='Intercept',kernels=c('gauss_adapt'),H=20, Model = 'MGWR',control=list(isgcv=TRUE,remove_local_outlier=FALSE,outv=NULL)) summary_mgwrsar(model_MGWR) #> Call: #> MGWRSAR(formula = "Y_mgwr~X1+X2+X3", data = mydata, coord = coord, #> fixed_vars = "Intercept", kernels = c("gauss_adapt"), H = 20, #> Model = "MGWR", control = list(isgcv = TRUE, remove_local_outlier = FALSE, #> outv = NULL)) #> Model: MGWR #> Kernels function: gauss_adapt #> Kernels type: GD #> bandwidth: 20 #> Number of data points: 1000 #> Constant parameters: Intercept #> -0.5836998 #> Varying parameters: X1 X2 X3 #> X1 X2 X3 #> Min. 0.17826 0.53262 -1.4776 #> 1st Qu. 0.37341 1.27013 -1.1609 #> Median 0.50187 1.64459 -1.0867 #> Mean 0.48805 1.63952 -1.0424 #> 3rd Qu. 0.58608 2.07933 -0.9529 #> Max. 0.82377 2.72156 -0.5559 #> Residual sum of squares: 50.18494 Estimation of a MGWR with stationary intercept, using an adapative gaussian kernel (based on 20 nearest neighbors): model_MGWRSAR_1_0_kv<-MGWRSAR(formula = 'Y_mgwrsar_1_0_kv~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_1_0_kv',control=list(W=W,Lambdacor=TRUE,SE=TRUE)) summary_mgwrsar(model_MGWRSAR_1_0_kv) #> Call: #> MGWRSAR(formula = "Y_mgwrsar_1_0_kv~X1+X2+X3", data = mydata, #> coord = coord, fixed_vars = NULL, kernels = c("gauss_adapt"), #> H = 20, Model = "MGWRSAR_1_0_kv", control = list(W = W, Lambdacor = TRUE, #> SE = TRUE)) #> Model: MGWRSAR_1_0_kv #> Method for spatial autocorrelation: MGWRSAR_1_0_kv #> Kernels function: gauss_adapt #> Kernels type: GD #> bandwidth: 20 #> Number of data points: 1000 #> Varying parameters: Intercept X1 X2 X3 #> Intercept X1 X2 X3 lambda #> Min. -8.94883 0.10555 -2.65846 -2.10631 -0.2844 #> 1st Qu. -1.74523 0.37095 0.81483 -1.29394 0.1355 #> Median -0.99141 0.54257 2.11968 -1.03264 0.2965 #> Mean -1.06465 0.50832 2.18127 -1.07079 0.3152 #> 3rd Qu. -0.21901 0.63831 3.55568 -0.73756 0.4899 #> Max. 2.20903 0.94188 9.84625 -0.34718 0.9153 #> Effective degrees of freedom: 893.3478 #> AIC 2158.486 #> Residual sum of squares: 456.0999 Estimation of a mgwrsar(0,kc,kv) model with stationary intercept and stationnary spatial multiplier using an adapative gaussian kernel (based on 20 nearest neighbors): model_MGWRSAR_0_kc_kv<-MGWRSAR(formula = 'Y_mgwrsar_0_kc_kv~X1+X2+X3', data = mydata,coord=coord, fixed_vars=c('Intercept'),kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_0_kc_kv',control=list(W=W)) summary_mgwrsar(model_MGWRSAR_0_kc_kv) #> Call: #> MGWRSAR(formula = "Y_mgwrsar_0_kc_kv~X1+X2+X3", data = mydata, #> coord = coord, fixed_vars = c("Intercept"), kernels = c("gauss_adapt"), #> H = 20, Model = "MGWRSAR_0_kc_kv", control = list(W = W)) #> Model: MGWRSAR_0_kc_kv #> Method for spatial autocorrelation: MGWRSAR_0_kc_kv #> Kernels function: gauss_adapt #> Kernels type: GD #> bandwidth: 20 #> Number of data points: 1000 #> Constant parameters: Intercept #> -0.3140325 0.2254189 #> Varying parameters: X1 X2 X3 #> X1 X2 X3 #> Min. 0.18113 -0.19560 -1.6156 #> 1st Qu. 0.38038 1.30055 -1.2737 #> Median 0.50396 1.92790 -1.1129 #> Mean 0.49373 1.93858 -1.0928 #> 3rd Qu. 0.59936 2.80034 -0.9463 #> Max. 0.86079 3.67151 -0.4367 #> Residual sum of squares: 122.6273 Estimation of a mgwrsar(0,0,kv) model with stationnary spatial multiplier using an adapative gaussian kernel (based on 20 nearest neighbors): model_MGWRSAR_0_0_kv<-MGWRSAR(formula = 'Y_mgwrsar_0_0_kv~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_0_0_kv',control=list(W=W)) summary_mgwrsar(model_MGWRSAR_0_0_kv) #> Call: #> MGWRSAR(formula = "Y_mgwrsar_0_0_kv~X1+X2+X3", data = mydata, #> coord = coord, fixed_vars = NULL, kernels = c("gauss_adapt"), #> H = 20, Model = "MGWRSAR_0_0_kv", control = list(W = W)) #> Model: MGWRSAR_0_0_kv #> Method for spatial autocorrelation: MGWRSAR_0_0_kv #> Kernels function: gauss_adapt #> Kernels type: GD #> bandwidth: 20 #> Number of data points: 1000 #> Constant parameters: #> 0.250141 #> Varying parameters: Intercept X1 X2 X3 #> Intercept X1 X2 X3 #> Min. -4.928243 0.104342 -1.979095 -2.0943 #> 1st Qu. -1.429473 0.365563 1.262898 -1.2896 #> Median -0.779578 0.536273 2.011070 -0.9916 #> Mean -0.689637 0.502558 2.023040 -1.0512 #> 3rd Qu. -0.094623 0.630135 3.365939 -0.7410 #> Max. 2.327138 0.891468 5.401186 -0.3125 #> Residual sum of squares: 93.50372 Estimation of a mgwrsar(1,kv,kv) model with stationary intercept using an adapative gaussian kernel (based on 20 nearest neighbors): model_MGWRSAR_1_kc_kv<-MGWRSAR(formula = 'Y_mgwrsar_1_kc_kv~X1+X2+X3', data = mydata,coord=coord, fixed_vars=c('Intercept'),kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_1_kc_kv',control=list(W=W)) summary_mgwrsar(model_MGWRSAR_1_kc_kv) #> Call: #> MGWRSAR(formula = "Y_mgwrsar_1_kc_kv~X1+X2+X3", data = mydata, #> coord = coord, fixed_vars = c("Intercept"), kernels = c("gauss_adapt"), #> H = 20, Model = "MGWRSAR_1_kc_kv", control = list(W = W)) #> Model: MGWRSAR_1_kc_kv #> Method for spatial autocorrelation: MGWRSAR_1_kc_kv #> Kernels function: gauss_adapt #> Kernels type: GD #> bandwidth: 20 #> Number of data points: 1000 #> Constant parameters: Intercept #> -0.6698754 #> Varying parameters: X1 X2 X3 #> X1 X2 X3 lambda #> Min. 0.19982 -0.46764 -1.90291 -0.2478 #> 1st Qu. 0.41206 1.13441 -1.27422 0.1194 #> Median 0.52169 1.84084 -1.09489 0.2813 #> Mean 0.50659 1.96433 -1.09162 0.3263 #> 3rd Qu. 0.60433 2.86682 -0.88669 0.5122 #> Max. 0.83615 4.86693 -0.44568 1.2080 #> Residual sum of squares: 869.4819 ## Bootstrap test mgwrsar_bootstrap_test function allows to perform a bootstrap test for Betas for mgwrsar class model (with parallel computation). Could be long, not run here. model_GWR<-MGWRSAR(formula = 'Y_mgwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss_adapt'),H=20, Model = 'GWR',control=list(SE=TRUE)) summary_mgwrsar(model_GWR) model_MGWR<-MGWRSAR(formula = 'Y_mgwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=c('Intercept'),kernels=c('gauss_adapt'),H=20, Model = 'MGWR',control=list(SE=TRUE)) summary_mgwrsar(model_MGWR) test<-mgwrsar_bootstrap_test(model_MGWR,model_GWR,B=30,domc=FALSE,ncore=1,type='standard',eps='H1',df='H1',focal='median',D=NULL) # result # > test # > p_star 0 # > T 69.92265  ## Prediction In this example, we use rows 1 to 800 as insample data for model estimation and rows 801 to 1000 as outsample for prediction. Prediction of GWR model using sheppard kernel with 8 neighbors: model_GWR_insample<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata[1:800,],coord=coord[1:800,], fixed_vars=NULL,kernels=c('gauss_adapt'),H=8, Model = 'GWR',control=list()) Y_pred=predict_mgwrsar(model_GWR_insample, newdata=mydata[801:1000,], newdata_coord=coord[801:1000,], k_extra = 8, kernel_extra = "sheppard") head(Y_pred) #> [1] 0.5111469 2.8323339 2.8674031 0.2058560 1.8421794 0.6162536 head(mydata$Y_gwr[801:1000])
#> [1] 0.2490662 2.8329445 2.7107657 0.1949003 1.7104271 0.6286141
sqrt(mean((mydata$Y_gwr[801:1000]-Y_pred)^2)) # RMSE #> [1] 0.1343204 Prediction of MGWRSAR_1_0_kv model using sheppard kernel with 8 neighbors and Best Linear Unbiased Predictor (BLUP) : model_MGWRSAR_1_0_kv_insample<-MGWRSAR(formula = 'Y_mgwrsar_1_0_kv~X1+X2+X3', data = mydata[1:800,],coord=coord[1:800,], fixed_vars=NULL,kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_1_0_kv',control=list(W=W[1:800,1:800])) summary_mgwrsar(model_MGWRSAR_1_0_kv_insample) #> Call: #> MGWRSAR(formula = "Y_mgwrsar_1_0_kv~X1+X2+X3", data = mydata[1:800, #> ], coord = coord[1:800, ], fixed_vars = NULL, kernels = c("gauss_adapt"), #> H = 20, Model = "MGWRSAR_1_0_kv", control = list(W = W[1:800, #> 1:800])) #> Model: MGWRSAR_1_0_kv #> Method for spatial autocorrelation: MGWRSAR_1_0_kv #> Kernels function: gauss_adapt #> Kernels type: GD #> bandwidth: 20 #> Number of data points: 800 #> Varying parameters: Intercept X1 X2 X3 #> Intercept X1 X2 X3 lambda #> Min. -7.05380 0.10543 -2.35128 -2.19224 -0.5765 #> 1st Qu. -1.92244 0.39116 0.91679 -1.33372 0.0956 #> Median -0.89820 0.54991 2.10489 -1.00608 0.2728 #> Mean -0.99662 0.51309 2.61515 -1.09223 0.2588 #> 3rd Qu. -0.20394 0.64410 4.09003 -0.72127 0.4379 #> Max. 1.98675 0.84522 9.77120 -0.43915 1.2524 #> Residual sum of squares: 352.3345 ## Using Best Linear Unbiased Predictor Y_pred=predict_mgwrsar(model_MGWRSAR_1_0_kv_insample, newdata=mydata[801:1000,], newdata_coord=coord[801:1000,], k_extra = 12, W = W, type = "BPN", kernel_extra = "sheppard") head(Y_pred) #> [,1] #> [1,] 0.8833344 #> [2,] 5.5199667 #> [3,] 4.1304483 #> [4,] 1.1905310 #> [5,] 2.9239718 #> [6,] 1.8684993 head(mydata$Y_mgwrsar_1_0_kv[801:1000])
#> [1] 0.5400965 5.0848714 3.9300001 1.1680551 2.4675708 0.9905510
sqrt(mean((mydata$Y_mgwrsar_1_0_kv[801:1000]-Y_pred)^2)) #> [1] 8.442365 ## without Best Linear Unbiased Predictor Y_pred=predict_mgwrsar(model_MGWRSAR_1_0_kv_insample, newdata=mydata[801:1000,], newdata_coord=coord[801:1000,], k_extra = 12, W = W, type = "TC", kernel_extra = "sheppard") head(Y_pred) #> [1] 1.435196 5.666258 4.219273 1.388306 3.459105 1.706876 head(mydata$Y_mgwrsar_1_0_kv[801:1000])
#> [1] 0.5400965 5.0848714 3.9300001 1.1680551 2.4675708 0.9905510
sqrt(mean((mydata\$Y_mgwrsar_1_0_kv[801:1000]-Y_pred)^2))
#> [1] 54.01216