identifying and attributing spatial pattern in natural and social processes with software




1. Introduction
Spatial Stratified Heterogeneity (SSH) is the phenomenon in which observations within strata are more similar to each other than to observations in other strata. Examples include land-use types and climate zones in spatial data, seasons and years in time series, and occupations, age groups and income strata. SSH occurs at all scales, from the universe to DNA, and has been studied since Aristotle's time.

Geodetector, or Geographical Detector, is a statistical tool to measure SSH and to make attributions for or by SSH (Fig. 1): (1) measure and identify SSH among data; (2) test the coupling between two variables Y and X without assuming linearity of the association and with clear physical meaning; and (3) investigate the general interaction between two explanatory variables X1 and X2 and a response variable Y, without assuming any specific form of interaction such as the product term assumed in econometrics (Fig. 2). Each of these tasks can be accomplished using the Geodetector q-statistic:
Fig. 1. Principle of the Geodetector q-statistic (Wang et al. 2016). In the bottom map, the color indicates the values of a population Y. In the top map, the population Y is composed of L strata (h = 1, 2, …, L); the terms “stratification” and “partition” are equivalent and can mean either classification or zonation. Between the two maps is the equation for q(Y|{h}), in which the numerator is the sum of the within-strata variances and the denominator is the pooled variance; N and σ^{2} stand for the number of units and the variance of Y in the study area, respectively. [(N−L)q]/[(L−1)(1−q)] ~ F(L−1, N−L, γ), where γ is a noncentrality parameter.

The strata of Y (red polygons in Fig. 1) are a partition of Y, either by Y itself or by an explanatory variable X. X must be a categorical variable, or must be stratified if it is a numerical variable. The number of strata L might be 2–10 or more, according to prior knowledge or a classification algorithm. The “spatial” in “spatial stratified heterogeneity” can be spatial either in the geoscience sense or in a broad mathematical sense that includes time and any other attribute.

Interpretation of the Geodetector q-statistic (Fig. 1). The value of q is strictly within [0, 1]. (1) If Y is stratified by Y itself, then a q-statistic of 0 indicates that Y has no spatial stratified heterogeneity; a q-statistic of 1 indicates that Y is perfectly spatially stratified heterogeneous; and in general 100q% measures the degree of spatial stratified heterogeneity of Y. (2) If Y is stratified by an explanatory variable X, then a q-statistic of 0 indicates that there is no coupling between Y and X; a q-statistic of 1 indicates that Y is completely determined by X; and in general X explains 100q% of Y. Please note that the q-statistic measures the association between X and Y without assuming that the association is linear.

The Geodetector q-statistic can be used to understand spatial confounding, sample bias and overfitting. (1) Confounding can occur if a single model is applied to a (spatially) stratified heterogeneous population, leading to a misleading interpretation and statistical insignificance of the model outcome. This problem can be avoided by identifying SSH (with the Geodetector q-statistic) and then modelling each stratum separately. (2) A sample is biased if the population is (spatially) stratified heterogeneous and the sample does not cover all strata. The problem can be addressed by identifying (spatial) stratified heterogeneity (with the Geodetector q-statistic) and then applying bias remedy models such as Heckman regression and the B-shade method. (3) Local models aim to overcome heterogeneity but often suffer from overfitting and from having too many parameters to interpret. These problems can be avoided by modelling in strata, or by stratifying the outputs of a local model and then interpreting the stratified parameters.

Functions of Geodetector: (1) The risk detector maps the response variable Y in strata according to X; (2) The factor detector q-statistic measures the degree of spatial stratified heterogeneity of a variable Y if Y is stratified by itself, and the determinant power of an explanatory variable X on Y if Y is stratified by X; (3) The ecological detector identifies the difference in impact between two explanatory variables X1 and X2; (4) The interaction detector reveals whether the risk factors X1 and X2 (and more Xs, if applicable) have an interactive influence on a response variable Y (Fig. 2).

Fig. 2. The general interaction between explanatory variables X1 and X2 impacting on a response variable Y: q(Y|X1∩X2).
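The description of the q-statistic in the Fig. 1 caption can be made concrete with a short sketch. It is not the code of the Excel or R Geodetector software. It assumes the usual reading of the caption, q = 1 − (Σ N_h σ_h^2)/(N σ^2) with population variances (so that the ratio equals the within-strata sum of squares over the total sum of squares), and it assumes that the interaction q(Y|X1∩X2) is obtained by stratifying Y by the overlay of X1 and X2. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def q_statistic(y, strata):
    """Return q = 1 - SSW / SST for values y grouped by stratum labels.

    Assumption: variances are population variances, so that
    sum_h(N_h * sigma_h^2) is the within-strata sum of squares (SSW)
    and N * sigma^2 is the total sum of squares (SST).
    """
    if len(y) != len(strata):
        raise ValueError("y and strata must have the same length")
    n = len(y)
    grand_mean = sum(y) / n
    sst = sum((v - grand_mean) ** 2 for v in y)  # N * sigma^2
    if sst == 0:
        raise ValueError("Y has zero variance; q is undefined")

    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)

    ssw = 0.0  # sum over strata of N_h * sigma_h^2
    for values in groups.values():
        stratum_mean = sum(values) / len(values)
        ssw += sum((v - stratum_mean) ** 2 for v in values)
    return 1.0 - ssw / sst


def interaction_q(y, x1, x2):
    """q of Y stratified by the overlay of X1 and X2 (assumed X1∩X2)."""
    return q_statistic(y, list(zip(x1, x2)))


# Hypothetical toy data: neither factor explains Y alone,
# but their overlay determines Y completely.
y = [0, 0, 10, 10, 10, 10, 0, 0]
x1 = ["a", "a", "a", "a", "b", "b", "b", "b"]
x2 = ["c", "c", "d", "d", "c", "c", "d", "d"]

q1 = q_statistic(y, x1)
q2 = q_statistic(y, x2)
q12 = interaction_q(y, x1, x2)
print(q1, q2, q12)  # prints 0.0 0.0 1.0
```

In this toy case each factor alone gives q = 0 while the overlay gives q = 1, which illustrates why the interaction detector does not need to assume a product form for the interaction. The significance test based on the noncentral F distribution is not covered by this sketch.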
2. Tutorial
The Geodetector software is available in an Excel version and an R version. The tools are free of charge, freely downloadable and easy to use, and were designed without any GIS plug-in components and with “one click” execution. Users can run the following demo, then simply replace the demo data in the software with their own data, click Run and get results. The rest of this section describes the Excel Geodetector software. R users can download the R Geodetector software in Section 4, “Download of the software, with example datasets”.

As a demo, neural-tube birth defects (NTD) Y and suspected risk factors or their proxies Xs in villages are provided, including data for the health effect layer “NTD prevalence” and the environmental factor layers “elevation”, “soil type” and “watershed”. Their field names are defined as Y and X1, X2, X3 respectively.

Step 1. Download the software and input your data in Excel

(1) Download the Excel Geodetector software (see Section 4, “Download of the software, with example datasets”): click to download any one of the three examples and unzip the downloaded file. You will find an Excel file, which is the Geodetector software with an example dataset. Double-click the Excel file, and Fig. 3 and Fig. 5 appear. Fig. 3 shows the format of the input data for Geodetector: each row denotes a sample unit (e.g. a village); the 1^{st} column records the response variable Y; the 2^{nd} and following columns denote partitions of Y or factors X, the latter partitioned according to the similarity within strata.

(2) Input your data into the Excel Geodetector software in the format of Fig. 3. Then go to Step 2.

Fig. 3. Input data in Excel and the execution interface (Note: Y should be numerical; X MUST be categorical, e.g. land-use types, seasons. If X is numerical, it should be transformed into a categorical variable, e.g. GDP per capita is stratified into 5 strata. At least three sample units are required in each of the strata)

(3) If your data is in GIS format, as shown in Fig. 4, you can use the QGIS version listed in Section 4 directly, or you can transform the GIS data into Excel data as shown in Fig. 3.

Fig. 4. Data in GIS format

Step 2. Run Geodetector software

Only one operation interface was designed (Fig. 5). The “Read Data” button loads the data; when it is clicked, all variables are listed in the “variables” list box. Then select the disease variable into list box Y, and the partitions of Y or the environmental factor variables into list box X, on the right of the interface. Finally, execute Geodetector by clicking the “Run” button.

Fig. 5. User interface for Geodetector
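The note under Fig. 3 requires a numerical X to be turned into a categorical variable before it is entered, with at least three sample units in each stratum. The sketch below shows one way to prepare such a column outside Excel. The tutorial does not prescribe a stratification method, so the equal-count (quantile) rule used here is an assumption, and the function names and GDP values are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def quantile_strata(values, n_strata=5):
    """Label each value 1..n_strata using equal-count (quantile) bins.

    Assumption: quantile binning is only one possible stratification;
    prior knowledge or another classification algorithm may fit better.
    """
    ordered = sorted(values)
    n = len(ordered)
    # The rank of the first occurrence keeps tied values in one stratum.
    return [bisect_left(ordered, v) * n_strata // n + 1 for v in values]


def undersized_strata(labels, min_units=3):
    """Return the strata that hold fewer than min_units sample units."""
    counts = Counter(labels)
    return sorted(label for label, c in counts.items() if c < min_units)


# Hypothetical GDP per capita for 15 sample units (e.g. villages).
gdp_per_capita = list(range(10, 160, 10))
x = quantile_strata(gdp_per_capita, n_strata=5)
print(x)                     # stratum label for each sample unit
print(undersized_strata(x))  # strata that violate the minimum size
```

The resulting labels can be pasted into the 2^{nd} or a following column of the input sheet in Fig. 3. If any stratum is reported as undersized, reducing the number of strata is one simple remedy.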


3. Output
Geodetector outputs the results of the risk detector, factor detector, ecological detector and interaction detector in four Excel spreadsheets (Fig. 6).

Fig. 6. Interface for Geodetector results

In the “Risk detector” sheet (Fig. 7), the results for each environmental risk factor are presented in two tables. The first table gives the average disease incidence in each stratum of a risk factor, the name of which is written at the top left of the table. The second table indicates whether the difference in average disease incidence between two strata is statistically significant; if there is a significant difference, the corresponding value is “Y”, otherwise it is “N”.

Fig. 7. Results of risk detector

Fig. 8 shows the output format of the q values for each environmental risk factor, as given in the “Factor detector” sheet. The table header gives the names of the environmental risk factors X (X1, X2, …, Xn), while the associated q values and their corresponding p values are presented in the rows below.

Fig. 8. Results of factor detector

In the “Ecological detector” sheet (Fig. 9), the statistically significant differences between two environmental risk factors are presented. If Y(X1) (risk factor names in rows) is significantly bigger than Y(X2) (risk factor names in columns), the associated value is “Y”; otherwise it is “N”.

Fig. 9. Results of ecological detector

The format of the results for the interaction detector is shown in Fig. 10. The “Interaction relationships” below the table give the interaction relationship for each pair of factors. The relationship is defined on a coordinate axis divided into 5 intervals: “(−∞, min(q(x), q(y)))”, “(min(q(x), q(y)), max(q(x), q(y)))”, “(max(q(x), q(y)), q(x) + q(y))”, “q(x) + q(y)” and “(q(x) + q(y), +∞)”. The interaction relationship is determined by the location of q(x∩y) among these 5 intervals (see Table 1).

Fig. 10. Results of interaction detector

Tab. 1. Interaction between Explanatory Variables (Xs)
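The five intervals used by the interaction detector can be expressed as a small decision function. This is a sketch, not the code of the Geodetector software. The contents of Table 1 did not survive in this text, so the labels below are assumed from the names commonly used in the Geodetector literature. The tolerance used to detect the single point q(x) + q(y), and the assignment of exact boundary values to the upper neighbouring interval, are also assumptions.

```python
def classify_interaction(qx, qy, qxy, tol=1e-9):
    """Locate q(x∩y) among the five intervals described for Fig. 10.

    qx, qy : q values of the two factors taken separately
    qxy    : q value of their overlay, q(x∩y)
    tol    : assumed tolerance for matching the point q(x) + q(y)
    """
    low, high, total = min(qx, qy), max(qx, qy), qx + qy
    if abs(qxy - total) <= tol:
        return "independent"          # qxy equals q(x) + q(y)
    if qxy > total:
        return "nonlinear enhance"    # (q(x) + q(y), +inf)
    if qxy < low:
        return "nonlinear weaken"     # (-inf, min)
    if qxy < high:
        return "uni-variable weaken"  # (min, max)
    return "bi-variable enhance"      # (max, q(x) + q(y))


# Hypothetical q values for two factors and their overlay.
for qxy in (0.10, 0.25, 0.40, 0.50, 0.70):
    print(qxy, classify_interaction(0.2, 0.3, qxy))
```

The check for the point q(x) + q(y) comes first so that floating-point noise around the sum does not push an independent pair into one of the neighbouring intervals.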
4. Download of the software, with example datasets
The software is available in Excel 2007, R and QGIS versions, and is completely free. Click any one of the following links to download the Geodetector software. The first three are Geodetector software in Excel: (1) click one and unzip the file, and an Excel file appears; (2) open the Excel file to start Geodetector, where you can try the demo data; then (3) input your own data to get your own results.

1: Geodetector Software in Excel, with an example disease dataset
2: Geodetector Software in Excel, with an example toy dataset
3: Geodetector Software in Excel, with an example NDVI dataset
5: Geodetector Software in QGIS (please use Google to access)
2011
3. Hu Y, Wang JF, Li XH, Ren D, Zhu J. 2011. Geographical detector-based risk assessment of the under-five mortality in the 2008 Wenchuan earthquake, China. PLoS ONE 6(6): e21427.
4. Zou B, Wilson JG, Zhan FB, Zeng YN, Wu KJ. 2011. Spatial-temporal variations in regional ambient sulfur dioxide concentration and source-contribution analysis: A dispersion modeling approach. Atmospheric Environment 45: 4977–4985.
2012
5. Gajos M. 2012. Geoinformation technologies in biomedicine and health care: review of scientific journals. E. Piętka and J. Kawa (Eds.): ITIB 2012, LNCS 7339: 510–524.
6. Li LF, Wang JF, Wu J. 2012. A spatial model to predict the incidence of neural tube defects. BMC Public Health 12: 951.
7. Wang JF, Hu Y. 2012. Environmental health risk detection with GeogDetector. Environmental Modelling & Software 33: 114–115.
8. 刘彦随, 杨忍. 2012. 中国县域城镇化的空间特征与形成机理. 地理学报 67(8): 1011–1020. Liu YS, Yang R. 2012. Spatial characteristics and formation mechanism of the county urbanization in China. Acta Geographica Sinica 67(8): 1011–1020.
2013
9. Cao F, Ge Y, Wang JF. 2013. Optimal discretization for geographical detectors-based risk assessment. GIScience & Remote Sensing 50(1): 78–92.
10. Li XW, Xie YF, Wang JF, Christakos G, Si JL, Zhao HN, Ding YQ, Li J. 2013. Influence of planting patterns on Fluoroquinolone residues in the soil of an intensive vegetable cultivation area in north China. Science of the Total Environment 458–460: 63–69.
11. Lee WC. 2013. Assessing causal mechanistic interactions: a peril ratio index of synergy based on multiplicativity. PLoS ONE 8(6): e67424. doi:10.1371/journal.pone.0067424.
12. Raghavan RK, Brenner KM, Harrington Jr JA, Higgins JJ, Harkin KR. 2013. Spatial scale effects in environmental risk-factor modelling for diseases. Geospatial Health 7(2): 169–182.
13. Wang JF, Wang Y, Zhang J, Christakos G, Sun JL, Liu X, Lu L, Fu XQ, Shi YQ, Li XM. 2013. Spatiotemporal transmission and determinants of typhoid and paratyphoid fever in Hongta District, China. PLoS Neglected Tropical Diseases 7(3): e2112.
14. Wang JF, Xu CD, Tong SL, Chen HY, Yang WZ. 2013. Spatial dynamic patterns of hand-foot-mouth disease in the People’s Republic of China. Geospatial Health 7(2): 381–390.

2014
15. Bai HX, Ge Y, Wang JF, Li DY, Liao YL, Zheng XY. 2014. A method for extracting rules from spatial data based on rough fuzzy sets. Knowledge-Based Systems 57: 28–40.
16. Hu Y, Gao J, Chi M, Luo C, Lynn H, Sun LQ, Tao B, Wang DC, Zhang ZJ, Jiang QW. 2014. Spatiotemporal patterns of schistosomiasis Japonica in lake and marshland areas in China: the effect of snail habitats. American Journal of Tropical Medicine and Hygiene 91(3): 547–554.