Software for measuring and attributing spatial stratified heterogeneity (SSH), a universal characteristic of nature at all scales
1. Introduction
Spatial stratified heterogeneity (SSH) refers to the phenomenon that observations within strata are more similar to each other than observations between strata. Examples are land-use types and climate zones in spatial data; seasons and years in time series; and occupations, age groups and income strata. SSH occurs at all scales, from the universe to DNA, and has offered a window for understanding nature since Aristotle's time.

Geodetector (Geographical Detector) is a statistical tool to measure SSH and to make attribution for/by SSH (Fig. 1). It can:
(1) measure and find SSH among data;
(2) test the coupling between two variables Y and X according to their SSHs, without assuming that the association is linear; and
(3) investigate the interaction between two explanatory variables X1 and X2 on a response variable Y, without assuming any specific form of interaction, such as the product term assumed in econometrics (Fig. 2).
Each of these tasks can be accomplished with the Geodetector q-statistic:
q = 1 - [ Σ_{h=1}^{L} N_h σ_h² ] / ( N σ² )

Fig. 1. Principle of Geodetector. (In the bottom map, the color indicates the values of a population Y. In the top map, the population Y is stratified into strata {h}; the terms “stratification” and “partition” are equivalent, and a partition can be either a classification or a zonation. Between the two maps is the equation q(Y|{h}), in which the numerator is the sum of the within-strata variances and the denominator is the pooled variance.)

In this equation, N and σ² stand for the number of units and the variance of Y in the study area, respectively, and N_h and σ_h² are the number of units and the variance of Y within stratum h. The population Y is composed of L strata (h = 1, 2, …, L). The strata of Y (red polygons in Fig. 1) are a partition of Y, either by Y itself, h(Y), or by a categorical explanatory variable X, h(X). If X is a numerical variable, it should be stratified first; the number of strata L might be 2-10 or more, according to prior knowledge or a classification algorithm. The significance of q can be tested with [(N-L)q]/[(L-1)(1-q)] ~ F(L-1, N-L, g), where g is the non-centrality parameter (Wang et al. 2016).

The terms “stratified heterogeneity (SH)”, “stratification”, “classification” and “partition” are used equivalently. SH can be either spatial (spatial stratified heterogeneity, SSH) or aspatial, for example over time or over any attribute.

Interpretation of the q value (Fig. 1). The value of q lies within [0, 1].
(1) If Y is stratified by Y itself, then q = 0 indicates that Y has no SH; q = 1 indicates that Y is perfectly stratified; and 100q% measures the degree of SH of Y.
(2) If Y is stratified by an explanatory variable X, then q = 0 indicates that there is no coupling between Y and X; q = 1 indicates that Y is completely determined by X; and X explains 100q% of Y.
Please note that the q-statistic measures the association between X and Y whether it is linear or nonlinear.

The Geodetector q-statistic helps in understanding spatial confounding, sample bias and overfitting.
(1)
Confounding arises if a global model is applied to an SH population, leading to statistical insignificance. The problem can be avoided by identifying SH (with the Geodetector q-statistic) and then modelling each stratum separately.
(2) A sample is biased if the population is SH and the sample does not cover all strata. The problem can be addressed by identifying SH (with the Geodetector q-statistic) and then applying bias-remedy models such as Heckman regression and the Bshade method.
(3) Local models aim to overcome heterogeneity but often suffer from overfitting and have too many parameters to interpret. These problems can be avoided by modelling within strata, or by stratifying the outputs of a local model and then interpreting the stratified parameters.

Functions of Geodetector:
(1)
The risk detector maps the response variable in strata: Y(X);
(2) The factor detector q-statistic measures the degree of SH of a variable Y, and the determinant power of an explanatory variable X of Y;
(3) The ecological detector identifies the difference in impact between two explanatory variables X1 ~ X2;
(4) The interaction detector reveals whether the risk factors X1 and X2 (and more X) have an interactive influence on a response variable Y (Fig. 2).

Fig. 2. Interaction between explanatory variables X1 and X2 impacting on a response variable Y: q(Y|X1
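The q-statistic and its F-type test value described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of the Excel or R Geodetector software. The function names `q_statistic` and `f_value` and the toy data are hypothetical, and the sketch assumes population variances, so that N_h σ_h² equals the within-stratum sum of squares.

```python
from collections import defaultdict
from statistics import pvariance


def q_statistic(y, strata):
    """Return q = 1 - sum_h(N_h * var_h) / (N * var) for values y grouped by strata.

    Assumption: variances are population variances (divide by N, not N - 1).
    """
    if len(y) != len(strata):
        raise ValueError("y and strata must have the same length")
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    total = len(y) * pvariance(y)  # N * sigma^2
    if total == 0:
        raise ValueError("Y is constant, so q is undefined")
    within = sum(len(g) * pvariance(g) for g in groups.values())  # sum of N_h * sigma_h^2
    return 1.0 - within / total


def f_value(q, n_units, n_strata):
    """Return [(N - L) q] / [(L - 1)(1 - q)], the quantity compared with F(L-1, N-L, g)."""
    return (n_units - n_strata) * q / ((n_strata - 1) * (1.0 - q))


# Hypothetical toy data: six units, stratified by a categorical X with two strata.
y = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
x = ["a", "a", "a", "b", "b", "b"]
q = q_statistic(y, x)
print(round(q, 4), round(f_value(q, len(y), 2), 2))
```

In the toy data the two strata have very different means and small internal spread, so q comes out close to 1. Computing the p value would additionally need the non-central F distribution, which is not part of the Python standard library and is left out here.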
2. Tutorial
The Geodetector software was developed in two versions, one in Excel and one in R. The tools are free of charge, freely downloadable and easy to use; they require no GIS plug-in components and run with “one click”. Users can run the following demo, then simply replace the demo data in the software with their own data, click Run, and get results. The rest of this tutorial describes the Excel Geodetector software. R users can download the R Geodetector software from Section 4, “Download of the software, with example datasets”.

As a demo, data on neural-tube birth defects (NTD), Y, and suspected risk factors or their proxies, Xs, in villages are provided. They include the health-effect layer “NTD prevalence” and the environmental factor layers “elevation”, “soil type” and “watershed”. Their field names are defined as Y and X1, X2, X3, respectively.

Step 1. Download the software and input your data in Excel
(1)
Download the Excel Geodetector software (see Section 4, “Download of the software, with example datasets”): click to download any one of the three examples and unzip the downloaded file. You will find an Excel file, which is the Geodetector software with an example dataset. Double-click the Excel file, and Fig. 3 and Fig. 5 appear. Fig. 3 shows the format of the input data for Geodetector: each row denotes a sample unit (e.g. a village); the 1st column records the response variable Y; the 2nd and following columns denote partitions of Y or factors X, the latter partitioned according to the similarity within strata.
(2) Input your data into the Excel Geodetector software in the format of Fig. 3. Then go to Step 2.

Fig. 3. Input data in Excel and the execution interface (Note: Y is numerical; X MUST be categorical, e.g. land-use types, seasons. If X is numerical, it should be transformed to be categorical, e.g. GDP per capita is stratified into 5 strata)

(3) If your data is in GIS format, as in Fig. 4, please transform the GIS data into Excel data as in Fig. 3.

Fig. 4. Data in GIS format

Step 2. Run Geodetector software
Only one operation interface was designed (Fig. 5). The “Read Data” button loads the data; when it is clicked, all variables are listed in the “variables” list box. Then the disease variable and the partitions of Y or environmental factor variables are selected into their corresponding list boxes, Y and X, on the right of the interface. Finally, Geodetector is executed by clicking the “Run” button.

Fig. 5. User interface for Geodetector
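The note to Fig. 3 requires a numerical X to be transformed into a categorical one before it is entered. One possible way to do this is an equal-count (quantile) stratification, sketched below in Python. The quantile rule is an assumption chosen for illustration; the tutorial only says that strata follow prior knowledge or a classification algorithm. The function name `stratify_by_quantile` and the GDP values are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def stratify_by_quantile(values, n_strata=5):
    """Assign each value a stratum label from 1 to n_strata using quantile cut points.

    Equal values always receive the same label, so tied observations
    are never split across strata.
    """
    if n_strata < 2:
        raise ValueError("n_strata must be at least 2")
    cuts = quantiles(values, n=n_strata)  # n_strata - 1 cut points
    return [bisect_right(cuts, v) + 1 for v in values]


# Hypothetical GDP per capita values for ten villages.
gdp_per_capita = [5200, 1800, 9400, 3100, 7600, 2500, 4300, 8800, 6100, 1200]
x_strata = stratify_by_quantile(gdp_per_capita, n_strata=5)
for gdp, stratum in zip(gdp_per_capita, x_strata):
    print(gdp, stratum)
```

The resulting labels can be placed in an X column of the input sheet in the format of Fig. 3, next to the numerical Y column.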
3. Output
Geodetector outputs the results of the risk detector, factor detector, ecological detector and interaction detector in four Excel spreadsheets (Fig. 6).

Fig. 6. Interface for Geodetector results

In the “Risk detector” sheet (Fig. 7), the results for each environmental risk factor are presented in two tables. The first table gives the average disease incidence in each stratum of a risk factor, the name of which is written at the top left of the table. The second table indicates whether the difference in average disease incidence between two strata is statistically significant: if it is, the corresponding value is “Y”; otherwise it is “N”.

Fig. 7. Results of risk detector

Fig. 8 shows the output format of the q values for each environmental risk factor, as given in the “Factor detector” sheet. The table header gives the names of the environmental risk factors, while the associated q values (q1, q2, …, qn) and their corresponding p values are presented below the header.

Fig. 8. Results of factor detector

In the “Ecological detector” sheet (Fig. 9), the statistical significance of the differences between two environmental risk factors is presented. If Y(X1) (risk factor names in rows) is significantly bigger than Y(X2) (risk factor names in columns), the associated value is “Y”; “N” expresses the opposite.

Fig. 9. Results of ecological detector

The format of the results for the interaction detector is shown in Fig. 10. The “Interaction relationships” listed below the table give the interaction relationship for each pair of factors. The relationship is defined on a coordinate axis divided into 5 intervals: “(-∞, min(q(x), q(y)))”, “(min(q(x), q(y)), max(q(x), q(y)))”, “(max(q(x), q(y)), q(x) + q(y))”, “q(x) + q(y)”, and “(q(x) + q(y), +∞)”. The interaction relationship is determined by which of the 5 intervals q(x∩y) falls in (see Table 1).

Fig. 10. Results of interaction detector

Tab. 1. Interaction between Explanatory Variables (Xs)
Legend
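The five-interval rule for the interaction detector can be written as a small lookup, sketched here in Python. The interval boundaries come from the description above. The relationship names are not given in the text of this page; they follow the naming commonly used in the Geodetector literature and should be read as assumptions. The function name `classify_interaction` and the tolerance `tol` are hypothetical, and values that fall exactly on min or max (which the open intervals leave undefined) are assigned here by an arbitrary choice.

```python
def classify_interaction(q_x, q_y, q_xy, tol=1e-9):
    """Classify the interaction of two factors from q(x), q(y) and q(x∩y).

    q_xy is the q value of the two factors taken together.
    Labels are assumed names, not taken from Table 1 of this page.
    """
    low, high, total = min(q_x, q_y), max(q_x, q_y), q_x + q_y
    if abs(q_xy - total) <= tol:
        return "independent"           # q(x∩y) = q(x) + q(y)
    if q_xy > total:
        return "enhance, nonlinear"    # (q(x) + q(y), +inf)
    if q_xy > high:
        return "enhance, bivariate"    # (max, q(x) + q(y))
    if q_xy >= low:
        return "weaken, univariate"    # (min, max); boundary handling is arbitrary
    return "weaken, nonlinear"         # (-inf, min)


# Hypothetical q values for two factors and their combination.
print(classify_interaction(0.2, 0.3, 0.6))
print(classify_interaction(0.2, 0.3, 0.4))
```

The equality case is checked first with a small tolerance, because floating-point q values rarely match q(x) + q(y) exactly.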
4. Download of the software, with example datasets
The software was developed in two versions, one in Excel 2007 and one in R. It is completely free. You can click any one of the following links to download the Geodetector software. The first three are Geodetector software in Excel: (1) click one and unzip the file, and an Excel file appears; (2) open the Excel file to start Geodetector, where you can try out the demo data; then (3) input your own data to get your own results.
1: Geodetector Software in Excel, with an example disease dataset
2: Geodetector Software in Excel, with an example toy dataset
3: Geodetector Software in Excel, with an example NDVI dataset
5: Geodetector Software in QGIS (please use Google to access)