Geodetector

Software for measuring and attributing spatial stratified heterogeneity (SSH), a universal characteristic of nature at all scales

                       

 

1. Introduction

2. Tutorial

3. Output

4. Download of the software, with example datasets

5. Citations

6. Bibliography

7. FAQs

8. Developer and Contact

 

1. Introduction

Spatial stratified heterogeneity (SSH) refers to the phenomenon that observations within strata are more similar to each other than observations between strata. Examples are land-use types and climate zones in spatial data, seasons and years in time series, as well as occupations, age groups and income strata. SSH occurs at all scales, from the universe to DNA, and has offered a window for understanding nature since the time of Aristotle.

Geodetector, short for Geographical Detector, is a statistical tool to measure SSH and to make attributions for or by SSH (Fig. 1). It can (1) measure and find SSH among data; (2) test the coupling between two variables Y and X according to their SSHs, without assuming that the association is linear; and (3) investigate the interaction between two explanatory variables X1 and X2 on a response variable Y, without assuming any specific form of interaction such as the product term assumed in econometrics (Fig. 2). Each of these tasks can be accomplished with the Geodetector q-statistic:

 


Fig. 1. Principle of Geodetector

(In the bottom map, the color indicates the values of a population Y. In the top map, the population Y is stratified into strata {h}; the terms “stratification” and “partition” are equivalent, and a stratification can be either a classification or a zonation. Between the two maps is the equation for q(Y|{h}), in which the numerator is the sum of the within-strata variances and the denominator is the pooled variance.)
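Because the equation appears only inside the figure, it is restated here in the form given by Wang et al. (2016), cited in Section 5. The stratum-level symbols N_h and σ_h² (the number of units and the variance of Y in stratum h) and the shorthand SSW and SST (within-strata and total sums of squares) are notation used for this restatement:

```latex
q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2} = 1 - \frac{SSW}{SST}
```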

 

where N and σ² stand for the number of units and the variance of Y in the study area, respectively, and the population Y is composed of L strata (h = 1, 2, …, L). The strata of Y (red polygons in Fig. 1) are a partition of Y, either by Y itself, h(Y), or by a categorical explanatory variable X, h(X). If X is a numerical variable it should be stratified first; the number of strata L might be 2-10 or more, chosen according to prior knowledge or a classification algorithm. The transformed statistic [(N-L)q]/[(L-1)(1-q)] follows a noncentral F distribution, F(L-1, N-L, g), where g is the noncentrality parameter (Wang et al. 2016).

The terms “stratified heterogeneity (SH)”, “stratification”, “classification” and “partition” are equivalent. SH can be either spatial (spatial stratified heterogeneity, SSH) or aspatial, for example over time or over any attribute.
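As a minimal sketch of the computation (not the code of the Excel or R tools), q can be obtained from the total sum of squares of Y and the sum of squares within each stratum, which stand for N times the variance of Y and for the per-stratum terms, respectively. The function names q_statistic and f_value and the toy data are illustrative assumptions:

```python
from collections import defaultdict


def q_statistic(y, strata):
    """Return q = 1 - SSW/SST for values y grouped by the labels in strata."""
    if len(y) != len(strata):
        raise ValueError("y and strata must have the same length")
    mean_all = sum(y) / len(y)
    sst = sum((v - mean_all) ** 2 for v in y)  # total sum of squares
    if sst == 0:
        raise ValueError("Y is constant, so q is undefined")
    groups = defaultdict(list)
    for v, h in zip(y, strata):
        groups[h].append(v)
    ssw = 0.0  # within-strata sum of squares
    for values in groups.values():
        m = sum(values) / len(values)
        ssw += sum((v - m) ** 2 for v in values)
    return 1.0 - ssw / sst


def f_value(q, n, n_strata):
    """Transformed statistic [(N-L)q]/[(L-1)(1-q)] described in the text."""
    return (n - n_strata) * q / ((n_strata - 1) * (1.0 - q))


# Toy data (hypothetical): six units in two strata
y = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
strata = ["a", "a", "a", "b", "b", "b"]
q = q_statistic(y, strata)
f = f_value(q, n=len(y), n_strata=2)
print(q, f)
```

Judging significance additionally requires the noncentral F distribution mentioned above, which this sketch does not evaluate.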

Interpretation of the q value (Fig. 1).

The value of q always lies within [0, 1].

(1)  If Y is stratified by Y itself, then q = 0 indicates that Y has no SH, q = 1 indicates that Y is perfectly stratified, and 100q% measures the degree of SH of Y.

(2)  If Y is stratified by an explanatory variable X, then q = 0 indicates that there is no coupling between Y and X, q = 1 indicates that Y is completely determined by X, and in general X explains 100q% of Y. Note that the q-statistic measures the association between X and Y whether it is linear or nonlinear.
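The point about nonlinear association can be illustrated with a small, made-up example: Y depends on X through a symmetric, U-shaped relation, so the linear (Pearson) correlation is essentially zero while q is close to 1. The helper names and the data are assumptions for illustration only:

```python
from collections import defaultdict
from statistics import mean


def q_statistic(y, strata):
    """q = 1 - SSW/SST, with strata given as one label per unit."""
    mean_all = sum(y) / len(y)
    sst = sum((v - mean_all) ** 2 for v in y)
    groups = defaultdict(list)
    for v, h in zip(y, strata):
        groups[h].append(v)
    ssw = sum(sum((v - mean(vs)) ** 2 for v in vs) for vs in groups.values())
    return 1.0 - ssw / sst


def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: Y is roughly X squared, with three units per X value
x = [-2, -2, -2, -1, -1, -1, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y = [3.9, 4.0, 4.1, 0.9, 1.0, 1.1, -0.1, 0.0, 0.1, 0.9, 1.0, 1.1, 3.9, 4.0, 4.1]

r = pearson_r(x, y)    # linear association
q = q_statistic(y, x)  # each distinct X value is treated as one stratum
print(r, q)
```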

The Geodetector q-statistic helps in understanding spatial confounding, sample bias and overfitting.

(1)    Confounding arises if a global model is applied to an SH population, which can lead to statistical insignificance. The problem can be avoided by first identifying SH (with the Geodetector q-statistic) and then modelling each stratum separately.

(2)    A sample is biased if the population is SH and the sample does not cover all strata. The problem can be addressed by first identifying SH (with the Geodetector q-statistic) and then applying bias-remedy models such as Heckman regression or the B-SHADE method.

(3)    Local models aim to overcome heterogeneity but often suffer from overfitting and from having too many parameters to interpret. These problems can be avoided by modelling within strata, or by stratifying the outputs of a local model and then interpreting the stratified parameters.

Functions of Geodetector:

(1)    The risk detector maps the response variable in strata: Y(X);

(2)    The factor detector uses the q-statistic to measure the degree of SH of a variable Y, and the determinant power of an explanatory variable X on Y;

(3)    The ecological detector identifies whether the impacts of two explanatory variables X1 and X2 differ;

(4)    The interaction detector reveals whether the risk factors X1 and X2 (and further Xs) have an interactive influence on a response variable Y (Fig. 2).

 

Fig. 2. Interaction between explanatory variables X1 and X2 impacting on a response variable Y: q(Y|X1X2).
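A minimal sketch of the interaction idea, assuming the joint stratification is formed by overlaying the two partitions so that every distinct pair (X1, X2) becomes one stratum. The data are made up so that neither factor alone explains Y while the two together do; this is not the code of the Excel or R tools:

```python
from collections import defaultdict


def q_statistic(y, strata):
    """q = 1 - SSW/SST, with strata given as one hashable label per unit."""
    mean_all = sum(y) / len(y)
    sst = sum((v - mean_all) ** 2 for v in y)
    groups = defaultdict(list)
    for v, h in zip(y, strata):
        groups[h].append(v)
    ssw = 0.0
    for values in groups.values():
        m = sum(values) / len(values)
        ssw += sum((v - m) ** 2 for v in values)
    return 1.0 - ssw / sst


# Hypothetical data: Y is high only when X1 and X2 disagree
x1 = ["a", "a", "a", "a", "b", "b", "b", "b"]
x2 = ["c", "c", "d", "d", "c", "c", "d", "d"]
y = [1.0, 2.0, 9.0, 10.0, 9.0, 10.0, 1.0, 2.0]

q1 = q_statistic(y, x1)
q2 = q_statistic(y, x2)
q12 = q_statistic(y, list(zip(x1, x2)))  # overlay: one stratum per (X1, X2) pair
print(q1, q2, q12)
```

Here q12 exceeds q1 + q2, the pattern that Table 1 in Section 3 labels “Enhance, nonlinear”.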

 


 

2. Tutorial

The Geodetector software has been implemented separately in Excel and in R. The tools are free of charge, freely downloadable and easy to use; they were designed to need no GIS plug-in components and to run with “one click”. Users can run the following demo, then simply replace the demo data with their own data, click Run, and obtain results. The rest of this tutorial describes the Excel Geodetector software. R users can download the R Geodetector software from Section 4, “Download of the software, with example datasets”.

As a demo, data on neural-tube birth defects (NTD), Y, and suspected risk factors or their proxies, Xs, in villages are provided. The data include the health effect layer “NTD prevalence” and the environmental factor layers “elevation”, “soil type” and “watershed”; their field names are Y and X1, X2, X3, respectively.

Step 1. Download the software and input your data in Excel

(1)  Download the Excel Geodetector software from Section 4, “Download of the software, with example datasets”: click to download any one of the three examples and unzip the downloaded file. You will find an Excel file, which is the Geodetector software with an example dataset. Double-click the Excel file, and Fig. 3 and Fig. 5 appear. Fig. 3 shows the format of the input data for Geodetector: each row denotes a sample unit (e.g. a village); the 1st column records the response variable Y; the 2nd and following columns hold partitions of Y or factors X, the latter partitioned according to the similarity within strata.

(2)  Input your data into the Excel Geodetector software in the format of Fig. 3. Then go to Step 2.

 


 

Fig. 3. Input data in Excel and the execution interface

(Note: Y is numerical; X MUST be categorical, e.g. land-use types or seasons. If X is numerical it should be transformed into a categorical variable, e.g. GDP per capita stratified into 5 strata.)
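One simple way to turn a numerical X into strata is to cut it at quantiles. This is only one possible choice, assumed here for illustration; the function name stratify_by_quantiles and the GDP values are hypothetical, and prior knowledge or another classification algorithm may give more meaningful strata:

```python
from bisect import bisect_right
from statistics import quantiles


def stratify_by_quantiles(values, n_strata=5):
    """Map each numerical value to a stratum label 1..n_strata using quantile cut points."""
    # quantiles() returns n_strata - 1 cut points in ascending order
    cuts = quantiles(values, n=n_strata, method="inclusive")
    return [bisect_right(cuts, v) + 1 for v in values]


# Hypothetical GDP per capita for ten sample units
gdp_per_capita = [1200, 3400, 560, 8900, 4300, 15000, 2300, 7600, 990, 5100]
x_strata = stratify_by_quantiles(gdp_per_capita, n_strata=5)
print(x_strata)
```

The resulting labels can be entered as an X column in the input sheet of Fig. 3.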

 

(3)  If your data are in a GIS format, as in Fig. 4, please convert the GIS data into Excel data in the format of Fig. 3.

 

 

Fig. 4. Data in GIS format
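The conversion can be sketched as follows, assuming the attribute table of the GIS layer has already been read into a list of records, one per sample unit (how to read it depends on the GIS package and is not shown). The function name, the field names and the values are hypothetical; the output is CSV text, which Excel can open, with Y in the 1st column and the X strata after it:

```python
import csv
import io


def records_to_table(records, y_field, x_fields):
    """Return CSV text with Y in the 1st column and the X strata in the following columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([y_field] + list(x_fields))  # header row
    for rec in records:
        writer.writerow([rec[y_field]] + [rec[x] for x in x_fields])
    return buffer.getvalue()


# Hypothetical attribute records, one per village
records = [
    {"village_id": 1, "Y": 5.8, "X1": 2, "X2": "loam", "X3": 4},
    {"village_id": 2, "Y": 6.1, "X1": 3, "X2": "clay", "X3": 4},
    {"village_id": 3, "Y": 4.9, "X1": 2, "X2": "loam", "X3": 1},
]
table = records_to_table(records, "Y", ["X1", "X2", "X3"])
print(table)
```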

 

Step 2. Run Geodetector software

There is a single operation interface (Fig. 5). The “Read Data” button loads the data: when it is clicked, all variables are listed in the “variables” list box. Next, the disease variable and the partitions of Y or environmental factor variables are moved into their corresponding list boxes, Y and X, on the right of the interface. Finally, Geodetector is executed by clicking the “Run” button.

 


 

Fig. 5. User interface for Geodetector


 

3. Output

Geodetector outputs results from the risk detector, factor detector, ecological detector, and interaction detector in four Excel spreadsheets (Fig. 6).

 


 

Fig. 6. Interface for Geodetector results

 

In the “Risk detector” sheet (Fig. 7), the results for each environmental risk factor are presented in two tables. The first table gives the average disease incidence in each stratum of a risk factor, the name of which is written at the top left of the table. The second table reports whether the difference in average disease incidence between each pair of strata is statistically significant: if it is, the corresponding value is “Y”, otherwise it is “N”.

 


 

Fig. 7. Results of risk detector
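The two tables can be approximated with a short sketch. It assumes that the pairwise comparison is a two-sample t-test in Welch's form, which this page does not specify, and it stops at the t statistic: turning it into “Y” or “N” additionally needs the t distribution at a chosen significance level. All names and data are hypothetical:

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, variance


def group_by_stratum(y, strata):
    """Collect the Y values of each stratum."""
    groups = defaultdict(list)
    for v, h in zip(y, strata):
        groups[h].append(v)
    return groups


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    return (mean(a) - mean(b)) / se2 ** 0.5


# Hypothetical incidence values in three strata of one risk factor
y = [5.1, 4.9, 5.3, 6.8, 7.1, 7.0, 5.0, 5.2, 4.8]
x1 = [1, 1, 1, 2, 2, 2, 3, 3, 3]

groups = group_by_stratum(y, x1)
means = {h: mean(vs) for h, vs in groups.items()}  # first table: stratum averages
t_values = {
    (a, b): welch_t(groups[a], groups[b])          # basis of the second table
    for a, b in combinations(sorted(groups), 2)
}
print(means)
print(t_values)
```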

 

Fig. 8 shows the output format of the q values for each environmental risk factor, as given in the “Factor detector” sheet. The table header gives the names of the environmental risk factors, while the associated q values (q1, q2, …, qn) and their corresponding p values are presented in the row below.

 


 

Fig. 8. Results of factor detector

 

In the “Ecological detector” sheet (Fig. 9), the statistical significance of the difference between each pair of environmental risk factors is presented. If Y(X1) (risk factor named in the row) is significantly bigger than Y(X2) (risk factor named in the column), the associated value is “Y”; otherwise it is “N”.


 

Fig. 9. Results of ecological detector

 

The format of the results for the interaction detector is shown in Fig. 10. The “Interaction relationships” below the table give the interaction relationship for each pair of factors. The relationship is defined on a coordinate axis divided into 5 intervals: “(-∞, min(q(x), q(y)))”, “(min(q(x), q(y)), max(q(x), q(y)))”, “(max(q(x), q(y)), q(x) + q(y))”, “q(x) + q(y)” and “(q(x) + q(y), +∞)”. The interaction relationship is determined by which of the 5 intervals q(x∩y) falls in (see Table 1).

 


 

Fig. 10. Results of interaction detector

 

Tab. 1. Interaction between Explanatory Variables (Xs)

Description                                               Interaction

q(X1∩X2) < Min(q(X1), q(X2))                              Weaken, nonlinear

Min(q(X1), q(X2)) < q(X1∩X2) < Max(q(X1), q(X2))          Weaken, uni-

q(X1∩X2) > Max(q(X1), q(X2))                              Enhance, bi-

q(X1∩X2) = q(X1) + q(X2)                                  Independent

q(X1∩X2) > q(X1) + q(X2)                                  Enhance, nonlinear
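The rules of Table 1 can be written as a small decision function. This is a sketch, not the code of the Excel tool: the function name, the tolerance used to test equality, and the treatment of values that fall exactly on a boundary between intervals are assumptions, since the table lists only strict inequalities:

```python
def classify_interaction(q1, q2, q12, tol=1e-9):
    """Label the interaction of two factors from q(X1), q(X2) and q(X1∩X2)."""
    lo, hi = min(q1, q2), max(q1, q2)
    if q12 < lo:
        return "Weaken, nonlinear"
    if q12 < hi:
        return "Weaken, uni-"
    if abs(q12 - (q1 + q2)) <= tol:  # equality judged within a small tolerance
        return "Independent"
    if q12 > q1 + q2:
        return "Enhance, nonlinear"
    return "Enhance, bi-"            # above the maximum but below the sum


# Hypothetical q values
for q12 in (0.2, 0.4, 0.6, 0.8, 0.9):
    print(q12, classify_interaction(0.3, 0.5, q12))
```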


 

4. Download of the software, with example datasets

The software has been implemented separately in Excel 2007 and in R. It is completely free. Click any one of the following links to download the Geodetector software. The first three are the Geodetector software in Excel: (1) click one and unzip the file, and an Excel file appears; (2) open the Excel file to start Geodetector and try it on the demo data; then (3) input your own data to get your own results.

1: Geodetector Software in Excel, with an example disease dataset

2: Geodetector Software in Excel, with an example toy dataset

3: Geodetector Software in Excel, with an example NDVI dataset

4: Geodetector Software in R

5: Geodetector Software in QGIS (please use Google to access)


 

5.  Citations. Geodetector can be cited as:

[1] Wang JF, Li XH, Christakos G, Liao YL, Zhang T, Gu X & Zheng XY. 2010. Geographical detectors-based health risk assessment and its application in the neural tube defects study of the Heshun region, China. International Journal of Geographical Information Science 24(1): 107-127.

[2] Wang JF, Zhang TL, Fu BJ. 2016. A measure of spatial stratified heterogeneity. Ecological Indicators 67: 250-256.


6.  Bibliography

6.1 Featured articles using Geodetector

2016 Luo W, Jasiewicz J, Stepinski T, et al. 2016. Spatial association between dissection density and environmental factors over the entire conterminous United States. Geophysical Research Letters 43(2): 692-700.

2019 Yin Q, et al. 2019. Mapping the increased minimum mortality temperatures in the context of global climate change. Nature Communications 10: 4640.

2019 Zhang LQ, et al. 2019. Air pollution exposure associates with increased risk of neonatal jaundice. Nature Communications 10: 3741.

2020 Li JM, et al. 2020. Spatiotemporal trends and ecological determinants in maternal mortality ratios in 2,205 Chinese counties, 2010-2013: A Bayesian modelling analysis. PLoS Medicine 17(5): e1003114.

2021 Feng RD, et al. 2021. Urban ecological land and natural-anthropogenic environment interactively drive surface urban heat island: An urban agglomeration-level study in China. Environment International 157: 106857.

2021 Hu MG, et al. 2021. The risk of COVID-19 transmission in train passengers: an epidemiological and modelling study. Clinical Infectious Diseases 72(4): 604-610.

2021 Metaxas D, Qu H, Riedlinger G, Wu P, Huang Q, Yi J, De S. 2021. Deep learning-based nuclei segmentation and classification in histopathology images with application to imaging genomics. Computer Vision for Microscopy Image Analysis 185-201.

2021 Xu B, et al. 2021. Seasonal association between viral causes of hospitalized acute lower respiratory infections and meteorological factors in China: a retrospective study. Lancet Planetary Health 5: e154–63.

2022 Chen J, et al. 2022. Magnitudes and patterns of large-scale permafrost ground deformation revealed by Sentinel-1 InSAR on the central Qinghai-Tibet Plateau. Remote Sensing of Environment 268: 112778.

2022 Guo ZF, Boeing WJ, Xu YY, Borgomeo E, Liu D, Zhu YG. 2022. Data-driven discoveries on widespread contamination of freshwater reservoirs by dominant antibiotic resistance genes. Water Research. https://doi.org/10.1016/j.watres.2022.119466.

2022 Yang JT, et al. 2022. Chain modeling for the biogeochemical nexus of cadmium in soil–rice–human health system. Environment International 167: 107424.

Lecture slides_20221122: Geodetector: Creating Randomness and Working with Heterogeneity in Big Data or Survey Data

 

6.2 Full list of the articles using Geodetector

Tab. 2. Articles using Geodetector [Numbered]

 

 


 

2010

1.      Wang JF, Li XH, Christakos G, Liao YL, Zhang T, Gu X & Zheng XY. 2010. Geographical detectors-based health risk assessment and its application in the neural tube defects study of the Heshun region, China. International Journal of Geographical Information Science 24(1): 107-127.

2.      Liao YL, Wang JF, Wu JL, Driskell L, Wang WY, Zhang T, Gu X, Zheng XY. 2010. Spatial analysis of neural tube defects in a rural coal mining area. International Journal of Environmental Health Research 20(6): 439-450.


 

2011

3.      Hu Y, Wang JF, Li XH, Ren D, Zhu J. 2011. Geographical detector-based risk assessment of the under-five mortality in the 2008 Wenchuan earthquake, China. PLoS ONE 6(6): e21427.

4.      Zou B, Wilson JG, Zhan FB, Zeng YN, Wu KJ. 2011. Spatial-temporal variations in regional ambient sulfur dioxide concentration and source-contribution analysis: A dispersion modeling approach. Atmospheric Environment 45: 4977-4985.


 

2012

5.      Gajos M. 2012. Geoinformation technologies in biomedicine and health care: review of scientific journals. E. Piętka and J. Kawa (Eds.): ITIB 2012, LNCS 7339: 510–524.

6.      Li LF, Wang JF, Wu J. 2012. A spatial model to predict the incidence of neural tube defects. BMC Public Health 12: 951.

7.      Wang JF, Hu Y. 2012. Environmental health risk detection with GeogDetector. Environmental Modelling & Software 33: 114-115.

8.      Liu YS, Yang R. 2012. Spatial characteristics and formation mechanism of the county urbanization in China. Acta Geographica Sinica 67(8): 1011-1020. (In Chinese)


 

2013

9.      Cao F, Ge Y, Wang JF. 2013. Optimal discretization for geographical detectors-based risk assessment. GIScience & Remote Sensing 50(1): 78-92.

10.     Li XW, Xie YF, Wang JF, Christakos G, Si JL, Zhao HN, Ding YQ, Li J. 2013. Influence of planting patterns on Fluoroquinolone residues in the soil of an intensive vegetable cultivation area in north China. Science of the Total Environment 458-460: 63-69.

11.     Lee WC. 2013. Assessing causal mechanistic interactions: a peril ratio index of synergy based on multiplicativity. PLoS ONE 8(6): e67424. doi:10.1371/journal.pone.0067424.

12.     Raghavan RK, Brenner KM, Harrington Jr JA, Higgins JJ, Harkin KR. 2013. Spatial scale effects in environmental risk-factor modelling for diseases. Geospatial Health 7(2): 169-182.

13.     Wang JF, Wang Y, Zhang J, Christakos G, Sun JL, Liu X, Lu L, Fu XQ, Shi YQ, Li XM. 2013. Spatiotemporal transmission and determinants of typhoid and paratyphoid fever in Hongta District, China. PLoS Neglected Tropical Diseases 7(3): e2112.

14.     Wang JF, Xu CD, Tong SL, Chen HY, Yang WZ. 2013. Spatial dynamic patterns of hand-foot-mouth disease in the People’s Republic of China. Geospatial Health 7(2): 381-390.


2014

15.     Bai HX, Ge Y, Wang JF, Li DY, Liao YL, Zheng XY. 2014. A method for extracting rules from spatial data based on rough fuzzy sets. Knowledge-Based Systems 57: 28-40.

16.     Hu Y, Gao J, Chi M, Luo C, Lynn H, Sun LQ, Tao B, Wang DC, Zhang ZJ, Jiang QW. 2014. Spatio-temporal patterns of schistosomiasis Japonica in lake and marshland areas in China: the effect of snail habitats. American Journal of Tropical Medicine and Hygiene 91(3): 547–554.

17.     Hu Z, Tang GA, Lu GN. 2014. A new geographical language: a perspective of GIS. Journal of Geographical Sciences 24(3): 560-576.

18.     Huang JX, Wang JF, Bo YC, Xu CD, Hu MG. 2014. Identification of health risks of Hand, Foot and Mouth Disease in China using the Geographical Detector Technique. International Journal of Environmental Research and Public Health 11: 3407-3423.

19.     Luo W. 2014. Impact cratering as a major factor controlling valley dissection density on Mars - a geographical detector approach. 45th Lunar and Planetary Science Conference. 2580.pdf.

20.     Qian Q, Zhao J, Fang LQ, Zhou H, Zhang WJ, Wei L, Yang H, Yin WW, Cao WC, Li Q. 2014. Mapping risk of plague in Qinghai-Tibetan Plateau, China. BMC Infectious Diseases 14: 382.

21.     Ren Y, Deng LY, Zuo SD, et al. 2014. Geographical modeling of spatial interaction between human activity and forest connectivity in an urban landscape of southeast China. Landscape Ecology 29(10): 1741-1758.

22.     Wu JL, Zhang CS, Pei LJ, Chen G, Zheng XY. 2014. Association between risk of birth defects occurring level and arsenic concentrations in soils of Lvliang, Shanxi province of China. Environmental Pollution 191: 1-7.

23.     Xu EQ, Zhang HQ. 2014. Characterization and interaction of driving factors in karst rocky desertification: a case study from Changshun, China. Solid Earth 5: 1329-1340.

 

24.     Cai FF, Pu LJ. 2014. Spatial-temporal characteristics and formation mechanism of urban-rural construction land in Nantong City. Resources Science 36(4): 731-740. (In Chinese)

25.     Ding Y, Cai JM, Ren ZP, Yang ZS. 2014. Spatial disparities of economic growth rate of China’s National-level ETDZs and their determinants based on geographical detector analysis. Progress in Geography 33(5): 657-666. (In Chinese)

26.     Hu D, Shu XB, Yao B, Cao AQ. 2014. The evolvement of spatial-temporal pattern of per capita grain possession in counties of Jiangxi Province. Areal Research and Development 33(4): 157-162. (In Chinese)

27.     Li CY, Wang T, Zhou Y. 2014. The evolvement of Spatial-Temporal and determinants of regional economic patterns in Hubei Province. Development Research 2014(1): 47-51. (In Chinese)

28.     Ni SH. 2014. Spatial statistics and its application to the field of public health. Journal of Shantou University (Natural Science) 29(4): 61-67. (In Chinese)

29.     Tong LG, Xu XL, Fu Y, Wei FH. 2014. Impact of environmental factors on snail distribution using geographical detector model. Progress in Geography 33(5): 625-635. (In Chinese)

30.     Wei FJ, Li JF, Liu YZ. 2014. Spatial-temporal characteristics and impact factors of newly increased farmland by land consolidation in Hubei province at county level. Transactions of the Chinese Society of Agricultural Engineering 30(14): 267-276. (In Chinese)

31.     Yang B, Shi PJ. 2014. Geographical features and formation mechanism of county level urbanization in Gansu Province. Arid Land Geography 37(4): 838-845. (In Chinese)

32.     Yu JG, Ye SK. 2014. Outward foreign direct investment and industrial structure upgrade level from the perspective of spatial in China. Journal of Commercial Economics 34: 127-128. (In Chinese)


 

 

2015

33.     Chen YH, Ge Y, Heuvelink GBM, Hu JL, Jiang Y. 2015. Hybrid constraints of pure and mixed pixels for soft-then-hard super-resolution mapping with multiple shifted images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8(5): 2040-2052.

34.     Hu Y, Bergquist R, Lynn H, Gao FH, Wang QZ, Zhang SQ, Li R, Sun LQ, Xia CC, Xiong CL, Zhang ZJ, Jiang QW. 2015. Sandwich mapping of schistosomiasis risk in Anhui Province, China. Geospatial Health 10: 324.

35.     Hu Y, Li R, Bergquist R, Lynn H, Gao FH, Wang QZ, Zhang AQ, Sun LQ, Zhang ZJ, Jiang QW. 2015. Spatio-temporal transmission and environmental determinants of schistosomiasis Japonica in Anhui Province, China. PLoS Neglected Tropical Diseases 9(2): e0003470. doi:10.1371/journal.pntd.0003470.

36.     Lee WC. 2015. Testing for sufficient-cause gene-environment interactions under the assumptions of independence and Hardy-Weinberg equilibrium. American Journal of Epidemiology 182(1): 9–16.

37.     Shen J, Zhang N, Gexi geduren, He B, Liu CY, Li Y, Zhang HY, Chen XY, Lin H. 2015. Construction of a GeogDetector-based model system to indicate the potential occurrence of grasshoppers in Inner Mongolia steppe habitats. Bulletin of Entomological Research 105: 335-346.

38.     Yang R, Liu YS, Long HL, Qiao LY. 2015. Spatio-temporal characteristics of rural settlements and land use in the Bohai Rim of China. Journal of Geographical Sciences 25(5): 559-572.

39.     Zhu H, Liu JM, Chen C, Lin J, Tao H. 2015. A spatial-temporal analysis of urban recreational business districts: A case study in Beijing, China. Journal of Geographical Sciences 25(12): 1521-1536.

40.     Bi SB, Ji H, Chen CC, Yang HR, Shen X. 2015. Application of geographical detector in human-environment relationship study of prehistoric settlements. Progress in Geography 34(1): 118-127. (In Chinese)

41.     崔日明,  俞佳根. 2015.