Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study

Hamed Mehdipoor; Raul Zurita-Milla; Alyssa Rosemartin; Katharine L. Gerst; Jake F. Weltzin

doi:10.1371/journal.pone.0140811

Abstract

Recent improvements in online information communication and mobile location-aware technologies have led to the production of large volumes of volunteered geographic information. Widespread, large-scale efforts by volunteers to collect data can inform and drive scientific advances in diverse fields, including ecology and climatology. Traditional workflows to check the quality of such volunteered information can be costly and time consuming as they heavily rely on human interventions. However, identifying factors that can influence data quality, such as inconsistency, is crucial when these data are used in modeling and decision-making frameworks. Recently developed workflows use simple statistical approaches that assume that the majority of the information is consistent. However, this assumption is not generalizable, and ignores underlying geographic and environmental contextual variability that may explain apparent inconsistencies. Here we describe an automated workflow to check inconsistency based on the availability of contextual environmental information for sampling locations. The workflow consists of three steps: (1) dimensionality reduction to facilitate further analysis and interpretation of results, (2) model-based clustering to group observations according to their contextual conditions, and (3) identification of inconsistent observations within each cluster. The workflow was applied to volunteered observations of flowering in common and cloned lilac plants (Syringa vulgaris and Syringa x chinensis) in the United States for the period 1980 to 2013. About 97% of the observations for both common and cloned lilacs were flagged as consistent, indicating that volunteers provided reliable information for this case study. 
Relative to the original dataset, the exclusion of inconsistent observations changed the apparent rate of change in lilac bloom dates by two days per decade, indicating the importance of inconsistency checking as a key step in data quality assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.

Citation: Mehdipoor H, Zurita-Milla R, Rosemartin A, Gerst KL, Weltzin JF (2015) Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study. PLoS ONE 10(10): e0140811. https://doi.org/10.1371/journal.pone.0140811

Editor: Yanguang Chen, Peking University, CHINA

Received: April 23, 2015; Accepted: September 29, 2015; Published: October 20, 2015

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: All data are available from: 10.5061/dryad.0262m.

Funding: This research was supported in part by a Google Faculty Research Award to RZM and by Cooperative Agreements G09AC00310 and G14AC00405 from the United States Geological Survey to the University of Arizona. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The contribution of volunteers to the production of information about geographic phenomena, such as the impacts of climate change, is not new. For example, the Christmas Bird Count has studied the impacts of climate change on spatial distribution and population trends of selected bird species in North America since 1900 [1]. However, improvements in online information communication and mobile location-aware technologies have led to a dramatic increase in the amount of volunteered geographic information (VGI) in recent years [2–5]. VGI, a term coined by Goodchild [2], refers to "the harnessing of tools to create, assemble, and disseminate geographic data provided voluntarily by individuals". VGI is a practical approach to acquire timely and detailed geographic information at low cost across a variety of spatial and temporal scales [6]. Because of this, VGI is used to understand and manage important emerging problems in many fields such as conservation biology [7], urban planning [8], disaster management [9] and earth observation [10–12].

Despite the wide applicability and acceptability of VGI in science [4, 13], many studies argue that the quality of the observations provided by volunteers remains a concern [6, 14–21]. This is because VGI does not often follow scientific principles of sampling design, and levels of expertise vary among volunteers [22, 23]. Moreover, unlike traditional authoritative geographic information, VGI typically lacks automated quality checking mechanisms [24–28]. Among the different data quality aspects, consistency of VGI is considered key for most studies, where inconsistent VGI are observations that are implausible regarding the conditions, geographic location or time they were obtained. Such inconsistent observations can bias analysis and modeling results because they are not representative for the variable studied, or because they decrease the ratio of signal to noise. Hence, the identification of inconsistent observations would clearly benefit VGI-based applications and provide more robust datasets to the scientific community.

The approaches to check VGI quality can be categorized into three main types [6, 20]: 1) crowdsourcing, where volunteers validate and thus refine the quality of observations by themselves; 2) social, which relies on a hierarchy of trusted people who act as moderators; and 3) geographic, where, given the location of the volunteered observations, one can use certain geographic rules to assess quality, e.g., Tobler's “first law of geography”, which states that “everything is related to everything else, but near things are more related than distant things” (Tobler, 1970). The geographic approach is more readily machine-automated than the other two approaches (which rely on human subjectivity) [6], and is therefore the focus of this study.

As an example, eBird, a popular VGI-based initiative for bird monitoring, uses the geographic approach to automatically verify new observations, using historical observations, prior to human moderation [29]. The eBird quality filter relies on substantial prior knowledge about a given organism, geography or time (e.g., a measure of how frequently a species is reported in a region during a specific time period), as well as information about volunteer expertise levels [25]. Such information is not always available for VGI-based initiatives.

Schlieder and Yanenko [30] used spatiotemporal proximity and social distance (i.e., the distance between the observers in the social network of observers on the web) to define constraints for checking the inconsistency of observations. The hypothesis was that spatiotemporally and socially close observations presumably referred to the same event and would therefore more likely be consistent. Their workflow was used to formulate general rules and to find observations that have low confirmation. This workflow was further developed using a constraint satisfaction approach to produce more sophisticated results [31]. However, the improved workflow still uses spatial distance as the only criterion to connect observations. Moreover, this workflow is useful only when the sequential order of volunteered observations is available at a given location.

Yet another geographic workflow was proposed by Ali and Schmid [32] based on machine learning for identifying wrongly-categorized Open Street Map observations. These authors trained a classifier using contributed entities and their associated class labels (e.g., park or garden). However, their model was only concerned with the inconsistency of areal entities (i.e., extended geometric entities such as buildings) regarding administrative boundaries and semantic classifications.

There is a lack of standardized workflows that address VGI inconsistency. Current inconsistency workflows primarily rely on human review, or simple statistical deviation from an expected probability distribution. Human-dependent workflows can be costly and time-consuming, and are impracticable in some situations, e.g., in cases where events persist only for short periods of time. The statistical workflows assume that the majority of the observations are consistent and, therefore, that these can be used to check for inconsistency. Moreover, existing workflows do not optimally use environmental contextual data. This raises the question of how to address inconsistency using a more objective, efficient and automated workflow.

This paper describes a novel automated workflow to identify inconsistency in VGI. A robust identification of inconsistent observations allows testing their potential impact on VGI-based studies. The workflow relies on the availability of contextual information and is built using a combination of dimensionality reduction, clustering and outlier detection techniques. It is illustrated using observations on the timing of the first flower of lilac plants collected by volunteers. While some inconsistent observations may reflect real, unusual events, here we demonstrate that these observations bias the trends (advancement rates) of the date of lilac flowering onset. This shows that identifying inconsistent observations is a pre-requisite to study and interpret the impact of climate change on the timing of life cycle events [33, 34].

Materials and Methods

Phenological VGI

Phenology is the study of periodic plant and animal life cycle events and how seasonal and inter-annual variations in climate affect them. Phenological studies are important to understand the impact of global change on our planet [35–38]. Worldwide, several VGI-based initiatives collect or have collected phenological data [39, 40]. One VGI-based initiative, the USA National Phenology Network (USA-NPN; www.usanpn.org), has recently released a curated dataset of lilac leafing and flowering observations across the continental United States for the period 1956 to 2014 [41]. From this dataset we extracted flowering records for common lilac (Syringa vulgaris) and cloned lilac (S. x chinensis ‘Red Rothomagensis’). Considering data completeness and the availability of environmental contextual data, we concentrated our analyses on flowering onset dates for the period 1980 to 2013, for cloned lilacs (with 2174 observations) and common lilacs (with 2682 observations) separately.

Widespread and readily observable, lilac plants have been observed across the continental United States since the 1950s, as a complement to cooperative weather data collection [42]. Observations of lilac leafing, flowering and fruiting have been used for a variety of applications, including understanding trends and variations in the onset of spring and tracking the impacts of climate change on natural resources [43]. Although lilacs are ornamental plants, their phenology and response to climate have been shown to closely track native species and crops [33].

The following attributes were used to check inconsistency for cloned and common lilac flowering dates: (1) a unique ID for each record, (2) the year when the flowering occurred, (3) the day of the year (DOY) when the flowering occurred and (4) the geographic location where the phenological phase was reported (latitude, longitude and elevation). It is important to note that since 2009, volunteers report the status of each phenological phase with “Yes” when it is visible and “No” when it is not visible [44]. This status monitoring approach allows for the quantification of uncertainty in flowering onset DOYs (i.e., the number of days between the “Yes” and the preceding “No”). In addition, status monitoring provides information on the occurrence of multiple flowering events in a year for individual plants. When a “Yes” report was followed by at least one “No” report and then a subsequent “Yes” record was present on an individual plant, all DOYs corresponding to “Yes” reports were flagged and stored as multiple “Yes” observations in the dataset.
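The status-monitoring logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `onset_summary`, the tuple layout of the reports and the example values are hypothetical and are not part of the USA-NPN dataset or tools. As a simplification, the sketch records only the first “Yes” of each flowering episode rather than every “Yes” report.

```python
def onset_summary(reports):
    """Summarize one plant-year of status reports.

    reports: list of (doy, status) tuples sorted by day of year,
    where status is "Yes" (flowers visible) or "No" (not visible).
    """
    onsets = []          # DOY of each "Yes" that starts a flowering episode
    uncertainties = []   # days since the preceding "No", or None if there is none
    prev_doy, prev_status = None, None
    for doy, status in reports:
        if status == "Yes" and prev_status != "Yes":
            onsets.append(doy)
            uncertainties.append(doy - prev_doy if prev_status == "No" else None)
        prev_doy, prev_status = doy, status
    return {
        "onsets": onsets,
        "uncertainty_days": uncertainties,
        "multiple_flowering": len(onsets) > 1,
    }


# Hypothetical reports for one shrub in one year.
reports = [(90, "No"), (97, "Yes"), (104, "Yes"), (120, "No"), (280, "Yes")]
summary = onset_summary(reports)
```

In this hypothetical example the first onset has a seven-day uncertainty, and the second “Yes” episode after an intervening “No” marks the plant-year as a case of multiple flowering.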

Environmental contextual data

The proposed workflow requires environmental contextual data to characterize observation locations. In phenology, cumulative climatic parameters are the most relevant contextual datasets, because most phenological processes are driven by climate conditions [37, 45, 46]. Therefore, we extracted climate parameters for the period 1980 to 2013 from DAYMET, a dataset that provides 1 km by 1 km gridded estimates of daily climatic parameters for North America [47].

Cumulative climatic variables were created for each geographic location by summing parameter values from 1 January of the year of the observation to the reported DOY of flowering. Cumulative variables calculated include: maximum daily temperature (degrees C), minimum daily temperature (degrees C), daily precipitation (mm/day), daily water vapor pressure (Pa), daily solar radiation (W/m²), daily day length (s/day) and daily snow water equivalent (kg/m²). In addition, using the daily maximum and minimum temperatures, we calculated daily average temperatures and cumulative average daily temperature (degrees C). Thus, a total of 11 contextual variables (i.e., 8 cumulative climatic variables and the 3 geographic variables of latitude, longitude and elevation) were associated with each phenological observation expressed as DOY (Table 1).
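As a minimal sketch of this accumulation step, the following Python fragment sums a daily series up to a reported DOY. Two points are assumptions rather than details confirmed by the text: the sum is taken to include the reported DOY itself, and the daily average temperature is taken as the midpoint of the daily maximum and minimum. The function names and the five-day series are hypothetical and are not DAYMET values.

```python
def cumulative_to_doy(daily_values, doy):
    """Sum daily values from 1 January (DOY 1) through the given DOY, inclusive."""
    if not 1 <= doy <= len(daily_values):
        raise ValueError("doy must lie within the supplied daily series")
    return sum(daily_values[:doy])


def daily_mean_temperature(tmax, tmin):
    """Daily average temperature, assumed here to be the midpoint of max and min."""
    return [(hi + lo) / 2 for hi, lo in zip(tmax, tmin)]


# Hypothetical five-day series (degrees C), not DAYMET values.
tmax = [4.0, 6.0, 8.0, 10.0, 12.0]
tmin = [0.0, 2.0, 2.0, 4.0, 6.0]
tavg = daily_mean_temperature(tmax, tmin)
cum_tavg = cumulative_to_doy(tavg, 3)  # accumulated over DOY 1 to 3
```

The same helper would be applied to each of the daily parameters in turn, which yields one cumulative value per parameter for every observation.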

Table 1. Mean and standard deviation of the geographic and climatic parameters for cloned and common lilacs.

https://doi.org/10.1371/journal.pone.0140811.t001

The context-aware workflow

The proposed context-aware inconsistency check workflow builds upon elements from existing workflows. More precisely, it relies on the wide availability of contextual (environmental and geographic) information, enabling us to characterize complex differences between observation locations in space and time. When this characterization results in a high-dimensional dataset, the data are mapped to a low-dimensional space to facilitate the subsequent analysis of the data and the visualization of the results. Next, observations are clustered into contextually homogenous subsets. Finally, inconsistent observations are identified by analyzing the outliers present in each cluster.

Dimensionality reduction.

The t-distributed stochastic neighbor embedding (t-SNE) algorithm [48] was selected to reduce the dimensionality of the contextual information. This algorithm maps the data to a low-dimensional space, typically two or three dimensions, so that data visualization is possible. It retains the local structure of the data, which means that similar objects are mapped to nearby points in the low-dimensional space. Moreover, the model-based clustering step of the workflow has limited ability to deal with high-dimensional data, which further justifies the use of the t-SNE algorithm.

The t-SNE defines a probability distribution over pairs of data points in the high-dimensional space so that similar ones have a high probability of being selected. Next, the t-SNE defines a similar distribution over the data points in the low-dimensional space in such a way that it minimizes the information lost when this distribution is used to approximate the distribution in the high-dimensional space. In particular, t-SNE uses the Kullback–Leibler divergence [49], which quantifies the difference between the two probability distributions (in this case, those of the original and of the low-dimensional data points).
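For reference, the objective described above is usually written as follows in the t-SNE literature. The notation is an assumption introduced here for illustration and does not appear in the text: p_ij denotes the pairwise similarity of points i and j in the high-dimensional space, q_ij the corresponding similarity in the low-dimensional space, and y_i the low-dimensional coordinates of point i.

```latex
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}}
```

The heavy-tailed Student t kernel in q_ij is what gives the method its name. The cost C is zero only when the two distributions coincide.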

The t-SNE algorithm requires the definition of the perplexity value, which is a smooth measure of the effective number of neighbors used to define the probability distribution in the high- and low-dimensional spaces. Typical perplexity values lie in a limited interval (between 5 and 50), so optimizing this value is relatively easy. We used the “t-SNE” R package to perform all calculations in this study [50].
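The reading of perplexity as an effective number of neighbors can be made concrete with a short Python sketch. It uses the standard definition of perplexity as 2 raised to the Shannon entropy (in bits) of a distribution. The function name and the two example distributions are hypothetical.

```python
import math


def perplexity(probabilities):
    """Perplexity 2**H of a discrete distribution, with H the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


# A point whose similarity is spread evenly over 8 neighbors.
uniform_over_8 = [1 / 8] * 8
# A point whose similarity is concentrated on a single neighbor.
peaked = [0.97, 0.01, 0.01, 0.01]
```

A distribution spread evenly over eight neighbors has a perplexity of eight, whereas a distribution concentrated on one neighbor has a perplexity close to one. Setting the perplexity therefore amounts to choosing how many neighbors effectively shape each point's local similarity distribution.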

Model-based clustering.

Model-based clustering [51, 52] was selected to cluster the contextual information because it automatically identifies the number, shape and size of the clusters present in a dataset. This increases the objectivity of the analysis by reducing the need for human intervention and facilitates its use for multiple applications. The automated identification of cluster characteristics is realized by sequentially fitting several mixture models [53] to the dataset and selecting the one that maximizes the Bayesian Information Criterion (BIC) [54]. We calculated the BIC values for ten Gaussian mixture models currently available in the R package, “mclust” [55].
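The selection rule can be illustrated with a minimal Python sketch. It assumes the larger-is-better form of the criterion, BIC = 2 log L - k ln(n), which matches the statement above that the BIC is maximized. The candidate names, log-likelihoods and parameter counts are hypothetical and are not results from this study.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """BIC in the larger-is-better form: 2 * logL - k * ln(n)."""
    return 2 * log_likelihood - n_parameters * math.log(n_observations)


def select_model(candidates, n_observations):
    """Return the name of the candidate with the highest BIC, plus all scores."""
    scores = {
        name: bic(loglik, n_par, n_observations)
        for name, (loglik, n_par) in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical fits: name -> (maximized log-likelihood, number of free parameters).
candidates = {
    "simple": (-1250.0, 5),
    "medium": (-1180.0, 11),
    "complex": (-1175.0, 29),
}
best, scores = select_model(candidates, n_observations=500)
```

In this hypothetical example the most complex model has the highest likelihood but is penalized for its extra parameters, so the intermediate model is selected. Some software reports the criterion with the opposite sign, in which case the smallest value is preferred.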

The uncertainty of the clustering was calculated (by subtracting the probability of the most likely group for each data point from one) and analyzed to determine its impact on the identification of inconsistent observations. Data points with an uncertainty value of more than 0.5 were ignored as they could be either an inconsistent or a mis-clustered observation.
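A minimal Python sketch of this rule is given below. It assumes that the clustering step returns, for every data point, one membership probability per cluster. The function names and the probabilities are hypothetical.

```python
def clustering_uncertainty(membership_probabilities):
    """Uncertainty of one point: 1 minus its highest membership probability."""
    return 1.0 - max(membership_probabilities)


def split_by_uncertainty(posteriors, threshold=0.5):
    """Return (kept, ignored) index lists; points above the threshold are ignored."""
    kept, ignored = [], []
    for index, probs in enumerate(posteriors):
        target = ignored if clustering_uncertainty(probs) > threshold else kept
        target.append(index)
    return kept, ignored


# Hypothetical membership probabilities for four points and three clusters.
posteriors = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.10, 0.85, 0.05],
    [0.34, 0.33, 0.33],
]
kept, ignored = split_by_uncertainty(posteriors)
```

Here the second and fourth points have no clearly dominant cluster, so they would be set aside before the outlier detection step.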

The model-based clustering method implemented in “mclust” uses the Expectation Maximization (EM) algorithm [56]. The EM, an iterative method, is used to find maximum likelihood parameters of a mixture model, specifying the mixture component to which each data point belongs. This algorithm is relatively robust but its efficiency is negatively affected by the dimensionality of the input data because the number of parameters that need to be estimated is proportional to the dimensionality of the data [55].
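To make the alternation between the two EM steps concrete, the following Python sketch fits a two-component Gaussian mixture to one-dimensional data. It is a deliberately simplified illustration and not the “mclust” implementation, which fits multivariate mixtures with constrained covariance structures. The initialization, the variance floor `min_var` and the data are assumptions chosen for the example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, n_iter=50, min_var=1e-3):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    means = [min(data), max(data)]  # crude initialization at the extremes
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, min_var)  # floor avoids a collapsing component
    return weights, means, variances, resp


# Hypothetical data with two well-separated groups.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
weights, means, variances, resp = em_two_gaussians(data)
```

The posterior probabilities in `resp` are the membership probabilities from which the clustering uncertainty is derived. In more dimensions each component also carries a covariance matrix, which is why the number of parameters grows with the dimensionality of the input.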

Intra-cluster outlier detection.

The identification of inconsistent observations requires defining objective and easily automatable rules. Here we used the Tukey boxplot as the main tool to highlight inconsistent observations [57]. The boxplot is a hybrid non-parametric method that displays variation and outliers in numerical data by visually indicating the degree of dispersion and skewness (Fig 1). The bottom and top of the box represent the first (Q₁) and third (Q₃) quartiles of the data respectively, and the band inside the box represents the second quartile (the median).

Fig 1. The Tukey boxplot.

https://doi.org/10.1371/journal.pone.0140811.g001

In the Tukey boxplot, each whisker extends up to 1.5 times the interquartile range (i.e., 1.5 x IQR) beyond the first and third quartiles. If the numerical data are normally distributed, about 0.7% of the points fall outside the whiskers, and these points are typically considered outliers [57]. In this study, these outliers are highlighted as inconsistent observations. The outlier detection was done using the built-in boxplot function of the R software package to create an automated and clean workflow that can be re-used for multiple applications.
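The intra-cluster rule can be sketched in Python as follows. This is an illustration rather than a port of the R boxplot function: quartiles are computed here with `statistics.quantiles` and its "inclusive" method, which can differ slightly from the hinges used by R for small samples. The function names and the DOY values are hypothetical.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Lower and upper Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_inconsistent(doys_by_cluster, k=1.5):
    """Map each cluster label to the DOYs falling outside that cluster's fences."""
    flagged = {}
    for cluster, doys in doys_by_cluster.items():
        low, high = tukey_fences(doys, k)
        flagged[cluster] = [d for d in doys if d < low or d > high]
    return flagged


# Hypothetical flowering onset DOYs for two clusters.
doys_by_cluster = {
    1: [100, 102, 104, 105, 106, 108, 110, 160],
    2: [120, 121, 123, 124, 126, 127, 129, 130],
}
flagged = flag_inconsistent(doys_by_cluster)
```

Because the fences are computed separately for each cluster, a DOY of 160 is flagged in the first hypothetical cluster, although the same value could be unremarkable in a cluster with a later or more variable flowering season.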

Impact of inconsistent observations.

To investigate the impact of the inclusion of inconsistent observations in an analysis of phenological patterns, we used linear regression to model the trend in the flowering onset DOY–with and without inconsistent observations–over the complete study period. Regression models were developed for pooled observations of cloned and common lilacs, and separately for each type of lilac. Finally, we used analysis of covariance [58] to test the effect of the inconsistency of observations (i.e., consistent and inconsistent) on flowering onset DOY while controlling for the effect of the year of observations. This analysis is used to statistically test for differences in slopes among regression models. The regression modeling and the covariance analysis were done using built-in functions of the R software package.
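The comparison of trends with and without flagged observations can be sketched as follows. The example covers only the slope comparison, not the analysis of covariance. The ordinary least-squares helper and the records are hypothetical and do not reproduce the R functions or the data used in this study.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical records: (year, flowering onset DOY, flagged as inconsistent).
records = [
    (1980, 130, False), (1990, 128, False), (2000, 126, False),
    (2010, 124, False), (2010, 170, True),
]
all_slope = ols_slope([r[0] for r in records], [r[1] for r in records])
clean = [r for r in records if not r[2]]
clean_slope = ols_slope([r[0] for r in clean], [r[1] for r in clean])
```

In this constructed example a single unusually late record is enough to turn a steady advance of 0.2 days per year into an apparent delay, which is the kind of sensitivity the regression comparison is designed to expose.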

Results and Discussion

The eleven-dimensional data space that characterizes the phenological observation was transformed to a two-dimensional space (Fig 2) while testing several perplexity values (5 to 50 in steps of 5 units). The optimal perplexity value was chosen as the one that maximizes clustering (i.e. the one that better “spreads” and “separates” the observations into distinct groups). For both datasets, the perplexity value equaled 35, which led to the maximum number of clusters that the EM algorithm could identify.

Fig 2. The results of applying t-SNE on contextual information.

The transformed contextual information for (A) cloned lilac and (B) common lilac.

https://doi.org/10.1371/journal.pone.0140811.g002

A visual inspection of the transformed data space in Fig 2 shows that the environmental conditions of the observation sites for cloned lilac are similar to each other, as the majority of points formed a cloud shape. It also shows that the observation sites for the common lilac are more clustered, indicating that these observations are made in more contrasting environments [59] relative to the cloned lilacs [60]. This is consistent with the fact that cloned lilacs were only observed in the Eastern U.S. [57], which is characterized by less environmental variability than the Western U.S. (Table 1).

As expected from the t-SNE results, the number of clusters for the common lilac (47 clusters) is larger than for the cloned lilac (12 clusters). These results (Fig 3) demonstrate that a diagonal Gaussian mixture distribution (with equal shape, variable volume and coordinate axes orientation) best fits the contextual information for both cloned and common lilacs (Table 2).

Fig 3. The results and uncertainty of model-based clustering.

Clusters of the transformed contextual information about (A) cloned lilac and (B) common lilac. The uncertainty in clustering of transformed contextual information about (C) cloned lilac and (D) common lilac. In the uncertainty plots, the symbols have the following meaning: large filled symbols, 95% quantile of uncertainty; smaller open symbols, 75–95% quantile; small dots, first three quartiles of uncertainty.

https://doi.org/10.1371/journal.pone.0140811.g003

Table 2. The fitted mixture models currently in the “mclust” package and their corresponding BIC values.

https://doi.org/10.1371/journal.pone.0140811.t002

The phenological observations belonging to each cluster were projected into the geographic space to study their geographic distribution (Figs 4 and 5). For both types of lilac, the observation sites that belong to the same cluster are often spatially clustered (i.e., clusters tend to be compact). Nevertheless, there are some sparse clusters (e.g., clusters 7 and 10 of cloned lilac and clusters 29, 31, 32, 36 and 40 of common lilac) that indicate geographically distant observation sites with similar climatic context.

Fig 4. The geographic distribution of the clusters in context condition of cloned lilac.

https://doi.org/10.1371/journal.pone.0140811.g004

Fig 5. The geographic distribution of the clusters in context condition of common lilac.

https://doi.org/10.1371/journal.pone.0140811.g005

The variability across the interquartile ranges and median values of the clusters for common lilacs is greater than for cloned lilac (Fig 6). The greater variability in observations on common lilac reported from the Western U.S. was expected based on the clusters described above, and has been noted in other studies [22, 61]. The outliers identified by the boxplots were highlighted as inconsistent phenological observations in this study.

Fig 6. Intra-cluster boxplot of DOYs that lilac started flowering.

Boxplots of corresponding DOYs in clusters of transformed contextual information for (A) cloned lilac and (B) common lilac. Hollow circles represent intra-cluster outliers.

https://doi.org/10.1371/journal.pone.0140811.g006

Inconsistent observations were found in both pre- and post-2009 phenological observations (Fig 7). For both types of lilacs, the highlighted inconsistencies accounted for about 3% of phenological observations (3.1% and 2.9% of phenological observations on cloned and common lilac respectively). Of the inconsistent observations on cloned lilacs, 53% have greater than one week uncertainty (>7 days between the prior “No” and the first “Yes” observation), whereas less than 15% of the inconsistent observations on common lilac have greater than one week uncertainty in the estimated onset DOYs. Moreover, 41% of the inconsistent observations of cloned lilac and 50% of those of common lilac are associated with sites that report multiple flowering in a year (post 2009, when reports of repeat flowering were allowed, e.g., to account for flowering activity after frosts).

Fig 7. Plot of inconsistent phenological observations through study area.

Inconsistent volunteered observations on flowering onset DOY of (A) cloned lilac and (B) common lilac. Red points show unusually early phenological observations, while blue ones show unusually late observations. Circles show phenological observations from historical initiatives, whereas stars show phenological observations from contemporary initiatives. Inconsistencies were labeled with the day of year that lilac started flowering.

https://doi.org/10.1371/journal.pone.0140811.g007

The unusually late “Yes” observations are not necessarily a result of erroneous data collection, because lilacs can also flower in the autumn (which may be associated with different environmental factors). In addition, unusually early “Yes” reports followed by a second consistent “Yes” spring record might point to a mild winter in which lilacs start flowering early, experience frost, and then set flower again. For example, in 2012 in Charlottesville, Virginia, first flowering of a cloned lilac shrub was reported in February (i.e., early relative to other observations at the site). The flowering of the shrub was also reported later, on April 7th, which is more consistent, as determined by the workflow.

For cloned lilacs, the rate of change in flowering onset DOY (i.e., the slope of the regressions) significantly (P < 0.001) changed from -0.19 to -0.37 when inconsistent observations were excluded. In other words, using the cleaned dataset for the trend analysis resulted in two days of additional advancement per decade in flowering onset of cloned lilac compared to the raw dataset. Likewise, for common lilacs, excluding inconsistent observations affected the regression slope, but to a lesser degree (from 0.12 to 0.9; P = 0.06) than in the cloned lilacs. For the pooled observations, the slope changed from -0.02 to -0.12 (P < 0.001) when the inconsistent observations were removed, resulting in one additional day of advancement per decade in flowering onset across the U.S. Thus, the inclusion of inconsistent observations underestimates the rate of acceleration of the lilac onset dates over the period 1980–2013 (Fig 8). These results are in agreement with previous studies that found a gradual advance in the flowering onset DOYs [22, 34].
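The link between the reported slopes and the per-decade figures can be checked with a short calculation. It assumes that the slopes are expressed in days per year, which is consistent with the per-decade statements above but is not stated explicitly. The function name is hypothetical.

```python
def extra_advance_per_decade(slope_raw, slope_clean):
    """Change in trend (days per decade) after removing inconsistent observations."""
    return (slope_clean - slope_raw) * 10


# Slopes as reported in the text, assumed to be in days per year.
cloned = extra_advance_per_decade(-0.19, -0.37)
pooled = extra_advance_per_decade(-0.02, -0.12)
```

Under that assumption the cloned lilac trend changes by about -1.8 days per decade, which rounds to the two days quoted, and the pooled trend changes by about -1.0 day per decade.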

Fig 8. Comparison of the linear modeling of the original phenological observations and the consistent phenological observations.

Temporal trends in the flowering onset DOY of (A) cloned lilac, (B) common lilac, and (C) pooled observations of cloned and common lilac.

https://doi.org/10.1371/journal.pone.0140811.g008

Conclusions

The identification of inconsistent observations is a pre-requisite for any kind of analysis or modeling effort. In this paper, using a phenology case study, we present and demonstrate a computational workflow that has potential to automate the identification of inconsistencies in data collected by VGI-based initiatives. The workflow relies on environmental data as critical context that affects the variability in the observational datasets, and consists of a sequence of dimensionality reduction, model-based clustering and outlier detection.

The workflow demonstrated that we can highlight unusually early or late observations of the flowering onset DOYs for lilacs. The identified inconsistencies should be further analyzed using more granular climate data or expert knowledge to determine whether they are likely observation or transcription errors or represent truly anomalous events due to microclimate or, in the case of common lilacs, genetic variation. The overall low inconsistency rate (about 3%) indicates that volunteer-collected observations are a valuable source of information for the study of phenology.

Phenological VGI has greatly contributed to our understanding of seasonal spatial and temporal patterns for plants and animals across the globe. Given that phenology has been recognized as an important indicator of climate change and has emerged as a vibrant area of research at multiple ecological scales, analyses that increase data quality and usability will greatly benefit the fields of climate research, ecology, and natural resource management. We envision that this workflow will greatly increase the reliability of, and potential for scientific contribution from, spatially and temporally rich VGI datasets.

Focusing subsequent analysis on the inconsistent observations identified by our workflow reduces human checks, which saves money and time. Moreover, unlike existing workflows, the proposed workflow uses relevant contextual information for the phenomena under study (as climate drives phenological events). Therefore, we recommend that initiatives collecting volunteered geographic information use the proposed automated workflow and relevant contextual information to check inconsistency in order to improve data quality. This workflow could be applied to volunteered meteorological data [62] to, for instance, highlight unusually high or low temperature reports because daily weather data has a long history and is increasingly available [63].

Acknowledgments

Disclaimer: Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Lilac observations were provided by the USA National Phenology Network and the many participants who have contributed data over time. This research was supported in part by a Google Faculty Research Award to RZM and by Cooperative Agreements G09AC00310 and G14AC00405 from the United States Geological Survey to the University of Arizona.

Author Contributions

Conceived and designed the experiments: HM RZM. Performed the experiments: HM. Analyzed the data: HM RZM AR KLG JFW. Contributed reagents/materials/analysis tools: HM RZM AR KLG JFW. Wrote the paper: HM RZM.

References

1. Butcher GS, Niven DK. Combining data from the Christmas Bird Count and the Breeding Bird Survey to determine the continental status and trends of North America birds. National Audubon Society. 2007.
2. Goodchild MF. Citizens as sensors: the world of volunteered geography. Geojournal. 2007;69(4):211–221.
3. Gouveia C, Fonseca A. New approaches to environmental monitoring: the use of ICT to explore volunteered geographic information. Geojournal. 2008;72(3):185–197.
4. Feick R, Roche S. Understanding the Value of VGI. Crowdsourcing Geographic Knowledge: Springer Netherlands; 2013. p. 15–29. https://doi.org/10.1007/978-94-007-4587-2_2
5. Parker CJ. The Rise of Volunteered Information. The Fundamentals of Human Factors Design for Volunteered Geographic Information: Springer International Publishing; 2014. https://doi.org/10.1007/978-3-319-03503-1
6. Goodchild MF, Li L. Assuring the quality of volunteered geographic information. Spat Stat. 2012;1:110–120.
7. Newell DA, Pembroke MM, Boyd WE. Crowd Sourcing for Conservation: Web 2.0 a Powerful Tool for Biologists. Future Internet. 2012;4(2):551.
8. Brabham DC. Crowdsourcing the public participation process for planning projects. Planning Theory. 2009;8(3):242–262.
9. Goodchild MF, Glennon JA. Crowdsourcing geographic information for disaster response: a research frontier. IJDE. 2010;3(3):231–241.
10. van Vliet AJ, de Groot RS, Bellens Y, Braun P, Bruegger R, Bruns E, et al. The European phenology network. Int J Biometeorol. 2003;47(4):202–212. pmid:12734744
11. Mayer A. Phenology and Citizen Science: Volunteers have documented seasonal events for more than a century, and scientific studies are benefiting from the data. Bioscience. 2010;60(3):172–175.
12. Ferster CJ, Coops NC. A review of earth observation using mobile personal communication devices. Comput Geosci. 2013;51:339–349.
13. Dickinson JL, Zuckerberg B, Bonter DN. Citizen science as an ecological research tool: challenges and benefits. Annu Rev Ecol Evol Syst. 2010;41:149–172.
14. Elwood S. Volunteered geographic information: key questions, concepts and methods to guide emerging research and practice. Geojournal. 2008;72(3):133–135.
15. Flanagin AJ, Metzger MJ. The credibility of volunteered geographic information. Geojournal. 2008;72(3–4):137–148.
16. Coleman DJ, Georgiadou Y, Labonte J. Volunteered geographic information: The nature and motivation of produsers. IJSDIR. 2009;4(1):332–358.
17. Goodchild M. NeoGeography and the nature of geographic expertise. JLBS. 2009;3(2):82–96.
18. Matyas S, Kiefer P, Schlieder C, Kleyer S. Wisdom about the crowd: assuring geospatial data quality collected in location-based games. Entertainment Computing–ICEC 2011: Springer; 2011. pp. 331–336. https://doi.org/10.1007/978-3-642-24500-8_36
19. Galindo A, Díaz P, Huerta J. A quality approach to volunteer geographic information. Proc ISSDQ. 2011:109–114.
20. Elwood S, Goodchild M, Sui D. Prospects for VGI Research and the Emerging Fourth Paradigm. Crowdsourcing Geographic Knowledge: Springer Netherlands; 2013. pp. 361–375. https://doi.org/10.1007/978-94-007-4587-2_20
21. Bimonte S, Boucelma O, Machabert O, Sellami S. From Volunteered Geographic Information to Volunteered Geographic OLAP: A VGI Data Quality-Based Approach. Computational Science and Its Applications. Lecture Notes in Computer Science. 8582: Springer International Publishing; 2014. pp. 69–80. https://doi.org/10.1007/978-3-319-09147-1_6
22. Brunsdon C, Comber L. Assessing the changing flowering date of the common lilac in North America: a random coefficient model approach. Geoinformatica. 2012;16(4):675–690.
23. Comber A, See L, Fritz S, Van der Velde M, Perger C, Foody G. Using control data to determine the reliability of volunteered geographic information about land cover. Int J Appl Earth Obs Geoinf. 2013;23:37–48.
24. Kelling S, Yu J, Gerbracht J, Wong WK. Emergent Filters: Automated Data Verification in a Large-scale Citizen Science Project. eScienceW; 2011: IEEE. https://doi.org/10.1109/eScienceW.2011.13
25. Kelling S, Gerbracht J, Fink D, Lagoze C, Wong WK, Yu J, et al. eBird: A Human/Computer Learning Network for Biodiversity Conservation and Research. IAAI; 2011. https://doi.org/10.1609/aimag.v34i1.2431
26. Vuurens J, de Vries AP, Eickhoff C. How much spam can you take? an analysis of crowdsourcing results to increase accuracy. Proc CIR. 2011:21–26.
27. Comber A, Brunsdon C, See L, Fritz S, McCallum I. Comparing Expert and Non-expert Conceptualisations of the Land: An Analysis of Crowdsourced Land Cover Data. Spatial Information Theory. Lecture Notes in Computer Science. 8116: Springer International Publishing; 2013. pp. 243–260. https://doi.org/10.1007/978-3-319-01790-7_14
28. See L, Comber A, Salk C, Fritz S, van der Velde M, Perger C, et al. Comparing the quality of crowdsourced data contributed by expert and non-experts. PLOS ONE. 2013;8(7):e69958. pmid:23936126
29. Sullivan BL, Wood CL, Iliff MJ, Bonney RE, Fink D, Kelling S. eBird: A citizen-based bird observation network in the biological sciences. Biol Conserv. 2009;142(10):2282–2292.
30. Schlieder C, Yanenko O. Spatio-temporal proximity and social distance: a confirmation framework for social reporting. Proc 2nd IWLBSN; San Jose, California. 1867711: ACM; 2010. pp. 60–67. https://doi.org/10.1145/1867699.1867711
31. Yanenko O, Schlieder C. Enhancing the Quality of Volunteered Geographic Information: A Constraint-Based Approach. Bridging the Geographic Information Sciences. Lecture Notes in Geoinformation and Cartography: Springer Berlin Heidelberg; 2012. pp. 429–446. https://doi.org/10.1007/978-3-642-29063-3_23
32. Ali A, Schmid F. Data Quality Assurance for Volunteered Geographic Information. Geographic Information Science. Lecture Notes in Computer Science: Springer International Publishing; 2014. pp. 126–141. https://doi.org/10.1007/978-3-319-11593-1_9
33. Schwartz MD, Ault TR, Betancourt JL. Spring onset variations and trends in the continental United States: past and regional assessment using temperature‐based indices. Int J Climatol. 2013;33(13):2917–2922.
34. Ault T, Henebry G, de Beurs K, Schwartz M, Betancourt J, Moore D. The false spring of 2012, earliest in North American record. Eos (Washington DC). 2013:181–182.
35. Schwartz MD. Detecting the onset of spring: a possible application of phenological models. Clim Res. 1990;1(1):23–29.
36. Cleland EE, Chuine I, Menzel A, Mooney HA, Schwartz MD. Shifting plant phenology in response to global change. Trends Ecol Evol. 2007;22(7):357–365. pmid:17478009
37. Barr A, Black TA, McCaughey H. Climatic and Phenological Controls of the Carbon and Energy Balances of Three Contrasting Boreal Forest Ecosystems in Western Canada. Phenology of Ecosystem Processes: Springer New York; 2009. pp. 3–34. https://doi.org/10.1007/978-1-4419-0026-5_1
38. Keatley M, Hudson I. Introduction and Overview. Phenological Research: Springer Netherlands; 2010. pp. 1–22. https://doi.org/10.1007/978-90-481-3335-2_1
39. Koch E. Global Framework for Data Collection–Data Bases, Data Availability, Future Networks, Online Databases. Phenological Research: Springer Netherlands; 2010. pp. 23–61. https://doi.org/10.1007/978-90-481-3335-2_2
40. Schwartz MD. Phenology: An Integrative Environmental Science. Phenological Data, Networks, and Research: Springer; 2013. https://doi.org/10.1007/978-94-007-0632-3
41. Rosemartin AH, Denny EG, Weltzin JF, Lee Marsh R, Wilson BE, Mehdipoor H, et al. Lilac and honeysuckle phenology data 1956–2014. Sci Data. 2015;2:150038. pmid:26306204
42. Schwartz MD, Betancourt JL, Weltzin JF. From Caprio's lilacs to the USA National Phenology Network. Front Ecol Environ. 2012;10(6):324–327.
43. Schwartz MD, Ahas R, Aasa A. Onset of spring starting earlier across the Northern Hemisphere. Global Change Biol. 2006;12(2):343–351.
44. Denny E, Gerst K, Miller-Rushing A, Tierney G, Crimmins T, Enquist CF, et al. Standardized phenology monitoring methods to track plant and animal activity for science and resource management applications. Int J Biometeorol. 2014;58(4):591–601. pmid:24458770
45. Schwartz M. Phenoclimatic Measures. Phenology: An Integrative Environmental Science. Tasks for Vegetation Science. 39: Springer Netherlands; 2003. pp. 331–343. https://doi.org/10.1007/978-94-007-0632-3_21
46. Ranta E, Lindström J, Kaitala V, Crone E, Lundberg P, Hokkanen T, et al. Life History Mediated Responses to Weather, Phenology and Large-Scale Population Patterns. Phenological Research: Springer Netherlands; 2010. pp. 321–338. https://doi.org/10.1007/978-90-481-3335-2_15
47. Thornton PE, Thornton MM, Mayer BW, Wilhelmi N, Wei Y, Devarakonda R, et al. Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 2. 2014.
48. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2580–2605.
49. Kullback S, Leibler RA. On Information and Sufficiency. The Annals of Mathematical Statistics. 1951;22(1):79–86.
50. Donaldson J, Donaldson MJ. Package ‘tsne’. 2010.
51. Banfield JD, Raftery AE. Model-Based Gaussian and Non-Gaussian Clustering. Biometrics. 1993;49(3):803–821.
52. Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. JASA. 2002;97(458):611–631.
53. Rasmussen CE. The infinite Gaussian mixture model. NIPS. 1999;12:554–560.
54. Biernacki C, Celeux G, Govaert G. Assessing a mixture model for clustering with the integrated completed likelihood. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 2000;22(7):719–725.
55. Fraley C, Raftery A, Scrucca L. Normal mixture modeling for model-based clustering, classification, and density estimation. Department of Statistics, University of Washington. 2012. Available: https://cran.r-project.org/web/packages/mclust/index.html.
56. Dempster AP, Laird NM, Rubin DB. Maximum Likelihood from Incomplete Data via the EM Algorithm. J R Stat Soc Series B Stat Methodol. 1977;39(1):1–38.
57. Frigge M, Hoaglin DC, Iglewicz B. Some Implementations of the Boxplot. The American Statistician. 1989;43(1):50–54.
58. Logan M. Analysis of Covariance (ANCOVA). Biostatistical Design and Analysis Using R: Wiley-Blackwell; 2010. pp. 448–465. https://doi.org/10.1002/9781444319620.ch15
59. Cayan DR, Dettinger MD, Kammerdiener SA, Caprio JM, Peterson DH. Changes in the Onset of Spring in the Western United States. Bull Amer Meteor Soc. 2001;82(3):399–415.
60. Schwartz M. Monitoring global change with phenology: The case of the spring green wave. Int J Biometeorol. 1994;38(1):18–22.
61. Schwartz MD, Reiter BE. Changes in North American spring. Int J Climatol. 2000;20(8):929–932.
62. National Research Council. Future of the National Weather Service Cooperative Observer Network. 1998:78. Available: http://www.nap.edu/catalog/6197/future-of-the-national-weather-service-cooperative-observer-network.
63. Menne MJ, Durre I, Vose RS, Gleason BE, Houston TG. An Overview of the Global Historical Climatology Network-Daily Database. J Atmos Oceanic Technol. 2012;29(7):897–910.

