The paper "Quality of life, big data and the power of statistics" (by Shivam Gupta, Jorge Mateu, Auriol Degbelo and Edzer Pebesma) has been published in the special issue "Statistics and Big Data" of the journal Statistics & Probability Letters, Volume 136, May 2018.
Abstract: The digital era has opened up new possibilities for data-driven research. This paper discusses big data challenges in environmental monitoring and reflects on the use of statistical methods in tackling these challenges for improving the quality of life in cities.
With an increasing number of people moving into (and to) urban areas, there is an urgent need to examine what this rising number means for the environment and the quality of life (QoL) in cities. Air quality affects the population's QoL (Darçın, 2014) and is also a major environmental risk factor for health. Data for environmental and meteorological analysis are not only of significant volume but are also complex in space and time. Formats and types of data are very diverse (e.g., netCDF, GDB, CSV, GeoTIFF, shapefile, JSON, etc.), and many interconnections prevail within the data, which complicates traditional data analysis procedures. As Scott (2017) said, statistics remains highly relevant irrespective of the 'bigness' of data. It provides the basis to make data speak while taking the inherent uncertainties into account. Statistical analysis involves developing data collection procedures to handle different data sources and proposing formal models for analysis and prediction.
In the published paper we focused on the role of statistics in handling the five Vs of big data (Volume, Velocity, Variety, Veracity and Value) and the challenges they pose. We proposed combining two well-established statistical methods to optimise the selection of variables and locations for spatial and temporal analysis of environmental data sources, with a focus on air quality monitoring. The combined use of these two methods, Land Use Regression (LUR) and Spatial Simulated Annealing (SSA), as proposed in the paper, helps in designing data acquisition processes so that the maximum information can be extracted for a given number of possible measurement sites. Limiting the data sources can increase the speed of the analysis, making big data analysis more effective regardless of its "bigness".
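To give a rough idea of the SSA component, the sketch below selects monitoring sites by simulated annealing. This is not the paper's implementation: in the paper the objective would be informed by the LUR model, whereas here a simple spatial-coverage criterion (mean distance from each grid cell to its nearest site) stands in, and the grid, cooling schedule and parameters are illustrative assumptions.

```python
import math
import random

def mean_nearest_distance(sites, grid):
    """Stand-in design criterion: mean distance from each grid point
    to its nearest monitoring site (lower = better coverage)."""
    total = 0.0
    for gx, gy in grid:
        total += min(math.hypot(gx - sx, gy - sy) for sx, sy in sites)
    return total / len(grid)

def spatial_simulated_annealing(grid, n_sites, steps=2000, t0=1.0,
                                cooling=0.995, seed=0):
    """Pick n_sites locations from the candidate grid that minimise
    the criterion, by randomly perturbing one site per iteration."""
    rng = random.Random(seed)
    sites = rng.sample(grid, n_sites)
    cost = mean_nearest_distance(sites, grid)
    best, best_cost = list(sites), cost
    t = t0
    for _ in range(steps):
        candidate = list(sites)
        # perturb the design: relocate one site to a random candidate cell
        candidate[rng.randrange(n_sites)] = rng.choice(grid)
        c = mean_nearest_distance(candidate, grid)
        # always accept improvements; accept worse designs with a
        # probability that shrinks as the temperature t cools
        if c < cost or rng.random() < math.exp((cost - c) / t):
            sites, cost = candidate, c
            if cost < best_cost:
                best, best_cost = list(sites), cost
        t *= cooling
    return best, best_cost

# toy example: choose 5 sites on a hypothetical 10 x 10 candidate grid
grid = [(x, y) for x in range(10) for y in range(10)]
sites, cost = spatial_simulated_annealing(grid, n_sites=5)
```

In a real application the candidate grid would come from the study area, and the criterion would reflect the prediction error of the fitted LUR model rather than plain geometric coverage.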
For more detailed information, please access the article at: https://www.sciencedirect.com/science/article/pii/S0167715218300750
The article is Open Access and is funded by the European Commission within the Marie Skłodowska-Curie Actions, International Training Networks (ITN), European Joint Doctorates (EJD). The funding period is January 1, 2015 – December 31, 2018, Grant Agreement number 642332 — GEO-C — H2020-MSCA-ITN-2014.