Abstract
In this paper, we propose a new class of non-Gaussian random fields named two-piece random fields. The proposed class allows the generation of random fields with flexible marginal distributions, possibly skewed and/or heavy-tailed, and consequently has a wide range of applications. We study the second-order properties of this class and provide analytical expressions for the bivariate distribution and the associated correlation functions. We exemplify our general construction by studying two examples: two-piece Gaussian and two-piece Tukey-h random fields. An interesting feature of the proposed class is that it offers a specific type of dependence that can be useful when modeling data displaying spatial outliers, a property that has been somewhat ignored from a modeling viewpoint in the literature on spatial point-referenced data. Since the likelihood function involves analytically intractable integrals, we adopt the weighted pairwise likelihood as the method of estimation. The effectiveness of our methodology is illustrated with simulation experiments as well as with the analysis of a georeferenced dataset of mean temperatures in the Middle East.
References
Alegria A, Caro S, Bevilacqua M, Porcu E, Clarke J (2017) Estimating covariance functions of multivariate skew-Gaussian random fields on the sphere. Spat Stat 22:388–402
Arellano-Valle RB, Gómez HW, Quintana FA (2005) Statistical inference for a general class of asymmetric distributions. J Stat Plan Inference 128(2):427–443
Azzalini A, Capitanio A (2014) The skew-normal and related families. Cambridge University Press, New York
Azzimonti D, Ginsbourger D (2018) Estimating orthant probabilities of high dimensional Gaussian vectors with an application to set estimation. J Comput Graph Stat 27(2):255–267
Bai Y, Kang J, Song P (2014) Efficient pairwise composite likelihood estimation for spatial-clustered data. Biometrics 70(3):661–670
Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC Press, Boca Raton
Bevilacqua M, Gaetan C (2015) Comparing composite likelihood methods based on pairs for spatial Gaussian random fields. Stat Comput 25:877–892
Bevilacqua M, Gaetan C, Mateu J, Porcu E (2012) Estimating space and space-time covariance functions for large data sets: a weighted composite likelihood approach. J Am Stat Assoc 107(497):268–280. https://doi.org/10.1080/01621459.2011.646928
Bevilacqua M, Faouzi T, Furrer R, Porcu E (2019a) Estimation and prediction using generalized Wendland covariance functions under fixed domain asymptotics. Ann Stat 47(2):828–856
Bevilacqua M, Morales-Oñate V, Caamaño-Carrillo C (2019b) GeoModels: a package for geostatistical Gaussian and non Gaussian data analysis. https://vmoprojs.github.io/GeoModels-page/. R package version 1.0.3-4
Bevilacqua M, Caamaño-Carrillo C, Gaetan C (2020) On modeling positive continuous data with spatiotemporal dependence. Environmetrics 31(7):e2632
Bevilacqua M, Caamaño-Carrillo C, Arellano-Valle R, Morales-Oñate V (2021) Non-Gaussian geostatistical modeling using (skew) t processes. Scand J Stat 48:212–245
Chen D, Lu C, Kou Y, Chen F (2008) On detecting spatial outliers. Geoinformatica 12:455–475
Cote M, Genest C (2019) Dependence in a background risk model. J Multivar Anal 172:28–46
Cressie N, Wikle C (2011) Statistics for spatio-temporal data. Wiley Series in Probability and Statistics, Wiley, New York
De Oliveira V (2006) On optimal point and block prediction in log-Gaussian random fields. Scand J Stat 33:523–540
Diggle P, Tawn J, Moyeed R (1998) Model-based geostatistics. J Roy Stat Soc Ser C (Appl Stat) 47:299–350
Dutta S, Genton MG (2014) A non-Gaussian multivariate distribution with all lower-dimensional Gaussians and related families. J Multivar Anal 132:82–93. https://doi.org/10.1016/j.jmva.2014.07.007
Efron B (1982) The jackknife, the bootstrap and other resampling plans. In: CBMS-NSF regional conference series in applied mathematics, SIAM 38
Ernst M, Haesbroeck G (2017) Comparison of local outlier detection techniques in spatial multivariate data. Data Min Knowl Disc 31:371–399
Fechner GT (1897) Kollektivmasslehre. Engelmann
Feng X, Zhu J, Lin P, Steen-Adams M (2014) Composite likelihood estimation for models of spatial ordinal data and spatial proportional data with zero/one values. Environmetrics 25(8):571–583
Fernández C, Steel M (1998) On Bayesian modeling of fat tails and skewness. J Am Stat Assoc 93(441):359–371
Gelfand AE, Schliep EM (2016) Spatial statistics and Gaussian processes: a beautiful marriage. Spat Stat 18:86–104
Genton MG, Zhang H (2012) Identifiability problems in some non-Gaussian spatial random fields. Chilean J Stat 3:171–179
Genz A (1992) Numerical computation of multivariate normal probabilities. J Comput Graph Stat 1:141–150
Genz A, Bretz F (2009) Computation of multivariate normal and t probabilities, vol 195. Springer, New York
Genz A, Kenkel B (2015) pbivnorm: vectorized bivariate normal CDF. https://cran.r-project.org/package=pbivnorm. R package version 0.6.0
Gneiting T (2002) Nonseparable, stationary covariance functions for space-time data. J Am Stat Assoc 97(458):590–600
Gneiting T (2013) Strictly and non-strictly positive definite functions on spheres. Bernoulli 19(4):1327–1349
Goerg GM (2015) The Lambert way to Gaussianize heavy-tailed data with the inverse of Tukey’s h transformation as a special case. Sci World J 1–16
Gradshteyn I, Ryzhik I (2007) Table of integrals, series, and products, 7th edn. Academic Press, New York
Gräler B (2014) Modelling skewed spatial random fields through the spatial vine copula. Spat Stat 10:87–102
Haining R (1993) Spatial data analysis in the social and environmental sciences. Cambridge University Press, Cambridge
Haslett J, Bradley R, Craig P, Unwin A, Wills G (1991) Dynamic graphics for exploring spatial data with application to locating global and local anomalies. Am Stat 45:234–242
Heagerty P, Lele S (1998) A composite likelihood approach to binary spatial data. J Am Stat Assoc 93(443):1099–1111
Joe H (2014) Dependence modeling with copulas. Chapman and Hall/CRC, Boca Raton
Joe H, Lee Y (2009) On weighting of bivariate margins in pairwise likelihood. J Multivar Anal 100(4):670–685
Jones MC (2015) On families of distributions with shape parameters. Int Stat Rev 83(2):175–192
Kazianka H, Pilz J (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Env Res Risk Assess 24:661–673
Kilibarda M, Hengl T, Heuvelink GBM, Gräler B, Pebesma E, Perčec Tadić M, Bajat B (2014) Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution. J Geophys Res Atmosp 119(5):2294–2313
Kou Y, Lu CT, Dos Santos RF (2007) Spatial outlier detection: a graph-based approach. In: 19th IEEE international conference on tools with artificial intelligence, vol 1, pp 281–288
Lindsay B (1988) Composite likelihood methods. Contemp Math 80:221–239
Lu CT, Chen D, Kou Y (2003) Detecting spatial outliers with multiple attributes. In: Proceedings of the 15th IEEE international conference on tools with artificial intelligence, pp 122–128
Masarotto G, Varin C (2012) Gaussian copula marginal regression. Electron J Stat 6:1517–1549
Mudholkar GS, Hutson AD (2000) The epsilon skew-normal distribution for analyzing near-normal data. J Stat Plan Inference 83(2):291–309
Murthy GSR (2015) A note on multivariate folded normal distribution. Sankhya B 77:108–113
Porcu E, Bevilacqua M, Genton MG (2016) Spatio-temporal covariance and cross-covariance functions of the great circle distance on a sphere. J Am Stat Assoc 111(514):888–898
R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Rubio FJ, Steel MFJ (2020) The family of two-piece distributions. Significance 17(1):12–13
Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. Chapman & Hall, London
Shekhar S, Lu CT, Zhang P (2001) Detecting graph-based spatial outliers: algorithms and applications (a summary of results). In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, Association for Computing Machinery, New York, NY, USA, KDD ’01, pp 371–376
Shekhar S, Lu C, Zhang P (2003) A unified approach to detecting spatial outliers. GeoInformatica 7:139–166
Singh AK, Lalitha S (2018) A novel spatial outlier detection technique. Commun Stat Theory Methods 47(1):247–257
Stein M (1999) Interpolation of spatial data. Some theory of kriging. Springer-Verlag, New York
Stein M (2005) Space-time covariance functions. J Am Stat Assoc 100(492):310–321
Varin C, Vidoni P (2005) A note on composite likelihood inference and model selection. Biometrika 92(3):519–528
Varin C, Reid N, Firth D (2011) An overview of composite likelihood methods. Stat Sin 21:5–42
Wallis KF (2014) The two-piece normal, binormal, or double gaussian distribution: its origin and rediscoveries. Stat Sci 29(1):106–112
Xu G, Genton MG (2017) Tukey g-and-h random fields. J Am Stat Assoc 112(519):1236–1249
Zhang H, El-Shaarawi A (2010) On spatial skew-Gaussian processes and applications. Environmetrics 21(1):33–47
Acknowledgements
Partial support was provided by FONDECYT Grant 1200068, Chile, by ANID-Millennium Science Initiative Program-NCN17_059 and by the regional MATH-AmSud program, Grant Number 20-MATH-03, for Moreno Bevilacqua, and by Proyecto Regular Interno DIUBB 2120538 IF/R de la Universidad del Bío-Bío for Christian Caamaño. The authors thank the associate editor and two referees for their comments and suggestions, which led to an improved presentation.
Appendix
7.1 Proof of Lemma 1
Proof
We make use of some special functions in this proof, in particular the parabolic cylinder function \(D_{n}(x)\), the confluent hypergeometric function \({}_1F_1(a;b;x)\) and the Gaussian hypergeometric function \({}_2F_1(a;b;c;x)\) (see Gradshteyn and Ryzhik (2007) for the definitions of these functions). By definition, we have:
Taking the first integral of (32) and using (3.462.1) of Gradshteyn and Ryzhik (2007), we obtain
where \(D_{n}(x)\) is the parabolic cylinder function. Now, considering (9.240) of Gradshteyn and Ryzhik (2007):
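For reference, the two tabulated identities invoked up to this point can be stated in their standard forms as follows. The symbols \(\beta\), \(\gamma\), \(\nu\), \(p\) and \(z\) are the generic ones of the table and are assumptions here; they need not match the notation used in the proof's own displays.

```latex
% Gradshteyn and Ryzhik (2007), formula 3.462.1
\[
\int_0^\infty x^{\nu-1} e^{-\beta x^2-\gamma x}\,dx
  = (2\beta)^{-\nu/2}\,\Gamma(\nu)\,
    \exp\!\left(\frac{\gamma^2}{8\beta}\right)
    D_{-\nu}\!\left(\frac{\gamma}{\sqrt{2\beta}}\right),
  \qquad \operatorname{Re}\beta>0,\ \operatorname{Re}\nu>0 .
\]
% Gradshteyn and Ryzhik (2007), formula 9.240
\[
D_p(z) = 2^{p/2} e^{-z^2/4}
  \left\{
    \frac{\sqrt{\pi}}{\Gamma\!\left(\frac{1-p}{2}\right)}\,
    {}_1F_1\!\left(-\frac{p}{2};\frac{1}{2};\frac{z^2}{2}\right)
    - \frac{\sqrt{2\pi}\,z}{\Gamma\!\left(-\frac{p}{2}\right)}\,
    {}_1F_1\!\left(\frac{1-p}{2};\frac{3}{2};\frac{z^2}{2}\right)
  \right\}.
\]
```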
by combining Eq. (34) with the integral in (33) and using (7.621.4) of Gradshteyn and Ryzhik (2007), we obtain
Similarly, the second integral of (32) is given by
Combining Eqs. (35), (36) in (32), we obtain
Finally, we use the identity:
\(\square \)
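As a numerical sanity check on the first tabulated identity used in the proof, formula (3.462.1) of Gradshteyn and Ryzhik (2007), the following minimal sketch compares a quadrature approximation of \(\int_0^\infty e^{-\beta x^2-\gamma x}\,dx\) with the closed form \((2\beta)^{-1/2}\exp(\gamma^2/(8\beta))\,D_{-1}(\gamma/\sqrt{2\beta})\). It covers only the special case \(\nu = 1\), where \(D_{-1}(z)=\sqrt{\pi/2}\,e^{z^2/4}\,\mathrm{erfc}(z/\sqrt{2})\) is available from the standard library. The function names, the truncation point and the grid size are illustrative assumptions and are not taken from the paper.

```python
import math


def lhs_numeric(beta, gamma, upper=40.0, n=20_000):
    """Composite Simpson approximation of int_0^upper exp(-beta*x^2 - gamma*x) dx.

    `upper` truncates the infinite range; the integrand is negligible there
    for the moderate beta > 0 and gamma used in this sketch. `n` must be even.
    """
    h = upper / n
    f = lambda x: math.exp(-beta * x * x - gamma * x)
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0


def d_minus_one(z):
    """Parabolic cylinder function of order -1, written via erfc."""
    return math.sqrt(math.pi / 2.0) * math.exp(z * z / 4.0) * math.erfc(z / math.sqrt(2.0))


def rhs_closed_form(beta, gamma):
    """Right-hand side of 3.462.1 for nu = 1 (so Gamma(nu) = 1)."""
    z = gamma / math.sqrt(2.0 * beta)
    return (2.0 * beta) ** -0.5 * math.exp(gamma ** 2 / (8.0 * beta)) * d_minus_one(z)


if __name__ == "__main__":
    for beta, gamma in [(1.0, 0.0), (0.7, 1.3), (2.0, -0.5)]:
        print(beta, gamma, lhs_numeric(beta, gamma), rhs_closed_form(beta, gamma))
```

For general \(\nu\), a library routine for \(D_{-\nu}\) would be needed in place of `d_minus_one`; that extension is not attempted here.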
Cite this article
Bevilacqua, M., Caamaño-Carrillo, C., Arellano-Valle, R.B. et al. A class of random fields with two-piece marginal distributions for modeling point-referenced data with spatial outliers. TEST 31, 644–674 (2022). https://doi.org/10.1007/s11749-021-00797-5
DOI: https://doi.org/10.1007/s11749-021-00797-5