A class of random fields with two-piece marginal distributions for modeling point-referenced data with spatial outliers

  • Original Paper
  • Published in TEST

Abstract

In this paper, we propose a new class of non-Gaussian random fields named two-piece random fields. The proposed class makes it possible to generate random fields with flexible marginal distributions, possibly skewed and/or heavy-tailed, and consequently has a wide range of applications. We study the second-order properties of this class and provide analytical expressions for the bivariate distribution and the associated correlation functions. We exemplify our general construction by studying two examples: two-piece Gaussian and two-piece Tukey-h random fields. An interesting feature of the proposed class is that it offers a specific type of dependence that can be useful when modeling data displaying spatial outliers, a property that has been somewhat ignored from a modeling viewpoint in the literature on spatial point-referenced data. Since the likelihood function involves analytically intractable integrals, we adopt the weighted pairwise likelihood as a method of estimation. The effectiveness of our methodology is illustrated with simulation experiments as well as with the analysis of a georeferenced dataset of mean temperatures in the Middle East.

References

  • Alegria A, Caro S, Bevilacqua M, Porcu E, Clarke J (2017) Estimating covariance functions of multivariate skew-Gaussian random fields on the sphere. Spat Stat 22:388–402

  • Arellano-Valle RB, Gómez HW, Quintana FA (2005) Statistical inference for a general class of asymmetric distributions. J Stat Plan Inference 128(2):427–443

  • Azzalini A, Capitanio A (2014) The skew-normal and related families. Cambridge University Press, New York

  • Azzimonti D, Ginsbourger D (2018) Estimating orthant probabilities of high dimensional Gaussian vectors with an application to set estimation. J Comput Graph Stat 27(2):255–267

  • Bai Y, Kang J, Song P (2014) Efficient pairwise composite likelihood estimation for spatial-clustered data. Biometrics 70(3):661–670

  • Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC Press, Boca Raton

  • Bevilacqua M, Gaetan C (2015) Comparing composite likelihood methods based on pairs for spatial Gaussian random fields. Stat Comput 25:877–892

  • Bevilacqua M, Gaetan C, Mateu J, Porcu E (2012) Estimating space and space-time covariance functions for large data sets: a weighted composite likelihood approach. J Am Stat Assoc 107(497):268–280. https://doi.org/10.1080/01621459.2011.646928

  • Bevilacqua M, Faouzi T, Furrer R, Porcu E (2019a) Estimation and prediction using generalized Wendland covariance functions under fixed domain asymptotics. Ann Stat 47(2):828–856

  • Bevilacqua M, Morales-Oñate V, Caamaño-Carrillo C (2019b) GeoModels: a package for geostatistical Gaussian and non Gaussian data analysis. https://vmoprojs.github.io/GeoModels-page/. R package version 1.0.3-4

  • Bevilacqua M, Caamaño-Carrillo C, Gaetan C (2020) On modeling positive continuous data with spatiotemporal dependence. Environmetrics 31(7):e2632

  • Bevilacqua M, Caamaño-Carrillo C, Arellano-Valle R, Morales-Oñate V (2021) Non-Gaussian geostatistical modeling using (skew) t processes. Scand J Stat 48:212–245

  • Chen D, Lu C, Kou Y, Chen F (2008) On detecting spatial outliers. Geoinformatica 12:455–475

  • Cote M, Genest C (2019) Dependence in a background risk model. J Multivar Anal 172:28–46

  • Cressie N, Wikle C (2011) Statistics for spatio-temporal data. Wiley Series in Probability and Statistics, Wiley, New York

  • De Oliveira V (2006) On optimal point and block prediction in log-Gaussian random fields. Scand J Stat 33:523–540

  • Diggle P, Tawn J, Moyeed R (1998) Model-based geostatistics. J Roy Stat Soc Ser C (Appl Stat) 47:299–350

  • Dutta S, Genton MG (2014) A non-Gaussian multivariate distribution with all lower-dimensional Gaussians and related families. J Multivar Anal 132:82–93. https://doi.org/10.1016/j.jmva.2014.07.007

  • Efron B (1982) The jackknife, the bootstrap and other resampling plans. In: CBMS-NSF regional conference series in applied mathematics, SIAM 38

  • Ernst M, Haesbroeck G (2017) Comparison of local outlier detection techniques in spatial multivariate data. Data Min Knowl Disc 31:371–399

  • Fechner GT (1897) Kollektivmasslehre. Engelmann

  • Feng X, Zhu J, Lin P, Steen-Adams M (2014) Composite likelihood estimation for models of spatial ordinal data and spatial proportional data with zero/one values. Environmetrics 25(8):571–583

  • Fernández C, Steel M (1998) On Bayesian modeling of fat tails and skewness. J Am Stat Assoc 93(441):359–371

  • Gelfand AE, Schliep EM (2016) Spatial statistics and Gaussian processes: a beautiful marriage. Spat Stat 18:86–104

  • Genton MG, Zhang H (2012) Identifiability problems in some non-Gaussian spatial random fields. Chilean J Stat 3:171–179

  • Genz A (1992) Numerical computation of multivariate normal probabilities. J Comput Graph Stat 1:141–150

  • Genz A, Bretz F (2009) Computation of multivariate normal and t probabilities, vol 195. Springer, New York

  • Genz A, Kenkel B (2015) pbivnorm: vectorized Bivariate Normal CDF. https://cran.r-project.org/package=pbivnorm, R package version 0.6.0

  • Gneiting T (2002) Nonseparable, stationary covariance functions for space-time data. J Am Stat Assoc 97(458):590–600

  • Gneiting T (2013) Strictly and non-strictly positive definite functions on spheres. Bernoulli 19(4):1327–1349

  • Goerg GM (2015) The Lambert way to gaussianize heavy-tailed data with the inverse of Tukey’s h transformation as a special case. Sci World J 1–16

  • Gradshteyn I, Ryzhik I (2007) Table of integrals, series, and products, 7th edn. Academic Press, New York

  • Gräler B (2014) Modelling skewed spatial random fields through the spatial vine copula. Spat Stat 10:87–102

  • Haining R (1993) Spatial data analysis in the social and environmental sciences. Cambridge University Press, Cambridge

  • Haslett J, Bradley R, Craig P, Unwin A, Wills G (1991) Dynamic graphics for exploring spatial data with application to locating global and local anomalies. Am Stat 45:234–242

  • Heagerty P, Lele S (1998) A composite likelihood approach to binary spatial data. J Am Stat Assoc 93(443):1099–1111

  • Joe H (2014) Dependence modeling with copulas. Chapman and Hall/CRC, Boca Raton

  • Joe H, Lee Y (2009) On weighting of bivariate margins in pairwise likelihood. J Multivar Anal 100(4):670–685

  • Jones MC (2015) On families of distributions with shape parameters. Int Stat Rev 83(2):175–192

  • Kazianka H, Pilz J (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Env Res Risk Assess 24:661–673

  • Kilibarda M, Hengl T, Heuvelink GBM, Gräler B, Pebesma E, Perčec Tadić M, Bajat B (2014) Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution. J Geophys Res Atmosp 119(5):2294–2313

  • Kou Y, Lu CT, Dos Santos RF (2007) Spatial outlier detection: a graph-based approach. In: 19th IEEE international conference on tools with artificial intelligence, vol 1, pp 281–288

  • Lindsay B (1988) Composite likelihood methods. Contemp Math 80:221–239

  • Lu CT, Chen D, Kou Y (2003) Detecting spatial outliers with multiple attributes. In: Proceedings of the 15th IEEE international conference on tools with artificial intelligence, pp 122–128

  • Masarotto G, Varin C (2012) Gaussian copula marginal regression. Electron J Stat 6:1517–1549

  • Mudholkar GS, Hutson AD (2000) The epsilon skew-normal distribution for analyzing near-normal data. J Stat Plan Inference 83(2):291–309

  • Murthy GSR (2015) A note on multivariate folded normal distribution. Sankhya B 77:108–113

  • Porcu E, Bevilacqua M, Genton MG (2016) Spatio-temporal covariance and cross-covariance functions of the great circle distance on a sphere. J Am Stat Assoc 111(514):888–898

  • R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  • Rubio FJ, Steel MFJ (2020) The family of two-piece distributions. Significance 17(1):12–13

  • Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. Chapman & Hall, London

  • Shekhar S, Lu CT, Zhang P (2001) Detecting graph-based spatial outliers: algorithms and applications (a summary of results). In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, Association for Computing Machinery, New York, NY, USA, KDD ’01, pp 371–376

  • Shekhar S, Lu C, Zhang P (2003) A unified approach to detecting spatial outliers. GeoInformatica 7:139–166

  • Singh AK, Lalitha S (2018) A novel spatial outlier detection technique. Commun Stat Theory Methods 47(1):247–257

  • Stein M (1999) Interpolation of spatial data. Some theory of kriging. Springer-Verlag, New York

  • Stein M (2005) Space-time covariance functions. J Am Stat Assoc 100(492):310–321

  • Varin C, Vidoni P (2005) A note on composite likelihood inference and model selection. Biometrika 92(3):519–528

  • Varin C, Reid N, Firth D (2011) An overview of composite likelihood methods. Stat Sin 21:5–42

  • Wallis KF (2014) The two-piece normal, binormal, or double Gaussian distribution: its origin and rediscoveries. Stat Sci 29(1):106–112

  • Xu G, Genton MG (2017) Tukey g-and-h random fields. J Am Stat Assoc 112(519):1236–1249

  • Zhang H, El-Shaarawi A (2010) On spatial skew-Gaussian processes and applications. Environmetrics 21(1):33–47

Acknowledgements

Partial support was provided by FONDECYT Grant 1200068, Chile, by ANID - Millennium Science Initiative Program - NCN17_059, and by the regional MATH-AmSud program, Grant Number 20-MATH-03, for Moreno Bevilacqua, and by Proyecto Regular Interno DIUBB 2120538 IF/R de la Universidad del Bío-Bío for Christian Caamaño. The authors thank the associate editor and two referees for their comments and suggestions, which led to an improved presentation.

Author information

Corresponding author

Correspondence to Moreno Bevilacqua.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (r 3 KB)

Appendix

7.1 Proof of Lemma 1

Proof

We make use of some special functions in this proof, in particular the parabolic cylinder function \(D_{n}(x)\), the confluent hypergeometric function \({}_1F_1(a;b;x)\) and the Gaussian hypergeometric function \({}_2F_1(a,b;c;x)\) (see Gradshteyn and Ryzhik (2007) for the definitions of these functions). By definition, we have:

$$\begin{aligned}&\mathrm{I\! E}(|T_{h}(\varvec{s}_{i})||T_{h}(\varvec{s}_{j})|)\nonumber \\&\quad =\mathrm{I\! E}(|G(\varvec{s}_{i})e^{ \frac{h(G(\varvec{s}_{i}))^{2}}{2} }||G(\varvec{s}_{j})e^{\frac{h(G(\varvec{s}_{j}))^{2}}{2}}|)\nonumber \\&\quad =\int \limits _{\mathbb {R}^2_+} |g_{i}e^{ \frac{hg_i^{2}}{2} }| |g_{j}e^{ \frac{hg_{j}^{2}}{2}}| f_{|\varvec{G}_{ij}|}(g_i,g_j){\mathrm{d}}g_{i}{\mathrm{d}}g_{j}\nonumber \\&\quad =\frac{1}{\pi (1-\rho ^2(\varvec{h}))^{1/2}}\int \limits _{\mathbb {R}^2_+}g_ig_j e^{-\frac{1}{2(1-\rho ^2(\varvec{h}))}\left[ g_i^2+g_j^2-2\rho (\varvec{h})g_ig_j\right] }e^{\frac{hg_i^2}{2}+\frac{hg_j^2}{2}}{\mathrm{d}}g_i{\mathrm{d}}g_j \nonumber \\&\qquad +\frac{1}{\pi (1-\rho ^2(\varvec{h}))^{1/2}}\int \limits _{\mathbb {R}^2_+}g_ig_j e^{-\frac{1}{2(1-\rho ^2(\varvec{h}))}\left[ g_i^2+g_j^2+2\rho (\varvec{h})g_ig_j\right] }e^{\frac{hg_i^2}{2}+\frac{hg_j^2}{2}}{\mathrm{d}}g_i{\mathrm{d}}g_j\nonumber \\&\quad =A_1+A_2. \end{aligned}$$
(32)
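As a rough sanity check on the quantity defined in (32), the expectation can be approximated by simulation. The sketch below is not part of the original proof: the function name `mc_abs_tukey_product`, the sample size and the seed are illustrative assumptions, and the pair \((G(\varvec{s}_i),G(\varvec{s}_j))\) is simply drawn as a standard bivariate Gaussian vector with correlation \(\rho (\varvec{h})\).

```python
import math
import random


def mc_abs_tukey_product(rho, h, n=100_000, seed=0):
    """Monte Carlo estimate of E(|T_h(s_i)| |T_h(s_j)|) for a given correlation rho.

    Hypothetical helper for illustration only; rho plays the role of rho(h-lag)
    and h is the Tukey-h tail parameter.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n):
        # Correlated standard Gaussian pair (G_i, G_j) with correlation rho.
        g_i = rng.gauss(0.0, 1.0)
        g_j = rho * g_i + scale * rng.gauss(0.0, 1.0)
        # |T_h| = |G| * exp(h * G^2 / 2), as in the second line of (32).
        t_i = abs(g_i) * math.exp(0.5 * h * g_i * g_i)
        t_j = abs(g_j) * math.exp(0.5 * h * g_j * g_j)
        total += t_i * t_j
    return total / n


if __name__ == "__main__":
    print(mc_abs_tukey_product(rho=0.5, h=0.1))
```

With \(h=0\) and \(\rho (\varvec{h})=0\) the estimate should be close to \(2/\pi \), the squared mean of a half-normal variable, which gives a quick plausibility check before comparing against any closed form.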

Taking the first integral of (32) and using (3.462.1) of Gradshteyn and Ryzhik (2007), we obtain

$$\begin{aligned} A_1&=\frac{1}{\pi (1-\rho ^2(\varvec{h}))^{1/2}}\int \limits _{\mathbb {R}_+}g_je^{-\frac{[1-(1-\rho ^2(\varvec{h}))h]g^2_j}{2(1-\rho ^2(\varvec{h}))}} \left[ \int \limits _{\mathbb {R}_+}g_ie^{\left[ -\frac{[1-(1-\rho ^2(\varvec{h}))h]g^2_i}{2(1-\rho ^2(\varvec{h}))}+\frac{\rho (\varvec{h})g_ig_j}{(1-\rho ^2(\varvec{h}))}\right] }{\mathrm{d}}g_i\right] {\mathrm{d}}g_j\nonumber \\&=\frac{1}{\pi (1-\rho ^2(\varvec{h}))^{1/2}}\left[ \frac{(1-\rho ^2(\varvec{h}))}{1-(1-\rho ^2(\varvec{h}))h}\right] \int \limits _{\mathbb {R}_+} g_je^{-\left[ \frac{[1-(1-\rho ^2(\varvec{h}))h]}{2(1-\rho ^2(\varvec{h}))}-\frac{\rho ^2(\varvec{h})}{4(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}\right] g^2_j}\nonumber \\&\quad \times D_{-2}\left( -\frac{\rho (\varvec{h})g_j}{\sqrt{(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}}\right) {\mathrm{d}}g_j, \end{aligned}$$
(33)

where \(D_{n}(x)\) is the parabolic cylinder function. Now, by (9.240) of Gradshteyn and Ryzhik (2007), we have

$$\begin{aligned}&D_{-2}\left( -\frac{\rho (\varvec{h})g_j}{\sqrt{(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}}\right) \nonumber \\&\quad =e^{-\frac{\rho ^2(\varvec{h})g^2_j}{4(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}}\nonumber \\&\qquad \times {}_1F_1\left( 1;\frac{1}{2};\frac{\rho ^2(\varvec{h})g^2_j}{2(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}\right) \nonumber \\&\qquad +\frac{\sqrt{2\pi }\rho (\varvec{h})g_j}{2\sqrt{(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}}e^{-\frac{\rho ^2(\varvec{h})g^2_j}{4(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}}\nonumber \\&\qquad \times {}_1F_1\left( \frac{3}{2};\frac{3}{2};\frac{\rho ^2(\varvec{h})g^2_j}{2(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}\right) . \end{aligned}$$
(34)
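The two table formulas used so far can be checked numerically. The following sketch, which is an illustration and not part of the original proof, compares a Simpson approximation of \(\int _0^\infty x e^{-\beta x^2-\gamma x}{\mathrm{d}}x\) with the right-hand side of (3.462.1) for \(\nu =2\), where \(D_{-2}\) is evaluated through the representation (34). All function names, the truncation point `upper` and the number of series terms are assumptions made for this example.

```python
import math


def kummer_1f1(a, b, z, terms=200):
    """Truncated power series for the confluent hypergeometric function 1F1(a; b; z)."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) / (b + n) * z / (n + 1)
    return total


def parabolic_d_minus2(z):
    """D_{-2}(z) via (9.240) with p = -2; for z = -a this is the form shown in (34)."""
    w = 0.5 * z * z
    first = kummer_1f1(1.0, 0.5, w)
    second = 0.5 * math.sqrt(2.0 * math.pi) * z * kummer_1f1(1.5, 1.5, w)
    return math.exp(-0.25 * z * z) * (first - second)


def inner_integral_closed(beta, gamma):
    """Right-hand side of (3.462.1) with nu = 2."""
    z = gamma / math.sqrt(2.0 * beta)
    return math.exp(gamma * gamma / (8.0 * beta)) * parabolic_d_minus2(z) / (2.0 * beta)


def inner_integral_simpson(beta, gamma, upper=15.0, n=4000):
    """Composite Simpson rule for int_0^upper x * exp(-beta x^2 - gamma x) dx (n even)."""
    def f(x):
        return x * math.exp(-beta * x * x - gamma * x)

    step = upper / n
    total = f(0.0) + f(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * step)
    return total * step / 3.0


if __name__ == "__main__":
    # Values of beta and gamma implied by (33) for rho = 0.5, h = 0.1, g_j = 1.
    rho, h, g_j = 0.5, 0.1, 1.0
    beta = (1.0 - (1.0 - rho**2) * h) / (2.0 * (1.0 - rho**2))
    gamma = -rho * g_j / (1.0 - rho**2)
    print(inner_integral_simpson(beta, gamma), inner_integral_closed(beta, gamma))
```

The two printed numbers should agree to many digits; the finite upper limit is harmless here because the integrand decays like a Gaussian.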

By combining Eq. (34) with the integral in (33) and using (7.621.4) of Gradshteyn and Ryzhik (2007), we obtain

$$\begin{aligned} A_1&=\frac{(1-\rho ^2(\varvec{h}))^{1/2}}{\pi [1-(1-\rho ^2(\varvec{h}))h]}\int \limits _{\mathbb {R}_+}g_j e^{-\frac{[1-(1-\rho ^2(\varvec{h}))h]g_j^2}{2(1-\rho ^2(\varvec{h}))}}\nonumber \\&\quad {}_1F_1\left( 1;\frac{1}{2};\frac{\rho ^2(\varvec{h})g^2_j}{2(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}\right) {\mathrm{d}}g_j\nonumber \\&\qquad +\frac{\sqrt{2\pi }\rho (\varvec{h})}{2\pi [1-(1-\rho ^2(\varvec{h}))h]^{3/2}}\int \limits _{\mathbb {R}_+}g_j^2 e^{-\frac{[1-(1-\rho ^2(\varvec{h}))h]g_j^2}{2(1-\rho ^2(\varvec{h}))}}\nonumber \\&\quad {}_1F_1\left( \frac{3}{2};\frac{3}{2};\frac{\rho ^2(\varvec{h})g^2_j}{2(1-\rho ^2(\varvec{h}))[1-(1-\rho ^2(\varvec{h}))h]}\right) {\mathrm{d}}g_j\nonumber \\&\quad =\frac{(1-\rho ^2(\varvec{h}))^{3/2}}{\pi [1-(1-\rho ^2(\varvec{h}))h]^2} {}_2F_1\left( 1,1;\frac{1}{2};\frac{\rho ^2(\varvec{h})}{[1-(1-\rho ^2(\varvec{h}))h]^2}\right) \nonumber \\&\qquad +\frac{\rho (\varvec{h})(1-\rho ^2(\varvec{h}))^{3/2}}{2[1-(1-\rho ^2(\varvec{h}))h]^3}{}_2F_1\left( \frac{3}{2},\frac{3}{2};\frac{3}{2};\frac{\rho ^2(\varvec{h})}{[1-(1-\rho ^2(\varvec{h}))h]^2}\right) . \end{aligned}$$
(35)

Similarly, the second integral of (32) is given by

$$\begin{aligned} A_2&=\frac{(1-\rho ^2(\varvec{h}))^{3/2}}{\pi [1-(1-\rho ^2(\varvec{h}))h]^2} {}_2F_1\left( 1,1;\frac{1}{2};\frac{\rho ^2(\varvec{h})}{[1-(1-\rho ^2(\varvec{h}))h]^2}\right) \nonumber \\&\quad -\frac{\rho (\varvec{h})(1-\rho ^2(\varvec{h}))^{3/2}}{2[1-(1-\rho ^2(\varvec{h}))h]^3} {}_2F_1\left( \frac{3}{2},\frac{3}{2};\frac{3}{2};\frac{\rho ^2(\varvec{h})}{[1-(1-\rho ^2(\varvec{h}))h]^2}\right) . \end{aligned}$$
(36)

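Substituting Eqs. (35) and (36) into (32), the terms involving \({}_2F_1\left( \frac{3}{2},\frac{3}{2};\frac{3}{2};\cdot \right)\) cancel, and we obtain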

$$\begin{aligned} \mathrm{I\! E}\left( |T_{h}(\varvec{s}_{i})||T_{h}(\varvec{s}_{j})|\right)&=\frac{2\left( 1-\rho ^2(\varvec{h})\right) ^{3/2}}{\pi [1-(1-\rho ^2(\varvec{h}))h]^2} {}_2F_1\left( 1,1;\frac{1}{2};\frac{\rho ^2(\varvec{h})}{[1-(1-\rho ^2(\varvec{h}))h]^2}\right) . \end{aligned}$$

Finally, with \(x=\rho ^2(\varvec{h})/[1-(1-\rho ^2(\varvec{h}))h]^2\), we simplify this expression using the identity:

$$\begin{aligned} {}_2F_1\left( 1,1;\frac{1}{2};x\right) =\frac{\sqrt{x}\arcsin (\sqrt{x})+\sqrt{1-x}}{(1-x)^{3/2}} \end{aligned}$$

\(\square \)
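The closing steps of the proof lend themselves to a short numerical check. The sketch below is an illustration under stated assumptions rather than the authors' code: it compares a truncated Gauss series for \({}_2F_1(1,1;\frac{1}{2};x)\) with the arcsine identity, and evaluates the resulting expression for \(\mathrm{I\! E}(|T_{h}(\varvec{s}_{i})||T_{h}(\varvec{s}_{j})|)\). The function names are hypothetical, and the code assumes \(0\le h<1/(1+|\rho (\varvec{h})|)\) so that the hypergeometric argument stays below one.

```python
import math


def hyp2f1_series(a, b, c, x, terms=2000):
    """Truncated Gauss series for 2F1(a, b; c; x), intended for |x| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * x
    return total


def hyp2f1_closed(x):
    """Arcsine form of 2F1(1, 1; 1/2; x) for 0 <= x < 1."""
    root = math.sqrt(x)
    return (root * math.asin(root) + math.sqrt(1.0 - x)) / (1.0 - x) ** 1.5


def expected_abs_product(rho, h):
    """E(|T_h(s_i)||T_h(s_j)|) from the last display before the identity."""
    one_minus = 1.0 - rho * rho
    c = 1.0 - one_minus * h          # the bracket [1 - (1 - rho^2) h]
    x = rho * rho / (c * c)          # argument of 2F1; assumed to be < 1
    return 2.0 * one_minus ** 1.5 / (math.pi * c * c) * hyp2f1_closed(x)


if __name__ == "__main__":
    print(hyp2f1_series(1.0, 1.0, 0.5, 0.3), hyp2f1_closed(0.3))
    print(expected_abs_product(rho=0.5, h=0.1))
```

For \(h=0\) the expression reduces to \(\frac{2}{\pi }\left( \rho \arcsin \rho +\sqrt{1-\rho ^2}\right) \) with \(\rho =\rho (\varvec{h})\), the classical value of \(\mathrm{I\! E}|XY|\) for a standard bivariate Gaussian pair, which is a convenient consistency check on the formula.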

About this article

Cite this article

Bevilacqua, M., Caamaño-Carrillo, C., Arellano-Valle, R.B. et al. A class of random fields with two-piece marginal distributions for modeling point-referenced data with spatial outliers. TEST 31, 644–674 (2022). https://doi.org/10.1007/s11749-021-00797-5

  • DOI: https://doi.org/10.1007/s11749-021-00797-5
