Tests for comparison of multiple endpoints with application to omics data

Marco Marozzi

doi:10.1515/sagmb-2017-0033

Published by De Gruyter January 30, 2018

From the journal Statistical Applications in Genetics and Molecular Biology

https://doi.org/10.1515/sagmb-2017-0033

Abstract

In biomedical research, multiple endpoints are commonly analyzed in “omics” fields such as genomics, proteomics and metabolomics. Traditional methods designed for low-dimensional data either perform poorly or are not applicable to high-dimensional data, whose dimension is generally similar to, or even much larger than, the number of subjects. The complex biochemical interplay between hundreds (or thousands) of endpoints is reflected by complex dependence relations. The aim of the paper is to propose tests that are well suited to omics data because they do not require the normality assumption and remain powerful for small sample sizes, in the presence of complex dependence relations among endpoints, and when the number of endpoints is much larger than the number of subjects. Unbiasedness and consistency of the tests are proved, and their size and power are assessed numerically. It is shown that the proposed approach, based on the nonparametric combination of dependent interpoint distance tests, is very effective. Applications to genomics and metabolomics are discussed.

Keywords: biomarker; case-control study; high-dimensional data; metabolomics; nonparametric tests

Acknowledgement

We are very grateful to Prof. Dr. H. Shen for kindly providing the second data set analyzed in Section 5.

Conflict of Interest: The author has declared no conflict of interest.

A Appendix

Theorem 1

The FR_euclid test is unbiased for testing H₀ : 𝛍 = 0 against H₁ : 𝛍 ≠ 0.

Proof. We consider the following additive model, which is equivalent to the location difference setting considered in Section 2,

$$
\begin{cases}
X_i = \mu + V_i, & i = 1, \ldots, m \\
Y_{m+j} = V_{m+j}, & j = 1, \ldots, n
\end{cases}
$$

where the V_i, i = 1, ..., N, with N = m + n, are independent and identically distributed multivariate random variables with location 0 and variance-covariance matrix Σ with no infinite elements. Note that the V_i are independent of one another, but their p components can be dependent. Let

$$
Z(0) = \left(Z_i(0),\ i = 1, \ldots, N\right) = \left(V_i,\ i = 1, \ldots, N\right)
$$

denote the pooled sample under the null hypothesis and let

$$
Z(\mu) = \left(Z_i(\mu),\ i = 1, \ldots, N\right) = \left(\mu + V_i,\ i = 1, \ldots, m;\ V_{m+j},\ j = 1, \ldots, n\right)
$$

denote the pooled sample under the alternative hypothesis. Define $\tilde{Z}(0)$ and $\tilde{Z}(\mu)$ similarly.
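As a purely illustrative sketch of the additive model above, the pooled samples Z(0) and Z(μ) can be simulated as follows. The function name `simulate_pooled_sample`, the one-factor (equicorrelated) dependence among the p components, and the normal errors are assumptions made here for illustration only; the tests themselves do not require normality, and the paper does not prescribe this dependence structure.

```python
import numpy as np


def simulate_pooled_sample(m, n, p, mu, rho=0.5, seed=0):
    """Draw Z(mu) and Z(0) from the model X_i = mu + V_i, Y_{m+j} = V_{m+j}.

    Hypothetical helper: the equicorrelated normal errors are an assumption
    of this sketch, not the setting used in the paper.
    """
    rng = np.random.default_rng(seed)
    N = m + n
    # One shared factor per subject makes the p components of each V_i
    # dependent (pairwise correlation rho), while the V_i stay independent
    # across subjects.
    factor = rng.standard_normal((N, 1))
    noise = rng.standard_normal((N, p))
    V = np.sqrt(rho) * factor + np.sqrt(1.0 - rho) * noise

    Z_mu = V.copy()
    Z_mu[:m] += mu  # only the first m rows (the X's) are shifted by mu
    return Z_mu, V  # V itself is the pooled sample Z(0)


# Example with many more endpoints (p) than subjects (N = m + n).
Z_mu, Z_0 = simulate_pooled_sample(m=10, n=10, p=200, mu=0.5)
```

Both pooled samples share the same errors V, which mirrors the construction used in the proof: Z(μ) differs from Z(0) only by the shift μ applied to the first m rows.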

The FR_euclid test rejects for large values of its statistic. Therefore, to prove unbiasedness we have to show that the FR_euclid test statistic is stochastically larger when μ ≠ 0, i.e. under H₁, than when μ = 0, i.e. under H₀, as shown by Theorem 3 in Pesarin and Salmaso (2010), p. 138.

Theorem 1 in Marozzi (2015a) shows that the M_euclid test is unbiased. Therefore when μ ≠ 0 the M_euclid test statistic is stochastically smaller than when μ = 0. It follows that

$$
\pi_{M_{\mathrm{euclid}}}(Z(\mu)) \le \pi_{M_{\mathrm{euclid}}}(Z(0)),
$$

note that the M_euclid test rejects for small values of its statistic. Of course, it also holds that $\pi_{\tilde{M}_{\mathrm{euclid}}}(\tilde{Z}(\mu)) \le \pi_{\tilde{M}_{\mathrm{euclid}}}(\tilde{Z}(0))$. As a consequence

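$$
\begin{aligned}
FR_{\mathrm{euclid}}(Z(\mu)) &= \log\left(\frac{1}{\pi_{M_{\mathrm{euclid}}}(Z(\mu))}\right) + \log\left(\frac{1}{\pi_{\tilde{M}_{\mathrm{euclid}}}(\tilde{Z}(\mu))}\right) \\
&\ge \log\left(\frac{1}{\pi_{M_{\mathrm{euclid}}}(Z(0))}\right) + \log\left(\frac{1}{\pi_{\tilde{M}_{\mathrm{euclid}}}(\tilde{Z}(0))}\right) = FR_{\mathrm{euclid}}(Z(0))
\end{aligned}
$$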

i.e. the FR_euclid test statistic is stochastically larger under H₁ than under H₀. This completes the proof. □
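The combination step used in the proof, a sum of log(1/π) terms over dependent partial tests, can be sketched generically as below. This is a minimal sketch of a permutation-based nonparametric combination with Fisher's combining function, not the author's implementation: the names `fisher_combine` and `npc_fisher` are hypothetical, and the partial statistics are taken as given inputs (any statistics that, like M_euclid, reject for small values) because the definition of M_euclid is not reproduced in this appendix.

```python
import numpy as np


def fisher_combine(pvalues):
    """Fisher combining function: sum of log(1/p) over the partial p-values."""
    p = np.asarray(pvalues, dtype=float)
    return np.sum(np.log(1.0 / p), axis=-1)


def npc_fisher(t_obs, t_perm):
    """Combine K dependent partial tests that reject for SMALL statistics.

    t_obs  : shape (K,), observed partial statistics.
    t_perm : shape (B, K), the same statistics on B permutations of the
             pooled sample. Each row must come from ONE permutation shared
             by all K tests, so that their dependence is preserved.
    Returns the observed combined statistic and its permutation p-value.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    t_perm = np.asarray(t_perm, dtype=float)
    t_all = np.vstack([t_obs, t_perm])  # row 0 holds the observed values

    # Partial p-value of row b for test k: share of the B + 1 values of
    # test k that are <= t_all[b, k]. It is never zero, so the log is finite.
    p_all = (t_all[None, :, :] <= t_all[:, None, :]).mean(axis=1)

    fr_all = fisher_combine(p_all)
    # The combined test rejects for LARGE values of the combined statistic.
    p_combined = np.mean(fr_all >= fr_all[0])
    return fr_all[0], p_combined


# Smaller partial p-values always give a larger combined statistic, which is
# the monotonicity the unbiasedness argument relies on.
larger = fisher_combine([0.01, 0.02])
smaller = fisher_combine([0.30, 0.40])
```

Reusing the same permutations for every partial test is the design choice that lets the combination account for the dependence between the tests without modeling it explicitly.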

Theorem 2

The FR_euclid test is consistent for testing H₀ : 𝛍 = 0 against H₁ : 𝛍 ≠ 0.

Proof. From Theorem 2 in Marozzi (2015a) it follows that the M_euclid test is consistent, that is

$$
\lim_{N \to \infty} \pi_{M_{\mathrm{euclid}}}(Z(\mu)) = 0,
$$

where N → ∞ means that m, n → ∞ with m/N → λ < 1. Of course, it also holds that $\lim_{N \to \infty} \pi_{\tilde{M}_{\mathrm{euclid}}}(\tilde{Z}(\mu)) = 0$. It follows that, for N → ∞,

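$$
FR_{\mathrm{euclid}}(Z(\mu)) = \log\left(\frac{1}{\pi_{M_{\mathrm{euclid}}}(Z(\mu))}\right) + \log\left(\frac{1}{\pi_{\tilde{M}_{\mathrm{euclid}}}(\tilde{Z}(\mu))}\right) \to \infty
$$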

and that $\lim_{N \to \infty} \pi_{FR_{\mathrm{euclid}}}(Z(\mu)) = 0$. As a consequence, as N diverges, the probability that the FR_euclid test rejects the null hypothesis when it is false tends to 1, and therefore the test is consistent. □

References

Bai, Z. and H. Saranadasa (1996): “Effect of high dimension: by an example of a two sample problem,” Stat. Sinica, 6, 311–329.

Brombin, C., E. Midena and L. Salmaso (2013): “Robust non-parametric tests for complex-repeated measures problems in ophthalmology,” Stat. Methods Med. Res., 22, 643–660. doi:10.1177/0962280211403659

Cai, T. T., W. Liu and Y. Xia (2014): “Two-sample test of high dimensional means under dependence,” J. R. Stat. Soc. B, 76, 349–372. doi:10.1111/rssb.12034

Chen, S. X. and Y. L. Qin (2010): “A two-sample test for high-dimensional data with applications to gene-set testing,” Ann. Stat., 38, 808–835. doi:10.1214/09-AOS716

Hajek, J., Z. Sidak and P. K. Sen (1998): Theory of rank tests, 2nd ed., Academic Press, New York.

Huang, Z., L. Lin, Y. Gao, Y. Chen, X. Yan, J. Xing and W. Hang (2011): “Bladder cancer determination via two urinary metabolites: A biomarker pattern approach,” Mol. Cell. Proteomics, 10, M111.007922. doi:10.1074/mcp.M111.007922

Jauregui, O., D. Corella, M. Ruiz-Canela, J. Salas-Salvado, M. Fito, E. Ros, R. Estruch and C. Andres-Lacueva (2015): “A metabolomics-driven approach to predict cocoa product consumption by designing a multimetabolite biomarker model in free-living subjects from the PREDIMED study,” Mol. Nutr. Food Res., 59, 212–220. doi:10.1002/mnfr.201400434

Jureckova, J. and J. Kalina (2012): “Nonparametric multivariate rank tests and their unbiasedness,” Bernoulli, 18, 229–251. doi:10.3150/10-BEJ326

Lacey, M., C. Baribault and M. Ehrlich (2013): “Modeling, simulation and analysis of methylation profiles from reduced representation bisulfite sequencing experiments,” Stat. Appl. Genet. Mol. Biol., 12, 723–742. doi:10.1515/sagmb-2013-0027

Marozzi, M. (2015a): “Multivariate multidistance tests for high-dimensional low sample size case-control studies,” Stat. Med., 34, 1511–1526. doi:10.1002/sim.6418

Marozzi, M. (2015b): “Does bad inference drive out good?,” Clin. Exp. Pharmacol. P., 42, 727–733. doi:10.1111/1440-1681.12422

Marozzi, M. (2016): “Multivariate tests based on interpoint distances with application to magnetic resonance imaging,” Stat. Methods Med. Res., 25, 2593–2610. doi:10.1177/0962280214529104

Nelsen, R. B. (2006): An introduction to copulas, 2nd ed., Springer Science+Business Media, New York.

Notterman, D. A., U. Alon, A. J. Sierk and A. J. Levine (2001): “Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays,” Cancer Res., 61, 3124–3130.

Pesarin, F. and L. Salmaso (2010): Permutation tests for complex data, Wiley, Chichester. doi:10.1002/9780470689516

Soussi, T. and K. G. Wiman (2015): “TP53: an oncogene in disguise,” Cell Death Differ., 22, 1239–1249. doi:10.1038/cdd.2015.53

Srivastava, M. S. and T. Kubokawa (2013): “Tests for multivariate analysis of variance in high dimension under non-normality,” J. Multivariate A., 115, 204–216. doi:10.1016/j.jmva.2012.10.011

Xia, J., D. I. Broadhurst, M. Wilson and D. S. Wishart (2013): “Translational biomarker discovery in clinical metabolomics: an introductory tutorial,” Metabolomics, 9, 280–299. doi:10.1007/s11306-012-0482-9

Yan, J. (2007): “Enjoy the joy of copulas: with a package copula,” J. Stat. Softw., 21, 1–21. doi:10.18637/jss.v021.i04

Zhang, J., Z. Huang, M. Chen, Y. Xia, F. L. Martin, W. Hang and H. Shen (2014a): “Urinary metabolome identifies signatures of oligozoospermic infertile men,” Fertil. Steril., 102, 44–53. doi:10.1016/j.fertnstert.2014.03.033

Zhang, J., X. Mu, Y. Xia, F. L. Martin, W. Hang, L. Liu, M. Tian, Q. Huang and H. Shen (2014b): “Metabolomic analysis reveals a unique urinary pattern in normozoospermic infertile men,” J. Proteome Res., 13, 3088–3099. doi:10.1021/pr5003142

Zhang, J., H. Shen, W. Xu, Y. Xia, D. B. Barr, X. Mu, X. Wang, L. Liu, Q. Huang and M. Tian (2014c): “Urinary metabolomics revealed arsenic internal dose-related metabolic alterations: a proof-of-concept study in a Chinese male cohort,” Environ. Sci. Technol., 48, 12265–12274. doi:10.1021/es503659w

Published Online: 2018-1-30
