Abstract
In biomedical research, multiple endpoints are commonly analyzed in “omics” fields such as genomics, proteomics and metabolomics. Traditional methods designed for low-dimensional data either perform poorly or are not applicable to high-dimensional data, where the number of endpoints is similar to, or even much larger than, the number of subjects. The complex biochemical interplay between hundreds (or thousands) of endpoints is reflected in complex dependence relations. The aim of the paper is to propose tests that are well suited to omics data because they do not require the normality assumption and remain powerful for small sample sizes, in the presence of complex dependence relations among endpoints, and when the number of endpoints is much larger than the number of subjects. Unbiasedness and consistency of the tests are proved, and their size and power are assessed numerically. The proposed approach, based on the nonparametric combination of dependent interpoint distance tests, is shown to be very effective. Applications to genomics and metabolomics are discussed.
Acknowledgement
We are very grateful to Prof. Dr. H. Shen for kindly providing the second data set analyzed in Section 5.
Conflict of Interest: The author has declared no conflict of interest.
A Appendix
Theorem 1
The FReuclid test is unbiased for testing H0 : 𝛍 = 0 against H1 : 𝛍 ≠ 0.
Proof. We consider the following additive model, which is equivalent to the location difference setting considered in Section 2,
where the Vs are independent and identically distributed multivariate random variables with location 0 and variance-covariance matrix Σ whose elements are all finite. Note that the Vs are mutually independent, but their p components may be dependent. Let
denote the pooled sample under the null hypothesis and let
denote the pooled sample under the alternative hypothesis. Define similarly
The FReuclid test rejects for large values of its statistic; therefore, to prove unbiasedness we have to show that the FReuclid test statistic is stochastically larger when μ ≠ 0, i.e. under H1, than when μ = 0, i.e. under H0, as shown by Theorem 3 in Pesarin and Salmaso (2010, p. 138).
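For reference, the stochastic ordering invoked here can be written out explicitly. The symbols T (a generic test statistic that rejects for large values), c (a threshold) and α′ are introduced only for this illustration and are assumptions, not the notation of the paper.

```latex
% T: generic test statistic rejecting for large values (illustrative notation).
% "T is stochastically larger under H_1 than under H_0" means
\[
  \Pr\{T \ge c \mid H_1\} \;\ge\; \Pr\{T \ge c \mid H_0\}
  \quad \text{for every } c \in \mathbb{R}.
\]
% Consequently, for any fixed threshold c with \Pr\{T \ge c \mid H_0\} = \alpha',
% the rejection probability under H_1 is at least \alpha'.
```

In a permutation test the critical value depends on the observed data; that case is the one addressed by the cited theorem and is not reproduced in this sketch.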
Theorem 1 in Marozzi (2015a) shows that the Meuclid test is unbiased. Therefore when μ ≠ 0 the Meuclid test statistic is stochastically smaller than when μ = 0. It follows that
note that the Meuclid test rejects for small values of its statistic. It also holds that
i.e. the FReuclid test statistic is stochastically larger under H1 than under H0. This result completes the proof. □
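As a numerical illustration of a permutation test that rejects for large values of a statistic built from Euclidean interpoint distances, the following minimal sketch may help. It does not implement the FReuclid or Meuclid statistics, whose definitions are not given in this appendix. It uses the well-known energy-type two-sample statistic as a stand-in, and the function names `energy_statistic` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def energy_statistic(x, y):
    """Energy-type two-sample statistic (stand-in, NOT the FReuclid statistic).

    x, y: arrays of shape (m, p) and (n, p). Larger values indicate that
    between-sample distances exceed within-sample distances.
    """
    def mean_dist(a, b):
        # Mean Euclidean distance over all ordered pairs (a_i, b_j).
        diff = a[:, None, :] - b[None, :, :]
        return np.linalg.norm(diff, axis=2).mean()

    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Monte Carlo permutation p-value; the test rejects for large values."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    m = x.shape[0]
    t_obs = energy_statistic(x, y)
    count = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups.
        idx = rng.permutation(pooled.shape[0])
        t_perm = energy_statistic(pooled[idx[:m]], pooled[idx[m:]])
        count += int(t_perm >= t_obs)
    # The "+1" terms count the observed statistic among the permutations.
    return (1 + count) / (n_perm + 1)
```

For example, with two samples of 10 subjects and 200 endpoints each, calling `permutation_pvalue(x, y)` returns a value in (0, 1]; small values are evidence against equality of the two distributions. The sample sizes here are arbitrary and chosen only to mimic a setting with many more endpoints than subjects.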
Theorem 2
The FReuclid test is consistent for testing H0 : 𝛍 = 0 against H1 : 𝛍 ≠ 0.
Proof. From Theorem 2 in Marozzi (2015a) it follows that the Meuclid test is consistent, that is
where N → ∞ means that m, n → ∞ with
and that
References
Bai, Z. and H. Saranadasa (1996): “Effect of high dimension: by an example of a two sample problem,” Stat. Sinica, 6, 311–329.
Brombin, C., E. Midena and L. Salmaso (2013): “Robust non-parametric tests for complex-repeated measures problems in ophthalmology,” Stat. Methods Med. Res., 22, 643–660. DOI: 10.1177/0962280211403659.
Cai, T. T., W. Liu and Y. Xia (2014): “Two-sample test of high dimensional means under dependence,” J. R. Stat. Soc. B, 76, 349–372. DOI: 10.1111/rssb.12034.
Chen, S. X. and Y. L. Qin (2010): “A two-sample test for high-dimensional data with applications to gene-set testing,” Ann. Stat., 38, 808–835. DOI: 10.1214/09-AOS716.
Hajek, J., Z. Sidak and P. K. Sen (1998): Theory of rank tests, 2nd ed., Academic Press, New York.
Huang, Z., L. Lin, Y. Gao, Y. Chen, X. Yan, J. Xing and W. Hang (2011): “Bladder cancer determination via two urinary metabolites: A biomarker pattern approach,” Mol. Cell. Proteomics, 10, M111.007922. DOI: 10.1074/mcp.M111.007922.
Jauregui, O., D. Corella, M. Ruiz-Canela, J. Salas-Salvado, M. Fito, E. Ros, R. Estruch and C. Andres-Lacueva (2015): “A metabolomics-driven approach to predict cocoa product consumption by designing a multimetabolite biomarker model in free-living subjects from the PREDIMED study,” Mol. Nutr. Food Res., 59, 212–220. DOI: 10.1002/mnfr.201400434.
Jureckova, J. and J. Kalina (2012): “Nonparametric multivariate rank tests and their unbiasedness,” Bernoulli, 18, 229–251. DOI: 10.3150/10-BEJ326.
Lacey, M., C. Baribault and M. Ehrlich (2013): “Modeling, simulation and analysis of methylation profiles from reduced representation bisulfite sequencing experiments,” Stat. Appl. Genet. Mol. Biol., 12, 723–742. DOI: 10.1515/sagmb-2013-0027.
Marozzi, M. (2015a): “Multivariate multidistance tests for high-dimensional low sample size case-control studies,” Stat. Med., 34, 1511–1526. DOI: 10.1002/sim.6418.
Marozzi, M. (2015b): “Does bad inference drive out good?,” Clin. Exp. Pharmacol. P., 42, 727–733. DOI: 10.1111/1440-1681.12422.
Marozzi, M. (2016): “Multivariate tests based on interpoint distances with application to magnetic resonance imaging,” Stat. Methods Med. Res., 25, 2593–2610. DOI: 10.1177/0962280214529104.
Nelsen, R. B. (2006): An introduction to copulas, 2nd ed., Springer Science+Business Media, New York.
Notterman, D. A., U. Alon, A. J. Sierk and A. J. Levine (2001): “Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays,” Cancer Res., 61, 3124–3130.
Pesarin, F. and L. Salmaso (2010): Permutation tests for complex data, Wiley, Chichester. DOI: 10.1002/9780470689516.
Soussi, T. and K. G. Wiman (2015): “TP53: an oncogene in disguise,” Cell Death Differ., 22, 1239–1249. DOI: 10.1038/cdd.2015.53.
Srivastava, M. S. and T. Kubokawa (2013): “Tests for multivariate analysis of variance in high dimension under non-normality,” J. Multivariate A., 115, 204–216. DOI: 10.1016/j.jmva.2012.10.011.
Xia, J., D. I. Broadhurst, M. Wilson and D. S. Wishart (2013): “Translational biomarker discovery in clinical metabolomics: an introductory tutorial,” Metabolomics, 9, 280–299. DOI: 10.1007/s11306-012-0482-9.
Yan, J. (2007): “Enjoy the joy of copulas: with a package copula,” J. Stat. Softw., 21, 1–21. DOI: 10.18637/jss.v021.i04.
Zhang, J., Z. Huang, M. Chen, Y. Xia, F. L. Martin, W. Hang and H. Shen (2014a): “Urinary metabolome identifies signatures of oligozoospermic infertile men,” Fertil. Steril., 102, 44–53. DOI: 10.1016/j.fertnstert.2014.03.033.
Zhang, J., X. Mu, Y. Xia, F. L. Martin, W. Hang, L. Liu, M. Tian, Q. Huang and H. Shen (2014b): “Metabolomic analysis reveals a unique urinary pattern in normozoospermic infertile men,” J. Proteome Res., 13, 3088–3099. DOI: 10.1021/pr5003142.
Zhang, J., H. Shen, W. Xu, Y. Xia, D. B. Barr, X. Mu, X. Wang, L. Liu, Q. Huang and M. Tian (2014c): “Urinary metabolomics revealed arsenic internal dose-related metabolic alterations: a proof-of-concept study in a Chinese male cohort,” Environ. Sci. Technol., 48, 12265–12274. DOI: 10.1021/es503659w.
©2018 Walter de Gruyter GmbH, Berlin/Boston