Published by De Gruyter, January 30, 2018

Tests for comparison of multiple endpoints with application to omics data

  • Marco Marozzi

Abstract

In biomedical research, multiple endpoints are commonly analyzed in “omics” fields like genomics, proteomics and metabolomics. Traditional methods designed for low-dimensional data either perform poorly or are not applicable when analyzing high-dimensional data whose dimension is generally similar to, or even much larger than, the number of subjects. The complex biochemical interplay between hundreds (or thousands) of endpoints is reflected by complex dependence relations. The aim of the paper is to propose tests that are well suited to omics data because they do not require the normality assumption and remain powerful for small sample sizes, in the presence of complex dependence relations among endpoints, and when the number of endpoints is much larger than the number of subjects. Unbiasedness and consistency of the tests are proved, and their size and power are assessed numerically. It is shown that the proposed approach, based on the nonparametric combination of dependent interpoint distance tests, is very effective. Applications to genomics and metabolomics are discussed.

Acknowledgement

We are very grateful to Prof. Dr. H. Shen for kindly providing the second data set analyzed in Section 5.

  1. Conflict of Interest: The author has declared no conflict of interest.

A Appendix

Theorem 1

The FReuclid test is unbiased for testing H0 : 𝛍 = 0 against H1 : 𝛍 ≠ 0.

Proof. We consider the following additive model, which is equivalent to the location difference setting considered in Section 2,

$$
\begin{cases}
X_i = \mu + V_i, & i = 1, \ldots, m \\
Y_{m+j} = V_{m+j}, & j = 1, \ldots, n
\end{cases}
$$

where the V's are independent and identically distributed multivariate random variables with location 0 and variance-covariance matrix Σ with no infinite elements. Note that the V's are independent of one another, but their p components can be dependent. Let

$$
Z(0) = \left(Z_i(0),\ i = 1, \ldots, N\right) = \left(V_i,\ i = 1, \ldots, N\right)
$$

denote the pooled sample under the null hypothesis and let

$$
Z(\mu) = \left(Z_i(\mu),\ i = 1, \ldots, N\right) = \left(\mu + V_i,\ i = 1, \ldots, m;\ V_{m+j},\ j = 1, \ldots, n\right)
$$

denote the pooled sample under the alternative hypothesis. Define $\widetilde{Z}(0)$ and $\widetilde{Z}(\mu)$ similarly.
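As a rough illustration of the additive model and the two pooled samples, the sketch below draws the V's once and builds Z(0) and Z(μ) from the same draws. The function name `simulate_pooled_samples`, the equicorrelated Gaussian errors and the parameter `rho` are assumptions made here for illustration only; the proof itself requires only independent and identically distributed V's with location 0 and no infinite variance-covariance elements.

```python
import numpy as np

def simulate_pooled_samples(m, n, p, mu, rho=0.5, seed=0):
    """Return (Z0, Zmu), each of shape (m + n, p), built from the same V's.

    Hypothetical helper: the error distribution is an illustrative choice,
    not the one used in the paper.
    """
    rng = np.random.default_rng(seed)
    N = m + n
    # One shared factor per subject makes the p components dependent
    # (pairwise correlation rho) while subjects stay independent.
    shared = rng.standard_normal((N, 1))
    noise = rng.standard_normal((N, p))
    V = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * noise
    Z0 = V.copy()                             # pooled sample under H0
    Zmu = V.copy()                            # pooled sample under H1
    Zmu[:m] += np.asarray(mu, dtype=float)    # shift the first m rows only
    return Z0, Zmu

# p = 200 endpoints, N = 12 subjects: many more endpoints than subjects
Z0, Zmu = simulate_pooled_samples(m=5, n=7, p=200, mu=np.full(200, 0.5))
print(Z0.shape, Zmu.shape)
```

Building both samples from the same V's mirrors the proof, where Z(0) and Z(μ) differ only by the shift μ added to the first m observations.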

The FReuclid test rejects for large values of its statistic. Therefore, to prove unbiasedness we have to show that the FReuclid test statistic is stochastically larger when μ ≠ 0, i.e. under H1, than when μ = 0, i.e. under H0, as shown by Theorem 3 in Pesarin and Salmaso (2010), p. 138.

Theorem 1 in Marozzi (2015a) shows that the Meuclid test is unbiased. Therefore, when μ ≠ 0 the Meuclid test statistic is stochastically smaller than when μ = 0. It follows that

$$
\pi_{\mathrm{Meuclid}}(Z(\mu)) \overset{d}{\le} \pi_{\mathrm{Meuclid}}(Z(0)),
$$

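note that the Meuclid test rejects for small values of its statistic. Likewise, $\pi_{\widetilde{\mathrm{M}}\mathrm{euclid}}(\widetilde{Z}(\mu)) \overset{d}{\le} \pi_{\widetilde{\mathrm{M}}\mathrm{euclid}}(\widetilde{Z}(0))$. As a consequence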

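$$
\begin{aligned}
\mathrm{FReuclid}(Z(\mu)) &= \log\frac{1}{\pi_{\mathrm{Meuclid}}(Z(\mu))} + \log\frac{1}{\pi_{\widetilde{\mathrm{M}}\mathrm{euclid}}(\widetilde{Z}(\mu))} \\
&\overset{d}{\ge} \log\frac{1}{\pi_{\mathrm{Meuclid}}(Z(0))} + \log\frac{1}{\pi_{\widetilde{\mathrm{M}}\mathrm{euclid}}(\widetilde{Z}(0))} = \mathrm{FReuclid}(Z(0)),
\end{aligned}
$$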

i.e. the FReuclid test statistic is stochastically larger under H1 than under H0. This result completes the proof. □
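The passage from the ordering of the partial p-values to the ordering of the combined statistic uses the monotonicity of the map applied to each p-value. A brief sketch of that step, where the symbol g is introduced here only for illustration and the ordering is read in the stochastic sense used in the proof:

$$
g(\pi) = \log\frac{1}{\pi} = -\log \pi, \qquad g'(\pi) = -\frac{1}{\pi} < 0 \quad \text{for } 0 < \pi \le 1 .
$$

Since g is strictly decreasing, a stochastically smaller p-value under H1 gives a stochastically larger g(π), and this holds for each of the two terms summed in the FReuclid statistic.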

Theorem 2

The FReuclid test is consistent for testing H0 : 𝛍 = 0 against H1 : 𝛍 ≠ 0.

Proof. From Theorem 2 in Marozzi (2015a) it follows that the Meuclid test is consistent, that is

$$
\lim_{N \to \infty} \pi_{\mathrm{Meuclid}}(Z(\mu)) = 0,
$$

where N → ∞ means that m, n → ∞ with m/N → λ < 1. Likewise, $\lim_{N \to \infty} \pi_{\widetilde{\mathrm{M}}\mathrm{euclid}}(\widetilde{Z}(\mu)) = 0$. It follows that for N → ∞

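$$
\mathrm{FReuclid}(Z(\mu)) = \log\frac{1}{\pi_{\mathrm{Meuclid}}(Z(\mu))} + \log\frac{1}{\pi_{\widetilde{\mathrm{M}}\mathrm{euclid}}(\widetilde{Z}(\mu))} \to \infty
$$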

and that $\lim_{N \to \infty} \pi_{\mathrm{FReuclid}}(Z(\mu)) = 0$. As a consequence, as N diverges, the probability that the FReuclid test rejects the null hypothesis when it is false tends to 1, and therefore the test is consistent. □
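The combination used in both proofs, the sum of log(1/π) over two dependent partial tests, can be sketched as a permutation procedure. This is a minimal sketch under stated assumptions, not the paper's implementation: the two partial statistics `mean_between_distance` and `centroid_distance` are hypothetical stand-ins (both reject for large values, unlike the Meuclid statistic, which rejects for small values), and the function name `combined_permutation_test` and its parameters are introduced here for illustration.

```python
import numpy as np

def mean_between_distance(Z, m):
    """Stand-in partial statistic: mean Euclidean distance between groups."""
    X, Y = Z[:m], Z[m:]
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).mean()

def centroid_distance(Z, m):
    """Second stand-in: Euclidean distance between the group centroids."""
    return np.linalg.norm(Z[:m].mean(axis=0) - Z[m:].mean(axis=0))

def combined_permutation_test(Z, m, stats, B=499, seed=0):
    """Combine partial permutation tests through sum_k log(1 / pi_k)."""
    rng = np.random.default_rng(seed)
    N = Z.shape[0]
    # Row 0 holds the observed statistics, rows 1..B the permuted ones.
    T = np.empty((B + 1, len(stats)))
    T[0] = [s(Z, m) for s in stats]
    for b in range(1, B + 1):
        # The same permutation feeds every partial statistic, which keeps
        # the dependence between the partial tests intact.
        Zp = Z[rng.permutation(N)]
        T[b] = [s(Zp, m) for s in stats]
    # pi[b, k]: share of rows whose k-th statistic is >= that of row b.
    pi = (T[None, :, :] >= T[:, None, :]).mean(axis=1)
    combined = np.log(1.0 / pi).sum(axis=1)
    # Large combined values are significant.
    p_value = (combined >= combined[0]).mean()
    return combined[0], p_value

rng = np.random.default_rng(1)
Z = rng.standard_normal((16, 30))
Z[:8] += 2.0  # location shift in the first group
fr_obs, p_value = combined_permutation_test(
    Z, m=8, stats=[mean_between_distance, centroid_distance], B=199
)
print(fr_obs, p_value)
```

Because each partial p-value is computed on every permutation row, the null distribution of the combined statistic is obtained without modelling the dependence between the partial tests.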

References

Bai, Z. and H. Saranadasa (1996): “Effect of high dimension: by an example of a two sample problem,” Stat. Sinica, 6, 311–329.

Brombin, C., E. Midena and L. Salmaso (2013): “Robust non-parametric tests for complex-repeated measures problems in ophthalmology,” Stat. Methods Med. Res., 22, 643–660. DOI: 10.1177/0962280211403659.

Cai, T. T., W. Liu and Y. Xia (2014): “Two-sample test of high dimensional means under dependence,” J. R. Stat. Soc. B, 76, 349–372. DOI: 10.1111/rssb.12034.

Chen, S. X. and Y. L. Qin (2010): “A two-sample test for high-dimensional data with applications to gene-set testing,” Ann. Stat., 38, 808–835. DOI: 10.1214/09-AOS716.

Hajek, J., Z. Sidak and P. K. Sen (1998): Theory of rank tests, 2nd ed., Academic Press, New York.

Huang, Z., L. Lin, Y. Gao, Y. Chen, X. Yan, J. Xing and W. Hang (2011): “Bladder cancer determination via two urinary metabolites: A biomarker pattern approach,” Mol. Cell. Proteomics, 10, M111.007922. DOI: 10.1074/mcp.M111.007922.

Jauregui, O., D. Corella, M. Ruiz-Canela, J. Salas-Salvado, M. Fito, E. Ros, R. Estruch and C. Andres-Lacueva (2015): “A metabolomics-driven approach to predict cocoa product consumption by designing a multimetabolite biomarker model in free-living subjects from the PREDIMED study,” Mol. Nutr. Food Res., 59, 212–220. DOI: 10.1002/mnfr.201400434.

Jureckova, J. and J. Kalina (2012): “Nonparametric multivariate rank tests and their unbiasedness,” Bernoulli, 18, 229–251. DOI: 10.3150/10-BEJ326.

Lacey, M., C. Baribault and M. Ehrlich (2013): “Modeling, simulation and analysis of methylation profiles from reduced representation bisulfite sequencing experiments,” Stat. Appl. Genet. Mol. Biol., 12, 723–742. DOI: 10.1515/sagmb-2013-0027.

Marozzi, M. (2015a): “Multivariate multidistance tests for high-dimensional low sample size case-control studies,” Stat. Med., 34, 1511–1526. DOI: 10.1002/sim.6418.

Marozzi, M. (2015b): “Does bad inference drive out good?,” Clin. Exp. Pharmacol. P., 42, 727–733. DOI: 10.1111/1440-1681.12422.

Marozzi, M. (2016): “Multivariate tests based on interpoint distances with application to magnetic resonance imaging,” Stat. Methods Med. Res., 25, 2593–2610. DOI: 10.1177/0962280214529104.

Nelsen, R. B. (2006): An introduction to copulas, 2nd ed., Springer Science+Business Media, New York.

Notterman, D. A., U. Alon, A. J. Sierk and A. J. Levine (2001): “Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays,” Cancer Res., 61, 3124–3130.

Pesarin, F. and L. Salmaso (2010): Permutation tests for complex data, Wiley, Chichester. DOI: 10.1002/9780470689516.

Soussi, T. and K. G. Wiman (2015): “TP53: an oncogene in disguise,” Cell Death Differ., 22, 1239–1249. DOI: 10.1038/cdd.2015.53.

Srivastava, M. S. and T. Kubokawa (2013): “Tests for multivariate analysis of variance in high dimension under non-normality,” J. Multivariate A., 115, 204–216. DOI: 10.1016/j.jmva.2012.10.011.

Xia, J., D. I. Broadhurst, M. Wilson and D. S. Wishart (2013): “Translational biomarker discovery in clinical metabolomics: an introductory tutorial,” Metabolomics, 9, 280–299. DOI: 10.1007/s11306-012-0482-9.

Yan, J. (2007): “Enjoy the joy of copulas: with a package copula,” J. Stat. Softw., 21, 1–21. DOI: 10.18637/jss.v021.i04.

Zhang, J., Z. Huang, M. Chen, Y. Xia, F. L. Martin, W. Hang and H. Shen (2014a): “Urinary metabolome identifies signatures of oligozoospermic infertile men,” Fertil. Steril., 102, 44–53. DOI: 10.1016/j.fertnstert.2014.03.033.

Zhang, J., X. Mu, Y. Xia, F. L. Martin, W. Hang, L. Liu, M. Tian, Q. Huang and H. Shen (2014b): “Metabolomic analysis reveals a unique urinary pattern in normozoospermic infertile men,” J. Proteome Res., 13, 3088–3099. DOI: 10.1021/pr5003142.

Zhang, J., H. Shen, W. Xu, Y. Xia, D. B. Barr, X. Mu, X. Wang, L. Liu, Q. Huang and M. Tian (2014c): “Urinary metabolomics revealed arsenic internal dose-related metabolic alterations: a proof-of-concept study in a Chinese male cohort,” Environ. Sci. Technol., 48, 12265–12274. DOI: 10.1021/es503659w.

Published Online: 2018-1-30

©2018 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 18.4.2024 from https://www.degruyter.com/document/doi/10.1515/sagmb-2017-0033/html