Skip to main content

Subspace Determination Through Local Intrinsic Dimensional Decomposition

  • Conference paper
  • First Online:
Similarity Search and Applications (SISAP 2019)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11807))

Included in the following conference series:

Abstract

Axis-aligned subspace clustering generally entails searching through enormous numbers of subspaces (feature combinations) and evaluation of cluster quality within each subspace. In this paper, we tackle the problem of identifying subsets of features with the most significant contribution to the formation of the local neighborhood surrounding a given data point. For each point, the recently-proposed Local Intrinsic Dimension (LID) model is used in identifying the axis directions along which features have the greatest local discriminability, or equivalently, the fewest number of components of LID that capture the local complexity of the data. In this paper, we develop an estimator of LID along axis projections, and provide preliminary evidence that this LID decomposition can indicate axis-aligned data subspaces that support the formation of clusters.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 59.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 74.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Achtert, E., Böhm, C., Kriegel, H.-P., Kröger, P., Müller-Gorman, I., Zimek, A.: Detection and visualization of subspace cluster hierarchies. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 152–163. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71703-4_15

    Chapter  Google Scholar 

  2. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of SIGMOD, pp. 94–105 (1998)

    Article  Google Scholar 

  3. Amsaleg, L., et al.: Estimating local intrinsic dimensionality. In: Proceedings of KDD, pp. 29–38 (2015)

    Google Scholar 

  4. Becker, R., Hafnaoui, I., Houle, M.E., Li, P., Zimek, A.: Subspace determination through local intrinsic dimensional decomposition: theory and experimentation. arXiv e-prints arXiv:1907.06771 (2019)

  5. Bernecker, T., et al.: Subspace similarity search: efficient k-NN queries in arbitrary subspaces. In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 555–564. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13818-8_38

    Chapter  Google Scholar 

  6. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “Nearest Neighbor” meaningful? In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49257-7_15

    Chapter  Google Scholar 

  7. Casanova, G., et al.: Dimensional testing for reverse \(k\)-nearest neighbor search. PVLDB 10(7), 769–780 (2017)

    Google Scholar 

  8. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of KDD, pp. 226–231 (1996)

    Google Scholar 

  9. François, D., Wertz, V., Verleysen, M.: The concentration of fractional distances. IEEE TKDE 19(7), 873–886 (2007)

    Google Scholar 

  10. Houle, M.E.: Dimensionality, discriminability, density and distance distributions. In: Proceedings of the ICDM Workshops, pp. 468–473 (2013)

    Google Scholar 

  11. Houle, M.E.: Local intrinsic dimensionality I an extreme-value-theoretic foundation for similarity applications. In: Beecks, C., Borutta, F., Kröger, P., Seidl, T. (eds.) SISAP 2017. LNCS, vol. 10609, pp. 64–79. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68474-1_5

    Chapter  Google Scholar 

  12. Houle, M.E.: Local intrinsic dimensionality II: multivariate analysis and distributional support. In: Beecks, C., Borutta, F., Kröger, P., Seidl, T. (eds.) SISAP 2017. LNCS, vol. 10609, pp. 80–95. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68474-1_6

    Chapter  Google Scholar 

  13. Houle, M.E., Kriegel, H.-P., Kröger, P., Schubert, E., Zimek, A.: Can shared-neighbor distances defeat the curse of dimensionality? In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 482–500. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13818-8_34

    Chapter  Google Scholar 

  14. Houle, M.E., Ma, X., Oria, V., Sun, J.: Efficient algorithms for similarity search in axis-aligned subspaces. In: Traina, A.J.M., Traina, C., Cordeiro, R.L.F. (eds.) SISAP 2014. LNCS, vol. 8821, pp. 1–12. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11988-5_1

    Chapter  Google Scholar 

  15. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

    Article  Google Scholar 

  16. Kadelburg, Z., Marjanović, M.: Interchanging two limits. Teach. Math. 8(1), 15–29 (2005)

    Google Scholar 

  17. Kriegel, H.P., Kröger, P., Zimek, A.: Clustering high dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM TKDD 3(1), 1–58 (2009)

    Article  Google Scholar 

  18. Ma, X., et al.: Characterizing adversarial subspaces using local intrinsic dimensionality. In: Proceedings of ICLR, pp. 1–15 (2018)

    Google Scholar 

  19. Ma, X., et al.: Dimensionality-driven learning with noisy labels. In: Proceedings of ICML, pp. 3361–3370 (2018)

    Google Scholar 

  20. Romano, S., Chelly, O., Nguyen, V., Bailey, J., Houle, M.E.: Measuring dependency via intrinsic dimensionality. In: ICPR 2016, pp. 1207–1212, December 2016

    Google Scholar 

  21. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)

    Article  Google Scholar 

  22. Rozza, A., Lombardi, G., Ceruti, C., Casiraghi, E., Campadelli, P.: Novel high intrinsic dimensionality estimators. Mach. Learn. 89(1–2), 37–65 (2012)

    Article  MathSciNet  Google Scholar 

  23. Sim, K., Gopalkrishnan, V., Zimek, A., Cong, G.: A survey on enhanced subspace clustering. Data Min. Knowl. Disc. 26(2), 332–397 (2013)

    Article  MathSciNet  Google Scholar 

  24. Strehl, A., Ghosh, J.: Cluster ensembles - a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)

    MathSciNet  MATH  Google Scholar 

  25. Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clustering comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, 2837–2854 (2010)

    MathSciNet  MATH  Google Scholar 

  26. Zimek, A., Schubert, E., Kriegel, H.P.: A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Min. 5(5), 363–387 (2012)

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgments

M. E. Houle was supported by JSPS Kakenhi Kiban (B) Research Grant 18H03296.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Michael E. Houle .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Becker, R., Hafnaoui, I., Houle, M.E., Li, P., Zimek, A. (2019). Subspace Determination Through Local Intrinsic Dimensional Decomposition. In: Amato, G., Gennaro, C., Oria, V., Radovanović , M. (eds) Similarity Search and Applications. SISAP 2019. Lecture Notes in Computer Science(), vol 11807. Springer, Cham. https://doi.org/10.1007/978-3-030-32047-8_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-32047-8_25

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32046-1

  • Online ISBN: 978-3-030-32047-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics