Clustering via nonparametric density estimation

Statistics and Computing

Abstract

Although Hartigan (1975) had already proposed linking the identification of subpopulations to regions of high density of the underlying probability distribution, the development of cluster analysis methods has largely moved in other directions for reasons of computational convenience. Current computational resources allow us to reconsider this formulation and to develop clustering techniques aimed directly at identifying local modes of the density. Given a set of observations, a nonparametric estimate of the underlying density function is constructed, and subsets of points with high density are formed through suitable manipulation of the associated Delaunay triangulation. The method is illustrated with some numerical examples.
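
The idea in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' procedure, whose details are not shown in this preview. It assumes a Gaussian kernel with a hand-picked bandwidth and a single fixed density threshold, and it relies on the fact that in one dimension the Delaunay triangulation simply joins each point to its neighbours in sorted order. The function names `kernel_density` and `high_density_clusters` and the sample data are hypothetical.

```python
import math


def kernel_density(points, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / norm

    return density


def high_density_clusters(points, bandwidth, threshold):
    """Group points whose estimated density exceeds `threshold`.

    Assumption: data are one-dimensional, so Delaunay edges link consecutive
    sorted points. A cluster is then a run of consecutive sorted points that
    all lie above the threshold; a low-density point breaks the run.
    """
    density = kernel_density(points, bandwidth)
    clusters, current = [], []
    for p in sorted(points):
        if density(p) > threshold:
            current.append(p)
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters


# Hypothetical data: two tight groups and one isolated point between them.
sample = [0.0, 0.1, 0.2, 0.3, 2.5, 5.0, 5.1, 5.2, 5.3]
clusters = high_density_clusters(sample, bandwidth=0.3, threshold=0.2)
```

With these illustrative settings the isolated point falls below the threshold and separates the two groups. In more than one dimension the sorted-order adjacency would have to be replaced by the edges of an actual Delaunay triangulation, and the bandwidth and threshold would need a principled choice.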

References

  • Aitchison J. 1986. The Statistical Analysis of Compositional Data. Chapman & Hall, London.

  • Ankerst M., Breunig M.M., Kriegel H.P., and Sander J. 1999. OPTICS: ordering points to identify the clustering structure. In: International Conference on Management of Data (SIGMOD’99), ACM, pp. 49–60.

  • Barber C.B., Dobkin D.P., and Huhdanpaa H. 1996. The Quickhull algorithm for convex hulls. ACM Trans. Math. Software 22: 469–483.

  • Bowman A. and Foster P. 1993. Density based exploration of bivariate data. Statistics and Computing 3: 171–177.

  • Bowman A.W. and Azzalini A. 1997. Applied Smoothing Techniques for Data Analysis. Clarendon Press, Oxford.

  • Cuevas A., Febrero M., and Fraiman R. 2000. Estimating the number of clusters. Canad. J. Stat. 28: 367–382.

  • Cuevas A., Febrero M., and Fraiman R. 2001. Cluster analysis: a further approach based on density estimation. Computational Statistics & Data Analysis 36: 441–459.

  • Devroye L.P. and Wagner T.J. 1980. The strong uniform consistency of kernel density estimates. In: Multivariate Analysis, North-Holland, Vol. 5, pp. 59–77.

  • Ester M., Kriegel H.P., Sander J., and Xu X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, USA. ACM, pp. 226–231.

  • Forina M., Armanino C., Lanteri S., and Tiscornia E. 1983. Classification of olive oils from their fatty acid composition. In: H. Martens and H. J. Russwurm (Eds.), Food Research and Data Analysis, Applied Science Publishers: London, pp. 189–214.

  • Hartigan J.A. 1975. Clustering Algorithms. J. Wiley & Sons, New York.

  • Hubert L. and Arabie P. 1985. Comparing partitions. Journal of Classification 2: 193–218.

  • Nadaraya É.A. 1965. On non-parametric estimates of density functions and regression curves. Theory of Probability and its Applications (Transl. Teorija Verojatnostei i ee Primenenija) 10: 186–190.

  • Okabe A., Boots B.N., and Sugihara K. 1992. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. J. Wiley & Sons, New York.

  • R Development Core Team 2004. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

  • Rosolin T., Azzalini A., and Torelli N. 2003. Detecting clusters via nonparametric density estimation. In: Convegno SIS analisi statistica multivariata per le scienze economico-sociali, le scienze naturali e la tecnologia, Napoli, Italy. Società Italiana di Statistica, RCE edizioni.

  • Stuetzle W. 2003. Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. Journal of Classification 20: 25–47.

  • Wong M.A. and Lane T. 1983. A kth nearest neighbour clustering procedure. Journal of the Royal Statistical Society, Series B 45: 362–368.

Author information

Correspondence to Adelchi Azzalini.

Cite this article

Azzalini, A., Torelli, N. Clustering via nonparametric density estimation. Stat Comput 17, 71–80 (2007). https://doi.org/10.1007/s11222-006-9010-y
