TUNNEL CLUSTERING METHOD

Abstract

We propose a novel method for rapid pattern analysis in high-dimensional numerical data, termed “tunnel clustering”. Its main advantages are relatively low computational complexity, endogenous determination of the number and composition of clusters, and highly interpretable final results. We describe three variants: one with fixed hyperparameters, an adaptive one, and a combined approach. Three fundamental properties of tunnel clustering are examined. Practical applications are demonstrated both on synthetic datasets containing 100,000 objects and on classical benchmark datasets.

About the authors

F. T. Aleskerov

National Research University Higher School of Economics; V. A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences

Email: alesk@hse.ru
Moscow, Russia; Moscow, Russia

A. L. Myachin

National Research University Higher School of Economics; V. A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences

Email: amyachin@hse.ru
Moscow, Russia; Moscow, Russia

V. I. Yakuba

National Research University Higher School of Economics; V. A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences

Email: yakuba@ipu.ru
Moscow, Russia; Moscow, Russia

Copyright (c) 2024 Russian Academy of Sciences