TUNNEL CLUSTERING METHOD
- Authors: Aleskerov F.T. (1, 2), Myachin A.L. (1, 2), Yakuba V.I. (1, 2)
- Affiliations:
  - (1) National Research University Higher School of Economics
  - (2) V. A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences
- Issue: Vol 520, No 1 (2024)
- Pages: 29-34
- Section: MATHEMATICS
- URL: https://rjeid.com/2686-9543/article/view/682687
- DOI: https://doi.org/10.31857/S2686954324060052
- EDN: https://elibrary.ru/KLEIQU
- ID: 682687
Abstract
We propose a novel method for rapid pattern analysis in high-dimensional numerical data, termed “tunnel clustering”. Its main advantages are relatively low computational complexity, endogenous determination of the number and composition of clusters, and highly interpretable final results. We describe three variations: one with fixed hyperparameters, an adaptive version, and a combined approach. Three fundamental properties of tunnel clustering are examined. Practical applications are demonstrated both on synthetic datasets containing 100,000 objects and on classical benchmark datasets.
About the authors
F. T. Aleskerov
National Research University Higher School of Economics; V. A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences
Email: alesk@hse.ru
Moscow, Russia; Moscow, Russia
A. L. Myachin
National Research University Higher School of Economics; V. A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences
Email: amyachin@hse.ru
Moscow, Russia; Moscow, Russia
V. I. Yakuba
National Research University Higher School of Economics; V. A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences
Email: yakuba@ipu.ru
Moscow, Russia; Moscow, Russia
