TUNNEL CLUSTERING METHOD
- Authors: Aleskerov F.T. (1, 2), Myachin A.L. (1, 2), Yakuba V.I. (1, 2)
- Affiliations:
- (1) National Research University Higher School of Economics
- (2) V. A. Trapeznikov Institute of Control Science of Russian Academy of Science
- Issue: Volume 520, No. 1 (2024)
- Pages: 29-34
- Section: MATHEMATICS
- URL: https://rjeid.com/2686-9543/article/view/682687
- DOI: https://doi.org/10.31857/S2686954324060052
- EDN: https://elibrary.ru/KLEIQU
- ID: 682687
Abstract
We propose a novel method for rapid pattern analysis in high-dimensional numerical data, termed “tunnel clustering”. The main advantages of this method are its relatively low computational complexity, endogenous determination of cluster composition and number, and a high degree of interpretability of the final results. We describe three variations: one with fixed hyperparameters, an adaptive version, and a combined approach. Three fundamental properties of tunnel clustering are examined. Practical applications are demonstrated both on synthetic datasets containing 100,000 objects and on classical benchmark datasets.
About the authors
F. Aleskerov
National Research University Higher School of Economics; V. A. Trapeznikov Institute of Control Science of Russian Academy of Science
Email: alesk@hse.ru
Moscow, Russia; Moscow, Russia
A. Myachin
National Research University Higher School of Economics; V. A. Trapeznikov Institute of Control Science of Russian Academy of Science
Email: amyachin@hse.ru
Moscow, Russia; Moscow, Russia
V. Yakuba
National Research University Higher School of Economics; V. A. Trapeznikov Institute of Control Science of Russian Academy of Science
Email: yakuba@ipu.ru
Moscow, Russia; Moscow, Russia
