New Clustering Method Simplifies Analysis of Large Data Sets

Researchers from HSE University and the Institute of Control Sciences of the Russian Academy of Sciences have proposed a new method of data analysis: tunnel clustering. It allows for the rapid identification of groups of similar objects and requires fewer computational resources than traditional methods. Depending on the data configuration, the algorithm can operate dozens of times faster than its counterparts. The study was published in the journal Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia.

Each year, the volume of information requiring processing continues to grow. Data comes from a variety of sources: scientific research, financial reports, medical examinations, and many others. Clustering methods—which group data based on similar characteristics—are used to detect patterns and organise information within such large datasets. These groupings are known as clusters.

One of the most widely used clustering methods is the k-means algorithm. It divides data into a predetermined number of clusters, starting from an initial choice of their centres (centroids). However, this method has a limitation: the number of clusters must be known beforehand, which is not always possible when dealing with complex data. Scientists from HSE University and the V.A. Trapeznikov Institute of Control Sciences have proposed a new approach that removes this requirement: tunnel clustering. Unlike the k-means method, this algorithm does not require the number of clusters to be set in advance; it determines the necessary number itself by analysing the data structure.
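To make the k-means limitation concrete, here is a minimal sketch of the standard algorithm (Lloyd's iteration) in plain Python. The function name `kmeans`, the toy points, and the parameters `iterations` and `seed` are illustrative assumptions, not taken from the study. Note that the caller has to supply `k` before the data has been examined at all.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; the number of clusters k must be given up front."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centres are picked from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Toy data with two obvious groups; k=2 has to be chosen by the user.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

If `k` were set to 3 or 4 here, the algorithm would still return that many clusters, whether or not the data supports them.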

‘The algorithm forms “tunnels” in the data—regions in multidimensional space where objects with similar characteristics group together,’ explained Fuad Aleskerov, Head of the Department of Mathematics at the HSE Faculty of Economic Sciences. ‘Users can choose from three modes of operation: with fixed cluster boundaries, with adaptive boundaries that adjust to the data structure, or a combined approach. This makes the method flexible and suitable for various types of tasks.’
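The article does not spell out how the tunnels are built, so the following is only a loose illustration of the fixed-boundaries idea, not the published algorithm. It assumes that each characteristic's range is cut into equal-width intervals and that objects falling into the same interval on every characteristic share a group. The function name `fixed_boundary_groups` and the parameter `bins_per_feature` are hypothetical.

```python
from collections import defaultdict

def fixed_boundary_groups(objects, bins_per_feature=3):
    """Hypothetical sketch: group objects by the interval each feature falls into."""
    columns = list(zip(*objects))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def signature(obj):
        sig = []
        for value, low, high in zip(obj, lows, highs):
            # Equal-width intervals (an assumption); fall back to 1.0 for constant features.
            width = (high - low) / bins_per_feature or 1.0
            # Clamp so the maximum value lands in the last interval.
            sig.append(min(int((value - low) / width), bins_per_feature - 1))
        return tuple(sig)

    groups = defaultdict(list)
    for index, obj in enumerate(objects):
        groups[signature(obj)].append(index)
    return dict(groups)

objects = [(1.0, 10.0), (1.2, 11.0), (9.0, 10.5), (8.8, 90.0), (9.1, 88.0)]
groups = fixed_boundary_groups(objects)
print(len(groups))  # 3 groups emerge without specifying a cluster count
print(groups)
```

Under these assumptions the number of groups follows from the data and the interval boundaries rather than from a preset cluster count, which mirrors the property described in the article. The adaptive and combined modes are not sketched here because the article gives no detail on how the boundaries adjust.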

The method was tested on a synthetic (artificially generated) dataset of 100,000 objects, as well as on real-world tasks in public administration and the banking sector.

Visualisation of the original data and the results of tunnel clustering in a four-dimensional parallel coordinates system.
© Aleskerov, F.T., Myachin, A.L. & Yakuba, V.I. Tunnel Clustering Method. Dokl. Math. 110, 474–479 (2024)

The main advantage of the new method is its speed. Classical algorithms demand significant computational resources; depending on the data configuration, tunnel clustering can perform the same analysis dozens of times faster.

In addition, the researchers introduced the concept of the ‘transition degree’—a parameter indicating how many characteristics of an object must change for it to be classified into a different cluster. This helps assess the clarity of cluster boundaries and identify objects situated at the intersection of different groups.
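As a rough illustration of the transition degree, suppose each cluster is described by a tuple of interval indices, one per characteristic. This representation is an assumption made for the sketch; the article does not say how the authors compute the parameter. The transition degree of an object can then be read as the smallest number of positions in which its tuple differs from the tuple of any other cluster. The function name `transition_degree` is hypothetical.

```python
def transition_degree(signature, cluster_signatures):
    """Smallest number of characteristics that must change to land in another cluster.

    Returns None when there is no other cluster to move to.
    """
    return min(
        (sum(a != b for a, b in zip(signature, other))
         for other in cluster_signatures
         if other != signature),
        default=None,
    )

# Hypothetical cluster descriptions over three characteristics.
clusters = [(0, 0, 1), (0, 1, 1), (2, 2, 0)]

borderline = transition_degree((0, 0, 1), clusters)  # one change reaches (0, 1, 1)
isolated = transition_degree((2, 2, 0), clusters)    # all three must change
print(borderline, isolated)  # 1 3
```

A low value marks an object close to a cluster boundary, while a high value marks one that sits firmly inside its group.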

‘People are generating more and more data, and the pace is only accelerating. According to the latest Digital 2025: Global Overview Report, as of early 2025, there were 5.56 billion internet users—nearly 68% of the global population. Adults spend an average of 6 hours and 38 minutes online each day, communicating, working, watching videos, and consuming content,’ said Alexey Myachin, Senior Research Fellow at the HSE International Centre for Decision Choice and Analysis. ‘Companies that ignore data analysis are losing vast sums of money.’

The authors continue to refine the algorithm. Their ongoing work includes research into dimensionality reduction, which should further decrease the time required to identify patterns in data.

The study was carried out with partial support from the Russian Science Foundation.
