-
Machine Learning Informed by Micro and Mesoscopic Statistical Physics Methods for Community Detection
Authors:
Yijun Ran,
Junfan Yi,
Wei Si,
Michael Small,
Ke-ke Shang
Abstract:
Community detection plays a crucial role in understanding the structural organization of complex networks. Previous methods, particularly those from statistical physics, primarily focus on the analysis of mesoscopic network structures and often struggle to integrate fine-grained node similarities. To address this limitation, we propose a low-complexity framework that integrates machine learning to embed micro-level node-pair similarities into mesoscopic community structures. By leveraging ensemble learning models, our approach enhances both structural coherence and detection accuracy. Experimental evaluations on artificial and real-world networks demonstrate that our framework consistently outperforms conventional methods, achieving higher modularity and improved accuracy in NMI and ARI. Notably, when ground-truth labels are available, our approach yields the most accurate detection results, effectively recovering real-world community structures while minimizing misclassifications. To further explain our framework's performance, we analyze the correlation between node-pair similarity and evaluation metrics. The results reveal a strong and statistically significant correlation, underscoring the critical role of node-pair similarity in enhancing detection accuracy. Overall, our findings highlight the synergy between machine learning and statistical physics, demonstrating how machine learning techniques can enhance network analysis and uncover complex structural patterns.
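The modularity score cited above can be illustrated with a minimal sketch. It assumes the standard Newman-Girvan definition for an undirected, unweighted graph, Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The function name and the toy graph are hypothetical, and this is not the authors' framework, only the metric used to evaluate it.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # summed degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

print(modularity(edges, two_groups))  # the split along the bridge scores higher
print(modularity(edges, one_group))   # a single community scores zero
```

Splitting the toy graph along the bridge gives a positive score, while lumping all nodes together gives zero, which is the sense in which higher modularity indicates a more coherent partition.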
Submitted 18 April, 2025;
originally announced April 2025.
-
SUANPAN: Scalable Photonic Linear Vector Machine
Authors:
Ziyue Yang,
Chen Li,
Yuqia Ran,
Yongzhuo Li,
Xue Feng,
Kaiyu Cui,
Fang Liu,
Hao Sun,
Wei Zhang,
Yu Ye,
Fei Qiao,
Cun-Zheng Ning,
Jiaxing Wang,
Connie J. Chang-Hasnain,
Yidong Huang
Abstract:
Photonic linear operation is a promising approach to handling the extensive vector multiplications in artificial intelligence techniques, owing to the natural bosonic parallelism and high-speed information transmission of photonics. Although it is believed that maximizing the interaction of the light beams is necessary to fully utilize the parallelism, and tremendous efforts have been made in past decades, the achieved dimensionality of vector-matrix multiplication is very limited due to the difficulty of scaling up a tightly interconnected or highly coupled optical system. Additionally, there is still no universal photonic computing architecture that can be readily merged with existing computing systems to meet the computing power demand of AI techniques. Here, we propose a programmable and reconfigurable photonic linear vector machine that performs only the inner product of two vectors. It is formed by a series of independent basic computing units, each of which is just one pair of a light emitter and a photodetector. Since there is no interaction among the light beams inside, extreme scalability can be achieved by simply duplicating the independent basic computing unit, with no requirement for large-scale analog-to-digital converter and digital-to-analog converter arrays. Our architecture is inspired by the traditional Chinese Suanpan, or abacus, and is thus denoted as photonic SUANPAN. As a proof of principle, the SUANPAN architecture is implemented with an 8×8 vertical-cavity surface-emitting laser array and an 8×8 MoTe2 two-dimensional material photodetector array. We believe that our proposed photonic SUANPAN is capable of serving as a fundamental linear vector machine that can be readily merged with existing electronic digital computing systems and has the potential to enhance the computing power available to various future AI applications.
Submitted 31 October, 2024;
originally announced October 2024.
-
Uncovering multi-order Popularity and Similarity Mechanisms in Link Prediction by graphlet predictors
Authors:
Yong-Jian He,
Yijun Ran,
Zengru Di,
Tao Zhou,
Xiao-Ke Xu
Abstract:
Link prediction has become a critical problem in network science and has thus attracted increasing research interest. Popularity and similarity are two primary mechanisms in the formation of real networks. However, the roles of popularity and similarity mechanisms in link prediction across various domain networks remain poorly understood. Accordingly, this study used orbit degrees of graphlets to construct multi-order popularity- and similarity-based network link predictors, demonstrating that traditional popularity- and similarity-based indices can be efficiently represented in terms of orbit degrees. Moreover, we designed a supervised learning model that fuses multiple orbit-degree-based features and validated its link prediction performance. We also evaluated the mean absolute Shapley additive explanations of each feature within this model across 550 real-world networks from six domains. We observed that the homophily mechanism, which is a similarity-based feature, dominated social networks, with its win rate being 91%. Moreover, a different similarity-based feature was prominent in economic, technological, and information networks. Finally, no single feature dominated the biological and transportation networks. The proposed approach improves the accuracy and interpretability of link prediction, thus facilitating the analysis of complex networks.
Submitted 6 October, 2024; v1 submitted 18 August, 2024;
originally announced August 2024.
-
The maximum capability of a topological feature in link prediction
Authors:
Yijun Ran,
Xiao-Ke Xu,
Tao Jia
Abstract:
Networks offer a powerful approach to modeling complex systems by representing the underlying set of pairwise interactions. Link prediction is the task that predicts links of a network that are not directly visible, with profound applications in biological, social, and other complex systems. Despite intensive utilization of the topological feature in this task, it is unclear to what extent a feature can be leveraged to infer missing links. Here, we aim to unveil the capability of a topological feature in link prediction by identifying its prediction performance upper bound. We introduce a theoretical framework that is compatible with different indexes to gauge the feature, different prediction approaches to utilize the feature, and different metrics to quantify the prediction performance. The maximum capability of a topological feature follows a simple yet theoretically validated expression, which only depends on the extent to which the feature is held in missing and nonexistent links. Because a family of indexes based on the same feature shares the same upper bound, the potential of all others can be estimated from one single index. Furthermore, a feature's capability is lifted in the supervised prediction, which can be mathematically quantified, allowing us to estimate the benefit of applying machine learning algorithms. The universality of the pattern uncovered is empirically verified by 550 structurally diverse networks. The findings have applications in feature and method selection, and shed light on network characteristics that make a topological feature effective in link prediction.
Submitted 19 April, 2024; v1 submitted 30 June, 2022;
originally announced June 2022.
-
A novel similarity measure for mining missing links in long-path networks
Authors:
Yijun Ran,
Tianyu Liu,
Tao Jia,
Xiao-Ke Xu
Abstract:
Network information mining is the study of network topology, which answers a large number of application-based questions about the structural evolution and function of a real system. For example, the questions can relate to how the real system evolves or how individuals interact with each other in social networks. Although the evolution of a real system may appear regular, capturing patterns across the whole evolutionary process is not trivial. Link prediction is one of the most important techniques in network information mining and can help us understand the laws governing a real system's evolution. Link prediction aims to uncover missing links or quantify the likelihood of the emergence of nonexistent links from known network structures. Most existing link prediction methods focus on short-path networks, which usually contain many closed triangular structures. However, these algorithms perform poorly on highly sparse or long-path networks. Here, we propose a new index based on the principles of Structural Equivalence and Shortest Path Length ($SESPL$) to estimate the likelihood of link existence in long-path networks. In tests on 548 real networks, we find that $SESPL$ is more effective and efficient than other similarity-based predictors in long-path networks. We also explore the performance of the $SESPL$ predictor and embedding-based approaches via machine learning techniques, and $SESPL$ achieves a gain of 44.09% over $GraphWave$ and 7.93% over $Node2vec$. Finally, according to the matrix of Maximal Information Coefficient ($MIC$) values between all the similarity-based predictors, $SESPL$ adds a new, independent feature to the space of traditional similarity features.
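The difficulty that neighborhood-based predictors face on long-path networks can be seen in a small sketch. It uses the classic common-neighbors index as a stand-in for a traditional similarity-based predictor. It does not implement $SESPL$, whose definition is not given in the abstract. The function name and both toy graphs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def common_neighbor_scores(edges):
    """Score every non-adjacent node pair by its number of shared neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only unlinked pairs are link candidates
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


# Hypothetical short-path graph: two triangles sharing the edge (1, 2).
triangle_rich = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
# Hypothetical long-path graph: a simple chain with no triangles.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]

print(common_neighbor_scores(triangle_rich))
print(common_neighbor_scores(chain))
```

In the chain, every candidate pair more than two hops apart receives a score of zero, so the index cannot rank those pairs against each other. This is the kind of limitation that motivates indices using longer-range information such as shortest path length.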
Submitted 11 October, 2021;
originally announced October 2021.
-
A generalized linear threshold model for an improved description of the spreading dynamics
Authors:
Yijun Ran,
Xiaomin Deng,
Xiaomeng Wang,
Tao Jia
Abstract:
Many spreading processes in real life can be considered complex contagions, and the linear threshold (LT) model is often applied as a representative model for this mechanism. Despite its intensive usage, the LT model suffers from several limitations in describing the time evolution of the spreading. First, the discrete time step that captures the speed of the spreading is vaguely defined. Second, the synchronous updating rule infects nodes in batches and therefore cannot take individual differences into account. Finally, the LT model is incompatible with existing models for simple contagion. Here we consider a generalized linear threshold (GLT) model for the continuous-time stochastic complex contagion process that can be efficiently implemented by the Gillespie algorithm. The time in this model has a clear mathematical definition and the updating order is rigidly defined. We find that the traditional LT model systematically underestimates the spreading speed and the randomness in the spreading sequence order. We also show that the GLT model works seamlessly with the susceptible-infected (SI) or susceptible-infected-recovered (SIR) model. One can easily combine them to model a hybrid spreading process in which simple contagion accumulates the critical mass for the complex contagion that leads to global cascades. Overall, the GLT model we propose can be a useful tool for studying complex contagion, especially the time evolution of the spreading.
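The batch behavior of synchronous updating can be made concrete with a minimal sketch of the traditional LT model. It assumes uniform edge weights, so a node activates once the fraction of its active neighbors reaches its threshold. The function name, graph, and thresholds are hypothetical, and the sketch does not implement the GLT model or the Gillespie algorithm.

```python
def linear_threshold(neighbors, thresholds, seeds):
    """Classic synchronous LT model with uniform edge weights.

    neighbors: dict mapping each node to a list of its neighbors.
    thresholds: dict mapping each node to a threshold in [0, 1].
    seeds: initially active nodes.
    Returns the list of batches, one set of newly active nodes per step.
    """
    active = set(seeds)
    batches = [set(seeds)]
    while True:
        # Every inactive node is checked against the same snapshot of `active`,
        # so all nodes that qualify in a step are activated together.
        newly_active = {
            node
            for node, nbrs in neighbors.items()
            if node not in active
            and nbrs
            and sum(n in active for n in nbrs) / len(nbrs) >= thresholds[node]
        }
        if not newly_active:
            return batches
        active |= newly_active
        batches.append(newly_active)


# Hypothetical five-node graph with a common threshold of 0.5.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
thresholds = {node: 0.5 for node in neighbors}

for step, batch in enumerate(linear_threshold(neighbors, thresholds, seeds={0})):
    print(step, sorted(batch))
```

Nodes 1 and 2 become active in the same step and there is no ordering between them, which is the loss of individual timing that the abstract attributes to synchronous updating.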
Submitted 16 August, 2020;
originally announced August 2020.