Search | arXiv e-print repository

Scalable Complexity Control Facilitates Reasoning Ability of LLMs

Authors: Liangkai Hang, Junjie Yao, Zhiwei Bai, Tianyi Chen, Yang Chen, Rongjie Diao, Hezhou Li, Pengxiao Lin, Zhiwei Wang, Cheng Xu, Zhongwang Zhang, Zhangchen Zhou, Zhiyu Li, Zehao Lin, Kai Chen, Feiyu Xiong, Yaoyu Zhang, Weinan E, Hongkang Yang, Zhi-Qin John Xu

Abstract: The reasoning ability of large language models (LLMs) has been rapidly advancing in recent years, attracting interest in more fundamental approaches that can reliably enhance their generalizability. This work demonstrates that model complexity control, conveniently implementable by adjusting the initialization rate and weight decay coefficient, improves the scaling law of LLMs consistently over varying model sizes and data sizes. This gain is further illustrated by comparing the benchmark performance of 2.4B models pretrained on 1T tokens with different complexity hyperparameters. Instead of fixing the initialization std, we found that a constant initialization rate (the exponent of std) enables the scaling law to descend faster in both model and data sizes. These results indicate that complexity control is a promising direction for the continual advancement of LLMs.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.21138 [pdf, ps, other]

Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis

Authors: Tianyi Xu, Hongjie Chen, Wang Qing, Lv Hang, Jian Kang, Li Jie, Zhennan Lin, Yongxiang Li, Xie Lei

Abstract: Large-scale training corpora have significantly improved the performance of ASR models. Unfortunately, due to the relative scarcity of data, Chinese accents and dialects remain a challenge for most ASR models. Recent advancements in self-supervised learning have shown that self-supervised pre-training, combined with large language models (LLM), can effectively enhance ASR performance in low-resource scenarios. We aim to investigate the effectiveness of this paradigm for Chinese dialects. Specifically, we pre-train a Data2vec2 model on 300,000 hours of unlabeled dialect and accented speech data and do alignment training on a supervised dataset of 40,000 hours. Then, we systematically examine the impact of various projectors and LLMs on Mandarin, dialect, and accented speech recognition performance under this paradigm. Our method achieved SOTA results on multiple dialect datasets, including Kespeech. We will open-source our work to promote reproducible research.

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2504.04381 [pdf, other]

Error analysis of a Euler finite element scheme for Natural convection model with variable density

Authors: Li Hang, Chenyang Li

Abstract: In this paper, we derive first-order Euler finite element discretization schemes for a time-dependent natural convection model with variable density (NCVD). The model is governed by the variable density Navier-Stokes equations coupled with a parabolic partial differential equation that describes the evolution of temperature. Stability and error estimates for the velocity, pressure, density and temperature in the $L^2$-norm are proved by using finite element approximations in space and finite differences in time. Finally, numerical results are shown to support the theoretical analysis.

Submitted 19 May, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

arXiv:2502.15185 [pdf]

Key Body Posture Characteristics of Short-distance Speed Skaters at the Start Based on Artificial Intelligence

Authors: Zhang Xueliana, Fang Yingjieb, Liu Hang

Abstract: Objective: To conduct biomechanical analysis on the starting technique of male short-distance speed skating athletes in China and determine the key factors affecting the effectiveness of the starting movement. Methods: 13 high-level male short-distance speed skating athletes were selected as the test subjects, and kinematic data were collected using an artificial intelligence video capture and analysis system. The body posture features and their effects on the starting movement performance were analyzed in the three stages of starting preparation, starting, and sprinting. Results: The post-stability angle, anterior knee angle of the front leg, posterior knee angle of the rear leg, and stride length showed moderate to high positive correlations with the starting speed during the starting preparation stage. The trunk angle showed a high negative correlation with the starting speed. The trunk angle (TO4, TD4, TO6, TD6), hip angle (TO1, TO4, TO6), and knee angle (TD1) showed moderate to high negative correlations with the effectiveness of the starting movement during the starting and sprinting stages. The knee angle (TD2), ice-contact angle (TD2, TD4, TD5, TD6), and propulsion angle (TO1, TO4, TO7) showed moderate positive correlations with the effectiveness of the starting movement. Conclusion: Stride length, left knee angle, and post-stability angle are the key factors affecting the starting speed. The larger the post-stability angle and left knee angle and the longer the stride length, the faster the starting speed. During the starting and sprinting stages, the smaller the ice-contact angle and propulsion angle, and the greater the trunk angle and hip angle changes, the more effective the starting movement.

Submitted 20 February, 2025; originally announced February 2025.

arXiv:2411.06493 [pdf, other]

LProtector: An LLM-driven Vulnerability Detection System

Authors: Ze Sheng, Fenghua Wu, Xiangwu Zuo, Chao Li, Yuxin Qiao, Lei Hang

Abstract: This paper presents LProtector, an automated vulnerability detection system for C/C++ codebases driven by the large language model (LLM) GPT-4o and Retrieval-Augmented Generation (RAG). As software complexity grows, traditional methods face challenges in detecting vulnerabilities effectively. LProtector leverages GPT-4o's powerful code comprehension and generation capabilities to perform binary classification and identify vulnerabilities within target codebases. We conducted experiments on the Big-Vul dataset, showing that LProtector outperforms two state-of-the-art baselines in terms of F1 score, demonstrating the potential of integrating LLMs with vulnerability detection.

Submitted 14 November, 2024; v1 submitted 10 November, 2024; originally announced November 2024.

Comments: 5 pages, 4 figures. This is a preprint version of the article. The final version will be published in the proceedings of the IEEE conference

arXiv:2405.00317 [pdf, other]

Input gradient annealing neural network for solving low-temperature Fokker-Planck equations

Authors: Liangkai Hang, Dan Hu, Zhi-Qin John Xu

Abstract: We present a novel yet simple deep learning approach, called input gradient annealing neural network (IGANN), for solving stationary Fokker-Planck equations. Traditional methods, such as finite difference and finite elements, suffer from the curse of dimensionality. Neural network based algorithms are meshless methods, which can avoid the curse of dimensionality. However, at low temperature, when directly solving a stationary Fokker-Planck equation with more than two metastable states in the generalized potential landscape, the small eigenvalue introduces numerical difficulties due to a large condition number. To overcome these problems, we introduce the IGANN method, which uses a penalty of negative input gradient annealing during the training. We demonstrate that the IGANN method can effectively solve high-dimensional and low-temperature Fokker-Planck equations through our numerical experiments.

Submitted 1 September, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2401.01220 [pdf, other]

Solving multiscale dynamical systems by deep learning

Authors: Junjie Yao, Yuxiao Yi, Liangkai Hang, Weinan E, Weizong Wang, Yaoyu Zhang, Tianhan Zhang, Zhi-Qin John Xu

Abstract: Multiscale dynamical systems, modeled by high-dimensional stiff ordinary differential equations (ODEs) with wide-ranging characteristic timescales, arise across diverse fields of science and engineering, but their numerical solvers often encounter severe efficiency bottlenecks. This paper introduces a novel DeePODE method, which consists of an Evolutionary Monte Carlo Sampling method (EMCS) and an efficient end-to-end deep neural network (DNN) to predict multiscale dynamical systems. We validate this finding across dynamical systems from ecological systems to reactive flows, including a predator-prey model, a power system oscillation, a battery electrolyte thermal runaway, and turbulent reaction-diffusion systems with complex chemical kinetics. The method demonstrates robust generalization capabilities, allowing pre-trained DNN models to accurately predict the behavior in previously unseen scenarios, largely due to the delicately constructed dataset. While theoretical guarantees remain to be established, empirical evidence shows that DeePODE achieves the accuracy of implicit numerical schemes while maintaining the computational efficiency of explicit schemes. This work underscores the crucial relationship between training data distribution and neural network generalization performance. This work demonstrates the potential of deep learning approaches in modeling complex dynamical systems across scientific and engineering domains.

Submitted 3 January, 2025; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: 18 pages, 6 figures

arXiv:2311.11732 [pdf, other]

Study change of the performance of airfoil of small wind turbine under low wind speed by CFD simulation

Authors: Le Quang Sang, Dinh Van Thin, Nguyen Huu Duc, Nguyen Duc Minh, Doan Hong Quan, Le Thi Thuy Hang

Abstract: Renewable energy has received strong attention and investment to replace fossil energy sources and reduce greenhouse gas emissions. Areas with moderately good to good wind speeds have attracted investment in large-capacity wind farms for many years. Low wind speed regions occupy a very large area of the world, and interest in exploiting their wind energy has grown in recent years. In this study, the original S1010 airfoil operated at low wind speed was redesigned with XFLR5 software to increase its aerodynamic efficiency. The maximum thickness and the maximum thickness position of the new VAST-EPU-S1010 airfoil model were then adjusted. The airfoil was simulated by CFD in low wind speed conditions of 4-6 m/s. The lift coefficient, drag coefficient and $C_{L}$/$C_{D}$ coefficient ratio were evaluated under the effect of the angle of attack and the maximum thickness by using the $k$-$ε$ model. Simulation results show that the VAST-EPU-S1010 airfoil achieved the greatest aerodynamic efficiency at an angle of attack of $3\,^{\circ}$, a maximum thickness of 8\% and a maximum thickness position of 20.32\%. The maximum value of $C_{L}$/$C_{D}$ of the new airfoil at 6 m/s is higher than at 4 m/s by about 6.25\%.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 19 pages, 21 figures

MSC Class: 76B10 (Primary); 65B05 (Secondary) ACM Class: G.1.6; I.6.3; I.4.0

arXiv:2209.02320 [pdf, other]

doi 10.1140/epjp/s13360-023-03747-2

ALETHEIA: Hunting for Low-mass Dark Matter with Liquid Helium TPCs

Authors: Junhui Liao, Yuanning Gao, Zhen Jiang, Zhuo Liang, Zebang OuYang, Zhaohua Peng, Fengshou Zhang, Lei hang, Jiangfeng Zhou

Abstract: Dark Matter (DM) is one of the most critical questions to be understood and answered in fundamental physics today. Observations with varied astronomical and cosmological technologies strongly indicate that DM exists in the Universe, the Milky Way, and the Solar System. Nevertheless, understanding DM in the language of elementary physics is still in progress. DM direct detection tests the interaction cross-section between galactic DM particles and an underground detector's nucleons. Although Weakly Interacting Massive Particles (WIMPs) are the most discussed DM candidates, the null-WIMPs conclusion has been consistently reported by the most convincing experiments in the field. By comparison, the low-mass WIMPs region ($\sim$ 10 MeV/c$^2$ - 10 GeV/c$^2$) has not been fully exploited compared to high-mass WIMPs ($\sim$ 10 GeV/c$^2$ - 10 TeV/c$^2$). The ALETHEIA (A Liquid hElium Time projection cHambEr In dArk matter) experiment aims to hunt for low-mass WIMPs with liquid helium-filled TPCs (Time Projection Chambers). In this paper, we go through the physics motivation of the project, the detector's design, the R\&D plan, and the progress we have made since the project was launched in the summer of 2020.

Submitted 5 December, 2022; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: arXiv admin note: text overlap with arXiv:2203.07901, arXiv:2103.02161

arXiv:2005.11992 [pdf, other]

MPSUM: Entity Summarization with Predicate-based Matching

Authors: Dongjun Wei, Shiyuan Gao, Yaxin Liu, Zhibing Liu, Longtao Hang

Abstract: With the development of the Semantic Web, entity summarization has become an emerging task to generate concrete summaries for real world entities. To solve this problem, we propose an approach named MPSUM that extends a probabilistic topic model by integrating the idea of predicate-uniqueness and object-importance for ranking triples. The approach aims at generating brief but representative summaries for entities. We compare our approach with the state-of-the-art methods using DBpedia and LinkedMDB datasets. The experimental results show that our work improves the quality of entity summarization.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 6 pages, accepted in EYRE@CIKM'2018

arXiv:1912.05803 [pdf, other]

doi 10.1364/OE.386204

Optical needles with arbitrary homogeneous three-dimensional polarization

Authors: Li Hang, Ying Wang, Peifeng Chen

Abstract: We propose a new method to generate optical needles by focusing vector beams comprised of a radially polarized component and azimuthally polarized vortex components. The radial part can generate longitudinal polarization, while the azimuthal parts can generate left- and right-handed polarization. Hence, an arbitrary 3D polarization can be obtained. To our knowledge, it may be the first time that arbitrarily polarized optical needles whose transverse sizes are under 0.5$λ$ have been achieved. Their polarization homogeneity is beyond 0.97.

Submitted 12 December, 2019; originally announced December 2019.

Comments: 8 pages, 6 figures

arXiv:1904.05002 [pdf, ps, other]

Predicting Earth's Carrying Capacity of Human Population as the Predator and the Natural Resources as the Prey in the Modified Lotka-Volterra Equations with Time-dependent Parameters

Authors: Cheng Sok Kin, Ian Man Ut, Lo Hang, U Ieng Hou, Ng Ka Weng, Un Soi Ha, Lei Ka Hin, Cheng Kun Heng, Tam Seak Tim, Chan Iong Kuai, Lee Wei Shan

Abstract: We modified the Lotka-Volterra Equations with the assumption that two of the original four constant parameters in the traditional equations are time-dependent. In the first place, we assumed that the human population (borrowed from the T-Function) plays the role as the prey while all lethal factors that jeopardize the existence of the human race as the predator. Although we could still calculate the time-dependent lethal function, the idea of treating the lethal factors as the prey was too general to recognize the meaning of them. Hence, in the second part of the modified Lotka-Volterra Equations, we exchanged the roles between the prey and the predator. This time, we treated the prey as the natural resources while the predator as the human population (still borrowed from the T-Function). After carefully choosing appropriate parameters to match the maximum carrying capacity with the saturated number of the human population predicted by the T-Function, we successfully calculated the natural resources as a function of time. Contrary to our intuition, the carrying capacity is constant over time rather than a time-varying function, with the constant value of 10.2 billion people.

Submitted 8 November, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

arXiv:1807.03959 [pdf, other]

Deep attention-based classification network for robust depth prediction

Authors: Ruibo Li, Ke Xian, Chunhua Shen, Zhiguo Cao, Hao Lu, Lingxiao Hang

Abstract: In this paper, we present our deep attention-based classification (DABC) network for robust single image depth prediction, in the context of the Robust Vision Challenge 2018 (ROB 2018). Unlike conventional depth prediction, our goal is to design a model that can perform well in both indoor and outdoor scenes with a single parameter set. However, robust depth prediction suffers from two challenging problems: a) How to extract more discriminative features for different scenes (compared to a single scene)? b) How to handle the large differences of depth ranges between indoor and outdoor datasets? To address these two problems, we first formulate depth prediction as a multi-class classification task and apply a softmax classifier to classify the depth label of each pixel. We then introduce a global pooling layer and a channel-wise attention mechanism to adaptively select the discriminative channels of features and to update the original features by assigning important channels with higher weights. Further, to reduce the influence of quantization errors, we employ a soft-weighted sum inference strategy for the final prediction. Experimental results on both indoor and outdoor datasets demonstrate the effectiveness of our method. It is worth mentioning that we won 2nd place in the single image depth prediction entry of ROB 2018, held in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018.

Submitted 11 July, 2018; originally announced July 2018.

arXiv:1306.1976 [pdf]

Corrections on energy spectrum and scatterings for fast neutron radiography at NECTAR facility

Authors: Liu Shu-Quan, Bücherl Thomas, Li Hang, Zou Yu-Bin, Lu Yuan-Rong, Guo Zhi-Yu

Abstract: Distortions caused by the neutron spectrum and by scattered neutrons are major problems in fast neutron radiography and should be considered for improving the image quality. This paper puts emphasis on the removal of these image distortions and deviations for fast neutron radiography performed at the NECTAR facility of the research reactor FRM-II in Technische Universität München (TUM), Germany. The NECTAR energy spectrum is analyzed and established to correct for the influence of the neutron spectrum, and the Point Scattered Function (PScF) simulated by the Monte-Carlo program MCNPX is used to evaluate scattering effects from the object and improve image quality. Good analysis results demonstrate the sound effect of the above two corrections.

Submitted 8 June, 2013; originally announced June 2013.

Showing 1–14 of 14 results for author: Hang, L