Showing 1–50 of 280 results for author: Rong, Y

  1. arXiv:2509.21268  [pdf, ps, other]

    cs.CV

    MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

    Authors: Sicong Leng, Jing Wang, Jiaxi Li, Hao Zhang, Zhiqiang Hu, Boqiang Zhang, Yuming Jiang, Hang Zhang, Xin Li, Lidong Bing, Deli Zhao, Wei Lu, Yu Rong, Aixin Sun, Shijian Lu

    Abstract: Large multimodal reasoning models have achieved rapid progress, but their advancement is constrained by two major limitations: the absence of open, large-scale, high-quality long chain-of-thought (CoT) data, and the instability of reinforcement learning (RL) algorithms in post-training. Group Relative Policy Optimization (GRPO), the standard framework for RL fine-tuning, is prone to gradient vanis…

    Submitted 25 September, 2025; originally announced September 2025.

  2. arXiv:2509.17437  [pdf, ps, other]

    cs.CL

    GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning

    Authors: Guizhen Chen, Weiwen Xu, Hao Zhang, Hou Pong Chan, Deli Zhao, Anh Tuan Luu, Yu Rong

    Abstract: Recent advancements in reinforcement learning (RL) have enhanced the reasoning abilities of large language models (LLMs), yet the impact on multimodal LLMs (MLLMs) is limited. Particularly in vision-intensive tasks like geometric reasoning, MLLMs hallucinate frequently, leading to inaccurate reasoning. We attribute this to the perceptual bottleneck in MLLMs, which caps the benefits of reasoning tr…

    Submitted 22 September, 2025; originally announced September 2025.

    Comments: Accepted to EMNLP2025 Findings

  3. arXiv:2509.16971  [pdf, ps, other]

    cs.SD eess.AS

    AudioGenie-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning

    Authors: Yan Rong, Chenxing Li, Dong Yu, Li Liu

    Abstract: Audio deep reasoning is a challenging task that requires expert-level perception, multi-step logical inference, and the integration of contextual knowledge. However, existing models suffer from a gap between audio perception and reasoning abilities due to the lack of training data with explicit reasoning chains and the absence of mechanisms for active exploration and iterative refinement. To addre…

    Submitted 21 September, 2025; originally announced September 2025.

  4. arXiv:2509.16943  [pdf, ps, other]

    hep-ex astro-ph.HE

    Investigation of hadronic cross sections of cosmic ray carbon and oxygen on BGO from 200 GeV to 10 TeV energy at the DAMPE experiment

    Authors: F. Alemanno, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, H. Boutin, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, Z. X. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, I. De Mitri, F. de Palma, A. Di Giovanni, T. K. Dong, et al. (122 additional authors not shown)

    Abstract: The Dark Matter Particle Explorer (DAMPE) has made significant progress in measuring the fluxes of cosmic rays. These new measurements are pivotal in advancing our understanding of the origins and propagation mechanisms of cosmic rays. The bismuth germanium oxide (BGO) calorimeter plays a crucial role in these measurements, particularly in the precise determination of cosmic ray fluxes. However, f…

    Submitted 21 September, 2025; originally announced September 2025.

  5. arXiv:2509.13984  [pdf, ps, other]

    eess.SP physics.optics

    Distributed Coherent Beamforming at 60 GHz Enabled by Optically-Established Coherence

    Authors: Drake Silbernagel, Yu Rong, Isabella Lenz, Prithvi Hemanth, Carl Morgenstern, Owen Ma, Nolan Matthews, Nader Zaki, Kyle W. Martin, John D. Elgin, Jacob Holtom, Daniel W. Bliss, Kimberly Frey

    Abstract: We implement and experimentally demonstrate a 60 GHz distributed system leveraging an optical time synchronization system that provides precise time and frequency alignment between independent elements of the distributed mesh. Utilizing such accurate coherence, we perform receive beamforming with interference rejection and transmit nulling. In these configurations, the system achieves a coherent g…

    Submitted 17 September, 2025; originally announced September 2025.

  6. arXiv:2509.11606  [pdf, ps, other]

    cs.SD cs.LG eess.SP

    Scaling to Multimodal and Multichannel Heart Sound Classification: Fine-Tuning Wav2Vec 2.0 with Synthetic and Augmented Biosignals

    Authors: Milan Marocchi, Matthew Fynn, Kayapanda Mandana, Yue Rong

    Abstract: Cardiovascular diseases (CVDs) are the leading cause of death worldwide, accounting for approximately 17.9 million deaths each year. Early detection is critical, creating a demand for accurate and inexpensive pre-screening methods. Deep learning has recently been applied to classify abnormal heart sounds indicative of CVDs using synchronised phonocardiogram (PCG) and electrocardiogram (ECG) signal…

    Submitted 25 September, 2025; v1 submitted 15 September, 2025; originally announced September 2025.

    Comments: 35 pages, 37 figures, 19 tables

  7. arXiv:2509.10891  [pdf, ps, other]

    q-bio.NC

    Causal Emergence of Consciousness through Learned Multiscale Neural Dynamics in Mice

    Authors: Zhipeng Wang, Yingqi Rong, Kaiwei Liu, Mingzhe Yang, Jiang Zhang, Jing He

    Abstract: Consciousness spans macroscopic experience and microscopic neuronal activity, yet linking these scales remains challenging. Prevailing theories, such as Integrated Information Theory, focus on a single scale, overlooking how causal power and its dynamics unfold across scales. Progress is constrained by scarce cross-scale data and difficulties in quantifying multiscale causality and dynamics. Here,…

    Submitted 13 September, 2025; originally announced September 2025.

  8. From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media

    Authors: Tian Ma, Kaiyu Feng, Yu Rong, Kangfei Zhao

    Abstract: Personality prediction from social media posts is a critical task that implies diverse applications in psychology and sociology. The Myers Briggs Type Indicator (MBTI), a popular personality inventory, has been traditionally predicted by machine learning (ML) and deep learning (DL) techniques. Recently, the success of Large Language Models (LLMs) has revealed their huge potential in understanding…

    Submitted 28 August, 2025; originally announced September 2025.

    Journal ref: CIKM 2025 Short Paper (Technical Report)

  9. arXiv:2508.20494  [pdf, ps, other]

    astro-ph.CO astro-ph.GA

    Galaxy Group Spin Alignment with Cosmic Filament in the TNG Simulation

    Authors: Wei Wang, Peng Wang, Yu Rong, Hao-da Wang, Xiao-xiao Tang

    Abstract: We investigate the alignment between the spin vectors of galaxy groups and the axes of their nearest cosmic filaments using the TNG300-1 cosmological hydrodynamical simulation. By systematically analyzing a large sample of groups, we find a robust perpendicular alignment between group spin and filament orientation. Among all examined properties, only group mass and the distance to the nearest fila…

    Submitted 27 September, 2025; v1 submitted 28 August, 2025; originally announced August 2025.

    Comments: 16 pages, 5 figures, Accepted for publication in JCAP

  10. arXiv:2508.16044  [pdf, ps, other]

    cs.DB

    AMAZe: A Multi-Agent Zero-shot Index Advisor for Relational Databases

    Authors: Zhaodonghui Li, Haitao Yuan, Jiachen Shi, Hao Zhang, Yu Rong, Gao Cong

    Abstract: Index recommendation is one of the most important problems in database management system (DBMS) optimization. Given queries and certain index-related constraints, traditional methods rely on heuristic optimization or learning-based models to select effective indexes and improve query performance. However, heuristic optimization suffers from high computation time, and learning-based models lose gen…

    Submitted 16 September, 2025; v1 submitted 21 August, 2025; originally announced August 2025.

  11. arXiv:2508.13597  [pdf, ps, other]

    astro-ph.GA astro-ph.CO

    The Cosmic Dance: Observational Detection of Coherent Spin in Galaxy Clusters

    Authors: Xiao-xiao Tang, Peng Wang, Yu Rong, Weiguang Cui

    Abstract: The spin of galaxy clusters encodes key information about their formation, dynamics, and the influence of large-scale structure. However, whether clusters possess statistically significant spin and how to measure it observationally remain open questions. Here, we present the first observational statistical detection of coherent spin in galaxy clusters, by using a sample of 2,170 systems with…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: 10 pages, 5+1 figures, submitted

  12. arXiv:2508.10331  [pdf, ps, other]

    stat.ME

    Synthesizing Evidence: Data-Pooling as a Tool for Treatment Selection in Online Experiments

    Authors: Zhenkang Peng, Chengzhang Li, Ying Rong, Renyu Zhang

    Abstract: Randomized experiments are the gold standard for causal inference but face significant challenges in business applications, including limited traffic allocation, the need for heterogeneous treatment effect estimation, and the complexity of managing overlapping experiments. These factors lead to high variability in treatment effect estimates, making data-driven policy roll out difficult. To address…

    Submitted 15 August, 2025; v1 submitted 14 August, 2025; originally announced August 2025.

  13. arXiv:2507.22607  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning

    Authors: Ruifeng Yuan, Chenghao Xiao, Sicong Leng, Jianyu Wang, Long Li, Weiwen Xu, Hou Pong Chan, Deli Zhao, Tingyang Xu, Zhongyu Wei, Hao Zhang, Yu Rong

    Abstract: Reinforcement learning has proven its effectiveness in enhancing the reasoning capabilities of large language models. Recent research efforts have progressively extended this paradigm to multimodal reasoning tasks. Due to the inherent complexity and diversity of multimodal tasks, especially in semantic content and problem formulations, existing models often exhibit unstable performance across vari…

    Submitted 31 July, 2025; v1 submitted 30 July, 2025; originally announced July 2025.

    Comments: 21 pages, 5 figures, 6 tables. Work in progress

  14. arXiv:2507.19140  [pdf, ps, other]

    cs.CV

    Balancing Conservatism and Aggressiveness: Prototype-Affinity Hybrid Network for Few-Shot Segmentation

    Authors: Tianyu Zou, Shengwu Xiong, Ruilin Yao, Yi Rong

    Abstract: This paper studies the few-shot segmentation (FSS) task, which aims to segment objects belonging to unseen categories in a query image by learning a model on a small number of well-annotated support samples. Our analysis of two mainstream FSS paradigms reveals that the predictions made by prototype learning methods are usually conservative, while those of affinity learning methods tend to be more…

    Submitted 25 July, 2025; originally announced July 2025.

    Comments: 8 pages, 7 figures

    Journal ref: ICCV 2025

  15. Hierarchical Graph Information Bottleneck for Multi-Behavior Recommendation

    Authors: Hengyu Zhang, Chunxu Shen, Xiangguo Sun, Jie Tan, Yanchao Tan, Yu Rong, Hong Cheng, Lingling Yi

    Abstract: In real-world recommendation scenarios, users typically engage with platforms through multiple types of behavioral interactions. Multi-behavior recommendation algorithms aim to leverage various auxiliary user behaviors to enhance prediction for target behaviors of primary interest (e.g., buy), thereby overcoming performance limitations caused by data sparsity in target behavior records. Current st…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: Accepted by RecSys2025

  16. arXiv:2507.09547  [pdf, ps, other]

    nucl-th

    Light and heavy $Λ$ hyperclusters in nuclear matter with RMF models

    Authors: Cheng-Jun Xia, Yu-Ting Rong, Ting-Ting Sun

    Abstract: In the framework of RMF models, we investigate the properties of light and heavy $Λ$ hyperclusters immersed in nuclear matter at various densities $n_{\mathrm{gas}}$ and proton fractions $Y_p$. In particular, the (hyper)clusters are fixed by solving the Dirac equations imposing the Dirichlet-Neumann boundary condition, while the nuclear matter takes constant densities and is treated with Thomas-Ferm…

    Submitted 13 July, 2025; originally announced July 2025.

  17. arXiv:2507.06853  [pdf, ps, other]

    cs.LG cs.AI cs.CE physics.chem-ph q-bio.MN

    DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models

    Authors: Liang Wang, Yu Rong, Tingyang Xu, Zhenyi Zhong, Zhiyuan Liu, Pengju Wang, Deli Zhao, Qiang Liu, Shu Wu, Liang Wang

    Abstract: Molecular structure elucidation from spectra is a foundational problem in chemistry, with profound implications for compound identification, synthesis, and drug development. Traditional methods rely heavily on expert interpretation and lack scalability. Pioneering machine learning methods have introduced retrieval-based strategies, but their reliance on finite libraries limits generalization to no…

    Submitted 9 July, 2025; originally announced July 2025.

  18. arXiv:2507.05311  [pdf, ps, other]

    cs.IR cs.AI

    PLACE: Prompt Learning for Attributed Community Search

    Authors: Shuheng Fang, Kangfei Zhao, Rener Zhang, Yu Rong, Jeffrey Xu Yu

    Abstract: In this paper, we propose PLACE (Prompt Learning for Attributed Community Search), an innovative graph prompt learning framework for ACS. Enlightened by prompt-tuning in Natural Language Processing (NLP), where learnable prompt tokens are inserted to contextualize NLP queries, PLACE integrates structural and learnable prompt tokens into the graph as a query-dependent refinement mechanism, forming…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 15 pages, 9 figures

  19. arXiv:2506.21572  [pdf, ps, other]

    cs.CL

    Aligning MLLM Benchmark With Human Preferences via Structural Equation Modeling

    Authors: Tianyu Zou, Shengwu Xiong, Ruilin Yao, Jirui Huang, Yi Rong, Yaxiong Chen, Shili Xiong, Cong Wang

    Abstract: Evaluating multimodal large language models (MLLMs) remains a fundamental challenge due to a lack of structured, interpretable, and theoretically grounded benchmark designs. Existing benchmarks often adopt heuristic-based task groupings with unclear cognitive targets, thus resulting in overlapping abilities, redundant indicators, and limited diagnostic power. In this work, we propose a novel frame…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 9 pages, 5 figures

  20. arXiv:2506.13499  [pdf, ps, other]

    nucl-th

    Constraints on $ΛN$ Effective Interactions from Mirror Hypernuclei in a Deformed Relativistic Hartree-Bogoliubov Model

    Authors: Yu-Ting Rong, Dan Yang, Cheng-Jun Xia, Ting-Ting Sun

    Abstract: We investigate the ground-state properties of four mirror hypernuclei pairs--$^{10}_Λ$Be-$^{10}_Λ$B, $^{12}_Λ$B-$^{12}_Λ$C, $^{16}_Λ$N-$^{16}_Λ$O, and $^{40}_Λ$K-$^{40}_Λ$Ca--within the deformed relativistic Hartree-Bogoliubov framework, analyzing their connection to $ΛN$ effective interactions. Systematic calculations with eight distinct effective interactions reveal linear correlations between m…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 10 pages, 5 figures

  21. arXiv:2506.09513  [pdf, ps, other]

    cs.CL cs.AI cs.MA

    ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

    Authors: Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Deli Zhao, Wenbing Huang, Tingyang Xu, Qifeng Bai, Yu Rong

    Abstract: Reasoning-based large language models have excelled in mathematics and programming, yet their potential in knowledge-intensive medical question answering remains underexplored and insufficiently validated in clinical contexts. To bridge this gap, we introduce ReasonMed, the largest medical reasoning dataset to date, comprising 370k high-quality examples distilled from 1.75 million initial reasonin…

    Submitted 22 September, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

    Comments: 28 pages, 6 figures, 7 tables

  22. arXiv:2506.07457  [pdf, ps, other]

    astro-ph.GA

    The Internal Kinematics, Stellar Population, and Gas-phase Properties of The Pseudobulge in An Ultra-diffuse Galaxy: AGC721966

    Authors: Shihong Liu, Yu Rong, Huiyuan Wang, Hong-Xin Zhang, Tie Li, Yao Yao, Zhicheng He, Teng Liu, Enci Wang, Cheng Cheng, Xu Kong

    Abstract: Leveraging spectroscopic data from the Sloan Digital Sky Survey, we conduct a comprehensive analysis of the central stellar velocity dispersion, stellar population properties, star formation history, and gas-phase chemical abundances in AGC721966, a unique ultra-diffuse galaxy (UDG) harboring a pseudobulge. Our findings reveal that the pseudobulge formed in the early universe but underwent a recen…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Submitted to ApJ Letters; comments welcome

  23. arXiv:2506.07044  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

    Authors: LASA Team, Weiwen Xu, Hou Pong Chan, Long Li, Mahani Aljunied, Ruifeng Yuan, Jianyu Wang, Chenghao Xiao, Guizhen Chen, Chaoqun Liu, Zhaodonghui Li, Yu Sun, Junao Shen, Chaojun Wang, Jie Tan, Deli Zhao, Tingyang Xu, Hao Zhang, Yu Rong

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in understanding common visual elements, largely due to their large-scale datasets and advanced training strategies. However, their effectiveness in medical applications remains limited due to the inherent discrepancies between data and tasks in medical scenarios and those in the general domain. Concretely, existing…

    Submitted 13 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

    Comments: Technical Report, 53 pages, 25 tables, and 16 figures. Our webpage is https://alibaba-damo-academy.github.io/lingshu/

  24. arXiv:2505.24635  [pdf, ps, other]

    cs.CL

    Disentangling Language and Culture for Evaluating Multilingual Large Language Models

    Authors: Jiahao Ying, Wei Tang, Yiran Zhao, Yixin Cao, Yu Rong, Wenxuan Zhang

    Abstract: This paper introduces a Dual Evaluation Framework to comprehensively assess the multilingual capabilities of LLMs. By decomposing the evaluation along the dimensions of linguistic medium and cultural context, this framework enables a nuanced analysis of LLMs' ability to process questions within both native and cross-cultural contexts cross-lingually. Extensive evaluations are conducted on a wide r…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted to ACL 2025 (Main Conference)

  25. arXiv:2505.23129  [pdf, ps, other]

    cs.CV

    HMAD: Advancing E2E Driving with Anchored Offset Proposals and Simulation-Supervised Multi-target Scoring

    Authors: Bin Wang, Pingjun Li, Jinkun Liu, Jun Cheng, Hailong Lei, Yinze Rong, Huan-ang Gao, Kangliang Chen, Xing Pan, Weihao Gu

    Abstract: End-to-end autonomous driving faces persistent challenges in both generating diverse, rule-compliant trajectories and robustly selecting the optimal path from these options via learned, multi-faceted evaluation. To address these challenges, we introduce HMAD, a framework integrating a distinctive Bird's-Eye-View (BEV) based trajectory proposal mechanism with learned multi-criteria scoring. HMAD le…

    Submitted 29 May, 2025; originally announced May 2025.

  26. arXiv:2505.22412  [pdf, ps, other]

    nucl-th

    Neutron Magic Numbers in $sd$ Shell from Nuclear Charge Radii within Neutron-Proton Correction around the Fermi Surface

    Authors: Yu-Ting Rong, Ping-Mo Liu, Dan Yang, Rong An

    Abstract: Charge radii are sensitive indicators to identify the nuclear structure phenomena throughout the whole nuclide chart. In particular, the shrunken trend of changes of charge radii along a long isotopic chain is intimately associated with the shell quenching effect. In this work, the systematic evolution of charge radii along the proton numbers $Z=8$, $10$, $12$, $14$, $18$ isotopes is investigated…

    Submitted 14 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: 10 pages, 4 figures

  27. arXiv:2505.22053  [pdf, ps, other]

    cs.SD cs.MA cs.MM eess.AS

    AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation

    Authors: Yan Rong, Jinting Wang, Guangzhi Lei, Shan Yang, Li Liu

    Abstract: Multimodality-to-Multiaudio (MM2MA) generation faces significant challenges in synthesizing diverse and contextually aligned audio types (e.g., sound effects, speech, music, and songs) from multimodal inputs (e.g., video, text, images), owing to the scarcity of high-quality paired datasets and the lack of robust multi-task learning frameworks. Recently, multi-agent systems have shown great potential in…

    Submitted 5 August, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  28. arXiv:2505.17260  [pdf, ps, other]

    cs.CL

    The Rise of Parameter Specialization for Knowledge Storage in Large Language Models

    Authors: Yihuai Hong, Yiran Zhao, Wei Tang, Yang Deng, Yu Rong, Wenxuan Zhang

    Abstract: Over time, a growing wave of large language models from various series has been introduced to the community. Researchers are striving to maximize the performance of language models with constrained parameter sizes. However, from a microscopic perspective, there has been limited research on how to better store knowledge in model parameters, particularly within MLPs, to enable more effective utiliza…

    Submitted 22 May, 2025; originally announced May 2025.

  29. arXiv:2505.16379  [pdf, other]

    cond-mat.mtrl-sci cs.AI

    Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey

    Authors: Zhixun Li, Bin Cao, Rui Jiao, Liang Wang, Ding Wang, Yang Liu, Dingshuo Chen, Jia Li, Qiang Liu, Yu Rong, Liang Wang, Tong-yi Zhang, Jeffrey Xu Yu

    Abstract: Materials are the foundation of modern society, underpinning advancements in energy, electronics, healthcare, transportation, and infrastructure. The ability to discover and design new materials with tailored properties is critical to solving some of the most pressing global challenges. In recent years, the growing availability of high-quality materials data combined with rapid advances in Artific…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Work in progress

  30. arXiv:2505.15804  [pdf, ps, other]

    cs.CV

    STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs

    Authors: Zongzhao Li, Zongyang Ma, Mingze Li, Songyou Li, Yu Rong, Tingyang Xu, Ziqi Zhang, Deli Zhao, Wenbing Huang

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across diverse tasks, yet they lag significantly behind humans in spatial reasoning. We investigate this gap through Transformation-Driven Visual Reasoning (TVR), a challenging task requiring identification of object transformations across images under varying viewpoints. While traditional Supervised Fine-Tuning (SF…

    Submitted 10 July, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  31. arXiv:2505.05766  [pdf, ps, other]

    astro-ph.HE

    Measurement of separate electron and positron spectra from 10 GeV to 20 GeV with the geomagnetic field on DAMPE

    Authors: DAMPE Collaboration, F. Alemanno, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, H. Boutin, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, Z. X. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, I. De Mitri, F. de Palma, A. Di Giovanni, T. K. Dong, et al. (127 additional authors not shown)

    Abstract: The cosmic-ray (CR) electrons and positrons in space are of great significance for studying the origin and propagation of cosmic-rays. The satellite-borne experiment DArk Matter Particle Explorer (DAMPE) has been used to measure the separate electron and positron spectra, as well as the positron fraction. In this work, the Earth's magnetic field is used to distinguish CR electrons and positrons, a…

    Submitted 21 August, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted for publication in Chinese Physics C

  32. arXiv:2504.19353  [pdf, other]

    cs.LG cs.AI

    Flow Along the K-Amplitude for Generative Modeling

    Authors: Weitao Du, Shuning Chang, Jiasheng Tang, Yu Rong, Fan Wang, Shengchao Liu

    Abstract: In this work, we propose a novel generative learning paradigm, K-Flow, an algorithm that flows along the $K$-amplitude. Here, $k$ is a scaling parameter that organizes frequency bands (or projected coefficients), and amplitude describes the norm of such projected coefficients. By incorporating the $K$-amplitude decomposition, K-Flow enables flow matching across the scaling parameter as time. We di…

    Submitted 27 April, 2025; originally announced April 2025.

  33. arXiv:2504.17541  [pdf, other]

    astro-ph.GA astro-ph.CO astro-ph.SR

    A negative stellar mass$-$gaseous metallicity gradient relation of dwarf galaxies modulated by stellar feedback

    Authors: Tie Li, Hong-Xin Zhang, Wenhe Lyu, Yimeng Tang, Yao Yao, Enci Wang, Yu Rong, Guangwen Chen, Xu Kong, Fuyan Bian, Qiusheng Gu, J. Evelyn Johnston, Xin Li, Shude Mao, Yong Shi, Junfeng Wang, Xin Wang, Xiaoling Yu, Zhiyuan Zheng

    Abstract: Baryonic cycling is reflected in the spatial distribution of metallicity within galaxies, yet gas-phase metallicity distribution and its connection with other properties of dwarf galaxies are largely unexplored. We present the first systematic study of radial gradients of gas-phase metallicities for a sample of 55 normal nearby star-forming dwarf galaxies (stellar mass $M_\star$ ranging from…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 19 pages, 13 figures, accepted for publication in A&A. The most important findings are shown in Figs 3, 12, and 13

    Journal ref: A&A 698, A208 (2025)

  34. arXiv:2504.13816  [pdf, ps, other]

    cs.CL

    Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations

    Authors: Chenghao Xiao, Hou Pong Chan, Hao Zhang, Mahani Aljunied, Lidong Bing, Noura Al Moubayed, Yu Rong

    Abstract: While understanding the knowledge boundaries of LLMs is crucial to prevent hallucination, research on the knowledge boundaries of LLMs has predominantly focused on English. In this work, we present the first study to analyze how LLMs recognize knowledge boundaries across different languages by probing their internal representations when processing known and unknown questions in multiple languages.…

    Submitted 24 June, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

    Comments: ACL 2025 main; camera ready

  35. arXiv:2504.11002  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Immersive Audiobook Generation

    Authors: Yan Rong, Shan Yang, Chenxing Li, Dong Yu, Li Liu

    Abstract: Audiobook generation aims to create rich, immersive listening experiences from multimodal inputs, but current approaches face three critical challenges: (1) the lack of synergistic generation of diverse audio types (e.g., speech, sound effects, and music) with precise temporal and semantic alignment; (2) the difficulty in conveying expressive, fine-grained emotions, which often results in machine-…

    Submitted 12 August, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  36. arXiv:2504.06835  [pdf, other]

    cs.CV

    LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding

    Authors: Ziyi Wang, Haoran Wu, Yiming Rong, Deyang Jiang, Yixin Zhang, Yunlong Zhao, Shuang Xu, Bo Xu

    Abstract: Long video understanding is a complex task that requires both spatial detail and temporal awareness. While Vision-Language Models (VLMs) obtain frame-level understanding capabilities through multi-frame input, they suffer from information loss due to the sparse sampling strategy. In contrast, Video Large Language Models (Video-LLMs) capture temporal relationships within visual features but are lim…

    Submitted 9 April, 2025; originally announced April 2025.

  37. arXiv:2504.03305  [pdf, other]

    astro-ph.CO astro-ph.GA

    Unexpected clustering pattern in dwarf galaxies challenges formation models

    Authors: Ziwen Zhang, Yangyao Chen, Yu Rong, Huiyuan Wang, Houjun Mo, Xiong Luo, Hao Li

    Abstract: The galaxy correlation function serves as a fundamental tool for studying cosmology, galaxy formation, and the nature of dark matter. It is well established that more massive, redder and more compact galaxies tend to have stronger clustering in space. These results can be understood in terms of galaxy formation in Cold Dark Matter (CDM) halos of different mass and assembly history. Here, we report…

    Submitted 7 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: Accepted for publication in Nature. 45 pages, 12 figures, 2 tables

  38. arXiv:2503.15627  [pdf, other]

    eess.AS eess.SP

    A Speech Production Model for Radar: Connecting Speech Acoustics with Radar-Measured Vibrations

    Authors: Isabella Lenz, Yu Rong, Daniel Bliss, Julie Liss, Visar Berisha

    Abstract: Millimeter Wave (mmWave) radar has emerged as a promising modality for speech sensing, offering advantages over traditional microphones. Prior works have demonstrated that radar captures motion signals related to vocal vibrations, but there is a gap in the understanding of the analytical connection between radar-measured vibrations and acoustic speech signals. We establish a mathematical framework…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 5 pages, 6 figures, InterSpeech Conference

  39. arXiv:2503.07062  [pdf]

    eess.SP

    Adaptive Extensive Cancellation Algorithm and Harmonic Enhanced Heart Rate Estimation based on MMWave Radar

    Authors: Hui Tang, Zhan Yang, Yu Rong, Li Chai

    Abstract: Heart rate (HR) monitoring is crucial for assessing physical fitness, cardiovascular health, and stress management. Millimeter-wave radar offers a promising noncontact solution for long-term monitoring. However, accurate HR estimation remains challenging in low signal-to-noise ratio (SNR) conditions. To deal with both respiration harmonics and intermodulation interference, this paper proposes a can…

    Submitted 10 March, 2025; originally announced March 2025.

  40. arXiv:2503.05478  [pdf, other]

    astro-ph.GA

    Orthogonal Alignment of Galaxy Group Angular Momentum with Cosmic Filament Spines: An Observational Study

    Authors: Yu Rong, Peng Wang, Xiao-xiao Tang

    Abstract: We investigate the alignment between the angular momenta of galaxy groups and the spines of their associated cosmic filaments. Our results demonstrate a significant tendency for these two orientations to be perpendicular, indicating that the rotation of a galaxy group does not originate from the spin of cosmic filaments. Instead, it is driven by the orbital angular momentum contributed by member g…

    Submitted 15 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: Accepted for publication in ApJ Letters; corrected a typo

  41. arXiv:2503.05305  [pdf, other

    cs.CV cs.AI

    Frequency Autoregressive Image Generation with Continuous Tokens

    Authors: Hu Yu, Hao Luo, Hangjie Yuan, Yu Rong, Feng Zhao

    Abstract: Autoregressive (AR) models for image generation typically adopt a two-stage paradigm of vector quantization and raster-scan "next-token prediction", inspired by its great success in language modeling. However, due to the huge modality gap, image autoregressive models may require a systematic reevaluation from two perspectives: tokenizer format and regression direction. In this paper, we introduce… ▽ More

    Submitted 7 March, 2025; originally announced March 2025.

  42. arXiv:2503.01488  [pdf, other

    cs.LG

    InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization

    Authors: Yifan Niu, Ziqi Gao, Tingyang Xu, Yang Liu, Yatao Bian, Yu Rong, Junzhou Huang, Jia Li

    Abstract: Exploring chemical space to find novel molecules that simultaneously satisfy multiple properties is crucial in drug discovery. However, existing methods often struggle with trading off multiple properties due to the conflicting or correlated nature of chemical properties. To tackle this issue, we introduce the InversionGNN framework, an effective yet sample-efficient dual-path graph neural network (GN… ▽ More

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: ICLR 2025

  43. arXiv:2503.00865  [pdf, other

    cs.CL cs.AI

    Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers

    Authors: Yiran Zhao, Chaoqun Liu, Yue Deng, Jiahao Ying, Mahani Aljunied, Zhaodonghui Li, Lidong Bing, Hou Pong Chan, Yu Rong, Deli Zhao, Wenxuan Zhang

    Abstract: Large language models (LLMs) have revolutionized natural language processing (NLP), yet open-source multilingual LLMs remain scarce, with existing models often limited in language coverage. Such models typically prioritize well-resourced languages, while widely spoken but under-resourced languages are often overlooked. To address this disparity, we introduce $\texttt{Babel}$, an open multilingual… ▽ More

    Submitted 2 March, 2025; originally announced March 2025.

  44. arXiv:2502.20238  [pdf, ps, other

    cs.CL

    FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving

    Authors: Guizhen Chen, Weiwen Xu, Hao Zhang, Hou Pong Chan, Chaoqun Liu, Lidong Bing, Deli Zhao, Anh Tuan Luu, Yu Rong

    Abstract: Many challenging reasoning tasks require not just rapid, intuitive responses, but a more deliberate, multi-step approach. Recent progress in large language models (LLMs) highlights an important shift from the "System 1" way of quick reactions to the "System 2" style of reflection-and-correction problem solving. However, current benchmarks heavily rely on the final-answer accuracy, leaving much of… ▽ More

    Submitted 1 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: Accepted to ACL2025 Main

  45. arXiv:2502.19750  [pdf, other

    cs.LG cs.CV

    CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer

    Authors: Yang Liu, Zinan Zheng, Jiashun Cheng, Fugee Tsung, Deli Zhao, Yu Rong, Jia Li

    Abstract: Accurate Subseasonal-to-Seasonal (S2S) climate forecasting is pivotal for decision-making including agriculture planning and disaster preparedness but is known to be challenging due to its chaotic nature. Although recent data-driven models have shown promising results, their performance is limited by inadequate consideration of geometric inductive biases. Usually, they treat the spherical weather… ▽ More

    Submitted 26 February, 2025; originally announced February 2025.

  46. arXiv:2502.19739  [pdf, other

    cs.CV

    LUCAS: Layered Universal Codec Avatars

    Authors: Di Liu, Teng Deng, Giljoo Nam, Yu Rong, Stanislav Pidhorskyi, Junxuan Li, Jason Saragih, Dimitris N. Metaxas, Chen Cao

    Abstract: Photorealistic 3D head avatar reconstruction faces critical challenges in modeling dynamic face-hair interactions and achieving cross-identity generalization, particularly during expressions and head movements. We present LUCAS, a novel Universal Prior Model (UPM) for codec avatar modeling that disentangles face and hair through a layered representation. Unlike previous UPMs that treat hair as an… ▽ More

    Submitted 17 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  47. arXiv:2502.17637  [pdf, other

    math.GT

    On the notion of Khovanov A-adequacy

    Authors: Lizzie Buchanan, Huizheng Guo, Gabriel Montoya-Vega, Yongwu Rong, Marithania Silvero

    Abstract: The concept of adequate links, introduced by Lickorish and Thistlethwaite as a generalization of alternating links, has recently gained interest among knot theorists in the context of Khovanov homology. Przytycki and Silvero introduced the more general concept of Khovanov adequacy: a diagram is Khovanov-adequate if its associated Khovanov chain complexes at both potential maximal and minimal quant… ▽ More

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 12 pages, 11 figures

    MSC Class: 57K10; 57K18

  48. arXiv:2502.16533  [pdf, other

    cs.LG cs.AI

    A Survey of Graph Transformers: Architectures, Theories and Applications

    Authors: Chaohao Yuan, Kangfei Zhao, Ercan Engin Kuruoglu, Liang Wang, Tingyang Xu, Wenbing Huang, Deli Zhao, Hong Cheng, Yu Rong

    Abstract: Graph Transformers (GTs) have demonstrated a strong capability in modeling graph structures by addressing the intrinsic limitations of graph neural networks (GNNs), such as over-smoothing and over-squashing. Recent studies have proposed diverse architectures, enhanced explainability, and practical applications for Graph Transformers. In light of these rapid developments, we conduct a comprehensive… ▽ More

    Submitted 27 February, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

  49. arXiv:2502.16284  [pdf, other

    cs.LG cs.AI cs.CE physics.chem-ph

    MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra

    Authors: Liang Wang, Shaozhen Liu, Yu Rong, Deli Zhao, Qiang Liu, Shu Wu, Liang Wang

    Abstract: Establishing the relationship between 3D structures and the energy states of molecular systems has proven to be a promising approach for learning 3D molecular representations. However, existing methods are limited to modeling the molecular energy states from classical mechanics. This limitation results in a significant oversight of quantum mechanical effects, such as quantized (discrete) energy le… ▽ More

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: Accepted by ICLR 2025

  50. arXiv:2502.11149  [pdf, other

    cs.LG cs.AI

    Large Language-Geometry Model: When LLM meets Equivariance

    Authors: Zongzhao Li, Jiacheng Cen, Bing Su, Wenbing Huang, Tingyang Xu, Yu Rong, Deli Zhao

    Abstract: Accurately predicting 3D structures and dynamics of physical systems is crucial in scientific applications. Existing approaches that rely on geometric Graph Neural Networks (GNNs) effectively enforce $\mathrm{E}(3)$-equivariance, but they often fall short in leveraging extensive broader information. While direct application of Large Language Models (LLMs) can incorporate external knowledge, they lack th… ▽ More

    Submitted 19 February, 2025; v1 submitted 16 February, 2025; originally announced February 2025.