
Showing 1–50 of 186 results for author: Huo, J

  1. arXiv:2505.22101  [pdf, other]

    cs.CL

    MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

    Authors: Zhiyu Li, Shichao Song, Hanyu Wang, Simin Niu, Ding Chen, Jiawei Yang, Chenyang Xi, Huayi Lai, Jihao Zhao, Yezhaohui Wang, Junpeng Ren, Zehao Lin, Jiahao Huo, Tianyi Chen, Kai Chen, Kehang Li, Zhiqiang Yin, Qingchen Yu, Bo Tang, Hongkang Yang, Zhi-Qin John Xu, Feiyu Xiong

    Abstract: Large Language Models (LLMs) have emerged as foundational infrastructure in the pursuit of Artificial General Intelligence (AGI). Despite their remarkable capabilities in language perception and generation, current LLMs fundamentally lack a unified and structured architecture for handling memory. They primarily rely on parametric memory (knowledge encoded in model weights) and ephemeral activation…

    Submitted 28 May, 2025; originally announced May 2025.

  2. arXiv:2505.20824  [pdf, ps, other]

    cs.MA cs.AI

    MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems

    Authors: Kai Chen, Taihang Zhen, Hewei Wang, Kailai Liu, Xinfeng Li, Jing Huo, Tianpei Yang, Jinfeng Xu, Wei Dong, Yang Gao

    Abstract: As large language models (LLMs) are increasingly deployed in healthcare, ensuring their safety, particularly within collaborative multi-agent configurations, is paramount. In this paper we introduce MedSentry, a benchmark comprising 5 000 adversarial medical prompts spanning 25 threat categories with 100 subthemes. Coupled with this dataset, we develop an end-to-end attack-defense evaluation pipel…

    Submitted 27 May, 2025; originally announced May 2025.

  3. arXiv:2505.14406  [pdf, other]

    cs.CL

    Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis

    Authors: Haoming Huang, Yibo Yan, Jiahao Huo, Xin Zou, Xinfeng Li, Kun Wang, Xuming Hu

    Abstract: Large Language Models (LLMs), despite their remarkable capabilities, are hampered by hallucinations. A particularly challenging variant, knowledge overshadowing, occurs when one piece of activated knowledge inadvertently masks another relevant piece, leading to erroneous outputs even with high-quality training data. Current understanding of overshadowing is largely confined to inference-time obser…

    Submitted 20 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Under review

  4. arXiv:2505.12246  [pdf, ps, other]

    cs.CV

    SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving

    Authors: Muleilan Pei, Jiayao Shan, Peiliang Li, Jieqi Shi, Jing Huo, Yang Gao, Shaojie Shen

    Abstract: Online scene perception and topology reasoning are critical for autonomous vehicles to understand their driving environments, particularly for mapless driving systems that endeavor to reduce reliance on costly High-Definition (HD) maps. However, recent advances in online scene understanding still face limitations, especially in long-range or occluded scenarios, due to the inherent constraints of o…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE Robotics and Automation Letters

  5. arXiv:2505.12009  [pdf, ps, other]

    cs.CV

    Black-box Adversaries from Latent Space: Unnoticeable Attacks on Human Pose and Shape Estimation

    Authors: Zhiying Li, Guanggang Geng, Yeying Jin, Zhizhi Guo, Bruce Gu, Jidong Huo, Zhaoxin Fan, Wenjun Wu

    Abstract: Expressive human pose and shape (EHPS) estimation is vital for digital human generation, particularly in live-streaming applications. However, most existing EHPS models focus primarily on minimizing estimation errors, with limited attention on potential security vulnerabilities. Current adversarial attacks on EHPS models often require white-box access (e.g., model details or gradients) or generate…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: 17 pages, 6 figures

  6. arXiv:2505.04946  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models

    Authors: Xuyang Guo, Jiayan Huo, Zhenmei Shi, Zhao Song, Jiahao Zhang, Jiale Zhao

    Abstract: Thanks to recent advancements in scalable deep architectures and large-scale pretraining, text-to-video generation has achieved unprecedented capabilities in producing high-fidelity, instruction-following content across a wide range of styles, enabling applications in advertising, entertainment, and education. However, these models' ability to render precise on-screen text, such as captions or mat…

    Submitted 8 May, 2025; originally announced May 2025.

  7. arXiv:2505.00527  [pdf, other]

    cs.RO

    DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation

    Authors: Zixuan Chen, Junhui Yin, Yangtao Chen, Jing Huo, Pinzhuo Tian, Jieqi Shi, Yiwen Hou, Yinchuan Li, Yang Gao

    Abstract: Generalizing language-conditioned multi-task imitation learning (IL) models to novel long-horizon 3D manipulation tasks remains a significant challenge. To address this, we propose DeCo (Task Decomposition and Skill Composition), a model-agnostic framework compatible with various multi-task IL models, designed to enhance their zero-shot generalization to novel, compositional, long-horizon 3D manip…

    Submitted 1 May, 2025; originally announced May 2025.

  8. arXiv:2505.00337  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation

    Authors: Xuyang Guo, Jiayan Huo, Zhenmei Shi, Zhao Song, Jiahao Zhang, Jiale Zhao

    Abstract: Text-to-video generative models have made significant strides in recent years, producing high-quality videos that excel in both aesthetic appeal and accurate instruction following, and have become central to digital art creation and user engagement online. Yet, despite these advancements, their ability to respect fundamental physical laws remains largely untested: many outputs still violate basic…

    Submitted 1 May, 2025; originally announced May 2025.

  9. arXiv:2504.04765  [pdf, other]

    eess.SY

    Multi-Agent Deep Reinforcement Learning for Multiple Anesthetics Collaborative Control

    Authors: Huijie Li, Yide Yu, Si Shi, Anmin Hu, Jian Huo, Wei Lin, Chaoran Wu, Wuman Luo

    Abstract: Automated control of personalized multiple anesthetics in clinical Total Intravenous Anesthesia (TIVA) is crucial yet challenging. Current systems, including target-controlled infusion (TCI) and closed-loop systems, either rely on relatively static pharmacokinetic/pharmacodynamic (PK/PD) models or focus on single anesthetic control, limiting personalization and collaborative control. To address th…

    Submitted 7 April, 2025; originally announced April 2025.

  10. arXiv:2504.04051  [pdf, other]

    cs.CV cs.AI cs.LG

    Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models

    Authors: Xuyang Guo, Zekai Huang, Jiayan Huo, Yingyu Liang, Zhenmei Shi, Zhao Song, Jiahao Zhang

    Abstract: Generative models have driven significant progress in a variety of AI tasks, including text-to-video generation, where models like Video LDM and Stable Video Diffusion can produce realistic, movie-level videos from textual instructions. Despite these advances, current text-to-video models still face fundamental challenges in reliably following human commands, particularly in adhering to simple num…

    Submitted 5 April, 2025; originally announced April 2025.

  11. arXiv:2504.00432  [pdf, other]

    cs.CV

    DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding

    Authors: Chong Li, Jingyang Huo, Weikang Gong, Yanwei Fu, Xiangyang Xue, Jianfeng Feng

    Abstract: Decoding visual experiences from brain activity is a significant challenge. Existing fMRI-to-video methods often focus on semantic content while overlooking spatial and motion information. However, these aspects are all essential and are processed through distinct pathways in the brain. Motivated by this, we propose DecoFuse, a novel brain-inspired framework for decoding videos from fMRI signals.…

    Submitted 1 April, 2025; originally announced April 2025.

  12. arXiv:2503.20285  [pdf, other]

    cs.LG cs.AI

    Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation

    Authors: Hongye Cao, Fan Feng, Jing Huo, Shangdong Yang, Meng Fang, Tianpei Yang, Yang Gao

    Abstract: Model-based offline Reinforcement Learning (RL) constructs environment models from offline datasets to perform conservative policy optimization. Existing approaches focus on learning state transitions through ensemble models, rollouting conservative estimation to mitigate extrapolation errors. However, the static data makes it challenging to develop a robust policy, and offline agents cannot acces…

    Submitted 26 March, 2025; originally announced March 2025.

  13. arXiv:2503.18132  [pdf, other]

    cs.CL

    MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection

    Authors: Yibo Yan, Shen Wang, Jiahao Huo, Philip S. Yu, Xuming Hu, Qingsong Wen

    Abstract: Mathematical error detection in educational settings presents a significant challenge for Multimodal Large Language Models (MLLMs), requiring a sophisticated understanding of both visual and textual mathematical content along with complex reasoning capabilities. Though effective in mathematical problem-solving, MLLMs often struggle with the nuanced task of identifying and categorizing student erro…

    Submitted 20 May, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Comments: Accepted by The 63rd Annual Meeting of the Association for Computational Linguistics (ACL Industry 2025, Oral Presentation)

  14. arXiv:2503.12069  [pdf, other]

    cs.CV

    Robust Dataset Distillation by Matching Adversarial Trajectories

    Authors: Wei Lai, Tianyu Ding, ren dongdong, Lei Wang, Jing Huo, Yang Gao, Wenbin Li

    Abstract: Dataset distillation synthesizes compact datasets that enable models to achieve performance comparable to training on the original large-scale datasets. However, existing distillation methods overlook the robustness of the model, resulting in models that are vulnerable to adversarial attacks when trained on distilled data. To address this limitation, we introduce the task of ``robust dataset disti…

    Submitted 15 March, 2025; originally announced March 2025.

  15. arXiv:2503.06884  [pdf, other]

    cs.CV cs.AI cs.LG

    Text-to-Image Diffusion Models Cannot Count, and Prompt Refinement Cannot Help

    Authors: Yuefan Cao, Xuyang Guo, Jiayan Huo, Yingyu Liang, Zhenmei Shi, Zhao Song, Jiahao Zhang, Zhen Zhuang

    Abstract: Generative modeling is widely regarded as one of the most essential problems in today's AI community, with text-to-image generation having gained unprecedented real-world impacts. Among various approaches, diffusion models have achieved remarkable success and have become the de facto solution for text-to-image generation. However, despite their impressive performance, these models exhibit fundamen…

    Submitted 9 March, 2025; originally announced March 2025.

  16. New insight into the Rapid Burster by Insight-HXMT

    Authors: Y. P. Chen, S. Zhang, S. N. Zhang, L. Ji, L. D. Kong, P. J. Wang, L. Tao, M. Y. Ge, C. Z. Liu, F. J. Lu, J. L. Qu, T. P. Li, Y. P. Xu, X. L. Cao, Y. Chen, Q. C. Bu, C. Cai, Z. Chang, G. Chen, L. Chen, T. X. Chen, W. W. Cui, Y. Y. Du, G. H. Gao, H. Gao , et al. (70 additional authors not shown)

    Abstract: We report the timing and spectral analyses of the type II X-ray bursts from the Rapid Burster (MXB 1730--335) observed by Insight-HXMT and Swift/XRT. By stacking the long-duration bursts, we find for the first time that the hard X-rays lag behind the soft X-rays by 3 seconds. However, such a lag is not visible for the short-duration bursts, probably because of the poor statistics. For a…

    Submitted 21 February, 2025; originally announced February 2025.

    Journal ref: 2021,ApJ,913,150

  17. arXiv:2502.15188  [pdf, other]

    eess.IV cs.CV

    Interleaved Block-based Learned Image Compression with Feature Enhancement and Quantization Error Compensation

    Authors: Shiqi Jiang, Hui Yuan, Shuai Li, Raouf Hamzaoui, Xu Wang, Junyan Huo

    Abstract: In recent years, learned image compression (LIC) methods have achieved significant performance improvements. However, obtaining a more compact latent representation and reducing the impact of quantization errors remain key challenges in the field of LIC. To address these challenges, we propose a feature extraction module, a feature refinement module, and a feature enhancement module. Our feature e…

    Submitted 20 February, 2025; originally announced February 2025.

  18. arXiv:2502.11916  [pdf, other]

    cs.CL cs.AI

    EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models

    Authors: Jiamin Su, Yibo Yan, Fangteng Fu, Han Zhang, Jingheng Ye, Xiang Liu, Jiahao Huo, Huiyu Zhou, Xuming Hu

    Abstract: Automated Essay Scoring (AES) plays a crucial role in educational assessment by providing scalable and consistent evaluations of writing tasks. However, traditional AES systems face three major challenges: (1) reliance on handcrafted features that limit generalizability, (2) difficulty in capturing fine-grained traits like coherence and argumentation, and (3) inability to handle multimodal context…

    Submitted 20 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Accepted by ACL Findings 2025

  19. arXiv:2502.11090  [pdf, other]

    cs.CL cs.AI

    SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks

    Authors: Hongye Cao, Yanming Wang, Sijia Jing, Ziyue Peng, Zhixin Bai, Zhe Cao, Meng Fang, Fan Feng, Boyan Wang, Jiaheng Liu, Tianpei Yang, Jing Huo, Yang Gao, Fanyu Meng, Xi Yang, Chao Deng, Junlan Feng

    Abstract: With the rapid advancement of Large Language Models (LLMs), the safety of LLMs has been a critical concern requiring precise assessment. Current benchmarks primarily concentrate on single-turn dialogues or a single jailbreak attack method to assess the safety. Additionally, these benchmarks have not taken into account the LLM's capability of identifying and handling unsafe information in detail. T…

    Submitted 17 February, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

  20. arXiv:2502.11051  [pdf, other]

    cs.CL cs.AI

    MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models

    Authors: Jiahao Huo, Yibo Yan, Xu Zheng, Yuanhuiyi Lyu, Xin Zou, Zhihua Wei, Xuming Hu

    Abstract: Recent progress in Machine Unlearning (MU) has introduced solutions for the selective removal of private or sensitive information encoded within deep neural networks. Nonetheless, MU for Multimodal Large Language Models (MLLMs) remains in its nascent phase. Therefore, we propose to reformulate the task of multimodal MU in the era of MLLMs, which aims to erase only the visual patterns associated wi…

    Submitted 27 May, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: Accepted as ACL 2025 Findings

  21. arXiv:2502.10097  [pdf, other]

    cs.AI cs.LG

    Causal Information Prioritization for Efficient Reinforcement Learning

    Authors: Hongye Cao, Fan Feng, Tianpei Yang, Jing Huo, Yang Gao

    Abstract: Current Reinforcement Learning (RL) methods often suffer from sample-inefficiency, resulting from blind exploration strategies that neglect causal relationships among states, actions, and rewards. Although recent causal approaches aim to address this problem, they lack grounded modeling of reward-guided causal understanding of states and actions for goal-orientation, thus impairing learning effici…

    Submitted 14 February, 2025; originally announced February 2025.

  22. arXiv:2502.10077  [pdf, other]

    cs.AI cs.LG

    Towards Empowerment Gain through Causal Structure Learning in Model-Based RL

    Authors: Hongye Cao, Fan Feng, Meng Fang, Shaokang Dong, Tianpei Yang, Jing Huo, Yang Gao

    Abstract: In Model-Based Reinforcement Learning (MBRL), incorporating causal structures into dynamics models provides agents with a structured understanding of the environments, enabling efficient decision. Empowerment as an intrinsic motivation enhances the ability of agents to actively control their environments by maximizing the mutual information between future states and actions. We posit that empowerm…

    Submitted 14 February, 2025; originally announced February 2025.

  23. arXiv:2502.02871  [pdf, other]

    cs.CL cs.AI

    Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

    Authors: Yibo Yan, Shen Wang, Jiahao Huo, Jingheng Ye, Zhendong Chu, Xuming Hu, Philip S. Yu, Carla Gomes, Bart Selman, Qingsong Wen

    Abstract: Scientific reasoning, the process through which humans apply logic, evidence, and critical thinking to explore and interpret scientific phenomena, is essential in advancing knowledge reasoning across diverse fields. However, despite significant progress, current scientific reasoning models still struggle with generalization across domains and often fall short of multimodal perception. Multimodal L…

    Submitted 4 February, 2025; originally announced February 2025.

  24. arXiv:2501.11857  [pdf]

    cond-mat.mtrl-sci

    The critical role of entropy in glass transition kinetics

    Authors: Lijian Song, Meng Gao, Juntao Huo, Li-Min Wang, Yuanzheng Yue, Jun-Qiang Wang

    Abstract: Glass transition is a reversible transition that occurs in most amorphous materials. However, the nature of glass transition remains far from being clarified. A key to understand the glass transition is to clarify what determines the glass transition temperature (Tg) and liquid fragility (m). Here the glass transition thermodynamics for 150 different glass-forming systems are studied statistically…

    Submitted 20 January, 2025; originally announced January 2025.

  25. arXiv:2501.10968  [pdf, other]

    hep-ph

    Exploring the two-body strong decay properties of the possible $Λ_cK^{*}$ and $Σ_cK^{(*)}$ molecules

    Authors: Jin-yu Huo, Rui Chen

    Abstract: In this work, we apply the effective Lagrangian approach to investigate the two-body strong decay behaviors of the possible $Λ_c K^*$ and $Σ_c K^{(*)}$ molecules, as predicted in our previous study [Phys. Rev. D 108, 054011 (2023)]. Our results indicate that the decay width for the coupled $Σ_c K / Λ_c K^* / Σ_c K^*$ molecule with $I(J^P) = 1/2(1/2^-)$ is on the order of several MeV, with the…

    Submitted 19 January, 2025; originally announced January 2025.

    Comments: 8 pages, 7 figures

  26. arXiv:2501.09580  [pdf, other]

    astro-ph.HE astro-ph.GA

    An Intermediate-mass Black Hole Lurking in A Galactic Halo Caught Alive during Outburst

    Authors: C. -C. Jin, D. -Y. Li, N. Jiang, L. -X. Dai, H. -Q. Cheng, J. -Z. Zhu, C. -W. Yang, A. Rau, P. Baldini, T. -G. Wang, H. -Y. Zhou, W. Yuan, C. Zhang, X. -W. Shu, R. -F. Shen, Y. -L. Wang, S. -X. Wen, Q. -Y. Wu, Y. -B. Wang, L. L. Thomsen, Z. -J. Zhang, W. -J. Zhang, A. Coleiro, R. Eyles-Ferris, X. Fang , et al. (116 additional authors not shown)

    Abstract: Stellar-mass and supermassive black holes abound in the Universe, whereas intermediate-mass black holes (IMBHs) of ~10^2-10^5 solar masses in between are largely missing observationally, with few cases found only. Here we report the real-time discovery of a long-duration X-ray transient, EP240222a, accompanied by an optical flare with prominent H and He emission lines revealed by prompt follow-up…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 64 pages, 15 figures, submitted

  27. arXiv:2501.06605  [pdf, other]

    cs.RO

    RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation

    Authors: Zixuan Chen, Jing Huo, Yangtao Chen, Yang Gao

    Abstract: Efficient control in long-horizon robotic manipulation is challenging due to complex representation and policy learning requirements. Model-based visual reinforcement learning (RL) has shown great potential in addressing these challenges but still faces notable limitations, particularly in handling sparse rewards and complex visual features in long-horizon environments. To address these limitation…

    Submitted 24 January, 2025; v1 submitted 11 January, 2025; originally announced January 2025.

    Comments: Under review

  28. arXiv:2412.17316  [pdf, ps, other]

    cs.LG cs.AI cs.CC cs.CL

    Fast Gradient Computation for RoPE Attention in Almost Linear Time

    Authors: Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

    Abstract: The Rotary Position Embedding (RoPE) mechanism has become a powerful enhancement to the Transformer architecture, which enables models to capture token relationships when encoding positional information. However, the RoPE mechanisms make the computations of attention mechanisms more complicated, which makes efficient algorithms challenging. Earlier research introduced almost linear time, i.e.,…

    Submitted 31 December, 2024; v1 submitted 23 December, 2024; originally announced December 2024.

  29. arXiv:2412.02104  [pdf, other]

    cs.CL

    Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

    Authors: Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, Sirui Huang, Dongrui Liu, Mengxi Gao, Jie Zhang, Chen Qian, Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu

    Abstract: The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with large language models (LLMs) and computer vision (CV) systems driving advancements in natural language understanding and visual processing, respectively. The convergence of these technologies has catalyzed the rise of multimodal AI, enabling richer, cross-modal understanding that spans text, vision, audi…

    Submitted 2 December, 2024; originally announced December 2024.

  30. arXiv:2412.00716  [pdf]

    econ.GN

    Effects of time aggregation, product aggregation, and seasonality in measuring bullwhip ratio

    Authors: Hau Mike Ma, Jiazhen Huo, Yongrui Duan

    Abstract: The bullwhip study has received a lot of attention in the literature, but with conflicting results, especially in the context of data aggregation. In this paper, we investigate three widely studied factors in bullwhip measurement: time aggregation, product aggregation, and seasonality. In time aggregation, we decompose the variance into two components: the expectation of the subset variances and t…

    Submitted 1 December, 2024; originally announced December 2024.

  31. arXiv:2411.17605  [pdf, other]

    cs.CV

    Distractor-free Generalizable 3D Gaussian Splatting

    Authors: Yanqi Bao, Jing Liao, Jing Huo, Yang Gao

    Abstract: We present DGGS, a novel framework addressing the previously unexplored challenge of Distractor-free Generalizable 3D Gaussian Splatting (3DGS). It accomplishes two key objectives: fortifying generalizable 3DGS against distractor-laden data during both training and inference phases, while successfully extending cross-scene adaptation capabilities to conventional distractor-free approaches. To achi…

    Submitted 26 November, 2024; originally announced November 2024.

  32. arXiv:2411.12755  [pdf, other]

    eess.IV cs.CV

    SAM-I2I: Unleash the Power of Segment Anything Model for Medical Image Translation

    Authors: Jiayu Huo, Sebastien Ourselin, Rachel Sparks

    Abstract: Medical image translation is crucial for reducing the need for redundant and expensive multi-modal imaging in clinical field. However, current approaches based on Convolutional Neural Networks (CNNs) and Transformers often fail to capture fine-grain semantic features, resulting in suboptimal image quality. To address this challenge, we propose SAM-I2I, a novel image-to-image translation framework…

    Submitted 12 November, 2024; originally announced November 2024.

  33. Einstein Probe discovery of EP240408a: a peculiar X-ray transient with an intermediate timescale

    Authors: Wenda Zhang, Weimin Yuan, Zhixing Ling, Yong Chen, Nanda Rea, Arne Rau, Zhiming Cai, Huaqing Cheng, Francesco Coti Zelati, Lixin Dai, Jingwei Hu, Shumei Jia, Chichuan Jin, Dongyue Li, Paul O'Brien, Rongfeng Shen, Xinwen Shu, Shengli Sun, Xiaojin Sun, Xiaofeng Wang, Lei Yang, Bing Zhang, Chen Zhang, Shuang-Nan Zhang, Yonghe Zhang , et al. (115 additional authors not shown)

    Abstract: We report the discovery of a peculiar X-ray transient, EP240408a, by Einstein Probe (EP) and follow-up studies made with EP, Swift, NICER, GROND, ATCA and other ground-based multi-wavelength telescopes. The new transient was first detected with Wide-field X-ray Telescope (WXT) on board EP on April 8th, 2024, manifested in an intense yet brief X-ray flare lasting for 12 seconds. The flare reached a…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: 25 pages, 11 figures

    Journal ref: published in SCIENCE CHINA Physics, Mechanics & Astronomy(SCPMA) (2024)

  34. arXiv:2410.16714  [pdf, other]

    cs.CL

    Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment

    Authors: Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhang, Jing Huo, Weijie J. Su, Yaodong Yang

    Abstract: Self-play methods have demonstrated remarkable success in enhancing model capabilities across various domains. In the context of Reinforcement Learning from Human Feedback (RLHF), self-play not only boosts Large Language Model (LLM) performance but also overcomes the limitations of traditional Bradley-Terry (BT) model assumptions by finding the Nash equilibrium (NE) of a preference-based, two-play…

    Submitted 19 April, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: ICLR 2025

  35. arXiv:2410.04819  [pdf, other]

    cs.CL

    MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models

    Authors: Kaichen Huang, Jiahao Huo, Yibo Yan, Kun Wang, Yutao Yue, Xuming Hu

    Abstract: In recent years, multimodal large language models (MLLMs) have significantly advanced, integrating more modalities into diverse applications. However, the lack of explainability remains a major barrier to their use in scenarios requiring decision transparency. Current neuron-level explanation paradigms mainly focus on knowledge localization or language- and domain-specific analyses, leaving the ex…

    Submitted 7 October, 2024; originally announced October 2024.

  36. arXiv:2410.04509  [pdf, other]

    cs.CL

    ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

    Authors: Yibo Yan, Shen Wang, Jiahao Huo, Hang Li, Boyan Li, Jiamin Su, Xiong Gao, Yi-Fan Zhang, Tianlong Xu, Zhendong Chu, Aoxiao Zhong, Kun Wang, Hui Xiong, Philip S. Yu, Xuming Hu, Qingsong Wen

    Abstract: As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their potential to revolutionize artificial intelligence is particularly promising, especially in addressing mathematical reasoning tasks. Current mathematical benchmarks predominantly focus on evaluating MLLMs' problem-solving ability, yet there is a crucial gap in addressing more complex scenarios such as error detecti…

    Submitted 8 October, 2024; v1 submitted 6 October, 2024; originally announced October 2024.

  37. arXiv:2410.02315  [pdf, other]

    astro-ph.HE

    Extragalactic fast X-ray transient from a weak relativistic jet associated with a Type Ic-BL supernova

    Authors: H. Sun, W. -X. Li, L. -D. Liu, H. Gao, X. -F. Wang, W. Yuan, B. Zhang, A. V. Filippenko, D. Xu, T. An, S. Ai, T. G. Brink, Y. Liu, Y. -Q. Liu, C. -Y. Wang, Q. -Y. Wu, X. -F. Wu, Y. Yang, B. -B. Zhang, W. -K. Zheng, T. Ahumada, Z. -G. Dai, J. Delaunay, N. Elias-Rosa, S. Benetti , et al. (140 additional authors not shown)

    Abstract: Massive stars end their life as core-collapse supernovae, amongst which some extremes are Type Ic broad-lined supernovae associated with long-duration gamma-ray bursts (LGRBs) having powerful relativistic jets. Their less-extreme brethren make unsuccessful jets that are choked inside the stars, appearing as X-ray flashes or low-luminosity GRBs. On the other hand, there exists a population of extra…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: 43 pages, 9 figures, 4 tables, submitted. Comments are welcome

  38. arXiv:2409.20154  [pdf, other]

    cs.RO

    GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation

    Authors: Yangtao Chen, Zixuan Chen, Junhui Yin, Jing Huo, Pinzhuo Tian, Jieqi Shi, Yang Gao

    Abstract: Robots' ability to follow language instructions and execute diverse 3D manipulation tasks is vital in robot learning. Traditional imitation learning-based methods perform well on seen tasks but struggle with novel, unseen ones due to variability. Recent approaches leverage large foundation models to assist in understanding novel tasks, thereby mitigating this issue. However, these methods lack a t…

    Submitted 16 March, 2025; v1 submitted 30 September, 2024; originally announced September 2024.

    Comments: ICLR 2025. The first two authors contributed equally

  39. arXiv:2408.11297  [pdf, other]

    cs.CV

    Making Large Vision Language Models to be Good Few-shot Learners

    Authors: Fan Liu, Wenwen Cai, Jian Huo, Chuanyi Zhang, Delong Chen, Jun Zhou

    Abstract: Few-shot classification (FSC) is a fundamental yet challenging task in computer vision that involves recognizing novel classes from limited data. While previous methods have focused on enhancing visual features or incorporating additional modalities, Large Vision Language Models (LVLMs) offer a promising alternative due to their rich knowledge and strong visual perception. However, LVLMs risk lear…

    Submitted 20 August, 2024; originally announced August 2024.

  40. arXiv:2407.17418  [pdf, other]

    cs.CV

    3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities

    Authors: Yanqi Bao, Tianyu Ding, Jing Huo, Yaoli Liu, Yuxin Li, Wenbin Li, Yang Gao, Jiebo Luo

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to become a mainstream method for 3D representations. It can effectively transform multi-view images into explicit 3D Gaussian through efficient training, and achieve real-time rendering of novel views. This survey aims to analyze existing 3DGS-related works from multiple intersecting perspectives, including relat…

    Submitted 17 December, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  41. Exploring the mass spectrum and the electromagnetic properties of the possible $Ξ_{cc}K^{(*)}$ and $Ξ_{cc}\bar{K}^{(*)}$ molecules

    Authors: Li-Cheng Sheng, Jin-Yu Huo, Rui Chen, Fu-Lai Wang, Xiang Liu

    Abstract: Using the one-boson-exchange model, we investigate the interactions between the doubly charmed baryon $Ξ_{cc}(3621)$ and the $S$-wave (anti-)kaon accounting for the $S$-$D$ wave mixing and coupled-channel effects. We find the coupled $Ξ_{cc}K/Ξ_{cc}K^*$ state with $I(J^P)=0(1/2^-)$, the $Ξ_{cc}K^*$ state with $0(1/2^-)$, the $Ξ_{cc}\bar{K}$ state with $0(1/2^-)$, and the $Ξ_{cc}\bar{K}^*$ states wit…

    Submitted 29 September, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures

    Journal ref: Phys.Rev.D 110 (2024), 054044

  42. arXiv:2406.14826  [pdf, other]

    eess.IV cs.AI

    Self-supervised Brain Lesion Generation for Effective Data Augmentation of Medical Images

    Authors: Jiayu Huo, Sebastien Ourselin, Rachel Sparks

    Abstract: Accurate brain lesion delineation is important for planning neurosurgical treatment. Automatic brain lesion segmentation methods based on convolutional neural networks have demonstrated remarkable performance. However, neural network performance is constrained by the lack of large-scale well-annotated training datasets. In this manuscript, we propose a comprehensive framework to efficiently genera…

    Submitted 18 August, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 11 pages, 7 figures, 8 tables

  43. MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

    Authors: Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu

    Abstract: Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechan…

    Submitted 1 October, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted by the Main Conference of Empirical Methods in Natural Language Processing (EMNLP) 2024

  44. arXiv:2406.08750  [pdf]

    eess.SY

    The expressway network design problem for multiple urban subregions based on the macroscopic fundamental diagram

    Authors: Yunran Di, Weihua Zhang, Haotian Shi, Heng Ding, Jinbiao Huo, Bin Ran

    Abstract: As urbanization advances, cities are expanding, leading to a more decentralized urban structure and longer average commuting durations. The construction of an urban expressway system emerges as a critical strategy to tackle this challenge. However, the traditional link-level network design method faces modeling and solution challenges when dealing with the large-scale expressway network design pro…

    Submitted 12 June, 2024; originally announced June 2024.

  45. arXiv:2406.04888  [pdf, other]

    cs.CV

    Zero-Shot Video Editing through Adaptive Sliding Score Distillation

    Authors: Lianghan Zhu, Yanqi Bao, Jing Huo, Jing Wu, Yu-Kun Lai, Wenbin Li, Yang Gao

    Abstract: The rapidly evolving field of Text-to-Video generation (T2V) has catalyzed renewed interest in controllable video editing research. While the application of editing prompts to guide diffusion model denoising has gained prominence, mirroring advancements in image editing, this noise-based inference process inherently compromises the original video's integrity, resulting in unintended over-editing a…

    Submitted 6 September, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  46. arXiv:2406.03108  [pdf, ps, other]

    hep-ph

    Lepton flavor violating decays $Z\rightarrow l^{\pm}_{i}l^{\mp}_{j}$ in the B-L Supersymmetric Standard Model

    Authors: Jia-Peng Huo, Xing-Xing Dong, Jiao Ma, Shu-Min Zhao, Cai Guo, Hai-Bin Zhang, Jin-Lei Yang, Tai-Fu Feng

    Abstract: Lepton flavor violation (LFV) represents a clear new physics (NP) signal beyond the standard model (SM). In this paper, we study LFV decays $Z\rightarrow l^{\pm}_{i}l^{\mp}_{j}$ in the B-L Supersymmetric Standard Model (B-LSSM). We calculate these processes separately in the mass eigenstate basis and the electroweak interaction basis, and the latter adopts the mass insertion approximation (MIA) meth…

    Submitted 5 June, 2024; originally announced June 2024.

  47. arXiv:2405.10316  [pdf, other]

    cs.CV cs.GR

    Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model

    Authors: Zheng Gu, Shiyuan Yang, Jing Liao, Jing Huo, Yang Gao

    Abstract: Visual In-Context Learning (ICL) has emerged as a promising research area due to its capability to accomplish various tasks with limited example pairs through analogical reasoning. However, training-based visual ICL has limitations in its ability to generalize to unseen tasks and requires the collection of a diverse task dataset. On the other hand, existing methods in the inference-based visual IC…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project page: https://analogist2d.github.io

  48. Predicting possible molecular states of nucleons with $Ξ_c$, $Ξ_c^{*}$, and $Ξ_c^{\prime}$

    Authors: Jin-Yu Huo, Li-Cheng Sheng, Rui Chen, Xiang Liu

    Abstract: In the framework of a one-boson-exchange model, we carry out a comprehensive investigation of the $Ξ_cN/Λ_cΣ/Ξ_c^{\prime}N/Σ_cΛ/Ξ_c^*N/Σ_c^*Λ/Σ_cΣ/Σ_c^*Σ$ interactions. We consider the $S$-$D$-wave mixing effects and the coupled-channel effects to derive the relevant effective potentials. Our results can predict several possible charm-strange deuteronlike $Ξ_c^{(',*)}N$ hexaquarks, the…

    Submitted 29 September, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: 11 pages

    Journal ref: Phys.Rev.D 110 (2024), 054040

  49. arXiv:2404.16425  [pdf, other]

    astro-ph.HE

    Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

    Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X.-F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J.-W. Hu, A. Li, C.-K. Li, J.-D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja, et al. (170 additional authors not shown)

    Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a,…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 41 pages, 8 figures, 7 tables

  50. arXiv:2404.10160  [pdf, other]

    cs.AI

    Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

    Authors: Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo

    Abstract: Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics, or yield overconfident and random outputs. We find that involving LLMs in role-playing scenarios boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debat…

    Submitted 16 August, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: The first three authors contributed equally to this work