Showing 1–50 of 1,523 results for author: Hu, J

Searching in archive cs.
  1. arXiv:2507.05265  [pdf, ps, other]

    q-bio.GN cs.LG

    BMFM-DNA: A SNP-aware DNA foundation model to capture variant effects

    Authors: Hongyang Li, Sanjoy Dey, Bum Chul Kwon, Michael Danziger, Michal Rosen-Zvi, Jianying Hu, James Kozloski, Ching-Huei Tsou, Bharath Dandala, Pablo Meyer

    Abstract: Large language models (LLMs) trained on text demonstrated remarkable results on natural language processing (NLP) tasks. These models have been adapted to decipher the language of DNA, where sequences of nucleotides act as "words" that encode genomic functions. However, the genome differs fundamentally from natural language, as it lacks clearly defined words or a consistent grammar. Although DNA l…

    Submitted 26 June, 2025; originally announced July 2025.

  2. arXiv:2507.05255  [pdf, ps, other]

    cs.CV cs.CL

    Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

    Authors: Yana Wei, Liang Zhao, Jianjian Sun, Kangheng Lin, Jisheng Yin, Jingcheng Hu, Yinmin Zhang, En Yu, Haoran Lv, Zejia Weng, Jia Wang, Chunrui Han, Yuang Peng, Qi Han, Zheng Ge, Xiangyu Zhang, Daxin Jiang, Vishal M. Patel

    Abstract: The remarkable reasoning capability of large language models (LLMs) stems from cognitive behaviors that emerge through reinforcement with verifiable rewards. This work investigates how to transfer this principle to Multimodal LLMs (MLLMs) to unlock advanced visual reasoning. We introduce a two-stage paradigm built on Qwen2.5-VL-7B: a massive linguistic cold-start fine-tuning, followed by multimoda…

    Submitted 7 July, 2025; originally announced July 2025.

  3. arXiv:2507.05057  [pdf, ps, other]

    cs.IT

    Circular Holographic MIMO Beamforming for Integrated Data and Energy Multicast Systems

    Authors: Qingxiao Huang, Yizhe Zhao, Jie Hu, Kun Yang, Yuguang Fang

    Abstract: Thanks to the application of metamaterials, holographic multiple-input multiple-output (H-MIMO) is expected to achieve a higher spatial diversity gain with lower hardware complexity. With the aid of a circular antenna arrangement of H-MIMO, integrated data and energy multicast (IDEM) can fully exploit the near-field channel to realize a wider range of energy focusing and a higher achievable rate. In t…

    Submitted 7 July, 2025; originally announced July 2025.

  4. arXiv:2507.04947  [pdf, ps, other]

    cs.CV cs.AI

    DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer

    Authors: Yecheng Wu, Junyu Chen, Zhuoyang Zhang, Enze Xie, Jincheng Yu, Junsong Chen, Jinyi Hu, Yao Lu, Song Han, Han Cai

    Abstract: We introduce DC-AR, a novel masked autoregressive (AR) text-to-image generation framework that delivers superior image generation quality with exceptional computational efficiency. Due to the tokenizers' limitations, prior masked AR models have lagged behind diffusion models in terms of quality or efficiency. We overcome this limitation by introducing DC-HT - a deep compression hybrid tokenizer fo…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: ICCV 2025

  5. arXiv:2507.04870  [pdf, ps, other]

    cs.LG

    NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification

    Authors: Jun Hu, Yufei He, Yuan Li, Bryan Hooi, Bingsheng He

    Abstract: Cold-start node classification on multimodal graphs is challenging because cold-start nodes are isolated (i.e., no edges) and often have missing modalities (e.g., absent text or image features). Existing methods address structural isolation by degrading graph learning models to MLPs for cold-start inference, using a teacher model (with graph access) to guide the MLP. However, this results in limit…

    Submitted 7 July, 2025; originally announced July 2025.

  6. arXiv:2507.04758  [pdf, ps, other]

    cs.MM

    Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning

    Authors: Jiayun Hu, Yueyi He, Tianyi Liang, Changbo Wang, Chenhui Li

    Abstract: Emotion alignment between music and palettes is crucial for effective multimedia content, yet misalignment creates confusion that weakens the intended message. However, existing methods often generate only a single dominant color, missing emotion variation. Others rely on indirect mappings through text or images, resulting in the loss of crucial emotion details. To address these challenges, we pre…

    Submitted 7 July, 2025; originally announced July 2025.

  7. arXiv:2507.04631  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

    Authors: Yun Wang, Longguang Wang, Chenghao Zhang, Yongjian Zhang, Zhanjie Zhang, Ao Ma, Chenyou Fan, Tin Lun Lam, Junjie Hu

    Abstract: Recently, learning-based stereo matching networks have advanced significantly. However, they often lack robustness and struggle to achieve impressive cross-domain performance due to domain shifts and imbalanced disparity distributions among diverse datasets. Leveraging Vision Foundation Models (VFMs) can intuitively enhance the model's robustness, but integrating such a model into stereo matching…

    Submitted 6 July, 2025; originally announced July 2025.

    Journal ref: ICCV 2025

  8. arXiv:2507.04105  [pdf, ps, other]

    cs.AI cs.MA

    Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing

    Authors: Jinwei Hu, Yi Dong, Zhengtao Ding, Xiaowei Huang

    Abstract: This paper presents a defense framework for enhancing the safety of large language model (LLM) empowered multi-agent systems (MAS) in safety-critical domains such as aerospace. We apply randomized smoothing, a statistical robustness certification technique, to the MAS consensus context, enabling probabilistic guarantees on agent decisions under adversarial influence. Unlike traditional verificatio…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: Preprint accepted by Chinese Journal of Aeronautics

  9. arXiv:2507.04100  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems

    Authors: Jinwei Hu, Zezhi Tang, Xin Jin, Benyuan Zhang, Yi Dong, Xiaowei Huang

    Abstract: This paper presents HERO (Hierarchical Testing with Rabbit Optimization), a novel black-box adversarial testing framework for evaluating the robustness of deep learning-based Prognostics and Health Management systems in Industrial Cyber-Physical Systems. Leveraging Artificial Rabbit Optimization, HERO generates physically constrained adversarial examples that align with real-world data distributio…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: Preprint accepted by IEEE Transactions on Industrial Cyber Physical Systems

  10. arXiv:2507.04062  [pdf, ps, other]

    cs.CV cs.AI

    Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic

    Authors: Jianwei Tang, Hong Yang, Tengyue Chen, Jian-Fang Hu

    Abstract: Action-driven stochastic human motion prediction aims to generate future motion sequences of a pre-defined target action based on given past observed sequences performing non-target actions. This task primarily presents two challenges. Firstly, generating smooth transition motions is hard due to the varying transition speeds of different actions. Secondly, the action characteristic is difficult to…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: accepted by CVPR2025

    Journal ref: Proceedings of the Computer Vision and Pattern Recognition Conference. 2025: 1883-1893

  11. arXiv:2507.04060  [pdf, ps, other]

    cs.CV cs.AI

    Temporal Continual Learning with Prior Compensation for Human Motion Prediction

    Authors: Jianwei Tang, Jiangxin Sun, Xiaotong Lin, Lifang Zhang, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: Human Motion Prediction (HMP) aims to predict future poses at different moments according to past motion sequences. Previous approaches have treated the prediction of various moments equally, resulting in two main limitations: the learning of short-term predictions is hindered by the focus on long-term predictions, and the incorporation of prior information from past predictions into subsequent pr…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: Advances in Neural Information Processing Systems 2023

    Journal ref: Advances in Neural Information Processing Systems, 2023, 36: 65837-65849

  12. arXiv:2507.03243  [pdf, ps, other]

    cs.HC

    Beyond Charging Anxiety: An Explainable Approach to Understanding User Preferences of EV Charging Stations Using Review Data

    Authors: Zifei Wang, Emmanuel Abolarin, Kai Wu, Venkatarao Rebba, Jian Hu, Zhen Hu, Shan Bao, Feng Zhou

    Abstract: Electric vehicle (EV) charging infrastructure is directly related to the overall EV user experience and thus impacts the widespread adoption of EVs. Understanding key factors that affect EV users' charging experience is essential for building a robust and user-friendly EV charging infrastructure. This study leverages about 17,000 charging station (CS) reviews on Google Maps to explore EV user…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 19 pages, 8 figures

  13. arXiv:2507.03227  [pdf, ps, other]

    cs.RO

    Dexterous Teleoperation of 20-DoF ByteDexter Hand via Human Motion Retargeting

    Authors: Ruoshi Wen, Jiajun Zhang, Guangzeng Chen, Zhongren Cui, Min Du, Yang Gou, Zhigang Han, Junkai Hu, Liqun Huang, Hao Niu, Wei Xu, Haoxiang Zhang, Zhengming Zhu, Hang Li, Zeyu Ren

    Abstract: Replicating human-level dexterity remains a fundamental robotics challenge, requiring integrated solutions from mechatronic design to the control of high degree-of-freedom (DoF) robotic hands. While imitation learning shows promise in transferring human dexterity to robots, the efficacy of trained policies relies on the quality of human demonstration data. We bridge this gap with a hand-arm te…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Tech Report. Project page: https://byte-dexter.github.io/

  14. arXiv:2507.02345  [pdf, ps, other]

    q-bio.BM cs.AI

    HelixDesign-Antibody: A Scalable Production-Grade Platform for Antibody Design Built on HelixFold3

    Authors: Jie Gao, Jing Hu, Shanzhuo Zhang, Kunrui Zhu, Sheng Qian, Yueyang Huang, Xiaonan Zhang, Xiaomin Fang

    Abstract: Antibody engineering is essential for developing therapeutics and advancing biomedical research. Traditional discovery methods often rely on time-consuming and resource-intensive experimental screening. To enhance and streamline this process, we introduce a production-grade, high-throughput platform built on HelixFold3, HelixDesign-Antibody, which utilizes the high-accuracy structure prediction mo…

    Submitted 3 July, 2025; originally announced July 2025.

  15. arXiv:2507.01938  [pdf, ps, other]

    cs.CV

    CI-VID: A Coherent Interleaved Text-Video Dataset

    Authors: Yiming Ju, Jijin Hu, Zhengxiong Luo, Haoge Deng, Hanyu Zhao, Li Du, Chengwei Wu, Donglin Hao, Xinlong Wang, Tengfei Pan

    Abstract: Text-to-video (T2V) generation has recently attracted considerable attention, resulting in the development of numerous high-quality datasets that have propelled progress in this area. However, existing public datasets are primarily composed of isolated text-video (T-V) pairs and thus fail to support the modeling of coherent multi-clip video sequences. To address this limitation, we introduce CI-VI…

    Submitted 2 July, 2025; originally announced July 2025.

  16. arXiv:2507.01616  [pdf, ps, other]

    cs.IR cs.AI cs.DB

    Enhanced Influence-aware Group Recommendation for Online Media Propagation

    Authors: Chengkun He, Xiangmin Zhou, Chen Wang, Longbing Cao, Jie Shao, Xiaodong Li, Guang Xu, Carrie Jinqiu Hu, Zahir Tari

    Abstract: Group recommendation over social media streams has attracted significant attention due to its wide applications in domains such as e-commerce, entertainment, and online news broadcasting. By leveraging social connections and group behaviours, group recommendation (GR) aims to provide more accurate and engaging content to a set of users rather than individuals. Recently, influence-aware GR has emer…

    Submitted 2 July, 2025; originally announced July 2025.

  17. arXiv:2507.00755  [pdf]

    eess.AS cs.AI cs.SD

    LearnAFE: Circuit-Algorithm Co-design Framework for Learnable Audio Analog Front-End

    Authors: Jinhai Hu, Zhongyi Zhang, Cong Sheng Leow, Wang Ling Goh, Yuan Gao

    Abstract: This paper presents a circuit-algorithm co-design framework for learnable analog front-end (AFE) in audio signal classification. Designing AFE and backend classifiers separately is a common practice but non-ideal, as shown in this paper. Instead, this paper proposes a joint optimization of the backend classifier with the AFE's transfer function to achieve system-level optimum. More specifically, t…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 11 pages, 15 figures, accepted for publication in IEEE Transactions on Circuits and Systems I: Regular Papers

  18. arXiv:2507.00566  [pdf, ps, other]

    cs.CV

    Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment

    Authors: Kai Zhou, Shuhai Zhang, Zeng You, Jinwu Hu, Mingkui Tan, Fei Liu

    Abstract: Zero-shot skeleton-based action recognition aims to classify unseen skeleton-based human actions without prior exposure to such categories during training. This task is extremely challenging due to the difficulty in generalizing from known to unknown actions. Previous studies typically use two-stage training: pre-training skeleton encoders on seen action categories using cross-entropy loss and the…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: This paper is accepted by IEEE TIP 2025. Code is publicly available at https://github.com/kaai520/PGFA

  19. arXiv:2507.00271  [pdf, ps, other]

    cs.HC cs.RO

    User Concerns Regarding Social Robots for Mood Regulation: A Case Study on the "Sunday Blues"

    Authors: Zhuochao Peng, Jiaxin Xu, Jun Hu, Haian Xue, Laurens A. G. Kolks, Pieter M. A. Desmet

    Abstract: While recent research highlights the potential of social robots to support mood regulation, little is known about how prospective users view their integration into everyday life. To explore this, we conducted an exploratory case study that used a speculative robot concept "Mora" to provoke reflection and facilitate meaningful discussion about using social robots to manage subtle, day-to-day emotio…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: Accepted to International Conference on Social Robotics + AI (ICSR 2025)

  20. arXiv:2506.23644  [pdf, ps, other]

    cs.SE cs.AI cs.CR

    QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration

    Authors: Junze Hu, Xiangyu Jin, Yizhe Zeng, Yuling Liu, Yunpeng Li, Dan Du, Kaiyu Xie, Hongsong Zhu

    Abstract: We introduce QLPro, a vulnerability detection framework that systematically integrates LLMs and static analysis tools to enable comprehensive vulnerability detection across entire open-source projects. We constructed a new dataset, JavaTest, comprising 10 open-source projects from GitHub with 62 confirmed vulnerabilities. CodeQL, a state-of-the-art static analysis tool, detected only 24 of these vu…

    Submitted 30 June, 2025; originally announced June 2025.

  21. arXiv:2506.22769  [pdf, ps, other]

    cs.RO

    Learning Efficient Robotic Garment Manipulation with Standardization

    Authors: Changshi Zhou, Feng Luan, Jiarui Hu, Shaoqiang Meng, Zhipeng Wang, Yanchao Dong, Yanmin Zhou, Bin He

    Abstract: Garment manipulation is a significant challenge for robots due to the complex dynamics and potential self-occlusion of garments. Most existing methods of efficient garment unfolding overlook the crucial role of standardization of flattened garments, which could significantly simplify downstream tasks like folding, ironing, and packing. This paper presents APS-Net, a novel approach to garment manip…

    Submitted 28 June, 2025; originally announced June 2025.

  22. arXiv:2506.22228  [pdf, ps, other]

    stat.ML cs.LG q-bio.GN stat.AP

    Uncovering smooth structures in single-cell data with PCS-guided neighbor embeddings

    Authors: Rong Ma, Xi Li, Jingyuan Hu, Bin Yu

    Abstract: Single-cell sequencing is revolutionizing biology by enabling detailed investigations of cell-state transitions. Many biological processes unfold along continuous trajectories, yet it remains challenging to extract smooth, low-dimensional representations from inherently noisy, high-dimensional single-cell data. Neighbor embedding (NE) algorithms, such as t-SNE and UMAP, are widely used to embed hi…

    Submitted 27 June, 2025; originally announced June 2025.

  23. arXiv:2506.22049  [pdf, ps, other]

    cs.LG cs.CL

    GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

    Authors: Tianhao Chen, Xin Xu, Zijing Liu, Pengxiang Li, Xinyuan Song, Ajay Kumar Jaiswal, Fan Zhang, Jishan Hu, Yang Wang, Hao Chen, Shizhe Diao, Shiwei Liu, Yu Li, Lu Yin, Can Yang

    Abstract: Modern Large Language Models, such as the LLaMA, Qwen and DeepSeek series, predominantly adopt the Pre-LayerNorm (Pre-LN) Transformer architecture. While being stable during pretraining and scalable to large model sizes, Pre-LN suffers from an exponential growth in activation variance across layers, causing the shortcut to dominate over sub-layer outputs in the residual connection and limiting the…

    Submitted 3 July, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  24. arXiv:2506.21285  [pdf, ps, other]

    cs.CL

    Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning

    Authors: Xin Xu, Tianhao Chen, Fan Zhang, Wanlong Liu, Pengxiang Li, Ajay Kumar Jaiswal, Yuchen Yan, Jishan Hu, Yang Wang, Hao Chen, Shiwei Liu, Shizhe Diao, Can Yang, Lu Yin

    Abstract: While slow-thinking large language models (LLMs) exhibit reflection-like reasoning, commonly referred to as the "aha moment", their ability to generate informative critiques and refine prior solutions remains limited. In this paper, we introduce Double-Checker, a principled framework designed to enhance the reasoning capabilities of slow-thinking LLMs by fostering explicit self-critique and iterat…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 10 pages

  25. arXiv:2506.21135  [pdf, ps, other]

    cs.CV

    YOLO-FDA: Integrating Hierarchical Attention and Detail Enhancement for Surface Defect Detection

    Authors: Jiawei Hu

    Abstract: Surface defect detection in industrial scenarios is both crucial and technically demanding due to the wide variability in defect types, irregular shapes and sizes, fine-grained requirements, and complex material textures. Although recent advances in AI-based detectors have improved performance, existing methods often suffer from redundant features, limited detail sensitivity, and weak robustness u…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 14 pages, 6 figures. Submitted to The 8th Chinese Conference on Pattern Recognition and Computer Vision

  26. arXiv:2506.20666  [pdf, ps, other]

    cs.CL cs.AI

    Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

    Authors: Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman

    Abstract: Navigating everyday social situations often requires juggling conflicting goals, such as conveying a harsh truth, maintaining trust, all while still being mindful of another person's feelings. These value trade-offs are an integral part of human decision-making and language use; however, current tools for interpreting such dynamic and multi-faceted notions of values in LLMs are limited. In cogniti…

    Submitted 6 July, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: 11 pages, 3 figures

  27. arXiv:2506.20251  [pdf, ps, other]

    cs.LG cs.AI

    Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models

    Authors: Kejia Chen, Jiawen Zhang, Jiacong Hu, Yu Wang, Jian Lou, Zunlei Feng, Mingli Song

    Abstract: Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments. However, emerging studies on a few calibration dataset-free quantization methods suggest that quantization may compromise the safety capabilities of LLMs, underscoring the urgent need for systematic safety evaluations and effective mitigation strate…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  28. arXiv:2506.19893  [pdf, ps, other]

    cs.LG cs.AI cs.IT eess.IV

    Distillation-Enabled Knowledge Alignment for Generative Semantic Communications in AIGC Provisioning Tasks

    Authors: Jingzhi Hu, Geoffrey Ye Li

    Abstract: Due to the surging amount of AI-generated content (AIGC), its provisioning to edges and mobile users from the cloud incurs substantial traffic on networks. Generative semantic communication (GSC) offers a promising solution by transmitting highly compact information, i.e., prompt text and latent representations, instead of high-dimensional AIGC data. However, GSC relies on the alignment between th…

    Submitted 24 June, 2025; originally announced June 2025.

  29. arXiv:2506.19852  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

    Authors: Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han

    Abstract: Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as spatial and temporal d…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Code: https://github.com/mit-han-lab/radial-attention

  30. arXiv:2506.19846  [pdf, ps, other]

    cs.AI

    JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning

    Authors: Ai Han, Junxing Hu, Pu Wei, Zhiqian Zhang, Yuhang Guo, Jiawei Lu, Zicheng Zhang

    Abstract: Multi-agent reinforcement learning (MARL) has emerged as a prominent paradigm for increasingly complex tasks. However, joint evolution across heterogeneous agents remains challenging due to cooperative inefficiency and training instability. In this paper, we propose the joint evolution dynamics for MARL called JoyAgents-R1, which first applies Group Relative Policy Optimization (GRPO) to the joint…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 33 pages, 7 figures, under review

  31. arXiv:2506.19340  [pdf, ps, other]

    physics.space-ph cs.LG

    CAM-NET: An AI Model for Whole Atmosphere with Thermosphere and Ionosphere Extension

    Authors: Jiahui Hu, Wenjun Dong

    Abstract: We present Compressible Atmospheric Model-Network (CAM-NET), an AI model designed to predict neutral atmospheric variables from the Earth's surface to the ionosphere with high accuracy and computational efficiency. Accurate modeling of the entire atmosphere is critical for understanding the upward propagation of gravity waves, which influence upper-atmospheric dynamics and coupling across atmosphe…

    Submitted 1 July, 2025; v1 submitted 24 June, 2025; originally announced June 2025.

  32. arXiv:2506.18931  [pdf, ps, other]

    cs.LG cs.AI

    Safe Pruning LoRA: Robust Distance-Guided Pruning for Safety Alignment in Adaptation of LLMs

    Authors: Shuang Ao, Yi Dong, Jinwei Hu, Sarvapali Ramchurn

    Abstract: Fine-tuning Large Language Models (LLMs) with Low-Rank Adaptation (LoRA) enhances adaptability while reducing computational costs. However, fine-tuning can compromise safety alignment, even with benign data, increasing susceptibility to harmful outputs. Existing safety alignment methods struggle to capture complex parameter shifts, leading to suboptimal safety-utility trade-offs. To address this i…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 13 pages, 3 figures

  33. arXiv:2506.18527  [pdf, ps, other]

    cs.CV

    Auto-Regressively Generating Multi-View Consistent Images

    Authors: JiaKui Hu, Yuxiao Yang, Jialun Liu, Jinbo Wu, Chen Zhao, Yanye Lu

    Abstract: Generating multi-view images from human instructions is crucial for 3D content creation. The primary challenges involve maintaining consistency across multiple views and effectively synthesizing shapes and textures under diverse conditions. In this paper, we propose the Multi-View Auto-Regressive (MV-AR) method, which leverages an auto-regressive model to progressively generate consistent multi-vi…

    Submitted 23 June, 2025; originally announced June 2025.

  34. arXiv:2506.18520  [pdf, ps, other]

    cs.CV

    Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

    Authors: JiaKui Hu, Zhengjian Yao, Lujia Jin, Hangzhou He, Yanye Lu

    Abstract: Translation equivariance is a fundamental inductive bias in image restoration, ensuring that translated inputs produce translated outputs. Attention mechanisms in modern restoration transformers undermine this property, adversely impacting both training convergence and generalization. To alleviate this issue, we propose two key strategies for incorporating translation equivariance: slide indexing…

    Submitted 23 June, 2025; originally announced June 2025.

  35. arXiv:2506.18476  [pdf, ps, other]

    cs.CV

    Context Consistency Learning via Sentence Removal for Semi-Supervised Video Paragraph Grounding

    Authors: Yaokun Zhong, Siyu Jiang, Jian Zhu, Jian-Fang Hu

    Abstract: Semi-Supervised Video Paragraph Grounding (SSVPG) aims to localize multiple sentences in a paragraph from an untrimmed video with limited temporal annotations. Existing methods focus on teacher-student consistency learning and video-level contrastive loss, but they overlook the importance of perturbing query contexts to generate strong supervisory signals. In this work, we propose a novel Context…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted by ICME2025

  36. arXiv:2506.18046  [pdf, ps, other]

    cs.LG

    TAB: Unified Benchmarking of Time Series Anomaly Detection Methods

    Authors: Xiangfei Qiu, Zhe Li, Wanghui Qiu, Shiyan Hu, Lekui Zhou, Xingjian Wu, Zhengyu Li, Chenjuan Guo, Aoying Zhou, Zhenli Sheng, Jilin Hu, Christian S. Jensen, Bin Yang

    Abstract: Time series anomaly detection (TSAD) plays an important role in many domains such as finance, transportation, and healthcare. With the ongoing instrumentation of reality, more time series data will be available, leading also to growing demands for TSAD. While many TSAD methods already exist, new and better methods are still desirable. However, effective progress hinges on the availability of relia…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Accepted by PVLDB2025

  37. arXiv:2506.17874  [pdf, ps, other]

    stat.ML cs.CV cs.LG

    DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation

    Authors: Jiaming Hu, Debarghya Mukherjee, Ioannis Ch. Paschalidis

    Abstract: In many real-world applications, ensuring the robustness and stability of deep neural networks (DNNs) is crucial, particularly for image classification tasks that encounter various input perturbations. While data augmentation techniques have been widely adopted to enhance the resilience of a trained model against such perturbations, there remains significant room for improvement in robustness agai…

    Submitted 24 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

    Comments: 26 pages, 3 figures

  38. arXiv:2506.16796  [pdf, ps, other]

    cs.CV

    RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought

    Authors: Junbo Qiao, Miaomiao Cai, Wei Li, Yutong Liu, Xudong Huang, Gaoqi He, Jiao Xie, Jie Hu, Xinghao Chen, Shaohui Lin

    Abstract: Real-World Image Super-Resolution is one of the most challenging tasks in image restoration. However, existing methods struggle with an accurate understanding of degraded image content, leading to reconstructed results that are both low-fidelity and unnatural. We present RealSR-R1 in this work, which empowers the RealSR models with understanding and reasoning capabilities. Inspired by the success o…

    Submitted 23 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

  39. arXiv:2506.15880  [pdf]

    cs.AI cs.LG

    Deep Reinforcement Learning Xiangqi Player with Monte Carlo Tree Search

    Authors: Berk Yilmaz, Junyu Hu, Jinsong Liu

    Abstract: This paper presents a Deep Reinforcement Learning (DRL) system for Xiangqi (Chinese Chess) that integrates neural networks with Monte Carlo Tree Search (MCTS) to enable strategic self-play and self-improvement. Addressing the underexplored complexity of Xiangqi, including its unique board layout, piece movement constraints, and victory conditions, our approach combines policy-value networks with M…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: All authors contributed equally to this work. 24 pages, 10 figures

    MSC Class: 68T05; 68T20

  40. arXiv:2506.15868  [pdf, ps, other]

    cs.RO

    CooperRisk: A Driving Risk Quantification Pipeline with Multi-Agent Cooperative Perception and Prediction

    Authors: Mingyue Lei, Zewei Zhou, Hongchen Li, Jia Hu, Jiaqi Ma

    Abstract: Risk quantification is a critical component of safe autonomous driving; however, it is constrained by the limited perception range and occlusion of single-vehicle systems in complex and dense scenarios. The vehicle-to-everything (V2X) paradigm has been a promising solution for sharing complementary perception information; nevertheless, how to ensure the risk interpretability while understanding multi-agent i…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: IROS2025

  41. arXiv:2506.15524  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Image Shadow Removal Challenge Report

    Authors: Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailian Chen, Zongwei Wu, Radu Timofte, Mingjia Li, Jin Hu, Hainuo Wang, Hengxing Liu, Jiarui Wang, Qiming Hu, Xiaojie Guo, Xin Lu, Jiarong Yang, Yuanfei Bao, Anya Hu, Zihao Fan, Kunyu Wang, Jie Xiao, Xi Wang, Xueyang Fu, Zheng-Jun Zha, Yu-Fan Lin, Chia-Ming Lee, et al. (57 additional authors not shown)

    Abstract: This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were e…

    Submitted 18 June, 2025; originally announced June 2025.

  42. arXiv:2506.15477  [pdf, ps, other]

    cs.CV

    Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning

    Authors: Chunlei Li, Jingyang Hou, Yilei Shi, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

    Abstract: Medical report generation from imaging data remains a challenging task in clinical practice. While large language models (LLMs) show great promise in addressing this challenge, their effective integration with medical imaging data still deserves in-depth exploration. In this paper, we present MRG-LLM, a novel multimodal large language model (MLLM) that combines a frozen LLM with a learnable visual…

    Submitted 18 June, 2025; originally announced June 2025.

  43. arXiv:2506.14861  [pdf, ps, other]

    q-bio.GN cs.AI q-bio.QM

    BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models

    Authors: Bharath Dandala, Michael M. Danziger, Ella Barkan, Tanwi Biswas, Viatcheslav Gurev, Jianying Hu, Matthew Madgwick, Akira Koseki, Tal Kozlovski, Michal Rosen-Zvi, Yishai Shimoni, Ching-Huei Tsou

    Abstract: Transcriptomic foundation models (TFMs) have recently emerged as powerful tools for analyzing gene expression in cells and tissues, supporting key tasks such as cell-type annotation, batch correction, and perturbation prediction. However, the diversity of model implementations and training strategies across recent TFMs, though promising, makes it challenging to isolate the contribution of individu…

    Submitted 17 June, 2025; originally announced June 2025.

  44. arXiv:2506.13315  [pdf, ps, other]

    cs.IR

    Gated Rotary-Enhanced Linear Attention for Long-term Sequential Recommendation

    Authors: Juntao Hu, Wei Zhou, Huayi Shen, Xiao Du, Jie Liao, Junhao Wen, Min Gao

    Abstract: In Sequential Recommendation Systems (SRSs), Transformer models show remarkable performance but face computation cost challenges when modeling long-term user behavior sequences due to the quadratic complexity of the dot-product attention mechanism. By approximating the dot-product attention, linear attention provides an efficient option with linear complexity. However, existing linear attention me…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 24 pages, 9 figures

  45. arXiv:2506.12712  [pdf, ps, other]

    cs.CV eess.IV

    Combining Self-attention and Dilation Convolutional for Semantic Segmentation of Coal Maceral Groups

    Authors: Zhenghao Xi, Zhengnan Lv, Yang Zheng, Xiang Liu, Zhuang Yu, Junran Chen, Jing Hu, Yaqi Liu

    Abstract: The segmentation of coal maceral groups can be described as a semantic segmentation process of coal maceral group images, which is of great significance for studying the chemical properties of coal. Generally, existing semantic segmentation models of coal maceral groups use the method of stacking parameters to achieve higher accuracy. It leads to increased computational requirements and impacts mo…

    Submitted 15 June, 2025; originally announced June 2025.

  46. arXiv:2506.11496  [pdf, ps, other]

    eess.IV cs.CV

    Taming Stable Diffusion for Computed Tomography Blind Super-Resolution

    Authors: Chunlei Li, Yilei Shi, Haoxi Hu, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

    Abstract: High-resolution computed tomography (CT) imaging is essential for medical diagnosis but requires increased radiation exposure, creating a critical trade-off between image quality and patient safety. While deep learning methods have shown promise in CT super-resolution, they face challenges with complex degradations and limited medical training data. Meanwhile, large-scale pre-trained diffusion mod…

    Submitted 13 June, 2025; originally announced June 2025.

  47. arXiv:2506.11418  [pdf, ps, other]

    cs.CL

    Efficient Long-Context LLM Inference via KV Cache Clustering

    Authors: Jie Hu, Shengnan Wang, Yutong He, Ping Gong, Jiawei Yi, Juncheng Zhang, Youhui Bai, Renhai Chen, Gong Zhang, Cheng Li, Kun Yuan

    Abstract: Large language models (LLMs) with extended context windows have become increasingly prevalent for tackling complex tasks. However, the substantial Key-Value (KV) cache required for long-context LLMs poses significant deployment challenges. Existing approaches either discard potentially critical information needed for future generations or offer limited efficiency gains due to high computational ov…

    Submitted 12 June, 2025; originally announced June 2025.

  48. arXiv:2506.11339  [pdf, ps, other]

    cs.CY eess.SY

    WIP: Exploring the Value of a Debugging Cheat Sheet and Mini Lecture in Improving Undergraduate Debugging Skills and Mindset

    Authors: Andrew Ash, John Hu

    Abstract: This work-in-progress research paper explores the efficacy of a small-scale microelectronics debugging education intervention utilizing quasi-experimental design in an introductory microelectronics course for third-year electrical and computer engineering (ECE) students. In the first semester of research, the experimental group attended a debugging "mini lecture" covering two common sources of cir…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: This is the accepted version of a paper accepted for presentation at the 2025 IEEE Frontiers in Education Conference (FIE). The final version will be available via IEEE Xplore at: https://ieeexplore.ieee.org

  49. arXiv:2506.11332  [pdf, ps, other]

    cond-mat.mtrl-sci cs.LG

    Polymorphism Crystal Structure Prediction with Adaptive Space Group Diversity Control

    Authors: Sadman Sadeed Omee, Lai Wei, Sourin Dey, Jianjun Hu

    Abstract: Crystalline materials can form different structural arrangements (i.e. polymorphs) with the same chemical composition, exhibiting distinct physical properties depending on how they were synthesized or the conditions under which they operate. For example, carbon can exist as graphite (soft, conductive) or diamond (hard, insulating). Computational methods that can predict these polymorphs are vital…

    Submitted 12 June, 2025; originally announced June 2025.

  50. arXiv:2506.10501  [pdf, ps, other]

    cs.SE cs.LG

    BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis

    Authors: Surya Jasper, Minh Luu, Evan Pan, Aakash Tyagi, Michael Quinn, Jiang Hu, David Kebo Houngninou

    Abstract: Hardware complexity continues to strain verification resources, motivating the adoption of machine learning (ML) methods to improve debug efficiency. However, ML-assisted debugging critically depends on diverse and scalable bug datasets, which existing manual or automated bug insertion methods fail to reliably produce. We introduce BugGen, a first of its kind, fully autonomous, multi-agent pipelin…

    Submitted 18 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.