Showing 51–100 of 1,963 results for author: Jiang, L

  1. arXiv:2505.11158  [pdf, ps, other]

    eess.IV cs.CV

    Recent Advances in Diffusion Models for Hyperspectral Image Processing and Analysis: A Review

    Authors: Xing Hu, Xiangcheng Liu, Danfeng Hong, Qianqian Duan, Linghua Jiang, Haima Yang, Dawei Zhan

    Abstract: Hyperspectral image processing and analysis has important application value in remote sensing, agriculture and environmental monitoring, but its high dimensionality, data redundancy and noise interference pose great challenges to the analysis. Traditional models have limitations in dealing with these complex data, and it is difficult to meet the increasing demand for analysis. In recent year…

    Submitted 27 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  2. arXiv:2505.11095  [pdf, ps, other]

    cs.CL

    Towards Better Evaluation for Generated Patent Claims

    Authors: Lekang Jiang, Pascal A Scherz, Stephan Goetz

    Abstract: Patent claims define the scope of protection and establish the legal boundaries of an invention. Drafting these claims is a complex and time-consuming process that usually requires the expertise of skilled patent attorneys, which can form a large access barrier for many small enterprises. To solve these challenges, researchers have investigated the use of large language models (LLMs) for automatin…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Accepted to ACL 2025. 14 pages, 8 tables

  3. arXiv:2505.10499  [pdf, ps, other]

    quant-ph

    Achievable rates for concatenated square Gottesman-Kitaev-Preskill codes

    Authors: Mahadevan Subramanian, Guo Zheng, Liang Jiang

    Abstract: The Gottesman-Kitaev-Preskill (GKP) codes are known to achieve optimal rates under displacement noise and pure loss channels, which establishes theoretical foundations for their optimality. However, such optimal rates are only known to be achieved at a discrete set of noise strengths with the current self-dual symplectic lattice construction. In this work, we develop a new coding strategy using conca…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 14+15 pages, 6+4 figures

  4. arXiv:2505.09687  [pdf, ps, other]

    quant-ph

    Efficient benchmarking of logical magic state

    Authors: Su-un Lee, Ming Yuan, Senrui Chen, Kento Tsubouchi, Liang Jiang

    Abstract: High-fidelity logical magic states are a critical resource for fault-tolerant quantum computation, enabling non-Clifford logical operations through state injection. However, benchmarking these states presents significant challenges: one must estimate the infidelity $ε$ with multiplicative precision, while many quantum error-correcting codes only permit Clifford operations to be implemented fault-t…

    Submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.06810  [pdf, ps, other]

    cs.ET

    QSeer: A Quantum-Inspired Graph Neural Network for Parameter Initialization in Quantum Approximate Optimization Algorithm Circuits

    Authors: Lei Jiang, Chi Zhang, Fan Chen

    Abstract: To mitigate the barren plateau problem, effective parameter initialization is crucial for optimizing the Quantum Approximate Optimization Algorithm (QAOA) in the near-term Noisy Intermediate-Scale Quantum (NISQ) era. Prior physics-driven approaches leveraged the optimal parameter concentration phenomenon, utilizing medium values of previously optimized QAOA parameters stored in databases as initia…

    Submitted 10 May, 2025; originally announced May 2025.

  6. arXiv:2505.06690  [pdf]

    cs.LG

    E2E-FANet: A Highly Generalizable Framework for Waves prediction Behind Floating Breakwaters via Exogenous-to-Endogenous Variable Attention

    Authors: Jianxin Zhang, Lianzi Jiang, Xinyu Han, Xiangrong Wang, Weinan Huang

    Abstract: Accurate prediction of waves behind floating breakwaters (FB) is crucial for optimizing coastal engineering structures, enhancing safety, and improving design efficiency. Existing methods demonstrate limitations in capturing nonlinear interactions between waves and structures, while exhibiting insufficient capability in modeling the complex frequency-domain relationships among elevations of differ…

    Submitted 10 May, 2025; originally announced May 2025.

  7. arXiv:2505.06688  [pdf]

    cs.LG

    A Novel Framework for Significant Wave Height Prediction based on Adaptive Feature Extraction Time-Frequency Network

    Authors: Jianxin Zhang, Lianzi Jiang, Xinyu Han, Xiangrong Wang

    Abstract: Precise forecasting of significant wave height (Hs) is essential for the development and utilization of wave energy. The challenges in predicting Hs arise from its non-linear and non-stationary characteristics. The combination of decomposition preprocessing and machine learning models has demonstrated significant effectiveness in Hs prediction by extracting data features. However, decomposing the…

    Submitted 10 May, 2025; originally announced May 2025.

  8. arXiv:2505.06635  [pdf, ps, other]

    cs.CV

    Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization

    Authors: Xu Zheng, Yuanhuiyi Lyu, Lutao Jiang, Danda Pani Paudel, Luc Van Gool, Xuming Hu

    Abstract: Fusing and balancing multi-modal inputs from novel sensors for dense prediction tasks, particularly semantic segmentation, is critically important yet remains a significant challenge. One major limitation is the tendency of multi-modal frameworks to over-rely on easily learnable modalities, a phenomenon referred to as unimodal dominance or bias. This issue becomes especially problematic in real-wo…

    Submitted 10 May, 2025; originally announced May 2025.

  9. Virtualized 3D Gaussians: Flexible Cluster-based Level-of-Detail System for Real-Time Rendering of Composed Scenes

    Authors: Xijie Yang, Linning Xu, Lihan Jiang, Dahua Lin, Bo Dai

    Abstract: 3D Gaussian Splatting (3DGS) enables the reconstruction of intricate digital 3D assets from multi-view images by leveraging a set of 3D Gaussian primitives for rendering. Its explicit and discrete representation facilitates the seamless composition of complex digital worlds, offering significant advantages over previous neural implicit methods. However, when applied to large-scale compositions, su…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: project page: https://xijie-yang.github.io/V3DG/

  10. arXiv:2505.05766  [pdf, ps, other]

    astro-ph.HE

    Measurement of separate electron and positron spectra from 10 GeV to 20GeV with the geomagnetic field on DAMPE

    Authors: DAMPE Collaboration, F. Alemanno, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, H. Boutin, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, Z. X. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, I. DeMitri, F. dePalma, A. DiGiovanni, T. K. Dong, et al. (127 additional authors not shown)

    Abstract: The cosmic-ray (CR) electrons and positrons in space are of great significance for studying the origin and propagation of cosmic-rays. The satellite-borne experiment DArk Matter Particle Explorer (DAMPE) has been used to measure the separate electron and positron spectra, as well as the positron fraction. In this work, the Earth's magnetic field is used to distinguish CR electrons and positrons, a…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 18 pages, 5 figures

  11. Automated Learning of Semantic Embedding Representations for Diffusion Models

    Authors: Limai Jiang, Yunpeng Cai

    Abstract: Generative models capture the true distribution of data, yielding semantically rich representations. Denoising diffusion models (DDMs) exhibit superior generative capabilities, though efficient representation learning for them is lacking. In this work, we employ a multi-level denoising autoencoder framework to expand the representation capacity of DDMs, which introduces sequentially consistent Di…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Extended version of the paper published in SDM25

  12. arXiv:2505.05679  [pdf, other]

    cs.SE

    From Bias To Improved Prompts: A Case Study of Bias Mitigation of Clone Detection Models

    Authors: QiHong Chen, Lianghao Jiang, Iftekhar Ahmed

    Abstract: The issue of clone code has persisted in software engineering, primarily because developers often copy and paste code segments. This common practice has elevated the importance of clone code detection, garnering attention from both software engineering researchers and industry professionals. Their collective concern arises from the potential negative impacts that clone code can have on software qu…

    Submitted 8 May, 2025; originally announced May 2025.

  13. arXiv:2505.05271  [pdf, other]

    cs.CL cs.AI

    T-T: Table Transformer for Tagging-based Aspect Sentiment Triplet Extraction

    Authors: Kun Peng, Chaodong Tong, Cong Cao, Hao Peng, Qian Li, Guanlin Wu, Lei Jiang, Yanbing Liu, Philip S. Yu

    Abstract: Aspect sentiment triplet extraction (ASTE) aims to extract triplets composed of aspect terms, opinion terms, and sentiment polarities from given sentences. The table tagging method is a popular approach to addressing this task, which encodes a sentence into a 2-dimensional table, allowing for the tagging of relations between any two words. Previous efforts have focused on designing various downstr…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI2025

  14. arXiv:2505.04591  [pdf, other]

    quant-ph

    Timescales, Squeezing and Heisenberg Scalings in Many-Body Continuous Sensing

    Authors: Gideon Lee, Ron Belyansky, Liang Jiang, Aashish A. Clerk

    Abstract: The continuous monitoring of driven-dissipative systems offers new avenues for quantum advantage in metrology. This approach mixes temporal and spatial correlations in a manner distinct from traditional metrology, leading to ambiguities in how one identifies Heisenberg scalings (e.g., standard asymptotic metrics like the sensitivity are not bounded by system size). Here, we propose a new metric for…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 18 pages including supplementary material, 2 figures

  15. arXiv:2505.02351  [pdf, ps, other]

    cs.DC

    Opt-GPTQ: An Optimized GPTQ Combining Sparse Attention and Quantization Techniques

    Authors: Jie Kong, Junxiang Zhang, Jiheng Xu, Yalong Li, Shouhua Zhang, Jiehan Zhou, Yuhai Liu, Peng Liang, Quan Zhang, Luohan Jiang

    Abstract: In the field of deep learning, traditional attention mechanisms face significant challenges related to high computational complexity and large memory consumption when processing long sequence data. To address these limitations, we propose Opt-GPTQ, an optimized Gradient-based Post Training Quantization (GPTQ) combining the Grouped Query Attention (GQA) mechanism with paging memory management, opti…

    Submitted 10 July, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  16. arXiv:2505.01947  [pdf, ps, other]

    cs.SE cs.LG cs.RO

    Runtime Anomaly Detection for Drones: An Integrated Rule-Mining and Unsupervised-Learning Approach

    Authors: Ivan Tan, Wei Minn, Christopher M. Poskitt, Lwin Khin Shar, Lingxiao Jiang

    Abstract: UAVs, commonly referred to as drones, have witnessed a remarkable surge in popularity due to their versatile applications. These cyber-physical systems depend on multiple sensor inputs, such as cameras, GPS receivers, accelerometers, and gyroscopes, with faults potentially leading to physical instability and serious safety concerns. To mitigate such risks, anomaly detection has emerged as a crucia…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: Accepted by the 29th International Conference on Engineering of Complex Computer Systems (ICECCS 2025)

  17. arXiv:2505.01657  [pdf, other]

    cs.IR cs.CV

    RAGAR: Retrieval Augment Personalized Image Generation Guided by Recommendation

    Authors: Run Ling, Wenji Wang, Yuting Liu, Guibing Guo, Linying Jiang, Xingwei Wang

    Abstract: Personalized image generation is crucial for improving the user experience, as it renders reference images into preferred ones according to user visual preferences. Although effective, existing methods face two main issues. First, existing methods treat all items in the user historical sequence equally when extracting user preferences, overlooking the varying semantic similarities between historic…

    Submitted 2 May, 2025; originally announced May 2025.

  18. arXiv:2505.01236  [pdf, other]

    quant-ph physics.comp-ph

    Qracle: A Graph-Neural-Network-based Parameter Initializer for Variational Quantum Eigensolvers

    Authors: Chi Zhang, Lei Jiang, Fan Chen

    Abstract: Variational Quantum Eigensolvers (VQEs) are a leading class of noisy intermediate-scale quantum (NISQ) algorithms with broad applications in quantum physics and quantum chemistry. However, as system size increases, VQE optimization is increasingly hindered by the barren plateau phenomenon, where gradients vanish and the loss function becomes trapped in local minima. While machine learning-based pa…

    Submitted 2 May, 2025; originally announced May 2025.

  19. arXiv:2505.01073  [pdf, other]

    cs.AI

    Retrieval Augmented Learning: A Retrial-based Large Language Model Self-Supervised Learning and Autonomous Knowledge Generation

    Authors: Zongyuan Li, Pengfei Li, Runnan Qi, Yanan Ni, Lumin Jiang, Hui Wu, Xuebo Zhang, Kuihua Huang, Xian Guo

    Abstract: The lack of domain-specific data in the pre-training of Large Language Models (LLMs) severely limits LLM-based decision systems in specialized applications, while post-training a model in the scenarios requires significant computational resources. In this paper, we present Retrial-Augmented Learning (RAL), a reward-free self-supervised learning framework for LLMs that operates without model traini…

    Submitted 2 May, 2025; originally announced May 2025.

  20. arXiv:2505.00946  [pdf, other]

    cs.LG cs.CR

    Addressing Noise and Stochasticity in Fraud Detection for Service Networks

    Authors: Wenxin Zhang, Ding Xu, Xi Xuan, Lei Jiang, Guangzhen Yao, Renda Han, Xiangxiang Lang, Cuicui Luo

    Abstract: Fraud detection is crucial in social service networks to maintain user trust and improve service network security. Existing spectral graph-based methods address this challenge by leveraging different graph filters to capture signals with different frequencies in service networks. However, most graph filter-based methods struggle with deriving clean and discriminative graph signals. On the one hand…

    Submitted 1 May, 2025; originally announced May 2025.

  21. arXiv:2505.00824  [pdf, ps, other]

    quant-ph

    Enhancing Microwave-Optical Bell Pairs Generation for Quantum Transduction Using Kerr Nonlinearity

    Authors: Fangxin Li, Ming Yuan, Zhaoyou Wang, Changchun Zhong, Liang Jiang

    Abstract: Microwave-optical quantum transduction can be achieved via quantum teleportation using microwave-optical photon Bell pairs. The standard spontaneous parametric down-conversion (SPDC) has to trade off between generation fidelity and probability due to unwanted higher-excitation pairs in the output. In this work, we propose a pulsed SPDC scheme that employs strong Kerr nonlinearity in the microwave…

    Submitted 1 May, 2025; originally announced May 2025.

  22. arXiv:2504.17670  [pdf, other]

    cs.CV

    DiMeR: Disentangled Mesh Reconstruction Model

    Authors: Lutao Jiang, Jiantao Lin, Kanghao Chen, Wenhang Ge, Xin Yang, Yifan Jiang, Yuanhuiyi Lyu, Xu Zheng, Yinchuan Li, Yingcong Chen

    Abstract: We propose DiMeR, a novel geometry-texture disentangled feed-forward model with 3D supervision for sparse-view mesh reconstruction. Existing methods confront two persistent obstacles: (i) textures can conceal geometric errors, i.e., visually plausible images can be rendered even with wrong geometry, producing multiple ambiguous optimization objectives in geometry-texture mixed solution space for s…

    Submitted 26 May, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: Project Page: https://lutao2021.github.io/DiMeR_page/

  23. arXiv:2504.17542  [pdf, other]

    cs.SE

    Large Language Model-Driven Concolic Execution for Highly Structured Test Input Generation

    Authors: Haoxin Tu, Seongmin Lee, Yuxian Li, Peng Chen, Lingxiao Jiang, Marcel Böhme

    Abstract: How can we perform concolic execution to generate highly structured test inputs for systematically testing parsing programs? Existing concolic execution engines are significantly restricted by (1) input structure-agnostic path constraint selection, leading to the waste of testing effort or missing coverage; (2) limited constraint-solving capability, yielding many syntactically invalid test inputs;…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 18 pages (including Appendix)

  24. Zoozve: A Strip-Mining-Free RISC-V Vector Extension with Arbitrary Register Grouping Compilation Support (WIP)

    Authors: Siyi Xu, Limin Jiang, Yintao Liu, Yihao Shen, Yi Shi, Shan Cao, Zhiyuan Jiang

    Abstract: Vector processing is crucial for boosting processor performance and efficiency, particularly with data-parallel tasks. The RISC-V "V" Vector Extension (RVV) enhances algorithm efficiency by supporting vector registers of dynamic sizes and their grouping. Nevertheless, for very long vectors, the static number of RVV vector registers and its power-of-two grouping can lead to performance restrictions…

    Submitted 19 June, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 6 pages, 4 figures, LCTES'25

    Journal ref: Proceedings of the 26th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems. (2025), 51-56

  25. arXiv:2504.15054  [pdf, other]

    cs.CV

    Structure-guided Diffusion Transformer for Low-Light Image Enhancement

    Authors: Xiangchen Yin, Zhenda Yu, Longtao Jiang, Xin Gao, Xiao Sun, Zhi Liu, Xun Yang

    Abstract: While the diffusion transformer (DiT) has become a focal point of interest in recent years, its application in low-light image enhancement remains a blank area for exploration. Current methods recover the details from low-light images while inevitably amplifying the noise in images, resulting in poor visual quality. In this paper, we first introduce DiT into the low-light enhancement task and de…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE Transactions on Multimedia (TMM)

  26. arXiv:2504.14202  [pdf, other]

    cs.CV cs.AI

    Learning Joint ID-Textual Representation for ID-Preserving Image Synthesis

    Authors: Zichuan Liu, Liming Jiang, Qing Yan, Yumin Jia, Hao Kang, Xin Lu

    Abstract: We propose a novel framework for ID-preserving generation using a multi-modal encoding strategy rather than injecting identity features via adapters into pre-trained models. Our method treats identity and text as a unified conditioning input. To achieve this, we introduce FaceCLIP, a multi-modal encoder that learns a joint embedding space for both identity and textual semantics. Given a reference…

    Submitted 21 May, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

  27. arXiv:2504.13203  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG cs.MA

    X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

    Authors: Salman Rahman, Liwei Jiang, James Shiffer, Genglin Liu, Sheriff Issaka, Md Rizwan Parvez, Hamid Palangi, Kai-Wei Chang, Yejin Choi, Saadia Gabriel

    Abstract: Multi-turn interactions with language models (LMs) pose critical safety risks, as harmful intent can be strategically spread across exchanges. Yet, the vast majority of prior work has focused on single-turn safety, while adaptability and diversity remain among the key challenges of multi-turn red-teaming. To address these challenges, we present X-Teaming, a scalable framework that systematically e…

    Submitted 15 April, 2025; originally announced April 2025.

  28. arXiv:2504.13168  [pdf, other]

    quant-ph

    Restoring Heisenberg scaling in time via autonomous quantum error correction

    Authors: Hyukgun Kwon, Uwe R. Fischer, Seung-Woo Lee, Liang Jiang

    Abstract: We establish a sufficient condition under which autonomous quantum error correction (AutoQEC) can effectively restore Heisenberg scaling (HS) in quantum metrology. Specifically, we show that if all Lindblad operators associated with the noise commute with the signal Hamiltonian and a particular constrained linear equation admits a solution, then an ancilla-free AutoQEC scheme with finite $R$ (wher…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 5 pages, 3 figures, 10 pages of supplemental material

  29. arXiv:2504.11478  [pdf, other]

    cs.CV cs.AI

    Flux Already Knows -- Activating Subject-Driven Image Generation without Training

    Authors: Hao Kang, Stathi Fotiadis, Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Min Jin Chong, Xin Lu

    Abstract: We propose a simple yet effective zero-shot framework for subject-driven image generation using a vanilla Flux model. By framing the task as grid-based image completion and simply replicating the subject image(s) in a mosaic layout, we activate strong identity-preserving capabilities without any additional data, training, or inference-time fine-tuning. This "free lunch" approach is further strengt…

    Submitted 19 April, 2025; v1 submitted 12 April, 2025; originally announced April 2025.

  30. arXiv:2504.10832  [pdf, other]

    cs.AR

    Unlimited Vector Processing for Wireless Baseband Based on RISC-V Extension

    Authors: Limin Jiang, Yi Shi, Yihao Shen, Shan Cao, Zhiyuan Jiang, Sheng Zhou

    Abstract: Wireless baseband processing (WBP) serves as an ideal scenario for utilizing vector processing, which excels in managing data-parallel operations due to its parallel structure. However, conventional vector architectures face certain constraints such as limited vector register sizes, reliance on power-of-two vector length multipliers, and vector permutation capabilities tied to specific architectur…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 13 pages, 9 figures, 3 tables, Under Review

  31. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  32. arXiv:2504.09090  [pdf, other]

    eess.SP

    Leveraging Large Self-Supervised Time-Series Models for Transferable Diagnosis in Cross-Aircraft Type Bleed Air System

    Authors: Yilin Wang, Peixuan Lei, Xuyang Wang, Liangliang Jiang, Liming Xuan, Wei Cheng, Honghua Zhao, Yuanxiang Li

    Abstract: Bleed Air System (BAS) is critical for maintaining flight safety and operational efficiency, supporting functions such as cabin pressurization, air conditioning, and engine anti-icing. However, BAS malfunctions, including overpressure, low pressure, and overheating, pose significant risks such as cabin depressurization, equipment failure, or engine damage. Current diagnostic approaches face notabl…

    Submitted 12 April, 2025; originally announced April 2025.

  33. arXiv:2504.08685  [pdf, other]

    cs.CV cs.AI

    Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

    Authors: Team Seawead, Ceyuan Yang, Zhijie Lin, Yang Zhao, Shanchuan Lin, Zhibei Ma, Haoyuan Guo, Hao Chen, Lu Qi, Sen Wang, Feng Cheng, Feilong Zuo, Xuejiao Zeng, Ziyan Yang, Fangyuan Kong, Meng Wei, Zhiwu Qing, Fei Xiao, Tuyen Hoang, Siyu Zhang, Peihao Zhu, Qi Zhao, Jiangqiao Yan, Liangke Gui, Sheng Bi, et al. (30 additional authors not shown)

    Abstract: This technical report presents a cost-efficient strategy for training a video generation foundation model. We present a mid-sized research model with approximately 7 billion parameters (7B) called Seaweed-7B trained from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary…

    Submitted 4 May, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: Technical report (some typos fixed)

  34. arXiv:2504.08371  [pdf, other]

    cs.SD cs.AI eess.AS

    Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network

    Authors: Yucheng Liu, Longyu Jiang

    Abstract: Signal separation in the passive underwater acoustic domain has heavily relied on deep learning techniques to isolate ship radiated noise. However, the separation networks commonly used in this domain stem from speech separation applications and may not fully consider the unique aspects of underwater acoustics beforehand, such as the influence of different propagation media, signal frequencies and…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 10 pages, 4 figures

    MSC Class: 68T10 ACM Class: I.5.4; I.2.6; J.2

  35. arXiv:2504.07158   

    cs.LG cs.CL

    Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models

    Authors: Ling Team, Caizhi Tang, Chilin Fu, Chunwei Wu, Jia Guo, Jianwen Wang, Jingyu Hu, Liang Jiang, Meng Li, Peng Jiao, Pingping Liu, Shaomian Zheng, Shiwei Liang, Shuaicheng Li, Yalin Zhang, Yingting Wu, Yongkang Liu, Zhenyu Huang

    Abstract: This technical report presents Ring-Lite-Distill, a lightweight reasoning model derived from our open-source Mixture-of-Experts (MoE) Large Language Models (LLMs) Ling-Lite. This study demonstrates that through meticulous high-quality data curation and ingenious training paradigms, the compact MoE model Ling-Lite can be further trained to achieve exceptional reasoning capabilities, while maintaini…

    Submitted 10 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: Based on the further discussion of the working group, the current version is deemed unsuitable for release. We are currently undertaking further work that is expected to involve significant revisions, but this process will require some additional time. We plan to proceed with the release once these updates have been fully implemented

  36. Probable evidence for a transient mega-electron volt emission line in the GRB 221023A

    Authors: Lu-Yao Jiang, Yun Wang, Yu-Jia Wei, Da-Ming Wei, Xiang Li, Hao-Ning He, Jia Ren, Zhao-Qiang Shen, Zhi-Ping Jin

    Abstract: Detection of spectral lines in gamma-ray bursts (GRBs) is important for studying GRB physics, as it provides insights into the composition and physical conditions of the GRB environment. However, progress in detecting X-ray or gamma-ray emission and absorption lines in GRB spectra has been relatively slow; only the narrow emission line feature of about 10 MeV found in GRB 221009A has exhibited a s…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 20 pages, 5 figures, 3 tables. Publication in the Nature Communications

  37. arXiv:2504.05706  [pdf, other]

    cs.CV

    SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning

    Authors: Fida Mohammad Thoker, Letian Jiang, Chen Zhao, Piyush Bagad, Hazel Doughty, Bernard Ghanem, Cees G. M. Snoek

    Abstract: Continued advances in self-supervised learning have led to significant progress in video representation learning, offering a scalable alternative to supervised approaches by removing the need for manual annotations. Despite strong performance on standard action recognition benchmarks, video self-supervised learning methods are largely evaluated under narrow protocols, typically pretraining on Kine…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: Under Review

  38. arXiv:2504.04377  [pdf, other]

    cs.CL

    PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

    Authors: Priyanshu Kumar, Devansh Jain, Akhila Yerukola, Liwei Jiang, Himanshu Beniwal, Thomas Hartvigsen, Maarten Sap

    Abstract: Truly multilingual safety moderation efforts for Large Language Models (LLMs) have been hindered by a narrow focus on a small set of languages (e.g., English, Chinese) as well as a limited scope of safety definition, resulting in significant gaps in moderation capabilities. To bridge these gaps, we release POLYGUARD, a new state-of-the-art multilingual safety model for safeguarding LLM generations…

    Submitted 6 April, 2025; originally announced April 2025.

  39. arXiv:2504.04099  [pdf, other]

    cs.CV cs.AI

    TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection

    Authors: Chunzhao Xie, Tongxuan Liu, Lei Jiang, Yuting Zeng, jinrong Guo, Yunheng Shen, Weizhe Huang, Jing Li, Xiaohua Xu

    Abstract: Large Vision-Language Models have demonstrated remarkable performance across various tasks; however, the challenge of hallucinations constrains their practical applications. The hallucination problem arises from multiple factors, including the inherent hallucinations in language models, the limitations of visual encoders in perception, and biases introduced by multimodal data. Extensive research h…

    Submitted 5 April, 2025; originally announced April 2025.

  40. arXiv:2504.03661  [pdf, other]

    cs.DC

    MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization

    Authors: Zongwu Wang, Peng Xu, Fangxin Liu, Yiwei Hu, Qingxiao Sun, Gezi Li, Cheng Li, Xuan Wang, Li Jiang, Haibing Guan

    Abstract: Large language models (LLMs) are increasingly utilized for complex tasks requiring longer context lengths, with some models supporting up to 128K or 1M tokens. This trend, however, presents significant challenges in inference speed and memory management. Quantization emerges as a promising approach to address the widening gap between LLM size and memory capacity. However, traditional quantization…

    Submitted 8 April, 2025; v1 submitted 12 March, 2025; originally announced April 2025.

    Comments: 7 pages, 7 figures and 4 tables

    ACM Class: I.2.0

  41. arXiv:2504.01523  [pdf, other]

    cs.SE

    Adapting Knowledge Prompt Tuning for Enhanced Automated Program Repair

    Authors: Xuemeng Cai, Lingxiao Jiang

    Abstract: Automated Program Repair (APR) aims to enhance software reliability by automatically generating bug-fixing patches. Recent work has improved the state-of-the-art of APR by fine-tuning pre-trained large language models (LLMs), such as CodeT5, for APR. However, the effectiveness of fine-tuning becomes weakened in data scarcity scenarios, and data scarcity can be a common issue in practice, limiting…

    Submitted 2 April, 2025; originally announced April 2025.

  42. arXiv:2504.00527  [pdf, other]

    cs.CV

    SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning

    Authors: Fida Mohammad Thoker, Letian Jiang, Chen Zhao, Bernard Ghanem

    Abstract: Masked video modeling, such as VideoMAE, is an effective paradigm for video self-supervised learning (SSL). However, they are primarily based on reconstructing pixel-level details on natural videos which have substantial temporal redundancy, limiting their capability for semantic representation and sufficient encoding of motion dynamics. To address these issues, this paper introduces a novel SSL a…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025

  43. arXiv:2504.00387  [pdf, other]

    cs.CV

    Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

    Authors: Zilong Huang, Jun He, Junyan Ye, Lihan Jiang, Weijia Li, Yiping Chen, Ting Han

    Abstract: The reconstruction of immersive and realistic 3D scenes holds significant practical importance in various fields of computer vision and computer graphics. Typically, immersive and realistic scenes should be free from obstructions by dynamic objects, maintain global texture consistency, and allow for unrestricted exploration. The current mainstream methods for image-driven scene construction involv…

    Submitted 20 April, 2025; v1 submitted 31 March, 2025; originally announced April 2025.

    Comments: CVPR 2025, 11 pages, 7 figures

  44. arXiv:2504.00296  [pdf, other]

    astro-ph.EP

    Dependence of Planet populations on Stellar Mass and Metallicity: A Pebble Accretion-based Planet Population Synthesis

    Authors: Mengrui Pan, Beibei Liu, Linjie Jiang, Jiwei Xie, Wei Zhu, Ignasi Ribas

    Abstract: The formation and evolution of planetary systems are linked to their host stellar environment. In this study, we employ a pebble accretion-based planet population synthesis model to explore the correlation between planetary properties and stellar mass/metallicity. Our numerical results reproduce several main aspects of exoplanetary observations. First, we find that the occurrence rate of super-Ear…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 15 pages, 7 figures, accepted by AJ

  45. arXiv:2503.23644  [pdf, other]

    cs.GR cs.AR cs.CV

    Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers

    Authors: Chaojian Li, Sixu Li, Linrui Jiang, Jingqun Zhang, Yingyan Celine Lin

    Abstract: Recent advancements in neural rendering technologies and their supporting devices have paved the way for immersive 3D experiences, significantly transforming human interaction with intelligent devices across diverse applications. However, achieving the desired real-time rendering speeds for immersive interactions is still hindered by (1) the lack of a universal algorithmic solution for different a…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: Accepted by HPCA'25

  46. arXiv:2503.23025  [pdf, other]

    cs.CG

    Simplification of Trajectory Streams

    Authors: Siu-Wing Cheng, Haoqiang Huang, Le Jiang

    Abstract: While there are software systems that simplify trajectory streams on the fly, few curve simplification algorithms with quality guarantees fit the streaming requirements. We present streaming algorithms for two such problems under the Fréchet distance $d_F$ in $\mathbb{R}^d$ for some constant $d \geq 2$. Consider a polygonal curve $τ$ in $\mathbb{R}^d$ in a stream. We present a streaming algorith…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: SoCG 2025

  47. arXiv:2503.22436  [pdf, other]

    cs.CV

    NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving

    Authors: Fuhao Li, Huan Jin, Bin Gao, Liaoyuan Fan, Lihui Jiang, Long Zeng

    Abstract: Multi-view 3D visual grounding is critical for autonomous driving vehicles to interpret natural languages and localize target objects in complex environments. However, existing datasets and methods suffer from coarse-grained language instructions, and inadequate integration of 3D geometric reasoning with linguistic comprehension. To this end, we introduce NuGrounding, the first large-scale benchma…

    Submitted 25 May, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  48. "Ignorance is Not Bliss": Designing Personalized Moderation to Address Ableist Hate on Social Media

    Authors: Sharon Heung, Lucy Jiang, Shiri Azenkot, Aditya Vashistha

    Abstract: Disabled people on social media often experience ableist hate and microaggressions. Prior work has shown that platform moderation often fails to remove ableist hate leaving disabled users exposed to harmful content. This paper examines how personalized moderation can safeguard users from viewing ableist comments. During interviews and focus groups with 23 disabled social media users, we presented…

    Submitted 27 March, 2025; originally announced March 2025.

  49. arXiv:2503.20822  [pdf, other]

    eess.IV cs.AI cs.GR

    Synthetic Video Enhances Physical Fidelity in Video Synthesis

    Authors: Qi Zhao, Xingyu Ni, Ziyu Wang, Feng Cheng, Ziyan Yang, Lu Jiang, Bohan Wang

    Abstract: We investigate how to enhance the physical fidelity of video generation models by leveraging synthetic videos derived from computer graphics pipelines. These rendered videos respect real-world physics, such as maintaining 3D consistency, and serve as a valuable resource that can potentially improve video generation models. To harness this potential, we propose a solution that curates and integrate…

    Submitted 25 March, 2025; originally announced March 2025.

  50. arXiv:2503.18016  [pdf, other]

    cs.CV

    Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook

    Authors: Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Lutao Jiang, Haiwei Xue, Bin Ren, Danda Paudel, Nicu Sebe, Luc Van Gool, Xuming Hu

    Abstract: Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI), particularly in enhancing the capabilities of large language models (LLMs) by enabling access to external, reliable, and up-to-date knowledge sources. In the context of AI-Generated Content (AIGC), RAG has proven invaluable by augmenting model outputs with supplementary, relevant information, t…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 19 pages, 10 figures