Showing 1–50 of 261 results for author: Jiang, P

  1. arXiv:2507.05121  [pdf, ps, other]

    cs.IT cs.AI cs.CV cs.LG

    LVM4CSI: Enabling Direct Application of Pre-Trained Large Vision Models for Wireless Channel Tasks

    Authors: Jiajia Guo, Peiwen Jiang, Chao-Kai Wen, Shi Jin, Jun Zhang

    Abstract: Accurate channel state information (CSI) is critical to the performance of wireless communication systems, especially with the increasing scale and complexity introduced by 5G and future 6G technologies. While artificial intelligence (AI) offers a promising approach to CSI acquisition and utilization, existing methods largely depend on task-specific neural networks (NNs) that require expert-driven…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: This work has been submitted for possible publication

  2. Reward Balancing Revisited: Enhancing Offline Reinforcement Learning for Recommender Systems

    Authors: Wenzheng Shu, Yanxiang Zeng, Yongxiang Tang, Teng Sha, Ning Luo, Yanhua Cheng, Xialong Liu, Fan Zhou, Peng Jiang

    Abstract: Offline reinforcement learning (RL) has emerged as a prevalent and effective methodology for real-world recommender systems, enabling learning policies from historical data and capturing user preferences. In offline RL, reward shaping encounters significant challenges, with past efforts to incorporate prior strategies for uncertainty to improve world models or penalize underexplored state-action p…

    Submitted 30 June, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted in Companion Proceedings of the ACM Web Conference 2025

  3. Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement

    Authors: Hao Jiang, Yongxiang Tang, Yanxiang Zeng, Pengjia Yuan, Yanhua Cheng, Teng Sha, Xialong Liu, Peng Jiang

    Abstract: In the realm of online advertising, advertisers partake in ad auctions to obtain advertising slots, frequently taking advantage of auto-bidding tools provided by demand-side platforms. To improve the automation of these bidding systems, we adopt generative models, namely the Decision Transformer (DT), to tackle the difficulties inherent in automated bidding. Applying the Decision Transformer to th…

    Submitted 27 June, 2025; originally announced June 2025.

  4. arXiv:2506.14809  [pdf]

    cs.HC cs.LG

    Impact of a Deployed LLM Survey Creation Tool through the IS Success Model

    Authors: Peng Jiang, Vinicius Cezar Monteiro de Lira, Antonio Maiorino

    Abstract: Surveys are a cornerstone of Information Systems (IS) research, yet creating high-quality surveys remains labor-intensive, requiring both domain expertise and methodological rigor. With the evolution of large language models (LLMs), new opportunities emerge to automate survey generation. This paper presents the real-world deployment of an LLM-powered system designed to accelerate data collection w…

    Submitted 3 June, 2025; originally announced June 2025.

    ACM Class: I.2; H.4

  5. arXiv:2506.11430  [pdf, ps, other]

    cs.CV

    Auto-Connect: Connectivity-Preserving RigFormer with Direct Preference Optimization

    Authors: Jingfeng Guo, Jian Liu, Jinnan Chen, Shiwei Mao, Changrong Hu, Puhua Jiang, Junlin Yu, Jing Xu, Qi Liu, Lixin Xu, Zhuo Chen, Chunchao Guo

    Abstract: We introduce Auto-Connect, a novel approach for automatic rigging that explicitly preserves skeletal connectivity through a connectivity-preserving tokenization scheme. Unlike previous methods that predict bone positions represented as two joints or first predict points before determining connectivity, our method employs special tokens to define endpoints for each joint's children and for each hie…

    Submitted 12 June, 2025; originally announced June 2025.

  6. arXiv:2506.08364  [pdf, ps, other]

    cs.CL

    CC-RAG: Structured Multi-Hop Reasoning via Theme-Based Causal Graphs

    Authors: Jash Rajesh Parekh, Pengcheng Jiang, Jiawei Han

    Abstract: Understanding cause and effect relationships remains a formidable challenge for Large Language Models (LLMs), particularly in specialized domains where reasoning requires more than surface-level correlations. Retrieval-Augmented Generation (RAG) improves factual accuracy, but standard RAG pipelines treat evidence as flat context, lacking the structure required to model true causal dependencies. We…

    Submitted 10 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  7. arXiv:2506.06979  [pdf]

    cs.NE

    Research on Aerodynamic Performance Prediction of Airfoils Based on a Fusion Algorithm of Transformer and GAN

    Authors: Maolin Yang, Yaohui Wang, Pingyu Jiang

    Abstract: Predicting airfoil aerodynamic performance is a key part of aircraft design optimization, but traditional methods (such as wind tunnel tests and CFD simulation) suffer from high cost and low efficiency, and existing data-driven models face the challenges of insufficient accuracy and strong data dependence in multi-objective prediction. Therefore, this study proposes a deep learn…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: 33 pages, 10 figures

  8. arXiv:2506.05384  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    Q-Ponder: A Unified Training Pipeline for Reasoning-based Visual Quality Assessment

    Authors: Zhuoxuan Cai, Jian Zhang, Xinbin Yuan, Peng-Tao Jiang, Wenxiang Chen, Bowen Tang, Lujian Yao, Qiyuan Wang, Jinwen Chen, Bo Li

    Abstract: Recent studies demonstrate that multimodal large language models (MLLMs) can proficiently evaluate visual quality through interpretable assessments. However, existing approaches typically treat quality scoring and reasoning descriptions as separate tasks with disjoint optimization objectives, leading to a trade-off: models adept at quality reasoning descriptions struggle with precise score regress…

    Submitted 12 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  9. arXiv:2506.04458  [pdf, ps, other]

    cs.CL

    Zero-Shot Open-Schema Entity Structure Discovery

    Authors: Xueqiang Xu, Jinfeng Xiao, James Barry, Mohab Elkaref, Jiaru Zou, Pengcheng Jiang, Yunyi Zhang, Max Giammona, Geeth de Mel, Jiawei Han

    Abstract: Entity structure extraction, which aims to extract entities and their associated attribute-value structures from text, is an essential task for text understanding and knowledge graph construction. Existing methods based on large language models (LLMs) typically rely heavily on predefined entity attribute schemas or annotated datasets, often leading to incomplete extraction results. To address thes…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 14 pages, 3 figures

  10. arXiv:2506.03542  [pdf, ps, other]

    cs.LG

    Learning Monotonic Probabilities with a Generative Cost Model

    Authors: Yongxiang Tang, Yanhua Cheng, Xiaocheng Liu, Chenchen Jiao, Yanxiang Zeng, Ning Luo, Pengjia Yuan, Xialong Liu, Peng Jiang

    Abstract: In many machine learning tasks, it is often necessary for the relationship between input and output variables to be monotonic, including both strictly monotonic and implicitly monotonic relationships. Traditional methods for maintaining monotonicity mainly rely on construction or regularization techniques, whereas this paper shows that the issue of strict monotonic probability can be viewed as a p…

    Submitted 3 June, 2025; originally announced June 2025.

  11. arXiv:2505.24593  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis

    Authors: Junzhuo Li, Bo Wang, Xiuze Zhou, Peijie Jiang, Jia Liu, Xuming Hu

    Abstract: The interpretability of Mixture-of-Experts (MoE) models, especially those with heterogeneous designs, remains underexplored. Existing attribution methods for dense models fail to capture dynamic routing-expert interactions in sparse MoE architectures. To address this issue, we propose a cross-level attribution algorithm to analyze sparse MoE architectures (Qwen 1.5-MoE, OLMoE, Mixtral-8x7B) agains…

    Submitted 11 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: ACL 2025

  12. arXiv:2505.22977  [pdf, ps, other]

    cs.CV

    HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions

    Authors: Shuolin Xu, Siming Zheng, Ziyi Wang, HC Yu, Jinwei Chen, Huaqi Zhang, Bo Li, Peng-Tao Jiang

    Abstract: Recent advances in diffusion models have significantly improved conditional video generation, particularly in the pose-guided human image animation task. Although existing methods are capable of generating high-fidelity and time-consistent animation sequences in regular motions and static scenes, there are still obvious limitations when facing complex human body motions (Hypermotion) that contain…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 17 pages, 7 figures

  13. arXiv:2505.22831  [pdf, ps, other]

    cs.HC cs.AI

    Orca: Browsing at Scale Through User-Driven and AI-Facilitated Orchestration Across Malleable Webpages

    Authors: Peiling Jiang, Haijun Xia

    Abstract: Web-based activities are fundamentally distributed across webpages. However, conventional browsers with stacks of tabs fail to support operating and synthesizing large volumes of information across pages. While recent AI systems enable fully automated web browsing and information synthesis, they often diminish user agency and hinder contextual understanding. Therefore, we explore how AI could inst…

    Submitted 28 May, 2025; originally announced May 2025.

  14. arXiv:2505.22445  [pdf, other]

    cs.CV cs.AI

    NFR: Neural Feature-Guided Non-Rigid Shape Registration

    Authors: Puhua Jiang, Zhangquan Chen, Mingze Sun, Ruqi Huang

    Abstract: In this paper, we propose a novel learning-based framework for 3D shape registration, which overcomes the challenges of significant non-rigid deformation and partiality undergoing among input shapes, and, remarkably, requires no correspondence annotation during training. Our key insight is to incorporate neural features learned by deep learning-based shape matching networks into an iterative, geom…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 20 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:2311.04494

    ACM Class: I.4.m; I.2.6

  15. arXiv:2505.22153  [pdf, ps, other]

    cs.IR

    Personalized Tree based progressive regression model for watch-time prediction in short video recommendation

    Authors: Xiaokai Chen, Xiao Lin, Changcheng Li, Peng Jiang

    Abstract: In online video platforms, accurate watch time prediction has become a fundamental and challenging problem in video recommendation. Previous research has revealed that the accuracy of watch time prediction highly depends on both the transformation of watch-time labels and the decomposition of the estimation process. TPM (Tree based Progressive Regression Model) achieves State-of-the-Art performanc…

    Submitted 28 May, 2025; originally announced May 2025.

  16. arXiv:2505.21593  [pdf, ps, other]

    cs.CV cs.AI

    Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion

    Authors: Yang Yang, Siming Zheng, Jinwei Chen, Boxi Wu, Xiaofei He, Deng Cai, Bo Li, Peng-Tao Jiang

    Abstract: Recent advances in diffusion based editing models have enabled realistic camera simulation and image-based bokeh, but video bokeh remains largely unexplored. Existing video editing models cannot explicitly control focus planes or adjust bokeh intensity, limiting their applicability for controllable optical effects. Moreover, naively extending image-based bokeh methods to video often results in tem…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: project page: https://vivocameraresearch.github.io/any2bokeh/

  17. arXiv:2505.21325  [pdf, ps, other]

    cs.CV

    MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on

    Authors: Guangyuan Li, Siming Zheng, Hao Zhang, Jinwei Chen, Junsheng Luan, Binkai Ou, Lei Zhao, Bo Li, Peng-Tao Jiang

    Abstract: Video Virtual Try-On (VVT) aims to simulate the natural appearance of garments across consecutive video frames, capturing their dynamic variations and interactions with human body motion. However, current VVT methods still face challenges in terms of spatiotemporal consistency and garment content preservation. First, they use diffusion models based on the U-Net, which are limited in their expressi…

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  18. arXiv:2505.20926  [pdf]

    cs.RO eess.SY

    COM Adjustment Mechanism Control for Multi-Configuration Motion Stability of Unmanned Deformable Vehicle

    Authors: Jun Liu, Hongxun Liu, Cheng Zhang, Jiandang Xing, Shang Jiang, Ping Jiang

    Abstract: An unmanned deformable vehicle is a wheel-legged robot transforming between two configurations: vehicular and humanoid states, with different motion modes and stability characteristics. To address motion stability in multiple configurations, a center-of-mass adjustment mechanism was designed. Further, a motion stability hierarchical control algorithm was proposed, and an electromechanical model ba…

    Submitted 27 May, 2025; originally announced May 2025.

  19. arXiv:2505.20655  [pdf, ps, other]

    cs.CV

    Photography Perspective Composition: Towards Aesthetic Perspective Recommendation

    Authors: Lujian Yao, Siming Zheng, Xinbin Yuan, Zhuoxuan Cai, Pu Wu, Jinwei Chen, Bo Li, Peng-Tao Jiang

    Abstract: Traditional photography composition approaches are dominated by 2D cropping-based methods. However, these methods fall short when scenes contain poorly arranged subjects. Professional photographers often employ perspective adjustment as a form of 3D recomposition, modifying the projected 2D relationships between subjects while maintaining their actual spatial positions to achieve better compositio…

    Submitted 26 May, 2025; originally announced May 2025.

  20. arXiv:2505.19505  [pdf, other]

    cs.IR cs.AI

    Hierarchical Tree Search-based User Lifelong Behavior Modeling on Large Language Model

    Authors: Yu Xia, Rui Zhong, Hao Gu, Wei Yang, Chi Lu, Peng Jiang, Kun Gai

    Abstract: Large Language Models (LLMs) have garnered significant attention in Recommendation Systems (RS) due to their extensive world knowledge and robust reasoning capabilities. However, a critical challenge lies in enabling LLMs to effectively comprehend and extract insights from massive user behaviors. Current approaches that directly leverage LLMs for user interest learning face limitations in handling…

    Submitted 26 May, 2025; originally announced May 2025.

  21. arXiv:2505.18078  [pdf, ps, other]

    cs.CV

    DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation

    Authors: Junhao Chen, Mingjin Chen, Jianjin Xu, Xiang Li, Junting Dong, Mingze Sun, Puhua Jiang, Hongxiang Li, Yuhang Yang, Hao Zhao, Xiaoxiao Long, Ruqi Huang

    Abstract: Controllable video generation (CVG) has advanced rapidly, yet current systems falter when more than one actor must move, interact, and exchange positions under noisy control signals. We address this gap with DanceTogether, the first end-to-end diffusion framework that turns a single reference image plus independent pose-mask streams into long, photorealistic videos while strictly preserving every…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: Our video demos and code are available at https://DanceTog.github.io/

  22. arXiv:2505.17621  [pdf, ps, other]

    cs.LG

    Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration

    Authors: Jingtong Gao, Ling Pan, Yejing Wang, Rui Zhong, Chi Lu, Qingpeng Cai, Peng Jiang, Xiangyu Zhao

    Abstract: Reinforcement learning (RL) has emerged as a pivotal method for improving the reasoning capabilities of Large Language Models (LLMs). However, prevalent RL approaches such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO) face critical limitations due to their reliance on sparse outcome-based rewards and inadequate mechanisms for incentivizing exploration. Thes…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  23. arXiv:2505.16643  [pdf, other]

    cs.CV cs.AI

    From Evaluation to Defense: Advancing Safety in Video Large Language Models

    Authors: Yiwei Sun, Peiqi Jiang, Chuanbin Liu, Luohao Lin, Zhiying Lu, Hongtao Xie

    Abstract: While the safety risks of image-based large language models have been extensively studied, their video-based counterparts (Video LLMs) remain critically under-examined. To systematically study this problem, we introduce VideoSafetyBench (VSB-77k), the first large-scale, culturally diverse benchmark for Video LLM safety, which comprises 77,646 video-query pairs and spans 19 principal ri…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 49 pages, 12 figures, 17 tables

  24. arXiv:2505.16097  [pdf, ps, other]

    cs.AI

    TrialPanorama: Database and Benchmark for Systematic Review and Design of Clinical Trials

    Authors: Zifeng Wang, Qiao Jin, Jiacheng Lin, Junyi Gao, Jathurshan Pradeepkumar, Pengcheng Jiang, Benjamin Danek, Zhiyong Lu, Jimeng Sun

    Abstract: Developing artificial intelligence (AI) for vertical domains requires a solid data foundation for both training and evaluation. In this work, we introduce TrialPanorama, a large-scale, structured database comprising 1,657,476 clinical trial records aggregated from 15 global sources. The database captures key aspects of trial design and execution, including trial setups, interventions, conditions,…

    Submitted 21 May, 2025; originally announced May 2025.

  25. arXiv:2505.14146  [pdf, ps, other]

    cs.AI cs.CL

    s3: You Don't Need That Much Data to Train a Search Agent via RL

    Authors: Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang, Jimeng Sun, Jiawei Han

    Abstract: Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., ND…

    Submitted 20 May, 2025; originally announced May 2025.

  26. arXiv:2505.12370  [pdf, other]

    cs.AI

    Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning

    Authors: Xinbin Yuan, Jian Zhang, Kaixin Li, Zhuoxuan Cai, Lujian Yao, Jie Chen, Enguang Wang, Qibin Hou, Jinwei Chen, Peng-Tao Jiang, Bo Li

    Abstract: Graphical User Interface (GUI) agents have made substantial strides in understanding and executing user instructions across diverse platforms. Yet, grounding these instructions to precise interface elements remains challenging, especially in complex, high-resolution, professional environments. Traditional supervised finetuning (SFT) methods often require large volumes of diverse data and exhibit w…

    Submitted 23 May, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

  27. arXiv:2505.10648  [pdf]

    cs.HC

    Generative Muscle Stimulation: Physical Assistance by Constraining Multimodal-AI with Biomechanical Knowledge

    Authors: Yun Ho, Romain Nith, Peili Jiang, Shan-Yuan Teng, Pedro Lopes

    Abstract: Decades of interactive electrical-muscle-stimulation (EMS) revealed its promise as a wearable interface for physical assistance: EMS directly demonstrates movements through the user's body (e.g., shaking a spray-can before painting). However, interactive EMS-systems are highly specialized because their feedback is (1) fixed (e.g., one program executes spray-can instructions, another executes piano…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 13 pages, 19 figures

  28. Massive MIMO-OFDM Channel Acquisition with Time-Frequency Phase-Shifted Pilots

    Authors: Jinke Tang, Xiqi Gao, Li You, Ding Shi, Jiyuan Yang, Xiang-Gen Xia, Xinwei Zhao, Peigang Jiang

    Abstract: In this paper, we propose a channel acquisition approach with time-frequency phase-shifted pilots (TFPSPs) for massive multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We first present a triple-beam (TB) based channel tensor model, allowing for the representation of the space-frequency-time (SFT) domain channel as the product of beam matrices and the TB doma…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 15 pages, 10 figures. Accepted for publication in IEEE Transactions on Communications

    Journal ref: IEEE Transactions on Communications, vol. 73, no. 6, pp. 4520-4535, Jun. 2025

  29. arXiv:2505.02043  [pdf, other]

    cs.CV

    Point2Primitive: CAD Reconstruction from Point Cloud by Direct Primitive Prediction

    Authors: Cheng Wang, Xinzhu Ma, Bin Wang, Shixiang Tang, Yuan Meng, Ping Jiang

    Abstract: Recovering CAD models from point clouds, especially the sketch-extrusion process, can be seen as the process of rebuilding the topology and extrusion primitives. Previous methods utilize implicit fields for sketch representation, leading to shape reconstruction of curved edges. In this paper, we propose a CAD reconstruction network that produces editable CAD models from input point clouds (Point2…

    Submitted 20 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

  30. arXiv:2504.19736  [pdf, other]

    cs.RO

    UTTG_ A Universal Teleoperation Approach via Online Trajectory Generation

    Authors: Shengjian Fang, Yixuan Zhou, Yu Zheng, Pengyu Jiang, Siyuan Liu, Hesheng Wang

    Abstract: Teleoperation is crucial for hazardous environment operations and serves as a key tool for collecting expert demonstrations in robot learning. However, existing methods face robotic hardware dependency and control frequency mismatches between teleoperation devices and robotic platforms. Our approach automatically extracts kinematic parameters from unified robot description format (URDF) files, and…

    Submitted 28 April, 2025; originally announced April 2025.

  31. arXiv:2504.19566  [pdf, other]

    cs.CR

    Metadata-private Messaging without Coordination

    Authors: Peipei Jiang, Yihao Wu, Lei Xu, Wentao Dong, Peiyuan Chen, Yulong Ming, Cong Wang, Xiaohua Jia, Qian Wang

    Abstract: For those seeking end-to-end private communication free from pervasive metadata tracking and censorship, the Tor network has been the de-facto choice in practice, despite its susceptibility to traffic analysis attacks. Recently, numerous metadata-private messaging proposals have emerged with the aim to surpass Tor in the messaging context by obscuring the relationships between any two messaging bu…

    Submitted 28 April, 2025; originally announced April 2025.

  32. arXiv:2504.15756  [pdf, other]

    cs.CV eess.IV

    DSDNet: Raw Domain Demoiréing via Dual Color-Space Synergy

    Authors: Qirui Yang, Fangpu Zhang, Yeying Jin, Qihua Cheng, Pengtao Jiang, Huanjing Yue, Jingyu Yang

    Abstract: With the rapid advancement of mobile imaging, capturing screens using smartphones has become a prevalent practice in distance learning and conference recording. However, moiré artifacts, caused by frequency aliasing between display screens and camera sensors, are further amplified by the image signal processing pipeline, leading to severe visual degradation. Existing sRGB domain demoiréing methods…

    Submitted 22 April, 2025; originally announced April 2025.

  33. arXiv:2504.14587  [pdf, other]

    cs.LG cs.IR

    Generative Auto-Bidding with Value-Guided Explorations

    Authors: Jingtong Gao, Yewen Li, Shuai Mao, Peng Jiang, Nan Jiang, Yejing Wang, Qingpeng Cai, Fei Pan, Peng Jiang, Kun Gai, Bo An, Xiangyu Zhao

    Abstract: Auto-bidding, with its strong capability to optimize bidding decisions within dynamic and competitive online environments, has become a pivotal strategy for advertising platforms. Existing approaches typically employ rule-based strategies or Reinforcement Learning (RL) techniques. However, rule-based strategies lack the flexibility to adapt to time-varying market conditions, and RL-based methods s…

    Submitted 25 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  34. arXiv:2504.06780  [pdf, ps, other]

    cs.IR

    CHIME: A Compressive Framework for Holistic Interest Modeling

    Authors: Yong Bai, Rui Xiang, Kaiyuan Li, Yongxiang Tang, Yanhua Cheng, Xialong Liu, Peng Jiang, Kun Gai

    Abstract: Modeling holistic user interests is important for improving recommendation systems but is challenged by high computational cost and difficulty in handling diverse information with full behavior context. Existing search-based methods might lose critical signals during behavior selection. To overcome these limitations, we propose CHIME: A Compressive Framework for Holistic Interest Modeling. It uses…

    Submitted 9 April, 2025; originally announced April 2025.

  35. arXiv:2504.06636  [pdf, other]

    cs.IR

    BBQRec: Behavior-Bind Quantization for Multi-Modal Sequential Recommendation

    Authors: Kaiyuan Li, Rui Xiang, Yong Bai, Yongxiang Tang, Yanhua Cheng, Xialong Liu, Peng Jiang, Kun Gai

    Abstract: Multi-modal sequential recommendation systems leverage auxiliary signals (e.g., text, images) to alleviate data sparsity in user-item interactions. While recent methods exploit large language models to encode modalities into discrete semantic IDs for autoregressive prediction, we identify two critical limitations: (1) Existing approaches adopt fragmented quantization, where modalities are independ…

    Submitted 9 April, 2025; originally announced April 2025.

  36. arXiv:2504.02509  [pdf]

    cs.AI cs.RO

    A Memory-Augmented LLM-Driven Method for Autonomous Merging of 3D Printing Work Orders

    Authors: Yuhao Liu, Maolin Yang, Pingyu Jiang

    Abstract: With the rapid development of 3D printing, the demand for personalized and customized production on the manufacturing line is steadily increasing. Efficient merging of printing workpieces can significantly enhance the processing efficiency of the production line. Addressing the challenge, a Large Language Model (LLM)-driven method is established in this paper for the autonomous merging of 3D print…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 6 pages, 5 figures

  37. arXiv:2503.22214  [pdf, other]

    cs.LG

    Interpretable Deep Learning Paradigm for Airborne Transient Electromagnetic Inversion

    Authors: Shuang Wang, Xuben Wang, Fei Deng, Xiaodong Yu, Peifan Jiang, Lifeng Mao

    Abstract: The extraction of geoelectric structural information from airborne transient electromagnetic (ATEM) data primarily involves data processing and inversion. Conventional methods rely on empirical parameter selection, making it difficult to process complex field data with high noise levels. Additionally, inversion computations are time-consuming and often suffer from multiple local minima. Existing dee…

    Submitted 28 March, 2025; originally announced March 2025.

  38. arXiv:2503.21791  [pdf, other]

    physics.geo-ph cs.LG

    SeisRDT: Latent Diffusion Model Based On Representation Learning For Seismic Data Interpolation And Reconstruction

    Authors: Shuang Wang, Fei Deng, Peifan Jiang, Zezheng Ni, Bin Wang

    Abstract: Due to limitations such as geographic, physical, or economic factors, collected seismic data often have missing traces. Traditional seismic data reconstruction methods face the challenge of selecting numerous empirical parameters and struggle to handle large-scale continuous missing traces. With the advancement of deep learning, various diffusion models have demonstrated strong reconstruction capa…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: Submitted to Geophysics

  39. arXiv:2503.20031  [pdf, other]

    astro-ph.IM cs.CE

    Lossy Compression of Scientific Data: Applications Constrains and Requirements

    Authors: Franck Cappello, Allison Baker, Ebru Bozda, Martin Burtscher, Kyle Chard, Sheng Di, Paul Christopher O Grady, Peng Jiang, Shaomeng Li, Erik Lindahl, Peter Lindstrom, Magnus Lundborg, Kai Zhao, Xin Liang, Masaru Nagaso, Kento Sato, Amarjit Singh, Seung Woo Son, Dingwen Tao, Jiannan Tian, Robert Underwood, Kazutomo Yoshii, Danylo Lykov, Yuri Alexeev, Kyle Gerard Felker

    Abstract: Increasing data volumes from scientific simulations and instruments (supercomputers, accelerators, telescopes) often exceed network, storage, and analysis capabilities. The scientific community's response to this challenge is scientific data reduction. Reduction can take many forms, such as triggering, sampling, filtering, quantization, and dimensionality reduction. This report focuses on a specif…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 33 pages

  40. arXiv:2503.17672  [pdf, other]

    cs.CV

    A Temporal Modeling Framework for Video Pre-Training on Video Instance Segmentation

    Authors: Qing Zhong, Peng-Tao Jiang, Wen Wang, Guodong Ding, Lin Wu, Kaiqi Huang

    Abstract: Contemporary Video Instance Segmentation (VIS) methods typically adhere to a pre-train then fine-tune regime, where a segmentation model trained on images is fine-tuned on videos. However, the lack of temporal knowledge in the pre-trained model introduces a domain gap which may adversely affect the VIS performance. To effectively bridge this gap, we present a novel video pre-training approach to e…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: 7 pages, 5 figures, 6 tables, Accepted to ICME 2025

  41. arXiv:2503.16254  [pdf, other]

    cs.CV

    M2N2V2: Multi-Modal Unsupervised and Training-free Interactive Segmentation

    Authors: Markus Karmann, Peng-Tao Jiang, Bo Li, Onay Urfalioglu

    Abstract: We present Markov Map Nearest Neighbor V2 (M2N2V2), a novel and simple, yet effective approach which leverages depth guidance and attention maps for unsupervised and training-free point-prompt-based interactive segmentation. Following recent trends in supervised multimodal approaches, we carefully integrate depth as an additional modality to create novel depth-guided Markov-maps. Furthermore, we o…

    Submitted 20 March, 2025; originally announced March 2025.

  42. arXiv:2503.09492  [pdf, ps, other]

    cs.IR cs.LG

    Learning Cascade Ranking as One Network

    Authors: Yunli Wang, Zhen Zhang, Zhiqiang Wang, Zixuan Yang, Yu Li, Jian Yang, Shiyang Wen, Peng Jiang, Kun Gai

    Abstract: Cascade Ranking is a prevalent architecture in large-scale top-k selection systems like recommendation and advertising platforms. Traditional training methods focus on single-stage optimization, neglecting interactions between stages. Recent advances have introduced interaction-aware training paradigms, but still struggle to 1) align training objectives with the goal of the entire cascade ranking…

    Submitted 4 June, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted by ICML 2025

  43. arXiv:2503.04084  [pdf, other]

    cs.HC

    Generative and Malleable User Interfaces with Generative and Evolving Task-Driven Data Model

    Authors: Yining Cao, Peiling Jiang, Haijun Xia

    Abstract: Unlike static and rigid user interfaces, generative and malleable user interfaces offer the potential to respond to diverse users' goals and tasks. However, current approaches primarily rely on generating code, making it difficult for end-users to iteratively tailor the generated interface to their evolving needs. We propose employing task-driven data models-representing the essential information…

    Submitted 5 March, 2025; originally announced March 2025.

  44. arXiv:2503.00223  [pdf, other]

    cs.IR

    DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning

    Authors: Pengcheng Jiang, Jiacheng Lin, Lang Cao, Runchu Tian, SeongKu Kang, Zifeng Wang, Jimeng Sun, Jiawei Han

    Abstract: Information retrieval systems are crucial for enabling effective access to large document collections. Recent approaches have leveraged Large Language Models (LLMs) to enhance retrieval performance through query augmentation, but often rely on expensive supervised learning or distillation techniques that require significant computational resources and hand-labeled data. We introduce DeepRetrieval,…

    Submitted 11 April, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

  45. arXiv:2502.12520  [pdf, other]

    cs.CV

    SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning

    Authors: Junkai Chen, Zhijie Deng, Kening Zheng, Yibo Yan, Shuliang Liu, PeiJun Wu, Peijie Jiang, Jia Liu, Xuming Hu

    Abstract: As Multimodal Large Language Models (MLLMs) develop, their potential security issues have become increasingly prominent. Machine Unlearning (MU), as an effective strategy for forgetting specific knowledge in training data, has been widely used in privacy protection. However, MU for safety in MLLM has yet to be fully explored. To address this issue, we propose SAFEERASER, a safety unlearning benchm…

    Submitted 24 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  46. arXiv:2502.12448  [pdf, other]

    cs.IR

    From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval

    Authors: Jian Jia, Jingtong Gao, Ben Xue, Junhao Wang, Qingpeng Cai, Quan Chen, Xiangyu Zhao, Peng Jiang, Kun Gai

    Abstract: Discrete tokenizers have emerged as indispensable components in modern machine learning systems, particularly within the context of autoregressive modeling and large language models (LLMs). These tokenizers serve as the critical interface that transforms raw, unstructured data from diverse modalities into discrete tokens, enabling LLMs to operate effectively across a wide range of tasks. Despite t…

    Submitted 17 February, 2025; originally announced February 2025.

  47. arXiv:2502.10996  [pdf, other]

    cs.CL

    RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation

    Authors: Pengcheng Jiang, Lang Cao, Ruike Zhu, Minhao Jiang, Yunyi Zhang, Jimeng Sun, Jiawei Han

    Abstract: Large language models (LLMs) have achieved impressive performance on knowledge-intensive tasks, yet they often struggle with multi-step reasoning due to the unstructured nature of retrieved context. While retrieval-augmented generation (RAG) methods provide external information, the lack of explicit organization among retrieved passages limits their effectiveness, leading to brittle reasoning path…

    Submitted 17 May, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: Under review

  48. arXiv:2502.05822  [pdf, other]

    cs.IR

    HCMRM: A High-Consistency Multimodal Relevance Model for Search Ads

    Authors: Guobing Gan, Kaiming Gao, Li Wang, Shen Jiang, Peng Jiang

    Abstract: Search advertising is essential for merchants to reach the target users on short video platforms. Short video ads aligned with user search intents are displayed through relevance matching and bid ranking mechanisms. This paper focuses on improving query-to-video relevance matching to enhance the effectiveness of ranking in ad systems. Recent vision-language pre-training models have demonstrated pr…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted by WWW 2025 (Industry Track)

  49. arXiv:2501.19274  [pdf, other]

    cs.RO

    GO: The Great Outdoors Multimodal Dataset

    Authors: Peng Jiang, Kasi Viswanath, Akhil Nagariya, George Chustz, Maggie Wigness, Philip Osteen, Timothy Overbye, Christian Ellis, Long Quang, Srikanth Saripalli

    Abstract: The Great Outdoors (GO) dataset is a multi-modal annotated data resource aimed at advancing ground robotics research in unstructured environments. This dataset provides the most comprehensive set of data modalities and annotations compared to existing off-road datasets. In total, the GO dataset includes six unique sensor types with high-quality semantic annotations and GPS traces to support tasks…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: 5 pages, 5 figures

  50. arXiv:2501.07212  [pdf, other]

    cs.IR

    Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer

    Authors: Chongming Gao, Kexin Huang, Ziang Fei, Jiaju Chen, Jiawei Chen, Jianshan Sun, Shuchang Liu, Qingpeng Cai, Peng Jiang

    Abstract: Securing long-term success is the ultimate aim of recommender systems, demanding strategies capable of foreseeing and shaping the impact of decisions on future user satisfaction. Current recommendation strategies grapple with two significant hurdles. Firstly, the future impacts of recommendation decisions remain obscured, rendering it impractical to evaluate them through direct optimization of imm…

    Submitted 13 January, 2025; originally announced January 2025.