Showing 1–50 of 371 results for author: Jiang, T

Searching in archive cs.
  1. arXiv:2505.07515  [pdf, ps, other]

    cs.DS math.PR

    Improved Mixing of Critical Hardcore Model

    Authors: Zongchen Chen, Tianhui Jiang

    Abstract: The hardcore model is one of the most classic and widely studied examples of undirected graphical models. Given a graph $G$, the hardcore model describes a Gibbs distribution of $λ$-weighted independent sets of $G$. In the last two decades, a beautiful computational phase transition has been established at a precise threshold $λ_c(Δ)$ where $Δ$ denotes the maximum degree, where the task of samplin…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 28 pages, 0 figures

  2. arXiv:2505.02887  [pdf, ps, other]

    q-bio.BM cs.AI cs.LG

    CreoPep: A Universal Deep Learning Framework for Target-Specific Peptide Design and Optimization

    Authors: Cheng Ge, Han-Shen Tae, Zhenqiang Zhang, Lu Lu, Zhijie Huang, Yilin Wang, Tao Jiang, Wenqing Cai, Shan Chang, David J. Adams, Rilei Yu

    Abstract: Target-specific peptides, such as conotoxins, exhibit exceptional binding affinity and selectivity toward ion channels and receptors. However, their therapeutic potential remains underutilized due to the limited diversity of natural variants and the labor-intensive nature of traditional optimization strategies. Here, we present CreoPep, a deep learning-based conditional generative framework that i…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages, 6 figures

  3. arXiv:2504.10888  [pdf, other]

    cs.CV cs.AI

    CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors

    Authors: Jiahuan Long, Wen Yao, Tingsong Jiang, Chao Ma

    Abstract: Adversarial patches are widely used to evaluate the robustness of object detection systems in real-world scenarios. These patches were initially designed to deceive single-modal detectors (e.g., visible or infrared) and have recently been extended to target visible-infrared dual-modal detectors. However, existing dual-modal adversarial patch attacks have limited attack effectiveness across diverse…

    Submitted 15 April, 2025; originally announced April 2025.

  4. arXiv:2504.10479  [pdf, other]

    cs.CV

    InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

    Authors: Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, et al. (26 additional authors not shown)

    Abstract: We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. Rather than adapting a text-only large language model (LLM) into a multimodal large language model (MLLM) that supports visual inputs, InternVL3 jointly acquires multimodal and linguistic capabilities from both diverse multimodal data and pure-text corpora during a single p…

    Submitted 18 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: Technical Report

  5. arXiv:2504.10143  [pdf, other]

    cs.LG cs.CV

    Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning

    Authors: Yichao Cai, Yuhang Liu, Erdun Gao, Tianjiao Jiang, Zhen Zhang, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: Multimodal representation learning, exemplified by multimodal contrastive learning (MMCL) using image-text pairs, aims to learn powerful representations by aligning cues across modalities. This approach relies on the core assumption that the exemplar image-text pairs constitute two representations of an identical concept. However, recent research has revealed that real-world datasets often exhibit…

    Submitted 29 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  6. arXiv:2504.09361  [pdf, other]

    cs.CV

    PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking

    Authors: Jiahuan Long, Tingsong Jiang, Wen Yao, Shuai Jia, Weijia Zhang, Weien Zhou, Chao Ma, Xiaoqian Chen

    Abstract: Tracking multiple objects in a continuous video stream is crucial for many computer vision tasks. It involves detecting and associating objects with their respective identities across successive frames. Despite significant progress made in multiple object tracking (MOT), recent studies have revealed the vulnerability of existing MOT methods to adversarial attacks. Nevertheless, all of these attack…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: Accepted by ECCV 2024

  7. arXiv:2504.09153  [pdf, ps, other]

    cs.CR

    Secure Physical Layer Communications for Low-Altitude Economy Networking: A Survey

    Authors: Lingyi Cai, Jiacheng Wang, Ruichen Zhang, Yu Zhang, Tao Jiang, Dusit Niyato, Xianbin Wang, Abbas Jamalipour, Xuemin Shen

    Abstract: The Low-Altitude Economy Networking (LAENet) is emerging as a transformative paradigm that enables an integrated and sophisticated communication infrastructure to support aerial vehicles in carrying out a wide range of economic activities within low-altitude airspace. However, the physical layer communications in the LAENet face growing security threats due to inherent characteristics of aerial co…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: 31 pages, 11 figures, survey paper

  8. arXiv:2504.08915  [pdf, other]

    cs.CV cs.AI

    Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models

    Authors: Jiahuan Long, Tingsong Jiang, Wen Yao, Yizhe Xiong, Zhengqin Xu, Shuai Jia, Chao Ma

    Abstract: Vision foundation models (VFMs) are large pre-trained models that form the backbone of various vision tasks. Fine-tuning VFMs can further unlock their potential for downstream tasks or scenarios. However, VFMs often contain significant feature redundancy, which may limit their adaptability to new tasks. In this paper, we investigate the redundancies in the segment anything model (SAM) and then pro…

    Submitted 11 April, 2025; originally announced April 2025.

  9. arXiv:2504.08906  [pdf, other]

    cs.CV cs.AI

    Robust SAM: On the Adversarial Robustness of Vision Foundation Models

    Authors: Jiahuan Long, Zhengqin Xu, Tingsong Jiang, Wen Yao, Shuai Jia, Chao Ma, Xiaoqian Chen

    Abstract: The Segment Anything Model (SAM) is a widely used vision foundation model with diverse applications, including image segmentation, detection, and tracking. Given SAM's wide applications, understanding its robustness against adversarial attacks is crucial for real-world deployment. However, research on SAM's robustness is still in its early stages. Existing attacks often overlook the role of prompt…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Accepted by AAAI 2025

  10. arXiv:2504.05844  [pdf, other]

    cs.LG

    Adaptive Substructure-Aware Expert Model for Molecular Property Prediction

    Authors: Tianyi Jiang, Zeyu Wang, Shanqing Yu, Qi Xuan

    Abstract: Molecular property prediction is essential for applications such as drug discovery and toxicity assessment. While Graph Neural Networks (GNNs) have shown promising results by modeling molecules as molecular graphs, their reliance on data-driven learning limits their ability to generalize, particularly in the presence of data imbalance and diverse molecular substructures. Existing methods often ove…

    Submitted 8 April, 2025; originally announced April 2025.

  11. arXiv:2503.20589  [pdf, other]

    cs.SE

    What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond

    Authors: Wenchao Gu, Juntao Chen, Yanlin Wang, Tianyue Jiang, Xingzhe Li, Mingwei Liu, Xilin Liu, Yuchi Ma, Zibin Zheng

    Abstract: Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted, the effectiveness of different retrieved information sources (contextual code, APIs, and similar snippets) has not been rigorously analyzed. Through an empirical…

    Submitted 26 March, 2025; originally announced March 2025.

  12. arXiv:2503.14237  [pdf, other]

    cs.CV

    Make Your Training Flexible: Towards Deployment-Efficient Video Models

    Authors: Chenting Wang, Kunchang Li, Tianxiang Jiang, Xiangyu Zeng, Yi Wang, Limin Wang

    Abstract: Popular video training methods mainly operate on a fixed number of tokens sampled from a predetermined spatiotemporal grid, resulting in sub-optimal accuracy-computation trade-offs due to inherent video redundancy. They also lack adaptability to varying computational budgets for downstream tasks, hindering applications of the most competitive model in real-world scenes. We thus propose a new test…

    Submitted 18 March, 2025; originally announced March 2025.

  13. arXiv:2503.08471  [pdf, other]

    cs.CV

    TrackOcc: Camera-based 4D Panoptic Occupancy Tracking

    Authors: Zhuoguang Chen, Kenan Li, Xiuyu Yang, Tao Jiang, Yiming Li, Hang Zhao

    Abstract: Comprehensive and consistent dynamic scene understanding from camera input is essential for advanced autonomous systems. Traditional camera-based perception tasks like 3D object tracking and semantic occupancy prediction lack either spatial comprehensiveness or temporal consistency. In this work, we introduce a brand-new task, Camera-based 4D Panoptic Occupancy Tracking, which simultaneously addre…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Accepted at ICRA 2025

  14. arXiv:2503.06863  [pdf, other]

    cs.RO cs.CV

    HIF: Height Interval Filtering for Efficient Dynamic Points Removal

    Authors: Shufang Zhang, Tao Jiang, Jiazheng Wu, Ziyu Meng, Ziyang Zhang, Shan An

    Abstract: 3D point cloud mapping plays an essential role in localization and autonomous navigation. However, dynamic objects often leave residual traces during the map construction process, which undermine the performance of subsequent tasks. Therefore, dynamic object removal has become a critical challenge in point-cloud-based map construction within dynamic scenarios. Existing approaches, however, often in…

    Submitted 9 March, 2025; originally announced March 2025.

  15. arXiv:2503.01116  [pdf, other]

    eess.SP cs.LG

    Large AI Model for Delay-Doppler Domain Channel Prediction in 6G OTFS-Based Vehicular Networks

    Authors: Jianzhe Xue, Dongcheng Yuan, Zhanxi Ma, Tiankai Jiang, Yu Sun, Haibo Zhou, Xuemin Shen

    Abstract: Channel prediction is crucial for high-mobility vehicular networks, as it enables the anticipation of future channel conditions and the proactive adjustment of communication strategies. However, achieving accurate vehicular channel prediction is challenging due to significant Doppler effects and rapid channel variations resulting from high-speed vehicle movement and complex propagation environment…

    Submitted 8 May, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

    Comments: This manuscript has been accepted by SCIENCE CHINA Information Sciences

  16. arXiv:2502.15286  [pdf]

    cs.CV

    Soybean pod and seed counting in both outdoor fields and indoor laboratories using unions of deep neural networks

    Authors: Tianyou Jiang, Mingshun Shao, Tianyi Zhang, Xiaoyu Liu, Qun Yu

    Abstract: Automatically counting soybean pods and seeds in outdoor fields allows for rapid yield estimation before harvesting, while indoor laboratory counting offers greater accuracy. Both methods can significantly accelerate the breeding process. However, it remains challenging to accurately count pods and seeds in outdoor fields, and there are still no sufficiently accurate tools for counting pods and seeds in…

    Submitted 21 February, 2025; originally announced February 2025.

  17. arXiv:2502.14314  [pdf, other]

    cs.CV

    ODverse33: Is the New YOLO Version Always Better? A Multi Domain benchmark from YOLO v5 to v11

    Authors: Tianyou Jiang, Yang Zhong

    Abstract: You Only Look Once (YOLO) models have been widely used for building real-time object detectors across various domains. With the increasing frequency of new YOLO versions being released, key questions arise. Are the newer versions always better than their previous versions? What are the core innovations in each YOLO version and how do these changes translate into real-world performance gains? In th…

    Submitted 11 April, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 20 pages, 4 figures, 7 tables

  18. arXiv:2502.13189  [pdf, other]

    cs.LG cs.AI cs.CL

    MoBA: Mixture of Block Attention for Long-Context LLMs

    Authors: Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu

    Abstract: Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms presents a prohibitive overhead. Existing approaches either impose strongly biased structures, such as sink or window attention which are task-specific, or…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: 15 pages

  19. arXiv:2502.12794  [pdf, other]

    cs.CR cs.CV cs.LG

    RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models

    Authors: Tanqiu Jiang, Changjiang Li, Fenglong Ma, Ting Wang

    Abstract: Differentially private diffusion models (DPDMs) harness the remarkable generative capabilities of diffusion models while enforcing differential privacy (DP) for sensitive data. However, existing DPDM training approaches often suffer from significant utility loss, large memory footprint, and expensive inference cost, impeding their practical uses. To overcome such limitations, we present RAPID: Ret…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: Published in ICLR 2025

  20. arXiv:2502.08905  [pdf, other]

    cs.CV

    DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation

    Authors: Tangyu Jiang, Haodi Wang, Chun Yuan

    Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods have been extensively researched for large language models in downstream tasks. Among the existing approaches, Low-Rank Adaptation (LoRA) has gained popularity for its streamlined design, which incorporates low-rank matrices into existing pre-trained models. Though effective, LoRA allocates every module an identical low-rank matrix, which…

    Submitted 12 February, 2025; originally announced February 2025.

  21. arXiv:2502.06432  [pdf, other]

    cs.CV cs.AI

    Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single-Image Denoising

    Authors: Huaqiu Li, Wang Zhang, Xiaowan Hu, Tao Jiang, Zikang Chen, Haoqian Wang

    Abstract: Many studies have concentrated on constructing supervised models utilizing paired datasets for image denoising, which proves to be expensive and time-consuming. Current self-supervised and unsupervised approaches typically rely on blind-spot networks or sub-image pairs sampling, resulting in pixel information loss and destruction of detailed structural information, thereby significantly constraini…

    Submitted 13 March, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  22. arXiv:2502.06087  [pdf, other]

    cs.CL

    ConMeC: A Dataset for Metonymy Resolution with Common Nouns

    Authors: Saptarshi Ghosh, Tianyu Jiang

    Abstract: Metonymy plays an important role in our daily communication. People naturally think about things using their most salient properties or commonly related concepts. For example, by saying "The bus decided to skip our stop today," we actually mean that the bus driver made the decision, not the bus. Prior work on metonymy resolution has mainly focused on named entities. However, metonymy involving com…

    Submitted 10 February, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

    Comments: NAACL 2025

  23. arXiv:2501.16591  [pdf]

    cs.LG cs.AI

    Applying Ensemble Models based on Graph Neural Network and Reinforcement Learning for Wind Power Forecasting

    Authors: Hongjin Song, Qianrun Chen, Tianqi Jiang, Yongfeng Li, Xusheng Li, Wenjun Xi, Songtao Huang

    Abstract: Accurately predicting the wind power output of a wind farm across various time scales utilizing Wind Power Forecasting (WPF) is a critical issue in wind power trading and utilization. The WPF problem remains unresolved due to numerous influencing variables, such as wind speed, temperature, latitude, and longitude. Furthermore, achieving high prediction accuracy is crucial for maintaining electric…

    Submitted 27 January, 2025; originally announced January 2025.

  24. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  25. arXiv:2501.14050  [pdf, other]

    cs.LG cs.AI cs.CR

    GraphRAG under Fire

    Authors: Jiacheng Liang, Yuhui Wang, Changjiang Li, Rongyi Zhu, Tanqiu Jiang, Neil Gong, Ting Wang

    Abstract: GraphRAG advances retrieval-augmented generation (RAG) by structuring external knowledge as multi-scale knowledge graphs, enabling language models to integrate both broad context and granular details in their generation. While GraphRAG has demonstrated success across domains, its security implications remain largely unexplored. To bridge this gap, this work examines GraphRAG's vulnerability to poi…

    Submitted 23 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: 13 pages

  26. arXiv:2501.12599  [pdf, other]

    cs.AI cs.LG

    Kimi k1.5: Scaling Reinforcement Learning with LLMs

    Authors: Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (69 additional authors not shown)

    Abstract: Language model pretraining with next token prediction has proved effective for scaling compute but is limited to the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior pu…

    Submitted 4 March, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: 25 pages

  27. arXiv:2501.11347  [pdf, other]

    cs.CV

    EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery

    Authors: Guankun Wang, Long Bai, Junyi Wang, Kun Yuan, Zhen Li, Tianxu Jiang, Xiting He, Jinlin Wu, Zhen Chen, Zhen Lei, Hongbin Liu, Jiazheng Wang, Fan Zhang, Nicolas Padoy, Nassir Navab, Hongliang Ren

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense potential in computer-aided diagnosis and decision-making. In the context of robotic-assisted surgery, MLLMs can serve as effective tools for surgical training and guidance. However, there is still a lack of MLLMs specialized for surgical scene understanding in clinical applications. In this work, we introduce EndoC…

    Submitted 14 March, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

  28. arXiv:2501.09431  [pdf, other]

    cs.AI cs.CL cs.CR cs.CY

    A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

    Authors: Huandong Wang, Wenjie Fu, Yingzhou Tang, Zhilong Chen, Yuxi Huang, Jinghua Piao, Chen Gao, Fengli Xu, Tao Jiang, Yong Li

    Abstract: While large language models (LLMs) present significant potential for supporting numerous real-world applications and delivering positive social impacts, they still face significant challenges in terms of the inherent risk of privacy leakage, hallucinated outputs, and value misalignment, and can be maliciously used for generating toxic content and unethical purposes after being jailbroken. Therefore…

    Submitted 16 January, 2025; originally announced January 2025.

  29. arXiv:2501.03635  [pdf, other]

    cs.LG cs.AI

    MHGNet: Multi-Heterogeneous Graph Neural Network for Traffic Prediction

    Authors: Mei Wu, Yiqian Lin, Tianfan Jiang, Wenchao Weng

    Abstract: In recent years, traffic flow prediction has played a crucial role in the management of intelligent transportation systems. However, traditional forecasting methods often model non-Euclidean low-dimensional traffic data as a simple graph with single-type nodes and edges, failing to capture similar trends among nodes of the same type. To address this limitation, this paper proposes MHGNet, a novel…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: Accepted by 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

  30. arXiv:2501.02771  [pdf, other]

    cs.CV

    WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation

    Authors: Tianjian Jiang, Johsan Billingham, Sebastian Müksch, Juan Zarate, Nicolas Evans, Martin R. Oswald, Marc Pollefeys, Otmar Hilliges, Manuel Kaufmann, Jie Song

    Abstract: We present WorldPose, a novel dataset for advancing research in multi-person global pose estimation in the wild, featuring footage from the 2022 FIFA World Cup. While previous datasets have primarily focused on local poses, often limited to a single person or in constrained, indoor settings, the infrastructure deployed for this sporting event allows access to multiple fixed and moving cameras in d…

    Submitted 20 January, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

  31. arXiv:2412.11820  [pdf, other]

    cs.CV

    Spatiotemporal Blind-Spot Network with Calibrated Flow Alignment for Self-Supervised Video Denoising

    Authors: Zikang Chen, Tao Jiang, Xiaowan Hu, Wang Zhang, Huaqiu Li, Haoqian Wang

    Abstract: Self-supervised video denoising aims to remove noise from videos without relying on ground truth data, leveraging the video itself to recover clean frames. Existing methods often rely on simplistic feature stacking or apply optical flow without thorough analysis. This results in suboptimal utilization of both inter-frame and intra-frame information, and it also neglects the potential of optical fl…

    Submitted 16 December, 2024; originally announced December 2024.

  32. Wireless Environmental Information Theory: A New Paradigm towards 6G Online and Proactive Environment Intelligence Communication

    Authors: Jianhua Zhang, Li Yu, Shaoyi Liu, Yichen Cai, Yuxiang Zhang, Hongbo Xing, Tao Jiang

    Abstract: The channel is one of the five critical components of a communication system, and its ergodic capacity is based on all realizations of the statistical channel model. This statistical paradigm has successfully guided the design of mobile communication systems from 1G to 5G. However, this approach relies on offline channel measurements in specific environments, and the system passively adapts to new envir…

    Submitted 16 December, 2024; originally announced December 2024.

  33. arXiv:2412.05271  [pdf, other]

    cs.CV

    Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

    Authors: Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, et al. (17 additional authors not shown)

    Abstract: We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series that builds upon InternVL 2.0, maintaining its core model architecture while introducing significant enhancements in training and testing strategies as well as data quality. In this work, we delve into the relationship between model scaling and performance, systematically exploring the performance trends in vision…

    Submitted 13 January, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

    Comments: Technical Report

  34. arXiv:2411.17764  [pdf, other]

    cs.RO cs.AI

    PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement

    Authors: Tewodros Ayalew, Xiao Zhang, Kevin Yuanbo Wu, Tianchong Jiang, Michael Maire, Matthew R. Walter

    Abstract: We present PROGRESSOR, a novel framework that learns a task-agnostic reward function from videos, enabling policy training through goal-conditioned reinforcement learning (RL) without manual supervision. Underlying this reward is an estimate of the distribution over task progress as a function of the current, initial, and goal observations that is learned in a self-supervised fashion. Crucially, P…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: 15 pages, 13 figures

  35. arXiv:2411.15430  [pdf]

    cs.IR cs.DL

    The Landscape of Data Reuse in Interactive Information Retrieval: Motivations, Sources, and Evaluation of Reusability

    Authors: Tianji Jiang, Wenqi Li, Jiqun Liu

    Abstract: Sharing and reusing research data can effectively reduce redundant efforts in data collection and curation, especially for small labs and research teams conducting human-centered system research, and enhance the replicability of evaluation experiments. Building a sustainable data reuse process and culture relies on frameworks that encompass policies, standards, roles, and responsibilities, all of…

    Submitted 22 November, 2024; originally announced November 2024.

  36. arXiv:2411.13885  [pdf]

    cs.RO

    Trajectory Tracking Using Frenet Coordinates with Deep Deterministic Policy Gradient

    Authors: Tongzhou Jiang, Lipeng Liu, Junyue Jiang, Tianyao Zheng, Yuhui Jin, Kunpeng Xu

    Abstract: This paper studies the application of the DDPG algorithm in trajectory-tracking tasks and proposes a trajectory-tracking control method combined with the Frenet coordinate system. By converting the vehicle's position and velocity information from the Cartesian coordinate system to the Frenet coordinate system, this method can more accurately describe the vehicle's deviation and travel distance relative to…

    Submitted 21 November, 2024; originally announced November 2024.

  37. arXiv:2411.13053  [pdf, other]

    cs.CV cs.AI cs.LG

    MEGL: Multimodal Explanation-Guided Learning

    Authors: Yifei Zhang, Tianxu Jiang, Bo Pan, Jingyu Wang, Guangji Bai, Liang Zhao

    Abstract: Explaining the decision-making processes of Artificial Intelligence (AI) models is crucial for addressing their "black box" nature, particularly in tasks like image classification. Traditional eXplainable AI (XAI) methods typically rely on unimodal explanations, either visual or textual, each with inherent limitations. Visual explanations highlight key regions but often lack rationale, while textu…

    Submitted 20 November, 2024; originally announced November 2024.

  38. arXiv:2410.21727  [pdf, ps, other]

    cs.DS

    Edge Arrival Online Matching: The Power of Free Disposal on Acyclic Graphs

    Authors: Tianle Jiang, Yuhao Zhang

    Abstract: Online matching is a fundamental problem in the study of online algorithms. We study the problem under a very general arrival model: the edge arrival model. Free disposal is an important notion in the online matching literature, which allows the algorithm to dispose of the previously matched edges. Without free disposal, we cannot achieve any bounded ratio, even with randomized algorithms, when ed…

    Submitted 29 October, 2024; originally announced October 2024.

  39. arXiv:2410.19937  [pdf, other]

    cs.CR cs.AI cs.CL

    RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction

    Authors: Tanqiu Jiang, Zian Wang, Jiacheng Liang, Changjiang Li, Yuhui Wang, Ting Wang

    Abstract: Jailbreak attacks circumvent LLMs' built-in safeguards by concealing harmful queries within jailbreak prompts. While existing defenses primarily focus on mitigating the effects of jailbreak prompts, they often prove inadequate as jailbreak prompts can take arbitrary, adaptive forms. This paper presents RobustKV, a novel defense that adopts a fundamentally different approach by selectively removing…

    Submitted 25 October, 2024; originally announced October 2024.

  40. arXiv:2410.19702  [pdf, other]

    cs.CV cs.AI cs.MM

    TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

    Authors: Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in short video understanding. However, understanding long-form videos still remains challenging for MLLMs. This paper proposes TimeSuite, a collection of new designs to adapt the existing short-form video MLLMs for long video understanding, including a simple yet efficient framework to process long video sequence, a…

    Submitted 12 February, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

    Comments: Accepted by ICLR 2025

  41. arXiv:2410.16955  [pdf, other]

    cs.CV eess.IV

    PGCS: Physical Law embedded Generative Cloud Synthesis in Remote Sensing Images

    Authors: Liying Xu, Huifang Li, Huanfeng Shen, Mingyang Lei, Tao Jiang

    Abstract: Data quantity and quality are both critical for information extraction and analysis in remote sensing. However, current remote sensing datasets often fail to meet these two requirements, with clouds being a primary factor degrading data quantity and quality. This limitation affects the precision of results in remote sensing applications, particularly those derived from data-driven techn…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: 20 pages, 16 figures

  42. arXiv:2410.00057  [pdf, other]

    cs.LG

    STTM: A New Approach Based Spatial-Temporal Transformer And Memory Network For Real-time Pressure Signal In On-demand Food Delivery

    Authors: Jiang Wang, Haibin Wei, Xiaowei Xu, Jiacheng Shi, Jian Nie, Longzhi Du, Taixu Jiang

    Abstract: On-demand Food Delivery (OFD) services have become very common around the world. For example, on the Ele.me platform, users place more than 15 million food orders every day. Predicting the Real-time Pressure Signal (RPS) is crucial for OFD services, as it is primarily used to measure the current status of pressure on the logistics system. When RPS rises, the pressure increases, and the platform ne…

    Submitted 29 September, 2024; originally announced October 2024.

  43. arXiv:2409.18098  [pdf, other]

    cs.RO

    StackGen: Generating Stable Structures from Silhouettes via Diffusion

    Authors: Luzhe Sun, Takuma Yoneda, Samuel W. Wheeler, Tianchong Jiang, Matthew R. Walter

    Abstract: Humans naturally obtain intuition about the interactions between and the stability of rigid objects by observing and interacting with the world. It is this intuition that governs the way in which we regularly configure objects in our environment, allowing us to build complex structures from simple, everyday objects. Robotic agents, on the other hand, traditionally require an explicit model of the…

    Submitted 18 March, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

  44. arXiv:2409.15269  [pdf, other]

    cs.CV

    ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild

    Authors: Chen Guo, Tianjian Jiang, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song, Otmar Hilliges

    Abstract: While previous years have seen great progress in the 3D reconstruction of humans from monocular videos, few of the state-of-the-art methods are able to handle loose garments that exhibit large non-rigid surface deformations during articulation. This limits the application of such methods to humans that are dressed in standard pants or T-shirts. Our method, ReLoo, overcomes this limitation and reco…

    Submitted 28 September, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: Project page: https://moygcc.github.io/ReLoo/

  45. arXiv:2409.13430  [pdf, other]

    cs.CV cs.AI

    CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction

    Authors: Zhangchen Ye, Tao Jiang, Chenfeng Xu, Yiming Li, Hang Zhao

    Abstract: Vision-based 3D occupancy prediction is significantly challenged by the inherent limitations of monocular vision in depth estimation. This paper introduces CVT-Occ, a novel approach that leverages temporal fusion through the geometric correspondence of voxels over time to improve the accuracy of 3D occupancy predictions. By sampling points along the line of sight of each voxel and integrating the…

    Submitted 25 September, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted to ECCV 2024

  46. arXiv:2409.11663  [pdf, other]

    cs.CR cs.AI

    Training with Differential Privacy: A Gradient-Preserving Noise Reduction Approach with Provable Security

    Authors: Haodi Wang, Tangyu Jiang, Yu Guo, Chengjun Cai, Cong Wang, Xiaohua Jia

    Abstract: Deep learning models have been extensively adopted in various regions due to their ability to represent hierarchical features, which highly rely on the training set and procedures. Thus, protecting the training process and deep learning algorithms is paramount in privacy preservation. Although Differential Privacy (DP) as a powerful cryptographic primitive has achieved satisfying results in deep l…

    Submitted 11 March, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

  47. arXiv:2409.08750  [pdf, other]

    cs.RO

    DexSim2Real$^{2}$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation

    Authors: Taoran Jiang, Liqian Ma, Yixuan Guan, Jiaojiao Meng, Weihang Chen, Zecui Zeng, Lusong Li, Dan Wu, Jing Xu, Rui Chen

    Abstract: Articulated object manipulation is ubiquitous in daily life. In this paper, we present DexSim2Real$^{2}$, a novel robot learning framework for goal-conditioned articulated object manipulation using both two-finger grippers and multi-finger dexterous hands. The key of our framework is constructing an explicit world model of unseen articulated objects through active one-step interactions. This expli…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: Project Webpage: https://jiangtaoran.github.io/dexsim2real2_website/. arXiv admin note: text overlap with arXiv:2302.10693

  48. arXiv:2408.14493  [pdf]

    cs.LG eess.SY

    Extraction of Typical Operating Scenarios of New Power System Based on Deep Time Series Aggregation

    Authors: Zhaoyang Qu, Zhenming Zhang, Nan Qu, Yuguang Zhou, Yang Li, Tao Jiang, Min Li, Chao Long

    Abstract: Extracting typical operational scenarios is essential for making flexible decisions in the dispatch of a new power system. This study proposed a novel deep time series aggregation scheme (DTSAs) to generate typical operational scenarios, considering the large amount of historical operational snapshot data. Specifically, DTSAs analyze the intrinsic mechanisms of different scheduling operational sce…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by CAAI Transactions on Intelligence Technology

    Journal ref: CAAI Transactions on Intelligence Technology 10 (2025) 283-299

  49. arXiv:2408.14114  [pdf, other]

    cs.CV

    ShapeMamba-EM: Fine-Tuning Foundation Model with Local Shape Descriptors and Mamba Blocks for 3D EM Image Segmentation

    Authors: Ruohua Shi, Qiufan Pang, Lei Ma, Lingyu Duan, Tiejun Huang, Tingting Jiang

    Abstract: Electron microscopy (EM) imaging offers unparalleled resolution for analyzing neural tissues, crucial for uncovering the intricacies of synaptic connections and neural processes fundamental to understanding behavioral mechanisms. Recently, the foundation models have demonstrated impressive performance across numerous natural and medical image segmentation tasks. However, applying these foundation…

    Submitted 26 August, 2024; originally announced August 2024.

    Journal ref: MICCAI 2024

  50. arXiv:2408.08661  [pdf, other]

    cs.CL cs.CR cs.LG

    MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector

    Authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang

    Abstract: The increasing parameters and expansive dataset of large language models (LLMs) highlight the urgent demand for a technical solution to audit the underlying privacy risks and copyright issues associated with LLMs. Existing studies have partially addressed this need through an exploration of the pre-training data detection problem, which is an instance of a membership inference attack (MIA). This p…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: code and dataset: https://github.com/wjfu99/MIA-Tuner

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2025)