
Showing 1–50 of 138 results for author: Gu, C

Searching in archive cs.
  1. arXiv:2505.06335  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients

    Authors: Jinsheng Yuan, Yuhang Hao, Weisi Guo, Yun Wu, Chongyan Gu

    Abstract: Federated Learning (FL) has the potential for simultaneous global learning amongst a large number of parallel agents, enabling emerging AI such as LLMs to be trained across demographically diverse data. Central to this being efficient is the ability for FL to perform sparse gradient updates and remote direct memory access at the central server. Most of the research in FL security focuses on protec…

    Submitted 9 May, 2025; originally announced May 2025.

  2. arXiv:2504.08856  [pdf, other]

    cs.CY cs.AI

    Examining GPT's Capability to Generate and Map Course Concepts and Their Relationship

    Authors: Tianyuan Yang, Ren Baofeng, Chenghao Gu, Tianjia He, Boxuan Ma, Shinichi Konomi

    Abstract: Extracting key concepts and their relationships from course information and materials facilitates the provision of visualizations and recommendations for learners who need to select the right courses to take from a large number of courses. However, identifying and extracting themes manually is labor-intensive and time-consuming. Previous machine learning-based methods to extract relevant concepts…

    Submitted 11 April, 2025; originally announced April 2025.

  3. arXiv:2503.18398  [pdf, ps, other]

    cs.GT

    Global Profits, Local Decisions: Why Global Cooperation Falters in Multi-level Games

    Authors: Jinhua Zhao, Xinguo Yu, Rui Ding, Cuiling Gu, Xianjia Wang

    Abstract: Global cooperation often falters despite shared objectives, as misaligned interests and unequal incentives undermine collective efforts, such as those in international climate change collaborations. To tackle this issue, this paper introduces a multi-level game-theoretic model to analyze the dynamics of complex interactions within hierarchical systems. The model consists of global, local, and pair…

    Submitted 24 March, 2025; originally announced March 2025.

    MSC Class: 91A22; 05C57; 91A43

  4. arXiv:2503.18328  [pdf, other]

    cs.CV

    TensoFlow: Tensorial Flow-based Sampler for Inverse Rendering

    Authors: Chun Gu, Xiaofei Wei, Li Zhang, Xiatian Zhu

    Abstract: Inverse rendering aims to recover scene geometry, material properties, and lighting from multi-view images. Given the complexity of light-surface interactions, importance sampling is essential for the evaluation of the rendering equation, as it reduces variance and enhances the efficiency of Monte Carlo sampling. Existing inverse rendering methods typically use pre-defined non-learnable importance…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Code: https://github.com/fudan-zvg/tensoflow

  5. arXiv:2503.10631  [pdf, other]

    cs.CV cs.RO

    HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model

    Authors: Jiaming Liu, Hao Chen, Pengju An, Zhuoyang Liu, Renrui Zhang, Chenyang Gu, Xiaoqi Li, Ziyu Guo, Sixiang Chen, Mengzhen Liu, Chengkai Hou, Mengdi Zhao, KC alex Zhou, Pheng-Ann Heng, Shanghang Zhang

    Abstract: Recent advancements in vision-language models (VLMs) for common-sense reasoning have led to the development of vision-language-action (VLA) models, enabling robots to perform generalized manipulation. Although existing autoregressive VLA methods leverage large-scale pretrained knowledge, they disrupt the continuity of actions. Meanwhile, some VLA methods incorporate an additional diffusion head to…

    Submitted 17 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  6. arXiv:2503.05731  [pdf, other]

    cs.CY cs.AI

    AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

    Authors: Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, Sean McGregor, Kenneth Fricklas, Mala Kumar, Quentin Feuillade--Montixi, Kurt Bollacker, Felix Friedrich, Ryan Tsang, Bertie Vidgen, Alicia Parrish, Chris Knotz, Eleonora Presani, Jonathan Bennion, Marisa Ferrara Boston, Mike Kuniavsky, Wiebke Hutiri, James Ezick, Malek Ben Salem, Rajat Sahay, Sujata Goswami, et al. (77 additional authors not shown)

    Abstract: The rapid advancement and deployment of AI systems have created an urgent need for standard safety-evaluation frameworks. This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. Its development employed an open process that included participants from multiple fields. The benchmark evaluates an AI system's resistance…

    Submitted 18 April, 2025; v1 submitted 19 February, 2025; originally announced March 2025.

    Comments: 51 pages, 8 figures and an appendix

  7. arXiv:2502.19672  [pdf, other]

    cs.CV cs.LG

    Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack

    Authors: Chenhe Gu, Jindong Gu, Andong Hua, Yao Qin

    Abstract: Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained attention for their capabilities in image recognition and understanding. However, while MLLMs are vulnerable to adversarial attacks, the transferability of these attacks across different models remains limited, especially under targeted attack setting. Existing methods primarily focus on vision-specific perturbations b…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: arXiv admin note: text overlap with arXiv:2403.09766

  8. arXiv:2502.18138  [pdf, other]

    cs.SI cs.AI

    Large Language Model Driven Agents for Simulating Echo Chamber Formation

    Authors: Chenhao Gu, Ling Luo, Zainab Razia Zaidi, Shanika Karunasekera

    Abstract: The rise of echo chambers on social media platforms has heightened concerns about polarization and the reinforcement of existing beliefs. Traditional approaches for simulating echo chamber formation have often relied on predefined rules and numerical simulations, which, while insightful, may lack the nuance needed to capture complex, real-world interactions. In this paper, we present a novel frame…

    Submitted 25 February, 2025; originally announced February 2025.

  9. arXiv:2502.16894  [pdf, other]

    cs.CL

    Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

    Authors: Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Chengfeng Gu, Yu Cheng

    Abstract: While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT). Current methods optimize LoRA by initializing with static singular value decomposition (SVD) subsets, leading to suboptimal leveraging of pre-trained knowledge. Another path for improving LoRA is incorporating a Mixture-of-Exper…

    Submitted 26 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

  10. arXiv:2502.08332  [pdf, other]

    cs.CR cs.AI

    Modification and Generated-Text Detection: Achieving Dual Detection Capabilities for the Outputs of LLM by Watermark

    Authors: Yuhang Cai, Yaofei Wang, Donghui Hu, Chen Gu

    Abstract: The development of large language models (LLMs) has raised concerns about potential misuse. One practical solution is to embed a watermark in the text, allowing ownership verification through watermark extraction. Existing methods primarily focus on defending against modification attacks, often neglecting other spoofing attacks. For example, attackers can alter the watermarked text to produce harm…

    Submitted 1 March, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

  11. arXiv:2502.07776  [pdf, other]

    cs.CL cs.CR cs.LG

    Auditing Prompt Caching in Language Model APIs

    Authors: Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, Tatsunori Hashimoto

    Abstract: Prompt caching in large language models (LLMs) results in data-dependent timing variations: cached prompts are processed faster than non-cached prompts. These timing differences introduce the risk of side-channel timing attacks. For example, if the cache is shared across users, an attacker could identify cached prompts from fast API response times to learn information about other users' prompts. B…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: 20 pages, 7 figures

  12. arXiv:2502.07384   

    cs.CE

    SAGEPhos: Sage Bio-Coupled and Augmented Fusion for Phosphorylation Site Detection

    Authors: Jingjie Zhang, Hanqun Cao, Zijun Gao, Xiaorui Wang, Chunbin Gu

    Abstract: Phosphorylation site prediction based on kinase-substrate interaction plays a vital role in understanding cellular signaling pathways and disease mechanisms. Computational methods for this task can be categorized into kinase-family-focused and individual kinase-targeted approaches. Individual kinase-targeted methods have gained prominence for their ability to explore a broader protein space and pr…

    Submitted 16 April, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Due to significant disagreements within the author team regarding the content of the paper and an inability to reach a consensus, we have decided to withdraw the current version to allow for further verification and refinement of the research content

  13. arXiv:2501.16684  [pdf, other]

    cs.CV

    SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation

    Authors: Jianing Li, Ming Lu, Hao Wang, Chenyang Gu, Wenzhao Zheng, Li Du, Shanghang Zhang

    Abstract: 3D semantic occupancy prediction is a crucial task in visual perception, as it requires the simultaneous comprehension of both scene geometry and semantics. It plays a crucial role in understanding 3D scenes and has great potential for various applications, such as robotic vision perception and autonomous driving. Many existing works utilize planar-based representations such as Bird's Eye View (BE…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: Accepted by ICRA 2025

  14. arXiv:2501.13971  [pdf, other]

    cs.CV cs.GR eess.IV

    GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting

    Authors: Junzhe Jiang, Chun Gu, Yurui Chen, Li Zhang

    Abstract: LiDAR novel view synthesis (NVS) has emerged as a novel task within LiDAR simulation, offering valuable simulated point cloud data from novel viewpoints to aid in autonomous driving systems. However, existing LiDAR NVS methods typically rely on neural radiance fields (NeRF) as their 3D representation, which incurs significant computational costs in both training and rendering. Moreover, NeRF and i…

    Submitted 5 February, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

  15. arXiv:2501.05244  [pdf, other]

    eess.IV cs.CV eess.SP physics.optics

    Optimized Sampling for Non-Line-of-Sight Imaging Using Modified Fast Fourier Transforms

    Authors: Talha Sultan, Alex Bocchieri, Chaoying Gu, Xiaochun Liu, Pavel Polynkin, Andreas Velten

    Abstract: Non-line-of-Sight (NLOS) imaging systems collect light at a diffuse relay surface and input this measurement into computational algorithms that output a 3D volumetric reconstruction. These algorithms utilize the Fast Fourier Transform (FFT) to accelerate the reconstruction process but require both input and output to be sampled spatially with uniform grids. However, the geometry of NLOS imaging in…

    Submitted 9 January, 2025; originally announced January 2025.

  16. arXiv:2412.19282  [pdf, other]

    cs.CV

    Reflective Gaussian Splatting

    Authors: Yuxuan Yao, Zixuan Zeng, Chun Gu, Xiatian Zhu, Li Zhang

    Abstract: Novel view synthesis has experienced significant advancements owing to increasingly capable NeRF- and 3DGS-based methods. However, reflective object reconstruction remains challenging, lacking a proper solution to achieve real-time, high-quality rendering while accommodating inter-reflection. To fill this gap, we introduce a Reflective Gaussian splatting (Ref-Gaussian) framework characterized with…

    Submitted 3 February, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

    Comments: Accepted for ICLR 2025

  17. arXiv:2412.16496  [pdf, other]

    cs.NI

    STARVERI: Efficient and Accurate Verification for Risk-Avoidance Routing in LEO Satellite Networks

    Authors: Chenwei Gu, Qian Wu, Zeqi Lai, Hewu Li, Jihao Li, Weisen Liu, Qi Zhang, Jun Liu, Yuanjie Li

    Abstract: Emerging satellite Internet constellations such as SpaceX's Starlink will deploy thousands of broadband satellites and construct Low-Earth Orbit (LEO) satellite networks (LSNs) in space, significantly expanding the boundaries of today's terrestrial Internet. However, due to the unique global LEO dynamics, satellite routers will inevitably pass through uncontrolled areas, suffering from security thre…

    Submitted 21 December, 2024; originally announced December 2024.

  18. arXiv:2412.15867  [pdf, other]

    cs.CV

    IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing

    Authors: Chun Gu, Xiaofei Wei, Zixuan Zeng, Yuxuan Yao, Li Zhang

    Abstract: In inverse rendering, accurately modeling visibility and indirect radiance for incident light is essential for capturing secondary effects. Due to the absence of a powerful Gaussian ray tracer, previous 3DGS-based methods have either adopted a simplified rendering equation or used learnable parameters to approximate incident light, resulting in inaccurate material and lighting estimations. To this…

    Submitted 23 March, 2025; v1 submitted 20 December, 2024; originally announced December 2024.

    Comments: CVPR 2025. Project page: https://fudan-zvg.github.io/IRGS

  19. arXiv:2412.13877  [pdf, other]

    cs.RO cs.AI

    RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

    Authors: Kun Wu, Chengkai Hou, Jiaming Liu, Zhengping Che, Xiaozhu Ju, Zhuqin Yang, Meng Li, Yinuo Zhao, Zhiyuan Xu, Guang Yang, Shichao Fan, Xinhua Wang, Fei Liao, Zhen Zhao, Guangyu Li, Zhao Jin, Lecheng Wang, Jilei Mao, Ning Liu, Pei Ren, Qiang Zhang, Yaoxu Lyu, Mengzhen Liu, Jingyang He, Yulin Luo, et al. (12 additional authors not shown)

    Abstract: In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information, including multi-view observations, proprioceptive robot state information, a…

    Submitted 14 February, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

  20. arXiv:2412.13552  [pdf, other]

    cs.CV cs.GR

    DragScene: Interactive 3D Scene Editing with Single-view Drag Instructions

    Authors: Chenghao Gu, Zhenzhe Li, Zhengqi Zhang, Yunpeng Bai, Shuzhao Xie, Zhi Wang

    Abstract: 3D editing has shown remarkable capability in editing scenes based on various instructions. However, existing methods struggle with achieving intuitive, localized editing, such as selectively making flowers blossom. Drag-style editing has shown exceptional capability to edit images with direct manipulation instead of ambiguous text commands. Nevertheless, extending drag-based editing to 3D scenes…

    Submitted 18 December, 2024; originally announced December 2024.

  21. arXiv:2411.18623  [pdf, other]

    cs.CV

    Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

    Authors: Yueru Jia, Jiaming Liu, Sixiang Chen, Chenyang Gu, Zhilue Wang, Longzan Luo, Lily Lee, Pengwei Wang, Zhongyuan Wang, Renrui Zhang, Shanghang Zhang

    Abstract: 3D geometric information is essential for manipulation tasks, as robots need to perceive the 3D environment, reason about spatial relationships, and interact with intricate spatial configurations. Recent research has increasingly focused on the explicit extraction of 3D features, while still facing challenges such as the lack of large-scale robotic 3D data and the potential loss of spatial geometr…

    Submitted 14 December, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

  22. arXiv:2411.10883  [pdf, other]

    cs.CR

    I Know What You Sync: Covert and Side Channel Attacks on File Systems via syncfs

    Authors: Cheng Gu, Yicheng Zhang, Nael Abu-Ghazaleh

    Abstract: Operating Systems enforce logical isolation using abstractions such as processes, containers, and isolation technologies to protect a system from malicious or buggy code. In this paper, we show new types of side channels through the file system that break this logical isolation. The file system plays a critical role in the operating system, managing all I/O activities between the application layer…

    Submitted 26 April, 2025; v1 submitted 16 November, 2024; originally announced November 2024.

    Comments: Accepted to IEEE S&P 2025

  23. arXiv:2411.09968  [pdf, other]

    cs.CV cs.AI

    Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

    Authors: Xiaofeng Zhang, Yihao Quan, Chaochen Gu, Chen Shen, Xiaosong Yuan, Shaotian Yan, Hao Cheng, Kaijie Wu, Jieping Ye

    Abstract: The hallucination problem in multimodal large language models (MLLMs) remains a common issue. Although image tokens occupy a majority of the input sequence of MLLMs, there is limited research to explore the relationship between image tokens and hallucinations. In this paper, we analyze the distribution of attention scores for image tokens across each layer and head of the model, revealing an intri…

    Submitted 15 November, 2024; originally announced November 2024.

  24. arXiv:2410.14946  [pdf, other]

    cs.LG cs.AI q-bio.BM

    DEL-Ranking: Ranking-Correction Denoising Framework for Elucidating Molecular Affinities in DNA-Encoded Libraries

    Authors: Hanqun Cao, Mutian He, Ning Ma, Chang-yu Hsieh, Chunbin Gu, Pheng-Ann Heng

    Abstract: DNA-encoded library (DEL) screening has revolutionized the detection of protein-ligand interactions through read counts, enabling rapid exploration of vast chemical spaces. However, noise in read counts, stemming from nonspecific interactions, can mislead this exploration process. We present DEL-Ranking, a novel distribution-correction denoising framework that addresses these challenges. Our appro…

    Submitted 4 December, 2024; v1 submitted 18 October, 2024; originally announced October 2024.

  25. arXiv:2410.13201  [pdf, other]

    cs.CL cs.AI cs.LG

    Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration

    Authors: Yun-Yen Chuang, Hung-Min Hsu, Kevin Lin, Chen-Sheng Gu, Ling Zhen Li, Ray-I Chang, Hung-yi Lee

    Abstract: The diffusion model, a new generative modeling paradigm, has achieved significant success in generating images, audio, video, and text. It has been adapted for sequence-to-sequence text generation (Seq2Seq) through DiffuSeq, termed S2S Diffusion. Existing S2S-Diffusion models predominantly rely on fixed or hand-crafted rules to schedule noise during the diffusion and denoising processes. However,…

    Submitted 17 October, 2024; originally announced October 2024.

  26. arXiv:2410.02231  [pdf, other]

    cs.AI cs.LG eess.SY

    SEAL: SEmantic-Augmented Imitation Learning via Language Model

    Authors: Chengyang Gu, Yuxin Pan, Haotian Bai, Hui Xiong, Yize Chen

    Abstract: Hierarchical Imitation Learning (HIL) is a promising approach for tackling long-horizon decision-making tasks. However, it is challenging due to the lack of detailed supervisory labels for sub-goal learning and the reliance on hundreds to thousands of expert demonstrations. In this work, we introduce SEAL, a novel framework that leverages Large Language Models' (LLMs) powerful semantic and world…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: 18 pages, 5 figures, in submission

  27. arXiv:2409.19143  [pdf, other]

    cs.CV

    Diverse Code Query Learning for Speech-Driven Facial Animation

    Authors: Chunzhi Gu, Shigeru Kuriyama, Katsuya Hotta

    Abstract: Speech-driven facial animation aims to synthesize lip-synchronized 3D talking faces following the given speech signal. Prior methods to this task mostly focus on pursuing realism with deterministic systems, yet characterizing the potentially stochastic nature of facial motions has been to date rarely studied. While generative modeling approaches can easily handle the one-to-many mapping by repeate…

    Submitted 27 September, 2024; originally announced September 2024.

  28. arXiv:2409.08904  [pdf, other]

    cs.RO cs.AI cs.LG

    AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models

    Authors: Yifei Yao, Wentao He, Chenyu Gu, Jiaheng Du, Fuwei Tan, Zhen Zhu, Junguo Lu

    Abstract: Training and deploying reinforcement learning (RL) policies for robots, especially in accomplishing specific tasks, presents substantial challenges. Recent advancements have explored diverse reward function designs, training techniques, simulation-to-reality (sim-to-real) transfers, and performance analysis methodologies, yet these still require significant human intervention. This paper introduce…

    Submitted 23 February, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

  29. arXiv:2409.05916  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.BM

    Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries

    Authors: Chunbin Gu, Mutian He, Hanqun Cao, Guangyong Chen, Chang-yu Hsieh, Pheng Ann Heng

    Abstract: In the realm of drug discovery, DNA-encoded library (DEL) screening technology has emerged as an efficient method for identifying high-affinity compounds. However, DEL screening faces a significant challenge: noise arising from nonspecific interactions within complex biological systems. Neural networks trained on DEL libraries have been employed to extract compound features, aiming to denoise the…

    Submitted 7 September, 2024; originally announced September 2024.

  30. arXiv:2409.00909  [pdf, other]

    cs.CV cs.AI

    ViRED: Prediction of Visual Relations in Engineering Drawings

    Authors: Chao Gu, Ke Lin, Yiyang Luo, Jiahui Hou, Xiang-Yang Li

    Abstract: To accurately understand engineering drawings, it is essential to establish the correspondence between images and their description tables within the drawings. Existing document understanding methods predominantly focus on text as the main modality, which is not suitable for documents containing substantial image information. In the field of visual relation detection, the structure of the task inh…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 8 pages, 5 figures

  31. arXiv:2408.05358  [pdf, other]

    eess.SP cs.CV cs.HC cs.LG

    GesturePrint: Enabling User Identification for mmWave-based Gesture Recognition Systems

    Authors: Lilin Xu, Keyi Wang, Chaojie Gu, Xiuzhen Guo, Shibo He, Jiming Chen

    Abstract: The millimeter-wave (mmWave) radar has been exploited for gesture recognition. However, existing mmWave-based gesture recognition methods cannot identify different users, which is important for ubiquitous gesture interaction in many applications. In this paper, we propose GesturePrint, which is the first to achieve gesture recognition and gesture-based user identification using a commodity mmWave…

    Submitted 25 July, 2024; originally announced August 2024.

    Comments: Accepted to the 44th IEEE International Conference on Distributed Computing Systems (ICDCS 2024)

  32. arXiv:2407.06188  [pdf, other]

    cs.CV

    CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

    Authors: Yukang Cao, Xinying Guo, Mingyuan Zhang, Haozhe Xie, Chenyang Gu, Ziwei Liu

    Abstract: While recent advances in text-to-motion generation have shown promising results, they typically assume all individuals are grouped as a single unit. Scaling these methods to handle larger crowds and ensuring that individuals respond appropriately to specific events remains a significant challenge. This is primarily due to the complexities of scene planning, which involves organizing groups, planni…

    Submitted 9 May, 2025; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Project page: https://yukangcao.github.io/CrowdMoGen

  33. arXiv:2407.00987  [pdf, other]

    cs.NI eess.SY

    Exploiting Dependency-Aware Priority Adjustment for Mixed-Criticality TSN Flow Scheduling

    Authors: Miao Guo, Yifei Sun, Chaojie Gu, Shibo He, Zhiguo Shi

    Abstract: Time-Sensitive Networking (TSN) serves as a one-size-fits-all solution for mixed-criticality communication, in which flow scheduling is vital to guarantee real-time transmissions. Traditional approaches statically assign priorities to flows based on their associated applications, resulting in significant queuing delays. In this paper, we observe that assigning different priorities to a flow leads…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by IWQoS'24

  34. arXiv:2406.06579  [pdf, other]

    cs.CL cs.AI cs.CV

    From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks

    Authors: Xiaofeng Zhang, Yihao Quan, Chen Shen, Xiaosong Yuan, Shaotian Yan, Liang Xie, Wenxiao Wang, Chaochen Gu, Hao Tang, Jieping Ye

    Abstract: Large Vision Language Models (LVLMs) achieve great performance on visual-language reasoning tasks; however, the black-box nature of LVLMs hinders in-depth research on the reasoning mechanism. As all images need to be converted into image tokens to fit the input format of large language models (LLMs) along with natural language prompts, sequential visual representation is essential to the performan…

    Submitted 16 October, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  35. arXiv:2406.06258  [pdf, other]

    cs.CV

    Tuning-Free Visual Customization via View Iterative Self-Attention Control

    Authors: Xiaojie Li, Chenghao Gu, Shuzhao Xie, Yunpeng Bai, Weixiang Zhang, Zhi Wang

    Abstract: Fine-Tuning Diffusion Models enable a wide range of personalized generation and editing applications on diverse visual modalities. While Low-Rank Adaptation (LoRA) accelerates the fine-tuning process, it still requires multiple reference images and time-consuming training, which constrains its scalability for large-scale and real-time applications. In this paper, we propose \textit{View Iterative…

    Submitted 10 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Under review

  36. arXiv:2406.01579  [pdf, other]

    cs.CV

    Tetrahedron Splatting for 3D Generation

    Authors: Chun Gu, Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang

    Abstract: 3D representation is essential to the significant advance of 3D generation with 2D diffusion priors. As a flexible representation, NeRF has been first adopted for 3D representation. With density-based volumetric rendering, it however suffers both intensive computational overhead and inaccurate mesh extraction. Using a signed distance field and Marching Tetrahedra, DMTet allows for precise mesh ext…

    Submitted 11 October, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page: https://fudan-zvg.github.io/tet-splatting/

  37. arXiv:2405.16437  [pdf, other]

    cs.CV

    Incremental Pseudo-Labeling for Black-Box Unsupervised Domain Adaptation

    Authors: Yawen Zou, Chunzhi Gu, Jun Yu, Shangce Gao, Chao Zhang

    Abstract: Black-Box unsupervised domain adaptation (BBUDA) learns knowledge only with the prediction of target data from the source model without access to the source data and source model, which attempts to alleviate concerns about the privacy and security of data. However, incorrect pseudo-labels are prevalent in the prediction generated by the source model due to the cross-domain discrepancy, which may s…

    Submitted 26 May, 2024; originally announced May 2024.

  38. arXiv:2405.00816  [pdf]

    cs.SI cs.LG

    Sifting out communities in large sparse networks

    Authors: Sharlee Climer, Kenneth Smith Jr, Wei Yang, Lisa de las Fuentes, Victor G. Dávila-Román, C. Charles Gu

    Abstract: Research data sets are growing to unprecedented sizes and network modeling is commonly used to extract complex relationships in diverse domains, such as genetic interactions involved in disease, logistics, and social communities. As the number of nodes increases in a network, an increasing sparsity of edges is a practical limitation due to memory restrictions. Moreover, many of these sparse networ…

    Submitted 1 May, 2024; originally announced May 2024.

  39. arXiv:2404.18359  [pdf, other]

    cs.CL cs.AI

    FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

    Authors: Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He

    Abstract: In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choi…

    Submitted 28 April, 2024; originally announced April 2024.

  40. arXiv:2404.17381  [pdf, other]

    cs.CV

    Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

    Authors: Shun Maeda, Chunzhi Gu, Jun Yu, Shogo Tokai, Shangce Gao, Chao Zhang

    Abstract: We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples. Compared to prior human-related anomaly detection tasks which primarily focus on unusual events from videos, HAAD involves the learning of specific action labels to recognize semantically anomalo…

    Submitted 26 April, 2024; originally announced April 2024.

  41. arXiv:2404.10311  [pdf, other]

    eess.SY cs.AI

    Learning and Optimization for Price-based Demand Response of Electric Vehicle Charging

    Authors: Chengyang Gu, Yuxin Pan, Ruohong Liu, Yize Chen

    Abstract: In the context of charging electric vehicles (EVs), the price-based demand response (PBDR) is becoming increasingly significant for charging load management. Such response usually encourages cost-sensitive customers to adjust their energy demand in response to changes in price for financial incentives. Thus, to model and optimize EV charging, it is important for charging station operator to model…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by American Control Conference (ACC) 2024

  42. arXiv:2404.02148  [pdf, other]

    cs.CV

    Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

    Authors: Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang

    Abstract: Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models. These models are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of producing highly consistent multi-view images. However, due to the scarcity of synchronized multi-view video data, it remains challenging to adapt this paradigm to…

    Submitted 2 October, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Technical Report

  43. arXiv:2404.01958  [pdf, other]

    cs.LG

    MESEN: Exploit Multimodal Data to Design Unimodal Human Activity Recognition with Few Labels

    Authors: Lilin Xu, Chaojie Gu, Rui Tan, Shibo He, Jiming Chen

    Abstract: Human activity recognition (HAR) will be an essential function of various emerging applications. However, HAR typically encounters challenges related to modality limitations and label scarcity, leading to an application gap between current solutions and real-world requirements. In this work, we propose MESEN, a multimodal-empowered unimodal sensing framework, to utilize unlabeled multimodal data a…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to the 21st ACM Conference on Embedded Networked Sensor Systems (SenSys 2023)

  44. arXiv:2404.01284  [pdf, other]

    cs.CV

    Large Motion Model for Unified Multi-Modal Motion Generation

    Authors: Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu

    Abstract: Human motion generation, a cornerstone technique in animation and video production, has widespread applications in various tasks like text-to-motion and music-to-dance. Previous works focus on developing specialist models tailored for each task without scalability. In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation t…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Homepage: https://mingyuan-zhang.github.io/projects/LMM.html

  45. A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles

    Authors: Jiani Fan, Minrui Xu, Ziyao Liu, Huanyi Ye, Chaojie Gu, Dusit Niyato, Kwok-Yan Lam

    Abstract: Artificial Intelligence-Generated Content (AIGC) refers to the paradigm of automated content generation utilizing AI models. Mobile AIGC services in the Internet of Vehicles (IoV) network have numerous advantages over traditional cloud-based AIGC services, including enhanced network efficiency, better reconfigurability, and stronger data security and privacy. Nonetheless, AIGC service provisioning…

    Submitted 9 May, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall)

  46. arXiv:2403.17297  [pdf, other]

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m…

    Submitted 25 March, 2024; originally announced March 2024.

  47. Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

    Authors: Jiaxing Sun, Weiquan Huang, Jiang Wu, Chenya Gu, Wei Li, Songyang Zhang, Hang Yan, Conghui He

    Abstract: We introduce CHARM, the first benchmark for comprehensively and in-depth evaluating the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense. We evaluated 7 English and 12 Chinese-oriented LLMs on CHARM, employing 5 representative prompt strategies for improving LLMs' reasoning ability, such as Chain-of-Thought.…

    Submitted 10 December, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Equal contribution: Jiaxing Sun, Weiquan Huang, Jiang Wu; Corresponding author: Conghui He

  48. arXiv:2403.10020  [pdf, other]

    cs.CL cs.MM

    Lost in Overlap: Exploring Logit-based Watermark Collision in LLMs

    Authors: Yiyang Luo, Ke Lin, Chao Gu, Jiahui Hou, Lijie Wen, Ping Luo

    Abstract: The proliferation of large language models (LLMs) in generating content raises concerns about text copyright. Watermarking methods, particularly logit-based approaches, embed imperceptible identifiers into text to address these challenges. However, the widespread usage of watermarking across diverse LLMs has led to an inevitable issue known as watermark collision during common tasks, such as parap…

    Submitted 5 February, 2025; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Long Paper, 9 pages, accepted at NAACL 2025 Findings

  49. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  50. arXiv:2401.16235  [pdf]

    cs.LG stat.AP

    Player Pressure Map -- A Novel Representation of Pressure in Soccer for Evaluating Player Performance in Different Game Contexts

    Authors: Chaoyi Gu, Jiaming Na, Yisheng Pei, Varuna De Silva

    Abstract: In soccer, contextual player performance metrics are invaluable to coaches. For example, the ability to perform under pressure during matches distinguishes the elite from the average. An appropriate pressure metric enables teams to assess players' performance accurately under pressure and design targeted training scenarios to address their weaknesses. The primary objective of this paper is to leverag…

    Submitted 7 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.