Showing 1–50 of 270 results for author: Luo, T

Searching in archive cs.
  1. Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

    Authors: Benjamin Chen Ming Choong, Tao Luo, Cheng Liu, Bingsheng He, Wei Zhang, Joey Tianyi Zhou

    Abstract: Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI applications. Among newly-researched memory technologies, racetrack memory is a non-volatile technology that allows high data density fabrication, making it a good f…

    Submitted 2 July, 2025; originally announced July 2025.

  2. arXiv:2507.01006 [pdf, ps, other]

    cs.CV cs.AI cs.LG

    GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

    Authors: GLM-V Team, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Boyan Shi, Changyu Pang, Chenhui Zhang, et al. (54 additional authors not shown)

    Abstract: We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the fi…

    Submitted 2 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2506.14541 [pdf, ps, other]

    cs.CV

    Exploring Diffusion with Test-Time Training on Efficient Image Restoration

    Authors: Rongchang Lu, Tianduo Luo, Yunzhi Jiang, Conghan Yue, Pei Yang, Guibao Liu, Changyang Gu

    Abstract: Image restoration faces challenges including ineffective feature fusion, computational bottlenecks and inefficient diffusion processes. To address these, we propose DiffRWKVIR, a novel framework unifying Test-Time Training (TTT) with efficient diffusion. Our approach introduces three key innovations: (1) Omni-Scale 2D State Evolution extends RWKV's location-dependent parameterization to hierarchic…

    Submitted 22 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    ACM Class: I.4.9

  4. arXiv:2506.10531 [pdf, ps, other]

    cs.DC

    GPU-Accelerated Distributed QAOA on Large-scale HPC Ecosystems

    Authors: Zhihao Xu, Srikar Chundury, Seongmin Kim, Amir Shehata, Xinyi Li, Ang Li, Tengfei Luo, Frank Mueller, In-Saeng Suh

    Abstract: Quantum computing holds great potential to accelerate the process of solving complex combinatorial optimization problems. The Distributed Quantum Approximate Optimization Algorithm (DQAOA) addresses high-dimensional, dense problems using current quantum computing techniques and high-performance computing (HPC) systems. In this work, we improve the scalability and efficiency of DQAOA through advanc…

    Submitted 12 June, 2025; originally announced June 2025.

  5. Predicting the Past: Estimating Historical Appraisals with OCR and Machine Learning

    Authors: Mihir Bhaskar, Jun Tao Luo, Zihan Geng, Asmita Hajra, Junia Howell, Matthew R. Gormley

    Abstract: Despite well-documented consequences of the U.S. government's 1930s housing policies on racial wealth disparities, scholars have struggled to quantify their precise financial effects due to the inaccessibility of historical property appraisal records. Many counties still store these records in physical formats, making large-scale quantitative analysis difficult. We present an approach scholars can u…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted to COMPASS 2025

  6. arXiv:2505.16332 [pdf, other]

    quant-ph cs.AI cs.PF

    Is Quantum Optimization Ready? An Effort Towards Neural Network Compression using Adiabatic Quantum Computing

    Authors: Zhehui Wang, Benjamin Chen Ming Choong, Tian Huang, Daniel Gerlinghoff, Rick Siow Mong Goh, Cheng Liu, Tao Luo

    Abstract: Quantum optimization is the most mature quantum computing technology to date, providing a promising approach towards efficiently solving complex combinatorial problems. Methods such as adiabatic quantum computing (AQC) have been employed in recent years on important optimization problems across various domains. In deep learning, deep neural networks (DNN) have reached immense sizes to support new…

    Submitted 22 May, 2025; originally announced May 2025.

  7. arXiv:2505.13955 [pdf, other]

    cs.DC

    Paradigm Shift in Infrastructure Inspection Technology: Leveraging High-performance Imaging and Advanced AI Analytics to Inspect Road Infrastructure

    Authors: Du Wu, Enzhi Zhang, Isaac Lyngaas, Xiao Wang, Amir Ziabari, Tao Luo, Peng Chen, Kento Sato, Fumiyoshi Shoji, Takaki Hatsui, Kentaro Uesugi, Akira Seo, Yasuhito Sakai, Toshio Endo, Tetsuya Ishikawa, Satoshi Matsuoka, Mohamed Wahib

    Abstract: Effective road infrastructure management is crucial for modern society. Traditional manual inspection techniques remain constrained by cost, efficiency, and scalability, while camera and laser imaging methods fail to capture subsurface defects critical for long-term structural integrity. This paper introduces ROVAI, an end-to-end framework that integrates high-resolution X-ray computed tomography…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Submitting this work to be considered for the Gordon Bell Award in SC25

  8. arXiv:2505.13582 [pdf, ps, other]

    cs.LG

    Uncovering Critical Sets of Deep Neural Networks via Sample-Independent Critical Lifting

    Authors: Leyang Zhang, Yaoyu Zhang, Tao Luo

    Abstract: This paper investigates the sample dependence of critical points for neural networks. We introduce a sample-independent critical lifting operator that associates a parameter of one network with a set of parameters of another, thus defining sample-dependent and sample-independent lifted critical points. We then show by example that previously studied critical embeddings do not capture all sample-in…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 20 pages

  9. arXiv:2505.12632 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents

    Authors: Yunseok Jang, Yeda Song, Sungryull Sohn, Lajanugen Logeswaran, Tiange Luo, Dong-Ki Kim, Kyunghoon Bae, Honglak Lee

    Abstract: Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have sparked significant interest in developing GUI visual agents. We introduce MONDAY (Mobile OS Navigation Task Dataset for Agents from YouTube), a large-scale dataset of 313K annotated frames from 20K instructional videos capturing diverse real-world mobile OS navigation across multiple platforms. Models that…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  10. arXiv:2505.12419 [pdf, ps, other]

    cs.LG stat.ML

    Embedding principle of homogeneous neural network for classification problem

    Authors: Jiahan Zhang, Yaoyu Zhang, Tao Luo

    Abstract: Understanding the convergence points and optimization landscape of neural networks is crucial, particularly for homogeneous networks where Karush-Kuhn-Tucker (KKT) points of the associated maximum-margin problem often characterize solutions. This paper investigates the relationship between such KKT points across networks of different widths generated via neuron splitting. We introduce and formaliz…

    Submitted 21 May, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

  11. arXiv:2505.10726 [pdf, other]

    cs.LG cs.AI

    Learning Repetition-Invariant Representations for Polymer Informatics

    Authors: Yihan Zhu, Gang Liu, Eric Inae, Tengfei Luo, Meng Jiang

    Abstract: Polymers are large macromolecules composed of repeating structural units known as monomers and are widely applied in fields such as energy storage, construction, medicine, and aerospace. However, existing graph neural network methods, though effective for small molecules, only model the single unit of polymers and fail to produce consistent vector representations for the true polymer structure wit…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 18 pages, 3 figures

  12. arXiv:2505.07787 [pdf, other]

    cs.CL

    Learning from Peers in Reasoning Models

    Authors: Tongxu Luo, Wenyu Du, Jiaxi Bi, Stephen Chung, Zhengyang Tang, Hao Yang, Min Zhang, Benyou Wang

    Abstract: Large Reasoning Models (LRMs) have the ability to self-correct even when they make mistakes in their reasoning paths. However, our study reveals that when the reasoning process starts with a short but poor beginning, it becomes difficult for the model to recover. We refer to this phenomenon as the "Prefix Dominance Trap". Inspired by psychological findings that peer interaction can promote self-co…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 29 pages, 32 figures

  13. arXiv:2505.03042 [pdf, ps, other]

    cs.LG

    A New Perspective To Understanding Multi-resolution Hash Encoding For Neural Fields

    Authors: Steven Tin Sui Luo

    Abstract: Instant-NGP has been the state-of-the-art architecture of neural fields in recent years. Its incredible signal-fitting capabilities are generally attributed to its multi-resolution hash grid structure and have been used and improved in numerous following works. However, it is unclear how and why such a hash grid structure improves the capabilities of a neural network by such great margins. A lack…

    Submitted 5 May, 2025; originally announced May 2025.

  14. arXiv:2505.01656 [pdf, other]

    cs.CV

    A Novel WaveInst-based Network for Tree Trunk Structure Extraction and Pattern Analysis in Forest Inventory

    Authors: Chenyang Fan, Xujie Zhu, Taige Luo, Sheng Xu, Zhulin Chen, Hongxin Yang

    Abstract: The pattern analysis of tree structure holds significant scientific value for genetic breeding and forestry management. The current trunk and branch extraction technologies are mainly LiDAR-based or UAV-based. The former approaches obtain high-precision 3D data, but their equipment cost is high and the three-dimensional (3D) data processing is complex. The latter approaches efficiently capture canop…

    Submitted 2 May, 2025; originally announced May 2025.

  15. arXiv:2505.00684 [pdf, other]

    cs.CV cs.AI cs.LG

    Visual Test-time Scaling for GUI Agent Grounding

    Authors: Tiange Luo, Lajanugen Logeswaran, Justin Johnson, Honglak Lee

    Abstract: We introduce RegionFocus, a visual test-time scaling approach for Vision Language Model Agents. Understanding webpages is challenging due to the visual complexity of GUI images and the large number of interface elements, making accurate action selection difficult. Our approach dynamically zooms in on relevant regions, reducing background clutter and improving grounding accuracy. To support this pr…

    Submitted 1 May, 2025; originally announced May 2025.

  16. arXiv:2504.21263 [pdf, other]

    cs.CV cs.LG cs.MM

    Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning

    Authors: Jinpeng Wang, Tianci Luo, Yaohua Zha, Yan Feng, Ruisheng Luo, Bin Chen, Tao Dai, Long Chen, Yaowei Wang, Shu-Tao Xia

    Abstract: Visual In-Context Learning (VICL) enables adaptively solving vision tasks by leveraging pixel demonstrations, mimicking human-like task completion through analogy. Prompt selection is critical in VICL, but current methods assume the existence of a single "ideal" prompt in a pool of candidates, which in practice may not hold true. Multiple suitable prompts may exist, but individually they often fal…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR'25. 10 pages, 5 figures, 6 tables

  17. arXiv:2504.16074 [pdf, other]

    cs.CL

    PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

    Authors: Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Jiaming Ji, et al. (29 additional authors not shown)

    Abstract: Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olym…

    Submitted 18 May, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 34 pages, 12 figures, 7 tables; latest update on 2025/05/18

  18. arXiv:2504.14906 [pdf, ps, other]

    eess.AS cs.CV cs.SD

    OmniAudio: Generating Spatial Audio from 360-Degree Video

    Authors: Huadai Liu, Tianyi Luo, Kaicheng Luo, Qikai Jiang, Peiwen Sun, Jialei Wang, Rongjie Huang, Qian Chen, Wen Wang, Xiangtai Li, Shiliang Zhang, Zhijie Yan, Zhou Zhao, Wei Xue

    Abstract: Traditional video-to-audio generation techniques primarily focus on perspective video and non-spatial audio, often missing the spatial cues necessary for accurately representing sound sources in 3D environments. To address this limitation, we introduce a novel task, 360V2SA, to generate spatial audio from 360-degree videos, specifically producing First-order Ambisonics (FOA) audio - a standard for…

    Submitted 2 June, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: ICML 2025

  19. arXiv:2504.14143 [pdf, other]

    cs.LG

    Predicting Stress and Damage in Carbon Fiber-Reinforced Composites Deformation Process using Composite U-Net Surrogate Model

    Authors: Zeping Chen, Marwa Yacouti, Maryam Shakiba, Jian-Xun Wang, Tengfei Luo, Vikas Varshney

    Abstract: Carbon fiber-reinforced composites (CFRC) are pivotal in advanced engineering applications due to their exceptional mechanical properties. A deep understanding of CFRC behavior under mechanical loading is essential for optimizing performance in demanding applications such as aerospace structures. While traditional Finite Element Method (FEM) simulations, including advanced techniques like Interfac…

    Submitted 18 April, 2025; originally announced April 2025.

  20. arXiv:2504.13547 [pdf, ps, other]

    cs.CR cs.SE

    Version-level Third-Party Library Detection in Android Applications via Class Structural Similarity

    Authors: Bolin Zhou, Jingzheng Wu, Xiang Ling, Tianyue Luo, Jingkun Zhang

    Abstract: Android applications (apps) integrate reusable and well-tested third-party libraries (TPLs) to enhance functionality and shorten development cycles. However, recent research reveals that TPLs have become the largest attack surface for Android apps, where the use of insecure TPLs can compromise both developer and user interests. To mitigate such threats, researchers have proposed various tools to d…

    Submitted 18 June, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

    Comments: This paper has been accepted by the International Conference on Evaluation and Assessment in Software Engineering (EASE) 2025

  21. arXiv:2504.12643 [pdf, ps, other]

    cs.CV

    RoPETR: Improving Temporal Camera-Only 3D Detection by Integrating Enhanced Rotary Position Embedding

    Authors: Hang Ji, Tao Ni, Xufeng Huang, Zhan Shi, Tao Luo, Xin Zhan, Junbo Chen

    Abstract: This technical report introduces a targeted improvement to the StreamPETR framework, specifically aimed at enhancing velocity estimation, a critical factor influencing the overall NuScenes Detection Score. While StreamPETR exhibits strong 3D bounding box detection performance as reflected by its high mean Average Precision, our analysis identified velocity estimation as a substantial bottleneck whe…

    Submitted 6 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  22. arXiv:2504.06201 [pdf]

    quant-ph cs.CE

    Quantum Annealing for Combinatorial Optimization: A Benchmarking Study

    Authors: Seongmin Kim, Sang-Woo Ahn, In-Saeng Suh, Alexander W. Dowling, Eungkyu Lee, Tengfei Luo

    Abstract: Quantum annealing (QA) has the potential to significantly improve solution quality and reduce time complexity in solving combinatorial optimization problems compared to classical optimization methods. However, due to the limited number of qubits and their connectivity, the QA hardware did not show such an advantage over classical methods in past benchmarking studies. Recent advancements in QA with…

    Submitted 8 April, 2025; originally announced April 2025.

  23. arXiv:2504.03230 [pdf, ps, other]

    cs.CV cs.LG

    Unlocking Neural Transparency: Jacobian Maps for Explainable AI in Alzheimer's Detection

    Authors: Yasmine Mustafa, Mohamed Elmahallawy, Tie Luo

    Abstract: Alzheimer's disease (AD) leads to progressive cognitive decline, making early detection crucial for effective intervention. While deep learning models have shown high accuracy in AD diagnosis, their lack of interpretability limits clinical trust and adoption. This paper introduces a novel pre-model approach leveraging Jacobian Maps (JMs) within a multi-modal framework to enhance explainability and…

    Submitted 15 June, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: PM4B 2025 Best Paper

    Journal ref: 2025 PAKDD Workshop on Pattern mining and Machine learning for Bioinformatics (PM4B)

  24. arXiv:2504.02260 [pdf, other]

    cs.LG cs.AI

    Implicit Neural Differential Model for Spatiotemporal Dynamics

    Authors: Deepak Akhare, Pan Du, Tengfei Luo, Jian-Xun Wang

    Abstract: Hybrid neural-physics modeling frameworks through differentiable programming have emerged as powerful tools in scientific machine learning, enabling the integration of known physics with data-driven learning to improve prediction accuracy and generalizability. However, most existing hybrid frameworks rely on explicit recurrent formulations, which suffer from numerical instability and error accumul…

    Submitted 3 April, 2025; originally announced April 2025.

  25. arXiv:2503.23491 [pdf, other]

    cond-mat.mtrl-sci cs.AI cs.LG

    POINT$^{2}$: A Polymer Informatics Training and Testing Database

    Authors: Jiaxin Xu, Gang Liu, Ruilan Guo, Meng Jiang, Tengfei Luo

    Abstract: The advancement of polymer informatics has been significantly propelled by the integration of machine learning (ML) techniques, enabling the rapid prediction of polymer properties and expediting the discovery of high-performance polymeric materials. However, the field lacks a standardized workflow that encompasses prediction accuracy, uncertainty quantification, ML interpretability, and polymer sy…

    Submitted 30 March, 2025; originally announced March 2025.

  26. RBFleX-NAS: Training-Free Neural Architecture Search Using Radial Basis Function Kernel and Hyperparameter Detection

    Authors: Tomomasa Yamasaki, Zhehui Wang, Tao Luo, Niangjun Chen, Bo Wang

    Abstract: Neural Architecture Search (NAS) is an automated technique to design optimal neural network architectures for a specific workload. Conventionally, evaluating candidate networks in NAS involves extensive training, which requires significant time and computational resources. To address this, training-free NAS has been proposed to expedite network evaluation with minimal search time. However, state-o…

    Submitted 3 June, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

    Comments: 15 pages, 17 figures. Published in IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

    Journal ref: Year 2025, Volume 36, Number 6

  27. arXiv:2503.21566 [pdf]

    cs.CV

    Bearing fault diagnosis based on multi-scale spectral images and convolutional neural network

    Authors: Tongchao Luo, Mingquan Qiu, Zhenyu Wu, Zebo Zhao, Dingyou Zhang

    Abstract: To address the challenges of low diagnostic accuracy in traditional bearing fault diagnosis methods, this paper proposes a novel fault diagnosis approach based on multi-scale spectrum feature images and deep learning. Firstly, the vibration signals are preprocessed through mean removal and then converted to multi-length spectra with fast Fourier transforms (FFT). Secondly, a novel feature called m…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 12 pages, 10 figures, and 8 tables

  28. arXiv:2503.20310 [pdf, other]

    cs.CV cs.CR cs.LG

    Enabling Heterogeneous Adversarial Transferability via Feature Permutation Attacks

    Authors: Tao Wu, Tie Luo

    Abstract: Adversarial attacks in black-box settings are highly practical, with transfer-based attacks being the most effective at generating adversarial examples (AEs) that transfer from surrogate models to unseen target models. However, their performance significantly degrades when transferring across heterogeneous architectures -- such as CNNs, MLPs, and Vision Transformers (ViTs) -- due to fundamental ar…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: PAKDD 2025. Main Track

  29. arXiv:2503.16689 [pdf, other]

    cs.SD cs.CL eess.AS

    WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

    Authors: Tianze Luo, Xingchen Miao, Wenbo Duan

    Abstract: Flow matching offers a robust and stable approach to training diffusion models. However, directly applying flow matching to neural vocoders can result in subpar audio quality. In this work, we present WaveFM, a reparameterized flow matching model for mel-spectrogram conditioned speech synthesis, designed to enhance both sample quality and generation speed for diffusion vocoders. Since mel-spectrog…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Accepted to the main conference of NAACL 2025. The code is available at https://github.com/luotianze666/WaveFM

  30. arXiv:2503.12880 [pdf, ps, other]

    cs.CL cs.AI

    nvBench 2.0: Resolving Ambiguity in Text-to-Visualization through Stepwise Reasoning

    Authors: Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, Yuyu Luo

    Abstract: Text-to-Visualization (Text2VIS) enables users to create visualizations from natural language queries, making data insights more accessible. However, Text2VIS faces challenges in interpreting ambiguous queries, as users often express their visualization needs in imprecise language. To address this challenge, we introduce nvBench 2.0, a new benchmark designed to evaluate Text2VIS systems in scenar…

    Submitted 7 June, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  31. arXiv:2502.11168 [pdf, other]

    cs.CV cs.AI

    Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding

    Authors: Xin Gu, Yaojie Shen, Chenxi Luo, Tiejian Luo, Yan Huang, Yuewei Lin, Heng Fan, Libo Zhang

    Abstract: Transformer has attracted increasing interest in STVG, owing to its end-to-end pipeline and promising results. Existing Transformer-based STVG approaches often leverage a set of object queries, which are initialized simply using zeros and then gradually learn target position information via iterative interactions with multimodal features, for spatial and temporal localization. Despite simplicity, t…

    Submitted 16 February, 2025; originally announced February 2025.

  32. arXiv:2502.05567 [pdf, other]

    cs.CL cs.AI cs.LG

    ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

    Authors: Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang, Yunqi Liu, Yuntian Liu, Yu Chen, Yang Jiao, Tao Luo

    Abstract: Autoformalization, the automatic translation of mathematical content from natural language into machine-verifiable formal languages, has seen significant progress driven by advances in large language models (LLMs). Nonetheless, a primary barrier to further improvements is the limited availability of parallel corpora that map informal mathematical text to its formal counterpart. To address this lim…

    Submitted 19 May, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

  33. arXiv:2502.05242 [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring

    Authors: Guanxu Chen, Dongrui Liu, Tao Luo, Lijie Hu, Jing Shao

    Abstract: Large language models (LLMs) are becoming increasingly capable, but the mechanisms of their thinking and decision-making process remain unclear. Chain-of-thoughts (CoTs) have been commonly utilized to monitor LLMs, but this strategy fails to accurately reflect LLMs' thinking process. Techniques based on LLMs' hidden representations provide an inner perspective to monitor their latent thinking. How…

    Submitted 28 May, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: 25 pages, 6 figures, 13 tables

  34. arXiv:2501.05107 [pdf]

    cs.RO physics.app-ph

    Harnessing the Power of Vibration Motors to Develop Miniature Untethered Robotic Fishes

    Authors: Chongjie Jiang, Yingying Dai, Jinyang Le, Xiaomeng Chen, Yu Xie, Wei Zhou, Fuzhou Niu, Ying Li, Tao Luo

    Abstract: Miniature underwater robots play a crucial role in the exploration and development of marine resources, particularly in confined spaces and high-pressure deep-sea environments. This study presents the design, optimization, and performance of a miniature robotic fish, powered by the oscillation of bio-inspired fins. These fins feature a rigid-flexible hybrid structure and use an eccentric rotating…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: 8 pages, 8 figures

  35. arXiv:2501.01645 [pdf, other]

    cs.CV cs.AI

    HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding

    Authors: Heqing Zou, Tianze Luo, Guiyang Xie, Victor Xiao Jie Zhang, Fengmao Lv, Guangcong Wang, Junyang Chen, Zhuochen Wang, Hansheng Zhang, Huaijian Zhang

    Abstract: Multimodal large language models have become a popular topic in deep visual understanding due to many promising real-world applications. However, hour-long video understanding, spanning over one hour and containing tens of thousands of visual frames, remains under-explored because of 1) challenging long-term video analyses, 2) inefficient large-model approaches, and 3) lack of large-scale benchmar…

    Submitted 13 May, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

    Comments: Accepted to ICME 2025

  36. arXiv:2501.00569 [pdf, other]

    cs.CV cs.LG

    Probing Visual Language Priors in VLMs

    Authors: Tiange Luo, Ang Cao, Gunhee Lee, Justin Johnson, Honglak Lee

    Abstract: Despite recent advances in Vision-Language Models (VLMs), they may over-rely on visual language priors existing in their training data rather than true visual reasoning. To investigate this, we introduce ViLP, a benchmark featuring deliberately out-of-distribution images synthesized via image generation models and out-of-distribution Q&A pairs. Each question in ViLP is coupled with three potential…

    Submitted 11 April, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

    Comments: Project Page: https://vilp-team.github.io/

  37. arXiv:2412.01317 [pdf, other]

    cs.SE

    The Seeds of the FUTURE Sprout from History: Fuzzing for Unveiling Vulnerabilities in Prospective Deep-Learning Libraries

    Authors: Zhiyuan Li, Jingzheng Wu, Xiang Ling, Tianyue Luo, Zhiqing Rui, Yanjun Wu

    Abstract: The widespread application of large language models (LLMs) underscores the importance of deep learning (DL) technologies that rely on foundational DL libraries such as PyTorch and TensorFlow. Despite their robust features, these libraries face challenges with scalability and adaptation to rapid advancements in the LLM community. In response, tech giants like Apple and Huawei are developing their o…

    Submitted 11 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: This paper has been accepted by the 47th International Conference on Software Engineering (ICSE 2025)

  38. arXiv:2411.16799 [pdf, other]

    cs.CV

    One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception

    Authors: Yuchen Xia, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Yang Li, Xuanhan Zhu, Tianyou Luo, Siheng Chen, Jinglin Li

    Abstract: Collaborative perception in autonomous driving significantly enhances the perception capabilities of individual agents. Immutable heterogeneity, where agents have different and fixed perception networks, presents a major challenge due to the semantic gap in exchanged intermediate features without modifying the perception networks. Most existing methods bridge the semantic gap through interpreters.…

    Submitted 23 March, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: CVPR 2025

  39. arXiv:2411.16724 [pdf, other]

    cs.CV

    Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens

    Authors: Zhangqi Jiang, Junkai Chen, Beier Zhu, Tingjin Luo, Yankun Shen, Xu Yang

    Abstract: Hallucinations in Large Vision-Language Models (LVLMs) significantly undermine their reliability, motivating researchers to explore the causes of hallucination. However, most studies primarily focus on the language aspect rather than the visual. In this paper, we address how LVLMs process visual information and whether this process causes hallucination. Firstly, we use the attention lens to identi…

    Submitted 31 March, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

  40. arXiv:2411.15583 [pdf, other]

    cs.HC

    Exploring Viewing Modalities in Cinematic Virtual Reality: A Systematic Review and Meta-Analysis of Challenges in Evaluating User Experience

    Authors: Yawen Zhang, Han Zhou, Zhoumingju Jiang, Zilu Tang, Tao Luo, Qinyuan Lei

    Abstract: Cinematic Virtual Reality (CVR) is a narrative-driven VR experience that uses head-mounted displays with a 360-degree field of view. Previous research has explored different viewing modalities to enhance viewers' CVR experience. This study conducted a systematic review and meta-analysis focusing on how different viewing modalities, including intervened rotation, avatar assistance, guidance cues, a…

    Submitted 23 November, 2024; originally announced November 2024.

    Comments: 29 pages, recommended for acceptance by CSCW

  41. arXiv:2411.13490 [pdf, other]

    eess.IV cs.CV cs.NE cs.PF

    Efficient Brain Imaging Analysis for Alzheimer's and Dementia Detection Using Convolution-Derivative Operations

    Authors: Yasmine Mustafa, Mohamed Elmahallawy, Tie Luo

    Abstract: Alzheimer's disease (AD) is characterized by progressive neurodegeneration and results in detrimental structural changes in human brains. Detecting these changes is crucial for early diagnosis and timely intervention of disease progression. Jacobian maps, derived from spatial normalization in voxel-based morphometry (VBM), have been instrumental in interpreting volume alterations associated with A…

    Submitted 22 November, 2024; v1 submitted 20 November, 2024; originally announced November 2024.

  42. arXiv:2411.10682  [pdf, other]

    cs.CV

    Underwater Image Enhancement with Cascaded Contrastive Learning

    Authors: Yi Liu, Qiuping Jiang, Xinyi Wang, Ting Luo, Jingchun Zhou

    Abstract: Underwater image enhancement (UIE) is a highly challenging task due to the complexity of the underwater environment and the diversity of underwater image degradation. Thanks to deep learning, current UIE methods have made significant progress. Most existing deep learning-based UIE methods follow a single-stage network, which cannot effectively address the diverse degradations sim…

    Submitted 15 November, 2024; originally announced November 2024.

    Comments: Accepted by IEEE Transactions on Multimedia

  43. arXiv:2411.04713  [pdf, other]

    cs.CV

    Multi-Reward as Condition for Instruction-based Image Editing

    Authors: Xin Gu, Ming Li, Libo Zhang, Fan Chen, Longyin Wen, Tiejian Luo, Sijie Zhu

    Abstract: High-quality training triplets (instruction, original image, edited image) are essential for instruction-based image editing. Predominant training datasets (e.g., InsPix2Pix) are created using text-to-image generative models (e.g., Stable Diffusion, DALL-E) which are not trained for image editing. Accordingly, these datasets suffer from inaccurate instruction following, poor detail preserving, and…

    Submitted 19 March, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

  44. DesignRepair: Dual-Stream Design Guideline-Aware Frontend Repair with Large Language Models

    Authors: Mingyue Yuan, Jieshan Chen, Zhenchang Xing, Aaron Quigley, Yuyu Luo, Tianqi Luo, Gelareh Mohammadi, Qinghua Lu, Liming Zhu

    Abstract: The rise of Large Language Models (LLMs) has streamlined frontend interface creation through tools like Vercel's V0, yet surfaced challenges in design quality (e.g., accessibility and usability). Current solutions, often limited by their focus, generalisability, or data dependency, fall short in addressing these complexities. Moreover, none of them examine the quality of LLM-generated UI design.…

    Submitted 12 December, 2024; v1 submitted 3 November, 2024; originally announced November 2024.

    Comments: 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)

    ACM Class: D.2.2

  45. arXiv:2410.20680  [pdf, ps, other]

    eess.SP cs.LG

    Multi-modal Data based Semi-Supervised Learning for Vehicle Positioning

    Authors: Ouwen Huan, Yang Yang, Tao Luo, Mingzhe Chen

    Abstract: In this paper, a multi-modal data based semi-supervised learning (SSL) framework that jointly uses channel state information (CSI) data and RGB images for vehicle positioning is designed. In particular, an outdoor positioning system where the vehicle locations are determined by a base station (BS) is considered. The BS equipped with several cameras can collect a large amount of unlabeled CSI data a…

    Submitted 15 October, 2024; originally announced October 2024.

  46. arXiv:2410.20119  [pdf, other]

    cs.LG

    On Multi-Stage Loss Dynamics in Neural Networks: Mechanisms of Plateau and Descent Stages

    Authors: Zheng-An Chen, Tao Luo, GuiHong Wang

    Abstract: The multi-stage phenomenon in the training loss curves of neural networks has been widely observed, reflecting the non-linearity and complexity inherent in the training process. In this work, we investigate the training dynamics of neural networks (NNs), with particular emphasis on the small initialization regime, identifying three distinct stages observed in the loss curve during training: the in…

    Submitted 5 November, 2024; v1 submitted 26 October, 2024; originally announced October 2024.

  47. arXiv:2410.19788  [pdf, ps, other]

    eess.SP cs.CV cs.LG

    Multi-modal Image and Radio Frequency Fusion for Optimizing Vehicle Positioning

    Authors: Ouwen Huan, Tao Luo, Mingzhe Chen

    Abstract: In this paper, a multi-modal vehicle positioning framework that jointly localizes vehicles with channel state information (CSI) and images is designed. In particular, we consider an outdoor scenario where each vehicle can communicate with only one BS, and hence, it can upload its estimated CSI to only its associated BS. Each BS is equipped with a set of cameras, such that it can collect a small nu…

    Submitted 15 October, 2024; originally announced October 2024.

  48. Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small

    Authors: Zhehui Wang, Tao Luo, Cheng Liu, Weichen Liu, Rick Siow Mong Goh, Weng-Fai Wong

    Abstract: Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs comes with a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, which demonstrated a small footprint and remarkably high energy efficiency in compute…

    Submitted 21 October, 2024; originally announced October 2024.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (2024 early access)

  49. arXiv:2410.06308  [pdf, other]

    math.NA cs.LG

    Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers

    Authors: Chuqi Chen, Qixuan Zhou, Yahong Yang, Yang Xiang, Tao Luo

    Abstract: Neural network-based methods have emerged as powerful tools for solving partial differential equations (PDEs) in scientific and engineering applications, particularly when handling complex domains or incorporating empirical data. These methods leverage neural networks as basis functions to approximate PDE solutions. However, training such networks can be challenging, often resulting in limited acc…

    Submitted 8 October, 2024; originally announced October 2024.

  50. arXiv:2410.05161  [pdf, other]

    cs.DC

    A Seesaw Model Attack Algorithm for Distributed Learning

    Authors: Kun Yang, Tianyi Luo, Yanjie Dong, Aohan Li

    Abstract: We investigate the Byzantine attack problem within the context of model training in distributed learning systems. While ensuring the convergence of current model training processes, common solvers (e.g., SGD, Adam, and RMSProp) can be easily compromised by malicious nodes in these systems. Consequently, the training process may converge slowly or even diverge. To develop effective secure d…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted for presentation at IEEE SmartIoT 2024