Showing 1–50 of 64 results for author: Tu, X

Searching in archive cs.
  1. arXiv:2506.13909  [pdf, ps, other]

    cs.LG cs.AI

    Few-Shot Learning for Industrial Time Series: A Comparative Analysis Using the Example of Screw-Fastening Process Monitoring

    Authors: Xinyuan Tu, Haocheng Zhang, Tao Chengxu, Zuyi Chen

    Abstract: Few-shot learning (FSL) has shown promise in vision but remains largely unexplored for industrial time-series data, where annotating every new defect is prohibitively expensive. We present a systematic FSL study on screw-fastening process monitoring, using a 2,300-sample multivariate torque dataset that covers 16 uni- and multi-factorial defect types. Beyond benchmarking, we introduce a \t…

    Submitted 16 June, 2025; originally announced June 2025.

  2. arXiv:2506.13833  [pdf, ps, other]

    cs.SD cs.AI cs.RO eess.AS physics.app-ph

    A Survey on World Models Grounded in Acoustic Physical Information

    Authors: Xiaoliang Chen, Le Chang, Xin Yu, Yunhe Huang, Xianling Tu

    Abstract: This survey provides a comprehensive overview of the emerging field of world models grounded in the foundation of acoustic physical information. It examines the theoretical underpinnings, essential methodological frameworks, and recent technological advancements in leveraging acoustic signals for high-fidelity environmental perception, causal physical reasoning, and predictive simulation of dynami…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 28 pages, 11 equations

    MSC Class: 68T07; 35L05; 78A45 ACM Class: I.2.6; H.5.5; I.2.9

  3. arXiv:2506.04953  [pdf, ps, other]

    cs.CV

    APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval

    Authors: Hong Gao, Yiming Bao, Xuezhen Tu, Bin Zhong, Minling Zhang

    Abstract: Current multimodal large language models (MLLMs) struggle with hour-level video understanding, facing significant challenges not only in modeling the substantial information volume of long videos but also in overcoming the memory wall and resource constraints during both training and inference. Although recent training-free approaches have alleviated resource demands by compressing visual features…

    Submitted 28 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  4. arXiv:2505.10144  [pdf, ps, other]

    cs.GR cs.CV

    VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality

    Authors: Xuechang Tu, Lukas Radl, Michael Steiner, Markus Steinberger, Bernhard Kerbl, Fernando de la Torre

    Abstract: 3D Gaussian Splatting (3DGS) has rapidly become a leading technique for novel-view synthesis, providing exceptional performance through efficient software-based GPU rasterization. Its versatility enables real-time applications, including on mobile and lower-powered devices. However, 3DGS faces key challenges in virtual reality (VR): (1) temporal artifacts, such as popping during head movements, (2…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: I3D'25 (PACMCGIT); Project Page: https://cekavis.site/VRSplat/

    Journal ref: Proc. ACM Comput. Graph. Interact. Tech., volume 8(1), May 2025

  5. arXiv:2505.01998  [pdf, other]

    cs.RO cs.AI physics.app-ph

    A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction

    Authors: Xiaoliang Chen, Xin Yu, Le Chang, Yunhe Huang, Jiashuai He, Shibo Zhang, Jin Li, Likai Lin, Ziyu Zeng, Xianling Tu, Shuyu Zhang

    Abstract: This paper introduces a novel framework integrating nonlinear acoustic computing and reinforcement learning to enhance advanced human-robot interaction under complex noise and reverberation. Leveraging physically informed wave equations (e.g., Westervelt, KZK), the approach captures higher-order phenomena such as harmonic generation and shock formation. By embedding these models in a reinforcement…

    Submitted 6 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

    Comments: 34 pages, 11 figures, 10 tables, and 10 equations

    MSC Class: 68T01 ACM Class: I.2.8

  6. arXiv:2504.15300  [pdf, other]

    cs.LG cs.DC cs.MA

    Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions

    Authors: Chaoyue Niu, Yucheng Ding, Junhui Lu, Zhengxiang Huang, Hang Zeng, Yutong Dai, Xuezhen Tu, Chengfei Lv, Fan Wu, Guihai Chen

    Abstract: The conventional cloud-based large model learning framework is increasingly constrained by latency, cost, personalization, and privacy concerns. In this survey, we explore an emerging paradigm: collaborative learning between on-device small model and cloud-based large model, which promises low-latency, cost-efficient, and personalized intelligent services while preserving user privacy. We provide…

    Submitted 17 April, 2025; originally announced April 2025.

  7. arXiv:2504.04050  [pdf, other]

    cs.CL

    FISH-Tuning: Enhancing PEFT Methods with Fisher Information

    Authors: Kang Xue, Ming Dong, Xinhui Tu, Tingting He

    Abstract: The rapid growth in the parameter size of Large Language Models (LLMs) has spurred the development of Parameter-Efficient Fine-Tuning (PEFT) methods to mitigate the substantial computational costs of fine-tuning. Among these, Fisher Induced Sparse uncHanging (FISH) Mask is a selection-based PEFT technique that identifies a critical subset of pre-trained parameters using approximate Fisher informat…

    Submitted 25 May, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

  8. arXiv:2503.07998  [pdf, other]

    cs.CV

    Efficient Dataset Distillation through Low-Rank Space Sampling

    Authors: Hangyang Kong, Wenbo Zhou, Xuxiang He, Xiaotong Tu, Xinghao Ding

    Abstract: A huge amount of data is key to the success of deep learning; however, redundant information impairs the generalization ability of the model and increases the computational burden. Dataset Distillation (DD) compresses the original dataset into a smaller but representative subset for high-quality data and efficient training strategies. Existing works for DD generate synthetic images by treating…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 9 pages, 5 figures

  9. arXiv:2502.11307  [pdf, other]

    cs.CV cs.AI

    Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection

    Authors: Jiaxiang Wang, Haote Xu, Xiaolu Chen, Haodi Xu, Yue Huang, Xinghao Ding, Xiaotong Tu

    Abstract: Anomaly detection (AD) in 3D point clouds is crucial in a wide range of industrial applications, especially in various forms of precision manufacturing. Considering the industrial demand for reliable 3D AD, several methods have been developed. However, most of these approaches typically require training separate models for each category, which is memory-intensive and lacks flexibility. In this pap…

    Submitted 16 February, 2025; originally announced February 2025.

    Comments: 10 pages, 7 figures

  10. arXiv:2502.00545  [pdf, other]

    cs.LG cs.AI cs.CV

    Integrating Frequency Guidance into Multi-source Domain Generalization for Bearing Fault Diagnosis

    Authors: Xiaotong Tu, Chenyu Ma, Qingyao Wu, Yinhao Liu, Hongyang Zhang

    Abstract: Recent research on generalizable fault diagnosis has effectively tackled the distributional shift between unseen working conditions. Most of these studies focus on learning domain-invariant representations through feature-level methods. However, the increasing number of unseen domains may lead domain-invariant features to contain instance-level spurious correlations, which impact the previous models…

    Submitted 1 February, 2025; originally announced February 2025.

  11. arXiv:2501.14995  [pdf, other]

    cs.LG

    GreenAuto: An Automated Platform for Sustainable AI Model Design on Edge Devices

    Authors: Xiaolong Tu, Dawei Chen, Kyungtae Han, Onur Altintas, Haoxin Wang

    Abstract: We present GreenAuto, an end-to-end automated platform designed for sustainable AI model exploration, generation, deployment, and evaluation. GreenAuto employs a Pareto front-based search method within an expanded neural architecture search (NAS) space, guided by gradient descent to optimize model exploration. Pre-trained kernel-level energy predictors estimate energy consumption across all models…

    Submitted 24 January, 2025; originally announced January 2025.

  12. arXiv:2412.05582  [pdf, other]

    eess.SP cs.IT cs.LG

    DM-SBL: Channel Estimation under Structured Interference

    Authors: Yifan Wang, Chengjie Yu, Jiang Zhu, Fangyong Wang, Xingbin Tu, Yan Wei, Fengzhong Qu

    Abstract: Channel estimation is a fundamental task in communication systems and is critical for effective demodulation. While most works deal with a simple scenario where the measurements are corrupted by the additive white Gaussian noise (AWGN), this work addresses the more challenging scenario where both AWGN and structured interference coexist. Such conditions arise, for example, when a sonar/radar trans…

    Submitted 7 December, 2024; originally announced December 2024.

  13. arXiv:2411.18875  [pdf, other]

    cs.SI

    Know Your Account: Double Graph Inference-based Account De-anonymization on Ethereum

    Authors: Shuyi Miao, Wangjie Qiu, Hongwei Zheng, Qinnan Zhang, Xiaofan Tu, Xunan Liu, Yang Liu, Jin Dong, Zhiming Zheng

    Abstract: The scaled Web 3.0 digital economy, represented by decentralized finance (DeFi), has sparked increasing interest in the past few years, which usually relies on blockchain for token transfer and diverse transaction logic. However, illegal behaviors, such as financial fraud, hacker attacks, and money laundering, are rampant in the blockchain ecosystem and seriously threaten its integrity and securit…

    Submitted 13 January, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

  14. arXiv:2411.09804  [pdf, other]

    cs.LG

    Fair Resource Allocation in Weakly Coupled Markov Decision Processes

    Authors: Xiaohui Tu, Yossiri Adulyasak, Nima Akbarzadeh, Erick Delage

    Abstract: We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where resource constraints couple the action spaces of $N$ sub-Markov decision processes (sub-MDPs) that would otherwise operate independently. We adopt a fairness definition using the generalized Gini function instead of the traditional utilitarian (total-sum) objec…

    Submitted 27 April, 2025; v1 submitted 14 November, 2024; originally announced November 2024.

  15. arXiv:2411.04386  [pdf, other]

    cs.RO

    SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation

    Authors: Xun Tu, Karthik Desingh

    Abstract: Grasp planning and estimation have been a longstanding research problem in robotics, with two main approaches to find graspable poses on the objects: 1) geometric approach, which relies on 3D models of objects and the gripper to estimate valid grasp poses, and 2) data-driven, learning-based approach, with models trained to identify grasp poses from raw sensor observations. The latter assumes compr…

    Submitted 9 April, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: 8 pages, 7 figures, accepted by ICRA 2025

    ACM Class: I.3.5; I.2.9

  16. arXiv:2409.18899  [pdf, other]

    cs.CV eess.IV

    Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors

    Authors: Yunlong Lin, Zhenqi Fu, Kairun Wen, Tian Ye, Sixiang Chen, Ge Meng, Yingying Wang, Yue Huang, Xiaotong Tu, Xinghao Ding

    Abstract: Low-light image enhancement (LIE) aims at precisely and efficiently recovering an image degraded in poor illumination environments. Recent advanced LIE techniques are using deep neural networks, which require lots of low-normal light image pairs, network parameters, and computational resources. As a result, their practicality is limited. In this work, we devise a novel unsupervised LIE framework b…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: 13 pages, 10 figures

  17. arXiv:2409.09401  [pdf, ps, other]

    cs.CL

    Towards Diverse and Efficient Audio Captioning via Diffusion Models

    Authors: Manjie Xu, Chenxing Li, Xinyi Tu, Yong Ren, Ruibo Fu, Wei Liang, Dong Yu

    Abstract: We introduce Diffusion-based Audio Captioning (DAC), a non-autoregressive diffusion model tailored for diverse and efficient audio captioning. Although existing captioning models relying on language backbones have achieved remarkable success in various captioning tasks, their insufficient performance in terms of generation speed and diversity impedes progress in audio understanding and multimedia a…

    Submitted 1 June, 2025; v1 submitted 14 September, 2024; originally announced September 2024.

    Comments: https://sites.google.com/view/diffusion-audio-captioning

  18. arXiv:2407.07464  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Video-to-Audio Generation with Hidden Alignment

    Authors: Manjie Xu, Chenxing Li, Xinyi Tu, Yong Ren, Rilin Chen, Yu Gu, Wei Liang, Dong Yu

    Abstract: Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation. In this work, we aim to offer insights into the video-to-audio generation paradigm, focusing on three crucial aspects: vision encoders, auxiliary embeddings, and data augmentation techni…

    Submitted 11 March, 2025; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: https://sites.google.com/view/vta-ldm

  19. arXiv:2404.15807  [pdf, other]

    cs.CL

    One Subgraph for All: Efficient Reasoning on Opening Subgraphs for Inductive Knowledge Graph Completion

    Authors: Zhiwen Xie, Yi Zhang, Guangyou Zhou, Jin Liu, Xinhui Tu, Jimmy Xiangji Huang

    Abstract: Knowledge Graph Completion (KGC) has garnered massive research interest recently, and most existing methods are designed following a transductive setting where all entities are observed during training. Despite the great progress on the transductive KGC, these methods struggle to conduct reasoning on emerging KGs involving unseen entities. Thus, inductive KGC, which aims to deduce missing links am…

    Submitted 24 April, 2024; originally announced April 2024.

  20. arXiv:2403.07952  [pdf, other]

    cs.CV cs.AI cs.MM

    AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production

    Authors: Jiuniu Wang, Zehua Du, Yuyuan Zhao, Bo Yuan, Kexiang Wang, Jian Liang, Yaxi Zhao, Yihen Lu, Gengliang Li, Junlong Gao, Xin Tu, Zhenyu Guo

    Abstract: The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 22 pages, 13 figures

  21. arXiv:2403.00784  [pdf, other]

    cs.IR cs.AI cs.CL

    Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges

    Authors: Jiajia Wang, Jimmy X. Huang, Xinhui Tu, Junmei Wang, Angela J. Huang, Md Tahmid Rahman Laskar, Amran Bhuiyan

    Abstract: Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature, such that they struggled to capture the contextual relationships across text inputs. The introduction of bidirectional encoder representations from transformers (BERT) le…

    Submitted 18 February, 2024; originally announced March 2024.

  22. arXiv:2402.13714  [pdf, other]

    q-bio.QM cs.AI cs.LG

    An Evaluation of Large Language Models in Bioinformatics Research

    Authors: Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun

    Abstract: Large language models (LLMs) such as ChatGPT have gained considerable interest across diverse research communities. Their notable ability for text completion and generation has inaugurated a novel paradigm for language-interfaced problem solving. However, the potential and efficacy of these models in bioinformatics remain incompletely explored. In this work, we study the performance of LLMs on a wide…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Under review

  23. arXiv:2312.15516  [pdf, other]

    cs.CV

    A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization

    Authors: Jinchao Zhu, Yuxuan Wang, Xiaobing Tu, Siyuan Pan, Pengfei Wan, Gao Huang

    Abstract: The Stable Diffusion Model (SDM) is a popular and efficient text-to-image (t2i) generation and image-to-image (i2i) generation model. Although there have been some attempts to reduce sampling steps, model distillation, and network quantization, these previous methods generally retain the original network architecture. Billion scale parameters and high computing requirements make the research of mo…

    Submitted 4 March, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

    Comments: Since the experimental part has not been added, we wish to withdraw the manuscript, and we hope to submit it after the experiment has been verified

  24. arXiv:2312.00103  [pdf, other]

    cs.LG cs.PF

    DeepEn2023: Energy Datasets for Edge Artificial Intelligence

    Authors: Xiaolong Tu, Anik Mallik, Haoxin Wang, Jiang Xie

    Abstract: Climate change poses one of the most significant challenges to humanity. As a result of these climatic changes, the frequency of weather, climate, and water-related disasters has multiplied fivefold over the past 50 years, resulting in over 2 million deaths and losses exceeding $3.64 trillion USD. Leveraging AI-powered technologies for sustainable development and combating climate change is a prom…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: arXiv admin note: text overlap with arXiv:2310.18329

  25. arXiv:2310.18329  [pdf, other]

    cs.NI cs.AI cs.LG cs.PF

    Unveiling Energy Efficiency in Deep Learning: Measurement, Prediction, and Scoring across Edge Devices

    Authors: Xiaolong Tu, Anik Mallik, Dawei Chen, Kyungtae Han, Onur Altintas, Haoxin Wang, Jiang Xie

    Abstract: Today, deep learning optimization is primarily driven by research focused on achieving high inference accuracy and reducing latency. However, the energy efficiency aspect is often overlooked, possibly due to a lack of sustainability mindset in the field and the absence of a holistic energy dataset. In this paper, we conduct a threefold study, including energy measurement, prediction, and efficienc…

    Submitted 10 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by ACM/IEEE Symposium on Edge Computing (SEC '23)

    ACM Class: I.2.11

  26. arXiv:2310.08045  [pdf, other]

    cs.RO eess.SY

    Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning

    Authors: Iman Askari, Ali Vaziri, Xuemin Tu, Shen Zeng, Huazhen Fang

    Abstract: Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC metho…

    Submitted 23 May, 2025; v1 submitted 12 October, 2023; originally announced October 2023.

  27. arXiv:2310.04780  [pdf, other]

    cs.CV

    IPMix: Label-Preserving Data Augmentation Method for Training Robust Classifiers

    Authors: Zhenglin Huang, Xiaoan Bao, Na Zhang, Qingqi Zhang, Xiaomei Tu, Biao Wu, Xi Yang

    Abstract: Data augmentation has been proven effective for training high-accuracy convolutional neural network classifiers by preventing overfitting. However, building deep neural networks in real-world scenarios requires not only high accuracy on clean data but also robustness when data distributions shift. While prior methods have proposed that there is a trade-off between accuracy and robustness, we propo…

    Submitted 13 March, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  28. arXiv:2308.14524  [pdf, other]

    cs.HC cs.MM cs.NI eess.SY

    Towards enabling reliable immersive teleoperation through Digital Twin: A UAV command and control use case

    Authors: Nassim Sehad, Xinyi Tu, Akash Rajasekaran, Hamed Hellaoui, Riku Jäntti, Mérouane Debbah

    Abstract: This paper addresses the challenging problem of enabling reliable immersive teleoperation in scenarios where an Unmanned Aerial Vehicle (UAV) is remotely controlled by an operator via a cellular network. Such scenarios can be quite critical particularly when the UAV lacks advanced equipment (e.g., Lidar-based auto stop) or when the network is subject to some performance constraints (e.g., delay).…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE Globecom 2023

  29. arXiv:2307.09666  [pdf, other]

    cs.NI

    Visualization of Mobility Digital Twin: Framework Design, Case Study, and Future Challenges

    Authors: Yueyang Liu, Xiaolong Tu, Dawei Chen, Kyungtae Han, Onur Altintas, Haoxin Wang, Jiang Xie

    Abstract: A Mobility Digital Twin is an emerging implementation of digital twin technology in the transportation domain, which creates digital replicas for various physical mobility entities, such as vehicles, drivers, and pedestrians. Although a few works have investigated the applications of mobility digital twin recently, the extent to which it can facilitate safer autonomous vehicles remains insufficient…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: The paper has been accepted by The 20th IEEE International Conference on Mobile Ad-Hoc and Smart Systems (MASS 2023)

  30. arXiv:2307.02792  [pdf, other]

    cs.CY cs.AI cs.CL

    What Should Data Science Education Do with Large Language Models?

    Authors: Xinming Tu, James Zou, Weijie J. Su, Linjun Zhang

    Abstract: The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art tools can streamline complex processes. As a result, it reshapes the role of data scientists. We argue that LLMs are transforming the responsibilities of data scientists, shifting their focus from hands-on coding, data-wrangling and conducting standard analys…

    Submitted 7 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

  31. arXiv:2306.10772  [pdf, other]

    cs.SD eess.AS

    Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming

    Authors: Hao Liang, Guanxing Zhou, Xiaotong Tu, Andreas Jakobsson, Xinghao Ding, Yue Huang

    Abstract: Recently, many forms of audio industrial applications, such as sound monitoring and source localization, have begun exploiting smart multi-modal devices equipped with a microphone array. Regrettably, model-based methods are often difficult to employ for such devices due to their high computational complexity, as well as the difficulty of appropriately selecting the user-determined parameters. As a…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: 12 pages, 9 figures

  32. arXiv:2305.14403  [pdf, other]

    cs.CV cs.AI cs.LG

    Layer-adaptive Structured Pruning Guided by Latency

    Authors: Siyuan Pan, Linna Zhang, Jie Zhang, Xiaoshuang Li, Liang Hou, Xiaobing Tu

    Abstract: Structured pruning can simplify network architecture and improve inference speed. Combined with the underlying hardware and inference engine in which the final model is deployed, better results can be obtained by using latency collaborative loss function to guide network pruning together. Existing pruning methods that optimize latency have demonstrated leading performance, however, they often over…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2010.07611, arXiv:2110.10811 by other authors

  33. arXiv:2305.04524  [pdf, other]

    cs.CV

    Scene Text Recognition with Image-Text Matching-guided Dictionary

    Authors: Jiajun Wei, Hongjian Zhan, Xiao Tu, Yue Lu, Umapada Pal

    Abstract: Employing a dictionary can efficiently rectify the deviation between the visual prediction and the ground truth in scene text recognition methods. However, the independence of the dictionary on the visual features may lead to incorrect rectification of accurate visual predictions. In this paper, we propose a new dictionary language model leveraging the Scene Image-Text Matching (SITM) network, whic…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted at ICDAR2023

  34. arXiv:2303.13412  [pdf, other]

    cs.CV

    Low-Light Image Enhancement by Learning Contrastive Representations in Spatial and Frequency Domains

    Authors: Yi Huang, Xiaoguang Tu, Gui Fu, Tingting Liu, Bokai Liu, Ming Yang, Ziliang Feng

    Abstract: Images taken under low-light conditions tend to suffer from poor visibility, which can decrease image quality and even reduce the performance of the downstream tasks. It is hard for a CNN-based method to learn generalized features that can recover normal images from the ones under various unknown low-light conditions. In this paper, we propose to incorporate the contrastive learning into an illumin…

    Submitted 23 March, 2023; originally announced March 2023.

  35. arXiv:2212.05216  [pdf, other]

    cs.CV

    Information-Preserved Blending Method for Forward-Looking Sonar Mosaicing in Non-Ideal System Configuration

    Authors: Jiayi Su, Xingbin Tu, Fengzhong Qu, Yan Wei

    Abstract: Forward-Looking Sonar (FLS) has started to gain attention in the field of near-bottom close-range underwater inspection because of its high resolution and high framerate features. Although Automatic Target Recognition (ATR) algorithms have been applied tentatively for object-searching tasks, human supervision is still indispensable, especially when involving critical areas. A clear FLS mosaic cont…

    Submitted 20 March, 2025; v1 submitted 10 December, 2022; originally announced December 2022.

  36. arXiv:2211.17059  [pdf, other]

    cs.CV cs.LG

    Hint-dynamic Knowledge Distillation

    Authors: Yiyang Liu, Chenxin Li, Xiaotong Tu, Xinghao Ding, Yue Huang

    Abstract: Knowledge Distillation (KD) transfers the knowledge from a high-capacity teacher model to promote a smaller student model. Existing efforts guide the distillation by matching their prediction logits, feature embedding, etc., while leaving how to efficiently utilize them in conjunction less explored. In this paper, we propose Hint-dynamic Knowledge Distillation, dubbed HKD, which excavates the knowled…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: 5 pages

  37. pMPL: A Robust Multi-Party Learning Framework with a Privileged Party

    Authors: Lushan Song, Jiaxuan Wang, Zhexuan Wang, Xinyu Tu, Guopeng Lin, Wenqiang Ruan, Haoqi Wu, Weili Han

    Abstract: In order to perform machine learning among multiple parties while protecting the privacy of raw data, privacy-preserving machine learning based on secure multi-party computation (MPL for short) has recently been a hot spot. The configuration of MPL usually follows the peer-to-peer architecture, where each party has the same chance to reveal the output result. However, typical business scenarios o…

    Submitted 16 November, 2022; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: This paper is the full version of a paper to appear in CCS 2022

    Journal ref: 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS'22)

  38. arXiv:2203.16988  [pdf]

    cs.SD cs.LG eess.AS

    Acoustic-Net: A Novel Neural Network for Sound Localization and Quantification

    Authors: Guanxing Zhou, Hao Liang, Xinghao Ding, Yue Huang, Xiaotong Tu, Saqlain Abbas

    Abstract: Acoustic source localization has been applied in different fields, such as aeronautics and ocean science, generally using multiple microphones array data to reconstruct the source location. However, the model-based beamforming methods fail to achieve the high-resolution of conventional beamforming maps. Deep neural networks are also appropriate to locate the sound source, but in general, these met…

    Submitted 31 March, 2022; originally announced March 2022.

  39. arXiv:2203.06321  [pdf, other]

    cs.CV cs.AI

    Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation

    Authors: Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma

    Abstract: Remarkable achievements have been attained with Generative Adversarial Networks (GANs) in image-to-image translation. However, due to a tremendous amount of parameters, state-of-the-art GANs usually suffer from low efficiency and bulky memory usage. To tackle this challenge, firstly, this paper investigates GANs performance from a frequency perspective. The results show that GANs, especially small…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR2022

  40. arXiv:2112.13165  [pdf, other]

    cs.CV

    Semantic Clustering based Deduction Learning for Image Recognition and Classification

    Authors: Wenchi Ma, Xuemin Tu, Bo Luo, Guanghui Wang

    Abstract: The paper proposes a semantic clustering based deduction learning by mimicking the learning and thinking process of human brains. Human beings can make judgments based on experience and cognition, and as a result, no one would recognize an unknown animal as a car. Inspired by this observation, we propose to train deep learning models using the clustering prior that can guide the models to learn wi…

    Submitted 24 December, 2021; originally announced December 2021.

    Journal ref: Pattern Recognition 2021

  41. arXiv:2112.08193  [pdf, other]

    cs.AR cs.AI cs.PF

    N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores

    Authors: Yu Gong, Zhihan Xu, Zhezhi He, Weifeng Zhang, Xiaobing Tu, Xiaoyao Liang, Li Jiang

    Abstract: Accelerating the neural network inference by FPGA has emerged as a popular option, since the reconfigurability and high performance computing capability of FPGA intrinsically satisfies the computation demand of the fast-evolving neural algorithms. However, the popular neural accelerators on FPGA (e.g., Xilinx DPU) mainly utilize the DSP resources for constructing their processing units, while the…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: 11 pages, 12 figures, In Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'22), February 27-March 1, 2022, Virtual Event, CA, USA

    ACM Class: C.3

  42. arXiv:2111.11850  [pdf, ps, other]

    cs.GT cs.LG

    Incentive Mechanisms for Federated Learning: From Economic and Game Theoretic Perspective

    Authors: Xuezhen Tu, Kun Zhu, Nguyen Cong Luong, Dusit Niyato, Yang Zhang, Juan Li

    Abstract: Federated learning (FL) has become popular and has shown great potential in training large-scale machine learning (ML) models without exposing the owners' raw data. In FL, the data owners can train ML models based on their local data and only send the model updates rather than raw data to the model owner for aggregation. To improve learning performance in terms of model accuracy and training complet…

    Submitted 20 November, 2021; originally announced November 2021.

    Comments: 26 pages, 10 figures

  43. arXiv:2107.07467  [pdf, other]

    cs.LG

    Only Train Once: A One-Shot Neural Network Training And Pruning Framework

    Authors: Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu

    Abstract: Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, the existing pruning methods are usually heuristic, task-specified, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performances and significant FLOPs r…

    Submitted 11 November, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

    Comments: Accepted by NeurIPS 2021

  44. arXiv:2105.14678  [pdf, other]

    cs.CV eess.IV

    Image-to-Video Generation via 3D Facial Dynamics

    Authors: Xiaoguang Tu, Yingtian Zou, Jian Zhao, Wenjie Ai, Jian Dong, Yuan Yao, Zhikang Wang, Guodong Guo, Zhifeng Li, Wei Liu, Jiashi Feng

    Abstract: We present a versatile model, FaceAnime, for various video generation tasks from still images. Video generation from a single face image is an interesting problem and usually tackled by utilizing Generative Adversarial Networks (GANs) to integrate information from the input face image and a sequence of sparse facial landmarks. However, the generated face images usually suffer from quality loss, im…

    Submitted 30 May, 2021; originally announced May 2021.

  45. Joint Face Image Restoration and Frontalization for Recognition

    Authors: Xiaoguang Tu, Jian Zhao, Qiankun Liu, Wenjie Ai, Guodong Guo, Zhifeng Li, Wei Liu, Jiashi Feng

    Abstract: In real-world scenarios, many factors may harm face recognition performance, e.g., large pose, bad illumination, low resolution, blur and noise. To address these challenges, previous efforts usually first restore the low-quality faces to high-quality ones and then perform face recognition. However, most of these methods are stage-wise, which is sub-optimal and deviates from the reality. In this pap…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: 14 pages, 9 figures

  46. arXiv:2011.06116  [pdf]

    cs.RO eess.SY

    A Data-Driven Reinforcement Learning Solution Framework for Optimal and Adaptive Personalization of a Hip Exoskeleton

    Authors: Xikai Tu, Minhan Li, Ming Liu, Jennie Si, He Huang

    Abstract: Robotic exoskeletons are exciting technologies for augmenting human mobility. However, designing such a device for seamless integration with the human user and to assist human movement still is a major challenge. This paper aims at developing a novel data-driven solution framework based on reinforcement learning (RL), without first modeling the human-robot dynamics, to provide optimal and adaptive…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: 7 pages, 9 figures, ICRA 2021

    MSC Class: 68T40; 93C85 ACM Class: I.2.9; I.2.6

  47. arXiv:2011.04868  [pdf, other]

    cs.LG math.OC stat.ML

    Neural Network Compression Via Sparse Optimization

    Authors: Tianyi Chen, Bo Ji, Yixin Shi, Tianyu Ding, Biyi Fang, Sheng Yi, Xiao Tu

    Abstract: The compression of deep neural networks (DNNs) to reduce inference cost is becoming increasingly important to meet realistic deployment requirements of various applications. There has been a significant amount of work regarding network compression, while most of them are heuristic rule-based or typically not friendly to be incorporated into varying scenarios. On the other hand, sparse optimization yi…

    Submitted 11 November, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

  48. Dog Identification using Soft Biometrics and Neural Networks

    Authors: Kenneth Lai, Xinyuan Tu, Svetlana Yanushkevich

    Abstract: This paper addresses the problem of biometric identification of animals, specifically dogs. We apply advanced machine learning models such as deep neural network on the photographs of pets in order to determine the pet identity. In this paper, we explore the possibility of using different types of "soft" biometrics, such as breed, height, or gender, in fusion with "hard" biometrics such as photogr…

    Submitted 22 July, 2020; originally announced July 2020.

    Journal ref: 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019, pp. 1-8

  49. arXiv:2005.10455  [pdf, other]

    eess.IV cs.CV

    Single Image Super-Resolution via Residual Neuron Attention Networks

    Authors: Wenjie Ai, Xiaoguang Tu, Shilei Cheng, Mei Xie

    Abstract: Deep Convolutional Neural Networks (DCNNs) have achieved impressive performance in Single Image Super-Resolution (SISR). To further improve the performance, existing CNN-based methods generally focus on designing deeper architecture of the network. However, we argue that blindly increasing the network's depth is not the most sensible way. In this paper, we propose a novel end-to-end Residual Neuron Attenti…

    Submitted 21 May, 2020; originally announced May 2020.

    Comments: 6 pages, 4 figures, Accepted by IEEE ICIP 2020

  50. arXiv:2004.03639  [pdf, other]

    math.OC cs.LG stat.ML

    Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

    Authors: Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Jing Tian, Yixin Shi, Sheng Yi, Xiao Tu, Zhihui Zhu

    Abstract: Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the $\ell_1$-regularized problem. The OBProx-SG method contains two steps: (i) a proximal st…

    Submitted 23 July, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: Accepted by ECML 2020