Showing 1–50 of 847 results for author: Huang, T

Searching in archive cs.
  1. arXiv:2505.09887  [pdf, other]

    cs.RO

    Unsupervised Radar Point Cloud Enhancement via Arbitrary LiDAR Guided Diffusion Prior

    Authors: Yanlong Yang, Jianan Liu, Guanxiong Luo, Hao Li, Euijoon Ahn, Mostafa Rahimi Azghadi, Tao Huang

    Abstract: In industrial automation, radar is a critical sensor in machine perception. However, the angular resolution of radar is inherently limited by the Rayleigh criterion, which depends on both the radar's operating wavelength and the effective aperture of its antenna array. To overcome these hardware-imposed limitations, recent neural network-based methods have leveraged high-resolution LiDAR data, pair…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 19 pages, 15 figures, 4 tables

  2. arXiv:2505.09590  [pdf, ps, other]

    cs.IR

    Distance-aware Self-adaptive Graph Convolution for Fine-grained Hierarchical Recommendation

    Authors: Tao Huang, Yihong Chen, Wei Fan, Wei Zhou, Junhao Wen

    Abstract: Graph Convolutional Networks (GCNs) are widely used to improve recommendation accuracy and performance by effectively learning the representations of user and item nodes. However, two major challenges remain: (1) the lack of further optimization in the graph representation structure and (2) insufficient attention given to the varying contributions of different convolutional layers. This paper propo…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.07866  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Computationally Efficient Diffusion Models in Medical Imaging: A Comprehensive Review

    Authors: Abdullah, Tao Huang, Ickjai Lee, Euijoon Ahn

    Abstract: The diffusion model has recently emerged as a potent approach in computer vision, demonstrating remarkable performance in the field of generative artificial intelligence. Capable of producing high-quality synthetic images, diffusion models have been successfully applied across a range of applications. However, a significant challenge remains with the high computational cost associated with traini…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 36 pages, 6 figures

  4. arXiv:2505.07347  [pdf, other]

    cs.CV

    AI-Enabled Accurate Non-Invasive Assessment of Pulmonary Hypertension Progression via Multi-Modal Echocardiography

    Authors: Jiewen Yang, Taoran Huang, Shangwei Ding, Xiaowei Xu, Qinhua Zhao, Yong Jiang, Jiarong Guo, Bin Pu, Jiexuan Zheng, Caojin Zhang, Hongwen Fei, Xiaomeng Li

    Abstract: Echocardiographers can detect pulmonary hypertension using Doppler echocardiography; however, accurately assessing its progression often proves challenging. Right heart catheterization (RHC), the gold standard for precise evaluation, is invasive and unsuitable for routine use, limiting its practicality for timely diagnosis and monitoring of pulmonary hypertension progression. Here, we propose MePH…

    Submitted 12 May, 2025; originally announced May 2025.

  5. arXiv:2505.03738  [pdf, ps, other]

    cs.RO cs.AI cs.LG

    AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control

    Authors: Jialong Li, Xuxin Cheng, Tianshu Huang, Shiqi Yang, Ri-Zhao Qiu, Xiaolong Wang

    Abstract: Humanoid robots derive much of their dexterity from hyper-dexterous whole-body movements, enabling tasks that require a large operational workspace, such as picking objects off the ground. However, achieving these capabilities on real humanoids remains challenging due to their high degrees of freedom (DoF) and nonlinear dynamics. We propose Adaptive Motion Optimization (AMO), a framework that inte…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: website: https://amo-humanoid.github.io

  6. arXiv:2505.00660  [pdf, other]

    cs.IT eess.SP

    AI-based CSI Feedback with Digital Twins: Real-World Validation and Insights

    Authors: Tzu-Hao Huang, Chao-Kai Wen, Shang-Ho Tsai, Trung Q. Duong

    Abstract: Deep learning (DL) has shown great potential for enhancing channel state information (CSI) feedback in multiple-input multiple-output (MIMO) communication systems, a subject currently under study by the 3GPP standards body. Digital twins (DTs) have emerged as an effective means to generate site-specific datasets for training DL-based CSI feedback models. However, most existing studies rely solely…

    Submitted 2 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: 5 pages, 4 figures, 3 tables; this work has been submitted to IEEE for possible publication

  7. arXiv:2505.00394  [pdf, other]

    cs.CV

    SOTA: Spike-Navigated Optimal TrAnsport Saliency Region Detection in Composite-bias Videos

    Authors: Wenxuan Liu, Yao Deng, Kang Chen, Xian Zhong, Zhaofei Yu, Tiejun Huang

    Abstract: Existing saliency detection methods struggle in real-world scenarios due to motion blur and occlusions. In contrast, spike cameras, with their high temporal resolution, significantly enhance visual saliency maps. However, the composite noise inherent to spike camera imaging introduces discontinuities in saliency detection. Low-quality samples further distort model predictions, leading to saliency…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted to IJCAI 2025

  8. arXiv:2505.00358  [pdf, other]

    cs.LG cs.AI cs.CL

    R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

    Authors: Albert Ge, Tzu-Heng Huang, John Cooper, Avi Trost, Ziyi Chu, Satya Sai Srinath Namburi GNVV, Ziyang Cai, Kendall Park, Nicholas Roberts, Frederic Sala

    Abstract: Data mixing strategies have successfully reduced the costs involved in training language models. While promising, such methods suffer from two flaws. First, they rely on predetermined data domains (e.g., data sources, task types), which may fail to capture critical semantic nuances, leaving performance on the table. Second, these methods scale with the number of domains in a computationally prohib…

    Submitted 1 May, 2025; originally announced May 2025.

  9. arXiv:2504.19546  [pdf]

    cs.CV

    Crowd Detection Using Very-Fine-Resolution Satellite Imagery

    Authors: Tong Xiao, Qunming Wang, Ping Lu, Tenghai Huang, Xiaohua Tong, Peter M. Atkinson

    Abstract: Accurate crowd detection (CD) is critical for public safety and historical pattern analysis, yet existing methods relying on ground and aerial imagery suffer from limited spatio-temporal coverage. The development of very-fine-resolution (VFR) satellite sensor imagery (e.g., ~0.3 m spatial resolution) provides unprecedented opportunities for large-scale crowd activity analysis, but it has never bee…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 17 pages, 12 figures, 5 tables

  10. arXiv:2504.18864  [pdf, other]

    cs.CV

    Spike Imaging Velocimetry: Dense Motion Estimation of Fluids Using Spike Cameras

    Authors: Yunzhong Zhang, Bo Xiong, You Zhou, Changqing Su, Zhen Cheng, Zhaofei Yu, Xun Cao, Tiejun Huang

    Abstract: The need for accurate and non-intrusive flow measurement methods has led to the widespread adoption of Particle Image Velocimetry (PIV), a powerful diagnostic tool in fluid motion estimation. This study investigates the tremendous potential of spike cameras (a type of ultra-high-speed, high-dynamic-range camera) in PIV. We propose a deep learning framework, Spike Imaging Velocimetry (SIV), designe…

    Submitted 12 May, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

  11. arXiv:2504.17878  [pdf, other]

    cs.CR cs.AI

    Crypto-ncRNA: Non-coding RNA (ncRNA) Based Encryption Algorithm

    Authors: Xu Wang, Yiquan Wang, Tin-yeh Huang

    Abstract: In the looming post-quantum era, traditional cryptographic systems are increasingly vulnerable to quantum computing attacks that can compromise their mathematical foundations. To address this critical challenge, we propose crypto-ncRNA, a bio-convergent cryptographic framework that leverages the dynamic folding properties of non-coding RNA (ncRNA) to generate high-entropy, quantum-resistant keys an…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted at the AI4NA workshop at ICLR 2025. 18 pages, 4 figures

  12. arXiv:2504.15888  [pdf, other]

    cs.CV

    MS-Occ: Multi-Stage LiDAR-Camera Fusion for 3D Semantic Occupancy Prediction

    Authors: Zhiqiang Wei, Lianqing Zheng, Jianan Liu, Tao Huang, Qing-Long Han, Wenwen Zhang, Fengdeng Zhang

    Abstract: Accurate 3D semantic occupancy perception is essential for autonomous driving in complex environments with diverse and irregular objects. While vision-centric methods suffer from geometric inaccuracies, LiDAR-based approaches often lack rich semantic information. To address these limitations, MS-Occ, a novel multi-stage LiDAR-camera fusion framework which includes middle-stage fusion and late-stag…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 8 pages, 5 figures

  13. arXiv:2504.15742  [pdf, other]

    cs.DB cs.SE

    Proving Cypher Query Equivalence

    Authors: Lei Tang, Wensheng Dou, Yingying Zheng, Lijie Xu, Wei Wang, Jun Wei, Tao Huang

    Abstract: Graph database systems store graph data as nodes and relationships, and utilize graph query languages (e.g., Cypher) for efficiently querying graph data. Proving the equivalence of graph queries is an important foundation for optimizing graph query performance, ensuring graph query reliability, etc. Although researchers have proposed many SQL query equivalence provers for relational database syste…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 14 pages, accepted by ICDE 2025

  14. arXiv:2504.13914  [pdf, other]

    cs.CL

    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

    Authors: ByteDance Seed, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, et al. (249 additional authors not shown)

    Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…

    Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  15. arXiv:2504.13603  [pdf, other]

    cs.CL

    Continual Pre-Training is (not) What You Need in Domain Adaption

    Authors: Pin-Er Chen, Da-Chen Lian, Shu-Kai Hsieh, Sieh-Chuen Huang, Hsuan-Lei Shao, Jun-Wei Chiu, Yang-Hsien Lin, Zih-Ching Chen, Cheng-Kuang, Eddie TC Huang, Simon See

    Abstract: The recent advances in Legal Large Language Models (LLMs) have transformed the landscape of legal research and practice by automating tasks, enhancing research precision, and supporting complex decision-making processes. However, effectively adapting LLMs to the legal domain remains challenging due to the complexity of legal reasoning, the need for precise interpretation of specialized language, a…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 11 pages, 2 figures

  16. arXiv:2504.10499  [pdf, other]

    cs.IR cs.CL

    Graph-based Approaches and Functionalities in Retrieval-Augmented Generation: A Comprehensive Survey

    Authors: Zulun Zhu, Tiancheng Huang, Kai Wang, Junda Ye, Xinghe Chen, Siqiang Luo

    Abstract: Large language models (LLMs) struggle with factual errors during inference due to the lack of sufficient training data and up-to-date knowledge, leading to the hallucination problem. Retrieval-Augmented Generation (RAG) has gained attention as a promising solution to address this limitation of LLMs, by retrieving relevant information from external sources to generate more accurate answers t…

    Submitted 7 April, 2025; originally announced April 2025.

    MSC Class: Information storage and retrieval of data; Natural language processing

    ACM Class: H.3.3; I.2.7

  17. arXiv:2504.09518  [pdf, other]

    cs.CV

    3D CoCa: Contrastive Learners are 3D Captioners

    Authors: Ting Huang, Zeyu Zhang, Yemin Wang, Hao Tang

    Abstract: 3D captioning, which aims to describe the content of 3D scenes in natural language, remains highly challenging due to the inherent sparsity of point clouds and weak cross-modal alignment in existing methods. To address these challenges, we propose 3D CoCa, a novel unified framework that seamlessly combines contrastive vision-language learning with 3D caption generation in a single architecture. Ou…

    Submitted 13 April, 2025; originally announced April 2025.

  18. arXiv:2504.09472  [pdf, other]

    cs.CV

    CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models

    Authors: Pooja Guhan, Divya Kothandaraman, Tsung-Wei Huang, Guan-Ming Su, Dinesh Manocha

    Abstract: We introduce CamMimic, an innovative algorithm tailored for dynamic video editing needs. It is designed to seamlessly transfer the camera motion observed in a given reference video onto any scene of the user's choice in a zero-shot manner without requiring any additional data. Our algorithm achieves this using a two-phase strategy by leveraging a text-to-video diffusion model. In the first phase,…

    Submitted 13 April, 2025; originally announced April 2025.

  19. arXiv:2504.08150  [pdf, other]

    cs.LG

    Beyond Feature Importance: Feature Interactions in Predicting Post-Stroke Rigidity with Graph Explainable AI

    Authors: Jiawei Xu, Yonggeon Lee, Anthony Elkommos Youssef, Eunjin Yun, Tinglin Huang, Tianjian Guo, Hamidreza Saber, Rex Ying, Ying Ding

    Abstract: This study addresses the challenge of predicting post-stroke rigidity by emphasizing feature interactions through graph-based explainable AI. Post-stroke rigidity, characterized by increased muscle tone and stiffness, significantly affects survivors' mobility and quality of life. Despite its prevalence, early prediction remains limited, delaying intervention. We analyze 519K stroke hospitalization…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Jiawei Xu and Yonggeon Lee contributed equally to this work

  20. arXiv:2504.06643  [pdf, other]

    cs.LG cs.AI

    AMAD: AutoMasked Attention for Unsupervised Multivariate Time Series Anomaly Detection

    Authors: Tiange Huang, Yongjun Li

    Abstract: Unsupervised multivariate time series anomaly detection (UMTSAD) plays a critical role in various domains, including finance, networks, and sensor systems. In recent years, due to the outstanding performance of deep learning in general sequential tasks, many models have been specialized for deep UMTSAD tasks and have achieved impressive results, particularly those based on the Transformer and self…

    Submitted 25 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: fix some grammar issues

    ACM Class: I.5.1

  21. arXiv:2504.05694  [pdf, other]

    cs.IR cs.AI

    Large Language Models Enhanced Hyperbolic Space Recommender Systems

    Authors: Wentao Cheng, Zhida Qin, Zexue Wu, Pengzhan Zhou, Tianyu Huang

    Abstract: Large Language Models (LLMs) have attracted significant attention in recommender systems for their excellent world knowledge capabilities. However, existing methods that rely on Euclidean space struggle to capture the rich hierarchical information inherent in textual and semantic data, which is essential for capturing user preferences. The geometric properties of hyperbolic space offer a promising…

    Submitted 19 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

    Comments: Accepted as a SIGIR'25 full paper

  22. arXiv:2504.04924  [pdf, other]

    cs.CV eess.IV

    Inter-event Interval Microscopy for Event Cameras

    Authors: Changqing Su, Yanqin Chen, Zihan Lin, Zhen Cheng, You Zhou, Bo Xiong, Zhaofei Yu, Tiejun Huang

    Abstract: Event cameras, an innovative bio-inspired sensor, differ from traditional cameras by sensing changes in intensity rather than directly perceiving intensity and recording these variations as a continuous stream of "events". The intensity reconstruction from these sparse events has long been a challenging problem. Previous approaches mainly focused on transforming motion-induced events into videos o…

    Submitted 12 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  23. arXiv:2504.03198  [pdf, other]

    cs.CV cs.AI

    Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video

    Authors: Jiaxin Guo, Wenzhen Dong, Tianyu Huang, Hao Ding, Ziyi Wang, Haomin Kuang, Qi Dou, Yun-Hui Liu

    Abstract: Reconstructing 3D scenes from monocular surgical videos can enhance the surgeon's perception and therefore plays a vital role in various computer-assisted surgery tasks. However, achieving scale-consistent reconstruction remains an open challenge due to inherent issues in endoscopic videos, such as dynamic deformations and textureless surfaces. Despite recent advances, current methods either rely on c…

    Submitted 4 April, 2025; originally announced April 2025.

  24. arXiv:2504.01457  [pdf]

    cs.CV

    Deep LG-Track: An Enhanced Localization-Confidence-Guided Multi-Object Tracker

    Authors: Ting Meng, Chunyun Fu, Xiangyan Yan, Zheng Liang, Pan Ji, Jianwen Wang, Tao Huang

    Abstract: Multi-object tracking plays a crucial role in various applications, such as autonomous driving and security surveillance. This study introduces Deep LG-Track, a novel multi-object tracker that incorporates three key enhancements to improve the tracking accuracy and robustness. First, an adaptive Kalman filter is developed to dynamically update the covariance of measurement noise based on detection…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 11 pages, 6 figures

  25. arXiv:2504.00351  [pdf, other]

    cs.IT

    Geo2ComMap: Deep Learning-Based MIMO Throughput Prediction Using Geographic Data

    Authors: Fan-Hao Lin, Tzu-Hao Huang, Chao-Kai Wen, Trung Q. Duong

    Abstract: Accurate communication performance prediction is crucial for wireless applications such as network deployment and resource management. Unlike conventional systems with a single transmit and receive antenna, throughput (Tput) estimation in antenna array-based multiple-input multiple-output (MIMO) systems is computationally intensive, i.e., requiring analysis of channel matrices, rank conditions, an…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 5 pages, 8 figures, 1 table, this work has been submitted to IEEE for possible publication. The source code and datasets used in this study are publicly available at https://github.com/geo2commap/Geo2ComMap

  26. arXiv:2503.22986  [pdf, other]

    cs.CV

    FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction

    Authors: Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee

    Abstract: Recently, the integration of the efficient feed-forward scheme into 3D Gaussian Splatting (3DGS) has been actively explored. However, most existing methods focus on sparse view reconstruction of small regions and cannot produce eligible whole-scene reconstruction results in terms of either quality or efficiency. In this paper, we propose FreeSplat++, which focuses on extending the generalizable 3D…

    Submitted 29 March, 2025; originally announced March 2025.

  27. arXiv:2503.21943  [pdf, other]

    cs.CV cs.AI eess.IV

    Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models

    Authors: Haoming Cai, Tsung-Wei Huang, Shiv Gehlot, Brandon Y. Feng, Sachin Shah, Guan-Ming Su, Christopher Metzler

    Abstract: Text-to-image diffusion models excel at generating diverse portraits, but lack intuitive shadow control. Existing editing approaches, as post-processing, struggle to offer effective manipulation across diverse styles. Additionally, these methods either rely on expensive real-world light-stage data collection or require extensive computational resources for training. To address these limitations, w…

    Submitted 7 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

    Comments: ShadowDirector arXiv version. Fixes the arXiv title text issue

  28. arXiv:2503.21122  [pdf, other]

    cs.CV

    One Snapshot is All You Need: A Generalized Method for mmWave Signal Generation

    Authors: Teng Huang, Han Ding, Wenxin Sun, Cui Zhao, Ge Wang, Fei Wang, Kun Zhao, Zhi Wang, Wei Xi

    Abstract: Wireless sensing systems, particularly those using mmWave technology, offer distinct advantages over traditional vision-based approaches, such as enhanced privacy and effectiveness in poor lighting conditions. These systems, leveraging FMCW signals, have shown success in human-centric applications like localization, gesture recognition, and so on. However, comprehensive mmWave datasets for diverse…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: IEEE INFOCOM 2025

  29. arXiv:2503.20315  [pdf, other]

    cs.CV

    SpikeDerain: Unveiling Clear Videos from Rainy Sequences Using Color Spike Streams

    Authors: Hanwen Liang, Xian Zhong, Wenxuan Liu, Yajing Zheng, Wenxin Huang, Zhaofei Yu, Tiejun Huang

    Abstract: Restoring clear frames from rainy videos presents a significant challenge due to the rapid motion of rain streaks. Traditional frame-based visual sensors, which capture scene content synchronously, struggle to capture the fast-moving details of rain accurately. In recent years, neuromorphic sensors have introduced a new paradigm for dynamic scene perception, offering microsecond temporal resolutio…

    Submitted 26 March, 2025; originally announced March 2025.

  30. arXiv:2503.18083  [pdf, other]

    cs.CV

    Unified Geometry and Color Compression Framework for Point Clouds via Generative Diffusion Priors

    Authors: Tianxin Huang, Gim Hee Lee

    Abstract: With the growth of 3D applications and the rapid increase in sensor-collected 3D point cloud data, there is a rising demand for efficient compression algorithms. Most existing learning-based compression methods handle geometry and color attributes separately, treating them as distinct tasks, making these methods challenging to apply directly to point clouds with colors. Besides, the limited capaci…

    Submitted 23 March, 2025; originally announced March 2025.

  31. arXiv:2503.17933  [pdf, other]

    cs.CL cs.AI cs.IR

    Experience Retrieval-Augmentation with Electronic Health Records Enables Accurate Discharge QA

    Authors: Justice Ou, Tinglin Huang, Yilun Zhao, Ziyang Yu, Peiqing Lu, Rex Ying

    Abstract: To improve the reliability of Large Language Models (LLMs) in clinical applications, retrieval-augmented generation (RAG) is extensively applied to provide factual medical knowledge. However, beyond general medical knowledge from open-ended datasets, clinical case-based knowledge is also critical for effective medical reasoning, as it provides context grounded in real-world patient experiences. Mo…

    Submitted 23 March, 2025; originally announced March 2025.

  32. arXiv:2503.17172  [pdf, other]

    cs.LG

    Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers

    Authors: Gaojie Jin, Tianjin Huang, Ronghui Mu, Xiaowei Huang

    Abstract: Recent studies have identified a critical challenge in deep neural networks (DNNs) known as "robust fairness", where models exhibit significant disparities in robust accuracy across different classes. While prior work has attempted to address this issue in adversarial robustness, the study of worst-class certified robustness for smoothed classifiers remains unexplored. Our work bridges this gap b…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: Under Review

  33. arXiv:2503.13896  [pdf, other]

    cs.RO cs.CV

    Evaluating Global Geo-alignment for Precision Learned Autonomous Vehicle Localization using Aerial Data

    Authors: Yi Yang, Xuran Zhao, H. Charles Zhao, Shumin Yuan, Samuel M. Bateman, Tiffany A. Huang, Chris Beall, Will Maddern

    Abstract: Recently there has been growing interest in the use of aerial and satellite map data for autonomous vehicles, primarily due to its potential for significant cost reduction and enhanced scalability. Despite the advantages, aerial data also comes with challenges such as a sensor-modality gap and a viewpoint difference gap. Learned localization methods have shown promise for overcoming these challeng…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: 8 pages, 7 figures, accepted by International Conference on Robotics and Automation (ICRA) 2025

    ACM Class: I.2.9

  34. arXiv:2503.12968  [pdf, other]

    cs.CV cs.RO

    OptiPMB: Enhancing 3D Multi-Object Tracking with Optimized Poisson Multi-Bernoulli Filtering

    Authors: Guanhua Ding, Yuxuan Xia, Runwei Guan, Qinchen Wu, Tao Huang, Weiping Ding, Jinping Sun, Guoqiang Mao

    Abstract: Accurate 3D multi-object tracking (MOT) is crucial for autonomous driving, as it enables robust perception, navigation, and planning in complex environments. While deep learning-based solutions have demonstrated impressive 3D MOT performance, model-based approaches remain appealing for their simplicity, interpretability, and data efficiency. Conventional model-based trackers typically rely on rand…

    Submitted 17 March, 2025; originally announced March 2025.

  35. arXiv:2503.12964  [pdf, other]

    cs.CV cs.AI cs.LG

    Training Video Foundation Models with NVIDIA NeMo

    Authors: Zeeshan Patel, Ethan He, Parth Mannan, Xiaowei Ren, Ryan Wolf, Niket Agarwal, Jacob Huffman, Zhuoyao Wang, Carl Wang, Jack Chang, Yan Bai, Tommy Huang, Linnan Wang, Sahil Jain, Shanmugam Ramasamy, Joseph Jennings, Ekaterina Sirazitdinova, Oleg Sudakov, Mingyuan Ma, Bobby Chen, Forrest Lin, Hao Wang, Vasanth Rao Naik Sabavat, Sriharsha Niverty, Rong Ou, et al. (4 additional authors not shown)

    Abstract: Video Foundation Models (VFMs) have recently been used to simulate the real world to train physical AI systems and develop creative visual experiences. However, there are significant challenges in training large-scale, high-quality VFMs that can generate high-quality videos. We present a scalable, open-source VFM training pipeline with NVIDIA NeMo, providing accelerated video dataset curation, mul…

    Submitted 17 March, 2025; originally announced March 2025.

  36. arXiv:2503.10697  [pdf, other]

    cs.CV cs.AI eess.IV

    Zero-Shot Subject-Centric Generation for Creative Application Using Entropy Fusion

    Authors: Kaifeng Zou, Xiaoyi Feng, Peng Wang, Tao Huang, Zizhou Huang, Zhang Haihang, Yuntao Zou, Dagang Li

    Abstract: Generative models are widely used in visual content creation. However, current text-to-image models often face challenges in practical applications (such as textile pattern design and meme generation) due to the presence of unwanted elements that are difficult to separate with existing methods. Meanwhile, subject-reference generation has emerged as a key research trend, highlighting the need for tec…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 8 pages, 8 figures

  37. arXiv:2503.09975  [pdf, ps, other]

    cs.AR

    Faster Inference of LLMs using FP8 on the Intel Gaudi

    Authors: Joonhyung Lee, Shmulik Markovich-Golan, Daniel Ohayon, Yair Hanani, Gunho Park, Byeongwook Kim, Asaf Karnieli, Uri Livne, Haihao Shen, Tai Huang, Se Jung Kwon, Dongsoo Lee

    Abstract: Low-precision data types are essential in modern neural networks during both training and inference as they enhance throughput and computational capacity by better exploiting available hardware resources. Despite the incorporation of FP8 in commercially available neural network accelerators, a comprehensive exposition of its underlying mechanisms, along with rigorous performance and accuracy evalu…

    Submitted 16 March, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

  38. arXiv:2503.09095  [pdf, other]

    cs.CR cs.CV

    C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion

    Authors: Lijie Hu, Junchi Liao, Weimin Lyu, Shaopeng Fu, Tianhao Huang, Shu Yang, Guimin Hu, Di Wang

    Abstract: Backdoor attacks pose a significant threat to deep learning models, enabling adversaries to embed hidden triggers that manipulate the behavior of the model during inference. Traditional backdoor attacks typically rely on inserting explicit triggers (e.g., external patches or perturbations) into input data, but they often struggle to evade existing defense mechanisms. To address this limitation, w…

    Submitted 12 March, 2025; originally announced March 2025.

  39. arXiv:2503.08079  [pdf]

    cs.CL

    Advancing Sentiment Analysis: A Novel LSTM Framework with Multi-head Attention

    Authors: Jingyuan Yi, Peiyang Yu, Tianyi Huang, Xiaochuan Xu

    Abstract: This work proposes an LSTM-based sentiment classification model with a multi-head attention mechanism and TF-IDF optimization. Through the integration of TF-IDF feature extraction and multi-head attention, the model significantly improves text sentiment analysis performance. Experimental results on public data sets demonstrate that the new method achieves substantial improvements in the most critica…

    Submitted 11 March, 2025; originally announced March 2025.

  40. arXiv:2503.07041  [pdf, other]

    cs.CL

    TCM-3CEval: A Triaxial Benchmark for Assessing Responses from Large Language Models in Traditional Chinese Medicine

    Authors: Tianai Huang, Lu Lu, Jiayuan Chen, Lihao Liu, Junjun He, Yuping Zhao, Wenchao Tang, Jie Xu

    Abstract: Large language models (LLMs) excel in various NLP tasks and modern medicine, but their evaluation in traditional Chinese medicine (TCM) is underexplored. To address this, we introduce TCM-3CEval, a benchmark assessing LLMs in TCM across three dimensions: core knowledge mastery, classical text understanding, and clinical decision-making. We evaluate diverse models, including international (e.g., GPT…

    Submitted 10 March, 2025; originally announced March 2025.

  41. arXiv:2503.06501  [pdf, other]

    cs.CV cs.RO

    TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification

    Authors: Huaqi Tao, Bingxi Liu, Calvin Chen, Tingjun Huang, He Li, Jinqiang Cui, Hong Zhang

    Abstract: Visual Place Recognition (VPR) is a crucial capability for long-term autonomous robots, enabling them to identify previously visited locations using visual information. However, existing methods remain limited in indoor settings due to the highly repetitive structures inherent in such environments. We observe that scene text typically appears in indoor spaces, serving to distinguish visually simil…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 8 pages, 5 figures

  42. arXiv:2503.06428  [pdf, other]

    cs.LG

    Interference-Aware Edge Runtime Prediction with Conformal Matrix Completion

    Authors: Tianshu Huang, Arjun Ramesh, Emily Ruppel, Nuno Pereira, Anthony Rowe, Carlee Joe-Wong

    Abstract: Accurately estimating workload runtime is a longstanding goal in computer systems, and plays a key role in efficient resource provisioning, latency minimization, and various other system management tasks. Runtime prediction is particularly important for managing increasingly complex distributed systems in which more sophisticated processing is pushed to the edge in search of better latency. Previo…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: To appear at MLSys 2025

  43. arXiv:2503.05139  [pdf, other]

    cs.LG cs.AI cs.CL

    Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

    Authors: Ling Team, Binwei Zeng, Chao Huang, Chao Zhang, Changxin Tian, Cong Chen, Dingnan Jin, Feng Yu, Feng Zhu, Feng Yuan, Fakang Wang, Gangshan Wang, Guangyao Zhai, Haitao Zhang, Huizhong Li, Jun Zhou, Jia Liu, Junpeng Fang, Junjie Ou, Jun Hu, Ji Luo, Ji Zhang, Jian Liu, Jian Sha, Jianxue Qian, et al. (49 additional authors not shown)

    Abstract: In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as "Bailing" in Chinese, spelled Bǎilíng in Pinyin). Ling-Lite…

    Submitted 10 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: 34 pages

  44. arXiv:2503.03794  [pdf, other]

    cs.LG cs.AI cs.CY

    Synthetic Data Augmentation for Enhancing Harmful Algal Bloom Detection with Machine Learning

    Authors: Tianyi Huang

    Abstract: Harmful Algal Blooms (HABs) pose severe threats to aquatic ecosystems and public health, resulting in substantial economic losses globally. Early detection is crucial but often hindered by the scarcity of high-quality datasets necessary for training reliable machine learning (ML) models. This study investigates the use of synthetic data augmentation using Gaussian Copulas to enhance ML-based HAB d…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: Accepted Paper at the 2025 IEEE Conference on Technologies for Sustainability (SusTech)

  45. arXiv:2503.02836  [pdf, other]

    cs.LG cs.AI

    SeqFusion: Sequential Fusion of Pre-Trained Models for Zero-Shot Time-Series Forecasting

    Authors: Ting-Ji Huang, Xu-Yang Chen, Han-Jia Ye

    Abstract: Unlike traditional time-series forecasting methods that require extensive in-task data for training, zero-shot forecasting can directly predict future values given a target time series without additional training data. Current zero-shot approaches primarily rely on pre-trained generalized models, with their performance often depending on the variety and relevance of the pre-training data, which ca…

    Submitted 4 March, 2025; originally announced March 2025.

  46. arXiv:2503.00724  [pdf, other]

    cs.CL

    Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies

    Authors: Tianyi Huang, Jingyuan Yi, Peiyang Yu, Xiaochuan Xu

    Abstract: The proliferation of misinformation on social media has raised significant societal concerns, necessitating robust detection mechanisms. Large Language Models such as GPT-4 and LLaMA2 have been envisioned as possible tools for detecting misinformation based on their advanced natural language understanding and reasoning capabilities. This paper conducts a comparison of LLM-based approaches to detec…

    Submitted 1 March, 2025; originally announced March 2025.

  47. arXiv:2503.00555  [pdf, other]

    cs.CR cs.AI cs.LG

    Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable

    Authors: Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Zachary Yahn, Yichang Xu, Ling Liu

    Abstract: Safety alignment is an important procedure before the official deployment of a Large Language Model (LLM). While safety alignment has been extensively studied for LLM, there is still a large research gap for Large Reasoning Models (LRMs) that equip with improved reasoning capability. We in this paper systematically examine a simplified pipeline for producing safety aligned LRMs. With our evaluatio…

    Submitted 1 March, 2025; originally announced March 2025.

  48. arXiv:2503.00355  [pdf, other]

    cs.CL cs.AI

    Structured Reasoning for Fairness: A Multi-Agent Approach to Bias Detection in Textual Data

    Authors: Tianyi Huang, Elsa Fan

    Abstract: From disinformation spread by AI chatbots to AI recommendations that inadvertently reinforce stereotypes, textual bias poses a significant challenge to the trustworthiness of large language models (LLMs). In this paper, we propose a multi-agent framework that systematically identifies biases by disentangling each statement as fact or opinion, assigning a bias intensity score, and providing concise…

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: Accepted Paper (Oral Presentation) in the Workshop on the Social Impact of AI: Research, Diversity and Inclusion Frameworks at AAAI 2025

  49. arXiv:2503.00301  [pdf, other]

    cs.CV

    Differential Coding for Training-Free ANN-to-SNN Conversion

    Authors: Zihan Huang, Wei Fang, Tong Bu, Peng Xue, Zecheng Hao, Wenxuan Liu, Yuanhong Tang, Zhaofei Yu, Tiejun Huang

    Abstract: Spiking Neural Networks (SNNs) exhibit significant potential due to their low energy consumption. Converting Artificial Neural Networks (ANNs) to SNNs is an efficient way to achieve high-performance SNNs. However, many conversion methods are based on rate coding, which requires numerous spikes and longer time-steps compared to directly trained SNNs, leading to increased energy consumption and late…

    Submitted 28 February, 2025; originally announced March 2025.

  50. arXiv:2502.21193  [pdf, other]

    cs.CV

    Towards High-performance Spiking Transformers from ANN to SNN Conversion

    Authors: Zihan Huang, Xinyu Shi, Zecheng Hao, Tong Bu, Jianhao Ding, Zhaofei Yu, Tiejun Huang

    Abstract: Spiking neural networks (SNNs) show great potential due to their energy efficiency, fast processing capabilities, and robustness. There are two main approaches to constructing SNNs. Direct training methods require much memory, while conversion methods offer a simpler and more efficient option. However, current conversion methods mainly focus on converting convolutional neural networks (CNNs) to SN…

    Submitted 28 February, 2025; originally announced February 2025.