Showing 1–50 of 412 results for author: Ren, H

Searching in archive cs.

  1. arXiv:2505.10557  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

    Authors: Ke Wang, Junting Pan, Linda Wei, Aojun Zhou, Weikang Shi, Zimu Lu, Han Xiao, Yunqiao Yang, Houxing Ren, Mingjie Zhan, Hongsheng Li

    Abstract: Natural language image-caption datasets, widely used for training Large Multimodal Models, mainly focus on natural scenarios and overlook the intricate details of mathematical figures that are critical for problem-solving, hindering the advancement of current LMMs in multimodal mathematical reasoning. To this end, we propose leveraging code as supervision for cross-modal alignment, since code inhe…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Accepted to ACL 2025 Findings

  2. arXiv:2505.07245  [pdf]

    cs.LG cs.AI

    REMEDI: Relative Feature Enhanced Meta-Learning with Distillation for Imbalanced Prediction

    Authors: Fei Liu, Huanhuan Ren, Yu Guan, Xiuxu Wang, Wang Lv, Zhiqiang Hu, Yaxi Chen

    Abstract: Predicting future vehicle purchases among existing owners presents a critical challenge due to extreme class imbalance (<0.5% positive rate) and complex behavioral patterns. We propose REMEDI (Relative feature Enhanced Meta-learning with Distillation for Imbalanced prediction), a novel multi-stage framework addressing these challenges. REMEDI first trains diverse base models to capture complementa…

    Submitted 12 May, 2025; originally announced May 2025.

  3. arXiv:2505.07149  [pdf, ps, other]

    cs.LG

    AugMixCloak: A Defense against Membership Inference Attacks via Image Transformation

    Authors: Heqing Ren, Chao Feng, Alberto Huertas, Burkhard Stiller

    Abstract: Traditional machine learning (ML) raises serious privacy concerns, while federated learning (FL) mitigates the risk of data leakage by keeping data on local devices. However, the training process of FL can still leak sensitive information, which adversaries may exploit to infer private data. One of the most prominent threats is the membership inference attack (MIA), where the adversary aims to det…

    Submitted 11 May, 2025; originally announced May 2025.

  4. arXiv:2505.06573  [pdf, ps, other]

    cs.CV

    ElectricSight: 3D Hazard Monitoring for Power Lines Using Low-Cost Sensors

    Authors: Xingchen Li, LiDian Wang, Yu Sheng, ZhiPeng Tang, Haojie Ren, Guoliang You, YiFan Duan, Jianmin Ji, Yanyong Zhang

    Abstract: Protecting power transmission lines from potential hazards involves critical tasks, one of which is the accurate measurement of distances between power lines and potential threats, such as large cranes. The challenge with this task is that the current sensor-based methods face challenges in balancing accuracy and cost in distance measurement. A common practice is to install cameras on transmission…

    Submitted 10 May, 2025; originally announced May 2025.

  5. arXiv:2505.04986  [pdf, other]

    stat.ML cs.LG

    Conformal Prediction with Cellwise Outliers: A Detect-then-Impute Approach

    Authors: Qian Peng, Yajie Bao, Haojie Ren, Zhaojun Wang, Changliang Zou

    Abstract: Conformal prediction is a powerful tool for constructing prediction intervals for black-box models, providing a finite sample coverage guarantee for exchangeable data. However, this exchangeability is compromised when some entries of the test feature are contaminated, such as in the case of cellwise outliers. To address this issue, this paper introduces a novel framework called detect-then-impute…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 23 pages, 15 figures

  6. arXiv:2505.03733  [pdf, other]

    cs.CL

    WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch

    Authors: Zimu Lu, Yunqiao Yang, Houxing Ren, Haotian Hou, Han Xiao, Ke Wang, Weikang Shi, Aojun Zhou, Mingjie Zhan, Hongsheng Li

    Abstract: LLM-based agents have demonstrated great potential in generating and managing code within complex codebases. In this paper, we introduce WebGen-Bench, a novel benchmark designed to measure an LLM-based agent's ability to create multi-file website codebases from scratch. It contains diverse instructions for website generation, created through the combined efforts of human annotators and GPT-4o. The…

    Submitted 6 May, 2025; originally announced May 2025.

  7. arXiv:2505.01766  [pdf, other]

    cs.CV cs.RO

    Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement

    Authors: Long Bai, Boyi Ma, Ruohan Wang, Guankun Wang, Beilei Cui, Zhongliang Jiang, Mobarakol Islam, Zhe Min, Jiewen Lai, Nassir Navab, Hongliang Ren

    Abstract: Surgical workflow recognition is vital for automating tasks, supporting decision-making, and training novice surgeons, ultimately improving patient safety and standardizing procedures. However, data corruption can lead to performance degradation due to issues like occlusion from bleeding or smoke in surgical scenes and problems with data storage and transmission. In this case, we explore a robust…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: Accepted by Information Fusion

  8. arXiv:2504.20278  [pdf, other]

    cs.AI

    Deep Physics Prior for First Order Inverse Optimization

    Authors: Haoyu Yang, Kamyar Azizzadenesheli, Haoxing Ren

    Abstract: Inverse design optimization aims to infer system parameters from observed solutions, posing critical challenges across domains such as semiconductor manufacturing, structural engineering, materials science, and fluid dynamics. The lack of explicit mathematical representations in many systems complicates this process and makes the first order optimization impossible. Mainstream approaches, includin…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 10 pages, 5 figures. Under Review

  9. arXiv:2504.18813  [pdf, other]

    cs.ET physics.optics

    Automated Routing-Informed Placement for Large-Scale Photonic Integrated Circuits

    Authors: Hongjian Zhou, Haoyu Yang, Gangi Nicholas, Haoxing Ren, Huang Rena, Jiaqi Gu

    Abstract: As technology advances, photonic integrated circuits (PICs) are rapidly scaling in size and complexity, with modern designs integrating thousands of components. However, the analog custom layout nature of photonics, the curvy waveguide structures, and single-layer routing resources impose stringent physical constraints, such as minimum bend radii and waveguide crossing penalties, which make manual…

    Submitted 26 April, 2025; originally announced April 2025.

  10. arXiv:2504.18249  [pdf, other]

    cs.CV cs.AI cs.LG

    Event-Based Eye Tracking. 2025 Event-based Vision Workshop

    Authors: Qinyu Chen, Chang Gao, Min Liu, Daniele Perrone, Yan Ru Pei, Zuowen Wang, Zhuo Zou, Shihang Tan, Tao Han, Guorui Lu, Zhen Xu, Junyuan Ding, Ziteng Wang, Zongwei Wu, Han Han, Yuliang Wu, Jinze Chen, Wei Zhai, Yang Cao, Zheng-jun Zha, Nuwan Bandara, Thivya Kandappu, Archan Misra, Xiaopeng Lin, Hongxiang Huang, et al. (7 additional authors not shown)

    Abstract: This survey serves as a review for the 2025 Event-Based Eye Tracking Challenge organized as part of the 2025 CVPR event-based vision workshop. This challenge focuses on the task of predicting the pupil center by processing event camera recorded eye movement. We review and summarize the innovative methods from teams ranked at the top of the challenge to advance future event-based eye tracking research.…

    Submitted 25 April, 2025; originally announced April 2025.

  11. arXiv:2504.14604  [pdf, other]

    cs.RO

    RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots

    Authors: Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Hengle Ren, Renjing Xu, Jian Tang

    Abstract: 3D occupancy prediction enables the robots to obtain spatial fine-grained geometry and semantics of the surrounding scene, and has become an essential task for embodied perception. Existing methods based on 3D Gaussians instead of dense voxels do not effectively exploit the geometry and opacity properties of Gaussians, which limits the network's estimation of complex environments and also limits t…

    Submitted 20 April, 2025; originally announced April 2025.

  12. arXiv:2504.14548  [pdf, other]

    cs.CV cs.AI

    VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control

    Authors: Lifeng Lin, Rongfeng Lu, Quan Chen, Haofan Ren, Ming Lu, Yaoqi Sun, Chenggang Yan, Anke Xue

    Abstract: Sparse-view 3D reconstruction is a fundamental yet challenging task in practical 3D reconstruction applications. Recently, many methods based on the 3D Gaussian Splatting (3DGS) framework have been proposed to address sparse-view 3D reconstruction. Although these methods have made considerable advancements, they still show significant issues with overfitting. To reduce the overfitting, we introduc…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: 10 pages, 8 figures

  13. arXiv:2504.13914  [pdf, other]

    cs.CL

    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

    Authors: ByteDance Seed: Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, et al. (249 additional authors not shown)

    Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…

    Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  14. arXiv:2504.12456  [pdf, other]

    cs.CV

    DG-MVP: 3D Domain Generalization via Multiple Views of Point Clouds for Classification

    Authors: Huantao Ren, Minmin Yang, Senem Velipasalar

    Abstract: Deep neural networks have achieved significant success in 3D point cloud classification while relying on large-scale, annotated point cloud datasets, which are labor-intensive to build. Compared to capturing data with LiDAR sensors and then performing annotation, it is relatively easier to sample point clouds from CAD models. Yet, data sampled from CAD models is regular, and does not suffer from o…

    Submitted 16 April, 2025; originally announced April 2025.

  15. arXiv:2504.12442  [pdf, other]

    cs.CV

    3D-PointZshotS: Geometry-Aware 3D Point Cloud Zero-Shot Semantic Segmentation Narrowing the Visual-Semantic Gap

    Authors: Minmin Yang, Huantao Ren, Senem Velipasalar

    Abstract: Existing zero-shot 3D point cloud segmentation methods often struggle with limited transferability from seen classes to unseen classes and from semantic to visual space. To alleviate this, we introduce 3D-PointZshotS, a geometry-aware zero-shot segmentation framework that enhances both feature generation and alignment using latent geometric prototypes (LGPs). Specifically, we integrate LGPs into a…

    Submitted 16 April, 2025; originally announced April 2025.

  16. arXiv:2504.11930  [pdf, other]

    cs.CV

    Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning

    Authors: Hairui Ren, Fan Tang, He Zhao, Zixuan Wang, Dandan Guo, Yi Chang

    Abstract: Fine-tuning vision-language models (VLMs) with large amounts of unlabeled data has recently garnered significant interest. However, a key challenge remains the lack of high-quality pseudo-labeled data. Current pseudo-labeling strategies often struggle with mismatches between semantic and visual information, leading to sub-optimal performance of unsupervised prompt learning (UPL) methods. In this p…

    Submitted 16 April, 2025; originally announced April 2025.

  17. arXiv:2504.11502  [pdf, other]

    cs.SE cs.LG

    Timing Analysis Agent: Autonomous Multi-Corner Multi-Mode (MCMM) Timing Debugging with Timing Debug Relation Graph

    Authors: Jatin Nainani, Chia-Tung Ho, Anirudh Dhurka, Haoxing Ren

    Abstract: Timing analysis is an essential and demanding verification method for Very Large Scale Integrated (VLSI) circuit design and optimization. In addition, it also serves as the cornerstone of the final sign-off, determining whether the chip is ready to be sent to the semiconductor foundry for fabrication. Recently, as the technology advances relentlessly, smaller metal pitches and the increasing number…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 7 pages, 7 figures, 2 tables

  18. arXiv:2504.10041  [pdf, other]

    cs.RO cs.CV

    Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models

    Authors: Hao Ren, Yiming Zeng, Zetong Bi, Zhaoliang Wan, Junlong Huang, Hui Cheng

    Abstract: Recent advancements in diffusion-based imitation learning, which show impressive performance in modeling multimodal distributions and training stability, have led to substantial progress in various robot learning tasks. In visual navigation, previous diffusion-based policies typically generate action sequences by initiating from denoising Gaussian noise. However, the target action distribution oft…

    Submitted 14 April, 2025; originally announced April 2025.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

  19. arXiv:2504.10003  [pdf, other]

    cs.RO cs.CV

    NaviDiffusor: Cost-Guided Diffusion Model for Visual Navigation

    Authors: Yiming Zeng, Hao Ren, Shuhang Wang, Junlong Huang, Hui Cheng

    Abstract: Visual navigation, a fundamental challenge in mobile robotics, demands versatile policies to handle diverse environments. Classical methods leverage geometric solutions to minimize specific costs, offering adaptability to new scenarios but are prone to system errors due to their multi-modular design and reliance on hand-crafted rules. Learning-based methods, while achieving high planning success r…

    Submitted 14 April, 2025; originally announced April 2025.

    Journal ref: ICRA 2025

  20. arXiv:2504.01962  [pdf]

    cs.AR

    Marco: Configurable Graph-Based Task Solving and Multi-AI Agents Framework for Hardware Design

    Authors: Chia-Tung Ho, Jing Gong, Yunsheng Bai, Chenhui Deng, Haoxing Ren, Brucek Khailany

    Abstract: Hardware design presents numerous challenges stemming from its complexity and advancing technologies. These challenges result in longer turn-around-time (TAT) for optimizing performance, power, area, and cost (PPAC) during synthesis, verification, physical design, and reliability loops. Large Language Models (LLMs) have shown remarkable capacity to comprehend and generate natural language at a mas…

    Submitted 25 February, 2025; originally announced April 2025.

    Comments: 3 pages, 5 figures, 2 tables

  21. arXiv:2504.00993  [pdf, other]

    cs.CL cs.AI

    MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs

    Authors: Juncheng Wu, Wenlong Deng, Xingxuan Li, Sheng Liu, Taomian Mi, Yifan Peng, Ziyang Xu, Yi Liu, Hyunjin Cho, Chang-In Choi, Yihan Cao, Hui Ren, Xiang Li, Xiaoxiao Li, Yuyin Zhou

    Abstract: Medical tasks such as diagnosis and treatment planning require precise and complex reasoning, particularly in life-critical domains. Unlike mathematical reasoning, medical reasoning demands meticulous, verifiable thought processes to ensure reliability and accuracy. However, there is a notable lack of datasets that provide transparent, step-by-step reasoning to validate and enhance the medical rea…

    Submitted 4 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: 18 pages, 11 figures, 6 tables. Project page: https://github.com/UCSC-VLAA/MedReason

  22. arXiv:2503.24306  [pdf, other]

    cs.CV

    Point Tracking in Surgery--The 2024 Surgical Tattoos in Infrared (STIR) Challenge

    Authors: Adam Schmidt, Mert Asim Karaoglu, Soham Sinha, Mingang Jang, Ho-Gun Ha, Kyungmin Jung, Kyeongmo Gu, Ihsan Ullah, Hyunki Lee, Jonáš Šerých, Michal Neoral, Jiří Matas, Rulin Zhou, Wenlong He, An Wang, Hongliang Ren, Bruno Silva, Sandro Queirós, Estêvão Lima, João L. Vilaça, Shunsuke Kikuchi, Atsushi Kouno, Hiroki Matsuzaki, Tongtong Li, Yulu Chen, et al. (15 additional authors not shown)

    Abstract: Understanding tissue motion in surgery is crucial to enable applications in downstream tasks such as segmentation, 3D reconstruction, virtual tissue landmarking, autonomous probe-based scanning, and subtask autonomy. Labeled data are essential to enabling algorithms in these downstream tasks since they allow us to quantify and train algorithms. This paper introduces a point tracking challenge to a…

    Submitted 31 March, 2025; originally announced March 2025.

  23. arXiv:2503.23725  [pdf, other]

    cs.CV

    Exploring Temporal Dynamics in Event-based Eye Tracker

    Authors: Hongwei Ren, Xiaopeng Lin, Hongxiang Huang, Yue Zhou, Bojun Cheng

    Abstract: Eye-tracking is a vital technology for human-computer interaction, especially in wearable devices such as AR, VR, and XR. The realization of high-speed and high-precision eye-tracking using frame-based image sensors is constrained by their limited temporal resolution, which impairs the accurate capture of rapid ocular dynamics, such as saccades and blinks. Event cameras, inspired by biological vis…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025 Event-based Vision Workshop

  24. arXiv:2503.23130  [pdf, other]

    cs.CV cs.CL cs.RO

    Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery

    Authors: Boyi Ma, Yanguang Zhao, Jie Wang, Guankun Wang, Kun Yuan, Tong Chen, Long Bai, Hongliang Ren

    Abstract: The DeepSeek models have shown exceptional performance in general scene understanding, question-answering (QA), and text generation tasks, owing to their efficient training paradigm and strong reasoning capabilities. In this study, we investigate the dialogue capabilities of the DeepSeek model in robotic surgery scenarios, focusing on tasks such as Single Phrase QA, Visual QA, and Detailed Descrip…

    Submitted 3 April, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

    Comments: Technical Report

  25. arXiv:2503.22900  [pdf, other]

    cs.LG cs.AR

    Learning Library Cell Representations in Vector Space

    Authors: Rongjian Liang, Yi-Chen Lu, Wen-Hao Liu, Haoxing Ren

    Abstract: We propose Lib2Vec, a novel self-supervised framework to efficiently learn meaningful vector representations of library cells, enabling ML models to capture essential cell semantics. The framework comprises three key components: (1) an automated method for generating regularity tests to quantitatively evaluate how well cell representations reflect inter-cell relationships; (2) a self-supervised le…

    Submitted 28 March, 2025; originally announced March 2025.

  26. arXiv:2503.22394  [pdf, other]

    cs.CV cs.AI

    Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision

    Authors: Rulin Zhou, Wenlong He, An Wang, Qiqi Yao, Haijun Hu, Jiankun Wang, Xi Zhang, Hongliang Ren

    Abstract: Accurate tissue point tracking in endoscopic videos is critical for robotic-assisted surgical navigation and scene understanding, but remains challenging due to complex deformations, instrument occlusion, and the scarcity of dense trajectory annotations. Existing methods struggle with long-term tracking under these conditions due to limited feature utilization and annotation dependence. We present…

    Submitted 28 March, 2025; originally announced March 2025.

  27. arXiv:2503.20776  [pdf, other]

    cs.CV

    Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields

    Authors: Shijie Zhou, Hui Ren, Yijia Weng, Shuwang Zhang, Zhen Wang, Dejia Xu, Zhiwen Fan, Suya You, Zhangyang Wang, Leonidas Guibas, Achuta Kadambi

    Abstract: Recent advancements in 2D and multimodal models have achieved remarkable success by leveraging large-scale training on extensive datasets. However, extending these achievements to enable free-form interactions and high-level semantic operations with complex 3D/4D scenes remains challenging. This difficulty stems from the limited availability of large-scale, annotated 3D/4D or multi-view datasets,…

    Submitted 28 March, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  28. arXiv:2503.19174  [pdf, other]

    cs.AI

    AssertionForge: Enhancing Formal Verification Assertion Generation with Structured Representation of Specifications and RTL

    Authors: Yunsheng Bai, Ghaith Bany Hamad, Syed Suhaib, Haoxing Ren

    Abstract: Generating SystemVerilog Assertions (SVAs) from natural language specifications remains a major challenge in formal verification (FV) due to the inherent ambiguity and incompleteness of specifications. Existing LLM-based approaches, such as AssertLLM, focus on extracting information solely from specification documents, often failing to capture essential internal signal interactions and design deta…

    Submitted 14 May, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: LAD 2025

  29. arXiv:2503.18671  [pdf, other]

    cs.CV

    Structure-Aware Correspondence Learning for Relative Pose Estimation

    Authors: Yihan Chen, Wenfei Yang, Huan Ren, Shifeng Zhang, Tianzhu Zhang, Feng Wu

    Abstract: Relative pose estimation provides a promising way for achieving object-agnostic pose estimation. Despite the success of existing 3D correspondence-based methods, the reliance on explicit feature matching suffers from small overlaps in visible regions and unreliable feature estimation for invisible regions. Inspired by humans' ability to assemble two object parts that have small or no overlapping r…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  30. arXiv:2503.15917  [pdf, other]

    cs.CV

    Learning to Efficiently Adapt Foundation Models for Self-Supervised Endoscopic 3D Scene Reconstruction from Any Cameras

    Authors: Beilei Cui, Long Bai, Mobarakol Islam, An Wang, Zhiqi Ma, Yiming Huang, Feng Li, Zhen Chen, Zhongliang Jiang, Nassir Navab, Hongliang Ren

    Abstract: Accurate 3D scene reconstruction is essential for numerous medical tasks. Given the challenges in obtaining ground truth data, there has been an increasing focus on self-supervised learning (SSL) for endoscopic depth estimation as a basis for scene reconstruction. While foundation models have shown remarkable progress in visual tasks, their direct application to the medical domain often leads to s…

    Submitted 20 March, 2025; originally announced March 2025.

  31. arXiv:2503.13926  [pdf, other]

    cs.CV

    Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation

    Authors: Huan Ren, Wenfei Yang, Xiang Liu, Shifeng Zhang, Tianzhu Zhang

    Abstract: Category-level object pose estimation aims to determine the pose and size of novel objects in specific categories. Existing correspondence-based approaches typically adopt point-based representations to establish the correspondences between primitive observed points and normalized object coordinates. However, due to the inherent shape-dependence of canonical coordinates, these methods suffer from…

    Submitted 19 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: Accepted by ICLR 2025. Project page is available at https://renhuan1999.github.io/SpherePose

  32. Rubikon: Intelligent Tutoring for Rubik's Cube Learning Through AR-enabled Physical Task Reconfiguration

    Authors: Haocheng Ren, Muzhe Wu, Gregory Croisdale, Anhong Guo, Xu Wang

    Abstract: Learning to solve a Rubik's Cube requires the learners to repeatedly practice a skill component, e.g., identifying a misplaced square and putting it back. However, for 3D physical tasks such as this, generating sufficient repeated practice opportunities for learners can be challenging, in part because it is difficult for novices to reconfigure the physical object to specific states. We propose Rub…

    Submitted 14 May, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: DIS 2025

  33. arXiv:2503.02647  [pdf, other]

    cs.IT eess.SP

    A Framework for Uplink ISAC Receiver Designs: Performance Analysis and Algorithm Development

    Authors: Zhiyuan Yu, Hong Ren, Cunhua Pan, Gui Zhou, Dongming Wang, Chau Yuen, Jiangzhou Wang

    Abstract: Uplink integrated sensing and communication (ISAC) systems have recently emerged as a promising research direction, enabling simultaneous uplink signal detection and target sensing. In this paper, we propose the flexible projection (FP)-type receiver that unifies the projection-type receiver and the successive interference cancellation (SIC)-type receiver by using a flexible tradeoff factor to adapt…

    Submitted 3 April, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

    Comments: 13 pages, 9 figures, submitted to an IEEE journal for possible publication

  34. arXiv:2502.16963  [pdf, other]

    cs.AR

    Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM

    Authors: Lian Liu, Shixin Zhao, Bing Li, Haimeng Ren, Zhaohui Xu, Mengdi Wang, Xiaowei Li, Yinhe Han, Ying Wang

    Abstract: The billion-scale Large Language Models (LLMs) need deployment on expensive server-grade GPUs with large-storage HBMs and abundant computation capability. As LLM-assisted services become popular, achieving cost-effective LLM inference on budget-friendly hardware becomes the trend. Extensive research relocates LLM parameters from expensive GPUs to host memory. However, the restricted bandwidth bet…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 15 pages, 17 figures, accepted by HPCA 2025

    ACM Class: C.1.3

  35. arXiv:2502.11419  [pdf, other]

    cs.CL

    InsBank: Evolving Instruction Subset for Ongoing Alignment

    Authors: Jiayi Shi, Yiwei Li, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Huan Ren, Yao Hu, Kan Li

    Abstract: Large language models (LLMs) typically undergo instruction tuning to enhance alignment. Recent studies emphasize that quality and diversity of instruction data are more crucial than quantity, highlighting the need to select diverse, high-quality subsets to reduce training costs. However, how to evolve these selected subsets alongside the development of new instruction data remains insufficiently e…

    Submitted 16 February, 2025; originally announced February 2025.

  36. arXiv:2501.19319  [pdf, other]

    cs.CV cs.RO

    Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping

    Authors: Yiming Huang, Beilei Cui, Long Bai, Zhen Chen, Jinlin Wu, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Simultaneous Localization and Mapping (SLAM) is essential for precise surgical interventions and robotic tasks in minimally invasive procedures. While recent advancements in 3D Gaussian Splatting (3DGS) have improved SLAM with high-quality novel view synthesis and fast rendering, these systems struggle with accurate depth and surface reconstruction due to multi-view inconsistencies. Simply incorpo…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: Accepted by ICRA 2025

  37. arXiv:2501.15808  [pdf, other]

    cs.CV

    ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring

    Authors: Xiaopeng Lin, Yulong Huang, Hongwei Ren, Zunchang Liu, Yue Zhou, Haotian Fu, Bojun Cheng

    Abstract: Motion deblurring addresses the challenge of image blur caused by camera or scene movement. Event cameras provide motion information that is encoded in the asynchronous event streams. To efficiently leverage the temporal information of event streams, we employ Spiking Neural Networks (SNNs) for motion feature extraction and Artificial Neural Networks (ANNs) for color information processing. Due to…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: 11 pages, 8 figures

  38. arXiv:2501.11577  [pdf, other]

    cs.CR cs.LG

    Rethinking Membership Inference Attacks Against Transfer Learning

    Authors: Cong Wu, Jing Chen, Qianru Fang, Kun He, Ziming Zhao, Hao Ren, Guowen Xu, Yang Liu, Yang Xiang

    Abstract: Transfer learning, successful in knowledge translation across related tasks, faces a substantial privacy threat from membership inference attacks (MIAs). These attacks, despite posing significant risk to ML models' training data, remain underexplored in transfer learning. The interaction between teacher and student models in transfer learning has not been thoroughly explored in MIAs, potentiall…

    Submitted 20 January, 2025; originally announced January 2025.

  39. arXiv:2501.11360  [pdf, other]

    cs.LG cs.AI

    Federated Learning with Sample-level Client Drift Mitigation

    Authors: Haoran Xu, Jiaze Li, Wanyi Wu, Hao Ren

    Abstract: Federated Learning (FL) suffers from severe performance degradation due to the data heterogeneity among clients. Existing works reveal that the fundamental reason is that data heterogeneity can cause client drift where the local model update deviates from the global one, and thus they usually tackle this problem from the perspective of calibrating the obtained local update. Despite effectiveness,…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: Accepted by AAAI 2025

  40. arXiv:2501.11347  [pdf, other]

    cs.CV

    EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery

    Authors: Guankun Wang, Long Bai, Junyi Wang, Kun Yuan, Zhen Li, Tianxu Jiang, Xiting He, Jinlin Wu, Zhen Chen, Zhen Lei, Hongbin Liu, Jiazheng Wang, Fan Zhang, Nicolas Padoy, Nassir Navab, Hongliang Ren

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense potential in computer-aided diagnosis and decision-making. In the context of robotic-assisted surgery, MLLMs can serve as effective tools for surgical training and guidance. However, there is still a lack of MLLMs specialized for surgical scene understanding in clinical applications. In this work, we introduce EndoC…

    Submitted 14 March, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

  41. arXiv:2501.04993  [pdf, other]

    cs.OS

    ByteFS: System Support for (CXL-based) Memory-Semantic Solid-State Drives

    Authors: Shaobo Li, Yirui Eric Zhou, Hao Ren, Jian Huang

    Abstract: Unlike non-volatile memory that resides on the processor memory bus, memory-semantic solid-state drives (SSDs) support both byte and block access granularity via PCIe or CXL interconnects. They provide scalable memory capacity using NAND flash at a much lower cost. In addition, they have different performance characteristics for their dual byte/block interface respectively, while offering essentia…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: This paper is accepted at the 30th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2025)

  42. arXiv:2501.00909  [pdf, other]

    cs.IT eess.SP

    RIS-Aided Integrated Sensing and Communication Systems under Dual-polarized Channels

    Authors: Dongnan Xia, Cunhua Pan, Hong Ren, Zhiyuan Yu, Yasheng Jin, Jiangzhou Wang

    Abstract: This paper considers reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) systems under dual-polarized (DP) channels. Unlike the existing ISAC systems, which ignored polarization of electromagnetic waves, this study adopts DP base station (BS) and DP RIS to serve users with a pair of DP antennas. The achievable sum rate is maximized through jointly optimiz…

    Submitted 1 January, 2025; originally announced January 2025.

  43. arXiv:2412.20803  [pdf, other]

    cs.CV

    Frequency-aware Event Cloud Network

    Authors: Hongwei Ren, Fei Ma, Xiaopeng Lin, Yuetong Fang, Hongxiang Huang, Yulong Huang, Yue Zhou, Haotian Fu, Ziyi Yang, Fei Richard Yu, Bojun Cheng

    Abstract: Event cameras are biologically inspired sensors that emit events asynchronously with remarkable temporal resolution, garnering significant attention from both industry and academia. Mainstream methods favor frame and voxel representations, which reach a satisfactory performance while introducing time-consuming transformation, bulky models, and sacrificing fine-grained temporal information. Alterna…

    Submitted 30 December, 2024; originally announced December 2024.

    Comments: Under Review

  44. arXiv:2412.19819  [pdf, other]

    cs.AR cs.AI

    ChipAlign: Instruction Alignment in Large Language Models for Chip Design via Geodesic Interpolation

    Authors: Chenhui Deng, Yunsheng Bai, Haoxing Ren

    Abstract: Recent advancements in large language models (LLMs) have expanded their application across various domains, including chip design, where domain-adapted chip models like ChipNeMo have emerged. However, these models often struggle with instruction alignment, a crucial capability for LLMs that involves following explicit human directives. This limitation impedes the practical application of chip LLMs…

    Submitted 14 December, 2024; originally announced December 2024.

  45. arXiv:2412.19279  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Generalization for AI-Synthesized Voice Detection

    Authors: Hainan Ren, Li Lin, Chun-Hao Liu, Xin Wang, Shu Hu

    Abstract: AI-synthesized voice technology has the potential to create realistic human voices for beneficial applications, but it can also be misused for malicious purposes. While existing AI-synthesized voice detection models excel in intra-domain evaluation, they face challenges in generalizing across different domains, potentially becoming obsolete as new voice generators emerge. Current solutions use div…

    Submitted 30 December, 2024; v1 submitted 26 December, 2024; originally announced December 2024.

    Comments: AAAI25

  46. arXiv:2412.17595  [pdf, other]

    cs.CV cs.AI cs.RO

    V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy

    Authors: Long Bai, Beilei Cui, Liangyu Wang, Yanheng Li, Shilong Yao, Sishen Yuan, Yanan Wu, Yang Zhang, Max Q.-H. Meng, Zhen Li, Weiping Ding, Hongliang Ren

    Abstract: Deep learning can predict depth maps and capsule ego-motion from capsule endoscopy videos, aiding in 3D scene reconstruction and lesion localization. However, the collisions of the capsule endoscopies within the gastrointestinal tract cause vibration perturbations in the training data. Existing solutions focus solely on vision-based processing, neglecting other auxiliary signals like vibrations th…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: To appear in IEEE Transactions on Automation Science and Engineering (IEEE TASE)

  47. arXiv:2412.16720  [pdf, other]

    cs.AI

    OpenAI o1 System Card

    Authors: OpenAI, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, et al. (238 additional authors not shown)

    Abstract: The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-ar…

    Submitted 21 December, 2024; originally announced December 2024.

  48. arXiv:2412.16339  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Deliberative Alignment: Reasoning Enables Safer Language Models

    Authors: Melody Y. Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel Dias, Andrea Vallone, Hongyu Ren, Jason Wei, Hyung Won Chung, Sam Toyer, Johannes Heidecke, Alex Beutel, Amelia Glaese

    Abstract: As large-scale language models increasingly impact safety-critical domains, ensuring their reliable adherence to well-defined principles remains a fundamental challenge. We introduce Deliberative Alignment, a new paradigm that directly teaches the model safety specifications and trains it to explicitly recall and accurately reason over the specifications before answering. We used this approach to…

    Submitted 8 January, 2025; v1 submitted 20 December, 2024; originally announced December 2024.

    Comments: 24 pages

  49. arXiv:2412.16219  [pdf, other]

    cs.CV cs.NE

    Adaptive Calibration: A Unified Conversion Framework of Spiking Neural Network

    Authors: Ziqing Wang, Yuetong Fang, Jiahang Cao, Hongwei Ren, Renjing Xu

    Abstract: Spiking Neural Networks (SNNs) are seen as an energy-efficient alternative to traditional Artificial Neural Networks (ANNs), but the performance gap remains a challenge. While this gap is narrowing through ANN-to-SNN conversion, substantial computational resources are still needed, and the energy efficiency of converted SNNs cannot be ensured. To address this, we present a unified training-free co…

    Submitted 18 December, 2024; originally announced December 2024.

  50. arXiv:2412.14018  [pdf, other]

    cs.CV cs.AI cs.MM cs.RO

    SurgSora: Decoupled RGBD-Flow Diffusion Model for Controllable Surgical Video Generation

    Authors: Tong Chen, Shuya Yang, Junyi Wang, Long Bai, Hongliang Ren, Luping Zhou

    Abstract: Medical video generation has transformative potential for enhancing surgical understanding and pathology insights through precise and controllable visual representations. However, current models face limitations in controllability and authenticity. To bridge this gap, we propose SurgSora, a motion-controllable surgical video generation framework that uses a single input frame and user-controllable…

    Submitted 18 December, 2024; originally announced December 2024.