
Showing 1–50 of 523 results for author: Yu, B

Searching in archive cs.
  1. arXiv:2505.10527  [pdf, other]

    cs.CL

    WorldPM: Scaling Human Preference Modeling

    Authors: Binghai Wang, Runji Lin, Keming Lu, Le Yu, Zhenru Zhang, Fei Huang, Chujie Zheng, Kai Dang, Yang Fan, Xingzhang Ren, An Yang, Binyuan Hui, Dayiheng Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Bowen Yu, Jingren Zhou, Junyang Lin

    Abstract: Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling. We propose World Preference Modeling (WorldPM) to emphasize this scaling potential, where World Preference embodies a unified representation of human preferences. In this paper, we collect preference data from pub…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09388  [pdf, other]

    cs.CL

    Qwen3 Technical Report

    Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, et al. (35 additional authors not shown)

    Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.08784  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework

    Authors: Abhineet Agarwal, Michael Xiao, Rebecca Barter, Omer Ronen, Boyu Fan, Bin Yu

    Abstract: As machine learning (ML) models are increasingly deployed in high-stakes domains, trustworthy uncertainty quantification (UQ) is critical for ensuring the safety and reliability of these models. Traditional UQ methods rely on specifying a true generative model and are not robust to misspecification. On the other hand, conformal inference allows for arbitrary ML models but does not consider model s…

    Submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.08341  [pdf, ps, other]

    cs.AI cs.MA q-bio.GN

    Benchmarking AI scientists in omics data-driven biological research

    Authors: Erpai Luo, Jinmeng Jia, Yifan Xiong, Xiangyu Li, Xiaobo Guo, Baoqi Yu, Lei Wei, Xuegong Zhang

    Abstract: The rise of large language models and multi-agent systems has sparked growing interest in AI scientists capable of autonomous biological research. However, existing benchmarks either focus on reasoning without data or on data analysis with predefined statistical answers, lacking realistic, data-driven evaluation settings. Here, we introduce the Biological AI Scientist Benchmark (BaisBench), a benc…

    Submitted 13 May, 2025; originally announced May 2025.

  5. arXiv:2505.07654  [pdf, ps, other]

    eess.IV cs.CV

    Breast Cancer Classification in Deep Ultraviolet Fluorescence Images Using a Patch-Level Vision Transformer Framework

    Authors: Pouya Afshin, David Helminiak, Tongtong Lu, Tina Yen, Julie M. Jorns, Mollie Patton, Bing Yu, Dong Hye Ye

    Abstract: Breast-conserving surgery (BCS) aims to completely remove malignant lesions while maximizing healthy tissue preservation. Intraoperative margin assessment is essential to achieve a balance between thorough cancer resection and tissue conservation. A deep ultraviolet fluorescence scanning microscope (DUV-FSM) enables rapid acquisition of whole surface images (WSIs) for excised tissue, providing con…

    Submitted 12 May, 2025; originally announced May 2025.

  6. arXiv:2505.04173  [pdf, other]

    cs.LG cs.CV

    DiffPattern-Flex: Efficient Layout Pattern Generation via Discrete Diffusion

    Authors: Zixiao Wang, Wenqian Zhao, Yunheng Shen, Yang Bai, Guojin Chen, Farzan Farnia, Bei Yu

    Abstract: Recent advancements in layout pattern generation have been dominated by deep generative models. However, relying solely on neural networks for legality guarantees raises concerns in many practical applications. In this paper, we present DiffPattern-Flex, a novel approach designed to generate reliable layout patterns efficiently. DiffPattern-Flex incorporates a new method for generati…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 13 pages, 13 figures. Accepted by TCAD

  7. arXiv:2505.03469  [pdf, other]

    cs.CL

    Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models

    Authors: Bin Yu, Hang Yuan, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen

    Abstract: Recent advances in large language models have demonstrated that Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) reasoning data distilled from large reasoning models (e.g., DeepSeek R1) can effectively transfer reasoning capabilities to non-reasoning models. However, models fine-tuned with this approach inherit the "overthinking" problem from teacher models, producing verbose and redundant…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 11 pages, 2 figures

  8. arXiv:2505.00938  [pdf, other]

    cs.CV cs.AI

    CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion

    Authors: Boyuan Meng, Xiaohan Zhang, Peilin Li, Zhe Wu, Yiming Li, Wenkai Zhao, Beinan Yu, Hui-Liang Shen

    Abstract: Cross-domain few-shot object detection (CD-FSOD) aims to detect novel objects across different domains with limited class instances. Feature confusion, including object-background confusion and object-object confusion, presents significant challenges in both cross-domain and few-shot settings. In this work, we introduce CDFormer, a cross-domain few-shot object detection transformer against feature…

    Submitted 1 May, 2025; originally announced May 2025.

  9. arXiv:2504.17801  [pdf, other]

    cs.NE cs.AI

    Evolution of Optimization Algorithms for Global Placement via Large Language Models

    Authors: Xufeng Yao, Jiaxi Jiang, Yuxuan Zhao, Peiyu Liao, Yibo Lin, Bei Yu

    Abstract: Optimization algorithms are widely employed to tackle complex problems, but designing them manually is often labor-intensive and requires significant expertise. Global placement is a fundamental step in electronic design automation (EDA). While analytical approaches represent the state-of-the-art (SOTA) in global placement, their core optimization algorithms remain heavily dependent on heuristics…

    Submitted 18 April, 2025; originally announced April 2025.

  10. arXiv:2504.14800  [pdf, other]

    cs.LG cs.CV

    A Survey on Small Sample Imbalance Problem: Metrics, Feature Analysis, and Solutions

    Authors: Shuxian Zhao, Jie Gui, Minjing Dong, Baosheng Yu, Zhipeng Gui, Lu Dong, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: The small sample imbalance (S&I) problem is a major challenge in machine learning and data analysis. It is characterized by a small number of samples and an imbalanced class distribution, which leads to poor model performance. In addition, indistinct inter-class feature distributions further complicate classification tasks. Existing methods often rely on algorithmic heuristics without sufficiently…

    Submitted 20 April, 2025; originally announced April 2025.

  11. arXiv:2504.14657  [pdf, other]

    cs.CL cs.AI cs.LG

    A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs

    Authors: Yihan Lin, Zhirong Bella Yu, Simon Lee

    Abstract: Synthetic Electronic Health Records (EHRs) offer a valuable opportunity to create privacy preserving and harmonized structured data, supporting numerous applications in healthcare. Key benefits of synthetic data include precise control over the data schema, improved fairness and representation of patient populations, and the ability to share datasets without concerns about compromising real indivi…

    Submitted 25 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: Accepted at the Conference of Health, Inference, Learning (CHIL 2025) in Berkeley, CA. To appear in PMLR later in 2025

  12. arXiv:2504.14286  [pdf, other]

    cs.LG

    SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM

    Authors: Xiaojiang Zhang, Jinghui Wang, Zifei Cheng, Wenhao Zhuang, Zheng Lin, Minglei Zhang, Shaojie Wang, Yinghan Cui, Chao Wang, Junyi Peng, Shimiao Jiang, Shiqi Kuang, Shouyu Yin, Chaohang Wen, Haotian Zhang, Bin Chen, Bing Yu

    Abstract: Recent advances of reasoning models, exemplified by OpenAI's o1 and DeepSeek's R1, highlight the significant potential of Reinforcement Learning (RL) to enhance the reasoning capabilities of Large Language Models (LLMs). However, replicating these advancements across diverse domains remains challenging due to limited methodological transparency. In this work, we present two-Staged history-Resampli…

    Submitted 22 April, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

  13. arXiv:2504.12323  [pdf, other]

    cs.CL cs.AI

    The Other Side of the Coin: Exploring Fairness in Retrieval-Augmented Generation

    Authors: Zheng Zhang, Ning Li, Qi Liu, Rui Li, Weibo Gao, Qingyang Mao, Zhenya Huang, Baosheng Yu, Dacheng Tao

    Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by retrieving relevant documents from external knowledge sources. By referencing this external knowledge, RAG effectively reduces the generation of factually incorrect content and addresses hallucination issues within LLMs. Recently, there has been growing attention to improving the performance and efficiency of RAG systems…

    Submitted 19 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: 12 pages

  14. arXiv:2504.09461  [pdf, other]

    cs.RO cs.AR

    ADDT -- A Digital Twin Framework for Proactive Safety Validation in Autonomous Driving Systems

    Authors: Bo Yu, Chaoran Yuan, Zishen Wan, Jie Tang, Fadi Kurdahi, Shaoshan Liu

    Abstract: Autonomous driving systems continue to face safety-critical failures, often triggered by rare and unpredictable corner cases that evade conventional testing. We present the Autonomous Driving Digital Twin (ADDT) framework, a high-fidelity simulation platform designed to proactively identify hidden faults, evaluate real-time performance, and validate safety before deployment. ADDT combines realisti…

    Submitted 13 April, 2025; originally announced April 2025.

  15. arXiv:2504.08296  [pdf, other]

    cs.CV

    Generative AI for Film Creation: A Survey of Recent Advances

    Authors: Ruihan Zhang, Borou Yu, Jiajian Min, Yetong Xin, Zheng Wei, Juncheng Nemo Shi, Mingzhen Huang, Xianghao Kong, Nix Liu Xin, Shanshan Jiang, Praagya Bahuguna, Mark Chan, Khushi Hora, Lijian Yang, Yongqi Liang, Runhe Bian, Yunlei Liu, Isabela Campillo Valencia, Patricia Morales Tredinick, Ilia Kozlov, Sijia Jiang, Peiwen Huang, Na Chen, Xuanxuan Liu, Anyi Rao

    Abstract: Generative AI (GenAI) is transforming filmmaking, equipping artists with tools like text-to-image and image-to-video diffusion, neural radiance fields, avatar generation, and 3D synthesis. This paper examines the adoption of these technologies in filmmaking, analyzing workflows from recent AI-driven films to understand how GenAI contributes to character creation, aesthetic styling, and narration.…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Accepted at CVPR 2025 CVEU workshop: AI for Creative Visual Content Generation Editing and Understanding

  16. arXiv:2504.02404  [pdf, other]

    cs.CL

    AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology

    Authors: Xiang Feng, Wentao Jiang, Zengmao Wang, Yong Luo, Pingbo Xu, Baosheng Yu, Hua Jin, Bo Du, Jing Zhang

    Abstract: The application of large language models (LLMs) in the medical field has gained significant attention, yet their reasoning capabilities in more specialized domains like anesthesiology remain underexplored. In this paper, we systematically evaluate the reasoning capabilities of LLMs in anesthesiology and analyze key factors influencing their performance. To this end, we introduce AnesBench, a cross…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 23 pages, 9 figures

  17. arXiv:2503.21210  [pdf, other]

    cs.CV

    FakeReasoning: Towards Generalizable Forgery Detection and Reasoning

    Authors: Yueying Gao, Dongliang Chang, Bingyao Yu, Haotian Qin, Lei Chen, Kongming Liang, Zhanyu Ma

    Abstract: Accurate and interpretable detection of AI-generated images is essential for mitigating risks associated with AI misuse. However, the substantial domain gap among generative models makes it challenging to develop a generalizable forgery detection model. Moreover, since every pixel in an AI-generated image is synthesized, traditional saliency-based forgery explanation methods are not well suited fo…

    Submitted 27 March, 2025; originally announced March 2025.

  18. arXiv:2503.19937  [pdf, other]

    cs.CV cs.AI

    Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation

    Authors: Zhiyao Ren, Yibing Zhan, Baosheng Yu, Dacheng Tao

    Abstract: Text-to-image generation has become increasingly popular, but achieving the desired images often requires extensive prompt engineering. In this paper, we explore how to decode textual prompts from reference images, a process we refer to as image reverse prompt engineering. This technique enables us to gain insights from reference images, understand the creative processes of great artists, and gene…

    Submitted 24 March, 2025; originally announced March 2025.

  19. arXiv:2503.12935  [pdf, other]

    cs.CV

    L2HCount: Generalizing Crowd Counting from Low to High Crowd Density via Density Simulation

    Authors: Guoliang Xu, Jianqin Yin, Ren Zhang, Yonghao Dang, Feng Zhou, Bo Yu

    Abstract: Since COVID-19, crowd-counting tasks have gained wide applications. While supervised methods are reliable, annotation is more challenging in high-density scenes due to small head sizes and severe occlusion, whereas it's simpler in low-density scenes. Interestingly, can we train the model in low-density scenes and generalize it to high-density scenes? Therefore, we propose a low- to high-density ge…

    Submitted 17 March, 2025; originally announced March 2025.

  20. arXiv:2503.12512  [pdf, other]

    cs.AR

    A Systematic Approach for Multi-objective Double-side Clock Tree Synthesis

    Authors: Xun Jiang, Haoran Lu, Yuxuan Zhao, Jiarui Wang, Zizheng Guo, Heng Wu, Bei Yu, Sung Kyu Lim, Runsheng Wang, Ru Huang, Yibo Lin

    Abstract: As the scaling of semiconductor devices nears its limits, utilizing the back-side space of silicon has emerged as a new trend for future integrated circuits. With intense interest, several works have hacked existing backend tools to explore the potential of synthesizing double-side clock trees via nano Through-Silicon-Vias (nTSVs). However, these works lack a systematic perspective on design resou…

    Submitted 8 May, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  21. arXiv:2503.12496  [pdf, other]

    cs.CV

    Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?

    Authors: Tianyuan Qu, Longxiang Tang, Bohao Peng, Senqiao Yang, Bei Yu, Jiaya Jia

    Abstract: The rise of Large Vision-Language Models (LVLMs) has significantly advanced video understanding. However, efficiently processing long videos remains a challenge due to the "Sampling Dilemma": low-density sampling risks missing critical information, while high-density sampling introduces redundancy. To address this issue, we introduce LSDBench, the first benchmark designed to evaluate LVLMs on lo…

    Submitted 27 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  22. arXiv:2503.11004  [pdf, other]

    cs.CV

    VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention

    Authors: Jiangning Wei, Lixiong Qin, Bo Yu, Tianjian Zou, Chuhan Yan, Dandan Xiao, Yang Yu, Lan Yang, Ke Li, Jun Liu

    Abstract: Action recognition is a crucial task in artificial intelligence, with significant implications across various domains. We initially perform a comprehensive analysis of seven prominent action recognition methods across five widely-used datasets. This analysis reveals a critical, yet previously overlooked, observation: as the velocity of actions increases, the performance of these methods variably d…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Accepted by AAAI 2025

  23. arXiv:2503.08575  [pdf, other]

    cs.CV

    Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation

    Authors: Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia

    Abstract: Recent diffusion model customization has shown impressive results in incorporating subject or style concepts with a handful of images. However, the modular composition of multiple concepts into a customized model, aimed to efficiently merge decentralized-trained concepts without influencing their identities, remains unresolved. Modular customization is essential for applications like concept styli…

    Submitted 11 March, 2025; originally announced March 2025.

  24. arXiv:2503.08038  [pdf, other]

    cs.LG cs.AI cs.CV

    Generalized Kullback-Leibler Divergence Loss

    Authors: Jiequan Cui, Beier Zhu, Qingshan Xu, Zhuotao Tian, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong

    Abstract: In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of (1) a weighted Mean Square Error (wMSE) loss and (2) a Cross-Entropy loss incorporating soft labels. Thanks to the decoupled structure of DKL loss, we have identified two areas for improvement. Firstly,…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: extension of our NeurIPS paper "Decoupled Kullback-Leibler Divergence Loss". arXiv admin note: substantial text overlap with arXiv:2305.13948

  25. arXiv:2503.06998  [pdf, other]

    cs.CV

    SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models

    Authors: Haoyu Zheng, Qifan Yu, Binghe Yu, Yang Dai, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang

    Abstract: Diffusion models have achieved remarkable progress in image and video stylization. However, most existing methods focus on single-style transfer, while video stylization involving multiple styles necessitates seamless transitions between them. We refer to this smooth style transition between video frames as video style morphing. Current approaches often generate stylized video frames with disconti…

    Submitted 10 March, 2025; originally announced March 2025.

  26. arXiv:2503.06730  [pdf, other]

    cs.LG

    Adaptive Test-Time Intervention for Concept Bottleneck Models

    Authors: Matthew Shen, Aliyah Hsu, Abhineet Agarwal, Bin Yu

    Abstract: Concept bottleneck models (CBM) aim to improve model interpretability by predicting human level "concepts" in a bottleneck within a deep learning model architecture. However, how the predicted concepts are used in predicting the target still either remains black-box or is simplified to maintain interpretability at the cost of prediction performance. We propose to use Fast Interpretable Greedy Sum-…

    Submitted 14 April, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

  27. arXiv:2503.06520  [pdf, other]

    cs.CV cs.MM

    Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

    Authors: Yuqi Liu, Bohao Peng, Zhisheng Zhong, Zihao Yue, Fanbin Lu, Bei Yu, Jiaya Jia

    Abstract: Traditional methods for reasoning segmentation rely on supervised fine-tuning with categorical labels and simple descriptions, limiting their out-of-domain generalization and lacking explicit reasoning processes. To address these limitations, we propose Seg-Zero, a novel framework that demonstrates remarkable generalizability and derives explicit chain-of-thought reasoning through cognitive reinforc…

    Submitted 9 March, 2025; originally announced March 2025.

  28. arXiv:2503.05161  [pdf, other]

    cs.CV cs.CE

    GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting

    Authors: Zheng Zhou, Zhe Li, Bo Yu, Lina Hu, Liang Dong, Zijian Yang, Xiaoli Liu, Ning Xu, Ziwei Wang, Yonghao Dang, Jianqin Yin

    Abstract: The automatic reconstruction of 3D computer-aided design (CAD) models from CAD sketches has recently gained significant attention in the computer vision community. Most existing methods, however, rely on vector CAD sketches and 3D ground truth for supervision, which are often difficult to be obtained in industrial applications and are sensitive to noise inputs. We propose viewing CAD reconstructio…

    Submitted 7 March, 2025; originally announced March 2025.

  29. arXiv:2503.04625  [pdf, other]

    cs.CL

    START: Self-taught Reasoner with Tools

    Authors: Chengpeng Li, Mingfeng Xue, Zhenru Zhang, Jiaxi Yang, Beichen Zhang, Xiang Wang, Bowen Yu, Binyuan Hui, Junyang Lin, Dayiheng Liu

    Abstract: Large reasoning models (LRMs) like OpenAI-o1 and DeepSeek-R1 have demonstrated remarkable capabilities in complex reasoning tasks through the utilization of long Chain-of-thought (CoT). However, these models often suffer from hallucinations and inefficiencies due to their reliance solely on internal reasoning processes. In this paper, we introduce START (Self-Taught Reasoner with Tools), a novel t…

    Submitted 7 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: 38 pages, 5 figures and 6 tables

  30. arXiv:2503.03705  [pdf, other]

    cs.CL cs.LG

    Effective LLM Knowledge Learning via Model Generalization

    Authors: Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia

    Abstract: Large language models (LLMs) are trained on enormous documents that contain extensive world knowledge. However, it is still not well-understood how knowledge is acquired via autoregressive pre-training. This lack of understanding greatly hinders effective knowledge learning, especially for continued pretraining on up-to-date information, as this evolving information often lacks diverse repetitions…

    Submitted 5 March, 2025; originally announced March 2025.

  31. arXiv:2503.02356  [pdf, other]

    cs.DC

    Efficient Long Context Fine-tuning with Chunk Flow

    Authors: Xiulong Yuan, Hongtao Xu, Wenting Shen, Ang Wang, Xiafei Qiu, Jie Zhang, Yuqiong Liu, Bowen Yu, Junyang Lin, Mingzhen Li, Weile Jia, Yong Li, Wei Lin

    Abstract: Long context fine-tuning of large language models (LLMs) involves training on datasets that are predominantly composed of short sequences and a small proportion of longer sequences. However, existing approaches overlook this long-tail distribution and employ training strategies designed specifically for long sequences. Moreover, these approaches also fail to address the challenges posed by variable…

    Submitted 5 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  32. arXiv:2502.20500  [pdf, other]

    cs.RO math.OC

    Equivariant Reinforcement Learning Frameworks for Quadrotor Low-Level Control

    Authors: Beomyeol Yu, Taeyoung Lee

    Abstract: Improving sampling efficiency and generalization capability is critical for the successful data-driven control of quadrotor unmanned aerial vehicles (UAVs) that are inherently unstable. While various reinforcement learning (RL) approaches have been applied to autonomous quadrotor flight, they often require extensive training data, posing multiple challenges and safety risks in practice. To address…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 14 pages, 8 figures

  33. arXiv:2502.16906  [pdf, other]

    cs.CL

    AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models

    Authors: Qin Zhu, Fei Huang, Runyu Peng, Keming Lu, Bowen Yu, Qinyuan Cheng, Xipeng Qiu, Xuanjing Huang, Junyang Lin

    Abstract: While logical reasoning evaluation of Large Language Models (LLMs) has attracted significant attention, existing benchmarks predominantly rely on multiple-choice formats that are vulnerable to random guessing, leading to overestimated performance and substantial performance fluctuations. To obtain more accurate assessments of models' reasoning capabilities, we propose an automated method for synth…

    Submitted 24 February, 2025; originally announced February 2025.

  34. arXiv:2502.13870  [pdf, other]

    cs.LG cs.AI cs.CL cs.IT

    SPEX: Scaling Feature Interaction Explanations for LLMs

    Authors: Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Kannan Ramchandran, Bin Yu

    Abstract: Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths ($\approx 20$). We propose Spectral Explainer (SPEX), a model-agnostic interaction attr…

    Submitted 19 February, 2025; originally announced February 2025.

  35. arXiv:2502.13383  [pdf, other]

    cs.CL cs.CV cs.LG

    MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification

    Authors: Linzhuang Sun, Hao Liang, Jingxuan Wei, Bihui Yu, Tianpeng Li, Fan Yang, Zenan Zhou, Wentao Zhang

    Abstract: According to the Test-Time Scaling, the integration of External Slow-Thinking with the Verify mechanism has been demonstrated to enhance multi-round reasoning in large language models (LLMs). However, in the multimodal (MM) domain, there is still a lack of a strong MM-Verifier. In this paper, we introduce MM-Verifier and MM-Reasoner to enhance multimodal reasoning through longer inference and more…

    Submitted 18 February, 2025; originally announced February 2025.

  36. arXiv:2502.13283  [pdf, other]

    cs.LG stat.ML

    Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression

    Authors: Jingfeng Wu, Peter Bartlett, Matus Telgarsky, Bin Yu

    Abstract: In overparameterized logistic regression, gradient descent (GD) iterates diverge in norm while converging in direction to the maximum $\ell_2$-margin solution -- a phenomenon known as the implicit bias of GD. This work investigates additional regularization effects induced by early stopping in well-specified high-dimensional logistic regression. We first demonstrate that the excess logistic risk v…

    Submitted 18 February, 2025; originally announced February 2025.

  37. arXiv:2502.12751  [pdf, other]

    cs.LG

    Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table

    Authors: Haoyuan Wu, Haisheng Zheng, Shoubo Hu, Zhuolun He, Bei Yu

    Abstract: Logic synthesis, a critical stage in electronic design automation (EDA), optimizes gate-level circuits to minimize power consumption and area occupancy in integrated circuits (ICs). Traditional logic synthesis tools rely on human-designed heuristics, often yielding suboptimal results. Although differentiable architecture search (DAS) has shown promise in generating circuits from truth tables, it f…

    Submitted 18 February, 2025; originally announced February 2025.

  38. arXiv:2502.12732  [pdf, other]

    cs.LG

    Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment

    Authors: Haoyuan Wu, Haisheng Zheng, Yuan Pu, Bei Yu

    Abstract: Understanding the structure and function of circuits is crucial for electronic design automation (EDA). Circuits can be formulated as And-Inverter graphs (AIGs), enabling efficient implementation of representation learning through graph neural networks (GNNs). Masked modeling paradigms have been proven effective in graph representation learning. However, masking augmentation to original circuits w…

    Submitted 18 February, 2025; originally announced February 2025.

  39. arXiv:2502.12502  [pdf, other]

    cs.CL

    Efficient OpAmp Adaptation for Zoom Attention to Golden Contexts

    Authors: Haoyuan Wu, Rui Ming, Haisheng Zheng, Zhuolun He, Bei Yu

    Abstract: Large language models (LLMs) have shown significant promise in question-answering (QA) tasks, particularly in retrieval-augmented generation (RAG) scenarios and long-context applications. However, their performance is hindered by noisy reference documents, which often distract from essential information. Despite fine-tuning efforts, Transformer-based architectures struggle to prioritize relevant c…

    Submitted 17 February, 2025; originally announced February 2025.

  40. arXiv:2502.12159  [pdf, other]

    physics.soc-ph cs.CL

    Causal Interpretations in Observational Studies: The Role of Sociocultural Backgrounds and Team Dynamics

    Authors: Jun Wang, Bei Yu

    Abstract: The prevalence of drawing causal conclusions from observational studies has raised concerns about potential exaggeration in science communication. While some believe causal language should only apply to randomized controlled trials, others argue that rigorous methods can justify causal claims in observational studies. Ideally, causal language should align with the strength of the evidence. However…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: 13 pages, 4 figures, 2 tables

  41. arXiv:2502.11095  [pdf, other]

    cs.CL

    A Survey of Large Language Models in Psychotherapy: Current Landscape and Future Directions

    Authors: Hongbin Na, Yining Hua, Zimu Wang, Tao Shen, Beibei Yu, Lilin Wang, Wei Wang, John Torous, Ling Chen

    Abstract: Mental health remains a critical global challenge, with increasing demand for accessible, effective interventions. Large language models (LLMs) offer promising solutions in psychotherapy by enhancing the assessment, diagnosis, and treatment of mental health conditions through dynamic, context-aware interactions. This survey provides a comprehensive overview of the current landscape of LLM applicat…

    Submitted 16 February, 2025; originally announced February 2025.

    Comments: in progress

  42. arXiv:2502.10857  [pdf, other]

    cs.CL

    Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation

    Authors: Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu

    Abstract: Recently, with the development of tool-calling capabilities in large language models (LLMs), these models have demonstrated significant potential for automating electronic design automation (EDA) flows by interacting with EDA tool APIs via EDA scripts. However, considering the limited understanding of EDA tools, LLMs face challenges in practical scenarios where diverse interfaces of EDA tools exis…

    Submitted 15 February, 2025; originally announced February 2025.

  43. arXiv:2502.09838  [pdf, other

    cs.CV cs.AI

    HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

    Authors: Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, Haoyuan Li, Wanggui He, Hao Jiang, Mengze Li, Xiaohui Song, Siliang Tang, Jun Xiao, Hui Lin, Yueting Zhuang, Beng Chin Ooi

    Abstract: We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (LLMs). This is achieved through a novel heterogeneous low-r…

    Submitted 21 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: added project page

  44. arXiv:2502.09793  [pdf, other

    cs.CV

    Noise Controlled CT Super-Resolution with Conditional Diffusion Model

    Authors: Yuang Wang, Siyeop Yoon, Rui Hu, Baihui Yu, Duhgoon Lee, Rajiv Gupta, Li Zhang, Zhiqiang Chen, Dufan Wu

    Abstract: Improving the spatial resolution of CT images is a meaningful yet challenging task, often accompanied by the issue of noise amplification. This article introduces an innovative framework for noise-controlled CT super-resolution utilizing the conditional diffusion model. The model is trained on hybrid datasets, combining noise-matched simulation data with segmented details from real data. Experimen…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: The 8th International Conference on Image Formation in X-Ray Computed Tomography, Bamberg, Germany, August 5 - 9, 2024

  45. arXiv:2502.06838  [pdf, other

    cs.LG

    TorchResist: Open-Source Differentiable Resist Simulator

    Authors: Zixiao Wang, Jieya Zhou, Su Zheng, Shuo Yin, Kaichao Liang, Shoubo Hu, Xiao Chen, Bei Yu

    Abstract: Recent decades have witnessed remarkable advancements in artificial intelligence (AI), including large language models (LLMs), image and video generative models, and embodied AI systems. These advancements have led to an explosive increase in the demand for computational power, challenging the limits of Moore's Law. Optical lithography, a critical technology in semiconductor manufacturing, faces s…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: SPIE Advanced Lithography + Patterning, 2025

  46. arXiv:2502.05817  [pdf, other

    cs.RO eess.SY

    DreamFLEX: Learning Fault-Aware Quadrupedal Locomotion Controller for Anomaly Situation in Rough Terrains

    Authors: Seunghyun Lee, I Made Aswin Nahrendra, Dongkyu Lee, Byeongho Yu, Minho Oh, Hyun Myung

    Abstract: Recent advances in quadrupedal robots have demonstrated impressive agility and the ability to traverse diverse terrains. However, hardware issues, such as motor overheating or joint locking, may occur during long-distance walking or traversing through rough terrains leading to locomotion failures. Although several studies have proposed fault-tolerant control methods for quadrupedal robots, there a…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted for ICRA 2025. Project site is available at https://dreamflex.github.io/

  47. arXiv:2502.05409  [pdf, other

    cs.CV cs.AI cs.LG cs.RO eess.SY

    Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment

    Authors: Maneesha Wickramasuriya, Beomyeol Yu, Taeyoung Lee, Murray Snyder

    Abstract: This paper proposes a vision-in-the-loop simulation environment for deep monocular pose estimation of a UAV operating in an ocean environment. Recently, a deep neural network with a transformer architecture has been successfully trained to estimate the pose of a UAV relative to the flight deck of a research vessel, overcoming several limitations of GPS-based approaches. However, validating the dee…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 8 pages, 15 figures, conference

  48. arXiv:2502.04416  [pdf, other

    cs.LG cs.AI

    CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference

    Authors: Zehua Pei, Lancheng Zou, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu

    Abstract: Large language models (LLMs) achieve impressive performance by scaling model parameters, but this comes with significant inference overhead. Feed-forward networks (FFNs), which dominate LLM parameters, exhibit high activation sparsity in hidden neurons. To exploit this, researchers have proposed using a mixture-of-experts (MoE) architecture, where only a subset of parameters is activated. However,…

    Submitted 6 February, 2025; originally announced February 2025.

  49. arXiv:2502.02869  [pdf, other

    cs.LG cs.AI

    OmniRL: In-Context Reinforcement Learning by Large-Scale Meta-Training in Randomized Worlds

    Authors: Fan Wang, Pengtao Shao, Yiming Zhang, Bo Yu, Shaoshan Liu, Ning Ding, Yang Cao, Yu Kang, Haifeng Wang

    Abstract: We introduce OmniRL, a highly generalizable in-context reinforcement learning (ICRL) model that is meta-trained on hundreds of thousands of diverse tasks. These tasks are procedurally generated by randomizing state transitions and rewards within Markov Decision Processes. To facilitate this extensive meta-training, we propose two key innovations: 1. An efficient data synthesis pipeline for ICRL, w…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: Preprint

  50. arXiv:2502.02779  [pdf, other

    cs.CV cs.AI

    3D Foundation AI Model for Generalizable Disease Detection in Head Computed Tomography

    Authors: Weicheng Zhu, Haoxu Huang, Huanze Tang, Rushabh Musthyala, Boyang Yu, Long Chen, Emilio Vega, Thomas O'Donnell, Seena Dehkharghani, Jennifer A. Frontera, Arjun V. Masurkar, Kara Melmed, Narges Razavian

    Abstract: Head computed tomography (CT) imaging is a widely-used imaging modality with multitudes of medical indications, particularly in assessing pathology of the brain, skull, and cerebrovascular system. It is commonly the first-line imaging in neurologic emergencies given its rapidity of image acquisition, safety, cost, and ubiquity. Deep learning models may facilitate detection of a wide range of disea…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: Under Review Preprint