Showing 1–50 of 137 results for author: Pan, D

  1. arXiv:2509.15867  [pdf, ps, other]

    cs.HC

    Understanding the Role of Large Language Models in Competitive Programming

    Authors: Dongyijie Primo Pan, Ji Zhu, Lan Luo, Zhiqi Gao, Xin Tong, Pan Hui

    Abstract: This paper investigates how large language models (LLMs) are reshaping competitive programming. The field functions as an intellectual contest within computer science education and is marked by rapid iteration, real-time feedback, transparent solutions, and strict integrity norms. Prior work has evaluated LLM performance on contest problems, but little is known about how human stakeholders -- con…

    Submitted 19 September, 2025; originally announced September 2025.

  2. arXiv:2509.14169  [pdf, ps, other]

    cs.LG

    TopoSizing: An LLM-aided Framework of Topology-based Understanding and Sizing for AMS Circuits

    Authors: Ziming Wei, Zichen Kong, Yuan Wang, David Z. Pan, Xiyuan Tang

    Abstract: Analog and mixed-signal circuit design remains challenging due to the shortage of high-quality data and the difficulty of embedding domain knowledge into automated flows. Traditional black-box optimization achieves sampling efficiency but lacks circuit understanding, which often causes evaluations to be wasted in low-value regions of the design space. In contrast, learning-based methods embed stru…

    Submitted 17 September, 2025; originally announced September 2025.

  3. arXiv:2509.02208  [pdf, ps, other]

    cs.LG cs.AI

    Baichuan-M2: Scaling Medical Capability with Large Verifier System

    Authors: Baichuan-M2 Team, Chengfeng Dou, Chong Liu, Fan Yang, Fei Li, Jiyuan Jia, Mingyang Chen, Qiang Ju, Shuai Wang, Shunya Dang, Tianpeng Li, Xiangrong Zeng, Yijie Zhou, Chenzheng Zhu, Da Pan, Fei Deng, Guangwei Ai, Guosheng Dong, Hongda Zhang, Jinyang Tai, Jixiang Hong, Kai Lu, Linzhuang Sun, Peidong Guo, et al. (10 additional authors not shown)

    Abstract: As large language models (LLMs) advance in conversational and reasoning capabilities, their practical application in healthcare has become a critical research focus. However, there is a notable gap between the performance of medical LLMs on static benchmarks such as USMLE and their utility in real-world clinical decision-making. This discrepancy arises because traditional exams fail to capture the…

    Submitted 2 September, 2025; originally announced September 2025.

    Comments: Baichuan-M2 Technical Report

  4. arXiv:2509.01322  [pdf, ps, other]

    cs.CL cs.AI cs.DC cs.LG

    LongCat-Flash Technical Report

    Authors: Meituan LongCat Team, Bayan, Bei Li, Bingye Lei, Bo Wang, Bolin Rong, Chao Wang, Chao Zhang, Chen Gao, Chen Zhang, Cheng Sun, Chengcheng Han, Chenguang Xi, Chi Zhang, Chong Peng, Chuan Qin, Chuyu Zhang, Cong Chen, Congkui Wang, Dan Ma, Daoru Pan, Defei Bu, Dengchang Zhao, Deyang Kong, Dishan Liu, et al. (157 additional authors not shown)

    Abstract: We introduce LongCat-Flash, a 560-billion-parameter Mixture-of-Experts (MoE) language model designed for both computational efficiency and advanced agentic capabilities. Stemming from the need for scalable efficiency, LongCat-Flash adopts two novel designs: (a) Zero-computation Experts, which enables dynamic computational budget allocation and activates 18.6B-31.3B (27B on average) per token depen…

    Submitted 19 September, 2025; v1 submitted 1 September, 2025; originally announced September 2025.

  5. arXiv:2508.13666  [pdf, ps, other]

    cs.SE cs.AI

    The Hidden Cost of Readability: How Code Formatting Silently Consumes Your LLM Budget

    Authors: Dangfeng Pan, Zhensu Sun, Cenyuan Zhang, David Lo, Xiaoning Du

    Abstract: Source code is usually formatted with elements like indentation and newlines to improve readability for human developers. However, these visual aids do not seem to be beneficial for large language models (LLMs) in the same way since the code is processed as a linear sequence of tokens. Furthermore, these additional tokens can lead to increased computational costs and longer response times for LLMs…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: Accepted by ICSE'26 (First Cycle)

  6. arXiv:2508.02518  [pdf, ps, other]

    cs.LG

    AnalogCoder-Pro: Unifying Analog Circuit Generation and Optimization via Multi-modal LLMs

    Authors: Yao Lai, Souradip Poddar, Sungyoung Lee, Guojin Chen, Mengkang Hu, Bei Yu, Ping Luo, David Z. Pan

    Abstract: Despite recent advances, analog front-end design still relies heavily on expert intuition and iterative simulations, which limits the potential for automation. We present AnalogCoder-Pro, a multimodal large language model (LLM) framework that integrates generative and optimization techniques. The framework features a multimodal diagnosis-and-repair feedback loop that uses simulation error messages…

    Submitted 31 August, 2025; v1 submitted 4 August, 2025; originally announced August 2025.

  7. arXiv:2507.23541  [pdf, ps, other]

    cs.CL

    Med-R$^3$: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning

    Authors: Keer Lu, Zheng Liang, Youquan Li, Jiejun Tan, Da Pan, Shusen Zhang, Guosheng Dong, Huang Leng, Bin Cui, Wentao Zhang

    Abstract: In medical scenarios, effectively retrieving external knowledge and leveraging it for rigorous logical reasoning is of significant importance. Despite their potential, existing work has predominantly focused on enhancing either retrieval or reasoning capabilities of the models in isolation, with little attention given to their joint optimization, which leads to limited coordination between the two…

    Submitted 2 August, 2025; v1 submitted 31 July, 2025; originally announced July 2025.

  8. arXiv:2507.01478  [pdf, ps, other]

    cs.CV

    Active Control Points-based 6DoF Pose Tracking for Industrial Metal Objects

    Authors: Chentao Shen, Ding Pan, Mingyu Mei, Zaixing He, Xinyue Zhao

    Abstract: Visual pose tracking has played an increasingly vital role in industrial contexts in recent years. However, pose tracking for industrial metal objects remains a challenging task, especially in real-world environments, due to the reflective characteristics of metal objects. To address this issue, we propose a novel 6DoF pose tracking method based on active control points. The method uses imag…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: preprint version

  9. arXiv:2506.19977  [pdf, ps, other]

    cs.AI

    Context Attribution with Multi-Armed Bandit Optimization

    Authors: Deng Pan, Keerthiram Murugesan, Nuno Moniz, Nitesh Chawla

    Abstract: Understanding which parts of the retrieved context contribute to a large language model's generated answer is essential for building interpretable and trustworthy generative QA systems. We propose a novel framework that formulates context attribution as a combinatorial multi-armed bandit (CMAB) problem. Each context segment is treated as a bandit arm, and we employ Combinatorial Thompson Sampling…

    Submitted 24 June, 2025; originally announced June 2025.

  10. arXiv:2506.10331  [pdf, ps, other]

    cs.CV eess.IV

    Research on Audio-Visual Quality Assessment Dataset and Method for User-Generated Omnidirectional Video

    Authors: Fei Zhao, Da Pan, Zelu Qi, Ping Shi

    Abstract: In response to the rising prominence of the Metaverse, omnidirectional videos (ODVs) have garnered notable interest, gradually shifting from professional-generated content (PGC) to user-generated content (UGC). However, the study of audio-visual quality assessment (AVQA) within ODVs remains limited. To address this, we construct a dataset of UGC omnidirectional audio and video (A/V) content. The v…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Our paper has been accepted by ICME 2025

  11. arXiv:2506.04715  [pdf, ps, other]

    cs.CV

    Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model

    Authors: Zelu Qi, Ping Shi, Chaoyang Zhang, Shuqi Wang, Fei Zhao, Da Pan, Zefeng Ying

    Abstract: The development of AI-Generated Video (AIGV) technology has been remarkable in recent years, significantly transforming the paradigm of video content production. However, AIGVs still suffer from noticeable visual quality defects, such as noise, blurriness, frame jitter and low dynamic degree, which severely impact the user's viewing experience. Therefore, an effective automatic visual quality asse…

    Submitted 11 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted by CVPR Workshop 2025

  12. arXiv:2505.11815  [pdf, ps, other]

    cs.CV

    UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings

    Authors: Jiajun Qin, Yuan Pu, Zhuolun He, Seunggeun Kim, David Z. Pan, Bei Yu

    Abstract: Current research has explored vision-language models for multi-modal embedding tasks, such as information retrieval, visual grounding, and classification. However, real-world scenarios often involve diverse modality combinations between queries and targets, such as text and image to text, text and image to text and image, and text to text and image. These diverse combinations pose significant chal…

    Submitted 16 May, 2025; originally announced May 2025.

  13. arXiv:2505.10928  [pdf, other]

    cs.LG

    A Dataset for Spatiotemporal-Sensitive POI Question Answering

    Authors: Xiao Han, Dayan Pan, Xiangyu Zhao, Xuyuan Hu, Zhaolin Deng, Xiangjie Kong, Guojiang Shen

    Abstract: Spatiotemporal relationships are critical in data science, as many prediction and reasoning tasks require analysis across both spatial and temporal dimensions--for instance, navigating an unfamiliar city involves planning itineraries that sequence locations and timing cultural experiences. However, existing Question-Answering (QA) datasets lack sufficient spatiotemporal-sensitive questions, making…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Under Review

  14. arXiv:2504.19959  [pdf, ps, other]

    cs.AR

    From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification

    Authors: Junhao Ye, Yuchen Hu, Ke Xu, Dingrong Pan, Qichun Chen, Jie Zhou, Shuai Zhao, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang

    Abstract: Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of the total development effort. While the Universal Verification Methodology (UVM) is widely used in industry to improve verification efficiency through structured and reusable testbenches, constructing these testbenches and generating sufficient stimuli remain challenging. These challenges arise…

    Submitted 19 August, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  15. arXiv:2504.14482  [pdf, other]

    cs.CL cs.SD

    DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue

    Authors: Xiang Li, Duyi Pan, Hongru Xiao, Jiale Han, Jing Tang, Jiabao Ma, Wei Wang, Bo Cheng

    Abstract: Speech synthesis is crucial for human-computer interaction, enabling natural and intuitive communication. However, existing datasets involve high construction costs due to manual annotation and suffer from limited character diversity, contextual scenarios, and emotional expressiveness. To address these issues, we propose DialogueAgents, a novel hybrid agent-based speech synthesis framework, which…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Accepted by ICME 2025. Dataset and code are publicly available: https://github.com/uirlx/DialogueAgents

  16. AD-Det: Boosting Object Detection in UAV Images with Focused Small Objects and Balanced Tail Classes

    Authors: Zhenteng Li, Sheng Lian, Dengfeng Pan, Youlin Wang, Wei Liu

    Abstract: Object detection in Unmanned Aerial Vehicle (UAV) images poses significant challenges due to complex scale variations and class imbalance among objects. Existing methods often address these challenges separately, overlooking the intricate nature of UAV images and the potential synergy between them. In response, this paper proposes AD-Det, a novel framework employing a coherent coarse-to-fine strat…

    Submitted 27 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: Published in Remote Sensing

    Journal ref: Remote Sens. 2025, 17(9), 1556

  17. arXiv:2504.05535  [pdf, other]

    cs.CL

    COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values

    Authors: M-A-P Team, Siwei Wu, Jincheng Ren, Xinrun Du, Shuyue Guo, Xingwei Qu, Yiming Liang, Jie Liu, Yunwen Li, Tianyu Zheng, Boyu Feng, Huaqing Yuan, Zenith Wang, Jiaheng Liu, Wenhao Huang, Chenglin Cai, Haoran Que, Jian Yang, Yuelin Bai, Zekun Moore Wang, Zhouliang Yu, Qunshu Lin, Ding Pan, Yuchen Jiang, Tiannan Wang, et al. (7 additional authors not shown)

    Abstract: Aligning large language models (LLMs) with human preferences has achieved remarkable success. However, existing Chinese preference datasets are limited by small scale, narrow domain coverage, and lack of rigorous data validation. Additionally, the reliance on human annotators for instruction and response labeling significantly constrains the scalability of human preference datasets. To address the…

    Submitted 7 April, 2025; originally announced April 2025.

  18. arXiv:2504.03476  [pdf, other]

    cs.CV

    ATM-Net: Anatomy-Aware Text-Guided Multi-Modal Fusion for Fine-Grained Lumbar Spine Segmentation

    Authors: Sheng Lian, Dengfeng Pan, Jianlong Cai, Guang-Yong Chen, Zhun Zhong, Zhiming Luo, Shen Zhao, Shuo Li

    Abstract: Accurate lumbar spine segmentation is crucial for diagnosing spinal disorders. Existing methods typically use coarse-grained segmentation strategies that lack the fine detail needed for precise diagnosis. Additionally, their reliance on visual-only models hinders the capture of anatomical semantics, leading to misclassified categories and poor segmentation details. To address these limitations, we…

    Submitted 4 April, 2025; originally announced April 2025.

  19. arXiv:2503.24320  [pdf, ps, other]

    cs.CV

    Can Test-Time Scaling Improve World Foundation Model?

    Authors: Wenyan Cong, Hanqing Zhu, Peihao Wang, Bangya Liu, Dejia Xu, Kevin Wang, David Z. Pan, Yan Wang, Zhiwen Fan, Zhangyang Wang

    Abstract: World foundation models, which simulate the physical world by predicting future states from current observations and inputs, have become central to many applications in physical intelligence, including autonomous driving and robotics. However, these models require substantial computational resources for pretraining and are further constrained by available data during post-training. As such, scalin…

    Submitted 8 August, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

    Comments: Accepted by COLM2025

  20. arXiv:2503.22958  [pdf]

    cs.AR cs.AI

    Late Breaking Results: Breaking Symmetry- Unconventional Placement of Analog Circuits using Multi-Level Multi-Agent Reinforcement Learning

    Authors: Supriyo Maji, Linran Zhao, Souradip Poddar, David Z. Pan

    Abstract: Layout-dependent effects (LDEs) significantly impact analog circuit performance. Traditionally, designers have relied on symmetric placement of circuit components to mitigate variations caused by LDEs. However, due to the non-linear nature of these effects, conventional methods often fall short. We propose an objective-driven, multi-level, multi-agent Q-learning framework to explore unconventional des…

    Submitted 10 April, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

    Comments: 2 pages, 3 figures, Proceedings of the 62nd ACM/IEEE Design Automation Conference (DAC), 2025

  21. arXiv:2502.17239  [pdf, other]

    cs.CL cs.SD eess.AS

    Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction

    Authors: Tianpeng Li, Jun Liu, Tao Zhang, Yuanbo Fang, Da Pan, Mingrui Wang, Zheng Liang, Zehuan Li, Mingan Lin, Guosheng Dong, Jianhua Xu, Haoze Sun, Zenan Zhou, Weipeng Chen

    Abstract: We introduce Baichuan-Audio, an end-to-end audio large language model that seamlessly integrates audio understanding and generation. It features a text-guided aligned speech generation mechanism, enabling real-time speech interaction with both comprehension and generation capabilities. Baichuan-Audio leverages a pre-trained ASR model, followed by multi-codebook discretization of speech at a frame…

    Submitted 24 February, 2025; originally announced February 2025.

  22. arXiv:2502.12671  [pdf, other]

    cs.CL

    Baichuan-M1: Pushing the Medical Capability of Large Language Models

    Authors: Bingning Wang, Haizhou Zhao, Huozhi Zhou, Liang Song, Mingyu Xu, Wei Cheng, Xiangrong Zeng, Yupeng Zhang, Yuqi Huo, Zecheng Wang, Zhengyun Zhao, Da Pan, Fei Kou, Fei Li, Fuzhong Chen, Guosheng Dong, Han Liu, Hongda Zhang, Jin He, Jinjie Yang, Kangxi Wu, Kegeng Wu, Lei Su, Linlin Niu, Linzhuang Sun, et al. (17 additional authors not shown)

    Abstract: The current generation of large language models (LLMs) is typically designed for broad, general-purpose applications, while domain-specific LLMs, especially in vertical fields like medicine, remain relatively scarce. In particular, the development of highly efficient and practical LLMs for the medical domain is challenging due to the complexity of medical knowledge and the limited availability of…

    Submitted 5 March, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: 33 pages, technical report

  23. arXiv:2502.08949  [pdf, other]

    cs.LG

    DICE: Device-level Integrated Circuits Encoder with Graph Contrastive Pretraining

    Authors: Sungyoung Lee, Ziyi Wang, Seunggeun Kim, Taekyun Lee, Yao Lai, David Z. Pan

    Abstract: Pretraining models with unsupervised graph representation learning has led to significant advancements in domains such as social network analysis, molecular design, and electronic design automation (EDA). However, prior work in EDA has mainly focused on pretraining models for digital circuits, overlooking analog and mixed-signal circuits. To bridge this gap, we introduce DICE, a Device-level Integ…

    Submitted 19 May, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

  24. arXiv:2502.01670  [pdf]

    cs.AR cs.ET cs.LG

    Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression

    Authors: Shupeng Ning, Hanqing Zhu, Chenghao Feng, Jiaqi Gu, David Z. Pan, Ray T. Chen

    Abstract: The rapid growth in computing demands, particularly driven by artificial intelligence applications, has begun to exceed the capabilities of traditional electronic hardware. Optical computing offers a promising alternative due to its parallelism, high computational speed, and low power consumption. However, existing photonic integrated circuits are constrained by large footprints, costly electro-op…

    Submitted 23 July, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Journal ref: Optica Vol. 12, Issue 7, 2025, 1079-1089

  25. arXiv:2501.15368  [pdf, other]

    cs.CL cs.SD eess.AS

    Baichuan-Omni-1.5 Technical Report

    Authors: Yadong Li, Jun Liu, Tao Zhang, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan, Chong Li, Yuanbo Fang, Dongdong Kuang, Mingrui Wang, Chenglin Zhu, Youwei Zhang, Hongyu Guo, Fengyu Zhang, Yuran Wang, Bowen Ding, Wei Song, Xu Li, Yuqi Huo, Zheng Liang, et al. (68 additional authors not shown)

    Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip…

    Submitted 25 January, 2025; originally announced January 2025.

  26. arXiv:2501.11885  [pdf, other]

    cs.CL

    Med-R$^2$: Crafting Trustworthy LLM Physicians via Retrieval and Reasoning of Evidence-Based Medicine

    Authors: Keer Lu, Zheng Liang, Zhuoran Zhang, Da Pan, Shusen Zhang, Xin Wu, Zenan Zhou, Guosheng Dong, Bin Cui, Tengjiao Wang, Wentao Zhang

    Abstract: Large Language Models (LLMs) have exhibited remarkable capabilities in clinical scenarios. Despite their potential, existing works face challenges when applying LLMs to medical settings. Strategies relying on training with medical datasets are highly cost-intensive and may suffer from outdated training data. Leveraging external knowledge bases is a suitable alternative, yet it faces obstacles such…

    Submitted 15 May, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

  27. arXiv:2501.08545  [pdf, ps, other]

    cs.CV

    T2VEval: Benchmark Dataset and Objective Evaluation Method for T2V-generated Videos

    Authors: Zelu Qi, Ping Shi, Shuqi Wang, Chaoyang Zhang, Fei Zhao, Zefeng Ying, Da Pan, Xi Yang, Zheqi He, Teng Dai

    Abstract: Recent advances in text-to-video (T2V) technology, as demonstrated by models such as Runway Gen-3, Pika, Sora, and Kling, have significantly broadened the applicability and popularity of the technology. This progress has created a growing demand for accurate quality assessment metrics to evaluate the perceptual quality of T2V-generated videos and optimize video generation models. However, assessin…

    Submitted 6 August, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

    Comments: This paper has been accepted by DISPLAYS

  28. arXiv:2501.08453  [pdf, other]

    cs.CV cs.LG

    Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

    Authors: Weichen Fan, Chenyang Si, Junhao Song, Zhenyu Yang, Yinan He, Long Zhuo, Ziqi Huang, Ziyue Dong, Jingwen He, Dongwei Pan, Yi Wang, Yuming Jiang, Yaohui Wang, Peng Gao, Xinyuan Chen, Hengjie Li, Dahua Lin, Yu Qiao, Ziwei Liu

    Abstract: We present Vchitect-2.0, a parallel transformer architecture designed to scale up video diffusion models for large-scale text-to-video generation. The overall Vchitect-2.0 system has several key designs. (1) By introducing a novel Multimodal Diffusion Block, our approach achieves consistent alignment between text descriptions and generated video frames, while maintaining temporal coherence across…

    Submitted 14 January, 2025; originally announced January 2025.

  29. arXiv:2412.10658  [pdf, ps, other]

    stat.ME cs.AI cs.LG

    Combining Priors with Experience: Confidence Calibration Based on Binomial Process Modeling

    Authors: Jinzong Dong, Zhaohui Jiang, Dong Pan, Haoyang Yu

    Abstract: Confidence calibration of classification models is a technique to estimate the true posterior probability of the predicted class, which is critical for ensuring reliable decision-making in practical applications. Existing confidence calibration methods mostly use statistical techniques to estimate the calibration curve from data or fit a user-defined calibration function, but often overlook fully…

    Submitted 18 February, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI-25

  30. arXiv:2412.05270  [pdf, other]

    cs.LG cs.AI cs.PF

    APOLLO: SGD-like Memory, AdamW-level Performance

    Authors: Hanqing Zhu, Zhenyu Zhang, Wenyan Cong, Xi Liu, Sem Park, Vikas Chandra, Bo Long, David Z. Pan, Zhangyang Wang, Jinwon Lee

    Abstract: Large language models (LLMs) are notoriously memory-intensive during training, particularly with the popular AdamW optimizer. This memory burden necessitates using more or higher-end GPUs or reducing batch sizes, limiting training scalability and throughput. To address this, various memory-efficient optimizers have been proposed to reduce optimizer memory usage. However, they face critical challen…

    Submitted 17 February, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

    Comments: Accepted to MLSys 2025; the newest version with new experiments

  31. arXiv:2412.03058  [pdf, other]

    cs.CV

    Revisiting Energy-Based Model for Out-of-Distribution Detection

    Authors: Yifan Wu, Xichen Ye, Songmin Dai, Dengye Pan, Xiaoqiang Li, Weizhong Zhang, Yifan Chen

    Abstract: Out-of-distribution (OOD) detection is an essential approach to robustifying deep learning models, enabling them to identify inputs that fall outside of their trained distribution. Existing OOD detection methods usually depend on crafted data, such as specific outlier datasets or elaborate data augmentations. While this is reasonable, the frequent mismatch between crafted data and OOD data limits…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: This work has been submitted to the IEEE for possible publication

    MSC Class: 68T05; 68T45 ACM Class: I.2.10; I.5.1

  32. arXiv:2412.02421  [pdf, other]

    cs.CV

    TimeWalker: Personalized Neural Space for Lifelong Head Avatars

    Authors: Dongwei Pan, Yang Li, Hongsheng Li, Kwan-Yee Lin

    Abstract: We present TimeWalker, a novel framework that models realistic, full-scale 3D head avatars of a person on a lifelong scale. Unlike current human head avatar pipelines that capture identity at the momentary level (e.g., instant photography or short videos), TimeWalker constructs a person's comprehensive identity from unstructured data collection over his/her various life stages, offering a paradigm to…

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: Project Page: https://timewalker2024.github.io/timewalker.github.io/ , Video: https://www.youtube.com/watch?v=x8cpOVMY_ko

  33. arXiv:2411.16238  [pdf, other]

    cs.AR

    UVLLM: An Automated Universal RTL Verification Framework using LLMs

    Authors: Yuchen Hu, Junhao Ye, Ke Xu, Jialin Sun, Shiyue Zhang, Xinyao Jiao, Dingrong Pan, Jie Zhou, Ning Wang, Weiwei Shan, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang

    Abstract: Verifying hardware designs in embedded systems is crucial but often labor-intensive and time-consuming. While existing solutions have improved automation, they frequently rely on unrealistic assumptions. To address these challenges, we introduce a novel framework, UVLLM, which combines Large Language Models (LLMs) with the Universal Verification Methodology (UVM) to relax these assumptions. UVLLM…

    Submitted 25 November, 2024; originally announced November 2024.

  34. arXiv:2411.16019  [pdf, other]

    cs.LG

    M3: Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling

    Authors: Youngmin Oh, Jinje Park, Seunggeun Kim, Taejin Paik, David Pan, Bosun Hwang

    Abstract: Recent advancements in reinforcement learning (RL) for analog circuit optimization have demonstrated significant potential for improving sample efficiency and generalization across diverse circuit topologies and target specifications. However, there are challenges such as high computational overhead and the need for bespoke models for each circuit. To address them, we propose M3, a novel Model-based…

    Submitted 24 November, 2024; originally announced November 2024.

  35. arXiv:2411.11266  [pdf, other]

    cs.CL

    VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs

    Authors: Keer Lu, Keshi Zhao, Zhuoran Zhang, Zheng Liang, Da Pan, Shusen Zhang, Xin Wu, Guosheng Dong, Bin Cui, Tengjiao Wang, Wentao Zhang

    Abstract: As demonstrated by the proprietary Large Language Models (LLMs) such as GPT and Claude series, LLMs have the potential to achieve remarkable proficiency across a wide range of domains, including law, medicine, finance, science, code, etc., all within a single model. These capabilities are further augmented during the Supervised Fine-Tuning (SFT) phase. Despite their potential, existing work mainly…

    Submitted 19 May, 2025; v1 submitted 17 November, 2024; originally announced November 2024.

  36. arXiv:2411.03527  [pdf, other]

    cs.LG physics.optics

    PACE: Pacing Operator Learning to Accurate Optical Field Simulation for Complicated Photonic Devices

    Authors: Hanqing Zhu, Wenyan Cong, Guojin Chen, Shupeng Ning, Ray T. Chen, Jiaqi Gu, David Z. Pan

    Abstract: Electromagnetic field simulation is central to designing, optimizing, and validating photonic devices and circuits. However, costly computation associated with numerical simulation poses a significant bottleneck, hindering scalability and turnaround time in the photonic circuit design process. Neural operators offer a promising alternative, but existing SOTA approaches, NeurOLight, struggle with p…

    Submitted 5 November, 2024; originally announced November 2024.

    Comments: Accepted by NeurIPS 2024, 21 pages

  37. arXiv:2410.17238  [pdf, other]

    cs.AI cs.CL cs.LG cs.SE

    SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning

    Authors: Yizhou Chi, Yizhang Lin, Sirui Hong, Duyi Pan, Yaying Fei, Guanghao Mei, Bangbang Liu, Tianqi Pang, Jacky Kwok, Ceyao Zhang, Bang Liu, Chenglin Wu

    Abstract: Automated Machine Learning (AutoML) approaches encompass traditional methods that optimize fixed pipelines for model selection and ensembling, as well as newer LLM-based frameworks that autonomously build pipelines. While LLM-based agents have shown promise in automating machine learning tasks, they often generate low-diversity and suboptimal code, even after multiple iterations. To overcome these…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: The code is available at https://github.com/geekan/MetaGPT

  38. arXiv:2410.08565  [pdf, other]

    cs.AI cs.CL cs.CV

    Baichuan-Omni Technical Report

    Authors: Yadong Li, Haoze Sun, Mingan Lin, Tianpeng Li, Guosheng Dong, Tao Zhang, Bowen Ding, Wei Song, Zhenglin Cheng, Yuqi Huo, Song Chen, Xu Li, Da Pan, Shusen Zhang, Xin Wu, Zheng Liang, Jun Liu, Tao Zhang, Keer Lu, Yaqi Zhao, Yanjun Shen, Fan Yang, Kaicheng Yu, Tao Lin, Jianhua Xu, et al. (2 additional authors not shown)

    Abstract: The salient multimodal capabilities and interactive experience of GPT-4o highlight its critical role in practical applications, yet it lacks a high-performing open-source counterpart. In this paper, we introduce Baichuan-omni, the first open-source 7B Multimodal Large Language Model (MLLM) adept at concurrently processing and analyzing modalities of image, video, audio, and text, while delivering…

    Submitted 27 December, 2024; v1 submitted 11 October, 2024; originally announced October 2024.

  39. arXiv:2409.15306  [pdf, other]

    physics.app-ph cs.ET

    Open-Source Differentiable Lithography Imaging Framework

    Authors: Guojin Chen, Hao Geng, Bei Yu, David Z. Pan

    Abstract: The rapid evolution of the electronics industry, driven by Moore's law and the proliferation of integrated circuits, has led to significant advancements in modern society, including the Internet, wireless communication, and artificial intelligence (AI). Central to this progress is optical lithography, a critical technology in semiconductor manufacturing that accounts for approximately 30% to 40%…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted by SPIE24

  40. arXiv:2409.00997  [pdf, other]

    cs.CL

    DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning

    Authors: Keer Lu, Xiaonan Nie, Zheng Liang, Da Pan, Shusen Zhang, Keshi Zhao, Weipeng Chen, Zenan Zhou, Guosheng Dong, Bin Cui, Wentao Zhang

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated significant improvements across a variety of tasks, one of which is the long-context capability. The key to improving long-context performance lies in effective data organization and management strategies that integrate data from multiple domains and optimize the context window during training. Through extensive experimental analysis,…

    Submitted 2 October, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

  41. arXiv:2408.15079  [pdf, other]

    cs.CL cs.AI

    BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

    Authors: Guosheng Dong, Da Pan, Yiding Sun, Shusen Zhang, Zheng Liang, Xin Wu, Yanjun Shen, Fan Yang, Haoze Sun, Tianpeng Li, Mingan Lin, Jianhua Xu, Yufan Zhang, Xiaonan Nie, Lei Su, Bingning Wang, Wentao Zhang, Jiaxin Mao, Zenan Zhou, Weipeng Chen

    Abstract: The general capabilities of Large Language Models (LLMs) rely heavily on the composition and selection of extensive pretraining datasets, which several institutions treat as commercial secrets. To mitigate this issue, we open-source the details of a universally applicable data processing pipeline and validate its effectiveness and potential by introducing a competitive LLM baseline. Specifically, the…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 19 pages, 6 figures

  42. arXiv:2408.08969  [pdf, other]

    cs.AI physics.optics

    Differentiable Edge-based OPC

    Authors: Guojin Chen, Haoyu Yang, Haoxing Ren, Bei Yu, David Z. Pan

    Abstract: Optical proximity correction (OPC) is crucial for pushing the boundaries of semiconductor manufacturing and enabling the continued scaling of integrated circuits. While pixel-based OPC, termed inverse lithography technology (ILT), has gained research interest due to its flexibility and precision, its complexity and intricate features can lead to challenges in mask writing, increased defects, an…

    Submitted 29 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted by ICCAD24

  43. arXiv:2407.20544  [pdf, other]

    cs.CR cs.AR

    Automated Physical Design Watermarking Leveraging Graph Neural Networks

    Authors: Ruisi Zhang, Rachel Selina Rajarathnam, David Z. Pan, Farinaz Koushanfar

    Abstract: This paper presents AutoMarks, an automated and transferable watermarking framework that leverages graph neural networks to reduce the watermark search overheads during the placement stage. AutoMarks's novel automated watermark search is accomplished by (i) constructing novel graph and node features with physical, semantic, and design constraint-aware representation; (ii) designing a data-efficien…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted to MLCAD24; code: https://github.com/ruisizhang123/PD_WM_GNN

  44. arXiv:2407.07346  [pdf, other]

    cs.LG cs.CE

    INSIGHT: Universal Neural Simulator for Analog Circuits Harnessing Autoregressive Transformers

    Authors: Souradip Poddar, Youngmin Oh, Yao Lai, Hanqing Zhu, Bosun Hwang, David Z. Pan

    Abstract: Analog front-end design heavily relies on specialized human expertise and costly trial-and-error simulations, which motivated many prior works on analog design automation. However, efficient and effective exploration of the vast and complex design space remains constrained by the time-consuming nature of SPICE simulations, making effective design automation a challenging endeavor. In this paper, w…

    Submitted 6 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  45. Multi-Objective Optimization for Common-Centroid Placement of Analog Transistors

    Authors: Supriyo Maji, Hyungjoo Park, Gi moon Hong, Souradip Poddar, David Z. Pan

    Abstract: In analog circuits, process variation can cause unpredictability in circuit performance. Common-centroid (CC) type layouts have been shown to mitigate process-induced variations and are widely used to match circuit elements. Nevertheless, selecting the most suitable CC topology necessitates careful consideration of important layout constraints. Manual handling of these constraints becomes challeng…

    Submitted 30 June, 2024; originally announced July 2024.

  46. arXiv:2406.05250  [pdf, other]

    cs.AI cs.AR cs.LG

    LLM-Enhanced Bayesian Optimization for Efficient Analog Layout Constraint Generation

    Authors: Guojin Chen, Keren Zhu, Seunggeun Kim, Hanqing Zhu, Yao Lai, Bei Yu, David Z. Pan

    Abstract: Analog layout synthesis faces significant challenges due to its dependence on manual processes, considerable time requirements, and performance instability. Current Bayesian Optimization (BO)-based techniques for analog layout synthesis, despite their potential for automation, suffer from slow convergence and extensive data needs, limiting their practical application. This paper presents the \text…

    Submitted 6 December, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  47. arXiv:2405.19327  [pdf, other]

    cs.CL cs.AI cs.LG

    MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

    Authors: Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu , et al. (20 additional authors not shown)

    Abstract: Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparabl…

    Submitted 10 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: https://map-neo.github.io/

  48. arXiv:2405.18664  [pdf, other]

    cs.LG cs.AI

    Fast Explanations via Policy Gradient-Optimized Explainer

    Authors: Deng Pan, Nuno Moniz, Nitesh Chawla

    Abstract: The challenge of delivering efficient explanations is a critical barrier that prevents the adoption of model explanations in real-world applications. Existing approaches often depend on extensive model queries for sample-level explanations or rely on expert knowledge of specific model structures, trading general applicability for efficiency. To address these limitations, this paper introduces…

    Submitted 27 January, 2025; v1 submitted 28 May, 2024; originally announced May 2024.

  49. arXiv:2405.14918  [pdf, other]

    cs.LG cs.ET

    AnalogCoder: Analog Circuit Design via Training-Free Code Generation

    Authors: Yao Lai, Sungyoung Lee, Guojin Chen, Souradip Poddar, Mengkang Hu, David Z. Pan, Ping Luo

    Abstract: Analog circuit design is a significant task in modern chip technology, focusing on the selection of component types, connectivity, and parameters to ensure proper circuit functionality. Despite advances made by Large Language Models (LLMs) in digital circuit design, the complexity and scarcity of data in analog circuitry pose significant challenges. To mitigate these issues, we introduce AnalogCod…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  50. arXiv:2405.06758  [pdf, other]

    cs.LG

    Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs

    Authors: Yao Lai, Jinxin Liu, David Z. Pan, Ping Luo

    Abstract: Across a wide range of hardware scenarios, the computational efficiency and physical size of the arithmetic units significantly influence the speed and footprint of the overall hardware system. Nevertheless, prior arithmetic design techniques prove inadequate, as they do not sufficiently optimize speed and area, resulting in a reduced processing rate and larger module size. T…

    Submitted 10 May, 2024; originally announced May 2024.