Showing 1–50 of 588 results for author: Lin, B

  1. arXiv:2507.02289  [pdf, ps, other]

    eess.IV cs.CV

    CineMyoPS: Segmenting Myocardial Pathologies from Cine Cardiac MR

    Authors: Wangbin Ding, Lei Li, Junyi Qiu, Bogen Lin, Mingjing Yang, Liqin Huang, Lianming Wu, Sihan Wang, Xiahai Zhuang

    Abstract: Myocardial infarction (MI) is a leading cause of death worldwide. Late gadolinium enhancement (LGE) and T2-weighted cardiac magnetic resonance (CMR) imaging can respectively identify scarring and edema areas, both of which are essential for MI risk stratification and prognosis assessment. Although combining complementary information from multi-sequence CMR is useful, acquiring these sequences can…

    Submitted 2 July, 2025; originally announced July 2025.

  2. arXiv:2506.16695  [pdf]

    cond-mat.mtrl-sci

    Crystal Growth of Chalcogenides and Oxy-Chalcogenides Using Chloride Exchange Reaction

    Authors: Shantanu Singh, Boyang Zhao, Christopher E. Stevens, Mythili Surendran, Tzu-Chi Huang, Bi-Hsuan Lin, Joshua R. Hendrickson, Jayakanth Ravichandran

    Abstract: Chalcogenides and oxy-chalcogenides, including complex chalcogenides and transition metal dichalcogenides, are emerging semiconductors with direct or indirect band gaps within the visible spectrum. These materials are being explored for various photonic and electronic applications, such as photodetectors, photovoltaics, and phase-change electronics. Understanding the fundamental properties of thes…

    Submitted 19 June, 2025; originally announced June 2025.

  3. arXiv:2506.10337  [pdf, ps, other]

    cs.CV

    GeoCAD: Local Geometry-Controllable CAD Generation

    Authors: Zhanwei Zhang, Kaiyuan Liu, Junjie Liu, Wenxiao Wang, Binbin Lin, Liang Xie, Chen Shen, Deng Cai

    Abstract: Local geometry-controllable computer-aided design (CAD) generation aims to modify local parts of CAD models automatically, enhancing design efficiency. It also ensures that the shapes of newly generated local parts follow user-specific geometric instructions (e.g., an isosceles right triangle or a rectangle with one corner cut off). However, existing methods encounter challenges in achieving this…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 18 pages, 12 figures

  4. arXiv:2506.08708  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

    Authors: Liang Ma, Jiajun Wen, Min Lin, Rongtao Xu, Xiwen Liang, Bingqian Lin, Jun Ma, Yongxin Wang, Ziming Wei, Haokun Lin, Mingfei Han, Meng Cao, Bokui Chen, Ivan Laptev, Xiaodan Liang

    Abstract: While vision-language models (VLMs) have demonstrated promising capabilities in reasoning and planning for embodied agents, their ability to comprehend physical phenomena, particularly within structured 3D environments, remains severely limited. To close this gap, we introduce PhyBlock, a progressive benchmark designed to assess VLMs on physical understanding and planning through robotic 3D block…

    Submitted 10 June, 2025; originally announced June 2025.

  5. arXiv:2506.07900  [pdf, ps, other]

    cs.CL cs.AI

    MiniCPM4: Ultra-Efficient LLMs on End Devices

    Authors: MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Cunliang Kong, Qiuzuo Li, Siyuan Li, Wenhao Li, Yanghao Li, et al. (50 additional authors not shown)

    Abstract: This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelera…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: MiniCPM4 Technical Report

  6. arXiv:2506.07551  [pdf, ps, other]

    cs.LG cs.AI cs.CE cs.CL

    CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning

    Authors: Mengsong Wu, YaFei Wang, Yidong Ming, Yuqi An, Yuwei Wan, Wenliang Chen, Binbin Lin, Yuqiang Li, Tong Xie, Dongzhan Zhou

    Abstract: Large language models (LLMs) have recently demonstrated promising capabilities in chemistry tasks while still facing challenges due to outdated pretraining knowledge and the difficulty of incorporating specialized chemical expertise. To address these issues, we propose an LLM-based agent that synergistically integrates 137 external chemical tools created ranging from basic information retrieval to…

    Submitted 12 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: 15 pages, 6 figures

  7. arXiv:2506.03147  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

    Authors: Bin Lin, Zongjian Li, Xinhua Cheng, Yuwei Niu, Yang Ye, Xianyi He, Shenghai Yuan, Wangbo Yu, Shaodong Wang, Yunyang Ge, Yatian Pang, Li Yuan

    Abstract: Although existing unified models achieve strong performance in vision-language understanding and text-to-image generation, they remain limited in addressing image perception and manipulation -- capabilities increasingly demanded in practical applications. Recently, OpenAI introduced the powerful GPT-4o-Image model, which showcases advanced capabilities in comprehensive image perception and manipul…

    Submitted 18 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  8. arXiv:2506.03017  [pdf, ps, other]

    cs.RO

    Adjusting Tissue Puncture Omnidirectionally In Situ with Pneumatic Rotatable Biopsy Mechanism and Hierarchical Airflow Management in Tortuous Luminal Pathways

    Authors: Botao Lin, Tinghua Zhang, Sishen Yuan, Tiantian Wang, Jiaole Wang, Wu Yuan, Hongliang Ren

    Abstract: In situ tissue biopsy with an endoluminal catheter is an efficient approach for disease diagnosis, featuring low invasiveness and few complications. However, the endoluminal catheter struggles to adjust the biopsy direction by distal endoscope bending or proximal twisting for tissue sampling within the tortuous luminal organs, due to friction-induced hysteresis and narrow spaces. Here, we propose…

    Submitted 3 June, 2025; originally announced June 2025.

  9. arXiv:2506.01551  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation

    Authors: Bingqian Lin, Yunshuang Nie, Khun Loun Zai, Ziming Wei, Mingfei Han, Rongtao Xu, Minzhe Niu, Jianhua Han, Liang Lin, Cewu Lu, Xiaodan Liang

    Abstract: Building Vision-Language Navigation (VLN) agents which can navigate following natural language instructions is a long-standing goal in human-robot interaction applications. Recent studies have revealed the potential of training open-source Large Language Models (LLMs) to unleash LLMs' reasoning ability for improving navigation, and simultaneously mitigate the domain gap between LLMs' training corp…

    Submitted 2 June, 2025; originally announced June 2025.

  10. arXiv:2505.23977  [pdf, other]

    cs.CV cs.AI cs.LG

    VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL

    Authors: Yichen Feng, Zhangchen Xu, Fengqing Jiang, Yuetai Li, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Radha Poovendran

    Abstract: Vision language models (VLMs) are expected to perform effective multimodal reasoning and make logically coherent decisions, which is critical to tasks such as diagram understanding and spatial problem solving. However, current VLM reasoning lacks large-scale and well-structured training datasets. To bridge this gap, we propose VisualSphinx, a first-of-its-kind large-scale synthetic visual logical…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project page at https://visualsphinx.github.io/

  11. arXiv:2505.22587  [pdf, ps, other]

    stat.ME

    Bayesian Non-Parametric Inference for Lévy Measures in State-Space Models

    Authors: Bill Z. Lin, Simon Godsill

    Abstract: Lévy processes, known for their ability to model complex dynamics with skewness, heavy tails and discontinuities, play a critical role in stochastic modeling across various domains. However, inference for most Lévy processes, whether in parametric or non-parametric settings, remains a significant challenge. In this work, we present a novel Bayesian non-parametric inference framework for inferring…

    Submitted 28 May, 2025; originally announced May 2025.

  12. arXiv:2505.20700  [pdf, ps, other]

    cs.CL

    Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration

    Authors: Yong Wu, Weihang Pan, Ke Li, Chen Binhui, Ping Li, Binbin Lin

    Abstract: Large language models (LLMs) have shown remarkable reasoning capabilities, yet aligning such abilities to small language models (SLMs) remains a challenge due to distributional mismatches and limited model capacity. Existing reasoning datasets, typically designed for powerful LLMs, often lead to degraded performance when directly applied to weaker models. In this work, we introduce Dynamic Adaptat…

    Submitted 27 May, 2025; originally announced May 2025.

  13. arXiv:2505.20292  [pdf, ps, other]

    cs.CV cs.AI

    OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

    Authors: Shenghai Yuan, Xianyi He, Yufan Deng, Yang Ye, Jinfa Huang, Bin Lin, Jiebo Luo, Li Yuan

    Abstract: Subject-to-Video (S2V) generation aims to create videos that faithfully incorporate reference content, providing enhanced flexibility in the production of videos. To establish the infrastructure for S2V generation, we propose OpenS2V-Nexus, consisting of (i) OpenS2V-Eval, a fine-grained benchmark, and (ii) OpenS2V-5M, a million-scale dataset. In contrast to existing S2V benchmarks inherited from V…

    Submitted 3 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Code and Dataset: https://github.com/PKU-YuanGroup/OpenS2V-Nexus

  14. arXiv:2505.20275  [pdf, ps, other]

    cs.CV

    ImgEdit: A Unified Image Editing Dataset and Benchmark

    Authors: Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, Li Yuan

    Abstract: Recent advancements in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily due to limited high-quality data and insufficient benchmarks. To overcome these limitations, we introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated…

    Submitted 26 May, 2025; originally announced May 2025.

  15. arXiv:2505.20196  [pdf, other]

    cs.AI cs.LG

    Temporal Sampling for Forgotten Reasoning in LLMs

    Authors: Yuetai Li, Zhangchen Xu, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Xiang Yue, Radha Poovendran

    Abstract: Fine-tuning large language models (LLMs) is intended to improve their reasoning capabilities, yet we uncover a counterintuitive effect: models often forget how to solve problems they previously answered correctly during training. We term this phenomenon temporal forgetting and show that it is widespread across model sizes, fine-tuning methods (both Reinforcement Learning and Supervised Fine-Tuning…

    Submitted 26 May, 2025; originally announced May 2025.

  16. arXiv:2505.20148  [pdf, ps, other]

    cs.AI

    MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents

    Authors: Ziming Wei, Bingqian Lin, Zijian Jiao, Yunshuang Nie, Liang Ma, Yuecheng Liu, Yuzheng Zhuang, Xiaodan Liang

    Abstract: Spatial Planning is a crucial part in the field of spatial intelligence, which requires the understanding and planning about object arrangements in space perspective. AI agents with the spatial planning ability can better adapt to various real-world applications, including robotic manipulation, automatic assembly, urban planning etc. Recent works have attempted to construct benchmarks for evaluati…

    Submitted 27 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  17. arXiv:2505.14625  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

    Authors: Zhangchen Xu, Yuetai Li, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Radha Poovendran

    Abstract: Reinforcement Learning (RL) has become a powerful tool for enhancing the reasoning abilities of large language models (LLMs) by optimizing their policies with reward signals. Yet, RL's success relies on the reliability of rewards, which are provided by verifiers. In this paper, we expose and analyze a widespread problem--false negatives--where verifiers wrongly reject correct model outputs. Our in…

    Submitted 22 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  18. arXiv:2505.13050  [pdf, ps, other]

    cs.CV

    RGB-to-Polarization Estimation: A New Task and Benchmark Study

    Authors: Beibei Lin, Zifeng Yuan, Tingting Chen

    Abstract: Polarization images provide rich physical information that is fundamentally absent from standard RGB images, benefiting a wide range of computer vision applications such as reflection separation and material classification. However, the acquisition of polarization images typically requires additional optical components, which increases both the cost and the complexity of the applications. To bridg…

    Submitted 19 May, 2025; originally announced May 2025.

  19. arXiv:2505.08163  [pdf, ps, other]

    cs.AI cs.CV

    Decoding Neighborhood Environments with Large Language Models

    Authors: Andrew Cart, Shaohu Zhang, Melanie Escue, Xugui Zhou, Haitao Zhao, Prashanth BusiReddyGari, Beiyu Lin, Shuang Li

    Abstract: Neighborhood environments include physical and environmental conditions such as housing quality, roads, and sidewalks, which significantly influence human health and well-being. Traditional methods for assessing these environments, including field surveys and geographic information systems (GIS), are resource-intensive and challenging to evaluate neighborhood environments at scale. Although machin…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 8 pages

  20. arXiv:2505.06274  [pdf, ps, other]

    cs.LG cs.AI

    PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model

    Authors: Baijiong Lin, Weisen Jiang, Yuancheng Xu, Hao Chen, Ying-Cong Chen

    Abstract: Multi-objective test-time alignment aims to adapt large language models (LLMs) to diverse multi-dimensional user preferences during inference while keeping LLMs frozen. Recently, GenARM (Xu et al., 2025) first independently trains Autoregressive Reward Models (ARMs) for each preference dimension without awareness of each other, then combines their outputs based on user-specific preference vectors…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  21. arXiv:2504.19444  [pdf, other]

    cs.SE cs.CL

    Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks

    Authors: Kang Yang, Xinjun Mao, Shangwen Wang, Yanlin Wang, Tanghaoran Zhang, Bo Lin, Yihao Qin, Zhang Zhang, Yao Lu, Kamal Al-Sabahi

    Abstract: Pre-trained code models rely heavily on high-quality pre-training data, particularly human-written reference comments that bridge code and natural language. However, these comments often become outdated as software evolves, degrading model performance. Large language models (LLMs) excel at generating high-quality code comments. We investigate whether replacing human-written comments with LLM-gener…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: Awarded the ACM SIGSOFT Distinguished Paper Award in ICPC 2025

  22. arXiv:2504.16429  [pdf, other]

    cs.CR cs.SE

    Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection

    Authors: Bo Lin, Shangwen Wang, Yihao Qin, Liqian Chen, Xiaoguang Mao

    Abstract: Retrieval-Augmented Code Generation (RACG) leverages external knowledge to enhance Large Language Models (LLMs) in code synthesis, improving the functional correctness of the generated code. However, existing RACG systems largely overlook security, leading to substantial risks. Especially, the poisoning of malicious code into knowledge bases can mislead LLMs, resulting in the generation of insecur…

    Submitted 23 April, 2025; originally announced April 2025.

  23. arXiv:2504.10315  [pdf]

    physics.med-ph math.OC

    An energy optimization method based on mixed-integer model and variational quantum computing algorithm for faster IMPT

    Authors: Ya-Nan Zhu, Nimita Shinde, Bowen Lin, Hao Gao

    Abstract: Intensity-modulated proton therapy (IMPT) offers superior dose conformity with reduced exposure to surrounding healthy tissues compared to conventional photon therapy. Improving IMPT delivery efficiency reduces motion-related uncertainties, enhances plan robustness, and benefits breath-hold techniques by shortening treatment time. Among various factors, energy switching time plays a critical role,…

    Submitted 14 April, 2025; originally announced April 2025.

  24. arXiv:2504.09066  [pdf]

    cs.CV

    Hyperlocal disaster damage assessment using bi-temporal street-view imagery and pre-trained vision models

    Authors: Yifan Yang, Lei Zou, Bing Zhou, Daoyang Li, Binbin Lin, Joynal Abedin, Mingzheng Yang

    Abstract: Street-view images offer unique advantages for disaster damage estimation as they capture impacts from a visual perspective and provide detailed, on-the-ground insights. Despite several investigations attempting to analyze street-view images for damage estimation, they mainly focus on post-disaster images. The potential of time-series street-view images remains underexplored. Pre-disaster images p…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 27 pages, 9 figures

  25. arXiv:2504.07433  [pdf, other]

    cs.CL

    LSR-MCTS: Alleviating Long Range Dependency in Code Generation

    Authors: Tingwei Lu, Yangning Li, Liyuan Wang, Binghuai Lin, Jiwei Tang, Qingsong Lv, Wanshi Xu, Hai-Tao Zheng, Yinghui Li, Xin Su, Zifei Shan

    Abstract: The emergence of large language models (LLMs) has significantly promoted the development of code generation task, sparking a surge in pertinent literature. Current research is hindered by redundant generation results and a tendency to overfit local patterns in the short term. Although existing studies attempt to alleviate the issue by adopting a multi-token prediction strategy, there remains limit…

    Submitted 17 May, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  26. arXiv:2504.04871  [pdf, other]

    physics.med-ph hep-ex

    Characteristics of Ge-doped Multi-Mode Fibers in Total Ionizing Dose

    Authors: Datao Gong, Suen Hou, Bo-Jing Juang, Bin Lin, Chonghan Liu, Tiankuan Liu, Ming Qi, Yi Yang, Jingbo Ye, Lei Zhang, Li Zhang, HuiPing Zhu

    Abstract: Purpose: The fiber optical links in 850 nm band with Ge-doped multi-mode (MM) fibers are well developed for data transmission at 10 Gbps and higher. The applications in nuclear environments require radiation resistance. The characteristics of Ge-doped MM fibers are investigated for Radiation Induced Attenuation (RIA) in Total Ionizing Dose (TID). Methods: Commercial samples of Ge-doped MM fibers…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: 9 pages, 13 figures

  27. arXiv:2504.02894   

    cs.CL cs.AI

    OnRL-RAG: Real-Time Personalized Mental Health Dialogue System

    Authors: Ahsan Bilal, Beiyu Lin

    Abstract: Large language models (LLMs) have been widely used for various tasks and applications. However, LLMs and fine-tuning are limited to the pre-trained data. For example, ChatGPT's world knowledge until 2021 can be outdated or inaccurate. To enhance the capabilities of LLMs, Retrieval-Augmented Generation (RAG), is proposed to augment LLMs with additional, new, latest details and information to LLMs.…

    Submitted 22 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Comments: It needs more revisions. I am currently working on it with my co-author

  28. arXiv:2504.00636  [pdf, other]

    cs.HC

    Exploring the Impact of an LLM-Powered Teachable Agent on Learning Gains and Cognitive Load in Music Education

    Authors: Lingxi Jin, Baicheng Lin, Mengze Hong, Kun Zhang, Hyo-Jeong So

    Abstract: This study examines the impact of an LLM-powered teachable agent, grounded in the Learning by Teaching (LBT) pedagogy, on students' music theory learning and cognitive load. The participants were 28 Chinese university students with prior music instrumental experiences. In an online experiment, they were assigned to either an experimental group, which engaged in music analysis with the teachable ag…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Accepted at CHI 2025 Workshop on Augmented Educators and AI: Shaping the Future of Human and AI Cooperation in Learning

  29. arXiv:2504.00125  [pdf, other]

    cs.AI cs.CL

    LLMs for Explainable AI: A Comprehensive Survey

    Authors: Ahsan Bilal, David Ebert, Beiyu Lin

    Abstract: Large Language Models (LLMs) offer a promising approach to enhancing Explainable AI (XAI) by transforming complex machine learning outputs into easy-to-understand narratives, making model predictions more accessible to users, and helping bridge the gap between sophisticated model behavior and human interpretability. AI models, such as state-of-the-art neural networks and deep learning models, are…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: This manuscript is intended for submission to ACM Transactions on Intelligent Systems and Technology

  30. arXiv:2504.00043  [pdf, other]

    cs.CL cs.AI cs.CV

    CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

    Authors: Jixuan Leng, Chengsong Huang, Langlin Huang, Bill Yuchen Lin, William W. Cohen, Haohan Wang, Jiaxin Huang

    Abstract: Existing reasoning evaluation frameworks for Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) predominantly either assess text-based reasoning or vision-language understanding capabilities, with limited dynamic interplay between textual and visual constraints. To address this limitation, we introduce CrossWordBench, a benchmark designed to evaluate the reasoning capabilities o…

    Submitted 30 March, 2025; originally announced April 2025.

  31. arXiv:2503.18888  [pdf, other]

    cs.SE cs.CL cs.IR

    Toward building next-generation Geocoding systems: a systematic review

    Authors: Zhengcong Yin, Daniel W. Goldberg, Binbin Lin, Bing Zhou, Diya Li, Andong Ma, Ziqian Ming, Heng Cai, Zhe Zhang, Shaohua Wang, Shanzhen Gao, Joey Ying Lee, Xiao Li, Da Huo

    Abstract: Geocoding systems are widely used in both scientific research for spatial analysis and everyday life through location-based services. The quality of geocoded data significantly impacts subsequent processes and applications, underscoring the need for next-generation systems. In response to this demand, this review first examines the evolving requirements for geocoding inputs and outputs across vari…

    Submitted 24 March, 2025; originally announced March 2025.

  32. arXiv:2503.18853  [pdf, other]

    cs.CV

    3DSwapping: Texture Swapping For 3D Object From Single Reference Image

    Authors: Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan

    Abstract: 3D texture swapping allows for the customization of 3D object textures, enabling efficient and versatile visual transformations in 3D editing. While no dedicated method exists, adapted 2D editing and text-driven 3D editing approaches can serve this purpose. However, 2D editing requires frame-by-frame manipulation, causing inconsistencies across views, while text-driven 3D editing struggles to pres…

    Submitted 24 March, 2025; originally announced March 2025.

  33. arXiv:2503.18065  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation

    Authors: Ziming Wei, Bingqian Lin, Yunshuang Nie, Jiaqi Chen, Shikui Ma, Hang Xu, Xiaodan Liang

    Abstract: Data scarcity is a long-standing challenge in the Vision-Language Navigation (VLN) field, which extremely hinders the generalization of agents to unseen environments. Previous works primarily rely on additional simulator data or web-collected images/videos to improve the generalization. However, the simulator environments still face limited diversity, and the web-collected data often requires exte…

    Submitted 23 March, 2025; originally announced March 2025.

  34. arXiv:2503.17953  [pdf, other]

    cs.SE

    Smoke and Mirrors: Jailbreaking LLM-based Code Generation via Implicit Malicious Prompts

    Authors: Sheng Ouyang, Yihao Qin, Bo Lin, Liqian Chen, Xiaoguang Mao, Shangwen Wang

    Abstract: The proliferation of Large Language Models (LLMs) has revolutionized natural language processing and significantly impacted code generation tasks, enhancing software development efficiency and productivity. Notably, LLMs like GPT-4 have demonstrated remarkable proficiency in text-to-code generation tasks. However, the growing reliance on LLMs for code generation necessitates a critical examination…

    Submitted 23 March, 2025; originally announced March 2025.

  35. arXiv:2503.14359  [pdf, other]

    cs.CV

    ImViD: Immersive Volumetric Videos for Enhanced VR Engagement

    Authors: Zhengxian Yang, Shi Pan, Shengqi Wang, Haoxiang Wang, Li Lin, Guanjun Li, Zhengqi Wen, Borong Lin, Jianhua Tao, Tao Yu

    Abstract: User engagement is greatly enhanced by fully immersive multi-modal experiences that combine visual and auditory stimuli. Consequently, the next frontier in VR/AR technologies lies in immersive volumetric videos with complete scene capture, large 6-DoF interaction space, multi-modal feedback, and high resolution & frame-rate contents. To stimulate the reconstruction of immersive volumetric videos,…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025

  36. arXiv:2503.09154  [pdf, other]

    cs.CV

    SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video

    Authors: Chengshu Zhao, Yunyang Ge, Xinhua Cheng, Bin Zhu, Yatian Pang, Bin Lin, Fan Yang, Feng Gao, Li Yuan

    Abstract: Video body-swapping aims to replace the body in an existing video with a new body from arbitrary sources, which has garnered more attention in recent years. Existing methods treat video body-swapping as a composite of multiple tasks instead of an independent task and typically rely on various models to achieve video body-swapping sequentially. However, these methods fail to achieve end-to-end opti…

    Submitted 12 March, 2025; originally announced March 2025.

  37. arXiv:2503.08969  [pdf, other]

    cs.SE cs.CR

    Large Language Models-Aided Program Debloating

    Authors: Bo Lin, Shangwen Wang, Yihao Qin, Liqian Chen, Xiaoguang Mao

    Abstract: As software grows in complexity to accommodate diverse features and platforms, software bloating has emerged as a significant challenge, adversely affecting performance and security. However, existing approaches inadequately address the dual objectives of debloating: maintaining functionality by preserving essential features and enhancing security by reducing security issues. Specifically, current…

    Submitted 11 March, 2025; originally announced March 2025.

  38. arXiv:2503.08073  [pdf, other]

    cs.CV

    Seeing Beyond Haze: Generative Nighttime Image Dehazing

    Authors: Beibei Lin, Stephen Lin, Robby Tan

    Abstract: Nighttime image dehazing is particularly challenging when dense haze and intense glow severely degrade or completely obscure background information. Existing methods often encounter difficulties due to insufficient background priors and limited generative ability, both essential for handling such conditions. In this paper, we introduce BeyondHaze, a generative nighttime dehazing method that not on…

    Submitted 11 March, 2025; originally announced March 2025.

  39. arXiv:2503.07265  [pdf, other]

    cs.CV cs.AI cs.CL

    WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

    Authors: Yuwei Niu, Munan Ning, Mengren Zheng, Weiyang Jin, Bin Lin, Peng Jin, Jiaqi Liao, Chaoran Feng, Kunpeng Ning, Bin Zhu, Li Yuan

    Abstract: Text-to-Image (T2I) models are capable of generating high-quality artistic creations and visual content. However, existing research and evaluation standards predominantly focus on image realism and shallow text-image alignment, lacking a comprehensive assessment of complex semantic understanding and world knowledge integration in text to image generation. To address this challenge, we propose…

    Submitted 27 May, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

    Comments: Code, data and leaderboard: https://github.com/PKU-YuanGroup/WISE

    ACM Class: I.2.7; I.2.10; I.4.9

  40. arXiv:2503.06511  [pdf, other]

    cs.LG cs.AI

    HFedCKD: Toward Robust Heterogeneous Federated Learning via Data-free Knowledge Distillation and Two-way Contrast

    Authors: Yiting Zheng, Bohan Lin, Jinqian Chen, Jihua Zhu

    Abstract: Most current federated learning frameworks are modeled as static processes, ignoring the dynamic characteristics of the learning system. Under the limited communication budget of the central server, the flexible model architecture of a large number of clients participating in knowledge transfer requires a lower participation rate, active clients have uneven contributions, and the client scale seri…

    Submitted 9 March, 2025; originally announced March 2025.

  41. arXiv:2503.01245  [pdf, other]

    cs.SE cs.LG

    Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

    Authors: Nam Huynh, Beiyu Lin

    Abstract: Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate executable code. We begin with understanding LLMs' limitations and challenges in automated code generation. Subsequently, we review various fine-tuning techniques de…

    Submitted 2 April, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  42. arXiv:2503.00968  [pdf, other]

    physics.ins-det hep-ex

    Simulation of the Background from $^{13}$C$(α, n)^{16}$O Reaction in the JUNO Scintillator

    Authors: JUNO Collaboration, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Costas Andreopoulos, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Beretta, Antonio Bergnoli, Nikita Bessonov, Daniel Bick, Lukas Bieger, Svetlana Biktemerova, et al. (608 additional authors not shown)

    Abstract: Large-scale organic liquid scintillator detectors are highly efficient in the detection of MeV-scale electron antineutrinos. These signal events can be detected through inverse beta decay on protons, which produce a positron accompanied by a neutron. A noteworthy background for antineutrinos coming from nuclear power reactors and from the depths of the Earth (geoneutrinos) is generated by ($α, n$)…

    Submitted 2 May, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

    Comments: 25 pages, 14 figures, 4 tables

  43. arXiv:2502.20742  [pdf, other]

    cs.CV cs.AI cs.CL

    Structured Preference Optimization for Vision-Language Long-Horizon Task Planning

    Authors: Xiwen Liang, Min Lin, Weiqi Ruan, Rongtao Xu, Yuecheng Liu, Jiaqi Chen, Bingqian Lin, Yuzheng Zhuang, Xiaodan Liang

    Abstract: Existing methods for vision-language task planning excel in short-horizon tasks but often fall short in complex, long-horizon planning within dynamic environments. These challenges primarily arise from the difficulty of effectively training models to produce high-quality reasoning processes for long-horizon tasks. To address this, we propose Structured Preference Optimization (SPO), which aims to…

    Submitted 15 May, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

    Comments: 18 pages

  44. arXiv:2502.15224  [pdf, other]

    cs.LG cs.AI

    Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs

    Authors: Tingting Chen, Srinivas Anumasa, Beibei Lin, Vedant Shah, Anirudh Goyal, Dianbo Liu

    Abstract: Given the remarkable performance of Large Language Models (LLMs), an important question arises: Can LLMs conduct human-like scientific research and discover new knowledge, and act as an AI scientist? Scientific discovery is an iterative process that demands efficient knowledge updating and encoding. It involves understanding the environment, identifying new hypotheses, and reasoning about actions;…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: 13 pages

  45. arXiv:2502.12143  [pdf, other]

    cs.AI

    Small Models Struggle to Learn from Strong Reasoners

    Authors: Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran

    Abstract: Large language models (LLMs) excel in complex reasoning tasks, and distilling their reasoning capabilities into smaller models has shown promise. However, we uncover an interesting phenomenon, which we term the Small Model Learnability Gap: small models ($\leq$3B parameters) do not consistently benefit from long chain-of-thought (CoT) reasoning or distillation from larger models. Instead, they per…

    Submitted 22 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  46. arXiv:2502.12025  [pdf, other]

    cs.AI cs.CL

    SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities

    Authors: Fengqing Jiang, Zhangchen Xu, Yuetai Li, Luyao Niu, Zhen Xiang, Bo Li, Bill Yuchen Lin, Radha Poovendran

    Abstract: Emerging large reasoning models (LRMs), such as DeepSeek-R1 models, leverage long chain-of-thought (CoT) reasoning to generate structured intermediate steps, enhancing their reasoning capabilities. However, long CoT does not inherently guarantee safe outputs, potentially leading to harmful consequences such as the introduction of security vulnerabilities in code or the spread of misinformation. Cu…

    Submitted 17 February, 2025; originally announced February 2025.

  47. arXiv:2502.05322  [pdf, other]

    math.OC math.CO math.MG math.ST

    Tropical Fréchet Means

    Authors: Bo Lin, Kamillo Ferry, Carlos Améndola, Anthea Monod, Ruriko Yoshida

    Abstract: The Fréchet mean is a key measure of central tendency as a barycenter for a given set of points in a general metric space. It is computed by solving an optimization problem and is a fundamental quantity in statistics. In this paper, we study Fréchet means in tropical geometry -- a piecewise linear, combinatorial, and polyhedral variant of algebraic geometry that has gained prominence in applicatio…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 18 pages, 5 figures

    MSC Class: 14T90; 62R01; 62R20; 90C24

  48. arXiv:2502.03233  [pdf, other]

    cs.CR cs.SE

    Exploring the Security Threats of Knowledge Base Poisoning in Retrieval-Augmented Code Generation

    Authors: Bo Lin, Shangwen Wang, Liqian Chen, Xiaoguang Mao

    Abstract: The integration of Large Language Models (LLMs) into software development has revolutionized the field, particularly through the use of Retrieval-Augmented Code Generation (RACG) systems that enhance code generation with information from external knowledge bases. However, the security implications of RACG systems, particularly the risks posed by vulnerable code examples in the knowledge base, rema…

    Submitted 5 February, 2025; originally announced February 2025.

  49. arXiv:2502.01100  [pdf, other]

    cs.AI cs.CL cs.LG

    ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

    Authors: Bill Yuchen Lin, Ronan Le Bras, Kyle Richardson, Ashish Sabharwal, Radha Poovendran, Peter Clark, Yejin Choi

    Abstract: We investigate the logical reasoning capabilities of large language models (LLMs) and their scalability in complex non-monotonic reasoning. To this end, we introduce ZebraLogic, a comprehensive evaluation framework for assessing LLM reasoning performance on logic grid puzzles derived from constraint satisfaction problems (CSPs). ZebraLogic enables the generation of puzzles with controllable and qu…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: Website: https://huggingface.co/spaces/WildEval/ZebraLogic

  50. arXiv:2502.00466  [pdf, ps, other]

    cs.LG

    EDELINE: Enhancing Memory in Diffusion-based World Models via Linear-Time Sequence Modeling

    Authors: Jia-Hua Lee, Bor-Jiun Lin, Wei-Fang Sun, Chun-Yi Lee

    Abstract: World models represent a promising approach for training reinforcement learning agents with significantly improved sample efficiency. While most world model methods primarily rely on sequences of discrete latent variables to model environment dynamics, this compression often neglects critical visual details essential for reinforcement learning. Recent diffusion-based world models condition generat…

    Submitted 15 June, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: 31 pages