Showing 1–50 of 257 results for author: Yin, M

  1. arXiv:2509.23448  [pdf, ps, other]

    cs.DC

    Lyte Quorum: Off-Chain Ready Smart Contract Hosted with Choice

    Authors: Hao Hao, Dahlia Malkhi, Maofan Yin, Lizan Zhou

    Abstract: This paper introduces Lyquor, a decentralized platform that reimagines blockchain infrastructure through a service-centric model where nodes selectively host smart contracts (called Lyquids) while preserving global composability. We present three key innovations: (1) Fate-Constrained Ordering (FCO), which decouples consensus from execution to enable selective hosting without sacrificing Layer-1 gr…

    Submitted 27 September, 2025; originally announced September 2025.

  2. arXiv:2509.13160  [pdf, ps, other]

    cs.LG cs.AI

    FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

    Authors: Liang Hu, Jianpeng Jiao, Jiashuo Liu, Yanle Ren, Zhoufutu Wen, Kaiyuan Zhang, Xuanliang Zhang, Xiang Gao, Tianci He, Fei Hu, Yali Liao, Zaiyuan Wang, Chenghao Yang, Qianyu Yang, Mingren Yin, Zhiyuan Zeng, Ge Zhang, Xinyi Zhang, Xiying Zhao, Zhenwei Zhu, Hongseok Namkoong, Wenhao Huang, Yuwen Tang

    Abstract: Search has emerged as core infrastructure for LLM-based agents and is widely viewed as critical on the path toward more general intelligence. Finance is a particularly demanding proving ground: analysts routinely conduct complex, multi-step searches over time-sensitive, domain-specific data, making it ideal for assessing both search proficiency and knowledge-grounded reasoning. Yet no existing ope…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: 29 pages

  3. arXiv:2509.09868  [pdf, ps, other]

    cs.DC cs.CR cs.MA

    Ordered Consensus with Equal Opportunity

    Authors: Yunhao Zhang, Haobin Ni, Soumya Basu, Shir Cohen, Maofan Yin, Lorenzo Alvisi, Robbert van Renesse, Qi Chen, Lidong Zhou

    Abstract: The specification of state machine replication (SMR) has no requirement on the final total order of commands. In blockchains based on SMR, however, order matters, since different orders could provide their clients with different financial rewards. Ordered consensus augments the specification of SMR to include specific guarantees on such order, with a focus on limiting the influence of Byzantine no…

    Submitted 11 September, 2025; originally announced September 2025.

  4. arXiv:2509.06925  [pdf, ps, other]

    physics.geo-ph cs.LG

    Data-driven solar forecasting enables near-optimal economic decisions

    Authors: Zhixiang Dai, Minghao Yin, Xuanhong Chen, Alberto Carpentieri, Jussi Leinonen, Boris Bonev, Chengzhe Zhong, Thorsten Kurth, Jingan Sun, Ram Cherukuri, Yuzhou Zhang, Ruihua Zhang, Farah Hariri, Xiaodong Ding, Chuanxiang Zhu, Dake Zhang, Yaodan Cui, Yuxi Lu, Yue Song, Bin He, Jie Chen, Yixin Zhu, Chenheng Xu, Maofeng Liu, Zeyi Niu, et al. (5 additional authors not shown)

    Abstract: Solar energy adoption is critical to achieving net-zero emissions. However, it remains difficult for many industrial and commercial actors to decide on whether they should adopt distributed solar-battery systems, which is largely due to the unavailability of fast, low-cost, and high-resolution irradiance forecasts. Here, we present SunCastNet, a lightweight data-driven forecasting system that prov…

    Submitted 8 September, 2025; originally announced September 2025.

    Comments: Main text ~12 pages, 4 figures, 0 tables

  5. arXiv:2508.18781  [pdf, ps, other]

    cs.AI cs.MM

    AniME: Adaptive Multi-Agent Planning for Long Animation Generation

    Authors: Lisai Zhang, Baohan Xu, Siqian Yang, Mingyu Yin, Jing Liu, Chao Xu, Siqi Wang, Yidi Wu, Yuxin Hong, Zihao Zhang, Yanzhang Liang, Yudong Jiang

    Abstract: We present AniME, a director-oriented multi-agent system for automated long-form anime production, covering the full workflow from a story to the final video. The director agent keeps a global memory for the whole workflow, and coordinates several downstream specialized agents. By integrating customized Model Context Protocol (MCP) with downstream model instruction, the specialized agent adaptivel…

    Submitted 26 August, 2025; v1 submitted 26 August, 2025; originally announced August 2025.

    Comments: 2 pages, Technical Report

  6. arXiv:2508.18264  [pdf, ps, other]

    cs.CV

    MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs

    Authors: Sixun Dong, Juhua Hu, Mian Zhang, Ming Yin, Yanjie Fu, Qi Qian

    Abstract: Vision-Language Models (VLMs) demonstrate impressive performance in understanding visual content with language instruction by converting visual input to vision tokens. However, redundancy in vision tokens results in the degenerated inference efficiency of VLMs. While many algorithms have been proposed to reduce the number of vision tokens, most of them apply only unimodal information (i.e., vision…

    Submitted 25 August, 2025; originally announced August 2025.

    Comments: Project page: https://project.ironieser.cc/mmtok

  7. arXiv:2508.17069  [pdf, ps, other]

    cs.AR cs.AI

    Optimizing Neural Networks with Learnable Non-Linear Activation Functions via Lookup-Based FPGA Acceleration

    Authors: Mengyuan Yin, Benjamin Chen Ming Choong, Chuping Qu, Rick Siow Mong Goh, Weng-Fai Wong, Tao Luo

    Abstract: Learned activation functions in models like Kolmogorov-Arnold Networks (KANs) outperform fixed-activation architectures in terms of accuracy and interpretability; however, their computational complexity poses critical challenges for energy-constrained edge AI deployments. Conventional CPUs/GPUs incur prohibitive latency and power costs when evaluating higher order activations, limiting deployabili…

    Submitted 23 August, 2025; originally announced August 2025.

  8. arXiv:2508.15760  [pdf, ps, other]

    cs.CL cs.AI

    LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

    Authors: Ming Yin, Dinghan Shen, Silei Xu, Jianbing Han, Sixun Dong, Mian Zhang, Yebowen Hu, Shujian Liu, Simin Ma, Song Wang, Sathish Reddy Indurthi, Xun Wang, Yiran Chen, Kaiqiang Song

    Abstract: Tool calling has emerged as a critical capability for AI agents to interact with the real world and solve complex tasks. While the Model Context Protocol (MCP) provides a powerful standardized framework for tool integration, there is a significant gap in benchmarking how well AI agents can effectively solve multi-step tasks using diverse MCP tools in realistic, dynamic scenarios. In this work, we…

    Submitted 21 August, 2025; originally announced August 2025.

  9. arXiv:2508.14393  [pdf, ps, other]

    cs.CV

    Img2ST-Net: Efficient High-Resolution Spatial Omics Prediction from Whole Slide Histology Images via Fully Convolutional Image-to-Image Learning

    Authors: Junchao Zhu, Ruining Deng, Junlin Guo, Tianyuan Yao, Juming Xiong, Chongyu Qu, Mengmeng Yin, Yu Wang, Shilin Zhao, Haichun Yang, Daguang Xu, Yucheng Tang, Yuankai Huo

    Abstract: Recent advances in multi-modal AI have demonstrated promising potential for generating the currently expensive spatial transcriptomics (ST) data directly from routine histology images, offering a means to reduce the high cost and time-intensive nature of ST data acquisition. However, the increasing resolution of ST, particularly with platforms such as Visium HD achieving 8 µm or finer, introduces s…

    Submitted 19 August, 2025; originally announced August 2025.

  10. arXiv:2508.11987  [pdf, ps, other]

    cs.AI cs.LG

    FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

    Authors: Zhiyuan Zeng, Jiashuo Liu, Siyuan Chen, Tianci He, Yali Liao, Yixiao Tian, Jinpeng Wang, Zaiyuan Wang, Yang Yang, Lingyue Yin, Mingren Yin, Zhenwei Zhu, Tianle Cai, Zehui Chen, Jiecao Chen, Yantao Du, Xiang Gao, Jiacheng Guo, Liang Hu, Jianpeng Jiao, Xiangsheng Li, Jingkai Liu, Shuang Ni, Zhoufutu Wen, Ge Zhang, et al. (6 additional authors not shown)

    Abstract: Future prediction is a complex task for LLM agents, requiring a high level of analytical thinking, information gathering, contextual understanding, and decision-making under uncertainty. Agents must not only gather and interpret vast amounts of dynamic information but also integrate diverse data sources, weigh uncertainties, and adapt predictions based on emerging trends, just as human experts do…

    Submitted 5 September, 2025; v1 submitted 16 August, 2025; originally announced August 2025.

    Comments: Technical report, 51 pages. Update the results

  11. arXiv:2508.11870  [pdf, ps, other]

    cs.CV cs.AI

    AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition

    Authors: Ying Huang, Yuanbin Man, Wenqi Jia, Zhengzhong Tu, Junzhou Huang, Miao Yin

    Abstract: Adapter-based fine-tuning has gained remarkable attention in adapting large pre-trained vision language models (VLMs) for a wide range of downstream tasks efficiently. In this paradigm, only the inserted adapters are fine-tuned, without the need for training the original VLM backbone. Existing works scale adapters by integrating them into every layer of VLMs to increase the capacity of adapters. H…

    Submitted 19 August, 2025; v1 submitted 15 August, 2025; originally announced August 2025.

  12. arXiv:2508.09125  [pdf, ps, other]

    cs.CL cs.LG

    Complex Logical Instruction Generation

    Authors: Mian Zhang, Shujian Liu, Sixun Dong, Ming Yin, Yebowen Hu, Xun Wang, Steven Ma, Song Wang, Sathish Reddy Indurthi, Haoyun Deng, Zhiyu Zoey Chen, Kaiqiang Song

    Abstract: Instruction following has catalyzed the recent era of Large Language Models (LLMs) and is the foundational skill underpinning more advanced capabilities such as reasoning and agentic behaviors. As tasks grow more challenging, the logic structures embedded in natural language instructions become increasingly intricate. However, how well LLMs perform on such logic-rich instructions remains under-ex…

    Submitted 12 August, 2025; originally announced August 2025.

  13. arXiv:2508.07557  [pdf, ps, other]

    cs.CV

    Splat4D: Diffusion-Enhanced 4D Gaussian Splatting for Temporally and Spatially Consistent Content Creation

    Authors: Minghao Yin, Yukang Cao, Songyou Peng, Kai Han

    Abstract: Generating high-quality 4D content from monocular videos for applications such as digital humans and AR/VR poses challenges in ensuring temporal and spatial consistency, preserving intricate details, and incorporating user guidance effectively. To overcome these challenges, we introduce Splat4D, a novel framework enabling high-fidelity 4D content generation from a monocular video. Splat4D achieves…

    Submitted 10 August, 2025; originally announced August 2025.

  14. arXiv:2507.09577  [pdf, ps, other]

    cs.CV

    Memory-Augmented SAM2 for Training-Free Surgical Video Segmentation

    Authors: Ming Yin, Fu Wang, Xujiong Ye, Yanda Meng, Zeyu Fu

    Abstract: Surgical video segmentation is a critical task in computer-assisted surgery, essential for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has demonstrated remarkable advancements in both image and video segmentation. However, the inherent limitations of SAM2's greedy selection memory design are amplified by the unique properties of surgical…

    Submitted 22 July, 2025; v1 submitted 13 July, 2025; originally announced July 2025.

    Comments: Accepted in MICCAI 2025

  15. arXiv:2507.01224  [pdf, ps, other]

    cs.DC

    FLARE: A Dataflow-Aware and Scalable Hardware Architecture for Neural-Hybrid Scientific Lossy Compression

    Authors: Wenqi Jia, Ying Huang, Jian Xu, Zhewen Hu, Sian Jin, Jiannan Tian, Yuede Ji, Miao Yin

    Abstract: Scientific simulation leveraging high-performance computing (HPC) systems is crucial for modeling complex systems and phenomena in fields such as astrophysics, climate science, and fluid dynamics, generating massive datasets that often reach petabyte to exabyte scales. However, managing these vast data volumes introduces significant I/O and network bottlenecks, limiting practical performance and s…

    Submitted 1 July, 2025; originally announced July 2025.

  16. arXiv:2506.22675  [pdf, ps, other]

    stat.ML cs.LG

    Bayesian Invariance Modeling of Multi-Environment Data

    Authors: Luhuan Wu, Mingzhang Yin, Yixin Wang, John P. Cunningham, David M. Blei

    Abstract: Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features - those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here w…

    Submitted 9 July, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  17. arXiv:2506.22365  [pdf, ps, other]

    cs.LG cs.RO

    Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation

    Authors: Tao Li, Haozhe Lei, Mingsheng Yin, Yaqi Hu

    Abstract: When using reinforcement learning (RL) to tackle physical control tasks, inductive biases that encode physics priors can help improve sample efficiency during training and enhance generalization in testing. However, the current practice of incorporating these helpful physics-informed inductive biases inevitably runs into significant manual labor and domain expertise, making them prohibitive for ge…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Spotlight paper at Reinforcement Learning Conference 2025, Workshop on Inductive Biases in Reinforcement Learning

  18. arXiv:2506.21923  [pdf, ps, other]

    cs.CV

    ZeroReg3D: A Zero-shot Registration Pipeline for 3D Consecutive Histopathology Image Reconstruction

    Authors: Juming Xiong, Ruining Deng, Jialin Yue, Siqi Lu, Junlin Guo, Marilyn Lionts, Tianyuan Yao, Can Cui, Junchao Zhu, Chongyu Qu, Mengmeng Yin, Haichun Yang, Yuankai Huo

    Abstract: Histological analysis plays a crucial role in understanding tissue structure and pathology. While recent advancements in registration methods have improved 2D histological analysis, they often struggle to preserve critical 3D spatial relationships, limiting their utility in both clinical and research applications. Specifically, constructing accurate 3D models from 2D slices remains challenging due…

    Submitted 28 July, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  19. arXiv:2506.18269  [pdf, ps, other]

    cs.HC

    Co-persona: Leveraging LLMs and Expert Collaboration to Understand User Personas through Social Media Data Analysis

    Authors: Min Yin, Haoyu Liu, Boyi Lian, Chunlei Chai

    Abstract: This study introduces Co-Persona, a methodological framework bridging large-scale social media analysis with authentic user understanding through systematic integration of Large Language Models and expert validation. Through a case study of B.Co, a Chinese manufacturer, we investigated Co-Persona application in bedside lamp development. Our methodology analyzed over 38 million posts from Xiao Hong…

    Submitted 24 June, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

    Comments: 17 pages, 5 figures, 8 tables

  20. arXiv:2506.14929  [pdf, ps, other]

    cs.LG

    FedOne: Query-Efficient Federated Learning for Black-box Discrete Prompt Learning

    Authors: Ganyu Wang, Jinjie Fang, Maxwell J. Yin, Bin Gu, Xi Chen, Boyu Wang, Yi Chang, Charles Ling

    Abstract: Black-Box Discrete Prompt Learning is a prompt-tuning method that optimizes discrete prompts without accessing model parameters or gradients, making the prompt tuning on a cloud-based Large Language Model (LLM) feasible. Adapting federated learning to BDPL could further enhance prompt tuning performance by leveraging data from diverse sources. However, all previous research on federated black-box…

    Submitted 23 September, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    Comments: Published in Proceedings of the 42nd International Conference on Machine Learning

    Journal ref: Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada, PMLR 267, 2025

  21. arXiv:2505.22855  [pdf, ps, other]

    eess.IV cs.CV

    IRS: Incremental Relationship-guided Segmentation for Digital Pathology

    Authors: Ruining Deng, Junchao Zhu, Juming Xiong, Can Cui, Tianyuan Yao, Junlin Guo, Siqi Lu, Marilyn Lionts, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Yihe Yang, Paul Dennis Simonson, Mert R. Sabuncu, Haichun Yang, Yuankai Huo

    Abstract: Continual learning is rapidly emerging as a key focus in computer vision, aiming to develop AI systems capable of continuous improvement, thereby enhancing their value and practicality in diverse real-world applications. In healthcare, continual learning holds great promise for continuously acquired digital pathology data, which is collected in hospitals on a daily basis. However, panoramic segmen…

    Submitted 28 May, 2025; originally announced May 2025.

  22. arXiv:2505.19501  [pdf, ps, other]

    cs.AI

    Toward Scientific Reasoning in LLMs: Training from Expert Discussions via Reinforcement Learning

    Authors: Ming Yin, Yuanhao Qu, Ling Yang, Le Cong, Mengdi Wang

    Abstract: We investigate how to teach large language models (LLMs) to perform scientific reasoning by leveraging expert discussions as a learning signal. Focusing on the genomics domain, we develop an automated pipeline to extract trainable data and introduce Genome-Bench, a new benchmark constructed from over a decade of scientific forum discussions on genome engineering. Our pipeline transforms raw intera…

    Submitted 2 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  23. arXiv:2505.17925  [pdf, ps, other]

    cs.IR

    Enhancing CTR Prediction with De-correlated Expert Networks

    Authors: Jiancheng Wang, Mingjia Yin, Hao Wang, Enhong Chen

    Abstract: Modeling feature interactions is essential for accurate click-through rate (CTR) prediction in advertising systems. Recent studies have adopted the Mixture-of-Experts (MoE) approach to improve performance by ensembling multiple feature interaction experts. These studies employ various strategies, such as learning independent embedding tables for each expert or utilizing heterogeneous expert archit…

    Submitted 15 September, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  24. arXiv:2505.00212  [pdf, ps, other]

    cs.MA cs.CL

    Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems

    Authors: Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, Qingyun Wu

    Abstract: Failure attribution in LLM multi-agent systems, that is, identifying the agent and step responsible for task failures, provides crucial clues for systems debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. To support this initiative, we introduce the Who&When dataset, comprising extens…

    Submitted 1 June, 2025; v1 submitted 30 April, 2025; originally announced May 2025.

    Comments: camera-ready

  25. arXiv:2504.21583  [pdf, other]

    cs.NI

    Toward Realization of Low-Altitude Economy Networks: Core Architecture, Integrated Technologies, and Future Directions

    Authors: Yixian Wang, Geng Sun, Zemin Sun, Jiacheng Wang, Jiahui Li, Changyuan Zhao, Jing Wu, Shuang Liang, Minghao Yin, Pengfei Wang, Dusit Niyato, Sumei Sun, Dong In Kim

    Abstract: The rise of the low-altitude economy (LAE) is propelling urban development and emerging industries by integrating advanced technologies to enhance efficiency, safety, and sustainability in low-altitude operations. The widespread adoption of unmanned aerial vehicles (UAVs) and electric vertical takeoff and landing (eVTOL) aircraft plays a crucial role in enabling key applications within LAE, such a…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 25 pages, 12 figures, published to TCCN

  26. arXiv:2504.10983  [pdf, other]

    cs.LG cs.AI q-bio.BM

    ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings

    Authors: Zitai Kong, Yiheng Zhu, Yinlong Xu, Hanjing Zhou, Mingzhe Yin, Jialu Wu, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou, Jian Wu

    Abstract: The design of protein sequences with desired functionalities is a fundamental task in protein engineering. Deep generative methods, such as autoregressive models and diffusion models, have greatly accelerated the discovery of novel protein sequences. However, these methods mainly focus on local or shallow residual semantics and suffer from low inference efficiency, large modeling space and high tr…

    Submitted 15 April, 2025; originally announced April 2025.

  27. arXiv:2504.10044  [pdf, ps, other]

    cs.CV

    Aligning Anime Video Generation with Human Feedback

    Authors: Bingwen Zhu, Yudong Jiang, Baohan Xu, Siqian Yang, Mingyu Yin, Yidi Wu, Huyang Sun, Zuxuan Wu

    Abstract: Anime video generation faces significant challenges due to the scarcity of anime data and unusual motion patterns, leading to issues such as motion distortion and flickering artifacts, which result in misalignment with human preferences. Existing reward models, designed primarily for real-world videos, fail to capture the unique appearance and consistency requirements of anime. In this work, we pr…

    Submitted 24 June, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: 10 pages, 7 figures, 7 tables

  28. Entertainers Between Real and Virtual -- Investigating Viewer Interaction, Engagement, and Relationships with Avatarized Virtual Livestreamers

    Authors: Michael Yin, Chenxinran Shen, Robert Xiao

    Abstract: Virtual YouTubers (VTubers) are avatar-based livestreamers that are voiced and played by human actors. VTubers have been popular in East Asia for years and have more recently seen widespread international growth. Despite their emergent popularity, research has been scarce into the interactions and relationships that exist between avatarized VTubers and their viewers, particularly in contrast to no…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 15 pages, to be published in the ACM International Conference on Interactive Media Experiences (IMX'25)

  29. VIBES: Exploring Viewer Spatial Interactions as Direct Input for Livestreamed Content

    Authors: Michael Yin, Robert Xiao

    Abstract: Livestreaming has rapidly become a popular online pastime, with real-time interaction between streamer and viewer being a key motivating feature. However, viewers have traditionally had limited opportunity to directly influence the streamed content; even when such interactions are possible, it has been reliant on text-based chat. We investigate the potential of spatial interaction on the livestrea…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 20 pages, 11 figures, to be published in the ACM International Conference on Interactive Media Experiences (IMX'25)

  30. arXiv:2503.10742  [pdf, other]

    cs.LG cs.CL

    Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing

    Authors: Yudong Liu, Jingwei Sun, Yueqian Lin, Jingyang Zhang, Ming Yin, Qinsi Wang, Jianyi Zhang, Hai Li, Yiran Chen

    Abstract: Vision language models (VLMs) demonstrate strong capabilities in jointly processing visual and textual data. However, they often incur substantial computational overhead due to redundant visual information, particularly in long-form video scenarios. Existing approaches predominantly focus on either vision token pruning, which may overlook spatio-temporal dependencies, or keyframe selection, which…

    Submitted 24 April, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  31. arXiv:2503.04362  [pdf, other]

    cs.LG cs.AI q-bio.BM

    A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery

    Authors: Yiheng Zhu, Mingyang Li, Junlong Liu, Kun Fu, Jiansheng Wu, Qiuyi Li, Mingze Yin, Jieping Ye, Jian Wu, Zheng Wang

    Abstract: Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications, including drug discovery and protein engineering. However, in most approaches, the pre-trained mo…

    Submitted 6 March, 2025; originally announced March 2025.

  32. arXiv:2502.21011  [pdf, other]

    cs.CV

    MagNet: Multi-Level Attention Graph Network for Predicting High-Resolution Spatial Transcriptomics

    Authors: Junchao Zhu, Ruining Deng, Tianyuan Yao, Juming Xiong, Chongyu Qu, Junlin Guo, Siqi Lu, Yucheng Tang, Daguang Xu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yaohong Wang, Haichun Yang, Yuankai Huo

    Abstract: The rapid development of spatial transcriptomics (ST) offers new opportunities to explore the gene expression patterns within the spatial microenvironment. Current research integrates pathological images to infer gene expression, addressing the high costs and time-consuming processes to generate spatial transcriptomics data. However, as spatial transcriptomics resolution continues to improve, exis…

    Submitted 28 February, 2025; originally announced February 2025.

  33. arXiv:2502.11919  [pdf, other]

    cs.HC cs.CL

    From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis

    Authors: Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ziang Xiao, Ming Yin

    Abstract: AI-assisted decision making becomes increasingly prevalent, yet individuals often fail to utilize AI-based decision aids appropriately, especially when the AI explanations are absent, potentially as they do not reflect on AI's decision recommendations critically. Large language models (LLMs), with their exceptional conversational and analytical capabilities, present great opportunities…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: CHI 2025

  34. arXiv:2502.07302  [pdf, other]

    cs.CV

    CASC-AI: Consensus-aware Self-corrective Learning for Noise Cell Segmentation

    Authors: Ruining Deng, Yihe Yang, David J. Pisapia, Benjamin Liechty, Junchao Zhu, Juming Xiong, Junlin Guo, Zhengyi Lu, Jiacheng Wang, Xing Yao, Runxuan Yu, Rendong Zhang, Gaurav Rudravaram, Mengmeng Yin, Pinaki Sarder, Haichun Yang, Yuankai Huo, Mert R. Sabuncu

    Abstract: Multi-class cell segmentation in high-resolution gigapixel whole slide images (WSIs) is crucial for various clinical applications. However, training such models typically requires labor-intensive, pixel-wise annotations by domain experts. Recent efforts have democratized this process by involving lay annotators without medical expertise. However, conventional non-corrective approaches struggle to…

    Submitted 10 March, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  35. arXiv:2502.07292  [pdf, other]

    cs.HC

    Investigating Creativity in Humans and Generative AI Through Circles Exercises

    Authors: Runlin Duan, Shao-Kang Hsia, Yuzhao Chen, Yichen Hu, Ming Yin, Karthik Ramani

    Abstract: Generative AI (GenAI) is transforming the creativity process. However, as presented in this paper, GenAI encounters "narrow creativity" barriers. We observe that both humans and GenAI focus on limited subsets of the design space. We investigate this phenomenon using the "Circles Exercise," a creativity test widely used to examine the creativity of humans. Quantitative analysis reveals that humans…

    Submitted 11 February, 2025; originally announced February 2025.

  36. arXiv:2502.07288  [pdf, other]

    cs.CV cs.AI

    KPIs 2024 Challenge: Advancing Glomerular Segmentation from Patch- to Slide-Level

    Authors: Ruining Deng, Tianyuan Yao, Yucheng Tang, Junlin Guo, Siqi Lu, Juming Xiong, Lining Yu, Quan Huu Cap, Pengzhou Cai, Libin Lan, Ze Zhao, Adrian Galdran, Amit Kumar, Gunjan Deotale, Dev Kumar Das, Inyoung Paik, Joonho Lee, Geongyu Lee, Yujia Chen, Wangkai Li, Zhaoyang Li, Xuege Hou, Zeyuan Wu, Shengjin Wang, Maximilian Fischer, et al. (22 additional authors not shown)

    Abstract: Chronic kidney disease (CKD) is a major global health issue, affecting over 10% of the population and causing significant mortality. While kidney biopsy remains the gold standard for CKD diagnosis and treatment, the lack of comprehensive benchmarks for kidney pathology segmentation hinders progress in the field. To address this, we organized the Kidney Pathology Image Segmentation (KPIs) Challenge…

    Submitted 11 February, 2025; originally announced February 2025.

  37. arXiv:2502.06453  [pdf, other]

    cs.LG cs.AI cs.CL

    MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations

    Authors: Kaixuan Huang, Jiacheng Guo, Zihao Li, Xiang Ji, Jiawei Ge, Wenzhe Li, Yingqing Guo, Tianle Cai, Hui Yuan, Runzhe Wang, Yue Wu, Ming Yin, Shange Tang, Yangsibo Huang, Chi Jin, Xinyun Chen, Chiyuan Zhang, Mengdi Wang

    Abstract: Large language models have demonstrated impressive performance on challenging mathematical reasoning tasks, which has triggered the discussion of whether the performance is achieved by true reasoning capability or memorization. To investigate this question, prior work has constructed mathematical benchmarks when questions undergo simple perturbations -- modifications that still preserve the underl…

    Submitted 12 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: v2: fix bugs in Fig. 1

  38. arXiv:2502.03266  [pdf, other]

    cs.CV cs.RO

    ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models

    Authors: Ying Zhang, Maoliang Yin, Wenfu Bi, Haibao Yan, Shaohan Bian, Cui-Hua Zhang, Changchun Hua

    Abstract: Service robots operating in unstructured environments must effectively recognize and segment unknown objects to enhance their functionality. Traditional supervised learning-based segmentation techniques require extensive annotated datasets, which are impractical for the diversity of objects encountered in real-world scenarios. Unseen Object Instance Segmentation (UOIS) methods aim to address this b…

    Submitted 5 February, 2025; originally announced February 2025.

  39. TD3: Tucker Decomposition Based Dataset Distillation Method for Sequential Recommendation

    Authors: Jiaqing Zhang, Mingjia Yin, Hao Wang, Yawen Li, Yuyang Ye, Xingyu Lou, Junping Du, Enhong Chen

    Abstract: In the era of data-centric AI, the focus of recommender systems has shifted from model-centric innovations to data-centric approaches. The success of modern AI models is built on large-scale datasets, but this also results in significant training costs. Dataset distillation has emerged as a key solution, condensing large datasets to accelerate model training while preserving model performance. How…

    Submitted 6 February, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

    Comments: This work has been accepted by WWW2025

  40. arXiv:2502.01160  [pdf, ps, other]

    cs.AI cs.IT

    Scalable Precise Computation of Shannon Entropy

    Authors: Yong Lai, Haolong Tong, Zhenghang Xu, Minghao Yin

    Abstract: Quantitative information flow analyses (QIF) are a class of techniques for measuring the amount of confidential information leaked by a program to its public outputs. Shannon entropy is an important method to quantify the amount of leakage in QIF. This paper focuses on the programs modeled in Boolean constraints and optimizes the two stages of the Shannon entropy computation to implement a scalabl…

    Submitted 14 June, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: 19 pages, 5 figures

  41. arXiv:2501.15849  [pdf, ps, other]

    eess.SY cs.LG

    Gaussian Process-Based Prediction and Control of Hammerstein-Wiener Systems

    Authors: Mingzhou Yin, Matthias A. Müller

    Abstract: This work investigates data-driven prediction and control of Hammerstein-Wiener systems using physics-informed Gaussian process models. Data-driven prediction algorithms have been developed for structured nonlinear systems based on Willems' fundamental lemma. However, existing frameworks cannot treat output nonlinearities and require a dictionary of basis functions for Hammerstein systems. In this…

    Submitted 27 January, 2025; originally announced January 2025.

  42. arXiv:2501.14249  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1087 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 25 September, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  43. arXiv:2501.09804  [pdf, other]

    cs.LG cs.AI cs.CL

    Enhancing Generalization in Chain of Thought Reasoning for Smaller Models

    Authors: Maxwell J. Yin, Dingyi Jiang, Yongbing Chen, Boyu Wang, Charles Ling

    Abstract: Chain-of-Thought (CoT) reasoning in smaller language models is a challenging natural language processing problem yet highly desirable in many real-life applications. Existing CoT knowledge distillation methods often suffer from overly conservative memorization in smaller LLMs, leading to low generalization confidence. As fully preserving the CoT ability of teacher model is impossible, we hypothesize…

    Submitted 16 January, 2025; originally announced January 2025.

  44. arXiv:2501.06151  [pdf, other]

    eess.IV cs.CV

    PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit

    Authors: Yuechen Yang, Yu Wang, Tianyuan Yao, Ruining Deng, Mengmeng Yin, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: Whole Slide Image (WSI) analysis plays a crucial role in modern digital pathology, enabling large-scale feature extraction from tissue samples. However, traditional feature extraction pipelines based on tools like CellProfiler often involve lengthy workflows, requiring WSI segmentation into patches, feature extraction at the patch level, and subsequent mapping back to the original WSI. To address…

    Submitted 10 January, 2025; originally announced January 2025.

  45. arXiv:2501.05361  [pdf, other]

    cs.LG

    No-Regret Linear Bandits under Gap-Adjusted Misspecification

    Authors: Chong Liu, Dan Qiao, Ming Yin, Ilija Bogunovic, Yu-Xiang Wang

    Abstract: This work studies linear bandits under a new notion of gap-adjusted misspecification and is an extension of Liu et al. (2023). When the underlying reward function is not linear, existing linear bandits work usually relies on a uniform misspecification parameter $ε$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $ε> 0$. We pr…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.13252

  46. arXiv:2501.02089  [pdf, other]

    cs.LG cs.AI

    On the Statistical Complexity for Offline and Low-Adaptive Reinforcement Learning with Structures

    Authors: Ming Yin, Mengdi Wang, Yu-Xiang Wang

    Abstract: This article reviews the recent advances on the statistical foundation of reinforcement learning (RL) in the offline and low-adaptive settings. We will start by arguing why offline RL is the appropriate model for almost any real-life ML problems, even if they have nothing to do with the recent AI breakthroughs that use RL. Then we will zoom into two fundamental problems of offline RL: offline poli…

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: Review Article

  47. arXiv:2412.20014  [pdf, other]

    cs.LG cs.AI q-bio.BM

    ProtCLIP: Function-Informed Protein Multi-Modal Learning

    Authors: Hanjing Zhou, Mingze Yin, Wei Wu, Mingyang Li, Kun Fu, Jintai Chen, Jian Wu, Zheng Wang

    Abstract: Multi-modality pre-training paradigm that aligns protein sequences and biological descriptions has learned general protein representations and achieved promising performance in various downstream applications. However, these works were still unable to replicate the extraordinary success of language-supervised visual foundation models due to the ineffective usage of aligned protein-text paired data…

    Submitted 27 December, 2024; originally announced December 2024.

    Journal ref: AAAI 2025

  48. arXiv:2412.16118  [pdf, other]

    physics.med-ph cs.AI

    Convolutional Deep Operator Networks for Learning Nonlinear Focused Ultrasound Wave Propagation in Heterogeneous Spinal Cord Anatomy

    Authors: Avisha Kumar, Xuzhe Zhi, Zan Ahmad, Minglang Yin, Amir Manbachi

    Abstract: Focused ultrasound (FUS) therapy is a promising tool for optimally targeted treatment of spinal cord injuries (SCI), offering submillimeter precision to enhance blood flow at injury sites while minimizing impact on surrounding tissues. However, its efficacy is highly sensitive to the placement of the ultrasound source, as the spinal cord's complex geometry and acoustic heterogeneity distort and at…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: Accepted for oral presentation at AAAI Conference on Artificial Intelligence: AI for Accelerating Science and Engineering Workshop 2025

  49. arXiv:2412.10255  [pdf, other]

    cs.GR cs.AI

    AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era

    Authors: Yudong Jiang, Baohan Xu, Siqian Yang, Mingyu Yin, Jing Liu, Chao Xu, Siqi Wang, Yidi Wu, Bingwen Zhu, Xinwen Zhang, Xingyu Zheng, Jixuan Xu, Yue Zhang, Jinlong Hou, Huyang Sun

    Abstract: Animation has gained significant interest in the recent film and TV industry. Despite the success of advanced video generation models like Sora, Kling, and CogVideoX in generating natural videos, they lack the same effectiveness in handling animation videos. Evaluating animation video generation is also a great challenge due to its unique artist styles, violating the laws of physics and exaggerate…

    Submitted 22 May, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

  50. arXiv:2412.03026  [pdf, other]

    cs.CV

    ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics

    Authors: Junchao Zhu, Ruining Deng, Tianyuan Yao, Juming Xiong, Chongyu Qu, Junlin Guo, Siqi Lu, Mengmeng Yin, Yu Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: Spatial transcriptomics (ST) is an emerging technology that enables medical computer vision scientists to automatically interpret the molecular profiles underlying morphological features. Currently, however, most deep learning-based ST analyses are limited to two-dimensional (2D) sections, which can introduce diagnostic errors due to the heterogeneity of pathological tissues across 3D sections. Ex…

    Submitted 3 December, 2024; originally announced December 2024.