Showing 1–50 of 212 results for author: Yijia

Searching in archive cs.
  1. arXiv:2506.17267  [pdf, ps, other]

    cs.LG cs.AI

    CF-VLM: CounterFactual Vision-Language Fine-tuning

    Authors: Jusheng Zhang, Kaitong Cai, Yijia Fan, Jian Wang, Keze Wang

    Abstract: Recent advances in vision-language models (VLMs) have greatly improved cross-modal semantic understanding, yet significant limitations remain in fine-grained discrimination and deep causal reasoning tasks. Existing VLMs often rely on superficial statistical correlations, lacking the ability to capture the underlying causal logic between visual and textual content. To address this, we propose Count…

    Submitted 10 June, 2025; originally announced June 2025.

  2. arXiv:2506.14028  [pdf, ps, other]

    cs.CL

    MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

    Authors: Xueqing Peng, Lingfei Qian, Yan Wang, Ruoyu Xiang, Yueru He, Yang Ren, Mingyang Jiang, Jeff Zhao, Huan He, Yi Han, Yun Feng, Yuechen Jiang, Yupeng Cao, Haohang Li, Yangyang Yu, Xiaoyu Wang, Penglei Gao, Shengyuan Lin, Keyi Wang, Shanshan Yang, Yilun Zhao, Zhiwei Liu, Peng Lu, Jerry Huang, Suyuchen Wang, et al. (19 additional authors not shown)

    Abstract: Recent advances in large language models (LLMs) have accelerated progress in financial NLP and applications, yet existing benchmarks remain limited to monolingual and unimodal settings, often over-relying on simple tasks and failing to reflect the complexity of real-world financial communication. We introduce MultiFinBen, the first multilingual and multimodal benchmark tailored to the global finan…

    Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  3. arXiv:2506.12533  [pdf, other]

    cs.DM

    Stereotype graph: A mathematical framework of category stereotypes via graph theory

    Authors: Yijia Yan

    Abstract: In social psychology and cognitive science, there has been much interest in studying category stereotypes. However, we still lack an agreed-upon mathematical definition or framework, which is necessary for a deeper understanding of stereotypes in human cognition. In this paper, we use graph theory to portray category stereotypes in human cognition, based on pairs of labels having special…

    Submitted 14 June, 2025; originally announced June 2025.

  4. arXiv:2506.07298  [pdf, ps, other]

    cs.LG cs.AI

    Pre-trained Large Language Models Learn Hidden Markov Models In-context

    Authors: Yijia Dai, Zhaolin Gao, Yahya Sattar, Sarah Dean, Jennifer J. Sun

    Abstract: Hidden Markov Models (HMMs) are foundational tools for modeling sequential data with latent Markovian structure, yet fitting them to real-world data remains computationally challenging. In this work, we show that pre-trained large language models (LLMs) can effectively model data generated by HMMs via in-context learning (ICL), their ability to infer patterns from examples within a…

    Submitted 11 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

  5. arXiv:2506.06576  [pdf, ps, other]

    cs.CY cs.AI cs.CL cs.HC cs.LG

    Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce

    Authors: Yijia Shao, Humishka Zope, Yucheng Jiang, Jiaxin Pei, David Nguyen, Erik Brynjolfsson, Diyi Yang

    Abstract: The rapid rise of compound AI systems (a.k.a. AI agents) is reshaping the labor market, raising concerns about job displacement, diminished human agency, and overreliance on automation. Yet, we lack a systematic understanding of the evolving landscape. In this paper, we address this gap by introducing a novel auditing framework to assess which occupational tasks workers want AI agents to automate…

    Submitted 11 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

    Comments: Preprint

  6. arXiv:2506.06122  [pdf, ps, other]

    cs.LG cs.DC

    Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library

    Authors: Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, Yancheng He, Ju Huang, Jiaheng Liu, Zhendong Li, Xiaoyang Li, Zichen Liu, Haizhou Zhao, Dakai An, Lunxi Cao, Qiyang Cao, Wanxi Deng, Feilei Du, Yiliang Gu, Jiahe Li, Xiang Li, Mingjie Liu, Yijia Luo, Zihe Liu, Yadao Wang, Pei Wang, et al. (16 additional authors not shown)

    Abstract: We introduce ROLL, an efficient, scalable, and user-friendly library designed for Reinforcement Learning Optimization for Large-scale Learning. ROLL caters to three primary user groups: tech pioneers aiming for cost-effective, fault-tolerant large-scale training, developers requiring flexible control over training workflows, and researchers seeking agile experimentation. ROLL is built upon several…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 16 pages

  7. arXiv:2505.23760  [pdf, ps, other]

    cs.LG

    Model Immunization from a Condition Number Perspective

    Authors: Amber Yijia Zheng, Cedar Site Bai, Brian Bullins, Raymond A. Yeh

    Abstract: Model immunization aims to pre-train models that are difficult to fine-tune on harmful tasks while retaining their utility on other non-harmful tasks. Though prior work has shown empirical evidence for immunizing text-to-image models, the key understanding of when immunization is possible and a precise definition of an immunized model remain unclear. In this work, we propose a framework, based on…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  8. arXiv:2505.23743  [pdf, ps, other]

    cs.CV eess.IV

    DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP

    Authors: Amber Yijia Zheng, Yu Zhang, Jun Hu, Raymond A. Yeh, Chen Chen

    Abstract: High-quality photography in extreme low-light conditions is challenging but impactful for digital cameras. With advanced computing hardware, traditional camera image signal processor (ISP) algorithms are gradually being replaced by efficient deep networks that enhance noisy raw images more intelligently. However, existing regression-based models often minimize pixel errors and result in oversmooth…

    Submitted 29 May, 2025; originally announced May 2025.

  9. arXiv:2505.23399  [pdf]

    cs.AI

    GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning

    Authors: Jusheng Zhang, Yijia Fan, Wenjun Lin, Ruiqi Chen, Haoyi Jiang, Wenhao Chai, Jian Wang, Keze Wang

    Abstract: We propose GAM-Agent, a game-theoretic multi-agent framework for enhancing vision-language reasoning. Unlike prior single-agent or monolithic models, GAM-Agent formulates the reasoning process as a non-zero-sum game between base agents (each specializing in visual perception subtasks) and a critical agent that verifies logical consistency and factual correctness. Agents communicate via structured cl…

    Submitted 29 May, 2025; originally announced May 2025.

  10. arXiv:2505.20246  [pdf, ps, other]

    cs.AI cs.CL

    On Path to Multimodal Historical Reasoning: HistBench and HistAgent

    Authors: Jiahao Qiu, Fulian Xiao, Yimin Wang, Yuchen Mao, Yijia Chen, Xinzhe Juan, Shu Zhang, Siran Wang, Xuan Qi, Tongcheng Zhang, Zixin Yao, Jiacheng Guo, Yifu Lu, Charles Argon, Jundi Cui, Daixin Chen, Junran Zhou, Shuyao Zhou, Zhanpeng Zhou, Ling Yang, Shilong Liu, Hongru Wang, Kaixuan Huang, Xun Jiang, Yuming Cao, et al. (74 additional authors not shown)

    Abstract: Recent advances in large language models (LLMs) have led to remarkable progress across domains, yet their capabilities in the humanities, particularly history, remain underexplored. Historical reasoning poses unique challenges for AI, involving multimodal source interpretation, temporal inference, and cross-linguistic analysis. While general-purpose agents perform well on many existing benchmarks,…

    Submitted 19 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: 17 pages, 7 figures

  11. arXiv:2505.18475  [pdf, other]

    cs.LG cs.AI

    Using Large Language Models to Tackle Fundamental Challenges in Graph Learning: A Comprehensive Survey

    Authors: Mengran Li, Pengyu Zhang, Wenbin Xing, Yijia Zheng, Klim Zaporojets, Junzhou Chen, Ronghui Zhang, Yong Zhang, Siyuan Gong, Jia Hu, Xiaolei Ma, Zhiyuan Liu, Paul Groth, Marcel Worring

    Abstract: Graphs are a widely used paradigm for representing non-Euclidean data, with applications ranging from social network analysis to biomolecular prediction. Conventional graph learning approaches typically rely on fixed structural assumptions or fully observed data, limiting their effectiveness in more complex, noisy, or evolving settings. Consequently, real-world graph data often violates the assump…

    Submitted 23 May, 2025; originally announced May 2025.

  12. arXiv:2505.15431  [pdf, ps, other]

    cs.CL

    Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

    Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, et al. (230 additional authors not shown)

    Abstract: As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid response…

    Submitted 22 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  13. arXiv:2505.11523  [pdf, ps, other]

    cs.LG

    PRIME: Physics-Related Intelligent Mixture of Experts for Transistor Characteristics Prediction

    Authors: Zhenxing Dou, Yijiao Wang, Tao Zou, Zhiwei Chen, Fei Liu, Peng Wang, Weisheng Zhao

    Abstract: In recent years, machine learning has been extensively applied to data prediction during process ramp-up, with a particular focus on transistor characteristics for circuit design and manufacture. However, capturing the nonlinear current response across multiple operating regions remains a challenge for neural networks. To address this challenge, a novel machine learning framework, PRIME (Physics-R…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 8 pages, 6 figures

  14. arXiv:2505.10003  [pdf, ps, other]

    cs.LG eess.SP

    AI2MMUM: AI-AI Oriented Multi-Modal Universal Model Leveraging Telecom Domain Large Model

    Authors: Tianyu Jiao, Zhuoran Xiao, Yihang Huang, Chenhui Ye, Yijia Feng, Liyu Cai, Jiang Chang, Fangkun Liu, Yin Xu, Dazhi He, Yunfeng Guan, Wenjun Zhang

    Abstract: Designing a 6G-oriented universal model capable of processing multi-modal data and executing diverse air interface tasks has emerged as a common goal in future wireless systems. Building on our prior work in communication multi-modal alignment and telecom large language model (LLM), we propose a scalable, task-aware artificial intelligence-air interface multi-modal universal model (AI2MMUM), which…

    Submitted 15 May, 2025; originally announced May 2025.

  15. arXiv:2505.09406  [pdf, ps, other]

    cs.CV

    FreeDriveRF: Monocular RGB Dynamic NeRF without Poses for Autonomous Driving via Point-Level Dynamic-Static Decoupling

    Authors: Yue Wen, Liang Song, Yijia Liu, Siting Zhu, Yanzi Miao, Lijun Han, Hesheng Wang

    Abstract: Dynamic scene reconstruction for autonomous driving enables vehicles to perceive and interpret complex scene changes more precisely. Dynamic Neural Radiance Fields (NeRFs) have recently shown promising capability in scene modeling. However, many existing methods rely heavily on accurate pose inputs and multi-sensor data, leading to increased system complexity. To address this, we propose FreeDriv…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 7 pages, 9 figures, accepted by ICRA 2025

  16. arXiv:2504.18765  [pdf, ps, other]

    cs.AI

    A Vision for Auto Research with LLM Agents

    Authors: Chengwei Liu, Chong Wang, Jiayue Cao, Jingquan Ge, Kun Wang, Lyuye Zhang, Ming-Ming Cheng, Penghai Zhao, Tianlin Li, Xiaojun Jia, Xiang Li, Xinfeng Li, Yang Liu, Yebo Feng, Yihao Huang, Yijia Xu, Yuqiang Sun, Zhenhong Zhou, Zhengzi Xu

    Abstract: This paper introduces Agent-Based Auto Research, a structured multi-agent framework designed to automate, coordinate, and optimize the full lifecycle of scientific research. Leveraging the capabilities of large language models (LLMs) and modular agent collaboration, the system spans all major research phases, including literature review, ideation, methodology planning, experimentation, paper writi…

    Submitted 12 June, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

  17. arXiv:2504.01911  [pdf, other]

    cs.AI cs.CL cs.HC physics.comp-ph

    Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning

    Authors: Yinggan Xu, Hana Kimlee, Yijia Xiao, Di Luo

    Abstract: Large Language Models (LLMs) are playing an expanding role in physics research by enhancing reasoning, symbolic manipulation, and numerical computation. However, ensuring the reliability and interpretability of their outputs remains a significant challenge. In our framework, we conceptualize the collaboration between AI and human scientists as a dynamic interplay among three modules: the reasoning…

    Submitted 2 April, 2025; originally announced April 2025.

  18. arXiv:2503.23327  [pdf]

    cs.HC

    AI Delivers Creative Output but Struggles with Thinking Processes

    Authors: Man Zhang, Ying Li, Yang Peng, Yijia Sun, Wenxin Guo, Huiqing Hu, Shi Chen, Qingbai Zhao

    Abstract: A key objective in artificial intelligence (AI) development is to create systems that match or surpass human creativity. Although current AI models perform well across diverse creative tasks, it remains unclear whether these achievements reflect genuine creative thinking. This study examined whether AI models (GPT-3.5-turbo, GPT-4, and GPT-4o) engage in creative thinking by comparing their perform…

    Submitted 30 March, 2025; originally announced March 2025.

  19. arXiv:2503.22625  [pdf, ps, other]

    cs.SE cs.AI cs.LG

    Challenges and Paths Towards AI for Software Engineering

    Authors: Alex Gu, Naman Jain, Wen-Ding Li, Manish Shetty, Yijia Shao, Ziyang Li, Diyi Yang, Kevin Ellis, Koushik Sen, Armando Solar-Lezama

    Abstract: AI for software engineering has made remarkable progress recently, becoming a notable success within generative AI. Despite this, there are still many challenges that need to be addressed before automated software engineering reaches its full potential. It should be possible to reach high levels of automation where humans can focus on the critical decisions of what to build and how to balance diff…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 75 pages

  20. arXiv:2503.20776  [pdf, other]

    cs.CV

    Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields

    Authors: Shijie Zhou, Hui Ren, Yijia Weng, Shuwang Zhang, Zhen Wang, Dejia Xu, Zhiwen Fan, Suya You, Zhangyang Wang, Leonidas Guibas, Achuta Kadambi

    Abstract: Recent advancements in 2D and multimodal models have achieved remarkable success by leveraging large-scale training on extensive datasets. However, extending these achievements to enable free-form interactions and high-level semantic operations with complex 3D/4D scenes remains challenging. This difficulty stems from the limited availability of large-scale, annotated 3D/4D or multi-view datasets,…

    Submitted 28 March, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  21. arXiv:2503.18886  [pdf, other]

    cs.CV

    CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

    Authors: Weichen Fan, Amber Yijia Zheng, Raymond A. Yeh, Ziwei Liu

    Abstract: Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion/flow models to improve image fidelity and controllability. In this work, we first analytically study the effect of CFG on flow matching models trained on Gaussian mixtures where the ground-truth flow can be derived. We observe that in the early stages of training, when the flow estimation is inaccurate, CFG directs samples t…

    Submitted 3 April, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: Project Page: https://weichenfan.github.io/webpage-cfg-zero-star/ GitHub: https://github.com/WeichenFan/CFG-Zero-star

  22. arXiv:2503.18107  [pdf, other]

    cs.CV

    PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding

    Authors: Hongjia Zhai, Hai Li, Zhenzhe Li, Xiaokun Pan, Yijia He, Guofeng Zhang

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has shown encouraging performance for open vocabulary scene understanding tasks. However, previous methods, which usually predict a heatmap between the scene feature and text query, cannot distinguish 3D instance-level information. In this paper, we propose PanoGS, a novel and effective 3D panoptic open vocabulary scene understanding approach. Technically, to…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  23. arXiv:2503.16385  [pdf, ps, other]

    cs.AI

    Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation

    Authors: Yijia Luo, Yulin Song, Xingyao Zhang, Jiaheng Liu, Weixun Wang, GengRu Chen, Wenbo Su, Bo Zheng

    Abstract: Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning capabilities through long chain-of-thought (CoT) reasoning. The R1 distillation scheme has emerged as a promising approach for training cost-effective models with enhanced reasoning abilities. However, the underlying mechanisms driving its effectiveness remain unclear. This study examines the universality of…

    Submitted 20 March, 2025; originally announced March 2025.

  24. arXiv:2503.16328  [pdf, other]

    cs.LG cs.AI

    Knowledge-guided machine learning for county-level corn yield prediction under drought

    Authors: Xiaoyu Wang, Yijia Xu, Jingyi Huang, Zhengwei Yang, Zhou Zhang

    Abstract: Remote sensing (RS), which enables the non-contact acquisition of extensive ground observations, is a valuable tool for crop yield prediction. Traditional process-based models struggle to incorporate large volumes of RS data, and most users lack understanding of crop growth mechanisms. In contrast, machine learning (ML) models are often criticized as "black boxes" due to their limited interp…

    Submitted 5 May, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

  25. arXiv:2503.12988  [pdf, other]

    cs.AR cs.AI

    ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

    Authors: Wenqiang Wang, Yijia Zhang, Zikai Zhang, Guanting Huo, Hao Liang, Shijie Cao, Ningyi Xu

    Abstract: As large language models (LLMs) demonstrate powerful capabilities, deploying them on edge devices has become increasingly crucial, offering advantages in privacy and real-time interaction. QLoRA has emerged as the standard approach for on-device LLMs, leveraging quantized models to reduce memory and computational costs while utilizing LoRA for task-specific adaptability. In this work, we propose R…

    Submitted 17 March, 2025; originally announced March 2025.

  26. arXiv:2503.12782  [pdf, other]

    cs.RO

    DART: Dual-level Autonomous Robotic Topology for Efficient Exploration in Unknown Environments

    Authors: Qiming Wang, Yulong Gao, Yang Wang, Xiongwei Zhao, Yijiao Sun, Xiangyan Kong

    Abstract: Conventional algorithms in autonomous exploration face challenges due to their inability to accurately and efficiently identify the spatial distribution of convex regions in the real-time map. These methods often prioritize navigation toward the nearest or information-rich frontiers (the boundaries between known and unknown areas), resulting in incomplete convex region exploration and requiring…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: 11 pages, 9 figures, Journal

  27. arXiv:2503.11129  [pdf, other]

    cs.CV cs.AI

    Direction-Aware Diagonal Autoregressive Image Generation

    Authors: Yijia Xu, Jianzhong Ju, Jian Luan, Jinshi Cui

    Abstract: The raster-ordered image token sequence exhibits a significant Euclidean distance between index-adjacent tokens at line breaks, making it unsuitable for autoregressive generation. To address this issue, this paper proposes the Direction-Aware Diagonal Autoregressive Image Generation (DAR) method, which generates image tokens following a diagonal scanning order. The proposed diagonal scanning order ens…

    Submitted 16 April, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

  28. arXiv:2503.09814  [pdf]

    cond-mat.mtrl-sci cs.LG

    A practical guide to machine learning interatomic potentials -- Status and future

    Authors: Ryan Jacobs, Dane Morgan, Siamak Attarian, Jun Meng, Chen Shen, Zhenghao Wu, Clare Yijia Xie, Julia H. Yang, Nongnuch Artrith, Ben Blaiszik, Gerbrand Ceder, Kamal Choudhary, Gabor Csanyi, Ekin Dogus Cubuk, Bowen Deng, Ralf Drautz, Xiang Fu, Jonathan Godwin, Vasant Honavar, Olexandr Isayev, Anders Johansson, Boris Kozinsky, Stefano Martiniani, Shyue Ping Ong, Igor Poltavsky, et al. (5 additional authors not shown)

    Abstract: The rapid development and large body of literature on machine learning interatomic potentials (MLIPs) can make it difficult to know how to proceed for researchers who are not experts but wish to use these tools. The spirit of this review is to help such researchers by serving as a practical, accessible guide to the state-of-the-art in MLIPs. This review paper covers a broad range of topics related…

    Submitted 12 March, 2025; originally announced March 2025.

    Journal ref: Current Opinion in Solid State and Materials Science, 35, 101214 (2025)

  29. arXiv:2503.09033  [pdf, other]

    cs.RO cs.AI

    RFUAV: A Benchmark Dataset for Unmanned Aerial Vehicle Detection and Identification

    Authors: Rui Shi, Xiaodong Yu, Shengming Wang, Yijia Zhang, Lu Xu, Peng Pan, Chunlai Ma

    Abstract: In this paper, we propose RFUAV as a new benchmark dataset for radio-frequency based (RF-based) unmanned aerial vehicle (UAV) identification and address the following challenges: Firstly, many existing datasets feature a restricted variety of drone types and insufficient volumes of raw data, which fail to meet the demands of practical applications. Secondly, existing datasets often lack raw data c…

    Submitted 17 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 23 pages, 13 figures, conference

  30. arXiv:2503.06117  [pdf, other]

    cs.CV

    NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features

    Authors: Hongjia Zhai, Boming Zhao, Hai Li, Xiaokun Pan, Yijia He, Zhaopeng Cui, Hujun Bao, Guofeng Zhang

    Abstract: Recently, neural radiance fields (NeRF) have gained significant attention in the field of visual localization. However, existing NeRF-based approaches either lack geometric constraints or require extensive storage for feature matching, limiting their practical applications. To address these challenges, we propose an efficient and novel visual localization approach based on the neural implicit map…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: ICRA 2025

  31. arXiv:2502.17504  [pdf, other]

    q-bio.BM cs.AI cs.CE cs.CL cs.LG

    Protein Large Language Models: A Comprehensive Survey

    Authors: Yijia Xiao, Wanjia Zhao, Junkai Zhang, Yiqiao Jin, Han Zhang, Zhicheng Ren, Renliang Sun, Haixin Wang, Guancheng Wan, Pan Lu, Xiao Luo, Yu Zhang, James Zou, Yizhou Sun, Wei Wang

    Abstract: Protein-specific large language models (Protein LLMs) are revolutionizing protein science by enabling more efficient protein structure prediction, function annotation, and design. While existing surveys focus on specific aspects or applications, this work provides the first comprehensive overview of Protein LLMs, covering their architectures, training datasets, evaluation metrics, and diverse appl…

    Submitted 6 March, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 24 pages, 4 figures, 5 tables

  32. arXiv:2502.11770  [pdf, other]

    cs.AI

    Cognitive-Aligned Document Selection for Retrieval-augmented Generation

    Authors: Bingyu Wan, Fuxi Zhang, Zhongpeng Qi, Jiayi Ding, Jijun Li, Baoshi Fan, Yijia Zhang, Jun Zhang

    Abstract: Large language models (LLMs) inherently display hallucinations since the precision of generated texts cannot be guaranteed purely by the parametric knowledge they include. Although retrieval-augmented generation (RAG) systems enhance the accuracy and reliability of generative models by incorporating external documents, these retrieved documents often fail to adequately support the model's response…

    Submitted 17 February, 2025; originally announced February 2025.

  33. arXiv:2502.07350  [pdf, other]

    cs.AI

    KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

    Authors: Jusheng Zhang, Zimeng Huang, Yijia Fan, Ningyuan Liu, Mingyan Li, Zhuojie Yang, Jiawei Yao, Jian Wang, Keze Wang

    Abstract: As scaling large language models faces prohibitive costs, multi-agent systems emerge as a promising alternative, though challenged by static knowledge assumptions and coordination inefficiencies. We introduce Knowledge-Aware Bayesian Bandits (KABB), a novel framework that enhances multi-agent system coordination through semantic understanding and dynamic adaptation. The framework features three k…

    Submitted 11 February, 2025; originally announced February 2025.

    Report number: Accepted by the main conference of ICML 2025

  34. arXiv:2502.06111  [pdf, other]

    cs.SE cs.AI cs.LG

    CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories

    Authors: Yijia Xiao, Runhui Wang, Luyang Kong, Davor Golac, Wei Wang

    Abstract: The increasing complexity of computer science research projects demands more effective tools for deploying code repositories. Large Language Models (LLMs), such as Anthropic Claude and Meta Llama, have demonstrated significant advancements across various fields of computer science research, including the automation of diverse software engineering tasks. To evaluate the effectiveness of LLMs in han…

    Submitted 11 February, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

  35. arXiv:2502.06018  [pdf, other]

    cs.LG cs.AI

    Kolmogorov-Arnold Fourier Networks

    Authors: Jusheng Zhang, Yijia Fan, Kaitong Cai, Keze Wang

    Abstract: Although Kolmogorov-Arnold based interpretable networks (KAN) have strong theoretical expressiveness, they face significant parameter explosion and high-frequency feature capture challenges in high-dimensional tasks. To address this issue, we propose the Kolmogorov-Arnold-Fourier Network (KAF), which effectively integrates trainable Random Fourier Features (RFF) and a novel hybrid GELU-Fourier act…

    Submitted 9 February, 2025; originally announced February 2025.

  36. arXiv:2502.02020  [pdf, other]

    cs.LG stat.ME

    Causal bandits with backdoor adjustment on unknown Gaussian DAGs

    Authors: Yijia Zhao, Qing Zhou

    Abstract: The causal bandit problem aims to sequentially learn the intervention that maximizes the expectation of a reward variable within a system governed by a causal graph. Most existing approaches assume prior knowledge of the graph structure, or impose unrealistically restrictive conditions on the graph. In this paper, we assume a Gaussian linear directed acyclic graph (DAG) over arms and the reward va…

    Submitted 7 March, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  37. arXiv:2501.17326  [pdf, other]

    cs.CL cs.AI cs.LG

    Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction

    Authors: Mingyu Derek Ma, Xiaoxuan Wang, Yijia Xiao, Anthony Cuturrufo, Vijay S Nori, Eran Halperin, Wei Wang

    Abstract: Clinical diagnosis prediction models, when provided with a patient's medical history, aim to detect potential diseases early, facilitating timely intervention and improving prognostic outcomes. However, the inherent scarcity of patient data and large disease candidate space often pose challenges in developing satisfactory models for this intricate task. The exploration of leveraging Large Language…

    Submitted 28 January, 2025; originally announced January 2025.

    Comments: To appear at AAAI 2025

  38. arXiv:2501.16255  [pdf, other]

    cs.CL

    A foundation model for human-AI collaboration in medical literature mining

    Authors: Zifeng Wang, Lang Cao, Qiao Jin, Joey Chan, Nicholas Wan, Behdad Afzali, Hyun-Jin Cho, Chang-In Choi, Mehdi Emamverdi, Manjot K. Gill, Sun-Hyung Kim, Yijia Li, Yi Liu, Hanley Ong, Justin Rousseau, Irfan Sheikh, Jenny J. Wei, Ziyang Xu, Christopher M. Zallek, Kyungsang Kim, Yifan Peng, Zhiyong Lu, Jimeng Sun

    Abstract: Systematic literature review is essential for evidence-based medicine, requiring comprehensive analysis of clinical trial publications. However, the application of artificial intelligence (AI) models for medical literature mining has been limited by insufficient training and evaluation across broad therapeutic areas and diverse tasks. Here, we present LEADS, an AI foundation model for study search…

    Submitted 27 January, 2025; originally announced January 2025.

  39. arXiv:2501.15368  [pdf, other]

    cs.CL cs.SD eess.AS

    Baichuan-Omni-1.5 Technical Report

    Authors: Yadong Li, Jun Liu, Tao Zhang, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan, Chong Li, Yuanbo Fang, Dongdong Kuang, Mingrui Wang, Chenglin Zhu, Youwei Zhang, Hongyu Guo, Fengyu Zhang, Yuran Wang, Bowen Ding, Wei Song, Xu Li, Yuqi Huo, Zheng Liang, et al. (68 additional authors not shown)

    Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip…

    Submitted 25 January, 2025; originally announced January 2025.

  40. arXiv:2501.11235  [pdf, other]

    cs.CR

    Arbitrary-Threshold Fully Homomorphic Encryption with Lower Complexity

    Authors: Yijia Chang, Songze Li

    Abstract: Threshold fully homomorphic encryption (ThFHE) enables multiple parties to compute functions over their sensitive data without leaking data privacy. Most existing ThFHE schemes are restricted to full threshold and require the participation of all parties to output computing results. Compared with these full-threshold schemes, arbitrary threshold (ATh)-FHE schemes are robust to non-part…

    Submitted 19 January, 2025; originally announced January 2025.

    Comments: Accepted by USENIX Security 2025

  41. arXiv:2412.20138  [pdf, ps, other]

    q-fin.TR cs.AI cs.CE cs.LG

    TradingAgents: Multi-Agents LLM Financial Trading Framework

    Authors: Yijia Xiao, Edward Sun, Di Luo, Wei Wang

    Abstract: Significant progress has been made in automated problem-solving using societies of agents powered by large language models (LLMs). In finance, efforts have largely focused on single-agent systems handling specific tasks or multi-agent frameworks independently gathering data. However, the multi-agent systems' potential to replicate real-world trading firms' collaborative dynamics remains underexplo…

    Submitted 3 June, 2025; v1 submitted 28 December, 2024; originally announced December 2024.

    Comments: Tauric Research @ https://github.com/TauricResearch; Oral @ Multi-Agent AI in the Real World

  42. arXiv:2412.17524  [pdf, other]

    cs.AI

    STAHGNet: Modeling Hybrid-grained Heterogenous Dependency Efficiently for Traffic Prediction

    Authors: Jiyao Wang, Zehua Peng, Yijia Zhang, Dengbo He, Lei Chen

    Abstract: Traffic flow prediction plays a critical role in intelligent transportation systems, and it is a challenging task because of the complex underlying spatio-temporal patterns and heterogeneities that evolve over time. However, most existing works concentrate solely on capturing spatio-temporal dependency or extracting implicit similarity graphs, but the hybrid-granularity evolution is…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: Accepted by Neural Computing and Applications

  43. arXiv:2412.15701  [pdf, other]

    cs.AI cs.CL cs.HC

    Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration

    Authors: Yijia Shao, Vinay Samuel, Yucheng Jiang, John Yang, Diyi Yang

    Abstract: Recent advancements in language models (LMs) have sparked growing interest in developing LM agents. While fully autonomous agents could excel in many scenarios, numerous use cases inherently require them to collaborate with humans due to humans' latent preferences, domain expertise, or need for control. To facilitate the study of human-agent collaboration, we present Collaborative Gym (Co-Gym), a…

    Submitted 16 January, 2025; v1 submitted 20 December, 2024; originally announced December 2024.

    Comments: Preprint. Work in progress

  44. arXiv:2412.15320  [pdf, other]

    cs.CV

    Multi-concept Model Immunization through Differentiable Model Merging

    Authors: Amber Yijia Zheng, Raymond A. Yeh

    Abstract: Model immunization is an emerging direction that aims to mitigate the potential risk of misuse associated with open-sourced models and advancing adaptation methods. The idea is to make the released models' weights difficult to fine-tune on certain harmful applications, hence the name "immunized". Recent work on model immunization focuses on the single-concept setting. However, models need to be…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: AAAI 2025

  45. arXiv:2412.11109  [pdf, other]

    cs.CR

    SpearBot: Leveraging Large Language Models in a Generative-Critique Framework for Spear-Phishing Email Generation

    Authors: Qinglin Qi, Yun Luo, Yijia Xu, Wenbo Guo, Yong Fang

    Abstract: Large Language Models (LLMs) are increasingly capable, aiding in tasks such as content generation, yet they also pose risks, particularly in generating harmful spear-phishing emails. These emails, crafted to entice clicks on malicious URLs, threaten personal information security. This paper proposes an adversarial framework, SpearBot, which utilizes LLMs to generate spear-phishing emails with vari…

    Submitted 15 December, 2024; originally announced December 2024.

  46. arXiv:2412.08135  [pdf, other]

    cs.RO cs.CV cs.LG

    DOGE: An Extrinsic Orientation and Gyroscope Bias Estimation for Visual-Inertial Odometry Initialization

    Authors: Zewen Xu, Yijia He, Hao Wei, Yihong Wu

    Abstract: Most existing visual-inertial odometry (VIO) initialization methods rely on accurate pre-calibrated extrinsic parameters. However, during long-term use, irreversible structural deformation caused by temperature changes, mechanical squeezing, etc. will cause changes in extrinsic parameters, especially in the rotational part. Existing initialization methods that simultaneously estimate extrinsic par…

    Submitted 11 December, 2024; originally announced December 2024.

  47. arXiv:2412.07685  [pdf, other]

    math.OC cs.DS math.CO

    Automated Discovery of Branching Rules with Optimal Complexity for the Maximum Independent Set Problem

    Authors: Xuan-Zhao Gao, Yi-Jia Wang, Pan Zhang, Jin-Guo Liu

    Abstract: The branching algorithm is a fundamental technique for designing fast exponential-time algorithms to solve combinatorial optimization problems exactly. It divides the entire solution space into independent search branches using predetermined branching rules, and ignores the search on suboptimal branches to reduce the time complexity. The complexity of a branching algorithm is primarily determined…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: 27 pages, 8 figures

  48. arXiv:2412.05584  [pdf, other]

    cs.CV cs.AI

    UMSPU: Universal Multi-Size Phase Unwrapping via Mutual Self-Distillation and Adaptive Boosting Ensemble Segmenters

    Authors: Lintong Du, Huazhen Liu, Yijia Zhang, ShuXin Liu, Yuan Qu, Zenghui Zhang, Jiamiao Yang

    Abstract: Spatial phase unwrapping is a key technique for extracting phase information to obtain 3D morphology and other features. Modern industrial measurement scenarios demand high precision, large image sizes, and high speed. However, conventional methods struggle with noise resistance and processing speed. Current deep learning methods are limited by the receptive field size and sparse semantic informat…

    Submitted 16 April, 2025; v1 submitted 7 December, 2024; originally announced December 2024.

  49. arXiv:2412.03121  [pdf, other]

    cs.CV eess.IV

    Splats in Splats: Embedding Invisible 3D Watermark within Gaussian Splatting

    Authors: Yijia Guo, Wenkai Huang, Yang Li, Gaolei Li, Hang Zhang, Liwen Hu, Jianhua Li, Tiejun Huang, Lei Ma

    Abstract: 3D Gaussian splatting (3DGS) has demonstrated impressive 3D reconstruction performance with explicit scene representations. Given the widespread application of 3DGS in 3D reconstruction and generation tasks, there is an urgent need to protect the copyright of 3DGS assets. However, existing copyright protection techniques for 3DGS overlook the usability of 3D assets, posing challenges for practical…

    Submitted 4 December, 2024; originally announced December 2024.

  50. arXiv:2411.18873  [pdf, other]

    cs.PF cs.LG

    Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach

    Authors: Yijia Zhang, Zhihong Gou, Shijie Cao, Weigang Feng, Sicheng Zhang, Guohao Dai, Ningyi Xu

    Abstract: Deep Neural Networks (DNNs) have revolutionized various fields, but their deployment on GPUs often leads to significant energy consumption. Unlike existing methods for reducing GPU energy consumption, which are either hardware-inflexible or limited by workload constraints, this paper addresses the problem at the GPU kernel level. We propose a novel search-based compilation method to generate energ…

    Submitted 27 November, 2024; originally announced November 2024.