
Showing 1–50 of 97 results for author: Bai, T

Searching in archive cs.
  1. arXiv:2506.07235 [pdf, ps, other]

    cs.CV cs.CL

    Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification

    Authors: Tianyi Bai, Zengjie Hu, Fupeng Sun, Jiantao Qiu, Yizhen Jiang, Guangxin He, Bohan Zeng, Conghui He, Binhang Yuan, Wentao Zhang

    Abstract: Multi-modal large language models (MLLMs) have achieved remarkable capabilities by integrating visual perception with language understanding, enabling applications such as image-grounded dialogue, visual question answering, and scientific analysis. However, most MLLMs adopt a static inference paradigm, encoding the entire image into fixed visual tokens upfront, which limits their ability to iterat…

    Submitted 8 June, 2025; originally announced June 2025.

  2. arXiv:2506.07227 [pdf, ps, other]

    cs.CV cs.CL

    Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning

    Authors: Tianyi Bai, Yuxuan Fan, Jiantao Qiu, Fupeng Sun, Jiayi Song, Junlin Han, Zichen Liu, Conghui He, Wentao Zhang, Binhang Yuan

    Abstract: Multimodal large language models (MLLMs) have achieved strong performance on vision-language tasks but still struggle with fine-grained visual differences, leading to hallucinations or missed semantic shifts. We attribute this to limitations in both training data and learning objectives. To address these issues, we propose a controlled data generation pipeline that produces minimally edited image…

    Submitted 8 June, 2025; originally announced June 2025.

  3. arXiv:2506.06326 [pdf, ps, other]

    cs.AI

    Memory OS of AI Agent

    Authors: Jiazheng Kang, Mingming Ji, Zhe Zhao, Ting Bai

    Abstract: Large Language Models (LLMs) face a crucial challenge from fixed context windows and inadequate memory management, leading to a severe shortage of long-term memory capabilities and limited personalization in the interactive experience with AI agents. To overcome this challenge, we innovatively propose a Memory Operating System, i.e., MemoryOS, to achieve comprehensive and efficient memory manageme…

    Submitted 30 May, 2025; originally announced June 2025.

  4. arXiv:2506.02781 [pdf, ps, other]

    cs.CV

    FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts

    Authors: Tongyuan Bai, Wangyuanfan Bai, Dong Chen, Tieru Wu, Manyi Li, Rui Ma

    Abstract: Controllability plays a crucial role in the practical applications of 3D indoor scene synthesis. Existing works either allow rough language-based control, which is convenient but lacks fine-grained scene customization, or employ graph-based control, which offers better controllability but demands considerable knowledge for the cumbersome graph design process. To address these challenges, we present…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted to CVPR 2025

  5. arXiv:2506.01352 [pdf, other]

    cs.LG

    TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network

    Authors: Guangxin He, Yuan Cao, Yutong He, Tianyi Bai, Kun Yuan, Binhang Yuan

    Abstract: Decentralized training of large language models offers the opportunity to pool computational resources across geographically distributed participants but faces significant network communication bottlenecks, particularly in pipeline-parallel settings. While pipeline parallelism partitions model layers across devices to handle large-scale models, it necessitates frequent communication of intermediat…

    Submitted 2 June, 2025; originally announced June 2025.

  6. arXiv:2505.12772 [pdf, ps, other]

    cs.CV

    Pyramid Sparse Transformer: Enhancing Multi-Scale Feature Fusion with Dynamic Token Selection

    Authors: Junyi Hu, Tian Bai, Fengyi Wu, Zhenming Peng, Yi Zhang

    Abstract: Feature fusion is critical for high-performance vision models but often incurs prohibitive complexity. In particular, prevailing attention-based fusion methods often involve significant computational complexity and implementation challenges, limiting their efficiency in resource-constrained environments. To address these issues, we introduce the Pyramid Sparse Transformer (PST), a lightweight, plug-and-…

    Submitted 20 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: 13 pages, 5 figures

  7. Cell Library Characterization for Composite Current Source Models Based on Gaussian Process Regression and Active Learning

    Authors: Tao Bai, Junzhuo Zhou, Zeyuan Deng, Ting-Jung Lin, Wei Xing, Peng Cao, Lei He

    Abstract: The composite current source (CCS) model has been adopted as an advanced timing model that represents the current behavior of cells for improved accuracy and better capability than traditional non-linear delay models (NLDM) to model complex dynamic effects and interactions under advanced process nodes. However, the high accuracy requirement, large amount of data and extensive simulation cost pose…

    Submitted 23 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  8. arXiv:2505.00917 [pdf, other]

    stat.ME cs.AI cs.LG stat.ML

    Multivariate Conformal Selection

    Authors: Tian Bai, Yue Zhao, Xiang Yu, Archer Y. Yang

    Abstract: Selecting high-quality candidates from large datasets is critical in applications such as drug discovery, precision medicine, and alignment of large language models (LLMs). While Conformal Selection (CS) provides rigorous uncertainty quantification, it is limited to univariate responses and scalar criteria. To address this issue, we propose Multivariate Conformal Selection (mCS), a generalization…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 25 pages, 4 figures. Accepted to ICML 2025

  9. arXiv:2505.00308 [pdf]

    cs.CV cs.AI stat.AP

    AI-Assisted Decision-Making for Clinical Assessment of Auto-Segmented Contour Quality

    Authors: Biling Wang, Austen Maniscalco, Ti Bai, Siqiu Wang, Michael Dohopolski, Mu-Han Lin, Chenyang Shen, Dan Nguyen, Junzhou Huang, Steve Jiang, Xinlei Wang

    Abstract: Purpose: This study presents a Deep Learning (DL)-based quality assessment (QA) approach for evaluating auto-generated contours (auto-contours) in radiotherapy, with emphasis on Online Adaptive Radiotherapy (OART). Leveraging Bayesian Ordinal Classification (BOC) and calibrated uncertainty thresholds, the method enables confident QA predictions without relying on ground truth contours or extensive…

    Submitted 11 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  10. arXiv:2504.16358 [pdf, other]

    cs.CL

    Text-to-TrajVis: Enabling Trajectory Data Visualizations from Natural Language Questions

    Authors: Tian Bai, Huiyan Ying, Kailong Suo, Junqiu Wei, Tao Fan, Yuanfeng Song

    Abstract: This paper introduces the Text-to-TrajVis task, which aims to transform natural language questions into trajectory data visualizations, facilitating the development of natural language interfaces for trajectory visualization systems. As this is a novel task, there is currently no relevant dataset available in the community. To address this gap, we first devised a new visualization language called…

    Submitted 22 April, 2025; originally announced April 2025.

  11. arXiv:2504.14194 [pdf, ps, other]

    cs.CL

    Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models

    Authors: Xinlin Zhuang, Jiahui Peng, Ren Ma, Yinfan Wang, Tianyi Bai, Xingjian Wei, Jiantao Qiu, Chi Zhang, Ying Qian, Conghui He

    Abstract: The composition of pre-training datasets for large language models (LLMs) remains largely undisclosed, hindering transparency and efforts to optimize data quality, a critical driver of model performance. Current data selection methods, such as natural language quality assessments, diversity-based filters, and classifier-based approaches, are limited by single-dimensional evaluation or redundancy-f…

    Submitted 4 June, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

    Comments: Accepted by ACL 2025

  12. arXiv:2504.07491 [pdf, ps, other]

    cs.CV

    Kimi-VL Technical Report

    Authors: Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (70 additional authors not shown)

    Abstract: We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities, all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B). Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, Kimi-VL excels in multi-…

    Submitted 23 June, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: Updated Kimi-VL-A3B-Thinking-2506 information

  13. arXiv:2503.23991 [pdf, other]

    cs.GT math.OC

    Deviation Between Team-Optimal Solution and Nash Equilibrium in Flow Assignment Problems

    Authors: Gehui Xu, Ting Bai, Andreas A. Malikopoulos, Thomas Parisini

    Abstract: We investigate the relationship between the team-optimal solution and the Nash equilibrium (NE) to assess the impact of strategy deviation on team performance. As a working use case, we focus on a class of flow assignment problems in which each source node acts as a cooperating decision maker (DM) within a team that minimizes the team cost based on the team-optimal strategy. In practice, some self…

    Submitted 31 March, 2025; originally announced March 2025.

  14. arXiv:2503.10447 [pdf, ps, other]

    cs.DS

    An Almost Quadratic Vertex Kernel for Subset Feedback Arc Set in Tournaments

    Authors: Tian Bai

    Abstract: In the Feedback Arc Set in Tournaments (FAST) problem, we are given a tournament $D$ and a positive integer $k$, and the objective is to determine whether there exists an arc set $S \subseteq A(D)$ of size at most $k$ whose removal makes the graph acyclic. This problem is well known to be equivalent to a natural tournament ranking problem, whose task is to rank players in a tournament such…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 19 pages, 4 figures
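    The decision problem in the abstract above (given a tournament D, i.e. a digraph with exactly one arc between every pair of vertices, and an integer k, find an arc set S ⊆ A(D) with |S| ≤ k whose removal leaves the graph acyclic) can be made concrete with a brute-force checker. This is a minimal sketch for illustration only: it enumerates arc subsets and is exponential, whereas the paper is about a vertex kernel, which is not implemented here. The function names `is_acyclic` and `feedback_arc_set` and the encoding of vertices as 0..n-1 with arcs as (u, v) tuples are assumptions.

```python
from itertools import combinations


def is_acyclic(n, arcs):
    """Kahn's algorithm: True if the digraph on vertices 0..n-1 has no directed cycle."""
    indegree = [0] * n
    successors = [[] for _ in range(n)]
    for u, v in arcs:
        successors[u].append(v)
        indegree[v] += 1
    ready = [v for v in range(n) if indegree[v] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    # Every vertex is eventually removed exactly when the graph is acyclic.
    return visited == n


def feedback_arc_set(n, arcs, k):
    """Brute force (hypothetical helper): return an arc set S with |S| <= k
    whose removal leaves the graph acyclic, or None if no such set exists."""
    arcs = list(arcs)
    for size in range(k + 1):
        for candidate in combinations(arcs, size):
            removed = set(candidate)
            remaining = [a for a in arcs if a not in removed]
            if is_acyclic(n, remaining):
                return removed
    return None
```

    For the cyclic tournament on three vertices, arcs [(0, 1), (1, 2), (2, 0)], no solution exists with k = 0, and with k = 1 the sketch returns {(0, 1)}; a transitive tournament already satisfies k = 0 with S empty. Reading the remaining acyclic graph in topological order gives the corresponding player ranking.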

  15. arXiv:2502.16802 [pdf, other]

    cs.CL cs.AI

    Unsupervised Topic Models are Data Mixers for Pre-training Language Models

    Authors: Jiahui Peng, Xinlin Zhuang, Qiu Jiantao, Ren Ma, Jing Yu, Tianyi Bai, Conghui He

    Abstract: The performance of large language models (LLMs) is significantly affected by the quality and composition of their pre-training data, which is inherently diverse, spanning various domains, sources, and topics. Effectively integrating these heterogeneous data sources is crucial for optimizing LLM performance. Previous research has predominantly concentrated on domain-based data mixing, often neglect…

    Submitted 5 March, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

    Comments: 18 pages, 7 figures

  16. arXiv:2502.13997 [pdf, other]

    cs.GR

    SigStyle: Signature Style Transfer via Personalized Text-to-Image Models

    Authors: Ye Wang, Tongyuan Bai, Xuping Xie, Zili Yi, Yilin Wang, Rui Ma

    Abstract: Style transfer enables the seamless integration of artistic styles from a style image into a content image, resulting in visually striking and aesthetically enriched outputs. Despite numerous advances in this field, existing methods did not explicitly focus on the signature style, which represents the distinct and recognizable visual traits of the image such as geometric and structural patterns, c…

    Submitted 19 February, 2025; originally announced February 2025.

  17. arXiv:2501.11462 [pdf, other]

    cs.CV eess.IV

    On the Adversarial Vulnerabilities of Transfer Learning in Remote Sensing

    Authors: Tao Bai, Xingjian Tian, Yonghao Xu, Bihan Wen

    Abstract: The use of pretrained models from general computer vision tasks is widespread in remote sensing, significantly reducing training costs and improving performance. However, this practice also introduces vulnerabilities to downstream tasks, where publicly available pretrained models can be used as a proxy to compromise downstream models. This paper presents a novel Adversarial Neuron Manipulation met…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  18. arXiv:2501.06234 [pdf, other]

    cs.OS cs.CR

    Fast, Secure, Adaptable: LionsOS Design, Implementation and Performance

    Authors: Gernot Heiser, Ivan Velickovic, Peter Chubb, Alwin Joshy, Anuraag Ganesh, Bill Nguyen, Cheng Li, Courtney Darville, Guangtao Zhu, James Archer, Jingyao Zhou, Krishnan Winter, Lucy Parker, Szymon Duchniewicz, Tianyi Bai

    Abstract: We present LionsOS, an operating system for security- and safety-critical embedded systems. LionsOS is based on the formally verified seL4 microkernel and designed with verification in mind. It uses a static architecture and features a highly modular design driven by strict separation of concerns and a focus on simplicity. We demonstrate that LionsOS achieves excellent performance on system-call…

    Submitted 27 May, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

    Comments: 14 pages, 13 figures

    ACM Class: D.4.7; D.4.8

  19. arXiv:2412.20036 [pdf, other]

    cs.IR

    Invariant debiasing learning for recommendation via biased imputation

    Authors: Ting Bai, Weijie Chen, Cheng Yang, Chuan Shi

    Abstract: Previous debiasing studies use unbiased data to supervise model training. They suffer from the high trial risks and experimental costs of obtaining unbiased data. Recent research attempts to use invariant learning to detach the invariant preference of users for unbiased recommendations in an unsupervised way. However, it faces the drawbacks of low model accuracy and unstable prediction…

    Submitted 5 February, 2025; v1 submitted 28 December, 2024; originally announced December 2024.

    Journal ref: Information Processing & Management, Volume 62, Issue 3, May 2025, 104028

  20. arXiv:2412.20024 [pdf, other]

    cs.AI cs.CL

    BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters

    Authors: Ting Bai, Jiazheng Kang, Jiayang Fan

    Abstract: We introduce a comprehensive large-scale role-playing agent corpus, termed BaiJia, that comprises various Chinese historical characters. This corpus is noteworthy for being the pioneering compilation of low-resource data that can be utilized in large language models (LLMs) to engage in AI-driven historical role-playing agents. BaiJia addresses the challenges in terms of fragmented historical textu…

    Submitted 5 January, 2025; v1 submitted 28 December, 2024; originally announced December 2024.

  21. arXiv:2412.16216 [pdf, other]

    cs.LG cs.AI cs.CL

    GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration

    Authors: Ting Bai, Yue Yu, Le Huang, Zenan Xu, Zhe Zhao, Chuan Shi

    Abstract: The sparse Mixture-of-Experts (MoE) architecture of large language models (LLMs) confronts an inherent issue of load imbalance arising from the simplistic linear router strategy, which ultimately causes the instability and inefficient learning of LLMs. To address this challenge, we introduce a novel MoE graph-based framework GMoE, aimed at enhancing the collaboration among multiple expe…

    Submitted 26 May, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: 9 pages, 25 figures

    ACM Class: I.2.7

  22. arXiv:2412.05547 [pdf, other]

    cs.IR cs.AI

    KG-Retriever: Efficient Knowledge Indexing for Retrieval-Augmented Large Language Models

    Authors: Weijie Chen, Ting Bai, Jinbo Su, Jian Luan, Wei Liu, Chuan Shi

    Abstract: Large language models with retrieval-augmented generation encounter a pivotal challenge in intricate retrieval tasks, e.g., multi-hop question answering, which requires the model to navigate across multiple documents and generate comprehensive responses based on fragmented information. To tackle this challenge, we introduce a novel Knowledge Graph-based RAG framework with a hierarchical knowledge…

    Submitted 5 May, 2025; v1 submitted 7 December, 2024; originally announced December 2024.

  23. arXiv:2411.17983 [pdf, other]

    stat.ME cs.AI cs.LG stat.ML

    Optimized Conformal Selection: Powerful Selective Inference After Conformity Score Optimization

    Authors: Tian Bai, Ying Jin

    Abstract: Model selection/optimization in conformal inference is challenging, since it may break the exchangeability between labeled and unlabeled data. We study this problem in the context of conformal selection, which uses conformal p-values to select "interesting" instances with large unobserved labels from a pool of unlabeled data, while controlling the FDR in finite samples. For validity, existing sol…

    Submitted 26 November, 2024; originally announced November 2024.

  24. arXiv:2410.23041 [pdf, other]

    cs.AI

    Emotional RAG: Enhancing Role-Playing Agents through Emotional Retrieval

    Authors: Le Huang, Hengzhi Lan, Zijun Sun, Chuan Shi, Ting Bai

    Abstract: As LLMs exhibit a high degree of human-like capability, increasing attention has been paid to role-playing research areas in which responses generated by LLMs are expected to mimic human replies. This has promoted the exploration of role-playing agents in various applications, such as chatbots that can engage in natural conversations with users and virtual assistants that can provide personalized…

    Submitted 30 October, 2024; originally announced October 2024.

  25. arXiv:2410.09732 [pdf, other]

    cs.CV

    LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

    Authors: Junyan Ye, Baichuan Zhou, Zilong Huang, Junan Zhang, Tianyi Bai, Hengrui Kang, Jun He, Honglin Lin, Zihao Wang, Tong Wu, Zhizheng Wu, Yiping Chen, Dahua Lin, Conghui He, Weijia Li

    Abstract: With the rapid development of AI-generated content, the future internet may be inundated with synthetic data, making the discrimination of authentic and credible multimodal data increasingly challenging. Synthetic data detection has thus garnered widespread attention, and the performance of large multimodal models (LMMs) in this task has attracted significant interest. LMMs can provide natural lan…

    Submitted 20 April, 2025; v1 submitted 13 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 SPOTLIGHT, 83 pages, 63 figures

  26. arXiv:2410.08102 [pdf, ps, other]

    cs.CL

    Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration

    Authors: Tianyi Bai, Ling Yang, Zhen Hao Wong, Fupeng Sun, Jiahui Peng, Xinlin Zhuang, Chi Zhang, Lijun Wu, Jiantao Qiu, Wentao Zhang, Binhang Yuan, Conghui He

    Abstract: Efficient data selection is crucial to accelerate the pretraining of language models (LMs). While various methods have been proposed to enhance data efficiency, limited research has addressed the inherent conflicts between these approaches to achieve optimal data selection for LM pretraining. To tackle this problem, we propose a multi-actor collaborative data selection mechanism: each data selectio…

    Submitted 8 June, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

  27. arXiv:2410.04211 [pdf, other]

    cs.CL cs.AI

    Correlation-Aware Select and Merge Attention for Efficient Fine-Tuning and Context Length Extension

    Authors: Ning Wang, Zekun Li, Tongxin Bai, Guoqi Li

    Abstract: Modeling long sequences is crucial for various large-scale models; however, extending existing architectures to handle longer sequences presents significant technical and resource challenges. In this paper, we propose an efficient and flexible attention architecture that enables the extension of context lengths in large language models with reduced computational resources and fine-tuning time comp…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: 11 pages, 2 figures

  28. arXiv:2409.16986 [pdf, other]

    cs.AI

    Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

    Authors: Chi Zhang, Huaping Zhong, Kuan Zhang, Chengliang Chai, Rui Wang, Xinlin Zhuang, Tianyi Bai, Jiantao Qiu, Lei Cao, Ju Fan, Ye Yuan, Guoren Wang, Conghui He

    Abstract: Data selection is of great significance in pre-training large language models, given the variation in quality within the large-scale available training corpora. To achieve this, researchers are currently investigating the use of data influence to measure the importance of data instances, i.e., a high influence score indicates that incorporating this instance into the training set is likely to enha…

    Submitted 5 October, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

  29. arXiv:2408.17267 [pdf, other]

    cs.CV cs.AI

    UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

    Authors: Baichuan Zhou, Haote Yang, Dairong Chen, Junyan Ye, Tianyi Bai, Jinhua Yu, Songyang Zhang, Dahua Lin, Conghui He, Weijia Li

    Abstract: Recent evaluations of Large Multimodal Models (LMMs) have explored their capabilities in various domains, with only a few benchmarks specifically focusing on urban environments. Moreover, existing urban benchmarks have been limited to evaluating LMMs with basic region-level urban tasks under singular views, leading to incomplete evaluations of LMMs' abilities in urban environments. To address these…

    Submitted 9 March, 2025; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures

  30. arXiv:2408.17214 [pdf, other]

    cs.IR

    Efficient Multi-task Prompt Tuning for Recommendation

    Authors: Ting Bai, Le Huang, Yue Yu, Cheng Yang, Cheng Hou, Zhe Zhao, Chuan Shi

    Abstract: With the expansion of business scenarios, real recommender systems are facing challenges in dealing with the constantly emerging new tasks in multi-task learning frameworks. In this paper, we attempt to improve the generalization ability of multi-task recommendations when dealing with new tasks. We find that joint training will enhance the performance of the new task but always negatively impact e…

    Submitted 30 August, 2024; originally announced August 2024.

  31. arXiv:2407.20756 [pdf, other]

    cs.CV cs.CL

    SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models

    Authors: Zheng Liu, Hao Liang, Bozhou Li, Tianyi Bai, Wentao Xiong, Chong Chen, Conghui He, Wentao Zhang, Bin Cui

    Abstract: Vision-Language Models (VLMs) have recently emerged, demonstrating remarkable vision-understanding capabilities. However, training these models requires large-scale datasets, which brings challenges related to efficiency, effectiveness, quality, and privacy of web data. In this paper, we introduce SynthVLM, a novel data synthesis and curation method for generating image-caption pairs. Unlike tradi…

    Submitted 18 February, 2025; v1 submitted 30 July, 2024; originally announced July 2024.

  32. arXiv:2407.03104 [pdf, other]

    cs.CV cs.CL cs.MM

    KeyVideoLLM: Towards Large-scale Video Keyframe Selection

    Authors: Hao Liang, Jiapeng Li, Tianyi Bai, Xijie Huang, Linzhuang Sun, Zhengren Wang, Conghui He, Bin Cui, Chong Chen, Wentao Zhang

    Abstract: Recently, with the rise of web videos, managing and understanding large-scale video datasets has become increasingly important. Video Large Language Models (VideoLLMs) have emerged in recent years due to their strong video understanding capabilities. However, training and inference processes for VideoLLMs demand vast amounts of data, presenting significant challenges to data management, particular…

    Submitted 10 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  33. arXiv:2405.16640 [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    A Survey of Multimodal Large Language Model from A Data-centric Perspective

    Authors: Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, Shiyu Li, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Ping Huang, Jiulong Shan, Conghui He, Binhang Yuan, Wentao Zhang

    Abstract: Multimodal large language models (MLLMs) enhance the capabilities of standard large language models by integrating and processing data from multiple modalities, including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Spec…

    Submitted 18 July, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  34. arXiv:2405.05288 [pdf, other]

    cs.SI cs.IR cs.LG

    Learning Social Graph for Inactive User Recommendation

    Authors: Nian Liu, Shen Fan, Ting Bai, Peng Wang, Mingwei Sun, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Chuan Shi

    Abstract: Social relations have been widely incorporated into recommender systems to alleviate the data sparsity problem. However, raw social relations don't always benefit recommendation due to their inferior quality and insufficient quantity, especially for inactive users, whose interacted items are limited. In this paper, we propose a novel social recommendation method called LSIR (Learning …

    Submitted 22 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by DASFAA 2024

  35. arXiv:2404.08963 [pdf, ps, other]

    cs.GT

    Facility Assignment with Fair Cost Sharing: Equilibrium and Mechanism Design

    Authors: Mengfan Ma, Mingyu Xiao, Tian Bai, Xin Cheng

    Abstract: In the one-dimensional facility assignment problem, m facilities and n agents are positioned along the real line. Each agent will be assigned to a single facility to receive service. Each facility incurs a building cost, which is shared equally among the agents utilizing it. Additionally, each agent independently bears a connection cost to access a facility. Thus, an agent's cost is the sum of the…

    Submitted 13 April, 2024; originally announced April 2024.
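    The cost model in the abstract above (each facility's building cost split equally among its users, plus an individual connection cost) can be sketched as follows, together with a check for whether an assignment is a pure Nash equilibrium, i.e. no agent can lower its own cost by switching facility alone. This is an illustrative sketch, not the paper's mechanism: because the abstract is truncated before the connection cost is specified, the sketch assumes it equals the distance |x - y| on the line, and the function names and argument layout are hypothetical.

```python
from collections import Counter


def agent_costs(agents, facilities, building, assignment):
    """Per-agent cost: equal share of the chosen facility's building cost plus
    the distance to it (distance is an ASSUMED connection cost)."""
    load = Counter(assignment)  # number of agents using each facility
    return [building[f] / load[f] + abs(x - facilities[f])
            for x, f in zip(agents, assignment)]


def is_nash_equilibrium(agents, facilities, building, assignment, eps=1e-12):
    """True if no agent can lower its cost by unilaterally switching facility."""
    load = Counter(assignment)
    costs = agent_costs(agents, facilities, building, assignment)
    for i, (x, f) in enumerate(zip(agents, assignment)):
        for g in range(len(facilities)):
            if g == f:
                continue
            # After switching, agent i joins the load[g] agents already at g
            # (Counter returns 0 for an unused facility).
            deviation = building[g] / (load[g] + 1) + abs(x - facilities[g])
            if deviation < costs[i] - eps:
                return False
    return True
```

    With facilities at positions 0 and 10, building costs [4, 4], and agents at 1, 2 and 9, the assignment [0, 0, 1] gives costs [3.0, 4.0, 5.0] and is an equilibrium; sending the agent at position 1 to the far facility ([1, 0, 1]) is not, since switching back would cut its cost from 11 to 3.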

  36. arXiv:2402.16810 [pdf]

    cs.CL

    OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)

    Authors: Fujian Jia, Xin Liu, Lixi Deng, Jiwen Gu, Chunchao Pu, Tunan Bai, Mengjiang Huang, Yuanzhi Lu, Kang Liu

    Abstract: In the past year, there has been a growing trend in applying Large Language Models (LLMs) to the field of medicine, particularly with the advent of advanced language models such as ChatGPT developed by OpenAI. However, there is limited research on LLMs specifically addressing oncology-related queries. The primary aim of this research was to develop a specialized language model that demonstrates im…

    Submitted 26 February, 2024; originally announced February 2024.

  37. arXiv:2312.04822 [pdf, other]

    cs.CV

    SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles

    Authors: Deyuan Qu, Qi Chen, Tianyu Bai, Hongsheng Lu, Heng Fan, Hao Zhang, Song Fu, Qing Yang

    Abstract: Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles. However, the absence of feature maps shared from other vehicles can lead to a significant decline in 3D object detection performance for cooperative perception models compared to standalone 3D detection models. This drawback impedes the adoption of coo…

    Submitted 26 August, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted by IROS 2024

  38. Graph Foundation Models: Concepts, Opportunities and Challenges

    Authors: Jiawei Liu, Cheng Yang, Zhiyuan Lu, Junze Chen, Yibo Li, Mengmei Zhang, Ting Bai, Yuan Fang, Lichao Sun, Philip S. Yu, Chuan Shi

    Abstract: Foundation models have emerged as critical components in a variety of artificial intelligence applications, and showcase significant success in natural language processing and several other domains. Meanwhile, the field of graph machine learning is witnessing a paradigm transition from shallow methods to more sophisticated deep learning approaches. The capabilities of foundation models in generali…

    Submitted 10 March, 2025; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: This is the author's version of the accepted paper (not the IEEE-published version). Citation information: DOI 10.1109/TPAMI.2025.3548729. For access to the final edited and published article, please follow the link provided: https://ieeexplore.ieee.org/document/10915556

  39. arXiv:2307.16273 [pdf, other]

    cs.LG cs.CR

    zkDL: Efficient Zero-Knowledge Proofs of Deep Learning Training

    Authors: Haochen Sun, Tonghe Bai, Jason Li, Hongyang Zhang

    Abstract: The recent advancements in deep learning have brought about significant changes in various aspects of people's lives. Meanwhile, these rapid developments have raised concerns about the legitimacy of the training process of deep neural networks. To protect the intellectual properties of AI developers, directly examining the training process by accessing the model parameters and training data is oft…

    Submitted 5 December, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

    Comments: 16 pages

  40. arXiv:2307.09751 [pdf, other]

    cs.IR cs.AI

    Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community

    Authors: Qingyao Ai, Ting Bai, Zhao Cao, Yi Chang, Jiawei Chen, Zhumin Chen, Zhiyong Cheng, Shoubin Dong, Zhicheng Dou, Fuli Feng, Shen Gao, Jiafeng Guo, Xiangnan He, Yanyan Lan, Chenliang Li, Yiqun Liu, Ziyu Lyu, Weizhi Ma, Jun Ma, Zhaochun Ren, Pengjie Ren, Zhiqiang Wang, Mingwen Wang, Ji-Rong Wen, Le Wu, et al. (8 additional authors not shown)

    Abstract: The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer…

    Submitted 26 July, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: 17 pages

  41. Gaussian processes for Bayesian inverse problems associated with linear partial differential equations

    Authors: Tianming Bai, Aretha L. Teckentrup, Konstantinos C. Zygalakis

    Abstract: This work is concerned with the use of Gaussian surrogate models for Bayesian inverse problems associated with linear partial differential equations. A particular focus is on the regime where only a small amount of training data is available. In this regime the type of Gaussian prior used is of critical importance with respect to how well the surrogate model will perform in terms of Bayesian inver…

    Submitted 17 July, 2023; originally announced July 2023.

  42. EulerNet: Adaptive Feature Interaction Learning via Euler's Formula for CTR Prediction

    Authors: Zhen Tian, Ting Bai, Wayne Xin Zhao, Ji-Rong Wen, Zhao Cao

    Abstract: Learning effective high-order feature interactions is very crucial in the CTR prediction task. However, it is very time-consuming to calculate high-order feature interactions with massive features in online e-commerce platforms. Most existing methods manually design a maximal order and further filter out the useless interactions from them. Although they reduce the high computational costs caused b…

    Submitted 12 September, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: 10 pages, 7 figures, accepted for publication in SIGIR'23

  43. arXiv:2303.17764  [pdf, other]

    cs.LG cs.AI

    Towards Adversarially Robust Continual Learning

    Authors: Tao Bai, Chen Chen, Lingjuan Lyu, Jun Zhao, Bihan Wen

    Abstract: Recent studies show that models trained by continual learning can achieve the comparable performances as the standard supervised learning and the learning flexibility of continual learning models enables their wide applications in the real world. Deep learning models, however, are shown to be vulnerable to adversarial attacks. Though there are many studies on the model robustness in the context of…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  44. arXiv:2303.00343  [pdf, other]

    cs.CR

    SMPC Task Decomposition: A Theory for Accelerating Secure Multi-party Computation Task

    Authors: Yuanqing Feng, Tao Bai, Songfeng Lu, Xueming Tang, Junjun Wu

    Abstract: Today, we are in the era of big data, and data are becoming more and more important, especially private data. Secure Multi-party Computation (SMPC) technology enables parties to perform computing tasks without revealing original data. However, the underlying implementation of SMPC is too heavy, such as garbled circuit (GC) and oblivious transfer(OT). Every time a piece of data is added, the resour…

    Submitted 1 March, 2023; originally announced March 2023.

  45. arXiv:2302.05927  [pdf, other]

    cs.LG cs.AI

    Transfer Learning for Bayesian Optimization: A Survey

    Authors: Tianyi Bai, Yang Li, Yu Shen, Xinyi Zhang, Wentao Zhang, Bin Cui

    Abstract: A wide spectrum of design and decision problems, including parameter tuning, A/B testing and drug design, intrinsically are instances of black-box optimization. Bayesian optimization (BO) is a powerful tool that models and optimizes such expensive "black-box" functions. However, at the beginning of optimization, vanilla Bayesian optimization methods often suffer from slow convergence issue due to…

    Submitted 12 February, 2023; originally announced February 2023.

  46. arXiv:2302.01493  [pdf]

    eess.IV cs.CV physics.med-ph

    Deep Learning (DL)-based Automatic Segmentation of the Internal Pudendal Artery (IPA) for Reduction of Erectile Dysfunction in Definitive Radiotherapy of Localized Prostate Cancer

    Authors: Anjali Balagopal, Michael Dohopolski, Young Suk Kwon, Steven Montalvo, Howard Morgan, Ti Bai, Dan Nguyen, Xiao Liang, Xinran Zhong, Mu-Han Lin, Neil Desai, Steve Jiang

    Abstract: Background and purpose: Radiation-induced erectile dysfunction (RiED) is commonly seen in prostate cancer patients. Clinical trials have been developed in multiple institutions to investigate whether dose-sparing to the internal-pudendal-arteries (IPA) will improve retention of sexual potency. The IPA is usually not considered a conventional organ-at-risk (OAR) due to segmentation difficulty. In t…

    Submitted 2 February, 2023; originally announced February 2023.

  47. AI Security for Geoscience and Remote Sensing: Challenges and Future Trends

    Authors: Yonghao Xu, Tao Bai, Weikang Yu, Shizhen Chang, Peter M. Atkinson, Pedram Ghamisi

    Abstract: Recent advances in artificial intelligence (AI) have significantly intensified research in the geoscience and remote sensing (RS) field. AI algorithms, especially deep learning-based ones, have been developed and applied widely to RS data analysis. The successful application of AI covers almost all aspects of Earth observation (EO) missions, from low-level vision tasks like super-resolution, denoi…

    Submitted 22 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Journal ref: IEEE Geoscience and Remote Sensing Magazine, Volume 11, Issue 2, Pages 60-85, 2023

  48. arXiv:2212.04726  [pdf, other]

    cs.DS

    Breaking the Barrier $2^k$ for Subset Feedback Vertex Set in Chordal Graphs

    Authors: Tian Bai, Mingyu Xiao

    Abstract: The Subset Feedback Vertex Set problem (SFVS), to delete $k$ vertices from a given graph such that any vertex in a vertex subset (called a terminal set) is not in a cycle in the remaining graph, generalizes the famous Feedback Vertex Set problem and Multiway Cut problem. SFVS remains NP-hard even in split and chordal graphs, and SFVS in Chordal Graphs (SFVS-C) can be considered as an implicit 3-Hi…

    Submitted 2 January, 2025; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: 33 pages, 9 figures. Full version

  49. Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation

    Authors: Zhen Tian, Ting Bai, Zibin Zhang, Zhiyuan Xu, Kangyi Lin, Ji-Rong Wen, Wayne Xin Zhao

    Abstract: With the growth of high-dimensional sparse data in web-scale recommender systems, the computational cost to learn high-order feature interaction in CTR prediction task largely increases, which limits the use of high-order interaction models in real industrial applications. Some recent knowledge distillation based methods transfer knowledge from complex teacher models to shallow student models for…

    Submitted 21 December, 2022; v1 submitted 20 November, 2022; originally announced November 2022.

  50. arXiv:2211.11144  [pdf]

    eess.IV cs.CV

    Coarse-Super-Resolution-Fine Network (CoSF-Net): A Unified End-to-End Neural Network for 4D-MRI with Simultaneous Motion Estimation and Super-Resolution

    Authors: Shaohua Zhi, Yinghui Wang, Haonan Xiao, Ti Bai, Hong Ge, Bing Li, Chenyang Liu, Wen Li, Tian Li, Jing Cai

    Abstract: Four-dimensional magnetic resonance imaging (4D-MRI) is an emerging technique for tumor motion management in image-guided radiation therapy (IGRT). However, current 4D-MRI suffers from low spatial resolution and strong motion artifacts owing to the long acquisition time and patients' respiratory variations; these limitations, if not managed properly, can adversely affect treatment planning and del…

    Submitted 20 November, 2022; originally announced November 2022.