
Showing 1–50 of 157 results for author: Cai, T

Searching in archive cs.
  1. arXiv:2505.06625  [pdf, ps, other]

    cs.AR cs.AI cs.OS

    CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs

    Authors: Tianhao Cai, Liang Wang, Limin Xiao, Meng Han, Zeyu Wang, Lin Sun, Xiaojian Liao

    Abstract: With the rapid development of DNN applications, multi-tenant execution, where multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although many methods are proposed in prior works to improve multi-tenant performance, the impact of shared cache is not well studied. This paper proposes CaMDN, an architecture-scheduling co-design to enhance cache efficiency for multi-tenant…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 7 pages, 9 figures. This paper has been accepted to the 2025 Design Automation Conference (DAC)

  2. arXiv:2503.20561  [pdf, other]

    cs.LG stat.ML

    A Theoretical Framework for Prompt Engineering: Approximating Smooth Functions with Transformer Prompts

    Authors: Ryumei Nakada, Wenlong Ji, Tianxi Cai, James Zou, Linjun Zhang

    Abstract: Prompt engineering has emerged as a powerful technique for guiding large language models (LLMs) toward desired responses, significantly enhancing their performance across diverse tasks. Beyond their role as static predictors, LLMs increasingly function as intelligent agents, capable of reasoning, decision-making, and adapting dynamically to complex environments. However, the theoretical underpinni…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: 55 pages, 2 figures

  3. arXiv:2503.14492  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

    Authors: NVIDIA: Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, et al. (16 additional authors not shown)

    Abstract: We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge. In the design, the spatial conditional scheme is adaptive and customizable. It allows weighting different conditional inputs differently at different spatial locations. This enables highly contro…

    Submitted 1 April, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

  4. arXiv:2503.11159  [pdf, other]

    cs.CV

    Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix

    Authors: Junbiao Pang, Tianyang Cai

    Abstract: Quantization-Aware Training (QAT) is one of the prevailing neural network compression solutions. However, its stability has been challenged for yielding deteriorating performances as the quantization error is inevitable. We find that the sharp landscape of loss, which leads to a dramatic performance drop, is an essential factor that causes instability. Theoretically, we have discovered that the pe…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 9 pages, 5 figures

  5. arXiv:2503.10034  [pdf, other]

    cs.CV cs.RO

    V2X-ReaLO: An Open Online Framework and Dataset for Cooperative Perception in Reality

    Authors: Hao Xiang, Zhaoliang Zheng, Xin Xia, Seth Z. Zhao, Letian Gao, Zewei Zhou, Tianhui Cai, Yun Zhang, Jiaqi Ma

    Abstract: Cooperative perception enabled by Vehicle-to-Everything (V2X) communication holds significant promise for enhancing the perception capabilities of autonomous vehicles, allowing them to overcome occlusions and extend their field of view. However, existing research predominantly relies on simulated environments or static datasets, leaving the feasibility and effectiveness of V2X cooperative percepti…

    Submitted 13 March, 2025; originally announced March 2025.

  6. arXiv:2503.08604  [pdf, other]

    cs.RO cs.AI

    EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments

    Authors: Dongping Li, Tielong Cai, Tianci Tang, Wenhao Chai, Katherine Rose Driggs-Campbell, Gaoang Wang

    Abstract: Developing autonomous home robots controlled by natural language has long been a pursuit of humanity. While advancements in large language models (LLMs) and embodied intelligence make this goal closer, several challenges persist: the lack of a unified benchmark for more complex robot tasks, limited evaluation methods and metrics, data incompatibility between LLMs and mobile manipulation trajectori…

    Submitted 14 May, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  7. arXiv:2503.05139  [pdf, other]

    cs.LG cs.AI cs.CL

    Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

    Authors: Ling Team, Binwei Zeng, Chao Huang, Chao Zhang, Changxin Tian, Cong Chen, Dingnan Jin, Feng Yu, Feng Zhu, Feng Yuan, Fakang Wang, Gangshan Wang, Guangyao Zhai, Haitao Zhang, Huizhong Li, Jun Zhou, Jia Liu, Junpeng Fang, Junjie Ou, Jun Hu, Ji Luo, Ji Zhang, Jian Liu, Jian Sha, Jianxue Qian, et al. (49 additional authors not shown)

    Abstract: In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as "Bailing" in Chinese, spelled Bǎilíng in Pinyin). Ling-Lite…

    Submitted 10 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: 34 pages

  8. arXiv:2503.01926  [pdf, other]

    cs.CL cs.AI

    Unnatural Languages Are Not Bugs but Features for LLMs

    Authors: Keyu Duan, Yiran Zhao, Zhili Feng, Jinjie Ni, Tianyu Pang, Qian Liu, Tianle Cai, Longxu Dou, Kenji Kawaguchi, Anirudh Goyal, J. Zico Kolter, Michael Qizhe Shieh

    Abstract: Large Language Models (LLMs) have been observed to process non-human-readable text sequences, such as jailbreak prompts, often viewed as a bug for aligned LLMs. In this work, we present a systematic investigation challenging this perception, demonstrating that unnatural languages - strings that appear incomprehensible to humans but maintain semantic meanings for LLMs - contain latent features usab…

    Submitted 2 March, 2025; originally announced March 2025.

  9. arXiv:2502.14281  [pdf, other]

    cs.LG cs.AI

    Correcting Noisy Multilabel Predictions: Modeling Label Noise through Latent Space Shifts

    Authors: Weipeng Huang, Qin Li, Yang Xiao, Cheng Qiao, Tie Cai, Junwei Liang, Neil J. Hurley, Guangyuan Piao

    Abstract: Noise in data appears to be inevitable in most real-world machine learning applications and would cause severe overfitting problems. Not only can data features contain noise, but labels are also prone to be noisy due to human input. In this paper, rather than noisy label learning in multiclass classifications, we instead focus on the less explored area of noisy label learning for multilabel classi…

    Submitted 7 May, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  10. arXiv:2502.08547  [pdf, other]

    cs.AI

    Representation Learning to Advance Multi-institutional Studies with Electronic Health Record Data

    Authors: Doudou Zhou, Han Tong, Linshanshan Wang, Suqi Liu, Xin Xiong, Ziming Gan, Romain Griffier, Boris Hejblum, Yun-Chung Liu, Chuan Hong, Clara-Lea Bonzel, Tianrun Cai, Kevin Pan, Yuk-Lam Ho, Lauren Costa, Vidul A. Panickan, J. Michael Gaziano, Kenneth Mandl, Vianney Jouhet, Rodolphe Thiebaut, Zongqi Xia, Kelly Cho, Katherine Liao, Tianxi Cai

    Abstract: The adoption of EHRs has expanded opportunities to leverage data-driven algorithms in clinical care and research. A major bottleneck in effectively conducting multi-institutional EHR studies is the data heterogeneity across systems with numerous codes that either do not exist or represent different clinical concepts across institutions. The need for data privacy further limits the feasibility of i…

    Submitted 12 February, 2025; originally announced February 2025.

  11. arXiv:2502.08105  [pdf, other]

    cs.LG

    Out-of-Distribution Detection on Graphs: A Survey

    Authors: Tingyi Cai, Yunliang Jiang, Yixin Liu, Ming Li, Changqin Huang, Shirui Pan

    Abstract: Graph machine learning has witnessed rapid growth, driving advancements across diverse domains. However, the in-distribution assumption, where training and testing data share the same distribution, often breaks in real-world scenarios, leading to degraded model performance under distribution shifts. This challenge has catalyzed interest in graph out-of-distribution (GOOD) detection, which focuses…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: 9 pages, 6 figures

  12. arXiv:2502.07064  [pdf, other]

    cs.LG cs.AI stat.ML

    Contextual Thompson Sampling via Generation of Missing Data

    Authors: Kelly W. Zhang, Tiffany Tianhui Cai, Hongseok Namkoong, Daniel Russo

    Abstract: We introduce a framework for Thompson sampling contextual bandit algorithms, in which the algorithm's ability to quantify uncertainty and make decisions depends on the quality of a generative model that is learned offline. Instead of viewing uncertainty in the environment as arising from unobservable latent parameters, our algorithm treats uncertainty as stemming from missing, but potentially obse…

    Submitted 10 February, 2025; originally announced February 2025.

  13. arXiv:2502.06453  [pdf, other]

    cs.LG cs.AI cs.CL

    MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations

    Authors: Kaixuan Huang, Jiacheng Guo, Zihao Li, Xiang Ji, Jiawei Ge, Wenzhe Li, Yingqing Guo, Tianle Cai, Hui Yuan, Runzhe Wang, Yue Wu, Ming Yin, Shange Tang, Yangsibo Huang, Chi Jin, Xinyun Chen, Chiyuan Zhang, Mengdi Wang

    Abstract: Large language models have demonstrated impressive performance on challenging mathematical reasoning tasks, which has triggered the discussion of whether the performance is achieved by true reasoning capability or memorization. To investigate this question, prior work has constructed mathematical benchmarks when questions undergo simple perturbations -- modifications that still preserve the underl…

    Submitted 12 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: v2: fix bugs in Fig. 1

  14. arXiv:2501.18934  [pdf, other]

    cs.CR

    Deep Learning Model Inversion Attacks and Defenses: A Comprehensive Survey

    Authors: Wencheng Yang, Song Wang, Di Wu, Taotao Cai, Yanming Zhu, Shicheng Wei, Yiying Zhang, Xu Yang, Zhaohui Tang, Yan Li

    Abstract: The rapid adoption of deep learning in sensitive domains has brought tremendous benefits. However, this widespread adoption has also given rise to serious vulnerabilities, particularly model inversion (MI) attacks, posing a significant threat to the privacy and integrity of personal data. The increasing prevalence of these attacks in applications such as biometrics, healthcare, and finance has cre…

    Submitted 30 April, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: 2 figures, 56 pages

  15. arXiv:2501.18435  [pdf]

    cs.CL

    GENIE: Generative Note Information Extraction model for structuring EHR data

    Authors: Huaiyuan Ying, Hongyi Yuan, Jinsen Lu, Zitian Qu, Yang Zhao, Zhengyun Zhao, Isaac Kohane, Tianxi Cai, Sheng Yu

    Abstract: Electronic Health Records (EHRs) hold immense potential for advancing healthcare, offering rich, longitudinal data that combines structured information with valuable insights from unstructured clinical notes. However, the unstructured nature of clinical text poses significant challenges for secondary applications. Traditional methods for structuring EHR free-text data, such as rule-based systems a…

    Submitted 30 January, 2025; originally announced January 2025.

  16. arXiv:2501.03575  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Cosmos World Foundation Model Platform for Physical AI

    Authors: NVIDIA: Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, et al. (54 additional authors not shown)

    Abstract: Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into cu…

    Submitted 18 March, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

  17. arXiv:2412.18992  [pdf, other]

    math.ST cs.LG

    Optimal Federated Learning for Functional Mean Estimation under Heterogeneous Privacy Constraints

    Authors: Tony Cai, Abhinav Chakraborty, Lasse Vuursteen

    Abstract: Federated learning (FL) is a distributed machine learning technique designed to preserve data privacy and security, and it has gained significant importance due to its broad range of applications. This paper addresses the problem of optimal functional mean estimation from discretely sampled data in a federated setting. We consider a heterogeneous framework where the number of individuals, measur…

    Submitted 15 January, 2025; v1 submitted 25 December, 2024; originally announced December 2024.

    Comments: 54 pages: 25 page article and 29 pages of appendix

    MSC Class: 62G08; 62C20; 68P27; 62F30

  18. arXiv:2412.01812  [pdf, other]

    cs.CV

    V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction

    Authors: Zewei Zhou, Hao Xiang, Zhaoliang Zheng, Seth Z. Zhao, Mingyue Lei, Yun Zhang, Tianhui Cai, Xinyi Liu, Johnson Liu, Maheswari Bajji, Xin Xia, Zhiyu Huang, Bolei Zhou, Jiaqi Ma

    Abstract: Vehicle-to-everything (V2X) technologies offer a promising paradigm to mitigate the limitations of constrained observability in single-vehicle systems. Prior work primarily focuses on single-frame cooperative perception, which fuses agents' information across different spatial locations but ignores temporal cues and temporal tasks (e.g., temporal perception and prediction). In this paper, we focus…

    Submitted 13 March, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: Website link: https://mobility-lab.seas.ucla.edu/v2xpnp/

  19. arXiv:2411.15660  [pdf, other]

    math.ST cs.IT stat.ML

    Federated PCA and Estimation for Spiked Covariance Matrices: Optimal Rates and Efficient Algorithm

    Authors: Jingyang Li, T. Tony Cai, Dong Xia, Anru R. Zhang

    Abstract: Federated Learning (FL) has gained significant recent attention in machine learning for its enhanced privacy and data security, making it indispensable in fields such as healthcare, finance, and personalized services. This paper investigates federated PCA and estimation for spiked covariance matrices under distributed differential privacy constraints. We establish minimax rates of convergence, wit…

    Submitted 23 November, 2024; originally announced November 2024.

  20. arXiv:2411.07579  [pdf, other]

    cs.CV cs.GR

    Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation

    Authors: Han Qi, Tao Cai, Xiyue Han

    Abstract: Recently, 3D Gaussian Splatting has dominated novel-view synthesis with its real-time rendering speed and state-of-the-art rendering quality. However, during the rendering process, the use of the Jacobian of the affine approximation of the projection transformation leads to inevitable errors, resulting in blurriness, artifacts and a lack of scene consistency in the final rendered images. To addres…

    Submitted 14 November, 2024; v1 submitted 12 November, 2024; originally announced November 2024.

  21. arXiv:2411.07126  [pdf, other]

    cs.CV cs.LG

    Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

    Authors: NVIDIA: Yuval Atzmon, Maciej Bala, Yogesh Balaji, Tiffany Cai, Yin Cui, Jiaojiao Fan, Yunhao Ge, Siddharth Gururani, Jacob Huffman, Ronald Isaac, Pooya Jannaty, Tero Karras, Grace Lam, J. P. Lewis, Aaron Licata, Yen-Chen Lin, Ming-Yu Liu, Qianli Ma, Arun Mallya, Ashlee Martino-Tarr, Doug Mendez, Seungjun Nah, Chris Pruett, et al. (7 additional authors not shown)

    Abstract: We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. Edify Image supports a wide range of applications, including text-to-i…

    Submitted 11 November, 2024; originally announced November 2024.

  22. arXiv:2411.05007  [pdf, other]

    cs.CV cs.LG

    SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

    Authors: Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, Song Han

    Abstract: Diffusion models can effectively generate high-quality images. However, as they scale, rising memory demands and higher latency pose substantial deployment challenges. In this work, we aim to accelerate diffusion models by quantizing their weights and activations to 4 bits. At such an aggressive level, both weights and activations are highly sensitive, where existing post-training quantization met…

    Submitted 3 March, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: ICLR 2025 Spotlight; Quantization Library: https://github.com/mit-han-lab/deepcompressor; Inference Engine: https://github.com/mit-han-lab/nunchaku; Website: https://hanlab.mit.edu/projects/svdquant; Demo: https://svdquant.mit.edu; Blog: https://hanlab.mit.edu/blog/svdquant

  23. arXiv:2410.10144  [pdf, other]

    cs.LG cs.AI cs.CL stat.AP

    Unified Representation of Genomic and Biomedical Concepts through Multi-Task, Multi-Source Contrastive Learning

    Authors: Hongyi Yuan, Suqi Liu, Kelly Cho, Katherine Liao, Alexandre Pereira, Tianxi Cai

    Abstract: We introduce GENomic Encoding REpresentation with Language Model (GENEREL), a framework designed to bridge genetic and biomedical knowledge bases. What sets GENEREL apart is its ability to fine-tune language models to infuse biological knowledge behind clinical concepts such as diseases and medications. This fine-tuning enables the model to capture complex biomedical relationships more effectively…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 15 pages, 2 figures, 5 tables

  24. arXiv:2410.07454  [pdf, other]

    stat.ME cs.LG math.ST

    Representation-Enhanced Neural Knowledge Integration with Application to Large-Scale Medical Ontology Learning

    Authors: Suqi Liu, Tianxi Cai, Xiaoou Li

    Abstract: A large-scale knowledge graph enhances reproducibility in biomedical data discovery by providing a standardized, integrated framework that ensures consistent interpretation across diverse datasets. It improves generalizability by connecting data from various sources, enabling broader applicability of findings across different populations and conditions. Generating reliable knowledge graph, leverag…

    Submitted 9 October, 2024; originally announced October 2024.

  25. arXiv:2410.04759  [pdf, other]

    cs.AI

    Driving with Regulation: Interpretable Decision-Making for Autonomous Vehicles with Retrieval-Augmented Reasoning via LLM

    Authors: Tianhui Cai, Yifan Liu, Zewei Zhou, Haoxuan Ma, Seth Z. Zhao, Zhiwen Wu, Jiaqi Ma

    Abstract: This work presents an interpretable decision-making framework for autonomous vehicles that integrates traffic regulations, norms, and safety guidelines comprehensively and enables seamless adaptation to different regions. While traditional rule-based methods struggle to incorporate the full scope of traffic rules, we develop a Traffic Regulation Retrieval (TRR) Agent based on Retrieval-Augmented G…

    Submitted 13 March, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

  26. arXiv:2409.13758  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Optimizing the Songwriting Process: Genre-Based Lyric Generation Using Deep Learning Models

    Authors: Tracy Cai, Wilson Liang, Donte Townes

    Abstract: The traditional songwriting process is rather complex and this is evident in the time it takes to produce lyrics that fit the genre and form comprehensive verses. Our project aims to simplify this process with deep learning techniques, thus optimizing the songwriting process and enabling an artist to hit their target audience by staying in genre. Using a dataset of 18,000 songs off Spotify, we dev…

    Submitted 15 September, 2024; originally announced September 2024.

  27. arXiv:2409.10783  [pdf, other]

    cs.CL

    Predicting Punctuation in Ancient Chinese Texts: A Multi-Layered LSTM and Attention-Based Approach

    Authors: Tracy Cai, Kimmy Chang, Fahad Nabi

    Abstract: It was only until the 20th century when the Chinese language began using punctuation. In fact, many ancient Chinese texts contain thousands of lines with no distinct punctuation marks or delimiters in sight. The lack of punctuation in such texts makes it difficult for humans to identify when there pauses or breaks between particular phrases and understand the semantic meaning of the written text (…

    Submitted 16 September, 2024; originally announced September 2024.

  28. arXiv:2409.08396  [pdf, other]

    stat.ML cs.LG stat.AP

    Federated One-Shot Ensemble Clustering

    Authors: Rui Duan, Xin Xiong, Jueyi Liu, Katherine P. Liao, Tianxi Cai

    Abstract: Cluster analysis across multiple institutions poses significant challenges due to data-sharing restrictions. To overcome these limitations, we introduce the Federated One-shot Ensemble Clustering (FONT) algorithm, a novel solution tailored for multi-site analyses under such constraints. FONT requires only a single round of communication between sites and ensures privacy by exchanging only fitted m…

    Submitted 12 September, 2024; originally announced September 2024.

  29. arXiv:2408.14690  [pdf, other]

    cs.CL cs.AI

    Training-Free Activation Sparsity in Large Language Models

    Authors: James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun

    Abstract: Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory-movement required for matrix multiplications during the forward pass. However, existing methods face limitations that inhibit widespread adoption. Some approaches are tailored towards older models with ReLU-based sparsity, while others require extensive continued pre-train…

    Submitted 25 February, 2025; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: Rev. 2: ICLR 2025 Acceptance (Spotlight)

  30. arXiv:2408.01800  [pdf, other]

    cs.CV

    MiniCPM-V: A GPT-4V Level MLLM on Your Phone

    Authors: Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of par…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: preprint

  31. arXiv:2407.20228  [pdf, other]

    cs.CV

    FlexAttention for Efficient High-Resolution Vision-Language Models

    Authors: Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan

    Abstract: Current high-resolution vision-language models encode images as high-resolution image tokens and exhaustively take all these tokens to compute attention, which significantly increases the computational cost. To address this problem, we propose FlexAttention, a flexible attention mechanism for efficient high-resolution vision-language models. Specifically, a high-resolution image is encoded both as…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  32. Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese

    Authors: Yunqi Xu, Tianchi Cai, Jiyan Jiang, Xierui Song

    Abstract: The prevailing issue of factual inconsistency errors in conventional Retrieval Augmented Generation (RAG) motivates the study of Factual Consistency Evaluation (FCE). Despite the various FCE methods proposed earlier, these methods are evaluated on datasets generated by specific Large Language Models (LLMs). Without a comprehensive benchmark, it remains unexplored how these FCE methods perform on o…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Journal ref: KDD 2024 (oral)

  33. FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering

    Authors: Tianchi Cai, Zhiwen Tan, Xierui Song, Tao Sun, Jiyan Jiang, Yunqi Xu, Yinger Zhang, Jinjie Gu

    Abstract: Retrieval Augmented Generation (RAG) has become prevalent in question-answering (QA) tasks due to its ability of utilizing search engine to enhance the quality of long-form question-answering (LFQA). Despite the emergence of various open source methods and web-enhanced commercial systems such as Bing Chat, two critical problems remain unsolved, i.e., the lack of factuality and clear logic in the g…

    Submitted 19 June, 2024; originally announced June 2024.

    Report number: 30th

    Journal ref: KDD 2024

  34. arXiv:2406.06755  [pdf, other]

    math.ST cs.LG stat.ML

    Optimal Federated Learning for Nonparametric Regression with Heterogeneous Distributed Differential Privacy Constraints

    Authors: T. Tony Cai, Abhinav Chakraborty, Lasse Vuursteen

    Abstract: This paper studies federated learning for nonparametric regression in the context of distributed samples across different servers, each adhering to distinct differential privacy constraints. The setting we consider is heterogeneous, encompassing both varying sample sizes and differential privacy constraints across servers. Within this framework, both global and pointwise estimation are considered,…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 49 pages total, consisting of an article (24 pages) and a supplement (25 pages)

    MSC Class: 62G08; 62C20; 68P27; 62F30

  35. arXiv:2406.06749  [pdf, other]

    math.ST cs.LG stat.ML

    Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests

    Authors: T. Tony Cai, Abhinav Chakraborty, Lasse Vuursteen

    Abstract: Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations. In this paper, we study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints. We first establish matching lower and upper bound…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 77 pages total; consisting of a main article (28 pages) and supplement (49 pages)

    MSC Class: 62G10; 62C20; 68P27; 62F30

  36. arXiv:2405.19466  [pdf, other]

    cs.LG stat.ML

    Active Exploration via Autoregressive Generation of Missing Data

    Authors: Tiffany Tianhui Cai, Hongseok Namkoong, Daniel Russo, Kelly W Zhang

    Abstract: We pose uncertainty quantification and exploration in online decision-making as a problem of training and generation from an autoregressive sequence model, an area experiencing rapid innovation. Our approach rests on viewing uncertainty as arising from missing future outcomes that would be revealed through appropriate action choices, rather than from unobservable latent parameters of the environme…

    Submitted 5 February, 2025; v1 submitted 29 May, 2024; originally announced May 2024.

  37. arXiv:2405.18971  [pdf, other]

    cs.IR

    Mitigate Position Bias with Coupled Ranking Bias on CTR Prediction

    Authors: Yao Zhao, Zhining Liu, Tianchi Cai, Haipeng Zhang, Chenyi Zhuang, Jinjie Gu

    Abstract: Position bias, i.e., users' preference of an item is affected by its placing position, is well studied in the recommender system literature. However, most existing methods ignore the widely coupled ranking bias, which is also related to the placing position of the item. Using both synthetic and industrial datasets, we first show how this widely coexisted ranking bias deteriorates the performance o…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures

  38. arXiv:2405.16042  [pdf, other]

    cs.CL

    Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention

    Authors: Andrew Li, Xianle Feng, Siddhant Narang, Austin Peng, Tianle Cai, Raj Sanjay Shah, Sashank Varma

    Abstract: When reading temporarily ambiguous garden-path sentences, misinterpretations sometimes linger past the point of disambiguation. This phenomenon has traditionally been studied in psycholinguistic experiments using online measures such as reading times and offline measures such as comprehension questions. Here, we investigate the processing of garden-path sentences and the fate of lingering misinter…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by CogSci-24

  39. arXiv:2405.09493  [pdf, other]

    stat.ML cs.LG

    C-Learner: Constrained Learning for Causal Inference and Semiparametric Statistics

    Authors: Tiffany Tianhui Cai, Yuri Fonseca, Kaiwen Hou, Hongseok Namkoong

    Abstract: Popular debiased causal estimation methods, e.g. for the average treatment effect -- such as one-step estimation (e.g., augmented inverse propensity weighting) and targeted maximum likelihood estimation -- enjoy desirable asymptotic properties such as statistical efficiency and double robustness. However, they often produce unstable estimates when there is limited overlap between treatment and con…

    Submitted 14 October, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  40. arXiv:2405.06107  [pdf, other]

    cs.LG cs.SC hep-ph hep-th stat.ML

    Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory

    Authors: Tianji Cai, Garrett W. Merz, François Charton, Niklas Nolte, Matthias Wilhelm, Kyle Cranmer, Lance J. Dixon

    Abstract: We pursue the use of deep learning methods to improve state-of-the-art computations in theoretical high-energy physics. Planar N = 4 Super Yang-Mills theory is a close cousin to the theory that describes Higgs boson production at the Large Hadron Collider; its scattering amplitudes are large mathematical expressions containing integer coefficients. In this paper, we apply Transformers to predict t…

    Submitted 19 September, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: 26+10 pages, 9 figures, 7 tables, application of machine learning aimed at physics and machine learning audience; v2: clarifications added, matches published version

    Report number: SLAC-PUB-17774

    Journal ref: Mach.Learn.Sci.Tech. 5 (2024) 3, 035073

  41. arXiv:2404.14469  [pdf, other]

    cs.CL cs.AI

    SnapKV: LLM Knows What You are Looking for Before Generation

    Authors: Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen

    Abstract: Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance. However, the growth of the KV cache in response to increasing input length poses challenges to memory and time efficiency. To address this problem, this paper introduces SnapKV, an innovative and fine-tuning-free approach th…

    Submitted 16 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  42. arXiv:2404.07413  [pdf, other]

    cs.CL cs.AI

    JetMoE: Reaching Llama2 Performance with 0.1M Dollars

    Authors: Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin

    Abstract: Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence. This report introduces JetMoE-8B, a new LLM trained with less than $0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours. Despite its low cost, the JetMoE…

    Submitted 10 April, 2024; originally announced April 2024.

  43. arXiv:2404.06676  [pdf]

    cs.LG eess.SP stat.AP

    Topological Feature Search Method for Multichannel EEG: Application in ADHD classification

    Authors: Tianming Cai, Guoying Zhao, Junbin Zang, Chen Zong, Zhidong Zhang, Chenyang Xue

    Abstract: In recent years, the preliminary diagnosis of ADHD using EEG has attracted attention from researchers. EEG, known for its expediency and efficiency, plays a pivotal role in the diagnosis and treatment of ADHD. However, the non-stationarity of EEG signals and inter-subject variability pose challenges to the diagnostic and classification processes. Topological Data Analysis offers a novel perspe…

    Submitted 4 November, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  44. arXiv:2403.15484  [pdf, other]

    cs.CL cs.LG

    RakutenAI-7B: Extending Large Language Models for Japanese

    Authors: Rakuten Group, Aaron Levine, Connie Huang, Chenguang Wang, Eduardo Batista, Ewa Szymanska, Hongyi Ding, Hou Wei Chou, Jean-François Pessiot, Johanes Effendi, Justin Chiu, Kai Torben Ohlhus, Karan Chopra, Keiji Shinzato, Koji Murakami, Lee Xiong, Lei Chen, Maki Kubota, Maksim Tkachenko, Miroku Lee, Naoki Takahashi, Prathyusha Jwalapuram, Ryutaro Tatsushima, Saurabh Jain, Sunil Kumar Yadav, et al. (5 additional authors not shown)

    Abstract: We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.

    Submitted 21 March, 2024; originally announced March 2024.

  45. arXiv:2403.14926  [pdf, other]

    stat.ML cs.LG

    Contrastive Learning on Multimodal Analysis of Electronic Health Records

    Authors: Tianxi Cai, Feiqing Huang, Ryumei Nakada, Linjun Zhang, Doudou Zhou

    Abstract: Electronic health record (EHR) systems contain a wealth of multimodal clinical data including structured data like clinical codes and unstructured data such as clinical notes. However, many existing EHR-focused studies have traditionally either concentrated on an individual modality or merged different modalities in a rather rudimentary fashion. This approach often results in the perception of stru…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 34 pages

  46. arXiv:2403.10006  [pdf, other]

    cs.CY cs.HC cs.LG cs.SI

    Graph Enhanced Reinforcement Learning for Effective Group Formation in Collaborative Problem Solving

    Authors: Zheng Fang, Fucai Ke, Jae Young Han, Zhijie Feng, Toby Cai

    Abstract: This study addresses the challenge of forming effective groups in collaborative problem-solving environments. Recognizing the complexity of human interactions and the necessity for efficient collaboration, we propose a novel approach leveraging graph theory and reinforcement learning. Our methodology involves constructing a graph from a dataset where nodes represent participants, and edges signify…

    Submitted 15 March, 2024; originally announced March 2024.

  47. arXiv:2403.01251  [pdf, other]

    cs.CL

    Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling

    Authors: Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh

    Abstract: Safety of Large Language Models (LLMs) has become a critical issue given their rapid progress. Greedy Coordinate Gradient (GCG) is shown to be effective in constructing adversarial prompts to break the aligned LLMs, but optimization of GCG is time-consuming. To reduce the time cost of GCG and enable more comprehensive studies of LLM safety, in this work, we study a new algorithm called…

    Submitted 8 November, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  48. arXiv:2402.19481  [pdf, other]

    cs.CV

    DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

    Authors: Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

    Abstract: Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method split…

    Submitted 14 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR 2024 Highlight Code: https://github.com/mit-han-lab/distrifuser Website: https://hanlab.mit.edu/projects/distrifusion Blog: https://hanlab.mit.edu/blog/distrifusion

  49. arXiv:2402.17437  [pdf, other]

    cs.CL cs.AI

    Exploiting Emotion-Semantic Correlations for Empathetic Response Generation

    Authors: Zhou Yang, Zhaochun Ren, Yufeng Wang, Xiaofei Zhu, Zhihao Chen, Tiecheng Cai, Yunbing Wu, Yisong Su, Sibo Ju, Xiangwen Liao

    Abstract: Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue. Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions. However, linguistic research has shown that emotional words in language are dynamic and have correlations with…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 3 figures, Findings of EMNLP 2023

  50. arXiv:2402.13497  [pdf, other]

    cs.CV

    In-Distribution Consistency Regularization Improves the Generalization of Quantization-Aware Training

    Authors: Junbiao Pang, Tianyang Cai, Baochang Zhang, Jiaqi Wu

    Abstract: Although existing Quantization-Aware Training (QAT) methods intensively depend on knowledge distillation to guarantee performance, QAT still suffers from severe performance drop. The experiments have shown that vanilla quantization is sensitive to the perturbation from both the input and weights. Therefore, we assume that the generalization ability of QAT is predominantly caused by both the intrin…

    Submitted 12 January, 2025; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 10 pages, 8 figures, 10 tables