
Showing 1–50 of 456 results for author: Zou, J

Searching in archive cs.
  1. arXiv:2507.03779  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed

    Authors: Jiaqi Zhang, Juntuo Wang, Zhixin Sun, John Zou, Randall Balestriero

    Abstract: Large-scale vision foundation models such as DINOv2 boast impressive performances by leveraging massive architectures and training datasets. But numerous scenarios require practitioners to reproduce those pre-training solutions, such as on private data, new modalities, or simply for scientific questioning--which is currently extremely demanding computation-wise. We thus propose a novel pre-trainin…

    Submitted 4 July, 2025; originally announced July 2025.

  2. arXiv:2507.03041  [pdf, ps, other]

    cs.LG cs.AI

    Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

    Authors: Shirley Wu, Parth Sarthi, Shiyu Zhao, Aaron Lee, Herumb Shandilya, Adrian Mladenic Grobelnik, Nurendra Choudhary, Eddie Huang, Karthik Subbian, Linjun Zhang, Diyi Yang, James Zou, Jure Leskovec

    Abstract: Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and mode…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 20 pages

  3. arXiv:2507.02173  [pdf, ps, other]

    cs.AI

    Data Diversification Methods In Alignment Enhance Math Performance In LLMs

    Authors: Berkan Dokmeci, Qingyang Wu, Ben Athiwaratkun, Ce Zhang, Shuaiwen Leon Song, James Zou

    Abstract: While recent advances in preference learning have enhanced alignment in human feedback, mathematical reasoning remains a persistent challenge. We investigate how data diversification strategies in preference optimization can improve the mathematical reasoning abilities of large language models (LLMs). We evaluate three common data generation methods: temperature sampling, Chain-of-Thought promptin…

    Submitted 2 July, 2025; originally announced July 2025.

  4. arXiv:2507.02085  [pdf, ps, other]

    cs.LG cs.AI

    GeoAda: Efficiently Finetune Geometric Diffusion Models with Equivariant Adapters

    Authors: Wanjia Zhao, Jiaqi Han, Siyi Gu, Mingjian Jiang, James Zou, Stefano Ermon

    Abstract: Geometric diffusion models have shown remarkable success in molecular dynamics and structure generation. However, efficiently fine-tuning them for downstream tasks with varying geometric controls remains underexplored. In this work, we propose an SE(3)-equivariant adapter framework (GeoAda) that enables flexible and parameter-efficient fine-tuning for controlled generative tasks without modifying…

    Submitted 2 July, 2025; originally announced July 2025.

  5. arXiv:2506.18896  [pdf, ps, other]

    cs.CL

    ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

    Authors: Jiaru Zou, Ling Yang, Jingwen Gu, Jiahao Qiu, Ke Shen, Jingrui He, Mengdi Wang

    Abstract: Process Reward Models (PRMs) have recently emerged as a powerful framework for supervising intermediate reasoning steps in large language models (LLMs). Previous PRMs are primarily trained on model final output responses and struggle to evaluate intermediate thinking trajectories robustly, especially in the emerging setting of trajectory-response outputs generated by frontier reasoning models like…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Codes and Models: https://github.com/Gen-Verse/ReasonFlux

  6. arXiv:2506.16411  [pdf, ps, other]

    cs.CL cs.LG

    When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework

    Authors: Zhen Xu, Shang Zhu, Jue Wang, Junlin Wang, Ben Athiwaratkun, Chi Wang, James Zou, Ce Zhang

    Abstract: We investigate the challenge of applying Large Language Models (LLMs) to long texts. We propose a theoretical framework that distinguishes the failure modes of long context tasks into three categories: cross-chunk dependence (task noise), confusion that grows with context size (model noise), and the imperfect integration of partial results (aggregator noise). Under this view, we analyze when it is…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: under review

  7. arXiv:2506.16029  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    EvoLM: In Search of Lost Language Model Training Dynamics

    Authors: Zhenting Qi, Fan Nie, Alexandre Alahi, James Zou, Himabindu Lakkaraju, Yilun Du, Eric Xing, Sham Kakade, Hanlin Zhang

    Abstract: Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, a model suite that enables systematic and transparent analysis of LMs' training dynamics across pre-training, continued pre-training, supervised fine-tuning, and reinforcement learning. By training ov…

    Submitted 19 June, 2025; originally announced June 2025.

  8. arXiv:2506.15882  [pdf, ps, other]

    cs.LG cs.AI cs.CL eess.SP

    Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute

    Authors: Sheng Liu, Tianlang Chen, Pan Lu, Haotian Ye, Yizheng Chen, Lei Xing, James Zou

    Abstract: Test-time compute has emerged as a powerful paradigm for improving the performance of large language models (LLMs), where generating multiple outputs or refining individual chains can significantly boost answer accuracy. However, existing methods like Best-of-N, majority voting, and self-reflection typically apply reasoning in a uniform way across inputs, overlooking the fact that different proble…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 18 pages, 5 figures, Project website: https://shengliu66.github.io/fractreason/

  9. arXiv:2506.09416  [pdf, ps, other]

    cs.CV

    Noise Conditional Variational Score Distillation

    Authors: Xinyu Peng, Ziyang Zheng, Yaoming Wang, Han Li, Nuowen Kan, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: We propose Noise Conditional Variational Score Distillation (NCVSD), a novel method for distilling pretrained diffusion models into generative denoisers. We achieve this by revealing that the unconditional score function implicitly characterizes the score function of denoising posterior distributions. By integrating this insight into the Variational Score Distillation (VSD) framework, we enable sc…

    Submitted 11 June, 2025; originally announced June 2025.

  10. arXiv:2506.09113  [pdf, ps, other]

    cs.CV

    Seedance 1.0: Exploring the Boundaries of Video Generation Models

    Authors: Yu Gao, Haoyuan Guo, Tuyen Hoang, Weilin Huang, Lu Jiang, Fangyuan Kong, Huixia Li, Jiashi Li, Liang Li, Xiaojie Li, Xunsong Li, Yifu Li, Shanchuan Lin, Zhijie Lin, Jiawei Liu, Shu Liu, Xiaonan Nie, Zhiwu Qing, Yuxi Ren, Li Sun, Zhi Tian, Rui Wang, Sen Wang, Guoqiang Wei, Guohong Wu, et al. (19 additional authors not shown)

    Abstract: Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundational models still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core tec…

    Submitted 28 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: Seedance 1.0 Technical Report

  11. arXiv:2506.07927  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Solving Inequality Proofs with Large Language Models

    Authors: Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou, Pan Lu

    Abstract: Inequality proving, crucial across diverse scientific and mathematical fields, tests advanced reasoning skills such as discovering tight bounds and strategic theorem application. This makes it a distinct, demanding frontier for large language models (LLMs), offering insights beyond general mathematical problem-solving. Progress in this area is hampered by existing datasets that are often scarce, s…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 52 pages, 16 figures

  12. arXiv:2506.06266  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Cartridges: Lightweight and general-purpose long context representations via self-study

    Authors: Sabri Eyuboglu, Ryan Ehrlich, Simran Arora, Neel Guha, Dylan Zinsley, Emily Liu, Will Tennien, Atri Rudra, James Zou, Azalia Mirhoseini, Christopher Re

    Abstract: Large language models are often used to answer queries grounded in large text corpora (e.g. codebases, legal documents, or chat histories) by placing the entire corpus in the context window and leveraging in-context learning (ICL). Although current models support contexts of 100K-1M tokens, this setup is costly to serve because the memory consumption of the KV cache scales with input length. We ex…

    Submitted 13 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

  13. arXiv:2506.04458  [pdf, ps, other]

    cs.CL

    Zero-Shot Open-Schema Entity Structure Discovery

    Authors: Xueqiang Xu, Jinfeng Xiao, James Barry, Mohab Elkaref, Jiaru Zou, Pengcheng Jiang, Yunyi Zhang, Max Giammona, Geeth de Mel, Jiawei Han

    Abstract: Entity structure extraction, which aims to extract entities and their associated attribute-value structures from text, is an essential task for text understanding and knowledge graph construction. Existing methods based on large language models (LLMs) typically rely heavily on predefined entity attribute schemas or annotated datasets, often leading to incomplete extraction results. To address thes…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 14 pages, 3 figures

  14. arXiv:2506.02260  [pdf, ps, other]

    stat.ML cs.LG stat.AP

    MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements

    Authors: Howon Ryu, Yuliang Chen, Yacun Wang, Andrea Z. LaCroix, Chongzhi Di, Loki Natarajan, Yu Wang, Jingjing Zou

    Abstract: The growing prevalence of digital health technologies has led to the generation of complex multi-modal data, such as physical activity measurements simultaneously collected from various sensors of mobile and wearable devices. These data hold immense potential for advancing health studies, but current methods predominantly rely on supervised learning, requiring extensive labeled datasets that are o…

    Submitted 2 June, 2025; originally announced June 2025.

  15. arXiv:2506.02126  [pdf, ps, other]

    cs.CL

    Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains

    Authors: Juncheng Wu, Sheng Liu, Haoqin Tu, Hang Yu, Xiaoke Huang, James Zou, Cihang Xie, Yuyin Zhou

    Abstract: Recent advances in reasoning-enhanced Large Language Models such as OpenAI-o1/3 and DeepSeek-R1 have significantly improved performance on complex tasks. However, the quality and transparency of their internal reasoning processes remain underexplored. This work moves beyond the final-answer accuracy and investigates step-by-step reasoning in the medical and mathematical domains by explicitly decom…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 17 pages, preprint

  16. arXiv:2505.23614  [pdf, ps, other]

    cs.LG stat.ML

    Inference-time Scaling of Diffusion Models through Classical Search

    Authors: Xiangcheng Zhang, Haowei Lin, Haotian Ye, James Zou, Jianzhu Ma, Yitao Liang, Yilun Du

    Abstract: Classical search algorithms have long underpinned modern artificial intelligence. In this work, we tackle the challenge of inference-time control in diffusion models -- adapting generated outputs to meet diverse test-time objectives -- using principles from classical search. We propose a general framework that orchestrates local and global search to efficiently navigate the generative space. It em…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Website at https://diffusion-inference-scaling.github.io/

  17. arXiv:2505.21523  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

    Authors: Chengzhi Liu, Zhongxing Xu, Qingyue Wei, Juncheng Wu, James Zou, Xin Eric Wang, Yuyin Zhou, Sheng Liu

    Abstract: Test-time compute has empowered multimodal large language models to generate extended reasoning chains, yielding strong performance on tasks such as multimodal math reasoning. However, this improved reasoning ability often comes with increased hallucination: as generations become longer, models tend to drift away from image-grounded content and rely more heavily on language priors. Attention analy…

    Submitted 20 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  18. arXiv:2505.20906  [pdf]

    cs.RO

    HS-SLAM: A Fast and Hybrid Strategy-Based SLAM Approach for Low-Speed Autonomous Driving

    Authors: Bingxiang Kang, Jie Zou, Guofa Li, Pengwei Zhang, Jie Zeng, Kan Wang, Jie Li

    Abstract: Visual-inertial simultaneous localization and mapping (SLAM) is a key module of robotics and low-speed autonomous vehicles, which is usually limited by the high computation burden for practical applications. To this end, an innovative strategy-based hybrid framework HS-SLAM is proposed to integrate the advantages of direct and feature-based methods for fast computation without decreasing the perfo…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 11 pages, 7 figures

    ACM Class: I.2.9

  19. arXiv:2505.19281  [pdf, other]

    cs.LG

    A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning

    Authors: Yuzheng Hu, Fan Wu, Haotian Ye, David Forsyth, James Zou, Nan Jiang, Jiaqi W. Ma, Han Zhao

    Abstract: Online reinforcement learning (RL) excels in complex, safety-critical domains, yet it faces challenges such as sample inefficiency, training instability, and a lack of interpretability. Data attribution offers a principled way to trace model behavior back to individual training samples. However, in online RL, each training sample not only drives policy updates but also influences future data colle…

    Submitted 25 May, 2025; originally announced May 2025.

  20. arXiv:2505.18996  [pdf, ps, other]

    cs.LG stat.ML

    Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs

    Authors: Bob Junyi Zou, Lu Tian

    Abstract: Hybrid neural ordinary differential equations (neural ODEs) integrate mechanistic models with neural ODEs, offering strong inductive bias and flexibility, and are particularly advantageous in data-scarce healthcare settings. However, excessive latent states and interactions from mechanistic models can lead to training inefficiency and over-fitting, limiting practical effectiveness of hybrid neural…

    Submitted 25 May, 2025; originally announced May 2025.

  21. arXiv:2505.18524  [pdf, ps, other]

    cs.CL

    metaTextGrad: Automatically optimizing language model optimizers

    Authors: Guowei Xu, Mert Yuksekgonul, Carlos Guestrin, James Zou

    Abstract: Large language models (LLMs) are increasingly used in learning algorithms, evaluations, and optimization tasks. Recent studies have shown that using LLM-based optimizers to automatically optimize model prompts, demonstrations, predictions themselves, or other components can significantly enhance the performance of AI systems, as demonstrated by frameworks such as DSPy and TextGrad. However, optimi…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 21 pages, 2 figures

  22. arXiv:2505.17951  [pdf, ps, other]

    cs.CV

    SplatCo: Structure-View Collaborative Gaussian Splatting for Detail-Preserving Rendering of Large-Scale Unbounded Scenes

    Authors: Haihong Xiao, Jianan Zou, Yuxin Zhou, Ying He, Wenxiong Kang

    Abstract: We present SplatCo, a structure-view collaborative Gaussian splatting framework for high-fidelity rendering of complex outdoor environments. SplatCo builds upon two novel components: (1) a cross-structure collaboration module that combines global tri-plane representations, which capture coarse scene layouts, with local context grid features that represent fine surface details. This fusion is achie…

    Submitted 23 May, 2025; originally announced May 2025.

  23. arXiv:2505.16609  [pdf]

    cs.RO eess.SP

    Monitoring Electrostatic Adhesion Forces via Acoustic Pressure

    Authors: Huacen Wang, Jiarui Zou, Zeju Zheng, Hongqiang Wang

    Abstract: Electrostatic adhesion (EA) is widely used in mobile robotics, haptics, and robotic end effectors for its adaptability to diverse substrates and low energy consumption. Force sensing is important for feedback control, interaction, and monitoring in the EA system. However, EA force monitoring often relies on bulky and expensive sensors, increasing the complexity and weight of the entire system. This pap…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 6 pages, 7 figures

  24. arXiv:2505.16270  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

    Authors: Jiaru Zou, Yikun Ban, Zihao Li, Yunzhe Qi, Ruizhong Qiu, Ling Yang, Jingrui He

    Abstract: Large language models are typically adapted to downstream tasks through supervised fine-tuning on domain-specific data. While standard fine-tuning focuses on minimizing generation loss to optimize model parameters, we take a deeper step by retaining and leveraging the model's own learning signals, analogous to how human learners reflect on past mistakes to improve future performance. We first intr…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 33 pages, 7 figures

  25. arXiv:2505.14460  [pdf, ps, other]

    cs.CV

    VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

    Authors: Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, Kede Ma

    Abstract: DeepSeek-R1 has demonstrated remarkable effectiveness in incentivizing reasoning and generalization capabilities of large language models (LLMs) through reinforcement learning. Nevertheless, the potential of reasoning-induced computational modeling has not been thoroughly explored in the context of image quality assessment (IQA), a task critically dependent on visual reasoning. In this paper, we i…

    Submitted 20 May, 2025; originally announced May 2025.

  26. arXiv:2505.12057  [pdf, other]

    cs.AI

    CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction

    Authors: Jing Zou, Qingqiu Li, Chenyu Lian, Lihao Liu, Xiaohan Yan, Shujun Wang, Jing Qin

    Abstract: AI-driven models have shown great promise in detecting errors in radiology reports, yet the field lacks a unified benchmark for rigorous evaluation of error detection and further correction. To address this gap, we introduce CorBenchX, a comprehensive suite for automated error detection and correction in chest X-ray reports, designed to advance AI-assisted quality control in clinical practice. We…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: 12 pages, 5 figures

  27. arXiv:2505.11733  [pdf, ps, other]

    cs.CL

    MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports

    Authors: Kevin Wu, Eric Wu, Rahul Thapa, Kevin Wei, Angela Zhang, Arvind Suresh, Jacqueline J. Tao, Min Woo Sun, Alejandro Lozano, James Zou

    Abstract: Doctors and patients alike increasingly use Large Language Models (LLMs) to diagnose clinical cases. However, unlike domains such as math or coding, where correctness can be objectively defined by the final answer, medical diagnosis requires both the outcome and the reasoning process to be accurate. Currently, widely used medical benchmarks like MedQA and MMLU assess only accuracy in the final ans…

    Submitted 20 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  28. arXiv:2505.11462  [pdf, ps, other]

    cs.CL cs.AI

    Disentangling Reasoning and Knowledge in Medical Large Language Models

    Authors: Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, Suhana Bedi, Nevin Aresh, Joseph Boen, Shriya Reddy, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou

    Abstract: Medical reasoning in large language models (LLMs) aims to emulate clinicians' diagnostic thinking, but current benchmarks such as MedQA-USMLE, MedMCQA, and PubMedQA often mix reasoning with factual recall. We address this by separating 11 biomedical QA benchmarks into reasoning- and knowledge-focused subsets using a PubMedBERT classifier that reaches 81 percent accuracy, comparable to human perfor…

    Submitted 23 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  29. arXiv:2505.04638  [pdf, ps, other]

    cs.AI cs.CL cs.IR

    Towards Artificial Intelligence Research Assistant for Expert-Involved Learning

    Authors: Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao

    Abstract: Large Language Models (LLMs) and Large Multi-Modal Models (LMMs) have emerged as transformative tools in scientific research, yet their reliability and specific contributions to biomedical applications remain insufficiently characterized. In this study, we present ARtificial Intelligence research assistant for Expert-involved Learning (ARIEL), a multimodal datas…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 36 pages, 7 figures

  30. arXiv:2505.03641  [pdf, other]

    cs.AI

    Synthesizing Images on Perceptual Boundaries of ANNs for Uncovering and Manipulating Human Perceptual Variability

    Authors: Chen Wei, Chi Zhang, Jiachen Zou, Haotian Deng, Dietmar Heinke, Quanying Liu

    Abstract: Human decision-making in cognitive tasks and daily life exhibits considerable variability, shaped by factors such as task difficulty, individual preferences, and personal experiences. Understanding this variability across individuals is essential for uncovering the perceptual and decision-making mechanisms that humans rely on when faced with uncertainty and ambiguity. We present a computational fr…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: accepted at ICML 2025

  31. arXiv:2505.03059  [pdf, other]

    cs.CL

    Improving Model Alignment Through Collective Intelligence of Open-Source LLMs

    Authors: Junlin Wang, Roy Xie, Shang Zhu, Jue Wang, Ben Athiwaratkun, Bhuwan Dhingra, Shuaiwen Leon Song, Ce Zhang, James Zou

    Abstract: Building helpful and harmless large language models (LLMs) requires an effective model alignment approach based on human instructions and feedback, which necessitates high-quality human-labeled data. Constructing such datasets is often expensive and hard to scale, and may face potential limitations on diversity and generalization. To address these challenges, we introduce Mixture of Agents Alignment…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  32. arXiv:2504.17427  [pdf, other]

    cs.IR

    Beyond Whole Dialogue Modeling: Contextual Disentanglement for Conversational Recommendation

    Authors: Guojia An, Jie Zou, Jiwei Wei, Chaoning Zhang, Fuming Sun, Yang Yang

    Abstract: Conversational recommender systems aim to provide personalized recommendations by analyzing and utilizing contextual information related to dialogue. However, existing methods typically model the dialogue context as a whole, neglecting the inherent complexity and entanglement within the dialogue. Specifically, a dialogue comprises both focus information and background information, which mutually i…

    Submitted 24 April, 2025; originally announced April 2025.

  33. arXiv:2504.15667  [pdf, other]

    eess.IV cs.CV

    Performance Estimation for Supervised Medical Image Segmentation Models on Unlabeled Data Using UniverSeg

    Authors: Jingchen Zou, Jianqiang Li, Gabriel Jimenez, Qing Zhao, Daniel Racoceanu, Matias Cosarinsky, Enzo Ferrante, Guanghui Fu

    Abstract: The performance of medical image segmentation models is usually evaluated using metrics like the Dice score and Hausdorff distance, which compare predicted masks to ground truth annotations. However, when applying the model to unseen data, such as in clinical settings, it is often impractical to annotate all the data, making the model's performance uncertain. To address this challenge, we propose…

    Submitted 22 April, 2025; originally announced April 2025.

  34. arXiv:2504.14391  [pdf, other]

    cs.CV

    How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?

    Authors: Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder, Angela Zhang, Ben Athiwaratkun, Shuaiwen Leon Song, David Ouyang, James Zou

    Abstract: Publicly available biomedical videos, such as those on YouTube, serve as valuable educational resources for medical students. Unlike standard machine learning datasets, these videos are designed for human learners, often mixing medical imagery with narration, explanatory diagrams, and contextual framing. In this work, we investigate whether such pedagogically rich, yet non-standardized and heterog…

    Submitted 21 May, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

  35. arXiv:2504.14047  [pdf, other]

    cs.AI

    Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods

    Authors: Junlin Wang, Shang Zhu, Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou

    Abstract: There is intense interest in investigating how inference time compute (ITC) (e.g. repeated sampling, refinements, etc.) can improve large language model (LLM) capabilities. At the same time, recent breakthroughs in reasoning models, such as DeepSeek-R1, unlock the opportunity for reinforcement learning to improve LLM reasoning skills. An in-depth understanding of how ITC interacts with reasoning ac…

    Submitted 18 April, 2025; originally announced April 2025.

  36. arXiv:2504.13655  [pdf, other]

    cs.CL cs.AI cs.IR

    Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts

    Authors: Jie Zou, Cheng Lin, Weikang Guo, Zheng Wang, Jiwei Wei, Yang Yang, Hengtao Shen

    Abstract: Conversational recommender systems enable natural language conversations and thus lead to a more engaging and effective recommendation scenario. As the conversations for recommender systems usually contain limited contextual information, many existing conversational recommender systems incorporate external sources to enrich the contextual information. However, how to combine different types of con…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 30 pages

  37. arXiv:2504.13359  [pdf, other]

    cs.AI cs.CL

    Cost-of-Pass: An Economic Framework for Evaluating Language Models

    Authors: Mehmet Hamza Erol, Batu El, Mirac Suzgun, Mert Yuksekgonul, James Zou

    Abstract: The widespread adoption of AI systems in the economy hinges on their ability to generate economic value that outweighs their inference costs. Evaluating this tradeoff requires metrics that account for both performance and costs. We propose a framework grounded in production theory for evaluating language models by combining accuracy and inference cost. We introduce "cost-of-pass", the expected mon…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Code is available at: https://github.com/mhamzaerol/Cost-of-Pass

  38. MSCRS: Multi-modal Semantic Graph Prompt Learning Framework for Conversational Recommender Systems

    Authors: Yibiao Wei, Jie Zou, Weikang Guo, Guoqing Wang, Xing Xu, Yang Yang

    Abstract: Conversational Recommender Systems (CRSs) aim to provide personalized recommendations by interacting with users through conversations. Most existing studies of CRS focus on extracting user preferences from conversational contexts. However, due to the short and sparse nature of conversational contexts, it is difficult to fully capture user preferences by conversational contexts only. We argue that…

    Submitted 25 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  39. arXiv:2504.09737  [pdf, other]

    cs.AI cs.CL cs.HC cs.LG

    Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025

    Authors: Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, James Zou

    Abstract: Peer review at AI conferences is stressed by rapidly rising submission volumes, leading to deteriorating review quality and increased author dissatisfaction. To address these issues, we developed Review Feedback Agent, a system leveraging multiple large language models (LLMs) to improve review clarity and actionability by providing automated feedback on vague comments, content misunderstandings, a…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 30 pages, 7 figures

  40. arXiv:2504.09135  [pdf, other]

    cs.CL

    Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models

    Authors: Haotian Ye, Himanshu Jain, Chong You, Ananda Theertha Suresh, Haowei Lin, James Zou, Felix Yu

    Abstract: In real-world applications of large language models, outputs are often required to be confined: selecting items from predefined product or document sets, generating phrases that comply with safety standards, or conforming to specialized formatting styles. To control the generation, constrained decoding has been widely adopted. However, existing prefix-tree-based constrained decoding is inefficient…

    Submitted 12 April, 2025; originally announced April 2025.

    Journal ref: AISTATS 2025

  41. arXiv:2504.07952  [pdf, other]

    cs.LG cs.CL

    Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory

    Authors: Mirac Suzgun, Mert Yuksekgonul, Federico Bianchi, Dan Jurafsky, James Zou

    Abstract: Despite their impressive performance on complex tasks, current language models (LMs) typically operate in a vacuum: Each input query is processed separately, without retaining insights from previous attempts. Here, we present Dynamic Cheatsheet (DC), a lightweight framework that endows a black-box LM with a persistent, evolving memory. Rather than repeatedly re-discovering or re-committing the sam…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: https://github.com/suzgunmirac/dynamic-cheatsheet

  42. arXiv:2504.04785  [pdf, other]

    cs.AI

    Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors

    Authors: Fan Nie, Lan Feng, Haotian Ye, Weixin Liang, Pan Lu, Huaxiu Yao, Alexandre Alahi, James Zou

    Abstract: Efficiently leveraging the capabilities of contemporary large language models (LLMs) is increasingly challenging, particularly when direct fine-tuning is expensive and often impractical. Existing training-free methods, including manually or automatically designed workflows, typically demand substantial human effort or yield suboptimal results. This paper proposes Weak-for-Strong Harnessing (W4S), a…

    Submitted 7 April, 2025; originally announced April 2025.

  43. arXiv:2504.02810  [pdf, other

    cs.CL cs.AI cs.LG

    Generative Evaluation of Complex Reasoning in Large Language Models

    Authors: Haowei Lin, Xiangyu Wang, Ruilin Yan, Baizhou Huang, Haotian Ye, Jianhua Zhu, Zihao Wang, James Zou, Jianzhu Ma, Yitao Liang

    Abstract: With powerful large language models (LLMs) demonstrating superhuman reasoning capabilities, a critical question arises: Do LLMs genuinely reason, or do they merely recall answers from their extensive, web-scraped training datasets? Publicly released benchmarks inevitably become contaminated once incorporated into subsequent LLM training sets, undermining their reliability as faithful assessments.…

    Submitted 25 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

  44. arXiv:2504.01346  [pdf, other

    cs.CL cs.IR cs.LG

    GTR: Graph-Table-RAG for Cross-Table Question Answering

    Authors: Jiaru Zou, Dongqi Fu, Sirui Chen, Xinrui He, Zihao Li, Yada Zhu, Jiawei Han, Jingrui He

    Abstract: Beyond pure text, a substantial amount of knowledge is stored in tables. In real-world scenarios, user questions often require retrieving answers that are distributed across multiple tables. GraphRAG has recently attracted much attention for enhancing LLMs' reasoning capabilities by organizing external knowledge to address ad-hoc and complex questions, exemplifying a promising direction for cross-…

    Submitted 26 May, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Comments: 20 pages, 7 figures

  45. arXiv:2503.24150  [pdf, other

    cs.LG cs.AI cs.HC

    Learning a Canonical Basis of Human Preferences from Binary Ratings

    Authors: Kailas Vodrahalli, Wei Wei, James Zou

    Abstract: Recent advances in generative AI have been driven by alignment techniques such as reinforcement learning from human feedback (RLHF). RLHF and related techniques typically involve constructing a dataset of binary or ranked choice human preferences and subsequently fine-tuning models to align with these preferences. This paper shifts the focus to understanding the preferences encoded in such dataset…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 25 pages, 11 figures

  46. arXiv:2503.23331  [pdf, other

    cs.CV cs.LG

    HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation

    Authors: Hongwei Zheng, Han Li, Wenrui Dai, Ziyang Zheng, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Existing 2D-to-3D human pose estimation (HPE) methods struggle with the occlusion issue by enriching information like temporal and visual cues in the lifting stage. In this paper, we argue that these methods ignore the limitation of the sparse skeleton 2D input representation, which fundamentally restricts the 2D-to-3D lifting and worsens the occlusion issue. To address these, we propose a novel t…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: CVPR2025

  47. arXiv:2503.20561  [pdf, other

    cs.LG stat.ML

    A Theoretical Framework for Prompt Engineering: Approximating Smooth Functions with Transformer Prompts

    Authors: Ryumei Nakada, Wenlong Ji, Tianxi Cai, James Zou, Linjun Zhang

    Abstract: Prompt engineering has emerged as a powerful technique for guiding large language models (LLMs) toward desired responses, significantly enhancing their performance across diverse tasks. Beyond their role as static predictors, LLMs increasingly function as intelligent agents, capable of reasoning, decision-making, and adapting dynamically to complex environments. However, the theoretical underpinni…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: 55 pages, 2 figures

  48. arXiv:2503.15754  [pdf, other

    cs.CR cs.AI

    AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

    Authors: Andy Zhou, Kevin Wu, Francesco Pinto, Zhaorun Chen, Yi Zeng, Yu Yang, Shuang Yang, Sanmi Koyejo, James Zou, Bo Li

    Abstract: As large language models (LLMs) become increasingly capable, security and safety evaluation are crucial. While current red teaming approaches have made strides in assessing LLM vulnerabilities, they often rely heavily on human input and lack comprehensive coverage of emerging attack vectors. This paper introduces AutoRedTeamer, a novel framework for fully automated, end-to-end red teaming against…

    Submitted 19 March, 2025; originally announced March 2025.

  49. arXiv:2503.15243  [pdf, other

    cs.IT cs.ET

    Integrating Sensing and Communications in 6G? Not Until It Is Secure to Do So

    Authors: Nanchi Su, Fan Liu, Jiaqi Zou, Christos Masouros, George C. Alexandropoulos, Alain Mourad, Javier Lorca Hernando, Qinyu Zhang, Tse-Tin Chan

    Abstract: Integrated Sensing and Communication (ISAC) is emerging as a cornerstone technology for forthcoming 6G systems, significantly improving spectrum and energy efficiency. However, the commercial viability of ISAC hinges on addressing critical challenges surrounding security, privacy, and trustworthiness. These challenges necessitate an end-to-end framework to safeguard both communication data and se…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 8 pages; 5 figures; submitted to an IEEE magazine

  50. arXiv:2503.08686  [pdf, other

    cs.CV

    OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models

    Authors: Jialv Zou, Bencheng Liao, Qian Zhang, Wenyu Liu, Xinggang Wang

    Abstract: Recent advancements in unified multimodal understanding and visual generation (or multimodal generation) models have been hindered by their quadratic computational complexity and dependence on large-scale training data. We present OmniMamba, the first linear-architecture-based multimodal generation model that generates both text and images through a unified next-token prediction paradigm. The mode…

    Submitted 11 March, 2025; originally announced March 2025.