-
Efficient Controllable Diffusion via Optimal Classifier Guidance
Authors:
Owen Oertell,
Shikun Sun,
Yiding Chen,
Jin Peng Zhou,
Zhiyong Wang,
Wen Sun
Abstract:
The controllable generation of diffusion models aims to steer the model to generate samples that optimize some given objective functions. It is desirable for a variety of applications including image generation, molecule generation, and DNA/sequence generation. Reinforcement Learning (RL) based fine-tuning of the base model is a popular approach but it can overfit the reward function while requiri…
▽ More
The controllable generation of diffusion models aims to steer the model to generate samples that optimize some given objective functions. It is desirable for a variety of applications including image generation, molecule generation, and DNA/sequence generation. Reinforcement Learning (RL) based fine-tuning of the base model is a popular approach but it can overfit the reward function while requiring significant resources. We frame controllable generation as a problem of finding a distribution that optimizes a KL-regularized objective function. We present SLCD -- Supervised Learning based Controllable Diffusion, which iteratively generates online data and trains a small classifier to guide the generation of the diffusion model. Similar to the standard classifier-guided diffusion, SLCD's key computation primitive is classification and does not involve any complex concepts from RL or control. Via a reduction to no-regret online learning analysis, we show that under KL divergence, the output from SLCD provably converges to the optimal solution of the KL-regularized objective. Further, we empirically demonstrate that SLCD can generate high quality samples with nearly the same inference time as the base model in both image generation with continuous diffusion and biological sequence generation with discrete diffusion. Our code is available at https://github.com/Owen-Oertell/slcd
△ Less
Submitted 27 May, 2025;
originally announced May 2025.
-
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Authors:
Kaiwen Wang,
Jin Peng Zhou,
Jonathan Chang,
Zhaolin Gao,
Nathan Kallus,
Kianté Brantley,
Wen Sun
Abstract:
In this paper, we propose a simple and efficient method for value model training on long-context reasoning traces. Compared to existing process reward models (PRMs), our method does not require a fine-grained notion of "step," which is difficult to define for long-context reasoning models. By collecting a dataset of 2.5 million reasoning traces, we train a 1.5B token-level value model and apply it…
▽ More
In this paper, we propose a simple and efficient method for value model training on long-context reasoning traces. Compared to existing process reward models (PRMs), our method does not require a fine-grained notion of "step," which is difficult to define for long-context reasoning models. By collecting a dataset of 2.5 million reasoning traces, we train a 1.5B token-level value model and apply it to DeepSeek models for improved performance with test-time compute scaling. We find that block-wise value-guided search (VGS) with a final weighted majority vote achieves better test-time scaling than standard methods such as majority voting or best-of-n. With an inference budget of 64 generations, VGS with DeepSeek-R1-Distill-1.5B achieves an average accuracy of 45.7% across four competition math benchmarks (AIME 2024 & 2025, HMMT Feb 2024 & 2025), reaching parity with o3-mini-medium. Moreover, VGS significantly reduces the inference FLOPs required to achieve the same performance of majority voting. Our dataset, model and codebase are open-sourced.
△ Less
Submitted 22 May, 2025;
originally announced May 2025.
-
Pre-training Large Memory Language Models with Internal and External Knowledge
Authors:
Linxi Zhao,
Sofian Zalouk,
Christian K. Belardi,
Justin Lovelace,
Jin Peng Zhou,
Kilian Q. Weinberger,
Yoav Artzi,
Jennifer J. Sun
Abstract:
Neural language models are black-boxes -- both linguistic patterns and factual knowledge are distributed across billions of opaque parameters. This entangled encoding makes it difficult to reliably inspect, verify, or update specific facts. We propose a new class of language models, Large Memory Language Models (LMLM) with a pre-training recipe that stores factual knowledge in both internal weight…
▽ More
Neural language models are black-boxes -- both linguistic patterns and factual knowledge are distributed across billions of opaque parameters. This entangled encoding makes it difficult to reliably inspect, verify, or update specific facts. We propose a new class of language models, Large Memory Language Models (LMLM) with a pre-training recipe that stores factual knowledge in both internal weights and an external database. Our approach strategically masks externally retrieved factual values from the training loss, thereby teaching the model to perform targeted lookups rather than relying on memorization in model weights. Our experiments demonstrate that LMLMs achieve competitive performance compared to significantly larger, knowledge-dense LLMs on standard benchmarks, while offering the advantages of explicit, editable, and verifiable knowledge bases. This work represents a fundamental shift in how language models interact with and manage factual knowledge.
△ Less
Submitted 21 May, 2025;
originally announced May 2025.
-
Learning to decode logical circuits
Authors:
Yiqing Zhou,
Chao Wan,
Yichen Xu,
Jin Peng Zhou,
Kilian Q. Weinberger,
Eun-Ah Kim
Abstract:
With the development of quantum hardware bringing the error-corrected quantum circuits to the near future, the lack of an efficient polynomial-time decoding algorithms for logical circuits presents a critical bottleneck. While quantum memory decoding has been well-studied, inevitable correlated errors introduced by entangling logical gates prevent the straightforward generalization of quantum memo…
▽ More
With the development of quantum hardware bringing the error-corrected quantum circuits to the near future, the lack of an efficient polynomial-time decoding algorithms for logical circuits presents a critical bottleneck. While quantum memory decoding has been well-studied, inevitable correlated errors introduced by entangling logical gates prevent the straightforward generalization of quantum memory decoders. We introduce a data-centric modular decoder framework, Multi-Core Circuit Decoder (MCCD), consisting of decoder modules corresponding to each logical operation supported by the quantum hardware. The MCCD handles both single-qubit and entangling gates within a unified framework. We train MCCD using mirror-symmetric random Clifford circuits, demonstrating its ability to effectively learn correlated decoding patterns. Through extensive testing on circuits significantly deeper than those used in training, we show that MCCD maintains high logical accuracy while exhibiting competitive polynomial decoding time across increasing circuit depths and code distances. When compared with conventional decoders like Minimum Weight Perfect Matching (MWPM), Most Likely Error (MLE), and Belief Propagation with Ordered Statistics Post-processing (BP-OSD), MCCD achieves competitive accuracy with substantially better time efficiency, particularly for circuits with entangling gates. Our approach represents a noise-model agnostic solution to the decoding challenge for deep logical quantum circuits.
△ Less
Submitted 23 April, 2025;
originally announced April 2025.
-
INPROVF: Leveraging Large Language Models to Repair High-level Robot Controllers from Assumption Violations
Authors:
Qian Meng,
Jin Peng Zhou,
Kilian Q. Weinberger,
Hadas Kress-Gazit
Abstract:
This paper presents INPROVF, an automatic framework that combines large language models (LLMs) and formal methods to speed up the repair process of high-level robot controllers. Previous approaches based solely on formal methods are computationally expensive and cannot scale to large state spaces. In contrast, INPROVF uses LLMs to generate repair candidates, and formal methods to verify their corr…
▽ More
This paper presents INPROVF, an automatic framework that combines large language models (LLMs) and formal methods to speed up the repair process of high-level robot controllers. Previous approaches based solely on formal methods are computationally expensive and cannot scale to large state spaces. In contrast, INPROVF uses LLMs to generate repair candidates, and formal methods to verify their correctness. To improve the quality of these candidates, our framework first translates the symbolic representations of the environment and controllers into natural language descriptions. If a candidate fails the verification, INPROVF provides feedback on potential unsafe behaviors or unsatisfied tasks, and iteratively prompts LLMs to generate improved solutions. We demonstrate the effectiveness of INPROVF through 12 violations with various workspaces, tasks, and state space sizes.
△ Less
Submitted 17 March, 2025;
originally announced March 2025.
-
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Authors:
Jin Peng Zhou,
Kaiwen Wang,
Jonathan Chang,
Zhaolin Gao,
Nathan Kallus,
Kilian Q. Weinberger,
Kianté Brantley,
Wen Sun
Abstract:
Reinforcement learning (RL) post-training is crucial for LLM alignment and reasoning, but existing policy-based methods, such as PPO and DPO, can fall short of fixing shortcuts inherited from pre-training. In this work, we introduce $Q\sharp$, a value-based algorithm for KL-regularized RL that guides the reference policy using the optimal regularized $Q$ function. We propose to learn the optimal…
▽ More
Reinforcement learning (RL) post-training is crucial for LLM alignment and reasoning, but existing policy-based methods, such as PPO and DPO, can fall short of fixing shortcuts inherited from pre-training. In this work, we introduce $Q\sharp$, a value-based algorithm for KL-regularized RL that guides the reference policy using the optimal regularized $Q$ function. We propose to learn the optimal $Q$ function using distributional RL on an aggregated online dataset. Unlike prior value-based baselines that guide the model using unregularized $Q$-values, our method is theoretically principled and provably learns the optimal policy for the KL-regularized RL problem. Empirically, $Q\sharp$ outperforms prior baselines in math reasoning benchmarks while maintaining a smaller KL divergence to the reference policy. Theoretically, we establish a reduction from KL-regularized RL to no-regret online learning, providing the first bounds for deterministic MDPs under only realizability. Thanks to distributional RL, our bounds are also variance-dependent and converge faster when the reference policy has small variance. In sum, our results highlight $Q\sharp$ as an effective approach for post-training LLMs, offering both improved performance and theoretical guarantees. The code can be found at https://github.com/jinpz/q_sharp.
△ Less
Submitted 27 February, 2025;
originally announced February 2025.
-
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
Authors:
Qizhou Wang,
Jin Peng Zhou,
Zhanke Zhou,
Saebyeol Shin,
Bo Han,
Kilian Q. Weinberger
Abstract:
Large language models (LLMs) should undergo rigorous audits to identify potential risks, such as copyright and privacy infringements. Once these risks emerge, timely updates are crucial to remove undesirable responses, ensuring legal and safe model usage. It has spurred recent research into LLM unlearning, focusing on erasing targeted undesirable knowledge without compromising the integrity of oth…
▽ More
Large language models (LLMs) should undergo rigorous audits to identify potential risks, such as copyright and privacy infringements. Once these risks emerge, timely updates are crucial to remove undesirable responses, ensuring legal and safe model usage. It has spurred recent research into LLM unlearning, focusing on erasing targeted undesirable knowledge without compromising the integrity of other, non-targeted responses. Existing studies have introduced various unlearning objectives to pursue LLM unlearning without necessitating complete retraining. However, each of these objectives has unique properties, and no unified framework is currently available to comprehend them thoroughly. To fill the gap, we propose a toolkit of the gradient effect (G-effect), quantifying the impacts of unlearning objectives on model performance from a gradient perspective. A notable advantage is its broad ability to detail the unlearning impacts from various aspects across instances, updating steps, and LLM layers. Accordingly, the G-effect offers new insights into identifying drawbacks of existing unlearning objectives, further motivating us to explore a series of new solutions for their mitigation and improvements. Finally, we outline promising directions that merit further studies, aiming at contributing to the community to advance this important field.
△ Less
Submitted 26 February, 2025;
originally announced February 2025.
-
Graders should cheat: privileged information enables expert-level automated evaluations
Authors:
Jin Peng Zhou,
Sébastien M. R. Arnold,
Nan Ding,
Kilian Q. Weinberger,
Nan Hua,
Fei Sha
Abstract:
Auto-evaluating language models (LMs), i.e., using a grader LM to evaluate the candidate LM, is an appealing way to accelerate the evaluation process and the cost associated with it. But this presents a paradox: how can we trust the grader LM, which is presumably weaker than the candidate LM, to assess problems that are beyond the frontier of the capabilities of either model or both? For instance,…
▽ More
Auto-evaluating language models (LMs), i.e., using a grader LM to evaluate the candidate LM, is an appealing way to accelerate the evaluation process and the cost associated with it. But this presents a paradox: how can we trust the grader LM, which is presumably weaker than the candidate LM, to assess problems that are beyond the frontier of the capabilities of either model or both? For instance, today's LMs struggle on graduate-level physics and Olympiad-level math, making them unreliable graders in these domains.
We show that providing privileged information -- such as ground-truth solutions or problem-specific guidelines -- improves automated evaluations on such frontier problems. This approach offers two key advantages. First, it expands the range of problems where LMs graders apply. Specifically, weaker models can now rate the predictions of stronger models. Second, privileged information can be used to devise easier variations of challenging problems which improves the separability of different LMs on tasks where their performance is generally low. With this approach, general-purpose LM graders match the state of the art performance on RewardBench, surpassing almost all the specially-tuned models. LM graders also outperform individual human raters on Vibe-Eval, and approach human expert graders on Olympiad-level math problems.
△ Less
Submitted 15 February, 2025;
originally announced February 2025.
-
Enhancing Cognitive Diagnosis by Modeling Learner Cognitive Structure State
Authors:
Zhifu Chen,
Hengnian Gu,
Jin Peng Zhou,
Dongdai Zhou
Abstract:
Cognitive diagnosis represents a fundamental research area within intelligent education, with the objective of measuring the cognitive status of individuals. Theoretically, an individual's cognitive state is essentially equivalent to their cognitive structure state. Cognitive structure state comprises two key components: knowledge state (KS) and knowledge structure state (KUS). The knowledge state…
▽ More
Cognitive diagnosis represents a fundamental research area within intelligent education, with the objective of measuring the cognitive status of individuals. Theoretically, an individual's cognitive state is essentially equivalent to their cognitive structure state. Cognitive structure state comprises two key components: knowledge state (KS) and knowledge structure state (KUS). The knowledge state reflects the learner's mastery of individual concepts, a widely studied focus within cognitive diagnosis. In contrast, the knowledge structure state-representing the learner's understanding of the relationships between concepts-remains inadequately modeled. A learner's cognitive structure is essential for promoting meaningful learning and shaping academic performance. Although various methods have been proposed, most focus on assessing KS and fail to assess KUS. To bridge this gap, we propose an innovative and effective framework-CSCD (Cognitive Structure State-based Cognitive Diagnosis)-which introduces a novel framework to modeling learners' cognitive structures in diagnostic assessments, thereby offering new insights into cognitive structure modeling. Specifically, we employ an edge-feature-based graph attention network to represent the learner's cognitive structure state, effectively integrating KS and KUS. Extensive experiments conducted on real datasets demonstrate the superior performance of this framework in terms of diagnostic accuracy and interpretability.
△ Less
Submitted 27 December, 2024;
originally announced December 2024.
-
Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks
Authors:
Jinyan Su,
Jin Peng Zhou,
Zhengxin Zhang,
Preslav Nakov,
Claire Cardie
Abstract:
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate LLM hallucinations and enhance their performance in knowledge-intensive domains. However, these systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into retrieval databases can mislead the model into generating factually incorrect outputs. In this paper, we investiga…
▽ More
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate LLM hallucinations and enhance their performance in knowledge-intensive domains. However, these systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into retrieval databases can mislead the model into generating factually incorrect outputs. In this paper, we investigate both the retrieval and the generation components of RAG systems to understand how to enhance their robustness against such attacks. From the retrieval perspective, we analyze why and how the adversarial contexts are retrieved and assess how the quality of the retrieved passages impacts downstream generation. From a generation perspective, we evaluate whether LLMs' advanced critical thinking and internal knowledge capabilities can be leveraged to mitigate the impact of adversarial contexts, i.e., using skeptical prompting as a self-defense mechanism. Our experiments and findings provide actionable insights into designing safer and more resilient retrieval-augmented frameworks, paving the way for their reliable deployment in real-world applications.
△ Less
Submitted 21 December, 2024;
originally announced December 2024.
-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman
, et al. (173 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…
▽ More
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
△ Less
Submitted 2 October, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
On Speeding Up Language Model Evaluation
Authors:
Jin Peng Zhou,
Christian K. Belardi,
Ruihan Wu,
Travis Zhang,
Carla P. Gomes,
Wen Sun,
Kilian Q. Weinberger
Abstract:
Developing prompt-based methods with Large Language Models (LLMs) requires making numerous decisions, which give rise to a combinatorial search problem over hyper-parameters. This exhaustive evaluation can be time-consuming and costly. In this paper, we propose an $\textit{adaptive}$ approach to explore this space. We are exploiting the fact that often only few samples are needed to identify clear…
▽ More
Developing prompt-based methods with Large Language Models (LLMs) requires making numerous decisions, which give rise to a combinatorial search problem over hyper-parameters. This exhaustive evaluation can be time-consuming and costly. In this paper, we propose an $\textit{adaptive}$ approach to explore this space. We are exploiting the fact that often only few samples are needed to identify clearly superior or inferior settings, and that many evaluation tests are highly correlated. We lean on multi-armed bandits to sequentially identify the next (method, validation sample)-pair to evaluate and utilize low-rank matrix factorization to fill in missing evaluations. We carefully assess the efficacy of our approach on several competitive benchmark problems and show that it can identify the top-performing method using only 5-15% of the typical resources -- resulting in 85-95% LLM cost savings. Our code is available at https://github.com/kilian-group/banditeval.
△ Less
Submitted 26 February, 2025; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Orchestrating LLMs with Different Personalizations
Authors:
Jin Peng Zhou,
Katie Z Luo,
Jingwen Gu,
Jason Yuan,
Kilian Q. Weinberger,
Wen Sun
Abstract:
This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from \textit{Personalized} Human Feedback (RLPHF). Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to create an LLM without re-training that best adheres to this specification. St…
▽ More
This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from \textit{Personalized} Human Feedback (RLPHF). Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to create an LLM without re-training that best adheres to this specification. Starting from specialized expert LLMs, each trained for one such particular preference dimension, we propose a black-box method that merges their outputs on a per-token level. We train a lightweight Preference Control Model (PCM) that dynamically translates the preference description and current context into next-token prediction weights. By combining the expert models' outputs at the token level, our approach dynamically generates text that optimizes the given preference. Empirical tests show that our method matches or surpasses existing preference merging techniques, providing a scalable, efficient alternative to fine-tuning LLMs for individual personalization.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
Authors:
Hao Tang,
Keya Hu,
Jin Peng Zhou,
Sicheng Zhong,
Wei-Long Zheng,
Xujie Si,
Kevin Ellis
Abstract:
Iteratively improving and repairing source code with large language models (LLMs), known as refinement, has emerged as a popular way of generating programs that would be too complex to construct in one shot. Given a bank of test cases, together with a candidate program, an LLM can improve that program by being prompted with failed test cases. But it remains an open question how to best iteratively…
▽ More
Iteratively improving and repairing source code with large language models (LLMs), known as refinement, has emerged as a popular way of generating programs that would be too complex to construct in one shot. Given a bank of test cases, together with a candidate program, an LLM can improve that program by being prompted with failed test cases. But it remains an open question how to best iteratively refine code, with prior work employing simple greedy or breadth-first strategies. We show here that refinement exposes an explore-exploit tradeoff: exploit by refining the program that passes the most test cases, or explore by refining a lesser considered program. We frame this as an arm-acquiring bandit problem, which we solve with Thompson Sampling. The resulting LLM-based program synthesis algorithm is broadly applicable: Across loop invariant synthesis, visual reasoning puzzles, and competition programming problems, we find that our new method can solve more problems using fewer language model calls.
△ Less
Submitted 29 October, 2024; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Attention to Quantum Complexity
Authors:
Hyejin Kim,
Yiqing Zhou,
Yichen Xu,
Kaarthik Varma,
Amir H. Karamlou,
Ilan T. Rosen,
Jesse C. Hoke,
Chao Wan,
Jin Peng Zhou,
William D. Oliver,
Yuri D. Lensky,
Kilian Q. Weinberger,
Eun-Ah Kim
Abstract:
The imminent era of error-corrected quantum computing urgently demands robust methods to characterize complex quantum states, even from limited and noisy measurements. We introduce the Quantum Attention Network (QuAN), a versatile classical AI framework leveraging the power of attention mechanisms specifically tailored to address the unique challenges of learning quantum complexity. Inspired by la…
▽ More
The imminent era of error-corrected quantum computing urgently demands robust methods to characterize complex quantum states, even from limited and noisy measurements. We introduce the Quantum Attention Network (QuAN), a versatile classical AI framework leveraging the power of attention mechanisms specifically tailored to address the unique challenges of learning quantum complexity. Inspired by large language models, QuAN treats measurement snapshots as tokens while respecting their permutation invariance. Combined with a novel parameter-efficient mini-set self-attention block (MSSAB), such data structure enables QuAN to access high-order moments of the bit-string distribution and preferentially attend to less noisy snapshots. We rigorously test QuAN across three distinct quantum simulation settings: driven hard-core Bose-Hubbard model, random quantum circuits, and the toric code under coherent and incoherent noise. QuAN directly learns the growth in entanglement and state complexity from experimentally obtained computational basis measurements. In particular, it learns the growth in complexity of random circuit data upon increasing depth from noisy experimental data. Taken to a regime inaccessible by existing theory, QuAN unveils the complete phase diagram for noisy toric code data as a function of both noise types. This breakthrough highlights the transformative potential of using purposefully designed AI-driven solutions to assist quantum hardware.
△ Less
Submitted 20 November, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization
Authors:
Jin Peng Zhou,
Charles Staats,
Wenda Li,
Christian Szegedy,
Kilian Q. Weinberger,
Yuhuai Wu
Abstract:
Large language models (LLM), such as Google's Minerva and OpenAI's GPT families, are becoming increasingly capable of solving mathematical quantitative reasoning problems. However, they still make unjustified logical and computational errors in their reasoning steps and answers. In this paper, we leverage the fact that if the training corpus of LLMs contained sufficiently many examples of formal m…
▽ More
Large language models (LLM), such as Google's Minerva and OpenAI's GPT families, are becoming increasingly capable of solving mathematical quantitative reasoning problems. However, they still make unjustified logical and computational errors in their reasoning steps and answers. In this paper, we leverage the fact that if the training corpus of LLMs contained sufficiently many examples of formal mathematics (e.g. in Isabelle, a formal theorem proving environment), they can be prompted to translate i.e. autoformalize informal mathematical statements into formal Isabelle code -- which can be verified automatically for internal consistency. This provides a mechanism to automatically reject solutions whose formalized versions are inconsistent within themselves or with the formalized problem statement. We evaluate our method on GSM8K, MATH and MultiArith datasets and demonstrate that our approach provides a consistently better heuristic than vanilla majority voting -- the previously best method to identify correct answers, by more than 12% on GSM8K. In our experiments it improves results consistently across all datasets and LLM model sizes. The code can be found at https://github.com/jinpz/dtv.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
REFACTOR: Learning to Extract Theorems from Proofs
Authors:
Jin Peng Zhou,
Yuhuai Wu,
Qiyang Li,
Roger Grosse
Abstract:
Human mathematicians are often good at recognizing modular and reusable theorems that make complex mathematical results within reach. In this paper, we propose a novel method called theoREm-from-prooF extrACTOR (REFACTOR) for training neural networks to mimic this ability in formal mathematical theorem proving. We show on a set of unseen proofs, REFACTOR is able to extract 19.6% of the theorems th…
▽ More
Human mathematicians are often good at recognizing modular and reusable theorems that make complex mathematical results within reach. In this paper, we propose a novel method called theoREm-from-prooF extrACTOR (REFACTOR) for training neural networks to mimic this ability in formal mathematical theorem proving. We show on a set of unseen proofs, REFACTOR is able to extract 19.6% of the theorems that humans would use to write the proofs. When applying the model to the existing Metamath library, REFACTOR extracted 16 new theorems. With newly extracted theorems, we show that the existing proofs in the MetaMath database can be refactored. The new theorems are used very frequently after refactoring, with an average usage of 733.5 times, and help shorten the proof lengths. Lastly, we demonstrate that the prover trained on the new-theorem refactored dataset proves more test theorems and outperforms state-of-the-art baselines by frequently leveraging a diverse set of newly extracted theorems. Code can be found at https://github.com/jinpz/refactor.
△ Less
Submitted 26 February, 2024;
originally announced February 2024.
-
Detecting Out-of-Distribution Objects through Class-Conditioned Inpainting
Authors:
Quang-Huy Nguyen,
Jin Peng Zhou,
Zhenzhen Liu,
Khanh-Huyen Bui,
Kilian Q. Weinberger,
Wei-Lun Chao,
Dung D. Le
Abstract:
Recent object detectors have achieved impressive accuracy in identifying objects seen during training. However, real-world deployment often introduces novel and unexpected objects, referred to as out-of-distribution (OOD) objects, posing significant challenges to model trustworthiness. Modern object detectors are typically overconfident, making it unreliable to use their predictions alone for OOD…
▽ More
Recent object detectors have achieved impressive accuracy in identifying objects seen during training. However, real-world deployment often introduces novel and unexpected objects, referred to as out-of-distribution (OOD) objects, posing significant challenges to model trustworthiness. Modern object detectors are typically overconfident, making it unreliable to use their predictions alone for OOD detection. To address this, we propose leveraging an auxiliary model as a complementary solution. Specifically, we utilize an off-the-shelf text-to-image generative model, such as Stable Diffusion, which is trained with objective functions distinct from those of discriminative object detectors. We hypothesize that this fundamental difference enables the detection of OOD objects by measuring inconsistencies between the models. Concretely, for a given detected object bounding box and its predicted in-distribution class label, we perform class-conditioned inpainting on the image with the object removed. If the object is OOD, the inpainted image is likely to deviate significantly from the original, making the reconstruction error a robust indicator of OOD status. Extensive experiments demonstrate that our approach consistently surpasses existing zero-shot and non-zero-shot OOD detection methods, establishing a robust framework for enhancing object detection systems in dynamic environments.
△ Less
Submitted 9 June, 2025; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Correction with Backtracking Reduces Hallucination in Summarization
Authors:
Zhenzhen Liu,
Chao Wan,
Varsha Kishore,
Jin Peng Zhou,
Minmin Chen,
Kilian Q. Weinberger
Abstract:
Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is to produce summaries with details that are not grounded in the source document. In this paper, we intr…
▽ More
Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is to produce summaries with details that are not grounded in the source document. In this paper, we introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization. The approach is based on two steps: hallucination detection and mitigation. We show that the former can be achieved through measuring simple statistics about conditional word probabilities and distance to context words. Further, we demonstrate that straight-forward backtracking is surprisingly effective at mitigation. We thoroughly evaluate the proposed method with prior art on three benchmark datasets for text summarization. The results show that CoBa is effective and efficient in reducing hallucination, and offers great adaptability and flexibility. Code can be found at https://github.com/zhenzhel/CoBa.
△ Less
Submitted 3 September, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Magnushammer: A Transformer-Based Approach to Premise Selection
Authors:
Maciej Mikuła,
Szymon Tworkowski,
Szymon Antoniak,
Bartosz Piotrowski,
Albert Qiaochu Jiang,
Jin Peng Zhou,
Christian Szegedy,
Łukasz Kuciński,
Piotr Miłoś,
Yuhuai Wu
Abstract:
This paper presents a novel approach to premise selection, a crucial reasoning task in automated theorem proving. Traditionally, symbolic methods that rely on extensive domain knowledge and engineering effort are applied to this task. In contrast, this work demonstrates that contrastive training with the transformer architecture can achieve higher-quality retrieval of relevant premises, without th…
▽ More
This paper presents a novel approach to premise selection, a crucial reasoning task in automated theorem proving. Traditionally, symbolic methods that rely on extensive domain knowledge and engineering effort are applied to this task. In contrast, this work demonstrates that contrastive training with the transformer architecture can achieve higher-quality retrieval of relevant premises, without the engineering overhead. Our method, Magnushammer, outperforms the most advanced and widely used automation tool in interactive theorem proving called Sledgehammer. On the PISA and miniF2F benchmarks Magnushammer achieves $59.5\%$ (against $38.3\%$) and $34.0\%$ (against $20.9\%$) success rates, respectively. By combining \method with a language-model-based automated theorem prover, we further improve the state-of-the-art proof success rate from $57.0\%$ to $71.0\%$ on the PISA benchmark using $4$x fewer parameters. Moreover, we develop and open source a novel dataset for premise selection, containing textual representations of (proof state, relevant premise) pairs. To the best of our knowledge, this is the largest available premise selection dataset, and the first one for the Isabelle proof assistant.
△ Less
Submitted 18 March, 2024; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Unsupervised Out-of-Distribution Detection with Diffusion Inpainting
Authors:
Zhenzhen Liu,
Jin Peng Zhou,
Yufan Wang,
Kilian Q. Weinberger
Abstract:
Unsupervised out-of-distribution detection (OOD) seeks to identify out-of-domain data by learning only from unlabeled in-domain data. We present a novel approach for this task - Lift, Map, Detect (LMD) - that leverages recent advancement in diffusion models. Diffusion models are one type of generative models. At their core, they learn an iterative denoising process that gradually maps a noisy imag…
▽ More
Unsupervised out-of-distribution detection (OOD) seeks to identify out-of-domain data by learning only from unlabeled in-domain data. We present a novel approach for this task - Lift, Map, Detect (LMD) - that leverages recent advancement in diffusion models. Diffusion models are one type of generative models. At their core, they learn an iterative denoising process that gradually maps a noisy image closer to their training manifolds. LMD leverages this intuition for OOD detection. Specifically, LMD lifts an image off its original manifold by corrupting it, and maps it towards the in-domain manifold with a diffusion model. For an out-of-domain image, the mapped image would have a large distance away from its original manifold, and LMD would identify it as OOD accordingly. We show through extensive experiments that LMD achieves competitive performance across a broad variety of datasets. Code can be found at https://github.com/zhenzhel/lift_map_detect.
△ Less
Submitted 16 August, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Learned Systems Security
Authors:
Roei Schuster,
Jin Peng Zhou,
Thorsten Eisenhofer,
Paul Grubbs,
Nicolas Papernot
Abstract:
A learned system uses machine learning (ML) internally to improve performance. We can expect such systems to be vulnerable to some adversarial-ML attacks. Often, the learned component is shared between mutually-distrusting users or processes, much like microarchitectural resources such as caches, potentially giving rise to highly-realistic attacker models. However, compared to attacks on other ML-…
▽ More
A learned system uses machine learning (ML) internally to improve performance. We can expect such systems to be vulnerable to some adversarial-ML attacks. Often, the learned component is shared between mutually-distrusting users or processes, much like microarchitectural resources such as caches, potentially giving rise to highly-realistic attacker models. However, compared to attacks on other ML-based systems, attackers face a level of indirection as they cannot interact directly with the learned model. Additionally, the difference between the attack surface of learned and non-learned versions of the same system is often subtle. These factors obfuscate the de-facto risks that the incorporation of ML carries. We analyze the root causes of potentially-increased attack surface in learned systems and develop a framework for identifying vulnerabilities that stem from the use of ML. We apply our framework to a broad set of learned systems under active development. To empirically validate the many vulnerabilities surfaced by our framework, we choose 3 of them and implement and evaluate exploits against prominent learned-system instances. We show that the use of ML caused leakage of past queries in a database, enabled a poisoning attack that causes exponential memory blowup in an index structure and crashes it in seconds, and enabled index users to snoop on each others' key distributions by timing queries over their own keys. We find that adversarial ML is a universal threat against learned systems, point to open research gaps in our understanding of learned-systems security, and conclude by discussing mitigations, while noting that data leakage is inherent in systems whose learned component is shared between multiple parties.
△ Less
Submitted 10 January, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
Authors:
Albert Q. Jiang,
Sean Welleck,
Jin Peng Zhou,
Wenda Li,
Jiacheng Liu,
Mateja Jamnik,
Timothée Lacroix,
Yuhuai Wu,
Guillaume Lample
Abstract:
The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only accessible to a few experts. While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advantage of available informal proofs. In this work, we…
▽ More
The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only accessible to a few experts. While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advantage of available informal proofs. In this work, we introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems. We investigate two relevant setups where informal proofs are either written by humans or generated by a language model. Our experiments and ablation studies show that large language models are able to produce well-structured formal sketches that follow the same reasoning steps as the informal proofs. Guiding an automated prover with these sketches enhances its performance from 20.9% to 39.3% on a collection of mathematical competition problems.
△ Less
Submitted 20 February, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Does Label Differential Privacy Prevent Label Inference Attacks?
Authors:
Ruihan Wu,
Jin Peng Zhou,
Kilian Q. Weinberger,
Chuan Guo
Abstract:
Label differential privacy (label-DP) is a popular framework for training private ML models on datasets with public features and sensitive private labels. Despite its rigorous privacy guarantee, it has been observed that in practice label-DP does not preclude label inference attacks (LIAs): Models trained with label-DP can be evaluated on the public training features to recover, with high accuracy…
▽ More
Label differential privacy (label-DP) is a popular framework for training private ML models on datasets with public features and sensitive private labels. Despite its rigorous privacy guarantee, it has been observed that in practice label-DP does not preclude label inference attacks (LIAs): Models trained with label-DP can be evaluated on the public training features to recover, with high accuracy, the very private labels that it was designed to protect. In this work, we argue that this phenomenon is not paradoxical and that label-DP is designed to limit the advantage of an LIA adversary compared to predicting training labels using the Bayes classifier. At label-DP $ε=0$ this advantage is zero, hence the optimal attack is to predict according to the Bayes classifier and is independent of the training labels. Our bound shows the semantic protection conferred by label-DP and gives guidelines on how to choose $\varepsilon$ to limit the threat of LIAs below a certain level. Finally, we empirically demonstrate that our result closely captures the behavior of simulated attacks on both synthetic and real world datasets.
△ Less
Submitted 3 June, 2023; v1 submitted 25 February, 2022;
originally announced February 2022.
-
On Attribution of Deepfakes
Authors:
Baiwu Zhang,
Jin Peng Zhou,
Ilia Shumailov,
Nicolas Papernot
Abstract:
Progress in generative modelling, especially generative adversarial networks, have made it possible to efficiently synthesize and alter media at scale. Malicious individuals now rely on these machine-generated media, or deepfakes, to manipulate social discourse. In order to ensure media authenticity, existing research is focused on deepfake detection. Yet, the adversarial nature of frameworks used…
▽ More
Progress in generative modelling, especially generative adversarial networks, have made it possible to efficiently synthesize and alter media at scale. Malicious individuals now rely on these machine-generated media, or deepfakes, to manipulate social discourse. In order to ensure media authenticity, existing research is focused on deepfake detection. Yet, the adversarial nature of frameworks used for generative modeling suggests that progress towards detecting deepfakes will enable more realistic deepfake generation. Therefore, it comes at no surprise that developers of generative models are under the scrutiny of stakeholders dealing with misinformation campaigns. At the same time, generative models have a lot of positive applications. As such, there is a clear need to develop tools that ensure the transparent use of generative modeling, while minimizing the harm caused by malicious applications.
Our technique optimizes over the source of entropy of each generative model to probabilistically attribute a deepfake to one of the models. We evaluate our method on the seminal example of face synthesis, demonstrating that our approach achieves 97.62% attribution accuracy, and is less sensitive to perturbations and adversarial examples. We discuss the ethical implications of our work, identify where our technique can be used, and highlight that a more meaningful legislative framework is required for a more transparent and ethical use of generative modeling. Finally, we argue that model developers should be capable of claiming plausible deniability and propose a second framework to do so -- this allows a model developer to produce evidence that they did not produce media that they are being accused of having produced.
△ Less
Submitted 3 March, 2021; v1 submitted 20 August, 2020;
originally announced August 2020.
-
Noise Contrastive Estimation for Autoencoding-based One-Class Collaborative Filtering
Authors:
Jin Peng Zhou,
Ga Wu,
Zheda Mai,
Scott Sanner
Abstract:
One-class collaborative filtering (OC-CF) is a common class of recommendation problem where only the positive class is explicitly observed (e.g., purchases, clicks). Autoencoder based recommenders such as AutoRec and variants demonstrate strong performance on many OC-CF benchmarks, but also empirically suffer from a strong popularity bias. While a careful choice of negative samples in the OC-CF se…
▽ More
One-class collaborative filtering (OC-CF) is a common class of recommendation problem where only the positive class is explicitly observed (e.g., purchases, clicks). Autoencoder based recommenders such as AutoRec and variants demonstrate strong performance on many OC-CF benchmarks, but also empirically suffer from a strong popularity bias. While a careful choice of negative samples in the OC-CF setting can mitigate popularity bias, Negative Sampling (NS) is often better for training embeddings than for the end task itself. To address this, we propose a two-headed AutoRec to first train an embedding layer via one head using Negative Sampling then to train for the final task via the second head. While this NS-AutoRec improves results for AutoRec and outperforms many state-of-the-art baselines on OC-CF problems, we notice that Negative Sampling can still take a large amount of time to train. Since Negative Sampling is known to be a special case of Noise Contrastive Estimation (NCE), we adapt a recently proposed closed-form NCE solution for collaborative filtering to AutoRec yielding NCE-AutoRec. Overall, we show that our novel two-headed AutoRec models (NCE-AutoRec and NS-AutoRec) successfully mitigate the popularity bias issue and maintain competitive performance in comparison to state-of-the-art recommenders on multiple real-world datasets.
△ Less
Submitted 5 August, 2020; v1 submitted 3 August, 2020;
originally announced August 2020.