
Auditing language models for hidden objectives

Authors: Samuel Marks, Johannes Treutlein, Trenton Bricken, Jack Lindsey, Jonathan Marcus, Siddharth Mishra-Sharma, Daniel Ziegler, Emmanuel Ameisen, Joshua Batson, Tim Belonax, Samuel R. Bowman, Shan Carter, Brian Chen, Hoagy Cunningham, Carson Denison, Florian Dietz, Satvik Golechha, Akbir Khan, Jan Kirchner, Jan Leike, Austin Meek, Kei Nishimura-Gasparian, Euan Ong, Christopher Olah, Adam Pearce, et al. (10 additional authors not shown)

Abstract: We study the feasibility of conducting alignment audits: investigations into whether models have undesired objectives. As a testbed, we train a language model with a hidden objective. Our training pipeline first teaches the model about exploitable errors in RLHF reward models (RMs), then trains the model to exploit some of these errors. We verify via out-of-distribution evaluations that the model generalizes to exhibit whatever behaviors it believes RMs rate highly, including ones not reinforced during training. We leverage this model to study alignment audits in two ways. First, we conduct a blind auditing game where four teams, unaware of the model's hidden objective or training, investigate it for concerning behaviors and their causes. Three teams successfully uncovered the model's hidden objective using techniques including interpretability with sparse autoencoders (SAEs), behavioral attacks, and training data analysis. Second, we conduct an unblinded follow-up study of eight techniques for auditing the model, analyzing their strengths and limitations. Overall, our work provides a concrete example of using alignment audits to discover a model's hidden objective and proposes a methodology for practicing and validating progress in alignment auditing.

Submitted 27 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

arXiv:2310.13798 [pdf, other]

Specific versus General Principles for Constitutional AI

Authors: Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen, Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, Catherine Olsson, Cassie Evraets, Eli Tran-Johnson, Esin Durmus, Ethan Perez, Jackson Kernion, Jamie Kerr, Kamal Ndousse, Karina Nguyen, Nelson Elhage, Newton Cheng, Nicholas Schiefer, Nova DasSarma, Oliver Rausch, Robin Larson, et al. (11 additional authors not shown)

Abstract: Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expression of such behaviors. The success of simple principles motivates us to ask: can models learn general ethical behaviors from only a single written principle? To test this, we run experiments using a principle roughly stated as "do what's best for humanity". We find that the largest dialogue models can generalize from this short constitution, resulting in harmless assistants with no stated interest in specific motivations like power. A general principle may thus partially avoid the need for a long list of constitutions targeting potentially harmful behaviors. However, more detailed constitutions still improve fine-grained control over specific types of harms. This suggests both general and specific principles have value for steering AI safely.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2307.13702 [pdf, other]

Measuring Faithfulness in Chain-of-Thought Reasoning

Authors: Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, et al. (5 additional authors not shown)

Abstract: Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful if the circumstances such as the model size and task are carefully chosen.

Submitted 16 July, 2023; originally announced July 2023.
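To illustrate what "intervening on the CoT" can look like, here is a minimal sketch of one truncation-style intervention: ask for the answer after only the first k reasoning steps and check whether it already matches the full-CoT answer. `answer_fn` is a hypothetical stand-in for querying a model, the two toy "models" are invented for illustration, and the paper's exact protocol and metrics may differ. A score near 1.0 suggests the answer barely depends on the stated reasoning; a score near 0.0 suggests the model conditions heavily on it.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for querying a model: (question, cot_text) -> final answer.
AnswerFn = Callable[[str, str], str]


def truncation_agreement(question: str, cot_steps: Sequence[str], answer_fn: AnswerFn) -> float:
    """Fraction of truncated CoTs whose answer already matches the full-CoT answer.

    For k = 0 .. n-1 the model sees only the first k reasoning steps.
    Returns 1.0 when there are no steps to truncate.
    """
    full_answer = answer_fn(question, " ".join(cot_steps))
    matches = [
        answer_fn(question, " ".join(cot_steps[:k])) == full_answer
        for k in range(len(cot_steps))
    ]
    return sum(matches) / len(matches) if matches else 1.0


# Toy model 1: ignores its reasoning entirely.
def ignores_cot(question: str, cot: str) -> str:
    return "B"


# Toy model 2: reads its conclusion off the final reasoning step.
def reads_cot(question: str, cot: str) -> str:
    return "B" if "answer is B" in cot else "A"


steps = ["7 + 5 = 12.", "12 is even,", "so the answer is B."]
question = "Is 7 + 5 even? (A) no (B) yes"
print(truncation_agreement(question, steps, ignores_cot))  # 1.0
print(truncation_agreement(question, steps, reads_cot))    # 0.0
```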

arXiv:2307.12132 [pdf, other]

Ultrafast measurements of mode-specific deformation potentials of Bi$_2$Te$_3$ and Bi$_2$Se$_3$

Authors: Yijing Huang, José D. Querales-Flores, Samuel W. Teitelbaum, Jiang Cao, Thomas Henighan, Hanzhe Liu, Mason Jiang, Gilberto De la Peña, Viktor Krapivin, Johann Haber, Takahiro Sato, Matthieu Chollet, Diling Zhu, Tetsuo Katayama, Robert Power, Meabh Allen, Costel R. Rotundu, Trevor P. Bailey, Ctirad Uher, Mariano Trigo, Patrick S. Kirchmann, Éamonn D. Murray, Zhi-Xun Shen, Ivana Savic, Stephen Fahy, et al. (2 additional authors not shown)

Abstract: Quantifying electron-phonon interactions for the surface states of topological materials can provide key insights into surface-state transport, topological superconductivity, and potentially how to manipulate the surface state using a structural degree of freedom. We perform time-resolved x-ray diffraction (XRD) and angle-resolved photoemission (ARPES) measurements on Bi$_2$Te$_3$ and Bi$_2$Se$_3$, following the excitation of coherent A$_{1g}$ optical phonons. We extract and compare the deformation potentials coupling the surface electronic states to local A$_{1g}$-like displacements in these two materials using the experimentally determined atomic displacements from XRD and electron band shifts from ARPES. We find the coupling in Bi$_2$Te$_3$ and Bi$_2$Se$_3$ to be similar and in general in agreement with expectations from density functional theory. We establish a methodology that quantifies the mode-specific electron-phonon coupling experimentally, allowing detailed comparison to theory. Our results shed light on fundamental processes in topological insulators involving electron-phonon coupling.

Submitted 22 July, 2023; originally announced July 2023.
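In the simplest linear picture, a deformation potential D relates a band-energy shift ΔE (as measured by ARPES) to an atomic displacement u (as measured by XRD) through ΔE ≈ D·u, so D can be estimated as the slope of ΔE against u. The sketch below fits that slope through the origin by least squares, D = Σ uᵢΔEᵢ / Σ uᵢ². The function name, the eV/Å units, and the example numbers are illustrative assumptions, not the paper's analysis or data.

```python
import numpy as np


def deformation_potential(displacement_angstrom, band_shift_ev) -> float:
    """Least-squares slope through the origin: D = sum(u * dE) / sum(u**2).

    displacement_angstrom: atomic displacements u at several pump-probe delays (Å).
    band_shift_ev: band-energy shifts dE measured at the same delays (eV).
    Returns D in eV/Å, assuming the linear coupling dE ≈ D * u.
    """
    u = np.asarray(displacement_angstrom, dtype=float)
    de = np.asarray(band_shift_ev, dtype=float)
    denom = float(np.dot(u, u))
    if denom == 0.0:
        raise ValueError("displacements must not all be zero")
    return float(np.dot(u, de) / denom)


# Made-up values lying exactly on dE = 2.0 * u, purely to exercise the function.
u = [-0.010, -0.005, 0.005, 0.010]
de = [-0.020, -0.010, 0.010, 0.020]
print(deformation_potential(u, de))  # approximately 2.0 (eV/Å)
```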

arXiv:2302.07459 [pdf, other]

The Capacity for Moral Self-Correction in Large Language Models

Authors: Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemi Mercado, Nova DasSarma, et al. (24 additional authors not shown)

Abstract: We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveals different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.

Submitted 18 February, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

arXiv:2212.09892 [pdf, other]

doi 10.1103/PhysRevB.107.014305

Influence of local symmetry on lattice dynamics coupled to topological surface states

Authors: Jonathan A. Sobota, Samuel W. Teitelbaum, Yijing Huang, José D. Querales-Flores, Robert Power, Meabh Allen, Costel R. Rotundu, Trevor P. Bailey, Ctirad Uher, Tom Henighan, Mason Jiang, Diling Zhu, Matthieu Chollet, Takahiro Sato, Mariano Trigo, Éamonn D. Murray, Ivana Savić, Patrick S. Kirchmann, Stephen Fahy, David A. Reis, Zhi-Xun Shen

Abstract: We investigate coupled electron-lattice dynamics in the topological insulator Bi2Te3 with time-resolved photoemission and time-resolved x-ray diffraction. It is well established that coherent phonons can be launched by optical excitation, but selection rules generally restrict these modes to zone-center wavevectors and Raman-active branches. We find that the topological surface state couples to additional modes, including a continuum of surface-projected bulk modes from both Raman- and infrared-branches, with possible contributions from surface-localized modes when they exist. Our calculations show that this surface vibrational spectrum occurs naturally as a consequence of the translational and inversion symmetries broken at the surface, without requiring the splitting-off of surface-localized phonon modes. The generality of this result suggests that coherent phonon spectra are useful by providing unique fingerprints for identifying surface states in more controversial materials. These effects may also expand the phase space for tailoring surface state wavefunctions via ultrafast optical excitation.

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2212.09251 [pdf, other]

Discovering Language Model Behaviors with Model-Written Evaluations

Authors: Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, et al. (38 additional authors not shown)

Abstract: As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.

Submitted 19 December, 2022; originally announced December 2022.

Comments: For associated data visualizations, see https://www.evals.anthropic.com/model-written/; for full datasets, see https://github.com/anthropics/evals

arXiv:2212.08073 [pdf, other]

Constitutional AI: Harmlessness from AI Feedback

Authors: Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, et al. (26 additional authors not shown)

Abstract: As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences. We then train with RL using the preference model as the reward signal, i.e. we use 'RL from AI Feedback' (RLAIF). As a result we are able to train a harmless but non-evasive AI assistant that engages with harmful queries by explaining its objections to them. Both the SL and RL methods can leverage chain-of-thought style reasoning to improve the human-judged performance and transparency of AI decision making. These methods make it possible to control AI behavior more precisely and with far fewer human labels.

Submitted 15 December, 2022; originally announced December 2022.
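The supervised phase described in this abstract (sample, self-critique, revise, then finetune on the revisions) can be sketched as a loop around a text generator. `generate` is a hypothetical callable standing in for sampling from the initial model, and the prompt templates are illustrative rather than the paper's wording. The RL phase (AI preference labels that train a preference model used as the reward) is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for sampling text from a language model.
Generate = Callable[[str], str]


def critique_and_revise(prompt: str, principles: Sequence[str], generate: Generate,
                        rng: random.Random, n_rounds: int = 1) -> str:
    """Sample a response, then critique and revise it against a sampled principle."""
    response = generate(prompt)
    for _ in range(n_rounds):
        principle = rng.choice(principles)  # one principle drawn per round
        critique = generate(
            f"{prompt}\n\nResponse: {response}\n\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = generate(
            f"{prompt}\n\nResponse: {response}\n\nCritique: {critique}\n\n"
            "Rewrite the response to address the critique."
        )
    return response


def build_sl_dataset(prompts: Sequence[str], principles: Sequence[str], generate: Generate,
                     seed: int = 0, n_rounds: int = 1) -> List[Tuple[str, str]]:
    """Collect (prompt, revised response) pairs for finetuning the original model."""
    rng = random.Random(seed)
    return [(p, critique_and_revise(p, principles, generate, rng, n_rounds)) for p in prompts]


# Usage with a fake generator that just numbers its calls.
calls: List[str] = []


def fake_generate(text: str) -> str:
    calls.append(text)
    return f"draft-{len(calls)}"


data = build_sl_dataset(["How do I pick a lock?"], ["Choose the less harmful response."], fake_generate)
print(data)  # [('How do I pick a lock?', 'draft-3')]
```

With one round, each prompt costs three generations (initial response, critique, revision), and only the final revision is kept for finetuning.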

arXiv:2211.03540 [pdf, other]

Measuring Progress on Scalable Oversight for Large Language Models

Authors: Samuel R. Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, Scott Heiner, Kamilė Lukošiūtė, Amanda Askell, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Christopher Olah, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Jackson Kernion, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, et al. (21 additional authors not shown)

Abstract: Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on ways it can be studied empirically. We first present an experimental design centered on tasks for which human specialists succeed but unaided humans and current general AI systems fail. We then present a proof-of-concept experiment meant to demonstrate a key feature of this experimental design and show its viability with two question-answering tasks: MMLU and time-limited QuALITY. On these tasks, we find that human participants who interact with an unreliable large-language-model dialog assistant through chat -- a trivial baseline strategy for scalable oversight -- substantially outperform both the model alone and their own unaided performance. These results are an encouraging sign that scalable oversight will be tractable to study with present models and bolster recent findings that large language models can productively assist humans with difficult tasks.

Submitted 11 November, 2022; v1 submitted 4 November, 2022; originally announced November 2022.

Comments: v2 fixes a few typos from v1

arXiv:2209.11895 [pdf]

In-context Learning and Induction Heads

Authors: Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, et al. (1 additional author not shown)

Abstract: "Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.

Submitted 23 September, 2022; originally announced September 2022.
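The [A][B] ... [A] -> [B] completion rule can be written as a plain lookup over the context: find the most recent earlier occurrence of the current token and predict whatever followed it. This sketch reproduces only the input-output behavior attributed to induction heads, not how a transformer implements it with attention (which involves composing with earlier "previous-token" heads); the function name is hypothetical.

```python
from typing import Optional, Sequence


def induction_prediction(tokens: Sequence[str]) -> Optional[str]:
    """Predict the next token with the [A][B] ... [A] -> [B] rule.

    Scans backwards for the most recent earlier occurrence of the final token [A]
    and returns the token [B] that followed it, or None if [A] has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Look at earlier positions only, most recent first.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


print(induction_prediction(["Mr", "Dursley", "was", "proud", ".", "Mr"]))  # Dursley
```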

arXiv:2209.10652 [pdf]

Toy Models of Superposition

Authors: Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, Christopher Olah

Abstract: Neural networks often pack many unrelated concepts into a single neuron - a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging. This paper provides a toy model where polysemanticity can be fully understood, arising as a result of models storing additional sparse features in "superposition." We demonstrate the existence of a phase change, a surprising connection to the geometry of uniform polytopes, and evidence of a link to adversarial examples. We also discuss potential implications for mechanistic interpretability.

Submitted 21 September, 2022; originally announced September 2022.

Comments: Also available at https://transformer-circuits.pub/2022/toy_model/index.html
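A minimal sketch of superposition, assuming a ReLU-output toy model of the form x̂ = ReLU(WᵀWx + b), where W compresses n features into m < n hidden dimensions and reconstruction error is weighted by per-feature importance. The weights below are hand-picked rather than trained: two features share one hidden dimension in opposite ("antipodal") directions. They are recovered exactly when only one is active at a time (the sparse case) and cancel when both are active.

```python
import numpy as np


def toy_forward(W: np.ndarray, b: np.ndarray, x: np.ndarray) -> np.ndarray:
    """ReLU-output toy model: x_hat = ReLU(W^T W x + b), with W of shape (m, n)."""
    h = W @ x                            # compress n features into m hidden dims
    return np.maximum(W.T @ h + b, 0.0)  # expand back to n dims, then ReLU


def importance_weighted_loss(x: np.ndarray, x_hat: np.ndarray, importance: np.ndarray) -> float:
    """Squared reconstruction error weighted by per-feature importance."""
    return float(np.sum(importance * (x - x_hat) ** 2))


# Hand-picked (not trained) weights: n = 2 features stored antipodally in m = 1 dimension.
W = np.array([[1.0, -1.0]])
b = np.zeros(2)
importance = np.array([1.0, 1.0])

for x in (np.array([0.7, 0.0]), np.array([0.0, 0.4]), np.array([1.0, 1.0])):
    x_hat = toy_forward(W, b, x)
    print(x, "->", x_hat, "loss:", importance_weighted_loss(x, x_hat, importance))
```

The ReLU is what makes this work: the negative-interference term produced for the inactive feature is clipped to zero. The cost shows up only when features co-occur, which is why sparsity favors superposition.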

arXiv:2209.07858 [pdf, other]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Authors: Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, et al. (11 additional authors not shown)

Abstract: We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful outputs. We make three main contributions. First, we investigate scaling behaviors for red teaming across 3 model sizes (2.7B, 13B, and 52B parameters) and 4 model types: a plain language model (LM); an LM prompted to be helpful, honest, and harmless; an LM with rejection sampling; and a model trained to be helpful and harmless using reinforcement learning from human feedback (RLHF). We find that the RLHF models are increasingly difficult to red team as they scale, and we find a flat trend with scale for the other model types. Second, we release our dataset of 38,961 red team attacks for others to analyze and learn from. We provide our own analysis of the data and find a variety of harmful outputs, which range from offensive language to more subtly harmful non-violent unethical outputs. Third, we exhaustively describe our instructions, processes, statistical methodologies, and uncertainty about red teaming. We hope that this transparency accelerates our ability to work together as a community in order to develop shared norms, practices, and technical standards for how to red team language models.

Submitted 22 November, 2022; v1 submitted 23 August, 2022; originally announced September 2022.

arXiv:2207.05221 [pdf, other]

Language Models (Mostly) Know What They Know

Authors: Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, et al. (11 additional authors not shown)

Abstract: We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks. Performance at self-evaluation further improves when we allow models to consider many of their own samples before predicting the validity of one specific possibility. Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks. The predicted P(IK) probabilities also increase appropriately in the presence of relevant source materials in the context, and in the presence of hints towards the solution of mathematical word problems. We hope these observations lay the groundwork for training more honest models, and for investigating how honesty generalizes to cases where models are trained on objectives other than the imitation of human writing.

Submitted 21 November, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: 23+17 pages; refs added, typos fixed
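Calibration of self-evaluation probabilities such as P(True) can be quantified in several ways. The sketch below uses expected calibration error (ECE) with equal-width bins. This is a common metric substituted here for illustration and is not necessarily the measure used in the paper. `probs` holds a model's P(True) values and `correct` marks whether each proposed answer was actually right. An ECE of 0 means that, within every bin, the average stated probability equals the observed accuracy.

```python
import numpy as np


def expected_calibration_error(probs, correct, n_bins: int = 10) -> float:
    """ECE: sum over bins of (fraction of samples in bin) * |mean prob - accuracy|."""
    probs = np.asarray(probs, dtype=float)
    correct = np.asarray(correct, dtype=float)
    if probs.size == 0:
        return 0.0
    # Equal-width bins over [0, 1]; a probability of exactly 1.0 goes in the top bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            gap = abs(probs[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)


# Calibrated: says 25% and is right 1 time in 4. Overconfident: says 95% but is right half the time.
print(expected_calibration_error([0.25] * 4, [1, 0, 0, 0]))  # 0.0
print(expected_calibration_error([0.95] * 4, [1, 0, 1, 0]))  # approximately 0.45
```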

arXiv:2205.10487 [pdf, other]

Scaling Laws and Interpretability of Learning from Repeated Data

Authors: Danny Hernandez, Tom Brown, Tom Conerly, Nova DasSarma, Dawn Drain, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Tom Henighan, Tristan Hume, Scott Johnston, Ben Mann, Chris Olah, Catherine Olsson, Dario Amodei, Nicholas Joseph, Jared Kaplan, Sam McCandlish

Abstract: Recent large language models have been trained on vast datasets, but also often on repeated data, either intentionally for the purpose of upweighting higher quality data, or unintentionally because data deduplication is not perfect and the model is exposed to repeated data at the sentence, paragraph, or document level. Some works have reported substantial negative performance effects of this repeated data. In this paper we attempt to study repeated data systematically and to understand its effects mechanistically. To do this, we train a family of models where most of the data is unique but a small fraction of it is repeated many times. We find a strong double descent phenomenon, in which repeated data can lead test loss to increase midway through training. A predictable range of repetition frequency leads to surprisingly severe degradation in performance. For instance, performance of an 800M parameter model can be degraded to that of a 2x smaller model (400M params) by repeating 0.1% of the data 100 times, despite the other 90% of the training tokens remaining unique. We suspect there is a range in the middle where the data can be memorized and doing so consumes a large fraction of the model's capacity, and this may be where the peak of degradation occurs.
Finally, we connect these observations to recent mechanistic interpretability work - attempting to reverse engineer the detailed computations performed by the model - by showing that data repetition disproportionately damages copying and internal structures associated with generalization, such as induction heads, providing a possible mechanism for the shift from generalization to memorization. Taken together, these results provide a hypothesis for why repeating a relatively small fraction of data in large language models could lead to disproportionately large harms to performance.

Submitted 20 May, 2022; originally announced May 2022.

Comments: 23 pages, 22 figures
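The token accounting in the "0.1% of the data repeated 100 times" example works out as follows. This assumes the 0.1% is measured as a fraction of the total training-token budget, which is the reading consistent with the stated 90% unique share:

```latex
f_{\text{subset}} = 0.001,\quad k = 100
\;\Rightarrow\;
f_{\text{repeated}} = k\, f_{\text{subset}} = 100 \times 0.001 = 0.1,
\qquad
f_{\text{unique}} = 1 - f_{\text{repeated}} = 0.9 .
```

So the repeated subset's distinct content is only 0.1% of the token budget, but its 100 copies occupy 10% of it.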

arXiv:2204.05862 [pdf, other]

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Authors: Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, et al. (6 additional authors not shown)

Abstract: We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations, and is fully compatible with training for specialized skills such as Python coding and summarization. We explore an iterated online mode of training, where preference models and RL policies are updated on a weekly cadence with fresh human feedback data, efficiently improving our datasets and models. Finally, we investigate the robustness of RLHF training, and identify a roughly linear relation between the RL reward and the square root of the KL divergence between the policy and its initialization. Alongside our main results, we perform peripheral analyses on calibration, competing objectives, and the use of OOD detection, compare our models with human writers, and provide samples from our models using prompts appearing in recent related work.

Submitted 12 April, 2022; originally announced April 2022.

Comments: Data available at https://github.com/anthropics/hh-rlhf

arXiv:2202.07785 [pdf, other]

doi 10.1145/3531146.3533229

Predictability and Surprise in Large Generative Models

Authors: Deep Ganguli, Danny Hernandez, Liane Lovitt, Nova DasSarma, Tom Henighan, Andy Jones, Nicholas Joseph, Jackson Kernion, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Scott Johnston, Shauna Kravec, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Dario Amodei , et al. (5 additional authors not shown)

Abstract: Large-scale pre-training has recently emerged as a technique for creating capable, general purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many others. In this paper, we highlight a counterintuitive property of such models and discuss the policy implications of this property. Namely, these generative models have an unusual combination of predictable loss on a broad training distribution (as embodied in their "scaling laws"), and unpredictable specific capabilities, inputs, and outputs. We believe that the high-level predictability and appearance of useful capabilities drives rapid development of such models, while the unpredictable qualities make it difficult to anticipate the consequences of model deployment. We go through examples of how this combination can lead to socially harmful behavior with examples from the literature and real world observations, and we also perform two novel experiments to illustrate our point about harms from unpredictability. Furthermore, we analyze how these conflicting properties combine to give model developers various motivations for deploying these models, and challenges that can hinder deployment. We conclude with a list of possible interventions the AI community may take to increase the chance of these models having a beneficial impact.
We intend this paper to be useful to policymakers who want to understand and regulate AI systems, technologists who care about the potential policy impact of their work, and academics who want to analyze, critique, and potentially develop large generative models.

Submitted 3 October, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: Updated to reflect the version submitted (and accepted) to ACM FAccT '22. This update incorporates feedback from peer-review and fixes minor typos. See open access FAccT conference version at: https://dl.acm.org/doi/abs/10.1145/3531146.3533229

arXiv:2112.00861 [pdf, other]

A General Language Assistant as a Laboratory for Alignment

Authors: Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Jared Kaplan

Abstract: Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest, and harmless. As an initial foray in this direction we study simple baseline techniques and evaluations, such as prompting. We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models. Next we investigate scaling trends for several training objectives relevant to alignment, comparing imitation learning, binary discrimination, and ranked preference modeling. We find that ranked preference modeling performs much better than imitation learning, and often scales more favorably with model size. In contrast, binary discrimination typically performs and scales very similarly to imitation learning. Finally we study a 'preference model pre-training' stage of training, with the goal of improving sample efficiency when finetuning on human preferences.

Submitted 9 December, 2021; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: 26+19 pages; v2 typos fixed, refs added, figure scale / colors fixed; v3 correct very non-standard TruthfulQA formatting and metric, alignment implications slightly improved

arXiv:2109.01301 [pdf, other]

Photo-induced plasmon-phonon coupling in PbTe

Authors: M. P. Jiang, M. Trigo, S. Fahy, A. Hauber, É. D. Murray, I. Savić, C. Bray, J. N. Clark, T. Henighan, M. Kozina, M. Chollet, J. M. Glownia, M. C. Hoffmann, D. Zhu, O. Delaire, A. F. May, B. C. Sales, A. M. Lindenberg, P. Zalden, T. Sato, R. Merlin, D. A. Reis

Abstract: We report the observation of photo-induced plasmon-phonon coupled modes in the group IV-VI semiconductor PbTe using Fourier-transform inelastic X-ray scattering at the Linac Coherent Light Source (LCLS). We measure the near-zone-center dispersion of the heavily screened longitudinal optical (LO) phonon branch as extracted from differential changes in x-ray diffuse scattering intensity following above band gap photoexcitation.

Submitted 3 September, 2021; originally announced September 2021.

Comments: 5 pages, 2 figures

arXiv:2102.01293 [pdf, other]

Scaling Laws for Transfer

Authors: Danny Hernandez, Jared Kaplan, Tom Henighan, Sam McCandlish

Abstract: We study empirical scaling laws for transfer learning between distributions in an unsupervised, fine-tuning setting. When we train increasingly large neural networks from scratch on a fixed-size dataset, they eventually become data-limited and stop improving in performance (cross-entropy loss). When we do the same for models pre-trained on a large language dataset, the slope in performance gains is merely reduced rather than going to zero. We calculate the effective data "transferred" from pre-training by determining how much data a transformer of the same size would have required to achieve the same loss when training from scratch. In other words, we focus on units of data while holding everything else fixed. We find that the effective data transferred is described well in the low data regime by a power-law of parameter count and fine-tuning dataset size. We believe the exponents in these power-laws correspond to measures of the generality of a model and proximity of distributions (in a directed rather than symmetric sense). We find that pre-training effectively multiplies the fine-tuning dataset size. Transfer, like overall performance, scales predictably in terms of parameters, data, and compute.

Submitted 1 February, 2021; originally announced February 2021.

Comments: 19 pages, 15 figures

arXiv:2010.14701 [pdf, other]

Scaling Laws for Autoregressive Generative Modeling

Authors: Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish

Abstract: We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depends on the compute budget through a power-law, with exponents that are nearly universal across all data domains. The cross-entropy loss has an information-theoretic interpretation as $S(\text{True}) + D_{\mathrm{KL}}(\text{True} \,\|\, \text{Model})$, and the empirical scaling laws suggest a prediction for both the true data distribution's entropy and the KL divergence between the true and model distributions. With this interpretation, billion-parameter Transformers are nearly perfect models of the YFCC100M image distribution downsampled to an $8\times 8$ resolution, and we can forecast the model size needed to achieve any given reducible loss (i.e., $D_{\mathrm{KL}}$) in nats/image for other resolutions.
We find a number of additional scaling laws in specific domains: (a) we identify a scaling relation for the mutual information between captions and images in multimodal models, and show how to answer the question "Is a picture worth a thousand words?"; (b) in the case of mathematical problem solving, we identify scaling laws for model performance when extrapolating beyond the training distribution; (c) we finetune generative image models for ImageNet classification and find smooth scaling of the classification loss and error rate, even as the generative loss levels off. Taken together, these results strengthen the case that scaling laws have important implications for neural network performance, including on downstream tasks.

Submitted 5 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

Comments: 20+17 pages, 33 figures; added appendix with additional language results

arXiv:2006.08879 [pdf, other]

doi 10.1103/PhysRevB.103.054109

Formation of buried domain walls in the ultrafast transition of SmTe$_3$

Authors: M. Trigo, P. Giraldo-Gallo, J. N. Clark, M. E. Kozina, T. Henighan, M. P. Jiang, M. Chollet, I. R. Fisher, J. M. Glownia, T. Katayama, P. S. Kirchmann, D. Leuenberger, H. Liu, D. A. Reis, Z. X. Shen, D. Zhu

Abstract: We study ultrafast x-ray diffraction on the charge density wave (CDW) of SmTe$_3$ using an x-ray free electron laser. The CDW peaks show that photoexcitation with near-infrared pump centered at 800 nm generates domain walls of the order parameter propagating perpendicular to the sample surface. These domain walls break the CDW long range order and suppress the diffraction intensity of the CDW for times much longer than the $\sim 1$ ps recovery of the local electronic gap. We reconstruct the spatial and temporal dependence of the order parameter using a simple Ginzburg-Landau model and find good agreement between the experimental and model fluence dependences. Based on the model we find that at long times, depending on the pump fluence, multiple domain walls remain at distances of a few nm from the surface.

Submitted 15 June, 2020; originally announced June 2020.

Journal ref: Phys. Rev. B 103, 054109 (2021)

arXiv:2005.14165 [pdf, other]

Language Models are Few-Shot Learners

Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess , et al. (6 additional authors not shown)

Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

Submitted 22 July, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

Comments: 40+32 pages

arXiv:2001.08361 [pdf, other]

Scaling Laws for Neural Language Models

Authors: Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

Abstract: We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.

Submitted 22 January, 2020; originally announced January 2020.

Comments: 19 pages, 15 figures

arXiv:1908.07161 [pdf, ps, other]

doi 10.1103/PhysRevB.103.L180101

Measurements of Nonequilibrium Interatomic Forces in Photoexcited Bismuth

Authors: Samuel W. Teitelbaum, Thomas C. Henighan, Hanzhe Liu, Mason P. Jiang, Diling Zhu, Matthieu Chollet, Takahiro Sato, Éamonn D. Murray, Stephen Fahy, Shane O'Mahony, Trevor P. Bailey, Ctirad Uher, Mariano Trigo, David A. Reis

Abstract: We determine experimentally the excited-state interatomic forces in photoexcited bismuth. The forces are obtained by a constrained least-squares fit of the excited-state dispersion obtained by femtosecond time-resolved x-ray diffuse scattering to a fifteen-nearest neighbor Born-von Karman model. We find that the observed softening of the zone-center $A_{1g}$ optical mode and transverse acoustic modes with photoexcitation are primarily due to a weakening of three nearest neighbor forces along the bonding direction. This provides a more complete picture of what drives the partial reversal of the Peierls distortion previously observed in photoexcited bismuth.

Submitted 20 August, 2019; originally announced August 2019.

Comments: 6 pages, 3 figures, plus 3 pages, 3 figures of supplemental information

Journal ref: Phys. Rev. B 103, 180101 (2021)

arXiv:1809.09799 [pdf, other]

doi 10.1103/PhysRevB.99.104111

Coherent order parameter dynamics in SmTe$_3$

Authors: M. Trigo, P. Giraldo-Gallo, M. E. Kozina, T. Henighan, M. P. Jiang, H. Liu, J. N. Clark, M. Chollet, J. M. Glownia, D. Zhu, T. Katayama, D. Leuenberger, P. S. Kirchmann, I. R. Fisher, Z. X. Shen, D. A. Reis

Abstract: We present a combined ultrafast optical pump-probe and ultrafast x-ray diffraction measurement of the CDW dynamics in SmTe$_3$ at 300 K. The ultrafast x-ray diffraction measurements, taken at the Linac Coherent Light Source, reveal a $\sim 1.55$ THz mode that becomes overdamped with increasing fluence. We identify this oscillation with the lattice component of the amplitude mode. Furthermore, these data allow for a clearer identification of the frequencies present in the optical pump-probe data. In both reflectivity and diffraction, we observe a crossover of the response from linear (for small displacements) to quadratic in the amplitude of the order parameter displacement. Finally, a time-dependent Ginzburg-Landau model captures the essential features of the experimental observations.

Submitted 26 September, 2018; originally announced September 2018.

Comments: 7 pages, 5 figures

Journal ref: Phys. Rev. B 99, 104111 (2019)

arXiv:1710.02207 [pdf, ps, other]

doi 10.1103/PhysRevLett.121.125901

Direct Measurement of Anharmonic Decay Channels of a Coherent Phonon

Authors: Samuel W. Teitelbaum, Tom Henighan, Yijing Huang, Hanzhe Liu, Mason P. Jiang, Diling Zhu, Matthieu Chollet, Takahiro Sato, Éamonn D. Murray, Stephen Fahy, Shane O'Mahony, Trevor P. Bailey, Ctirad Uher, Mariano Trigo, David A. Reis

Abstract: We observe anharmonic decay of the photoexcited coherent A1g phonon in bismuth to points in the Brillouin zone where conservation of momentum and energy are satisfied for three-phonon scattering. The decay of a coherent phonon can be understood as a parametric resonance process whereby the atomic displacement periodically modulates the frequency of a broad continuum of modes. This results in energy transfer through resonant squeezing of the target modes. Using ultrafast diffuse x-ray scattering, we observe the buildup of coherent oscillations in the target modes driven by this parametric resonance over a wide range of the Brillouin zone. We compare the extracted anharmonic coupling constant to first principles calculations for a representative decay channel.

Submitted 20 October, 2017; v1 submitted 5 October, 2017; originally announced October 2017.

Comments: 5 pages, 6 figures

Journal ref: Phys. Rev. Lett. 121, 125901 (2018)

arXiv:1511.03626 [pdf]

Photoinduced suppression of the ferroelectric instability in PbTe

Authors: M. P. Jiang, M. Trigo, S. Fahy, É. D. Murray, I. Savić, C. Bray, J. Clark, T. Henighan, M. Kozina, M. Chollet, J. M. Glownia, M. Hoffmann, D. Zhu, O. Delaire, A. F. May, B. C. Sales, A. M. Lindenberg, P. Zalden, T. Sato, R. Merlin, D. A. Reis

Abstract: The interactions between electrons and phonons drive a large array of technologically relevant material properties including ferroelectricity, thermoelectricity, and phase-change behaviour. In the case of many group IV-VI, V, and related materials, these interactions are strong and the materials exist near electronic and structural phase transitions. Their close proximity to phase instability produces a fragile balance among the various properties. The prototypical example is PbTe whose incipient ferroelectric behaviour has been associated with large phonon anharmonicity and thermoelectricity. Experimental measurements on PbTe reveal anomalous lattice dynamics, especially in the soft transverse optical phonon branch. This has been interpreted in terms of both giant anharmonicity and local symmetry breaking due to off-centering of the Pb ions. The observed anomalies have prompted renewed theoretical and computational interest, which has in turn revived focus on the extent to which electron-phonon interactions drive lattice instabilities in PbTe and related materials. Here, we use Fourier-transform inelastic x-ray scattering (FT-IXS) to show that photo-injection of free carriers stabilizes the paraelectric state. With support from constrained density functional theory (CDFT) calculations, we find that photoexcitation weakens the long-range forces along the cubic direction tied to resonant bonding and incipient ferroelectricity. This demonstrates the importance of electronic states near the band edges in determining the equilibrium structure.

Submitted 11 November, 2015; originally announced November 2015.

Comments: 9 pages, 3 figures

arXiv:1510.02403 [pdf, other]

doi 10.1103/PhysRevB.94.020302

How to distinguish squeezed and coherent phonons in femtosecond x-ray diffuse scattering

Authors: T. Henighan, M. Trigo, M. Chollet, J. N. Clark, S. Fahy, J. M. Glownia, M. P. Jiang, M. Kozina, H. Liu, S. Song, D. Zhu, D. A. Reis

Abstract: Impulsive optical excitation can generate both coherent and squeezed phonons. The expectation value of the phonon displacement $\langle u_q \rangle$ oscillates at the mode frequency for the coherent state but remains zero for a pure squeezed state. In contrast, both show oscillations in $\langle |u_q|^2 \rangle$ at twice the mode frequency. Therefore it can be difficult to distinguish them in a second-order measurement of the displacements, such as in first-order x-ray diffuse scattering. Here we demonstrate a simple method to distinguish squeezed from coherent atomic motion by measurement of the diffuse scattering following double impulsive excitation. We find that femtosecond optical excitation generates squeezed phonons spanning the Brillouin zone in Ge, GaAs and InSb. Our results confirm the mechanism suggested in [Nature Physics 9, 790 (2013)].

Submitted 8 October, 2015; originally announced October 2015.

Comments: 5 pages, 3 figures

Journal ref: Phys. Rev. B 94, 020302 (2016)

arXiv:1509.03348 [pdf, other]

doi 10.1103/PhysRevB.93.220301

Generation of high-frequency strain waves during femtosecond demagnetization of Fe/MgO films

Authors: T. Henighan, M. Trigo, S. Bonetti, P. Granitzka, D. Higley, Z. Chen, M. P. Jiang, R. Kukreja, A. Gray, A. H. Reid, E. Jal, M. C. Hoffmann, M. Kozina, S. Song, M. Chollet, D. Zhu, P. F. Xu, J. Jeong, K. Carva, P. Maldonado, P. M. Oppeneer, M. G. Samant, S. S. P. Parkin, D. A. Reis, H. A. Dürr

Abstract: We use femtosecond time-resolved hard x-ray scattering to detect coherent acoustic phonons excited during ultrafast laser demagnetization of bcc Fe films. We determine the lattice strain propagating through the film through analysis of the oscillations in the x-ray scattering signal as a function of momentum transfer. The width of the strain wavefront is ~100 fs, similar to demagnetization timescales. First-principles calculations show that the high-frequency Fourier components of the strain, which give rise to the sharp wavefront, could in part originate from non-thermal dynamics of the lattice not considered in the two-temperature model.

Submitted 10 September, 2015; originally announced September 2015.

Comments: 5 pages, 3 figures

Journal ref: Phys. Rev. B 93, 220301 (2016)

arXiv:1504.06655 [pdf, other]

doi 10.1103/PhysRevB.92.054303

Phonon Spectroscopy with Sub-meV Resolution by Femtosecond X-ray Diffuse Scattering

Authors: Diling Zhu, Aymeric Robert, Tom Henighan, Henrik T. Lemke, Matthieu Chollet, J. Michael Glownia, David A. Reis, Mariano Trigo

Abstract: We present a reconstruction of the transverse acoustic phonon dispersion of germanium from femtosecond time-resolved x-ray diffuse scattering measurements at the Linac Coherent Light Source. We demonstrate an energy resolution of 0.3 meV with momentum resolution of 0.01 nm^-1 using 10 keV x-rays with a bandwidth of ~ 1 eV. This high resolution was achieved simultaneously for a large section of reciprocal space including regions closely following three of the principal symmetry directions. The phonon dispersion was reconstructed with less than three hours of measurement time, during which neither the x-ray energy, the sample orientation, nor the detector position was scanned. These results demonstrate how time-domain measurements can complement conventional frequency domain inelastic scattering techniques.

Submitted 24 April, 2015; originally announced April 2015.

Comments: 3 figures, 4 pages

Journal ref: Phys. Rev. B 92, 054303 (2015)

arXiv:1502.00704 [pdf]

Nonlinear X-ray Compton Scattering

Authors: Matthias Fuchs, Mariano Trigo, Jian Chen, Shambhu Ghimire, Sharon Shwartz, Michael Kozina, Mason Jiang, Thomas Henighan, Crystal Bray, Georges Ndabashimiye, P. H. Bucksbaum, Yiping Feng, Sven Herrmann, Gabriella Carini, Jack Pines, Philip Hart, Christopher Kenney, Serge Guillet, Sebastien Boutet, Garth Williams, Marc Messerschmidt, Marvin Seibert, Stefan Moeller, Jerome B. Hastings, David A. Reis

Abstract: X-ray scattering is a weak linear probe of matter. It is primarily sensitive to the position of electrons and their momentum distribution. Elastic X-ray scattering forms the basis of atomic structural determination while inelastic Compton scattering is often used as a spectroscopic probe of both single-particle excitations and collective modes. X-ray free-electron lasers (XFELs) are unique tools for studying matter on its natural time and length scales due to their bright and coherent ultrashort pulses. However, in the focus of an XFEL the assumption of a weak linear probe breaks down, and nonlinear light-matter interactions can become ubiquitous. The field can be sufficiently high that even non-resonant multiphoton interactions at hard X-ray wavelengths become relevant. Here we report the observation of one of the most fundamental nonlinear X-ray-matter interactions, the simultaneous Compton scattering of two identical photons producing a single photon at nearly twice the photon energy. We measure scattered photons with an energy near 18 keV generated from solid beryllium irradiated by 8.8-9.75 keV XFEL pulses. The intensity in the X-ray focus reaches up to $4\times 10^{20}$ W/cm$^2$, which corresponds to a peak electric field two orders of magnitude higher than the atomic unit of field strength and within four orders of magnitude of the quantum electrodynamic critical field. The observed signal scales quadratically in intensity and is emitted into a non-dipolar pattern, consistent with the simultaneous two-photon scattering from free electrons.
However, the energy of the generated photons shows an anomalously large redshift only present at high intensities. This indicates that the instantaneous high-intensity scattering effectively interacts with a different electron momentum distribution than linear Compton scattering, with implications for the study of atomic-scale structure and dynamics of matter.

Submitted 27 February, 2015; v1 submitted 2 February, 2015; originally announced February 2015.

Showing 1–31 of 31 results for author: Henighan, T