Showing 1–19 of 19 results for author: Chi, A

  1. arXiv:2506.19823  [pdf, ps, other]

    cs.LG cs.AI

    Persona Features Control Emergent Misalignment

    Authors: Miles Wang, Tom Dupré la Tour, Olivia Watkins, Alex Makelov, Ryan A. Chi, Samuel Miserendino, Johannes Heidecke, Tejal Patwardhan, Dan Mossing

    Abstract: Understanding how language models generalize behaviors from their training to a broader deployment distribution is an important problem in AI safety. Betley et al. discovered that fine-tuning GPT-4o on intentionally insecure code causes "emergent misalignment," where models give stereotypically malicious responses to unrelated prompts. We extend this work, demonstrating emergent misalignment acros…

    Submitted 24 June, 2025; originally announced June 2025.

    ACM Class: I.2.6; I.2.7

  2. arXiv:2406.17038  [pdf, other]

    cs.CL

    modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models

    Authors: Nathan A. Chi, Teodor Malchev, Riley Kong, Ryan A. Chi, Lucas Huang, Ethan A. Chi, R. Thomas McCoy, Dragomir Radev

    Abstract: We introduce modeLing, a novel benchmark of Linguistics Olympiad-style puzzles which tests few-shot reasoning in AI systems. Solving these puzzles necessitates inferring aspects of a language's grammatical structure from a small number of examples. Such puzzles provide a natural testbed for language models, as they require compositional generalization and few-shot inductive reasoning. Consisting s…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2406.06196  [pdf, other]

    cs.CL

    LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

    Authors: Andrew M. Bean, Simi Hellsten, Harry Mayne, Jabez Magomere, Ethan A. Chi, Ryan Chi, Scott A. Hale, Hannah Rose Kirk

    Abstract: In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models. Using challenging Linguistic Olympiad puzzles, we evaluate (i) capabilities for in-context identification and generalisation of linguistic patterns in very low-resource or extinct languages, and (ii) abilities to follow complex task instructions. The LingOly benchmark cover…

    Submitted 31 October, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Oral presentation at NeurIPS 2024 Datasets and Benchmarks Track. 10 pages, 5 figures, 22 pages supplemental materials

  4. arXiv:2402.08939  [pdf, other]

    cs.AI cs.CL

    Premise Order Matters in Reasoning with Large Language Models

    Authors: Xinyun Chen, Ryan A. Chi, Xuezhi Wang, Denny Zhou

    Abstract: Large language models (LLMs) have accomplished remarkable reasoning performance in various domains. However, in the domain of reasoning tasks, we discover a frailty: LLMs are surprisingly brittle to the ordering of the premises, despite the fact that such ordering does not alter the underlying task. In particular, we observe that LLMs achieve the best performance when the premise order aligns with…

    Submitted 28 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Published at ICML 2024. Xinyun and Ryan contribute equally

  5. arXiv:2310.15773  [pdf, other]

    cs.CL

    BLESS: Benchmarking Large Language Models on Sentence Simplification

    Authors: Tannon Kew, Alison Chi, Laura Vásquez-Rodríguez, Sweta Agrawal, Dennis Aumiller, Fernando Alva-Manchego, Matthew Shardlow

    Abstract: We present BLESS, a comprehensive performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS). We examine how well off-the-shelf LLMs can solve this challenging task, assessing a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news,…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted to EMNLP 2023 as a main long paper. 9 pages, 7 figures

  6. Learning to Paraphrase Sentences to Different Complexity Levels

    Authors: Alison Chi, Li-Kuang Chen, Yi-Chen Chang, Shu-Hui Lee, Jason S. Chang

    Abstract: While sentence simplification is an active research topic in NLP, its adjacent tasks of sentence complexification and same-level paraphrasing are not. To train models on all three tasks, we present two new unsupervised datasets. We compare these datasets, one labeled by a weak classifier and the other by a rule-based approach, with a single supervised dataset. Using these three datasets for traini…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: This arXiv version is a pre-MIT Press publication version; this paper has been accepted by TACL. 22 pages, 3 figures, 13 tables

  7. arXiv:2304.13671  [pdf, other]

    math.OC cs.AI

    Multiobjective Logistics Optimization for Automated ATM Cash Replenishment Process

    Authors: Bui Tien Thanh, Dinh Van Tuan, Tuan Anh Chi, Nguyen Van Dai, Nguyen Tai Quang Dinh, Nguyen Thu Thuy, Nguyen Thi Xuan Hoa

    Abstract: In the digital transformation era, integrating digital technology into every aspect of banking operations improves process automation, cost efficiency, and service level improvement. Although logistics for ATM cash is a crucial task that impacts operating costs and consumer satisfaction, there has been little effort to enhance it. Specifically, in Vietnam, with a market of more than 20,000 ATMs na…

    Submitted 22 July, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

  8. arXiv:2210.06340  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors

    Authors: Vignav Ramesh, Nathan Andrew Chi, Pranav Rajpurkar

    Abstract: Current deep learning models trained to generate radiology reports from chest radiographs are capable of producing clinically accurate, clear, and actionable text that can advance patient care. However, such systems all succumb to the same problem: making hallucinated references to non-existent prior reports. Such hallucinations occur because these models are trained on datasets of real-world pati…

    Submitted 13 October, 2022; v1 submitted 26 September, 2022; originally announced October 2022.

    Comments: 13 pages, 1 figure, 11 tables

  9. arXiv:2207.12021  [pdf, other]

    cs.CL

    Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent

    Authors: Ethan A. Chi, Ashwin Paranjape, Abigail See, Caleb Chiam, Trenton Chang, Kathleen Kenealy, Swee Kiat Lim, Amelia Hardy, Chetanya Rastogi, Haojun Li, Alexander Iyabor, Yutong He, Hari Sowrirajan, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Jillian Tang, Avanika Narayan, Giovanni Campagna, Christopher D. Manning

    Abstract: We present Chirpy Cardinal, an open-domain social chatbot. Aiming to be both informative and conversational, our bot chats with users in an authentic, emotionally intelligent way. By integrating controlled neural generation with scaffolded, hand-written dialogue, we let both the user and bot take turns driving the conversation, producing an engaging and socially fluent experience. Deployed in the…

    Submitted 16 January, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: SIGDIAL '22

  10. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  11. Wrist-Squeezing Force Feedback Improves Accuracy and Speed in Robotic Surgery Training

    Authors: Sergio Machaca, Eric Cao, Amy Chi, Gina Adrales, Katherine J Kuchenbecker, Jeremy D Brown

    Abstract: Current robotic minimally invasive surgery (RMIS) platforms provide surgeons with no haptic feedback of the robot's physical interactions. This limitation forces surgeons to rely heavily on visual feedback and can make it challenging for surgical trainees to manipulate tissue gently. Prior research has demonstrated that haptic feedback can increase task accuracy in RMIS training. However, it remai…

    Submitted 31 March, 2023; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: 6 figures, 8 pages

    Journal ref: 2022 9th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics, Seoul, Republic of Korea, pp. 1-8

  12. arXiv:2201.00927  [pdf]

    cs.SD cs.LG eess.AS

    Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach

    Authors: Nathan A. Chi, Peter Washington, Aaron Kline, Arman Husic, Cathy Hou, Chloe He, Kaitlyn Dunlap, Dennis Wall

    Abstract: Autism spectrum disorder (ASD) is a neurodevelopmental disorder which results in altered behavior, social development, and communication patterns. In past years, autism prevalence has tripled, with 1 in 54 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process, significant attention has been given to developing systems that automatically screen for autism. Pr…

    Submitted 3 January, 2022; originally announced January 2022.

    Comments: 17 pages, 4 figures, submitted to JMIR Pediatrics and Parenting

  13. arXiv:2107.11174  [pdf]

    cs.RO cs.HC

    Preliminary investigation into how limb choice affects kinesthetic perception

    Authors: Mohit Singhala, Amy Chi, Maria Coleman, Jeremy D. Brown

    Abstract: We have a limited understanding of how we integrate haptic information in real-time from our upper limbs to perform complex bimanual tasks, an ability that humans routinely employ to perform tasks of varying levels of difficulty. In order to understand how information from both limbs is used to create a unified percept, it is important to study both the limbs separately first. Prevalent theories h…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: Accepted as Works-in-Progress paper to World Haptics 2019

  14. arXiv:2101.11043  [pdf, other]

    cs.CL

    Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT

    Authors: Isabel Papadimitriou, Ethan A. Chi, Richard Futrell, Kyle Mahowald

    Abstract: We investigate how Multilingual BERT (mBERT) encodes grammar by examining how the high-order grammatical feature of morphosyntactic alignment (how different languages define what counts as a "subject") is manifested across the embedding spaces of different languages. To understand if and how morphosyntactic alignment affects contextual embedding spaces, we train classifiers to recover the subjecth…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: EACL 2021

  15. arXiv:2010.14233  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

    Authors: Ethan A. Chi, Julian Salazar, Katrin Kirchhoff

    Abstract: Non-autoregressive models greatly improve decoding speed over typical sequence-to-sequence models, but suffer from degraded performance. Infilling and iterative refinement models make up some of this gap by editing the outputs of a non-autoregressive model, but are constrained in the edits that they can make. We propose iterative realignment, where refinements occur over latent alignments rather t…

    Submitted 24 October, 2020; originally announced October 2020.

    ACM Class: I.2.7

  16. arXiv:2005.04511  [pdf, other]

    cs.CL cs.LG

    Finding Universal Grammatical Relations in Multilingual BERT

    Authors: Ethan A. Chi, John Hewitt, Christopher D. Manning

    Abstract: Recent work has found evidence that Multilingual BERT (mBERT), a transformer-based multilingual masked language model, is capable of zero-shot cross-lingual transfer, suggesting that some aspects of its representations are shared cross-lingually. To better understand this overlap, we extend recent work on finding syntactic trees in neural networks' internal representations to the multilingual sett…

    Submitted 20 May, 2020; v1 submitted 9 May, 2020; originally announced May 2020.

    Comments: To appear in ACL 2020; Farsi typo corrected

    ACM Class: I.2.7

  17. arXiv:1912.07800  [pdf, other]

    cs.LG stat.ML

    SGVAE: Sequential Graph Variational Autoencoder

    Authors: Bowen Jing, Ethan A. Chi, Jillian Tang

    Abstract: Generative models of graphs are well-known, but many existing models are limited in scalability and expressivity. We present a novel sequential graphical variational autoencoder operating directly on graphical representations of data. In our model, the encoding and decoding of a graph is framed as a sequential deconstruction and construction process, respectively, enabling the learning of a…

    Submitted 16 December, 2019; originally announced December 2019.

  18. arXiv:1805.11544  [pdf, other]

    cs.CR

    Limitless HTTP in an HTTPS World: Inferring the Semantics of the HTTPS Protocol without Decryption

    Authors: Blake Anderson, Andrew Chi, Scott Dunlop, David McGrew

    Abstract: We present new analytic techniques for inferring HTTP semantics from passive observations of HTTPS that can infer the value of important fields including the status-code, Content-Type, and Server, and the presence or absence of several additional HTTP header fields, e.g., Cookie and Referer. Our goals are twofold: to better understand the limitations of the confidentiality of HTTPS, and to explore…

    Submitted 29 May, 2018; originally announced May 2018.

  19. arXiv:1603.04085  [pdf, other]

    cs.CR

    Server-side verification of client behavior in cryptographic protocols

    Authors: Andrew Chi, Robert Cochran, Marie Nesfield, Michael K. Reiter, Cynthia Sturton

    Abstract: Numerous exploits of client-server protocols and applications involve modifying clients to behave in ways that untampered clients would not, such as crafting malicious packets. In this paper, we demonstrate practical verification of a cryptographic protocol client's messaging behavior as being consistent with the client program it is believed to be running. Moreover, we accomplish this without mod…

    Submitted 13 March, 2016; originally announced March 2016.