Showing 1–50 of 50 results for author: Murray, K

Searching in archive cs.
  1. arXiv:2507.04517  [pdf, ps, other]

    cs.LG cs.CL

    DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging

    Authors: Neha Verma, Kenton Murray, Kevin Duh

    Abstract: Model compression offers a promising path to reducing the cost and inaccessibility of large pre-trained models, without significantly compromising their impressive performance. Large Transformer models, including large language models (LLMs), often contain computational redundancy, which can serve as a target for new model compression methods. In this work, we specifically target neuron-level redu…

    Submitted 6 July, 2025; originally announced July 2025.

  2. arXiv:2506.22724  [pdf, ps, other]

    cs.CL

    The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure

    Authors: Niyati Bafna, Tianjian Li, Kenton Murray, David R. Mortensen, David Yarowsky, Hale Sirin, Daniel Khashabi

    Abstract: Multilingual generation with large language models (LLMs) is often of poor quality for mid- to low-resource languages. Building on insights from interpretability, we demonstrate the existence of an implicit task-solving-->translation pipeline for generation, whereby the model first solves the required task in a largely target-language-agnostic manner, and subsequently translates answer concepts in…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 23 pages incl. appendix

  3. arXiv:2503.20698  [pdf, other]

    cs.CV cs.IR

    MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion

    Authors: Saron Samuel, Dan DeGenaro, Jimena Guallar-Blasco, Kate Sanders, Oluwaseun Eisape, Tanner Spendlove, Arun Reddy, Alexander Martin, Andrew Yates, Eugene Yang, Cameron Carpenter, David Etter, Efsun Kayi, Matthew Wiesner, Kenton Murray, Reno Kriz

    Abstract: Videos inherently contain multiple modalities, including visual events, text overlays, sounds, and speech, all of which are important for retrieval. However, state-of-the-art multimodal language models like VAST and LanguageBind are built on vision-language models (VLMs), and thus overly prioritize visual signals. Retrieval benchmarks further reinforce this bias by focusing on visual queries and n…

    Submitted 9 May, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  4. arXiv:2503.19009  [pdf, other]

    cs.CV cs.IR

    Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

    Authors: Arun Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M. de Melo, Benjamin Van Durme, Rama Chellappa

    Abstract: In this work, we tackle the problem of text-to-video retrieval (T2VR). Inspired by the success of late interaction techniques in text-document, text-image, and text-video retrieval, our approach, Video-ColBERT, introduces a simple and efficient mechanism for fine-grained similarity assessment between queries and videos. Video-ColBERT is built upon 3 main components: a fine-grained spatial and temp…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025. 13 pages, 4 figures. Approved for public release: distribution unlimited

  5. arXiv:2501.16581  [pdf, other]

    cs.CL

    DialUp! Modeling the Language Continuum by Adapting Models to Dialects and Dialects to Models

    Authors: Niyati Bafna, Emily Chang, Nathaniel R. Robinson, David R. Mortensen, Kenton Murray, David Yarowsky, Hale Sirin

    Abstract: Most of the world's languages and dialects are low-resource, and lack support in mainstream machine translation (MT) models. However, many of them have a closely-related high-resource language (HRL) neighbor, and differ in linguistically regular ways from it. This underscores the importance of model robustness to dialectal variation and cross-lingual generalization to the HRL dialect continuum. We…

    Submitted 24 February, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: 9 pages, 46 incl. appendix

  6. arXiv:2501.06126  [pdf, other]

    cs.CL cs.LG

    Merging Feed-Forward Sublayers for Compressed Transformers

    Authors: Neha Verma, Kenton Murray, Kevin Duh

    Abstract: With the rise and ubiquity of larger deep learning models, the need for high-quality compression techniques is growing in order to deploy these models widely. The sheer parameter count of these models makes it difficult to fit them into the memory constraints of different hardware. In this work, we present a novel approach to model compression by merging similar parameter groups within a model, ra…

    Submitted 28 March, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

  7. arXiv:2411.05088  [pdf]

    cs.CL

    Findings of the IWSLT 2024 Evaluation Campaign

    Authors: Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubiński, Tsz Kin Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Maurya, John McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha, et al. (20 additional authors not shown)

    Abstract: This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: IWSLT 2024; 59 pages

  8. arXiv:2411.05020  [pdf, other]

    cs.CY stat.AP

    Cast vote records: A database of ballots from the 2020 U.S. Election

    Authors: Shiro Kuriwaki, Mason Reece, Samuel Baltz, Aleksandra Conevska, Joseph R. Loffredo, Can Mutlu, Taran Samarth, Kevin E. Acevedo Jetter, Zachary Djanogly Garai, Kate Murray, Shigeo Hirano, Jeffrey B. Lewis, James M. Snyder Jr., Charles H. Stewart III

    Abstract: Ballots are the core records of elections. Electronic records of actual ballots cast (cast vote records) are available to the public in some jurisdictions. However, they have been released in a variety of formats and have not been independently evaluated. Here we introduce a database of cast vote records from the 2020 U.S. general election. We downloaded publicly available unstandardized cast vote…

    Submitted 24 October, 2024; originally announced November 2024.

    Comments: 26 pages and appendix

  9. arXiv:2410.11619  [pdf, other]

    cs.CV cs.CL

    MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval

    Authors: Reno Kriz, Kate Sanders, David Etter, Kenton Murray, Cameron Carpenter, Kelly Van Ochten, Hannah Recknor, Jimena Guallar-Blasco, Alexander Martin, Ronald Colaianni, Nolan King, Eugene Yang, Benjamin Van Durme

    Abstract: Efficiently retrieving and synthesizing information from large-scale multimodal collections has become a critical challenge. However, existing video retrieval datasets suffer from scope limitations, primarily focusing on matching descriptive but vague queries with small collections of professionally edited, English-centric videos. To address this gap, we introduce MultiVENT 2.0, a large…

    Submitted 10 February, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

  10. arXiv:2410.04579  [pdf, other]

    cs.CL cs.LG stat.ML

    Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets

    Authors: Tianjian Li, Haoran Xu, Weiting Tan, Kenton Murray, Daniel Khashabi

    Abstract: Data abundance across different domains exhibits a long-tailed distribution: few domains have abundant data, while most face data scarcity. Our work focuses on a multilingual setting, where available data is heavily skewed towards high-resource languages. Two common strategies to address this disparity are upsampling low-resource data (Temperature Sampling) and upweighting low-resource loss (Scala…

    Submitted 9 March, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

    Comments: 19 pages, 9 figures, accepted to NAACL 2025 main conference

  11. arXiv:2410.03115  [pdf, other]

    cs.CL

    X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale

    Authors: Haoran Xu, Kenton Murray, Philipp Koehn, Hieu Hoang, Akiko Eriguchi, Huda Khayrallah

    Abstract: Large language models (LLMs) have achieved remarkable success across various NLP tasks with a focus on English due to English-centric pre-training and limited multilingual data. In this work, we focus on the problem of translation, and while some multilingual LLMs claim to support hundreds of languages, models often fail to provide high-quality responses for mid- and low-resource languages, le…

    Submitted 2 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: Published as a conference paper at ICLR 2025 (spotlight)

  12. arXiv:2407.19884  [pdf, other]

    cs.CL

    Preliminary WMT24 Ranking of General MT Systems and LLMs

    Authors: Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondrej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Benjamin Marie, Kenton Murray, Masaaki Nagata, Martin Popel, Maja Popovic, Mariya Shmatova, Steinþór Steingrímsson, Vilém Zouhar

    Abstract: This is the preliminary ranking of WMT24 General MT systems based on automatic metrics. The official ranking will be a human evaluation, which is superior to the automatic ranking and supersedes it. The purpose of this report is not to interpret any findings but only provide preliminary results to the participants of the General MT task that may be useful during the writing of the system submissio…

    Submitted 29 July, 2024; originally announced July 2024.

  13. Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models

    Authors: Nikhil Sharma, Kenton Murray, Ziang Xiao

    Abstract: Although the multilingual capability of LLMs offers new opportunities to overcome the language barrier, do these capabilities translate into real-life scenarios where linguistic divide and knowledge conflicts between multilingual sources are known occurrences? In this paper, we studied LLM's linguistic preference in a cross-language RAG-based information search setting. We found that LLMs displaye…

    Submitted 11 February, 2025; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: NAACL 2025

  14. arXiv:2406.13718  [pdf, other]

    cs.CL

    Evaluating Large Language Models along Dimensions of Language Variation: A Systematik Invesdigatiom uv Cross-lingual Generalization

    Authors: Niyati Bafna, Kenton Murray, David Yarowsky

    Abstract: While large language models exhibit certain cross-lingual generalization capabilities, they suffer from performance degradation (PD) on unseen closely-related languages (CRLs) and dialects relative to their high-resource language neighbour (HRLN). However, we currently lack a fundamental understanding of what kinds of linguistic distances contribute to PD, and to what extent. Furthermore, studies…

    Submitted 27 January, 2025; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: 21 pages. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

  15. arXiv:2405.05376  [pdf, other]

    cs.CL

    Kreyòl-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages

    Authors: Nathaniel R. Robinson, Raj Dabre, Ammon Shurtz, Rasul Dent, Onenamiyi Onesi, Claire Bizon Monroc, Loïc Grobol, Hasan Muhammad, Ashi Garg, Naome A. Etori, Vijay Murari Tiyyala, Olanrewaju Samuel, Matthew Dean Stutzman, Bismarck Bamfo Odoom, Sanjeev Khudanpur, Stephen D. Richardson, Kenton Murray

    Abstract: A majority of language technologies are tailored for a small number of high-resource languages, while relatively many low-resource languages are neglected. One such group, Creole languages, have long been marginalized in academic study, though their speakers could benefit from machine translation (MT). These languages are predominantly used in much of Latin America, Africa and the Caribbean. We pr…

    Submitted 13 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: NAACL 2024

  16. arXiv:2401.08417  [pdf, other]

    cs.CL

    Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

    Authors: Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim

    Abstract: Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We…

    Submitted 2 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICML 2024

  17. arXiv:2311.02310  [pdf, other]

    cs.CL

    Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles

    Authors: Weiting Tan, Haoran Xu, Lingfeng Shen, Shuyue Stella Li, Kenton Murray, Philipp Koehn, Benjamin Van Durme, Yunmo Chen

    Abstract: Large language models trained primarily in a monolingual setting have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning. However, even though zero-shot translations are relatively good, there remains a discernible gap comparing their performance with the few-shot setting. In this paper, we investigate the factors contributing…

    Submitted 3 November, 2023; originally announced November 2023.

  18. arXiv:2310.07908  [pdf, ps, other]

    q-bio.NC cs.AI cs.NE

    Phase codes emerge in recurrent neural networks optimized for modular arithmetic

    Authors: Keith T. Murray

    Abstract: Recurrent neural networks (RNNs) can implement complex computations by leveraging a range of dynamics, such as oscillations, attractors, and transient trajectories. A growing body of work has highlighted the emergence of phase codes, a type of oscillatory activity where information is encoded in the relative phase of network activity, in RNNs trained for working memory tasks. However, these studie…

    Submitted 6 July, 2025; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 7 pages, 3 figures

  19. arXiv:2310.00840  [pdf, other]

    cs.CL

    Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

    Authors: Tianjian Li, Haoran Xu, Philipp Koehn, Daniel Khashabi, Kenton Murray

    Abstract: Text generation models are notoriously vulnerable to errors in the training data. With the wide-spread availability of massive amounts of web-crawled data becoming more commonplace, how can we enhance the robustness of models trained on a massive amount of noisy web-crawled text? In our work, we propose Error Norm Truncation (ENT), a robust enhancement method to the standard training objective tha…

    Submitted 18 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  20. arXiv:2309.04607  [pdf]

    cs.CL cs.AI

    Linking Symptom Inventories using Semantic Textual Similarity

    Authors: Eamonn Kennedy, Shashank Vadlamani, Hannah M Lindsey, Kelly S Peterson, Kristen Dams OConnor, Kenton Murray, Ronak Agarwal, Houshang H Amiri, Raeda K Andersen, Talin Babikian, David A Baron, Erin D Bigler, Karen Caeyenberghs, Lisa Delano-Wood, Seth G Disner, Ekaterina Dobryakova, Blessen C Eapen, Rachel M Edelstein, Carrie Esopenko, Helen M Genova, Elbert Geuze, Naomi J Goodrich-Hunsaker, Jordan Grafman, Asta K Haberg, Cooper B Hodges, et al. (57 additional authors not shown)

    Abstract: An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores…

    Submitted 8 September, 2023; originally announced September 2023.

  21. arXiv:2307.07049  [pdf, other]

    cs.CL

    MegaWika: Millions of reports and their sources across 50 diverse languages

    Authors: Samuel Barham, Orion Weller, Michelle Yuan, Kenton Murray, Mahsa Yarmohammadi, Zhengping Jiang, Siddharth Vashishtha, Alexander Martin, Anqi Liu, Aaron Steven White, Jordan Boyd-Graber, Benjamin Van Durme

    Abstract: To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials. We process this dataset for a myriad of applications, going beyond the initial Wikipedia citation extraction and web scraping of content, including translating no…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: Submitted to ACL, 2023

    ACM Class: I.2.7

  22. arXiv:2305.17325  [pdf, other]

    cs.CL

    Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution

    Authors: Tianjian Li, Kenton Murray

    Abstract: Zero-shot cross-lingual transfer is when a multilingual model is trained to perform a task in one language and then is applied to another language. Although the zero-shot cross-lingual transfer approach has achieved success in various classification tasks, its performance on natural language generation tasks falls short in quality and sometimes outputs an incorrect language. In our study, we show…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Findings of ACL 2023

  23. arXiv:2305.14230  [pdf, other]

    cs.CL

    Exploring Representational Disparities Between Multilingual and Bilingual Translation Models

    Authors: Neha Verma, Kenton Murray, Kevin Duh

    Abstract: Multilingual machine translation has proven immensely useful for both parameter efficiency and overall performance across many language pairs via complete multilingual parameter sharing. However, some language pairs in multilingual models can see worse performance than in bilingual models, especially in the one-to-many translation setting. Motivated by their empirical differences, we examine the g…

    Submitted 26 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: LREC-COLING 2024

  24. arXiv:2305.13993  [pdf, other]

    cs.CL

    Condensing Multilingual Knowledge with Lightweight Language-Specific Modules

    Authors: Haoran Xu, Weiting Tan, Shuyue Stella Li, Yunmo Chen, Benjamin Van Durme, Philipp Koehn, Kenton Murray

    Abstract: Incorporating language-specific (LS) modules is a proven method to boost performance in multilingual machine translation. This approach bears similarity to Mixture-of-Experts (MoE) because it does not inflate FLOPs. However, the scalability of this approach to hundreds of languages (experts) tends to be unmanageable due to the prohibitive number of parameters introduced by full-rank matrices in fu…

    Submitted 22 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at the main conference of EMNLP 2023

  25. arXiv:2305.02176  [pdf, other]

    cs.CL

    Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity

    Authors: Haoran Xu, Maha Elbayad, Kenton Murray, Jean Maillard, Vedanuj Goswami

    Abstract: Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while maintaining low computational requirements per token. However, recent studies have established that MoE models are inherently parameter-inefficient as the improvement in performance diminishes with an increasing number of experts. We hypothesize t…

    Submitted 22 October, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted at Findings of EMNLP 2023

  26. arXiv:2303.06311  [pdf, other]

    hep-ex cs.LG physics.ins-det

    Generative Adversarial Networks for Scintillation Signal Simulation in EXO-200

    Authors: S. Li, I. Ostrovskiy, Z. Li, L. Yang, S. Al Kharusi, G. Anton, I. Badhrees, P. S. Barbeau, D. Beck, V. Belov, T. Bhatta, M. Breidenbach, T. Brunner, G. F. Cao, W. R. Cen, C. Chambers, B. Cleveland, M. Coon, A. Craycraft, T. Daniels, L. Darroch, S. J. Daugherty, J. Davis, S. Delaquis, A. Der Mesrobian-Kabakian, et al. (65 additional authors not shown)

    Abstract: Generative Adversarial Networks trained on samples of simulated or actual events have been proposed as a way of generating large simulated datasets at a reduced computational cost. In this work, a novel approach to perform the simulation of photodetector signals from the time projection chamber of the EXO-200 experiment is demonstrated. The method is based on a Wasserstein Generative Adversarial N…

    Submitted 8 May, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

    Comments: As accepted by JINST

    Journal ref: JINST 18 P06005 2023

  27. arXiv:2211.07628  [pdf, other]

    cs.CL

    Language Agnostic Code-Mixing Data Augmentation by Predicting Linguistic Patterns

    Authors: Shuyue Stella Li, Kenton Murray

    Abstract: In this work, we focus on intrasentential code-mixing and propose several different Synthetic Code-Mixing (SCM) data augmentation methods that outperform the baseline on downstream sentiment analysis tasks across various amounts of labeled gold data. Most importantly, our proposed methods demonstrate that strategically replacing parts of sentences in the matrix language with a constant mask signif…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: 12 pages, 5 figures

  28. arXiv:2205.11416  [pdf, other]

    cs.CL

    The Importance of Being Parameters: An Intra-Distillation Method for Serious Gains

    Authors: Haoran Xu, Philipp Koehn, Kenton Murray

    Abstract: Recent model pruning methods have demonstrated the ability to remove redundant parameters without sacrificing model performance. Common methods remove redundant parameters according to the parameter sensitivity, a gradient-based measure reflecting the contribution of the parameters. In this paper, however, we argue that redundant parameters can be trained to make beneficial contributions. We first…

    Submitted 22 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP 2022

  29. arXiv:2204.13869  [pdf, other]

    cs.CL

    Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer

    Authors: Haoran Xu, Kenton Murray

    Abstract: The current state-of-the-art for few-shot cross-lingual transfer learning first trains on abundant labeled data in the source language and then fine-tunes with a few examples on the target language, termed target-adapting. Though this has been demonstrated to work on a variety of tasks, in this paper we show some deficiencies of this approach and propose a one-step mixed training method that train…

    Submitted 29 April, 2022; originally announced April 2022.

    Comments: Accepted at Findings of NAACL 2022

  30. arXiv:2202.01975  [pdf]

    q-bio.QM cs.LG

    Performance of multilabel machine learning models and risk stratification schemas for predicting stroke and bleeding risk in patients with non-valvular atrial fibrillation

    Authors: Juan Lu, Rebecca Hutchens, Joseph Hung, Mohammed Bennamoun, Brendan McQuillan, Tom Briffa, Ferdous Sohel, Kevin Murray, Jonathon Stewart, Benjamin Chow, Frank Sanfilippo, Girish Dwivedi

    Abstract: Appropriate antithrombotic therapy for patients with atrial fibrillation (AF) requires assessment of ischemic stroke and bleeding risks. However, risk stratification schemas such as CHA2DS2-VASc and HAS-BLED have modest predictive capacity for patients with AF. Machine learning (ML) techniques may improve predictive performance and support decision-making for appropriate antithrombotic therapy. We…

    Submitted 2 February, 2022; originally announced February 2022.

  31. arXiv:2201.08471  [pdf, other]

    cs.IR cs.CL

    Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models

    Authors: Suraj Nair, Eugene Yang, Dawn Lawrie, Kevin Duh, Paul McNamee, Kenton Murray, James Mayfield, Douglas W. Oard

    Abstract: The advent of transformer-based models such as BERT has led to the rise of neural ranking models. These models have improved the effectiveness of retrieval systems well beyond that of lexical term matching models such as BM25. While monolingual retrieval tasks have benefited from large-scale training collections such as MS MARCO and advances in neural architectures, cross-language retrieval tasks…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: Accepted at ECIR 2022 (Full paper)

  32. arXiv:2112.02721  [pdf, other]

    cs.CL cs.AI cs.LG

    NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

    Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, et al. (101 additional authors not shown)

    Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data split…

    Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

  33. arXiv:2109.06798  [pdf, other]

    cs.CL

    Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction

    Authors: Mahsa Yarmohammadi, Shijie Wu, Marc Marone, Haoran Xu, Seth Ebner, Guanghui Qin, Yunmo Chen, Jialiang Guo, Craig Harman, Kenton Murray, Aaron Steven White, Mark Dredze, Benjamin Van Durme

    Abstract: Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of "train on English, run on any language", we find through a thorough exploration and extension of techniques that a…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  34. arXiv:2109.04588  [pdf, other]

    cs.CL

    BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation

    Authors: Haoran Xu, Benjamin Van Durme, Kenton Murray

    Abstract: The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to attempt to incorporate these pre-trained models into neural machine translation (NMT) systems. However, proposed methods for incorporating pre-trained models are non-trivial and mainly focus on BERT, which lacks a comparison of the impact that…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

    Journal ref: EMNLP 2021

  35. arXiv:2105.01691  [pdf, other]

    cs.CL

    Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

    Authors: Toan Q. Nguyen, Kenton Murray, David Chiang

    Abstract: In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely to be the cause of the improvement of about +1 BLEU across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: c…

    Submitted 2 July, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: Accepted at IWSLT 2021

  36. arXiv:2104.05696  [pdf, other]

    cs.CL

    Joint Universal Syntactic and Semantic Parsing

    Authors: Elias Stengel-Eskin, Kenton Murray, Sheng Zhang, Aaron Steven White, Benjamin Van Durme

    Abstract: While numerous attempts have been made to jointly parse syntax and semantics, high performance in one domain typically comes at the price of performance in the other. This trade-off contradicts the large body of research focusing on the rich interactions at the syntax-semantics interface. We explore multiple model architectures which allow us to exploit the rich syntactic and semantic annotations…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: To appear: TACL 2021

  37. arXiv:2103.02205  [pdf, other]

    cs.CL

    Gradual Fine-Tuning for Low-Resource Domain Adaptation

    Authors: Haoran Xu, Seth Ebner, Mahsa Yarmohammadi, Aaron Steven White, Benjamin Van Durme, Kenton Murray

    Abstract: Fine-tuning is known to improve NLP models by adapting an initial model trained on more plentiful but less domain-salient examples to data in a target domain. Such domain adaptation is typically done using one stage of fine-tuning. We demonstrate that gradually fine-tuning in a multi-stage process can yield substantial further gains and can be applied without modifying the model or learning object…

    Submitted 1 September, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: Adapt-NLP, EACL 2021

    Journal ref: Adapt-NLP EACL 2021

  38. arXiv:2101.04893  [pdf, other]

    cs.HC

    Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels

    Authors: Xiaoyi Zhang, Lilian de Greef, Amanda Swearngin, Samuel White, Kyle Murray, Lisa Yu, Qi Shan, Jeffrey Nichols, Jason Wu, Chris Fleizach, Aaron Everitt, Jeffrey P. Bigham

    Abstract: Many accessibility features available on mobile platforms require applications (apps) to provide complete and accurate metadata describing user interface (UI) components. Unfortunately, many apps do not provide sufficient metadata for accessibility features to work as expected. In this paper, we explore inferring accessibility metadata for mobile apps from their pixels, as the visual interfaces of…

    Submitted 13 January, 2021; originally announced January 2021.

  39. arXiv:1910.07134  [pdf, other]

    cs.CL

    Efficiency through Auto-Sizing: Notre Dame NLP's Submission to the WNGT 2019 Efficiency Task

    Authors: Kenton Murray, Brian DuSell, David Chiang

    Abstract: This paper describes the Notre Dame Natural Language Processing Group's (NDNLP) submission to the WNGT 2019 shared task (Hayashi et al., 2019). We investigated the impact of auto-sizing (Murray and Chiang, 2015; Murray et al., 2019) to the Transformer network (Vaswani et al., 2017) with the goal of substantially reducing the number of parameters in the model. Our method was able to eliminate more…

    Submitted 15 October, 2019; originally announced October 2019.

    Comments: The 3rd Workshop on Neural Generation and Translation (WNGT 2019)

  40. arXiv:1910.06717  [pdf, other]

    cs.CL cs.LG stat.ML

    Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

    Authors: Kenton Murray, Jeffery Kinnison, Toan Q. Nguyen, Walter Scheirer, David Chiang

    Abstract: Neural sequence-to-sequence models, particularly the Transformer, are the state of the art in machine translation. Yet these neural networks are very sensitive to architecture and hyperparameter settings. Optimizing these settings by grid or random search is computationally expensive because it requires many training runs. In this paper, we incorporate architecture search into a single training ru…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: The 3rd Workshop on Neural Generation and Translation (WNGT 2019)

  41. arXiv:1907.05376  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Monocular 3D Sway Tracking for Assessing Postural Instability in Cerebral Hypoperfusion During Quiet Standing

    Authors: Robert Amelard, Kevin R Murray, Eric T Hedge, Taylor W Cleworth, Mamiko Noguchi, Andrew Laing, Richard L Hughson

    Abstract: Postural instability is prevalent in aging and neurodegenerative disease, decreasing quality of life and independence. Quantitatively monitoring balance control is important for assessing treatment efficacy and rehabilitation progress. However, existing technologies for assessing postural sway are complex and expensive, limiting their widespread utility. Here, we propose a monocular imaging system…

    Submitted 5 November, 2019; v1 submitted 11 July, 2019; originally announced July 2019.

  42. arXiv:1811.00739  [pdf, other]

    cs.CL cs.LG

    An Empirical Exploration of Curriculum Learning for Neural Machine Translation

    Authors: Xuan Zhang, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup, Marianna J Martindale, Paul McNamee, Kevin Duh, Marine Carpuat

    Abstract: Machine translation systems based on deep neural networks are expensive to train. Curriculum learning aims to address this issue by choosing the order in which samples are presented during training to help train better models faster. We adopt a probabilistic view of curriculum learning, which lets us flexibly evaluate the impact of curricula design, and perform an extensive exploration on a German…

    Submitted 2 November, 2018; originally announced November 2018.

  43. arXiv:1808.10006  [pdf, other]

    cs.CL

    Correcting Length Bias in Neural Machine Translation

    Authors: Kenton Murray, David Chiang

    Abstract: We study two problems in neural machine translation (NMT). First, in beam search, whereas a wider beam should in principle help translation, it often hurts NMT. Second, NMT has a tendency to produce translations that are too short. Here, we argue that these problems are closely related and both rooted in label bias. We show that correcting the brevity problem almost eliminates the beam problem; we…

    Submitted 31 August, 2018; v1 submitted 29 August, 2018; originally announced August 2018.

    Comments: WMT 2018

  44. arXiv:1612.00712  [pdf, other]

    cs.NE cs.AI cs.LG

    Probabilistic Neural Programs

    Authors: Kenton W. Murray, Jayant Krishnamurthy

    Abstract: We present probabilistic neural programs, a framework for program induction that permits flexible specification of both a computational model and inference algorithm while simultaneously enabling the use of deep neural networks. Probabilistic neural programs combine a computation graph for specifying a neural network with an operator for weighted nondeterministic choice. Thus, a program describes…

    Submitted 2 December, 2016; originally announced December 2016.

    Comments: Appears in NAMPI workshop at NIPS 2016

  45. arXiv:1602.04393  [pdf, other]

    cs.IR stat.ML

    Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams

    Authors: Abhinav Maurya, Kenton Murray, Yandong Liu, Chris Dyer, William W. Cohen, Daniel B. Neill

    Abstract: Early detection and precise characterization of emerging topics in text streams can be highly useful in applications such as timely and targeted public health interventions and discovering evolving regional business trends. Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have numerous shortcomings that make them unsuitable…

    Submitted 13 February, 2016; originally announced February 2016.

    Comments: 10 pages, 4 figures, KDD 2016 submission

  46. arXiv:1508.05051  [pdf, other]

    cs.CL

    Auto-Sizing Neural Networks: With Applications to n-gram Language Models

    Authors: Kenton Murray, David Chiang

    Abstract: Neural networks have been shown to improve performance across a range of natural-language tasks. However, designing and training them can be complicated. Frequently, researchers resort to repeated experimentation to pick optimal settings. In this paper, we address the issue of choosing the correct number of units in hidden layers. We introduce a method for automatically adjusting network size by p…

    Submitted 20 August, 2015; originally announced August 2015.

    Comments: EMNLP 2015

  47. arXiv:1508.02982  [pdf]

    cs.HC

    WearWrite: Orchestrating the Crowd to Complete Complex Tasks from Wearables (We Wrote This Paper on a Watch)

    Authors: Michael Nebeling, Anhong Guo, Kyle Murray, Annika Tostengard, Angelos Giannopoulos, Martin Mihajlov, Steven Dow, Jaime Teevan, Jeffrey P. Bigham

    Abstract: In this paper we introduce a paradigm for completing complex tasks from wearable devices by leveraging crowdsourcing, and demonstrate its validity for academic writing. We explore this paradigm using a collaborative authoring system, called WearWrite, which is designed to enable authors and crowd workers to work together using an Android smartwatch and Google Docs to produce academic papers, inclu…

    Submitted 25 July, 2015; originally announced August 2015.

  48. arXiv:1204.3678  [pdf, other]

    cs.SI cs.HC physics.soc-ph

    Crowd Memory: Learning in the Collective

    Authors: Walter S. Lasecki, Samuel C. White, Kyle I. Murray, Jeffrey P. Bigham

    Abstract: Crowd algorithms often assume workers are inexperienced and thus fail to adapt as workers in the crowd learn a task. These assumptions fundamentally limit the types of tasks that systems based on such algorithms can handle. This paper explores how the crowd learns and remembers over time in the context of human computation, and how more realistic assumptions of worker experience may be used when d…

    Submitted 18 April, 2012; v1 submitted 16 April, 2012; originally announced April 2012.

    Comments: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991)

    Report number: CollectiveIntelligence/2012/27

  49. arXiv:1106.4064  [pdf, ps, other]

    cs.LG

    Algorithmic Programming Language Identification

    Authors: David Klein, Kyle Murray, Simon Weber

    Abstract: Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our work is based on supervised learning and intelligent statistical features. We also explored, but abandoned, a grammatical approach. In testing, our implementation greatly outperforms that of an existing tool that relies on a…

    Submitted 9 November, 2011; v1 submitted 20 June, 2011; originally announced June 2011.

    Comments: 11 pages. Code: https://github.com/simon-weber/Programming-Language-Identification

    ACM Class: I.2.6; K.3.2

  50. arXiv:1106.1150  [pdf, ps, other]

    cs.CC

    Barbosa, Uniform Polynomial Time Bounds, and Promises

    Authors: Lane A. Hemaspaandra, Kyle Murray, Xiaoqing Tang

    Abstract: This note is a commentary on, and critique of, Andre Luiz Barbosa's paper entitled "P != NP Proof." Despite its provocative title, what the paper is seeking to do is not to prove P \neq NP in the standard sense in which that notation is used in the literature. Rather, Barbosa is (and is aware that he is) arguing that a different meaning should be associated with the notation P \neq NP, and he clai…

    Submitted 6 June, 2011; originally announced June 2011.

    Report number: URCS TR-2011-969

    MSC Class: 68Q15 (Primary); 68Q05 (Secondary)

    ACM Class: F.1.3