Showing 1–45 of 45 results for author: Knight, K

Searching in archive cs.

  1. arXiv:2407.21149  [pdf, other]

    eess.IV cs.AI cs.CV

    Domain Shift Analysis in Chest Radiographs Classification in a Veterans Healthcare Administration Population

    Authors: Mayanka Chandrashekar, Ian Goethert, Md Inzamam Ul Haque, Benjamin McMahon, Sayera Dhaubhadel, Kathryn Knight, Joseph Erdos, Donna Reagan, Caroline Taylor, Peter Kuzmak, John Michael Gaziano, Eileen McAllister, Lauren Costa, Yuk-Lam Ho, Kelly Cho, Suzanne Tamang, Samah Fodeh-Jarad, Olga S. Ovchinnikova, Amy C. Justice, Jacob Hinkle, Ioana Danciu

    Abstract: Objectives: This study aims to assess the impact of domain shift on chest X-ray classification accuracy and to analyze the influence of ground truth label quality and demographic factors such as age group, sex, and study year. Materials and Methods: We used a DenseNet121 model pretrained on the MIMIC-CXR dataset for deep learning-based multilabel classification using ground truth labels from radiology re…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2406.10314  [pdf]

    cs.LG cs.CY

    Development and Validation of a Machine Learning Algorithm for Clinical Wellness Visit Classification in Cats and Dogs

    Authors: Donald Szlosek, Michael Coyne, Julia Riggot, Kevin Knight, DJ McCrann, Dave Kincaid

    Abstract: Early disease detection in veterinary care relies on identifying subclinical abnormalities in asymptomatic animals during wellness visits. This study introduces an algorithm designed to distinguish between wellness and other veterinary visits. The purpose of this study is to validate the use of a visit classification algorithm compared to manual classification of veterinary visits by three board-ce…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 21 pages, 2 figures, 4 tables

  3. arXiv:2404.18842  [pdf, other]

    cs.CV

    VISION: Toward a Standardized Process for Radiology Image Management at the National Level

    Authors: Kathryn Knight, Ioana Danciu, Olga Ovchinnikova, Jacob Hinkle, Mayanka Chandra Shekar, Debangshu Mukherjee, Eileen McAllister, Caitlin Rizy, Kelly Cho, Amy C. Justice, Joseph Erdos, Peter Kuzmak, Lauren Costa, Yuk-Lam Ho, Reddy Madipadga, Suzanne Tamang, Ian Goethert

    Abstract: The compilation and analysis of radiological images pose numerous challenges for researchers. The sheer volume of data as well as the computational needs of algorithms capable of operating on images are extensive. Additionally, the assembly of these images alone is difficult, as these exams may differ widely in terms of clinical context, structured annotation available for model training, modalit…

    Submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2403.12297  [pdf, other]

    cs.CL cs.AI

    Leveraging Large Language Models to Extract Information on Substance Use Disorder Severity from Clinical Notes: A Zero-shot Learning Approach

    Authors: Maria Mahbub, Gregory M. Dams, Sudarshan Srinivasan, Caitlin Rizy, Ioana Danciu, Jodie Trafton, Kathryn Knight

    Abstract: Substance use disorder (SUD) poses a major concern due to its detrimental effects on health and society. SUD identification and treatment depend on a variety of factors such as severity, co-determinants (e.g., withdrawal symptoms), and social determinants of health. Existing diagnostic coding systems used by American insurance providers, like the International Classification of Diseases (ICD-10),…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 10 pages, 4 figures, 2 tables

  5. arXiv:2305.08777  [pdf, other]

    cs.AI cs.CL cs.LG

    Question-Answering System Extracts Information on Injection Drug Use from Clinical Notes

    Authors: Maria Mahbub, Ian Goethert, Ioana Danciu, Kathryn Knight, Sudarshan Srinivasan, Suzanne Tamang, Karine Rozenberg-Ben-Dror, Hugo Solares, Susana Martins, Jodie Trafton, Edmon Begoli, Gregory Peterson

    Abstract: Background: Injection drug use (IDU) is a dangerous health behavior that increases mortality and morbidity. Identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no International Classification of Diseases (ICD) code and the only place IDU infor…

    Submitted 28 December, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: 31 pages, 11 tables, 7 figures

  6. F*** workflows: when parts of FAIR are missing

    Authors: Sean R. Wilkinson, Greg Eisenhauer, Anuj J. Kapadia, Kathryn Knight, Jeremy Logan, Patrick Widener, Matthew Wolf

    Abstract: The FAIR principles for scientific data (Findable, Accessible, Interoperable, Reusable) are also relevant to other digital objects such as research software and scientific workflows that operate on scientific data. The FAIR principles can be applied to the data being handled by a scientific workflow as well as the processes, software, and other infrastructure which are necessary to specify and exe…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: 6 pages, 0 figures, accepted to ERROR 2022 workshop (see https://error-workshop.org/ for more information), to be published in proceedings of IEEE eScience 2022

  7. arXiv:2109.09597  [pdf, other]

    cs.CL cs.AI cs.GT

    Two Approaches to Building Collaborative, Task-Oriented Dialog Agents through Self-Play

    Authors: Arkady Arkhangorodsky, Scot Fang, Victoria Knight, Ajay Nagesh, Maria Ryskina, Kevin Knight

    Abstract: Task-oriented dialog systems are often trained on human/human dialogs, such as those collected from Wizard-of-Oz interfaces. However, human/human corpora are frequently too small for supervised training to be effective. This paper investigates two approaches to training agent-bots and user-bots through self-play, in which they autonomously explore an API environment, discovering communication strategies…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: 4 pages, 5 figures

  8. arXiv:2109.09577  [pdf, other]

    cs.CL cs.AI

    MeetDot: Videoconferencing with Live Translation Captions

    Authors: Arkady Arkhangorodsky, Christopher Chu, Scot Fang, Yiqi Huang, Denglin Jiang, Ajay Nagesh, Boliang Zhang, Kevin Knight

    Abstract: We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: 7 pages, 4 figures, Accepted as EMNLP 2021 demo paper

  9. arXiv:2109.07230  [pdf, other]

    cs.CL cs.LG

    Learning Mathematical Properties of Integers

    Authors: Maria Ryskina, Kevin Knight

    Abstract: Embedding words in high-dimensional vector spaces has proven valuable in many natural language applications. In this work, we investigate whether similarly-trained embeddings of integers can capture concepts that are useful for mathematical applications. We probe the integer embeddings for mathematical knowledge, apply them to a set of numerical reasoning tasks, and show that by learning the repre…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: BlackboxNLP 2021

  10. arXiv:2105.06545  [pdf]

    cs.CR

    What Clinical Trials Can Teach Us about the Development of More Resilient AI for Cybersecurity

    Authors: Edmon Begoli, Robert A. Bridges, Sean Oesch, Kathryn E. Knight

    Abstract: Policy-mandated, rigorously administered scientific testing is needed to provide transparency into the efficacy of artificial intelligence-based (AI-based) cyber defense tools for consumers and to prioritize future research and development. In this article, we propose a model that is informed by our experience, urged forward by massive scale cyberattacks, and inspired by parallel developments in t…

    Submitted 13 May, 2021; originally announced May 2021.

  11. arXiv:2102.04506  [pdf, other]

    cs.CL cs.AI

    A Hybrid Task-Oriented Dialog System with Domain and Task Adaptive Pretraining

    Authors: Boliang Zhang, Ying Lyu, Ning Ding, Tianhao Shen, Zhaoyang Jia, Kun Han, Kevin Knight

    Abstract: This paper describes our submission for the End-to-end Multi-domain Task Completion Dialog shared task at the 9th Dialog System Technology Challenge (DSTC-9). Participants in the shared task build an end-to-end task completion dialog system which is evaluated by human evaluation and a user simulator based automatic evaluation. Different from traditional pipelined approaches where modules are optim…

    Submitted 8 February, 2021; originally announced February 2021.

  12. arXiv:2012.13454  [pdf, ps, other]

    cs.CL

    Why Neural Machine Translation Prefers Empty Outputs

    Authors: Xing Shi, Yijun Xiao, Kevin Knight

    Abstract: We investigate why neural machine translation (NMT) systems assign high probability to empty translations. We find two explanations. First, label smoothing makes correct-length translations less confident, making it easier for the empty translation to finally outscore them. Second, NMT systems use the same, high-frequency EoS word to end all target sentences, regardless of length. This creates an…

    Submitted 24 December, 2020; originally announced December 2020.

    Comments: 6 pages

  13. arXiv:2011.04761  [pdf, other]

    cs.CV

    MUSE: Textual Attributes Guided Portrait Painting Generation

    Authors: Xiaodan Hu, Pengfei Yu, Kevin Knight, Heng Ji, Bo Li, Honghui Shi

    Abstract: We propose a novel approach, MUSE, to illustrate textual attributes visually via portrait generation. MUSE takes a set of attributes written in text, in addition to facial features extracted from a photo of the subject as input. We propose 11 attribute types to represent inspirations from a subject's profile, emotion, story, and environment. We propose a novel stacked neural network architecture b…

    Submitted 19 September, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted by AIART 2021

  14. arXiv:2010.08185  [pdf, ps, other]

    cs.CL cs.AI

    DiDi's Machine Translation System for WMT2020

    Authors: Tanfang Chen, Weiwei Wang, Wenyang Wei, Xing Shi, Xiangang Li, Jieping Ye, Kevin Knight

    Abstract: This paper describes DiDi AI Labs' submission to the WMT2020 news translation shared task. We participate in the translation direction of Chinese->English. In this direction, we use the Transformer as our baseline model, and integrate several techniques for model enhancement, including data filtering, data selection, back-translation, fine-tuning, model ensembling, and re-ranking. As a result, our…

    Submitted 16 October, 2020; originally announced October 2020.

    Comments: Accepted at WMT 2020

  15. ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis

    Authors: Qingyun Wang, Qi Zeng, Lifu Huang, Kevin Knight, Heng Ji, Nazneen Fatema Rajani

    Abstract: To assist the human review process, we build a novel ReviewRobot to automatically assign a review score and write comments for multiple categories such as novelty and meaningful comparison. A good review needs to be knowledgeable, namely that the comments should be constructive and informative to help improve the paper; and explainable by providing detailed evidence. ReviewRobot achieves these goals v…

    Submitted 3 December, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: 14 pages. Accepted by The 14th International Conference on Natural Language Generation (INLG 2020). Code and resources are available at https://github.com/EagleW/ReviewRobot

  16. arXiv:2010.04747  [pdf, other]

    cs.CL

    MEEP: An Open-Source Platform for Human-Human Dialog Collection and End-to-End Agent Training

    Authors: Arkady Arkhangorodsky, Amittai Axelrod, Christopher Chu, Scot Fang, Yiqi Huang, Ajay Nagesh, Xing Shi, Boliang Zhang, Kevin Knight

    Abstract: We create a new task-oriented dialog platform (MEEP) where agents are given considerable freedom in terms of utterances and API calls, but are constrained to work within a push-button environment. We include facilities for collecting human-human dialog corpora, and for training automatic agents in an end-to-end fashion. We demonstrate MEEP with a dialog assistant that lets users specify trip desti…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: 10 pages

  17. arXiv:2010.04746  [pdf, other]

    cs.CL

    Solving Historical Dictionary Codes with a Neural Language Model

    Authors: Christopher Chu, Raphael Valenti, Kevin Knight

    Abstract: We solve difficult word-based substitution codes by constructing a decoding lattice and searching that lattice with a neural language model. We apply our method to a set of enciphered letters exchanged between US Army General James Wilkinson and agents of the Spanish Crown in the late 1700s and early 1800s, obtained from the US Library of Congress. We are able to decipher 75.1% of the cipher-word…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: 10 pages, 6 figures. To appear in EMNLP 2020

  18. arXiv:2010.04744  [pdf, other]

    cs.CL

    Learning to Pronounce Chinese Without a Pronunciation Dictionary

    Authors: Christopher Chu, Scot Fang, Kevin Knight

    Abstract: We demonstrate a program that learns to pronounce Chinese text in Mandarin, without a pronunciation dictionary. From non-parallel streams of Chinese characters and Chinese pinyin syllables, it establishes a many-to-many mapping between characters and pronunciations. Using unsupervised methods, the program effectively deciphers writing into speech. Its token-level character-to-syllable accuracy is…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: 7 pages. To appear in EMNLP 2020

  19. arXiv:2007.00809  [pdf, other]

    eess.AS cs.SD

    Automated Empathy Detection for Oncology Encounters

    Authors: Zhuohao Chen, James Gibson, Ming-Chang Chiu, Qiaohong Hu, Tara K Knight, Daniella Meeker, James A Tulsky, Kathryn I Pollak, Shrikanth Narayanan

    Abstract: Empathy involves understanding other people's situation, perspective, and feelings. In clinical interactions, it helps clinicians establish rapport with a patient and support patient-centered care and decision making. Understanding physician communication through observation of audio-recorded encounters is largely carried out with manual annotation and analysis. However, manual annotation has a pr…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: Accepted by the 8th IEEE International Conference on Healthcare Informatics (ICHI2020)

  20. arXiv:2005.06166  [pdf, other]

    cs.CL cs.LG

    Parallel Corpus Filtering via Pre-trained Language Models

    Authors: Boliang Zhang, Ajay Nagesh, Kevin Knight

    Abstract: Web-crawled data provides a good source of parallel corpora for training machine translation models. It is automatically obtained, but extremely noisy, and recent work shows that neural machine translation systems are more sensitive to noise than traditional statistical machine translation methods. In this paper, we propose a novel approach to filter out noisy sentence pairs from web-crawled corpo…

    Submitted 13 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  21. arXiv:1906.05683  [pdf, ps, other]

    cs.CL

    Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation

    Authors: Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, Jonathan May

    Abstract: Given a rough, word-by-word gloss of a source language sentence, target language natives can uncover the latent, fully-fluent rendering of the translation. In this work we explore this intuition by breaking translation into a two-step process: generating a rough gloss by means of a dictionary and then 'translating' the resulting pseudo-translation, or 'Translationese' into a fully fluent translati…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: Accepted in ACL 2019

  22. One SQL to Rule Them All

    Authors: Edmon Begoli, Tyler Akidau, Fabian Hueske, Julian Hyde, Kathryn Knight, Kenneth Knowles

    Abstract: Real-time data analysis and management are increasingly critical for today's businesses. SQL is the de facto lingua franca for these endeavors, yet support for robust streaming analysis and management with SQL remains limited. Many approaches restrict semantics to a reduced subset of features and/or require a suite of non-standard constructs. Additionally, use of event timestamps to provide native…

    Submitted 28 May, 2019; originally announced May 2019.

    ACM Class: H.2.3

  23. arXiv:1905.07870  [pdf, other]

    cs.CL cs.AI cs.LG

    PaperRobot: Incremental Draft Generation of Scientific Ideas

    Authors: Qingyun Wang, Lifu Huang, Zhiying Jiang, Kevin Knight, Heng Ji, Mohit Bansal, Yi Luan

    Abstract: We present a PaperRobot who performs as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, by combining graph attention and contextual text attention; (3) incrementally writing some k…

    Submitted 31 May, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

    Comments: 12 pages. Accepted by ACL 2019. Code and resources are available at https://github.com/EagleW/PaperRobot

  24. arXiv:1811.05701  [pdf, other]

    cs.CL

    Plan-And-Write: Towards Better Automatic Storytelling

    Authors: Lili Yao, Nanyun Peng, Ralph Weischedel, Kevin Knight, Dongyan Zhao, Rui Yan

    Abstract: Automatic storytelling is challenging since it requires generating long, coherent natural language to describe a sensible sequence of events. Despite considerable efforts on automatic story generation in the past, prior work either is restricted in plot planning, or can only generate stories in a narrow domain. In this paper, we explore open-domain story generation that writes stories given a tit…

    Submitted 19 February, 2019; v1 submitted 14 November, 2018; originally announced November 2018.

    Comments: Accepted by AAAI 2019

  25. arXiv:1810.04297  [pdf, other]

    cs.CL

    Decipherment of Historical Manuscript Images

    Authors: Xusen Yin, Nada Aldarrab, Beáta Megyesi, Kevin Knight

    Abstract: European libraries and archives are filled with enciphered manuscripts from the early modern period. These include military and diplomatic correspondence, records of secret societies, private letters, and so on. Although they are enciphered with classical cryptographic algorithms, their contents are unavailable to working historians. We therefore attack the problem of automatically converting ciph…

    Submitted 2 June, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

    Comments: International Conference on Document Analysis and Recognition 2019 Long paper

  26. Describing a Knowledge Base

    Authors: Qingyun Wang, Xiaoman Pan, Lifu Huang, Boliang Zhang, Zhiying Jiang, Heng Ji, Kevin Knight

    Abstract: We aim to automatically generate natural language descriptions about an input structured knowledge base (KB). We build our generation framework based on a pointer network which can copy facts from the input KB, and add two attention mechanisms: (i) slot-aware attention to capture the association between a slot type and its corresponding slot value; and (ii) a new table position self-attentio…

    Submitted 30 September, 2018; v1 submitted 5 September, 2018; originally announced September 2018.

    Comments: 12 pages. Accepted by The 11th International Conference on Natural Language Generation (INLG 2018). Code at https://github.com/EagleW/Describing_a_Knowledge_Base

  27. arXiv:1808.05700  [pdf, other]

    cs.CL

    Augmenting Statistical Machine Translation with Subword Translation of Out-of-Vocabulary Words

    Authors: Nelson F. Liu, Jonathan May, Michael Pust, Kevin Knight

    Abstract: Most statistical machine translation systems cannot translate words that are unseen in the training data. However, humans can translate many classes of out-of-vocabulary (OOV) words (e.g., novel morphological variants, misspellings, and compounds) without context by using orthographic clues. Following this observation, we describe and evaluate several general methods for OOV translation that use o…

    Submitted 16 August, 2018; originally announced August 2018.

    Comments: 7 pages

  28. arXiv:1806.00588  [pdf, other]

    cs.CL cs.AI cs.DC cs.DS

    Fast Locality Sensitive Hashing for Beam Search on GPU

    Authors: Xing Shi, Shizhen Xu, Kevin Knight

    Abstract: We present a GPU-based Locality Sensitive Hashing (LSH) algorithm to speed up beam search for sequence models. We utilize the winner-take-all (WTA) hash, which is based on relative ranking order of hidden dimensions and thus resilient to perturbations in numerical values. Our algorithm is designed by fully considering the underlying architecture of CUDA-enabled GPUs (Algorithm/Architecture Co-desig…

    Submitted 2 June, 2018; originally announced June 2018.

  29. arXiv:1805.06533  [pdf, other]

    cs.CL

    Modeling Naive Psychology of Characters in Simple Commonsense Stories

    Authors: Hannah Rashkin, Antoine Bosselut, Maarten Sap, Kevin Knight, Yejin Choi

    Abstract: Understanding a narrative requires reading between the lines and reasoning about the unspoken but obvious implications about events and people's mental states - a capability that is trivial for humans but remarkably hard for machines. To facilitate research addressing this challenge, we introduce a new annotation framework to explain naive psychology of story characters as fully-specified chains o…

    Submitted 16 May, 2018; originally announced May 2018.

    Comments: Accepted to ACL 2018 (long paper)

  30. Paper Abstract Writing through Editing Mechanism

    Authors: Qingyun Wang, Zhihao Zhou, Lifu Huang, Spencer Whitehead, Boliang Zhang, Heng Ji, Kevin Knight

    Abstract: We present a paper abstract writing system based on an attentive neural sequence-to-sequence model that can take a title as input and automatically generate an abstract. We design a novel Writing-editing Network that can attend to both the title and the previously generated abstract drafts and then iteratively revise and polish the abstract. With two series of Turing tests, where the human judges…

    Submitted 15 May, 2018; originally announced May 2018.

    Comments: * Equal contribution. 6 pages. Accepted by ACL 2018; The code and dataset are available at https://github.com/EagleW/Writing-editing-Network

  31. arXiv:1804.07875  [pdf, other]

    cs.CL cs.AI

    Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding

    Authors: Lifu Huang, Kyunghyun Cho, Boliang Zhang, Heng Ji, Kevin Knight

    Abstract: We construct a multilingual common semantic space based on distributional semantics, where words from multiple languages are projected into a shared space to enable knowledge and resource transfer across languages. Beyond word alignment, we introduce multiple cluster-level alignments and enforce the word clusters to be consistently distributed across multiple languages. We exploit three signals fo…

    Submitted 20 April, 2018; originally announced April 2018.

    Comments: 10 pages

  32. arXiv:1802.02607  [pdf, other]

    cs.CL cs.SD eess.AS

    Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

    Authors: Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

    Abstract: Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example pruning words due to acoustics using short-term context, prior to rescoring with long-term context based on linguistics. In this work we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can…

    Submitted 28 March, 2019; v1 submitted 7 February, 2018; originally announced February 2018.

    Journal ref: APSIPA Transactions on Signal and Information Processing 8. Cambridge University Press: e8, 2019

  33. arXiv:1711.05408  [pdf, other]

    cs.FL cs.CC cs.CL

    Recurrent Neural Networks as Weighted Language Recognizers

    Authors: Yining Chen, Sorcha Gilroy, Andreas Maletti, Jonathan May, Kevin Knight

    Abstract: We investigate the computational complexity of various problems for simple recurrent neural networks (RNNs) as formal models for recognizing weighted languages. We focus on the single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications. We show that most problems for such RNNs are undecidable, including consistency, equival…

    Submitted 4 March, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

  34. arXiv:1609.09007  [pdf, other]

    cs.CL cs.LG

    Unsupervised Neural Hidden Markov Models

    Authors: Ke Tran, Yonatan Bisk, Ashish Vaswani, Daniel Marcu, Kevin Knight

    Abstract: In this work, we present the first results for neuralizing an Unsupervised Hidden Markov Model. We evaluate our approach on tag induction. Our approach outperforms existing generative models and is competitive with the state-of-the-art though with a simpler model easily extended to include additional context.

    Submitted 28 September, 2016; originally announced September 2016.

    Comments: accepted at EMNLP 2016, Workshop on Structured Prediction for NLP. Oral presentation

  35. arXiv:1604.02201  [pdf, other]

    cs.CL

    Transfer Learning for Low-Resource Neural Machine Translation

    Authors: Barret Zoph, Deniz Yuret, Jonathan May, Kevin Knight

    Abstract: The encoder-decoder framework for neural machine translation (NMT) has been shown effective in large data scenarios, but is much less effective for low-resource languages. We present a transfer learning method that significantly improves Bleu scores across a range of low-resource languages. Our key idea is to first train a high-resource language pair (the parent model), then transfer some of the l…

    Submitted 7 April, 2016; originally announced April 2016.

    Comments: 8 pages

  36. arXiv:1601.00710  [pdf, other]

    cs.CL

    Multi-Source Neural Translation

    Authors: Barret Zoph, Kevin Knight

    Abstract: We build a multi-source machine translation model and train it to maximize the probability of a target English string given French and German sources. Using the neural encoder-decoder framework, we explore several combination methods and report up to +4.8 Bleu increases on top of a very strong attention-based neural translation model.

    Submitted 4 January, 2016; originally announced January 2016.

    Comments: 5 pages, 6 figures

  37. arXiv:1504.06665  [pdf, other]

    cs.CL cs.AI

    Using Syntax-Based Machine Translation to Parse English into Abstract Meaning Representation

    Authors: Michael Pust, Ulf Hermjakob, Kevin Knight, Daniel Marcu, Jonathan May

    Abstract: We present a parser for Abstract Meaning Representation (AMR). We treat English-to-AMR conversion within the framework of string-to-tree, syntax-based machine translation (SBMT). To make this work, we transform the AMR structure into a form suitable for the mechanics of SBMT and useful for modeling. We introduce an AMR-specific language model and add data and features drawn from semantic resources…

    Submitted 28 April, 2015; v1 submitted 24 April, 2015; originally announced April 2015.

    Comments: 10 pages, 8 figures

    ACM Class: I.2.7

  38. arXiv:cs/0302032  [pdf]

    cs.CL

    Empirical Methods for Compound Splitting

    Authors: Philipp Koehn, Kevin Knight

    Abstract: Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on performance of statistical MT systems. Results show accuracy of 99.1% and performance gains for MT of 0.039 BLEU on a German-English noun phrase translation…

    Submitted 22 February, 2003; originally announced February 2003.

    Comments: 8 pages, 2 figures. Published at EACL 2003

    ACM Class: I.2.7

  39. arXiv:cmp-lg/9704003  [pdf, ps]

    cs.CL

    Machine Transliteration

    Authors: Kevin Knight, Jonathan Graehl

    Abstract: It is challenging to translate names and technical terms across languages with different alphabets and sound inventories. These items are commonly transliterated, i.e., replaced with approximate phonetic equivalents. For example, "computer" in English comes out as "konpyuutaa" in Japanese. Translating such items from Japanese back to English is even more challenging, and of practical interest, a…

    Submitted 14 April, 1997; originally announced April 1997.

    Comments: 8 pages, postscript, to appear, ACL-97/EACL-97

  40. arXiv:cmp-lg/9506011  [pdf, ps]

    cs.CL

    Unification-Based Glossing

    Authors: Vasileios Hatzivassiloglou, Kevin Knight

    Abstract: We present an approach to syntax-based machine translation that combines unification-style interpretation with statistical processing. This approach enables us to translate any Japanese newspaper article into English, with quality far better than a word-for-word translation. Novel ideas include the use of feature structures to encode word lattices and the use of unification to compose and manipu…

    Submitted 9 June, 1995; originally announced June 1995.

    Comments: 8 pages, Compressed and uuencoded postscript. To appear: IJCAI-95

  41. arXiv:cmp-lg/9506010  [pdf, ps]

    cs.CL

    Two-level, Many-Paths Generation

    Authors: Kevin Knight, Vasileios Hatzivassiloglou

    Abstract: Large-scale natural language generation requires the integration of vast amounts of knowledge: lexical, grammatical, and conceptual. A robust generator must be able to operate well even when pieces of knowledge are missing. It must also be robust against incomplete or inaccurate inputs. To attack these problems, we have built a hybrid generator, in which gaps in symbolic knowledge are filled by…

    Submitted 9 June, 1995; originally announced June 1995.

    Comments: 9 pages, Compressed and uuencoded postscript. To appear: ACL-95

  42. arXiv:cmp-lg/9506009  [pdf, ps]

    cs.CL

    Filling Knowledge Gaps in a Broad-Coverage Machine Translation System

    Authors: Kevin Knight, Ishwar Chander, Matthew Haines, Vasileios Hatzivassiloglou, Eduard Hovy, Masayo Iida, Steve K. Luk, Richard Whitney, Kenji Yamada

    Abstract: Knowledge-based machine translation (KBMT) techniques yield high quality in domains with detailed semantic models, limited vocabulary, and controlled input grammar. Scaling up along these dimensions means acquiring large knowledge resources. It also means behaving reasonably when definitive knowledge is not yet available. This paper describes how we can fill various KBMT knowledge gaps, often us…

    Submitted 9 June, 1995; originally announced June 1995.

    Comments: 7 pages, Compressed and uuencoded postscript. To appear: IJCAI-95

  43. arXiv:cmp-lg/9409001  [pdf, ps]

    cs.CL

    Integrating Knowledge Bases and Statistics in MT

    Authors: Kevin Knight, Ishwar Chander, Matthew Haines, Vasileios Hatzivassiloglou, Eduard Hovy, Masayo Iida, Steve K. Luk, Akitoshi Okumura, Richard Whitney, Kenji Yamada

    Abstract: We summarize recent machine translation (MT) research at the Information Sciences Institute of USC, and we describe its application to the development of a Japanese-English newspaper MT system. Our work aims at scaling up grammar-based, knowledge-based MT techniques. This scale-up involves the use of statistical methods, both in acquiring effective knowledge resources and in making reasonable li…

    Submitted 5 September, 1994; originally announced September 1994.

    Comments: 8 pages, compressed, uuencoded postscript

    Journal ref: Proc Association for Machine Translation in the Americas (AMTA-94)

  44. arXiv:cmp-lg/9407029  [pdf, ps]

    cs.CL

    Building a Large-Scale Knowledge Base for Machine Translation

    Authors: Kevin Knight, Steve K. Luk

    Abstract: Knowledge-based machine translation (KBMT) systems have achieved excellent results in constrained domains, but have not yet scaled up to newspaper text. The reason is that knowledge resources (lexicons, grammar rules, world models) must be painstakingly handcrafted from scratch. One of the hypotheses being tested in the PANGLOSS machine translation project is whether or not these resources can b…

    Submitted 29 July, 1994; originally announced July 1994.

    Comments: 6 pages, Compressed and uuencoded postscript. To appear: AAAI-94

  45. arXiv:cmp-lg/9407028  [pdf, ps]

    cs.CL

    Automated Postediting of Documents

    Authors: Kevin Knight, Ishwar Chander

    Abstract: Large amounts of low- to medium-quality English texts are now being produced by machine translation (MT) systems, optical character readers (OCR), and non-native speakers of English. Most of this text must be postedited by hand before it sees the light of day. Improving text quality is tedious work, but its automation has not received much research attention. Anyone who has postedited a technica…

    Submitted 29 July, 1994; originally announced July 1994.

    Comments: 6 pages, Compressed and uuencoded postscript. To appear: AAAI-94