Showing 1–50 of 103 results for author: Berg-Kirkpatrick, T

Searching in archive cs.
  1. arXiv:2505.08175

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Fast Text-to-Audio Generation with Adversarial Post-Training

    Authors: Zachary Novack, Zach Evans, Zack Zukowski, Josiah Taylor, CJ Carr, Julian Parker, Adnan Al-Sinan, Gian Marco Iodice, Julian McAuley, Taylor Berg-Kirkpatrick, Jordi Pons

    Abstract: Text-to-audio systems, while increasingly performant, are slow at inference time, thus making their latency impractical for many creative applications. We present Adversarial Relativistic-Contrastive (ARC) post-training, the first adversarial acceleration algorithm for diffusion/flow models not based on distillation. While past adversarial post-training methods have struggled to compare against th…

    Submitted 14 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  2. arXiv:2504.03101

    cs.CL

    Single-Pass Document Scanning for Question Answering

    Authors: Weili Cao, Jianyou Wang, Youze Zheng, Longtian Bao, Qirui Zheng, Taylor Berg-Kirkpatrick, Ramamohan Paturi, Leon Bergen

    Abstract: Handling extremely large documents for question answering is challenging: chunk-based embedding methods often lose track of important global context, while full-context transformers can be prohibitively expensive for hundreds of thousands of tokens. We propose a single-pass document scanning approach that processes the entire text in linear time, preserving global coherence while deciding which se…

    Submitted 3 April, 2025; originally announced April 2025.

  3. arXiv:2504.00369

    cs.SD

    Are you really listening? Boosting Perceptual Awareness in Music-QA Benchmarks

    Authors: Yongyi Zang, Sean O'Brien, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack

    Abstract: Large Audio Language Models (LALMs), where pretrained text LLMs are finetuned with audio input, have made remarkable progress in music understanding. However, current evaluation methodologies exhibit critical limitations: on the leading Music Question Answering benchmark, MuchoMusic, text-only LLMs without audio perception capabilities achieve surprisingly high accuracy of up to 56.4%, on par or a…

    Submitted 31 March, 2025; originally announced April 2025.

  4. arXiv:2502.15849

    cs.AI cs.LO cs.SD

    Deriving Representative Structure from Music Corpora

    Authors: Ilana Shapiro, Ruanqianqian Huang, Zachary Novack, Cheng-i Wang, Hao-Wen Dong, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Sorin Lerner

    Abstract: Western music is an innately hierarchical system of interacting levels of structure, from fine-grained melody to high-level form. In order to analyze music compositions holistically and at multiple granularities, we propose a unified, hierarchical meta-representation of musical structure called the structural temporal graph (STG). For a single piece, the STG is a data structure that defines a hier…

    Submitted 30 March, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 12 pages, 8 figures, 7 tables

    ACM Class: G.1.6; I.2.4; J.5; G.2.2

  5. arXiv:2411.00412

    cs.LG cs.AI cs.CL

    Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

    Authors: Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu

    Abstract: Large Language Models (LLMs) demonstrate promising capabilities in solving simple scientific problems but, even with domain-specific fine-tuning, often produce hallucinations for complex ones. While integrating LLMs with tools can mitigate this reliability issue, models finetuned on tool usage only often over-rely on them, incurring unnecessary costs from resource-intensive scientific tools even f…

    Submitted 5 February, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

    Comments: 32 pages, 16 figures

    ACM Class: I.2.6; I.2.7

  6. arXiv:2410.16701

    cs.LG

    ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models

    Authors: Veeramakali Vignesh Manivannan, Yasaman Jafari, Srikar Eranky, Spencer Ho, Rose Yu, Duncan Watson-Parris, Yian Ma, Leon Bergen, Taylor Berg-Kirkpatrick

    Abstract: The use of Large Language Models (LLMs) in climate science has recently gained significant attention. However, a critical issue remains: the lack of a comprehensive evaluation framework capable of assessing the quality and scientific validity of model outputs. To address this issue, we develop ClimaGen (Climate QA Generator), an adaptive learning framework that generates question-answer pairs from…

    Submitted 9 March, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: Accepted to ICLR 2025

    Journal ref: ICLR 2025

  7. arXiv:2410.14923

    cs.CR

    Imprompter: Tricking LLM Agents into Improper Tool Use

    Authors: Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, Earlence Fernandes

    Abstract: Large Language Model (LLM) Agents are an emerging computing paradigm that blends generative machine learning with tools such as code interpreters, web browsing, email, and more generally, external resources. These agent-based systems represent an emerging shift in personal computing. We contribute to the security foundations of agent-based systems and surface a new class of automatically computed…

    Submitted 21 October, 2024; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: website: https://imprompter.ai; code: https://github.com/Reapor-Yurnero/imprompter; v2 changelog: add new results to Table 3, correct several typos

  8. arXiv:2410.05586

    cs.CV cs.AI

    TeaserGen: Generating Teasers for Long Documentaries

    Authors: Weihan Xu, Paul Pu Liang, Haven Kim, Julian McAuley, Taylor Berg-Kirkpatrick, Hao-Wen Dong

    Abstract: Teasers are an effective tool for promoting content in entertainment, commercial and educational fields. However, creating an effective teaser for long videos is challenging for it requires long-range multimodal modeling on the input videos, while necessitating maintaining audiovisual alignments, managing scene changes and preserving factual accuracy for the output teasers. Due to the lack of a pu…

    Submitted 9 November, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

  9. arXiv:2410.05167

    cs.SD cs.AI cs.LG eess.AS

    Presto! Distilling Steps and Layers for Accelerating Music Generation

    Authors: Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan

    Abstract: Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion mo…

    Submitted 16 April, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted as Spotlight at ICLR 2025

  10. arXiv:2410.02084

    cs.SD eess.AS

    Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset

    Authors: Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Hao-Wen Dong

    Abstract: Recent years have seen many audio-domain text-to-music generation models that rely on large amounts of text-audio pairs for training. However, symbolic-domain controllable music generation has lagged behind partly due to the lack of a large-scale symbolic music dataset with extensive metadata and captions. In this work, we present MetaScore, a new dataset consisting of 963K musical scores paired w…

    Submitted 21 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

  11. arXiv:2409.10831

    cs.SD cs.AI cs.LG cs.MM eess.AS

    PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing

    Authors: Phillip Long, Zachary Novack, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: The recent explosion of generative AI-Music systems has raised numerous concerns over data copyright, licensing music from musicians, and the conflict between open-source AI and large prestige companies. Such issues highlight the need for publicly available, copyright-free musical data, in which there is a large shortage, particularly for symbolic music data. To alleviate this issue, we present PD…

    Submitted 16 March, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: Accepted to 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  12. arXiv:2408.16126

    cs.SD cs.AI cs.LG eess.AS

    Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation

    Authors: Ke Chen, Jiaqi Su, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Zeyu Jin

    Abstract: Achieving robust speech separation for overlapping speakers in various acoustic environments with noise and reverberation remains an open challenge. Although existing datasets are available to train separators for specific scenarios, they do not effectively generalize across diverse real-world scenarios. In this paper, we present a novel data simulation pipeline that produces diverse training data…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: In Proceedings of the 25th Annual Conference of the International Speech Communication Association, Interspeech 2024

  13. arXiv:2408.04628

    cs.CL cs.AI cs.CV

    LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP

    Authors: Danlu Chen, Freda Shi, Aditi Agarwal, Jacobo Myerston, Taylor Berg-Kirkpatrick

    Abstract: Standard natural language processing (NLP) pipelines operate on symbolic representations of language, which typically consist of sequences of discrete tokens. However, creating an analogous representation for ancient logographic writing systems is an extremely labor-intensive process that requires expert knowledge. At present, a large portion of logographic data persists in a purely visual form du…

    Submitted 8 August, 2024; originally announced August 2024.

    Journal ref: ACL 2024, long paper

  14. arXiv:2405.21047

    cs.AI cs.CL cs.LG

    Grammar-Aligned Decoding

    Authors: Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, Loris D'Antoni

    Abstract: Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint. Specifically, in grammar-constrained decoding (GCD), the LLM's o…

    Submitted 4 November, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted to NeurIPS 2024

  15. arXiv:2405.20289

    cs.SD cs.AI cs.LG

    DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

    Authors: Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas Bryan

    Abstract: Controllable music generation methods are critical for human-centered AI-based music creation, but are currently limited by speed, quality, and control design trade-offs. Diffusion Inference-Time T-optimization (DITTO), in particular, offers state-of-the-art results, but is over 10x slower than real-time, limiting practical use. We propose Distilled Diffusion Inference-Time T-Optimization (or DIT…

    Submitted 30 May, 2024; originally announced May 2024.

  16. arXiv:2405.15880

    cs.PL cs.AI

    HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis

    Authors: Shraddha Barke, Emmanuel Anaya Gonzalez, Saketh Ram Kasibatla, Taylor Berg-Kirkpatrick, Nadia Polikarpova

    Abstract: Many structured prediction and reasoning tasks can be framed as program synthesis problems, where the goal is to generate a program in a domain-specific language (DSL) that transforms input data into the desired output. Unfortunately, purely neural approaches, such as large language models (LLMs), often fail to produce fully correct programs in unfamiliar DSLs, while purely symbolic methods based…

    Submitted 31 October, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted at NeurIPS 2024

  17. arXiv:2405.00752

    cs.DL

    Clustering Running Titles to Understand the Printing of Early Modern Books

    Authors: Nikolai Vogler, Kartik Goyal, Samuel V. Lemley, D. J. Schuldt, Christopher N. Warren, Max G'Sell, Taylor Berg-Kirkpatrick

    Abstract: We propose a novel computational approach to automatically analyze the physical process behind printing of early modern letterpress books via clustering the running titles found at the top of their pages. Specifically, we design and compare custom neural and feature-based kernels for computing pairwise visual similarity of a scanned document's running titles and cluster the titles in order to trac…

    Submitted 22 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted at ICDAR 2024; updated Acknowledgments in v2

  18. arXiv:2402.11711

    cs.CL

    MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization

    Authors: Yasaman Jafari, Dheeraj Mekala, Rose Yu, Taylor Berg-Kirkpatrick

    Abstract: RL-based techniques can be employed to search for prompts that, when fed into a target language model, maximize a set of user-specified reward functions. However, in many target applications, the natural reward functions are in tension with one another -- for example, content preservation vs. style matching in style transfer tasks. Current techniques focus on maximizing the average of reward funct…

    Submitted 16 October, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  19. arXiv:2401.12179

    cs.SD cs.AI cs.LG eess.AS

    DITTO: Diffusion Inference-Time T-Optimization for Music Generation

    Authors: Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan

    Abstract: We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose framework for controlling pre-trained text-to-music diffusion models at inference-time via optimizing initial noise latents. Our method can be used to optimize through any differentiable feature matching loss to achieve a target (stylized) output and leverages gradient checkpointing for memory efficiency. We demonstrate…

    Submitted 3 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Oral at ICML 2024

  20. arXiv:2312.04510

    cs.CL cs.LG

    A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation

    Authors: Jarad Forristal, Niloofar Mireshghallah, Greg Durrett, Taylor Berg-Kirkpatrick

    Abstract: Recent work has shown that energy-based language modeling is an effective framework for controllable text generation because it enables flexible integration of arbitrary discriminators. However, because energy-based LMs are globally normalized, approximate techniques like Metropolis-Hastings (MH) are required for inference. Past work has largely explored simple proposal distributions that modify a…

    Submitted 7 December, 2023; originally announced December 2023.

  21. arXiv:2310.16303

    cs.CL cs.IR

    URL-BERT: Training Webpage Representations via Social Media Engagements

    Authors: Ayesha Qamar, Chetan Verma, Ahmed El-Kishky, Sumit Binnani, Sneha Mehta, Taylor Berg-Kirkpatrick

    Abstract: Understanding and representing webpages is crucial to online social networks where users may share and engage with URLs. Common language model (LM) encoders such as BERT can be used to understand and represent the textual content of webpages. However, these representations may not model thematic information of web domains and URLs or accurately capture their appeal to social media users. In this w…

    Submitted 24 October, 2023; originally announced October 2023.

  22. arXiv:2310.10772

    cs.SD cs.LG cs.MM eess.AS

    Unsupervised Lead Sheet Generation via Semantic Compression

    Authors: Zachary Novack, Nikita Srivatsan, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: Lead sheets have become commonplace in generative music research, being used as an initial compressed representation for downstream tasks like multitrack music generation and automatic arrangement. Despite this, researchers have often fallen back on deterministic reduction methods (such as the skyline algorithm) to generate lead sheets when seeking paired lead sheets and full scores, with little a…

    Submitted 16 October, 2023; originally announced October 2023.

  23. arXiv:2310.08049

    cs.LG

    Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability

    Authors: Ivan Lee, Nan Jiang, Taylor Berg-Kirkpatrick

    Abstract: What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps toward answering this question. We evaluate thirteen model architectures capable of causal language modeling across a suite of synthetic in-context learning tasks. These selected architectures represent a broad range of paradigms, including recurrent…

    Submitted 1 April, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  24. arXiv:2310.03185

    cs.CR cs.AI

    Misusing Tools in Large Language Models With Visual Adversarial Examples

    Authors: Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes

    Abstract: Large Language Models (LLMs) are being enhanced with the ability to use tools and to process multiple modalities. These new capabilities bring new benefits and also new security risks. In this work, we show that an attacker can use visual adversarial examples to cause attacker-desired tool usage. For example, the attacker could cause a victim LLM to delete calendar events, leak private conversatio…

    Submitted 4 October, 2023; originally announced October 2023.

  25. arXiv:2308.02723

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

    Authors: Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity on the trailing harmonic…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 7 pages, 4 figures, 2 tables, Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023

  26. arXiv:2308.01546

    cs.SD cs.AI cs.LG cs.MM eess.AS

    MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

    Authors: Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: 16 pages, 3 figures, 2 tables, demo page: https://musicldm.github.io/

  27. arXiv:2306.09635

    cs.SD cs.LG cs.MM eess.AS eess.SP

    CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models

    Authors: Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge…

    Submitted 23 July, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted by WASPAA 2023. Demo: https://salu133445.github.io/clipsonic/

  28. arXiv:2306.07998

    cs.CV cs.AI

    Contrastive Attention Networks for Attribution of Early Modern Print

    Authors: Nikolai Vogler, Kartik Goyal, Kishore PV Reddy, Elizaveta Pertseva, Samuel V. Lemley, Christopher N. Warren, Max G'Sell, Taylor Berg-Kirkpatrick

    Abstract: In this paper, we develop machine learning techniques to identify unknown printers in early modern (c. 1500–1800) English printed books. Specifically, we focus on matching uniquely damaged character type-imprints in anonymously printed books to works with known printers in order to provide evidence of their origins. Until now, this work has been limited to manual investigations by analytical bibl…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Proceedings of AAAI 2023

  29. arXiv:2305.18462

    cs.CL cs.CR cs.LG

    Membership Inference Attacks against Language Models via Neighbourhood Comparison

    Authors: Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan, Taylor Berg-Kirkpatrick

    Abstract: Membership Inference attacks (MIAs) aim to predict whether a data sample was present in the training data of a machine learning model or not, and are widely used for assessing the privacy risks of language models. Most existing attacks rely on the observation that models tend to assign higher probabilities to their training samples than non-training points. However, simple thresholding of the mode…

    Submitted 7 August, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  30. arXiv:2305.14779

    cs.CV cs.CL cs.LG

    Alt-Text with Context: Improving Accessibility for Images on Twitter

    Authors: Nikita Srivatsan, Sofia Samaniego, Omar Florez, Taylor Berg-Kirkpatrick

    Abstract: In this work we present an approach for generating alternative text (or alt-text) descriptions for images shared on social media, specifically Twitter. More than just a special case of image captioning, alt-text is both more literally descriptive and context-specific. Also critically, images posted to Twitter are often accompanied by user-written text that despite not necessarily describing the im…

    Submitted 29 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: ICLR 2024

  31. arXiv:2305.09859

    cs.CL cs.LG

    Smaller Language Models are Better Black-box Machine-Generated Text Detectors

    Authors: Niloofar Mireshghallah, Justus Mattern, Sicun Gao, Reza Shokri, Taylor Berg-Kirkpatrick

    Abstract: With the advent of fluent generative language models that can produce convincing utterances very similar to those written by humans, distinguishing whether a piece of text is machine-generated or human-written becomes more challenging and more important, as such models could be used to spread misinformation, fake news, fake reviews and to mimic certain authors and figures. To this end, there have…

    Submitted 24 February, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

  32. arXiv:2305.07447

    cs.SD eess.AS

    Universal Source Separation with Weakly Labelled Data

    Authors: Qiuqiang Kong, Ke Chen, Haohe Liu, Xingjian Du, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Mark D. Plumbley

    Abstract: Universal source separation (USS) is a fundamental research task for computational auditory scene analysis, which aims to separate mono recordings into individual source tracks. There are three potential challenges awaiting the solution to the audio source separation task. First, previous audio source separation systems mainly focus on separating one or a limited number of specific sources. There…

    Submitted 11 May, 2023; originally announced May 2023.

  33. arXiv:2301.04253

    cs.CL cs.AI cs.IR cs.LG

    Towards Answering Climate Questionnaires from Unstructured Climate Reports

    Authors: Daniel Spokoyny, Tanmay Laud, Tom Corringham, Taylor Berg-Kirkpatrick

    Abstract: The topic of Climate Change (CC) has received limited attention in NLP despite its urgency. Activists and policymakers need NLP tools to effectively process the vast and rapidly growing unstructured textual climate reports into structured form. To tackle this challenge we introduce two new large-scale climate questionnaire datasets and use their existing structure to train self-supervised models.…

    Submitted 27 July, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

  34. arXiv:2212.10726

    cs.CL cs.LG

    Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval

    Authors: John Wieting, Jonathan H. Clark, William W. Cohen, Graham Neubig, Taylor Berg-Kirkpatrick

    Abstract: Contrastive learning has been successfully used for retrieval of semantically aligned sentences, but it often requires large batch sizes or careful engineering to work well. In this paper, we instead propose a generative model for learning multilingual text embeddings which can be used to retrieve or score sentence pairs. Our model operates on parallel data in N languages and, through an approxi…

    Submitted 4 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Published as a long paper at ACL 2023

  35. arXiv:2212.07065

    cs.SD cs.CV cs.LG cs.MM eess.AS

    CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

    Authors: Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick

    Abstract: Recent years have seen progress beyond domain-specific sound separation for speech or music towards universal sound separation for arbitrary sounds. Prior work on universal sound separation has investigated separating a target sound out of an audio mixture given a text query. Such text-queried sound separation systems provide a natural and scalable interface for specifying arbitrary target sounds.…

    Submitted 3 March, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: Accepted by ICLR 2023. Audio samples can be found at https://sony.github.io/CLIPSep/

  36. arXiv:2211.06687

    cs.SD eess.AS

    Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

    Authors: Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a pipeline of contrastive language-audio pretraining to develop an audio representation by combining audio data with natural language descriptions. To accomplish this target, we first release LAION-Audio-630K, a large collection of 633,526 audio-text pairs from different…

    Submitted 21 March, 2024; v1 submitted 12 November, 2022; originally announced November 2022.

  37. arXiv:2209.05706

    cs.CL

    Non-Parametric Temporal Adaptation for Social Media Topic Classification

    Authors: Fatemehsadat Mireshghallah, Nikolai Vogler, Junxian He, Omar Florez, Ahmed El-Kishky, Taylor Berg-Kirkpatrick

    Abstract: User-generated social media data is constantly changing as new trends influence online discussion and personal information is deleted due to privacy concerns. However, most current NLP models are static and rely on fixed training data, which means they are unable to adapt to temporal change -- both test distribution shift and deleted training data -- without frequent, costly re-training. In this p…

    Submitted 15 May, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

  38. arXiv:2209.05622

    cs.LG

    Checklist Models for Improved Output Fluency in Piano Fingering Prediction

    Authors: Nikita Srivatsan, Taylor Berg-Kirkpatrick

    Abstract: In this work we present a new approach for the task of predicting fingerings for piano music. While prior neural approaches have often treated this as a sequence tagging problem with independent predictions, we put forward a checklist system, trained via reinforcement learning, that maintains a representation of recent predictions in addition to a hidden state, allowing it to learn soft constraint…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: ISMIR 2022

  39. arXiv:2209.02871

    cs.SD cs.MM eess.AS

    Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments

    Authors: Ke Chen, Hao-Wen Dong, Yi Luo, Julian McAuley, Taylor Berg-Kirkpatrick, Miller Puckette, Shlomo Dubnov

    Abstract: Choral music separation refers to the task of extracting tracks of voice parts (e.g., soprano, alto, tenor, and bass) from mixed audio. The lack of datasets has impeded research on this topic as previous work has only been able to train and evaluate models on a few minutes of choral music data due to copyright issues and dataset collection difficulties. In this paper, we investigate the use of syn…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: Camera Ready for Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022

    Journal ref: The 23rd International Society for Music Information Retrieval Conference, 2022

  40. arXiv:2207.06983

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Multitrack Music Transformer

    Authors: Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, Taylor Berg-Kirkpatrick

    Abstract: Existing approaches for generating multitrack music with transformer models have been limited in terms of the number of instruments, the length of the music segments and slow inference. This is partly due to the memory requirements of the lengthy input sequences necessitated by existing representations. In this work, we propose a new multitrack music representation that allows a diverse set of ins…

    Submitted 24 May, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted by ICASSP 2023. Demo: https://salu133445.github.io/mmt/ . Code: https://github.com/salu133445/mmt

  41. arXiv:2205.12506

    cs.CL cs.LG

    Memorization in NLP Fine-tuning Methods

    Authors: Fatemehsadat Mireshghallah, Archit Uniyal, Tianhao Wang, David Evans, Taylor Berg-Kirkpatrick

    Abstract: Large language models are shown to present privacy risks through memorization of training data, and several recent works have studied such risks for the pre-training phase. Little attention, however, has been given to the fine-tuning phase and it is not well understood how different fine-tuning methods (such as fine-tuning the full model, the model head, and adapter) compare in terms of memorizati…

    Submitted 3 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  42. arXiv:2205.00049

    cs.CL cs.LG

    Prompt Consistency for Zero-Shot Task Generalization

    Authors: Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig

    Abstract: One of the most impressive results of recent NLP history is the ability of pre-trained language models to solve new tasks in a zero-shot setting. To achieve this, NLP tasks are framed as natural language prompts, generating a response indicating the predicted output. Nonetheless, the performance in such settings often lags far behind its supervised counterpart, suggesting a large space for potenti…

    Submitted 26 December, 2022; v1 submitted 29 April, 2022; originally announced May 2022.

    Comments: EMNLP 2022 Findings. Code is available at https://github.com/violet-zct/swarm-distillation-zero-shot

  43. arXiv:2203.13299

    cs.CL cs.LG

    Mix and Match: Learning-free Controllable Text Generation using Energy Language Models

    Authors: Fatemehsadat Mireshghallah, Kartik Goyal, Taylor Berg-Kirkpatrick

    Abstract: Recent work on controlled text generation has either required attribute-based fine-tuning of the base language model (LM), or has restricted the parameterization of the attribute discriminator to be compatible with the base autoregressive LM. In this work, we propose Mix and Match LM, a global score-based alternative for controllable text generation that combines arbitrary pre-trained black-box mo…

    Submitted 4 April, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Camera ready--ACL 2022 (minor edits)

  44. arXiv:2203.11399

    cs.CL

    Achieving Conversational Goals with Unsupervised Post-hoc Knowledge Injection

    Authors: Bodhisattwa Prasad Majumder, Harsh Jhamtani, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: A limitation of current neural dialog models is that they tend to suffer from a lack of specificity and informativeness in generated responses, primarily due to dependence on training data that covers a limited variety of scenarios and conveys limited knowledge. One way to alleviate this issue is to extract relevant knowledge from external sources at decoding time and incorporate it into the dialo…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted at ACL 2022 main conference

  45. arXiv:2203.03929

    cs.LG cs.AI cs.CR

    Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks

    Authors: Fatemehsadat Mireshghallah, Kartik Goyal, Archit Uniyal, Taylor Berg-Kirkpatrick, Reza Shokri

    Abstract: The wide adoption and application of Masked language models (MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities -- to what extent do MLMs leak information about their training data? Prior attempts at measuring leakage of MLMs via membership inference attacks have been inconclusive, implying the potential robustness…

    Submitted 3 November, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

  46. arXiv:2202.06034

    cs.SD cs.LG cs.MM eess.AS eess.SP

    Deep Performer: Score-to-Audio Music Performance Synthesis

    Authors: Hao-Wen Dong, Cong Zhou, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer -- a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes. Hence, we propose two new techniques for handling polyphonic inputs and providing…

    Submitted 20 February, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

    Comments: ICASSP 2022 final version with appendix

  47. arXiv:2202.00951

    eess.AS cs.AI cs.LG cs.MM cs.SD

    TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

    Authors: Ke Chen, Shuai Yu, Cheng-i Wang, Wei Li, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Singing melody extraction is an important problem in the field of music information retrieval. Existing methods typically rely on frequency-domain representations to estimate the sung frequencies. However, this design does not lead to human-level performance in the perception of melody information for both tone (pitch-class) and octave. In this paper, we propose TONet, a plug-and-play model that i…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: Preprint Version for ICASSP 2022, Singapore

  48. arXiv:2202.00874

    cs.SD cs.AI cs.IR cs.LG eess.AS

    HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

    Authors: Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Audio classification is an important task of mapping audio samples into their corresponding labels. Recently, the transformer model with self-attention mechanisms has been adopted in this field. However, existing audio transformers require large GPU memories and long training time, meanwhile relying on pretrained vision models to achieve high performance, which limits the model's scalability in au…

    Submitted 1 February, 2022; originally announced February 2022.

    Comments: Preprint version for ICASSP 2022, Singapore

  49. arXiv:2201.02321

    cs.CL

    An Unsupervised Masking Objective for Abstractive Multi-Document News Summarization

    Authors: Nikolai Vogler, Songlin Li, Yujie Xu, Yujian Mi, Taylor Berg-Kirkpatrick

    Abstract: We show that a simple unsupervised masking objective can approach near supervised performance on abstractive multi-document news summarization. Our method trains a state-of-the-art neural summarization model to predict the masked out source document with highest lexical centrality relative to the multi-document group. In experiments on the Multi-News dataset, our masked training objective yields a…

    Submitted 6 January, 2022; originally announced January 2022.

  50. arXiv:2112.08692

    cs.CV cs.CL cs.LG

    Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription

    Authors: Nikolai Vogler, Jonathan Parkes Allen, Matthew Thomas Miller, Taylor Berg-Kirkpatrick

    Abstract: We present a self-supervised pre-training approach for learning rich visual language representations for both handwritten and printed historical document transcription. After supervised fine-tuning of our pre-trained encoder representations for low-resource document transcription on two languages, (1) a heterogeneous set of handwritten Islamicate manuscript images and (2) early modern English prin…

    Submitted 16 December, 2021; originally announced December 2021.