-
AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature
Authors:
Melissa Roemmele,
Kyle Shaffer,
Katrina Olsen,
Yiyi Wang,
Steve DeNeefe
Abstract:
Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the ling…
▽ More
Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
Using Social Media to Predict the Future: A Systematic Literature Review
Authors:
Lawrence Phillips,
Chase Dowling,
Kyle Shaffer,
Nathan Hodas,
Svitlana Volkova
Abstract:
Social media (SM) data provides a vast record of humanity's everyday thoughts, feelings, and actions at a resolution previously unimaginable. Because user behavior on SM is a reflection of events in the real world, researchers have realized they can use SM in order to forecast, making predictions about the future. The advantage of SM data is its relative ease of acquisition, large quantity, and ab…
▽ More
Social media (SM) data provides a vast record of humanity's everyday thoughts, feelings, and actions at a resolution previously unimaginable. Because user behavior on SM is a reflection of events in the real world, researchers have realized they can use SM in order to forecast, making predictions about the future. The advantage of SM data is its relative ease of acquisition, large quantity, and ability to capture socially relevant information, which may be difficult to gather from other data sources. Promising results exist across a wide variety of domains, but one will find little consensus regarding best practices in either methodology or evaluation. In this systematic review, we examine relevant literature over the past decade, tabulate mixed results across a number of scientific disciplines, and identify common pitfalls and best practices. We find that SM forecasting is limited by data biases, noisy data, lack of generalizable results, a lack of domain-specific theory, and underlying complexity in many prediction tasks. But despite these shortcomings, recurring findings and promising results continue to galvanize researchers and demand continued investigation. Based on the existing literature, we identify research practices which lead to success, citing specific examples in each case and making recommendations for best practices. These recommendations will help researchers take advantage of the exciting possibilities offered by SM platforms.
△ Less
Submitted 19 June, 2017;
originally announced June 2017.
-
Beyond Fine Tuning: A Modular Approach to Learning on Small Data
Authors:
Ark Anderson,
Kyle Shaffer,
Artem Yankov,
Court D. Corley,
Nathan O. Hodas
Abstract:
In this paper we present a technique to train neural network models on small amounts of data. Current methods for training neural networks on small amounts of rich data typically rely on strategies such as fine-tuning a pre-trained neural network or the use of domain-specific hand-engineered features. Here we take the approach of treating network layers, or entire networks, as modules and combine…
▽ More
In this paper we present a technique to train neural network models on small amounts of data. Current methods for training neural networks on small amounts of rich data typically rely on strategies such as fine-tuning a pre-trained neural network or the use of domain-specific hand-engineered features. Here we take the approach of treating network layers, or entire networks, as modules and combine pre-trained modules with untrained modules, to learn the shift in distributions between data sets. The central impact of using a modular approach comes from adding new representations to a network, as opposed to replacing representations via fine-tuning. Using this technique, we are able surpass results using standard fine-tuning transfer learning approaches, and we are also able to significantly increase performance over such approaches when using smaller amounts of data.
△ Less
Submitted 5 November, 2016;
originally announced November 2016.