Showing 1–11 of 11 results for author: Cahill, A

  1. arXiv:2412.13511  [pdf, other]

    cs.CL

    CEHA: A Dataset of Conflict Events in the Horn of Africa

    Authors: Rui Bai, Di Lu, Shihao Ran, Elizabeth Olson, Hemank Lamba, Aoife Cahill, Joel Tetreault, Alex Jaimes

    Abstract: Natural Language Processing (NLP) of news articles can play an important role in understanding the dynamics and causes of violent conflict. Despite the availability of datasets categorizing various conflict events, the existing labels often do not cover all of the fine-grained violent conflict event types relevant to areas like the Horn of Africa. In this paper, we introduce a new benchmark datase…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted by COLING 2025

  2. arXiv:2412.13098  [pdf, other]

    cs.CL cs.SI

    Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election

    Authors: Roberto Mondini, Neema Kotonya, Robert L. Logan IV, Elizabeth M Olson, Angela Oduor Lungati, Daniel Duke Odongo, Tim Ombasa, Hemank Lamba, Aoife Cahill, Joel R. Tetreault, Alejandro Jaimes

    Abstract: Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy ma…

    Submitted 17 December, 2024; originally announced December 2024.

    Comments: COLING 2025

  3. arXiv:2410.06370  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    HumVI: A Multilingual Dataset for Detecting Violent Incidents Impacting Humanitarian Aid

    Authors: Hemank Lamba, Anton Abilov, Ke Zhang, Elizabeth M. Olson, Henry K. Dambanemuya, João C. Bárcia, David S. Batista, Christina Wille, Aoife Cahill, Joel Tetreault, Alex Jaimes

    Abstract: Humanitarian organizations can enhance their effectiveness by analyzing data to discover trends, gather aggregated insights, manage their security risks, support decision-making, and inform advocacy and funding proposals. However, data about violent incidents with direct impact and relevance for humanitarian aid operations is not readily available. An automatic data collection and NLP-backed class…

    Submitted 15 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

  4. arXiv:2406.02416  [pdf, other]

    cs.LG cs.DC

    Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials

    Authors: Jonathan Scott, Áine Cahill

    Abstract: In practice, training using federated learning can be orders of magnitude slower than standard centralized training. This severely limits the amount of experimentation and tuning that can be done, making it challenging to obtain good performance on a given task. Server-side proxy data can be used to run training simulations, for instance for hyperparameter tuning. This can greatly speed up the tra…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2404.06430  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    pfl-research: simulation framework for accelerating research in Private Federated Learning

    Authors: Filip Granqvist, Congzheng Song, Áine Cahill, Rogier van Dalen, Martin Pelikan, Yi Sheng Chan, Xiaojun Feng, Natarajan Krishnaswami, Vojta Jina, Mona Chitnis

    Abstract: Federated learning (FL) is an emerging machine learning (ML) training paradigm where clients own their data and collaborate to train a global model, without revealing any data to the server and other participants. Researchers commonly perform experiments in a simulation environment to quickly iterate on ideas. However, existing open-source tools do not offer the efficiency required to simulate FL…

    Submitted 10 December, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  6. arXiv:2307.15017  [pdf, other]

    cs.CR cs.LG

    Samplable Anonymous Aggregation for Private Federated Data Analysis

    Authors: Kunal Talwar, Shan Wang, Audra McMillan, Vojta Jina, Vitaly Feldman, Pansy Bansal, Bailey Basile, Aine Cahill, Yi Sheng Chan, Mike Chatzidakis, Junye Chen, Oliver Chick, Mona Chitnis, Suman Ganta, Yusuf Goren, Filip Granqvist, Kristine Guo, Frederic Jacobs, Omid Javidbakht, Albert Liu, Richard Low, Dan Mascenik, Steve Myers, David Park, Wonhee Park, et al. (12 additional authors not shown)

    Abstract: We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Locally differentially private algorithms require little trust but are (provably) limited in their utility. Centrally differentially private algorithms can allow significantly better utility but require a trusted curator. This gap has led to signific…

    Submitted 18 July, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: 34 pages

  7. arXiv:2306.17695  [pdf, other]

    cs.CL

    A New Task and Dataset on Detecting Attacks on Human Rights Defenders

    Authors: Shihao Ran, Di Lu, Joel Tetreault, Aoife Cahill, Alejandro Jaimes

    Abstract: The ability to conduct retrospective analyses of attacks on human rights defenders over time and by location is important for humanitarian organizations to better understand historical or ongoing human rights violations and thus better manage the global impact of such events. We hypothesize that NLP can support such efforts by quickly processing large collections of news articles to detect and sum…

    Submitted 30 June, 2023; originally announced June 2023.

  8. arXiv:2203.09943  [pdf, other]

    cs.CR cs.CL cs.LG

    Training a Tokenizer for Free with Private Federated Learning

    Authors: Eugene Bagdasaryan, Congzheng Song, Rogier van Dalen, Matt Seigel, Áine Cahill

    Abstract: Federated learning with differential privacy, i.e. private federated learning (PFL), makes it possible to train models on private data distributed across users' devices without harming privacy. PFL is efficient for models, such as neural networks, that have a fixed number of parameters, and thus a fixed-dimensional gradient vector. Such models include neural-net language models, but not tokenizers…

    Submitted 15 March, 2022; originally announced March 2022.

  9. arXiv:2102.08503  [pdf, other]

    cs.LG

    Federated Evaluation and Tuning for On-Device Personalization: System Design & Applications

    Authors: Matthias Paulik, Matt Seigel, Henry Mason, Dominic Telaar, Joris Kluivers, Rogier van Dalen, Chi Wai Lau, Luke Carlson, Filip Granqvist, Chris Vandevelde, Sudeep Agarwal, Julien Freudiger, Andrew Byde, Abhishek Bhowmick, Gaurav Kapoor, Si Beaumont, Áine Cahill, Dominic Hughes, Omid Javidbakht, Fei Dong, Rehan Rishi, Stanley Hung

    Abstract: We describe the design of our federated task processing system. Originally, the system was created to support two specific federated tasks: evaluation and tuning of on-device ML systems, primarily for the purpose of personalizing these systems. In recent years, support for an additional federated task has been added: federated learning (FL) of deep neural networks. To our knowledge, only one other…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: 11 pages, 1 figure

  10. arXiv:2008.02651  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Improving on-device speaker verification using federated learning with privacy

    Authors: Filip Granqvist, Matt Seigel, Rogier van Dalen, Áine Cahill, Stephen Shum, Matthias Paulik

    Abstract: Information on speaker characteristics can be useful as side information in improving speaker recognition accuracy. However, such information is often private. This paper investigates how privacy-preserving learning can improve a speaker verification system, by enabling the use of privacy-sensitive speaker data to train an auxiliary classification model that predicts vocal characteristics of speak…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: To appear in proceedings of INTERSPEECH 2020

  11. arXiv:1403.0801  [pdf, other]

    cs.CL

    Is getting the right answer just about choosing the right words? The role of syntactically-informed features in short answer scoring

    Authors: Derrick Higgins, Chris Brew, Michael Heilman, Ramon Ziai, Lei Chen, Aoife Cahill, Michael Flor, Nitin Madnani, Joel Tetreault, Daniel Blanchard, Diane Napolitano, Chong Min Lee, John Blackmore

    Abstract: Developments in the educational landscape have spurred greater interest in the problem of automatically scoring short answer questions. A recent shared task on this topic revealed a fundamental divide in the modeling approaches that have been applied to this problem, with the best-performing systems split between those that employ a knowledge engineering approach and those that almost solely lever…

    Submitted 5 March, 2014; v1 submitted 4 March, 2014; originally announced March 2014.