Showing 1–7 of 7 results for author: D'Amico-Wong, L

  1. arXiv:2407.21530

    cs.CL cs.LG

    Data Contamination Report from the 2024 CONDA Shared Task

    Authors: Oscar Sainz, Iker García-Ferrero, Alon Jacovi, Jon Ander Campos, Yanai Elazar, Eneko Agirre, Yoav Goldberg, Wei-Lin Chen, Jenny Chim, Leshem Choshen, Luca D'Amico-Wong, Melissa Dell, Run-Ze Fan, Shahriar Golchin, Yucheng Li, Pengfei Liu, Bhavish Pahwa, Ameya Prabhu, Suryansh Sharma, Emily Silcock, Kateryna Solonko, David Stap, Mihai Surdeanu, Yu-Min Tseng, Vishaal Udandarao, et al. (3 additional authors not shown)

    Abstract: The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora used to train large scale models, compromising evaluation results. The workshop fostered a shared task to collect evidence on data contamination in cur…

    Submitted 4 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Database

  2. arXiv:2406.09490

    cs.CL econ.GN

    Newswire: A Large-Scale Structured Database of a Century of Historical News

    Authors: Emily Silcock, Abhishek Arora, Luca D'Amico-Wong, Melissa Dell

    Abstract: In the U.S. historically, local newspapers drew their content largely from newswires like the Associated Press. Historians argue that newswires played a pivotal role in creating a national identity and shared understanding of the world, but there is no comprehensive archive of the content sent over newswires. We reconstruct such an archive by applying a customized deep learning pipeline to hundred…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.17810, arXiv:2308.12477

  3. arXiv:2406.07385

    cs.GT cs.CC

    Disrupting Bipartite Trading Networks: Matching for Revenue Maximization

    Authors: Luca D'Amico-Wong, Yannai A. Gonczarowski, Gary Qiurui Ma, David C. Parkes

    Abstract: We model the role of an online platform disrupting a market with unit-demand buyers and unit-supply sellers. Each seller can transact with a subset of the buyers whom she already knows, as well as with any additional buyers to whom she is introduced by the platform. Given these constraints on trade, prices and transactions are induced by a competitive equilibrium. The platform's revenue is proport…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted at the Twenty-Fifth ACM Conference on Economics and Computation (EC'24), 2024

  4. arXiv:2402.11835

    cs.LG cs.GT cs.MA

    Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization

    Authors: Luca D'Amico-Wong, Hugh Zhang, Marc Lanctot, David C. Parkes

    Abstract: We propose ABCs (Adaptive Branching through Child stationarity), a best-of-both-worlds algorithm combining Boltzmann Q-learning (BQL), a classic reinforcement learning algorithm for single-agent domains, and counterfactual regret minimization (CFR), a central algorithm for learning in multi-agent domains. ABCs adaptively chooses what fraction of the environment to explore each iteration by measuri…

    Submitted 18 February, 2024; originally announced February 2024.

  5. arXiv:2308.12477

    cs.CL cs.CV econ.GN

    American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

    Authors: Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Zejiang Shen, Luca D'Amico-Wong, Quan Le, Pablo Querubin, Leander Heldring

    Abstract: Existing full text datasets of U.S. public domain newspapers do not recognize the often complex layouts of newspaper scans, and as a result the digitized content scrambles texts from articles, headlines, captions, advertisements, and other layout regions. OCR quality can also be low. This study develops a novel, deep learning pipeline for extracting full article texts from newspaper images and app…

    Submitted 23 August, 2023; originally announced August 2023.

  6. arXiv:2306.09478

    cs.LG

    Understanding and Mitigating Extrapolation Failures in Physics-Informed Neural Networks

    Authors: Lukas Fesser, Luca D'Amico-Wong, Richard Qiu

    Abstract: Physics-informed Neural Networks (PINNs) have recently gained popularity due to their effective approximation of partial differential equations (PDEs) using deep neural networks (DNNs). However, their out of domain behavior is not well understood, with previous work speculating that the presence of high frequency components in the solution function might be to blame for poor extrapolation performa…

    Submitted 26 November, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

  7. arXiv:2210.04261

    cs.CL

    Noise-Robust De-Duplication at Scale

    Authors: Emily Silcock, Luca D'Amico-Wong, Jinglin Yang, Melissa Dell

    Abstract: Identifying near duplicates within large, noisy text corpora has a myriad of applications that range from de-duplicating training datasets, reducing privacy risk, and evaluating test set leakage, to identifying reproduced news articles and literature within large corpora. Across these diverse applications, the overwhelming majority of work relies on N-grams. Limited efforts have been made to evalu…

    Submitted 24 April, 2024; v1 submitted 9 October, 2022; originally announced October 2022.