Showing 1–23 of 23 results for author: Phillips, L

Searching in archive cs.
  1. arXiv:2506.06287  [pdf, other]

    cs.AI

    Deep Research Bench: Evaluating AI Web Research Agents

    Authors: FutureSearch: Nikos I. Bosse, Jon Evans, Robert G. Gambee, Daniel Hnyk, Peter Mühlbacher, Lawrence Phillips, Dan Schwarz, Jack Wildman

    Abstract: Amongst the most common use cases of modern AI is LLM chat with web search enabled. However, no direct evaluations of the quality of web research agents exist that control for the continually-changing web. We introduce Deep Research Bench, consisting of 89 multi-step web research task instances of varying difficulty across 8 diverse task categories, with the answers carefully worked out by skilled…

    Submitted 6 May, 2025; originally announced June 2025.

  2. arXiv:2505.21801  [pdf, ps, other]

    cs.DB

    Query, Don't Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries

    Authors: Josefa Lia Stoisser, Marc Boubnovski Martell, Kaspar Märtens, Lawrence Phillips, Stephen Michael Town, Rory Donovan-Maiye, Julien Fauqueur

    Abstract: Electronic health records (EHRs) contain richly structured, longitudinal data essential for predictive modeling, yet stringent privacy regulations (e.g., HIPAA, GDPR) often restrict access to individual-level records. We introduce Query, Don't Train (QDT): a structured-data foundation-model interface enabling tabular inference via LLM-generated SQL over EHRs. Instead of training on or accessing in…

    Submitted 29 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  3. arXiv:2409.14913  [pdf, other]

    cs.CL cs.IR cs.LG

    Towards a Realistic Long-Term Benchmark for Open-Web Research Agents

    Authors: Peter Mühlbacher, Nikos I. Bosse, Lawrence Phillips

    Abstract: We present initial results of a forthcoming benchmark for evaluating LLM agents on white-collar tasks of economic value. We evaluate agents on real-world "messy" open-web research tasks of the type that are routine in finance and consulting. In doing so, we lay the groundwork for an LLM agent evaluation suite where good performance directly corresponds to a large economic and societal impact. We b…

    Submitted 25 September, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

  4. arXiv:2201.11766  [pdf, other]

    cs.CL cs.AI cs.LG

    Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding

    Authors: Matthew Setzler, Scott Howland, Lauren Phillips

    Abstract: Compositional generalization is a troubling blind spot for neural language models. Recent efforts have presented techniques for improving a model's ability to encode novel combinations of known inputs, but less work has focused on generating novel combinations of known outputs. Here we focus on this latter "decode-side" form of generalization in the context of gSCAN, a synthetic benchmark for comp…

    Submitted 18 February, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

  5. arXiv:2111.10937  [pdf, other]

    cs.LG cs.CV

    Adaptive Transfer Learning: a simple but effective transfer learning

    Authors: Jung H Lee, Henry J Kvinge, Scott Howland, Zachary New, John Buckheit, Lauren A. Phillips, Elliott Skomski, Jessica Hibler, Courtney D. Corley, Nathan O. Hodas

    Abstract: Transfer learning (TL) leverages previously obtained knowledge to learn new tasks efficiently and has been used to train deep learning (DL) models with limited amount of data. When TL is applied to DL, pretrained (teacher) models are fine-tuned to build domain specific (student) models. This fine-tuning relies on the fact that DL model can be decomposed to classifiers and feature extractors, and a…

    Submitted 21 November, 2021; originally announced November 2021.

    Comments: 10 pages, 7 figures

  6. arXiv:2107.13616  [pdf, other]

    eess.AS cs.NE cs.SD

    Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers

    Authors: Piper Wolters, Logan Sizemore, Chris Daw, Brian Hutchinson, Lauren Phillips

    Abstract: Many applications involve detecting and localizing specific sound events within long, untrimmed documents, including keyword spotting, medical observation, and bioacoustic monitoring for conservation. Deep learning techniques often set the state-of-the-art for these tasks. However, for some types of events, there is insufficient labeled data to train such models. In this paper, we propose a region…

    Submitted 23 December, 2023; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: Updated results based on additional experimentation and moved dataset generation prose to stand-alone section

  7. arXiv:2107.02526  [pdf, other]

    cs.LG stat.ML

    Intrinsic uncertainties and where to find them

    Authors: Francesco Farina, Lawrence Phillips, Nicola J Richmond

    Abstract: We introduce a framework for uncertainty estimation that both describes and extends many existing methods. We consider typical hyperparameters involved in classical training as random variables and marginalise them out to capture various sources of uncertainty in the parameter space. We investigate which forms and combinations of marginalisation are most useful from a practical point of view on st…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: Presented at the ICML 2021 Workshop on Uncertainty and Robustness in Deep Learning

  8. arXiv:2106.01423  [pdf, other]

    cs.LG cs.AI cs.CV math.MG

    One Representation to Rule Them All: Identifying Out-of-Support Examples in Few-shot Learning with Generic Representations

    Authors: Henry Kvinge, Scott Howland, Nico Courts, Lauren A. Phillips, John Buckheit, Zachary New, Elliott Skomski, Jung H. Lee, Sandeep Tiwari, Jessica Hibler, Courtney D. Corley, Nathan O. Hodas

    Abstract: The field of few-shot learning has made remarkable strides in developing powerful models that can operate in the small data regime. Nearly all of these methods assume every unlabeled instance encountered will belong to a handful of known classes for which one has examples. This can be problematic for real-world use cases where one routinely finds 'none-of-the-above' examples. In this paper we desc…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 15 pages

  9. arXiv:2104.03496  [pdf, other]

    cs.CV cs.LG cs.NE

    Prototypical Region Proposal Networks for Few-Shot Localization and Classification

    Authors: Elliott Skomski, Aaron Tuor, Andrew Avila, Lauren Phillips, Zachary New, Henry Kvinge, Courtney D. Corley, Nathan Hodas

    Abstract: Recently proposed few-shot image classification methods have generally focused on use cases where the objects to be classified are the central subject of images. Despite success on benchmark vision datasets aligned with this use case, these methods typically fail on use cases involving densely-annotated, busy images: images common in the wild where objects of relevance are not the central subject,…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: 9 pages, 1 figure. Submitted to 4th Workshop on Meta-Learning at NeurIPS 2020

  10. arXiv:2103.03228  [pdf, other]

    cs.LG cs.DS cs.GT

    One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning

    Authors: Avrim Blum, Nika Haghtalab, Richard Lanas Phillips, Han Shao

    Abstract: In recent years, federated learning has been embraced as an approach for bringing about collaboration across large populations of learning agents. However, little is known about how collaboration protocols should take agents' incentives into account when allocating individual resources for communal learning in order to maintain such collaborations. Inspired by game theoretic notions, this paper in…

    Submitted 4 March, 2021; originally announced March 2021.

  11. arXiv:2012.07421  [pdf, other]

    cs.LG

    WILDS: A Benchmark of in-the-Wild Distribution Shifts

    Authors: Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang

    Abstract: Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in the real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchma…

    Submitted 16 July, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

  12. arXiv:2012.01573  [pdf, other]

    eess.AS cs.NE cs.SD

    A Study of Few-Shot Audio Classification

    Authors: Piper Wolters, Chris Careaga, Brian Hutchinson, Lauren Phillips

    Abstract: Advances in deep learning have resulted in state-of-the-art performance for many audio classification tasks but, unlike humans, these systems traditionally require large amounts of data to make accurate predictions. Not every person or organization has access to those resources, and the organizations that do, like our field at large, do not reflect the demographics of our country. Enabling people…

    Submitted 2 December, 2020; originally announced December 2020.

    Comments: Presented at GHC 2020

  13. arXiv:2009.11253  [pdf, other]

    cs.LG cs.AI cs.CV math.GN stat.ML

    Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning

    Authors: Henry Kvinge, Zachary New, Nico Courts, Jung H. Lee, Lauren A. Phillips, Courtney D. Corley, Aaron Tuor, Andrew Avila, Nathan O. Hodas

    Abstract: Deep learning has shown great success in settings with massive amounts of data but has struggled when data is limited. Few-shot learning algorithms, which seek to address this limitation, are designed to generalize well to new tasks with limited data. Typically, models are evaluated on unseen classes and datasets that are defined by the same fundamental task as they are trained for (e.g. category…

    Submitted 23 September, 2020; originally announced September 2020.

    Comments: 17 pages

  14. arXiv:2003.07896  [pdf, other]

    stat.ML cs.LG

    Teacher-Student Domain Adaptation for Biosensor Models

    Authors: Lawrence G. Phillips, David B. Grimes, Yihan Jessie Li

    Abstract: We present an approach to domain adaptation, addressing the case where data from the source domain is abundant, labelled data from the target domain is limited or non-existent, and a small amount of paired source-target data is available. The method is designed for developing deep learning models that detect the presence of medical conditions based on data from consumer-grade portable biosensors.…

    Submitted 19 March, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Comments: ICLR 2020

  15. arXiv:1911.06876  [pdf, other]

    cs.LG cs.AI stat.ML

    Explanatory Masks for Neural Network Interpretability

    Authors: Lawrence Phillips, Garrett Goh, Nathan Hodas

    Abstract: Neural network interpretability is a vital component for applications across a wide variety of domains. In such cases it is often useful to analyze a network which has already been trained for its specific purpose. In this work, we develop a method to produce explanation masks for pre-trained networks. The mask localizes the most important aspects of each input for prediction of the original netwo…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: Presented at IJCAI-18 Workshop on Explainable Artificial Intelligence (XAI)

  16. arXiv:1909.09602  [pdf, other]

    cs.CV cs.LG stat.ML

    Metric-Based Few-Shot Learning for Video Action Recognition

    Authors: Chris Careaga, Brian Hutchinson, Nathan Hodas, Lawrence Phillips

    Abstract: In the few-shot scenario, a learner must effectively generalize to unseen classes given a small support set of labeled examples. While a relatively large amount of research has gone into few-shot learning for image classification, little work has been done on few-shot video classification. In this work, we address the task of few-shot video action recognition with a set of two-stream models. We ev…

    Submitted 14 September, 2019; originally announced September 2019.

  17. arXiv:1908.02065  [pdf, other]

    cs.LG stat.ML

    Sparse hierarchical representation learning on molecular graphs

    Authors: Matthias Bal, Hagen Triendl, Mariana Assmann, Michael Craig, Lawrence Phillips, Jarvist Moore Frost, Usman Bashir, Noor Shaker, Vid Stojevic

    Abstract: Architectures for sparse hierarchical representation learning have recently been proposed for graph-structured data, but so far assume the absence of edge features in the graph. We close this gap and propose a method to pool graphs with edge features, inspired by the hierarchical nature of chemistry. In particular, we introduce two types of pooling layers compatible with an edge-feature graph-conv…

    Submitted 6 August, 2019; originally announced August 2019.

    Comments: 4 pages, 2 figures, accepted as a DLG 2019 workshop paper at KDD 2019

  18. arXiv:1906.08652  [pdf, other]

    cs.LG stat.ML

    Disentangling Influence: Using Disentangled Representations to Audit Model Predictions

    Authors: Charles T. Marx, Richard Lanas Phillips, Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian

    Abstract: Motivated by the need to audit complex and black box models, there has been extensive research on quantifying how data features influence model predictions. Feature influence can be direct (a direct influence on model outcomes) and indirect (model outcomes are influenced via proxy features). Feature influence can also be expressed in aggregate over the training or test data or locally with respect…

    Submitted 20 June, 2019; originally announced June 2019.

  19. arXiv:1802.04376  [pdf, other]

    cs.LG stat.ML

    Few-Shot Learning with Metric-Agnostic Conditional Embeddings

    Authors: Nathan Hilliard, Lawrence Phillips, Scott Howland, Artëm Yankov, Courtney D. Corley, Nathan O. Hodas

    Abstract: Learning high quality class representations from few examples is a key problem in metric-learning approaches to few-shot learning. To accomplish this, we introduce a novel architecture where class representations are conditioned for each few-shot trial based on a target image. We also deviate from traditional metric-learning approaches by training a network to perform comparisons between classes r…

    Submitted 12 February, 2018; originally announced February 2018.

  20. arXiv:1708.00049  [pdf, other]

    stat.ML cs.LG

    Interpretable Active Learning

    Authors: Richard L. Phillips, Kyu Hyun Chang, Sorelle A. Friedler

    Abstract: Active learning has long been a topic of study in machine learning. However, as increasingly complex and opaque models have become standard practice, the process of active learning, too, has become more opaque. There has been little investigation into interpreting what specific trends and patterns an active learning strategy may be exploring. This work expands on the Local Interpretable Model-agno…

    Submitted 23 June, 2018; v1 submitted 31 July, 2017; originally announced August 2017.

    Comments: 13 pages, 8 figures, presented at 2018 Conference on Fairness, Accountability, and Transparency (FAT*), New York, New York, USA. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81:49-61, 2018

  21. arXiv:1706.06134  [pdf, ps, other]

    cs.CY

    Using Social Media to Predict the Future: A Systematic Literature Review

    Authors: Lawrence Phillips, Chase Dowling, Kyle Shaffer, Nathan Hodas, Svitlana Volkova

    Abstract: Social media (SM) data provides a vast record of humanity's everyday thoughts, feelings, and actions at a resolution previously unimaginable. Because user behavior on SM is a reflection of events in the real world, researchers have realized they can use SM in order to forecast, making predictions about the future. The advantage of SM data is its relative ease of acquisition, large quantity, and ab…

    Submitted 19 June, 2017; originally announced June 2017.

  22. arXiv:1706.01839  [pdf, other]

    cs.CL

    Assessing the Linguistic Productivity of Unsupervised Deep Neural Networks

    Authors: Lawrence Phillips, Nathan Hodas

    Abstract: Increasingly, cognitive scientists have demonstrated interest in applying tools from deep learning. One use for deep learning is in language acquisition where it is useful to know if a linguistic phenomenon can be learned through domain-general means. To assess whether unsupervised deep learning is appropriate, we first pose a smaller question: Can unsupervised neural networks apply linguistic rul…

    Submitted 6 June, 2017; originally announced June 2017.

    Comments: To be presented at the 39th Annual Meeting of the Cognitive Science Society

  23. arXiv:1208.5752  [pdf, other]

    math.OC cond-mat.soft cs.CG

    Optimal Fillings - A new spatial subdivision problem related to packing and covering

    Authors: Carolyn L. Phillips, Joshua A. Anderson, Elizabeth R. Chen, Sharon C. Glotzer

    Abstract: We present filling as a new type of spatial subdivision problem that is related to covering and packing. Filling addresses the optimal placement of overlapping objects lying entirely inside an arbitrary shape so as to cover the most interior volume. In n-dimensional space, if the objects are polydisperse n-balls, we show that solutions correspond to sets of maximal n-balls and the solution space c…

    Submitted 28 August, 2012; originally announced August 2012.

    Comments: 38 pages, 21 figures