Showing 1–22 of 22 results for author: Arbour, D

Searching in archive cs.
  1. arXiv:2506.16234  [pdf, ps, other]

    cs.LG

    Think Global, Act Local: Bayesian Causal Discovery with Language Models in Sequential Data

    Authors: Prakhar Verma, David Arbour, Sunav Choudhary, Harshita Chopra, Arno Solin, Atanu R. Sinha

    Abstract: Causal discovery from observational data typically assumes full access to data and availability of domain experts. In practice, data often arrive in batches, and expert knowledge is scarce. Language Models (LMs) offer a surrogate but come with their own issues: hallucinations, inconsistencies, and bias. We present BLANCE (Bayesian LM-Augmented Causal Estimation), a hybrid Bayesian framework that bri…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 24 pages, preprint

  2. arXiv:2505.21859  [pdf, ps, other]

    cs.CL

    Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries

    Authors: Vishakh Padmakumar, Zichao Wang, David Arbour, Jennifer Healey

    Abstract: While large language models (LLMs) are increasingly capable of handling longer contexts, recent work has demonstrated that they exhibit the "lost in the middle" phenomenon (Liu et al., 2024) of unevenly attending to different parts of the provided context. This hinders their ability to cover diverse source material in multi-document summarization, as noted in the DiverseSumm benchmark (Huang et al…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: To appear at ACL 2025 - Main Conference

  3. Evaluation and Incident Prevention in an Enterprise AI Assistant

    Authors: Akash V. Maharaj, David Arbour, Daniel Lee, Uttaran Bhattacharya, Anup Rao, Austin Zane, Avi Feller, Kun Qian, Yunyao Li

    Abstract: Enterprise AI Assistants are increasingly deployed in domains where accuracy is paramount, making each erroneous output a potentially significant incident. This paper presents a comprehensive framework for monitoring, benchmarking, and continuously improving such complex, multi-component systems under active development by multiple teams. Our approach encompasses three key elements: (1) a hierarch…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 7 pages, 5 figures. Accepted at IAAI-25

  4. arXiv:2402.00168  [pdf, other]

    stat.ML cs.LG stat.ME

    Continuous Treatment Effects with Surrogate Outcomes

    Authors: Zhenghao Zeng, David Arbour, Avi Feller, Raghavendra Addanki, Ryan Rossi, Ritwik Sinha, Edward H. Kennedy

    Abstract: In many real-world causal inference applications, the primary outcomes (labels) are often partially missing, especially if they are expensive or difficult to collect. If the missingness depends on covariates (i.e., missingness is not completely at random), analyses based on fully observed samples alone may be biased. Incorporating surrogates, which are fully observed post-treatment variables relat…

    Submitted 21 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: 30 pages, 7 figures

  5. arXiv:2311.17856  [pdf, other]

    cs.LG cs.SI

    Leveraging Graph Diffusion Models for Network Refinement Tasks

    Authors: Puja Trivedi, Ryan Rossi, David Arbour, Tong Yu, Franck Dernoncourt, Sungchul Kim, Nedim Lipka, Namyong Park, Nesreen K. Ahmed, Danai Koutra

    Abstract: Most real-world networks are noisy and incomplete samples from an unknown target distribution. Refining them by correcting corruptions or inferring unobserved regions typically improves downstream performance. Inspired by the impressive generative capabilities that have been used to correct corruptions in images, and the similarities between "in-painting" and filling in missing nodes and edges con…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Work in Progress. 21 pages, 7 figures

  6. arXiv:2308.14165  [pdf, other]

    cs.IR cs.AI cs.LG

    Distributional Off-Policy Evaluation for Slate Recommendations

    Authors: Shreyas Chaudhari, David Arbour, Georgios Theocharous, Nikos Vlassis

    Abstract: Recommendation strategies are typically evaluated by using previously logged data, employing off-policy evaluation methods to estimate their expected performance. However, for strategies that present users with slates of multiple items, the resulting combinatorial action space renders many of these methods impractical. Prior work has developed estimators that leverage the structure in slates to es…

    Submitted 27 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: Accepted in The 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24)

  7. arXiv:2304.08648  [pdf, other]

    cs.DS

    Dynamic Vector Bin Packing for Online Resource Allocation in the Cloud

    Authors: Aniket Murhekar, David Arbour, Tung Mai, Anup Rao

    Abstract: Several cloud-based applications, such as cloud gaming, rent servers to execute jobs which arrive in an online fashion. Each job has a resource demand and must be dispatched to a cloud server which has enough resources to execute the job, which departs after its completion. Under the 'pay-as-you-go' billing model, the server rental cost is proportional to the total time that servers are actively r…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: 24 pages, to appear at SPAA 2023

  8. arXiv:2210.06594  [pdf, other]

    cs.LG cs.AI cs.DS econ.EM stat.ME

    Sample Constrained Treatment Effect Estimation

    Authors: Raghavendra Addanki, David Arbour, Tung Mai, Cameron Musco, Anup Rao

    Abstract: Treatment effect estimation is a fundamental problem in causal inference. We focus on designing efficient randomized controlled trials, to accurately estimate the effect of some treatment on a population of $n$ individuals. In particular, we study sample-constrained treatment effect estimation, where we must select a subset of $s \ll n$ individuals from the population to experiment on. This subset…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Conference on Neural Information Processing Systems (NeurIPS) 2022

  9. arXiv:2208.12210  [pdf, other]

    cs.AI

    Learning Relational Causal Models with Cycles through Relational Acyclification

    Authors: Ragib Ahsan, David Arbour, Elena Zheleva

    Abstract: In real-world phenomena which involve mutual influence or causal effects between interconnected units, equilibrium states are typically represented with cycles in graphical models. An expressive class of graphical models, relational causal models, can represent and reason about complex dynamic systems exhibiting such cycles or feedback loops. Existing cyclic causal discovery algorithms for learnin…

    Submitted 17 March, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: Published in the 37th AAAI Conference on Artificial Intelligence (AAAI 2023)

    Journal ref: AAAI 2023

  10. arXiv:2207.00163  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Non-Parametric Inference of Relational Dependence

    Authors: Ragib Ahsan, Zahra Fatemi, David Arbour, Elena Zheleva

    Abstract: Independence testing plays a central role in statistical and causal inference from observational data. Standard independence tests assume that the data samples are independent and identically distributed (i.i.d.) but that assumption is violated in many real-world datasets and applications centered on relational systems. This work examines the problem of estimating independence in data drawn from r…

    Submitted 29 June, 2022; originally announced July 2022.

    Comments: To appear in UAI 2022

  11. arXiv:2206.02470  [pdf, other]

    cs.IR

    Offline Evaluation of Ranked Lists using Parametric Estimation of Propensities

    Authors: Vishwa Vinay, Manoj Kilaru, David Arbour

    Abstract: Search engines and recommendation systems attempt to continually improve the quality of the experience they afford to their users. Refining the ranker that produces the lists displayed in response to user requests is an important component of this process. A common practice is for the service providers to make changes (e.g. new ranking features, different ranking models) and A/B test them on a fra…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted as a full paper at SIGIR 2022

  12. arXiv:2203.02807  [pdf, other]

    cs.LG

    Off-Policy Evaluation in Embedded Spaces

    Authors: Jaron J. R. Lee, David Arbour, Georgios Theocharous

    Abstract: Off-policy evaluation methods are important in recommendation systems and search engines, where data collected under an existing logging policy is used to estimate the performance of a new proposed policy. A common approach to this problem is weighting, where data is weighted by a density ratio between the probability of actions given contexts in the target and logged policies. In practice, two is…

    Submitted 2 January, 2023; v1 submitted 5 March, 2022; originally announced March 2022.

    Comments: 9 pages, appeared at NeurIPS 2021 Workshop "Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice", presented virtually Dec 14th 2021

  13. arXiv:2202.10706  [pdf, other]

    cs.AI cs.SI

    Relational Causal Models with Cycles: Representation and Reasoning

    Authors: Ragib Ahsan, David Arbour, Elena Zheleva

    Abstract: Causal reasoning in relational domains is fundamental to studying real-world social phenomena in which individual units can influence each other's traits and behavior. Dynamics between interconnected units can be represented as an instantiation of a relational causal model; however, causal reasoning over such instantiation requires additional templating assumptions that capture feedback loops of i…

    Submitted 6 May, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: Published in the 1st Conference on Causal Learning and Reasoning (2022)

  14. arXiv:2112.15221  [pdf, other]

    cs.AI

    Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

    Authors: Tong Mu, Georgios Theocharous, David Arbour, Emma Brunskill

    Abstract: Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance. To address this, we introduce a practical algorithm for incorporating human insight to speed learning. Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restricti…

    Submitted 30 December, 2021; originally announced December 2021.

    Journal ref: AAAI2022

  15. arXiv:2102.02765   

    cs.DS cs.DM math.CO

    Online Discrepancy Minimization via Persistent Self-Balancing Walks

    Authors: David Arbour, Drew Dimmery, Tung Mai, Anup Rao

    Abstract: We study the online discrepancy minimization problem for vectors in $\mathbb{R}^d$ in the oblivious setting where an adversary is allowed to fix the vectors $x_1, x_2, \ldots, x_n$ in arbitrary order ahead of time. We give an algorithm that maintains $O(\sqrt{\log(nd/δ)})$ discrepancy with probability $1-δ$, matching the lower bound given in [Bansal et al. 2020] up to an $O(\sqrt{\log \log n})$ facto…

    Submitted 5 February, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: The proof of Lemma 7 is incorrect. There is a serious issue that we don't know how to fix at the moment. We thank Yang, Nikhil and collaborators for bringing it to our attention

  16. arXiv:2010.14058  [pdf, other]

    cs.SI cs.DS cs.LG

    Heterogeneous Graphlets

    Authors: Ryan A. Rossi, Nesreen K. Ahmed, Aldo Carranza, David Arbour, Anup Rao, Sungchul Kim, Eunyee Koh

    Abstract: In this paper, we introduce a generalization of graphlets to heterogeneous networks called typed graphlets. Informally, typed graphlets are small typed induced subgraphs. Typed graphlets generalize graphlets to rich heterogeneous networks as they explicitly capture the higher-order typed connectivity patterns in such networks. To address this problem, we describe a general framework for counting t…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1901.10026

  17. arXiv:2009.09961  [pdf, other]

    cs.CL

    Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference

    Authors: Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, Tim Althoff

    Abstract: Causal inference studies using textual social media data can provide actionable insights on human behavior. Making accurate causal inferences with text requires controlling for confounding which could otherwise impart bias. Recently, many different methods for adjusting for confounders have been proposed, and we show that these existing methods disagree with one another on two datasets inspired by…

    Submitted 6 May, 2022; v1 submitted 21 September, 2020; originally announced September 2020.

    Comments: to appear at ICWSM 2022

  18. arXiv:2004.01218  [pdf, other]

    stat.ME cs.AI cs.LG

    General Identification of Dynamic Treatment Regimes Under Interference

    Authors: Eli Sherman, David Arbour, Ilya Shpitser

    Abstract: In many applied fields, researchers are often interested in tailoring treatments to unit-level characteristics in order to optimize an outcome of interest. Methods for identifying and estimating treatment policies are the subject of the dynamic treatment regime literature. Separately, in many settings the assumption that data are independent and identically distributed does not hold due to inter-s…

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: 2020 Conference on Artificial Intelligence and Statistics (AIStats)

  19. arXiv:1906.03694  [pdf, other]

    cs.LG stat.ME stat.ML

    Balanced off-policy evaluation in general action spaces

    Authors: Arjun Sondhi, David Arbour, Drew Dimmery

    Abstract: Estimation of importance sampling weights for off-policy evaluation of contextual bandits often results in imbalance: a mismatch between the desired and the actual distribution of state-action pairs after weighting. In this work we present balanced off-policy evaluation (B-OPE), a generic method for estimating weights which minimize this imbalance. Estimation of these weights reduces to a binary…

    Submitted 4 March, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Comments: Accepted to AISTATS 2020

  20. arXiv:1901.10026  [pdf, other]

    cs.SI cs.DM cs.DS cs.LG

    Heterogeneous Network Motifs

    Authors: Ryan A. Rossi, Nesreen K. Ahmed, Aldo Carranza, David Arbour, Anup Rao, Sungchul Kim, Eunyee Koh

    Abstract: Many real-world applications give rise to large heterogeneous networks where nodes and edges can be of any arbitrary type (e.g., user, web page, location). Special cases of such heterogeneous graphs include homogeneous graphs, bipartite, k-partite, signed, labeled graphs, among many others. In this work, we generalize the notion of network motifs to heterogeneous networks. In particular, small ind…

    Submitted 10 May, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

  21. arXiv:1412.5238  [pdf, other]

    cs.SI physics.soc-ph

    Refining the Semantics of Social Influence

    Authors: Katerina Marazopoulou, David Arbour, David Jensen

    Abstract: With the proliferation of network data, researchers are increasingly focusing on questions investigating phenomena occurring on networks. This often includes analysis of peer-effects, i.e., how the connections of an individual affect that individual's behavior. This type of influence is not limited to direct connections of an individual (such as friends), but also to individuals that are connected…

    Submitted 16 December, 2014; originally announced December 2014.

    Comments: Networks: From Graphs to Rich Data - NIPS Workshop

  22. arXiv:1309.6843  [pdf]

    cs.AI

    A Sound and Complete Algorithm for Learning Causal Models from Relational Data

    Authors: Marc Maier, Katerina Marazopoulou, David Arbour, David Jensen

    Abstract: The PC algorithm learns maximally oriented causal Bayesian networks. However, there is no equivalent complete algorithm for learning the structure of relational models, a more expressive generalization of Bayesian networks. Recent developments in the theory and representation of relational models support lifted reasoning about conditional independence. This enables a powerful constraint for orient…

    Submitted 26 September, 2013; originally announced September 2013.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-371-380