Showing 1–11 of 11 results for author: Riva, O

  1. arXiv:2503.19537

    cs.HC

    Agent-Initiated Interaction in Phone UI Automation

    Authors: Noam Kahlon, Guy Rom, Anatoly Efros, Filippo Galgani, Omri Berkovitch, Sapir Caduri, William E. Bishop, Oriana Riva, Ido Dagan

    Abstract: Phone automation agents aim to autonomously perform a given natural-language user request, such as scheduling appointments or booking a hotel. While much research effort has been devoted to screen understanding and action planning, complex tasks often necessitate user interaction for successful completion. Aligning the agent with the user's expectations is crucial for building trust and enabling p…

    Submitted 25 March, 2025; originally announced March 2025.

  2. arXiv:2406.03679

    cs.AI cs.LG

    On the Effects of Data Scale on UI Control Agents

    Authors: Wei Li, William Bishop, Alice Li, Chris Rawles, Folawiyo Campbell-Ajala, Divya Tyamagundlu, Oriana Riva

    Abstract: Autonomous agents that control computer interfaces to accomplish human tasks are emerging. Leveraging LLMs to power such agents has been of special interest, but unless fine-tuned on human-collected task demonstrations, performance is still relatively low. In this work we study whether fine-tuning alone is a viable approach for building real-world computer control agents. In particular, we inves…

    Submitted 13 November, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 (Datasets and Benchmarks)

  3. arXiv:2405.19773

    cs.CV

    VQA Training Sets are Self-play Environments for Generating Few-shot Pools

    Authors: Tautvydas Misiunas, Hassan Mansoor, Jasper Uijlings, Oriana Riva, Victor Carbune

    Abstract: Large-language models and large-vision models are increasingly capable of solving compositional reasoning tasks, as measured by breakthroughs in visual-question answering benchmarks. However, state-of-the-art solutions often involve careful construction of large pre-training and fine-tuning datasets, which can be expensive. The use of external tools, whether other ML models, search engines, or API…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2405.14573

    cs.AI cs.LG

    AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

    Authors: Christopher Rawles, Sarah Clinckemaillie, Yifan Chang, Jonathan Waltz, Gabrielle Lau, Marybeth Fair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, Daniel Toyama, Robert Berry, Divya Tyamagundlu, Timothy Lillicrap, Oriana Riva

    Abstract: Autonomous agents that execute human tasks by controlling computers can enhance human productivity and application accessibility. However, progress in this field will be driven by realistic and reproducible benchmarks. We present AndroidWorld, a fully functional Android environment that provides reward signals for 116 programmatic tasks across 20 real-world Android apps. Unlike existing interactiv…

    Submitted 6 April, 2025; v1 submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2405.11120

    cs.AI cs.LG

    Latent State Estimation Helps UI Agents to Reason

    Authors: William E Bishop, Alice Li, Christopher Rawles, Oriana Riva

    Abstract: A common problem for agents operating in real-world environments is that the response of an environment to their actions may be non-deterministic and observed through noise. This renders environmental state and progress towards completing a task latent. Despite recent impressive demonstrations of LLMs' reasoning abilities on various benchmarks, whether LLMs can build estimates of latent state and…

    Submitted 17 May, 2024; originally announced May 2024.

  6. arXiv:2312.10170

    cs.HC cs.AI

    UINav: A Practical Approach to Train On-Device Automation Agents

    Authors: Wei Li, Fu-Lin Hsu, Will Bishop, Folawiyo Campbell-Ajala, Max Lin, Oriana Riva

    Abstract: Automation systems that can autonomously drive application user interfaces to complete user tasks are of great benefit, especially when users are situationally or permanently impaired. Prior automation systems do not produce generalizable models while AI-based automation agents work reliably only in simple, hand-crafted applications or incur high computation costs. We propose UINav, a demonstratio…

    Submitted 28 June, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Journal ref: NAACL 2024 Industry Track

  7. arXiv:2307.10088

    cs.LG cs.CL cs.HC

    Android in the Wild: A Large-Scale Dataset for Android Device Control

    Authors: Christopher Rawles, Alice Li, Daniel Rodriguez, Oriana Riva, Timothy Lillicrap

    Abstract: There is a growing interest in device-control systems that can interpret human natural language instructions and execute them on a digital device by directly controlling its user interface. We present a dataset for device-control research, Android in the Wild (AITW), which is orders of magnitude larger than current datasets. The dataset contains human demonstrations of device interactions, includi…

    Submitted 27 October, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

  8. arXiv:2301.10165

    cs.CL cs.AI

    Lexi: Self-Supervised Learning of the UI Language

    Authors: Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, Oriana Riva

    Abstract: Humans can learn to operate the user interface (UI) of an application by reading an instruction manual or how-to guide. Along with text, these resources include visual content such as UI screenshots and images of application icons referenced in the text. We explore how to leverage this data to learn generic visio-linguistic representations of UI screens and their components. These representations…

    Submitted 23 January, 2023; originally announced January 2023.

    Comments: EMNLP (Findings) 2022

  9. arXiv:2102.10263

    stat.ML cs.LG stat.ME

    Inducing a hierarchy for multi-class classification problems

    Authors: Hayden S. Helm, Weiwei Yang, Sujeeth Bharadwaj, Kate Lytvynets, Oriana Riva, Christopher White, Ali Geisa, Carey E. Priebe

    Abstract: In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not. Unfortunately, the majority of classification datasets do not come pre-equipped with a hierarchical structure and classical flat classifiers must be employed. In this paper, we investigate a class of methods that induce a hierarchy that c…

    Submitted 20 February, 2021; originally announced February 2021.

  10. arXiv:2012.05818

    cs.IR cs.CL

    Bew: Towards Answering Business-Entity-Related Web Questions

    Authors: Qingqing Cao, Oriana Riva, Aruna Balasubramanian, Niranjan Balasubramanian

    Abstract: We present BewQA, a system specifically designed to answer a class of questions that we call Bew questions. Bew questions are related to businesses/services such as restaurants, hotels, and movie theaters; for example, "Until what time is happy hour?". These questions are challenging to answer because the answers are found in the open-domain Web, are present in short sentences without surrounding cont…

    Submitted 10 December, 2020; originally announced December 2020.

  11. arXiv:2010.12844

    cs.CL cs.AI

    FLIN: A Flexible Natural Language Interface for Web Navigation

    Authors: Sahisnu Mazumder, Oriana Riva

    Abstract: AI assistants can now carry out tasks for users by directly interacting with website UIs. Current semantic parsing and slot-filling techniques cannot flexibly adapt to many different websites without being constantly re-trained. We propose FLIN, a natural language interface for web navigation that maps user commands to concept-level actions (rather than low-level UI actions), thus being able to fl…

    Submitted 13 April, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: Accepted to NAACL-HLT 2021