Showing 1–25 of 25 results for author: Komeili, M

  1. arXiv:2506.09987

    cs.CV cs.LG

    A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs

    Authors: Benno Krojer, Mojtaba Komeili, Candace Ross, Quentin Garrido, Koustuv Sinha, Nicolas Ballas, Mahmoud Assran

    Abstract: Existing benchmarks for assessing the spatio-temporal understanding and reasoning abilities of video language models are susceptible to score inflation due to the presence of shortcut solutions based on superficial visual or textual cues. This paper mitigates the challenges in accurately assessing model performance by introducing the Minimal Video Pairs (MVP) benchmark, a simple shortcut-aware vid…

    Submitted 11 June, 2025; originally announced June 2025.

  2. arXiv:2506.08279

    cs.CV cs.AI cs.LG

    Seeing Voices: Generating A-Roll Video from Audio with Mirage

    Authors: Aditi Sundararaman, Amogh Adishesha, Andrew Jaegle, Dan Bigioi, Hyoung-Kyu Song, Jon Kyl, Justin Mao, Kevin Lan, Mojtaba Komeili, ShahRukh Athar, Sheila Babayan, Stanislau Beliasau, William Buchwalter

    Abstract: From professional filmmaking to user-generated content, creators and consumers have long recognized that the power of video depends on the harmonious integration of what we hear (the video's audio track) with what we see (the video's image sequence). Current approaches to video generation either ignore sound to focus on general-purpose but silent image sequence generation or address both visual an…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Technical report website: mirage.app/research/seeing-voices, product website: mirage.app

  3. arXiv:2504.04722

    cs.CV

    TactileNet: Bridging the Accessibility Gap with AI-Generated Tactile Graphics for Individuals with Vision Impairment

    Authors: Adnan Khan, Alireza Choubineh, Mai A. Shaaban, Abbas Akkasi, Majid Komeili

    Abstract: Tactile graphics are essential for providing access to visual information for the 43 million people globally living with vision loss. Traditional methods for creating these graphics are labor-intensive and cannot meet growing demand. We introduce TactileNet, the first comprehensive dataset and AI-driven framework for generating embossing-ready 2D tactile templates using text-to-image Stable Diffus…

    Submitted 15 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  4. arXiv:2412.07191

    cs.CV

    A Step towards Automated and Generalizable Tactile Map Generation using Generative Adversarial Networks

    Authors: David G Hobson, Majid Komeili

    Abstract: Blindness and visual impairments affect many people worldwide. For help with navigation, people with visual impairments often rely on tactile maps that utilize raised surfaces and edges to convey information through touch. Although these maps are helpful, they are often not widely available and current tools to automate their production have similar limitations including only working at certain sc…

    Submitted 9 December, 2024; originally announced December 2024.

  5. arXiv:2410.03478

    cs.CV cs.LG

    VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

    Authors: Han Lin, Tushar Nagarajan, Nicolas Ballas, Mido Assran, Mojtaba Komeili, Mohit Bansal, Koustuv Sinha

    Abstract: Procedural video representation learning is an active research area where the objective is to learn an agent which can anticipate and forecast the future given the present video input, typically in conjunction with textual annotations. Prior works often rely on large-scale pretraining of visual encoders and prediction models with language supervision. However, the necessity and effectiveness of ex…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: 10 pages

  6. Fine-Tuned Large Language Models for Symptom Recognition from Spanish Clinical Text

    Authors: Mai A. Shaaban, Abbas Akkasi, Adnan Khan, Majid Komeili, Mohammad Yaqub

    Abstract: The accurate recognition of symptoms in clinical reports is significantly important in the fields of healthcare and biomedical natural language processing. These entities serve as essential building blocks for clinical information extraction, enabling retrieval of critical medical insights from vast amounts of textual data. Furthermore, the ability to identify and categorize these entities is fund…

    Submitted 28 January, 2024; originally announced January 2024.

  7. arXiv:2310.06966

    cs.CV cs.AI cs.HC cs.LG

    On the Interpretability of Part-Prototype Based Classifiers: A Human Centric Analysis

    Authors: Omid Davoodi, Shayan Mohammadizadehsamakosh, Majid Komeili

    Abstract: Part-prototype networks have recently become methods of interest as an interpretable alternative to many of the current black-box image classifiers. However, the interpretability of these methods from the perspective of human users has not been sufficiently explored. In this work, we have devised a framework for evaluating the interpretability of part-prototype-based models from a human perspectiv…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Intended for submission to Nature Scientific Reports

  8. arXiv:2309.11495

    cs.CL cs.AI

    Chain-of-Verification Reduces Hallucination in Large Language Models

    Authors: Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston

    Abstract: Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then (ii) plans verification questions to fact-c…

    Submitted 25 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  9. arXiv:2306.04765

    cs.AI cs.CL

    The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges

    Authors: Morteza Behrooz, William Ngan, Joshua Lane, Giuliano Morse, Benjamin Babcock, Kurt Shuster, Mojtaba Komeili, Moya Chen, Melanie Kambadur, Y-Lan Boureau, Jason Weston

    Abstract: Publicly deploying research chatbots is a nuanced topic involving necessary risk-benefit analyses. While there have recently been frequent discussions on whether it is responsible to deploy such models, there has been far less focus on the interaction paradigms and design approaches that the resulting interfaces should adopt, in order to achieve their goals more effectively. We aim to pose, ground…

    Submitted 7 June, 2023; originally announced June 2023.

  10. arXiv:2306.04707

    cs.CL cs.AI

    Improving Open Language Models by Learning from Organic Interactions

    Authors: Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

    Abstract: We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with org…

    Submitted 7 June, 2023; originally announced June 2023.

  11. arXiv:2304.13835

    cs.CL cs.LG

    Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models

    Authors: Jimmy Wei, Kurt Shuster, Arthur Szlam, Jason Weston, Jack Urbanek, Mojtaba Komeili

    Abstract: Current dialogue research primarily studies pairwise (two-party) conversations, and does not address the everyday setting where more than two speakers converse together. In this work, we both collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations, where each participant has an assigned character to role-play.…

    Submitted 8 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

  12. arXiv:2304.06858

    cs.SI cs.CL cs.LG

    Vax-Culture: A Dataset for Studying Vaccine Discourse on Twitter

    Authors: Mohammad Reza Zarei, Michael Christensen, Sarah Everts, Majid Komeili

    Abstract: Vaccine hesitancy continues to be a main challenge for public health officials during the COVID-19 pandemic. As this hesitancy undermines vaccine campaigns, many researchers have sought to identify its root causes, finding that the increasing volume of anti-vaccine misinformation on social media platforms is a key element of this problem. We explored Twitter as a source of misleading content with…

    Submitted 11 June, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  13. arXiv:2303.10151

    cs.CV

    Toward Super-Resolution for Appearance-Based Gaze Estimation

    Authors: Galen O'Shea, Majid Komeili

    Abstract: Gaze tracking is a valuable tool with a broad range of applications in various fields, including medicine, psychology, virtual reality, marketing, and safety. Therefore, it is essential to have gaze tracking software that is cost-efficient and high-performing. Accurately predicting gaze remains a difficult task, particularly in real-world situations where images are affected by motion blur, video…

    Submitted 17 March, 2023; originally announced March 2023.

  14. arXiv:2301.05746

    cs.CL cs.AI

    Infusing Commonsense World Models with Graph Knowledge

    Authors: Alexander Gurung, Mojtaba Komeili, Arthur Szlam, Jason Weston, Jack Urbanek

    Abstract: While language models have become more capable of producing compelling language, we find there are still gaps in maintaining consistency, especially when describing events in a dynamically changing world. We study the setting of generating narratives in an open world text adventure game, where a graph representation of the underlying game state can be used to train models that consume and output b…

    Submitted 13 January, 2023; originally announced January 2023.

  15. arXiv:2211.09107

    cs.LG cs.CV

    Interpretable Few-shot Learning with Online Attribute Selection

    Authors: Mohammad Reza Zarei, Majid Komeili

    Abstract: Few-shot learning (FSL) presents a challenging learning problem in which only a few samples are available for each class. Decision interpretation is more important in few-shot classification due to a greater chance of error compared to traditional classification. However, the majority of the previous FSL methods are black-box models. In this paper, we propose an inherently interpretable model for…

    Submitted 30 March, 2025; v1 submitted 16 November, 2022; originally announced November 2022.

  16. arXiv:2208.03270

    cs.CL cs.AI

    Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

    Authors: Jing Xu, Megan Ung, Mojtaba Komeili, Kushal Arora, Y-Lan Boureau, Jason Weston

    Abstract: Frozen models trained to mimic static datasets can never improve their performance. Models that can employ internet-retrieval for up-to-date information and obtain feedback from humans during deployment provide the promise of both adapting to new information, and improving their performance. In this work we study how to improve internet-driven conversational skills in such a learning framework. We…

    Submitted 16 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  17. arXiv:2208.03188

    cs.CL cs.AI

    BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

    Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

    Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc…

    Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  18. arXiv:2203.13224

    cs.CL cs.AI

    Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

    Authors: Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston

    Abstract: Language models (LMs) have recently been shown to generate more factual responses by employing modularity (Zhou et al., 2021) in combination with retrieval (Adolphs et al., 2021). We extend the recent approach of Adolphs et al. (2021) to include internet search as a module. Our SeeKeR (Search engine->Knowledge->Response) method thus applies a single LM to three modular tasks in succession: search,…

    Submitted 29 March, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

  19. arXiv:2202.13474

    cs.LG cs.CV

    Interpretable Concept-based Prototypical Networks for Few-Shot Learning

    Authors: Mohammad Reza Zarei, Majid Komeili

    Abstract: Few-shot learning aims at recognizing new instances from classes with limited samples. This challenging task is usually alleviated by performing meta-learning on similar tasks. However, the resulting models are black-boxes. There has been growing concerns about deploying black-box machine learning models and FSL is not an exception in this regard. In this paper, we propose a method for FSL based o…

    Submitted 27 February, 2022; originally announced February 2022.

  20. arXiv:2110.09421

    cs.CL cs.AI cs.CY

    Measuring Cognitive Status from Speech in a Smart Home Environment

    Authors: Kathleen C. Fraser, Majid Komeili

    Abstract: The population is aging, and becoming more tech-savvy. The United Nations predicts that by 2050, one in six people in the world will be over age 65 (up from one in 11 in 2019), and this increases to one in four in Europe and Northern America. Meanwhile, the proportion of American adults over 65 who own a smartphone has risen 24 percentage points from 2013-2017, and the majority have Internet in th…

    Submitted 18 October, 2021; originally announced October 2021.

    Journal ref: IEEE Instrumentation & Measurement Magazine (Volume: 24, Issue: 6, September 2021)

  21. arXiv:2107.07566

    cs.AI cs.CL

    Internet-Augmented Dialogue Generation

    Authors: Mojtaba Komeili, Kurt Shuster, Jason Weston

    Abstract: The largest store of continually updating knowledge on our planet can be accessed via internet search. In this work we study giving access to this information to conversational agents. Large language models, even though they store an impressive amount of knowledge within their weights, are known to hallucinate facts when generating dialogue (Shuster et al., 2021); moreover, those facts are frozen…

    Submitted 15 July, 2021; originally announced July 2021.

  22. Feature-Based Interpretable Reinforcement Learning based on State-Transition Models

    Authors: Omid Davoodi, Majid Komeili

    Abstract: Growing concerns regarding the operational usage of AI models in the real-world has caused a surge of interest in explaining AI models' decisions to humans. Reinforcement Learning is not an exception in this regard. In this work, we propose a method for offering local explanations on risk in reinforcement learning. Our method only requires a log of previous interactions between the agent and the e…

    Submitted 14 May, 2021; originally announced May 2021.

  23. arXiv:2105.07033

    cs.LG

    Cause and Effect: Hierarchical Concept-based Explanation of Neural Networks

    Authors: Mohammad Nokhbeh Zaeem, Majid Komeili

    Abstract: In many scenarios, human decisions are explained based on some high-level concepts. In this work, we take a step in the interpretability of neural networks by examining their internal representation or neuron's activations against concepts. A concept is characterized by a set of samples that have specific features in common. We propose a framework to check the existence of a causal relationship be…

    Submitted 6 November, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: 13 pages, 14 figures

  24. Efficient quantum walk on the grid with multiple marked elements

    Authors: Peter Hoyer, Mojtaba Komeili

    Abstract: We give a quantum algorithm for finding a marked element on the grid when there are multiple marked elements. Our algorithm uses quadratically fewer steps than a random walk on the grid, ignoring logarithmic factors. This is the first known quantum walk that finds a marked element in a number of steps less than the square-root of the extended hitting time. We also give a new tighter upper bound on…

    Submitted 28 December, 2016; originally announced December 2016.

    Comments: 18 pages, to appear in STACS 2017, the 34th International Symposium on Theoretical Aspects of Computer Science

    ACM Class: F.1.2; F.2.2; G.2.2

    Journal ref: 34th Symposium on Theoretical Aspects of Computer Science (STACS), vol 66 of LIPIcs, pp. 42:1-42:14, 2017

  25. arXiv:1602.02675

    cs.CE math.NA

    Performance of 1-D and 2-D Lattice Boltzmann (LB) in Solution of the Shock Tube Problem

    Authors: M. Komeili, M. Mirzaei, M. Shabouei

    Abstract: In this paper we presented a lattice Boltzmann with square grid for compressible flow problems. Triple level velocity is considered for each cell. Migration step use discrete velocity but continuous parameters are utilized to calculate density, velocity, and energy. So, we called this semi-discrete method. To evaluate the performance of the method the well-known shock tube problem is solved, using…

    Submitted 9 September, 2016; v1 submitted 8 February, 2016; originally announced February 2016.

    Comments: in International Conference on Fascinating Advancement in Mechanical Engineering (FAME2008), India, 2008