Showing 1–50 of 62 results for author: Melo, C

Searching in archive cs.
  1. arXiv:2507.00469

    cs.CV cs.LG

    Bisecle: Binding and Separation in Continual Learning for Video Language Understanding

    Authors: Yue Tan, Xiaoqian Hu, Hao Xue, Celso De Melo, Flora D. Salim

    Abstract: Frontier vision-language models (VLMs) have made remarkable improvements in video understanding tasks. However, real-world videos typically exist as continuously evolving data streams (e.g., dynamic scenes captured by wearable glasses), necessitating models to continually adapt to shifting data distributions and novel scenarios. Considering the prohibitive computational costs of fine-tuning models…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 23 pages, 12 figures, 10 tables

  2. arXiv:2505.24216

    cs.CV

    Shuffle PatchMix Augmentation with Confidence-Margin Weighted Pseudo-Labels for Enhanced Source-Free Domain Adaptation

    Authors: Prasanna Reddy Pulakurthi, Majid Rabbani, Jamison Heard, Sohail Dianat, Celso M. de Melo, Raghuveer Rao

    Abstract: This work investigates Source-Free Domain Adaptation (SFDA), where a model adapts to a target domain without access to source data. A new augmentation technique, Shuffle PatchMix (SPM), and a novel reweighting strategy are introduced to enhance performance. SPM shuffles and blends image patches to generate diverse and challenging augmentations, while the reweighting strategy prioritizes reliable p…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 6 pages, 3 figures, 5 tables, Accepted to IEEE ICIP 2025

  3. arXiv:2505.18048

    cs.CV

    SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios

    Authors: Simon Malzard, Nitish Mital, Richard Walters, Victoria Nockles, Raghuveer Rao, Celso M. De Melo

    Abstract: Computer vision (CV) models for detection, prediction or classification tasks operate on video data-streams that are often degraded in the real world, due to deployment in real-time or on resource-constrained hardware. It is therefore critical that these models are robust to degraded data, but state of the art (SoTA) models are often insufficiently assessed with these real-world constraints in min…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: 19 pages, 2 images, updated acknowledgements versus previous versions to be compliant with funders

  4. arXiv:2505.00788

    cs.CV

    SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

    Authors: Wufei Ma, Luoxin Ye, Celso M de Melo, Jieneng Chen, Alan Yuille

    Abstract: Humans naturally understand 3D spatial relationships, enabling complex reasoning like predicting collisions of vehicles from different directions. Current large multimodal models (LMMs), however, lack this capability for 3D spatial reasoning. This limitation stems from the scarcity of 3D training data and the bias in current model designs toward 2D data. In this paper, we systematically study th…

    Submitted 10 June, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: CVPR 2025 highlight

  5. arXiv:2504.20024

    cs.CV

    SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

    Authors: Wufei Ma, Yu-Cheng Chou, Qihao Liu, Xingrui Wang, Celso de Melo, Jianwen Xie, Alan Yuille

    Abstract: Despite recent advances in multi-modal models, 3D spatial reasoning remains a challenging task for state-of-the-art open-source and proprietary models. Recent studies explore data-driven approaches and achieve enhanced spatial reasoning performance by fine-tuning models on 3D-related visual question-answering data. However, these methods typically perform spatial reasoning in an implicit manner an…

    Submitted 10 June, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

    Comments: Project page: https://spatial-reasoner.github.io

  6. Effective Dual-Region Augmentation for Reduced Reliance on Large Amounts of Labeled Data

    Authors: Prasanna Reddy Pulakurthi, Majid Rabbani, Celso M. de Melo, Sohail A. Dianat, Raghuveer M. Rao

    Abstract: This paper introduces a novel dual-region augmentation approach designed to reduce reliance on large-scale labeled datasets while improving model robustness and adaptability across diverse computer vision tasks, including source-free domain adaptation (SFDA) and person re-identification (ReID). Our method performs targeted data transformations by applying random noise perturbations to foreground o…

    Submitted 3 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: 9 pages, 2 figures, 4 tables, Accepted to SPIE DSC 2025 Conference: Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications III

    Journal ref: Proc. SPIE 13459, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications III, 134590I (2025)

  7. arXiv:2504.02801

    cs.CV

    F-ViTA: Foundation Model Guided Visible to Thermal Translation

    Authors: Jay N. Paranjape, Celso de Melo, Vishal M. Patel

    Abstract: Thermal imaging is crucial for scene understanding, particularly in low-light and nighttime conditions. However, collecting large thermal datasets is costly and labor-intensive due to the specialized equipment required for infrared image capture. To address this challenge, researchers have explored visible-to-thermal image translation. Most existing methods rely on Generative Adversarial Networks…

    Submitted 3 April, 2025; originally announced April 2025.

  8. arXiv:2503.19009

    cs.CV cs.IR

    Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

    Authors: Arun Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M. de Melo, Benjamin Van Durme, Rama Chellappa

    Abstract: In this work, we tackle the problem of text-to-video retrieval (T2VR). Inspired by the success of late interaction techniques in text-document, text-image, and text-video retrieval, our approach, Video-ColBERT, introduces a simple and efficient mechanism for fine-grained similarity assessment between queries and videos. Video-ColBERT is built upon 3 main components: a fine-grained spatial and temp…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025. 13 pages, 4 figures. Approved for public release: distribution unlimited

  9. arXiv:2503.08933

    cs.CV

    PromptGAR: Flexible Promptive Group Activity Recognition

    Authors: Zhangyu Jin, Andrew Feng, Ankur Chemburkar, Celso M. De Melo

    Abstract: We present PromptGAR, a novel framework that addresses the limitations of current Group Activity Recognition (GAR) approaches by leveraging multi-modal prompts to achieve both input flexibility and high recognition accuracy. The existing approaches suffer from limited real-world applicability due to their reliance on full prompt annotations, the lack of long-term actor consistency, and under-explo…

    Submitted 11 March, 2025; originally announced March 2025.

  10. arXiv:2503.07391

    cs.DC

    Availability Modeling for Blockchain Provisioning in Private Clouds

    Authors: J Dantas, P Silva, L Fiondella, C Melo, P Maciel

    Abstract: Blockchain technology has emerged, and many previous studies have assessed its performance issues. However, less attention has been paid to the dependability attributes, which have been a critical topic in service provisioning, considering public or private infrastructures. This paper introduces analytical models to assess the availability of private blockchain infrastructure for Hyperledger Fabri…

    Submitted 10 March, 2025; originally announced March 2025.

  11. arXiv:2502.12257

    cs.CL cs.LG

    InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context

    Authors: Bryan L. M. de Oliveira, Luana G. B. Martins, Bruno Brandão, Luckeciano C. Melo

    Abstract: Large language models excel at following explicit instructions, but they often struggle with ambiguous or incomplete user requests, defaulting to verbose, generic responses instead of seeking clarification. We introduce InfoQuest, a multi-turn chat benchmark designed to evaluate how dialogue agents handle hidden context in open-ended user requests. This benchmark presents intentionally ambiguous s…

    Submitted 25 April, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  12. arXiv:2502.11250

    cs.CL

    Uncertainty-Aware Step-wise Verification with Generative Reward Models

    Authors: Zihuiwen Ye, Luckeciano Carvalho Melo, Younesse Kaddar, Phil Blunsom, Sam Staton, Yarin Gal

    Abstract: Complex multi-step reasoning tasks, such as solving mathematical problems, remain challenging for large language models (LLMs). While outcome supervision is commonly used, process supervision via process reward models (PRMs) provides intermediate rewards to verify step-wise correctness in solution traces. However, as proxies for human judgement, PRMs suffer from reliability issues, including susce…

    Submitted 16 February, 2025; originally announced February 2025.

  13. Distributed Application Provisioning over Ethereum based private and permissioned Blockchain: Availability modeling, capacity, and costs planning

    Authors: Carlos Melo, Jamilson Dantas, Paulo Pereira, Paulo Maciel

    Abstract: Blockchain and Cloud Computing are two of the main topics related to the distributed computing paradigm, and in the last decade, they have seen exponential growth in their adoption. Cloud computing has long been established as the main mechanism to test, develop, and deliver new applications and services in a distributed manner across the World Wide Web. Large data centers host many services and s…

    Submitted 14 February, 2025; originally announced February 2025.

  14. A Comprehensive Hyperledger Fabric Performance Evaluation based on Resources Capacity Planning

    Authors: Carlos Melo, Glauber Gonçalves, Francisco A. Silva, André Soares

    Abstract: Hyperledger Fabric is a platform for permissioned blockchain networks that enables secure and auditable distributed data storage for enterprise applications. There is a growing interest in applications based on this platform, but its use requires the configuration of different blockchain parameters. Various configurations impact the system's non-functional qualities, especially performance and cos…

    Submitted 14 February, 2025; originally announced February 2025.

  15. Transactional Dynamics in Hyperledger Fabric: A Stochastic Modeling and Performance Evaluation of Permissioned Blockchains

    Authors: Carlos Melo, Glauber Gonçalves, Francisco Airton Silva, Iure Fé, Ericksulino Moura, André Soares, Eunmi Choi, Dugki Min, Jae-Woo Lee, Tuan Anh Nguyen

    Abstract: Blockchain, often integrated with distributed systems and security enhancements, has significant potential in various industries. However, environmental concerns and the efficiency of consortia-controlled permissioned networks remain critical issues. We use a Stochastic Petri Net model to analyze transaction flows in Hyperledger Fabric networks, achieving a 95% confidence interval for response tim…

    Submitted 13 February, 2025; originally announced February 2025.

  16. Optimal Resource Utilization in Hyperledger Fabric: A Comprehensive SPN-Based Performance Evaluation Paradigm

    Authors: Carlos Melo, Glauber Gonçalves, Francisco A. Silva, Leonel Feitosa, Iure Fé, André Soares, Eunmi Choi, Tuan Anh Nguyen, Dugki Min

    Abstract: Hyperledger Fabric stands as a leading framework for permissioned blockchain systems, ensuring data security and auditability for enterprise applications. As applications on this platform grow, understanding its complex configuration concerning various blockchain parameters becomes vital. These configurations significantly affect the system's performance and cost. In this research, we introduce a…

    Submitted 12 February, 2025; originally announced February 2025.

  17. Performance Modeling and Evaluation of Hyperledger Fabric: An Analysis Based on Transaction Flow and Endorsement Policies

    Authors: Carlos Melo, Glauber Gonçalves, Francisco A. Silva, André Soares

    Abstract: Blockchain is a paradigm derived from distributed systems, protocols, and security concepts. However, can blockchain applications provide services in industrial environments, especially concerning performance issues? In blockchains, long response times can impair both user and service experience, and intensive resource use may increase the costs of service provision. The proposed paper tries to an…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: 29th IEEE Symposium on Computers and Communications (ISCC)

  18. arXiv:2502.08636

    cs.CV

    Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models

    Authors: Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso M de Melo, Jieneng Chen, Alan Yuille

    Abstract: Although large multimodal models (LMMs) have demonstrated remarkable capabilities in visual scene interpretation and reasoning, their capacity for complex and precise 3-dimensional spatial reasoning remains uncertain. Existing benchmarks focus predominantly on 2D spatial understanding and lack a framework to comprehensively evaluate 6D spatial reasoning across varying complexities. To address this…

    Submitted 8 June, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: Published in CVPR 2025 as Highlight. Data and code are released at https://github.com/XingruiWang/Spatial457

  19. arXiv:2501.07396

    cs.CV

    Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models

    Authors: Yasiru Ranasinghe, Vibashan VS, James Uplinger, Celso De Melo, Vishal M. Patel

    Abstract: Automatic target recognition (ATR) plays a critical role in tasks such as navigation and surveillance, where safety and accuracy are paramount. In extreme use cases, such as military applications, these factors are often challenged due to the presence of unknown terrains, environmental conditions, and novel object categories. Current object detectors, including open-world detectors, lack the abili…

    Submitted 13 January, 2025; originally announced January 2025.

  20. arXiv:2412.16358

    cs.CV

    Texture- and Shape-based Adversarial Attacks for Vehicle Detection in Synthetic Overhead Imagery

    Authors: Mikael Yeghiazaryan, Sai Abhishek Siddhartha Namburu, Emily Kim, Stanislav Panev, Celso de Melo, Brent Lance, Fernando De la Torre, Jessica K. Hodgins

    Abstract: Detecting vehicles in aerial images can be very challenging due to complex backgrounds, small resolution, shadows, and occlusions. Despite the effectiveness of SOTA detectors such as YOLO, they remain vulnerable to adversarial attacks (AAs), compromising their reliability. Traditional AA strategies often overlook the practical constraints of physical implementation, focusing solely on attack perfo…

    Submitted 20 December, 2024; originally announced December 2024.

  21. arXiv:2412.07825

    cs.CV

    3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark

    Authors: Wufei Ma, Haoyu Chen, Guofeng Zhang, Yu-Cheng Chou, Celso M de Melo, Alan Yuille

    Abstract: 3D spatial reasoning is the ability to analyze and interpret the positions, orientations, and spatial relationships of objects within the 3D space. This allows models to develop a comprehensive understanding of the 3D scene, enabling their applicability to a broader range of areas, such as autonomous navigation, robotics, and AR/VR. While large multi-modal models (LMMs) have achieved remarkable pr…

    Submitted 8 May, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

    Comments: Project page: https://3dsrbench.github.io

  22. arXiv:2412.01477

    cs.CV

    Improving Object Detection by Modifying Synthetic Data with Explainable AI

    Authors: Nitish Mital, Simon Malzard, Richard Walters, Celso M. De Melo, Raghuveer Rao, Victoria Nockles

    Abstract: Limited real-world data severely impacts model performance in many computer vision domains, particularly for samples that are underrepresented in training. Synthetically generated images are a promising solution, but 1) it remains unclear how to design synthetic training data to optimally improve model performance (e.g., whether and where to introduce more realism or more abstraction) and 2) the do…

    Submitted 3 April, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  23. arXiv:2410.14038

    cs.LG

    Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

    Authors: Bryan L. M. de Oliveira, Luana G. B. Martins, Bruno Brandão, Murilo L. da Luz, Telma W. de L. Soares, Luckeciano C. Melo

    Abstract: Effective visual representation learning is crucial for reinforcement learning (RL) agents to extract task-relevant information from raw sensory inputs and generalize across diverse environments. However, existing RL benchmarks lack the ability to systematically evaluate representation learning capabilities in isolation from other learning challenges. To address this gap, we introduce the Sliding…

    Submitted 1 July, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

    Comments: Accepted at ICML 2025

  24. arXiv:2410.07812

    cs.LG cs.AI

    Temporal-Difference Variational Continual Learning

    Authors: Luckeciano C. Melo, Alessandro Abate, Yarin Gal

    Abstract: Machine Learning models in real-world applications must continuously learn new tasks to adapt to shifts in the data-generating distribution. Yet, for Continual Learning (CL), models often struggle to balance learning new tasks (plasticity) with retaining previous knowledge (memory stability). Consequently, they are susceptible to Catastrophic Forgetting, which degrades performance and undermines t…

    Submitted 14 May, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

  25. arXiv:2410.06108

    cs.AI

    ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution

    Authors: Corban Rivera, Grayson Byrd, William Paul, Tyler Feldman, Meghan Booker, Emma Holmes, David Handelman, Bethany Kemp, Andrew Badger, Aurora Schmidt, Krishna Murthy Jatavallabhula, Celso M de Melo, Lalithkumar Seenivasan, Mathias Unberath, Rama Chellappa

    Abstract: Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment. Recent advances in perception algorithms, combined with Large Language Models (LLMs) for planning, offer promising solutions to these challenges, as the common sense reasoning capabilities of LLMs provide a strong heuristic for efficiently searching t…

    Submitted 8 October, 2024; originally announced October 2024.

  26. An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos

    Authors: Arun Reddy, Ketul Shah, Corban Rivera, William Paul, Celso M. De Melo, Rama Chellappa

    Abstract: In this work, we explore the possibility of using synthetically generated data for video-based gesture recognition with large pre-trained models. We consider whether these models have sufficiently robust and expressive representation spaces to enable "training-free" classification. Specifically, we utilize various state-of-the-art video encoders to extract features for use in k-nearest neighbors c…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II (SPIE Defense + Commercial Sensing, 2024)

    Journal ref: Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II. Vol. 13035. SPIE, 2024

  27. arXiv:2407.06839

    cs.CV

    A Mamba-based Siamese Network for Remote Sensing Change Detection

    Authors: Jay N. Paranjape, Celso de Melo, Vishal M. Patel

    Abstract: Change detection in remote sensing images is an essential tool for analyzing a region at different times. It finds varied applications in monitoring environmental changes, man-made changes as well as corresponding decision-making and prediction of future trends. Deep learning methods like Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable success in detecting significan…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  28. arXiv:2406.13123

    cs.AI cs.CV

    ViLCo-Bench: VIdeo Language COntinual learning Benchmark

    Authors: Tianqi Tang, Shohreh Deldari, Hao Xue, Celso De Melo, Flora D. Salim

    Abstract: Video language continual learning involves continuously adapting to information from video and text inputs, enhancing a model's ability to handle new tasks while retaining prior knowledge. This field is a relatively under-explored area, and establishing appropriate datasets is crucial for facilitating communication and research in this field. In this study, we present the first dedicated benchmark…

    Submitted 15 December, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures, 8 tables, Accepted at NeurIPS Dataset and Benchmark Track 2024

  29. arXiv:2406.10023

    cs.LG cs.CL stat.ML

    Deep Bayesian Active Learning for Preference Modeling in Large Language Models

    Authors: Luckeciano C. Melo, Panagiotis Tigas, Alessandro Abate, Yarin Gal

    Abstract: Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the furth…

    Submitted 28 October, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

  30. arXiv:2403.14874

    cs.CV cs.LG

    WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather

    Authors: Blake Gella, Howard Zhang, Rishi Upadhyay, Tiffany Chang, Nathan Wei, Matthew Waliman, Yunhao Ba, Celso de Melo, Alex Wong, Achuta Kadambi

    Abstract: We propose a method to infer semantic segmentation maps from images captured under adverse weather conditions. We begin by examining existing models on images degraded by weather conditions such as rain, fog, or snow, and find that they exhibit a large performance drop compared with images captured under clear weather. To control for changes in scene structures, we propose WeatherProof, the first…

    Submitted 7 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2312.09534

  31. arXiv:2312.14126

    cs.CV

    Entropic Open-set Active Learning

    Authors: Bardia Safaei, Vibashan VS, Celso M. de Melo, Vishal M. Patel

    Abstract: Active Learning (AL) aims to enhance the performance of deep models by selecting the most informative samples for annotation from a pool of unlabeled data. Despite impressive performance in closed-set settings, most AL methods fail in real-world scenarios where the unlabeled data contains unknown categories. Recently, a few studies have attempted to tackle the AL problem for the open-set setting.…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted in AAAI 2024

  32. arXiv:2312.02914

    cs.CV cs.LG

    Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training

    Authors: Arun Reddy, William Paul, Corban Rivera, Ketul Shah, Celso M. de Melo, Rama Chellappa

    Abstract: In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pre-training to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then…

    Submitted 4 March, 2025; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted at CVPR 2024. 13 pages, 4 figures. Approved for public release: distribution unlimited

  33. arXiv:2312.02151

    cs.CV cs.AI cs.LG

    Guarding Barlow Twins Against Overfitting with Mixed Samples

    Authors: Wele Gedara Chaminda Bandara, Celso M. De Melo, Vishal M. Patel

    Abstract: Self-supervised Learning (SSL) aims to learn transferable feature representations for downstream applications without relying on labeled data. The Barlow Twins algorithm, renowned for its widespread adoption and straightforward implementation compared to its counterparts like contrastive learning methods, minimizes feature redundancy while maximizing invariance to common corruptions. Optimizing fo…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Code and checkpoints are available at: https://github.com/wgcban/mix-bt.git

  34. arXiv:2310.13290

    cs.CL

    Interpreting Indirect Answers to Yes-No Questions in Multiple Languages

    Authors: Zijie Wang, Md Mosharaf Hossain, Shivam Mathur, Terry Cruz Melo, Kadir Bulut Ozler, Keun Hee Park, Jacob Quintero, MohammadHossein Rezaei, Shreya Nupur Shakya, Md Nayem Uddin, Eduardo Blanco

    Abstract: Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data. We also demonstrate that direct answers (i.e., with polar keywords) are us…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Findings

  35. arXiv:2309.16650

    cs.RO cs.CV

    ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

    Authors: Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull

    Abstract: For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, whi…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Project page: https://concept-graphs.github.io/ Explainer video: https://youtu.be/mRhNkQwRYnc

  36. arXiv:2307.10018

    cs.RO cs.AI

    RobôCIn Small Size League Extended Team Description Paper for RoboCup 2023

    Authors: Aline Lima de Oliveira, Cauê Addae da Silva Gomes, Cecília Virginia Santos da Silva, Charles Matheus de Sousa Alves, Danilo Andrade Martins de Souza, Driele Pires Ferreira Araújo Xavier, Edgleyson Pereira da Silva, Felipe Bezerra Martins, Lucas Henrique Cavalcanti Santos, Lucas Dias Maciel, Matheus Paixão Gumercindo dos Santos, Matheus Lafayette Vasconcelos, Matheus Vinícius Teotonio do Nascimento Andrade, João Guilherme Oliveira Carvalho de Melo, João Pedro Souza Pereira de Moura, José Ronald da Silva, José Victor Silva Cruz, Pedro Henrique Santana de Morais, Pedro Paulo Salman de Oliveira, Riei Joaquim Matos Rodrigues, Roberto Costa Fernandes, Ryan Vinicius Santos Morais, Tamara Mayara Ramos Teobaldo, Washington Igor dos Santos Silva, Edna Natividade Silva Barros

    Abstract: RobôCIn has participated in RoboCup Small Size League since 2019, won its first world title in 2022 (Division B), and is currently a three-time Latin-American champion. This paper presents our improvements to defend the Small Size League (SSL) division B title in RoboCup 2023 in Bordeaux, France. This paper aims to share some of the academic research that our team developed over the past year. Ou…

    Submitted 19 July, 2023; originally announced July 2023.

  37. arXiv:2303.18177

    cs.CV

    STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition

    Authors: Xiaoyu Zhu, Po-Yao Huang, Junwei Liang, Celso M. de Melo, Alexander Hauptmann

    Abstract: We study the problem of human action recognition using motion capture (MoCap) sequences. Unlike existing techniques that take multiple manual steps to derive standardized skeleton representations as model input, we propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences. The model uses a hierarchical transformer with intra-frame off-set attention and inter-fra…

    Submitted 26 July, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  38. arXiv:2303.10280

    cs.CV

    Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances

    Authors: Arun V. Reddy, Ketul Shah, William Paul, Rohita Mocharla, Judy Hoffman, Kapil D. Katyal, Dinesh Manocha, Celso M. de Melo, Rama Chellappa

    Abstract: Human action recognition is a challenging problem, particularly when there is high variability in factors such as subject appearance, backgrounds and viewpoint. While deep neural networks (DNNs) have been shown to perform well on action recognition tasks, they typically require large amounts of high-quality labeled data to achieve robust performance across a variety of conditions. Synthetic data h…

    Submitted 1 August, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: ICRA 2023. The first two authors contributed equally. Dataset available at: https://github.com/reddyav1/RoCoG-v2

  39. AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning

    Authors: Xijun Wang, Ruiqi Xian, Tianrui Guan, Celso M. de Melo, Stephen M. Nogar, Aniket Bera, Dinesh Manocha

    Abstract: We propose a novel approach for aerial video action recognition. Our method is designed for videos captured using UAVs and can run on edge or mobile devices. We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately. This makes it easier to extract the key features and reduces the computational overhead. We also presen…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at ICRA 2023

  40. arXiv:2302.07241  [pdf, other]

    cs.CV cs.AI cs.RO

    ConceptFusion: Open-set Multimodal 3D Mapping

    Authors: Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso Miguel de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba

    Abstract: Building 3D maps of the environment is central to robot navigation, planning, and interaction with objects in a scene. Most existing approaches that integrate semantic concepts with 3D maps largely remain confined to the closed-set setting: they can only reason about a finite set of concepts, pre-defined at training time. Further, these maps can only be queried using class labels, or in recent wor…

    Submitted 23 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: RSS 2023. Project page: https://concept-fusion.github.io Explainer video: https://www.youtube.com/watch?v=rkXgws8fiDs Code: https://github.com/concept-fusion/concept-fusion

  41. arXiv:2211.05883  [pdf, other]

    cs.CV

    Open-Set Automatic Target Recognition

    Authors: Bardia Safaei, Vibashan VS, Celso M. de Melo, Shuowen Hu, Vishal M. Patel

    Abstract: Automatic Target Recognition (ATR) is a category of computer vision algorithms which attempts to recognize targets on data obtained from different sensors. ATR algorithms are extensively used in real-world scenarios such as military and surveillance applications. Existing ATR algorithms are developed for traditional closed-set methods where training and testing have the same class distribution. Th…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: 5 pages, 3 figures. Submitted to ICASSP 2023

  42. arXiv:2207.00925  [pdf]

    cs.GT

    The Impact of Partner Expressions on Felt Emotion in the Iterated Prisoner's Dilemma: An Event-level Analysis

    Authors: Maria Angelika-Nikita, Celso M. de Melo, Kazunori Terada, Gale Lucas, Jonathan Gratch

    Abstract: Social games like the prisoner's dilemma are often used to develop models of the role of emotion in social decision-making. Here we examine an understudied aspect of emotion in such games: how an individual's feelings are shaped by their partner's expressions. Prior research has tended to focus on other aspects of emotion. Research on felt-emotion has focused on how an individual's feelings shape…

    Submitted 2 July, 2022; originally announced July 2022.

    Comments: 18 pages, 7 figures, Ninth Annual Conference on Advances in Cognitive Systems

  43. On Structuring Functional Programs with Monoidal Profunctors

    Authors: Alexandre Garcia de Oliveira, Mauro Jaskelioff, Ana Cristina Vieira de Melo

    Abstract: We study monoidal profunctors as a tool to reason and structure pure functional programs both from a categorical perspective and as a Haskell implementation. From the categorical point of view we approach them as monoids in a certain monoidal category of profunctors. We study properties of this monoidal category and construct and implement the free monoidal profunctor. We study the relationship of…

    Submitted 2 July, 2022; originally announced July 2022.

    Comments: In Proceedings MSFP 2022, arXiv:2206.09534

    Journal ref: EPTCS 360, 2022, pp. 134-150

  44. arXiv:2206.10779  [pdf, other]

    cs.CV

    Not Just Streaks: Towards Ground Truth for Single Image Deraining

    Authors: Yunhao Ba, Howard Zhang, Ethan Yang, Akira Suzuki, Arnold Pfahnl, Chethan Chinder Chandrappa, Celso de Melo, Suya You, Stefano Soatto, Alex Wong, Achuta Kadambi

    Abstract: We propose a large-scale dataset of real-world rainy and clean image pairs and a method to remove degradations, induced by rain streaks and rain accumulation, from the image. As there exists no real-world dataset for deraining, current state-of-the-art methods rely on synthetic data and thus are limited by the sim2real domain gap; moreover, rigorous evaluation remains a challenge due to the absenc…

    Submitted 29 July, 2024; v1 submitted 21 June, 2022; originally announced June 2022.

  45. arXiv:2206.06614  [pdf, other]

    cs.LG cs.AI

    Transformers are Meta-Reinforcement Learners

    Authors: Luckeciano C. Melo

    Abstract: The transformer architecture and variants presented remarkable success across many machine learning tasks in recent years. This success is intrinsically related to the capability of handling long sequences and the presence of context-dependent weights from the attention mechanism. We argue that these capabilities suit the central role of a Meta-Reinforcement Learning algorithm. Indeed, a meta-RL a…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: Published at the International Conference on Machine Learning (ICML) 2022

  46. arXiv:2203.14779  [pdf, other]

    cs.CV cs.HC cs.SD eess.AS

    A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

    Authors: R. Gnana Praveen, Wheidima Carneiro de Melo, Nasib Ullah, Haseeb Aslam, Osama Zeeshan, Théo Denorme, Marco Pedersoli, Alessandro Koerich, Simon Bacon, Patrick Cardinal, Eric Granger

    Abstract: Multimodal emotion recognition has recently gained much attention since it can leverage diverse and complementary relationships over multiple modalities (e.g., audio, visual, biosignals, etc.), and can provide some robustness to noisy modalities. Most state-of-the-art methods for audio-visual (A-V) fusion rely on recurrent networks or conventional attention mechanisms that do not effectively lever…

    Submitted 6 July, 2024; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2111.05222

  47. arXiv:2203.11111  [pdf, other]

    cs.CV

    Facial Expression Analysis Using Decomposed Multiscale Spatiotemporal Networks

    Authors: Wheidima Carneiro de Melo, Eric Granger, Miguel Bordallo Lopez

    Abstract: Video-based analysis of facial expressions has been increasingly applied to infer health states of individuals, such as depression and pain. Among the existing approaches, deep learning models composed of structures for multiscale spatiotemporal processing have shown strong potential for encoding facial dynamics. However, such models have high computational complexity, making for a difficult deplo…

    Submitted 21 March, 2022; originally announced March 2022.

  48. arXiv:2109.02304  [pdf, other]

    cs.SE

    Towards Multi-Criteria Prioritization of Best Practices in Research Artifact Sharing

    Authors: Carlos Diego Nascimento Damasceno, Isotilia Costa Melo, Daniel Struber

    Abstract: Research artifact sharing is known to strengthen the transparency of scientific studies. However, in the lack of common discipline-specific guidelines for artifacts evaluation, subjective and conflicting expectations may happen and threaten artifact quality. In this paper, we discuss our preliminary ideas for a framework based on quality management principles (5W2H) that can aid in the establishme…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 5 pages, 2 figures, Emerging results paper published in the 1st Workshop on Open Science Practices for Software Engineering (OpenScienSE 2021)

  49. arXiv:2103.05520  [pdf, other]

    cs.SE

    Sustainability: Delivering Agility's Promise

    Authors: Jutta Eckstein, Claudia de O. Melo

    Abstract: Sustainability is a promise by agile development, as it is part of both the Agile Alliance's and the Scrum Alliance's vision. Thus far, not much has been delivered on this promise. This paper explores the Agile Manifesto and points out how agility could contribute to sustainability in its three dimensions - social, economic, and environmental. Additionally, this paper provides some sample cases of…

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: Submitted to C. Calero et al, Software Sustainability, Springer International Publishing, 2021

  50. arXiv:2010.07035  [pdf, other]

    cs.IR cs.HC cs.LG stat.ML

    MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces

    Authors: Marlesson R. O. Santana, Luckeciano C. Melo, Fernando H. F. Camargo, Bruno Brandão, Anderson Soares, Renan M. Oliveira, Sandor Caetano

    Abstract: Recommender Systems are especially challenging for marketplaces since they must maximize user satisfaction while maintaining the healthiness and fairness of such ecosystems. In this context, we observed a lack of resources to design, train, and evaluate agents that learn by interacting within these environments. For this matter, we propose MARS-Gym, an open-source framework to empower researchers…

    Submitted 30 September, 2020; originally announced October 2020.

    Comments: 15 pages, 14 figures, see https://github.com/deeplearningbrasil/mars-gym

    ACM Class: I.6.5; H.4.2