-
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use
Authors:
Shan Chen,
Pedro Moreira,
Yuxin Xiao,
Sam Schmidgall,
Jeremy Warner,
Hugo Aerts,
Thomas Hartvigsen,
Jack Gallifant,
Danielle S. Bitterman
Abstract:
Large language models (LLMs) are increasingly envisioned as decision-support tools in clinical practice, yet safe clinical reasoning demands integrating heterogeneous knowledge bases -- trials, primary studies, regulatory documents, and cost data -- under strict accuracy constraints. Existing evaluations often rely on synthetic prompts, reduce the task to single-hop factoid queries, or conflate reasoning with open-ended generation, leaving their real-world utility unclear. To close this gap, we present MedBrowseComp, the first benchmark that systematically tests an agent's ability to reliably retrieve and synthesize multi-hop medical facts from live, domain-specific knowledge bases. MedBrowseComp contains more than 1,000 human-curated questions that mirror clinical scenarios where practitioners must reconcile fragmented or conflicting information to reach an up-to-date conclusion. Applying MedBrowseComp to frontier agentic systems reveals performance shortfalls as low as ten percent, exposing a critical gap between current LLM capabilities and the rigor demanded in clinical settings. MedBrowseComp therefore offers a clear testbed for reliable medical information seeking and sets concrete goals for future model and toolchain upgrades. You can visit our project page at: https://moreirap12.github.io/mbc-browse-app/
Submitted 20 May, 2025;
originally announced May 2025.
-
MIRACL-VISION: A Large, multilingual, visual document retrieval benchmark
Authors:
Radek Osmulski,
Gabriel de Souza P. Moreira,
Ronay Ak,
Mengyao Xu,
Benedikt Schifferer,
Even Oldridge
Abstract:
Document retrieval is an important task for search and Retrieval-Augmented Generation (RAG) applications. Large Language Models (LLMs) have contributed to improving the accuracy of text-based document retrieval. However, documents with complex layout and visual elements like tables, charts and infographics are not perfectly represented in textual format. Recently, image-based document retrieval pipelines have become popular, which use visual large language models (VLMs) to retrieve relevant page images given a query. Current evaluation benchmarks on visual document retrieval are limited, as they focus primarily on the English language, rely on synthetically generated questions and offer a small corpus size. Therefore, we introduce MIRACL-VISION, a multilingual visual document retrieval evaluation benchmark. MIRACL-VISION covers 18 languages, and is an extension of the MIRACL dataset, a popular benchmark to evaluate text-based multilingual retrieval pipelines. MIRACL was built using a human-intensive annotation process to generate high-quality questions. In order to reduce the MIRACL-VISION corpus size and make evaluation more compute-friendly while keeping the datasets challenging, we have designed a method for eliminating the "easy" negatives from the corpus. We conducted extensive experiments comparing MIRACL-VISION with other benchmarks, using popular public text and image models. We observe a gap in state-of-the-art VLM-based embedding models on multilingual capabilities, with up to 59.7% lower retrieval accuracy than text-based retrieval models. Even for the English language, the visual models' retrieval accuracy is 12.1% lower compared to text-based models. MIRACL-VISION is a challenging, representative, multilingual evaluation benchmark for visual retrieval pipelines and will help the community build robust models for document retrieval.
Submitted 21 May, 2025; v1 submitted 16 May, 2025;
originally announced May 2025.
-
Leveraging graph neural networks and mobility data for COVID-19 forecasting
Authors:
Fernando H. O. Duarte,
Gladston J. P. Moreira,
Eduardo J. S. Luz,
Leonardo B. L. Santos,
Vander L. S. Freitas
Abstract:
The COVID-19 pandemic has victimized over 7 million people to date, prompting diverse research efforts. Spatio-temporal models combining mobility data with machine learning have gained attention for disease forecasting. Here, we explore Graph Convolutional Recurrent Network (GCRN) and Graph Convolutional Long Short-Term Memory (GCLSTM), which combine the power of Graph Neural Networks (GNN) with traditional architectures that deal with sequential data. The aim is to forecast future values of COVID-19 cases in Brazil and China by leveraging human mobility networks, whose nodes represent geographical locations and links are flows of vehicles or people. We show that employing backbone extraction to filter out negligible connections in the mobility network enhances predictive stability. Comparing regression and classification tasks demonstrates that binary classification yields smoother, more interpretable results. Interestingly, we observe qualitatively equivalent results for both Brazil and China datasets by introducing sliding windows of variable size and prediction horizons. Compared to prior studies, introducing the sliding window and the network backbone extraction strategies yields improvements of about 80% in root mean squared errors.
Submitted 20 January, 2025;
originally announced January 2025.
-
RoBIn: A Transformer-Based Model For Risk Of Bias Inference With Machine Reading Comprehension
Authors:
Abel Corrêa Dias,
Viviane Pereira Moreira,
João Luiz Dihl Comba
Abstract:
Objective: Scientific publications play a crucial role in uncovering insights, testing novel drugs, and shaping healthcare policies. Assessing the quality of publications requires evaluating their Risk of Bias (RoB), a process typically conducted by human reviewers. In this study, we introduce a new dataset for machine reading comprehension and RoB assessment and present RoBIn (Risk of Bias Inference), an innovative model crafted to automate such evaluation. The model employs a dual-task approach, extracting evidence from a given context and assessing the RoB based on the gathered evidence. Methods: We use data from the Cochrane Database of Systematic Reviews (CDSR) as ground truth to label open-access clinical trial publications from PubMed. This process enabled us to develop training and test datasets specifically for machine reading comprehension and RoB inference. Additionally, we created extractive (RoBInExt) and generative (RoBInGen) Transformer-based approaches to extract relevant evidence and classify the RoB effectively. Results: RoBIn is evaluated across various settings and benchmarked against state-of-the-art methods for RoB inference, including large language models in multiple scenarios. In most cases, the best-performing RoBIn variant surpasses traditional machine learning and LLM-based approaches, achieving an ROC AUC of 0.83. Conclusion: Based on the evidence extracted from clinical trial reports, RoBIn performs a binary classification to decide whether the trial is at a low RoB or a high/unclear RoB. We found that both RoBInGen and RoBInExt are robust and have the best results in many settings.
Submitted 28 October, 2024;
originally announced October 2024.
-
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG
Authors:
Gabriel de Souza P. Moreira,
Ronay Ak,
Benedikt Schifferer,
Mengyao Xu,
Radek Osmulski,
Even Oldridge
Abstract:
Ranking models play a crucial role in enhancing the overall accuracy of text retrieval systems. These multi-stage systems typically utilize either dense embedding models or sparse lexical indices to retrieve relevant passages based on a given query, followed by ranking models that refine the ordering of the candidate passages by their relevance to the query.
This paper benchmarks various publicly available ranking models and examines their impact on ranking accuracy. We focus on text retrieval for question-answering tasks, a common use case for Retrieval-Augmented Generation systems. Our evaluation benchmarks include models, some of which are commercially viable for industrial applications.
We introduce a state-of-the-art ranking model, NV-RerankQA-Mistral-4B-v3, which achieves a significant accuracy increase of ~14% compared to pipelines with other rerankers. We also provide an ablation study comparing the fine-tuning of ranking models with different sizes, losses and self-attention mechanisms.
Finally, we discuss challenges of text retrieval pipelines with ranking models in real-world industry applications, in particular the trade-offs among model size, ranking accuracy and system requirements like indexing and serving latency / throughput.
Submitted 11 September, 2024;
originally announced September 2024.
-
NV-Retriever: Improving text embedding models with effective hard-negative mining
Authors:
Gabriel de Souza P. Moreira,
Radek Osmulski,
Mengyao Xu,
Ronay Ak,
Benedikt Schifferer,
Even Oldridge
Abstract:
Text embedding models have been popular for information retrieval applications such as semantic search and Question-Answering systems based on Retrieval-Augmented Generation (RAG). Those models are typically Transformer models that are fine-tuned with contrastive learning objectives. One of the challenging aspects of fine-tuning embedding models is the selection of high-quality hard-negative passages for contrastive learning. In this paper we introduce a family of positive-aware mining methods that use the positive relevance score as an anchor for effective false negative removal, leading to faster training and more accurate retrieval models. We provide an ablation study on hard-negative mining methods over their configurations, exploring different teacher and base models. We further demonstrate the efficacy of our proposed mining methods at scale with the NV-Retriever-v1 model, which scores 60.9 on the MTEB Retrieval (BEIR) benchmark and placed 1st on MTEB Retrieval when it was published in July 2024.
Submitted 7 February, 2025; v1 submitted 22 July, 2024;
originally announced July 2024.
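The positive-aware idea in the NV-Retriever abstract above (using the positive relevance score as an anchor to remove likely false negatives) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_hard_negatives`, the `max_ratio` threshold of 0.95, the `top_k` value, and the toy scores are all assumptions chosen for illustration, and the sketch assumes relevance scores are positive.

```python
from typing import List, Tuple


def filter_hard_negatives(
    pos_score: float,
    candidates: List[Tuple[str, float]],
    max_ratio: float = 0.95,  # hypothetical threshold, not taken from the abstract
    top_k: int = 4,           # hypothetical number of negatives to keep
) -> List[Tuple[str, float]]:
    """Keep the top_k highest-scoring candidates that score below
    max_ratio * pos_score; anything at or above that ceiling is treated
    as a probable false negative and dropped."""
    ceiling = max_ratio * pos_score
    kept = [c for c in candidates if c[1] < ceiling]
    # Hardest (highest-scoring) remaining negatives first.
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:top_k]


# Toy teacher-model scores (made up for illustration).
candidates = [("p1", 0.93), ("p2", 0.80), ("p3", 0.40), ("p4", 0.75)]
negatives = filter_hard_negatives(pos_score=0.90, candidates=candidates, top_k=2)
```

Here `p1` scores higher than the positive passage itself, so it is discarded as a likely false negative, while `p2` and `p4` remain as hard negatives.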
-
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks
Authors:
Jack Gallifant,
Shan Chen,
Pedro Moreira,
Nikolaj Munch,
Mingye Gao,
Jackson Pond,
Leo Anthony Celi,
Hugo Aerts,
Thomas Hartvigsen,
Danielle Bitterman
Abstract:
Medical knowledge is context-dependent and requires consistent reasoning across various natural language expressions of semantically equivalent phrases. This is particularly crucial for drug names, where patients often use brand names like Advil or Tylenol instead of their generic equivalents. To study this, we create a new robustness dataset, RABBITS, to evaluate performance differences on medical benchmarks after swapping brand and generic drug names using physician expert annotations.
We assess both open-source and API-based LLMs on MedQA and MedMCQA, revealing a consistent performance drop ranging from 1-10%. Furthermore, we identify a potential source of this fragility as the contamination of test data in widely used pre-training datasets. All code is accessible at https://github.com/BittermanLab/RABBITS, and a HuggingFace leaderboard is available at https://huggingface.co/spaces/AIM-Harvard/rabbits-leaderboard.
Submitted 18 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
BVE + EKF: A viewpoint estimator for the estimation of the object's position in the 3D task space using Extended Kalman Filters
Authors:
Sandro Costa Magalhães,
António Paulo Moreira,
Filipe Neves dos Santos,
Jorge Dias
Abstract:
RGB-D sensors face multiple challenges when operating in open-field environments because of their sensitivity to external perturbations such as radiation or rain. Multiple works address the challenge of perceiving the 3D position of objects using monocular cameras. However, most of these works focus mainly on deep learning-based solutions, which are complex, data-driven, and difficult to predict. We therefore approach the problem of predicting objects' 3D position using a Gaussian viewpoint estimator named best viewpoint estimator (BVE) powered by an extended Kalman filter (EKF). The algorithm proved efficient on the tasks and reached a maximum average Euclidean error of about 32 mm. The experiments were deployed and evaluated in MATLAB using artificial Gaussian noise. Future work aims to deploy the algorithm on a robotic system.
Submitted 3 October, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias
Authors:
Shan Chen,
Jack Gallifant,
Mingye Gao,
Pedro Moreira,
Nikolaj Munch,
Ajay Muthukkumar,
Arvind Rajan,
Jaya Kolluri,
Amelia Fiske,
Janna Hastings,
Hugo Aerts,
Brian Anthony,
Leo Anthony Celi,
William G. La Cava,
Danielle S. Bitterman
Abstract:
Large language models (LLMs) are increasingly essential in processing natural languages, yet their application is frequently compromised by biases and inaccuracies originating in their training data. In this study, we introduce Cross-Care, the first benchmark framework dedicated to assessing biases and real world knowledge in LLMs, specifically focusing on the representation of disease prevalence across diverse demographic groups. We systematically evaluate how demographic biases embedded in pre-training corpora like ThePile influence the outputs of LLMs. We expose and quantify discrepancies by juxtaposing these biases against actual disease prevalences in various U.S. demographic groups. Our results highlight substantial misalignment between LLM representation of disease prevalence and real disease prevalence rates across demographic subgroups, indicating a pronounced risk of bias propagation and a lack of real-world grounding for medical applications of LLMs. Furthermore, we observe that various alignment methods minimally resolve inconsistencies in the models' representation of disease prevalence across different languages. For further exploration and analysis, we make all data and a data visualization tool available at: www.crosscare.net.
Submitted 24 June, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
Leveraging Visibility Graphs for Enhanced Arrhythmia Classification with Graph Convolutional Networks
Authors:
Rafael F. Oliveira,
Gladston J. P. Moreira,
Vander L. S. Freitas,
Eduardo J. S. Luz
Abstract:
Arrhythmias, detectable through electrocardiograms (ECGs), pose significant health risks, underscoring the need for accurate and efficient automated detection techniques. While recent advancements in graph-based methods have demonstrated potential to enhance arrhythmia classification, the challenge lies in effectively representing ECG signals as graphs. This study investigates the use of Visibility Graph (VG) and Vector Visibility Graph (VVG) representations combined with Graph Convolutional Networks (GCNs) for arrhythmia classification under the ANSI/AAMI standard, ensuring reproducibility and fair comparison with other techniques. Through extensive experiments on the MIT-BIH dataset, we evaluate various GCN architectures and preprocessing parameters. Our findings demonstrate that VG and VVG mappings enable GCNs to classify arrhythmias directly from raw ECG signals, without the need for preprocessing or noise removal. Notably, VG offers superior computational efficiency, while VVG delivers enhanced classification performance by leveraging additional lead features. The proposed approach outperforms baseline methods in several metrics, although challenges persist in classifying the supraventricular ectopic beat (S) class, particularly under the inter-patient paradigm.
Submitted 3 December, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
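As a rough illustration of how a signal can be turned into a graph, the sketch below builds a natural visibility graph from a short series using the standard visibility criterion: two samples are linked when every sample between them lies strictly below the straight line joining them. It is a generic textbook construction, not the paper's pipeline; the function name and the toy values are assumptions, and the Vector Visibility Graph variant and any ECG-specific preprocessing are not covered.

```python
from typing import List, Set, Tuple


def natural_visibility_graph(series: List[float]) -> Set[Tuple[int, int]]:
    """Return undirected edges (i, j), i < j, of the natural visibility graph.

    Samples i and j see each other when every intermediate sample k lies
    strictly below the line through (i, series[i]) and (j, series[j]).
    """
    edges: Set[Tuple[int, int]] = set()
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


# Toy series (made up); the peak at index 2 blocks the view to index 3.
edges = natural_visibility_graph([3.0, 1.0, 4.0, 2.0])
```

This brute-force version is cubic in the series length in the worst case, which is acceptable for short windows such as single heartbeats but would need a faster construction for long recordings.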
-
LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking
Authors:
Zhenrui Yue,
Sara Rabhi,
Gabriel de Souza Pereira Moreira,
Dong Wang,
Even Oldridge
Abstract:
Recently, large language models (LLMs) have exhibited significant progress in language understanding and generation. By leveraging textual features, customized LLMs are also applied for recommendation and demonstrate improvements across diverse recommendation scenarios. Yet the majority of existing methods perform training-free recommendation that heavily relies on pretrained knowledge (e.g., movie recommendation). In addition, inference on LLMs is slow due to autoregressive generation, rendering existing methods less effective for real-time recommendation. As such, we propose a two-stage framework using large language models for ranking-based recommendation (LlamaRec). In particular, we use small-scale sequential recommenders to retrieve candidates based on the user interaction history. Then, both history and retrieved items are fed to the LLM in text via a carefully designed prompt template. Instead of generating next-item titles, we adopt a verbalizer-based approach that transforms output logits into probability distributions over the candidate items. Therefore, the proposed LlamaRec can efficiently rank items without generating long text. To validate the effectiveness of the proposed framework, we compare against state-of-the-art baseline methods on benchmark datasets. Our experimental results show that LlamaRec consistently achieves superior recommendation performance and efficiency.
Submitted 25 October, 2023;
originally announced November 2023.
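The verbalizer step described in the LlamaRec abstract above (turning output logits into a probability distribution over candidate items instead of generating titles) can be sketched as a softmax restricted to the candidates' label tokens. This is a simplified illustration, not the authors' code: the index labels "A", "B", "C", the logit values, and the function name `rank_candidates` are assumptions, and a real system would read the logits from the LLM's next-token output.

```python
import math
from typing import Dict, List, Tuple


def rank_candidates(index_logits: Dict[str, float]) -> List[Tuple[str, float]]:
    """Softmax over the logits of the candidate label tokens only,
    then sort candidates by probability, highest first."""
    m = max(index_logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - m) for label, v in index_logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical next-token logits for three candidate labels.
ranking = rank_candidates({"A": 1.2, "B": 3.1, "C": 0.4})
```

Because only one forward pass and a handful of logits are needed, the ranking avoids autoregressive decoding of item titles, which is the efficiency argument the abstract makes.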
-
MonoVisual3DFilter: 3D tomatoes' localisation with monocular cameras using histogram filters
Authors:
Sandro Costa Magalhães,
Filipe Neves dos Santos,
António Paulo Moreira,
Jorge Dias
Abstract:
Performing tasks in agriculture, such as fruit monitoring or harvesting, requires perceiving the objects' spatial position. RGB-D cameras are limited in open-field environments due to lighting interference. So, in this study, we set out to answer the research question: "How can we use and control monocular sensors to perceive objects' position in the 3D task space?" Towards this aim, we used histogram filters (Bayesian discrete filters) to estimate the position of tomatoes in the tomato plant through the algorithm MonoVisual3DFilter. Two kernel filters were studied: the square kernel and the Gaussian kernel. The implemented algorithm was tested in simulation, with and without Gaussian noise and random noise, and in a testbed under laboratory conditions. The algorithm reported a mean absolute error lower than 10 mm in simulation and 20 mm in the testbed under laboratory conditions with an assessing distance of about 0.5 m. So, the results are viable for real environments and should be improved at closer distances.
Submitted 3 October, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Regularization Through Simultaneous Learning: A Case Study on Plant Classification
Authors:
Pedro Henrique Nascimento Castro,
Gabriel Cássia Fortuna,
Rafael Alves Bonfim de Queiroz,
Gladston Juliano Prates Moreira,
Eduardo José da Silva Luz
Abstract:
In response to the prevalent challenge of overfitting in deep neural networks, this paper introduces Simultaneous Learning, a regularization approach drawing on principles of Transfer Learning and Multi-task Learning. We leverage auxiliary datasets with the target dataset, the UFOP-HVD, to facilitate simultaneous classification guided by a customized loss function featuring an inter-group penalty. This experimental configuration allows for a detailed examination of model performance across similar (PlantNet) and dissimilar (ImageNet) domains, thereby enriching the generalizability of Convolutional Neural Network models. Remarkably, our approach demonstrates superior performance over models without regularization and those applying dropout regularization exclusively, enhancing accuracy by 5 to 22 percentage points. Moreover, when combined with dropout, the proposed approach improves generalization, securing state-of-the-art results for the UFOP-HVD challenge. The method also showcases efficiency with significantly smaller sample sizes, suggesting its broad applicability across a spectrum of related tasks. In addition, an interpretability approach is deployed to evaluate feature quality by analyzing class feature correlations within the network's convolutional layers. The findings of this study provide deeper insights into the efficacy of Simultaneous Learning, particularly concerning its interaction with the auxiliary and target datasets.
Submitted 20 June, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
E Pluribus Unum: Guidelines on Multi-Objective Evaluation of Recommender Systems
Authors:
Patrick John Chia,
Giuseppe Attanasio,
Jacopo Tagliabue,
Federico Bianchi,
Ciro Greco,
Gabriel de Souza P. Moreira,
Davide Eynard,
Fahd Husain
Abstract:
Recommender Systems today are still mostly evaluated in terms of accuracy, with other aspects beyond the immediate relevance of recommendations, such as diversity, long-term user retention and fairness, often taking a back seat. Moreover, reconciling multiple performance perspectives is by definition indeterminate, presenting a stumbling block to those in the pursuit of rounded evaluation of Recommender Systems. EvalRS 2022 -- a data challenge designed around Multi-Objective Evaluation -- was a first practical endeavour, providing many insights into the requirements and challenges of balancing multiple objectives in evaluation. In this work, we reflect on EvalRS 2022 and expound upon crucial learnings to formulate a first-principles approach toward Multi-Objective model selection, and outline a set of guidelines for carrying out a Multi-Objective Evaluation challenge, with potential applicability to the problem of rounded evaluation of competing models in real-world deployments.
Submitted 20 April, 2023;
originally announced April 2023.
-
Robust human position estimation in cooperative robotic cells
Authors:
António Amorim,
Diana Guimarães,
Tiago Mendonça,
Pedro Neto,
Paulo Costa,
António Paulo Moreira
Abstract:
Robots are increasingly present in our lives, sharing the workspace and tasks with human co-workers. However, existing interfaces for human-robot interaction / cooperation (HRI/C) have limited levels of intuitiveness to use, and safety is a major concern when humans and robots share the same workspace. Often, this is due to the lack of a reliable estimation of the human pose in space, which is the primary input to calculate the human-robot minimum distance (required for safety and collision avoidance) and for HRI/C featuring machine learning algorithms classifying human behaviours / gestures. Each sensor type has its own characteristics, resulting in problems such as occlusions (vision) and drift (inertial) when used in an isolated fashion. In this paper, we propose a combined system that merges the human tracking provided by a 3D vision sensor with the pose estimation provided by a set of inertial measurement units (IMUs) placed on human body limbs. The IMUs compensate for the gaps in occluded areas to maintain tracking continuity. To mitigate the lingering effects of the IMU offset, we propose a continuous online calculation of the offset value. Experimental tests were designed to simulate human motion in a human-robot collaborative environment where the robot moves away to avoid unexpected collisions with the human. Results indicate that our approach is able to capture the human's position, for example the forearm, with a precision in the millimetre range and robustness to occlusions.
Submitted 17 April, 2023;
originally announced April 2023.
-
Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
Authors:
Sandro Costa Magalhães,
Filipe Neves Santos,
Pedro Machado,
António Paulo Moreira,
Jorge Dias
Abstract:
Purpose: Visual perception enables robots to perceive the environment. Visual data is processed using computer vision algorithms that are usually time-expensive and require powerful devices to process the visual data in real-time, which is unfeasible for open-field robots with limited energy. This work benchmarks the performance of different heterogeneous platforms for object detection in real-time. This research benchmarks three architectures: embedded GPU -- Graphical Processing Units (such as NVIDIA Jetson Nano 2 GB and 4 GB, and NVIDIA Jetson TX2), TPU -- Tensor Processing Unit (such as Coral Dev Board TPU), and DPU -- Deep Learning Processor Unit (such as in AMD-Xilinx ZCU104 Development Board, and AMD-Xilinx Kria KV260 Starter Kit). Method: The authors used the RetinaNet ResNet-50 fine-tuned using the natural VineSet dataset. Afterwards, the trained model was converted and compiled for target-specific hardware formats to improve the execution efficiency. Conclusions and Results: The platforms were assessed in terms of performance of the evaluation metrics and efficiency (time of inference). Graphical Processing Units (GPUs) were the slowest devices, running at 3 FPS to 5 FPS, and Field Programmable Gate Arrays (FPGAs) were the fastest devices, running at 14 FPS to 25 FPS. The Tensor Processing Unit (TPU) shows no notable efficiency gain, performing similarly to the NVIDIA Jetson TX2. TPU and GPU are the most power-efficient, consuming about 5 W. The performance differences in the evaluation metrics across devices are negligible, with an F1 of about 70% and mean Average Precision (mAP) of about 60%.
Submitted 21 November, 2022;
originally announced November 2022.
-
Omnidirectional robot modeling and simulation
Authors:
Sandro Costa Magalhães,
António Paulo Moreira,
Paulo Costa
Abstract:
A robot simulation system is a basic need for any robotics application. With it, developers building teams of robots can test their algorithms and make initial calibrations without risk of damage to the real robots, assuring safety. However, building these simulation environments is usually time-consuming work, and when considering robot fleets, the simulation proves to be computationally expensive. An omnidirectional robot from the 5DPO robotics soccer team served to test this approach. The modeling issue was divided into two steps: modeling the motor's non-linear features and modeling the general behavior of the robot. A proper fitting of the robot was reached, considering the robot's velocity response.
Submitted 15 November, 2022;
originally announced November 2022.
-
Automated Medical Device Display Reading Using Deep Learning Object Detection
Authors:
Lucas P. Moreira
Abstract:
Telemedicine and mobile health applications, especially during the quarantine imposed by the COVID-19 pandemic, led to an increase in the need to transfer health monitor readings from patients to specialists. Considering that most home medical devices use seven-segment displays, an automatic display reading algorithm should provide a more reliable tool for remote health care. This work proposes an end-to-end method for detecting and reading seven-segment displays from medical devices based on deep learning object detection models. Two state-of-the-art model families, EfficientDet and EfficientDet-lite, previously trained with the MS-COCO dataset, were fine-tuned on a dataset comprising photos of medical devices taken with mobile digital cameras, to simulate real case applications. Evaluation of the trained models shows high efficiency: all models achieved more than 98% detection precision and more than 98% classification accuracy, with model EfficientDet-lite1 showing 100% detection precision and 100% correct digit classification for a test set of 104 images and 438 digits.
Submitted 3 October, 2022;
originally announced October 2022.
-
EvalRS: a Rounded Evaluation of Recommender Systems
Authors:
Jacopo Tagliabue,
Federico Bianchi,
Tobias Schnabel,
Giuseppe Attanasio,
Ciro Greco,
Gabriel de Souza P. Moreira,
Patrick John Chia
Abstract:
Much of the complexity of Recommender Systems (RSs) comes from the fact that they are used as part of more complex applications and affect user experience through a varied range of user interfaces. However, research focused almost exclusively on the ability of RSs to produce accurate item rankings while giving little attention to the evaluation of RS behavior in real-world scenarios. Such narrow focus has limited the capacity of RSs to have a lasting impact in the real world and makes them vulnerable to undesired behavior, such as reinforcing data biases. We propose EvalRS as a new type of challenge, in order to foster this discussion among practitioners and build in the open new methodologies for testing RSs "in the wild".
Submitted 12 August, 2022; v1 submitted 12 July, 2022;
originally announced July 2022.
-
An End-to-End Approach for Seam Carving Detection using Deep Neural Networks
Authors:
Thierry P. Moreira,
Marcos Cleison S. Santana,
Leandro A. Passos,
João Paulo Papa,
Kelton Augusto P. da Costa
Abstract:
Seam carving is a computational method capable of resizing images for both reduction and expansion based on its content, instead of the image geometry. Although the technique is mostly employed to deal with redundant information, i.e., regions composed of pixels with similar intensity, it can also be used for tampering images by inserting or removing relevant objects. Therefore, detecting such a process is of extreme importance regarding the image security domain. However, recognizing seam-carved images does not represent a straightforward task even for human eyes, and robust computation tools capable of identifying such alterations are very desirable. In this paper, we propose an end-to-end approach to cope with the problem of automatic seam carving detection that can obtain state-of-the-art results. Experiments conducted over public and private datasets with several tampering configurations evidence the suitability of the proposed model.
Submitted 5 March, 2022;
originally announced March 2022.
-
Gait Recognition Based on Deep Learning: A Survey
Authors:
Claudio Filipi Gonçalves dos Santos,
Diego de Souza Oliveira,
Leandro A. Passos,
Rafael Gonçalves Pires,
Daniel Felipe Silva Santos,
Lucas Pascotti Valem,
Thierry P. Moreira,
Marcos Cleison S. Santana,
Mateus Roder,
João Paulo Papa,
Danilo Colombo
Abstract:
In general, biometry-based control systems may not rely on individual expected behavior or cooperation to operate appropriately. Instead, such systems should be aware of malicious procedures for unauthorized access attempts. Some works available in the literature suggest addressing the problem through gait recognition approaches. Such methods aim at identifying human beings through intrinsic perceptible features, regardless of clothing or accessories. Although the issue is a long-standing challenge, most of the techniques developed to handle the problem present several drawbacks related to feature extraction and low classification rates, among other issues. However, deep learning-based approaches recently emerged as a robust set of tools to deal with virtually any image and computer-vision related problem, providing paramount results for gait recognition as well. Therefore, this work provides a survey of recent works regarding biometric detection through gait recognition with a focus on deep learning approaches, emphasizing their benefits and exposing their weaknesses. It also presents categorized and characterized descriptions of the datasets, approaches, and architectures employed to tackle associated constraints.
Submitted 10 January, 2022;
originally announced January 2022.
-
Synthetic Data and Simulators for Recommendation Systems: Current State and Future Directions
Authors:
Adam Lesnikowski,
Gabriel de Souza Pereira Moreira,
Sara Rabhi,
Karl Byleen-Higley
Abstract:
Synthetic data and simulators have the potential to markedly improve the performance and robustness of recommendation systems. These approaches have already had a beneficial impact in other machine-learning driven fields. We identify and discuss a key trade-off between data fidelity and privacy in the past work on synthetic data and simulators for recommendation systems. For the important use case of predicting algorithm rankings on real data from synthetic data, we provide motivation and current successes versus limitations. Finally we outline a number of exciting future directions for recommendation systems that we believe deserve further attention and work, including mixing real and synthetic data, feedback in dataset generation, robust simulations, and privacy-preserving methods.
Submitted 21 December, 2021;
originally announced December 2021.
-
Evaluating the Single-Shot MultiBox Detector and YOLO Deep Learning Models for the Detection of Tomatoes in a Greenhouse
Authors:
Sandro A. Magalhães,
Luís Castro,
Germano Moreira,
Filipe N. Santos,
Mário Cunha,
Jorge Dias,
António P. Moreira
Abstract:
The development of robotic solutions for agriculture requires advanced perception capabilities that can work reliably in any crop stage. For example, to automatise the tomato harvesting process in greenhouses, the visual perception system needs to detect the tomato in any life cycle stage (flower to the ripe tomato). The state-of-the-art for visual tomato detection focuses mainly on ripe tomatoes, which have a distinctive colour from the background. This paper contributes an annotated visual dataset of green and reddish tomatoes. This kind of dataset is uncommon and not available for research purposes. This will enable further developments in edge artificial intelligence for in situ and real-time visual tomato detection required for the development of harvesting robots. Using this dataset, five deep learning models were selected, trained and benchmarked to detect green and reddish tomatoes grown in greenhouses. Considering our robotic platform specifications, only the Single-Shot MultiBox Detector (SSD) and YOLO architectures were considered. The results showed that the system can detect green and reddish tomatoes, even those occluded by leaves. SSD MobileNet v2 had the best performance when compared against SSD Inception v2, SSD ResNet 50, SSD ResNet 101 and YOLOv4 Tiny, reaching an F1-score of 66.15%, an mAP of 51.46% and an inference time of 16.44 ms with the NVIDIA Turing Architecture platform, an NVIDIA Tesla T4, with 12 GB. YOLOv4 Tiny also had impressive results, mainly concerning inference times of about 5 ms.
Submitted 2 September, 2021;
originally announced September 2021.
-
Transformers with multi-modal features and post-fusion context for e-commerce session-based recommendation
Authors:
Gabriel de Souza P. Moreira,
Sara Rabhi,
Ronay Ak,
Md Yasin Kabir,
Even Oldridge
Abstract:
Session-based recommendation is an important task for e-commerce services, where a large number of users browse anonymously or may have very distinct interests for different sessions. In this paper we present one of the winning solutions for the Recommendation task of the SIGIR 2021 Workshop on E-commerce Data Challenge. Our solution was inspired by NLP techniques and consists of an ensemble of two Transformer architectures - Transformer-XL and XLNet - trained with autoregressive and autoencoding approaches. To leverage most of the rich dataset made available for the competition, we describe how we prepared multi-modal features by combining tabular events with textual and image vectors. We also present a model prediction analysis to better understand the effectiveness of our architectures for session-based recommendation.
Submitted 11 July, 2021;
originally announced July 2021.
-
Simple Unsupervised Similarity-Based Aspect Extraction
Authors:
Danny Suarez Vargas,
Lucas R. C. Pessutto,
Viviane Pereira Moreira
Abstract:
In the context of sentiment analysis, there has been growing interest in performing a finer granularity analysis focusing on the specific aspects of the entities being evaluated. This is the goal of Aspect-Based Sentiment Analysis (ABSA) which basically involves two tasks: aspect extraction and polarity detection. The first task is responsible for discovering the aspects mentioned in the review text and the second task assigns a sentiment orientation (positive, negative, or neutral) to that aspect. Currently, the state-of-the-art in ABSA consists of the application of deep learning methods such as recurrent, convolutional and attention neural networks. The limitation of these techniques is that they require a lot of training data and are computationally expensive. In this paper, we propose a simple approach called SUAEx for aspect extraction. SUAEx is unsupervised and relies solely on the similarity of word embeddings. Experimental results on datasets from three different domains have shown that SUAEx achieves results that can outperform the state-of-the-art attention-based approach at a fraction of the time.
Submitted 25 August, 2020;
originally announced August 2020.
-
BabelEnconding at SemEval-2020 Task 3: Contextual Similarity as a Combination of Multilingualism and Language Models
Authors:
Lucas R. C. Pessutto,
Tiago de Melo,
Viviane P. Moreira,
Altigran da Silva
Abstract:
This paper describes the system submitted by our team (BabelEnconding) to SemEval-2020 Task 3: Predicting the Graded Effect of Context in Word Similarity. We propose an approach that relies on translation and multilingual language models in order to compute the contextual similarity between pairs of words. Our hypothesis is that evidence from additional languages can leverage the correlation with the human generated scores. BabelEnconding was applied to both subtasks and ranked among the top-3 in six out of eight task/language combinations and was the highest scoring system three times.
Submitted 19 August, 2020;
originally announced August 2020.
-
Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks
Authors:
Diego de Vargas Feijo,
Viviane Pereira Moreira
Abstract:
BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) are methods for pre-training language models which can later be fine-tuned for a variety of Natural Language Understanding tasks. These methods have been applied to a number of such tasks (mostly in English), achieving results that outperform the state-of-the-art. In this paper, our contribution is twofold. First, we make available our trained BERT and ALBERT models for Portuguese. Second, we compare our monolingual models and the standard multilingual models using experiments in semantic textual similarity, recognizing textual entailment, textual category classification, sentiment analysis, offensive comment detection, and fake news detection, to assess the effectiveness of the generated language representations. The results suggest that both monolingual and multilingual models are able to achieve state-of-the-art results and that the advantage of training a single-language model, if any, is small.
Submitted 19 July, 2020;
originally announced July 2020.
-
Hybrid Session-based News Recommendation using Recurrent Neural Networks
Authors:
Gabriel de Souza P. Moreira,
Dietmar Jannach,
Adilson Marques da Cunha
Abstract:
We describe a hybrid meta-architecture -- the CHAMELEON -- for session-based news recommendation that is able to leverage a variety of information types using Recurrent Neural Networks. We evaluated our approach on two public datasets, using a temporal evaluation protocol that simulates the dynamics of a news portal in a realistic way. Our results confirm the benefits of modeling the sequence of session clicks with RNNs and leveraging side information about users and articles, resulting in significantly higher recommendation accuracy and catalog coverage than other session-based algorithms.
Submitted 22 June, 2020;
originally announced June 2020.
-
SemifreddoNets: Partially Frozen Neural Networks for Efficient Computer Vision Systems
Authors:
Leo F Isikdogan,
Bhavin V Nayak,
Chyuan-Tyng Wu,
Joao Peralta Moreira,
Sushma Rao,
Gilad Michael
Abstract:
We propose a system comprised of fixed-topology neural networks having partially frozen weights, named SemifreddoNets. SemifreddoNets work as fully-pipelined hardware blocks that are optimized to have an efficient hardware implementation. Those blocks freeze a certain portion of the parameters at every layer and replace the corresponding multipliers with fixed scalers. Fixing the weights reduces the silicon area, logic delay, and memory requirements, leading to significant savings in cost and power consumption. Unlike traditional layer-wise freezing approaches, SemifreddoNets make a profitable trade between the cost and flexibility by having some of the weights configurable at different scales and levels of abstraction in the model. Although fixing the topology and some of the weights somewhat limits the flexibility, we argue that the efficiency benefits of this strategy outweigh the advantages of a fully configurable model for many use cases. Furthermore, our system uses repeatable blocks, therefore it has the flexibility to adjust model complexity without requiring any hardware change. The hardware implementation of SemifreddoNets provides up to an order of magnitude reduction in silicon area and power consumption as compared to their equivalent implementation on a general-purpose accelerator.
Submitted 11 June, 2020;
originally announced June 2020.
-
CHAMELEON: A Deep Learning Meta-Architecture for News Recommender Systems [Phd. Thesis]
Authors:
Gabriel de Souza Pereira Moreira
Abstract:
Recommender Systems (RS) have become a popular research topic and, since 2016, Deep Learning methods and techniques have been increasingly explored in this area. News RS aim to personalize users' experiences and help them discover relevant articles from a large and dynamic search space. The main contribution of this research was named CHAMELEON, a Deep Learning meta-architecture designed to tackle the specific challenges of news recommendation. It consists of a modular reference architecture which can be instantiated using different neural building blocks. As information about users' past interactions is scarce in the news domain, the user context can be leveraged to deal with the user cold-start problem. Articles' content is also important to tackle the item cold-start problem. Additionally, the temporal decay of item (article) relevance is very accelerated in the news domain. Furthermore, external breaking events may temporarily attract global readership attention, a phenomenon generally known as concept drift in machine learning. All those characteristics are explicitly modeled in this research by a contextual hybrid session-based recommendation approach using Recurrent Neural Networks. The task addressed by this research is session-based news recommendation, i.e., next-click prediction using only information available in the current user session. A method is proposed for a realistic temporal offline evaluation of such a task, replaying the stream of user clicks and fresh articles being continuously published in a news portal. Experiments performed with two large datasets have shown the effectiveness of CHAMELEON for news recommendation on many quality factors such as accuracy, item coverage, novelty, and reduced item cold-start problem, when compared to other traditional and state-of-the-art session-based recommendation algorithms.
Submitted 29 December, 2019;
originally announced January 2020.
-
On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems
Authors:
Gabriel de Souza P. Moreira,
Dietmar Jannach,
Adilson Marques da Cunha
Abstract:
News recommender systems are designed to surface relevant information for online readers by personalizing their user experiences. A particular problem in that context is that online readers are often anonymous, which means that this personalization can only be based on the last few recorded interactions with the user, a setting named session-based recommendation. Another particularity of the news domain is that constantly fresh articles are published, which should be immediately considered for recommendation. To deal with this item cold-start problem, it is important to consider the actual content of items when recommending. Hybrid approaches are therefore often considered as the method of choice in such settings. In this work, we analyze the importance of considering content information in a hybrid neural news recommender system. We contrast content-aware and content-agnostic techniques and also explore the effects of using different content encodings. Experiments on two public datasets confirm the importance of adopting a hybrid approach. Furthermore, we show that the choice of the content encoding can have an impact on the resulting performance.
Submitted 6 September, 2019; v1 submitted 12 July, 2019;
originally announced July 2019.
-
A Large Parallel Corpus of Full-Text Scientific Articles
Authors:
Felipe Soares,
Viviane Pereira Moreira,
Karin Becker
Abstract:
The Scielo database is an important source of scientific information in Latin America, containing articles from several research domains. A striking characteristic of Scielo is that many of its full-text contents are presented in more than one language, thus being a potential source of parallel corpora. In this article, we present the development of a parallel corpus from Scielo in three languages: English, Portuguese, and Spanish. Sentences were automatically aligned using the Hunalign algorithm for all language pairs, and for a subset of trilingual articles also. We demonstrate the capabilities of our corpus by training a Statistical Machine Translation system (Moses) for each language pair, which outperformed related works on scientific articles. Sentence alignment was also manually evaluated, presenting an average of 98.8% correctly aligned sentences across all languages. Our parallel corpus is freely available in the TMX format, with complementary information regarding article metadata.
Submitted 6 May, 2019;
originally announced May 2019.
-
Contextual Hybrid Session-based News Recommendation with Recurrent Neural Networks
Authors:
Gabriel de Souza Pereira Moreira,
Dietmar Jannach,
Adilson Marques da Cunha
Abstract:
Recommender systems help users deal with information overload by providing tailored item suggestions to them. The recommendation of news is often considered to be challenging, since the relevance of an article for a user can depend on a variety of factors, including the user's short-term reading interests, the reader's context, or the recency or popularity of an article. Previous work has shown that the use of Recurrent Neural Networks is promising for the next-in-session prediction task, but has certain limitations when only recorded item click sequences are used as input. In this work, we present a contextual hybrid, deep learning based approach for session-based news recommendation that is able to leverage a variety of information types. We evaluated our approach on two public datasets, using a temporal evaluation protocol that simulates the dynamics of a news portal in a realistic way. Our results confirm the benefits of considering additional types of information, including article popularity and recency, in the proposed way, resulting in significantly higher recommendation accuracy and catalog coverage than other session-based algorithms. Additional experiments show that the proposed parameterizable loss function used in our method also allows us to balance two usually conflicting quality factors, accuracy and novelty.
Keywords: Artificial Neural Networks, Context-Aware Recommender Systems, Hybrid Recommender Systems, News Recommender Systems, Session-based Recommendation
Submitted 8 December, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
-
News Session-Based Recommendations using Deep Neural Networks
Authors:
Gabriel de Souza P. Moreira,
Felipe Ferreira,
Adilson Marques da Cunha
Abstract:
News recommender systems aim to personalize users' experiences and help them discover relevant articles from a large and dynamic search space. The news domain is therefore a challenging scenario for recommendations, due to its sparse user profiling, fast-growing number of items, accelerated decay of item value, and dynamic shifts in user preferences. Some promising results have recently been achieved by using Deep Learning techniques in Recommender Systems, especially for item feature extraction and for session-based recommendations with Recurrent Neural Networks. This paper proposes an instantiation of CHAMELEON, a Deep Learning Meta-Architecture for News Recommender Systems. This architecture is composed of two modules: the first is responsible for learning news article representations based on their text and metadata, and the second provides session-based recommendations using Recurrent Neural Networks. The recommendation task addressed in this work is next-item prediction for user sessions: "what is the next most likely article a user might read in a session?" The context of user sessions is leveraged by the architecture to provide additional information in the extreme cold-start scenario of news recommendation. Users' behavior and item features are merged in a hybrid recommendation approach. A temporal offline evaluation method is also proposed as a complementary contribution, for a more realistic evaluation of this task, considering dynamic factors that affect global readership interests like popularity, recency, and seasonality. Experiments with an extensive number of session-based recommendation methods were performed and the proposed instantiation of the CHAMELEON meta-architecture obtained a significant relative improvement in top-n accuracy and ranking metrics (10% on Hit Rate and 13% on MRR) over the best benchmark methods.
Submitted 16 September, 2018; v1 submitted 31 July, 2018;
originally announced August 2018.
-
Where Is My Puppy? Retrieving Lost Dogs by Facial Features
Authors:
Thierry Pinheiro Moreira,
Mauricio Lisboa Perez,
Rafael de Oliveira Werneck,
Eduardo Valle
Abstract:
A pet that goes missing is among many people's worst fears: a moment of distraction is enough for a dog or a cat to wander off from home. Some measures help match lost animals to their owners; but automated visual recognition is one that - although convenient, highly available, and low-cost - is surprisingly overlooked. In this paper, we inaugurate that promising avenue by pursuing face recognition for dogs. We contrast four ready-to-use human facial recognizers (EigenFaces, FisherFaces, LBPH, and a Sparse method) with two original solutions based upon convolutional neural networks: BARK (inspired by architecture-optimized networks employed for human facial recognition) and WOOF (based upon off-the-shelf OverFeat features). Human facial recognizers perform poorly for dogs (up to 60.5% accuracy), showing that dog facial recognition is not a trivial extension of human facial recognition. The convolutional network solutions work much better, with BARK attaining up to 81.1% accuracy, and WOOF, 89.4%. The tests were conducted on two datasets: Flickr-dog, with 42 dogs of two breeds (pugs and huskies); and Snoopybook, with 18 mongrel dogs.
Submitted 1 August, 2016; v1 submitted 9 October, 2015;
originally announced October 2015.
-
3-D position estimation from inertial sensing: minimizing the error from the process of double integration of accelerations
Authors:
P. Neto,
J. N. Pires,
A. P Moreira
Abstract:
This paper introduces a new approach to 3-D position estimation from acceleration data, i.e., a 3-D motion tracking system that uses a small, low-cost magnetic and inertial measurement unit (MIMU), composed of both a digital compass and a gyroscope, as interaction technology. A major challenge is to minimize the error caused by the process of double integration of accelerations due to motion (these have to be separated from the accelerations due to gravity). Owing to drift error, position estimation cannot be performed with adequate accuracy for periods longer than a few seconds. For this reason, we propose a method to detect motion stops and only integrate accelerations in moments of effective hand motion during the demonstration process. The proposed system is validated and evaluated with experiments on a common daily-life pick-and-place task.
Submitted 18 November, 2013;
originally announced November 2013.
-
High-level programming and control for industrial robotics: using a hand-held accelerometer-based input device for gesture and posture recognition
Authors:
Pedro Neto,
Norberto Pires,
Paulo Moreira
Abstract:
Purpose - Most industrial robots are still programmed using the typical teaching process, through the use of the robot teach pendant. This is a tedious and time-consuming task that requires some technical expertise, and hence new approaches to robot programming are required. The purpose of this paper is to present a robotic system that allows users to instruct and program a robot with a high level of abstraction from the robot language.
Design/methodology/approach - The paper presents in detail a robotic system that allows users, especially non-expert programmers, to instruct and program a robot just by showing it what it should do, in an intuitive way. This is done using the two most natural human interfaces (gestures and speech), a force control system and several code generation techniques. Special attention is given to the recognition of gestures, where the data extracted from a motion sensor (three-axis accelerometer) embedded in the Wii remote controller was used to capture human hand behaviours. Gestures (dynamic hand positions) as well as manual postures (static hand positions) are recognized using a statistical approach and artificial neural networks.
Practical implications - The key contribution of this paper is that it offers a practical method to program robots by means of gestures and speech, improving work efficiency and saving time.
Originality/value - This paper presents an alternative to the typical robot teaching process, extending the concept of human-robot interaction and co-worker scenario. Since most companies do not have engineering resources to make changes or add new functionalities to their robotic manufacturing systems, this system constitutes a major advantage for small- to medium-sized enterprises.
Submitted 9 September, 2013;
originally announced September 2013.
-
Accelerometer-based control of an industrial robotic arm
Authors:
Pedro Neto,
Norberto Pires,
Paulo Moreira
Abstract:
Most industrial robots are still programmed using the typical teaching process, through the use of the robot teach pendant. This paper proposes an accelerometer-based system to control an industrial robot using two small, low-cost 3-axis wireless accelerometers. These accelerometers are attached to the human arms, capturing their behavior (gestures and postures). An Artificial Neural Network (ANN) trained with a back-propagation algorithm was used to recognize arm gestures and postures, which are then used as input in the control of the robot. The aim is for the robot to start the movement at almost the same time as the user starts to perform a gesture or posture (low response time). The results show that the system allows the control of an industrial robot in an intuitive way. However, the achieved recognition rate of gestures and postures (92%) should be improved in the future, while maintaining the trade-off with the system response time (160 milliseconds). Finally, the results of some tests performed with an industrial robot are presented and discussed.
Submitted 9 September, 2013;
originally announced September 2013.
-
A low-cost laser scanning solution for flexible robotic cells: spray coating
Authors:
Marcos Ferreira,
António Paulo Moreira,
Pedro Neto
Abstract:
In this paper, an adaptive and low-cost robotic coating platform for small production series is presented. This new platform has a flexible architecture that enables fast, automatic adaptive system behaviour without human intervention. The concept is based on contactless technology, using artificial vision and laser scanning to identify and characterize different workpieces travelling on a conveyor. Using laser triangulation, the workpieces are virtually reconstructed through a simplified cloud of three-dimensional (3D) points. From those reconstructed models, several algorithms are implemented to extract information about each workpiece's profile (pattern recognition), size, boundary and pose. Such information is then used to adjust the base robot programmes on-line. These robot programmes are generated off-line from a 3D computer-aided design model of each different workpiece profile. Finally, the robotic manipulator executes the coating process after its base programmes have been adjusted. This is a low-cost and fully autonomous system that allows the robot's behaviour to be adapted to different manufacturing situations. It means that the robot is ready to work on any piece at any time, and thus small production series can be reduced to as little as a one-object series. Neither skilled workers nor long setup times are needed to operate it. Experimental results showed that this solution is efficient and can be applied not only to spray coating but also to many other industrial processes (automatic manipulation, pick-and-place, inspection, etc.).
Submitted 9 September, 2013;
originally announced September 2013.
-
High-level robot programming based on CAD: dealing with unpredictable environments
Authors:
Pedro Neto,
Nuno Mendes,
Ricardo Araújo,
J. Norberto Pires,
A. Paulo Moreira
Abstract:
Purpose - The purpose of this paper is to present a CAD-based human-robot interface that allows non-expert users to teach a robot in a manner similar to that used by human beings to teach each other.
Design/methodology/approach - Intuitive robot programming is achieved by using CAD drawings to generate robot programs off-line. Sensory feedback allows minimization of the effects of uncertainty, providing information to adjust the robot paths during robot operation.
Findings - It was found that it is possible to generate a robot program from a common CAD drawing and run it without any major concerns about calibration or CAD model accuracy.
Research limitations/implications - A limitation of the proposed system has to do with the fact that it was designed to be used for particular technological applications.
Practical implications - Since most manufacturing companies have CAD packages in their facilities today, CAD-based robot programming may be a good option to program robots without the need for skilled robot programmers.
Originality/value - The paper proposes a new CAD-based robot programming system. Robot programs are directly generated from a CAD drawing running on a commonly available 3D CAD package (Autodesk Inventor) and not from a commercial, computer aided robotics (CAR) software, making it a simple CAD integrated solution. This is a low-cost and low-setup time system where no advanced robot programming skills are required to operate it. In summary, robot programs are generated with a high-level of abstraction from the robot language.
Submitted 9 September, 2013;
originally announced September 2013.
-
Real-Time and Continuous Hand Gesture Spotting: an Approach Based on Artificial Neural Networks
Authors:
Pedro Neto,
Dário Pereira,
Norberto Pires,
Paulo Moreira
Abstract:
New and more natural human-robot interfaces are of crucial interest to the evolution of robotics. This paper addresses continuous and real-time hand gesture spotting, i.e., gesture segmentation plus gesture recognition. Gesture patterns are recognized by using artificial neural networks (ANNs) specifically adapted to the process of controlling an industrial robot. Since in continuous gesture recognition the communicative gestures appear intermittently with the noncommunicative, we are proposing a new architecture with two ANNs in series to recognize both kinds of gesture. A data glove is used as interface technology. Experimental results demonstrated that the proposed solution presents high recognition rates (over 99% for a library of ten gestures and over 96% for a library of thirty gestures), low training and learning time and a good capacity to generalize from particular situations.
Submitted 9 September, 2013;
originally announced September 2013.
-
CAD-based robot programming: The role of Fuzzy-PI force control in unstructured environments
Authors:
Pedro Neto,
Nuno Mendes,
Norberto Pires,
Paulo Moreira
Abstract:
More and more, new ways of interaction between humans and robots are desired: ways that allow us to program a robot intuitively, quickly, and with a high level of abstraction from the robot language. This paper presents a CAD-based system that allows users with basic skills in CAD and without skills in robot programming to generate robot programs from a CAD model of a robotic cell. When the CAD model reproduces the real scenario exactly, the system performs satisfactorily. In contrast, when the CAD model does not reproduce the real scenario exactly or the calibration process is poorly done, we are dealing with uncertainty (an unstructured environment). In order to minimize or eliminate these problems, sensory feedback (force and torque sensing) was introduced into the robotic framework. By controlling the end-effector pose and specifying its relationship to the interaction/contact forces, robot programmers can ensure that the robot maneuvers in an unstructured environment, damping possible impacts and also increasing the tolerance to positioning errors from the calibration process. Fuzzy-PI reasoning was used as a force control technique. The effectiveness of the proposed approach was evaluated in a series of experiments.
Submitted 9 September, 2013;
originally announced September 2013.
-
Visualization Optimization : Application to the RoboCup Rescue Domain
Authors:
Pedro Miguel Moreira,
Luís Paulo Reis,
António Augusto de Sousa
Abstract:
In this paper we demonstrate the use of intelligent optimization methodologies on the visualization optimization of virtual / simulated environments. The problem of automatic selection of an optimized set of views, which better describes an on-going simulation over a virtual environment is addressed in the context of the RoboCup Rescue Simulation domain. A generic architecture for optimization is proposed and described. We outline the possible extensions of this architecture and argue on how several problems within the fields of Interactive Rendering and Visualization can benefit from it.
Submitted 13 October, 2008;
originally announced October 2008.