-
On the Residual-based Neural Network for Unmodeled Distortions in Coordinate Transformation
Authors:
Vinicius Francisco Rofatto,
Luiz Felipe Rodrigues de Almeida,
Marcelo Tomio Matsuoka,
Ivandro Klein,
Mauricio Roberto Veronez,
Luiz Gonzaga Da Silveira Junior
Abstract:
Coordinate transformation models often fail to account for nonlinear and spatially dependent distortions, leading to significant residual errors in geospatial applications. Here we propose a residual-based neural correction strategy, in which a neural network learns to model only the systematic distortions left by an initial geometric transformation. By focusing solely on residual patterns, the proposed method reduces model complexity and improves performance, particularly in scenarios with sparse or structured control point configurations. We evaluate the method on both simulated datasets with varying distortion intensities and sampling strategies and real-world image georeferencing tasks. Compared with a direct neural network coordinate converter and classical transformation models, the residual-based neural correction delivers more accurate and stable results under challenging conditions, while maintaining comparable performance in ideal cases. These findings demonstrate the effectiveness of residual modelling as a lightweight and robust alternative for improving coordinate transformation accuracy.
Submitted 19 April, 2025;
originally announced May 2025.
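The residual idea in the abstract above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the initial geometric model is a plain translation, the residual learner is inverse-distance-weighted interpolation standing in for the neural network, and the control points and all function names are hypothetical.

```python
import math

def fit_translation(src, dst):
    """Initial geometric model (assumed): a translation estimated as the mean offset."""
    n = len(src)
    tx = sum(d[0] - s[0] for s, d in zip(src, dst)) / n
    ty = sum(d[1] - s[1] for s, d in zip(src, dst)) / n
    return tx, ty

def residuals_after(t, src, dst):
    """Residuals left at each control point once the translation has been applied."""
    return [(d[0] - (s[0] + t[0]), d[1] - (s[1] + t[1])) for s, d in zip(src, dst)]

def idw_residual(p, src, res, power=2.0):
    """Stand-in for the neural network: inverse-distance-weighted residual estimate."""
    num_x = num_y = den = 0.0
    for s, r in zip(src, res):
        dist = math.dist(p, s)
        if dist == 0.0:
            return r  # exactly on a control point: reuse its residual
        w = 1.0 / dist ** power
        num_x += w * r[0]
        num_y += w * r[1]
        den += w
    return num_x / den, num_y / den

def corrected_transform(p, t, src, res):
    """Geometric transformation plus the predicted residual correction."""
    rx, ry = idw_residual(p, src, res)
    return p[0] + t[0] + rx, p[1] + t[1] + ry

def true_mapping(p):
    """Synthetic ground truth: a shift of (10, 5) plus a quadratic distortion in x."""
    return p[0] + 10.0 + 0.1 * p[0] ** 2, p[1] + 5.0

control_src = [(float(x), float(y)) for x in range(5) for y in range(5)]
control_dst = [true_mapping(p) for p in control_src]
t = fit_translation(control_src, control_dst)
res = residuals_after(t, control_src, control_dst)

query = (0.5, 0.5)
baseline = (query[0] + t[0], query[1] + t[1])
corrected = corrected_transform(query, t, control_src, res)
print("translation only error:", math.dist(baseline, true_mapping(query)))
print("with residual correction:", math.dist(corrected, true_mapping(query)))
```

The point of the split is that the second model only has to explain what the geometric model missed, which is typically a smaller and smoother signal than the full coordinate mapping.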
-
Enhancing Portuguese Variety Identification with Cross-Domain Approaches
Authors:
Hugo Sousa,
Rúben Almeida,
Purificação Silvano,
Inês Cantante,
Ricardo Campos,
Alípio Jorge
Abstract:
Recent advances in natural language processing have raised expectations for generative models to produce coherent text across diverse language varieties. In the particular case of the Portuguese language, the predominance of Brazilian Portuguese corpora online introduces linguistic biases in these models, limiting their applicability outside of Brazil. To address this gap and promote the creation of European Portuguese resources, we developed a cross-domain language variety identifier (LVI) to discriminate between European and Brazilian Portuguese. Motivated by the findings of our literature review, we compiled the PtBrVarId corpus, a cross-domain LVI dataset, and study the effectiveness of transformer-based LVI classifiers for cross-domain scenarios. Although this research focuses on two Portuguese varieties, our contribution can be extended to other varieties and languages. We open source the code, corpus, and models to foster further research in this task.
Submitted 20 February, 2025;
originally announced February 2025.
-
Debiasing Architectural Decision-Making: An Experiment With Students and Practitioners
Authors:
Klara Borowa,
Rodrigo Rebouças de Almeida,
Marion Wiese
Abstract:
Cognitive biases are predictable, systematic errors in human reasoning. They influence decision-making in various areas, including architectural decision-making, where architects face many choices. For example, anchoring can cause architects to unconsciously prefer the first architectural solution that they came up with, without considering any solution alternatives. Prior research suggests that training individuals in debiasing techniques during a practical workshop can help reduce the impact of biases. The goal of this study was to design and evaluate a debiasing workshop with individuals at various stages of their professional careers. To test the workshop's effectiveness, we performed an experiment with 16 students and 20 practitioners, split into control and workshop group pairs. We recorded and analyzed their think-aloud discussions about improving the architectures of systems they collaborated on. The workshop improved the participants' argumentation when discussing architectural decisions and increased the use of debiasing techniques taught during the workshop. This led to the successful reduction of the researched biases' occurrences. In particular, anchoring and optimism bias occurrences decreased significantly. We also found that practitioners were more susceptible to cognitive biases than students, so the workshop had a more substantial impact on practitioners. We assume that the practitioners' attachment to their systems may be the cause of their susceptibility to biases. Finally, we identified factors that may reduce the effectiveness of the debiasing workshop. On that basis, we prepared a set of teaching suggestions for educators. Overall, we recommend using this workshop to educate both students and experienced practitioners about the typical harmful influences of cognitive bias on architectural decisions and how to avoid them.
Submitted 6 February, 2025;
originally announced February 2025.
-
The TechDebt Game -- Enabling Discussions about Technical Debt
Authors:
Marion Wiese,
Angelina Heinrichs,
Nino Rusieshvili,
Rodrigo Rebouças de Almeida,
Klara Borowa
Abstract:
Context. Technical Debt (TD), defined as software constructs that are beneficial in the short term but may hinder future change, is a frequently used term in software development practice. Nevertheless, practitioners do not always fully understand its definition and, in particular, conceptual model. Previous research highlights that communication about TD is challenging, especially with non-technical stakeholders. Discussions on this topic often cause conflicts due to misunderstandings related to other stakeholders' perspectives. Goal. We designed a board game to emulate TD concepts to make them tangible to all stakeholders, including non-technical ones. The game aims to encourage discussions about TD in an emulated and safe environment, thereby avoiding real-life conflicts. Method. To evaluate the game's effectiveness, we surveyed 46 practitioners from diverse domains, positions, and experience levels who played the game in 13 sessions following extensive testing during its development. In addition to the players' general feedback, we examined situations where players recognized new insights about TD or connected game scenarios to real-life experiences. Results. Overall, the feedback on the game and its enjoyment factor were highly positive. While developers and software architects often connected game situations to their real-world experiences, non-technical stakeholders, such as scrum masters, product owners, and less experienced developers, encountered multiple new insights on TD. Numerous players have shifted their attitudes toward TD and have outlined a plan to modify their behavior regarding TD management. Conclusions. Although the game may not lead to long-term behavior change among stakeholders, participants' feedback provides evidence that it might serve as a valuable starting point for team discussions on technical debt management.
Submitted 4 February, 2025;
originally announced February 2025.
-
THÖR-MAGNI Act: Actions for Human Motion Modeling in Robot-Shared Industrial Spaces
Authors:
Tiago Rodrigues de Almeida,
Tim Schreiter,
Andrey Rudenko,
Luigi Palmieri,
Johannes A. Stork,
Achim J. Lilienthal
Abstract:
Accurate human activity and trajectory prediction are crucial for ensuring safe and reliable human-robot interactions in dynamic environments, such as industrial settings, with mobile robots. Datasets with fine-grained action labels for moving people in industrial environments with mobile robots are scarce, as most existing datasets focus on social navigation in public spaces. This paper introduces the THÖR-MAGNI Act dataset, a substantial extension of the THÖR-MAGNI dataset, which captures participant movements alongside robots in diverse semantic and spatial contexts. THÖR-MAGNI Act provides 8.3 hours of manually labeled participant actions derived from egocentric videos recorded via eye-tracking glasses. These actions, aligned with the provided THÖR-MAGNI motion cues, follow a long-tailed distribution with diversified acceleration, velocity, and navigation distance profiles. We demonstrate the utility of THÖR-MAGNI Act for two tasks: action-conditioned trajectory prediction and joint action and trajectory prediction. We propose two efficient transformer-based models that outperform the baselines to address these tasks. These results underscore the potential of THÖR-MAGNI Act to develop predictive models for enhanced human-robot interaction in complex environments.
Submitted 23 December, 2024; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Using Deep Neural Networks to Quantify Parking Dwell Time
Authors:
Marcelo Eduardo Marques Ribas,
Heloisa Benedet Mendes,
Luiz Eduardo Soares de Oliveira,
Luiz Antonio Zanlorensi,
Paulo Ricardo Lisboa de Almeida
Abstract:
In smart cities, it is common practice to define a maximum length of stay for a given parking space to increase the space's turnover and discourage the usage of individual transportation solutions. However, automatically determining individual car dwell times from images faces challenges, such as images collected from low-resolution cameras, lighting variations, and weather effects. In this work, we propose a method that combines two deep neural networks to compute the dwell time of each car in a parking lot. The proposed method first classifies the parking space status as occupied or empty using a deep classification network. Then, it uses a Siamese network to check whether the parked car is the same as the one in the previous image. Using an experimental protocol that focuses on a cross-dataset scenario, we show that if a perfect classifier is used, the proposed system generates 75% of perfect dwell time predictions, where the predicted value matches exactly the time the car stayed parked. Nevertheless, our experiments show a drop in prediction quality when a real-world classifier is used to predict the parking space statuses, reaching 49% of perfect predictions. This shows that the proposed Siamese network is promising but is affected by the quality of the classifier used at the beginning of the pipeline.
Submitted 31 October, 2024;
originally announced November 2024.
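The two-stage pipeline in the abstract above (occupancy status first, then a same-car check against the previous image) can be sketched as simple bookkeeping over a time-ordered sequence. This is an illustration under assumptions: the `frames` layout, the `same_car` callback standing in for the Siamese network, and the definition of a stay as the span from first to last sighting are all hypothetical and not taken from the paper.

```python
def dwell_intervals(frames, same_car):
    """Return (first_seen, last_seen) for each parked car in one parking space.

    frames: time-ordered list of (timestamp, occupied, image); `occupied` plays
            the role of the classification network's output.
    same_car(prev_image, image) -> bool plays the role of the Siamese network.
    """
    stays = []
    start = last_seen = prev_image = None
    for timestamp, occupied, image in frames:
        if occupied:
            if start is None:
                start = timestamp                 # a car arrives in an empty space
            elif not same_car(prev_image, image):
                stays.append((start, last_seen))  # a different car took the space
                start = timestamp
            prev_image = image
            last_seen = timestamp
        else:
            if start is not None:
                stays.append((start, last_seen))  # the space became empty
                start = None
            prev_image = None
    if start is not None:
        stays.append((start, last_seen))          # car still parked at the end
    return stays

# Toy sequence: string labels stand in for images, equality for the Siamese check.
frames = [
    (0, True, "A"), (5, True, "A"), (10, True, "B"),
    (15, False, None), (20, True, "B"), (25, True, "B"),
]
stays = dwell_intervals(frames, lambda a, b: a == b)
dwell_minutes = [end - start for start, end in stays]
print(stays, dwell_minutes)
```

The sketch also makes the dependency noted in the abstract visible: a wrong `occupied` flag splits or merges stays before the same-car check is ever consulted.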
-
Statistical Validation of Column Matching in the Database Schema Evolution of the Brazilian Public School Census
Authors:
Muriki G. Yamanaka,
Diogo H. de Almeida,
Paulo R. Lisboa de Almeida,
Simone Dominico,
Leticia M. Peres,
Marcos S. Sunye,
Eduardo C. de Almeida
Abstract:
Publicly available datasets are subject to new versions, with each new version potentially reflecting changes to the data. These changes may involve adding or removing attributes, changing data types, and modifying values or their semantics. Integrating these datasets into a database poses a significant challenge: how to keep track of the evolving database schema while incorporating different versions of the data sources? This paper presents a statistical methodology to validate the integration of 12 years of open-access datasets from Brazil's School Census, with a new version of the datasets released annually by the Brazilian Ministry of Education (MEC). We employ various statistical tests to find matching attributes between datasets from a specific year and their potential equivalents in datasets from later years. The results show that by using the Kolmogorov-Smirnov test we can successfully match columns from different dataset versions in about 90% of cases.
Submitted 13 July, 2024;
originally announced July 2024.
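As a rough illustration of matching columns across dataset versions with the Kolmogorov-Smirnov test, the sketch below computes the two-sample KS statistic (the largest gap between two empirical distribution functions) and pairs each old column with the new column that minimises it. The nearest-match rule, the column names, and the toy values are assumptions made for this example; the paper's actual decision procedure is not described in the abstract.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed values."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for value in set(a_sorted) | set(b_sorted):
        f_a = bisect_right(a_sorted, value) / len(a_sorted)  # ECDF of a at value
        f_b = bisect_right(b_sorted, value) / len(b_sorted)  # ECDF of b at value
        d = max(d, abs(f_a - f_b))
    return d

def match_columns(old, new):
    """Pair each old column with the new column whose distribution is closest."""
    return {
        old_name: min(new, key=lambda new_name: ks_statistic(old_values, new[new_name]))
        for old_name, old_values in old.items()
    }

# Hypothetical columns from two yearly releases of the same dataset.
old_year = {"age": [10, 11, 12, 13, 14], "enrolled": [0, 1, 1, 0, 1]}
new_year = {"student_age": [10, 12, 12, 13, 15], "is_enrolled": [1, 0, 1, 1, 0]}
matches = match_columns(old_year, new_year)
print(matches)
```

In practice one would likely also compare the statistic against a significance threshold so that columns with no plausible counterpart stay unmatched.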
-
A Coalgebraic Semantics for Intuitionistic Modal Logic
Authors:
Rodrigo Nicolau Almeida,
Nick Bezhanishvili
Abstract:
We give a new coalgebraic semantics for intuitionistic modal logic with $\Box$. In particular, we provide a coalgebraic representation of intuitionistic descriptive modal frames and of intuitionistic modal Kripke frames based on image-finite posets. This gives a solution to a problem in the area of coalgebraic logic for these classes of frames, raised explicitly by Litak (2014) and de Groot and Pattinson (2020). Our key technical tool is a recent generalization of a construction by Ghilardi, in the form of a right adjoint to the inclusion of the category of Esakia spaces in the category of Priestley spaces. As an application of these results, we study bisimulations of intuitionistic modal frames, describe dual spaces of free modal Heyting algebras, and provide a path towards a theory of coalgebraic intuitionistic logics.
Submitted 15 June, 2024;
originally announced June 2024.
-
Unification with Simple Variable Restrictions and Admissibility of $Π_{2}$-rules
Authors:
Rodrigo Nicolau Almeida,
Silvio Ghilardi
Abstract:
We develop a method to recognize admissibility of $Π_{2}$-rules, relating this problem to a specific instance of the unification problem with linear constants restriction, called here "unification with simple variable restriction". It is shown that for logical systems enjoying an appropriate algebraic semantics and a finite approximation of left uniform interpolation, this unification with simple variable restriction can be reduced to standard unification. As a corollary, we obtain the decidability of admissibility of $Π_{2}$-rules for many logical systems.
Submitted 5 June, 2024;
originally announced June 2024.
-
A Legal Framework for Natural Language Processing Model Training in Portugal
Authors:
Rúben Almeida,
Evelin Amorim
Abstract:
Recent advances in deep learning have promoted the advent of many computational systems capable of performing intelligent actions that, until then, were restricted to the human intellect. In the particular case of human languages, these advances allowed the introduction of applications like ChatGPT that are capable of generating coherent text without being explicitly programmed to do so. Instead, these models use large volumes of textual data to learn meaningful representations of human languages. Associated with these advances, concerns about copyright and data privacy infringements caused by these applications have emerged. Despite these concerns, the pace at which new natural language processing applications continued to be developed far outstripped the introduction of new regulations. Today, communication barriers between legal experts and computer scientists lead to many unintentional legal infringements during the development of such applications. In this paper, a multidisciplinary team intends to bridge this communication gap and promote more compliant Portuguese NLP research by presenting a series of everyday NLP use cases, while highlighting the Portuguese legislation that may apply during their development.
Submitted 1 May, 2024;
originally announced May 2024.
-
THÖR-MAGNI: A Large-scale Indoor Motion Capture Recording of Human Movement and Robot Interaction
Authors:
Tim Schreiter,
Tiago Rodrigues de Almeida,
Yufei Zhu,
Eduardo Gutierrez Maestro,
Lucas Morillo-Mendez,
Andrey Rudenko,
Luigi Palmieri,
Tomasz P. Kucner,
Martin Magnusson,
Achim J. Lilienthal
Abstract:
We present a new large dataset of indoor human and robot navigation and interaction, called THÖR-MAGNI, that is designed to facilitate research on social navigation: e.g., modelling and predicting human motion, analyzing goal-oriented interactions between humans and robots, and investigating visual attention in a social interaction context. THÖR-MAGNI was created to fill a gap in available datasets for human motion analysis and HRI. This gap is characterized by a lack of comprehensive inclusion of exogenous factors and essential target agent cues, which hinders the development of robust models capable of capturing the relationship between contextual cues and human behavior in different scenarios. Unlike existing datasets, THÖR-MAGNI includes a broader set of contextual features and offers multiple scenario variations to facilitate factor isolation. The dataset includes many social human-human and human-robot interaction scenarios, rich context annotations, and multi-modal data, such as walking trajectories, gaze tracking data, and lidar and camera streams recorded from a mobile robot. We also provide a set of tools for visualization and processing of the recorded data. THÖR-MAGNI is, to the best of our knowledge, unique in the amount and diversity of sensor data collected in a contextualized and socially dynamic environment, capturing natural human-robot interactions.
Submitted 14 March, 2024;
originally announced March 2024.
-
Indexing Portuguese NLP Resources with PT-Pump-Up
Authors:
Rúben Almeida,
Ricardo Campos,
Alípio Jorge,
Sérgio Nunes
Abstract:
The recent advances in natural language processing (NLP) are linked to training processes that require vast amounts of corpora. Access to this data is commonly not a trivial process due to resource dispersion and the need to maintain these infrastructures online and up-to-date. New developments in NLP are often compromised due to the scarcity of data or lack of a shared repository that works as an entry point to the community. This is especially true in low and mid-resource languages, such as Portuguese, which lack data and proper resource management infrastructures. In this work, we propose PT-Pump-Up, a set of tools that aim to reduce resource dispersion and improve the accessibility to Portuguese NLP resources. Our proposal is divided into four software components: a) a web platform to list the available resources; b) a client-side Python package to simplify the loading of Portuguese NLP resources; c) an administrative Python package to manage the platform and d) a public GitHub repository to foster future collaboration and contributions. All four components are accessible using: https://linktr.ee/pt_pump_up
Submitted 27 January, 2024;
originally announced January 2024.
-
Physio: An LLM-Based Physiotherapy Advisor
Authors:
Rúben Almeida,
Hugo Sousa,
Luís F. Cunha,
Nuno Guimarães,
Ricardo Campos,
Alípio Jorge
Abstract:
The capabilities of the most recent language models have increased the interest in integrating them into real-world applications. However, the fact that these models generate plausible, yet incorrect text poses a constraint when considering their use in several domains. Healthcare is a prime example of a domain where text-generative trustworthiness is a hard requirement to safeguard patient well-being. In this paper, we present Physio, a chat-based application for physical rehabilitation. Physio is capable of making an initial diagnosis while citing reliable health sources to support the information provided. Furthermore, drawing upon external knowledge databases, Physio can recommend rehabilitation exercises and over-the-counter medication for symptom relief. By combining these features, Physio can leverage the power of generative models for language processing while also conditioning its response on dependable and verifiable sources. A live demo of Physio is available at https://physio.inesctec.pt.
Submitted 3 January, 2024;
originally announced January 2024.
-
$Π_{2}$-Rule Systems and Inductive Classes of Gödel Algebras
Authors:
Rodrigo Nicolau Almeida
Abstract:
In this paper we present a general theory of $Π_{2}$-rules for systems of intuitionistic and modal logic. We introduce the notions of $Π_{2}$-rule system and of an Inductive Class, and provide model-theoretic and algebraic completeness theorems, which serve as our basic tools. As an illustration of the general theory, we analyse the structure of inductive classes of Gödel algebras, from a structure theoretic and logical point of view. We show that unlike other well-studied settings (such as logics, or single-conclusion rule systems), there are continuum many $Π_{2}$-rule systems extending $\mathsf{LC}=\mathsf{IPC}+(p\rightarrow q)\vee (q\rightarrow p)$, and show how our methods allow easy proofs of the admissibility of the well-known Takeuti-Titani rule. Our final results concern general questions of admissibility in $\mathsf{LC}$: (1) we present a full classification of those inductive classes which are inductively complete, i.e., where all $Π_{2}$-rules which are admissible are derivable, and (2) show that the problem of admissibility of $Π_{2}$-rules over $\mathsf{LC}$ is decidable.
Submitted 13 November, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Deep Single Models vs. Ensembles: Insights for a Fast Deployment of Parking Monitoring Systems
Authors:
Andre Gustavo Hochuli,
Jean Paul Barddal,
Gillian Cezar Palhano,
Leonardo Matheus Mendes,
Paulo Ricardo Lisboa de Almeida
Abstract:
Searching for available parking spots in high-density urban centers is a stressful task for drivers that can be mitigated by systems that know in advance the nearest parking space available. To this end, image-based systems offer cost advantages over other sensor-based alternatives (e.g., ultrasonic sensors), requiring less physical infrastructure for installation and maintenance. Despite recent deep learning advances, deploying intelligent parking monitoring is still a challenge since most approaches involve collecting and labeling large amounts of data, which is laborious and time-consuming. Our study aims to uncover the challenges in creating a global framework, trained using publicly available labeled parking lot images, that performs accurately across diverse scenarios, enabling parking space monitoring as a ready-to-use system that can be deployed in a new environment. Through exhaustive experiments involving different datasets and deep learning architectures, including fusion strategies and ensemble methods, we found that models trained on diverse datasets can achieve 95% accuracy without the burden of data annotation and model training on the target parking lot.
Submitted 28 September, 2023;
originally announced September 2023.
-
Likely, Light, and Accurate Context-Free Clusters-based Trajectory Prediction
Authors:
Tiago Rodrigues de Almeida,
Oscar Martinez Mozos
Abstract:
Autonomous systems in the road transportation network require intelligent mechanisms that cope with uncertainty to foresee the future. In this paper, we propose a multi-stage probabilistic approach for trajectory forecasting: trajectory transformation to displacement space, clustering of displacement time series, trajectory proposals, and ranking proposals. We introduce a new deep feature clustering method, underlying self-conditioned GAN, which copes better with distribution shifts than traditional methods. Additionally, we propose novel distance-based ranking proposals to assign probabilities to the generated trajectories that are more efficient yet accurate than an auxiliary neural network. The overall system surpasses context-free deep generative models in human and road agents trajectory data while performing similarly to point estimators when comparing the most probable trajectory.
Submitted 27 July, 2023;
originally announced July 2023.
-
Distance Functions and Normalization Under Stream Scenarios
Authors:
Eduardo V. L. Barboza,
Paulo R. Lisboa de Almeida,
Alceu de Souza Britto Jr,
Rafael M. O. Cruz
Abstract:
Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their minimum/maximum values, and these properties may change over time. We compare the accuracies generated by eight well-known distance functions in data streams without normalization, normalized considering the statistics of the first batch of data received, and considering the previous batch received. We argue that experimental protocols for streams that consider the full stream as normalized are unrealistic and can lead to biased and poor results. Our results indicate that using the original data stream without applying normalization, and the Canberra distance, can be a good combination when no information about the data stream is known beforehand.
Submitted 4 July, 2023; v1 submitted 30 June, 2023;
originally announced July 2023.
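For readers unfamiliar with the ingredients named in the abstract above, here is a small sketch of the Canberra distance and of min-max normalization fitted on the first batch only. The helper names and the toy batch are hypothetical, and treating a 0/0 term as zero is a common convention assumed here rather than something stated in the abstract.

```python
def canberra(x, y):
    """Canberra distance: sum of |x_i - y_i| / (|x_i| + |y_i|), skipping 0/0 terms."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total

def fit_min_max(batch):
    """Per-feature (min, max) computed from a single batch of instances."""
    return [(min(column), max(column)) for column in zip(*batch)]

def apply_min_max(stats, x):
    """Scale x with previously fitted statistics; constant features map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(x, stats)]

first_batch = [[0.0, 10.0], [4.0, 30.0]]
stats = fit_min_max(first_batch)

# A later instance whose second feature drifted beyond the first batch's range.
later = [2.0, 40.0]
scaled = apply_min_max(stats, later)
print(scaled, canberra([1.0, 2.0], [3.0, 2.0]))
```

The scaled value above 1 for the drifted feature shows why statistics frozen on the first batch can misrepresent later data, whereas each Canberra term is already bounded by 1 without any normalization.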
-
Vehicle Occurrence-based Parking Space Detection
Authors:
Paulo R. Lisboa de Almeida,
Jeovane Honório Alves,
Luiz S. Oliveira,
Andre Gustavo Hochuli,
João V. Fröhlich,
Rodrigo A. Krauel
Abstract:
Smart-parking solutions use sensors, cameras, and data analysis to improve parking efficiency and reduce traffic congestion. Computer vision-based methods have been used extensively in recent years to tackle the problem of parking lot management, but most of the works assume that the parking spots are manually labeled, impacting the cost and feasibility of deployment. To fill this gap, this work p…
▽ More
Smart-parking solutions use sensors, cameras, and data analysis to improve parking efficiency and reduce traffic congestion. Computer vision-based methods have been used extensively in recent years to tackle the problem of parking lot management, but most of the works assume that the parking spots are manually labeled, impacting the cost and feasibility of deployment. To fill this gap, this work presents an automatic parking space detection method, which receives a sequence of images of a parking lot and returns a list of coordinates identifying the detected parking spaces. The proposed method employs instance segmentation to identify cars and, using vehicle occurrence, generate a heat map of parking spaces. The results using twelve different subsets from the PKLot and CNRPark-EXT parking lot datasets show that the method achieved an AP25 score up to 95.60\% and AP50 score up to 79.90\%.
Submitted 16 June, 2023;
originally announced June 2023.
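The occurrence idea in the entry above can be sketched roughly as follows. This is a simplified illustration, not the paper's method: it assumes vehicle detections are already available as axis-aligned boxes `(x0, y0, x1, y1)` per frame, whereas the paper uses instance segmentation. The function name `occurrence_heat_map` and the threshold of 2 are hypothetical.

```python
import numpy as np


def occurrence_heat_map(frames_boxes, height, width):
    """Count, per pixel, how many detections covered it across all frames."""
    heat = np.zeros((height, width), dtype=np.int32)
    for boxes in frames_boxes:
        for x0, y0, x1, y1 in boxes:
            # Boxes are half-open: columns x0..x1-1 and rows y0..y1-1.
            heat[y0:y1, x0:x1] += 1
    return heat


# Three toy frames of a 4x6 image: one region is occupied in every frame,
# another only once (for example, a car passing through).
frames = [
    [(0, 0, 2, 2)],
    [(0, 0, 2, 2), (3, 0, 5, 2)],
    [(0, 0, 2, 2)],
]
heat = occurrence_heat_map(frames, height=4, width=6)

# Hypothetical threshold: keep pixels covered in at least two frames.
candidate_mask = heat >= 2
```

Regions that are covered repeatedly survive the threshold, while one-off detections drop out. Turning the mask into a list of parking-space coordinates would need a further step that is not shown here.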
-
UAS in the Airspace: A Review on Integration, Simulation, Optimization, and Open Challenges
Authors:
Euclides Carlos Pinto Neto,
Derick Moreira Baum,
Jorge Rady de Almeida Jr.,
Joao Batista Camargo Jr.,
Paulo Sergio Cugnasca
Abstract:
Air transportation is essential for society, and it is increasing gradually due to its importance. To improve the airspace operation, new technologies are under development, such as Unmanned Aircraft Systems (UAS). In fact, in the past few years, there has been a growth in UAS numbers in segregated airspace. However, there is an interest in integrating these aircraft into the National Airspace System (NAS). The UAS is vital to different industries due to its advantages brought to the airspace (e.g., efficiency). Conversely, the relationship between UAS and Air Traffic Control (ATC) needs to be well-defined due to the impacts on ATC capacity these aircraft may present. Throughout the years, this impact may be lower than it is nowadays because the current lack of familiarity in this relationship contributes to higher workload levels. Thereupon, the primary goal of this research is to present a comprehensive review of the advancements in the integration of UAS in the National Airspace System (NAS) from different perspectives. We consider the challenges regarding simulation, final approach, and optimization of problems related to the interoperability of such systems in the airspace. Finally, we identify several open challenges in the field based on the existing state-of-the-art proposals.
Submitted 24 November, 2022;
originally announced November 2022.
-
The Magni Human Motion Dataset: Accurate, Complex, Multi-Modal, Natural, Semantically-Rich and Contextualized
Authors:
Tim Schreiter,
Tiago Rodrigues de Almeida,
Yufei Zhu,
Eduardo Gutierrez Maestro,
Lucas Morillo-Mendez,
Andrey Rudenko,
Tomasz P. Kucner,
Oscar Martinez Mozos,
Martin Magnusson,
Luigi Palmieri,
Kai O. Arras,
Achim J. Lilienthal
Abstract:
Rapid development of social robots stimulates active research in human motion modeling, interpretation and prediction, proactive collision avoidance, human-robot interaction and co-habitation in shared spaces. Modern approaches to this end require high quality datasets for training and evaluation. However, the majority of available datasets suffer from either inaccurate tracking data or unnatural, scripted behavior of the tracked people. This paper attempts to fill this gap by providing high quality tracking information from motion capture, eye-gaze trackers and on-board robot sensors in a semantically-rich environment. To induce natural behavior of the recorded participants, we utilise loosely scripted task assignment, which induces the participants to navigate through the dynamic laboratory environment in a natural and purposeful way. The motion dataset presented in this paper sets a high quality standard, as the realistic and accurate data is enhanced with semantic information, enabling development of new algorithms which rely not only on the tracking information but also on contextual cues of the moving agents and of the static and dynamic environment.
Submitted 31 August, 2022;
originally announced August 2022.
-
Evaluation of Different Annotation Strategies for Deployment of Parking Spaces Classification Systems
Authors:
Andre G. Hochuli,
Alceu S. Britto Jr.,
Paulo R. L. de Almeida,
Williams B. S. Alves,
Fabio M. C. Cagni
Abstract:
When using vision-based approaches to classify individual parking spaces between occupied and empty, human experts often need to annotate the locations and label a training set containing images collected in the target parking lot to fine-tune the system. We propose investigating three annotation types (polygons, bounding boxes, and fixed-size squares), providing different data representations of the parking spaces. The rationale is to elucidate the best trade-off between handcraft annotation precision and model performance. We also investigate the number of annotated parking spaces necessary to fine-tune a pre-trained model in the target parking lot. Experiments using the PKLot dataset show that it is possible to fine-tune a model to the target parking lot with less than 1,000 labeled samples, using low precision annotations such as fixed-size squares.
Submitted 22 July, 2022;
originally announced July 2022.
-
A Report on Achieving Complete Regular-Expression Matching using Mealy Machines
Authors:
Ricardo Almeida
Abstract:
While regexp matching is a powerful mechanism for finding patterns in data streams, regexp engines in general only find matches that do not overlap. Moreover, different forms of nondeterministic exploration, where symbols read are processed more than once, are often used, which can be costly in real-time matching. We present an algorithm that constructs from any regexp a Mealy machine that finds all matches while reading each input symbol only once. The computed machine can also detect and distinguish different patterns or sub-patterns inside patterns. Additionally, we show how to compute a minimal Mealy machine via a variation of DFA minimization, by formalizing Mealy machines in terms of regular languages.
Submitted 10 June, 2022;
originally announced June 2022.
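The non-overlap behaviour mentioned in the entry above is easy to see with Python's standard `re` module. The snippet below is only a demonstration of the problem, not the Mealy-machine construction from the report. The lookahead workaround is a common engine-level trick, and it gives no guarantee that each input symbol is read only once.

```python
import re

text = "ababa"
pattern = "aba"

# Default scanning resumes after the end of each match, so the occurrence
# starting at index 2 (which overlaps the first one) is never reported.
non_overlapping = [m.start() for m in re.finditer(pattern, text)]

# Wrapping the pattern in a zero-width lookahead lets the engine test every
# start position, at the cost of re-examining symbols.
overlapping = [m.start() for m in re.finditer(f"(?=({pattern}))", text)]
```

Here `non_overlapping` contains one start index while `overlapping` contains two. That gap is what a single-pass machine reporting all matches is meant to close.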
-
A Systematic Review on Computer Vision-Based Parking Lot Management Applied on Public Datasets
Authors:
Paulo Ricardo Lisboa de Almeida,
Jeovane Honório Alves,
Rafael Stubs Parpinelli,
Jean Paul Barddal
Abstract:
Computer vision-based parking lot management methods have been extensively researched owing to their flexibility and cost-effectiveness. To evaluate such methods, authors often employ publicly available parking lot image datasets. In this study, we surveyed and compared robust publicly available image datasets specifically crafted to test computer vision-based methods for parking lot management, and consequently present a systematic and comprehensive review of existing works that employ such datasets. The literature review identified relevant gaps that require further research, such as the need for dataset-independent approaches and for methods suitable for autonomous detection of the position of parking spaces. In addition, we noticed that several important factors, such as the presence of the same cars across consecutive images, have been neglected in most studies, thereby rendering assessment protocols unrealistic. Furthermore, the analysis of the datasets revealed that certain features that should be present when developing new benchmarks, such as the availability of video sequences and images taken in more diverse conditions, including nighttime and snow, have not been incorporated.
Submitted 12 March, 2022;
originally announced March 2022.
-
What's behind tight deadlines? Business causes of technical debt
Authors:
Rodrigo Rebouças de Almeida,
Christoph Treude,
Uirá Kulesza
Abstract:
What are the business causes behind tight deadlines? What drives the prioritization of features that pushes quality matters to the back burner? We conducted a survey with 71 experienced practitioners and did a thematic analysis of the open-ended answers to the question: "Could you give examples of how business may contribute to technical debt?" Business-related causes were organized into two categories: pure-business and business/IT gap, and they were related to 'tight deadlines' and 'features over quality', the most frequently cited management reasons for technical debt. We contribute a cause-effect model which relates the various business causes of tight deadlines and the behavior of prioritizing features over quality aspects.
Submitted 19 March, 2023; v1 submitted 19 April, 2021;
originally announced April 2021.
-
A new interpretable unsupervised anomaly detection method based on residual explanation
Authors:
David F. N. Oliveira,
Lucio F. Vismari,
Alexandre M. Nascimento,
Jorge R. de Almeida Jr,
Paulo S. Cugnasca,
Joao B. Camargo Jr,
Leandro Almeida,
Rafael Gripp,
Marcelo Neves
Abstract:
Despite the superior performance in modeling complex patterns to address challenging problems, the black-box nature of Deep Learning (DL) methods imposes limitations on their application in real-world critical domains. The lack of a smooth way of enabling human reasoning about the black-box decisions hinders any preventive action against unexpected events, which may lead to catastrophic consequences. To tackle the opacity of black-box models, interpretability became a fundamental requirement in DL-based systems, leveraging trust and knowledge by providing ways to understand the model's behavior. Although this is a current hot topic, further advances are still needed to overcome the limitations of the current interpretability methods in unsupervised DL-based models for Anomaly Detection (AD). Autoencoders (AE) are the core of unsupervised DL-based AD applications, achieving best-in-class performance. However, due to their hybrid way of obtaining results (requiring additional calculations outside the network), only agnostic interpretable methods can be applied to AE-based AD. These agnostic methods are computationally expensive when processing a large number of parameters. In this paper we present RXP (Residual eXPlainer), a new interpretability method that deals with the limitations for AE-based AD in large-scale systems. It stands out for its implementation simplicity, low computational cost and deterministic behavior, in which explanations are obtained through the deviation analysis of reconstructed input features. In an experiment using data from a real heavy-haul railway line, the proposed method achieved superior performance compared to SHAP, demonstrating its potential to support decision making in large-scale critical systems.
Submitted 14 March, 2021;
originally announced March 2021.
-
Business-Driven Technical Debt Prioritization: An Industrial Case Study
Authors:
Rodrigo Rebouças de Almeida,
Rafael do Nascimento Ribeiro,
Christoph Treude,
Uirá Kulesza
Abstract:
Incorporating the business perspective into prioritizing technical debt is essential to contribute to decision making in industry. In this paper, we evolve and evaluate a business-driven approach for technical debt prioritization. The approach was evaluated during a five-month industrial case study with business and technical stakeholders' active participation. The results show that the approach contributed to aligning business criteria between the business and technical stakeholders. We also observed a downward trend in the amount of technical debt that affects high-value business assets. Moreover, we identified eight business factors that affect the decision making related to the prioritization of technical debt. The study results suggest that the proposed business-driven technical debt prioritization approach can help teams to focus their efforts on paying off the business' most relevant debt.
Submitted 21 March, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Applying Lie Groups Approaches for Rigid Registration of Point Clouds
Authors:
Liliane Rodrigues de Almeida,
Gilson A. Giraldi,
Marcelo Bernardes Vieira
Abstract:
In the last decades, some literature appeared using Lie group theory to solve problems in computer vision. On the other hand, Lie algebraic representations of the transformations therein were introduced to overcome the difficulties behind group structure by mapping the transformation groups to linear spaces. In this paper we focus on the application of Lie groups and Lie algebras to find the rigid transformation that best registers two surfaces represented by point clouds. The so-called pairwise rigid registration can be formulated by comparing intrinsic second-order orientation tensors that encode local geometry. These tensors can be (locally) represented by symmetric non-negative definite matrices. In this paper we interpret the obtained tensor field as a multivariate normal model. So, we start with the fact that the space of Gaussians can be equipped with a Lie group structure that is isomorphic to a subgroup of the upper triangular matrices. Consequently, the associated Lie algebra structure enables us to handle Gaussians, and therefore to compare orientation tensors, with Euclidean operations. We apply this methodology to variants of the Iterative Closest Point (ICP), a known technique for pairwise registration. We compare the obtained results with the original implementations that apply the comparative tensor shape factor (CTSF), which is a similarity notion based on the eigenvalues of the orientation tensors. We notice that the similarity measure in tensor spaces directly derived from Lie's approach is not invariant under rotations, which is a problem in terms of rigid registration. Despite this, the computational experiments performed show promising results when embedding orientation tensor fields in Lie algebras.
Submitted 23 June, 2020;
originally announced June 2020.
-
Super-resolution of multispectral satellite images using convolutional neural networks
Authors:
M. U. Müller,
N. Ekhtiari,
R. M. Almeida,
C. Rieke
Abstract:
Super-resolution aims at increasing image resolution by algorithmic means and has progressed over the recent years due to advances in the fields of computer vision and deep learning. Convolutional Neural Networks based on a variety of architectures have been applied to the problem, e.g. autoencoders and residual networks. While most research focuses on the processing of photographs consisting only of RGB color channels, little work can be found concentrating on multi-band, analytic satellite imagery. Satellite images often include a panchromatic band, which has higher spatial resolution but lower spectral resolution than the other bands. In the field of remote sensing, there is a long tradition of applying pan-sharpening to satellite images, i.e. bringing the multispectral bands to the higher spatial resolution by merging them with the panchromatic band. To our knowledge there are so far no approaches to super-resolution which take advantage of the panchromatic band. In this paper we propose a method to train state-of-the-art CNNs using pairs of lower-resolution multispectral and high-resolution pan-sharpened image tiles in order to create super-resolved analytic images. The derived quality metrics show that the method improves information content of the processed images. We compare the results created by four CNN architectures, with RedNet30 performing best.
Submitted 8 April, 2020; v1 submitted 3 February, 2020;
originally announced February 2020.
-
Business-Driven Technical Debt Prioritization
Authors:
Rodrigo Rebouças de Almeida
Abstract:
Technical debt happens when teams take shortcuts on software development to gain short-term benefits at the cost of making future changes more expensive. Previous results show that there is a misalignment between the prioritization done by technical professionals and the prioritization expected by business ones. This paper presents a business-driven approach to prioritize technical debt items. The research is organized into four phases: exploratory, to identify the research focus; concept verification, where the proposed approach was evaluated on a multi-case study; solution, where a design science research was conducted to develop Tracy, a framework for technical debt prioritization; and validation. Results so far show that the business-driven prioritization of technical debt items can improve the alignment and communication between the technical and business stakeholders.
Submitted 31 July, 2019;
originally announced August 2019.
-
Tracy: A Business-driven Technical Debt Prioritization Framework
Authors:
Rodrigo Rebouças de Almeida,
Christoph Treude,
Uirá Kulesza
Abstract:
Technical debt is a pervasive problem in software development. Software development teams have to prioritize debt items and determine whether they should address debt or develop new features at any point in time. This paper presents "Tracy", a framework for the prioritization of technical debt using a business-driven approach built on top of business processes. The current stage of the proposed framework is at the beginning of the third phase of Design Science Research, which is usually divided into the phases of exploration, engineering, and evaluation. The exploration and engineering phases involved the participation of 49 professionals from 12 different groups of three companies. The initial evaluation shows that the presented framework is coherent in its structure and that its results contribute to business-driven decision making on technical debt prioritization.
Submitted 31 July, 2019;
originally announced August 2019.
-
A Systematic Literature Review about the impact of Artificial Intelligence on Autonomous Vehicle Safety
Authors:
A. M. Nascimento,
L. F. Vismari,
C. B. S. T. Molina,
P. S. Cugnasca,
J. B. Camargo Jr.,
J. R. de Almeida Jr.,
R. Inam,
E. Fersman,
M. V. Marquezini,
A. Y. Hata
Abstract:
Autonomous Vehicles (AV) are expected to bring considerable benefits to society, such as traffic optimization and accident reduction. They rely heavily on advances in many Artificial Intelligence (AI) approaches and techniques. However, while some researchers in this field believe AI is the core element to enhance safety, others believe AI imposes new challenges to assure the safety of these new AI-based systems and applications. In this non-convergent context, this paper presents a systematic literature review to paint a clear picture of the state of the art of the literature on AI in AV safety. Based on an initial sample of 4870 retrieved papers, 59 studies were selected as the result of the selection criteria detailed in the paper. The shortlisted studies were then mapped into six categories to answer the proposed research questions. An AV system model was proposed and applied to orient the discussions about the SLR findings. As a main result, we have reinforced our preliminary observation about the necessity of considering a serious safety agenda for future studies on AI-based AV systems.
Submitted 4 April, 2019;
originally announced April 2019.
-
Aligning Technical Debt Prioritization with Business Objectives: A Multiple-Case Study
Authors:
Rodrigo Rebouças de Almeida,
Uirá Kulesza,
Christoph Treude,
D'angellys Cavalcanti Feitosa,
Aliandro Higino Guedes Lima
Abstract:
Technical debt (TD) is a metaphor to describe the trade-off between short-term workarounds and long-term goals in software development. Despite being widely used to explain technical issues in business terms, industry and academia still lack a proper way to manage technical debt while explicitly considering business priorities. In this paper, we report on a multiple-case study of how two big software development companies handle technical debt items, and we show how taking the business perspective into account can improve the decision making for the prioritization of technical debt. We also propose a first step toward an approach that uses business process management (BPM) to manage technical debt. We interviewed a set of IT business stakeholders, and we collected and analyzed different sets of technical debt items, comparing how these items would be prioritized using a purely technical versus a business-oriented approach. We found that the use of business process management to support technical debt management makes the technical debt prioritization decision process more aligned with business expectations. We also found evidence that the business process management approach can help technical debt management achieve business objectives.
Submitted 15 July, 2018;
originally announced July 2018.
-
Reducing Nondeterministic Tree Automata by Adding Transitions
Authors:
Ricardo Manuel de Oliveira Almeida
Abstract:
We introduce saturation of nondeterministic tree automata, a technique that consists of adding new transitions to an automaton while preserving its language. We implemented our algorithm on minotaut - a module of the tree automata library libvata that reduces the size of automata by merging states and removing superfluous transitions - and we show how saturation can make subsequent merge and transition-removal operations more effective. Thus we obtain a Ptime algorithm that reduces the size of tree automata even more than before. Additionally, we explore how minotaut alone can play an important role when performing hard operations like complementation, making it possible to obtain both smaller complement automata and lower computation times. We then show how saturation can extend this contribution even further. We tested our algorithms on a large collection of automata from applications of libvata in shape analysis, and on different classes of randomly generated automata.
Submitted 15 December, 2016;
originally announced December 2016.
-
Reduction of Nondeterministic Tree Automata
Authors:
Ricardo Almeida,
Lukáš Holík,
Richard Mayr
Abstract:
We present an efficient algorithm to reduce the size of nondeterministic tree automata, while retaining their language. It is based on new transition pruning techniques, and quotienting of the state space w.r.t. suitable equivalences. It uses criteria based on combinations of downward and upward simulation preorder on trees, and the more general downward and upward language inclusions. Since tree-language inclusion is EXPTIME-complete, we describe methods to compute good approximations in polynomial time. We implemented our algorithm as a module of the well-known libvata tree automata library, and tested its performance on a given collection of tree automata from various applications of libvata in regular model checking and shape analysis, as well as on various classes of randomly generated tree automata. Our algorithm yields substantially smaller and sparser automata than all previously known reduction techniques, and it is still fast enough to handle large instances.
Submitted 5 January, 2016; v1 submitted 29 December, 2015;
originally announced December 2015.
-
Deciding KAT and Hoare Logic with Derivatives
Authors:
Ricardo Almeida,
Sabine Broda,
Nelma Moreira
Abstract:
Kleene algebra with tests (KAT) is an equational system for program verification, which is the combination of Boolean algebra (BA) and Kleene algebra (KA), the algebra of regular expressions. In particular, KAT subsumes the propositional fragment of Hoare logic (PHL) which is a formal system for the specification and verification of programs, and that is currently the base of most tools for checking program correctness. Both the equational theory of KAT and the encoding of PHL in KAT are known to be decidable. In this paper we present a new decision procedure for the equivalence of two KAT expressions based on the notion of partial derivatives. We also introduce the notion of derivative modulo particular sets of equations. With this we extend the previous procedure for deciding PHL. Some experimental results are also presented.
Submitted 8 October, 2012;
originally announced October 2012.
-
IACTalks: an on-line archive of astronomy-related seminars
Authors:
Johan H. Knapen,
Jorge A. Pérez Prieto,
Tariq Shahbaz,
Anna Ferré-Mateu,
Nicola Caon,
Cristina Ramos Almeida,
Brandon Tingley,
Valentina Luridiana,
Inés Flores-Cacho,
Orlagh Creevey,
Arturo Manchado Torres,
Ignacio Trujillo,
Maria Rosa Zapatero Osorio,
Francisco Sánchez Martínez,
Francisco López Molina,
Gabriel Pérez Díaz,
Miguel Briganti,
Inés Bonet
Abstract:
We present IACTalks, a free and open access seminars archive (http://iactalks.iac.es) aimed at promoting astronomy and the exchange of ideas by providing high-quality scientific seminars to the astronomical community. The archive of seminars and talks given at the Instituto de Astrofísica de Canarias goes back to 2008. Over 360 talks and seminars are now freely available by streaming over the internet. We describe the user interface, which includes two video streams, one showing the speaker, the other the presentation. A search function is available, and seminars are indexed by keywords and in some cases by series, such as special training courses or the 2011 Winter School of Astrophysics, on secular evolution of galaxies. The archive is made available as an open resource, to be used by scientists and the public.
Submitted 27 June, 2012;
originally announced June 2012.
-
Improving Spam Detection Based on Structural Similarity
Authors:
Luiz H. Gomes,
Fernando D. O. Castro,
Rodrigo B. Almeida,
Luis M. A. Bettencourt,
Virgilio A. F. Almeida,
Jussara M. Almeida
Abstract:
We propose a new detection algorithm that uses structural relationships between senders and recipients of email as the basis for the identification of spam messages. Users and receivers are represented as vectors in their reciprocal spaces. A measure of similarity between vectors is constructed and used to group users into clusters. Knowledge of their classification as past senders/receivers of spam or legitimate mail, coming from an auxiliary detection algorithm, is then used to label these clusters probabilistically. The measure of similarity between the sender and receiver sets of a new message and the center vector of clusters is then used to assess the possibility of that message being legitimate or spam. We show that the proposed algorithm is able to correct part of the false positives (legitimate messages classified as spam) using a testbed of a one-week SMTP log.
Submitted 5 April, 2005;
originally announced April 2005.
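As a rough illustration of the similarity step described in the abstract above, the following sketch scores a new message against labeled clusters. It is not the authors' implementation: the binary (set-based) vector representation, the choice of cosine similarity, the nearest-cluster rule, and the names `cosine_similarity` and `spam_probability` are all assumptions made for this example.

```python
from math import sqrt


def cosine_similarity(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two binary vectors, each given as a set of addresses."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def spam_probability(recipients: set[str],
                     clusters: list[tuple[set[str], float]]) -> float:
    """Return the spam probability attached to the most similar cluster.

    Each cluster is a pair (center, p_spam). The center is simplified here to a
    set of addresses, and p_spam is assumed to come from an auxiliary detector's
    past classifications (hypothetical values in the usage below).
    """
    _, p_spam = max(clusters, key=lambda c: cosine_similarity(recipients, c[0]))
    return p_spam


# Hypothetical clusters: one mostly spam, one mostly legitimate.
clusters = [({"a@x.org", "b@x.org"}, 0.9), ({"c@y.org", "d@y.org"}, 0.1)]
score = spam_probability({"a@x.org", "b@x.org", "e@z.org"}, clusters)
```

A real system would presumably combine the sender-side and receiver-side similarities and weigh several clusters rather than only the nearest one; the sketch keeps a single side to stay short.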
-
Local Community Identification through User Access Patterns
Authors:
Rodrigo B. Almeida,
Virgilio A. F. Almeida
Abstract:
Community identification algorithms have been used to enhance the quality of services as perceived by their users. Although community identification algorithms are widely used on the Web, their application to portals or specific subsets of the Web has not been much studied. In this paper, we propose a technique for local community identification that takes into account user access behavior derived from access logs of servers on the Web. The technique departs from existing community algorithms in that it shifts the focus of interest from authors to users. Our approach does not use relations imposed by authors (e.g., hyperlinks in the case of Web pages). It uses information derived from user accesses to a service in order to infer relationships. The communities identified are of great interest to content providers, since they can be used to improve the quality of their services. We also propose an evaluation methodology for analyzing the results obtained by the algorithm. We present two case studies based on actual data from two services: an online bookstore and an online radio. The case of the online radio is particularly relevant because it emphasizes the contribution of the proposed algorithm to finding communities in an environment (i.e., a streaming media service) without links, which represent the relations imposed by authors (e.g., hyperlinks in the case of Web pages).
Submitted 16 December, 2002;
originally announced December 2002.