-
Deep Inertial Pose: A deep learning approach for human pose estimation
Authors:
Sara M. Cerqueira,
Manuel Palermo,
Cristina P. Santos
Abstract:
Inertial-based motion capture systems have been attracting growing attention due to their wearability and unconstrained use. However, accurate human joint estimation demands several complex and expertise-demanding steps, which leads to expensive software such as the state-of-the-art MVN Awinda from Xsens Technologies. This work aims to study the use of Neural Networks to abstract the complex biomechanical models and analytical mathematics required for pose estimation. Thus, it presents a comparison of different Neural Network architectures and methodologies to understand how accurately these methods can estimate human pose, using both low-cost (MPU9250) and high-end (Mtw Awinda) Magnetic, Angular Rate, and Gravity (MARG) sensors. The most efficient method was the Hybrid LSTM-Madgwick detached, which achieved a Quaternion Angle distance error of 7.96, using Mtw Awinda data. Also, an ablation study was conducted to study the impact of data augmentation, output representation, window size, loss function and magnetometer data on the pose estimation error. This work indicates that Neural Networks can be trained to estimate human pose, with results comparable to the state-of-the-art fusion filters.
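For readers unfamiliar with the error metric named in the abstract, the following is a minimal sketch of one common way to define an angular distance between two orientation quaternions. It is an assumption that this matches the metric used in the paper; the function name `quaternion_angle_distance`, the (w, x, y, z) component order, and the choice of degrees are all illustrative.

```python
import math


def quaternion_angle_distance(q1, q2):
    """Smallest rotation angle, in degrees, between two orientations.

    Quaternions are given as (w, x, y, z) tuples. This is one common
    definition of quaternion angular distance; the paper's exact metric
    may differ (assumption).
    """
    # Normalize so that slightly non-unit inputs still give a valid angle.
    n1 = math.sqrt(sum(c * c for c in q1))
    n2 = math.sqrt(sum(c * c for c in q2))
    dot = sum(a * b for a, b in zip(q1, q2)) / (n1 * n2)
    # q and -q encode the same rotation, so take the absolute value,
    # and clamp to 1.0 to guard against rounding error before acos.
    dot = min(1.0, abs(dot))
    return math.degrees(2.0 * math.acos(dot))


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    h = math.sqrt(0.5)
    quarter_turn_z = (h, 0.0, 0.0, h)  # 90-degree rotation about the z axis
    print(quaternion_angle_distance(identity, quarter_turn_z))
```

Taking the absolute value of the dot product matters in practice: without it, a network that predicts `-q` instead of `q` would be penalized even though both describe the same pose.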
Submitted 7 June, 2025;
originally announced June 2025.
-
Turning to Online Forums for Legal Information: A Case Study of GDPR's Legitimate Interests
Authors:
Lin Kyi,
Cristiana Santos,
Sushil Ammanaghatta Shivakumar,
Franziska Roesner,
Asia Biega
Abstract:
Practitioners building online services and tools often turn to online forums such as Reddit, Law Stack Exchange, and Stack Overflow for legal guidance to ensure compliance with the GDPR. The legal information presented in these forums directly impacts present-day industry practitioners' decisions. Online forums can serve as gateways that, depending on the accuracy and quality of the answers provided, may either support or undermine the protection of privacy and data protection fundamental rights. However, there is a need for deeper investigation into practitioners' decision-making processes and their understanding of legal compliance.
Using GDPR's "legitimate interests" legal ground for processing personal data as a case study, we investigate how practitioners use online forums to identify common areas of confusion in applying legitimate interests in practice, and evaluate how legally sound online forum responses are. Our analysis found that applying the "legitimate interests" legal basis is complex for practitioners, with important implications for how the GDPR is implemented in practice. The legal analysis showed that crowdsourced legal information tends to be legally sound, though sometimes incomplete. We outline recommendations to improve the quality of online forums by ensuring that responses are more legally sound and comprehensive, enabling practitioners to apply legitimate interests effectively in practice and uphold the GDPR.
Submitted 2 June, 2025;
originally announced June 2025.
-
Stochastic Fractional Neural Operators: A Symmetrized Approach to Modeling Turbulence in Complex Fluid Dynamics
Authors:
Rômulo Damasclin Chaves dos Santos,
Jorge Henrique de Oliveira Sales
Abstract:
In this work, we introduce a new class of neural network operators designed to handle problems where memory effects and randomness play a central role. These operators merge symmetrized activation functions, Caputo-type fractional derivatives, and stochastic perturbations introduced via Itô-type noise. The result is a powerful framework capable of approximating functions that evolve over time with both long-term memory and uncertain dynamics. We develop the mathematical foundations of these operators, proving three key theorems of Voronovskaya type. These results describe the asymptotic behavior of the operators, their convergence in the mean-square sense, and their consistency under fractional regularity assumptions. All estimates explicitly account for the influence of the memory parameter $α$ and the noise level $σ$. As a practical application, we apply the proposed theory to the fractional Navier-Stokes equations with stochastic forcing, a model often used to describe turbulence in fluid flows with memory. Our approach provides theoretical guarantees for the approximation quality and suggests that these neural operators can serve as effective tools in the analysis and simulation of complex systems. By blending ideas from neural networks, fractional calculus, and stochastic analysis, this research opens new perspectives for modeling turbulent phenomena and other multiscale processes where memory and randomness are fundamental. The results lay the groundwork for hybrid learning-based methods with strong analytical backing.
Submitted 12 May, 2025;
originally announced May 2025.
-
"I will never pay for this" Perception of fairness and factors affecting behaviour on 'pay-or-ok' models
Authors:
Victor Morel,
Farzaneh Karegar,
Cristiana Santos
Abstract:
The rise of cookie paywalls ('pay-or-ok' models) has prompted growing debates around the right to privacy and data protection, monetisation, and the legitimacy of user consent. Despite their increasing use across sectors, limited research has explored how users perceive these models or what shapes their decisions to either consent to tracking or pay. To address this gap, we conducted four focus groups (n = 14) to examine users' perceptions of cookie paywalls, their judgments of fairness, and the conditions under which they might consider paying, alongside a legal analysis within the EU data protection legal framework.
Participants primarily viewed cookie paywalls as profit-driven, with fairness perceptions varying depending on factors such as the presence of a third option beyond consent or payment, transparency of data practices, and the authenticity or exclusivity of the paid content. Participants voiced expectations for greater transparency, meaningful control over data collection, and less coercive alternatives, such as contextual advertising or "reject all" buttons. Although some conditions, including trusted providers, exclusive content, and reasonable pricing, could make participants consider paying, most expressed reluctance or unwillingness to do so.
Crucially, our findings raise concerns about economic exclusion, where privacy and data protection might end up becoming a privilege rather than fundamental rights. Consent given under financial pressure may not meet the standard of being freely given, as required by GDPR. To address these concerns, we recommend user-centred approaches that enhance transparency, reduce coercion, ensure the value of paid content, and explore inclusive alternatives. These measures are essential for supporting fairness, meaningful choice, and user autonomy in consent-driven digital environments.
Submitted 23 May, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
Multi-level Cellular Automata for FLIM networks
Authors:
Felipe Crispim Salvagnini,
Jancarlo F. Gomes,
Cid A. N. Santos,
Silvio Jamil F. Guimarães,
Alexandre X. Falcão
Abstract:
The necessity of abundant annotated data and complex network architectures presents a significant challenge in deep-learning Salient Object Detection (deep SOD) and across the broader deep-learning landscape. This challenge is particularly acute in medical applications in developing countries with limited computational resources. Combining modern and classical techniques offers a path to maintaining competitive performance while enabling practical applications. Feature Learning from Image Markers (FLIM) methodology empowers experts to design convolutional encoders through user-drawn markers, with filters learned directly from these annotations. Recent findings demonstrate that coupling a FLIM encoder with an adaptive decoder creates a flyweight network suitable for SOD, requiring significantly fewer parameters than lightweight models and eliminating the need for backpropagation. Cellular Automata (CA) methods have proven successful in data-scarce scenarios but require proper initialization -- typically through user input, priors, or randomness. We propose a practical intersection of these approaches: using FLIM networks to initialize CA states with expert knowledge without requiring user interaction for each image. By decoding features from each level of a FLIM network, we can initialize multiple CAs simultaneously, creating a multi-level framework. Our method leverages the hierarchical knowledge encoded across different network layers, merging multiple saliency maps into a high-quality final output that functions as a CA ensemble. Benchmarks across two challenging medical datasets demonstrate the competitiveness of our multi-level CA approach compared to established models in the deep SOD literature.
Submitted 15 April, 2025;
originally announced April 2025.
-
Ankle Exoskeletons in Walking and Load-Carrying Tasks: Insights into Biomechanics and Human-Robot Interaction
Authors:
J. F. Almeida,
J. André,
C. P. Santos
Abstract:
Background: Lower limb exoskeletons can enhance quality of life, but widespread adoption is limited by the lack of frameworks to assess their biomechanical and human-robot interaction (HRI) effects, which are essential for developing adaptive and personalized control strategies. Understanding impacts on kinematics, muscle activity, and HRI dynamics is key to achieving improved usability of wearable robots. Objectives: We propose a systematic methodology to evaluate an ankle exoskeleton's effects on human movement during walking and load-carrying (10 kg front pack), focusing on joint kinematics, muscle activity, and HRI torque signals. Materials and Methods: Using Xsens MVN (inertial motion capture), Delsys EMG, and a unilateral exoskeleton, three experiments were conducted: (1) isolated dorsiflexion/plantarflexion; (2) gait analysis (two subjects, passive/active modes); and (3) load-carrying under assistance. Results and Conclusions: The first experiment confirmed that the HRI sensor captured both voluntary and involuntary torques, providing directional torque insights. The second experiment showed that the device slightly restricted ankle range of motion (RoM) but supported normal gait patterns across all assistance modes. The exoskeleton reduced muscle activity, particularly in active mode. HRI torque varied according to gait phases and highlighted reduced synchronization, suggesting a need for improved support. The third experiment revealed that load-carrying increased GM and TA muscle activity, but the device partially mitigated user effort by reducing muscle activity compared to unassisted walking. HRI increased during load-carrying, providing insights into user-device dynamics. These results demonstrate the importance of tailoring exoskeleton evaluation methods to specific devices and users, while offering a framework for future studies on exoskeleton biomechanics and HRI.
Submitted 14 April, 2025;
originally announced April 2025.
-
A Human-Sensitive Controller: Adapting to Human Ergonomics and Physical Constraints via Reinforcement Learning
Authors:
Vitor Martins,
Sara M. Cerqueira,
Mercedes Balcells,
Elazer R Edelman,
Cristina P. Santos
Abstract:
Work-Related Musculoskeletal Disorders continue to be a major challenge in industrial environments, leading to reduced workforce participation, increased healthcare costs, and long-term disability. This study introduces a human-sensitive robotic system aimed at reintegrating individuals with a history of musculoskeletal disorders into standard job roles, while simultaneously optimizing ergonomic conditions for the broader workforce. This research leverages reinforcement learning to develop a human-aware control strategy for collaborative robots, focusing on optimizing ergonomic conditions and preventing pain during task execution. Two RL approaches, Q-Learning and Deep Q-Network (DQN), were implemented and tested to personalize control strategies based on individual user characteristics. Although experimental results revealed a simulation-to-real gap, a fine-tuning phase successfully adapted the policies to real-world conditions. DQN outperformed Q-Learning by completing tasks faster while maintaining zero pain risk and safe ergonomic levels. The structured testing protocol confirmed the system's adaptability to diverse human anthropometries, underscoring the potential of RL-driven cobots to enable safer, more inclusive workplaces.
Submitted 14 April, 2025;
originally announced April 2025.
-
Revolutionizing Fractional Calculus with Neural Networks: Voronovskaya-Damasclin Theory for Next-Generation AI Systems
Authors:
Rômulo Damasclin Chaves dos Santos,
Jorge Henrique de Oliveira Sales
Abstract:
This work introduces rigorous convergence rates for neural network operators activated by symmetrized and perturbed hyperbolic tangent functions, utilizing novel Voronovskaya-Damasclin asymptotic expansions. We analyze basic, Kantorovich, and quadrature-type operators over infinite domains, extending classical approximation theory to fractional calculus via Caputo derivatives. Key innovations include parameterized activation functions with asymmetry control, symmetrized density operators, and fractional Taylor expansions for error analysis. The main theorem demonstrates that Kantorovich operators achieve \(o(n^{-β(N-\varepsilon)})\) convergence rates, while basic operators exhibit \(\mathcal{O}(n^{-βN})\) error decay. For deep networks, we prove \(\mathcal{O}(L^{-β(N-\varepsilon)})\) approximation bounds. Stability results under parameter perturbations highlight operator robustness. By integrating neural approximation theory with fractional calculus, this work provides foundational mathematical insights and deployable engineering solutions, with potential applications in complex system modeling and signal processing.
Submitted 1 April, 2025;
originally announced April 2025.
-
Impacto de Treinamento em Programação Competitiva no Ensino Médio: Resultados e Desafios
Authors:
Camila da Cruz Santos,
Sarah Souto dos Santos,
Crishna Irion,
Giullia Rodrigues de Menezes,
Rafael Dias Araújo,
João Henrique de Souza Pereira
Abstract:
This article presents ongoing research aiming to develop an effective methodology for teaching programming, focusing on participation in the Brazilian Informatics Olympiad (OBI), for elementary and high school students. The training conducted with students from the Federal Institute and state schools demonstrates the importance of programming training programs as a way to promote interest in computing, stimulate the development of computational skills, and increase participation in competitions such as the OBI. The next steps of the research include conducting more training cycles and analyzing the results obtained in the competitions.
Submitted 31 January, 2025;
originally announced March 2025.
-
Promoting Gender Equality in Competitive Programming: Strategies and Impacts of Affirmative Actions in Programming Marathons in Brazil
Authors:
Crishna Irion,
Camila da Cruz Santos,
Luiz Claudio Theodoro,
Rafael Dias Araujo,
Joao Henrique de Souza Pereira
Abstract:
In the context of Computing, competitive programming is a relevant area that aims to have students, usually in teams, solve programming challenges, developing skills and competencies in the field. However, female participation remains significantly low and notably distant compared to male participation, even with proven intellectual equity between genders. This research aims to present strategies used to improve female participation in Programming Marathons in Brazil. The developed research is documentary, applied, and exploratory, with actions that generate results for female participation, with affirmative and inclusion actions, an important step towards gender equity in competitive programming.
Submitted 21 February, 2025;
originally announced February 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Extension of Symmetrized Neural Network Operators with Fractional and Mixed Activation Functions
Authors:
Rômulo Damasclin Chaves dos Santos,
Jorge Henrique de Oliveira Sales
Abstract:
We propose a novel extension to symmetrized neural network operators by incorporating fractional and mixed activation functions. This study addresses the limitations of existing models in approximating higher-order smooth functions, particularly in complex and high-dimensional spaces. Our framework introduces a fractional exponent in the activation functions, allowing adaptive non-linear approximations with improved accuracy. We define new density functions based on $q$-deformed and $θ$-parametrized logistic models and derive advanced Jackson-type inequalities that establish uniform convergence rates. Additionally, we provide a rigorous mathematical foundation for the proposed operators, supported by numerical validations demonstrating their efficiency in handling oscillatory and fractional components. The results extend the applicability of neural network approximation theory to broader functional spaces, paving the way for applications in solving partial differential equations and modeling complex systems.
Submitted 17 January, 2025;
originally announced January 2025.
-
An Empirical Study of Safetensors' Usage Trends and Developers' Perceptions
Authors:
Beatrice Casey,
Kaia Damian,
Andrew Cotaj,
Joanna C. S. Santos
Abstract:
Developers are sharing pre-trained Machine Learning (ML) models through a variety of model sharing platforms, such as Hugging Face, in an effort to make ML development more collaborative. To share the models, they must first be serialized. While there are many methods of serialization in Python, most of them are unsafe. To tame this insecurity, Hugging Face released safetensors as a way to mitigate the threats posed by unsafe serialization formats. In this context, this paper investigates developers' shifts towards using safetensors on Hugging Face in an effort to understand security practices in the ML development community, as well as how developers react to new methods of serialization. Our results show that more developers are adopting safetensors, and many safetensors adoptions were made by automated conversions of existing models by Hugging Face's conversion tool. We also found, however, that a majority of developers ignore the conversion tool's pull requests, and that while many developers are facing issues with using safetensors, they are eager to learn about and adapt the format.
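The abstract's claim that common Python serialization methods are unsafe can be illustrated with the standard library's `pickle` module. The sketch below is not taken from the paper: the `Payload` class is hypothetical and the callable it names (`int`) is deliberately harmless. It shows only that the byte stream, not the loading program, decides which callable runs during deserialization.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form names a callable to invoke."""

    def __reduce__(self):
        # pickle stores "call int('42')" in the byte stream. A malicious
        # file could name a dangerous callable here instead of int.
        return (int, ("42",))


# Serialize the object; the stream now contains the callable and its arguments.
blob = pickle.dumps(Payload())

# Deserializing executes the stored call; no Payload instance is rebuilt.
result = pickle.loads(blob)

if __name__ == "__main__":
    print(type(result).__name__, result)
```

Because loading runs whatever the file specifies, a pickle-based model file from an untrusted source has to be treated like untrusted code. Formats designed to hold only tensor data and metadata avoid this class of problem, which is the motivation the abstract gives for safetensors.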
Submitted 3 January, 2025;
originally announced January 2025.
-
Measuring Compliance of Consent Revocation on the Web
Authors:
Gayatri Priyadarsini Kancherla,
Nataliia Bielova,
Cristiana Santos,
Abhishek Bichhawat
Abstract:
The GDPR requires websites to facilitate the right to revoke consent from Web users. While numerous studies measured compliance of consent with the various consent requirements, no prior work has studied consent revocation on the Web. Therefore, it remains unclear how difficult it is to revoke consent on the websites' interfaces, and whether revoked consent is properly stored and communicated behind the user interface. Our work aims to fill this gap by measuring compliance of consent revocation on the Web on the top-200 websites. We found that 19.87% of websites make it difficult for users to revoke consent throughout different interfaces, 20.5% of websites require more effort than acceptance, and 2.48% do not provide consent revocation at all, thus violating legal requirements for valid consent. 57.5% of websites do not delete the cookies after consent revocation, enabling continuous illegal processing of users' data. Moreover, we analyzed 281 websites implementing the IAB Europe TCF, and found 22 websites that store a positive consent despite the user's revocation. Surprisingly, we found that on 101 websites, third parties that have received consent upon the user's acceptance are not informed of the user's revocation, leading to the illegal processing of users' data by such third parties. Our findings emphasise the need for improved legal compliance of consent revocation, and proper, consistent, and uniform implementation of revocation communication and data deletion practices.
Submitted 22 May, 2025; v1 submitted 22 November, 2024;
originally announced November 2024.
-
The connected Grundy coloring problem: Formulations and a local-search enhanced biased random-key genetic algorithm
Authors:
Mateus C. Silva,
Rafael A. Melo,
Mauricio G. C. Resende,
Marcio C. Santos,
Rodrigo F. Toso
Abstract:
Given a graph G=(V,E), a connected Grundy coloring is a proper vertex coloring that can be obtained by a first-fit heuristic on a connected vertex sequence. A first-fit coloring heuristic is one that attributes to each vertex in a sequence the lowest-index color not used for its preceding neighbors. A connected vertex sequence is one in which each element, except for the first one, is connected to at least one element preceding it. The connected Grundy coloring problem consists of obtaining a connected Grundy coloring maximizing the number of colors. In this paper, we propose two integer programming (IP) formulations and a local-search enhanced biased random-key genetic algorithm (BRKGA) for the connected Grundy coloring problem. The first formulation follows the standard way of partitioning the vertices into color classes while the second one relies on the idea of representatives in an attempt to break symmetries. The BRKGA encompasses a local search procedure using a newly proposed neighborhood. A theoretical neighborhood analysis is also presented. Extensive computational experiments indicate that the problem is computationally demanding for the proposed IP formulations. Nonetheless, the formulation by representatives outperforms the standard one for the considered benchmark instances. Additionally, our BRKGA can find high-quality solutions in low computational times for considerably large instances, showing improved performance when enhanced with local search and a reset mechanism. Moreover, we show that our BRKGA can be easily extended to successfully tackle the Grundy coloring problem, i.e., the one without the connectivity requirements.
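The two definitions at the start of the abstract (first-fit coloring and connected vertex sequence) can be made concrete with a small sketch. This is only an illustration of the definitions, not the paper's IP formulations or BRKGA; the function names and the adjacency-set graph representation are assumptions.

```python
def is_connected_sequence(adj, order):
    """True if every vertex after the first has a neighbor earlier in `order`."""
    seen = set()
    for i, v in enumerate(order):
        if i > 0 and not (adj[v] & seen):
            return False
        seen.add(v)
    return True


def first_fit_coloring(adj, order):
    """Give each vertex the lowest-index color unused by its preceding neighbors."""
    color = {}
    for v in order:
        # Only neighbors that already appear earlier in the sequence have a color.
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Path graph a - b - c - d, stored as adjacency sets.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

connected_order = ["a", "b", "c", "d"]
free_order = ["a", "d", "b", "c"]  # "d" has no earlier neighbor

if __name__ == "__main__":
    for order in (connected_order, free_order):
        coloring = first_fit_coloring(path, order)
        print(order, is_connected_sequence(path, order), len(set(coloring.values())))
```

On this path, the connected sequence yields two colors while the unconstrained sequence yields three, which shows how the connectivity requirement can change the number of colors a first-fit ordering reaches.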
Submitted 21 November, 2024;
originally announced November 2024.
-
MojoBench: Language Modeling and Benchmarks for Mojo
Authors:
Nishat Raihan,
Joanna C. S. Santos,
Marcos Zampieri
Abstract:
The recently introduced Mojo programming language (PL) by Modular has received significant attention in the scientific community due to its claimed significant speed boost over Python. Despite advancements in code Large Language Models (LLMs) across various PLs, Mojo remains unexplored in this context. To address this gap, we introduce MojoBench, the first framework for Mojo code generation. MojoBench includes HumanEval-Mojo, a benchmark dataset designed for evaluating code LLMs on Mojo, and Mojo-Coder, the first LLM pretrained and finetuned for Mojo code generation, which supports instructions in 5 natural languages (NLs). Our results show that Mojo-Coder achieves a 30-35% performance improvement over leading models like GPT-4o and Claude-3.5-Sonnet. Furthermore, we provide insights into LLM behavior with underrepresented and unseen PLs, offering potential strategies for enhancing model adaptability. MojoBench contributes to our understanding of LLM capabilities and limitations in emerging programming paradigms, fostering more robust code generation systems.
Submitted 23 October, 2024;
originally announced October 2024.
-
Large Language Models in Computer Science Education: A Systematic Literature Review
Authors:
Nishat Raihan,
Mohammed Latif Siddiq,
Joanna C. S. Santos,
Marcos Zampieri
Abstract:
Large language models (LLMs) are becoming increasingly better at a wide range of Natural Language Processing tasks (NLP), such as text generation and understanding. Recently, these models have extended their capabilities to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL). Foundational models such as the Generative Pre-trained Transformer (GPT) and LLaMA series have set strong baseline performances in various NL and PL tasks. Additionally, several models have been fine-tuned specifically for code generation, showing significant improvements in code-related applications. Both foundational and fine-tuned models are increasingly used in education, helping students write, debug, and understand code. We present a comprehensive systematic literature review to examine the impact of LLMs in computer science and computer engineering education. We analyze their effectiveness in enhancing the learning experience, supporting personalized education, and aiding educators in curriculum development. We address five research questions to uncover insights into how LLMs contribute to educational outcomes, identify challenges, and suggest directions for future research.
Submitted 21 October, 2024;
originally announced October 2024.
-
Emílias Podcast -- Mulheres na Computação: Ampliando Horizontes e Inspirando Carreiras em STEM
Authors:
Nathálya Chaves Dos Santos,
Adolfo Gustavo Serra Seca Neto
Abstract:
On October 3, 2024, the "Emílias Podcast -- Women in Computing" celebrates its 5th anniversary, standing out as a platform that promotes the participation of women in STEM (an acronym for "science, technology, engineering, and mathematics"). The podcast aims to provide a space for women in computing and related fields to share their experiences and highlight the various opportunities in Information and Communication Technology (ICT). The methodology included a feedback survey with interviewees, conducted via Google Forms, to assess their experience and determine whether they would recommend the podcast. In addition, we analyzed audience data, which showed consistent growth over the five years. The results revealed that 100% of the interviewees would recommend "Emílias Podcast," reflecting a high level of satisfaction with the project. The average participation experience rating was 4.7 on a scale of 1 to 5, highlighting positive aspects such as the quality of the script, the interview conduction, and the networking opportunities. The audience data also underscore the podcast's impact: with over 10,000 accumulated downloads and plays, it is primarily listened to by people aged 23 to 44, with 50.9% of the audience being female, demonstrating its relevance and reach. In conclusion, the feedback from interviewees and the audience data reinforce the podcast's positive impact and its crucial role in the inclusion of women in technology. The results highlight the importance of promoting the field and its opportunities, contributing to a more inclusive and inspiring future. The data analysis demonstrates the podcast's effectiveness in engaging and expanding its audience, establishing it as a significant example of social impact in ICT.
Submitted 6 October, 2024;
originally announced October 2024.
-
A Large-Scale Exploit Instrumentation Study of AI/ML Supply Chain Attacks in Hugging Face Models
Authors:
Beatrice Casey,
Joanna C. S. Santos,
Mehdi Mirakhorli
Abstract:
The development of machine learning (ML) techniques has led to ample opportunities for developers to develop and deploy their own models. Hugging Face serves as an open source platform where developers can share and download other models in an effort to make ML development more collaborative. In order for models to be shared, they first need to be serialized. Certain Python serialization methods are considered unsafe, as they are vulnerable to object injection. This paper investigates the pervasiveness of these unsafe serialization methods across Hugging Face, and demonstrates through an exploitation approach, that models using unsafe serialization methods can be exploited and shared, creating an unsafe environment for ML developers. We investigate to what extent Hugging Face is able to flag repositories and files using unsafe serialization methods, and develop a technique to detect malicious models. Our results show that Hugging Face is home to a wide range of potentially vulnerable models.
Submitted 6 October, 2024;
originally announced October 2024.
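The object-injection risk mentioned in the Hugging Face abstract above can be illustrated with Python's standard `pickle` module. This is a minimal, deliberately harmless sketch: the class name `Payload` and the choice of `sorted` as the injected callable are assumptions made for illustration, and it does not reproduce the paper's exploitation approach or detection technique.

```python
import pickle


class Payload:
    """Illustrative object whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # pickle stores "call sorted([3, 1, 2])" instead of the object's state.
        # A malicious file could name a dangerous callable here instead.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())

# Loading the bytes calls the stored callable; no Payload instance is returned.
loaded = pickle.loads(blob)
```

Because the callable is chosen by whoever produced the bytes, loading a pickle from an untrusted source amounts to running code chosen by that source.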
-
Extended Reality System for Robotic Learning from Human Demonstration
Authors:
Isaac Ngui,
Courtney McBeth,
Grace He,
André Corrêa Santos,
Luciano Soares,
Marco Morales,
Nancy M. Amato
Abstract:
Many real-world tasks are intuitive for a human to perform, but difficult to encode algorithmically when utilizing a robot to perform the tasks. In these scenarios, robotic systems can benefit from expert demonstrations to learn how to perform each task. In many settings, it may be difficult or unsafe to use a physical robot to provide these demonstrations, for example, considering cooking tasks such as slicing with a knife. Extended reality provides a natural setting for demonstrating robotic trajectories while bypassing safety concerns and providing a broader range of interaction modalities. We propose the Robot Action Demonstration in Extended Reality (RADER) system, a generic extended reality interface for learning from demonstration. We additionally present its application to an existing state-of-the-art learning from demonstration approach and show comparable results between demonstrations given on a physical robot and those given using our extended reality system.
Submitted 19 September, 2024;
originally announced September 2024.
-
Standing on the shoulders of giants
Authors:
Lucas Felipe Ferraro Cardoso,
José de Sousa Ribeiro Filho,
Vitor Cirilo Araujo Santos,
Regiane Silva Kawasaki Frances,
Ronnie Cley de Oliveira Alves
Abstract:
Although fundamental to the advancement of Machine Learning, the classic evaluation metrics extracted from the confusion matrix, such as precision and F1, are limited. Such metrics only offer a quantitative view of the models' performance, without considering the complexity of the data or the quality of the hit. To overcome these limitations, recent research has introduced the use of psychometric metrics such as Item Response Theory (IRT), which allows an assessment at the level of latent characteristics of instances. This work investigates how IRT concepts can enrich a confusion matrix in order to identify which model is the most appropriate among options with similar performance. In the study carried out, IRT does not replace, but complements classical metrics by offering a new layer of evaluation and observation of the fine behavior of models in specific instances. It was also observed that there is 97% confidence that the score from the IRT has different contributions from 66% of the classical metrics analyzed.
Submitted 6 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Towards Robust Ferrous Scrap Material Classification with Deep Learning and Conformal Prediction
Authors:
Paulo Henrique dos Santos,
Valéria de Carvalho Santos,
Eduardo José da Silva Luz
Abstract:
In the steel production domain, recycling ferrous scrap is essential for environmental and economic sustainability, as it reduces both energy consumption and greenhouse gas emissions. However, the classification of scrap materials poses a significant challenge, requiring advancements in automation technology. Additionally, building trust among human operators is a major obstacle. Traditional approaches often fail to quantify uncertainty and lack clarity in model decision-making, which complicates acceptance. In this article, we describe how conformal prediction can be employed to quantify uncertainty and add robustness in scrap classification. We have adapted the Split Conformal Prediction technique to seamlessly integrate with state-of-the-art computer vision models, such as the Vision Transformer (ViT), Swin Transformer, and ResNet-50, while also incorporating Explainable Artificial Intelligence (XAI) methods. We evaluate the approach using a comprehensive dataset of 8147 images spanning nine ferrous scrap classes. The application of the Split Conformal Prediction method allowed for the quantification of each model's uncertainties, which enhanced the understanding of predictions and increased the reliability of the results. Specifically, the Swin Transformer model demonstrated more reliable outcomes than the others, as evidenced by its smaller average size of prediction sets and achieving an average classification accuracy exceeding 95%. Furthermore, the Score-CAM method proved highly effective in clarifying visual features, significantly enhancing the explainability of the classification decisions.
Submitted 19 April, 2024;
originally announced April 2024.
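The split conformal prediction step named in the ferrous scrap abstract above can be sketched in a few lines of Python. This is the textbook procedure using the score 1 minus the softmax probability of the true class, not the authors' adapted version; the function names, the three-class calibration data, and the miscoverage level `alpha` are hypothetical.

```python
import math
from typing import List, Sequence


def calibrate_threshold(cal_probs: Sequence[Sequence[float]],
                        cal_labels: Sequence[int],
                        alpha: float) -> float:
    """Return the conformal threshold from a held-out calibration split."""
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: every class is kept
    return scores[k - 1]


def prediction_set(probs: Sequence[float], qhat: float) -> List[int]:
    """Keep every class whose score 1 - p does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical calibration data: softmax outputs and true labels.
cal_probs = [[0.5, 0.3, 0.2], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
cal_labels = [0, 1, 2]

qhat = calibrate_threshold(cal_probs, cal_labels, alpha=0.25)
ambiguous = prediction_set([0.5, 0.5, 0.0], qhat)
```

A smaller average prediction-set size at the same `alpha` indicates a more decisive model, which is the sense in which the abstract compares the models' reliability.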
-
The Fault in our Stars: Quality Assessment of Code Generation Benchmarks
Authors:
Mohammed Latif Siddiq,
Simantika Dristi,
Joy Saha,
Joanna C. S. Santos
Abstract:
Large Language Models (LLMs) are gaining popularity among software engineers. A crucial aspect of developing effective code generation LLMs is to evaluate these models using a robust benchmark. Evaluation benchmarks with quality issues can provide a false sense of performance. In this work, we conduct the first-of-its-kind study of the quality of prompts within benchmarks used to compare the performance of different code generation models. To conduct this study, we analyzed 3,566 prompts from 9 code generation benchmarks to identify quality issues in them. We also investigated whether fixing the identified quality issues in the benchmarks' prompts affects a model's performance. We also studied memorization issues of the evaluation dataset, which can put into question a benchmark's trustworthiness. We found that code generation evaluation benchmarks mainly focused on Python and coding exercises and had very limited contextual dependencies to challenge the model. These datasets and the developers' prompts suffer from quality issues like spelling and grammatical errors, unclear sentences to express developers' intent, and not using proper documentation style. Fixing all these issues in the benchmarks can lead to better performance for Python code generation, but no significant improvement was observed for Java code generation. We also found evidence that GPT-3.5-Turbo and CodeGen-2.5 models may have data contamination issues.
Submitted 4 September, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Enhancing Decision Analysis with a Large Language Model: pyDecision a Comprehensive Library of MCDA Methods in Python
Authors:
Valdecy Pereira,
Marcio Pereira Basilio,
Carlos Henrique Tarjano Santos
Abstract:
Purpose: Multicriteria decision analysis (MCDA) has become increasingly essential for decision-making in complex environments. In response to this need, the pyDecision library, implemented in Python and available at https://bit.ly/3tLFGtH, has been developed to provide a comprehensive and accessible collection of MCDA methods. Methods: The pyDecision offers 70 MCDA methods, including AHP, TOPSIS, and the PROMETHEE and ELECTRE families. Beyond offering a vast range of techniques, the library provides visualization tools for more intuitive results interpretation. In addition to these features, pyDecision has integrated ChatGPT, an advanced Large Language Model, where decision-makers can use ChatGPT to discuss and compare the outcomes of different methods, providing a more interactive and intuitive understanding of the solutions. Findings: Large Language Models are undeniably potent but can sometimes be a double-edged sword. Its answers may be misleading without rigorous verification of its outputs, especially for researchers lacking deep domain expertise. It's imperative to approach its insights with a discerning eye and a solid foundation in the relevant field. Originality: With the integration of MCDA methods and ChatGPT, pyDecision is a significant contribution to the scientific community, as it is an invaluable resource for researchers, practitioners, and decision-makers navigating complex decision-making problems and seeking the most appropriate solutions based on MCDA methods.
Submitted 9 April, 2024;
originally announced April 2024.
-
A Survey of Source Code Representations for Machine Learning-Based Cybersecurity Tasks
Authors:
Beatrice Casey,
Joanna C. S. Santos,
George Perry
Abstract:
Machine learning techniques for cybersecurity-related software engineering tasks are becoming increasingly popular. The representation of source code is a key portion of the technique that can impact the way the model is able to learn the features of the source code. With an increasing number of these techniques being developed, it is valuable to see the current state of the field to better understand what exists and what is not there yet. This article presents a study of these existing machine learning based approaches and demonstrates what type of representations were used for different cybersecurity tasks and programming languages. Additionally, we study what types of models are used with different representations. We have found that graph-based representations are the most popular category of representation, and tokenizers and Abstract Syntax Trees (ASTs) are the two most popular representations overall (e.g., AST and tokenizers are the representations with the highest count of papers, whereas graph-based representations is the category with the highest count of papers). We also found that the most popular cybersecurity task is vulnerability detection, and the language that is covered by the most techniques is C. Finally, we found that sequence-based models are the most popular category of models, and Support Vector Machines are the most popular model overall.
Submitted 9 April, 2025; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1112 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 16 December, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Self-calibrated convolution towards glioma segmentation
Authors:
Felipe C. R. Salvagnini,
Gerson O. Barbosa,
Alexandre X. Falcao,
Cid A. N. Santos
Abstract:
Accurate brain tumor segmentation in the early stages of the disease is crucial for the treatment's effectiveness, avoiding exhaustive visual inspection by a qualified specialist of 3D MR brain images of multiple protocols (e.g., T1, T2, T2-FLAIR, T1-Gd). Several networks exist for glioma segmentation, with nnU-Net being one of the best. In this work, we evaluate self-calibrated convolutions in different parts of the nnU-Net network to demonstrate that self-calibrated modules in skip connections can significantly improve the enhanced-tumor and tumor-core segmentation accuracy while preserving the whole-tumor segmentation accuracy.
Submitted 7 February, 2024;
originally announced February 2024.
-
Feedback to the European Data Protection Board's Guidelines 2/2023 on Technical Scope of Art. 5(3) of ePrivacy Directive
Authors:
Cristiana Santos,
Nataliia Bielova,
Vincent Roca,
Mathieu Cunche,
Gilles Mertens,
Karel Kubicek,
Hamed Haddadi
Abstract:
We very much welcome the EDPB's Guidelines. Please find hereunder our feedback to the Guidelines 2/2023 on Technical Scope of Art. 5(3) of ePrivacy Directive. Our comments are presented after a quotation from the proposed text by the EDPB in a box.
Submitted 5 February, 2024;
originally announced February 2024.
-
Efficient $(3,3)$-isogenies on fast Kummer surfaces
Authors:
Maria Corte-Real Santos,
Craig Costello,
Benjamin Smith
Abstract:
We give an alternative derivation of $(N,N)$-isogenies between fast Kummer surfaces which complements existing works based on the theory of theta functions. We use this framework to produce explicit formulae for the case of $N = 3$, and show that the resulting algorithms are more efficient than all prior $(3, 3)$-isogeny algorithms.
Submitted 4 September, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Using Zero-shot Prompting in the Automatic Creation and Expansion of Topic Taxonomies for Tagging Retail Banking Transactions
Authors:
Daniel de S. Moraes,
Pedro T. C. Santos,
Polyana B. da Costa,
Matheus A. S. Pinto,
Ivan de J. P. Pinto,
Álvaro M. G. da Veiga,
Sergio Colcher,
Antonio J. G. Busson,
Rafael H. Rocha,
Rennan Gaio,
Rafael Miceli,
Gabriela Tourinho,
Marcos Rabaioli,
Leandro Santos,
Fellipe Marques,
David Favaro
Abstract:
This work presents an unsupervised method for automatically constructing and expanding topic taxonomies using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies and LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot prompting to find out where to add new nodes, which, to our knowledge, is the first work to present such an approach to taxonomy tasks. We use the resulting taxonomies to assign tags that characterize merchants from a retail bank dataset. To evaluate our work, we asked 12 volunteers to answer a two-part form in which we first assessed the quality of the taxonomies created and then the tags assigned to merchants based on that taxonomy. The evaluation revealed a coherence rate exceeding 90% for the chosen taxonomies. The taxonomies' expansion with LLMs also showed exciting results for parent node prediction, with an f1-score above 70% in our taxonomies.
Submitted 11 February, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
Skin cancer diagnosis using NIR spectroscopy data of skin lesions in vivo using machine learning algorithms
Authors:
Flavio P. Loss,
Pedro H. da Cunha,
Matheus B. Rocha,
Madson Poltronieri Zanoni,
Leandro M. de Lima,
Isadora Tavares Nascimento,
Isabella Rezende,
Tania R. P. Canuto,
Luciana de Paula Vieira,
Renan Rossoni,
Maria C. S. Santos,
Patricia Lyra Frasson,
Wanderson Romão,
Paulo R. Filgueiras,
Renato A. Krohling
Abstract:
Skin lesions are classified as benign or malignant. Among the malignant, melanoma is a very aggressive cancer and the major cause of death. Early diagnosis of skin cancer is therefore highly desirable. In the last few years, there has been growing interest in computer-aided diagnosis (CAD) using mostly image and clinical data of the lesion. These sources of information are limited by their inability to provide information about the molecular structure of the lesion. NIR spectroscopy may provide an alternative source of information for automated CAD of skin lesions. The most commonly used techniques and classification algorithms in spectroscopy are Principal Component Analysis (PCA), Partial Least Squares - Discriminant Analysis (PLS-DA), and Support Vector Machines (SVM). Nonetheless, there is growing interest in applying modern machine and deep learning (MDL) techniques to spectroscopy. One of the main limitations in applying MDL to spectroscopy is the lack of public datasets. Since, as far as we know, there is no public dataset of NIR spectral data for skin lesions, a new dataset named NIR-SC-UFES has been collected, annotated, and analyzed, generating the gold standard for classification of NIR spectral data for skin cancer. Next, the machine learning algorithms XGBoost, CatBoost, LightGBM, and a 1D-convolutional neural network (1D-CNN) were investigated to classify cancer and non-cancer skin lesions. Experimental results indicate that the best performance was obtained by LightGBM with pre-processing using standard normal variate (SNV) and feature extraction, yielding values of 0.839 for balanced accuracy, 0.851 for recall, 0.852 for precision, and 0.850 for F-score. The obtained results represent first steps in CAD of skin lesions, aiming at the automated triage of patients with skin lesions in vivo using NIR spectral data.
Submitted 2 January, 2024;
originally announced January 2024.
-
A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges
Authors:
Roberto Francisco de Lima Junior,
Luiz Fernando Paes de Barros Presta,
Lucca Santos Borborema,
Vanderson Nogueira da Silva,
Marcio Leal de Melo Dahia,
Anderson Carlos Sousa e Santos
Abstract:
This paper presents a detailed case study examining the application of Large Language Models (LLMs) in the construction of test cases within the context of software engineering. LLMs, characterized by their advanced natural language processing capabilities, are increasingly garnering attention as tools to automate and enhance various aspects of the software development life cycle. Leveraging a case study methodology, we systematically explore the integration of LLMs in the test case construction process, aiming to shed light on their practical efficacy, challenges encountered, and implications for software quality assurance. The study encompasses the selection of a representative software application, the formulation of test case construction methodologies employing LLMs, and the subsequent evaluation of outcomes. Through a blend of qualitative and quantitative analyses, this study assesses the impact of LLMs on test case comprehensiveness, accuracy, and efficiency. Additionally, it delves into challenges such as model interpretability and adaptation to diverse software contexts. The findings from this case study contribute nuanced insights into the practical utility of LLMs in the domain of test case construction, elucidating their potential benefits and limitations. By addressing real-world scenarios and complexities, this research aims to inform software practitioners and researchers alike about the tangible implications of incorporating LLMs into the software testing landscape, fostering a more comprehensive understanding of their role in optimizing the software development process.
Submitted 21 December, 2023; v1 submitted 19 December, 2023;
originally announced December 2023.
-
You Can't Trust Your Tag Neither: Privacy Leaks and Potential Legal Violations within the Google Tag Manager
Authors:
Gilles Mertens,
Nataliia Bielova,
Vincent Roca,
Cristiana Santos
Abstract:
Tag Management Systems were developed in order to support website publishers in installing multiple third-party JavaScript scripts (Tags) on their websites. Google developed its own TMS called "Google Tag Manager" (GTM) that is currently present on 42% of the top 1 million most popular websites. However, GTM has not yet been thoroughly evaluated by the academic research community. In this work, we study, for the first time, the Tags provided within the GTM system. We propose a new methodology called "detecting privacy leaks in isolation" and apply it to multiple Tags to analyse the types of data that Tags collect and contrast them to the legal and technical documentation, in collaboration with a legal expert. Across three studies - in-depth analysis of 6 Tags, automated analysis of 718 Tags, and analysis of Google "Consent Mode" - we discover multiple hidden data leaks, incomplete and diverging declarations, undisclosed third-parties and cookies, personal data sharing without consent and we further identify potential legal violations within EU Data Protection law.
Submitted 11 April, 2025; v1 submitted 14 December, 2023;
originally announced December 2023.
-
"It doesn't tell me anything about how my data is used": User Perceptions of Data Collection Purposes
Authors:
Lin Kyi,
Abraham Mhaidli,
Cristiana Santos,
Franziska Roesner,
Asia Biega
Abstract:
Data collection purposes and their descriptions are presented on almost all privacy notices under the GDPR, yet there is a lack of research focusing on how effective they are at informing users about data practices. We fill this gap by investigating users' perceptions of data collection purposes and their descriptions, a crucial aspect of informed consent. We conducted 23 semi-structured interviews with European users to investigate user perceptions of six common purposes (Strictly Necessary, Statistics and Analytics, Performance and Functionality, Marketing and Advertising, Personalized Advertising, and Personalized Content) and identified elements of an effective purpose name and description.
We found that most purpose descriptions do not contain the information users wish to know, and that participants preferred some purpose names over others due to their perceived transparency or ease of understanding. Based on these findings, we suggest how the framing of purposes can be improved toward meaningful informed consent.
Submitted 6 February, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Memory Augmented Language Models through Mixture of Word Experts
Authors:
Cicero Nogueira dos Santos,
James Lee-Thorp,
Isaac Noble,
Chung-Ching Chang,
David Uthus
Abstract:
Scaling up the number of parameters of language models has proven to be an effective approach to improve performance. For dense models, increasing model size proportionally increases the model's computation footprint. In this work, we seek to aggressively decouple learning capacity and FLOPs through Mixture-of-Experts (MoE) style models with large knowledge-rich vocabulary-based routing functions and experts. Our proposed approach, dubbed Mixture of Word Experts (MoWE), can be seen as a memory augmented model, where a large set of word-specific experts plays the role of a sparse memory. We demonstrate that MoWE performs significantly better than the T5 family of models with a similar number of FLOPs in a variety of NLP tasks. Additionally, MoWE outperforms regular MoE models on knowledge intensive tasks and has similar performance to more complex memory augmented approaches that often require invoking custom mechanisms to search the sparse memory.
Submitted 15 November, 2023;
originally announced November 2023.
-
Seneca: Taint-Based Call Graph Construction for Java Object Deserialization
Authors:
Joanna C. S. Santos,
Mehdi Mirakhorli,
Ali Shokri
Abstract:
Object serialization and deserialization are widely used for storing and preserving objects in files, memory, or databases, as well as for transporting them across machines, enabling remote interaction among processes, and more. This mechanism relies on reflection, a dynamic language feature that introduces serious challenges for static analyses. Current state-of-the-art call graph construction algorithms do not fully support object serialization/deserialization, i.e., they are unable to uncover the callback methods that are invoked when objects are serialized and deserialized. Since call graphs are a core data structure for multiple types of analysis (e.g., vulnerability detection), an appropriate analysis cannot be performed when the call graph does not capture hidden (vulnerable) paths that occur via callback methods. In this paper, we present Seneca, an approach for handling serialization with improved soundness in the context of call graph construction. Our approach relies on taint analysis and API modeling to construct sound call graphs. We evaluated our approach with respect to soundness, precision, performance, and usefulness in detecting untrusted object deserialization vulnerabilities. Our results show that Seneca can create sound call graphs with respect to serialization features. The resulting call graphs do not incur significant runtime overhead and were shown to be useful for identifying vulnerable paths caused by untrusted object deserialization.
Submitted 2 September, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
SALLM: Security Assessment of Generated Code
Authors:
Mohammed Latif Siddiq,
Joanna C. S. Santos,
Sajith Devareddy,
Anna Muller
Abstract:
With the growing popularity of Large Language Models (LLMs) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although LLMs can help developers to be more productive, prior empirical studies have shown that LLMs can generate insecure code. Two factors contribute to insecure code generation. First, existing datasets used to evaluate LLMs do not adequately represent genuine software engineering tasks sensitive to security. Instead, they are often based on competitive programming challenges or classroom-type coding tasks. In real-world applications, the code produced is integrated into larger codebases, introducing potential security risks. Second, existing evaluation metrics primarily focus on the functional correctness of the generated code while ignoring security considerations. Therefore, in this paper, we describe SALLM, a framework to systematically benchmark LLMs' abilities to generate secure code. This framework has three major components: a novel dataset of security-centric Python prompts, configurable assessment techniques to evaluate the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation.
Submitted 4 September, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Predictive Maintenance Model Based on Anomaly Detection in Induction Motors: A Machine Learning Approach Using Real-Time IoT Data
Authors:
Sergio F. Chevtchenko,
Monalisa C. M. dos Santos,
Diego M. Vieira,
Ricardo L. Mota,
Elisson Rocha,
Bruna V. Cruz,
Danilo Araújo,
Ermeson Andrade
Abstract:
With the support of Internet of Things (IoT) devices, it is possible to acquire data from degradation phenomena and design data-driven models to perform anomaly detection in industrial equipment. This approach not only identifies potential anomalies but can also serve as a first step toward building predictive maintenance policies. In this work, we demonstrate a novel anomaly detection system for induction motors used in pumps, compressors, fans, and other industrial machines. This work evaluates a combination of pre-processing techniques and machine learning (ML) models with a low computational cost. We use a combination of pre-processing techniques such as Fast Fourier Transform (FFT), Wavelet Transform (WT), and binning, which are well-known approaches for extracting features from raw data. We also aim to guarantee an optimal balance between multiple conflicting parameters, such as anomaly detection rate, false positive rate, and inference speed of the solution. To this end, multiobjective optimization and analysis are performed on the evaluated models. Pareto-optimal solutions are presented to select which models have the best results regarding classification metrics and computational effort. Unlike most works in this field, which use publicly available datasets to validate their models, we propose an end-to-end solution combining low-cost and readily available IoT sensors. The approach is validated by acquiring a custom dataset from induction motors. Also, we fuse vibration, temperature, and noise data from these sensors as the input to the proposed ML model. Therefore, we aim to propose a methodology general enough to be applied in different industrial contexts in the future.
Submitted 15 October, 2023;
originally announced October 2023.
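The Pareto-optimal selection mentioned in the abstract above can be illustrated with a minimal sketch. It assumes three objectives taken from the abstract (anomaly detection rate to maximize, false positive rate and inference time to minimize); the model names and numbers are hypothetical and are not results from the paper, and the function names are illustrative only.

```python
from typing import List, Tuple

# (name, detection_rate, false_positive_rate, inference_ms); all values are hypothetical.
Candidate = Tuple[str, float, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on every objective and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2] and a[3] <= b[3]
    strictly_better = a[1] > b[1] or a[2] < b[2] or a[3] < b[3]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


models: List[Candidate] = [
    ("A", 0.95, 0.10, 12.0),
    ("B", 0.90, 0.05, 3.0),
    ("C", 0.85, 0.10, 15.0),  # dominated by A (and by B)
    ("D", 0.97, 0.20, 40.0),
]
front = pareto_front(models)
print([name for name, *_ in front])
```

Every model left on the front represents a different trade-off; choosing among them still requires a preference between accuracy and computational effort.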
-
Discovery of Novel Reticular Materials for Carbon Dioxide Capture using GFlowNets
Authors:
Flaviu Cipcigan,
Jonathan Booth,
Rodrigo Neumann Barros Ferreira,
Carine Ribeiro dos Santos,
Mathias Steiner
Abstract:
Artificial intelligence holds promise to improve materials discovery. GFlowNets are an emerging deep learning algorithm with many applications in AI-assisted discovery. By using GFlowNets, we generate porous reticular materials, such as metal organic frameworks and covalent organic frameworks, for applications in carbon dioxide capture. We introduce a new Python package (matgfn) to train and sample GFlowNets. We use matgfn to generate the matgfn-rm dataset of novel and diverse reticular materials with gravimetric surface area above 5000 m$^2$/g. We calculate single- and two-component gas adsorption isotherms for the top-100 candidates in matgfn-rm. These candidates are novel compared to the state-of-the-art ARC-MOF dataset and rank in the 90th percentile in terms of working capacity compared to the CoRE2019 dataset. We discover 15 materials outperforming all materials in CoRE2019.
Submitted 16 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Balancing Computational Efficiency and Forecast Error in Machine Learning-based Time-Series Forecasting: Insights from Live Experiments on Meteorological Nowcasting
Authors:
Elin Törnquist,
Wagner Costa Santos,
Timothy Pogue,
Nicholas Wingle,
Robert A. Caulk
Abstract:
Machine learning for time-series forecasting remains a key area of research. Despite successful application of many machine learning techniques, relating computational efficiency to forecast error remains an under-explored domain. This paper addresses this topic through a series of real-time experiments to quantify the relationship between computational cost and forecast error using meteorological nowcasting as an example use-case. We employ a variety of popular regression techniques (XGBoost, FC-MLP, Transformer, and LSTM) for multi-horizon, short-term forecasting of three variables (temperature, wind speed, and cloud cover) for multiple locations. During a 5-day live experiment, 4000 data sources were streamed for training and inferencing 144 models per hour. These models were parameterized to explore forecast error for two computational cost minimization methods: a novel auto-adaptive data reduction technique (Variance Horizon) and a performance-based concept drift-detection mechanism. Forecast error of all model variations was benchmarked in real-time against a state-of-the-art numerical weather prediction model. Performance was assessed using classical and novel evaluation metrics. Results indicate that using the Variance Horizon reduced computational usage by more than 50%, while increasing error by 0-15%. Meanwhile, performance-based retraining reduced computational usage by up to 90% while also improving forecast error by up to 10%. Finally, the combination of both the Variance Horizon and performance-based retraining outperformed other model configurations by up to 99.7% when considering error normalized to computational usage.
Submitted 26 September, 2023;
originally announced September 2023.
-
Enhancing E-Learning System Through Learning Management System (LMS) Technologies: Reshape The Learner Experience
Authors:
Cecilia P. Abaricia,
Manuel Luis C. Delos Santos
Abstract:
This paper aims to determine how the LMS Web portal application reshapes the learner experience through the developed E-Learning Management System using a Data Mining Algorithm.
The researchers used descriptive research, which involves interpreting the meaning or significance of what is described. Data were gathered from questionnaires, surveys, and observations related to the study, and the chi-square formula was used for the statistical treatment of data.
The findings show the extent to which the LMS Web portal application reshapes the learner experience in terms of the following variables, rated by Average Weighted Mean (AWM): flexible engagement of learners on any device, highly satisfied; personalized learning tracker, highly satisfied; collaboration with the learning expert, highly satisfied; user-friendly teaching tools, satisfied; evident learner progress and involvement, satisfied.
In the final analysis, this E-Learning System can fit a range of educational needs, including chat, virtual classes, supportive resources for students, individual and group monitoring, and assessment, using the LMS at maximum efficiency. Moreover, this platform can be used to deliver hybrid learning.
Submitted 31 August, 2023;
originally announced September 2023.
-
Legitimate Interest is the New Consent -- Large-Scale Measurement and Legal Compliance of IAB Europe TCF Paywalls
Authors:
Victor Morel,
Cristiana Santos,
Viktor Fredholm,
Adam Thunberg
Abstract:
Cookie paywalls allow visitors of a website to access its content only after they choose between paying a fee and accepting tracking. European Data Protection Authorities (DPAs) recently issued guidelines and decisions on the lawfulness of paywalls, but it is not yet known whether websites comply with them. In this paper, we study the prevalence of cookie paywalls on the top one million websites using an automatic crawler. We identify 431 cookie paywalls, all using the Transparency and Consent Framework (TCF). We then analyse the data these paywalls communicate through the TCF, and in particular, the legal grounds and the purposes used to collect personal data. We observe that cookie paywalls extensively rely on the legitimate interest legal basis, systematically conflated with consent. We also observe a lack of correlation between the presence of paywalls and legal decisions or guidelines by DPAs.
Submitted 13 October, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
An Ontology of Dark Patterns Knowledge: Foundations, Definitions, and a Pathway for Shared Knowledge-Building
Authors:
Colin M. Gray,
Cristiana Santos,
Nataliia Bielova,
Thomas Mildner
Abstract:
Deceptive and coercive design practices are increasingly used by companies to extract profit, harvest data, and limit consumer choice. Dark patterns represent the most common contemporary amalgamation of these problematic practices, connecting designers, technologists, scholars, regulators, and legal professionals in transdisciplinary dialogue. However, a lack of universally accepted definitions across the academic, legislative and regulatory space has likely limited the impact that scholarship on dark patterns might have in supporting sanctions and evolved design practices. In this paper, we seek to support the development of a shared language of dark patterns, harmonizing ten existing regulatory and academic taxonomies of dark patterns and proposing a three-level ontology with standardized definitions for 65 synthesized dark patterns types across low-, meso-, and high-level patterns. We illustrate how this ontology can support translational research and regulatory action, including pathways to extend our initial types through new empirical work and map across application domains.
Submitted 18 September, 2023;
originally announced September 2023.
-
Improving Image Classification of Knee Radiographs: An Automated Image Labeling Approach
Authors:
Jikai Zhang,
Carlos Santos,
Christine Park,
Maciej Mazurowski,
Roy Colglazier
Abstract:
Large numbers of radiographic images are available in knee radiology practices which could be used for training of deep learning models for diagnosis of knee abnormalities. However, those images do not typically contain readily available labels due to limitations of human annotations. The purpose of our study was to develop an automated labeling approach that improves the image classification model to distinguish normal knee images from those with abnormalities or prior arthroplasty. The automated labeler was trained on a small set of labeled data to automatically label a much larger set of unlabeled data, further improving the image classification performance for knee radiographic diagnosis. We developed our approach using 7,382 patients and validated it on a separate set of 637 patients. The final image classification model, trained using both manually labeled and pseudo-labeled data, had a higher weighted average AUC (WAUC: 0.903) and higher AUC-ROC values across all classes (normal AUC-ROC: 0.894; abnormal AUC-ROC: 0.896; arthroplasty AUC-ROC: 0.990) compared to the baseline model (WAUC: 0.857; normal AUC-ROC: 0.842; abnormal AUC-ROC: 0.848; arthroplasty AUC-ROC: 0.987), trained using only manually labeled data. DeLong tests show that the improvement is significant on normal (p-value<0.002) and abnormal (p-value<0.001) images. Our findings demonstrated that the proposed automated labeling approach significantly improves the performance of image classification for radiographic knee diagnosis, helping to facilitate patient care and the curation of large knee datasets.
Submitted 5 September, 2023;
originally announced September 2023.
-
ICARUS: An Android-Based Unmanned Aerial Vehicle (UAV) Search and Rescue Eye in the Sky
Authors:
Manuel Luis C. Delos Santos,
Jerum B. Dasalla,
Jomar C. Feliciano,
Dustin Red B. Cabatay
Abstract:
The purpose of this paper is to develop an unmanned aerial vehicle (UAV) based on a quadcopter that is remotely controlled and integrated with an Android application to assist in search and rescue operations, with capabilities for video surveillance, map coordinates, a deployable parachute carrying a medicine kit or a food pack as a payload, and a collision warning system.
Applied research was used to develop the functional prototype, and quantitative and descriptive statistics were used to summarize data by describing the relationship between variables in a sample or population. The quadcopter was evaluated using a survey instrument to test its acceptability, using predefined variables to select respondents within Caloocan City and Quezon City, Philippines.
Thirty respondents answered questions on demographic profiles and known issues and concerns. The results were summarized and distributed in Tables 1 and 2.
In terms of demographic profiles, the number of SAR operators within the specified areas is distributed equally; most are male, single, and within the age bracket of 31 and above. In terms of issues and concerns, the most common type of search and rescue was ground search and rescue, and human error was the primary cause of most injuries in operating units. All respondents agreed that the prototype was useful and, in terms of acceptability, that drone technology will improve search and rescue operations.
The innovative use of Android and drone technology is a new step towards improving SAR operations in the Philippines.
The LiPo battery must be replaced with a higher-capacity one, and the drone operator should undergo a training course and secure a permit from the Civil Aviation Authority of the Philippines (CAAP).
Submitted 28 August, 2023;
originally announced August 2023.
-
Anomaly Detection in Industrial Machinery using IoT Devices and Machine Learning: a Systematic Mapping
Authors:
Sérgio F. Chevtchenko,
Elisson da Silva Rocha,
Monalisa Cristina Moura Dos Santos,
Ricardo Lins Mota,
Diego Moura Vieira,
Ermeson Carneiro de Andrade,
Danilo Ricardo Barbosa de Araújo
Abstract:
Anomaly detection is critical in the smart industry for preventing equipment failure, reducing downtime, and improving safety. Internet of Things (IoT) has enabled the collection of large volumes of data from industrial machinery, providing a rich source of information for Anomaly Detection. However, the volume and complexity of data generated by the Internet of Things ecosystems make it difficult for humans to detect anomalies manually. Machine learning (ML) algorithms can automate anomaly detection in industrial machinery by analyzing generated data. Besides, each technique has specific strengths and weaknesses based on the data nature and its corresponding systems. However, the current systematic mapping studies on Anomaly Detection primarily focus on addressing network and cybersecurity-related problems, with limited attention given to the industrial sector. Additionally, these studies do not cover the challenges involved in using ML for Anomaly Detection in industrial machinery within the context of the IoT ecosystems. This paper presents a systematic mapping study on Anomaly Detection for industrial machinery using IoT devices and ML algorithms to address this gap. The study comprehensively evaluates 84 relevant studies spanning from 2016 to 2023, providing an extensive review of Anomaly Detection research. Our findings identify the most commonly used algorithms, preprocessing techniques, and sensor types. Additionally, this review identifies application areas and points to future challenges and research opportunities.
Submitted 14 November, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
RobôCIn Small Size League Extended Team Description Paper for RoboCup 2023
Authors:
Aline Lima de Oliveira,
Cauê Addae da Silva Gomes,
Cecília Virginia Santos da Silva,
Charles Matheus de Sousa Alves,
Danilo Andrade Martins de Souza,
Driele Pires Ferreira Araújo Xavier,
Edgleyson Pereira da Silva,
Felipe Bezerra Martins,
Lucas Henrique Cavalcanti Santos,
Lucas Dias Maciel,
Matheus Paixão Gumercindo dos Santos,
Matheus Lafayette Vasconcelos,
Matheus Vinícius Teotonio do Nascimento Andrade,
João Guilherme Oliveira Carvalho de Melo,
João Pedro Souza Pereira de Moura,
José Ronald da Silva,
José Victor Silva Cruz,
Pedro Henrique Santana de Morais,
Pedro Paulo Salman de Oliveira,
Riei Joaquim Matos Rodrigues,
Roberto Costa Fernandes,
Ryan Vinicius Santos Morais,
Tamara Mayara Ramos Teobaldo,
Washington Igor dos Santos Silva,
Edna Natividade Silva Barros
Abstract:
RobôCIn has participated in RoboCup Small Size League since 2019, won its first world title in 2022 (Division B), and is currently a three-time Latin-American champion. This paper presents our improvements to defend the Small Size League (SSL) Division B title in RoboCup 2023 in Bordeaux, France. This paper aims to share some of the academic research that our team developed over the past year. Our team has successfully published 2 articles related to SSL at two high-impact conferences: the 25th RoboCup International Symposium and the 19th IEEE Latin American Robotics Symposium (LARS 2022). Over the last year, we have been continuously migrating from our past codebase to Unification. We describe the new architecture implemented and some points of software and AI refactoring. In addition, we discuss the process of integrating machined components into the mechanical system, our development for participating in the vision blackout challenge last year, and what we are preparing for this year.
Submitted 19 July, 2023;
originally announced July 2023.
-
FRANC: A Lightweight Framework for High-Quality Code Generation
Authors:
Mohammed Latif Siddiq,
Beatrice Casey,
Joanna C. S. Santos
Abstract:
In recent years, the use of automated source code generation utilizing transformer-based generative models has expanded, and these models can generate functional code according to the requirements of the developers. However, recent research revealed that this automatically generated source code can contain vulnerabilities and other quality issues. Despite researchers' and practitioners' attempts to enhance code generation models, retraining and fine-tuning large language models is time-consuming and resource-intensive. Thus, we describe FRANC, a lightweight framework for recommending more secure and high-quality source code derived from transformer-based code generation models. FRANC includes a static filter to make the generated code compilable with heuristics and a quality-aware ranker to sort the code snippets based on a quality score. Moreover, the framework uses prompt engineering to fix persistent quality issues. We evaluated the framework with five Python and Java code generation models and six prompt datasets, including a newly created one in this work (SOEval). The static filter improves the compilability of 9% to 46% of Java suggestions and 10% to 43% of Python suggestions. The average improvement over the NDCG@10 score for the ranking system is 0.0763, and the repairing techniques repair up to 80% of prompts. FRANC takes, on average, 1.98 seconds for Java; for Python, it takes 0.08 seconds.
Submitted 28 August, 2024; v1 submitted 16 July, 2023;
originally announced July 2023.
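The NDCG@10 score cited in the abstract above is a standard ranking metric. A minimal sketch of one common formulation (linear gain with a log2 position discount) follows; whether FRANC uses exactly this variant is an assumption, and the relevance scores in the example are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given order divided by the DCG of the ideal (descending) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical quality scores of code snippets, listed in the order a ranker returned them.
ranked_scores = [1.0, 3.0, 2.0]
print(round(ndcg_at_k(ranked_scores, k=10), 4))
```

A value of 1.0 means the ranker already placed the snippets in the best possible order, so an improvement in NDCG@10 indicates that higher-quality snippets moved closer to the top of the list.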
-
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring
Authors:
Juan Sebastián Cañas,
Maria Paula Toro-Gómez,
Larissa Sayuri Moreira Sugai,
Hernán Darío Benítez Restrepo,
Jorge Rudas,
Breyner Posso Bautista,
Luís Felipe Toledo,
Simone Dena,
Adão Henrique Rosa Domingos,
Franco Leandro de Souza,
Selvino Neckel-Oliveira,
Anderson da Rosa,
Vítor Carvalho-Rocha,
José Vinícius Bernardy,
José Luiz Massao Moreira Sugai,
Carolina Emília dos Santos,
Rogério Pereira Bastos,
Diego Llusia,
Juan Sebastián Ulloa
Abstract:
Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibian calls recorded by PAM that comprises 27 hours of expert annotations for 42 different species from two Brazilian biomes. We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model of the fine-grained categorization problem. Additionally, we highlight the challenges of the dataset to encourage machine learning researchers to solve the problem of anuran call identification towards conservation policy. All our experiments and resources can be found on our GitHub repository https://github.com/soundclim/anuraset.
Submitted 11 July, 2023;
originally announced July 2023.
-
Triggering Multi-Hop Reasoning for Question Answering in Language Models using Soft Prompts and Random Walks
Authors:
Kanishka Misra,
Cicero Nogueira dos Santos,
Siamak Shakeri
Abstract:
Despite readily memorizing world knowledge about entities, pre-trained language models (LMs) struggle to compose together two or more facts to perform multi-hop reasoning in question-answering tasks. In this work, we propose techniques that improve upon this limitation by relying on random walks over structured knowledge graphs. Specifically, we use soft prompts to guide LMs to chain together their encoded knowledge by learning to map multi-hop questions to random walk paths that lead to the answer. Applying our methods on two T5 LMs shows substantial improvements over standard tuning approaches in answering questions that require 2-hop reasoning.
Submitted 6 June, 2023;
originally announced June 2023.
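The random walks over a knowledge graph mentioned in the abstract above can be pictured with a small sketch. It only shows how a 2-hop path might be sampled from a toy graph; the graph contents, the function name, and the path format are assumptions for illustration, and the soft-prompt training described in the abstract is not reproduced here.

```python
import random
from typing import Dict, List, Tuple

# Toy knowledge graph: subject -> list of (relation, object) edges. Entries are illustrative.
KG: Dict[str, List[Tuple[str, str]]] = {
    "Ada Lovelace": [("born_in", "London")],
    "London": [("capital_of", "United Kingdom")],
    "United Kingdom": [("currency", "Pound sterling")],
}


def random_walk(graph: Dict[str, List[Tuple[str, str]]],
                start: str, hops: int, rng: random.Random) -> List[str]:
    """Sample a path of up to `hops` edges, returned as [entity, relation, entity, ...]."""
    path = [start]
    node = start
    for _ in range(hops):
        edges = graph.get(node)
        if not edges:  # dead end: stop the walk early
            break
        relation, node = rng.choice(edges)
        path.extend([relation, node])
    return path


walk = random_walk(KG, "Ada Lovelace", hops=2, rng=random.Random(0))
print(" -> ".join(walk))
```

In this toy graph, a 2-hop question such as "Which country is the birthplace of Ada Lovelace the capital of?" corresponds to the sampled path, whose final entity is the answer.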