-
Prompting LLMs for Code Editing: Struggles and Remedies
Authors:
Daye Nam,
Ahmed Omran,
Ambar Murillo,
Saksham Thakur,
Abner Araujo,
Marcel Blistein,
Alexander Frömmgen,
Vincent Hellendoorn,
Satish Chandra
Abstract:
Large Language Models (LLMs) are rapidly transforming software engineering, with coding assistants embedded in an IDE becoming increasingly prevalent. While research has focused on improving the tools and understanding developer perceptions, a critical gap exists in understanding how developers actually use these tools in their daily workflows, and, crucially, where they struggle. This paper addresses part of this gap through a multi-phased investigation of developer interactions with an LLM-powered code editing and transformation feature, Transform Code, in an IDE widely used at Google. First, we analyze telemetry logs of the feature usage, revealing that frequent re-prompting can be an indicator of developer struggles with using Transform Code. Second, we conduct a qualitative analysis of unsatisfactory requests, identifying five key categories of information often missing from developer prompts. Finally, based on these findings, we propose and evaluate a tool, AutoPrompter, for automatically improving prompts by inferring missing information from the surrounding code context, leading to a 27% improvement in edit correctness on our test set.
Submitted 28 April, 2025;
originally announced April 2025.
-
AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion
Authors:
Liuyue Xie,
Jiancong Guo,
Ozan Cakmakci,
Andre Araujo,
Laszlo A. Jeni,
Zhiheng Jia
Abstract:
Accurate camera calibration is a fundamental task for 3D perception, especially when dealing with real-world, in-the-wild environments where complex optical distortions are common. Existing methods often rely on pre-rectified images or calibration patterns, which limits their applicability and flexibility. In this work, we introduce a novel framework that addresses these challenges by jointly modeling camera intrinsic and extrinsic parameters using a generic ray camera model. Unlike previous approaches, AlignDiff shifts focus from semantic to geometric features, enabling more accurate modeling of local distortions. We propose AlignDiff, a diffusion model conditioned on geometric priors, enabling the simultaneous estimation of camera distortions and scene geometry. To enhance distortion prediction, we incorporate edge-aware attention, focusing the model on geometric features around image edges, rather than semantic content. Furthermore, to enhance generalizability to real-world captures, we incorporate a large database of ray-traced lenses containing over three thousand samples. This database characterizes the distortion inherent in a diverse variety of lens forms. Our experiments demonstrate that the proposed method significantly reduces the angular error of estimated ray bundles by ~8.2 degrees and improves overall calibration accuracy, outperforming existing approaches on challenging, real-world datasets.
Submitted 27 March, 2025;
originally announced March 2025.
-
Charting 5G Energy Efficiency: Flexible Energy Modeling for Sustainable Networks
Authors:
Anderson L de Araujo,
Luc Deneire,
Guillaume Urvoy-Keller,
André L F de Almeida
Abstract:
Despite the rapid advancements in 5G technology, accurately assessing the energy consumption of its Radio Access Networks (RANs) remains a challenge due to the diverse range of applicable technologies and implementation solutions. Designing a versatile power model for estimating the 5G RAN-specific power consumption requires extensive data collection and experimental studies to capture the diverse range of technologies and implementation solutions. The objective is to outline a versatile energy model capable of estimating RAN-specific energy consumption, encompassing both mobile terminals and the physical layer (PHY) of base stations. In this paper, we focus on the computational complexity of the baseband part of the model. The developed (part of the) model is compared with the estimation of the number of cycles (and energy per cycle) used by a specific implementation (here a Matlab code ported on an Intel target), enabling the assessment of the model with the estimation of energy consumed on a real target. The study's results show a good agreement between the model and the implementation, even if some parts need to be refined to take specific algorithms into account. The key contribution is the development of an initial flexible energy model with finer granularity, enabling comparisons of energy use across various applications and contexts, and offering a comprehensive tool for optimizing 5G network energy consumption.
Submitted 12 March, 2025;
originally announced March 2025.
-
Embracing Experiential Learning: Hackathons as an Educational Strategy for Shaping Soft Skills in Software Engineering
Authors:
Allysson Allex Araújo,
Marcos Kalinowski,
Maria Teresa Baldassarre
Abstract:
In recent years, Software Engineering (SE) scholars and practitioners have emphasized the importance of integrating soft skills into SE education. However, teaching and learning soft skills are complex, as they cannot be acquired passively through raw knowledge acquisition. On the other hand, hackathons have attracted increasing attention due to their experiential, collaborative, and intensive nature, in which certain tasks can resemble real-world software development. This paper aims to discuss the idea of hackathons as an educational strategy for shaping SE students' soft skills in practice. Initially, we overview the existing literature on soft skills and hackathons in SE education. Then, we report preliminary empirical evidence from a seven-day hybrid hackathon involving 40 students. We assess how the hackathon experience promoted innovative and creative thinking, collaboration and teamwork, and knowledge application among participants through a structured questionnaire designed to evaluate students' self-awareness. Lastly, our findings and new directions are analyzed through the lens of Self-Determination Theory (SDT), which offers a psychological lens to understand human behavior. This paper contributes to academia by advocating the potential of hackathons in SE education and proposing concrete plans for future research within SDT. For industry, our discussion has implications around developing soft skills in future SE professionals, thereby enhancing their employability and readiness in the software market.
Submitted 11 February, 2025;
originally announced February 2025.
-
Towards Emotionally Intelligent Software Engineers: Understanding Students' Self-Perceptions After a Cooperative Learning Experience
Authors:
Allysson Allex Araújo,
Marcos Kalinowski,
Matheus Paixao,
Daniel Graziotin
Abstract:
[Background] Emotional Intelligence (EI) can impact Software Engineering (SE) outcomes through improved team communication, conflict resolution, and stress management. SE workers face increasing pressure to develop both technical and interpersonal skills, as modern software development emphasizes collaborative work and complex team interactions. Despite EI's documented importance in professional practice, SE education continues to prioritize technical knowledge over emotional and social competencies. [Objective] This paper analyzes SE students' self-perceptions of their EI after a two-month cooperative learning project, using Mayer and Salovey's four-ability model to examine how students handle emotions in collaborative development. [Method] We conducted a case study with 29 SE students organized into four squads within a project-based learning course, collecting data through questionnaires and focus groups that included brainwriting and sharing circles, then analyzing the data using descriptive statistics and open coding. [Results] Students demonstrated stronger abilities in managing their own emotions compared to interpreting others' emotional states. Despite limited formal EI training, they developed informal strategies for emotional management, including structured planning and peer support networks, which they connected to improved productivity and conflict resolution. [Conclusion] This study shows how SE students perceive EI in a collaborative learning context and provides evidence-based insights into the important role of emotional competencies in SE education.
Submitted 7 February, 2025;
originally announced February 2025.
-
Blockchain Developer Experience: A Multivocal Literature Review
Authors:
P. Soares,
A. A. Araujo,
G. Destefanis,
R. Neykova,
R. Saraiva,
J. Souza
Abstract:
The rise of smart contracts has expanded blockchain's capabilities, enabling the development of innovative decentralized applications (dApps). However, this advancement brings its own challenges, including the management of distributed architectures and immutable data. Addressing these complexities requires a specialized approach to software engineering, with blockchain-oriented practices emerging to support development in this domain. Developer Experience (DEx) is central to this effort, focusing on the usability, productivity, and overall satisfaction of tools and frameworks from the engineers' perspective. Despite its importance, research on Blockchain Developer Experience (BcDEx) remains limited, with no systematic mapping of academic and industry efforts. To bridge this gap, we conducted a Multivocal Literature Review analyzing 62 to understand the distribution of BcDEx sources, practical implementations, and their impact. Our findings revealed that academic focus on BcDEx is limited compared to the coverage in gray literature, which primarily includes blogs (41.8%) and corporate sources (21.8%). Particularly, development efficiency, multi-network support, and usability are the most addressed aspects in tools and frameworks. In addition, we found that BcDEx is being shaped through five key perspectives: complexity abstraction, adoption facilitation, productivity enhancement, developer education, and BcDEx evaluation.
Submitted 20 January, 2025;
originally announced January 2025.
-
Observability in Fog Computing
Authors:
Aleteia Araujo,
Breno Costa,
Joao Bachiega Jr,
Leonardo R. Carvalho,
Rajkumar Buyya
Abstract:
Fog Computing provides computational resources close to the end user, supporting low-latency and high-bandwidth communications. It supports IoT applications, enabling real-time data processing, analytics, and decision-making at the edge of the network. However, the high distribution of its constituent nodes and resource-restricted devices interconnected by heterogeneous and unreliable networks makes it challenging to execute service maintenance and troubleshooting, increasing the time to restore the application after failures and not guaranteeing the service level agreements. In such a scenario, increasing the observability of Fog applications and services may speed up troubleshooting and increase their availability. An observability system is a data-intensive service, and Fog Computing could have its nodes and channels saturated with an additional load. In this work, we detail the three pillars of observability (metrics, log, and traces), discuss the challenges, and clarify the approaches for increasing the observability of services in Fog environments. Furthermore, the system architecture that supports observability in Fog, related tools, and technologies are presented, providing a comprehensive discussion on this subject. An example of a solution shows how a real-world application can benefit from increased observability in this environment. Finally, there is a discussion about the future directions of Fog observability.
Submitted 25 November, 2024;
originally announced November 2024.
-
Non-Dominated Sorting Bidirectional Differential Coevolution
Authors:
Cicero S. R. Mendes,
Aluizio F. R. Araújo,
Lucas R. C. Farias
Abstract:
Constrained multiobjective optimization problems (CMOPs) are commonly found in real-world applications. CMOP is a complex problem that needs to satisfy a set of equality or inequality constraints. This paper proposes a variant of the bidirectional coevolution algorithm (BiCo) with differential evolution (DE). The novelties in the model include the DE differential mutation and crossover operators as the main search engine and a non-dominated sorting selection scheme. Experimental results on two benchmark test suites and eight real-world CMOPs suggested that the proposed model reached better overall performance than the original model.
Submitted 25 October, 2024;
originally announced October 2024.
-
An Inverse Modeling Constrained Multi-Objective Evolutionary Algorithm Based on Decomposition
Authors:
Lucas R. C. Farias,
Aluizio F. R. Araújo
Abstract:
This paper introduces the inverse modeling constrained multi-objective evolutionary algorithm based on decomposition (IM-C-MOEA/D) for addressing constrained real-world optimization problems. Our research builds upon the advancements made in evolutionary computing-based inverse modeling, and it strategically bridges the gaps in applying inverse models based on decomposition to problem domains with constraints. The proposed approach is experimentally evaluated on diverse real-world problems (RWMOP1-35), showing superior performance to state-of-the-art constrained multi-objective evolutionary algorithms (CMOEAs). The experimental results highlight the robustness of the algorithm and its applicability in real-world constrained optimization scenarios.
Submitted 24 October, 2024;
originally announced October 2024.
-
TIPS: Text-Image Pretraining with Spatial awareness
Authors:
Kevis-Kokitsi Maninis,
Kaifeng Chen,
Soham Ghosh,
Arjun Karpur,
Koert Chen,
Ye Xia,
Bingyi Cao,
Daniel Salz,
Guangxing Han,
Jan Dlabal,
Dan Gnanapragasam,
Mojtaba Seyedhosseini,
Howard Zhou,
Andre Araujo
Abstract:
While image-text representation learning has become very popular in recent years, existing models tend to lack spatial awareness and have limited direct applicability for dense understanding tasks. For this reason, self-supervised image-only pretraining is still the go-to method for many dense vision applications (e.g. depth estimation, semantic segmentation), despite the lack of explicit supervisory signals. In this paper, we close this gap between image-text and self-supervised learning, by proposing a novel general-purpose image-text model, which can be effectively used off the shelf for dense and global vision tasks. Our method, which we refer to as Text-Image Pretraining with Spatial awareness (TIPS), leverages two simple and effective insights. First, on textual supervision: we reveal that replacing noisy web image captions by synthetically generated textual descriptions boosts dense understanding performance significantly, due to a much richer signal for learning spatially aware representations. We propose an adapted training method that combines noisy and synthetic captions, resulting in improvements across both dense and global understanding tasks. Second, on the learning technique: we propose to combine contrastive image-text learning with self-supervised masked image modeling, to encourage spatial coherence, unlocking substantial enhancements for downstream applications. Building on these two ideas, we scale our model using the transformer architecture, trained on a curated set of public images. Our experiments are conducted on 8 tasks involving 16 datasets in total, demonstrating strong off-the-shelf performance on both dense and global understanding, for several image-only and image-text tasks. Code and models are released at https://github.com/google-deepmind/tips.
Submitted 7 March, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
Authors:
Sara Ghazanfari,
Alexandre Araujo,
Prashanth Krishnamurthy,
Siddharth Garg,
Farshad Khorrami
Abstract:
Multi-modal Large Language Models (MLLMs) have recently exhibited impressive general-purpose capabilities by leveraging vision foundation models to encode the core concepts of images into representations. These are then combined with instructions and processed by the language model to generate high-quality responses. Despite significant progress in enhancing the language component, challenges persist in optimally fusing visual encodings within the language model for task-specific adaptability. Recent research has focused on improving this fusion through modality adaptation modules but at the cost of significantly increased model complexity and training data needs. In this paper, we propose EMMA (Efficient Multi-Modal Adaptation), a lightweight cross-modality module designed to efficiently fuse visual and textual encodings, generating instruction-aware visual representations for the language model. Our key contributions include: (1) an efficient early fusion mechanism that integrates vision and language representations with minimal added parameters (less than 0.2% increase in model size), (2) an in-depth interpretability analysis that sheds light on the internal mechanisms of the proposed method; (3) comprehensive experiments that demonstrate notable improvements on both specialized and general benchmarks for MLLMs. Empirical results show that EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations. Our code is available at https://github.com/SaraGhazanfari/EMMA
Submitted 10 June, 2025; v1 submitted 2 October, 2024;
originally announced October 2024.
-
Mobile App Security Trends and Topics: An Examination of Questions From Stack Overflow
Authors:
Timothy Huo,
Ana Catarina Araújo,
Jake Imanaka,
Anthony Peruma,
Rick Kazman
Abstract:
The widespread use of smartphones and tablets has made society heavily reliant on mobile applications (apps) for accessing various resources and services. These apps often handle sensitive personal, financial, and health data, making app security a critical concern for developers. While there is extensive research on software security topics like malware and vulnerabilities, less is known about the practical security challenges mobile app developers face and the guidance they seek. In this study, we mine Stack Overflow for questions on mobile app security, which we analyze using quantitative and qualitative techniques. The findings reveal that Stack Overflow is a major resource for developers seeking help with mobile app security, especially for Android apps, and identify seven main categories of security questions: Secured Communications, Database, App Distribution Service, Encryption, Permissions, File-Specific, and General Security. Insights from this research can inform the development of tools, techniques, and resources by the research and vendor community to better support developers in securing their mobile apps.
Submitted 14 September, 2024; v1 submitted 12 September, 2024;
originally announced September 2024.
-
Micro and macro facial expressions by driven animations in realistic Virtual Humans
Authors:
Rubens Halbig Montanha,
Giovana Nascimento Raupp,
Ana Carolina Policarpo Schmitt,
Victor Flávio de Andrade Araujo,
Soraia Raupp Musse
Abstract:
Computer Graphics (CG) advancements have allowed the creation of more realistic Virtual Humans (VH) through modern techniques for animating the VH body and face, thereby affecting perception. From traditional methods, including blend shapes, to driven animations using facial and body tracking, these advancements can potentially enhance the perception of comfort and realism in relation to VHs. Previously, Psychology studied facial movements in humans, with some works separating expressions into macro and micro expressions. Also, some previous CG studies have analyzed how macro and micro expressions are perceived, replicating psychology studies in VHs, encompassing studies with realistic and cartoon VHs, and exploring different VH technologies. However, instead of using facial tracking animation methods, these previous studies animated the VHs using blendshapes interpolation. To understand how the facial tracking technique alters the perception of VHs, this paper extends the study to macro and micro expressions, employing two datasets to transfer real facial expressions to VHs and analyze how their expressions are perceived. Our findings suggest that transferring facial expressions from real actors to VHs significantly diminishes the accuracy of emotion perception compared to VH facial animations created by artists.
Submitted 28 August, 2024;
originally announced August 2024.
-
A Developer-Centric Study Exploring Mobile Application Security Practices and Challenges
Authors:
Anthony Peruma,
Timothy Huo,
Ana Catarina Araújo,
Jake Imanaka,
Rick Kazman
Abstract:
Mobile applications (apps) have become an essential part of everyday life, offering convenient access to services such as banking, healthcare, and shopping. With these apps handling sensitive personal and financial data, ensuring their security is paramount. While previous research has explored mobile app developer practices, there is limited knowledge about the common practices and challenges that developers face in securing their apps. Our study addresses this need through a global survey of 137 experienced mobile app developers, providing a developer-centric view of mobile app security. Our findings show that developers place high importance on security, frequently implementing features such as authentication and secure storage. They face challenges with managing vulnerabilities, permissions, and privacy concerns, and often rely on resources like Stack Overflow for help. Many developers find that existing learning materials do not adequately prepare them to build secure apps and provide recommendations, such as following best practices and integrating security at the beginning of the development process. We envision our findings leading to improved security practices, better-designed tools and resources, and more effective training programs.
Submitted 16 August, 2024;
originally announced August 2024.
-
Teaching Survey Research in Software Engineering
Authors:
Marcos Kalinowski,
Allysson Allex Araújo,
Daniel Mendez
Abstract:
In this chapter, we provide advice on how to effectively teach survey research based on lessons learned from several international teaching experiences on the topic and from conducting large-scale surveys published at various scientific conferences and journals. First, we provide teachers with a potential syllabus for teaching survey research, including learning objectives, lectures, and examples of practical assignments. Thereafter, we provide actionable advice on how to teach the topics related to each learning objective, including survey design, sampling, data collection, statistical and qualitative analysis, threats to validity and reliability, and ethical considerations. The chapter is complemented by online teaching resources, including slides covering an entire course.
Submitted 30 July, 2024;
originally announced July 2024.
-
Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks
Authors:
Tiago Novello,
Diana Aldana,
Andre Araujo,
Luiz Velho
Abstract:
Sinusoidal neural networks have been shown to be effective as implicit neural representations (INRs) of low-dimensional signals, due to their smoothness and high representation capacity. However, initializing and training them remain empirical tasks that lack a deeper understanding to guide the learning process. To fill this gap, our work introduces a theoretical framework that explains the capacity property of sinusoidal networks and offers robust control mechanisms for initialization and training. Our analysis is based on a novel amplitude-phase expansion of the sinusoidal multilayer perceptron, showing how its layer compositions produce a large number of new frequencies expressed as integer combinations of the input frequencies. This relationship can be directly used to initialize the input neurons, as a form of spectral sampling, and to bound the network's spectrum while training. Our method, referred to as TUNER (TUNing sinusoidal nEtwoRks), greatly improves the stability and convergence of sinusoidal INR training, leading to detailed reconstructions, while preventing overfitting.
Submitted 3 April, 2025; v1 submitted 30 July, 2024;
originally announced July 2024.
-
Agile Minds, Innovative Solutions, and Industry-Academia Collaboration: Lean R&D Meets Problem-Based Learning in Software Engineering Education
Authors:
Lucas Romao,
Marcos Kalinowski,
Clarissa Barbosa,
Allysson Allex Araújo,
Simone D. J. Barbosa,
Helio Lopes
Abstract:
[Context] Software Engineering (SE) education constantly seeks to bridge the gap between academic knowledge and industry demands, with active learning methods like Problem-Based Learning (PBL) gaining prominence. Despite these efforts, recent graduates struggle to align skills with industry needs. Recognizing the relevance of Industry-Academia Collaboration (IAC), Lean R&D has emerged as a successful agile-based research and development approach, emphasizing business and software development synergy. [Goal] This paper aims to extend Lean R&D with PBL principles, evaluating its application in an educational program designed by ExACTa PUC-Rio for Americanas S.A., a large Brazilian retail company. [Method] The educational program engaged 40 part-time students receiving lectures and mentoring while working on real problems, coordinators and mentors, and company stakeholders in industry projects. Empirical evaluation, through a case study approach, utilized structured questionnaires based on the Technology Acceptance Model (TAM). [Results] Stakeholders were satisfied with Lean R&D PBL for problem-solving. Students reported increased knowledge proficiency and perceived working on real problems as contributing the most to their learning. [Conclusion] This research contributes to academia by sharing Lean R&D PBL as an educational IAC approach. For industry, we discuss the implementation of this proposal in an IAC program that promotes workforce skill development and innovative solutions.
Submitted 22 July, 2024;
originally announced July 2024.
-
Investigating Benefits and Limitations of Migrating to a Micro-Frontends Architecture
Authors:
Fabio Antunes,
Maria Julia Dias Lima,
Marco Antônio Pereira Araújo,
Davide Taibi,
Marcos Kalinowski
Abstract:
[Context] The adoption of micro-frontends architectures has gained traction as a promising approach to enhance modularity, scalability, and maintainability of web applications. [Goal] The primary aim of this research is to investigate the benefits and limitations of migrating a real-world application to a micro-frontends architecture from the perspective of the developers. [Method] Based on the action research approach, after diagnosis and planning, we applied an intervention of migrating the target web application to a micro-frontends architecture. Thereafter, the migration was evaluated in a workshop involving the remaining developers responsible for maintaining the application. During the workshop, these developers were presented with the migrated architecture, conducted a simple maintenance task, discussed benefits and limitations in a focus group to gather insights, and answered a questionnaire on the acceptance of the technology. [Results] Developers' perceptions gathered during the focus group reinforce the benefits and limitations reported in the literature. Key benefits included enhanced flexibility in technology choices, scalability of development teams, and gradual migration of technologies. However, the increased complexity of the architecture raised concerns among developers, particularly in dependency and environment management, debugging, and integration testing. [Conclusions] While micro-frontends represent a promising technology, unresolved issues still limit their broader applicability. Developers generally perceived the architecture as useful and moderately easy to use but hesitated to adopt it.
Submitted 22 July, 2024;
originally announced July 2024.
-
Towards Effective Collaboration between Software Engineers and Data Scientists developing Machine Learning-Enabled Systems
Authors:
Gabriel Busquim,
Allysson Allex Araújo,
Maria Julia Lima,
Marcos Kalinowski
Abstract:
Incorporating Machine Learning (ML) into existing systems is a demand that has grown among several organizations. However, the development of ML-enabled systems encompasses several social and technical challenges, which must be addressed by actors with different fields of expertise working together. This paper has the objective of understanding how to enhance the collaboration between two key actors in building these systems: software engineers and data scientists. We conducted two focus group sessions with experienced data scientists and software engineers working on real-world ML-enabled systems to assess the relevance of different recommendations for specific technical tasks. Our research has found that collaboration between these actors is important for effectively developing ML-enabled systems, especially when defining data access and ML model deployment. Participants provided concrete examples of how recommendations depicted in the literature can benefit collaboration during different tasks. For example, defining clear responsibilities for each team member and creating concise documentation can improve communication and overall performance. Our study contributes to a better understanding of how to foster effective collaboration between software engineers and data scientists creating ML-enabled systems.
Submitted 22 July, 2024;
originally announced July 2024.
-
Achieving Observability on Fog Computing with the use of open-source tools
Authors:
Breno Costa,
Abhik Banerjee,
Prem Prakash Jayaraman,
Leonardo R. Carvalho,
João Bachiega Jr.,
Aleteia Araujo
Abstract:
Fog computing can provide computational resources and low-latency communication at the network edge. But with it come uncertainties that must be managed in order to guarantee Service Level Agreements. Service observability can help the environment better deal with uncertainties, delivering relevant and up-to-date information in a timely manner to support decision making. Observability is considered a superset of monitoring since it uses not only performance metrics, but also other instrumentation domains such as logs and traces. However, as Fog Computing is typically characterised by resource-constrained nodes and network uncertainties, increasing observability in fog can be risky due to the additional load injected into a restricted environment. There is no work in the literature that evaluated fog observability. In this paper, we first outline the challenges of achieving observability in a Fog environment, based on which we present a formal definition of fog observability. Subsequently, a real-world Fog Computing testbed running a smart city use case is deployed, and an empirical evaluation of fog observability using open-source tools is presented. The results show that under certain conditions, it is viable to provide observability in a Fog Computing environment using open-source tools, although it is necessary to control the overhead by modifying their default configuration according to the application characteristics.
Submitted 25 May, 2024;
originally announced July 2024.
-
UDON: Universal Dynamic Online distillatioN for generic image representations
Authors:
Nikolaos-Antonios Ypsilantis,
Kaifeng Chen,
André Araujo,
Ondřej Chum
Abstract:
Universal image representations are critical in enabling real-world fine-grained and instance-level recognition applications, where objects and entities from any domain must be identified at large scale. Despite recent advances, existing methods fail to capture important domain-specific knowledge, while also ignoring differences in data distribution across different domains. This leads to a large performance gap between efficient universal solutions and expensive approaches utilising a collection of specialist models, one for each domain. In this work, we make significant strides towards closing this gap, by introducing a new learning technique, dubbed UDON (Universal Dynamic Online DistillatioN). UDON employs multi-teacher distillation, where each teacher is specialized in one domain, to transfer detailed domain-specific knowledge into the student universal embedding. UDON's distillation approach is not only effective, but also very efficient, by sharing most model parameters between the student and all teachers, where all models are jointly trained in an online manner. UDON also comprises a sampling technique which adapts the training process to dynamically allocate batches to domains which are learned slower and require more frequent processing. This significantly boosts the learning of complex domains which are characterised by a large number of classes and long-tail distributions. With comprehensive experiments, we validate each component of UDON, and showcase significant improvements over the state of the art in the recent UnED benchmark. Code: https://github.com/nikosips/UDON .
Submitted 9 December, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Understanding and measuring software engineer behavior: What can we learn from the behavioral sciences?
Authors:
Allysson Allex Araújo,
Marcos Kalinowski,
Daniel Graziotin
Abstract:
This paper explores the intricate challenge of understanding and measuring software engineer behavior. More specifically, we revolve around a central question: How can we enhance our understanding of software engineer behavior? Grounded in the nuanced complexities addressed within Behavioral Software Engineering (BSE), we advocate for holistic methods that integrate quantitative measures, such as psychometric instruments, and qualitative data from diverse sources. Furthermore, we delve into the relevance of this challenge within national and international contexts, highlighting the increasing interest in understanding software engineer behavior. Real-world initiatives and academic endeavors are also examined to underscore the potential for advancing this research agenda and, consequently, refining software engineering practices based on behavioral aspects. Lastly, this paper addresses different ways to evaluate the progress of this challenge by leveraging methodological skills derived from behavioral sciences, ultimately contributing to a deeper understanding of software engineer behavior and software engineering practices.
Submitted 5 June, 2024;
originally announced June 2024.
-
Investigating the Online Recruitment and Selection Journey of Novice Software Engineers: Anti-patterns and Recommendations
Authors:
Miguel Setúbal,
Tayana Conte,
Marcos Kalinowski,
Allysson Allex Araújo
Abstract:
[Context] The growing software development market has increased the demand for qualified professionals in Software Engineering (SE). To this end, companies must enhance their Recruitment and Selection (R&S) processes to maintain high quality teams, including opening opportunities for beginners, such as trainees and interns. However, given the various judgments and sociotechnical factors involved, this complex process of R&S poses a challenge for recent graduates seeking to enter the market. [Objective] This paper aims to identify a set of anti-patterns and recommendations for early career SE professionals concerning R&S processes. [Method] Under an exploratory and qualitative methodological approach, we conducted six online Focus Groups with 18 recruiters with experience in R&S in the software industry. [Results] After completing our qualitative analysis, we identified 12 anti-patterns and 31 actionable recommendations regarding the hiring process focused on entry level SE professionals. The identified anti-patterns encompass behavioral and technical dimensions innate to R&S processes. [Conclusion] These findings provide a rich opportunity for reflection in the SE industry and offer valuable guidance for early-career candidates and organizations. From an academic perspective, this work also raises awareness of the intersection of Human Resources and SE, an area with considerable potential to be expanded in the context of cooperative and human aspects of SE.
Submitted 4 June, 2024;
originally announced June 2024.
-
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Authors:
Hanwen Jiang,
Arjun Karpur,
Bingyi Cao,
Qixing Huang,
Andre Araujo
Abstract:
The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue, the first learnable image matcher that is designed with generalization as a core principle. OmniGlue leverages broad knowledge from a vision foundation model to guide the feature matching process, boosting generalization to domains not seen at training time. Additionally, we propose a novel keypoint position-guided attention mechanism which disentangles spatial and appearance information, leading to enhanced matching descriptors. We perform comprehensive experiments on a suite of $7$ datasets with varied image domains, including scene-level, object-centric and aerial images. OmniGlue's novel components lead to relative gains on unseen domains of $20.9\%$ with respect to a directly comparable reference model, while also outperforming the recent LightGlue method by $9.5\%$ relatively. Code and model can be found at https://hwjiang1510.github.io/OmniGlue
Submitted 21 May, 2024;
originally announced May 2024.
-
XFeat: Accelerated Features for Lightweight Image Matching
Authors:
Guilherme Potje,
Felipe Cadar,
Andre Araujo,
Renato Martins,
Erickson R. Nascimento
Abstract:
We introduce a lightweight and accurate architecture for resource-efficient visual correspondence. Our method, dubbed XFeat (Accelerated Features), revisits fundamental design choices in convolutional neural networks for detecting, extracting, and matching local features. Our new model satisfies a critical need for fast and robust algorithms suitable to resource-limited devices. In particular, accurate image matching requires sufficiently large image resolutions - for this reason, we keep the resolution as large as possible while limiting the number of channels in the network. Besides, our model is designed to offer the choice of matching at the sparse or semi-dense levels, each of which may be more suitable for different downstream applications, such as visual navigation and augmented reality. Our model is the first to offer semi-dense matching efficiently, leveraging a novel match refinement module that relies on coarse local descriptors. XFeat is versatile and hardware-independent, surpassing current deep learning-based local features in speed (up to 5x faster) with comparable or better accuracy, proven in pose estimation and visual localization. We showcase it running in real-time on an inexpensive laptop CPU without specialized hardware optimizations. Code and weights are available at www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24.
Submitted 29 April, 2024;
originally announced April 2024.
-
HAMMR: HierArchical MultiModal React agents for generic VQA
Authors:
Lluis Castrejon,
Thomas Mensink,
Howard Zhou,
Vittorio Ferrari,
Andre Araujo,
Jasper Uijlings
Abstract:
Combining Large Language Models (LLMs) with external specialized tools (LLMs+tools) is a recent paradigm to solve multimodal tasks such as Visual Question Answering (VQA). While this approach was demonstrated to work well when optimized and evaluated for each individual benchmark, in practice it is crucial for the next generation of real-world AI systems to handle a broad range of multimodal problems. Therefore we pose the VQA problem from a unified perspective and evaluate a single system on a varied suite of VQA tasks including counting, spatial reasoning, OCR-based reasoning, visual pointing, external knowledge, and more. In this setting, we demonstrate that naively applying the LLM+tools approach using the combined set of all tools leads to poor results. This motivates us to introduce HAMMR: HierArchical MultiModal React. We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents. This enhances the compositionality of the LLM+tools approach, which we show to be critical for obtaining high accuracy on generic VQA. Concretely, on our generic VQA suite, HAMMR outperforms the naive LLM+tools approach by 19.5%. Additionally, HAMMR achieves state-of-the-art results on this task, outperforming the generic standalone PaLI-X VQA model by 5.0%.
Submitted 14 October, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Low-Resource Machine Translation through Retrieval-Augmented LLM Prompting: A Study on the Mambai Language
Authors:
Raphaël Merx,
Aso Mahmudi,
Katrina Langford,
Leo Alberto de Araujo,
Ekaterina Vylomova
Abstract:
This study explores the use of large language models (LLMs) for translating English into Mambai, a low-resource Austronesian language spoken in Timor-Leste, with approximately 200,000 native speakers. Leveraging a novel corpus derived from a Mambai language manual and additional sentences translated by a native speaker, we examine the efficacy of few-shot LLM prompting for machine translation (MT) in this low-resource context. Our methodology involves the strategic selection of parallel sentences and dictionary entries for prompting, aiming to enhance translation accuracy, using open-source and proprietary LLMs (LlaMa 2 70b, Mixtral 8x7B, GPT-4). We find that including dictionary entries in prompts and a mix of sentences retrieved through TF-IDF and semantic embeddings significantly improves translation quality. However, our findings reveal stark disparities in translation performance across test sets, with BLEU scores reaching as high as 21.2 on materials from the language manual, in contrast to a maximum of 4.4 on a test set provided by a native speaker. These results underscore the importance of diverse and representative corpora in assessing MT for low-resource languages. Our research provides insights into few-shot LLM prompting for low-resource MT, and makes available an initial corpus for the Mambai language.
Submitted 7 April, 2024;
originally announced April 2024.
-
PAL: Proxy-Guided Black-Box Attack on Large Language Models
Authors:
Chawin Sitawarin,
Norman Mu,
David Wagner,
Alexandre Araujo
Abstract:
Large Language Models (LLMs) have surged in popularity in recent months, but they have demonstrated concerning capabilities to generate harmful content when manipulated. While techniques like safety fine-tuning aim to minimize harmful use, recent works have shown that LLMs remain vulnerable to attacks that elicit toxic responses. In this work, we introduce the Proxy-Guided Attack on LLMs (PAL), the first optimization-based attack on LLMs in a black-box query-only setting. In particular, it relies on a surrogate model to guide the optimization and a sophisticated loss designed for real-world LLM APIs. Our attack achieves 84% attack success rate (ASR) on GPT-3.5-Turbo and 48% on Llama-2-7B, compared to 4% for the current state of the art. We also propose GCG++, an improvement to the GCG attack that reaches 94% ASR on white-box Llama-2-7B, and the Random-Search Attack on LLMs (RAL), a strong but simple baseline for query-based attacks. We believe the techniques proposed in this work will enable more comprehensive safety testing of LLMs and, in the long term, the development of better security guardrails. The code can be found at https://github.com/chawins/pal.
Submitted 14 February, 2024;
originally announced February 2024.
-
Can participation in a hackathon impact the motivation of software engineering students? A preliminary case study analysis
Authors:
Allysson Allex Araújo,
Marcos Kalinowski,
Maria Teresa Baldassarre
Abstract:
[Background] Hackathons are increasingly gaining prominence in Software Engineering (SE) education, lauded for their ability to elevate students' skill sets. [Objective] This paper investigates whether hackathons can impact the motivation of SE students. [Method] We conducted an evaluative case study assessing students' motivations before and after a hackathon, combining quantitative analysis using the Academic Motivation Scale (AMS) and qualitative coding of open-ended responses. [Results] Pre-hackathon findings reveal a diverse range of motivations with an overall acceptance, while post-hackathon responses highlight no statistically significant shift in participants' perceptions. Qualitative findings uncovered themes related to networking, team dynamics, and skill development. From a practical perspective, our findings highlight the potential of hackathons to impact participants' motivation. [Conclusion] While our study enhances the comprehension of hackathons as a motivational tool, it also underscores the need for further exploration of psychometric dimensions in SE educational research.
Submitted 7 February, 2024;
originally announced February 2024.
-
Reinforcement-learning robotic sailboats: simulator and preliminary results
Authors:
Eduardo Charles Vasconcellos,
Ronald M Sampaio,
André P D Araújo,
Esteban Walter Gonzales Clua,
Philippe Preux,
Raphael Guerra,
Luiz M G Gonçalves,
Luis Martí,
Hernan Lira,
Nayat Sanchez-Pi
Abstract:
This work focuses on the main challenges and problems in developing a virtual oceanic environment reproducing real experiments using Unmanned Surface Vehicles (USV) digital twins. We introduce the key features for building virtual worlds, considering using Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the simulation equations (physics and mathematics), their effective implementation, and how to include strategies for simulated control and perception (sensors) to be used with RL. We present the modeling, implementation steps, and challenges required to create a functional digital twin based on a real robotic sailing vessel. The application is immediate for developing navigation algorithms based on RL to be applied on real boats.
Submitted 16 January, 2024;
originally announced February 2024.
-
Novel Quadratic Constraints for Extending LipSDP beyond Slope-Restricted Activations
Authors:
Patricia Pauli,
Aaron Havens,
Alexandre Araujo,
Siddharth Garg,
Farshad Khorrami,
Frank Allgöwer,
Bin Hu
Abstract:
Recently, semidefinite programming (SDP) techniques have shown great promise in providing accurate Lipschitz bounds for neural networks. Specifically, the LipSDP approach (Fazlyab et al., 2019) has received much attention and provides the least conservative Lipschitz upper bounds that can be computed with polynomial time guarantees. However, one main restriction of LipSDP is that its formulation requires the activation functions to be slope-restricted on $[0,1]$, preventing its further use for more general activation functions such as GroupSort, MaxMin, and Householder. One can rewrite MaxMin activations for example as residual ReLU networks. However, a direct application of LipSDP to the resultant residual ReLU networks is conservative and even fails in recovering the well-known fact that the MaxMin activation is 1-Lipschitz. Our paper bridges this gap and extends LipSDP beyond slope-restricted activation functions. To this end, we provide novel quadratic constraints for GroupSort, MaxMin, and Householder activations via leveraging their underlying properties such as sum preservation. Our proposed analysis is general and provides a unified approach for estimating $\ell_2$ and $\ell_\infty$ Lipschitz bounds for a rich class of neural network architectures, including non-residual and residual neural networks and implicit models, with GroupSort, MaxMin, and Householder activations. Finally, we illustrate the utility of our approach with a variety of experiments and show that our proposed SDPs generate less conservative Lipschitz bounds in comparison to existing approaches.
Submitted 25 January, 2024;
originally announced January 2024.
-
Towards Real-World Focus Stacking with Deep Learning
Authors:
Alexandre Araujo,
Jean Ponce,
Julien Mairal
Abstract:
Focus stacking is widely used in micro, macro, and landscape photography to reconstruct all-in-focus images from multiple frames obtained with focus bracketing, that is, with shallow depth of field and different focus planes. Existing deep learning approaches to the underlying multi-focus image fusion problem have limited applicability to real-world imagery since they are designed for very short image sequences (two to four images), and are typically trained on small, low-resolution datasets either acquired by light-field cameras or generated synthetically. We introduce a new dataset consisting of 94 high-resolution bursts of raw images with focus bracketing, with pseudo ground truth computed from the data using state-of-the-art commercial software. This dataset is used to train the first deep learning algorithm for focus stacking capable of handling bursts of sufficient length for real-world applications. Qualitative experiments demonstrate that it is on par with existing commercial solutions in the long-burst, realistic regime while being significantly more tolerant to noise. The code and dataset are available at https://github.com/araujoalexandre/FocusStackingDataset.
Submitted 29 November, 2023;
originally announced November 2023.
-
Transformer-based Model for Oral Epithelial Dysplasia Segmentation
Authors:
Adam J Shephard,
Hanya Mahmood,
Shan E Ahmed Raza,
Anna Luiza Damaceno Araujo,
Alan Roger Santos-Silva,
Marcio Ajudarte Lopes,
Pablo Agustin Vargas,
Kris McCombe,
Stephanie Craig,
Jacqueline James,
Jill Brooks,
Paul Nankivell,
Hisham Mehanna,
Syed Ali Khurram,
Nasir M Rajpoot
Abstract:
Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. OED grading is subject to large inter/intra-rater variability, resulting in the under/over-treatment of patients. We developed a new Transformer-based pipeline to improve detection and segmentation of OED in haematoxylin and eosin (H&E) stained whole slide images (WSIs). Our model was trained on OED cases (n = 260) and controls (n = 105) collected using three different scanners, and validated on test data from three external centres in the United Kingdom and Brazil (n = 78). Our internal experiments yield a mean F1-score of 0.81 for OED segmentation, which reduced slightly to 0.71 on external testing, showing good generalisability, and gaining state-of-the-art results. This is the first externally validated study to use Transformers for segmentation in precancerous histology images. Our publicly available model shows great promise to be the first step of a fully-integrated pipeline, allowing earlier and more efficient OED diagnosis, ultimately benefiting patient outcomes.
Submitted 9 November, 2023;
originally announced November 2023.
-
LipSim: A Provably Robust Perceptual Similarity Metric
Authors:
Sara Ghazanfari,
Alexandre Araujo,
Prashanth Krishnamurthy,
Farshad Khorrami,
Siddharth Garg
Abstract:
Recent years have seen growing interest in developing and applying perceptual similarity metrics. Research has shown the superiority of perceptual metrics over pixel-wise metrics in aligning with human perception and serving as a proxy for the human visual system. On the other hand, as perceptual metrics rely on neural networks, there is a growing concern regarding their resilience, given the established vulnerability of neural networks to adversarial attacks. It is indeed logical to infer that perceptual metrics may inherit both the strengths and shortcomings of neural networks. In this work, we demonstrate the vulnerability of state-of-the-art perceptual similarity metrics based on an ensemble of ViT-based feature extractors to adversarial attacks. We then propose a framework to train a robust perceptual similarity metric called LipSim (Lipschitz Similarity Metric) with provable guarantees. By leveraging 1-Lipschitz neural networks as the backbone, LipSim provides guarded areas around each data point and certificates for all perturbations within an $\ell_2$ ball. Finally, a comprehensive set of experiments shows the performance of LipSim in terms of natural and certified scores and on the image retrieval application. The code is available at https://github.com/SaraGhazanfari/LipSim.
Submitted 29 March, 2024; v1 submitted 27 October, 2023;
originally announced October 2023.
-
Certification of Deep Learning Models for Medical Image Segmentation
Authors:
Othmane Laousy,
Alexandre Araujo,
Guillaume Chassagnon,
Nikos Paragios,
Marie-Pierre Revel,
Maria Vakalopoulou
Abstract:
In medical imaging, segmentation models have improved significantly over the past decade and are now used daily in clinical practice. However, similar to classification models, segmentation models are affected by adversarial attacks. In a safety-critical field like healthcare, certifying model predictions is of the utmost importance. Randomized smoothing has been introduced recently and provides a framework to certify models and obtain theoretical guarantees. In this paper, we present for the first time a certified segmentation baseline for medical imaging based on randomized smoothing and diffusion models. Our results show that leveraging the power of denoising diffusion probabilistic models helps us overcome the limits of randomized smoothing. We conduct extensive experiments on five public datasets of chest X-rays, skin lesions, and colonoscopies, and empirically show that we are able to maintain high certified Dice scores even for highly perturbed images. Our work represents the first attempt to certify medical image segmentation models, and we aspire for it to set a foundation for future benchmarks in this crucial and largely uncharted area.
Submitted 5 October, 2023;
originally announced October 2023.
-
The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing
Authors:
Blaise Delattre,
Alexandre Araujo,
Quentin Barthélemy,
Alexandre Allauzen
Abstract:
Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius in this context is a crucial indicator of the robustness of models. However, how can one design an efficient classifier with an associated certified radius? Randomized smoothing provides a promising framework by relying on noise injection into the inputs to obtain a smoothed and robust classifier. In this paper, we first show that the variance introduced by the Monte-Carlo sampling in the randomized smoothing procedure estimate closely interacts with two other important properties of the classifier, i.e., its Lipschitz constant and margin. More precisely, our work emphasizes the dual impact of the Lipschitz constant of the base classifier, on both the smoothed classifier and the empirical variance. To increase the certified robust radius, we introduce a different way to convert logits to probability vectors for the base classifier to leverage the variance-margin trade-off. We leverage the use of Bernstein's concentration inequality along with enhanced Lipschitz bounds for randomized smoothing. Experimental results show a significant improvement in certified accuracy compared to current state-of-the-art methods. Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
Submitted 18 March, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
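The certified radius mentioned in the abstract above can be illustrated with a minimal sketch. It uses the radius formula widely used in the randomized smoothing literature, R = sigma * Phi^{-1}(p_lower), where p_lower is a high-confidence lower bound on the probability that the base classifier returns the top class under Gaussian noise. As an assumption made purely for simplicity, the lower bound here comes from Hoeffding's inequality, not from the Bernstein-based procedure the paper proposes; the function name and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def certified_radius(n_top, n_samples, sigma, alpha=0.001):
    """Return a certified L2 radius, or None to abstain.

    n_top:     number of noisy samples assigned to the top class (hypothetical input)
    n_samples: total number of Monte-Carlo samples
    sigma:     standard deviation of the Gaussian noise
    alpha:     allowed failure probability of the confidence bound
    """
    p_hat = n_top / n_samples
    # One-sided Hoeffding lower bound on the true top-class probability.
    # This is a simple stand-in, not the paper's Bernstein-based bound.
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n_samples))
    if p_lower <= 0.5:
        # The top class cannot be certified at this confidence level.
        return None
    # Radius grows with the noise level and with the margin above 1/2.
    return sigma * NormalDist().inv_cdf(p_lower)
```

The sketch makes the role of sampling variance visible: with few samples the confidence term is large, p_lower shrinks, and the certified radius shrinks with it or the procedure abstains.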
-
Optimization of Rank Losses for Image Retrieval
Authors:
Elias Ramzi,
Nicolas Audebert,
Clément Rambour,
André Araujo,
Xavier Bitot,
Nicolas Thome
Abstract:
In image retrieval, standard evaluation metrics rely on score ranking, e.g., average precision (AP), recall at k (R@k), normalized discounted cumulative gain (NDCG). In this work we introduce a general framework for robust and decomposable rank losses optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomposability. Firstly, we propose a general surrogate for ranking operator, SupRank, that is amenable to stochastic gradient descent. It provides an upper bound for rank losses and ensures robust training. Secondly, we use a simple yet effective loss function to reduce the decomposability gap between the averaged batch approximation of ranking losses and their values on the whole training set. We apply our framework to two standard metrics for image retrieval: AP and R@k. Additionally, we apply our framework to hierarchical image retrieval. We introduce an extension of AP, the hierarchical average precision $\mathcal{H}$-AP, and optimize it as well as the NDCG. Finally, we create the first hierarchical landmarks retrieval dataset. We use a semi-automatic pipeline to create hierarchical labels, extending the large scale Google Landmarks v2 dataset. The hierarchical dataset is publicly available at https://github.com/cvdfoundation/google-landmark. Code will be released at https://github.com/elias-ramzi/SupRank.
Submitted 15 September, 2023;
originally announced September 2023.
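To make the non-differentiability mentioned in the abstract above concrete, here is a minimal sketch of the standard average precision (AP) for a single query. It is the textbook metric, not the SupRank surrogate proposed in the paper; the function name and argument layout are assumptions chosen for illustration.

```python
def average_precision(scores, labels):
    """Standard AP for one query.

    scores: similarity score of each database item to the query
    labels: 1 if the item is relevant to the query, else 0
    """
    # Sorting is the step that makes AP piecewise constant in the scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            # Precision at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

Because AP depends on the scores only through their ordering, perturbing a score without changing the ranking leaves AP unchanged, so its gradient is zero almost everywhere. This is why a differentiable surrogate is needed for training.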
-
Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations
Authors:
Nikolaos-Antonios Ypsilantis,
Kaifeng Chen,
Bingyi Cao,
Mário Lipovský,
Pelin Dogan-Schönberger,
Grzegorz Makosa,
Boris Bluntschli,
Mojtaba Seyedhosseini,
Ondřej Chum,
André Araujo
Abstract:
Fine-grained and instance-level recognition methods are commonly trained and evaluated on specific domains, in a model per domain scenario. Such an approach, however, is impractical in real large-scale applications. In this work, we address the problem of universal image embedding, where a single universal model is trained and used in multiple domains. First, we leverage existing domain-specific datasets to carefully construct a new large-scale public benchmark for the evaluation of universal image embeddings, with 241k query images, 1.4M index images and 2.8M training images across 8 different domains and 349k classes. We define suitable metrics, training and evaluation protocols to foster future research in this area. Second, we provide a comprehensive experimental evaluation on the new dataset, demonstrating that existing approaches and simplistic extensions lead to worse performance than an assembly of models trained for each domain separately. Finally, we conducted a public research competition on this topic, leveraging industrial datasets, which attracted the participation of more than 1k teams worldwide. This exercise generated many interesting research ideas and findings which we present in detail. Project webpage: https://cmp.felk.cvut.cz/univ_emb/
Submitted 4 September, 2023;
originally announced September 2023.
-
CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing
Authors:
Jonathan Cui,
David A. Araujo,
Suman Saha,
Md. Faisal Kabir
Abstract:
Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial interactions. Further, their token mixers only model 1- or 2-axis correlations, avoiding 3-axis spatial-channel mixing due to its computational demands. We therefore propose CS-Mixer, a hierarchical Vision MLP that learns dynamic low-rank transformations for spatial-channel mixing through cross-scale local and global aggregation. The proposed methodology achieves competitive results on popular image recognition benchmarks without incurring substantially more compute. Our largest model, CS-Mixer-L, reaches 83.2% top-1 accuracy on ImageNet-1k with 13.7 GFLOPs and 94 M parameters.
Submitted 14 January, 2024; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Global Features are All You Need for Image Retrieval and Reranking
Authors:
Shihao Shao,
Kaifeng Chen,
Arjun Karpur,
Qinghua Cui,
Andre Araujo,
Bingyi Cao
Abstract:
Image retrieval systems conventionally use a two-stage paradigm, leveraging global features for initial retrieval and local features for reranking. However, the scalability of this method is often limited due to the significant storage and computation cost incurred by local feature matching in the reranking stage. In this paper, we present SuperGlobal, a novel approach that exclusively employs global features for both stages, improving efficiency without sacrificing accuracy. SuperGlobal introduces key enhancements to the retrieval system, specifically focusing on the global feature extraction and reranking processes. For extraction, we identify sub-optimal performance when the widely-used ArcFace loss and Generalized Mean (GeM) pooling methods are combined and propose several new modules to improve GeM pooling. In the reranking stage, we introduce a novel method to update the global features of the query and top-ranked images by only considering feature refinement with a small set of images, thus being very compute and memory efficient. Our experiments demonstrate substantial improvements compared to the state of the art in standard benchmarks. Notably, on the Revisited Oxford+1M Hard dataset, our single-stage results improve by 7.1%, while our two-stage gain reaches 3.7% with a strong 64,865x speedup. Our two-stage system surpasses the current single-stage state-of-the-art by 16.3%, offering a scalable, accurate alternative for high-performing image retrieval systems with minimal time overhead. Code: https://github.com/ShihaoShao-GH/SuperGlobal.
Submitted 19 August, 2023; v1 submitted 14 August, 2023;
originally announced August 2023.
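For readers unfamiliar with the Generalized Mean (GeM) pooling named in the abstract above, the following minimal sketch shows the standard operator applied to one feature channel. It is the baseline GeM only, not the improved modules the paper proposes; the function name, the default exponent, and the clamping constant are assumptions.

```python
def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized mean of one channel's spatial activations.

    activations: flat list of non-negative activations for a single channel
    p:           pooling exponent (p = 1 gives average pooling)
    eps:         lower clamp so that fractional powers stay well defined
    """
    clamped = [max(a, eps) for a in activations]
    return (sum(a ** p for a in clamped) / len(clamped)) ** (1.0 / p)
```

With p = 1 the operator reduces to average pooling, and as p grows it approaches max pooling, so a single exponent interpolates between the two.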
-
Spintronics for image recognition: performance benchmarking via ultrafast data-driven simulations
Authors:
Anatole Moureaux,
Chloé Chopin,
Simon de Wergifosse,
Laurent Jacques,
Flavio Abreu Araujo
Abstract:
We present a demonstration of image classification using an echo-state network (ESN) relying on a single simulated spintronic nanostructure known as the vortex-based spin-torque oscillator (STVO) delayed in time. We employ an ultrafast data-driven simulation framework called the data-driven Thiele equation approach (DD-TEA) to simulate the STVO dynamics. This allows us to avoid the challenges associated with repeated experimental manipulation of such a nanostructured system. We showcase the versatility of our solution by successfully applying it to solve classification challenges with the MNIST, EMNIST-letters and Fashion MNIST datasets. Through our simulations, we determine that within an ESN with numerous learnable parameters the results obtained using the STVO dynamics as an activation function are comparable to the ones obtained with other conventional nonlinear activation functions like the ReLU and the sigmoid. While achieving state-of-the-art accuracy levels on the MNIST dataset, our model's performance on EMNIST-letters and Fashion MNIST is lower due to the relative simplicity of the system architecture and the increased complexity of the tasks. We expect that the DD-TEA framework will enable the exploration of deeper architectures, ultimately leading to improved classification accuracy.
Submitted 7 February, 2024; v1 submitted 10 August, 2023;
originally announced August 2023.
-
R-LPIPS: An Adversarially Robust Perceptual Similarity Metric
Authors:
Sara Ghazanfari,
Siddharth Garg,
Prashanth Krishnamurthy,
Farshad Khorrami,
Alexandre Araujo
Abstract:
Similarity metrics have played a significant role in computer vision to capture the underlying semantics of images. In recent years, advanced similarity metrics, such as the Learned Perceptual Image Patch Similarity (LPIPS), have emerged. These metrics leverage deep features extracted from trained neural networks and have demonstrated a remarkable ability to closely align with human perception when evaluating relative image similarity. However, it is now well-known that neural networks are susceptible to adversarial examples, i.e., small perturbations invisible to humans crafted to deliberately mislead the model. Consequently, the LPIPS metric is also sensitive to such adversarial examples. This susceptibility introduces significant security concerns, especially considering the widespread adoption of LPIPS in large-scale applications. In this paper, we propose the Robust Learned Perceptual Image Patch Similarity (R-LPIPS) metric, a new metric that leverages adversarially trained deep features. Through a comprehensive set of experiments, we demonstrate the superiority of R-LPIPS compared to the classical LPIPS metric. The code is available at https://github.com/SaraGhazanfari/R-LPIPS.
Submitted 31 July, 2023; v1 submitted 27 July, 2023;
originally announced July 2023.
-
Neuromorphic spintronics simulated using an unconventional data-driven Thiele equation approach
Authors:
Anatole Moureaux,
Simon de Wergifosse,
Chloé Chopin,
Flavio Abreu Araujo
Abstract:
In this study, we developed a quantitative description of the dynamics of spin-torque vortex nano-oscillators (STVOs) through an unconventional model based on the combination of the Thiele equation approach (TEA) and data from micromagnetic simulations (MMS). Solving the STVO dynamics with our analytical model allows us to accelerate the simulations by 9 orders of magnitude compared to MMS while reaching the same level of accuracy. Here, we showcase our model by simulating an STVO-based neural network for solving a classification task. We assess its performance with respect to the input signal current intensity and the level of noise that might affect such a system. Our approach is promising for accelerating the design of STVO-based neuromorphic computing devices while drastically decreasing its computational cost.
Submitted 18 July, 2023;
originally announced July 2023.
-
Towards Better Certified Segmentation via Diffusion Models
Authors:
Othmane Laousy,
Alexandre Araujo,
Guillaume Chassagnon,
Marie-Pierre Revel,
Siddharth Garg,
Farshad Khorrami,
Maria Vakalopoulou
Abstract:
The robustness of image segmentation has been an important research topic in the past few years as segmentation models have reached production-level accuracy. However, like classification models, segmentation models can be vulnerable to adversarial perturbations, which hinders their use in critical-decision systems like healthcare or autonomous driving. Recently, randomized smoothing has been proposed to certify segmentation predictions by adding Gaussian noise to the input to obtain theoretical guarantees. However, this method exhibits a trade-off between the amount of added noise and the level of certification achieved. In this paper, we address the problem of certifying segmentation prediction using a combination of randomized smoothing and diffusion models. Our experiments show that combining randomized smoothing and diffusion models significantly improves certified robustness, with results indicating a mean improvement of 21 points in accuracy compared to previous state-of-the-art methods on Pascal-Context and Cityscapes public datasets. Our method is independent of the selected segmentation model and does not need any additional specialized training procedure.
Submitted 16 June, 2023;
originally announced June 2023.
-
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
Authors:
Thomas Mensink,
Jasper Uijlings,
Lluis Castrejon,
Arushi Goel,
Felipe Cadar,
Howard Zhou,
Fei Sha,
André Araujo,
Vittorio Ferrari
Abstract:
We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evidence to support each answer. Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset. Moreover, we experimentally show that progress on answering our encyclopedic questions can be achieved by augmenting large models with a mechanism that retrieves relevant information from the knowledge base. An oracle experiment with perfect retrieval achieves 87.0% accuracy on the single-hop portion of our dataset, and an automatic retrieval-augmented prototype yields 48.8%. We believe that our dataset enables future research on retrieval-augmented vision+language models. It is available at https://github.com/google-research/google-research/tree/master/encyclopedic_vqa .
Submitted 24 July, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations
Authors:
Varun Jampani,
Kevis-Kokitsi Maninis,
Andreas Engelhardt,
Arjun Karpur,
Karen Truong,
Kyle Sargent,
Stefan Popov,
André Araujo,
Ricardo Martin-Brualla,
Kaushal Patel,
Daniel Vlasic,
Vittorio Ferrari,
Ameesh Makadia,
Ce Liu,
Yuanzhen Li,
Howard Zhou
Abstract:
Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search results with varying backgrounds and illuminations. To enable systematic research progress on 3D reconstruction from casual image captures, we propose NAVI: a new dataset of category-agnostic image collections of objects with high-quality 3D scans along with per-image 2D-3D alignments providing near-perfect GT camera parameters. These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps. We demonstrate the use of NAVI image collections on different problem settings and show that NAVI enables more thorough evaluations that were not possible with existing datasets. We believe NAVI is beneficial for systematic research progress on 3D reconstruction and correspondence estimation. Project page: https://navidataset.github.io
Submitted 13 October, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization
Authors:
Dror Aiger,
André Araujo,
Simon Lynen
Abstract:
Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become common belief that global embeddings are critical for said image-retrieval in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets. Code: https://github.com/google-research/google-research/tree/master/cann
Submitted 29 December, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability
Authors:
Haotian Xue,
Alexandre Araujo,
Bin Hu,
Yongxin Chen
Abstract:
Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this paper, we propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. By exploiting a gradient guided by a diffusion model, Diff-PGD ensures that adversarial samples remain close to the original data distribution while maintaining their effectiveness. Moreover, our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks. Compared with existing methods for generating natural-style adversarial samples, our framework enables the separation of optimizing adversarial loss from other surrogate losses (e.g., content/smoothness/style loss), making it more stable and controllable. Finally, we demonstrate that the samples generated using Diff-PGD have better transferability and anti-purification power than traditional gradient-based methods. Code will be released in https://github.com/xavihart/Diff-PGD
Submitted 17 January, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration
Authors:
Blaise Delattre,
Quentin Barthélemy,
Alexandre Araujo,
Alexandre Allauzen
Abstract:
Since the control of the Lipschitz constant has a great impact on the training stability, generalization, and robustness of neural networks, the estimation of this value is nowadays a real scientific challenge. In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the Power iteration. Called the Gram iteration, our approach exhibits a superlinear convergence. First, we show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability. Then, it proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches. Code is available at https://github.com/blaisedelattre/lip4conv.
Submitted 19 June, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
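The Gram iteration named in the abstract above can be sketched for a plain dense matrix as follows. This is a simplified illustration under stated assumptions: it works on a real matrix stored as nested lists and omits the circulant-matrix treatment of convolutional layers that the paper describes; the function name and the rescaling bookkeeping are choices made here, not taken from the authors' code. The idea is that repeatedly replacing G by G^T G squares its singular values, so the Frobenius norm of the iterate, taken to the appropriate root, is an upper bound on the spectral norm that tightens quickly.

```python
import math


def gram_iteration_bound(matrix, n_iter=6):
    """Upper bound on the spectral norm of a real matrix (list of rows)."""
    g = [list(row) for row in matrix]
    log_scale = 0.0  # accumulated log of the rescaling factors
    for _ in range(n_iter):
        fro = math.sqrt(sum(x * x for row in g for x in row))
        if fro == 0.0:
            return 0.0
        # Rescale to unit Frobenius norm to avoid overflow, and remember
        # the factor; it is doubled at every subsequent squaring step.
        log_scale = 2.0 * (log_scale + math.log(fro))
        g = [[x / fro for x in row] for row in g]
        # Gram step: G <- G^T G, which squares the singular values.
        cols = list(zip(*g))
        g = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    fro = math.sqrt(sum(x * x for row in g for x in row))
    # Undo the squarings: take the 2^n_iter-th root in log space.
    return math.exp((log_scale + math.log(fro)) / 2.0 ** n_iter)
```

After n_iter steps the result equals the Schatten p-norm of the input with p = 2^(n_iter + 1), which always dominates the largest singular value and approaches it as n_iter grows. Each step doubles p, which is where the fast convergence comes from.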
-
Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints
Authors:
Guilherme Potje,
Felipe Cadar,
Andre Araujo,
Renato Martins,
Erickson R. Nascimento
Abstract:
Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval. The core assumption of most methods is that images undergo affine transformations, disregarding more complicated effects such as non-rigid deformations. Furthermore, incipient works tailored for non-rigid correspondence still rely on keypoint detectors designed for rigid transformations, hindering performance due to the limitations of the detector. We propose DALF (Deformation-Aware Local Features), a novel deformation-aware network for jointly detecting and describing keypoints, to handle the challenging problem of matching deformable surfaces. All network components work cooperatively through a feature fusion approach that enforces the descriptors' distinctiveness and invariance. Experiments using real deforming objects showcase the superiority of our method, where it delivers 8% improvement in matching scores compared to the previous best results. Our approach also enhances the performance of two real-world applications: deformable object retrieval and non-rigid 3D surface registration. Code for training, inference, and applications is publicly available at https://verlab.dcc.ufmg.br/descriptors/dalf_cvpr23.
Submitted 2 April, 2023;
originally announced April 2023.