Search | arXiv e-print repository

arXiv:2504.20196 [pdf, other]

Prompting LLMs for Code Editing: Struggles and Remedies

Authors: Daye Nam, Ahmed Omran, Ambar Murillo, Saksham Thakur, Abner Araujo, Marcel Blistein, Alexander Frömmgen, Vincent Hellendoorn, Satish Chandra

Abstract: Large Language Models (LLMs) are rapidly transforming software engineering, with coding assistants embedded in an IDE becoming increasingly prevalent. While research has focused on improving the tools and understanding developer perceptions, a critical gap exists in understanding how developers actually use these tools in their daily workflows, and, crucially, where they struggle. This paper addre… ▽ More Large Language Models (LLMs) are rapidly transforming software engineering, with coding assistants embedded in an IDE becoming increasingly prevalent. While research has focused on improving the tools and understanding developer perceptions, a critical gap exists in understanding how developers actually use these tools in their daily workflows, and, crucially, where they struggle. This paper addresses part of this gap through a multi-phased investigation of developer interactions with an LLM-powered code editing and transformation feature, Transform Code, in an IDE widely used at Google. First, we analyze telemetry logs of the feature usage, revealing that frequent re-prompting can be an indicator of developer struggles with using Transform Code. Second, we conduct a qualitative analysis of unsatisfactory requests, identifying five key categories of information often missing from developer prompts. Finally, based on these findings, we propose and evaluate a tool, AutoPrompter, for automatically improving prompts by inferring missing information from the surrounding code context, leading to a 27% improvement in edit correctness on our test set. △ Less

Submitted 28 April, 2025; originally announced April 2025.

arXiv:2503.21581 [pdf, other]

AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion

Authors: Liuyue Xie, Jiancong Guo, Ozan Cakmakci, Andre Araujo, Laszlo A. Jeni, Zhiheng Jia

Abstract: Accurate camera calibration is a fundamental task for 3D perception, especially when dealing with real-world, in-the-wild environments where complex optical distortions are common. Existing methods often rely on pre-rectified images or calibration patterns, which limits their applicability and flexibility. In this work, we introduce a novel framework that addresses these challenges by jointly mode… ▽ More Accurate camera calibration is a fundamental task for 3D perception, especially when dealing with real-world, in-the-wild environments where complex optical distortions are common. Existing methods often rely on pre-rectified images or calibration patterns, which limits their applicability and flexibility. In this work, we introduce a novel framework that addresses these challenges by jointly modeling camera intrinsic and extrinsic parameters using a generic ray camera model. Unlike previous approaches, AlignDiff shifts focus from semantic to geometric features, enabling more accurate modeling of local distortions. We propose AlignDiff, a diffusion model conditioned on geometric priors, enabling the simultaneous estimation of camera distortions and scene geometry. To enhance distortion prediction, we incorporate edge-aware attention, focusing the model on geometric features around image edges, rather than semantic content. Furthermore, to enhance generalizability to real-world captures, we incorporate a large database of ray-traced lenses containing over three thousand samples. This database characterizes the distortion inherent in a diverse variety of lens forms. Our experiments demonstrate that the proposed method significantly reduces the angular error of estimated ray bundles by ~8.2 degrees and overall calibration accuracy, outperforming existing approaches on challenging, real-world datasets. △ Less

Submitted 27 March, 2025; originally announced March 2025.

arXiv:2503.09145 [pdf, other]

doi 10.1109/WiMob61911.2024.10770327

Charting 5G Energy Efficiency: Flexible Energy Modeling for Sustainable Networks

Authors: Anderson L de Araujo, Luc Deneire, Guillaume Urvoy-Keller, André L F de Almeida

Abstract: Despite the rapid advancements in 5G technology, accurately assessing the energy consumption of its Radio Access Networks (RANs) remains a challenge due to the diverse range of applicable technologies and implementation solutions. Designing a versatile power model for estimating the 5G RANspecific power consumption requires extensive data collection and experimental studies to capture the diverse… ▽ More Despite the rapid advancements in 5G technology, accurately assessing the energy consumption of its Radio Access Networks (RANs) remains a challenge due to the diverse range of applicable technologies and implementation solutions. Designing a versatile power model for estimating the 5G RANspecific power consumption requires extensive data collection and experimental studies to capture the diverse range of technologies and implementation solutions. The objective is to outline a versatile energy model capable of estimating RAN-specific energy consumption, encompassing both mobile terminals and the physical layer (PHY) of base stations. In this paper, we focus on the computational complexity of the baseband part of the model. The developed (part of the) model is compared with the estimation of the number of cycles (and energy per cycle) used by a specific implementation (here a Matlab code ported on an Intel target), enabling the assessment of the model with the estimation of energy consumed on a real target. The study's results show a good agreement between the model and the implementation, even if some parts need to be refined to take specific algorithms into account. The key contribution is the development of an initial flexible energy model with finer granularity, enabling comparisons of energy use across various applications and contexts, and offering a comprehensive tool for optimizing 5G network energy consumption. △ Less

Submitted 12 March, 2025; originally announced March 2025.

Journal ref: 20th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2024), Oct 2024, Paris, France. pp.721-726

arXiv:2502.07950 [pdf, other]

Embracing Experiential Learning: Hackathons as an Educational Strategy for Shaping Soft Skills in Software Engineering

Authors: Allysson Allex Araújo, Marcos Kalinowski, Maria Teresa Baldassarre

Abstract: In recent years, Software Engineering (SE) scholars and practitioners have emphasized the importance of integrating soft skills into SE education. However, teaching and learning soft skills are complex, as they cannot be acquired passively through raw knowledge acquisition. On the other hand, hackathons have attracted increasing attention due to their experiential, collaborative, and intensive nat… ▽ More In recent years, Software Engineering (SE) scholars and practitioners have emphasized the importance of integrating soft skills into SE education. However, teaching and learning soft skills are complex, as they cannot be acquired passively through raw knowledge acquisition. On the other hand, hackathons have attracted increasing attention due to their experiential, collaborative, and intensive nature, which certain tasks could be similar to real-world software development. This paper aims to discuss the idea of hackathons as an educational strategy for shaping SE students' soft skills in practice. Initially, we overview the existing literature on soft skills and hackathons in SE education. Then, we report preliminary empirical evidence from a seven-day hybrid hackathon involving 40 students. We assess how the hackathon experience promoted innovative and creative thinking, collaboration and teamwork, and knowledge application among participants through a structured questionnaire designed to evaluate students' self-awareness. Lastly, our findings and new directions are analyzed through the lens of Self-Determination Theory, which offers a psychological lens to understand human behavior. This paper contributes to academia by advocating the potential of hackathons in SE education and proposing concrete plans for future research within SDT. For industry, our discussion has implications around developing soft skills in future SE professionals, thereby enhancing their employability and readiness in the software market. △ Less

Submitted 11 February, 2025; originally announced February 2025.

Comments: To appear in Proceedings of the 2025 IEEE Conference on Software Engineering Education and Training, CSEE&T 2025

arXiv:2502.05108 [pdf, other]

Towards Emotionally Intelligent Software Engineers: Understanding Students' Self-Perceptions After a Cooperative Learning Experience

Authors: Allysson Allex Araújo, Marcos Kalinowski, Matheus Paixao, Daniel Graziotin

Abstract: [Background] Emotional Intelligence (EI) can impact Software Engineering (SE) outcomes through improved team communication, conflict resolution, and stress management. SE workers face increasing pressure to develop both technical and interpersonal skills, as modern software development emphasizes collaborative work and complex team interactions. Despite EI's documented importance in professional p… ▽ More [Background] Emotional Intelligence (EI) can impact Software Engineering (SE) outcomes through improved team communication, conflict resolution, and stress management. SE workers face increasing pressure to develop both technical and interpersonal skills, as modern software development emphasizes collaborative work and complex team interactions. Despite EI's documented importance in professional practice, SE education continues to prioritize technical knowledge over emotional and social competencies. [Objective] This paper analyzes SE students' self-perceptions of their EI after a two-month cooperative learning project, using Mayer and Salovey's four-ability model to examine how students handle emotions in collaborative development. [Method] We conducted a case study with 29 SE students organized into four squads within a project-based learning course, collecting data through questionnaires and focus groups that included brainwriting and sharing circles, then analyzing the data using descriptive statistics and open coding. [Results] Students demonstrated stronger abilities in managing their own emotions compared to interpreting others' emotional states. Despite limited formal EI training, they developed informal strategies for emotional management, including structured planning and peer support networks, which they connected to improved productivity and conflict resolution. [Conclusion] This study shows how SE students perceive EI in a collaborative learning context and provides evidence-based insights into the important role of emotional competencies in SE education. △ Less

Submitted 7 February, 2025; originally announced February 2025.

Comments: 12 pages, 4 figures. To appear in Proceedings of the 2025 IEEE/ACM 17th International Conference on Cooperative and Human Aspects of Software Engineering, CHASE 2025

arXiv:2501.11431 [pdf, other]

Blockchain Developer Experience: A Multivocal Literature Review

Authors: P. Soares, A. A. Araujo, G. Destefanis, R. Neykova, R. Saraiva, J. Souza

Abstract: The rise of smart contracts has expanded blockchain's capabilities, enabling the development of innovative decentralized applications (dApps). However, this advancement brings its own challenges, including the management of distributed architectures and immutable data. Addressing these complexities requires a specialized approach to software engineering, with blockchain-oriented practices emerging… ▽ More The rise of smart contracts has expanded blockchain's capabilities, enabling the development of innovative decentralized applications (dApps). However, this advancement brings its own challenges, including the management of distributed architectures and immutable data. Addressing these complexities requires a specialized approach to software engineering, with blockchain-oriented practices emerging to support development in this domain. Developer Experience (DEx) is central to this effort, focusing on the usability, productivity, and overall satisfaction of tools and frameworks from the engineers' perspective. Despite its importance, research on Blockchain Developer Experience (BcDEx) remains limited, with no systematic mapping of academic and industry efforts. To bridge this gap, we conducted a Multivocal Literature Review analyzing 62 to understand the distribution of BcDEx sources, practical implementations, and their impact. Our findings revealed that academic focus on BcDEx is limited compared to the coverage in gray literature, which primarily includes blogs (41.8%) and corporate sources (21.8%). Particularly, development efficiency, multi-network support, and usability are the most addressed aspects in tools and frameworks. In addition, we found that BcDEx is being shaped through five key perspectives: complexity abstraction, adoption facilitation, productivity enhancement, developer education, and BcDEx evaluation. △ Less

Submitted 20 January, 2025; originally announced January 2025.

Comments: 12 pages, 5 figures, 18th Conference on Cooperative and Human Aspects of Software Engineering (CHASE)

arXiv:2411.17753 [pdf, other]

Observability in Fog Computing

Authors: Aleteia Araujo, Breno Costa, Joao Bachiega Jr, Leonardo R. Carvalho, Rajkumar Buyya

Abstract: Fog Computing provides computational resources close to the end user, supporting low-latency and high-bandwidth communications. It supports IoT applications, enabling real-time data processing, analytics, and decision-making at the edge of the network. However, the high distribution of its constituent nodes and resource-restricted devices interconnected by heterogeneous and unreliable networks mak… ▽ More Fog Computing provides computational resources close to the end user, supporting low-latency and high-bandwidth communications. It supports IoT applications, enabling real-time data processing, analytics, and decision-making at the edge of the network. However, the high distribution of its constituent nodes and resource-restricted devices interconnected by heterogeneous and unreliable networks makes it challenging to execute service maintenance and troubleshooting, increasing the time to restore the application after failures and not guaranteeing the service level agreements. In such a scenario, increasing the observability of Fog applications and services may speed up troubleshooting and increase their availability. An observability system is a data-intensive service, and Fog Computing could have its nodes and channels saturated with an additional load. In this work, we detail the three pillars of observability (metrics, log, and traces), discuss the challenges, and clarify the approaches for increasing the observability of services in Fog environments. Furthermore, the system architecture that supports observability in Fog, related tools, and technologies are presented, providing a comprehensive discussion on this subject. An example of a solution shows how a real-world application can benefit from increased observability in this environment. Finally, there is a discussion about the future directions of Fog observability. △ Less

Submitted 25 November, 2024; originally announced November 2024.

arXiv:2410.19439 [pdf, other]

Non-Dominated Sorting Bidirectional Differential Coevolution

Authors: Cicero S. R. Mendes, Aluizio F. R. Araújo, Lucas R. C. Farias

Abstract: Constrained multiobjective optimization problems (CMOPs) are commonly found in real-world applications. CMOP is a complex problem that needs to satisfy a set of equality or inequality constraints. This paper proposes a variant of the bidirectional coevolution algorithm (BiCo) with differential evolution (DE). The novelties in the model include the DE differential mutation and crossover operators a… ▽ More Constrained multiobjective optimization problems (CMOPs) are commonly found in real-world applications. CMOP is a complex problem that needs to satisfy a set of equality or inequality constraints. This paper proposes a variant of the bidirectional coevolution algorithm (BiCo) with differential evolution (DE). The novelties in the model include the DE differential mutation and crossover operators as the main search engine and a non-dominated sorting selection scheme. Experimental results on two benchmark test suites and eight real-world CMOPs suggested that the proposed model reached better overall performance than the original model. △ Less

Submitted 25 October, 2024; originally announced October 2024.

Comments: 6 pages

arXiv:2410.19203 [pdf, other]

An Inverse Modeling Constrained Multi-Objective Evolutionary Algorithm Based on Decomposition

Authors: Lucas R. C. Farias, Aluizio F. R. Araújo

Abstract: This paper introduces the inverse modeling constrained multi-objective evolutionary algorithm based on decomposition (IM-C-MOEA/D) for addressing constrained real-world optimization problems. Our research builds upon the advancements made in evolutionary computing-based inverse modeling, and it strategically bridges the gaps in applying inverse models based on decomposition to problem domains with… ▽ More This paper introduces the inverse modeling constrained multi-objective evolutionary algorithm based on decomposition (IM-C-MOEA/D) for addressing constrained real-world optimization problems. Our research builds upon the advancements made in evolutionary computing-based inverse modeling, and it strategically bridges the gaps in applying inverse models based on decomposition to problem domains with constraints. The proposed approach is experimentally evaluated on diverse real-world problems (RWMOP1-35), showing superior performance to state-of-the-art constrained multi-objective evolutionary algorithms (CMOEAs). The experimental results highlight the robustness of the algorithm and its applicability in real-world constrained optimization scenarios. △ Less

Submitted 24 October, 2024; originally announced October 2024.

Comments: 6 pages, 1 figure, 1 algorithm, and 2 tables

arXiv:2410.16512 [pdf, other]

TIPS: Text-Image Pretraining with Spatial awareness

Authors: Kevis-Kokitsi Maninis, Kaifeng Chen, Soham Ghosh, Arjun Karpur, Koert Chen, Ye Xia, Bingyi Cao, Daniel Salz, Guangxing Han, Jan Dlabal, Dan Gnanapragasam, Mojtaba Seyedhosseini, Howard Zhou, Andre Araujo

Abstract: While image-text representation learning has become very popular in recent years, existing models tend to lack spatial awareness and have limited direct applicability for dense understanding tasks. For this reason, self-supervised image-only pretraining is still the go-to method for many dense vision applications (e.g. depth estimation, semantic segmentation), despite the lack of explicit supervis… ▽ More While image-text representation learning has become very popular in recent years, existing models tend to lack spatial awareness and have limited direct applicability for dense understanding tasks. For this reason, self-supervised image-only pretraining is still the go-to method for many dense vision applications (e.g. depth estimation, semantic segmentation), despite the lack of explicit supervisory signals. In this paper, we close this gap between image-text and self-supervised learning, by proposing a novel general-purpose image-text model, which can be effectively used off the shelf for dense and global vision tasks. Our method, which we refer to as Text-Image Pretraining with Spatial awareness (TIPS), leverages two simple and effective insights. First, on textual supervision: we reveal that replacing noisy web image captions by synthetically generated textual descriptions boosts dense understanding performance significantly, due to a much richer signal for learning spatially aware representations. We propose an adapted training method that combines noisy and synthetic captions, resulting in improvements across both dense and global understanding tasks. Second, on the learning technique: we propose to combine contrastive image-text learning with self-supervised masked image modeling, to encourage spatial coherence, unlocking substantial enhancements for downstream applications. Building on these two ideas, we scale our model using the transformer architecture, trained on a curated set of public images. Our experiments are conducted on 8 tasks involving 16 datasets in total, demonstrating strong off-the-shelf performance on both dense and global understanding, for several image-only and image-text tasks. Code and models are released at https://github.com/google-deepmind/tips. △ Less

Submitted 7 March, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

Comments: ICLR2025 camera-ready + appendix

arXiv:2410.02080 [pdf, ps, other]

EMMA: Efficient Visual Alignment in Multi-Modal LLMs

Authors: Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami

Abstract: Multi-modal Large Language Models (MLLMs) have recently exhibited impressive general-purpose capabilities by leveraging vision foundation models to encode the core concepts of images into representations. These are then combined with instructions and processed by the language model to generate high-quality responses. Despite significant progress in enhancing the language component, challenges pers… ▽ More Multi-modal Large Language Models (MLLMs) have recently exhibited impressive general-purpose capabilities by leveraging vision foundation models to encode the core concepts of images into representations. These are then combined with instructions and processed by the language model to generate high-quality responses. Despite significant progress in enhancing the language component, challenges persist in optimally fusing visual encodings within the language model for task-specific adaptability. Recent research has focused on improving this fusion through modality adaptation modules but at the cost of significantly increased model complexity and training data needs. In this paper, we propose EMMA (Efficient Multi-Modal Adaptation), a lightweight cross-modality module designed to efficiently fuse visual and textual encodings, generating instruction-aware visual representations for the language model. Our key contributions include: (1) an efficient early fusion mechanism that integrates vision and language representations with minimal added parameters (less than 0.2% increase in model size), (2) an in-depth interpretability analysis that sheds light on the internal mechanisms of the proposed method; (3) comprehensive experiments that demonstrate notable improvements on both specialized and general benchmarks for MLLMs. Empirical results show that EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations. Our code is available at https://github.com/SaraGhazanfari/EMMA △ Less

Submitted 10 June, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.07926 [pdf, ps, other]

Mobile App Security Trends and Topics: An Examination of Questions From Stack Overflow

Authors: Timothy Huo, Ana Catarina Araújo, Jake Imanaka, Anthony Peruma, Rick Kazman

Abstract: The widespread use of smartphones and tablets has made society heavily reliant on mobile applications (apps) for accessing various resources and services. These apps often handle sensitive personal, financial, and health data, making app security a critical concern for developers. While there is extensive research on software security topics like malware and vulnerabilities, less is known about th… ▽ More The widespread use of smartphones and tablets has made society heavily reliant on mobile applications (apps) for accessing various resources and services. These apps often handle sensitive personal, financial, and health data, making app security a critical concern for developers. While there is extensive research on software security topics like malware and vulnerabilities, less is known about the practical security challenges mobile app developers face and the guidance they seek. In this study, we mine Stack Overflow for questions on mobile app security, which we analyze using quantitative and qualitative techniques. The findings reveal that Stack Overflow is a major resource for developers seeking help with mobile app security, especially for Android apps, and identifies seven main categories of security questions: Secured Communications, Database, App Distribution Service, Encryption, Permissions, File-Specific, and General Security. Insights from this research can inform the development of tools, techniques, and resources by the research and vendor community to better support developers in securing their mobile apps. △ Less

Submitted 14 September, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

Comments: This paper was accepted for publication at the 58th Hawaii International Conference on System Sciences (HICSS) - Software Technology Track

arXiv:2408.16110 [pdf, other]

doi 10.1016/j.entcom.2024.100853

Micro and macro facial expressions by driven animations in realistic Virtual Humans

Authors: Rubens Halbig Montanha, Giovana Nascimento Raupp, Ana Carolina Policarpo Schmitt, Victor Flávio de Andrade Araujo, Soraia Raupp Musse

Abstract: Computer Graphics (CG) advancements have allowed the creation of more realistic Virtual Humans (VH) through modern techniques for animating the VH body and face, thereby affecting perception. From traditional methods, including blend shapes, to driven animations using facial and body tracking, these advancements can potentially enhance the perception of comfort and realism in relation to VHs. Prev… ▽ More Computer Graphics (CG) advancements have allowed the creation of more realistic Virtual Humans (VH) through modern techniques for animating the VH body and face, thereby affecting perception. From traditional methods, including blend shapes, to driven animations using facial and body tracking, these advancements can potentially enhance the perception of comfort and realism in relation to VHs. Previously, Psychology studied facial movements in humans, with some works separating expressions into macro and micro expressions. Also, some previous CG studies have analyzed how macro and micro expressions are perceived, replicating psychology studies in VHs, encompassing studies with realistic and cartoon VHs, and exploring different VH technologies. However, instead of using facial tracking animation methods, these previous studies animated the VHs using blendshapes interpolation. To understand how the facial tracking technique alters the perception of VHs, this paper extends the study to macro and micro expressions, employing two datasets to transfer real facial expressions to VHs and analyze how their expressions are perceived. Our findings suggest that transferring facial expressions from real actors to VHs significantly diminishes the accuracy of emotion perception compared to VH facial animations created by artists. △ Less

Submitted 28 August, 2024; originally announced August 2024.

Journal ref: Entertainment Computing, Volume 52, January 2025

arXiv:2408.09032 [pdf, other]

A Developer-Centric Study Exploring Mobile Application Security Practices and Challenges

Authors: Anthony Peruma, Timothy Huo, Ana Catarina Araújo, Jake Imanaka, Rick Kazman

Abstract: Mobile applications (apps) have become an essential part of everyday life, offering convenient access to services such as banking, healthcare, and shopping. With these apps handling sensitive personal and financial data, ensuring their security is paramount. While previous research has explored mobile app developer practices, there is limited knowledge about the common practices and challenges tha… ▽ More Mobile applications (apps) have become an essential part of everyday life, offering convenient access to services such as banking, healthcare, and shopping. With these apps handling sensitive personal and financial data, ensuring their security is paramount. While previous research has explored mobile app developer practices, there is limited knowledge about the common practices and challenges that developers face in securing their apps. Our study addresses this need through a global survey of 137 experienced mobile app developers, providing a developer-centric view of mobile app security. Our findings show that developers place high importance on security, frequently implementing features such as authentication and secure storage. They face challenges with managing vulnerabilities, permissions, and privacy concerns, and often rely on resources like Stack Overflow for help. Many developers find that existing learning materials do not adequately prepare them to build secure apps and provide recommendations, such as following best practices and integrating security at the beginning of the development process. We envision our findings leading to improved security practices, better-designed tools and resources, and more effective training programs. △ Less

Submitted 16 August, 2024; originally announced August 2024.

Comments: Accepted: International Conference on Software Maintenance and Evolution (ICSME 2024); Industry Track

arXiv:2407.21127 [pdf]

Teaching Survey Research in Software Engineering

Authors: Marcos Kalinowski, Allysson Allex Araújo, Daniel Mendez

Abstract: In this chapter, we provide advice on how to effectively teach survey research based on lessons learned from several international teaching experiences on the topic and from conducting large-scale surveys published at various scientific conferences and journals. First, we provide teachers with a potential syllabus for teaching survey research, including learning objectives, lectures, and examples… ▽ More In this chapter, we provide advice on how to effectively teach survey research based on lessons learned from several international teaching experiences on the topic and from conducting large-scale surveys published at various scientific conferences and journals. First, we provide teachers with a potential syllabus for teaching survey research, including learning objectives, lectures, and examples of practical assignments. Thereafter, we provide actionable advice on how to teach the topics related to each learning objective, including survey design, sampling, data collection, statistical and qualitative analysis, threats to validity and reliability, and ethical considerations. The chapter is complemented by online teaching resources, including slides covering an entire course. △ Less

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.21121 [pdf, other]

Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks

Authors: Tiago Novello, Diana Aldana, Andre Araujo, Luiz Velho

Abstract: Sinusoidal neural networks have been shown effective as implicit neural representations (INRs) of low-dimensional signals, due to their smoothness and high representation capacity. However, initializing and training them remain empirical tasks which lack on deeper understanding to guide the learning process. To fill this gap, our work introduces a theoretical framework that explains the capacity p… ▽ More Sinusoidal neural networks have been shown effective as implicit neural representations (INRs) of low-dimensional signals, due to their smoothness and high representation capacity. However, initializing and training them remain empirical tasks which lack on deeper understanding to guide the learning process. To fill this gap, our work introduces a theoretical framework that explains the capacity property of sinusoidal networks and offers robust control mechanisms for initialization and training. Our analysis is based on a novel amplitude-phase expansion of the sinusoidal multilayer perceptron, showing how its layer compositions produce a large number of new frequencies expressed as integer combinations of the input frequencies. This relationship can be directly used to initialize the input neurons, as a form of spectral sampling, and to bound the network's spectrum while training. Our method, referred to as TUNER (TUNing sinusoidal nEtwoRks), greatly improves the stability and convergence of sinusoidal INR training, leading to detailed reconstructions, while preventing overfitting. △ Less

Submitted 3 April, 2025; v1 submitted 30 July, 2024; originally announced July 2024.

Comments: CVPR2025 camera-ready + supplementary material

arXiv:2407.15982 [pdf]

Agile Minds, Innovative Solutions, and Industry-Academia Collaboration: Lean R&D Meets Problem-Based Learning in Software Engineering Education

Authors: Lucas Romao, Marcos Kalinowski, Clarissa Barbosa, Allysson Allex Araújo, Simone D. J. Barbosa, Helio Lopes

Abstract: [Context] Software Engineering (SE) education constantly seeks to bridge the gap between academic knowledge and industry demands, with active learning methods like Problem-Based Learning (PBL) gaining prominence. Despite these efforts, recent graduates struggle to align skills with industry needs. Recognizing the relevance of Industry-Academia Collaboration (IAC), Lean R&D has emerged as a success… ▽ More [Context] Software Engineering (SE) education constantly seeks to bridge the gap between academic knowledge and industry demands, with active learning methods like Problem-Based Learning (PBL) gaining prominence. Despite these efforts, recent graduates struggle to align skills with industry needs. Recognizing the relevance of Industry-Academia Collaboration (IAC), Lean R&D has emerged as a successful agile-based research and development approach, emphasizing business and software development synergy. [Goal] This paper aims to extend Lean R&D with PBL principles, evaluating its application in an educational program designed by ExACTa PUC- Rio for Americanas S.A., a large Brazilian retail company. [Method] The educational program engaged 40 part-time students receiving lectures and mentoring while working on real problems, coordinators and mentors, and company stakeholders in industry projects. Empirical evaluation, through a case study approach, utilized structured questionnaires based on the Technology Acceptance Model (TAM). [Results] Stakeholders were satisfied with Lean R&D PBL for problem-solving. Students reported increased knowledge proficiency and perceived working on real problems as contributing the most to their learning. [Conclusion] This research contributes to academia by sharing Lean R&D PBL as an educational IAC approach. For industry, we discuss the implementation of this proposal in an IAC program that promotes workforce skill development and innovative solutions. △ Less

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15829 [pdf]

Investigating Benefits and Limitations of Migrating to a Micro-Frontends Architecture

Authors: Fabio Antunes, Maria Julia Dias Lima, Marco Antônio Pereira Araújo, Davide Taibi, Marcos Kalinowski

Abstract: [Context] The adoption of micro-frontends architectures has gained traction as a promising approach to enhance modularity, scalability, and maintainability of web applications. [Goal] The primary aim of this research is to investigate the benefits and limitations of migrating a real-world application to a micro-frontends architecture from the perspective of the developers. [Method] Based on the ac… ▽ More [Context] The adoption of micro-frontends architectures has gained traction as a promising approach to enhance modularity, scalability, and maintainability of web applications. [Goal] The primary aim of this research is to investigate the benefits and limitations of migrating a real-world application to a micro-frontends architecture from the perspective of the developers. [Method] Based on the action research approach, after diagnosis and planning, we applied an intervention of migrating the target web application to a micro-frontends architecture. Thereafter, the migration was evaluated in a workshop involving the remaining developers responsible for maintaining the application. During the workshop, these developers were presented with the migrated architecture, conducted a simple maintenance task, discussed benefits and limitations in a focus group to gather insights, and answered a questionnaire on the acceptance of the technology. [Results] Developers' perceptions gathered during the focus group reinforce the benefits and limitations reported in the literature. Key benefits included enhanced flexibility in technology choices, scalability of development teams, and gradual migration of technologies. However, the increased complexity of the architecture raised concerns among developers, particularly in dependency and environment management, debugging, and integration testing. [Conclusions] While micro-frontends represent a promising technology, unresolved issues still limit their broader applicability. Developers generally perceived the architecture as useful and moderately easy to use but hesitated to adopt it. △ Less

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15821 [pdf]

Towards Effective Collaboration between Software Engineers and Data Scientists developing Machine Learning-Enabled Systems

Authors: Gabriel Busquim, Allysson Allex Araújo, Maria Julia Lima, Marcos Kalinowski

Abstract: Incorporating Machine Learning (ML) into existing systems is a demand that has grown among several organizations. However, the development of ML-enabled systems encompasses several social and technical challenges, which must be addressed by actors with different fields of expertise working together. This paper has the objective of understanding how to enhance the collaboration between two key acto… ▽ More Incorporating Machine Learning (ML) into existing systems is a demand that has grown among several organizations. However, the development of ML-enabled systems encompasses several social and technical challenges, which must be addressed by actors with different fields of expertise working together. This paper has the objective of understanding how to enhance the collaboration between two key actors in building these systems: software engineers and data scientists. We conducted two focus group sessions with experienced data scientists and software engineers working on real-world ML-enabled systems to assess the relevance of different recommendations for specific technical tasks. Our research has found that collaboration between these actors is important for effectively developing ML-enabled systems, especially when defining data access and ML model deployment. Participants provided concrete examples of how recommendations depicted in the literature can benefit collaboration during different tasks. For example, defining clear responsibilities for each team member and creating concise documentation can improve communication and overall performance. Our study contributes to a better understanding of how to foster effective collaboration between software engineers and data scientists creating ML-enabled systems. △ Less

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.00035 [pdf, other]

doi 10.1007/978-3-031-63992-0_21

Achieving Observability on Fog Computing with the use of open-source tools

Authors: Breno Costa, Abhik Banerjee, Prem Prakash Jayaraman, Leonardo R. Carvalho, João Bachiega Jr., Aleteia Araujo

Abstract: Fog computing can provide computational resources and low-latency communication at the network edge. But with it comes uncertainties that must be managed in order to guarantee Service Level Agreements. Service observability can help the environment better deal with uncertainties, delivering relevant and up-to-date information in a timely manner to support decision making. Observability is consider… ▽ More Fog computing can provide computational resources and low-latency communication at the network edge. But with it comes uncertainties that must be managed in order to guarantee Service Level Agreements. Service observability can help the environment better deal with uncertainties, delivering relevant and up-to-date information in a timely manner to support decision making. Observability is considered a superset of monitoring since it uses not only performance metrics, but also other instrumentation domains such as logs and traces. However, as Fog Computing is typically characterised by resource-constrained nodes and network uncertainties, increasing observability in fog can be risky due to the additional load injected into a restricted environment. There is no work in the literature that evaluated fog observability. In this paper, we first outline the challenges of achieving observability in a Fog environment, based on which we present a formal definition of fog observability. Subsequently, a real-world Fog Computing testbed running a smart city use case is deployed, and an empirical evaluation of fog observability using open-source tools is presented. The results show that under certain conditions, it is viable to provide observability in a Fog Computing environment using open-source tools, although it is necessary to control the overhead modifying their default configuration according to the application characteristics. △ Less

Submitted 25 May, 2024; originally announced July 2024.

Comments: Paper presented at Mobiquitous 2023

arXiv:2406.08332 [pdf, other]

UDON: Universal Dynamic Online distillatioN for generic image representations

Authors: Nikolaos-Antonios Ypsilantis, Kaifeng Chen, André Araujo, Ondřej Chum

Abstract: Universal image representations are critical in enabling real-world fine-grained and instance-level recognition applications, where objects and entities from any domain must be identified at large scale. Despite recent advances, existing methods fail to capture important domain-specific knowledge, while also ignoring differences in data distribution across different domains. This leads to a large… ▽ More Universal image representations are critical in enabling real-world fine-grained and instance-level recognition applications, where objects and entities from any domain must be identified at large scale. Despite recent advances, existing methods fail to capture important domain-specific knowledge, while also ignoring differences in data distribution across different domains. This leads to a large performance gap between efficient universal solutions and expensive approaches utilising a collection of specialist models, one for each domain. In this work, we make significant strides towards closing this gap, by introducing a new learning technique, dubbed UDON (Universal Dynamic Online DistillatioN). UDON employs multi-teacher distillation, where each teacher is specialized in one domain, to transfer detailed domain-specific knowledge into the student universal embedding. UDON's distillation approach is not only effective, but also very efficient, by sharing most model parameters between the student and all teachers, where all models are jointly trained in an online manner. UDON also comprises a sampling technique which adapts the training process to dynamically allocate batches to domains which are learned slower and require more frequent processing. This boosts significantly the learning of complex domains which are characterised by a large number of classes and long-tail distributions. With comprehensive experiments, we validate each component of UDON, and showcase significant improvements over the state of the art in the recent UnED benchmark. Code: https://github.com/nikosips/UDON . △ Less

Submitted 9 December, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: NeurIPS 2024 accepted

arXiv:2406.03342 [pdf]

doi 10.5753/washes.2024.2986

Understanding and measuring software engineer behavior: What can we learn from the behavioral sciences?

Authors: Allysson Allex Araújo, Marcos Kalinowski, Daniel Graziotin

Abstract: This paper explores the intricate challenge of understanding and measuring software engineer behavior. More specifically, we revolve around a central question: How can we enhance our understanding of software engineer behavior? Grounded in the nuanced complexities addressed within Behavioral Software Engineering (BSE), we advocate for holistic methods that integrate quantitative measures, such as… ▽ More This paper explores the intricate challenge of understanding and measuring software engineer behavior. More specifically, we revolve around a central question: How can we enhance our understanding of software engineer behavior? Grounded in the nuanced complexities addressed within Behavioral Software Engineering (BSE), we advocate for holistic methods that integrate quantitative measures, such as psychometric instruments, and qualitative data from diverse sources. Furthermore, we delve into the relevance of this challenge within national and international contexts, highlighting the increasing interest in understanding software engineer behavior. Real-world initiatives and academic endeavors are also examined to underscore the potential for advancing this research agenda and, consequently, refining software engineering practices based on behavioral aspects. Lastly, this paper addresses different ways to evaluate the progress of this challenge by leveraging methodological skills derived from behavioral sciences, ultimately contributing to a deeper understanding of software engineer behavior and software engineering practices. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 6

arXiv:2406.02487 [pdf]

Investigating the Online Recruitment and Selection Journey of Novice Software Engineers: Anti-patterns and Recommendations

Authors: Miguel Setúbal, Tayana Conte, Marcos Kalinowski, Allysson Allex Araújo

Abstract: [Context] The growing software development market has increased the demand for qualified professionals in Software Engineering (SE). To this end, companies must enhance their Recruitment and Selection (R&S) processes to maintain high quality teams, including opening opportunities for beginners, such as trainees and interns. However, given the various judgments and sociotechnical factors involved,… ▽ More [Context] The growing software development market has increased the demand for qualified professionals in Software Engineering (SE). To this end, companies must enhance their Recruitment and Selection (R&S) processes to maintain high quality teams, including opening opportunities for beginners, such as trainees and interns. However, given the various judgments and sociotechnical factors involved, this complex process of R&S poses a challenge for recent graduates seeking to enter the market. [Objective] This paper aims to identify a set of anti-patterns and recommendations for early career SE professionals concerning R&S processes. [Method] Under an exploratory and qualitative methodological approach, we conducted six online Focus Groups with 18 recruiters with experience in R&S in the software industry. [Results] After completing our qualitative analysis, we identified 12 anti-patterns and 31 actionable recommendations regarding the hiring process focused on entry level SE professionals. The identified anti-patterns encompass behavioral and technical dimensions innate to R&S processes. [Conclusion] These findings provide a rich opportunity for reflection in the SE industry and offer valuable guidance for early-career candidates and organizations. From an academic perspective, this work also raises awareness of the intersection of Human Resources and SE, an area with considerable potential to be expanded in the context of cooperative and human aspects of SE. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: 33 pages

arXiv:2405.12979 [pdf, other]

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Authors: Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo

Abstract: The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue,… ▽ More The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue, the first learnable image matcher that is designed with generalization as a core principle. OmniGlue leverages broad knowledge from a vision foundation model to guide the feature matching process, boosting generalization to domains not seen at training time. Additionally, we propose a novel keypoint position-guided attention mechanism which disentangles spatial and appearance information, leading to enhanced matching descriptors. We perform comprehensive experiments on a suite of $7$ datasets with varied image domains, including scene-level, object-centric and aerial images. OmniGlue's novel components lead to relative gains on unseen domains of $20.9\%$ with respect to a directly comparable reference model, while also outperforming the recent LightGlue method by $9.5\%$ relatively.Code and model can be found at https://hwjiang1510.github.io/OmniGlue △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: CVPR 2024

arXiv:2404.19174 [pdf, other]

XFeat: Accelerated Features for Lightweight Image Matching

Authors: Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento

Abstract: We introduce a lightweight and accurate architecture for resource-efficient visual correspondence. Our method, dubbed XFeat (Accelerated Features), revisits fundamental design choices in convolutional neural networks for detecting, extracting, and matching local features. Our new model satisfies a critical need for fast and robust algorithms suitable to resource-limited devices. In particular, acc… ▽ More We introduce a lightweight and accurate architecture for resource-efficient visual correspondence. Our method, dubbed XFeat (Accelerated Features), revisits fundamental design choices in convolutional neural networks for detecting, extracting, and matching local features. Our new model satisfies a critical need for fast and robust algorithms suitable to resource-limited devices. In particular, accurate image matching requires sufficiently large image resolutions - for this reason, we keep the resolution as large as possible while limiting the number of channels in the network. Besides, our model is designed to offer the choice of matching at the sparse or semi-dense levels, each of which may be more suitable for different downstream applications, such as visual navigation and augmented reality. Our model is the first to offer semi-dense matching efficiently, leveraging a novel match refinement module that relies on coarse local descriptors. XFeat is versatile and hardware-independent, surpassing current deep learning-based local features in speed (up to 5x faster) with comparable or better accuracy, proven in pose estimation and visual localization. We showcase it running in real-time on an inexpensive laptop CPU without specialized hardware optimizations. Code and weights are available at www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: CVPR 2024; Source code available at www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24

arXiv:2404.05465 [pdf, other]

HAMMR: HierArchical MultiModal React agents for generic VQA

Authors: Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings

Abstract: Combining Large Language Models (LLMs) with external specialized tools (LLMs+tools) is a recent paradigm to solve multimodal tasks such as Visual Question Answering (VQA). While this approach was demonstrated to work well when optimized and evaluated for each individual benchmark, in practice it is crucial for the next generation of real-world AI systems to handle a broad range of multimodal probl… ▽ More Combining Large Language Models (LLMs) with external specialized tools (LLMs+tools) is a recent paradigm to solve multimodal tasks such as Visual Question Answering (VQA). While this approach was demonstrated to work well when optimized and evaluated for each individual benchmark, in practice it is crucial for the next generation of real-world AI systems to handle a broad range of multimodal problems. Therefore we pose the VQA problem from a unified perspective and evaluate a single system on a varied suite of VQA tasks including counting, spatial reasoning, OCR-based reasoning, visual pointing, external knowledge, and more. In this setting, we demonstrate that naively applying the LLM+tools approach using the combined set of all tools leads to poor results. This motivates us to introduce HAMMR: HierArchical MultiModal React. We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents. This enhances the compositionality of the LLM+tools approach, which we show to be critical for obtaining high accuracy on generic VQA. Concretely, on our generic VQA suite, HAMMR outperforms the naive LLM+tools approach by 19.5%. Additionally, HAMMR achieves state-of-the-art results on this task, outperforming the generic standalone PaLI-X VQA model by 5.0%. △ Less

Submitted 14 October, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04809 [pdf, other]

Low-Resource Machine Translation through Retrieval-Augmented LLM Prompting: A Study on the Mambai Language

Authors: Raphaël Merx, Aso Mahmudi, Katrina Langford, Leo Alberto de Araujo, Ekaterina Vylomova

Abstract: This study explores the use of large language models (LLMs) for translating English into Mambai, a low-resource Austronesian language spoken in Timor-Leste, with approximately 200,000 native speakers. Leveraging a novel corpus derived from a Mambai language manual and additional sentences translated by a native speaker, we examine the efficacy of few-shot LLM prompting for machine translation (MT)… ▽ More This study explores the use of large language models (LLMs) for translating English into Mambai, a low-resource Austronesian language spoken in Timor-Leste, with approximately 200,000 native speakers. Leveraging a novel corpus derived from a Mambai language manual and additional sentences translated by a native speaker, we examine the efficacy of few-shot LLM prompting for machine translation (MT) in this low-resource context. Our methodology involves the strategic selection of parallel sentences and dictionary entries for prompting, aiming to enhance translation accuracy, using open-source and proprietary LLMs (LlaMa 2 70b, Mixtral 8x7B, GPT-4). We find that including dictionary entries in prompts and a mix of sentences retrieved through TF-IDF and semantic embeddings significantly improves translation quality. However, our findings reveal stark disparities in translation performance across test sets, with BLEU scores reaching as high as 21.2 on materials from the language manual, in contrast to a maximum of 4.4 on a test set provided by a native speaker. These results underscore the importance of diverse and representative corpora in assessing MT for low-resource languages. Our research provides insights into few-shot LLM prompting for low-resource MT, and makes available an initial corpus for the Mambai language. △ Less

Submitted 7 April, 2024; originally announced April 2024.

Report number: https://aclanthology.org/2024.eurali-1.1/

arXiv:2402.09674 [pdf, other]

PAL: Proxy-Guided Black-Box Attack on Large Language Models

Authors: Chawin Sitawarin, Norman Mu, David Wagner, Alexandre Araujo

Abstract: Large Language Models (LLMs) have surged in popularity in recent months, but they have demonstrated concerning capabilities to generate harmful content when manipulated. While techniques like safety fine-tuning aim to minimize harmful use, recent works have shown that LLMs remain vulnerable to attacks that elicit toxic responses. In this work, we introduce the Proxy-Guided Attack on LLMs (PAL), th… ▽ More Large Language Models (LLMs) have surged in popularity in recent months, but they have demonstrated concerning capabilities to generate harmful content when manipulated. While techniques like safety fine-tuning aim to minimize harmful use, recent works have shown that LLMs remain vulnerable to attacks that elicit toxic responses. In this work, we introduce the Proxy-Guided Attack on LLMs (PAL), the first optimization-based attack on LLMs in a black-box query-only setting. In particular, it relies on a surrogate model to guide the optimization and a sophisticated loss designed for real-world LLM APIs. Our attack achieves 84% attack success rate (ASR) on GPT-3.5-Turbo and 48% on Llama-2-7B, compared to 4% for the current state of the art. We also propose GCG++, an improvement to the GCG attack that reaches 94% ASR on white-box Llama-2-7B, and the Random-Search Attack on LLMs (RAL), a strong but simple baseline for query-based attacks. We believe the techniques proposed in this work will enable more comprehensive safety testing of LLMs and, in the long term, the development of better security guardrails. The code can be found at https://github.com/chawins/pal. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.05339 [pdf]

Can participation in a hackathon impact the motivation of software engineering students? A preliminary case study analysis

Authors: Allysson Allex Araújo, Marcos Kalinowski, Maria Teresa Baldassarre

Abstract: [Background] Hackathons are increasingly gaining prominence in Software Engineering (SE) education, lauded for their ability to elevate students' skill sets. [Objective] This paper investigates whether hackathons can impact the motivation of SE students. [Method] We conducted an evaluative case study assessing students' motivations before and after a hackathon, combining quantitative analysis usin… ▽ More [Background] Hackathons are increasingly gaining prominence in Software Engineering (SE) education, lauded for their ability to elevate students' skill sets. [Objective] This paper investigates whether hackathons can impact the motivation of SE students. [Method] We conducted an evaluative case study assessing students' motivations before and after a hackathon, combining quantitative analysis using the Academic Motivation Scale (AMS) and qualitative coding of open-ended responses. [Results] Pre-hackathon findings reveal a diverse range of motivations with an overall acceptance, while post-hackathon responses highlight no statistically significant shift in participants' perceptions. Qualitative findings uncovered themes related to networking, team dynamics, and skill development. From a practical perspective, our findings highlight the potential of hackathons to impact participants' motivation. [Conclusion] While our study enhances the comprehension of hackathons as a motivational tool, it also underscores the need for further exploration of psychometric dimensions in SE educational research. △ Less

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.03337 [pdf, other]

Reinforcement-learning robotic sailboats: simulator and preliminary results

Authors: Eduardo Charles Vasconcellos, Ronald M Sampaio, André P D Araújo, Esteban Walter Gonzales Clua, Philippe Preux, Raphael Guerra, Luiz M G Gonçalves, Luis Martí, Hernan Lira, Nayat Sanchez-Pi

Abstract: This work focuses on the main challenges and problems in developing a virtual oceanic environment reproducing real experiments using Unmanned Surface Vehicles (USV) digital twins. We introduce the key features for building virtual worlds, considering using Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the si… ▽ More This work focuses on the main challenges and problems in developing a virtual oceanic environment reproducing real experiments using Unmanned Surface Vehicles (USV) digital twins. We introduce the key features for building virtual worlds, considering using Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the simulation equations (physics and mathematics), their effective implementation, and how to include strategies for simulated control and perception (sensors) to be used with RL. We present the modeling, implementation steps, and challenges required to create a functional digital twin based on a real robotic sailing vessel. The application is immediate for developing navigation algorithms based on RL to be applied on real boats. △ Less

Submitted 16 January, 2024; originally announced February 2024.

Journal ref: NeurIPS 2023 Workshop on Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models, Dec 2023, New Orelans, United States

arXiv:2401.14033 [pdf, ps, other]

Novel Quadratic Constraints for Extending LipSDP beyond Slope-Restricted Activations

Authors: Patricia Pauli, Aaron Havens, Alexandre Araujo, Siddharth Garg, Farshad Khorrami, Frank Allgöwer, Bin Hu

Abstract: Recently, semidefinite programming (SDP) techniques have shown great promise in providing accurate Lipschitz bounds for neural networks. Specifically, the LipSDP approach (Fazlyab et al., 2019) has received much attention and provides the least conservative Lipschitz upper bounds that can be computed with polynomial time guarantees. However, one main restriction of LipSDP is that its formulation r… ▽ More Recently, semidefinite programming (SDP) techniques have shown great promise in providing accurate Lipschitz bounds for neural networks. Specifically, the LipSDP approach (Fazlyab et al., 2019) has received much attention and provides the least conservative Lipschitz upper bounds that can be computed with polynomial time guarantees. However, one main restriction of LipSDP is that its formulation requires the activation functions to be slope-restricted on $[0,1]$, preventing its further use for more general activation functions such as GroupSort, MaxMin, and Householder. One can rewrite MaxMin activations for example as residual ReLU networks. However, a direct application of LipSDP to the resultant residual ReLU networks is conservative and even fails in recovering the well-known fact that the MaxMin activation is 1-Lipschitz. Our paper bridges this gap and extends LipSDP beyond slope-restricted activation functions. To this end, we provide novel quadratic constraints for GroupSort, MaxMin, and Householder activations via leveraging their underlying properties such as sum preservation. Our proposed analysis is general and provides a unified approach for estimating $\ell_2$ and $\ell_\infty$ Lipschitz bounds for a rich class of neural network architectures, including non-residual and residual neural networks and implicit models, with GroupSort, MaxMin, and Householder activations. Finally, we illustrate the utility of our approach with a variety of experiments and show that our proposed SDPs generate less conservative Lipschitz bounds in comparison to existing approaches. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: accepted as a conference paper at ICLR 2024

arXiv:2311.17846 [pdf, other]

Towards Real-World Focus Stacking with Deep Learning

Authors: Alexandre Araujo, Jean Ponce, Julien Mairal

Abstract: Focus stacking is widely used in micro, macro, and landscape photography to reconstruct all-in-focus images from multiple frames obtained with focus bracketing, that is, with shallow depth of field and different focus planes. Existing deep learning approaches to the underlying multi-focus image fusion problem have limited applicability to real-world imagery since they are designed for very short i… ▽ More Focus stacking is widely used in micro, macro, and landscape photography to reconstruct all-in-focus images from multiple frames obtained with focus bracketing, that is, with shallow depth of field and different focus planes. Existing deep learning approaches to the underlying multi-focus image fusion problem have limited applicability to real-world imagery since they are designed for very short image sequences (two to four images), and are typically trained on small, low-resolution datasets either acquired by light-field cameras or generated synthetically. We introduce a new dataset consisting of 94 high-resolution bursts of raw images with focus bracketing, with pseudo ground truth computed from the data using state-of-the-art commercial software. This dataset is used to train the first deep learning algorithm for focus stacking capable of handling bursts of sufficient length for real-world applications. Qualitative experiments demonstrate that it is on par with existing commercial solutions in the long-burst, realistic regime while being significantly more tolerant to noise. The code and dataset are available at https://github.com/araujoalexandre/FocusStackingDataset. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.05452 [pdf, other]

Transformer-based Model for Oral Epithelial Dysplasia Segmentation

Authors: Adam J Shephard, Hanya Mahmood, Shan E Ahmed Raza, Anna Luiza Damaceno Araujo, Alan Roger Santos-Silva, Marcio Ajudarte Lopes, Pablo Agustin Vargas, Kris McCombe, Stephanie Craig, Jacqueline James, Jill Brooks, Paul Nankivell, Hisham Mehanna, Syed Ali Khurram, Nasir M Rajpoot

Abstract: Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. OED grading is subject to large inter/intra-rater variability, resulting in the under/over-treatment of patients. We developed a new Transformer-based pipeline to improve detection and segmentation of OED in haematoxylin and eosin (H&E) stained whole slide images (WSIs). Our model was… ▽ More Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. OED grading is subject to large inter/intra-rater variability, resulting in the under/over-treatment of patients. We developed a new Transformer-based pipeline to improve detection and segmentation of OED in haematoxylin and eosin (H&E) stained whole slide images (WSIs). Our model was trained on OED cases (n = 260) and controls (n = 105) collected using three different scanners, and validated on test data from three external centres in the United Kingdom and Brazil (n = 78). Our internal experiments yield a mean F1-score of 0.81 for OED segmentation, which reduced slightly to 0.71 on external testing, showing good generalisability, and gaining state-of-the-art results. This is the first externally validated study to use Transformers for segmentation in precancerous histology images. Our publicly available model shows great promise to be the first step of a fully-integrated pipeline, allowing earlier and more efficient OED diagnosis, ultimately benefiting patient outcomes. △ Less

Submitted 9 November, 2023; originally announced November 2023.

Comments: 5 pages, 2 figures, 4 tables

arXiv:2310.18274 [pdf, other]

LipSim: A Provably Robust Perceptual Similarity Metric

Authors: Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Farshad Khorrami, Siddharth Garg

Abstract: Recent years have seen growing interest in developing and applying perceptual similarity metrics. Research has shown the superiority of perceptual metrics over pixel-wise metrics in aligning with human perception and serving as a proxy for the human visual system. On the other hand, as perceptual metrics rely on neural networks, there is a growing concern regarding their resilience, given the esta… ▽ More Recent years have seen growing interest in developing and applying perceptual similarity metrics. Research has shown the superiority of perceptual metrics over pixel-wise metrics in aligning with human perception and serving as a proxy for the human visual system. On the other hand, as perceptual metrics rely on neural networks, there is a growing concern regarding their resilience, given the established vulnerability of neural networks to adversarial attacks. It is indeed logical to infer that perceptual metrics may inherit both the strengths and shortcomings of neural networks. In this work, we demonstrate the vulnerability of state-of-the-art perceptual similarity metrics based on an ensemble of ViT-based feature extractors to adversarial attacks. We then propose a framework to train a robust perceptual similarity metric called LipSim (Lipschitz Similarity Metric) with provable guarantees. By leveraging 1-Lipschitz neural networks as the backbone, LipSim provides guarded areas around each data point and certificates for all perturbations within an $\ell_2$ ball. Finally, a comprehensive set of experiments shows the performance of LipSim in terms of natural and certified scores and on the image retrieval application. The code is available at https://github.com/SaraGhazanfari/LipSim. △ Less

Submitted 29 March, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.03664 [pdf, other]

Certification of Deep Learning Models for Medical Image Segmentation

Authors: Othmane Laousy, Alexandre Araujo, Guillaume Chassagnon, Nikos Paragios, Marie-Pierre Revel, Maria Vakalopoulou

Abstract: In medical imaging, segmentation models have known a significant improvement in the past decade and are now used daily in clinical practice. However, similar to classification models, segmentation models are affected by adversarial attacks. In a safety-critical field like healthcare, certifying model predictions is of the utmost importance. Randomized smoothing has been introduced lately and provi… ▽ More In medical imaging, segmentation models have known a significant improvement in the past decade and are now used daily in clinical practice. However, similar to classification models, segmentation models are affected by adversarial attacks. In a safety-critical field like healthcare, certifying model predictions is of the utmost importance. Randomized smoothing has been introduced lately and provides a framework to certify models and obtain theoretical guarantees. In this paper, we present for the first time a certified segmentation baseline for medical imaging based on randomized smoothing and diffusion models. Our results show that leveraging the power of denoising diffusion probabilistic models helps us overcome the limits of randomized smoothing. We conduct extensive experiments on five public datasets of chest X-rays, skin lesions, and colonoscopies, and empirically show that we are able to maintain high certified Dice scores even for highly perturbed images. Our work represents the first attempt to certify medical image segmentation models, and we aspire for it to set a foundation for future benchmarks in this crucial and largely uncharted area. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2309.16883 [pdf, other]

The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing

Authors: Blaise Delattre, Alexandre Araujo, Quentin Barthélemy, Alexandre Allauzen

Abstract: Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius in this context is a crucial indicator of the robustness of models. However how to design an efficient classifier with an associated certified radius? Randomized smoothing provides a promising framework by relying on noise injection in… ▽ More Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius in this context is a crucial indicator of the robustness of models. However how to design an efficient classifier with an associated certified radius? Randomized smoothing provides a promising framework by relying on noise injection into the inputs to obtain a smoothed and robust classifier. In this paper, we first show that the variance introduced by the Monte-Carlo sampling in the randomized smoothing procedure estimate closely interacts with two other important properties of the classifier, \textit{i.e.} its Lipschitz constant and margin. More precisely, our work emphasizes the dual impact of the Lipschitz constant of the base classifier, on both the smoothed classifier and the empirical variance. To increase the certified robust radius, we introduce a different way to convert logits to probability vectors for the base classifier to leverage the variance-margin trade-off. We leverage the use of Bernstein's concentration inequality along with enhanced Lipschitz bounds for randomized smoothing. Experimental results show a significant improvement in certified accuracy compared to current state-of-the-art methods. Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner. △ Less

Submitted 18 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.08250 [pdf, other]

Optimization of Rank Losses for Image Retrieval

Authors: Elias Ramzi, Nicolas Audebert, Clément Rambour, André Araujo, Xavier Bitot, Nicolas Thome

Abstract: In image retrieval, standard evaluation metrics rely on score ranking, \eg average precision (AP), recall at k (R@k), normalized discounted cumulative gain (NDCG). In this work we introduce a general framework for robust and decomposable rank losses optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomp… ▽ More In image retrieval, standard evaluation metrics rely on score ranking, \eg average precision (AP), recall at k (R@k), normalized discounted cumulative gain (NDCG). In this work we introduce a general framework for robust and decomposable rank losses optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomposability. Firstly we propose a general surrogate for ranking operator, SupRank, that is amenable to stochastic gradient descent. It provides an upperbound for rank losses and ensures robust training. Secondly, we use a simple yet effective loss function to reduce the decomposability gap between the averaged batch approximation of ranking losses and their values on the whole training set. We apply our framework to two standard metrics for image retrieval: AP and R@k. Additionally we apply our framework to hierarchical image retrieval. We introduce an extension of AP, the hierarchical average precision $\mathcal{H}$-AP, and optimize it as well as the NDCG. Finally we create the first hierarchical landmarks retrieval dataset. We use a semi-automatic pipeline to create hierarchical labels, extending the large scale Google Landmarks v2 dataset. The hierarchical dataset is publicly available at https://github.com/cvdfoundation/google-landmark. Code will be released at https://github.com/elias-ramzi/SupRank. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2207.04873

arXiv:2309.01858 [pdf, other]

Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations

Authors: Nikolaos-Antonios Ypsilantis, Kaifeng Chen, Bingyi Cao, Mário Lipovský, Pelin Dogan-Schönberger, Grzegorz Makosa, Boris Bluntschli, Mojtaba Seyedhosseini, Ondřej Chum, André Araujo

Abstract: Fine-grained and instance-level recognition methods are commonly trained and evaluated on specific domains, in a model per domain scenario. Such an approach, however, is impractical in real large-scale applications. In this work, we address the problem of universal image embedding, where a single universal model is trained and used in multiple domains. First, we leverage existing domain-specific d… ▽ More Fine-grained and instance-level recognition methods are commonly trained and evaluated on specific domains, in a model per domain scenario. Such an approach, however, is impractical in real large-scale applications. In this work, we address the problem of universal image embedding, where a single universal model is trained and used in multiple domains. First, we leverage existing domain-specific datasets to carefully construct a new large-scale public benchmark for the evaluation of universal image embeddings, with 241k query images, 1.4M index images and 2.8M training images across 8 different domains and 349k classes. We define suitable metrics, training and evaluation protocols to foster future research in this area. Second, we provide a comprehensive experimental evaluation on the new dataset, demonstrating that existing approaches and simplistic extensions lead to worse performance than an assembly of models trained for each domain separately. Finally, we conducted a public research competition on this topic, leveraging industrial datasets, which attracted the participation of more than 1k teams worldwide. This exercise generated many interesting research ideas and findings which we present in detail. Project webpage: https://cmp.felk.cvut.cz/univ_emb/ △ Less

Submitted 4 September, 2023; originally announced September 2023.

Comments: ICCV 2023 Accepted

arXiv:2308.13363 [pdf, other]

CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing

Authors: Jonathan Cui, David A. Araujo, Suman Saha, Md. Faisal Kabir

Abstract: Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial intera… ▽ More Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial interactions. Further, their token mixers only model 1- or 2-axis correlations, avoiding 3-axis spatial-channel mixing due to its computational demands. We therefore propose CS-Mixer, a hierarchical Vision MLP that learns dynamic low-rank transformations for spatial-channel mixing through cross-scale local and global aggregation. The proposed methodology achieves competitive results on popular image recognition benchmarks without incurring substantially more compute. Our largest model, CS-Mixer-L, reaches 83.2% top-1 accuracy on ImageNet-1k with 13.7 GFLOPs and 94 M parameters. △ Less

Submitted 14 January, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: 8 pages, 5 figures, developed under Penn State University's Multi-Campus Research Experience for Undergraduates Symposium, 2023. This work has been submitted to the IEEE for possible publication

arXiv:2308.06954 [pdf, other]

Global Features are All You Need for Image Retrieval and Reranking

Authors: Shihao Shao, Kaifeng Chen, Arjun Karpur, Qinghua Cui, Andre Araujo, Bingyi Cao

Abstract: Image retrieval systems conventionally use a two-stage paradigm, leveraging global features for initial retrieval and local features for reranking. However, the scalability of this method is often limited due to the significant storage and computation cost incurred by local feature matching in the reranking stage. In this paper, we present SuperGlobal, a novel approach that exclusively employs glo… ▽ More Image retrieval systems conventionally use a two-stage paradigm, leveraging global features for initial retrieval and local features for reranking. However, the scalability of this method is often limited due to the significant storage and computation cost incurred by local feature matching in the reranking stage. In this paper, we present SuperGlobal, a novel approach that exclusively employs global features for both stages, improving efficiency without sacrificing accuracy. SuperGlobal introduces key enhancements to the retrieval system, specifically focusing on the global feature extraction and reranking processes. For extraction, we identify sub-optimal performance when the widely-used ArcFace loss and Generalized Mean (GeM) pooling methods are combined and propose several new modules to improve GeM pooling. In the reranking stage, we introduce a novel method to update the global features of the query and top-ranked images by only considering feature refinement with a small set of images, thus being very compute and memory efficient. Our experiments demonstrate substantial improvements compared to the state of the art in standard benchmarks. Notably, on the Revisited Oxford+1M Hard dataset, our single-stage results improve by 7.1%, while our two-stage gain reaches 3.7% with a strong 64,865x speedup. Our two-stage system surpasses the current single-stage state-of-the-art by 16.3%, offering a scalable, accurate alternative for high-performing image retrieval systems with minimal time overhead. Code: https://github.com/ShihaoShao-GH/SuperGlobal. △ Less

Submitted 19 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: ICCV23 camera-ready + appendix

arXiv:2308.05810 [pdf, other]

Spintronics for image recognition: performance benchmarking via ultrafast data-driven simulations

Authors: Anatole Moureaux, Chloé Chopin, Simon de Wergifosse, Laurent Jacques, Flavio Abreu Araujo

Abstract: We present a demonstration of image classification using an echo-state network (ESN) relying on a single simulated spintronic nanostructure known as the vortex-based spin-torque oscillator (STVO) delayed in time. We employ an ultrafast data-driven simulation framework called the data-driven Thiele equation approach (DD-TEA) to simulate the STVO dynamics. This allows us to avoid the challenges asso… ▽ More We present a demonstration of image classification using an echo-state network (ESN) relying on a single simulated spintronic nanostructure known as the vortex-based spin-torque oscillator (STVO) delayed in time. We employ an ultrafast data-driven simulation framework called the data-driven Thiele equation approach (DD-TEA) to simulate the STVO dynamics. This allows us to avoid the challenges associated with repeated experimental manipulation of such a nanostructured system. We showcase the versatility of our solution by successfully applying it to solve classification challenges with the MNIST, EMNIST-letters and Fashion MNIST datasets. Through our simulations, we determine that within an ESN with numerous learnable parameters the results obtained using the STVO dynamics as an activation function are comparable to the ones obtained with other conventional nonlinear activation functions like the reLU and the sigmoid. While achieving state-of-the-art accuracy levels on the MNIST dataset, our model's performance on EMNIST-letters and Fashion MNIST is lower due to the relative simplicity of the system architecture and the increased complexity of the tasks. We expect that the DD-TEA framework will enable the exploration of deeper architectures, ultimately leading to improved classification accuracy. △ Less

Submitted 7 February, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: 6 pages, 4 figures

arXiv:2307.15157 [pdf, other]

R-LPIPS: An Adversarially Robust Perceptual Similarity Metric

Authors: Sara Ghazanfari, Siddharth Garg, Prashanth Krishnamurthy, Farshad Khorrami, Alexandre Araujo

Abstract: Similarity metrics have played a significant role in computer vision to capture the underlying semantics of images. In recent years, advanced similarity metrics, such as the Learned Perceptual Image Patch Similarity (LPIPS), have emerged. These metrics leverage deep features extracted from trained neural networks and have demonstrated a remarkable ability to closely align with human perception whe… ▽ More Similarity metrics have played a significant role in computer vision to capture the underlying semantics of images. In recent years, advanced similarity metrics, such as the Learned Perceptual Image Patch Similarity (LPIPS), have emerged. These metrics leverage deep features extracted from trained neural networks and have demonstrated a remarkable ability to closely align with human perception when evaluating relative image similarity. However, it is now well-known that neural networks are susceptible to adversarial examples, i.e., small perturbations invisible to humans crafted to deliberately mislead the model. Consequently, the LPIPS metric is also sensitive to such adversarial examples. This susceptibility introduces significant security concerns, especially considering the widespread adoption of LPIPS in large-scale applications. In this paper, we propose the Robust Learned Perceptual Image Patch Similarity (R-LPIPS) metric, a new metric that leverages adversarially trained deep features. Through a comprehensive set of experiments, we demonstrate the superiority of R-LPIPS compared to the classical LPIPS metric. The code is available at https://github.com/SaraGhazanfari/R-LPIPS. △ Less

Submitted 31 July, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.09262 [pdf, other]

Neuromorphic spintronics simulated using an unconventional data-driven Thiele equation approach

Authors: Anatole Moureaux, Simon de Wergifosse, Chloé Chopin, Flavio Abreu Araujo

Abstract: In this study, we developed a quantitative description of the dynamics of spin-torque vortex nano-oscillators (STVOs) through an unconventional model based on the combination of the Thiele equation approach (TEA) and data from micromagnetic simulations (MMS). Solving the STVO dynamics with our analytical model allows to accelerate the simulations by 9 orders of magnitude compared to MMS while reac… ▽ More In this study, we developed a quantitative description of the dynamics of spin-torque vortex nano-oscillators (STVOs) through an unconventional model based on the combination of the Thiele equation approach (TEA) and data from micromagnetic simulations (MMS). Solving the STVO dynamics with our analytical model allows to accelerate the simulations by 9 orders of magnitude compared to MMS while reaching the same level of accuracy. Here, we showcase our model by simulating a STVO-based neural network for solving a classification task. We assess its performance with respect to the input signal current intensity and the level of noise that might affect such a system. Our approach is promising for accelerating the design of STVO-based neuromorphic computing devices while decreasing drastically its computational cost. △ Less

Submitted 18 July, 2023; originally announced July 2023.

Comments: Presented in ISCS2023

Report number: ISCS23-46

arXiv:2306.09949 [pdf, other]

Towards Better Certified Segmentation via Diffusion Models

Authors: Othmane Laousy, Alexandre Araujo, Guillaume Chassagnon, Marie-Pierre Revel, Siddharth Garg, Farshad Khorrami, Maria Vakalopoulou

Abstract: The robustness of image segmentation has been an important research topic in the past few years as segmentation models have reached production-level accuracy. However, like classification models, segmentation models can be vulnerable to adversarial perturbations, which hinders their use in critical-decision systems like healthcare or autonomous driving. Recently, randomized smoothing has been prop… ▽ More The robustness of image segmentation has been an important research topic in the past few years as segmentation models have reached production-level accuracy. However, like classification models, segmentation models can be vulnerable to adversarial perturbations, which hinders their use in critical-decision systems like healthcare or autonomous driving. Recently, randomized smoothing has been proposed to certify segmentation predictions by adding Gaussian noise to the input to obtain theoretical guarantees. However, this method exhibits a trade-off between the amount of added noise and the level of certification achieved. In this paper, we address the problem of certifying segmentation prediction using a combination of randomized smoothing and diffusion models. Our experiments show that combining randomized smoothing and diffusion models significantly improves certified robustness, with results indicating a mean improvement of 21 points in accuracy compared to previous state-of-the-art methods on Pascal-Context and Cityscapes public datasets. Our method is independent of the selected segmentation model and does not need any additional specialized training procedure. △ Less

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.09224 [pdf, other]

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

Authors: Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Abstract: We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evi… ▽ More We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evidence to support each answer. Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset. Moreover, we experimentally show that progress on answering our encyclopedic questions can be achieved by augmenting large models with a mechanism that retrieves relevant information from the knowledge base. An oracle experiment with perfect retrieval achieves 87.0% accuracy on the single-hop portion of our dataset, and an automatic retrieval-augmented prototype yields 48.8%. We believe that our dataset enables future research on retrieval-augmented vision+language models. It is available at https://github.com/google-research/google-research/tree/master/encyclopedic_vqa . △ Less

Submitted 24 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: ICCV'23

arXiv:2306.09109 [pdf, other]

NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations

Authors: Varun Jampani, Kevis-Kokitsi Maninis, Andreas Engelhardt, Arjun Karpur, Karen Truong, Kyle Sargent, Stefan Popov, André Araujo, Ricardo Martin-Brualla, Kaushal Patel, Daniel Vlasic, Vittorio Ferrari, Ameesh Makadia, Ce Liu, Yuanzhen Li, Howard Zhou

Abstract: Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search… ▽ More Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search results with varying backgrounds and illuminations. To enable systematic research progress on 3D reconstruction from casual image captures, we propose NAVI: a new dataset of category-agnostic image collections of objects with high-quality 3D scans along with per-image 2D-3D alignments providing near-perfect GT camera parameters. These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps. We demonstrate the use of NAVI image collections on different problem settings and show that NAVI enables more thorough evaluations that were not possible with existing datasets. We believe NAVI is beneficial for systematic research progress on 3D reconstruction and correspondence estimation. Project page: https://navidataset.github.io △ Less

Submitted 13 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 camera ready. Project page: https://navidataset.github.io

arXiv:2306.09012 [pdf, other]

Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization

Authors: Dror Aiger, André Araujo, Simon Lynen

Abstract: Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localiza… ▽ More Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become common belief that global embeddings are critical for said image-retrieval in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets. Code: \url{https://github.com/google-research/google-research/tree/master/cann} △ Less

Submitted 29 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: ICCV23 camera-ready + appendix

arXiv:2305.16494 [pdf, other]

Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability

Authors: Haotian Xue, Alexandre Araujo, Bin Hu, Yongxin Chen

Abstract: Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this pa… ▽ More Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this paper, we propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. By exploiting a gradient guided by a diffusion model, Diff-PGD ensures that adversarial samples remain close to the original data distribution while maintaining their effectiveness. Moreover, our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks. Compared with existing methods for generating natural-style adversarial samples, our framework enables the separation of optimizing adversarial loss from other surrogate losses (e.g., content/smoothness/style loss), making it more stable and controllable. Finally, we demonstrate that the samples generated using Diff-PGD have better transferability and anti-purification power than traditional gradient-based methods. Code will be released in https://github.com/xavihart/Diff-PGD △ Less

Submitted 17 January, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Accepted as a conference paper in NeurIPS'2023. Code repo: https://github.com/xavihart/Diff-PGD

arXiv:2305.16173 [pdf, other]

Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration

Authors: Blaise Delattre, Quentin Barthélemy, Alexandre Araujo, Alexandre Allauzen

Abstract: Since the control of the Lipschitz constant has a great impact on the training stability, generalization, and robustness of neural networks, the estimation of this value is nowadays a real scientific challenge. In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the Power ite… ▽ More Since the control of the Lipschitz constant has a great impact on the training stability, generalization, and robustness of neural networks, the estimation of this value is nowadays a real scientific challenge. In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the Power iteration. Called the Gram iteration, our approach exhibits a superlinear convergence. First, we show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability. Then, it proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches. Code is available at https://github.com/blaisedelattre/lip4conv. △ Less

Submitted 19 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: ICML 2023

arXiv:2304.00583 [pdf, other]

Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints

Authors: Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento

Abstract: Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval. The core assumption of most methods is that images undergo affine transformations, disregarding more complicated effects such as non-rigid deformations. Furthermore, incipient works tailored for non-rigid correspondence still rely on keypoint detectors designed for… ▽ More Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval. The core assumption of most methods is that images undergo affine transformations, disregarding more complicated effects such as non-rigid deformations. Furthermore, incipient works tailored for non-rigid correspondence still rely on keypoint detectors designed for rigid transformations, hindering performance due to the limitations of the detector. We propose DALF (Deformation-Aware Local Features), a novel deformation-aware network for jointly detecting and describing keypoints, to handle the challenging problem of matching deformable surfaces. All network components work cooperatively through a feature fusion approach that enforces the descriptors' distinctiveness and invariance. Experiments using real deforming objects showcase the superiority of our method, where it delivers 8% improvement in matching scores compared to the previous best results. Our approach also enhances the performance of two real-world applications: deformable object retrieval and non-rigid 3D surface registration. Code for training, inference, and applications are publicly available at https://verlab.dcc.ufmg.br/descriptors/dalf_cvpr23. △ Less

Submitted 2 April, 2023; originally announced April 2023.

Comments: CVPR 2023; Source code available at https://verlab.dcc.ufmg.br/descriptors/dalf_cvpr23

Showing 1–50 of 106 results for author: Araújo, A