-
Gemma 3 Technical Report
Authors:
Gemma Team,
Aishwarya Kamath,
Johan Ferret,
Shreya Pathak,
Nino Vieillard,
Ramona Merhej,
Sarah Perrin,
Tatiana Matejovicova,
Alexandre Ramé,
Morgane Rivière,
Louis Rouillard,
Thomas Mesnard,
Geoffrey Cideron,
Jean-bastien Grill,
Sabela Ramos,
Edouard Yvinec,
Michelle Casbon,
Etienne Pot,
Ivo Penchev,
Gaël Liu,
Francesco Visin,
Kathleen Kenealy,
Lucas Beyer,
Xiaohai Zhai,
Anton Tsitsulin
, et al. (191 additional authors not shown)
Abstract:
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Submitted 25 March, 2025;
originally announced March 2025.
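As a rough illustration of the KV-cache argument in the Gemma 3 abstract above, the sketch below estimates the cache size for one sequence when only some layers attend globally and the rest use a short sliding window. The layer count, head configuration, local-to-global ratio, and window length are illustrative assumptions, not values stated in the abstract.

```python
def kv_cache_bytes(context_len, n_layers, local_per_global, window,
                   n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV-cache bytes for one sequence.

    All parameters are hypothetical. `local_per_global` is the number of
    local (sliding-window) layers per global layer; 0 means every layer
    is global.
    """
    group = local_per_global + 1
    n_global = n_layers // group          # one global layer per group
    n_local = n_layers - n_global
    # Each cached token stores one key and one value vector per KV head.
    per_token = 2 * n_kv_heads * head_dim * bytes_per_value
    global_tokens = context_len                # global layers keep everything
    local_tokens = min(context_len, window)    # local layers keep the window only
    return per_token * (n_global * global_tokens + n_local * local_tokens)


if __name__ == "__main__":
    ctx = 128 * 1024
    all_global = kv_cache_bytes(ctx, n_layers=48, local_per_global=0, window=ctx)
    interleaved = kv_cache_bytes(ctx, n_layers=48, local_per_global=5, window=1024)
    print(f"all global : {all_global / 2**30:.2f} GiB")
    print(f"interleaved: {interleaved / 2**30:.2f} GiB")
```

Under these assumptions the cache is dominated by the few global layers, so raising the local-to-global ratio or shortening the window shrinks the total roughly in proportion to the share of layers that become local.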
-
H-MBR: Hypervisor-level Memory Bandwidth Reservation for Mixed Criticality Systems
Authors:
Afonso Oliveira,
Diogo Costa,
Gonçalo Moreira,
José Martins,
Sandro Pinto
Abstract:
Recent advancements in fields such as automotive and aerospace have driven a growing demand for robust computational resources. Applications that were once designed for basic MCUs are now deployed on highly heterogeneous SoC platforms. While these platforms deliver the necessary computational performance, they also present challenges related to resource sharing and predictability. These challenges are particularly pronounced when consolidating safety and non-safety-critical systems, the so-called Mixed-Criticality Systems (MCS), to adhere to strict SWaP-C requirements. MCS consolidation on shared platforms requires stringent spatial and temporal isolation to comply with functional safety standards. Virtualization, mainly leveraged by hypervisors, is a key technology that ensures spatial isolation across multiple OSes and applications; however, ensuring temporal isolation remains challenging due to contention on shared hardware resources, which impacts real-time performance and predictability. To mitigate this problem, several strategies, such as cache coloring and memory bandwidth reservation, have been proposed. Although cache coloring is typically implemented on state-of-the-art hypervisors, memory bandwidth reservation approaches are commonly implemented at the Linux kernel level or rely on dedicated hardware and typically do not consider the concept of VMs that can run different OSes. To fill the gap between current memory bandwidth reservation solutions and the deployment of MCSs that operate on a hypervisor, this work introduces H-MBR, an open-source VM-centric memory bandwidth reservation mechanism. H-MBR features (i) VM-centric bandwidth reservation, (ii) OS and platform agnosticism, and (iii) reduced overhead. Empirical results show no overhead on non-regulated workloads, and negligible overhead (<1%) for regulated workloads for regulation periods of 2 µs or higher.
Submitted 4 February, 2025;
originally announced February 2025.
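The general idea behind budget-and-period memory bandwidth reservation, as referred to in the abstract above, can be sketched with a small simulation. This is a generic model, not H-MBR's implementation: the class name, the unit of "accesses", and the one-microsecond time step are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BandwidthRegulator:
    """Hypothetical per-VM regulator: `budget` accesses per `period_us`."""
    budget: int
    period_us: int
    used: int = 0

    def replenish(self):
        # Called at the start of every regulation period.
        self.used = 0

    def request(self, n):
        # Grant at most the remaining budget; the rest is throttled.
        grant = min(n, self.budget - self.used)
        self.used += grant
        return grant


def simulate(reg, demand_per_us, total_us):
    """Return the number of accesses served over `total_us` microseconds."""
    served = 0
    for t in range(total_us):
        if t % reg.period_us == 0:
            reg.replenish()
        served += reg.request(demand_per_us)
    return served


if __name__ == "__main__":
    heavy = simulate(BandwidthRegulator(budget=100, period_us=10), 50, 100)
    light = simulate(BandwidthRegulator(budget=100, period_us=10), 5, 100)
    print("heavy VM served:", heavy)
    print("light VM served:", light)
```

In this model a VM whose demand stays under its budget is never throttled, while a VM that exceeds it is capped at `budget` accesses per period, which is what bounds its interference on other VMs.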
-
SP-IMPact: A Framework for Static Partitioning Interference Mitigation and Performance Analysis
Authors:
Diogo Costa,
Gonçalo Moreira,
Afonso Oliveira,
José Martins,
Sandro Pinto
Abstract:
Modern embedded systems are evolving toward complex, heterogeneous architectures to accommodate increasingly demanding applications. Driven by SWAP-C constraints, this shift has led to consolidating multiple systems onto single hardware platforms. Static Partitioning Hypervisors offer a promising solution to partition hardware resources and provide spatial isolation between critical workloads. However, shared resources like the Last-Level Cache and system bus can introduce temporal interference between virtual machines (VMs), negatively impacting performance and predictability. Over the past decade, academia and industry have developed interference mitigation techniques, such as cache partitioning and memory bandwidth reservation. However, configuring these techniques is complex and time-consuming. Cache partitioning requires balancing cache sections across VMs, while memory bandwidth reservation needs tuning bandwidth budgets and periods. Testing all configurations is impractical and often leads to suboptimal results. Moreover, understanding how these techniques interact is limited, as their combined use can produce compounded or conflicting effects on performance. Static analysis tools estimating worst-case execution times offer guidance for configuring mitigation techniques but often fail to capture the complexity of modern multi-core systems. They typically focus on limited shared resources while neglecting others, such as IOMMUs and interrupt controllers. To address these challenges, we present SP-IMPact, an open-source framework for analyzing and guiding interference mitigation configurations. SP-IMPact supports (i) cache coloring and (ii) memory bandwidth reservation, while evaluating their interactions and cumulative impact. By providing insights on real hardware, SP-IMPact helps optimize configurations for mixed-criticality systems, ensuring performance and predictability.
Submitted 27 January, 2025;
originally announced January 2025.
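Cache coloring, one of the two techniques named in the abstract above, is commonly explained in terms of which physical pages map to which slice of cache sets. The sketch below shows that standard textbook calculation under simplifying assumptions (a physically indexed cache whose set index is taken directly from address bits, and 4 KiB pages); the function names and the example cache geometry are hypothetical and are not taken from SP-IMPact.

```python
def num_colors(cache_bytes, ways, page_bytes=4096):
    """Number of page colors: bytes covered by one cache way, in pages."""
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)


def page_color(phys_addr, n_colors, page_bytes=4096):
    """Color of the page containing `phys_addr`."""
    return (phys_addr // page_bytes) % n_colors


def partition_pages(page_addrs, vm_colors, n_colors, page_bytes=4096):
    """Give each VM only the pages whose color is in its assigned set."""
    return {
        vm: [a for a in page_addrs
             if page_color(a, n_colors, page_bytes) in colors]
        for vm, colors in vm_colors.items()
    }


if __name__ == "__main__":
    n = num_colors(cache_bytes=1 << 20, ways=16)   # assumed 1 MiB, 16-way LLC
    pages = [i * 4096 for i in range(32)]
    split = partition_pages(pages, {"rt": {0, 1, 2, 3},
                                    "linux": set(range(4, n))}, n)
    print("colors:", n)
    print({vm: len(p) for vm, p in split.items()})
```

Assigning disjoint color sets to VMs keeps their pages in disjoint cache sets, at the cost of shrinking the cache and memory each VM can use; this is the balancing problem the abstract describes.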
-
Jet: A Modern Transformer-Based Normalizing Flow
Authors:
Alexander Kolesnikov,
André Susano Pinto,
Michael Tschannen
Abstract:
In the past, normalizing generative flows emerged as a promising class of generative models for natural images. This type of model has many modeling advantages: the ability to efficiently compute the log-likelihood of the input data, fast generation, and a simple overall structure. Normalizing flows remained a topic of active research but later fell out of favor, as the visual quality of the samples was not competitive with other model classes, such as GANs, VQ-VAE-based approaches, or diffusion models. In this paper we revisit the design of coupling-based normalizing flow models by carefully ablating prior design choices and using computational blocks based on the Vision Transformer architecture, not convolutional neural networks. As a result, we achieve state-of-the-art quantitative and qualitative performance with a much simpler architecture. While the overall visual quality is still behind the current state-of-the-art models, we argue that strong normalizing flow models can help advance the research frontier by serving as building components of more powerful generative models.
Submitted 19 December, 2024;
originally announced December 2024.
-
PaliGemma 2: A Family of Versatile VLMs for Transfer
Authors:
Andreas Steiner,
André Susano Pinto,
Michael Tschannen,
Daniel Keysers,
Xiao Wang,
Yonatan Bitton,
Alexey Gritsenko,
Matthias Minderer,
Anthony Sherbondy,
Shangbang Long,
Siyang Qin,
Reeve Ingle,
Emanuele Bugliarello,
Sahar Kazemzadeh,
Thomas Mesnard,
Ibrahim Alabdulmohsin,
Lucas Beyer,
Xiaohua Zhai
Abstract:
PaliGemma 2 is an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models. We combine the SigLIP-So400m vision encoder that was also used by PaliGemma with the whole range of Gemma 2 models, from the 2B one all the way up to the 27B model. We train these models at three resolutions (224px, 448px, and 896px) in multiple stages to equip them with broad knowledge for transfer via fine-tuning. The resulting family of base models covering different model sizes and resolutions allows us to investigate factors impacting transfer performance (such as learning rate) and to analyze the interplay between the type of task, model size, and resolution. We further increase the number and breadth of transfer tasks beyond the scope of PaliGemma including different OCR-related tasks such as table structure recognition, molecular structure recognition, music score recognition, as well as long fine-grained captioning and radiography report generation, on which PaliGemma 2 obtains state-of-the-art results.
Submitted 4 December, 2024;
originally announced December 2024.
-
JetFormer: An Autoregressive Generative Model of Raw Images and Text
Authors:
Michael Tschannen,
André Susano Pinto,
Alexander Kolesnikov
Abstract:
Removing modeling constraints and unifying architectures across domains has been a key driver of the recent progress in training large multimodal models. However, most of these models still rely on many separately trained components such as modality-specific encoders and decoders. In this work, we further streamline joint generative modeling of images and text. We propose an autoregressive decoder-only transformer - JetFormer - which is trained to directly maximize the likelihood of raw data, without relying on any separately pretrained components, and can understand and generate both text and images. Specifically, we leverage a normalizing flow model to obtain a soft-token image representation that is jointly trained with an autoregressive multimodal transformer. The normalizing flow model serves as both an image encoder for perception tasks and an image decoder for image generation tasks during inference. JetFormer achieves text-to-image generation quality competitive with recent VQ-VAE- and VAE-based baselines. These baselines rely on pretrained image autoencoders, which are trained with a complex mixture of losses, including perceptual ones. At the same time, JetFormer demonstrates robust image understanding capabilities. To the best of our knowledge, JetFormer is the first model that is capable of generating high-fidelity images and producing strong log-likelihood bounds.
Submitted 19 May, 2025; v1 submitted 29 November, 2024;
originally announced November 2024.
-
RISC-V Needs Secure 'Wheels': the MCU Initiator-Side Perspective
Authors:
Sandro Pinto,
Jose Martins,
Manuel Rodriguez,
Luis Cunha,
Georg Schmalz,
Uwe Moslehner,
Kai Dieffenbach,
Thomas Roecker
Abstract:
The automotive industry is experiencing a massive paradigm shift. Cars are becoming increasingly autonomous, connected, and computerized. Modern electrical/electronic (E/E) architectures are pushing for an unforeseen functionality integration density, resulting in physically separate Electronic Control Units (ECUs) becoming virtualized and mapped to logical partitions within a single physical microcontroller (MCU). While functional safety (FuSa) has been pivotal for vehicle certification for decades, the increasing connectivity and advances have opened the door for a number of car hacks and attacks. This development drives (cyber-)security requirements in cars, and has paved the way for the release of the new security certification standard ISO21434. RISC-V has great potential to transform automotive computing systems, but we argue that current ISA/extensions are not ready yet. This paper provides our critical perspective on the existing RISC-V limitations, particularly on the upcoming WorldGuard technology, to address virtualized MCU requirements in line with foreseen automotive applications and ISO21434 directives. We then present our proposal for the required ISA extensions to address such limitations, mainly targeting initiator-side protection. Finally, we explain our roadmap towards a full open-source proof-of-concept (PoC), which includes extending QEMU, an open-source RISC-V core, and building a complete software stack.
Submitted 13 October, 2024;
originally announced October 2024.
-
Assisting Novice Developers Learning in Flutter Through Cognitive-Driven Development
Authors:
Ronivaldo Ferreira,
Victor H. S. Pinto,
Cleidson R. B. de Souza,
Gustavo Pinto
Abstract:
Cognitive-Driven Development (CDD) is a coding design technique that helps developers focus on designing code within cognitive limits. The imposed limit tends to enhance code readability and maintainability. While early works on CDD focused mostly on Java, its applicability extends beyond specific programming languages. In this study, we explored the use of CDD in two new dimensions: focusing on Flutter programming and targeting novice developers unfamiliar with both Flutter and CDD. Our goal was to understand to what extent CDD helps novice developers learn a new programming technology. We conducted an in-person Flutter training camp with 24 participants. After receiving CDD training, six remaining students were tasked with developing a software management application guided by CDD practices. Our findings indicate that CDD helped participants keep code complexity low, measured using Intrinsic Complexity Points (ICP), a CDD metric. Notably, stricter ICP limits led to a 20% reduction in code size, improving code quality and readability. This report could be valuable for professors and instructors seeking effective methodologies for teaching design practices that reduce code and cognitive complexity.
Submitted 20 August, 2024;
originally announced August 2024.
-
PaliGemma: A versatile 3B VLM for transfer
Authors:
Lucas Beyer,
Andreas Steiner,
André Susano Pinto,
Alexander Kolesnikov,
Xiao Wang,
Daniel Salz,
Maxim Neumann,
Ibrahim Alabdulmohsin,
Michael Tschannen,
Emanuele Bugliarello,
Thomas Unterthiner,
Daniel Keysers,
Skanda Koppula,
Fangyu Liu,
Adam Grycner,
Alexey Gritsenko,
Neil Houlsby,
Manoj Kumar,
Keran Rong,
Julian Eisenschlos,
Rishabh Kabra,
Matthias Bauer,
Matko Bošnjak,
Xi Chen,
Matthias Minderer
, et al. (10 additional authors not shown)
Abstract:
PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.
Submitted 10 October, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
CROSSCON: Cross-platform Open Security Stack for Connected Devices
Authors:
Bruno Crispo,
Marco Roveri,
Sandro Pinto,
Tiago Gomes,
Aljosa Pasic,
Akos Milankovich,
David Puron,
Ainara Garcia,
Ziga Putrle,
Peter Ten,
Malvina Catalano
Abstract:
The proliferation of Internet of Things (IoT) embedded devices is expected to reach 30 billion by 2030, creating a dynamic landscape where diverse devices must coexist. This presents challenges due to the rapid expansion of different architectures and platforms. Addressing these challenges requires a unified solution capable of accommodating various devices while offering a broad range of services to connect them to the Internet effectively. This white paper introduces CROSSCON, a three-year Research and Innovation Action funded under Horizon Europe. CROSSCON aims to tackle current IoT challenges by developing a new open, modular, and universally compatible IoT security stack. This stack is designed to be highly portable and vendor-independent, enabling its deployment across different devices with heterogeneous embedded hardware architectures, including ARM and RISC-V. The CROSSCON consortium consists of 11 partners spanning 8 European countries. This consortium includes 4 academic institutions, 1 major industrial partner, and 5 small to medium-sized enterprises (SMEs).
Submitted 5 June, 2024;
originally announced June 2024.
-
David and Goliath: An Empirical Evaluation of Attacks and Defenses for QNNs at the Deep Edge
Authors:
Miguel Costa,
Sandro Pinto
Abstract:
ML is shifting from the cloud to the edge. Edge computing reduces the surface exposing private data and enables reliable throughput guarantees in real-time applications. Of the panoply of devices deployed at the edge, resource-constrained MCUs, e.g., Arm Cortex-M, are more prevalent, orders of magnitude cheaper, and less power-hungry than application processors or GPUs. Thus, enabling intelligence at the deep edge is the zeitgeist, with researchers focusing on unveiling novel approaches to deploy ANNs on these constrained devices. Quantization is a well-established technique that has proved effective in enabling the deployment of neural networks on MCUs; however, it is still an open question to understand the robustness of QNNs in the face of adversarial examples.
To fill this gap, we empirically evaluate the effectiveness of attacks and defenses from (full-precision) ANNs on (constrained) QNNs. Our evaluation includes three QNNs targeting TinyML applications, ten attacks, and six defenses. With this study, we draw a set of interesting findings. First, quantization increases the point distance to the decision boundary and leads the gradient estimated by some attacks to explode or vanish. Second, quantization can act as a noise attenuator or amplifier, depending on the noise magnitude, and causes gradient misalignment. Regarding adversarial defenses, we conclude that input pre-processing defenses show impressive results on small perturbations; however, they fall short as the perturbation increases. At the same time, train-based defenses increase the average point distance to the decision boundary, which holds after quantization. However, we argue that train-based defenses still need to smooth the quantization-shift and gradient misalignment phenomena to counteract adversarial example transferability to QNNs. All artifacts are open-sourced to enable independent validation of results.
Submitted 2 May, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
LocCa: Visual Pretraining with Location-aware Captioners
Authors:
Bo Wan,
Michael Tschannen,
Yongqin Xian,
Filip Pavetic,
Ibrahim Alabdulmohsin,
Xiao Wang,
André Susano Pinto,
Andreas Steiner,
Lucas Beyer,
Xiaohua Zhai
Abstract:
Image captioning has been shown to be an effective pretraining method similar to contrastive pretraining. However, the incorporation of location-aware information into visual pretraining remains an area with limited research. In this paper, we propose a simple visual pretraining method with location-aware captioners (LocCa). LocCa uses a simple image captioner task interface to teach a model to read out rich information, i.e., bounding box coordinates and captions, conditioned on the image pixel input. Thanks to the multitask capabilities of an encoder-decoder architecture, we show that an image captioner can easily handle multiple tasks during pretraining. Our experiments demonstrate that LocCa outperforms standard captioners significantly on localization downstream tasks while maintaining comparable performance on holistic tasks.
Submitted 11 November, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
SoK: Where's the "up"?! A Comprehensive (bottom-up) Study on the Security of Arm Cortex-M Systems
Authors:
Xi Tan,
Zheyuan Ma,
Sandro Pinto,
Le Guan,
Ning Zhang,
Jun Xu,
Zhiqiang Lin,
Hongxin Hu,
Ziming Zhao
Abstract:
Arm Cortex-M processors are the most widely used 32-bit microcontrollers among embedded and Internet-of-Things devices. Despite the widespread usage, there has been little effort in summarizing their hardware security features, characterizing the limitations and vulnerabilities of their hardware and software stack, and systematizing the research on securing these systems. The goals and contributions of this paper are multi-fold. First, we analyze the hardware security limitations and issues of Cortex-M systems. Second, we conducted a deep study of the software stack designed for Cortex-M and revealed its limitations, which is accompanied by an empirical analysis of 1,797 real-world firmware. Third, we categorize the reported bugs in Cortex-M software systems. Finally, we systematize the efforts that aim at securing Cortex-M systems and evaluate them in terms of the protections they offer, runtime performance, required hardware features, etc. Based on the insights, we develop a set of recommendations for the research community and MCU software developers.
Submitted 13 May, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Using Zero-shot Prompting in the Automatic Creation and Expansion of Topic Taxonomies for Tagging Retail Banking Transactions
Authors:
Daniel de S. Moraes,
Pedro T. C. Santos,
Polyana B. da Costa,
Matheus A. S. Pinto,
Ivan de J. P. Pinto,
Álvaro M. G. da Veiga,
Sergio Colcher,
Antonio J. G. Busson,
Rafael H. Rocha,
Rennan Gaio,
Rafael Miceli,
Gabriela Tourinho,
Marcos Rabaioli,
Leandro Santos,
Fellipe Marques,
David Favaro
Abstract:
This work presents an unsupervised method for automatically constructing and expanding topic taxonomies using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies and LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot prompting to find out where to add new nodes, which, to our knowledge, is the first work to present such an approach to taxonomy tasks. We use the resulting taxonomies to assign tags that characterize merchants from a retail bank dataset. To evaluate our work, we asked 12 volunteers to answer a two-part form in which we first assessed the quality of the taxonomies created and then the tags assigned to merchants based on that taxonomy. The evaluation revealed a coherence rate exceeding 90% for the chosen taxonomies. The taxonomies' expansion with LLMs also showed exciting results for parent node prediction, with an f1-score above 70% in our taxonomies.
Submitted 11 February, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
A Heterogeneous RISC-V based SoC for Secure Nano-UAV Navigation
Authors:
Luca Valente,
Alessandro Nadalini,
Asif Veeran,
Mattia Sinigaglia,
Bruno Sa,
Nils Wistoff,
Yvan Tortorella,
Simone Benatti,
Rafail Psiakis,
Ari Kulmala,
Baker Mohammad,
Sandro Pinto,
Daniele Palossi,
Luca Benini,
Davide Rossi
Abstract:
The rapid advancement of energy-efficient parallel ultra-low-power (ULP) microcontroller units (MCUs) is enabling the development of autonomous nano-sized unmanned aerial vehicles (nano-UAVs). These sub-10cm drones represent the next generation of unobtrusive robotic helpers and ubiquitous smart sensors. However, nano-UAVs face significant power and payload constraints while requiring advanced computing capabilities akin to standard drones, including real-time Machine Learning (ML) performance and the safe co-existence of general-purpose and real-time OSs. Although some advanced parallel ULP MCUs offer the necessary ML computing capabilities within the prescribed power limits, they rely on small main memories (<1MB) and microcontroller-class CPUs with no virtualization or security features, and hence only support simple bare-metal runtimes. In this work, we present Shaheen, a 9 mm² 200mW SoC implemented in 22nm FDX technology. Unlike state-of-the-art MCUs, Shaheen integrates a Linux-capable RV64 core, compliant with the v1.0 ratified Hypervisor extension and equipped with timing channel protection, along with a low-cost and low-power memory controller exposing up to 512MB of off-chip low-cost low-power HyperRAM directly to the CPU. At the same time, it integrates a fully programmable energy- and area-efficient multi-core cluster of RV32 cores optimized for general-purpose DSP as well as reduced- and mixed-precision ML. To the best of the authors' knowledge, it is the first silicon prototype of a ULP SoC coupling the RV64 and RV32 cores in a heterogeneous host+accelerator architecture fully based on the RISC-V ISA. We demonstrate the capabilities of the proposed SoC on a wide range of benchmarks relevant to nano-UAV applications. The cluster can deliver up to 90GOp/s and up to 1.8TOp/s/W on 2-bit integer kernels and up to 7.9GFLOp/s and up to 150GFLOp/s/W on 16-bit FP kernels.
Submitted 7 January, 2024;
originally announced January 2024.
-
Study of Multiuser Multiple-Antenna Wireless Communications Systems Based on Super-Resolution Arrays
Authors:
S. Pinto,
R. C. de Lamare
Abstract:
This work studies multiple-antenna wireless communication systems based on super-resolution arrays (SRAs). We consider the uplink of a multiple-antenna system in which users communicate with a multiple-antenna base station equipped with SRAs. In particular, we develop linear minimum mean-square error (MMSE) receive filters along with linear and successive interference cancellation receivers for processing signals with the difference co-array originating from the SRAs. We then derive analytical expressions to assess the achievable sum-rates associated with the proposed multiple-antenna systems with SRAs. Simulations show that the proposed multiple-antenna systems with SRAs outperform existing systems with standard arrays that have a larger number of antenna elements.
Submitted 10 December, 2023;
originally announced December 2023.
-
Evaluating the Relationship Between News Source Sharing and Political Beliefs
Authors:
Sofía M del Pozo,
Sebastián Pinto,
Matteo Serafino,
Federico Moss,
Tomás Cicchini,
Hernán A Makse,
Pablo Balenzuela
Abstract:
In an era marked by an abundance of news sources, access to information significantly influences public opinion. Notably, the bias of news sources often serves as an indicator of individuals' political leanings. This study explores this hypothesis by examining the news sharing behavior of politically active social media users, whose political ideologies were identified in a previous study. Using correspondence analysis, we estimate the Media Sharing Index (MSI), a measure that captures bias in media outlets and user preferences within a hidden space. During Argentina's 2019 election on Twitter, we observed a predictable pattern: center-right individuals predominantly shared media from center-right biased outlets. However, it is noteworthy that those with center-left inclinations displayed a more diverse media consumption, which is a significant finding. Despite a noticeable polarization based on political affiliation observed in a retweet network analysis, center-left users showed more diverse media sharing preferences, particularly concerning the MSI. Although these findings are specific to Argentina, the developed methodology can be applied in other countries to assess the correlation between users' political leanings and the media they share.
Submitted 15 October, 2024; v1 submitted 17 November, 2023;
originally announced November 2023.
-
Analyzing User Ideologies and Shared News During the 2019 Argentinian Elections
Authors:
Sofía M del Pozo,
Sebastián Pinto,
Matteo Serafino,
Lucio Garcia,
Hernán A Makse,
Pablo Balenzuela
Abstract:
The extensive data generated on social media platforms allow us to gain insights into trending topics and public opinions. Additionally, they offer a window into user behavior, including their content engagement and news sharing habits. In this study, we analyze the relationship between users' political ideologies and the news they share during Argentina's 2019 election period. Our findings reveal that users predominantly share news that aligns with their political beliefs, despite accessing media outlets with diverse political leanings. Moreover, we observe a consistent pattern of users sharing articles related to topics biased to their preferred candidates, highlighting a deeper level of political alignment in online discussions. We believe that this systematic analysis framework can be applied to similar scenarios in different countries, especially those marked by significant political polarization, akin to Argentina.
Submitted 25 April, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
MCU-Wide Timing Side Channels and Their Detection
Authors:
Johannes Müller,
Anna Lena Duque Antón,
Lucas Deutschmann,
Dino Mehmedagić,
Cristiano Rodrigues,
Daniel Oliveira,
Keerthikumara Devarajegowda,
Mohammad Rahmani Fadiheh,
Sandro Pinto,
Dominik Stoffel,
Wolfgang Kunz
Abstract:
Microarchitectural timing side channels have been thoroughly investigated as a security threat in hardware designs featuring shared buffers (e.g., caches) or parallelism between attacker and victim task execution. However, contradicting common intuitions, recent activities demonstrate that this threat is real even in microcontroller SoCs without such features. In this paper, we describe SoC-wide timing side channels previously neglected by security analysis and present a new formal method to close this gap. In a case study on the RISC-V Pulpissimo SoC, our method detected a vulnerability to a previously unknown attack variant that allows an attacker to obtain information about a victim's memory access behavior. After implementing a conservative fix, we were able to verify that the SoC is now secure w.r.t. the considered class of timing side channels.
Submitted 18 July, 2024; v1 submitted 22 September, 2023;
originally announced September 2023.
-
IRQ Coloring and the Subtle Art of Mitigating Interrupt-generated Interference
Authors:
Diogo Costa,
Luca Cuomo,
Daniel Oliveira,
Ida Maria Savino,
Bruno Morelli,
José Martins,
Alessandro Biasci,
Sandro Pinto
Abstract:
Integrating workloads with differing criticality levels presents a formidable challenge in achieving the stringent spatial and temporal isolation requirements imposed by safety-critical standards such as ISO26262. The shift towards high-performance multicore platforms has been posing increasing issues to the so-called mixed-criticality systems (MCS) due to the reciprocal interference created by consolidated subsystems vying for access to shared (microarchitectural) resources (e.g., caches, bus interconnect, memory controller). The research community has acknowledged all these challenges. Thus, several techniques, such as cache partitioning and memory throttling, have been proposed to mitigate such interference; however, these techniques have some drawbacks and limitations that impact performance, memory footprint, and availability. In this work, we look from a different perspective. Departing from the observation that safety-critical workloads are typically event- and thus interrupt-driven, we mask "colored" interrupts based on the quality-of-service (QoS) assessment, providing fine-grain control to mitigate interference on critical workloads without entirely suspending non-critical workloads. We propose the so-called IRQ coloring technique. We implement and evaluate the IRQ Coloring on a reference high-performance multicore platform, i.e., Xilinx ZCU102. Results demonstrate negligible performance overhead, i.e., <1% for a 100 microseconds period, and reasonable throughput guarantees for medium-critical workloads. We argue that the IRQ coloring technique offers advantages in predictability and intermediate guarantees compared to state-of-the-art mechanisms.
Submitted 2 August, 2023;
originally announced August 2023.
-
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Authors:
Lucas Beyer,
Bo Wan,
Gagan Madan,
Filip Pavetic,
Andreas Steiner,
Alexander Kolesnikov,
André Susano Pinto,
Emanuele Bugliarello,
Xiao Wang,
Qihang Yu,
Liang-Chieh Chen,
Xiaohua Zhai
Abstract:
There has been a recent explosion of computer vision models which perform many tasks and are composed of an image encoder (usually a ViT) and an autoregressive decoder (usually a Transformer). However, most of this work simply presents one system and its results, leaving many questions regarding design decisions and trade-offs of such systems unanswered. In this work, we aim to provide such answers. We take a close look at autoregressive decoders for multi-task learning in multimodal computer vision, including classification, captioning, visual question answering, and optical character recognition. Through extensive systematic experiments, we study the effects of task and data mixture, training and regularization hyperparameters, conditioning type and specificity, modality combination, and more. Importantly, we compare these to well-tuned single-task baselines to highlight the cost incurred by multi-tasking. A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well. We call this setup locked-image tuning with decoder (LiT-decoder). It can be seen as teaching a decoder to interact with a pretrained vision model via natural language.
Submitted 30 March, 2023;
originally announced March 2023.
-
Shedding Light on Static Partitioning Hypervisors for Arm-based Mixed-Criticality Systems
Authors:
José Martins,
Sandro Pinto
Abstract:
In this paper, we aim to understand the properties and guarantees of static partitioning hypervisors (SPH) for Arm-based mixed-criticality systems (MCS). To this end, we performed a comprehensive empirical evaluation of popular open-source SPH, i.e., Jailhouse, Xen (Dom0-less), Bao, and seL4 CAmkES VMM, focusing on two key requirements of modern MCS: real-time and safety. The goal of this study is twofold. Firstly, to empower industrial practitioners with hard data to reason about the different trade-offs of SPH. Secondly, we aim to raise awareness of the research and open-source communities to the still open problems in SPH by unveiling new insights regarding lingering weaknesses. All artifacts will be open-sourced to enable independent validation of results and encourage further exploration on SPH.
Submitted 23 March, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Tuning computer vision models with task rewards
Authors:
André Susano Pinto,
Alexander Kolesnikov,
Yuge Shi,
Lucas Beyer,
Xiaohua Zhai
Abstract:
Misalignment between model predictions and intended usage can be detrimental for the deployment of computer vision models. The issue is exacerbated when the task involves complex structured outputs, as it becomes harder to design procedures which address this misalignment. In natural language processing, this is often addressed using reinforcement learning techniques that align models with a task reward. We adopt this approach and show its surprising effectiveness across multiple computer vision tasks, such as object detection, panoptic segmentation, colorization and image captioning. We believe this approach has the potential to be widely useful for better aligning models with a diverse range of computer vision tasks.
Submitted 16 February, 2023;
originally announced February 2023.
-
CVA6 RISC-V Virtualization: Architecture, Microarchitecture, and Design Space Exploration
Authors:
Bruno Sá,
Luca Valente,
José Martins,
Davide Rossi,
Luca Benini,
Sandro Pinto
Abstract:
Virtualization is a key technology used in a wide range of applications, from cloud computing to embedded systems. Over the last few years, mainstream computer architectures were extended with hardware virtualization support, giving rise to a set of virtualization technologies (e.g., Intel VT, Arm VE) that are now proliferating in modern processors and SoCs. In this article, we describe our work on hardware virtualization support in the RISC-V CVA6 core. Our contribution is multifold and encompasses architecture, microarchitecture, and design space exploration. In particular, we highlight the design of a set of microarchitectural enhancements (i.e., G-Stage Translation Lookaside Buffer (GTLB), L2 TLB) to alleviate the virtualization performance overhead. We also perform a Design Space Exploration (DSE) and accompanying post-layout simulations (based on 22nm FDX technology) to assess Performance, Power, and Area (PPA). Further, we map design variants on an FPGA platform (Genesys 2) to assess the functional performance-area trade-off. Based on the DSE, we select an optimal design point for the CVA6 with hardware virtualization support. For this optimal hardware configuration, we collected functional performance results by running the MiBench benchmark on Linux atop Bao hypervisor for a single-core configuration. We observed a performance speedup of up to 16% (approx. 12.5% on average) compared with virtualization-aware non-optimized design at the minimal cost of 0.78% in area and 0.33% in power. Finally, all work described in this article is publicly available and open-sourced for the community to further evaluate additional design configurations and software stacks.
Submitted 4 August, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Bao-Enclave: Virtualization-based Enclaves for Arm
Authors:
Samuel Pereira,
Joao Sousa,
Sandro Pinto,
José Martins,
David Cerdeira
Abstract:
General-purpose operating systems (GPOS), such as Linux, encompass several million lines of code. Statistically, a larger code base inevitably leads to a higher number of potential vulnerabilities and inherently a more vulnerable system. To minimize the impact of vulnerabilities in GPOS, it has become common to implement security-sensitive programs outside the domain of the GPOS, i.e., in a Trusted Execution Environment (TEE). Arm TrustZone is the de-facto technology for implementing TEEs in Arm devices. However, over the last decade, TEEs have been successfully attacked hundreds of times. Unfortunately, these attacks have been possible due to the presence of several architectural and implementation flaws in TrustZone-based TEEs. In this paper, we propose Bao-Enclave, a virtualization-based solution that enables OEMs to remove security functionality from the TEE and move them into normal world isolated environments, protected from potentially malicious OSes, in the form of lightweight virtual machines (VMs). We evaluate Bao-Enclave on real hardware platforms and find out that Bao-Enclave may improve the performance of security-sensitive workloads by up to 4.8x, while significantly simplifying the TEE software TCB.
Submitted 12 September, 2022;
originally announced September 2022.
-
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Authors:
Alexander Kolesnikov,
André Susano Pinto,
Lucas Beyer,
Xiaohua Zhai,
Jeremiah Harmsen,
Neil Houlsby
Abstract:
We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks. In contrast to previous models, UViM has the same functional form for all tasks; it requires no task-specific modifications which require extensive human expertise. The approach involves two components: (I) a base model (feed-forward) which is trained to directly predict raw vision outputs, guided by a learned discrete code and (II) a language model (autoregressive) that is trained to generate the guiding code. These components complement each other: the language model is well-suited to modeling structured interdependent data, while the base model is efficient at dealing with high-dimensional outputs. We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks: panoptic segmentation, depth prediction and image colorization, where we achieve competitive and near state-of-the-art results. Our experimental results suggest that UViM is a promising candidate for a unified modeling approach in computer vision.
Submitted 14 October, 2022; v1 submitted 20 May, 2022;
originally announced May 2022.
-
ReZone: Disarming TrustZone with TEE Privilege Reduction
Authors:
David Cerdeira,
José Martins,
Nuno Santos,
Sandro Pinto
Abstract:
In TrustZone-assisted TEEs, the trusted OS has unrestricted access to both secure and normal world memory. Unfortunately, this architectural limitation has opened an aisle of exploration for attackers, who have demonstrated how to leverage a chain of exploits to hijack the trusted OS and gain full control of the system, targeting (i) the rich execution environment (REE), (ii) all trusted applications (TAs), and (iii) the secure monitor. In this paper, we propose ReZone. The main novelty behind the ReZone design relies on leveraging TrustZone-agnostic hardware primitives available on commercial off-the-shelf (COTS) platforms to restrict the privileges of the trusted OS. With ReZone, a monolithic TEE is restructured and partitioned into multiple sandboxed domains named zones, which only have access to private resources. We have fully implemented ReZone for the i.MX 8MQuad EVK and integrated it with Android OS and OP-TEE. We extensively evaluated ReZone using microbenchmarks and real-world applications. ReZone can sustain popular applications like DRM-protected video encoding with acceptable performance overheads. We have surveyed 80 CVE vulnerability reports and estimate that ReZone could mitigate 86.84% of them.
Submitted 2 March, 2022;
originally announced March 2022.
-
Learning to Merge Tokens in Vision Transformers
Authors:
Cedric Renggli,
André Susano Pinto,
Neil Houlsby,
Basil Mustafa,
Joan Puigcerver,
Carlos Riquelme
Abstract:
Transformers are widely applied to solve natural language understanding and computer vision tasks. While scaling up these architectures leads to improved performance, it often comes at the expense of much higher computational costs. In order for large-scale models to remain practical in real-world systems, there is a need for reducing their computational overhead. In this work, we present the PatchMerger, a simple module that reduces the number of patches or tokens the network has to process by merging them between two consecutive intermediate layers. We show that the PatchMerger achieves a significant speedup across various model sizes while matching the original performance both upstream and downstream after fine-tuning.
Submitted 24 February, 2022;
originally announced February 2022.
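The PatchMerger abstract above describes reducing the number of tokens by merging them between two layers, but it does not state the merging rule. The sketch below shows one plausible form, in which each output token is a softmax-weighted average of all input tokens. The function name `merge_tokens`, the `score_weights` matrix (standing in for learned parameters), and the toy values are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def merge_tokens(tokens: List[List[float]],
                 score_weights: List[List[float]]) -> List[List[float]]:
    """Merge N input tokens (N x D) into M output tokens (M x D).

    score_weights is a hypothetical M x D matrix; row m scores every input
    token, and output token m is the softmax-weighted average of the inputs.
    """
    dim = len(tokens[0])
    merged = []
    for w in score_weights:
        # One score per input token: dot product of the token with row w.
        scores = [sum(wi * ti for wi, ti in zip(w, tok)) for tok in tokens]
        attn = softmax(scores)  # normalized over the N input tokens
        merged.append([sum(a * tok[d] for a, tok in zip(attn, tokens))
                       for d in range(dim)])
    return merged


# Toy example: 4 input tokens of dimension 2 are merged into 2 output tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
score_weights = [[0.0, 0.0],    # uniform scores: output is the plain mean
                 [5.0, -5.0]]   # strongly favours the first input token
merged = merge_tokens(tokens, score_weights)
```

Because every later layer sees M tokens instead of N, the cost of those layers shrinks accordingly, which is the kind of saving the abstract refers to.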
-
Reconstructing social sensitivity from evolution of content volume in Twitter
Authors:
Sebastián Pinto,
Marcos Trevisan,
Pablo Balenzuela
Abstract:
We set up a simple mathematical model for the dynamics of public interest in terms of media coverage and social interactions. We test the model on a series of events related to violence in the US during 2020, using the volume of tweets and retweets as a proxy of public interest, and the volume of news as a proxy of media coverage. The model successfully fits the data and allows inferring a measure of social sensibility that correlates with human mobility data. These findings suggest the basic ingredients and mechanisms that regulate social responses capable of igniting social mobilizations.
Submitted 3 October, 2022; v1 submitted 21 December, 2021;
originally announced December 2021.
-
Study of Linear Precoding and Power Allocation for Large Multiple-Antenna Systems with Coarsely Quantized Signals
Authors:
S. F. Pinto,
R. C. de Lamare
Abstract:
This work studies coarse quantization-aware BD (${\scriptstyle\mathrm{CQA-BD}}$) and coarse quantization-aware RBD (${\scriptstyle\mathrm{CQA-RBD}}$) precoding algorithms for large-scale MU-MIMO systems with coarsely quantized signals and proposes the coarse-quantization most advantageous allocation strategy (${\scriptstyle\mathrm{CQA-MAAS}}$) power allocation algorithm for linearly-precoded MU-MIMO systems. An analysis of the sum-rate along with studies of computational complexity is also carried out. Finally, comparisons between existing precoding and its power allocated version are followed by conclusions.
Submitted 13 December, 2021;
originally announced December 2021.
-
Finding the Minimum Norm and Center Density of Cyclic Lattices via Nonlinear Systems
Authors:
William Lima da Silva Pinto,
Carina Alves
Abstract:
Lattices with a circulant generator matrix represent a subclass of cyclic lattices. This subclass can be described by a basis containing a vector and its circular shifts. In this paper, we present certain conditions under which the norm expression of an arbitrary vector of this type of lattice is substantially simplified, and then investigate some of the lattices obtained under these conditions. We exhibit systems of nonlinear equations whose solutions yield lattices as dense as $D_n$ in odd dimensions. As for even dimensions, we obtain lattices denser than $A_n$ as long as $n \in 2\mathbb{Z} \backslash 4\mathbb{Z}$.
Submitted 5 July, 2023; v1 submitted 15 November, 2021;
originally announced November 2021.
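The abstract above notes that a lattice with a circulant generator matrix has a basis made of one vector and its circular shifts. A minimal sketch of that construction follows. The helper names and the example vector are hypothetical, chosen only to make the structure concrete, and the code does not reproduce the paper's norm simplifications or density results.

```python
from typing import List


def circulant_basis(v: List[int]) -> List[List[int]]:
    """Return the rows v, shift(v), shift^2(v), ... (cyclic right shifts)."""
    n = len(v)
    return [[v[(j - k) % n] for j in range(n)] for k in range(n)]


def lattice_vector(coeffs: List[int], basis: List[List[int]]) -> List[int]:
    """Integer combination of the basis rows: sum_k coeffs[k] * basis[k]."""
    n = len(basis)
    return [sum(coeffs[k] * basis[k][j] for k in range(n)) for j in range(n)]


def squared_norm(x: List[int]) -> int:
    """Squared Euclidean norm of a vector."""
    return sum(c * c for c in x)


# Example (illustrative values only): generator vector (1, 2, 0).
basis = circulant_basis([1, 2, 0])
# basis rows: [1, 2, 0], [0, 1, 2], [2, 0, 1]

x = lattice_vector([1, -1, 0], basis)   # first row minus second row
# x is [1, 1, -2], with squared norm 6
```

Finding the minimum norm means minimizing `squared_norm` over all nonzero integer coefficient vectors, which is the quantity the paper's conditions are meant to simplify.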
-
Shifting Capsule Networks from the Cloud to the Deep Edge
Authors:
Miguel Costa,
Diogo Costa,
Tiago Gomes,
Sandro Pinto
Abstract:
Capsule networks (CapsNets) are an emerging trend in image processing. In contrast to a convolutional neural network, CapsNets are not vulnerable to object deformation, as the relative spatial information of the objects is preserved across the network. However, their complexity is mainly related to the capsule structure and the dynamic routing mechanism, which makes it almost unreasonable to deploy a CapsNet, in its original form, in a resource-constrained device powered by a small microcontroller (MCU). In an era where intelligence is rapidly shifting from the cloud to the edge, this high complexity imposes serious challenges to the adoption of CapsNets at the very edge. To tackle this issue, we present an API for the execution of quantized CapsNets in Arm Cortex-M and RISC-V MCUs. Our software kernels extend the Arm CMSIS-NN and RISC-V PULP-NN to support capsule operations with 8-bit integers as operands. Along with it, we propose a framework to perform post-training quantization of a CapsNet. Results show a reduction in memory footprint of almost 75%, with accuracy loss ranging from 0.07% to 0.18%. In terms of throughput, our Arm Cortex-M API enables the execution of primary capsule and capsule layers with medium-sized kernels in just 119.94 and 90.60 milliseconds (ms), respectively (STM32H755ZIT6U, Cortex-M7 @ 480 MHz). For the GAP-8 SoC (RISC-V RV32IMCXpulp @ 170 MHz), the latency drops to 7.02 and 38.03 ms, respectively.
Submitted 15 June, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
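The abstract above reports post-training quantization to 8-bit integers with a memory reduction of almost 75%. The sketch below illustrates why a figure near 75% is expected, under the assumption (not stated in the abstract) that the original weights are 32-bit floats. It uses a generic symmetric per-tensor scheme; the function names are hypothetical and this is not the paper's framework or the CMSIS-NN/PULP-NN API.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float values."""
    return [x * scale for x in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Storage comparison, assuming a float32 baseline (4 bytes per value)
# against int8 (1 byte per value).
float32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
reduction = 1 - int8_bytes / float32_bytes   # 0.75
```

In practice the saving falls slightly short of 75% because scale factors and any parameters left unquantized still need storage, which is consistent with the "almost 75%" wording.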
-
Study of Block Diagonalization Precoding and Power Allocation for Multiple-Antenna Systems with Coarsely Quantized Signals
Authors:
S. Pinto,
R. de Lamare
Abstract:
In this work, we present block diagonalization and power allocation algorithms for large-scale multiple-antenna systems with coarsely quantized signals. In particular, we develop Coarse Quantization-Aware Block Diagonalization ${\scriptstyle\mathrm{\left(CQA-BD\right)}}$ and Coarse Quantization-Aware Regularized Block Diagonalization ${\scriptstyle\mathrm{\left(CQA-RBD\right)}}$ precoding algorithms that employ the Bussgang decomposition and can mitigate the effects of low-resolution signals and interference. Moreover, we also devise the Coarse Quantization-Aware Most Advantageous Allocation Strategy ${\scriptstyle\mathrm{\left(CQA-MAAS\right)}}$ power allocation algorithm to improve the sum rate of precoders that operate with low-resolution signals. An analysis of the sum-rate performance is carried out along with computational complexity and power consumption studies of the proposed and existing techniques. Simulation results illustrate the performance of the proposed ${\scriptstyle\mathrm{CQA-BD}}$ and ${\scriptstyle\mathrm{CQA-RBD}}$ precoding algorithms, and the proposed ${\scriptstyle\mathrm{CQA-MAAS}}$ power allocation strategy against existing approaches.
Submitted 8 July, 2021;
originally announced July 2021.
-
Towards a Trusted Execution Environment via Reconfigurable FPGA
Authors:
Sérgio Pereira,
David Cerdeira,
Cristiano Rodrigues,
Sandro Pinto
Abstract:
Trusted Execution Environments (TEEs) are used to protect sensitive data and run secure execution for security-critical applications, by providing an environment isolated from the rest of the system. However, over the last few years, TEEs have been proven weak, as either TEEs built upon security-oriented hardware extensions (e.g., Arm TrustZone) or resorting to dedicated secure elements were exploited multiple times. In this project, we introduce Trusted Execution Environments On-Demand (TEEOD), a novel TEE design that leverages the programmable logic (PL) in the heterogeneous system on chips (SoC) as the secure execution environment. Unlike other TEE designs, TEEOD can provide high-bandwidth connections and physical on-chip isolation. We developed a proof-of-concept (PoC) implementation targeting an Ultra96-V2 platform. The conducted evaluation demonstrated TEEOD can host up to 6 simultaneous enclaves with a resource usage per enclave of 7.0%, 3.8%, and 15.3% of the total LUTs, FFs, and BRAMs, respectively. To demonstrate the practicability of TEEOD in real-world applications, we successfully ran a legacy open-source Bitcoin wallet.
Submitted 8 July, 2021;
originally announced July 2021.
-
Scaling Vision with Sparse Mixture of Experts
Authors:
Carlos Riquelme,
Joan Puigcerver,
Basil Mustafa,
Maxim Neumann,
Rodolphe Jenatton,
André Susano Pinto,
Daniel Keysers,
Neil Houlsby
Abstract:
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated excellent scalability in Natural Language Processing. In Computer Vision, however, almost all performant networks are "dense", that is, every input is processed by every parameter. We present a Vision MoE (V-MoE), a sparse version of the Vision Transformer, that is scalable and competitive with the largest dense networks. When applied to image recognition, V-MoE matches the performance of state-of-the-art networks, while requiring as little as half of the compute at inference time. Further, we propose an extension to the routing algorithm that can prioritize subsets of each input across the entire batch, leading to adaptive per-image compute. This allows V-MoE to trade-off performance and compute smoothly at test-time. Finally, we demonstrate the potential of V-MoE to scale vision models, and train a 15B parameter model that attains 90.35% on ImageNet.
Submitted 10 June, 2021;
originally announced June 2021.
-
A First Look at RISC-V Virtualization from an Embedded Systems Perspective
Authors:
Bruno Sá,
José Martins,
Sandro Pinto
Abstract:
This article describes the first public implementation and evaluation of the latest version of the RISC-V hypervisor extension (H-extension v0.6.1) specification in a Rocket chip core. To perform a meaningful evaluation for modern multi-core embedded and mixed-criticality systems, we have ported Bao, an open-source static partitioning hypervisor, to RISC-V. We have also extended the RISC-V platform-level interrupt controller (PLIC) to enable direct guest interrupt injection with low and deterministic latency and we have enhanced the timer infrastructure to avoid trap and emulation overheads. Experiments were carried out in FireSim, a cycle-accurate, FPGA-accelerated simulator, and the system was also successfully deployed and tested in a Zynq UltraScale+ MPSoC ZCU104. Our hardware implementation was open-sourced and is currently in use by the RISC-V community towards the ratification of the H-extension specification.
Submitted 16 August, 2021; v1 submitted 27 March, 2021;
originally announced March 2021.
-
uTango: an open-source TEE for IoT devices
Authors:
Daniel Oliveira,
Tiago Gomes,
Sandro Pinto
Abstract:
Security is one of the main challenges of the Internet of Things (IoT). IoT devices are mainly powered by low-cost microcontrollers (MCUs) that typically lack basic hardware security mechanisms to separate security-critical applications from less critical components. Recently, Arm has started to release Cortex-M MCUs enhanced with TrustZone technology (i.e., TrustZone-M), a system-wide security solution aiming at providing robust protection for IoT devices. Trusted Execution Environments (TEEs) relying on TrustZone hardware have been perceived as safe havens for securing mobile devices. However, for the past few years, considerable effort has gone into unveiling hundreds of vulnerabilities and proposing a collection of relevant defense techniques to address several issues. While new TEE solutions built on TrustZone-M are starting to flourish, the lessons gathered from the research community appear to be falling short, as these new systems are falling into the same pitfalls of the past.
In this paper, we present uTango, the first multi-world TEE for modern IoT devices. uTango proposes a novel architecture aiming at tackling the major architectural deficiencies currently affecting TrustZone(-M)-assisted TEEs. In particular, we leverage the very same TrustZone hardware primitives used by dual-world implementations to create multiple and equally secure execution environments within the normal world. We demonstrate the benefits of uTango by conducting an extensive evaluation on a real TrustZone-M hardware platform, i.e., Arm Musca-B1. uTango will be open-sourced and freely available on GitHub in hopes of engaging academia and industry on securing the foreseeable trillion IoT devices.
Submitted 16 February, 2022; v1 submitted 6 February, 2021;
originally announced February 2021.
-
Shift If You Can: Counting and Visualising Correction Operations for Beat Tracking Evaluation
Authors:
A. Sá Pinto,
I. Domingues,
M. E. P. Davies
Abstract:
In this late-breaking abstract we propose a modified approach for beat tracking evaluation which poses the problem in terms of the effort required to transform a sequence of beat detections such that they maximise the well-known F-measure calculation when compared to a sequence of ground truth annotations. Central to our approach is the inclusion of a shifting operation conducted over an additional, larger, tolerance window, which can substitute the combination of insertions and deletions. We describe a straightforward calculation of annotation efficiency and combine this with an informative visualisation which can be of use for the qualitative evaluation of beat tracking systems. We make our implementation and visualisation code freely available in a GitHub repository.
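For orientation, the F-measure that the abstract refers to can be sketched as below. This is a minimal illustration, not the authors' released code: the function name `beat_f_measure`, the 70 ms default tolerance, and the greedy nearest-match rule are all assumptions made for this sketch, and the shifting operation over the larger tolerance window is not modelled.

```python
def beat_f_measure(detections, annotations, tolerance=0.07):
    """F-measure between beat detections and annotations (times in seconds).

    Assumption: each annotation is matched to the closest unused detection
    within +/- tolerance; this greedy rule is illustrative only.
    """
    detections = sorted(detections)
    annotations = sorted(annotations)
    used = [False] * len(detections)
    hits = 0
    for a in annotations:
        best = None
        for i, d in enumerate(detections):
            if used[i] or abs(d - a) > tolerance:
                continue
            if best is None or abs(d - a) < abs(detections[best] - a):
                best = i
        if best is not None:
            used[best] = True
            hits += 1
    if hits == 0:
        return 0.0
    false_positives = len(detections) - hits   # detections with no annotation
    false_negatives = len(annotations) - hits  # annotations with no detection
    precision = hits / (hits + false_positives)
    recall = hits / (hits + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

In this framing, each unmatched detection would need a deletion and each unmatched annotation an insertion; the abstract's proposal is that one shift can stand in for such a deletion-insertion pair.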
Submitted 3 November, 2020;
originally announced November 2020.
-
Deep Ensembles for Low-Data Transfer Learning
Authors:
Basil Mustafa,
Carlos Riquelme,
Joan Puigcerver,
André Susano Pinto,
Daniel Keysers,
Neil Houlsby
Abstract:
In the low-data regime, it is difficult to train good supervised models from scratch. Instead practitioners turn to pre-trained models, leveraging transfer learning. Ensembling is an empirically and theoretically appealing way to construct powerful predictive models, but the predominant approach of training multiple deep networks with different random initialisations collides with the need for transfer via pre-trained weights. In this work, we study different ways of creating ensembles from pre-trained models. We show that the nature of pre-training itself is a performant source of diversity, and propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset. The approach is simple: Use nearest-neighbour accuracy to rank pre-trained models, fine-tune the best ones with a small hyperparameter sweep, and greedily construct an ensemble to minimise validation cross-entropy. When evaluated together with strong baselines on 19 different downstream tasks (the Visual Task Adaptation Benchmark), this achieves state-of-the-art performance at a much lower inference budget, even when selecting from over 2,000 pre-trained models. We also assess our ensembles on ImageNet variants and show improved robustness to distribution shift.
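The final step of the recipe, greedily building an ensemble to minimise validation cross-entropy, might look like the sketch below. It assumes each candidate model is represented by its predicted class probabilities on a validation set, that ensemble members are averaged uniformly, that a model is used at most once, and that selection stops when no candidate improves the loss. The names `greedy_ensemble` and `max_size` are hypothetical, and the ranking and fine-tuning steps are not shown.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def average(members):
    """Uniform average of per-example class probabilities across members."""
    n = len(members)
    return [
        [sum(m[i][c] for m in members) / n for c in range(len(members[0][i]))]
        for i in range(len(members[0]))
    ]


def greedy_ensemble(candidates, labels, max_size=3):
    """Add, one at a time, the candidate that most lowers validation loss."""
    chosen = []
    best_loss = float("inf")
    while len(chosen) < max_size:
        best_idx = None
        for idx in range(len(candidates)):
            if idx in chosen:
                continue
            trial = average([candidates[j] for j in chosen + [idx]])
            loss = cross_entropy(trial, labels)
            if loss < best_loss:
                best_loss, best_idx = loss, idx
        if best_idx is None:  # no remaining candidate improves the ensemble
            break
        chosen.append(best_idx)
    return chosen, best_loss
```

Stopping as soon as no candidate helps keeps the ensemble small, which is in the spirit of the lower inference budget the abstract emphasises.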
Submitted 19 October, 2020; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Which Model to Transfer? Finding the Needle in the Growing Haystack
Authors:
Cedric Renggli,
André Susano Pinto,
Luka Rimanic,
Joan Puigcerver,
Carlos Riquelme,
Ce Zhang,
Mario Lucic
Abstract:
Transfer learning has been recently popularized as a data-efficient alternative to training models from scratch, in particular for computer vision tasks where it provides a remarkably solid baseline. The emergence of rich model repositories, such as TensorFlow Hub, enables the practitioners and researchers to unleash the potential of these models across a wide range of downstream tasks. As these repositories keep growing exponentially, efficiently selecting a good model for the task at hand becomes paramount. We provide a formalization of this problem through a familiar notion of regret and introduce the predominant strategies, namely task-agnostic (e.g. ranking models by their ImageNet performance) and task-aware search strategies (such as linear or kNN evaluation). We conduct a large-scale empirical study and show that both task-agnostic and task-aware methods can yield high regret. We then propose a simple and computationally efficient hybrid search strategy which outperforms the existing approaches. We highlight the practical benefits of the proposed solution on a set of 19 diverse vision tasks.
Submitted 25 March, 2022; v1 submitted 13 October, 2020;
originally announced October 2020.
-
Training general representations for remote sensing using in-domain knowledge
Authors:
Maxim Neumann,
André Susano Pinto,
Xiaohua Zhai,
Neil Houlsby
Abstract:
Automatically finding good and general remote sensing representations enables transfer learning on a wide range of applications, improving accuracy and reducing the required number of training samples. This paper investigates the development of generic remote sensing representations, and explores which characteristics are important for a dataset to be a good source for representation learning. For this analysis, five diverse remote sensing datasets are selected and used for both disjoint upstream representation learning and downstream model training and evaluation. A common evaluation protocol is used to establish baselines for these datasets that achieve state-of-the-art performance. As the results indicate, especially with a low number of available training samples, a significant performance enhancement can be observed when additionally including in-domain data, in comparison to training models from scratch or fine-tuning only on ImageNet (up to 11% and 40%, respectively, at 100 training samples). All datasets and pretrained representation models are published online.
Submitted 30 September, 2020;
originally announced October 2020.
-
Scalable Transfer Learning with Expert Models
Authors:
Joan Puigcerver,
Carlos Riquelme,
Basil Mustafa,
Cedric Renggli,
André Susano Pinto,
Sylvain Gelly,
Daniel Keysers,
Neil Houlsby
Abstract:
Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple, yet effective, strategy. We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task. This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer. Accordingly, it requires little extra compute per target task, and results in a speed-up of 2-3 orders of magnitude compared to competing approaches. Further, we provide an adapter-based architecture able to compress many experts into a single model. We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases.
Submitted 28 September, 2020;
originally announced September 2020.
-
Study of Coarse Quantization-Aware Block Diagonalization Algorithms for MIMO Systems with Low Resolution
Authors:
S. B. Pinto,
R. C. de Lamare
Abstract:
It is known that the estimated energy consumption of digital-to-analog converters (DACs) is around 30% of the energy consumed by analog-to-digital converters (ADCs), keeping the sampling rate and bit resolution fixed. Assuming that, similarly to ADCs, DAC dissipation doubles with every extra bit of resolution, a decrease of two resolution bits, for instance from 4 to 2 bits, represents a 75% lower dissipation. The current limitations in sum-rates of 1-bit quantization have motivated researchers to consider extra bits in resolution to obtain higher levels of sum-rates. Following this, we devise coarse quantization-aware precoding using few bits for the broadcast channel of multiple-antenna systems based on the Bussgang theorem. In particular, we consider block diagonalization algorithms, which have not been considered in the literature so far. The sum-rates achieved by the proposed Coarse Quantization-Aware Block Diagonalization (CQA-BD) and its regularized version (CQA-RBD) are superior to those previously reported in the literature. Simulations illustrate the performance of the proposed CQA-BD and CQA-RBD algorithms against existing approaches.
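The 75% figure follows directly from the doubling assumption stated in the abstract: if dissipation scales as 2 to the power of the number of bits, removing two bits divides it by four. A small worked check, with the hypothetical helper name `relative_dissipation`:

```python
def relative_dissipation(bits_from, bits_to):
    """Ratio of dissipation after changing resolution, assuming power ~ 2**bits."""
    return 2.0 ** (bits_to - bits_from)


ratio = relative_dissipation(4, 2)   # 2**(-2) = 0.25 of the original dissipation
saving_percent = (1.0 - ratio) * 100  # 75.0 percent lower
print(ratio, saving_percent)
```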
Submitted 22 February, 2020;
originally announced February 2020.
-
In-domain representation learning for remote sensing
Authors:
Maxim Neumann,
Andre Susano Pinto,
Xiaohua Zhai,
Neil Houlsby
Abstract:
Given the importance of remote sensing, surprisingly little attention has been paid to it by the representation learning community. To address it and to establish baselines and a common evaluation protocol in this domain, we provide simplified access to 5 diverse remote sensing datasets in a standardized form. Specifically, we investigate in-domain representation learning to develop generic remote sensing representations and explore which characteristics are important for a dataset to be a good source for remote sensing representation learning. The established baselines achieve state-of-the-art performance on these datasets.
Submitted 15 November, 2019;
originally announced November 2019.
-
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
Authors:
Xiaohua Zhai,
Joan Puigcerver,
Alexander Kolesnikov,
Pierre Ruyssen,
Carlos Riquelme,
Mario Lucic,
Josip Djolonga,
Andre Susano Pinto,
Maxim Neumann,
Alexey Dosovitskiy,
Lucas Beyer,
Olivier Bachem,
Michael Tschannen,
Marcin Michalski,
Olivier Bousquet,
Sylvain Gelly,
Neil Houlsby
Abstract:
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, reconstruction error). We present the Visual Task Adaptation Benchmark (VTAB), which defines good representations as those that adapt to diverse, unseen tasks with few examples. With VTAB, we conduct a large-scale study of many popular publicly-available representation learning algorithms. We carefully control confounders such as architecture and tuning budget. We address questions like: How effective are ImageNet representations beyond standard natural datasets? How do representations trained via generative and discriminative models compare? To what extent can self-supervision replace labels? And, how close are we to general visual representations?
Submitted 21 February, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
-
Analyzing Mass Media influence using natural language processing and time series analysis
Authors:
Federico Albanese,
Sebastián Pinto,
Viktoriya Semeshenko,
Pablo Balenzuela
Abstract:
A key question of collective social behavior is related to the influence of Mass Media on public opinion. Different approaches have been developed to address this issue quantitatively, ranging from field experiments to mathematical models. In this work we propose a combination of tools involving natural language processing and time series analysis. We compare selected features of mass media news articles with measurable manifestations of public opinion. We apply our analysis to news articles belonging to the 2016 U.S. presidential campaign. We compare variations in polls (as a proxy of public opinion) with changes in the connotation of the news (sentiment) or in the agenda (topics) of a selected group of media outlets. Our results suggest that the sentiment content by itself is not enough to understand the differences in polls, but the combination of topic coverage and sentiment content provides a useful insight into the context in which public opinion varies. The methodology employed in this work is fairly general and can be easily extended to other topics of interest.
Submitted 12 June, 2020; v1 submitted 6 September, 2019;
originally announced September 2019.
-
Predicting assisted ventilation in Amyotrophic Lateral Sclerosis using a mixture of experts and conformal predictors
Authors:
Telma Pereira,
Sofia Pires,
Marta Gromicho,
Susana Pinto,
Mamede de Carvalho,
Sara C. Madeira
Abstract:
Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease characterized by a rapid motor decline, leading to respiratory failure and subsequently to death. In this context, researchers have sought models to automatically predict disease progression to assisted ventilation in ALS patients. However, the clinical translation of such models is limited by the lack of insight 1) on the risk of error for predictions at patient-level, and 2) on the most adequate time to administer the non-invasive ventilation. To address these issues, we combine Conformal Prediction (a machine learning framework that complements predictions with confidence measures) and a mixture of experts into a prognostic model which not only predicts whether an ALS patient will suffer from respiratory insufficiency but also the most likely time window of occurrence, at a given reliability level. Promising results were obtained, with nearly 80% of predictions being correctly identified.
Submitted 30 July, 2019;
originally announced July 2019.
-
Well-Rounded Lattices via Polynomials
Authors:
Carina Alves,
William Lima da Silva Pinto,
Antonio Aparecido de Andrade
Abstract:
Well-rounded lattices have been a topic of recent studies with applications in wiretap channels and in cryptography. A lattice of full rank in Euclidean space is called well-rounded if its set of minimal vectors spans the whole space. In this paper, we investigate when lattices coming from polynomials with integer coefficients are well-rounded.
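The definition above can be checked numerically in a toy case. The sketch below is restricted to rank-2 lattices with integer basis vectors and finds minimal vectors by brute force over a bounded coefficient range; the bound, the function names, and the restriction to two dimensions are all assumptions for illustration and are unrelated to the polynomial construction studied in the paper.

```python
from itertools import product


def minimal_vectors(basis, bound=3):
    """Shortest nonzero lattice vectors m*a + n*b with |m|, |n| <= bound.

    Assumption: the basis is short enough that all minimal vectors have
    coefficients within the bound.
    """
    (a1, a2), (b1, b2) = basis
    best, vectors = None, []
    for m, n in product(range(-bound, bound + 1), repeat=2):
        if m == 0 and n == 0:
            continue
        v = (m * a1 + n * b1, m * a2 + n * b2)
        norm2 = v[0] ** 2 + v[1] ** 2
        if best is None or norm2 < best:
            best, vectors = norm2, [v]
        elif norm2 == best:
            vectors.append(v)
    return vectors


def is_well_rounded_2d(basis, bound=3):
    """True if two minimal vectors are linearly independent (span the plane)."""
    vectors = minimal_vectors(basis, bound)
    return any(u[0] * w[1] - u[1] * w[0] != 0 for u in vectors for w in vectors)
```

For example, the square lattice with basis (1, 0), (0, 1) is well-rounded, while stretching one axis to (0, 2) leaves only (1, 0) and (-1, 0) as minimal vectors, which span a line rather than the plane.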
Submitted 6 April, 2019;
originally announced April 2019.
-
Direction Finding Based on Multi-Step Knowledge-Aided Iterative Conjugate Gradient Algorithms
Authors:
S. Pinto,
R. C. de Lamare
Abstract:
In this work, we present direction-of-arrival (DoA) estimation algorithms based on the Krylov subspace that effectively exploit prior knowledge of the signals that impinge on a sensor array. The proposed multi-step knowledge-aided iterative conjugate gradient (CG) (MS-KAI-CG) algorithms perform subtraction of the unwanted terms found in the estimated covariance matrix of the sensor data. Furthermore, we develop a version of MS-KAI-CG equipped with forward-backward averaging, called MS-KAI-CG-FB, which is appropriate for scenarios with correlated signals. Unlike current knowledge-aided methods, which take advantage of known DoAs to enhance the estimation of the covariance matrix of the input data, the MS-KAI-CG algorithms take advantage of the knowledge of the structure of the forward-backward smoothed covariance matrix and its disturbance terms. Simulations with both uncorrelated and correlated signals show that the MS-KAI-CG algorithms outperform existing techniques.
Submitted 15 December, 2018;
originally announced December 2018.
-
Study of Multi-Step Knowledge-Aided Iterative Nested MUSIC for Direction Finding
Authors:
S. Pinto,
R. C. de Lamare
Abstract:
In this work, we propose a subspace-based algorithm for direction-of-arrival (DOA) estimation applied to the signals impinging on a two-level nested array, referred to as the multi-step knowledge-aided iterative nested MUSIC method (MS-KAI-Nested-MUSIC), which significantly improves the accuracy of the original Nested-MUSIC. Unlike existing knowledge-aided methods applied to uniform linear arrays (ULAs), which make use of available known DOAs to improve the estimation of the covariance matrix of the input data, the proposed MS-KAI-Nested-MUSIC employs knowledge of the structure of the augmented sample covariance matrix, which is obtained by exploiting the difference co-array structure covariance matrix, and its perturbation terms, together with the gradual incorporation of prior knowledge, which is obtained online. The effectiveness of the proposed technique is shown by simulations focusing on uncorrelated closely-spaced sources.
Submitted 18 November, 2018;
originally announced November 2018.