Showing 1–19 of 19 results for author: Lisanti, G

Searching in archive cs.
  1. arXiv:2506.21549  [pdf, ps, other]

    cs.CV

    SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark

    Authors: Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella, Matteo Ragaglia, Alessandro Oliva, Giuseppe Lisanti, Luigi Di Stefano

    Abstract: We propose SiM3D, the first benchmark considering the integration of multiview and multimodal information for comprehensive 3D anomaly detection and segmentation (ADS), where the task is to produce a voxel-based Anomaly Volume. Moreover, SiM3D focuses on a scenario of high interest in manufacturing: single-instance anomaly detection, where only one object, either real or synthetic, is available fo…

    Submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2505.22486  [pdf, other]

    cs.LG cs.CV

    Understanding Adversarial Training with Energy-based Models

    Authors: Mujtaba Hussain Mirza, Maria Rosaria Briglia, Filippo Bartolucci, Senad Beadini, Giuseppe Lisanti, Iacopo Masi

    Abstract: We aim at using Energy-based Model (EBM) framework to better understand adversarial training (AT) in classifiers, and additionally to analyze the intrinsic generative capabilities of robust classifiers. By viewing standard classifiers through an energy lens, we begin by analyzing how the energies of adversarial examples, generated by various attacks, differ from those of the natural samples. The c…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Under review for TPAMI

  3. arXiv:2505.21742  [pdf, ps, other]

    cs.CV cs.LG

    What is Adversarial Training for Diffusion Models?

    Authors: Briglia Maria Rosaria, Mujtaba Hussain Mirza, Giuseppe Lisanti, Iacopo Masi

    Abstract: We answer the question in the title, showing that adversarial training (AT) for diffusion models (DMs) fundamentally differs from classifiers: while AT in classifiers enforces output invariance, AT in DMs requires equivariance to keep the diffusion process aligned with the data distribution. AT is a way to enforce smoothness in the diffusion flow, improving robustness to outliers and corrupted dat…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 40 pages

  4. arXiv:2504.13995  [pdf, other]

    cs.CV

    Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training

    Authors: Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano

    Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have shown remarkable capabilities in understanding both images and 3D data, yet these modalities face inherent limitations in comprehensively representing object geometry and appearance. Neural Radiance Fields (NeRFs) have emerged as a promising alternative, encoding both geometric and photorealistic properties within the weights of a si…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: Under submission. Project page at https://andreamaduzzi.github.io/llana/

  5. arXiv:2409.17941  [pdf, other]

    cs.CV

    Perturb, Attend, Detect and Localize (PADL): Robust Proactive Image Defense

    Authors: Filippo Bartolucci, Iacopo Masi, Giuseppe Lisanti

    Abstract: Image manipulation detection and localization have received considerable attention from the research community given the blooming of Generative Models (GMs). Detection methods that follow a passive approach may overfit to specific GMs, limiting their application in real-world scenarios, due to the growing diversity of generative models. Recently, approaches based on a proactive framework have show…

    Submitted 26 September, 2024; originally announced September 2024.

  6. Learning to Be a Transformer to Pinpoint Anomalies

    Authors: Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano

    Abstract: To efficiently deploy strong, often pre-trained feature extractors, recent Industrial Anomaly Detection and Segmentation (IADS) methods process low-resolution images, e.g., 224x224 pixels, obtained by downsampling the original input images. However, while numerous industrial applications demand the identification of both large and small defects, downsampling the input image to a low resolution may…

    Submitted 26 June, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted at IEEE Access

  7. arXiv:2406.11840  [pdf, other]

    cs.CV

    LLaNA: Large Language and NeRF Assistant

    Authors: Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated an excellent understanding of images and 3D data. However, both modalities have shortcomings in holistically capturing the appearance and geometry of objects. Meanwhile, Neural Radiance Fields (NeRFs), which encode information within the weights of a simple Multi-Layer Perceptron (MLP), have emerged as an increasingly widespread modality t…

    Submitted 22 November, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review. Project page: https://andreamaduzzi.github.io/llana/

  8. arXiv:2404.03743  [pdf, other]

    cs.CV

    Test Time Training for Industrial Anomaly Segmentation

    Authors: Alex Costanzino, Pierluigi Zama Ramirez, Mirko Del Moro, Agostino Aiezzo, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano

    Abstract: Anomaly Detection and Segmentation (AD&S) is crucial for industrial quality control. While existing methods excel in generating anomaly scores for each pixel, practical applications require producing a binary segmentation to identify anomalies. Due to the absence of labeled anomalies in many real scenarios, standard practices binarize these maps based on some statistics derived from a validation s…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted at VAND 2.0, CVPRW 2024

  9. arXiv:2312.04521  [pdf, other]

    cs.CV

    Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping

    Authors: Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano

    Abstract: The paper explores the industrial multimodal Anomaly Detection (AD) task, which exploits point clouds and RGB images to localize anomalies. We introduce a novel light and fast framework that learns to map features from one modality to the other on nominal samples. At test time, anomalies are detected by pinpointing inconsistencies between observed and mapped features. Extensive experiments show th…

    Submitted 8 July, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted at CVPR 2024

  10. arXiv:2309.07917  [pdf, other]

    cs.CV

    Looking at words and points with attention: a benchmark for text-to-shape coherence

    Authors: Andrea Amaduzzi, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano

    Abstract: While text-conditional 3D object generation and manipulation have seen rapid progress, the evaluation of coherence between generated 3D shapes and input textual descriptions lacks a clear benchmark. The reason is twofold: a) the low quality of the textual descriptions in the only publicly available dataset of text-shape pairs; b) the limited effectiveness of the metrics used to quantitatively asse…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: ICCV 2023 Workshop "AI for 3D Content Creation", Project page: https://cvlab-unibo.github.io/CrossCoherence-Web/, 26 pages

  11. Semantic Image Synthesis via Class-Adaptive Cross-Attention

    Authors: Tomaso Fontanini, Claudio Ferrari, Giuseppe Lisanti, Massimo Bertozzi, Andrea Prati

    Abstract: In semantic image synthesis the state of the art is dominated by methods that use customized variants of the SPatially-Adaptive DE-normalization (SPADE) layers, which allow for good visual generation quality and editing versatility. By design, such layers learn pixel-wise modulation parameters to de-normalize the generator activations based on the semantic class each pixel belongs to. Thus, they t…

    Submitted 30 July, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Code and models available at https://github.com/TFonta/CA2SIS. The paper is under consideration at Computer Vision and Image Understanding

  12. arXiv:2306.00914  [pdf, other]

    cs.CV

    Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation

    Authors: Nico Giambi, Giuseppe Lisanti

    Abstract: Deep generative models have shown impressive results in generating realistic images of faces. GANs managed to generate high-quality, high-fidelity images when conditioned on semantic masks, but they still lack the ability to diversify their output. Diffusion models partially solve this problem and are able to generate diverse samples given the same condition. In this paper, we propose a multi-cond…

    Submitted 27 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: The paper is under consideration at Computer Vision and Image Understanding

  13. Multimodal Side-Tuning for Document Classification

    Authors: Stefano Pio Zingaro, Giuseppe Lisanti, Maurizio Gabbrielli

    Abstract: In this paper, we propose to exploit the side-tuning framework for multimodal document classification. Side-tuning is a methodology for network adaptation recently introduced to solve some of the problems related to previous approaches. Thanks to this technique it is actually possible to overcome model rigidity and catastrophic forgetting of transfer learning by fine-tuning. The proposed solution…

    Submitted 23 January, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

    Comments: 2020 25th International Conference on Pattern Recognition (ICPR)

  14. arXiv:2012.01955  [pdf, other]

    cs.CV cs.CY cs.MM

    IMAGO: A family photo album dataset for a socio-historical analysis of the twentieth century

    Authors: Lorenzo Stacchio, Alessia Angeli, Giuseppe Lisanti, Daniela Calanca, Gustavo Marfia

    Abstract: Although one of the most popular practices in photography since the end of the 19th century, an increase in scholarly interest in family photo albums dates back to the early 1980s. Such collections of photos may reveal sociological and historical insights regarding specific cultures and times. They are, however, in most cases scattered among private homes and only available on paper or photographi…

    Submitted 3 December, 2020; originally announced December 2020.

  15. arXiv:1707.09173  [pdf, other]

    cs.CV

    Group Re-Identification via Unsupervised Transfer of Sparse Features Encoding

    Authors: Giuseppe Lisanti, Niki Martinel, Alberto Del Bimbo, Gian Luca Foresti

    Abstract: Person re-identification is best known as the problem of associating a single person that is observed from one or more disjoint cameras. The existing literature has mainly addressed such an issue, neglecting the fact that people usually move in groups, like in crowded scenarios. We believe that the additional information carried by neighboring individuals provides a relevant visual context that ca…

    Submitted 28 July, 2017; originally announced July 2017.

    Comments: This paper has been accepted for publication at ICCV 2017

  16. arXiv:1705.02503  [pdf, other]

    cs.CV

    Context-Aware Trajectory Prediction

    Authors: Federico Bartoli, Giuseppe Lisanti, Lamberto Ballan, Alberto Del Bimbo

    Abstract: Human motion and behaviour in crowded spaces is influenced by several factors, such as the dynamics of other moving agents in the scene, as well as the static elements that might be perceived as points of attraction or obstacles. In this work, we present a new model for human trajectory prediction which is able to take advantage of both human-human and human-space interactions. The future trajecto…

    Submitted 6 May, 2017; originally announced May 2017.

    Comments: Submitted to BMVC 2017

  17. Multi Channel-Kernel Canonical Correlation Analysis for Cross-View Person Re-Identification

    Authors: Giuseppe Lisanti, Svebor Karaman, Iacopo Masi

    Abstract: In this paper we introduce a method to overcome one of the main challenges of person re-identification in multi-camera networks, namely cross-view appearance changes. The proposed solution addresses the extreme variability of person appearance in different camera views by exploiting multiple feature representations. For each feature, Kernel Canonical Correlation Analysis (KCCA) with different kern…

    Submitted 21 March, 2017; v1 submitted 7 July, 2016; originally announced July 2016.

    Comments: The latest/updated version of the manuscript with more experiments can be found at https://doi.org/10.1145/3038916. Please cite the paper using https://doi.org/10.1145/3038916

    Journal ref: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 13 Issue 2, March 2017

  18. arXiv:1507.07815  [pdf, other]

    cs.CV

    A Multi-Camera Image Processing and Visualization System for Train Safety Assessment

    Authors: Giuseppe Lisanti, Svebor Karaman, Daniele Pezzatini, Alberto Del Bimbo

    Abstract: In this paper we present a machine vision system to efficiently monitor, analyze and present visual data acquired with a railway overhead gantry equipped with multiple cameras. This solution aims to improve the safety of daily life railway transportation in a two-fold manner: (1) by providing automatic algorithms that can process large imagery of trains (2) by helping train operators to keep atte…

    Submitted 28 July, 2015; originally announced July 2015.

    Comments: 11 pages

  19. arXiv:1401.6606  [pdf, other]

    cs.CV

    Continuous Localization and Mapping of a Pan Tilt Zoom Camera for Wide Area Tracking

    Authors: Giuseppe Lisanti, Iacopo Masi, Federico Pernici, Alberto Del Bimbo

    Abstract: Pan-tilt-zoom (PTZ) cameras are powerful to support object identification and recognition in far-field scenes. However, the effective use of PTZ cameras in real contexts is complicated by the fact that a continuous on-line camera calibration is needed and the absolute pan, tilt and zoom positional values provided by the camera actuators cannot be used because are not synchronized with the video st…

    Submitted 23 March, 2015; v1 submitted 25 January, 2014; originally announced January 2014.