Search | arXiv e-print repository

SplatVoxel: History-Aware Novel View Streaming without Temporal Training

Authors: Yiming Wang, Lucy Chai, Xuan Luo, Michael Niemeyer, Manuel Lagunas, Stephen Lombardi, Siyu Tang, Tiancheng Sun

Abstract: We study the problem of novel view streaming from sparse-view videos, which aims to generate a continuous sequence of high-quality, temporally consistent novel views as new input frames arrive. However, existing novel view synthesis methods struggle with temporal coherence and visual fidelity, leading to flickering and inconsistency. To address these challenges, we introduce history-awareness, leveraging previous frames to reconstruct the scene and improve quality and stability. We propose a hybrid splat-voxel feed-forward scene reconstruction approach that combines Gaussian Splatting to propagate information over time, with a hierarchical voxel grid for temporal fusion. Gaussian primitives are efficiently warped over time using a motion graph that extends 2D tracking models to 3D motion, while a sparse voxel transformer integrates new temporal observations in an error-aware manner. Crucially, our method does not require training on multi-view video datasets, which are currently limited in size and diversity, and can be directly applied to sparse-view video streams in a history-aware manner at inference time. Our approach achieves state-of-the-art performance in both static and streaming scene reconstruction, effectively reducing temporal and visual artifacts while running at interactive rates (15 fps with 350 ms delay) on a single H100 GPU. Project Page: https://19reborn.github.io/SplatVoxel/

Submitted 18 March, 2025; originally announced March 2025.

arXiv:2502.19014 [pdf, other]

Robust Over-the-Air Computation with Type-Based Multiple Access

Authors: Marc Martinez-Gost, Ana Pérez-Neira, Miguel Ángel Lagunas

Abstract: This paper utilizes the properties of type-based multiple access (TBMA) to investigate its effectiveness as a robust approach for over-the-air computation (AirComp) in the presence of Byzantine attacks, that is, adversarial strategies where malicious nodes intentionally distort their transmissions to corrupt the aggregated result. Unlike classical direct aggregation (DA) AirComp, which aggregates data in the amplitude of the signals and is highly vulnerable to attacks, TBMA distributes data over multiple radio resources, enabling the receiver to construct a histogram representation of the transmitted data. This structure allows the integration of classical robust estimators and supports the computation of diverse functions beyond the arithmetic mean, which is not feasible with DA. Through extensive simulations, we demonstrate that robust TBMA significantly outperforms DA, maintaining high accuracy even under adversarial conditions, and showcase its applicability in federated learning (FEEL) scenarios. Additionally, TBMA reduces channel state information (CSI) requirements, lowers energy consumption, and enhances resiliency by leveraging the diversity of the transmitted data. These results establish TBMA as a scalable and robust solution for AirComp, paving the way for secure and efficient aggregation in next-generation networks.

Submitted 26 February, 2025; originally announced February 2025.

Comments: Paper submitted to 33rd European Signal Processing Conference (EUSIPCO 2025)

arXiv:2309.00530 [pdf, other]

doi 10.1109/CSCC58962.2023.00043

Adaptive function approximation based on the Discrete Cosine Transform (DCT)

Authors: Ana I. Pérez-Neira, Marc Martinez-Gost, Miguel Ángel Lagunas

Abstract: This paper studies the cosine as a basis function for the approximation of univariate and continuous functions without memory. This work studies a supervised learning approach to obtain the approximation coefficients, instead of using the Discrete Cosine Transform (DCT). Due to the finite dynamics and orthogonality of the cosine basis functions, simple gradient algorithms, such as the Normalized Least Mean Squares (NLMS), can benefit from them and present a controlled and predictable convergence time and error misadjustment. Due to its simplicity, the proposed technique ranks as the best in terms of learning quality versus complexity, and it is presented as an attractive technique to be used in more complex supervised learning systems. Simulations illustrate the performance of the approach. This paper celebrates the 50th anniversary of the publication of the DCT by Nasir Ahmed in 1973.

Submitted 1 September, 2023; originally announced September 2023.

Comments: Accepted paper in 26th International Conference on Circuits, Systems, Communications and Computers (CSCC)

arXiv:2307.00673 [pdf, other]

doi 10.1109/JSTSP.2024.3361154

ENN: A Neural Network with DCT Adaptive Activation Functions

Authors: Marc Martinez-Gost, Ana Pérez-Neira, Miguel Ángel Lagunas

Abstract: The expressiveness of neural networks highly depends on the nature of the activation function, although these are usually assumed predefined and fixed during the training stage. From a signal processing perspective, in this paper we present Expressive Neural Network (ENN), a novel model in which the non-linear activation functions are modeled using the Discrete Cosine Transform (DCT) and adapted using backpropagation during training. This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks. This is the first non-linear model for activation functions that relies on a signal processing perspective, providing high flexibility and expressiveness to the network. We contribute insights into the explainability of the network at convergence by recovering the concept of bump, that is, the response of each activation function in the output space. Finally, through exhaustive experiments we show that the model can adapt to classification and regression tasks. ENN outperforms state-of-the-art benchmarks, providing a gap of more than 40% in accuracy in some scenarios.

Submitted 30 January, 2024; v1 submitted 2 July, 2023; originally announced July 2023.

Comments: Paper accepted in IEEE Journal of Selected Topics in Signal Processing (JSTSP) Special Series on AI in Signal & Data Science - Toward Explainable, Reliable, and Sustainable Machine Learning

arXiv:2305.10018 [pdf, other]

Transfer Learning for Fine-grained Classification Using Semi-supervised Learning and Visual Transformers

Authors: Manuel Lagunas, Brayan Impata, Victor Martinez, Virginia Fernandez, Christos Georgakis, Sofia Braun, Felipe Bertrand

Abstract: Fine-grained classification is a challenging task that involves identifying subtle differences between objects within the same category. This task is particularly challenging in scenarios where data is scarce. Visual transformers (ViT) have recently emerged as a powerful tool for image classification, due to their ability to learn highly expressive representations of visual data using self-attention mechanisms. In this work, we explore Semi-ViT, a ViT model fine-tuned using semi-supervised learning techniques, suitable for situations where annotated data is lacking. This is particularly common in e-commerce, where images are readily available but labels are noisy, nonexistent, or expensive to obtain. Our results demonstrate that Semi-ViT outperforms traditional convolutional neural networks (CNN) and ViTs, even when fine-tuned with limited annotated data. These findings indicate that Semi-ViTs hold significant promise for applications that require precise and fine-grained classification of visual data.

Submitted 17 May, 2023; originally announced May 2023.

Comments: 6 pages, 1 figure, 3 tables

arXiv:2302.03619 [pdf, ps, other]

In-the-wild Material Appearance Editing using Perceptual Attributes

Authors: J. Daniel Subias, Manuel Lagunas

Abstract: Intuitively editing the appearance of materials from a single image is a challenging task given the complexity of the interactions between light and matter, and the ambivalence of human perception. This problem has been traditionally addressed by estimating additional factors of the scene like geometry or illumination, thus solving an inverse rendering problem and making the final quality of the results dependent on the quality of these estimations. We present a single-image appearance editing framework that allows us to intuitively modify the material appearance of an object by increasing or decreasing high-level perceptual attributes describing such appearance (e.g., glossy or metallic). Our framework takes as input an in-the-wild image of a single object, where geometry, material, and illumination are not controlled, and inverse rendering is not required. We rely on generative models and devise a novel architecture with Selective Transfer Unit (STU) cells that preserve the high-frequency details from the input image in the edited one. To train our framework we leverage a dataset with pairs of synthetic images rendered with physically-based algorithms, and the corresponding crowd-sourced ratings of high-level perceptual attributes. We show that our material editing framework outperforms the state of the art, and showcase its applicability on synthetic images, in-the-wild real-world photographs, and video sequences.

Submitted 13 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 13 pages, 16 figures. Camera-ready version accepted at Eurographics 2023

arXiv:2107.07259 [pdf, other]

doi 10.2312/sr.20211300

Single-image Full-body Human Relighting

Authors: Manuel Lagunas, Xin Sun, Jimei Yang, Ruben Villegas, Jianming Zhang, Zhixin Shu, Belen Masia, Diego Gutierrez

Abstract: We present a single-image data-driven method to automatically relight images with full-body humans in them. Our framework is based on a realistic scene decomposition leveraging precomputed radiance transfer (PRT) and spherical harmonics (SH) lighting. In contrast to previous work, we lift the assumptions on Lambertian materials and explicitly model diffuse and specular reflectance in our data. Moreover, we introduce an additional light-dependent residual term that accounts for errors in the PRT-based image reconstruction. We propose a new deep learning architecture, tailored to the decomposition performed in PRT, that is trained using a combination of L1, logarithmic, and rendering losses. Our model outperforms the state of the art for full-body human relighting both with synthetic images and photographs.

Submitted 15 July, 2021; originally announced July 2021.

Comments: 11 pages, 12 figures

Journal ref: Eurographics Symposium on Rendering (EGSR), 2021

arXiv:2101.02496 [pdf, other]

doi 10.1167/jov.21.2.2

The joint role of geometry and illumination on material recognition

Authors: Manuel Lagunas, Ana Serrano, Diego Gutierrez, Belen Masia

Abstract: Observing and recognizing materials is a fundamental part of our daily life. Under typical viewing conditions, we are capable of effortlessly identifying the objects that surround us and recognizing the materials they are made of. Nevertheless, understanding the underlying perceptual processes that take place to accurately discern the visual properties of an object is a long-standing problem. In this work, we perform a comprehensive and systematic analysis of how the interplay of geometry, illumination, and their spatial frequencies affects human performance on material recognition tasks. We carry out large-scale behavioral experiments where participants are asked to recognize different reference materials among a pool of candidate samples. In the different experiments, we carefully sample the information in the frequency domain of the stimuli. From our analysis, we find significant first-order interactions between the geometry and the illumination, of both the reference and the candidates. In addition, we observe that simple image statistics and higher-order image histograms do not correlate with human performance. Therefore, we perform a high-level comparison of highly non-linear statistics by training a deep neural network on material recognition tasks. Our results show that such models can accurately classify materials, which suggests that they are capable of defining a meaningful representation of material appearance from labeled proximal image data. Last, we find preliminary evidence that these highly non-linear models and humans may use similar high-level factors for material recognition tasks.

Submitted 4 February, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

Comments: 15 pages, 16 figures, Accepted to the Journal of Vision, 2021

Journal ref: Journal of Vision February 2021, Vol.21, 2

arXiv:2007.13679 [pdf]

Llums que no són només llums

Authors: Jorge Baranda, Pol Henarejos, Ciprian George Gavrincea, Miguel Ángel Lagunas

Abstract: Visible Light Communications (VLC) is a new paradigm in wireless communications. The characteristics of this technology, which uses light-emitting diode-based lighting devices as transmitting elements, allow it to be considered a complement to current wireless radio communication systems.

Submitted 15 July, 2020; originally announced July 2020.

Comments: 4 pages, in Catalan

Journal ref: Telecos.cat, no. 59, vol. 6, 2013

arXiv:1905.01562 [pdf, other]

doi 10.1145/3306346.3323036

A Similarity Measure for Material Appearance

Authors: Manuel Lagunas, Sandra Malpica, Ana Serrano, Elena Garces, Diego Gutierrez, Belen Masia

Abstract: We present a model to measure the similarity in appearance between different materials, which correlates with human similarity judgments. We first create a database of 9,000 rendered images depicting objects with varying materials, shape and illumination. We then gather data on perceived similarity from crowdsourced experiments; our analysis of over 114,840 answers suggests that indeed a shared perception of appearance similarity exists. We feed this data to a deep learning architecture with a novel loss function, which learns a feature space for materials that correlates with such perceived appearance similarity. Our evaluation shows that our model outperforms existing metrics. Last, we demonstrate several applications enabled by our metric, including appearance-based search for material suggestions, database visualization, clustering and summarization, and gamut mapping.

Submitted 4 May, 2019; originally announced May 2019.

Comments: 12 pages, 17 figures

Journal ref: ACM Transactions on Graphics (SIGGRAPH 2019)

arXiv:1902.05378 [pdf, ps, other]

doi 10.1007/s11042-018-6628-7

Learning icons appearance similarity

Authors: Manuel Lagunas, Elena Garces, Diego Gutierrez

Abstract: Selecting an optimal set of icons is a crucial step in the pipeline of visual design to structure and navigate through content. However, designing icon sets is usually a difficult task for which expert knowledge is required. In this work, to ease the process of icon set selection for users, we propose a similarity metric which captures the properties of style and visual identity. We train a Siamese Neural Network with an online dataset of icons organized in visually coherent collections that are used to adaptively sample training data and optimize the training process. As the dataset contains noise, we further collect human-rated information on the perception of icon similarity, which is used for evaluating and testing the proposed model. We present several results and applications based on searches, kernel visualizations and optimized set proposals that can be helpful for designers and non-expert users while exploring large collections of icons.

Submitted 1 February, 2019; originally announced February 2019.

Comments: 12 pages, 11 figures

Journal ref: Multimedia Tools and Applications, pages: 1-19, year: 2018, publisher: Springer

arXiv:1806.02682 [pdf, other]

doi 10.2312/ceig.20171213

Transfer Learning for Illustration Classification

Authors: Manuel Lagunas, Elena Garces

Abstract: The field of image classification has shown outstanding success thanks to the development of deep learning techniques. Despite the great performance obtained, most of the work has focused on natural images, ignoring other domains like artistic depictions. In this paper, we use transfer learning techniques to propose a new classification network with better performance in illustration images. Starting from the deep convolutional network VGG19, pre-trained with natural images, we propose two novel models which learn object representations in the new domain. Our optimized network will learn new low-level features of the images (colours, edges, textures) while keeping the knowledge of the objects and shapes that it already learned from the ImageNet dataset, thus requiring much less data for training. We propose a novel dataset of illustration images labelled by content where our optimized architecture achieves 86.61% top-1 and 97.21% top-5 precision. We additionally demonstrate that our model is still able to recognize objects in photographs.

Submitted 23 May, 2018; originally announced June 2018.

Comments: 9 pages, 8 figures, 4 tables

Journal ref: 2017 Spanish Computer Graphics Conference (CEIG)

Showing 1–12 of 12 results for author: Lagunas, M