-
Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
Authors:
James Oldfield,
Shawn Im,
Yixuan Li,
Mihalis A. Nicolaou,
Ioannis Patras,
Grigorios G Chrysos
Abstract:
Multilayer perceptrons (MLPs) are an integral part of large language models, yet their dense representations render them difficult to understand, edit, and steer. Recent methods learn interpretable approximations via neuron-level sparsity, yet fail to faithfully reconstruct the original mapping--significantly increasing the model's next-token cross-entropy loss. In this paper, we advocate for moving to layer-level sparsity to overcome the accuracy trade-off in sparse layer approximation. Under this paradigm, we introduce Mixture of Decoders (MxDs). MxDs generalize MLPs and Gated Linear Units, expanding pre-trained dense layers into tens of thousands of specialized sublayers. Through a flexible form of tensor factorization, each sparsely activating MxD sublayer implements a linear transformation with full-rank weights--preserving the original decoders' expressive capacity even under heavy sparsity. Experimentally, we show that MxDs significantly outperform state-of-the-art methods (e.g., Transcoders) on the sparsity-accuracy frontier in language models with up to 3B parameters. Further evaluations on sparse probing and feature steering demonstrate that MxDs learn similarly specialized features of natural language--opening up a promising new avenue for designing interpretable yet faithful decompositions. Our code is included at: https://github.com/james-oldfield/MxD/.
Submitted 27 May, 2025;
originally announced May 2025.
-
Immersed boundary - lattice Boltzmann method for wetting problems
Authors:
Elisa Bellantoni,
Fabio Guglietta,
Francesca Pelusi,
Mathieu Desbrun,
Kiwon Um,
Mihalis Nicolaou,
Nikos Savva,
Mauro Sbragaglia
Abstract:
We develop a mesoscale computational model to describe the interaction of a droplet with a solid. The model is based on the hybrid combination of the immersed boundary and the lattice Boltzmann computational schemes: the former is used to model the non-ideal sharp interface of the droplet coupled with the inner and outer fluids, simulated with the lattice Boltzmann scheme. We further introduce an interaction force to model the wetting interactions of the droplet with the solid: this interaction force is designed with the key computational advantage of providing a regularization of the interface profile close to the contact line, avoiding abrupt curvature changes that could otherwise cause numerical instabilities. The proposed model substantially improves earlier immersed boundary - lattice Boltzmann models for wetting in that it allows a description of an ample variety of wetting interactions, ranging from hydrophobic to hydrophilic cases, without the need for any pre-calibration study on model parameters to be used. Model validations against theoretical results for droplet shape at equilibrium and scaling laws for droplet spreading dynamics are addressed.
Submitted 27 March, 2025; v1 submitted 26 March, 2025;
originally announced March 2025.
-
Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis
Authors:
Theodoros Kouzelis,
Manos Plitsis,
Mihalis A. Nicolaou,
Yannis Panagakis
Abstract:
Recent advances in Diffusion Models (DMs) have led to significant progress in visual synthesis and editing tasks, establishing them as a strong competitor to Generative Adversarial Networks (GANs). However, the latent space of DMs is not as well understood as that of GANs. Recent research has focused on unsupervised semantic discovery in the latent space of DMs by leveraging the bottleneck layer of the denoising network, which has been shown to exhibit properties of a semantic latent space. However, these approaches are limited to discovering global attributes. In this paper, we address the challenge of local image manipulation in DMs and introduce an unsupervised method to factorize the latent semantics learned by the denoising network of pre-trained DMs. Given an arbitrary image and defined regions of interest, we utilize the Jacobian of the denoising network to establish a relation between the regions of interest and their corresponding subspaces in the latent space. Furthermore, we disentangle the joint and individual components of these subspaces to identify latent directions that enable local image manipulation. Once discovered, these directions can be applied to different images to produce semantically consistent edits, making our method suitable for practical applications. Experimental results on various datasets demonstrate that our method can produce semantic edits that are more localized and have better fidelity compared to the state-of-the-art.
Submitted 2 September, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses
Authors:
Panagiotis Koromilas,
Giorgos Bouritsas,
Theodoros Giannakopoulos,
Mihalis Nicolaou,
Yannis Panagakis
Abstract:
What do different contrastive learning (CL) losses actually optimize for? Although multiple CL methods have demonstrated remarkable representation learning capabilities, the differences in their inner workings remain largely opaque. In this work, we analyse several CL families and prove that, under certain conditions, they admit the same minimisers when optimizing either their batch-level objectives or their expectations asymptotically. In both cases, an intimate connection with the hyperspherical energy minimisation (HEM) problem resurfaces. Drawing inspiration from this, we introduce a novel CL objective, coined Decoupled Hyperspherical Energy Loss (DHEL). DHEL simplifies the problem by decoupling the target hyperspherical energy from the alignment of positive examples while preserving the same theoretical guarantees. Going one step further, we show the same results hold for another relevant CL family, namely kernel contrastive learning (KCL), with the additional advantage of the expected loss being independent of batch size, thus identifying the minimisers in the non-asymptotic regime. Empirical results demonstrate improved downstream performance and robustness across combinations of different batch sizes and hyperparameters and reduced dimensionality collapse, on several computer vision datasets.
Submitted 28 May, 2024;
originally announced May 2024.
-
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
Authors:
James Oldfield,
Markos Georgopoulos,
Grigorios G. Chrysos,
Christos Tzelepis,
Yannis Panagakis,
Mihalis A. Nicolaou,
Jiankang Deng,
Ioannis Patras
Abstract:
The Mixture of Experts (MoE) paradigm provides a powerful way to decompose dense layers into smaller, modular computations often more amenable to human interpretation, debugging, and editability. However, a major challenge lies in the computational cost of scaling the number of experts high enough to achieve fine-grained specialization. In this paper, we propose the Multilinear Mixture of Experts ($μ$MoE) layer to address this, focusing on vision models. $μ$MoE layers enable scalable expert specialization by performing an implicit computation on prohibitively large weight tensors entirely in factorized form. Consequently, $μ$MoEs (1) avoid the restrictively high inference-time costs of dense MoEs, yet (2) do not inherit the training issues of the popular sparse MoEs' discrete (non-differentiable) expert routing. We present both qualitative and quantitative evidence that scaling $μ$MoE layers when fine-tuning foundation models for vision tasks leads to more specialized experts at the class-level, further enabling manual bias correction in CelebA attribute classification. Finally, we show qualitative results demonstrating the expert specialism achieved when pre-training large GPT2 and MLP-Mixer models with parameter-matched $μ$MoE blocks at every layer, maintaining comparable accuracy. Our code is available at: https://github.com/james-oldfield/muMoE.
Submitted 16 October, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Locality-preserving Directions for Interpreting the Latent Space of Satellite Image GANs
Authors:
Georgia Kourmouli,
Nikos Kostagiolas,
Yannis Panagakis,
Mihalis A. Nicolaou
Abstract:
We present a locality-aware method for interpreting the latent space of wavelet-based Generative Adversarial Networks (GANs), that can well capture the large spatial and spectral variability that is characteristic to satellite imagery. By focusing on preserving locality, the proposed method is able to decompose the weight-space of pre-trained GANs and recover interpretable directions that correspond to high-level semantic concepts (such as urbanization, structure density, flora presence) - that can subsequently be used for guided synthesis of satellite imagery. In contrast to typically used approaches that focus on capturing the variability of the weight-space in a reduced dimensionality space (i.e., based on Principal Component Analysis, PCA), we show that preserving locality leads to vectors with different angles, that are more robust to artifacts and can better preserve class information. Via a set of quantitative and qualitative examples, we further show that the proposed approach can outperform both baseline geometric augmentations, as well as global, PCA-based approaches for data synthesis in the context of data augmentation for satellite scene classification.
Submitted 26 September, 2023;
originally announced September 2023.
-
Parts of Speech-Grounded Subspaces in Vision-Language Models
Authors:
James Oldfield,
Christos Tzelepis,
Yannis Panagakis,
Mihalis A. Nicolaou,
Ioannis Patras
Abstract:
Latent image representations arising from vision-language models have proved immensely useful for a variety of downstream tasks. However, their utility is limited by their entanglement with respect to different visual attributes. For instance, recent work has shown that CLIP image representations are often biased toward specific visual properties (such as objects or actions) in an unpredictable manner. In this paper, we propose to separate representations of the different visual modalities in CLIP's joint vision-language space by leveraging the association between parts of speech and specific visual modes of variation (e.g. nouns relate to objects, adjectives describe appearance). This is achieved by formulating an appropriate component analysis model that learns subspaces capturing variability corresponding to a specific part of speech, while jointly minimising variability to the rest. Such a subspace yields disentangled representations of the different visual properties of an image or text in closed form while respecting the underlying geometry of the manifold on which the representations lie. What's more, we show the proposed model additionally facilitates learning subspaces corresponding to specific visual appearances (e.g. artists' painting styles), which enables the selective removal of entire visual themes from CLIP-based text-to-image synthesis. We validate the model both qualitatively, by visualising the subspace projections with a text-to-image model and by preventing the imitation of artists' styles, and quantitatively, through class invariance metrics and improvements to baseline zero-shot classification.
Submitted 12 November, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Unsupervised Discovery of Semantic Concepts in Satellite Imagery with Style-based Wavelet-driven Generative Models
Authors:
Nikos Kostagiolas,
Mihalis A. Nicolaou,
Yannis Panagakis
Abstract:
In recent years, considerable advancements have been made in the area of Generative Adversarial Networks (GANs), particularly with the advent of style-based architectures that address many key shortcomings - both in terms of modeling capabilities and network interpretability. Despite these improvements, the adoption of such approaches in the domain of satellite imagery is not straightforward. Typical vision datasets used in generative tasks are well-aligned and annotated, and exhibit limited variability. In contrast, satellite imagery exhibits great spatial and spectral variability, wide presence of fine, high-frequency details, while the tedious nature of annotating satellite imagery leads to annotation scarcity - further motivating developments in unsupervised learning. In this light, we present the first pre-trained style- and wavelet-based GAN model that can readily synthesize a wide gamut of realistic satellite images in a variety of settings and conditions - while also preserving high-frequency information. Furthermore, we show that by analyzing the intermediate activations of our network, one can discover a multitude of interpretable semantic directions that facilitate the guided synthesis of satellite images in terms of high-level concepts (e.g., urbanization) without using any form of supervision. Via a set of qualitative and quantitative experiments we demonstrate the efficacy of our framework, in terms of suitability for downstream tasks (e.g., data augmentation), quality of synthetic imagery, as well as generalization capabilities to unseen datasets.
Submitted 3 August, 2022;
originally announced August 2022.
-
PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs
Authors:
James Oldfield,
Christos Tzelepis,
Yannis Panagakis,
Mihalis A. Nicolaou,
Ioannis Patras
Abstract:
Recent advances in the understanding of Generative Adversarial Networks (GANs) have led to remarkable progress in visual editing and synthesis tasks, capitalizing on the rich semantics that are embedded in the latent spaces of pre-trained GANs. However, existing methods are often tailored to specific GAN architectures and are limited to either discovering global semantic directions that do not facilitate localized control, or require some form of supervision through manually provided regions or segmentation masks. In this light, we present an architecture-agnostic approach that jointly discovers factors representing spatial parts and their appearances in an entirely unsupervised fashion. These factors are obtained by applying a semi-nonnegative tensor factorization on the feature maps, which in turn enables context-aware local image editing with pixel-level control. In addition, we show that the discovered appearance factors correspond to saliency maps that localize concepts of interest, without using any labels. Experiments on a wide range of GAN architectures and datasets show that, in comparison to the state of the art, our method is far more efficient in terms of training time and, most importantly, provides much more accurate localized control. Our code is available at: https://github.com/james-oldfield/PandA.
Submitted 6 February, 2023; v1 submitted 31 May, 2022;
originally announced June 2022.
-
Tensor Component Analysis for Interpreting the Latent Space of GANs
Authors:
James Oldfield,
Markos Georgopoulos,
Yannis Panagakis,
Mihalis A. Nicolaou,
Ioannis Patras
Abstract:
This paper addresses the problem of finding interpretable directions in the latent space of pre-trained Generative Adversarial Networks (GANs) to facilitate controllable image synthesis. Such interpretable directions correspond to transformations that can affect both the style and geometry of the synthetic images. However, existing approaches that utilise linear techniques to find these transformations often fail to provide an intuitive way to separate these two sources of variation. To address this, we propose to a) perform a multilinear decomposition of the tensor of intermediate representations, and b) use a tensor-based regression to map directions found using this decomposition to the latent space. Our scheme allows for both linear edits corresponding to the individual modes of the tensor, and non-linear ones that model the multiplicative interactions between them. We show experimentally that we can utilise the former to better separate style- from geometry-based transformations, and the latter to generate an extended set of possible transformations in comparison to prior works. We demonstrate our approach's efficacy both quantitatively and qualitatively compared to the current state-of-the-art.
Submitted 23 November, 2021;
originally announced November 2021.
-
Classification of Influenza Hemagglutinin Protein Sequences using Convolutional Neural Networks
Authors:
Charalambos Chrysostomou,
Floris Alexandrou,
Mihalis A. Nicolaou,
Huseyin Seker
Abstract:
The Influenza virus can be considered as one of the most severe viruses that can infect multiple species with often fatal consequences to the hosts. The Hemagglutinin (HA) gene of the virus can be a target for antiviral drug development realised through accurate identification of its sub-types and possibly the targeted hosts. This paper focuses on accurately predicting if an Influenza type A virus can infect specific hosts, and more specifically, Human, Avian and Swine hosts, using only the protein sequence of the HA gene. In more detail, we propose encoding the protein sequences into numerical signals using the Hydrophobicity Index and subsequently utilising a Convolutional Neural Network-based predictive model. The Influenza HA protein sequences used in the proposed work are obtained from the Influenza Research Database (IRD). Specifically, complete and unique HA protein sequences were used for avian, human and swine hosts. The data obtained for this work was 17999 human-host proteins, 17667 avian-host proteins and 9278 swine-host proteins. Given this set of collected proteins, the proposed method yields as much as 10% higher accuracy for an individual class (namely, Avian) and 5% higher overall accuracy than in an earlier study. It is also observed that the accuracy for each class in this work is more balanced than what was presented in this earlier study. As the results show, the proposed model can distinguish HA protein sequences with high accuracy whenever the virus under investigation can infect Human, Avian or Swine hosts.
Submitted 9 August, 2021;
originally announced August 2021.
-
Tensor Methods in Computer Vision and Deep Learning
Authors:
Yannis Panagakis,
Jean Kossaifi,
Grigorios G. Chrysos,
James Oldfield,
Mihalis A. Nicolaou,
Anima Anandkumar,
Stefanos Zafeiriou
Abstract:
Tensors, or multidimensional arrays, are data structures that can naturally represent visual data of multiple dimensions. Inherently able to efficiently capture structured, latent semantic spaces and high-order interactions, tensors have a long history of applications in a wide span of computer vision problems. With the advent of the deep learning paradigm shift in computer vision, tensors have become even more fundamental. Indeed, essential ingredients in modern deep learning architectures, such as convolutions and attention mechanisms, can readily be considered as tensor mappings. In effect, tensor methods are increasingly finding significant applications in deep learning, including the design of memory and compute efficient network architectures, improving robustness to random noise and adversarial attacks, and aiding the theoretical understanding of deep networks.
This article provides an in-depth and practical review of tensors and tensor methods in the context of representation learning and deep learning, with a particular focus on visual data analysis and computer vision applications. Concretely, besides fundamental work in tensor-based visual data analysis methods, we focus on recent developments that have brought on a gradual increase of tensor methods, especially in deep learning architectures, and their implications in computer vision applications. To further enable the newcomer to grasp such concepts quickly, we provide companion Python notebooks, covering key aspects of the paper and implementing them, step-by-step with TensorLy.
Submitted 7 July, 2021;
originally announced July 2021.
-
Understanding Boolean Function Learnability on Deep Neural Networks: PAC Learning Meets Neurosymbolic Models
Authors:
Marcio Nicolau,
Anderson R. Tavares,
Zhiwei Zhang,
Pedro Avelar,
João M. Flach,
Luis C. Lamb,
Moshe Y. Vardi
Abstract:
Computational learning theory states that many classes of boolean formulas are learnable in polynomial time. This paper addresses the understudied subject of how, in practice, such formulas can be learned by deep neural networks. Specifically, we analyze boolean formulas associated with model-sampling benchmarks, combinatorial optimization problems, and random 3-CNFs with varying degrees of constrainedness. Our experiments indicate that: (i) neural learning generalizes better than pure rule-based systems and pure symbolic approaches; (ii) relatively small and shallow neural networks are very good approximators of formulas associated with combinatorial optimization problems; (iii) smaller formulas seem harder to learn, possibly due to the fewer positive (satisfying) examples available; and (iv) interestingly, underconstrained 3-CNF formulas are more challenging to learn than overconstrained ones. Such findings pave the way for a better understanding, construction, and use of interpretable neurosymbolic AI methods.
Submitted 17 November, 2022; v1 submitted 12 September, 2020;
originally announced September 2020.
-
Enhancing Facial Data Diversity with Style-based Face Aging
Authors:
Markos Georgopoulos,
James Oldfield,
Mihalis A. Nicolaou,
Yannis Panagakis,
Maja Pantic
Abstract:
A significant limiting factor in training fair classifiers relates to the presence of dataset bias. In particular, face datasets are typically biased in terms of attributes such as gender, age, and race. If not mitigated, bias leads to algorithms that exhibit unfair behaviour towards such groups. In this work, we address the problem of increasing the diversity of face datasets with respect to age. Concretely, we propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns by conditioning on multi-resolution age-discriminative representations. By evaluating on several age-annotated datasets in both single- and cross-database experiments, we show that the proposed method outperforms state-of-the-art algorithms for age transfer, especially in the case of age groups that lie in the tails of the label distribution. We further show significantly increased diversity in the augmented datasets, outperforming all compared methods according to established metrics.
Submitted 6 June, 2020;
originally announced June 2020.
-
The Lagrangian remainder of Taylor's series, distinguishes $\mathcal{O}(f(x))$ time complexities to polynomials or not
Authors:
Nikolaos P. Bakas,
Elias Kosmatopoulos,
Mihalis Nicolaou,
Savvas A. Chatzichristofis
Abstract:
The purpose of this letter is to investigate the time complexity consequences of the truncated Taylor series, known as Taylor Polynomials \cite{bakas2019taylor,Katsoprinakis2011,Nestoridis2011}. In particular, it is demonstrated that the examination of the $\mathbf{P=NP}$ equality, is associated with the determination of whether the $n^{th}$ derivative of a particular solution is bounded or not. Accordingly, in some cases, this is not true, and hence in general.
Submitted 27 May, 2020; v1 submitted 30 January, 2020;
originally announced January 2020.
-
A Gradient Free Neural Network Framework Based on Universal Approximation Theorem
Authors:
Nikolaos P. Bakas,
Andreas Langousis,
Mihalis Nicolaou,
Savvas A. Chatzichristofis
Abstract:
We present a numerical scheme for computation of Artificial Neural Networks (ANN) weights, which stems from the Universal Approximation Theorem, avoiding laborious iterations. The proposed algorithm adheres to the underlying theory, is highly fast, and results in remarkably low errors when applied for regression and classification of complex data-sets, such as the Griewank function of multiple variables $\mathbf{x} \in \mathbb{R}^{100}$ with random noise addition, and MNIST database for handwritten digits recognition, with $7\times10^4$ images. The same mathematical formulation is found capable of approximating highly nonlinear functions in multiple dimensions, with low errors (e.g. $10^{-10}$) for the test-set of the unknown functions, their higher-order partial derivatives, as well as numerically solving Partial Differential Equations. The method is based on the calculation of the weights of each neuron in small neighborhoods of the data, such that the corresponding local approximation matrix is invertible. Accordingly, optimization of hyperparameters is not necessary, as the number of neurons stems directly from the dimensionality of the data, further improving the algorithmic speed. Under this setting, overfitting is inherently avoided, and the results are interpretable and reproducible. The complexity of the proposed algorithm is of class P with $\mathcal{O}(mn^2)+\mathcal{O}(\frac{m^3}{n^2})-\mathcal{O}(\log(n+1))$ computing time, with respect to the observations $m$ and features $n$, in contrast with the NP-Complete class of standard algorithms for ANN training. The performance of the method is high, irrespective of the size of the dataset, and the test-set errors are similar or smaller than the training errors, indicating the generalization efficiency of the algorithm.
Submitted 18 August, 2020; v1 submitted 30 September, 2019;
originally announced September 2019.
-
Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams
Authors:
Charles Ringer,
James Alfred Walker,
Mihalis A. Nicolaou
Abstract:
Video game streaming provides the viewer with a rich set of audio-visual data, conveying information both with regards to the game itself, through game footage and audio, as well as the streamer's emotional state and behaviour via webcam footage and audio. Analysing player behaviour and discovering correlations with game context is crucial for modelling and understanding important aspects of livestreams, but comes with a significant set of challenges - such as fusing multimodal data captured by different sensors in uncontrolled ('in-the-wild') conditions. Firstly, we present, to our knowledge, the first data set of League of Legends livestreams, annotated for both streamer affect and game context. Secondly, we propose a method that exploits tensor decompositions for high-order fusion of multimodal representations. The proposed method is evaluated on the problem of jointly predicting game context and player affect, compared with a set of baseline fusion approaches such as late and early fusion.
Submitted 31 May, 2019;
originally announced May 2019.
-
3DFaceGAN: Adversarial Nets for 3D Face Representation, Generation, and Translation
Authors:
Stylianos Moschoglou,
Stylianos Ploumpis,
Mihalis Nicolaou,
Athanasios Papaioannou,
Stefanos Zafeiriou
Abstract:
Over the past few years, Generative Adversarial Networks (GANs) have garnered increased interest among researchers in Computer Vision, with applications including, but not limited to, image generation, translation, imputation, and super-resolution. Nevertheless, no GAN-based method has been proposed in the literature that can successfully represent, generate or translate 3D facial shapes (meshes). This can be primarily attributed to two facts, namely that (a) publicly available 3D face databases are scarce as well as limited in terms of sample size and variability (e.g., few subjects, little diversity in race and gender), and (b) mesh convolutions for deep networks present several challenges that are not entirely tackled in the literature, leading to operator approximations and model instability, often failing to preserve high-frequency components of the distribution. As a result, linear methods such as Principal Component Analysis (PCA) have been mainly utilized towards 3D shape analysis, despite being unable to capture non-linearities and high frequency details of the 3D face - such as eyelid and lip variations. In this work, we present 3DFaceGAN, the first GAN tailored towards modeling the distribution of 3D facial surfaces, while retaining the high frequency details of 3D face shapes. We conduct an extensive series of both qualitative and quantitative experiments, where the merits of 3DFaceGAN are clearly demonstrated against other, state-of-the-art methods in tasks such as 3D shape representation, generation, and translation.
Submitted 9 May, 2019; v1 submitted 1 May, 2019;
originally announced May 2019.
-
Adversarial Learning of Disentangled and Generalizable Representations for Visual Attributes
Authors:
James Oldfield,
Yannis Panagakis,
Mihalis A. Nicolaou
Abstract:
Recently, a multitude of methods for image-to-image translation have demonstrated impressive results on problems such as multi-domain or multi-attribute transfer. The vast majority of such works leverages the strengths of adversarial learning and deep convolutional autoencoders to achieve realistic results by well-capturing the target data distribution. Nevertheless, the most prominent representatives of this class of methods do not facilitate semantic structure in the latent space, and usually rely on binary domain labels for test-time transfer. This leads to rigid models, unable to capture the variance of each domain label. In this light, we propose a novel adversarial learning method that (i) facilitates the emergence of latent structure by semantically disentangling sources of variation, and (ii) encourages learning generalizable, continuous, and transferable latent codes that enable flexible attribute mixing. This is achieved by introducing a novel loss function that encourages representations to result in uniformly distributed class posteriors for disentangled attributes. In tandem with an algorithm for inducing generalizable properties, the resulting representations can be utilized for a variety of tasks such as intensity-preserving multi-attribute image translation and synthesis, without requiring labelled test data. We demonstrate the merits of the proposed method by a set of qualitative and quantitative experiments on popular databases such as MultiPIE, RaFD, and BU-3DFE, where our method outperforms other, state-of-the-art methods in tasks such as intensity-preserving multi-attribute transfer and synthesis.
Submitted 30 January, 2021; v1 submitted 9 April, 2019;
originally announced April 2019.
-
On the automorphism group of foliations with geometric transverse structure
Authors:
Laurent Meersseman,
Marcel Nicolau,
Javier Ribon
Abstract:
Motivated by questions of deformations/moduli in foliation theory, we investigate the structure of some groups of diffeomorphisms preserving a foliation. We give an example of a $C^\infty$ foliation whose diffeomorphism group is not a Lie group in any reasonable sense. On the positive side, we prove that the automorphism group of a transversely holomorphic foliation or a Riemannian foliation is a strong ILH Lie group in the sense of Omori.
Submitted 24 March, 2022; v1 submitted 16 October, 2018;
originally announced October 2018.
-
Deep Unsupervised Multi-View Detection of Video Game Stream Highlights
Authors:
Charles Ringer,
Mihalis A. Nicolaou
Abstract:
We consider the problem of automatic highlight-detection in video game streams. Currently, the vast majority of highlight-detection systems for games are triggered by the occurrence of hard-coded game events (e.g., score change, end-game), while most advanced tools and techniques are based on detection of highlights via visual analysis of game footage. We argue that in the context of game streaming, events that may constitute highlights are not only dependent on game footage, but also on social signals that are conveyed by the streamer during the play session (e.g., when interacting with viewers, or when commenting and reacting to the game). In this light, we present a multi-view unsupervised deep learning methodology for novelty-based highlight detection. The method jointly analyses both game footage and social signals such as the player's facial expressions and speech, and shows promising results for generating highlights on streams of popular games such as Player Unknown's Battlegrounds.
Submitted 25 July, 2018;
originally announced July 2018.
-
Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond
Authors:
Dimitrios Kollias,
Panagiotis Tzirakis,
Mihalis A. Nicolaou,
Athanasios Papaioannou,
Guoying Zhao,
Björn Schuller,
Irene Kotsia,
Stefanos Zafeiriou
Abstract:
Automatic understanding of human affect using visual signals is of great importance in everyday human-machine interactions. Appraising human emotional states, behaviors and reactions displayed in real-world settings can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative is an emotion) & arousal (i.e., power of the activation of the emotion) constitute popular and effective affect representations. Nevertheless, the majority of collected datasets thus far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report on the results of the First Affect-in-the-wild Challenge that was organized in conjunction with CVPR 2017 on the Aff-Wild database and was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture which performs prediction of continuous emotion dimensions based on visual cues. The proposed deep learning architecture, AffWildNet, includes convolutional & recurrent neural network layers, exploiting the invariant properties of convolutional features, while also modeling temporal dynamics that arise in human behavior via the recurrent layers. The AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the AffWild database for learning features, which can be used as priors for achieving best performances both for dimensional, as well as categorical emotion recognition, using the RECOLA, AFEW-VA and EmotiW datasets, compared to all other methods designed for the same goal. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge.
Submitted 1 February, 2019; v1 submitted 29 April, 2018;
originally announced April 2018.
-
Fusarium Damaged Kernels Detection Using Transfer Learning on Deep Neural Network Architecture
Authors:
Márcio Nicolau,
Márcia Barrocas Moreira Pimentel,
Casiane Salete Tibola,
José Mauricio Cunha Fernandes,
Willingthon Pavan
Abstract:
The present work shows the application of transfer learning for a pre-trained deep neural network (DNN), using a small image dataset ($\approx$ 12,000) on a single workstation with an NVIDIA GPU card, which takes up to 1 hour to complete the training task and achieves an overall average accuracy of $94.7\%$. The DNN presents a $20\%$ score of misclassification for an external test dataset. The accuracy of the proposed methodology is equivalent to that of approaches using the HSI methodology $(81\%-91\%)$ for the same task, with the advantage of being independent of special equipment to classify wheat kernels for FHB symptoms.
Submitted 31 January, 2018;
originally announced February 2018.
-
Multi-Attribute Robust Component Analysis for Facial UV Maps
Authors:
Stylianos Moschoglou,
Evangelos Ververas,
Yannis Panagakis,
Mihalis Nicolaou,
Stefanos Zafeiriou
Abstract:
Recently, due to the collection of large scale 3D face models, as well as the advent of deep learning, significant progress has been made in the field of 3D face alignment "in-the-wild". That is, many methods have been proposed that establish sparse or dense 3D correspondences between a 2D facial image and a 3D face model. The utilization of 3D face alignment introduces new challenges and research directions, especially on the analysis of facial texture images. In particular, texture does not suffer any more from warping effects (that occurred when 2D face alignment methods were used). Nevertheless, since facial images are commonly captured in arbitrary recording conditions, a considerable amount of missing information and gross outliers is observed (e.g., due to self-occlusion, or subjects wearing eye-glasses). Given that many annotated databases have been developed for face analysis tasks, it is evident that component analysis techniques need to be developed in order to alleviate issues arising from the aforementioned challenges. In this paper, we propose a novel component analysis technique that is suitable for facial UV maps containing a considerable amount of missing information and outliers, while additionally incorporating knowledge from various attributes (such as age and identity). We evaluate the proposed Multi-Attribute Robust Component Analysis (MA-RCA) on problems such as UV completion and age progression, where the proposed method outperforms compared techniques. Finally, we demonstrate that the MA-RCA method is powerful enough to provide weak annotations for training deep learning systems for various applications, such as illumination transfer.
Submitted 15 December, 2017;
originally announced December 2017.
-
End-to-End Multimodal Emotion Recognition using Deep Neural Networks
Authors:
Panagiotis Tzirakis,
George Trigeorgis,
Mihalis A. Nicolaou,
Björn Schuller,
Stefanos Zafeiriou
Abstract:
Automatic affect recognition is a challenging task due to the various modalities emotions can be expressed with. Applications can be found in many domains including multimedia retrieval and human computer interaction. In recent years, deep neural networks have been used with great success in determining emotional states. Inspired by this success, we propose an emotion recognition system using auditory and visual modalities. To capture the emotional content for various styles of speaking, robust features need to be extracted. To this end, we utilize a Convolutional Neural Network (CNN) to extract features from the speech, while for the visual modality we use a deep residual network (ResNet) of 50 layers. In addition to the importance of feature extraction, a machine learning algorithm also needs to be insensitive to outliers while being able to model the context. To tackle this problem, Long Short-Term Memory (LSTM) networks are utilized. The system is then trained in an end-to-end fashion where, by also taking advantage of the correlations of each of the streams, we manage to significantly outperform the traditional approaches based on auditory and visual handcrafted features for the prediction of spontaneous and natural emotions on the RECOLA database of the AVEC 2016 research challenge on emotion recognition.
Submitted 27 April, 2017;
originally announced April 2017.
-
Improving Fitness Functions in Genetic Programming for Classification on Unbalanced Credit Card Datasets
Authors:
Van Loi Cao,
Nhien-An Le-Khac,
Miguel Nicolau,
Michael ONeill,
James McDermott
Abstract:
Credit card fraud detection based on machine learning has recently attracted considerable interest from the research community. One of the most important tasks in this area is the ability of classifiers to handle the imbalance in credit card data. In this scenario, classifiers tend to yield poor accuracy on the fraud class (minority class) despite realizing high overall accuracy. This is due to the influence of the majority class on traditional training criteria. In this paper, we aim to apply genetic programming to address this issue by adapting existing fitness functions. We examine two fitness functions from previous studies and develop two new fitness functions to evolve GP classifiers with superior accuracy on the minority class and overall. Two UCI credit card datasets are used to evaluate the effectiveness of the proposed fitness functions. The results demonstrate that the proposed fitness functions augment GP classifiers, encouraging fitter solutions on both the minority and the majority classes.
Submitted 11 April, 2017;
originally announced April 2017.
-
Deformations and Moduli of Structures on Manifolds: General Existence Theorem and Application to the Sasakian Case
Authors:
Laurent Meersseman,
Marcel Nicolau
Abstract:
In this paper, we prove an existence theorem of a local moduli space for geometric structures in a very general setting. Then, to show the interest of this result, we apply it to the case of Sasakian and Sasaki-Einstein structures.
Submitted 16 October, 2015; v1 submitted 27 March, 2015;
originally announced March 2015.
-
Foliations and webs inducing Galois coverings
Authors:
Andrés Beltrán,
Maycol Falla Luza,
David Marín,
Marcel Nicolau
Abstract:
We introduce the notion of Galois holomorphic foliation on the complex projective space as that of foliations whose Gauss map is a Galois covering when restricted to an appropriate Zariski open subset. First, we establish general criteria assuring that a rational map between projective manifolds of the same dimension defines a Galois covering. Then, these criteria are used to give a geometric characterization of Galois foliations in terms of their inflection divisor and their singularities. We also characterize Galois foliations on $\mathbb P^2$ admitting continuous symmetries, obtaining a complete classification of Galois homogeneous foliations.
Submitted 16 March, 2015;
originally announced March 2015.
-
A Unified Framework for Probabilistic Component Analysis
Authors:
Mihalis A. Nicolaou,
Stefanos Zafeiriou,
Maja Pantic
Abstract:
We present a unifying framework which reduces the construction of probabilistic component analysis techniques to a mere selection of the latent neighbourhood, thus providing an elegant and principled framework for creating novel component analysis models as well as constructing probabilistic equivalents of deterministic component analysis methods. Under our framework, we unify many very popular and well-studied component analysis algorithms, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), some of which have no probabilistic equivalents in literature thus far. We firstly define the Markov Random Fields (MRFs) which encapsulate the latent connectivity of the aforementioned component analysis techniques; subsequently, we show that the projection directions produced by all PCA, LDA, LPP and SFA are also produced by the Maximum Likelihood (ML) solution of a single joint probability density function, composed by selecting one of the defined MRF priors while utilising a simple observation model. Furthermore, we propose novel Expectation Maximization (EM) algorithms, exploiting the proposed joint PDF, while we generalize the proposed methodologies to arbitrary connectivities via parameterizable MRF products. Theoretical analysis and experiments on both simulated and real world data show the usefulness of the proposed framework, by deriving methods which well outperform state-of-the-art equivalents.
Submitted 14 November, 2014; v1 submitted 13 March, 2013;
originally announced March 2013.
-
Evolving Genes to Balance a Pole
Authors:
Miguel Nicolau,
Marc Schoenauer,
W. Banzhaf
Abstract:
We discuss how to use a Genetic Regulatory Network as an evolutionary representation to solve a typical GP reinforcement problem, the pole balancing. The network is a modified version of an Artificial Regulatory Network proposed a few years ago, and the task could be solved only by finding a proper way of connecting inputs and outputs to the network. We show that the representation is able to generalize well over the problem domain, and discuss the performance of different models of this kind.
Submitted 17 May, 2010;
originally announced May 2010.
-
Deformations of Kahler manifolds with non vanishing holomorphic vector fields
Authors:
Jaume Amoros,
Monica Manjarin,
Marcel Nicolau
Abstract:
In this article we study compact Kähler manifolds $X$ admitting non-singular holomorphic vector fields with the aim of extending to this setting the classical birational classification of projective varieties with tangent vector fields. We prove that any such Kähler manifold $X$ admits an arbitrarily small deformation of a particular type which is a suspension over a torus; that is, a quotient of $F\times \mathbb{C}^s$ fibering over a torus $T=\mathbb{C}^s/\Lambda$. We derive some results dealing with the structure of such manifolds. In particular, we prove an extension of Calabi's theorem describing the structure of compact Kähler manifolds with $c_1(X)=0$ to general Kähler manifolds with non-vanishing vector fields. A complete classification when $X$ is a projective manifold or when $\dim X\leq s+2$ is also given. As an application, it is shown that the study of the dynamics of holomorphic tangent fields on compact Kähler manifolds reduces to the case of rational manifolds.
Submitted 19 July, 2010; v1 submitted 25 September, 2009;
originally announced September 2009.
-
Deformations Feuilletees Des Varietes De Hopf
Authors:
Laurent Meersseman,
Marcel Nicolau,
Alberto Verjovsky
Abstract:
In this article, we focus on a very special class of foliations with complex leaves whose diffeomorphism type is fixed. They have a unique compact leaf and the noncompact leaves all accumulate onto it. We show that the complex structure along the non-compact leaves is fixed by the complex structure of the compact leaf. Reciprocally, we prove that the complex structure along a non-compact leaf determines the complex structure along the other leaves. We apply these results to the study of foliated deformations of Hopf manifolds, a foliated analogue to the notion of deformation in the large.
Submitted 25 February, 2009;
originally announced February 2009.
-
Complex and CR-structures on compact Lie groups associated to Abelian actions
Authors:
J. -J. Loeb,
M. Manjarin,
M. Nicolau
Abstract:
It was shown by Samelson and Wang that each compact Lie group K of even dimension admits left-invariant complex structures. When K has odd dimension it admits a left-invariant CR-structure of maximal dimension. This has been proved recently by Charbonnel and Khalgui who have also given a complete algebraic description of these structures. In this article we present an alternative and more geometric construction of this type of invariant structures on a compact Lie group K when it is semisimple. We prove that each left-invariant complex structure, or each CR-structure of maximal dimension with a transverse CR-action by R, is induced by a holomorphic C^l action on a quasi-projective manifold X naturally associated to K. We then show that X admits more general Abelian actions, also inducing complex or CR structures on K which are generically non-invariant.
Submitted 30 October, 2006;
originally announced October 2006.