
Revisiting dynamics of interacting quintessence

Authors: Patrocinio Pérez, Ulises Nucamendi, Roberto De Arcia

Abstract: We apply the tools of dynamical systems theory to revisit and uncover the structure of a nongravitational interaction between pressureless dark matter and dark energy described by a scalar field $φ$. For a coupling function $Q = -(αdρ_m/dt + βdρ_φ/dt)$, where $t$ is the cosmic time, we have found that it can be rewritten in the form $Q = 3H (αρ_m + β(dφ/dt)^2)/(1-α+β)$, so that its dependence on the dark matter density and on the kinetic term of the scalar field is linear and proportional to the Hubble parameter. We analyze the scenarios $α=0$, $α=β$ and $α=-β$ separately, and in order to describe the cosmological evolution we have calculated various observables. A notable result of this work is that, unlike the noninteracting scalar field with exponential potential, where five critical points appear, in the case studied here, with the exception of the matter-dominated solution, the remaining singular points are transformed into scaling solutions, enriching the phase space. It is shown that for $α\neq 0$ a separatrix arises, prominently modifying the structure of the phase space. This represents a novel feature not mentioned before in the literature.
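The rewriting of the coupling quoted in this abstract can be checked in a few lines. The sketch below rests on two assumptions that the abstract does not state: a sign convention in which $Q > 0$ transfers energy from dark matter to the scalar field (chosen here because it reproduces the quoted denominator), and $ρ_φ + p_φ = (dφ/dt)^2$ for a canonical scalar field. Dots denote derivatives with respect to the cosmic time $t$.

$$
\begin{aligned}
\dot{\rho}_m + 3H\rho_m &= -Q, \qquad \dot{\rho}_\phi + 3H\dot{\phi}^2 = Q,\\
Q &= -\left(\alpha\dot{\rho}_m + \beta\dot{\rho}_\phi\right)
   = \alpha\left(3H\rho_m + Q\right) + \beta\left(3H\dot{\phi}^2 - Q\right),\\
(1-\alpha+\beta)\,Q &= 3H\left(\alpha\rho_m + \beta\dot{\phi}^2\right).
\end{aligned}
$$

Dividing by $1-α+β$ (assumed nonzero) gives the form stated in the abstract.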

Submitted 7 December, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: 16 pages, 9 figures, 7 tables

Journal ref: Eur. Phys. J. C (2021) 81:1063

arXiv:2104.02618

doi 10.1109/TMM.2021.3098450

Subjective Assessment Experiments That Recruit Few Observers With Repetitions (FOWR)

Authors: Pablo Perez, Lucjan Janowski, Narciso Garcia, Margaret Pinson

Abstract: Recent studies have shown that it is possible to characterize subject bias and variance in subjective assessment tests. Apparent differences among subjects can, for the most part, be explained by random factors. Building on that theory, we propose a subjective test design where three to four team members each rate the stimuli multiple times. The results are comparable to a high-performing objective metric. This provides a quick and simple way to analyze new technologies and perform pre-tests for subjective assessment.

Submitted 20 July, 2022; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: IEEE Transactions on Multimedia

arXiv:2103.16214

Multi-View Radar Semantic Segmentation

Authors: Arthur Ouaknine, Alasdair Newson, Patrick Pérez, Florence Tupin, Julien Rebut

Abstract: Understanding the scene around the ego-vehicle is key to assisted and autonomous driving. Nowadays, this is mostly conducted using cameras and laser scanners, despite their reduced performance in adverse weather conditions. Automotive radars are low-cost active sensors that measure properties of surrounding objects, including their relative speed, and have the key advantage of not being impacted by rain, snow or fog. However, they are seldom used for scene understanding due to the size and complexity of radar raw data and the lack of annotated datasets. Fortunately, recent open-sourced datasets have opened up research on classification, object detection and semantic segmentation with raw radar signals using end-to-end trainable models. In this work, we propose several novel architectures, and their associated losses, which analyse multiple "views" of the range-angle-Doppler radar tensor to segment it semantically. Experiments conducted on the recent CARRADA dataset demonstrate that our best model outperforms alternative models, derived either from the semantic segmentation of natural images or from radar scene understanding, while requiring significantly fewer parameters. Both our code and trained models are available at https://github.com/valeoai/MVRSS.

Submitted 24 August, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: 16 pages, 9 figures. Accepted at ICCV 2021

arXiv:2103.13905

StyleLess layer: Improving robustness for real-world driving

Authors: Julien Rebut, Andrei Bursuc, Patrick Pérez

Abstract: Deep Neural Networks (DNNs) are a critical component for self-driving vehicles. They achieve impressive performance by reaping information from large amounts of labeled data. Yet, the full complexity of the real world cannot be encapsulated in the training data, no matter how big the dataset, and DNNs can hardly generalize to unseen conditions. Robustness to various image corruptions, caused by changing weather conditions or sensor degradation and aging, is crucial for safety when such vehicles are deployed in the real world. We address this problem through a novel type of layer, dubbed StyleLess, which enables DNNs to learn robust and informative features that can cope with varying external conditions. We propose multiple variations of this layer that can be integrated into most architectures and trained jointly with the main task. We validate our contribution on typical autonomous-driving tasks (detection, semantic segmentation), showing that in most cases, this approach improves predictive performance on unseen conditions (fog, rain), while preserving performance on seen conditions and objects.

Submitted 25 March, 2021; originally announced March 2021.

Comments: 6 pages, 6 figures, 2 tables

arXiv:2103.13397

doi 10.1103/PhysRevD.104.055007

Baryogenesis via Leptogenesis: Spontaneous B and L Violation

Authors: Pavel Fileviez Perez, Clara Murgui, Alexis D. Plascencia

Abstract: In order to address the baryon asymmetry in the Universe one needs to understand the origin of baryon (B) and lepton (L) number violation. In this article, we discuss the mechanism of baryogenesis via leptogenesis to explain the matter-antimatter asymmetry in theories with spontaneous breaking of baryon and lepton number. In this context, a lepton asymmetry is generated through the out-of-equilibrium decays of right-handed neutrinos at the high scale, while local baryon number must be broken below the multi-TeV scale to satisfy the cosmological bounds on the dark matter relic density. We demonstrate how the lepton asymmetry generated via leptogenesis can be converted in two different ways: a) in the theory predicting Majorana dark matter, the lepton asymmetry is converted into a baryon asymmetry, and b) in the theory with Dirac dark matter, the decays of right-handed neutrinos can generate lepton and dark matter asymmetries that are then partially converted into a baryon asymmetry. Consequently, we show how to explain the matter-antimatter asymmetry, the dark matter relic density and neutrino masses in theories for local baryon and lepton number.

Submitted 9 August, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: 18 pages, 8 figures. v2 References added, to appear in PRD

Journal ref: Phys. Rev. D 104, 055007 (2021)

arXiv:2103.10930

Prediction of Hydraulic Blockage at Cross Drainage Structures using Regression Analysis

Authors: Umair Iqbal, Johan Barthelemy, Pascal Perez, Wanqing Li

Abstract: Hydraulic blockage of cross-drainage structures such as culverts is considered one of the main contributors to urban flash floods. However, due to the lack of data collected during floods and the highly non-linear nature of debris interaction, conventional modelling of hydraulic blockage is not possible. This paper proposes to use machine learning regression analysis for the prediction of hydraulic blockage. Relevant data has been collected by performing a scaled in-lab study and replicating different blockage scenarios. In the regression analysis, an Artificial Neural Network (ANN) was reported best in hydraulic blockage prediction, with an $R^2$ of 0.89. With the deployment of hydraulic sensors in smart cities and the availability of Big Data, regression analysis may prove helpful in addressing the blockage detection problem, which is difficult to counter using conventional experimental and hydrological approaches.

Submitted 5 March, 2021; originally announced March 2021.

Comments: 12 pages, 5 figures

arXiv:2103.02550

doi 10.1109/TAFFC.2022.3149162

Methodology to Assess Quality, Presence, Empathy, Attitude, and Attention in 360-degree Videos for Immersive Communications

Authors: Marta Orduna, Pablo Pérez, Jesús Gutiérrez, Narciso García

Abstract: This paper analyzes the joint assessment of quality, spatial and social presence, empathy, attitude, and attention in three conditions: (A) visualizing and rating the quality of contents in a Head-Mounted Display (HMD), (B) visualizing the contents in an HMD, and (C) visualizing the contents in an HMD where participants can see their hands and take notes. The experiment simulates an immersive communication where participants attend conversations of different genres and from different acquisition perspectives in the context of international experiences. Video quality is evaluated with the Single-Stimulus Discrete Quality Evaluation (SSDQE) methodology. Spatial and social presence are evaluated with questionnaires adapted from the literature. Initial empathy is assessed with the Interpersonal Reactivity Index (IRI), and a questionnaire is designed to evaluate attitude. Attention is evaluated with 3 questions that had pass/fail answers. 54 participants were evenly distributed among conditions A, B, and C, taking into account their international experience backgrounds, obtaining a diverse sample of participants. The results from the subjective test validate the proposed methodology in VR communications, showing that video quality experiments can be adapted to conditions imposed by experiments focused on the evaluation of socioemotional features in terms of long-duration contents, actor and observer acquisition perspectives, and genre. In addition, the positive results related to the sense of presence imply that technology can be relevant in the analyzed use case. The acquisition perspective greatly influences social presence, and all the contents have a positive impact on the attitude of all participants towards international experiences. The annotated dataset obtained from the experiment, the Student Experiences Around the World dataset (SEAW-dataset), is made publicly available.

Submitted 9 February, 2022; v1 submitted 3 March, 2021; originally announced March 2021.

Comments: IEEE Transactions on Affective Computing, Early Access

arXiv:2101.07253

Cross-modal Learning for Domain Adaptation in 3D Semantic Segmentation

Authors: Maximilian Jaritz, Tuan-Hung Vu, Raoul de Charette, Émilie Wirbel, Patrick Pérez

Abstract: Domain adaptation is an important task to enable learning when labels are scarce. While most works focus only on the image modality, there are many important multi-modal datasets. In order to leverage multi-modality for domain adaptation, we propose cross-modal learning, where we enforce consistency between the predictions of two modalities via mutual mimicking. We constrain our network to make correct predictions on labeled data and consistent predictions across modalities on unlabeled target-domain data. Experiments in unsupervised and semi-supervised domain adaptation settings prove the effectiveness of this novel domain adaptation strategy. Specifically, we evaluate on the task of 3D semantic segmentation from either the 2D image, the 3D point cloud or from both. We leverage recent driving datasets to produce a wide variety of domain adaptation scenarios including changes in scene layout, lighting, sensor setup and weather, as well as the synthetic-to-real setup. Our method significantly improves over previous uni-modal adaptation baselines on all adaptation scenarios. Our code is publicly available at https://github.com/valeoai/xmuda_journal

Submitted 22 June, 2022; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: TPAMI 2022

arXiv:2101.05307

Explainability of deep vision-based autonomous driving systems: Review and challenges

Authors: Éloi Zablocki, Hédi Ben-Younes, Patrick Pérez, Matthieu Cord

Abstract: This survey reviews explainability methods for vision-based self-driving systems trained with behavior cloning. The concept of explainability has several facets and the need for explainability is strong in driving, a safety-critical application. Gathering contributions from several research fields, namely computer vision, deep learning, autonomous driving, and explainable AI (X-AI), this survey tackles several points. First, it discusses definitions, context, and motivation for gaining more interpretability and explainability from self-driving systems, as well as the challenges that are specific to this application. Second, methods providing explanations to a black-box self-driving system in a post-hoc fashion are comprehensively organized and detailed. Third, approaches from the literature that aim at building more interpretable self-driving systems by design are presented and discussed in detail. Finally, remaining open challenges and potential future research directions are identified and examined.

Submitted 19 July, 2022; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: IJCV 2022

arXiv:2012.11552

OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning

Authors: Spyros Gidaris, Andrei Bursuc, Gilles Puy, Nikos Komodakis, Matthieu Cord, Patrick Pérez

Abstract: Learning image representations without human supervision is an important and active research field. Several recent approaches have successfully leveraged the idea of making such a representation invariant under different types of perturbations, especially via contrastive-based instance discrimination training. Although effective visual representations should indeed exhibit such invariances, there are other important characteristics, such as encoding contextual reasoning skills, for which alternative reconstruction-based approaches might be better suited. With this in mind, we propose a teacher-student scheme to learn representations by training a convolutional net to reconstruct a bag-of-visual-words (BoW) representation of an image, given as input a perturbed version of that same image. Our strategy performs an online training of both the teacher network (whose role is to generate the BoW targets) and the student network (whose role is to learn representations), along with an online update of the visual-words vocabulary (used for the BoW targets). This idea effectively enables fully online BoW-guided unsupervised learning. Extensive experiments demonstrate the interest of our BoW-based strategy, which surpasses previous state-of-the-art methods (including contrastive-based ones) in several applications. For instance, in downstream tasks such as Pascal object detection, Pascal classification and Places205 classification, our method improves over all prior unsupervised approaches, thus establishing new state-of-the-art results that are also significantly better even than those of supervised pre-training. We provide the implementation code at https://github.com/valeoai/obow.

Submitted 29 October, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: Accepted to CVPR2021. Code at https://github.com/valeoai/obow

arXiv:2012.08274

Artificial Dummies for Urban Dataset Augmentation

Authors: Antonín Vobecký, David Hurych, Michal Uřičář, Patrick Pérez, Josef Šivic

Abstract: Existing datasets for training pedestrian detectors in images suffer from limited appearance and pose variation. The most challenging scenarios are rarely included because they are too difficult to capture for safety reasons, or they are very unlikely to happen. The strict safety requirements in assisted and autonomous driving applications call for extra-high detection accuracy also in these rare situations. Having the ability to generate people images in arbitrary poses, with arbitrary appearances and embedded in different background scenes with varying illumination and weather conditions, is a crucial component for the development and testing of such applications. The contributions of this paper are three-fold. First, we describe an augmentation method for controlled synthesis of urban scenes containing people, thus producing rare or never-seen situations. This is achieved with a data generator (called DummyNet) with disentangled control of the pose, the appearance, and the target background scene. Second, the proposed generator relies on a novel network architecture and an associated loss that take into account the segmentation of the foreground person and its composition into the background scene. Finally, we demonstrate that the data generated by our DummyNet improve the performance of several existing person detectors across various datasets as well as in challenging situations, such as night-time conditions, where only a limited amount of training data is available. In the setup with only day-time data available, we improve the night-time detector by $17\%$ log-average miss rate over the detector trained with the day-time data only.

Submitted 15 December, 2020; originally announced December 2020.

Comments: Accepted to AAAI 2021

arXiv:2012.06599

doi 10.1007/JHEP02(2021)163

Baryonic Higgs and Dark Matter

Authors: Pavel Fileviez Perez, Clara Murgui, Alexis D. Plascencia

Abstract: We discuss the correlation between dark matter and Higgs decays in gauge theories where the dark matter is predicted from anomaly cancellation. In these theories, the Higgs responsible for the breaking of the gauge symmetry generates the mass for the dark matter candidate. We investigate the Higgs decays in the minimal gauge theory for Baryon number. After imposing the dark matter density and direct detection constraints, we find that the new Higgs can have a large branching ratio into two photons or into dark matter. Furthermore, we discuss the production channels and the unique signatures at the Large Hadron Collider.

Submitted 6 January, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: 24 pages. v2: minor changes to the text, accepted for publication in JHEP

Journal ref: JHEP 02 (2021) 163

arXiv:2012.06508

Confidence Estimation via Auxiliary Models

Authors: Charles Corbière, Nicolas Thome, Antoine Saporta, Tuan-Hung Vu, Matthieu Cord, Patrick Pérez

Abstract: Reliably quantifying the confidence of deep neural classifiers is a challenging yet fundamental requirement for deploying such models in safety-critical applications. In this paper, we introduce a novel target criterion for model confidence, namely the true class probability (TCP). We show that TCP offers better properties for confidence estimation than the standard maximum class probability (MCP). Since the true class is by essence unknown at test time, we propose to learn the TCP criterion from data with an auxiliary model, introducing a specific learning scheme adapted to this context. We evaluate our approach on the tasks of failure prediction and of self-training with pseudo-labels for domain adaptation, which both necessitate effective confidence estimates. Extensive experiments are conducted to validate the relevance of the proposed approach in each task. We study various network architectures and experiment with small and large datasets for image classification and semantic segmentation. In every tested benchmark, our approach outperforms strong baselines.
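The difference between the two confidence criteria named in this abstract can be illustrated with a minimal sketch. It only computes MCP (the largest softmax probability) and TCP (the softmax probability of the ground-truth class) for one prediction; the auxiliary model that estimates TCP at test time is not sketched here. The function names `softmax`, `mcp` and `tcp` and the example logits are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mcp(probs):
    """Maximum class probability: confidence in the predicted class."""
    return max(probs)


def tcp(probs, true_class):
    """True class probability: needs the label, so it is only known on labeled data."""
    return probs[true_class]


# Hypothetical logits for a 3-class problem; the classifier predicts class 0.
probs = softmax([2.0, 1.0, 0.1])

# If the true class is 1, the prediction is wrong: MCP stays high, TCP is low.
mcp_when_wrong = mcp(probs)
tcp_when_wrong = tcp(probs, true_class=1)

# If the true class is 0, the prediction is right and both criteria coincide.
tcp_when_right = tcp(probs, true_class=0)
```

On a misclassified sample TCP is necessarily no larger than MCP, which is why it separates failures from successes more cleanly; the catch, as the abstract notes, is that the true class is unavailable at test time.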

Submitted 31 May, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: Accepted to TPAMI 2021

arXiv:2012.04983

doi 10.1016/j.patcog.2021.108421

Driving Behavior Explanation with Multi-level Fusion

Authors: Hédi Ben-Younes, Éloi Zablocki, Patrick Pérez, Matthieu Cord

Abstract: In this era of active development of autonomous vehicles, it becomes crucial to provide driving systems with the capacity to explain their decisions. In this work, we focus on generating high-level driving explanations as the vehicle drives. We present BEEF, for BEhavior Explanation with Fusion, a deep architecture which explains the behavior of a trajectory prediction model. Supervised by annotations of justifications for human driving decisions, BEEF learns to fuse features from multiple levels. Leveraging recent advances in the multi-modal fusion literature, BEEF is carefully designed to model the correlations between high-level decision features and mid-level perceptual features. The flexibility and efficiency of our approach are validated with extensive experiments on the HDD and BDD-X datasets.

Submitted 9 December, 2020; originally announced December 2020.

Comments: Accepted at NeurIPS Workshop ML4AD 2020

Journal ref: Pattern Recognition, Volume 123, March 2022, 108421

arXiv:2012.02647

doi 10.1109/TITS.2021.3107587

Detecting 32 Pedestrian Attributes for Autonomous Vehicles

Authors: Taylor Mordan, Matthieu Cord, Patrick Pérez, Alexandre Alahi

Abstract: Pedestrians are arguably one of the most safety-critical road users to consider for autonomous vehicles in urban areas. In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes from a single image. These encompass visual appearance and behavior, and also include the forecasting of road crossing, which is a main safety concern. For this, we introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way. Each field spatially locates pedestrian instances and aggregates attribute predictions over them. This formulation naturally leverages spatial context, making it well suited to low-resolution scenarios such as autonomous driving. By increasing the number of attributes jointly learned, we highlight an issue related to the scales of gradients, which arises in MTL with numerous tasks. We solve it by normalizing the gradients coming from different objective functions when they join at the fork in the network architecture during the backward pass, referred to as fork-normalization. Experimental validation is performed on JAAD, a dataset providing numerous attributes for pedestrian analysis from autonomous vehicles, and shows competitive detection and attribute recognition results, as well as a more stable MTL training.

Submitted 27 August, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). Code available at https://github.com/vita-epfl/detection-attributes-fields

arXiv:2011.15028

The LDBC Graphalytics Benchmark

Authors: Alexandru Iosup, Ahmed Musaafir, Alexandru Uta, Arnau Prat Pérez, Gábor Szárnyas, Hassan Chafi, Ilie Gabriel Tănase, Lifeng Nai, Michael Anderson, Mihai Capotă, Narayanan Sundaram, Peter Boncz, Siegfried Depner, Stijn Heldens, Thomas Manhardt, Tim Hegeman, Wing Lung Ngai, Yinglong Xia

Abstract: In this document, we describe LDBC Graphalytics, an industrial-grade benchmark for graph analysis platforms. The main goal of Graphalytics is to enable the fair and objective comparison of graph analysis platforms. Due to the diversity of bottlenecks and performance issues such platforms need to address, Graphalytics consists of a set of selected deterministic algorithms for full-graph analysis, standard graph datasets, synthetic dataset generators, and reference output for validation purposes. Its test harness produces deep metrics that quantify multiple kinds of system scalability (weak and strong) and robustness (such as failures and performance variability). The benchmark also balances comprehensiveness with the runtime necessary to obtain the deep metrics. The benchmark comes with open-source software for generating performance data, for validating algorithm results, for monitoring and sharing performance data, and for obtaining the final benchmark result as a standard performance report.

Submitted 6 April, 2023; v1 submitted 30 November, 2020; originally announced November 2020.

ACM Class: C.4; H.2.4

arXiv:2011.08660

doi 10.1364/OE.423222

PhaseGAN: A deep-learning phase-retrieval approach for unpaired datasets

Authors: Yuhe Zhang, Mike Andreas Noack, Patrik Vagovic, Kamel Fezzaa, Francisco Garcia-Moreno, Tobias Ritschel, Pablo Villanueva-Perez

Abstract: Phase retrieval approaches based on deep learning (DL) provide a framework to obtain phase information from an intensity hologram or diffraction pattern in a robust manner and in real time. However, current DL architectures applied to the phase problem have two limitations: i) they rely on paired datasets, i.e., they are only applicable when a satisfactory solution of the phase problem has been found, and ii) most of them ignore the physics of the imaging process. Here, we present PhaseGAN, a new DL approach based on Generative Adversarial Networks, which allows the use of unpaired datasets and includes the physics of image formation. The performance of our approach is enhanced by including the image formation physics, and it provides phase reconstructions when conventional phase retrieval algorithms fail, such as in ultra-fast experiments. Thus, PhaseGAN offers the opportunity to address the phase problem when no phase reconstructions are available, but good simulations of the object or data from other experiments are available, enabling us to obtain results not possible before.

Submitted 19 November, 2020; v1 submitted 15 November, 2020; originally announced November 2020.

arXiv:2009.09485 [pdf, other]

PIE: Portrait Image Embedding for Semantic Control

Authors: Ayush Tewari, Mohamed Elgharib, Mallikarjun B R., Florian Bernard, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, Christian Theobalt

Abstract: Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, however only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study.

Submitted 20 September, 2020; originally announced September 2020.

Comments: To appear in SIGGRAPH Asia 2020. Project webpage: https://gvv.mpi-inf.mpg.de/projects/PIE/

arXiv:2008.09116 [pdf, other]

doi 10.1007/JHEP03(2021)185

Electric Dipole Moments, New Forces and Dark Matter

Authors: Pavel Fileviez Perez, Alexis D. Plascencia

Abstract: New sources of CP violation beyond the Standard Model are crucial to explain the baryon asymmetry in the Universe. We discuss the impact of new CP violating interactions in theories where a dark matter candidate is predicted by the cancellation of gauge anomalies. In these theories, the constraint on the dark matter relic density implies an upper bound on the new symmetry breaking scale from which all new states acquire their masses. We investigate in detail the predictions for electric dipole moments and show that if the relevant CP-violating phase is large, experiments such as the ACME collaboration will be able to fully probe the theory.

Submitted 17 February, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

Comments: 9 pages, 10 figures. v2: Minor changes to the text, accepted for publication in JHEP

Journal ref: JHEP 03 (2021) 185

arXiv:2007.10753 [pdf, other]

Enhancement of damaged-image prediction through Cahn-Hilliard Image Inpainting

Authors: José A. Carrillo, Serafim Kalliadasis, Fuyue Liang, Sergio P. Perez

Abstract: We assess the benefit of including an image inpainting filter before passing damaged images into a classification neural network. For this we employ a modified Cahn-Hilliard equation as an image inpainting filter, which is solved via a finite volume scheme with reduced computational cost and adequate properties for energy stability and boundedness. The benchmark dataset employed here is MNIST, which consists of binary images of handwritten digits and is a standard dataset to validate image-processing methodologies. We train a neural network based of dense layers with the training set of MNIST, and subsequently we contaminate the test set with damage of different types and intensities. We then compare the prediction accuracy of the neural network with and without applying the Cahn-Hilliard filter to the damaged images test. Our results quantify the significant improvement of damaged-image prediction due to applying the Cahn-Hilliard filter, which for specific damages can increase up to 50% and is in general advantageous for low to moderate damage.

Submitted 15 March, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: An interactive jupyter notebook with the code of this work is available at https://github.com/sergiopperez/Image_Inpainting. The MNIST dataset employed in this work can be downloaded from http://yann.lecun.com/exdb/mnist/

MSC Class: 68U10; 94A08; 65M22; 76M25; 76M12

arXiv:2007.10285 [pdf, other]

Condición de Lorentz y ecuaciones de ondas electromagnéticas como propiedades emergentes del sistema de Maxwell

Authors: Yudier Peña Pérez, Juan Bory Reyes

Abstract: This article deals with the study of electromagnetic waves equations and the Lorentz condition, as emergent properties of Maxwell's system in the context of systems theory. To do this, the wave equations and the Helmholtz equation are first deduced. Using the displaced Dirac operator, which is closely related to the main vector calculation operators, it is possible to establish a direct connection between the solutions of the Maxwell time-harmonic system and two quaternion equations. Also, the application of the Lorentz condition to transform the time-harmonic Maxwell system into a simple quaternion equation based on the scalar and vector potentials is exposed.

Submitted 26 October, 2020; v1 submitted 26 June, 2020; originally announced July 2020.

Comments: in Spanish. The sections were organized in a better way. Revised argument in section 4, results unchanged. The reference list has been updated

arXiv:2007.05397 [pdf, other]

doi 10.2352/ISSN.2470-1173.2020.16.AVM-109

VRUNet: Multi-Task Learning Model for Intent Prediction of Vulnerable Road Users

Authors: Adithya Ranga, Filippo Giruzzi, Jagdish Bhanushali, Emilie Wirbel, Patrick Pérez, Tuan-Hung Vu, Xavier Perrotton

Abstract: Advanced perception and path planning are at the core for any self-driving vehicle. Autonomous vehicles need to understand the scene and intentions of other road users for safe motion planning. For urban use cases it is very important to perceive and predict the intentions of pedestrians, cyclists, scooters, etc., classified as vulnerable road users (VRU). Intent is a combination of pedestrian activities and long term trajectories defining their future motion. In this paper we propose a multi-task learning model to predict pedestrian actions, crossing intent and forecast their future path from video sequences. We have trained the model on naturalistic driving open-source JAAD dataset, which is rich in behavioral annotations and real world scenarios. Experimental results show state-of-the-art performance on JAAD dataset and how we can benefit from jointly learning and predicting actions and trajectories using 2D human pose features and scene context.

Submitted 10 July, 2020; originally announced July 2020.

Comments: This paper is reprinted from, "VRUNet: Multi-Task Learning Model for Intent Prediction of Vulnerable Road Users, IS&T Electronic Imaging: Autonomous Vehicles and Machines 2020 Proceedings, (IS&T, Springfield, VA, 2020) page 109-1-10. DOI: 10.2352/ISSN.2470-1173.2020.16.AVM-109." Reprinted with permission of The Society for Imaging Science and Technology, holders of the 2020 copyright

arXiv:2007.02662 [pdf, other]

Toward unsupervised, multi-object discovery in large-scale image collections

Authors: Huy V. Vo, Patrick Pérez, Jean Ponce

Abstract: This paper addresses the problem of discovering the objects present in a collection of images without any supervision. We build on the optimization approach of Vo et al. (CVPR'19) with several key novelties: (1) We propose a novel saliency-based region proposal algorithm that achieves significantly higher overlap with ground-truth objects than other competitive methods. This procedure leverages off-the-shelf CNN features trained on classification tasks without any bounding box information, but is otherwise unsupervised. (2) We exploit the inherent hierarchical structure of proposals as an effective regularizer for the approach to object discovery of Vo et al., boosting its performance to significantly improve over the state of the art on several standard benchmarks. (3) We adopt a two-stage strategy to select promising proposals using small random sets of images before using the whole image collection to discover the objects it depicts, allowing us to tackle, for the first time (to the best of our knowledge), the discovery of multiple objects in each one of the pictures making up datasets with up to 20,000 images, an over five-fold increase compared to existing methods, and a first step toward true large-scale unsupervised image interpretation.

Submitted 25 August, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: Accepted for publication in European Conference on Computer Vision (ECCV) 2020

arXiv:2006.13382 [pdf, other]

Spherical Perspective on Learning with Normalization Layers

Authors: Simon Roburin, Yann de Mont-Marin, Andrei Bursuc, Renaud Marlet, Patrick Pérez, Mathieu Aubry

Abstract: Normalization Layers (NLs) are widely used in modern deep-learning architectures. Despite their apparent simplicity, their effect on optimization is not yet fully understood. This paper introduces a spherical framework to study the optimization of neural networks with NLs from a geometric perspective. Concretely, the radial invariance of groups of parameters, such as filters for convolutional neural networks, allows to translate the optimization steps on the $L_2$ unit hypersphere. This formulation and the associated geometric interpretation shed new light on the training dynamics. Firstly, the first effective learning rate expression of Adam is derived. Then the demonstration that, in the presence of NLs, performing Stochastic Gradient Descent (SGD) alone is actually equivalent to a variant of Adam constrained to the unit hypersphere, stems from the framework. Finally, this analysis outlines phenomena that previous variants of Adam act on and their importance in the optimization process are experimentally validated.

Submitted 19 May, 2022; v1 submitted 23 June, 2020; originally announced June 2020.

arXiv:2006.08658 [pdf, other]

ESL: Entropy-guided Self-supervised Learning for Domain Adaptation in Semantic Segmentation

Authors: Antoine Saporta, Tuan-Hung Vu, Matthieu Cord, Patrick Pérez

Abstract: While fully-supervised deep learning yields good models for urban scene semantic segmentation, these models struggle to generalize to new environments with different lighting or weather conditions for instance. In addition, producing the extensive pixel-level annotations that the task requires comes at a great cost. Unsupervised domain adaptation (UDA) is one approach that tries to address these issues in order to make such systems more scalable. In particular, self-supervised learning (SSL) has recently become an effective strategy for UDA in semantic segmentation. At the core of such methods lies `pseudo-labeling', that is, the practice of assigning high-confident class predictions as pseudo-labels, subsequently used as true labels, for target data. To collect pseudo-labels, previous works often rely on the highest softmax score, which we here argue as an unfavorable confidence measurement. In this work, we propose Entropy-guided Self-supervised Learning (ESL), leveraging entropy as the confidence indicator for producing more accurate pseudo-labels. On different UDA benchmarks, ESL consistently outperforms strong SSL baselines and achieves state-of-the-art results.

Submitted 15 June, 2020; originally announced June 2020.

Comments: Accepted at the CVPR 2020 Workshop on Scalability in Autonomous Driving

arXiv:2006.05966 [pdf, other]

doi 10.1016/j.nima.2020.164657

Positron production using a 9 MeV electron linac for the GBAR experiment

Authors: M. Charlton, J. J. Choi, M. Chung, P. Clade, P. Comini, P-P. Crepin, P. Crivelli, O. Dalkarov, P. Debu, L. Dodd, A. Douillet, S. Guellati-Khelifa, P-A. Hervieux, L. Hilico, A. Husson, P. Indelicato, G. Janka, S. Jonsell, J-P. Karr, B. H. Kim, E-S. Kim, S. K. Kim, Y. Ko, T. Kosinski, N. Kuroda , et al. (45 additional authors not shown)

Abstract: For the GBAR (Gravitational Behaviour of Antihydrogen at Rest) experiment at CERN's Antiproton Decelerator (AD) facility we have constructed a source of slow positrons, which uses a low-energy electron linear accelerator (linac). The driver linac produces electrons of 9 MeV kinetic energy that create positrons from bremsstrahlung-induced pair production. Staying below 10 MeV ensures no persistent radioactive activation in the target zone and that the radiation level outside the biological shield is safe for public access. An annealed tungsten-mesh assembly placed directly behind the target acts as a positron moderator. The system produces $5\times10^7$ slow positrons per second, a performance demonstrating that a low-energy electron linac is a superior choice over positron-emitting radioactive sources for high positron flux.

Submitted 6 October, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: published in NIM A. 33 pages 9 figures

Journal ref: Nucl. Instrum. Methods Phys. Res. A 985, 164657 (2021)

arXiv:2005.04408 [pdf, other]

Photo style transfer with consistency losses

Authors: Xu Yao, Gilles Puy, Patrick Pérez

Abstract: We address the problem of style transfer between two photos and propose a new way to preserve photorealism. Using the single pair of photos available as input, we train a pair of deep convolution networks (convnets), each of which transfers the style of one photo to the other. To enforce photorealism, we introduce a content preserving mechanism by combining a cycle-consistency loss with a self-consistency loss. Experimental results show that this method does not suffer from typical artifacts observed in methods working in the same settings. We then further analyze some properties of these trained convnets. First, we notice that they can be used to stylize other unseen images with same known style. Second, we show that retraining only a small subset of the network parameters can be sufficient to adapt these convnets to new styles.

Submitted 9 May, 2020; originally announced May 2020.

Journal ref: In 2019 IEEE International Conference on Image Processing (ICIP) (pp. 2314-2318). IEEE

arXiv:2005.04235 [pdf, other]

doi 10.1103/PhysRevD.102.015010

Probing the Nature of Neutrinos with a New Force

Authors: Pavel Fileviez Perez, Alexis D. Plascencia

Abstract: We discuss the possibility to distinguish between Dirac and Majorana neutrinos in the context of the minimal gauge theory for neutrino masses, the B-L gauge extension of the Standard Model. We revisit the possibility to observe lepton number violation at the Large Hadron Collider and point out the importance of the decays of the new gauge boson to discriminate between the existence of Dirac or Majorana neutrinos.

Submitted 9 July, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: 18 pages, 10 figures. v2: Typos fixed. Accepted for publication in PRD

Journal ref: Phys. Rev. D 102, 015010 (2020)

arXiv:2005.01456 [pdf, other]

CARRADA Dataset: Camera and Automotive Radar with Range-Angle-Doppler Annotations

Authors: A. Ouaknine, A. Newson, J. Rebut, F. Tupin, P. Pérez

Abstract: High quality perception is essential for autonomous driving (AD) systems. To reach the accuracy and robustness that are required by such systems, several types of sensors must be combined. Currently, mostly cameras and laser scanners (lidar) are deployed to build a representation of the world around the vehicle. While radar sensors have been used for a long time in the automotive industry, they are still under-used for AD despite their appealing characteristics (notably, their ability to measure the relative speed of obstacles and to operate even in adverse weather conditions). To a large extent, this situation is due to the relative lack of automotive datasets with real radar signals that are both raw and annotated. In this work, we introduce CARRADA, a dataset of synchronized camera and radar recordings with range-angle-Doppler annotations. We also present a semi-automatic annotation approach, which was used to annotate the dataset, and a radar semantic segmentation baseline, which we evaluate on several metrics. Both our code and dataset are available online.

Submitted 26 May, 2021; v1 submitted 4 May, 2020; originally announced May 2020.

Comments: 9 pages, 5 figures. Accepted at ICPR 2020. Erratum: results in Table III have been updated since the ICPR proceedings, models are selected using the PP metric instead of the previously used PR metric

ACM Class: I.2.10; I.4.8

arXiv:2004.05341 [pdf, other]

High-order well-balanced finite volume schemes for hydrodynamic equations with nonlocal free energy

Authors: José A. Carrillo, Manuel J. Castro, Serafim Kalliadasis, Sergio P. Perez

Abstract: We propose high-order well-balanced finite-volume schemes for a broad class of hydrodynamic systems with attractive-repulsive interaction forces and linear and nonlinear damping. Our schemes are suitable for free energies containing convolutions of an interaction potential with the density, which are essential for applications such as the Keller-Segel model, more general Euler-Poisson systems, or dynamic-density functional theory. Our schemes are also equipped with a nonnegative-density reconstruction which allows for vacuum regions during the simulation. We provide several prototypical examples from relevant applications highlighting the benefit of our algorithms elucidate also some of our analytical results.

Submitted 19 November, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

MSC Class: 65XX (Primary); 35Qxx; 35Q35; 35Q82; 76M12 (Secondary)

arXiv:2004.01130 [pdf, other]

Handling new target classes in semantic segmentation with domain adaptation

Authors: Maxime Bucher, Tuan-Hung Vu, Matthieu Cord, Patrick Pérez

Abstract: In this work, we define and address a novel domain adaptation (DA) problem in semantic scene segmentation, where the target domain not only exhibits a data distribution shift w.r.t. the source domain, but also includes novel classes that do not exist in the latter. Different to "open-set" and "universal domain adaptation", which both regard all objects from new classes as "unknown", we aim at explicit test-time prediction for these new classes. To reach this goal, we propose a framework that leverages domain adaptation and zero-shot learning techniques to enable "boundless" adaptation in the target domain. It relies on a novel architecture, along with a dedicated learning scheme, to bridge the source-target domain gap while learning how to map new classes' labels to relevant visual representations. The performance is further improved using self-training on target-domain pseudo-labels. For validation, we consider different domain adaptation set-ups, namely synthetic-2-real, country-2-country and dataset-2-dataset. Our framework outperforms the baselines by significant margins, setting competitive standards on all benchmarks for the new task. Code and models are available at https://github.com/valeoai/buda.

Submitted 16 February, 2021; v1 submitted 2 April, 2020; originally announced April 2020.

Comments: Under review at CVIU

arXiv:2004.00121 [pdf, other]

StyleRig: Rigging StyleGAN for 3D Control over Portrait Images

Authors: Ayush Tewari, Mohamed Elgharib, Gaurav Bharaj, Florian Bernard, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, Christian Theobalt

Abstract: StyleGAN generates photorealistic portrait images of faces with eyes, teeth, hair and context (neck, shoulders, background), but lacks a rig-like control over semantic face parameters that are interpretable in 3D, such as face pose, expressions, and scene illumination. Three-dimensional morphable face models (3DMMs) on the other hand offer control over the semantic parameters, but lack photorealism when rendered and only model the face interior, not other parts of a portrait image (hair, mouth interior, background). We present the first method to provide a face rig-like control over a pretrained and fixed StyleGAN via a 3DMM. A new rigging network, RigNet is trained between the 3DMM's semantic parameters and StyleGAN's input. The network is trained in a self-supervised manner, without the need for manual annotations. At test time, our method generates portrait images with the photorealism of StyleGAN and provides explicit control over the 3D semantic parameters of the face.

Submitted 13 June, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

Comments: CVPR 2020 (Oral). Project page: https://gvv.mpi-inf.mpg.de/projects/StyleRig/

arXiv:2003.12352 [pdf, other]

Enhanced Self-Perception in Mixed Reality: Egocentric Arm Segmentation and Database with Automatic Labelling

Authors: Ester Gonzalez-Sosa, Pablo Perez, Ruben Tolosana, Redouane Kachach, Alvaro Villegas

Abstract: In this study, we focus on the egocentric segmentation of arms to improve self-perception in Augmented Virtuality (AV). The main contributions of this work are: i) a comprehensive survey of segmentation algorithms for AV; ii) an Egocentric Arm Segmentation Dataset, composed of more than 10,000 images, comprising variations of skin color, and gender, among others. We provide all details required for the automated generation of groundtruth and semi-synthetic images; iii) the use of deep learning for the first time for segmenting arms in AV; iv) to showcase the usefulness of this database, we report results on different real egocentric hand datasets, including GTEA Gaze+, EDSH, EgoHands, Ego Youtube Hands, THU-Read, TEgO, FPAB, and Ego Gesture, which allow for direct comparisons with existing approaches utilizing color or depth. Results confirm the suitability of the EgoArm dataset for this task, achieving improvement up to 40% with respect to the original network, depending on the particular dataset. Results also suggest that, while approaches based on color or depth can work in controlled conditions (lack of occlusion, uniform lighting, only objects of interest in the near range, controlled background, etc.), egocentric segmentation based on deep learning is more robust in real AV applications.

Submitted 27 March, 2020; originally announced March 2020.

arXiv:2003.09426 [pdf, other]

doi 10.1007/JHEP07(2020)087

The Higgs and Leptophobic Force at the LHC

Authors: Pavel Fileviez Perez, Elliot Golias, Clara Murgui, Alexis D. Plascencia

Abstract: The Higgs boson could provide the key to discover new physics at the Large Hadron Collider. We investigate novel decays of the Standard Model (SM) Higgs boson into leptophobic gauge bosons which can be light in agreement with all experimental constraints. We study the associated production of the SM Higgs and the leptophobic gauge boson that could be crucial to test the existence of a leptophobic force. Our results demonstrate that it is possible to have a simple gauge extension of the SM at the low scale, without assuming very small couplings and in agreement with all the experimental bounds that can be probed at the LHC.

Submitted 11 June, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

Comments: 19 pages, 10 figures, 3 appendices. v2: Accepted for publication in JHEP. Comments and references added

Journal ref: J. High Energ. Phys. 2020, 87 (2020)

arXiv:2002.12247 [pdf, other]

Learning Representations by Predicting Bags of Visual Words

Authors: Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Abstract: Self-supervised representation learning targets to learn convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions that encode discrete visual concepts, here called visual words. To build such discrete representations, we quantize the feature maps of a first pre-trained self-supervised convnet, over a k-means based vocabulary. Then, as a self-supervised task, we train another convnet to predict the histogram of visual words of an image (i.e., its Bag-of-Words representation) given as input a perturbed version of that image. The proposed task forces the convnet to learn perturbation-invariant and context-aware image features, useful for downstream image understanding tasks. We extensively evaluate our method and demonstrate very strong empirical results, e.g., our pre-trained self-supervised representations transfer better on detection task and similarly on classification over classes "unseen" during pre-training, when compared to the supervised case. This also shows that the process of image discretization into visual words can provide the basis for very powerful self-supervised approaches in the image domain, thus allowing further connections to be made to related methods from the NLP domain that have been extremely successful so far.

Submitted 27 February, 2020; originally announced February 2020.

Comments: Accepted to CVPR2020

arXiv:2002.00444 [pdf, other]

Deep Reinforcement Learning for Autonomous Driving: A Survey

Authors: B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A. Al Sallab, Senthil Yogamani, Patrick Pérez

Abstract: With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, inverse reinforcement learning that are related but are not classical RL algorithms. The role of simulators in training agents, methods to validate, test and robustify existing solutions in RL are discussed.

Submitted 23 January, 2021; v1 submitted 2 February, 2020; originally announced February 2020.

Comments: Accepted for publication at IEEE Transactions on Intelligent Transportation Systems

arXiv:2001.08830 [pdf, other]

Scattering Features for Multimodal Gait Recognition

Authors: Srđan Kitić, Gilles Puy, Patrick Pérez, Philippe Gilberton

Abstract: We consider the problem of identifying people on the basis of their walk (gait) pattern. Classical approaches to tackle this problem are based on, e.g., video recordings or piezoelectric sensors embedded in the floor. In this work, we rely on acoustic and vibration measurements, obtained from a microphone and a geophone sensor, respectively. The contribution of this work is twofold. First, we propose a feature extraction method based on an (untrained) shallow scattering network, specially tailored for the gait signals. Second, we demonstrate that fusing the two modalities improves identification in the practically relevant open set scenario.

Submitted 23 January, 2020; originally announced January 2020.

Comments: Published at IEEE GlobalSIP 2017

arXiv:2001.05570 [pdf, other]

doi 10.1093/mnras/staa158

A survey for variable young stars with small telescopes: II -- Mapping a protoplanetary disk with stable structures at 0.15 AU

Authors: Jack J. Evitts, Dirk Froebrich, Aleks Scholz, Jochen Eislöffel, Justyn Campbell-White, Will Furnell, Thomas Urtly, Roger Pickard, Klaas Wiersema, Pavol A. Dubovský, Igor Kudzej, Ramon Naves, Mario Morales Aimar, Rafael Castillo García, Tonny Vanmunster, Erik Schwendeman, Francisco C. Soldán Alfaro, Stephen Johnstone, Rafael Gonzalez Farfán, Thomas Killestein, Jesús Delgado Casal, Faustino García de la Cuesta, Dean Roberts, Ulrich Kolb, Luís Montoro, et al. (35 additional authors not shown)

Abstract: The HOYS citizen science project conducts long term, multifilter, high cadence monitoring of large YSO samples with a wide variety of professional and amateur telescopes. We present the analysis of the light curve of V1490Cyg in the Pelican Nebula. We show that colour terms in the diverse photometric data can be calibrated out to achieve a median photometric accuracy of 0.02mag in broadband filters, allowing detailed investigations into a variety of variability amplitudes over timescales from hours to several years. Using GaiaDR2 we estimate the distance to the Pelican Nebula to be 870$^{+70}_{-55}$pc. V1490Cyg is a quasi-periodic dipper with a period of 31.447$\pm$0.011d. The obscuring dust has homogeneous properties, and grains larger than those typical in the ISM. Larger variability on short timescales is observed in U and R$_c-$H$α$, with U-amplitudes reaching 3mag on timescales of hours, indicating the source is accreting. The H$α$ equivalent width and NIR/MIR colours place V1490Cyg between CTTS/WTTS and transition disk objects. The material responsible for the dipping is located in a warped inner disk, about 0.15AU from the star. This mass reservoir can be filled and emptied on time scales shorter than the period at a rate of up to 10$^{-10}$M$_\odot$/yr, consistent with low levels of accretion in other T Tauri stars. Most likely the warp at this separation from the star is induced by a protoplanet in the inner accretion disk. However, we cannot fully rule out the possibility of an AA Tau-like warp, or occultations by the Hill sphere around a forming planet.

Submitted 17 January, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: 27 pages, 17 figures, accepted by MNRAS, full version with full appendix available at http://astro.kent.ac.uk/~df/

arXiv:1912.01540 [pdf, other]

QUEST: Quantized embedding space for transferring knowledge

Authors: Himalaya Jain, Spyros Gidaris, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Abstract: Knowledge distillation refers to the process of training a compact student network to achieve better accuracy by learning from a high capacity teacher network. Most of the existing knowledge distillation methods direct the student to follow the teacher by matching the teacher's output, feature maps or their distribution. In this work, we propose a novel way to achieve this goal: by distilling the knowledge through a quantized space. According to our method, the teacher's feature maps are quantized to represent the main visual concepts encompassed in the feature maps. The student is then asked to predict the quantized representation, which thus forms the task that the student uses to learn from the teacher. Despite its simplicity, we show that our approach is able to yield results that improve the state of the art on knowledge distillation. To that end, we provide an extensive evaluation across several network architectures and most commonly used benchmark datasets.

Submitted 17 July, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

Comments: Accepted at ECCV 2020

arXiv:1911.12676 [pdf, other]

xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

Authors: Maximilian Jaritz, Tuan-Hung Vu, Raoul de Charette, Émilie Wirbel, Patrick Pérez

Abstract: Unsupervised Domain Adaptation (UDA) is crucial to tackle the lack of annotations in a new domain. There are many multi-modal datasets, but most UDA approaches are uni-modal. In this work, we explore how to learn from multi-modality and propose cross-modal UDA (xMUDA) where we assume the presence of 2D images and 3D point clouds for 3D semantic segmentation. This is challenging as the two input spaces are heterogeneous and can be impacted differently by domain shift. In xMUDA, modalities learn from each other through mutual mimicking, disentangled from the segmentation objective, to prevent the stronger modality from adopting false predictions from the weaker one. We evaluate on new UDA scenarios including day-to-night, country-to-country and dataset-to-dataset, leveraging recent autonomous driving datasets. xMUDA brings large improvements over uni-modal UDA on all tested scenarios, and is complementary to state-of-the-art UDA techniques. Code is available at https://github.com/valeoai/xmuda.

Submitted 30 March, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

Comments: Accepted at CVPR 2020. For a demo video, see http://tiny.cc/xmuda

arXiv:1911.09191 [pdf, other]

doi 10.1155/2020/6894580

Biquaternionic Reformulation of a Fractional Monochromatic Maxwell System

Authors: Yudier Peña Pérez, Ricardo Abreu Blaya, Martín Patricio Árciga Alejandre, Juan Bory Reyes

Abstract: In this work we propose a biquaternionic reformulation of a fractional monochromatic Maxwell system. Additionally, some examples are given to illustrate how the quaternionic fractional approach emerges in linear hydrodynamic and elasticity.

Submitted 20 November, 2019; originally announced November 2019.

Journal ref: Advances in High Energy Physics. Volume 2020 (2020) 1-9

arXiv:1911.05738 [pdf, other]

doi 10.1007/JHEP01(2020)091

Axion Dark Matter, Proton Decay and Unification

Authors: Pavel Fileviez Perez, Clara Murgui, Alexis D. Plascencia

Abstract: We discuss the possibility to predict the QCD axion mass in the context of grand unified theories. We investigate the implementation of the DFSZ mechanism in the context of renormalizable SU(5) theories. In the simplest theory, the axion mass can be predicted with good precision in the range $m_a = (2-16)$ neV, and there is a strong correlation between the predictions for the axion mass and proton decay rates. In this context, we predict an upper bound for the proton decay channels with antineutrinos, $τ(p\to K^+ \barν) \lesssim 4 \times 10^{37} \text{ yr}$ and $τ(p \to π^+ \barν) \lesssim 2 \times 10^{36}\text{ yr}$. This theory can be considered as the minimal realistic grand unified theory with the DFSZ mechanism and it can be fully tested by proton decay and axion experiments.

Submitted 20 December, 2019; v1 submitted 13 November, 2019; originally announced November 2019.

Comments: 17 pages, 4 figures. v2: minor comments and references added, matches version published in JHEP

Journal ref: JHEP(2020) 2020: 91

arXiv:1911.02888 [pdf, other]

This dataset does not exist: training models from generated images

Authors: Victor Besnier, Himalaya Jain, Andrei Bursuc, Matthieu Cord, Patrick Pérez

Abstract: Current generative networks are increasingly proficient in generating high-resolution realistic images. These generative networks, especially the conditional ones, can potentially become a great tool for providing new image datasets. This naturally brings the question: Can we train a classifier only on the generated data? This potential availability of nearly unlimited amounts of training data challenges standard practices for training machine learning models, which have been crafted across the years for limited and fixed size datasets. In this work we investigate this question and its related challenges. We identify ways to improve significantly the performance over naive training on randomly generated images with regular heuristics. We propose three standalone techniques that can be applied at different stages of the pipeline, i.e., data generation, training on generated data, and deploying on real data. We evaluate our proposed approaches on a subset of the ImageNet dataset and show encouraging results compared to classifiers trained on real images.

Submitted 7 November, 2019; originally announced November 2019.

arXiv:1910.14138 [pdf, ps, other]

Belief revision and 3-valued logics: Characterization of 19,683 belief change operators

Authors: Nerio Borges, Ramón Pino Pérez

Abstract: In most classical models of belief change, epistemic states are represented by theories (AGM) or formulas (Katsuno-Mendelzon) and the new pieces of information by formulas. The Representation Theorem for revision operators says that operators are represented by total preorders. This important representation is exploited by Darwiche and Pearl to shift the notion of epistemic state to a more abstract one, where the paradigm of epistemic state is indeed that of a total preorder over interpretations. In this work, we introduce a 3-valued logic where the formulas can be identified with a generalisation of total preorders of three levels: a ranking function mapping interpretations into the truth values. Then we analyse some sort of changes in this kind of structures and give syntactical characterizations of them.

Submitted 30 October, 2019; originally announced October 2019.

arXiv:1910.05067 [pdf, other]

A Finite-Volume Method for Fluctuating Dynamical Density Functional Theory

Authors: Antonio Russo, Sergio P. Perez, Miguel A. Durán-Olivencia, Peter Yatsyshin, José A. Carrillo, Serafim Kalliadasis

Abstract: We introduce a finite-volume numerical scheme for solving stochastic gradient-flow equations. Such equations are of crucial importance within the framework of fluctuating hydrodynamics and dynamic density functional theory. Our proposed scheme deals with general free-energy functionals, including, for instance, external fields or interaction potentials. This allows us to simulate a range of physical phenomena where thermal fluctuations play a crucial role, such as nucleation and other energy-barrier crossing transitions. A positivity-preserving algorithm for the density is derived based on a hybrid space discretization of the deterministic and the stochastic terms and different implicit and explicit time integrators. We show through numerous applications that not only our scheme is able to accurately reproduce the statistical properties (structure factor and correlations) of the physical system, but, because of the multiplicative noise, it allows us to simulate energy barrier crossing dynamics, which cannot be captured by mean-field approaches.

Submitted 31 December, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

arXiv:1910.04851 [pdf, other]

Addressing Failure Prediction by Learning Model Confidence

Authors: Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez

Abstract: Assessing reliably the confidence of a deep neural network and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show how using the TCP is more suited than relying on the classic Maximum Class Probability (MCP). We provide in addition theoretical guarantees for TCP in the context of failure prediction. Since the true class is by essence unknown at test time, we propose to learn TCP criterion on the training set, introducing a specific learning scheme adapted to this context. Extensive experiments are conducted for validating the relevance of the proposed approach. We study various network architectures, small and large scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.

Submitted 26 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: NeurIPS 2019 (accepted)

arXiv:1909.06974 [pdf, other]

doi 10.1007/978-3-030-53061-7_1

The combinatorics of plane curve singularities. How Newton polygons blossom into lotuses

Authors: Evelia R. García Barroso, Pedro D. González Pérez, Patrick Popescu-Pampu

Abstract: This survey may be seen as an introduction to the use of toric and tropical geometry in the analysis of plane curve singularities, which are germs $(C,o)$ of complex analytic curves contained in a smooth complex analytic surface $S$. The embedded topological type of such a pair $(S, C)$ is usually defined to be that of the oriented link obtained by intersecting $C$ with a sufficiently small oriented Euclidean sphere centered at the point $o$, defined once a system of local coordinates $(x,y)$ was chosen on the germ $(S,o)$. If one works more generally over an arbitrary algebraically closed field of characteristic zero, one speaks instead of the combinatorial type of $(S, C)$. One may define it by looking either at the Newton-Puiseux series associated to $C$ relative to a generic local coordinate system $(x,y)$, or at the set of infinitely near points which have to be blown up in order to get the minimal embedded resolution of the germ $(C,o)$ or, thirdly, at the preimage of this germ by the resolution. Each point of view leads to a different encoding of the combinatorial type by a decorated tree: an Eggers-Wall tree, an Enriques diagram, or a weighted dual graph. The three trees contain the same information, which in the complex setting is equivalent to the knowledge of the embedded topological type. There are known algorithms for transforming one tree into another. In this paper we explain how a special type of two-dimensional simplicial complex called a lotus allows to think geometrically about the relations between the three types of trees. Namely, all of them embed in a natural lotus, their numerical decorations appearing as invariants of it. This lotus is constructed from the finite set of Newton polygons created during any process of resolution of $(C,o)$ by successive toric modifications.

Submitted 22 June, 2020; v1 submitted 15 September, 2019; originally announced September 2019.

Comments: 104 pages, 58 figures. Compared to the previous version, section 2 is new. The historical information, contained before in subsection 6.2, is distributed now throughout the paper in the subsections called "Historical comments''. More details are also added at various places of the paper. To appear in the Handbook of Geometry and Topology of Singularities I, Springer, 2020

MSC Class: 14H20; 14B05; 32S05

Journal ref: Handbook of Geometry and Topology of Singularities I, Springer (2020), 1-150

arXiv:1908.01772 [pdf, other]

doi 10.1007/JHEP11(2019)093

The QCD Axion and Unification

Authors: Pavel Fileviez Perez, Clara Murgui, Alexis D. Plascencia

Abstract: The QCD axion is one of the most appealing candidates for the dark matter in the Universe. In this article, we discuss the possibility to predict the axion mass in the context of a simple renormalizable grand unified theory where the Peccei-Quinn scale is determined by the unification scale. In this framework, the axion mass is predicted to be in the range $m_a \simeq (3 - 13) \times 10^{-9} \ \rm{eV}$. We study the axion phenomenology and find that the ABRACADABRA and CASPEr-Electric experiments will be able to fully probe this mass window.

Submitted 4 November, 2019; v1 submitted 5 August, 2019; originally announced August 2019.

Comments: 14 pages plus appendices, 6 figures. v2: minor changes to the text, references added, accepted for publication in JHEP

Journal ref: JHEP 11 (2019) 093

arXiv:1906.05186 [pdf, other]

Boosting Few-Shot Visual Learning with Self-Supervision

Authors: Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Abstract: Few-shot learning and self-supervised learning address different facets of the same problem: how to train a model with little or no labeled data. Few-shot learning aims for optimization methods and models that can learn efficiently to recognize patterns in the low data regime. Self-supervised learning focuses instead on unlabeled data and looks into it for the supervisory signal to feed high capacity deep neural networks. In this work we exploit the complementarity of these two domains and propose an approach for improving few-shot learning through self-supervision. We use self-supervision as an auxiliary task in a few-shot learning pipeline, enabling feature extractors to learn richer and more transferable visual representations while still using few annotated samples. Through self-supervision, our approach can be naturally extended towards using diverse unlabeled data from other datasets in the few-shot setting. We report consistent improvements across an array of architectures, datasets and self-supervision techniques.

Submitted 12 June, 2019; originally announced June 2019.

arXiv:1906.00817 [pdf, other]

Zero-Shot Semantic Segmentation

Authors: Maxime Bucher, Tuan-Hung Vu, Matthieu Cord, Patrick Pérez

Abstract: Semantic segmentation models are limited in their ability to scale to large numbers of object classes. In this paper, we introduce the new task of zero-shot semantic segmentation: learning pixel-wise classifiers for never-seen object categories with zero training examples. To this end, we present a novel architecture, ZS3Net, combining a deep visual segmentation model with an approach to generate visual representations from semantic word embeddings. By this way, ZS3Net addresses pixel classification tasks where both seen and unseen categories are faced at test time (so called "generalized" zero-shot classification). Performance is further improved by a self-training step that relies on automatic pseudo-labeling of pixels from unseen classes. On the two standard segmentation datasets, Pascal-VOC and Pascal-Context, we propose zero-shot benchmarks and set competitive baselines. For complex scenes as ones in the Pascal-Context dataset, we extend our approach by using a graph-context encoding to fully leverage spatial context priors coming from class-wise segmentation maps.

Submitted 18 November, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: NeurIPS 2019 (accepted)

Showing 151–200 of 807 results for author: Perez, P