-
Enhancement of Approximation Spaces by the Use of Primals and Neighborhood
Authors:
A. Çaksu Güler
Abstract:
Rough set theory is one of the most widely used and significant approaches for handling incomplete information. It divides the universe in the beginning and uses equivalency relations to produce blocks. Numerous generalized rough set models have been put out and investigated in an effort to increase flexibility and extend the range of possible uses. We introduce four new generalized rough set mode…
▽ More
Rough set theory is one of the most widely used and significant approaches for handling incomplete information. It divides the universe in the beginning and uses equivalency relations to produce blocks. Numerous generalized rough set models have been put out and investigated in an effort to increase flexibility and extend the range of possible uses. We introduce four new generalized rough set models that draw inspiration from "neighborhoods and primals" in order to make a contribution to this topic. By minimizing the uncertainty regions, these models are intended to assist decision makers in more effectively analyzing and evaluating the provided data. We verify this goal by demonstrating that the existing models outperform certain current method approaches in terms of improving the approximation operators (upper and lower) and accuracy measurements. We claim that the current models can preserve nearly all significant aspects associated with the rough set model. Preserving the monotonic property, which enables us to assess data uncertainty and boost confidence in outcomes, is one of the intriguing characterizations derived from the existing models. With the aid of specific instances, we also compare the areas of the current approach. Finally, we demonstrate that the new strategy we define for our everyday health-related problem yields more accurate findings.
△ Less
Submitted 23 October, 2024;
originally announced November 2024.
-
Refining Tuberculosis Detection in CXR Imaging: Addressing Bias in Deep Neural Networks via Interpretability
Authors:
Özgür Acar Güler,
Manuel Günther,
André Anjos
Abstract:
Automatic classification of active tuberculosis from chest X-ray images has the potential to save lives, especially in low- and mid-income countries where skilled human experts can be scarce. Given the lack of available labeled data to train such systems and the unbalanced nature of publicly available datasets, we argue that the reliability of deep learning models is limited, even if they can be s…
▽ More
Automatic classification of active tuberculosis from chest X-ray images has the potential to save lives, especially in low- and mid-income countries where skilled human experts can be scarce. Given the lack of available labeled data to train such systems and the unbalanced nature of publicly available datasets, we argue that the reliability of deep learning models is limited, even if they can be shown to obtain perfect classification accuracy on the test data. One way of evaluating the reliability of such systems is to ensure that models use the same regions of input images for predictions as medical experts would. In this paper, we show that pre-training a deep neural network on a large-scale proxy task, as well as using mixed objective optimization network (MOON), a technique to balance different classes during pre-training and fine-tuning, can improve the alignment of decision foundations between models and experts, as compared to a model directly trained on the target dataset. At the same time, these approaches keep perfect classification accuracy according to the area under the receiver operating characteristic curve (AUROC) on the test set, and improve generalization on an independent, unseen dataset. For the purpose of reproducibility, our source code is made available online.
△ Less
Submitted 8 October, 2024; v1 submitted 19 July, 2024;
originally announced July 2024.
-
MeshPose: Unifying DensePose and 3D Body Mesh reconstruction
Authors:
Eric-Tuan Lê,
Antonis Kakolyris,
Petros Koutras,
Himmy Tam,
Efstratios Skordos,
George Papandreou,
Rıza Alp Güler,
Iasonas Kokkinos
Abstract:
DensePose provides a pixel-accurate association of images with 3D mesh coordinates, but does not provide a 3D mesh, while Human Mesh Reconstruction (HMR) systems have high 2D reprojection error, as measured by DensePose localization metrics. In this work we introduce MeshPose to jointly tackle DensePose and HMR. For this we first introduce new losses that allow us to use weak DensePose supervision…
▽ More
DensePose provides a pixel-accurate association of images with 3D mesh coordinates, but does not provide a 3D mesh, while Human Mesh Reconstruction (HMR) systems have high 2D reprojection error, as measured by DensePose localization metrics. In this work we introduce MeshPose to jointly tackle DensePose and HMR. For this we first introduce new losses that allow us to use weak DensePose supervision to accurately localize in 2D a subset of the mesh vertices ('VertexPose'). We then lift these vertices to 3D, yielding a low-poly body mesh ('MeshPose'). Our system is trained in an end-to-end manner and is the first HMR method to attain competitive DensePose accuracy, while also being lightweight and amenable to efficient inference, making it suitable for real-time AR applications.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
SPAD : Spatially Aware Multiview Diffusers
Authors:
Yash Kant,
Ziyi Wu,
Michael Vasilkovsky,
Guocheng Qian,
Jian Ren,
Riza Alp Guler,
Bernard Ghanem,
Sergey Tulyakov,
Igor Gilitschenski,
Aliaksandr Siarohin
Abstract:
We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images. To enable multi-view generation, we repurpose a pretrained 2D diffusion model by extending its self-attention layers with cross-view interactions, and fine-tune it on a high quality subset of Objaverse. We find that a naive extension of the self-attention proposed in prior work (e.g. MVD…
▽ More
We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images. To enable multi-view generation, we repurpose a pretrained 2D diffusion model by extending its self-attention layers with cross-view interactions, and fine-tune it on a high quality subset of Objaverse. We find that a naive extension of the self-attention proposed in prior work (e.g. MVDream) leads to content copying between views. Therefore, we explicitly constrain the cross-view attention based on epipolar geometry. To further enhance 3D consistency, we utilize Plucker coordinates derived from camera rays and inject them as positional encoding. This enables SPAD to reason over spatial proximity in 3D well. In contrast to recent works that can only generate views at fixed azimuth and elevation, SPAD offers full camera control and achieves state-of-the-art results in novel view synthesis on unseen objects from the Objaverse and Google Scanned Objects datasets. Finally, we demonstrate that text-to-3D generation using SPAD prevents the multi-face Janus issue. See more details at our webpage: https://yashkant.github.io/spad
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis
Authors:
Yash Kant,
Aliaksandr Siarohin,
Michael Vasilkovsky,
Riza Alp Guler,
Jian Ren,
Sergey Tulyakov,
Igor Gilitschenski
Abstract:
We present a method for generating consistent novel views from a single source image. Our approach focuses on maximizing the reuse of visible pixels from the source image. To achieve this, we use a monocular depth estimator that transfers visible pixels from the source view to the target view. Starting from a pre-trained 2D inpainting diffusion model, we train our method on the large-scale Objaver…
▽ More
We present a method for generating consistent novel views from a single source image. Our approach focuses on maximizing the reuse of visible pixels from the source image. To achieve this, we use a monocular depth estimator that transfers visible pixels from the source view to the target view. Starting from a pre-trained 2D inpainting diffusion model, we train our method on the large-scale Objaverse dataset to learn 3D object priors. While training we use a novel masking mechanism based on epipolar lines to further improve the quality of our approach. This allows our framework to perform zero-shot novel view synthesis on a variety of objects. We evaluate the zero-shot abilities of our framework on three challenging datasets: Google Scanned Objects, Ray Traced Multiview, and Common Objects in 3D. See our webpage for more details: https://yashkant.github.io/invs/
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
Explainable Image Quality Assessment for Medical Imaging
Authors:
Caner Ozer,
Arda Guler,
Aysel Turkvatan Cansever,
Ilkay Oksuz
Abstract:
Medical image quality assessment is an important aspect of image acquisition, as poor-quality images may lead to misdiagnosis. Manual labelling of image quality is a tedious task for population studies and can lead to misleading results. While much research has been done on automated analysis of image quality to address this issue, relatively little work has been done to explain the methodologies.…
▽ More
Medical image quality assessment is an important aspect of image acquisition, as poor-quality images may lead to misdiagnosis. Manual labelling of image quality is a tedious task for population studies and can lead to misleading results. While much research has been done on automated analysis of image quality to address this issue, relatively little work has been done to explain the methodologies. In this work, we propose an explainable image quality assessment system and validate our idea on two different objectives which are foreign object detection on Chest X-Rays (Object-CXR) and Left Ventricular Outflow Tract (LVOT) detection on Cardiac Magnetic Resonance (CMR) volumes. We apply a variety of techniques to measure the faithfulness of the saliency detectors, and our explainable pipeline relies on NormGrad, an algorithm which can efficiently localise image quality issues with saliency maps of the classifier. We compare NormGrad with a range of saliency detection methods and illustrate its superior performance as a result of applying these methodologies for measuring the faithfulness of the saliency detectors. We see that NormGrad has significant gains over other saliency detectors by reaching a repeated Pointing Game score of 0.853 for Object-CXR and 0.611 for LVOT datasets.
△ Less
Submitted 25 March, 2023;
originally announced March 2023.
-
Invertible Neural Skinning
Authors:
Yash Kant,
Aliaksandr Siarohin,
Riza Alp Guler,
Menglei Chai,
Jian Ren,
Sergey Tulyakov,
Igor Gilitschenski
Abstract:
Building animatable and editable models of clothed humans from raw 3D scans and poses is a challenging problem. Existing reposing methods suffer from the limited expressiveness of Linear Blend Skinning (LBS), require costly mesh extraction to generate each new pose, and typically do not preserve surface correspondences across different poses. In this work, we introduce Invertible Neural Skinning (…
▽ More
Building animatable and editable models of clothed humans from raw 3D scans and poses is a challenging problem. Existing reposing methods suffer from the limited expressiveness of Linear Blend Skinning (LBS), require costly mesh extraction to generate each new pose, and typically do not preserve surface correspondences across different poses. In this work, we introduce Invertible Neural Skinning (INS) to address these shortcomings. To maintain correspondences, we propose a Pose-conditioned Invertible Network (PIN) architecture, which extends the LBS process by learning additional pose-varying deformations. Next, we combine PIN with a differentiable LBS module to build an expressive and end-to-end Invertible Neural Skinning (INS) pipeline. We demonstrate the strong performance of our method by outperforming the state-of-the-art reposing techniques on clothed humans and preserving surface correspondences, while being an order of magnitude faster. We also perform an ablation study, which shows the usefulness of our pose-conditioning formulation, and our qualitative results display that INS can rectify artefacts introduced by LBS well. See our webpage for more details: https://yashkant.github.io/invertible-neural-skinning/
△ Less
Submitted 4 March, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Shifted Windows Transformers for Medical Image Quality Assessment
Authors:
Caner Ozer,
Arda Guler,
Aysel Turkvatan Cansever,
Deniz Alis,
Ercan Karaarslan,
Ilkay Oksuz
Abstract:
To maintain a standard in a medical imaging study, images should have necessary image quality for potential diagnostic use. Although CNN-based approaches are used to assess the image quality, their performance can still be improved in terms of accuracy. In this work, we approach this problem by using Swin Transformer, which improves the poor-quality image classification performance that causes the…
▽ More
To maintain a standard in a medical imaging study, images should have necessary image quality for potential diagnostic use. Although CNN-based approaches are used to assess the image quality, their performance can still be improved in terms of accuracy. In this work, we approach this problem by using Swin Transformer, which improves the poor-quality image classification performance that causes the degradation in medical image quality. We test our approach on Foreign Object Classification problem on Chest X-Rays (Object-CXR) and Left Ventricular Outflow Tract Classification problem on Cardiac MRI with a four-chamber view (LVOT). While we obtain a classification accuracy of 87.1% and 95.48% on the Object-CXR and LVOT datasets, our experimental results suggest that the use of Swin Transformer improves the Object-CXR classification performance while obtaining a comparable performance for the LVOT dataset. To the best of our knowledge, our study is the first vision transformer application for medical image quality assessment.
△ Less
Submitted 11 August, 2022;
originally announced August 2022.
-
Context-self contrastive pretraining for crop type semantic segmentation
Authors:
Michail Tarasiou,
Riza Alp Guler,
Stefanos Zafeiriou
Abstract:
In this paper, we propose a fully supervised pre-training scheme based on contrastive learning particularly tailored to dense classification tasks. The proposed Context-Self Contrastive Loss (CSCL) learns an embedding space that makes semantic boundaries pop-up by use of a similarity metric between every location in a training sample and its local context. For crop type semantic segmentation from…
▽ More
In this paper, we propose a fully supervised pre-training scheme based on contrastive learning particularly tailored to dense classification tasks. The proposed Context-Self Contrastive Loss (CSCL) learns an embedding space that makes semantic boundaries pop-up by use of a similarity metric between every location in a training sample and its local context. For crop type semantic segmentation from Satellite Image Time Series (SITS) we find performance at parcel boundaries to be a critical bottleneck and explain how CSCL tackles the underlying cause of that problem, improving the state-of-the-art performance in this task. Additionally, using images from the Sentinel-2 (S2) satellite missions we compile the largest, to our knowledge, SITS dataset densely annotated by crop type and parcel identities, which we make publicly available together with the data generation pipeline. Using that data we find CSCL, even with minimal pre-training, to improve all respective baselines and present a process for semantic segmentation at super-resolution for obtaining crop classes at a more granular level. The code and instructions to download the data can be found in https://github.com/michaeltrs/DeepSatModels.
△ Less
Submitted 5 February, 2024; v1 submitted 9 April, 2021;
originally announced April 2021.
-
Divide and Learn: A Divide and Conquer Approach for Predict+Optimize
Authors:
Ali Ugur Guler,
Emir Demirovic,
Jeffrey Chan,
James Bailey,
Christopher Leckie,
Peter J. Stuckey
Abstract:
The predict+optimize problem combines machine learning ofproblem coefficients with a combinatorial optimization prob-lem that uses the predicted coefficients. While this problemcan be solved in two separate stages, it is better to directlyminimize the optimization loss. However, this requires dif-ferentiating through a discrete, non-differentiable combina-torial function. Most existing approaches…
▽ More
The predict+optimize problem combines machine learning ofproblem coefficients with a combinatorial optimization prob-lem that uses the predicted coefficients. While this problemcan be solved in two separate stages, it is better to directlyminimize the optimization loss. However, this requires dif-ferentiating through a discrete, non-differentiable combina-torial function. Most existing approaches use some form ofsurrogate gradient. Demirovicet alshowed how to directlyexpress the loss of the optimization problem in terms of thepredicted coefficients as a piece-wise linear function. How-ever, their approach is restricted to optimization problemswith a dynamic programming formulation. In this work wepropose a novel divide and conquer algorithm to tackle op-timization problems without this restriction and predict itscoefficients using the optimization loss. We also introduce agreedy version of this approach, which achieves similar re-sults with less computation. We compare our approach withother approaches to the predict+optimize problem and showwe can successfully tackle some hard combinatorial problemsbetter than other predict+optimize methods.
△ Less
Submitted 3 December, 2020;
originally announced December 2020.
-
Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild
Authors:
Dominik Kulon,
Riza Alp Güler,
Iasonas Kokkinos,
Michael Bronstein,
Stefanos Zafeiriou
Abstract:
We introduce a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss. We train our network by gathering a large-scale dataset of hand action in YouTube videos and use it as a source of weak supervision. Our weakly-supervised mesh convol…
▽ More
We introduce a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss. We train our network by gathering a large-scale dataset of hand action in YouTube videos and use it as a source of weak supervision. Our weakly-supervised mesh convolutions-based system largely outperforms state-of-the-art methods, even halving the errors on the in the wild benchmark. The dataset and additional resources are available at https://arielai.com/mesh_hands.
△ Less
Submitted 4 April, 2020;
originally announced April 2020.
-
Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues
Authors:
Natalia Neverova,
James Thewlis,
Rıza Alp Güler,
Iasonas Kokkinos,
Andrea Vedaldi
Abstract:
DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates. This power, however, comes at a greatly increased annotation time, as supervising the model requires to manually label hundreds of points per pose instance. In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection…
▽ More
DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates. This power, however, comes at a greatly increased annotation time, as supervising the model requires to manually label hundreds of points per pose instance. In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection strategies. In particular, we demonstrate that if annotations are collected in video frames, their efficacy can be multiplied for free by using motion cues. To explore this idea, we introduce DensePose-Track, a dataset of videos where selected frames are annotated in the traditional DensePose manner. Then, building on geometric properties of the DensePose mapping, we use the video dynamic to propagate ground-truth annotations in time as well as to learn from Siamese equivariance constraints. Having performed exhaustive empirical evaluation of various data annotation and learning strategies, we demonstrate that doing so can deliver significantly improved pose estimation results over strong baselines. However, despite what is suggested by some recent works, we show that merely synthesizing motion patterns by applying geometric transformations to isolated frames is significantly less effective, and that motion cues help much more when they are extracted from videos.
△ Less
Submitted 13 June, 2019;
originally announced June 2019.
-
Single Image 3D Hand Reconstruction with Mesh Convolutions
Authors:
Dominik Kulon,
Haoyang Wang,
Riza Alp Güler,
Michael Bronstein,
Stefanos Zafeiriou
Abstract:
Monocular 3D reconstruction of deformable objects, such as human body parts, has been typically approached by predicting parameters of heavyweight linear models. In this paper, we demonstrate an alternative solution that is based on the idea of encoding images into a latent non-linear representation of meshes. The prior on 3D hand shapes is learned by training an autoencoder with intrinsic graph c…
▽ More
Monocular 3D reconstruction of deformable objects, such as human body parts, has been typically approached by predicting parameters of heavyweight linear models. In this paper, we demonstrate an alternative solution that is based on the idea of encoding images into a latent non-linear representation of meshes. The prior on 3D hand shapes is learned by training an autoencoder with intrinsic graph convolutions performed in the spectral domain. The pre-trained decoder acts as a non-linear statistical deformable model. The latent parameters that reconstruct the shape and articulated pose of hands in the image are predicted using an image encoder. We show that our system reconstructs plausible meshes and operates in real-time. We evaluate the quality of the mesh reconstructions produced by the decoder on a new dataset and show latent space interpolation results. Our code, data, and models will be made publicly available.
△ Less
Submitted 5 August, 2019; v1 submitted 4 May, 2019;
originally announced May 2019.
-
Lifting AutoEncoders: Unsupervised Learning of a Fully-Disentangled 3D Morphable Model using Deep Non-Rigid Structure from Motion
Authors:
Mihir Sahasrabudhe,
Zhixin Shu,
Edward Bartrum,
Riza Alp Guler,
Dimitris Samaras,
Iasonas Kokkinos
Abstract:
In this work we introduce Lifting Autoencoders, a generative 3D surface-based model of object categories. We bring together ideas from non-rigid structure from motion, image formation, and morphable models to learn a controllable, geometric model of 3D categories in an entirely unsupervised manner from an unstructured set of images. We exploit the 3D geometric nature of our model and use normal in…
▽ More
In this work we introduce Lifting Autoencoders, a generative 3D surface-based model of object categories. We bring together ideas from non-rigid structure from motion, image formation, and morphable models to learn a controllable, geometric model of 3D categories in an entirely unsupervised manner from an unstructured set of images. We exploit the 3D geometric nature of our model and use normal information to disentangle appearance into illumination, shading and albedo. We further use weak supervision to disentangle the non-rigid shape variability of human faces into identity and expression. We combine the 3D representation with a differentiable renderer to generate RGB images and append an adversarially trained refinement network to obtain sharp, photorealistic image reconstruction results. The learned generative model can be controlled in terms of interpretable geometry and appearance factors, allowing us to perform photorealistic image manipulation of identity, expression, 3D pose, and illumination properties.
△ Less
Submitted 26 April, 2019;
originally announced April 2019.
-
Dense Pose Transfer
Authors:
Natalia Neverova,
Riza Alp Guler,
Iasonas Kokkinos
Abstract:
In this work we integrate ideas from surface-based modeling with neural synthesis: we propose a combination of surface-based pose estimation and deep generative models that allows us to perform accurate pose transfer, i.e. synthesize a new image of a person based on a single image of that person and the image of a pose donor. We use a dense pose estimation system that maps pixels from both images…
▽ More
In this work we integrate ideas from surface-based modeling with neural synthesis: we propose a combination of surface-based pose estimation and deep generative models that allows us to perform accurate pose transfer, i.e. synthesize a new image of a person based on a single image of that person and the image of a pose donor. We use a dense pose estimation system that maps pixels from both images to a common surface-based coordinate system, allowing the two images to be brought in correspondence with each other. We inpaint and refine the source image intensities in the surface coordinate system, prior to warping them onto the target pose. These predictions are fused with those of a convolutional predictive module through a neural synthesis module allowing for training the whole pipeline jointly end-to-end, optimizing a combination of adversarial and perceptual losses. We show that dense pose estimation is a substantially more powerful conditioning input than landmark-, or mask-based alternatives, and report systematic improvements over state of the art generators on DeepFashion and MVC datasets.
△ Less
Submitted 6 September, 2018;
originally announced September 2018.
-
Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance
Authors:
Zhixin Shu,
Mihir Sahasrabudhe,
Alp Guler,
Dimitris Samaras,
Nikos Paragios,
Iasonas Kokkinos
Abstract:
In this work we introduce Deforming Autoencoders, a generative model for images that disentangles shape from appearance in an unsupervised manner. As in the deformable template paradigm, shape is represented as a deformation between a canonical coordinate system (`template') and an observed image, while appearance is modeled in `canonical', template, coordinates, thus discarding variability due to…
▽ More
In this work we introduce Deforming Autoencoders, a generative model for images that disentangles shape from appearance in an unsupervised manner. As in the deformable template paradigm, shape is represented as a deformation between a canonical coordinate system (`template') and an observed image, while appearance is modeled in `canonical', template, coordinates, thus discarding variability due to deformations. We introduce novel techniques that allow this approach to be deployed in the setting of autoencoders and show that this method can be used for unsupervised group-wise image alignment. We show experiments with expression morphing in humans, hands, and digits, face manipulation, such as shape and appearance interpolation, as well as unsupervised landmark localization. A more powerful form of unsupervised disentangling becomes possible in template coordinates, allowing us to successfully decompose face images into shading and albedo, and further manipulate face images.
△ Less
Submitted 18 June, 2018;
originally announced June 2018.
-
DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild
Authors:
Riza Alp Guler,
Yuxiang Zhou,
George Trigeorgis,
Epameinondas Antonakos,
Patrick Snape,
Stefanos Zafeiriou,
Iasonas Kokkinos
Abstract:
In this work we use deep learning to establish dense correspondences between a 3D object model and an image "in the wild". We introduce "DenseReg", a fully-convolutional neural network (F-CNN) that densely regresses at every foreground pixel a pair of U-V template coordinates in a single feedforward pass. To train DenseReg we construct a supervision signal by combining 3D deformable model fitting…
▽ More
In this work we use deep learning to establish dense correspondences between a 3D object model and an image "in the wild". We introduce "DenseReg", a fully-convolutional neural network (F-CNN) that densely regresses at every foreground pixel a pair of U-V template coordinates in a single feedforward pass. To train DenseReg we construct a supervision signal by combining 3D deformable model fitting and 2D landmark annotations. We define the regression task in terms of the intrinsic, U-V coordinates of a 3D deformable model that is brought into correspondence with image instances at training time. A host of other object-related tasks (e.g. part segmentation, landmark localization) are shown to be by-products of this task, and to largely improve thanks to its introduction. We obtain highly-accurate regression results by combining ideas from semantic segmentation with regression networks, yielding a 'quantized regression' architecture that first obtains a quantized estimate of position through classification, and refines it through regression of the residual. We show that such networks can boost the performance of existing state-of-the-art systems for pose estimation. Firstly, we show that our system can serve as an initialization for Statistical Deformable Models, as well as an element of cascaded architectures that jointly localize landmarks and estimate dense correspondences. We also show that the obtained dense correspondence can act as a source of 'privileged information' that complements and extends the pure landmark-level annotations, accelerating and improving the training of pose estimation networks. We report state-of-the-art performance on the challenging 300W benchmark for facial landmark localization and on the MPII and LSP datasets for human pose estimation.
△ Less
Submitted 11 March, 2018; v1 submitted 5 March, 2018;
originally announced March 2018.
-
DensePose: Dense Human Pose Estimation In The Wild
Authors:
Rıza Alp Güler,
Natalia Neverova,
Iasonas Kokkinos
Abstract:
In this work, we establish dense correspondences between RGB image and a surface-based representation of the human body, a task we refer to as dense human pose estimation. We first gather dense correspondences for 50K persons appearing in the COCO dataset by introducing an efficient annotation pipeline. We then use our dataset to train CNN-based systems that deliver dense correspondence 'in the wi…
▽ More
In this work, we establish dense correspondences between RGB image and a surface-based representation of the human body, a task we refer to as dense human pose estimation. We first gather dense correspondences for 50K persons appearing in the COCO dataset by introducing an efficient annotation pipeline. We then use our dataset to train CNN-based systems that deliver dense correspondence 'in the wild', namely in the presence of background, occlusions and scale variations. We improve our training set's effectiveness by training an 'inpainting' network that can fill in missing groundtruth values and report clear improvements with respect to the best results that would be achievable in the past. We experiment with fully-convolutional networks and region-based models and observe a superiority of the latter; we further improve accuracy through cascading, obtaining a system that delivers highly0accurate results in real time. Supplementary materials and videos are provided on the project page http://densepose.org
△ Less
Submitted 1 February, 2018;
originally announced February 2018.
-
Spatial Interference Detection for Mobile Visible Light Communication
Authors:
Ali Ugur Guler,
Tristan Braud,
Pan Hui
Abstract:
Taking advantage of the rolling shutter effect of CMOS cameras in smartphones is a common practice to increase the transfered data rate with visible light communication (VLC) without employing external equipment such as photodiodes. VLC can then be used as replacement of other marker based techniques for object identification for Augmented Reality and Ubiquitous computing applications. However, th…
▽ More
Taking advantage of the rolling shutter effect of CMOS cameras in smartphones is a common practice to increase the transfered data rate with visible light communication (VLC) without employing external equipment such as photodiodes. VLC can then be used as replacement of other marker based techniques for object identification for Augmented Reality and Ubiquitous computing applications. However, the rolling shutter effect only allows to transmit data over a single dimension, which considerably limits the available bandwidth. In this article we propose a new method exploiting spacial interference detection to enable parallel transmission and design a protocol that enables easy identification of interferences between two signals. By introducing a second dimension, we are not only able to significantly increase the available bandwidth, but also identify and isolate light sources in close proximity.
△ Less
Submitted 13 July, 2017;
originally announced July 2017.
-
DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild
Authors:
Rıza Alp Güler,
George Trigeorgis,
Epameinondas Antonakos,
Patrick Snape,
Stefanos Zafeiriou,
Iasonas Kokkinos
Abstract:
In this paper we propose to learn a mapping from image pixels into a dense template grid through a fully convolutional network. We formulate this task as a regression problem and train our network by leveraging upon manually annotated facial landmarks "in-the-wild". We use such landmarks to establish a dense correspondence field between a three-dimensional object template and the input image, whic…
▽ More
In this paper we propose to learn a mapping from image pixels into a dense template grid through a fully convolutional network. We formulate this task as a regression problem and train our network by leveraging upon manually annotated facial landmarks "in-the-wild". We use such landmarks to establish a dense correspondence field between a three-dimensional object template and the input image, which then serves as the ground-truth for training our regression system. We show that we can combine ideas from semantic segmentation with regression networks, yielding a highly-accurate "quantized regression" architecture.
Our system, called DenseReg, allows us to estimate dense image-to-template correspondences in a fully convolutional manner. As such our network can provide useful correspondence information as a stand-alone system, while when used as an initialization for Statistical Deformable Models we obtain landmark localization results that largely outperform the current state-of-the-art on the challenging 300W benchmark. We thoroughly evaluate our method on a host of facial analysis tasks and also provide qualitative results for dense human body correspondence. We make our code available at http://alpguler.com/DenseReg.html along with supplementary materials.
△ Less
Submitted 19 June, 2017; v1 submitted 4 December, 2016;
originally announced December 2016.
-
One-dimensional Cutting Stock Problem with Divisible Items
Authors:
Deniz Tanir,
Onur Ugurlu,
Asli Guler,
Urfat Nuriyev
Abstract:
This paper considers the one-dimensional cutting stock problem with divisible items, which is a new problem in the cutting stock literature. The problem exists in steel industries. In the new problem, each item can be divided into smaller pieces, then they can be recombined again by welding. The objective is to minimize both the trim loss and the number of the welds. We present a mathematical mode…
▽ More
This paper considers the one-dimensional cutting stock problem with divisible items, which is a new problem in the cutting stock literature. The problem exists in steel industries. In the new problem, each item can be divided into smaller pieces, then they can be recombined again by welding. The objective is to minimize both the trim loss and the number of the welds. We present a mathematical model and a dynamic programming based heuristic for the problem. Furthermore, a software, which is based on the proposed heuristic algorithm, is developed to use in MKA company, and its performance is analyzed by solving real-life problems in the steel industry. The computational experiments show the efficiency of the proposed algorithm.
△ Less
Submitted 4 June, 2016;
originally announced June 2016.