-
Lessons from Defending Gemini Against Indirect Prompt Injections
Authors:
Chongyang Shi,
Sharon Lin,
Shuang Song,
Jamie Hayes,
Ilia Shumailov,
Itay Yona,
Juliette Pluto,
Aneesh Pappu,
Christopher A. Choquette-Choo,
Milad Nasr,
Chawin Sitawarin,
Gena Gibson,
Andreas Terzis,
John "Four" Flynn
Abstract:
Gemini is increasingly used to perform tasks on behalf of users, where function-calling and tool-use capabilities enable the model to access user data. Some tools, however, require access to untrusted data, introducing risk. Adversaries can embed malicious instructions in untrusted data which cause the model to deviate from the user's expectations and mishandle their data or permissions. In this report, we set out Google DeepMind's approach to evaluating the adversarial robustness of Gemini models and describe the main lessons learned from the process. We test how Gemini performs against a sophisticated adversary through an adversarial evaluation framework, which deploys a suite of adaptive attack techniques to run continuously against past, current, and future versions of Gemini. We describe how these ongoing evaluations directly help make Gemini more resilient against manipulation.
Submitted 20 May, 2025;
originally announced May 2025.
-
Quark: Real-time, High-resolution, and General Neural View Synthesis
Authors:
John Flynn,
Michael Broxton,
Lukas Murmann,
Lucy Chai,
Matthew DuVall,
Clément Godard,
Kathryn Heal,
Srinivas Kaza,
Stephen Lombardi,
Xuan Luo,
Supreeth Achar,
Kira Prabhu,
Tiancheng Sun,
Lynn Tsai,
Ryan Overbeck
Abstract:
We present a novel neural algorithm for performing high-quality, high-resolution, real-time novel view synthesis. From a sparse set of input RGB images or video streams, our network both reconstructs the 3D scene and renders novel views at 1080p resolution at 30fps on an NVIDIA A100. Our feed-forward network generalizes across a wide variety of datasets and scenes and produces state-of-the-art quality for a real-time method. Our quality approaches, and in some cases surpasses, the quality of some of the top offline methods. In order to achieve these results we use a novel combination of several key concepts, and tie them together into a cohesive and effective algorithm. We build on previous works that represent the scene using semi-transparent layers and use an iterative learned render-and-refine approach to improve those layers. Instead of flat layers, our method reconstructs layered depth maps (LDMs) that efficiently represent scenes with complex depth and occlusions. The iterative update steps are embedded in a multi-scale, UNet-style architecture to perform as much compute as possible at reduced resolution. Within each update step, to better aggregate the information from multiple input views, we use a specialized Transformer-based network component. This allows the majority of the per-input image processing to be performed in the input image space, as opposed to layer space, further increasing efficiency. Finally, due to the real-time nature of our reconstruction and rendering, we dynamically create and discard the internal 3D geometry for each frame, generating the LDM for each view. Taken together, this produces a novel and effective algorithm for view synthesis. Through extensive evaluation, we demonstrate that we achieve state-of-the-art quality at real-time rates. Project page: https://quark-3d.github.io/
Submitted 25 November, 2024;
originally announced November 2024.
-
Curricular Complexity Versus Quality of Computer Science Programs
Authors:
Gregory L. Heileman,
Hayden W. Free,
Johnny Flynn,
Camden Mackowiak,
Jerzy W. Jaromczyk,
Chaouki T. Abdallah
Abstract:
In this research paper we describe a study that involves measuring the complexities of undergraduate curricula offered by computer science departments, and then comparing them to the quality of these departments, where quality is determined by a metric-based ranking system. The study objective was to determine whether or not a relationship exists between the quality of computer science departments and the complexity of the curricula they offer. The relationship between curricular complexity and program quality was previously investigated for the case of undergraduate electrical engineering programs, with surprising results. It was found that if the US News & World Report Best Undergraduate Programs ranking is used as a proxy for quality, then a statistically significant difference in curricular complexities exists between higher and lower quality electrical engineering programs. Furthermore, it was found that higher quality electrical engineering programs tend to have lower complexity curricula, and vice versa. In the study reported in this paper, a sufficient amount of data was collected in order to determine that an inverse relationship between program quality and curricular complexity also exists in undergraduate computer science departments. This brings up an interesting question regarding the extent to which this phenomenon exists across the spectrum of STEM disciplines.
Submitted 11 June, 2020;
originally announced June 2020.
-
DeepView: View Synthesis with Learned Gradient Descent
Authors:
John Flynn,
Michael Broxton,
Paul Debevec,
Matthew DuVall,
Graham Fyffe,
Ryan Overbeck,
Noah Snavely,
Richard Tucker
Abstract:
We present a novel approach to view synthesis using multiplane images (MPIs). Building on recent advances in learned gradient descent, our algorithm generates an MPI from a set of sparse camera viewpoints. The resulting method incorporates occlusion reasoning, improving performance on challenging scene features such as object boundaries, lighting reflections, thin structures, and scenes with high depth complexity. We show that our method achieves high-quality, state-of-the-art results on two datasets: the Kalantari light field dataset, and a new camera array dataset, Spaces, which we make publicly available.
Submitted 17 June, 2019;
originally announced June 2019.
-
DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality
Authors:
Chloe LeGendre,
Wan-Chun Ma,
Graham Fyffe,
John Flynn,
Laurent Charbonnel,
Jay Busch,
Paul Debevec
Abstract:
We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). For training data, we collect videos of various reflective spheres placed within the camera's FOV, leaving most of the background unoccluded, leveraging the fact that materials with diverse reflectance functions reveal different lighting cues in a single exposure. We train a deep neural network to regress from the LDR background image to HDR lighting by matching the LDR ground truth sphere images to those rendered with the predicted illumination using image-based relighting, which is differentiable. Our inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality. Training on automatically exposed and white-balanced videos, we improve the realism of rendered objects compared to the state-of-the-art methods for both indoor and outdoor scenes.
Submitted 1 April, 2019;
originally announced April 2019.
-
On Hallucinating Context and Background Pixels from a Face Mask using Multi-scale GANs
Authors:
Sandipan Banerjee,
Walter J. Scheirer,
Kevin W. Bowyer,
Patrick J. Flynn
Abstract:
We propose a multi-scale GAN model to hallucinate realistic context (forehead, hair, neck, clothes) and background pixels automatically from a single input face mask. Instead of swapping a face on to an existing picture, our model directly generates realistic context and background pixels based on the features of the provided face mask. Unlike face inpainting algorithms, it can generate realistic hallucinations even for a large number of missing pixels. Our model is composed of a cascaded network of GAN blocks, each tasked with hallucination of missing pixels at a particular resolution while guiding the synthesis process of the next GAN block. The hallucinated full face image is made photo-realistic by using a combination of reconstruction, perceptual, adversarial and identity preserving losses at each block of the network. With a set of extensive experiments, we demonstrate the effectiveness of our model in hallucinating context and background pixels from face masks varying in facial pose, expression and lighting, collected from multiple datasets subject disjoint with our training data. We also compare our method with two popular face swapping and face completion methods in terms of visual quality and recognition performance. Additionally, we analyze our cascaded pipeline and compare it with the recently proposed progressive growing of GANs.
Submitted 11 January, 2020; v1 submitted 17 November, 2018;
originally announced November 2018.
-
Fast Face Image Synthesis with Minimal Training
Authors:
Sandipan Banerjee,
Walter J. Scheirer,
Kevin W. Bowyer,
Patrick J. Flynn
Abstract:
We propose an algorithm to generate realistic face images of both real and synthetic identities (people who do not exist) with different facial yaw, shape and resolution. The synthesized images can be used to augment datasets to train CNNs or as massive distractor sets for biometric verification experiments without any privacy concerns. Additionally, law enforcement can make use of this technique to train forensic experts to recognize faces. Our method samples face components from a pool of multiple face images of real identities to generate the synthetic texture. Then, a real 3D head model compatible with the generated texture is used to render it under different facial yaw transformations. We perform multiple quantitative experiments to assess the effectiveness of our synthesis procedure in CNN training and its potential use to generate distractor face images. Additionally, we compare our method with popular GAN models in terms of visual quality and execution time.
Submitted 19 November, 2018; v1 submitted 4 November, 2018;
originally announced November 2018.
-
Domain-Specific Human-Inspired Binarized Statistical Image Features for Iris Recognition
Authors:
Adam Czajka,
Daniel Moreira,
Kevin W. Bowyer,
Patrick J. Flynn
Abstract:
Binarized statistical image features (BSIF) have been successfully used for texture analysis in many computer vision tasks, including iris recognition and biometric presentation attack detection. One important point is that all applications of BSIF in iris recognition have used the original BSIF filters, which were trained on image patches extracted from natural images. This paper tests the question of whether domain-specific BSIF can give better performance than the default BSIF. The second important point is in the selection of image patches to use in training for BSIF. Can image patches derived from eye-tracking experiments, in which humans perform an iris recognition task, give better performance than random patches? Our results say that (1) domain-specific BSIF features can out-perform the default BSIF features, and (2) selecting image patches in a task-specific manner guided by human performance can out-perform selecting random patches. These results are important because BSIF is often regarded as a generic texture tool that does not need any domain adaptation, and human-task-guided selection of patches for training has never (to our knowledge) been done. This paper follows the reproducible research requirements, and the new iris-domain-specific BSIF filters, the patches used in filter training, the database used in testing and the source codes of the designed iris recognition method are made available along with this paper to facilitate applications of this concept.
Submitted 17 November, 2018; v1 submitted 13 July, 2018;
originally announced July 2018.
-
Performance of Humans in Iris Recognition: The Impact of Iris Condition and Annotation-driven Verification
Authors:
Daniel Moreira,
Mateusz Trokielewicz,
Adam Czajka,
Kevin W. Bowyer,
Patrick J. Flynn
Abstract:
This paper advances the state of the art in human examination of iris images by (1) assessing the impact of different iris conditions in identity verification, and (2) introducing an annotation step that improves the accuracy of people's decisions. In a first experimental session, 114 subjects were asked to decide if pairs of iris images depict the same eye (genuine pairs) or two distinct eyes (impostor pairs). The image pairs sampled six conditions: (1) easy for algorithms to classify, (2) difficult for algorithms to classify, (3) large difference in pupil dilation, (4) disease-affected eyes, (5) identical twins, and (6) post-mortem samples. In a second session, 85 of the 114 subjects were asked to annotate matching and non-matching regions that supported their decisions. Subjects were allowed to change their initial classification as a result of the annotation process. Results suggest that: (a) people improve their identity verification accuracy when asked to annotate matching and non-matching regions between the pair of images, (b) images depicting the same eye with large difference in pupil dilation were the most challenging to subjects, but benefited well from the annotation-driven classification, (c) humans performed better than iris recognition algorithms when verifying genuine pairs of post-mortem and disease-affected eyes (i.e., samples showing deformations that go beyond the distortions of a healthy iris due to pupil dilation), and (d) annotation does not improve accuracy of analyzing images from identical twins, which remain confusing for people.
Submitted 20 November, 2018; v1 submitted 13 July, 2018;
originally announced July 2018.
-
Beyond Pixels: Image Provenance Analysis Leveraging Metadata
Authors:
Aparna Bharati,
Daniel Moreira,
Joel Brogan,
Patricia Hale,
Kevin W. Bowyer,
Patrick J. Flynn,
Anderson Rocha,
Walter J. Scheirer
Abstract:
Creative works, whether paintings or memes, follow unique journeys that result in their final form. Understanding these journeys, a process known as "provenance analysis", provides rich insights into the use, motivation, and authenticity underlying any given work. The application of this type of study to the expanse of unregulated content on the Internet is what we consider in this paper. Provenance analysis provides a snapshot of the chronology and validity of content as it is uploaded, re-uploaded, and modified over time. Although still in its infancy, automated provenance analysis for online multimedia is already being applied to different types of content. Most current works seek to build provenance graphs based on the shared content between images or videos. This can be a computationally expensive task, especially when considering the vast influx of content that the Internet sees every day. Utilizing non-content-based information, such as timestamps, geotags, and camera IDs can help provide important insights into the path a particular image or video has traveled during its time on the Internet without large computational overhead. This paper tests the scope and applicability of metadata-based inferences for provenance graph construction in two different scenarios: digital image forensics and cultural analytics.
Submitted 6 March, 2019; v1 submitted 9 July, 2018;
originally announced July 2018.
-
Stereo Magnification: Learning View Synthesis using Multiplane Images
Authors:
Tinghui Zhou,
Richard Tucker,
John Flynn,
Graham Fyffe,
Noah Snavely
Abstract:
The view synthesis problem--generating novel views of a scene from known imagery--has garnered recent attention due in part to compelling applications in virtual and augmented reality. In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. We show that our method compares favorably with several recent view synthesis methods, and demonstrate applications in magnifying narrow-baseline stereo images.
Submitted 24 May, 2018;
originally announced May 2018.
-
Image Provenance Analysis at Scale
Authors:
Daniel Moreira,
Aparna Bharati,
Joel Brogan,
Allan Pinto,
Michael Parowski,
Kevin W. Bowyer,
Patrick J. Flynn,
Anderson Rocha,
Walter J. Scheirer
Abstract:
Prior art has shown it is possible to estimate, through image processing and computer vision techniques, the types and parameters of transformations that have been applied to the content of individual images to obtain new images. Given a large corpus of images and a query image, an interesting further step is to retrieve the set of original images whose content is present in the query image, as well as the detailed sequences of transformations that yield the query image given the original images. This is a problem that recently has received the name of image provenance analysis. In these times of public media manipulation (e.g., fake news and meme sharing), obtaining the history of image transformations is relevant for fact checking and authorship verification, among many other applications. This article presents an end-to-end processing pipeline for image provenance analysis, which works at real-world scale. It employs a cutting-edge image filtering solution that is custom-tailored for the problem at hand, as well as novel techniques for obtaining the provenance graph that expresses how the images, as nodes, are ancestrally connected. A comprehensive set of experiments for each stage of the pipeline is provided, comparing the proposed solution with state-of-the-art results, employing previously published datasets. In addition, this work introduces a new dataset of real-world provenance cases from the social media site Reddit, along with baseline results.
Submitted 23 January, 2018; v1 submitted 19 January, 2018;
originally announced January 2018.
-
SREFI: Synthesis of Realistic Example Face Images
Authors:
Sandipan Banerjee,
John S. Bernhard,
Walter J. Scheirer,
Kevin W. Bowyer,
Patrick J. Flynn
Abstract:
In this paper, we propose a novel face synthesis approach that can generate an arbitrarily large number of synthetic images of both real and synthetic identities. Thus a face image dataset can be expanded in terms of the number of identities represented and the number of images per identity using this approach, without the identity-labeling and privacy complications that come from downloading images from the web. To measure the visual fidelity and uniqueness of the synthetic face images and identities, we conducted face matching experiments with both human participants and a CNN pre-trained on a dataset of 2.6M real face images. To evaluate the stability of these synthetic faces, we trained a CNN model with an augmented dataset containing close to 200,000 synthetic faces. We used a snapshot of this trained CNN to recognize extremely challenging frontal (real) face images. Experiments showed training with the augmented faces boosted the face recognition performance of the CNN.
Submitted 24 April, 2017; v1 submitted 21 April, 2017;
originally announced April 2017.
-
3D Bounding Box Estimation Using Deep Learning and Geometry
Authors:
Arsalan Mousavian,
Dragomir Anguelov,
John Flynn,
Jana Kosecka
Abstract:
We present a method for 3D object detection and pose estimation from a single image. In contrast to current techniques that only regress the 3D orientation of an object, our method first regresses relatively stable 3D object properties using a deep convolutional neural network and then combines these estimates with geometric constraints provided by a 2D object bounding box to produce a complete 3D bounding box. The first network output estimates the 3D object orientation using a novel hybrid discrete-continuous loss, which significantly outperforms the L2 loss. The second output regresses the 3D object dimensions, which have relatively little variance compared to alternatives and can often be predicted for many object types. These estimates, combined with the geometric constraints on translation imposed by the 2D bounding box, enable us to recover a stable and accurate 3D object pose. We evaluate our method on the challenging KITTI object detection benchmark both on the official metric of 3D orientation estimation and also on the accuracy of the obtained 3D bounding boxes. Although conceptually simple, our method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance level segmentation and flat ground priors and sub-category detection. Our discrete-continuous loss also produces state of the art results for 3D viewpoint estimation on the Pascal 3D+ dataset.
Submitted 10 April, 2017; v1 submitted 1 December, 2016;
originally announced December 2016.
-
Geometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of Neurons
Authors:
Lingxi Xie,
Qi Tian,
John Flynn,
Jingdong Wang,
Alan Yuille
Abstract:
Deep Convolutional Neural Networks (CNNs) are playing important roles in state-of-the-art visual recognition. This paper focuses on modeling the spatial co-occurrence of neuron responses, which is less studied in previous work. For this, we consider the neurons in the hidden layer as neural words, and construct a set of geometric neural phrases on top of them. The idea of grouping neural words into neural phrases is borrowed from the Bag-of-Visual-Words (BoVW) model. Next, the Geometric Neural Phrase Pooling (GNPP) algorithm is proposed to efficiently encode these neural phrases. GNPP acts as a new type of hidden layer, which punishes the isolated neuron responses after convolution, and can be inserted into a CNN model with little extra computational overhead. Experimental results show that GNPP produces significant and consistent accuracy gain in image classification.
Submitted 21 July, 2016;
originally announced July 2016.
-
The ND-IRIS-0405 Iris Image Dataset
Authors:
Kevin W. Bowyer,
Patrick J. Flynn
Abstract:
The Computer Vision Research Lab at the University of Notre Dame began collecting iris images in the spring semester of 2004. The initial data collections used an LG 2200 iris imaging system for image acquisition. Image datasets acquired in 2004-2005 at Notre Dame with this LG 2200 have been used in the ICE 2005 and ICE 2006 iris biometric evaluations. The ICE 2005 iris image dataset has been distributed to over 100 research groups around the world. The purpose of this document is to describe the content of the ND-IRIS-0405 iris image dataset. This dataset is a superset of the iris image datasets used in ICE 2005 and ICE 2006. The ND 2004-2005 iris image dataset contains 64,980 images corresponding to 356 unique subjects, and 712 unique irises. The age range of the subjects is 18 to 75 years old. 158 of the subjects are female, and 198 are male. 250 of the subjects are Caucasian, 82 are Asian, and 24 are other ethnicities.
Submitted 15 June, 2016;
originally announced June 2016.
-
DeepStereo: Learning to Predict New Views from the World's Imagery
Authors:
John Flynn,
Ivan Neulander,
James Philbin,
Noah Snavely
Abstract:
Deep networks have recently enjoyed enormous success when applied to recognition and classification problems in computer vision, but their use in graphics problems has been limited. In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches which consist of multiple complex stages of processing, each of which requires careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network which then directly produces the pixels of the unseen view. The benefits of our approach include generality (we only require posed image sets and can easily apply our method to different domains), and high quality results on traditionally difficult scenes. We believe this is due to the end-to-end nature of our system which is able to plausibly generate pixels according to color, depth, and texture priors learnt automatically from the training data. To verify our method we show that it can convincingly reproduce known test views from nearby imagery. Additionally we show images rendered from novel viewpoints. To our knowledge, our work is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.
Submitted 22 June, 2015;
originally announced June 2015.
-
Representing Data by a Mixture of Activated Simplices
Authors:
Chunyu Wang,
John Flynn,
Yizhou Wang,
Alan L. Yuille
Abstract:
We present a new model which represents data as a mixture of simplices. Simplices are geometric structures that generalize triangles. We give a simple geometric understanding that allows us to learn a simplicial structure efficiently. Our method requires that the data are unit normalized (and thus lie on the unit sphere). We show that under this restriction, building a model with simplices amounts to constructing a convex hull inside the sphere whose boundary facets are close to the data. We call the boundary facets of the convex hull that are close to the data Activated Simplices. While the total number of bases used to build the simplices is a parameter of the model, the dimensions of the individual activated simplices are learned from the data. Simplices can have different dimensions, which facilitates modeling of inhomogeneous data sources. The simplicial structure is bounded, which is appropriate for modeling data with constraints, such as the fact that human elbows cannot bend more than 180 degrees. The simplices are easy to interpret and extremes within the data can be discovered among the vertices. The method provides good reconstruction and regularization. It supports good nearest neighbor classification and it allows realistic generative models to be constructed. It achieves state-of-the-art results on benchmark datasets, including 3D poses and digits.
Submitted 12 December, 2014;
originally announced December 2014.