-
CIMGEN: Controlled Image Manipulation by Finetuning Pretrained Generative Models on Limited Data
Authors:
Chandrakanth Gudavalli,
Erik Rosten,
Lakshmanan Nataraj,
Shivkumar Chandrasekaran,
B. S. Manjunath
Abstract:
Content creation and image editing can benefit from flexible user controls. A common intermediate representation for conditional image generation is a semantic map, which encodes the objects present in the image. Compared to raw RGB pixels, a semantic map is much easier to modify: one can take a semantic map and selectively insert, remove, or replace objects in it. The method proposed in this paper takes the modified semantic map and alters the original image in accordance with it. The method leverages traditional pre-trained image-to-image translation GANs, such as CycleGAN or Pix2Pix GAN, that are fine-tuned on a limited dataset of reference images associated with the semantic maps. We discuss the qualitative and quantitative performance of our technique to illustrate its capacity and possible applications in the fields of image forgery and image editing. We also demonstrate the effectiveness of the proposed image forgery technique in thwarting numerous deep learning-based image forensic techniques, highlighting the urgent need to develop robust and generalizable image forensic tools in the fight against the spread of fake media.
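The sketch below makes semantic-map editing concrete by treating the map as a 2D grid of integer class labels. It is not the authors' code. The label values (0 = background, 1 = road, 2 = building) and the helpers `replace_label` and `paint_region` are hypothetical and for illustration only, because the abstract does not say how maps are encoded.

```python
from typing import List

SemanticMap = List[List[int]]

# Hypothetical class IDs, for illustration only.
BACKGROUND, ROAD, BUILDING = 0, 1, 2


def replace_label(sem_map: SemanticMap, old: int, new: int) -> SemanticMap:
    """Return a copy of sem_map with every `old` label replaced by `new`
    (e.g. removing an object by relabelling it as background)."""
    return [[new if lbl == old else lbl for lbl in row] for row in sem_map]


def paint_region(sem_map: SemanticMap, top: int, left: int,
                 height: int, width: int, label: int) -> SemanticMap:
    """Return a copy of sem_map with a rectangular region set to `label`
    (e.g. inserting a new object). The input map is left unchanged."""
    out = [list(row) for row in sem_map]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = label
    return out
```

In the pipeline the abstract describes, the network's input would be this edited map, not the original RGB image. The fine-tuned image-to-image translation network then renders the change.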
Submitted 23 January, 2024;
originally announced January 2024.
-
3D Structure from 2D Microscopy images using Deep Learning
Authors:
Benjamin J. Blundell,
Christian Sieben,
Suliana Manley,
Ed Rosten,
QueeLim Ch'ng,
Susan Cox
Abstract:
Understanding the structure of a protein complex is crucial in determining its function. However, retrieving accurate 3D structures from microscopy images is highly challenging, particularly as many imaging modalities are two-dimensional. Recent advances in Artificial Intelligence have been applied to this problem, primarily using voxel-based approaches to analyse sets of electron microscopy images. Here we present a deep learning solution for reconstructing protein complexes from a number of 2D single molecule localization microscopy images, with the solution being completely unconstrained. Our convolutional neural network coupled with a differentiable renderer predicts pose and derives a single structure. After training, the network is discarded, and the output of this method is a structural model which fits the dataset. We demonstrate the performance of our system on two protein complexes: CEP152 (which comprises part of the proximal toroid of the centriole) and centrioles.
Submitted 14 October, 2021;
originally announced October 2021.
-
SeeTheSeams: Localized Detection of Seam Carving based Image Forgery in Satellite Imagery
Authors:
Chandrakanth Gudavalli,
Erik Rosten,
Lakshmanan Nataraj,
Shivkumar Chandrasekaran,
B. S. Manjunath
Abstract:
Seam carving is a popular technique for content-aware image retargeting. It can also be used to deliberately manipulate images, for example, to change the GPS location of a building or to insert or remove roads in a satellite image. This paper proposes a novel approach for detecting and localizing seams in such images. While there are methods to detect seam carving based manipulations, this is the first time that robust localization and detection of seam carving forgery has been made possible. We also propose a seam localization score (SLS) metric to evaluate the effectiveness of localization. The proposed method is evaluated extensively on a large collection of images from different sources, demonstrating a high level of detection and localization performance across these datasets. The datasets curated during this work will be released to the public.
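The sketch below shows seam carving itself, which is the manipulation being detected and not the detection method this paper proposes. Seam carving removes a connected path of low-energy pixels, one per row. This version is the standard dynamic-programming seam search. It assumes a precomputed per-pixel energy map given as a list of lists (for example, gradient magnitude), and the function names are illustrative.

```python
from typing import List

Grid = List[List[float]]


def find_vertical_seam(energy: Grid) -> List[int]:
    """Return one column index per row forming the minimum-energy 8-connected seam."""
    rows, cols = len(energy), len(energy[0])
    # cost[i][j] = minimum total energy of any seam ending at (i, j).
    cost = [list(energy[0])]
    for i in range(1, rows):
        prev = cost[-1]
        row = []
        for j in range(cols):
            lo, hi = max(j - 1, 0), min(j + 1, cols - 1)
            row.append(energy[i][j] + min(prev[lo:hi + 1]))
        cost.append(row)
    # Backtrack from the cheapest pixel in the last row.
    j = min(range(cols), key=lambda c: cost[-1][c])
    seam = [j]
    for i in range(rows - 2, -1, -1):
        lo, hi = max(j - 1, 0), min(j + 1, cols - 1)
        j = min(range(lo, hi + 1), key=lambda c: cost[i][c])
        seam.append(j)
    seam.reverse()
    return seam


def remove_vertical_seam(image: Grid, seam: List[int]) -> Grid:
    """Delete the seam pixel from each row, narrowing the image by one column."""
    return [row[:j] + row[j + 1:] for row, j in zip(image, seam)]
```

Repeatedly removing (or duplicating) such seams shifts content horizontally without obvious cropping. This is why the abstract frames relocating buildings or roads as a forgery risk.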
Submitted 27 August, 2021;
originally announced August 2021.
-
Progressive Batching for Efficient Non-linear Least Squares
Authors:
Huu Le,
Christopher Zach,
Edward Rosten,
Oliver J. Woodford
Abstract:
Non-linear least squares solvers are used across a broad range of offline and real-time model fitting problems. Most improvements of the basic Gauss-Newton algorithm tackle convergence guarantees or leverage the sparsity of the underlying problem structure for computational speedup. With the success of deep learning methods leveraging large datasets, stochastic optimization methods have recently received a lot of attention. Our work borrows ideas from both stochastic machine learning and statistics, and we present an approach for non-linear least squares that guarantees convergence while significantly reducing the required amount of computation. Empirical results show that our proposed method achieves competitive convergence rates compared to traditional second-order approaches on common computer vision problems, such as image alignment and essential matrix estimation, with very large numbers of residuals.
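The general idea of running Gauss-Newton on a growing subset of residuals can be sketched on a toy one-parameter problem: fitting y = exp(θx). This is a hedged illustration, not the paper's algorithm. The abstract says the method borrows ideas from statistics but does not give its batch-growth rule. The fixed doubling schedule below (the hypothetical `start` and `iters_per_stage` parameters) is only a stand-in and carries no convergence guarantee of its own.

```python
import math
import random
from typing import List


def gauss_newton_step(theta: float, xs: List[float], ys: List[float]) -> float:
    """One Gauss-Newton update for residuals r_i = y_i - exp(theta * x_i)."""
    jtr = 0.0  # J^T r
    jtj = 0.0  # J^T J (a scalar for a one-parameter model)
    for x, y in zip(xs, ys):
        f = math.exp(theta * x)
        r = y - f
        j = -x * f  # d r_i / d theta
        jtr += j * r
        jtj += j * j
    if jtj == 0.0:
        return theta
    return theta - jtr / jtj


def progressive_gauss_newton(xs: List[float], ys: List[float], theta: float = 0.0,
                             start: int = 8, iters_per_stage: int = 3,
                             seed: int = 0) -> float:
    """Run Gauss-Newton on a random subset that doubles in size until it covers all data."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    batch = min(start, len(xs))
    while True:
        sub = idx[:batch]
        bx = [xs[i] for i in sub]
        by = [ys[i] for i in sub]
        for _ in range(iters_per_stage):
            theta = gauss_newton_step(theta, bx, by)
        if batch >= len(xs):
            return theta
        batch = min(2 * batch, len(xs))
```

The early, cheap iterations on small batches move θ close to the solution. The expensive full-data iterations then only need to polish it, and that saving is what the abstract's "reduced computation" refers to in spirit.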
Submitted 21 October, 2020;
originally announced October 2020.
-
Large Scale Photometric Bundle Adjustment
Authors:
Oliver J. Woodford,
Edward Rosten
Abstract:
Direct methods have shown promise on visual odometry and SLAM, offering greater accuracy and robustness than feature-based methods. However, offline 3-d reconstruction from internet images has not yet benefited from a joint, photometric optimization over dense geometry and camera parameters. Issues such as the lack of brightness constancy, and the sheer volume of data, make this a more challenging task. This work presents a framework for jointly optimizing millions of scene points and hundreds of camera poses and intrinsics, using a photometric cost that is invariant to local lighting changes. The improvement in metric reconstruction accuracy that it confers over feature-based bundle adjustment is demonstrated on the large-scale Tanks & Temples benchmark. We further demonstrate qualitative reconstruction improvements on an internet photo collection, with challenging diversity in lighting and camera intrinsics.
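The abstract does not say which lighting-invariant photometric cost is used. One common choice with this kind of invariance is zero-mean normalized cross-correlation (ZNCC). ZNCC is unchanged when a patch's intensities undergo an affine change I to aI + b with a > 0. The sketch below assumes patches are flattened lists of intensities, and `zncc` and `photometric_cost` are illustrative names.

```python
import math
from typing import Sequence


def zncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross-correlation of two equal-length patches, in [-1, 1]."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if den == 0.0:
        raise ValueError("constant patch: ZNCC is undefined")
    return sum(x * y for x, y in zip(da, db)) / den


def photometric_cost(a: Sequence[float], b: Sequence[float]) -> float:
    """0 for patches equal up to gain and bias; up to 2 for inverted contrast."""
    return 1.0 - zncc(a, b)
```

Because the mean is subtracted and the result is normalized by the standard deviations, a change of exposure or local illumination modeled as gain plus bias leaves the cost at zero. This is the property a raw sum-of-squared-differences cost lacks.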
Submitted 10 September, 2020; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Y-Autoencoders: disentangling latent representations via sequential-encoding
Authors:
Massimiliano Patacchiola,
Patrick Fox-Roberts,
Edward Rosten
Abstract:
In the last few years there have been important advancements in generative models with the two dominant approaches being Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). However, standard Autoencoders (AEs) and closely related structures have remained popular because they are easy to train and adapt to different tasks. An interesting question is whether we can achieve state-of-the-art performance with AEs while retaining their good properties. We propose an answer to this question by introducing a new model called Y-Autoencoder (Y-AE). The structure and training procedure of a Y-AE split a representation into an implicit and an explicit part. The implicit part is similar to the output of an autoencoder and the explicit part is strongly correlated with labels in the training set. The two parts are separated in the latent space by splitting the output of the encoder into two paths (forming a Y shape) before decoding and re-encoding. We then impose a number of losses, such as reconstruction loss, and a loss on dependence between the implicit and explicit parts. Additionally, the projection in the explicit manifold is monitored by a predictor that is embedded in the encoder and trained end-to-end with no adversarial losses. We provide significant experimental results on various domains, such as separation of style and content, image-to-image translation, and inverse graphics.
Submitted 25 July, 2019;
originally announced July 2019.
-
Boosting in Location Space
Authors:
Damian Eads,
David Helmbold,
Ed Rosten
Abstract:
The goal of object detection is to find objects in an image. An object detector accepts an image and produces a list of locations as $(x,y)$ pairs. Here we introduce a new concept: location-based boosting. Location-based boosting differs from previous boosting algorithms because it optimizes a new spatial loss function to combine object detectors, each of which may have marginal performance, into a single, more accurate object detector. A structured representation of object locations as a list of $(x,y)$ pairs is a more natural domain for object detection than the spatially unstructured representation produced by classifiers. Furthermore, this formulation allows us to take advantage of the intuition that large areas of the background are uninteresting and not worth expending computational effort on. This results in a more scalable algorithm because it does not need to take measures, such as subsampling or applying an ad-hoc weighting to the pixels, to prevent the background data from swamping the foreground data. We first present the theory of location-based boosting, and then motivate it with empirical results on a challenging data set.
Submitted 4 September, 2013;
originally announced September 2013.
-
Improved RANSAC performance using simple, iterative minimal-set solvers
Authors:
Edward Rosten,
Gerhard Reitmayr,
Tom Drummond
Abstract:
RANSAC is a popular technique for estimating model parameters in the presence of outliers. The best speed is achieved when the minimum possible number of points is used to estimate hypotheses for the model. Many useful problems can be represented using polynomial constraints (for instance, the determinant of a fundamental matrix must be zero) and so have a number of solutions which are consistent with a minimal set. A considerable amount of effort has been expended on finding the constraints of such problems, and these often require the solution of systems of polynomial equations. We show that better performance can be achieved by using a simple optimization-based approach on minimal sets. For a given minimal set, the optimization approach is not guaranteed to converge to the correct solution. However, when used within RANSAC, the greater speed and numerical stability result in better performance overall, and much simpler algorithms. We also show that selecting more than the minimal number of points and using robust optimization can yield better results for very noisy data by reducing the number of trials required. The increased speed of our method is demonstrated with experiments on essential matrix estimation.
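The claim that minimal sets give the best speed follows from the standard RANSAC trial-count estimate. Suppose a fraction w of the data are inliers and we want at least one all-inlier sample of size s with confidence p. Roughly N = log(1 - p) / log(1 - w^s) samples are needed, and this grows quickly with s. A small sketch (the function name is illustrative):

```python
import math


def ransac_trials(inlier_ratio: float, sample_size: int, confidence: float = 0.99) -> int:
    """Number of random samples needed so that, with probability `confidence`,
    at least one sample of `sample_size` points contains only inliers."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    p_all_inliers = inlier_ratio ** sample_size
    if p_all_inliers >= 1.0:
        return 1  # every sample is clean
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))
```

For example, with half the data being inliers, a 2-point sample needs 17 trials at 99% confidence. A 5-point sample (the minimal set for essential matrix estimation) needs 146. The formula assumes every all-inlier sample yields a good hypothesis. With very noisy data that assumption breaks down, which is the regime where the abstract reports that larger samples combined with robust optimization can reduce the number of trials.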
Submitted 8 July, 2010;
originally announced July 2010.
-
Automatic creation of urban velocity fields from aerial video
Authors:
Edward Rosten,
Rohan Loveland,
Mark Hickman
Abstract:
In this paper, we present a system for modelling vehicle motion in an urban scene from low frame-rate aerial video. In particular, the scene is modelled as a probability distribution over velocities at every pixel in the image.
We describe the complete system for acquiring this model. The video is captured from a helicopter and stabilized by warping the images to match an orthorectified image of the area. A pixel classifier is applied to the stabilized images, and the response is segmented to determine car locations and orientations. The results are fed into a tracking scheme which tracks cars for three frames, creating tracklets. This allows the tracker to use a combination of velocity, direction, appearance, and acceleration cues to keep only tracks likely to be correct. Each tracklet provides a measurement of the car velocity at every point along the tracklet's length, and these are then aggregated to create a histogram of vehicle velocities at every pixel in the image.
The results demonstrate that the velocity probability distribution prior can be used to infer a variety of information about road lane directions, speed limits, vehicle speeds and common trajectories, and traffic bottlenecks, as well as providing a means of describing environmental knowledge about traffic rules that can be used in tracking.
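The final aggregation step can be sketched as follows. Each consecutive pair of tracked positions gives a velocity, which is binned by speed and direction and added to the histogram of every pixel along that stretch of the tracklet. The bin sizes, the direction quantization, and the unit-step sampling along segments are illustrative assumptions, because the abstract does not specify how velocities are discretized.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Bin = Tuple[int, int]  # (speed bin, direction bin)


def velocity_histograms(tracklets: List[List[Point]], speed_bin: float = 2.0,
                        n_dir_bins: int = 8) -> Dict[Tuple[int, int], Dict[Bin, int]]:
    """Accumulate a (speed, direction) histogram at every pixel crossed by a tracklet.

    Each tracklet is a list of (x, y) car positions in consecutive frames;
    velocities are therefore in pixels per frame.
    """
    hist: Dict[Tuple[int, int], Dict[Bin, int]] = defaultdict(lambda: defaultdict(int))
    for track in tracklets:
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            vx, vy = x1 - x0, y1 - y0
            speed = math.hypot(vx, vy)
            angle = math.atan2(vy, vx) % (2 * math.pi)
            key = (int(speed // speed_bin),
                   int(angle / (2 * math.pi) * n_dir_bins) % n_dir_bins)
            # Record the measurement at roughly unit spacing along the segment,
            # including its start point but not its end point.
            steps = max(1, math.ceil(speed))
            for k in range(steps):
                px = math.floor(x0 + vx * k / steps + 0.5)
                py = math.floor(y0 + vy * k / steps + 0.5)
                hist[(px, py)][key] += 1
    return {pixel: dict(h) for pixel, h in hist.items()}
```

Normalizing each pixel's counts would give the per-pixel velocity distribution that the abstract treats as a prior. For example, lane directions would appear as the dominant direction bin.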
Submitted 7 December, 2009;
originally announced December 2009.
-
Learning Object Location Predictors with Boosting and Grammar-Guided Feature Extraction
Authors:
Damian Eads,
Edward Rosten,
David Helmbold
Abstract:
We present BEAMER: a new spatially exploitative approach to learning object detectors which shows excellent results when applied to the task of detecting objects in greyscale aerial imagery in the presence of ambiguous and noisy data. There are four main contributions used to produce these results. First, we introduce a grammar-guided feature extraction system, enabling the exploration of a richer feature space while constraining the features to a useful subset. This is specified with a rule-based generative grammar crafted by a human expert. Second, we learn a classifier on this data using a newly proposed variant of AdaBoost which takes into account the spatially correlated nature of the data. Third, we perform another round of training to optimize the method of converting the pixel classifications generated by boosting into a high quality set of (x, y) locations. Lastly, we carefully define three common problems in object detection and define two evaluation criteria that are tightly matched to these problems. Major strengths of this approach are: (1) a way of randomly searching a broad feature space, (2) its performance when evaluated on well-matched evaluation criteria, and (3) its use of the location prediction domain to learn object detectors as well as to generate detections that perform well on several tasks: object counting, tracking, and target detection. We demonstrate the efficacy of BEAMER with a comprehensive experimental evaluation on a challenging data set.
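Grammar-guided feature extraction can be illustrated with a toy grammar whose random derivations are chains of image operators. The grammar, its operator names, and the helpers `expand` and `sample_features` are hypothetical stand-ins, not the expert-crafted grammar from the paper. The point is only that random expansion explores a large feature space while every sample stays within the grammar's constraints.

```python
import random
from typing import Dict, List

# Hypothetical toy grammar: nonterminals are in <angle brackets>, everything else
# is a terminal operator name. A derivation such as ["sobel_x", "gauss(2)", "grey"]
# is read right to left: take the grey image, blur it, then take the x-gradient.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<feature>": [["<filter>", "<image>"], ["<filter>", "<filter>", "<image>"]],
    "<filter>": [["gauss(1)"], ["gauss(2)"], ["laplace"], ["sobel_x"], ["sobel_y"], ["abs"]],
    "<image>": [["grey"]],
}


def expand(symbol: str, rng: random.Random) -> List[str]:
    """Randomly expand `symbol` into a list of terminal operator names."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal
    production = rng.choice(GRAMMAR[symbol])
    out: List[str] = []
    for s in production:
        out.extend(expand(s, rng))
    return out


def sample_features(k: int, seed: int = 0) -> List[List[str]]:
    """Draw k random feature definitions from the grammar (reproducible via seed)."""
    rng = random.Random(seed)
    return [expand("<feature>", rng) for _ in range(k)]
```

Each sampled operator chain would then be evaluated on the image to produce one feature plane for the boosting stage. The grammar is the constraint that keeps the random search within a useful subset.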
Submitted 24 July, 2009;
originally announced July 2009.
-
Camera distortion self-calibration using the plumb-line constraint and minimal Hough entropy
Authors:
Edward Rosten,
Rohan Loveland
Abstract:
In this paper we present a simple and robust method for self-correction of camera distortion using single images of scenes which contain straight lines. Since the most common distortion can be modelled as radial distortion, we illustrate the method using the Harris radial distortion model, but the method is applicable to any distortion model. The method is based on transforming the edgels of the distorted image to a 1-D angular Hough space, and optimizing the distortion correction parameters which minimize the entropy of the corresponding normalized histogram. Properly corrected imagery will have fewer curved lines, and therefore less spread in Hough space. Since the method does not rely on any image structure beyond the existence of edgels sharing some common orientations and does not use edge fitting, it is applicable to a wide variety of image types. For instance, it can be applied equally well to images of texture with weak but dominant orientations, or images with strong vanishing points. Finally, experiments on both synthetic and real data show that the method is particularly robust to noise.
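The sketch below shows the objective at the heart of the method: the entropy of the normalized 1-D angular Hough histogram. It assumes the edgels have already been undistorted with a candidate set of parameters and converted to line angles. The bin count and function name are illustrative assumptions, and the outer optimization over distortion parameters is omitted.

```python
import math
from typing import Iterable


def hough_angle_entropy(angles: Iterable[float], n_bins: int = 90) -> float:
    """Entropy (in nats) of the normalized histogram of line angles in [0, pi)."""
    counts = [0] * n_bins
    for a in angles:
        # Line orientations are only defined modulo pi.
        b = int((a % math.pi) / math.pi * n_bins) % n_bins
        counts[b] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no edgels")
    return -sum(c / total * math.log(c / total) for c in counts if c)
```

Edgels from a straight line all share one angle, giving zero entropy. Residual curvature spreads them over many bins and raises it. A correction that straightens lines therefore lowers this score.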
Submitted 4 January, 2009; v1 submitted 24 October, 2008;
originally announced October 2008.
-
Faster and better: a machine learning approach to corner detection
Authors:
Edward Rosten,
Reid Porter,
Tom Drummond
Abstract:
The repeatability and efficiency of a corner detector determine how likely it is to be useful in a real-world application. The repeatability is important because the same scene viewed from different positions should yield features which correspond to the same real-world 3D locations [Schmid et al 2000]. The efficiency is important because this determines whether the detector combined with further processing can operate at frame rate.
Three advances are described in this paper. First, we present a new heuristic for feature detection, and using machine learning we derive a feature detector from this which can fully process live PAL video using less than 5% of the available processing time. By comparison, most other detectors cannot even operate at frame rate (Harris detector 115%, SIFT 195%). Second, we generalize the detector, allowing it to be optimized for repeatability, with little loss of efficiency. Third, we carry out a rigorous comparison of corner detectors based on the above repeatability criterion applied to 3D scenes. We show that despite being principally constructed for speed, on these stringent tests, our heuristic detector significantly outperforms existing feature detectors. Finally, the comparison demonstrates that using machine learning produces significant improvements in repeatability, yielding a detector that is both very fast and very high quality.
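The segment-test criterion behind the heuristic can be sketched directly. A pixel p is a corner if at least n contiguous pixels on the 16-pixel Bresenham circle of radius 3 around it are all brighter than I_p + t or all darker than I_p - t. This plain-loop version is for clarity only. The paper's point is that machine learning turns this test into a much faster detector, which is not reproduced here. The default threshold `t` and arc length `n` are illustrative.

```python
from typing import List, Tuple

# Offsets (dx, dy) of the 16 pixels on a radius-3 Bresenham circle, in order.
CIRCLE: List[Tuple[int, int]] = [
    (0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
    (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3),
]


def is_corner(img: List[List[int]], x: int, y: int, t: int = 20, n: int = 9) -> bool:
    """Segment test at (x, y); img is indexed img[row][col] and needs a 3-pixel border."""
    ip = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1: look for brighter arcs, -1: darker arcs
        run = 0
        for v in ring + ring:  # doubled so arcs can wrap around the circle
            if sign * (v - ip) > t:
                run += 1
                if run >= n:
                    return True
            else:
                run = 0
    return False
```

A larger n makes the test stricter, because it demands a longer unbroken arc. On an image that is bright only to the right of the candidate pixel, the arc has 7 pixels. The test passes for n = 7 but not for n = 9.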
Submitted 14 October, 2008;
originally announced October 2008.