Showing 1–18 of 18 results for author: Kolkin, N

  1. arXiv:2502.01639  [pdf, other]

    cs.CV cs.GR cs.LG

    SliderSpace: Decomposing the Visual Capabilities of Diffusion Models

    Authors: Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nick Kolkin

    Abstract: We present SliderSpace, a framework for automatically decomposing the visual capabilities of diffusion models into controllable and human-understandable directions. Unlike existing control methods that require a user to specify attributes for each edit direction individually, SliderSpace discovers multiple interpretable and diverse directions simultaneously from a single text prompt. Each directio…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: Project Website: https://sliderspace.baulab.info

  2. arXiv:2408.08332  [pdf, other]

    cs.CV cs.LG

    TurboEdit: Instant text-based image editing

    Authors: Zongze Wu, Nicholas Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman

    Abstract: We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder-based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing for correction of the next reconstruction towards the input image. We demonstrate that disent…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2024. Project page: https://betterze.github.io/TurboEdit/

  3. arXiv:2405.12978  [pdf, other]

    cs.CV

    Personalized Residuals for Concept-Driven Text-to-Image Generation

    Authors: Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz

    Abstract: We present personalized residuals and localized attention-guided sampling for efficient concept-driven generation using text-to-image diffusion models. Our method first represents concepts by freezing the weights of a pretrained text-conditioned diffusion model and learning low-rank residuals for a small subset of the model's layers. The residual-based approach then directly enables application of…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: CVPR 2024. Project page at https://cusuh.github.io/personalized-residuals

  4. arXiv:2311.17137  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Generative Models: What Do They Know? Do They Know Things? Let's Find Out!

    Authors: Xiaodan Du, Nicholas Kolkin, Greg Shakhnarovich, Anand Bhattad

    Abstract: Generative models excel at mimicking real scenes, suggesting they might inherently encode important intrinsic scene properties. In this paper, we aim to explore the following key questions: (1) What intrinsic knowledge do generative models like GANs, Autoregressive models, and Diffusion models encode? (2) Can we establish a general framework to recover intrinsic representations from these models,…

    Submitted 16 October, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: https://intrinsic-lora.github.io/

  5. arXiv:2311.04315  [pdf, other]

    cs.CV

    A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization

    Authors: Xingzhe He, Zhiwen Cao, Nicholas Kolkin, Lantao Yu, Kun Wan, Helge Rhodin, Ratheesh Kalarot

    Abstract: Large text-to-image models have revolutionized the ability to generate imagery using natural language. However, particularly unique or personal visual concepts, such as pets and furniture, will not be captured by the original model. This has led to interest in how to personalize a text-to-image model. Despite significant progress, this task remains a formidable challenge, particularly in preservin…

    Submitted 6 November, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: WACV 2025

  6. arXiv:2307.04157  [pdf, other]

    cs.CV

    DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer

    Authors: Dan Ruta, Gemma Canet Tarrés, Andrew Gilbert, Eli Shechtman, Nicholas Kolkin, John Collomosse

    Abstract: Neural Style Transfer (NST) is the field of study applying neural techniques to modify the artistic appearance of a content image to match the style of a reference style image. Traditionally, NST methods have focused on texture-based image edits, affecting mostly low level information and keeping most image structures the same. However, style-based deformation of the content is desirable for some…

    Submitted 11 July, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

  7. arXiv:2304.05139  [pdf, other]

    cs.CV cs.LG

    NeAT: Neural Artistic Tracing for Beautiful Style Transfer

    Authors: Dan Ruta, Andrew Gilbert, John Collomosse, Eli Shechtman, Nicholas Kolkin

    Abstract: Style transfer is the task of reproducing the semantic contents of a source image in the artistic style of a second target image. In this paper, we present NeAT, a new state-of-the-art feed-forward style transfer method. We re-formulate feed-forward style transfer as image editing, rather than image generation, resulting in a model which improves over the state-of-the-art in both preserving the so…

    Submitted 11 April, 2023; originally announced April 2023.

  8. arXiv:2209.03953  [pdf, other]

    cs.CV cs.LG

    Text-Free Learning of a Natural Language Interface for Pretrained Face Generators

    Authors: Xiaodan Du, Raymond A. Yeh, Nicholas Kolkin, Eli Shechtman, Greg Shakhnarovich

    Abstract: We propose Fast text2StyleGAN, a natural language interface that adapts pre-trained GANs for text-guided human face synthesis. Leveraging the recent advances in Contrastive Language-Image Pre-training (CLIP), no text data is required during training. Fast text2StyleGAN is formulated as a conditional variational autoencoder (CVAE) that provides extra control and diversity to the generated images at…

    Submitted 8 September, 2022; originally announced September 2022.

  9. arXiv:2206.06360  [pdf, other]

    cs.CV

    ARF: Artistic Radiance Fields

    Authors: Kai Zhang, Nick Kolkin, Sai Bi, Fujun Luan, Zexiang Xu, Eli Shechtman, Noah Snavely

    Abstract: We present a method for transferring the artistic features of an arbitrary style image to a 3D scene. Previous methods that perform 3D stylization on point clouds or meshes are sensitive to geometric reconstruction errors for complex real-world scenes. Instead, we propose to stylize the more robust radiance field representation. We find that the commonly used Gram matrix-based loss tends to produc…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Project page: https://www.cs.cornell.edu/projects/arf/

  10. arXiv:2203.13215  [pdf, other]

    cs.CV cs.GR

    Neural Neighbor Style Transfer

    Authors: Nicholas Kolkin, Michal Kucera, Sylvain Paris, Daniel Sykora, Eli Shechtman, Greg Shakhnarovich

    Abstract: We propose Neural Neighbor Style Transfer (NNST), a pipeline that offers state-of-the-art quality, generalization, and competitive efficiency for artistic style transfer. Our approach is based on explicitly replacing neural features extracted from the content input (to be stylized) with those from a style exemplar, then synthesizing the final output based on these rearranged features. While the sp…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: Code for NNST-Opt available at https://github.com/nkolkin13/NeuralNeighborStyleTransfer

  11. arXiv:2110.06443  [pdf, other]

    cs.CV cs.AI

    Harnessing the Conditioning Sensorium for Improved Image Translation

    Authors: Cooper Nederhood, Nicholas Kolkin, Deqing Fu, Jason Salavon

    Abstract: Multi-modal domain translation typically refers to synthesizing a novel image that inherits certain localized attributes from a 'content' image (e.g. layout, semantics, or geometry), and inherits everything else (e.g. texture, lighting, sometimes even semantics) from a 'style' image. The dominant approach to this task is attempting to learn disentangled 'content' and 'style' representations from s…

    Submitted 12 October, 2021; originally announced October 2021.

  12. arXiv:2108.12847  [pdf, other]

    cs.CV cs.GR

    Non-Parametric Neural Style Transfer

    Authors: Nicholas Kolkin

    Abstract: It seems easy to imagine a photograph of the Eiffel Tower painted in the style of Vincent van Gogh's 'The Starry Night', but upon introspection it is difficult to precisely define what this would entail. What visual elements must an image contain to represent the 'content' of the Eiffel Tower? What visual elements of 'The Starry Night' are caused by van Gogh's 'style' rather than his decision to d…

    Submitted 29 August, 2021; originally announced August 2021.

    Comments: PhD thesis

  13. arXiv:2003.11038  [pdf, other]

    cs.CV cs.GR cs.LG

    Deformable Style Transfer

    Authors: Sunnie S. Y. Kim, Nicholas Kolkin, Jason Salavon, Gregory Shakhnarovich

    Abstract: Both geometry and texture are fundamental aspects of visual style. Existing style transfer methods, however, primarily focus on texture, almost entirely ignoring geometry. We propose deformable style transfer (DST), an optimization-based approach that jointly stylizes the texture and geometry of a content image to better match a style image. Unlike previous geometry-aware stylization methods, our…

    Submitted 19 July, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

    Comments: ECCV 2020 (21 pages, 11 figures including the supplementary material)

  14. arXiv:1908.00463  [pdf, other]

    cs.CV

    DIODE: A Dense Indoor and Outdoor DEpth Dataset

    Authors: Igor Vasiljevic, Nick Kolkin, Shanyi Zhang, Ruotian Luo, Haochen Wang, Falcon Z. Dai, Andrea F. Daniele, Mohammadreza Mostajabi, Steven Basart, Matthew R. Walter, Gregory Shakhnarovich

    Abstract: We introduce DIODE, a dataset that contains thousands of diverse high resolution color images with accurate, dense, long-range depth measurements. DIODE (Dense Indoor/Outdoor DEpth) is the first public dataset to include RGBD images of indoor and outdoor scenes obtained with one sensor suite. This is in contrast to existing datasets that focus on just one domain/scene type and employ different sen…

    Submitted 29 August, 2019; v1 submitted 1 August, 2019; originally announced August 2019.

  15. arXiv:1904.12785  [pdf, other]

    cs.CV

    Style Transfer by Relaxed Optimal Transport and Self-Similarity

    Authors: Nicholas Kolkin, Jason Salavon, Greg Shakhnarovich

    Abstract: Style transfer algorithms strive to render the content of one image using the style of another. We propose Style Transfer by Relaxed Optimal Transport and Self-Similarity (STROTSS), a new optimization-based style transfer algorithm. We extend our method to allow user-specified point-to-point or region-to-region control over visual similarity between the style image and the output. Such guidance ca…

    Submitted 9 October, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

    Comments: To Appear CVPR 2019, Webdemo Available at http://style.ttic.edu

  16. arXiv:1708.02212  [pdf, other]

    cs.CV

    Training Deep Networks to be Spatially Sensitive

    Authors: Nicholas Kolkin, Gregory Shakhnarovich, Eli Shechtman

    Abstract: In many computer vision tasks, for example saliency prediction or semantic segmentation, the desired output is a foreground map that predicts pixels where some criterion is satisfied. Despite the inherently spatial nature of this task, commonly used learning objectives do not incorporate the spatial relationships between misclassified pixels and the underlying ground truth. The Weighted F-measure, a…

    Submitted 7 August, 2017; originally announced August 2017.

    Comments: ICCV 2017

  17. arXiv:1612.01991  [pdf, other]

    cs.CV

    Diverse Sampling for Self-Supervised Learning of Semantic Segmentation

    Authors: Mohammadreza Mostajabi, Nicholas Kolkin, Gregory Shakhnarovich

    Abstract: We propose an approach for learning category-level semantic segmentation purely from image-level classification tags indicating presence of categories. It exploits localization cues that emerge from training classification-tasked convolutional networks, to drive a "self-supervision" process that automatically labels a sparse, diverse training set of points likely to belong to classes of interest.…

    Submitted 6 December, 2016; originally announced December 2016.

  18. arXiv:1412.1740  [pdf, other]

    stat.ML cs.CV cs.LG

    Image Data Compression for Covariance and Histogram Descriptors

    Authors: Matt J. Kusner, Nicholas I. Kolkin, Stephen Tyree, Kilian Q. Weinberger

    Abstract: Covariance and histogram image descriptors provide an effective way to capture information about images. Both excel when used in combination with special purpose distance metrics. For covariance descriptors these metrics measure the distance along the non-Euclidean Riemannian manifold of symmetric positive definite matrices. For histogram descriptors the Earth Mover's distance measures the optimal…

    Submitted 23 May, 2015; v1 submitted 4 December, 2014; originally announced December 2014.