-
Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback
Authors:
Nina Konovalova,
Maxim Nikolaev,
Andrey Kuznetsov,
Aibek Alanov
Abstract:
Despite significant progress in text-to-image diffusion models, achieving precise spatial control over generated outputs remains challenging. ControlNet addresses this by introducing an auxiliary conditioning module, while ControlNet++ further refines alignment through a cycle consistency loss applied only to the final denoising steps. However, this approach neglects intermediate generation stages, limiting its effectiveness. We propose InnerControl, a training strategy that enforces spatial consistency across all diffusion steps. Our method trains lightweight convolutional probes to reconstruct input control signals (e.g., edges, depth) from intermediate UNet features at every denoising step. These probes efficiently extract signals even from highly noisy latents, enabling pseudo ground truth controls for training. By minimizing the discrepancy between predicted and target conditions throughout the entire diffusion process, our alignment loss improves both control fidelity and generation quality. Combined with established techniques like ControlNet++, InnerControl achieves state-of-the-art performance across diverse conditioning methods (e.g., edges, depth).
Submitted 3 July, 2025;
originally announced July 2025.
-
MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models
Authors:
Kamil Garifullin,
Maxim Nikolaev,
Andrey Kuznetsov,
Aibek Alanov
Abstract:
Manipulating the material appearance of objects in images is critical for applications like augmented reality, virtual prototyping, and digital content creation. We present MaterialFusion, a novel framework for high-quality material transfer that allows users to adjust the degree of material application, achieving an optimal balance between new material properties and the object's original features. MaterialFusion seamlessly integrates the modified object into the scene by maintaining background consistency and mitigating boundary artifacts. To thoroughly evaluate our approach, we have compiled a dataset of real-world material transfer examples and conducted complex comparative analyses. Through comprehensive quantitative evaluations and user studies, we demonstrate that MaterialFusion significantly outperforms existing methods in terms of quality, user control, and background preservation. Code is available at https://github.com/ControlGenAI/MaterialFusion.
Submitted 12 February, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
Greedy Conjecture for the Shortest Common Superstring Problem and its Strengthenings
Authors:
Maksim Nikolaev
Abstract:
In the Shortest Common Superstring problem, one needs to find the shortest superstring for a set of strings. This problem is APX-hard, and many approximation algorithms have been proposed, with the current best approximation factor of 2.466. Whereas these algorithms are technically involved, the Greedy Conjecture has remained unsolved for more than thirty years; it states that the Greedy Algorithm "take two strings with the maximum overlap; merge them; repeat" is a 2-approximation.
This conjecture is still open, and one way to approach it is to consider its stronger version, which may make the proof easier due to the stronger premise or provide insights from its refutation. In this paper, we propose two directions to strengthen the conjecture. First, we introduce the Locally Greedy Algorithm (LGA), that selects a pair of strings not with the largest overlap but with the \emph{locally largest} overlap, that is, the largest among all pairs of strings with the same first or second string. Second, we change the quality metric: instead of length, we evaluate the solution by the number of occurrences of an arbitrary symbol.
Despite the double strengthening, we prove that LGA is a \emph{uniform} 4-approximation, that is, it always constructs a superstring with no more than four times as many occurrences of an arbitrary symbol as any other superstring. At the same time, we discover the limitations of the greedy heuristic: we show that the approximation factor of LGA is at least 3, and the uniform approximation factor of the Greedy Algorithm is at least 2.5. These results show that if the Greedy Conjecture is true, it is not because the Greedy Algorithm is locally greedy or is a uniform 2-approximation.
Submitted 29 July, 2024;
originally announced July 2024.
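The greedy rule quoted in the abstract above ("take two strings with the maximum overlap; merge them; repeat") can be sketched in a few lines. This is only an illustrative sketch, not the paper's code: the function names `overlap` and `greedy_superstring` are hypothetical, ties are broken by taking the first maximal pair encountered, and inputs that are substrings of other inputs are dropped first (a common preprocessing assumption that the abstract does not spell out).

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Greedy heuristic: repeatedly merge a pair with maximum overlap.

    Assumptions (not from the source): duplicates and strings contained
    in other strings are removed first; ties go to the first pair found.
    """
    items = list(dict.fromkeys(strings))  # drop duplicates, keep order
    items = [s for s in items if not any(s != t and s in t for t in items)]
    while len(items) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(items)), 2):
            k = overlap(items[i], items[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        # Merge: keep items[best_i] whole, append the non-overlapping tail.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items)
                 if idx not in (best_i, best_j)] + [merged]
    return items[0] if items else ""


if __name__ == "__main__":
    print(greedy_superstring(["abc", "bcd", "cde"]))  # abcde
```

The quadratic pair scan is chosen for readability; it is not meant to be efficient on large inputs.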
-
HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach
Authors:
Maxim Nikolaev,
Mikhail Kuznetsov,
Dmitry Vetrov,
Aibek Alanov
Abstract:
Our paper addresses the complex task of transferring a hairstyle from a reference image to an input photo for virtual hair try-on. This task is challenging due to the need to adapt to various photo poses, the sensitivity of hairstyles, and the lack of objective metrics. The current state-of-the-art hairstyle transfer methods use an optimization process for different parts of the approach, making them inexcusably slow. At the same time, faster encoder-based models are of very low quality because they either operate in StyleGAN's W+ space or use other low-dimensional image generators. Additionally, both approaches have a problem with hairstyle transfer when the source pose is very different from the target pose, because they either don't consider the pose at all or deal with it inefficiently. In our paper, we present the HairFast model, which uniquely solves these problems and achieves high resolution, near real-time performance, and superior reconstruction compared to optimization-based methods. Our solution includes a new architecture operating in the FS latent space of StyleGAN, an enhanced inpainting approach, and improved encoders for better alignment, color transfer, and a new encoder for post-processing. The effectiveness of our approach is demonstrated on realism metrics after random hairstyle transfer and reconstruction when the original hairstyle is transferred. In the most difficult scenario of transferring both shape and color of a hairstyle from different images, our method performs in less than a second on the Nvidia V100. Our code is available at https://github.com/AIRI-Institute/HairFastGAN.
Submitted 25 May, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
All instantiations of the greedy algorithm for the shortest superstring problem are equivalent
Authors:
Maksim Nikolaev
Abstract:
In the Shortest Common Superstring problem (SCS), one needs to find the shortest superstring for a set of strings. While SCS is NP-hard and MAX-SNP-hard, the Greedy Algorithm "choose two strings with the largest overlap; merge them; repeat" achieves a constant factor approximation that is known to be at most 3.5 and conjectured to be equal to 2. The Greedy Algorithm is not deterministic, so its instantiations with different tie-breaking rules may have different approximation factors. In this paper, we show that it is not the case: all factors are equal. To prove this, we show how to transform a set of strings so that all overlaps are different whereas their ratios stay roughly the same.
We also reveal connections between the original version of SCS and the following one: find a superstring minimizing the number of occurrences of a given symbol. It turns out that the latter problem is equivalent to the original one.
Submitted 10 February, 2021;
originally announced February 2021.
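To see why tie-breaking matters for the Greedy Algorithm described in the abstract above, the following sketch enumerates every superstring reachable under all tie-breaking choices for a tiny input. It is an illustration only: `all_greedy_outcomes` is a hypothetical helper, it assumes no input string is a substring of another, and it runs in exponential time, so it is suitable only for toy examples. It says nothing about approximation factors, which is what the paper actually analyzes.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def all_greedy_outcomes(strings):
    """Superstrings the greedy rule can produce over all tie-breaks.

    Assumption (not from the source): the input is substring-free.
    """
    start = tuple(sorted(set(strings)))
    results, seen, stack = set(), {start}, [start]
    while stack:
        items = stack.pop()
        if len(items) <= 1:
            results.add(items[0] if items else "")
            continue
        pairs = [(overlap(a, b), a, b)
                 for a in items for b in items if a != b]
        top = max(k for k, _, _ in pairs)
        # Branch on every pair that attains the maximum overlap.
        for k, a, b in pairs:
            if k == top:
                rest = [s for s in items if s not in (a, b)]
                nxt = tuple(sorted(rest + [a + b[k:]]))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return results


if __name__ == "__main__":
    outcomes = all_greedy_outcomes(["ab", "ba"])
    for s in sorted(outcomes):
        # Same length, different number of occurrences of the symbol "a".
        print(s, len(s), s.count("a"))
```

For the input `["ab", "ba"]` both outcomes have length 3, yet they contain different numbers of the symbol `a`, which gives a small feel for the symbol-counting variant of the problem mentioned in the abstract.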
-
Collapsing Superstring Conjecture
Authors:
Alexander Golovnev,
Alexander S. Kulikov,
Alexander Logunov,
Ivan Mihajlin,
Maksim Nikolaev
Abstract:
In the Shortest Common Superstring (SCS) problem, one is given a collection of strings, and needs to find a shortest string containing each of them as a substring. SCS admits a $2\frac{11}{23}$-approximation in polynomial time (Mucha, SODA'13). While this algorithm and its analysis are technically involved, the 30-year-old Greedy Conjecture claims that the trivial and efficient Greedy Algorithm gives a 2-approximation for SCS.
We develop a graph-theoretic framework for studying approximation algorithms for SCS. The framework is reminiscent of the classical 2-approximation for Traveling Salesman: take two copies of an optimal solution, apply a trivial edge-collapsing procedure, and get an approximate solution. In this framework, we observe two surprising properties of SCS solutions, and we conjecture that they hold for all input instances. The first conjecture, which we call the Collapsing Superstring Conjecture, claims that there is an elementary way to transform any solution repeated twice into the same graph $G$. This conjecture would give an elementary 2-approximation algorithm for SCS. The second conjecture claims that not only is the resulting graph $G$ the same for all solutions, but $G$ can also be computed by an elementary greedy procedure called the Greedy Hierarchical Algorithm.
While the second conjecture clearly implies the first one, perhaps surprisingly we prove their equivalence. We support these equivalent conjectures by giving a proof for the special case where all input strings have length at most 3. We prove that the standard Greedy Conjecture implies the Greedy Hierarchical Conjecture, while the latter is sufficient for an efficient greedy 2-approximation of SCS. Apart from its (conjectured) good approximation ratio, the Greedy Hierarchical Algorithm provably finds a 3.5-approximation.
Submitted 3 June, 2020; v1 submitted 23 September, 2018;
originally announced September 2018.