
Segmentation-Free Guidance for Text-to-Image Diffusion Models

Authors: Kambiz Azarian, Debasmit Das, Qiqi Hou, Fatih Porikli

Abstract: We introduce segmentation-free guidance, a novel method designed for text-to-image diffusion models like Stable Diffusion. Our method does not require retraining of the diffusion model. At no additional compute cost, it uses the diffusion model itself as an implied segmentation network, hence named segmentation-free guidance, to dynamically adjust the negative prompt for each patch of the generated image, based on the patch's relevance to concepts in the prompt. We evaluate segmentation-free guidance both objectively, using FID, CLIP, IS, and PickScore, and subjectively, through human evaluators. For the subjective evaluation, we also propose a methodology for subsampling the prompts in a dataset like MS COCO-30K to keep the number of human evaluations manageable while ensuring that the selected subset is both representative in terms of content and fair in terms of model performance. The results demonstrate the superiority of our segmentation-free guidance to the widely used classifier-free method. Human evaluators preferred segmentation-free guidance over classifier-free 60% to 19%, with 18% of occasions showing a strong preference. Additionally, PickScore win-rate, a recently proposed metric mimicking human preference, also indicates a preference for our method over classifier-free.

Submitted 3 June, 2024; originally announced July 2024.

arXiv:2302.14611

TransAdapt: A Transformative Framework for Online Test Time Adaptive Semantic Segmentation

Authors: Debasmit Das, Shubhankar Borse, Hyojin Park, Kambiz Azarian, Hong Cai, Risheek Garrepalli, Fatih Porikli

Abstract: Test-time adaptive (TTA) semantic segmentation adapts a source pre-trained image semantic segmentation model to unlabeled batches of target domain test images, which differs from real-world settings, where samples arrive one by one in an online fashion. To tackle online settings, we propose TransAdapt, a framework that uses a transformer and input transformations to improve segmentation performance. Specifically, we pre-train a transformer-based module on a segmentation network that transforms unsupervised segmentation output to a more reliable supervised output, without requiring test-time online training. To also facilitate test-time adaptation, we propose an unsupervised loss based on the transformed input that enforces the model to be invariant and equivariant to photometric and geometric perturbations, respectively. Overall, our framework produces higher quality segmentation masks with up to 17.6% and 2.8% mIOU improvement over no-adaptation and competitive baselines, respectively.

Submitted 23 February, 2023; originally announced February 2023.

Comments: ICASSP 2023

arXiv:2212.06242

Test-time Adaptation vs. Training-time Generalization: A Case Study in Human Instance Segmentation using Keypoints Estimation

Authors: Kambiz Azarian, Debasmit Das, Hyojin Park, Fatih Porikli

Abstract: We consider the problem of improving the human instance segmentation mask quality for a given test image using keypoints estimation. We compare two alternative approaches. The first approach is a test-time adaptation (TTA) method, where we allow test-time modification of the segmentation network's weights using a single unlabeled test image. In this approach, we do not assume test-time access to the labeled source dataset. More specifically, our TTA method consists of using the keypoints estimates as pseudo labels and backpropagating them to adjust the backbone weights. The second approach is a training-time generalization (TTG) method, where we permit offline access to the labeled source dataset but not the test-time modification of weights. Furthermore, we do not assume the availability of any images from or knowledge about the target domain. Our TTG method consists of augmenting the backbone features with those generated by the keypoints head and feeding the aggregate vector to the mask head. Through a comprehensive set of ablations, we evaluate both approaches and identify several factors limiting the TTA gains. In particular, we show that in the absence of a significant domain shift, TTA may hurt and TTG shows only a small gain in performance, whereas for a large domain shift, TTA gains are smaller and dependent on the heuristics used, while TTG gains are larger and robust to architectural choices.

Submitted 12 December, 2022; originally announced December 2022.

arXiv:2104.11413

Unsupervised Information Obfuscation for Split Inference of Neural Networks

Authors: Mohammad Samragh, Hossein Hosseini, Aleksei Triastcyn, Kambiz Azarian, Joseph Soriaga, Farinaz Koushanfar

Abstract: Splitting network computations between the edge device and a server enables low edge-compute inference of neural networks but might expose sensitive information about the test query to the server. To address this problem, existing techniques train the model to minimize information leakage for a given set of sensitive attributes. In practice, however, the test queries might contain attributes that are not foreseen during training. We propose instead an unsupervised obfuscation method to discard the information irrelevant to the main task. We formulate the problem via an information theoretical framework and derive an analytical solution for a given distortion to the model output. In our method, the edge device runs the model up to a split layer determined based on its computational capacity. It then obfuscates the obtained feature vector based on the first layer of the server model by removing the components in the null space as well as the low-energy components of the remaining signal. Our experimental results show that our method outperforms existing techniques in removing the information of the irrelevant attributes and maintaining the accuracy on the target label. We also show that our method reduces the communication cost and incurs only a small computational overhead.

Submitted 22 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

arXiv:2103.10629

Cascade Weight Shedding in Deep Neural Networks: Benefits and Pitfalls for Network Pruning

Authors: Kambiz Azarian, Fatih Porikli

Abstract: We report, for the first time, on the cascade weight shedding phenomenon in deep neural networks where in response to pruning a small percentage of a network's weights, a large percentage of the remaining is shed over a few epochs during the ensuing fine-tuning phase. We show that cascade weight shedding, when present, can significantly improve the performance of an otherwise sub-optimal scheme such as random pruning. This explains why some pruning methods may perform well under certain circumstances, but poorly under others, e.g., ResNet50 vs. MobileNetV3. We provide insight into why the global magnitude-based pruning, i.e., GMP, despite its simplicity, provides a competitive performance for a wide range of scenarios. We also demonstrate cascade weight shedding's potential for improving GMP's accuracy and reducing its computational complexity. In doing so, we highlight the importance of pruning and learning-rate schedules. We shed light on weight and learning-rate rewinding methods of re-training, showing their possible connections to the cascade weight shedding and the reason for their advantage over fine-tuning. We also investigate cascade weight shedding's effect on the set of kept weights, and its implications for semi-structured pruning. Finally, we give directions for future research.

Submitted 19 March, 2021; originally announced March 2021.

arXiv:2003.00075

Learned Threshold Pruning

Authors: Kambiz Azarian, Yash Bhalgat, Jinwon Lee, Tijmen Blankevoort

Abstract: This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they are set as input. Making thresholds trainable also makes LTP computationally efficient, hence scalable to deeper networks. For example, it takes $30$ epochs for LTP to prune ResNet50 on ImageNet by a factor of $9.1$. This is in contrast to other methods that search for per-layer thresholds via a computationally intensive iterative pruning and fine-tuning process. Additionally, with a novel differentiable $L_0$ regularization, LTP is able to operate effectively on architectures with batch-normalization. This is important since $L_1$ and $L_2$ penalties lose their regularizing effect in networks with batch-normalization. Finally, LTP generates a trail of progressively sparser networks from which the desired pruned network can be picked based on sparsity and performance requirements. These features allow LTP to achieve competitive compression rates on ImageNet networks such as AlexNet ($26.4\times$ compression with $79.1\%$ Top-5 accuracy) and ResNet50 ($9.1\times$ compression with $92.0\%$ Top-5 accuracy). We also show that LTP effectively prunes modern compact architectures, such as EfficientNet, MobileNetV2 and MixNet.

Submitted 18 March, 2021; v1 submitted 28 February, 2020; originally announced March 2020.

arXiv:cs/0701053

A Case For Amplify-Forward Relaying in the Block-Fading Multi-Access Channel

Authors: Deqiang Chen, Kambiz Azarian, J. Nicholas Laneman

Abstract: This paper demonstrates the significant gains that multi-access users can achieve from sharing a single amplify-forward relay in slow fading environments. The proposed protocol, namely the multi-access relay amplify-forward, allows for a low-complexity relay and achieves the optimal diversity-multiplexing trade-off at high multiplexing gains. Analysis of the protocol reveals that it uniformly dominates the compress-forward strategy and further outperforms the dynamic decode-forward protocol at high multiplexing gains. An interesting feature of the proposed protocol is that, at high multiplexing gains, it resembles a multiple-input single-output system, and at low multiplexing gains, it provides each user with the same diversity-multiplexing trade-off as if there is no contention for the relay from the other users.

Submitted 8 January, 2007; originally announced January 2007.

arXiv:cs/0602049

Cooperative Lattice Coding and Decoding

Authors: Arul Murugan, Kambiz Azarian, Hesham El Gamal

Abstract: A novel lattice coding framework is proposed for outage-limited cooperative channels. This framework provides practical implementations for the optimal cooperation protocols proposed by Azarian et al. In particular, for the relay channel we implement a variant of the dynamic decode and forward protocol, which uses orthogonal constellations to reduce the channel seen by the destination to a single-input single-output time-selective one, while inheriting the same diversity-multiplexing tradeoff. This simplification allows for building the receiver using traditional belief propagation or tree search architectures. Our framework also generalizes the coding scheme of Yang and Belfiore in the context of amplify and forward cooperation. For the cooperative multiple access channel, a tree coding approach, matched to the optimal linear cooperation protocol of Azarian et al., is developed. For this scenario, the MMSE-DFE Fano decoder is shown to enjoy an excellent tradeoff between performance and complexity. Finally, the utility of the proposed schemes is established via a comprehensive simulation study.

Submitted 15 February, 2006; v1 submitted 13 February, 2006; originally announced February 2006.

Comments: 25 pages, 8 figures

ACM Class: E.4; H.1.1

arXiv:cs/0602048

On the Optimality of the ARQ-DDF Protocol

Authors: Kambiz Azarian, Hesham El Gamal, Philip Schniter

Abstract: The performance of the automatic repeat request-dynamic decode and forward (ARQ-DDF) cooperation protocol is analyzed in two distinct scenarios. The first scenario is the multiple access relay (MAR) channel where a single relay is dedicated to simultaneously help several multiple access users. For this setup, it is shown that the ARQ-DDF protocol achieves the optimal diversity multiplexing tradeoff (DMT) of the channel. The second scenario is the cooperative vector multiple access (CVMA) channel where the users cooperate in delivering their messages to a destination equipped with multiple receiving antennas. For this setup, we develop a new variant of the ARQ-DDF protocol where the users are purposefully instructed not to cooperate in the first round of transmission. Lower and upper bounds on the achievable DMT are then derived. These bounds are shown to converge to the optimal tradeoff as the number of transmission rounds increases.

Submitted 13 February, 2006; originally announced February 2006.

Comments: 26 pages, 2 figures

ACM Class: E.4; H.1.1

arXiv:cs/0509021

The Throughput-Reliability Tradeoff in MIMO Channels

Authors: Kambiz Azarian, Hesham El Gamal

Abstract: In this paper, an outage limited MIMO channel is considered. We build on Zheng and Tse's elegant formulation of the diversity-multiplexing tradeoff to develop a better understanding of the asymptotic relationship between the probability of error, transmission rate, and signal-to-noise ratio. In particular, we identify the limitation imposed by the multiplexing gain notion and develop a new formulation for the throughput-reliability tradeoff that avoids this limitation. The new characterization is then used to elucidate the asymptotic trends exhibited by the outage probability curves of MIMO channels.

Submitted 7 September, 2005; originally announced September 2005.

Comments: 30 pages, 15 figures, submitted to IEEE Transactions on Information Theory

ACM Class: E.4

arXiv:cs/0506018

On the Achievable Diversity-Multiplexing Tradeoffs in Half-Duplex Cooperative Channels

Authors: Kambiz Azarian, Hesham El Gamal, Philip Schniter

Abstract: In this paper, we propose novel cooperative transmission protocols for delay limited coherent fading channels consisting of N (half-duplex and single-antenna) partners and one cell site. In our work, we differentiate between the relay, cooperative broadcast (down-link), and cooperative multiple-access (up-link) channels. For the relay channel, we investigate two classes of cooperation schemes, namely, Amplify and Forward (AF) protocols and Decode and Forward (DF) protocols. For the first class, we establish an upper bound on the achievable diversity-multiplexing tradeoff with a single relay. We then construct a new AF protocol that achieves this upper bound. The proposed algorithm is then extended to the general case with N-1 relays where it is shown to outperform the space-time coded protocol of Laneman and Wornell without requiring decoding/encoding at the relays. For the class of DF protocols, we develop a dynamic decode and forward (DDF) protocol that achieves the optimal tradeoff for multiplexing gains 0 < r < 1/N. Furthermore, with a single relay, the DDF protocol is shown to dominate the class of AF protocols for all multiplexing gains. The superiority of the DDF protocol is shown to be more significant in the cooperative broadcast channel. The situation is reversed in the cooperative multiple-access channel where we propose a new AF protocol that achieves the optimal tradeoff for all multiplexing gains. A distinguishing feature of the proposed protocols in the three scenarios is that they do not rely on orthogonal subspaces, allowing for a more efficient use of resources. In fact, using our results one can argue that the sub-optimality of previously proposed protocols stems from their use of orthogonal subspaces rather than the half-duplex constraint.

Submitted 6 June, 2005; originally announced June 2005.

Comments: 40 pages, 12 figures, IEEE Transactions on Information Theory

ACM Class: E.4

Showing 1–11 of 11 results for author: Azarian, K