Search | arXiv e-print repository

myEye2Wheeler: A Two-Wheeler Indian Driver Real-World Eye-Tracking Dataset

Authors: Bhaiya Vaibhaw Kumar, Deepti Rawat, Tanvi Kandalla, Aarnav Nagariya, Kavita Vemuri

Abstract: This paper presents the myEye2Wheeler dataset, a unique resource of real-world gaze behaviour of two-wheeler drivers navigating complex Indian traffic. Most existing datasets come from four-wheeler drivers on well-planned roads with homogeneous traffic. Our dataset offers a critical lens into the unique visual attention patterns of Indian two-wheeler drivers and insights into their decision-making. The analysis demonstrates that existing saliency models, such as TASED-Net, perform less effectively on the myEye2Wheeler dataset than on the European four-wheeler eye-tracking dataset DR(Eye)VE, highlighting the need for models specifically tailored to these traffic conditions. By introducing the dataset, we not only fill a significant gap in two-wheeler driver behaviour research in India but also emphasise the critical need for developing context-specific saliency models. The larger aim is to improve road safety for two-wheeler users and to improve lane planning in support of this cost-effective mode of transport.

Submitted 18 February, 2025; originally announced February 2025.

arXiv:2305.01884

Class adaptive threshold and negative class guided noisy annotation robust Facial Expression Recognition

Authors: Darshan Gera, Badveeti Naveen Siva Kumar, Bobbili Veerendra Raj Kumar, S Balasubramanian

Abstract: A major obstacle in facial expression recognition (FER) is the presence of inaccurate annotations, referred to as noisy annotations, in the datasets. These noisy annotations are inherent to the datasets because labeling depends on the subjectivity of the annotator, the clarity of the image, etc. Recent works use sample selection methods to solve this noisy-annotation problem in FER. In our work, we use a dynamic adaptive threshold to separate confident samples from non-confident ones so that learning is not hampered by non-confident samples. Instead of discarding the non-confident samples, we impose consistency in the negative classes of those samples to guide the model to learn the positive class better. Since FER datasets usually come with 7 or 8 classes, even a randomly chosen class is a correct negative class with a probability of roughly 85% or higher. By learning "which class a sample doesn't belong to", the model can better learn "which class it belongs to". We demonstrate the proposed framework's effectiveness using quantitative as well as qualitative results. Our method outperforms the baseline by a margin of 4% to 28% on RAFDB and 3.3% to 31.4% on FERPlus for various levels of synthetic label noise in these datasets.

Submitted 3 May, 2023; originally announced May 2023.

Comments: 14 pages, 9 figures
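The 85% figure in the abstract follows from counting: with K classes and exactly one correct label per sample, a uniformly random class is a valid negative class with probability (K - 1)/K. A quick check of that arithmetic for the 7- and 8-class cases (the helper name is illustrative, not from the paper):

```python
from fractions import Fraction


def random_negative_hit_rate(num_classes: int) -> Fraction:
    """Probability that a uniformly random class differs from the true label.

    Exactly one of num_classes classes is correct, so the remaining
    num_classes - 1 classes are all valid negative classes.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    return Fraction(num_classes - 1, num_classes)


for k in (7, 8):
    p = random_negative_hit_rate(k)
    print(f"{k} classes: {p} = {float(p):.1%}")
```

The rate only grows with K, so datasets with more expression classes make random negative labels even more reliable.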

arXiv:2303.09785

ABAW : Facial Expression Recognition in the wild

Authors: Darshan Gera, Badveeti Naveen Siva Kumar, Bobbili Veerendra Raj Kumar, S Balasubramanian

Abstract: The fifth Affective Behavior Analysis in-the-wild (ABAW) competition has multiple challenges, such as the Valence-Arousal Estimation Challenge, the Expression Classification Challenge, the Action Unit Detection Challenge, and the Emotional Reaction Intensity Estimation Challenge. In this paper, we deal only with the expression classification challenge, using multiple approaches: fully supervised, semi-supervised, and noisy-label learning. Compared with the baseline model, our noise-aware model performs 10.46% better, our semi-supervised model 9.38% better, and our fully supervised model 9.34% better.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 6 pages

arXiv:2208.10221

Dynamic Adaptive Threshold based Learning for Noisy Annotations Robust Facial Expression Recognition

Authors: Darshan Gera, Naveen Siva Kumar Badveeti, Bobbili Veerendra Raj Kumar, S Balasubramanian

Abstract: Real-world facial expression recognition (FER) datasets suffer from noisy annotations due to crowd-sourcing, ambiguity in expressions, the subjectivity of annotators, and inter-class similarity. Moreover, recent deep networks have a strong capacity to memorize noisy annotations, leading to corrupted feature embeddings and poor generalization. To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected based on a dynamic class-specific threshold during training. Specifically, DNFER combines supervised training on selected clean samples with unsupervised consistency training on all samples. During training, the mean posterior class probabilities of each mini-batch are used as dynamic class-specific thresholds to select the clean samples for supervised training. Unlike other methods, this threshold is independent of the noise rate and does not need any clean data. In addition, to learn from all samples, the posterior distributions of a weakly augmented image and a strongly augmented image are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on both synthetic and real noisy-annotated FER datasets such as RAFDB, FERPlus, SFEW, and AffectNet.

Submitted 22 August, 2022; originally announced August 2022.
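The abstract states that the mean posterior class probabilities of each mini-batch serve as class-specific thresholds for selecting clean samples. The sketch below encodes one literal reading of that sentence, which is an assumption rather than the paper's exact rule: the threshold for class c is the batch mean of p(c | x), and a sample is kept for supervised training when the posterior of its annotated label reaches that label's threshold. Function names and the toy batch are hypothetical, and the consistency loss on augmented views is not shown.

```python
from typing import List, Sequence


def class_thresholds(probs: Sequence[Sequence[float]]) -> List[float]:
    """Per-class threshold: mean posterior of that class over the mini-batch (assumed rule)."""
    num_samples = len(probs)
    num_classes = len(probs[0])
    return [sum(row[c] for row in probs) / num_samples for c in range(num_classes)]


def select_clean(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Indices whose posterior for the annotated label meets that label's threshold."""
    thresholds = class_thresholds(probs)
    return [i for i, (row, y) in enumerate(zip(probs, labels)) if row[y] >= thresholds[y]]


# Toy mini-batch: 4 samples, 3 classes (softmax outputs) and their annotated labels.
batch_probs = [
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.60, 0.30, 0.10],  # annotated as class 2 although the model favours class 0
    [0.20, 0.20, 0.60],
]
batch_labels = [0, 1, 2, 2]
print(class_thresholds(batch_probs))
print(select_clean(batch_probs, batch_labels))
```

Because the thresholds are recomputed from every mini-batch, they track the model's confidence as training progresses without requiring a noise-rate estimate, which matches the property the abstract emphasizes.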

arXiv:2207.09012

SS-MFAR : Semi-supervised Multi-task Facial Affect Recognition

Authors: Darshan Gera, Badveeti Naveen Siva Kumar, Bobbili Veerendra Raj Kumar, S Balasubramanian

Abstract: Automatic affect recognition has applications in many areas, such as education, gaming, software development, automotive systems, and medical care, but achieving appreciable performance on in-the-wild datasets is a non-trivial task. Although in-the-wild datasets represent real-world scenarios better than synthetic datasets, they suffer from incomplete labels. Inspired by semi-supervised learning, in this paper we present our submission to the Multi-Task-Learning Challenge at the 4th Affective Behavior Analysis in-the-wild (ABAW) 2022 Competition. The three tasks considered in this challenge are valence-arousal (VA) estimation; classification of expressions into the 6 basic expressions (anger, disgust, fear, happiness, sadness, surprise), neutral, and the 'other' category; and 12 action units (AUs), numbered AU-{1,2,4,6,7,10,12,15,23,24,25,26}. Our method, Semi-supervised Multi-task Facial Affect Recognition (SS-MFAR), uses a deep residual network with task-specific classifiers for each task, along with adaptive thresholds for each expression class and semi-supervised learning for the incomplete labels. Source code is available at https://github.com/1980x/ABAW2022DMACS.

Submitted 5 August, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: ABAW 2022 test set results added

arXiv:2007.10315

Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification

Authors: Yang Zou, Xiaodong Yang, Zhiding Yu, B. V. K. Vijaya Kumar, Jan Kautz

Abstract: Although significant progress has been made in supervised person re-identification (re-id), it remains challenging to generalize re-id models to new domains due to the huge domain gaps. Recently, there has been growing interest in using unsupervised domain adaptation to address this scalability issue. Existing methods typically conduct adaptation on a representation space that contains both id-related and id-unrelated factors, which inevitably undermines the adaptation efficacy of id-related features. In this paper, we seek to improve adaptation by purifying the representation space to be adapted. To this end, we propose a joint learning framework that disentangles id-related and id-unrelated features and enforces adaptation to work exclusively on the id-related feature space. Our model involves a disentangling module that encodes cross-domain images into a shared appearance space and two separate structure spaces, and an adaptation module that performs adversarial alignment and self-training on the shared appearance space. The two modules are co-designed to be mutually beneficial. Extensive experiments demonstrate that the proposed joint learning framework outperforms state-of-the-art methods by clear margins.

Submitted 20 July, 2020; originally announced July 2020.

Comments: ECCV 2020 (Oral)

arXiv:2005.00946

Towards Occlusion-Aware Multifocal Displays

Authors: Jen-Hao Rick Chang, Anat Levin, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan

Abstract: The human visual system uses numerous cues for depth perception, including disparity, accommodation, motion parallax and occlusion. It is incumbent upon virtual-reality displays to satisfy these cues to provide an immersive user experience. Multifocal displays, one of the classic approaches to satisfying the accommodation cue, place virtual content at multiple focal planes, each at a different depth. However, the content on focal planes close to the eye does not occlude content farther away; this degrades the occlusion cue and reduces contrast at depth discontinuities due to leakage of the defocus blur. This paper enables occlusion-aware multifocal displays using a novel ConeTilt operator that provides an additional degree of freedom: tilting the light cone emitted at each pixel of the display panel. We show that, for scenes with relatively simple occlusion configurations, tilting the light cones provides the same effect as physical occlusion. We demonstrate that ConeTilt can be easily implemented by a phase-only spatial light modulator. Using a lab prototype, we show results that demonstrate the presence of occlusion cues and the increased contrast of the display at depth edges.

Submitted 2 May, 2020; originally announced May 2020.

Comments: SIGGRAPH 2020

arXiv:1910.10369

Deep Classification Network for Monocular Depth Estimation

Authors: Azeez Oluwafemi, Yang Zou, B. V. K. Vijaya Kumar

Abstract: Monocular depth estimation is usually treated as a supervised regression problem, although it is very similar to semantic segmentation, since both are fundamentally pixel-level classification tasks. We discretized depth values using increments that increase with depth and then applied DeepLab v2, which resulted in higher accuracy. We achieved a state-of-the-art result on the KITTI dataset, outperforming existing architectures by an 8% margin.

Submitted 23 October, 2019; originally announced October 2019.
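The abstract does not say exactly how the increasing depth increments are chosen. A common choice, assumed here purely for illustration, is to space bin edges uniformly in log-depth so that bin width grows with distance; each pixel's metric depth then becomes a class index that a segmentation network can predict, and a predicted class is decoded back to a representative depth. The depth range, bin count, and function names below are hypothetical.

```python
import math
from typing import List


def log_bin_edges(d_min: float, d_max: float, num_bins: int) -> List[float]:
    """Bin edges uniform in log-depth, so each bin is wider than the one before it."""
    ratio = d_max / d_min
    return [d_min * ratio ** (i / num_bins) for i in range(num_bins + 1)]


def depth_to_class(depth: float, edges: List[float]) -> int:
    """Map a metric depth to its bin index (the per-pixel classification target)."""
    num_bins = len(edges) - 1
    depth = min(max(depth, edges[0]), edges[-1])  # clamp to the covered range
    idx = int(num_bins * math.log(depth / edges[0]) / math.log(edges[-1] / edges[0]))
    return min(idx, num_bins - 1)  # depth == d_max falls into the last bin


def class_to_depth(idx: int, edges: List[float]) -> float:
    """Decode a predicted class to depth via the geometric centre of its bin."""
    return math.sqrt(edges[idx] * edges[idx + 1])


edges = log_bin_edges(1.0, 80.0, 8)  # hypothetical range in metres and bin count
widths = [b - a for a, b in zip(edges, edges[1:])]
print([round(w, 2) for w in widths])
print(depth_to_class(5.0, edges), depth_to_class(60.0, edges))
```

Log spacing allocates finer classes to nearby depths, where absolute errors matter most, and coarser classes to distant ones; a uniform spacing would be the alternative this design choice moves away from.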

arXiv:1908.09822

Confidence Regularized Self-Training

Authors: Yang Zou, Zhiding Yu, Xiaofeng Liu, B. V. K. Vijaya Kumar, Jinsong Wang

Abstract: Recent advances in domain adaptation show that deep self-training is a powerful means for unsupervised domain adaptation. These methods often involve an iterative process of predicting on the target domain and then taking the confident predictions as pseudo-labels for retraining. However, since pseudo-labels can be noisy, self-training can place overconfident label belief on wrong classes, leading to deviated solutions with propagated errors. To address the problem, we propose a confidence regularized self-training (CRST) framework, formulated as regularized self-training. Our method treats pseudo-labels as continuous latent variables jointly optimized via alternating optimization. We propose two types of confidence regularization: label regularization (LR) and model regularization (MR). CRST-LR generates soft pseudo-labels, while CRST-MR encourages smoothness of the network output. Extensive experiments on image classification and semantic segmentation show that CRSTs outperform their non-regularized counterpart, achieving state-of-the-art performance. The code and models of this work are available at https://github.com/yzou2/CRST.

Submitted 15 July, 2020; v1 submitted 26 August, 2019; originally announced August 2019.

Comments: Accepted to ICCV 2019 (Oral)
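To make the contrast between hard and soft pseudo-labels concrete, the sketch below compares a thresholded one-hot label with a soft label obtained by raising the predicted probabilities to the power 1/alpha and renormalizing. This power form is an illustrative assumption of the kind of soft label that label regularization can produce, not a reproduction of the CRST objective; the linked repository holds the authors' implementation. Function names and numbers are hypothetical.

```python
from typing import List, Optional


def hard_pseudo_label(probs: List[float], threshold: float) -> Optional[List[float]]:
    """One-hot label on the argmax if it is confident enough, else None (sample skipped)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None
    return [1.0 if c == best else 0.0 for c in range(len(probs))]


def soft_pseudo_label(probs: List[float], alpha: float) -> List[float]:
    """Soft label proportional to p_c ** (1 / alpha); a larger alpha gives a flatter label."""
    powered = [p ** (1.0 / alpha) for p in probs]
    total = sum(powered)
    return [v / total for v in powered]


prediction = [0.55, 0.40, 0.05]
print(hard_pseudo_label(prediction, threshold=0.5))  # commits fully to class 0
print(soft_pseudo_label(prediction, alpha=2.0))      # keeps substantial mass on class 1
```

When class 0 is actually wrong, the hard label pushes the model toward full confidence in the error, while the soft label still leaves gradient signal for class 1, which is the overconfidence problem the abstract describes.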

arXiv:1908.01872

Attention Control with Metric Learning Alignment for Image Set-based Recognition

Authors: Xiaofeng Liu, Zhenhua Guo, Jane You, B. V. K Vijaya Kumar

Abstract: This paper considers the problem of image set-based face verification and identification. Unlike the traditional single-sample (an image or a video) setting, this situation assumes the availability of a heterogeneous collection of orderless images and videos. The samples can be taken at different checkpoints, from different identity documents, etc. The importance of each image is usually considered either equal or based on a quality assessment of that image, independent of the other images and/or videos in the set. How to model the relationship of orderless images within a set remains a challenge. We address this problem by formulating it as a Markov Decision Process (MDP) in a latent space. Specifically, we first propose a dependency-aware attention control (DAC) network, which uses actor-critic reinforcement learning for the attention decision of each image to exploit the correlations among the unordered images. An off-policy experience replay is introduced to speed up the learning process. Moreover, the DAC is combined with a temporal model for videos using divide-and-conquer strategies. We also introduce a pose-guided representation (PGR) scheme that can further boost performance at extreme poses. We propose a parameter-free PGR that requires no training, as well as a novel metric learning-based PGR for pose alignment that does not need pose detection at test time. Extensive evaluations on the IJB-A/B/C, YTF, and Celebrity-1000 datasets demonstrate that our method outperforms many state-of-the-art approaches on set-based as well as video-based face recognition databases.

Submitted 5 August, 2019; originally announced August 2019.

Comments: Accepted to IEEE T-IFS (Extension of ECCV 2018 paper: Dependency-aware Attention Control for Unconstrained Face Recognition with Image Sets). arXiv admin note: substantial text overlap with arXiv:1907.03030; text overlap with arXiv:1707.00130 by other authors

arXiv:1908.01174

Permutation-invariant Feature Restructuring for Correlation-aware Image Set-based Recognition

Authors: Xiaofeng Liu, Zhenhua Guo, Site Li, Lingsheng Kong, Ping Jia, Jane You, B. V. K. Kumar

Abstract: We consider the problem of comparing the similarity of image sets containing a variable quantity of unordered, heterogeneous images of varying quality. We use feature restructuring to exploit the correlations of both inner- and inter-set images. Specifically, residual self-attention can effectively restructure the features using the other features within a set to emphasize the discriminative images and eliminate redundancy. Then, a sparse/collaborative learning-based dependency-guided representation scheme reconstructs the probe features conditioned on the gallery features in order to adaptively align the two sets. This makes our framework compatible with both verification and open-set identification. We show that the parametric self-attention network and the non-parametric dictionary learning can be trained end-to-end by a unified alternating optimization scheme, and that the full framework is permutation-invariant. In our experiments, our method achieves top performance on competitive image set/video-based face recognition and person re-identification benchmarks.

Submitted 3 August, 2019; originally announced August 2019.

Comments: Accepted to ICCV 2019
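Why residual self-attention followed by pooling yields a permutation-invariant set representation can be shown with a small sketch: self-attention is permutation-equivariant (reordering the inputs reorders the outputs identically), and mean-pooling is permutation-invariant, so their composition ignores image order. This is a simplified assumption: the query, key, and value projections are identities rather than learned weights, and the dictionary-learning alignment step is omitted. Function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def residual_self_attention(features: List[Vector]) -> List[Vector]:
    """Each feature attends to every feature in the set; output = input + attended sum."""
    scale = 1.0 / math.sqrt(len(features[0]))
    out = []
    for q in features:
        scores = [dot(q, k) * scale for k in features]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(weights)
        attended = [sum(w * v[d] for w, v in zip(weights, features)) / z for d in range(len(q))]
        out.append([x + a for x, a in zip(q, attended)])
    return out


def set_descriptor(features: List[Vector]) -> Vector:
    """Mean-pool the restructured features into one set-level vector."""
    restructured = residual_self_attention(features)
    n = len(restructured)
    return [sum(f[d] for f in restructured) / n for d in range(len(restructured[0]))]


image_set = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]  # toy per-image embeddings
a = set_descriptor(image_set)
b = set_descriptor(list(reversed(image_set)))
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))  # True: order does not matter
```

The tolerance absorbs floating-point differences from summing in a different order; mathematically the two descriptors are identical.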

arXiv:1907.03030

Dependency-aware Attention Control for Unconstrained Face Recognition with Image Sets

Authors: Xiaofeng Liu, B. V. K Vijaya Kumar, Chao Yang, Qingming Tang, Jane You

Abstract: This paper targets the problem of image set-based face verification and identification. Unlike the traditional single-media (an image or video) setting, we encounter a set of heterogeneous contents containing orderless images and videos. The importance of each image is usually considered either equal or based on its independent quality assessment. How to model the relationship of orderless images within a set remains a challenge. We address this problem by formulating it as a Markov Decision Process (MDP) in the latent space. Specifically, we first present a dependency-aware attention control (DAC) network, which resorts to actor-critic reinforcement learning for the sequential attention decision of each image embedding to fully exploit the rich correlation cues among the unordered images. Moreover, we introduce its sample-efficient variant with off-policy experience replay to speed up the learning process. The pose-guided representation scheme can further boost performance at the extremes of pose variation.

Submitted 5 July, 2019; originally announced July 2019.

Comments: Fixed the unreadable code in CVF version. arXiv admin note: text overlap with arXiv:1707.00130 by other authors

arXiv:1810.07911

Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training

Authors: Yang Zou, Zhiding Yu, B. V. K. Vijaya Kumar, Jinsong Wang

Abstract: Recent deep networks have achieved state-of-the-art performance on a variety of semantic segmentation tasks. Despite such progress, these models often face challenges in real-world 'wild tasks', where a large difference exists between labeled training/source data and unseen test/target data. Such a difference is often referred to as the 'domain gap' and can cause significantly decreased performance that cannot easily be remedied by further increasing the representation power. Unsupervised domain adaptation (UDA) seeks to overcome this problem without target-domain labels. In this paper, we propose a novel UDA framework based on an iterative self-training procedure, where the problem is formulated as latent variable loss minimization and can be solved by alternately generating pseudo-labels on target data and re-training the model with these labels. On top of self-training, we also propose a novel class-balanced self-training framework to avoid the gradual dominance of large classes in pseudo-label generation, and introduce spatial priors to refine the generated labels. Comprehensive experiments show that the proposed methods achieve state-of-the-art semantic segmentation performance under multiple major UDA settings.

Submitted 25 October, 2018; v1 submitted 18 October, 2018; originally announced October 2018.

Comments: Accepted to ECCV 2018
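The class-balanced idea can be sketched as choosing a separate confidence threshold per predicted class, so that roughly the same top fraction of each class is pseudo-labeled, instead of one global threshold that frequent, confident classes would dominate. This is a simplified sketch under stated assumptions: a single selection portion `portion` shared by all classes, toy per-pixel probabilities, and no spatial priors. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=row.__getitem__)


def class_balanced_thresholds(probs: Sequence[Sequence[float]], portion: float) -> Dict[int, float]:
    """For each predicted class, the confidence ranked at the top `portion` of its pixels."""
    by_class: Dict[int, List[float]] = {}
    for row in probs:
        c = argmax(row)
        by_class.setdefault(c, []).append(row[c])
    thresholds = {}
    for c, confs in by_class.items():
        confs.sort(reverse=True)
        k = max(1, int(round(portion * len(confs))))  # keep at least one pixel per class
        thresholds[c] = confs[k - 1]
    return thresholds


def pseudo_labels(probs: Sequence[Sequence[float]], portion: float) -> List[Optional[int]]:
    """Predicted class if its confidence clears that class's threshold, else None (ignored)."""
    thr = class_balanced_thresholds(probs, portion)
    labels: List[Optional[int]] = []
    for row in probs:
        c = argmax(row)
        labels.append(c if row[c] >= thr[c] else None)
    return labels


# Toy "pixels": class 0 is frequent and confident, class 1 is rare and less confident.
pixels = [[0.97, 0.03], [0.95, 0.05], [0.93, 0.07], [0.91, 0.09],
          [0.40, 0.60], [0.35, 0.65]]
print(pseudo_labels(pixels, portion=0.5))
```

With one global threshold of 0.9, all four class-0 pixels would be labeled and no class-1 pixel would be; the per-class thresholds here keep the top half of each class instead, so the rare class still contributes pseudo-labels.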

arXiv:1808.01992

Simultaneous Edge Alignment and Learning

Authors: Zhiding Yu, Weiyang Liu, Yang Zou, Chen Feng, Srikumar Ramalingam, B. V. K. Vijaya Kumar, Jan Kautz

Abstract: Edge detection is among the most fundamental vision problems for its role in perceptual grouping and its wide applications. Recent advances in representation learning have led to considerable improvements in this area. Many state-of-the-art edge detection models are learned with fully convolutional networks (FCNs). However, FCN-based edge learning tends to be vulnerable to misaligned labels due to the delicate structure of edges. While this problem has been considered in evaluation benchmarks, it has not been explicitly addressed in general edge learning. In this paper, we show that label misalignment can considerably degrade edge learning quality, and we address this issue by proposing a simultaneous edge alignment and learning framework. To this end, we formulate a probabilistic model in which edge alignment is treated as latent variable optimization and is learned end-to-end during network training. Experiments show several applications of this work, including improved edge detection with state-of-the-art performance and automatic refinement of noisy annotations.

Submitted 26 October, 2018; v1 submitted 6 August, 2018; originally announced August 2018.

Comments: Accepted to ECCV 2018

arXiv:1805.10664

doi 10.1145/3272127.3275015

Towards Multifocal Displays with Dense Focal Stacks

Authors: Jen-Hao Rick Chang, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan

Abstract: We present a virtual reality display that is capable of generating a dense collection of depth/focal planes. This is achieved by driving a focus-tunable lens to sweep a range of focal lengths at a high frequency and, subsequently, tracking the focal length precisely at microsecond time resolutions using an optical module. Precise tracking of the focal length, coupled with a high-speed display, enables our lab prototype to generate 1600 focal planes per second. This enables a novel first-of-its-kind virtual reality multifocal display that is capable of resolving the vergence-accommodation conflict endemic to today's displays.

Submitted 22 September, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

arXiv:1803.00117

doi 10.1109/JCN.2018.000051

Redundancy allocation in finite-length nested codes for nonvolatile memories

Authors: Yongjune Kim, B. V. K. Vijaya Kumar

Abstract: In this paper, we investigate the optimum way to allocate redundancy of finite-length nested codes for modern nonvolatile memories suffering from both permanent defects and transient errors (erasures or random errors). A nested coding approach such as partitioned codes can handle both permanent defects and transient errors by using two parts of redundancy: 1) redundancy to deal with permanent defects and 2) redundancy for transient errors. We consider two different channel models: the binary defect and erasure channel (BDEC) and the binary defect and symmetric channel (BDSC). The transient errors of the BDEC are erasures, whereas the transient errors of the BDSC are modeled by the binary symmetric channel. Asymptotically, the probability of recovery failure can converge to zero if the capacity region conditions of nested codes are satisfied. However, the probability of recovery failure of finite-length nested codes can vary significantly across redundancy allocations even though they all satisfy the capacity region conditions. Hence, we formulate the redundancy allocation problem of finite-length nested codes to minimize the recovery failure probability. We derive upper bounds on the probability of recovery failure and use them to estimate the optimal redundancy allocation. Numerical results show that our estimated redundancy allocation closely matches the optimal redundancy allocation.

Submitted 28 February, 2018; originally announced March 2018.

Comments: accepted by Journal of Communications and Networks (JCN)

arXiv:1703.09912

One Network to Solve Them All --- Solving Linear Inverse Problems using Deep Projection Models

Authors: J. H. Rick Chang, Chun-Liang Li, Barnabas Poczos, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan

Abstract: While deep learning methods have achieved state-of-the-art performance in many challenging inverse problems like image inpainting and super-resolution, they invariably involve problem-specific training of the networks. Under this approach, different problems require different networks. In scenarios where we need to solve a wide variety of problems, e.g., on a mobile camera, it is inefficient and costly to use these specially trained networks. On the other hand, traditional methods using signal priors can be used in all linear inverse problems but often perform worse on challenging tasks. In this work, we provide a middle ground between the two kinds of methods: we propose a general framework to train a single deep neural network that solves arbitrary linear inverse problems. The proposed network acts as a proximal operator for an optimization algorithm and projects non-image signals onto the set of natural images defined by the decision boundary of a classifier. In our experiments, the proposed framework outperforms traditional methods using a wavelet sparsity prior and achieves performance comparable to that of specially trained networks on tasks including compressive sensing and pixel-wise inpainting.

Submitted 29 March, 2017; originally announced March 2017.

ACM Class: I.4.5

arXiv:1602.03536

doi 10.1109/ITA.2016.7888148

Duality between erasures and defects

Authors: Yongjune Kim, B. V. K. Vijaya Kumar

Abstract: We investigate the duality of the binary erasure channel (BEC) and the binary defect channel (BDC). This duality holds for channel capacities, capacity-achieving schemes, minimum distances, and upper bounds on the probability of failure to retrieve the original message. In addition, the relations between BEC, BDC, binary erasure quantization (BEQ), and write-once memory (WOM) are described. From these relations, we claim that the capacity of the BDC can be achieved by Reed-Muller (RM) codes under maximum a posteriori (MAP) decoding. Also, polar codes with a successive cancellation encoder achieve the capacity of the BDC. Inspired by the duality between the BEC and the BDC, we introduce locally rewritable codes (LWC) for resistive memories, which are the counterparts of locally repairable codes (LRC) for distributed storage systems. The proposed LWC can improve the endurance limit and power efficiency of resistive memories.

Submitted 10 February, 2016; originally announced February 2016.

Comments: Presented at Information Theory and Applications (ITA) Workshop 2016. arXiv admin note: text overlap with arXiv:1602.01202
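To make the capacity duality concrete, here are the standard channel models (stated as background, not taken from the abstract): a BEC erases each bit with probability $\varepsilon$ and the decoder knows the erased positions, while a BDC makes each cell defective (stuck at 0 or 1) with probability $\beta$ and the encoder knows the defect positions and values. The two capacities then have the same form, with the side information moving from the decoder to the encoder:

```latex
C_{\mathrm{BEC}} = 1 - \varepsilon, \qquad C_{\mathrm{BDC}} = 1 - \beta .
```

For example, with $\varepsilon = \beta = 0.1$ both channels support at most 0.9 bits per cell.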

arXiv:1602.01202 [pdf, ps, other]

doi 10.1109/ICC.2016.7510727

Locally rewritable codes for resistive memories

Authors: Yongjune Kim, Abhishek A. Sharma, Robert Mateescu, Seung-Hwan Song, Zvonimir Z. Bandic, James A. Bain, B. V. K. Vijaya Kumar

Abstract: We propose locally rewritable codes (LWC) for resistive memories, inspired by locally repairable codes (LRC) for distributed storage systems. Small repair locality in LRC enables fast repair of a single failed node, since the lost data can be recovered by accessing only a small fraction of the other nodes. By exploiting rewriting locality, LWC can extend the endurance limit and reduce power consumption, which are major challenges for resistive memories. We point out the duality between LRC and LWC, which indicates that existing construction methods for LRC can be applied to construct LWC.

Submitted 3 February, 2016; originally announced February 2016.

Comments: accepted by IEEE International Conference on Communications (ICC) 2016
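The repair-locality idea that LWC builds on can be illustrated with the simplest locally repairable layout: split the data into small groups and store one XOR parity per group, so a single lost symbol is rebuilt from its own group only. This is a minimal sketch under that assumption (one XOR parity per group, integer symbols); the function names are hypothetical and this is not the paper's LWC construction.

```python
from functools import reduce
from operator import xor


def encode_with_local_parity(data, group_size):
    """Split data symbols into local groups and append one XOR parity symbol per group."""
    groups = [list(data[i:i + group_size]) for i in range(0, len(data), group_size)]
    return [g + [reduce(xor, g, 0)] for g in groups]


def repair_symbol(group, failed_index):
    """Rebuild one lost symbol of a group from the surviving symbols of that group only.

    Returns the recovered symbol and how many symbols were read (the repair locality).
    """
    survivors = [s for i, s in enumerate(group) if i != failed_index]
    return reduce(xor, survivors, 0), len(survivors)


# Hypothetical example: six data symbols in two local groups of three.
stripes = encode_with_local_parity([5, 9, 12, 7, 3, 14], group_size=3)
lost = stripes[0][1]                       # pretend the node holding symbol 9 failed
recovered, reads = repair_symbol(stripes[0], failed_index=1)
print(recovered == lost, reads)            # True 3
```

Only 3 of the 8 stored symbols are read to repair the failure; smaller groups lower this repair locality at the cost of more parity symbols.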

arXiv:1411.4701 [pdf, other]

Structured Hough Voting for Vision-based Highway Border Detection

Authors: Zhiding Yu, Wende Zhang, B. V. K. Vijaya Kumar, Dan Levi

Abstract: We propose a vision-based highway border detection algorithm using structured Hough voting. Our approach takes advantage of the geometric relationship between highway road borders and highway lane markings. It uses a strategy where a number of trained road border and lane marking detectors are triggered, followed by Hough voting to generate the corresponding border and lane marking detections. Since the initially triggered detectors usually produce a large number of positives, conventional frame-wise Hough voting cannot always generate robust border and lane marking results. Therefore, we formulate this problem as a joint detection-and-tracking problem under the structured Hough voting model, where tracking refers to exploiting inter-frame structural information to stabilize the detection results. Both qualitative and quantitative evaluations show the superiority of the proposed structured Hough voting model over a number of baseline methods.

Submitted 17 November, 2014; originally announced November 2014.
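For reference, the conventional frame-wise Hough voting that the abstract improves on can be sketched as follows: each detected point votes for every line $\rho = x\cos\theta + y\sin\theta$ through it, and the accumulator peak is taken as the line. This is a minimal sketch assuming integer $\rho$ bins and 1-degree $\theta$ bins; the detections are hypothetical, and the structured, inter-frame voting model of the paper is not shown.

```python
import math


def hough_vote(points, theta_step_deg=1):
    """Frame-wise Hough voting for straight lines in the (theta, rho) parameterisation.

    Each point votes for every line rho = x*cos(theta) + y*sin(theta) through it;
    rho is rounded to the nearest integer bin.
    """
    accumulator = {}
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            key = (theta_deg, rho)
            accumulator[key] = accumulator.get(key, 0) + 1
    return accumulator


def strongest_line(accumulator):
    """Return the ((theta_deg, rho), votes) bin with the most votes (first one on ties)."""
    return max(accumulator.items(), key=lambda item: item[1])


# Hypothetical detections: four points on the vertical line x = 5 plus one outlier.
detections = [(5, 0), (5, 1), (5, 2), (5, 3), (1, 7)]
(theta_deg, rho), votes = strongest_line(hough_vote(detections))
print(theta_deg, rho, votes)   # 0 5 4
```

Neighbouring bins such as $(1^\circ, 5)$ also collect 4 votes here, which hints at why single-frame peaks are fragile when detectors fire many positives.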

arXiv:1411.2316 [pdf, other]

doi 10.1109/TPAMI.2014.2375215

Zero-Aliasing Correlation Filters for Object Recognition

Authors: Joseph A. Fernandez, Vishnu Naresh Boddeti, Andres Rodriguez, B. V. K. Vijaya Kumar

Abstract: Correlation filters (CFs) are a class of classifiers that are attractive for object localization and tracking applications. Traditionally, CFs have been designed in the frequency domain using the discrete Fourier transform (DFT), where correlation is efficiently implemented. However, existing CF designs do not account for the fact that the multiplication of two DFTs in the frequency domain corresponds to a circular correlation in the time/spatial domain. Because this was previously unaccounted for, prior CF designs are not truly optimal, as their optimization criteria do not accurately quantify their optimization intention. In this paper, we introduce new zero-aliasing constraints that completely eliminate this aliasing problem by ensuring that the optimization criterion for a given CF corresponds to a linear correlation rather than a circular correlation. This means that previous CF designs can be significantly improved by this reformulation. We demonstrate the benefits of this new CF design approach with several important CFs. We present experimental results on diverse data sets and present solutions to the computational challenges associated with computing these CFs. Code for the CFs described in this paper and their respective zero-aliasing versions is available at http://vishnu.boddeti.net/projects/correlation-filters.html

Submitted 19 November, 2014; v1 submitted 9 November, 2014; originally announced November 2014.

Comments: 14 pages, to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
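The aliasing described above is easy to see numerically: multiplying two same-length DFTs and inverting gives a circular correlation, so a shift past the end wraps around to the start. The sketch below uses a naive pure-Python DFT and shows the standard zero-padding remedy, which recovers linear correlation; the signals are hypothetical, and zero-padding is only an illustration, not the paper's constrained (zero-aliasing) filter design.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def correlate_via_dft(x, y):
    """c[k] = sum_n x[n] * y[(n + k) mod N], computed as IDFT(conj(X) * Y) for real x, y."""
    X, Y = dft(x), dft(y)
    return [v.real for v in idft([a.conjugate() * b for a, b in zip(X, Y)])]


def zero_pad(x, length):
    """Append zeros so that x has the requested length."""
    return list(x) + [0] * (length - len(x))


x = [1, 2, 3]
y = [0, 1, 0]

# Same-length DFTs: lag 2 wraps around and picks up the lag -1 term (circular correlation).
circular = correlate_via_dft(x, y)

# Zero-padding to at least 2N - 1 samples removes the wrap-around (linear correlation):
# indices 0..N-1 hold lags 0..N-1 and the last N-1 entries hold lags -(N-1)..-1.
L = 2 * len(x) - 1
linear = correlate_via_dft(zero_pad(x, L), zero_pad(y, L))

print([round(v) for v in circular])   # [2, 1, 3]
print([round(v) for v in linear])     # [2, 1, 0, 0, 3]
```

In the circular result, the value 3 at lag 2 is really the lag -1 term aliased onto lag 2; a CF objective evaluated on that output measures something other than the intended linear correlation.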

arXiv:1410.8541 [pdf, ps, other]

doi 10.1109/ICC.2015.7248332

Coding scheme for 3D vertical flash memory

Authors: Yongjune Kim, Robert Mateescu, Seung-Hwan Song, Zvonimir Bandic, B. V. K. Vijaya Kumar

Abstract: Recently introduced 3D vertical flash memory is expected to be a disruptive technology since it overcomes the scaling challenges of conventional 2D planar flash memory by stacking cells in the vertical direction. However, 3D vertical flash memory suffers from a new problem known as fast detrapping, a rapid loss of stored charge. In this paper, we propose a scheme that compensates for the effect of fast detrapping through intentional inter-cell interference (ICI). To control the intentional ICI properly, our scheme relies on a coding technique that incorporates the side information of fast detrapping during the encoding stage. This technique is closely connected to the well-known problem of coding in a memory with defective cells. Numerical results show that the proposed scheme can effectively address the problem of fast detrapping.

Submitted 10 February, 2015; v1 submitted 30 October, 2014; originally announced October 2014.

Comments: 7 pages, 9 figures. accepted to ICC 2015. arXiv admin note: text overlap with arXiv:1410.1775

arXiv:1410.1775 [pdf, ps, other]

doi 10.1109/ALLERTON.2014.7028498

Writing on dirty flash memory

Authors: Yongjune Kim, B. V. K. Vijaya Kumar

Abstract: The most important challenge in the scaling down of flash memory is its increased inter-cell interference (ICI). If side information about ICI is known to the encoder, the flash memory channel can be viewed as similar to Costa's "writing on dirty paper (dirty paper coding)." We first explain why flash memories are dirty due to ICI. We then show that "dirty flash memory" can be transformed into the "memory with defective cells" model using only one pre-read operation. The asymmetry between write and erase operations in flash memory plays an important role in this transformation. Based on the "memory with defective cells" model, we show that additive encoding that uses the side information can significantly reduce the probability of decoding failure.

Submitted 7 October, 2014; originally announced October 2014.

Comments: 8 pages, accepted to 52nd Annual Allerton Conference on Communication, Control, and Computing, Oct. 2014

arXiv:1404.6031 [pdf, other]

Maximum Margin Vector Correlation Filter

Authors: Vishnu Naresh Boddeti, B. V. K. Vijaya Kumar

Abstract: Correlation Filters (CFs) are a class of classifiers designed for accurate pattern localization. Traditionally, CFs have been used only with scalar features, which prevents their use with vector feature representations such as Gabor filter banks, SIFT, and HOG. In this paper, we present a new CF named Maximum Margin Vector Correlation Filter (MMVCF), which extends traditional CF designs to vector features. MMVCF further combines the generalization capability of large-margin classifiers such as Support Vector Machines (SVMs) with the localization properties of CFs for better robustness to outliers. We demonstrate the efficacy of MMVCF for object detection and landmark localization on a variety of databases and show that MMVCF consistently achieves better pattern localization than SVMs.

Submitted 24 April, 2014; originally announced April 2014.

Comments: 8 pages
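Extending a CF to vector features means the filter has one channel per feature channel, and the correlation output is the sum of the per-channel correlations; the target is localized at the output peak. The sketch below shows only this evaluation step on 1-D circular signals, with hypothetical hand-picked filter values; the max-margin training that gives MMVCF its name is not shown.

```python
def correlate_circular(h, f):
    """r[k] = sum_n h[n] * f[(n + k) mod N] for one feature channel."""
    n = len(f)
    return [sum(h[i] * f[(i + k) % n] for i in range(n)) for k in range(n)]


def vector_cf_response(filters, features):
    """Correlation output of a vector (multi-channel) CF: per-channel correlations summed."""
    per_channel = [correlate_circular(h, f) for h, f in zip(filters, features)]
    return [sum(values) for values in zip(*per_channel)]


# Hypothetical 2-channel features (e.g. two HOG-like bins) with a target at position 2.
features = [[0, 0, 1, 0],
            [0, 0, 2, 0]]
filters = [[1, 0, 0, 0],
           [1, 0, 0, 0]]

response = vector_cf_response(filters, features)
peak = max(range(len(response)), key=response.__getitem__)
print(response, peak)   # [0, 0, 3, 0] 2
```

Summing across channels lets evidence from every feature dimension reinforce the same location, which a scalar CF applied to a single channel cannot do.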

arXiv:1403.1897 [pdf, ps, other]

doi 10.1109/JCN.2018.000051

On the Duality of Erasures and Defects

Authors: Yongjune Kim, B V K Vijaya Kumar

Abstract: In this paper, the duality of erasures and defects will be investigated by comparing the binary erasure channel (BEC) and the binary defect channel (BDC). The duality holds for channel capacities, capacity-achieving schemes, minimum distances, and upper bounds on the probability of failure to retrieve the original message. Also, the binary defect and erasure channel (BDEC) will be introduced by combining the properties of the BEC and the BDC. It will be shown that the capacity of the BDEC can be achieved by a coding scheme that combines encoding for the defects with decoding for the erasures. This coding scheme for the BDEC has two separate redundancy parts, one for correcting erasures and one for masking defects. Thus, we will investigate the problem of redundancy allocation between these two parts.

Submitted 7 March, 2014; originally announced March 2014.

Comments: 40 pages, 8 figures, submitted to IEEE Transactions on Information Theory

arXiv:1305.3289 [pdf, ps, other]

doi 10.1109/ISIT.2013.6620651

Redundancy Allocation of Partitioned Linear Block Codes

Authors: Yongjune Kim, B. V. K. Vijaya Kumar

Abstract: Most memories suffer from both permanent defects and intermittent random errors. Partitioned linear block codes (PLBC) were proposed by Heegard to efficiently mask stuck-at defects and correct random errors. PLBC have two separate redundancy parts, one for defects and one for random errors. In this paper, we investigate the allocation of redundancy between these two parts. The optimal redundancy allocation is investigated through simulations, and the results show that PLBC can significantly reduce the probability of decoding failure in memories with defects. In addition, we derive an upper bound on the probability of decoding failure of PLBC and use it to estimate the optimal redundancy allocation. The estimated redundancy allocation closely matches the optimal one.

Submitted 14 May, 2013; originally announced May 2013.

Comments: 5 pages, 2 figures, to appear in IEEE International Symposium on Information Theory (ISIT), Jul. 2013

arXiv:1304.4821 [pdf]

doi 10.1109/ICC.2013.6655249

Coding for Memory with Stuck-at Defects

Authors: Yongjune Kim, B. V. K. Vijaya Kumar

Abstract: In this paper, we propose an encoding scheme for partitioned linear block codes (PLBC), which mask stuck-at defects in memories. In addition, we derive an upper bound on, and an estimate of, the probability that masking fails. Numerical results show that PLBC can efficiently mask the defects with the proposed encoding scheme and that our upper bound is very tight.

Submitted 17 April, 2013; originally announced April 2013.

Comments: 6 pages, 5 figures, IEEE International Conference on Communications (ICC), Jun. 2013
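The masking idea behind these defect-coding papers can be shown with a toy additive scheme: the encoder knows which cells are stuck and at what values, and it spends a few redundancy bits d to choose a mask d·G (over GF(2)) that makes the stored word agree with every stuck cell. This is a minimal sketch under stated assumptions: the systematic layout d || (message XOR d·G), the brute-force search, and the matrix G are all hypothetical, and it is not Heegard's PLBC construction (it has no random-error-correcting part).

```python
from itertools import product


def combine_rows(d, rows):
    """Compute d·G over GF(2): XOR of the rows of G selected by the bits of d."""
    out = [0] * len(rows[0])
    for bit, row in zip(d, rows):
        if bit:
            out = [a ^ b for a, b in zip(out, row)]
    return out


def encode(message, stuck, rows):
    """Search the k0 = len(rows) masking bits for a word that agrees with every stuck cell.

    The stored word is d || (message XOR d·G). Returns None if no choice of d masks
    all defects (masking failure).
    """
    for d in product((0, 1), repeat=len(rows)):
        body = [m ^ p for m, p in zip(message, combine_rows(d, rows))]
        word = list(d) + body
        if all(word[pos] == value for pos, value in stuck.items()):
            return word
    return None


def decode(word, rows):
    """Read the masking bits back from the header and undo the additive mask."""
    k0 = len(rows)
    d = word[:k0]
    return [b ^ p for b, p in zip(word[k0:], combine_rows(d, rows))]


G = [[1, 1, 1, 1],
     [0, 1, 0, 1]]          # hypothetical masking rows (k0 = 2 redundancy bits)
message = [1, 0, 1, 1]
stuck = {3: 1, 4: 1}        # cells 3 and 4 of the 6-cell word are stuck at 1

word = encode(message, stuck, G)
print(word, decode(word, G) == message)   # [0, 1, 1, 1, 1, 0] True
```

With only two masking bits, some defect patterns cannot be masked; for this message, cells 3 and 5 both stuck at 0 make `encode` return None. The probability of such failures is the quantity the abstract bounds and estimates.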

arXiv:1304.4811 [pdf]

doi 10.1109/ICCNC.2013.6504220

Modulation Coding for Flash Memories

Authors: Yongjune Kim, Kyoung Lae Cho, Hongrak Son, Jaehong Kim, Jun Jin Kong, Jaejin Lee, B. V. K. Vijaya Kumar

Abstract: The aggressive scaling down of flash memories has threatened data reliability, since smaller cell sizes give rise to more serious degradation mechanisms such as cell-to-cell interference and lateral charge spreading. The effect of these mechanisms is pattern-dependent, and some data patterns are more vulnerable than others. In this paper, we categorize data patterns by taking into account degradation mechanisms and pattern dependency. In addition, we propose several modulation coding schemes that improve data reliability by transforming vulnerable data patterns into more robust ones.

Submitted 17 April, 2013; originally announced April 2013.

Comments: 7 pages, 9 figures, Proc. IEEE International Conference on Computing, Networking and Communications (ICNC), Jan. 2013

Showing 1–28 of 28 results for author: Kumar, B V