Showing 1–34 of 34 results for author: Ithapu, V K

  1. arXiv:2506.21080  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception

    Authors: Sanjoy Chowdhury, Subrata Biswas, Sayan Nag, Tushar Nagarajan, Calvin Murdock, Ishwarya Ananthabhotla, Yijun Qian, Vamsi Krishna Ithapu, Dinesh Manocha, Ruohan Gao

    Abstract: Modern perception models, particularly those designed for multisensory egocentric tasks, have achieved remarkable performance but often come with substantial computational costs. These high demands pose challenges for real-world deployment, especially in resource-constrained environments. In this paper, we introduce EgoAdapt, a framework that adaptively performs cross-modal distillation and policy…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted at ICCV 2025

  2. arXiv:2504.10746  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.SD eess.AS

    Hearing Anywhere in Any Environment

    Authors: Xiulong Liu, Anurag Kumar, Paul Calamia, Sebastia V. Amengual, Calvin Murdock, Ishwarya Ananthabhotla, Philip Robinson, Eli Shlizerman, Vamsi Krishna Ithapu, Ruohan Gao

    Abstract: In mixed reality applications, a realistic acoustic experience in spatial environments is as crucial as the visual experience for achieving true immersion. Despite recent advances in neural approaches for Room Impulse Response (RIR) estimation, most existing methods are limited to the single environment on which they are trained, lacking the ability to generalize to new rooms with different geomet…

    Submitted 4 June, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: CVPR 2025; Project Page: https://dragonliu1995.github.io/hearinganywhereinanyenvironment/

  3. arXiv:2411.02019  [pdf, other]

    eess.AS cs.LG cs.SD

    Modulating State Space Model with SlowFast Framework for Compute-Efficient Ultra Low-Latency Speech Enhancement

    Authors: Longbiao Cheng, Ashutosh Pandey, Buye Xu, Tobi Delbruck, Vamsi Krishna Ithapu, Shih-Chii Liu

    Abstract: Deep learning-based speech enhancement (SE) methods often face significant computational challenges when they must meet low-latency requirements, because of the increased number of frames to be processed. This paper introduces the SlowFast framework, which aims to reduce computation costs specifically when low-latency enhancement is needed. The framework consists of a slow branch that analyzes the…

    Submitted 4 January, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: Accepted to ICASSP 2025

  4. arXiv:2408.05364  [pdf, other]

    cs.CV

    Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

    Authors: Heeseung Yun, Ruohan Gao, Ishwarya Ananthabhotla, Anurag Kumar, Jacob Donley, Chao Li, Gunhee Kim, Vamsi Krishna Ithapu, Calvin Murdock

    Abstract: Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to behavioral interaction. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representation, which implicitly transforms multisensory streams with respect to measurements of head orientation. Compared to conventional head-locked egocentric represent…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  5. arXiv:2401.08972  [pdf, other]

    cs.CV

    Hearing Loss Detection from Facial Expressions in One-on-one Conversations

    Authors: Yufeng Yin, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Stavros Petridis, Yu-Hsiang Wu, Christi Miller

    Abstract: Individuals with impaired hearing experience difficulty in conversations, especially in noisy environments. This difficulty often manifests as a change in behavior and may be captured via facial expressions, such as the expression of discomfort or fatigue. In this work, we build on this idea and introduce the problem of detecting hearing loss from an individual's facial expressions during a conver…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  6. arXiv:2312.12870  [pdf, other]

    cs.CV

    The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

    Authors: Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao

    Abstract: In recent years, the thriving development of research related to egocentric videos has provided a unique perspective for the study of conversational interactions, where both visual and audio signals play a crucial role. While most prior work focuses on learning about behaviors that directly involve the camera wearer, we introduce the Ego-Exocentric Conversational Graph Prediction problem, marking th…

    Submitted 3 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  7. arXiv:2303.16024  [pdf, other]

    cs.CV cs.SD eess.AS

    Egocentric Auditory Attention Localization in Conversations

    Authors: Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu

    Abstract: In a noisy conversation environment such as a dinner party, people often exhibit selective auditory attention, or the ability to focus on a particular speaker while tuning out others. Recognizing who somebody is listening to in a conversation is essential for developing technologies that can understand social behavior and devices that can augment human hearing by amplifying particular sound source…

    Submitted 28 March, 2023; originally announced March 2023.

  8. arXiv:2301.08730  [pdf, other]

    cs.CV cs.SD eess.AS

    Novel-View Acoustic Synthesis

    Authors: Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

    Abstract: We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach, the Visually-Guided Acoustic Synthesis (ViGAS) network, which learns to synthesize the sound of an arbitrary point in space by analyzing the input audio-visual cues. To benc…

    Submitted 24 October, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

    Comments: Accepted at CVPR 2023. Project page: https://vision.cs.utexas.edu/projects/nvas

  9. arXiv:2301.02184  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

    Authors: Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman, Vamsi Krishna Ithapu

    Abstract: Can conversational videos captured from multiple egocentric viewpoints reveal the map of a scene in a cost-efficient way? We seek to answer this question by proposing a new problem: efficiently building the map of a previously unseen 3D environment by exploiting shared information in the egocentric audio-visual observations of participants in a natural conversation. Our hypothesis is that as multi…

    Submitted 20 April, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

    Comments: Accepted to CVPR 2023

  10. arXiv:2211.10999  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

    Authors: Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic

    Abstract: Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements. This approach has been shown to yield improvements over audio-only speech enhancement, particularly for the removal of interfering speech. Despite recent advances in speech synthesis, most audio-visual approaches continue to use…

    Submitted 13 March, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  11. arXiv:2211.08624  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement

    Authors: Kuan-Lin Chen, Daniel D. E. Wong, Ke Tan, Buye Xu, Anurag Kumar, Vamsi Krishna Ithapu

    Abstract: Most speech enhancement (SE) models learn a point estimate and do not make use of uncertainty estimation in the learning process. In this paper, we show that modeling heteroscedastic uncertainty by minimizing a multivariate Gaussian negative log-likelihood (NLL) improves SE performance at no extra cost. During training, our approach augments a model learning complex spectral mapping with a tempora…

    Submitted 8 March, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: 5 pages. Accepted at ICASSP 2023

  12. arXiv:2211.04473  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Improved Room Impulse Response Estimation for Speech Recognition

    Authors: Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha, Paul Calamia

    Abstract: We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture tha…

    Submitted 19 March, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted at ICASSP 2023. More results are available at https://anton-jeran.github.io/S2IR/

  13. arXiv:2206.12297  [pdf, other]

    eess.AS cs.SD

    SAQAM: Spatial Audio Quality Assessment Metric

    Authors: Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

    Abstract: Audio quality assessment is critical for assessing the perceptual realism of sounds. However, the time and expense of obtaining "gold standard" human judgments limit the availability of such data. For AR and VR, good perceived sound quality and localizability of sources are among the key elements to ensure complete immersion of the user. Our work introduces SAQAM, which uses a multi-task learning fra…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: To Appear, Interspeech 2022

  14. arXiv:2202.08862  [pdf, other]

    cs.SD cs.LG eess.AS

    RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

    Authors: Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

    Abstract: We present RemixIT, a simple yet effective self-supervised method for training speech enhancement models without the need for a single isolated in-domain speech or noise waveform. Our approach overcomes limitations of previous methods that make them dependent on clean in-domain target signals and thus sensitive to any domain mismatch between train and test samples. RemixIT is based on a continuous se…

    Submitted 3 August, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: To appear in IEEE Journal of Selected Topics in Signal Processing

    Journal ref: J-STSP-SLSAP-00040-2022

  15. arXiv:2202.03416  [pdf, other]

    cs.SD cs.LG eess.AS

    Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks

    Authors: Alexander Richard, Peter Dodds, Vamsi Krishna Ithapu

    Abstract: Impulse response estimation in high noise and in-the-wild settings, with minimal control of the underlying data distributions, is a challenging problem. We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning. Our framework is driven by a carefully designed neural network that jointly estimates the impulse response…

    Submitted 7 February, 2022; originally announced February 2022.

  16. arXiv:2201.01928  [pdf, other]

    cs.CV cs.SD eess.AS

    Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization

    Authors: Hao Jiang, Calvin Murdock, Vamsi Krishna Ithapu

    Abstract: Augmented reality devices have the potential to enhance human perception and enable other assistive functionalities in complex conversational environments. Effectively capturing the audio-visual context necessary for understanding these social interactions first requires detecting and localizing the voice activities of the device wearer and the surrounding people. These tasks are challenging due t…

    Submitted 6 January, 2022; originally announced January 2022.

  17. Continual self-training with bootstrapped remixing for speech enhancement

    Authors: Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar

    Abstract: We propose RemixIT, a simple and novel self-supervised training method for speech enhancement. The proposed method is based on a continuous self-training scheme that overcomes limitations of previous studies, including assumptions about the in-domain noise distribution and the need for access to clean target signals. Specifically, a separation teacher model is pre-trained on an out-of-domain dataset an…

    Submitted 29 January, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: To appear in Proc. ICASSP 2022, May 22-27, 2022, Singapore

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  18. arXiv:2110.07058  [pdf, other]

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  19. arXiv:2107.07503  [pdf, other]

    eess.AS cs.SD

    Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech

    Authors: Christian J. Steinmetz, Vamsi Krishna Ithapu, Paul Calamia

    Abstract: Deep learning approaches have emerged that aim to transform an audio signal so that it sounds as if it was recorded in the same room as a reference recording, with applications both in audio post-production and augmented reality. In this work, we propose FiNS, a Filtered Noise Shaping network that directly estimates the time domain room impulse response (RIR) from reverberant speech. Our domain-in…

    Submitted 15 July, 2021; originally announced July 2021.

    Comments: Accepted to WASPAA 2021. See details at https://facebookresearch.github.io/FiNS/

  20. arXiv:2107.04174  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS eess.SP

    EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments

    Authors: Jacob Donley, Vladimir Tourbabin, Jung-Suk Lee, Mark Broyles, Hao Jiang, Jie Shen, Maja Pantic, Vamsi Krishna Ithapu, Ravish Mehra

    Abstract: Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beamforming and speech enhancement requires high-quality representative data. To…

    Submitted 18 October, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: Dataset is available at: https://github.com/facebookresearch/EasyComDataset

  21. arXiv:2106.11335  [pdf, other]

    cs.SD cs.AI eess.AS

    Do sound event representations generalize to other audio tasks? A case study in audio transfer learning

    Authors: Anurag Kumar, Yun Wang, Vamsi Krishna Ithapu, Christian Fuegen

    Abstract: Transfer learning is critical for efficient information transfer across multiple related learning problems. A simple yet effective transfer learning approach utilizes deep neural networks trained on a large-scale task for feature extraction. Such representations are then used to learn related downstream tasks. In this paper, we investigate the transfer learning capacity of audio representations obtai…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted at Interspeech 2021

  22. DPLM: A Deep Perceptual Spatial-Audio Localization Metric

    Authors: Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

    Abstract: Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis-driven technologies like augmented and virtual reality. However, they are challenging to set up, fatiguing for users, and expensive. In this work, we tackle the problem of capturing the perceptual characteristics of localizing sounds. Specifically, we propose a framework for building a general pur…

    Submitted 28 May, 2021; originally announced May 2021.

  23. arXiv:2104.05167  [pdf, other]

    cs.CV

    Egocentric Pose Estimation from Human Vision Span

    Authors: Hao Jiang, Vamsi Krishna Ithapu

    Abstract: Estimating the camera wearer's body pose from an egocentric view (egopose) is a vital task in augmented and virtual reality. Existing approaches either use a narrow field-of-view front-facing camera that barely captures the wearer, or an extruded head-mounted top-down camera for maximal wearer visibility. In this paper, we tackle egopose estimation from a more natural human vision span, where came…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: 9 pages

  24. arXiv:2012.15470  [pdf, other]

    cs.CV

    Audio-Visual Floorplan Reconstruction

    Authors: Senthil Purushwalkam, Sebastian Vicenc Amengual Gari, Vamsi Krishna Ithapu, Carl Schissler, Philip Robinson, Abhinav Gupta, Kristen Grauman

    Abstract: Given only a few glimpses of an environment, how much can we infer about its entire floorplan? Existing methods can map only what is visible or immediately apparent from context, and thus require substantial movements through a space to fully map it. We explore how both audio and visual sensing together can provide rapid floorplan reconstruction from limited viewpoints. Audio not only helps sense…

    Submitted 31 December, 2020; originally announced December 2020.

  25. arXiv:2007.00144  [pdf, other]

    cs.SD cs.LG eess.AS

    A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition

    Authors: Anurag Kumar, Vamsi Krishna Ithapu

    Abstract: An important problem in machine auditory perception is to recognize and detect sound events. In this paper, we propose a sequential self-teaching approach to learning sounds. Our main proposition is that it is harder to learn sounds in adverse situations such as from weakly labeled and/or noisy labeled data, and in these situations a single stage of learning is not sufficient. Our proposal is a se…

    Submitted 30 June, 2020; originally announced July 2020.

    Comments: Accepted at the International Conference on Machine Learning (ICML) 2020. 14 pages

  26. arXiv:1912.11474  [pdf, other]

    cs.CV cs.HC cs.SD eess.AS

    SoundSpaces: Audio-Visual Navigation in 3D Environments

    Authors: Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman

    Abstract: Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted to solely their visual perception of the environment. We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments. By both seeing and hearing, the agent must learn to navigate to a sounding object. We propose a multi-modal deep reinforcement…

    Submitted 21 August, 2020; v1 submitted 24 December, 2019; originally announced December 2019.

    Comments: Accepted to ECCV 2020 (Spotlight). Project page: http://vision.cs.utexas.edu/projects/audio_visual_navigation/

  27. Secost: Sequential co-supervision for large scale weakly labeled audio event detection

    Authors: Anurag Kumar, Vamsi Krishna Ithapu

    Abstract: Weakly supervised learning algorithms are critical for scaling audio event detection to several hundreds of sound categories. Such learning models should not only disambiguate sound events efficiently with minimal class-specific annotation but also be robust to label noise, which is more apparent with weak labels than with strong annotations. In this work, we propose a new framework for designing…

    Submitted 4 May, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: Accepted at IEEE ICASSP 2020

  28. arXiv:1705.05804  [pdf, other]

    cs.CV math.NA stat.ML

    The Incremental Multiresolution Matrix Factorization Algorithm

    Authors: Vamsi K. Ithapu, Risi Kondor, Sterling C. Johnson, Vikas Singh

    Abstract: Multiresolution analysis and matrix factorization are foundational tools in computer vision. In this work, we study the interface between these two distinct topics and obtain techniques to uncover hierarchical block structure in symmetric matrices -- an important aspect in the success of many vision problems. Our new algorithm, the incremental multiresolution matrix factorization, uncovers such st…

    Submitted 16 May, 2017; originally announced May 2017.

    Comments: Computer Vision and Pattern Recognition (CVPR) 2017, 10 pages

  29. Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM

    Authors: Felipe Gutierrez-Barragan, Vamsi K. Ithapu, Chris Hinrichs, Camille Maumet, Sterling C. Johnson, Thomas E. Nichols, Vikas Singh, the ADNI

    Abstract: Permutation testing is a non-parametric method for obtaining the max null distribution used to compute corrected $p$-values that provide strong control of false positives. In neuroimaging, however, the computational burden of running such an algorithm can be significant. We find that by viewing the permutation testing procedure as the construction of a very large permutation testing matrix, $T$, o…

    Submitted 24 July, 2017; v1 submitted 4 March, 2017; originally announced March 2017.

    Comments: 36 pages, 16 figures

  30. arXiv:1702.08670  [pdf, other]

    cs.LG math.OC stat.ML

    On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation

    Authors: Vamsi K Ithapu, Sathya N Ravi, Vikas Singh

    Abstract: We study mechanisms to characterize how the asymptotic convergence of backpropagation in deep architectures, in general, is related to the network structure, and how it may be influenced by other design choices including activation type, denoising and dropout rate. We seek to analyze whether network architecture and input data statistics may guide the choices of learning parameters and vice versa.…

    Submitted 28 February, 2017; originally announced February 2017.

    Comments: 87 Pages; 14 figures; Under review

  31. arXiv:1511.05297  [pdf, other]

    cs.LG stat.ML

    On the interplay of network structure and gradient convergence in deep learning

    Authors: Vamsi K Ithapu, Sathya N Ravi, Vikas Singh

    Abstract: The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks have been fairly well studied. However, our understanding of how the asymptotic convergence of backpropagation in deep architectures is related to the structural properties of the network and other design choices (like denoising and dropout rate) is less clear at this time. An intere…

    Submitted 22 February, 2017; v1 submitted 17 November, 2015; originally announced November 2015.

    Comments: 54th Allerton Conference on Communication, Control and Computing 2016; pgs 488-495

  32. arXiv:1506.03412   

    cs.LG cs.CV cs.NE math.OC stat.ML

    Convergence rates for pretraining and dropout: Guiding learning parameters using network structure

    Authors: Vamsi K. Ithapu, Sathya Ravi, Vikas Singh

    Abstract: Unsupervised pretraining and dropout have been well studied, especially with respect to regularization and output consistency. However, our understanding of the explicit convergence rates of the parameter estimates, and their dependence on the learning (like denoising and dropout rate) and structural (like depth and layer lengths) aspects of the network, is less mature. An interesting question i…

    Submitted 22 February, 2017; v1 submitted 10 June, 2015; originally announced June 2015.

    Comments: This manuscript is now superseded by arXiv:1511.05297 and the corresponding accepted paper in the 54th Allerton Conference on Communication, Control and Computing (2016)

  33. arXiv:1502.03537  [pdf, other]

    cs.LG cs.CV math.OC

    Convergence of gradient based pre-training in Denoising autoencoders

    Authors: Vamsi K Ithapu, Sathya Ravi, Vikas Singh

    Abstract: The success of deep architectures is at least in part attributed to the layer-by-layer unsupervised pre-training that initializes the network. Various papers have reported extensive empirical analysis focusing on the design and implementation of good pre-training procedures. However, an understanding pertaining to the consistency of parameter estimates, the convergence of learning procedures and t…

    Submitted 11 February, 2015; originally announced February 2015.

    Comments: 20 pages

  34. arXiv:1502.03536  [pdf, other]

    stat.CO cs.AI stat.ML

    Speeding up Permutation Testing in Neuroimaging

    Authors: Chris Hinrichs, Vamsi K Ithapu, Qinyuan Sun, Sterling C Johnson, Vikas Singh

    Abstract: Multiple hypothesis testing is a significant problem in nearly all neuroimaging studies. In order to correct for this phenomenon, we require a reliable estimate of the Family-Wise Error Rate (FWER). The well-known Bonferroni correction method, while simple to implement, is quite conservative, and can substantially under-power a study because it ignores dependencies between test statistics. Permutat…

    Submitted 11 February, 2015; originally announced February 2015.

    Comments: NIPS 2013

    Journal ref: Advances in neural information processing systems (2013), pp. 890-898