Showing 1–37 of 37 results for author: Goel, V

Searching in archive cs.
  1. arXiv:2506.07304  [pdf, ps, other]

    cs.CV

    FANVID: A Benchmark for Face and License Plate Recognition in Low-Resolution Videos

    Authors: Kavitha Viswanathan, Vrinda Goel, Shlesh Gholap, Devayan Ghosh, Madhav Gupta, Dhruvi Ganatra, Sanket Potdar, Amit Sethi

    Abstract: Real-world surveillance often renders faces and license plates unrecognizable in individual low-resolution (LR) frames, hindering reliable identification. To advance temporal recognition models, we present FANVID, a novel video-based benchmark comprising nearly 1,463 LR clips (180 x 320, 20--60 FPS) featuring 63 identities and 49 license plates from three English-speaking countries. Each video inc…

    Submitted 8 June, 2025; originally announced June 2025.

  2. arXiv:2504.17653  [pdf]

    cs.CL

    Towards a comprehensive taxonomy of online abusive language informed by machine leaning

    Authors: Samaneh Hosseini Moghaddam, Kelly Lyons, Cheryl Regehr, Vivek Goel, Kaitlyn Regehr

    Abstract: The proliferation of abusive language in online communications has posed significant risks to the health and wellbeing of individuals and communities. The growing concern regarding online abuse and its consequences necessitates methods for identifying and mitigating harmful content and facilitating continuous monitoring, moderation, and early intervention. This paper presents a taxonomy for distin…

    Submitted 24 April, 2025; originally announced April 2025.

  3. arXiv:2502.03639  [pdf, other]

    cs.CV

    Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach

    Authors: Yunuo Chen, Junli Cao, Anil Kag, Vidit Goel, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren

    Abstract: We present a novel video generation framework that integrates 3-dimensional geometry and dynamic awareness. To achieve this, we augment 2D videos with 3D point trajectories and align them in pixel space. The resulting 3D-aware video dataset, PointVid, is then used to fine-tune a latent diffusion model, enabling it to track 2D objects with 3D Cartesian coordinates. Building on this, we regularize t…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: Project Page: https://snap-research.github.io/PointVidGen/

  4. arXiv:2412.12091  [pdf, other]

    cs.CV

    Wonderland: Navigating 3D Scenes from a Single Image

    Authors: Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren

    Abstract: How can one efficiently generate high-quality, wide-scope 3D scenes from arbitrary single images? Existing methods suffer several drawbacks, such as requiring multi-view data, time-consuming per-scene optimization, distorted geometry in occluded areas, and low visual quality in backgrounds. Our novel 3D scene reconstruction pipeline overcomes these limitations to tackle the aforesaid challenge. Sp…

    Submitted 26 April, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: Project page: https://snap-research.github.io/wonderland/

  5. arXiv:2410.09176  [pdf, other]

    cs.CV

    Cross-Domain Evaluation of Few-Shot Classification Models: Natural Images vs. Histopathological Images

    Authors: Ardhendu Sekhar, Aditya Bhattacharya, Vinayak Goyal, Vrinda Goel, Aditya Bhangale, Ravi Kant Gupta, Amit Sethi

    Abstract: In this study, we investigate the performance of few-shot classification models across different domains, specifically natural images and histopathological images. We first train several few-shot classification models on natural images and evaluate their performance on histopathological images. Subsequently, we train the same models on histopathological images and compare their performance. We inc…

    Submitted 11 October, 2024; originally announced October 2024.

  6. arXiv:2408.13818  [pdf, other]

    eess.IV cs.CV

    HER2 and FISH Status Prediction in Breast Biopsy H&E-Stained Images Using Deep Learning

    Authors: Ardhendu Sekhar, Vrinda Goel, Garima Jain, Abhijeet Patil, Ravi Kant Gupta, Tripti Bameta, Swapnil Rane, Amit Sethi

    Abstract: The current standard for detecting human epidermal growth factor receptor 2 (HER2) status in breast cancer patients relies on HER2 amplification, identified through fluorescence in situ hybridization (FISH) or immunohistochemistry (IHC). However, hematoxylin and eosin (H&E) tumor stains are more widely available, and accurately predicting HER2 status using H&E could reduce costs and expedite tre…

    Submitted 26 September, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

  7. arXiv:2406.19434  [pdf, other]

    cs.GR cs.AI

    Lightweight Predictive 3D Gaussian Splats

    Authors: Junli Cao, Vidit Goel, Chaoyang Wang, Anil Kag, Ju Hu, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren

    Abstract: Recent approaches representing 3D objects and scenes using Gaussian splats show increased rendering speed across a variety of platforms and devices. While rendering such representations is indeed extremely efficient, storing and transmitting them is often prohibitively expensive. To represent large-scale scenes, one often needs to store millions of 3D Gaussians, occupying gigabytes of disk space.…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project Page: https://plumpuddings.github.io/LPGS//

  8. arXiv:2404.07990  [pdf, other]

    cs.CV cs.AI

    OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

    Authors: Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goel, Xingqian Xu, Zhangyang Wang, Humphrey Shi, Nicu Sebe

    Abstract: Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployments, it is necessary to deeply investigate their safety and fairness to not disseminate and perpetuate any kind of biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting the studies to well-known concepts. In th…

    Submitted 5 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Highlight - Code: https://github.com/Picsart-AI-Research/OpenBias

  9. arXiv:2401.02473  [pdf, other]

    cs.CV

    VASE: Object-Centric Appearance and Shape Manipulation of Real Videos

    Authors: Elia Peruzzo, Vidit Goel, Dejia Xu, Xingqian Xu, Yifan Jiang, Zhangyang Wang, Humphrey Shi, Nicu Sebe

    Abstract: Recently, several works tackled the video editing task fostered by the success of large-scale text-to-image generative models. However, most of these methods holistically edit the frame using the text, exploiting the prior given by foundation diffusion models and focusing on improving the temporal consistency across frames. In this work, we introduce a framework that is object-centric and is desig…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Project Page https://helia95.github.io/vase-website/

  10. arXiv:2311.04212  [pdf, other]

    cs.CV

    Video Instance Matting

    Authors: Jiachen Li, Roberto Henschel, Vidit Goel, Marianna Ohanyan, Shant Navasardyan, Humphrey Shi

    Abstract: Conventional video matting outputs one alpha matte for all instances appearing in a video frame so that individual instances are not distinguished. While video instance segmentation provides time-consistent instance masks, results are unsatisfactory for matting applications, especially due to applied binarization. To remedy this deficiency, we propose Video Instance Matting (VIM), that is, estimat…

    Submitted 8 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

  11. arXiv:2310.00220  [pdf, other]

    physics.plasm-ph cs.CE

    Optimization of Tritium Breeding Ratio in a DT and DD Submersion Tokamak Fusion Reactor

    Authors: Vikram Goel, Soha Aslam, Sejal Dua

    Abstract: The mass of stars is enough to confine a plasma to fuse light atoms, but this is not possible to engineer on Earth. Fortunately, nuclear engineering can rely on the magnetic confinement of a plasma using superconducting coils so long as the Tritium Breeding Ratio (TBR) is optimized. This paper will investigate some of the materials which can increase the rate at which Tritium is produced within th…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: 8 pages, 5 figures

    ACM Class: J.2; I.6

  12. Interactive Neural Painting

    Authors: Elia Peruzzo, Willi Menapace, Vidit Goel, Federica Arrigoni, Hao Tang, Xingqian Xu, Arman Chopikyan, Nikita Orlov, Yuxiao Hu, Humphrey Shi, Nicu Sebe, Elisa Ricci

    Abstract: In the last few years, Neural Painting (NP) techniques became capable of producing extremely realistic artworks. This paper advances the state of the art in this emerging research domain by proposing the first approach for Interactive NP. Considering a setting where a user looks at a scene and tries to reproduce it on a painting, our objective is to develop a computational framework to assist the…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: This is a preprint version of the paper to appear at Computer Vision and Image Understanding (CVIU). The final journal version will be available at https://www.sciencedirect.com/science/article/pii/S1077314223001583

    Journal ref: 10.1016/j.cviu.2023.103778

  13. arXiv:2303.17546  [pdf, other]

    cs.CV cs.AI cs.LG

    PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

    Authors: Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi

    Abstract: Generative image editing has recently witnessed extremely fast-paced growth. Some works use high-level conditioning such as text, while others use low-level conditioning. Nevertheless, most of them lack fine-grained control over the properties of the different objects present in the image, i.e. object-level image editing. In this work, we tackle the task by perceiving the images as an amalgamation…

    Submitted 8 April, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted in CVPR 2024, Project page https://vidit98.github.io/publication/conference-paper/pair_diff.html

  14. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  15. arXiv:2302.02479  [pdf, other]

    cs.SI cs.AI cs.CL cs.CY

    Hatemongers ride on echo chambers to escalate hate speech diffusion

    Authors: Vasu Goel, Dhruv Sahnan, Subhabrata Dutta, Anil Bandhakavi, Tanmoy Chakraborty

    Abstract: Recent years have witnessed a swelling rise of hateful and abusive content over online social networks. While detection and moderation of hate speech have been the early go-to countermeasures, the solution requires a deeper exploration of the dynamics of hate generation and propagation. We analyze more than 32 million posts from over 6.8 million users across three popular online social networks to…

    Submitted 5 February, 2023; originally announced February 2023.

    Comments: Accepted in PNAS Nexus

  16. arXiv:2208.12801  [pdf, other]

    cs.CV

    VMFormer: End-to-End Video Matting with Transformer

    Authors: Jiachen Li, Vidit Goel, Marianna Ohanyan, Shant Navasardyan, Yunchao Wei, Humphrey Shi

    Abstract: Video matting aims to predict the alpha mattes for each frame from a given input video sequence. Recent solutions to video matting have been dominated by deep convolutional neural networks (CNN) for the past few years, which have become the de-facto standard for both academia and industry. However, they have inbuilt inductive bias of locality and do not capture global characteristics of an image d…

    Submitted 30 November, 2022; v1 submitted 26 August, 2022; originally announced August 2022.

    Comments: Project Page at https://chrisjuniorli.github.io/project/VMFormer/

  17. arXiv:2206.04647  [pdf, other]

    eess.IV cs.CV cs.LG

    VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution

    Authors: Zeyuan Chen, Yinbo Chen, Jingwen Liu, Xingqian Xu, Vidit Goel, Zhangyang Wang, Humphrey Shi, Xiaolong Wang

    Abstract: Videos typically record the streaming and continuous visual data as discrete consecutive frames. Since the storage cost is expensive for videos of high fidelity, most of them are stored in a relatively low resolution and frame rate. Recent works of Space-Time Video Super-Resolution (STVSR) are developed to incorporate temporal interpolation and spatial super-resolution in a unified framework. Howe…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted to CVPR 2022. Project page: http://zeyuan-chen.com/VideoINR/

  18. arXiv:2205.12335  [pdf, other]

    cs.CL cs.LG

    K-12BERT: BERT for K-12 education

    Authors: Vasu Goel, Dhruv Sahnan, Venktesh V, Gaurav Sharma, Deep Dwivedi, Mukesh Mohania

    Abstract: Online education platforms are powered by various NLP pipelines, which utilize models like BERT to aid in content curation. Since the inception of the pre-trained language models like BERT, there have also been many efforts toward adapting these pre-trained models to specific domains. However, there has not been a model specifically adapted for the education domain (particularly K-12) across subje…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: 4 pages

  19. arXiv:2112.06267  [pdf, other]

    cs.SI

    DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks

    Authors: Dhruv Sahnan, Vasu Goel, Sarah Masud, Chhavi Jain, Vikram Goyal, Tanmoy Chakraborty

    Abstract: With an increasing outreach of digital platforms in our lives, researchers have taken a keen interest to study different facets of social interactions that seem to be evolving rapidly. Analysing the spread of information (aka diffusion) has brought forth multiple research areas such as modelling user engagement, determining emerging topics, forecasting virality of online posts and predicting infor…

    Submitted 21 August, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

    Comments: 33 pages, 12 figures, 11 tables

  20. arXiv:2106.10452  [pdf, other]

    cs.CV cs.LG

    MSN: Efficient Online Mask Selection Network for Video Instance Segmentation

    Authors: Vidit Goel, Jiachen Li, Shubhika Garg, Harsh Maheshwari, Humphrey Shi

    Abstract: In this work we present a novel solution for Video Instance Segmentation (VIS), that is automatically generating instance level segmentation masks along with object class and tracking them in a video. Our method improves the masks from segmentation and propagation branches in an online manner using the Mask Selection Network (MSN) hence limiting the noise accumulation during mask tracking. We propo…

    Submitted 19 June, 2021; originally announced June 2021.

    Comments: 3rd Place Solution to the YouTube-VIS Challenge at CVPR 2021

  21. arXiv:2008.01761  [pdf, other]

    cs.LG cs.CR stat.ML

    Can Adversarial Weight Perturbations Inject Neural Backdoors?

    Authors: Siddhant Garg, Adarsh Kumar, Vibhor Goel, Yingyu Liang

    Abstract: Adversarial machine learning has exposed several security hazards of neural models and has become an important research topic in recent times. Thus far, the concept of an "adversarial perturbation" has exclusively been used with reference to the input space referring to a small, imperceptible change which can cause a ML model to err. In this work we extend the idea of "adversarial perturbations" t…

    Submitted 21 September, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted as a conference paper at CIKM 2020

  22. arXiv:2007.04422  [pdf, other]

    cs.CV cs.CL

    IQ-VQA: Intelligent Visual Question Answering

    Authors: Vatsal Goel, Mohit Chandak, Ashish Anand, Prithwijit Guha

    Abstract: Even though there has been tremendous progress in the field of Visual Question Answering, models today still tend to be inconsistent and brittle. To this end, we propose a model-independent cyclic framework which increases consistency and robustness of any VQA architecture. We train our models to answer the original question, generate an implication based on the answer and then also learn to answe…

    Submitted 8 July, 2020; originally announced July 2020.

  23. arXiv:2004.14774  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV stat.ML

    IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report

    Authors: Qi She, Fan Feng, Qi Liu, Rosa H. M. Chan, Xinyue Hao, Chuanlin Lan, Qihan Yang, Vincenzo Lomonaco, German I. Parisi, Heechul Bae, Eoin Brophy, Baoquan Chen, Gabriele Graffieti, Vidit Goel, Hyonyoung Han, Sathursan Kanagarajah, Somesh Kumar, Siew-Kei Lam, Tin Lun Lam, Liang Ma, Davide Maltoni, Lorenzo Pellegrini, Duvindu Piyasena, Shiliang Pu, Debdoot Sheet , et al. (11 additional authors not shown)

    Abstract: This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, w…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: 9 pages, 11 figures, 3 tables, accepted into IEEE Robotics and Automation Magazine. arXiv admin note: text overlap with arXiv:1911.06487

  24. arXiv:1805.07780  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Video Object Segmentation for Deep Reinforcement Learning

    Authors: Vik Goel, Jameson Weng, Pascal Poupart

    Abstract: We present a new technique for deep reinforcement learning that automatically detects moving objects and uses the relevant information for action selection. The detection of moving objects is done in an unsupervised way by exploiting structure from motion. Instead of directly learning a policy from raw images, the agent first learns to detect and segment moving objects by exploiting flow informati…

    Submitted 20 May, 2018; originally announced May 2018.

  25. arXiv:1801.04638  [pdf, ps, other]

    math.GR cs.FL

    Pointlike sets for varieties determined by groups

    Authors: Samuel J. v. Gool, B. Steinberg

    Abstract: For a variety of finite groups $\mathbf H$, let $\overline{\mathbf H}$ denote the variety of finite semigroups all of whose subgroups lie in $\mathbf H$. We give a characterization of the subsets of a finite semigroup that are pointlike with respect to $\overline{\mathbf H}$. Our characterization is effective whenever $\mathbf H$ has a decidable membership problem. In particular, the separation pr…

    Submitted 14 January, 2018; originally announced January 2018.

    MSC Class: 20M07; 20M35

  26. arXiv:1710.06937  [pdf, ps, other]

    cs.CL

    Embedding-Based Speaker Adaptive Training of Deep Neural Networks

    Authors: Xiaodong Cui, Vaibhava Goel, George Saon

    Abstract: An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker, are mapped through a control network to layer-dependent element-wise affine transformations to canonicalize the internal feature representations at the output o…

    Submitted 17 October, 2017; originally announced October 2017.

  27. arXiv:1708.08118  [pdf, ps, other]

    math.GR cs.FL math.RA

    Merge decompositions, two-sided Krohn-Rhodes, and aperiodic pointlikes

    Authors: Samuel J. v. Gool, Benjamin Steinberg

    Abstract: This paper provides short proofs of two fundamental theorems of finite semigroup theory whose previous proofs were significantly longer, namely the two-sided Krohn-Rhodes decomposition theorem and Henckell's aperiodic pointlike theorem, using a new algebraic technique that we call the merge decomposition. A prototypical application of this technique decomposes a semigroup $T$ into a two-sided semi…

    Submitted 27 August, 2017; originally announced August 2017.

    Comments: 8 pages

    MSC Class: 20M07; 20M35; 68Q70 ACM Class: F.4.3

  28. arXiv:1702.08398  [pdf, other]

    cs.LG stat.ML

    McGan: Mean and Covariance Feature Matching GAN

    Authors: Youssef Mroueh, Tom Sercu, Vaibhava Goel

    Abstract: We introduce new families of Integral Probability Metrics (IPM) for training Generative Adversarial Networks (GAN). Our IPMs are based on matching statistics of distributions embedded in a finite dimensional feature space. Mean and covariance feature matching IPMs allow for stable training of GANs, which we will call McGan. McGan minimizes a meaningful loss between distributions.

    Submitted 8 June, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: 15 pages; published at ICML 2017

  29. ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation

    Authors: Helge Holzmann, Vinay Goel, Avishek Anand

    Abstract: Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller datasets. Besides efficient access we identify five other objectives based on practical researcher needs such as ease of use, extensibility and reusability. T…

    Submitted 3 February, 2017; originally announced February 2017.

    Comments: JCDL 2016, Newark, NJ, USA

  30. arXiv:1612.00563  [pdf, other]

    cs.LG cs.AI cs.CV

    Self-critical Sequence Training for Image Captioning

    Authors: Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel

    Abstract: Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, signifi…

    Submitted 15 November, 2017; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: CVPR 2017 + additional analysis + fixed baseline results, 16 pages

  31. arXiv:1611.09288  [pdf, other]

    cs.CL cs.LG cs.NE

    Dense Prediction on Sequences with Time-Dilated Convolutions for Speech Recognition

    Authors: Tom Sercu, Vaibhava Goel

    Abstract: In computer vision pixelwise dense prediction is the task of predicting a label for each pixel in the image. Convolutional neural networks achieve good performance on this task, while being computationally efficient. In this paper we carry these ideas over to the problem of assigning a sequence of labels to a set of speech frames, a task commonly known as framewise classification. We show that den…

    Submitted 14 December, 2016; v1 submitted 28 November, 2016; originally announced November 2016.

    Comments: Appeared at NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop

  32. arXiv:1610.07686  [pdf, ps, other]

    cs.LG

    Co-Occuring Directions Sketching for Approximate Matrix Multiply

    Authors: Youssef Mroueh, Etienne Marcheret, Vaibhava Goel

    Abstract: We introduce co-occurring directions sketching, a deterministic algorithm for approximate matrix product (AMM), in the streaming model. We show that co-occurring directions achieves a better error bound for AMM than other randomized and deterministic approaches for AMM. Co-occurring directions gives a $1 + ε$ -approximation of the optimal low rank approximation of a matrix product. Empirically our…

    Submitted 24 October, 2016; originally announced October 2016.

  33. arXiv:1609.07736  [pdf, ps, other]

    cs.FL math.GR math.LO math.RA

    Pro-aperiodic monoids via saturated models

    Authors: Samuel J. v. Gool, Benjamin Steinberg

    Abstract: We apply Stone duality and model theory to study the structure theory of free pro-aperiodic monoids. Stone duality implies that elements of the free pro-aperiodic monoid may be viewed as elementary equivalence classes of pseudofinite words. Model theory provides us with saturated words in each such class, i.e., words in which all possible factorizations are realized. We give several applications o…

    Submitted 28 August, 2017; v1 submitted 25 September, 2016; originally announced September 2016.

    Comments: Technical report, submitted

    MSC Class: 68Q45; 03D05; 20M35; 03C50 ACM Class: F.4.3

  34. arXiv:1604.01792  [pdf, other]

    cs.CL cs.LG cs.NE

    Advances in Very Deep Convolutional Neural Networks for LVCSR

    Authors: Tom Sercu, Vaibhava Goel

    Abstract: Very deep CNNs with small 3x3 kernels have recently been shown to achieve very strong performance as acoustic models in hybrid NN-HMM speech recognition systems. In this paper we investigate how to efficiently scale these models to larger datasets. Specifically, we address the design choice of pooling and padding along the time dimension which renders convolutional evaluation of sequences highly i…

    Submitted 24 June, 2016; v1 submitted 6 April, 2016; originally announced April 2016.

    Comments: Proc. Interspeech 2016

  35. arXiv:1511.06267  [pdf, other]

    cs.LG

    Asymmetrically Weighted CCA And Hierarchical Kernel Sentence Embedding For Image & Text Retrieval

    Authors: Youssef Mroueh, Etienne Marcheret, Vaibhava Goel

    Abstract: Joint modeling of language and vision has been drawing increasing interest. A multimodal data representation allowing for bidirectional retrieval of images by sentences and vice versa is a key aspect. In this paper we present three contributions in canonical correlation analysis (CCA) based multimodal retrieval. Firstly, we show that an asymmetric weighting of the canonical weights, while achievin…

    Submitted 5 December, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Under Review CVPR 2017

  36. arXiv:1506.03705  [pdf, other]

    cs.LG stat.ML

    Random Maxout Features

    Authors: Youssef Mroueh, Steven Rennie, Vaibhava Goel

    Abstract: In this paper, we propose and study random maxout features, which are constructed by first projecting the input data onto sets of randomly generated vectors with Gaussian elements, and then outputting the maximum projection value for each set. We show that the resulting random feature map, when used in conjunction with linear models, allows for the locally linear estimation of the function of inter…

    Submitted 12 June, 2015; v1 submitted 11 June, 2015; originally announced June 2015.

  37. arXiv:1501.05396  [pdf, other]

    cs.CL cs.LG

    Deep Multimodal Learning for Audio-Visual Speech Recognition

    Authors: Youssef Mroueh, Etienne Marcheret, Vaibhava Goel

    Abstract: In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their final hidden layers fused to obtain a joint feature space in which another deep network is built. While the audio network alone achieves a phone error…

    Submitted 22 January, 2015; originally announced January 2015.

    Comments: ICASSP 2015