
Showing 1–26 of 26 results for author: Liang, K J

  1. arXiv:2505.17015  [pdf, other]

    cs.CV cs.CL

    Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

    Authors: Runsen Xu, Weiyao Wang, Hao Tang, Xingyu Chen, Xiaodong Wang, Fu-Jen Chu, Dahua Lin, Matt Feiszli, Kevin J. Liang

    Abstract: Multi-modal large language models (MLLMs) have rapidly advanced in visual tasks, yet their spatial understanding remains limited to single images, leaving them ill-suited for robotics and other real-world applications that require multi-frame reasoning. In this paper, we propose a framework to equip MLLMs with robust multi-frame spatial understanding by integrating depth perception, visual corresp…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 24 pages. An MLLM, dataset, and benchmark for multi-frame spatial understanding. Project page: https://runsenxu.com/projects/Multi-SpatialMLLM

  2. arXiv:2503.19157  [pdf, other]

    cs.CV

    HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models

    Authors: Mingzhen Huang, Fu-Jen Chu, Bugra Tekin, Kevin J Liang, Haoyu Ma, Weiyao Wang, Xingyu Chen, Pierre Gleize, Hongfei Xue, Siwei Lyu, Kris Kitani, Matt Feiszli, Hao Tang

    Abstract: We introduce HOIGPT, a token-based generative method that unifies 3D hand-object interactions (HOI) perception and generation, offering the first comprehensive solution for captioning and generating high-quality 3D HOI sequences from a diverse range of conditional signals (e.g., text, objects, partial sequences). At its core, HOIGPT utilizes a large language model to predict the bidirectional transfo…

    Submitted 24 March, 2025; originally announced March 2025.

  3. arXiv:2501.13928  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

    Authors: Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli

    Abstract: Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally pairwise approach, processing images in pairs and necessitating costly global alignment procedures to reconstruct from multiple views. In this work, we propos…

    Submitted 19 March, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: CVPR 2025. Project website: https://fast3r-3d.github.io/

  4. arXiv:2401.08937  [pdf, other]

    cs.CV

    ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

    Authors: Weiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli

    Abstract: Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images. However, NeRF training requires accurate camera pose for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Recent works have attempted to relax this constraint, but they still often rely on decent initial poses which they can refine. Here we aim at remov…

    Submitted 16 January, 2024; originally announced January 2024.

  5. arXiv:2312.15086  [pdf, other]

    cs.LG cs.CV

    HyperMix: Out-of-Distribution Detection and Classification in Few-Shot Settings

    Authors: Nikhil Mehta, Kevin J Liang, Jing Huang, Fu-Jen Chu, Li Yin, Tal Hassner

    Abstract: Out-of-distribution (OOD) detection is an important topic for real-world machine learning systems, but settings with limited in-distribution samples have been underexplored. Such few-shot OOD settings are challenging, as models have scarce opportunities to learn the data distribution before being tasked with identifying OOD samples. Indeed, we demonstrate that recent state-of-the-art OOD methods f…

    Submitted 22 December, 2023; originally announced December 2023.

  6. arXiv:2312.01167  [pdf, other]

    cs.CV cs.LG stat.ML

    Meta-Learned Attribute Self-Interaction Network for Continual and Generalized Zero-Shot Learning

    Authors: Vinay K Verma, Nikhil Mehta, Kevin J Liang, Aakansha Mishra, Lawrence Carin

    Abstract: Zero-shot learning (ZSL) is a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed state of the art, but these generative models can be slow or computationally expensive to train. Also, these generative models as…

    Submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024. arXiv admin note: substantial text overlap with arXiv:2102.11856

  7. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 25 September, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Expanded manuscript (compared to arXiv v1 from Nov 2023 and CVPR 2024 paper from June 2024) for more comprehensive dataset and benchmark presentation, plus new results on v2 data release

  8. arXiv:2210.13605  [pdf, other]

    cs.CV

    GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction

    Authors: Samrudhdhi B Rangrej, Kevin J Liang, Tal Hassner, James J Clark

    Abstract: Many online action prediction models observe complete frames to locate and attend to informative subregions in the frames called glimpses and recognize an ongoing action based on global and local information. However, in applications with constrained resources, an agent may not be able to observe the complete frame, yet must still locate useful glimpses to predict an incomplete action based on loc…

    Submitted 18 April, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted to WACV 2023

  9. arXiv:2210.07423  [pdf, other]

    cs.CV

    Task Grouping for Multilingual Text Recognition

    Authors: Jing Huang, Kevin J Liang, Rama Kovvuri, Tal Hassner

    Abstract: Most existing OCR methods focus on alphanumeric characters due to the popularity of English and numbers, as well as their corresponding datasets. On extending the characters to more languages, recent methods have shown that training different scripts with different recognition heads can greatly improve the end-to-end recognition accuracy compared to combining characters from all languages in the s…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: ECCV 2022: Text in Everything (TIE) Workshop (Oral)

  10. arXiv:2204.05494  [pdf, other]

    cs.CV stat.ML

    Few-shot Learning with Noisy Labels

    Authors: Kevin J Liang, Samrudhdhi B. Rangrej, Vladan Petrovic, Tal Hassner

    Abstract: Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, no matter how small, can still include mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, but this problem surprisingly remains largely unexplored. To address mi…

    Submitted 31 July, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022

  11. arXiv:2203.13903  [pdf, other]

    cs.CV

    Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection

    Authors: Li Yin, Juan M Perez-Rua, Kevin J Liang

    Abstract: We study the challenging incremental few-shot object detection (iFSD) setting. Recently, hypernetwork-based approaches have been studied in the context of continuous and finetune-free iFSD with limited success. We take a closer look at important design choices of such methods, leading to several key improvements and resulting in a more accurate and flexible framework, which we call Sylph. In parti…

    Submitted 4 April, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  12. arXiv:2201.02302  [pdf, other]

    cs.CV

    Extending One-Stage Detection with Open-World Proposals

    Authors: Sachin Konan, Kevin J Liang, Li Yin

    Abstract: In many applications, such as autonomous driving, hand manipulation, or robot navigation, object detection methods must be able to detect objects unseen in the training set. Open World Detection (OWD) seeks to tackle this problem by generalizing detection performance to seen and unseen class categories. Recent works have seen success in the generation of class-agnostic proposals, which we call Open…

    Submitted 12 January, 2022; v1 submitted 6 January, 2022; originally announced January 2022.

  13. arXiv:2104.13417  [pdf, other]

    cs.CV cs.LG stat.ML

    Towards Fair Federated Learning with Zero-Shot Data Augmentation

    Authors: Weituo Hao, Mostafa El-Khamy, Jungwon Lee, Jianyi Zhang, Kevin J Liang, Changyou Chen, Lawrence Carin

    Abstract: Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data. Although it is recognized that statistical heterogeneity of the client local data yields slower global model convergence, it is less commonly recognized that it also yields a biased federated global model w…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted by IEEE CVPR Workshop on Fair, Data Efficient And Trusted Computer Vision

  14. arXiv:2103.15992  [pdf, other]

    cs.CV

    A Multiplexed Network for End-to-End, Multilingual OCR

    Authors: Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner

    Abstract: Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results. However, many existing methods focus primarily on Latin-alphabet languages, often even only case-insensitive English characters. In this paper, we propose an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification…

    Submitted 29 March, 2021; originally announced March 2021.

  15. arXiv:2103.13558  [pdf, other]

    cs.LG cs.AI cs.CV

    Efficient Feature Transformations for Discriminative and Generative Continual Learning

    Authors: Vinay Kumar Verma, Kevin J Liang, Nikhil Mehta, Piyush Rai, Lawrence Carin

    Abstract: As neural networks are increasingly being applied to real-world applications, mechanisms to address distributional shift and sequential task learning without forgetting are critical. Methods incorporating network expansion have shown promise by naturally adding model capacity for learning new tasks while simultaneously avoiding catastrophic forgetting. However, the growth in the number of addition…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted in CVPR 2021

  16. arXiv:2103.09916  [pdf, other]

    cs.LG

    Can Targeted Adversarial Examples Transfer When the Source and Target Models Have No Label Space Overlap?

    Authors: Nathan Inkawhich, Kevin J Liang, Jingyang Zhang, Huanrui Yang, Hai Li, Yiran Chen

    Abstract: We design blackbox transfer-based targeted adversarial attacks for an environment where the attacker's source model and the target blackbox model may have disjoint label spaces and training datasets. This scenario significantly differs from the "standard" blackbox setting, and warrants a unique approach to the attacking process. Our methodology begins with the construction of a class correspondenc…

    Submitted 17 March, 2021; originally announced March 2021.

  17. arXiv:2011.00593  [pdf, other]

    cs.CL stat.ML

    MixKD: Towards Efficient Distillation of Large-scale Language Models

    Authors: Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

    Abstract: Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results are attained at the price of bigger models, more power consumption, and slower inference, which hinder their applicability to low-resource (both memory and computation) platforms. Knowledge distillation (KD) has been demonstrated as an effective framework for compressing such…

    Submitted 17 March, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: ICLR 2021 Camera Ready

  18. Background Adaptive Faster R-CNN for Semi-Supervised Convolutional Object Detection of Threats in X-Ray Images

    Authors: John B. Sigman, Gregory P. Spell, Kevin J Liang, Lawrence Carin

    Abstract: Recently, progress has been made in the supervised training of Convolutional Object Detectors (e.g. Faster R-CNN) for threat recognition in carry-on luggage using X-ray images. This is part of the Transportation Security Administration's (TSA's) mission to protect air travelers in the United States. While more training data with threats may reliably improve performance for this class of deep algor…

    Submitted 2 October, 2020; originally announced October 2020.

    Journal ref: Proc. SPIE 11404, Anomaly Detection and Imaging with X-Rays (ADIX) V, 1140404 (26 May 2020)

  19. arXiv:2008.05687  [pdf, other]

    cs.LG stat.ML

    WAFFLe: Weight Anonymized Factorization for Federated Learning

    Authors: Weituo Hao, Nikhil Mehta, Kevin J Liang, Pengyu Cheng, Mostafa El-Khamy, Lawrence Carin

    Abstract: In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. In light of this need, federated learning has emerged as a popular training paradigm. However, many federated learning approaches trade transmitting data for communicating updated weight parameters for each local device. Therefore,…

    Submitted 13 August, 2020; originally announced August 2020.

  20. arXiv:2004.14861  [pdf, other]

    cs.CR cs.LG stat.ML

    Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability

    Authors: Nathan Inkawhich, Kevin J Liang, Binghui Wang, Matthew Inkawhich, Lawrence Carin, Yiran Chen

    Abstract: We consider the blackbox transfer-based targeted adversarial attack threat model in the realm of deep neural network (DNN) image classifiers. Rather than focusing on crossing decision boundaries at the output layer of the source model, our method perturbs representations throughout the extracted feature hierarchy to resemble other classes. We design a flexible attack framework that allows for mult…

    Submitted 29 April, 2020; originally announced April 2020.

  21. arXiv:2004.12519  [pdf, other]

    cs.LG stat.ML

    Transferable Perturbations of Deep Feature Distributions

    Authors: Nathan Inkawhich, Kevin J Liang, Lawrence Carin, Yiran Chen

    Abstract: Almost all current adversarial attacks of CNN classifiers rely on information derived from the output layer of the network. This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions. We achieve state-of-the-art targeted blackbox transfer-based attack results for undefended ImageNet models. Further, we place a priority…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: Published as a conference paper at ICLR 2020

  22. arXiv:2004.10098  [pdf, other]

    cs.LG stat.ML

    Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors

    Authors: Nikhil Mehta, Kevin J Liang, Vinay K Verma, Lawrence Carin

    Abstract: Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings, where data from previous tasks are unavailable. A number of methods, using various model expansion strategies, have been proposed recently as possible solutions. However, determining how much to expand the model is left to the practitioner, and often a constant schedule is chosen for simplicity,…

    Submitted 27 April, 2021; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021. Post-conference updates: fixed typo in equation (11) and updated references

  23. arXiv:2002.04672  [pdf, other]

    cs.CV

    Object Detection as a Positive-Unlabeled Problem

    Authors: Yuewei Yang, Kevin J Liang, Lawrence Carin

    Abstract: As with other deep learning methods, label quality is important for learning modern convolutional object detectors. However, the potentially large number and wide diversity of object instances that can be found in complex image scenes makes constituting complete annotations a challenging task; objects missing annotations can be observed in a variety of popular object detection datasets. These miss…

    Submitted 1 November, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: Published as a conference paper in the British Machine Vision Conference (BMVC) 2020

  24. arXiv:1912.06329  [pdf, other]

    cs.CV

    Toward Automatic Threat Recognition for Airport X-ray Baggage Screening with Deep Convolutional Object Detection

    Authors: Kevin J Liang, John B. Sigman, Gregory P. Spell, Dan Strellis, William Chang, Felix Liu, Tejas Mehta, Lawrence Carin

    Abstract: For the safety of the traveling public, the Transportation Security Administration (TSA) operates security checkpoints at airports in the United States, seeking to keep dangerous items off airplanes. At these checkpoints, the TSA employs a fleet of X-ray scanners, such as the Rapiscan 620DV, so Transportation Security Officers (TSOs) can inspect the contents of carry-on possessions. However, ident…

    Submitted 13 December, 2019; originally announced December 2019.

  25. arXiv:1910.04233  [pdf, other]

    stat.ML cs.LG cs.NE

    Kernel-Based Approaches for Sequence Modeling: Connections to Neural Methods

    Authors: Kevin J Liang, Guoyin Wang, Yitong Li, Ricardo Henao, Lawrence Carin

    Abstract: We investigate time-dependent data analysis from the perspective of recurrent kernel machines, from which models with hidden units and gated memory cells arise naturally. By considering dynamic gating of the memory cell, a model closely related to the long short-term memory (LSTM) recurrent neural network is derived. Extending this setup to $n$-gram filters, the convolutional neural network (CNN),…

    Submitted 9 October, 2019; originally announced October 2019.

  26. arXiv:1811.11083  [pdf, other]

    stat.ML cs.LG

    Generative Adversarial Network Training is a Continual Learning Problem

    Authors: Kevin J Liang, Chunyuan Li, Guoyin Wang, Lawrence Carin

    Abstract: Generative Adversarial Networks (GANs) have proven to be a powerful framework for learning to draw samples from complex distributions. However, GANs are also notoriously difficult to train, with mode collapse and oscillations a common problem. We hypothesize that this is at least in part due to the evolution of the generator distribution and the catastrophic forgetting tendency of neural networks,…

    Submitted 27 November, 2018; originally announced November 2018.