Showing 1–41 of 41 results for author: Cao, A

Searching in archive cs.
  1. arXiv:2506.08013  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets

    Authors: Anh-Quan Cao, Ivan Lopes, Raoul de Charette

    Abstract: Multi-task learning for dense prediction is limited by the need for extensive annotation for every task, though recent works have explored training with partial task labels. Leveraging the generalization power of diffusion models, we extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple synthetic datasets, each labeled for only a subset of tasks. Our met…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Code is available at https://github.com/astra-vision/StableMTL

  2. arXiv:2504.14151  [pdf, other]

    cs.CV cs.AI cs.RO

    Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

    Authors: Sergio Arnaud, Paul McVay, Ada Martin, Arjun Majumdar, Krishna Murthy Jatavallabhula, Phillip Thomas, Ruslan Partsey, Daniel Dugas, Abha Gejji, Alexander Sax, Vincent-Pierre Berges, Mikael Henaff, Ayush Jain, Ang Cao, Ishita Prasad, Mrinal Kalakrishnan, Michael Rabbat, Nicolas Ballas, Mido Assran, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

    Abstract: We present LOCATE 3D, a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp." LOCATE 3D sets a new state-of-the-art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, LOCATE 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world depl…

    Submitted 18 April, 2025; originally announced April 2025.

    ACM Class: I.2.10; I.2.6; I.2.9; I.3.7; I.4.6; I.4.8

  3. arXiv:2502.20389  [pdf, ps, other]

    cs.CV

    From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation from 2D VLMs

    Authors: Ang Cao, Sergio Arnaud, Oleksandr Maksymets, Jianing Yang, Ayush Jain, Sriram Yenamandra, Ada Martin, Vincent-Pierre Berges, Paul McVay, Ruslan Partsey, Aravind Rajeswaran, Franziska Meier, Justin Johnson, Jeong Joon Park, Alexander Sax

    Abstract: 3D vision-language grounding faces a fundamental data bottleneck: while 2D models train on billions of images, 3D models have access to only thousands of labeled scenes--a six-order-of-magnitude gap that severely limits performance. We introduce LIFT-GS, a practical distillation technique that overcomes this limitation by using differentiable rendering to bridge 3D and 2D supervision. L…

    Submitted 9 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: Project page: https://liftgs.github.io

  4. arXiv:2501.13928  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

    Authors: Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli

    Abstract: Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally pairwise approach, processing images in pairs and necessitating costly global alignment procedures to reconstruct from multiple views. In this work, we propos…

    Submitted 19 March, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: CVPR 2025. Project website: https://fast3r-3d.github.io/

  5. arXiv:2501.10030  [pdf, ps, other]

    eess.SY cs.IT

    Informativity Conditions for Multiple Signals: Properties, Experimental Design, and Applications

    Authors: Ao Cao, Fuyong Wang

    Abstract: Recent studies highlight the importance of the persistently exciting condition in a single signal sequence for model identification and data-driven control methodologies. However, maintaining prolonged excitation in control signals introduces significant challenges, as continuous excitation can reduce the lifetime of mechanical devices. In this paper, we introduce three informativity conditions for vari…

    Submitted 17 January, 2025; originally announced January 2025.

  6. arXiv:2501.00569  [pdf, other]

    cs.CV cs.LG

    Probing Visual Language Priors in VLMs

    Authors: Tiange Luo, Ang Cao, Gunhee Lee, Justin Johnson, Honglak Lee

    Abstract: Despite recent advances in Vision-Language Models (VLMs), they may over-rely on visual language priors existing in their training data rather than true visual reasoning. To investigate this, we introduce ViLP, a benchmark featuring deliberately out-of-distribution images synthesized via image generation models and out-of-distribution Q&A pairs. Each question in ViLP is coupled with three potential…

    Submitted 11 April, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

    Comments: Project Page: https://vilp-team.github.io/

  7. arXiv:2412.18496  [pdf, other]

    cs.CL

    Generating event descriptions under syntactic and semantic constraints

    Authors: Angela Cao, Faye Holt, Jonas Chan, Stephanie Richter, Lelia Glass, Aaron Steven White

    Abstract: With the goal of supporting scalable lexical semantic annotation, analysis, and theorizing, we conduct a comprehensive evaluation of different methods for generating event descriptions under both syntactic constraints -- e.g. desired clause structure -- and semantic constraints -- e.g. desired verb sense. We compare three different methods -- (i) manual generation by experts; (ii) sampling from a…

    Submitted 24 December, 2024; originally announced December 2024.

  8. arXiv:2411.11927  [pdf, other]

    cs.CV

    FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training

    Authors: Anjia Cao, Xing Wei, Zhiheng Ma

    Abstract: Language-image pre-training faces significant challenges due to limited data in specific formats and the constrained capacities of text encoders. While prevailing methods attempt to address these issues through data augmentation and architecture modifications, they continue to struggle with processing long-form text inputs, and the inherent limitations of traditional CLIP text encoders lead to sub…

    Submitted 26 April, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  9. arXiv:2410.21872  [pdf]

    cs.CV cs.AI

    Advancing Efficient Brain Tumor Multi-Class Classification -- New Insights from the Vision Mamba Model in Transfer Learning

    Authors: Yinyi Lai, Anbo Cao, Yuan Gao, Jiaqi Shang, Zongyu Li, Jia Guo

    Abstract: Early and accurate diagnosis of brain tumors is crucial for improving patient survival rates. However, the detection and classification of brain tumors are challenging due to their diverse types and complex morphological characteristics. This study investigates the application of pre-trained models for brain tumor classification, with a particular focus on deploying the Mamba model. We fine-tuned…

    Submitted 5 November, 2024; v1 submitted 29 October, 2024; originally announced October 2024.

  10. arXiv:2410.14795  [pdf, other]

    cs.CL

    Cross-Document Event-Keyed Summarization

    Authors: William Walden, Pavlo Kuchmiichuk, Alexander Martin, Chihsheng Jin, Angela Cao, Claire Sun, Curisia Allen, Aaron Steven White

    Abstract: Event-keyed summarization (EKS) requires summarizing a specific event described in a document given the document text and an event representation extracted from it. In this work, we extend EKS to the cross-document setting (CDEKS), in which summaries must synthesize information from accounts of the same event as given by multiple sources. We introduce SEAMUS (Summaries of Events Across Multiple So…

    Submitted 15 December, 2024; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: ACL Rolling Review long paper (in submission)

  11. arXiv:2410.08211  [pdf, other]

    cs.CV cs.AI cs.CL

    LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts

    Authors: Anh-Quan Cao, Maximilian Jaritz, Matthieu Guillaumin, Raoul de Charette, Loris Bazzani

    Abstract: Large-scale vision-language pre-trained (VLP) models (e.g., CLIP) are renowned for their versatility, as they can be applied to diverse applications in a zero-shot setup. However, when these models are used in specific domains, their performance often falls short due to domain gaps or the under-representation of these domains in the training data. While fine-tuning VLP models on custom datasets wi…

    Submitted 10 October, 2024; originally announced October 2024.

  12. arXiv:2409.08307  [pdf, other]

    eess.IV cs.CV

    MedSegMamba: 3D CNN-Mamba Hybrid Architecture for Brain Segmentation

    Authors: Aaron Cao, Zongyu Li, Jordan Jomsky, Andrew F. Laine, Jia Guo

    Abstract: Widely used traditional pipelines for subcortical brain segmentation are often inefficient and slow, particularly when processing large datasets. Furthermore, deep learning models face challenges due to the high resolution of MRI images and the large number of anatomical classes involved. To address these limitations, we developed a 3D patch-based hybrid CNN-Mamba model that leverages Mamba's sele…

    Submitted 13 October, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: 14 pages, 8 figures

  13. arXiv:2408.16924  [pdf, other]

    cs.CV cs.ET

    Enhancing Autism Spectrum Disorder Early Detection with the Parent-Child Dyads Block-Play Protocol and an Attention-enhanced GCN-xLSTM Hybrid Deep Learning Framework

    Authors: Xiang Li, Lizhou Fan, Hanbo Wu, Kunping Chen, Xiaoxiao Yu, Chao Che, Zhifeng Cai, Xiuhong Niu, Aihua Cao, Xin Ma

    Abstract: Autism Spectrum Disorder (ASD) is a rapidly growing neurodevelopmental disorder. Performing a timely intervention is crucial for the growth of young children with ASD, but traditional clinical screening methods lack objectivity. This study introduces an innovative approach to early detection of ASD. The contributions are threefold. First, this work proposes a novel Parent-Child Dyads Block-Play (P…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 18 pages, 8 figures, and 4 tables

  14. arXiv:2407.02599  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Meta 3D Gen

    Authors: Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

    Abstract: We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously gener…

    Submitted 2 July, 2024; originally announced July 2024.

  15. arXiv:2405.16248  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

    Authors: Junlin Song, Yuzhuo Chen, Yuan Yao, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

    Abstract: Autism Spectrum Disorder is a condition characterized by atypical brain development leading to impairments in social skills, communication abilities, repetitive behaviors, and sensory processing. There have been many studies combining brain MRI images with machine learning algorithms to achieve objective diagnosis of autism, but the correlation between white matter and autism has not been fully u…

    Submitted 25 May, 2024; originally announced May 2024.

  16. arXiv:2405.09150  [pdf, other]

    cs.CV

    Curriculum Dataset Distillation

    Authors: Zhiheng Ma, Anjia Cao, Funing Yang, Xing Wei

    Abstract: Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. In this paper, we present a curriculum-based dataset distillation framework designed to harmonize scalability with efficiency. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incor…

    Submitted 15 May, 2024; originally announced May 2024.

  17. arXiv:2404.19760  [pdf, other]

    cs.CV cs.GR

    Lightplane: Highly-Scalable Components for Neural 3D Fields

    Authors: Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny

    Abstract: Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision. However, current designs for these 2D-3D mappings are memory-intensive, posing a significant bottleneck for existing methods and hindering new applications. In response, we propose a pair of highly scalable components for 3D neural fields: Lightplane Render and Splatter, w…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Project Page: https://lightplane.github.io/ ; Code: https://github.com/facebookresearch/lightplane

  18. arXiv:2404.10279  [pdf, other]

    cs.CV

    EucliDreamer: Fast and High-Quality Texturing for 3D Models with Depth-Conditioned Stable Diffusion

    Authors: Cindy Le, Congrui Hetang, Chendi Lin, Ang Cao, Yihui He

    Abstract: We present EucliDreamer, a simple and effective method to generate textures for 3D models given text prompts and meshes. The texture is parametrized as an implicit function on the 3D surface, which is optimized with the Score Distillation Sampling (SDS) process and differentiable rendering. To generate high-quality textures, we leverage a depth-conditioned Stable Diffusion model guided by the dept…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Short version of arXiv:2311.15573

  19. arXiv:2404.02928  [pdf, other]

    cs.CR cs.AI

    Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models

    Authors: Jiachen Ma, Yijiang Li, Zhiqing Xiao, Anda Cao, Jie Zhang, Chao Ye, Junbo Zhao

    Abstract: Text-to-image (T2I) models can be maliciously used to generate harmful content such as sexually explicit, unfaithful, and misleading or Not-Safe-for-Work (NSFW) images. Previous attacks largely depend on the availability of the diffusion model or involve a lengthy optimization process. In this work, we investigate a more practical and universal attack that does not require the presence of a target…

    Submitted 26 May, 2025; v1 submitted 2 April, 2024; originally announced April 2024.

    Journal ref: NAACL2025

  20. arXiv:2312.17142  [pdf, other]

    cs.CV cs.GR

    DreamGaussian4D: Generative 4D Gaussian Splatting

    Authors: Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu

    Abstract: 4D content generation has achieved remarkable progress recently. However, existing methods suffer from long optimization times, a lack of motion controllability, and a low quality of details. In this paper, we introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS). Our key insight is that combining explicit modeling of spatial transformations…

    Submitted 10 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Technical report. Project page: https://jiawei-ren.github.io/projects/dreamgaussian4d ; Code: https://github.com/jiawei-ren/dreamgaussian4d

  21. arXiv:2312.08267  [pdf, other]

    eess.IV cs.CV q-bio.QM

    TABSurfer: a Hybrid Deep Learning Architecture for Subcortical Segmentation

    Authors: Aaron Cao, Vishwanatha M. Rao, Kejia Liu, Xinru Liu, Andrew F. Laine, Jia Guo

    Abstract: Subcortical segmentation remains challenging despite its important applications in quantitative structural analysis of brain MRI scans. The most accurate method, manual segmentation, is highly labor intensive, so automated tools like FreeSurfer have been adopted to handle this task. However, these traditional pipelines are slow and inefficient for processing large datasets. In this study, we propo…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, 2 tables

  22. arXiv:2312.05279  [pdf]

    eess.IV cs.CV

    Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

    Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

    Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning t…

    Submitted 8 December, 2023; originally announced December 2023.

  23. arXiv:2312.02158  [pdf, other]

    cs.CV cs.AI

    PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

    Authors: Anh-Quan Cao, Angela Dai, Raoul de Charette

    Abstract: We propose the task of Panoptic Scene Completion (PSC) which extends the recently popular Semantic Scene Completion (SSC) task with instance-level information to produce a richer understanding of the 3D scene. Our PSC proposal utilizes a hybrid mask-based technique on the non-empty voxels from sparse multi-scale completions. Whereas the SSC literature overlooks uncertainty which is critical for ro…

    Submitted 25 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Oral - Best paper award candidate. Project page: https://astra-vision.github.io/PaSCo

  24. arXiv:2311.15573  [pdf, other]

    cs.CV cs.GR

    EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth

    Authors: Cindy Le, Congrui Hetang, Chendi Lin, Ang Cao, Yihui He

    Abstract: This paper presents a novel method to generate textures for 3D models given text prompts and 3D meshes. Additional depth information is taken into account to perform the Score Distillation Sampling (SDS) process with depth conditional Stable Diffusion. We ran our model over the open-source dataset Objaverse and conducted a user study to compare the results with those of various 3D texturing method…

    Submitted 13 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  25. arXiv:2310.01037  [pdf, other]

    physics.geo-ph cs.LG

    SeisT: A foundational deep learning model for earthquake monitoring tasks

    Authors: Sen Li, Xu Yang, Anye Cao, Changbin Wang, Yaoqi Liu, Yapeng Liu, Qiang Niu

    Abstract: Seismograms, the fundamental seismic records, have revolutionized earthquake research and monitoring. Recent advancements in deep learning have further enhanced seismic signal processing, leading to even more precise and effective earthquake monitoring capabilities. This paper introduces a foundational deep learning model, the Seismogram Transformer (SeisT), designed for a variety of earthquake mo…

    Submitted 26 December, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2024

  26. arXiv:2305.01151  [pdf, ps, other]

    cs.LG

    Early Classifying Multimodal Sequences

    Authors: Alexander Cao, Jean Utke, Diego Klabjan

    Abstract: Often pieces of information are received sequentially over time. When did one collect enough such pieces to classify? Trading wait time for decision certainty leads to early classification problems that have recently gained attention as a means of adapting classification to more dynamic environments. However, so far results have been limited to unimodal sequences. In this pilot study, we expand in…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 7 pages, 5 figures

  27. arXiv:2304.03463  [pdf, ps, other]

    cs.LG

    A Policy for Early Sequence Classification

    Authors: Alexander Cao, Jean Utke, Diego Klabjan

    Abstract: Sequences are often not received in their entirety at once, but instead, received incrementally over time, element by element. Early predictions yielding a higher benefit, one aims to classify a sequence as accurately as possible, as soon as possible, without having to wait for the last element. For this early sequence classification, we introduce our novel classifier-induced stopping. While previ…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: 12 pages, 6 figures

  28. arXiv:2303.11989  [pdf, other]

    cs.CV

    Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

    Authors: Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner

    Abstract: We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input. To this end, we leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses. In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model. The core idea of…

    Submitted 10 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023 (Oral). Video: https://youtu.be/fjRnFL91EZc ; Project page: https://lukashoel.github.io/text-to-room/ ; Code: https://github.com/lukasHoel/text2room

  29. arXiv:2301.09632  [pdf, other]

    cs.CV

    HexPlane: A Fast Representation for Dynamic Scenes

    Authors: Ang Cao, Justin Johnson

    Abstract: Modeling and re-rendering dynamic 3D scenes is a challenging task in 3D vision. Prior approaches build on NeRF and rely on implicit representations. This is slow since it requires many MLP evaluations, constraining real-world applications. We show that dynamic 3D scenes can be explicitly represented by six planes of learned features, leading to an elegant solution we call HexPlane. A HexPlane comp…

    Submitted 27 March, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

    Comments: CVPR 2023, Camera Ready. Project page: https://caoang327.github.io/HexPlane

  30. arXiv:2212.02501  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields

    Authors: Anh-Quan Cao, Raoul de Charette

    Abstract: 3D reconstruction from a single 2D image has been extensively covered in the literature but relies on depth supervision at training time, which limits its applicability. To relax the dependence on depth, we propose SceneRF, a self-supervised monocular scene reconstruction method using only posed image sequences for training. Fueled by the recent progress in neural radiance fields (NeRF) we optimize a ra…

    Submitted 24 August, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: ICCV 2023. Project page: https://astra-vision.github.io/SceneRF

  31. arXiv:2210.01784  [pdf, other]

    cs.CV

    COARSE3D: Class-Prototypes for Contrastive Learning in Weakly-Supervised 3D Point Cloud Segmentation

    Authors: Rong Li, Anh-Quan Cao, Raoul de Charette

    Abstract: Annotation of large-scale 3D data is notoriously cumbersome and costly. As an alternative, weakly-supervised learning alleviates such a need by reducing the annotation by several orders of magnitude. We propose COARSE3D, a novel architecture-agnostic contrastive learning strategy for 3D segmentation. Since contrastive learning requires rich and diverse examples as keys and anchors, we leverage a p…

    Submitted 7 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

  32. arXiv:2206.08355  [pdf, other]

    cs.CV

    FWD: Real-time Novel View Synthesis with Forward Warping and Depth

    Authors: Ang Cao, Chris Rockwell, Justin Johnson

    Abstract: Novel view synthesis (NVS) is a challenging task requiring systems to generate photorealistic images of scenes from new viewpoints, where both quality and speed are important for applications. Previous image-based rendering (IBR) methods are fast, but have poor quality when input views are sparse. Recent Neural Radiance Fields (NeRF) and generalizable variants give impressive results but are not r…

    Submitted 5 August, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: CVPR 2022. Project website https://caoang327.github.io/FWD/

  33. arXiv:2201.02923  [pdf, ps, other]

    cs.LG

    Open-Set Recognition of Breast Cancer Treatments

    Authors: Alexander Cao, Diego Klabjan, Yuan Luo

    Abstract: Open-set recognition generalizes a classification task by classifying test samples as one of the known classes from training or "unknown." As novel cancer drug cocktails with improved treatment are continually discovered, predicting cancer treatments can naturally be formulated in terms of an open-set recognition problem. Drawbacks, due to modeling unknown samples during training, arise from strai…

    Submitted 8 January, 2022; originally announced January 2022.

    Comments: 22 pages, 9 figures and 9 tables

  34. arXiv:2112.00726  [pdf, other]

    cs.CV cs.AI cs.RO

    MonoScene: Monocular 3D Semantic Scene Completion

    Authors: Anh-Quan Cao, Raoul de Charette

    Abstract: MonoScene proposes a 3D Semantic Scene Completion (SSC) framework, where the dense geometry and semantics of a scene are inferred from a single monocular RGB image. Different from the SSC literature, relying on 2.5 or 3D input, we solve the complex problem of 2D to 3D scene reconstruction while jointly inferring its semantics. Our framework relies on successive 2D and 3D UNets bridged by a novel 2…

    Submitted 29 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted at CVPR 2022. Project page: https://cv-rits.github.io/MonoScene/

  35. arXiv:2110.01269  [pdf, other]

    cs.CV

    PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds

    Authors: Anh-Quan Cao, Gilles Puy, Alexandre Boulch, Renaud Marlet

    Abstract: Rigid registration of point clouds with partial overlaps is a longstanding problem usually solved in two steps: (a) finding correspondences between the point clouds; (b) filtering these correspondences to keep only the most reliable ones to estimate the transformation. Recently, several deep nets have been proposed to solve these steps jointly. We built upon these works and propose PCAM: a neural…

    Submitted 4 October, 2021; originally announced October 2021.

    Comments: ICCV21

  36. arXiv:2106.13933  [pdf, other]

    cs.CV

    Inverting and Understanding Object Detectors

    Authors: Ang Cao, Justin Johnson

    Abstract: As a core problem in computer vision, the performance of object detection has improved drastically in the past few years. Despite their impressive performance, object detectors suffer from a lack of interpretability. Visualization techniques have been developed and widely applied to introspect the decisions made by other kinds of deep learning models; however, visualizing object detectors has been…

    Submitted 25 June, 2021; originally announced June 2021.

    Comments: Preprints

  37. arXiv:2006.02003  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Open-Set Recognition with Gaussian Mixture Variational Autoencoders

    Authors: Alexander Cao, Yuan Luo, Diego Klabjan

    Abstract: In inference, open-set classification is to either classify a sample into a known class from training or reject it as an unknown class. Existing deep open-set classifiers train explicit closed-set classifiers, in some cases disjointly utilizing reconstruction, which we find dilutes the latent representation's ability to distinguish unknown classes. In contrast, we train our model to cooperatively…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: 12 pages including 8 figures and 4 tables, plus 6 pages of supplementary material

  38. arXiv:2005.05389  [pdf]

    cs.DL

    Citations versus expert opinions: Citation analysis of Featured Reviews of the American Mathematical Society

    Authors: Lawrence Smolinsky, Daniel S. Sage, Aaron J. Lercher, Aaron Cao

    Abstract: Peer review and citation metrics are two means of gauging the value of scientific research, but the lack of publicly available peer review data makes the comparison of these methods difficult. Mathematics can serve as a useful laboratory for considering these questions because as an exact science, there is a narrow range of reasons for citations. In mathematics, virtually all published articles ar…

    Submitted 16 December, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

    Comments: 21 pages, 3 figures, 4 tables

  39. arXiv:1912.05590  [pdf, other]

    cs.NI cs.LG

    Peek Inside the Closed World: Evaluating Autoencoder-Based Detection of DDoS to Cloud

    Authors: Hang Guo, Xun Fan, Anh Cao, Geoff Outhred, John Heidemann

    Abstract: Machine-learning-based anomaly detection (ML-based AD) has been successful at detecting DDoS events in the lab. However, published evaluations of ML-based AD have used only limited data and provided minimal insight into why it works. To address limited evaluation against real-world data, we apply an autoencoder, an existing ML-AD model, to 57 DDoS attack events captured at 5 cloud IPs from a major clo…

    Submitted 20 June, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

  40. arXiv:1908.03237  [pdf, other]

    cs.CV

    Image-based marker tracking and registration for intraoperative 3D image-guided interventions using augmented reality

    Authors: Andong Cao, Ali Dhanaliwala, Jianbo Shi, Terence Gade, Brian Park

    Abstract: Augmented reality has the potential to improve operating room workflow by allowing physicians to "see" inside a patient through the projection of imaging directly onto the surgical field. For this to be useful, the acquired imaging must be quickly and accurately registered with the patient, and the registration must be maintained. Here we describe a method for projecting a CT scan with Microsoft Hololen…

    Submitted 8 August, 2019; originally announced August 2019.

  41. arXiv:1907.06143  [pdf, other]

    cs.LG cs.CV

    Neural Embedding for Physical Manipulations

    Authors: Lingzhi Zhang, Andong Cao, Rui Li, Jianbo Shi

    Abstract: In common real-world robotic operations, action and state spaces can be vast and sometimes unknown, and observations are often relatively sparse. How do we learn the full topology of action and state spaces when given only a few sparse observations? Inspired by the properties of grid cells in mammalian brains, we build a generative model that enforces a normalized pairwise distance constraint be…

    Submitted 13 July, 2019; originally announced July 2019.