Showing 1–50 of 149 results for author: Woo, S

Searching in archive cs.
  1. arXiv:2504.21559  [pdf, other]

    cs.CV cs.AI cs.CL

    Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models

    Authors: Sangmin Woo, Kang Zhou, Yun Zhou, Shuai Wang, Sheng Guan, Haibo Ding, Lin Lee Cheong

    Abstract: Large Vision Language Models (LVLMs) often suffer from object hallucination, which undermines their reliability. Surprisingly, we find that simple object-based visual prompting -- overlaying visual cues (e.g., bounding box, circle) on images -- can significantly mitigate such hallucination; however, different visual prompts (VPs) vary in effectiveness. To address this, we propose Black-Box Visual…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: NAACL 2025

  2. Fairness and Robustness in Machine Unlearning

    Authors: Khoa Tran, Simon S. Woo

    Abstract: Machine unlearning poses the challenge of "how to eliminate the influence of specific data from a pretrained model" in regard to privacy concerns. While prior research on approximated unlearning has demonstrated accuracy and efficiency in time complexity, we claim that it falls short of achieving exact unlearning, and we are the first to focus on fairness and robustness in machine unlearning alg…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 5 pages

  3. Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal

    Authors: Inzamamul Alam, Md Tanvir Islam, Simon S. Woo

    Abstract: As digital content becomes increasingly ubiquitous, the need for robust watermark removal techniques has grown due to the inadequacy of existing embedding techniques, which lack robustness. This paper introduces a novel Saliency-Aware Diffusion Reconstruction (SADRE) framework for watermark elimination on the web, combining adaptive noise injection, region-specific perturbations, and advanced diff…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted at The Web Conference 2025

    ACM Class: I.4.5; I.5.4

  4. Towards Safe Synthetic Image Generation On the Web: A Multimodal Robust NSFW Defense and Million Scale Dataset

    Authors: Muhammad Shahid Muneer, Simon S. Woo

    Abstract: In the past years, we have witnessed the remarkable success of Text-to-Image (T2I) models and their widespread use on the web. Extensive research in making T2I models produce hyper-realistic images has led to new concerns, such as generating Not-Safe-For-Work (NSFW) web content and polluting the web society. To help prevent misuse of T2I models and create a safer web environment for users features…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: Short Paper The Web Conference

  5. arXiv:2504.09199  [pdf, other]

    cs.CR

    Illusion Worlds: Deceptive UI Attacks in Social VR

    Authors: Junhee Lee, Hwanjo Heo, Seungwon Woo, Minseok Kim, Jongseop Kim, Jinwoo Kim

    Abstract: Social Virtual Reality (VR) platforms have surged in popularity, yet their security risks remain underexplored. This paper presents four novel UI attacks that covertly manipulate users into performing harmful actions through deceptive virtual content. Implemented on VRChat and validated in an IRB-approved study with 30 participants, these attacks demonstrate how deceptive elements can mislead user…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: To appear in the IEEE VR 2025 Workshop Poster Proceedings

  6. arXiv:2504.05806  [pdf, other]

    cs.AI

    Meta-Continual Learning of Neural Fields

    Authors: Seungyoon Woo, Junhyeog Yun, Gunhee Kim

    Abstract: Neural Fields (NF) have gained prominence as a versatile framework for complex data representation. This work unveils a new problem setting termed Meta-Continual Learning of Neural Fields (MCL-NF) and introduces a novel strategy that employs a modular architecture combined with optimization-based meta-learning. Focused on overcoming the limitations of existing methods for continual learning…

    Submitted 8 April, 2025; originally announced April 2025.

  7. arXiv:2503.21261  [pdf, other]

    cs.LG

    HOT: Hadamard-based Optimized Training

    Authors: Seonggon Kim, Juncheol Shin, Seung-taek Woo, Eunhyeok Park

    Abstract: It has become increasingly important to optimize backpropagation to reduce memory usage and computational overhead. Achieving this goal is highly challenging, as multiple objectives must be considered jointly while maintaining training quality. In this paper, we focus on matrix multiplication, which accounts for the largest portion of training costs, and analyze its backpropagation in detail to id…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Accepted in CVPR 2025

  8. arXiv:2503.16433  [pdf, other]

    cs.HC cs.CL cs.MA

    The Application of MATEC (Multi-AI Agent Team Care) Framework in Sepsis Care

    Authors: Andrew Cho, Jason M. Woo, Brian Shi, Aishwaryaa Udeshi, Jonathan S. H. Woo

    Abstract: Under-resourced or rural hospitals have limited access to medical specialists and healthcare professionals, which can negatively impact patient outcomes in sepsis. To address this gap, we developed the MATEC (Multi-AI Agent Team Care) framework, which integrates a team of specialized AI agents for sepsis care. The sepsis AI agent team includes five doctor agents, four health professional agents, a…

    Submitted 9 February, 2025; originally announced March 2025.

    Comments: 15 pages

  9. arXiv:2503.10003  [pdf, other]

    cs.AI cs.LG

    A New Benchmark for Few-Shot Class-Incremental Learning: Redefining the Upper Bound

    Authors: Shiwon Kim, Dongjun Hwang, Sungwon Woo, Rita Singh

    Abstract: Class-incremental learning (CIL) aims to continuously adapt to emerging classes while retaining knowledge of previously learned ones. Few-shot class-incremental learning (FSCIL) presents an even greater challenge which requires the model to learn incremental classes with only a limited number of samples. In conventional CIL, joint training is widely considered the upper bound, serving as both a be…

    Submitted 12 March, 2025; originally announced March 2025.

  10. arXiv:2503.01905  [pdf, other]

    cs.LG cs.AI

    PaCA: Partial Connection Adaptation for Efficient Fine-Tuning

    Authors: Sunghyeon Woo, Sol Namkung, Sunwoo Lee, Inho Jeong, Beomseok Kim, Dongsuk Jeon

    Abstract: Prior parameter-efficient fine-tuning (PEFT) algorithms reduce memory usage and computational costs of fine-tuning large neural network models by training only a few additional adapter parameters, rather than the entire model. However, the reduction in computational costs due to PEFT does not necessarily translate to a reduction in training time; although the computational costs of the adapter lay…

    Submitted 11 March, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

  11. arXiv:2503.00447  [pdf]

    cs.ET

    Single-Ferroelectric Memcapacitor-Based Time-Domain Content-Addressable Memory for Highly Precise Distance Function Computation

    Authors: Minjeong Ryu, Jae Seung Woo, Yeonwoo Kim, Woo Young Choi

    Abstract: Single ferroelectric memcapacitor-based time-domain (TD) content-addressable memory (CAM) is proposed and experimentally demonstrated for high reliability and density. The proposed TD CAM features the symmetric capacitance-voltage characteristics of a ferroelectric memcapacitor with a gated p-i-n diode structure. This CAM performs search operations based on the variable capacitance of cells. The p…

    Submitted 1 March, 2025; originally announced March 2025.

  12. arXiv:2502.16923  [pdf, other]

    cs.CL cs.AI

    A Systematic Survey of Automatic Prompt Optimization Techniques

    Authors: Kiran Ramnath, Kang Zhou, Sheng Guan, Soumya Smruti Mishra, Xuan Qi, Zhengyuan Shen, Shuai Wang, Sangmin Woo, Sullam Jeoung, Yawei Wang, Haozhu Wang, Han Ding, Yuzhe Lu, Zhichao Xu, Yun Zhou, Balasubramaniam Srinivasan, Qiaojing Yan, Yueyan Chen, Haibo Ding, Panpan Xu, Lin Lee Cheong

    Abstract: Since the advent of large language models (LLMs), prompt engineering has been a crucial step for eliciting desired responses for various Natural Language Processing (NLP) tasks. However, prompt engineering remains an impediment for end users due to rapid advances in models, tasks, and associated best practices. To mitigate this, Automatic Prompt Optimization (APO) techniques have recently emerged…

    Submitted 2 April, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: 8 main pages, 31 total pages, 1 figure

  13. arXiv:2412.12511  [pdf, other]

    cs.CV

    Invisible Watermarks: Attacks and Robustness

    Authors: Dongjun Hwang, Sungwon Woo, Tom Gao, Raymond Luo, Sunghwan Baek

    Abstract: As Generative AI continues to become more accessible, the case for robust detection of generated images in order to combat misinformation is stronger than ever. Invisible watermarking methods act as identifiers of generated content, embedding image- and latent-space messages that are robust to many forms of perturbations. The majority of current research investigates full-image attacks against ima…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: YouTube link for the presentation: https://www.youtube.com/watch?v=0vwFG1HSrUE

  14. arXiv:2411.15224  [pdf, other]

    cs.LG cs.AI

    Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation

    Authors: Seokil Ham, Hee-Seon Kim, Sangmin Woo, Changick Kim

    Abstract: Despite the growing interest in Mamba architecture as a potential replacement for Transformer architecture, parameter-efficient fine-tuning (PEFT) approaches for Mamba remain largely unexplored. In our study, we introduce two key insights-driven strategies for PEFT in Mamba architecture: (1) While state-space models (SSMs) have been regarded as the cornerstone of Mamba architecture, then expected…

    Submitted 24 March, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: accepted in CVPR 2025

  15. arXiv:2410.22735  [pdf, other]

    cs.LG

    MIXAD: Memory-Induced Explainable Time Series Anomaly Detection

    Authors: Minha Kim, Kishor Kumar Bhaumik, Amin Ahsan Ali, Simon S. Woo

    Abstract: For modern industrial applications, accurately detecting and diagnosing anomalies in multivariate time series data is essential. Despite such need, most state-of-the-art methods often prioritize detection performance over model interpretability. Addressing this gap, we introduce MIXAD (Memory-Induced Explainable Time Series Anomaly Detection), a model designed for interpretable anomaly detection.…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: ICPR 2024 (oral paper)

  16. arXiv:2410.19341  [pdf, other]

    cs.RO cs.CV

    Context-Based Visual-Language Place Recognition

    Authors: Soojin Woo, Seong-Woo Kim

    Abstract: In vision-based robot localization and SLAM, Visual Place Recognition (VPR) is essential. This paper addresses the problem of VPR, which involves accurately recognizing the location corresponding to a given query image. A popular approach to vision-based place recognition relies on low-level visual features. Despite significant progress in recent years, place recognition based on low-level visual…

    Submitted 25 October, 2024; originally announced October 2024.

  17. arXiv:2410.15589  [pdf]

    cs.LG

    SSMT: Few-Shot Traffic Forecasting with Single Source Meta-Transfer

    Authors: Kishor Kumar Bhaumik, Minha Kim, Fahim Faisal Niloy, Amin Ahsan Ali, Simon S. Woo

    Abstract: Traffic forecasting in Intelligent Transportation Systems (ITS) is vital for intelligent traffic prediction. Yet, ITS often relies on data from traffic sensors or vehicle devices, where certain cities might not have all those smart devices or enabling infrastructures. Also, recent studies have employed meta-learning to generalize spatial-temporal traffic networks, utilizing data from multiple citi…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: ICPR 2024

  18. arXiv:2410.09831  [pdf, other]

    cs.CV cs.AI cs.CE

    LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond

    Authors: Md Tanvir Islam, Inzamamul Alam, Simon S. Woo, Saeed Anwar, IK Hyun Lee, Khan Muhammad

    Abstract: Low-light image enhancement (LLIE) is essential for numerous computer vision tasks, including object detection, tracking, segmentation, and scene understanding. Despite substantial research on improving low-quality images captured in underexposed conditions, clear vision remains critical for autonomous vehicles, which often struggle with low-light scenarios, signifying the need for continuous rese…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: Accepted by the Asian Conference on Computer Vision (ACCV 2024)

  19. arXiv:2410.09529  [pdf, other]

    cs.CV cs.AI

    Preserving Old Memories in Vivid Detail: Human-Interactive Photo Restoration Framework

    Authors: Seung-Yeon Back, Geonho Son, Dahye Jeong, Eunil Park, Simon S. Woo

    Abstract: Photo restoration technology enables preserving visual memories in photographs. However, physical prints are vulnerable to various forms of deterioration, ranging from physical damage to loss of image quality, etc. While restoration by human experts can improve the quality of outcomes, it often comes at a high price in terms of cost and time for restoration. In this work, we present the AI-based p…

    Submitted 12 October, 2024; originally announced October 2024.

  20. arXiv:2410.04081  [pdf, other]

    cs.CV cs.AI eess.IV

    Epsilon-VAE: Denoising as Visual Decoding

    Authors: Long Zhao, Sanghyun Woo, Ziyu Wan, Yandong Li, Han Zhang, Boqing Gong, Hartwig Adam, Xuhui Jia, Ting Liu

    Abstract: In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space. For high-dimensional visual data, it reduces redundancy and emphasizes key features for high-quality generation. Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representatio…

    Submitted 24 February, 2025; v1 submitted 5 October, 2024; originally announced October 2024.

    Comments: Preprint. v2: added comparisons to SD-VAE and more visual results

  21. arXiv:2409.10027  [pdf, other]

    cs.RO cs.AI

    E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models

    Authors: Chan Kim, Keonwoo Kim, Mintaek Oh, Hanbi Baek, Jiyang Lee, Donghwi Jung, Soojin Woo, Younkyung Woo, John Tucker, Roya Firoozi, Seung-Woo Seo, Mac Schwager, Seong-Woo Kim

    Abstract: Large language models (LLMs) have shown significant potential in guiding embodied agents to execute language instructions across a range of tasks, including robotic manipulation and navigation. However, existing methods are primarily designed for static environments and do not leverage the agent's own experiences to refine its initial plans. Given that real-world environments are inherently stocha…

    Submitted 2 February, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: 19 pages, 28 figures. Project page: https://e2map.github.io. Accepted to ICRA 2025

  22. UGAD: Universal Generative AI Detector utilizing Frequency Fingerprints

    Authors: Inzamamul Alam, Muhammad Shahid Muneer, Simon S. Woo

    Abstract: In the wake of a fabricated explosion image at the Pentagon, an ability to discern real images from fake counterparts has never been more critical. Our study introduces a novel multi-modal approach to detect AI-generated images amidst the proliferation of new generation methods such as Diffusion models. Our method, UGAD, encompasses three key detection steps: First, we transform the RGB images int…

    Submitted 12 September, 2024; originally announced September 2024.

  23. arXiv:2409.05346  [pdf, other]

    cs.LG cs.AI

    GDFlow: Anomaly Detection with NCDE-based Normalizing Flow for Advanced Driver Assistance System

    Authors: Kangjun Lee, Minha Kim, Youngho Jun, Simon S. Woo

    Abstract: For electric vehicles, the Adaptive Cruise Control (ACC) in Advanced Driver Assistance Systems (ADAS) is designed to assist braking based on driving conditions, road inclines, predefined deceleration strengths, and user braking patterns. However, the driving data collected during the development of ADAS are generally limited and lack diversity. This deficiency leads to late or aggressive braking f…

    Submitted 9 September, 2024; originally announced September 2024.

  24. arXiv:2409.01201  [pdf, other]

    eess.AS cs.AI cs.SD

    EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

    Authors: Jaeyeon Kim, Minjeon Jeon, Jaeyoon Jung, Sang Hoon Woo, Jinjoo Lee

    Abstract: In this work, we aim to analyze and optimize the EnCLAP framework, a state-of-the-art model in automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with different dataset scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of generated captions, we develop EnCLAP++,…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted to DCASE2024 Workshop

  25. arXiv:2409.01160  [pdf, ps, other]

    eess.AS cs.AI cs.SD

    Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning

    Authors: Jaeyeon Kim, Jaeyoon Jung, Minjeong Jeon, Sang Hoon Woo, Jinjoo Lee

    Abstract: In this technical report, we describe our submission to DCASE2024 Challenge Task6 (Automated Audio Captioning) and Task8 (Language-based Audio Retrieval). We develop our approach building upon the EnCLAP audio captioning framework and optimizing it for Task6 of the challenge. Notably, we outline the changes in the underlying components and the incorporation of the reranking process. Additionally,…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: DCASE2024 Challenge Technical Report. Ranked 2nd in Task 6 Automated Audio Captioning

  26. arXiv:2408.17066  [pdf, other]

    cs.RO

    Non-verbal Interaction and Interface with a Quadruped Robot using Body and Hand Gestures: Design and User Experience Evaluation

    Authors: Soohyun Shin, Trevor Evetts, Hunter Saylor, Hyunji Kim, Soojin Woo, Wonhwha Rhee, Seong-Woo Kim

    Abstract: In recent years, quadruped robots have attracted significant attention due to their practical advantages in maneuverability, particularly when navigating rough terrain and climbing stairs. As these robots become more integrated into various industries, including construction and healthcare, researchers have increasingly focused on developing intuitive interaction methods such as speech and gesture…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 16 pages

  27. Blind-Match: Efficient Homomorphic Encryption-Based 1:N Matching for Privacy-Preserving Biometric Identification

    Authors: Hyunmin Choi, Jiwon Kim, Chiyoung Song, Simon S. Woo, Hyoungshick Kim

    Abstract: We present Blind-Match, a novel biometric identification system that leverages homomorphic encryption (HE) for efficient and privacy-preserving 1:N matching. Blind-Match introduces a HE-optimized cosine similarity computation method, where the key idea is to divide the feature vector into smaller parts for processing rather than computing the entire vector at once. By optimizing the number of thes…

    Submitted 13 October, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to CIKM 2024 (Applied Research Track)

  28. arXiv:2407.19102  [pdf, other]

    cs.CC

    The Computational Complexity of Factored Graphs

    Authors: Shreya Gupta, Boyang Huang, Russell Impagliazzo, Stanley Woo, Christopher Ye

    Abstract: While graphs and abstract data structures can be large and complex, practical instances are often regular or highly structured. If the instance has sufficient structure, we might hope to compress the object into a more succinct representation. An efficient algorithm (with respect to the compressed input size) could then lead to more efficient computations than algorithms taking the explicit, uncom…

    Submitted 28 November, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: To appear in ITCS 2025

  29. arXiv:2407.15554  [pdf, other]

    cs.CV

    Decomposition of Neural Discrete Representations for Large-Scale 3D Mapping

    Authors: Minseong Park, Suhan Woo, Euntai Kim

    Abstract: Learning efficient representations of local features is a key challenge in feature volume-based 3D neural mapping, especially in large-scale environments. In this paper, we introduce Decomposition-based Neural Mapping (DNMap), a storage-efficient large-scale 3D mapping method that employs a discrete representation based on a decomposition strategy. This decomposition strategy aims to efficiently c…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  30. arXiv:2407.11714  [pdf, other]

    cs.CV

    Improving Unsupervised Video Object Segmentation via Fake Flow Generation

    Authors: Suhwan Cho, Minhyeok Lee, Jungho Lee, Donghyeong Kim, Seunghoon Lee, Sungmin Woo, Sangyoun Lee

    Abstract: Unsupervised video object segmentation (VOS), also known as video salient object detection, aims to detect the most prominent object in a video at the pixel level. Recently, two-stream approaches that leverage both RGB images and optical flow maps have gained significant attention. However, the limited amount of training data remains a substantial challenge. In this study, we propose a novel data…

    Submitted 16 July, 2024; originally announced July 2024.

  31. arXiv:2407.10784  [pdf, other]

    cs.LG cs.AI stat.ML

    AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler

    Authors: Changhun Kim, Taewon Kim, Seungyeon Woo, June Yong Yang, Eunho Yang

    Abstract: In real-world scenarios, tabular data often suffer from distribution shifts that threaten the performance of machine learning models. Despite its prevalence and importance, handling distribution shifts in the tabular domain remains underexplored due to the inherent challenges within the tabular data itself. In this sense, test-time adaptation (TTA) offers a promising solution by adapting models to…

    Submitted 12 February, 2025; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: NeurIPS Workshop on Table Representation Learning (NeurIPSW-TRL), 2024

  32. arXiv:2407.10399  [pdf, other]

    cs.CV

    Exploring the Impact of Moire Pattern on Deepfake Detectors

    Authors: Razaib Tariq, Shahroz Tariq, Simon S. Woo

    Abstract: Deepfake detection is critical in mitigating the societal threats posed by manipulated videos. While various algorithms have been developed for this purpose, challenges arise when detectors operate externally, such as on smartphones, when users take a photo of deepfake images and upload on the Internet. One significant challenge in such scenarios is the presence of Moiré patterns, which degrade im…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 7 pages, 4 figures, 1 table, Accepted for publication in IEEE International Conference on Image Processing (ICIP 2024)

  33. arXiv:2407.10277  [pdf, other]

    cs.CV cs.AI cs.LG

    Disrupting Diffusion-based Inpainters with Semantic Digression

    Authors: Geonho Son, Juhun Lee, Simon S. Woo

    Abstract: The fabrication of visual misinformation on the web and social media has increased exponentially with the advent of foundational text-to-image diffusion models. Namely, Stable Diffusion inpainters allow the synthesis of maliciously inpainted images of personal and private figures, and copyrighted contents, also known as deepfakes. To combat such generations, a disruption framework, namely Photogua…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 16 pages, 13 figures, IJCAI 2024

  34. arXiv:2407.09303  [pdf, other]

    cs.CV

    ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

    Authors: Sungmin Woo, Wonjoon Lee, Woo Jin Kim, Dogyoon Lee, Sangyoun Lee

    Abstract: Self-supervised multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene. However, the presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training. In this paper, we propose a novel framework calle…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project Page: https://sungmin-woo.github.io/prodepth/

  35. arXiv:2407.01073  [pdf, other]

    cs.RO

    No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection

    Authors: Soojin Woo, Donghwi Jung, Seong-Woo Kim

    Abstract: In this paper, we propose an algorithm to generate a static point cloud map based on LiDAR point cloud data. Our proposed pipeline detects dynamic objects using 3D object detectors and projects points of dynamic objects onto the ground. Typically, point cloud data acquired in real-time serves as a snapshot of the surrounding areas containing both static objects and dynamic objects. The static obje…

    Submitted 1 July, 2024; originally announced July 2024.

  36. arXiv:2406.16860  [pdf, other]

    cs.CV

    Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

    Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Ziteng Wang, Rob Fergus, Yann LeCun, Saining Xie

    Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and…

    Submitted 4 December, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 (Oral). Website at https://cambrian-mllm.github.io

  37. arXiv:2405.18012  [pdf, other]

    cs.CV eess.IV

    Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

    Authors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim

    Abstract: Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor encoder to extract actor features and the two-pathways relation module to infer the interaction…

    Submitted 28 May, 2024; originally announced May 2024.

  38. arXiv:2405.17928  [pdf, other]

    cs.CV

    Relational Self-supervised Distillation with Compact Descriptors for Image Copy Detection

    Authors: Juntae Kim, Sungwon Woo, Jongho Nang

    Abstract: Image copy detection is the task of detecting edited copies of any image within a reference database. While previous approaches have shown remarkable progress, the large size of their networks and descriptors remains a disadvantage, complicating their practical application. In this paper, we propose a novel method that achieves competitive performance by using a lightweight network and compact des…

    Submitted 9 November, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: WACV 2025

  39. arXiv:2405.17825  [pdf, other]

    cs.CV cs.AI

    Diffusion Model Patching via Mixture-of-Prompts

    Authors: Seokil Ham, Sangmin Woo, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim

    Abstract: We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the addition of parameters but stems from…

    Submitted 11 December, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: AAAI 2025; Project: https://sangminwoo.github.io/DMP/

  40. arXiv:2405.17821  [pdf, other]

    cs.CV cs.AI

    RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models

    Authors: Sangmin Woo, Jaehyuk Jang, Donguk Kim, Yubin Choi, Changick Kim

    Abstract: Recent advancements in Large Vision Language Models (LVLMs) have revolutionized how machines understand and generate textual responses based on visual inputs, yet they often produce "hallucinatory" outputs that misinterpret visual information, posing challenges in reliability and trustworthiness. We propose RITUAL, a simple decoding method that reduces hallucinations by leveraging randomly transfo…

    Submitted 16 December, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project: https://sangminwoo.github.io/RITUAL/

  41. arXiv:2405.17820  [pdf, other]

    cs.CV cs.AI

    Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models

    Authors: Sangmin Woo, Donguk Kim, Jaehyuk Jang, Yubin Choi, Changick Kim

    Abstract: This study addresses the issue observed in Large Vision Language Models (LVLMs), where excessive attention on a few image tokens, referred to as blind tokens, leads to hallucinatory responses in tasks requiring fine-grained understanding of visual objects. We found that tokens receiving lower attention weights often hold essential information for identifying nuanced object details -- ranging from…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://sangminwoo.github.io/AvisC/

  42. arXiv:2405.01934  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Impact of Architectural Modifications on Deep Learning Adversarial Robustness

    Authors: Firuz Juraev, Mohammed Abuhamad, Simon S. Woo, George K Thiruvathukal, Tamer Abuhmed

    Abstract: Rapid advancements of deep learning are accelerating adoption in a wide variety of applications, including safety-critical applications such as self-driving vehicles, drones, robots, and surveillance systems. These advancements include applying variations of sophisticated techniques that improve the performance of models. However, such models are not immune to adversarial manipulations, which can…

    Submitted 3 May, 2024; originally announced May 2024.

  43. arXiv:2404.14617  [pdf, other]

    cs.AR

    TDRAM: Tag-enhanced DRAM for Efficient Caching

    Authors: Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael Miller, Taeksang Song, Thomas Vogelsang, Steven Woo, Jason Lowe-Power

    Abstract: As SRAM-based caches are hitting a scaling wall, manufacturers are integrating DRAM-based caches into system designs to continue increasing cache sizes. While DRAM caches can improve the performance of memory systems, existing DRAM cache designs suffer from high miss penalties, wasted data movement, and interference between misses and demand requests. In this paper, we propose TDRAM, a novel DRAM…

    Submitted 22 April, 2024; originally announced April 2024.

  44. arXiv:2403.20225  [pdf, other]

    cs.CV

    MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

    Authors: Sanghyun Woo, Kwanyong Park, Inkyu Shin, Myungchul Kim, In So Kweon

    Abstract: Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras. This task has practical applications in various fields, such as visual surveillance, crowd behavior analysis, and anomaly detection. However, due to the difficulty and cost of collecting and labeling data, existing datasets for this task are e…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  45. arXiv:2403.14113  [pdf, other]

    cs.CV

    Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition

    Authors: Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim

    Abstract: Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities in crowded panoramic scenes. PAR presents two major challenges: 1) recognizing the nuanced interactions among numerous individuals and 2) understanding multi-granular human activities. To address these, we propose Social Proximity-aw…

    Submitted 20 March, 2024; originally announced March 2024.

  46. arXiv:2403.11582  [pdf, other]

    cs.CV

    OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation

    Authors: Seungbeom Woo, Geonwoo Baek, Taehoon Kim, Jaemin Na, Joong-won Hwang, Wonjun Hwang

    Abstract: Multi-target domain adaptation (MTDA) for semantic segmentation poses a significant challenge, as it involves multiple target domains with varying distributions. The goal of MTDA is to minimize the domain discrepancies among a single source and multi-target domains, aiming to train a single model that excels across all target domains. Previous MTDA approaches typically employ multiple teacher arch…

    Submitted 18 March, 2024; originally announced March 2024.

  47. arXiv:2403.09176  [pdf, other]

    cs.CV

    Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

    Authors: Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim

    Abstract: Diffusion models have achieved remarkable success across a range of generative tasks. Recent efforts to enhance diffusion model architectures have reimagined them as a form of multi-task learning, where each task corresponds to a denoising task at a specific noise level. While these efforts have focused on parameter isolation and task routing, they fall short of capturing detailed inter-task relat…

    Submitted 10 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Project Page: https://byeongjun-park.github.io/Switch-DiT/

  48. arXiv:2403.04981  [pdf, other]

    cs.ET

    Paving the Way for Pass Disturb Free Vertical NAND Storage via A Dedicated and String-Compatible Pass Gate

    Authors: Zijian Zhao, Sola Woo, Khandker Akif Aabrar, Sharadindu Gopal Kirtania, Zhouhang Jiang, Shan Deng, Yi Xiao, Halid Mulaosmanovic, Stefan Duenkel, Dominik Kleimaier, Steven Soss, Sven Beyer, Rajiv Joshi, Scott Meninger, Mohamed Mohamed, Kijoon Kim, Jongho Woo, Suhwan Lim, Kwangsoo Kim, Wanki Kim, Daewon Ha, Vijaykrishnan Narayanan, Suman Datta, Shimeng Yu, Kai Ni

    Abstract: In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 29 pages, 7 figures

  49. arXiv:2402.18848  [pdf, other]

    cs.CV

    SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

    Authors: Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, Sanghyun Woo

    Abstract: We introduce a co-designed approach for human portrait relighting that combines a physics-guided architecture with a pre-training framework. Drawing on the Cook-Torrance reflectance model, we have meticulously configured the architecture design to precisely simulate light-surface interactions. Furthermore, to overcome the limitation of scarce high-quality lightstage data, we have developed a self-…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: CVPR2024. Live demos available at https://www.beeble.ai/

  50. arXiv:2402.18817  [pdf, other]

    cs.CV

    Gradient Alignment for Cross-Domain Face Anti-Spoofing

    Authors: Binh M. Le, Simon S. Woo

    Abstract: Recent advancements in domain generalization (DG) for face anti-spoofing (FAS) have garnered considerable attention. Traditional methods have focused on designing learning objectives and additional modules to isolate domain-specific features while retaining domain-invariant characteristics in their representations. However, such approaches often lack guarantees of consistent maintenance of domain-…

    Submitted 11 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024