-
CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning
Authors:
Jeonghyo Song,
Kimin Yun,
DaeUng Jo,
Jinyoung Kim,
Youngjoon Yoo
Abstract:
Effective Out-of-Distribution (OOD) detection is critical for ensuring the reliability of semantic segmentation models, particularly in complex road environments where safety and accuracy are paramount. Despite recent advancements in large language models (LLMs), notably GPT-4, which have significantly enhanced multimodal reasoning through Chain-of-Thought (CoT) prompting, the application of CoT-based visual reasoning for OOD semantic segmentation remains largely unexplored. In this paper, through extensive analyses of road scene anomalies, we identify three challenging scenarios where current state-of-the-art OOD segmentation methods consistently struggle: (1) densely packed and overlapping objects, (2) distant scenes with small objects, and (3) large foreground-dominant objects. To address these challenges, we propose a novel CoT-based framework targeting OOD detection in road anomaly scenes. Our method leverages the extensive knowledge and reasoning capabilities of foundation models, such as GPT-4, to enhance OOD detection through improved image understanding and prompt-based reasoning aligned with observed problematic scene attributes. Extensive experiments show that our framework consistently outperforms state-of-the-art methods on both standard benchmarks and our newly defined challenging subset of the RoadAnomaly dataset, offering a robust and interpretable solution for OOD semantic segmentation in complex driving environments.
Submitted 5 July, 2025;
originally announced July 2025.
-
EdgeSRIE: A hybrid deep learning framework for real-time speckle reduction and image enhancement on portable ultrasound systems
Authors:
Hyunwoo Cho,
Jongsoo Lee,
Jinbum Kang,
Yangmo Yoo
Abstract:
Speckle patterns in ultrasound images often obscure anatomical details, leading to diagnostic uncertainty. Recently, various deep learning (DL)-based techniques have been introduced to effectively suppress speckle; however, their high computational costs pose challenges for low-resource devices, such as portable ultrasound systems. To address this issue, EdgeSRIE, a lightweight hybrid DL framework for real-time speckle reduction and image enhancement in portable ultrasound imaging, is introduced. The proposed framework consists of two main branches: an unsupervised despeckling branch, which is trained by minimizing a loss function between speckled images, and a deblurring branch, which restores blurred images to sharp images. For hardware implementation, the trained network is quantized to 8-bit integer precision and deployed on a low-resource system-on-chip (SoC) with limited power consumption. In the performance evaluation with phantom and in vivo analyses, EdgeSRIE achieved the highest contrast-to-noise ratio (CNR) and average gradient magnitude (AGM) compared with the other baselines (two rule-based methods and four DL-based methods). Furthermore, EdgeSRIE enabled real-time inference at over 60 frames per second while satisfying computational requirements (< 20K parameters) on actual portable ultrasound hardware. These results demonstrated the feasibility of EdgeSRIE for real-time, high-quality ultrasound imaging in resource-limited environments.
Submitted 5 July, 2025;
originally announced July 2025.
-
A User-Centric, Privacy-Preserving, and Verifiable Ecosystem for Personal Data Management and Utilization
Authors:
Osama Zafar,
Mina Namazi,
Yuqiao Xu,
Youngjin Yoo,
Erman Ayday
Abstract:
In the current paradigm of digital personalized services, the centralized management of personal data raises significant privacy concerns, security vulnerabilities, and diminished individual autonomy over sensitive information. Despite their efficiency, traditional centralized architectures frequently fail to satisfy rigorous privacy requirements and expose users to data breaches and unauthorized access risks. This pressing challenge calls for a fundamental paradigm shift in methodologies for collecting, storing, and utilizing personal data across diverse sectors, including education, healthcare, and finance.
This paper introduces a novel decentralized, privacy-preserving architecture that handles heterogeneous personal information, ranging from educational credentials to health records and financial data. Unlike traditional models, our system grants users complete data ownership and control, allowing them to selectively share information without compromising privacy. The architecture's foundation comprises advanced privacy-enhancing technologies, including secure enclaves and federated learning, enabling secure computation, verification, and data sharing. The system supports diverse functionalities, including local computation, model training, and privacy-preserving data sharing, while ensuring data credibility and robust user privacy.
Submitted 27 June, 2025;
originally announced June 2025.
-
Continual-MEGA: A Large-scale Benchmark for Generalizable Continual Anomaly Detection
Authors:
Geonu Lee,
Yujeong Oh,
Geonhui Jang,
Soyoung Lee,
Jeonghyo Song,
Sungmin Cha,
YoungJoon Yoo
Abstract:
In this paper, we introduce a new benchmark for continual learning in anomaly detection, aimed at better reflecting real-world deployment scenarios. Our benchmark, Continual-MEGA, includes a large and diverse dataset that significantly expands existing evaluation settings by combining carefully curated existing datasets with our newly proposed dataset, ContinualAD. In addition to standard continual learning with expanded quantity, we propose a novel scenario that measures zero-shot generalization to unseen classes, that is, classes not observed during continual adaptation. This scenario poses a new problem setting in which continual adaptation also enhances zero-shot performance. We also present a unified baseline algorithm that improves robustness in few-shot detection and maintains strong generalization. Through extensive evaluations, we report three key findings: (1) existing methods show substantial room for improvement, particularly in pixel-level defect localization; (2) our proposed method consistently outperforms prior approaches; and (3) the newly introduced ContinualAD dataset enhances the performance of strong anomaly detection models. We release the benchmark and code at https://github.com/Continual-Mega/Continual-Mega.
Submitted 1 June, 2025;
originally announced June 2025.
-
Def-DTS: Deductive Reasoning for Open-domain Dialogue Topic Segmentation
Authors:
Seungmin Lee,
Yongsang Yoo,
Minhwa Jung,
Min Song
Abstract:
Dialogue Topic Segmentation (DTS) aims to divide dialogues into coherent segments. DTS plays a crucial role in various NLP downstream tasks, but suffers from chronic problems: data shortage, labeling ambiguity, and incremental complexity of recently proposed solutions. Moreover, despite advances in Large Language Models (LLMs) and reasoning strategies, these have rarely been applied to DTS. This paper introduces Def-DTS: Deductive Reasoning for Open-domain Dialogue Topic Segmentation, which utilizes LLM-based multi-step deductive reasoning to enhance DTS performance and enable case studies using intermediate results. Our method employs a structured prompting approach for bidirectional context summarization, utterance intent classification, and deductive topic shift detection. For the intent classification process, we propose a generalizable intent list for domain-agnostic dialogue intent classification. Experiments in various dialogue settings demonstrate that Def-DTS consistently outperforms traditional and state-of-the-art approaches, with each subtask contributing to improved performance, particularly in reducing type 2 errors. We also explore the potential for autolabeling, emphasizing the importance of LLM reasoning techniques in DTS.
Submitted 27 May, 2025;
originally announced May 2025.
-
PatentMind: A Multi-Aspect Reasoning Graph for Patent Similarity Evaluation
Authors:
Yongmin Yoo,
Qiongkai Xu,
Longbing Cao
Abstract:
Patent similarity evaluation plays a critical role in intellectual property analysis. However, existing methods often overlook the intricate structure of patent documents, which integrate technical specifications, legal boundaries, and application contexts. We introduce PatentMind, a novel framework for patent similarity assessment based on a Multi-Aspect Reasoning Graph (MARG). PatentMind decomposes patents into three core dimensions (technical feature, application domain, and claim scope) to compute dimension-specific similarity scores. These scores are dynamically weighted through a four-stage reasoning process which integrates contextual signals to emulate expert-level judgment. To support evaluation, we construct PatentSimBench, a human-annotated benchmark comprising 500 patent pairs. Experimental results demonstrate that PatentMind achieves a strong correlation ($r=0.938$) with expert annotations, significantly outperforming embedding-based models and advanced prompt engineering methods. These results highlight the effectiveness of modular reasoning frameworks in overcoming key limitations of embedding-based methods for analyzing patent similarity.
Submitted 25 May, 2025;
originally announced May 2025.
-
PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims
Authors:
Yongmin Yoo,
Qiongkai Xu,
Longbing Cao
Abstract:
Natural language generation (NLG) metrics play a central role in evaluating generated texts, but are not well suited for the structural and legal characteristics of patent documents. Large language models (LLMs) offer strong potential in automating patent generation, yet research on evaluating LLM-generated patents remains limited, especially in evaluating the generation quality of patent claims, which are central to defining the scope of protection. Effective claim evaluation requires addressing legal validity, technical accuracy, and structural compliance. To address this gap, we introduce PatentScore, a multi-dimensional evaluation framework for assessing LLM-generated patent claims. PatentScore incorporates: (1) hierarchical decomposition for claim analysis; (2) domain-specific validation patterns based on legal and technical standards; and (3) scoring across structural, semantic, and legal dimensions. Unlike general-purpose NLG metrics, PatentScore reflects patent-specific constraints and document structures, enabling evaluation beyond surface similarity. We evaluate 400 GPT-4o-mini generated Claim 1s and report a Pearson correlation of $r = 0.819$ with expert annotations, outperforming existing NLG metrics. Furthermore, we conduct additional evaluations using open models such as Claude-3.5-Haiku and Gemini-1.5-flash, all of which show strong correlations with expert judgments, confirming the robustness and generalizability of our framework.
Submitted 25 May, 2025;
originally announced May 2025.
-
Prediction of Permissioned Blockchain Performance for Resource Scaling Configurations
Authors:
Seungwoo Jung,
Yeonho Yoo,
Gyeongsik Yang,
Chuck Yoo
Abstract:
Blockchain is increasingly offered as blockchain-as-a-service (BaaS) by cloud service providers. However, configuring BaaS appropriately for optimal performance and reliability relies on trial and error. A key challenge is that BaaS is often perceived as a "black box," leading to uncertainties in performance and resource provisioning. Previous studies attempted to address this challenge; however, the impacts of both vertical and horizontal scaling remain elusive. To this end, we present machine learning-based models to predict network reliability and throughput based on scaling configurations. In our evaluation, the models exhibit prediction errors of ~1.9%, which is highly accurate, and they can be applied in the real world.
Submitted 19 March, 2025;
originally announced March 2025.
-
Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
Authors:
In Cho,
Youngbeom Yoo,
Subin Jeon,
Seon Joo Kim
Abstract:
Constructing a compressed latent space through a variational autoencoder (VAE) is the key for efficient 3D diffusion models. This paper introduces COD-VAE, a VAE that encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing quality. COD-VAE introduces a two-stage autoencoder scheme to improve compression and decoding efficiency. First, our encoder block progressively compresses point clouds into compact latent vectors via intermediate point patches. Second, our triplane-based decoder reconstructs dense triplanes from latent vectors instead of directly decoding neural fields, significantly reducing computational overhead of neural fields decoding. Finally, we propose uncertainty-guided token pruning, which allocates resources adaptively by skipping computations in simpler regions and improves the decoder efficiency. Experimental results demonstrate that COD-VAE achieves 16x compression compared to the baseline while maintaining quality. This enables 20.8x speedup in generation, highlighting that a large number of latent vectors is not a prerequisite for high-quality reconstruction and generation.
Submitted 11 March, 2025;
originally announced March 2025.
-
A Non-contrast Head CT Foundation Model for Comprehensive Neuro-Trauma Triage
Authors:
Youngjin Yoo,
Bogdan Georgescu,
Yanbo Zhang,
Sasa Grbic,
Han Liu,
Gabriela D. Aldea,
Thomas J. Re,
Jyotipriya Das,
Poikavila Ullaskrishnan,
Eva Eibenberger,
Andrei Chekkoury,
Uttam K. Bodanapally,
Savvas Nicolaou,
Pina C. Sanelli,
Thomas J. Schroeppel,
Yvonne W. Lui,
Eli Gibson
Abstract:
Recent advancements in AI and medical imaging offer transformative potential in emergency head CT interpretation for reducing assessment times and improving accuracy in the face of increasing demand for such scans and a global shortage of radiologists. This study introduces a 3D foundation model for detecting diverse neuro-trauma findings with high accuracy and efficiency. Using large language models (LLMs) for automatic labeling, we generated comprehensive multi-label annotations for critical conditions. Our approach involved pretraining neural networks for hemorrhage subtype segmentation and brain anatomy parcellation, which were integrated into a pretrained comprehensive neuro-trauma detection network through multimodal fine-tuning. Performance evaluation against expert annotations and comparison with CT-CLIP demonstrated strong triage accuracy across major neuro-trauma findings, such as hemorrhage and midline shift, as well as less frequent critical conditions such as cerebral edema and arterial hyperdensity. The integration of neuro-specific features significantly enhanced diagnostic capabilities, achieving an average AUC of 0.861 for 16 neuro-trauma conditions. This work advances foundation models in medical imaging, serving as a benchmark for future AI-assisted neuro-trauma diagnostics in emergency radiology.
Submitted 28 February, 2025;
originally announced February 2025.
-
Fully Guided Neural Schrödinger bridge for Brain MR image synthesis
Authors:
Hanyeol Yang,
Sunggyu Kim,
Yongseon Yoo,
Jong-min Lee
Abstract:
Multi-modal brain MRI provides essential complementary information for clinical diagnosis. However, acquiring all modalities is often challenging due to time and cost constraints. To address this, various methods have been proposed to generate missing modalities from available ones. Traditional approaches can be broadly categorized into two main types: paired and unpaired methods. While paired methods offer superior performance, obtaining large-scale paired datasets is challenging in real-world scenarios. Conversely, unpaired methods facilitate large-scale data collection but struggle to preserve critical image features, such as tumors. In this paper, we propose Fully Guided Schrödinger Bridges (FGSB), a novel framework based on Neural Schrödinger Bridges, to overcome these limitations. FGSB achieves stable, high-quality generation of missing modalities using minimal paired data. Furthermore, when provided with ground truth or a segmentation network for specific regions, FGSB can generate missing modalities while preserving these critical areas with reduced data requirements. Our proposed model consists of two consecutive phases: (1) a generation phase, which fuses a generated image, a paired reference image, and Gaussian noise, employing iterative refinement to mitigate issues such as mode collapse and improve generation quality; and (2) a training phase, which learns the mapping from the generated image to the target modality. Experiments demonstrate that FGSB achieves comparable generation performance to methods trained on large datasets, while using data from only two subjects. Moreover, the utilization of lesion information with FGSB significantly enhances its ability to preserve crucial lesion features.
Submitted 23 January, 2025;
originally announced January 2025.
-
3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints
Authors:
Yoonkyu Yoo,
Donghwi Jung,
Seong-Woo Kim
Abstract:
In this paper, we propose a control algorithm based on reinforcement learning, employing independent rewards for each joint to control excavators in a 3D space. The aim of this research is to address the challenges associated with achieving precise control of excavators, which are extensively utilized in construction sites but prove challenging to control with precision due to their hydraulic structures. Traditional methods relied on operator expertise for precise excavator operation, occasionally resulting in safety accidents. Therefore, there have been endeavors to attain precise excavator control through equation-based control algorithms. However, these methods had the limitation of necessitating prior information related to physical values of the excavator, rendering them unsuitable for the diverse range of excavators used in the field. To overcome these limitations, we have explored reinforcement learning-based control methods that do not demand prior knowledge of specific equipment but instead utilize data to train models. Nevertheless, existing reinforcement learning-based methods overlooked cabin swing rotation and confined the bucket's workspace to a 2D plane. Control confined within such a limited area diminishes the applicability of the algorithm in construction sites. We address this issue by expanding the previous 2D plane workspace of the bucket operation into a 3D space, incorporating cabin swing rotation. By expanding the workspace into 3D, excavators can execute continuous operations without requiring human intervention. To accomplish this objective, distinct targets were established for each joint, facilitating the training of action values for each joint independently, regardless of the progress of other joint learning.
Submitted 28 June, 2024;
originally announced June 2024.
-
Advancing Cross-Domain Generalizability in Face Anti-Spoofing: Insights, Design, and Metrics
Authors:
Hyojin Kim,
Jiyoon Lee,
Yonghyun Jeong,
Haneol Jang,
YoungJoon Yoo
Abstract:
This paper presents a novel perspective for enhancing anti-spoofing performance in zero-shot data domain generalization. Unlike traditional image classification tasks, face anti-spoofing datasets display unique generalization characteristics, necessitating novel zero-shot data domain generalization. Going one step beyond previous frame-wise spoofing prediction, we introduce a nuanced metric calculation that aggregates frame-level probabilities into a video-wise prediction, to tackle the gap between the reported frame-wise accuracy and the instability seen in real-world use cases. This approach enables the quantification of bias and variance in model predictions, offering a more refined analysis of model generalization. Our investigation reveals that simply scaling up the backbone of models does not inherently improve the mentioned instability, leading us to propose an ensembled backbone method from a Bayesian perspective. The probabilistically ensembled backbone improves both model robustness, as measured by the proposed metric, and spoofing accuracy; it also leverages the advantages of measuring uncertainty, allowing for enhanced sampling during training that contributes to model generalization across new datasets. We evaluate the proposed method on the benchmark OMIC dataset as well as the public CelebA-Spoof and SiW-Mv2 datasets. Our final model outperforms existing state-of-the-art methods across the datasets, showcasing advancements in Bias, Variance, HTER, and AUC metrics.
Submitted 18 June, 2024;
originally announced June 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han,
et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Towards Label-Efficient Human Matting: A Simple Baseline for Weakly Semi-Supervised Trimap-Free Human Matting
Authors:
Beomyoung Kim,
Myeong Yeon Yi,
Joonsang Yu,
Young Joon Yoo,
Sung Ju Hwang
Abstract:
This paper presents a new practical training method for human matting, which demands delicate pixel-level human region identification and significantly laborious annotations. To reduce the annotation cost, most existing matting approaches rely on image synthesis to augment the dataset. However, the unnaturalness of synthesized training images brings in a new domain generalization challenge for natural images. To address this challenge, we introduce a new learning paradigm, weakly semi-supervised human matting (WSSHM), which leverages a small amount of expensive matte labels and a large amount of budget-friendly segmentation labels to save the annotation cost and resolve the domain generalization problem. To achieve the goal of WSSHM, we propose a simple and effective training method, named Matte Label Blending (MLB), that selectively guides only the beneficial knowledge of the segmentation and matte data to the matting model. Extensive experiments with our detailed analysis demonstrate that our method can substantially improve the robustness of the matting model using a few matte data and numerous segmentation data. Our training method is also easily applicable to real-time models, achieving competitive accuracy with breakneck inference speed (328 FPS on NVIDIA V100 GPU). The implementation code is available at https://github.com/clovaai/WSSHM.
Submitted 1 April, 2024;
originally announced April 2024.
-
Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter
Authors:
Seunghyeon Lim,
Youngjae Yoo,
Jun Ki Lee,
Byoung-Tak Zhang
Abstract:
In this paper, we propose a novel method for plane clustering specialized in cluttered scenes using an RGB-D camera and validate its effectiveness through robot grasping experiments. Unlike existing methods, which focus on large-scale indoor structures, our approach, Multi-Object RANSAC, emphasizes cluttered environments that contain a wide range of objects with different scales. It enhances plane segmentation by generating subplanes in a Deep Plane Clustering (DPC) module, which are then merged with the final planes by post-processing. DPC rearranges the point cloud by voting layers to make subplane clusters, and is trained in a self-supervised manner using pseudo-labels generated from RANSAC. Multi-Object RANSAC demonstrates superior plane instance segmentation performance over other recent RANSAC applications. We conducted an experiment on robot suction-based grasping, comparing our method with a vision-based grasping network and RANSAC applications. The results from this real-world scenario showed its remarkable performance surpassing the baseline methods, highlighting its potential for advanced scene understanding and manipulation.
Submitted 19 March, 2024;
originally announced March 2024.
-
Hyper-CL: Conditioning Sentence Representations with Hypernetworks
Authors:
Young Hyun Yoo,
Jii Cha,
Changhyeon Kim,
Taeuk Kim
Abstract:
While the introduction of contrastive learning frameworks in sentence representation learning has significantly contributed to advancements in the field, it still remains unclear whether state-of-the-art sentence embeddings can capture the fine-grained semantics of sentences, particularly when conditioned on specific perspectives. In this paper, we introduce Hyper-CL, an efficient methodology that integrates hypernetworks with contrastive learning to compute conditioned sentence representations. In our proposed approach, the hypernetwork is responsible for transforming pre-computed condition embeddings into corresponding projection layers. This enables the same sentence embeddings to be projected differently according to various conditions. Evaluation on two representative conditioning benchmarks, namely conditional semantic text similarity and knowledge graph completion, demonstrates that Hyper-CL is effective in flexibly conditioning sentence representations, showcasing its computational efficiency at the same time. We also provide a comprehensive analysis of the inner workings of our approach, leading to a better interpretation of its mechanisms.
Submitted 6 June, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Multimodal Anomaly Detection based on Deep Auto-Encoder for Object Slip Perception of Mobile Manipulation Robots
Authors:
Youngjae Yoo,
Chung-Yeon Lee,
Byoung-Tak Zhang
Abstract:
Object slip perception is essential for mobile manipulation robots to perform manipulation tasks reliably in the dynamic real-world. Traditional approaches to robot arms' slip perception use tactile or vision sensors. However, mobile robots still have to deal with noise in their sensor signals caused by the robot's movement in a changing environment. To solve this problem, we present an anomaly detection method that utilizes multisensory data based on a deep autoencoder model. The proposed framework integrates heterogeneous data streams collected from various robot sensors, including RGB and depth cameras, a microphone, and a force-torque sensor. The integrated data is used to train a deep autoencoder to construct latent representations of the multisensory data that indicate the normal status. Anomalies can then be identified by error scores measured by the difference between the trained encoder's latent values and the latent values of reconstructed input data. In order to evaluate the proposed framework, we conducted an experiment that mimics an object slip by a mobile service robot operating in a real-world environment with diverse household objects and different moving patterns. The experimental results verified that the proposed framework reliably detects anomalies in object slip situations despite various object types and robot behaviors, and visual and auditory noise in the environment.
Submitted 6 March, 2024;
originally announced March 2024.
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Authors:
Jonghyun Lee,
Hansam Cho,
Youngjoon Yoo,
Seoung Bum Kim,
Yonghyun Jeong
Abstract:
Addressing the limitations of text as a source of accurate layout representation in text-conditional diffusion models, many works incorporate additional signals to condition certain attributes within a generated image. Although successful, previous works do not account for the specific localization of said attributes extended into the three dimensional plane. In this context, we present a conditional diffusion model that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. Specifically, we first introduce depth disentanglement training to leverage the relative depth of objects as an estimator, allowing the model to identify the absolute positions of unseen objects through the use of synthetic image triplets. We also introduce soft guidance, a method for imposing global semantics onto targeted regions without the use of any additional localization cues. Our integrated framework, Compose and Conquer (CnC), unifies these techniques to localize multiple conditions in a disentangled manner. We demonstrate that our approach allows perception of objects at varying depths while offering a versatile framework for composing localized objects with different global semantics. Code: https://github.com/tomtom1103/compose-and-conquer/
Submitted 17 January, 2024;
originally announced January 2024.
-
Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding
Authors:
Sunoh Kim,
Jungchan Cho,
Joonsang Yu,
YoungJoon Yoo,
Jin Young Choi
Abstract:
In the weakly supervised temporal video grounding study, previous methods use predetermined single Gaussian proposals which lack the ability to express diverse events described by the sentence query. To enhance the expression ability of a proposal, we propose a Gaussian mixture proposal (GMP) that can depict arbitrary shapes by learning importance, centroid, and range of every Gaussian in the mixture. In learning GMP, each Gaussian is not trained in a feature space but is implemented over a temporal location. Thus the conventional feature-based learning for Gaussian mixture model is not valid for our case. In our special setting, to learn moderately coupled Gaussian mixture capturing diverse events, we newly propose a pull-push learning scheme using pulling and pushing losses, each of which plays an opposite role to the other. The effects of components in our scheme are verified in-depth with extensive ablation studies and the overall scheme achieves state-of-the-art performance. Our code is available at https://github.com/sunoh-kim/pps.
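To make the idea of a Gaussian mixture proposal over temporal locations concrete, here is a small sketch that builds a temporal mask as a weighted sum of Gaussians on normalized time. It only illustrates the shape of such a proposal; the parameter values, the peak normalization, and the function name are assumptions, and the learning of importance, centroid, and range with pulling and pushing losses is not shown.

```python
import math

def gaussian_mixture_proposal(num_steps, weights, centers, widths):
    """Temporal mask on normalized time [0, 1] as a weighted sum of Gaussians.

    weights, centers, widths stand in for the per-Gaussian importance, centroid,
    and range mentioned in the abstract; scaling the peak to 1 is an assumption
    made here for readability.
    """
    times = [i / (num_steps - 1) for i in range(num_steps)]
    mask = [
        sum(w * math.exp(-0.5 * ((t - c) / s) ** 2)
            for w, c, s in zip(weights, centers, widths))
        for t in times
    ]
    peak = max(mask)
    return [m / peak for m in mask]

# Two well-separated events: a strong one early and a weaker one late in the video.
mask = gaussian_mixture_proposal(11, weights=[1.0, 0.5], centers=[0.2, 0.8], widths=[0.05, 0.05])
```

A single Gaussian can only describe one contiguous event, whereas the mixture above yields two separate peaks, which is the kind of multi-event shape the abstract says single-Gaussian proposals cannot express.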
Submitted 26 December, 2023;
originally announced December 2023.
-
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation
Authors:
YoungJoon Yoo,
Jongwon Choi
Abstract:
This paper introduces a novel approach for topic modeling utilizing latent codebooks from Vector-Quantized Variational Auto-Encoder (VQ-VAE), discretely encapsulating the rich information of the pre-trained embeddings such as the pre-trained language model. From the novel interpretation of the latent codebooks and embeddings as conceptual bag-of-words, we propose a new generative topic model called Topic-VQ-VAE (TVQ-VAE) which inversely generates the original documents related to the respective latent codebook. The TVQ-VAE can visualize the topics with various generative distributions including the traditional BoW distribution and the autoregressive image generation. Our experimental results on document analysis and image generation demonstrate that TVQ-VAE effectively captures the topic context which reveals the underlying structures of the dataset and supports flexible forms of document generation. Official implementation of the proposed TVQ-VAE is available at https://github.com/clovaai/TVQ-VAE.
Submitted 21 January, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Deep Coherence Learning: An Unsupervised Deep Beamformer for High Quality Single Plane Wave Imaging in Medical Ultrasound
Authors:
Hyunwoo Cho,
Seongjun Park,
Jinbum Kang,
Yangmo Yoo
Abstract:
Plane wave imaging (PWI) in medical ultrasound is becoming an important reconstruction method with high frame rates and new clinical applications. Recently, single PWI based on deep learning (DL) has been studied to overcome lowered frame rates of traditional PWI with multiple PW transmissions. However, due to the lack of appropriate ground truth images, DL-based PWI still remains challenging for performance improvements. To address this issue, in this paper, we propose a new unsupervised learning approach, i.e., deep coherence learning (DCL)-based DL beamformer (DL-DCL), for high-quality single PWI. In DL-DCL, the DL network is trained to predict highly correlated signals with a unique loss function from a set of PW data, and the trained DL model encourages high-quality PWI from low-quality single PW data. In addition, the DL-DCL framework based on complex baseband signals enables a universal beamformer. To assess the performance of DL-DCL, simulation, phantom and in vivo studies were conducted with public datasets, and it was compared with traditional beamformers (i.e., DAS with 75-PWs and DMAS with 1-PW) and other DL-based methods (i.e., supervised learning approach with 1-PW and generative adversarial network (GAN) with 1-PW). From the experiments, the proposed DL-DCL showed comparable results with DMAS with 1-PW and DAS with 75-PWs in spatial resolution, and it outperformed all comparison methods in contrast resolution. These results demonstrated that the proposed unsupervised learning approach can address the inherent limitations of traditional PWIs based on DL, and it also showed great potential in clinical settings with minimal artifacts.
Submitted 18 November, 2023;
originally announced November 2023.
-
GeNAS: Neural Architecture Search with Better Generalization
Authors:
Joonhyun Jeong,
Joonsang Yu,
Geondo Park,
Dongyoon Han,
YoungJoon Yoo
Abstract:
Neural Architecture Search (NAS) aims to automatically excavate the optimal network architecture with superior test performance. Recent neural architecture search (NAS) approaches rely on validation loss or accuracy to find the superior network for the target data. In this paper, we investigate a new neural architecture search measure for excavating architectures with better generalization. We demonstrate that the flatness of the loss surface can be a promising proxy for predicting the generalization capability of neural network architectures. We evaluate our proposed method on various search spaces, showing similar or even better performance compared to the state-of-the-art NAS methods. Notably, the resultant architecture found by flatness measure generalizes robustly to various shifts in data distribution (e.g. ImageNet-V2,-A,-O), as well as various tasks such as object detection and semantic segmentation. Code is available at https://github.com/clovaai/GeNAS.
Submitted 18 May, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance
Authors:
Yongmin Yoo,
Cheonkam Jeong,
Sanguk Gim,
Junwon Lee,
Zachary Schimke,
Deaho Seo
Abstract:
Patent similarity analysis plays a crucial role in evaluating the risk of patent infringement. Nonetheless, this analysis is predominantly conducted manually by legal experts, often resulting in a time-consuming process. Recent advances in natural language processing technology offer a promising avenue for automating this process. However, methods for measuring similarity between patents still rely on experts manually classifying patents. Due to the recent development of artificial intelligence technology, a lot of research is being conducted focusing on the semantic similarity of patents using natural language processing technology. However, it is difficult to accurately analyze patent data, which are legal documents representing complex technologies, using existing natural language processing technologies. To address these limitations, we propose a hybrid methodology that measures the similarity between patents by considering their semantic similarity, their technical similarity, and their bibliographic information. Using natural language processing techniques, we measure semantic similarity based on patent text and calculate technical similarity through the degree of coexistence of International Patent Classification (IPC) codes. The similarity of bibliographic information of a patent is calculated using the special characteristics of the patent: citation information, inventor information, and assignee information. We propose a model that assigns reasonable weights to each similarity method considered. With the help of experts, we performed manual similarity evaluations on 420 pairs and evaluated the performance of our model based on this data. We have empirically shown that our method outperforms recent natural language processing techniques.
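As a rough illustration of how three similarity signals might be combined with weights, the sketch below uses a weighted sum. The abstract does not give the actual formula: the Jaccard overlap used here for IPC-code coexistence, the example weights, and all function names are assumptions, and the semantic and bibliographic scores are taken as precomputed inputs in [0, 1].

```python
def jaccard(a, b):
    """Overlap of two code sets; used here as a stand-in for 'degree of coexistence'."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hybrid_similarity(semantic, ipc_a, ipc_b, bibliographic, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of semantic, technical (IPC-based), and bibliographic similarity.

    The weights are hypothetical; the paper states only that its model assigns
    weights to each similarity method.
    """
    technical = jaccard(ipc_a, ipc_b)
    w_sem, w_tech, w_bib = weights
    return w_sem * semantic + w_tech * technical + w_bib * bibliographic

# Two patents sharing one of three distinct IPC codes.
score = hybrid_similarity(0.8, ["G06F", "G06N"], ["G06N", "H04L"], 0.5)
```

With these illustrative inputs, the technical term contributes 0.3 × 1/3, so a pair with high textual similarity but little classification overlap is scored lower than text alone would suggest.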
Submitted 30 November, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Multi label classification of Artificial Intelligence related patents using Modified D2SBERT and Sentence Attention mechanism
Authors:
Yongmin Yoo,
Tak-Sung Heo,
Dongjin Lim,
Deaho Seo
Abstract:
Patent classification is an essential task in patent information management and patent knowledge mining. It is very important to classify patents related to artificial intelligence, which is the biggest topic these days. However, artificial intelligence-related patents are very difficult to classify because they are a mixture of complex technologies and legal terms. Moreover, due to the unsatisfactory performance of current algorithms, classification is still mostly done manually, wasting a lot of time and money. Therefore, we present a method for classifying artificial intelligence-related patents published by the USPTO using natural language processing techniques and deep learning methodology. We use a deformed BERT and sentence attention to overcome the limitations of BERT. In our experiments, our method achieved the highest performance compared to other deep learning methods.
Submitted 3 March, 2023;
originally announced March 2023.
-
Compliant Suction Gripper with Seamless Deployment and Retraction for Robust Picking against Depth and Tilt Errors
Authors:
Yuna Yoo,
Jaemin Eom,
Min Jo Park,
Kyu-Jin Cho
Abstract:
Applying suction grippers in unstructured environments is a challenging task because of depth and tilt errors in vision systems, requiring additional costs in elaborate sensing and control. To reduce additional costs, suction grippers with compliant bodies or mechanisms have been proposed; however, their bulkiness and limited allowable error hinder their use in complex environments with large errors. Here, we propose a compact suction gripper that can pick objects over a wide range of distances and tilt angles without elaborate sensing and control. The spring-inserted gripper body deploys and conforms to distant and tilted objects until the suction cup completely seals with the object and retracts immediately after, while holding the object. This seamless deployment and retraction is enabled by connecting the gripper body and suction cup to the same vacuum source, which couples the vacuum picking and retraction of the gripper body. Experimental results validated that the proposed gripper can pick objects within 79 mm, which is 1.4 times the initial length, and can pick objects with tilt angles up to 60°. The feasibility of the gripper was verified by demonstrations, including picking objects of different heights from the same picking height and the bin picking of transparent objects.
Submitted 29 November, 2022;
originally announced November 2022.
-
5-Star Hotel Customer Satisfaction Analysis Using Hybrid Methodology
Authors:
Yongmin Yoo,
Yeongjoon Park,
Dongjin Lim,
Deaho Seo
Abstract:
With the rapid development of non-face-to-face services driven by the coronavirus, commerce through the Internet, such as sales and reservations, is increasing very rapidly. Consumers also post reviews, suggestions, or judgments about goods or services on the website. The review data directly used by consumers provides feedback and has a positive impact, such as creating business value. Therefore, analysing review data is very important from a marketing point of view. Our research suggests a new way to find factors for customer satisfaction through review data. We find these factors by combining a data mining technique, which is a big data analysis method, with a natural language processing technique. Unlike many past studies on customer satisfaction, the novelty of our research lies in its use of various techniques. Our analysis showed that the results of our experiments were very accurate.
Submitted 26 September, 2022;
originally announced September 2022.
-
Machine Learning-Enabled Cyber Attack Prediction and Mitigation for EV Charging Stations
Authors:
Mansi Girdhar,
Junho Hong,
Yongsik Yoo,
Tai-Jin Song
Abstract:
Safe and reliable electric vehicle charging stations (EVCSs) have become imperative in an intelligent transportation infrastructure. Over the years, there has been a rapid increase in the deployment of EVCSs to address the upsurging charging demands. However, advances in information and communication technologies (ICT) have rendered this cyber-physical system (CPS) vulnerable to suffering cyber threats, thereby destabilizing the charging ecosystem and even the entire electric grid infrastructure. This paper develops an advanced cybersecurity framework, where STRIDE threat modeling is used to identify potential vulnerabilities in an EVCS. Further, the weighted attack defense tree approach is employed to create multiple attack scenarios, followed by developing Hidden Markov Model (HMM) and Partially Observable Monte-Carlo Planning (POMCP) algorithms for modeling the security attacks. Also, potential mitigation strategies are suggested for the identified threats.
Submitted 22 August, 2022;
originally announced August 2022.
-
Meet Your Favorite Character: Open-domain Chatbot Mimicking Fictional Characters with only a Few Utterances
Authors:
Seungju Han,
Beomsu Kim,
Jin Yong Yoo,
Seokjun Seo,
Sangbum Kim,
Enkhbayar Erdenee,
Buru Chang
Abstract:
In this paper, we consider mimicking fictional characters as a promising direction for building engaging conversation models. To this end, we present a new practical task where only a few utterances of each fictional character are available to generate responses mimicking them. Furthermore, we propose a new method named Pseudo Dialog Prompting (PDP) that generates responses by leveraging the power of large-scale language models with prompts containing the target character's utterances. To better reflect the style of the character, PDP builds the prompts in the form of dialog that includes the character's utterances as dialog history. Since only utterances of the characters are available in the proposed task, PDP matches each utterance with an appropriate pseudo-context from a predefined set of context candidates using a retrieval model. Through human and automatic evaluation, we show that PDP generates responses that better reflect the style of fictional characters than baseline methods.
Submitted 22 April, 2022;
originally announced April 2022.
-
DAGAM: Data Augmentation with Generation And Modification
Authors:
Byeong-Cheol Jo,
Tak-Sung Heo,
Yeongjoon Park,
Yongmin Yoo,
Won Ik Cho,
Kyungsun Kim
Abstract:
Text classification is a representative downstream task of natural language processing, and has exhibited excellent performance since the advent of pre-trained language models based on Transformer architecture. However, in pre-trained language models, under-fitting often occurs due to the size of the model being very large compared to the amount of available training data. Along with significant importance of data collection in modern machine learning paradigm, studies have been actively conducted for natural language data augmentation. In light of this, we introduce three data augmentation schemes that help reduce underfitting problems of large-scale language models. Primarily we use a generation model for data augmentation, which is defined as Data Augmentation with Generation (DAG). Next, we augment data using text modification techniques such as corruption and word order change (Data Augmentation with Modification, DAM). Finally, we propose Data Augmentation with Generation And Modification (DAGAM), which combines DAG and DAM techniques for a boosted performance. We conduct data augmentation for six benchmark datasets of text classification task, and verify the usefulness of DAG, DAM, and DAGAM through BERT-based fine-tuning and evaluation, deriving better results compared to the performance with original datasets.
Submitted 6 April, 2022;
originally announced April 2022.
-
EResFD: Rediscovery of the Effectiveness of Standard Convolution for Lightweight Face Detection
Authors:
Joonhyun Jeong,
Beomyoung Kim,
Joonsang Yu,
Youngjoon Yoo
Abstract:
This paper analyzes the design choices of face detection architecture that improve efficiency of computation cost and accuracy. Specifically, we re-examine the effectiveness of the standard convolutional block as a lightweight backbone architecture for face detection. Unlike the current tendency of lightweight architecture design, which heavily utilizes depthwise separable convolution layers, we show that heavily channel-pruned standard convolution layers can achieve better accuracy and inference speed when using a similar parameter size. This observation is supported by the analyses concerning the characteristics of the target data domain, faces. Based on our observation, we propose to employ ResNet with a highly reduced channel, which surprisingly allows high efficiency compared to other mobile-friendly networks (e.g., MobileNetV1, V2, V3). From the extensive experiments, we show that the proposed backbone can replace that of the state-of-the-art face detector with a faster inference speed. Also, we further propose a new feature aggregation method to maximize the detection performance. Our proposed detector EResFD obtained 80.4% mAP on WIDER FACE Hard subset which only takes 37.7 ms for VGA image inference on CPU. Code is available at https://github.com/clovaai/EResFD.
Submitted 2 November, 2023; v1 submitted 3 April, 2022;
originally announced April 2022.
-
ShareTrace: Contact Tracing with the Actor Model
Authors:
Ryan Tatton,
Erman Ayday,
Youngjin Yoo,
Anisa Halimi
Abstract:
Proximity-based contact tracing relies on mobile-device interaction to estimate the spread of disease. ShareTrace is one such approach that improves the efficacy of tracking disease spread by considering direct and indirect forms of contact. In this work, we utilize the actor model to provide an efficient and scalable formulation of ShareTrace with asynchronous, concurrent message passing on a temporal contact network. We also introduce message reachability, an extension of temporal reachability that accounts for network topology and message-passing semantics. Our evaluation on both synthetic and real-world contact networks indicates that correct parameter values optimize for algorithmic accuracy and efficiency. In addition, we demonstrate that message reachability can accurately estimate the risk a user poses to their contacts.
Submitted 18 September, 2022; v1 submitted 23 March, 2022;
originally announced March 2022.
-
Learning Features with Parameter-Free Layers
Authors:
Dongyoon Han,
YoungJoon Yoo,
Beomyoung Kim,
Byeongho Heo
Abstract:
Trainable layers such as convolutional building blocks are the standard network design choices by learning parameters to capture the global context through successive spatial operations. When designing an efficient network, trainable layers such as the depthwise convolution are the source of efficiency in the number of parameters and FLOPs, but there was little improvement to the model speed in practice. This paper argues that simple built-in parameter-free operations can be a favorable alternative to the efficient trainable layers replacing spatial operations in a network architecture. We aim to break the stereotype of organizing the spatial operations of building blocks into trainable layers. Extensive experimental analyses based on layer-level studies with fully-trained models and neural architecture searches are provided to investigate whether parameter-free operations such as the max-pool are functional. The studies eventually give us a simple yet effective idea for redesigning network architectures, where the parameter-free operations are heavily used as the main building block without sacrificing the model accuracy as much. Experimental results on the ImageNet dataset demonstrate that the network architectures with parameter-free operations could enjoy the advantages of further efficiency in terms of model speed, the number of the parameters, and FLOPs. Code and ImageNet pretrained models are available at https://github.com/naver-ai/PfLayer.
Submitted 20 March, 2022; v1 submitted 6 February, 2022;
originally announced February 2022.
-
Self-supervised Learning from 100 Million Medical Images
Authors:
Florin C. Ghesu,
Bogdan Georgescu,
Awais Mansoor,
Youngjin Yoo,
Dominik Neumann,
Pragneshkumar Patel,
R. S. Vishwanath,
James M. Balter,
Yue Cao,
Sasa Grbic,
Dorin Comaniciu
Abstract:
Building accurate and robust artificial intelligence systems for medical image assessment requires not only the research and design of advanced deep learning models but also the creation of large and curated sets of annotated training examples. Constructing such datasets, however, is often very costly -- due to the complex nature of annotation tasks and the high level of expertise required for the interpretation of medical images (e.g., expert radiologists). To counter this limitation, we propose a method for self-supervised learning of rich image features based on contrastive learning and online feature clustering. For this purpose we leverage large training datasets of over 100,000,000 medical images of various modalities, including radiography, computed tomography (CT), magnetic resonance (MR) imaging and ultrasonography. We propose to use these features to guide model training in supervised and hybrid self-supervised/supervised regimes on various downstream tasks. We highlight a number of advantages of this strategy on challenging image assessment problems in radiography, CT and MR: 1) Significant increase in accuracy compared to the state-of-the-art (e.g., AUC boost of 3-7% for detection of abnormalities from chest radiography scans and hemorrhage detection on brain CT); 2) Acceleration of model convergence during training by up to 85% compared to using no pretraining (e.g., 83% when training a model for detection of brain metastases in MR scans); 3) Increase in robustness to various image augmentations, such as intensity variations, rotations or scaling reflective of data variation seen in the field.
Submitted 4 January, 2022;
originally announced January 2022.
-
Approximating TSP walks in subcubic graphs
Authors:
Michael C. Wigal,
Youngho Yoo,
Xingxing Yu
Abstract:
We prove that every simple 2-connected subcubic graph on $n$ vertices with $n_2$ vertices of degree 2 has a TSP walk of length at most $\frac{5n+n_2}{4}-1$, confirming a conjecture of Dvořák, Král', and Mohar. This bound is best possible; there are infinitely many subcubic and cubic graphs whose minimum TSP walks have lengths $\frac{5n+n_2}{4}-1$ and $\frac{5n}{4} - 2$ respectively. We characterize the extremal subcubic examples meeting this bound. We also give a quadratic-time combinatorial algorithm for finding such a TSP walk. In particular, we obtain a $\frac{5}{4}$-approximation algorithm for the graphic TSP on simple cubic graphs, improving on the previously best known approximation ratio of $\frac{9}{7}$.
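As a quick arithmetic check of the bound stated in this abstract, the following sketch evaluates $\frac{5n+n_2}{4}-1$ exactly. The function name `tsp_walk_upper_bound` is an illustrative assumption; the sketch only computes the bound and does not implement the authors' algorithm for finding the walk.

```python
from fractions import Fraction

def tsp_walk_upper_bound(n: int, n2: int) -> Fraction:
    """Upper bound (5n + n2)/4 - 1 on TSP walk length, as stated in the abstract.

    n  : number of vertices of a simple 2-connected subcubic graph
    n2 : number of those vertices that have degree 2
    """
    if not 0 <= n2 <= n:
        raise ValueError("n2 must lie between 0 and n")
    return Fraction(5 * n + n2, 4) - 1

# K4 is cubic (n2 = 0) and 2-connected; the bound evaluates to 4,
# which matches the length of a Hamiltonian cycle on 4 vertices.
print(tsp_walk_upper_bound(4, 0))
```

Using `Fraction` keeps the value exact when $5n+n_2$ is not divisible by 4.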
Submitted 12 December, 2021;
originally announced December 2021.
-
Artificial Intelligence Technology analysis using Artificial Intelligence patent through Deep Learning model and vector space model
Authors:
Yongmin Yoo,
Dongjin Lim,
Kyungsun Kim
Abstract:
Thanks to the rapid development of artificial intelligence technology in recent years, artificial intelligence is contributing to many parts of society. Its applications in education, environment, medical care, military, tourism, economy, politics, etc. are having a very large impact on society as a whole. For example, in the field of education, there is an artificial intelligence tutoring system that automatically assigns tutors based on a student's level. In the field of economics, there are quantitative investment methods that automatically analyze large amounts of data to find investment laws to create investment models or predict changes in financial markets. As such, artificial intelligence technology is being used in various fields. So, it is very important to know exactly which factors have an important influence on each field of artificial intelligence technology and how the fields are connected to one another. Therefore, it is necessary to analyze artificial intelligence technology in each field. In this paper, we analyze patent documents related to artificial intelligence technology. We propose a method for keyword analysis within factors using artificial intelligence patent data sets for artificial intelligence technology analysis. The model relies on feature engineering based on a deep learning model named KeyBERT and on a vector space model. A case study of collecting and analyzing artificial intelligence patent data was conducted to show how the proposed model can be applied to real-world problems.
Submitted 7 November, 2021;
originally announced November 2021.
-
Observations on K-image Expansion of Image-Mixing Augmentation for Classification
Authors:
Joonhyun Jeong,
Sungmin Cha,
Youngjoon Yoo,
Sangdoo Yun,
Taesup Moon,
Jongwon Choi
Abstract:
Image-mixing augmentations (e.g., Mixup and CutMix), which typically involve mixing two images, have become the de-facto training techniques for image classification. Despite their huge success in image classification, the number of images to be mixed has not been elucidated in the literature: only the naive K-image expansion has been shown to lead to performance degradation. This study derives a new K-image mixing augmentation based on the stick-breaking process under Dirichlet prior distribution. We demonstrate the superiority of our K-image expansion augmentation over conventional two-image mixing augmentation methods through extensive experiments and analyses: (1) more robust and generalized classifiers; (2) a more desirable loss landscape shape; (3) better adversarial robustness. Moreover, we show that our probabilistic model can measure the sample-wise uncertainty and boost the efficiency for network architecture search by achieving a 7-fold reduction in the search time. Code will be available at https://github.com/yjyoo3312/DCutMix-PyTorch.git.
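To make the idea of K-image mixing weights concrete, here is a minimal sketch of the standard stick-breaking construction of a symmetric Dirichlet sample, followed by a Mixup-style convex combination. It is not the authors' implementation: the function names, the flat-list image representation, and the choice of a pixel-wise (Mixup-style rather than CutMix-style) blend are all assumptions made for illustration.

```python
import random

def stick_breaking_weights(k, alpha=1.0, rng=random):
    """Draw k mixing weights from a symmetric Dirichlet(alpha) by stick-breaking."""
    weights, remaining = [], 1.0
    for i in range(k - 1):
        # Break off a Beta(alpha, alpha * (pieces still to come)) share of the stick.
        share = rng.betavariate(alpha, alpha * (k - 1 - i))
        weights.append(remaining * share)
        remaining *= 1.0 - share
    weights.append(remaining)  # the last piece takes whatever is left
    return weights

def mix_images(images, weights):
    """Mixup-style convex combination of equally sized flat images."""
    return [sum(w * img[p] for w, img in zip(weights, images))
            for p in range(len(images[0]))]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
images = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.5, 0.5]]  # toy 3-pixel images
weights = stick_breaking_weights(len(images), alpha=1.0, rng=rng)
mixed = mix_images(images, weights)
```

The weights are non-negative and sum to one by construction, so the same weights could also be applied to one-hot labels to obtain a soft target.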
Submitted 17 March, 2023; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement
Authors:
Beomyoung Kim,
Youngjoon Yoo,
Chaeeun Rhee,
Junmo Kim
Abstract:
Weakly-supervised instance segmentation (WSIS) has been considered a more challenging task than weakly-supervised semantic segmentation (WSSS). Compared to WSSS, WSIS requires instance-wise localization, which is difficult to extract from image-level labels. To tackle the problem, most WSIS approaches use off-the-shelf proposal techniques that require pre-training with instance or object level labels, deviating from the fundamental definition of the fully-image-level supervised setting. In this paper, we propose a novel approach including two innovative components. First, we propose a semantic knowledge transfer to obtain pseudo instance labels by transferring the knowledge of WSSS to WSIS while eliminating the need for the off-the-shelf proposals. Second, we propose a self-refinement method to refine the pseudo instance labels in a self-supervised scheme and to use the refined labels for training in an online manner. Here, we discover an erroneous phenomenon, semantic drift, which is caused by instances missing from the pseudo instance labels being categorized as the background class. This semantic drift causes confusion between background and instance during training and consequently degrades the segmentation performance. We term this the semantic drift problem and show that our proposed self-refinement method eliminates it. The extensive experiments on PASCAL VOC 2012 and MS COCO demonstrate the effectiveness of our approach, and we achieve considerable performance without off-the-shelf proposal techniques. The code is available at https://github.com/clovaai/BESTIE.
Submitted 29 March, 2022; v1 submitted 20 September, 2021;
originally announced September 2021.
-
Solar cell patent classification method based on keyword extraction and deep neural network
Authors:
Yongmin Yoo,
Dongjin Lim,
Tak-Sung Heo
Abstract:
With the growing impact of ESG on businesses, research related to renewable energy is receiving great attention. Solar cells are one such area, and accordingly the research value of solar cell patent analysis is very high. Patent documents have high research value. Being able to accurately analyze and classify patent documents can reveal several important technical relationships. It can also describe the business trends in that technology. And when it comes to investment, new industrial solutions will also be inspired and proposed to make important decisions. Therefore, we must carefully analyze patent documents and utilize the value of patents. To solve the solar cell patent classification problem, we propose a keyword extraction method and a deep neural network-based solar cell patent classification method. First, solar cell patents are analyzed for preprocessing. We then use the KeyBERT algorithm to extract keywords and key phrases from the patent abstract to construct a lexical dictionary. We then build a solar cell patent classification model based on the deep neural network. Finally, we use the deep neural network-based solar cell patent classification model to classify power patents; the training accuracy is greater than 95% and the validation accuracy is about 87.5%. The deep neural network method thus not only handles the classification of complex and difficult solar cell patents but also achieves a good classification effect.
Submitted 8 December, 2021; v1 submitted 17 September, 2021;
originally announced September 2021.
-
Towards Improving Adversarial Training of NLP Models
Authors:
Jin Yong Yoo,
Yanjun Qi
Abstract:
Adversarial training, a method for learning robust deep neural networks, constructs adversarial examples during training. However, recent methods for generating NLP adversarial examples involve combinatorial search and expensive sentence encoders for constraining the generated instances. As a result, it remains challenging to use vanilla adversarial training to improve NLP models' performance, and the benefits are mainly uninvestigated. This paper proposes a simple and improved vanilla adversarial training process for NLP models, which we name Attacking to Training (A2T). The core part of A2T is a new and cheaper word substitution attack optimized for vanilla adversarial training. We use A2T to train BERT and RoBERTa models on IMDB, Rotten Tomatoes, Yelp, and SNLI datasets. Our results empirically show that it is possible to train robust NLP models using a much cheaper adversary. We demonstrate that vanilla adversarial training with A2T can improve an NLP model's robustness to the attack it was originally trained with and also defend the model against other types of word substitution attacks. Furthermore, we show that A2T can improve NLP models' standard accuracy, cross-domain generalization, and interpretability. Code is available at https://github.com/QData/Textattack-A2T .
Submitted 11 September, 2021; v1 submitted 1 September, 2021;
originally announced September 2021.
-
NCIS: Neural Contextual Iterative Smoothing for Purifying Adversarial Perturbations
Authors:
Sungmin Cha,
Naeun Ko,
Youngjoon Yoo,
Taesup Moon
Abstract:
We propose a novel and effective purification-based adversarial defense method against pre-processor blind white- and black-box attacks. Our method is computationally efficient and trained only with self-supervised learning on general images, without requiring any adversarial training or retraining of the classification model. We first show empirically that the adversarial noise, defined as the residual between an original image and its adversarial example, has an almost zero-mean, symmetric distribution. Based on this observation, we propose a very simple iterative Gaussian Smoothing (GS) which can effectively smooth out adversarial noise and achieve substantially high robust accuracy. To further improve it, we propose Neural Contextual Iterative Smoothing (NCIS), which trains a blind-spot network (BSN) in a self-supervised manner to reconstruct the discriminative features of the original image that are also smoothed out by GS. From our extensive experiments on the large-scale ImageNet using four classification models, we show that our method achieves both competitive standard accuracy and state-of-the-art robust accuracy against most strong purifier-blind white- and black-box attacks. Also, we propose a new benchmark for evaluating a purification method based on commercial image classification APIs, such as AWS, Azure, Clarifai and Google. We generate adversarial examples by ensemble transfer-based black-box attack, which can induce complete misclassification of APIs, and demonstrate that our method can be used to increase adversarial robustness of APIs.
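The intuition behind iterative Gaussian smoothing can be sketched in a few lines: repeatedly blurring an image averages out a zero-mean perturbation. The code below is a generic, dependency-free illustration, not the paper's GS or NCIS implementation; the kernel size, the border clamping, the number of steps, and the toy checkerboard perturbation are all assumptions.

```python
import math

def gaussian_kernel(radius=1, sigma=1.0):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(k * row[min(max(x + i - radius, 0), width - 1)]
                 for i, k in enumerate(kernel))
             for x in range(width)]
            for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def iterative_gaussian_smoothing(image, steps=3, radius=1, sigma=1.0):
    """Apply a separable Gaussian blur `steps` times."""
    kernel = gaussian_kernel(radius, sigma)
    for _ in range(steps):
        image = blur_rows(image, kernel)                        # horizontal pass
        image = transpose(blur_rows(transpose(image), kernel))  # vertical pass
    return image

# Flat gray image plus a zero-mean checkerboard perturbation of amplitude 0.1.
noisy = [[0.5 + 0.1 * (-1) ** (x + y) for x in range(6)] for y in range(6)]
smoothed = iterative_gaussian_smoothing(noisy, steps=3)
residual = max(abs(v - 0.5) for row in smoothed for v in row)
```

On real images the same blur also removes discriminative detail, which is the loss the abstract says NCIS is designed to recover.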
Submitted 30 December, 2021; v1 submitted 22 June, 2021;
originally announced June 2021.
-
SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning
Authors:
Sungmin Cha,
Beomyoung Kim,
Youngjoon Yoo,
Taesup Moon
Abstract:
This paper introduces a solid state-of-the-art baseline for a class-incremental semantic segmentation (CISS) problem. While the recent CISS algorithms utilize variants of the knowledge distillation (KD) technique to tackle the problem, they failed to fully address the critical challenges in CISS that cause catastrophic forgetting: the semantic drift of the background class and the multi-label prediction issue. To better address these challenges, we propose a new method, dubbed SSUL-M (Semantic Segmentation with Unknown Label with Memory), by carefully combining techniques tailored for semantic segmentation. Specifically, we claim three main contributions: (1) defining unknown classes within the background class to help learn future classes (help plasticity), (2) freezing the backbone network and past classifiers with binary cross-entropy loss and pseudo-labeling to overcome catastrophic forgetting (help stability), and (3) utilizing tiny exemplar memory for the first time in CISS to improve both plasticity and stability. The extensively conducted experiments show the effectiveness of our method, achieving significantly better performance than the recent state-of-the-art baselines on the standard benchmark datasets. Furthermore, we justify our contributions with thorough ablation analyses and discuss different natures of the CISS problem compared to the traditional class-incremental learning targeting classification. The official code is available at https://github.com/clovaai/SSUL.
Submitted 19 November, 2021; v1 submitted 22 June, 2021;
originally announced June 2021.
-
Towards Automatic Actor-Critic Solutions to Continuous Control
Authors:
Jake Grigsby,
Jin Yong Yoo,
Yanjun Qi
Abstract:
Model-free off-policy actor-critic methods are an efficient solution to complex continuous control tasks. However, these algorithms rely on a number of design tricks and hyperparameters, making their application to new domains difficult and computationally expensive. This paper creates an evolutionary approach that automatically tunes these design decisions and eliminates the RL-specific hyperparameters from the Soft Actor-Critic algorithm. Our design is sample efficient and provides practical advantages over baseline approaches, including improved exploration, generalization over multiple control frequencies, and a robust ensemble of high-performance policies. Empirically, we show that our agent outperforms well-tuned hyperparameter settings in popular benchmarks from the DeepMind Control Suite. We then apply it to less common control tasks outside of simulated robotics to find high-performance solutions with minimal compute and research effort.
Submitted 23 October, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Medical Code Prediction from Discharge Summary: Document to Sequence BERT using Sequence Attention
Authors:
Tak-Sung Heo,
Yongmin Yoo,
Yeongjoon Park,
Byeong-Cheol Jo,
Kyungsun Kim
Abstract:
Clinical notes are unstructured text generated by clinicians during patient encounters. Clinical notes are usually accompanied by a set of metadata codes from the International Classification of Diseases (ICD). The ICD code is an important code used in various operations, including insurance, reimbursement, medical diagnosis, etc. Therefore, it is important to classify ICD codes quickly and accurately. However, annotating these codes is costly and time-consuming. So we propose a model based on bidirectional encoder representations from transformers (BERT) using the sequence attention method for automatic ICD code assignment. We evaluate our approach on the medical information mart for intensive care III (MIMIC-III) benchmark dataset. Our model achieves a macro-averaged F1 of 0.62898 and a micro-averaged F1 of 0.68555, outperforming the state-of-the-art model on the MIMIC-III dataset. The contribution of this study is a method of using BERT that can be applied to documents and a sequence attention method that can capture important sequence information appearing in documents.
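The abstract reports both macro- and micro-averaged F1, which can differ noticeably in multi-label settings such as code assignment. The sketch below shows how the two averages are conventionally computed from per-label counts. The label names `"A"` and `"B"` are placeholders rather than real ICD codes, and the function names are illustrative assumptions; this is not the authors' evaluation script.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(gold, pred):
    """gold and pred are lists of label sets, one set per document."""
    labels = sorted(set().union(*gold, *pred))
    counts = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        counts[label] = (tp, fp, fn)
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts over all labels first, so frequent labels dominate.
    micro = f1(*(sum(c[i] for c in counts.values()) for i in range(3)))
    return macro, micro

gold = [{"A", "B"}, {"A"}, {"B"}, {"A"}]   # placeholder labels, not real ICD codes
pred = [{"A"}, {"A", "B"}, {"B"}, {"A"}]
macro, micro = macro_micro_f1(gold, pred)
```

In this toy case label `"A"` is predicted perfectly while `"B"` has one miss and one false alarm, so the macro average (0.75) is lower than the micro average (0.8).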
Submitted 10 November, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
A novel hybrid methodology of measuring sentence similarity
Authors:
Yongmin Yoo,
Tak-Sung Heo,
Yeongjoon Park,
Kyungsun Kim
Abstract:
The problem of measuring sentence similarity is an essential issue in the natural language processing (NLP) area. It is necessary to measure the similarity between sentences accurately. There are many approaches to measuring sentence similarity. Deep learning methodology shows state-of-the-art performance in many natural language processing fields and is widely used in sentence similarity measurement methods. However, in the natural language processing field, considering the structure of the sentence or the word structure that makes up the sentence is also important. In this study, we propose a methodology that combines deep learning with a method considering lexical relationships. Our evaluation metrics are the Pearson correlation coefficient and the Spearman correlation coefficient. As a result, the proposed method outperforms the current approaches on KorSTS, a standard Korean benchmark dataset. Moreover, it achieves up to a 65% increase over using deep learning methodology alone. Experiments show that our proposed method generally results in better performance than those with only a deep learning model.
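For readers unfamiliar with the two evaluation metrics named in this abstract, the following sketch computes both from scratch. Pearson measures linear agreement, while Spearman is Pearson applied to ranks and so only measures monotonic agreement. The score lists are made-up values, not KorSTS data, and the helper names are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

gold = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical human similarity scores
pred = [0.0, 0.01, 0.04, 0.2, 0.5, 1.0]   # hypothetical model scores
r_pearson = pearson(gold, pred)
r_spearman = spearman(gold, pred)
```

Because the hypothetical predictions preserve the gold ordering but are not linear in it, the Spearman value is essentially 1 while the Pearson value is lower.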
Submitted 20 June, 2021; v1 submitted 3 May, 2021;
originally announced May 2021.
-
Rainbow Memory: Continual Learning with a Memory of Diverse Samples
Authors:
Jihwan Bang,
Heesu Kim,
YoungJoon Yoo,
Jung-Woo Ha,
Jonghyun Choi
Abstract:
Continual learning is a realistic learning scenario for AI models. The prevalent scenario of continual learning, however, assumes disjoint sets of classes as tasks, which is less realistic and rather artificial. Instead, we focus on a 'blurry' task boundary, where tasks share classes, which is more realistic and practical. To address such a task, we argue for the importance of diversity of samples in an episodic memory. To enhance the sample diversity in the memory, we propose a novel memory management strategy based on per-sample classification uncertainty and data augmentation, named Rainbow Memory (RM). With extensive empirical validations on MNIST, CIFAR10, CIFAR100, and ImageNet datasets, we show that the proposed method significantly improves the accuracy in blurry continual learning setups, outperforming the state of the art by large margins despite its simplicity. Code and data splits will be available at https://github.com/clovaai/rainbow-memory.
Submitted 31 March, 2021;
originally announced March 2021.
-
More than just an auxiliary loss: Anti-spoofing Backbone Training via Adversarial Pseudo-depth Generation
Authors:
Chang Keun Paik,
Naeun Ko,
Youngjoon Yoo
Abstract:
In this paper, a new training pipeline is discussed to achieve significant performance on the task of anti-spoofing with RGB images. We explore and highlight the impact of using pseudo-depth to pre-train a network that will be used as the backbone of the final classifier. While the usage of pseudo-depth for the anti-spoofing task is not a new idea on its own, previous endeavours utilize pseudo-depth simply as another medium to extract features for performing prediction, or as one of many auxiliary losses aiding the training of the main classifier, normalizing the importance of pseudo-depth as just another piece of semantic information. Through this work, we argue that a significant advantage in training the final classifier can be gained from a pre-trained generator that learns, within a Generative Adversarial Network framework, to predict the corresponding pseudo-depth of a given facial image. Our experimental results indicate that our method results in a much more adaptable system that can generalize beyond intra-dataset samples to inter-dataset samples, which it has never seen during training. Quantitatively, our method approaches the baseline performance of the current state-of-the-art anti-spoofing models with 15.8x fewer parameters. Moreover, experiments showed that the introduced methodology performs well using only a basic binary label without additional semantic information, which indicates potential benefits of this work in industrial and application-based environments where the trade-off between additional labelling and resources is considered.
Submitted 19 March, 2021; v1 submitted 1 January, 2021;
originally announced January 2021.
-
TextAttack: Lessons learned in designing Python frameworks for NLP
Authors:
John X. Morris,
Jin Yong Yoo,
Yanjun Qi
Abstract:
TextAttack is an open-source Python toolkit for adversarial attacks, adversarial training, and data augmentation in NLP. TextAttack unites 15+ papers from the NLP adversarial attack literature into a single framework, with many components reused across attacks. This framework allows both researchers and developers to test and study the weaknesses of their NLP models. Building such an open-source NLP toolkit requires solving some common problems: How do we enable users to supply models from different deep learning frameworks? How can we build tools to support as many different datasets as possible? We share our insights into developing a well-written, well-documented NLP Python framework in the hope that they can aid future development of similar packages.
Submitted 4 October, 2020;
originally announced October 2020.
-
Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples
Authors:
Jin Yong Yoo,
John X. Morris,
Eli Lifland,
Yanjun Qi
Abstract:
We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget. When new search algorithms are proposed in past work, the attack search space is often modified alongside the search algorithm. Without ablation studies benchmarking the search algorithm change with the search space held constant, one cannot tell if an increase in attack success rate is a result of an improved search algorithm or a less restrictive search space. Additionally, many previous studies fail to properly consider the search algorithms' run-time cost, which is essential for downstream tasks like adversarial training. Our experiments provide a reproducible benchmark of search algorithms across a variety of search spaces and query budgets to guide future research in adversarial NLP. Based on our experiments, we recommend greedy attacks with word importance ranking when under a time constraint or attacking long inputs, and either beam search or particle swarm optimization otherwise. Code implementation shared via https://github.com/QData/TextAttack-Search-Benchmark
Submitted 12 October, 2020; v1 submitted 9 September, 2020;
originally announced September 2020.
-
Quantifying and Leveraging Predictive Uncertainty for Medical Image Assessment
Authors:
Florin C. Ghesu,
Bogdan Georgescu,
Awais Mansoor,
Youngjin Yoo,
Eli Gibson,
R. S. Vishwanath,
Abishek Balachandran,
James M. Balter,
Yue Cao,
Ramandeep Singh,
Subba R. Digumarthy,
Mannudeep K. Kalra,
Sasa Grbic,
Dorin Comaniciu
Abstract:
The interpretation of medical images is a challenging task, often complicated by the presence of artifacts, occlusions, limited contrast and more. Most notable is the case of chest radiography, where there is a high inter-rater variability in the detection and classification of abnormalities. This is largely due to inconclusive evidence in the data or subjective definitions of disease appearance. An additional example is the classification of anatomical views based on 2D Ultrasound images. Often, the anatomical context captured in a frame is not sufficient to recognize the underlying anatomy. Current machine learning solutions for these problems are typically limited to providing probabilistic predictions, relying on the capacity of underlying models to adapt to limited information and the high degree of label noise. In practice, however, this leads to overconfident systems with poor generalization on unseen data. To account for this, we propose a system that learns not only the probabilistic estimate for classification, but also an explicit uncertainty measure which captures the confidence of the system in the predicted output. We argue that this approach is essential to account for the inherent ambiguity characteristic of medical images from different radiologic exams including computed radiography, ultrasonography and magnetic resonance imaging. In our experiments we demonstrate that sample rejection based on the predicted uncertainty can significantly improve the ROC-AUC for various tasks, e.g., by 8% to 0.91 with an expected rejection rate of under 25% for the classification of different abnormalities in chest radiographs. In addition, we show that using uncertainty-driven bootstrapping to filter the training data, one can achieve a significant increase in robustness and accuracy.
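The effect of uncertainty-based sample rejection on ROC-AUC, as described in this abstract, can be illustrated with a small toy example. The data below are invented for illustration, and the function names and the simple "drop the most uncertain fraction" rule are assumptions; this is a sketch of the general idea and not the authors' system.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def reject_most_uncertain(scores, labels, uncertainties, rejection_rate):
    """Drop the `rejection_rate` fraction of samples with the highest uncertainty."""
    keep = len(scores) - int(rejection_rate * len(scores))
    order = sorted(range(len(scores)), key=lambda i: uncertainties[i])[:keep]
    return [scores[i] for i in order], [labels[i] for i in order]

# Toy data (made up): the last two samples are misranked and highly uncertain.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.6, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
uncertainties = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.8]

auc_all = roc_auc(scores, labels)
kept_scores, kept_labels = reject_most_uncertain(scores, labels, uncertainties, 0.25)
auc_kept = roc_auc(kept_scores, kept_labels)
```

In this toy case the AUC rises from 0.9375 on all samples to 1.0 after rejecting the two most uncertain ones. The improvement depends entirely on the uncertainty estimate being high on the samples the classifier gets wrong.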
Submitted 8 July, 2020;
originally announced July 2020.