-
Explainable AI-Based Interface System for Weather Forecasting Model
Authors:
Soyeon Kim,
Junho Choi,
Yeji Choi,
Subeen Lee,
Artyom Stitsyuk,
Minkyoung Park,
Seongyeop Jeong,
Youhyun Baek,
Jaesik Choi
Abstract:
Machine learning (ML) is becoming increasingly popular in meteorological decision-making. Although the literature on explainable artificial intelligence (XAI) is growing steadily, user-centered XAI studies have not yet extended to this domain. This study defines three requirements for explanations of black-box models in meteorology through user studies: statistical model performance for different rainfall scenarios to identify model bias, model reasoning, and the confidence of model outputs. Appropriate XAI methods are mapped to each requirement, and the generated explanations are tested quantitatively and qualitatively. An XAI interface system is designed based on user feedback. The results indicate that the explanations increase decision utility and user trust. Users prefer intuitive explanations over those based on XAI algorithms even for potentially easy-to-recognize examples. These findings can provide evidence for future research on user-centered XAI algorithms, as well as a basis to improve the usability of AI systems in practice.
Submitted 1 April, 2025;
originally announced April 2025.
-
Kanana: Compute-efficient Bilingual Language Models
Authors:
Kanana LLM Team,
Yunju Bak,
Hojin Lee,
Minho Ryu,
Jiyeon Ham,
Seungjae Jung,
Daniel Wontae Nam,
Taegyeong Eo,
Donghun Lee,
Doohae Jung,
Boseop Kim,
Nayeon Kim,
Jaesun Park,
Hyunho Kim,
Hyunwoong Ko,
Changmin Lee,
Kyoung-Woon On,
Seulye Baeg,
Junrae Cho,
Sunghee Jung,
Jieun Kang,
EungGyun Kim,
Eunhwa Kim,
Byeongil Ko,
Daniel Lee,
et al. (4 additional authors not shown)
Abstract:
We introduce Kanana, a series of bilingual language models that demonstrate exceeding performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high-quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. Furthermore, the report outlines the methodologies utilized during the post-training of the Kanana models, encompassing supervised fine-tuning and preference optimization, aimed at enhancing their capability for seamless interaction with users. Lastly, the report elaborates on plausible approaches used for language model adaptation to specific scenarios, such as embedding, retrieval-augmented generation, and function calling. The Kanana model series spans from 2.1B to 32.5B parameters, with 2.1B models (base, instruct, embedding) publicly released to promote research on Korean language models.
Submitted 28 February, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
LidaRefer: Outdoor 3D Visual Grounding for Autonomous Driving with Transformers
Authors:
Yeong-Seung Baek,
Heung-Seon Oh
Abstract:
3D visual grounding (VG) aims to locate relevant objects or regions within 3D scenes based on natural language descriptions. Although recent methods for indoor 3D VG have successfully employed transformer-based architectures to capture global contextual information and enable fine-grained cross-modal fusion, they are unsuitable for outdoor environments due to differences in the distribution of point clouds between indoor and outdoor settings. Specifically, first, extensive LiDAR point clouds demand unacceptable computational and memory resources within transformers due to the high-dimensional visual features. Second, dominant background points and empty spaces in sparse LiDAR point clouds complicate cross-modal fusion owing to their irrelevant visual information. To address these challenges, we propose LidaRefer, a transformer-based 3D VG framework designed for large-scale outdoor scenes. Moreover, during training, we introduce a simple and effective localization method, which supervises the decoder's queries to localize not only a target object but also ambiguous objects that might be confused with the target because they exhibit similar attributes in a scene or because the language description is misunderstood. This supervision enhances the model's ability to distinguish ambiguous objects from a target by learning the differences in their spatial relationships and attributes. LidaRefer achieves state-of-the-art performance on Talk2Car-3D, a 3D VG dataset for autonomous driving, with significant improvements under various evaluation settings.
Submitted 6 November, 2024;
originally announced November 2024.
-
Exploring how deep learning decodes anomalous diffusion via Grad-CAM
Authors:
Jaeyong Bae,
Yongjoo Baek,
Hawoong Jeong
Abstract:
While deep learning has been successfully applied to the data-driven classification of anomalous diffusion mechanisms, how the algorithm achieves the feat still remains a mystery. In this study, we use a well-known technique aimed at achieving explainable AI, namely the Gradient-weighted Class Activation Map (Grad-CAM), to investigate how deep learning (implemented by ResNets) recognizes the distinctive features of a particular anomalous diffusion model from the raw trajectory data. Our results show that Grad-CAM reveals the portions of the trajectory that hold crucial information about the underlying mechanism of anomalous diffusion, which can be utilized to enhance the robustness of the trained classifier against the measurement noise. Moreover, we observe that deep learning distills unique statistical characteristics of different diffusion mechanisms at various spatiotemporal scales, with larger-scale (smaller-scale) features identified at higher (lower) layers.
Submitted 21 October, 2024;
originally announced October 2024.
-
Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas
Authors:
Seungjong Sun,
Eungu Lee,
Seo Yeon Baek,
Seunghyun Hwang,
Wonbyung Lee,
Dongyan Nan,
Bernard J. Jansen,
Jang Hyun Kim
Abstract:
This study is the first to explore whether multi-modal large language models (LLMs) can align their behaviors with visual personas, addressing a significant gap in the literature that predominantly focuses on text-based personas. We developed a novel dataset of 5K fictional avatar images for assignment as visual personas to LLMs, and analyzed their negotiation behaviors based on the visual traits depicted in these images, with a particular focus on aggressiveness. The results indicate that LLMs assess the aggressiveness of images in a manner similar to humans and output more aggressive negotiation behaviors when prompted with an aggressive visual persona. Interestingly, the LLM exhibited more aggressive negotiation behaviors when the opponent's image appeared less aggressive than its own, and less aggressive behaviors when the opponent's image appeared more aggressive.
Submitted 4 October, 2024;
originally announced October 2024.
-
Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
Authors:
Yujin Baek,
ChaeHun Park,
Jaeseok Kim,
Yu-Jung Heo,
Du-Seong Chang,
Jaegul Choo
Abstract:
To create culturally inclusive vision-language models (VLMs), developing a benchmark that tests their ability to address culturally relevant questions is essential. Existing approaches typically rely on human annotators, making the process labor-intensive and creating a cognitive burden in generating diverse questions. To address this, we propose a semi-automated framework for constructing cultural VLM benchmarks, specifically targeting multiple-choice QA. This framework combines human-VLM collaboration, where VLMs generate questions based on guidelines, a small set of annotated examples, and relevant knowledge, followed by a verification process by native speakers. We demonstrate the effectiveness of this framework through the creation of K-Viscuit, a dataset focused on Korean culture. Our experiments on this dataset reveal that open-source models lag behind proprietary ones in understanding Korean culture, highlighting key areas for improvement. We also present a series of further analyses, including human evaluation, augmenting VLMs with external knowledge, and the evaluation beyond multiple-choice QA. Our dataset is available at https://huggingface.co/datasets/ddehun/k-viscuit.
Submitted 17 December, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
CREPE: Coordinate-Aware End-to-End Document Parser
Authors:
Yamato Okamoto,
Youngmin Baek,
Geewook Kim,
Ryota Nakao,
DongHyun Kim,
Moon Bin Yim,
Seunghyun Park,
Bado Lee
Abstract:
In this study, we formulate an OCR-free sequence generation model for visual document understanding (VDU). Our model not only parses text from document images but also extracts the spatial coordinates of the text based on the multi-head architecture. Named Coordinate-aware End-to-end Document Parser (CREPE), our method uniquely integrates these capabilities by introducing a special token for OCR text and token-triggered coordinate decoding. We also propose a weakly-supervised framework for cost-efficient training, requiring only parsing annotations without high-cost coordinate annotations. Our experimental evaluations demonstrate CREPE's state-of-the-art performance on document parsing tasks. Beyond that, CREPE's adaptability is further highlighted by its successful usage in other document understanding tasks such as layout analysis, document visual question answering, and so on. CREPE's abilities, including OCR and semantic parsing, not only mitigate error propagation issues in existing OCR-dependent methods but also significantly enhance the functionality of sequence generation models, ushering in a new era for document understanding studies.
Submitted 30 April, 2024;
originally announced May 2024.
-
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Authors:
Chanran Kim,
Jeongin Lee,
Shichang Joung,
Bongmo Kim,
Yeul-Min Baek
Abstract:
In the field of personalized image generation, the ability to create images preserving concepts has significantly improved. However, creating an image that naturally integrates multiple concepts in a cohesive and visually appealing composition remains challenging. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. Our method effectively preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions. Additionally, our masked cross-attention mechanism enables the precise control of multi-ID and composition in the generated images. We demonstrate the effectiveness of InstantFamily through experiments showing its dominance in generating images with multi-ID, while resolving well-known multi-ID generation problems. Additionally, our model achieves state-of-the-art performance in both single-ID and multi-ID preservation. Furthermore, our model exhibits remarkable scalability, preserving a greater number of IDs than it was originally trained with.
Submitted 30 April, 2024;
originally announced April 2024.
-
Optical next generation reservoir computing
Authors:
Hao Wang,
Jianqi Hu,
YoonSeok Baek,
Kohei Tsuchiyama,
Malo Joly,
Qiang Liu,
Sylvain Gigan
Abstract:
Artificial neural networks with internal dynamics exhibit remarkable capability in processing information. Reservoir computing (RC) is a canonical example that features rich computing expressivity and compatibility with physical implementations for enhanced efficiency. Recently, a new RC paradigm known as next generation reservoir computing (NGRC) further improves expressivity but compromises its physical openness, posing challenges for realizations in physical systems. Here we demonstrate optical NGRC with computations performed by light scattering through disordered media. In contrast to conventional optical RC implementations, we drive our optical reservoir directly with time-delayed inputs. Much like digital NGRC that relies on polynomial features of delayed inputs, our optical reservoir also implicitly generates these polynomial features for desired functionalities. By leveraging the domain knowledge of the reservoir inputs, we show that the optical NGRC not only predicts the short-term dynamics of the low-dimensional Lorenz63 and large-scale Kuramoto-Sivashinsky chaotic time series, but also replicates their long-term ergodic properties. Optical NGRC shows superiority in shorter training length, increased interpretability and fewer hyperparameters compared to conventional optical RC based on scattering media, while achieving better forecasting performance. Our optical NGRC framework may inspire the realization of NGRC in other physical RC systems, new applications beyond time-series processing, and the development of deep and parallel architectures broadly.
Submitted 23 October, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
How to Evaluate Entity Resolution Systems: An Entity-Centric Framework with Application to Inventor Name Disambiguation
Authors:
Olivier Binette,
Youngsoo Baek,
Siddharth Engineer,
Christina Jones,
Abel Dasylva,
Jerome P. Reiter
Abstract:
Entity resolution (record linkage, microclustering) systems are notoriously difficult to evaluate. Looking for a needle in a haystack, traditional evaluation methods use sophisticated, application-specific sampling schemes to find matching pairs of records among an immense number of non-matches. We propose an alternative that facilitates the creation of representative, reusable benchmark data sets without necessitating complex sampling schemes. These benchmark data sets can then be used for model training and a variety of evaluation tasks. Specifically, we propose an entity-centric data labeling methodology that integrates with a unified framework for monitoring summary statistics, estimating key performance metrics such as cluster and pairwise precision and recall, and analyzing root causes for errors. We validate the framework in an application to inventor name disambiguation and through simulation studies. Software: https://github.com/OlivierBinette/er-evaluation/
Submitted 8 April, 2024;
originally announced April 2024.
-
Towards Accurate Translation via Semantically Appropriate Application of Lexical Constraints
Authors:
Yujin Baek,
Koanho Lee,
Dayeon Ki,
Hyoung-Gyu Lee,
Cheonbok Park,
Jaegul Choo
Abstract:
Lexically-constrained NMT (LNMT) aims to incorporate user-provided terminology into translations. Despite its practical advantages, existing work has not evaluated LNMT models under challenging real-world conditions. In this paper, we focus on two important but under-studied issues that lie in the current evaluation process of LNMT studies. The model needs to cope with challenging lexical constraints that are "homographs" or "unseen" during training. To this end, we first design a homograph disambiguation module to differentiate the meanings of homographs. Moreover, we propose PLUMCOT, which integrates contextually rich information about unseen lexical constraints from pre-trained language models and strengthens a copy mechanism of the pointer network via direct supervision of a copying score. We also release HOLLY, an evaluation benchmark for assessing the ability of a model to cope with "homographic" and "unseen" lexical constraints. Experiments on HOLLY and the previous test setup show the effectiveness of our method. The effects of PLUMCOT are shown to be remarkable in "unseen" constraints. Our dataset is available at https://github.com/papago-lab/HOLLY-benchmark
Submitted 21 June, 2023;
originally announced June 2023.
-
Asymptotics of Bayesian Uncertainty Estimation in Random Features Regression
Authors:
Youngsoo Baek,
Samuel I. Berchuck,
Sayan Mukherjee
Abstract:
In this paper we compare and contrast the behavior of the posterior predictive distribution to the risk of the maximum a posteriori estimator for the random features regression model in the overparameterized regime. We will focus on the variance of the posterior predictive distribution (Bayesian model average) and compare its asymptotics to that of the risk of the MAP estimator. In the regime where the model dimensions grow faster than any constant multiple of the number of samples, asymptotic agreement between these two quantities is governed by the phase transition in the signal-to-noise ratio. They also asymptotically agree with each other when the number of samples grows faster than any constant multiple of model dimensions. Numerical simulations illustrate finer distributional properties of the two quantities for finite dimensions. We conjecture they have Gaussian fluctuations and exhibit properties similar to those found by previous authors in a Gaussian sequence model, which is of independent theoretical interest.
Submitted 26 October, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
TRACE: Table Reconstruction Aligned to Corner and Edges
Authors:
Youngmin Baek,
Daehyun Nam,
Jaeheung Surh,
Seung Shin,
Seonghyeon Kim
Abstract:
A table is an object that captures structured and informative content within a document, and recognizing a table in an image is challenging due to the complexity and variety of table layouts. Many previous works adopt a two-stage approach: (1) table detection (TD) localizes the table region in an image, and (2) table structure recognition (TSR) identifies row- and column-wise adjacency relations between the cells. A two-stage approach often leads to error propagation between the modules and makes training and inference less efficient. In this work, we analyze the natural characteristics of a table, where a table is composed of cells and each cell is made up of borders consisting of edges. We propose a novel method to reconstruct the table in a bottom-up manner. Through a simple process, the proposed method separates cell boundaries from low-level features, such as corners and edges, and localizes table positions by combining the cells. A simple design makes the model easier to train and requires less computation than previous two-stage methods. We achieve state-of-the-art performance on the ICDAR2013 table competition benchmark and the Wired Table in the Wild (WTW) dataset.
Submitted 30 April, 2023;
originally announced May 2023.
-
Tradeoff of generalization error in unsupervised learning
Authors:
Gilhan Kim,
Hojun Lee,
Junghyo Jo,
Yongjoo Baek
Abstract:
Finding the optimal model complexity that minimizes the generalization error (GE) is a key issue of machine learning. For the conventional supervised learning, this task typically involves the bias-variance tradeoff: lowering the bias by making the model more complex entails an increase in the variance. Meanwhile, little has been studied about whether the same tradeoff exists for unsupervised learning. In this study, we propose that unsupervised learning generally exhibits a two-component tradeoff of the GE, namely the model error and the data error -- using a more complex model reduces the model error at the cost of the data error, with the data error playing a more significant role for a smaller training dataset. This is corroborated by training the restricted Boltzmann machine to generate the configurations of the two-dimensional Ising model at a given temperature and the totally asymmetric simple exclusion process with given entry and exit rates. Our results also indicate that the optimal model tends to be more complex when the data to be learned are more complex.
Submitted 12 September, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
$α$-divergence Improves the Entropy Production Estimation via Machine Learning
Authors:
Euijoon Kwon,
Yongjoo Baek
Abstract:
Recent years have seen a surge of interest in the algorithmic estimation of stochastic entropy production (EP) from trajectory data via machine learning. A crucial element of such algorithms is the identification of a loss function whose minimization guarantees the accurate EP estimation. In this study, we show that there exists a host of loss functions, namely those implementing a variational representation of the $α$-divergence, which can be used for the EP estimation. By fixing $α$ to a value between $-1$ and $0$, the $α$-NEEP (Neural Estimator for Entropy Production) exhibits a much more robust performance against strong nonequilibrium driving or slow dynamics, which adversely affects the existing method based on the Kullback-Leibler divergence ($α= 0$). In particular, the choice of $α= -0.5$ tends to yield the optimal results. To corroborate our findings, we present an exactly solvable simplification of the EP estimation problem, whose loss function landscape and stochastic properties give deeper intuition into the robustness of the $α$-NEEP.
Submitted 19 January, 2024; v1 submitted 6 March, 2023;
originally announced March 2023.
-
High-resolution synthetic residential energy use profiles for the United States
Authors:
Swapna Thorve,
Young Yun Baek,
Samarth Swarup,
Henning Mortveit,
Achla Marathe,
Anil Vullikanti,
Madhav Marathe
Abstract:
Efficient energy consumption is crucial for achieving sustainable energy goals in the era of climate change and grid modernization. Thus, it is vital to understand how energy is consumed at finer resolutions, such as the household level, in order to plan demand-response events or analyze the impacts of weather, electricity prices, electric vehicles, solar, and occupancy schedules on energy consumption. However, availability of and access to detailed energy-use data, which would enable detailed studies, have been rare. In this paper, we release a unique, large-scale, synthetic energy-use dataset for the residential sector across the contiguous United States, covering millions of households. The data comprise hourly energy use profiles for synthetic households, disaggregated into Thermostatically Controlled Loads (TCL) and appliance use. The underlying framework is constructed using a bottom-up approach. Diverse open-source surveys and first-principles models are used for end-use modeling. Extensive validation of the synthetic dataset has been conducted through comparisons with reported energy-use data. We present a detailed, open, high-resolution, residential energy-use dataset for the United States.
Submitted 15 December, 2022; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org
Authors:
Olivier Binette,
Sokhna A York,
Emma Hickerson,
Youngsoo Baek,
Sarvo Madhavan,
Christina Jones
Abstract:
This paper introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical and principled -- key characteristics that allow us to paint the first representative picture of PatentsView's disambiguation performance. This approach is used to inform PatentsView's users of the reliability of the data and to allow the comparison of competing disambiguation algorithms.
Submitted 17 April, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild
Authors:
Donghyun Son,
Byounggyu Lew,
Kwanghee Choi,
Yongsu Baek,
Seungwoo Choi,
Beomjun Shin,
Sungjoo Ha,
Buru Chang
Abstract:
Social media platforms struggle to protect users from harmful content through content moderation. These platforms have recently leveraged machine learning models to cope with the vast amount of user-generated content produced daily. Since moderation policies vary depending on countries and types of products, it is common to train and deploy the models per policy. However, this approach is highly inefficient, especially when the policies change, requiring dataset re-labeling and model re-training on the shifted data distribution. To alleviate this cost inefficiency, social media platforms often employ third-party content moderation services that provide prediction scores of multiple subtasks, such as predicting the existence of underage personnel, rude gestures, or weapons, instead of directly providing final moderation decisions. However, making a reliable automated moderation decision from the prediction scores of the multiple subtasks for a specific target policy has not been widely explored yet. In this study, we formulate real-world scenarios of content moderation and introduce a simple yet effective threshold optimization method that searches for the optimal thresholds of the multiple subtasks to make a reliable moderation decision in a cost-effective way. Extensive experiments demonstrate that our approach shows better performance in content moderation compared to existing threshold optimization methods and heuristics.
Submitted 25 January, 2023; v1 submitted 15 August, 2022;
originally announced August 2022.
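As a rough illustration of what searching thresholds over multiple subtask scores can look like, the sketch below runs an exhaustive grid search. It is not the method proposed in the paper: the decision rule (flag an item if any subtask score reaches its threshold), the F1 objective, the candidate grid, and the names `decide`, `f1_score`, and `grid_search_thresholds` are all assumptions made for this example.

```python
from itertools import product


def decide(scores, thresholds):
    """Flag an item if ANY subtask score reaches its threshold (assumed policy)."""
    return any(s >= t for s, t in zip(scores, thresholds))


def f1_score(preds, labels):
    """F1 of boolean predictions against boolean ground-truth labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def grid_search_thresholds(score_rows, labels, grid):
    """Try every combination of per-subtask thresholds and keep the best F1.

    The cost grows as len(grid) ** n_subtasks, so this only suits a handful
    of subtasks; it is a baseline sketch, not an efficient optimizer.
    """
    n_subtasks = len(score_rows[0])
    best_thresholds, best_f1 = None, -1.0
    for thresholds in product(grid, repeat=n_subtasks):
        preds = [decide(row, thresholds) for row in score_rows]
        score = f1_score(preds, labels)
        if score > best_f1:  # strict ">" keeps the first best combination
            best_thresholds, best_f1 = thresholds, score
    return best_thresholds, best_f1


# Hypothetical scores for two subtasks on four items.
score_rows = [(0.9, 0.1), (0.2, 0.8), (0.3, 0.2), (0.1, 0.4)]
labels = [True, True, False, False]
best_thresholds, best_f1 = grid_search_thresholds(score_rows, labels, (0.25, 0.5, 0.75))
print(best_thresholds, best_f1)
```

A different target policy would change only `decide` and the objective, which is the appeal of deciding from subtask scores instead of retraining a model per policy.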
-
Literature Review to Collect Conceptual Variables of Scenario Methods for Establishing a Conceptual Scenario Framework
Authors:
Young-Min Baek,
Esther Cho,
Donghwan Shin,
Doo-Hwan Bae
Abstract:
Over recent decades, scenarios and scenario-based software/system engineering have been actively employed as essential tools to handle intricate problems, validate requirements, and support stakeholders' communication. However, despite the widespread use of scenarios, there have been several challenges for engineers to more willingly utilize scenario-based engineering approaches (i.e., scenario methods) in their projects. First, the term scenario has numerous published definitions, thus lacking in a well-established shared understanding of scenarios and scenario methods. Second, the conceptual basis for engineers developing or employing scenarios is missing. To establish shared understanding and to find common denominators of scenario methods, this study leverages well-defined metamodeling and conceptualization that systematically investigate the concepts under analysis and define core entities and their relations. By conducting a semi-systematic literature review, conceptual variables are collected and conceptualized as a conceptual meta-model. As a result, this study introduces scenario variables (SVs) that represent constructs/semantics of scenario descriptions, according to 4 levels of constructs of a scenario method. To evaluate the comprehensibility and applicability of the defined variables, we analyze five existing scenario methods and their instances in automated driving system (ADS) domains. The results showed that our conceptual model and its constituent scenario variables adequately support the understanding of a scenario method and provide a means for comparative analysis between different scenario methods.
Submitted 17 May, 2022;
originally announced May 2022.
-
DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting
Authors:
Seonghyeon Kim,
Seung Shin,
Yoonsik Kim,
Han-Cheol Cho,
Taeho Kil,
Jaeheung Surh,
Seunghyun Park,
Bado Lee,
Youngmin Baek
Abstract:
Recent end-to-end scene text spotters have achieved great improvement in recognizing arbitrary-shaped text instances. Common approaches for text spotting use region of interest pooling or segmentation masks to restrict features to single text instances. However, this makes it hard for the recognizer to decode correct sequences when the detection is not accurate, i.e., when one or more characters are cropped out. Considering that it is hard to accurately decide word boundaries with only the detector, we propose a novel Detection-agnostic End-to-End Recognizer (DEER) framework. The proposed method reduces the tight dependency between detection and recognition modules by bridging them with a single reference point for each text instance, instead of using detected regions. The proposed method allows the decoder to recognize the texts that are indicated by the reference point, with features from the whole image. Since only a single point is required to recognize the text, the proposed method enables text spotting without an arbitrarily-shaped detector or bounding polygon annotations. Experimental results show that the proposed method achieves competitive results on regular and arbitrarily-shaped text spotting benchmarks. Further analysis shows that DEER is robust to detection errors. The code and dataset will be publicly available.
Submitted 9 March, 2022;
originally announced March 2022.
-
Character Region Attention For Text Spotting
Authors:
Youngmin Baek,
Seung Shin,
Jeonghun Baek,
Sungrae Park,
Junyeop Lee,
Daehyun Nam,
Hwalsuk Lee
Abstract:
A scene text spotter is composed of text detection and recognition modules. Many studies have been conducted to unify these modules into an end-to-end trainable model to achieve better performance. A typical architecture places detection and recognition modules into separate branches, and RoI pooling is commonly used to let the branches share a visual feature. However, there is still room to establish a more complementary connection between the modules when adopting a recognizer that uses an attention-based decoder and a detector that represents spatial information of the character regions. This is possible since the two modules share a common sub-task, which is to find the location of the character regions. Based on this insight, we construct a tightly coupled single pipeline model. This architecture is formed by utilizing detection outputs in the recognizer and propagating the recognition loss through the detection stage. The use of a character score map helps the recognizer attend better to the character center points, and the recognition loss propagation to the detector module enhances the localization of the character regions. Also, a strengthened sharing stage allows feature rectification and boundary localization of arbitrary-shaped text regions. Extensive experiments demonstrate state-of-the-art performance on publicly available straight and curved benchmark datasets.
Submitted 19 July, 2020;
originally announced July 2020.
-
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
Authors:
Youngmin Baek,
Daehyun Nam,
Sungrae Park,
Junyeop Lee,
Seung Shin,
Jeonghun Baek,
Chae Young Lee,
Hwalsuk Lee
Abstract:
Despite the recent success of text detection and recognition methods, existing evaluation metrics fail to provide a fair and reliable comparison among those methods. In addition, there exists no end-to-end evaluation metric that takes characteristics of OCR tasks into account. Previous end-to-end metrics contain cascaded errors from the binary scoring process applied in both detection and recognition tasks. Ignoring partially correct results raises a gap between quantitative and qualitative analysis, and prevents fine-grained assessment. Based on the fact that the character is a key element of text, we hereby propose a Character-Level Evaluation metric (CLEval). In CLEval, the instance matching process handles split and merge detection cases, and the scoring process conducts character-level evaluation. By aggregating character-level scores, the CLEval metric provides a fine-grained evaluation of end-to-end results composed of the detection and recognition as well as individual evaluations for each module from the end-performance perspective. We believe that our metrics can play a key role in developing and analyzing state-of-the-art text detection and recognition methods. The evaluation code is publicly available at https://github.com/clovaai/CLEval.
Submitted 11 June, 2020;
originally announced June 2020.
-
TedEval: A Fair Evaluation Metric for Scene Text Detectors
Authors:
Chae Young Lee,
Youngmin Baek,
Hwalsuk Lee
Abstract:
Despite the recent success of scene text detection methods, common evaluation metrics fail to provide a fair and reliable comparison among detectors. They have obvious drawbacks in reflecting the inherent characteristic of text detection tasks, unable to address issues such as granularity, multiline, and character incompleteness. In this paper, we propose a novel evaluation protocol called TedEval (Text detector Evaluation), which evaluates text detections by an instance-level matching and a character-level scoring. Based on a firm standard rewarding behaviors that result in successful recognition, TedEval can act as a reliable standard for comparing and quantizing the detection quality throughout all difficulty levels. In this regard, we believe that TedEval can play a key role in developing state-of-the-art scene text detectors. The code is publicly available at https://github.com/clovaai/TedEval.
Submitted 2 July, 2019;
originally announced July 2019.
-
Reliable Estimation of Individual Treatment Effect with Causal Information Bottleneck
Authors:
Sungyub Kim,
Yongsu Baek,
Sung Ju Hwang,
Eunho Yang
Abstract:
Estimating individual level treatment effects (ITE) from observational data is a challenging and important area in causal machine learning and is commonly considered in diverse mission-critical applications. In this paper, we propose an information theoretic approach in order to find more reliable representations for estimating ITE. We leverage the Information Bottleneck (IB) principle, which addresses the trade-off between conciseness and predictive power of representation. With the introduction of an extended graphical model for causal information bottleneck, we encourage the independence between the learned representation and the treatment type. We also introduce an additional form of a regularizer from the perspective of understanding ITE in the semi-supervised learning framework to ensure more reliable representations. Experimental results show that our model achieves the state-of-the-art results and exhibits more reliable prediction performances with uncertainty information on real-world datasets.
Submitted 7 June, 2019;
originally announced June 2019.
-
Character Region Awareness for Text Detection
Authors:
Youngmin Baek,
Bado Lee,
Dongyoon Han,
Sangdoo Yun,
Hwalsuk Lee
Abstract:
Scene text detection methods based on neural networks have emerged recently and have shown promising results. Previous methods trained with rigid word-level bounding boxes exhibit limitations in representing the text region in an arbitrary shape. In this paper, we propose a new scene text detection method to effectively detect text area by exploring each character and affinity between characters. To overcome the lack of individual character level annotations, our proposed framework exploits both the given character-level annotations for synthetic images and the estimated character-level ground-truths for real images acquired by the learned interim model. In order to estimate affinity between characters, the network is trained with the newly proposed representation for affinity. Extensive experiments on six benchmarks, including the TotalText and CTW-1500 datasets which contain highly curved texts in natural images, demonstrate that our character-level text detection significantly outperforms the state-of-the-art detectors. According to the results, our proposed method guarantees high flexibility in detecting complicated scene text images, such as arbitrarily-oriented, curved, or deformed texts.
Submitted 3 April, 2019;
originally announced April 2019.
-
Cultural Values and Cross-cultural Video Consumption on YouTube
Authors:
Minsu Park,
Jaram Park,
Young Min Baek,
Michael Macy
Abstract:
Video-sharing social media like YouTube provide access to diverse cultural products from all over the world, making it possible to test theories that the Web facilitates global cultural convergence. Drawing on a daily listing of YouTube's most popular videos across 58 countries, we investigate the consumption of popular videos in countries that differ in cultural values, language, gross domestic product, and Internet penetration rate. Although online social media facilitate global access to cultural products, we find this technological capability does not result in universal cultural convergence. Instead, consumption of popular videos in culturally different countries appears to be constrained by cultural values. Cross-cultural convergence is more advanced in cosmopolitan countries with cultural values that favor individualism and power inequality.
Submitted 17 May, 2017; v1 submitted 8 May, 2017;
originally announced May 2017.
-
Fundamental Structural Constraint of Random Scale-Free Networks
Authors:
Yongjoo Baek,
Daniel Kim,
Meesoon Ha,
Hawoong Jeong
Abstract:
We study the structural constraint of random scale-free networks that determines possible combinations of the degree exponent $γ$ and the upper cutoff $k_c$ in the thermodynamic limit. We employ the framework of graphicality transitions proposed by [Del Genio and co-workers, Phys. Rev. Lett. 107, 178701 (2011)], while making it more rigorous and applicable to general values of $k_c$. Using the graphicality criterion, we show that the upper cutoff must be lower than $k_c \sim N^{1/γ}$ for $γ< 2$, whereas any upper cutoff is allowed for $γ> 2$. This result is also numerically verified by both the random and deterministic sampling of degree sequences.
Submitted 11 September, 2012; v1 submitted 2 July, 2012;
originally announced July 2012.
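For readers unfamiliar with graphicality, a degree sequence is called graphical if some simple graph realizes it. The sketch below checks this with the classical Erdős–Gallai theorem; the abstract above does not say which test the authors implemented, so treating Erdős–Gallai as the criterion is an assumption here, and the function name `is_graphical` is chosen only for this example.

```python
def is_graphical(degrees):
    """Erdős–Gallai test: can `degrees` be realized by a simple graph?

    With d_1 >= d_2 >= ... >= d_n, the sequence is graphical iff the degree
    sum is even and, for every k in 1..n,
        sum_{i<=k} d_i  <=  k*(k-1) + sum_{i>k} min(d_i, k).
    """
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    n = len(d)
    for k in range(1, n + 1):
        lhs = sum(d[:k])  # degree demand of the k largest nodes
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])  # maximum supply
        if lhs > rhs:
            return False
    return True


print(is_graphical([3, 3, 3, 3]))  # complete graph on four nodes
print(is_graphical([3, 3, 1, 1]))  # even sum, but fails the k = 2 inequality
```

This direct version takes quadratic time in the number of nodes, which is adequate for spot checks on sampled degree sequences; large-scale sampling would call for a linear-time variant.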