-
Split-and-Bridge: Adaptable Class Incremental Learning within a Single Neural Network
Authors:
Jong-Yeong Kim,
Dong-Wan Choi
Abstract:
Continual learning has been a major problem in the deep learning community, where the main challenge is how to effectively learn a series of newly arriving tasks without forgetting the knowledge of previous tasks. Initiated by Learning without Forgetting (LwF), many of the existing works report that knowledge distillation is effective to preserve the previous knowledge, and hence they commonly use a soft label for the old task, namely a knowledge distillation (KD) loss, together with a class label for the new task, namely a cross entropy (CE) loss, to form a composite loss for a single neural network. However, this approach suffers from learning the knowledge by a CE loss as a KD loss often more strongly influences the objective function when they are in a competitive situation within a single network. This could be a critical problem particularly in a class incremental scenario, where the knowledge across tasks as well as within the new task, both of which can only be acquired by a CE loss, is essentially learned due to the existence of a unified classifier. In this paper, we propose a novel continual learning method, called Split-and-Bridge, which can successfully address the above problem by partially splitting a neural network into two partitions for training the new task separated from the old task and re-connecting them for learning the knowledge across tasks. In our thorough experimental analysis, our Split-and-Bridge method outperforms the state-of-the-art competitors in KD-based continual learning.
Submitted 3 July, 2021;
originally announced July 2021.
-
New Estimands for Experiments with Strong Interference
Authors:
David Choi
Abstract:
In experiments that study social phenomena, such as peer influence or herd immunity, the treatment of one unit may influence the outcomes of others. Such "interference between units" violates traditional approaches for causal inference, so that additional assumptions are often imposed to model or limit the underlying social mechanism. For binary outcomes, we propose new estimands that can be estimated without such assumptions, allowing for interval estimates assuming only the randomization of treatment. However, the causal implications of these estimands are more limited than those attainable under stronger assumptions, showing only that the treatment effects under the observed assignment varied systematically as a function of each unit's direct and indirect exposure, while also lower bounding the number of units affected.
Submitted 29 August, 2023; v1 submitted 1 July, 2021;
originally announced July 2021.
-
TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration
Authors:
Dongjin Choi,
Sara Evensen,
Çağatay Demiralp,
Estevam Hruschka
Abstract:
Despite rapid developments in the field of machine learning research, collecting high-quality labels for supervised learning remains a bottleneck for many applications. This difficulty is exacerbated by the fact that state-of-the-art models for NLP tasks are becoming deeper and more complex, often increasing the amount of training data required even for fine-tuning. Weak supervision methods, including data programming, address this problem and reduce the cost of label collection by using noisy label sources for supervision. However, until recently, data programming was only accessible to users who knew how to program. To bridge this gap, the Data Programming by Demonstration framework was proposed to facilitate the automatic creation of labeling functions based on a few examples labeled by a domain expert. This framework has proven successful for generating high-accuracy labeling models for document classification. In this work, we extend the DPBD framework to span-level annotation tasks, arguably one of the most time-consuming NLP labeling tasks. We built a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming and encourages them to explore trade-offs between different labeling models and active learning strategies. We empirically demonstrated that an annotator could achieve a higher F1 score using the proposed tool compared to manual labeling for different span-level annotation tasks.
Submitted 24 June, 2021;
originally announced June 2021.
-
Non-archimedean Sendov's Conjecture
Authors:
Daebeom Choi,
Seewoo Lee
Abstract:
We prove a non-archimedean analogue of Sendov's conjecture. We also provide a complete list of polynomials over an algebraically closed non-archimedean field $K$ that satisfy the optimal bound in Sendov's conjecture.
Submitted 4 July, 2021; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Measuring the repertoire of age-related behavioral changes in Drosophila melanogaster
Authors:
Katherine E. Overman,
Daniel M. Choi,
Kawai Leung,
Joshua W. Shaevitz,
Gordon J. Berman
Abstract:
Aging affects almost all aspects of an organism -- its morphology, its physiology, its behavior. Isolating which biological mechanisms are regulating these changes, however, has proven difficult, potentially due to our inability to characterize the full repertoire of an animal's behavior across the lifespan. Using data from fruit flies (D. melanogaster) we measure the full repertoire of behaviors as a function of age. We observe a sexually dimorphic pattern of changes in the behavioral repertoire during aging. Although the stereotypy of the behaviors and the complexity of the repertoire overall remains relatively unchanged, we find evidence that the observed alterations in behavior can be explained by changing the fly's overall energy budget, suggesting potential connections between metabolism, aging, and behavior.
Submitted 15 June, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Semantic-aware Binary Code Representation with BERT
Authors:
Hyungjoon Koo,
Soyeon Park,
Daejin Choi,
Taesoo Kim
Abstract:
A wide range of binary analysis applications, such as bug discovery, malware analysis and code clone detection, require recovery of contextual meanings on a binary code. Recently, binary analysis techniques based on machine learning have been proposed to automatically reconstruct the code representation of a binary instead of manually crafting specifics of the analysis algorithm. However, the existing approaches utilizing machine learning are still specialized to solve one domain of problems, rendering recreation of models for different types of binary analysis. In this paper, we propose DeepSemantic utilizing BERT in producing the semantic-aware code representation of a binary code.
To this end, we introduce well-balanced instruction normalization that retains rich information for each instruction while minimizing the out-of-vocabulary (OOV) problem. DeepSemantic has been carefully designed based on our study with large swaths of binaries. Besides, DeepSemantic leverages the essence of the BERT architecture by re-purposing a pre-trained generic model that is readily available as a one-time processing step, followed by quickly applying specific downstream tasks with a fine-tuning process. We demonstrate DeepSemantic with two downstream tasks, namely, binary similarity comparison and compiler provenance (i.e., compiler and optimization level) prediction. Our experimental results show that the binary similarity model outperforms two state-of-the-art binary similarity tools, DeepBinDiff and SAFE, by 49.84% and 15.83% on average, respectively.
Submitted 9 June, 2021;
originally announced June 2021.
-
View Distillation with Unlabeled Data for Extracting Adverse Drug Effects from User-Generated Data
Authors:
Payam Karisani,
Jinho D. Choi,
Li Xiong
Abstract:
We present an algorithm based on multi-layer transformers for identifying Adverse Drug Reactions (ADR) in social media data. Our model relies on the properties of the problem and the characteristics of contextual word embeddings to extract two views from documents. Then a classifier is trained on each view to label a set of unlabeled documents to be used as an initializer for a new classifier in the other view. Finally, the initialized classifier in each view is further trained using the initial training examples. We evaluated our model on the largest publicly available ADR dataset. The experiments show that our model significantly outperforms the transformer-based models pretrained on domain-specific data.
Submitted 24 May, 2021;
originally announced May 2021.
-
OutFlip: Generating Out-of-Domain Samples for Unknown Intent Detection with Natural Language Attack
Authors:
DongHyun Choi,
Myeong Cheol Shin,
EungGyun Kim,
Dong Ryeol Shin
Abstract:
Out-of-domain (OOD) input detection is vital in a task-oriented dialogue system since the acceptance of unsupported inputs could lead to an incorrect response of the system. This paper proposes OutFlip, a method to automatically generate out-of-domain samples using only the in-domain training dataset. A white-box natural language attack method, HotFlip, is revised to generate out-of-domain samples instead of adversarial examples. Our evaluation results showed that integrating OutFlip-generated out-of-domain samples into the training dataset could significantly improve an intent classification model's out-of-domain detection performance.
Submitted 12 May, 2021;
originally announced May 2021.
-
Balancing weights for region-level analysis: the effect of Medicaid Expansion on the uninsurance rate among states that did not expand Medicaid
Authors:
Max Rubinstein,
Amelia Haviland,
David Choi
Abstract:
We predict the average effect of Medicaid expansion on the non-elderly adult uninsurance rate among states that did not expand Medicaid in 2014 as if they had expanded their Medicaid eligibility requirements. Using American Community Survey data aggregated to the region level, we estimate this effect by finding weights that approximately reweight the expansion regions to match the covariate distribution of the non-expansion regions. Existing methods to estimate balancing weights often assume that the covariates are measured without error and do not account for dependencies in the outcome model. Our covariates have random noise that is uncorrelated with the outcome errors and our outcome model has state-level random effects inducing dependence between regions. To correct for the bias induced by the measurement error, we propose generating our weights on a linear approximation to the true covariates, using an idea from the measurement error literature known as "regression-calibration" (see, e.g., Carroll (2006)). This requires auxiliary data to estimate the variability of the measurement error. We also modify the Stable Balancing Weights objective proposed by Zubizarreta (2015) to reduce the variance of our estimator when the model errors follow our assumed correlation structure. We show that these approaches outperform existing methods when attempting to predict observed outcomes during the pre-treatment period. Using this method we estimate that Medicaid expansion would have caused a -2.33 (-3.54, -1.11) percentage point change in the adult uninsurance rate among states that did not expand Medicaid.
Submitted 23 May, 2022; v1 submitted 5 May, 2021;
originally announced May 2021.
-
Enhancing Cognitive Models of Emotions with Representation Learning
Authors:
Yuting Guo,
Jinho Choi
Abstract:
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions that can be used to computationally describe psychological models of emotions. Our framework integrates a contextualized embedding encoder with a multi-head probing model that enables the interpretation of dynamically learned representations optimized for an emotion classification task. Our model is evaluated on the Empathetic Dialogue dataset and achieves a state-of-the-art result for classifying 32 emotions. Our layer analysis can derive an emotion graph to depict hierarchical relations among the emotions. Our emotion representations can be used to generate an emotion wheel directly comparable to the one from Plutchik's model, and also augment the values of missing emotions in the PAD emotional state model.
Submitted 20 April, 2021;
originally announced April 2021.
-
Evaluation of Unsupervised Entity and Event Salience Estimation
Authors:
Jiaying Lu,
Jinho D. Choi
Abstract:
Salience Estimation aims to predict term importance in documents. Because few human-annotated datasets exist and the notion of salience is subjective, previous studies typically generate pseudo-ground truth for evaluation. However, our investigation reveals that the evaluation protocol proposed by prior work is difficult to replicate, which has led to few follow-up studies. Moreover, the evaluation process is problematic: the entity linking tool used for entity matching is very noisy, while ignoring event arguments in event evaluation leads to boosted performance. In this work, we propose a light yet practical entity and event salience estimation evaluation protocol, which incorporates the more reliable syntactic dependency parser. Furthermore, we conduct a comprehensive analysis among popular entity and event definition standards, and present our own definition for the Salience Estimation task to reduce noise during the pseudo-ground truth generation process. In addition, we construct dependency-based heterogeneous graphs to capture the interactions of entities and events. The empirical results show that both baseline methods and the novel GNN method utilizing the heterogeneous graph consistently outperform the previous SOTA model in all proposed metrics.
Submitted 14 April, 2021;
originally announced April 2021.
-
Atomic Manipulation of In-gap States on the $β$-Bi$_2$Pd Superconductor
Authors:
Cristina Mier,
Jiyoon Hwang,
Jinkyung Kim,
Yujeong Bae,
Fuyuki Nabeshima,
Yoshinori Imai,
Atsutaka Maeda,
Nicolás Lorente,
Andreas Heinrich,
Deung-Jang Choi
Abstract:
Electronic states in the gap of a superconductor inherit intriguing many-body properties from the superconductor. Here, we create these in-gap states by manipulating Cr atomic chains on the $β$-Bi$_2$Pd superconductor. We find that the topological properties of the in-gap states can greatly vary depending on the crafted spin chain. These systems make an ideal platform for non-trivial topological phases because of the large atom-superconductor interactions and the existence of a large Rashba coupling at the Bi-terminated surface. We study two spin chains, one with atoms two-lattice-parameter apart and one with square-root-of-two lattice parameters. Of these, only the second one is in a topologically non-trivial phase, in correspondence with the spin interactions for this geometry.
Submitted 6 May, 2021; v1 submitted 13 April, 2021;
originally announced April 2021.
-
Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
Authors:
Sangmin Lee,
Hak Gu Kim,
Dae Hwi Choi,
Hyung-Il Kim,
Yong Man Ro
Abstract:
Our work addresses long-term motion context issues for predicting future frames. To predict the future precisely, it is required to capture which long-term motion context (e.g., walking or running) the input motion (e.g., leg movement) belongs to. The bottlenecks arising when dealing with the long-term motion context are: (i) how to predict the long-term motion context naturally matching input sequences with limited dynamics, and (ii) how to predict the long-term motion context with high dimensionality (e.g., complex motion). To address these issues, we propose novel motion context-aware video prediction. To solve bottleneck (i), we introduce a long-term motion context memory (LMC-Memory) with memory alignment learning. The proposed memory alignment learning makes it possible to store long-term motion contexts in the memory and to match them with sequences including limited dynamics. As a result, the long-term context can be recalled from the limited input sequence. In addition, to resolve bottleneck (ii), we propose memory query decomposition to store local motion context (i.e., low-dimensional dynamics) and recall the suitable local context for each local part of the input individually. This boosts the alignment effects of the memory. Experimental results show that the proposed method outperforms other sophisticated RNN-based methods, especially in long-term conditions. Further, we validate the effectiveness of the proposed network designs by conducting ablation studies and memory feature analysis. The source code of this work is available.
Submitted 2 April, 2021;
originally announced April 2021.
-
Multiplicity one bound for cohomological automorphic representations with a fixed level
Authors:
Dohoon Choi
Abstract:
Let $F$ be a totally real field, and $\mathbb{A}_F$ be the adele ring of $F$. Let us fix $N$ to be a positive integer. Let $π_1=\otimesπ_{1,v}$ and $π_2=\otimesπ_{2,v}$ be distinct cohomological cuspidal automorphic representations of $\mathrm{GL}_n(\mathbb{A}_{F})$ with levels less than or equal to $N$.
Let $\mathcal{N}(π_1,π_2)$ be the minimum of the absolute norm of $v \nmid \infty$ such that $π_{1,v} \not \simeq π_{2,v}$ and that $π_{1,v}$ and $π_{2,v}$ are unramified. We prove that there exists a constant $C_N$ such that for every pair $π_1$ and $π_2$, $$\mathcal{N}(π_1,π_2) \leq C_N.$$ This improves known bounds $$ \mathcal{N}(π_1,π_2)=O(Q^A) \;\;\; (\text{some } A \text{ depending only on } n), $$ where $Q$ is the maximum of the analytic conductors of $π_1$ and $π_2$.
This result applies to newforms on $Γ_1(N)$. In particular, assume that $f_1$ and $f_2$ are Hecke eigenforms of weight $k_1$ and $k_2$ on $\mathrm{SL}_2(\mathbb{Z})$, respectively. We prove that if for all $p \in \{2,7\}$, $$λ_{f_1}(p)/\sqrt{p}^{(k_1-1)} = λ_{f_2}(p)/\sqrt{p}^{(k_2-1)},$$ then $f_1=cf_2$ for some constant $c$. Here, for each prime $p$, $λ_{f_i}(p)$ denotes the $p$-th Hecke eigenvalue of $f_i$.
Submitted 12 March, 2022; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Spin Resonance Amplitude and Frequency of a Single Atom on a Surface in a Vector Magnetic Field
Authors:
Jinkyung Kim,
Won-jun Jang,
Thi Hong Bui,
Deung-jang Choi,
Christoph Wolf,
Fernando Delgado,
Denis Krylov,
Soonhyeong Lee,
Sangwon Yoon,
Christopher P. Lutz,
Andreas J. Heinrich,
Yujeong Bae
Abstract:
We used electron spin resonance (ESR) combined with scanning tunneling microscopy (STM) to measure hydrogenated Ti (spin-1/2) atoms at low-symmetry binding sites on MgO in vector magnetic fields. We found strongly anisotropic g-values in all three spatial directions. Interestingly, the amplitude and lineshape of the ESR signals are also strongly dependent on the angle of the field. We conclude that the Ti spin is aligned along the magnetic field, while the tip spin follows its strong magnetic anisotropy. Our results show the interplay between the tip and surface spins in determining the ESR signals and highlight the precision of ESR-STM to identify the single atom's spin states.
Submitted 17 March, 2021;
originally announced March 2021.
-
Putting Humans in the Natural Language Processing Loop: A Survey
Authors:
Zijie J. Wang,
Dongjin Choi,
Shenyu Xu,
Diyi Yang
Abstract:
How can we design Natural Language Processing (NLP) systems that learn from human feedback? There is a growing research body of Human-in-the-loop (HITL) NLP frameworks that continuously integrate human feedback to improve the model itself. HITL NLP research is nascent but multifarious -- solving various NLP problems, collecting diverse feedback from different people, and applying different methods to learn from collected feedback. We present a survey of HITL NLP work from both Machine Learning (ML) and Human-Computer Interaction (HCI) communities that highlights its short yet inspiring history, and thoroughly summarize recent frameworks focusing on their tasks, goals, human interactions, and feedback learning methods. Finally, we discuss future directions for integrating human feedback in the NLP development loop.
Submitted 6 March, 2021;
originally announced March 2021.
-
Run Your Visual-Inertial Odometry on NVIDIA Jetson: Benchmark Tests on a Micro Aerial Vehicle
Authors:
Jinwoo Jeon,
Sungwook Jung,
Eungchang Lee,
Duckyu Choi,
Hyun Myung
Abstract:
This paper presents benchmark tests of various visual(-inertial) odometry algorithms on NVIDIA Jetson platforms. The compared algorithms include mono and stereo, covering Visual Odometry (VO) and Visual-Inertial Odometry (VIO): VINS-Mono, VINS-Fusion, Kimera, ALVIO, Stereo-MSCKF, ORB-SLAM2 stereo, and ROVIO. As these methods are mainly used for unmanned aerial vehicles (UAVs), they must perform well in situations where the size of the processing board and weight is limited. Jetson boards released by NVIDIA satisfy these constraints as they have a sufficiently powerful central processing unit (CPU) and graphics processing unit (GPU) for image processing. However, in existing studies, the performance of Jetson boards as a processing platform for executing VO/VIO has not been compared extensively in terms of the usage of computing resources and accuracy. Therefore, this study compares representative VO/VIO algorithms on several NVIDIA Jetson platforms, namely NVIDIA Jetson TX2, Xavier NX, and AGX Xavier, and introduces a novel dataset 'KAIST VIO dataset' for UAVs. Including pure rotations, the dataset has several geometric trajectories that are harsh to visual(-inertial) state estimation. The evaluation is performed in terms of the accuracy of estimated odometry, CPU usage, and memory usage on various Jetson boards, algorithms, and trajectories. We present the results of the comprehensive benchmark test and release the dataset for the computer vision and robotics applications.
Submitted 2 March, 2021;
originally announced March 2021.
-
Peacock Exploration: A Lightweight Exploration for UAV using Control-Efficient Trajectory
Authors:
EungChang Mason Lee,
Duckyu Choi,
Hyun Myung
Abstract:
Unmanned Aerial Vehicles (UAVs) have received much attention in recent years due to their wide range of applications, such as exploration of an unknown environment to acquire a 3D map without prior knowledge of it. Existing exploration methods have been largely challenged by computationally heavy probabilistic path planning. Similarly, kinodynamic constraints or proper sensors considering the payload for UAVs were not considered. In this paper, to solve those issues and to consider the limited payload and computational resource of UAVs, we propose "Peacock Exploration": a lightweight exploration method for UAVs using precomputed minimum snap trajectories which look like a peacock's tail. Using the widely known, control-efficient minimum snap trajectories and OctoMap, the UAV equipped with an RGB-D camera can explore unknown 3D environments without any prior knowledge or human guidance with only O(log N) computational complexity. It also adopts the receding horizon approach and simple, heuristic scoring criteria. The proposed algorithm's performance is demonstrated by exploring a challenging 3D maze environment and compared with a state-of-the-art algorithm.
Submitted 29 December, 2020;
originally announced December 2020.
-
Creating a Physicist: The Impact of Informal Programs on University Student Development
Authors:
Callie Rethman,
Jonathan Perry,
Jonan Donaldson,
Daniel Choi,
Tatiana Erukhimova
Abstract:
Physics outreach programs provide a critical context for informal experiences that promote the transition from new student to contributing physicist. Prior studies have suggested a positive link between participation in informal physics outreach programs and the development of a student's physics identity. In this study, we adopt a student-focused investigation to explore the effects of informal programs on dimensions of physics identity, sense of community, 21st century skill development, and motivation. We employed a mixed methods study combining a survey instrument (117 responses) and interviews (35) with current and former undergraduate and graduate students who participated in five programs through a physics and astronomy department at a large land-grant university. To examine interviews, we employed a framework based on situated learning theory, transformative learning theory, and the Dynamic Systems Model of Role Identity. Our findings, based on self-reported data, show that students who facilitated informal physics programs positively developed their physics identity, experienced increased sense of belonging to the physics community, and developed 21st century career skills. Specifically, students reported positive benefits to their communication, teamwork and networking, and design skills. The benefits of these programs can be achieved by departments of any size without significant commitment of funds or changes to curriculum.
Submitted 29 May, 2021; v1 submitted 27 December, 2020;
originally announced December 2020.
-
Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems
Authors:
Hyungjun Park,
Daiki Min,
Jong-hyun Ryu,
Dong Gu Choi
Abstract:
Typical reinforcement learning (RL) methods show limited applicability for real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. This algorithm has two main features. First, we devise two distance-based Q-value update schemes, incentive update and penalty update, in a distance-based incentive/penalty update technique to enable the agent to decide discrete and continuous actions in the feasible region and to update the value of these types of actions. Second, we propose a method for defining the penalty cost as a shadow price-weighted penalty. This approach affords two advantages compared to previous methods to efficiently induce the agent to not select an infeasible action. We apply our algorithm to an industrial control problem, microgrid system operation, and the experimental results demonstrate its superiority.
Submitted 19 May, 2021; v1 submitted 21 November, 2020;
originally announced November 2020.
-
Nonlinear imaging of nanoscale topological corner states
Authors:
Sergey Kruk,
Wenlong Gao,
Duk Yong Choi,
Thomas Zentgraf,
Shuang Zhang,
Yuri Kivshar
Abstract:
Topological states of light represent counterintuitive optical modes localized at boundaries of finite-size optical structures that originate from the properties of the bulk. Being defined by bulk properties, such boundary states are insensitive to certain types of perturbations, thus naturally enhancing robustness of photonic circuitries. Conventionally, the N-dimensional bulk modes correspond to (N-1)-dimensional boundary states. The higher-order bulk-boundary correspondence relates N-dimensional bulk to boundary states with dimensionality reduced by more than 1. A special interest lies in miniaturization of such higher-order topological states to the nanoscale. Here, we realize nanoscale topological corner states in metasurfaces with C6-symmetric honeycomb lattices. We directly observe nanoscale topology-empowered edge and corner localizations of light and enhancement of light-matter interactions via a nonlinear imaging technique. Control of light at the nanoscale empowered by topology may facilitate miniaturization and on-chip integration of classical and quantum photonic devices.
Submitted 1 September, 2022; v1 submitted 19 November, 2020;
originally announced November 2020.
-
Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
Authors:
Ricky T. Q. Chen,
Dami Choi,
Lukas Balles,
David Duvenaud,
Philipp Hennig
Abstract:
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers, and ultimately this is an interesting step toward constructing self-tuning optimizers.
Submitted 9 November, 2020;
originally announced November 2020.
-
Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models
Authors:
Changmao Li,
Elaine Fisher,
Rebecca Thomas,
Steve Pittard,
Vicki Hertzberg,
Jinho D. Choi
Abstract:
This paper presents a comprehensive study on resume classification to significantly reduce the time and labor needed to screen an overwhelming number of applications, while improving the selection of suitable candidates. A total of 6,492 resumes are extracted from 24,933 job applications for 252 positions designated into four levels of experience for Clinical Research Coordinators (CRC). Each resume is manually annotated to its most appropriate CRC position by experts through several rounds of triple annotation to establish guidelines. As a result, a high Kappa score of 61% is achieved for inter-annotator agreement. Given this dataset, novel transformer-based classification models are developed for two tasks: the first task (T1) takes a resume and classifies it to a CRC level, and the second task (T2) takes both a resume and the description of the job applied for and predicts whether the application is suited to the job. Our best models using section encoding and multi-head attention decoding give results of 73.3% on T1 and 79.2% on T2. Our analysis shows that the prediction errors are mostly made among adjacent CRC levels, which are hard for even experts to distinguish, implying the practical value of our models in real HR platforms.
Submitted 5 November, 2020;
originally announced November 2020.
-
Extracting Chemical-Protein Interactions via Calibrated Deep Neural Network and Self-training
Authors:
Dongha Choi,
Hyunju Lee
Abstract:
The extraction of interactions between chemicals and proteins from biomedical articles is important in many fields of biomedical research such as drug development and prediction of drug side effects. Several natural language processing methods, including deep neural network (DNN) models, have been applied to address this problem. However, these methods were trained with hard-labeled data and tend to become over-confident, which degrades model reliability. To estimate the data uncertainty and improve the reliability, "calibration" techniques have been applied to deep learning models. In this study, to extract chemical-protein interactions, we propose a DNN-based approach incorporating uncertainty information and calibration techniques. Our model first encodes the input sequence using a pre-trained language-understanding model, following which it is trained using two calibration methods: mixup training and addition of a confidence penalty loss. Finally, the model is re-trained with augmented data that are extracted using the estimated uncertainties. Our approach has achieved state-of-the-art performance on the Biocreative VI ChemProt task, while preserving higher calibration abilities than those of previous approaches. Furthermore, our approach also shows the possibility of using uncertainty estimation for performance improvement.
Submitted 4 November, 2020;
originally announced November 2020.
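The two calibration methods named in the abstract above, mixup training and a confidence penalty loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice to mix plain feature vectors, and the `beta` weight are assumptions made for illustration, following the commonly used formulations (a convex combination of two examples and their labels, and cross entropy minus a weighted entropy term).

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two examples and their one-hot labels.

    Assumption: examples are plain feature vectors; lam is in [0, 1].
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def sample_lambda(alpha=0.2):
    """Draw the mixing weight from Beta(alpha, alpha); alpha is a hypothetical default."""
    return random.betavariate(alpha, alpha)


def cross_entropy(probs, target):
    """Cross entropy against a (possibly soft) target distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


def entropy(probs):
    """Shannon entropy of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def penalized_loss(logits, target, beta):
    """Cross entropy minus beta times the output entropy.

    Low-entropy (over-confident) predictions are penalized relative to
    plain cross entropy; beta = 0 recovers plain cross entropy.
    """
    probs = softmax(logits)
    return cross_entropy(probs, target) - beta * entropy(probs)
```

In a training loop one would typically draw `lam = sample_lambda()` per batch, mix pairs of examples with `mixup`, and evaluate `penalized_loss` against the mixed soft labels; how the paper combines the two is not stated in the abstract.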
-
Revealing the Myth of Higher-Order Inference in Coreference Resolution
Authors:
Liyan Xu,
Jinho D. Choi
Abstract:
This paper analyzes the impact of higher-order inference (HOI) on the task of coreference resolution. HOI has been adopted by almost all recent coreference resolution models without much investigation of its true effectiveness over representation learning. To make a comprehensive analysis, we implement an end-to-end coreference system as well as four HOI approaches: attended antecedent, entity equalization, span clustering, and cluster merging, where the latter two are our original methods. We find that given a high-performing encoder such as SpanBERT, the impact of HOI is negative to marginal, providing a new perspective on HOI for this task. Our best model using cluster merging shows an Avg-F1 of 80.2 on the CoNLL 2012 shared task dataset in English.
Submitted 28 September, 2020; v1 submitted 24 September, 2020;
originally announced September 2020.
-
Emora: An Inquisitive Social Chatbot Who Cares For You
Authors:
Sarah E. Finch,
James D. Finch,
Ali Ahmadvand,
Ingyu Choi,
Xiangjue Dong,
Ruixiang Qi,
Harshita Sahijwani,
Sergey Volokhin,
Zihan Wang,
Zihao Wang,
Jinho D. Choi
Abstract:
Inspired by studies on the overwhelming presence of experience-sharing in human-human conversations, Emora, the social chatbot developed by Emory University, aims to bring such experience-focused interaction to the current field of conversational AI. The traditional approach of information-sharing topic handlers is balanced with a focus on opinion-oriented exchanges that Emora delivers, and new conversational abilities are developed that support dialogues that consist of a collaborative understanding and learning process of the partner's life experiences. We present a curated dialogue system that leverages highly expressive natural language templates, powerful intent classification, and ontology resources to provide an engaging and interesting conversational experience to every user.
Submitted 9 September, 2020;
originally announced September 2020.
-
3D Room Layout Estimation Beyond the Manhattan World Assumption
Authors:
Dongho Choi
Abstract:
Predicting 3D room layout from a single image is a challenging task with many applications. In this paper, we propose a new training and post-processing method for 3D room layout estimation, built on a recent state-of-the-art 3D room layout estimation model. Experimental results show our method outperforms state-of-the-art approaches by a large margin in predicting visible room layout. Our method obtained 3rd place at the 2020 Holistic Scene Structures for 3D Vision Workshop.
Submitted 6 September, 2020;
originally announced September 2020.
-
Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions
Authors:
Sara Evensen,
Chang Ge,
Dongjin Choi,
Çağatay Demiralp
Abstract:
Data programming is a programmatic weak supervision approach to efficiently curate large-scale labeled training data. Writing data programs (labeling functions) requires, however, both programming literacy and domain expertise. Many subject matter experts have neither programming proficiency nor time to effectively write data programs. Furthermore, regardless of one's expertise in coding or machine learning, transferring domain expertise into labeling functions by enumerating rules and thresholds is not only time consuming but also inherently difficult. Here we propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users. DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics such as identifying relevant signals for labeling tasks. We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification by using span-level annotations of users on document examples. We compare Ruler with conventional data programming through a user study conducted with 10 data scientists creating labeling functions for sentiment and spam classification tasks. We find that Ruler is easier to use and learn and offers higher overall satisfaction, while providing discriminative model performances comparable to ones achieved by conventional data programming.
Submitted 15 September, 2020; v1 submitted 3 September, 2020;
originally announced September 2020.
-
Molecular templates of spin textures on superconducting surfaces
Authors:
Cristina Mier,
Benjamin Verlhac,
Léo Garnier,
Roberto Robles,
Laurent Limot,
Nicolás Lorente,
Deung-Jang Choi
Abstract:
We create ordered islands of magnetically anisotropic nickelocene molecules on a Pb (111) substrate. By using inelastic electron tunneling spectra (IETS) and density functional theory, we characterize the magnetic response of these islands. This allows us to conclude that the islands present local and collective magnetic excitations. Furthermore, we show that nickelocene islands present complex non-collinear spin patterns on the superconducting Pb (111) surface, opening the possibility of using molecular arrays to engineer spin textures with important implications on topological superconductivity.
Submitted 1 September, 2020;
originally announced September 2020.
-
High-harmonic generation from metasurfaces empowered by bound states in the continuum
Authors:
George Zograf,
Kirill Koshelev,
Anastasia Zalogina,
Viacheslav Korolev,
Duk-Yong Choi,
Michael Zurch,
Christian Spielmann,
Barry Luther-Davies,
Daniil Kartashov,
Sergey Makarov,
Sergey Kruk,
Yuri Kivshar
Abstract:
The concept of optical bound states in the continuum (BICs) underpins the existence of strongly localized waves embedded into the radiation spectrum that can enhance the electromagnetic fields in subwavelength photonic structures. Early studies of optical BICs in waveguides and photonic crystals uncovered their topological properties, and the concept of quasi-BIC metasurfaces facilitated applications of strong light-matter interactions to biosensing, lasing, and low-order nonlinear processes. Here we employ BIC-empowered dielectric metasurfaces to generate efficiently high optical harmonics up to the 11th order. We optimize a BIC mode for the first few harmonics and observe a transition between perturbative and nonperturbative nonlinear regimes. We also suggest a general strategy for designing subwavelength structures with strong resonances and nonperturbative nonlinearities. Our work bridges the fields of perturbative and nonperturbative nonlinear optics on the subwavelength scale.
Submitted 26 August, 2020;
originally announced August 2020.
-
Quantum amplitude estimation algorithms on IBM quantum devices
Authors:
Pooja Rao,
Kwangmin Yu,
Hyunkyung Lim,
Dasol Jin,
Deokkyu Choi
Abstract:
Since the publication of the Quantum Amplitude Estimation (QAE) algorithm by Brassard et al., 2002, several variations have been proposed, such as Aaronson et al., 2019, Grinko et al., 2019, and Suzuki et al., 2020. The main difference between the original and the variants is the exclusion of Quantum Phase Estimation (QPE) by the latter. This difference is notable given that QPE is the key component of original QAE, but is composed of many operations considered expensive for the current NISQ era devices. We compare two recently proposed variants (Grinko et al., 2019 and Suzuki et al., 2020) by implementing them on the IBM Quantum device using Qiskit, an open source framework for quantum computing. We analyze and discuss advantages of each algorithm from the point of view of their implementation and performance on a quantum computer.
Submitted 3 August, 2020;
originally announced August 2020.
-
XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders
Authors:
Xiangjue Dong,
Jinho D. Choi
Abstract:
This paper presents six document classification models using the latest transformer encoders and a high-performing ensemble model for a task of offensive language identification in social media. For the individual models, deep transformer layers are applied to perform multi-head attentions. For the ensemble model, the utterance representations taken from those individual models are concatenated and fed into a linear decoder to make the final decisions. Our ensemble model outperforms the individual models and shows up to 8.6% improvement over the individual models on the development set. On the test set, it achieves macro-F1 of 90.9% and becomes one of the high performing systems among 85 participants in the sub-task A of this shared task. Our analysis shows that although the ensemble model significantly improves the accuracy on the development set, the improvement is not as evident on the test set.
Submitted 21 July, 2020;
originally announced July 2020.
-
The Completed SDSS-IV Extended Baryon Oscillation Spectroscopic Survey: N-body Mock Challenge for Galaxy Clustering Measurements
Authors:
Graziano Rossi,
Peter D. Choi,
Jeongin Moon,
Julian E. Bautista,
Hector Gil-Marin,
Romain Paviot,
Mariana Vargas-Magana,
Sylvain de la Torre,
Sebastien Fromenteau,
Ashley J. Ross,
Santiago Avila,
Etienne Burtin,
Kyle S. Dawson,
Stephanie Escoffier,
Salman Habib,
Katrin Heitmann,
Jiamin Hou,
Eva-Maria Mueller,
Will J. Percival,
Alex Smith,
Cheng Zhao,
Gong-Bo Zhao
Abstract:
We develop a series of N-body data challenges, functional to the final analysis of the extended Baryon Oscillation Spectroscopic Survey (eBOSS) Data Release 16 (DR16) galaxy sample. The challenges are primarily based on high-fidelity catalogs constructed from the Outer Rim simulation - a large box size realization (3 Gpc/h) characterized by an unprecedented combination of volume and mass resolution, down to 1.85x10^9 M_sun/h. We generate synthetic galaxy mocks by populating Outer Rim halos with a variety of halo occupation distribution (HOD) schemes of increasing complexity, spanning different redshift intervals. We then assess the performance of three complementary redshift space distortion (RSD) models in configuration and Fourier space, adopted for the analysis of the complete DR16 eBOSS sample of Luminous Red Galaxies (LRGs). We find all the methods mutually consistent, with comparable systematic errors on the Alcock-Paczynski parameters and the growth of structure, and robust to different HOD prescriptions - thus validating the robustness of the models and the pipelines used for the baryon acoustic oscillation (BAO) and full shape clustering analysis. In particular, all the techniques are able to recover a_par and a_perp to within 0.9%, and fsig8 to within 1.5%. As a by-product of our work, we are also able to gain interesting insights on the galaxy-halo connection. Our study is relevant for the final eBOSS DR16 `consensus cosmology', as the systematic error budget is informed by testing the results of analyses against these high-resolution mocks. In addition, it is also useful for future large-volume surveys, since similar mock-making techniques and systematic corrections can be readily extended to model for instance the Dark Energy Spectroscopic Instrument (DESI) galaxy sample.
Submitted 25 March, 2021; v1 submitted 17 July, 2020;
originally announced July 2020.
-
The Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: Large-scale Structure Catalogs for Cosmological Analysis
Authors:
Ashley J. Ross,
Julian Bautista,
Rita Tojeiro,
Shadab Alam,
Stephen Bailey,
Etienne Burtin,
Johan Comparat,
Kyle S. Dawson,
Arnaud de Mattia,
Hélion du Mas des Bourboux,
Héctor Gil-Marín,
Jiamin Hou,
Hui Kong,
Brad W. Lyke,
Faizan G. Mohammad,
John Moustakas,
Eva-Maria Mueller,
Adam D. Myers,
Will J. Percival,
Anand Raichoor,
Mehdi Rezaie,
Hee-Jong Seo,
Alex Smith,
Jeremy L. Tinker,
Pauline Zarrouk
, et al. (31 additional authors not shown)
Abstract:
We present large-scale structure catalogs from the completed extended Baryon Oscillation Spectroscopic Survey (eBOSS). Derived from Sloan Digital Sky Survey (SDSS) -IV Data Release 16 (DR16), these catalogs provide the data samples, corrected for observational systematics, and random positions sampling the survey selection function. Combined, they allow large-scale clustering measurements suitable for testing cosmological models. We describe the methods used to create these catalogs for the eBOSS DR16 Luminous Red Galaxy (LRG) and Quasar samples. The quasar catalog contains 343,708 redshifts with $0.8 < z < 2.2$ over 4,808\,deg$^2$. We combine 174,816 eBOSS LRG redshifts over 4,242\,deg$^2$ in the redshift interval $0.6 < z < 1.0$ with SDSS-III BOSS LRGs in the same redshift range to produce a combined sample of 377,458 galaxy redshifts distributed over 9,493\,deg$^2$. Improved algorithms for estimating redshifts allow that 98 per cent of LRG observations result in a successful redshift, with less than one per cent catastrophic failures ($Δz > 1000$ ${\rm km~s}^{-1}$). For quasars, these rates are 95 and 2 per cent (with $Δz > 3000$ ${\rm km~s}^{-1}$). We apply corrections for trends between the number densities of our samples and the properties of the imaging and spectroscopic data. For example, the quasar catalog obtains a $χ^2$/DoF$= 776/10$ for a null test against imaging depth before corrections and a $χ^2$/DoF$=6/8$ after. The catalogs, combined with careful consideration of the details of their construction found here-in, allow companion papers to present cosmological results with negligible impact from observational systematic uncertainties.
Submitted 30 September, 2020; v1 submitted 17 July, 2020;
originally announced July 2020.
-
The Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: measurement of the BAO and growth rate of structure of the luminous red galaxy sample from the anisotropic power spectrum between redshifts 0.6 and 1.0
Authors:
Héctor Gil-Marín,
Julián E. Bautista,
Romain Paviot,
Mariana Vargas-Magaña,
Sylvain de la Torre,
Sebastien Fromenteau,
Shadab Alam,
Santiago Ávila,
Etienne Burtin,
Chia-Hsun Chuang,
Kyle S. Dawson,
Jiamin Hou,
Arnaud de Mattia,
Faizan G. Mohammad,
Eva-Maria Müller,
Seshadri Nadathur,
Richard Neveux,
Will J. Percival,
Anand Raichoor,
Mehdi Rezaie,
Ashley J. Ross,
Graziano Rossi,
Vanina Ruhlmann-Kleider,
Alex Smith,
Amélie Tamone
, et al. (15 additional authors not shown)
Abstract:
We analyse the clustering of the Sloan Digital Sky Survey IV extended Baryon Oscillation Spectroscopic Survey Data Release 16 luminous red galaxy sample (DR16 eBOSS LRG) in combination with the high redshift tail of the Sloan Digital Sky Survey III Baryon Oscillation Spectroscopic Survey Data Release 12 (DR12 BOSS CMASS). We measure the redshift space distortions (RSD) and also extract the longitudinal and transverse baryonic acoustic oscillation (BAO) scale from the anisotropic power spectrum signal inferred from 377,458 galaxies between redshifts 0.6 and 1.0, with effective redshift of $z_{\rm eff}=0.698$ and effective comoving volume of $2.72\,{\rm Gpc}^3$. After applying reconstruction we measure the BAO scale and infer $D_H(z_{\rm eff})/r_{\rm drag} = 19.30\pm 0.56$ and $D_M(z_{\rm eff})/r_{\rm drag} =17.86 \pm 0.37$. When we perform a redshift space distortions analysis on the pre-reconstructed catalogue on the monopole, quadrupole and hexadecapole we find, $D_H(z_{\rm eff})/r_{\rm drag} = 20.18\pm 0.78$, $D_M(z_{\rm eff})/r_{\rm drag} =17.49 \pm 0.52$ and $fσ_8(z_{\rm eff})=0.454\pm0.046$. We combine both sets of results along with the measurements in configuration space of \cite{LRG_corr} and report the following consensus values: $D_H(z_{\rm eff})/r_{\rm drag} = 19.77\pm 0.47$, $D_M(z_{\rm eff})/r_{\rm drag} = 17.65\pm 0.30$ and $fσ_8(z_{\rm eff})=0.473\pm 0.044$, which are in full agreement with the standard $Λ$CDM and GR predictions. These results represent the most precise measurements within the redshift range $0.6\leq z \leq 1.0$ and are the culmination of more than 8 years of SDSS observations.
Submitted 21 December, 2020; v1 submitted 17 July, 2020;
originally announced July 2020.
-
The Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: measurement of the BAO and growth rate of structure of the luminous red galaxy sample from the anisotropic correlation function between redshifts 0.6 and 1
Authors:
Julian E. Bautista,
Romain Paviot,
Mariana Vargas Magaña,
Sylvain de la Torre,
Sebastien Fromenteau,
Hector Gil-Marín,
Ashley J. Ross,
Etienne Burtin,
Kyle S. Dawson,
Jiamin Hou,
Jean-Paul Kneib,
Arnaud de Mattia,
Will J. Percival,
Graziano Rossi,
Rita Tojeiro,
Cheng Zhao,
Gong-Bo Zhao,
Shadab Alam,
Joel Brownstein,
Michael J. Chapman,
Peter D. Choi,
Chia-Hsun Chuang,
Stéphanie Escoffier,
Axel de la Macorra,
Hélion du Mas des Bourboux
, et al. (8 additional authors not shown)
Abstract:
We present the cosmological analysis of the configuration-space anisotropic clustering in the completed Sloan Digital Sky Survey IV (SDSS-IV) extended Baryon Oscillation Spectroscopic Survey (eBOSS) DR16 galaxy sample. This sample consists of luminous red galaxies (LRGs) spanning the redshift range $0.6 < z < 1$, at an effective redshift of $z_{\rm eff}=0.698$. It combines 174 816 eBOSS LRGs and 202 642 BOSS CMASS galaxies. We extract and model the baryon acoustic oscillations (BAO) and redshift-space distortions (RSD) features from the galaxy two-point correlation function to infer geometrical and dynamical cosmological constraints. The adopted methodology is extensively tested on a set of realistic simulations. The correlations between the inferred parameters from the BAO and full-shape correlation function analyses are estimated. This allows us to derive joint constraints on the three cosmological parameter combinations: $D_M(z)/r_d$, $D_H(z)/r_d$ and $fσ_8(z)$, where $D_M$ is the comoving angular diameter distance, $D_H$ is Hubble distance, $r_d$ is the comoving BAO scale, $f$ is the linear growth rate of structure, and $σ_8$ is the amplitude of linear matter perturbations. After combining the results with those from the parallel power spectrum analysis of Gil-Marin et al. 2020, we obtain the constraints: $D_M/r_d = 17.65 \pm 0.30$, $D_H/r_d = 19.77 \pm 0.47$, $fσ_8 = 0.473 \pm 0.044$. These measurements are consistent with a flat $Λ$CDM model with standard gravity.
Submitted 21 September, 2020; v1 submitted 17 July, 2020;
originally announced July 2020.
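The distance ratios quoted in the abstract above can be illustrated with a short calculation. The sketch below evaluates $D_M/r_d$ and $D_H/r_d$ at $z_{\rm eff}=0.698$ in a flat $Λ$CDM model, using the standard definitions $D_H(z) = c/H(z)$ and $D_M(z) = c\int_0^z dz'/H(z')$. The values of `H0`, `OMEGA_M` and `R_D` are illustrative assumptions (Planck-like numbers) and are not taken from the listing; radiation is neglected, and this is not the authors' analysis pipeline.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def e_of_z(z, omega_m):
    """Dimensionless expansion rate H(z)/H0 for flat LCDM, radiation neglected."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def hubble_distance(z, h0, omega_m):
    """D_H(z) = c / H(z), in Mpc."""
    return C_KM_S / (h0 * e_of_z(z, omega_m))


def comoving_angular_diameter_distance(z, h0, omega_m, steps=1000):
    """D_M(z) = (c / H0) * integral_0^z dz' / E(z') for a flat universe.

    The integral is evaluated with composite Simpson's rule.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = z / steps
    total = 1.0 / e_of_z(0.0, omega_m) + 1.0 / e_of_z(z, omega_m)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight / e_of_z(i * h, omega_m)
    return (C_KM_S / h0) * total * h / 3.0


# Assumed, illustrative parameter values (not from the listing).
H0 = 67.4        # km/s/Mpc
OMEGA_M = 0.315  # matter density parameter
R_D = 147.09     # comoving BAO scale in Mpc

Z_EFF = 0.698    # effective redshift quoted in the abstract

dm_over_rd = comoving_angular_diameter_distance(Z_EFF, H0, OMEGA_M) / R_D
dh_over_rd = hubble_distance(Z_EFF, H0, OMEGA_M) / R_D
print(f"D_M/r_d = {dm_over_rd:.2f}, D_H/r_d = {dh_over_rd:.2f}")
```

The printed ratios can be set against the constraints quoted in the abstract; changing the assumed parameters shows how sensitive each ratio is to the background cosmology.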
-
The Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: Cosmological Implications from two Decades of Spectroscopic Surveys at the Apache Point Observatory
Authors:
eBOSS Collaboration,
Shadab Alam,
Marie Aubert,
Santiago Avila,
Christophe Balland,
Julian E. Bautista,
Matthew A. Bershady,
Dmitry Bizyaev,
Michael R. Blanton,
Adam S. Bolton,
Jo Bovy,
Jonathan Brinkmann,
Joel R. Brownstein,
Etienne Burtin,
Solene Chabanier,
Michael J. Chapman,
Peter Doohyun Choi,
Chia-Hsun Chuang,
Johan Comparat,
Andrei Cuceu,
Kyle S. Dawson,
Axel de la Macorra,
Sylvain de la Torre,
Arnaud de Mattia,
Victoria de Sainte Agathe
, et al. (75 additional authors not shown)
Abstract:
We present the cosmological implications from final measurements of clustering using galaxies, quasars, and Ly$α$ forests from the completed Sloan Digital Sky Survey (SDSS) lineage of experiments in large-scale structure. These experiments, composed of data from SDSS, SDSS-II, BOSS, and eBOSS, offer independent baryon acoustic oscillation (BAO) measurements of angular-diameter distances and Hubble distances relative to the sound horizon, $r_d$, from eight different samples and six measurements of the growth rate parameter, $fσ_8$, from redshift-space distortions (RSD). This composite sample is the most constraining of its kind and allows us to perform a comprehensive assessment of the cosmological model after two decades of dedicated spectroscopic observation. We show that the BAO data alone are able to rule out dark-energy-free models at more than eight standard deviations in an extension to the flat $Λ$CDM model that allows for curvature. When combined with Planck Cosmic Microwave Background (CMB) measurements of temperature and polarization, the BAO data provide nearly an order of magnitude improvement on curvature constraints. The RSD measurements indicate a growth rate that is consistent with predictions from Planck primary data and with General Relativity. When combining the results of SDSS BAO and RSD with external data, all multiple-parameter extensions remain consistent with a $Λ$CDM model. Regardless of cosmological model, the precision on $Ω_Λ$, $H_0$, and $σ_8$ remains at roughly 1\%, showing changes of less than 0.6\% in the central values between models. The inverse distance ladder measurement under an o$w_0w_a$CDM model yields $H_0= 68.20 \pm 0.81 \, \rm km\, s^{-1} Mpc^{-1}$, remaining in tension with several direct determination methods. (abridged)
Submitted 9 July, 2024; v1 submitted 17 July, 2020;
originally announced July 2020.
-
PathGAN: Local Path Planning with Attentive Generative Adversarial Networks
Authors:
Dooseop Choi,
Seung-jun Han,
Kyoungwook Min,
Jeongdan Choi
Abstract:
To achieve autonomous driving without high-definition maps, we present a model capable of generating multiple plausible paths from egocentric images for autonomous vehicles. Our generative model comprises two neural networks: the feature extraction network (FEN) and path generation network (PGN). The FEN extracts meaningful features from an egocentric image, whereas the PGN generates multiple paths from the features, given a driving intention and speed. To ensure that the paths generated are plausible and consistent with the intention, we introduce an attentive discriminator and train it with the PGN under the generative adversarial network framework. We also devise an interaction model between the positions in the paths and the intentions hidden in the positions and design a novel PGN architecture that reflects the interaction model, resulting in the improvement of the accuracy and diversity of the generated paths. Finally, we introduce ETRIDriving, a dataset for autonomous driving in which the recorded sensor data are labeled with discrete high-level driving actions, and demonstrate the state-of-the-art performance of the proposed model on ETRIDriving in terms of accuracy and diversity.
Submitted 2 March, 2021; v1 submitted 7 July, 2020;
originally announced July 2020.
-
The spontaneous symmetry breaking in Ta$_2$NiSe$_5$ is structural in nature
Authors:
Edoardo Baldini,
Alfred Zong,
Dongsung Choi,
Changmin Lee,
Marios H. Michael,
Lukas Windgaetter,
Igor I. Mazin,
Simone Latini,
Doron Azoury,
Baiqing Lv,
Anshul Kogar,
Yao Wang,
Yangfan Lu,
Tomohiro Takayama,
Hidenori Takagi,
Andrew J. Millis,
Angel Rubio,
Eugene Demler,
Nuh Gedik
Abstract:
The excitonic insulator is an electronically-driven phase of matter that emerges upon the spontaneous formation and Bose condensation of excitons. Detecting this exotic order in candidate materials is a subject of paramount importance, as the size of the excitonic gap in the band structure establishes the potential of this collective state for superfluid energy transport. However, the identification of this phase in real solids is hindered by the coexistence of a structural order parameter with the same symmetry as the excitonic order. Only a few materials are currently believed to host a dominant excitonic phase, Ta$_2$NiSe$_5$ being the most promising. Here, we test this scenario by using an ultrashort laser pulse to quench the broken-symmetry phase of this transition metal chalcogenide. Tracking the dynamics of the material's electronic and crystal structure after light excitation reveals surprising spectroscopic fingerprints that are only compatible with a primary order parameter of phononic nature. We rationalize our findings through state-of-the-art calculations, confirming that the structural order accounts for most of the electronic gap opening. Not only do our results uncover the long-sought mechanism driving the phase transition of Ta$_2$NiSe$_5$, but they also conclusively rule out any substantial excitonic character in this instability.
Submitted 6 July, 2020;
originally announced July 2020.
-
Delayed Q-update: A novel credit assignment technique for deriving an optimal operation policy for the Grid-Connected Microgrid
Authors:
Hyungjun Park,
Daiki Min,
Jong-hyun Ryu,
Dong Gu Choi
Abstract:
A microgrid is an innovative system that integrates distributed energy resources to supply electricity demand within electrical boundaries. This study proposes an approach for deriving a desirable microgrid operation policy that enables sophisticated controls in the microgrid system using the proposed novel credit assignment technique, delayed-Q update. The technique employs novel features such as the ability to tackle and resolve the delayed effective property of the microgrid, which prevents learning agents from deriving a well-fitted policy under sophisticated controls. The proposed technique tracks the history of the charging period and retroactively assigns an adjusted value to the ESS charging control. The operation policy derived using the proposed approach is well-fitted for the real effects of ESS operation because of the process of the technique. Therefore, it supports the search for a near-optimal operation policy under a sophisticatedly controlled microgrid environment. To validate our technique, we simulate the operation policy under a real-world grid-connected microgrid system and demonstrate the convergence to a near-optimal policy by comparing the performance measures of our policy with those of a benchmark policy and the optimal policy.
Submitted 20 October, 2020; v1 submitted 30 June, 2020;
originally announced June 2020.
-
Sequential Feature Filtering Classifier
Authors:
Minseok Seo,
Jaemin Lee,
Jongchan Park,
Dong-Geol Choi
Abstract:
We propose Sequential Feature Filtering Classifier (FFC), a simple but effective classifier for convolutional neural networks (CNNs). With sequential LayerNorm and ReLU, FFC zeroes out low-activation units and preserves high-activation units. The sequential feature filtering process generates multiple features, which are fed into a shared classifier for multiple outputs. FFC can be applied to any CNN with a classifier and significantly improves performance with negligible overhead. We extensively validate the efficacy of FFC on various tasks: ImageNet-1K classification, MS COCO detection, Cityscapes segmentation, and HMDB51 action recognition. Moreover, we empirically show that FFC can further improve performance when combined with other techniques, including attention modules and augmentation techniques. The code and models will be publicly available.
Submitted 21 June, 2020;
originally announced June 2020.
-
Gradient Estimation with Stochastic Softmax Tricks
Authors:
Max B. Paulus,
Dami Choi,
Daniel Tarlow,
Andreas Krause,
Chris J. Maddison
Abstract:
The Gumbel-Max trick is the basis of many relaxed gradient estimators. These estimators are easy to implement and low variance, but the goal of scaling them comprehensively to large combinatorial distributions is still outstanding. Working within the perturbation model framework, we introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces. Our framework is a unified perspective on existing relaxed estimators for perturbation models, and it contains many novel relaxations. We design structured relaxations for subset selection, spanning trees, arborescences, and others. When compared to less structured baselines, we find that stochastic softmax tricks can be used to train latent variable models that perform better and discover more latent structure.
Submitted 28 February, 2021; v1 submitted 14 June, 2020;
originally announced June 2020.
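The Gumbel-Max trick named in the abstract above can be sketched for the ordinary categorical case: adding independent Gumbel(0, 1) noise to the logits and taking the argmax yields a sample from the softmax distribution, and replacing the argmax with a temperature-controlled softmax gives the Gumbel-Softmax relaxation. The code below is a minimal illustration of these two standard constructions only; the function names are hypothetical, and it does not implement the paper's generalization to combinatorial spaces.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) variate via inverse transform sampling."""
    u = max(rng.random(), 1e-300)  # guard against log(0)
    return -math.log(-math.log(u))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_max_sample(logits, rng):
    """Return an index distributed as Categorical(softmax(logits))."""
    perturbed = [logit + sample_gumbel(rng) for logit in logits]
    return max(range(len(logits)), key=perturbed.__getitem__)


def gumbel_softmax_sample(logits, temperature, rng):
    """Return a relaxed (continuous) one-hot sample; sharper as temperature -> 0."""
    perturbed = [(logit + sample_gumbel(rng)) / temperature for logit in logits]
    return softmax(perturbed)


# Empirical check that Gumbel-Max sampling follows the softmax probabilities.
rng = random.Random(0)
logits = [math.log(0.2), math.log(0.3), math.log(0.5)]
n = 20000
counts = [0] * len(logits)
for _ in range(n):
    counts[gumbel_max_sample(logits, rng)] += 1
freqs = [c / n for c in counts]
print("empirical:", freqs)
print("softmax:  ", softmax(logits))
```

The relaxed sample from `gumbel_softmax_sample` is differentiable with respect to the logits in an autodiff framework, which is what makes such estimators usable for gradient-based training; the pure-Python version here only shows the sampling step.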
-
Emora STDM: A Versatile Framework for Innovative Dialogue System Development
Authors:
James D. Finch,
Jinho D. Choi
Abstract:
This demo paper presents Emora STDM (State Transition Dialogue Manager), a dialogue system development framework that provides novel workflows for rapid prototyping of chat-based dialogue managers as well as collaborative development of complex interactions. Our framework caters to a wide range of expertise levels by supporting interoperability between two popular approaches to dialogue management, state machine and information state. Our Natural Language Expression package allows seamless integration of pattern matching, custom NLP modules, and database querying, which makes the workflows much more efficient. As a user study, we apply this framework in an interdisciplinary undergraduate course where students with both technical and non-technical backgrounds are able to develop creative dialogue managers in a short period of time.
Submitted 10 June, 2020;
originally announced June 2020.
-
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols
Authors:
Sarah E. Finch,
Jinho D. Choi
Abstract:
As conversational AI-based dialogue management has increasingly become a trending topic, the need for a standardized and reliable evaluation procedure grows even more pressing. The current state of affairs suggests various evaluation protocols to assess chat-oriented dialogue management systems, rendering it difficult to conduct fair comparative studies across different approaches and gain an insightful understanding of their values. To foster this research, a more robust evaluation protocol must be set in place. This paper presents a comprehensive synthesis of both automated and human evaluation methods on dialogue systems, identifying their shortcomings while accumulating evidence towards the most effective evaluation dimensions. A total of 20 papers from the last two years are surveyed to analyze three types of evaluation protocols: automated, static, and interactive. Finally, the evaluation dimensions used in these papers are compared against our expert evaluation on the system-user dialogue data collected from the Alexa Prize 2020.
Submitted 10 June, 2020;
originally announced June 2020.
-
Analysis of the Penn Korean Universal Dependency Treebank (PKT-UD): Manual Revision to Build Robust Parsing Model in Korean
Authors:
Tae Hwan Oh,
Ji Yoon Han,
Hyonsu Choe,
Seokwon Park,
Han He,
Jinho D. Choi,
Na-Rae Han,
Jena D. Hwang,
Hansaem Kim
Abstract:
In this paper, we first raise important issues regarding the Penn Korean Universal Treebank (PKT-UD) and address these issues by revising the entire corpus manually with the aim of producing cleaner UD annotations that are more faithful to Korean grammar. For compatibility with the rest of the UD corpora, we follow the UDv2 guidelines and extensively revise the part-of-speech tags and the dependency relations to reflect morphological features and flexible word-order aspects in Korean. The original and the revised versions of PKT-UD are evaluated with transformer-based parsing models using biaffine attention. The parsing model trained on the revised corpus shows a significant improvement of 3.0% in labeled attachment score over the model trained on the previous corpus. Our error analysis demonstrates that this revision allows the parsing model to learn relations more robustly, reducing several critical errors that used to be made by the previous model.
Submitted 26 May, 2020;
originally announced May 2020.
-
Transformer-based Context-aware Sarcasm Detection in Conversation Threads from Social Media
Authors:
Xiangjue Dong,
Changmao Li,
Jinho D. Choi
Abstract:
We present a transformer-based sarcasm detection model that accounts for the context from the entire conversation thread for more robust predictions. Our model uses deep transformer layers to perform multi-head attentions among the target utterance and the relevant context in the thread. The context-aware models are evaluated on two datasets from social media, Twitter and Reddit, and show 3.1% and 7.0% improvements over their baselines. Our best models give the F1-scores of 79.0% and 75.0% for the Twitter and Reddit datasets respectively, becoming one of the highest performing systems among 36 participants in this shared task.
Submitted 22 May, 2020;
originally announced May 2020.
-
Fast magneto-ionic switching of interface anisotropy using yttria-stabilized zirconia gate oxide
Authors:
Ki-Young Lee,
Sujin Jo,
Aik Jun Tan,
Mantao Huang,
Dongwon Choi,
Jung Hoon Park,
Ho-Il Ji,
Ji-Won Son,
Joonyeon Chang,
Geoffrey S. D. Beach,
Seonghoon Woo
Abstract:
Voltage control of interfacial magnetism has been greatly highlighted in spintronics research for many years, as it might enable ultra-low power technologies. Among the few suggested approaches, magneto-ionic control of magnetism has demonstrated large modulation of magnetic anisotropy. Moreover, the recent demonstration of magneto-ionic devices using hydrogen ions presented relatively fast magnetization toggle switching, $t_{\rm sw} \sim 100$ ms, at room temperature. However, the operation speed may need to be significantly improved to be used for modern electronic devices. Here, we demonstrate that the speed of proton-induced magnetization toggle switching largely depends on proton-conducting oxides. We achieve ~1 ms reliable ($> 10^3$ cycles) switching using yttria-stabilized zirconia (YSZ), which is ~100 times faster than the state-of-the-art magneto-ionic devices reported to date at room temperature. Our results suggest further engineering of the proton-conducting materials could bring substantial improvement that may enable new low-power computing schemes based on magneto-ionics.
Submitted 5 May, 2020;
originally announced May 2020.
-
Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning
Authors:
Liyan Xu,
Julien Hogan,
Rachel E. Patzer,
Jinho D. Choi
Abstract:
This paper presents a reinforcement learning approach to extract noise in long clinical documents for the task of readmission prediction after kidney transplant. We face the challenges of developing robust models on a small dataset where each document may consist of over 10K tokens and is full of noise, including tabular text and task-irrelevant sentences. We first experiment with four types of encoders to empirically decide the best document representation, and then apply reinforcement learning to remove noisy text from the long documents, which models the noise extraction process as a sequential decision problem. Our results show that the old bag-of-words encoder outperforms deep learning-based encoders on this task, and reinforcement learning is able to improve upon the baseline while pruning out 25% of the text segments. Our analysis shows that reinforcement learning is able to identify both typical noisy tokens and task-specific noisy text.
Submitted 23 May, 2020; v1 submitted 4 May, 2020;
originally announced May 2020.
-
Integrated Eojeol Embedding for Erroneous Sentence Classification in Korean Chatbots
Authors:
DongHyun Choi,
IlNam Park,
Myeong Cheol Shin,
EungGyun Kim,
Dong Ryeol Shin
Abstract:
This paper attempts to analyze the Korean sentence classification system for a chatbot. Sentence classification is the task of classifying an input sentence based on predefined categories. However, spelling or spacing errors in the input sentence cause problems in morphological analysis and tokenization. This paper proposes a novel approach of Integrated Eojeol (Korean syntactic word separated by space) Embedding to reduce the effect that poorly analyzed morphemes may have on sentence classification. It also proposes two noise insertion methods that further improve classification performance. Our evaluation results indicate that the proposed system classifies erroneous sentences more accurately than the baseline system by 17%p.
Submitted 12 April, 2020;
originally announced April 2020.
-
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
Authors:
Changmao Li,
Jinho D. Choi
Abstract:
We introduce a novel approach to transformers that learns hierarchical representations in multiparty dialogue. First, three language modeling tasks are used to pre-train the transformers, token- and utterance-level language modeling and utterance order prediction, that learn both token and utterance embeddings for better understanding in dialogue contexts. Then, multi-task learning between the utterance prediction and the token span prediction is applied to fine-tune for span-based question answering (QA). Our approach is evaluated on the FriendsQA dataset and shows improvements of 3.8% and 1.4% over the two state-of-the-art transformer models, BERT and RoBERTa, respectively.
Submitted 23 May, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.