-
Limitations of Online Play Content for Parents of Infants and Toddlers
Authors:
Keunwoo Park,
Subin Ahn,
Mina Jung,
You Jung Cho,
Seulah Jeong,
Cheong-Ah Huh
Abstract:
Play is a fundamental aspect of developmental growth, yet many parents encounter significant challenges in fulfilling their caregiving roles in this area. As online content increasingly serves as the primary source of parental guidance, this study investigates the difficulties parents face related to play and evaluates the limitations of current online content. We identified ten findings through in-depth interviews with nine parents who reported struggles in engaging with their children during play. Based on these findings, we discuss the major limitations of online play content and suggest how they can be improved. These recommendations include minimizing parental anxiety, accommodating diverse play scenarios, providing credible and personalized information, encouraging creativity, and delivering the same content in multiple formats.
Submitted 4 January, 2025; v1 submitted 24 November, 2024;
originally announced November 2024.
-
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Authors:
Kiho Park,
Yo Joong Choe,
Yibo Jiang,
Victor Veitch
Abstract:
The linear representation hypothesis is the informal idea that semantic concepts are encoded as linear directions in the representation spaces of large language models (LLMs). Previous work has shown how to make this notion precise for representing binary concepts that have natural contrasts (e.g., {male, female}) as directions in representation space. However, many natural concepts do not have natural contrasts (e.g., whether the output is about an animal). In this work, we show how to extend the formalization of the linear representation hypothesis to represent features (e.g., is_animal) as vectors. This allows us to immediately formalize the representation of categorical concepts as polytopes in the representation space. Further, we use the formalization to prove a relationship between the hierarchical structure of concepts and the geometry of their representations. We validate these theoretical results on the Gemma and LLaMA-3 large language models, estimating representations for 900+ hierarchically related concepts using data from WordNet.
Submitted 17 February, 2025; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Combining Evidence Across Filtrations
Authors:
Yo Joong Choe,
Aaditya Ramdas
Abstract:
In sequential anytime-valid inference, any admissible procedure must be based on e-processes: generalizations of test martingales that quantify the accumulated evidence against a composite null hypothesis at any stopping time. This paper proposes a method for combining e-processes constructed in different filtrations but for the same null. Although e-processes in the same filtration can be combined effortlessly (by averaging), e-processes in different filtrations cannot because their validity in a coarser filtration does not translate to a finer filtration. This issue arises in sequential tests of randomness and independence, as well as in the evaluation of sequential forecasters. We establish that a class of functions called adjusters can lift arbitrary e-processes across filtrations. The result yields a generally applicable "adjust-then-combine" procedure, which we demonstrate on the problem of testing randomness in real-world financial data. Furthermore, we prove a characterization theorem for adjusters that formalizes a sense in which using adjusters is necessary. There are two major implications. First, if we have a powerful e-process in a coarsened filtration, then we readily have a powerful e-process in the original filtration. Second, when we coarsen the filtration to construct an e-process, there is a logarithmic cost to recovering validity in the original filtration.
Submitted 15 February, 2025; v1 submitted 14 February, 2024;
originally announced February 2024.
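The abstract above notes that e-processes in the same filtration can be combined by averaging. A minimal sketch of that easy case follows; the function name `average_e_processes` and the toy values are hypothetical and not taken from the paper, and the adjust-then-combine procedure for different filtrations is not shown.

```python
from typing import List, Sequence


def average_e_processes(paths: Sequence[Sequence[float]]) -> List[float]:
    """Pointwise average of several e-process paths observed at the same time steps.

    Assumes every path is nonnegative and that all processes live in the
    same filtration, which is the easy case the abstract mentions.
    """
    if not paths:
        raise ValueError("need at least one e-process path")
    length = len(paths[0])
    if any(len(p) != length for p in paths):
        raise ValueError("all paths must have the same length")
    if any(v < 0 for p in paths for v in p):
        raise ValueError("e-process values must be nonnegative")
    return [sum(p[t] for p in paths) / len(paths) for t in range(length)]


# Toy paths (made-up numbers for illustration only).
e1 = [1.0, 2.0, 4.0]
e2 = [1.0, 0.5, 2.0]
combined = average_e_processes([e1, e2])
```

The average stays nonnegative and, by linearity of expectation, inherits the expectation bound shared by its inputs, which is why no adjustment is needed in this case.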
-
Heterogeneous LoRA for Federated Fine-tuning of On-Device Foundation Models
Authors:
Yae Jee Cho,
Luyang Liu,
Zheng Xu,
Aldi Fahrezi,
Gauri Joshi
Abstract:
Foundation models (FMs) adapt well to specific domains or tasks with fine-tuning, and federated learning (FL) enables the potential for privacy-preserving fine-tuning of the FMs with on-device local data. For federated fine-tuning of FMs, we consider the FMs with small to medium parameter sizes of single digit billion at maximum, referred to as on-device FMs (ODFMs) that can be deployed on devices for inference but can only be fine-tuned with parameter efficient methods. In our work, we tackle the data and system heterogeneity problem of federated fine-tuning of ODFMs by proposing a novel method using heterogeneous low-rank approximations (LoRAs), namely HetLoRA. First, we show that the naive approach of using homogeneous LoRA ranks across devices faces a trade-off between overfitting and slow convergence, and thus propose HetLoRA, which allows heterogeneous ranks across client devices and efficiently aggregates and distributes these heterogeneous LoRA modules. By applying rank self-pruning locally and sparsity-weighted aggregation at the server, HetLoRA combines the advantages of high and low-rank LoRAs, which achieves improved convergence speed and final performance compared to homogeneous LoRA. Furthermore, HetLoRA offers enhanced computation efficiency compared to full fine-tuning, making it suitable for federated fine-tuning across heterogeneous devices.
Submitted 20 February, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
The Linear Representation Hypothesis and the Geometry of Large Language Models
Authors:
Kiho Park,
Yo Joong Choe,
Victor Veitch
Abstract:
Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does "linear representation" actually mean? And, how do we make sense of geometric notions (e.g., cosine similarity or projection) in the representation space? To answer these, we use the language of counterfactuals to give two formalizations of "linear representation", one in the output (word) representation space, and one in the input (sentence) space. We then prove these connect to linear probing and model steering, respectively. To make sense of geometric notions, we use the formalization to identify a particular (non-Euclidean) inner product that respects language structure in a sense we make precise. Using this causal inner product, we show how to unify all notions of linear representation. In particular, this allows the construction of probes and steering vectors using counterfactual pairs. Experiments with LLaMA-2 demonstrate the existence of linear representations of concepts, the connection to interpretation and control, and the fundamental role of the choice of inner product.
Submitted 17 July, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
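To illustrate why the choice of inner product matters for geometric notions such as cosine similarity, here is a minimal sketch of an inner product of the form x^T M y for a symmetric positive-definite matrix M. The matrix `skewed` is an arbitrary example chosen for illustration; it is not the causal inner product defined in the paper, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def inner(x: Vector, y: Vector, m: Matrix) -> float:
    """Inner product <x, y>_M = x^T M y for a symmetric positive-definite M."""
    return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def cosine(x: Vector, y: Vector, m: Matrix) -> float:
    """Cosine similarity measured with the inner product induced by M."""
    return inner(x, y, m) / math.sqrt(inner(x, x, m) * inner(y, y, m))


identity = [[1.0, 0.0], [0.0, 1.0]]  # Euclidean inner product
skewed = [[2.0, 1.0], [1.0, 2.0]]    # arbitrary SPD matrix, illustration only

u, v = [1.0, 0.0], [0.0, 1.0]
euclidean_cos = cosine(u, v, identity)  # u and v are orthogonal here
skewed_cos = cosine(u, v, skewed)       # the same vectors are not orthogonal under M
```

The same pair of vectors is orthogonal under one inner product and not under the other, so statements about similarity or projection only make sense once the inner product is fixed.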
-
Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels
Authors:
Yae Jee Cho,
Gauri Joshi,
Dimitrios Dimitriadis
Abstract:
Many existing FL methods assume clients with fully-labeled data, while in realistic settings, clients have limited labels due to the expensive and laborious process of labeling. Limited labeled local data of the clients often leads to their local model having poor generalization abilities to their larger unlabeled local data, such as having class-distribution mismatch with the unlabeled data. As a result, clients may instead look to benefit from the global model trained across clients to leverage their unlabeled data, but this also becomes difficult due to data heterogeneity across clients. In our work, we propose FedLabel where clients selectively choose the local or global model to pseudo-label their unlabeled data depending on which is more of an expert of the data. We further utilize both the local and global models' knowledge via global-local consistency regularization which minimizes the divergence between the two models' outputs when they have identical pseudo-labels for the unlabeled data. Unlike other semi-supervised FL baselines, our method does not require additional experts other than the local or global model, nor require additional parameters to be communicated. We also do not assume any server-labeled data or fully labeled clients. For both cross-device and cross-silo settings, we show that FedLabel outperforms other semi-supervised FL baselines by $8$-$24\%$, and even outperforms standard fully supervised FL baselines ($100\%$ labeled data) with only $5$-$20\%$ of labeled data.
Submitted 17 July, 2023;
originally announced July 2023.
-
Counterfactually Comparing Abstaining Classifiers
Authors:
Yo Joong Choe,
Aditya Gangrade,
Aaditya Ramdas
Abstract:
Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about. These classifiers are becoming increasingly popular in high-stakes decision-making problems, as they can withhold uncertain predictions to improve their reliability and safety. When evaluating black-box abstaining classifier(s), however, we lack a principled approach that accounts for what the classifier would have predicted on its abstentions. These missing predictions matter when they can eventually be utilized, either directly or as a backup option in a failure mode. In this paper, we introduce a novel approach and perspective to the problem of evaluating and comparing abstaining classifiers by treating abstentions as missing data. Our evaluation approach is centered around defining the counterfactual score of an abstaining classifier, defined as the expected performance of the classifier had it not been allowed to abstain. We specify the conditions under which the counterfactual score is identifiable: if the abstentions are stochastic, and if the evaluation data is independent of the training data (ensuring that the predictions are missing at random), then the score is identifiable. Note that, if abstentions are deterministic, then the score is unidentifiable because the classifier can perform arbitrarily poorly on its abstentions. Leveraging tools from observational causal inference, we then develop nonparametric and doubly robust methods to efficiently estimate this quantity under identification. Our approach is examined in both simulated and real data experiments.
Submitted 9 November, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
On the Convergence of Federated Averaging with Cyclic Client Participation
Authors:
Yae Jee Cho,
Pranay Sharma,
Gauri Joshi,
Zheng Xu,
Satyen Kale,
Tong Zhang
Abstract:
Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL). Previous convergence analyses of FedAvg either assume full client participation or partial client participation where the clients can be uniformly sampled. However, in practical cross-device FL systems, only a subset of clients that satisfy local criteria such as battery status, network connectivity, and maximum participation frequency requirements (to ensure privacy) are available for training at a given time. As a result, client availability follows a natural cyclic pattern. We provide (to our knowledge) the first theoretical framework to analyze the convergence of FedAvg with cyclic client participation with several different client optimizers such as GD, SGD, and shuffled SGD. Our analysis discovers that cyclic client participation can achieve a faster asymptotic convergence rate than vanilla FedAvg with uniform client participation under suitable conditions, providing valuable insights into the design of client sampling protocols.
Submitted 6 February, 2023;
originally announced February 2023.
-
Maximizing Global Model Appeal in Federated Learning
Authors:
Yae Jee Cho,
Divyansh Jhunjhunwala,
Tian Li,
Virginia Smith,
Gauri Joshi
Abstract:
Federated learning typically considers collaboratively training a global model using local data at edge clients. Clients may have their own individual requirements, such as having a minimal training loss threshold, which they expect to be met by the global model. However, due to client heterogeneity, the global model may not meet each client's requirements, and only a small subset may find the global model appealing. In this work, we explore the problem of the global model lacking appeal to the clients due to not being able to satisfy local requirements. We propose MaxFL, which aims to maximize the number of clients that find the global model appealing. We show that having a high global model appeal is important to maintain an adequate pool of clients for training, and can directly improve the test accuracy on both seen and unseen clients. We provide convergence guarantees for MaxFL and show that MaxFL achieves a $22$-$40\%$ and $18$-$50\%$ test accuracy improvement for the training clients and unseen clients respectively, compared to a wide range of FL modeling approaches, including those that tackle data heterogeneity, aim to incentivize clients, and learn personalized or fair models.
Submitted 4 February, 2023; v1 submitted 30 May, 2022;
originally announced May 2022.
-
Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning
Authors:
Yae Jee Cho,
Andre Manoel,
Gauri Joshi,
Robert Sim,
Dimitrios Dimitriadis
Abstract:
Federated learning (FL) enables edge-devices to collaboratively learn a model without disclosing their private data to a central aggregating server. Most existing FL algorithms require models of identical architecture to be deployed across the clients and server, making it infeasible to train large models due to clients' limited system resources. In this work, we propose a novel ensemble knowledge transfer method named Fed-ET in which small models (different in architecture) are trained on clients, and used to train a larger model at the server. Unlike in conventional ensemble learning, in FL the ensemble can be trained on clients' highly heterogeneous data. Cognizant of this property, Fed-ET uses a weighted consensus distillation scheme with diversity regularization that efficiently extracts reliable consensus from the ensemble while improving generalization by exploiting the diversity within the ensemble. We show the generalization bound for the ensemble of weighted models trained on heterogeneous datasets that supports the intuition of Fed-ET. Our experiments on image and language tasks show that Fed-ET significantly outperforms other state-of-the-art FL algorithms with fewer communicated parameters, and is also robust against high data-heterogeneity.
Submitted 27 April, 2022;
originally announced April 2022.
-
Comparing Sequential Forecasters
Authors:
Yo Joong Choe,
Aaditya Ramdas
Abstract:
Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework, in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis -- whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters.
Submitted 9 November, 2023; v1 submitted 30 September, 2021;
originally announced October 2021.
-
Personalized Federated Learning for Heterogeneous Clients with Clustered Knowledge Transfer
Authors:
Yae Jee Cho,
Jianyu Wang,
Tarun Chiruvolu,
Gauri Joshi
Abstract:
Personalized federated learning (FL) aims to train model(s) that can perform well for individual clients that are highly data and system heterogeneous. Most work in personalized FL, however, assumes using the same model architecture at all clients and increases the communication cost by sending/receiving models. This may not be feasible for realistic scenarios of FL. In practice, clients have highly heterogeneous system-capabilities and limited communication resources. In our work, we propose a personalized FL framework, PerFed-CKT, where clients can use heterogeneous model architectures and do not directly communicate their model parameters. PerFed-CKT uses clustered co-distillation, where clients use logits to transfer their knowledge to other clients that have similar data-distributions. We theoretically show the convergence and generalization properties of PerFed-CKT and empirically show that PerFed-CKT achieves high test accuracy with several orders of magnitude lower communication cost compared to the state-of-the-art personalized FL schemes.
Submitted 16 September, 2021;
originally announced September 2021.
-
Bandit-based Communication-Efficient Client Selection Strategies for Federated Learning
Authors:
Yae Jee Cho,
Samarth Gupta,
Gauri Joshi,
Osman Yağan
Abstract:
Due to communication constraints and intermittent client availability in federated learning, only a subset of clients can participate in each training round. While most prior works assume uniform and unbiased client selection, recent work on biased client selection has shown that selecting clients with higher local losses can improve error convergence speed. However, previously proposed biased selection strategies either require additional communication cost for evaluating the exact local loss or utilize stale local loss, which can even make the model diverge. In this paper, we present a bandit-based communication-efficient client selection strategy UCB-CS that achieves faster convergence with lower communication overhead. We also demonstrate how client selection can be used to improve fairness.
Submitted 14 December, 2020;
originally announced December 2020.
-
Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies
Authors:
Yae Jee Cho,
Jianyu Wang,
Gauri Joshi
Abstract:
Federated learning is a distributed optimization paradigm that enables a large number of resource-limited client nodes to cooperatively train a model without data sharing. Several works have analyzed the convergence of federated learning by accounting for data heterogeneity, communication and computation limitations, and partial client participation. However, they assume unbiased client participation, where clients are selected at random or in proportion to their data sizes. In this paper, we present the first convergence analysis of federated optimization for biased client selection strategies, and quantify how the selection bias affects convergence speed. We reveal that biasing client selection towards clients with higher local loss achieves faster error convergence. Using this insight, we propose Power-of-Choice, a communication- and computation-efficient client selection framework that can flexibly span the trade-off between convergence speed and solution bias. Our experiments demonstrate that Power-of-Choice strategies converge up to $3\times$ faster and give $10\%$ higher test accuracy than the baseline random selection.
Submitted 2 October, 2020;
originally announced October 2020.
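One way to bias client selection toward clients with higher local loss, as the abstract above describes, is to draw a candidate set and keep the highest-loss members. The sketch below is an illustration under assumptions: the abstract does not spell out the procedure, so the uniform candidate sampling, the parameter names `d` and `m`, the function name, and the loss values are all hypothetical.

```python
import random
from typing import Dict, List


def select_high_loss_clients(losses: Dict[str, float], d: int, m: int,
                             rng: random.Random) -> List[str]:
    """Sample d candidate clients uniformly, then keep the m with the highest local loss."""
    if not 1 <= m <= d <= len(losses):
        raise ValueError("require 1 <= m <= d <= number of clients")
    candidates = rng.sample(sorted(losses), d)
    return sorted(candidates, key=lambda c: losses[c], reverse=True)[:m]


# Made-up local losses for four clients.
local_losses = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.7}
rng = random.Random(0)
chosen = select_high_loss_clients(local_losses, d=4, m=2, rng=rng)
```

In this sketch, setting `d` equal to `m` reduces to plain random selection, while a larger `d` pushes the choice further toward high-loss clients, which is one way to move along the trade-off between convergence speed and solution bias.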
-
An Empirical Study of Invariant Risk Minimization
Authors:
Yo Joong Choe,
Jiyeon Ham,
Kyubyong Park
Abstract:
Invariant risk minimization (IRM) (Arjovsky et al., 2019) is a recently proposed framework designed for learning predictors that are invariant to spurious correlations across different training environments. Yet, despite its theoretical justifications, IRM has not been extensively tested across various settings. In an attempt to gain a better understanding of the framework, we empirically investigate several research questions using IRMv1, which is the first practical algorithm proposed to approximately solve IRM. By extending the ColoredMNIST experiment in different ways, we find that IRMv1 (i) performs better as the spurious correlation varies more widely between training environments, (ii) learns an approximately invariant predictor when the underlying relationship is approximately invariant, and (iii) can be extended to an analogous setting for text classification.
Submitted 6 July, 2020; v1 submitted 10 April, 2020;
originally announced April 2020.
-
KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
Authors:
Jiyeon Ham,
Yo Joong Choe,
Kyubyong Park,
Ilji Choi,
Hyungjoon Soh
Abstract:
Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU). Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language. Motivated by this, we construct and release new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, respectively. Following previous approaches, we machine-translate existing English training sets and manually translate development and test sets into Korean. To accelerate research on Korean NLU, we also establish baselines on KorNLI and KorSTS. Our datasets are publicly available at https://github.com/kakaobrain/KorNLUDatasets.
Submitted 5 October, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Jejueo Datasets for Machine Translation and Speech Synthesis
Authors:
Kyubyong Park,
Yo Joong Choe,
Jiyeon Ham
Abstract:
Jejueo was classified as critically endangered by UNESCO in 2010. Although diverse efforts to revitalize it have been made, there have been few computational approaches. Motivated by this, we construct two new Jejueo datasets: Jejueo Interview Transcripts (JIT) and Jejueo Single Speaker Speech (JSS). The JIT dataset is a parallel corpus containing 170k+ Jejueo-Korean sentences, and the JSS dataset consists of 10k high-quality audio files recorded by a native Jejueo speaker and a transcript file. Subsequently, we build neural systems of machine translation and speech synthesis using them. All resources are publicly available via our GitHub repository. We hope that these datasets will attract the interest of both the language and machine learning communities.
Submitted 27 November, 2019;
originally announced November 2019.
-
word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
Authors:
Yo Joong Choe,
Kyubyong Park,
Dongwoo Kim
Abstract:
We present word2word, a publicly available dataset and an open-source Python package for cross-lingual word translations extracted from sentence-level parallel corpora. Our dataset provides top-k word translations in 3,564 (directed) language pairs across 62 languages in OpenSubtitles2018 (Lison et al., 2018). To obtain this dataset, we use a count-based bilingual lexicon extraction model based on the observation that not only source and target words but also source words themselves can be highly correlated. We illustrate that the resulting bilingual lexicons have high coverage and attain competitive translation quality for several language pairs. We wrap our dataset and model in an easy-to-use Python library, which supports downloading and retrieving top-k word translations in any of the supported language pairs as well as computing top-k word translations for custom parallel corpora.
Submitted 27 November, 2019;
originally announced November 2019.
-
A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning
Authors:
Yo Joong Choe,
Jiyeon Ham,
Kyubyong Park,
Yeoil Yoon
Abstract:
Grammatical error correction can be viewed as a low-resource sequence-to-sequence task, because publicly available parallel corpora are limited. To tackle this challenge, we first generate erroneous versions of large unannotated corpora using a realistic noising function. The resulting parallel corpora are subsequently used to pre-train Transformer models. Then, by sequentially applying transfer learning, we adapt these models to the domain and style of the test set. Combined with a context-aware neural spellchecker, our system achieves competitive results in both the restricted and low-resource tracks of the ACL 2019 BEA Shared Task. We release all of our code and materials for reproducibility.
Submitted 2 July, 2019;
originally announced July 2019.
-
Predicting drug-target interaction using 3D structure-embedded graph representations from graph neural networks
Authors:
Jaechang Lim,
Seongok Ryu,
Kyubyong Park,
Yo Joong Choe,
Jiyeon Ham,
Woo Youn Kim
Abstract:
Accurate prediction of drug-target interaction (DTI) is essential for in silico drug design. For this purpose, we propose a novel approach for predicting DTI using a graph neural network (GNN) that directly incorporates the 3D structure of a protein-ligand complex. We also apply a distance-aware graph attention algorithm with gate augmentation to improve the performance of our model. As a result, our model shows better performance than docking and other deep learning methods for both virtual screening and pose prediction. In addition, our model can reproduce the natural population distribution of active and inactive molecules.
Submitted 17 April, 2019;
originally announced April 2019.
-
Discovery of Natural Language Concepts in Individual Units of CNNs
Authors:
Seil Na,
Yo Joong Choe,
Dong-Hyun Lee,
Gunhee Kim
Abstract:
Although deep convolutional networks have achieved improved performance in many natural language tasks, they have been treated as black boxes because they are difficult to interpret. In particular, little is known about how they represent language in their intermediate layers. In an attempt to understand the representations of deep convolutional networks trained on language tasks, we show that individual units are selectively responsive to specific morphemes, words, and phrases, rather than responding to arbitrary and uninterpretable patterns. To analyze this phenomenon quantitatively, we propose a concept alignment method based on how units respond to the replicated text. We conduct analyses with different architectures on multiple datasets for classification and translation tasks and provide new insights into how deep models understand natural language.
Submitted 28 February, 2019; v1 submitted 18 February, 2019;
originally announced February 2019.
-
V2X Downlink Coverage Analysis with a Realistic Urban Vehicular Model
Authors:
Yae Jee Cho,
Kaibin Huang,
Chan-Byoung Chae
Abstract:
As the realization of vehicular communication such as vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2I) is imperative for autonomous driving cars, an understanding of realistic vehicle-to-everything (V2X) models is needed. While previous research has mostly targeted vehicular models in which vehicles are randomly distributed and carrier frequency is not considered, this paper proposes a more realistic analysis of the V2X model. We use a one-dimensional (1D) Poisson cluster process (PCP) to model a realistic scenario of vehicle distribution in an urban area with perpendicular crossing roads and compare the coverage results with previous research that distributed vehicles randomly by a Poisson point process (PPP). Moreover, we incorporate the effect of different carrier frequencies, mmWave and sub-6 GHz, into our analysis by altering the antenna radiation pattern accordingly. Results indicate that while clustering leads to lower outage, using mmWave is even more significant in lowering the outage. Moreover, line-of-sight (LoS) interference links are shown to be more dominant in lowering the outage than the non-line-of-sight (NLoS) links even though they are fewer in number. The analytical results give insight into designing and analyzing urban V2X channels, and are verified by three-dimensional (3D) ray-tracing simulation of an actual urban area.
Submitted 25 June, 2018; v1 submitted 10 May, 2018;
originally announced May 2018.
-
RF Lens-Embedded Antenna Array for mmWave MIMO: Design and Performance
Authors:
Yae Jee Cho,
Gee-Yong Suk,
Byoungnam Kim,
Dong Ku Kim,
Chan-Byoung Chae
Abstract:
The requirement of high data-rate in the fifth generation wireless systems (5G) calls for the ultimate utilization of the wide bandwidth in the mmWave frequency band. Researchers seeking to compensate for mmWave's high path loss and to achieve both gain and directivity have proposed that mmWave multiple-input multiple-output (MIMO) systems make use of beamforming systems. Hybrid beamforming in mmWave demonstrates promising performance in achieving high gain and directivity by using phase shifters at the analog processing block. What remains a problem, however, is the actual implementation of mmWave beamforming systems; to fabricate such a system is costly and complex. With the aim of reducing such cost and complexity, this article presents actual prototypes of the lens antenna as an effective device to be used in the future 5G mmWave hybrid beamforming systems. Using a lens as a passive phase shifter enables beamforming without the heavy network of active phase shifters, while gain and directivity are achieved by the energy-focusing property of the lens. Proposed in this article are two types of lens antennas, one for static and the other for mobile usage. Their performance is evaluated using measurements and simulation data along with link-level analysis via a software defined radio (SDR) platform. Results show the promising potential of the lens antenna for its high gain and directivity, and its improved beam-switching feasibility compared to when a lens is not used. System-level evaluations reveal the significant throughput enhancement in both real indoor and outdoor environments. Moreover, the lens antenna's design issues are also discussed by evaluating different lens sizes.
Submitted 22 January, 2018;
originally announced January 2018.
-
Map-based Millimeter-Wave Channel Models: An Overview, Hybrid Modeling, Data, and Learning
Authors:
Yeon-Geun Lim,
Yae Jee Cho,
MinSoo Sim,
Younsun Kim,
Chan-Byoung Chae,
Reinaldo A. Valenzuela
Abstract:
Compared to the current wireless communication systems, millimeter wave (mm-Wave) promises a wide range of spectrum. As viable alternatives to existing mm-Wave channel models, various map-based channel models with different modeling methods have been widely discussed. Map-based channel models are based on a ray-tracing algorithm and include realistic channel parameters in a given map. Such parameters enable researchers to accurately evaluate novel technologies in the mm-Wave range. Diverse map-based modeling methods result in different modeling objectives, including the characteristics of channel parameters and different complexities of the modeling procedure. This article outlines an overview of map-based mm-Wave channel models and proposes a concept of how they can be utilized to integrate a hardware testbed/sounder with a software testbed/sounder. In addition, we categorize map-based channel parameters and provide guidelines for hybrid modeling. Next, we share the measurement data and the map-based channel parameters with the public. Lastly, we evaluate a machine learning-based beam selection algorithm through the shared database. We expect that the offered guidelines and the shared database will enable researchers to readily design a map-based channel model.
Submitted 10 July, 2019; v1 submitted 24 November, 2017;
originally announced November 2017.
-
Relationship between Cross-Polarization Discrimination (XPD) and Spatial Correlation in Indoor Small-Cell MIMO Systems
Authors:
Yeon-Geun Lim,
Yae Jee Cho,
TaeckKeun Oh,
Yongshik Lee,
Chan-Byoung Chae
Abstract:
In this letter, we present a correlated channel model for a dual-polarization antenna to omnidirectional antennas in indoor small-cell multiple-input multiple-output (MIMO) systems. In an indoor environment, we confirm that the cross-polarization discrimination (XPD) in the direction of angle-of-departure can be represented as the spatial correlation of the MIMO channel. We also evaluate a dual-polarization antenna-based MIMO channel model and a spatially correlated channel model using a three-dimensional (3D) ray-tracing simulator. Furthermore, we provide the equivalent distance between adjacent antennas according to the XPD, providing insights into designing a dual-polarization antenna and its arrays.
Submitted 6 December, 2017; v1 submitted 1 July, 2017;
originally announced July 2017.
-
Effective Enzyme Deployment for Degradation of Interference Molecules in Molecular Communication
Authors:
Yae Jee Cho,
H. Birkan Yilmaz,
Weisi Guo,
Chan-Byoung Chae
Abstract:
In molecular communication, the heavy-tailed nature of molecular signals causes inter-symbol interference (ISI). Because of this, it is difficult to decrease symbol periods and achieve a high data rate. Enzymes have been proposed as a possible solution for ISI mitigation, since they are capable of degrading ISI molecules without deteriorating the molecular communication. While most prior work has assumed an infinite amount of enzymes deployed around the channel, from a resource perspective, it is more efficient to deploy a limited amount of enzymes at particular locations and structures. This paper considers carrying out such deployment at two structures: around the receiver (Rx) and/or the transmitter (Tx) site. For both deployment scenarios, channels with different system environment parameters (Tx-to-Rx distance, size of the enzyme area, and symbol period) are compared with each other to analyze an optimized system environment for ISI mitigation when a limited amount of enzymes is available.
Submitted 18 March, 2017;
originally announced March 2017.
-
A Machine Learning Approach to Model the Received Signal in Molecular Communications
Authors:
H. Birkan Yilmaz,
Changmin Lee,
Yae Jee Cho,
Chan-Byoung Chae
Abstract:
A molecular communication channel is determined by the received signal. Received signal models form the basis for studies focused on modulation, receiver design, capacity, and coding. Therefore, it is crucial to model the number of received molecules until time $t$ analytically. Modeling the diffusion-based molecular communication channel with the first-hitting process is an open issue for a spherical transmitter. In this paper, we utilize artificial neural networks to model the received signal for a spherical transmitter and a perfectly absorbing receiver (i.e., a first-hitting process). The proposed technique may be utilized in other studies that assume a spherical transmitter instead of a point transmitter.
Submitted 18 November, 2016;
originally announced November 2016.
-
Effective inter-symbol interference mitigation with a limited amount of enzymes in molecular communications
Authors:
Yae Jee Cho,
H. Birkan Yilmaz,
Weisi Guo,
Chan-Byoung Chae
Abstract:
In molecular communication via diffusion (MCvD), inter-symbol interference (ISI) is a well-known and severe problem that deteriorates both data rates and link reliability. ISI mainly occurs due to the slow and highly random propagation of the messenger molecules, which causes molecules emitted for previous symbols to interfere with molecules from the current symbol. An effective way to mitigate ISI is to use enzymes to degrade undesired molecules. Prior work on ISI mitigation by enzymes has assumed an infinite amount of enzymes randomly distributed around the molecular channel. Taking a different approach, this paper assumes an MCvD channel with a limited amount of enzymes. The main question this paper addresses is how to deploy these enzymes in an effective structure so that ISI mitigation is maximized. To find an effective MCvD channel environment, this study considers optimization of the shape of the transmitter node, the deployment location and structure, the size of the enzyme-deployed area, and the half-lives of the enzymes. It also analyzes the dependence of the optimum size of the enzyme area on the distance and half-life.
Submitted 18 April, 2016;
originally announced April 2016.