-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Non-backtracking Graph Neural Networks
Authors:
Seonghyun Park,
Narae Ryu,
Gahee Kim,
Dongyeop Woo,
Se-Young Yun,
Sungsoo Ahn
Abstract:
The celebrated message-passing updates for graph neural networks allow representing large-scale graphs with local and computationally tractable updates. However, the updates suffer from backtracking, i.e., a message flowing through the same edge twice and revisiting the previously visited node. Since the number of message flows increases exponentially with the number of updates, the redundancy in local updates prevents the graph neural network from accurately recognizing a particular message flow relevant for downstream tasks. In this work, we propose to resolve such a redundancy issue via the non-backtracking graph neural network (NBA-GNN) that updates a message without incorporating the message from the previously visited node. We theoretically investigate how NBA-GNN alleviates the over-squashing of GNNs, and establish a connection between NBA-GNN and the impressive performance of non-backtracking updates for stochastic block model recovery. Furthermore, we empirically verify the effectiveness of our NBA-GNN on the long-range graph benchmark and transductive node classification problems.
Submitted 25 September, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
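A minimal sketch of the non-backtracking idea described in the abstract above: a message on the directed edge i -> j is rebuilt from the messages arriving at i, except the one coming back from j. The function name `non_backtracking_step`, the plain sum aggregation, and the scalar messages are assumptions made for illustration. The paper's NBA-GNN uses learned updates that are not reproduced here.

```python
from collections import defaultdict


def non_backtracking_step(edges, messages):
    """One illustrative non-backtracking update on directed-edge messages.

    edges:    iterable of undirected pairs (u, v)
    messages: dict mapping a directed edge (src, dst) to a float
    Returns a new dict with the same keys.
    """
    # Build the undirected neighbourhood of every node.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    updated = {}
    for (i, j) in messages:
        # Aggregate messages k -> i for every neighbour k of i except j,
        # so the message never flows straight back along the edge it
        # is about to traverse.
        updated[(i, j)] = sum(
            messages[(k, i)] for k in neighbors[i] if k != j
        )
    return updated


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with every directed message initialised to 1.
    path_edges = [(0, 1), (1, 2)]
    init = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0}
    print(non_backtracking_step(path_edges, init))
```

On the path graph, the messages leaving the two end nodes become zero, because the only incoming message at an end node comes from the node it would be sent back to.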
-
360$^\circ$ Reconstruction From a Single Image Using Space Carved Outpainting
Authors:
Nuri Ryu,
Minsu Gong,
Geonung Kim,
Joo-Haeng Lee,
Sunghyun Cho
Abstract:
We introduce POP3D, a novel framework that creates a full $360^\circ$-view 3D model from a single image. POP3D resolves two prominent issues that limit the single-view reconstruction. Firstly, POP3D offers substantial generalizability to arbitrary categories, a trait that previous methods struggle to achieve. Secondly, POP3D further improves reconstruction fidelity and naturalness, a crucial aspect that concurrent works fall short of. Our approach marries the strengths of four primary components: (1) a monocular depth and normal predictor that serves to predict crucial geometric cues, (2) a space carving method capable of demarcating the potentially unseen portions of the target object, (3) a generative model pre-trained on a large-scale image dataset that can complete unseen regions of the target, and (4) a neural implicit surface reconstruction method tailored in reconstructing objects using RGB images along with monocular geometric cues. The combination of these components enables POP3D to readily generalize across various in-the-wild images and generate state-of-the-art reconstructions, outperforming similar works by a significant margin. Project page: http://cg.postech.ac.kr/research/POP3D
Submitted 18 September, 2023;
originally announced September 2023.
-
Aligning Language Models with Preferences through f-divergence Minimization
Authors:
Dongyoung Go,
Tomasz Korbak,
Germán Kruszewski,
Jos Rozen,
Nahyeon Ryu,
Marc Dymetman
Abstract:
Aligning language models with preferences can be posed as approximating a target distribution representing some desired behavior. Existing approaches differ both in the functional form of the target distribution and the algorithm used to approximate it. For instance, Reinforcement Learning from Human Feedback (RLHF) corresponds to minimizing a reverse KL from an implicit target distribution arising from a KL penalty in the objective. On the other hand, Generative Distributional Control (GDC) has an explicit target distribution and minimizes a forward KL from it using the Distributional Policy Gradient (DPG) algorithm. In this paper, we propose a new approach, f-DPG, which allows the use of any f-divergence to approximate any target distribution that can be evaluated. f-DPG unifies both frameworks (RLHF, GDC) and the approximation methods (DPG, RL with KL penalties). We show the practical benefits of various choices of divergence objectives and demonstrate that there is no universally optimal objective but that different divergences present different alignment and diversity trade-offs. We show that Jensen-Shannon divergence strikes a good balance between these objectives, and frequently outperforms forward KL divergence by a wide margin, leading to significant improvements over prior work. These distinguishing characteristics between divergences persist as the model size increases, highlighting the importance of selecting appropriate divergence objectives.
Submitted 6 June, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
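To make the divergences named in the abstract above concrete, the sketch below computes KL and Jensen-Shannon divergences between two small discrete distributions. It is only an illustration of the quantities involved, not the f-DPG algorithm. The helper names `kl` and `js`, and the convention that "forward KL" means KL(target || model) while "reverse KL" means KL(model || target), are assumptions stated here rather than taken from the abstract.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute 0. Returns math.inf if q_i == 0
    somewhere that p_i > 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def js(p, q):
    """Jensen-Shannon divergence: the average KL to the midpoint mixture."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    target = [0.7, 0.2, 0.1]  # hypothetical target distribution
    model = [0.4, 0.4, 0.2]   # hypothetical model distribution
    print("forward KL (target || model):", kl(target, model))
    print("reverse KL (model || target):", kl(model, target))
    print("Jensen-Shannon:", js(target, model))
```

Unlike either direction of KL, the Jensen-Shannon divergence is symmetric in its arguments and stays finite (at most log 2) even when the two distributions have disjoint support.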
-
Dr.3D: Adapting 3D GANs to Artistic Drawings
Authors:
Wonjoon Jin,
Nuri Ryu,
Geonung Kim,
Seung-Hwan Baek,
Sunghyun Cho
Abstract:
While 3D GANs have recently demonstrated the high-quality synthesis of multi-view consistent images and 3D shapes, they are mainly restricted to photo-realistic human portraits. This paper aims to extend 3D GANs to a different, but meaningful visual form: artistic portrait drawings. However, extending existing 3D GANs to drawings is challenging due to the inevitable geometric ambiguity present in drawings. To tackle this, we present Dr.3D, a novel adaptation approach that adapts an existing 3D GAN to artistic drawings. Dr.3D is equipped with three novel components to handle the geometric ambiguity: a deformation-aware 3D synthesis network, an alternating adaptation of pose estimation and image synthesis, and geometric priors. Experiments show that our approach can successfully adapt 3D GANs to drawings and enable multi-view consistent semantic editing of drawings.
Submitted 30 November, 2022;
originally announced November 2022.
-
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Authors:
Boseop Kim,
HyoungSeok Kim,
Sang-Woo Lee,
Gichang Lee,
Donghyun Kwak,
Dong Hyeon Jeon,
Sunghyun Park,
Sungju Kim,
Seonhoon Kim,
Dongpil Seo,
Heungsub Lee,
Minyoung Jeong,
Sungjae Lee,
Minsub Kim,
Suk Hyun Ko,
Seokhun Kim,
Taeyong Park,
Jinuk Kim,
Soyoung Kang,
Na-Hyeon Ryu,
Kang Min Yoo,
Minsuk Chang,
Soobin Suh,
Sookyo In,
Jinseong Park
, et al. (12 additional authors not shown)
Abstract:
GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performances of different sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performances on various downstream tasks in Korean. Also, we show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. Then we discuss the possibility of materializing the No Code AI paradigm by providing AI prototyping capabilities to non-experts of ML by introducing HyperCLOVA studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.
Submitted 28 November, 2021; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Regret in Online Recommendation Systems
Authors:
Kaito Ariu,
Narae Ryu,
Se-Young Yun,
Alexandre Proutière
Abstract:
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items. Importantly, an item cannot be recommended twice to the same user. The probabilities that a user likes each item are unknown. The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities. We investigate various structural assumptions on these probabilities: we derive for each structure regret lower bounds, and devise algorithms achieving these limits. Interestingly, our analysis reveals the relative weights of the different components of regret: the component due to the constraint of not presenting the same item twice to the same user, that due to learning the chances users like items, and finally that arising when learning the underlying structure.
Submitted 23 October, 2020;
originally announced October 2020.
-
Voice Search and Typed Search Performance Comparison on Baidu Search System
Authors:
Hanqing Huang,
Kezia Irene,
Nahyun Ryu
Abstract:
Although the voice search system is getting more and more developed, some people still have difficulties when searching for information with the voice search system. This paper is a pilot study to compare the search performance of people using voice search and typed search using Baidu search system. We surveyed and interviewed 40 Chinese students who have been using the Baidu search system. Afterward, we analyzed 8 people who had a middle to advanced searching ability by their behaviors, search results, and average query length. We found that there are a lot of variations among the participants' time when searching for different queries, and there were some interesting behaviors that were displayed by a number of participants. We conclude that more participants are needed to make a firm conclusion on the performance comparison between the voice search and typed search.
Submitted 17 November, 2019;
originally announced November 2019.
-
Community Detection with Colored Edges
Authors:
Narae Ryu,
Sae-Young Chung
Abstract:
In this paper, we prove a sharp limit on the community detection problem with colored edges. We assume two equal-sized communities and there are $m$ different types of edges. If two vertices are in the same community, the distribution of edges follows $p_i=α_i\log{n}/n$ for $1\leq i \leq m$, otherwise the distribution of edges is $q_i=β_i\log{n}/n$ for $1\leq i \leq m$, where $α_i$ and $β_i$ are positive constants and $n$ is the total number of vertices. Under these assumptions, a fundamental limit on community detection is characterized using the Hellinger distance between the two distributions. If $\sum_{i=1}^{m} {(\sqrt{α_i} - \sqrt{β_i})}^2 >2$, then the community detection via maximum likelihood (ML) estimator is possible with high probability. If $\sum_{i=1}^m {(\sqrt{α_i} - \sqrt{β_i})}^2 < 2$, the probability that the ML estimator fails to detect the communities does not go to zero.
Submitted 12 January, 2017;
originally announced February 2017.
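The threshold stated in the abstract above can be checked numerically. The sketch below evaluates $\sum_{i=1}^{m} (\sqrt{α_i} - \sqrt{β_i})^2$ and compares it with 2. The function names and the returned labels are hypothetical, and the case where the sum equals exactly 2 is not covered by the abstract, so it is reported separately.

```python
import math


def hellinger_sum(alphas, betas):
    """Return sum_i (sqrt(alpha_i) - sqrt(beta_i))^2 for positive constants."""
    if len(alphas) != len(betas):
        raise ValueError("alphas and betas must have the same length")
    return sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(alphas, betas)
    )


def ml_recovery_regime(alphas, betas):
    """Classify edge-color parameters against the threshold of 2."""
    s = hellinger_sum(alphas, betas)
    if s > 2:
        return "possible"  # ML detection succeeds with high probability
    if s < 2:
        return "fails"     # ML failure probability does not go to zero
    return "boundary"      # the abstract makes no claim at exactly 2


if __name__ == "__main__":
    # One edge color with alpha = 9, beta = 1: (3 - 1)^2 = 4 > 2.
    print(ml_recovery_regime([9.0], [1.0]))
    # One edge color with alpha = 4, beta = 1: (2 - 1)^2 = 1 < 2.
    print(ml_recovery_regime([4.0], [1.0]))
```

The boundary example in the test uses two colors that each contribute 1, so two edge types that are individually below the threshold can reach it jointly.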