-
Theory of nonlinear whispering gallery mode dynamics in a cylindrical microresonator with a radius variation
Authors:
Alena Yu. Kolesnikova,
Ilya D. Vatnik
Abstract:
We propose a comprehensive model describing the Kerr nonlinear dynamics of an electric field in a cylindrical microresonator with an effective radius variation, coupled to a radiation source. The proposed system of equations for coupled azimuthal modes takes into account full azimuthal dispersion as well as the influence of the radiation source on the field in the microresonator, with the coupling coefficients determined experimentally. The model appears to be a powerful tool for studying nonlinear effects, generation of axial-azimuthal modes and optical frequency combs. We illustrate the power of the model by optimizing the coupling point of the light source, obtaining a two-orders-of-magnitude improvement in the nonlinear threshold.
Submitted 10 March, 2023;
originally announced March 2023.
-
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary
Authors:
Alina Kolesnikova,
Yuri Kuratov,
Vasily Konovalov,
Mikhail Burtsev
Abstract:
Today, transformer language models serve as a core component for the majority of natural language processing tasks. Industrial application of such models requires minimization of computation time and memory footprint. Knowledge distillation is one of the approaches to address this goal. Existing methods in this field mainly focus on reducing the number of layers or the dimension of embeddings/hidden representations. An alternative option is to reduce the number of tokens in the vocabulary and therefore the embedding matrix of the student model. The main problem with vocabulary minimization is the mismatch between the input sequences and output class distributions of the teacher and student models. As a result, it is impossible to directly apply KL-based knowledge distillation. We propose two simple yet effective alignment techniques that enable knowledge distillation to students with a reduced vocabulary. Evaluation of distilled models on a number of common benchmarks for Russian, such as Russian SuperGLUE, SberQuAD, RuSentiment, ParaPhraser, and Collection-3, demonstrated that our techniques achieve compression from $17\times$ to $49\times$ while maintaining the quality of a $1.7\times$ compressed student that keeps the full-sized vocabulary and only reduces the number of Transformer layers. We make our code and distilled models available.
Submitted 4 May, 2022;
originally announced May 2022.
-
Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting
Authors:
Yaodong Yang,
Alisa Kolesnikova,
Stefan Lessmann,
Tiejun Ma,
Ming-Chien Sung,
Johnnie E. V. Johnson
Abstract:
The paper examines the potential of deep learning to support decisions in financial risk management. We develop a deep learning model for predicting whether individual spread traders secure profits from future trades. This task embodies typical modeling challenges faced in risk and behavior forecasting. Conventional machine learning requires data that is representative of the feature-target relationship and relies on the often costly development, maintenance, and revision of handcrafted features. Consequently, modeling highly variable, heterogeneous patterns such as trader behavior is challenging. Deep learning promises a remedy. By learning hierarchical distributed representations of the data in an automatic manner (e.g., risk-taking behavior), it uncovers generative features that determine the target (e.g., trader's profitability), avoids manual feature engineering, and is more robust toward change (e.g., dynamic market conditions). The results of employing a deep network for operational risk forecasting confirm the feature learning capability of deep learning, provide guidance on designing a suitable network architecture, and demonstrate the superiority of deep learning over machine learning and rule-based benchmarks.
Submitted 17 November, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.