-
AutoTM 2.0: Automatic Topic Modeling Framework for Documents Analysis
Authors:
Maria Khodorchenko,
Nikolay Butakov,
Maxim Zuev,
Denis Nasonov
Abstract:
In this work, we present AutoTM 2.0, a framework for optimizing additively regularized topic models. Compared to the previous version, it includes such improvements as a novel optimization pipeline, LLM-based quality metrics, and a distributed mode.
AutoTM 2.0 is a convenient tool for specialists as well as non-specialists who work with text documents, whether to conduct exploratory data analysis or to perform a clustering task on an interpretable set of features. Quality evaluation is based on specially developed metrics such as coherence and GPT-4-based approaches. Researchers and practitioners can easily integrate new optimization algorithms and adapt novel metrics to enhance modeling quality and extend their experiments.
We show that AutoTM 2.0 achieves better performance compared to the previous AutoTM by providing results on 5 datasets with different features and in two different languages.
Submitted 1 October, 2024;
originally announced October 2024.
-
Improvement of Computational Performance of Evolutionary AutoML in a Heterogeneous Environment
Authors:
Nikolay O. Nikitin,
Sergey Teryoshkin,
Valerii Pokrovskii,
Sergey Pakulin,
Denis Nasonov
Abstract:
Resource-intensive computations are a major factor that limits the effectiveness of automated machine learning solutions. In this paper, we propose a modular approach that can be used to increase the quality of evolutionary optimization for modelling pipelines with a graph-based structure. It consists of several stages: parallelization, caching, and evaluation. Heterogeneous and remote resources can be involved in the evaluation stage. The conducted experiments confirm the correctness and effectiveness of the proposed approach. The implemented algorithms are available as part of the open-source framework FEDOT.
Submitted 12 January, 2023;
originally announced January 2023.
-
User profiles matching for different social networks based on faces embeddings
Authors:
Timur Sokhin,
Nikolay Butakov,
Denis Nasonov
Abstract:
It is common practice nowadays to use multiple social networks for different social roles. However, these networks differ in content type, communication, and style of speech. If we intend to understand human behaviour as a key feature for recommender systems, banking risk assessments, or sociological research, this is better achieved by combining data from different social media. In this paper, we propose a new approach for matching user profiles across social media based on embeddings of publicly available users' face photos and conduct an experimental study of its efficiency. Our approach is robust to changes in content and style for certain social media.
Submitted 15 May, 2019;
originally announced May 2019.
-
Reconstructions in human history by mapping dental markers in living Eurasian populations
Authors:
Vera F. Kashibadze,
Olga G. Nasonova,
Dmitry S. Nasonov
Abstract:
Building on advances in gene geography and anthropophenetics, a phenogeographical method for anthropological research is introduced and tested using dental data. Statistical and cartographical analyses are provided for 498 living Eurasian populations. Mapping the principal components provided evidence for the phene pool structure in Eurasian populations and for reconstructions of our species' history on the continent. Longitudinal variability appears to be the most important regularity revealed by principal component analysis (PCA) and mapping, supporting the division of the whole area into western and eastern main provinces. Thus, the most ancient scenario in the history of Eurasian populations developed from two distinct groups: a western group related to ancient populations of West Asia and an eastern one with ancestral roots in South and/or East Asia. In spite of the enormous territory and the revealed divergence, the populations of the continent have undergone wide-scale and intensive interaction across time and space. Many details in the revealed landscapes can be traced to different historical events. The most striking results concern evidence for migration and assimilation as two essential phenomena in Eurasian history: the wide spread of the western combination across the whole continent as far as the Pacific coastline, and the envisioned movement of paradoxical combinations of eastern and western markers from South or Central Asia to the east and to the west. Given that no additional eastern combinations have been found in the total variation of Asian groups, only mixed or western marker sets, and that eastern dental characteristics are traced in Asia since Homo erectus, an assumption is made in favour of hetero-level assimilation in the Eastern province and of net-like evolution of our species.
Submitted 17 July, 2011;
originally announced July 2011.