Showing 1–32 of 32 results for author: Ryabinin, M

Searching in archive cs.
  1. arXiv:2504.20039

    cs.CL cs.LG

    AutoJudge: Judge Decoding Without Manual Annotation

    Authors: Roman Garipov, Fedor Velikonivtsev, Ruslan Svirschevski, Vage Egiazarian, Max Ryabinin

    Abstract: We introduce AutoJudge, a framework that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify which of the generated tokens affect the downstream quality of the generated response, relaxing the guarantee so that the "unimportant" tokens can be generated faster. Our appr…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Preprint, Work in progress

  2. arXiv:2502.13252

    cs.CL

    Multilingual Language Model Pretraining using Machine-translated Data

    Authors: Jiayi Wang, Yao Lu, Maurice Weber, Max Ryabinin, David Adelani, Yihong Chen, Raphael Tang, Pontus Stenetorp

    Abstract: High-resource languages such as English enable the pretraining of high-quality large language models (LLMs). The same cannot be said for most other languages, as LLMs still underperform for non-English languages, likely due to a gap in the quality and diversity of the available multilingual pretraining corpora. In this work, we find that machine-translated texts from a single high-quality source…

    Submitted 18 February, 2025; originally announced February 2025.

  3. arXiv:2501.16007

    cs.CR cs.DC

    TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference

    Authors: Jack Min Ong, Matthew Di Ferrante, Aaron Pazdera, Ryan Garner, Sami Jaghouar, Manveer Basra, Max Ryabinin, Johannes Hagemann

    Abstract: Large language models (LLMs) have proven to be very capable, but access to frontier models currently relies on inference providers. This introduces trust challenges: how can we be sure that the provider is using the model configuration they claim? We propose TOPLOC, a novel method for verifiable inference that addresses this problem. TOPLOC leverages a compact locality-sensitive hashing mechanism…

    Submitted 30 May, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: 16 pages, 11 tables, 3 figures

  4. arXiv:2501.08365

    cs.CY cs.AI cs.CL cs.LG

    Towards Best Practices for Open Datasets for LLM Training

    Authors: Stefan Baack, Stella Biderman, Kasia Odrozek, Aviya Skowron, Ayah Bdeir, Jillian Bommarito, Jennifer Ding, Maximilian Gahntz, Paul Keller, Pierre-Carl Langlais, Greg Lindahl, Sebastian Majstorovic, Nik Marda, Guilherme Penedo, Maarten Van Segbroeck, Jennifer Wang, Leandro von Werra, Mitchell Baker, Julie Belião, Kasia Chmielinski, Marzieh Fadaee, Lisa Gutermuth, Hynek Kydlíček, Greg Leppert, EM Lewis-Jong, et al. (14 additional authors not shown)

    Abstract: Many AI companies are training their large language models (LLMs) on data without the permission of the copyright owners. The permissibility of doing so varies by jurisdiction: in countries like the EU and Japan, this is allowed under certain restrictions, while in the United States, the legal landscape is more ambiguous. Regardless of the legal status, concerns from creative producers have led to…

    Submitted 14 January, 2025; originally announced January 2025.

  5. arXiv:2412.16669

    cs.LG cs.CR

    Label Privacy in Split Learning for Large Models with Parameter-Efficient Training

    Authors: Philip Zmushko, Marat Mansurov, Ruslan Svirschevski, Denis Kuznedelev, Max Ryabinin, Aleksandr Beznosikov

    Abstract: As deep learning models become larger and more expensive, many practitioners turn to fine-tuning APIs. These web services allow fine-tuning a model between two parties: the client that provides the data, and the server that hosts the model. While convenient, these APIs raise a new concern: the data of the client is at risk of privacy breach during the training procedure. This challenge presents an…

    Submitted 21 December, 2024; originally announced December 2024.

  6. arXiv:2412.01152

    cs.DC

    INTELLECT-1 Technical Report

    Authors: Sami Jaghouar, Jack Min Ong, Manveer Basra, Fares Obeid, Jannik Straube, Michael Keiblinger, Elie Bakouch, Lucas Atkins, Maziyar Panahi, Charles Goddard, Max Ryabinin, Johannes Hagemann

    Abstract: In this report, we introduce INTELLECT-1, the first 10 billion parameter language model collaboratively trained across the globe, demonstrating that large-scale model training is no longer confined to large corporations but can be achieved through a distributed, community-driven approach. INTELLECT-1 was trained on 1 trillion tokens using up to 14 concurrent nodes distributed across 3 continents,…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: 19 pages, 6 figures

  7. arXiv:2411.12372

    cs.CL cs.LG

    RedPajama: an Open Dataset for Training Large Language Models

    Authors: Maurice Weber, Daniel Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, Ben Athiwaratkun, Rahul Chalamala, Kezhen Chen, Max Ryabinin, Tri Dao, Percy Liang, Christopher Ré, Irina Rish, Ce Zhang

    Abstract: Large language models are increasingly becoming a cornerstone technology in artificial intelligence, the sciences, and society as a whole, yet the optimal strategies for dataset composition and filtering remain largely elusive. Many of the top-performing models lack transparency in their dataset curation and model development processes, posing an obstacle to the development of fully open language…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

  8. arXiv:2410.23956

    cs.CL

    Multilingual Pretraining Using a Large Corpus Machine-Translated from a Single Source Language

    Authors: Jiayi Wang, Yao Lu, Maurice Weber, Max Ryabinin, Yihong Chen, Raphael Tang, Pontus Stenetorp

    Abstract: English, as a very high-resource language, enables the pretraining of high-quality large language models (LLMs). The same cannot be said for most other languages, as leading LLMs still underperform for non-English languages, likely due to a gap in the quality and diversity of the available multilingual pretraining corpora. In this work, we find that machine-translated text from a single high-quali…

    Submitted 5 November, 2024; v1 submitted 31 October, 2024; originally announced October 2024.

  9. arXiv:2406.02532

    cs.CL

    SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

    Authors: Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin

    Abstract: As large language models gain widespread adoption, running them efficiently becomes crucial. Recent works on LLM inference use speculative decoding to achieve extreme speedups. However, most of these works implicitly design their algorithms for high-end datacenter hardware. In this work, we ask the opposite question: how fast can we run LLMs on consumer machines? Consumer GPUs can no longer fit th…

    Submitted 30 November, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  10. arXiv:2404.05904

    cs.CL

    The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

    Authors: Giwon Hong, Aryo Pradipta Gema, Rohit Saxena, Xiaotang Du, Ping Nie, Yu Zhao, Laura Perez-Beltrachini, Max Ryabinin, Xuanli He, Clémentine Fourrier, Pasquale Minervini

    Abstract: Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text. However, these models are prone to "hallucinations" -- outputs that do not align with factual reality or the input context. This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and com…

    Submitted 17 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  11. arXiv:2402.12374

    cs.CL

    Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

    Authors: Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen

    Abstract: As the usage of large language models (LLMs) grows, performing efficient inference with these models becomes increasingly important. While speculative decoding has recently emerged as a promising direction for speeding up inference, existing methods are limited in their ability to scale to larger speculation budgets, and adapt to different hyperparameters and hardware. This paper introduces Sequoi…

    Submitted 29 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  12. arXiv:2401.06766

    cs.CL

    Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements

    Authors: Anton Voronov, Lena Wolf, Max Ryabinin

    Abstract: Large language models demonstrate a remarkable capability for learning to solve new tasks from a few examples. The prompt template, or the way the input examples are formatted to obtain the prompt, is an important yet often overlooked aspect of in-context learning. In this work, we conduct a comprehensive study of the template format's influence on the in-context learning performance. We evaluate…

    Submitted 6 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted to Findings of ACL 2024. 24 pages, 10 figures. Code: https://github.com/yandex-research/mind-your-format

  13. arXiv:2312.08361

    cs.LG cs.DC

    Distributed Inference and Fine-tuning of Large Language Models Over The Internet

    Authors: Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, Colin Raffel

    Abstract: Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to most researchers. In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategie…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2023. 20 pages, 3 figures

  14. arXiv:2310.09247

    cs.CV cs.CL cs.LG

    Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy

    Authors: Anton Baryshnikov, Max Ryabinin

    Abstract: Text-to-image synthesis has recently attracted widespread attention due to rapidly improving quality and numerous practical applications. However, the language understanding capabilities of text-to-image models are still poorly understood, which makes it difficult to reason about prompt formulations that a given model would understand well. In this work, we measure the capability of popular text-t…

    Submitted 13 October, 2023; originally announced October 2023.

  15. arXiv:2303.06865

    cs.LG cs.AI cs.PF

    FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

    Authors: Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

    Abstract: The high computational and memory requirements of large language model (LLM) inference make it feasible only with multiple high-end accelerators. Motivated by the emerging demand for latency-insensitive tasks with batched processing, this paper initiates the study of high-throughput LLM inference using limited resources, such as a single commodity GPU. We present FlexGen, a high-throughput generat…

    Submitted 12 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

  16. arXiv:2302.04841

    cs.CV cs.LG

    Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics

    Authors: Anton Voronov, Mikhail Khoroshikh, Artem Babenko, Max Ryabinin

    Abstract: Text-to-image generation models represent the next step of evolution in image synthesis, offering a natural way to achieve flexible yet fine-grained control over the result. One emerging area of research is the fast adaptation of large text-to-image models to smaller datasets or new visual concepts. However, many efficient methods of adaptation have a long training time, which limits their practic…

    Submitted 1 November, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2023. 20 pages, 15 figures. Code: https://github.com/yandex-research/DVAR

  17. arXiv:2301.11913

    cs.DC cs.LG

    SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

    Authors: Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov

    Abstract: Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel…

    Submitted 29 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted to International Conference on Machine Learning (ICML) 2023. 25 pages, 8 figures

  18. arXiv:2211.05100

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  19. RuCoLA: Russian Corpus of Linguistic Acceptability

    Authors: Vladislav Mikhailov, Tatiana Shamardina, Max Ryabinin, Alena Pestova, Ivan Smurov, Ekaterina Artemova

    Abstract: Linguistic acceptability (LA) attracts the attention of the research community due to its many uses, such as testing the grammatical knowledge of language models and filtering implausible texts with acceptability classifiers. However, the application scope of LA in languages other than English is limited due to the lack of high-quality resources. To this end, we introduce the Russian Corpus of Lin…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted to the EMNLP 2022 main conference

  20. arXiv:2209.01188

    cs.LG cs.DC

    Petals: Collaborative Inference and Fine-tuning of Large Models

    Authors: Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, Colin Raffel

    Abstract: Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can download pretrained models of this scale. Still, using these models requires high-end hardware unavailable to many researchers. In some cases, LLMs can be used more affordably via RAM offloading or hosted APIs. However, these tec…

    Submitted 2 March, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

    Comments: 10 pages, 4 figures. The version 2 updates the benchmarks and the description of the chat application. Source code and docs: https://petals.ml

  21. arXiv:2207.03481

    cs.LG cs.DC

    Training Transformers Together

    Authors: Alexander Borzunov, Max Ryabinin, Tim Dettmers, Quentin Lhoest, Lucile Saulnier, Michael Diskin, Yacine Jernite, Thomas Wolf

    Abstract: The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, w…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted to NeurIPS 2021 Demonstration Track. 10 pages, 2 figures. Link: https://training-transformers-together.github.io

  22. arXiv:2110.03313

    cs.LG stat.ML

    Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees

    Authors: Aleksandr Beznosikov, Peter Richtárik, Michael Diskin, Max Ryabinin, Alexander Gasnikov

    Abstract: Variational inequalities in general and saddle point problems in particular are increasingly relevant in machine learning applications, including adversarial learning, GANs, transport and robust optimization. With increasing data and problem sizes necessary to train high performing models across various applications, we need to rely on parallel and distributed computing. However, in distributed tr…

    Submitted 2 April, 2023; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Appears in: Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Minor modifications with respect to the NeurIPS version. 73 pages, 9 algorithms, 2 figures, 2 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2022/hash/5ac1428c23b5da5e66d029646ea3206d-Abstract-Conference.html

  23. arXiv:2106.12066

    cs.CL cs.LG

    It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning

    Authors: Alexey Tikhonov, Max Ryabinin

    Abstract: Commonsense reasoning is one of the key problems in natural language processing, but the relative scarcity of labeled data holds back the progress for languages other than English. Pretrained cross-lingual models are a source of powerful language-agnostic representations, yet their inherent reasoning capabilities are still actively studied. In this work, we design a simple approach to commonsense…

    Submitted 30 November, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: Accepted to Findings of ACL 2021. 13 pages, 4 figures. Code: https://github.com/yandex-research/crosslingual_winograd

  24. arXiv:2106.11257

    cs.LG cs.DC math.OC

    Secure Distributed Training at Scale

    Authors: Eduard Gorbunov, Alexander Borzunov, Michael Diskin, Max Ryabinin

    Abstract: Many areas of deep learning benefit from using increasingly larger neural networks trained on public data, as is the case for pre-trained models for NLP and computer vision. Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers. One way to address it is for several smaller groups to pool their…

    Submitted 1 January, 2023; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted to International Conference on Machine Learning (ICML 2022). 61 pages, 10 figures. The version 4 fixes inaccuracies in the proofs of Lemmas E.2 and E.4. Code: https://github.com/yandex-research/btard

  25. arXiv:2106.10207

    cs.LG cs.DC

    Distributed Deep Learning in Open Collaborations

    Authors: Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry Pyrkin, Maxim Kashirin, Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko

    Abstract: Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a…

    Submitted 8 November, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2021. 32 pages, 10 figures. Code: https://github.com/yandex-research/DeDLOC

  26. arXiv:2105.06987

    cs.LG cs.AI stat.ML

    Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

    Authors: Max Ryabinin, Andrey Malinin, Mark Gales

    Abstract: Ensembles of machine learning models yield improved system performance as well as robust and interpretable uncertainty estimates; however, their inference costs may often be prohibitively high. Ensemble Distribution Distillation is an approach that allows a single model to efficiently capture both the predictive performance and uncertainty estimates of an ensemble. For classification, this…

    Submitted 14 May, 2021; originally announced May 2021.

  27. arXiv:2103.03239

    cs.LG cs.DC math.OC

    Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices

    Authors: Max Ryabinin, Eduard Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko

    Abstract: Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can utilize hundreds of computers via specialized message-passing protocols such as Ring All-Reduce. However, running these protocols at scale requires reliable high-speed networking that is only available in dedicated clusters. In contrast, many r…

    Submitted 11 January, 2022; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2021. Code: https://github.com/yandex-research/moshpit-sgd

  28. arXiv:2010.02598

    cs.CL cs.LG

    Embedding Words in Non-Vector Space with Unsupervised Graph Learning

    Authors: Max Ryabinin, Sergei Popov, Liudmila Prokhorenkova, Elena Voita

    Abstract: It has become a de-facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a latent hierarchical structure, and this structure has to be revealed and encoded by word embeddings. We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end. In our sett…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: Accepted as a long paper for EMNLP 2020. 15 pages, 6 figures

  29. arXiv:2002.04013

    cs.DC cs.LG stat.ML

    Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts

    Authors: Max Ryabinin, Anton Gusev

    Abstract: Many recent breakthroughs in deep learning were achieved by training increasingly larger models on massive datasets. However, training such models can be prohibitively expensive. For instance, the cluster used to train GPT-3 costs over …

    Submitted 21 October, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: Advances in Neural Information Processing Systems, 2020. Code URL: https://github.com/mryab/learning-at-home. 16 pages, 6 figures

    Journal ref: Advances in Neural Information Processing Systems 33 (2020) 3659-3672

  30. arXiv:1901.00213

    stat.ML cs.LG

    A weighted random survival forest

    Authors: Lev V. Utkin, Andrei V. Konstantinov, Viacheslav S. Chukanov, Mikhail V. Kots, Mikhail A. Ryabinin, Anna A. Meldo

    Abstract: A weighted random survival forest is presented in the paper. It can be regarded as a modification of the random forest improving its performance. The main idea underlying the proposed model is to replace the standard procedure of averaging used for estimation of the random survival forest hazard function by weighted averaging where the weights are assigned to every tree and can be viewed as traini…

    Submitted 1 January, 2019; originally announced January 2019.

  31. arXiv:1705.09620

    stat.ML cs.LG

    Discriminative Metric Learning with Deep Forest

    Authors: Lev V. Utkin, Mikhail A. Ryabinin

    Abstract: A Discriminative Deep Forest (DisDF) as a metric learning algorithm is proposed in the paper. It is based on the Deep Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest modification. The case of the fully supervised learning is studied when the class labels of individual training examples are known. The main idea underlying the algorithm is to assign weights to decision t…

    Submitted 25 May, 2017; originally announced May 2017.

    Comments: arXiv admin note: substantial text overlap with arXiv:1704.08715

    MSC Class: 68T10

  32. arXiv:1704.08715

    stat.ML cs.LG

    A Siamese Deep Forest

    Authors: Lev V. Utkin, Mikhail A. Ryabinin

    Abstract: A Siamese Deep Forest (SDF) is proposed in the paper. It is based on the Deep Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest modification. It can also be regarded as an alternative to the well-known Siamese neural networks. The SDF uses a modified training set consisting of concatenated pairs of vectors. Moreover, it defines the class distributions in the deep forest…

    Submitted 27 April, 2017; originally announced April 2017.

    MSC Class: 68T10