Showing 1–8 of 8 results for author: Miftahutdinov, Z

  1. arXiv:2410.09240 [pdf, other]

    cs.LG cs.CL

    nach0-pc: Multi-task Language Model with Molecular Point Cloud Encoder

    Authors: Maksim Kuznetsov, Airat Valiev, Alex Aliper, Daniil Polykovskiy, Elena Tutubalina, Rim Shayakhmetov, Zulfat Miftahutdinov

    Abstract: Recent advancements have integrated Language Models (LMs) into a drug discovery pipeline. However, existing models mostly work with SMILES and SELFIES chemical string representations, which lack spatial features vital for drug discovery. Additionally, attempts to translate chemical 3D structures into text format encounter issues such as excessive length and insufficient atom connectivity informati…

    Submitted 11 October, 2024; originally announced October 2024.

  2. arXiv:2311.12410 [pdf, other]

    cs.CL cs.AI cs.LG q-bio.QM

    nach0: Multimodal Natural and Chemical Languages Foundation Model

    Authors: Micha Livne, Zulfat Miftahutdinov, Elena Tutubalina, Maksim Kuznetsov, Daniil Polykovskiy, Annika Brundyn, Aastha Jhunjhunwala, Anthony Costa, Alex Aliper, Alán Aspuru-Guzik, Alex Zhavoronkov

    Abstract: Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthe…

    Submitted 2 May, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted to Chemical Science Journal. Models are publicly available via https://huggingface.co/insilicomedicine/nach0_base and https://huggingface.co/insilicomedicine/nach0_large

    Journal ref: Chemical Science, 15(22), 8380-8389, 2024

  3. arXiv:2101.09311 [pdf, ps, other]

    cs.CL cs.IR

    Drug and Disease Interpretation Learning with Biomedical Entity Representation Transformer

    Authors: Zulfat Miftahutdinov, Artur Kadurin, Roman Kudrin, Elena Tutubalina

    Abstract: Concept normalization in free-form texts is a crucial step in every text-mining pipeline. Neural architectures based on Bidirectional Encoder Representations from Transformers (BERT) have achieved state-of-the-art results in the biomedical domain. In the context of drug discovery and development, clinical trials are necessary to establish the efficacy and safety of drugs. We investigate the effect…

    Submitted 22 January, 2021; originally announced January 2021.

    Comments: Accepted to the 43rd European Conference on Information Retrieval (ECIR 2021)

  4. The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews

    Authors: Elena Tutubalina, Ilseyar Alimova, Zulfat Miftahutdinov, Andrey Sakhovskiy, Valentin Malykh, Sergey Nikolenko

    Abstract: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products for the detection of health-related named entities and the effectiveness of pharmaceutical products. The corpus itself consists of two parts, the raw one and the labelled one. The raw part includes 1.4 million health-related user-generated texts collected from…

    Submitted 7 April, 2020; originally announced April 2020.

    Comments: 9 pages, 9 tables, 4 figures

    Journal ref: Bioinformatics, 2020

  5. arXiv:1908.07069 [pdf, other]

    cs.IR cs.CL cs.LG

    CommentsRadar: Dive into Unique Data on All Comments on the Web

    Authors: Sergey Nikolenko, Elena Tutubalina, Zulfat Miftahutdinov, Eugene Beloded

    Abstract: We introduce an entity-centric search engine CommentsRadar that pairs entity queries with articles and user opinions covering a wide range of topics from top commented sites. The engine aggregates articles and comments for these articles, extracts named entities, links them together and with knowledge base entries, performs sentiment analysis, and aggregates the results, aiming to mine for temporal trends…

    Submitted 16 August, 2019; originally announced August 2019.

  6. arXiv:1907.07972 [pdf, ps, other]

    cs.CL cs.IR cs.LG

    Deep Neural Models for Medical Concept Normalization in User-Generated Texts

    Authors: Zulfat Miftahutdinov, Elena Tutubalina

    Abstract: In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a health-related entity mention in a free-form text to a concept in a controlled vocabulary, usually to the standard thesaurus in the Unified Medical Language System (UMLS). This is a challenging task since medical terminology is very different when coming from health care professionals or from the ge…

    Submitted 18 July, 2019; originally announced July 2019.

    Comments: This is a preprint of the paper "Deep Neural Models for Medical Concept Normalization in User-Generated Texts", to be published at ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop

    Journal ref: ACL SRW 2019

  7. Sequence Learning with RNNs for Medical Concept Normalization in User-Generated Texts

    Authors: Elena Tutubalina, Zulfat Miftahutdinov, Sergey Nikolenko, Valentin Malykh

    Abstract: In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a disease mention in free-form text to a concept in a controlled vocabulary, usually to the standard thesaurus in the Unified Medical Language System (UMLS). This task is challenging since medical terminology is very different when coming from health care professionals or from the general public in th…

    Submitted 29 November, 2018; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

    Report number: ML4H/2018/117

    Journal ref: Journal of Biomedical Informatics, 2018, Vol. 84, pp. 93-102

  8. arXiv:1712.01213 [pdf, ps, other]

    cs.CL cs.CY

    An Encoder-Decoder Model for ICD-10 Coding of Death Certificates

    Authors: Elena Tutubalina, Zulfat Miftahutdinov

    Abstract: Information extraction from textual documents such as hospital records and health-related user discussions has become a topic of intense interest. The task of medical concept coding is to map a variable length text to medical concepts and corresponding classification codes in some external system or ontology. In this work, we utilize recurrent neural networks to automatically assign ICD-10 codes to…

    Submitted 4 December, 2017; originally announced December 2017.

    Journal ref: KFU at CLEF eHealth 2017 Task 1: ICD-10 Coding of English Death Certificates with Recurrent Neural Networks, CEUR Workshop Proceedings, Vol 1866, 2017