Showing 1–6 of 6 results for author: Roguski, Ł

  1. arXiv:2308.08234  [pdf, other]

    cs.CL cs.AI cs.LG

    Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey

    Authors: Lovre Torbarina, Tin Ferkovic, Lukasz Roguski, Velimir Mihelcic, Bruno Sarlija, Zeljko Kraljevic

    Abstract: The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-consuming, mainly when using transformer-based pre-trained language models. Multi…

    Submitted 16 August, 2023; originally announced August 2023.

  2. arXiv:2108.06835  [pdf]

    cs.IR

    Deployment of a Free-Text Analytics Platform at a UK National Health Service Research Hospital: CogStack at University College London Hospitals

    Authors: Kawsar Noor, Lukasz Roguski, Alex Handy, Roman Klapaukh, Amos Folarin, Luis Romao, Joshua Matteson, Nathan Lea, Leilei Zhu, Wai Keong Wong, Anoop Shah, Richard J Dobson

    Abstract: As more healthcare organisations transition to using electronic health record (EHR) systems it is important for these organisations to maximise the secondary use of their data to support service improvement and clinical research. These organisations will find it challenging to have systems which can mine information from the unstructured data fields in the record (clinical notes, letters etc) and…

    Submitted 15 August, 2021; originally announced August 2021.

  3. arXiv:2010.01165  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit

    Authors: Zeljko Kraljevic, Thomas Searle, Anthony Shek, Lukasz Roguski, Kawsar Noor, Daniel Bean, Aurelie Mascio, Leilei Zhu, Amos A Folarin, Angus Roberts, Rebecca Bendayan, Mark P Richardson, Robert Stewart, Anoop D Shah, Wai Keong Wong, Zina Ibrahim, James T Teo, Richard JB Dobson

    Abstract: Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open-source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; b) a f…

    Submitted 25 March, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: Preprint: 27 Pages, 3 Figures

  4. arXiv:1912.10166  [pdf]

    cs.CL cs.LG stat.ML

    MedCAT -- Medical Concept Annotation Tool

    Authors: Zeljko Kraljevic, Daniel Bean, Aurelie Mascio, Lukasz Roguski, Amos Folarin, Angus Roberts, Rebecca Bendayan, Richard Dobson

    Abstract: Biomedical documents such as Electronic Health Records (EHRs) contain a large amount of information in an unstructured format. The data in EHRs is a hugely valuable resource documenting clinical narratives and decisions, but whilst the text can be easily understood by human doctors it is challenging to use in research and clinical applications. To uncover the potential of biomedical documents we n…

    Submitted 18 December, 2019; originally announced December 2019.

    Comments: Preprint, 25 pages, 5 figures and 4 tables

  5. arXiv:1506.05185  [pdf]

    q-bio.GN cs.CE cs.DS

    CARGO: Effective format-free compressed storage of genomic information

    Authors: Łukasz Roguski, Paolo Ribeca

    Abstract: The recent super-exponential growth in the amount of sequencing data generated worldwide has put techniques for compressed storage into the focus. Most available solutions, however, are strictly tied to specific bioinformatics formats, sometimes inheriting from them suboptimal design choices; this hinders flexible and effective data sharing. Here we present CARGO (Compressed ARchiving for GenOmics…

    Submitted 16 June, 2015; originally announced June 2015.

    Comments: 13 (Main) + 31 (Supplementary) + 88 (Manual) pages

  6. arXiv:1405.6874  [pdf, other]

    cs.DS

    Disk-based genome sequencing data compression

    Authors: Szymon Grabowski, Sebastian Deorowicz, Łukasz Roguski

    Abstract: Motivation: High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk-based (Yanovsky, 2011; Cox et al., 2012), where the bette…

    Submitted 18 September, 2014; v1 submitted 27 May, 2014; originally announced May 2014.

    MSC Class: 68W32; ACM Class: E.4