Showing 1–13 of 13 results for author: Gondara, L

  1. arXiv:2504.21191  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare

    Authors: Lovedeep Gondara, Jonathan Simkin, Graham Sayle, Shebnum Devji, Gregory Arbour, Raymond Ng

    Abstract: This study aims to guide language model selection by investigating: 1) the necessity of finetuning versus zero-shot usage, 2) the benefits of domain-adjacent versus generic pretrained models, 3) the value of further domain-specific pretraining, and 4) the continued relevance of Small Language Models (SLMs) compared to Large Language Models (LLMs) for specific tasks. Using electronic pathology repo…

    Submitted 29 April, 2025; originally announced April 2025.

  2. arXiv:2504.15261  [pdf]

    cs.AI cs.LG

    Leveraging Language Models for Automated Patient Record Linkage

    Authors: Mohammad Beheshti, Lovedeep Gondara, Iris Zachary

    Abstract: Objective: Healthcare data fragmentation presents a major challenge for linking patient data, necessitating robust record linkage to integrate patient records from diverse sources. This study investigates the feasibility of leveraging language models for automated patient record linkage, focusing on two key tasks: blocking and matching. Materials and Methods: We utilized real-world healthcare data…

    Submitted 21 April, 2025; originally announced April 2025.

  3. arXiv:2503.21800  [pdf, other]

    cs.CL cs.AI cs.LG

    ELM: Ensemble of Language Models for Predicting Tumor Group from Pathology Reports

    Authors: Lovedeep Gondara, Jonathan Simkin, Shebnum Devji, Gregory Arbour, Raymond Ng

    Abstract: Population-based cancer registries (PBCRs) face a significant bottleneck in manually extracting data from unstructured pathology reports, a process crucial for tasks like tumor group assignment, which can consume 900 person-hours for approximately 100,000 reports. To address this, we introduce ELM (Ensemble of Language Models), a novel ensemble-based approach leveraging both small language models…

    Submitted 24 March, 2025; originally announced March 2025.

  4. arXiv:2411.16702  [pdf, other]

    cs.CY cs.LG

    A Clinical Trial Design Approach to Auditing Language Models in Healthcare Setting

    Authors: Lovedeep Gondara, Jonathan Simkin

    Abstract: We present an audit mechanism for language models, with a focus on models deployed in the healthcare setting. Our proposed mechanism takes inspiration from clinical trial design where we posit the language model audit as a single blind equivalence trial, with the comparison of interest being the subject matter experts. We show that using our proposed method, we can follow principled sample size an…

    Submitted 18 December, 2024; v1 submitted 11 November, 2024; originally announced November 2024.

  5. arXiv:2112.04640  [pdf, other]

    cs.LG cs.CR stat.ML

    Differentially Private Ensemble Classifiers for Data Streams

    Authors: Lovedeep Gondara, Ke Wang, Ricardo Silva Carvalho

    Abstract: Learning from continuous data streams via classification/regression is prevalent in many domains. Adapting to evolving data characteristics (concept drift) while protecting data owners' private information is an open challenge. We present a differentially private ensemble solution to this problem with two distinguishing features: it allows an unbounded number of ensemble updates to deal w…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: Accepted at WSDM 2022

  6. arXiv:2002.11613  [pdf, other]

    cs.LG stat.ML

    The Differentially Private Lottery Ticket Mechanism

    Authors: Lovedeep Gondara, Ke Wang, Ricardo Silva Carvalho

    Abstract: We propose the differentially private lottery ticket mechanism (DPLTM). An end-to-end differentially private training paradigm based on the lottery ticket hypothesis. Using "high-quality winners", selected via our custom score function, DPLTM significantly improves the privacy-utility trade-off over the state-of-the-art. We show that DPLTM converges faster, allowing for early stopping with reduced…

    Submitted 16 February, 2020; originally announced February 2020.

  7. arXiv:1910.05108  [pdf, other]

    cs.LG stat.ME stat.ML

    Differentially Private Survival Function Estimation

    Authors: Lovedeep Gondara, Ke Wang

    Abstract: Survival function estimation is used in many disciplines, but it is most common in medical analytics in the form of the Kaplan-Meier estimator. Sensitive data (patient records) is used in the estimation without any explicit control on the information leakage, which is a significant privacy concern. We propose a first differentially private estimator of the survival function and show that it can be…

    Submitted 14 January, 2020; v1 submitted 4 October, 2019; originally announced October 2019.

    Comments: Preprint

  8. arXiv:1802.04664  [pdf, other]

    cs.LG stat.AP stat.ML

    Recovering Loss to Followup Information Using Denoising Autoencoders

    Authors: Lovedeep Gondara, Ke Wang

    Abstract: Loss to followup is a significant issue in healthcare and has serious consequences for a study's validity and cost. Methods available at present for recovering loss to followup information are restricted by their expressive capabilities and struggle to model highly non-linear relations and complex interactions. In this paper we propose a model based on overcomplete denoising autoencoders to recove…

    Submitted 11 February, 2018; originally announced February 2018.

    Comments: Copyright IEEE 2017, IEEE International Conference on Big Data (Big Data)

  9. arXiv:1705.02737  [pdf, other]

    cs.LG stat.ML

    MIDA: Multiple Imputation using Denoising Autoencoders

    Authors: Lovedeep Gondara, Ke Wang

    Abstract: Missing data is a significant problem impacting all domains. State-of-the-art framework for minimizing missing data bias is multiple imputation, for which the choice of an imputation model remains nontrivial. We propose a multiple imputation model based on overcomplete deep denoising autoencoders. Our proposed model is capable of handling different data types, missingness patterns, missingness pro…

    Submitted 17 February, 2018; v1 submitted 8 May, 2017; originally announced May 2017.

    Comments: To appear in the proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018)

  10. arXiv:1705.02224  [pdf, other]

    cs.LG cs.CV stat.ML

    Detecting Adversarial Samples Using Density Ratio Estimates

    Authors: Lovedeep Gondara

    Abstract: Machine learning models, especially based on deep architectures are used in everyday applications ranging from self driving cars to medical diagnostics. It has been shown that such models are dangerously susceptible to adversarial samples, indistinguishable from real samples to human eye, adversarial samples lead to incorrect classifications with high confidence. Impact of adversarial samples is f…

    Submitted 20 November, 2017; v1 submitted 5 May, 2017; originally announced May 2017.

    Comments: Updated

  11. arXiv:1612.02707  [pdf, other]

    cs.LG cs.HC stat.ML

    CrowdMI: Multiple Imputation via Crowdsourcing

    Authors: Lovedeep Gondara

    Abstract: Can humans impute missing data with similar proficiency as machines? This is the question we aim to answer in this paper. We present a novel idea of converting observations with missing data in to a survey questionnaire, which is presented to crowdworkers for completion. We replicate a multiple imputation framework by having multiple unique crowdworkers complete our questionnaire. Experimental res…

    Submitted 23 February, 2018; v1 submitted 8 December, 2016; originally announced December 2016.

    Comments: Updated version

  12. arXiv:1609.09471  [pdf, other]

    cs.LG stat.ML

    Classifier comparison using precision

    Authors: Lovedeep Gondara

    Abstract: New proposed models are often compared to state-of-the-art using statistical significance testing. Literature is scarce for classifier comparison using metrics other than accuracy. We present a survey of statistical methods that can be used for classifier comparison using precision, accounting for inter-precision correlation arising from use of same dataset. Comparisons are made using per-class pr…

    Submitted 15 November, 2016; v1 submitted 29 September, 2016; originally announced September 2016.

    Comments: Extended version

  13. Medical image denoising using convolutional denoising autoencoders

    Authors: Lovedeep Gondara

    Abstract: Image denoising is an important pre-processing step in medical image analysis. Different algorithms have been proposed in past three decades with varying denoising performances. More recently, having outperformed all conventional methods, deep learning based models have shown a great promise. These methods are however limited for requirement of large training sample size and high computational cos…

    Submitted 17 September, 2016; v1 submitted 16 August, 2016; originally announced August 2016.

    Comments: To appear: 6 pages, paper to be published at the Fourth Workshop on Data Mining in Biomedical Informatics and Healthcare at ICDM, 2016