Showing 1–20 of 20 results for author: Hanna, K

  1. arXiv:2502.06618  [pdf, other]

    cs.IT cs.ET

    On the Reliability of Information Retrieval From MDS Coded Data in DNA Storage

    Authors: Serge Kas Hanna

    Abstract: This work presents a theoretical analysis of the probability of successfully retrieving data encoded with MDS codes (e.g., Reed-Solomon codes) in DNA storage systems. We study this probability under independent and identically distributed (i.i.d.) substitution errors, focusing on a common code design strategy that combines inner and outer MDS codes. Our analysis demonstrates how this probability d…

    Submitted 1 May, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: A shorter version of this paper has been accepted for presentation at ISIT 2025

  2. arXiv:2411.02523  [pdf]

    cs.CL cs.AI

    Evaluating the Impact of Lab Test Results on Large Language Models Generated Differential Diagnoses from Clinical Case Vignettes

    Authors: Balu Bhasuran, Qiao Jin, Yuzhang Xie, Carl Yang, Karim Hanna, Jennifer Costa, Cindy Shavor, Zhiyong Lu, Zhe He

    Abstract: Differential diagnosis is crucial for medicine as it helps healthcare providers systematically distinguish between conditions that share similar symptoms. This study assesses the impact of lab test results on differential diagnoses (DDx) made by large language models (LLMs). Clinical vignettes from 50 case reports from PubMed Central were created incorporating patient demographics, symptoms, and l…

    Submitted 31 October, 2024; originally announced November 2024.

  3. arXiv:2410.21422  [pdf, other]

    cs.CE

    A Foundation Model for Chemical Design and Property Prediction

    Authors: Feiyang Cai, Katelin Hanna, Tianyu Zhu, Tzuen-Rong Tzeng, Yongping Duan, Ling Liu, Srikanth Pilla, Gang Li, Feng Luo

    Abstract: Artificial intelligence (AI) has significantly advanced computational chemistry research in various tasks. However, traditional AI methods often rely on task-specific model designs and training, which constrain both the scalability of model size and generalization across different tasks. Here, we introduce ChemFM, a large foundation model specifically developed for chemicals. ChemFM comprises 3 bi…

    Submitted 23 January, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

  4. arXiv:2409.18986  [pdf, other]

    cs.CL cs.AI cs.IR

    Lab-AI: Using Retrieval Augmentation to Enhance Language Models for Personalized Lab Test Interpretation in Clinical Medicine

    Authors: Xiaoyu Wang, Haoyong Ouyang, Balu Bhasuran, Xiao Luo, Karim Hanna, Mia Liza A. Lustria, Carl Yang, Zhe He

    Abstract: Accurate interpretation of lab results is crucial in clinical medicine, yet most patient portals use universal normal ranges, ignoring conditional factors like age and gender. This study introduces Lab-AI, an interactive system that offers personalized normal ranges using retrieval-augmented generation (RAG) from credible health sources. Lab-AI has two modules: factor retrieval and normal range re…

    Submitted 23 April, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

  5. arXiv:2404.03524  [pdf, other]

    cs.LG cs.CR cs.DC cs.IT stat.ML

    Approximate Gradient Coding for Privacy-Flexible Federated Learning with Non-IID Data

    Authors: Okko Makkonen, Sampo Niemelä, Camilla Hollanti, Serge Kas Hanna

    Abstract: This work focuses on the challenges of non-IID data and stragglers/dropouts in federated learning. We introduce and explore a privacy-flexible paradigm that models parts of the clients' local data as non-private, offering a more versatile and business-oriented perspective on privacy. Within this framework, we propose a data-driven strategy for mitigating the effects of label heterogeneity and clie…

    Submitted 4 April, 2024; originally announced April 2024.

  6. arXiv:2402.01693  [pdf]

    cs.CL cs.AI

    Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study

    Authors: Zhe He, Balu Bhasuran, Qiao Jin, Shubo Tian, Karim Hanna, Cindy Shavor, Lisbeth Garcia Arguello, Patrick Murray, Zhiyong Lu

    Abstract: Lab results are often confusing and hard to understand. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to get their questions answered. We aim to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to lab test-related questions asked by patients and to identify potential issues that can be mitigated with au…

    Submitted 23 January, 2024; originally announced February 2024.

  7. arXiv:2402.01244  [pdf, other]

    cs.IT

    $\mathsf{GC+}$ Code: a Short Systematic Code for Correcting Random Edit Errors in DNA Storage

    Authors: Serge Kas Hanna

    Abstract: Storing digital data in synthetic DNA faces challenges in ensuring data reliability in the presence of edit errors -- deletions, insertions, and substitutions -- that occur randomly during various phases of the storage process. Current limitations in DNA synthesis technology also require the use of short DNA sequences, highlighting the particular need for short edit-correcting codes. Motivated by…

    Submitted 7 September, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Extended version of ISIT paper with new results

  8. arXiv:2308.05110  [pdf, other]

    cs.LG

    Can Attention Be Used to Explain EHR-Based Mortality Prediction Tasks: A Case Study on Hemorrhagic Stroke

    Authors: Qizhang Feng, Jiayi Yuan, Forhan Bin Emdad, Karim Hanna, Xia Hu, Zhe He

    Abstract: Stroke is a significant cause of mortality and morbidity, necessitating early predictive strategies to minimize risks. Traditional methods for evaluating patients, such as Acute Physiology and Chronic Health Evaluation (APACHE II, IV) and Simplified Acute Physiology Score III (SAPS III), have limited accuracy and interpretability. This paper proposes a novel approach: an interpretable, attention-b…

    Submitted 4 August, 2023; originally announced August 2023.

  9. arXiv:2304.09839  [pdf, other]

    cs.IT cs.DM

    Optimal Codes Detecting Deletions in Concatenated Binary Strings Applied to Trace Reconstruction

    Authors: Serge Kas Hanna

    Abstract: Consider two or more strings $\mathbf{x}^1,\mathbf{x}^2,\ldots,$ that are concatenated to form $\mathbf{x}=\langle \mathbf{x}^1,\mathbf{x}^2,\ldots \rangle$. Suppose that up to $δ$ deletions occur in each of the concatenated strings. Since deletions alter the lengths of the strings, a fundamental question to ask is: how much redundancy do we need to introduce in $\mathbf{x}$ in order to recover th…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in the IEEE Transactions on Information Theory. arXiv admin note: substantial text overlap with arXiv:2207.05126, arXiv:2105.00212

  10. arXiv:2304.08589  [pdf, other]

    cs.DC cs.IT cs.LG stat.ML

    Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load

    Authors: Maximilian Egger, Serge Kas Hanna, Rawad Bitar

    Abstract: In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the effect of unresponsive or slow workers called stragglers, that otherwise degrade the benefit of outsourcing the computation. This can be done by only waiting for…

    Submitted 17 April, 2023; originally announced April 2023.

  11. arXiv:2302.08644  [pdf, other]

    cs.IT

    Codes Correcting Burst and Arbitrary Erasures for Reliable and Low-Latency Communication

    Authors: Serge Kas Hanna, Zhiyuan Tan, Wen Xu, Antonia Wachter-Zeh

    Abstract: Motivated by modern network communication applications which require low latency, we study codes that correct erasures with low decoding delay. We provide a simple explicit construction that yields convolutional codes that can correct both burst and arbitrary erasures under a maximum decoding delay constraint $T$. Our proposed code has efficient encoding/decoding algorithms and requires a field si…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted for publication in IEEE ICASSP 2023

  12. arXiv:2208.03134  [pdf, other]

    cs.LG cs.DC cs.IT

    Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

    Authors: Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkat Dasari, Salim El Rouayheb

    Abstract: We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updatin…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2002.11005

  13. arXiv:2207.05126  [pdf, other]

    cs.IT

    Coding for Trace Reconstruction over Multiple Channels with Vanishing Deletion Probabilities

    Authors: Serge Kas Hanna

    Abstract: Motivated by DNA-based storage applications, we study the problem of reconstructing a coded sequence from multiple traces. We consider the model where the traces are outputs of independent deletion channels, where each channel deletes each bit of the input codeword \(\mathbf{x} \in \{0,1\}^n\) independently with probability \(p\). We focus on the regime where the deletion probability \(p \to 0\) w…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: This is the full version of the short paper accepted at ISIT 2022

  14. arXiv:2105.02298  [pdf, ps, other]

    cs.IT

    Optimal Codes Correcting Localized Deletions

    Authors: Rawad Bitar, Serge Kas Hanna, Nikita Polyanskii, Ilya Vorobyev

    Abstract: We consider the problem of constructing codes that can correct deletions that are localized within a certain part of the codeword that is unknown a priori. Namely, the model that we study is when at most $k$ deletions occur in a window of size $k$, where the positions of the deletions within this window are not necessarily consecutive. Localized deletions are thus a generalization of burst deletio…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 10 pages, a full version of the paper accepted to 2021 IEEE ISIT

  15. arXiv:2105.00212  [pdf, ps, other]

    cs.IT

    Detecting Deletions and Insertions in Concatenated Strings with Optimal Redundancy

    Authors: Serge Kas Hanna, Rawad Bitar

    Abstract: We study codes that can detect the exact number of deletions and insertions in concatenated binary strings. We construct optimal codes for the case of detecting up to $\delta$ deletions. We prove the optimality of these codes by deriving a converse result which shows that the redundancy of our codes is asymptotically optimal in $\delta$ among all families of deletion detecting codes, and particularly…

    Submitted 1 May, 2021; originally announced May 2021.

    Comments: Shorter version accepted in ISIT 2021

  16. Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

    Authors: Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkat Dasari, Salim El Rouayheb

    Abstract: We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating…

    Submitted 25 February, 2020; originally announced February 2020.

    Comments: Accepted to IEEE ICASSP 2020

    Report number: pp. 4262--4266, May 2020

    Journal ref: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 4262--4266, May 2020

  17. arXiv:1810.07281  [pdf, other]

    cs.IT

    List Decoding of Deletions Using Guess & Check Codes

    Authors: Serge Kas Hanna, Salim El Rouayheb

    Abstract: Guess & Check (GC) codes are systematic binary codes that can correct multiple deletions, with high probability. GC codes have logarithmic redundancy in the length of the message $k$, and the encoding and decoding algorithms of these codes are deterministic and run in polynomial time for a constant number of deletions $δ$. The unique decoding properties of GC codes were examined in a previous work…

    Submitted 29 April, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

    Comments: This is a full version of the short paper accepted at ISIT 2019

  18. arXiv:1711.01941  [pdf, other]

    cs.IT

    Codes for Correcting Localized Deletions

    Authors: Serge Kas Hanna, Salim El Rouayheb

    Abstract: We consider the problem of constructing binary codes for correcting deletions that are localized within certain parts of the codeword that are unknown a priori. The model that we study is when $δ\leq w$ deletions are localized in a window of size $w$ bits. These $δ$ deletions do not necessarily occur in consecutive positions, but are restricted to the window of size $w$. The localized deletions mo…

    Submitted 8 January, 2021; v1 submitted 2 November, 2017; originally announced November 2017.

    Comments: Accepted for publication in IEEE Transactions on Information Theory

  19. arXiv:1705.09569  [pdf, other]

    cs.IT

    Guess & Check Codes for Deletions, Insertions, and Synchronization

    Authors: Serge Kas Hanna, Salim El Rouayheb

    Abstract: We consider the problem of constructing codes that can correct $δ$ deletions occurring in an arbitrary binary string of length $n$ bits. Varshamov-Tenengolts (VT) codes, dating back to 1965, are zero-error single deletion $(δ=1)$ correcting codes, and have an asymptotically optimal redundancy. Finding similar codes for $δ\geq 2$ deletions remains an open problem. In this work, we relax the standar…

    Submitted 24 May, 2018; v1 submitted 24 May, 2017; originally announced May 2017.

    Comments: Accepted to the IEEE Transactions on Information Theory. arXiv admin note: text overlap with arXiv:1702.04466

  20. arXiv:1702.04466  [pdf, other]

    cs.IT

    Guess & Check Codes for Deletions and Synchronization

    Authors: Serge Kas Hanna, Salim El Rouayheb

    Abstract: We consider the problem of constructing codes that can correct $δ$ deletions occurring in an arbitrary binary string of length $n$ bits. Varshamov-Tenengolts (VT) codes can correct all possible single deletions $(δ=1)$ with an asymptotically optimal redundancy. Finding similar codes for $δ\geq 2$ deletions is an open problem. We propose a new family of codes, that we call Guess & Check (GC) codes,…

    Submitted 27 April, 2017; v1 submitted 15 February, 2017; originally announced February 2017.

    Comments: Accepted in ISIT 2017