
Showing 1–16 of 16 results for author: Madduri, R

Searching in archive cs.
  1. arXiv:2506.12674  [pdf, ps, other]

    cs.CL

    Enhancing Clinical Models with Pseudo Data for De-identification

    Authors: Paul Landes, Aaron J Chaise, Tarak Nath Nandi, Ravi K Madduri

    Abstract: Many models are pretrained on redacted text for privacy reasons. Clinical foundation models are often trained on de-identified text, which uses special syntax (masked) text in place of protected health information. Even though these models have increased in popularity, there has been little effort in understanding the effects of training them on redacted text. In this work, we pretrain several enc…

    Submitted 16 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

  2. arXiv:2506.05127  [pdf, ps, other]

    eess.IV cs.CV q-bio.QM

    PixCell: A generative foundation model for digital histopathology images

    Authors: Srikar Yellapragada, Alexandros Graikos, Zilinghan Li, Kostas Triaridis, Varun Belagali, Saarthak Kapse, Tarak Nath Nandi, Ravi K Madduri, Prateek Prasanna, Tahsin Kurc, Rajarsi R. Gupta, Joel Saltz, Dimitris Samaras

    Abstract: The digitization of histology slides has revolutionized pathology, providing massive datasets for cancer diagnosis and research. Contrastive self-supervised and vision-language models have been shown to effectively mine large pathology datasets to learn discriminative representations. On the other hand, generative models, capable of synthesizing realistic and diverse images, present a compelling s…

    Submitted 5 June, 2025; originally announced June 2025.

  3. arXiv:2505.23849  [pdf, other]

    cs.CR cs.AI cs.LG

    CADRE: Customizable Assurance of Data Readiness in Privacy-Preserving Federated Learning

    Authors: Kaveen Hiniduma, Zilinghan Li, Aditya Sinha, Ravi Madduri, Suren Byna

    Abstract: Privacy-Preserving Federated Learning (PPFL) is a decentralized machine learning approach where multiple clients train a model collaboratively. PPFL preserves privacy and security of the client's data by not exchanging it. However, ensuring that data at each client is of high quality and ready for federated learning (FL) is a challenge due to restricted data access. In this paper, we introduce CAD…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 10 pages, 7 figures, 2 tables

  4. arXiv:2505.21727  [pdf, ps, other]

    cs.DC

    FedCostAware: Enabling Cost-Aware Federated Learning on the Cloud

    Authors: Aditya Sinha, Zilinghan Li, Tingkai Liu, Volodymyr Kindratenko, Kibaek Kim, Ravi Madduri

    Abstract: Federated learning (FL) is a distributed machine learning (ML) approach that allows multiple clients to collaboratively train an ML model without exchanging their original training data, offering a solution that is particularly valuable in sensitive domains such as biomedicine. However, training robust FL models often requires substantial computing resources from participating clients, such as GPUs,…

    Submitted 27 May, 2025; originally announced May 2025.

  5. arXiv:2505.18213  [pdf, other]

    cs.CY cs.AI

    AIDRIN 2.0: A Framework to Assess Data Readiness for AI

    Authors: Kaveen Hiniduma, Dylan Ryan, Suren Byna, Jean Luca Bez, Ravi Madduri

    Abstract: AI Data Readiness Inspector (AIDRIN) is a framework to evaluate and improve data preparedness for AI applications. It addresses critical data readiness dimensions such as data quality, bias, fairness, and privacy. This paper details enhancements to AIDRIN by focusing on user interface improvements and integration with a privacy-preserving federated learning (PPFL) framework. By refining the UI and…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 3 pages, 3 figures

  6. arXiv:2503.11591  [pdf, other]

    eess.IV cs.CV

    Pathology Image Compression with Pre-trained Autoencoders

    Authors: Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis, Zilinghan Li, Tarak Nath Nandi, Ravi K Madduri, Prateek Prasanna, Joel Saltz, Dimitris Samaras

    Abstract: The growing volume of high-resolution Whole Slide Images in digital histopathology poses significant storage, transmission, and computational efficiency challenges. Standard compression methods, such as JPEG, reduce file sizes but often fail to preserve fine-grained phenotypic details critical for downstream tasks. In this work, we repurpose autoencoders (AEs) designed for Latent Diffusion Models…

    Submitted 14 March, 2025; originally announced March 2025.

  7. arXiv:2412.01672  [pdf, other]

    cs.CV

    Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning

    Authors: Varun Belagali, Srikar Yellapragada, Alexandros Graikos, Saarthak Kapse, Zilinghan Li, Tarak Nath Nandi, Ravi K Madduri, Prateek Prasanna, Joel Saltz, Dimitris Samaras

    Abstract: Self-supervised learning (SSL) methods have emerged as strong visual representation learners by training an image encoder to maximize similarity between features of different views of the same image. To perform this view-invariance task, current SSL algorithms rely on hand-crafted augmentations such as random cropping and color jittering to create multiple views of an image. Recently, generative d…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: Webpage: https://histodiffusion.github.io/docs/publications/gensis

  8. arXiv:2409.19756  [pdf, other]

    cs.CR cs.DC

    Advances in Privacy Preserving Federated Learning to Realize a Truly Learning Healthcare System

    Authors: Ravi Madduri, Zilinghan Li, Tarak Nandi, Kibaek Kim, Minseok Ryu, Alex Rodriguez

    Abstract: The concept of a learning healthcare system (LHS) envisions a self-improving network where multimodal data from patient care are continuously analyzed to enhance future healthcare outcomes. However, realizing this vision faces significant challenges in data sharing and privacy protection. Privacy-Preserving Federated Learning (PPFL) is a transformative and promising approach that has the potential…

    Submitted 29 September, 2024; originally announced September 2024.

  9. arXiv:2409.11585  [pdf, other]

    cs.LG cs.CR cs.DC

    Advances in APPFL: A Comprehensive and Extensible Federated Learning Framework

    Authors: Zilinghan Li, Shilan He, Ze Yang, Minseok Ryu, Kibaek Kim, Ravi Madduri

    Abstract: Federated learning (FL) is a distributed machine learning paradigm enabling collaborative model training while preserving data privacy. In today's landscape, where most data is proprietary, confidential, and distributed, FL has become a promising approach to leverage such data effectively, particularly in sensitive domains such as medicine and the electric grid. Heterogeneity and security are the…

    Submitted 10 March, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: In 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

  10. AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI

    Authors: Kaveen Hiniduma, Suren Byna, Jean Luca Bez, Ravi Madduri

    Abstract: "Garbage In Garbage Out" is a universally agreed quote by computer scientists from various domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest a considerable amount of time and effort in preparing the data for AI. However, there are no standard methods or frameworks for…

    Submitted 11 March, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures, Accepted to SSDBM 2024

  11. arXiv:2402.12271  [pdf, other]

    cs.DC cs.LG

    Secure Federated Learning Across Heterogeneous Cloud and High-Performance Computing Resources -- A Case Study on Federated Fine-tuning of LLaMA 2

    Authors: Zilinghan Li, Shilan He, Pranshu Chaturvedi, Volodymyr Kindratenko, Eliu A Huerta, Kibaek Kim, Ravi Madduri

    Abstract: Federated learning enables multiple data owners to collaboratively train robust machine learning models without transferring large or sensitive local datasets by only sharing the parameters of the locally trained models. In this paper, we elaborate on the design of our Advanced Privacy-Preserving Federated Learning (APPFL) framework, which streamlines end-to-end secure and reliable federated learn…

    Submitted 19 February, 2024; originally announced February 2024.

  12. arXiv:2312.08701  [pdf, other]

    cs.DC

    Enabling End-to-End Secure Federated Learning in Biomedical Research on Heterogeneous Computing Environments with APPFLx

    Authors: Trung-Hieu Hoang, Jordan Fuhrman, Ravi Madduri, Miao Li, Pranshu Chaturvedi, Zilinghan Li, Kibaek Kim, Minseok Ryu, Ryan Chard, E. A. Huerta, Maryellen Giger

    Abstract: Facilitating large-scale, cross-institutional collaboration in biomedical machine learning projects requires a trustworthy and resilient federated learning (FL) environment to ensure that sensitive information such as protected health information is kept confidential. In this work, we introduce APPFLx, a low-code FL framework that enables the easy setup, configuration, and running of FL experiment…

    Submitted 14 December, 2023; originally announced December 2023.

  13. arXiv:2309.14675  [pdf, other]

    cs.LG cs.DC

    FedCompass: Efficient Cross-Silo Federated Learning on Heterogeneous Client Devices using a Computing Power Aware Scheduler

    Authors: Zilinghan Li, Pranshu Chaturvedi, Shilan He, Han Chen, Gagandeep Singh, Volodymyr Kindratenko, E. A. Huerta, Kibaek Kim, Ravi Madduri

    Abstract: Cross-silo federated learning offers a promising solution to collaboratively train robust and generalized AI models without compromising the privacy of local datasets, e.g., healthcare, financial, as well as scientific projects that lack a centralized data facility. Nonetheless, because of the disparity of computing resources among different clients (i.e., device heterogeneity), synchronous federa…

    Submitted 11 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted as poster at The Twelfth International Conference on Learning Representations (ICLR 2024)

  14. arXiv:2308.08786  [pdf, other]

    cs.LG

    APPFLx: Providing Privacy-Preserving Cross-Silo Federated Learning as a Service

    Authors: Zilinghan Li, Shilan He, Pranshu Chaturvedi, Trung-Hieu Hoang, Minseok Ryu, E. A. Huerta, Volodymyr Kindratenko, Jordan Fuhrman, Maryellen Giger, Ryan Chard, Kibaek Kim, Ravi Madduri

    Abstract: Cross-silo privacy-preserving federated learning (PPFL) is a powerful tool to collaboratively train robust and generalized machine learning (ML) models without sharing sensitive (e.g., healthcare or financial) local data. To ease and accelerate the adoption of PPFL, we introduce APPFLx, a ready-to-use platform that provides privacy-preserving cross-silo federated learning as a service. APPFLx empl…

    Submitted 17 August, 2023; originally announced August 2023.

  15. arXiv:2210.08973  [pdf, ps, other]

    cs.CY cs.HC cs.LG hep-ex

    FAIR for AI: An interdisciplinary and international community building perspective

    Authors: E. A. Huerta, Ben Blaiszik, L. Catherine Brinson, Kristofer E. Bouchard, Daniel Diaz, Caterina Doglioni, Javier M. Duarte, Murali Emani, Ian Foster, Geoffrey Fox, Philip Harris, Lukas Heinrich, Shantenu Jha, Daniel S. Katz, Volodymyr Kindratenko, Christine R. Kirkpatrick, Kati Lassila-Perini, Ravi K. Madduri, Mark S. Neubauer, Fotis E. Psomopoulos, Avik Roy, Oliver Rübel, Zhizhen Zhao, Ruike Zhu

    Abstract: A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to i…

    Submitted 1 August, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: 10 pages, comments welcome!; v2: 12 pages, accepted to Scientific Data

    ACM Class: I.2.0; E.0

    Journal ref: Scientific Data 10, 487 (2023)

  16. arXiv:2202.03672  [pdf, other]

    cs.LG

    APPFL: Open-Source Software Framework for Privacy-Preserving Federated Learning

    Authors: Minseok Ryu, Youngdae Kim, Kibaek Kim, Ravi K. Madduri

    Abstract: Federated learning (FL) enables training models at different sites and updating the weights from the training instead of transferring data to a central location and training as in classical machine learning. The FL capability is especially important to domains such as biomedicine and smart grid, where data may not be shared freely or stored at a central location because of policy challenges. Thank…

    Submitted 15 March, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: 9 pages, 4 figures, 1 table