Showing 1–16 of 16 results for author: Grossman, R L

  1. arXiv:2505.07683  [pdf, other]

    cs.LG cs.AI

    Multimodal Survival Modeling in the Age of Foundation Models

    Authors: Steven Song, Morgan Borjigin-Wang, Irene Madejski, Robert L. Grossman

    Abstract: The Cancer Genome Atlas (TCGA) has enabled novel discoveries and served as a large-scale reference through its harmonized genomics, clinical, and image data. Prior studies have trained bespoke cancer survival prediction models from unimodal or multimodal TCGA data. A modern paradigm in biomedical deep learning is the development of foundation models (FMs) to derive meaningful feature embeddings, a…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 23 pages, 7 figures, 8 tables

  2. arXiv:2502.11857  [pdf, other]

    cs.DC

    A Proposed End-To-End Principle for Data Commons

    Authors: Robert L. Grossman

    Abstract: A data commons brings together (or co-locates) data with cloud computing infrastructure and commonly used software services, tools and applications for managing, analyzing and sharing data to create an interoperable resource for a research community. We introduce an architectural design principle for data commons called the narrow middle architecture that is broadly based upon the end-to-end argum…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 4 pages, 1 figure

  3. arXiv:2411.16523  [pdf, other]

    cs.CV cs.CL

    LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation

    Authors: Steven Song, Anirudh Subramanyam, Irene Madejski, Robert L. Grossman

    Abstract: In the current paradigm of image captioning, deep learning models are trained to generate text from image embeddings of latent features. We challenge the assumption that these latent features ought to be high-dimensional vectors which require model fine tuning to handle. Here we propose Label Boosted Retrieval Augmented Generation (LaB-RAG), a text-based approach to image captioning that leverages…

    Submitted 25 November, 2024; originally announced November 2024.

  4. arXiv:2411.05248  [pdf]

    cs.DC

    Ten Pillars for Data Meshes

    Authors: Robert L. Grossman, Ceilyn Boyd, Nhan Do, Danne C. Elbers, Michael S. Fitzsimons, Maryellen L. Giger, Anthony Juehne, Brienna Larrick, Jerry S. H. Lee, Dawei Lin, Michael Lukowski, James D. Myers, L. Philip Schumm, Aarti Venkat

    Abstract: Over the past few years, a growing number of data platforms have emerged, including data commons, data repositories, and databases containing biomedical, environmental, social determinants of health and other data relevant to improving health outcomes. With the growing number of data platforms, interoperating multiple data platforms to form data meshes, data fabrics and other types of data ecosyst…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: 10 pages, 1 figure

  5. arXiv:2404.15475  [pdf, ps, other]

    cs.IR

    An Annotated Glossary for Data Commons, Data Meshes, and Other Data Platforms

    Authors: Robert L. Grossman

    Abstract: Cloud-based data commons, data meshes, data hubs, and other data platforms are important ways to manage, analyze and share data to accelerate research and to support reproducible research. This is an annotated glossary of some of the more common terms used in articles and discussions about these platforms.

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 6 pages

  6. arXiv:2311.05659  [pdf, other]

    cs.LG cs.AI

    Enhancing Instance-Level Image Classification with Set-Level Labels

    Authors: Renyu Zhang, Aly A. Khan, Yuxin Chen, Robert L. Grossman

    Abstract: Instance-level image classification tasks have traditionally relied on single-instance labels to train models, e.g., few-shot learning and transfer learning. However, set-level coarse-grained labels that capture relationships among instances can provide richer information in real-world scenarios. In this paper, we present a novel approach to enhance instance-level image classification by leveragin…

    Submitted 17 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  7. arXiv:2207.11167  [pdf, ps, other]

    cs.DC

    Ten Lessons for Data Sharing With a Data Commons

    Authors: Robert L. Grossman

    Abstract: A data commons is a cloud-based data platform with a governance structure that allows a community to manage, analyze and share its data. Data commons provide a research community with the ability to manage and analyze large datasets using the elastic scalability provided by cloud computing and to share data securely and compliantly, and, in this way, accelerate the pace of research. Over the past…

    Submitted 22 July, 2022; originally announced July 2022.

  8. arXiv:2203.05097  [pdf]

    cs.DC

    A Framework for the Interoperability of Cloud Platforms: Towards FAIR Data in SAFE Environments

    Authors: Robert L. Grossman, Rebecca R. Boyles, Brandi N. Davis-Dusenbery, Amanda Haddock, Allison P. Heath, Brian D. O'Connor, Adam C. Resnick, Deanne M. Taylor, Stan Ahalt

    Abstract: As the number of cloud platforms supporting scientific research grows, there is an increasing need to support interoperability between two or more cloud platforms, as a growing amount of data is being hosted in cloud-based platforms. A well accepted core concept is to make data in cloud platforms Findable, Accessible, Interoperable and Reusable (FAIR). We introduce a companion concept that applies…

    Submitted 15 February, 2024; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: 16 pages with 2 figures

    ACM Class: D.2.11; D.2.12; E.0

  9. arXiv:2112.13737  [pdf, other]

    cs.LG cs.AI

    Scalable Batch-Mode Deep Bayesian Active Learning via Equivalence Class Annealing

    Authors: Renyu Zhang, Aly A. Khan, Robert L. Grossman, Yuxin Chen

    Abstract: Active learning has demonstrated data efficiency in many fields. Existing active learning algorithms, especially in the context of batch-mode deep Bayesian active models, rely heavily on the quality of uncertainty estimations of the model, and are often challenging to scale to large batches. In this paper, we propose Batch-BALanCe, a scalable batch-mode active learning algorithm, which combines in…

    Submitted 20 February, 2023; v1 submitted 27 December, 2021; originally announced December 2021.

  10. arXiv:1809.01699  [pdf]

    q-bio.GN cs.CY

    Data Lakes, Clouds and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data

    Authors: Robert L. Grossman

    Abstract: Data commons collate data with cloud computing infrastructure and commonly used software services, tools and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize and share large scale genomics datasets. Data ecosystems can be built by interopera…

    Submitted 24 December, 2018; v1 submitted 5 September, 2018; originally announced September 2018.

    Comments: 28 pages, 4 figures

  11. arXiv:1604.02608  [pdf, other]

    cs.CY cs.DC

    A Case for Data Commons: Towards Data Science as a Service

    Authors: Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, Walt Wells

    Abstract: As the amount of scientific data continues to grow at ever faster rates, the research community is increasingly in need of flexible computational infrastructure that can support the entirety of the data science lifecycle, including long-term data storage, data exploration and discovery services, and compute capabilities to support data analysis and re-analysis, as new data are added and as scienti…

    Submitted 9 April, 2016; originally announced April 2016.

  12. arXiv:1601.00323  [pdf, other]

    cs.CE

    The Design of a Community Science Cloud: The Open Science Data Cloud Perspective

    Authors: Robert L. Grossman, Matthew Greenway, Allison P. Heath, Ray Powell, Rafael D. Suarez, Walt Wells, Kevin White, Malcolm Atkinson, Iraklis Klampanos, Heidi L. Alvarez, Christine Harvey, Joe J. Mambretti

    Abstract: In this paper we describe the design and implementation of the Open Science Data Cloud, or OSDC. The goal of the OSDC is to provide petabyte-scale data cloud infrastructure and related services for scientists working with large quantities of data. Currently, the OSDC consists of more than 2000 cores and 2 PB of storage distributed across four data centers connected by 10G networks. We discuss som…

    Submitted 3 January, 2016; originally announced January 2016.

    Comments: 12 pages, 3 figures

  13. arXiv:1007.1261  [pdf, other]

    cs.DC

    MalStone: Towards A Benchmark for Analytics on Large Data Clouds

    Authors: Collin Bennett, Robert L. Grossman, David Locke, Jonathan Seidman, Steve Vejcik

    Abstract: Developing data mining algorithms that are suitable for cloud computing platforms is currently an active area of research, as is developing cloud computing platforms appropriate for data mining. Currently, the most common benchmark for cloud computing is the Terasort (and related) benchmarks. Although the Terasort Benchmark is quite useful, it was not designed for data mining per se. In this paper…

    Submitted 7 July, 2010; originally announced July 2010.

  14. arXiv:0809.1181  [pdf]

    cs.DC

    Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data

    Authors: Yunhong Gu, Robert L Grossman

    Abstract: Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. In contrast to existing storage and compute clouds, Sector can manage data not only within a data center, but also ac…

    Submitted 16 January, 2009; v1 submitted 6 September, 2008; originally announced September 2008.

  15. arXiv:0808.3019  [pdf, other]

    cs.DC

    Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere

    Authors: Robert L Grossman, Yunhong Gu

    Abstract: We describe the design and implementation of a high performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastructure that provides resources and/or services over the Internet. A storage cloud provides storage services, while a compute cloud provides compute services. We describe the design of the Sector storage cloud and how it p…

    Submitted 21 August, 2008; originally announced August 2008.

  16. arXiv:0808.1802  [pdf, other]

    cs.DC

    Compute and Storage Clouds Using Wide Area High Performance Networks

    Authors: Robert L. Grossman, Yunhong Gu, Michael Sabala, Wanzhi Zhang

    Abstract: We describe a cloud based infrastructure that we have developed that is optimized for wide area, high performance networks and designed to support data mining applications. The infrastructure consists of a storage cloud called Sector and a compute cloud called Sphere. We describe two applications that we have built using the cloud and some experimental studies.

    Submitted 13 August, 2008; originally announced August 2008.