-
Atlas: A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics
Authors:
Maximilian Alber,
Stephan Tietz,
Jonas Dippel,
Timo Milbich,
Timothée Lesort,
Panos Korfiatis,
Moritz Krügener,
Beatriz Perez Cancer,
Neelay Shah,
Alexander Möllers,
Philipp Seegerer,
Alexandra Carpen-Amarie,
Kai Standvoss,
Gabriel Dernbach,
Edwin de Jong,
Simon Schallenberg,
Andreas Kunft,
Helmut Hoffer von Ankershoffen,
Gavin Schaeferle,
Patrick Duffy,
Matt Redlon,
Philipp Jurmeister,
David Horst,
Lukas Ruff,
Klaus-Robert Müller, et al. (2 additional authors not shown)
Abstract:
Recent advances in digital pathology have demonstrated the effectiveness of foundation models across diverse applications. In this report, we present Atlas, a novel vision foundation model based on the RudolfV approach. Our model was trained on a dataset comprising 1.2 million histopathology whole slide images collected from two medical institutions: Mayo Clinic and Charité - Universitätsmedizin Berlin. Comprehensive evaluations show that Atlas achieves state-of-the-art performance across twenty-one public benchmark datasets, even though it is the largest model neither by parameter count nor by training dataset size.
Submitted 10 January, 2025; v1 submitted 9 January, 2025;
originally announced January 2025.
-
Tuning MPI Collectives by Verifying Performance Guidelines
Authors:
Sascha Hunold,
Alexandra Carpen-Amarie
Abstract:
MPI collective operations provide a standardized interface for performing data movements within a group of processes. The efficiency of collective communication operations depends on the actual algorithm, its implementation, and the specific communication problem (type of communication, message size, number of processes). Many MPI libraries provide numerous algorithms for specific collective operations. The strategy for selecting an efficient algorithm is often predefined (hard-coded) in MPI libraries, but some of them, such as Open MPI, allow users to change the algorithm manually. Finding the best algorithm for each case is a hard problem, and several approaches to tune these algorithmic parameters have been proposed. We take an approach orthogonal to the parameter tuning of MPI collectives: instead of testing individual algorithmic choices provided by an MPI library, we compare the latency of a specific MPI collective operation to the latency of semantically equivalent functions, which we call mock-up implementations. The structure of the mock-up implementations is defined by self-consistent performance guidelines. The advantage of this approach is that tuning with mock-up implementations is always possible, whether or not an MPI library allows users to select a specific algorithm at run time. We implement this concept in a library called PGMPITuneLib, which is layered between the user code and the actual MPI implementation. This library selects the best-performing algorithmic pattern of an MPI collective by intercepting MPI calls and redirecting them to our mock-up implementations. Experimental results show that PGMPITuneLib can significantly reduce the latency of MPI collectives and, equally important, that it can help identify the tuning potential of MPI libraries.
Submitted 10 August, 2017; v1 submitted 31 July, 2017;
originally announced July 2017.
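The mock-up idea can be illustrated without MPI by simulating p ranks as Python lists. The sketch below rests on several assumptions: a sum reduction, and a vector length divisible by p. The function names are hypothetical, and the latency values in the usage line are placeholders, not measurements from the paper. The sketch checks that one commonly cited mock-up of `MPI_Allreduce` (`MPI_Reduce_scatter_block` followed by `MPI_Allgather`) has the same semantics as the native call. It then picks the variant with the lowest latency, roughly the decision the abstract attributes to PGMPITuneLib.

```python
import operator
from functools import reduce


def allreduce(bufs, op=operator.add):
    """Reference semantics of MPI_Allreduce on p simulated ranks."""
    result = [reduce(op, column) for column in zip(*bufs)]
    return [result[:] for _ in bufs]


def reduce_scatter_allgather(bufs, op=operator.add):
    """Mock-up: MPI_Reduce_scatter_block followed by MPI_Allgather."""
    p, n = len(bufs), len(bufs[0])
    if n % p:
        raise ValueError("sketch assumes the vector splits into p equal blocks")
    b = n // p
    # Reduce-scatter: rank r ends up holding the reduced r-th block.
    blocks = [[reduce(op, (buf[i] for buf in bufs)) for i in range(r * b, (r + 1) * b)]
              for r in range(p)]
    # Allgather: every rank concatenates all reduced blocks.
    full = [x for block in blocks for x in block]
    return [full[:] for _ in range(p)]


def pick_fastest(latencies):
    """Return the variant name with the smallest measured latency."""
    return min(latencies, key=latencies.get)


bufs = [[1, 2, 3, 4], [10, 20, 30, 40]]
assert allreduce(bufs) == reduce_scatter_allgather(bufs)
# Placeholder latencies (hypothetical, in microseconds).
print(pick_fastest({"MPI_Allreduce": 12.0, "Reduce_scatter_block+Allgather": 9.5}))
```

Because both functions return identical buffers on every rank, the caller cannot tell which one ran. That is what makes transparent redirection at the interception layer possible.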
-
MPI Derived Datatypes: Performance Expectations and Status Quo
Authors:
Alexandra Carpen-Amarie,
Sascha Hunold,
Jesper Larsson Träff
Abstract:
We examine natural expectations on communication performance using MPI derived datatypes in comparison to the baseline, "raw" performance of communicating simple, non-contiguous data layouts. We show that common MPI libraries sometimes violate these datatype performance expectations, and discuss reasons why this happens, but also show cases where MPI libraries perform well. Our findings are in many ways surprising and disappointing. First, the performance of derived datatypes is sometimes worse than the semantically equivalent packing and unpacking using the corresponding MPI functionality. Second, the communication performance equivalence stated in the MPI standard between a single contiguous datatype and the repetition of its constituent datatype does not hold universally. Third, the heuristics that are typically employed by MPI libraries at type-commit time are insufficient to enforce natural performance guidelines, and better type normalization heuristics may have a significant performance impact. We show cases where all the MPI type constructors are necessary to achieve the expected performance for certain data layouts. We describe our benchmarking approach to verify the datatype performance guidelines, and present extensive verification results for different MPI libraries.
Submitted 1 July, 2016;
originally announced July 2016.
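One of the expectations discussed above can be stated as a guideline. Communicating a non-contiguous layout through a derived datatype should be no slower than packing it into a contiguous buffer, sending that buffer, and unpacking it on the receiver. As a minimal sketch, the code below shows in plain Python what the pack/unpack baseline does for the layout described by `MPI_Type_vector(count, blocklength, stride, ...)`, with the stride measured in elements of the old type. The helper names are hypothetical. The sketch models only the data movement, not MPI itself.

```python
def vector_offsets(count, blocklength, stride):
    """Element offsets covered by MPI_Type_vector(count, blocklength, stride, ...),
    with the stride measured in elements of the old type."""
    return [i * stride + j for i in range(count) for j in range(blocklength)]


def pack(buf, count, blocklength, stride):
    """Copy the strided elements into one contiguous buffer (what packing does)."""
    return [buf[k] for k in vector_offsets(count, blocklength, stride)]


def unpack(packed, out, count, blocklength, stride):
    """Scatter a contiguous buffer back into the strided layout."""
    for value, k in zip(packed, vector_offsets(count, blocklength, stride)):
        out[k] = value
    return out


# A 3 x 4 row-major matrix; select its first two columns.
matrix = list(range(12))
packed = pack(matrix, count=3, blocklength=2, stride=4)
print(packed)  # [0, 1, 4, 5, 8, 9]
```

A derived datatype lets the library perform these copies internally, or avoid them entirely. The abstract reports that this is sometimes slower than doing the copies explicitly.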
-
Message-Combining Algorithms for Isomorphic, Sparse Collective Communication
Authors:
Jesper Larsson Träff,
Alexandra Carpen-Amarie,
Sascha Hunold,
Antoine Rougier
Abstract:
Isomorphic (sparse) collective communication is a form of collective communication in which all involved processes communicate in small, identically structured neighborhoods of other processes. Isomorphic neighborhoods are defined via an embedding of the processes in a regularly structured topology, e.g., a $d$-dimensional torus, which may correspond to the physical communication network of the underlying system. Isomorphic collective communication is useful for implementing stencil and other regular, sparse distributed computations, where the assumption that all processes behave (almost) symmetrically is justified.
In this paper, we show how efficient message-combining communication schedules for isomorphic, sparse collective communication can be computed easily and efficiently by purely local computations. We give schemes for isomorphic alltoall and allgather communication that reduce the number of communication rounds, and thereby the communication latency, from $s$ to at most $Nd$ for neighborhoods consisting of $s$ processes, where the (small) factor $N$ depends on the structure of the neighborhood and the capabilities of the communication system. Using these schedules, we give zero-copy implementations of the isomorphic collectives using MPI and its derived datatypes, eliminating explicit, process-local copy operations. By benchmarking the collective communication algorithms against straightforward implementations and against the corresponding MPI neighborhood collectives, we document significant latency improvements of our implementations for block sizes of up to a few kilobytes. We discuss further optimizations for computing even better schedules, some of which have been implemented and benchmarked.
Submitted 24 June, 2016;
originally announced June 2016.
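The reduction from $s$ rounds to at most $Nd$ can be illustrated with one simple dimension-wise combining scheme. This scheme is assumed for illustration; the paper's schedules may differ in detail. In dimension $k$ there is one round per distinct nonzero $k$-th coordinate among the neighbor offsets. In that round, each process forwards to its neighbor at that shift every block (its own or previously received) whose target's $k$-th coordinate equals the shift. $N$ is then the largest number of distinct nonzero values in any dimension. Since every process runs the same schedule, the schedule can be computed locally, as the abstract notes. The function names below are hypothetical.

```python
from itertools import product


def combining_schedule(offsets):
    """Rounds as (dimension, shift) pairs: one round per distinct nonzero
    coordinate value in each dimension (assumed dimension-wise scheme)."""
    d = len(offsets[0])
    return [(k, shift)
            for k in range(d)
            for shift in sorted({v[k] for v in offsets if v[k] != 0})]


def route(offset, rounds):
    """Follow the block destined for `offset` through the rounds. In round
    (k, shift) a process forwards every block whose k-th target coordinate
    equals shift, so the block moves by `shift` along dimension k."""
    pos = [0] * len(offset)
    for k, shift in rounds:
        if offset[k] == shift:
            pos[k] += shift
    return tuple(pos)


# Moore neighborhood in 2-D: the 8 surrounding processes, so s = 8.
moore = [v for v in product((-1, 0, 1), repeat=2) if v != (0, 0)]
rounds = combining_schedule(moore)
print(len(moore), len(rounds))  # 8 4
assert all(route(v, rounds) == v for v in moore)
```

For the 2-D Moore neighborhood this gives $N = 2$ and 4 rounds instead of 8; the 3-D version needs 6 rounds instead of 26. The price is that a diagonal block is forwarded once per nonzero dimension, so total data volume grows. This is consistent with the gains being reported for small blocks.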
-
PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines
Authors:
Sascha Hunold,
Alexandra Carpen-Amarie,
Felix Donatus Lübbe,
Jesper Larsson Träff
Abstract:
The Message Passing Interface (MPI) is the most commonly used application programming interface for process communication on current large-scale parallel systems. Due to the scale and complexity of modern parallel architectures, it is becoming increasingly difficult to optimize MPI libraries, as many factors can influence the communication performance. To assist MPI developers and users, we propose an automatic way to check whether MPI libraries respect self-consistent performance guidelines for collective communication operations. We introduce the PGMPI framework to detect violations of performance guidelines through benchmarking. Our experimental results show that PGMPI can pinpoint undesired and often unexpected performance degradations of collective MPI operations. We demonstrate how to overcome performance issues of several libraries by adapting the algorithmic implementations of their respective collective MPI calls.
Submitted 2 September, 2016; v1 submitted 1 June, 2016;
originally announced June 2016.
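A self-consistent performance guideline states that a collective should not be slower than a semantically equivalent composition of other collectives. The sketch below pairs a few such guidelines with a check for violations. The listed guidelines are examples and are not claimed to be PGMPI's exact set. Comparing medians against a fixed relative tolerance is a deliberate simplification; a careful framework would use repeated runs and a statistical test. All names are hypothetical.

```python
from statistics import median

# Example guidelines of the form "native <= mock-up"; illustrative only,
# not PGMPI's exact guideline set.
GUIDELINES = {
    "MPI_Allreduce": ["MPI_Reduce + MPI_Bcast",
                      "MPI_Reduce_scatter_block + MPI_Allgather"],
    "MPI_Bcast": ["MPI_Scatter + MPI_Allgather"],
    "MPI_Gather": ["MPI_Allgather"],
}


def violates(native_samples, mockup_samples, tolerance=0.05):
    """True if the mock-up's median latency beats the native collective's
    median by more than `tolerance` (a relative margin)."""
    native = median(native_samples)
    mockup = median(mockup_samples)
    return mockup < native * (1.0 - tolerance)


def check(measurements, tolerance=0.05):
    """measurements: {(collective, mockup): (native_samples, mockup_samples)}.
    Returns the (collective, mockup) pairs whose guideline is violated."""
    return [key for key, (nat, mock) in measurements.items()
            if violates(nat, mock, tolerance)]
```

The tolerance keeps measurement noise from being reported as a violation. Each reported pair also points directly at a fix: route that collective through the faster mock-up, as the abstract describes.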
-
MPI Benchmarking Revisited: Experimental Design and Reproducibility
Authors:
Sascha Hunold,
Alexandra Carpen-Amarie
Abstract:
The Message Passing Interface (MPI) is the prevalent programming model used on today's supercomputers. Therefore, MPI library developers are looking for the best possible performance (shortest run-time) of individual MPI functions across many different supercomputer architectures. Several MPI benchmark suites have been developed to assess the performance of MPI implementations. Unfortunately, the outcome of these benchmarks is often neither reproducible nor statistically sound. To overcome these issues, we show which experimental factors have an impact on the run-time of blocking collective MPI operations and how to control them. We address the problem of process and clock synchronization in MPI benchmarks. Finally, we present a new experimental method that allows us to obtain reproducible and statistically sound MPI measurements.
Submitted 27 May, 2016; v1 submitted 28 May, 2015;
originally announced May 2015.
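Timing a collective across ranks requires relating each rank's clock to a reference clock. A classic way to estimate the offset is a Cristian-style ping-pong, shown here as an assumed illustration rather than the method of the paper. The reference rank records its send time, the remote rank replies with its own timestamp, and the reference rank records the receive time. The exchange with the smallest round-trip time is used, on the assumption that the remote timestamp was taken halfway through it. The function names are hypothetical. The sketch assumes a constant offset; more elaborate schemes also model clock drift.

```python
def estimate_offset(samples):
    """samples: (t_send, t_remote, t_recv) triples from ping-pong exchanges,
    where t_send and t_recv are read on the local clock and t_remote on the
    remote clock. Keeps the exchange with the smallest round-trip time and
    assumes t_remote was taken halfway through it."""
    t_send, t_remote, t_recv = min(samples, key=lambda s: s[2] - s[0])
    return t_remote - (t_send + t_recv) / 2.0


def to_local_time(t_remote, offset):
    """Map a remote timestamp onto the local (reference) clock."""
    return t_remote - offset


samples = [(0.0, 10.5, 1.0), (2.0, 13.0, 4.0)]
offset = estimate_offset(samples)
print(offset)  # 10.0
```

Choosing the minimum round-trip time discards exchanges delayed by network or OS noise, which would otherwise bias the midpoint assumption.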
-
An Algorithm for File Transfer Scheduling in Grid Environments
Authors:
Alexandra Carpen-Amarie,
Mugurel Ionut Andreica,
Valentin Cristea
Abstract:
This paper addresses the data transfer scheduling problem for Grid environments, presenting a centralized scheduler with dynamic and adaptive features. The algorithm offers a reservation system for user transfer requests that assigns each request transfer times and bandwidth according to the network topology and the constraints the user specified for it. The paper also presents related projects in the data transfer field, the design of the framework for which the scheduler was built, the main features of the scheduler, the steps for rescheduling transfer requests, and two tests that illustrate the system's behavior for different types of transfer requests.
Submitted 2 January, 2009;
originally announced January 2009.
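The reservation idea can be sketched for the simplest case: a single link with fixed capacity and time divided into discrete slots. The earliest-deadline-first greedy rule, the `Request` fields, and the slot model are all assumptions for illustration. The scheduler described in the abstract is centralized and topology-aware and also reschedules requests, none of which this sketch attempts. Each request carries user constraints (earliest start, deadline, maximum rate). In every slot, the link's bandwidth is handed out to pending requests in deadline order.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    size: int       # data units to transfer
    release: int    # earliest slot in which the transfer may run
    deadline: int   # transfer must finish before this slot
    max_rate: int   # user-specified bandwidth cap per slot


def schedule(requests, capacity, horizon):
    """Greedy EDF over one link: returns ({name: [(slot, bandwidth), ...]},
    names of requests that could not be completed by their deadline)."""
    remaining = {r.name: r.size for r in requests}
    plan = {r.name: [] for r in requests}
    for t in range(horizon):
        free = capacity
        active = sorted((r for r in requests
                         if r.release <= t < r.deadline and remaining[r.name] > 0),
                        key=lambda r: r.deadline)
        for r in active:
            bw = min(free, r.max_rate, remaining[r.name])
            if bw > 0:
                plan[r.name].append((t, bw))
                remaining[r.name] -= bw
                free -= bw
    unmet = [name for name, left in remaining.items() if left > 0]
    return plan, unmet


reqs = [Request("A", size=20, release=0, deadline=4, max_rate=10),
        Request("B", size=10, release=0, deadline=2, max_rate=5)]
plan, unmet = schedule(reqs, capacity=10, horizon=4)
print(plan, unmet)
```

Here B, with the tighter deadline, is served first but is capped at its maximum rate, so A uses the leftover bandwidth and finishes once B is done. A real reservation system would reject or renegotiate requests that end up in `unmet`.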