-
Measuring Heterogeneity in Machine Learning with Distributed Energy Distance
Authors:
Mengchen Fan,
Baocheng Geng,
Roman Shterenberg,
Joseph A. Casey,
Zhong Chen,
Keren Li
Abstract:
In distributed and federated learning, heterogeneity across data sources remains a major obstacle to effective model aggregation and convergence. We focus on feature heterogeneity and introduce energy distance as a sensitive measure for quantifying distributional discrepancies. While we show that energy distance is robust for detecting data distribution shifts, its direct use in large-scale systems can be prohibitively expensive. To address this, we develop Taylor approximations that preserve key theoretical quantitative properties while reducing computational overhead. Through simulation studies, we show how accurately capturing feature discrepancies boosts convergence in distributed learning. Finally, we propose a novel application of energy distance to assign penalty weights for aligning predictions across heterogeneous nodes, ultimately enhancing coordination in federated and distributed settings.
Submitted 27 January, 2025;
originally announced January 2025.
-
Towards Precision Cardiovascular Analysis in Zebrafish: The ZACAF Paradigm
Authors:
Amir Mohammad Naderi,
Jennifer G. Casey,
Mao-Hsiang Huang,
Rachelle Victorio,
David Y. Chiang,
Calum MacRae,
Hung Cao,
Vandana A. Gupta
Abstract:
Quantifying cardiovascular parameters such as ejection fraction in zebrafish, a host for many biological investigations, has been extensively studied. Since current manual monitoring techniques are time-consuming and fallible, several image processing frameworks have been proposed to automate the process. Most of these works rely on supervised deep-learning architectures. However, supervised methods tend to overfit to their training dataset. This means that applying the same framework to new data with different imaging setups and mutant types can severely decrease performance. We have developed a Zebrafish Automatic Cardiovascular Assessment Framework (ZACAF) to quantify cardiac function in zebrafish. In this work, we further applied data augmentation, Transfer Learning (TL), and Test Time Augmentation (TTA) to ZACAF to improve its performance in quantifying cardiovascular function in zebrafish. This strategy can be integrated with the available frameworks to aid other researchers. We demonstrate that using TL, even with a constrained dataset, the model can be refined to accommodate a novel microscope setup, diverse mutant types, and various video recording protocols. Additionally, as users engage in successive rounds of TL, the model is anticipated to undergo substantial enhancements in both generalizability and accuracy. Finally, we applied this approach to assess cardiovascular function in nrap mutant zebrafish, a model of cardiomyopathy.
Submitted 14 February, 2024;
originally announced February 2024.
-
Navigating the Mise-en-Page: Interpretive Machine Learning Approaches to the Visual Layouts of Multi-Ethnic Periodicals
Authors:
Benjamin Charles Germain Lee,
Joshua Ortiz Baco,
Sarah H. Salter,
Jim Casey
Abstract:
This paper presents a computational method of analysis that draws from machine learning, library science, and literary studies to map the visual layouts of multi-ethnic newspapers from the late 19th and early 20th century United States. This work departs from prior approaches to newspapers that focus on individual pieces of textual and visual content. Our method combines Chronicling America's MARC data and the Newspaper Navigator machine learning dataset to identify the visual patterns of newspaper page layouts. By analyzing high-dimensional visual similarity, we aim to better understand how editors spoke and protested through the layout of their papers.
Submitted 3 September, 2021;
originally announced September 2021.
-
ÆtherFlow: Principled Wireless Support in SDN
Authors:
Muxi Yan,
Jasson Casey,
Prithviraj Shome,
Alex Sprintson,
Andrew Sutton
Abstract:
Software Defined Networking (SDN) drastically changes the meaning and process of designing, building, testing, and operating networks. The current support for wireless networking in SDN technologies has lagged behind its development and deployment for wired networks. The purpose of this work is to bring principled support for wireless access networks so that they can receive the same level of programmability as wireline interfaces. Specifically, we aim to integrate wireless protocols into the general SDN framework by proposing a new set of abstractions in wireless devices and the interfaces to manipulate them. We validate our approach by implementing our design as an extension of an existing OpenFlow data plane and deploying it in an IEEE 802.11 access point. We demonstrate the viability of software-defined wireless access networks by developing and testing a wireless handoff application. The results of the experiment show that our framework is capable of providing new capabilities in an efficient manner.
Submitted 15 September, 2015;
originally announced September 2015.
-
tinyNBI: Distilling an API from essential OpenFlow abstractions
Authors:
C. Jasson Casey,
Andrew Sutton,
Alex Sprintson
Abstract:
If simplicity is a key strategy for success as a network protocol, OpenFlow is not winning. At its core OpenFlow presents a simple idea, which is a network switch data plane abstraction along with a control protocol for manipulating that abstraction. The result of this idea has been far from simple: a new version released each year, five active versions, complex feature dependencies, unstable version negotiation, lack of state machine definition, etc. This complexity represents roadblocks for network, software, and hardware engineers.
We have distilled the core abstractions present in 5 existing versions of OpenFlow and refactored them into a simple API called tinyNBI. Our work does not provide high-level network abstractions (address pools, VPN maps, etc.); instead, it focuses on providing a clean low-level interface that supports the development of these higher-layer abstractions. The goal of tinyNBI is to allow configuration of all existing OpenFlow abstractions without having to deal with the unique personalities of each version of OpenFlow or their level of support in target switches.
Submitted 26 March, 2014;
originally announced March 2014.
-
The IceProd Framework: Distributed Data Processing for the IceCube Neutrino Observatory
Authors:
M. G. Aartsen,
R. Abbasi,
M. Ackermann,
J. Adams,
J. A. Aguilar,
M. Ahlers,
D. Altmann,
C. Arguelles,
J. Auffenberg,
X. Bai,
M. Baker,
S. W. Barwick,
V. Baum,
R. Bay,
J. J. Beatty,
J. Becker Tjus,
K. -H. Becker,
S. BenZvi,
P. Berghaus,
D. Berley,
E. Bernardini,
A. Bernhard,
D. Z. Besson,
G. Binder,
D. Bindig, et al. (262 additional authors not shown)
Abstract:
IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. IceProd is a distributed management system based on Python, XML-RPC and GridFTP. It is driven by a central database in order to coordinate and administer production of simulations and processing of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, Condor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework.
Submitted 22 August, 2014; v1 submitted 22 November, 2013;
originally announced November 2013.
-
Eliminating Network Protocol Vulnerabilities Through Abstraction and Systems Language Design
Authors:
C. Jasson Casey,
Andrew Sutton,
Gabriel Dos Reis,
Alex Sprintson
Abstract:
Incorrect implementations of network protocol message specifications affect the stability, security, and cost of network system development. Most implementation defects fall into one of three categories of well-defined message constraints. However, the general process of constructing network protocol stacks and systems does not capture these categorical constraints. We introduce a systems programming language with new abstractions that capture these constraints. Safe and efficient implementations of standard message handling operations are synthesized by our compiler, and whole-program analysis is used to ensure constraints are never violated. We present language examples using the OpenFlow protocol.
Submitted 13 November, 2013;
originally announced November 2013.