-
Complexity Synchronization
Authors:
Korosh Mahmoodi,
Scott E. Kerick,
Paolo Grigolini,
Piotr J. Franaszczuk,
Bruce J. West
Abstract:
The observational ubiquity of inverse power law (IPL) spectra in complex phenomena entails a theory for dynamic fractal phenomena that captures their fractal dimension, dynamics, and statistics. These and other properties are consequences of the complexity resulting from nonlinear dynamic networks, collectively summarized for biomedical phenomena as the Network Effect (NE) or, focused more narrowly, as Network Physiology. Herein we address the measurable consequences of the NE on time series generated by different parts of the brain, heart, and lung organ networks, which are directly related to their inter-network and intra-network interactions. Moreover, these same physiologic organ networks have been shown to generate crucial event (CE) time series, and herein are shown, using modified diffusion entropy analysis (MDEA), to have scaling indices that change quasiperiodically over time, indicating quasiperiodic changes in complexity. The results do not depend on the underlying coherence properties of the associated time series but demonstrate a generalized synchronization of complexity. This high-order synchrony among the scaling indices of EEG (brain), ECG (heart), and respiratory time series is governed by the quantitative interdependence of the multifractal behavior of the various physiological organs' network dynamics. This consequence of the NE opens the door to an entirely general characterization of the dynamics of complex networks in terms of complexity synchronization (CS), independent of the scientific, engineering, or technological context.
Submitted 29 October, 2022;
originally announced October 2022.
-
Bayesian Learning: A Selective Overview
Authors:
Yu Lin Hsu,
Chu Chuan Jeng,
Pavithra Sripathanallur Murali,
Mohammadreza Torkjazi,
Jonathan West,
Michaela Zuber,
Vadim Sokolov
Abstract:
This paper presents an overview of some of the concepts of Bayesian learning. The number of scientific and industrial applications of Bayesian learning has grown rapidly over the last few decades. This growth started with the wide use of Markov chain Monte Carlo methods, which emerged as a dominant computational technique for Bayesian inference in the early 1990s. Since then, Bayesian learning has spread across several fields, from robotics and machine learning to medical applications. This paper provides an overview of some of the widely used concepts and shows several applications. The paper is based on a series of seminars given by students of a PhD course on Bayesian Learning at George Mason University. The course was taught in the Fall of 2021. Thus, the topics covered in the paper reflect the topics students selected to study.
Submitted 23 December, 2021;
originally announced December 2021.
-
Gender-based homophily in collaborations across a heterogeneous scholarly landscape
Authors:
Y. Samuel Wang,
Carole J. Lee,
Jevin D. West,
Carl T. Bergstrom,
Elena A. Erosheva
Abstract:
In this article, we investigate the role of gender in collaboration patterns by analyzing gender-based homophily -- the tendency for researchers to co-author with individuals of the same gender. We develop and apply novel methodology to the corpus of JSTOR articles, a broad scholarly landscape, which we analyze at various levels of granularity. Most notably, for a precise analysis of gender homophily, we develop methodology which explicitly accounts for the fact that the data comprises heterogeneous intellectual communities and that not all authorships are exchangeable. In particular, we distinguish three phenomena which may affect the distribution of observed gender homophily in collaborations: a structural component that is due to demographics and non-gendered authorship norms of a scholarly community, a compositional component which is driven by varying gender representation across sub-disciplines and time, and a behavioral component which we define as the remainder of observed gender homophily after its structural and compositional components have been taken into account. Using minimal modeling assumptions, the methodology we develop allows us to test for behavioral homophily. We find that statistically significant behavioral homophily can be detected across the JSTOR corpus and show that this finding is robust to missing gender indicators in our data. In a secondary analysis, we show that the proportion of women in a field is positively associated with the probability of finding statistically significant behavioral homophily.
Submitted 16 June, 2022; v1 submitted 3 September, 2019;
originally announced September 2019.
-
Delineating Knowledge Domains in the Scientific Literature Using Visual Information
Authors:
Sean Yang,
Po-shen Lee,
Jevin D. West,
Bill Howe
Abstract:
Figures are an important channel for scientific communication, used to express complex ideas, models and data in ways that words cannot. However, this visual information is mostly ignored in analyses of the scientific literature. In this paper, we demonstrate the utility of using scientific figures as markers of knowledge domains in science, which can be used for classification, recommender systems, and studies of scientific information exchange. We encode sets of images into a visual signature, then use distances between these signatures to understand how patterns of visual communication compare with patterns of jargon and citation structures. We find that figures can be as effective for differentiating communities of practice as text or citation patterns. We then consider where these metrics disagree to understand how different disciplines use visualization to express ideas. Finally, we further consider how specific figure types propagate through the literature, suggesting a new mechanism for understanding the flow of ideas apart from conventional channels of text and citations. Our ultimate aim is to better leverage these information-dense objects to improve scientific communication across disciplinary boundaries.
Submitted 12 August, 2019;
originally announced August 2019.
-
Improved Reinforcement Learning with Curriculum
Authors:
Joseph West,
Frederic Maire,
Cameron Browne,
Simon Denman
Abstract:
Humans tend to learn complex abstract concepts faster if examples are presented in a structured manner. For instance, when learning how to play a board game, usually one of the first concepts learned is how the game ends, i.e. the actions that lead to a terminal state (win, lose or draw). The advantage of learning end-games first is that once the actions which lead to a terminal state are understood, it becomes possible to incrementally learn the consequences of actions that are further away from a terminal state. We call this an end-game-first curriculum. Currently the state-of-the-art machine learning player for general board games, AlphaZero by Google DeepMind, does not employ a structured training curriculum, instead learning from the entire game at all times. By employing an end-game-first training curriculum to train an AlphaZero-inspired player, we empirically show that the rate of learning of an artificial player can be improved during the early stages of training when compared to a player not using a training curriculum.
Submitted 10 June, 2019; v1 submitted 28 March, 2019;
originally announced March 2019.
-
Stem-ming the Tide: Predicting STEM attrition using student transcript data
Authors:
Lovenoor Aulck,
Rohan Aras,
Lysia Li,
Coulter L'Heureux,
Peter Lu,
Jevin West
Abstract:
Science, technology, engineering, and math (STEM) fields play growing roles in national and international economies by driving innovation and generating high-salary jobs. Yet, the US is lagging behind other highly industrialized nations in terms of STEM education and training. Furthermore, many economic forecasts predict a rising shortage of domestic STEM-trained professionals in the US for years to come. One potential solution to this deficit is to decrease the rates at which students leave STEM-related fields in higher education, as currently over half of all students intending to graduate with a STEM degree eventually attrite. However, little quantitative research at scale has looked at causes of STEM attrition, let alone the use of machine learning to examine how well this phenomenon can be predicted. In this paper, we detail our efforts to model and predict dropout from STEM fields using one of the largest known datasets used for research on students at a traditional campus setting. Our results suggest that attrition from STEM fields can be accurately predicted with data that is routinely collected at universities using only information on students' first academic year. We also propose a method to model student STEM intentions for each academic term to better understand the timing of STEM attrition events. We believe these results show great promise in using machine learning to improve STEM retention in traditional and non-traditional campus settings.
Submitted 28 August, 2017;
originally announced August 2017.
-
Data Integration Model for Air Quality: A Hierarchical Approach to the Global Estimation of Exposures to Ambient Air Pollution
Authors:
Gavin Shaddick,
Matthew L. Thomas,
Amelia Jobling,
Michael Brauer,
Aaron van Donkelaar,
Rick Burnett,
Howard Chang,
Aaron Cohen,
Rita Van Dingenen,
Carlos Dora,
Sophie Gumy,
Yang Liu,
Randall Martin,
Lance A. Waller,
Jason West,
James V. Zidek,
Annette Prüss-Ustün
Abstract:
Air pollution is a major risk factor for global health, with both ambient and household air pollution contributing substantial components of the overall global disease burden. One of the key drivers of adverse health effects is fine particulate matter ambient pollution (PM$_{2.5}$), to which an estimated 3 million deaths can be attributed annually. The primary source of information for estimating exposures has been measurements from ground monitoring networks but, although coverage is increasing, there remain regions in which monitoring is limited. Ground monitoring data therefore needs to be supplemented with information from other sources, such as satellite retrievals of aerosol optical depth and chemical transport models. A hierarchical modelling approach for integrating data from multiple sources is proposed, allowing spatially-varying relationships between ground measurements and other factors that estimate air quality. Set within a Bayesian framework, the resulting Data Integration Model for Air Quality (DIMAQ) is used to estimate exposures, together with associated measures of uncertainty, on a high resolution grid covering the entire world. Bayesian analysis on this scale can be computationally challenging, and here approximate Bayesian inference is performed using Integrated Nested Laplace Approximations. Model selection and assessment are performed by cross-validation, with the final model offering substantial increases in predictive accuracy, particularly in regions where there is sparse ground monitoring, when compared to current approaches: root mean square error (RMSE) reduced from 17.1 to 10.7, and population weighted RMSE from 23.1 to 12.1 $\mu$g m$^{-3}$. Based on summaries of the posterior distributions for each grid cell, it is estimated that 92% of the world's population resides in areas exceeding the World Health Organization's Air Quality Guidelines.
Submitted 26 September, 2016; v1 submitted 1 September, 2016;
originally announced September 2016.
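The abstract above reports predictive accuracy as RMSE and population weighted RMSE. As a minimal sketch of how such metrics are commonly computed (not the authors' code), the Python below assumes the weighted variant weights each squared error by the population of its location and normalizes by total population; the function names `rmse` and `population_weighted_rmse` are hypothetical and the abstract does not give the exact formula used.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def population_weighted_rmse(observed, predicted, population):
    """RMSE with each squared error weighted by local population.

    Assumed definition (not taken from the abstract):
    sqrt(sum(w_i * (o_i - p_i)^2) / sum(w_i)).
    """
    if not (len(observed) == len(predicted) == len(population)) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total_population = sum(population)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(
        w * (o - p) ** 2 for o, p, w in zip(observed, predicted, population)
    )
    return math.sqrt(weighted_sum / total_population)
```

With equal populations the weighted metric reduces to the ordinary RMSE; with unequal populations, errors at densely populated locations dominate, which is why the two figures in an evaluation can differ.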
-
Predicting Student Dropout in Higher Education
Authors:
Lovenoor Aulck,
Nishant Velagapudi,
Joshua Blumenstock,
Jevin West
Abstract:
Each year, roughly 30% of first-year students at US baccalaureate institutions do not return for their second year and over $9 billion is spent educating these students. Yet, little quantitative research has analyzed the causes and possible remedies for student attrition. Here, we describe initial efforts to model student dropout using the largest known dataset on higher education attrition, which tracks over 32,500 students' demographics and transcript records at one of the nation's largest public universities. Our results highlight several early indicators of student attrition and show that dropout can be accurately predicted even when predictions are based on a single term of academic transcript data. These results highlight the potential for machine learning to have an impact on student retention and success while pointing to several promising directions for future work.
Submitted 7 March, 2017; v1 submitted 20 June, 2016;
originally announced June 2016.