-
An Ensemble Approach for Brain Tumor Segmentation and Synthesis
Authors:
Juampablo E. Heras Rivera,
Agamdeep S. Chopra,
Tianyi Ren,
Hitender Oswal,
Yutong Pan,
Zineb Sordo,
Sophie Walters,
William Henry,
Hooman Mohammadi,
Riley Olson,
Fargol Rezayaraghi,
Tyson Lam,
Akshay Jaikanth,
Pavan Kancharla,
Jacob Ruzevick,
Daniela Ushizima,
Mehmet Kurt
Abstract:
The integration of machine learning in magnetic resonance imaging (MRI), specifically in neuroimaging, is proving to be incredibly effective, leading to better diagnostic accuracy, accelerated image analysis, and data-driven insights, which can potentially transform patient care. Deep learning models utilize multiple layers of processing to capture intricate details of complex data, which can then be used on a variety of tasks, including brain tumor classification, segmentation, image synthesis, and registration. Previous research demonstrates high accuracy in tumor segmentation using various model architectures, including nn-UNet and Swin-UNet. U-Mamba, which uses state space modeling, also achieves high accuracy in medical image segmentation. To leverage these models, we propose a deep learning framework that ensembles these state-of-the-art architectures to achieve accurate segmentation and produce finely synthesized images.
Submitted 26 November, 2024;
originally announced November 2024.
-
Automatic Search of Multiword Place Names on Historical Maps
Authors:
Rhett Olson,
Jina Kim,
Yao-Yi Chiang
Abstract:
Historical maps are invaluable sources of information about the past, and scanned historical maps are increasingly accessible in online libraries. To retrieve maps from these large libraries that contain specific places of interest, previous work has applied computer vision techniques to recognize words on historical maps, enabling searches for maps that contain specific place names. However, searching for multiword place names is challenging due to complex layouts of text labels on historical maps. This paper proposes an efficient query method for searching a given multiword place name on historical maps. Using existing methods to recognize words on historical maps, we link single-word text labels into potential multiword phrases by constructing minimum spanning trees. These trees aim to link pairs of text labels that are spatially close and have similar height, angle, and capitalization. We then query these trees for the given multiword place name. We evaluate the proposed method in two experiments: 1) to evaluate the accuracy of the minimum spanning tree approach at linking multiword place names and 2) to evaluate the number and time range of maps retrieved by the query approach. The resulting maps reveal how places using multiword names have changed on a large number of maps from across history.
Submitted 21 October, 2024; v1 submitted 20 October, 2024;
originally announced October 2024.
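The linking step described in this abstract can be illustrated with a small sketch. It is a minimal example under stated assumptions, not the authors' implementation: the label fields (`x`, `y`, `height`, `angle`), the additive cost in `edge_cost`, its weights, and the helper names `build_mst` and `phrase_in_tree` are all hypothetical. It builds one minimum spanning tree with Kruskal's algorithm and then checks whether the words of a query appear along consecutive tree edges.

```python
import math
from itertools import combinations


def edge_cost(a, b, w_height=1.0, w_angle=1.0, w_caps=1.0):
    """Hypothetical dissimilarity: distance plus height, angle, and capitalization gaps."""
    dist = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    height_gap = abs(a["height"] - b["height"])
    angle_gap = abs(a["angle"] - b["angle"])
    caps_gap = 0.0 if a["text"].isupper() == b["text"].isupper() else 1.0
    return dist + w_height * height_gap + w_angle * angle_gap + w_caps * caps_gap


def build_mst(labels):
    """Kruskal's algorithm over the complete graph of labels; returns tree edges (i, j)."""
    parent = list(range(len(labels)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    candidates = sorted(
        combinations(range(len(labels)), 2),
        key=lambda e: edge_cost(labels[e[0]], labels[e[1]]),
    )
    tree = []
    for i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it cannot close a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


def phrase_in_tree(labels, tree, phrase):
    """True if the words of `phrase` occur, in order, along a simple path of tree edges."""
    words = phrase.lower().split()
    if not words:
        return False
    neighbors = {i: set() for i in range(len(labels))}
    for i, j in tree:
        neighbors[i].add(j)
        neighbors[j].add(i)

    def walk(node, k, seen):
        if labels[node]["text"].lower() != words[k]:
            return False
        if k == len(words) - 1:
            return True
        return any(
            walk(n, k + 1, seen | {node}) for n in neighbors[node] if n not in seen
        )

    return any(walk(i, 0, frozenset()) for i in range(len(labels)))
```

A query is then a call such as `phrase_in_tree(labels, build_mst(labels), "Salt Lake City")`. Because a spanning tree connects every label, distant labels can still end up adjacent, so a practical system would likely also cap the cost of an acceptable edge.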
-
A Matrix Exponential Generalization of the Laplace Transform of Poisson Shot Noise
Authors:
Nicholas R. Olson,
Jeffrey G. Andrews
Abstract:
We consider a generalization of the Laplace transform of Poisson shot noise defined as an integral transform with respect to a matrix exponential. We denote this as the matrix Laplace transform and establish that it is in general a matrix function extension of the scalar Laplace transform. We show that the matrix Laplace transform of Poisson shot noise admits an expression analogous to that implied by Campbell's theorem. We demonstrate the utility of this generalization of Campbell's theorem in two important applications: the characterization of a Poisson shot noise process and the derivation of the complementary CDF (CCDF) and meta-distribution of signal-to-interference-and-noise (SINR) models in Poisson networks. In the former application, we demonstrate how the higher order moments of Poisson shot noise may be obtained directly from the elements of its matrix Laplace transform. We further show how the CCDF of this object may be bounded using a summation of the first row of its matrix Laplace transform. For the latter application, we show how the CCDF of SINR models with phase-type distributed desired signal power may be obtained via an expectation of the matrix Laplace transform of the interference and noise, analogous to the canonical case of SINR models with Rayleigh fading. Additionally, when the power of the desired signal is exponentially distributed, we establish that the meta-distribution may be obtained in terms of the limit of a sequence expressed in terms of the matrix Laplace transform of a related Poisson shot noise process.
Submitted 26 October, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
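For context, the scalar result that this abstract generalizes can be written out. This is the standard Laplace transform of Poisson shot noise that follows from Campbell's theorem, not the paper's matrix form; the symbols $\Phi$, $\Lambda$, $f$, and $I$ are notation assumed here for illustration.

```latex
% Shot noise generated by a Poisson point process \Phi with intensity measure \Lambda
% and a non-negative response function f:
\[
  I = \sum_{x \in \Phi} f(x), \qquad f \ge 0 .
\]
% Its Laplace transform, for s \ge 0:
\[
  \mathcal{L}_I(s) = \mathbb{E}\!\left[ e^{-sI} \right]
  = \exp\!\left( - \int \left( 1 - e^{-s f(x)} \right) \Lambda(\mathrm{d}x) \right).
\]
```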
-
An explicit granular-mechanics approach to marine sediment acoustics
Authors:
Abram H. Clark,
Derek R. Olson,
Andrew J. Swartz,
W. Mason Starnes
Abstract:
Here we theoretically and computationally study the frequency dependence of phase speed and attenuation for marine sediments from the perspective of granular mechanics. We leverage recent theoretical insights from the granular physics community as well as discrete-element method simulations, where the granular material is treated as a packing of discrete objects that interact via pairwise forces. These pairwise forces include both repulsive contact forces as well as dissipative terms which may include losses from the fluid as well as losses from inelasticity at grain-grain contacts. We show that the structure of disordered granular packings leads to anomalous scaling laws for frequency-dependent phase speed and attenuation that do not follow from a continuum treatment. Our results demonstrate that granular packing structure, which is not explicitly considered in existing models, may play a crucial role in a complete theory of sediment acoustics. While this simple approach does not explicitly treat sound propagation or inertial effects in the interstitial fluid, it provides a starting point for future models that include these and other more complex features.
Submitted 10 May, 2024;
originally announced May 2024.
-
Geometric theory on large-scale and local determination of density dependence of a recovering large carnivore population
Authors:
Yunyi Shen,
Erik R. Olson,
Timothy R. Van Deelen
Abstract:
Density-dependent population growth is a feature of large carnivores like wolves ($\textit{Canis lupus}$), with mechanisms typically attributed to resource (e.g. prey) limitation. Such mechanisms are local phenomena and rely on individuals having access to information, such as prey availability at their location. Using over four decades of wolf population and range expansion data from Wisconsin (USA) wolves, we found that the population not only exhibited density dependence locally but also at landscape scale. Superficially, one may consider space as yet another limiting resource to explain landscape-scale density dependence. However, this view poses an information puzzle: most individuals do not have access to global information such as range-wide habitat availability as they would for local prey availability. How would the population "know" when to slow their range expansion? To understand observed large-scale spatial density dependence, we propose a reaction-diffusion model, first introduced by Fisher and Kolmogorov, with a "travelling wave" solution, wherein the population expands from a core range that quickly achieves local carrying capacity. Early-stage acceleration and later-stage deceleration of population growth can be explained by early elongation of an expanding frontier and a later collision of the expanding frontier with a habitat boundary. Such a process does not require individuals to have global density information. We illustrate our proposal with simulations and spatial visualizations of wolf recolonization in the western Great Lakes region over time relative to habitat suitability. We further synthesize previous studies on wolf habitat selection in the western Great Lakes region and argue that the habitat boundary appeared to be driven by spatial variation in mortality, likely associated with human use of the landscape.
Submitted 24 November, 2023;
originally announced November 2023.
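The reaction-diffusion model named in this abstract is commonly written in the form below. This is the textbook Fisher-KPP equation with logistic growth, given as a reference sketch; the symbols $u$, $D$, $r$, and $K$ are assumed notation and need not match the paper's parameterization.

```latex
% u(x,t): local population density, D: diffusion coefficient,
% r: intrinsic growth rate, K: local carrying capacity.
\[
  \frac{\partial u}{\partial t} = D \, \nabla^{2} u + r \, u \left( 1 - \frac{u}{K} \right).
\]
% For a population that starts localized, the travelling-wave front
% advances at the asymptotic speed
\[
  c^{*} = 2 \sqrt{r D},
\]
% and the density behind the front approaches K.
```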
-
Comparison of model selection techniques for seafloor scattering statistics
Authors:
Derek R Olson,
Marc Geilhufe
Abstract:
In quantitative analysis of seafloor imagery, it is common to model the collection of individual pixel intensities scattered by the seafloor as a random variable with a given statistical distribution. There is a considerable literature on statistical models for seafloor scattering, mostly focused on areas with statistically homogeneous properties (i.e. exhibiting spatial stationarity). For more complex seafloors, the pixel intensity distribution is more appropriately modeled using a mixture of simple distributions. For very complex seafloors, fitting 3 or more mixture components makes physical sense, but the statistical model becomes much more complex in these cases. Therefore, the number of components of the mixture model must be chosen, either from a priori information or with a data-driven approach. However, a priori information is time-consuming to collect, and depends on the skill and experience of the human analyst. A data-driven approach is therefore advantageous, and is explored in this work. Criteria for choosing a model always need to balance the trade-off between the best fit to the data on the one hand and the model complexity on the other. In this work, we compare several statistical model selection criteria, e.g., the Bayesian information criterion. Examples are given for data collected by the HISAS-1032 synthetic aperture sonar (SAS) system on an autonomous underwater vehicle in a rocky environment off the coast of Bergen, Norway.
Submitted 14 November, 2023;
originally announced November 2023.
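As a reminder of how such a criterion trades fit against complexity, the Bayesian information criterion mentioned in this abstract has the standard form below. The symbols $k$, $n$, and $\hat{L}$ are assumed notation, and the paper may parameterize its mixture models differently.

```latex
% k: number of free parameters of the candidate mixture model,
% n: number of pixel-intensity samples,
% \hat{L}: maximized likelihood of the model.
\[
  \mathrm{BIC} = k \ln n - 2 \ln \hat{L}.
\]
% The candidate with the smallest BIC is preferred; each extra mixture
% component raises k and must improve the likelihood enough to pay for it.
```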
-
Computer Simulation of Carbonization and Graphitization of Coal
Authors:
C. Ugwumadu,
R. Olson III,
N. L. Smith,
K. Nepal,
Y. Al-Majali,
J. Trembly,
D. A. Drabold
Abstract:
This study describes computer simulations of carbonization and graphite formation, including the effects of hydrogen, nitrogen, oxygen, and sulfur. We introduce a novel technique to simulate carbonization, "Simulation of Thermal Emission of Atoms and Molecules (STEAM)," designed to elucidate the removal of volatiles and density variations in carbonization residue. The investigation extensively analyzes the functional groups that endure through high-temperature carbonization and examines the graphitization processes in carbon-rich materials containing non-carbon "impurity elements". The physical, vibrational, and electronic attributes of impure amorphous graphite are analyzed, and the impact of nitrogen on electronic conduction is investigated, revealing its substitutional integration into the sp$^2$ layered network.
Submitted 30 August, 2023;
originally announced August 2023.
-
Coverage and Rate of Joint Communication and Parameter Estimation in Wireless Networks
Authors:
Nicholas R. Olson,
Jeffrey G. Andrews,
Robert W. Heath Jr
Abstract:
From an information theoretic perspective, joint communication and sensing (JCAS) represents a natural generalization of communication network functionality. However, it requires the re-evaluation of network performance from a multi-objective perspective. We develop a novel mathematical framework for characterizing the sensing and communication coverage probability and ergodic rate in JCAS networks. We employ a formulation of sensing parameter estimation based on mutual information to extend the notions of coverage probability and ergodic rate to the radar setting. We define sensing coverage probability as the probability that the rate of information extracted about the parameters of interest associated with a typical radar target exceeds some threshold, and sensing ergodic rate as the spatial average of the aforementioned rate of information. Using this framework, we analyze the downlink sensing and communication coverage and rate of a mmWave JCAS network employing a shared waveform, directional beamforming, and monostatic sensing. Leveraging tools from stochastic geometry, we derive upper and lower bounds for these quantities. We also develop several general technical results including: i) a generic method for obtaining closed form upper and lower bounds on the Laplace Transform of a shot noise process, ii) a new analog of Hölder's Inequality to the setting of harmonic means, and iii) a relation between the Laplace and Mellin Transforms of a non-negative random variable. We use the derived bounds to numerically investigate the performance of JCAS networks under varying base station and blockage density. Among several insights, our numerical analysis indicates that network densification improves sensing SINR performance -- in contrast to communications.
Submitted 15 January, 2024; v1 submitted 5 October, 2022;
originally announced October 2022.
-
A series approximation to the Kirchhoff integral for Gaussian and exponential roughness covariance functions
Authors:
Derek R. Olson
Abstract:
The Kirchhoff integral is a fundamental integral in scattering theory, appearing in both the Kirchhoff approximation and the small slope approximation. In this work, a functional Taylor series approximation to the Kirchhoff integral is presented, under the condition that the roughness covariance function follows either an exponential or Gaussian form--in both the one-dimensional and two-dimensional cases. Previous approximations to the Kirchhoff integral [Gragg et al. J. Acoust. Soc. Am. 2001, Drumheller and Gragg J. Acoust. Soc. Am., 2001] assumed that the outer scale of the roughness was very large compared to the wavelength, whereas the proposed method can treat arbitrary outer scales. Assuming an infinite outer scale implies that the root mean square (rms) roughness is infinite. The proposed method can efficiently treat surfaces with finite outer scale, and therefore finite rms height. This series is shown to converge independently of roughness or acoustic parameters, and converges to within roundoff error with a reasonable number of terms for a wide variety of dimensionless roughness parameters. The series converges quickly when the dimensionless rms height is small, and slowly when it is large.
Submitted 15 June, 2021;
originally announced June 2021.
-
Resolution dependence of rough surface scattering using a power law roughness spectrum
Authors:
Derek R. Olson,
Anthony P. Lyons
Abstract:
Contemporary high-resolution sonar systems use broadband pulses and long arrays to achieve high resolution. It is important to understand effects that high-resolution sonar systems might have on quantitative measures of the scattered field due to the seafloor. A quantity called the broadband scattering cross section is defined, appropriate for high-resolution measurements. The dependence of the broadband scattering cross section, $\sigma_{bb}$, and the scintillation index, $SI$, on resolution was investigated for one-dimensional rough surfaces with power-law spectra and backscattering geometries. Using integral equations and Fourier synthesis, no resolution dependence of $\sigma_{bb}$ was found. The incoherently-averaged frequency-domain scattering cross section has negligible bandwidth dependence. $SI$ increases as resolution increases, grazing angle decreases, and spectral strength increases. This trend is confirmed for center frequencies of 100 kHz and 10 kHz, as well as for power-law spectral exponents of 1.5, 2, and 2.5. The hypothesis that local tilting at the scale of the acoustic resolution is responsible for intensity fluctuations was examined using a representative model for the effect of slopes (inspired by the composite roughness approximation). It was found that slopes are responsible in part for the fluctuations, but other effects, such as multiple scattering and shadowing, may also play a role.
Submitted 10 June, 2021;
originally announced June 2021.
-
Scattering from layered seafloors: Comparisons between theory and integral equations
Authors:
Derek R. Olson,
Darrell Jackson
Abstract:
Acoustic scattering from layered seafloors depends on both the mean geoacoustic layering and the roughness properties of each layer. Several theoretical treatments of this environment exist, including the small roughness perturbation approximation, the Kirchhoff approximation, and three different versions of the small slope approximation. All of these models give different results for the scattering cross section and coherent reflection coefficient, and there is currently no way to distinguish which model is the most correct. In this work, an integral equation for scattering from a layered seafloor with rough interfaces is presented, and compared with the small roughness perturbation method and two of the small slope approximations. It is found that the most recent small slope approximation by Jackson and Olson is the most accurate when the root mean square (rms) roughness is large, and some models are in close agreement with each other when the rms roughness is small.
Submitted 25 September, 2020;
originally announced September 2020.
-
Scattering statistics of rock outcrops: Model-data comparisons and Bayesian inference using mixture distributions
Authors:
Derek R. Olson,
Anthony P. Lyons,
Douglas A. Abraham,
Torstein O. Sæbø
Abstract:
The probability density function of the acoustic field amplitude scattered by the seafloor was measured in a rocky environment off the coast of Norway using a synthetic aperture sonar system, and is reported here in terms of the probability of false alarm. Interpretation of the measurements focused on finding an appropriate class of statistical models (single versus two-component mixture models), and on appropriate models within these two classes. It was found that two-component mixture models performed better than single models. The two mixture models that performed the best (and had a basis in the physics of scattering) were a mixture between two K distributions, and a mixture between a Rayleigh and a generalized Pareto distribution. Bayes' theorem was used to estimate the probability density function of the mixture model parameters. It was found that the K-K mixture exhibits significant correlation between its parameters. The mixture between the Rayleigh and generalized Pareto distributions also had significant parameter correlation, and in addition contained multiple modes. We conclude that the mixture between two K distributions is the most applicable to this dataset.
Submitted 23 January, 2019;
originally announced January 2019.
-
Accounting for Skill in Trend, Variability, and Autocorrelation Facilitates Better Multi-Model Projections: Application to the AMOC and Temperature Time Series
Authors:
Roman Olson,
Soon-Il An,
Yanan Fan,
Jason P. Evans
Abstract:
We present a novel quasi-Bayesian method to weight multiple dynamical models by their skill at capturing both potentially non-linear trends and first-order autocorrelated variability of the underlying process, and to make weighted probabilistic projections. We validate the method using a suite of one-at-a-time cross-validation experiments involving Atlantic meridional overturning circulation (AMOC), its temperature-based index, as well as Korean summer mean maximum temperature. In these experiments the method tends to exhibit superior skill over a trend-only Bayesian model averaging weighting method in terms of weight assignment and probabilistic forecasts. Specifically, mean credible interval width, and mean absolute error of the projections tend to improve. We apply the method to a problem of projecting summer mean maximum temperature change over Korea by the end of the 21st century using a multi-model ensemble. Compared to the trend-only method, the new method appreciably sharpens the probability distribution function (pdf) and increases future most likely, median, and mean warming in Korea. The method is flexible, with a potential to improve forecasts in geosciences and other fields.
Submitted 17 April, 2019; v1 submitted 7 November, 2018;
originally announced November 2018.
-
Layered TPOT: Speeding up Tree-based Pipeline Optimization
Authors:
Pieter Gijsbers,
Joaquin Vanschoren,
Randal S. Olson
Abstract:
As the demand for machine learning increases, so does the demand for tools that make it easier to use. Automated machine learning (AutoML) tools have been developed to address this need, such as the Tree-Based Pipeline Optimization Tool (TPOT), which uses genetic programming to build optimal pipelines. We introduce Layered TPOT, a modification to TPOT which aims to create pipelines as good as those of the original, but in significantly less time. This approach evaluates candidate pipelines on increasingly large subsets of the data according to their fitness, using a modified evolutionary algorithm to allow for separate competition between pipelines trained on different sample sizes. Empirical evaluation shows that, on sufficiently large datasets, Layered TPOT indeed finds better models faster.
Submitted 12 March, 2018; v1 submitted 18 January, 2018;
originally announced January 2018.
-
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining
Authors:
Ryan J. Urbanowicz,
Randal S. Olson,
Peter Schmitt,
Melissa Meeker,
Jason H. Moore
Abstract:
Modern biomedical data mining requires feature selection methods that can (1) be applied to large scale feature spaces (e.g. 'omics' data), (2) function in noisy problems, (3) detect complex patterns of association (e.g. gene-gene interactions), (4) be flexibly adapted to various problem domains and data types (e.g. genetic variants, gene expression, and clinical data) and (5) be computationally tractable. To that end, this work examines a set of filter-style feature selection algorithms inspired by the 'Relief' algorithm, i.e. Relief-Based algorithms (RBAs). We implement and expand these RBAs in an open source framework called ReBATE (Relief-Based Algorithm Training Environment). We apply a comprehensive genetic simulation study comparing existing RBAs, a proposed RBA called MultiSURF, and other established feature selection methods, over a variety of problems. The results of this study (1) support the assertion that RBAs are particularly flexible, efficient, and powerful feature selection methods that differentiate relevant features having univariate, multivariate, epistatic, or heterogeneous associations, (2) confirm the efficacy of expansions for classification vs. regression, discrete vs. continuous features, missing data, multiple classes, or class imbalance, (3) identify previously unknown limitations of specific RBAs, and (4) suggest that while MultiSURF* performs best for explicitly identifying pure 2-way interactions, MultiSURF yields the most reliable feature selection performance across a wide range of problem types.
Submitted 2 April, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
-
Relief-Based Feature Selection: Introduction and Review
Authors:
Ryan J. Urbanowicz,
Melissa Meeker,
William LaCava,
Randal S. Olson,
Jason H. Moore
Abstract:
Feature selection plays a critical role in biomedical data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that have gained appeal by striking an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability.
Submitted 2 April, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
-
Considerations of automated machine learning in clinical metabolic profiling: Altered homocysteine plasma concentration associated with metformin exposure
Authors:
Alena Orlenko,
Jason H. Moore,
Patryk Orzechowski,
Randal S. Olson,
Junmei Cairns,
Pedro J. Caraballo,
Richard M. Weinshilboum,
Liewei Wang,
Matthew K. Breitenstein
Abstract:
With the maturation of metabolomics science and proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML in identifying and adjusting for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML with default parameters showed a potential lack of sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While considerations are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
Submitted 9 October, 2017;
originally announced October 2017.
-
Markov Brains: A Technical Introduction
Authors:
Arend Hintze,
Jeffrey A. Edlund,
Randal S. Olson,
David B. Knoester,
Jory Schossau,
Larissa Albantakis,
Ali Tehrani-Saleh,
Peter Kvam,
Leigh Sheneman,
Heather Goldsby,
Clifford Bohm,
Christoph Adami
Abstract:
Markov Brains are a class of evolvable artificial neural networks (ANN). They differ from conventional ANNs in many aspects, but the key difference is that instead of a layered architecture, with each node performing the same function, Markov Brains are networks built from individual computational components. These computational components interact with each other, receive inputs from sensors, and control motor outputs. The function of the computational components, their connections to each other, as well as connections to sensors and motors are all subject to evolutionary optimization. Here we describe in detail how a Markov Brain works, what techniques can be used to study them, and how they can be evolved.
Submitted 16 September, 2017;
originally announced September 2017.
-
Data-driven Advice for Applying Machine Learning to Bioinformatics Problems
Authors:
Randal S. Olson,
William La Cava,
Zairah Mustahsan,
Akshay Varik,
Jason H. Moore
Abstract:
As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.
Submitted 7 January, 2018; v1 submitted 8 August, 2017;
originally announced August 2017.
-
A System for Accessible Artificial Intelligence
Authors:
Randal S. Olson,
Moshe Sipper,
William La Cava,
Sharon Tartarone,
Steven Vitale,
Weixuan Fu,
Patryk Orzechowski,
Ryan J. Urbanowicz,
John H. Holmes,
Jason H. Moore
Abstract:
While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers nor the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects.
Submitted 10 August, 2017; v1 submitted 1 May, 2017;
originally announced May 2017.
-
Survey techniques, detection probabilities, and the relative abundance of the carnivore guild on the Apostle Islands (2014-2016)
Authors:
Maximilian L. Allen,
Bryn E. Evans,
Michael E. Wheeler,
Marcus A. Mueller,
Kenneth Pemble,
Erik R. Olson,
Julie Van Stappen,
Timothy R. Van Deelen
Abstract:
Carnivores are important components of ecosystems with wide-ranging effects on ecological communities. We studied the carnivore community in the Apostle Islands National Lakeshore (APIS), where the presence, distribution, and populations of carnivores were largely unknown. We developed a systematic method to deploy camera traps across a grid while targeting fine-scale features to maximize carnivore detection (Appendix 1), including systematic methods for organizing and tagging the photo data (Appendix 2). We deployed 88 cameras on 13 islands from 2014-2016. We collected 92,694 photographs across 18,721 trap nights, including 3,591 wildlife events and 1,070 carnivore events. We had a mean of 6.6 cameras per island (range 2-30), and our camera density averaged 1.23 (range 0.74-3.08) cameras/km². We detected 27 species and 10 terrestrial carnivores, including surprising detections of American martens (Martes americana) and gray wolves (Canis lupus). The mean richness of carnivores on an island was 3.23 (range 0-10). The best single variable to explain carnivore richness on the Apostle Islands was island size, while the best model was island size (positive correlation) and distance from mainland (negative correlation) (R² = 0.92). Relative abundances for carnivores ranged from a low of 0.01 for weasels (Mustela spp.) to a high of 2.64 for black bears (Ursus americanus), and the relative abundance of a species was significantly correlated with the number of islands on which they were found. Carnivore occupancy ranged from lows of 0.09 for gray wolves and 0.11 for weasels to a high of 0.82 for black bears. Fuller understanding of APIS ecology will require on-going monitoring of carnivores to evaluate temporal dynamics as well as related ecological evaluations (e.g. small mammal dynamics, plant community dynamics) to understand trophic effects.
Submitted 31 January, 2018; v1 submitted 30 March, 2017;
originally announced March 2017.
-
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
Authors:
Randal S. Olson,
William La Cava,
Patryk Orzechowski,
Ryan J. Urbanowicz,
Jason H. Moore
Abstract:
The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
Submitted 1 March, 2017;
originally announced March 2017.
-
Toward the automated analysis of complex diseases in genome-wide association studies using genetic programming
Authors:
Andrew Sohn,
Randal S. Olson,
Jason H. Moore
Abstract:
Machine learning has been gaining traction in recent years to meet the demand for tools that can efficiently analyze and make sense of the ever-growing databases of biomedical data in health care systems around the world. However, effectively using machine learning methods requires considerable domain expertise, which can be a barrier of entry for bioinformaticians new to computational data science methods. Therefore, off-the-shelf tools that make machine learning more accessible can prove invaluable for bioinformaticians. To this end, we have developed an open source pipeline optimization tool (TPOT-MDR) that uses genetic programming to automatically design machine learning pipelines for bioinformatics studies. In TPOT-MDR, we implement Multifactor Dimensionality Reduction (MDR) as a feature construction method for modeling higher-order feature interactions, and combine it with a new expert knowledge-guided feature selector for large biomedical data sets. We demonstrate TPOT-MDR's capabilities using a combination of simulated and real world data sets from human genetics and find that TPOT-MDR significantly outperforms modern machine learning methods such as logistic regression and eXtreme Gradient Boosting (XGBoost). We further analyze the best pipeline discovered by TPOT-MDR for a real world problem and highlight TPOT-MDR's ability to produce a high-accuracy solution that is also easily interpretable.
Submitted 6 February, 2017;
originally announced February 2017.
-
The first cryogenic DT layered, beryllium capsule implosion at the National Ignition Facility
Authors:
D. C. Wilson,
J. L. Kline,
S. A. Yi,
A. N. Simakov,
G. A. Kyrala,
R. E. Olson,
T. S. Perry,
F. E. Merrill,
S. Batha,
A. B. Zylstra,
D. A. Callahan,
W. Cassata,
E. L. Dewald,
S. W. Haan,
D. E. Hinkel,
O. A. Hurricane,
N. Izumi,
T. Ma,
A. G. MacPhee,
J. L. Milovich,
J. E. Ralph,
J. R. Rygg,
M. B. Schneider,
S. Sepke,
D. J. Strozzi
, et al. (4 additional authors not shown)
Abstract:
NIF experiments with Be capsules have followed a path of the highly successful "high-foot" CH capsules. Several keyhole and ConA targets preceded a DT layered shot. In addition to backscatter subtraction, laser drive multipliers were needed to match observed X-ray drives. Those for the picket (0.95), trough (1.0) and second pulse (0.80) were determined by VISAR measurements. The time dependence of the Dante total x-ray flux and its fraction > 1.8 keV reflect the time dependence of the multipliers. A two step drive multiplier for the main pulse can match implosion times, but Dante measurements suggest the drive multiplier must increase late in time. With a single set of time dependent, multi-level multipliers the Dante data are well matched. These same third pulse drive multipliers also match the implosion times and Dante signals for two CH capsule DT. One discrepancy in the calculations is the X-ray flux in the picket. Calculations over-estimate the flux > 1.8 keV by a factor of ~100, while getting the total flux correctly. These harder X-rays cause an expansion of the Be/fuel interface of 2-3 km/s before the arrival of the first shock. VISAR measurements show only 0.2 to 0.3 km/s. The X-ray drive on the DT Be capsule was further degraded by a random decrease of 9% in the total picket flux. This small change caused the capsule fuel to change from an adiabat of 1.8 to 2.3 by mistiming of the first and second shocks. With this shock tuning and adjustments to the calculation, the first NIF Be capsule implosion achieved 29% of calculated yield, comparable to the CH DT capsules of 68% and 21%. Inclusion of a large M1 asymmetry in the DT ice layer and mixing from instability growth may help explain this final degradation. In summary, when driven similarly, the Be capsules performed like CH capsules. Performance degradation for both seems to be dominated by drive and capsule asymmetries.
Submitted 31 January, 2017;
originally announced January 2017.
-
Identifying and Harnessing the Building Blocks of Machine Learning Pipelines for Sensible Initialization of a Data Science Automation Tool
Authors:
Randal S. Olson,
Jason H. Moore
Abstract:
As data science continues to grow in popularity, there will be an increasing need to make data science tools more scalable, flexible, and accessible. In particular, automated machine learning (AutoML) systems seek to automate the process of designing and optimizing machine learning pipelines. In this chapter, we present a genetic programming-based AutoML system called TPOT that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification problem. Further, we analyze a large database of pipelines that were previously used to solve various supervised classification problems and identify 100 short series of machine learning operations that appear the most frequently, which we call the building blocks of machine learning pipelines. We harness these building blocks to initialize TPOT with promising solutions, and find that this sensible initialization method significantly improves TPOT's performance on one benchmark at no cost of significantly degrading performance on the others. Thus, sensible initialization with machine learning pipeline building blocks shows promise for GP-based AutoML systems, and should be further refined in future work.
Submitted 29 July, 2016;
originally announced July 2016.
-
Evolution of active categorical image classification via saccadic eye movement
Authors:
Randal S. Olson,
Jason H. Moore,
Christoph Adami
Abstract:
Pattern recognition and classification is a central concern for modern information processing systems. In particular, one key challenge to image and video classification has been that the computational cost of image processing scales linearly with the number of pixels in the image or video. Here we present an intelligent machine (the "active categorical classifier," or ACC) that is inspired by the saccadic movements of the eye, and is capable of classifying images by selectively scanning only a portion of the image. We harness evolutionary computation to optimize the ACC on the MNIST hand-written digit classification task, and provide a proof-of-concept that the ACC works on noisy multi-class data. We further analyze the ACC and demonstrate its ability to classify images after viewing only a fraction of the pixels, and provide insight on future research paths to further improve upon the ACC presented here.
Submitted 16 June, 2016; v1 submitted 27 March, 2016;
originally announced March 2016.
-
Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
Authors:
Randal S. Olson,
Nathan Bartley,
Ryan J. Urbanowicz,
Jason H. Moore
Abstract:
As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input nor prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.
Submitted 20 March, 2016;
originally announced March 2016.
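The Pareto optimization mentioned in this abstract, which trades classification accuracy against pipeline complexity, can be sketched as a non-dominated filter. This is a minimal illustration and not TPOT's implementation: the function name `pareto_front`, the tuple layout, and the candidate pipelines with their scores are all hypothetical.

```python
def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate.

    Each candidate is a tuple (name, accuracy, n_operators):
    accuracy is maximized, n_operators (pipeline size) is minimized.
    """

    def dominates(a, b):
        # a dominates b if it is no worse on both objectives
        # and strictly better on at least one.
        no_worse = a[1] >= b[1] and a[2] <= b[2]
        strictly_better = a[1] > b[1] or a[2] < b[2]
        return no_worse and strictly_better

    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical pipelines: (name, accuracy, number of operators).
pipelines = [
    ("A", 0.90, 5),
    ("B", 0.90, 2),
    ("C", 0.85, 1),
    ("D", 0.80, 3),
]
print(pareto_front(pipelines))
```

Keeping only non-dominated candidates means a larger pipeline survives only if it is more accurate than every smaller one, which matches the abstract's goal of compact pipelines without sacrificing accuracy.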
-
Exploring the coevolution of predator and prey morphology and behavior
Authors:
Randal S. Olson,
Arend Hintze,
Fred C. Dyer,
Jason H. Moore,
Christoph Adami
Abstract:
A common idiom in biology education states, "Eyes in the front, the animal hunts. Eyes on the side, the animal hides." In this paper, we explore one possible explanation for why predators tend to have forward-facing, high-acuity visual systems. We do so using an agent-based computational model of evolution, where predators and prey interact and adapt their behavior and morphology to one another over successive generations of evolution. In this model, we observe a coevolutionary cycle between prey swarming behavior and the predator's visual system, where the predator and prey continually adapt their visual system and behavior, respectively, over evolutionary time in reaction to one another due to the well-known "predator confusion effect." Furthermore, we provide evidence that the predator visual system is what drives this coevolutionary cycle, and suggest that the cycle could be closed if the predator evolves a hybrid visual system capable of narrow, high-acuity vision for tracking prey as well as broad, coarse vision for prey discovery. Thus, the conflicting demands imposed on a predator's visual system by the predator confusion effect could have led to the evolution of complex eyes in many predators.
Submitted 28 February, 2016;
originally announced February 2016.
-
Automating biomedical data science through tree-based pipeline optimization
Authors:
Randal S. Olson,
Ryan J. Urbanowicz,
Peter C. Andrews,
Nicole A. Lavender,
La Creis Kidd,
Jason H. Moore
Abstract:
Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators---such as synthetic feature constructors---that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.
Submitted 28 January, 2016;
originally announced January 2016.
-
Measurements of high-frequency acoustic scattering from glacially-eroded rock outcrops
Authors:
Derek R. Olson,
Anthony P. Lyons,
Torstein Sæbø
Abstract:
Measurements of acoustic backscattering from glacially-eroded rock outcrops were made off the coast of Sandefjord, Norway using a high-frequency synthetic aperture sonar (SAS) system. A method by which scattering strength can be estimated from data collected by a SAS system is detailed, as well as a method to estimate an effective calibration parameter for the system. Scattering strength measurements from very smooth areas of the rock outcrops agree with predictions from both the small-slope approximation and perturbation theory, and range between -33 and -26 dB at 20$^\circ$ grazing angle. Scattering strength measurements from very rough areas of the rock outcrops agree with the sine-squared shape of the empirical Lambertian model and fall between -30 and -20 dB at 20$^\circ$ grazing angle. Both perturbation theory and the small-slope approximation are expected to be inaccurate for the very rough area, and overestimate scattering strength by 8 dB or more for all measurements of very rough surfaces. Supporting characterization of the environment was performed in the form of geoacoustic and roughness parameter estimates.
Submitted 13 April, 2016; v1 submitted 22 January, 2016;
originally announced January 2016.
-
Responses to remixing on a social media sharing website
Authors:
Benjamin Mako Hill,
Andrés Monroy-Hernández,
Kristina R. Olson
Abstract:
In this paper we describe the ways participants of the Scratch online community, primarily young people, engage in remixing of each other's shared animations, games, and interactive projects. In particular, we try to answer the following questions: How do users respond to remixing in a social media environment where remixing is explicitly permitted? What qualities of originators and their projects correspond to a higher likelihood of plagiarism accusations? Is there a connection between plagiarism complaints and similarities between a remix and the work it is based on? Our findings indicate that users have a very wide range of reactions to remixing and that as many users react positively as accuse remixers of plagiarism. We test several hypotheses that might explain the high number of plagiarism accusations related to original project complexity, cumulative remixing, originators' integration into remixing practice, and remixee-remixer project similarity, and find support for the first and last explanations.
Submitted 5 July, 2015;
originally announced July 2015.
-
Exploring the evolution of a trade-off between vigilance and foraging in group-living organisms
Authors:
Randal S. Olson,
Patrick B. Haley,
Fred C. Dyer,
Christoph Adami
Abstract:
Despite the fact that grouping behavior has been actively studied for over a century, the relative importance of the numerous proposed fitness benefits of grouping remains unclear. We use a digital model of evolving prey under simulated predation to directly explore the evolution of gregarious foraging behavior according to one such benefit, the "many eyes" hypothesis. According to this hypothesis, collective vigilance allows prey in large groups to detect predators more efficiently by making alarm signals or behavioral cues to each other, thereby allowing individuals within the group to spend more time foraging. Here, we find that collective vigilance is sufficient to select for gregarious foraging behavior as long as there is not a direct cost for grouping (e.g., competition for limited food resources), even when controlling for confounding factors such as the dilution effect. Further, we explore the role of the genetic relatedness and reproductive strategy of the prey, and find that highly related groups of prey with a semelparous reproductive strategy are the most likely to evolve gregarious foraging behavior mediated by the benefit of vigilance. These findings, combined with earlier studies with evolving digital organisms, further sharpen our understanding of the factors favoring grouping behavior.
Submitted 8 August, 2014;
originally announced August 2014.
-
Navigating the massive world of reddit: Using backbone networks to map user interests in social media
Authors:
Randal S. Olson,
Zachary P. Neal
Abstract:
In the massive online worlds of social media, users frequently rely on organizing themselves around specific topics of interest to find and engage with like-minded people. However, navigating these massive worlds and finding topics of specific interest often proves difficult because the worlds are mostly organized haphazardly, leaving users to find relevant interests by word of mouth or using a basic search feature. Here, we report on a method using the backbone of a network to create a map of the primary topics of interest in any social network. To demonstrate the method, we build an interest map for the social news web site reddit and show how such a map could be used to navigate a social media world. Moreover, we analyze the network properties of the reddit social network and find that it has a scale-free, small-world, and modular community structure, much like other online social networks such as Facebook and Twitter. We suggest that the integration of interest maps into popular social media platforms will assist users in organizing themselves into more specific interest groups, which will help alleviate the overcrowding effect often observed in large online communities.
Submitted 11 December, 2013;
originally announced December 2013.
-
Risk aversion as an evolutionary adaptation
Authors:
Arend Hintze,
Randal S. Olson,
Christoph Adami,
Ralph Hertwig
Abstract:
Risk aversion is a common behavior universal to humans and animals alike. Economists have traditionally defined risk preferences by the curvature of the utility function. Psychologists and behavioral economists also make use of concepts such as loss aversion and probability weighting to model risk aversion. Neurophysiological evidence suggests that loss aversion has its origins in relatively ancient neural circuitries (e.g., ventral striatum). Could there thus be an evolutionary origin to risk avoidance? We study this question by evolving strategies that adapt to play the equivalent mean payoff gamble. We hypothesize that risk aversion in the equivalent mean payoff gamble is beneficial as an adaptation to living in small groups, and find that a preference for risk-averse strategies only evolves in small populations of fewer than 1,000 individuals, while agents exhibit no such strategy preference in larger populations. Further, we discover that risk aversion can also evolve in larger populations, but only when the population is segmented into small groups of around 150 individuals. Finally, we observe that risk aversion only evolves when the gamble is a rare event that has a large impact on the individual's fitness. These findings align with earlier reports that humans lived in small groups for a large portion of their evolutionary history. As such, we suggest that rare, high-risk, high-payoff events such as mating and mate competition could have driven the evolution of risk-averse behavior in humans living in small groups.
Submitted 23 October, 2013;
originally announced October 2013.
-
Evolution of swarming behavior is shaped by how predators attack
Authors:
Randal S. Olson,
David B. Knoester,
Christoph Adami
Abstract:
Animal grouping behaviors have been widely studied due to their implications for understanding social intelligence, collective cognition, and potential applications in engineering, artificial intelligence, and robotics. An important biological aspect of these studies is discerning which selection pressures favor the evolution of grouping behavior. In the past decade, researchers have begun using evolutionary computation to study the evolutionary effects of these selection pressures in predator-prey models. The selfish herd hypothesis states that concentrated groups arise because prey selfishly attempt to place their conspecifics between themselves and the predator, thus causing an endless cycle of movement toward the center of the group. Using an evolutionary model of a predator-prey system, we show that how predators attack is critical to the evolution of the selfish herd. Following this discovery, we show that density-dependent predation provides an abstraction of Hamilton's original formulation of "domains of danger." Finally, we verify that density-dependent predation provides a sufficient selective advantage for prey to evolve the selfish herd in response to predation by coevolving predators. Thus, our work corroborates Hamilton's selfish herd hypothesis in a digital evolutionary model, refines the assumptions of the selfish herd hypothesis, and generalizes the domain of danger concept to density-dependent predation.
Submitted 24 November, 2015; v1 submitted 22 October, 2013;
originally announced October 2013.
-
A composite likelihood approach to computer model calibration using high-dimensional spatial data
Authors:
Won Chang,
Murali Haran,
Roman Olson,
Klaus Keller
Abstract:
Computer models are used to model complex processes in various disciplines. Often, a key source of uncertainty in the behavior of complex computer models is uncertainty due to unknown model input parameters. Statistical computer model calibration is the process of inferring model parameter values, along with associated uncertainties, from observations of the physical process and from model outputs at various parameter settings. Observations and model outputs are often in the form of high-dimensional spatial fields, especially in the environmental sciences. Sound statistical inference may be computationally challenging in such situations. Here we introduce a composite likelihood-based approach to perform computer model calibration with high-dimensional spatial data. While composite likelihood has been studied extensively in the context of spatial statistics, computer model calibration using composite likelihood poses several new challenges. We propose a computationally efficient approach for Bayesian computer model calibration using composite likelihood. We also develop a methodology based on asymptotic theory for adjusting the composite likelihood posterior distribution so that it accurately represents posterior uncertainties. We study the application of our new approach in the context of calibration for a climate model.
Submitted 31 July, 2013;
originally announced August 2013.
-
Fast dimension-reduced climate model calibration and the effect of data aggregation
Authors:
Won Chang,
Murali Haran,
Roman Olson,
Klaus Keller
Abstract:
How will the climate system respond to anthropogenic forcings? One approach to this question relies on climate model projections. Current climate projections are considerably uncertain. Characterizing and, if possible, reducing this uncertainty is an area of ongoing research. We consider the problem of making projections of the North Atlantic meridional overturning circulation (AMOC). Uncertainties about climate model parameters play a key role in uncertainties in AMOC projections. When the observational data and the climate model output are high-dimensional spatial data sets, the data are typically aggregated due to computational constraints. The effects of aggregation are unclear because statistically rigorous approaches for model parameter inference have been infeasible for high-resolution data. Here we develop a flexible and computationally efficient approach using principal components and basis expansions to study the effect of spatial data aggregation on parametric and projection uncertainties. Our Bayesian reduced-dimensional calibration approach allows us to study the effect of complicated error structures and data-model discrepancies on our ability to learn about climate model parameters from high-dimensional data. Considering high-dimensional spatial observations reduces the effect of deep uncertainty associated with prior specifications for the data-model discrepancy. Also, using the unaggregated data results in sharper projections based on our climate model. Our computationally efficient approach may be widely applicable to a variety of high-dimensional computer model calibration problems.
Submitted 31 July, 2014; v1 submitted 6 March, 2013;
originally announced March 2013.
-
The Comets of Caroline Herschel (1750-1848), Sleuth of the Skies at Slough
Authors:
Roberta J. M. Olson,
Jay M. Pasachoff
Abstract:
In this paper, we discuss the work on comets of Caroline Herschel, the first female comet-hunter. After leaving Bath for the environs of Windsor Castle and eventually Slough, she discovered at least eight comets, five of which were reported in the Philosophical Transactions of the Royal Society. We consider her public image, astronomers' perceptions of her contributions, and the style of her astronomical drawings that changed with the technological developments in astronomical illustration.
Submitted 4 December, 2012;
originally announced December 2012.
-
Predator confusion is sufficient to evolve swarming behavior
Authors:
Randal S. Olson,
Arend Hintze,
Fred C. Dyer,
David B. Knoester,
Christoph Adami
Abstract:
Swarming behaviors in animals have been extensively studied due to their implications for the evolution of cooperation, social cognition, and predator-prey dynamics. An important goal of these studies is discerning which evolutionary pressures favor the formation of swarms. One hypothesis is that swarms arise because the presence of multiple moving prey in swarms causes confusion for attacking predators, but it remains unclear how important this selective force is. Using an evolutionary model of a predator-prey system, we show that predator confusion provides a sufficient selection pressure to evolve swarming behavior in prey. Furthermore, we demonstrate that the evolutionary effect of predator confusion on prey could in turn exert pressure on the structure of the predator's visual field, favoring the frontally oriented, high-resolution visual systems commonly observed in predators that feed on swarming animals. Finally, we provide evidence that when prey evolve swarming in response to predator confusion, there is a change in the shape of the functional response curve describing the predator's consumption rate as prey density increases. Thus, we show that a relatively simple perceptual constraint--predator confusion--could have pervasive evolutionary effects on prey behavior, predator sensory mechanisms, and the ecological interactions between predators and prey.
Submitted 3 April, 2013; v1 submitted 14 September, 2012;
originally announced September 2012.
-
Charge exchange and ionisation in N$^{7+}$, N$^{6+}$, C$^{6+}$ - H($n=1, 2$) collisions studied systematically by theoretical approaches
Authors:
Katharin Igenbergs,
Josef Schweinzer,
Alexander Veiter,
Lukas Perneczky,
Edwin Frühwirth,
Markus Wallerberger,
Ronald E. Olson,
Friedrich Aumayr
Abstract:
The introduction of gases like nitrogen or neon for cooling the edge region of magnetically confined fusion plasmas has triggered a renewed interest in state selective cross sections necessary for plasma diagnostics by means of charge exchange recombination spectroscopy. To improve the quality of spectroscopic data analysis, charge exchange and ionisation cross sections for N$^{7+}$ + H($n=1,2$) have been calculated using two different theoretical approaches, namely the atomic-orbital close-coupling method and the classical trajectory Monte Carlo method. Total and state resolved charge exchange cross sections are analysed in detail.
In the second part, we compare two collision systems involving equally charged ions, C$^{6+}$ and N$^{6+}$ on atomic hydrogen. The analysis of the data leads to the conclusion that deviations between these two impurity ions are practically negligible. This finding is very helpful when calculating cross sections for collision systems with heavier, not completely stripped impurity ions.
Submitted 15 December, 2011;
originally announced December 2011.
-
Spectrum of spontaneous emission into the mode of a cavity QED system
Authors:
M. L. Terraciano,
R. Olson,
D. L. Freimund,
L. A. Orozco,
P. R. Rice
Abstract:
We study the probe spectrum of light generated by spontaneous emission into the mode of a cavity QED system. The probe spectrum has a maximum on-resonance when the number of inverted atoms for an input drive is maximal. For a larger number of atoms N, the maximum splits and develops into a doublet, but its frequencies are different from those of the so-called vacuum Rabi splitting.
Submitted 10 January, 2006;
originally announced January 2006.
-
Self-referenced prism deflection measurement schemes with microradian precision
Authors:
Rebecca Olson,
Justin Paul,
Scott Bergeson,
Dallin S. Durfee
Abstract:
We have demonstrated several inexpensive methods which can be used to measure the deflection angles of prisms with microradian precision. The methods are self-referenced, using various reversals to achieve absolute measurements without the need of a reference prism or any expensive precision components other than the prisms under test. These techniques are based on laser interferometry and have been used in our lab to characterize parallel-plate beamsplitters, penta prisms, right angle prisms, and corner cube reflectors using only components typically available in an optics lab.
Submitted 30 September, 2005;
originally announced September 2005.
-
Increasing the output of a Littman-type laser by use of an intracavity Faraday rotator
Authors:
Rebecca Merrill,
Rebecca Olson,
Scott Bergeson,
Dallin S. Durfee
Abstract:
We present a new method of external-cavity diode laser grating stabilization which combines the high output power of the Littrow design with the fixed output pointing of the Littman-Metcalf design. Our new approach utilizes a Faraday-effect optical isolator inside the external cavity. Experimental testing and a model which describes the tuning range and optimal tuning parameters of the laser are described. Preliminary testing of this design has resulted in a short-term linewidth of 360 kHz and a side-mode suppression of 37 dB. The laser tunes mode-hop free over 7 GHz and we predict that much larger tuning ranges are possible.
Submitted 30 September, 2005;
originally announced September 2005.
-
Charge Exchange Spectra of Hydrogenic and He-like Iron
Authors:
B. J. Wargelin,
P. Beiersdorfer,
P. A. Neill,
R. E. Olson,
J. H. Scofield
Abstract:
We present H-like Fe XXVI and He-like Fe XXV charge-exchange spectra resulting from collisions of highly charged iron with N2 gas at an energy of 10 eV/amu in an electron beam ion trap. Although individual high-n emission lines are not resolved in our measurements, we observe that the most likely level for Fe25+ --> Fe24+ electron capture is n~9, in line with expectations, while the most likely value for Fe26+ --> Fe25+ charge exchange is significantly higher. In the Fe XXV spectrum, the K-alpha emission feature dominates, whether produced via charge exchange or collisional excitation. The K-alpha centroid is lower in energy for the former case than the latter (6666 versus 6685 eV, respectively), as expected because of the strong enhancement of emission from the forbidden and intercombination lines, relative to the resonance line, in charge-exchange spectra. In contrast, the Fe XXVI high-n Lyman lines have a summed intensity greater than that of Ly-alpha, and are substantially stronger than predicted from theoretical calculations of charge exchange with atomic H. We conclude that the angular momentum distribution resulting from electron capture using a multi-electron target gas is significantly different from that obtained with H, resulting in the observed high-n enhancement. A discussion is presented of the relevance of our results to studies of diffuse Fe emission in the Galactic Center and Galactic Ridge, particularly with ASTRO-E2/Suzaku.
Submitted 2 August, 2005;
originally announced August 2005.