-
BenchQC -- Scalable and modular benchmarking of industrial quantum computing applications
Authors:
Florian Geissler,
Eric Stopfer,
Christian Ufrecht,
Nico Meyer,
Daniel D. Scherer,
Friedrich Wagner,
Johannes M. Oberreuter,
Zao Chen,
Alessandro Farace,
Daria Gutina,
Ulrich Schwenk,
Kimberly Lange,
Vanessa Junk,
Thomas Husslein,
Marvin Erdmann,
Florian Kiwit,
Benjamin Decker,
Greshma Shaji,
Etienne Granet,
Henrik Dreyer,
Theodora-Augustina Dragan,
Jeanette Miriam Lorenz
Abstract:
We present BenchQC, a research project funded by the state of Bavaria, which promotes an application-centric perspective for benchmarking real-world quantum applications. Diverse use cases from industry consortium members are the starting point of a benchmarking workflow that builds on the open-source platform QUARK, encompassing the full quantum software stack from the hardware provider interface to the application layer. By identifying and evaluating key metrics across the entire pipeline, we aim to uncover meaningful trends, provide systematic guidance on quantum utility, and distinguish promising research directions from less viable approaches. Ultimately, this initiative contributes to the broader effort of establishing reliable benchmarking standards that drive the transition from experimental demonstrations to practical quantum advantage.
Submitted 15 April, 2025;
originally announced April 2025.
-
ttta: Tools for Temporal Text Analysis
Authors:
Kai-Robin Lange,
Niklas Benner,
Lars Grönberg,
Aymane Hachcham,
Imene Kolli,
Jonas Rieger,
Carsten Jentsch
Abstract:
Text data is inherently temporal. The meaning of words and phrases changes over time, and the context in which they are used is constantly evolving. This is not just true for social media data, where the language used is rapidly influenced by current events, memes and trends, but also for journalistic, economic or political text data. Most NLP techniques, however, consider the corpus at hand to be homogeneous with regard to time. This is a simplification that can lead to biased results, as the meaning of words and phrases can change over time. For instance, running a classic Latent Dirichlet Allocation on a corpus that spans several years is not enough to capture changes in the topics over time, but only portrays an "average" topic distribution over the whole time span. Researchers have developed a number of tools for analyzing text data over time. However, these tools are often scattered across different packages and libraries, making it difficult for researchers to use them in a consistent and reproducible way. The ttta package is intended to serve as a collection of tools for analyzing text data over time.
Submitted 4 March, 2025;
originally announced March 2025.
-
Tactics for Improving Least Squares Estimation
Authors:
Qiang Heng,
Hua Zhou,
Kenneth Lange
Abstract:
This paper deals with tactics for fast computation in least squares regression in high dimensions. These tactics include: (a) the majorization-minimization (MM) principle, (b) smoothing by Moreau envelopes, and (c) the proximal distance principle for constrained estimation. In iteratively reweighted least squares, the MM principle can create a surrogate function that trades case weights for adjusted responses. Reduction to ordinary least squares then permits the reuse of the Gram matrix and its Cholesky decomposition across iterations. This tactic is pertinent to estimation in L2E regression and generalized linear models. For problems such as quantile regression, non-smooth terms of an objective function can be replaced by their Moreau envelope approximations and majorized by spherical quadratics. Finally, penalized regression with distance-to-set penalties also benefits from this perspective. Our numerical experiments validate the speed and utility of deweighting and Moreau envelope approximations. Julia software implementing these experiments is available on our web page.
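As a brief illustration of the Moreau envelope smoothing mentioned in this abstract, the following uses the standard textbook definition rather than the paper's own notation; the symbols $f$, $\mu$, and $M_{\mu f}$ are assumptions introduced here. For the absolute value, the non-smooth building block of the quantile-regression check loss, the envelope reduces to the Huber function:

```latex
\begin{align*}
M_{\mu f}(x) &= \min_{y} \left[ f(y) + \frac{1}{2\mu}(x - y)^2 \right], \qquad \mu > 0, \\
f(y) = |y| \quad\Longrightarrow\quad M_{\mu f}(x) &=
\begin{cases}
\dfrac{x^2}{2\mu}, & |x| \le \mu, \\[6pt]
|x| - \dfrac{\mu}{2}, & |x| > \mu.
\end{cases}
\end{align*}
```

For convex $f$ the envelope has a $1/\mu$-Lipschitz gradient, so it admits a quadratic upper bound with curvature $1/\mu$ at any anchor point, which is one way to read the abstract's "majorized by spherical quadratics".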
Submitted 5 January, 2025;
originally announced January 2025.
-
Zeitenwenden: Detecting changes in the German political discourse
Authors:
Kai-Robin Lange,
Jonas Rieger,
Niklas Benner,
Carsten Jentsch
Abstract:
From a monarchy to a democracy, to a dictatorship and back to a democracy -- the German political landscape has been constantly changing ever since the first German national state was formed in 1871. After World War II, the Federal Republic of Germany was formed in 1949. Since then, every plenary session of the German Bundestag has been logged, and the records have been digitized over the course of the last few years. We analyze these texts using a time series variant of the topic model LDA to investigate which events had a lasting effect on the political discourse and how the political topics changed over time. This allows us to detect changes in word frequency (and thus key discussion points) in political discourse.
Submitted 23 October, 2024;
originally announced October 2024.
-
SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments
Authors:
Kai-Robin Lange,
Carsten Jentsch
Abstract:
The application of natural language processing to political texts and speeches has become increasingly relevant in the political sciences because it makes it possible to analyze large text corpora that cannot be read by a single person. But such text corpora often lack critical meta information, detailing for instance the party, age or constituency of the speaker, that can be used to provide an analysis tailored to more fine-grained research questions. To enable researchers to answer such questions with quantitative approaches such as natural language processing, we provide the SpeakGer data set, consisting of German parliament debates from all 16 federal states of Germany as well as the German Bundestag from 1947 to 2023, split into a total of 10,806,105 speeches. This data set includes rich meta data in the form of information on reactions from the audience towards the speech as well as information about the speaker's party, their age, their constituency and their party's political alignment, which enables a deeper analysis. We further provide three exploratory analyses, detailing topic shares of different parties throughout time, a descriptive analysis of the development of the age of an average speaker as well as a sentiment analysis of speeches of different parties with regard to the COVID-19 pandemic.
Submitted 23 October, 2024;
originally announced October 2024.
-
Machine Learning in Space: Surveying the Robustness of on-board ML models to Radiation
Authors:
Kevin Lange,
Federico Fontana,
Francesco Rossi,
Mattia Varile,
Giovanni Apruzzese
Abstract:
Modern spacecraft are increasingly relying on machine learning (ML). However, physical equipment in space is subject to various natural hazards, such as radiation, which may inhibit the correct operation of computing devices. Despite plenty of evidence showing the damage that naturally-induced faults can cause to ML-related hardware, we observe that the effects of radiation on ML models for space applications are not well-studied. This is a problem: without understanding how ML models are affected by these natural phenomena, it is uncertain "where to start from" to develop radiation-tolerant ML software. As ML researchers, we attempt to tackle this dilemma. By partnering up with space-industry practitioners specialized in ML, we perform a reflective analysis of the state of the art. We provide factual evidence that prior work did not thoroughly examine the impact of natural hazards on ML models meant for spacecraft. Then, through a "negative result", we show that some existing open-source technologies can hardly be used by researchers to study the effects of radiation for some applications of ML in satellites. As a constructive step forward, we perform simple experiments showcasing how to leverage current frameworks to assess the robustness of practical ML models for cloud detection against radiation-induced faults. Our evaluation reveals that not all faults are as devastating as claimed by some prior work. By publicly releasing our resources, we provide a foothold -- usable by researchers without access to spacecraft -- for spearheading development of space-tolerant ML models.
Submitted 29 May, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
Resistive switching acceleration induced by thermal confinement
Authors:
Alexandros Sarantopoulos,
Kristof Lange,
Francisco Rivadulla,
Stephan Menzel,
Regina Dittmann
Abstract:
Enhancing the switching speed of oxide-based memristive devices at a low voltage level is crucial for their use as non-volatile memory and their integration into emerging computing paradigms such as neuromorphic computing. Efforts to accelerate the switching speed often result in an energy tradeoff, leading to an increase of the minimum working voltage. In our study, we present an innovative solution: the introduction of a low thermal conductivity layer placed within the active electrode, which impedes the dissipation of heat generated during the switching process. The result is a notable acceleration in the switching speed of the memristive model system SrTiO$_{3}$ by a remarkable factor of 10$^{3}$, while preserving the integrity of the switching layer and the interfaces with the electrodes, rendering it adaptable to various filamentary memristive systems. The incorporation of HfO$_{2}$ or TaO$_{x}$ as heat-blocking layers not only streamlines the fabrication process, but also ensures compatibility with complementary metal-oxide-semiconductor technology.
Submitted 12 February, 2024;
originally announced February 2024.
-
A Stability Framework for Parameter Selection in the Minimum Covariance Determinant Problem
Authors:
Qiang Heng,
Hui Shen,
Kenneth Lange
Abstract:
The Minimum Covariance Determinant (MCD) method is a widely adopted tool for robust estimation and outlier detection. In this paper, we introduce MCD model selection based on the notion of stability. Our best subset method leverages prior best practices such as statistical depths for initialization and concentration steps for subset refinement. Our contribution lies in constructing a bootstrap procedure to estimate the instability of the best subset algorithm. The instability path offers insights into a dataset's inlier/outlier structure and facilitates suitable choice of the subset size. We rigorously benchmark the proposed framework against existing MCD variants and illustrate its practical utility on several real-world datasets.
Submitted 15 April, 2025; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Statistics of Turbulence in the Solar Wind. I. What is the Reynolds Number of the Solar Wind?
Authors:
Daniel Wrench,
Tulasi N. Parashar,
Sean Oughton,
Kevin de Lange,
Marcus Frean
Abstract:
The Reynolds number, Re, is an important quantity for describing a turbulent flow. It tells us about the bandwidth over which energy can cascade from large scales to smaller ones, prior to the onset of dissipation. However, calculating it for nearly collisionless plasmas like the solar wind is challenging. Previous studies have used "effective" Reynolds number formulations, expressing Re as a function of the correlation scale and either the Taylor scale or a proxy for the dissipation scale. We find that the Taylor scale definition of the Reynolds number has a sizeable prefactor of approximately 27, which has not been employed in previous works. Drawing from 18 years of data from the Wind spacecraft at 1 au, we calculate the magnetic Taylor scale directly and use both the ion inertial length and the magnetic spectrum break scale as approximations for the dissipation scale, yielding three distinct Re estimates for each 12-hour interval. Average values of Re range between 116,000 and 3,406,000, within the general distribution of past work. We also find considerable disagreement between the methods, with linear associations of between 0.38 and 0.72. Although the Taylor scale method is arguably more physically motivated, due to its dependence on the energy cascade rate, more theoretical work is needed in order to identify the most appropriate way of calculating effective Reynolds numbers for kinetic plasmas. As a summary of our observational analysis, we make available a data product of 28 years of 1 au solar wind and magnetospheric plasma measurements from Wind.
Submitted 11 December, 2023;
originally announced December 2023.
-
The need for spatially resolved observations of PAHs in protoplanetary discs
Authors:
K. Lange,
C. Dominik,
A. G. G. M. Tielens
Abstract:
The signatures of polycyclic aromatic hydrocarbons (PAHs) have been observed in protoplanetary discs, and their emission features obtained from spectral energy distributions (SED) have been used in the literature to characterise their size and determine their abundance. Two simple disc models (uniform PAH distribution against a PAH gap in the inner disc) are compared to investigate the difference of their SED and obtainable information. We used the radiative transfer code RADMC-3D to model the SED of two protoplanetary discs orbiting a typical Herbig star, one of which features a depletion of PAHs in the inner disc. We further created artificial images of the discs at face-on view to extract radial profiles of the PAH emission in the infrared. We find that the extracted PAH features from an SED provide limited information about the PAHs in protoplanetary disc environments, except for the ionisation state. The distribution of PAHs in a protoplanetary disc influences the total observed PAH luminosity in a non-linear fashion and alters the relative strength between the 3.3\,$μ$m and 11.3\,$μ$m features. Furthermore, we produced radial profiles at the 3\,$μ$m, 6\,$μ$m, and 11\,$μ$m PAH emission features and find that they follow a double power-law profile where the slope reflects the radiative environment (single photon regime vs. multi-photon regime) in which the PAHs lie. Using spatially resolved techniques such as IFU or imaging in the era of the James Webb Space Telescope, we find that multi-wavelength radial emission profiles will not only provide information on the spatial distribution of the PAHs, but may also provide information on their size and underlying UV environment, which is crucial for photo-evaporative disc wind models.
Submitted 21 November, 2023;
originally announced November 2023.
-
Turbulent processing of PAHs in protoplanetary discs -- Coagulation and freeze-out leading to depletion of gas-phase PAH
Authors:
K. Lange,
C. Dominik,
A. G. G. M. Tielens
Abstract:
Polycyclic aromatic hydrocarbons (PAHs) have been detected in numerous circumstellar discs. We propose the continuous processing of PAHs through clustering, adsorption on dust grains, and their reverse-processes as key mechanisms to reduce the emission-capable PAH abundance in protoplanetary discs. This cycle of processing is driven by vertical turbulence in the disc mixing PAHs between the disc midplane and the photosphere. We used a theoretical Monte Carlo model for photodesorption and a coagulation code in the disc midplane to estimate the relevance and timescale of these processes in a Herbig Ae/Be disc environment. By combining these components in a 1D vertical model, we calculated the gas-phase depletion of PAHs that stick as clusters on dust grains. Our results show that the clustering of gas-phase PAHs is very efficient, and that clusters with more than 100 monomers can grow for years before they are able to freeze out in the disc midplane. Once a PAH cluster is frozen on the dust grain surface, the large heat capacity of these clusters prevents them from evaporating off the grains in UV-rich environments such as the photosphere. Therefore, the clustering of PAHs followed by freeze-out can lead to a depletion of gas-phase PAHs in discs. Evaluated over the lifetime of protoplanetary discs, we find a depletion of PAHs by a factor that ranges between 50 and 1000 compared to the standard ISM abundance of PAHs in the inner disc through turbulent processing. Through these processes, we favour PAHs smaller than circumovalene as the major gas-phase emitters of the disc photosphere as larger PAH monomers cannot photodesorb from the grain surface. These gas-phase PAHs co-exist with large PAH clusters sticking on dust grains. We find a close relation between the amount of PAHs frozen out on dust grains and the dust population, as well as the strength of the vertical turbulence.
Submitted 14 March, 2023;
originally announced March 2023.
-
Lex2Sent: A bagging approach to unsupervised sentiment analysis
Authors:
Kai-Robin Lange,
Jonas Rieger,
Carsten Jentsch
Abstract:
Unsupervised text classification, with its most common form being sentiment analysis, used to be performed by counting words in a text that were stored in a lexicon, which assigns each word to one class or as a neutral word. In recent years, these lexicon-based methods fell out of favor and were replaced by computationally demanding fine-tuning techniques for encoder-only models such as BERT and zero-shot classification using decoder-only models such as GPT-4. In this paper, we propose an alternative approach: Lex2Sent, which provides improvement over classic lexicon methods but does not require any GPU or external hardware. To classify texts, we train embedding models to determine the distances between document embeddings and the embeddings of the parts of a suitable lexicon. We employ resampling, which results in a bagging effect, boosting the performance of the classification. We show that our model outperforms lexica and provides a basis for a high performing few-shot fine-tuning approach in the task of binary sentiment analysis.
Submitted 22 October, 2024; v1 submitted 26 September, 2022;
originally announced September 2022.
-
A Flexible Quasi-Copula Distribution for Statistical Modeling
Authors:
Sarah S. Ji,
Benjamin B. Chu,
Hua Zhou,
Kenneth Lange
Abstract:
Copulas, generalized estimating equations, and generalized linear mixed models promote the analysis of grouped data where non-normal responses are correlated. Unfortunately, parameter estimation remains challenging in these three frameworks. Based on prior work of Tonda, we derive a new class of probability density functions that allow explicit calculation of moments, marginal and conditional distributions, and the score and observed information needed in maximum likelihood estimation. We also illustrate how the new distribution flexibly models longitudinal data following a non-Gaussian distribution. Finally, we conduct a tri-variate genome-wide association analysis on dichotomized systolic and diastolic blood pressure and body mass index data from the UK-Biobank, showcasing the modeling prowess and computational scalability of the new distribution.
Submitted 14 October, 2024; v1 submitted 6 May, 2022;
originally announced May 2022.
-
Possible Ribose Synthesis in Carbonaceous Planetesimals
Authors:
Klaus Paschek,
Kai Kohler,
Ben K. D. Pearce,
Kevin Lange,
Thomas K. Henning,
Oliver Trapp,
Ralph E. Pudritz,
Dmitry A. Semenov
Abstract:
The origin of life might be sparked by the polymerization of the first RNA molecules in Darwinian ponds during wet-dry cycles. The key life-building block ribose was found in carbonaceous chondrites. Its exogenous delivery onto the Hadean Earth could be a crucial step toward the emergence of the RNA world. Here, we investigate the formation of ribose through a simplified version of the formose reaction inside carbonaceous chondrite parent bodies. Following up on our previous studies regarding nucleobases with the same coupled physico-chemical model, we calculate the abundance of ribose within planetesimals of different sizes and heating histories. We perform laboratory experiments using catalysts present in carbonaceous chondrites to infer the yield of ribose among all pentoses (5Cs) forming during the formose reaction. These laboratory yields are used to tune our theoretical model that can only predict the total abundance of 5Cs. We found that the calculated abundances of ribose were similar to the ones measured in carbonaceous chondrites. We discuss the possibilities of chemical decomposition and preservation of ribose and derived constraints on time and location in planetesimals. In conclusion, the aqueous formose reaction might produce most of the ribose in carbonaceous chondrites. Together with our previous studies on nucleobases, we found that life-building blocks of the RNA world could be synthesized inside parent bodies and later delivered onto the early Earth.
Submitted 4 April, 2022;
originally announced April 2022.
-
Feature Selection for Vertex Discriminant Analysis
Authors:
Alfonso Landeros,
Tong Tong Wu,
Kenneth Lange
Abstract:
We revisit vertex discriminant analysis (VDA) from the perspective of proximal distance algorithms. By specifying sparsity sets as constraints that directly control the number of active features, VDA is able to fit multiclass classifiers with no more than $k$ active features. We combine our sparse VDA approach with repeated cross validation to fit classifiers across the full range of model sizes on a given dataset. Our numerical examples demonstrate that grappling with sparsity directly is an attractive approach to model building in high-dimensional settings. Applications to kernel-based VDA are also considered.
Submitted 21 March, 2022;
originally announced March 2022.
-
A Sharper Computational Tool for $\text{L}_2\text{E}$ Regression
Authors:
Xiaoqian Liu,
Eric C. Chi,
Kenneth Lange
Abstract:
Building on previous research of Chi and Chi (2022), the current paper revisits estimation in robust structured regression under the $\text{L}_2\text{E}$ criterion. We adopt the majorization-minimization (MM) principle to design a new algorithm for updating the vector of regression coefficients. Our sharp majorization achieves faster convergence than the previous alternating proximal gradient descent algorithm (Chi and Chi, 2022). In addition, we reparameterize the model by substituting precision for scale and estimate precision via a modified Newton's method. This simplifies and accelerates overall estimation. We also introduce distance-to-set penalties to allow constrained estimation under nonconvex constraint sets. This tactic also improves performance in coefficient estimation and structure recovery. Finally, we demonstrate the merits of our improved tactics through a rich set of simulation examples and a real data application.
Submitted 23 August, 2022; v1 submitted 6 March, 2022;
originally announced March 2022.
-
A unified analysis of convex and non-convex lp-ball projection problems
Authors:
Joong-Ho Won,
Kenneth Lange,
Jason Xu
Abstract:
The task of projecting onto $\ell_p$ norm balls is ubiquitous in statistics and machine learning, yet the availability of actionable algorithms for doing so is largely limited to the special cases of $p \in \left\{ 0, 1, 2, \infty \right\}$. In this paper, we introduce novel, scalable methods for projecting onto the $\ell_p$ ball for general $p > 0$. For $p \geq 1$, we solve the univariate Lagrangian dual via a dual Newton method. We then carefully design a bisection approach for $p < 1$, presenting theoretical and empirical evidence of zero or a small duality gap in the non-convex case. The success of our contributions is thoroughly assessed empirically, and applied to large-scale regularized multi-task learning and compressed sensing.
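For orientation, here is a minimal pure-Python sketch of three of the well-known special cases the abstract refers to ($p = 1, 2, \infty$). It is not the paper's dual Newton or bisection method for general $p$; the function names and the radius parameter `r` are hypothetical, chosen only for this illustration. The $\ell_1$ case uses the standard sort-and-threshold construction.

```python
import math

def project_l2_ball(x, r=1.0):
    """Euclidean projection onto {z : ||z||_2 <= r}: rescale if outside."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= r:
        return list(x)
    return [v * r / norm for v in x]

def project_linf_ball(x, r=1.0):
    """Projection onto {z : ||z||_inf <= r}: clip each coordinate."""
    return [max(-r, min(r, v)) for v in x]

def project_l1_ball(x, r=1.0):
    """Projection onto {z : ||z||_1 <= r} via sorting and soft-thresholding."""
    if sum(abs(v) for v in x) <= r:
        return list(x)
    # Sort magnitudes in decreasing order and search for the threshold theta.
    u = sorted((abs(v) for v in x), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - r) / k
        if uk > t:  # holds for a prefix of k; keep the last valid threshold
            theta = t
    # Shrink every magnitude by theta, keeping the original signs.
    return [math.copysign(max(abs(v) - theta, 0.0), v) for v in x]
```

For example, `project_l1_ball([3.0, 1.0], 2.0)` shrinks both coordinates by the threshold 1, which zeroes out the smaller one. The closed forms above are what make these cases easy; for other values of $p$ no such coordinate-wise rule is available, which is the gap the abstract describes.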
Submitted 2 March, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
A.I. and Data-Driven Mobility at Volkswagen Financial Services AG
Authors:
Shayan Jawed,
Mofassir ul Islam Arif,
Ahmed Rashed,
Kiran Madhusudhanan,
Shereen Elsayed,
Mohsan Jameel,
Alexei Volk,
Andre Hintsches,
Marlies Kornfeld,
Katrin Lange,
Lars Schmidt-Thieme
Abstract:
Machine learning is being widely adopted in industrial applications owing to the capabilities of commercially available hardware and rapidly advancing research. Volkswagen Financial Services (VWFS), as a market leader in vehicle leasing services, aims to leverage existing proprietary data and the latest research to enhance existing and derive new business processes. The collaboration between Information Systems and Machine Learning Lab (ISMLL) and VWFS serves to realize this goal. In this paper, we propose methods in the fields of recommender systems, object detection, and forecasting that enable data-driven decisions for the vehicle life-cycle at VWFS.
Submitted 9 February, 2022;
originally announced February 2022.
-
Meteorites and the RNA World: Synthesis of Nucleobases in Carbonaceous Planetesimals and the Role of Initial Volatile Content
Authors:
Klaus Paschek,
Dmitry A. Semenov,
Ben K. D. Pearce,
Kevin Lange,
Thomas K. Henning,
Ralph E. Pudritz
Abstract:
Prebiotic molecules, fundamental building blocks for the origin of life, have been found in carbonaceous chondrites. The exogenous delivery of these organic molecules onto the Hadean Earth could have sparked the polymerization of the first RNA molecules in Darwinian ponds during wet-dry cycles. Here, we investigate the formation of the RNA and DNA nucleobases adenine, uracil, cytosine, guanine, and thymine inside parent body planetesimals of carbonaceous chondrites. An up-to-date thermochemical equilibrium model coupled with a 1D thermodynamic planetesimal model is used to calculate the nucleobase concentrations. In contrast to the previous study (Pearce & Pudritz 2016), we assume initial volatile concentrations more appropriate for the formation zone of carbonaceous chondrite parent bodies. This more accurately reflects cosmochemical findings that these bodies formed in the inner, $\sim 2\mathrm{-}5\,\mathrm{au}$, warm region of the solar system. Due to these improvements, our model reproduces the concentrations of adenine and guanine measured in carbonaceous chondrites. Our model did not, per se, reproduce the measurements of uracil, cytosine, and thymine in these meteorites. This can be explained by transformation reactions between nucleobases and potential decomposition of thymine. The synthesis of prebiotic organic matter in carbonaceous asteroids could be well explained by a combination of i) radiogenic heating, ii) aqueous chemistry involving a few key processes at a specific range of radii inside planetesimals where water can exist in the liquid phase, and iii) a reduced initial volatile content (H$_2$, CO, HCN, CH$_2$O) of the protoplanetary disk material in the parent body region compared to the outer region of comets.
Submitted 11 November, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Algorithms for Sparse Support Vector Machines
Authors:
Alfonso Landeros,
Kenneth Lange
Abstract:
Many problems in classification involve huge numbers of irrelevant features. Model selection reveals the crucial features, reduces the dimensionality of feature space, and improves model interpretation. In the support vector machine literature, model selection is achieved by $\ell_1$ penalties. These convex relaxations seriously bias parameter estimates toward 0 and tend to admit too many irrelevant features. The current paper presents an alternative that replaces penalties by sparse-set constraints. Penalties still appear, but serve a different purpose. The proximal distance principle takes a loss function $L(\boldsymbol{\beta})$ and adds the penalty $\frac{\rho}{2}\mathrm{dist}(\boldsymbol{\beta}, S_k)^2$ capturing the squared Euclidean distance of the parameter vector $\boldsymbol{\beta}$ to the sparsity set $S_k$ where at most $k$ components of $\boldsymbol{\beta}$ are nonzero. If $\boldsymbol{\beta}_\rho$ represents the minimum of the objective $f_\rho(\boldsymbol{\beta})=L(\boldsymbol{\beta})+\frac{\rho}{2}\mathrm{dist}(\boldsymbol{\beta}, S_k)^2$, then $\boldsymbol{\beta}_\rho$ tends to the constrained minimum of $L(\boldsymbol{\beta})$ over $S_k$ as $\rho$ tends to $\infty$. We derive two closely related algorithms to carry out this strategy. Our simulated and real examples vividly demonstrate how the algorithms achieve much better sparsity without loss of classification power.
Submitted 14 October, 2021;
originally announced October 2021.
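The distance penalty described in the abstract above can be illustrated with a minimal Python sketch. It assumes the Euclidean projection onto the sparsity set $S_k$ keeps the $k$ largest-magnitude components and zeroes the rest; the function names `project_sparse` and `distance_penalty` are hypothetical and are not taken from the paper or any library.

```python
def project_sparse(beta, k):
    """Euclidean projection onto S_k: keep the k largest-magnitude entries."""
    if k <= 0:
        return [0.0] * len(beta)
    # Indices ordered by decreasing magnitude; sorted() is stable, so ties
    # are broken in favour of the earlier index.
    order = sorted(range(len(beta)), key=lambda i: -abs(beta[i]))
    keep = set(order[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


def distance_penalty(beta, k, rho):
    """Penalty (rho / 2) * dist(beta, S_k)^2 with squared Euclidean distance."""
    proj = project_sparse(beta, k)
    sq_dist = sum((b - p) ** 2 for b, p in zip(beta, proj))
    return 0.5 * rho * sq_dist


beta = [3.0, -0.5, 0.1, -2.0]
print(project_sparse(beta, 2))  # [3.0, 0.0, 0.0, -2.0]
print(distance_penalty(beta, 2, 10.0))
```

The penalty vanishes exactly when the vector already has at most $k$ nonzero components, which is why it does not shrink the retained coefficients.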
-
A Proximal Distance Algorithm for Likelihood-Based Sparse Covariance Estimation
Authors:
Jason Xu,
Kenneth Lange
Abstract:
This paper addresses the task of estimating a covariance matrix under a patternless sparsity assumption. In contrast to existing approaches based on thresholding or shrinkage penalties, we propose a likelihood-based method that regularizes the distance from the covariance estimate to a symmetric sparsity set. This formulation avoids unwanted shrinkage induced by more common norm penalties and enables optimization of the resulting non-convex objective by solving a sequence of smooth, unconstrained subproblems. These subproblems are generated and solved via the proximal distance version of the majorization-minimization principle. The resulting algorithm executes rapidly, gracefully handles settings where the number of parameters exceeds the number of cases, yields a positive definite solution, and enjoys desirable convergence properties. Empirically, we demonstrate that our approach outperforms competing methods by several metrics across a suite of simulated experiments. Its merits are illustrated on an international migration dataset and a classic case study on flow cytometry. Our findings suggest that the marginal and conditional dependency networks for the cell signalling data are more similar than previously concluded.
Submitted 9 September, 2021;
originally announced September 2021.
-
Stability of Polycyclic Aromatic Hydrocarbon Clusters in Protoplanetary Disks
Authors:
K. Lange,
C. Dominik,
A. G. G. M. Tielens
Abstract:
The infrared signatures of polycyclic aromatic hydrocarbons (PAHs) are present in many protostellar disks and these species are thought to play an important role in heating of the gas in the photosphere. We aim to consider PAH cluster formation as one possible cause for non-detections of PAH features in protoplanetary disks. We test the necessary conditions for cluster formation and cluster dissociation by stellar optical and FUV photons in protoplanetary disks using a Herbig Ae/Be and a T Tauri star disk model. We perform Monte-Carlo (MC) and statistical calculations to determine dissociation rates for coronene, circumcoronene and circumcircumcoronene clusters with sizes between 2 and 200 cluster members. By applying general disk models to our Herbig Ae/Be and T Tauri star model, we estimate the formation rate of PAH dimers and compare these with the dissociation rates. We show that the formation of PAH dimers can take place in the inner 100 AU of protoplanetary disks in sub-photospheric layers. Dimer formation takes seconds to years, allowing them to grow beyond dimer size in a short time. We further demonstrate that PAH clusters increase their stability while they grow if they are located beyond a critical distance that depends on stellar properties and PAH species. The comparison with the local vertical mixing time scale allows a determination of the minimum cluster size necessary for survival of PAH clusters. Considering the PAH cluster formation sites, cluster survival in the photosphere of the inner disk of Herbig stars is unlikely because of the high UV radiation. For the T Tauri stars, survival of coronene, circumcoronene and circumcircumcoronene clusters is possible and cluster formation should be considered as one possible explanation for low PAH detection rates in T Tauri star disks.
Submitted 24 August, 2021;
originally announced August 2021.
-
Nonconvex Optimization via MM Algorithms: Convergence Theory
Authors:
Kenneth Lange,
Joong-Ho Won,
Alfonso Landeros,
Hua Zhou
Abstract:
The majorization-minimization (MM) principle is an extremely general framework for deriving optimization algorithms. It includes the expectation-maximization (EM) algorithm, proximal gradient algorithm, concave-convex procedure, quadratic lower bound algorithm, and proximal distance algorithm as special cases. Besides numerous applications in statistics, optimization, and imaging, the MM principle finds wide applications in large scale machine learning problems such as matrix completion, discriminant analysis, and nonnegative matrix factorizations. When applied to nonconvex optimization problems, MM algorithms enjoy the advantages of convexifying the objective function, separating variables, numerical stability, and ease of implementation. However, compared to the large body of literature on other optimization algorithms, the convergence analysis of MM algorithms is scattered and problem specific. This survey presents a unified treatment of the convergence of MM algorithms. With modern applications in mind, the results encompass non-smooth objective functions and non-asymptotic analysis.
Submitted 5 June, 2021;
originally announced June 2021.
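As a small illustration of the MM principle summarized in the abstract above, the following Python sketch minimizes a smoothed sum of absolute deviations, $\sum_i \sqrt{(x-a_i)^2+\epsilon}$, whose minimizer approximates the sample median. This toy problem is an assumption chosen for brevity and is not taken from the survey. Each step majorizes the square root by its tangent line in the squared residual (valid because the square root is concave), which leaves a weighted least-squares surrogate with a closed-form minimizer.

```python
import math


def smoothed_abs_loss(x, data, eps):
    """Objective: sum of sqrt((x - a)^2 + eps), a smooth stand-in for |x - a|."""
    return sum(math.sqrt((x - a) ** 2 + eps) for a in data)


def mm_median(data, x0, eps=1e-8, iters=200):
    """MM iteration with a quadratic majorizer at the current iterate.

    Minimizing the surrogate gives a weighted mean with weights
    1 / sqrt((x_n - a_i)^2 + eps).
    """
    x = x0
    history = [smoothed_abs_loss(x, data, eps)]
    for _ in range(iters):
        weights = [1.0 / math.sqrt((x - a) ** 2 + eps) for a in data]
        x = sum(w * a for w, a in zip(weights, data)) / sum(weights)
        history.append(smoothed_abs_loss(x, data, eps))
    return x, history


data = [1.0, 2.0, 7.0, 9.0, 30.0]
x_hat, history = mm_median(data, x0=sum(data) / len(data))
print(x_hat)  # close to the sample median 7.0
```

Because the surrogate touches the objective at the current iterate and lies above it elsewhere, the recorded objective values in `history` never increase, which is the descent property that MM convergence analyses build on.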
-
What Makes a Message Persuasive? Identifying Adaptations Towards Persuasiveness in Nine Exploratory Case Studies
Authors:
Sebastian Duerr,
Krystian Teodor Lange,
Peter A. Gloor
Abstract:
The ability to persuade others is critical to professional and personal success. However, crafting persuasive messages is demanding and poses various challenges. We conducted nine exploratory case studies to identify adaptations that professional and non-professional writers make in written scenarios to increase their subjective persuasiveness. Furthermore, we identified challenges that those writers faced and identified strategies to resolve them with persuasive natural language generation, i.e., artificial intelligence. Our findings show that humans can achieve high degrees of persuasiveness (more so for professional-level writers), and artificial intelligence can complement them to achieve increased celerity and alignment in the process.
Submitted 26 April, 2021;
originally announced April 2021.
-
Poisson Phase Retrieval in Very Low-count Regimes
Authors:
Zongyu Li,
Kenneth Lange,
Jeffrey A. Fessler
Abstract:
This paper discusses phase retrieval algorithms for maximum likelihood (ML) estimation from measurements following independent Poisson distributions in very low-count regimes, e.g., 0.25 photon per pixel. To maximize the log-likelihood of the Poisson ML model, we propose a modified Wirtinger flow (WF) algorithm using a step size based on the observed Fisher information. This approach eliminates all parameter tuning except the number of iterations. We also propose a novel curvature for majorize-minimize (MM) algorithms with a quadratic majorizer. We show theoretically that our proposed curvature is sharper than the curvature derived from the supremum of the second derivative of the Poisson ML cost function. We compare the proposed algorithms (WF, MM) with existing optimization methods, including WF using other step-size schemes, quasi-Newton methods such as LBFGS and alternating direction method of multipliers (ADMM) algorithms, under a variety of experimental settings. Simulation experiments with a random Gaussian matrix, a canonical DFT matrix, a masked DFT matrix and an empirical transmission matrix demonstrate the following. 1) As expected, algorithms based on the Poisson ML model consistently produce higher quality reconstructions than algorithms derived from Gaussian noise ML models when applied to low-count data. 2) For unregularized cases, our proposed WF algorithm with Fisher information for step size converges faster than other WF methods, e.g., WF with empirical step size, backtracking line search, and optimal step size for the Gaussian noise model; it also converges faster than the LBFGS quasi-Newton method. 3) In regularized cases, our proposed WF algorithm converges faster than WF with backtracking line search, LBFGS, MM and ADMM.
Submitted 24 September, 2022; v1 submitted 1 April, 2021;
originally announced April 2021.
-
On the Impact of Attachment Strategies for Payment Channel Networks
Authors:
Kimberly Lange,
Elias Rohrer,
Florian Tschorsch
Abstract:
Payment channel networks, such as Bitcoin's Lightning Network, promise to improve the scalability of blockchain systems by processing the majority of transactions off-chain. Due to the design, the positioning of nodes in the network topology is a highly influential factor regarding the experienced performance, costs, and fee revenue of network participants. As a consequence, today's Lightning Network is built around a small number of highly-connected hubs. Recent literature shows the centralizing tendencies to be incentive-compatible and at the same time detrimental to security and privacy. The choice of attachment strategies therefore becomes a crucial factor for the future of such systems. In this paper, we provide an empirical study on the (local and global) impact of various attachment strategies for payment channel networks. To this end, we introduce candidate strategies from the field of graph theory and analyze them with respect to their computational complexity as well as their repercussions for end users and service providers. Moreover, we evaluate their long-term impact on the network topology.
Submitted 18 February, 2021;
originally announced February 2021.
-
Momentum Entanglement for Atom Interferometry
Authors:
F. Anders,
A. Idel,
P. Feldmann,
D. Bondarenko,
S. Loriani,
K. Lange,
J. Peise,
M. Gersemann,
B. Meyer,
S. Abend,
N. Gaaloul,
C. Schubert,
D. Schlippert,
L. Santos,
E. Rasel,
C. Klempt
Abstract:
Compared to light interferometers, the flux in cold-atom interferometers is low and the associated shot noise large. Sensitivities beyond these limitations require the preparation of entangled atoms in different momentum modes. Here, we demonstrate a source of entangled atoms that is compatible with state-of-the-art interferometers. Entanglement is transferred from the spin degree of freedom of a Bose-Einstein condensate to well-separated momentum modes, witnessed by a squeezing parameter of -3.1(8) dB. Entanglement-enhanced atom interferometers open up unprecedented sensitivities for quantum gradiometers or gravitational wave detectors.
Submitted 30 November, 2020; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Extensions to the Proximal Distance Method of Constrained Optimization
Authors:
Alfonso Landeros,
Oscar Hernan Madrid Padilla,
Hua Zhou,
Kenneth Lange
Abstract:
The current paper studies the problem of minimizing a loss $f(\boldsymbol{x})$ subject to constraints of the form $\boldsymbol{D}\boldsymbol{x} \in S$, where $S$ is a closed set, convex or not, and $\boldsymbol{D}$ is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method with the proximal distance principle. The latter is driven by minimization of penalized objectives $f(\boldsymbol{x})+\frac{\rho}{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$ involving large tuning constants $\rho$ and the squared Euclidean distance of $\boldsymbol{D}\boldsymbol{x}$ from $S$. The next iterate $\boldsymbol{x}_{n+1}$ of the corresponding proximal distance algorithm is constructed from the current iterate $\boldsymbol{x}_n$ by minimizing the majorizing surrogate function $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$. For fixed $\rho$ and a subanalytic loss $f(\boldsymbol{x})$ and a subanalytic constraint set $S$, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare against the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest descent variant on high-dimensional problems.
Submitted 11 January, 2022; v1 submitted 1 September, 2020;
originally announced September 2020.
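The surrogate minimization described in the abstract above can be sketched for a deliberately simple case. The Python example below assumes the loss $f(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{x}-\boldsymbol{y}\|^2$, the identity fusion matrix $\boldsymbol{D}=\boldsymbol{I}$, and the nonnegative orthant as $S$; these choices and the function names are illustrative assumptions, not the paper's experiments. Under them, the surrogate $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{x}-\mathcal{P}_S(\boldsymbol{x}_n)\|^2$ has the closed-form minimizer $(\boldsymbol{y}+\rho\,\mathcal{P}_S(\boldsymbol{x}_n))/(1+\rho)$.

```python
def project_nonneg(x):
    """Projection onto the nonnegative orthant (the assumed set S)."""
    return [max(v, 0.0) for v in x]


def proximal_distance(y, rho, iters=500, x0=None):
    """Proximal distance iteration for f(x) = 0.5 * ||x - y||^2 with D = I.

    Each step minimizes f(x) + (rho / 2) * ||x - P_S(x_n)||^2 in closed form.
    """
    x = [0.0] * len(y) if x0 is None else list(x0)
    for _ in range(iters):
        anchor = project_nonneg(x)  # P_S(D x_n) with D = I
        x = [(yi + rho * ai) / (1.0 + rho) for yi, ai in zip(y, anchor)]
    return x


y = [2.0, -1.0, 0.5]
print(proximal_distance(y, rho=10.0))
```

In this toy case the limit for fixed $\rho$ is $y_i$ for nonnegative coordinates and $y_i/(1+\rho)$ for negative ones, so the solution approaches the projection of $\boldsymbol{y}$ onto $S$ as $\rho$ grows. Starting from zero, the error on the positive coordinates contracts by a factor $\rho/(1+\rho)$ per step, so larger $\rho$ means slower iterations here.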
-
Attack-aware Security Function Chain Reordering
Authors:
Lukas Iffländer,
Nishant Rawtani,
Lukas Beierlieb,
Nicolas Fella,
Klaus-Dieter Lange,
Samuel Kounev
Abstract:
Attack-awareness refers to the self-awareness of security systems with respect to occurring attacks. More frequent and intense attacks on cloud and network infrastructures are pushing security systems to the limit. With the end of Moore's Law, merely scaling against these attacks is no longer economically justified. Previous works have already dealt with the adoption of Software-defined Networking and Network Function Virtualization in security systems and used both approaches to optimize performance by the intelligent placement of security functions. However, these works have not yet considered the sequence in which traffic passes through these functions. In this work, we make a case for the need to take this ordering into account by showing its impact. We then propose a reordering framework and analyze what aspects are necessary for modeling security service function chains and making decisions regarding the order based on those models. We show the impact of the order and validate our framework in an evaluation environment. The effect can extend to multiple orders of magnitude, and the framework's evaluation proves the feasibility of our concept.
Submitted 17 May, 2020;
originally announced May 2020.
-
Simple and Scalable Sparse k-means Clustering via Feature Ranking
Authors:
Zhiyue Zhang,
Kenneth Lange,
Jason Xu
Abstract:
Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice.
Submitted 22 October, 2020; v1 submitted 19 February, 2020;
originally announced February 2020.
-
OPENMENDEL: A Cooperative Programming Project for Statistical Genetics
Authors:
Hua Zhou,
Janet S. Sinsheimer,
Christopher A. German,
Sarah S. Ji,
Douglas M. Bates,
Benjamin B. Chu,
Kevin L. Keys,
Juhyun Kim,
Seyoon Ko,
Gordon D. Mosher,
Jeanette C. Papp,
Eric M. Sobel,
Jing Zhai,
Jin J. Zhou,
Kenneth Lange
Abstract:
Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.
Submitted 13 February, 2019;
originally announced February 2019.
-
Excited states of molecules in strong uniform and non-uniform magnetic fields
Authors:
Sangita Sen,
Kai K. Lange,
Erik I. Tellgren
Abstract:
This paper reports an implementation of Hartree-Fock linear response with complex orbitals for computing electronic spectra of molecules in strong external magnetic fields. The implementation is completely general, allowing for spin-restricted, spin-unrestricted, and general two-component reference states. The method is applied to small molecules placed in strong uniform and non-uniform magnetic fields of astrochemical importance at the Random Phase Approximation level of theory. For uniform fields, where comparison is possible, the spectra are found to be qualitatively similar to those recently obtained with equation of motion coupled cluster theory. We also study the behaviour of spin-forbidden excitations with progressive loss of spin symmetry induced by non-uniform magnetic fields. Finally, the equivalence of length and velocity gauges for oscillator strengths when using complex orbitals is investigated and found to hold numerically.
Submitted 30 January, 2019;
originally announced January 2019.
-
BioSimulator.jl: Stochastic simulation in Julia
Authors:
Alfonso Landeros,
Timothy Stutz,
Kevin L. Keys,
Alexander Alekseyenko,
Janet S. Sinsheimer,
Kenneth Lange,
Mary Sehl
Abstract:
Biological systems with intertwined feedback loops pose a challenge to mathematical modeling efforts. Moreover, rare events, such as mutation and extinction, complicate system dynamics. Stochastic simulation algorithms are useful in generating time-evolution trajectories for these systems because they can adequately capture the influence of random fluctuations and quantify rare events. We present a simple and flexible package, BioSimulator.jl, for implementing the Gillespie algorithm, $τ$-leaping, and related stochastic simulation algorithms. The objective of this work is to provide scientists across domains with fast, user-friendly simulation tools. We used the high-performance programming language Julia because of its emphasis on scientific computing. Our software package implements a suite of stochastic simulation algorithms based on Markov chain theory. We provide the ability to (a) diagram Petri Nets describing interactions, (b) plot average trajectories and attached standard deviations of each participating species over time, and (c) generate frequency distributions of each species at a specified time. BioSimulator.jl's interface allows users to build models programmatically within Julia. A model is then passed to the simulate routine to generate simulation data. The built-in tools allow one to visualize results and compute summary statistics. Our examples highlight the broad applicability of our software to systems of varying complexity from ecology, systems biology, chemistry, and genetics. The user-friendly nature of BioSimulator.jl encourages the use of stochastic simulation, minimizes tedious programming efforts, and reduces errors during model specification.
Submitted 29 November, 2018;
originally announced November 2018.
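For readers unfamiliar with the Gillespie algorithm mentioned in the abstract above, here is a minimal sketch of the direct method for a linear birth-death process. It is written in plain Python rather than Julia and does not use or imitate the BioSimulator.jl interface; the function name, parameters, and the choice of model are assumptions made for illustration.

```python
import random


def gillespie_birth_death(x0, birth, death, t_end, seed=0):
    """Direct-method simulation of X -> X + 1 (rate birth * X)
    and X -> X - 1 (rate death * X) up to time t_end."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        birth_rate = birth * x
        total = birth_rate + death * x
        if total == 0.0:
            break  # extinction: no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its rate.
        if rng.random() * total < birth_rate:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states


times, states = gillespie_birth_death(x0=10, birth=1.0, death=1.1, t_end=5.0, seed=1)
print(len(times) - 1, "events; final population:", states[-1])
```

Averaging many such trajectories over different seeds gives the kind of mean-and-standard-deviation summaries the abstract describes, and the extinction check shows how rare events such as population die-out arise naturally from the simulation.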
-
Orthogonal Trace-Sum Maximization: Applications, Local Algorithms, and Global Optimality
Authors:
Joong-Ho Won,
Hua Zhou,
Kenneth Lange
Abstract:
This paper studies the problem of maximizing the sum of traces of matrix quadratic forms on a product of Stiefel manifolds. This orthogonal trace-sum maximization (OTSM) problem generalizes many interesting problems such as generalized canonical correlation analysis (CCA), Procrustes analysis, and cryo-electron microscopy of Nobel Prize fame. For these applications finding global solutions is highly desirable, but it has been unclear how to find even a stationary point, let alone test its global optimality. Through a close inspection of Ky Fan's classical result (1949) on the variational formulation of the sum of largest eigenvalues of a symmetric matrix, and a semidefinite programming (SDP) relaxation of the latter, we first provide a simple method to certify global optimality of a given stationary point of OTSM. This method only requires testing whether a symmetric matrix is positive semidefinite. A by-product of this analysis is an unexpected strong duality between Shapiro-Botha (1988) and Zhang-Singer (2017). After showing that a popular algorithm for generalized CCA and Procrustes analysis may generate oscillating iterates, we propose a simple fix that provably guarantees convergence to a stationary point. The combination of our algorithm and certificate reveals novel global optima of various instances of OTSM.
Submitted 6 February, 2021; v1 submitted 8 November, 2018;
originally announced November 2018.
-
Visibly Pushdown Languages and Free Profinite Algebras
Authors:
Silke Czarnetzki,
Andreas Krebs,
Klaus-Jörn Lange
Abstract:
We build a notion of algebraic recognition for visibly pushdown languages by finite algebraic objects. These come with a typical Eilenberg relationship, now between classes of visibly pushdown languages and classes of finite algebras. Building on that algebraic foundation, we further construct a topological object with one purpose being the possibility to derive a notion of equations, through which it is possible to prove that some given visibly pushdown language is not part of a certain class (or to even show decidability of the membership-problem of the class in some cases). In particular, we obtain a special instance of Reiterman's theorem for pseudo-varieties. These findings are then employed on two subclasses of the visibly pushdown languages, for which we derive concrete sets of equations. For some showcase languages, these equations are utilised to prove non-membership to the previously described classes.
Submitted 30 October, 2018;
originally announced October 2018.
-
Creation of entangled atomic states by an analogue of the Dynamical Casimir Effect
Authors:
K. Lange,
J. Peise,
B. Lücke,
T. Gruber,
A. Sala,
A. Polls,
W. Ertmer,
B. Juliá-Díaz,
L. Santos,
C. Klempt
Abstract:
If the boundary conditions of the quantum vacuum are changed in time, quantum field theory predicts that real, observable particles can be created in the initially empty modes. Here, we realize this effect by changing the boundary conditions of a spinor Bose-Einstein condensate, which yields a population of initially unoccupied spatial and spin excitations. We prove that the excitations are created as entangled excitation pairs by certifying continuous-variable entanglement within the many-particle output state.
Submitted 29 August, 2018; v1 submitted 7 May, 2018;
originally announced May 2018.
-
Generalized Linear Model Regression under Distance-to-set Penalties
Authors:
Jason Xu,
Eric C. Chi,
Kenneth Lange
Abstract:
Estimation in generalized linear models (GLM) is complicated by the presence of constraints. One can handle constraints by maximizing a penalized log-likelihood. Penalties such as the lasso are effective in high dimensions, but often lead to unwanted shrinkage. This paper explores instead penalizing the squared distance to constraint sets. Distance penalties are more flexible than algebraic and regularization penalties, and avoid the drawback of shrinkage. To optimize distance penalized objectives, we make use of the majorization-minimization principle. Resulting algorithms constructed within this framework are amenable to acceleration and come with global convergence guarantees. Applications to shape constraints, sparse regression, and rank-restricted matrix regression on synthetic and real data showcase strong empirical performance, even under non-convex constraints.
Submitted 3 November, 2017;
originally announced November 2017.
-
Entanglement between two spatially separated atomic modes
Authors:
Karsten Lange,
Jan Peise,
Bernd Lücke,
Ilka Kruse,
Giuseppe Vitagliano,
Iagoba Apellaniz,
Matthias Kleinmann,
Geza Toth,
Carsten Klempt
Abstract:
Modern quantum technologies in the fields of quantum computing, quantum simulation and quantum metrology require the creation and control of large ensembles of entangled particles. In ultracold ensembles of neutral atoms, highly entangled states containing thousands of particles have been generated, outnumbering any other physical system by orders of magnitude. The entanglement generation relies on the fundamental particle-exchange symmetry in ensembles of identical particles, which lacks the standard notion of entanglement between clearly definable subsystems. Here we present the generation of entanglement between two spatially separated clouds by splitting an ensemble of ultracold identical particles. Since the clouds can be addressed individually, our experiments open a path to exploit the available entangled states of indistinguishable particles for quantum information applications.
Submitted 8 August, 2017;
originally announced August 2017.
-
Erratum to the article: Charge transfer to solvent identified using dark channel fluorescence-yield L-edge spectroscopy, NATURE CHEMISTRY 2 (2010) 853
Authors:
Emad F. Aziz,
Hannelore Rittmann-Frank,
Kathrin M. Lange,
Sebastien Bonhommeau,
Majed Chergui
Abstract:
Erratum to the article: Charge transfer to solvent identified using dark channel fluorescence-yield L-edge spectroscopy, NATURE CHEMISTRY 2 (2010) 853
Submitted 12 June, 2017; v1 submitted 10 May, 2017;
originally announced May 2017.
-
An MM Algorithm for Split Feasibility Problems
Authors:
Jason Xu,
Eric C. Chi,
Meng Yang,
Kenneth Lange
Abstract:
The classical multi-set split feasibility problem seeks a point in the intersection of finitely many closed convex domain constraints, whose image under a linear mapping also lies in the intersection of finitely many closed convex range constraints. Split feasibility generalizes important inverse problems including convex feasibility, linear complementarity, and regression with constraint sets. When a feasible point does not exist, solution methods that proceed by minimizing a proximity function can be used to obtain optimal approximate solutions to the problem. We present an extension of the proximity function approach that generalizes the linear split feasibility problem to allow for non-linear mappings. Our algorithm is based on the principle of majorization-minimization, is amenable to quasi-Newton acceleration, and comes complete with convergence guarantees under mild assumptions. Furthermore, we show that the Euclidean norm appearing in the proximity function of the non-linear split feasibility problem can be replaced by arbitrary Bregman divergences. We explore several examples illustrating the merits of non-linear formulations over the linear case, with a focus on optimization for intensity-modulated radiation therapy.
Submitted 17 January, 2017; v1 submitted 16 December, 2016;
originally announced December 2016.
-
Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies
Authors:
Kevin L. Keys,
Gary K. Chen,
Kenneth Lange
Abstract:
A genome-wide association study (GWAS) correlates marker variation with trait variation in a sample of individuals. Each study subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here we assume that subjects are unrelated and collected at random and that trait values are normally distributed or transformed to normality. Over the past decade, researchers have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with LASSO or MCP penalties is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. This paper introduces the iterative hard thresholding (IHT) algorithm to the GWAS analysis of continuous traits. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. We evaluate IHT performance on both simulated and real GWAS data and conclude that it reduces false positive and false negative rates while remaining competitive in computational time with penalized regression. Source code is freely available at https://github.com/klkeys/IHT.jl.
Submitted 24 July, 2017; v1 submitted 3 August, 2016;
originally announced August 2016.
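The iterative hard thresholding idea named in the abstract above can be illustrated with a minimal sketch. This is the textbook form of IHT for least squares (a gradient step followed by keeping the `k` largest-magnitude coefficients), not the IHT.jl implementation; the fixed step size, the function names `hard_threshold` and `iht`, and the tiny design matrix are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def hard_threshold(v: Vector, k: int) -> Vector:
    """Keep the k entries of largest magnitude and zero out the rest."""
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:k])
    return [vj if j in keep else 0.0 for j, vj in enumerate(v)]


def iht(X: Matrix, y: Vector, k: int, step: float, iterations: int = 100) -> Vector:
    """Textbook IHT for 0.5 * ||y - X beta||^2 with at most k nonzeros.

    Assumption: a fixed step size is used here for simplicity.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Gradient step: beta + step * X^T (y - X beta)
        moved = [
            beta[j] + step * sum(X[i][j] * residual[i] for i in range(n))
            for j in range(p)
        ]
        # Project onto the set of k-sparse vectors
        beta = hard_threshold(moved, k)
    return beta


# Hypothetical toy data: 4 observations, 3 predictors, 2 truly nonzero effects.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.1, 0.1]]
true_beta = [2.0, 0.0, -1.0]
y = [sum(xij * bj for xij, bj in zip(row, true_beta)) for row in X]

beta_hat = iht(X, y, k=2, step=0.5)
```

On this noiseless toy problem the estimate recovers the two nonzero coefficients and sets the third exactly to zero; real GWAS data would need standardized genotypes and a more careful step-size rule.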
-
0.75 atoms improve the clock signal of 10,000 atoms
Authors:
I. Kruse,
K. Lange,
J. Peise,
B. Lücke,
L. Pezzè,
J. Arlt,
W. Ertmer,
C. Lisdat,
L. Santos,
A. Smerzi,
C. Klempt
Abstract:
Since the pioneering work of Ramsey, atom interferometers have been employed for precision metrology, in particular to measure time and to realize the second. In a classical interferometer, an ensemble of atoms is prepared in one of the two input states, whereas the second one is left empty. In this case, the vacuum noise restricts the precision of the interferometer to the standard quantum limit (SQL). Here, we propose and experimentally demonstrate a novel clock configuration that surpasses the SQL by squeezing the vacuum in the empty input state. We create a squeezed vacuum state containing an average of 0.75 atoms to improve the clock sensitivity of 10,000 atoms by 2.05 dB. The SQL poses a significant limitation for today's microwave fountain clocks, which serve as the main time reference. We evaluate the major technical limitations and challenges for devising a next generation of fountain clocks based on atomic squeezed vacuum.
Submitted 25 May, 2016;
originally announced May 2016.
-
Proximal Distance Algorithms: Theory and Examples
Authors:
Kevin L. Keys,
Hua Zhou,
Kenneth Lange
Abstract:
Proximal distance algorithms combine the classical penalty method of constrained minimization with distance majorization. If $f(\boldsymbol{x})$ is the loss function, and $C$ is the constraint set in a constrained minimization problem, then the proximal distance principle mandates minimizing the penalized loss $f(\boldsymbol{x})+\frac{\rho}{2}\operatorname{dist}(\boldsymbol{x},C)^2$ and following the solution $\boldsymbol{x}_\rho$ to its limit as $\rho$ tends to $\infty$. At each iteration the squared Euclidean distance $\operatorname{dist}(\boldsymbol{x},C)^2$ is majorized by the spherical quadratic $\|\boldsymbol{x}-P_C(\boldsymbol{x}_k)\|^2$, where $P_C(\boldsymbol{x}_k)$ denotes the projection of the current iterate $\boldsymbol{x}_k$ onto $C$. The minimum of the surrogate function $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{x}-P_C(\boldsymbol{x}_k)\|^2$ is given by the proximal map $\operatorname{prox}_{\rho^{-1}f}[P_C(\boldsymbol{x}_k)]$. The next iterate $\boldsymbol{x}_{k+1}$ automatically decreases the original penalized loss for fixed $\rho$. Since many explicit projections and proximal maps are known, it is straightforward to derive and implement novel optimization algorithms in this setting. These algorithms can take hundreds if not thousands of iterations to converge, but the stereotyped nature of each iteration makes proximal distance algorithms competitive with traditional algorithms. For convex problems, we prove global convergence. Our numerical examples include a) linear programming, b) nonnegative quadratic programming, c) projection to the closest kinship matrix, d) projection onto a second-order cone constraint, e) calculation of Horn's copositive matrix index, f) linear complementarity programming, and g) sparse principal components analysis. The proximal distance algorithm in each case is competitive or superior in speed to traditional methods.
Submitted 30 August, 2018; v1 submitted 19 April, 2016;
originally announced April 2016.
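The update described in the abstract above, where the next iterate is the proximal map of the loss applied to the projection of the current iterate, can be sketched in a few lines. This is a minimal illustration for a fixed penalty constant, not the authors' code: the function name `proximal_distance`, the toy loss $f(\boldsymbol{x}) = \frac{1}{2}\|\boldsymbol{x}-\boldsymbol{y}\|^2$, and the choice of the nonnegative orthant as constraint set are all assumptions.

```python
from typing import Callable, List

Vector = List[float]


def proximal_distance(
    prox: Callable[[Vector, float], Vector],
    project: Callable[[Vector], Vector],
    x0: Vector,
    rho: float,
    iterations: int = 500,
) -> Vector:
    """Iterate x_{k+1} = prox_{f/rho}(P_C(x_k)) for a fixed penalty rho."""
    x = list(x0)
    for _ in range(iterations):
        x = prox(project(x), rho)
    return x


# Hypothetical toy problem: f(x) = 0.5 * ||x - y||^2, C = nonnegative orthant.
y = [1.0, -2.0, 0.5]


def prox_least_squares(z: Vector, rho: float) -> Vector:
    # Closed form of argmin_x 0.5*||x - y||^2 + (rho/2)*||x - z||^2
    return [(yi + rho * zi) / (1.0 + rho) for yi, zi in zip(y, z)]


def project_nonnegative(x: Vector) -> Vector:
    # Euclidean projection onto {x : x_i >= 0}
    return [max(xi, 0.0) for xi in x]


x_rho = proximal_distance(
    prox_least_squares, project_nonnegative, [0.0, 0.0, 0.0], rho=10.0
)
```

For this toy problem the fixed-$\rho$ limit has the infeasible coordinate equal to $-2/(1+\rho)$, which shrinks to zero as $\rho$ grows, so rerunning with increasing `rho` drives the iterate toward the constrained minimizer $(1, 0, 0.5)$.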
-
Satisfying the Einstein-Podolsky-Rosen criterion with massive particles
Authors:
J. Peise,
I. Kruse,
K. Lange,
B. Lücke,
L. Pezzè,
J. Arlt,
W. Ertmer,
K. Hammerer,
L. Santos,
A. Smerzi,
C. Klempt
Abstract:
In 1935, Einstein, Podolsky and Rosen (EPR) questioned the completeness of quantum mechanics by devising a quantum state of two massive particles with maximally correlated space and momentum coordinates. The EPR criterion qualifies such continuous-variable entangled states, where a measurement of one subsystem seemingly allows for a prediction of the second subsystem beyond the Heisenberg uncertainty relation. Up to now, continuous-variable EPR correlations have only been created with photons, while the demonstration of such strongly correlated states with massive particles is still outstanding. Here, we report on the creation of an EPR-correlated two-mode squeezed state in an ultracold atomic ensemble. The state shows an EPR entanglement parameter of 0.18(3), which is 2.4 standard deviations below the threshold 1/4 of the EPR criterion. We also present a full tomographic reconstruction of the underlying many-particle quantum state. The state presents a resource for tests of quantum nonlocality and a wide variety of applications in the field of continuous-variable quantum information and metrology.
Submitted 27 November, 2015;
originally announced November 2015.
-
MM Algorithms for Variance Components Models
Authors:
Hua Zhou,
Liuyi Hu,
Jin Zhou,
Kenneth Lange
Abstract:
Variance components estimation and mixed model analysis are central themes in statistics with applications in numerous scientific disciplines. Despite the best efforts of generations of statisticians and numerical analysts, maximum likelihood estimation and restricted maximum likelihood estimation of variance component models remain numerically challenging. Building on the minorization-maximization (MM) principle, this paper presents a novel iterative algorithm for variance components estimation. The MM algorithm is trivial to implement and competitive on large data problems. The algorithm readily extends to more complicated problems such as linear mixed models, multivariate response models possibly with missing data, maximum a posteriori estimation, penalized estimation, and generalized estimating equations (GEE). We establish the global convergence of the MM algorithm to a KKT point and demonstrate, both numerically and theoretically, that it converges faster than the classical EM algorithm when the number of variance components is greater than two and all covariance matrices are positive definite.
Submitted 24 September, 2015;
originally announced September 2015.
-
The proximal distance algorithm
Authors:
Kenneth Lange,
Kevin L. Keys
Abstract:
The MM principle is a device for creating optimization algorithms satisfying the ascent or descent property. The current survey emphasizes the role of the MM principle in nonlinear programming. For smooth functions, one can construct an adaptive interior point method based on scaled Bregman barriers. This algorithm does not follow the central path. For convex programming subject to nonsmooth constraints, one can combine an exact penalty method with distance majorization to create versatile algorithms that are effective even in discrete optimization. These proximal distance algorithms are highly modular and reduce to set projections and proximal mappings, both very well-understood techniques in optimization. We illustrate the possibilities in linear programming, binary piecewise-linear programming, nonnegative quadratic programming, $\ell_0$ regression, matrix completion, and inverse sparse covariance estimation.
Submitted 27 July, 2015;
originally announced July 2015.
-
Coupled-cluster theory for atoms and molecules in strong magnetic fields
Authors:
Stella Stopkowicz,
Jürgen Gauss,
Kai K. Lange,
Erik I. Tellgren,
Trygve Helgaker
Abstract:
An implementation of coupled-cluster (CC) theory to treat atoms and molecules in finite magnetic fields is presented. The main challenges stem from the magnetic-field dependence in the Hamiltonian, or, more precisely, the appearance of the angular momentum operator, due to which the wave function becomes complex and which introduces a gauge-origin dependence. For this reason, an implementation of a complex CC code is required together with the use of gauge-including atomic orbitals to ensure gauge-origin independence. Results of coupled-cluster singles--doubles--perturbative-triples (CCSD(T)) calculations are presented for atoms and molecules with a focus on the dependence of correlation and binding energies on the magnetic field.
Submitted 12 August, 2015; v1 submitted 29 May, 2015;
originally announced May 2015.
-
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Authors:
Gary K. Chen,
Eric Chi,
John Ranola,
Kenneth Lange
Abstract:
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The current paper exploits the proximal distance principle to construct a novel algorithm for solving the convex clustering problem. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. Our convex clustering software separates parameters, accommodates missing data, and supports prior information on relationships. The software is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems.
Submitted 6 September, 2014;
originally announced September 2014.
-
Fast Genome-Wide QTL Analysis Using Mendel
Authors:
Hua Zhou,
Jin Zhou,
Tao Hu,
Eric M Sobel,
Kenneth Lange
Abstract:
Pedigree GWAS (Option 29) in the current version of the Mendel software is an optimized subroutine for performing large-scale genome-wide QTL analysis. This analysis (a) works for random sample data, pedigree data, or a mix of both, (b) is highly efficient in both run time and memory requirements, (c) accommodates both univariate and multivariate traits, (d) works for autosomal and X-linked loci, (e) correctly deals with missing data in traits, covariates, and genotypes, (f) allows for covariate adjustment and constraints among parameters, (g) uses either a theoretical or a SNP-based empirical kinship matrix for additive polygenic effects, (h) allows extra variance components such as dominant polygenic effects and household effects, (i) detects and reports outlier individuals and pedigrees, and (j) allows for robust estimation via the $t$-distribution. The current paper assesses these capabilities on the Genetic Analysis Workshop 19 (GAW19) sequencing data. We analyzed simulated and real phenotypes for both family and random sample data sets. For instance, when jointly testing the 8 longitudinally measured systolic blood pressure (SBP) and diastolic blood pressure (DBP) traits, it takes Mendel 78 minutes on a standard laptop computer to read, quality check, and analyze a data set with 849 individuals and 8.3 million SNPs. Genome-wide eQTL analysis of 20,643 expression traits on 641 individuals with 8.3 million SNPs takes 30 hours using 20 parallel runs on a cluster. Mendel is freely available at http://www.genetics.ucla.edu/software.
Submitted 30 July, 2014;
originally announced July 2014.
-
Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data
Authors:
Hua Zhou,
John Blangero,
Thomas D. Dyer,
Kei-hang K. Chan,
Kenneth Lange,
Eric M. Sobel
Abstract:
Since most analysis software for genome-wide association studies (GWAS) currently exploits only unrelated individuals, there is a need for efficient applications that can handle general pedigree data or mixtures of both population and pedigree data. Even data sets thought to consist of only unrelated individuals may include cryptic relationships that can lead to false positives if not discovered and controlled for. In addition, family designs possess compelling advantages. They are better equipped to detect rare variants, control for population stratification, and facilitate the study of parent-of-origin effects. Pedigrees selected for extreme trait values often segregate a single gene with strong effect. Finally, many pedigrees are available as an important legacy from the era of linkage analysis. Unfortunately, pedigree likelihoods are notoriously hard to compute. In this paper we re-examine the computational bottlenecks and implement ultra-fast pedigree-based GWAS analysis. Kinship coefficients can either be based on explicitly provided pedigrees or automatically estimated from dense markers. Our strategy (a) works for random sample data, pedigree data, or a mix of both; (b) entails no loss of power; (c) allows for any number of covariate adjustments, including correction for population stratification; (d) allows for testing SNPs under additive, dominant, and recessive models; and (e) accommodates both univariate and multivariate quantitative traits. On a typical personal computer (6 CPU cores at 2.67 GHz), analyzing a univariate HDL (high-density lipoprotein) trait from the San Antonio Family Heart Study (935,392 SNPs on 1357 individuals in 124 pedigrees) takes less than 2 minutes and 1.5 GB of memory. Complete multivariate QTL analysis of the three time-points of the longitudinal HDL multivariate trait takes less than 5 minutes and 1.5 GB of memory.
Submitted 19 December, 2014; v1 submitted 30 July, 2014;
originally announced July 2014.