-
A Statistical Interpretation of the Maximum Subarray Problem
Authors:
Dennis Wei,
Dmitry M. Malioutov
Abstract:
Maximum subarray is a classical problem in computer science that, given an array of numbers, aims to find a contiguous subarray with the largest sum. We focus on its use for a noisy statistical problem of localizing an interval with a mean different from the background. While a naive application of maximum subarray fails at this task, both a penalized and a constrained version can succeed. We show that the penalized version can be derived for common exponential family distributions, in a manner similar to the change-point detection literature, and we interpret the resulting optimal penalty value. The failure of the naive formulation is then explained by an analysis of the estimated interval boundaries. Experiments further quantify the effect of deviating from the optimal penalty. We also relate the penalized and constrained formulations and show that the solutions to the former lie on the convex hull of the solutions to the latter.
Submitted 20 October, 2023; v1 submitted 26 April, 2023;
originally announced April 2023.
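As a rough illustration of the penalized formulation described in the abstract above, the following minimal sketch runs Kadane's algorithm after subtracting a constant penalty from every element. The function name `penalized_max_subarray` and the per-element form of the penalty are assumptions made for this example, not the authors' implementation, and the sketch does not choose the optimal penalty value.

```python
def penalized_max_subarray(values, penalty=0.0):
    """Return (best_sum, start, end) for the contiguous slice values[start:end]
    that maximizes sum(v - penalty for v in the slice).

    Hypothetical helper for illustration: penalty=0.0 gives the naive
    maximum subarray; the per-element penalty is an assumed form.
    """
    if not values:
        raise ValueError("values must be non-empty")

    # Start with the first element as the only candidate interval.
    best_sum = cur_sum = values[0] - penalty
    best_start, best_end = 0, 1
    cur_start = 0

    for i in range(1, len(values)):
        shifted = values[i] - penalty
        if cur_sum < 0:
            # A negative running sum can only hurt; restart the interval here.
            cur_sum = shifted
            cur_start = i
        else:
            cur_sum += shifted
        if cur_sum > best_sum:
            best_sum = cur_sum
            best_start, best_end = cur_start, i + 1

    return best_sum, best_start, best_end


# Non-negative data with an elevated block at indices 2..4.
data = [0, 0, 3, 3, 3, 0, 0, 1]
naive = penalized_max_subarray(data, penalty=0)      # selects the whole array
penalized = penalized_max_subarray(data, penalty=1)  # selects the elevated block
```

With no penalty, every non-negative element is worth including, so the naive version returns the entire array; with a penalty of 1 the selected slice is `data[2:5]`, the elevated block.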
-
Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017)
Authors:
Been Kim,
Dmitry M. Malioutov,
Kush R. Varshney,
Adrian Weller
Abstract:
This is the Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), which was held in Sydney, Australia, August 10, 2017. Invited speakers were Tony Jebara, Pang Wei Koh, and David Sontag.
Submitted 8 August, 2017;
originally announced August 2017.
-
Quantifying homologous proteins and proteoforms
Authors:
Dmitry Malioutov,
Tianchi Chen,
Jacob Jaffe,
Edoardo Airoldi,
Steven Carr,
Bogdan Budnik,
Nikolai Slavov
Abstract:
Many proteoforms - arising from alternative splicing, post-translational modifications (PTMs), or paralogous genes - have distinct biological functions, such as histone PTM proteoforms. However, their quantification by existing bottom-up mass-spectrometry (MS) methods is undermined by peptide-specific biases. To avoid these biases, we developed and implemented a first-principles model (HIquant) for quantifying proteoform stoichiometries. We characterized when MS data allow inferring proteoform stoichiometries by HIquant, derived an algorithm for optimal inference, and demonstrated experimentally high accuracy in quantifying fractional PTM occupancy without using external standards, even in the challenging case of the histone modification code.
HIquant server is implemented at: https://web.northeastern.edu/slavov/2014_HIquant/
Submitted 5 August, 2017;
originally announced August 2017.
-
Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016)
Authors:
Been Kim,
Dmitry M. Malioutov,
Kush R. Varshney
Abstract:
This is the Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), which was held in New York, NY, June 23, 2016.
Invited speakers were Susan Athey, Rich Caruana, Jacob Feldman, Percy Liang, and Hanna Wallach.
Submitted 27 July, 2016; v1 submitted 8 July, 2016;
originally announced July 2016.
-
Interpretable Two-level Boolean Rule Learning for Classification
Authors:
Guolong Su,
Dennis Wei,
Kush R. Varshney,
Dmitry M. Malioutov
Abstract:
As a contribution to interpretable machine learning research, we develop a novel optimization framework for learning accurate and sparse two-level Boolean rules. We consider rules in both conjunctive normal form (AND-of-ORs) and disjunctive normal form (OR-of-ANDs). A principled objective function is proposed to trade classification accuracy and interpretability, where we use Hamming loss to characterize accuracy and sparsity to characterize interpretability. We propose efficient procedures to optimize these objectives based on linear programming (LP) relaxation, block coordinate descent, and alternating minimization. Experiments show that our new algorithms provide very good tradeoffs between accuracy and interpretability.
Submitted 18 June, 2016;
originally announced June 2016.
-
Convex Total Least Squares
Authors:
Dmitry Malioutov,
Nikolai Slavov
Abstract:
We study the total least squares (TLS) problem that generalizes least squares regression by allowing measurement errors in both dependent and independent variables. TLS is widely used in applied fields including computer vision, system identification and econometrics. The special case when all dependent and independent variables have the same level of uncorrelated Gaussian noise, known as ordinary TLS, can be solved by singular value decomposition (SVD). However, SVD cannot solve many important practical TLS problems with realistic noise structure, such as having varying measurement noise, known structure on the errors, or large outliers requiring robust error-norms. To solve such problems, we develop convex relaxation approaches for a general class of structured TLS (STLS). We show, both theoretically and experimentally, that while the plain nuclear norm relaxation incurs large approximation errors for STLS, the re-weighted nuclear norm approach is very effective, and achieves better accuracy on challenging STLS problems than popular non-convex solvers. We describe a fast solution based on an augmented Lagrangian formulation, and apply our approach to an important class of biological problems that use population average measurements to infer cell-type and physiological-state specific expression levels that are very hard to measure directly.
Submitted 1 June, 2014;
originally announced June 2014.
-
Iterative Log Thresholding
Authors:
Dmitry Malioutov,
Aleksandr Aravkin
Abstract:
Sparse reconstruction approaches using the re-weighted l1-penalty have been shown, both empirically and theoretically, to provide a significant improvement in recovering sparse signals in comparison to the l1-relaxation. However, numerical optimization of such penalties involves solving problems with l1-norms in the objective many times. Using the direct link of reweighted l1-penalties to the concave log-regularizer for sparsity, we derive a simple prox-like algorithm for the log-regularized formulation. The proximal splitting step of the algorithm has a closed form solution, and we call the algorithm 'log-thresholding' in analogy to soft thresholding for the l1-penalty.
We establish convergence results, and demonstrate that log-thresholding provides more accurate sparse reconstructions compared to both soft and hard thresholding. Furthermore, the approach can be directly extended to optimization over matrices with a penalty for rank (i.e. the nuclear norm penalty and its re-weighted version), where we suggest a singular-value log-thresholding approach.
Submitted 5 December, 2013;
originally announced December 2013.
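To make the analogy between soft thresholding and log-thresholding in the abstract above more concrete, here is a minimal scalar sketch. It assumes the log-regularized subproblem has the form 0.5*(x - y)^2 + lam*log(|x| + delta); that specific form, the parameter names `lam` and `delta`, and the function names are assumptions for this example and may differ from the paper's formulation. The stationary point comes from solving the quadratic given by setting the derivative to zero, and the result is compared against x = 0 to pick the lower objective value.

```python
import math


def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5*(x - y)**2 + lam*abs(x)."""
    return math.copysign(max(abs(y) - lam, 0.0), y)


def log_threshold(y, lam, delta):
    """Minimizer of 0.5*(x - y)**2 + lam*log(abs(x) + delta).

    Assumed penalty form, for illustration only. Works on a = |y| and
    restores the sign at the end, since the objective is symmetric.
    """
    if lam < 0 or delta <= 0:
        raise ValueError("need lam >= 0 and delta > 0")
    a = abs(y)

    # Stationarity for x >= 0: (x - a) + lam / (x + delta) = 0, i.e.
    # x**2 + (delta - a)*x + (lam - a*delta) = 0.
    disc = (a + delta) ** 2 - 4.0 * lam
    if disc < 0:
        return 0.0  # no stationary point: the minimum is at zero
    root = 0.5 * ((a - delta) + math.sqrt(disc))  # larger root = local minimum
    if root <= 0:
        return 0.0

    def objective(x):
        return 0.5 * (x - a) ** 2 + lam * math.log(x + delta)

    # Keep the stationary point only if it beats x = 0.
    best = root if objective(root) <= objective(0.0) else 0.0
    return math.copysign(best, y)
```

For a large input such as `y = 3.0` with `lam = 0.5` and `delta = 0.5`, the log rule shrinks the value less than `soft_threshold(3.0, 0.5)` does, while small inputs are still set to zero.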