Showing 1–8 of 8 results for author: Sapir, M

Searching in archive cs.
  1. arXiv:2501.01836  [pdf, ps, other]

    cs.LG cs.AI

    Practical machine learning is learning on small samples

    Authors: Marina Sapir

    Abstract: Based on limited observations, machine learning discerns a dependence which is expected to hold in the future. What makes it possible? Statistical learning theory imagines an indefinitely increasing training sample to justify its approach. In reality, there is no infinite time or even infinite general population for learning. Here I argue that practical machine learning is based on an implicit assump…

    Submitted 3 January, 2025; originally announced January 2025.

  2. arXiv:2206.07586  [pdf, ps, other]

    cs.AI cs.LG cs.LO

    Theory of Machine Learning with Limited Data

    Authors: Marina Sapir

    Abstract: Application of machine learning may be understood as deriving new knowledge for practical use through explaining accumulated observations (the training set). Peirce used the term abduction for this kind of inference. Here I formalize the concept of abduction for real-valued hypotheses, and show that 14 of the most popular textbook ML learners (every learner I tested), covering classification, regressio…

    Submitted 1 January, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

  3. arXiv:2006.09500  [pdf, ps, other]

    cs.LG stat.ML

    Logic of Machine Learning

    Authors: Marina Sapir

    Abstract: The main question is: why and how can we ever predict based on a finite sample? The question is not answered by statistical learning theory. Here, I suggest that prediction requires belief in "predictability" of the underlying dependence, and learning involves search for a hypothesis where these beliefs are violated the least given the observations. The measure of these violations ("errors") for g…

    Submitted 27 January, 2022; v1 submitted 16 June, 2020; originally announced June 2020.

  4. arXiv:1807.10681  [pdf, ps, other]

    cs.LG stat.ML

    Learnable: Theory vs Applications

    Authors: Marina Sapir

    Abstract: Two different views of the machine learning problem, Applied learning (machine learning with business applications) and Agnostic PAC learning, are formalized and compared here. I show that, under some conditions, the theory of PAC Learnable provides a way to solve the Applied learning problem. However, the theory requires training sets so large that it would make the learning practically u…

    Submitted 27 July, 2018; originally announced July 2018.

    Comments: 10 pages

  5. arXiv:1706.08439  [pdf, ps, other]

    cs.AI

    Optimal choice: new machine learning problem and its solution

    Authors: Marina Sapir

    Abstract: The task of learning to pick a single preferred example out of a finite set of examples, an "optimal choice problem", is a supervised machine learning problem with complex, structured input. Problems of optimal choice often emerge in various practical applications. We formalize the problem, show that it does not satisfy the assumptions of statistical learning theory, yet it can be solved efficiently…

    Submitted 6 July, 2017; v1 submitted 26 June, 2017; originally announced June 2017.

  6. arXiv:1112.1966  [pdf, ps, other]

    cs.LG

    Bipartite ranking algorithm for classification and survival analysis

    Authors: Marina Sapir

    Abstract: Unsupervised aggregation of independently built univariate predictors is explored as an alternative regularization approach for noisy, sparse datasets. The bipartite ranking algorithm Smooth Rank, which implements this approach, is introduced. The advantages of this algorithm are demonstrated on two types of problems. First, Smooth Rank is applied to two-class problems from the bio-medical field, where ranking…

    Submitted 8 December, 2011; originally announced December 2011.

    Comments: arXiv admin note: substantial text overlap with arXiv:1108.2820

  7. arXiv:1109.5311  [pdf, other]

    cs.LG stat.ML

    Bias Plus Variance Decomposition for Survival Analysis Problems

    Authors: Marina Sapir

    Abstract: Bias-variance decomposition of the expected error, defined for regression and classification problems, is an important tool to study and compare different algorithms and to find the best areas for their application. Here the decomposition is introduced for the survival analysis problem. In our experiments, we study the bias-variance parts of the expected error for two algorithms: original Cox proportion…

    Submitted 24 September, 2011; originally announced September 2011.

  8. arXiv:1108.2820  [pdf, other]

    cs.LG stat.ML

    Ensemble Risk Modeling Method for Robust Learning on Scarce Data

    Authors: Marina Sapir

    Abstract: In medical risk modeling, typical data are "scarce": they have a relatively small number of training instances (N), censoring, and high dimensionality (M). We show that the problem may be effectively simplified by reducing it to bipartite ranking, and introduce a new bipartite ranking algorithm, Smooth Rank, for robust learning on scarce data. The algorithm is based on ensemble learning with unsupervi…

    Submitted 28 January, 2012; v1 submitted 13 August, 2011; originally announced August 2011.