-
Parameter Learning in PRISM Programs with Continuous Random Variables
Authors:
Muhammad Asiful Islam,
C. R. Ramakrishnan,
I. V. Ramakrishnan
Abstract:
Probabilistic Logic Programming (PLP), exemplified by Sato and Kameya's PRISM, Poole's ICL, De Raedt et al.'s ProbLog and Vennekens et al.'s LPAD, combines statistical and logical knowledge representation and inference. Inference in these languages is based on enumerative construction of proofs over logic programs. Consequently, these languages permit very limited use of random variables with continuous distributions. In this paper, we extend PRISM with Gaussian random variables and linear equality constraints, and consider the problem of parameter learning in the extended language. Many statistical models, such as finite mixture models and Kalman filters, can be encoded in extended PRISM. Our EM-based learning algorithm uses a symbolic inference procedure that represents sets of derivations without enumeration. This permits us to learn the distribution parameters of extended PRISM programs with discrete as well as Gaussian variables. The learning algorithm naturally generalizes the ones used for PRISM and Hybrid Bayesian Networks.
Submitted 19 March, 2012;
originally announced March 2012.
-
Scaling Datalog for Machine Learning on Big Data
Authors:
Yingyi Bu,
Vinayak Borkar,
Michael J. Carey,
Joshua Rosen,
Neoklis Polyzotis,
Tyson Condie,
Markus Weimer,
Raghu Ramakrishnan
Abstract:
In this paper, we present the case for a declarative foundation for data-intensive machine learning systems. Instead of creating a new system for each specific flavor of machine learning task, or hardcoding new optimizations, we argue for the use of recursive queries to program a variety of machine learning systems. By taking this approach, database query optimization techniques can be utilized to identify effective execution plans, and the resulting runtime plans can be executed on a single unified data-parallel query processing engine. As a proof of concept, we consider two programming models from the machine learning domain, Pregel and Iterative Map-Reduce-Update, and show how they can be captured in Datalog, tuned for a specific task, and then compiled into an optimized physical plan. Experiments performed on a large computing cluster with real data demonstrate that this declarative approach can provide very good performance while offering both increased generality and programming ease.
Submitted 2 March, 2012; v1 submitted 1 March, 2012;
originally announced March 2012.
-
Inference in Probabilistic Logic Programs with Continuous Random Variables
Authors:
Muhammad Asiful Islam,
C. R. Ramakrishnan,
I. V. Ramakrishnan
Abstract:
Probabilistic Logic Programming (PLP), exemplified by Sato and Kameya's PRISM, Poole's ICL, De Raedt et al.'s ProbLog and Vennekens et al.'s LPAD, is aimed at combining statistical and logical knowledge representation and inference. A key characteristic of PLP frameworks is that they are conservative extensions to non-probabilistic logic programs, which have been widely used for knowledge representation. PLP frameworks extend traditional logic programming semantics to a distribution semantics, where the semantics of a probabilistic logic program is given in terms of a distribution over possible models of the program. However, the inference techniques used in these works rely on enumerating sets of explanations for a query answer. Consequently, these languages permit very limited use of random variables with continuous distributions. In this paper, we present a symbolic inference procedure that uses constraints and represents sets of explanations without enumeration. This permits us to reason over PLPs with Gaussian or Gamma-distributed random variables (in addition to discrete-valued random variables) and linear equality constraints over reals. We develop the inference procedure in the context of PRISM; however, the procedure's core ideas can be easily applied to other PLP languages as well. An interesting aspect of our inference procedure is that PRISM's query evaluation process becomes a special case in the absence of any continuous random variables in the program. The symbolic inference procedure enables us to reason over complex probabilistic models, such as Kalman filters and a large subclass of Hybrid Bayesian networks, that were hitherto not possible in PLP frameworks. (To appear in Theory and Practice of Logic Programming).
Submitted 7 October, 2012; v1 submitted 12 December, 2011;
originally announced December 2011.
-
Spin(7) instantons and the Hodge Conjecture for certain abelian four-folds: a modest proposal
Authors:
Ramadas T. Ramakrishnan
Abstract:
The Hodge Conjecture is equivalent to a statement about conditions under which a complex vector bundle on a smooth complex projective variety admits a holomorphic structure. I advertise a class of abelian four-folds due to Mumford where this approach could be tested. I construct explicit smooth vector bundles (which can in fact be constructed in terms of smooth line bundles) whose Chern characters are given Hodge classes. An instanton connection on these vector bundles would endow them with a holomorphic structure and thus prove that these classes are algebraic. I use complex multiplication to exhibit Cayley cycles representing the given Hodge classes. I find alternate complex structures with respect to which the given bundles are holomorphic, and close with a suggestion (due to G. Tian) as to how this may possibly be put to use.
Submitted 23 September, 2008;
originally announced September 2008.
-
Computation of Maximal Resolution of Copy Number Variation on a Nanofluidic Device using Digital PCR
Authors:
Simant Dube,
Alain Mir,
Robert C. Jones,
Ramesh Ramakrishnan,
Gang Sun
Abstract:
Copy Number Variations (CNVs) of regions of the human genome are important in disease association studies. The digital array is a nanofluidic biochip which utilizes integrated channels and valves that partition mixtures of sample and reagents into 765 nanovolume reaction chambers. It was recently shown how one can perform statistical analysis of CNV in a DNA sample using the digital array. In particular, it was shown how one can accurately estimate the true concentration of the molecules in the DNA sample and then determine the ratios of different sequences along with statistical confidence intervals on these estimations. In this paper we compute the maximum number of copies that can be distinguished using the digital array, which gives its resolution in terms of its ability to determine CNV. We then demonstrate the usefulness of the mathematical analysis by applying it to an important real-world problem: determining the copy number of the X chromosome.
Submitted 27 October, 2008; v1 submitted 8 September, 2008;
originally announced September 2008.