-
Risk Prediction of a Multiple Sclerosis Diagnosis
Authors:
Joyce C. Ho,
Joydeep Ghosh,
KP Unnikrishnan
Abstract:
Multiple sclerosis (MS) is a chronic autoimmune disease that affects the central nervous system. The progression and severity of MS vary by individual, but it is generally a disabling disease. Although medications have been developed to slow the disease progression and help manage symptoms, MS research has yet to result in a cure. Early diagnosis and treatment of the disease have been shown to be effective at slowing the development of disabilities. However, early MS diagnosis is difficult because symptoms are intermittent and shared with other diseases. Thus, most previous work has focused on uncovering the risk factors associated with MS and predicting the progression of disease after a diagnosis rather than on disease prediction. This paper investigates the use of data available in electronic medical records (EMRs) to create a risk prediction model, thereby helping clinicians perform the difficult task of diagnosing an MS patient. Our results demonstrate that even given a limited time window of patient data, one can achieve reasonable classification with an area under the receiver operating characteristic curve of 0.724. By restricting our features to common EMR components, the developed models also generalize to other healthcare systems.
Submitted 5 March, 2013;
originally announced March 2013.
-
Growth Patterns of US Children from 1963 to 2012
Authors:
Xiang Zhong,
Jingshan Li,
Goutham Rao,
KP Unnikrishnan
Abstract:
Anthropometric measurements such as weight, stature (height), and body mass index (BMI) provide reliable indicators of children's growth. The 2000 CDC growth charts are the national standards in the United States for these important measures. However, these growth charts were generated using data from 1963-1994. To understand the growth patterns of US children since 1994, we generate weight-for-age, stature-for-age and BMI-for-age percentile curves for both boys and girls aged 2-20 through the methods used to generate the 2000 CDC growth charts. Our datasets are from the National Health and Nutrition Examination Survey (NHANES) for years 1999-2010 and from NorthShore University HealthSystem's Enterprise Data Warehouse (NS-EDW) for years 2006-2012. The weight and BMI percentile curves generated from NS-EDW and NHANES data differ substantially from the CDC percentile curves, while those for stature do not differ substantially. We conclude that the population weight and BMI values of US children in recent years have increased significantly since 2000 and the 2000 CDC growth charts may no longer be applicable to the current population of US children. Our charts poignantly reveal the increasing obesity of American children.
Submitted 4 March, 2013;
originally announced March 2013.
-
Statistical Inference of Functional Connectivity in Neuronal Networks using Frequent Episodes
Authors:
Casey Diekman,
Kohinoor Dasgupta,
Vijay Nair,
P. S. Sastry,
K. P. Unnikrishnan
Abstract:
Identifying the spatio-temporal network structure of brain activity from multi-neuronal data streams is one of the biggest challenges in neuroscience. Repeating patterns of precisely timed activity across a group of neurons are potentially indicative of a microcircuit in the underlying neural tissue. Frequent episode discovery, a temporal data mining framework, has recently been shown to be a computationally efficient method of counting the occurrences of such patterns. In this paper, we propose a framework to determine when the counts are statistically significant by modeling the counting process. Our model allows direct estimation of the strengths of functional connections between neurons with improved resolution over previously published methods. It can also be used to rank the patterns discovered in a network of neurons according to their strengths and begin to reconstruct the graph structure of the network that produced the spike data. We validate our methods on simulated data and present analysis of patterns discovered in data from cultures of cortical neurons.
Submitted 21 February, 2009;
originally announced February 2009.
-
Conditional probability based significance tests for sequential patterns in multi-neuronal spike trains
Authors:
P. S. Sastry,
K. P. Unnikrishnan
Abstract:
In this paper, we consider the problem of detecting statistically significant sequential patterns in multi-neuronal spike trains. These patterns are characterized by ordered sequences of spikes from different neurons with specific delays between spikes. We have previously proposed a data mining scheme to efficiently discover such patterns that are frequent in the sense that the count of non-overlapping occurrences of the pattern in the data stream is above a threshold. Here we propose a method to determine the statistical significance of these repeating patterns and to set the thresholds automatically. The novelty of our approach is that we use a compound null hypothesis that includes not only models of independent neurons but also models where neurons have weak dependencies. The strength of interaction among the neurons is represented in terms of certain pair-wise conditional probabilities. We specify our null hypothesis by putting an upper bound on all such conditional probabilities. We construct a probabilistic model that captures the counting process and use this to calculate the mean and variance of the count for any pattern. Using this, we derive a test of significance for rejecting such a null hypothesis. This also allows us to rank-order different significant patterns. We illustrate the effectiveness of our approach using spike trains generated from a non-homogeneous Poisson model with embedded dependencies.
Submitted 27 August, 2008; v1 submitted 26 August, 2008;
originally announced August 2008.