-
Anopheles number prediction on environmental and climate variables using Lasso and stratified two levels cross validation
Abstract: This paper deals with predicting the number of Anopheles from environmental and climate variables. Variable selection is performed by an automatic machine learning method based on the Lasso and stratified two-level cross-validation. The selected variables are debiased, while the prediction is generated by a simple GLM (generalized linear model). Finally, the results prove to be qualitatively better, at…
Submitted 4 August, 2016; originally announced August 2016.
Comments: arXiv admin note: text overlap with arXiv:1606.07578, arXiv:1511.01284
-
Regression Trees and Random forest based feature selection for malaria risk exposure prediction
Abstract: This paper deals with predicting the number of Anopheles, the main vector of malaria risk, from environmental and climate variables. Variable selection is based on an automatic machine learning method using regression trees and random forests combined with stratified two-level cross-validation. The minimum threshold of variable importance is assessed using the quadratic distance of variables…
Submitted 24 June, 2016; originally announced June 2016.
-
Lasso based feature selection for malaria risk exposure prediction
Abstract: In the life sciences, experts generally use empirical knowledge to recode variables, choose interactions, and perform selection by a classical approach. The aim of this work is to develop an automatic learning algorithm for variable selection, which can show whether experts can be helped in their decisions or simply replaced by the machine, and which can improve their knowledge and results. The Lasso method can dete…
Submitted 4 November, 2015; originally announced November 2015.
Comments: in Petra Perner. Machine Learning and Data Mining in Pattern Recognition, Jul 2015, Hamburg, Germany. Ibai publishing, 2015, Machine Learning and Data Mining in Pattern Recognition (proceedings of 11th International Conference, MLDM 2015)
-
arXiv:1509.02873 [pdf, ps, other]
Sélection de variables par le GLM-Lasso pour la prédiction du risque palustre
Abstract: In this study, we propose an automatic learning method for variable selection based on the Lasso in an epidemiological context. One of the aims of this approach is to dispense with the pretreatment that experts in medicine and epidemiology apply to the collected data. This pretreatment consists of recoding some variables and choosing some interactions based on expertise. The proposed approach uses all available explanatory…
Submitted 9 September, 2015; originally announced September 2015.
Comments: in French
Journal ref: 47èmes Journées de Statistique de la SFdS, Jun 2015, Lille, France. 2015