-
What we should learn from pandemic publishing
Authors:
Satyaki Sikdar,
Sara Venturini,
Marie-Laure Charpignon,
Sagar Kumar,
Francesco Rinaldi,
Francesco Tudisco,
Santo Fortunato,
Maimuna S. Majumder
Abstract:
Authors of COVID-19 papers produced during the pandemic were overwhelmingly not subject matter experts. Such a massive inflow of scholars from different expertise areas is both an asset and a potential problem. Domain-informed scientific collaboration is the key to preparing for future crises.
Submitted 24 September, 2024;
originally announced October 2024.
-
Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition
Authors:
F. O. de Franca,
M. Virgolin,
M. Kommenda,
M. S. Majumder,
M. Cranmer,
G. Espada,
L. Ingelse,
A. Fonseca,
M. Landajuela,
B. Petersen,
R. Glatt,
N. Mundhenk,
C. S. Lee,
J. D. Hochhalter,
D. L. Randall,
P. Kamienny,
H. Zhang,
G. Dick,
A. Simon,
B. Burlacu,
Jaan Kasak,
Meera Machado,
Casper Wilstrup,
W. G. La Cava
Abstract:
Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main attraction of this approach is that it returns an interpretable model that can be insightful to users. Historically, the majority of algorithms for symbolic regression have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead utilize approaches such as enumeration algorithms, mixed linear integer programming, neural networks, and Bayesian optimization. In order to assess how well these new approaches behave on a set of common challenges often faced in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference consisting of different synthetic and real-world datasets which were blind to entrants. For the real-world track, we assessed interpretability in a realistic way by using a domain expert to judge the trustworthiness of candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms and highlight possible improvements for future competitions.
Submitted 3 July, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
A Secure Back-up and Restore for Resource-Constrained IoT based on Nanotechnology
Authors:
Mesbah Uddin,
Md. Badruddoja Majumder,
Md. Sakib Hasan,
Garrett S. Rose
Abstract:
With the emergence of IoT (Internet of Things), huge amounts of sensitive data are being processed and transmitted every day in edge devices with little to no security. Due to their aggressive power management schemes, it is a common and necessary technique to make a back-up of their program states and other necessary data in a non-volatile memory (NVM) before going to sleep or low power mode. However, this memory is often left unprotected as adding robust security measures tends to be expensive for these resource-constrained systems. In this paper, we propose a lightweight security system for NVM during low power mode. This security architecture uses the memristor, an emerging nanoscale device which is used to build hardware security primitives like PUF (physical unclonable function) based encryption-decryption, true random number generators (TRNG), and memory integrity checking. A reliability enhancement technique for this PUF is also proposed which shows how this system would work even with less-than-100% reliable PUF responses. Together, with all these techniques, we have established a dual-layer security protocol (data encryption + integrity check) which provides reasonable security to an embedded processor while being very lightweight in terms of area, power, and computation time. A complete system design is demonstrated with 65 nm CMOS and emerging memristive technology. With this, we have provided a detailed and accurate estimation of resource overhead. Analysis of the security of the whole system is also provided.
Submitted 9 July, 2020;
originally announced July 2020.
-
Tracking COVID-19 using online search
Authors:
Vasileios Lampos,
Maimuna S. Majumder,
Elad Yom-Tov,
Michael Edelstein,
Simon Moura,
Yohhei Hamada,
Molebogeng X. Rangaka,
Rachel A. McKendry,
Ingemar J. Cox
Abstract:
Previous research has demonstrated that various properties of infectious diseases can be inferred from online search behaviour. In this work we use time series of online search query frequencies to gain insights about the prevalence of COVID-19 in multiple countries. We first develop unsupervised modelling techniques based on associated symptom categories identified by the United Kingdom's National Health Service and Public Health England. We then attempt to minimise an expected bias in these signals caused by public interest -- as opposed to infections -- using the proportion of news media coverage devoted to COVID-19 as a proxy indicator. Our analysis indicates that models based on online searches precede the reported confirmed cases and deaths by 16.7 (10.2 - 23.2) and 22.1 (17.4 - 26.9) days, respectively. We also investigate transfer learning techniques for mapping supervised models from countries where the spread of disease has progressed extensively to countries that are in earlier phases of their respective epidemic curves. Furthermore, we compare time series of online search activity against confirmed COVID-19 cases or deaths jointly across multiple countries, uncovering interesting querying patterns, including the finding that rarer symptoms are better predictors than common ones. Finally, we show that web searches improve the short-term forecasting accuracy of autoregressive models for COVID-19 deaths. Our work provides evidence that online search data can be used to develop complementary public health surveillance methods to help inform the COVID-19 response in conjunction with more established approaches.
Submitted 10 February, 2021; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Improving drug sensitivity predictions in precision medicine through active expert knowledge elicitation
Authors:
Iiris Sundin,
Tomi Peltola,
Muntasir Mamun Majumder,
Pedram Daee,
Marta Soare,
Homayun Afrabandpey,
Caroline Heckman,
Samuel Kaski,
Pekka Marttinen
Abstract:
Predicting the efficacy of a drug for a given individual, using high-dimensional genomic measurements, is at the core of precision medicine. However, identifying features on which to base the predictions remains a challenge, especially when the sample size is small. Incorporating expert knowledge offers a promising alternative to improve a prediction model, but collecting such knowledge is laborious for the expert if the number of candidate features is very large. We introduce a probabilistic model that can incorporate expert feedback about the impact of genomic measurements on the sensitivity of a cancer cell for a given drug. We also present two methods to intelligently collect this feedback from the expert, using experimental design and multi-armed bandit models. In a multiple myeloma blood cancer data set (n=51), expert knowledge decreased the prediction error by 8%. Furthermore, the intelligent approaches can be used to reduce the workload of feedback collection to less than 30% on average compared to a naive approach.
Submitted 9 May, 2017;
originally announced May 2017.
-
Guided Deep List: Automating the Generation of Epidemiological Line Lists from Open Sources
Authors:
Saurav Ghosh,
Prithwish Chakraborty,
Bryan L. Lewis,
Maimuna S. Majumder,
Emily Cohn,
John S. Brownstein,
Madhav V. Marathe,
Naren Ramakrishnan
Abstract:
Real-time monitoring and responses to emerging public health threats rely on the availability of timely surveillance data. During the early stages of an epidemic, the ready availability of line lists with detailed tabular information about laboratory-confirmed cases can assist epidemiologists in making reliable inferences and forecasts. Such inferences are crucial to understand the epidemiology of a specific disease early enough to stop or control the outbreak. However, construction of such line lists requires considerable human supervision and is therefore difficult to generate in real-time. In this paper, we motivate Guided Deep List, the first tool for building automated line lists (in near real-time) from open source reports of emerging disease outbreaks. Specifically, we focus on deriving epidemiological characteristics of an emerging disease and the affected population from reports of illness. Guided Deep List uses distributed vector representations (à la word2vec) to discover a set of indicators for each line list feature. This discovery of indicators is followed by the use of dependency-parsing-based techniques for final extraction in tabular form. We evaluate the performance of Guided Deep List against a human-annotated line list provided by HealthMap corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that Guided Deep List extracts line list features with increased accuracy compared to a baseline method. We further show how these automatically extracted line list features can be used for making epidemiological inferences, such as inferring demographics and symptoms-to-hospitalization period of affected individuals.
Submitted 21 February, 2017;
originally announced February 2017.
-
A Mobile Message Scheduling and Delivery System using m-Learning framework
Authors:
Moumita Majumder,
Sumit Dhar
Abstract:
Wireless data communications in the form of Short Message Service (SMS) and Wireless Access Protocols (WAP) browsers have gained global popularity, yet not much has been done to extend the usage of these devices in electronic learning (e-learning) and information sharing. This project explores the extension of e-learning into wireless/handheld (W/H) computing devices with the help of a mobile learning (m-learning) framework. This framework provides the requirements to develop an m-learning application that can be used to share academic and administrative information among people within the university campus. A prototype application has been developed to demonstrate the important functionality of the proposed system in a simulated environment. This system is supposed to work in both bulk SMS and interactive SMS delivery modes. Here we have combined both Short Message Service (SMS) and Wireless Access Protocols (WAP) browsers. SMS is used for short and timely information delivery, and WAP is used for detailed information delivery such as course content, training material, interactive evolution tests, etc. The push model is used for sending personalized multicasting messages to a group of mobile users with a common profile, thereby improving the effectiveness and usefulness of the content delivered. The pull mechanism, in turn, can be applied for sending information as SMS when requested by the end user in interactive SMS delivery mode. The main strength of the system is that the actual SMS delivery application can be hosted on a mobile device, which can operate even when the device is on the move.
Submitted 29 March, 2010;
originally announced March 2010.