-
EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta
Authors:
Raymond Bernard,
Shaina Raza,
Subhabrata Das,
Rahul Murugan
Abstract:
Despite the remarkable coherence of Large Language Models (LLMs), existing evaluation methods often suffer from fluency bias and rely heavily on multiple-choice formats, making it difficult to assess factual accuracy and complex reasoning effectively. LLMs thus frequently generate factually inaccurate responses, especially in complex reasoning tasks, highlighting two prominent challenges: (1) the inadequacy of existing methods to evaluate reasoning and factual accuracy effectively, and (2) the reliance on human evaluators for nuanced judgment, as illustrated by Williams and Huckle (2024)[1], who found manual grading indispensable despite automated grading advancements.
To address evaluation gaps in open-ended reasoning tasks, we introduce the EQUATOR Evaluator (Evaluation of Question Answering Thoroughness in Open-ended Reasoning). This framework combines deterministic scoring with a focus on factual accuracy and robust reasoning assessment. Using a vector database, EQUATOR pairs open-ended questions with human-evaluated answers, enabling more precise and scalable evaluations. In practice, EQUATOR significantly reduces reliance on human evaluators for scoring and improves scalability compared to Williams and Huckle's (2024)[1] methods.
Our results demonstrate that this framework significantly outperforms traditional multiple-choice evaluations while maintaining high accuracy standards. Additionally, we introduce an automated evaluation process leveraging smaller, locally hosted LLMs. We used LLaMA 3.2B, running on the Ollama binaries, to streamline our assessments. This work establishes a new paradigm for evaluating LLM performance, emphasizing factual accuracy and reasoning ability, and provides a robust methodological foundation for future research.
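The pairing of open-ended questions with stored human-evaluated answers can be illustrated with a minimal sketch. It assumes a toy bag-of-words vector in place of a real embedding model, a Python list scanned linearly in place of a vector database, and a normalized exact-match comparison in place of the LLM-based evaluator; `embed`, `cosine`, `nearest_reference`, `normalize`, and `score` are hypothetical names, not part of EQUATOR.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_reference(question: str, store: list) -> str:
    """Return the human-evaluated answer stored with the most similar question.

    `store` is a list of (question, reference_answer) pairs; a linear scan over
    a plain list stands in for a vector-database similarity search.
    """
    q = embed(question)
    _, answer = max(store, key=lambda pair: cosine(q, embed(pair[0])))
    return answer


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so trivial formatting differences are ignored."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, candidate: str, store: list) -> int:
    """Deterministic binary score: 1 if the candidate matches the reference, else 0.

    Normalized exact match is an illustrative substitute for the comparison
    EQUATOR performs with a locally hosted LLM.
    """
    reference = nearest_reference(question, store)
    return 1 if normalize(candidate) == normalize(reference) else 0
```

For example, with `store = [("How many legs does a spider have?", "Eight")]`, `score("How many legs does a spider have?", "eight", store)` returns 1. Because neither the retrieval nor the comparison involves sampling, repeated calls on the same inputs always return the same score.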
Submitted 30 December, 2024;
originally announced January 2025.
-
Enhanced Momentum with Momentum Transformers
Authors:
Max Mason,
Waasi A Jagirdar,
David Huang,
Rahul Murugan
Abstract:
The primary objective of this research is to build a Momentum Transformer that is expected to outperform benchmark time-series momentum and mean-reversion trading strategies. We extend the ideas introduced in the paper Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture to equities, as the original paper builds primarily on futures and equity indices. Unlike conventional Long Short-Term Memory (LSTM) models, which operate sequentially and are optimized for processing local patterns, an attention mechanism gives our architecture direct access to all prior time steps in the training window. This hybrid design, combining attention with an LSTM, enables the model to capture long-term dependencies, improve performance when transaction costs are accounted for, and adapt to evolving market conditions, such as those seen during the COVID-19 pandemic. We achieve an average return of 4.14%, similar to the original paper's results. Our average Sharpe ratio is lower, at 1.12, due to much higher volatility, which may be because stocks are inherently more volatile than futures and indices.
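The link between returns, volatility, and the Sharpe ratio reported above can be made concrete. The sketch below assumes daily returns, 252 trading days per year, and a zero risk-free rate by default; `annualized_sharpe` is a hypothetical helper, not code from this work. Stretching the deviations of a return series about its mean by a factor of two leaves the average return unchanged but halves the Sharpe ratio, which is the kind of effect attributed above to the higher volatility of individual stocks.

```python
import math
import statistics


def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic (here: daily) simple returns.

    Assumptions: a constant per-period risk-free rate (zero by default) and
    252 trading days per year. Requires at least two returns.
    """
    excess = [r - risk_free_daily for r in daily_returns]
    mean = statistics.mean(excess)
    vol = statistics.stdev(excess)  # sample standard deviation of excess returns
    # Mean scales with T and volatility with sqrt(T), so the ratio scales with sqrt(T).
    return mean / vol * math.sqrt(periods_per_year)
```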
Submitted 16 December, 2024;
originally announced December 2024.
-
Acute kidney injury prediction for non-critical care patients: a retrospective external and internal validation study
Authors:
Esra Adiyeke,
Yuanfang Ren,
Benjamin Shickel,
Matthew M. Ruppert,
Ziyuan Guan,
Sandra L. Kane-Gill,
Raghavan Murugan,
Nabihah Amatullah,
Britney A. Stottlemyer,
Tiffany L. Tran,
Dan Ricketts,
Christopher M Horvat,
Parisa Rashidi,
Azra Bihorac,
Tezcan Ozrazgat-Baslanti
Abstract:
Background: Acute kidney injury (AKI), the decline of kidney excretory function, occurs in up to 18% of hospitalized admissions. Progression of AKI may lead to irreversible kidney damage. Methods: This retrospective cohort study includes adult patients admitted to a non-intensive care unit at the University of Pittsburgh Medical Center (UPMC) (n = 46,815) and University of Florida Health (UFH) (n = 127,202). We developed and compared deep learning and conventional machine learning models to predict progression to Stage 2 or higher AKI within the next 48 hours. We trained local models for each site (UFH Model trained on UFH, UPMC Model trained on UPMC) and a separate model with a development cohort of patients from both sites (UFH-UPMC Model). We internally and externally validated the models on each site and performed subgroup analyses across sex and race. Results: Stage 2 or higher AKI occurred in 3% (n=3,257) and 8% (n=2,296) of UFH and UPMC patients, respectively. Area under the receiver operating characteristic curve (AUROC) values for the UFH test cohort ranged between 0.77 (UPMC Model) and 0.81 (UFH Model), while AUROC values ranged between 0.79 (UFH Model) and 0.83 (UPMC Model) for the UPMC test cohort. The UFH-UPMC Model achieved an AUROC of 0.81 (95% confidence interval [CI] [0.80, 0.83]) for UFH and 0.82 (95% CI [0.81, 0.84]) for UPMC test cohorts, and area under the precision-recall curve (AUPRC) values of 0.06 (95% CI [0.05, 0.06]) for UFH and 0.13 (95% CI [0.11, 0.15]) for UPMC test cohorts. Kinetic estimated glomerular filtration rate, nephrotoxic drug burden, and blood urea nitrogen remained the three most influential features across the models and health centers. Conclusion: Locally developed models displayed marginally reduced discrimination when tested on another institution, while the top set of influencing features remained the same across the models and sites.
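AUROC values such as those above have a direct probabilistic reading: the chance that a randomly chosen patient who progresses to Stage 2 or higher AKI receives a higher risk score than a randomly chosen patient who does not. A minimal sketch of that definition follows; `auroc` is a hypothetical helper, and the quadratic pairwise loop is chosen for clarity rather than for cohorts of this size.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    labels: 1 for the positive class (e.g. progression to Stage 2+ AKI), 0 otherwise.
    scores: predicted risk scores, one per label.
    Ties count as half a win; the result equals the normalized Mann-Whitney U statistic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```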
Submitted 6 February, 2024;
originally announced February 2024.
-
Machine learning techniques for the Schizophrenia diagnosis: A comprehensive review and future research directions
Authors:
Shradha Verma,
Tripti Goel,
M Tanveer,
Weiping Ding,
Rahul Sharma,
R Murugan
Abstract:
Schizophrenia (SCZ) is a brain disorder in which different people experience different symptoms, such as hallucinations, delusions, flat-talk, and disorganized thinking. In the long term, it can cause severe effects and reduce life expectancy by more than ten years. Therefore, early and accurate diagnosis of SCZ is essential, and modalities such as structural magnetic resonance imaging (sMRI), functional MRI (fMRI), diffusion tensor imaging (DTI), and electroencephalogram (EEG) help reveal the brain abnormalities of patients. Moreover, for accurate diagnosis of SCZ, researchers have used machine learning (ML) algorithms over the past decade to distinguish the brain patterns of healthy and SCZ brains using MRI and fMRI images. This paper seeks to acquaint SCZ researchers with ML and to discuss its recent applications to the field of SCZ study. It comprehensively reviews state-of-the-art techniques such as ML classifiers, artificial neural networks (ANNs), and deep learning (DL) models, along with their methodological fundamentals and applications in previous studies. The motivation of this paper is to identify research gaps that may lead to the development of a new model for accurate SCZ diagnosis. The paper concludes with the research findings, followed by the future scope that directly contributes to new research directions.
Submitted 16 January, 2023;
originally announced January 2023.
-
Lightweight 3D Convolutional Neural Network for Schizophrenia diagnosis using MRI Images and Ensemble Bagging Classifier
Authors:
P Supriya Patro,
Tripti Goel,
S A VaraPrasad,
M Tanveer,
R Murugan
Abstract:
Structural alterations in the brain during the early onset of schizophrenia (SCZ) have been thoroughly investigated with the development of neuroimaging methods. The objective of this paper is the efficient classification of subjects into two classes, cognitively normal (CN) and SCZ, using magnetic resonance imaging (MRI) images. This paper proposes a lightweight 3D convolutional neural network (CNN) based framework for SCZ diagnosis using MRI images. In the proposed model, a lightweight 3D CNN extracts both spatial and spectral features simultaneously from 3D volumetric MRI scans, and classification is performed by an ensemble bagging classifier. The ensemble bagging classifier helps prevent overfitting, reduces variance, and improves the model's accuracy. The proposed algorithm is tested on datasets taken from three open-source benchmark databases: MCICShare, COBRE, and fBRINPhase-II. These datasets underwent preprocessing to register all MRI images to a standard template and reduce artifacts. The model achieves the highest accuracy (92.22%), sensitivity (94.44%), specificity (90%), precision (90.43%), recall (94.44%), F1-score (92.39%), and G-mean (92.19%) compared with current state-of-the-art techniques. These performance metrics support the use of this model to assist clinicians in the automatic, accurate diagnosis of SCZ.
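The reported metrics are linked by standard identities: recall is the same quantity as sensitivity (hence both are 94.44%), the F1-score is the harmonic mean of precision and recall, and the G-mean is the geometric mean of sensitivity and specificity. For instance, sqrt(0.9444 × 0.90) ≈ 0.9219 and 2(0.9043)(0.9444)/(0.9043 + 0.9444) ≈ 0.9239, consistent with the reported G-mean and F1-score. The sketch below computes all of them from confusion-matrix counts with SCZ as the positive class; `classification_metrics` is a hypothetical helper, and it assumes no denominator is zero.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Binary classification metrics from confusion-matrix counts.

    tp/fn: SCZ subjects classified as SCZ / as CN.
    tn/fp: CN subjects classified as CN / as SCZ.
    """
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        # Harmonic mean of precision and recall.
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        # Geometric mean of sensitivity and specificity.
        "g_mean": math.sqrt(sensitivity * specificity),
    }
```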
Submitted 5 November, 2022;
originally announced November 2022.
-
Theory on the mechanisms of combinatorial binding of transcription factors with DNA
Authors:
R. Murugan
Abstract:
We develop a theoretical framework for the mechanism of combinatorial binding of transcription factors (TFs) to their specific binding sites on DNA. We consider three possible mechanisms, viz. the monomer, hetero-oligomer, and coordinated recruitment pathways. In the monomer pathway, combinatorial TFs search for their targets independently, and the protein-protein interactions among them are insignificant. In the hetero-oligomer pathway, the protein-protein interactions are strong enough that the hetero-oligomer complex of TFs searches for the cognate sites as a whole. In the recruitment pathway, the TF that arrives first recruits the adjacent TFs sequentially. The free energy released from the protein-protein interactions among TFs is in turn used to stabilize the TF-DNA complex. Such coordinated binding of TFs in fact emerges as the cooperative effect. The monomer and hetero-oligomer pathways are efficient only when few TFs are involved in the combinatorial regulation. Detailed random walk simulations suggest that, as the number of TFs in a combination increases, the search efficiency of TFs in these pathways decreases as a power law in the number of TFs. The power-law exponent associated with the monomer pathway seems to depend strongly on the number of TFs, the distance between the initial positions of the TFs and their specific binding sites, and the hop size associated with the dynamics of TFs on DNA.
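The monomer pathway lends itself to a toy simulation. The sketch below is not the authors' simulation: it assumes each TF performs an unbiased lattice random walk with a fixed hop size on a DNA segment with a reflecting far end, that a TF has found its site once it lands on or jumps past it, and that a combination of k independently searching TFs is complete only when the slowest one arrives. The mean completion time (whose inverse can serve as a search efficiency) therefore grows with k; fitting a power law to it is left out. All function and parameter names are hypothetical.

```python
import random


def first_passage_time(rng, distance, hop, length):
    """Steps for one TF starting `distance` sites from its target (site 0) to reach it.

    The TF moves +hop or -hop with equal probability; positions beyond `length`
    are reflected back, modelling a finite DNA segment. Requires 0 < distance <= length.
    """
    pos, steps = distance, 0
    while pos > 0:  # landing on or jumping past site 0 counts as finding the target
        pos += hop if rng.random() < 0.5 else -hop
        if pos > length:
            pos = 2 * length - pos  # reflecting boundary at the far end
        steps += 1
    return steps


def monomer_search_times(max_tfs, distance, hop=1, length=50, trials=200, seed=0):
    """Mean time until all k TFs have found their sites, for k = 1..max_tfs.

    Each trial reuses the same walkers for increasing k, so the running maximum
    of independent first-passage times gives the completion time for every k.
    """
    rng = random.Random(seed)
    totals = [0.0] * max_tfs
    for _ in range(trials):
        slowest = 0
        for k in range(max_tfs):
            slowest = max(slowest, first_passage_time(rng, distance, hop, length))
            totals[k] += slowest
    return [t / trials for t in totals]
```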
Submitted 20 October, 2016;
originally announced October 2016.
-
Theory on the mechanism of site-specific DNA-protein interactions in the presence of traps
Authors:
G. Niranjani,
R. Murugan
Abstract:
The speed of site-specific binding of transcription factor (TF) proteins to genomic DNA seems to be strongly retarded by randomly occurring sequence traps. Traps are DNA sequences that share significant similarity with the original specific binding sites. An intriguing question is how naturally occurring TFs and their specific binding sites are designed to manage the retarding effects of such randomly occurring traps. We develop a simple random walk model of the site-specific binding of TFs to genomic DNA in the presence of sequence traps. Our dynamical model predicts that (a) the retarding effects of traps are minimal when the traps are arranged around the specific binding site such that there is a negative correlation between the binding strength of TFs with traps and the distance of traps from the specific binding site, and (b) the retarding effects of sequence traps can be mitigated by the condensed conformational state of DNA. Our computational analysis of the distribution of sequence traps around the putative binding sites of various TFs in the mouse and human genomes agrees well with the theoretical predictions. We propose that the distribution of traps can be used as an additional metric to efficiently identify the specific binding sites of TFs on genomic DNA.
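Locating traps around a specific site can be sketched as a mismatch scan. The abstract defines traps only as sequences sharing significant similarity with the specific binding site, so this sketch assumes a Hamming-distance cutoff; `find_traps` is a hypothetical helper. Under the further assumption that fewer mismatches means stronger binding, the returned (offset, mismatches) pairs are the raw material for checking the predicted negative correlation between trap strength and distance from the site.

```python
def find_traps(sequence, site, max_mismatches):
    """Return (offset, mismatches) for every window resembling `site`.

    A window counts as a trap if it differs from the specific site at between
    1 and `max_mismatches` positions (Hamming distance). The offset is the
    window's start relative to the first exact occurrence of the site.
    """
    k = len(site)
    anchor = sequence.find(site)
    if anchor < 0:
        raise ValueError("specific binding site not found in sequence")
    traps = []
    for i in range(len(sequence) - k + 1):
        mismatches = sum(a != b for a, b in zip(sequence[i:i + k], site))
        if 0 < mismatches <= max_mismatches:  # exclude exact matches and dissimilar windows
            traps.append((i - anchor, mismatches))
    return traps
```

For example, `find_traps("TTGACGTTTTGACATTTT", "GACG", 1)` returns `[(8, 1)]`: the window `GACA`, eight bases downstream of the site, differs from it at a single position.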
Submitted 31 May, 2016;
originally announced May 2016.
-
Theory on the mechanism of DNA renaturation: Stochastic nucleation and zipping
Authors:
Gnanapragasam Niranjani,
Rajamanickam Murugan
Abstract:
Renaturation of complementary single strands of DNA is an important process that requires better understanding from the viewpoint of molecular biology and biological physics. Here we develop a stochastic dynamical model of DNA renaturation. According to our model, there are at least three steps in the renaturation process, viz. incorrect-contact formation, correct-contact formation and nucleation, and zipping. Most earlier two-state models combined nucleation with the incorrect-contact formation step. In our model, we suggest that it is more meaningful to combine nucleation with zipping, since nucleation is the initial step of zipping and nucleated and zipping molecules are indistinguishable. The incorrect-contact formation step is a purely three-dimensional, diffusion-controlled collision process, whereas nucleation involves several rounds of one-dimensional slithering of one single strand of DNA along the complementary strand while searching for the correct contact, after which nucleation is initiated. Upon nucleation, stochastic zipping follows to generate a fully renatured double-stranded DNA. It seems that the square-root dependence of the overall renaturation rate constant on the length of the reacting single strands originates mainly from geometric constraints in the diffusion-controlled incorrect-contact formation step. Further, the inverse scaling of the renaturation rate with the viscosity of the reaction medium also originates from the incorrect-contact formation step. On the other hand, the inverse scaling of the renaturation rate with sequence complexity originates from stochastic zipping, which involves several rounds of crossing over the free-energy barrier at the microscopic level.
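The multi-step picture can be illustrated with the simplest possible kinetics. Assuming, unlike the full model, that the three steps are sequential and irreversible with first-order rate constants, each step takes an exponentially distributed time and the mean renaturation time is the sum of the mean times of the individual steps, so the slowest step dominates the overall rate and its scaling. The function names and rate values below are hypothetical.

```python
import random


def mean_total_time(rates):
    """Mean completion time of sequential, irreversible first-order steps: sum of 1/k_i."""
    return sum(1.0 / k for k in rates)


def sample_total_time(rates, rng):
    """One stochastic realization: each step waits an exponential time with its own rate."""
    return sum(rng.expovariate(k) for k in rates)
```

With hypothetical rates `[2.0, 4.0, 0.5]`, `mean_total_time` gives 0.5 + 0.25 + 2.0 = 2.75, of which the slowest step contributes 2.0; averaging many calls to `sample_total_time` approaches the same value.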
Submitted 22 November, 2015; v1 submitted 11 October, 2015;
originally announced October 2015.
-
Theory on the mechanism of rapid binding of transcription factor proteins at specific-sites on DNA
Authors:
Rajamanickam Murugan
Abstract:
We develop revised theoretical ideas on the mechanism by which transcription factor proteins locate their specific binding sites on DNA faster than the three-dimensional (3D) diffusion-controlled rate limit. We demonstrate that the 3D-diffusion-controlled rate limit can be enhanced when the protein molecule reads several possible binding stretches of the template DNA via one-dimensional (1D) diffusion upon each 3D-diffusion-mediated collision or nonspecific binding event. The overall enhancement of the site-specific association rate is directly proportional to the maximum possible sliding length $L_A = \sqrt{6D_o/k_r}$ associated with the 1D diffusion of the protein molecule along DNA, where $D_o$ is the 1D-diffusion coefficient and $k_r$ is the dissociation rate constant associated with the nonspecific DNA-protein complex. Upon considering several possible mechanisms, we find that DNA-binding proteins can efficiently locate their cognate sites on DNA by switching cyclically among the fast-moving, slow-moving, and reading states of their DNA-binding domains. Irrespective of the type of mechanism, the overall rate enhancement factor asymptotically approaches a limiting value that is directly proportional to $L_A$ as the total length of DNA containing the cognate site increases. These results are consistent with in vitro experimental observations.
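The definition of the maximum sliding length translates directly into code. This is a minimal sketch assuming consistent units for D_o and k_r; `max_sliding_length` is a hypothetical name. Because L_A scales as the square root of D_o/k_r, a fourfold increase in D_o or a fourfold decrease in k_r only doubles L_A, and with it the limiting rate enhancement factor that the abstract states is proportional to L_A.

```python
import math


def max_sliding_length(d_o, k_r):
    """Maximum possible sliding length L_A = sqrt(6 * D_o / k_r).

    d_o: 1D diffusion coefficient D_o along DNA (length^2 / time).
    k_r: dissociation rate constant of the nonspecific DNA-protein complex (1 / time).
    Units must be consistent; the result has units of length.
    """
    if d_o < 0 or k_r <= 0:
        raise ValueError("need d_o >= 0 and k_r > 0")
    return math.sqrt(6.0 * d_o / k_r)
```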
Submitted 19 September, 2014; v1 submitted 3 July, 2014;
originally announced July 2014.
-
Theory on the Dynamics of Oscillatory Loops in the Transcription Factor Networks
Authors:
Rajamanickam Murugan
Abstract:
We develop a detailed theoretical framework for various types of transcription factor gene oscillators. We further demonstrate that one can build genetic oscillators that are tunable and robust against perturbations in the critical control parameters by coupling two or more independent Goodwin-Griffith oscillators through either -OR- or -AND- type logic. Most of the coupled oscillators constructed in the literature so far seem to be of -OR- type. When there are transient perturbations in one of the -OR- type coupled oscillators, the overall period of the system remains constant (period buffering), whereas with -AND- type coupling the overall period of the system moves towards that of the perturbed oscillator. Despite this period buffering, the amplitudes of oscillators coupled through -OR- type logic are more sensitive than those of -AND- type to perturbations in the parameters associated with the promoter-state dynamics. Further analysis shows that the period of -AND- type coupled dual-feedback oscillators can be tuned without compromising their amplitudes. Using these results, we derive basic design principles for robust and tunable synthetic gene oscillators that do not compromise on amplitude.
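The two coupling logics can be contrasted with a toy promoter model. The sketch assumes that each oscillator's repressor acts through a Hill-type repression function and that the two promoter states are independent, so -OR- coupling transcribes the shared output if either promoter is free and -AND- coupling only if both are; the Hill form, parameters, and names are illustrative assumptions, not the equations of the Goodwin-Griffith analysis. Holding one repressor high reduces the -OR- output to the other oscillator's term, while the -AND- output is suppressed by the perturbed input, which is one simple way to see why -OR- coupling can buffer the period against a perturbation in one oscillator.

```python
def hill_repression(z, K=1.0, n=4):
    """Fraction of time a promoter is free when repressed by z (Hill form)."""
    return 1.0 / (1.0 + (z / K) ** n)


def coupled_promoter_activity(z1, z2, logic, K=1.0, n=4):
    """Combine two repressor inputs with -OR- or -AND- type logic (toy model).

    OR:  the output gene is transcribed if either promoter is free: 1 - (1 - h1)(1 - h2)
    AND: transcription requires both promoters to be free:          h1 * h2
    Promoter states are assumed independent.
    """
    h1 = hill_repression(z1, K, n)
    h2 = hill_repression(z2, K, n)
    if logic == "OR":
        return 1.0 - (1.0 - h1) * (1.0 - h2)
    if logic == "AND":
        return h1 * h2
    raise ValueError("logic must be 'OR' or 'AND'")
```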
Submitted 15 May, 2014;
originally announced May 2014.