Search | arXiv e-print repository

ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization

Authors: Haosheng Gan, Berk Tinaz, Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi

Abstract: Current text-to-image (T2I) benchmarks evaluate models on rigid prompts, potentially underestimating true generative capabilities due to prompt sensitivity and creating biases that favor certain models while disadvantaging others. We introduce ConceptMix++, a framework that disentangles prompt phrasing from visual generation capabilities by applying iterative prompt optimization. Building on Conce… ▽ More Current text-to-image (T2I) benchmarks evaluate models on rigid prompts, potentially underestimating true generative capabilities due to prompt sensitivity and creating biases that favor certain models while disadvantaging others. We introduce ConceptMix++, a framework that disentangles prompt phrasing from visual generation capabilities by applying iterative prompt optimization. Building on ConceptMix, our approach incorporates a multimodal optimization pipeline that leverages vision-language model feedback to refine prompts systematically. Through extensive experiments across multiple diffusion models, we show that optimized prompts significantly improve compositional generation performance, revealing previously hidden model capabilities and enabling fairer comparisons across T2I models. Our analysis reveals that certain visual concepts -- such as spatial relationships and shapes -- benefit more from optimization than others, suggesting that existing benchmarks systematically underestimate model performance in these categories. Additionally, we find strong cross-model transferability of optimized prompts, indicating shared preferences for effective prompt phrasing across models. These findings demonstrate that rigid benchmarking approaches may significantly underrepresent true model capabilities, while our framework provides more accurate assessment and insights for future development. △ Less

Submitted 3 July, 2025; originally announced July 2025.

Comments: An earlier version appeared in the CVPR 2025 Workshop on Generative Models for Computer Vision

arXiv:2507.03209 [pdf, ps, other]

LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction

Authors: Asal Mehradfar, Mohammad Shahab Sepehri, Jose Miguel Hernandez-Lobato, Glen S. Kwon, Mahdi Soltanolkotabi, Salman Avestimehr, Morteza Rasoulianboroujeni

Abstract: The discovery of new ionizable lipids for efficient lipid nanoparticle (LNP)-mediated RNA delivery remains a critical bottleneck for RNA-based therapeutics development. Recent advances have highlighted the potential of machine learning (ML) to predict transfection efficiency from molecular structure, enabling high-throughput virtual screening and accelerating lead identification. However, existing… ▽ More The discovery of new ionizable lipids for efficient lipid nanoparticle (LNP)-mediated RNA delivery remains a critical bottleneck for RNA-based therapeutics development. Recent advances have highlighted the potential of machine learning (ML) to predict transfection efficiency from molecular structure, enabling high-throughput virtual screening and accelerating lead identification. However, existing approaches are hindered by inadequate data quality, ineffective feature representations, low predictive accuracy, and poor generalizability. Here, we present LANTERN (Lipid nANoparticle Transfection Efficiency pRedictioN), a robust ML framework for predicting transfection efficiency based on ionizable lipid representation. We benchmarked a diverse set of ML models against AGILE, a previously published model developed for transfection prediction. Our results show that combining simpler models with chemically informative features, particularly count-based Morgan fingerprints, outperforms more complex models that rely on internally learned embeddings, such as AGILE. We also show that a multi-layer perceptron trained on a combination of Morgan fingerprints and Expert descriptors achieved the highest performance ($\text{R}^2$ = 0.8161, r = 0.9053), significantly exceeding AGILE ($\text{R}^2$ = 0.2655, r = 0.5488). We show that the models in LANTERN consistently have strong performance across multiple evaluation metrics. Thus, LANTERN offers a robust benchmarking framework for LNP transfection prediction and serves as a valuable tool for accelerating lipid-based RNA delivery systems design. △ Less

Submitted 3 July, 2025; originally announced July 2025.

arXiv:2501.01010 [pdf, other]

CryptoMamba: Leveraging State Space Models for Accurate Bitcoin Price Prediction

Authors: Mohammad Shahab Sepehri, Asal Mehradfar, Mahdi Soltanolkotabi, Salman Avestimehr

Abstract: Predicting Bitcoin price remains a challenging problem due to the high volatility and complex non-linear dynamics of cryptocurrency markets. Traditional time-series models, such as ARIMA and GARCH, and recurrent neural networks, like LSTMs, have been widely applied to this task but struggle to capture the regime shifts and long-range dependencies inherent in the data. In this work, we propose Cryp… ▽ More Predicting Bitcoin price remains a challenging problem due to the high volatility and complex non-linear dynamics of cryptocurrency markets. Traditional time-series models, such as ARIMA and GARCH, and recurrent neural networks, like LSTMs, have been widely applied to this task but struggle to capture the regime shifts and long-range dependencies inherent in the data. In this work, we propose CryptoMamba, a novel Mamba-based State Space Model (SSM) architecture designed to effectively capture long-range dependencies in financial time-series data. Our experiments show that CryptoMamba not only provides more accurate predictions but also offers enhanced generalizability across different market conditions, surpassing the limitations of previous models. Coupled with trading algorithms for real-world scenarios, CryptoMamba demonstrates its practical utility by translating accurate forecasts into financial outcomes. Our findings signal a huge advantage for SSMs in stock and cryptocurrency price forecasting tasks. △ Less

Submitted 3 May, 2025; v1 submitted 1 January, 2025; originally announced January 2025.

Comments: Published in IEEE International Conference on Blockchain and Cryptocurrency (ICBC) 2025

arXiv:2409.15477 [pdf, other]

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

Authors: Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi, Mahdi Soltanolkotabi

Abstract: Multimodal Large Language Models (MLLMs) have tremendous potential to improve the accuracy, availability, and cost-effectiveness of healthcare by providing automated solutions or serving as aids to medical professionals. Despite promising first steps in developing medical MLLMs in the past few years, their capabilities and limitations are not well-understood. Recently, many benchmark datasets have… ▽ More Multimodal Large Language Models (MLLMs) have tremendous potential to improve the accuracy, availability, and cost-effectiveness of healthcare by providing automated solutions or serving as aids to medical professionals. Despite promising first steps in developing medical MLLMs in the past few years, their capabilities and limitations are not well-understood. Recently, many benchmark datasets have been proposed that test the general medical knowledge of such models across a variety of medical areas. However, the systematic failure modes and vulnerabilities of such models are severely underexplored with most medical benchmarks failing to expose the shortcomings of existing models in this safety-critical domain. In this paper, we introduce MediConfusion, a challenging medical Visual Question Answering (VQA) benchmark dataset, that probes the failure modes of medical MLLMs from a vision perspective. We reveal that state-of-the-art models are easily confused by image pairs that are otherwise visually dissimilar and clearly distinct for medical experts. Strikingly, all available models (open-source or proprietary) achieve performance below random guessing on MediConfusion, raising serious concerns about the reliability of existing medical MLLMs for healthcare deployment. We also extract common patterns of model failure that may help the design of a new generation of more trustworthy and reliable MLLMs in healthcare. △ Less

Submitted 20 May, 2025; v1 submitted 23 September, 2024; originally announced September 2024.

Comments: 24 Pages, 9 figures, The Thirteenth International Conference on Learning Representations (ICLR) 2025

arXiv:2403.17902 [pdf, other]

Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models

Authors: Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi

Abstract: The landscape of computational building blocks of efficient image restoration architectures is dominated by a combination of convolutional processing and various attention mechanisms. However, convolutional filters, while efficient, are inherently local and therefore struggle with modeling long-range dependencies in images. In contrast, attention excels at capturing global interactions between arb… ▽ More The landscape of computational building blocks of efficient image restoration architectures is dominated by a combination of convolutional processing and various attention mechanisms. However, convolutional filters, while efficient, are inherently local and therefore struggle with modeling long-range dependencies in images. In contrast, attention excels at capturing global interactions between arbitrary image regions, but suffers from a quadratic cost in image dimension. In this work, we propose Serpent, an efficient architecture for high-resolution image restoration that combines recent advances in state space models (SSMs) with multi-scale signal processing in its core computational block. SSMs, originally introduced for sequence modeling, can maintain a global receptive field with a favorable linear scaling in input size. We propose a novel hierarchical architecture inspired by traditional signal processing principles, that converts the input image into a collection of sequences and processes them in a multi-scale fashion. Our experimental results demonstrate that Serpent can achieve reconstruction quality on par with state-of-the-art techniques, while requiring orders of magnitude less compute (up to $150$ fold reduction in FLOPS) and a factor of up to $5\times$ less GPU memory while maintaining a compact model size. The efficiency gains achieved by Serpent are especially notable at high image resolutions. △ Less

Submitted 21 January, 2025; v1 submitted 26 March, 2024; originally announced March 2024.

ACM Class: I.4.4; I.4.5

arXiv:1805.09135 [pdf, ps, other]

Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples

Authors: Maryam Sepehri, Marco Frasca

Abstract: Public repositories for genome and proteome annotations, such as the Gene Ontology (GO), rarely stores negative annotations, i.e. proteins not possessing a given function. This leaves undefined or ill defined the set of negative examples, which is crucial for training the majority of machine learning methods inferring proteins functions. Automated techniques to choose reliable negative proteins ar… ▽ More Public repositories for genome and proteome annotations, such as the Gene Ontology (GO), rarely stores negative annotations, i.e. proteins not possessing a given function. This leaves undefined or ill defined the set of negative examples, which is crucial for training the majority of machine learning methods inferring proteins functions. Automated techniques to choose reliable negative proteins are thereby required to train accurate function prediction models. This study proposes the first extensive analysis of the temporal evolution of protein annotations in the GO repository. Novel annotations registered through the years have been analyzed to verify the presence of annotation patterns in the GO hierarchy. Our research supplied fundamental clues about proteins likely to be unreliable as negative examples, that we verified into a novel algorithm of our own construction, validated on two organisms in a genome wide fashion against approaches proposed to choose negative examples in the context of functional prediction. △ Less

Submitted 23 May, 2018; originally announced May 2018.

Comments: 13 pages, 5 figures, 1 table

Showing 1–6 of 6 results for author: Sepehri, M