Showing 1–10 of 10 results for author: Hassoun, S

  1. arXiv:2501.09274

    cs.LG cs.AI q-bio.QM

    Large Language Model is Secretly a Protein Sequence Optimizer

    Authors: Yinkai Wang, Jiaxing He, Yuanqi Du, Xiaohui Chen, Jianan Canal Li, Li-Ping Liu, Xiaolin Xu, Soha Hassoun

    Abstract: We consider the protein sequence engineering problem, which aims to find protein sequences with high fitness levels, starting from a given wild-type sequence. Directed evolution has been a dominating paradigm in this field which has an iterative process to generate variants and select via experimental feedback. We demonstrate large language models (LLMs), despite being trained on massive texts, ar…

    Submitted 17 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

    Comments: Preprint

  2. arXiv:2411.14464

    q-bio.QM cs.AI cs.LG q-bio.BM

    JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data

    Authors: Apurva Kalia, Yan Zhou Chen, Dilip Krishnan, Soha Hassoun

    Abstract: Motivation: A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra and in spectra-to-molecular fingerprint prediction (FP), annotation rates remain low. Results: We introduce in this paper a novel paradigm (JESTR) for annotation. Unlike prior approaches that explicitly construct molecul…

    Submitted 7 June, 2025; v1 submitted 17 November, 2024; originally announced November 2024.

    Comments: 10 pages, 10 figures, 4 tables

  3. arXiv:2410.23326

    q-bio.QM cs.LG

    MassSpecGym: A benchmark for the discovery and identification of molecules

    Authors: Roman Bushuiev, Anton Bushuiev, Niek F. de Jonge, Adamo Young, Fleming Kretschmer, Raman Samusevich, Janne Heirman, Fei Wang, Luke Zhang, Kai Dührkop, Marcus Ludwig, Nils A. Haupt, Apurva Kalia, Corinna Brungs, Robin Schmid, Russell Greiner, Bo Wang, David S. Wishart, Li-Ping Liu, Juho Rousu, Wout Bittremieux, Hannes Rost, Tytus D. Mak, Soha Hassoun, Florian Huber, et al. (5 additional authors not shown)

    Abstract: The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a resu…

    Submitted 14 February, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

  4. arXiv:2203.13783

    cs.LG cs.AI q-bio.BM

    Ensemble Spectral Prediction (ESP) Model for Metabolite Annotation

    Authors: Xinmeng Li, Hao Zhu, Li-ping Liu, Soha Hassoun

    Abstract: A key challenge in metabolomics is annotating measured spectra from a biological sample with chemical identities. Currently, only a small fraction of measurements can be assigned identities. Two complementary computational approaches have emerged to address the annotation problem: mapping candidate molecules to spectra, and mapping query spectra to molecular candidates. In essence, the candidate m…

    Submitted 25 March, 2022; originally announced March 2022.

  5. arXiv:2111.09467

    cs.LG q-bio.MN q-bio.QM

    CSI: Contrastive Data Stratification for Interaction Prediction and its Application to Compound-Protein Interaction Prediction

    Authors: Apurva Kalia, Dilip Krishnan, Soha Hassoun

    Abstract: Accurately predicting the likelihood of interaction between two objects (compound-protein sequence, user-item, author-paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partitio…

    Submitted 21 December, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: 11 pages, submitted to Bioinformatics

  6. arXiv:2109.14766

    q-bio.QM cs.IR cs.LG

    Boost-RS: Boosted Embeddings for Recommender Systems and its Application to Enzyme-Substrate Interaction Prediction

    Authors: Xinmeng Li, Li-ping Liu, Soha Hassoun

    Abstract: Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates continues to be largely unexplored and under documented. Recommender systems (RS), which are currently unexplored for the enzyme-substrate interaction prediction problem, can be utilized to provide enzyme recommendations for substrates, and vice versa. The performance of Collaborative-Filtering (CF) recommend…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: 9 pages; 2 figures

  7. ASAP-SML: An Antibody Sequence Analysis Pipeline Using Statistical Testing and Machine Learning

    Authors: Xinmeng Li, James A. Van Deventer, Soha Hassoun

    Abstract: Antibodies are capable of potently and specifically binding individual antigens and, in some cases, disrupting their functions. The key challenge in generating antibody-based inhibitors is the lack of fundamental information relating sequences of antibodies to their unique properties as inhibitors. We develop a pipeline, Antibody Sequence Analysis Pipeline using Statistical testing and Machine Lea…

    Submitted 8 March, 2020; originally announced March 2020.

  8. arXiv:2002.07327

    q-bio.CB cs.LG

    Enzyme promiscuity prediction using hierarchy-informed multi-label classification

    Authors: Gian Marco Visani, Michael C. Hughes, Soha Hassoun

    Abstract: As experimental efforts are costly and time consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via the Enzyme Commission, EC, numbers, are likely to interact with a given query molecule. Our data consists of enzyme-substrate interactions from the…

    Submitted 25 January, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

    Comments: Presented as a poster at the 2019 Machine Learning for Computational Biology Symposium, Vancouver, CA. Accepted for publication in Bioinformatics, Jan 22, 2021.

  9. arXiv:2002.03410

    q-bio.MN

    Learning graph representations of biochemical networks and its application to enzymatic link prediction

    Authors: Julie Jiang, Li-Ping Liu, Soha Hassoun

    Abstract: The complete characterization of enzymatic activities between molecules remains incomplete, hindering biological engineering and limiting biological discovery. We develop in this work a technique, Enzymatic Link Prediction (ELP), for predicting the likelihood of an enzymatic transformation between two molecules. ELP models enzymatic reactions catalogued in the KEGG database as a graph. ELP is inno…

    Submitted 9 February, 2020; originally announced February 2020.

    Comments: 6 pages, 5 figures

  10. arXiv:1912.05753

    q-bio.QM cs.LG stat.ML

    Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics using Probabilistic Modeling

    Authors: Ramtin Hosseini, Neda Hassanpour, Li-Ping Liu, Soha Hassoun

    Abstract: Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a challenge. Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Unt…

    Submitted 9 March, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: For more details, please visit my homepage at: https://www.eecs.tufts.edu/~ramtin/