Search | arXiv e-print repository

arXiv:2505.21937 [pdf, ps, other]

Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic Languages

Authors: Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik

Abstract: Translating multi-word expressions (MWEs) and idioms requires a deep understanding of the cultural nuances of both the source and target languages. This challenge is further amplified by the one-to-many nature of idiomatic translations, where a single source idiom can have multiple target-language equivalents depending on cultural references and contextual variations. Traditional static knowledge graphs (KGs) and prompt-based approaches struggle to capture these complex relationships, often leading to suboptimal translations. To address this, we propose IdiomCE, an adaptive graph neural network (GNN) based methodology that learns intricate mappings between idiomatic expressions, effectively generalizing to both seen and unseen nodes during training. Our proposed method enhances translation quality even in resource-constrained settings, facilitating improved idiomatic translation in smaller models. We evaluate our approach on multiple idiomatic translation datasets using reference-less metrics, demonstrating significant improvements in translating idioms from English to various Indian languages.

Submitted 27 May, 2025; originally announced May 2025.

Journal ref: ACL Findings 2025

arXiv:2505.21777 [pdf, other]

Memorization to Generalization: Emergence of Diffusion Models from Associative Memory

Authors: Bao Pham, Gabriel Raya, Matteo Negri, Mohammed J. Zaki, Luca Ambrogioni, Dmitry Krotov

Abstract: Hopfield networks are associative memory (AM) systems, designed for storing and retrieving patterns as local minima of an energy landscape. In the classical Hopfield model, an interesting phenomenon occurs when the amount of training data reaches its critical memory load: spurious states, or unintended stable points, emerge at the end of the retrieval dynamics, leading to incorrect recall. In this work, we examine diffusion models, commonly used in generative modeling, from the perspective of AMs. The training phase of a diffusion model is conceptualized as memory encoding (training data is stored in the memory). The generation phase is viewed as an attempt at memory retrieval. In the small data regime the diffusion model exhibits a strong memorization phase, where the network creates distinct basins of attraction around each sample in the training set, akin to the Hopfield model below the critical memory load. In the large data regime, a different phase appears where an increase in the size of the training set fosters the creation of new attractor states that correspond to manifolds of the generated samples. Spurious states appear at the boundary of this transition and correspond to emergent attractor states, which are absent in the training set, but, at the same time, have distinct basins of attraction around them.
Our findings provide a novel perspective on the memorization-generalization phenomenon in diffusion models via the lens of AMs, a theoretical prediction of the existence of spurious states, and empirical validation of this prediction in commonly used diffusion models.

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.15069 [pdf, ps, other]

In-Domain African Languages Translation Using LLMs and Multi-armed Bandits

Authors: Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik

Abstract: Neural Machine Translation (NMT) systems face significant challenges when working with low-resource languages, particularly in domain adaptation tasks. These difficulties arise due to limited training data and suboptimal model generalization. As a result, selecting an optimal model for translation is crucial for achieving strong performance on in-domain data, particularly in scenarios where fine-tuning is not feasible or practical. In this paper, we investigate strategies for selecting the most suitable NMT model for a given domain using bandit-based algorithms, including Upper Confidence Bound, Linear UCB, Neural Linear Bandit, and Thompson Sampling. Our method effectively addresses the resource constraints by facilitating optimal model selection with high confidence. We evaluate the approach across three African languages and domains, demonstrating its robustness and effectiveness in both scenarios where target data is available and where it is absent.

Submitted 20 May, 2025; originally announced May 2025.

Journal ref: AfricaNLP Workshop at ACL 2025

arXiv:2505.14629 [pdf, ps, other]

KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language Models

Authors: Fnu Mohbat, Mohammed J Zaki

Abstract: Recent advances in large language models (LLMs) and the abundance of food data have resulted in studies to improve food understanding using LLMs. Despite several recommendation systems utilizing LLMs and Knowledge Graphs (KGs), there has been limited research on integrating food-related KGs with LLMs. We introduce KERL, a unified system that leverages food KGs and LLMs to provide personalized food recommendations and generate recipes with associated micro-nutritional information. Given a natural language question, KERL extracts entities and retrieves subgraphs from the KG, which are then fed into the LLM as context to select the recipes that satisfy the constraints. Next, our system generates the cooking steps and nutritional information for each recipe. To evaluate our approach, we also develop a benchmark dataset by curating recipe-related questions, combined with constraints and personal preferences. Through extensive experiments, we show that our proposed KG-augmented LLM significantly outperforms existing approaches, offering a complete and coherent solution for food recommendation, recipe generation, and nutritional analysis. Our code and benchmark datasets are publicly available at https://github.com/mohbattharani/KERL.

Submitted 20 May, 2025; originally announced May 2025.

Comments: Accepted at ACL 2025

arXiv:2504.06036 [pdf, other]

Multi-Sense Embeddings for Language Models and Knowledge Distillation

Authors: Qitong Wang, Mohammed J. Zaki, Georgios Kollias, Vasileios Kalantzis

Abstract: Transformer-based large language models (LLMs) rely on contextual embeddings which generate different (continuous) representations for the same token depending on its surrounding context. Nonetheless, words and tokens typically have a limited number of senses (or meanings). We propose multi-sense embeddings as a drop-in replacement for each token in order to capture the range of their uses in a language. To construct a sense embedding dictionary, we apply a clustering algorithm to embeddings generated by an LLM and consider the cluster centers as representative sense embeddings. In addition, we propose a novel knowledge distillation method that leverages the sense dictionary to learn a smaller student model that mimics the senses from the much larger base LLM model, offering significant space and inference time savings, while maintaining competitive performance. Via thorough experiments on various benchmarks, we showcase the effectiveness of our sense embeddings and knowledge distillation approach. We share our code at https://github.com/Qitong-Wang/SenseDict

Submitted 8 April, 2025; originally announced April 2025.

Comments: 16 pages, 4 figures

arXiv:2503.14801 [pdf, other]

Towards Connected Smart Work Zones: Advancing Work Zone Management through Improved Connectivity

Authors: Mariam Nour, Mohamed H. Zaki, Mohamed Abdel-Aty

Abstract: Work zones play a key role in road and highway maintenance but can lead to significant risks to both drivers and workers. Smart Work Zones (SWZs) have emerged as a potential solution, offering decision-makers real-time insights into the status of the work zone. By utilizing work zone barrels equipped with sensors and communication nodes, SWZs facilitate collecting and transmitting critical data, including location, traffic density, flow patterns, and worker proximity alerts. In collaboration with the Florida Department of Transportation (FDOT), this study addresses work zone barrel connectivity requirements while considering a cost-effective, low-power, and low-maintenance solution. While the broader project aimed to create a complete SWZ system for the localization of work zone barrels, this paper proposes a novel relay node selection algorithm integrated with Bluetooth Low Energy (BLE) technology to enhance network performance. The proposed algorithm enhances the communication network performance by selecting specific nodes as relay points, avoiding message flooding in the network. It demonstrates an improvement in message delivery rates, achieving up to a 40% increase over existing methods while ensuring balanced load distribution among nodes. Moreover, it maintains an 80% message delivery rate while minimizing power consumption, outperforming other approaches.
This improvement in communication efficiency is critical, as it ensures the accurate transmission and delivery of vital work zone data, allowing for faster and more informed decisions to enhance work zone safety and management.

Submitted 18 March, 2025; originally announced March 2025.

arXiv:2501.15219 [pdf, other]

Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction

Authors: Kritarth Prasad, Mohammadi Zaki, Pratik Singh, Pankaj Wasnik

Abstract: Ensembling neural machine translation (NMT) models to produce higher-quality translations than the $L$ individual models has been extensively studied. Recent methods typically employ a candidate selection block (CSB) and an encoder-decoder fusion block (FB), requiring inference across all candidate models, leading to significant computational overhead, generally $Ω(L)$. This paper introduces SmartGen, a reinforcement learning (RL)-based strategy that improves the CSB by selecting a small, fixed number of candidates and identifying optimal groups to pass to the fusion block for each input sentence. Furthermore, previously, the CSB and FB were trained independently, leading to suboptimal NMT performance. Our DQN-based SmartGen addresses this by using feedback from the FB block as a reward during training. We also resolve a key issue in earlier methods, where candidates were passed to the FB without modification, by introducing a Competitive Correction Block (CCB). Finally, we validate our approach with extensive experiments on English-Hindi translation tasks in both directions.

Submitted 25 January, 2025; originally announced January 2025.

arXiv:2501.10385 [pdf]

Autonomous Microscopy Experiments through Large Language Model Agents

Authors: Indrajeet Mandal, Jitendra Soni, Mohd Zaki, Morten M. Smedskjaer, Katrin Wondraczek, Lothar Wondraczek, Nitya Nand Gosvami, N. M. Anoop Krishnan

Abstract: The emergence of large language models (LLMs) has accelerated the development of self-driving laboratories (SDLs) for materials research. Despite their transformative potential, current SDL implementations rely on rigid, predefined protocols that limit their adaptability to dynamic experimental scenarios across different labs. A significant challenge persists in measuring how effectively AI agents can replicate the adaptive decision-making and experimental intuition of expert scientists. Here, we introduce AILA (Artificially Intelligent Lab Assistant), a framework that automates atomic force microscopy (AFM) through LLM-driven agents. Using AFM as an experimental testbed, we develop AFMBench, a comprehensive evaluation suite that challenges AI agents based on language models like GPT-4o and GPT-3.5 to perform tasks spanning the scientific workflow: from experimental design to results analysis. Our systematic assessment shows that state-of-the-art language models struggle even with basic tasks such as documentation retrieval, leading to a significant decline in performance in multi-agent coordination scenarios. Further, we observe that LLMs exhibit a tendency to not adhere to instructions or even divagate to additional tasks beyond the original request, raising serious concerns regarding safety alignment aspects of AI agents for SDLs. Finally, we demonstrate the application of AILA on increasingly complex, open-ended experiments: automated AFM calibration, high-resolution feature detection, and mechanical property measurement.
Our findings emphasize the necessity for stringent benchmarking protocols before deploying AI agents as laboratory assistants across scientific disciplines.

Submitted 18 December, 2024; originally announced January 2025.

arXiv:2412.20440 [pdf, other]

Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs

Authors: Pratik Rakesh Singh, Mohammadi Zaki, Pankaj Wasnik

Abstract: We address the challenging task of neural machine translation (NMT) in the entertainment domain, where the objective is to automatically translate a given dialogue from a source language content to a target language. This task has various applications, particularly in automatic dubbing, subtitling, and other content localization tasks, enabling source content to reach a wider audience. Traditional NMT systems typically translate individual sentences in isolation, without facilitating knowledge transfer of crucial elements such as the context and style from previously encountered sentences. In this work, we emphasize the significance of these fundamental aspects in producing pertinent and captivating translations. We demonstrate their significance through several examples and propose a novel framework for entertainment translation, which, to our knowledge, is the first of its kind. Furthermore, we introduce an algorithm to estimate the context and style of the current session and use these estimations to generate a prompt that guides a Large Language Model (LLM) to generate high-quality translations. Our method is both language and LLM-agnostic, making it a general-purpose tool. We demonstrate the effectiveness of our algorithm through various numerical studies and observe significant improvement in the COMET scores over various state-of-the-art LLMs. Moreover, our proposed method consistently outperforms baseline LLMs in terms of win-ratio.

Submitted 29 December, 2024; originally announced December 2024.

Comments: Accepted to AAAI'25

arXiv:2412.09560 [pdf, other]

Foundational Large Language Models for Materials Research

Authors: Vaibhav Mishra, Somaditya Singh, Dhruv Ahlawat, Mohd Zaki, Vaibhav Bihani, Hargun Singh Grover, Biswajit Mishra, Santiago Miret, Mausam, N. M. Anoop Krishnan

Abstract: Materials discovery and development are critical for addressing global challenges. Yet, the exponential growth in materials science literature comprising vast amounts of textual data has created significant bottlenecks in knowledge extraction, synthesis, and scientific reasoning. Large Language Models (LLMs) offer unprecedented opportunities to accelerate materials research through automated analysis and prediction. Still, their effective deployment requires domain-specific adaptation for understanding and solving domain-relevant tasks. Here, we present LLaMat, a family of foundational models for materials science developed through continued pretraining of LLaMA models on an extensive corpus of materials literature and crystallographic data. Through systematic evaluation, we demonstrate that LLaMat excels in materials-specific NLP and structured information extraction while maintaining general linguistic capabilities. The specialized LLaMat-CIF variant demonstrates unprecedented capabilities in crystal structure generation, predicting stable crystals with high coverage across the periodic table. Intriguingly, despite LLaMA-3's superior performance in comparison to LLaMA-2, we observe that LLaMat-2 demonstrates unexpectedly enhanced domain-specific performance across diverse materials science tasks, including structured information extraction from text and tables, more particularly in crystal structure generation, a potential adaptation rigidity in overtrained LLMs.
Altogether, the present work demonstrates the effectiveness of domain adaptation towards developing practically deployable LLM copilots for materials research. Beyond materials science, our findings reveal important considerations for domain adaptation of LLMs, such as model selection, training methodology, and domain-specific performance, which may influence the development of specialized scientific AI systems.

Submitted 28 January, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

arXiv:2411.15221 [pdf, other]

Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Authors: Yoel Zimmermann, Adib Bazgir, Zartashia Afzal, Fariha Agbere, Qianxiang Ai, Nawaf Alampara, Alexander Al-Feghali, Mehrad Ansari, Dmytro Antypov, Amro Aswad, Jiaru Bai, Viktoriia Baibakova, Devi Dutta Biswajeet, Erik Bitzek, Joshua D. Bocarsly, Anna Borisova, Andres M Bran, L. Catherine Brinson, Marcel Moran Calderon, Alessandro Canalicchio, Victor Chen, Yuan Chiang, Defne Circi, Benjamin Charmes, Vikrant Chaudhary, et al. (119 additional authors not shown)

Abstract: Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as brief papers in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapid prototyping of custom applications in scientific research.

Submitted 2 January, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

Comments: Updating author information, the submission remains largely unchanged. 98 pages total

arXiv:2411.05031 [pdf, other]

On-Device Emoji Classifier Trained with GPT-based Data Augmentation for a Mobile Keyboard

Authors: Hossam Amer, Joe Osborne, Michael Zaki, Mohamed Afify

Abstract: Emojis improve communication quality among smart-phone users that use mobile keyboards to exchange text. To predict emojis for users based on input text, we should consider the on-device low memory and time constraints, ensure that the on-device emoji classifier covers a wide range of emoji classes even though the emoji dataset is typically imbalanced, and adapt the emoji classifier output to user favorites. This paper proposes an on-device emoji classifier based on MobileBert with reasonable memory and latency requirements for SwiftKey. To account for the data imbalance, we utilize the widely used GPT to generate one or more tags for each emoji class. For each emoji and corresponding tags, we merge the original set with GPT-generated sentences and label them with this emoji without human intervention to alleviate the data imbalance. At inference time, we interpolate the emoji output with the user history for emojis for better emoji classifications. Results show that the proposed on-device emoji classifier deployed for SwiftKey increases the accuracy performance of emoji prediction particularly on rare emojis and emoji engagement.

Submitted 13 February, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

Comments: 8 pages

arXiv:2410.02024 [pdf, other]

FLAG: Financial Long Document Classification via AMR-based GNN

Authors: Bolun "Namir" Xia, Aparna Gupta, Mohammed J. Zaki

Abstract: The advent of large language models (LLMs) has initiated much research into their various financial applications. However, in applying LLMs on long documents, semantic relations are not explicitly incorporated, and a full or arbitrarily sparse attention operation is employed. In recent years, progress has been made in Abstract Meaning Representation (AMR), which is a graph-based representation of text to preserve its semantic relations. Since AMR can represent semantic relationships at a deeper level, it can be beneficially utilized by graph neural networks (GNNs) for constructing effective document-level graph representations built upon LLM embeddings to predict target metrics in the financial domain. We propose FLAG: Financial Long document classification via AMR-based GNN, an AMR graph based framework to generate document-level embeddings for long financial document classification. We construct document-level graphs from sentence-level AMR graphs, endow them with specialized LLM word embeddings in the financial domain, apply a deep learning mechanism that utilizes a GNN, and examine the efficacy of our AMR-based approach in predicting labeled target data from long financial documents. Extensive experiments are conducted on a dataset of quarterly earnings calls transcripts of companies in various sectors of the economy, as well as on a corpus of more recent earnings calls of companies in the S&P 1500 Composite Index.
We find that our AMR-based approach outperforms fine-tuning LLMs directly on text in predicting stock price movement trends at different time horizons in both datasets. Our work also outperforms previous work utilizing document graphs and GNNs for text classification.

Submitted 22 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

Comments: 8 pages, 3 figures, to be published in CIFEr Conference 2024 as "Semantic Graph Learning for Trend Prediction from Long Financial Documents"

arXiv:2410.00876 [pdf, other]

Replacing Paths with Connection-Biased Attention for Knowledge Graph Completion

Authors: Sharmishtha Dutta, Alex Gittens, Mohammed J. Zaki, Charu C. Aggarwal

Abstract: Knowledge graph (KG) completion aims to identify additional facts that can be inferred from the existing facts in the KG. Recent developments in this field have explored this task in the inductive setting, where at test time one sees entities that were not present during training; the most performant models in the inductive setting have employed path encoding modules in addition to standard subgraph encoding modules. This work similarly focuses on KG completion in the inductive setting, without the explicit use of path encodings, which can be time-consuming and introduce several hyperparameters that require costly hyperparameter optimization. Our approach uses a Transformer-based subgraph encoding module only; we introduce connection-biased attention and entity role embeddings into the subgraph encoding module to eliminate the need for an expensive and time-consuming path encoding module. Evaluations on standard inductive KG completion benchmark datasets demonstrate that our Connection-Biased Link Prediction (CBLiP) model has superior performance to models that do not use path information. Compared to models that utilize path information, CBLiP shows competitive or superior performance while being faster. Additionally, to show that the effectiveness of connection-biased attention and entity role embeddings also holds in the transductive setting, we compare CBLiP's performance on the relation prediction task in the transductive setting.

Submitted 8 April, 2025; v1 submitted 1 October, 2024; originally announced October 2024.

arXiv:2408.16889 [pdf, other]

doi 10.1145/3627673.3679562

LLaVA-Chef: A Multi-modal Generative Model for Food Recipes

Authors: Fnu Mohbat, Mohammed J. Zaki

Abstract: In the rapidly evolving landscape of online recipe sharing within a globalized context, there has been a notable surge in research towards comprehending and generating food recipes. Recent advancements in large language models (LLMs) like GPT-2 and LLaVA have paved the way for Natural Language Processing (NLP) approaches to delve deeper into various facets of food-related tasks, encompassing ingredient recognition and comprehensive recipe generation. Despite impressive performance and multi-modal adaptability of LLMs, domain-specific training remains paramount for their effective application. This work evaluates existing LLMs for recipe generation and proposes LLaVA-Chef, a novel model trained on a curated dataset of diverse recipe prompts in a multi-stage approach. First, we refine the mapping of visual food image embeddings to the language space. Second, we adapt LLaVA to the food domain by fine-tuning it on relevant recipe data. Third, we utilize diverse prompts to enhance the model's recipe comprehension. Finally, we improve the linguistic quality of generated recipes by penalizing the model with a custom loss function. LLaVA-Chef demonstrates impressive improvements over pretrained LLMs and prior works. A detailed qualitative analysis reveals that LLaVA-Chef generates more detailed recipes with precise ingredient mentions, compared to existing approaches.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2407.00520 [pdf, other]

Effects of Family Non-universal $Z^{\prime}$ Model in the angular observables of $B\to(ρ,a_{1})μ^{+}μ^{-}$ decays

Authors: Nimra Farooq, Marwah Zaki, M. Ali Paracha, Faisal Munir Bhutta

Abstract: We present the angular distribution of the four-fold $B\toρ(\toππ)μ^{+}μ^{-}$ and $B\to a_{1}(\toρ_{\parallel, \perp}π)μ^{+}μ^{-}$ decays both in the Standard Model and the family non-universal $Z^{\prime}$ model. At the quark level, these decays are governed by the $b\to dμ^{+}μ^{-}$ transition. Along with different angular observables, we also give predictions of differential branching ratios, forward-backward asymmetry, and the longitudinal polarization fractions of the $ρ$ and $a_{1}$ mesons. Our analysis shows that the signatures of the family non-universal $Z^{\prime}$ model are more distinct in the observables associated with the $B\toρ(\toππ)μ^{+}μ^{-}$ decay than in those of the $B\to a_{1}(\toρ_{\parallel, \perp}π)μ^{+}μ^{-}$ decay. Future measurements of the predicted angular observables, both at current and future high energy colliders, will add to the useful complementary data required to clarify the structure of the family non-universal $Z^{\prime}$ model in $|Δb|=|Δd|=1$ processes.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 39 pages, 6 figures, 38 tables; version to be published in Chinese Physics C

arXiv:2406.08530 [pdf, other]

Validating Temporal Compliance Patterns: A Unified Approach with $MTL_f$ over various Data Models

Authors: Nesma M. Zaki, Iman M. A. Helal, Ehab E. Hassanein, Ahmed Awad

Abstract: Process mining extracts valuable insights from event data to help organizations improve their business processes, which is essential for their growth and success. By leveraging process mining techniques, organizations gain a comprehensive understanding of their processes' execution, enabling the discovery of process models, detection of deviations, identification of bottlenecks, and assessment of performance. Compliance checking, a specific area within conformance checking, ensures that the organizational activities adhere to prescribed process models and regulations. Linear Temporal Logic over finite traces ($LTL_{f}$) is commonly used for conformance checking, but it may not capture all temporal aspects accurately. This paper proposes Metric Temporal Logic over finite traces ($MTL_{f}$) to define explicit time-related constraints effectively in addition to the implicit time-ordering covered by $LTL_f$. Therefore, it provides a universal formal approach to capture compliance rules. Moreover, we define a minimal set of generic $MTL_f$ formulas and show that they are capable of capturing all the common patterns for compliance rules. As compliance validation is largely driven by the data model used to represent the event logs, we provide a mapping from $MTL_f$ to the common data models we found in the literature to encode event logs, namely, the relational and the graph models. A comprehensive study comparing various data models and an empirical evaluation across real-life event logs demonstrates the effectiveness of the proposed approach.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.20587 [pdf, ps, other]

Quality-Aware Task Offloading for Cooperative Perception in Vehicular Edge Computing

Authors: Amr M. Zaki, Sara A. Elsayed, Khalid Elgazzar, Hossam S. Hassanein

Abstract: Task offloading in Vehicular Edge Computing (VEC) can advance cooperative perception (CP) to improve traffic awareness in Autonomous Vehicles. In this paper, we propose the Quality-aware Cooperative Perception Task Offloading (Q-CPTO) scheme. Q-CPTO is the first task offloading scheme that enhances traffic awareness by prioritizing the quality rather than the quantity of cooperative perception. Q-CPTO improves the quality of CP by curtailing perception redundancy and increasing the Value of Information (VOI) procured by each user. We use Kalman filters (KFs) for VOI assessment, predicting the next movement of each vehicle to estimate its region of interest. The estimated VOI is then integrated into the task offloading problem. We formulate the task offloading problem as an Integer Linear Program (ILP) that maximizes the VOI of users and reduces perception redundancy by leveraging the spatially diverse fields of view (FOVs) of vehicles, while adhering to strict latency requirements. We also propose the Q-CPTO-Heuristic (Q-CPTO-H) scheme to solve the task offloading problem in a time-efficient manner. Extensive evaluations show that Q-CPTO significantly outperforms prominent task offloading schemes by up to 14% and 20% in terms of response delay and traffic awareness, respectively. Furthermore, Q-CPTO-H closely approaches the optimal solution, with marginal gaps of up to 1.4% and 2.1% in terms of traffic awareness and the number of collaborating users, respectively, while reducing the runtime by up to 84%.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.07269 [pdf, other]

The comparative study of high efficiency of Tm^{3+}-doped fiber laser at 1.72 μm for different pump schemes

Authors: Mohamed Zaki, Mostafa Abouricha, Said Amrane

Abstract: In this study, we revealed the impact of the pumping scheme, fiber length, pumping power, and reflectivity of the output fiber Bragg grating on the performance of a Tm^{3+}-doped fiber laser (TDFL) operating at a wavelength of 1.72 μm. Using numerical simulations, we optimized the output power and reduced losses due to reabsorption, as well as amplified spontaneous emission (ASE) at approximately 1820 nm. The Tm^{3+}-doped fiber was bi-directionally pumped at 1570 nm to enhance the pump absorption. The simulations suggest that a maximum power of 5.96 W at 1.72 μm and a slope efficiency of 64% are achievable using a Tm^{3+}-doped silica fiber with a bi-directional pump of 4 W forward and 6 W backward.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2403.15469 [pdf, other]

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

Authors: Shivam Ratnakant Mhaskar, Nirmesh J. Shah, Mohammadi Zaki, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah

Abstract: A traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules, namely, Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS). Within AVD pipelines, isometric-NMT algorithms are employed to regulate the length of the synthesized output text. This is done to guarantee synchronization with respect to the alignment of video and audio subsequent to the dubbing process. Previous approaches have focused on aligning the number of characters and words in the source and target language texts of Machine Translation models. However, our approach aims to align the number of phonemes instead, as they are closely associated with speech duration. In this paper, we present the development of an isometric NMT system using Reinforcement Learning (RL), with a focus on optimizing the alignment of phoneme counts in the source and target language sentence pairs. To evaluate our models, we propose the Phoneme Count Compliance (PCC) score, which is a measure of length compliance. Our approach demonstrates a substantial improvement of approximately 36% in the PCC score compared to the state-of-the-art models when applied to English-Hindi language pairs. Moreover, we propose a student-teacher architecture within the framework of our RL approach to maintain a trade-off between the phoneme count and translation quality.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted in NAACL2024 Findings

arXiv:2402.06185 [pdf, other]

Development and validation of an artificial intelligence model to accurately predict spinopelvic parameters

Authors: Edward S. Harake, Joseph R. Linzey, Cheng Jiang, Rushikesh S. Joshi, Mark M. Zaki, Jaes C. Jones, Siri S. Khalsa, John H. Lee, Zachary Wilseck, Jacob R. Joseph, Todd C. Hollon, Paul Park

Abstract: Objective. Achieving appropriate spinopelvic alignment has been shown to be associated with improved clinical symptoms. However, measurement of spinopelvic radiographic parameters is time-intensive and interobserver reliability is a concern. Automated measurement tools have the promise of rapid and consistent measurements, but existing tools are still limited by some degree of manual user-entry requirements. This study presents a novel artificial intelligence (AI) tool called SpinePose that automatically predicts spinopelvic parameters with high accuracy without the need for manual entry. Methods. SpinePose was trained and validated on 761 sagittal whole-spine X-rays to predict sagittal vertical axis (SVA), pelvic tilt (PT), pelvic incidence (PI), sacral slope (SS), lumbar lordosis (LL), T1-pelvic angle (T1PA), and L1-pelvic angle (L1PA). A separate test set of 40 X-rays was labeled by 4 reviewers, including fellowship-trained spine surgeons and a fellowship-trained radiologist with neuroradiology subspecialty certification. Median errors relative to the most senior reviewer were calculated to determine model accuracy on test images. Intraclass correlation coefficients (ICC) were used to assess inter-rater reliability. Results. SpinePose exhibited the following median (interquartile range) parameter errors: SVA: 2.2(2.3)mm, p=0.93; PT: 1.3(1.2)°, p=0.48; SS: 1.7(2.2)°, p=0.64; PI: 2.2(2.1)°, p=0.24; LL: 2.6(4.0)°, p=0.89; T1PA: 1.1(0.9)°, p=0.42; and L1PA: 1.4(1.6)°, p=0.49. Model predictions also exhibited excellent reliability at all parameters (ICC: 0.91-1.0). Conclusions. SpinePose accurately predicted spinopelvic parameters with excellent reliability comparable to fellowship-trained spine surgeons and neuroradiologists. Utilization of predictive AI tools in spinal imaging can substantially aid in patient selection and surgical planning.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 10 pages, 5 figures, to appear in Journal of Neurosurgery: Spine

arXiv:2402.04538 [pdf, other]

Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers

Authors: Md Shamim Hussain, Mohammed J. Zaki, Dharmashankar Subramanian

Abstract: Graph transformers typically lack third-order interactions, limiting their geometric understanding which is crucial for tasks like molecular geometry prediction. We propose the Triplet Graph Transformer (TGT) that enables direct communication between pairs within a 3-tuple of nodes via novel triplet attention and aggregation mechanisms. TGT is applied to molecular property prediction by first predicting interatomic distances from 2D graphs and then using these distances for downstream tasks. A novel three-stage training procedure and stochastic inference further improve training efficiency and model performance. Our model achieves new state-of-the-art (SOTA) results on open challenge benchmarks PCQM4Mv2 and OC20 IS2RE. We also obtain SOTA results on QM9, MOLPCBA, and LIT-PCBA molecular property prediction benchmarks via transfer learning. We also demonstrate the generality of TGT with SOTA results on the traveling salesman problem (TSP).

Submitted 9 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: ICML'24 Accepted Version, 25 pages, 10 figures, 18 tables

arXiv:2310.08383 [pdf, other]

doi 10.1039/D4DD00032C

Reconstructing Materials Tetrahedron: Challenges in Materials Information Extraction

Authors: Kausik Hira, Mohd Zaki, Dhruvil Sheth, Mausam, N M Anoop Krishnan

Abstract: The discovery of new materials has a documented history of propelling human progress for centuries and more. The behaviour of a material is a function of its composition, structure, and properties, which further depend on its processing and testing conditions. Recent developments in deep learning and natural language processing have enabled information extraction at scale from published literature such as peer-reviewed publications, books, and patents. However, this information is spread across multiple formats, such as tables, text, and images, with little or no uniformity in reporting style, giving rise to several machine learning challenges. Here, we discuss, quantify, and document these challenges in automated information extraction (IE) from materials science literature towards the creation of a large materials science knowledge base. Specifically, we focus on IE from text and tables and outline several challenges with examples. We hope the present work inspires researchers to address the challenges in a coherent fashion, providing a fillip to IE towards developing a materials knowledge base.

Submitted 26 April, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

Journal ref: Digital Discovery, 2024, Advance Article

arXiv:2308.09115 [pdf]

MaScQA: A Question Answering Dataset for Investigating Materials Science Knowledge of Large Language Models

Authors: Mohd Zaki, Jayadeva, Mausam, N. M. Anoop Krishnan

Abstract: Information extraction and textual comprehension from materials literature are vital for developing an exhaustive knowledge base that enables accelerated materials discovery. Language models have demonstrated their capability to answer domain-specific questions and retrieve information from knowledge bases. However, there are no benchmark datasets in the materials domain that can evaluate the understanding of the key concepts by these language models. In this work, we curate a dataset of 650 challenging questions from the materials domain that require the knowledge and skills of a materials student who has cleared their undergraduate degree. We classify these questions based on their structure and the materials science domain-based subcategories. Further, we evaluate the performance of GPT-3.5 and GPT-4 models on solving these questions via zero-shot and chain of thought prompting. It is observed that GPT-4 gives the best performance (~62% accuracy) as compared to GPT-3.5. Interestingly, in contrast to the general observation, no significant improvement in accuracy is observed with the chain of thought prompting. To evaluate the limitations, we performed an error analysis, which revealed conceptual errors (~64%) as the major contributor compared to computational errors (~36%) towards the reduced performance of LLMs. We hope that the dataset and analysis performed in this work will promote further research in developing better materials science domain-specific LLMs and strategies for information extraction.

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2306.03209 [pdf, other]

End-to-end Differentiable Clustering with Associative Memories

Authors: Bishwajit Saha, Dmitry Krotov, Mohammed J. Zaki, Parikshit Ram

Abstract: Clustering is a widely used unsupervised learning technique involving an intensive discrete optimization problem. Associative Memory models or AMs are differentiable neural networks defining a recursive dynamical system, which have been integrated with various deep learning architectures. We uncover a novel connection between the AM dynamics and the inherent discrete assignment necessary in clustering to propose a novel unconstrained continuous relaxation of the discrete clustering problem, enabling end-to-end differentiable clustering with AM, dubbed ClAM. Leveraging the pattern completion ability of AMs, we further develop a novel self-supervised clustering loss. Our evaluations on varied datasets demonstrate that ClAM benefits from the self-supervision, and significantly improves upon both the traditional Lloyd's k-means algorithm, and more recent continuous clustering relaxations (by up to 60% in terms of the Silhouette Coefficient).

Submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted to ICML 2023

arXiv:2306.01705 [pdf, other]

doi 10.1145/3580305.3599520

The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles

Authors: Md Shamim Hussain, Mohammed J. Zaki, Dharmashankar Subramanian

Abstract: Transformers use the dense self-attention mechanism which gives a lot of flexibility for long-range connectivity. Over multiple layers of a deep transformer, the number of possible connectivity patterns increases exponentially. However, very few of these contribute to the performance of the network, and even fewer are essential. We hypothesize that there are sparsely connected sub-networks within a transformer, called information pathways, which can be trained independently. However, the dynamic (i.e., input-dependent) nature of these pathways makes it difficult to prune dense self-attention during training. But the overall distribution of these pathways is often predictable. We take advantage of this fact to propose Stochastically Subsampled self-Attention (SSA), a general-purpose training strategy for transformers that can reduce both the memory and computational cost of self-attention by 4 to 8 times during training while also serving as a regularization method, improving generalization over dense training. We show that an ensemble of sub-models can be formed from the subsampled pathways within a network, which can achieve better performance than its densely attended counterpart. We perform experiments on a variety of NLP, computer vision and graph learning tasks in both generative and discriminative settings to provide empirical evidence for our claims and show the effectiveness of the proposed method.

Submitted 2 June, 2023; originally announced June 2023.

Comments: KDD23 preprint, 12 pages, 7 figures, 10 tables

arXiv:2305.17219 [pdf]

GVdoc: Graph-based Visual Document Classification

Authors: Fnu Mohbat, Mohammed J. Zaki, Catherine Finegan-Dollak, Ashish Verma

Abstract: The robustness of a model for real-world deployment is decided by how well it performs on unseen data and distinguishes between in-domain and out-of-domain samples. Visual document classifiers have shown impressive performance on in-distribution test sets. However, they tend to have a hard time correctly classifying and differentiating out-of-distribution examples. Image-based classifiers lack the text component, whereas multi-modality transformer-based models face the token serialization problem in visual documents due to their diverse layouts. They also require a lot of computing power during inference, making them impractical for many real-world applications. We propose GVdoc, a graph-based document classification model that addresses both of these challenges. Our approach generates a document graph based on its layout, and then trains a graph neural network to learn node and graph embeddings. Through experiments, we show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data while retaining comparable performance on the in-distribution test set.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2303.01145 [pdf, other]

doi 10.1016/j.nuclphysb.2023.116236

Footprints of New Physics in the angular distribution of $B_{c}\to D_{s}^{\ast}(\to D_{s}γ,(D_{s}π))\ell^{+}\ell^{-}$ decays

Authors: Marwah Zaki, M. Ali Paracha, Faisal Munir Bhutta

Abstract: We investigate the angular decay distribution of the four-fold $B_{c}\to D^{\ast}_{s}(\to D_{s}γ)μ^{+}μ^{-}$ and $B_{c}\to D^{\ast}_{s}(\to D_{s}π)μ^{+}μ^{-}$ decays that proceed through the $b\to sμ^{+}μ^{-}$ quark level transition. We use the model independent effective Hamiltonian with vector and axial vector new physics operators to formulate the angular observables and study the implications of various recent new physics scenarios, taken from the global fits to all the $b\to s$ data, on these observables. We also give Standard Model and new physics predictions of several observables such as differential branching ratios, forward-backward asymmetry, longitudinal polarization fraction of $D_s^{\ast}$, and the unpolarized and polarized lepton flavor universality violating ratios. Future measurements of the predicted angular observables, both at current and future high energy colliders, will add to the useful complementary data required to clarify the structure of new physics in $b\to s\ell\ell$ neutral current decays.

Submitted 29 July, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 24 pages, 7 figures, 18 tables; version matching publication in NPB

arXiv:2302.07253 [pdf, other]

Energy Transformer

Authors: Benjamin Hoover, Yuchen Liang, Bao Pham, Rameswar Panda, Hendrik Strobelt, Duen Horng Chau, Mohammed J. Zaki, Dmitry Krotov

Abstract: Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory. Attention is the powerhouse driving modern deep learning successes, but it lacks clear theoretical foundations. Energy-based models allow a principled approach to discriminative and generative tasks, but the design of the energy functional is not straightforward. At the same time, Dense Associative Memory models or Modern Hopfield Networks have a well-established theoretical foundation, and allow an intuitive design of the energy function. We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection and graph classification tasks.

Submitted 31 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Journal ref: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2301.08073 [pdf]

Glass Hardness: Predicting Composition and Load Effects via Symbolic Reasoning-Informed Machine Learning

Authors: Sajid Mannan, Mohd Zaki, Suresh Bishnoi, Daniel R. Cassar, Jeanini Jiusti, Julio Cesar Ferreira Faria, Johan F. S. Christensen, Nitya Nand Gosvami, Morten M. Smedskjaer, Edgar Dutra Zanotto, N. M. Anoop Krishnan

Abstract: Glass hardness varies in a non-linear fashion with the chemical composition and applied load, a phenomenon known as the indentation size effect (ISE), which is challenging to predict quantitatively. Here, using a curated dataset of over approx. 3000 inorganic glasses from the literature comprising the composition, indentation load, and hardness, we develop machine learning (ML) models to predict the composition and load dependence of Vickers hardness. Interestingly, when tested on new glass compositions unseen during the training, the standard data-driven ML model failed to capture the ISE. To address this gap, we combined an empirical expression (Bernhardt law) to describe the ISE with ML to develop a framework that incorporates the symbolic law representing the domain reasoning in ML, namely Symbolic Reasoning-Informed ML Procedure (SRIMP). We show that the resulting SRIMP outperforms the data-driven ML model in predicting the ISE. Finally, we interpret the SRIMP model to understand the contribution of the glass network formers and modifiers toward composition and load-dependent (ISE) and load-independent hardness. The deconvolution of the hardness into load-dependent and load-independent terms paves the way toward a holistic understanding of composition and ISE in glasses, enabling the accelerated discovery of new glass compositions with targeted hardness.

Submitted 19 January, 2023; originally announced January 2023.

arXiv:2211.03223 [pdf]

Cementron: Machine Learning the Constituent Phases in Cement Clinker from Optical Images

Authors: Mohd Zaki, Siddhant Sharma, Sunil Kumar Gurjar, Raju Goyal, Jayadeva, N. M. Anoop Krishnan

Abstract: Cement is the most used construction material. The performance of cement hydrate depends on the constituent phases, viz. alite, belite, aluminate, and ferrites present in the cement clinker, both qualitatively and quantitatively. Traditionally, clinker phases are analyzed from optical images relying on a domain expert and simple image processing techniques. However, the non-uniformity of the images, variations in the geometry and size of the phases, and variabilities in the experimental approaches and imaging methods make it challenging to obtain the phases. Here, we present a machine learning (ML) approach to detect clinker microstructure phases automatically. To this end, we create the first annotated dataset of cement clinker by segmenting alite and belite particles. Further, we use supervised ML methods to train models for identifying alite and belite regions. Specifically, we fine-tune the image detection and segmentation model Detectron-2 on the cement microstructure to develop a model for detecting the cement phases, namely, Cementron. We demonstrate that Cementron, trained only on literature data, works remarkably well on new images obtained from our experiments, demonstrating its generalizability. We make Cementron available for public use.

Submitted 6 November, 2022; originally announced November 2022.

arXiv:2211.00691 [pdf]

Accelerated Design of Chalcogenide Glasses through Interpretable Machine Learning for Composition Property Relationships

Authors: Sayam Singla, Sajid Mannan, Mohd Zaki, N. M. Anoop Krishnan

Abstract: Chalcogenide glasses possess several outstanding properties that enable several groundbreaking applications, such as optical discs, infrared cameras, and thermal imaging systems. Despite the ubiquitous usage of these glasses, the composition-property relationships in these materials remain poorly understood. Here, we use a large experimental dataset comprising approximately 24000 glass compositions made of 51 distinct elements from the periodic table to develop machine learning models for predicting 12 properties, namely, annealing point, bulk modulus, density, Vickers hardness, Littleton point, Young's modulus, shear modulus, softening point, thermal expansion coefficient, glass transition temperature, liquidus temperature, and refractive index. These models, by far, are the largest for chalcogenide glasses. Further, we use SHAP, a game-theory-based algorithm, to interpret the output of machine learning algorithms by analyzing the contributions of each element towards the model's prediction of a property. This provides a powerful tool for experimentalists to interpret the model's prediction and hence design new glass compositions with targeted properties. Finally, using the models, we develop several glass selection charts that can potentially aid in the rational design of novel chalcogenide glasses for various applications.

Submitted 1 November, 2022; originally announced November 2022.

Comments: 17 pages, 8 figures

arXiv:2208.14376 [pdf, other]

Associative Learning for Network Embedding

Authors: Yuchen Liang, Dmitry Krotov, Mohammed J. Zaki

Abstract: The network embedding task is to represent the node in the network as a low-dimensional vector while incorporating the topological and structural information. Most existing approaches solve this problem by factorizing a proximity matrix, either directly or implicitly. In this work, we introduce a network embedding method from a new perspective, which leverages Modern Hopfield Networks (MHN) for associative learning. Our network learns associations between the content of each node and that node's neighbors. These associations serve as memories in the MHN. The recurrent dynamics of the network make it possible to recover the masked node, given that node's neighbors. Our proposed method is evaluated on different downstream tasks such as node classification and linkage prediction. The results show competitive performance compared to the common matrix factorization techniques and deep learning based methods.

Submitted 30 August, 2022; originally announced August 2022.

Comments: Accepted at the Eighth International Workshop on Deep Learning on Graphs: Methods and Applications (DLG-KDD 2022), Washington DC

arXiv:2207.09090 [pdf, other]

Actor-Critic based Improper Reinforcement Learning

Authors: Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

Abstract: We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a given target environment with relatively few trials. Towards this, we propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic (AC) based scheme and a Natural Actor-Critic (NAC) scheme depending on the available information. Both algorithms operate over a class of improper mixtures of the given controllers. For the first case, we derive convergence rate guarantees assuming access to a gradient oracle. For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case. Numerical results on (i) the standard control theoretic benchmark of stabilizing a cartpole; and (ii) a constrained queueing task show that our improper policy optimization algorithm can stabilize the system even when the base policies at its disposal are unstable.

Submitted 19 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2102.08201

arXiv:2207.05194 [pdf, other]

Towards Neural Numeric-To-Text Generation From Temporal Personal Health Data

Authors: Jonathan Harris, Mohammed J. Zaki

Abstract: With an increased interest in the production of personal health technologies designed to track user data (e.g., nutrient intake, step counts), there is now more opportunity than ever to surface meaningful behavioral insights to everyday users in the form of natural language. This knowledge can increase their behavioral awareness and allow them to take action to meet their health goals. It can also bridge the gap between the vast collection of personal health data and the summary generation required to describe an individual's behavioral tendencies. Previous work has focused on rule-based time-series data summarization methods designed to generate natural language summaries of interesting patterns found within temporal personal health data. We examine recurrent, convolutional, and Transformer-based encoder-decoder models to automatically generate natural language summaries from numeric temporal personal health data. We showcase the effectiveness of our models on real user health data logged in MyFitnessPal and show that we can automatically generate high-quality natural language summaries. Our work serves as a first step towards the ambitious goal of automatically generating novel and meaningful temporal summaries from personal health data.

Submitted 11 July, 2022; originally announced July 2022.

Comments: 5 pages, 2 figures, 1 table

arXiv:2207.01079 [pdf, other]

DiSCoMaT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles

Authors: Tanishq Gupta, Mohd Zaki, Devanshi Khatsuriya, Kausik Hira, N. M. Anoop Krishnan, Mausam

Abstract: A crucial component in the curation of a knowledge base (KB) for a scientific domain (e.g., materials science, foods & nutrition, fuels) is information extraction from tables in the domain's published research articles. To facilitate research in this direction, we define a novel NLP task of extracting compositions of materials (e.g., glasses) from tables in materials science papers. The task involves solving several challenges in concert: tables that mention compositions have highly varying structures; text in captions and the full paper needs to be incorporated along with data in tables; and regular languages for numbers, chemical compounds, and composition expressions must be integrated into the model. We release a training dataset comprising 4,408 distantly supervised tables, along with 1,475 manually annotated dev and test tables. We also present a strong baseline, DISCOMAT, that combines multiple graph neural networks with several task-specific regular expressions, features, and constraints. We show that DISCOMAT outperforms recent table processing architectures by significant margins.

Submitted 28 January, 2024; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: Accepted long paper at ACL 2023 (https://2023.aclweb.org/program/accepted_main_conference/)

arXiv:2206.09336 [pdf, other]

Efficient Checking of Timed Order Compliance Rules over Graph-encoded Event Logs

Authors: Nesma M. Zaki, Iman M. A. Helal, Ahmed Awad, Ehab E. Hassanein

Abstract: Validation of compliance rules against process data is a fundamental functionality for business process management. Over the years, the problem has been addressed for different types of process data, i.e., process models, process event data at runtime, and event logs representing historical execution. Several approaches have been proposed to tackle compliance checking over process logs. These approaches have been based on different data models and storage technologies including relational databases, graph databases, and proprietary formats. Graph-based encoding of event logs is a promising direction that turns several process analytics tasks into queries on the underlying graph. Compliance checking is one class of such analysis tasks. In this paper, we argue that encoding log data as graphs alone is not enough to guarantee efficient processing of queries on this data. Efficiency is important due to the interactive nature of compliance checking. Thus, compliance checking would benefit from sub-linear scanning of the data. Moreover, as more data are added, e.g., new batches of logs arrive, the data size should grow sub-linearly to optimize both the space of storage and time for querying. We propose two encoding methods using graph representation, realized in Neo4J, and show the benefits of these encodings on a special class of queries, namely timed order compliance rules. Compared to a baseline encoding, our experiments show up to a 5x speedup in the querying time as well as a 3x reduction in the graph size.

Submitted 19 June, 2022; originally announced June 2022.

Comments: 18 pages, 5 figures, 6 tables

MSC Class: 68

arXiv:2206.06952 [pdf, other]

FETILDA: An Effective Framework For Fin-tuned Embeddings For Long Financial Text Documents

Authors: Bolun "Namir" Xia, Vipula D. Rawte, Mohammed J. Zaki, Aparna Gupta

Abstract: Unstructured data, especially text, continues to grow rapidly in various domains. In particular, in the financial sphere, there is a wealth of accumulated unstructured financial data, such as the textual disclosure documents that companies submit on a regular basis to regulatory agencies, such as the Securities and Exchange Commission (SEC). These documents are typically very long and tend to contain valuable soft information about a company's performance. It is therefore of great interest to learn predictive models from these long textual documents, especially for forecasting numerical key performance indicators (KPIs). Whereas there has been great progress in pre-trained language models (LMs) that learn from tremendously large corpora of textual data, they still struggle in terms of effective representations for long documents. Our work fills this critical need, namely how to develop better models to extract useful information from long textual documents and learn effective features that can leverage the soft financial and risk information for text regression (prediction) tasks. In this paper, we propose and implement a deep learning framework that splits long documents into chunks and utilizes pre-trained LMs to process and aggregate the chunks into vector representations, followed by self-attention to extract valuable document-level features. We evaluate our model on a collection of 10-K public disclosure reports from US banks, and another dataset of reports submitted by US companies. Overall, our framework outperforms strong baseline methods for textual modeling as well as a baseline regression model using only numerical data. Our work provides better insights into how utilizing pre-trained domain-specific and fine-tuned long-input LMs in representing long documents can improve the quality of representation of textual data, and therefore, help in improving predictive analyses.

Submitted 14 June, 2022; originally announced June 2022.

Comments: 10 pages, 9 figures, 7 tables

ACM Class: I.2.7

arXiv:2111.07198 [pdf, ps, other]

Keyphrase Extraction Using Neighborhood Knowledge Based on Word Embeddings

Authors: Yuchen Liang, Mohammed J. Zaki

Abstract: Keyphrase extraction is the task of finding several interesting phrases in a text document, which provide a list of the main topics within the document. Most existing graph-based models use co-occurrence links as cohesion indicators to model the relationship of syntactic elements. However, a word may have different forms of expression within the document, and may have several synonyms as well. Simply using co-occurrence information cannot capture this information. In this paper, we enhance the graph-based ranking model by leveraging word embeddings as background knowledge to add semantic information to the inter-word graph. Our approach is evaluated on established benchmark datasets and empirical results show that the word embedding neighborhood information improves the model performance.

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2110.06208 [pdf, other]

Towards formalization and monitoring of microscopic traffic parameters using temporal logic

Authors: Mariam Nour, Mohamed H. Zaki

Abstract: Smart cities are revolutionizing the transportation infrastructure by the integration of technology. However, ensuring that various transportation system components are operating as expected and in a safe manner is a great challenge. In this work, we propose the use of formal methods as a means to specify and reason about the traffic network's complex properties. Formal methods provide a flexible tool to define the safe operation of the traffic network by capturing non-conforming behavior, exploring various possible states of the traffic scene, and detecting any inconsistencies within it. Hence, we develop specification-based monitoring for the analysis of traffic networks using the formal language, Signal Temporal Logic. We develop monitors that identify safety-related behavior such as conforming to speed limits and maintaining appropriate headway. The framework is tested using a calibrated micro-simulated highway scenario and offline specification-based monitoring is applied to individual vehicle trajectories to understand whether they violate or satisfy the defined safety specifications. Statistical analysis of the outputs shows that our approach can differentiate violating from conforming vehicle trajectories based on the defined specifications. This work can be utilized by traffic management centers to study the traffic stream properties, identify possible hazards, and provide valuable feedback for automating the traffic monitoring systems.

Submitted 12 October, 2021; originally announced October 2021.

arXiv:2109.15290 [pdf]

MatSciBERT: A Materials Domain Language Model for Text Mining and Information Extraction

Authors: Tanishq Gupta, Mohd Zaki, N. M. Anoop Krishnan, Mausam

Abstract: An overwhelmingly large amount of knowledge in the materials domain is generated and stored as text published in peer-reviewed scientific literature. Recent developments in natural language processing, such as bidirectional encoder representations from transformers (BERT) models, provide promising tools to extract information from these texts. However, direct application of these models in the materials domain may yield suboptimal results as the models themselves may not be trained on notations and jargon that are specific to the domain. Here, we present a materials-aware language model, namely, MatSciBERT, which is trained on a large corpus of scientific literature published in the materials domain. We further evaluate the performance of MatSciBERT on three downstream tasks, namely, abstract classification, named entity recognition, and relation extraction, on different materials datasets. We show that MatSciBERT outperforms SciBERT, a language model trained on a science corpus, on all the tasks. Further, we discuss some of the applications of MatSciBERT in the materials domain for extracting information, which can, in turn, contribute to materials discovery or optimization. Finally, to make the work accessible to the larger materials community, we make the pretrained and finetuned weights and the models of MatSciBERT freely accessible.

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2108.03348 [pdf, other]

doi 10.1145/3534678.3539296

Global Self-Attention as a Replacement for Graph Convolution

Authors: Md Shamim Hussain, Mohammed J. Zaki, Dharmashankar Subramanian

Abstract: We propose an extension to the transformer neural network architecture for general-purpose graph learning by adding a dedicated pathway for pairwise structural information, called edge channels. The resultant framework - which we call Edge-augmented Graph Transformer (EGT) - can directly accept, process and output structural information of arbitrary form, which is important for effective learning on graph-structured data. Our model exclusively uses global self-attention as an aggregation mechanism rather than static localized convolutional aggregation. This allows for unconstrained long-range dynamic interactions between nodes. Moreover, the edge channels allow the structural information to evolve from layer to layer, and prediction tasks on edges/links can be performed directly from the output embeddings of these channels. We verify the performance of EGT in a wide range of graph-learning experiments on benchmark datasets, in which it outperforms Convolutional/Message-Passing Graph Neural Networks. EGT sets a new state-of-the-art for the quantum-chemical regression task on the OGB-LSC PCQM4Mv2 dataset containing 3.8 million molecular graphs. Our findings indicate that global self-attention based aggregation can serve as a flexible, adaptive and effective replacement of graph convolution for general-purpose graph learning. Therefore, convolutional local neighborhood aggregation is not an essential inductive bias.

Submitted 3 June, 2022; v1 submitted 6 August, 2021; originally announced August 2021.

Comments: The accepted version in KDD '22

arXiv:2107.06369 [pdf, other]

Exploring DMD-type Algorithms for Modeling Signalised Intersections

Authors: Kazi Redwan Shabab, Shakib Mustavee, Shaurya Agarwal, Mohamed H. Zaki, Sajal Das

Abstract: This paper explores a novel data-driven approach based on recent developments in Koopman operator theory and dynamic mode decomposition (DMD) for modeling signalized intersections. Vehicular flow and queue formation at signalized intersections have complex nonlinear dynamics, making system identification, modeling, and controller design tasks challenging. We employ a Koopman theoretic approach to transform the original nonlinear dynamics into locally linear infinite-dimensional dynamics. The data-driven approach relies entirely on spatio-temporal snapshots of the traffic data. We investigate several key aspects of the approach and provide insights into the usage of DMD-type algorithms for application in adaptive signalized intersections. To demonstrate the utility of the obtained linearized dynamics, we perform prediction of the queue lengths at the intersection and compare the results with the state-of-the-art long short-term memory (LSTM) method. The case study involves the morning peak vehicle movements and queue lengths at two Orlando area signalized intersections. It is observed that DMD-based algorithms are able to capture complex dynamics with a linear approximation to a reasonable extent.

Submitted 13 July, 2021; originally announced July 2021.

Comments: 11 pages, 8 figures, Submitted to: Journal of Intelligent Transportation Systems

Report number: GITS-2021-0219

arXiv:2105.00210 [pdf, other]

Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling

Authors: Mohammani Zaki, Avi Mohan, Aditya Gopalan, Shie Mannor

Abstract: We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. Modern communication systems are becoming increasingly complex, and are required to handle multiple types of traffic with widely varying characteristics such as arrival rates and service times. This, coupled with the need for rapid network deployment, renders a bottom-up approach of first characterizing the traffic and then devising an appropriate scheduling protocol infeasible. In contrast, we formulate a top-down approach to scheduling where, given an unknown network and a set of scheduling policies, we use a policy-gradient-based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies. We derive convergence results and analyze finite time performance of the algorithm. Simulation results show that the algorithm performs well even when the arrival rates are nonstationary and can stabilize the system even when the constituent policies are unstable.

Submitted 1 May, 2021; originally announced May 2021.

Comments: 4 pages, 5 figures, RLNQ workshop at the SIGMETRICS 2021

arXiv:2103.12050 [pdf]

Revealing the Compositional Control of Electrical, Mechanical, Optical, and Physical Properties of Inorganic Glasses

Authors: R. Ravinder, Suresh Bishnoi, Mohd Zaki, N. M. Anoop Krishnan

Abstract: Inorganic glasses, produced by the melt-quenching of a concoction of minerals, compounds, and elements, can possess unique optical and elastic properties along with excellent chemical and thermal durability. Despite the ubiquitous use of glasses for critical applications such as touchscreen panels, windshields, bioactive implants, optical fibers and sensors, kitchen and laboratory glassware, thermal insulators, nuclear waste immobilization, optical lenses, and solid electrolytes, their composition-structure-property relationships remain poorly understood. Here, exploiting large-scale experimental data on inorganic glasses and explainable machine learning algorithms, we develop composition-property models for twenty-five properties, which are in agreement with experimental observations. These models are further interpreted using a game-theoretic concept, namely Shapley additive explanations, to understand the role of glass components in controlling the final property. The analysis reveals that the components present in the glass, such as network formers, modifiers, and the intermediates, play distinct roles in governing each of the optical, physical, electrical, and mechanical properties of glasses. Additionally, these components exhibit interdependence, the magnitude of which is different for different properties. While the physical origins of some of these interdependencies could be attributed to known phenomena such as "boron anomaly", "mixed modifier effect", and the "Loewenstein rule", the majority of the remaining ones require further experimental and computational analysis of the glass structure. Thus, our work paves the way for decoding the "glass genome", which can provide the recipe for discovery of novel glasses, while also shedding light on the fundamental factors governing the composition-structure-property relationships.

Submitted 23 March, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

arXiv:2103.03633 [pdf]

Unveiling the Glass Veil: Elucidating the Optical Properties in Glasses with Interpretable Machine Learning

Authors: Mohd Zaki, Vineeth Venugopal, R. Ravinder, Suresh Bishnoi, Sourabh Kumar Singh, Amarnath R. Allu, Jayadeva, N. M. Anoop Krishnan

Abstract: Due to their excellent optical properties, glasses are used for various applications ranging from smartphone screens to telescopes. Developing compositions with tailored Abbe number (Vd) and refractive index (nd), two crucial optical properties, is a major challenge. To this end, machine learning (ML) approaches have been successfully used to develop composition-property models. However, these models are essentially black-box in nature and suffer from the lack of interpretability. In this paper, we demonstrate the use of ML models to predict the composition-dependent variations of Vd and n at 587.6 nm (nd). Further, using SHapley Additive exPlanations (SHAP), we interpret the ML models to identify the contribution of each of the input components toward a target prediction. We observe that the glass formers such as SiO2, B2O3, and P2O5, and intermediates like TiO2, PbO, and Bi2O3 play a significant role in controlling the optical properties. Interestingly, components that contribute toward increasing the nd are found to decrease the Vd and vice-versa. Finally, we develop the Abbe diagram, also known as the "glass veil", using the ML models, allowing accelerated discovery of new glasses for optical properties beyond the experimental Pareto front. Overall, employing explainable ML, we discover the hidden compositional control on the optical properties of oxide glasses.

Submitted 5 March, 2021; originally announced March 2021.

Comments: 13 pages, 5 figures

arXiv:2102.08201 [pdf, other]

Improper Reinforcement Learning with Gradient-based Policy Optimization

Authors: Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

Abstract: We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a given target environment with relatively few trials. We propose a gradient-based approach that operates over a class of improper mixtures of the controllers. We derive convergence rate guarantees for the approach assuming access to a gradient oracle. The value function of the mixture and its gradient may not be available in closed-form; however, we show that we can employ rollouts and simultaneous perturbation stochastic approximation (SPSA) for explicit gradient descent optimization. Numerical results on (i) the standard control theoretic benchmark of stabilizing an inverted pendulum and (ii) a constrained queueing task show that our improper policy optimization algorithm can stabilize the system even when the base policies at its disposal are unstable.

Submitted 3 July, 2021; v1 submitted 16 February, 2021; originally announced February 2021.
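
The abstract above mentions simultaneous perturbation stochastic approximation (SPSA) as a way to run gradient descent when the gradient is not available in closed form. A minimal sketch of generic two-sided SPSA follows, assuming a deterministic toy objective in place of the rollout-based value estimates the paper describes. The names `spsa_gradient`, `spsa_minimize`, and `quadratic`, and the constant step sizes `a` and `c`, are illustrative assumptions and not the authors' algorithm.

```python
import random


def spsa_gradient(f, theta, c=0.1, rng=random):
    """Two-sided SPSA gradient estimate of f at theta using two evaluations."""
    # Rademacher perturbation: each coordinate is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = (f(plus) - f(minus)) / (2.0 * c)
    # Dividing by delta_i equals multiplying by it, since delta_i is +1 or -1.
    return [diff * d for d in delta]


def spsa_minimize(f, theta, steps=500, a=0.1, c=0.1, seed=0):
    """Plain gradient descent driven by SPSA estimates (constant step sizes)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(f, theta, c=c, rng=rng)
        theta = [t - a * g for t, g in zip(theta, grad)]
    return theta


def quadratic(x):
    """Hypothetical toy objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


if __name__ == "__main__":
    print(spsa_minimize(quadratic, [0.0, 0.0]))
```

Each estimate needs only two function evaluations regardless of dimension, which is why SPSA is attractive when every evaluation is an expensive rollout. With noisy evaluations, decaying step sizes are the usual choice; the constants here are kept fixed only because the toy objective is noise-free.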

arXiv:2102.05571 [pdf, other]

TINKER: A framework for Open source Cyberthreat Intelligence

Authors: Nidhi Rastogi, Sharmishtha Dutta, Mohammed J. Zaki, Alex Gittens, Charu Aggarwal

Abstract: Threat intelligence on malware attacks and campaigns is increasingly being shared with other security experts for a cost or for free. Other security analysts use this intelligence to inform them of indicators of compromise, attack techniques, and preventative actions. Security analysts prepare threat analysis reports after investigating an attack, an emerging cyber threat, or a recently discovered vulnerability. Collectively known as cyber threat intelligence (CTI), the reports are typically in an unstructured format and, therefore, challenging to integrate seamlessly into existing intrusion detection systems. This paper proposes a framework that uses the aggregated CTI for analysis and defense at scale. The information is extracted and stored in a structured format using knowledge graphs such that the semantics of the threat intelligence can be preserved and shared at scale with other security analysts. Specifically, we propose the first semi-supervised open-source knowledge graph-based framework, TINKER, to capture cyber threat information and its context. Following TINKER, we generate a Cyberthreat Intelligence Knowledge Graph (CTI-KG) and demonstrate its usage through different use cases.

Submitted 19 January, 2023; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: 9 pages

arXiv:2101.06887 [pdf, other]

Can a Fruit Fly Learn Word Embeddings?

Authors: Yuchen Liang, Chaitanya K. Ryali, Benjamin Hoover, Leopold Grinberg, Saket Navlakha, Mohammed J. Zaki, Dmitry Krotov

Abstract: The mushroom body of the fruit fly brain is one of the best-studied systems in neuroscience. At its core it consists of a population of Kenyon cells, which receive inputs from multiple sensory modalities. These cells are inhibited by the anterior paired lateral neuron, thus creating a sparse high-dimensional representation of the inputs. In this work we study a mathematical formalization of this network motif and apply it to learning the correlational structure between words and their context in a corpus of unstructured text, a common natural language processing (NLP) task. We show that this network can learn semantic representations of words and can generate both static and context-dependent word embeddings. Unlike conventional methods (e.g., BERT, GloVe) that use dense representations for word embedding, our algorithm encodes semantic meaning of words and their context in the form of sparse binary hash codes. The quality of the learned representations is evaluated on word similarity analysis, word-sense disambiguation, and document classification. It is shown that not only can the fruit fly network motif achieve performance comparable to existing methods in NLP, but, additionally, it uses only a fraction of the computational resources (shorter training time and smaller memory footprint).

Submitted 14 March, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: Accepted for publication at ICLR 2021
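
The abstract above describes inputs projected onto many cells, with inhibition leaving only a few active, which yields a sparse binary hash code. The sketch below illustrates that motif under a loud assumption: random, untrained projection weights stand in for the learned weights of the paper, and the inhibition is modeled as a simple k-winners-take-all step. The names `make_projection` and `sparse_binary_hash` are hypothetical.

```python
import random


def make_projection(num_inputs, num_cells, seed=0):
    """Random Gaussian weights, one row per cell (a stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_inputs)]
            for _ in range(num_cells)]


def sparse_binary_hash(x, weights, k):
    """Project x onto every cell, then keep only the k most active cells."""
    activations = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    # k-winners-take-all: a crude model of global inhibition across the cells.
    winners = sorted(range(len(activations)),
                     key=lambda i: activations[i], reverse=True)[:k]
    code = [0] * len(activations)
    for i in winners:
        code[i] = 1
    return code


if __name__ == "__main__":
    weights = make_projection(num_inputs=8, num_cells=40, seed=0)
    x = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
    code = sparse_binary_hash(x, weights, k=4)
    print(sum(code), len(code))
```

The code length (number of cells) is much larger than the input dimension while only `k` bits are set, so codes can be stored and compared cheaply as sets of active indices.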

arXiv:2101.01775 [pdf, other]

doi 10.1145/3437963.3441816

Personalized Food Recommendation as Constrained Question Answering over a Large-scale Food Knowledge Graph

Authors: Yu Chen, Ananya Subburathinam, Ching-Hua Chen, Mohammed J. Zaki

Abstract: Food recommendation has become an important means to help guide users to adopt healthy dietary habits. Previous works on food recommendation either i) fail to consider users' explicit requirements, ii) ignore crucial health factors (e.g., allergies and nutrition needs), or iii) do not utilize the rich food knowledge for recommending healthy recipes. To address these limitations, we propose a novel problem formulation for food recommendation, modeling this task as constrained question answering over a large-scale food knowledge base/graph (KBQA). Besides the requirements from the user query, personalized requirements from the user's dietary preferences and health guidelines are handled in a unified way as additional constraints to the QA system. To validate this idea, we create a QA style dataset for personalized food recommendation based on a large-scale food knowledge graph and health guidelines. Furthermore, we propose a KBQA-based personalized food recommendation framework which is equipped with novel techniques for handling negations and numerical comparisons in the queries. Experimental results on the benchmark show that our approach significantly outperforms non-personalized counterparts (average 59.7% absolute improvement across various evaluation metrics), and is able to recommend more relevant and healthier recipes.

Submitted 5 January, 2021; originally announced January 2021.

Comments: 9 pages. Accepted by WSDM 2021. Final version

Showing 1–50 of 85 results for author: Zaki, M