Showing 1–50 of 114 results for author: Ghodsi, A

  1. arXiv:2505.23366  [pdf, ps, other]

    hep-th gr-qc

    On the spectra of holographic QFTs on constant curvature manifolds

    Authors: Ahmad Ghodsi, Elias Kiritsis, Parisa Mashayekhi, Francesco Nitti

    Abstract: We analyze linear fluctuations of five-dimensional Einstein-Dilaton theories dual to holographic quantum field theories defined on four-dimensional de Sitter and Anti-de Sitter space-times. We identify the physical propagating scalar and tensor degrees of freedom. For these, we write the linearized bulk field equations as eigenvalue equations. In the dual QFT, the eigenstates correspond to towers…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 97 pages

    Report number: CCTP-2025-4, ITCP-2025/4

  2. arXiv:2411.15684  [pdf, ps, other]

    q-bio.BM cs.LG

    Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing

    Authors: Zheng Ma, Zeping Mao, Ruixue Zhang, Jiazhen Chen, Lei Xin, Paul Shan, Ali Ghodsi, Ming Li

    Abstract: Data-Independent Acquisition (DIA) was introduced to improve sensitivity to cover all peptides in a range rather than only sampling high-intensity peaks as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is not very clear how useful DIA data is for de novo peptide sequencing as the DIA data are marred with coeluted peptides, high noises, and varying data quality. We present a ne…

    Submitted 12 June, 2025; v1 submitted 23 November, 2024; originally announced November 2024.

  3. arXiv:2409.14595  [pdf, other]

    cs.CL cs.LG

    EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models

    Authors: Hossein Rajabzadeh, Aref Jafari, Aman Sharma, Benyamin Jami, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

    Abstract: Large Language Models (LLMs), with their increasing depth and number of parameters, have demonstrated outstanding performance across a variety of natural language processing tasks. However, this growth in scale leads to increased computational demands, particularly during inference and fine-tuning. To address these challenges, we introduce EchoAtt, a novel framework aimed at optimizing transformer…

    Submitted 22 September, 2024; originally announced September 2024.

  4. arXiv:2409.02879  [pdf, other]

    hep-th

    On holographic confining QFTs on AdS

    Authors: Ahmad Ghodsi, Elias Kiritsis, Francesco Nitti

    Abstract: Holographic quantum field theories that confine in flat space are considered on a fixed AdS space. The space of holographic solutions for such theories is constructed and three types of regular solutions are found. Theories with two AdS boundaries provide interfaces between two confining theories. Theories with a single AdS boundary correspond to ground states of a single confining theory on AdS.…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 118 pages

    Report number: CCTP-2024-12, ITCP-2024/12

  5. arXiv:2407.01955  [pdf, other]

    cs.CL

    S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models

    Authors: Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, Tianshu Zhu, Haoli Bai, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

    Abstract: Deployment of autoregressive large language models (LLMs) is costly, and as these models increase in size, the associated costs will become even more considerable. Consequently, different methods have been proposed to accelerate the token generation process and reduce costs. Speculative decoding (SD) is among the most promising approaches to speed up the LLM decoding process by verifying multiple…

    Submitted 2 July, 2024; originally announced July 2024.

  6. arXiv:2404.08019  [pdf, other]

    q-bio.QM cs.LG physics.chem-ph

    Learning Chemotherapy Drug Action via Universal Physics-Informed Neural Networks

    Authors: Lena Podina, Ali Ghodsi, Mohammad Kohandel

    Abstract: Quantitative systems pharmacology (QSP) is widely used to assess drug effects and toxicity before the drug goes to clinical trial. However, significant manual distillation of the literature is needed in order to construct a QSP model. Parameters may need to be fit, and simplifying assumptions of the model need to be made. In this work, we apply Universal Physics-Informed Neural Networks (UPINNs) t…

    Submitted 10 April, 2024; originally announced April 2024.

  7. arXiv:2402.18508  [pdf, other]

    cs.LG

    Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling

    Authors: Mahdi Karami, Ali Ghodsi

    Abstract: In the rapidly evolving field of deep learning, the demand for models that are both expressive and computationally efficient has never been more critical. This paper introduces Orchid, a novel architecture designed to address the quadratic complexity of traditional attention mechanisms without compromising the ability to capture long-range dependencies and in-context learning. At the core of this…

    Submitted 24 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  8. arXiv:2402.10462  [pdf, other]

    cs.LG cs.CL

    QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning

    Authors: Hossein Rajabzadeh, Mojtaba Valipour, Tianshu Zhu, Marzieh Tahaei, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

    Abstract: Finetuning large language models requires huge GPU memory, restricting the choice to acquire larger models. While the quantized version of the Low-Rank Adaptation technique, named QLoRA, significantly alleviates this issue, finding the efficient LoRA rank is still challenging. Moreover, QLoRA is trained on a pre-defined rank and, therefore, cannot be reconfigured for its lower ranks without requir…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Best Paper Award AAAI EIW Workshop

  9. arXiv:2402.09603  [pdf, other]

    cs.LG cs.AI

    Scalable Graph Self-Supervised Learning

    Authors: Ali Saheb Pasand, Reza Moravej, Mahdi Biparva, Raika Karimi, Ali Ghodsi

    Abstract: In regularization Self-Supervised Learning (SSL) methods for graphs, computational complexity increases with the number of nodes in graphs and embedding dimensions. To mitigate the scalability of non-contrastive graph SSL, we propose a novel approach to reduce the cost of computing the covariance matrix for the pre-training loss function with volume-maximization terms. Our work focuses on reducing…

    Submitted 14 February, 2024; originally announced February 2024.

  10. arXiv:2402.09586  [pdf, other]

    cs.LG

    WERank: Towards Rank Degradation Prevention for Self-Supervised Learning Using Weight Regularization

    Authors: Ali Saheb Pasand, Reza Moravej, Mahdi Biparva, Ali Ghodsi

    Abstract: A common phenomenon confining the representation quality in Self-Supervised Learning (SSL) is dimensional collapse (also known as rank degeneration), where the learned representations are mapped to a low dimensional subspace of the representation space. The State-of-the-Art SSL methods have been shown to suffer from dimensional collapse and fall behind maintaining full rank. Recent approaches to prevent…

    Submitted 14 February, 2024; originally announced February 2024.

  11. arXiv:2309.08968  [pdf, other]

    cs.CL cs.LG

    Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference

    Authors: Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

    Abstract: Large language models (LLMs) have revolutionized natural language processing (NLP) by excelling at understanding and generating human-like text. However, their widespread deployment can be prohibitively expensive. SortedNet is a recent training technique for enabling dynamic inference by leveraging the modularity in networks and sorting sub-models based on computation/accuracy in a nested manner.…

    Submitted 8 February, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: Accepted to EACL 2024 - Findings

  12. Holographic CFTs on $AdS_d\times S^n$ and conformal defects

    Authors: Ahmad Ghodsi, Elias Kiritsis, Francesco Nitti

    Abstract: We consider ($d+n+1$)-dimensional solutions of Einstein gravity with constant negative curvature. Regular solutions of this type are expected to be dual to the ground states of ($d+n$)-dimensional holographic CFTs on $AdS_d\times S^n$. Their only dimensionless parameter is the ratio of radii of curvatures of $AdS_d$ and $S^n$. The same solutions may also be dual to $(d-1)$-dimensional conformal de…

    Submitted 22 November, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: 75 pages, 23 figures

    Report number: CCTP-2023-6, ITCP-2023/6

  13. arXiv:2309.00255  [pdf, other]

    cs.LG

    SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks

    Authors: Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Parsa Kavehzadeh, Marzieh Tahaei, Boxing Chen, Ali Ghodsi

    Abstract: Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous user/task-specific models. There are solutions in the literature to deal with single dynamic or many-in-one models instead of many individual networks; however, they suffer from significant drops in performance, lac…

    Submitted 1 June, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

  14. arXiv:2305.13395  [pdf, other]

    cs.CL

    BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance

    Authors: Karel D'Oosterlinck, François Remy, Johannes Deleu, Thomas Demeester, Chris Develder, Klim Zaporojets, Aneiss Ghodsi, Simon Ellershaw, Jack Collins, Christopher Potts

    Abstract: Timely and accurate extraction of Adverse Drug Events (ADE) from biomedical literature is paramount for public safety, but involves slow and costly manual labor. We set out to improve drug safety monitoring (pharmacovigilance, PV) through the use of Natural Language Processing (NLP). We introduce BioDEX, a large-scale resource for Biomedical adverse Drug Event Extraction, rooted in the historical…

    Submitted 20 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 28 pages. EMNLP Findings 2023

  15. arXiv:2304.11461  [pdf, other]

    cs.LG cs.CL cs.NE cs.SD eess.AS

    Recurrent Neural Networks and Long Short-Term Memory Networks: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi

    Abstract: This is a tutorial paper on Recurrent Neural Network (RNN), Long Short-Term Memory Network (LSTM), and their variants. We start with a dynamical system and backpropagation through time for RNN. Then, we discuss the problems of gradient vanishing and explosion in long-term dependencies. We explain close-to-identity weight matrix, long delays, leaky units, and echo state networks for solving this pr…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: To appear as a part of an upcoming textbook on deep learning

  16. arXiv:2301.12006  [pdf, other]

    cs.LG cs.CL cs.CV

    Improved knowledge distillation by utilizing backward pass knowledge in neural networks

    Authors: Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

    Abstract: Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) with usually significantly fewer parameters. KD tries to better match the output of the student model to that of the teacher model based on the knowledge extracted from the forward pass of the teacher network. Although c…

    Submitted 27 January, 2023; originally announced January 2023.

  17. arXiv:2212.05998  [pdf, other]

    cs.LG cs.CL

    Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization

    Authors: Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi

    Abstract: Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems limiting their performance. It is shown in the literature that the ca…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Published at EMNLP 2022 (Findings)

  18. arXiv:2212.05956  [pdf, other]

    cs.CL cs.LG

    Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging

    Authors: Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais

    Abstract: Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset. Alternatively, one may directly work on the improvement of the optimization procedure of the compact model toward better generalizat…

    Submitted 16 December, 2022; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Published at EMNLP 2022 (Findings)

  19. arXiv:2210.07558  [pdf, other]

    cs.CL cs.LG

    DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation

    Authors: Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi

    Abstract: With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and just introduce some learnable truncated SVD modules (so-called LoRA blocks) to the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size…

    Submitted 19 April, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted to EACL 2023

  20. Holographic QFTs on AdS$_d$, wormholes and holographic interfaces

    Authors: A. Ghodsi, J. K. Ghosh, E. Kiritsis, F. Nitti, V. Nourry

    Abstract: We consider three related topics: (a) Holographic quantum field theories on AdS spaces. (b) Holographic interfaces of flat space QFTs. (c) Wormholes connecting generically different QFTs. We investigate in a concrete example how the related classical solutions explore the space of QFTs and we construct the general solutions that interpolate between the same or different CFTs with arbitrary couplin…

    Submitted 22 November, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

    Comments: 97 pages, 52 figures

    Report number: CCTP-2022-5, ITCP-2022/4

  21. arXiv:2205.12428  [pdf, other]

    cs.LG cs.CL

    Do we need Label Regularization to Fine-tune Pre-trained Language Models?

    Authors: Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi

    Abstract: Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model. Considering the ever-growing size of pre-trained language models (PLMs), KD is often adopted in many NLP tasks involving PLMs. However, it is evident that in KD, deploying the teacher network during training adds to the memory an…

    Submitted 12 April, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Published at EACL 2023

  22. arXiv:2205.07147  [pdf]

    cs.DC

    The Sky Above The Clouds

    Authors: Sarah Chasins, Alvin Cheung, Natacha Crooks, Ali Ghodsi, Ken Goldberg, Joseph E. Gonzalez, Joseph M. Hellerstein, Michael I. Jordan, Anthony D. Joseph, Michael W. Mahoney, Aditya Parameswaran, David Patterson, Raluca Ada Popa, Koushik Sen, Scott Shenker, Dawn Song, Ion Stoica

    Abstract: Technology ecosystems often undergo significant transformations as they mature. For example, telephony, the Internet, and PCs all started with a single provider, but in the United States each is now served by a competitive market that uses comprehensive and universal technology standards to provide compatibility. This white paper presents our view on how the cloud ecosystem, barely over fifteen ye…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: 35 pages

  23. Theoretical Connection between Locally Linear Embedding, Factor Analysis, and Probabilistic PCA

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method. It has two main steps which are linear reconstruction and linear embedding of points in the input space and embedding space, respectively. In this work, we look at the linear reconstruction step from a stochastic perspective where it is assumed that every data point is conditioned on its l…

    Submitted 10 August, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted for presentation at the Canadian AI 2022 (Canadian Conference on Artificial Intelligence). This paper has some shared materials with our other paper arXiv:2104.01525 but its focus and aim are different from that paper. v2: corrected a mathematical typo

    Journal ref: Proceedings of the 35th Canadian Conference on Artificial Intelligence, Canadian Artificial Intelligence Association, 2022

  24. arXiv:2203.09391  [pdf, other]

    cs.LG cs.CL

    When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation

    Authors: Ehsan Kamalloo, Mehdi Rezagholizadeh, Ali Ghodsi

    Abstract: Data Augmentation (DA) is known to improve the generalizability of deep neural networks. Most existing DA techniques naively add a certain number of augmented samples without considering the quality and the added computational cost of these samples. To tackle this problem, a common strategy, adopted by several state-of-the-art DA methods, is to adaptively generate or re-weight augmented samples wi…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: ACL 2022 Findings

  25. arXiv:2201.09267  [pdf, other]

    stat.ML cs.CV cs.LG

    Spectral, Probabilistic, and Deep Metric Learning: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper on metric learning. Algorithms are divided into spectral, probabilistic, and deep metric learning. We first start with the definition of distance metric, Mahalanobis distance, and generalized Mahalanobis distance. In spectral methods, we start with methods using scatters of data, including the first spectral metric learning, relevant methods to Fisher discrimina…

    Submitted 23 January, 2022; originally announced January 2022.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning

  26. arXiv:2111.13282  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    Generative Adversarial Networks and Adversarial Autoencoders: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper on Generative Adversarial Network (GAN), adversarial autoencoders, and their variants. We start with explaining adversarial learning and the vanilla GAN. Then, we explain the conditional GAN and DCGAN. The mode collapse problem is introduced and various methods, including minibatch GAN, unrolled GAN, BourGAN, mixture GAN, D2GAN, and Wasserstein GAN, are introduc…

    Submitted 25 November, 2021; originally announced November 2021.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning

  27. arXiv:2110.09620  [pdf, ps, other]

    stat.ME cs.LG math.ST stat.ML

    Sufficient Dimension Reduction for High-Dimensional Regression and Low-Dimensional Embedding: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper on various methods for Sufficient Dimension Reduction (SDR). We cover these methods with both statistical high-dimensional regression perspective and machine learning approach for dimensionality reduction. We start with introducing inverse regression methods including Sliced Inverse Regression (SIR), Sliced Average Variance Estimation (SAVE), contour regression,…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning

  28. arXiv:2110.08532  [pdf, other]

    cs.CL

    Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher

    Authors: Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi

    Abstract: With the ever-growing scale of neural models, knowledge distillation (KD) attracts more attention as a prominent tool for neural model compression. However, there are counterintuitive observations in the literature showing some challenging limitations of KD. A case in point is that the best-performing checkpoint of the teacher might not necessarily be the best teacher for training the student in KD.…

    Submitted 16 October, 2021; originally announced October 2021.

  29. arXiv:2110.01858  [pdf, other]

    math.OC cs.DC cs.LG math.NA

    KKT Conditions, First-Order and Second-Order Optimization, and Distributed Optimization: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper on Karush-Kuhn-Tucker (KKT) conditions, first-order and second-order numerical optimization, and distributed optimization. After a brief review of history of optimization, we start with some preliminaries on properties of sets, norms, functions, and concepts of optimization. Then, we introduce the optimization problem, standard optimization problems (including l…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: To appear partly as a part of an upcoming textbook on dimensionality reduction and manifold learning

  30. arXiv:2109.10147  [pdf, other]

    cs.CL

    Knowledge Distillation with Noisy Labels for Natural Language Understanding

    Authors: Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Chengyang Li, Ali Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh

    Abstract: Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications. However, one neglected area of research is the impact of noisy (corrupted) labels on KD. We present, to the best of our knowledge, the first study on KD with noisy labels in Natural Language Understanding (NLU). We document the scope of the problem a…

    Submitted 21 September, 2021; originally announced September 2021.

  31. arXiv:2109.06243  [pdf, other]

    cs.CL cs.AI

    KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation

    Authors: Marzieh S. Tahaei, Ella Charlaix, Vahid Partovi Nia, Ali Ghodsi, Mehdi Rezagholizadeh

    Abstract: The development of over-parameterized pre-trained language models has made a significant contribution toward the success of natural language processing. While over-parameterization of these models is the key to their generalization power, it makes them unsuitable for deployment on low-capacity devices. We push the limits of state-of-the-art Transformer-based pre-trained language model compression…

    Submitted 13 September, 2021; originally announced September 2021.

  32. arXiv:2109.05696  [pdf, other]

    cs.CL

    How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding

    Authors: Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh

    Abstract: Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one. Even though KD has shown promise on a wide range of Natural Language Processing (NLP) applications, little is understood about how one KD algorithm compares to another and whether these approaches can be complementary to each other. In this work, we evaluate…

    Submitted 20 September, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted as EMNLP 2021 Findings

  33. arXiv:2109.02508  [pdf, ps, other]

    cs.HC cs.LG

    Uniform Manifold Approximation and Projection (UMAP) and its Variants: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: Uniform Manifold Approximation and Projection (UMAP) is one of the state-of-the-art methods for dimensionality reduction and data visualization. This is a tutorial and survey paper on UMAP and its variants. We start with UMAP algorithm where we explain probabilities of neighborhood in the input and embedding spaces, optimization of cost function, training algorithm, derivation of gradients, and su…

    Submitted 24 August, 2021; originally announced September 2021.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning

  34. arXiv:2108.04172  [pdf, other]

    stat.ML cs.DS cs.LG math.PR

    Johnson-Lindenstrauss Lemma, Linear and Nonlinear Random Projections, Random Fourier Features, and Random Kitchen Sinks: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper on the Johnson-Lindenstrauss (JL) lemma and linear and nonlinear random projections. We start with linear random projection and then justify its correctness by JL lemma and its proof. Then, sparse random projections with $\ell_1$ norm and interpolation norm are introduced. Two main applications of random projection, which are low-rank matrix approximation and ap…

    Submitted 9 August, 2021; originally announced August 2021.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning

  35. arXiv:2107.12521  [pdf, other]

    cs.LG cs.NE physics.data-an stat.ML

    Restricted Boltzmann Machine and Deep Belief Network: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper on Boltzmann Machine (BM), Restricted Boltzmann Machine (RBM), and Deep Belief Network (DBN). We start with the required background on probabilistic graphical models, Markov random field, Gibbs sampling, statistical physics, Ising model, and the Hopfield network. Then, we introduce the structures of BM and RBM. The conditional distributions of visible and hidden…

    Submitted 5 August, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning. v2: applied readers' feedback

  36. arXiv:2106.15379  [pdf, other]

    stat.ML cs.CV cs.LG

    Unified Framework for Spectral Dimensionality Reduction, Maximum Variance Unfolding, and Kernel Learning By Semidefinite Programming: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper on unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analysis (PCA) with different kernels. This unification…

    Submitted 3 August, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning. v2: corrected some typos

  37. arXiv:2106.14320  [pdf, other]

    math.NA cs.LG

    Legendre Deep Neural Network (LDNN) and its application for approximation of nonlinear Volterra Fredholm Hammerstein integral equations

    Authors: Zeinab Hajimohammadi, Kourosh Parand, Ali Ghodsi

    Abstract: Various phenomena in biology, physics, and engineering are modeled by differential equations. These differential equations including partial differential equations and ordinary differential equations can be converted and represented as integral equations. In particular, Volterra Fredholm Hammerstein integral equations are the main type of these integral equations and researchers are interested in…

    Submitted 27 June, 2021; originally announced June 2021.

  38. arXiv:2106.14131  [pdf, other]

    cs.LG cs.CL cs.SC

    SymbolicGPT: A Generative Transformer Model for Symbolic Regression

    Authors: Mojtaba Valipour, Bowen You, Maysum Panju, Ali Ghodsi

    Abstract: Symbolic regression is the task of identifying a mathematical expression that best fits a provided dataset of input and output values. Due to the richness of the space of mathematical expressions, symbolic regression is generally a challenging problem. While conventional approaches based on genetic evolution algorithms have been used for decades, deep learning-based methods are relatively new and…

    Submitted 26 June, 2021; originally announced June 2021.

    Comments: 11 pages, 4 figures

  39. arXiv:2106.08443  [pdf, other]

    stat.ML cs.LG math.FA

    Reproducing Kernel Hilbert Space, Mercer's Theorem, Eigenfunctions, Nyström Method, and Use of Kernels in Machine Learning: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper on kernels, kernel methods, and related fields. We start with reviewing the history of kernels in functional analysis and machine learning. Then, Mercer kernel, Hilbert and Banach spaces, Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem and its proof, frequently used kernels, kernel construction from distance metric, important classes of kernels (includ…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning

  40. arXiv:2106.02154  [pdf, other]

    stat.ML cs.CV cs.LG

    Laplacian-Based Dimensionality Reduction Including Spectral Clustering, Laplacian Eigenmap, Locality Preserving Projection, Graph Embedding, and Diffusion Map: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper for nonlinear dimensionality and feature extraction methods which are based on the Laplacian of graph of data. We first introduce adjacency matrix, definition of Laplacian matrix, and the interpretation of Laplacian. Then, we cover the cuts of graph and spectral clustering which applies clustering in a subspace of data. Different optimization variants of Laplaci…

    Submitted 5 August, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning. v2: applied readers' feedback

  41. arXiv:2105.13608  [pdf, other]

    cs.CL cs.LG

    Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax

    Authors: Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi

    Abstract: In Natural Language Processing (NLP), finding data augmentation techniques that can produce high-quality human-interpretable examples has always been challenging. Recently, leveraging kNN such that augmented examples are retrieved from large repositories of unlabelled sentences has made a step toward interpretable augmentation. Inspired by this paradigm, we introduce Minimax-kNN, a sample efficien…

    Submitted 2 June, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: Findings of ACL 2021

  42. Higher order curvature corrections and holographic renormalization group flow

    Authors: Ahmad Ghodsi, Malihe Siahvoshan

    Abstract: We study the holographic renormalization group (RG) flow in the presence of higher-order curvature corrections to the $(d+1)$-dimensional Einstein-Hilbert (EH) action for an arbitrary interacting scalar matter field by using the superpotential approach. We find the critical points of the RG flow near the local minima and maxima of the potential and show the existence of the bounce solutions. In co…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: 33 pages, 4 figures

  43. arXiv:2104.07163  [pdf, other]

    cs.CL cs.LG

    Annealing Knowledge Distillation

    Authors: Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

    Abstract: Significant memory and computational requirements of large deep neural networks restrict their application on edge devices. Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model. The success of knowledge distillation is mainly attributed to its training object…

    Submitted 14 April, 2021; originally announced April 2021.

  44. arXiv:2104.01525  [pdf, other]

    stat.ML cs.CV cs.LG

    Generative Locally Linear Embedding

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method. It has two main steps which are linear reconstruction and linear embedding of points in the input space and embedding space, respectively. In this work, we propose two novel generative versions of LLE, named Generative LLE (GLLE), whose linear reconstruction steps are stochastic rather tha…

    Submitted 3 April, 2021; originally announced April 2021.

  45. arXiv:2101.07903  [pdf, other]

    eess.IV

    Fine-Tuning and Training of DenseNet for Histopathology Image Representation Using TCGA Diagnostic Slides

    Authors: Abtin Riasatian, Morteza Babaie, Danial Maleki, Shivam Kalra, Mojtaba Valipour, Sobhan Hemati, Manit Zaveri, Amir Safarpoor, Sobhan Shafiei, Mehdi Afshari, Maral Rasoolijaberi, Milad Sikaroudi, Mohd Adnan, Sultaan Shah, Charles Choi, Savvas Damaskinos, Clinton JV Campbell, Phedias Diamandis, Liron Pantanowitz, Hany Kashani, Ali Ghodsi, H. R. Tizhoosh

    Abstract: Feature vectors provided by pre-trained deep artificial neural networks have become a dominant source for image representation in recent literature. Their contribution to the performance of image analysis can be improved through finetuning. As an ultimate solution, one might even train a deep network from scratch with the domain-relevant images, a highly desirable option which is generally impeded…

    Submitted 19 January, 2021; originally announced January 2021.

  46. arXiv:2101.00734  [pdf, other]

    stat.ML cs.CV cs.LG

    Factor Analysis, Probabilistic Principal Component Analysis, Variational Inference, and Variational Autoencoder: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper on factor analysis, probabilistic Principal Component Analysis (PCA), variational inference, and Variational Autoencoder (VAE). These methods, which are tightly related, are dimensionality reduction and generative models. They assume that every data point is generated from or caused by a low-dimensional latent factor. By learning the parameters of distribution o…

    Submitted 23 May, 2022; v1 submitted 3 January, 2021; originally announced January 2021.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning. v2: corrected some mathematical typos

  47. arXiv:2011.10925  [pdf, other]

    stat.ML cs.CV cs.LG

    Locally Linear Embedding and its Variants: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: This is a tutorial and survey paper for Locally Linear Embedding (LLE) and its variants. The idea of LLE is fitting the local structure of the manifold in the embedding space. In this paper, we first cover LLE, kernel LLE, inverse LLE, and feature fusion with LLE. Then, we cover out-of-sample embedding using linear reconstruction, eigenfunctions, and kernel mapping. Incremental LLE is explained for em…

    Submitted 21 November, 2020; originally announced November 2020.

    Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning

  48. arXiv:2011.06673  [pdf, other]

    cs.LG cs.NE cs.SC

    Symbolically Solving Partial Differential Equations using Deep Learning

    Authors: Maysum Panju, Kourosh Parand, Ali Ghodsi

    Abstract: We describe a neural-based method for generating exact or approximate solutions to differential equations in the form of mathematical expressions. Unlike other neural methods, our system returns symbolic expressions that can be interpreted directly. Our method uses a neural architecture for learning mathematical expressions to optimize a customizable objective, and is scalable, compact, and easily…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: 10 pages

  49. arXiv:2011.02415  [pdf, other]

    cs.LG cs.AI cs.SC

    A Neuro-Symbolic Method for Solving Differential and Functional Equations

    Authors: Maysum Panju, Ali Ghodsi

    Abstract: When neural networks are used to solve differential equations, they usually produce solutions in the form of black-box functions that are not directly mathematically interpretable. We introduce a method for generating symbolic expressions to solve differential equations while leveraging deep learning training methods. Unlike existing methods, our system does not require learning a language model o…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: 8 pages

  50. arXiv:2009.10301  [pdf, ps, other]

    stat.ML cs.CV cs.LG

    Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey

    Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley

    Abstract: Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach. In SNE, every point is considered to be the neighbor of all other points with some probability, and the method tries to preserve this probability in the embedding space. SNE considers Gaussian distribution for the probability in both the input and embedding spaces. However, t-SNE…

    Submitted 3 August, 2022; v1 submitted 21 September, 2020; originally announced September 2020.

    Comments: To appear as a part of an upcoming academic book on dimensionality reduction and manifold learning. v2: applied readers' feedback