-
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Authors:
Mohsen Gholami,
Mohammad Akbari,
Kevin Cannons,
Yong Zhang
Abstract:
In this work, we propose an extreme compression technique for Large Multimodal Models (LMMs). While previous studies have explored quantization as an efficient post-training compression method for Large Language Models (LLMs), low-bit compression for multimodal models remains under-explored. The redundant nature of inputs in multimodal models results in a highly sparse attention matrix. We theoretically and experimentally demonstrate that the attention matrix's sparsity bounds the compression error of the Query and Key weight matrices. Based on this, we introduce CASP, a model compression technique for LMMs. Our approach performs a data-aware low-rank decomposition on the Query and Key weight matrices, followed by quantization across all layers based on an optimal bit allocation process. CASP is compatible with any quantization technique and enhances state-of-the-art 2-bit quantization methods (AQLM and QuIP#) by an average of 21% on image- and video-language benchmarks.
Submitted 7 March, 2025;
originally announced March 2025.
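As a rough illustration of the low-rank step mentioned in the CASP abstract above, the sketch below factors a weight matrix with a plain truncated SVD. This is an assumption made for illustration only: the abstract describes a data-aware decomposition, which this sketch is not, and the function names `low_rank_factors` and `relative_error` are hypothetical.

```python
import numpy as np

def low_rank_factors(weight: np.ndarray, rank: int):
    """Return (a, b) with a @ b the best rank-`rank` approximation of `weight`
    in the Frobenius norm. Plain SVD, NOT the data-aware variant in the abstract."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (out_features, rank), singular values folded in
    b = vt[:rank]               # shape (rank, in_features)
    return a, b

def relative_error(weight: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    """Relative Frobenius reconstruction error of the factorization."""
    return float(np.linalg.norm(weight - a @ b) / np.linalg.norm(weight))

# Toy check on a matrix that is exactly rank 3.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 8))
a3, b3 = low_rank_factors(w, rank=3)
a1, b1 = low_rank_factors(w, rank=1)
```

Storing `a` and `b` instead of `weight` saves parameters whenever `rank` is well below both matrix dimensions; the error grows as more singular values are discarded.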
-
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Authors:
Saeed Ranjbar Alvar,
Gursimran Singh,
Mohammad Akbari,
Yong Zhang
Abstract:
Large Multimodal Models (LMMs) have emerged as powerful models capable of understanding various data modalities, including text, images, and videos. LMMs encode both text and visual data into tokens that are then combined and processed by an integrated Large Language Model (LLM). Including visual tokens substantially increases the total token count, often by thousands. The increased input length for the LLM significantly raises the complexity of inference, resulting in high latency in LMMs. To address this issue, token pruning methods, which remove part of the visual tokens, have been proposed. The existing token pruning methods either require extensive calibration and fine-tuning or rely on suboptimal importance metrics, which results in increased redundancy among the retained tokens. In this paper, we first formulate token pruning as a Max-Min Diversity Problem (MMDP), where the goal is to select a subset such that the diversity among the selected tokens is maximized. Then, we solve the MMDP to obtain the selected subset and prune the rest. The proposed method, DivPrune, reduces redundancy and achieves the highest diversity of the selected tokens. By ensuring high diversity, the selected tokens better represent the original tokens, enabling effective performance even at high pruning ratios without requiring fine-tuning. Extensive experiments with various LMMs show that DivPrune achieves state-of-the-art accuracy over 16 image- and video-language datasets. Additionally, DivPrune reduces both the end-to-end latency and GPU memory usage for the tested models. The code is available at https://github.com/vbdi/divprune.
Submitted 1 April, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
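The Max-Min Diversity Problem named in the DivPrune abstract above can be approximated with a simple greedy farthest-point heuristic, sketched here. The abstract does not say which solver or distance the authors use, so the greedy strategy, the cosine distance, the choice of the first token as seed, and the name `select_diverse_tokens` are all assumptions made for illustration.

```python
import numpy as np

def select_diverse_tokens(tokens: np.ndarray, keep: int) -> list:
    """Greedy max-min selection: return indices of `keep` rows of `tokens`."""
    n = tokens.shape[0]
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and the number of tokens")
    # Pairwise cosine distance (assumed metric); guard against zero vectors.
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.maximum(norms, 1e-12)
    dist = 1.0 - unit @ unit.T
    selected = [0]             # arbitrary seed: the first token
    min_dist = dist[0].copy()  # distance of each token to its nearest selected token
    while len(selected) < keep:
        min_dist[selected] = -np.inf      # never re-pick a selected token
        nxt = int(np.argmax(min_dist))    # token farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected

# Toy usage: token 1 nearly duplicates token 0, so it is the one pruned.
toy = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [-1.0, 0.0]])
kept = select_diverse_tokens(toy, keep=3)
```

Each step adds the token whose nearest already-selected neighbour is farthest away, so near-duplicate tokens are the last to be picked. The greedy result is a heuristic and is not guaranteed to be the optimal MMDP solution.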
-
Drug-Target Interaction/Affinity Prediction: Deep Learning Models and Advances Review
Authors:
Ali Vefghi,
Zahed Rahmati,
Mohammad Akbari
Abstract:
Drug discovery remains a slow and expensive process that involves many steps, from detecting the target structure to obtaining approval from the Food and Drug Administration (FDA), and is often riddled with safety concerns. Accurate prediction of how drugs interact with their targets and the development of new drugs by using better methods and technologies have immense potential to speed up this process, ultimately leading to faster delivery of life-saving medications. Traditional methods used for drug-target interaction prediction show limitations, particularly in capturing complex relationships between drugs and their targets. As a result, deep learning models have been proposed to overcome the challenges of interaction prediction with more precise and efficient results. By outlining promising research avenues and models, each offering a different solution to the same problem, this paper aims to give researchers a better idea of methods for even more accurate and efficient prediction of drug-target interaction, ultimately accelerating the development of more effective drugs. A total of 180 prediction methods for drug-target interactions, published between 2016 and 2025, were analyzed using different frameworks based on machine learning, mainly deep learning and graph neural networks. Additionally, this paper discusses the novelty, architecture, and input representation of these models.
Submitted 21 February, 2025;
originally announced February 2025.
-
Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers
Authors:
Vaden Masrani,
Mohammad Akbari,
David Ming Xuan Yue,
Ahmad Rezaei,
Yong Zhang
Abstract:
In the era of costly pre-training of large language models, ensuring the intellectual property rights of model owners, and ensuring that said models are responsibly deployed, is becoming increasingly important. To this end, we propose model watermarking via passthrough layers, which are added to existing pre-trained networks and trained using a self-supervised loss such that the model produces high-entropy output when prompted with a unique private key, and acts normally otherwise. Unlike existing model watermarking methods, our method is fully task-agnostic, and can be applied to both classification and sequence-to-sequence tasks without requiring advanced access to downstream fine-tuning datasets. We evaluate the proposed passthrough layers on a wide range of downstream tasks, and show experimentally that our watermarking method achieves a near-perfect watermark extraction accuracy and false-positive rate in most cases without damaging original model performance. Additionally, we show our method is robust to downstream fine-tuning, fine-pruning, and layer removal attacks, and can be trained in a fraction of the time required to train the original model. Code is available in the paper.
Submitted 17 December, 2024;
originally announced December 2024.
-
UNet++ and LSTM combined approach for Breast Ultrasound Image Segmentation
Authors:
Saba Hesaraki,
Morteza Akbari,
Ramin Mousa
Abstract:
Breast cancer stands as a prevalent cause of fatality among females on a global scale, with prompt detection playing a pivotal role in diminishing mortality rates. The utilization of ultrasound scans in the BUSI dataset for medical imagery pertaining to breast cancer has exhibited commendable segmentation outcomes through the application of UNet and UNet++ networks. Nevertheless, a notable drawback of these models resides in their inattention towards the temporal aspects embedded within the images. This research endeavors to enrich the UNet++ architecture by integrating LSTM layers and self-attention mechanisms to exploit temporal characteristics for segmentation purposes. Furthermore, the incorporation of a Multiscale Feature Extraction Module aims to grasp varied scale features within the UNet++. Through the amalgamation of our proposed methodology with data augmentation on the BUSI with GT dataset, an accuracy rate of 98.88%, specificity of 99.53%, precision of 95.34%, sensitivity of 91.20%, F1-score of 93.74, and Dice coefficient of 92.74% are achieved. These findings demonstrate competitiveness with cutting-edge techniques outlined in existing literature.
Submitted 7 December, 2024;
originally announced December 2024.
-
Towards Secure and Usable 3D Assets: A Novel Framework for Automatic Visible Watermarking
Authors:
Gursimran Singh,
Tianxi Hu,
Mohammad Akbari,
Qiang Tang,
Yong Zhang
Abstract:
3D models, particularly AI-generated ones, have witnessed a recent surge across various industries such as entertainment. Hence, there is an alarming need to protect the intellectual property and avoid the misuse of these valuable assets. As a viable solution to address these concerns, we rigorously define the novel task of automated 3D visible watermarking in terms of two competing aspects: watermark quality and asset utility. Moreover, we propose a method of embedding visible watermarks that automatically determines the right location, orientation, and number of watermarks to be placed on arbitrary 3D assets for high watermark quality and asset utility. Our method is based on a novel rigid-body optimization that uses back-propagation to automatically learn transforms for ideal watermark placement. In addition, we propose a novel curvature-matching method for fusing the watermark into the 3D model that further improves readability and security. Finally, we provide a detailed experimental analysis on two benchmark 3D datasets validating the superior performance of our approach in comparison to baselines. Code and demo are available.
Submitted 17 September, 2024; v1 submitted 30 August, 2024;
originally announced September 2024.
-
LaWa: Using Latent Space for In-Generation Image Watermarking
Authors:
Ahmad Rezaei,
Mohammad Akbari,
Saeed Ranjbar Alvar,
Arezou Fatemi,
Yong Zhang
Abstract:
With generative models producing high quality images that are indistinguishable from real ones, there is growing concern regarding the malicious usage of AI-generated images. Imperceptible image watermarking is one viable solution towards such concerns. Prior watermarking methods map the image to a latent space for adding the watermark. Moreover, Latent Diffusion Models (LDM) generate the image in the latent space of a pre-trained autoencoder. We argue that this latent space can be used to integrate watermarking into the generation process. To this end, we present LaWa, an in-generation image watermarking method designed for LDMs. By using coarse-to-fine watermark embedding modules, LaWa modifies the latent space of pre-trained autoencoders and achieves high robustness against a wide range of image transformations while preserving perceptual quality of the image. We show that LaWa can also be used as a general image watermarking method. Through extensive experiments, we demonstrate that LaWa outperforms previous works in perceptual quality, robustness against attacks, and computational complexity, while having very low false positive rate. Code is available here.
Submitted 30 May, 2025; v1 submitted 11 August, 2024;
originally announced August 2024.
-
GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation
Authors:
Mohsen Gholami,
Mohammad Akbari,
Cindy Hu,
Vaden Masrani,
Z. Jane Wang,
Yong Zhang
Abstract:
Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed data generation using LLMs for preparing distilled models. We argue that generating data with LLMs is prone to sampling mainly from the center of the original content distribution. This limitation hinders the distilled model from learning the true underlying data distribution and causes it to forget the tails of the distribution (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework, which employs an iterative out-of-distribution-guided feedback mechanism for the LLM. As a result, the generated data improves the generalizability of distilled models. An energy-based OOD evaluation approach is also introduced to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD outperforms prior art and the LLM with an average improvement of 5% and 14%, respectively. We also show that the proposed method is applicable to less explored and novel tasks. The code is available.
Submitted 28 March, 2024;
originally announced March 2024.
-
AMUSE: Adaptive Multi-Segment Encoding for Dataset Watermarking
Authors:
Saeed Ranjbar Alvar,
Mohammad Akbari,
David Ming Xuan Yue,
Yong Zhang
Abstract:
Curating high-quality datasets that play a key role in the emergence of new AI applications requires considerable time, money, and computational resources. So, effective ownership protection of datasets is becoming critical. Recently, to protect the ownership of an image dataset, imperceptible watermarking techniques are used to store ownership information (i.e., watermark) into the individual image samples. Embedding the entire watermark into all samples leads to significant redundancy in the embedded information, which damages the watermarked dataset quality and extraction accuracy. In this paper, a multi-segment encoding-decoding method for dataset watermarking (called AMUSE) is proposed to adaptively map the original watermark into a set of shorter sub-messages and vice versa. Our message encoder is an adaptive method that adjusts the length of the sub-messages according to the protection requirements for the target dataset. Existing image watermarking methods are then employed to embed the sub-messages into the original images in the dataset and also to extract them from the watermarked images. Our decoder is then used to reconstruct the original message from the extracted sub-messages. The proposed encoder and decoder are plug-and-play modules that can easily be added to any watermarking method. To this end, extensive experiments are performed with multiple watermarking solutions, which show that applying AMUSE improves the overall message extraction accuracy by up to 28% for the same given dataset quality. Furthermore, the image dataset quality is enhanced by a PSNR of $\approx$2 dB on average, while improving the extraction accuracy for one of the tested image watermarking methods.
Submitted 18 July, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages
Authors:
Mohammad Akbari,
Saeed Ranjbar Alvar,
Behnam Kamranian,
Amin Banitalebi-Dehkordi,
Yong Zhang
Abstract:
Building multi-modal language models has been a trend in recent years, where additional modalities such as image, video, speech, etc. are jointly learned along with natural languages (i.e., textual information). Despite the success of these multi-modal language models with different modalities, there is no existing solution for neural network architectures and natural languages. Providing neural architectural information as a new modality allows us to provide fast architecture-2-text and text-2-architecture retrieval/generation services on the cloud with a single inference. Such a solution is valuable in terms of helping beginner and intermediate ML users to come up with better neural architectures or AutoML approaches with a simple text query. In this paper, we propose ArchBERT, a bi-modal model for joint learning and understanding of neural architectures and natural languages, which opens up new avenues for research in this area. We also introduce a pre-training strategy named Masked Architecture Modeling (MAM) for a more generalized joint learning. Moreover, we introduce and publicly release two new bi-modal datasets for training and validating our methods. ArchBERT's performance is verified through a set of numerical experiments on different downstream tasks such as architecture-oriented reasoning, question answering, and captioning (summarization). Datasets, codes, and demos are available in the supplementary materials.
Submitted 26 October, 2023;
originally announced October 2023.
-
ETran: Energy-Based Transferability Estimation
Authors:
Mohsen Gholami,
Mohammad Akbari,
Xinglu Wang,
Behnam Kamranian,
Yong Zhang
Abstract:
This paper addresses the problem of ranking pre-trained models for object detection and image classification. Selecting the best pre-trained model by fine-tuning is an expensive and time-consuming task. Previous works have proposed transferability estimation based on features extracted by the pre-trained models. We argue that quantifying whether the target dataset is in-distribution (IND) or out-of-distribution (OOD) for the pre-trained model is an important factor in the transferability estimation. To this end, we propose ETran, an energy-based transferability assessment metric, which includes three scores: 1) energy score, 2) classification score, and 3) regression score. We use energy-based models to determine whether the target dataset is OOD or IND for the pre-trained model. In contrast to prior works, ETran is applicable to a wide range of tasks including classification, regression, and object detection (classification+regression). This is the first work that proposes transferability estimation for the object detection task. Our extensive experiments on four benchmarks and two tasks show that ETran outperforms previous works on object detection and classification benchmarks by an average of 21% and 12%, respectively, and achieves SOTA in transferability assessment.
Submitted 3 August, 2023;
originally announced August 2023.
-
EnrichEvent: Enriching Social Data with Contextual Information for Emerging Event Extraction
Authors:
Mohammadali Sefidi Esfahani,
Mohammad Akbari
Abstract:
Social platforms have emerged as crucial venues for distributing information and discussing social events, offering researchers an excellent opportunity to design and implement novel event detection frameworks. Identifying unspecified events and detecting events without prior knowledge enables governments, aid agencies, and experts to respond swiftly and effectively to unfolding situations, such as natural disasters, by assessing severity and optimizing aid delivery. Social data is characterized by misspellings, incompleteness, word-sense ambiguity, and irregular language. While discussing an ongoing event, users share different opinions and perspectives based on their prior experience, background, and knowledge. Prior works primarily leverage tweets' lexical and structural patterns to capture users' opinions and views about events. In this study, we propose a novel end-to-end framework, EnrichEvent, to identify unspecified events from streaming social data. In addition to lexical and structural patterns, we leverage contextual knowledge of the tweets to enrich their representation and gain a better perspective on users' opinions about events. Compared to our baselines, the EnrichEvent framework achieves the highest values for Consolidation outcome with an average of 87% vs. 67% and the lowest for Discrimination outcome with an average of 10% vs. 16%. Moreover, the Trending Data Extraction module in the EnrichEvent framework improves efficiency by reducing Runtime by up to 50% by identifying and discarding irrelevant tweets within message blocks, making the framework highly scalable for processing streaming data. Our source code and dataset are available in our official replication package.
Submitted 11 June, 2025; v1 submitted 29 July, 2023;
originally announced July 2023.
-
A Hybrid Architecture for Out of Domain Intent Detection and Intent Discovery
Authors:
Masoud Akbari,
Ali Mohades,
M. Hassan Shirali-Shahreza
Abstract:
Intent Detection is one of the tasks of the Natural Language Understanding (NLU) unit in task-oriented dialogue systems. Out of Scope (OOS) and Out of Domain (OOD) inputs may cause problems for these systems. In addition, a labeled dataset is needed to train a model for Intent Detection in task-oriented dialogue systems. The creation of a labeled dataset is time-consuming and needs human resources. The purpose of this article is to address these problems. The task of identifying OOD/OOS inputs is named OOD/OOS Intent Detection. Also, discovering new intents and pseudo-labeling OOD inputs is known as Intent Discovery. In the OOD intent detection part, we make use of a Variational Autoencoder to distinguish between known and unknown intents independent of input data distribution. After that, an unsupervised clustering method is used to discover different unknown intents underlying OOD/OOS inputs. We also apply a non-linear dimensionality reduction on OOD/OOS representations to make distances between representations more meaningful for clustering. Our results show that the proposed model for both OOD/OOS Intent Detection and Intent Discovery achieves great results and surpasses baselines in English and Persian languages.
Submitted 30 July, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
A Persian Benchmark for Joint Intent Detection and Slot Filling
Authors:
Masoud Akbari,
Amir Hossein Karimi,
Tayyebeh Saeedi,
Zeinab Saeidi,
Kiana Ghezelbash,
Fatemeh Shamsezat,
Mohammad Akbari,
Ali Mohades
Abstract:
Natural Language Understanding (NLU) is important in today's technology as it enables machines to comprehend and process human language, leading to improved human-computer interactions and advancements in fields such as virtual assistants, chatbots, and language-based AI systems. This paper highlights the significance of advancing the field of NLU for low-resource languages. With intent detection and slot filling being crucial tasks in NLU, the widely used datasets ATIS and SNIPS have been utilized in the past. However, these datasets only cater to the English language and do not support other languages. In this work, we aim to address this gap by creating a Persian benchmark for joint intent detection and slot filling based on the ATIS dataset. To evaluate the effectiveness of our benchmark, we employ state-of-the-art methods for intent detection and slot filling.
Submitted 1 March, 2023;
originally announced March 2023.
-
Russia-Ukraine war: Modeling and Clustering the Sentiments Trends of Various Countries
Authors:
Hamed Vahdat-Nejad,
Mohammad Ghasem Akbari,
Fatemeh Salmani,
Faezeh Azizi,
Hamid-Reza Nili-Sani
Abstract:
With Twitter's growth and popularity, a huge number of views are shared by users on various topics, making this platform a valuable information source on various political, social, and economic issues. This paper investigates English tweets on the Russia-Ukraine war to analyze trends reflecting users' opinions and sentiments regarding the conflict. The tweets' positive and negative sentiments are analyzed using a BERT-based model, and the time series associated with the frequency of positive and negative tweets for various countries is calculated. Then, we propose a method based on the neighborhood average for modeling and clustering the time series of countries. The clustering results provide valuable insight into public opinion regarding this conflict. Among other things, we can mention the similar thoughts of users from the United States, Canada, the United Kingdom, and most Western European countries versus the shared views of Eastern European, Scandinavian, Asian, and South American nations toward the conflict.
Submitted 2 January, 2023;
originally announced January 2023.
-
E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models
Authors:
Mohammad Akbari,
Amin Banitalebi-Dehkordi,
Yong Zhang
Abstract:
Building huge and highly capable language models has been a trend in the past years. Despite their great performance, they incur high computational cost. A common solution is to apply model compression or choose light-weight architectures, which often need a separate fixed-size model for each desirable computational budget, and may lose performance in case of heavy compression. This paper proposes an effective dynamic inference approach, called E-LANG, which distributes the inference between large accurate Super-models and light-weight Swift models. To this end, a decision making module routes the inputs to Super or Swift models based on the energy characteristics of the representations in the latent space. This method is easily adoptable and architecture agnostic. As such, it can be applied to black-box pre-trained models without a need for architectural manipulations, reassembling of modules, or re-training. Unlike existing methods that are only applicable to encoder-only backbones and classification tasks, our method also works for encoder-decoder structures and sequence-to-sequence tasks such as translation. The E-LANG performance is verified through a set of experiments with T5 and BERT backbones on GLUE, SuperGLUE, and WMT. In particular, we outperform T5-11B with an average computation speed-up of 3.3$\times$ on GLUE and 2.9$\times$ on SuperGLUE. We also achieve BERT-based SOTA on GLUE with 3.2$\times$ less computation. Code and demo are available in the supplementary materials.
Submitted 1 March, 2022;
originally announced March 2022.
-
Deep Learning meets Liveness Detection: Recent Advancements and Challenges
Authors:
Arian Sabaghi,
Marzieh Oghbaie,
Kooshan Hashemifard,
Mohammad Akbari
Abstract:
Facial biometrics has been recently received tremendous attention as a convenient replacement for traditional authentication systems. Consequently, detecting malicious attempts has found great significance, leading to extensive studies in face anti-spoofing~(FAS),i.e., face presentation attack detection. Deep feature learning and techniques, as opposed to hand-crafted features, have promised a dra…
▽ More
Facial biometrics has been recently received tremendous attention as a convenient replacement for traditional authentication systems. Consequently, detecting malicious attempts has found great significance, leading to extensive studies in face anti-spoofing~(FAS),i.e., face presentation attack detection. Deep feature learning and techniques, as opposed to hand-crafted features, have promised a dramatic increase in the FAS systems' accuracy, tackling the key challenges of materializing the real-world application of such systems. Hence, a new research area dealing with the development of more generalized as well as accurate models is increasingly attracting the attention of the research community and industry. In this paper, we present a comprehensive survey on the literature related to deep-feature-based FAS methods since 2017. To shed light on this topic, a semantic taxonomy based on various features and learning methodologies is represented. Further, we cover predominant public datasets for FAS in chronological order, their evolutional progress, and the evaluation criteria (both intra-dataset and inter-dataset). Finally, we discuss the open research challenges and future directions.
Submitted 29 December, 2021;
originally announced December 2021.
-
EBJR: Energy-Based Joint Reasoning for Adaptive Inference
Authors:
Mohammad Akbari,
Amin Banitalebi-Dehkordi,
Yong Zhang
Abstract:
State-of-the-art deep learning models have achieved significant performance levels on various benchmarks. However, this excellent performance comes at a high computational cost. Light-weight architectures, on the other hand, achieve moderate accuracies, but at a much more desirable latency. This paper presents a new method of jointly using the large accurate models together with the small fast ones. To this end, we propose an Energy-Based Joint Reasoning (EBJR) framework that adaptively distributes the samples between shallow and deep models to achieve an accuracy close to the deep model, but latency close to the shallow one. Our method is applicable to out-of-the-box pre-trained models as it requires neither an architecture change nor re-training. Moreover, it is easy to use and deploy, especially for cloud services. Through a comprehensive set of experiments on different down-stream tasks, we show that our method outperforms strong state-of-the-art approaches by a considerable margin. In addition, we propose specialized EBJR, an extension of our method where we create a smaller specialized side model that performs the target task only partially, but yields an even higher accuracy and faster inference. We verify the strengths of our methods with both theoretical and experimental evaluations.
Submitted 19 October, 2021;
originally announced October 2021.
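The routing idea described in the EBJR abstract can be illustrated with a minimal sketch. It assumes the energy score is the free energy of the shallow model's logits, -log(sum(exp(logit))), and that a sample goes to the deep model when this energy exceeds a threshold; the function names, the threshold, and the toy models are hypothetical and are not taken from the paper.

```python
import math

def free_energy(logits):
    """Free energy of a logit vector: -log(sum(exp(logit))).

    Assumed scoring rule for this sketch: lower energy means the
    shallow model is more confident.
    """
    m = max(logits)  # subtract the max for numerical stability
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))

def route(x, shallow_model, deep_model, threshold):
    """Run the shallow model first; fall back to the deep model if unsure."""
    logits = shallow_model(x)
    if free_energy(logits) <= threshold:
        return logits, "shallow"
    return deep_model(x), "deep"

# Toy stand-ins for real models (hypothetical).
shallow = lambda x: [x, 0.0]
deep = lambda x: [2.0 * x, 0.0]

confident = route(10.0, shallow, deep, threshold=-5.0)
uncertain = route(0.0, shallow, deep, threshold=-5.0)
```

In this toy setup the confident input stays on the shallow model, while the flat-logit input is forwarded to the deep one. The threshold trades accuracy against latency and would have to be tuned on held-out data.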
-
Advances and Challenges in Deep Lip Reading
Authors:
Marzieh Oghbaie,
Arian Sabaghi,
Kooshan Hashemifard,
Mohammad Akbari
Abstract:
Driven by deep learning techniques and large-scale datasets, recent years have witnessed a paradigm shift in automatic lip reading. While the main thrust of Visual Speech Recognition (VSR) was improving accuracy of Audio Speech Recognition systems, other potential applications, such as biometric identification, and the promised gains of VSR systems, have motivated extensive efforts on developing the lip reading technology. This paper provides a comprehensive survey of the state-of-the-art deep learning based VSR research with a focus on data challenges, task-specific complications, and the corresponding solutions. Advancements in these directions will expedite the transformation of silent speech interface from theory to practice. We also discuss the main modules of a VSR pipeline and the influential datasets. Finally, we introduce some typical VSR application concerns and impediments to real-world scenarios as well as future research directions.
Submitted 15 October, 2021;
originally announced October 2021.
-
Bagging Supervised Autoencoder Classifier for Credit Scoring
Authors:
Mahsan Abdoli,
Mohammad Akbari,
Jamal Shahrabi
Abstract:
Credit scoring models, which are among the most potent risk management tools that banks and financial institutes rely on, have been a popular subject for research in the past few decades. Accordingly, many approaches have been developed to address the challenges in classifying loan applicants and improve and facilitate decision-making. The imbalanced nature of credit scoring datasets, as well as the heterogeneous nature of their features, poses difficulties in developing and implementing effective credit scoring models, targeting the generalization power of classification models on unseen data. In this paper, we propose the Bagging Supervised Autoencoder Classifier (BSAC) that mainly leverages the superior performance of the Supervised Autoencoder, which learns low-dimensional embeddings of the input data exclusively with regard to the ultimate classification task of credit scoring, based on the principles of multi-task learning. BSAC also addresses the data imbalance problem by employing a variant of the Bagging process based on the undersampling of the majority class. The obtained results from our experiments on the benchmark and real-life credit scoring datasets illustrate the robustness and effectiveness of the Bagging Supervised Autoencoder Classifier in the classification of loan applicants that can be regarded as a positive development in credit scoring models.
Submitted 12 August, 2021;
originally announced August 2021.
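The BSAC abstract mentions a Bagging variant based on undersampling the majority class. A minimal sketch of that sampling step is shown here, assuming every bag keeps all minority samples and draws an equally sized random subset of the majority class without replacement. The function name and parameters are hypothetical, and the supervised autoencoder trained on each bag is not shown.

```python
import random

def balanced_bags(labels, n_bags, majority_label, seed=0):
    """Build index bags that pair every minority sample with an equally
    sized random subset of the majority class (assumed scheme)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    bags = []
    for _ in range(n_bags):
        # Sample without replacement; requires len(majority) >= len(minority).
        sampled = rng.sample(majority, len(minority))
        bags.append(sorted(minority + sampled))
    return bags

# Toy labels: 8 non-defaulters (0) and 2 defaulters (1).
labels = [0] * 8 + [1] * 2
bags = balanced_bags(labels, n_bags=3, majority_label=0)
```

Each bag would then train one base classifier, and the ensemble prediction would combine their outputs, for example by averaging.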
-
Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules
Authors:
Haisheng Fu,
Feng Liang,
Jianping Lin,
Bing Li,
Mohammad Akbari,
Jie Liang,
Guohe Zhang,
Dong Liu,
Chengjie Tu,
Jingning Han
Abstract:
Recently, deep learning-based image compression methods have made significant progress and gradually outperformed traditional approaches including the latest standard Versatile Video Coding (VVC) in both PSNR and MS-SSIM metrics. Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures. Various models have been proposed, such as autoregressive, softmax, logistic mixture, Gaussian mixture, and Laplacian. Existing schemes only use one of these models. However, due to the vast diversity of images, it is not optimal to use one model for all images, or even for different regions within one image. In this paper, we propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations, which can adapt to different contents in different images and different regions of one image more accurately and efficiently, given the same complexity. Besides, in the encoding/decoding network design part, we propose concatenated residual blocks (CRB), where multiple residual blocks are serially connected with additional shortcut connections. The CRB can improve the learning ability of the network, which can further improve the compression performance. Experimental results using the Kodak, Tecnick-100 and Tecnick-40 datasets show that the proposed scheme outperforms all the leading learning-based methods and existing compression standards including VVC intra coding (4:4:4 and 4:2:0) in terms of the PSNR and MS-SSIM. The source code is available at https://github.com/fengyurenpingsheng
Submitted 9 February, 2024; v1 submitted 13 July, 2021;
originally announced July 2021.
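To make the idea of a discretized Gaussian-Laplacian-Logistic mixture more concrete, the sketch below computes the probability of an integer latent value. It assumes the common convention in learned compression of integrating each component's density over the unit-width bin around the integer; the weights, means, and scales are hypothetical fixed numbers, whereas in a learned codec they would be predicted by the entropy model.

```python
import math

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def laplace_cdf(x, mu, b):
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def logistic_cdf(x, mu, s):
    # tanh form of the sigmoid, which avoids overflow for large |x|
    return 0.5 * (1.0 + math.tanh((x - mu) / (2.0 * s)))

def mixture_pmf(y, weights, components):
    """P(y) for integer y: weighted sum of each component's mass on
    the bin [y - 0.5, y + 0.5] (assumed discretization)."""
    return sum(
        w * (cdf(y + 0.5, mu, scale) - cdf(y - 0.5, mu, scale))
        for w, (cdf, mu, scale) in zip(weights, components)
    )

# Hypothetical parameters for one latent element.
weights = [0.5, 0.3, 0.2]
components = [(gaussian_cdf, 0.0, 1.0),
              (laplace_cdf, 1.0, 1.5),
              (logistic_cdf, -2.0, 1.0)]

total = sum(mixture_pmf(y, weights, components) for y in range(-60, 61))
bits_for_zero = -math.log2(mixture_pmf(0, weights, components))
```

The negative log-probability gives the estimated number of bits an arithmetic coder would spend on that symbol, which is what a rate term in the training loss would typically penalize.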
-
Age of Information Aware VNF Scheduling in Industrial IoT Using Deep Reinforcement Learning
Authors:
Mohammad Akbari,
Mohammad Reza Abedi,
Roghayeh Joda,
Mohsen Pourghasemian,
Nader Mokari,
Melike Erol-Kantarci
Abstract:
In delay-sensitive industrial internet of things (IIoT) applications, the age of information (AoI) is employed to characterize the freshness of information. Meanwhile, the emerging network function virtualization provides flexibility and agility for service providers to deliver a given network service using a sequence of virtual network functions (VNFs). However, suitable VNF placement and scheduling in these schemes is NP-hard and finding a globally optimal solution by traditional approaches is complex. Recently, deep reinforcement learning (DRL) has appeared as a viable way to solve such problems. In this paper, we first utilize a single-agent low-complexity compound-action actor-critic RL to cover both discrete and continuous actions and jointly minimize VNF cost and AoI in terms of network resources under end-to-end Quality of Service constraints. To surmount the single-agent capacity limitation for learning, we then extend our solution to a multi-agent DRL scheme in which agents collaborate with each other. Simulation results demonstrate that single-agent schemes significantly outperform the greedy algorithm in terms of average network cost and AoI. Moreover, the multi-agent solution decreases the average cost by dividing the tasks between the agents. However, it needs more iterations to learn because the agents must collaborate.
Submitted 10 May, 2021;
originally announced May 2021.
-
A Compact Deep Learning Model for Face Spoofing Detection
Authors:
Seyedkooshan Hashemifard,
Mohammad Akbari
Abstract:
In recent years, face biometric security systems have been rapidly increasing; therefore, presentation attack detection (PAD) has received significant attention from research communities and has become a major field of research. Researchers have tackled the problem with various methods, from exploiting conventional texture feature extraction such as LBP, BSIF, and LPQ to using deep neural networks with different architectures. Despite the results each of these techniques has achieved for a certain attack scenario or dataset, most of them still fail to generalize to unseen conditions, as the efficiency of each is limited to certain types of presentation attacks and instruments (PAI). In this paper, instead of completely extracting hand-crafted texture features or relying only on deep neural networks, we address the problem via fusing both wide and deep features in a unified neural architecture. The main idea is to take advantage of the strength of both methods to derive a well-generalized solution for the problem. We also evaluated the effectiveness of our method by comparing the results with each of the mentioned techniques separately. The procedure is done on different spoofing datasets such as ROSE-Youtu, SiW and NUAA Imposter datasets.
In particular, we simultaneously learn a low-dimensional latent space empowered with data-driven features learnt via a Convolutional Neural Network designed for the spoofing detection task (i.e., deep channel) and leverage spoofing detection features already popular for spoofing in the frequency and temporal dimensions (i.e., via wide channel).
Submitted 12 January, 2021;
originally announced January 2021.
-
Learned Multi-Resolution Variable-Rate Image Compression with Octave-based Residual Blocks
Authors:
Mohammad Akbari,
Jie Liang,
Jingning Han,
Chengjie Tu
Abstract:
Recently, deep learning-based image compression has shown the potential to outperform traditional codecs. However, most existing methods train multiple networks for multiple bit rates, which increases the implementation complexity. In this paper, we propose a new variable-rate image compression framework, which employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv) with built-in generalized divisive normalization (GDN) and inverse GDN (IGDN) layers. Novel GoConv- and GoTConv-based residual blocks are also developed in the encoder and decoder networks. Our scheme also uses a stochastic rounding-based scalar quantization. To further improve the performance, we encode the residual between the input and the reconstructed image from the decoder network as an enhancement layer. To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced. Experimental results show that the proposed framework trained with the variable-rate objective function outperforms standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.
Submitted 31 December, 2020;
originally announced December 2020.
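The abstract above mentions stochastic rounding-based scalar quantization. As a generic illustration (not the authors' implementation), stochastic rounding maps a real value to one of its two neighbouring integers, picking the upper one with probability equal to the fractional part, so the result is unbiased in expectation. The function name and the sample values are hypothetical.

```python
import math
import random

def stochastic_round(x, rng):
    """Round x down or up at random so that E[result] == x."""
    lower = math.floor(x)
    frac = x - lower  # in [0, 1)
    return lower + (1 if rng.random() < frac else 0)

rng = random.Random(0)
samples = [stochastic_round(2.3, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

Averaged over many draws the mean approaches 2.3, whereas deterministic rounding would always return 2. How the quantizer is wired into training and inference is specific to the paper and is not reproduced here.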
-
Twitter Spam Detection: A Systematic Review
Authors:
Sepideh Bazzaz Abkenar,
Mostafa Haghi Kashani,
Mohammad Akbari,
Ebrahim Mahdipour
Abstract:
Nowadays, with the rise of Internet access and mobile devices around the globe, more people are using social networks for collaboration and receiving real-time information. Twitter, the microblogging service that is becoming a critical source of communication and news propagation, has grabbed the attention of spammers seeking to distract users. So far, researchers have introduced various defense techniques to detect spam and combat spammer activities on Twitter. To overcome this problem, in recent years, many novel techniques have been offered by researchers, which have greatly enhanced the spam detection performance. This motivates a systematic review of the different approaches to spam detection on Twitter. This review focuses on comparing the existing research techniques on Twitter spam detection systematically. Literature review analysis reveals that most of the existing methods rely on Machine Learning-based algorithms. Among these Machine Learning algorithms, the major differences are related to various feature selection methods. Hence, we propose a taxonomy based on different feature selection methods and analyses, namely content analysis, user analysis, tweet analysis, network analysis, and hybrid analysis. Then, we present numerical analyses and comparative studies on current approaches, coming up with open challenges that help researchers develop solutions in this topic.
Submitted 1 December, 2020; v1 submitted 30 November, 2020;
originally announced November 2020.
-
Dynamic Ensemble Learning for Credit Scoring: A Comparative Study
Authors:
Mahsan Abdoli,
Mohammad Akbari,
Jamal Shahrabi
Abstract:
Automatic credit scoring, which assesses the probability of default by loan applicants, plays a vital role in peer-to-peer lending platforms to reduce the risk of lenders. Although it has been demonstrated that dynamic selection techniques are effective for classification tasks, the performance of these techniques for credit scoring has not yet been determined. This study attempts to benchmark different dynamic selection approaches systematically for ensemble learning models to accurately estimate the credit scoring task on a large and high-dimensional real-life credit scoring data set. The results of this study indicate that dynamic selection techniques are able to boost the performance of ensemble models, especially in imbalanced training environments.
Submitted 18 October, 2020;
originally announced October 2020.
-
Generalized Octave Convolutions for Learned Multi-Frequency Image Compression
Authors:
Mohammad Akbari,
Jie Liang,
Jingning Han,
Chengjie Tu
Abstract:
Learned image compression has recently shown the potential to outperform the standard codecs. State-of-the-art rate-distortion (R-D) performance has been achieved by context-adaptive entropy coding approaches in which hyperprior and autoregressive models are jointly utilized to effectively capture the spatial dependencies in the latent representations. However, the latents are feature maps of the same spatial resolution in previous works, which contain some redundancies that affect the R-D performance. In this paper, we propose the first learned multi-frequency image compression and entropy coding approach that is based on the recently developed octave convolutions to factorize the latents into high and low frequency (resolution) components, where the low frequency is represented by a lower resolution. Therefore, its spatial redundancy is reduced, which improves the R-D performance. Novel generalized octave convolution and octave transposed-convolution architectures with internal activation layers are also proposed to preserve more spatial structure of the information. Experimental results show that the proposed scheme outperforms all existing learned methods as well as standard codecs such as the next-generation video coding standard VVC (4:2:0) on the Kodak dataset in both PSNR and MS-SSIM. We also show that the proposed generalized octave convolution can improve the performance of other auto-encoder-based computer vision tasks such as semantic segmentation and image denoising.
Submitted 31 December, 2020; v1 submitted 23 February, 2020;
originally announced February 2020.
-
Deep Learning-based Image Compression with Trellis Coded Quantization
Authors:
Binglin Li,
Mohammad Akbari,
Jie Liang,
Yang Wang
Abstract:
Recently many works attempt to develop image compression models based on deep learning architectures, where the uniform scalar quantizer (SQ) is commonly applied to the feature maps between the encoder and decoder. In this paper, we propose to incorporate trellis coded quantizer (TCQ) into a deep learning based image compression framework. A soft-to-hard strategy is applied to allow for back propagation during training. We develop a simple image compression model that consists of three subnetworks (encoder, decoder and entropy estimation), and optimize all of the components in an end-to-end manner. We experiment on two high resolution image datasets and both show that our model can achieve superior performance at low bit rates. We also show the comparisons between TCQ and SQ based on our proposed baseline model and demonstrate the advantage of TCQ.
Submitted 26 January, 2020;
originally announced January 2020.
-
Learned Variable-Rate Image Compression with Residual Divisive Normalization
Authors:
Mohammad Akbari,
Jie Liang,
Jingning Han,
Chengjie Tu
Abstract:
Recently, deep learning-based image compression has shown the potential to outperform traditional codecs. However, most existing methods train multiple networks for multiple bit rates, which increases the implementation complexity. In this paper, we propose a variable-rate image compression framework, which employs more Generalized Divisive Normalization (GDN) layers than previous GDN-based methods. Novel GDN-based residual sub-networks are also developed in the encoder and decoder networks. Our scheme also uses a stochastic rounding-based scalable quantization. To further improve the performance, we encode the residual between the input and the reconstructed image from the decoder network as an enhancement layer. To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced. Experimental results show that the proposed framework trained with the variable-rate objective function outperforms all standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.
Submitted 11 December, 2019;
originally announced December 2019.
-
MarlRank: Multi-agent Reinforced Learning to Rank
Authors:
Shihao Zou,
Zhonghua Li,
Mohammad Akbari,
Jun Wang,
Peng Zhang
Abstract:
When estimating the relevancy between a query and a document, ranking models largely neglect the mutual information among documents. A common wisdom is that if two documents are similar in terms of the same query, they are more likely to have similar relevance scores. To mitigate this problem, in this paper, we propose a multi-agent reinforced ranking model, named MarlRank. In particular, by considering each document as an agent, we formulate the ranking process as a multi-agent Markov Decision Process (MDP), where the mutual interactions among documents are incorporated in the ranking process. To compute the ranking list, each document predicts its relevance to a query considering not only its own query-document features but also the features and actions of its similar documents. By defining reward as a function of NDCG, we can optimize our model directly on the ranking performance measure. Our experimental results on two LETOR benchmark datasets show that our model has significant performance gains over the state-of-the-art baselines. We also find that the NDCG shows an overall increasing trend along with the step of interactions, which demonstrates that the mutual information among documents helps improve the ranking performance.
Submitted 15 September, 2019;
originally announced September 2019.
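Since the MarlRank abstract defines its reward as a function of NDCG, a short sketch of that metric may help. It assumes the widely used exponential-gain variant, with gain 2^rel - 1 and a log2 position discount; the abstract does not say which variant the paper uses, and the function names are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with gain 2^rel - 1 (assumed variant)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    """NDCG@k of a ranked list of graded relevance labels."""
    k = len(relevances) if k is None else k
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1, 0])   # already in ideal order
swapped = ndcg([0, 1])         # relevant document ranked second
```

A list already sorted by relevance scores 1.0, and pushing a relevant document down the ranking lowers the value, which is the signal a reward built on NDCG would pass to the agents.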
-
Using Contextual Information to Improve Blood Glucose Prediction
Authors:
Mohammad Akbari,
Rumi Chunara
Abstract:
Blood glucose value prediction is an important task in diabetes management. While it is reported that glucose concentration is sensitive to social context such as mood, physical activity, stress, diet, alongside the influence of diabetes pathologies, we need more research on data and methodologies to incorporate and evaluate signals about such temporal context into prediction models. Person-generated data sources, such as actively contributed surveys as well as passively mined data from social media offer opportunity to capture such context, however the self-reported nature and sparsity of such data mean that such data are noisier and less specific than physiological measures such as blood glucose values themselves. Therefore, here we propose a Gaussian Process model to both address these data challenges and combine blood glucose and latent feature representations of contextual data for a novel multi-signal blood glucose prediction task. We find this approach outperforms common methods for multi-variate data, as well as using the blood glucose values in isolation. Given a robust evaluation across two blood glucose datasets with different forms of contextual information, we conclude that multi-signal Gaussian Processes can improve blood glucose prediction by using contextual information and may provide a significant shift in blood glucose prediction research and practice.
Submitted 24 August, 2019;
originally announced September 2019.
-
Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs
Authors:
Haisheng Fu,
Feng Liang,
Bo Lei,
Nai Bian,
Qian zhang,
Mohammad Akbari,
Jie Liang,
Chengjie Tu
Abstract:
Recently deep learning-based methods have been applied in image compression and achieved many promising results. In this paper, we propose an improved hybrid layered image compression framework by combining deep learning and the traditional image codecs. At the encoder, we first use a convolutional neural network (CNN) to obtain a compact representation of the input image, which is losslessly encoded by the FLIF codec as the base layer of the bit stream. A coarse reconstruction of the input is obtained by another CNN from the reconstructed compact representation. The residual between the input and the coarse reconstruction is then obtained and encoded by the H.265/HEVC-based BPG codec as the enhancement layer of the bit stream. Experimental results using the Kodak and Tecnick datasets show that the proposed scheme outperforms the state-of-the-art deep learning-based layered coding scheme and traditional codecs including BPG in both PSNR and MS-SSIM metrics across a wide range of bit rates, when the images are coded in the RGB444 domain.
Submitted 15 July, 2019;
originally announced July 2019.
-
Detecting Target-Area Link-Flooding DDoS Attacks using Traffic Analysis and Supervised Learning
Authors:
Mostafa Rezazad,
Matthias R. Brust,
Mohammad Akbari,
Pascal Bouvry,
Ngai-Man Cheung
Abstract:
A novel class of extreme link-flooding DDoS (Distributed Denial of Service) attacks is designed to cut off entire geographical areas such as cities and even countries from the Internet by simultaneously targeting a selected set of network links. The Crossfire attack is a target-area link-flooding attack, which is orchestrated in three complex phases. The attack uses a massively distributed large-scale botnet to generate low-rate benign traffic aiming to congest selected network links, so-called target links. The adoption of benign traffic, while simultaneously targeting multiple network links, makes detecting the Crossfire attack a serious challenge. In this paper, we present analytical and emulated results showing hitherto unidentified vulnerabilities in the execution of the attack, such as a correlation between coordination of the botnet traffic and the quality of the attack, and a correlation between the attack distribution and detectability of the attack. Additionally, we identified a warm-up period due to the bot synchronization. For attack detection, we report results of using two supervised machine learning approaches, Support Vector Machine (SVM) and Random Forest (RF), to classify network traffic into normal and abnormal traffic, i.e., attack traffic. These machine learning models have been trained in various scenarios using the link volume as the main feature set.
Submitted 1 March, 2019;
originally announced March 2019.
-
From the User to the Medium: Neural Profiling Across Web Communities
Authors:
Mohammad Akbari,
Kunal Relia,
Anas Elghafari,
Rumi Chunara
Abstract:
Online communities provide a unique way for individuals to access information from those in similar circumstances, which can be critical for health conditions that require daily and personalized management. As these groups and topics often arise organically, identifying the types of topics discussed is necessary to understand their needs. As well, these communities and people in them can be quite diverse, and existing community detection methods have not been extended towards evaluating these heterogeneities. This has been limited as community detection methodologies have not focused on community detection based on semantic relations between textual features of the user-generated content. Thus here we develop an approach, NeuroCom, that optimally finds dense groups of users as communities in a latent space inferred by neural representation of published contents of users. By embedding of words and messages, we show that NeuroCom demonstrates improved clustering and identifies more nuanced discussion topics in contrast to other common unsupervised learning approaches.
Submitted 3 December, 2018;
originally announced December 2018.
-
Named Entity Disambiguation using Deep Learning on Graphs
Authors:
Alberto Cetoli,
Mohammad Akbari,
Stefano Bragaglia,
Andrew D. O'Harney,
Marc Sloan
Abstract:
We tackle Named Entity Disambiguation (NED) by comparing entities in short sentences with Wikidata graphs. Creating a context vector from graphs through deep learning is a challenging problem that has never been applied to NED. Our main contribution is to present an experimental study of recent neural techniques, as well as a discussion about which graph features are most important for the disambiguation task. In addition, a new dataset (Wikidata-Disamb) is created to allow a clean and scalable evaluation of NED with Wikidata entries, and to be used as a reference in future research. In the end our results show that a Bi-LSTM encoding of the graph triplets performs best, improving upon the baseline models and scoring an F1 value of 91.6% on the Wikidata-Disamb test set.
Submitted 22 October, 2018;
originally announced October 2018.
-
DSSLIC: Deep Semantic Segmentation-based Layered Image Compression
Authors:
Mohammad Akbari,
Jie Liang,
Jingning Han
Abstract:
Deep learning has revolutionized many computer vision fields in the last few years, including learning-based image compression. In this paper, we propose a deep semantic segmentation-based layered image compression (DSSLIC) framework in which the semantic segmentation map of the input image is obtained and encoded as the base layer of the bit-stream. A compact representation of the input image is also generated and encoded as the first enhancement layer. The segmentation map and the compact version of the image are then employed to obtain a coarse reconstruction of the image. The residual between the input and the coarse reconstruction is additionally encoded as another enhancement layer. Experimental results show that the proposed framework outperforms the H.265/HEVC-based BPG and other codecs in both PSNR and MS-SSIM metrics across a wide range of bit rates in the RGB domain. Moreover, since the semantic segmentation map is included in the bit-stream, the proposed scheme can facilitate many other tasks such as image search and object-based adaptive image compression.
Submitted 18 April, 2019; v1 submitted 8 June, 2018;
originally announced June 2018.
-
Semi-Recurrent CNN-based VAE-GAN for Sequential Data Generation
Authors:
Mohammad Akbari,
Jie Liang
Abstract:
A semi-recurrent hybrid VAE-GAN model for generating sequential data is introduced. In order to consider the spatial correlation of the data in each frame of the generated sequence, CNNs are utilized in the encoder, generator, and discriminator. The subsequent frames are sampled from the latent distributions obtained by encoding the previous frames. As a result, the dependencies between the frames are maintained. Two testing frameworks for synthesizing a sequence with any number of frames are also proposed. The promising experimental results on piano music generation indicate the potential of the proposed framework in modeling other sequential data such as video.
Submitted 1 June, 2018;
originally announced June 2018.
-
Socio-spatial Self-organizing Maps: Using Social Media to Assess Relevant Geographies for Exposure to Social Processes
Authors:
Kunal Relia,
Mohammad Akbari,
Dustin Duncan,
Rumi Chunara
Abstract:
Social media offers a unique window into attitudes like racism and homophobia, exposure to which are important, hard to measure and understudied social determinants of health. However, individual geo-located observations from social media are noisy and geographically inconsistent. Existing areas by which exposures are measured, like Zip codes, average over irrelevant administratively-defined boundaries. Hence, in order to enable studies of online social environmental measures like attitudes on social media and their possible relationship to health outcomes, first there is a need for a method to define the collective, underlying degree of social media attitudes by region. To address this, we create the Socio-spatial Self-organizing Map ("SS-SOM") pipeline to best identify regions by their latent social attitude from Twitter posts. SS-SOMs use neural embedding for text classification, and augment traditional SOMs to generate a controlled number of non-overlapping, topologically-constrained and topically-similar clusters. We find that not only are SS-SOMs robust to missing data, but the exposure of a cohort of men who are susceptible to multiple racism- and homophobia-linked health outcomes also changes by up to 42% when using SS-SOM measures as compared to Zip code-based measures.
Submitted 4 September, 2018; v1 submitted 23 March, 2018;
originally announced March 2018.
-
Adaptive specular reflection detection and inpainting in colonoscopy video frames
Authors:
Mojtaba Akbari,
Majid Mohrekesh,
S. M. Reza Soroushmehr,
Nader Karimi,
Shadrokh Samavi,
Kayvan Najarian
Abstract:
Colonoscopy video frames might be contaminated by bright spots with unsaturated values known as specular reflection. Detection and removal of such reflections could enhance the quality of colonoscopy images and facilitate the diagnosis procedure. In this paper we propose a novel two-phase method for this purpose, consisting of detection and removal phases. In the detection phase, we employ both HSV and RGB color space information for segmentation of specular reflections. We first train a non-linear SVM for selecting a color space based on image statistical features extracted from each channel of the color spaces. Then, a cost function for detection of specular reflections is introduced. In the removal phase, we propose a two-step inpainting method which consists of appropriate replacement patch selection and removal of the blockiness effects. The proposed method is evaluated on an available colonoscopy image database, where it achieves an accuracy of 99.68% and a Dice score of 71.79%.
Submitted 23 February, 2018;
originally announced February 2018.
-
Left Ventricle Segmentation in Cardiac MR Images Using Fully Convolutional Network
Authors:
Mina Nasr-Esfahani,
Majid Mohrekesh,
Mojtaba Akbari,
S. M. Reza Soroushmehr,
Ebrahim Nasr-Esfahani,
Nader Karimi,
Shadrokh Samavi,
Kayvan Najarian
Abstract:
Medical image analysis, especially segmenting a specific organ, has an important role in developing clinical decision support systems. In cardiac magnetic resonance (MR) imaging, segmenting the left and right ventricles helps physicians diagnose different heart abnormalities. There are challenges for this task, including the intensity and shape similarity between the left ventricle and other organs, inaccurate boundaries and the presence of noise in most of the images. In this paper we propose an automated method for segmenting the left ventricle in cardiac MR images. We first automatically extract the region of interest, and then employ it as the input to a fully convolutional network. We train the network accurately despite the small number of left ventricle pixels in comparison with the whole image. Thresholding on the output map of the fully convolutional network and selection of regions based on their roundness are performed in our proposed post-processing phase. The Dice score of our method reaches 87.24% on the York dataset of heart images.
Submitted 21 February, 2018;
originally announced February 2018.
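For readers unfamiliar with the Dice score quoted in the abstract above, the following is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), for two binary masks. The function name `dice_score`, the nested-list mask format, and the toy masks are illustrative assumptions and are not taken from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two binary masks.

    Masks are nested lists of 0/1 values with identical shape (assumed format).
    """
    flat_pred = [bool(v) for row in pred for v in row]
    flat_truth = [bool(v) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(flat_pred, flat_truth))
    total = sum(flat_pred) + sum(flat_truth)
    # Convention chosen here (an assumption): two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks: 3 predicted pixels, 3 true pixels, 2 of them overlapping.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
true_mask = [[0, 1, 0],
             [0, 1, 1]]
print(dice_score(pred_mask, true_mask))
```

Because the score only counts foreground overlap, it stays informative when the target region occupies a small fraction of the image, which is the situation the abstract describes for left ventricle pixels.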
-
Modeling and predicting measured response time of cloud-based web services using long-memory time series
Authors:
Hossein Nourikhah,
Mohammad Kazem Akbari,
Mohammad Kalantari
Abstract:
Predicting cloud performance from the user's perspective is a complex task because of the several factors involved in providing the service to the consumer. In this work, the response time of 10 real-world services is analyzed. We have observed long memory in the measured response time of the CPU-intensive services and statistically verified this observation using estimators of the Hurst exponent. Then, naïve, mean, autoregressive integrated moving average (ARIMA) and autoregressive fractionally integrated moving average (ARFIMA) methods are used to forecast the future values of quality of service (QoS) at runtime. Results of the cross-validation over the 10 datasets show that the long-memory ARFIMA model provides a mean reduction of 37.5% and a maximum reduction of 57.8% in the forecast error when compared to the short-memory ARIMA model, according to the standard error measure of mean absolute percentage error. Our work implies that consideration of the long-range dependence in QoS data can help to improve the selection of services according to their possible future QoS values.
Submitted 11 April, 2016;
originally announced April 2016.
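The abstract above compares forecasters by mean absolute percentage error (MAPE) and reports percentage reductions in that error. As a minimal sketch of how such a comparison is computed, the snippet below uses the standard MAPE definition. The helper names and the sample response times are hypothetical and are not data from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def relative_reduction(baseline_error, improved_error):
    """Percentage by which improved_error is lower than baseline_error."""
    return 100.0 * (baseline_error - improved_error) / baseline_error


# Hypothetical response times (ms) and two hypothetical forecasts of them.
observed = [100.0, 200.0, 400.0]
forecast_short_memory = [110.0, 180.0, 400.0]  # stand-in for a baseline model
forecast_long_memory = [105.0, 190.0, 400.0]   # stand-in for an improved model

baseline = mape(observed, forecast_short_memory)
improved = mape(observed, forecast_long_memory)
print(baseline, improved, relative_reduction(baseline, improved))
```

MAPE is scale-free, so errors from services with very different response times can be averaged across datasets, but it is undefined whenever an observed value is zero.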
-
CB-REFIM: A Practical Coordinated Beamforming in Multicell Networks
Authors:
Mohammad Hossein Akbari,
Vahid Tabataba Vakili
Abstract:
Performance of multicell systems is inevitably limited by interference and available resources. Although intercell interference can be mitigated by Base Station (BS) coordination, the demand on inter-BS information exchange and computational complexity grows rapidly with the number of cells, subcarriers, and users. On the other hand, some of the existing coordination beamforming methods need computation of the pseudo-inverse or generalized eigenvector of a matrix, which is difficult to implement in a real system. To handle these issues, we propose a novel linear beamforming across a set of coordinated cells with only limited backhaul signalling. Resource allocation (i.e., precoding and power control) is formulated as an optimization problem whose objective is a function of the signal-to-interference-plus-noise ratios (SINRs), in order to maximize the instantaneous weighted sum-rate subject to power constraints. Although the primal problem is nonconvex and difficult to solve optimally, an iterative algorithm is presented based on the Karush-Kuhn-Tucker (KKT) condition. To have a practical solution with low computational complexity and signalling overhead, we present CB-REFIM (coordination beamforming-reference based interference management) and show that the recently proposed REFIM algorithm can be interpreted as a special case of CB-REFIM. We evaluate CB-REFIM through extensive simulation and observe that the proposed strategies achieve close-to-optimal performance.
Submitted 9 July, 2014; v1 submitted 5 July, 2014;
originally announced July 2014.
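To make the objective named in the abstract above concrete, the following sketch evaluates a weighted sum-rate from per-user SINRs using the standard Shannon rate log2(1 + SINR). It is not the CB-REFIM algorithm: it assumes scalar power gains in place of beamforming vectors, and the names `sinr`, `weighted_sum_rate`, and all numeric values are hypothetical.

```python
import math


def sinr(k, powers, gains, noise):
    """SINR of user k. gains[k][j] is the power gain from transmitter j to user k."""
    signal = gains[k][k] * powers[k]
    interference = sum(gains[k][j] * powers[j]
                       for j in range(len(powers)) if j != k)
    return signal / (interference + noise)


def weighted_sum_rate(weights, powers, gains, noise):
    """Sum over users of w_k * log2(1 + SINR_k), in bits/s/Hz."""
    return sum(w * math.log2(1.0 + sinr(k, powers, gains, noise))
               for k, w in enumerate(weights))


# Two cells with one user each (hypothetical numbers).
weights = [1.0, 2.0]
powers = [1.0, 1.0]
gains_no_interference = [[1.0, 0.0],
                         [0.0, 1.0]]
gains_with_interference = [[3.0, 1.0],
                           [1.0, 3.0]]
noise = 1.0

print(weighted_sum_rate(weights, powers, gains_no_interference, noise))
print(weighted_sum_rate(weights, powers, gains_with_interference, noise))
```

Each user's transmit power appears in the numerator of its own SINR and in the denominator of every other user's SINR, which is the coupling that makes the weighted sum-rate maximization nonconvex.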
-
Near-Optimal Virtual Machine Packing Based on Resource Requirement of Service Demands Using Pattern Clustering
Authors:
Yaghoob Siahmargooei,
Mohammad Kazem Akbari,
Seyyed Alireza Hashemi Golpayegani,
Saeed Sharifian
Abstract:
With the expansion of cloud computing, organizations' growing adoption of it, and the solutions offered by cloud infrastructure providers for reducing the cost of processing resources, the problem of organizing resources in a cloud environment has gained high importance. A major concern of cloud infrastructure clients is that they do not know how many processing resources they will require in different periods of time. Managers and technicians try to make the most of the scalability and flexibility of the resources in cloud computing. The main challenge is calculating the amount of processing resources required at each moment given the quantity of incoming service requests. Through deduction of the accurate amount of these items, one can have an accurate estimation of the requests per moment. This paper aims at introducing a model for automatic scaling of cloud resources that would reduce the cost of renting resources for the clients of cloud infrastructure. Thus, first we start with a thorough explanation of the proposal and the major components of the model. Then through calculating the incomings of the model through clustering and introducing the way that each of these components work in different phases,...
Submitted 27 June, 2014;
originally announced June 2014.