-
Simulation Based Control Architecture Using Webots and Simulink
Authors:
Harun Kurt,
Ahmet Cayir,
Kadir Erkan
Abstract:
This paper presents a simulation-based control architecture that integrates Webots and Simulink for the development and testing of robotic systems. Webots provides 3D physics-based simulation and Simulink provides control system design, which together allow efficient real-time testing and controller validation. The proposed approach aims to reduce hardware-in-the-loop dependency in early development stages, offering a cost-effective and modular control framework for academic, industrial, and robotics applications.
Submitted 4 May, 2025;
originally announced May 2025.
-
A Use Case: Reformulating Query Rewriting as a Statistical Machine Translation Problem
Authors:
Abdullah Can Algan,
Emre Yürekli,
Aykut Çayır
Abstract:
One of the most important challenges for modern search engines is to retrieve relevant web content based on user queries. To meet this challenge, search engines include a module that rewrites user queries. For this purpose, modern web search engines use statistical and neural models from the natural language processing (NLP) domain. Statistical machine translation is a well-known NLP method among them. The paper proposes a query rewriting pipeline based on a monolingual machine translation model that learns to rewrite Arabic user search queries. It also describes the preprocessing steps used to create a mapping between user queries and web page titles.
Submitted 19 October, 2023;
originally announced October 2023.
-
An Ensemble of Pre-trained Transformer Models For Imbalanced Multiclass Malware Classification
Authors:
Ferhat Demirkıran,
Aykut Çayır,
Uğur Ünal,
Hasan Dağ
Abstract:
Classification of malware families is crucial for a comprehensive understanding of how they can infect devices, computers, or systems. Thus, malware identification enables security researchers and incident responders to take precautions against malware and accelerate mitigation. API call sequences made by malware are features widely used by machine and deep learning models for malware classification, as these sequences represent the behavior of malware. However, traditional machine and deep learning models remain incapable of capturing sequence relationships between API calls. On the other hand, transformer-based models process sequences as a whole and learn relationships between API calls through multi-head attention mechanisms and positional embeddings. Our experiments demonstrate that the transformer model with one transformer block layer surpassed the widely used base architecture, LSTM. Moreover, the pre-trained transformer models BERT or CANINE performed better in classifying highly imbalanced malware families according to the evaluation metrics, F1-score and AUC score. Furthermore, the proposed bagging-based random transformer forest (RTF), an ensemble of BERT or CANINE, has reached state-of-the-art evaluation scores on three out of four datasets, in particular a state-of-the-art F1-score of 0.6149 on one of the commonly used benchmark datasets.
Submitted 22 June, 2022; v1 submitted 25 December, 2021;
originally announced December 2021.
-
Benchmark Static API Call Datasets for Malware Family Classification
Authors:
Berkant Düzgün,
Aykut Çayır,
Ferhat Demirkıran,
Ceyda Nur Kahya,
Buket Gençaydın,
Hasan Dağ
Abstract:
Malware and malware incidents are increasing daily, despite various antivirus systems and malware detection or classification methodologies. Machine learning techniques have been the main focus of security experts for detecting malware and determining malware families. Many static, dynamic, and hybrid techniques have been presented for that purpose. In this study, static analysis has been applied to malware samples to extract API calls, which are among the most used features in machine/deep learning models because they represent the behavior of malware samples.
Since the rapid increase and continuous evolution of malware affect the detection capacity of antivirus scanners, recent and updated datasets of malicious software have become necessary to overcome this drawback. This paper introduces two new datasets: one with 14,616 samples obtained and compiled from VirusShare and one with 9,795 samples from VirusSample. In addition, benchmark results based on static API calls of malware samples are presented using several machine and deep learning models on these datasets. We believe that these two datasets and benchmark results enable researchers to test and validate their methods and approaches in this field.
Submitted 4 August, 2022; v1 submitted 30 November, 2021;
originally announced November 2021.
-
Random CapsNet Forest Model for Imbalanced Malware Type Classification Task
Authors:
Aykut Çayır,
Uğur Ünal,
Hasan Dağ
Abstract:
The behavior of malware varies with malware type. Therefore, knowing the type of a malware sample affects the strategies of system protection software. Many malware type classification models powered by machine and deep learning achieve high accuracy in predicting malware types. Machine learning based models require heavy feature engineering, and feature engineering dominantly affects the performance of these models. On the other hand, deep learning based models require less feature engineering than machine learning based models. However, traditional deep learning architectures and components lead to very complex and data-sensitive models. Unlike classical convolutional neural network architectures, the capsule network architecture minimizes this complexity and data sensitivity. This paper proposes an ensemble capsule network model based on the bootstrap aggregating technique. The proposed method is tested on two malware datasets whose state-of-the-art results are well known.
Submitted 23 August, 2020; v1 submitted 20 December, 2019;
originally announced December 2019.