-
RadarSeq: A Temporal Vision Framework for User Churn Prediction via Radar Chart Sequences
Authors:
Sina Najafi,
M. Hadi Sepanj,
Fahimeh Jafari
Abstract:
Predicting user churn in non-subscription gig platforms, where disengagement is implicit, poses unique challenges due to the absence of explicit labels and the dynamic nature of user behavior. Existing methods often rely on aggregated snapshots or static visual representations, which obscure temporal cues critical for early detection. In this work, we propose a temporally-aware computer vision framework that models user behavioral patterns as a sequence of radar chart images, each encoding day-level behavioral features. By integrating a pretrained CNN encoder with a bidirectional LSTM, our architecture captures both spatial and temporal patterns underlying churn behavior. Extensive experiments on a large real-world dataset demonstrate that our method outperforms classical models and ViT-based radar chart baselines, yielding gains of 17.7 in F1 score, 29.4 in precision, and 16.1 in AUC, along with improved interpretability. The framework's modular design, explainability tools, and efficient deployment characteristics make it suitable for large-scale churn modeling in dynamic gig-economy platforms.
Submitted 18 June, 2025;
originally announced June 2025.
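As a rough illustration of the representation described in this abstract, the sketch below turns per-day feature vectors into radar-chart polygon vertices, one polygon per day. It covers only the geometry: rendering the polygons to images and feeding them to a CNN and bidirectional LSTM are not shown. The function names, and the assumption that features are already normalized to [0, 1], are hypothetical and not taken from the paper.

```python
import math


def radar_vertices(features):
    """Map one day's feature values to radar-chart polygon vertices.

    Assumption: each value is already normalized to [0, 1]. Feature k is
    placed on a spoke at angle 2*pi*k/n, at a radius equal to its value.
    """
    n = len(features)
    return [
        (value * math.cos(2 * math.pi * k / n),
         value * math.sin(2 * math.pi * k / n))
        for k, value in enumerate(features)
    ]


def radar_sequence(days):
    """Build the temporal sequence: one polygon per day, in order."""
    return [radar_vertices(day) for day in days]


# Two hypothetical days with four behavioral features each.
sequence = radar_sequence([[1.0, 0.5, 0.0, 0.25], [0.2, 0.2, 0.1, 0.0]])
```

Each polygon in `sequence` could then be drawn as an image and encoded frame by frame, which is the step where a pretrained image encoder would come in.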
-
Fast Audio Codec Identification Using Overlapping LCS
Authors:
Farzane Jafari
Abstract:
Audio data are widely exchanged over telecommunications networks. Due to the limitations of network resources, these data are typically compressed before transmission. Various methods are available for compressing audio data. To access such audio information, it is first necessary to identify the codec used for compression. One of the most effective approaches for audio codec identification involves analyzing the content of received packets. In these methods, statistical features extracted from the packets are utilized to determine the codec employed. This paper proposes a novel method for audio codec classification based on features derived from the overlapped longest common sub-string and sub-sequence (LCS). The simulation results, which achieved an accuracy of 97% for 8 KB packets, demonstrate the superiority of the proposed method over conventional approaches. This method divides each 8 KB packet into fifteen 1 KB packets with a 50% overlap. The results indicate that this division has no significant impact on the simulation outcomes, while significantly speeding up the feature extraction, being eight times faster than the traditional method for extracting LCS features.
Submitted 11 February, 2025; v1 submitted 2 February, 2025;
originally announced February 2025.
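The packet arithmetic in this abstract can be checked with a small sketch: a 1 KB window moved in 512-byte steps (50% overlap) over an 8 KB packet yields (8192 - 1024) / 512 + 1 = 15 windows. The two LCS routines are standard dynamic-programming versions of longest common substring and longest common subsequence. Which pairs of windows the paper compares, and how the resulting lengths become features, is not stated in the abstract, so the function names and defaults here are illustrative assumptions.

```python
def overlapping_windows(packet, size=1024, overlap=0.5):
    """Split a packet into fixed-size windows with fractional overlap."""
    step = int(size * (1 - overlap))
    return [packet[i:i + size] for i in range(0, len(packet) - size + 1, step)]


def longest_common_substring(a, b):
    """Length of the longest contiguous run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def longest_common_subsequence(a, b):
    """Length of the longest (not necessarily contiguous) shared subsequence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


windows = overlapping_windows(bytes(8192))  # 15 windows of 1024 bytes
```

Both LCS routines cost time proportional to the product of the input lengths, so running them on 1 KB windows instead of whole 8 KB packets is one plausible source of the speed-up the abstract reports.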
-
Detection of Vascular Leukoencephalopathy in CT Images
Authors:
Z. Cernekova,
V. Sisik,
F. Jafari
Abstract:
Artificial intelligence (AI) has seen a significant surge in popularity, particularly in its application to medicine. This study explores AI's role in diagnosing leukoencephalopathy, a small vessel disease of the brain, and a leading cause of vascular dementia and hemorrhagic strokes. We utilized a dataset of approximately 1200 patients with axial brain CT scans to train convolutional neural networks (CNNs) for binary disease classification. Addressing the challenge of varying scan dimensions due to different patient physiologies, we processed the data to a uniform size and applied three preprocessing methods to improve model accuracy. We compared four neural network architectures: ResNet50, ResNet50 3D, ConvNext, and Densenet. The ConvNext model achieved the highest accuracy of 98.5% without any preprocessing, outperforming models with 3D convolutions. To gain insights into model decision-making, we implemented Grad-CAM heatmaps, which highlighted the focus areas of the models on the scans. Our results demonstrate that AI, particularly the ConvNext architecture, can significantly enhance diagnostic accuracy for leukoencephalopathy. This study underscores AI's potential in advancing diagnostic methodologies for brain diseases and highlights the effectiveness of CNNs in medical imaging applications.
Submitted 16 January, 2025;
originally announced January 2025.
-
Vocal Melody Construction for Persian Lyrics Using LSTM Recurrent Neural Networks
Authors:
Farshad Jafari,
Farzad Didehvar,
Amin Gheibi
Abstract:
The present paper investigated automatic melody construction for Persian lyrics given as input. It was assumed that there is a phonological correlation between the lyric syllables and the melody in a song. A seq2seq neural network was developed to investigate this assumption, trained on parallel syllable and note sequences in Persian songs to suggest a pleasant melody for a new sequence of syllables. Because no dataset of Persian digital music was available, more than 100 pieces of Persian music were collected and converted from printed to digital format. Finally, 14 new lyrics were given to the model as input, and the suggested melodies were performed and recorded by music experts to evaluate the trained model. The evaluation was conducted using an audio questionnaire, which more than 170 people answered. According to the answers about the pleasantness of the melodies, the system outputs scored an average of 3.005 out of 5, while the human-made melodies for the same lyrics obtained an average score of 4.078.
Submitted 23 October, 2024;
originally announced October 2024.
-
Striking a New Chord: Neural Networks in Music Information Dynamics
Authors:
Farshad Jafari,
Claire Arthur
Abstract:
Initiating a quest to unravel the complexities of musical aesthetics through the lens of information dynamics, our study delves into the realm of musical sequence modeling, drawing a parallel between the sequential structured nature of music and natural language.
Despite the prevalence of neural network models in MIR, the modeling of symbolic music events as applied to music cognition and music neuroscience has largely relied on statistical models. In this "proof of concept" paper we posit the superiority of neural network models over statistical models for predicting musical events. Specifically, we compare LSTM, Transformer, and GPT models against a widely used Markov model to predict a chord event following a sequence of chords.
Utilizing chord sequences from the McGill Billboard dataset, we trained each model to predict the next chord from a given sequence of chords. We found that neural models significantly outperformed statistical ones in our study. Specifically, the LSTM with attention model led with an accuracy of 0.329, followed by Transformer models at 0.321, GPT at 0.301, and standard LSTM at 0.191. Variable Order Markov and Markov trailed behind with accuracies of 0.277 and 0.140, respectively. Encouraged by these results, we extended our investigation to multidimensional modeling, employing a many-to-one LSTM, LSTM with attention, Transformer, and GPT predictors. These models were trained on both chord and melody lines as two-dimensional data using the CoCoPops Billboard dataset, achieving accuracies of 0.083, 0.312, 0.271, and 0.120, respectively, in predicting the next chord.
Submitted 23 October, 2024; v1 submitted 23 October, 2024;
originally announced October 2024.
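For orientation, the statistical baseline mentioned in this abstract can be approximated by a first-order Markov predictor: count which chord follows which, then predict the most frequent follower of the last chord seen. This is a minimal sketch under that assumption. The paper's exact Markov configuration is not given in the abstract, and the toy chord sequences and function names below are made up for illustration.

```python
from collections import Counter, defaultdict


def train_markov(sequences):
    """Count chord-to-next-chord transitions over all training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, history):
    """Predict the most frequent follower of the last chord in a non-empty
    history, or None if that chord never appeared with a successor."""
    followers = counts.get(history[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy training data (hypothetical, not from the McGill Billboard dataset).
model = train_markov([["C", "G", "Am", "F"], ["C", "G", "C", "G"]])
prediction = predict_next(model, ["Am", "C"])
```

Because only the final chord of the history is used, longer context is ignored entirely, which is the limitation that variable-order Markov models and the neural sequence models are meant to address.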
-
Traversing Emotional Landscapes and Linguistic Patterns in Bernard-Marie Koltès' Plays: An NLP Perspective
Authors:
Arezou Zahiri Pourzarandi,
Farshad Jafari
Abstract:
This study employs Natural Language Processing (NLP) to analyze the intricate linguistic and emotional dimensions within the plays of Bernard-Marie Koltès, a central figure in contemporary French theatre. By integrating advanced computational techniques, we dissect Koltès' narrative style, revealing the subtle interplay between language and emotion across his dramatic oeuvre. Our findings highlight how Koltès crafts his narratives, enriching our understanding of his thematic explorations and contributing to the broader field of digital humanities in literary analysis.
Submitted 12 October, 2024;
originally announced October 2024.
-
I or Not I: Unraveling the Linguistic Echoes of Identity in Samuel Beckett's "Not I" Through Natural Language Processing
Authors:
Arezou Zahiri Pourzarandi,
Farshad Jafari
Abstract:
Exploring the depths of Samuel Beckett's "Not I" through advanced natural language processing techniques, this research uncovers the intricate linguistic structures that underpin the text. By analyzing word frequency, detecting emotional sentiments with a BERT-based model, and examining repetitive motifs, we unveil how Beckett's minimalist yet complex language reflects the protagonist's fragmented psyche. Our results demonstrate that recurring themes of time, memory, and existential angst are artfully woven through recursive linguistic patterns and rhythmic repetition. This innovative approach not only deepens our understanding of Beckett's stylistic contributions but also highlights his unique role in modern literature, where language transcends simple communication to explore profound existential questions.
Submitted 12 October, 2024;
originally announced October 2024.
-
Towards Symbolic XAI -- Explanation Through Human Understandable Logical Relationships Between Features
Authors:
Thomas Schnake,
Farnoush Rezaei Jafari,
Jonas Lederer,
Ping Xiong,
Shinichi Nakajima,
Stefan Gugler,
Grégoire Montavon,
Klaus-Robert Müller
Abstract:
Explainable Artificial Intelligence (XAI) plays a crucial role in fostering transparency and trust in AI systems, where traditional XAI approaches typically offer one level of abstraction for explanations, often in the form of heatmaps highlighting single or multiple input features. However, we ask whether abstract reasoning or problem-solving strategies of a model may also be relevant, as these align more closely with how humans approach solutions to problems. We propose a framework, called Symbolic XAI, that attributes relevance to symbolic queries expressing logical relationships between input features, thereby capturing the abstract reasoning behind a model's predictions. The methodology is built upon a simple yet general multi-order decomposition of model predictions. This decomposition can be specified using higher-order propagation-based relevance methods, such as GNN-LRP, or perturbation-based explanation methods commonly used in XAI. The effectiveness of our framework is demonstrated in the domains of natural language processing (NLP), vision, and quantum chemistry (QC), where abstract symbolic domain knowledge is abundant and of significant interest to users. The Symbolic XAI framework provides an understanding of the model's decision-making process that is both flexible for customization by the user and human-readable through logical formulas.
Submitted 1 October, 2024; v1 submitted 30 August, 2024;
originally announced August 2024.
-
JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model
Authors:
Farzaneh Jafari,
Stefano Berretti,
Anup Basu
Abstract:
In recent years, talking head generation has become a focal point for researchers. Considerable effort is being made to refine lip-sync motion, capture expressive facial expressions, generate natural head poses, and achieve high video quality. However, no single model has yet achieved equivalence across all these metrics. This paper aims to animate a 3D face using Jamba, a hybrid Transformer-Mamba model. Mamba, a pioneering Structured State Space Model (SSM) architecture, was designed to address the constraints of the conventional Transformer architecture. Nevertheless, it has several drawbacks. Jamba merges the advantages of both Transformer and Mamba approaches, providing a holistic solution. Based on the foundational Jamba block, we present JambaTalk to enhance motion variety and speed through multimodal integration. Extensive experiments reveal that our method achieves performance comparable or superior to state-of-the-art models.
Submitted 2 August, 2024;
originally announced August 2024.
-
MambaLRP: Explaining Selective State Space Sequence Models
Authors:
Farnoush Rezaei Jafari,
Grégoire Montavon,
Klaus-Robert Müller,
Oliver Eberle
Abstract:
Recent sequence modeling approaches using selective state space sequence models, referred to as Mamba models, have seen a surge of interest. These models allow efficient processing of long sequences in linear time and are rapidly being adopted in a wide range of applications such as language modeling, demonstrating promising performance. To foster their reliable use in real-world scenarios, it is crucial to augment their transparency. Our work bridges this critical gap by bringing explainability, particularly Layer-wise Relevance Propagation (LRP), to the Mamba architecture. Guided by the axiom of relevance conservation, we identify specific components in the Mamba architecture, which cause unfaithful explanations. To remedy this issue, we propose MambaLRP, a novel algorithm within the LRP framework, which ensures a more stable and reliable relevance propagation through these components. Our proposed method is theoretically sound and excels in achieving state-of-the-art explanation performance across a diverse range of models and datasets. Moreover, MambaLRP facilitates a deeper inspection of Mamba architectures, uncovering various biases and evaluating their significance. It also enables the analysis of previous speculations regarding the long-range capabilities of Mamba models.
Submitted 15 January, 2025; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Improved bounds on the size of permutation codes under Kendall $τ$-metric
Authors:
Farzad Parvaresh,
Reza Sobhani,
Alireza Abdollahi,
Javad Bagherian,
Fatemeh Jafari,
Maryam Khatami
Abstract:
To overcome the challenges posed by flash memories, and to protect against errors in reading information stored in DNA molecules with the shotgun sequencing method, rank modulation has been proposed. In the rank modulation framework, codewords are permutations. In this paper, we study the largest size $P(n, d)$ of permutation codes of length $n$, i.e., subsets of the set $S_n$ of all permutations on $\{1,\ldots, n\}$ with minimum distance at least $d\in\{1,\ldots ,\binom{n}{2}\}$ under the Kendall $τ$-metric. By presenting an algorithm and some theorems, we improve the known lower and upper bounds for $P(n,d)$. In particular, we show that $P(n,d)=4$ for all $n\geq 6$ and $\frac{3}{5}\binom{n}{2}< d \leq \frac{2}{3} \binom{n}{2}$. Additionally, we prove that for any prime number $n$ and integer $r\leq \frac{n}{6}$, $ P(n,3)\leq (n-1)!-\dfrac{n-6r}{\sqrt{n^2-8rn+20r^2}}\sqrt{\dfrac{(n-1)!}{n(n-r)!}}. $ This result greatly improves the upper bound of $P(n,3)$ for all primes $n\geq 37$.
Submitted 10 June, 2024;
originally announced June 2024.
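To make the metric in this abstract concrete: the Kendall τ-distance between two permutations is the number of pairs of elements that appear in opposite relative order, which equals the minimum number of adjacent transpositions needed to turn one permutation into the other. The sketch below computes that distance and the minimum distance of a small code by brute force. It only illustrates the definitions, not the paper's algorithm or bounds, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(p, q):
    """Number of element pairs ordered differently in permutations p and q."""
    pos = {value: i for i, value in enumerate(q)}
    mapped = [pos[value] for value in p]  # p rewritten in q's coordinates
    return sum(
        1
        for i, j in combinations(range(len(mapped)), 2)
        if mapped[i] > mapped[j]  # an inversion relative to q
    )


def min_distance(code):
    """Minimum pairwise Kendall tau distance over a code with >= 2 codewords."""
    return min(kendall_tau_distance(p, q) for p, q in combinations(code, 2))


# A toy code of length 3; the largest possible distance is C(3, 2) = 3.
code = [(1, 2, 3), (3, 2, 1)]
d = min_distance(code)
```

A code counts toward $P(n, d)$ exactly when `min_distance(code)` is at least $d$. The brute-force check is quadratic in both the code size and $n$, so it is only practical for small examples.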
-
Adaptive Token Sampling For Efficient Vision Transformers
Authors:
Mohsen Fayyaz,
Soroush Abbasi Koohpayegani,
Farnoush Rezaei Jafari,
Sunando Sengupta,
Hamid Reza Vaezi Joze,
Eric Sommerlade,
Hamed Pirsiavash,
Juergen Gall
Abstract:
While state-of-the-art vision transformer models achieve promising results in image classification, they are computationally expensive and require many GFLOPs. Although the GFLOPs of a vision transformer can be decreased by reducing the number of tokens in the network, there is no setting that is optimal for all input images. In this work, we therefore introduce a differentiable parameter-free Adaptive Token Sampler (ATS) module, which can be plugged into any existing vision transformer architecture. ATS empowers vision transformers by scoring and adaptively sampling significant tokens. As a result, the number of tokens is not constant anymore and varies for each input image. By integrating ATS as an additional layer within the current transformer blocks, we can convert them into much more efficient vision transformers with an adaptive number of tokens. Since ATS is a parameter-free module, it can be added to the off-the-shelf pre-trained vision transformers as a plug and play module, thus reducing their GFLOPs without any additional training. Moreover, due to its differentiable design, one can also train a vision transformer equipped with ATS. We evaluate the efficiency of our module in both image and video classification tasks by adding it to multiple SOTA vision transformers. Our proposed module improves the SOTA by reducing their computational costs (GFLOPs) by 2X, while preserving their accuracy on the ImageNet, Kinetics-400, and Kinetics-600 datasets.
Submitted 26 July, 2022; v1 submitted 30 November, 2021;
originally announced November 2021.
-
Generating Multilingual Parallel Corpus Using Subtitles
Authors:
Farshad Jafari
Abstract:
Neural Machine Translation, despite its significant results, still has a major problem: the lack or absence of parallel corpora for many languages. This article suggests a method for generating a considerable amount of parallel corpus data for any language pair, extracted from open-source materials available on the Internet. The parallel corpus content is derived from video subtitles. The method needs a set of video titles with attributes such as release date, rating, and duration. A crawler automates the process of finding and downloading subtitle pairs for the desired language pairs. Finally, sentence pairs are extracted from synchronous dialogues in the subtitles. The main problem of this method is unsynchronized subtitle pairs; therefore, subtitles are verified before downloading. If two subtitles are not synchronized, another subtitle of that video is processed until a matching subtitle is found. This approach makes it possible to build context-based parallel corpora by filtering videos by genre. A context-based corpus can be used in complex translators that decode sentences with different networks after determining the subject of the content. Languages differ considerably between their formal and informal styles, in both vocabulary and syntax. Another advantage of this method is that it yields a corpus of the informal style of a language, because most movie dialogue is conversational and therefore informal. This feature of the generated corpus can be used in real-time translators to produce more accurate translations of conversations.
Submitted 11 April, 2018;
originally announced April 2018.