-
How Bidirectionality Helps Language Models Learn Better via Dynamic Bottleneck Estimation
Authors:
Md Kowsher,
Nusrat Jahan Prottasha,
Shiyun Xu,
Shetu Mohanto,
Chen Chen,
Ozlem Garibay,
Niloofar Yousefi
Abstract:
Bidirectional language models have better context understanding and perform better than unidirectional models on natural language understanding tasks, yet the theoretical reasons behind this advantage remain unclear. In this work, we investigate this disparity through the lens of the Information Bottleneck (IB) principle, which formalizes a trade-off between compressing input information and prese…
▽ More
Bidirectional language models have better context understanding and perform better than unidirectional models on natural language understanding tasks, yet the theoretical reasons behind this advantage remain unclear. In this work, we investigate this disparity through the lens of the Information Bottleneck (IB) principle, which formalizes a trade-off between compressing input information and preserving task-relevant content. We propose FlowNIB, a dynamic and scalable method for estimating mutual information during training that addresses key limitations of classical IB approaches, including computational intractability and fixed trade-off schedules. Theoretically, we show that bidirectional models retain more mutual information and exhibit higher effective dimensionality than unidirectional models. To support this, we present a generalized framework for measuring representational complexity and prove that bidirectional representations are strictly more informative under mild conditions. We further validate our findings through extensive experiments across multiple models and tasks using FlowNIB, revealing how information is encoded and compressed throughout training. Together, our work provides a principled explanation for the effectiveness of bidirectional architectures and introduces a practical tool for analyzing information flow in deep language models.
△ Less
Submitted 2 June, 2025; v1 submitted 1 June, 2025;
originally announced June 2025.
-
Predicting Through Generation: Why Generation Is Better for Prediction
Authors:
Md Kowsher,
Nusrat Jahan Prottasha,
Prakash Bhat,
Chun-Nam Yu,
Mojtaba Soltanalian,
Ivan Garibay,
Ozlem Garibay,
Chen Chen,
Niloofar Yousefi
Abstract:
This paper argues that generating output tokens is more effective than using pooled representations for prediction tasks because token-level generation retains more mutual information. Since LLMs are trained on massive text corpora using next-token prediction, generation aligns naturally with their learned behavior. Using the Data Processing Inequality (DPI), we provide both theoretical and empiri…
▽ More
This paper argues that generating output tokens is more effective than using pooled representations for prediction tasks because token-level generation retains more mutual information. Since LLMs are trained on massive text corpora using next-token prediction, generation aligns naturally with their learned behavior. Using the Data Processing Inequality (DPI), we provide both theoretical and empirical evidence supporting this claim. However, autoregressive models face two key challenges when used for prediction: (1) exposure bias, where the model sees ground truth tokens during training but relies on its own predictions during inference, leading to errors, and (2) format mismatch, where discrete tokens do not always align with the tasks required output structure. To address these challenges, we introduce PredGen(Predicting Through Generating), an end to end framework that (i) uses scheduled sampling to reduce exposure bias, and (ii) introduces a task adapter to convert the generated tokens into structured outputs. Additionally, we introduce Writer-Director Alignment Loss (WDAL), which ensures consistency between token generation and final task predictions, improving both text coherence and numerical accuracy. We evaluate PredGen on multiple classification and regression benchmarks. Our results show that PredGen consistently outperforms standard baselines, demonstrating its effectiveness in structured prediction tasks.
△ Less
Submitted 26 May, 2025; v1 submitted 24 February, 2025;
originally announced February 2025.
-
BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting
Authors:
Mohammad Jahid Ibna Basher,
Md Kowsher,
Md Saiful Islam,
Rabindra Nath Nandi,
Nusrat Jahan Prottasha,
Mehadi Hasan Menon,
Tareq Al Muntasir,
Shammur Absar Chowdhury,
Firoj Alam,
Niloofar Yousefi,
Ozlem Ozmen Garibay
Abstract:
This paper introduces BnTTS (Bangla Text-To-Speech), the first framework for Bangla speaker adaptation-based TTS, designed to bridge the gap in Bangla speech synthesis using minimal training data. Building upon the XTTS architecture, our approach integrates Bangla into a multilingual TTS pipeline, with modifications to account for the phonetic and linguistic characteristics of the language. We pre…
▽ More
This paper introduces BnTTS (Bangla Text-To-Speech), the first framework for Bangla speaker adaptation-based TTS, designed to bridge the gap in Bangla speech synthesis using minimal training data. Building upon the XTTS architecture, our approach integrates Bangla into a multilingual TTS pipeline, with modifications to account for the phonetic and linguistic characteristics of the language. We pre-train BnTTS on 3.85k hours of Bangla speech dataset with corresponding text labels and evaluate performance in both zero-shot and few-shot settings on our proposed test dataset. Empirical evaluations in few-shot settings show that BnTTS significantly improves the naturalness, intelligibility, and speaker fidelity of synthesized Bangla speech. Compared to state-of-the-art Bangla TTS systems, BnTTS exhibits superior performance in Subjective Mean Opinion Score (SMOS), Naturalness, and Clarity metrics.
△ Less
Submitted 8 February, 2025;
originally announced February 2025.
-
Does Self-Attention Need Separate Weights in Transformers?
Authors:
Md Kowsher,
Nusrat Jahan Prottasha,
Chun-Nam Yu,
Ozlem Ozmen Garibay,
Niloofar Yousefi
Abstract:
The success of self-attention lies in its ability to capture long-range dependencies and enhance context understanding, but it is limited by its computational complexity and challenges in handling sequential data with inherent directionality. This work introduces a shared weight self-attention-based BERT model that only learns one weight matrix for (Key, Value, and Query) representations instead o…
▽ More
The success of self-attention lies in its ability to capture long-range dependencies and enhance context understanding, but it is limited by its computational complexity and challenges in handling sequential data with inherent directionality. This work introduces a shared weight self-attention-based BERT model that only learns one weight matrix for (Key, Value, and Query) representations instead of three individual matrices for each of them. Our shared weight attention reduces the training parameter size by more than half and training time by around one-tenth. Furthermore, we demonstrate higher prediction accuracy on small tasks of GLUE over the BERT baseline and in particular a generalization power on noisy and out-of-domain data. Experimental results indicate that our shared self-attention method achieves a parameter size reduction of 66.53% in the attention block. In the GLUE dataset, the shared weight self-attention-based BERT model demonstrates accuracy improvements of 0.38%, 5.81%, and 1.06% over the standard, symmetric, and pairwise attention-based BERT models, respectively. The model and source code are available at Anonymous.
△ Less
Submitted 2 May, 2025; v1 submitted 29 November, 2024;
originally announced December 2024.
-
LLM-Mixer: Multiscale Mixing in LLMs for Time Series Forecasting
Authors:
Md Kowsher,
Md. Shohanur Islam Sobuj,
Nusrat Jahan Prottasha,
E. Alejandro Alanis,
Ozlem Ozmen Garibay,
Niloofar Yousefi
Abstract:
Time series forecasting remains a challenging task, particularly in the context of complex multiscale temporal patterns. This study presents LLM-Mixer, a framework that improves forecasting accuracy through the combination of multiscale time-series decomposition with pre-trained LLMs (Large Language Models). LLM-Mixer captures both short-term fluctuations and long-term trends by decomposing the da…
▽ More
Time series forecasting remains a challenging task, particularly in the context of complex multiscale temporal patterns. This study presents LLM-Mixer, a framework that improves forecasting accuracy through the combination of multiscale time-series decomposition with pre-trained LLMs (Large Language Models). LLM-Mixer captures both short-term fluctuations and long-term trends by decomposing the data into multiple temporal resolutions and processing them with a frozen LLM, guided by a textual prompt specifically designed for time-series data. Extensive experiments conducted on multivariate and univariate datasets demonstrate that LLM-Mixer achieves competitive performance, outperforming recent state-of-the-art models across various forecasting horizons. This work highlights the potential of combining multiscale analysis and LLMs for effective and scalable time-series forecasting.
△ Less
Submitted 1 June, 2025; v1 submitted 15 October, 2024;
originally announced October 2024.
-
RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates
Authors:
Md Kowsher,
Tara Esmaeilbeig,
Chun-Nam Yu,
Chen Chen,
Mojtaba Soltanalian,
Niloofar Yousefi
Abstract:
We propose RoCoFT, a parameter-efficient fine-tuning method for large-scale language models (LMs) based on updating only a few rows and columns of the weight matrices in transformers. Through extensive experiments with medium-size LMs like BERT and RoBERTa, and larger LMs like Bloom-7B, Llama2-7B, and Llama2-13B, we show that our method gives comparable or better accuracies than state-of-art PEFT…
▽ More
We propose RoCoFT, a parameter-efficient fine-tuning method for large-scale language models (LMs) based on updating only a few rows and columns of the weight matrices in transformers. Through extensive experiments with medium-size LMs like BERT and RoBERTa, and larger LMs like Bloom-7B, Llama2-7B, and Llama2-13B, we show that our method gives comparable or better accuracies than state-of-art PEFT methods while also being more memory and computation-efficient. We also study the reason behind the effectiveness of our method with tools from neural tangent kernel theory. We empirically demonstrate that our kernel, constructed using a restricted set of row and column parameters, are numerically close to the full-parameter kernel and gives comparable classification performance. Ablation studies are conducted to investigate the impact of different algorithmic choices, including the selection strategy for rows and columns as well as the optimal rank for effective implementation of our method.
△ Less
Submitted 1 June, 2025; v1 submitted 13 October, 2024;
originally announced October 2024.
-
Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning
Authors:
Nusrat Jahan Prottasha,
Asif Mahmud,
Md. Shohanur Islam Sobuj,
Prakash Bhat,
Md Kowsher,
Niloofar Yousefi,
Ozlem Ozmen Garibay
Abstract:
Large Language Models (LLMs) are gaining significant popularity in recent years for specialized tasks using prompts due to their low computational cost. Standard methods like prefix tuning utilize special, modifiable tokens that lack semantic meaning and require extensive training for best performance, often falling short. In this context, we propose a novel method called Semantic Knowledge Tuning…
▽ More
Large Language Models (LLMs) are gaining significant popularity in recent years for specialized tasks using prompts due to their low computational cost. Standard methods like prefix tuning utilize special, modifiable tokens that lack semantic meaning and require extensive training for best performance, often falling short. In this context, we propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens. This method involves using a fixed LLM to understand and process the semantic content of the prompt through zero-shot capabilities. Following this, it integrates the processed prompt with the input text to improve the model's performance on particular tasks. Our experimental results show that SK-Tuning exhibits faster training times, fewer parameters, and superior performance on tasks such as text classification and understanding compared to other tuning methods. This approach offers a promising method for optimizing the efficiency and effectiveness of LLMs in processing language tasks.
△ Less
Submitted 11 October, 2024;
originally announced October 2024.
-
Morphological Detection and Classification of Microplastics and Nanoplastics Emerged from Consumer Products by Deep Learning
Authors:
Hadi Rezvani,
Navid Zarrabi,
Ishaan Mehta,
Christopher Kolios,
Hussein Ali Jaafar,
Cheng-Hao Kao,
Sajad Saeedi,
Nariman Yousefi
Abstract:
Plastic pollution presents an escalating global issue, impacting health and environmental systems, with micro- and nanoplastics found across mediums from potable water to air. Traditional methods for studying these contaminants are labor-intensive and time-consuming, necessitating a shift towards more efficient technologies. In response, this paper introduces micro- and nanoplastics (MiNa), a nove…
▽ More
Plastic pollution presents an escalating global issue, impacting health and environmental systems, with micro- and nanoplastics found across mediums from potable water to air. Traditional methods for studying these contaminants are labor-intensive and time-consuming, necessitating a shift towards more efficient technologies. In response, this paper introduces micro- and nanoplastics (MiNa), a novel and open-source dataset engineered for the automatic detection and classification of micro and nanoplastics using object detection algorithms. The dataset, comprising scanning electron microscopy images simulated under realistic aquatic conditions, categorizes plastics by polymer type across a broad size spectrum. We demonstrate the application of state-of-the-art detection algorithms on MiNa, assessing their effectiveness and identifying the unique challenges and potential of each method. The dataset not only fills a critical gap in available resources for microplastic research but also provides a robust foundation for future advancements in the field.
△ Less
Submitted 20 September, 2024;
originally announced September 2024.
-
Characterizing Multimedia Information Environment through Multi-modal Clustering of YouTube Videos
Authors:
Niloofar Yousefi,
Mainuddin Shaik,
Nitin Agarwal
Abstract:
This study aims to investigate the comprehensive characterization of information content in multimedia (videos), particularly on YouTube. The research presents a multi-method framework for characterizing multimedia content by clustering signals from various modalities, such as audio, video, and text. With a focus on South China Sea videos as a case study, this approach aims to enhance our understa…
▽ More
This study aims to investigate the comprehensive characterization of information content in multimedia (videos), particularly on YouTube. The research presents a multi-method framework for characterizing multimedia content by clustering signals from various modalities, such as audio, video, and text. With a focus on South China Sea videos as a case study, this approach aims to enhance our understanding of online content, especially on YouTube. The dataset includes 160 videos, and our findings offer insights into content themes and patterns within different modalities of a video based on clusters. Text modality analysis revealed topical themes related to geopolitical countries, strategies, and global security, while video and audio modality analysis identified distinct patterns of signals related to diverse sets of videos, including news analysis/reporting, educational content, and interviews. Furthermore, our findings uncover instances of content repurposing within video clusters, which were identified using the barcode technique and audio similarity assessments. These findings indicate potential content amplification techniques. In conclusion, this study uniquely enhances our current understanding of multimedia content information based on modality clustering techniques.
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
Multimodal Characterization of Emotion within Multimedia Space
Authors:
Dayo Samuel Banjo,
Connice Trimmingham,
Niloofar Yousefi,
Nitin Agarwal
Abstract:
Technological advancement and its omnipresent connection have pushed humans past the boundaries and limitations of a computer screen, physical state, or geographical location. It has provided a depth of avenues that facilitate human-computer interaction that was once inconceivable such as audio and body language detection. Given the complex modularities of emotions, it becomes vital to study human…
▽ More
Technological advancement and its omnipresent connection have pushed humans past the boundaries and limitations of a computer screen, physical state, or geographical location. It has provided a depth of avenues that facilitate human-computer interaction that was once inconceivable such as audio and body language detection. Given the complex modularities of emotions, it becomes vital to study human-computer interaction, as it is the commencement of a thorough understanding of the emotional state of users and, in the context of social networks, the producers of multimodal information. This study first acknowledges the accuracy of classification found within multimodal emotion detection systems compared to unimodal solutions. Second, it explores the characterization of multimedia content produced based on their emotions and the coherence of emotion in different modalities by utilizing deep learning models to classify emotion across different modalities.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation
Authors:
Ali Khodabandeh Yalabadi,
Mehdi Yazdani-Jahromi,
Niloofar Yousefi,
Aida Tayebi,
Sina Abdidizaji,
Ozlem Ozmen Garibay
Abstract:
Drug-Target Interaction (DTI) prediction is vital for drug discovery, yet challenges persist in achieving model interpretability and optimizing performance. We propose a novel transformer-based model, FragXsiteDTI, that aims to address these challenges in DTI prediction. Notably, FragXsiteDTI is the first DTI model to simultaneously leverage drug molecule fragments and protein pockets. Our informa…
▽ More
Drug-Target Interaction (DTI) prediction is vital for drug discovery, yet challenges persist in achieving model interpretability and optimizing performance. We propose a novel transformer-based model, FragXsiteDTI, that aims to address these challenges in DTI prediction. Notably, FragXsiteDTI is the first DTI model to simultaneously leverage drug molecule fragments and protein pockets. Our information-rich representations for both proteins and drugs offer a detailed perspective on their interaction. Inspired by the Perceiver IO framework, our model features a learnable latent array, initially interacting with protein binding site embeddings using cross-attention and later refined through self-attention and used as a query to the drug fragments in the drug's cross-attention transformer block. This learnable query array serves as a mediator and enables seamless information translation, preserving critical nuances in drug-protein interactions. Our computational results on three benchmarking datasets demonstrate the superior predictive power of our model over several state-of-the-art models. We also show the interpretability of our model in terms of the critical components of both target proteins and drug molecules within drug-target pairs.
△ Less
Submitted 4 November, 2023;
originally announced November 2023.
-
Comparing Toxicity Across Social Media Platforms for COVID-19 Discourse
Authors:
Nahiyan Bin Noor,
Niloofar Yousefi,
Billy Spann,
Nitin Agarwal
Abstract:
The emergence of toxic information on social networking sites, such as Twitter, Parler, and Reddit, has become a growing concern. Consequently, this study aims to assess the level of toxicity in COVID-19 discussions on Twitter, Parler, and Reddit. Using data analysis from January 1 through December 31, 2020, we examine the development of toxicity over time and compare the findings across the three…
▽ More
The emergence of toxic information on social networking sites, such as Twitter, Parler, and Reddit, has become a growing concern. Consequently, this study aims to assess the level of toxicity in COVID-19 discussions on Twitter, Parler, and Reddit. Using data analysis from January 1 through December 31, 2020, we examine the development of toxicity over time and compare the findings across the three platforms. The results indicate that Parler had lower toxicity levels than both Twitter and Reddit in discussions related to COVID-19. In contrast, Reddit showed the highest levels of toxicity, largely due to various anti-vaccine forums that spread misinformation about COVID-19 vaccines. Notably, our analysis of COVID-19 vaccination conversations on Twitter also revealed a significant presence of conspiracy theories among individuals with highly toxic attitudes. Our computational approach provides decision-makers with useful information about reducing the spread of toxicity within online communities. The study's findings highlight the importance of taking action to encourage more uplifting and productive online discourse across all platforms.
△ Less
Submitted 26 April, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Sequential Deep Learning for Credit Risk Monitoring with Tabular Financial Data
Authors:
Jillian M. Clements,
Di Xu,
Nooshin Yousefi,
Dmitry Efimov
Abstract:
Machine learning plays an essential role in preventing financial losses in the banking industry. Perhaps the most pertinent prediction task that can result in billions of dollars in losses each year is the assessment of credit risk (i.e., the risk of default on debt). Today, much of the gains from machine learning to predict credit risk are driven by gradient boosted decision tree models. However,…
▽ More
Machine learning plays an essential role in preventing financial losses in the banking industry. Perhaps the most pertinent prediction task that can result in billions of dollars in losses each year is the assessment of credit risk (i.e., the risk of default on debt). Today, much of the gains from machine learning to predict credit risk are driven by gradient boosted decision tree models. However, these gains begin to plateau without the addition of expensive new data sources or highly engineered features. In this paper, we present our attempts to create a novel approach to assessing credit risk using deep learning that does not rely on new model inputs. We propose a new credit card transaction sampling technique to use with deep recurrent and causal convolution-based neural networks that exploits long historical sequences of financial data without costly resource requirements. We show that our sequential deep learning approach using a temporal convolutional network outperformed the benchmark non-sequential tree-based model, achieving significant financial savings and earlier detection of credit risk. We also demonstrate the potential for our approach to be used in a production environment, where our sampling technique allows for sequences to be stored efficiently in memory and used for fast online learning and inference.
△ Less
Submitted 30 December, 2020;
originally announced December 2020.
-
Probabilistic Model of Narratives Over Topical Trends in Social Media: A Discrete Time Model
Authors:
Toktam A. Oghaz,
Ece C. Mutlu,
Jasser Jasser,
Niloofar Yousefi,
Ivan Garibay
Abstract:
Online social media platforms are turning into the prime source of news and narratives about worldwide events. However,a systematic summarization-based narrative extraction that can facilitate communicating the main underlying events is lacking. To address this issue, we propose a novel event-based narrative summary extraction framework. Our proposed framework is designed as a probabilistic topic…
▽ More
Online social media platforms are turning into the prime source of news and narratives about worldwide events. However,a systematic summarization-based narrative extraction that can facilitate communicating the main underlying events is lacking. To address this issue, we propose a novel event-based narrative summary extraction framework. Our proposed framework is designed as a probabilistic topic model, with categorical time distribution, followed by extractive text summarization. Our topic model identifies topics' recurrence over time with a varying time resolution. This framework not only captures the topic distributions from the data, but also approximates the user activity fluctuations over time. Furthermore, we define significance-dispersity trade-off (SDT) as a comparison measure to identify the topic with the highest lifetime attractiveness in a timestamped corpus. We evaluate our model on a large corpus of Twitter data, including more than one million tweets in the domain of the disinformation campaigns conducted against the White Helmets of Syria. Our results indicate that the proposed framework is effective in identifying topical trends, as well as extracting narrative summaries from text corpus with timestamped data.
△ Less
Submitted 14 April, 2020;
originally announced April 2020.
-
Deep Agent: Studying the Dynamics of Information Spread and Evolution in Social Networks
Authors:
Ivan Garibay,
Toktam A. Oghaz,
Niloofar Yousefi,
Ece C. Mutlu,
Madeline Schiappa,
Steven Scheinert,
Georgios C. Anagnostopoulos,
Christina Bouwens,
Stephen M. Fiore,
Alexander Mantzaris,
John T. Murphy,
William Rand,
Anastasia Salter,
Mel Stanfill,
Gita Sukthankar,
Nisha Baral,
Gabriel Fair,
Chathika Gunaratne,
Neda B. Hajiakhoond,
Jasser Jasser,
Chathura Jayalath,
Olivia Newton,
Samaneh Saadat,
Chathurani Senevirathna,
Rachel Winter
, et al. (1 additional authors not shown)
Abstract:
This paper explains the design of a social network analysis framework, developed under DARPA's SocialSim program, with novel architecture that models human emotional, cognitive and social factors. Our framework is both theory and data-driven, and utilizes domain expertise. Our simulation effort helps in understanding how information flows and evolves in social media platforms. We focused on modeli…
▽ More
This paper explains the design of a social network analysis framework, developed under DARPA's SocialSim program, with novel architecture that models human emotional, cognitive and social factors. Our framework is both theory and data-driven, and utilizes domain expertise. Our simulation effort helps in understanding how information flows and evolves in social media platforms. We focused on modeling three information domains: cryptocurrencies, cyber threats, and software vulnerabilities for the three interrelated social environments: GitHub, Reddit, and Twitter. We participated in the SocialSim DARPA Challenge in December 2018, in which our models were subjected to extensive performance evaluation for accuracy, generalizability, explainability, and experimental power. This paper reports the main concepts and models, utilized in our social media modeling effort in developing a multi-resolution simulation at the user, community, population, and content levels.
△ Less
Submitted 29 May, 2021; v1 submitted 25 March, 2020;
originally announced March 2020.
-
Facial Expression Phoenix (FePh): An Annotated Sequenced Dataset for Facial and Emotion-Specified Expressions in Sign Language
Authors:
Marie Alaghband,
Niloofar Yousefi,
Ivan Garibay
Abstract:
Facial expressions are important parts of both gesture and sign language recognition systems. Despite the recent advances in both fields, annotated facial expression datasets in the context of sign language are still scarce resources. In this manuscript, we introduce an annotated sequenced facial expression dataset in the context of sign language, comprising over $3000$ facial images extracted fro…
▽ More
Facial expressions are important parts of both gesture and sign language recognition systems. Despite the recent advances in both fields, annotated facial expression datasets in the context of sign language are still scarce resources. In this manuscript, we introduce an annotated sequenced facial expression dataset in the context of sign language, comprising over $3000$ facial images extracted from the daily news and weather forecast of the public tv-station PHOENIX. Unlike the majority of currently existing facial expression datasets, FePh provides sequenced semi-blurry facial images with different head poses, orientations, and movements. In addition, in the majority of images, identities are mouthing the words, which makes the data more challenging. To annotate this dataset we consider primary, secondary, and tertiary dyads of seven basic emotions of "sad", "surprise", "fear", "angry", "neutral", "disgust", and "happy". We also considered the "None" class if the image's facial expression could not be described by any of the aforementioned emotions. Although we provide FePh as a facial expression dataset of signers in sign language, it has a wider application in gesture recognition and Human Computer Interaction (HCI) systems.
△ Less
Submitted 8 September, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.
-
Workload Scheduling on heterogeneous Mobile Edge Cloud in 5G networks to Minimize SLA Violation
Authors:
Mostafa Hadadian Nejad Yousefi,
Amirmasoud Ghiassi,
Boshra Sadat Hashemi,
Maziar Goudarzi
Abstract:
Smart devices have become an indispensable part of our lives and gain increasing applicability in almost every area. Latency-aware applications such as Augmented Reality (AR), autonomous driving, and online gaming demand more resources such as network bandwidth and computational capabilities. Since the traditional mobile networks cannot fulfill the required bandwidth and latency, Mobile Edge Cloud…
▽ More
Smart devices have become an indispensable part of our lives and gain increasing applicability in almost every area. Latency-aware applications such as Augmented Reality (AR), autonomous driving, and online gaming demand more resources such as network bandwidth and computational capabilities. Since the traditional mobile networks cannot fulfill the required bandwidth and latency, Mobile Edge Cloud (MEC) emerged to provide cloud computing capabilities in the proximity of users on 5G networks. In this paper, we consider a heterogeneous MEC network with numerous mobile users that send their tasks to MEC servers. Each task has a maximum acceptable response time. Non-uniform distribution of users makes some MEC servers hotspots that cannot take more. A solution is to relocate the tasks among MEC servers, called Workload Migration. We formulate this problem of task scheduling as a mixed-integer non-linear optimization problem to minimize the number of Service Level Agreement (SLA) violations. Since solving this optimization problem has high computational complexity, we introduce a greedy algorithm called MESA, Migration Enabled Scheduling Algorithm, which reaches a near-optimal solution quickly. Our experiments show that in the term of SLA violation, MESA is only 8% and 11% far from the optimal choice on the average and the worst-case, respectively. Moreover, the migration enabled solution can reduce SLA violations by about 30% compare to assigning tasks to MEC servers without migration.
△ Less
Submitted 21 March, 2020; v1 submitted 5 March, 2020;
originally announced March 2020.
-
A storage expansion planning framework using reinforcement learning and simulation-based optimization
Authors:
S. Tsianikas,
N. Yousefi,
J. Zhou,
M. Rodgers,
D. W. Coit
Abstract:
In the wake of the highly electrified future ahead of us, the role of energy storage is crucial wherever distributed generation is abundant, such as in microgrid settings. Given the variety of storage options that are becoming more and more economical, determining which type of storage technology to invest in, along with the appropriate timing and capacity becomes a critical research question. It…
▽ More
In the wake of the highly electrified future ahead of us, the role of energy storage is crucial wherever distributed generation is abundant, such as in microgrid settings. Given the variety of storage options that are becoming more and more economical, determining which type of storage technology to invest in, along with the appropriate timing and capacity becomes a critical research question. It is inevitable that these problems will continue to become increasingly relevant in the future and require strategic planning and holistic and modern frameworks in order to be solved. Reinforcement Learning algorithms have already proven to be successful in problems where sequential decision-making is inherent. In the operations planning area, these algorithms are already used but mostly in short-term problems with well-defined constraints. On the contrary, we expand and tailor these techniques to long-term planning by utilizing model-free algorithms combined with simulation-based models. A model and expansion plan have been developed to optimally determine microgrid designs as they evolve to dynamically react to changing conditions and to exploit energy storage capabilities. We show that it is possible to derive better engineering solutions that would point to the types of energy storage units which could be at the core of future microgrid applications. Another key finding is that the optimal storage capacity threshold for a system depends heavily on the price movements of the available storage units. By utilizing the proposed approaches, it is possible to model inherent problem uncertainties and optimize the whole streamline of sequential investment decision-making.
△ Less
Submitted 24 March, 2021; v1 submitted 10 January, 2020;
originally announced January 2020.
-
A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection
Authors:
Niloofar Yousefi,
Marie Alaghband,
Ivan Garibay
Abstract:
With the increase of credit card usage, the volume of credit card misuse also has significantly increased. As a result, financial organizations are working hard on developing and deploying credit card fraud detection methods, in order to adapt to ever-evolving, increasingly sophisticated defrauding strategies and identifying illicit transactions as quickly as possible to protect themselves and the…
▽ More
With the increase of credit card usage, the volume of credit card misuse also has significantly increased. As a result, financial organizations are working hard on developing and deploying credit card fraud detection methods, in order to adapt to ever-evolving, increasingly sophisticated defrauding strategies and identifying illicit transactions as quickly as possible to protect themselves and their customers. Compounding on the complex nature of such adverse strategies, credit card fraudulent activities are rare events compared to the number of legitimate transactions. Hence, the challenge to develop fraud detection that are accurate and efficient is substantially intensified and, as a consequence, credit card fraud detection has lately become a very active area of research. In this work, we provide a survey of current techniques most relevant to the problem of credit card fraud detection. We carry out our survey in two main parts. In the first part,we focus on studies utilizing classical machine learning models, which mostly employ traditional transnational features to make fraud predictions. These models typically rely on some static physical characteristics, such as what the user knows (knowledge-based method), or what he/she has access to (object-based method). In the second part of our survey, we review more advanced techniques of user authentication, which use behavioral biometrics to identify an individual based on his/her unique behavior while he/she is interacting with his/her electronic devices. These approaches rely on how people behave (instead of what they do), which cannot be easily forged. By providing an overview of current approaches and the results reported in the literature, this survey aims to drive the future research agenda for the community in order to develop more accurate, reliable and scalable models of credit card fraud detection.
△ Less
Submitted 2 December, 2019;
originally announced December 2019.
-
DeepFork: Supervised Prediction of Information Diffusion in GitHub
Authors:
Ramya Akula,
Niloofar Yousefi,
Ivan Garibay
Abstract:
Information spreads on complex social networks extremely fast, in other words, a piece of information can go viral within no time. Often it is hard to barricade this diffusion prior to the significant occurrence of chaos, be it a social media or an online coding platform. GitHub is one such trending online focal point for any business to reach their potential contributors and customers, simultaneo…
▽ More
Information spreads on complex social networks extremely fast, in other words, a piece of information can go viral within no time. Often it is hard to barricade this diffusion prior to the significant occurrence of chaos, be it a social media or an online coding platform. GitHub is one such trending online focal point for any business to reach their potential contributors and customers, simultaneously. By exploiting such software development paradigm, millions of free software emerged lately in diverse communities. To understand human influence, information spread and evolution of transmitted information among assorted users in GitHub, we developed a deep neural network model: DeepFork, a supervised machine learning based approach that aims to predict information diffusion in complex social networks; considering node as well as topological features. In our empirical studies, we observed that information diffusion can be detected by link prediction using supervised learning. DeepFork outperforms other machine learning models as it better learns the discriminative patterns from the input features. DeepFork aids in understanding information spread and evolution through a bipartite network of users and repositories i.e., information flow from a user to repository to user.
△ Less
Submitted 17 October, 2019;
originally announced October 2019.
-
Multi-Task Learning Using Neighborhood Kernels
Authors:
Niloofar Yousefi,
Cong Li,
Mansooreh Mollaghasemi,
Georgios Anagnostopoulos,
Michael Georgiopoulos
Abstract:
This paper introduces a new and effective algorithm for learning kernels in a Multi-Task Learning (MTL) setting. Although, we consider a MTL scenario here, our approach can be easily applied to standard single task learning, as well. As shown by our empirical results, our algorithm consistently outperforms the traditional kernel learning algorithms such as uniform combination solution, convex comb…
▽ More
This paper introduces a new and effective algorithm for learning kernels in a Multi-Task Learning (MTL) setting. Although, we consider a MTL scenario here, our approach can be easily applied to standard single task learning, as well. As shown by our empirical results, our algorithm consistently outperforms the traditional kernel learning algorithms such as uniform combination solution, convex combinations of base kernels as well as some kernel alignment-based models, which have been proven to give promising results in the past. We present a Rademacher complexity bound based on which a new Multi-Task Multiple Kernel Learning (MT-MKL) model is derived. In particular, we propose a Support Vector Machine-regularized model in which, for each task, an optimal kernel is learned based on a neighborhood-defining kernel that is not restricted to be positive semi-definite. Comparative experimental results are showcased that underline the merits of our neighborhood-defining framework in both classification and regression problems.
△ Less
Submitted 11 July, 2017;
originally announced July 2017.
-
Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning
Authors:
Niloofar Yousefi,
Yunwen Lei,
Marius Kloft,
Mansooreh Mollaghasemi,
Georgios Anagnostopoulos
Abstract:
We show a Talagrand-type concentration inequality for Multi-Task Learning (MTL), using which we establish sharp excess risk bounds for MTL in terms of distribution- and data-dependent versions of the Local Rademacher Complexity (LRC). We also give a new bound on the LRC for norm regularized as well as strongly convex hypothesis classes, which applies not only to MTL but also to the standard i.i.d.…
▽ More
We show a Talagrand-type concentration inequality for Multi-Task Learning (MTL), using which we establish sharp excess risk bounds for MTL in terms of distribution- and data-dependent versions of the Local Rademacher Complexity (LRC). We also give a new bound on the LRC for norm regularized as well as strongly convex hypothesis classes, which applies not only to MTL but also to the standard i.i.d. setting. Combining both results, one can now easily derive fast-rate bounds on the excess risk for many prominent MTL methods, including---as we demonstrate---Schatten-norm, group-norm, and graph-regularized MTL. The derived bounds reflect a relationship akeen to a conservation law of asymptotic convergence rates. This very relationship allows for trading off slower rates w.r.t. the number of tasks for faster rates with respect to the number of available samples per task, when compared to the rates obtained via a traditional, global Rademacher analysis.
△ Less
Submitted 9 February, 2017; v1 submitted 18 February, 2016;
originally announced February 2016.
-
Multi-Task Learning with Group-Specific Feature Space Sharing
Authors:
Niloofar Yousefi,
Michael Georgiopoulos,
Georgios C. Anagnostopoulos
Abstract:
When faced with learning a set of inter-related tasks from a limited amount of usable data, learning each task independently may lead to poor generalization performance. Multi-Task Learning (MTL) exploits the latent relations between tasks and overcomes data scarcity limitations by co-learning all these tasks simultaneously to offer improved performance. We propose a novel Multi-Task Multiple Kern…
▽ More
When faced with learning a set of inter-related tasks from a limited amount of usable data, learning each task independently may lead to poor generalization performance. Multi-Task Learning (MTL) exploits the latent relations between tasks and overcomes data scarcity limitations by co-learning all these tasks simultaneously to offer improved performance. We propose a novel Multi-Task Multiple Kernel Learning framework based on Support Vector Machines for binary classification tasks. By considering pair-wise task affinity in terms of similarity between a pair's respective feature spaces, the new framework, compared to other similar MTL approaches, offers a high degree of flexibility in determining how similar feature spaces should be, as well as which pairs of tasks should share a common feature space in order to benefit overall performance. The associated optimization problem is solved via a block coordinate descent, which employs a consensus-form Alternating Direction Method of Multipliers algorithm to optimize the Multiple Kernel Learning weights and, hence, to determine task affinities. Empirical evaluation on seven data sets exhibits a statistically significant improvement of our framework's results compared to the ones of several other Clustered Multi-Task Learning methods.
△ Less
Submitted 13 August, 2015;
originally announced August 2015.