-
Multi-Scale Convolutional LSTM with Transfer Learning for Anomaly Detection in Cellular Networks
Authors:
Nooruddin Noonari,
Daniel Corujo,
Rui L. Aguiar,
Francisco J. Ferrao
Abstract:
The rapid growth in mobile broadband usage and increasing subscribers have made it crucial to ensure reliable network performance. As mobile networks grow more complex, especially during peak hours, manual collection of Key Performance Indicators (KPIs) is time-consuming due to the vast data involved. Detecting network failures and identifying unusual behavior during busy periods is vital to assess network health. Researchers have applied Deep Learning (DL) and Machine Learning (ML) techniques to understand network behavior by predicting throughput, analyzing call records, and detecting outages. However, these methods often require significant computational power and large labeled datasets, and are typically specialized, making retraining for new scenarios costly and time-intensive.
This study introduces a novel approach, a Multi-Scale Convolutional LSTM with Transfer Learning (TL), to detect anomalies in cellular networks. The model is initially trained from scratch using a publicly available dataset to learn typical network behavior. Transfer Learning is then employed to fine-tune the model by applying learned weights to different datasets. We compare the performance of the model trained from scratch with that of the fine-tuned model using TL. To address class imbalance and gain deeper insights, Exploratory Data Analysis (EDA) and the Synthetic Minority Over-sampling Technique (SMOTE) are applied. Results demonstrate that the model trained from scratch achieves 99% accuracy after 100 epochs, while the fine-tuned model reaches 95% accuracy on a different dataset after just 20 epochs.
Submitted 30 September, 2024;
originally announced October 2024.
-
Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use
Authors:
Franz Louis Cesista,
Rui Aguiar,
Jason Kim,
Paolo Acilo
Abstract:
Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). In this paper, we argue that BDIE is best modeled as a Tool Use problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state of the art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks.
The contributions of this paper are threefold: (1) We show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multimodal Models (LMMs) without RASG on BDIE benchmarks. (2) We propose a new metric class for Line Items Recognition, General Line Items Recognition Metric (GLIRM), that is more aligned with practical BDIE use cases compared to existing metrics, such as ANLS*, DocILE, and GriTS. (3) We provide a heuristic algorithm for backcalculating bounding boxes of predicted line items and tables without the need for vision encoders. Finally, we claim that, while LMMs might sometimes offer marginal performance benefits, LLMs + RASG is oftentimes superior given real-world applications and constraints of BDIE.
Submitted 30 May, 2024;
originally announced May 2024.
-
Effect of the initial configuration of weights on the training and function of artificial neural networks
Authors:
R. J. Jesus,
M. L. Antunes,
R. A. da Costa,
S. N. Dorogovtsev,
J. F. F. Mendes,
R. L. Aguiar
Abstract:
The function and performance of neural networks are largely determined by the evolution of their weights and biases in the process of training, starting from the initial configuration of these parameters to one of the local minima of the loss function. We perform the quantitative statistical characterization of the deviation of the weights of two-hidden-layer ReLU networks of various sizes trained via Stochastic Gradient Descent (SGD) from their initial random configuration. We compare the evolution of the distribution function of this deviation with the evolution of the loss during training. We observed that successful training via SGD leaves the network in the close neighborhood of the initial configuration of its weights. For each initial weight of a link we measured the distribution function of the deviation from this value after training and found how the moments of this distribution and its peak depend on the initial weight. We explored the evolution of these deviations during training and observed an abrupt increase within the overfitting region. This jump occurs simultaneously with a similarly abrupt increase recorded in the evolution of the loss function. Our results suggest that SGD's ability to efficiently find local minima is restricted to the vicinity of the random initial configuration of weights.
Submitted 4 December, 2020;
originally announced December 2020.
-
Selecting Regions of Interest in Large Multi-Scale Images for Cancer Pathology
Authors:
Rui Aguiar,
Jon Braatz
Abstract:
Recent breakthroughs in object detection and image classification using Convolutional Neural Networks (CNNs) are revolutionizing the state of the art in medical imaging, and microscopy in particular presents abundant opportunities for computer vision algorithms to assist medical professionals in diagnosis of diseases ranging from malaria to cancer. High resolution scans of microscopy slides called Whole Slide Images (WSIs) offer enough information for a cancer pathologist to come to a conclusion regarding cancer presence, subtype, and severity based on measurements of features within the slide image at multiple scales and resolutions. WSIs' extremely high resolutions and feature scales ranging from gross anatomical structures down to cell nuclei preclude the use of standard CNN models for object detection and classification, which have typically been designed for images with dimensions in the hundreds of pixels and with objects on the order of the size of the image itself. We explore parallel approaches based on Reinforcement Learning and Beam Search to learn to progressively zoom into the WSI to detect Regions of Interest (ROIs) in liver pathology slides containing one of two types of liver cancer, namely Hepatocellular Carcinoma (HCC) and Cholangiocarcinoma (CC). These ROIs can then be presented directly to the pathologist to aid in measurement and diagnosis or be used for automated classification of tumor subtype.
Submitted 3 July, 2020;
originally announced July 2020.
-
Exploring Optimal Control With Observations at a Cost
Authors:
Rui Aguiar,
Nikka Mofid,
Hyunji Alex Nam
Abstract:
There is a current trend in the reinforcement learning for healthcare literature in which, to prepare clinical datasets, researchers carry forward the last result of a non-administered test, known as the last-observation-carried-forward (LOCF) value, to fill in gaps, assuming that it is still an accurate indicator of the patient's current state. These values are carried forward without maintaining information about exactly how they were imputed, leading to ambiguity. Our approach models this problem using OpenAI Gym's Mountain Car and aims to address when to observe the patient's physiological state and, partly, how to intervene, as we have assumed we can only act following an observation. So far, we have found that, for a last-observation-carried-forward implementation of the state space, augmenting the state with counters for each state variable that track the time since the last observation was made improves the predictive performance of an agent, supporting the notion of "informative missingness". Using a neural-network-based Dynamics Model to predict the most probable next state value of non-observed state variables, instead of carrying forward the last observed value through LOCF, further improves the agent's performance, leading to faster convergence and reduced variance.
Submitted 28 June, 2020;
originally announced June 2020.
-
Autonomous Haiku Generation
Authors:
Rui Aguiar,
Kevin Liao
Abstract:
Artificial Intelligence is an excellent tool to improve efficiency and lower cost in many quantitative real-world applications, but what if the task is not easily defined? What if the task is generating creativity? Poetry is a creative endeavor that is highly difficult to both grasp and achieve with any level of competence. As Rita Dove, a famous American poet and author, states, "Poetry is language at its most distilled and most powerful." Taking Dove's quote as an inspiration, our task was to generate high quality haikus using artificial intelligence and deep learning.
Submitted 20 June, 2019;
originally announced June 2019.
-
Decentralized Resource Discovery and Management for Future Manycore Systems
Authors:
Javad Zarrin,
Rui L. Aguiar,
Joao Paulo Barraca
Abstract:
The next generation of many-core enabled large-scale computing systems relies on thousands of billions of heterogeneous processing cores connected to form a single computing unit. In such large-scale computing environments, resource management is one of the most challenging and complex issues for efficient resource sharing and utilization, particularly as we move toward Future ManyCore Systems (FMCS). This work proposes a novel resource management scheme for future peta-scale many-core-enabled computing systems, based on hybrid adaptive resource discovery, called ElCore. The proposed architecture contains a set of modules which will dynamically be instantiated on the nodes in the distributed system on demand. Our approach provides flexibility to allocate the required set of resources for various types of processes/applications. It can also be considered as a generic solution (with respect to the general requirements of large scale computing environments) which brings a set of interesting features (such as auto-scaling, multitenancy, multi-dimensional mapping, etc.) to facilitate its easy adaptation to any distributed technology (such as SOA, Grid and HPC many-core). The evaluation results confirm the significant scalability and the high quality resource mapping of the proposed resource discovery and management over highly heterogeneous, hierarchical and dynamic computing environments with respect to several scalability and efficiency aspects while supporting flexible and complex queries with guaranteed discovery results accuracy. The simulation results show that, using our approach, the mapping between processes and resources can be done with a high level of accuracy, which potentially leads to a significant enhancement in the overall system performance.
Submitted 10 October, 2017;
originally announced October 2017.
-
Artificial Intelligence MArkup Language: A Brief Tutorial
Authors:
Maria das Graças Bruno Marietto,
Rafael Varago de Aguiar,
Gislene de Oliveira Barbosa,
Wagner Tanaka Botelho,
Edson Pimentel,
Robson dos Santos França,
Vera Lúcia da Silva
Abstract:
The purpose of this paper is to serve as a reference guide for the development of chatterbots implemented with the AIML language. In order to achieve this, the main concepts in the Pattern Recognition area are described, because AIML uses this theoretical framework in its syntactic and semantic structures. After that, the AIML language is described and each AIML command/tag is followed by an application example. Also, the usage of AIML embedded tags for the handling of sequence dialogue limitations between humans and machines is shown. Finally, computer systems that assist in the design of chatterbots with the AIML language are classified and described.
Submitted 11 July, 2013;
originally announced July 2013.