Search | arXiv e-print repository

From Anger to Joy: How Nationality Personas Shape Emotion Attribution in Large Language Models

Authors: Mahammed Kamruzzaman, Abdullah Al Monsur, Gene Louis Kim, Anshuman Chhabra

Abstract: Emotions are a fundamental facet of human experience, varying across individuals, cultural contexts, and nationalities. Given the recent success of Large Language Models (LLMs) as role-playing agents, we examine whether LLMs exhibit emotional stereotypes when assigned nationality-specific personas. Specifically, we investigate how different countries are represented in pre-trained LLMs through emotion attributions and whether these attributions align with cultural norms. Our analysis reveals significant nationality-based differences, with emotions such as shame, fear, and joy being disproportionately assigned across regions. Furthermore, we observe notable misalignment between LLM-generated and human emotional responses, particularly for negative emotions, highlighting the presence of reductive and potentially biased stereotypes in LLM outputs.

Submitted 3 June, 2025; originally announced June 2025.

arXiv:2506.00256 [pdf, ps, other]

The Impact of Disability Disclosure on Fairness and Bias in LLM-Driven Candidate Selection

Authors: Mahammed Kamruzzaman, Gene Louis Kim

Abstract: As large language models (LLMs) become increasingly integrated into hiring processes, concerns about fairness have gained prominence. When applying for jobs, companies often request or require demographic information, including gender, race, and disability or veteran status. This data is collected to support diversity and inclusion initiatives, but when it is provided to LLMs, especially disability-related information, it raises concerns about potential biases in candidate selection outcomes. Many studies have highlighted how disability can impact CV screening, yet little research has explored the specific effect of voluntarily disclosed information on LLM-driven candidate selection. This study seeks to bridge that gap. When candidates shared identical gender, race, qualifications, experience, and backgrounds, and sought jobs with minimal employment rate gaps between individuals with and without disabilities (e.g., Cashier, Software Developer), LLMs consistently favored candidates who disclosed that they had no disability. Even when candidates chose not to disclose their disability status, the LLMs were less likely to select them than candidates who explicitly stated they did not have a disability.

Submitted 30 May, 2025; originally announced June 2025.

Comments: Accepted at The 38th International FLAIRS Conference (FLAIRS 2025) (main)

arXiv:2409.11638 [pdf, ps, other]

BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla

Authors: Mahammed Kamruzzaman, Abdullah Al Monsur, Shrabon Das, Enamul Hassan, Gene Louis Kim

Abstract: This study presents BanStereoSet, a dataset designed to evaluate stereotypical social biases in multilingual LLMs for the Bangla language. In an effort to extend the focus of bias research beyond English-centric datasets, we have localized the content from the StereoSet, IndiBias, and Kamruzzaman et al.'s datasets, producing a resource tailored to capture biases prevalent within the Bangla-speaking community. Our BanStereoSet dataset consists of 1,194 sentences spanning 9 categories of bias: race, profession, gender, ageism, beauty, beauty in profession, region, caste, and religion. This dataset not only serves as a crucial tool for measuring bias in multilingual LLMs but also facilitates the exploration of stereotypical bias across different social categories, potentially guiding the development of more equitable language technologies in Bangladeshi contexts. Our analysis of several language models using this dataset indicates significant biases, reinforcing the necessity for culturally and linguistically adapted datasets to develop more equitable language technologies.

Submitted 29 May, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

Comments: Accepted at ACL-2025

arXiv:2409.11636 [pdf, other]

"A Woman is More Culturally Knowledgeable than A Man?": The Effect of Personas on Cultural Norm Interpretation in LLMs

Authors: Mahammed Kamruzzaman, Hieu Nguyen, Nazmul Hassan, Gene Louis Kim

Abstract: As the deployment of large language models (LLMs) expands, there is an increasing demand for personalized LLMs. One method to personalize and guide the outputs of these models is to assign a persona, a role that describes the expected behavior of the LLM (e.g., a man, a woman, an engineer). This study investigates whether an LLM's understanding of social norms varies across assigned personas. Ideally, the perception of a social norm should remain consistent regardless of the persona, since the acceptability of a social norm should be determined by the region the norm originates from, rather than by individual characteristics such as gender, body size, or race. A norm is universal within its cultural context. In our research, we tested 36 distinct personas from 12 sociodemographic categories (e.g., age, gender, beauty) across four different LLMs. We find that LLMs' cultural norm interpretation varies based on the persona used, and that it also varies within a sociodemographic category (e.g., a fat person and a thin person in the physical appearance group), where an LLM with the more socially desirable persona (e.g., a thin person) interprets social norms more accurately than with the less socially desirable persona (e.g., a fat person). We also discuss how different types of social biases may contribute to the results that we observe.

Submitted 17 September, 2024; originally announced September 2024.

Comments: Preprint, Under Review

arXiv:2406.13997 [pdf, other]

"Global is Good, Local is Bad?": Understanding Brand Bias in LLMs

Authors: Mahammed Kamruzzaman, Hieu Minh Nguyen, Gene Louis Kim

Abstract: Many recent studies have investigated social biases in LLMs, but brand bias has received little attention. This research examines the biases exhibited by LLMs towards different brands, a significant concern given the widespread use of LLMs in affected use cases such as product recommendation and market analysis. Biased models may perpetuate societal inequalities, unfairly favoring established global brands while marginalizing local ones. Using a curated dataset across four brand categories, we probe the behavior of LLMs in this space. We find a consistent pattern of bias, both in disproportionately associating global brands with positive attributes and in disproportionately recommending luxury gifts for individuals in high-income countries. We also find that LLMs are subject to country-of-origin effects, which may boost local brand preference in LLM outputs in specific contexts.

Submitted 27 September, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted at EMNLP-2024 (main)

arXiv:2406.13993 [pdf, other]

Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs

Authors: Mahammed Kamruzzaman, Gene Louis Kim

Abstract: Persona assignment has become a common strategy for customizing LLM use to particular tasks and contexts. In this study, we explore how evaluations of different nations change when LLMs are assigned specific nationality personas. We assign 193 different nationality personas (e.g., an American person) to four LLMs and examine how the LLM evaluations (or "perceptions") of countries change. We find that all LLM-persona combinations tend to favor Western European nations, though nation-personas push LLM behaviors to focus more on, and treat more favorably, the nation-persona's own region. Eastern European, Latin American, and African nations are treated more negatively by different nationality personas. We additionally find that evaluations of other nations by nation-persona LLMs correlate with human survey responses but fail to match the values closely. Our study provides insight into how biases and stereotypes are realized within LLMs when adopting different national personas. In line with the "Blueprint for an AI Bill of Rights", our findings underscore the critical need for developing mechanisms to ensure that LLM outputs promote fairness and avoid over-generalization.

Submitted 16 October, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Pre-print, Under review

arXiv:2405.13843 [pdf]

doi 10.1016/j.atech.2024.100533

Hyperspectral Image Reconstruction for Predicting Chick Embryo Mortality Towards Advancing Egg and Hatchery Industry

Authors: Md. Toukir Ahmed, Md Wadud Ahmed, Ocean Monjur, Jason Lee Emmert, Girish Chowdhary, Mohammed Kamruzzaman

Abstract: As the demand for food surges and the agricultural sector undergoes a transformative shift towards sustainability and efficiency, the need for precise and proactive measures to ensure the health and welfare of livestock becomes paramount. In this broader agricultural context, Hyperspectral Imaging (HSI) takes on profound significance. HSI has emerged as a cutting-edge, non-destructive technique for fast and accurate egg quality analysis, including the detection of chick embryo mortality. However, its high cost and operational complexity compared to conventional RGB imaging are significant bottlenecks to the widespread adoption of HSI technology. A promising way to overcome these hurdles and unlock the full potential of HSI is hyperspectral image reconstruction from standard RGB images. This study aims to reconstruct hyperspectral images from RGB images for non-destructive early prediction of chick embryo mortality. First, different image reconstruction algorithms (HRNET, MST++, Restormer, and EDSR) were compared for reconstructing hyperspectral images of eggs in the early incubation period. The reconstructed spectra were then used to differentiate live from dead chick-producing eggs using the XGBoost and Random Forest classification methods. Among the reconstruction methods, HRNET showed impressive reconstruction performance, with an MRAE of 0.0955, an RMSE of 0.0159, and a PSNR of 36.79 dB. This study suggests that harnessing imaging technology integrated with smart sensors and data analytics has the potential to improve automation, enhance biosecurity, and optimize resource management towards sustainable agriculture 4.0.
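The abstract reports reconstruction quality as MRAE, RMSE, and PSNR without defining them. Below is a minimal NumPy sketch of the commonly used definitions, assuming reflectance cubes of shape (height, width, bands) scaled to [0, 1]. The exact normalization (for example the epsilon in MRAE or the peak value in PSNR) varies between reconstruction benchmarks, so these are not necessarily the formulas the authors used.

```python
import numpy as np

def mrae(gt, rec, eps=1e-8):
    """Mean relative absolute error: mean of |gt - rec| / |gt| over all pixels and bands."""
    gt, rec = np.asarray(gt, dtype=float), np.asarray(rec, dtype=float)
    return float(np.mean(np.abs(gt - rec) / (np.abs(gt) + eps)))  # eps guards against gt == 0

def rmse(gt, rec):
    """Root mean square error over all pixels and bands."""
    gt, rec = np.asarray(gt, dtype=float), np.asarray(rec, dtype=float)
    return float(np.sqrt(np.mean((gt - rec) ** 2)))

def psnr(gt, rec, data_range=1.0):
    """Peak signal-to-noise ratio in dB, assuming values lie in [0, data_range]."""
    gt, rec = np.asarray(gt, dtype=float), np.asarray(rec, dtype=float)
    mse = np.mean((gt - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical cubes
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

For example, a cube that is uniformly 0.5 reconstructed with a constant offset of 0.01 gives an RMSE of 0.01, an MRAE of about 0.02, and a PSNR of about 40 dB. Lower MRAE and RMSE and higher PSNR indicate better reconstructions.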

Submitted 22 May, 2024; originally announced May 2024.

Comments: Under review

Journal ref: Smart Agricultural Technology, Volume 9, December 2024

arXiv:2405.13331 [pdf]

Comparative Analysis of Hyperspectral Image Reconstruction Using Deep Learning for Agricultural and Biological Applications

Authors: Md. Toukir Ahmed, Arthur Villordon, Mohammed Kamruzzaman

Abstract: Hyperspectral imaging (HSI) has become a key technology for non-invasive quality evaluation in various fields, offering detailed insights through spatial and spectral data. Despite its efficacy, the complexity and high cost of HSI systems have hindered their widespread adoption. This study addressed these challenges by exploring deep learning-based hyperspectral image reconstruction from RGB (Red, Green, Blue) images, particularly for agricultural products. Specifically, different hyperspectral reconstruction algorithms, such as Hyperspectral Convolutional Neural Network - Dense (HSCNN-D), High-Resolution Network (HRNET), and Multi-Scale Transformer Plus Plus (MST++), were compared to assess the dry matter content of sweet potatoes. Among the tested reconstruction methods, HRNET demonstrated superior performance, achieving the lowest mean relative absolute error (MRAE) of 0.07 and root mean square error (RMSE) of 0.03, and the highest peak signal-to-noise ratio (PSNR) of 32.28 decibels (dB). Key features were selected using a genetic algorithm (GA), and their importance was interpreted using explainable artificial intelligence (XAI). Partial least squares regression (PLSR) models were developed using the RGB, reconstructed, and ground truth (GT) data. The visual and spectral quality of the reconstructed images was compared with the GT data, and prediction maps were generated. The results demonstrate the promise of deep learning-based hyperspectral image reconstruction as a cost-effective and efficient quality assessment tool for agricultural and biological applications.
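The PLSR step can be made concrete with a compact single-response PLS (PLS1) regression fitted by the NIPALS algorithm. This is a textbook formulation, not the authors' code: rows of `X` are assumed to be per-sample spectra (RGB, reconstructed, or ground truth) and `y` the measured dry matter content, and the helper names `pls1_fit` and `pls1_predict` are illustrative.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS regression with NIPALS.

    Returns (coef, x_mean, y_mean) so that y_hat = (X_new - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # centered copies, deflated each step
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                 # weight: direction of max covariance with y
        t = Xk @ w                             # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                      # X loadings
        q = (yk @ t) / tt                      # y loading (a scalar for PLS1)
        Xk = Xk - np.outer(t, p)               # deflate X
        yk = yk - q * t                        # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)     # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean

def pls1_predict(X_new, coef, x_mean, y_mean):
    """Predict responses for new spectra with a fitted PLS1 model."""
    return (np.asarray(X_new, dtype=float) - x_mean) @ coef + y_mean
```

With as many components as spectral bands, PLS1 reduces to ordinary least squares. Its practical value for spectra comes from using far fewer components than bands, which tolerates the strong collinearity between neighbouring wavelengths. In practice the number of components would typically be chosen by cross-validation.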

Submitted 2 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: Under review

arXiv:2405.12313 [pdf]

doi 10.1016/j.jfoodeng.2024.112223

Deep learning-based hyperspectral image reconstruction for quality assessment of agro-product

Authors: Md. Toukir Ahmed, Ocean Monjur, Mohammed Kamruzzaman

Abstract: Hyperspectral imaging (HSI) has recently emerged as a promising tool for many agricultural applications; however, the technology cannot be directly used in a real-time system due to the extensive time needed to process large volumes of data. Consequently, the development of a simple, compact, and cost-effective imaging system is not possible with the current HSI systems. Therefore, the overall goal of this study was to reconstruct hyperspectral images from RGB images through deep learning for agricultural applications. Specifically, this study used Hyperspectral Convolutional Neural Network - Dense (HSCNN-D) to reconstruct hyperspectral images from RGB images for predicting soluble solid content (SSC) in sweet potatoes. The algorithm accurately reconstructed the hyperspectral images from RGB images, with the resulting spectra closely matching the ground-truth. The partial least squares regression (PLSR) model based on reconstructed spectra outperformed the model using the full spectral range, demonstrating its potential for SSC prediction in sweet potatoes. These findings highlight the potential of deep learning-based hyperspectral image reconstruction as a low-cost, efficient tool for various agricultural uses.

Submitted 20 May, 2024; originally announced May 2024.

Comments: Under review

Journal ref: Journal of Food Engineering, Volume 382, December 2024, 112223

arXiv:2404.17218 [pdf, other]

Prompting Techniques for Reducing Social Bias in LLMs through System 1 and System 2 Cognitive Processes

Authors: Mahammed Kamruzzaman, Gene Louis Kim

Abstract: Dual process theory posits that human cognition arises via two systems: System 1, a quick, emotional, and intuitive process that is subject to cognitive biases, and System 2, a slow, onerous, and deliberate process. NLP researchers often compare zero-shot prompting in LLMs to System 1 reasoning and chain-of-thought (CoT) prompting to System 2. In line with this interpretation, prior research has found that using CoT prompting in LLMs leads to reduced gender bias. We investigate the relationship between bias, CoT prompting, a debiasing prompt, and dual process theory in LLMs directly. We compare zero-shot CoT, debiasing, and a variety of dual process theory-based prompting strategies on two bias datasets spanning nine different social bias categories. We incorporate human and machine personas to determine whether the effects of dual process theory in LLMs exist independently of explicit persona models or are based on modeling human cognition. We find that a human persona, debiasing, System 2, and CoT prompting all tend to reduce social biases in LLMs, though the best combination of features depends on the exact model and bias category, resulting in up to a 19 percent drop in stereotypical judgments by an LLM.

Submitted 22 September, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: Pre-print, Under review

arXiv:2311.02570 [pdf, other]

BanMANI: A Dataset to Identify Manipulated Social Media News in Bangla

Authors: Mahammed Kamruzzaman, Md. Minul Islam Shovon, Gene Louis Kim

Abstract: Initial work has been done to address fake news detection and misrepresentation of news in the Bengali language. However, no work in Bengali yet addresses the identification of specific claims in social media news that falsely manipulate a related news article. So far, this problem has been tackled in English and a few other languages, but not in Bengali. In this paper, we curate a dataset of social media content labeled with information manipulation relative to reference articles, called BanMANI. The dataset collection method we describe works around the limitations of the available NLP tools in Bangla. We expect these techniques to carry over to building similar datasets in other low-resource languages. BanMANI forms the basis both for evaluating the capabilities of existing NLP systems and for training or fine-tuning new models specifically on this task. In our analysis, we find that this task challenges current LLMs in both zero-shot and fine-tuned settings.

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2309.08902 [pdf, other]

Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models

Authors: Mahammed Kamruzzaman, Md. Minul Islam Shovon, Gene Louis Kim

Abstract: LLMs are increasingly powerful and widely used to assist users in a variety of tasks. This use risks the introduction of LLM biases to consequential decisions such as job hiring, human performance evaluation, and criminal sentencing. Bias in NLP systems along the lines of gender and ethnicity has been widely studied, especially for specific stereotypes (e.g., Asians are good at math). In this paper, we investigate bias along less-studied but still consequential dimensions, such as age and beauty, measuring subtler correlated decisions that LLMs make between social groups and unrelated positive and negative attributes. We ask whether LLMs hold wide-reaching biases of positive or negative sentiment for specific social groups similar to the "what is beautiful is good" bias found in people in experimental psychology. We introduce a template-generated dataset of sentence completion tasks that asks the model to select the most appropriate attribute to complete an evaluative statement about a person described as a member of a specific social group. We also reverse the completion task to select the social group based on an attribute. We report the correlations that we find for 4 cutting-edge LLMs. This dataset can be used as a benchmark to evaluate progress in more generalized biases, and the templating technique can be used to expand the benchmark with minimal additional human annotation.
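The forward and reverse template tasks can be sketched in a few lines. Everything concrete here (the group labels, attribute pairs, template wording, and the `forward_items` and `reverse_items` helpers) is hypothetical and chosen only for illustration; the abstract does not reproduce the paper's actual templates or vocabulary. The forward task fixes the social group and asks for an attribute; the reverse task fixes the attribute and asks for a group.

```python
from itertools import product

# Hypothetical vocabulary for illustration only; not taken from the paper's dataset.
GROUPS = {
    "age": ("young person", "old person"),
    "beauty": ("attractive person", "unattractive person"),
}
ATTRIBUTE_PAIRS = [("trustworthy", "untrustworthy"), ("hardworking", "lazy")]

FORWARD_TEMPLATE = "The {group} is ___. Options: {first} / {second}"
REVERSE_TEMPLATE = "The ___ is {attribute}. Options: {first} / {second}"

def forward_items(groups, attribute_pairs):
    """Group given, model picks an attribute: one item per (group, attribute pair)."""
    items = []
    for category, members in groups.items():
        for group, (pos, neg) in product(members, attribute_pairs):
            prompt = FORWARD_TEMPLATE.format(group=group, first=pos, second=neg)
            items.append({"category": category, "prompt": prompt, "options": (pos, neg)})
    return items

def reverse_items(groups, attribute_pairs):
    """Attribute given, model picks one of the two groups in a category."""
    items = []
    for category, (g1, g2) in groups.items():
        for pair in attribute_pairs:
            for attribute in pair:
                prompt = REVERSE_TEMPLATE.format(attribute=attribute, first=g1, second=g2)
                items.append({"category": category, "prompt": prompt, "options": (g1, g2)})
    return items
```

Because every item comes from a template, adding a group or an attribute pair grows the item set combinatorially without further annotation, which is the property the abstract highlights. A real evaluation would likely also randomize option order so that a model's position preference is not mistaken for a group preference.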

Submitted 19 June, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

Comments: Camera-ready version for Findings of ACL 2024

arXiv:2308.02022 [pdf, other]

Efficient Sentiment Analysis: A Resource-Aware Evaluation of Feature Extraction Techniques, Ensembling, and Deep Learning Models

Authors: Mahammed Kamruzzaman, Gene Louis Kim

Abstract: In the pursuit of NLP systems that maximize accuracy, other important metrics of system performance are often overlooked. Prior models are easily forgotten despite their possible suitability in settings where large computing resources are unavailable or relatively more costly. In this paper, we perform a broad comparative evaluation of document-level sentiment analysis models with a focus on resource costs that are important for the feasibility of model deployment and general climate consciousness. Our experiments consider different feature extraction techniques, the effect of ensembling, task-specific deep learning modeling, and domain-independent large language models (LLMs). We find that while a fine-tuned LLM achieves the best accuracy, some alternate configurations provide huge (up to 24,283×) resource savings for a marginal (<1%) loss in accuracy. Furthermore, we find that for smaller datasets, the differences in accuracy shrink while the difference in resource consumption grows further.

Submitted 18 April, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

arXiv:2111.02004 [pdf]

BRACU Mongol Tori: Next Generation Mars Exploration Rover

Authors: Niaz Sharif Shourov, Masnur Rahman, Mohammad Zahirul Islam, Ali Ahsan, Syed Md Kamruzzaman, Saifur Rahman, Md Sakiluzzaman, Intisar Hasnain, Ekhwan Islam, Saiful Islam, Md. Khalilur Rhaman

Abstract: BRAC University (BRACU) has participated in the University Rover Challenge (URC), a robotics competition for university-level students organized by the Mars Society to design and build a rover that would be of use to early explorers on Mars. BRACU has designed and developed a fully functional next-generation Mars rover, Mongol Tori, which can operate in the extreme, hostile conditions expected on Mars. Mongol Tori not only has both autonomous and manually controlled features but is also capable of conducting scientific tasks to identify the characteristics of soils and weathering in the Martian environment.

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2106.13397 [pdf, other]

Pheno-Mapper: An Interactive Toolbox for the Visual Exploration of Phenomics Data

Authors: Youjia Zhou, Methun Kamruzzaman, Patrick Schnable, Bala Krishnamoorthy, Ananth Kalyanaraman, Bei Wang

Abstract: High-throughput technologies to collect field data have made observations possible at scale in several branches of life sciences. The data collected can range from the molecular level (genotypes) to physiological (phenotypic traits) and environmental observations (e.g., weather, soil conditions). These vast swathes of data, collectively referred to as phenomics data, represent a treasure trove of key scientific knowledge on the dynamics of the underlying biological system. However, extracting information and insights from these complex datasets remains a significant challenge owing to their multidimensionality and lack of prior knowledge about their complex structure. In this paper, we present Pheno-Mapper, an interactive toolbox for the exploratory analysis and visualization of large-scale phenomics data. Our approach uses the mapper framework to perform a topological analysis of the data, and subsequently render visual representations with built-in data analysis and machine learning capabilities. We demonstrate the utility of this new tool on real-world plant (e.g., maize) phenomics datasets. In comparison to existing approaches, the main advantage of Pheno-Mapper is that it provides rich, interactive capabilities in the exploratory analysis of phenomics data, and it integrates visual analytics with data analysis and machine learning in an easily extensible way. In particular, Pheno-Mapper allows the interactive selection of subpopulations guided by a topological summary of the data and applies data mining and machine learning to these selected subpopulations for in-depth exploration.
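For readers unfamiliar with the Mapper framework that Pheno-Mapper builds on, here is a minimal textbook-style Mapper construction; it is not Pheno-Mapper's implementation. A one-dimensional lens (filter) is covered by overlapping intervals, the points in each interval's preimage are clustered, each cluster becomes a node, and nodes that share points are joined by an edge. The single-linkage rule at a fixed distance threshold `eps` is an assumed stand-in for whatever clustering a real tool uses, and the function names and parameters (`n_intervals`, `overlap`, `eps`) are illustrative.

```python
import numpy as np
from itertools import combinations

def _single_linkage(points, idx, eps):
    """Group indices whose points are linked by a chain of pairwise distances < eps (union-find)."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if np.linalg.norm(points[i] - points[j]) < eps:
            parent[find(i)] = find(j)
    clusters = {}
    for i in idx:
        clusters.setdefault(find(i), []).append(i)
    return [frozenset(c) for c in clusters.values()]

def mapper(points, lens, n_intervals=5, overlap=0.3, eps=0.5):
    """Return (nodes, edges): nodes are frozensets of point indices, edges join overlapping nodes."""
    lo, hi = float(np.min(lens)), float(np.max(lens))
    # Interval length chosen so n_intervals intervals with the given fractional overlap span [lo, hi].
    length = (hi - lo) / (n_intervals - overlap * (n_intervals - 1))
    step = length * (1.0 - overlap)
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = hi if k == n_intervals - 1 else a + length  # avoid losing the max to rounding
        idx = [i for i in range(len(points)) if a <= lens[i] <= b]
        if idx:
            nodes.extend(_single_linkage(points, idx, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges
```

For evenly spaced points on a line, using the coordinate itself as the lens, the result is a simple path of nodes. Branches ("flares") and loops in the output graph are the structures that topological summaries of this kind are meant to surface. The pairwise clustering here is quadratic per interval, which is fine for a sketch but not for large datasets.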

Submitted 6 July, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

Comments: This is a preprint version. For a published version, please refer to ACM DOI: 10.1145/3459930.3469511

arXiv:1712.10197 [pdf, other]

Interesting Paths in the Mapper

Authors: Ananth Kalyanaraman, Methun Kamruzzaman, Bala Krishnamoorthy

Abstract: The Mapper produces a compact summary of high dimensional data as a simplicial complex. We study the problem of quantifying the interestingness of subpopulations in a Mapper, which appear as long paths, flares, or loops. First, we create a weighted directed graph G using the 1-skeleton of the Mapper. We use the average values of a target function at the vertices to direct edges (from low to high). The difference between the average values at the two endpoints (high minus low) is set as the edge's weight. Covariation of the remaining h functions (independent variables) is captured by an h-bit binary signature assigned to the edge. An interesting path in G is a directed path whose edges all have the same signature. We define the interestingness score of such a path as the sum of its edge weights, each multiplied by a nonlinear function of the edge's rank in the path. Second, we study three optimization problems on this graph G. In the problem Max-IP, we seek an interesting path in G with the maximum interestingness score. We show that Max-IP is NP-complete. For the special case when G is a directed acyclic graph (DAG), we show that Max-IP can be solved in polynomial time, in O(mnd_i), where d_i is the maximum indegree of a vertex in G. In the more general problem IP, the goal is to find a collection of edge-disjoint interesting paths such that the overall sum of their interestingness scores is maximized. We also study a variant of IP termed k-IP, where the goal is to identify a collection of edge-disjoint interesting paths, each with k edges, whose total interestingness score is maximized. While k-IP can be solved in polynomial time for k <= 2, we show that k-IP is NP-complete for k >= 3 even when G is a DAG. We develop polynomial time heuristics for IP and k-IP on DAGs.
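A minimal sketch of the graph construction and of Max-IP on a DAG described above follows. It rests on several assumptions that the abstract leaves open:

- Signature bit i is 1 when the i-th independent function's average increases along the edge.
- Edges whose endpoints have equal target averages are dropped.
- An edge's "rank" is its 1-based position from the start of the path, with the nonlinear rank function passed in as the hypothetical parameter `rank_weight`.

The dynamic program over (vertex, signature, path length) is one straightforward way to solve this variant and is not claimed to be the authors' algorithm. Because edges always point from lower to higher target average in this sketch, sorting vertices by that average gives a topological order.

```python
from collections import defaultdict

def build_directed_graph(edges, f, others):
    """Direct each 1-skeleton edge from lower to higher target average.

    edges:  undirected Mapper edges (u, v)
    f:      dict vertex -> average of the target function at that vertex
    others: dict vertex -> tuple of averages of the h independent functions
    Returns a list of (low, high, weight, signature) tuples.
    """
    directed = []
    for u, v in edges:
        if f[u] == f[v]:
            continue  # assumption: tied edges have no direction and are dropped
        lo, hi = (u, v) if f[u] < f[v] else (v, u)
        weight = f[hi] - f[lo]
        # Assumed h-bit signature: bit i = 1 if function i increases along the edge.
        signature = tuple(int(b > a) for a, b in zip(others[lo], others[hi]))
        directed.append((lo, hi, weight, signature))
    return directed

def max_interesting_path(directed, f, rank_weight):
    """Highest-scoring path whose edges all share one signature.

    Score = sum over edges of weight * rank_weight(position), position starting at 1.
    Returns (score, vertex_path, signature), or (0.0, [], None) if there are no edges.
    """
    incoming = defaultdict(list)
    for lo, hi, w, sig in directed:
        incoming[hi].append((lo, w, sig))
    # f strictly increases along every edge, so sorting by f is a topological order.
    order = sorted(f, key=f.get)
    # best[(v, sig)][k] = (score, (prev_vertex, k - 1)) for the best k-edge path ending at v.
    best = defaultdict(dict)
    for v in order:
        for u, w, sig in incoming[v]:
            table = best[(v, sig)]
            candidates = [(1, w * rank_weight(1), (u, 0))]  # edge (u, v) starts a new path
            for k, (score, _) in best[(u, sig)].items():     # or extends one ending at u
                candidates.append((k + 1, score + w * rank_weight(k + 1), (u, k)))
            for k, score, prev in candidates:
                if k not in table or score > table[k][0]:
                    table[k] = (score, prev)
    top = max(
        ((sc, v, sig, k) for (v, sig), tab in best.items() for k, (sc, _) in tab.items()),
        key=lambda item: item[0],
        default=None,
    )
    if top is None:
        return 0.0, [], None
    score, node, sig, k = top
    path = [node]
    while k > 0:  # follow back-pointers to the start vertex
        _, (node, k) = best[(node, sig)][k]
        path.append(node)
    return score, path[::-1], sig
```

As a usage example, take vertices 0 to 3 with target averages 0, 1, 3, 7, one independent function with averages 0, 1, 2, 1.5, and skeleton edges (0, 1), (1, 2), (2, 3), (0, 2). With `rank_weight=lambda j: j` (later edges weigh more), the path 0 -> 1 -> 2 wins with score 5. With a constant weight, the single edge 2 -> 3 wins with score 4. In both cases, the full path 0 -> 1 -> 2 -> 3 is never allowed because its last edge has a different signature.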

Submitted 10 April, 2018; v1 submitted 29 December, 2017; originally announced December 2017.

Comments: NP-completeness of k-IP shown only for DAGs now; connections to coboundary operations outlined

MSC Class: 05C85; 68Q25; 62H30; 55U99 ACM Class: G.2.2; F.2.2
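The graph construction and path score described above can be sketched as follows. This is a minimal sketch, not the authors' algorithm: it assumes a signature bit is 1 when the corresponding independent function also increases along the edge, that an edge's rank is its 1-based position in the path, and it uses a hypothetical `rank_factor(r) = 1/r` as the nonlinear function. Edges with tied target averages are dropped, so every edge points to a strictly larger value and the resulting graph is a DAG.

```python
def build_graph(edges, target, others):
    """Direct each 1-skeleton edge from lower to higher average target value.

    edges:  undirected vertex pairs (u, v) from the Mapper 1-skeleton
    target: vertex -> average value of the target function
    others: vertex -> tuple of average values of the h remaining functions
    Returns directed edges as (u, v, weight, signature) tuples.
    """
    directed = []
    for a, b in edges:
        if target[a] == target[b]:
            continue  # assumption: edges with tied averages are dropped
        u, v = (a, b) if target[a] < target[b] else (b, a)
        weight = target[v] - target[u]  # high - low
        # Hypothetical signature rule: bit i is 1 if function i also increases from u to v.
        signature = tuple(int(hv > lv) for lv, hv in zip(others[u], others[v]))
        directed.append((u, v, weight, signature))
    return directed


def rank_factor(rank):
    """Hypothetical nonlinear function of an edge's 1-based position in the path."""
    return 1.0 / rank


def max_interesting_score(directed, vertices):
    """Best score of a single-signature directed path, assuming G is a DAG.

    prev[(v, s)] holds the best score of a path with k - 1 edges, all with
    signature s, ending at v; each round appends one edge at position k.
    """
    signatures = {s for _, _, _, s in directed}
    prev = {(v, s): 0.0 for v in vertices for s in signatures}  # empty paths
    best = 0.0
    for k in range(1, len(vertices)):
        cur = {}
        for u, v, w, s in directed:
            if (u, s) in prev:
                cand = prev[(u, s)] + w * rank_factor(k)
                if cand > cur.get((v, s), float("-inf")):
                    cur[(v, s)] = cand
        if not cur:
            break
        best = max(best, max(cur.values()))
        prev = cur
    return best
```

Each round of the dynamic program scans all m edges and there are at most n - 1 rounds, so the sketch runs in roughly O(nm) time; it only makes the scoring definition concrete and is not meant to reproduce the paper's complexity bound.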

arXiv:1707.04362 [pdf, other]

Hyppo-X: A Scalable Exploratory Framework for Analyzing Complex Phenomics Data

Authors: Methun Kamruzzaman, Ananth Kalyanaraman, Bala Krishnamoorthy, Stefan Hey, Patrick Schnable

Abstract: Phenomics is an emerging branch of modern biology that uses high-throughput phenotyping tools to capture multiple environmental and phenotypic traits, often at massive spatial and temporal scales. The resulting high-dimensional data represent a treasure trove of information for understanding in depth how multiple factors interact and contribute to the overall growth and behavior of different genotypes. However, computational tools that can parse such complex data and help extract plausible hypotheses are currently lacking. In this paper, we present Hyppo-X, a new algorithmic approach to visually explore complex phenomics data and, in the process, characterize the role of the environment in phenotypic traits. We model the problem as one of unsupervised structure discovery and use emerging principles from algebraic topology and graph theory to discover higher-order structures in complex phenomics data. We present open-source software with interactive visualization capabilities to facilitate data navigation and hypothesis formulation. We test and evaluate Hyppo-X on two real-world plant (maize) data sets. Our results demonstrate the ability of our approach to delineate divergent subpopulation-level behavior. Notably, our approach shows how environmental factors could influence phenotypic behavior, and how that effect varies across different genotypes and different time scales. To the best of our knowledge, this effort provides one of the first approaches to systematically formalize the problem of hypothesis extraction for phenomics data. Considering the infancy of the phenomics field, tools that help users explore complex data and extract plausible hypotheses in a data-guided manner will be critical to future advancements in the use of such data.

Submitted 5 June, 2019; v1 submitted 13 July, 2017; originally announced July 2017.

Comments: Substantially expanded from previous version. Now illustrating interesting flares and paths on two different data sets

MSC Class: 68U05; 55U10; 05C20 ACM Class: J.3; I.3.5; F.2.2
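Hyppo-X builds on the Mapper construction. The following is a generic, minimal 1-D Mapper sketch, not Hyppo-X's implementation: the interval cover over a single filter function, the overlap fraction, and single-linkage clustering with a distance threshold `eps` are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def mapper_1d(points, filt, n_intervals=4, overlap=0.25, eps=1.0):
    """Generic Mapper: cover the filter range with overlapping intervals, cluster
    each preimage, and connect clusters that share data points.

    points: list of coordinate tuples; filt: one filter value per point.
    Returns (nodes, edges): nodes are frozensets of point indices and
    edges are index pairs of nodes whose point sets intersect.
    """
    lo, hi = min(filt), max(filt)
    length = (hi - lo) / n_intervals
    nodes = []
    for i in range(n_intervals):
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        members = {j for j, f in enumerate(filt) if start <= f <= end}
        # Single-linkage clustering: connected components of the eps-neighborhood graph.
        while members:
            seed = members.pop()
            component, stack = {seed}, [seed]
            while stack:
                p = stack.pop()
                near = {q for q in members if math.dist(points[p], points[q]) <= eps}
                members -= near
                component |= near
                stack.extend(near)
            nodes.append(frozenset(component))
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]]
    return nodes, edges
```

Paths and flares in the resulting graph correspond to the subpopulation-level structures the abstract refers to; a real pipeline would choose the filter, cover, and clustering method to suit the data.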

arXiv:1010.0325

Routing Protocols for Cognitive Radio Networks: A Survey

Authors: S. M. Kamruzzaman, Dong Geun Jeong

Abstract: This article has been withdrawn by arXiv administrators because it plagiarises http://www2.ece.ohio-state.edu/~ekici/papers/crnroutingsurvey.pdf

Submitted 21 October, 2010; v1 submitted 2 October, 2010; originally announced October 2010.

Comments: This article has been withdrawn by arXiv administrators because it plagiarises http://www2.ece.ohio-state.edu/~ekici/papers/crnroutingsurvey.pdf

Journal ref: Journal of Information Industrial Engineering, Vol. 16, pp. 153-169, Aug. 2010

arXiv:1009.5048 [pdf]

The Most Advantageous Bangla Keyboard Layout Using Data Mining Technique

Authors: Abdul Kadar Muhammad Masum, Mohammad Mahadi Hassan, S. M. Kamruzzaman

Abstract: The Bangla alphabet has a large number of letters, which makes fast typing on a Bangla keyboard difficult. The proposed keyboard will maximize operator speed because typists can use both hands in parallel. Association rules from data mining are used here to distribute the Bangla characters on the keyboard. The frequencies of monographs, digraphs, and trigraphs derived from a data warehouse are analyzed, and association rules are then used to distribute the Bangla characters in the layout. Experimental results on several data sets show the effectiveness and improved performance of the proposed approach. This paper presents an optimal Bangla keyboard layout that distributes the load equally across both hands, maximizing ease and minimizing effort.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 10 Pages, International Journal

Journal ref: Journal of Computer Science, IBAIS University, Dhaka, Bangladesh, Vol. 1, No. 2, Dec. 2007
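The frequency analysis described above can be sketched in Python, whose strings are Unicode and therefore handle Bangla text directly (dependent vowel signs count as separate code points). The greedy hand-balancing step is a hypothetical stand-in for the paper's association-rule placement, included only to show what distributing the load equally across both hands means numerically.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count character n-grams (n=1 monographs, 2 digraphs, 3 trigraphs) within words."""
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def split_hands(char_freq):
    """Hypothetical greedy balancing: place the most frequent characters first,
    each on whichever hand currently carries less typing load."""
    hands = {"left": [], "right": []}
    load = {"left": 0, "right": 0}
    for ch, f in char_freq.most_common():
        side = "left" if load["left"] <= load["right"] else "right"
        hands[side].append(ch)
        load[side] += f
    return hands, load
```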

arXiv:1009.4994 [pdf]

doi 10.3923/ajit.2004.657.665

Text Categorization using Association Rule and Naive Bayes Classifier

Authors: S M Kamruzzaman, Chowdhury Mofizur Rahman

Abstract: As the amount of online text increases, so does the demand for text categorization to aid the analysis and management of text. Text is cheap, but information, in the form of knowing which classes a text belongs to, is expensive. Automatic categorization of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort or trained from texts that have themselves been manually classified. This paper proposes text categorization using association rules and a Naïve Bayes classifier. Instead of individual words, word relations (i.e., association rules among these words) are used to derive the feature set from pre-classified text documents. A Naïve Bayes classifier is then applied to the derived features for the final categorization.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 9 Pages, International Journal

Journal ref: Asian Journal of Information Technology, Vol. 3, No. 9, pp 657-665, Sep. 2004
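A minimal sketch of the feature-derivation step, assuming that word pairs co-occurring in enough training documents (the frequent itemsets from which association rules are formed) serve as features. Counting every pair directly, rather than pruning candidates Apriori-style, and the `min_support` value are simplifying assumptions; the Naïve Bayes step is not shown.

```python
from collections import Counter
from itertools import combinations


def frequent_word_pairs(docs, min_support=0.5):
    """Return {pair: support} for word pairs whose document support meets min_support.

    docs: list of token lists from pre-classified documents.
    """
    pair_counts = Counter()
    for tokens in docs:
        for pair in combinations(sorted(set(tokens)), 2):
            pair_counts[pair] += 1
    n = len(docs)
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


def document_features(tokens, frequent_pairs):
    """Frequent pairs that a document contains, used as its feature set."""
    words = set(tokens)
    return {pair for pair in frequent_pairs if pair[0] in words and pair[1] in words}
```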

arXiv:1009.4992 [pdf]

A System for Smart Home Control of Appliances based on Timer and Speech Interaction

Authors: S. M. Anamul Haque, S. M. Kamruzzaman, Md. Ashraful Islam

Abstract: The main objective of this work is to design and construct a microcomputer-based system to control electric appliances such as lights, fans, heaters, washing machines, motors, and TVs. The paper discusses two major approaches to controlling home appliances: the first uses a timer option, and the second uses voice commands. Appliances can also be controlled through a graphical user interface. The parallel port is used to transfer data from the computer to the particular device to be controlled, and an interface box is designed to connect the high-power loads to the parallel port. This system can play an important role in helping elderly and physically disabled people control their home appliances in an intuitive and flexible way. We have developed a system that properly controls eight electric appliances in these three modes.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 4 Pages, International Conference

Journal ref: Proc. 4th International Conference on Electrical Engineering, The Institution of Engineers, Dhaka, Bangladesh, pp. 128-131, Jan. 2006
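The parallel-port control scheme can be sketched as an 8-bit state in which each data line D0-D7 switches one appliance. The appliance-to-bit mapping, the command grammar, and the helper names are hypothetical; writing the byte to the physical port is hardware- and OS-specific and is omitted.

```python
# Hypothetical mapping of the eight controlled appliances to parallel-port data bits D0-D7.
APPLIANCES = ["light", "fan", "heater", "washing machine", "motor", "tv", "aux1", "aux2"]


def set_appliance(state, name, on):
    """Return the new 8-bit data byte with the appliance's bit set or cleared."""
    bit = 1 << APPLIANCES.index(name)
    return (state | bit) if on else (state & ~bit) & 0xFF


def apply_command(state, command):
    """Apply a text command such as 'turn on fan' (e.g. the output of a speech recognizer)."""
    words = command.lower().split()
    if len(words) < 3 or words[0] != "turn" or words[1] not in ("on", "off"):
        raise ValueError(f"unrecognized command: {command!r}")
    return set_appliance(state, " ".join(words[2:]), words[1] == "on")


def state_at(schedule, time):
    """Timer mode: apply every (time, command) event scheduled at or before `time`."""
    state = 0
    for event_time, command in sorted(schedule):
        if event_time <= time:
            state = apply_command(state, command)
    return state
```

The same `apply_command` path can serve the voice, timer, and GUI modes, since each ultimately produces one byte for the port.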

arXiv:1009.4991 [pdf]

Web Page Categorization Using Artificial Neural Networks

Authors: S. M. Kamruzzaman

Abstract: Web page categorization is one of the challenging tasks in the world of ever-increasing web technologies. There are many ways to categorize web pages based on different approaches and features. This paper proposes a new direction in web page categorization using an artificial neural network (ANN) with automatically extracted features. Eight major categories of web pages have been selected for categorization: business & economy, education, government, entertainment, sports, news & media, job search, and science. The proposed system works in three successive stages. In the first stage, features are automatically extracted by analyzing the source of the web pages. The second stage fixes the input values of the neural network, all of which lie between 0 and 1; variations in these values affect the output. Finally, the third stage determines the class of a given web page out of the eight predefined classes, using the backpropagation algorithm of the artificial neural network. The proposed concept will facilitate web mining, information retrieval from the web, and search engines.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 4 Pages, International Conference

Journal ref: Proc. 4th International Conference on Electrical Engineering, The Institution of Engineers, Dhaka, Bangladesh, pp. 96-99, Jan. 2006
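The first two stages (automatic feature extraction and inputs in the range 0 to 1) might look like the sketch below. The keyword lists are hypothetical, and dividing by the largest count is one simple way to keep every input in [0, 1]; the backpropagation network of the third stage is not shown.

```python
import re

# Hypothetical keyword lists for the eight categories named in the abstract.
CATEGORY_KEYWORDS = {
    "business & economy": ["market", "finance", "company"],
    "education": ["university", "course", "student"],
    "government": ["ministry", "policy", "parliament"],
    "entertainment": ["movie", "music", "celebrity"],
    "sports": ["football", "cricket", "match"],
    "news & media": ["breaking", "headline", "report"],
    "job search": ["vacancy", "salary", "resume"],
    "science": ["research", "experiment", "physics"],
}


def extract_features(html):
    """Stages 1 and 2: keyword counts from the page source, scaled into [0, 1]."""
    text = re.sub(r"<[^>]+>", " ", html).lower()  # strip tags, keep the text
    words = re.findall(r"[a-z]+", text)
    counts = [sum(words.count(k) for k in kws) for kws in CATEGORY_KEYWORDS.values()]
    top = max(counts)
    return [c / top if top else 0.0 for c in counts]
```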

arXiv:1009.4988 [pdf]

REx: An Efficient Rule Generator

Authors: S. M. Kamruzzaman

Abstract: This paper describes REx, an efficient algorithm for generating symbolic rules from an artificial neural network (ANN). Classification rules are sought in many areas, from automatic knowledge acquisition to data mining and ANN rule extraction, because they possess some attractive features: they are explicit, understandable, and verifiable by domain experts, and they can be modified, extended, and passed on as modular knowledge. REx exploits the first-order information in the data and finds the shortest sufficient conditions for a rule of a class that can differentiate it from patterns of other classes. It can generate concise and perfect rules, in the sense that the error rate of the rules is not worse than the inconsistency rate found in the original data. An important feature of REx is its recursive nature. The generated rules are concise, comprehensible, and order-insensitive, and they do not involve any weight values. Extensive experimental studies on several benchmark classification problems, such as breast cancer, iris, season, and golf-playing, demonstrate the effectiveness of the proposed approach with good generalization ability.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 4 Pages, International Conference

Journal ref: Proc. 4th International Conference on Electrical Engineering, The Institution of Engineers, Dhaka, Bangladesh, pp. 79-82, Jan. 2006
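The "perfect rules" criterion compares a rule set's error rate with the inconsistency rate of the data: patterns whose attribute values are identical but whose labels differ cannot all be classified correctly by any rule over those attributes. The sketch below computes that rate under the common majority-class definition (an assumption, since the abstract does not define it) and does not implement REx itself.

```python
from collections import Counter, defaultdict


def inconsistency_rate(X, y):
    """Share of patterns that disagree with the majority label among patterns
    having identical attribute values; any rule set over these attributes
    misclassifies at least this share."""
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row)][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(y)
```

For the golf-style toy data in the test below, the rate is 0.25, so a rule set with an error rate of 0.25 on that data would be "perfect" in the sense above.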

arXiv:1009.4987 [pdf]

Text Classification using Data Mining

Authors: S. M. Kamruzzaman, Farhana Haider, Ahmed Ryadh Hasan

Abstract: Text classification is the automated assignment of natural language documents to predefined categories based on their content. It is a primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way, such as producing summaries, answering questions, or extracting data. Existing supervised learning algorithms for automatic text classification need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer training documents. Instead of individual words, word relations (i.e., association rules among these words) are used to derive the feature set from pre-classified text documents. A Naive Bayes classifier is then applied to the derived features, and finally a single Genetic Algorithm concept is added for the final classification. A system based on the proposed algorithm has been implemented and tested. The experimental results show that the proposed system works as a successful text classifier.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 19 Pages, International Conference

Journal ref: Proc. International Conference on Information and Communication Technology in Management (ICTM-2005), Multimedia University, Malaysia, May 2005
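A minimal sketch of Naive Bayes over word-relation features, assuming every word pair in a document is a feature and using Laplace smoothing. The class name `PairNaiveBayes` is hypothetical, no support threshold is applied, and the Genetic Algorithm step is omitted.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pair_features(tokens):
    """Word-relation features: every unordered pair of distinct words in the document."""
    return list(combinations(sorted(set(tokens)), 2))


class PairNaiveBayes:
    """Multinomial Naive Bayes over word-pair features with Laplace smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            feats = pair_features(tokens)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            logp = math.log(count / n_docs)  # class prior
            for feat in pair_features(tokens):
                logp += math.log((self.feature_counts[label][feat] + 1) / (total + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```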

arXiv:1009.4984 [pdf]

Rule Extraction using Artificial Neural Networks

Authors: S. M. Kamruzzaman, Ahmed Ryadh Hasan

Abstract: Artificial neural networks have been successfully applied to a variety of business application problems involving classification and regression. Although backpropagation neural networks generally predict better than decision trees do for pattern classification problems, they are often regarded as black boxes, i.e., their predictions are not as interpretable as those of decision trees. In many applications, it is desirable to extract knowledge from trained neural networks so that users can gain a better understanding of the solution. This paper presents an efficient algorithm to extract rules from artificial neural networks. We use a two-phase training algorithm for backpropagation learning. In the first phase, the number of hidden nodes of the network is determined automatically in a constructive fashion by adding nodes one after another based on the performance of the network on training data. In the second phase, the number of relevant input units of the network is determined using a pruning algorithm. The pruning process attempts to eliminate as many connections as possible from the network. Relevant and irrelevant attributes of the data are distinguished during the training process: relevant ones are kept and the others are automatically discarded. From the simplified networks, which have a small number of connections and nodes, symbolic rules can easily be extracted using the proposed algorithm. Extensive experimental results on several benchmark problems in neural networks demonstrate the effectiveness of the proposed approach with good generalization ability.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 14 Pages, International Conference

Journal ref: Proc. International Conference on Information and Communication Technology in Management (ICTM 2005), Multimedia University, Malaysia, May 2005

arXiv:1009.4983 [pdf]

Pattern Classification using Simplified Neural Networks

Authors: S. M. Kamruzzaman, Ahmed Ryadh Hasan

Abstract: In recent years, many neural network models have been proposed for pattern classification, function approximation, and regression problems. This paper presents an approach for classifying patterns using simplified NNs. Although the predictive accuracy of ANNs is often higher than that of other methods or human experts, ANNs are often said to be practically "black boxes" due to the complexity of the networks. In this paper, we attempt to open up these black boxes by reducing the complexity of the network. The factor that makes this possible is the pruning algorithm. By eliminating redundant weights, redundant input and hidden units are identified and removed from the network. Using the pruning algorithm, we have been able to prune networks so that only a few input units, hidden units, and connections remain, yielding a simplified network. Experimental results on several benchmark problems in neural networks show the effectiveness of the proposed approach with good generalization ability.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 7 Pages, International Conference

Journal ref: Proc. International Conference on Information and Communication Technology in Management (ICTM 2005), Multimedia University, Malaysia, May 2005
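The pruning idea can be sketched with simple magnitude pruning: weights below a threshold are removed, a hidden unit with no remaining incoming or outgoing connections is treated as redundant, and so is an input that no longer feeds any surviving hidden unit. The magnitude threshold is an assumption standing in for the paper's pruning criterion, which the abstract does not specify; retraining between pruning steps is not shown.

```python
def prune(w_ih, w_ho, threshold):
    """Magnitude pruning of a one-hidden-layer network.

    w_ih[h][i]: weight from input i to hidden unit h
    w_ho[o][h]: weight from hidden unit h to output o
    Returns the pruned weights plus the indices of surviving hidden and input units.
    """
    w_ih = [[w if abs(w) >= threshold else 0.0 for w in row] for row in w_ih]
    w_ho = [[w if abs(w) >= threshold else 0.0 for w in row] for row in w_ho]
    n_hidden, n_inputs = len(w_ih), len(w_ih[0])
    # A hidden unit survives only if it still has an incoming and an outgoing connection.
    live_hidden = [h for h in range(n_hidden)
                   if any(w_ih[h]) and any(row[h] for row in w_ho)]
    # An input survives only if it still feeds some surviving hidden unit.
    live_inputs = [i for i in range(n_inputs)
                   if any(w_ih[h][i] for h in live_hidden)]
    return w_ih, w_ho, live_hidden, live_inputs
```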

arXiv:1009.4982 [pdf]

Optimal Bangla Keyboard Layout using Data Mining Technique

Authors: S. M. Kamruzzaman, Md. Hijbul Alam, Abdul Kadar Muhammad Masum, Md. Mahadi Hassan

Abstract: This paper presents an optimal Bangla keyboard layout that distributes the load equally across both hands, maximizing ease and minimizing effort. The Bangla alphabet has a large number of letters, which makes fast typing on a Bangla keyboard difficult. Our proposed keyboard will maximize operator speed because typists can use both hands in parallel. Here we use association rules from data mining to distribute the Bangla characters on the keyboard. First, we analyze the frequencies of monographs, digraphs, and trigraphs derived from a data warehouse, and then use association rules to distribute the Bangla characters in the layout. Experimental results on several data sets show the effectiveness and improved performance of the proposed approach.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 9 Pages, International Conference

Journal ref: Proc. International Conference on Information and Communication Technology in Management (ICTM 2005), Multimedia University, Malaysia, May 2005

arXiv:1009.4981 [pdf]

An Efficient Technique for Text Compression

Authors: Md. Abul Kalam Azad, Rezwana Sharmeen, Shabbir Ahmad, S. M. Kamruzzaman

Abstract: Storing a word or a whole text segment requires a large amount of storage space; typically, a character requires 1 byte of memory. Compression is therefore very important for data management, and text data requires lossless compression. We propose a lossless compression method for text data. The proposed method compresses a text segment or text file at two levels: first reduction, then compression. Reduction is done using a word lookup table rather than a traditional indexing system, and compression is then done using currently available compression methods. The word lookup table will be part of the operating system, and the reduction will be done by the operating system. In this method, each word is replaced by an address value, which can quite effectively reduce the size of persistent memory required for text data. At the end of the first-level compression with the word lookup table, a binary file containing the addresses is generated. Since the proposed method does not use any compression algorithm at the first level, this file can be compressed using popular compression algorithms, finally providing a great deal of compression on purely English text data.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 7 Pages, International Conference

Journal ref: Proc. International Conference on Information Management and Business (IMB 2005), Shih Chien University, Taiwan, pp. 467-473, Mar. 2005
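The two-level scheme can be sketched as follows, assuming 16-bit addresses (so at most 65,536 distinct words) and using zlib as the general-purpose compressor of the second level. The paper places the word lookup table in the operating system; here it is a plain dictionary, built on the fly as an assumption of this sketch, that must be available to the decoder.

```python
import struct
import zlib


def reduce_text(text, table):
    """Level 1: replace each word by its 16-bit address in the word lookup table.
    Unknown words are appended to the table (an assumption made for this sketch)."""
    addresses = []
    for word in text.split(" "):
        if word not in table:
            table[word] = len(table)
        addresses.append(table[word])
    return struct.pack(f"<{len(addresses)}H", *addresses)


def compress_text(text, table):
    """Level 2: compress the binary address file with a general-purpose compressor."""
    return zlib.compress(reduce_text(text, table))


def decompress_text(blob, table):
    """Invert both levels using the same word lookup table."""
    data = zlib.decompress(blob)
    addresses = struct.unpack(f"<{len(data) // 2}H", data)
    words = {address: word for word, address in table.items()}
    return " ".join(words[a] for a in addresses)
```

Splitting on single spaces keeps the round trip exact, because runs of spaces simply become empty-string entries in the table.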

arXiv:1009.4980 [pdf]

Completely Enhanced Cell Phone Keypad

Authors: Rezwana Sharmeen, Md. Abul Kalam Azad, Shabbir Ahmad, S. M. Kamruzzaman

Abstract: The enhanced frequency-based keypad is designed to speed up the typing process. This paper shows that the proposed layout increases typing speed and is flexible for the thumb. The traditional cell phone keypad is not designed scientifically from the standpoint of letter frequency. Approaches have been explored to speed up the typing process, but we found that no manufacturer has considered letter frequencies. The current architecture does not provide flexibility, although users are accustomed to the currently available multi-tap keypad. Since the currently available keypad layouts are not best suited for users, this paper suggests a keypad for cell phones and other cellular devices based on the frequency of letters in the English language and on the structure of human finger movements, to provide a flexible and fast cell phone keypad. It also takes into consideration the key-jamming problem that existed in typewriters. First, we identified the cell phone keys that are easily reachable and put less pressure on the thumb, so the key order is derived from an anatomical point of view. In our proposed layout, we assigned letters to these easily reachable keys according to letter frequency.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 5 Pages, International Conference

Journal ref: Proc. International Conference on Information Management and Business (IMB 2005), Shih Chien University, Taiwan, pp. 217-221, Mar. 2005
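A minimal sketch of a frequency-driven keypad assignment: letters are sorted by corpus frequency and dealt round-robin onto keys ordered from easiest to hardest to reach, so the most frequent letters need a single tap on the most comfortable keys. The ease ordering, the round-robin rule, and the helper names are assumptions; the abstract does not give the actual layout, and the key-jamming consideration is not modeled.

```python
from collections import Counter
from string import ascii_lowercase


def frequency_layout(corpus, key_ease):
    """Assign letters to keys so that frequent letters need fewer taps on easy keys.

    key_ease: keys ordered from easiest to hardest to reach with the thumb
    (hypothetical ordering, e.g. from a reachability study).
    """
    freq = Counter(c for c in corpus.lower() if c in ascii_lowercase)
    order = [c for c, _ in freq.most_common()]
    order += [c for c in ascii_lowercase if c not in freq]  # unseen letters go last
    layout = {key: [] for key in key_ease}
    for i, letter in enumerate(order):
        # Round-robin: the first len(key_ease) letters take one tap, the next ones two taps, ...
        layout[key_ease[i % len(key_ease)]].append(letter)
    return layout


def taps_needed(text, layout):
    """Total multi-tap presses needed to type `text` with the given layout."""
    position = {letter: i + 1 for letters in layout.values() for i, letter in enumerate(letters)}
    return sum(position[c] for c in text.lower() if c in position)
```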

arXiv:1009.4979 [pdf]

Smart Bengali Cell Phone Keypad Layout

Authors: Md. Abul Kalam Azad, Rezwana Sharmeen, Shabbir Ahmad, S. M. Kamruzzaman

Abstract: Nowadays, the cell phone is the most common communication device used by the general public. SMS-based communication is a cheap and popular communication method, and people naturally want to write SMS messages in their mother language. Text input in the mother language is more flexible when the letters of that language are printed on the keypad. A Bangla mobile keypad based on phonetics has been proposed earlier, but it is not scientific from the frequency and flexibility points of view. Since it is not a feasible solution, in this paper we propose an efficient Bengali keypad for cell phones and other cellular devices. The proposed keypad is based on the frequency of letters in the Bengali language and on the structure of human finger movements. We took both points into account to provide a flexible and fast cell phone keypad.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 4 Pages, International Conference

Journal ref: Proc. 8th International Conference on Computer and Information Technology (ICCIT 2005), Dhaka, Bangladesh, pp. 1208-1211, Dec. 2005

arXiv:1009.4978 [pdf]

Extracting Symbolic Rules for Medical Diagnosis Problem

Authors: S. M. Kamruzzaman

Abstract: Neural networks (NNs) have been successfully applied to solve a variety of application problems involving classification and function approximation. Although backpropagation NNs generally predict better than decision trees do for pattern classification problems, they are often regarded as black boxes, i.e., their predictions cannot be explained as readily as those of decision trees. In many applications, it is desirable to extract knowledge from trained NNs so that users can gain a better understanding of how the networks solve the problems. An algorithm is proposed and implemented to extract symbolic rules for medical diagnosis problems. An empirical study on three benchmark classification problems (breast cancer, diabetes, and lenses) demonstrates that the proposed algorithm generates high-quality rules from NNs, comparable with other methods in terms of the number of rules, the average number of conditions per rule, and predictive accuracy.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 6 Pages, International Conference

Journal ref: Proc. 8th International Conference on Computer and Information Technology (ICCIT 2005), Dhaka, Bangladesh, pp. 602-607, Dec. 2005
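The three comparison criteria named above (number of rules, average number of conditions per rule, and predictive accuracy) can be computed as in this sketch. The attribute names, thresholds, first-match rule semantics, and default class are hypothetical and are not taken from the paper's diabetes data.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt, "==": operator.eq}


def matches(conditions, record):
    """True if the record satisfies every (attribute, operator, value) condition."""
    return all(OPS[op](record[attr], value) for attr, op, value in conditions)


def evaluate_rules(rules, default, records, labels):
    """rules: list of (conditions, class); the first matching rule fires,
    otherwise the default class is predicted."""
    predictions = [next((cls for conds, cls in rules if matches(conds, r)), default)
                   for r in records]
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    return {
        "n_rules": len(rules),
        "avg_conditions": sum(len(conds) for conds, _ in rules) / len(rules),
        "accuracy": accuracy,
    }
```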

arXiv:1009.4977 [pdf]

Universal Numeric Segmented Display

Authors: Md. Abul Kalam Azad, Rezwana Sharmeen, S. M. Kamruzzaman

Abstract: Segment displays play a vital role in displaying numerals. Today, matrix displays are also used to display numerals, because numerals have many curved edges, which are better supported by matrix displays. However, because matrix displays are costly and complex to implement and need more memory, segment displays are generally used to display numerals. Since no compact display architecture has yet been proposed to display numerals of multiple languages at a time, this paper proposes a uniform display architecture that displays digits of multiple languages and general mathematical expressions with higher accuracy and simplicity using an 18-segment display, an improvement over the 16-segment display.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 6 Pages, International Conference

Journal ref: Proc. 7th International Conference on Computer and Information Technology (ICCIT-2004), Dhaka, Bangladesh, pp. 887-892, Dec. 2004
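The abstract does not give the 18-segment layout, so this sketch shows the underlying idea with the standard 7-segment digit table: each symbol is stored as a bitmask of lit segments, and a character table is usable only if every symbol lights a distinct combination. An 18-segment table covering several scripts could be checked the same way by extending `SEGMENTS` to 18 hypothetical segment names.

```python
SEGMENTS = "abcdefg"  # standard 7-segment naming; an 18-segment layout would list 18 names

DIGIT_SEGMENTS = {
    "0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
    "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg",
}


def segment_mask(symbol, table=DIGIT_SEGMENTS, names=SEGMENTS):
    """Bitmask with bit i set when segment names[i] is lit for this symbol."""
    return sum(1 << names.index(s) for s in table[symbol])


def is_unambiguous(table, names=SEGMENTS):
    """True if every symbol lights a distinct combination of segments."""
    masks = [segment_mask(sym, table, names) for sym in table]
    return len(set(masks)) == len(masks)
```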

arXiv:1009.4976 [pdf]

Text Classification using Association Rule with a Hybrid Concept of Naive Bayes Classifier and Genetic Algorithm

Authors: S. M. Kamruzzaman, Farhana Haider, Ahmed Ryadh Hasan

Abstract: Text classification is the automated assignment of natural language texts to predefined categories based on their content. It is a primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way, such as producing summaries, answering questions, or extracting data. Nowadays the demand for text classification is increasing tremendously, and with this demand in mind, new and updated techniques are being developed for automated text classification. This paper presents a new algorithm for text classification. Instead of individual words, word relations (i.e., association rules) are used to derive the feature set from pre-classified text documents. A Naive Bayes classifier is then applied to the derived features, and finally a Genetic Algorithm concept is added for the final classification. A system based on the proposed algorithm has been implemented and tested. The experimental results show that the proposed system works as a successful text classifier.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 6 Pages, International Conference

Journal ref: Proc. 7th International Conference on Computer and Information Technology (ICCIT-2004), Dhaka, Bangladesh, pp. 682-687, Dec. 2004

arXiv:1009.4974 [pdf]

Rotation Invariant Face Detection Using Wavelet, PCA and Radial Basis Function Networks

Authors: S. M. Kamruzzaman, Firoz Ahmed Siddiqi, Md. Saiful Islam, Md. Emdadul Haque, Mohammad Shamsul Alam

Abstract: This paper introduces a novel method for detecting human faces and their orientation using wavelets, principal component analysis (PCA), and radial basis function networks. The input image is analyzed by a two-dimensional wavelet and a two-dimensional stationary wavelet. The common goals are image clearance and simplification, which are parts of de-noising or compression. We applied an effective procedure to reduce the dimension of the input vectors using PCA. A Radial Basis Function (RBF) neural network is then used as a function approximation network to detect whether the input image contains a face and, if it does, to determine its orientation. We show how RBF can perform better than the back-propagation algorithm and give some solutions for better regularization of the RBF (GRNN) network. Compared with traditional RBF networks, the proposed network demonstrates better capability of approximating underlying functions, faster learning speed, better network size, and high robustness to outliers.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 5 Pages, International Conference

Journal ref: 12th International Conference on Human Computer Interaction, Beijing, China, Vol. 18, Jul. 2007
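The abstract above uses PCA to reduce the dimension of the input vectors before they reach the RBF network. A minimal sketch of that step with NumPy, computing principal components from the SVD of the centered data; the wavelet preprocessing and the RBF network are not shown, and the 64-dimensional inputs (e.g. flattened 8x8 patches) and the choice of `k` are assumptions for illustration.

```python
import numpy as np


def pca_reduce(X, k):
    """Project rows of X (n_samples x n_features) onto the top-k principal components.

    Returns (projected data, components, mean, explained-variance fractions).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing
    # singular value, i.e. by decreasing variance along that direction.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:k]
    explained = (s[:k] ** 2) / np.sum(s ** 2)
    return centered @ components.T, components, mean, explained


# Hypothetical data: 100 input vectors of 64 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
Z, components, mean, explained = pca_reduce(X, 10)
print(Z.shape)  # (100, 10)
```

The projected vectors `Z` would then serve as the lower-dimensional inputs to the classifier; `Z @ components + mean` gives their approximate reconstruction in the original space.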

arXiv:1009.4973 [pdf]

Performance Analysis of Pulse Shaping Technique for OFDM PAPR Reduction

Authors: S. M. Kamruzzaman, Md. Anisur Rahman

Abstract: Orthogonal Frequency Division Multiplexing (OFDM) is an attractive modulation and multiple access technique for channels with a nonflat frequency response, as it removes the need for complex equalizers. It can offer high-quality performance in terms of bandwidth efficiency, robustness against multipath fading, and cost-effective implementation. However, its main disadvantage is the high peak-to-average power ratio (PAPR) of the output signal. As a result, the system must behave linearly over a large dynamic range, which reduces the efficiency of the output amplifier. In this paper, we investigate the effect of several sets of pulse-shaping time waveforms on OFDM system performance in terms of Bit Error Rate (BER). We evaluate the system performance in AWGN channels. The obtained results indicate that the PAPR reduction achieved by the investigated methods is associated with a considerable improvement in the BER performance of the system in multipath channels, as compared to conventional OFDM. These promising results indicate that pulse shaping with reduced PAPR is an attractive solution for OFDM systems.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 5 Pages, International Conference

Journal ref: Proc. 12th International Conference on Human Computer Interaction, Beijing, China, Vol. 18, Jul. 2007
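The PAPR discussed in the abstract above is the ratio of the peak instantaneous power of the time-domain OFDM signal to its average power. The sketch below computes it for a single OFDM symbol with NumPy; the pulse-shaping waveforms studied in the paper are not modeled, and the 64 QPSK subcarriers and the oversampling factor of 4 are assumptions for illustration.

```python
import numpy as np


def ofdm_papr_db(symbols, oversample=4):
    """PAPR in dB of one OFDM symbol built from frequency-domain `symbols`.

    Zero-padding the middle of the spectrum oversamples the time signal,
    which estimates the continuous-time peak better than Nyquist-rate samples.
    """
    symbols = np.asarray(symbols, dtype=complex)
    n = len(symbols)
    half = n // 2
    padded = np.concatenate([symbols[:half],
                             np.zeros((oversample - 1) * n, dtype=complex),
                             symbols[half:]])
    x = np.fft.ifft(padded)          # time-domain OFDM samples
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())


# Worst case: all subcarriers in phase, so PAPR = 10*log10(N).
print(ofdm_papr_db(np.ones(64)))  # about 18.06 dB

# A random QPSK symbol on 64 subcarriers (hypothetical input).
rng = np.random.default_rng(1)
qpsk = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
print(ofdm_papr_db(qpsk))
```

For N subcarriers the PAPR can never exceed 10 log10(N) dB, reached when all subcarriers add in phase; this worst case is what makes linear amplification over a large dynamic range necessary.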

arXiv:1009.4972 [pdf]

doi 10.3923/ijepe.2007.274.278

Speaker Identification using MFCC-Domain Support Vector Machine

Authors: S. M. Kamruzzaman, A. N. M. Rezaul Karim, Md. Saiful Islam, Md. Emdadul Haque

Abstract: Speech recognition and speaker identification are important for authentication and verification for security purposes, but they are difficult to achieve. Speaker identification methods can be divided into text-independent and text-dependent methods. This paper presents a technique for text-dependent speaker identification using an MFCC-domain support vector machine (SVM). In this work, mel-frequency cepstral coefficients (MFCCs) and their statistical distribution properties are used as features, which serve as inputs to the neural network. This work first applies the sequential minimal optimization (SMO) learning technique to the SVM, which improves performance over the traditional Chunking and Osuna techniques. The cepstral coefficients representing the speaker characteristics of a speech segment are computed by nonlinear filter bank analysis and the discrete cosine transform. The speaker identification ability and convergence speed of the SVMs are investigated for different combinations of features. Extensive experimental results on several samples show the effectiveness of the proposed approach.

Submitted 25 September, 2010; originally announced September 2010.

Comments: 5 Pages, International Journal

Journal ref: International Journal of Electrical and Power Engineering, Vol. 1, No. 3, pp. 274-278, 2007
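The abstract above states that the cepstral coefficients are obtained by nonlinear (mel) filter bank analysis followed by a discrete cosine transform. The sketch below shows only those last two ingredients in plain Python: a common Hz-to-mel mapping and an unnormalized DCT-II of the log filter-bank energies. The framing, windowing, and filter-bank construction are omitted, and the mel formula variant, the 20 bands, and the 13 retained coefficients are assumptions rather than values from the paper.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to the mel scale (one common convention; others exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def cepstral_coefficients(filterbank_energies, num_coeffs):
    """Unnormalized DCT-II of the log filter-bank energies, first num_coeffs terms."""
    # Floor the energies so that silent bands do not produce log(0).
    logs = [math.log(max(e, 1e-10)) for e in filterbank_energies]
    m = len(logs)
    return [
        sum(logs[j] * math.cos(math.pi * k * (j + 0.5) / m) for j in range(m))
        for k in range(num_coeffs)
    ]


# Hypothetical energies from a 20-band mel filter bank for one speech frame.
energies = [abs(math.sin(0.3 * i)) + 0.1 for i in range(20)]
mfcc = cepstral_coefficients(energies, 13)
print(len(mfcc))  # 13
```

The resulting coefficient vectors (and, per the abstract, their statistics) would be the inputs to the SVM, which is not shown here.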

arXiv:1009.4964 [pdf]

Text Classification using Artificial Intelligence

Authors: S. M. Kamruzzaman

Abstract: Text classification is the process of classifying documents into predefined categories based on their content; that is, the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way, such as producing summaries, answering questions, or extracting data. Existing supervised learning algorithms for classifying text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using an artificial intelligence technique that requires fewer documents for training. Instead of words, word relations, i.e. association rules mined from these words, are used to derive a feature set from pre-classified text documents. The concept of the naïve Bayes classifier is then applied to the derived features, and finally only a single concept from genetic algorithms is added for the final classification. A system based on the proposed algorithm has been implemented and tested. The experimental results show that the proposed system works as a successful text classifier.

Submitted 24 September, 2010; originally announced September 2010.

Comments: 6 Pages, International Journal

Journal ref: Journal of Electrical Engineering, The Institution of Engineers, Bangladesh, Vol. EE 33, No. I & II, Dec. 2006

arXiv:1009.4962 [pdf]

RGANN: An Efficient Algorithm to Extract Rules from ANNs

Authors: S. M. Kamruzzaman

Abstract: This paper describes an efficient rule generation algorithm, called rule generation from artificial neural networks (RGANN), to generate symbolic rules from ANNs. Classification rules are sought in many areas, from automatic knowledge acquisition to data mining and ANN rule extraction, because they possess some attractive features: they are explicit, understandable, and verifiable by domain experts, and they can be modified, extended, and passed on as modular knowledge. A standard three-layer feedforward ANN is the basis of the algorithm, and a four-phase training algorithm is proposed for backpropagation learning. The explicitness of the generated rules is supported by comparing them with the symbolic rules generated by other methods. The generated rules are comparable with those of other methods in terms of number of rules, average number of conditions per rule, and predictive accuracy. Extensive experimental studies on several benchmark classification problems, including breast cancer, wine, season, golf-playing, and lenses classification, demonstrate the effectiveness of the proposed approach with good generalization ability.

Submitted 24 September, 2010; originally announced September 2010.

Comments: 12 Pages, International Journal

Journal ref: Journal of Electronics and Computer Science, Jahangirnagar University, Bangladesh, Vol. 8, pp. 19-30, Jun. 2007

arXiv:1009.4590 [pdf]

A Unique 10 Segment Display for Bengali Numerals

Authors: Md. Abul Kalam Azad, Rezwana Sharmeen, Shabbir Ahmad, S. M. Kamruzzaman

Abstract: Segmented displays are widely used to display alphanumeric characters efficiently. English numerals are displayed by 7-segment and 16-segment displays, and the segment size is uniform in these two display architectures. Display architectures using 8, 10, 11, and 18 segments have been proposed for the Bengali numerals 0 to 9, yet none of them uses segments of uniform size and uniform power consumption. In this paper we propose a uniform 10-segment architecture for Bengali numerals. This architecture uses segments of uniform size and no bent segments.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 3 Pages, International Conference

Journal ref: Proc. 8th International Conference on Computer and Information Technology (ICCIT 2005), Dhaka, Bangladesh, pp. 97-99, Dec. 2005

arXiv:1009.4586 [pdf]

Optimal Bangla Keyboard Layout using Association Rule of Data Mining

Authors: Md. Hijbul Alam, Abdul Kadar Muhammad Masum, Mohammad Mahadi Hassan, S. M. Kamruzzaman

Abstract: In this paper we present an optimal Bangla keyboard layout, which distributes the load equally between both hands so as to maximize ease and minimize effort. The Bangla alphabet has a large number of letters, which makes fast typing on a Bangla keyboard difficult. Our proposed keyboard maximizes operator speed, since operators can type with both hands in parallel. We use association rule mining to distribute the Bangla characters on the keyboard. First, we analyze the frequencies of monographs, digraphs, and trigraphs derived from a data warehouse, and then use association rule mining to distribute the Bangla characters in the layout. Finally, we propose a Bangla keyboard layout. Experimental results on several keyboard layouts show the effectiveness of the proposed approach with better performance.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 3 Pages, International Conference

Journal ref: Proc. 7th International Conference on Computer and Information Technology (ICCIT 2004), Dhaka, Bangladesh, pp. 679-681, Dec. 2004
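The abstract above starts by counting monograph, digraph, and trigraph frequencies and then balances the typing load between the two hands. The sketch below illustrates those two ideas in plain Python. It is not the authors' method: the paper uses association rule mining for the placement, whereas `balance_hands` here is a hypothetical greedy heuristic, and counting n-grams only within whitespace-separated words is an assumption.

```python
from collections import Counter


def ngram_frequencies(text, n):
    """Count character n-grams inside each whitespace-separated word
    (n=1: monographs, n=2: digraphs, n=3: trigraphs)."""
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def balance_hands(freqs):
    """Hypothetical greedy split: assign characters, most frequent first,
    to whichever hand currently carries the lighter load."""
    hands = {"left": [], "right": []}
    load = {"left": 0, "right": 0}
    for ch, f in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        side = "left" if load["left"] <= load["right"] else "right"
        hands[side].append(ch)
        load[side] += f
    return hands, load


# Works on Bangla text as well, counting Unicode code points
# (dependent vowel signs are separate code points from their consonants).
mono = ngram_frequencies("আমার সোনার বাংলা", 1)
hands, load = balance_hands(mono)
print(load)
```

Digraph and trigraph counts from the same function could be used to keep frequently consecutive characters on alternating hands, which is the kind of co-occurrence information an association-rule approach would exploit.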

arXiv:1009.4582 [pdf]

Text Classification using the Concept of Association Rule of Data Mining

Authors: Chowdhury Mofizur Rahman, Ferdous Ahmed Sohel, Parvez Naushad, S. M. Kamruzzaman

Abstract: As the amount of online text increases, the demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing which classes a text belongs to, is expensive. Automatic classification of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort or trained from texts that have themselves been manually classified. In this paper we discuss a procedure for classifying text using the concept of association rules from data mining. An association rule mining technique is used to derive a feature set from pre-classified text documents, and a Naive Bayes classifier is then applied to the derived features for final classification.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 8 Pages, International Conference

Journal ref: Proc. International Conference on Information Technology, Kathmandu, Nepal, pp. 234-241, May 2003
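The abstract above derives features from association rules but does not say how the rules are mined. A minimal sketch, assuming rules of the form {a} -> {b} between words that co-occur in the same document and the standard support and confidence definitions; the function name and thresholds are hypothetical, and feeding the rules to a Naive Bayes classifier is not shown.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(documents, min_support, min_confidence):
    """Mine word-pair rules a -> b from a list of documents.

    support(a, b)    = fraction of documents containing both a and b
    confidence(a->b) = support(a, b) / support(a)
    Returns (lhs, rhs, support, confidence) tuples.
    """
    n = len(documents)
    doc_sets = [set(doc.lower().split()) for doc in documents]
    word_count = Counter(w for s in doc_sets for w in s)
    pair_count = Counter(p for s in doc_sets for p in combinations(sorted(s), 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue  # pair too rare to form a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = c / word_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical pre-classified documents from one category.
docs = ["goal team win", "team win match", "vote party win"]
print(mine_pair_rules(docs, 0.5, 0.8))  # [('team', 'win', 0.6666666666666666, 1.0)]
```

Here "win" appears in every document, so "win -> team" has confidence 2/3 and is rejected, while "team -> win" holds in every document containing "team".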

arXiv:1009.4574 [pdf]

A hybrid learning algorithm for text classification

Authors: S. M. Kamruzzaman, Farhana Haider

Abstract: Text classification is the process of classifying documents into predefined categories based on their content. Existing supervised learning algorithms for automatically classifying text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification that requires fewer documents for training. Instead of words, word relations, i.e. association rules mined from these words, are used to derive a feature set from pre-classified text documents. The concept of the Naive Bayes classifier is then applied to the derived features, and finally only a single concept from genetic algorithms is added for final classification. Experimental results show that the classifier built this way is more accurate than existing text classification systems.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 4 pages, International Conference

Journal ref: Proc. 3rd International Conference on Electrical & Computer Engineering (ICECE 2004), Dhaka, Bangladesh, pp. 577-580, Dec. 2004

arXiv:1009.4572 [pdf]

Medical diagnosis using neural network

Authors: S. M. Kamruzzaman, Ahmed Ryadh Hasan, Abu Bakar Siddiquee, Md. Ehsanul Hoque Mazumder

Abstract: This research searches for alternative approaches to complex medical diagnosis, where human knowledge must be captured in a general fashion. Successful application examples show that human diagnostic capabilities are significantly worse than those of neural diagnostic systems. This paper describes a modified feedforward neural network constructive algorithm (MFNNCA), a new algorithm for medical diagnosis. The new constructive algorithm, combined with backpropagation, offers an approach to the incremental construction of near-minimal neural network architectures for pattern classification. The algorithm starts with a minimal number of hidden units in the single hidden layer; additional units are added to the hidden layer one at a time to improve the accuracy of the network and to obtain an optimally sized neural network. The MFNNCA was tested on several benchmark classification problems, including cancer, heart disease, and diabetes. Experimental results show that the MFNNCA can produce optimal neural network architectures with good generalization ability.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 4 pages, International Conference

Journal ref: Proc. 3rd International Conference on Electrical & Computer Engineering (ICECE 2004), Dhaka, Bangladesh, pp. 537-540, Dec. 2004
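The constructive idea in the abstract above, starting small and adding hidden units one at a time while accuracy improves, can be sketched as an outer loop. This is a simplified illustration under stated assumptions, not MFNNCA itself: `train_and_score` is a hypothetical callback that trains a single-hidden-layer network (e.g. by backpropagation) with the given number of units and returns its validation accuracy, and the stopping rule based on `min_gain` is an assumed criterion.

```python
def grow_hidden_layer(train_and_score, max_hidden, min_gain=0.01):
    """Add hidden units one at a time, stopping once the accuracy gain
    from an extra unit falls below min_gain.

    train_and_score(n_hidden) -> validation accuracy (hypothetical callback).
    Returns (chosen number of hidden units, its accuracy).
    """
    best_n = 1
    best_score = train_and_score(best_n)
    for n in range(2, max_hidden + 1):
        score = train_and_score(n)
        if score - best_score < min_gain:
            break  # the extra unit did not help enough: keep the smaller network
        best_n, best_score = n, score
    return best_n, best_score


# Stand-in for real training: made-up accuracies for illustration only.
accuracies = {1: 0.70, 2: 0.80, 3: 0.85, 4: 0.855, 5: 0.90}
print(grow_hidden_layer(accuracies.__getitem__, 5))  # (3, 0.85)
```

Stopping at the first insufficient gain keeps the network near-minimal, at the cost of possibly missing a later improvement (as with 5 units in the toy numbers above).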

arXiv:1009.4570 [pdf]

Extraction of Symbolic Rules from Artificial Neural Networks

Authors: S. M. Kamruzzaman, Md. Monirul Islam

Abstract: Although backpropagation ANNs generally predict better than decision trees do for pattern classification problems, they are often regarded as black boxes, i.e., their predictions cannot be explained the way those of decision trees can. In many applications, it is desirable to extract knowledge from trained ANNs so that users can gain a better understanding of how the networks solve the problems. A new rule extraction algorithm, called rule extraction from artificial neural networks (REANN), is proposed and implemented to extract symbolic rules from ANNs. A standard three-layer feedforward ANN is the basis of the algorithm, and a four-phase training algorithm is proposed for backpropagation learning. The explicitness of the extracted rules is supported by comparing them with the symbolic rules generated by other methods. The extracted rules are comparable with those of other methods in terms of number of rules, average number of conditions per rule, and predictive accuracy. Extensive experimental studies on several benchmark classification problems, such as breast cancer, iris, diabetes, and season classification, demonstrate the effectiveness of the proposed approach with good generalization ability.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 7 Pages, WASET Transactions

Journal ref: WASET Transactions on Science, Engineering and Technology, Vol. 10, pp. 271-277, Dec. 2005

arXiv:1009.4566 [pdf]

An Algorithm to Extract Rules from Artificial Neural Networks for Medical Diagnosis Problems

Authors: S. M. Kamruzzaman, Md. Monirul Islam

Abstract: Artificial neural networks (ANNs) have been successfully applied to a variety of classification and function approximation problems. Although ANNs can generally predict better than decision trees for pattern classification problems, they are often regarded as black boxes, since their predictions cannot be explained as clearly as those of decision trees. This paper presents a new algorithm, called rule extraction from ANNs (REANN), to extract rules from trained ANNs for medical diagnosis problems. A standard three-layer feedforward ANN with four-phase training is the basis of the proposed algorithm. In the first phase, the number of hidden nodes in the ANN is determined automatically by a constructive algorithm. In the second phase, irrelevant connections and input nodes are removed from the trained ANN without sacrificing its predictive accuracy. In the third phase, the continuous activation values of the hidden nodes are discretized using an efficient heuristic clustering algorithm. Finally, rules are extracted from the compact ANN by examining the discretized activation values of the hidden nodes. Extensive experimental studies on three benchmark classification problems, i.e. breast cancer, diabetes, and lenses, demonstrate that REANN can generate high-quality rules from ANNs, which are comparable with those of other methods in terms of number of rules, average number of conditions per rule, and predictive accuracy.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 19 Pages, International Journal

Journal ref: International Journal of Information Technology (IJIT), Vol. 12, No. 8, pp. 41-59, 2006
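The third phase in the abstract above discretizes continuous hidden-node activations with a heuristic clustering algorithm, which the abstract does not specify. The sketch below is a hypothetical greedy one-dimensional clustering with an assumed tolerance `delta`, shown only to illustrate how continuous activations collapse to a few discrete levels from which rules can later be read off.

```python
def discretize_activations(values, delta):
    """Hypothetical greedy 1-D clustering of hidden-node activations.

    Scans the activations in sorted order and starts a new cluster whenever a
    value lies more than `delta` from the running mean of the current cluster.
    Each activation is replaced by the mean of its cluster; results are
    returned in the original order.
    """
    clusters = []
    for v in sorted(values):
        if clusters and abs(v - sum(clusters[-1]) / len(clusters[-1])) <= delta:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    representative = {}
    for cluster in clusters:
        centre = sum(cluster) / len(cluster)
        for v in cluster:
            representative[v] = centre
    return [representative[v] for v in values]


# Activations of one hidden node over five training examples (made up).
acts = [0.02, 0.05, 0.51, 0.48, 0.97]
levels = discretize_activations(acts, 0.1)
print(levels)  # three distinct levels: about 0.035, 0.495, and 0.97
```

A smaller `delta` keeps more levels (and more accuracy) at the cost of more, longer rules; a larger one yields fewer levels and simpler rules.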

arXiv:1009.4564 [pdf]

A Constructive Algorithm for Feedforward Neural Networks for Medical Diagnostic Reasoning

Authors: Abu Bakar Siddiquee, Md. Ehsanul Hoque Mazumder, S. M. Kamruzzaman

Abstract: This research searches for alternative approaches to complex medical diagnosis, where human knowledge must be captured in a general fashion. Successful application examples show that human diagnostic capabilities are significantly worse than those of neural diagnostic systems. Our research describes a constructive neural network algorithm with backpropagation that offers an approach to the incremental construction of near-minimal neural network architectures for pattern classification. The algorithm starts with a minimal number of hidden units in the single hidden layer; additional units are added to the hidden layer one at a time to improve the accuracy of the network and to obtain an optimally sized neural network. Our algorithm was tested on several benchmark classification problems, including Cancer1, Heart, and Diabetes, and showed good generalization ability.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 4 Pages, International Symposium

Journal ref: Proc. MMU International Symposium on Information and Communications Technology (M2USIC 2004), Kuala Lumpur, Malaysia, pp. TS4B2: 5-8, Oct. 2004

arXiv:1009.4521 [pdf]

doi 10.5121/ijcnc.2010.2501

CR-MAC: A multichannel MAC protocol for cognitive radio ad hoc networks

Authors: S. M. Kamruzzaman

Abstract: This paper proposes a cross-layer cognitive radio multichannel medium access control (MAC) protocol with TDMA for ad hoc wireless networks, which integrates spectrum sensing at the physical (PHY) layer with packet scheduling at the MAC layer. The IEEE 802.11 standard allows the use of multiple channels available at the PHY layer, but its MAC protocol is designed for a single channel only. A single-channel MAC protocol does not work well in a multichannel environment because of the multichannel hidden terminal problem. Our proposed protocol enables secondary users (SUs) to utilize multiple channels by switching channels dynamically, thus increasing network throughput. Although each SU is equipped with only one spectrum-agile transceiver, the protocol solves the multichannel hidden terminal problem using temporal synchronization. The proposed cognitive radio MAC (CR-MAC) protocol allows SUs to identify and use the unused frequency spectrum in a way that constrains the level of interference to the primary users (PUs). Our scheme improves network throughput significantly, especially when the network is highly congested. The simulation results show that our proposed CR-MAC protocol successfully exploits multiple channels and significantly improves network performance by using the licensed spectrum band opportunistically, while protecting PUs from interference, even in hidden terminal situations.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 14 Pages, International Journal

Journal ref: International Journal of Computer Networks & Communications (IJCNC), Vol.2, No.5, pp. 1-14, Sep. 2010

arXiv:1009.4520 [pdf]

An Energy Efficient Multichannel MAC Protocol for Cognitive Radio Ad Hoc Networks

Authors: S. M. Kamruzzaman

Abstract: This paper presents a TDMA-based energy-efficient cognitive radio multichannel medium access control (MAC) protocol, called ECR-MAC, for wireless ad hoc networks. ECR-MAC, which requires only a single half-duplex radio transceiver on each node, integrates spectrum sensing at the physical (PHY) layer with packet scheduling at the MAC layer. In addition to the explicit frequency negotiation adopted by conventional multichannel MAC protocols, ECR-MAC introduces lightweight explicit time negotiation. This two-dimensional negotiation enables ECR-MAC to exploit the advantages of both multiple channels and TDMA, and to achieve aggressive power savings by allowing nodes that are not involved in communication to go into doze mode. The IEEE 802.11 standard allows the use of multiple channels available at the PHY layer, but its MAC protocol is designed for a single channel only. A single-channel MAC protocol does not work well in a multichannel environment because of the multichannel hidden terminal problem. The proposed energy-efficient ECR-MAC protocol allows secondary users (SUs) to identify and use the unused frequency spectrum in a way that constrains the level of interference to the primary users (PUs). Extensive simulation results show that our proposed ECR-MAC protocol successfully exploits multiple channels, significantly improves network performance by using the licensed spectrum band opportunistically, and protects QoS provisioning over cognitive radio ad hoc networks.

Submitted 23 September, 2010; originally announced September 2010.

Comments: 8 Pages, International Journal

Journal ref: International Journal of Communication Networks and Information Security (IJCNIS), Vol. 2, No. 2, pp. 112-119, Aug. 2010

Showing 1–48 of 48 results for author: Kamruzzaman, M