Search | arXiv e-print repository

Pseudo-Labeling Driven Refinement of Benchmark Object Detection Datasets via Analysis of Learning Patterns

Authors: Min Je Kim, Muhammad Munsif, Altaf Hussain, Hikmat Yar, Sung Wook Baik

Abstract: Benchmark object detection (OD) datasets play a pivotal role in advancing computer vision applications such as autonomous driving and surveillance, as well as in training and evaluating deep learning-based state-of-the-art detection models. Among them, MS-COCO has become a standard benchmark due to its diverse object categories and complex scenes. However, despite its wide adoption, MS-COCO suffers from various annotation issues, including missing labels, incorrect class assignments, inaccurate bounding boxes, duplicate labels, and group labeling inconsistencies. These errors not only hinder model training but also degrade the reliability and generalization of OD models. To address these challenges, we propose a comprehensive refinement framework and present MJ-COCO, a newly re-annotated version of MS-COCO. Our approach begins with loss and gradient-based error detection to identify potentially mislabeled or hard-to-learn samples. Next, we apply a four-stage pseudo-labeling refinement process: (1) bounding box generation using invertible transformations, (2) IoU-based duplicate removal and confidence merging, (3) class consistency verification via an expert object recognizer, and (4) spatial adjustment based on object region activation map analysis. This integrated pipeline enables scalable and accurate correction of annotation errors without manual re-labeling. Extensive experiments were conducted across four validation datasets: MS-COCO, Sama COCO, Objects365, and PASCAL VOC. Models trained on MJ-COCO consistently outperformed those trained on MS-COCO, achieving improvements in Average Precision (AP) and APS metrics. MJ-COCO also demonstrated significant gains in annotation coverage: for example, the number of small object annotations increased by more than 200,000 compared to MS-COCO.

Submitted 1 June, 2025; originally announced June 2025.
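
Stage (2) of the refinement process described in the abstract above, IoU-based duplicate removal and confidence merging, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 threshold, and the merging rule (keep the highest-confidence box of each group and report the group's mean confidence) are assumptions chosen for illustration only.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, str, float]       # (box, class label, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_duplicates(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedily group same-class boxes whose IoU is at least iou_thr.

    Assumption (not from the paper): each group keeps its highest-confidence
    box and reports the mean confidence of its members.
    """
    groups: List[Tuple[Box, str, List[float]]] = []
    # Visit detections from most to least confident so the first box of a
    # group is always its highest-confidence member.
    for box, label, score in sorted(dets, key=lambda d: d[2], reverse=True):
        for kept_box, kept_label, scores in groups:
            if kept_label == label and iou(kept_box, box) >= iou_thr:
                scores.append(score)  # duplicate: fold its confidence into the group
                break
        else:
            groups.append((box, label, [score]))  # no match: start a new group
    return [(box, label, sum(s) / len(s)) for box, label, s in groups]
```

Two heavily overlapping boxes of the same class collapse into one detection, while boxes of different classes or with low overlap are left untouched.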

arXiv:2204.06788 [pdf, other]

Pyramidal Attention for Saliency Detection

Authors: Tanveer Hussain, Abbas Anwar, Saeed Anwar, Lars Petersson, Sung Wook Baik

Abstract: Salient object detection (SOD) extracts meaningful contents from an input image. RGB-based SOD methods lack the complementary depth clues; hence, providing limited performance for complex scenarios. Similarly, RGB-D models process RGB and depth inputs, but the depth data availability during testing may hinder the model's practical applicability. This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features. We employ a pyramidal attention structure to extract multi-level convolutional-transformer features to process initial stage representations and further enhance the subsequent ones. At each stage, the backbone transformer model produces global receptive fields and computing in parallel to attain fine-grained global predictions refined by our residual convolutional attention decoder for optimal saliency prediction. We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets, respectively. Consequently, we present a new SOD perspective of generating RGB-D SOD without acquiring depth data during training and testing and assist RGB methods with depth clues for improved performance. The code and trained models are available at https://github.com/tanveer-hussain/EfficientSOD2

Submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted at CVPRW 2022. (2022 IEEE CVPR Workshop on Fair, Data Efficient and Trusted Computer Vision)

arXiv:2102.06407 [pdf, other]

Densely Deformable Efficient Salient Object Detection Network

Authors: Tanveer Hussain, Saeed Anwar, Amin Ullah, Khan Muhammad, Sung Wook Baik

Abstract: Salient Object Detection (SOD) domain using RGB-D data has lately emerged with some current models' adequately precise results. However, they have restrained generalization abilities and intensive computational complexity. In this paper, inspired by the best background/foreground separation abilities of deformable convolutions, we employ them in our Densely Deformable Network (DDNet) to achieve efficient SOD. The salient regions from densely deformable convolutions are further refined using transposed convolutions to optimally generate the saliency maps. Quantitative and qualitative evaluations using the recent SOD dataset against 22 competing techniques show our method's efficiency and effectiveness. We also offer evaluation using our own created cross-dataset, surveillance-SOD (S-SOD), to check the trained models' validity in terms of their applicability in diverse scenarios. The results indicate that the current models have limited generalization potentials, demanding further research in this direction. Our code and new dataset will be publicly available at https://github.com/tanveer-hussain/EfficientSOD

Submitted 12 February, 2021; originally announced February 2021.

arXiv:1610.01382 [pdf]

Divide-and-Conquer based Ensemble to Spot Emotions in Speech using MFCC and Random Forest

Authors: Abdul Malik Badshah, Jamil Ahmad, Mi Young Lee, Sung Wook Baik

Abstract: Besides spoken words, speech signals also carry information about speaker gender, age, and emotional state which can be used in a variety of speech analysis applications. In this paper, a divide and conquer strategy for ensemble classification has been proposed to recognize emotions in speech. Intrinsic hierarchy in emotions has been utilized to construct an emotions tree, which assisted in breaking down the emotion recognition task into smaller sub tasks. The proposed framework generates predictions in three phases. Firstly, emotions are detected in the input speech signal by classifying it as neutral or emotional. If the speech is classified as emotional, then in the second phase, it is further classified into positive and negative classes. Finally, individual positive or negative emotions are identified based on the outcomes of the previous stages. Several experiments have been performed on a widely used benchmark dataset. The proposed method was able to achieve improved recognition rates as compared to several other approaches.

Submitted 5 October, 2016; originally announced October 2016.

Comments: 8 pages, conference paper, The 2nd International Integrated Conference & Concert on Convergence (2016)
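
The three-phase prediction flow described in the abstract above (neutral vs. emotional, then positive vs. negative, then the individual emotion) can be sketched as a small routing function. This is only an illustration of the control flow, not the authors' system: the function name, the label strings, and the stand-in classifiers are hypothetical, and in the paper each stage would be a trained model on MFCC features rather than a hand-written rule.

```python
from typing import Callable, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]  # any callable mapping features to a label


def predict_emotion(
    x: Features,
    arousal_clf: Classifier,   # phase 1: returns "neutral" or "emotional"
    polarity_clf: Classifier,  # phase 2: returns "positive" or "negative"
    positive_clf: Classifier,  # phase 3a: picks one positive emotion
    negative_clf: Classifier,  # phase 3b: picks one negative emotion
) -> str:
    """Route a sample down a three-level emotion tree."""
    # Phase 1: stop early if the utterance carries no emotion.
    if arousal_clf(x) == "neutral":
        return "neutral"
    # Phase 2: choose the branch of the tree.
    # Phase 3: let the branch-specific classifier name the emotion.
    if polarity_clf(x) == "positive":
        return positive_clf(x)
    return negative_clf(x)


# Hypothetical stand-in classifiers based on simple thresholds, used only to
# exercise the routing logic.
def toy_arousal(x: Features) -> str:
    return "emotional" if abs(x[0]) > 0.5 else "neutral"


def toy_polarity(x: Features) -> str:
    return "positive" if x[0] > 0 else "negative"


def toy_positive(x: Features) -> str:
    return "happy" if x[1] > 0 else "surprised"


def toy_negative(x: Features) -> str:
    return "angry" if x[1] > 0 else "sad"
```

Because each stage only has to separate a few classes, every sub-classifier faces a smaller problem than a single flat multi-class model would.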

arXiv:1601.01577 [pdf]

Gender Identification using MFCC for Telephone Applications - A Comparative Study

Authors: Jamil Ahmad, Mustansar Fiaz, Soon-il Kwon, Maleerat Sodanil, Bay Vo, Sung Wook Baik

Abstract: Gender recognition is an essential component of automatic speech recognition and interactive voice response systems. Determining gender of the speaker reduces the computational burden of such systems for any further processing. Typical methods for gender recognition from speech largely depend on features extraction and classification processes. The purpose of this study is to evaluate the performance of various state-of-the-art classification methods along with tuning their parameters for helping selection of the optimal classification methods for gender recognition tasks. Five classification schemes including k-nearest neighbor, naïve Bayes, multilayer perceptron, random forest, and support vector machine are comprehensively evaluated for determination of gender from telephonic speech using the Mel-frequency cepstral coefficients. Different experiments were performed to determine the effects of training data sizes, length of the speech streams, and parameter tuning on classification performance. Results suggest that SVM is the best classifier among all the five schemes for gender recognition.

Submitted 7 January, 2016; originally announced January 2016.

Journal ref: International Journal of Computer Science and Electronics Engineering 3.5 (2015): 351-355

arXiv:1510.02177 [pdf]

Ontology-based Secure Retrieval of Semantically Significant Visual Contents

Authors: Khan Muhammad, Irfan Mehmood, Mi Young Lee, Su Mi Ji, Sung Wook Baik

Abstract: Image classification is an enthusiastic research field where large amount of image data is classified into various classes based on their visual contents. Researchers have presented various low-level features-based techniques for classifying images into different categories. However, efficient and effective classification and retrieval is still a challenging problem due to complex nature of visual contents. In addition, the traditional information retrieval techniques are vulnerable to security risks, making it easy for attackers to retrieve personal visual contents such as patients records and law enforcement agencies databases. Therefore, we propose a novel ontology-based framework using image steganography for secure image classification and information retrieval. The proposed framework uses domain-specific ontology for mapping the low-level image features to high-level concepts of ontologies which consequently results in efficient classification. Furthermore, the proposed method utilizes image steganography for hiding the image semantics as a secret message inside them, making the information retrieval process secure from third parties. The proposed framework minimizes the computational complexity of traditional techniques, increasing its suitability for secure and real-time visual contents retrieval from personalized image databases. Experimental results confirm the efficiency, effectiveness, and security of the proposed framework as compared with other state-of-the-art systems.

Submitted 7 October, 2015; originally announced October 2015.

Comments: A short paper of 11 pages for secure visual contents retrieval. The original version can be accessed at this link: http://www.kingpc.or.kr/inc_html/index.html

Journal ref: Khan Muhammad, Irfan Mehmood, Mi Young Lee, Su Mi Ji, Sung Wook Baik, "Ontology-based Secure Retrieval of Semantically Significant Visual Contents," JOURNAL OF KOREAN INSTITUTE OF NEXT GENERATION COMPUTING, vol. 11, pp. 87-96, 2015

arXiv:1506.02100 [pdf]

doi 10.1007/s11042-015-2671-9

A novel magic LSB substitution method (M-LSB-SM) using multi-level encryption and achromatic component of an image

Authors: Khan Muhammad, Muhammad Sajjad, Irfan Mehmood, Seungmin Rho, Sung Wook Baik

Abstract: Image Steganography is a thriving research area of information security where secret data is embedded in images to hide its existence while getting the minimum possible statistical detectability. This paper proposes a novel magic least significant bit substitution method (M-LSB-SM) for RGB images. The proposed method is based on the achromatic component (I-plane) of the hue-saturation-intensity (HSI) color model and multi-level encryption (MLE) in the spatial domain. The input image is transposed and converted into an HSI color space. The I-plane is divided into four sub-images of equal size, rotating each sub-image with a different angle using a secret key. The secret information is divided into four blocks, which are then encrypted using an MLE algorithm (MLEA). Each sub-block of the message is embedded into one of the rotated sub-images based on a specific pattern using magic LSB substitution. Experimental results validate that the proposed method not only enhances the visual quality of stego images but also provides good imperceptibility and multiple security levels as compared to several existing prominent methods.

Submitted 5 June, 2015; originally announced June 2015.

Comments: This paper has been published in Multimedia Tools and Applications Journal with impact factor=1.058. The readers can study the formatted paper using the following link: http://link.springer.com/article/10.1007/s11042-015-2671-9. Please use sci-hub.org for downloading this paper if you are unable to access it freely or email us at [email protected]

Journal ref: Multimedia Tools and Applications, pp. 1-27, 2015
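
For readers unfamiliar with the building block that the M-LSB-SM entry above extends, the following is a minimal sketch of plain least significant bit substitution on a flat byte buffer. It is deliberately not the paper's method: it omits the HSI conversion, the sub-image rotation, the multi-level encryption, and the "magic" embedding pattern, and the function names are hypothetical.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Hide message in the lowest bit of successive pixel bytes (MSB-first)."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover is too small for the message")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Read n_bytes of hidden data back from the lowest bits."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)
```

Each cover byte changes by at most 1, which is why LSB embedding is hard to see; on its own it offers no secrecy once the scheme is known, which is the gap that encryption and key-dependent embedding patterns are meant to close.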

arXiv:1502.07041 [pdf]

Describing Colors, Textures and Shapes for Content Based Image Retrieval - A Survey

Authors: Jamil Ahmad, Muhammad Sajjad, Irfan Mehmood, Seungmin Rho, Sung Wook Baik

Abstract: Visual media has always been the most enjoyed way of communication. From the advent of television to the modern day hand held computers, we have witnessed the exponential growth of images around us. Undoubtedly it's a fact that they carry a lot of information in them which needs to be utilized in an effective manner. Hence intense need has been felt to efficiently index and store large image collections for effective and on-demand retrieval. For this purpose low-level features extracted from the image contents like color, texture and shape have been used. Content based image retrieval systems employing these features have proven very successful. Image retrieval has promising applications in numerous fields and hence has motivated researchers all over the world. New and improved ways to represent visual content are being developed each day. Tremendous amount of research has been carried out in the last decade. In this paper we will present a detailed overview of some of the powerful color, texture and shape descriptors for content based image retrieval. A comparative analysis will also be carried out for providing an insight into outstanding challenges in this field.

Submitted 24 February, 2015; originally announced February 2015.

Journal ref: (2014), Journal of Platform Technology 2(4): 34-48

Showing 1–8 of 8 results for author: Baik, S W