-
Automatic video scene segmentation based on spatial-temporal clues and rhythm
Authors:
Walid Mahdi,
Liming Chen,
Mohsen Ardebilian
Abstract:
With ever-increasing computing power and data storage capacity, the potential for large digital video libraries is growing rapidly. However, the massive use of video is currently limited by its opaque nature. Indeed, a user who has to browse a video sequentially needs too much time to find segments of interest within it. Therefore, providing an environment that is both convenient and efficient for video storage and retrieval, especially for content-based searching as it exists in traditional text-based database systems, has been the focus of recent and significant efforts by a large research community.
In this paper, we propose a new automatic video scene segmentation method that exploits two main video features: the spatial-temporal relationship and the rhythm of shots. The experimental evidence we obtained from an 80-minute video showed that our prototype provides very high accuracy for video segmentation.
Submitted 15 December, 2014;
originally announced December 2014.
-
Lip Localization and Viseme Classification for Visual Speech Recognition
Authors:
Salah Werda,
Walid Mahdi,
Abdelmajid Ben Hamadou
Abstract:
The need for an automatic lip-reading system is ever increasing. In fact, the extraction and reliable analysis of facial movements now make up an important part of many multimedia systems, such as videoconferencing, low communication systems, and lip-reading systems. In addition, visual information is essential for people with special needs. We can imagine, for example, a dependent person commanding a machine with a simple lip movement or by pronouncing a simple syllable. Moreover, people with hearing problems compensate for their special needs by lip-reading as well as listening to the person with whom they are talking.
Submitted 19 January, 2013;
originally announced January 2013.
-
A Visual Grammar Approach for TV Program Identification
Authors:
Tarek Zlitni,
Walid Mahdi
Abstract:
Automatic identification of TV programs within TV streams is an important task for archive exploitation. This paper proposes a new spatial-temporal approach that identifies programs in TV streams in two main steps. First, a reference catalogue of video grammars of visual jingles is constructed. We exploit visual grammars characterizing instances of the same program type in order to identify the various program types in the TV stream. The role of a video grammar is to represent the visual invariants of each visual jingle using a set of descriptors appropriate for each TV program. Second, programs in TV streams are identified by comparing the visual signature of the video signal in the TV stream with the visual grammars in the catalogue. After presenting the proposed approach, the paper gives an overview of encouraging experimental results on several streams extracted from different channels and composed of several programs.
Submitted 10 January, 2013;
originally announced January 2013.
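The second step of the abstract above (matching a stream signature against a catalogue of references) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the descriptors or the similarity measure, so signatures are modelled here as plain feature vectors compared by cosine similarity, and the names `cosine_similarity`, `identify_program`, and `threshold` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def identify_program(signature, catalogue, threshold=0.9):
    """Return the catalogue label most similar to `signature`, or None.

    `catalogue` maps a program label to a reference vector (an assumed
    stand-in for the paper's visual grammars). A label is returned only
    if its similarity reaches `threshold`.
    """
    best_label, best_score = None, threshold
    for label, reference in catalogue.items():
        score = cosine_similarity(signature, reference)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


catalogue = {"news": [1.0, 0.0, 0.0], "sports": [0.0, 1.0, 0.0]}
match = identify_program([0.9, 0.1, 0.0], catalogue)
no_match = identify_program([1.0, 1.0, 1.0], catalogue)
```

Returning `None` below the threshold lets a caller treat unmatched stream segments as "no known jingle" rather than forcing a label onto them.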
-
AViTExt: Automatic Video Text Extraction, A new Approach for video content indexing Application
Authors:
Baseem Bouaziz,
Tarek Zlitni,
Walid Mahdi
Abstract:
In this paper, we propose a spatial-temporal video-text detection technique that proceeds in two principal steps: potential text region detection and a filtering process. In the first step, we dynamically divide each pair of consecutive video frames into sub-blocks in order to detect change. A significant difference between homologous blocks implies the appearance of an important object, which may be a text region. Temporal redundancy is then used to filter these regions and retain the effective text regions. Experiments conducted on a variety of video sequences show the effectiveness of our approach, which obtains a precision of 89.39% and a recall of 90.19%.
Submitted 10 January, 2013;
originally announced January 2013.
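The two steps described in the AViTExt abstract above (block-wise differencing of consecutive frames, then a temporal filter) can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: frames are assumed to be equally sized 2D lists of grayscale values, the block size is fixed rather than chosen dynamically, and the names `block_differences`, `stable_text_candidates`, `threshold`, and `min_stable` are hypothetical.

```python
def block_differences(prev, curr, block=2):
    """Mean absolute difference per block between two grayscale frames."""
    rows, cols = len(curr), len(curr[0])
    diffs = {}
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            total = sum(abs(curr[r][c] - prev[r][c]) for r, c in cells)
            diffs[(r0 // block, c0 // block)] = total / len(cells)
    return diffs


def stable_text_candidates(frames, block=2, threshold=50, min_stable=2):
    """Return (frame_index, block) pairs for blocks that change and then persist.

    A block is a candidate when it differs strongly from the previous frame.
    It is kept only if it then stays unchanged for `min_stable` consecutive
    frame pairs, which is an assumed reading of "temporal redundancy".
    """
    pair_diffs = [block_differences(frames[i], frames[i + 1], block)
                  for i in range(len(frames) - 1)]
    kept = set()
    for t in range(1, len(frames) - min_stable):
        for b, d in pair_diffs[t - 1].items():
            appeared = d > threshold
            stays = all(pair_diffs[t + k][b] <= threshold
                        for k in range(min_stable))
            if appeared and stays:
                kept.add((t, b))
    return kept


blank = [[0] * 4 for _ in range(4)]
overlay = [row[:] for row in blank]
for r in range(2):
    for c in range(2):
        overlay[r][c] = 255  # a bright region appears in the top-left block

persistent = stable_text_candidates([blank, overlay, overlay, overlay])
flash = stable_text_candidates([blank, overlay, blank, blank])
```

In this toy input, the overlay that stays on screen is retained, while the one-frame flash is discarded by the persistence check.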
-
Content-Based Video Browsing by Text Region Localization and Classification
Authors:
Bassem Bouaziz,
Walid Mahdi,
Tarek Zlitni,
Abdelmajid ben Hamadou
Abstract:
The amount of digital video data is increasing all over the world. This highlights the need for efficient algorithms that can index, retrieve, and browse this data by content. This can be achieved by identifying semantic descriptions captured automatically from the video structure. Among these descriptions, text within video is considered a rich feature that enables effective video indexing and browsing. Unlike most video text detection and extraction methods, which treat video sequences as collections of still images, we propose in this paper a spatiotemporal video-text localization and identification approach that proceeds in two main steps: text region localization and text region classification. In the first step, we detect the significant appearance of new objects in a frame by split-and-merge processes applied to binarized edge frame-pair differences. Detected objects are, a priori, considered as text. They are then filtered according to both local contrast variation and texture criteria in order to retain the effective ones. The resulting text regions are classified based on a visual grammar descriptor containing a set of semantic text class regions characterized by visual features. A visual table of contents is then generated from the extracted text regions occurring within the video sequence, enriched by a semantic identification. Experiments performed on a variety of video sequences show the efficiency of our approach.
Submitted 10 January, 2013;
originally announced January 2013.
-
A new approach for digit recognition based on hand gesture analysis
Authors:
Ahmed Ben Jmaa,
Walid Mahdi,
Yousra Ben Jemaa,
Abdelmajid Ben Hamadou
Abstract:
We present in this paper a new approach for hand gesture analysis that allows digit recognition. The analysis is based on extracting a set of features from a hand image and then combining them using an induction graph. The most important features we extract from each image are the finger locations, their heights, and the distance between each pair of fingers. Our approach consists of three steps: (i) hand detection and localization, (ii) finger extraction, and (iii) feature identification and combination for digit recognition. Each input image is assumed to contain only one person; thus we apply a fuzzy classifier to identify the skin pixels. In the finger extraction step, we attempt to remove all the hand components except the fingers; this process is based on the anatomical properties of the hand. The final step consists of computing a histogram of the detected fingers in order to extract the features that will be used for digit recognition. The approach is invariant to scale, rotation, and translation of the hand. Experiments have been undertaken to show the effectiveness of the proposed approach.
Submitted 27 June, 2009;
originally announced June 2009.
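The feature set named in the abstract above (finger locations, heights, and pairwise distances, read off a histogram of the detected fingers) can be illustrated with a small sketch. It rests on assumptions the abstract does not confirm: the input is a binary mask that already contains only finger pixels, the histogram is a per-column pixel count, and each run of non-empty columns is treated as one finger. The induction-graph combination step is not covered, and `column_histogram` and `finger_features` are hypothetical names.

```python
from itertools import combinations


def column_histogram(mask):
    """Count foreground pixels in each column of a binary mask."""
    return [sum(col) for col in zip(*mask)]


def finger_features(mask):
    """Extract per-finger location and height, plus pairwise distances.

    Each run of consecutive non-empty columns is treated as one finger.
    Location is the centre column of the run; height is the tallest
    column count inside the run.
    """
    hist = column_histogram(mask)
    fingers = []
    start = None
    # The trailing 0 is a sentinel that closes a run touching the right edge.
    for x, count in enumerate(hist + [0]):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            run = hist[start:x]
            fingers.append({"location": (start + x - 1) / 2,
                            "height": max(run)})
            start = None
    distances = [abs(a["location"] - b["location"])
                 for a, b in combinations(fingers, 2)]
    return fingers, distances


mask = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 1],
]
fingers, distances = finger_features(mask)
```

For this toy mask the sketch finds two fingers; a real system would also need the normalization that gives the scale, rotation, and translation invariance claimed in the abstract, which is omitted here.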