STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

Fayyaz, Mohsen; Saffar, Mohammad Hajizadeh; Sabokrou, Mohammad; Fathy, Mahmood; Klette, Reinhard; Huang, Fay

Computer Science > Computer Vision and Pattern Recognition

arXiv:1608.05971 (cs)

[Submitted on 21 Aug 2016 (v1), last revised 2 Sep 2016 (this version, v2)]

Abstract: This paper presents a novel method that uses both spatial and temporal features for semantic video segmentation. Recent work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features that support very good performance in both image and video analysis, especially for the semantic segmentation task. We investigate how adding temporal features also improves the segmentation of video data. We propose a module based on the long short-term memory (LSTM) architecture of a recurrent neural network for interpreting the temporal characteristics of video frames over time. Our system takes the frames of a video as input and produces a correspondingly sized output. To segment the video, our method combines three components: first, the regional spatial features of frames are extracted using a CNN; then, temporal features are added using the LSTM; finally, pixel-wise predictions are produced by deconvolving the spatio-temporal features. Our key insight is to build spatio-temporal convolutional networks (spatio-temporal CNNs) that have an end-to-end architecture for semantic video segmentation. We fully adapted some known convolutional network architectures (such as FCN-AlexNet and FCN-VGG16), as well as dilated convolution, into our spatio-temporal CNNs. Our spatio-temporal CNNs achieve state-of-the-art semantic segmentation, as demonstrated on the CamVid and NYUDv2 datasets.
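
The three-stage flow described in the abstract (spatial features, then an LSTM over time, then upsampling to pixel-wise labels) can be illustrated with a toy NumPy sketch. This is not the authors' network: average pooling stands in for the CNN, nearest-neighbour upsampling stands in for deconvolution, the weights are random and untrained, and applying one shared LSTM independently at each spatial region is an assumption made for illustration. All function names and parameters (`segment_video`, `pool`, `d_hid`, and so on) are hypothetical.

```python
import numpy as np

def spatial_features(frame, pool=4):
    """Stand-in for the CNN: average-pool each pool x pool region per channel."""
    h, w, c = frame.shape
    return frame.reshape(h // pool, pool, w // pool, pool, c).mean(axis=(1, 3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One standard LSTM step, applied independently to every region.
    x: (..., d_in); h and c: (..., d_hid)."""
    gates = x @ W + h @ U + b                 # (..., 4 * d_hid)
    i, f, o, g = np.split(gates, 4, axis=-1)  # input, forget, output, candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def upsample(features, factor):
    """Stand-in for deconvolution: nearest-neighbour upsampling."""
    return features.repeat(factor, axis=0).repeat(factor, axis=1)

def segment_video(frames, n_classes=3, d_hid=8, pool=4, seed=0):
    """Return one integer label map per frame (random, untrained weights)."""
    rng = np.random.default_rng(seed)
    t, h, w, c = frames.shape
    W = rng.normal(scale=0.1, size=(c, 4 * d_hid))
    U = rng.normal(scale=0.1, size=(d_hid, 4 * d_hid))
    b = np.zeros(4 * d_hid)
    V = rng.normal(scale=0.1, size=(d_hid, n_classes))  # per-region classifier
    hid = np.zeros((h // pool, w // pool, d_hid))
    cell = np.zeros_like(hid)
    labels = []
    for frame in frames:
        feats = spatial_features(frame, pool)              # 1. spatial features
        hid, cell = lstm_step(feats, hid, cell, W, U, b)   # 2. temporal state
        scores = upsample(hid @ V, pool)                   # 3. back to pixel grid
        labels.append(scores.argmax(axis=-1))
    return np.stack(labels)

video = np.random.default_rng(1).random((5, 16, 16, 3))  # 5 frames, 16x16 RGB
masks = segment_video(video)
print(masks.shape)  # (5, 16, 16)
```

The point of the sketch is the data flow: the LSTM state is carried from frame to frame, so the label map for a frame depends on earlier frames as well as on its own spatial features, and the output has the same height and width as the input.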

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1608.05971 [cs.CV]
	(or arXiv:1608.05971v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1608.05971

Submission history

From: Mohsen Fayyaz
[v1] Sun, 21 Aug 2016 17:34:08 UTC (939 KB)
[v2] Fri, 2 Sep 2016 15:51:49 UTC (976 KB)
