Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

Zhuang, Bohan; Wu, Qi; Shen, Chunhua; Reid, Ian; Hengel, Anton van den

arXiv:1711.06370 (cs)

[Submitted on 17 Nov 2017]

Authors: Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton van den Hengel

Abstract: Recognising objects according to a pre-defined, fixed set of class labels has been well studied in computer vision. There are, however, a great many practical applications where the subjects of interest are not known beforehand, or are not so easily delineated. In many of these cases, natural language dialog is a natural way to specify the subject of interest, and the task of achieving this capability (a.k.a. Referring Expression Comprehension) has recently attracted attention. To this end, we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in natural language descriptions of variable length, from short phrase queries to long multi-round dialogs. The PLAN network has two attention mechanisms that relate parts of the expressions to both the global visual content and directly to object candidates. Furthermore, the attention mechanisms are recurrent, making the referring process visualizable and explainable. The attended information from these dual sources is combined to reason about the referred object. These two attention mechanisms can be trained in parallel, and we find that the combined system outperforms the state of the art on several benchmark datasets with language input of different lengths, such as RefCOCO, RefCOCO+ and GuessWhat?!.
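The two-branch recurrent attention idea described in the abstract can be illustrated with a toy sketch. This is not the PLAN network itself: the abstract gives no equations, so the dot-product scoring, the mean-pooled initial query, the additive combination rule, and every function name below (`attend`, `parallel_attention`, and so on) are assumptions chosen only to make the idea concrete. Word, image-region, and candidate features are assumed to be plain lists of floats with a shared dimension.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Soft attention: softmax of dot-product scores, then a weighted sum of items."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    context = [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]
    return weights, context


def parallel_attention(words, regions, candidates, steps=2):
    """Toy two-branch recurrent attention (hypothetical, not the paper's model).

    words      -- feature vectors for the expression (phrase or dialog)
    regions    -- feature vectors for global image regions
    candidates -- feature vectors for object candidates
    Returns the final attention weights over the candidates.
    """
    dim = len(words[0])
    # Assumption: start from the mean word vector as the initial query.
    query = [sum(w[d] for w in words) / len(words) for d in range(dim)]
    cand_weights = None
    for _ in range(steps):
        # Attend to the parts of the expression relevant to the current query.
        _, lang_ctx = attend(query, words)
        # Branch 1: relate the attended language to the global visual content.
        _, img_ctx = attend(lang_ctx, regions)
        # Branch 2: relate the attended language directly to object candidates.
        cand_weights, cand_ctx = attend(lang_ctx, candidates)
        # Assumption: combine the sources by summation to form the next query.
        query = [l + i + c for l, i, c in zip(lang_ctx, img_ctx, cand_ctx)]
    return cand_weights
```

Under these assumptions, the index of the largest returned weight plays the role of the predicted referred object, and the per-step weights are what one would inspect to visualise the referring process.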

Comments:	11 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1711.06370 [cs.CV]
	(or arXiv:1711.06370v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1711.06370

Submission history

From: Chunhua Shen
[v1] Fri, 17 Nov 2017 01:46:48 UTC (1,735 KB)
