Search | arXiv e-print repository

arXiv:1904.08709

doi: 10.1016/j.datak.2018.07.006

Knowledge-rich Image Gist Understanding Beyond Literal Meaning

Authors: Lydia Weiland, Ioana Hulpus, Simone Paolo Ponzetto, Wolfgang Effelsberg, Laura Dietz

Abstract: We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result, a Mean Average Precision (MAP) of 0.69, indicates that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., using automatically generated image tags or generated captions, and prove the feasibility of an end-to-end automated process.

Submitted 18 April, 2019; originally announced April 2019.

Journal ref: Data & Knowledge Engineering, Volume 117, September 2018, Pages 114-132

arXiv:1809.08593

Understanding the Gist of Images - Ranking of Concepts for Multimedia Indexing

Authors: Lydia Weiland, Simone Paolo Ponzetto, Wolfgang Effelsberg, Laura Dietz

Abstract: Now that multimedia data is continuously generated, stored, and distributed, multimedia indexing, whose purpose is to group similar data, is more important than ever. Understanding the gist (= message) of multimedia instances is framed in related work as a ranking of concepts from a knowledge base, i.e., Wikipedia. We cast the task of multimedia indexing as a gist understanding problem. Our pipeline benefits from external knowledge and two subsequent learning-to-rank (l2r) settings. The first l2r produces a ranking of concepts representing the respective multimedia instance. The second l2r produces a mapping between the concept representation of an instance and the targeted class topic(s) for the multimedia indexing task. The evaluation on an established large corpus (MIRFlickr25k, with 25,000 images) shows, with a MAP of 61.42, that multimedia indexing benefits from understanding the gist. Thus, the presented end-to-end setting outperforms DBM and competes with hashing-based methods.

Submitted 23 September, 2018; originally announced September 2018.

arXiv:1409.8624

The Optimal Input Distribution for Partial Decode-and-Forward in the MIMO Relay Channel

Authors: Lennart Gerdes, Christoph Hellings, Lorenz Weiland, Wolfgang Utschick

Abstract: This paper considers the partial decode-and-forward (PDF) strategy for the Gaussian multiple-input multiple-output (MIMO) relay channel. Unlike for the decode-and-forward (DF) strategy or point-to-point (P2P) transmission, for which Gaussian channel inputs are known to be optimal, the input distribution that maximizes the achievable PDF rate for the Gaussian MIMO relay channel has remained unknown so far. For some special cases, e.g., for relay channels where the optimal PDF strategy reduces to DF or P2P transmission, it could be deduced that Gaussian inputs maximize the PDF rate. For the general case, however, the problem has remained open until now. In this work, we solve this problem by proving that the maximum achievable PDF rate for the Gaussian MIMO relay channel is always attained by Gaussian channel inputs. Our proof relies on the channel enhancement technique, which was originally introduced by Weingarten et al. to derive the (private message) capacity region of the Gaussian MIMO broadcast channel. By combining this technique with a primal decomposition approach, we first establish that jointly Gaussian source and relay inputs maximize the achievable PDF rate for the aligned Gaussian MIMO relay channel. Subsequently, we use a limiting argument to extend this result from the aligned to the general Gaussian MIMO relay channel.

Submitted 30 September, 2014; originally announced September 2014.

Comments: 23 pages, 2 figures, submitted to IEEE Transactions on Information Theory

MSC Class: 94A15

Showing 1–3 of 3 results for author: Weiland, L