Search | arXiv e-print repository

arXiv:1906.11759

Low-dimensional Embodied Semantics for Music and Language

Authors: Francisco Afonso Raposo, David Martins de Matos, Ricardo Ribeiro

Abstract: Embodied cognition states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased, according to its subjective experience history, making this biological semantic machinery noisy with respect to the overall semantics inherent to media artifacts, such as music and language excerpts. We propose to represent shared semantics using low-dimensional vector embeddings by jointly modeling several brains from human subjects. We show these unsupervised efficient representations outperform the original high-dimensional fMRI voxel spaces in proxy music genre and language topic classification tasks. We further show that joint modeling of several subjects increases the semantic richness of the learned latent vector spaces.

Submitted 20 June, 2019; originally announced June 2019.

Comments: 6 pages, 1 figure, 1 table

ACM Class: I.2.6; H.5.5; H.5.1; I.2.7

arXiv:1903.10534

doi 10.1007/s00521-021-06090-8

Learning Embodied Semantics via Music and Dance Semiotic Correlations

Authors: Francisco Afonso Raposo, David Martins de Matos, Ricardo Ribeiro

Abstract: Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We leverage this aspect of cognition, by considering dance as a proxy for music perception, in a statistical computational model that learns semiotic correlations between music audio and dance video. We evaluate the ability of this model to effectively capture underlying semantics in a cross-modal retrieval task. Quantitative results, validated with statistical significance testing, strengthen the body of evidence for embodied cognition in music and show the model can recommend music audio for dance video queries and vice-versa.

Submitted 25 March, 2019; originally announced March 2019.

Comments: 24 pages, 1 figure, 5 tables

Journal ref: Neural Computing and Applications, vol. 33, pp. 14481-14493, 2021

arXiv:1712.05197

Towards Deep Modeling of Music Semantics using EEG Regularizers

Authors: Francisco Raposo, David Martins de Matos, Ricardo Ribeiro, Suhua Tang, Yi Yu

Abstract: Modeling of music audio semantics has been previously tackled through learning of mappings from audio data to high-level tags or latent unsupervised spaces. The resulting semantic spaces are theoretically limited, either because the chosen high-level tags do not cover all of music semantics or because audio data itself is not enough to determine music semantics. In this paper, we propose a generic framework for semantics modeling that focuses on the perception of the listener, through EEG data, in addition to audio data. We implement this framework using a novel end-to-end 2-view Neural Network (NN) architecture and a Deep Canonical Correlation Analysis (DCCA) loss function that forces the semantic embedding spaces of both views to be maximally correlated. We also detail how the EEG dataset was collected and use it to train our proposed model. We evaluate the learned semantic space in a transfer learning context, by using it as an audio feature extractor in an independent dataset and proxy task: music audio-lyrics cross-modal retrieval. We show that our embedding model outperforms Spotify features and performs comparably to a state-of-the-art embedding model that was trained on 700 times more data. We further discuss improvements to the model that are likely to improve its performance.

Submitted 15 December, 2017; v1 submitted 14 December, 2017; originally announced December 2017.

Comments: 5 pages, 2 figures

ACM Class: H.5.5; H.5.1

arXiv:1711.08976

Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval

Authors: Yi Yu, Suhua Tang, Francisco Raposo, Lei Chen

Abstract: Little research focuses on cross-modal correlation learning where temporal structures of different data modalities such as audio and lyrics are taken into account. Stemming from the characteristic of temporal structures of music in nature, we are motivated to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for audio modality and text modality (lyrics). Different modality data are converted to the same canonical space where inter modal canonical correlation analysis is utilized as an objective function to calculate the similarity of temporal structures. This is the first study on understanding the correlation between language and music audio through deep architectures for learning the paired temporal correlation of audio and lyrics. Pre-trained Doc2vec model followed by fully-connected layers (fully-connected deep neural network) is used to represent lyrics. Two significant contributions are made in the audio branch, as follows: i) pre-trained CNN followed by fully-connected layers is investigated for representing music audio. ii) We further suggest an end-to-end architecture that simultaneously trains convolutional layers and fully-connected layers to better learn temporal structures of music audio. Particularly, our end-to-end deep architecture contains two properties: simultaneously implementing feature learning and cross-modal correlation learning, and learning joint representation by considering temporal structures. Experimental results, using audio to retrieve lyrics or using lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.

Submitted 28 November, 2017; v1 submitted 24 November, 2017; originally announced November 2017.

arXiv:1612.02350

doi 10.1016/j.patrec.2019.03.014

An Information-theoretic Approach to Machine-oriented Music Summarization

Authors: Francisco Raposo, David Martins de Matos, Ricardo Ribeiro

Abstract: Music summarization allows for higher efficiency in processing, storage, and sharing of datasets. Machine-oriented approaches, being agnostic to human consumption, optimize these aspects even further. Such summaries have already been successfully validated in some MIR tasks. We now generalize previous conclusions by evaluating the impact of generic summarization of music from a probabilistic perspective. We estimate Gaussian distributions for original and summarized songs and compute their relative entropy, in order to measure information loss incurred by summarization. Our results suggest that relative entropy is a good predictor of summarization performance in the context of tasks relying on a bag-of-features model. Based on this observation, we further propose a straightforward yet expressive summarizer, which minimizes relative entropy with respect to the original song, that objectively outperforms previous methods and is better suited to avoid potential copyright issues.

Submitted 21 September, 2018; v1 submitted 7 December, 2016; originally announced December 2016.

Comments: 7 pages, 1 algorithm, 7 figures, 1 table, submitted to Pattern Recognition Letters (Elsevier)

ACM Class: H.5.5

Journal ref: Pattern Recognition Letters, vol. 123, pp. 75-81, 2019
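
The abstract of arXiv:1612.02350 describes fitting Gaussian distributions to original and summarized songs and computing their relative entropy. A minimal Python sketch of that measurement follows. The function names, the diagonal-covariance simplification, the small variance floor, the toy feature vectors, and the direction of the divergence (summary relative to original) are all illustrative assumptions, not details taken from the paper.

```python
import math
from statistics import fmean, pvariance


def fit_diag_gaussian(frames):
    """Estimate a per-dimension mean and variance from a list of feature vectors.

    Assumption: a diagonal covariance is used here purely to keep the sketch short.
    """
    dims = list(zip(*frames))  # transpose: one tuple of values per feature dimension
    means = [fmean(d) for d in dims]
    # A tiny floor keeps the variance strictly positive for constant dimensions.
    variances = [pvariance(d) + 1e-9 for d in dims]
    return means, variances


def kl_diag_gaussians(p, q):
    """Relative entropy KL(p || q) between two diagonal Gaussians (mean, variance lists)."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * sum(
        vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


# Toy data (hypothetical): four 2-dimensional frames for the "song",
# and its first two frames standing in for a "summary".
song_frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
summary_frames = song_frames[:2]

song_model = fit_diag_gaussian(song_frames)
summary_model = fit_diag_gaussian(summary_frames)
information_loss = kl_diag_gaussians(summary_model, song_model)
```

Under these assumptions, a lower `information_loss` would indicate a summary whose feature distribution stays closer to that of the full song.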

arXiv:1506.01273

doi 10.1016/j.patrec.2015.12.016

Summarization of Films and Documentaries Based on Subtitles and Scripts

Authors: Marta Aparício, Paulo Figueiredo, Francisco Raposo, David Martins de Matos, Ricardo Ribeiro, Luís Marujo

Abstract: We assess the performance of generic text summarization algorithms applied to films and documentaries, using the well-known behavior of summarization of news articles as reference. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used for comparing generated summaries against news abstracts, plot summaries, and synopses. We show that the best performing algorithms are LSA, for news articles and documentaries, and LexRank and Support Sets, for films. Despite the different nature of films and documentaries, their relative behavior is in accordance with that obtained for news articles.

Submitted 9 March, 2016; v1 submitted 3 June, 2015; originally announced June 2015.

Comments: 7 pages, 9 tables, 4 figures, submitted to Pattern Recognition Letters (Elsevier)

ACM Class: I.2.7

Journal ref: Pattern Recognition Letters, Volume 73, 1 April 2016, Pages 7-12
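
The abstract of arXiv:1506.01273 mentions standard ROUGE metrics for comparing generated summaries against reference texts. As a rough illustration of the idea, the sketch below computes ROUGE-1 recall (clipped unigram overlap divided by the number of reference unigrams). The function name, the whitespace tokenization, and the example sentences are assumptions made for this sketch; the paper does not say which ROUGE variants or preprocessing it used.

```python
from collections import Counter


def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams also found in the candidate.

    Assumption: lowercased whitespace tokenization, with no stemming or stopword removal.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Clip each token's overlap at the number of times it occurs in the candidate.
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / total


# Hypothetical example: 3 of the 6 reference tokens are covered.
score = rouge1_recall("the cat sat", "the cat sat on the mat")
```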

arXiv:1503.06666

doi 10.1109/TASLP.2016.2541299

Using Generic Summarization to Improve Music Information Retrieval Tasks

Authors: Francisco Raposo, Ricardo Ribeiro, David Martins de Matos

Abstract: In order to satisfy processing time constraints, many MIR tasks process only a segment of the whole music signal. This practice may lead to decreasing performance, since the most important information for the tasks may not be in those processed segments. In this paper, we leverage generic summarization algorithms, previously applied to text and speech summarization, to summarize items in music datasets. These algorithms build summaries, that are both concise and diverse, by selecting appropriate segments from the input signal which makes them good candidates to summarize music as well. We evaluate the summarization process on binary and multiclass music genre classification tasks, by comparing the performance obtained using summarized datasets against the performances obtained using continuous segments (which is the traditional method used for addressing the previously mentioned time constraints) and full songs of the same original dataset. We show that GRASSHOPPER, LexRank, LSA, MMR, and a Support Sets-based Centrality model improve classification performance when compared to selected 30-second baselines. We also show that summarized datasets lead to a classification performance whose difference is not statistically significant from using full songs. Furthermore, we make an argument stating the advantages of sharing summarized datasets for future MIR research.

Submitted 9 March, 2016; v1 submitted 23 March, 2015; originally announced March 2015.

Comments: 24 pages, 10 tables; Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

ACM Class: H.5.5

Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24, n. 6, March 2016

arXiv:1406.4877

doi 10.1109/LSP.2014.2347582

On the Application of Generic Summarization Algorithms to Music

Authors: Francisco Raposo, Ricardo Ribeiro, David Martins de Matos

Abstract: Several generic summarization algorithms were developed in the past and successfully applied in fields such as text and speech summarization. In this paper, we review and apply these algorithms to music. To evaluate this summarization's performance, we adopt an extrinsic approach: we compare a Fado Genre Classifier's performance using truncated contiguous clips against the summaries extracted with those algorithms on 2 different datasets. We show that Maximal Marginal Relevance (MMR), LexRank and Latent Semantic Analysis (LSA) all improve classification performance in both datasets used for testing.

Submitted 18 June, 2014; originally announced June 2014.

Comments: 12 pages, 1 table; Submitted to IEEE Signal Processing Letters

ACM Class: H.5.5

Journal ref: IEEE Signal Processing Letters, IEEE, vol. 22, n. 1, January 2015
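
The abstract of arXiv:1406.4877 names Maximal Marginal Relevance (MMR) as one of the generic summarization algorithms applied to music. The sketch below shows the general MMR selection loop on toy segment vectors: each step picks the segment that is most relevant while being least similar to segments already chosen. Using the centroid of all segments as the relevance target, cosine similarity, the `lam` trade-off value, and every name here are assumptions for illustration, not the configuration reported by the authors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(segments, k, lam=0.7):
    """Return the indices of up to k segments chosen greedily by MMR.

    Assumption: relevance is measured against the centroid of all segments.
    """
    centroid = [sum(col) / len(segments) for col in zip(*segments)]
    selected = []
    candidates = list(range(len(segments)))

    def score(i):
        relevance = cosine(segments[i], centroid)
        # Redundancy is the highest similarity to any already selected segment.
        redundancy = max(
            (cosine(segments[i], segments[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical feature vectors: segments 0 and 1 are near-duplicates, segment 2 differs.
toy_segments = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
chosen = mmr_select(toy_segments, k=2, lam=0.5)
```

With these toy vectors the second pick skips the near-duplicate of the first and takes the dissimilar segment, which is the diversity behavior MMR is designed to produce.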

Showing 1–8 of 8 results for author: Raposo, F