-
KnowRA: Knowledge Retrieval Augmented Method for Document-level Relation Extraction with Comprehensive Reasoning Abilities
Authors:
Chengcheng Mai,
Yuxiang Wang,
Ziyu Gong,
Hanxiang Wang,
Yihua Huang
Abstract:
Document-level relation extraction (Doc-RE) aims to extract relations between entities across multiple sentences. Compared to sentence-level RE, Doc-RE therefore requires more comprehensive, human-like reasoning abilities, involving complex cross-sentence interactions between entities, contexts, and external general knowledge. However, most existing Doc-RE methods focus on optimizing a single reasoning ability and lack the ability to utilize external knowledge for comprehensive reasoning on long documents. To solve these problems, a knowledge retrieval augmented method with comprehensive reasoning, named KnowRA, was proposed to autonomously determine whether to accept external knowledge to assist Doc-RE. Firstly, we constructed a document graph for semantic encoding and integrated a co-reference resolution model to augment the co-reference reasoning ability. Then, we expanded the document graph into a document knowledge graph by retrieving from an external knowledge base for common-sense reasoning, and presented a novel knowledge filtration method to filter out irrelevant knowledge. Finally, we proposed an axis attention mechanism to build direct and indirect associations with intermediary entities for cross-sentence logical reasoning. Extensive experiments conducted on two datasets verified the effectiveness of our method compared to state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/KnowRA.
Submitted 1 May, 2025; v1 submitted 31 December, 2024;
originally announced January 2025.
-
Learning Metadata-Agnostic Representations for Text-to-SQL In-Context Example Selection
Authors:
Chuhong Mai,
Ro-ee Tal,
Thahir Mohamed
Abstract:
In-context learning (ICL) is a powerful paradigm where large language models (LLMs) benefit from task demonstrations added to the prompt. Yet, selecting optimal demonstrations is not trivial, especially for complex or multi-modal tasks where input and output distributions differ. We hypothesize that forming task-specific representations of the input is key. In this paper, we propose a method to align representations of natural language questions and those of SQL queries in a shared embedding space. Our technique, dubbed MARLO (Metadata-Agnostic Representation Learning for Text-tO-SQL), uses query structure to model querying intent without over-indexing on underlying database metadata (i.e., tables, columns, or domain-specific entities of a database referenced in the question or query). This allows MARLO to select examples that are structurally and semantically relevant for the task rather than examples that are spuriously related to a certain domain or question phrasing. When used to retrieve examples based on question similarity, MARLO shows superior performance compared to generic embedding models (on average +2.9 percentage points in execution accuracy) on the Spider benchmark. It also outperforms the next best method that masks metadata information by +0.8 percentage points in execution accuracy on average, while imposing a significantly lower inference latency.
Submitted 17 October, 2024;
originally announced October 2024.
-
Generalization Enhancement Strategies to Enable Cross-year Cropland Mapping with Convolutional Neural Networks Trained Using Historical Samples
Authors:
Sam Khallaghi,
Rahebe Abedi,
Hanan Abou Ali,
Hamed Alemohammad,
Mary Dziedzorm Asipunu,
Ismail Alatise,
Nguyen Ha,
Boka Luo,
Cat Mai,
Lei Song,
Amos Wussah,
Sitian Xiong,
Yao-Ting Yao,
Qi Zhang,
Lyndon D. Estes
Abstract:
The accuracy of mapping agricultural fields across large areas is steadily improving with high-resolution satellite imagery and deep learning (DL) models, even in regions where fields are small and geometrically irregular. However, developing effective DL models often requires large, expensive label datasets, typically available only for specific years or locations. This limits the ability to create annual maps essential for agricultural monitoring, as domain shifts occur between years and regions due to changes in farming practices and environmental conditions. The challenge is to design a model flexible enough to account for these shifts without needing yearly labels. While domain adaptation techniques or semi-supervised training are common solutions, we explored enhancing the model's generalization power. Our results indicate that a holistic approach is essential, combining methods to improve generalization. Specifically, using an area-based loss function, such as Tversky-focal loss (TFL), significantly improved predictions across multiple years. The use of different augmentation techniques helped to encode different types of invariance, particularly photometric augmentations encoded invariance to brightness changes, though they increased false positives. The combination of photometric augmentation, TFL loss, and MC-dropout produced the best results, although dropout alone led to more false negatives in subsequent year predictions. Additionally, the choice of input normalization had a significant impact, with the best results obtained when statistics were calculated either locally or across the entire dataset over all bands (lab and gab). We developed a workflow that enabled a U-Net model to generate effective multi-year crop maps over large areas. Our code, available at: https://github.com/agroimpacts/cnn-generalization-enhancement, will be regularly updated with improvements.
Submitted 14 August, 2024; v1 submitted 12 August, 2024;
originally announced August 2024.
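The area-based loss mentioned in the abstract above can be illustrated with a minimal sketch. It shows the Tversky index on soft predictions and one common "focal" variant that raises `1 - TI` to a power. The exact TFL formulation, the exponent convention, and the hyperparameter values used by the authors are not given in the abstract, so the function names and the defaults `alpha`, `beta`, and `gamma` below are assumptions for illustration only.

```python
def tversky_index(probs, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Soft Tversky index between predicted probabilities and a binary mask.

    alpha weights false positives, beta weights false negatives.
    With alpha = beta = 0.5 this reduces to the Dice coefficient.
    """
    tp = sum(p * t for p, t in zip(probs, target))        # soft true positives
    fp = sum(p * (1 - t) for p, t in zip(probs, target))  # soft false positives
    fn = sum((1 - p) * t for p, t in zip(probs, target))  # soft false negatives
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def focal_tversky_loss(probs, target, alpha=0.3, beta=0.7, gamma=0.75):
    """One common focal variant: (1 - TI) ** gamma.

    The defaults are illustrative assumptions, not values from the paper.
    """
    ti = tversky_index(probs, target, alpha, beta)
    return max(0.0, 1.0 - ti) ** gamma


if __name__ == "__main__":
    mask = [1, 1, 0, 0]
    missed_field = [1, 0, 0, 0]   # one false negative
    extra_field = [1, 1, 1, 0]    # one false positive
    print(focal_tversky_loss(missed_field, mask))
    print(focal_tversky_loss(extra_field, mask))
```

Because the loss is computed from area overlap rather than per-pixel error, it is less dominated by the large non-field background; setting `beta` above `alpha` penalizes missed field pixels more than spurious ones.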
-
Weakly Contrastive Learning via Batch Instance Discrimination and Feature Clustering for Small Sample SAR ATR
Authors:
Yikui Zhai,
Wenlve Zhou,
Bing Sun,
Jingwen Li,
Qirui Ke,
Zilu Ying,
Junying Gan,
Chaoyun Mai,
Ruggero Donida Labati,
Vincenzo Piuri,
Fabio Scotti
Abstract:
In recent years, deep learning has achieved impressive performance in Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR). Because this technique requires a large amount of annotated data, obtaining a high recognition rate with few labeled samples remains a serious challenge. To overcome this problem, inspired by contrastive learning, we proposed a novel framework named Batch Instance Discrimination and Feature Clustering (BIDFC). In this framework, unlike the objective of general contrastive learning methods, the embedding distance between samples should be moderate because of the high similarity between samples in SAR images. Consequently, our flexible framework is equipped with an adjustable distance between embeddings, which we term weakly contrastive learning. Technically, instance labels are assigned to the unlabeled data in each batch, and random augmentation and training are performed a few times on these augmented data. Meanwhile, a novel Dynamic-Weighted Variance loss (DWV loss) function is proposed to cluster the embeddings of the augmented versions of each sample. Experimental results on the moving and stationary target acquisition and recognition (MSTAR) database indicate a 91.25% classification accuracy for our method fine-tuned on only 3.13% of the training data. Even when a linear evaluation is performed on the same training data, the accuracy still reaches 90.13%. We also verified the effectiveness of BIDFC on the OpenSarShip database, indicating that our method can be generalized to other datasets. Our code is available at: https://github.com/Wenlve-Zhou/BIDFC-master.
Submitted 7 August, 2024;
originally announced August 2024.
-
Mind the Visual Discomfort: Assessing Event-Related Potentials as Indicators for Visual Strain in Head-Mounted Displays
Authors:
Francesco Chiossi,
Yannick Weiss,
Thomas Steinbrecher,
Christian Mai,
Thomas Kosch
Abstract:
When using Head-Mounted Displays (HMDs), users may not always notice or report visual discomfort caused by blurred vision through unadjusted lenses, motion sickness, and increased eye strain. Current measures for visual discomfort rely on users' self-reports, which are susceptible to subjective differences and lack real-time insights. In this work, we investigate whether Electroencephalography (EEG) can objectively measure visual discomfort by sensing Event-Related Potentials (ERPs). In a user study (N=20), we compare four different levels of Gaussian blur while measuring ERPs at occipito-parietal EEG electrodes. The findings reveal that specific ERP components (i.e., P1, N2, and P3) discriminated discomfort-related visual stimuli and indexed increased load on visual processing and fatigue. We conclude that time-locked brain activity can be used to evaluate visual discomfort and propose EEG-based automatic discomfort detection and prevention tools.
Submitted 26 July, 2024;
originally announced July 2024.
-
AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion
Authors:
Ziyu Gong,
Chengcheng Mai,
Yihua Huang
Abstract:
The image-text retrieval task aims to retrieve relevant information from a given image or text. The main challenge is to unify multimodal representation and distinguish fine-grained differences across modalities, thereby finding similar contents and filtering irrelevant contents. However, existing methods mainly focus on unified semantic representation and concept alignment across modalities, while the fine-grained differences across modalities have rarely been studied, making it difficult to solve the information asymmetry problem. In this paper, we propose a novel asymmetry-sensitive contrastive learning method. By generating corresponding positive and negative samples for different asymmetry types, our method can simultaneously ensure fine-grained semantic differentiation and unified semantic representation across modalities. Additionally, a hierarchical cross-modal fusion method is proposed, which integrates global and local-level features through a multimodal attention mechanism to achieve concept alignment. Extensive experiments performed on MSCOCO and Flickr30K demonstrate the effectiveness and superiority of our proposed method.
Submitted 17 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Embedding Democratic Values into Social Media AIs via Societal Objective Functions
Authors:
Chenyan Jia,
Michelle S. Lam,
Minh Chau Mai,
Jeff Hancock,
Michael S. Bernstein
Abstract:
Can we design artificial intelligence (AI) systems that rank our social media feeds to consider democratic values such as mitigating partisan animosity as part of their objective functions? We introduce a method for translating established, vetted social scientific constructs into AI objective functions, which we term societal objective functions, and demonstrate the method with application to the political science construct of anti-democratic attitudes. Traditionally, we have lacked observable outcomes to use to train such models. However, the social sciences have developed survey instruments and qualitative codebooks for these constructs, and their precision facilitates translation into detailed prompts for large language models. We apply this method to create a democratic attitude model that estimates the extent to which a social media post promotes anti-democratic attitudes, and test this democratic attitude model across three studies. In Study 1, we first test the attitudinal and behavioral effectiveness of the intervention among US partisans (N=1,380) by manually annotating (alpha=.895) social media posts with anti-democratic attitude scores and testing several feed ranking conditions based on these scores. Removal (d=.20) and downranking feeds (d=.25) reduced participants' partisan animosity without compromising their experience and engagement. In Study 2, we scale up the manual labels by creating the democratic attitude model, finding strong agreement with manual labels (rho=.75). Finally, in Study 3, we replicate Study 1 using the democratic attitude model instead of manual labels to test its attitudinal and behavioral impact (N=558), and again find that the feed downranking using the societal objective function reduced partisan animosity (d=.25). This method presents a novel strategy to draw on social science theory and methods to mitigate societal harms in social media AIs.
Submitted 14 February, 2024; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Privacy Computing Meets Metaverse: Necessity, Taxonomy and Challenges
Authors:
Chuan Chen,
Yuecheng Li,
Zhenpeng Wu,
Chengyuan Mai,
Youming Liu,
Yanming Hu,
Zibin Zheng,
Jiawen Kang
Abstract:
Metaverse, the core of the next-generation Internet, is a computer-generated holographic digital environment that simultaneously combines spatio-temporal, immersive, real-time, sustainable, interoperable, and data-sensitive characteristics. It cleverly blends the virtual and real worlds, allowing users to create, communicate, and transact in virtual form. With the rapid development of emerging technologies including augmented reality, virtual reality and blockchain, the metaverse system is becoming more and more sophisticated and widely used in various fields such as social, tourism, industry and economy. However, the high level of interaction with the real world also means a huge risk of privacy leakage both for individuals and enterprises, which has hindered the wide deployment of metaverse. Then, it is inevitable to apply privacy computing techniques in the framework of metaverse, which is a current research hotspot. In this paper, we conduct comprehensive research on the necessity, taxonomy and challenges when privacy computing meets metaverse. Specifically, we first introduce the underlying technologies and various applications of metaverse, on which we analyze the challenges of data usage in metaverse, especially data privacy. Next, we review and summarize state-of-the-art solutions based on federated learning, differential privacy, homomorphic encryption, and zero-knowledge proofs for different privacy problems in metaverse. Finally, we show the current security and privacy challenges in the development of metaverse and provide open directions for building a well-established privacy-preserving metaverse system. For easy access and reference, we integrate the related publications and their codes into a GitHub repository: https://github.com/6lyc/Awesome-Privacy-Computing-in-Metaverse.git.
Submitted 21 February, 2024; v1 submitted 23 April, 2023;
originally announced April 2023.
-
Energy Efficiency Maximization in Large-Scale Cell-Free Massive MIMO: A Projected Gradient Approach
Authors:
Trang C. Mai,
Hien Quoc Ngo,
Le-Nam Tran
Abstract:
This paper considers the fundamental power allocation problem in cell-free massive multiple-input multiple-output (MIMO) systems, which aims at maximizing the total energy efficiency (EE) under a sum power constraint at each access point (AP) and a quality-of-service (QoS) constraint at each user. Existing solutions for this optimization problem are based on solving a sequence of second-order cone programs (SOCPs), whose computational complexity scales dramatically with the network size. Therefore, they are not implementable for practical large-scale cell-free massive MIMO systems. To tackle this issue, we propose an iterative power control algorithm based on the framework of an accelerated projected gradient (APG) method. In particular, each iteration of the proposed method is carried out using simple closed-form expressions, where a penalty method is applied to bring constraints into the objective in the form of penalty functions. Finally, the convergence of the proposed algorithm is analytically proved and numerically compared to the known solution based on SOCP. Simulation results demonstrate that our proposed power control algorithm can achieve the same EE as the existing SOCP-based method, but more importantly, its run time is much lower (a reduction of one to two orders of magnitude compared to the SOCP-based approaches).
Submitted 20 January, 2022;
originally announced January 2022.
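The general shape of an accelerated projected gradient iteration with a closed-form projection, as referred to in the abstract above, can be sketched on a toy problem. This is not the authors' algorithm: the energy-efficiency objective and the QoS penalty terms are replaced here by a simple separable quadratic, and the names `project_power`, `accelerated_projected_gradient`, `curvature`, and `target` are hypothetical. Only the structure (gradient step, projection onto a sum power constraint, momentum extrapolation) is illustrated.

```python
import math


def project_power(v, budget):
    """Euclidean projection onto {p : p >= 0, sum(p) <= budget}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= budget:
        return clipped
    # Otherwise the budget is active: project onto the scaled simplex
    # {p >= 0, sum(p) = budget} with the standard sort-based rule.
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - budget) / k
        if u_k - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def accelerated_projected_gradient(grad, project, x0, step, iters=2000):
    """FISTA-style iteration: gradient step at y, project, then extrapolate."""
    x, y, t = list(x0), list(x0), 1.0
    for _ in range(iters):
        g = grad(y)
        x_new = project([yi - step * gi for yi, gi in zip(y, g)])
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_new
        y = [xn + momentum * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


# Toy stand-in objective: sum_i 0.5 * curvature_i * (p_i - target_i)^2.
curvature = [1.0, 4.0]
target = [3.0, 1.0]


def toy_grad(p):
    return [a * (pi - c) for a, pi, c in zip(curvature, p, target)]


if __name__ == "__main__":
    solution = accelerated_projected_gradient(
        toy_grad,
        lambda v: project_power(v, budget=2.0),
        x0=[0.0, 0.0],
        step=1.0 / max(curvature),  # 1 / Lipschitz constant of the gradient
    )
    print(solution)  # should approach the constrained optimum (1.4, 0.6)
```

Each iteration costs only a gradient evaluation and a sort, which is the kind of per-iteration simplicity that makes first-order methods attractive compared with solving a sequence of cone programs.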
-
#StayHome #WithMe: How Do YouTubers Help with COVID-19 Loneliness?
Authors:
Shuo Niu,
Ava Bartolome,
Cat Mai,
Nguyen B. Ha
Abstract:
Loneliness threatens public mental wellbeing during COVID-19. In response, YouTube creators participated in the #StayHome #WithMe movement (SHWM) and made myriad videos for people experiencing loneliness or boredom at home. User-shared videos generate parasocial attachment and virtual connectedness. However, there is limited knowledge of how creators contributed videos during disasters to provide social provisions as disaster-relief. Grounded on Weiss's loneliness theory, this work analyzed 1488 SHWM videos to examine video sharing as a pathway to social provisions. Findings suggested that skill and knowledge sharing, entertaining arts, homelife activities, live chatting, and gameplay were the most popular video styles. YouTubers utilized parasocial relationships to form a space for staying away from the disaster. SHWM YouTubers provided friend-like, mentor-like, and family-like provisions through videos in different styles. Family-like provisions led to the highest overall viewer engagement. Based on the findings, design implications for supporting viewers' mental wellbeing in disasters are discussed.
Submitted 13 January, 2021; v1 submitted 11 January, 2021;
originally announced January 2021.
-
Downlink Spectral Efficiency of Cell-Free Massive MIMO Systems with Multi-antenna Users
Authors:
Trang C. Mai,
Hien Quoc Ngo,
Trung Q. Duong
Abstract:
This paper studies a cell-free massive multiple-input multiple-output (MIMO) system in which the access points (APs) and users are equipped with multiple antennas. Two transmission protocols are considered. In the first transmission protocol, there are no downlink pilots, while in the second transmission protocol, downlink pilots are proposed in order to improve the system performance. In both transmission protocols, the users use the minimum mean-squared error-based successive interference cancellation (MMSE-SIC) scheme to detect the desired signals. For the analysis, we first derive a general spectral efficiency formula with arbitrary side information at the users. Then analytical expressions for the spectral efficiency of the different transmission protocols are derived. To improve the spectral efficiency (SE) of the system, max-min fairness power control (PC) is applied for the first protocol by using the closed-form expression of its SE. Due to the computational complexity of deriving the closed-form performance expression of SE for the second protocol, we apply the optimal power coefficients of the first protocol to the second protocol. Numerical results show that the two protocols combined with multi-antenna users are prerequisites for achieving the suboptimal SE regardless of the number of users in the system.
Submitted 24 April, 2020;
originally announced April 2020.
-
Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging
Authors:
Binh Nguyen,
Vu Bao Hung Nguyen,
Hien Nguyen,
Pham Ngoc Phuong,
The-Loc Nguyen,
Quoc Truong Do,
Luong Chi Mai
Abstract:
In recent years, studies on automatic speech recognition (ASR) have shown outstanding results that reach human parity on short speech segments. However, there are still difficulties in standardizing the output of ASR, such as capitalization and punctuation restoration for long-speech transcription. These problems prevent readers from understanding the ASR output semantically and also cause difficulties for natural language processing models such as NER, POS tagging, and semantic parsing. In this paper, we propose a method to restore the punctuation and capitalization for long-speech ASR transcription. The method is based on Transformer models and chunk merging, which allows us to (1) build a single model that performs punctuation and capitalization in one go, and (2) perform decoding in parallel while improving the prediction accuracy. Experiments on the British National Corpus showed that the proposed approach outperforms existing methods in both accuracy and decoding speed.
Submitted 6 August, 2019;
originally announced August 2019.
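The chunk-merging idea in the abstract above can be illustrated with a minimal sketch: split a long token sequence into overlapping chunks, label each chunk independently (which is what allows parallel decoding), and keep only the labels from the part of each chunk that has context on both sides. The abstract does not specify the merge rule, so the half-overlap rule below, the function names, and the `toy_labeler` stand-in for a Transformer tagger are all assumptions.

```python
def split_chunks(tokens, size, overlap):
    """Split tokens into chunks of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks, start = [], 0
    while True:
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += step
    return chunks


def merge_chunks(labeled_chunks, overlap):
    """Merge per-chunk labels, dropping the context-poor edge of each overlap.

    In every overlapping region the first half is taken from the earlier
    chunk and the second half from the later chunk.
    """
    half = overlap // 2
    last = len(labeled_chunks) - 1
    merged = []
    for i, labels in enumerate(labeled_chunks):
        lo = 0 if i == 0 else half
        hi = len(labels) if i == last else len(labels) - (overlap - half)
        merged.extend(labels[lo:hi])
    return merged


def toy_labeler(chunk):
    """Hypothetical stand-in for a model: flags tokens at chunk edges."""
    return ["EDGE" if i in (0, len(chunk) - 1) else "OK" for i in range(len(chunk))]


if __name__ == "__main__":
    tokens = [f"w{i}" for i in range(23)]
    chunks = split_chunks(tokens, size=10, overlap=4)
    labels = merge_chunks([toy_labeler(c) for c in chunks], overlap=4)
    print(len(labels), labels.count("EDGE"))
```

With the toy labeler, the only edge-flagged positions that survive merging are the true start and end of the document; every internal chunk boundary is covered by a neighbouring chunk that saw it with more context.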
-
Security Update Labels: Establishing Economic Incentives for Security Patching of IoT Consumer Products
Authors:
Philipp Morgner,
Christoph Mai,
Nicole Koschate-Fischer,
Felix Freiling,
Zinaida Benenson
Abstract:
With the expansion of the Internet of Things (IoT), the number of security incidents due to insecure and misconfigured IoT devices is increasing. Especially on the consumer market, manufacturers focus on new features and early releases at the expense of a comprehensive security strategy. Hence, experts have started calling for regulation of the IoT consumer market, while policymakers are seeking suitable regulatory approaches. We investigate how manufacturers can be incentivized to increase sustainable security efforts for IoT products. We propose mandatory security update labels that inform consumers during buying decisions about the willingness of the manufacturer to provide security updates in the future. Mandatory means that the labels explicitly state when security updates are not guaranteed. We conducted a user study with more than 1,400 participants to assess the importance of security update labels for consumer choice by means of a conjoint analysis. The results show that the availability of security updates (until which date the updates are guaranteed) accounts for 8% to 35% of the impact on consumers' overall choice, depending on the perceived security risk of the product category. For products with a high perceived security risk, this availability is twice as important as other high-ranked product attributes. Moreover, provisioning time for security updates (how quickly the product will be patched after a vulnerability is discovered) additionally accounts for 7% to 25% of the impact on consumers' choices. The proposed labels are intuitively understood by consumers, do not require product assessments by third parties before release, and have the potential to incentivize manufacturers to provide sustainable security support.
Submitted 26 June, 2019;
originally announced June 2019.
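Percentages such as "accounts for 8% to 35% of the impact on choice" are typically read in conjoint analysis as relative attribute importances: the range of an attribute's part-worth utilities divided by the sum of the ranges over all attributes. A minimal sketch of that standard computation follows. The attribute names, levels, and part-worth values are invented purely for illustration and are not taken from the study; whether the authors used exactly this formula is also an assumption.

```python
def relative_importance(part_worths):
    """Relative importance per attribute: utility range / sum of all ranges."""
    ranges = {
        attribute: max(levels.values()) - min(levels.values())
        for attribute, levels in part_worths.items()
    }
    total = sum(ranges.values())
    if total == 0:
        raise ValueError("all attributes have zero utility range")
    return {attribute: r / total for attribute, r in ranges.items()}


# Hypothetical part-worth utilities (made-up numbers, not study results).
example_part_worths = {
    "price": {"low": 1.0, "medium": 0.0, "high": -1.0},
    "update_availability": {"none": -0.5, "2 years": 0.0, "6 years": 0.5},
    "provisioning_time": {"6 months": -0.5, "10 days": 0.5},
}

if __name__ == "__main__":
    for attribute, share in relative_importance(example_part_worths).items():
        print(f"{attribute}: {share:.0%}")
```

Because the shares are normalized over the attributes included in the study design, an importance figure is only meaningful relative to the other attributes shown to participants.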
-
Frontal Screens on Head-Mounted Displays to Increase Awareness of the HMD Users' State in Mixed Presence Collaboration
Authors:
Christian Mai,
Alexander Knittel,
Heinrich Hußmann
Abstract:
In an everyday context, e.g., a household, HMD users remain a part of the social life of the Non-HMD users co-located with them. Due to the social context, situations arise that demand interaction between the HMD and the Non-HMD user. We focus on the challenge that the Non-HMD user is not able to interpret the HMD user's state (e.g., attentiveness or the need for assistance), as the HMD covers the wearer's face. We propose a front-facing display attached to the HMD that supports collaboration by showing the state. We explore the impact of abstract and realistic visualizations for such displays on collaborative performance and social presence in a within-subject user study (N=25). We present to the Non-HMD user (1) a blank screen (baseline), (2) a textual representation of the user's state and (3) a representation that looks like the HMD is see-through. The results show positive effects for textual representation on collaborative performance and a positive effect of realistic representation on social presence. We conclude that when developing HMDs we need to take into account the social needs of everyday life to reduce the risk of social separation in a household context.
Submitted 15 May, 2019;
originally announced May 2019.
-
A Qualitative Post-Experience Method for Evaluating Changes in VR Presence Experience Over Time
Authors:
Christian Mai,
Heinrich Hußmann
Abstract:
A particular measure to evaluate a head-mounted display (HMD) based experience is the state of feeling present in virtual reality. Interruptions of a presence experience, called breaks in presence (BIPs), that appear over time need to be detected to assess and improve an application. Existing methods either fail to take these BIPs into account (questionnaires) or are complex in their application and evaluation (physiological and behavioral measures). To provide a practical approach, we propose a post-experience method in which the users reflect on their experience by drawing a line, indicating their experienced state of presence, in a paper-based drawing template. The amplitude of the drawn line represents the variation of their presence experience over time. We propose a descriptive model that describes temporal variations in the drawings by the definition of relevant points over time (e.g., putting on the HMD), phases of the experience (e.g., transition into VR) and parameters (e.g., the transition time). The descriptive model enables us to objectively evaluate user drawings and represent the course of the drawings by a defined set of parameters. An exploratory user study (N=30) showed that the drawings are very consistent, and that the method can detect all BIPs and shows good indications for representing the intensity of a BIP. With our method, practitioners and researchers can accelerate the evaluation and optimization of experiences by evaluating BIPs. The possibility to store objective parameters paves the way for automated evaluation methods and big data approaches.
Submitted 14 May, 2019;
originally announced May 2019.
-
A high quality and phonetic balanced speech corpus for Vietnamese
Authors:
Pham Ngoc Phuong,
Quoc Truong Do,
Luong Chi Mai
Abstract:
This paper presents a high-quality Vietnamese speech corpus that can be used for analyzing Vietnamese speech characteristics as well as building speech synthesis models. The corpus consists of 5400 clean-speech utterances spoken by 12 speakers, 6 male and 6 female. The corpus is designed with phonetic balance in mind so that it can be used for speech synthesis, especially speech adaptation approaches. Specifically, all speakers utter a common dataset containing 250 phonetically balanced sentences. To increase the variety of speech context, each speaker also utters another 200 non-shared, phonetically balanced sentences. The speakers are selected to cover a wide range of ages and come from different regions of the North of Vietnam. The audio is recorded in a soundproof studio room and sampled at 48 kHz, 16-bit PCM, mono channel.
Submitted 11 April, 2019;
originally announced April 2019.