
Stay Hungry, Stay Foolish: On the Extended Reading Articles Generation with LLMs

Authors: Yow-Fu Liou, Yu-Chien Tang, An-Zi Yen

Abstract: The process of creating educational materials is both time-consuming and demanding for educators. This research explores the potential of Large Language Models (LLMs) to streamline this task by automating the generation of extended reading materials and relevant course suggestions. Using the TED-Ed Dig Deeper sections as an initial exploration, we investigate how supplementary articles can be enriched with contextual knowledge and connected to additional learning resources. Our method begins by generating extended articles from video transcripts, leveraging LLMs to include historical insights, cultural examples, and illustrative anecdotes. A recommendation system employing semantic similarity ranking identifies related courses, followed by an LLM-based refinement process to enhance relevance. The final articles are tailored to seamlessly integrate these recommendations, ensuring they remain cohesive and informative. Experimental evaluations demonstrate that our model produces high-quality content and accurate course suggestions, assessed through metrics such as Hit Rate, semantic similarity, and coherence. Our experimental analysis highlights the nuanced differences between the generated and existing materials, underscoring the model's capacity to offer more engaging and accessible learning experiences. This study showcases how LLMs can bridge the gap between core content and supplementary learning, providing students with additional recommended resources while also assisting teachers in designing educational materials.
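The semantic similarity ranking and the Hit Rate metric named in this abstract can be illustrated with a small sketch. It assumes that articles and courses have already been embedded as dense vectors by some sentence-embedding model (the abstract does not specify one); the course IDs, vectors, and function names are hypothetical, and the LLM-based refinement step is not shown.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_courses(article_vec: Sequence[float],
                 course_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return course IDs sorted by descending similarity to the article embedding."""
    return sorted(course_vecs,
                  key=lambda cid: cosine(article_vec, course_vecs[cid]),
                  reverse=True)


def hit_rate_at_k(ranked_lists: List[List[str]],
                  relevant_sets: List[Set[str]], k: int) -> float:
    """Fraction of queries whose top-k recommendations contain a relevant course."""
    hits = sum(1 for ranked, relevant in zip(ranked_lists, relevant_sets)
               if set(ranked[:k]) & relevant)
    return hits / len(ranked_lists)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real sentence-embedding vectors (hypothetical).
    courses = {"bio101": [1.0, 0.0], "art200": [0.0, 1.0], "chem150": [0.7, 0.7]}
    ranked = rank_courses([0.9, 0.1], courses)
    print(ranked)
    print(hit_rate_at_k([ranked], [{"chem150"}], k=2))
```

Hit Rate@k only asks whether any relevant item appears in the top k, so it rewards recall of at least one good suggestion rather than the exact ordering.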

Submitted 21 April, 2025; originally announced April 2025.

Comments: Accepted by iRAISE@AAAI2025

arXiv:2410.12330

MAX: Masked Autoencoder for X-ray Fluorescence in Geological Investigation

Authors: An-Sheng Lee, Yu-Wen Pao, Hsuan-Tien Lin, Sofia Ya Hsuan Liou

Abstract: Pre-training foundation models has become the de facto procedure for deep learning approaches, yet its application remains limited in geological studies, where model transferability is needed to break the shackles of data scarcity. Here we target X-ray fluorescence (XRF) scanning data, a standard high-resolution measurement in extensive scientific drilling projects. We propose a scalable self-supervised learner, masked autoencoders on XRF spectra (MAX), to pre-train a foundation model covering geological records from multiple regions of the Pacific and Southern Oceans. In pre-training, we find that masking a high proportion of the input spectrum (50%) yields a nontrivial and meaningful self-supervisory task. For downstream tasks, we select the quantification of XRF spectra into two costly geochemical measurements, CaCO$_3$ and total organic carbon, due to their importance in understanding the paleo-oceanic carbon system. Our results show that MAX, requiring only one-third of the data, outperforms models without pre-training in terms of quantification accuracy. Additionally, the model's generalizability improves by more than 60% in zero-shot tests on new materials, with explainability further ensuring its robustness. Thus, our approach offers a promising pathway to overcome data scarcity in geological discovery by leveraging the self-supervised foundation model and fast-acquired XRF scanning data.
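The masked-autoencoder pre-training task described above can be sketched as follows. This is an illustrative sketch, not the MAX implementation: it assumes masking acts on individual spectral channels (MAX may instead mask groups of channels), and `random_mask` and `masked_mse` are hypothetical names. As in the general masked-autoencoder recipe, the encoder sees only the visible channels and the reconstruction loss is computed only on the masked ones.

```python
import random
from typing import List, Tuple


def random_mask(spectrum: List[float], mask_ratio: float = 0.5,
                seed: int = 0) -> Tuple[List[float], List[int], List[int]]:
    """Randomly hide a fraction of spectral channels.

    Returns (visible values fed to the encoder, visible indices, masked indices
    that the decoder must reconstruct).
    """
    rng = random.Random(seed)
    n = len(spectrum)
    n_masked = int(round(n * mask_ratio))
    order = list(range(n))
    rng.shuffle(order)                      # random permutation of channel indices
    masked = sorted(order[:n_masked])
    visible = sorted(order[n_masked:])
    return [spectrum[i] for i in visible], visible, masked


def masked_mse(pred: List[float], target: List[float], masked: List[int]) -> float:
    """Mean squared reconstruction error restricted to the masked channels."""
    if not masked:
        return 0.0
    return sum((pred[i] - target[i]) ** 2 for i in masked) / len(masked)


if __name__ == "__main__":
    spectrum = [float(i) for i in range(8)]  # toy stand-in for XRF counts
    visible, vis_idx, mask_idx = random_mask(spectrum, mask_ratio=0.5, seed=1)
    print(vis_idx, mask_idx)
```

A high mask ratio such as 50% keeps the task from being solvable by simple interpolation between neighboring channels, which is the intuition behind calling it nontrivial.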

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2303.16637

MuRAL: Multi-Scale Region-based Active Learning for Object Detection

Authors: Yi-Syuan Liou, Tsung-Han Wu, Jia-Fong Yeh, Wen-Chin Chen, Winston H. Hsu

Abstract: Obtaining a large-scale labeled object detection dataset can be costly and time-consuming, as it involves annotating images with bounding boxes and class labels. Thus, some specialized active learning methods have been proposed to reduce the cost by selecting either coarse-grained samples or fine-grained instances from unlabeled data for labeling. However, the former approaches suffer from redundant labeling, while the latter methods generally lead to training instability and sampling bias. To address these challenges, we propose a novel approach called Multi-scale Region-based Active Learning (MuRAL) for object detection. MuRAL identifies informative regions of various scales to reduce annotation costs for well-learned objects and improve training performance. The informative region score is designed to consider both the predicted confidence of instances and the distribution of each object category, enabling our method to focus more on difficult-to-detect classes. Moreover, MuRAL employs a scale-aware selection strategy that ensures diverse regions are selected from different scales for labeling and downstream finetuning, which enhances training stability. Our proposed method surpasses all existing coarse-grained and fine-grained baselines on the Cityscapes and MS COCO datasets, and demonstrates significant improvement on difficult categories.
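The abstract does not give the exact form of MuRAL's informative region score, so the sketch below uses a hypothetical stand-in: one minus the predicted confidence, divided by one plus the number of already-labeled instances of the region's category, so uncertain regions of rare classes rank highest. It then illustrates the scale-aware idea by grouping regions into scale bins and picking round-robin across bins, best-scored first, so that no single scale consumes the whole labeling budget. `Region`, the scale-bin names, and `class_freq` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    region_id: str
    scale: str          # hypothetical scale bin, e.g. "small" or "large"
    confidence: float   # detector's predicted confidence for the instance
    category: str


def region_score(r: Region, class_freq: Dict[str, int]) -> float:
    """Hypothetical informativeness: low confidence, up-weighted for rare classes."""
    return (1.0 - r.confidence) / (1 + class_freq.get(r.category, 0))


def scale_aware_select(regions: List[Region], class_freq: Dict[str, int],
                       budget: int) -> List[str]:
    """Pick regions round-robin over scale bins, highest score first within each bin."""
    by_scale: Dict[str, List[Region]] = defaultdict(list)
    for r in regions:
        by_scale[r.scale].append(r)
    for scale in by_scale:
        by_scale[scale].sort(key=lambda r: region_score(r, class_freq), reverse=True)

    selected: List[str] = []
    while len(selected) < budget and any(by_scale.values()):
        for scale in sorted(by_scale):
            if by_scale[scale] and len(selected) < budget:
                selected.append(by_scale[scale].pop(0).region_id)
    return selected


if __name__ == "__main__":
    regions = [Region("a", "small", 0.9, "car"), Region("b", "small", 0.2, "rider"),
               Region("c", "large", 0.5, "car"), Region("d", "large", 0.1, "car")]
    print(scale_aware_select(regions, {"car": 9, "rider": 1}, budget=3))
```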

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2202.06484

D2ADA: Dynamic Density-aware Active Domain Adaptation for Semantic Segmentation

Authors: Tsung-Han Wu, Yi-Syuan Liou, Shao-Ji Yuan, Hsin-Ying Lee, Tung-I Chen, Kuan-Chih Huang, Winston H. Hsu

Abstract: In the field of domain adaptation, a trade-off exists between model performance and the number of target domain annotations. Active learning, which maximizes model performance with a small number of informative labeled samples, is well suited to such a scenario. In this work, we present D2ADA, a general active domain adaptation framework for semantic segmentation. To adapt the model to the target domain with minimal queried labels, we propose acquiring labels for samples with high probability density in the target domain yet low probability density in the source domain, complementary to the existing labeled source domain data. To further improve labeling efficiency, we design a dynamic scheduling policy that adjusts the labeling budget between domain exploration and model uncertainty over time. Extensive experiments show that our method outperforms existing active learning and domain adaptation baselines on two benchmarks, GTA5 → Cityscapes and SYNTHIA → Cityscapes. With less than 5% of target domain annotations, our method achieves results comparable to those of full supervision. Our code is publicly available at https://github.com/tsunghan-wu/D2ADA.
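The density criterion described above ("high probability density in the target domain yet low probability density in the source domain") amounts to ranking candidates by a log-density gap, log p_target(x) − log p_source(x). The abstract does not say how D2ADA estimates these densities, so this minimal sketch swaps in a 1-D Gaussian kernel density estimate on scalar features; the bandwidth, feature values, and function names are assumptions, and the dynamic budget scheduling is not shown.

```python
import math
from typing import List


def gaussian_kde_logpdf(x: float, samples: List[float], bandwidth: float) -> float:
    """Log-density of x under a 1-D Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return math.log(max(norm * total, 1e-300))  # clamp avoids log(0) far from all samples


def density_gap(x: float, source: List[float], target: List[float],
                bandwidth: float = 0.5) -> float:
    """High when x is typical of the target domain but rare in the source domain."""
    return (gaussian_kde_logpdf(x, target, bandwidth)
            - gaussian_kde_logpdf(x, source, bandwidth))


def select_queries(candidates: List[float], source: List[float], target: List[float],
                   budget: int, bandwidth: float = 0.5) -> List[float]:
    """Return the `budget` candidates with the largest density gap."""
    return sorted(candidates,
                  key=lambda x: density_gap(x, source, target, bandwidth),
                  reverse=True)[:budget]


if __name__ == "__main__":
    source = [0.0, 0.1, -0.1, 0.2]   # toy source-domain features
    target = [3.0, 3.1, 2.9, 0.0]    # toy target-domain features
    print(select_queries([0.0, 3.0], source, target, budget=1))
```

Candidates near 0.0 are already well covered by labeled source data, so the gap favors the target-only cluster near 3.0; this is the "complementary to the existing source domain labeled data" intuition.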

Submitted 18 July, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: Accepted by ECCV 2022. The code is available at https://github.com/tsunghan-wu/D2ADA

arXiv:2112.01348

3rd Place Solution for NeurIPS 2021 Shifts Challenge: Vehicle Motion Prediction

Authors: Ching-Yu Tseng, Po-Shao Lin, Yu-Jia Liou, Kuan-Chih Huang, Winston H. Hsu

Abstract: Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift is a competition held at NeurIPS 2021. The objective of this competition is to find methods that solve the motion prediction problem across domains. In real-world datasets, there is variance between the input data distribution and the ground-truth data distribution, which is called the domain shift problem. In this report, we propose a new architecture inspired by state-of-the-art papers. The main contributions are the backbone architecture with a self-attention mechanism and the predominant loss function. As a result, we won 3rd place, as shown on the leaderboard.
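The report's backbone is not detailed in the abstract, so the following is only a generic single-head scaled dot-product self-attention, softmax(Q Kᵀ / sqrt(d_k)) V, of the kind such backbones typically build on. Each input row stands for a feature vector (for example, one per agent or per timestep), and the projection matrices `wq`, `wk`, `wv` are hypothetical placeholders for learned weights.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b using lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]                  # transpose of K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]            # each row sums to 1
    return matmul(weights, v)                             # weighted mix of value rows


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(self_attention(tokens, identity, identity, identity))
```

Because every output row is a convex combination of the value rows, each token's representation is updated with context from all other tokens in a single layer.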

Submitted 2 December, 2021; originally announced December 2021.

Journal ref: Bayesian Deep Learning Workshop, NeurIPS 2021

arXiv:2110.05221

Multi-Task Learning for Situated Multi-Domain End-to-End Dialogue Systems

Authors: Po-Nien Kung, Chung-Cheng Chang, Tse-Hsuan Yang, Hsin-Kai Hsu, Yu-Jia Liou, Yun-Nung Chen

Abstract: Task-oriented dialogue systems have been a promising area in the NLP field. Previous work showed the effectiveness of using a single GPT-2 based model to predict belief states and responses via causal language modeling. In this paper, we leverage multi-task learning techniques to train a GPT-2 based model on a more challenging dataset with multiple domains, multiple modalities, and more diversity in output formats. Using only a single model, our method achieves better performance on all sub-tasks, across domains, compared to task and domain-specific models. Furthermore, we evaluated several proposed strategies for GPT-2 based dialogue systems with comprehensive ablation studies, showing that all techniques can further improve the performance.

Submitted 11 October, 2021; originally announced October 2021.

arXiv:2109.03551

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion

Authors: Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Abstract: Voice conversion (VC) is an effective approach to electrolaryngeal (EL) speech enhancement, a task that aims to improve the quality of the artificial voice from an electrolarynx device. In frame-based VC methods, time alignment needs to be performed prior to model training, and the dynamic time warping (DTW) algorithm is widely adopted to compute the best time alignment between each utterance pair. Its validity rests on the assumption that the same phonemes from the two speakers have similar features and can be mapped by measuring a pre-defined distance between speech frames of the source and the target. However, the special characteristics of EL speech can break this assumption, resulting in a sub-optimal DTW alignment. In this work, we propose to use lip images for time alignment, as we assume that the lip movements of laryngectomees remain normal compared to those of healthy people. We investigate two naive lip representations and distance metrics, and experimental results demonstrate that the proposed method can significantly outperform audio-only alignment in terms of objective and subjective evaluations.
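Below is a minimal sketch of classical DTW with backtracking. Under the approach described above, `src` and `tgt` would hold per-frame lip-image features instead of acoustic features; the feature vectors, the Euclidean frame distance, and the function names are assumptions, since the abstract does not fix a particular representation or metric.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Frame-to-frame distance; a lip-feature metric could be swapped in here."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(src: List[Frame], tgt: List[Frame],
        dist: Callable[[Frame, Frame], float] = euclidean
        ) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the minimal cumulative distance and the aligned (src, tgt) index pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(src[i - 1], tgt[j - 1])
            # Allowed steps: match (diagonal), repeat a target frame, repeat a source frame.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, preferring the diagonal step on ties.
    i, j = n, m
    path: List[Tuple[int, int]] = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    total, path = dtw([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0]])
    print(total, path)
```

The alignment is only as good as `dist`: if frames of the same phoneme are not close under the chosen features, the minimum-cost path drifts, which is exactly the failure mode the lip-based features are meant to avoid.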

Submitted 8 September, 2021; originally announced September 2021.

Comments: Accepted to APSIPA ASC 2021

arXiv:1902.08524

doi 10.1063/1.5093299

Acoustically modulated optical emission of hexagonal boron nitride layers

Authors: F. Iikawa, A. Hernández-Mínguez, I. Aharonovich, S. Nakhaie, Y. -T. Liou, J. M. J. Lopes, P. V. Santos

Abstract: We investigate the effect of surface acoustic waves on the atomic-like optical emission from defect centers in hexagonal boron nitride layers deposited on the surface of a LiNbO$_3$ substrate. The dynamic strain field of the surface acoustic waves modulates the emission lines, resulting in intensity variations as large as 50% and oscillations of the emission energy with an amplitude of almost 1 meV. From a systematic study of the dependence of the modulation on the acoustic wave power, we determine a hydrostatic deformation potential for defect centers in this two-dimensional material of about 40 meV/%. Furthermore, we show that the dynamic piezoelectric field of the acoustic wave could contribute to the stabilization of the optical properties of these centers. Our results show that surface acoustic waves are a powerful tool to modulate and control the electronic states of two-dimensional materials.
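As a rough consistency check on the numbers in this abstract, assume the energy oscillation stems entirely from hydrostatic strain with a linear response $\Delta E = a_h \varepsilon_h$ (the symbols $a_h$ for the deformation potential and $\varepsilon_h$ for the hydrostatic strain are introduced here for illustration). The quoted values then imply a strain amplitude of order $10^{-4}$:

```latex
\Delta E = a_h\,\varepsilon_h
\quad\Longrightarrow\quad
\varepsilon_h \approx \frac{\Delta E}{a_h}
\approx \frac{1\ \text{meV}}{40\ \text{meV}/\%}
\approx 0.025\,\% = 2.5\times 10^{-4}
```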

Submitted 16 August, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

Journal ref: Appl. Phys. Lett. 114, 171104 (2019)

arXiv:1811.00040

doi 10.1088/1361-6463/aad593

Interaction of surface acoustic waves with electronic excitations in graphene

Authors: A. Hernández-Mínguez, Y. -T. Liou, P. V. Santos

Abstract: This article reviews the main theoretical and experimental advances regarding the interaction between surface acoustic waves (SAWs) and electronic excitations in graphene. The coupling of the graphene electron gas to the SAW piezoelectric field can modify the propagation properties of the SAW, and even amplify the intensity of SAWs traveling along the graphene layer. Conversely, the periodic electric and strain fields of the SAW can be used to modify the graphene Dirac cone and to couple light into graphene plasmons. Finally, SAWs can generate acousto-electric currents in graphene. These increase linearly with the SAW frequency and power but, in contrast to conventional currents, they depend non-monotonically on the graphene electric conductivity. Most of these functionalities have been reported in graphene transferred to the surface of strong piezoelectric insulators. The recent observation of acousto-electric currents in epitaxial graphene on SiC opens the way to the large-scale fabrication of graphene-based acousto-electric devices patterned directly on a semi-insulating wafer.
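The trends summarized above can be illustrated with the classical relaxation model commonly used for SAW interaction with a two-dimensional electron gas. In that model the SAW attenuation is Γ = k (K²/2) (σ/σ_M) / (1 + (σ/σ_M)²) with wavenumber k = 2πf/v, and the acousto-electric current density is taken here to follow the Weinreich-type relation |j| = μ I Γ / v. The sketch below simply evaluates these expressions; the parameter names and numerical values are illustrative assumptions, and sign conventions (which depend on carrier type) are ignored.

```python
import math


def relaxation_attenuation(sigma: float, sigma_m: float, k_eff2: float,
                           freq: float, v_saw: float) -> float:
    """SAW attenuation Γ = k (K²/2) x / (1 + x²), with x = σ/σ_M and k = 2πf/v."""
    k = 2.0 * math.pi * freq / v_saw
    x = sigma / sigma_m
    return k * (k_eff2 / 2.0) * x / (1.0 + x * x)


def acoustoelectric_current_density(mobility: float, intensity: float, sigma: float,
                                    sigma_m: float, k_eff2: float,
                                    freq: float, v_saw: float) -> float:
    """Magnitude |j| = μ I Γ / v (sign depends on carrier type and is ignored)."""
    gamma = relaxation_attenuation(sigma, sigma_m, k_eff2, freq, v_saw)
    return mobility * intensity * gamma / v_saw


if __name__ == "__main__":
    # Hypothetical parameters: σ_M normalized to 1, K² = 0.05, f = 1 GHz, v = 3 km/s.
    for ratio in (0.1, 0.5, 1.0, 2.0, 10.0):
        print(ratio, relaxation_attenuation(ratio, 1.0, 0.05, 1e9, 3e3))
```

The factor x / (1 + x²) peaks at σ = σ_M and falls off on both sides, which is why the current is non-monotonic in conductivity, while Γ ∝ f and j ∝ I give the linear dependence on frequency and power.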

Submitted 31 October, 2018; originally announced November 2018.

Journal ref: J. Phys. D: Appl. Phys. 51, 383001 (2018)

arXiv:1802.07934

Adversarial Learning for Semi-Supervised Semantic Segmentation

Authors: Wei-Chih Hung, Yi-Hsuan Tsai, Yan-Ting Liou, Yen-Yu Lin, Ming-Hsuan Yang

Abstract: We propose a method for semi-supervised semantic segmentation using an adversarial network. While most existing discriminators are trained to classify input images as real or fake on the image level, we design a discriminator in a fully convolutional manner to differentiate the predicted probability maps from the ground truth segmentation distribution, taking spatial resolution into account. We show that the proposed discriminator can be used to improve semantic segmentation accuracy by coupling the adversarial loss with the standard cross-entropy loss of the proposed model. In addition, the fully convolutional discriminator enables semi-supervised learning by discovering trustworthy regions in the predicted results of unlabeled images, thereby providing additional supervisory signals. In contrast to existing methods that utilize weakly labeled images, our method leverages unlabeled images to enhance the segmentation model. Experimental results on the PASCAL VOC 2012 and Cityscapes datasets demonstrate the effectiveness of the proposed algorithm.
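One way the "trustworthy regions" of an unlabeled image can be turned into extra supervision is sketched below: the fully convolutional discriminator yields a per-pixel confidence map, pixels above a threshold receive the segmentation network's argmax class as a self-taught target, and all other pixels are ignored by the loss. This is a minimal sketch under stated assumptions; the threshold value, the `IGNORE` index, and the function name are hypothetical, and plain nested lists stand in for tensors.

```python
from typing import List

IGNORE = 255  # hypothetical ignore index for pixels excluded from the loss


def self_taught_targets(prob_maps: List[List[List[float]]],
                        confidence: List[List[float]],
                        threshold: float) -> List[List[int]]:
    """Build per-pixel pseudo-labels from confident regions only.

    prob_maps[c][y][x]: segmentation probability of class c at pixel (y, x).
    confidence[y][x]:   discriminator output, i.e. how much the prediction at
                        (y, x) resembles the ground-truth label distribution.
    """
    n_classes = len(prob_maps)
    height, width = len(confidence), len(confidence[0])
    targets: List[List[int]] = []
    for y in range(height):
        row: List[int] = []
        for x in range(width):
            if confidence[y][x] > threshold:
                # Trustworthy pixel: use the predicted (argmax) class as the target.
                row.append(max(range(n_classes), key=lambda c: prob_maps[c][y][x]))
            else:
                row.append(IGNORE)
        targets.append(row)
    return targets


if __name__ == "__main__":
    probs = [[[0.9, 0.2]], [[0.1, 0.8]]]   # 2 classes, 1x2 image
    conf = [[0.6, 0.1]]
    print(self_taught_targets(probs, conf, threshold=0.2))
```

Masking out low-confidence pixels keeps the network from reinforcing its own mistakes, at the cost of using only part of each unlabeled image.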

Submitted 24 July, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

Comments: Accepted in BMVC 2018. Code and models available at https://github.com/hfslyc/AdvSemiSeg

arXiv:1708.05236

doi 10.1088/1361-6463/aa8e8a

Acousto-electric transport in MgO/ZnO-covered graphene on SiC

Authors: Yi-Ting Liou, Alberto Hernández-Mínguez, Jens Herfort, João Marcelo J. Lopes, Abbes Tahraoui, Paulo V. Santos

Abstract: We investigate the acousto-electric transport induced by surface acoustic waves (SAWs) in epitaxial graphene (EG) coated by a MgO/ZnO film. The deposition of a thin MgO layer protects the EG during the sputtering of a piezoelectric ZnO film for the efficient generation of SAWs. We demonstrate by Raman and electrical measurements that the coating does not harm the structural and electronic properties of the EG. We report the generation of two SAW modes with frequencies around 2 GHz. For both modes, we measure acousto-electric currents in EG devices placed in the SAW propagation path. The currents increase linearly with the SAW power, reaching values almost two orders of magnitude higher than in previous reports of acousto-electric transport in EG on SiC. Our results agree with the predictions of the classical relaxation model of the interaction between SAWs and a two-dimensional electron gas.

Submitted 22 May, 2018; v1 submitted 17 August, 2017; originally announced August 2017.

Journal ref: J. Phys. D: Appl. Phys. 50 (2017) 464008

arXiv:1506.02327

A Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN) for Unsupervised Discovery of Linguistic Units and Generation of High Quality Features

Authors: Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu, Yuan-ming Liou, Yen-Chen Wu, Yen-Ju Lu, Hung-yi Lee, Lin-shan Lee

Abstract: This paper summarizes the work done by the authors for the Zero Resource Speech Challenge organized in the technical program of Interspeech 2015. The goal of the challenge is to discover linguistic units directly from unlabeled speech data. The Multi-layered Acoustic Tokenizer (MAT) proposed in this work automatically discovers multiple sets of acoustic tokens from the given corpus. Each acoustic token set is specified by a set of hyperparameters that describe the model configuration. These sets of acoustic tokens capture different characteristics of the given corpus and the underlying language, and thus can mutually reinforce one another. The multiple sets of token labels are then used as the targets of a Multi-target DNN (MDNN) trained on low-level acoustic features. Bottleneck features extracted from the MDNN are used as feedback for the MAT and the MDNN itself. We call this iterative system the Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN), which generates high-quality features for track 1 of the challenge and acoustic tokens for track 2 of the challenge.

Submitted 7 June, 2015; originally announced June 2015.

Comments: submitted to Interspeech 2015

Showing 1–12 of 12 results for author: Liou, Y