Showing 1–22 of 22 results for author: Hsiao, W

Searching in archive cs.
  1. arXiv:2408.06053  [pdf, other]

    cs.SD eess.AS

    PyNeuralFx: A Python Package for Neural Audio Effect Modeling

    Authors: Yen-Tung Yeh, Wen-Yi Hsiao, Yi-Hsuan Yang

    Abstract: We present PyNeuralFx, an open-source Python toolkit designed for research on neural audio effect modeling. The toolkit provides an intuitive framework and offers a comprehensive suite of features, including standardized implementation of well-established model architectures, loss functions, and easy-to-use visualization tools. As such, it helps promote reproducibility for research on neural audio…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: toolkit paper

  2. arXiv:2408.04829  [pdf, other]

    cs.SD eess.AS

    Hyper Recurrent Neural Network: Condition Mechanisms for Black-box Audio Effect Modeling

    Authors: Yen-Tung Yeh, Wen-Yi Hsiao, Yi-Hsuan Yang

    Abstract: Recurrent neural networks (RNNs) have demonstrated impressive results for virtual analog modeling of audio effects. These networks process time-domain audio signals using a series of matrix multiplications and nonlinear activation functions to emulate the behavior of the target device accurately. To additionally model the effect of the knobs for an RNN-based model, existing approaches integrate con…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted to DAFx24

  3. arXiv:2407.15060  [pdf, other]

    cs.SD cs.AI eess.AS

    MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

    Authors: Yun-Han Lan, Wen-Yi Hsiao, Hao-Chung Cheng, Yi-Hsuan Yang

    Abstract: Existing text-to-music models can produce high-quality audio with great diversity. However, textual prompts alone cannot precisely control temporal musical features such as chords and rhythm of the generated music. To address this challenge, we introduce MusiConGen, a temporally-conditioned Transformer-based text-to-music model that builds upon the pretrained MusicGen framework. Our innovation lie…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by the 25th International Society for Music Information Retrieval (ISMIR)

  4. arXiv:2208.05476  [pdf, other]

    cs.CR cs.AI

    Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network

    Authors: S. W. Hsiao, P. Y. Chu

    Abstract: Malicious software (malware) causes much harm to our devices and lives. We are eager to understand malware behavior and the threat it poses. Most malware record files are variable-length, text-based files with time stamps, such as event log data and dynamic analysis profiles. Using the time stamps, we can sort such data into sequence-based data for the following analysis. However, d…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: 13 pages

  5. arXiv:2208.04756  [pdf, other]

    cs.SD eess.AS

    DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation

    Authors: Da-Yi Wu, Wen-Yi Hsiao, Fu-Rong Yang, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang

    Abstract: A vocoder is a conditional audio generation model that converts acoustic features such as mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal Processing (DDSP), we propose a new vocoder named SawSing for singing voices. SawSing synthesizes the harmonic part of singing voices by filtering a sawtooth source signal with a linear time-variant finite impulse response…

    Submitted 18 August, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Accepted at ISMIR 2022

    Journal ref: International Society for Music Information Retrieval (ISMIR) 2022

  6. arXiv:2202.09907  [pdf, other]

    cs.SD eess.AS

    Towards Automatic Transcription of Polyphonic Electric Guitar Music: A New Dataset and a Multi-Loss Transformer Model

    Authors: Yu-Hua Chen, Wen-Yi Hsiao, Tsu-Kuang Hsieh, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: In this paper, we propose a new dataset named EGDB that contains transcriptions of the electric guitar performance of 240 tablatures rendered with different tones. Moreover, we benchmark the performance of two well-known transcription models proposed originally for the piano on this dataset, along with a multi-loss Transformer model that we newly propose. Our evaluation on this dataset and a se…

    Submitted 20 February, 2022; originally announced February 2022.

    Comments: To be published at ICASSP 2022

  7. arXiv:2106.08703  [pdf, other]

    cs.SD cs.LG eess.AS

    Source Separation-based Data Augmentation for Improved Joint Beat and Downbeat Tracking

    Authors: Ching-Yu Chiu, Joann Ching, Wen-Yi Hsiao, Yu-Hua Chen, Alvin Wen-Yu Su, Yi-Hsuan Yang

    Abstract: Due to advances in deep learning, the performance of automatic beat and downbeat tracking in musical audio signals has seen great improvement in recent years. In training such deep learning based models, data augmentation has been found to be an important technique. However, existing data augmentation methods for this task mainly target balancing the distribution of the training data with respect to…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to European Signal Processing Conference (EUSIPCO 2021)

  8. arXiv:2102.10013  [pdf, other]

    cs.GR cs.CG

    Curvy: An Interactive Design Tool for Varying Density Support Structures

    Authors: Erva Ulu, Nurcan Gecer Ulu, Jiahao Li, Walter Hsiao

    Abstract: We introduce Curvy, an interactive design tool to generate varying density support structures for 3D printing. Support structures are essential for printing models with extreme overhangs. Yet, they often cause defects on contact areas, resulting in poor surface quality. Low-level design of support structures may alleviate such negative effects. However, it is tedious and unintuitive for novice user…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: Submitted to Computer Graphics Forum

  9. arXiv:2102.01690  [pdf, other]

    cs.CV

    From Culture to Clothing: Discovering the World Events Behind A Century of Fashion Images

    Authors: Wei-Lin Hsiao, Kristen Grauman

    Abstract: Fashion is intertwined with external cultural factors, but identifying these links remains a manual process limited to only the most salient phenomena. We propose a data-driven approach to identify specific cultural factors affecting the clothes people wear. Using large-scale datasets of news articles and vintage photos spanning a century, we present a multi-modal statistical model to detect influ…

    Submitted 20 September, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: Accepted to ICCV 2021

  10. arXiv:2101.02402  [pdf, other]

    cs.SD cs.AI eess.AS

    Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

    Authors: Wen-Yi Hsiao, Jen-Yu Liu, Yin-Cheng Yeh, Yi-Hsuan Yang

    Abstract: To apply neural sequence models such as the Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite set of pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note's pitch, duration, velocity (dynamics), and placement (…

    Submitted 7 January, 2021; originally announced January 2021.

  11. arXiv:2008.02480  [pdf, other]

    eess.AS cs.LG cs.SD

    Mixing-Specific Data Augmentation Techniques for Improved Blind Violin/Piano Source Separation

    Authors: Ching-Yu Chiu, Wen-Yi Hsiao, Yin-Cheng Yeh, Yi-Hsuan Yang, Alvin Wen-Yu Su

    Abstract: Blind music source separation has been a popular and active subject of research in both the music information retrieval and signal processing communities. To counter the lack of available multi-track data for supervised model training, a data augmentation method that creates artificial mixtures by combining tracks from different songs has been shown useful in recent works. Following this light, we…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: Accepted to IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP 2020)

  12. arXiv:2008.01431  [pdf, other]

    cs.SD eess.AS

    Automatic Composition of Guitar Tabs by Transformers and Groove Modeling

    Authors: Yu-Hua Chen, Yu-Hsiang Huang, Wen-Yi Hsiao, Yi-Hsuan Yang

    Abstract: Deep learning algorithms are increasingly developed for learning to compose music in the form of MIDI files. However, whether such algorithms work well for composing guitar tabs, which are quite different from MIDIs, remains relatively unexplored. To address this, we build a model for composing fingerstyle guitar tabs with Transformer-XL, a neural sequence model architecture. With this model, we in…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted at Proc. Int. Society for Music Information Retrieval Conf. 2020

  13. Learning Patterns of Tourist Movement and Photography from Geotagged Photos at Archaeological Heritage Sites in Cuzco, Peru

    Authors: Nicole D. Payntar, Wei-Lin Hsiao, R. Alan Covey, Kristen Grauman

    Abstract: The popularity of media sharing platforms in recent decades has provided an abundance of open source data that remains underutilized by heritage scholars. By pairing geotagged internet photographs with machine learning and computer vision algorithms, we build upon the current theoretical discourse of anthropology associated with visuality and heritage tourism to identify travel patterns across a k…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: Accepted to Tourism Management

  14. arXiv:2002.00722  [pdf]

    eess.SP cs.IT

    Multipath Division Multiple Access for 5G Millimeter Wave Cellular Systems

    Authors: Shin-Yuan Wang, Wei-Han Hsiao, Kang-Lun Chiu, Chia-Chi Huang

    Abstract: Future 5G communication systems require more demanding performance than the existing cellular communication systems, e.g., 10 to 100 Mbps user data rate and much larger cellular spectrum efficiency. Widely used multiple access methods like CDMA and OFDMA can hardly achieve these challenging requirements simultaneously even with advanced signal processing techniques and base station cooperation…

    Submitted 3 February, 2020; originally announced February 2020.

    Comments: 6 pages, 5 figures, and 1 table

  15. arXiv:2001.02360  [pdf, other]

    cs.SD cs.LG eess.AS

    Automatic Melody Harmonization with Triad Chords: A Comparative Study

    Authors: Yin-Cheng Yeh, Wen-Yi Hsiao, Satoru Fukayama, Tetsuro Kitahara, Benjamin Genchel, Hao-Min Liu, Hao-Wen Dong, Yian Chen, Terence Leong, Yi-Hsuan Yang

    Abstract: Several prior works have proposed various methods for the task of automatic melody harmonization, in which a model aims to generate a sequence of chords to serve as the harmonic accompaniment of a given multiple-bar melody sequence. In this paper, we present a comparative study evaluating and comparing the performance of a set of canonical approaches to this task, including a template matching bas…

    Submitted 27 April, 2021; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: 20 pages, 6 figures, published in Journal of New Music Research (JNMR), Volume 50 Issue 1

  16. arXiv:1912.06697  [pdf, other]

    cs.CV

    ViBE: Dressing for Diverse Body Shapes

    Authors: Wei-Lin Hsiao, Kristen Grauman

    Abstract: Body shape plays an important role in determining what garments will best suit a given person, yet today's clothing recommendation methods take a "one shape fits all" approach. These body-agnostic vision methods and datasets are a barrier to inclusion, ill-equipped to provide good suggestions for diverse body shapes. We introduce ViBE, a VIsual Body-aware Embedding that captures clothing's affinit…

    Submitted 28 March, 2020; v1 submitted 13 December, 2019; originally announced December 2019.

    Comments: Accepted to CVPR 2020

  17. arXiv:1909.11620  [pdf, other]

    cs.GR cs.CG

    Manufacturability Oriented Model Correction and Build Direction Optimization for Additive Manufacturing

    Authors: Erva Ulu, Nurcan Gecer Ulu, Walter Hsiao, Saigopal Nelaturi

    Abstract: We introduce a method to analyze and modify a shape to make it manufacturable for a given additive manufacturing (AM) process. Different AM technologies, process parameters or materials introduce geometric constraints on what is manufacturable or not. Given an input 3D model and minimum printable feature size dictated by the manufacturing process characteristics and parameters, our algorithm gener…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: Accepted to Journal of Mechanical Design

  18. arXiv:1904.09261  [pdf, other]

    cs.CV

    Fashion++: Minimal Edits for Outfit Improvement

    Authors: Wei-Lin Hsiao, Isay Katsman, Chao-Yuan Wu, Devi Parikh, Kristen Grauman

    Abstract: Given an outfit, what small changes would most improve its fashionability? This question presents an intriguing new vision challenge. We introduce Fashion++, an approach that proposes minimal adjustments to a full-body clothing outfit that will have maximal impact on its fashionability. Our model consists of a deep image generation neural network that learns to synthesize clothing conditioned on l…

    Submitted 2 September, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

    Comments: Accepted to ICCV 2019

  19. arXiv:1712.02662  [pdf, other]

    cs.CV

    Creating Capsule Wardrobes from Fashion Images

    Authors: Wei-Lin Hsiao, Kristen Grauman

    Abstract: We propose to automatically create capsule wardrobes. Given an inventory of candidate garments and accessories, the algorithm must assemble a minimal set of items that provides maximal mix-and-match outfits. We pose the task as a subset selection problem. To permit efficient subset selection over the space of all outfit combinations, we develop submodular objective functions capturing the key ingr…

    Submitted 14 April, 2018; v1 submitted 7 December, 2017; originally announced December 2017.

    Comments: Accepted to CVPR 2018

  20. arXiv:1709.06298  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD stat.ML

    MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

    Authors: Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang

    Abstract: Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, an…

    Submitted 24 November, 2017; v1 submitted 19 September, 2017; originally announced September 2017.

    Comments: To appear at AAAI 2018

  21. arXiv:1707.03376  [pdf, other]

    cs.CV

    Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images

    Authors: Wei-Lin Hsiao, Kristen Grauman

    Abstract: What defines a visual style? Fashion styles emerge organically from how people assemble outfits of clothing, making them difficult to pin down with a computational model. Low-level visual similarity can be too specific to detect stylistically similar images, while manually crafted style categories can be too abstract to capture subtle style differences. We propose an unsupervised approach to learn…

    Submitted 3 August, 2017; v1 submitted 11 July, 2017; originally announced July 2017.

  22. arXiv:1605.01987  [pdf]

    cs.NI

    TCPTuner: Congestion Control Your Way

    Authors: Kevin Miller, Luke W. Hsiao

    Abstract: TCPTuner is a TCP (transmission control protocol) congestion control kernel module and GUI (graphical user interface) for Linux that allows real-time modification of the congestion control parameters of TCP CUBIC, the current default algorithm in Linux. Specifically, the tool provides access to alpha, the rate at which a sender's congestion window grows; beta, the multiplicative factor to decrease…

    Submitted 6 May, 2016; originally announced May 2016.

    Comments: 6 pages, 9 figures