-
PyNeuralFx: A Python Package for Neural Audio Effect Modeling
Authors:
Yen-Tung Yeh,
Wen-Yi Hsiao,
Yi-Hsuan Yang
Abstract:
We present PyNeuralFx, an open-source Python toolkit designed for research on neural audio effect modeling. The toolkit provides an intuitive framework and offers a comprehensive suite of features, including standardized implementation of well-established model architectures, loss functions, and easy-to-use visualization tools. As such, it helps promote reproducibility for research on neural audio effect modeling, and enable in-depth performance comparison of different models, offering insight into the behavior and operational characteristics of models through DSP methodology. The toolkit can be found at https://github.com/ytsrt66589/pyneuralfx.
Submitted 12 August, 2024;
originally announced August 2024.
-
Hyper Recurrent Neural Network: Condition Mechanisms for Black-box Audio Effect Modeling
Authors:
Yen-Tung Yeh,
Wen-Yi Hsiao,
Yi-Hsuan Yang
Abstract:
Recurrent neural networks (RNNs) have demonstrated impressive results for virtual analog modeling of audio effects. These networks process time-domain audio signals using a series of matrix multiplication and nonlinear activation functions to emulate the behavior of the target device accurately. To additionally model the effect of the knobs for an RNN-based model, existing approaches integrate control parameters by concatenating them channel-wisely with some intermediate representation of the input signal. While this method is parameter-efficient, there is room to further improve the quality of generated audio because the concatenation-based conditioning method has limited capacity in modulating signals. In this paper, we propose three novel conditioning mechanisms for RNNs, tailored for black-box virtual analog modeling. These advanced conditioning mechanisms modulate the model based on control parameters, yielding superior results to existing RNN- and CNN-based architectures across various evaluation metrics.
Submitted 8 August, 2024;
originally announced August 2024.
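The concatenation-based conditioning that the abstract above describes as the existing baseline can be sketched as follows. This is a minimal NumPy illustration and not the authors' code; the function name `concat_condition` and the array shapes are assumptions made for the example.

```python
import numpy as np

def concat_condition(features, controls):
    """Append control parameters to every time step of a feature sequence.

    features: (time, channels) intermediate representation of the input audio.
    controls: (n_controls,) knob settings, assumed constant over the clip.
    Returns an array of shape (time, channels + n_controls).
    """
    features = np.asarray(features, dtype=float)
    controls = np.asarray(controls, dtype=float)
    # Repeat the knob values along the time axis so each step sees them.
    tiled = np.tile(controls, (features.shape[0], 1))
    # Channel-wise concatenation: the RNN input grows by n_controls channels.
    return np.concatenate([features, tiled], axis=1)

# Hypothetical example: 4 time steps, 3 feature channels, 2 knobs.
x = np.zeros((4, 3))
knobs = np.array([0.5, 1.0])
conditioned = concat_condition(x, knobs)
```

Because the knob values only enter as extra input channels, they can shift the network's activations but cannot rescale its weights, which is one way to read the abstract's remark about limited modulation capacity.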
-
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
Authors:
Yun-Han Lan,
Wen-Yi Hsiao,
Hao-Chung Cheng,
Yi-Hsuan Yang
Abstract:
Existing text-to-music models can produce high-quality audio with great diversity. However, textual prompts alone cannot precisely control temporal musical features such as chords and rhythm of the generated music. To address this challenge, we introduce MusiConGen, a temporally-conditioned Transformer-based text-to-music model that builds upon the pretrained MusicGen framework. Our innovation lies in an efficient finetuning mechanism, tailored for consumer-grade GPUs, that integrates automatically-extracted rhythm and chords as the condition signal. During inference, the condition can either be musical features extracted from a reference audio signal, or be user-defined symbolic chord sequence, BPM, and textual prompts. Our performance evaluation on two datasets -- one derived from extracted features and the other from user-created inputs -- demonstrates that MusiConGen can generate realistic backing track music that aligns well with the specified conditions. We open-source the code and model checkpoints, and provide audio examples online, https://musicongen.github.io/musicongen_demo/.
Submitted 21 July, 2024;
originally announced July 2024.
-
Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network
Authors:
S. W. Hsiao,
P. Y. Chu
Abstract:
Malicious software (malware) causes much harm to our devices and lives, so we are eager to understand malware behavior and the threats it poses. Most malware record files are variable-length, text-based files with time stamps, such as event log data and dynamic analysis profiles. Using the time stamps, we can sort such data into sequences for subsequent analysis. However, dealing with text-based sequences of variable length is difficult. In addition, unlike natural language text data, most sequential data in information security have specific properties and structures, such as loops, repeated calls, and noise. To deeply analyze the API call sequences and their structure, we use graphs to represent the sequences, which supports further investigation of their information and structure, such as with the Markov model. Therefore, we design and implement an Attention Aware Graph Neural Network (AWGCN) to analyze the API call sequences. Through AWGCN, we can obtain sequence embeddings to analyze the behavior of the malware. Moreover, the classification experiments show that AWGCN outperforms other classifiers on the call-like datasets, and that the embeddings can further improve the performance of classic models.
Submitted 10 August, 2022;
originally announced August 2022.
-
DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation
Authors:
Da-Yi Wu,
Wen-Yi Hsiao,
Fu-Rong Yang,
Oscar Friedman,
Warren Jackson,
Scott Bruzenak,
Yi-Wen Liu,
Yi-Hsuan Yang
Abstract:
A vocoder is a conditional audio generation model that converts acoustic features such as mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal Processing (DDSP), we propose a new vocoder named SawSing for singing voices. SawSing synthesizes the harmonic part of singing voices by filtering a sawtooth source signal with a linear time-variant finite impulse response filter whose coefficients are estimated from the input mel-spectrogram by a neural network. As this approach enforces phase continuity, SawSing can generate singing voices without the phase-discontinuity glitch of many existing vocoders. Moreover, the source-filter assumption provides an inductive bias that allows SawSing to be trained on a small amount of data. Our experiments show that SawSing converges much faster and outperforms state-of-the-art generative adversarial network and diffusion-based vocoders in a resource-limited scenario with only 3 training recordings and a 3-hour training time.
Submitted 18 August, 2022; v1 submitted 9 August, 2022;
originally announced August 2022.
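The source-filter idea in the SawSing abstract above (a sawtooth source shaped by a time-varying FIR filter) can be illustrated with a small NumPy sketch. It is not the SawSing implementation: the function names, the frame-wise overlap-add filtering, and the hand-written filter taps are all assumptions, whereas in the paper the coefficients are estimated from the mel-spectrogram by a neural network. The naive sawtooth here is also not band-limited.

```python
import numpy as np

def sawtooth_source(f0, sample_rate):
    """Sawtooth in [-1, 1) whose phase is accumulated sample by sample."""
    f0 = np.asarray(f0, dtype=float)
    # Running phase in cycles; accumulating it keeps the phase continuous
    # even when f0 changes from sample to sample.
    phase = np.cumsum(f0 / sample_rate)
    return 2.0 * (phase % 1.0) - 1.0

def time_varying_fir(signal, frame_coeffs, hop):
    """Filter each hop-sized frame with its own FIR taps, then overlap-add."""
    signal = np.asarray(signal, dtype=float)
    frame_coeffs = np.asarray(frame_coeffs, dtype=float)
    n_taps = frame_coeffs.shape[1]
    out = np.zeros(len(signal) + n_taps - 1)
    for i, taps in enumerate(frame_coeffs):
        frame = signal[i * hop:(i + 1) * hop]
        if frame.size == 0:
            break
        start = i * hop
        # The convolution tail spills into the next frame and is summed there.
        out[start:start + frame.size + n_taps - 1] += np.convolve(frame, taps)
    return out[:len(signal)]

# Hypothetical settings: 0.1 s of a 220 Hz tone at 16 kHz, 10 frames of 160 samples.
sample_rate, hop = 16000, 160
source = sawtooth_source(np.full(1600, 220.0), sample_rate)
identity_taps = np.tile([1.0, 0.0, 0.0], (10, 1))  # pass-through filter per frame
filtered = time_varying_fir(source, identity_taps, hop)
```

With pass-through taps the output equals the source, which is a convenient sanity check before plugging in learned coefficients.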
-
Towards Automatic Transcription of Polyphonic Electric Guitar Music: A New Dataset and a Multi-Loss Transformer Model
Authors:
Yu-Hua Chen,
Wen-Yi Hsiao,
Tsu-Kuang Hsieh,
Jyh-Shing Roger Jang,
Yi-Hsuan Yang
Abstract:
In this paper, we propose a new dataset named EGDB, that contains transcriptions of the electric guitar performance of 240 tablatures rendered with different tones. Moreover, we benchmark the performance of two well-known transcription models proposed originally for the piano on this dataset, along with a multi-loss Transformer model that we newly propose. Our evaluation on this dataset and a separate set of real-world recordings demonstrate the influence of timbre on the accuracy of guitar sheet transcription, the potential of using multiple losses for Transformers, as well as the room for further improvement for this task.
Submitted 20 February, 2022;
originally announced February 2022.
-
Source Separation-based Data Augmentation for Improved Joint Beat and Downbeat Tracking
Authors:
Ching-Yu Chiu,
Joann Ching,
Wen-Yi Hsiao,
Yu-Hua Chen,
Alvin Wen-Yu Su,
Yi-Hsuan Yang
Abstract:
Due to advances in deep learning, the performance of automatic beat and downbeat tracking in musical audio signals has seen great improvement in recent years. In training such deep learning based models, data augmentation has been found to be an important technique. However, existing data augmentation methods for this task mainly aim at balancing the distribution of the training data with respect to tempo. In this paper, we investigate another approach for data augmentation, to account for the composition of the training data in terms of the percussive and non-percussive sound sources. Specifically, we propose to employ a blind drum separation model to segregate the drum and non-drum sounds from each training audio signal, filtering out training signals that are drumless, and then use the obtained drum and non-drum stems to augment the training data. We report experiments on four completely unseen test sets, validating the effectiveness of the proposed method, and accordingly the importance of drum sound composition in the training data for beat and downbeat tracking.
Submitted 16 June, 2021;
originally announced June 2021.
-
Curvy: An Interactive Design Tool for Varying Density Support Structures
Authors:
Erva Ulu,
Nurcan Gecer Ulu,
Jiahao Li,
Walter Hsiao
Abstract:
We introduce Curvy, an interactive design tool to generate varying density support structures for 3D printing. Support structures are essential for printing models with extreme overhangs. Yet, they often cause defects on contact areas, resulting in poor surface quality. Low-level design of support structures may alleviate such negative effects. However, it is tedious and unintuitive for novice users as it is hard to predict the impact of changes to the support structure on the final printed part. Curvy allows users to define their high-level preferences on the surface quality directly on the target object rather than explicitly designing the supports. These preferences are then automatically translated into low-level design parameters to generate the support structure. The underlying novel curvy zigzag toolpathing algorithm uses these instructions to generate varying density supports by altering the spacing between individual paths in order to achieve the prescribed quality. Combined with build orientation optimization, Curvy provides a practical solution to the design of support structures with minimal perceptual or functional impact on the target part to be printed.
Submitted 19 February, 2021;
originally announced February 2021.
-
From Culture to Clothing: Discovering the World Events Behind A Century of Fashion Images
Authors:
Wei-Lin Hsiao,
Kristen Grauman
Abstract:
Fashion is intertwined with external cultural factors, but identifying these links remains a manual process limited to only the most salient phenomena. We propose a data-driven approach to identify specific cultural factors affecting the clothes people wear. Using large-scale datasets of news articles and vintage photos spanning a century, we present a multi-modal statistical model to detect influence relationships between happenings in the world and people's choice of clothing. Furthermore, on two image datasets we apply our model to improve the concrete vision tasks of visual style forecasting and photo timestamping. Our work is a first step towards a computational, scalable, and easily refreshable approach to link culture to clothing.
Submitted 20 September, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs
Authors:
Wen-Yi Hsiao,
Jen-Yu Liu,
Yin-Cheng Yeh,
Yi-Hsuan Yang
Abstract:
To apply neural sequence models such as the Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite set of pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note's pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While different types of tokens may possess different properties, existing models usually treat them equally, in the same way as modeling words in natural languages. In this paper, we present a conceptually different approach that explicitly takes into account the type of the tokens, such as note types and metric types. And, we propose a new Transformer decoder architecture that uses different feed-forward heads to model tokens of different types. With an expansion-compression trick, we convert a piece of music to a sequence of compound words by grouping neighboring tokens, greatly reducing the length of the token sequences. We show that the resulting model can be viewed as a learner over dynamic directed hypergraphs. And, we employ it to learn to compose expressive Pop piano music of full-song length (involving up to 10K individual tokens per song), both conditionally and unconditionally. Our experiment shows that, compared to state-of-the-art models, the proposed model converges 5--10 times faster at training (i.e., within a day on a single GPU with 11 GB memory), and with comparable quality in the generated music.
Submitted 7 January, 2021;
originally announced January 2021.
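The grouping of neighboring tokens into compound words, as described in the abstract above, can be illustrated with a toy example. The token names, the `family` field, and the grouping rule below are hypothetical and chosen only to show how grouping shortens the sequence; they are not the paper's actual vocabulary or its expansion-compression procedure.

```python
NOTE_TYPES = ("pitch", "duration", "velocity")

def to_compound_words(tokens):
    """Group each consecutive pitch/duration/velocity triple into one word.

    tokens: list of (type, value) pairs in time order.
    Tokens that are not part of a note triple become single-field words.
    """
    words, i = [], 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        kinds = tuple(kind for kind, _ in window)
        if kinds == NOTE_TYPES:
            # Three note-related tokens collapse into a single compound word.
            words.append({"family": "note", **dict(window)})
            i += 3
        else:
            kind, value = tokens[i]
            words.append({"family": "metric", kind: value})
            i += 1
    return words

# Hypothetical token stream: one bar marker, one position, two notes.
tokens = [
    ("bar", 1), ("position", 0),
    ("pitch", 60), ("duration", 4), ("velocity", 80),
    ("pitch", 64), ("duration", 4), ("velocity", 72),
]
words = to_compound_words(tokens)
```

In this toy stream, eight tokens become four compound words, and each word keeps its type tag so that a model could route it to a type-specific output head.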
-
Mixing-Specific Data Augmentation Techniques for Improved Blind Violin/Piano Source Separation
Authors:
Ching-Yu Chiu,
Wen-Yi Hsiao,
Yin-Cheng Yeh,
Yi-Hsuan Yang,
Alvin Wen-Yu Su
Abstract:
Blind music source separation has been a popular and active subject of research in both the music information retrieval and signal processing communities. To counter the lack of available multi-track data for supervised model training, a data augmentation method that creates artificial mixtures by combining tracks from different songs has been shown to be useful in recent works. In this light, we further examine in this paper extended data augmentation methods that consider more sophisticated mixing settings employed in modern music production routines, the relationship between the tracks to be combined, and factors of silence. As a case study, we consider the separation of violin and piano tracks in a violin-piano ensemble, evaluating the performance in terms of common metrics, namely SDR, SIR, and SAR. In addition to examining the effectiveness of these new data augmentation methods, we also study the influence of the amount of training data. Our evaluation shows that the proposed mixing-specific data augmentation methods can help improve the performance of a deep learning-based model for source separation, especially in the case of small training data.
Submitted 6 August, 2020;
originally announced August 2020.
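The baseline augmentation mentioned in the abstract above, creating artificial mixtures by combining tracks from different songs, can be sketched in a few lines. This is only the simple random-pairing baseline under assumed names (`make_artificial_mixture`, lists of mono waveforms); the mixing-specific extensions studied in the paper are not modeled here.

```python
import numpy as np

def make_artificial_mixture(violin_tracks, piano_tracks, rng):
    """Sum one randomly chosen violin track and one randomly chosen piano track."""
    violin = violin_tracks[rng.integers(len(violin_tracks))]
    piano = piano_tracks[rng.integers(len(piano_tracks))]
    # Trim both stems to the shorter length so they can be summed.
    n = min(len(violin), len(piano))
    violin, piano = violin[:n], piano[:n]
    # The mixture is the model input; the two stems are the training targets.
    return violin + piano, violin, piano

# Hypothetical stand-in data: short random arrays instead of real recordings.
rng = np.random.default_rng(0)
violins = [rng.standard_normal(8), rng.standard_normal(6)]
pianos = [rng.standard_normal(7), rng.standard_normal(9)]
mixture, violin_stem, piano_stem = make_artificial_mixture(violins, pianos, rng)
```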
-
Automatic Composition of Guitar Tabs by Transformers and Groove Modeling
Authors:
Yu-Hua Chen,
Yu-Hsiang Huang,
Wen-Yi Hsiao,
Yi-Hsuan Yang
Abstract:
Deep learning algorithms are increasingly developed for learning to compose music in the form of MIDI files. However, whether such algorithms work well for composing guitar tabs, which are quite different from MIDIs, remains relatively unexplored. To address this, we build a model for composing fingerstyle guitar tabs with Transformer-XL, a neural sequence model architecture. With this model, we investigate the following research questions. First, whether the neural net generates note sequences with meaningful note-string combinations, which is important for the guitar but not other instruments such as the piano. Second, whether it generates compositions with coherent rhythmic groove, crucial for fingerstyle guitar music. And, finally, how pleasant the composed music is in comparison to real, human-made compositions. Our work provides preliminary empirical evidence of the promise of deep learning for tab composition, and suggests areas for future study.
Submitted 4 August, 2020;
originally announced August 2020.
-
Learning Patterns of Tourist Movement and Photography from Geotagged Photos at Archaeological Heritage Sites in Cuzco, Peru
Authors:
Nicole D. Payntar,
Wei-Lin Hsiao,
R. Alan Covey,
Kristen Grauman
Abstract:
The popularity of media sharing platforms in recent decades has provided an abundance of open source data that remains underutilized by heritage scholars. By pairing geotagged internet photographs with machine learning and computer vision algorithms, we build upon the current theoretical discourse of anthropology associated with visuality and heritage tourism to identify travel patterns across a known archaeological heritage circuit, and quantify visual culture and experiences in Cuzco, Peru. Leveraging large-scale in-the-wild tourist photos, our goals are to (1) understand how the intensification of tourism intersects with heritage regulations and social media, aiding in the articulation of travel patterns across Cuzco's heritage landscape; and to (2) assess how aesthetic preferences and visuality become entangled with the rapidly evolving expectations of tourists, whose travel narratives are curated on social media and grounded in historic site representations.
Submitted 29 June, 2020;
originally announced June 2020.
-
Multipath Division Multiple Access for 5G Millimeter Wave Cellular Systems
Authors:
Shin-Yuan Wang,
Wei-Han Hsiao,
Kang-Lun Chiu,
Chia-Chi Huang
Abstract:
Future 5G communication systems require more demanding performance than existing cellular communication systems, e.g., 10 to 100 Mbps user data rate and much larger cellular spectrum efficiency. Widely used multiple access methods such as CDMA and OFDMA struggle to meet these challenging requirements simultaneously, even with advanced signal processing techniques and base station cooperation. Recently, massive MIMO has gained much attention since it provides large signal dimensions that can be used to improve future 5G cellular system performance. This tutorial paper describes a recently proposed multiple access scheme called multipath division multiple access (MDMA) based on massive antennas and multipath channel characteristics in the millimeter wave band, which offers uniform user data rate and achieves high cellular spectrum efficiency for 5G systems. We describe the fundamental principle and show the uplink and downlink block diagrams, control signaling and the call setup process. In addition, the benefits of using MDMA as a multiple access scheme are discussed. Finally, practical concerns are addressed for future research directions.
Submitted 3 February, 2020;
originally announced February 2020.
-
Automatic Melody Harmonization with Triad Chords: A Comparative Study
Authors:
Yin-Cheng Yeh,
Wen-Yi Hsiao,
Satoru Fukayama,
Tetsuro Kitahara,
Benjamin Genchel,
Hao-Min Liu,
Hao-Wen Dong,
Yian Chen,
Terence Leong,
Yi-Hsuan Yang
Abstract:
Several prior works have proposed various methods for the task of automatic melody harmonization, in which a model aims to generate a sequence of chords to serve as the harmonic accompaniment of a given multiple-bar melody sequence. In this paper, we present a comparative study evaluating and comparing the performance of a set of canonical approaches to this task, including a template matching based model, a hidden Markov based model, a genetic algorithm based model, and two deep learning based models. The evaluation is conducted on a dataset of 9,226 melody/chord pairs we newly collect for this study, considering up to 48 triad chords, using a standardized training/test split. We report the result of an objective evaluation using six different metrics and a subjective study with 202 participants.
Submitted 27 April, 2021; v1 submitted 7 January, 2020;
originally announced January 2020.
-
ViBE: Dressing for Diverse Body Shapes
Authors:
Wei-Lin Hsiao,
Kristen Grauman
Abstract:
Body shape plays an important role in determining what garments will best suit a given person, yet today's clothing recommendation methods take a "one shape fits all" approach. These body-agnostic vision methods and datasets are a barrier to inclusion, ill-equipped to provide good suggestions for diverse body shapes. We introduce ViBE, a VIsual Body-aware Embedding that captures clothing's affinity with different body shapes. Given an image of a person, the proposed embedding identifies garments that will flatter her specific body shape. We show how to learn the embedding from an online catalog displaying fashion models of various shapes and sizes wearing the products, and we devise a method to explain the algorithm's suggestions for well-fitting garments. We apply our approach to a dataset of diverse subjects, and demonstrate its strong advantages over the status quo body-agnostic recommendation, both according to automated metrics and human opinion.
Submitted 28 March, 2020; v1 submitted 13 December, 2019;
originally announced December 2019.
-
Manufacturability Oriented Model Correction and Build Direction Optimization for Additive Manufacturing
Authors:
Erva Ulu,
Nurcan Gecer Ulu,
Walter Hsiao,
Saigopal Nelaturi
Abstract:
We introduce a method to analyze and modify a shape to make it manufacturable for a given additive manufacturing (AM) process. Different AM technologies, process parameters or materials introduce geometric constraints on what is manufacturable or not. Given an input 3D model and minimum printable feature size dictated by the manufacturing process characteristics and parameters, our algorithm generates a corrected geometry that is printable with the intended AM process. A key issue in model correction for manufacturability is the identification of critical features that are affected by the printing process. To address this challenge, we propose a topology aware approach to construct the allowable space for a print head to traverse during the 3D printing process. Combined with our build orientation optimization algorithm, the amount of modifications performed on the shape is kept at minimum while providing an accurate approximation of the as-manufactured part. We demonstrate our method on a variety of 3D models and validate it by 3D printing the results.
Submitted 19 September, 2019;
originally announced September 2019.
-
Fashion++: Minimal Edits for Outfit Improvement
Authors:
Wei-Lin Hsiao,
Isay Katsman,
Chao-Yuan Wu,
Devi Parikh,
Kristen Grauman
Abstract:
Given an outfit, what small changes would most improve its fashionability? This question presents an intriguing new vision challenge. We introduce Fashion++, an approach that proposes minimal adjustments to a full-body clothing outfit that will have maximal impact on its fashionability. Our model consists of a deep image generation neural network that learns to synthesize clothing conditioned on learned per-garment encodings. The latent encodings are explicitly factorized according to shape and texture, thereby allowing direct edits for both fit/presentation and color/patterns/material, respectively. We show how to bootstrap Web photos to automatically train a fashionability model, and develop an activation maximization-style approach to transform the input image into its more fashionable self. The edits suggested range from swapping in a new garment to tweaking its color, how it is worn (e.g., rolling up sleeves), or its fit (e.g., making pants baggier). Experiments demonstrate that Fashion++ provides successful edits, both according to automated metrics and human opinion. Project page is at http://vision.cs.utexas.edu/projects/FashionPlus.
Submitted 2 September, 2019; v1 submitted 19 April, 2019;
originally announced April 2019.
-
Creating Capsule Wardrobes from Fashion Images
Authors:
Wei-Lin Hsiao,
Kristen Grauman
Abstract:
We propose to automatically create capsule wardrobes. Given an inventory of candidate garments and accessories, the algorithm must assemble a minimal set of items that provides maximal mix-and-match outfits. We pose the task as a subset selection problem. To permit efficient subset selection over the space of all outfit combinations, we develop submodular objective functions capturing the key ingredients of visual compatibility, versatility, and user-specific preference. Since adding garments to a capsule only expands its possible outfits, we devise an iterative approach to allow near-optimal submodular function maximization. Finally, we present an unsupervised approach to learn visual compatibility from "in the wild" full body outfit photos; the compatibility metric translates well to cleaner catalog photos and improves over existing methods. Our results on thousands of pieces from popular fashion websites show that automatic capsule creation has potential to mimic skilled fashionistas in assembling flexible wardrobes, while being significantly more scalable.
Submitted 14 April, 2018; v1 submitted 7 December, 2017;
originally announced December 2017.
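The iterative subset selection mentioned in the capsule wardrobe abstract above can be illustrated with the standard greedy routine for monotone submodular maximization. This is a generic sketch, not the paper's algorithm: the coverage objective (the number of distinct outfit ids an item set enables) is a simplified stand-in for the compatibility, versatility, and preference terms, and all names and data are hypothetical.

```python
def greedy_select(candidates, covers, budget):
    """Pick up to `budget` items, each time adding the largest marginal gain.

    covers maps an item to the set of (hypothetical) outfit ids it enables;
    the objective is the size of the union, a monotone submodular function.
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best, best_gain = None, 0
        for item in candidates:
            if item in chosen:
                continue
            # Marginal gain: outfits this item adds beyond those already covered.
            gain = len(covers[item] - covered)
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:  # no remaining item adds anything new
            break
        chosen.append(best)
        covered |= covers[best]
    return chosen, covered

# Hypothetical inventory with made-up outfit ids.
covers = {
    "jeans": {1, 2, 3},
    "blazer": {3, 4},
    "tee": {1, 2},
    "boots": {5},
}
capsule, outfits = greedy_select(list(covers), covers, budget=2)
```

For monotone submodular objectives under a cardinality budget, this kind of greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal value, which is what makes a near-optimal guarantee possible.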
-
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Authors:
Hao-Wen Dong,
Wen-Yi Hsiao,
Li-Chia Yang,
Yi-Hsuan Yang
Abstract:
Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, so imposing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in their underlying assumptions and accordingly in their network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e., without human input). We also extend our models to human-AI cooperative music generation: given a specific track composed by a human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/.
Submitted 24 November, 2017; v1 submitted 19 September, 2017;
originally announced September 2017.
-
Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images
Authors:
Wei-Lin Hsiao,
Kristen Grauman
Abstract:
What defines a visual style? Fashion styles emerge organically from how people assemble outfits of clothing, making them difficult to pin down with a computational model. Low-level visual similarity can be too specific to detect stylistically similar images, while manually crafted style categories can be too abstract to capture subtle style differences. We propose an unsupervised approach to learn a style-coherent representation. Our method leverages probabilistic polylingual topic models based on visual attributes to discover a set of latent style factors. Given a collection of unlabeled fashion images, our approach mines for the latent styles, then summarizes outfits by how they mix those styles. Our approach can organize galleries of outfits by style without requiring any style labels. Experiments on over 100K images demonstrate its promise for retrieving, mixing, and summarizing fashion images by their style.
Submitted 3 August, 2017; v1 submitted 11 July, 2017;
originally announced July 2017.
-
TCPTuner: Congestion Control Your Way
Authors:
Kevin Miller,
Luke W. Hsiao
Abstract:
TCPTuner is a TCP (transmission control protocol) congestion control kernel module and GUI (graphical user interface) for Linux that allows real-time modification of the congestion control parameters of TCP CUBIC, the current default algorithm in Linux. Specifically, the tool provides access to alpha, the rate at which a sender's congestion window grows; beta, the multiplicative factor by which the congestion window is decreased on a loss event; and CUBIC's fast convergence and TCP friendliness parameters. Additionally, the interface provides access to ip-route parameters for the minimum retransmission time and initial congestion window size. In this paper, we describe the implementation of TCPTuner and present experimental data on the effects of adjusting congestion control parameters.
Submitted 6 May, 2016;
originally announced May 2016.