Showing 1–22 of 22 results for author: Han, E

Searching in archive cs.
  1. arXiv:2505.16968

    cs.AR cs.AI cs.CL cs.LG cs.PL

    CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

    Authors: Ahmed Heakl, Sarim Hashmi, Gustavo Bertolo Stahl, Seung Hun Eddie Han, Salman Khan, Abdulrahman Mahmoud

    Abstract: We introduce CASS, the first large-scale dataset and model suite for cross-architecture GPU code transpilation, targeting both source-level (CUDA <--> HIP) and assembly-level (Nvidia SASS <--> AMD RDNA3) translation. The dataset comprises 70k verified code pairs across host and device, addressing a critical gap in low-level GPU code portability. Leveraging this resource, we train the CASS family o…

    Submitted 29 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 20 pages, 11 figures, 5 tables

  2. arXiv:2505.14946

    cs.AI

    Reinforcement Learning from User Feedback

    Authors: Eric Han, Jun Chen, Karthik Abinav Sankararaman, Xiaoliang Peng, Tengyu Xu, Eryk Helenowski, Kaiyan Peng, Mrinal Kumar, Sinong Wang, Han Fang, Arya Talebzadeh

    Abstract: As large language models (LLMs) are increasingly deployed in diverse user facing applications, aligning them with real user preferences becomes essential. Existing methods like Reinforcement Learning from Human Feedback (RLHF) rely on expert annotators trained on manually defined guidelines, whose judgments may not reflect the priorities of everyday users. We introduce Reinforcement Learning from…

    Submitted 20 May, 2025; originally announced May 2025.

  3. arXiv:2505.00304

    stat.ML cs.LG stat.ME

    Reinforcement Learning with Continuous Actions Under Unmeasured Confounding

    Authors: Yuhan Li, Eugene Han, Yifan Hu, Wenzhuo Zhou, Zhengling Qi, Yifan Cui, Ruoqing Zhu

    Abstract: This paper addresses the challenge of offline policy learning in reinforcement learning with continuous action spaces when unmeasured confounders are present. While most existing research focuses on policy evaluation within partially observable Markov decision processes (POMDPs) and assumes discrete action spaces, we advance this field by establishing a novel identification result to enable the no…

    Submitted 1 May, 2025; originally announced May 2025.

  4. arXiv:2504.13392

    cs.CV cs.HC

    POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation

    Authors: Evans Xu Han, Alice Qian Zhang, Hong Shen, Haiyi Zhu, Paul Pu Liang, Jane Hsieh

    Abstract: State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks -- offering the ability to generate (rather than search for) novel and unprecedented (instead of existing) images of considerable quality that also adhere to boundless combinations of user specifications. However, many large-scale text-to-image systems are designed for…

    Submitted 17 April, 2025; originally announced April 2025.

  5. arXiv:2411.07705

    cs.CY

    dpvis: A Visual and Interactive Learning Tool for Dynamic Programming

    Authors: David H. Lee, Aditya Prasad, Ramiro Deo-Campo Vuong, Tianyu Wang, Eric Han, David Kempe

    Abstract: Dynamic programming (DP) is a fundamental and powerful algorithmic paradigm taught in most undergraduate (and many graduate) algorithms classes. DP problems are challenging for many computer science students because they require identifying unique problem structures and a refined understanding of recursion. In this paper, we present dpvis, a Python library that helps students understand DP through…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: Published as a conference paper at Technical Symposium on Computer Science Education (SIGCSE TS '25); dpvis is available at https://dpvis.readthedocs.io/en/latest/

    ACM Class: K.3.1; K.3.2

  6. arXiv:2410.16719

    cs.CV cs.LG

    Progressive Compositionality in Text-to-Image Generative Models

    Authors: Evans Xu Han, Linghao Jin, Xiaofeng Liu, Paul Pu Liang

    Abstract: Despite the impressive text-to-image (T2I) synthesis capabilities of diffusion models, they often struggle to understand compositional relationships between objects and attributes, especially in complex settings. Existing solutions have tackled these challenges by optimizing the cross-attention mechanism or learning from the caption pairs with minimal semantic changes. However, can we generate hig…

    Submitted 26 April, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

  7. arXiv:2409.20370

    cs.LG cs.AI cs.CL

    The Perfect Blend: Redefining RLHF with Mixture of Judges

    Authors: Tengyu Xu, Eryk Helenowski, Karthik Abinav Sankararaman, Di Jin, Kaiyan Peng, Eric Han, Shaoliang Nie, Chen Zhu, Hejia Zhang, Wenxuan Zhou, Zhouhao Zeng, Yun He, Karishma Mandyam, Arya Talabzadeh, Madian Khabsa, Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, Han Fang

    Abstract: Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLM). However, RLHF has limitations in multi-task learning (MTL) due to challenges of reward hacking and extreme multi-objective optimization (i.e., trade-off of multiple and/or sometimes conflicting objectives). Applying RLHF for MTL currently requires careful tuning of the wei…

    Submitted 30 September, 2024; originally announced September 2024.

    Comments: submitted to conference

  8. Effect of Duration and Delay on the Identifiability of VR Motion

    Authors: Mark Roman Miller, Vivek Nair, Eugy Han, Cyan DeVeaux, Christian Rack, Rui Wang, Brandon Huang, Marc Erich Latoschik, James F. O'Brien, Jeremy N. Bailenson

    Abstract: Social virtual reality is an emerging medium of communication. In this medium, a user's avatar (virtual representation) is controlled by the tracked motion of the user's headset and hand controllers. This tracked motion is a rich data stream that can leak characteristics of the user or can be effectively matched to previously-identified data to identify a user. To better understand the boundaries…

    Submitted 26 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: 6 pages, 2 figures, presented at the SePAR workshop (Security and Privacy in Mixed, Augmented, and Virtual Realities), co-located with WoWMoM 2024. arXiv admin note: text overlap with arXiv:2303.01430

  9. arXiv:2407.02896

    cs.HC cs.CY

    Predicting and Understanding Turn-Taking Behavior in Open-Ended Group Activities in Virtual Reality

    Authors: Portia Wang, Eugy Han, Anna C. M. Queiroz, Cyan DeVeaux, Jeremy N. Bailenson

    Abstract: In networked virtual reality (VR), user behaviors, individual differences, and group dynamics can serve as important signals into future speech behaviors, such as who the next speaker will be and the timing of turn-taking behaviors. The ability to predict and understand these behaviors offers opportunities to provide adaptive and personalized assistance, for example helping users with varying sens…

    Submitted 25 April, 2025; v1 submitted 3 July, 2024; originally announced July 2024.

  10. arXiv:2406.06516

    stat.ME cs.LG stat.ML

    Distribution-Free Predictive Inference under Unknown Temporal Drift

    Authors: Elise Han, Chengpiao Huang, Kaizheng Wang

    Abstract: Distribution-free prediction sets play a pivotal role in uncertainty quantification for complex statistical models. Their validity hinges on reliable calibration data, which may not be readily available as real-world environments often undergo unknown changes over time. In this paper, we propose a strategy for choosing an adaptive window and use the data therein to construct prediction sets. The w…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 25 pages, 4 figures, 6 tables

  11. arXiv:2402.08672

    cs.LG cs.AI stat.ME

    Model Assessment and Selection under Temporal Distribution Shift

    Authors: Elise Han, Chengpiao Huang, Kaizheng Wang

    Abstract: We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidat…

    Submitted 3 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: 26 pages, 6 figures, 4 tables

    MSC Class: 62G05 (Primary); 62J02 (Secondary)

  12. arXiv:2307.05435

    cs.LG

    One-Versus-Others Attention: Scalable Multimodal Integration for Biomedical Data

    Authors: Michal Golovanevsky, Eva Schiller, Akira Nair, Eric Han, Ritambhara Singh, Carsten Eickhoff

    Abstract: Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such…

    Submitted 21 October, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

  13. arXiv:2303.01430

    cs.CR

    A Large-Scale Study of Personal Identifiability of Virtual Reality Motion Over Time

    Authors: Mark Roman Miller, Eugy Han, Cyan DeVeaux, Eliot Jones, Ryan Chen, Jeremy N. Bailenson

    Abstract: In recent years, social virtual reality (VR), sometimes described as the "metaverse," has become widely available. With its potential comes risks, including risks to privacy. To understand these risks, we study the identifiability of participants' motion in VR in a dataset of 232 VR users with eight weekly sessions of about thirty minutes each, totaling 764 hours of social interaction. The sample…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: 15 pages, 5 figures

  14. arXiv:2207.05261

    cs.CL cs.AI cs.LG

    Building Korean Sign Language Augmentation (KoSLA) Corpus with Data Augmentation Technique

    Authors: Changnam An, Eunkyung Han, Dongmyeong Noh, Ohkyoon Kwon, Sumi Lee, Hyunshim Han

    Abstract: We present an efficient framework of corpus for sign language translation. Aided with a simple but dramatic data augmentation technique, our method converts text into annotated forms with minimum information loss. Sign languages are composed of manual signals, non-manual signals, and iconic features. According to professional sign language interpreters, non-manual signals such as facial expression…

    Submitted 11 July, 2022; originally announced July 2022.

  15. Improving fairness in speaker verification via Group-adapted Fusion Network

    Authors: Hua Shen, Yuguang Yang, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke

    Abstract: Modern speaker verification models use deep neural networks to encode utterance audio into discriminative embedding vectors. During the training process, these networks are typically optimized to differentiate arbitrary speakers. This learning process biases the learning of fine voice characteristics towards dominant demographic groups, which can lead to an unfair performance disparity across diff…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: To appear in Proc. IEEE ICASSP 2022

    Journal ref: Proc. IEEE ICASSP, May 2022, pp. 7077-7081

  16. Contrastive-mixup learning for improved speaker verification

    Authors: Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li, Eunjung Han, Andreas Stolcke

    Abstract: This paper proposes a novel formulation of prototypical loss with mixup for speaker verification. Mixup is a simple yet efficient data augmentation technique that fabricates a weighted combination of random data point and label pairs for deep neural network training. Mixup has attracted increasing attention due to its ability to improve robustness and generalization of deep neural networks. Althou…

    Submitted 22 February, 2022; originally announced February 2022.

    Journal ref: Proc. IEEE ICASSP, May 2022, pp. 7652-7656

  17. ASR-Aware End-to-end Neural Diarization

    Authors: Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

    Abstract: We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model. Two categories of features are explored: features derived directly from ASR output (phones, position-in-word and word boundaries) and features derived from a lexical speaker change detection model, trained by fine-tuning a pret…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: To appear in ICASSP 2022

    Journal ref: Proc. IEEE ICASSP, May 2022, pp. 8092-8096

  18. arXiv:2110.08449

    stat.ML cs.CR cs.LG

    Adversarial Attacks on Gaussian Process Bandits

    Authors: Eric Han, Jonathan Scarlett

    Abstract: Gaussian processes (GP) are a widely-adopted tool used to sequentially optimize black-box functions, where evaluations are costly and potentially noisy. Recent works on GP bandits have proposed to move beyond random noise and devise algorithms robust to adversarial attacks. This paper studies this problem from the attacker's perspective, proposing various adversarial attack methods with differing…

    Submitted 16 June, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted to ICML 2022

    MSC Class: 90C26

    ACM Class: G.1.6

  19. Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

    Authors: Zhenning Tan, Yuguang Yang, Eunjung Han, Andreas Stolcke

    Abstract: Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterance and speaker profiles. Second, a scoring function is applied between a runtime utterance and each speaker profile. Finally, the speaker is identified using nearest neighbor according to the scoring metric. To better distinguish speakers sharing a device within the same ho…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: Submitted to ASRU 2021

    Journal ref: Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Dec. 2021, pp. 1124-1131

  20. End-to-end Neural Diarization: From Transformer to Conformer

    Authors: Yi Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke

    Abstract: We propose a new end-to-end neural diarization (EEND) system that is based on Conformer, a recently proposed neural architecture that combines convolutional mappings and Transformer to model both local and global dependencies in speech. We first show that data augmentation and convolutional subsampling layers enhance the original self-attentive EEND in the Transformer-based EEND, and then Conforme…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: To appear in Interspeech 2021

    Journal ref: Proc. Interspeech, Sept. 2021, pp. 3081-3085

  21. arXiv:2012.13088

    stat.ML cs.LG

    High-Dimensional Bayesian Optimization via Tree-Structured Additive Models

    Authors: Eric Han, Ishank Arora, Jonathan Scarlett

    Abstract: Bayesian Optimization (BO) has shown significant success in tackling expensive low-dimensional black-box optimization problems. Many optimization problems of interest are high-dimensional, and scaling BO to such settings remains an important challenge. In this paper, we consider generalized additive models in which low-dimensional functions with overlapping subsets of variables are composed to mod…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: To appear in AAAI 2021

    MSC Class: 90C26

    ACM Class: G.1.6

    Journal ref: Vol. 35 No. 9: AAAI-21 Technical Tracks 9 (2021) 7630-7638

  22. BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

    Authors: Eunjung Han, Chul Lee, Andreas Stolcke

    Abstract: We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers. The system is based on the Encoder-Decoder-Attractor (EDA) architecture of Horiguchi et al., but utilizes the incremental Transformer encoder, attending only to its left contexts and using block-level recurrence in the hidden states to carry information…

    Submitted 12 February, 2021; v1 submitted 5 November, 2020; originally announced November 2020.

    Journal ref: Proc. IEEE ICASSP, June 2021, pp. 7193-7197