Showing 1–6 of 6 results for author: Wang, L Z

Searching in archive cs.
  1. arXiv:2504.04647  [pdf, other]

    cs.LG q-bio.QM

    Sub-Clustering for Class Distance Recalculation in Long-Tailed Drug Classification

    Authors: Yujia Su, Xinjie Li, Lionel Z. Wang

    Abstract: In the real world, long-tailed data distributions are prevalent, making it challenging for models to effectively learn and classify tail classes. However, we discover that in the field of drug chemistry, certain tail classes exhibit higher identifiability during training due to their unique molecular structural features, a finding that significantly contrasts with the conventional understanding th…

    Submitted 6 April, 2025; originally announced April 2025.

  2. arXiv:2503.21679  [pdf, other]

    cs.CL cs.CY

    JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community

    Authors: Yunze Xiao, Tingyu He, Lionel Z. Wang, Yiming Ma, Xingyu Song, Xiaohang Xu, Irene Li, Ka Chung Ng

    Abstract: This paper introduces JiraiBench, the first bilingual benchmark for evaluating large language models' effectiveness in detecting self-destructive content across Chinese and Japanese social media communities. Focusing on the transnational "Jirai" (landmine) online subculture that encompasses multiple forms of self-destructive behaviors including drug overdose, eating disorders, and self-harm, we pr…

    Submitted 30 March, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

    Comments: 20 pages, 1 figure

  3. arXiv:2412.02823  [pdf, other]

    cs.CL cs.AI

    Minimization of Boolean Complexity in In-Context Concept Learning

    Authors: Leroy Z. Wang, R. Thomas McCoy, Shane Steinert-Threlkeld

    Abstract: What factors contribute to the relative success and corresponding difficulties of in-context learning for Large Language Models (LLMs)? Drawing on insights from the literature on human concept learning, we test LLMs on carefully designed concept learning tasks, and show that task performance highly correlates with the Boolean complexity of the concept. This suggests that in-context learning exhibi…

    Submitted 3 December, 2024; originally announced December 2024.

  4. arXiv:2408.11871  [pdf, other]

    cs.CL cs.AI

    MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models

    Authors: Lionel Z. Wang, Yiming Ma, Renfei Gao, Beichen Guo, Han Zhu, Wenqi Fan, Zexin Lu, Ka Chung Ng

    Abstract: The advent of large language models (LLMs) has revolutionized online content creation, making it much easier to generate high-quality fake news. This misuse threatens the integrity of our digital environment and ethical standards. Therefore, understanding the motivations and mechanisms behind LLM-generated fake news is crucial. In this study, we analyze the creation of fake news from a social psyc…

    Submitted 25 September, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  5. arXiv:2407.17869  [pdf, other]

    cs.LG

    EllipBench: A Large-scale Benchmark for Machine-learning based Ellipsometry Modeling

    Authors: Yiming Ma, Xinjie Li, Xin Sun, Zhiyong Wang, Lionel Z. Wang

    Abstract: Ellipsometry is used to indirectly measure the optical properties and thickness of thin films. However, solving the inverse problem of ellipsometry is time-consuming since it involves human expertise to apply the data fitting techniques. Many studies use traditional machine learning-based methods to model the complex mathematical fitting process. In our work, we approach this problem from a deep l…

    Submitted 25 July, 2024; originally announced July 2024.

  6. arXiv:2201.10474  [pdf, other]

    cs.CL cs.AI

    Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

    Authors: Suchin Gururangan, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Z. Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith

    Abstract: Language models increasingly rely on massive web dumps for diverse text data. However, these sources are rife with undesirable content. As such, resources like Wikipedia, books, and newswire often serve as anchors for automatically selecting web text most suitable for language modeling, a process typically referred to as quality filtering. Using a new dataset of U.S. high school newspaper articles…

    Submitted 26 January, 2022; v1 submitted 25 January, 2022; originally announced January 2022.