Showing 1–16 of 16 results for author: Zhai, R

Searching in archive cs.
  1. arXiv:2505.01557  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Contextures: Representations from Contexts

    Authors: Runtian Zhai, Kai Yang, Che-Ping Tsai, Burak Varici, Zico Kolter, Pradeep Ravikumar

    Abstract: Despite the empirical success of foundation models, we do not have a systematic characterization of the representations that these models learn. In this paper, we establish the contexture theory. It shows that a large class of representation learning methods can be characterized as learning from the association between the input and a context variable. Specifically, we show that many popular metho…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: ICML 2025, longer version. arXiv admin note: substantial text overlap with arXiv:2504.19792

  2. arXiv:2504.19792  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Contextures: The Mechanism of Representation Learning

    Authors: Runtian Zhai

    Abstract: This dissertation establishes the contexture theory to mathematically characterize the mechanism of representation learning, or pretraining. Despite the remarkable empirical success of foundation models, it is not very clear what representations they learn, and why these representations are useful for various downstream tasks. A scientific understanding of representation learning is critical, espe…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: PhD Dissertation

    Report number: CMU-CS-25-104

  3. arXiv:2412.20563  [pdf, other]

    cs.CL

    Counterfactual Samples Constructing and Training for Commonsense Statements Estimation

    Authors: Chong Liu, Zaiwen Feng, Lin Liu, Zhenyun Deng, Jiuyong Li, Ruifang Zhai, Debo Cheng, Li Qin

    Abstract: Plausibility Estimation (PE) plays a crucial role in enabling language models to objectively comprehend the real world. Large language models (LLMs) demonstrate remarkable capabilities in PE tasks but sometimes produce trivial commonsense errors due to the complexity of commonsense knowledge. They lack two key traits of an ideal PE model: a) Language-explainable: relying on critical word se…

    Submitted 29 December, 2024; originally announced December 2024.

    Comments: 14 pages, 4 figures

  4. arXiv:2402.00645  [pdf, other]

    stat.ML cs.LG

    Spectrally Transformed Kernel Regression

    Authors: Runtian Zhai, Rattana Pukdee, Roger Jin, Maria-Florina Balcan, Pradeep Ravikumar

    Abstract: Unlabeled data is a key component of modern machine learning. In general, the role of unlabeled data is to impose a form of smoothness, usually from the similarity information encoded in a base kernel, such as the $ε$-neighbor kernel or the adjacency matrix of a graph. This work revisits the classical idea of spectrally transformed kernel regression (STKR), and provides a new class of general and…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: ICLR 2024 spotlight. 36 pages

  5. arXiv:2311.17335  [pdf, other]

    cs.CV cs.MM

    Towards Emotion Analysis in Short-form Videos: A Large-Scale Dataset and Baseline

    Authors: Xuecheng Wu, Heli Sun, Junxiao Xue, Jiayu Nie, Xiangyan Kong, Ruofan Zhai, Liang He

    Abstract: Nowadays, short-form videos (SVs) are essential to web information acquisition and sharing in our daily life. The prevailing use of SVs to spread emotions leads to the necessity of conducting video emotion analysis (VEA) towards SVs. Considering the lack of SVs emotion data, we introduce a large-scale dataset named eMotions, comprising 27,996 videos. Meanwhile, we alleviate the impact of subjectiv…

    Submitted 9 December, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  6. arXiv:2310.18832  [pdf, other]

    cs.AI

    Responsible AI (RAI) Games and Ensembles

    Authors: Yash Gupta, Runtian Zhai, Arun Suggala, Pradeep Ravikumar

    Abstract: Several recent works have studied the societal effects of AI; these include issues such as fairness, robustness, and safety. In many of these objectives, a learner seeks to minimize its worst-case loss over a set of predefined distributions (known as uncertainty sets), with usual examples being perturbed versions of the empirical distribution. In other words, aforementioned problems can be written…

    Submitted 28 October, 2023; originally announced October 2023.

  7. arXiv:2308.02961  [pdf, other]

    cs.CY

    A Study of China's Censorship and Its Evasion Through the Lens of Online Gaming

    Authors: Yuzhou Feng, Ruyu Zhai, Radu Sion, Bogdan Carbunar

    Abstract: For the past 20 years, China has increasingly restricted the access of minors to online games using addiction prevention systems (APSes). At the same time, and through different means, i.e., the Great Firewall of China (GFW), it also restricts general population access to the international Internet. This paper studies how these restrictions impact young online gamers, and their evasion efforts. We…

    Submitted 5 August, 2023; originally announced August 2023.

    Journal ref: Usenix Security 2023

  8. arXiv:2306.00788  [pdf, other]

    cs.LG stat.ML

    Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression

    Authors: Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar

    Abstract: Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, su…

    Submitted 18 January, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: ICLR 2024 spotlight. 34 pages

  9. arXiv:2305.15640  [pdf, other]

    cs.LG cs.CV

    Characterizing Out-of-Distribution Error via Optimal Transport

    Authors: Yuzhe Lu, Yilong Qin, Runtian Zhai, Andrew Shen, Ketong Chen, Zhenlin Wang, Soheil Kolouri, Simon Stepputtis, Joseph Campbell, Katia Sycara

    Abstract: Out-of-distribution (OOD) data poses serious challenges in deployed machine learning models, so methods of predicting a model's performance on OOD data without labels are important for machine learning safety. While a number of methods have been proposed by prior work, they often underestimate the actual error, sometimes by a large margin, which greatly impacts their applicability to real tasks. I…

    Submitted 27 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  10. arXiv:2302.05018  [pdf, other]

    cs.LG cs.CV

    Predicting Out-of-Distribution Error with Confidence Optimal Transport

    Authors: Yuzhe Lu, Zhenlin Wang, Runtian Zhai, Soheil Kolouri, Joseph Campbell, Katia Sycara

    Abstract: Out-of-distribution (OOD) data poses serious challenges in deployed machine learning models as even subtle changes could incur significant performance drops. Being able to estimate a model's performance on test data is important in practice as it indicates when to trust the model's decisions. We present a simple yet effective method to predict a model's performance on an unknown distribution withou…

    Submitted 9 February, 2023; originally announced February 2023.

  11. arXiv:2201.12293  [pdf, other]

    cs.LG stat.ML

    Understanding Why Generalized Reweighting Does Not Improve Over ERM

    Authors: Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar

    Abstract: Empirical risk minimization (ERM) is known in practice to be non-robust to distributional shift where the training and the test distributions are different. A suite of approaches, such as importance weighting, and variants of distributionally robust optimization (DRO), have been proposed to solve this problem. But a line of recent work has empirically shown that these approaches do not significant…

    Submitted 7 February, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: ICLR 2023. 40 pages, 3 figures

  12. arXiv:2110.13948  [pdf, other]

    cs.LG stat.ML

    Boosted CVaR Classification

    Authors: Runtian Zhai, Chen Dan, Arun Sai Suggala, Zico Kolter, Pradeep Ravikumar

    Abstract: Many modern machine learning tasks require models with high tail performance, i.e. high performance over the worst-off samples in the dataset. This problem has been widely studied in fields such as algorithmic fairness, class imbalance, and risk-sensitive decision making. A popular approach to maximize the model's tail performance is to minimize the CVaR (Conditional Value at Risk) loss, which com…

    Submitted 10 November, 2021; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021. 16 pages, 4 figures

  13. arXiv:2106.06142  [pdf, ps, other]

    cs.LG stat.ML

    DORO: Distributional and Outlier Robust Optimization

    Authors: Runtian Zhai, Chen Dan, J. Zico Kolter, Pradeep Ravikumar

    Abstract: Many machine learning tasks involve subpopulation shift where the testing data distribution is a subpopulation of the training distribution. For such settings, a line of recent work has proposed the use of a variant of empirical risk minimization (ERM) known as distributionally robust optimization (DRO). In this work, we apply DRO to real, large-scale tasks with subpopulation shift, and observe tha…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: ICML 2021. Codes: https://github.com/RuntianZ/doro

  14. arXiv:2007.12446  [pdf, other]

    cs.LG stat.ML

    Transferred Discrepancy: Quantifying the Difference Between Representations

    Authors: Yunzhen Feng, Runtian Zhai, Di He, Liwei Wang, Bin Dong

    Abstract: Understanding what information neural networks capture is an essential problem in deep learning, and studying whether different models capture similar features is an initial step to achieve this goal. Previous works sought to define metrics over the feature matrices to measure the difference between two models. However, different metrics sometimes lead to contradictory conclusions, and there has b…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: 23 pages, 3 figures

  15. arXiv:2001.02378  [pdf, other]

    cs.LG cs.CR stat.ML

    MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

    Authors: Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang

    Abstract: Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time costly. In this paper, we propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable l2-defenses. Recent work shows that randomized smoothing can be used to provide a certified l2 radius to smoothed class…

    Submitted 14 March, 2022; v1 submitted 8 January, 2020; originally announced January 2020.

    Comments: Published in ICLR 2020. 20 Pages

  16. arXiv:1906.00555  [pdf, ps, other]

    cs.LG stat.ML

    Adversarially Robust Generalization Just Requires More Unlabeled Data

    Authors: Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Liwei Wang

    Abstract: Neural network robustness has recently been highlighted by the existence of adversarial examples. Many previous works show that the learned networks do not perform well on perturbed test data, and significantly more labeled data is required to achieve adversarially robust generalization. In this paper, we theoretically and empirically show that with just more unlabeled data, we can learn a model w…

    Submitted 25 September, 2019; v1 submitted 2 June, 2019; originally announced June 2019.

    Comments: 16 pages. Submitted to ICLR 2020