Search | arXiv e-print repository

Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks

Authors: Yunwen Chen, Zuotao Liu, Daqi Ji, Yingwei Xin, Wenguang Wang, Lu Yao, Yi Zou

Abstract: This paper describes the solution of Shanda Innovations team to Task 1 of KDD-Cup 2012. A novel approach called Multifaceted Factorization Models is proposed to incorporate a great variety of features in social networks. Social relationships and actions between users are integrated as implicit feedbacks to improve the recommendation accuracy. Keywords, tags, profiles, time and some other features… ▽ More This paper describes the solution of Shanda Innovations team to Task 1 of KDD-Cup 2012. A novel approach called Multifaceted Factorization Models is proposed to incorporate a great variety of features in social networks. Social relationships and actions between users are integrated as implicit feedbacks to improve the recommendation accuracy. Keywords, tags, profiles, time and some other features are also utilized for modeling user interests. In addition, user behaviors are modeled from the durations of recommendation records. A context-aware ensemble framework is then applied to combine multiple predictors and produce final recommendation results. The proposed approach obtained 0.43959 (public score) / 0.41874 (private score) on the testing dataset, which achieved the 2nd place in the KDD-Cup competition. △ Less

Submitted 3 May, 2021; originally announced May 2021.

Comments: KDD 2012

arXiv:2102.00667 [pdf, ps, other]

Probabilistic Learning Vector Quantization on Manifold of Symmetric Positive Definite Matrices

Authors: Fengzhen Tang, Haifeng Feng, Peter Tino, Bailu Si, Daxiong Ji

Abstract: In this paper, we develop a new classification method for manifold-valued data in the framework of probabilistic learning vector quantization. In many classification scenarios, the data can be naturally represented by symmetric positive definite matrices, which are inherently points that live on a curved Riemannian manifold. Due to the non-Euclidean geometry of Riemannian manifolds, traditional Eu… ▽ More In this paper, we develop a new classification method for manifold-valued data in the framework of probabilistic learning vector quantization. In many classification scenarios, the data can be naturally represented by symmetric positive definite matrices, which are inherently points that live on a curved Riemannian manifold. Due to the non-Euclidean geometry of Riemannian manifolds, traditional Euclidean machine learning algorithms yield poor results on such data. In this paper, we generalize the probabilistic learning vector quantization algorithm for data points living on the manifold of symmetric positive definite matrices equipped with Riemannian natural metric (affine-invariant metric). By exploiting the induced Riemannian distance, we derive the probabilistic learning Riemannian space quantization algorithm, obtaining the learning rule through Riemannian gradient descent. Empirical investigations on synthetic data, image data , and motor imagery EEG data demonstrate the superior performance of the proposed method. △ Less

Submitted 1 February, 2021; originally announced February 2021.

Comments: 15 pages, 7 figures

arXiv:2010.09851 [pdf, other]

Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference

Authors: Disi Ji, Padhraic Smyth, Mark Steyvers

Abstract: We investigate the problem of reliably assessing group fairness when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores for unlabeled examples in each gro… ▽ More We investigate the problem of reliably assessing group fairness when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores for unlabeled examples in each group using a hierarchical latent variable model conditioned on labeled examples. This in turn allows for inference of posterior distributions with associated notions of uncertainty for a variety of group fairness metrics. We demonstrate that our approach leads to significant and consistent reductions in estimation error across multiple well-known fairness datasets, sensitive attributes, and predictive models. The results show the benefits of using both unlabeled data and Bayesian inference in terms of assessing whether a prediction model is fair or not. △ Less

Submitted 19 October, 2020; originally announced October 2020.

Comments: 27 pages

arXiv:2002.06532 [pdf, other]

Active Bayesian Assessment for Black-Box Classifiers

Authors: Disi Ji, Robert L. Logan IV, Padhraic Smyth, Mark Steyvers

Abstract: Recent advances in machine learning have led to increased deployment of black-box classifiers across a wide variety of applications. In many such situations there is a critical need to both reliably assess the performance of these pre-trained models and to perform this assessment in a label-efficient manner (given that labels may be scarce and costly to collect). In this paper, we introduce an act… ▽ More Recent advances in machine learning have led to increased deployment of black-box classifiers across a wide variety of applications. In many such situations there is a critical need to both reliably assess the performance of these pre-trained models and to perform this assessment in a label-efficient manner (given that labels may be scarce and costly to collect). In this paper, we introduce an active Bayesian approach for assessment of classifier performance to satisfy the desiderata of both reliability and label-efficiency. We begin by developing inference strategies to quantify uncertainty for common assessment metrics such as accuracy, misclassification cost, and calibration error. We then propose a general framework for active Bayesian assessment using inferred uncertainty to guide efficient selection of instances for labeling, enabling better performance assessment with fewer labels. We demonstrate significant gains from our proposed active Bayesian approach via a series of systematic empirical experiments assessing the performance of modern neural classifiers (e.g., ResNet and BERT) on several standard image and text classification datasets. △ Less

Submitted 15 March, 2021; v1 submitted 16 February, 2020; originally announced February 2020.

arXiv:1905.09993 [pdf, other]

Inference of Dynamic Graph Changes for Functional Connectome

Authors: Dingjue Ji, Junwei Lu, Yiliang Zhang, Hongyu Zhao, Siyuan Gao

Abstract: Dynamic functional connectivity is an effective measure for the brain's responses to continuous stimuli. We propose an inferential method to detect the dynamic changes of brain networks based on time-varying graphical models. Whereas most existing methods focus on testing the existence of change points, the dynamics in the brain network offer more signals in many neuroscience studies. We propose a… ▽ More Dynamic functional connectivity is an effective measure for the brain's responses to continuous stimuli. We propose an inferential method to detect the dynamic changes of brain networks based on time-varying graphical models. Whereas most existing methods focus on testing the existence of change points, the dynamics in the brain network offer more signals in many neuroscience studies. We propose a novel method to conduct hypothesis testing on changes in dynamic brain networks. We introduce a bootstrap statistic to approximate the supreme of the high-dimensional empirical processes over dynamically changing edges. Our simulations show that this framework can capture the change points with changed connectivity. Finally, we apply our method to a brain imaging dataset under a natural audio-video stimulus and illustrate that we are able to detect temporal changes in brain networks. The functions of the identified regions are consistent with specific emotional annotations, which are closely associated with changes inferred by our method. △ Less

Submitted 19 June, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

Journal ref: International Conference on Artificial Intelligence and Statistics, 26-28 August 2020, Online, PMLR 108:3230-3240

arXiv:1711.07673 [pdf, other]

Mondrian Processes for Flow Cytometry Analysis

Authors: Disi Ji, Eric Nalisnick, Padhraic Smyth

Abstract: Analysis of flow cytometry data is an essential tool for clinical diagnosis of hematological and immunological conditions. Current clinical workflows rely on a manual process called gating to classify cells into their canonical types. This dependence on human annotation limits the rate, reproducibility, and complexity of flow cytometry analysis. In this paper, we propose using Mondrian processes t… ▽ More Analysis of flow cytometry data is an essential tool for clinical diagnosis of hematological and immunological conditions. Current clinical workflows rely on a manual process called gating to classify cells into their canonical types. This dependence on human annotation limits the rate, reproducibility, and complexity of flow cytometry analysis. In this paper, we propose using Mondrian processes to perform automated gating by incorporating prior information of the kind used by gating technicians. The method segments cells into types via Bayesian nonparametric trees. Examining the posterior over trees allows for interpretable visualizations and uncertainty quantification - two vital qualities for implementation in clinical practice. △ Less

Submitted 28 November, 2017; v1 submitted 21 November, 2017; originally announced November 2017.

Comments: 7 pages, 4 figures, NIPS workshop ML4H: Machine Learning for Health 2017, Long Beach, CA, USA

Showing 1–6 of 6 results for author: Ji, D