Search | arXiv e-print repository

arXiv:1910.10937

Online Boosting for Multilabel Ranking with Top-k Feedback

Authors: Vinod Raman, Daniel T. Zhang, Young Hun Jung, Ambuj Tewari

Abstract: We present online boosting algorithms for multilabel ranking with top-k feedback, where the learner only receives information about the top k items from the ranking it provides. We propose a novel surrogate loss function and unbiased estimator, allowing weak learners to update themselves with limited information. Using these techniques we adapt full information multilabel ranking algorithms (Jung and Tewari, 2018) to the top-k feedback setting and provide theoretical performance bounds which closely match the bounds of their full information counterparts, with the cost of increased sample complexity. These theoretical results are further substantiated by our experiments, which show a small gap in performance between the algorithms for the top-k feedback setting and that for the full information setting across various datasets.

Submitted 19 October, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

Comments: Under review for AISTATS 2021. Fixed small errors throughout the manuscript and added new content comparing/contrasting various randomization procedures

arXiv:1810.05290

Online Multiclass Boosting with Bandit Feedback

Authors: Daniel T. Zhang, Young Hun Jung, Ambuj Tewari

Abstract: We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction. We propose an unbiased estimate of the loss using a randomized prediction, allowing the model to update its weak learners with limited information. Using the unbiased estimate, we extend two full information boosting algorithms (Jung et al., 2017) to the bandit setting. We prove that the asymptotic error bounds of the bandit algorithms exactly match their full information counterparts. The cost of restricted feedback is reflected in the larger sample complexity. Experimental results also support our theoretical findings, and performance of the proposed models is comparable to that of an existing bandit boosting algorithm, which is limited to use binary weak learners.

Submitted 25 February, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

Comments: Accepted in AISTATS 2019

arXiv:1707.01591

doi 10.1145/3097983.3098078

A Data Science Approach to Understanding Residential Water Contamination in Flint

Authors: Alex Chojnacki, Chengyu Dai, Arya Farahi, Guangsha Shi, Jared Webb, Daniel T. Zhang, Jacob Abernethy, Eric Schwartz

Abstract: When the residents of Flint learned that lead had contaminated their water system, the local government made water-testing kits available to them free of charge. The city government published the results of these tests, creating a valuable dataset that is key to understanding the causes and extent of the lead contamination event in Flint. This is the nation's largest dataset on lead in a municipal water system. In this paper, we predict the lead contamination for each household's water supply, and we study several related aspects of Flint's water troubles, many of which generalize well beyond this one city. For example, we show that elevated lead risks can be (weakly) predicted from observable home attributes. Then we explore the factors associated with elevated lead. These risk assessments were developed in part via a crowd sourced prediction challenge at the University of Michigan. To inform Flint residents of these assessments, they have been incorporated into a web and mobile application funded by Google.org. We also explore questions of self-selection in the residential testing program, examining which factors are linked to when and how frequently residents voluntarily sample their water.

Submitted 5 July, 2017; originally announced July 2017.

Comments: Applied Data Science track paper at KDD 2017. For associated promotional video, see https://www.youtube.com/watch?v=0g66ImaV8Ag

Showing 1–3 of 3 results for author: Zhang, D T