-
Towards Aligned Data Removal via Twin Machine Unlearning
Authors:
Haoxuan Ji,
Zheng Lin,
Yuyao Sun,
Gao Fei,
Yuhang Wang,
Haichang Gao,
Zhenxing Niu
Abstract:
Modern privacy regulations have spurred the evolution of machine unlearning, a technique that enables the removal of data from an already trained ML model without requiring retraining from scratch. Previous unlearning methods tend to induce the model to achieve the lowest classification accuracy on the removal data. Nonetheless, the authentic objective of machine unlearning is to align the unlearned model with the gold model, i.e., achieving the same classification accuracy as the gold model. For this purpose, we present a Twin Machine Unlearning (TMU) approach, where a twin unlearning problem is defined corresponding to the original unlearning problem. As a result, the generalization-label predictor trained on the twin problem can be transferred to the original problem, facilitating aligned data removal. Comprehensive empirical experiments illustrate that our approach significantly enhances the alignment between the unlearned model and the gold model. Meanwhile, our method allows data removal without compromising the model accuracy.
Submitted 2 May, 2025; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Graph Feedback Bandits with Similar Arms
Authors:
Han Qi,
Guo Fei,
Li Zhu
Abstract:
In this paper, we study the stochastic multi-armed bandit problem with graph feedback. Motivated by clinical trials and recommendation problems, we assume that two arms are connected if and only if they are similar (i.e., their means are close enough). We establish a regret lower bound for this novel feedback structure and introduce two UCB-based algorithms: D-UCB with problem-independent regret upper bounds and C-UCB with problem-dependent upper bounds. Leveraging the similarity structure, we also consider the scenario where the number of arms increases over time. Practical applications related to this scenario include Q&A platforms (Reddit, Stack Overflow, Quora) and product reviews on Amazon and Flipkart. Answers (product reviews) continually appear on the website, and the goal is to display the best answers (product reviews) at the top. When the means of arms are independently generated from some distribution, we provide regret upper bounds for both algorithms and discuss the sub-linearity of bounds in relation to the distribution of means. Finally, we conduct experiments to validate the theoretical results.
Submitted 18 May, 2024;
originally announced May 2024.
-
ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentation
Authors:
Aimee Guo,
Grace Fei,
Hemanth Pasupuleti,
Jing Wang
Abstract:
The newly released Segment Anything Model (SAM) is a popular tool used in image processing due to its superior segmentation accuracy, variety of input prompts, training capabilities, and efficient model design. However, its current model is trained on a diverse dataset not tailored to medical images, particularly ultrasound images. Ultrasound images tend to have a lot of noise, making it difficult to segment out important structures. In this project, we developed ClickSAM, which fine-tunes the Segment Anything Model using click prompts for ultrasound images. ClickSAM has two stages of training: the first stage is trained on single-click prompts centered in the ground-truth contours, and the second stage focuses on improving the model performance through additional positive and negative click prompts. By comparing the first stage predictions to the ground-truth masks, true positive, false positive, and false negative segments are calculated. Positive clicks are generated using the true positive and false negative segments, and negative clicks are generated using the false positive segments. The Centroidal Voronoi Tessellation algorithm is then employed to collect positive and negative click prompts in each segment that are used to enhance the model performance during the second stage of training. With click-train methods, ClickSAM exhibits superior performance compared to other existing models for ultrasound image segmentation.
Submitted 24 February, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Physics-informed Deep Diffusion MRI Reconstruction with Synthetic Data: Break Training Data Bottleneck in Artificial Intelligence
Authors:
Chen Qian,
Haoyu Zhang,
Yuncheng Gao,
Mingyang Han,
Zi Wang,
Dan Ruan,
Yu Shen,
Yaping Wu,
Yirong Zhou,
Chengyan Wang,
Boyu Jiang,
Ran Tao,
Zhigang Wu,
Jiazheng Wang,
Liuhong Zhu,
Yi Guo,
Taishan Kang,
Jianzhong Lin,
Tao Gong,
Chen Yang,
Guoqiang Fei,
Meijin Lin,
Di Guo,
Jianjun Zhou,
Meiyun Wang,
et al. (1 additional author not shown)
Abstract:
Diffusion magnetic resonance imaging (MRI) is the only imaging modality for non-invasive movement detection of in vivo water molecules, with significant clinical and research applications. Diffusion weighted imaging (DWI) MRI acquired by multi-shot techniques can achieve higher resolution, better signal-to-noise ratio, and lower geometric distortion than single-shot, but suffers from inter-shot motion-induced artifacts. These artifacts cannot be removed prospectively, leading to the absence of artifact-free training labels. Thus, the potential of deep learning in multi-shot DWI reconstruction remains largely untapped. To break the training data bottleneck, here, we propose a Physics-Informed Deep DWI reconstruction method (PIDD) to synthesize high-quality paired training data by leveraging the physical diffusion model (magnitude synthesis) and inter-shot motion-induced phase model (motion phase synthesis). The network is trained only once with 100,000 synthetic samples, achieving encouraging results on multiple realistic in vivo data reconstructions. Advantages over conventional methods include: (a) Better motion artifact suppression and reconstruction stability; (b) Outstanding generalization to multi-scenario reconstructions, including multi-resolution, multi-b-value, multi-under-sampling, multi-vendor, and multi-center; (c) Excellent clinical adaptability to patients with verifications by seven experienced doctors (p<0.001). In conclusion, PIDD presents a novel deep learning framework by exploiting the power of MRI physics, providing a cost-effective and explainable way to break the data bottleneck in deep learning medical imaging.
Submitted 3 May, 2025; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Detecting Changed-Hands Online Review Accounts
Authors:
Geli Fei,
Shuai Wang,
Bing Liu,
Leman Akoglu
Abstract:
A reputable social media or review account can be a good cover for spamming activities. It has become prevalent that spammers buy/sell such accounts openly on the Web. We call these sold/bought accounts the changed-hands (CH) accounts. They are hard to detect by existing spam detection algorithms as their spamming activities are under the disguise of clean histories. In this paper, we first propose the problem of detecting CH accounts, and then design an effective detection algorithm which exploits changes in content and writing styles of individual accounts, and a proposed novel feature selection method that works at a fine-grained level within each individual account. The proposed method not only determines if an account has changed hands, but also pinpoints the change point. Experimental results with online review accounts demonstrate the high effectiveness of our approach.
Submitted 25 June, 2021;
originally announced June 2021.
-
Contextual and Position-Aware Factorization Machines for Sentiment Classification
Authors:
Shuai Wang,
Mianwei Zhou,
Geli Fei,
Yi Chang,
Bing Liu
Abstract:
While existing machine learning models have achieved great success for sentiment classification, they typically do not explicitly capture sentiment-oriented word interaction, which can lead to poor results for fine-grained analysis at the snippet level (a phrase or sentence). Factorization Machines provide a possible approach to learning element-wise interaction for recommender systems, but they are not directly applicable to our task due to their inability to model contexts and word sequences. In this work, we develop two Position-aware Factorization Machines which consider word interaction, context and position information. Such information is jointly encoded in a set of sentiment-oriented word interaction (SWI) vectors. Compared to traditional word embeddings, SWI vectors explicitly capture sentiment-oriented word interaction and simplify the parameter learning. Experimental results show that while they have comparable performance with state-of-the-art methods for document-level classification, they benefit the snippet/sentence-level sentiment analysis.
Submitted 18 January, 2018;
originally announced January 2018.
-
Computational Models for Attitude and Actions Prediction
Authors:
Jalal Mahmud,
Geli Fei,
Anbang Xu,
Aditya Pal,
Michelle Zhou
Abstract:
In this paper, we present computational models to predict Twitter users' attitude towards a specific brand through their personal and social characteristics. We also predict their likelihood to take different actions based on their attitudes. In order to operationalize our research on users' attitude and actions, we collected ground-truth data through surveys of Twitter users. We have conducted experiments using two real-world datasets to validate the effectiveness of our attitude and action prediction framework. Finally, we show how our models can be integrated with a visual analytics system for customer intervention.
Submitted 16 April, 2017;
originally announced April 2017.
-
Modeling Review Spam Using Temporal Patterns and Co-bursting Behaviors
Authors:
Huayi Li,
Geli Fei,
Shuai Wang,
Bing Liu,
Weixiang Shao,
Arjun Mukherjee,
Jidong Shao
Abstract:
Online reviews play a crucial role in helping consumers evaluate and compare products and services. However, review hosting sites are often targeted by opinion spamming. In recent years, many such sites have put a great deal of effort into building effective review filtering systems to detect fake reviews and to block malicious accounts. Thus, fraudsters or spammers now turn to compromising, purchasing, or even raising reputable accounts to write fake reviews. Based on the analysis of a real-life dataset from a review hosting site (dianping.com), we discovered that reviewers' posting rates are bimodal and the transitions between different states can be utilized to differentiate spammers from genuine reviewers. Inspired by these findings, we propose a two-mode Labeled Hidden Markov Model to detect spammers. Experimental results show that our model significantly outperforms supervised learning using linguistic and behavioral features in identifying spammers. Furthermore, we found that when a product has a burst of reviews, many spammers are likely to be actively involved in writing reviews for the product as well as for many other products. We then propose a novel co-bursting network for detecting spammer groups. The co-bursting network enables us to produce more accurate spammer groups than the current state-of-the-art reviewer-product (co-reviewing) network.
Submitted 20 November, 2016;
originally announced November 2016.
-
A Novel Methodology of Router-To-AS Mapping inspired by Community Discovery
Authors:
Weiyi Liu,
Qing Jiang,
Gaolei Fei,
Mingkai Yuan,
Guangmin Hu
Abstract:
In the last decade, much work has been done on the Internet topology at the router or autonomous system (AS) level. Routers are the essential components of ASes, while ASes dominate the behavior of their routers, so identifying the affiliation between routers and ASes undoubtedly gives us a deeper understanding of the topology. However, existing methods assign a router to an AS based only on the origin ASes of its IP addresses, which does not make full use of the information at hand. In this paper, we propose a methodology to assign routers to their owner ASes based on community discovery techniques. First, we use the origin AS information along with router-pair similarities to construct a weighted router-level topology. Second, for the enormous topology data (more than 2M nodes and 19M edges) from the CAIDA ITDK project, we propose a fast hierarchical clustering algorithm, whose time and space complexity are both linear, to perform AS community discovery. Last, we perform router-to-AS mapping based on these AS communities. Experiments show that, combined with the AS communities our methodology discovers, the best accuracy of router-to-AS mapping reaches 82.62%, which is drastically higher than that of prior works, which stagnate at 65.44%.
Submitted 12 December, 2015;
originally announced December 2015.