-
Personal Data Protection in Smart Home Activity Monitoring for Digital Health: A Case Study
Authors:
Claudio Bettini,
Azin Moradbeikie,
Gabriele Civitarese
Abstract:
Researchers in pervasive computing have worked for decades on sensor-based human activity recognition (HAR). Among the digital health applications, the recognition of activities of daily living (ADL) in smart home environments enables the identification of behavioral changes that clinicians consider as a digital bio-marker of early stages of cognitive decline. The real deployment of sensor-based HAR systems in the homes of elderly subjects poses several challenges, with privacy and ethical concerns being major ones. This paper reports our experience applying privacy by design principles to develop and deploy one of these systems.
Submitted 27 March, 2025;
originally announced April 2025.
-
The SERENADE project: Sensor-Based Explainable Detection of Cognitive Decline
Authors:
Gabriele Civitarese,
Michele Fiori,
Andrea Arighi,
Daniela Galimberti,
Graziana Florio,
Claudio Bettini
Abstract:
Mild Cognitive Impairment (MCI) affects 12-18% of individuals over 60. MCI patients exhibit cognitive dysfunctions without significant daily functional loss. While MCI may progress to dementia, predicting this transition remains a clinical challenge due to limited and unreliable indicators. Behavioral changes, such as changes in the execution of Activities of Daily Living (ADLs), can signal such progression. Sensorized smart homes and wearable devices offer an innovative solution for continuous, non-intrusive monitoring of ADLs in MCI patients. However, current machine learning models for detecting behavioral changes lack transparency, hindering clinicians' trust. This paper introduces the SERENADE project, a European Union-funded initiative that aims to detect and explain behavioral changes associated with cognitive decline using explainable AI methods. SERENADE aims to collect one year of data from 30 MCI patients living alone, leveraging AI to support clinical decision-making and offering a new approach to early dementia detection.
Submitted 11 April, 2025;
originally announced April 2025.
-
Leveraging Large Language Models for Explainable Activity Recognition in Smart Homes: A Critical Evaluation
Authors:
Michele Fiori,
Gabriele Civitarese,
Priyankar Choudhary,
Claudio Bettini
Abstract:
Explainable Artificial Intelligence (XAI) aims to uncover the inner reasoning of machine learning models. In IoT systems, XAI improves the transparency of models processing sensor data from multiple heterogeneous devices, ensuring end-users understand and trust their outputs. Among the many applications, XAI has also been applied to sensor-based Activities of Daily Living (ADLs) recognition in smart homes. Existing approaches highlight which sensor events are most important for each predicted activity, using simple rules to convert these events into natural language explanations for non-expert users. However, these methods produce rigid explanations lacking natural language flexibility and are not scalable. With the recent rise of Large Language Models (LLMs), it is worth exploring whether they can enhance explanation generation, considering their proven knowledge of human activities. This paper investigates potential approaches to combine XAI and LLMs for sensor-based ADL recognition. We evaluate if LLMs can be used: a) as explainable zero-shot ADL recognition models, avoiding costly labeled data collection, and b) to automate the generation of explanations for existing data-driven XAI approaches when training data is available and the goal is higher recognition rates. Our critical evaluation provides insights into the benefits and challenges of using LLMs for explainable ADL recognition.
Submitted 20 March, 2025;
originally announced March 2025.
-
GNN-XAR: A Graph Neural Network for Explainable Activity Recognition in Smart Homes
Authors:
Michele Fiori,
Davide Mor,
Gabriele Civitarese,
Claudio Bettini
Abstract:
Sensor-based Human Activity Recognition (HAR) in smart home environments is crucial for several applications, especially in the healthcare domain. The majority of the existing approaches leverage deep learning models. While these approaches are effective, the rationale behind their outputs is opaque. Recently, eXplainable Artificial Intelligence (XAI) approaches emerged to provide intuitive explanations to the output of HAR models. To the best of our knowledge, these approaches leverage classic deep models like CNNs or RNNs. Recently, Graph Neural Networks (GNNs) proved to be effective for sensor-based HAR. However, existing approaches are not designed with explainability in mind. In this work, we propose the first explainable Graph Neural Network explicitly designed for smart home HAR. Our results on two public datasets show that this approach provides better explanations than state-of-the-art methods while also slightly improving the recognition rate.
Submitted 25 February, 2025;
originally announced February 2025.
-
Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition
Authors:
Michele Fiori,
Gabriele Civitarese,
Claudio Bettini
Abstract:
Recognizing daily activities with unobtrusive sensors in smart environments enables various healthcare applications. Monitoring how subjects perform activities at home and their changes over time can reveal early symptoms of health issues, such as cognitive decline. Most approaches in this field use deep learning models, which are often seen as black boxes mapping sensor data to activities. However, non-expert users like clinicians need to trust and understand these models' outputs. Thus, eXplainable AI (XAI) methods for Human Activity Recognition have emerged to provide intuitive natural language explanations from these models. Different XAI methods generate different explanations, and their effectiveness is typically evaluated through user surveys, which are often challenging in terms of cost and fairness. This paper proposes an automatic evaluation method using Large Language Models (LLMs) to identify, in a pool of candidates, the best XAI approach for non-expert users. Our preliminary results suggest that LLM evaluation aligns with user surveys.
Submitted 24 July, 2024;
originally announced August 2024.
-
Large Language Models are Zero-Shot Recognizers for Activities of Daily Living
Authors:
Gabriele Civitarese,
Michele Fiori,
Priyankar Choudhary,
Claudio Bettini
Abstract:
The sensor-based recognition of Activities of Daily Living (ADLs) in smart home environments enables several applications in the areas of energy management, safety, well-being, and healthcare. ADLs recognition is typically based on deep learning methods requiring large datasets to be trained. Recently, several studies proved that Large Language Models (LLMs) effectively capture common-sense knowledge about human activities. However, the effectiveness of LLMs for ADLs recognition in smart home environments still deserves to be investigated. In this work, we propose ADL-LLM, a novel LLM-based ADLs recognition system. ADL-LLM transforms raw sensor data into textual representations, which are processed by an LLM to perform zero-shot ADLs recognition. Moreover, in the scenario where a small labeled dataset is available, ADL-LLM can also be empowered with few-shot prompting. We evaluated ADL-LLM on two public datasets, showing its effectiveness in this domain.
Submitted 20 March, 2025; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Comparing Self-Supervised Learning Techniques for Wearable Human Activity Recognition
Authors:
Sannara Ek,
Riccardo Presotto,
Gabriele Civitarese,
François Portet,
Philippe Lalanda,
Claudio Bettini
Abstract:
Human Activity Recognition (HAR) based on the sensors of mobile/wearable devices aims to detect the physical activities performed by humans in their daily lives. Although supervised learning methods are the most effective in this task, their effectiveness is constrained to using a large amount of labeled data during training. While collecting raw unlabeled data can be relatively easy, annotating data is challenging due to costs, intrusiveness, and time constraints.
To address these challenges, this paper explores alternative approaches for accurate HAR using a limited amount of labeled data. In particular, we have adapted recent Self-Supervised Learning (SSL) algorithms to the HAR domain and compared their effectiveness. We investigate three state-of-the-art SSL techniques of different families: contrastive, generative, and predictive. Additionally, we evaluate the impact of the underlying neural network on the recognition rate by comparing state-of-the-art CNN and transformer architectures.
Our results show that a Masked Auto Encoder (MAE) approach significantly outperforms other SSL approaches, including SimCLR, commonly considered one of the best-performing SSL methods in the HAR domain.
The code and the pre-trained SSL models are publicly available for further research and development.
Submitted 8 April, 2024;
originally announced April 2024.
-
ContextGPT: Infusing LLMs Knowledge into Neuro-Symbolic Activity Recognition Models
Authors:
Luca Arrotta,
Claudio Bettini,
Gabriele Civitarese,
Michele Fiori
Abstract:
Context-aware Human Activity Recognition (HAR) is a hot research area in mobile computing, and the most effective solutions in the literature are based on supervised deep learning models. However, the actual deployment of these systems is limited by the scarcity of labeled data that is required for training. Neuro-Symbolic AI (NeSy) provides an interesting research direction to mitigate this issue, by infusing common-sense knowledge about human activities and the contexts in which they can be performed into HAR deep learning classifiers. Existing NeSy methods for context-aware HAR rely on knowledge encoded in logic-based models (e.g., ontologies) whose design, implementation, and maintenance to capture new activities and contexts require significant human engineering efforts, technical knowledge, and domain expertise. Recent works show that pre-trained Large Language Models (LLMs) effectively encode common-sense knowledge about human activities. In this work, we propose ContextGPT: a novel prompt engineering approach to retrieve from LLMs common-sense knowledge about the relationship between human activities and the context in which they are performed. Unlike ontologies, ContextGPT requires limited human effort and expertise. An extensive evaluation carried out on two public datasets shows how a NeSy model obtained by infusing common-sense knowledge from ContextGPT is effective in data scarcity scenarios, leading to similar (and sometimes better) recognition rates than logic-based approaches with a fraction of the effort.
Submitted 20 March, 2025; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Combining Public Human Activity Recognition Datasets to Mitigate Labeled Data Scarcity
Authors:
Riccardo Presotto,
Sannara Ek,
Gabriele Civitarese,
François Portet,
Philippe Lalanda,
Claudio Bettini
Abstract:
The use of supervised learning for Human Activity Recognition (HAR) on mobile devices leads to strong classification performances. Such an approach, however, requires large amounts of labeled data, both for the initial training of the models and for their customization on specific clients (whose data often differ greatly from the training data). This is actually impractical to obtain due to the costs, intrusiveness, and time-consuming nature of data annotation. Moreover, even with the help of a significant amount of labeled data, model deployment on heterogeneous clients faces difficulties in generalizing well on unseen data. Other domains, like Computer Vision or Natural Language Processing, have proposed the notion of pre-trained models, leveraging large corpora, to reduce the need for annotated data and better manage heterogeneity. This promising approach has not been implemented in the HAR domain so far because of the lack of public datasets of sufficient size. In this paper, we propose a novel strategy to combine publicly available datasets with the goal of learning a generalized HAR model that can be fine-tuned using a limited amount of labeled data on an unseen target domain. Our experimental evaluation, which includes experimenting with different state-of-the-art neural network architectures, shows that combining public datasets can significantly reduce the number of labeled samples required to achieve satisfactory performance on an unseen target domain.
Submitted 23 June, 2023;
originally announced June 2023.
-
Neuro-Symbolic Approaches for Context-Aware Human Activity Recognition
Authors:
Luca Arrotta,
Gabriele Civitarese,
Claudio Bettini
Abstract:
Deep Learning models are a standard solution for sensor-based Human Activity Recognition (HAR), but their deployment is often limited by labeled data scarcity and models' opacity. Neuro-Symbolic AI (NeSy) provides an interesting research direction to mitigate these issues by infusing knowledge about context information into HAR deep learning classifiers. However, existing NeSy methods for context-aware HAR require computationally expensive symbolic reasoners during classification, making them less suitable for deployment on resource-constrained devices (e.g., mobile devices). Additionally, NeSy approaches for context-aware HAR have never been evaluated on in-the-wild datasets, and their generalization capabilities in real-world scenarios are questionable. In this work, we propose a novel approach based on a semantic loss function that infuses knowledge constraints in the HAR model during the training phase, avoiding symbolic reasoning during classification. Our results on scripted and in-the-wild datasets show the impact of different semantic loss functions in outperforming a purely data-driven model. We also compare our solution with existing NeSy methods and analyze each approach's strengths and weaknesses. Our semantic loss remains the only NeSy solution that can be deployed as a single DNN without the need for symbolic reasoning modules, reaching recognition rates close (and better in some cases) to existing approaches.
Submitted 8 June, 2023;
originally announced June 2023.
-
SelfAct: Personalized Activity Recognition based on Self-Supervised and Active Learning
Authors:
Luca Arrotta,
Gabriele Civitarese,
Samuele Valente,
Claudio Bettini
Abstract:
Supervised Deep Learning (DL) models are currently the leading approach for sensor-based Human Activity Recognition (HAR) on wearable and mobile devices. However, training them requires large amounts of labeled data whose collection is often time-consuming, expensive, and error-prone. At the same time, due to the intra- and inter-variability of activity execution, activity models should be personalized for each user. In this work, we propose SelfAct: a novel framework for HAR combining self-supervised and active learning to mitigate these problems. SelfAct leverages a large pool of unlabeled data collected from many users to pre-train a DL model through self-supervision, with the goal of learning a meaningful and efficient latent representation of sensor data. The resulting pre-trained model can be used locally by new users, who fine-tune it through a novel unsupervised active learning strategy. Our experiments on two publicly available HAR datasets demonstrate that SelfAct achieves results that are close to or even better than those of fully supervised approaches with a small number of active learning queries.
Submitted 19 April, 2023;
originally announced April 2023.
-
Ultrasound Detection of Subquadricipital Recess Distension
Authors:
Marco Colussi,
Gabriele Civitarese,
Dragan Ahmetovic,
Claudio Bettini,
Roberta Gualtierotti,
Flora Peyvandi,
Sergio Mascetti
Abstract:
Joint bleeding is a common condition for people with hemophilia and, if untreated, can result in hemophilic arthropathy. Ultrasound imaging has recently emerged as an effective tool to diagnose joint recess distension caused by joint bleeding. However, no computer-aided diagnosis tool exists to support the practitioner in the diagnosis process. This paper addresses the problem of automatically detecting the recess and assessing whether it is distended in knee ultrasound images collected in patients with hemophilia. After framing the problem, we propose two different approaches: the first one adopts a one-stage object detection algorithm, while the second one is a multi-task approach with a classification and a detection branch. The experimental evaluation, conducted with $483$ annotated images, shows that the solution based on object detection alone has a balanced accuracy score of $0.74$ with a mean IoU value of $0.66$, while the multi-task approach has a higher balanced accuracy value ($0.78$) at the cost of a slightly lower mean IoU value.
Submitted 22 November, 2022;
originally announced November 2022.
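For readers unfamiliar with the two metrics reported in the abstract above, the following is a minimal Python sketch of how balanced accuracy and intersection over union (IoU) are commonly computed. It is not the authors' evaluation code: the function names, the `(x_min, y_min, x_max, y_max)` box format, and the binary 0/1 label encoding are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive).

    Assumes both classes occur at least once in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    sensitivity = tp / positives
    specificity = tn / negatives
    return (sensitivity + specificity) / 2
```

Balanced accuracy is useful when the two classes (here, distended versus non-distended) are unevenly represented, because each class contributes equally to the score regardless of its size.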
-
Personalized Semi-Supervised Federated Learning for Human Activity Recognition
Authors:
Claudio Bettini,
Gabriele Civitarese,
Riccardo Presotto
Abstract:
The most effective data-driven methods for human activity recognition (HAR) are based on supervised learning applied to the continuous stream of sensor data. However, these methods perform well on restricted sets of activities in domains for which there is a fully labeled dataset. It is still a challenge to cope with the intra- and inter-variability of activity execution among different subjects in large-scale real-world deployments. Semi-supervised learning approaches for HAR have been proposed to address the challenge of acquiring the large amount of labeled data that is necessary in realistic settings. However, their centralised architecture incurs scalability and privacy problems when the process involves a large number of users. Federated Learning (FL) is a promising paradigm to address these problems. However, the FL methods that have been proposed for HAR assume that the participating users can always obtain labels to train their local models. In this work, we propose FedHAR: a novel hybrid method for HAR that combines semi-supervised and federated learning. Indeed, FedHAR combines active learning and label propagation to semi-automatically annotate the local streams of unlabeled sensor data, and it relies on FL to build a global activity model in a scalable and privacy-aware fashion. FedHAR also includes a transfer learning strategy to personalize the global model on each user. We evaluated our method on two public datasets, showing that FedHAR reaches recognition rates and personalization capabilities similar to state-of-the-art FL supervised approaches. As a major advantage, FedHAR only requires a very limited amount of annotated data to populate a pre-trained model and a small number of active learning questions that quickly decrease while using the system, leading to an effective and scalable solution for the data scarcity problem of HAR.
Submitted 19 April, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Context-driven Active and Incremental Activity Recognition
Authors:
Gabriele Civitarese,
Riccardo Presotto,
Claudio Bettini
Abstract:
Human activity recognition based on mobile device sensor data has been an active research area in mobile and pervasive computing for several years. While the majority of the proposed techniques are based on supervised learning, semi-supervised approaches are being considered to significantly reduce the size of the training set required to initialize the recognition model. These approaches usually apply self-training or active learning to incrementally refine the model, but their effectiveness seems to be limited to a restricted set of physical activities. We claim that the context which surrounds the user (e.g., semantic location, proximity to transportation routes, time of the day) combined with common knowledge about the relationship between this context and human activities could be effective in significantly increasing the set of recognized activities including those that are difficult to discriminate only considering inertial sensors, and the ones that are highly context-dependent. In this paper, we propose CAVIAR, a novel hybrid semi-supervised and knowledge-based system for real-time activity recognition. Our method applies semantic reasoning to context data to refine the prediction of a semi-supervised classifier. The context-refined predictions are used as new labeled samples to update the classifier combining self-training and active learning techniques. Results on a real dataset obtained from 26 subjects show the effectiveness of the context-aware approach both on the recognition rates and on the number of queries to the subjects generated by the active learning module. In order to evaluate the impact of context reasoning, we also compare CAVIAR with a purely statistical version, considering features computed on context data as part of the machine learning process.
Submitted 7 June, 2019;
originally announced June 2019.
-
Extended Report: Fine-grained Recognition of Abnormal Behaviors for Early Detection of Mild Cognitive Impairment
Authors:
Daniele Riboni,
Claudio Bettini,
Gabriele Civitarese,
Zaffar Haider Janjua,
Rim Helaoui
Abstract:
According to the World Health Organization, the rate of people aged 60 or more is growing faster than any other age group in almost every country, and this trend is not going to change in the near future. Since senior citizens are at high risk of non-communicable diseases requiring long-term care, this trend will challenge the sustainability of the entire health system. Pervasive computing can provide innovative methods and tools for the early detection of the onset of health issues. In this paper, we propose a novel method to detect abnormal behaviors of elderly people living at home. The method relies on medical models, provided by cognitive neuroscience researchers, describing abnormal activity routines that may indicate the onset of early symptoms of mild cognitive impairment. A non-intrusive sensor-based infrastructure acquires low-level data about the interaction of the individual with home appliances and furniture, as well as data from environmental sensors. Based on those data, a novel hybrid statistical-symbolic technique is used to detect the abnormal behaviors of the patient, which are communicated to the medical center. Unlike related work, our method can detect abnormal behaviors at a fine-grained level, thus providing an important tool to support the medical diagnosis. In order to evaluate our method, we have developed a prototype of the system and acquired a large dataset of abnormal behaviors carried out in an instrumented smart home. Experimental results show that our technique is able to detect most anomalies while generating a small number of false positives.
Submitted 22 January, 2015;
originally announced January 2015.
-
Supporting Temporal Reasoning by Mapping Calendar Expressions to Minimal Periodic Sets
Authors:
C. Bettini,
S. Mascetti,
X. S. Wang
Abstract:
In recent years, several research efforts have focused on the concept of time granularity and its applications. A first stream of research investigated the mathematical models behind the notion of granularity and the algorithms to manage temporal data based on those models. A second stream of research investigated symbolic formalisms providing a set of algebraic operators to define granularities in a compact and compositional way. However, only very limited manipulation algorithms have been proposed to operate directly on the algebraic representation, which makes the symbolic formalisms unsuitable for applications that need to manipulate granularities.
This paper aims at filling the gap between the results from these two streams of research, by providing an efficient conversion from the algebraic representation to the equivalent low-level representation based on the mathematical models. In addition, the conversion returns a minimal representation in terms of period length. Our results have a major practical impact: users can more easily define arbitrary granularities in terms of algebraic operators, and then access granularity reasoning and other services operating efficiently on the equivalent, minimal low-level representation. As an example, we illustrate the application to temporal constraint reasoning with multiple granularities.
From a technical point of view, we propose a hybrid algorithm that interleaves the conversion of calendar subexpressions into periodical sets with the minimization of the period length. The algorithm returns set-based granularity representations having minimal period length, which is the most relevant parameter for the performance of the considered reasoning services. Extensive experimental work supports the techniques used in the algorithm, and shows the efficiency and effectiveness of the algorithm.
Submitted 10 October, 2011;
originally announced October 2011.
-
Preserving Privacy in Sequential Data Release against Background Knowledge Attacks
Authors:
Daniele Riboni,
Linda Pareschi,
Claudio Bettini
Abstract:
A large amount of transaction data containing associations between individuals and sensitive information flows every day into data stores. Examples include web queries, credit card transactions, medical exam records, and transit database records. The serial release of these data to partner institutions or data analysis centers is a common situation. In this paper we show that, in most domains, correlations among sensitive values associated with the same individuals in different releases can be easily mined, and used to violate users' privacy by adversaries observing multiple data releases. We provide a formal model for privacy attacks based on this sequential background knowledge, as well as on background knowledge on the probability distribution of sensitive values over different individuals. We show how sequential background knowledge can actually be obtained by an adversary, and used to identify with high confidence the sensitive values associated with an individual. A defense algorithm based on Jensen-Shannon divergence is proposed, and extensive experiments show the superiority of the proposed technique with respect to other applicable solutions. To the best of our knowledge, this is the first work that systematically investigates the role of sequential background knowledge in serial release of transaction data.
Submitted 5 October, 2010;
originally announced October 2010.
-
Privacy in geo-social networks: proximity notification with untrusted service providers and curious buddies
Authors:
Sergio Mascetti,
Dario Freni,
Claudio Bettini,
X. Sean Wang,
Sushil Jajodia
Abstract:
A major feature of the emerging geo-social networks is the ability to notify a user when one of his friends (also called buddies) happens to be geographically in proximity with the user. This proximity service is usually offered by the network itself or by a third-party service provider (SP) using location data acquired from the users. This paper provides a rigorous theoretical and experimental analysis of the existing solutions for the location privacy problem in proximity services. This is a serious problem for users who do not trust the SP to handle their location data, and would only like to release their location information in a generalized form to participating buddies. The paper presents two new protocols providing complete privacy with respect to the SP, and controllable privacy with respect to the buddies. The analytical and experimental analysis of the protocols takes into account privacy, service precision, and computation and communication costs, showing the superiority of the new protocols compared to those that have appeared in the literature to date. The proposed protocols have also been tested in a full system implementation of the proximity service.
Submitted 6 November, 2010; v1 submitted 2 July, 2010;
originally announced July 2010.
-
The Role of Quasi-identifiers in k-Anonymity Revisited
Authors:
Claudio Bettini,
X. Sean Wang,
Sushil Jajodia
Abstract:
The concept of k-anonymity, used in the recent literature to formally evaluate the privacy preservation of published tables, was introduced based on the notion of quasi-identifiers (or QI for short). The process of obtaining k-anonymity for a given private table is first to recognize the QIs in the table, and then to anonymize the QI values, the latter being called k-anonymization. While k-anonymization is usually rigorously validated by the authors, the definition of QI remains mostly informal, and different authors seem to have different interpretations of the concept of QI. The purpose of this paper is to provide a formal underpinning of QI and examine the correctness and incorrectness of various interpretations of QI in our formal framework. We observe that in cases where the concept has been used correctly, its application has been conservative; this note provides a formal understanding of the conservative nature in such cases.
Submitted 8 November, 2006;
originally announced November 2006.