Showing 1–50 of 56 results for author: Yoshikawa, M

  1. arXiv:2502.00494  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    Data Overvaluation Attack and Truthful Data Valuation in Federated Learning

    Authors: Shuyuan Zheng, Sudong Cai, Chuan Xiao, Yang Cao, Jianbin Qin, Masatoshi Yoshikawa, Makoto Onizuka

    Abstract: In collaborative machine learning (CML), data valuation, i.e., evaluating the contribution of each client's data to the machine learning model, has become a critical task for incentivizing and selecting positive data contributions. However, existing studies often assume that clients engage in data valuation truthfully, overlooking the practical motivation for clients to exaggerate their contributi…

    Submitted 24 May, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

  2. arXiv:2410.16121  [pdf, other]

    cs.LG cs.CR

    Extracting Spatiotemporal Data from Gradients with Large Language Models

    Authors: Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    Abstract: Recent works show that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains, such as spatiotemporal data. To understand privacy risks in spatiotemporal federated learning, we first propose Spatiotemporal Gradient Inversio…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2407.08529

  3. arXiv:2408.10231  [pdf, ps, other]

    cs.RO

    Achieving Faster and More Accurate Operation of Deep Predictive Learning

    Authors: Masaki Yoshikawa, Hiroshi Ito, Tetsuya Ogata

    Abstract: Achieving both high speed and precision in robot operations is a significant challenge for social implementation. While factory robots excel at predefined tasks, they struggle with environment-specific actions like cleaning and cooking. Deep learning research aims to address this by enabling robots to autonomously execute behaviors through end-to-end learning with sensor data. RT-1 and ACT are not…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 2 pages, 2 figures

  4. arXiv:2408.02928  [pdf, other]

    cs.DB

    PGB: Benchmarking Differentially Private Synthetic Graph Generation Algorithms

    Authors: Shang Liu, Hao Du, Yang Cao, Bo Yan, Jinfei Liu, Masatoshi Yoshikawa

    Abstract: Differentially private graph analysis is a powerful tool for deriving insights from diverse graph data while protecting individual information. Designing private analytic algorithms for different graph queries often requires starting from scratch. In contrast, differentially private synthetic graph generation offers a general paradigm that supports one-time generation for multiple queries. Althoug…

    Submitted 9 December, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: 18 pages, accepted by ICDE 2025

  5. arXiv:2407.08529  [pdf, other]

    cs.CR

    Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks

    Authors: Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    Abstract: Spatiotemporal federated learning has recently raised intensive studies due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has not been any systematic study of the gradient inversion a…

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by DASFAA 2024, 16 pages

  6. arXiv:2405.20576  [pdf, other]

    cs.CR

    Federated Graph Analytics with Differential Privacy

    Authors: Shang Liu, Yang Cao, Takao Murakami, Weiran Liu, Seng Pei Liew, Tsubasa Takahashi, Jinfei Liu, Masatoshi Yoshikawa

    Abstract: Collaborative graph analysis across multiple institutions is becoming increasingly popular. Realistic examples include social network analysis across various social platforms, financial transaction analysis across multiple banks, and analyzing the transmission of infectious diseases across multiple hospitals. We define the federated graph analytics, a new problem for collaborative graph analytics…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 13 pages

  7. arXiv:2405.08043  [pdf, other]

    cs.CR cs.LG

    HRNet: Differentially Private Hierarchical and Multi-Resolution Network for Human Mobility Data Synthesization

    Authors: Shun Takagi, Li Xiong, Fumiyuki Kato, Yang Cao, Masatoshi Yoshikawa

    Abstract: Human mobility data offers valuable insights for many applications such as urban planning and pandemic response, but its use also raises privacy concerns. In this paper, we introduce the Hierarchical and Multi-Resolution Network (HRNet), a novel deep generative model specifically designed to synthesize realistic human mobility data while guaranteeing differential privacy. We first identify the key…

    Submitted 19 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  8. arXiv:2312.12938  [pdf, other]

    cs.CR cs.DB

    CARGO: Crypto-Assisted Differentially Private Triangle Counting without Trusted Servers

    Authors: Shang Liu, Yang Cao, Takao Murakami, Jinfei Liu, Masatoshi Yoshikawa

    Abstract: Differentially private triangle counting in graphs is essential for analyzing connection patterns and calculating clustering coefficients while protecting sensitive individual information. Previous works have relied on either central or local models to enforce differential privacy. However, a significant utility gap exists between the central and local models of differentially private triangle cou…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted by ICDE 2024

  9. arXiv:2308.12210  [pdf, other]

    cs.LG cs.CR

    ULDP-FL: Federated Learning with Across Silo User-Level Differential Privacy

    Authors: Fumiyuki Kato, Li Xiong, Shun Takagi, Yang Cao, Masatoshi Yoshikawa

    Abstract: Differentially Private Federated Learning (DP-FL) has garnered attention as a collaborative machine learning approach that ensures formal privacy. Most DP-FL approaches ensure DP at the record-level within each silo for cross-silo FL. However, a single user's data may extend across multiple silos, and the desired user-level DP guarantee for such a setting remains unknown. In this study, we present…

    Submitted 16 June, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: This is the full version of the paper accepted to VLDB 2024

  10. arXiv:2306.13293  [pdf, other]

    cs.DB cs.CR

    Differentially Private Streaming Data Release under Temporal Correlations via Post-processing

    Authors: Xuyang Cao, Yang Cao, Primal Pappachan, Atsuyoshi Nakamura, Masatoshi Yoshikawa

    Abstract: The release of differentially private streaming data has been extensively studied, yet striking a good balance between privacy and utility on temporally correlated data in the stream remains an open problem. Existing works focus on enhancing privacy when applying differential privacy to correlated data, highlighting that differential privacy may suffer from additional privacy leakage under correla…

    Submitted 25 June, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

  11. arXiv:2306.10656  [pdf, other]

    cs.LG cs.AI stat.ML

    Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

    Authors: Kenta Oono, Nontawat Charoenphakdee, Kotatsu Bito, Zhengyan Gao, Hideyoshi Igata, Masashi Yoshikawa, Yoshiaki Ota, Hiroki Okui, Kei Akita, Shoichiro Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, Kunihiko Miyoshi, Yuki Saito, Koki Tsuda, Hiroshi Maruyama, Kohei Hayashi

    Abstract: Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental well-being. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose the Virtual Human Generative Model (VHGM), a novel deep generative model capable of estimating over 2…

    Submitted 29 January, 2025; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: 19 pages, 4 figures

  12. arXiv:2302.08148  [pdf, other]

    cs.AI cs.CL

    Empirical Investigation of Neural Symbolic Reasoning Strategies

    Authors: Yoichi Aoki, Keito Kudo, Tatsuki Kuribayashi, Ana Brassard, Masashi Yoshikawa, Keisuke Sakaguchi, Kentaro Inui

    Abstract: Neural reasoning accuracy improves when generating intermediate reasoning steps. However, the source of this improvement is yet unclear. Here, we investigate and factorize the benefit of generating intermediate steps for symbolic reasoning. Specifically, we decompose the reasoning strategy w.r.t. step granularity and chaining strategy. With a purely symbolic numerical reasoning dataset (e.g., A=1,…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: This paper is accepted as the findings at EACL 2023, and the earlier version (non-archival) of this work got the Best Paper Award in the Student Research Workshop of AACL 2022

  13. arXiv:2302.07866  [pdf, other]

    cs.CL cs.AI

    Do Deep Neural Networks Capture Compositionality in Arithmetic Reasoning?

    Authors: Keito Kudo, Yoichi Aoki, Tatsuki Kuribayashi, Ana Brassard, Masashi Yoshikawa, Keisuke Sakaguchi, Kentaro Inui

    Abstract: Compositionality is a pivotal property of symbolic reasoning. However, how well recent neural models capture compositionality remains underexplored in the symbolic reasoning tasks. This study empirically addresses this question by systematically examining recently published pre-trained seq2seq models with a carefully controlled dataset of multi-hop arithmetic symbolic reasoning. We introduce a ski…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: accepted by EACL 2023

  14. arXiv:2301.06758  [pdf, other]

    cs.LG cs.AI cs.CL

    Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers

    Authors: Yuta Matsumoto, Benjamin Heinzerling, Masashi Yoshikawa, Kentaro Inui

    Abstract: How language models process complex input that requires multiple steps of inference is not well understood. Previous research has shown that information about intermediate values of these inputs can be extracted from the activations of the models, but it is unclear where that information is encoded and whether that information is indeed used during inference. We introduce a method for analyzing ho…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: 5 pages, 4 figures, MathNLP

  15. arXiv:2212.10688  [pdf, other]

    cs.CV cs.DB

    Local Differential Privacy Image Generation Using Flow-based Deep Generative Models

    Authors: Hisaichi Shibata, Shouhei Hanaoka, Yang Cao, Masatoshi Yoshikawa, Tomomi Takenaga, Yukihiro Nomura, Naoto Hayashi, Osamu Abe

    Abstract: Diagnostic radiologists need artificial intelligence (AI) for medical imaging, but access to medical images required for training in AI has become increasingly restrictive. To release and use medical images, we need an algorithm that can simultaneously protect privacy and preserve pathologies in medical images. To develop such an algorithm, here, we propose DP-GLOW, a hybrid of a local differentia…

    Submitted 20 December, 2022; originally announced December 2022.

  16. Secure Shapley Value for Cross-Silo Federated Learning (Technical Report)

    Authors: Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa

    Abstract: The Shapley value (SV) is a fair and principled metric for contribution evaluation in cross-silo federated learning (cross-silo FL), wherein organizations, i.e., clients, collaboratively train prediction models with the coordination of a parameter server. However, existing SV calculation methods for FL assume that the server can access the raw FL models and public test data. This may not be a vali…

    Submitted 25 December, 2024; v1 submitted 11 September, 2022; originally announced September 2022.

    Comments: Technical report for our VLDB 2023 paper (https://www.vldb.org/pvldb/vol16/p1657-zheng.pdf)

    Journal ref: Proceedings of the VLDB Endowment, 16(7): 1657-1670, 2023

  17. A Crypto-Assisted Approach for Publishing Graph Statistics with Node Local Differential Privacy

    Authors: Shang Liu, Yang Cao, Takao Murakami, Masatoshi Yoshikawa

    Abstract: Publishing graph statistics under node differential privacy has attracted much attention since it provides a stronger privacy guarantee than edge differential privacy. Existing works related to node differential privacy assume a trusted data curator who holds the whole graph. However, in many applications, a trusted curator is usually not available due to privacy and security issues. In this paper…

    Submitted 15 April, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

    Comments: Accepted by IEEE BigData 2022, https://ieeexplore.ieee.org/document/10020435

  18. arXiv:2204.13032  [pdf, other]

    cs.CL

    BiTimeBERT: Extending Pre-Trained Language Representations with Bi-Temporal Information

    Authors: Jiexin Wang, Adam Jatowt, Masatoshi Yoshikawa, Yi Cai

    Abstract: Time is an important aspect of documents and is used in a range of NLP and IR tasks. In this work, we investigate methods for incorporating temporal information during pre-training to further improve the performance on time-related tasks. Compared with common pre-trained language models like BERT which utilize synchronic document collections (e.g., BookCorpus and Wikipedia) as the training corpora…

    Submitted 27 April, 2023; v1 submitted 27 April, 2022; originally announced April 2022.

  19. arXiv:2204.03919  [pdf, other]

    cs.CR cs.DB cs.LG

    Network Shuffling: Privacy Amplification via Random Walks

    Authors: Seng Pei Liew, Tsubasa Takahashi, Shun Takagi, Fumiyuki Kato, Yang Cao, Masatoshi Yoshikawa

    Abstract: Recently, it is shown that shuffling can amplify the central differential privacy guarantees of data randomized with local differential privacy. Within this setup, a centralized, trusted shuffler is responsible for shuffling by keeping the identities of data anonymous, which subsequently leads to stronger privacy guarantees for systems. However, introducing a centralized entity to the originally l…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: 15 pages, 9 figures; SIGMOD 2022 version

  20. HDPView: Differentially Private Materialized View for Exploring High Dimensional Relational Data

    Authors: Fumiyuki Kato, Tsubasa Takahashi, Shun Takagi, Yang Cao, Seng Pei Liew, Masatoshi Yoshikawa

    Abstract: How can we explore the unknown properties of high-dimensional sensitive relational data while preserving privacy? We study how to construct an explorable privacy-preserving materialized view under differential privacy. No existing state-of-the-art methods simultaneously satisfy the following essential properties in data exploration: workload independence, analytical reliability (i.e., providing er…

    Submitted 26 May, 2022; v1 submitted 13 March, 2022; originally announced March 2022.

    Comments: accepted at VLDB 2022

  21. arXiv:2202.07165  [pdf, other]

    cs.LG cs.CR

    OLIVE: Oblivious Federated Learning on Trusted Execution Environment against the risk of sparsification

    Authors: Fumiyuki Kato, Yang Cao, Masatoshi Yoshikawa

    Abstract: Combining Federated Learning (FL) with a Trusted Execution Environment (TEE) is a promising approach for realizing privacy-preserving FL, which has garnered significant academic attention in recent years. Implementing the TEE on the server side enables each round of FL to proceed without exposing the client's gradient information to untrusted servers. This addresses usability gaps in existing secu…

    Submitted 19 June, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: This paper is the full version of a paper accepted at VLDB 2023

  22. arXiv:2109.13497  [pdf, other]

    cs.CL cs.LG

    Instance-Based Neural Dependency Parsing

    Authors: Hiroki Ouchi, Jun Suzuki, Sosuke Kobayashi, Sho Yokoi, Tatsuki Kuribayashi, Masashi Yoshikawa, Kentaro Inui

    Abstract: Interpretable rationales for model predictions are crucial in practical applications. We develop neural models that possess an interpretable inference process for dependency parsing. Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set. The training edges are explicitly used for the predictions; thus, it is easy to…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: 15 pages, accepted to TACL 2021

  23. arXiv:2109.03438  [pdf, other]

    cs.CL cs.AI

    ArchivalQA: A Large-scale Benchmark Dataset for Open Domain Question Answering over Historical News Collections

    Authors: Jiexin Wang, Adam Jatowt, Masatoshi Yoshikawa

    Abstract: In the last few years, open-domain question answering (ODQA) has advanced rapidly due to the development of deep learning techniques and the availability of large-scale QA datasets. However, the current datasets are essentially designed for synchronic document collections (e.g., Wikipedia). Temporal news collections such as long-term news archives spanning several decades, are rarely used in train…

    Submitted 21 February, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

  24. arXiv:2106.07033  [pdf, other]

    cs.CR cs.LG

    Understanding the Interplay between Privacy and Robustness in Federated Learning

    Authors: Yaowei Han, Yang Cao, Masatoshi Yoshikawa

    Abstract: Federated Learning (FL) is emerging as a promising paradigm of privacy-preserving machine learning, which trains an algorithm across multiple clients without exchanging their data samples. Recent works highlighted several privacy and robustness weaknesses in FL and addressed these concerns using local differential privacy (LDP) and some well-studied methods used in conventional ML, separately. How…

    Submitted 13 June, 2021; originally announced June 2021.

  25. FL-Market: Trading Private Models in Federated Learning

    Authors: Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa, Huizhong Li, Qiang Yan

    Abstract: The difficulty in acquiring a sufficient amount of training data is a major bottleneck for machine learning (ML) based data analytics. Recently, commoditizing ML models has been proposed as an economical and moderate solution to ML-oriented data acquisition. However, existing model marketplaces assume that the broker can access data owners' private training data, which may not be realistic in prac…

    Submitted 3 April, 2023; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Extended report for our IEEE BigData 2022 paper

    Journal ref: Proceedings of the 2022 IEEE International Conference on Big Data, 1525-1534. (https://ieeexplore.ieee.org/document/10020232)

  26. arXiv:2105.01651  [pdf, other]

    cs.CR cs.DB

    Pricing Private Data with Personalized Differential Privacy and Partial Arbitrage Freeness

    Authors: Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa

    Abstract: There is a growing trend regarding perceiving personal data as a commodity. Existing studies have built frameworks and theories about how to determine an arbitrage-free price of a given query according to the privacy loss quantified by differential privacy. However, those studies have assumed that data buyers can purchase query answers with the arbitrary privacy loss of data owners, which may not…

    Submitted 23 November, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

  27. arXiv:2104.06569  [pdf, other]

    cs.CR

    Preventing Manipulation Attack in Local Differential Privacy using Verifiable Randomization Mechanism

    Authors: Fumiyuki Kato, Yang Cao, Masatoshi Yoshikawa

    Abstract: Several randomization mechanisms for local differential privacy (LDP) (e.g., randomized response) are well-studied to improve the utility. However, recent studies show that LDP is generally vulnerable to malicious data providers in nature. Because a data collector has to estimate background data distribution only from already randomized data, malicious data providers can manipulate their output be…

    Submitted 9 June, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: accepted by DBSec 2021

  28. arXiv:2103.00996  [pdf, other]

    cs.CR

    Asymmetric Differential Privacy

    Authors: Shun Takagi, Yang Cao, Masatoshi Yoshikawa

    Abstract: Differential privacy (DP) is getting attention as a privacy definition when publishing statistics of a dataset. This paper focuses on the limitation that DP inevitably causes two-sided error, which is not desirable for epidemic analysis such as how many COVID-19 infected individuals visited location A. For example, consider publishing misinformation that many infected people did not visit location…

    Submitted 5 September, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

  29. arXiv:2012.13061  [pdf, other]

    cs.CY cs.CR

    Quantifying the Privacy-Utility Trade-offs in COVID-19 Contact Tracing Apps

    Authors: Patrick Ocheja, Yang Cao, Shiyao Ding, Masatoshi Yoshikawa

    Abstract: How to contain the spread of the COVID-19 virus is a major concern for most countries. As the situation continues to change, various countries are making efforts to reopen their economies by lifting some restrictions and enforcing new measures to prevent the spread. In this work, we review some approaches that have been adopted to contain the COVID-19 virus such as contact tracing, clusters identi…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: 12 pages, 11 figures, 4 tables

    MSC Class: 68P27; ACM Class: H.3.4

  30. arXiv:2012.03782  [pdf, other]

    cs.CR cs.CY

    PCT-TEE: Trajectory-based Private Contact Tracing System with Trusted Execution Environment

    Authors: Fumiyuki Kato, Yang Cao, Masatoshi Yoshikawa

    Abstract: Existing Bluetooth-based Private Contact Tracing (PCT) systems can privately detect whether people have come into direct contact with COVID-19 patients. However, we find that the existing systems lack functionality and flexibility, which may hurt the success of the contact tracing. Specifically, they cannot detect indirect contact (e.g., people may be exposed to coronavirus because of used the sam…

    Submitted 31 December, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: Accepted by ACM TSAS

  31. arXiv:2010.13449  [pdf, other]

    cs.CR

    Geo-Graph-Indistinguishability: Location Privacy on Road Networks Based on Differential Privacy

    Authors: Shun Takagi, Yang Cao, Yasuhito Asano, Masatoshi Yoshikawa

    Abstract: In recent years, concerns about location privacy are increasing with the spread of location-based services (LBSs). Many methods to protect location privacy have been proposed in the past decades. Especially, perturbation methods based on Geo-Indistinguishability (Geo-I), which randomly perturb a true location to a pseudolocation, are getting attention due to its strong privacy guarantee inherited…

    Submitted 26 October, 2020; originally announced October 2020.

  32. arXiv:2010.13381  [pdf, other]

    cs.CR cs.DB

    Secure and Efficient Trajectory-Based Contact Tracing using Trusted Hardware

    Authors: Fumiyuki Kato, Yang Cao, Masatoshi Yoshikawa

    Abstract: The COVID-19 pandemic has prompted technological measures to control the spread of the disease. Private contact tracing (PCT) is one of the promising techniques for the purpose. However, the recently proposed Bluetooth-based PCT has several limitations in terms of functionality and flexibility. The existing systems are only able to detect direct contact (i.e., human-human contact), but cannot dete…

    Submitted 4 November, 2020; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: Accepted by 7th International Workshop on Privacy and Security of Big Data (PSBD 2020) in conjunction with 2020 IEEE International Conference on Big Data (IEEE BigData 2020)

  33. arXiv:2009.08063  [pdf, other]

    cs.LG cs.CR stat.ML

    FLAME: Differentially Private Federated Learning in the Shuffle Model

    Authors: Ruixuan Liu, Yang Cao, Hong Chen, Ruoyang Guo, Masatoshi Yoshikawa

    Abstract: Federated Learning (FL) is a promising machine learning paradigm that enables the analyzer to train a model without collecting users' raw data. To ensure users' privacy, differentially private federated learning has been intensively studied. The existing works are mainly based on the curator model or local model of differential privacy. However, both of them have pros and cons. T…

    Submitted 20 March, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: accepted by AAAI-21

  34. arXiv:2006.12101  [pdf, other]

    cs.LG cs.CR cs.DB stat.ML

    P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model

    Authors: Shun Takagi, Tsubasa Takahashi, Yang Cao, Masatoshi Yoshikawa

    Abstract: How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach for this problem is to build a generative model under differential privacy, which offers a rigorous privacy guarantee. However, the existing method cannot adequately h…

    Submitted 7 March, 2022; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: The version accepted at ICDE 2021 includes wrong proof in the Wishart mechanism. The current version fixes the problem

  35. arXiv:2005.01263  [pdf, other]

    cs.CR cs.CY

    PGLP: Customizable and Rigorous Location Privacy through Policy Graph

    Authors: Yang Cao, Yonghui Xiao, Shun Takagi, Li Xiong, Masatoshi Yoshikawa, Yilin Shen, Jinfei Liu, Hongxia Jin, Xiaofeng Xu

    Abstract: Location privacy has been extensively studied in the literature. However, existing location privacy models are either not rigorous or not customizable, which limits the trade-off between privacy and utility in many real-world applications. To address this issue, we propose a new location privacy notion called PGLP, i.e., Policy Graph based Location Privacy, providing a rich interface to r…

    Submitted 15 July, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: accepted in the 25th European Symposium on Research in Computer Security (ESORICS) 2020

  36. arXiv:2005.00186  [pdf, other]

    cs.DB cs.CR

    PANDA: Policy-aware Location Privacy for Epidemic Surveillance

    Authors: Yang Cao, Shun Takagi, Yonghui Xiao, Li Xiong, Masatoshi Yoshikawa

    Abstract: In this demonstration, we present a privacy-preserving epidemic surveillance system. Recently, many countries that suffer from coronavirus crises attempt to access citizen's location data to eliminate the outbreak. However, it raises privacy concerns and may open the doors to more invasive forms of surveillance in the name of public health. It also brings a challenge for privacy protection techniq…

    Submitted 6 June, 2020; v1 submitted 30 April, 2020; originally announced May 2020.

    Comments: Accepted in the 46th International Conference on Very Large Data Bases (VLDB 2020) demonstration track

  37. arXiv:2004.07442  [pdf, other]

    cs.CR cs.SD eess.AS

    Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release

    Authors: Yaowei Han, Sheng Li, Yang Cao, Qiang Ma, Masatoshi Yoshikawa

    Abstract: With the development of smart devices, such as the Amazon Echo and Apple's HomePod, speech data have become a new dimension of big data. However, privacy and security concerns may hinder the collection and sharing of real-world speech data, which contain the speaker's identifiable information, i.e., voiceprint, which is considered a type of biometric identifier. Current studies on voiceprint priva…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: The paper has been accepted by the IEEE International Conference on Multimedia & Expo 2020 (ICME 2020)

  38. arXiv:2003.10637  [pdf, other]

    cs.LG cs.CR stat.ML

    FedSel: Federated SGD under Local Differential Privacy with Top-k Dimension Selection

    Authors: Ruixuan Liu, Yang Cao, Masatoshi Yoshikawa, Hong Chen

    Abstract: As massive data are produced from small gadgets, federated learning on mobile devices has become an emerging trend. In the federated setting, Stochastic Gradient Descent (SGD) has been widely used in federated learning for various machine learning models. To prevent privacy leakages from gradients that are calculated on users' sensitive data, local differential privacy (LDP) has been considered as…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: 18 pages, to be published in DASFAA 2020

  39. arXiv:1910.11040  [pdf, ps, other]

    cs.DB

    Toward a view-based data cleaning architecture

    Authors: Toshiyuki Shimizu, Hiroki Omori, Masatoshi Yoshikawa

    Abstract: Big data analysis has become an active area of study with the growth of machine learning techniques. To properly analyze data, it is important to maintain high-quality data. Thus, research on data cleaning is also important. It is difficult to automatically detect and correct inconsistent values for data requiring expert knowledge or data created by many contributors, such as integrated data from…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: Proceedings of the Third Workshop on Software Foundations for Data Interoperability (SFDI2019+), October 28, 2019, Fukuoka, Japan

  40. Protecting Spatiotemporal Event Privacy in Continuous Location-Based Services

    Authors: Yang Cao, Yonghui Xiao, Li Xiong, Liquan Bai, Masatoshi Yoshikawa

    Abstract: Location privacy-preserving mechanisms (LPPMs) have been extensively studied for protecting users' location privacy by releasing a perturbed location to third parties such as location-based service providers. However, when a user's perturbed locations are released continuously, existing LPPMs may not protect the sensitive information about the user's spatiotemporal activities, such as "visited hos…

    Submitted 16 May, 2020; v1 submitted 24 July, 2019; originally announced July 2019.

    Comments: accepted in TKDE. arXiv admin note: substantial text overlap with arXiv:1810.09152

    Journal ref: IEEE Transactions on Knowledge and Data Engineering (TKDE) 2020

  41. arXiv:1906.05457  [pdf, other]

    cs.CR cs.DB

    Trading Location Data with Bounded Personalized Privacy Loss

    Authors: Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa

    Abstract: As personal data have been the new oil of the digital era, there is a growing trend perceiving personal data as a commodity. Although some people are willing to trade their personal data for money, they might still expect limited privacy loss, and the maximum tolerable privacy loss varies with each individual. In this paper, we propose a framework that enables individuals to trade their personal d…

    Submitted 24 October, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Proceedings of the Third Workshop on Software Foundations for Data Interoperability (SFDI2019+), October 28, 2019, Fukuoka, Japan

  42. arXiv:1906.03952  [pdf, other]

    cs.CL

    Multimodal Logical Inference System for Visual-Textual Entailment

    Authors: Riko Suzuki, Hitomi Yanaka, Masashi Yoshikawa, Koji Mineshima, Daisuke Bekki

    Abstract: A large amount of research about multimodal inference across text and vision has been recently developed to obtain visually grounded word and sentence representations. In this paper, we use logic-based representations as unified meaning representations for texts and images and present an unsupervised multimodal logical inference system that can effectively prove entailment relations between them.…

    Submitted 10 June, 2019; originally announced June 2019.

  43. arXiv:1906.01834  [pdf, other]

    cs.CL

    Automatic Generation of High Quality CCGbanks for Parser Domain Adaptation

    Authors: Masashi Yoshikawa, Hiroshi Noji, Koji Mineshima, Daisuke Bekki

    Abstract: We propose a new domain adaptation method for Combinatory Categorial Grammar (CCG) parsing, based on the idea of automatic generation of CCG corpora exploiting cheaper resources of dependency trees. Our solution is conceptually simple, and not relying on a specific parser architecture, making it applicable to the current best-performing parsers. We conduct extensive parsing experiments with detail…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: 11 pages, accepted as long paper to ACL 2019 Italy

  44. Blockchain-based Bidirectional Updates on Fine-grained Medical Data

    Authors: Chunmiao Li, Yang Cao, Zhenjiang Hu, Masatoshi Yoshikawa

    Abstract: Electronic medical data sharing between stakeholders, such as patients, doctors, and researchers, can promote more effective medical treatment collaboratively. These sensitive and private data should only be accessed by authorized users. Given a total medical data, users may care about parts of them and other unrelated information might interfere with the user interested data search and increase t…

    Submitted 23 April, 2019; originally announced April 2019.

    ACM Class: H.2; D.2; K.6.5

  45. When and where do you want to hide? Recommendation of location privacy preferences with local differential privacy

    Authors: Maho Asada, Masatoshi Yoshikawa, Yang Cao

    Abstract: In recent years, it has become easy to obtain location information quite precisely. However, the acquisition of such information has risks such as individual identification and leakage of sensitive information, so it is necessary to protect the privacy of location information. For this purpose, people should know their location privacy preferences, that is, whether or not he/she can release locati…

    Submitted 23 April, 2019; originally announced April 2019.

  46. arXiv:1811.06203  [pdf, other]

    cs.CL cs.AI

    Combining Axiom Injection and Knowledge Base Completion for Efficient Natural Language Inference

    Authors: Masashi Yoshikawa, Koji Mineshima, Hiroshi Noji, Daisuke Bekki

    Abstract: In logic-based approaches to reasoning tasks such as Recognizing Textual Entailment (RTE), it is important for a system to have a large amount of knowledge data. However, there is a tradeoff between adding more knowledge data for improved RTE performance and maintaining an efficient RTE system, as such a big database is problematic in terms of the memory usage and computational complexity. In this…

    Submitted 15 November, 2018; originally announced November 2018.

    Comments: 9 pages, accepted to AAAI 2019

  47. arXiv:1809.10357  [pdf, other]

    cs.DB

    Making View Update Strategies Programmable - Toward Controlling and Sharing Distributed Data -

    Authors: Yasuhito Asano, Soichiro Hidaka, Zhenjiang Hu, Yasunori Ishihara, Hiroyuki Kato, Hsiang-Shang Ko, Keisuke Nakano, Makoto Onizuka, Yuya Sasaki, Toshiyuki Shimizu, Van-Dang Tran, Kanae Tsushima, Masatoshi Yoshikawa

    Abstract: Views are known mechanisms for controlling access of data and for sharing data of different schemas. Despite long and intensive research on views in both the database community and the programming language community, we are facing difficulties to use views in practice. The main reason is that we lack ways to directly describe view update strategies to deal with the inherent ambiguity of view updat…

    Submitted 27 September, 2018; originally announced September 2018.

    Comments: 6 pages. arXiv admin note: text overlap with arXiv:1803.06674

  48. arXiv:1804.08473  [pdf, other]

    cs.CV cs.AI cs.MM

    Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

    Authors: Bei Liu, Jianlong Fu, Makoto P. Kato, Masatoshi Yoshikawa

    Abstract: Automatic generation of natural language from images has attracted extensive attention. In this paper, we take one step further to investigate generation of poetic language (with multiple lines) to an image for automatic poetry creation. This task involves multiple challenges, including discovering poetic clues from the image (e.g., hope from green), and generating poems to satisfy both relevance…

    Submitted 9 October, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

  49. arXiv:1804.07068  [pdf, ps, other]

    cs.CL

    Consistent CCG Parsing over Multiple Sentences for Improved Logical Reasoning

    Authors: Masashi Yoshikawa, Koji Mineshima, Hiroshi Noji, Daisuke Bekki

    Abstract: In formal logic-based approaches to Recognizing Textual Entailment (RTE), a Combinatory Categorial Grammar (CCG) parser is used to parse input premises and hypotheses to obtain their logical formulas. Here, it is important that the parser processes the sentences consistently; failing to recognize a similar syntactic structure results in inconsistent predicate argument structures among them, in whi…

    Submitted 19 April, 2018; originally announced April 2018.

    Comments: 6 pages. short paper accepted to NAACL2018

  50. arXiv:1803.06674  [pdf, other]

    cs.DB

    A View-based Programmable Architecture for Controlling and Integrating Decentralized Data

    Authors: Yasuhito Asano, Soichiro Hidaka, Zhenjiang Hu, Yasunori Ishihara, Hiroyuki Kato, Hsiang-Shang Ko, Keisuke Nakano, Makoto Onizuka, Yuya Sasaki, Toshiyuki Shimizu, Kanae Tsushima, Masatoshi Yoshikawa

    Abstract: The view and the view update are known mechanism for controlling access of data and for integrating data of different schemas. Despite intensive and long research on them in both the database community and the programming language community, we are facing difficulties to use them in practice. The main reason is that we are lacking of control over the view update strategy to deal with inherited amb…

    Submitted 18 March, 2018; originally announced March 2018.

    Comments: 14 pages, 2 figures, conference