
Showing 1–12 of 12 results for author: Amagata, D

Searching in archive cs.
  1. arXiv:2504.13446 [pdf, ps, other]

    cs.DB

    Approximate Reverse $k$-Ranks Queries in High Dimensions

    Authors: Daichi Amagata, Kazuyoshi Aoyama, Keito Kido, Sumio Fujita

    Abstract: Many objects are represented as high-dimensional vectors nowadays. In this setting, the relevance between two objects (vectors) is usually evaluated by their inner product. Recently, item-centric searches, which search for users relevant to query items, have received attention and find important applications, such as product promotion and market analysis. To support these applications, this paper…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted to SSDBM2025

  2. arXiv:2504.13445 [pdf, ps, other]

    cs.DB

    How to Mine Potentially Popular Items? A Reverse MIPS-based Approach

    Authors: Daichi Amagata, Kazuyoshi Aoyama, Keito Kido, Sumio Fujita

    Abstract: The $k$-MIPS ($k$ Maximum Inner Product Search) problem has been employed in many fields. Recently, its reverse version, the reverse $k$-MIPS problem, has been proposed. Given an item vector (i.e., query), it retrieves all user vectors such that their $k$-MIPS results contain the item vector. Consider the cardinality of a reverse $k$-MIPS result. A large cardinality means that the item is potentia…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted to SSDBM2025

  3. arXiv:2405.08315 [pdf, other]

    cs.DB

    Independent Range Sampling on Interval Data (Longer Version)

    Authors: Daichi Amagata

    Abstract: Many applications require efficient management of large sets of intervals because many objects are associated with intervals (e.g., time and price intervals). In such interval management systems, range search is a primitive operator for retrieval and analysis tasks. As dataset sizes grow, range search results also become larger, which may overwhelm users and incur long compu…

    Submitted 22 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Full version of our ICDE2024 paper

  4. arXiv:2405.05601 [pdf, other]

    cs.DB

    Efficient Algorithms for Top-k Stabbing Queries on Weighted Interval Data (Full Version)

    Authors: Daichi Amagata, Junya Yamada, Yuchen Ji, Takahiro Hara

    Abstract: Intervals arise in many applications (e.g., temporal databases), and they are often associated with weights, such as prices. This paper addresses the problem of processing top-k weighted stabbing queries on interval data. Given a set of weighted intervals, a query value, and a result size $k$, this problem finds the $k$ intervals that are stabbed by the query value and have the large…

    Submitted 22 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Full version of our DEXA2024 paper

  5. arXiv:2312.16033 [pdf, other]

    cs.DB

    Fast Algorithm for Embedded Order Dependency Validation (Extended Version)

    Authors: Alejandro Ramos, Takuya Uemura, Daichi Amagata, Ryo Shirai, Takahiro Hara

    Abstract: Order Dependencies (ODs) have many applications, such as query optimization, data integration, and data cleaning. Although many works have addressed the problem of discovering ODs (and their variants), they do not consider datasets with missing values, which are common in real-world datasets. This paper introduces the novel notion of Embedded ODs (eODs) to deal with missing values. The intuition of…

    Submitted 28 December, 2023; v1 submitted 26 December, 2023; originally announced December 2023.

  6. arXiv:2306.04846 [pdf, ps, other]

    cs.DB cs.AI

    Learned spatial data partitioning

    Authors: Keizo Hori, Yuya Sasaki, Daichi Amagata, Yuki Murosaki, Makoto Onizuka

    Abstract: Because spatial data has grown significantly in size, distributed parallel processing systems are essential for analyzing it efficiently. In this paper, we first study learned spatial data partitioning, which effectively assigns groups of big spatial data to computers based on the locations of the data by using machine learning techniques. We formalize spatial data partitioning in t…

    Submitted 19 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

  7. arXiv:2211.03390 [pdf, other]

    cs.IR

    Debiasing Graph Transfer Learning via Item Semantic Clustering for Cross-Domain Recommendations

    Authors: Zhi Li, Daichi Amagata, Yihong Zhang, Takahiro Hara, Shuichiro Haruta, Kei Yonekawa, Mori Kurokawa

    Abstract: Deep learning-based recommender systems may over-fit when training interaction data is lacking. This over-fitting significantly degrades recommendation performance. To address this data sparsity problem, cross-domain recommender systems (CDRSs) exploit data from an auxiliary source domain to facilitate recommendation in the sparse target domain. Most existing CDRSs rely on overla…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: 11 pages, 4 figures

  8. arXiv:2208.14210 [pdf, other]

    cs.DB cs.AI cs.LG

    Learned k-NN Distance Estimation

    Authors: Daichi Amagata, Yusuke Arai, Sumio Fujita, Takahiro Hara

    Abstract: Big data mining is well known to be an important task in data science because it can reveal useful observations and new knowledge hidden in large datasets. Proximity-based data analysis is particularly common in real-life applications. Such analysis usually employs the distances to the k nearest neighbors, so its main bottleneck is data retrieval. Much effort h…

    Submitted 27 November, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: Accepted to SIGSPATIAL2022 (as short paper)

  9. arXiv:2207.04649 [pdf, other]

    cs.DB

    Fast Density-Peaks Clustering: Multicore-based Parallelization Approach

    Authors: Daichi Amagata, Takahiro Hara

    Abstract: Clustering multi-dimensional points is a fundamental task in many fields, and density-based clustering supports many applications as it can discover clusters of arbitrary shapes. This paper addresses the problem of Density-Peaks Clustering (DPC), a recently proposed density-based clustering framework. Although DPC already has many applications, its straightforward implementation incurs a quadratic…

    Submitted 29 November, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: This is a corrected version of our SIGMOD2021 paper

  10. arXiv:2110.08959 [pdf, other]

    cs.DB

    Fast and Exact Outlier Detection in Metric Spaces: A Proximity Graph-based Approach

    Authors: Daichi Amagata, Makoto Onizuka, Takahiro Hara

    Abstract: Distance-based outlier detection is widely adopted in many fields, e.g., data mining and machine learning, because it is unsupervised, can be employed in a generic metric space, and makes no assumptions about the data distribution. Data mining and machine learning applications face the challenge of dealing with large datasets, which requires efficient distance-based outlier detection algorithms.…

    Submitted 21 October, 2021; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: Accepted to SIGMOD2021

  11. arXiv:2110.07131 [pdf, other]

    cs.DB

    Reverse Maximum Inner Product Search: How to efficiently find users who would like to buy my item?

    Authors: Daichi Amagata, Takahiro Hara

    Abstract: MIPS (maximum inner product search), which finds the item with the highest inner product with a given query user, is an essential problem in the recommendation field. E-commerce companies often face situations where they want to promote and sell new or discounted items. In these situations, we have to consider a question: who is interested in these items, and how can we find them? This pa…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: Accepted to RecSys2021

  12. arXiv:2101.12417 [pdf, other]

    cs.DB

    Distributed Spatial-Keyword kNN Monitoring for Location-aware Pub/Sub

    Authors: Shohei Tsuruoka, Daichi Amagata, Shunya Nishio, Takahiro Hara

    Abstract: Recent applications employ publish/subscribe (Pub/Sub) systems so that publishers can easily attract customers' attention and subscribers can monitor useful information generated by publishers. Due to the prevalence of smart devices and social networking services, a large number of objects that contain both spatial and keyword information are generated continuously, and the number of subs…

    Submitted 9 February, 2021; v1 submitted 29 January, 2021; originally announced January 2021.

    Comments: 10 pages
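Results 2 and 11 above rest on the following definitions: MIPS finds the item with the highest inner product with a query user, k-MIPS finds the k such items, and reverse k-MIPS takes an item vector as the query and retrieves all user vectors whose k-MIPS results contain it. As a rough illustration only, here is a minimal brute-force sketch of these definitions in Python. It is not the method proposed in either paper, which target efficient algorithms; the function names `inner_product`, `k_mips`, and `reverse_k_mips`, the toy vectors, and the rule that ties are broken in the query item's favor are all hypothetical choices made for this example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def inner_product(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def k_mips(user: Vector, items: List[Vector], k: int) -> List[int]:
    """Indices of the k items with the largest inner product with `user` (exhaustive scan)."""
    order = sorted(range(len(items)),
                   key=lambda i: inner_product(user, items[i]),
                   reverse=True)
    return order[:k]


def reverse_k_mips(query: Vector, users: List[Vector],
                   items: List[Vector], k: int) -> List[int]:
    """Indices of users whose k-MIPS result would contain `query`.

    Hypothetical baseline: for each user, count the items that score strictly
    higher than the query item. The query enters that user's top-k if fewer
    than k items beat it, so ties are resolved in the query's favor
    (an assumption of this sketch).
    """
    result = []
    for u_idx, user in enumerate(users):
        q_score = inner_product(user, query)
        better = sum(1 for item in items if inner_product(user, item) > q_score)
        if better < k:
            result.append(u_idx)
    return result


# Toy data (hypothetical): three users and three items in two dimensions.
users = [[1, 0], [0, 1], [1, 1]]
items = [[3, 1], [1, 3], [2, 1]]
print(k_mips([1, 0], items, 2))                  # [0, 2]
print(reverse_k_mips([5, 0], users, items, 1))   # [0, 2]
print(reverse_k_mips([2, 0], users, items, 2))   # [0]
```

This scan evaluates an inner product for every user-item pair, i.e. O(|U| · |I| · d) work for |U| users, |I| items, and dimension d, which is why faster algorithms for the problem are of interest. Setting k = 1 in `k_mips` gives plain MIPS.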