Search | arXiv e-print repository

arXiv:1908.00825 [pdf, other]

doi 10.1145/3240323.3240388

A Hierarchical Bayesian Model for Size Recommendation in Fashion

Authors: Romain Guigourès, Yuen King Ho, Evgenii Koriagin, Abdul-Saboor Sheikh, Urs Bergmann, Reza Shirvany

Abstract: We introduce a hierarchical Bayesian approach to tackle the challenging problem of size recommendation in e-commerce fashion. Our approach jointly models a size purchased by a customer, and its possible return event: 1. no return, 2. returned too small 3. returned too big. Those events are drawn following a multinomial distribution parameterized on the joint probability of each event, built follow… ▽ More We introduce a hierarchical Bayesian approach to tackle the challenging problem of size recommendation in e-commerce fashion. Our approach jointly models a size purchased by a customer, and its possible return event: 1. no return, 2. returned too small 3. returned too big. Those events are drawn following a multinomial distribution parameterized on the joint probability of each event, built following a hierarchy combining priors. Such a model allows us to incorporate extended domain expertise and article characteristics as prior knowledge, which in turn makes it possible for the underlying parameters to emerge thanks to sufficient data. Experiments are presented on real (anonymized) data from millions of customers along with a detailed discussion on the efficiency of such an approach within a large scale production system. △ Less

Submitted 2 August, 2019; originally announced August 2019.

Journal ref: In: Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 2018. S. 392-396

arXiv:1907.09844 [pdf, other]

doi 10.1145/3298689.3347006

A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce

Authors: Abdul-Saboor Sheikh, Romain Guigoures, Evgenii Koriagin, Yuen King Ho, Reza Shirvany, Roland Vollgraf, Urs Bergmann

Abstract: Personalized size and fit recommendations bear crucial significance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems… ▽ More Personalized size and fit recommendations bear crucial significance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size and fit relevant information from observed customer-article interactions. It further employs customer and article specific embedding variables to learn their properties. Together with learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Application of our method to two publicly available datasets demonstrate an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation. △ Less

Submitted 23 July, 2019; originally announced July 2019.

Comments: Published at the Thirteenth ACM Conference on Recommender Systems (RecSys '19), September 16--20, 2019, Copenhagen, Denmark

arXiv:1905.11784 [pdf, other]

SizeNet: Weakly Supervised Learning of Visual Size and Fit in Fashion Images

Authors: Nour Karessli, Romain Guigourès, Reza Shirvany

Abstract: Finding clothes that fit is a hot topic in the e-commerce fashion industry. Most approaches addressing this problem are based on statistical methods relying on historical data of articles purchased and returned to the store. Such approaches suffer from the cold start problem for the thousands of articles appearing on the shopping platforms every day, for which no prior purchase history is availabl… ▽ More Finding clothes that fit is a hot topic in the e-commerce fashion industry. Most approaches addressing this problem are based on statistical methods relying on historical data of articles purchased and returned to the store. Such approaches suffer from the cold start problem for the thousands of articles appearing on the shopping platforms every day, for which no prior purchase history is available. We propose to employ visual data to infer size and fit characteristics of fashion articles. We introduce SizeNet, a weakly-supervised teacher-student training framework that leverages the power of statistical models combined with the rich visual information from article images to learn visual cues for size and fit characteristics, capable of tackling the challenging cold start problem. Detailed experiments are performed on thousands of textile garments, including dresses, trousers, knitwear, tops, etc. from hundreds of different brands. △ Less

Submitted 28 May, 2019; originally announced May 2019.

Comments: IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) 2019 Focus on Fashion and Subjective Search - Understanding Subjective Attributes of Data (FFSS-USAD)

arXiv:1608.07929 [pdf, other]

doi 10.1007/s11634-015-0218-6

Discovering Patterns in Time-Varying Graphs: A Triclustering Approach

Authors: Romain Guigourès, Marc Boullé, Fabrice Rossi

Abstract: This paper introduces a novel technique to track structures in time varying graphs. The method uses a maximum a posteriori approach for adjusting a three-dimensional co-clustering of the source vertices, the destination vertices and the time, to the data under study, in a way that does not require any hyper-parameter tuning. The three dimensions are simultaneously segmented in order to build clust… ▽ More This paper introduces a novel technique to track structures in time varying graphs. The method uses a maximum a posteriori approach for adjusting a three-dimensional co-clustering of the source vertices, the destination vertices and the time, to the data under study, in a way that does not require any hyper-parameter tuning. The three dimensions are simultaneously segmented in order to build clusters of source vertices, destination vertices and time segments where the edge distributions across clusters of vertices follow the same evolution over the time segments. The main novelty of this approach lies in that the time segments are directly inferred from the evolution of the edge distribution between the vertices, thus not requiring the user to make any a priori quantization. Experiments conducted on artificial data illustrate the good behavior of the technique, and a study of a real-life data set shows the potential of the proposed approach for exploratory data analysis. △ Less

Submitted 29 August, 2016; originally announced August 2016.

Comments: Advances in Data Analysis and Classification, Springer Verlag, 2015, Online First

arXiv:1511.01281 [pdf, other]

doi 10.1007/978-3-319-23751-0_2

Co-Clustering Network-Constrained Trajectory Data

Authors: Mohamed Khalil El Mahrsi, Romain Guigourès, Fabrice Rossi, Marc Boullé

Abstract: Recently, clustering moving object trajectories kept gaining interest from both the data mining and machine learning communities. This problem, however, was studied mainly and extensively in the setting where moving objects can move freely on the euclidean space. In this paper, we study the problem of clustering trajectories of vehicles whose movement is restricted by the underlying road network.… ▽ More Recently, clustering moving object trajectories kept gaining interest from both the data mining and machine learning communities. This problem, however, was studied mainly and extensively in the setting where moving objects can move freely on the euclidean space. In this paper, we study the problem of clustering trajectories of vehicles whose movement is restricted by the underlying road network. We model relations between these trajectories and road segments as a bipartite graph and we try to cluster its vertices. We demonstrate our approaches on synthetic data and show how it could be useful in inferring knowledge about the flow dynamics and the behavior of the drivers using the road network. △ Less

Submitted 4 November, 2015; originally announced November 2015.

Journal ref: Advances in Knowledge Discovery and Management, 615, Springer International Publishing, pp.19-32, 2015, Studies in Computational Intelligence, 978-3-319-23750-3

arXiv:1510.09005 [pdf, other]

doi 10.1007/978-3-319-23751-0_1

A Study of the Spatio-Temporal Correlations in Mobile Calls Networks

Authors: Romain Guigourès, Marc Boullé, Fabrice Rossi

Abstract: For the last few years, the amount of data has significantly increased in the companies. It is the reason why data analysis methods have to evolve to meet new demands. In this article, we introduce a practical analysis of a large database from a telecommunication operator. The problem is to segment a territory and characterize the retrieved areas owing to their inhabitant behavior in terms of mobi… ▽ More For the last few years, the amount of data has significantly increased in the companies. It is the reason why data analysis methods have to evolve to meet new demands. In this article, we introduce a practical analysis of a large database from a telecommunication operator. The problem is to segment a territory and characterize the retrieved areas owing to their inhabitant behavior in terms of mobile telephony. We have call detail records collected during five months in France. We propose a two stages analysis. The first one aims at grouping source antennas which originating calls are similarly distributed on target antennas and conversely for target antenna w.r.t. source antenna. A geographic projection of the data is used to display the results on a map of France. The second stage discretizes the time into periods between which we note changes in distributions of calls emerging from the clusters of source antennas. This enables an analysis of temporal changes of inhabitants behavior in every area of the country. △ Less

Submitted 30 October, 2015; originally announced October 2015.

Comments: Advances in Knowledge Discovery and Management, 615, Springer International Publishing, pp.3-17, 2015, Studies in Computational Intelligence

arXiv:1505.01300 [pdf, other]

Cats & Co: Categorical Time Series Coclustering

Authors: Dominique Gay, Romain Guigourès, Marc Boullé, Fabrice Clérot

Abstract: We suggest a novel method of clustering and exploratory analysis of temporal event sequences data (also known as categorical time series) based on three-dimensional data grid models. A data set of temporal event sequences can be represented as a data set of three-dimensional points, each point is defined by three variables: a sequence identifier, a time value and an event value. Instantiating data… ▽ More We suggest a novel method of clustering and exploratory analysis of temporal event sequences data (also known as categorical time series) based on three-dimensional data grid models. A data set of temporal event sequences can be represented as a data set of three-dimensional points, each point is defined by three variables: a sequence identifier, a time value and an event value. Instantiating data grid models to the 3D-points turns the problem into 3D-coclustering. The sequences are partitioned into clusters, the time variable is discretized into intervals and the events are partitioned into clusters. The cross-product of the univariate partitions forms a multivariate partition of the representation space, i.e., a grid of cells and it also represents a nonparametric estimator of the joint distribution of the sequences, time and events dimensions. Thus, the sequences are grouped together because they have similar joint distribution of time and events, i.e., similar distribution of events along the time dimension. The best data grid is computed using a parameter-free Bayesian model selection approach. We also suggest several criteria for exploiting the resulting grid through agglomerative hierarchies, for interpreting the clusters of sequences and characterizing their components through insightful visualizations. Extensive experiments on both synthetic and real-world data sets demonstrate that data grid models are efficient, effective and discover meaningful underlying patterns of categorical time series data. △ Less

Submitted 6 May, 2015; originally announced May 2015.

ACM Class: H.2.8

arXiv:1503.06060 [pdf, other]

Country-scale Exploratory Analysis of Call Detail Records through the Lens of Data Grid Models

Authors: Romain Guigourès, Dominique Gay, Marc Boullé, Fabrice Clérot, Fabrice Rossi

Abstract: Call Detail Records (CDRs) are data recorded by telecommunications companies, consisting of basic informations related to several dimensions of the calls made through the network: the source, destination, date and time of calls. CDRs data analysis has received much attention in the recent years since it might reveal valuable information about human behavior. It has shown high added value in many a… ▽ More Call Detail Records (CDRs) are data recorded by telecommunications companies, consisting of basic informations related to several dimensions of the calls made through the network: the source, destination, date and time of calls. CDRs data analysis has received much attention in the recent years since it might reveal valuable information about human behavior. It has shown high added value in many application domains like e.g., communities analysis or network planning. In this paper, we suggest a generic methodology for summarizing information contained in CDRs data. The method is based on a parameter-free estimation of the joint distribution of the variables that describe the calls. We also suggest several well-founded criteria that allows one to browse the summary at various granularities and to explore the summary by means of insightful visualizations. The method handles network graph data, temporal sequence data as well as user mobility data stemming from original CDRs data. We show the relevance of our methodology for various case studies on real-world CDRs data from Ivory Coast. △ Less

Submitted 20 March, 2015; originally announced March 2015.

Comments: Submitted to Industrial Track of ECML/PKDD 2015

MSC Class: stat.ML - Machine Learning

arXiv:1407.0612 [pdf, other]

doi 10.1007/978-3-319-02999-3_2

Nonparametric Hierarchical Clustering of Functional Data

Authors: Marc Boullé, Romain Guigourès, Fabrice Rossi

Abstract: In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing te… ▽ More In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique, aiming at reducing the number of clusters in order to improve the interpretability of the clustering, is proposed. It consists in optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly this measure is none other than the sum of the Kullback-Leibler divergences between clusters distributions before and after the merges. The practical interest of the approach for functional data exploratory analysis is presented and compared with an alternative approach on an artificial and a real world data set. △ Less

Submitted 2 July, 2014; originally announced July 2014.

Journal ref: Advances in Knowledge Discovery and Management, Guillet, Fabrice and Pinaud, Bruno and Venturini, Gilles and Zighed, Djamel Abdelkader (Ed.) (2014) 15-35

arXiv:1301.2659 [pdf, other]

doi 10.1109/ICDMW.2012.61

A Triclustering Approach for Time Evolving Graphs

Authors: Romain Guigourès, Marc Boullé, Fabrice Rossi

Abstract: This paper introduces a novel technique to track structures in time evolving graphs. The method is based on a parameter free approach for three-dimensional co-clustering of the source vertices, the target vertices and the time. All these features are simultaneously segmented in order to build time segments and clusters of vertices whose edge distributions are similar and evolve in the same way ove… ▽ More This paper introduces a novel technique to track structures in time evolving graphs. The method is based on a parameter free approach for three-dimensional co-clustering of the source vertices, the target vertices and the time. All these features are simultaneously segmented in order to build time segments and clusters of vertices whose edge distributions are similar and evolve in the same way over the time segments. The main novelty of this approach lies in that the time segments are directly inferred from the evolution of the edge distribution between the vertices, thus not requiring the user to make an a priori discretization. Experiments conducted on a synthetic dataset illustrate the good behaviour of the technique, and a study of a real-life dataset shows the potential of the proposed approach for exploratory data analysis. △ Less

Submitted 12 January, 2013; originally announced January 2013.

Journal ref: Co-clustering and Applications International Conference on Data Mining Workshop, Brussels : Belgium (2012)

Showing 1–10 of 10 results for author: Guigourès, R