
Showing 1–16 of 16 results for author: Aupetit, M

Searching in archive cs.
  1. arXiv:2504.20047 [pdf]

    cs.IR cs.AI cs.DB

    HCT-QA: A Benchmark for Question Answering on Human-Centric Tables

    Authors: Mohammad S. Ahmad, Zan A. Naeem, Michaël Aupetit, Ahmed Elmagarmid, Mohamed Eltabakh, Xiaosong Ma, Mourad Ouzzani, Chaoyi Ruan

    Abstract: Tabular data embedded within PDF files, web pages, and other document formats are prevalent across numerous sectors such as government, engineering, science, and business. These human-centric tables (HCTs) possess a unique combination of high business value, intricate layouts, limited operational power at scale, and sometimes serve as the only data source for critical insights. However, their comp…

    Submitted 9 March, 2025; originally announced April 2025.

    Comments: 12 pages

  2. arXiv:2503.01097 [pdf, other]

    cs.LG

    Measuring the Validity of Clustering Validation Datasets

    Authors: Hyeon Jeon, Michaël Aupetit, DongHwa Shin, Aeri Cho, Seokhyeon Park, Jinwook Seo

    Abstract: Clustering techniques are often validated using benchmark datasets where class labels are used as ground-truth clusters. However, depending on the datasets, class labels may not align with the actual data clusters, and such misalignment hampers accurate validation. Therefore, it is essential to evaluate and compare datasets regarding their cluster-label matching (CLM), i.e., how well their class l…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  3. arXiv:2308.00278 [pdf, other]

    cs.LG

    Classes are not Clusters: Improving Label-based Evaluation of Dimensionality Reduction

    Authors: Hyeon Jeon, Yun-Hsin Kuo, Michaël Aupetit, Kwan-Liu Ma, Jinwook Seo

    Abstract: A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach is based on the assumption that the classes stay as clear clusters in the original high-dimensional space. However, in reality, this assumption can be violated; a single class can be fragmented into m…

    Submitted 11 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

    Comments: IEEE Transactions on Visualization and Computer Graphics (TVCG) (Proc. IEEE VIS 2023)

  4. arXiv:2209.10042 [pdf, other]

    cs.LG

    Sanity Check for External Clustering Validation Benchmarks using Internal Validation Measures

    Authors: Hyeon Jeon, Michael Aupetit, DongHwa Shin, Aeri Cho, Seokhyeon Park, Jinwook Seo

    Abstract: We address the lack of reliability in benchmarking clustering techniques based on labeled datasets. A standard scheme in external clustering validation is to use class labels as ground truth clusters, based on the assumption that each class forms a single, clearly separated cluster. However, as this cluster-label matching (CLM) assumption often breaks, the absence of a sanity check for the…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: Datasets available at https://github.com/hj-n/labeled-datasets

  5. arXiv:2201.12822 [pdf, other]

    cs.HC cs.LG

    ClassSPLOM -- A Scatterplot Matrix to Visualize Separation of Multiclass Multidimensional Data

    Authors: Michael Aupetit, Ahmed Ali

    Abstract: In multiclass classification of multidimensional data, the user wants to build a model of the classes to predict the label of unseen data. The model is trained on the data and tested on unseen data with known labels to evaluate its quality. The results are visualized as a confusion matrix which shows how many data labels have been predicted correctly or confused with other classes. The multidimens…

    Submitted 30 January, 2022; originally announced January 2022.

    Comments: Work presented as a poster at the IEEE VIS 2016 conference in Baltimore, MD, USA

    ACM Class: H.5.2

  6. arXiv:2201.06379 [pdf, other]

    cs.HC cs.LG

    Distortion-Aware Brushing for Interactive Cluster Analysis in Multidimensional Projections

    Authors: Hyeon Jeon, Michael Aupetit, Soohyun Lee, Hyung-Kwon Ko, Youngtaek Kim, Jinwook Seo

    Abstract: Brushing is an everyday interaction in 2D scatterplots, which allows users to select and filter data points within a continuous, enclosed region and conduct further analysis on the points. However, such conventional brushing cannot be directly applied to Multidimensional Projections (MDP), as they hardly escape from False and Missing Neighbors distortions that make the relative positions of the po…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: This work has been submitted to the IEEE Transactions on Visualization and Computer Graphics for possible publication

  7. ClustML: A Measure of Cluster Pattern Complexity in Scatterplots Learnt from Human-labeled Groupings

    Authors: Mostafa M. Abbas, Ehsan Ullah, Abdelkader Baggag, Halima Bensmail, Michael Sedlmair, Michaël Aupetit

    Abstract: Visual quality measures (VQMs) are designed to support analysts by automatically detecting and quantifying patterns in visualizations. We propose a new VQM for visual grouping patterns in scatterplots, called ClustML, which is trained on previously collected human subject judgments. Our model encodes scatterplots in the parametric space of a Gaussian Mixture Model and uses a classifier trained on…

    Submitted 1 May, 2024; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: Published in SAGE Information Visualization journal

    ACM Class: D.2.8; H.1.2; I.3.6; I.5.1; I.5.2; I.5.3

    Journal ref: Information Visualization Journal 23(2) 105-122 (2024)

  8. arXiv:2101.12511 [pdf, other]

    cs.HC

    Aquanims: Area-Preserving Animated Transitions in Statistical Data Graphics based on a Hydraulic Metaphor

    Authors: Michael Aupetit

    Abstract: We propose "aquanims" as new design metaphors for animated transitions that preserve displayed areas during the transformation. Animated transitions are used to facilitate understanding of graphical transformations between different visualizations. Area is key information to preserve during filtering or ordering transitions of area-based charts like bar charts, histograms, treemaps, or mosaic plot…

    Submitted 29 January, 2021; originally announced January 2021.

    ACM Class: I.3.3

  9. arXiv:2012.04411 [pdf, other]

    cs.HC

    An Enhanced MA Plot with R-Shiny to Ease Exploratory Analysis of Transcriptomic Data

    Authors: Ali Sheharyar, Talar Boghos Yacoubian, Dina Aljogol, Borbala Mifsud, Dena Al Thani, Michael Aupetit

    Abstract: MA plots are used to analyze the genome-wide differences in gene expression between two distinct biological conditions. An MA plot is usually rendered as a static scatter plot. Our interview with 3 experts in genomics showed that we could improve the usability of this plot by adding interactive analytic features. In this work we present the design study of the enhanced MA plot.

    Submitted 8 December, 2020; originally announced December 2020.

    Comments: Presented at BioVis 2020 Redesign Challenge @ IEEE VIS. http://biovis.net/2020/program_ieee/

  10. arXiv:2011.07532 [pdf, other]

    cs.HC cs.GR

    Aquanims -- Area-Preserving Animated Transitions based on a Hydraulic Metaphor

    Authors: Michael Aupetit

    Abstract: We propose "Aquanims" as new design metaphors for animated transitions that preserve displayed areas during the transformation. As liquids are incompressible fluids, we use a hydraulic metaphor to convey the sense of area preservation during animated transitions. We study the design space of Aquanims for rectangle-based charts.

    Submitted 15 November, 2020; originally announced November 2020.

  11. arXiv:1904.02000 [pdf, other]

    cs.SI

    Unsupervised User Stance Detection on Twitter

    Authors: Kareem Darwish, Peter Stefanov, Michaël Aupetit, Preslav Nakov

    Abstract: We present a highly effective unsupervised framework for detecting the stance of prolific Twitter users with respect to controversial topics. In particular, we use dimensionality reduction to project users onto a low-dimensional space, followed by clustering, which allows us to find core users that are representative of the different stances. Our framework has three major advantages over pre-exist…

    Submitted 21 May, 2020; v1 submitted 3 April, 2019; originally announced April 2019.

    MSC Class: 62P25; 91D30

  12. arXiv:1805.05144 [pdf, other]

    cs.SI

    A Twitter Tale of Three Hurricanes: Harvey, Irma, and Maria

    Authors: Firoj Alam, Ferda Ofli, Muhammad Imran, Michael Aupetit

    Abstract: People increasingly use microblogging platforms such as Twitter during natural disasters and emergencies. Research studies have revealed the usefulness of the data available on Twitter for several disaster response tasks. However, making sense of social media data is a challenging task due to several reasons such as limitations of available tools to analyze high-volume and high-velocity data strea…

    Submitted 15 May, 2018; v1 submitted 14 May, 2018; originally announced May 2018.

    Comments: Accepted at ISCRAM 2018 conference

  13. arXiv:1705.05283 [pdf, other]

    cs.HC

    Visualizing Dimensionality Reduction Artifacts: An Evaluation

    Authors: Nicolas Heulot, Jean-Daniel Fekete, Michael Aupetit

    Abstract: Multidimensional scaling allows visualizing high-dimensional data as 2D maps with the premise that insights in 2D reveal valid information in high-dimensions. However, the resulting projections suffer from artifacts such as bad local neighborhood preservation and clusters tearing. Interactively coloring the projection according to the discrepancy between original proximities relative to a referenc…

    Submitted 15 May, 2017; originally announced May 2017.

    ACM Class: H.5.2

  14. arXiv:1705.03691 [pdf]

    cs.HC

    Visualization of Wearable Data and Biometrics for Analysis and Recommendations in Childhood Obesity

    Authors: Michael Aupetit, Luis Fernandez-Luque, Meghna Singh, Jaideep Srivastava

    Abstract: Obesity is one of the major health risk factors behind the rise of non-communicable conditions. Understanding the factors influencing obesity is very complex since there are many variables that can affect the health behaviors leading to it. Nowadays, multiple data sources can be used to study health behaviors, such as wearable sensors for physical activity and sleep, social media, mobile and hea…

    Submitted 10 May, 2017; originally announced May 2017.

    Comments: 2-page short paper, IEEE CBMS 2017

  15. arXiv:1203.2021 [pdf]

    cs.IR

    A new supervised non-linear mapping

    Authors: Sylvain Lespinats, Anke Meyer-Baese, Michael Aupetit

    Abstract: Supervised mapping methods project multi-dimensional labeled data onto a 2-dimensional space attempting to preserve both data similarities and topology of classes. Supervised mappings are expected to help the user to understand the underlying original class structure and to classify new data visually. Several methods have been designed to achieve supervised mapping, but many of them modify origina…

    Submitted 9 March, 2012; originally announced March 2012.

    Comments: 2 pages

  16. arXiv:cs/0604046 [pdf, ps, other]

    cs.LG cs.NE

    Concerning the differentiability of the energy function in vector quantization algorithms

    Authors: Dominique Lepetz, Max Nemoz-Gaillard, Michael Aupetit

    Abstract: The adaptation rule for Vector Quantization algorithms, and consequently the convergence of the generated sequence, depends on the existence and properties of a function called the energy function, defined on a topological manifold. Our aim is to investigate the conditions of existence of such a function for a class of algorithms exemplified by the initial "K-means" and Kohonen algorithms. The…

    Submitted 11 April, 2006; originally announced April 2006.

    Comments: Under submission to a peer-reviewed international journal