Showing 1–34 of 34 results for author: Henry, C

Searching in archive cs.
  1. arXiv:2504.16840  [pdf, other]

    cs.CV

    A Low-Cost Photogrammetry System for 3D Plant Modeling and Phenotyping

    Authors: Joe Hrzich, Michael A. Beck, Christopher P. Bidinosti, Christopher J. Henry, Kalhari Manawasinghe, Karen Tanino

    Abstract: We present an open-source, low-cost photogrammetry system for 3D plant modeling and phenotyping. The system uses a structure-from-motion approach to reconstruct 3D representations of the plants via point clouds. Using wheat as an example, we demonstrate how various phenotypic traits can be computed easily from the point clouds. These include standard measurements such as plant height and radius, a…

    Submitted 23 April, 2025; originally announced April 2025.

  2. Impact of buckypaper on the mechanical properties and failure modes of composites

    Authors: Kartik Tripathi, Mohamed H. Hamza, Aditi Chattopadhyay, Todd C. Henry, Asha Hall

    Abstract: Recently, there has been an interest in the incorporation of buckypaper (BP), or carbon nanotube (CNT) membranes, in composite laminates. Research has shown that using BP in contrast to nanotube doped resin enables the introduction of a higher CNT weight fraction which offers multiple benefits including higher piezo resistivity for health monitoring applications and enhanced mechanical response fo…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: In 38th Technical Conference of the American Society for Composites, ASC 2023 (pp. 2281-2297)

    Journal ref: In 38th Technical Conference of the American Society for Composites, ASC 2023 (pp. 2281-2297). DEStech Publications

  3. arXiv:2503.02968  [pdf, other]

    cs.LG cs.CR

    Privacy-Preserving Fair Synthetic Tabular Data

    Authors: Fatima J. Sarmin, Atiquer R. Rahman, Christopher J. Henry, Noman Mohammed

    Abstract: Sharing of tabular data containing valuable but private information is limited due to legal and ethical issues. Synthetic data could be an alternative solution to this sharing problem, as it is artificially generated by machine learning algorithms and tries to capture the underlying data distribution. However, machine learning models are not free from memorization and may introduce biases, as they…

    Submitted 4 March, 2025; originally announced March 2025.

  4. arXiv:2411.12844  [pdf, other]

    cs.HC cs.CL cs.RO

    SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus

    Authors: Stephanie M. Lukin, Claire Bonial, Matthew Marge, Taylor Hudson, Cory J. Hayes, Kimberly A. Pollard, Anthony Baker, Ashley N. Foots, Ron Artstein, Felix Gervits, Mitchell Abrams, Cassidy Henry, Lucia Donatelli, Anton Leuski, Susan G. Hill, David Traum, Clare R. Voss

    Abstract: We introduce the Situated Corpus Of Understanding Transactions (SCOUT), a multi-modal collection of human-robot dialogue in the task domain of collaborative exploration. The corpus was constructed from multiple Wizard-of-Oz experiments where human participants gave verbal instructions to a remotely-located robot to move and gather information about its surroundings. SCOUT contains 89,056 utterance…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: 14 pages, 7 figures

    ACM Class: I.2.7; I.2.9; I.2.10; H.5.2; J.7

    Journal ref: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) https://aclanthology.org/2024.lrec-main.1259/

  5. Human-Robot Dialogue Annotation for Multi-Modal Common Ground

    Authors: Claire Bonial, Stephanie M. Lukin, Mitchell Abrams, Anthony Baker, Lucia Donatelli, Ashley Foots, Cory J. Hayes, Cassidy Henry, Taylor Hudson, Matthew Marge, Kimberly A. Pollard, Ron Artstein, David Traum, Clare R. Voss

    Abstract: In this paper, we describe the development of symbolic representations annotated on human-robot dialogue data to make dimensions of meaning accessible to autonomous systems participating in collaborative, natural language dialogue, and to enable common ground with human partners. A particular challenge for establishing common ground arises in remote dialogue (occurring in disaster relief or search…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: 52 pages, 14 figures

    ACM Class: I.2.7; I.2.9; I.2.10; H.5.2; J.7

    Journal ref: Language Resources and Evaluation 2024

  6. arXiv:2409.12817  [pdf, other]

    cs.CV

    Automated Linear Disturbance Mapping via Semantic Segmentation of Sentinel-2 Imagery

    Authors: Andrew M. Nagel, Anne Webster, Christopher Henry, Christopher Storie, Ignacio San-Miguel Sanchez, Olivier Tsui, Jason Duffe, Andy Dean

    Abstract: In Canada's northern regions, linear disturbances such as roads, seismic exploration lines, and pipelines pose a significant threat to the boreal woodland caribou population (Rangifer tarandus). To address the critical need for management of these disturbances, there is a strong emphasis on developing mapping approaches that accurately identify forest habitat fragmentation. The traditional approac…

    Submitted 19 September, 2024; originally announced September 2024.

  7. arXiv:2404.17607  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG cs.SI

    Utilizing Large Language Models to Identify Reddit Users Considering Vaping Cessation for Digital Interventions

    Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Caleb Henry, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

    Abstract: The widespread adoption of social media platforms globally not only enhances users' connectivity and communication but also emerges as a vital channel for the dissemination of health-related information, thereby establishing social media data as an invaluable organic data resource for public health research. The surge in popularity of vaping or e-cigarette use in the United States and other countr…

    Submitted 25 April, 2024; originally announced April 2024.

  8. arXiv:2404.03092  [pdf, other]

    cs.CL cs.RO

    Unsupervised, Bottom-up Category Discovery for Symbol Grounding with a Curious Robot

    Authors: Catherine Henry, Casey Kennington

    Abstract: Towards addressing the Symbol Grounding Problem and motivated by early childhood language development, we leverage a robot which has been equipped with an approximate model of curiosity with particular focus on bottom-up building of unsupervised categories grounded in the physical world. That is, rather than starting with a top-down symbol (e.g., a word referring to an object) and providing meanin…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 10 pages

  9. arXiv:2401.16600  [pdf, other]

    cs.CV

    Depth Anything in Medical Images: A Comparative Study

    Authors: John J. Han, Ayberk Acar, Callahan Henry, Jie Ying Wu

    Abstract: Monocular depth estimation (MDE) is a critical component of many medical tracking and mapping algorithms, particularly from endoscopic or laparoscopic video. However, because ground truth depth maps cannot be acquired from real patient data, supervised learning is not a viable approach to predict depth maps for medical scenes. Although self-supervised learning for MDE has recently gained attention…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 10 pages, 2 figures, 3 tables

  10. arXiv:2310.08470  [pdf, other]

    cs.LG cs.NE

    Strategies and impact of learning curve estimation for CNN-based image classification

    Authors: Laura Didyk, Brayden Yarish, Michael A. Beck, Christopher P. Bidinosti, Christopher J. Henry

    Abstract: Learning curves are a measure of how the performance of machine learning models improves given a certain volume of training data. Over a wide variety of applications and models it was observed that learning curves follow -- to a large extent -- a power law behavior. This makes the performance of different models for a given task somewhat predictable and opens the opportunity to reduce the trainin…

    Submitted 12 October, 2023; originally announced October 2023.

  11. arXiv:2308.05074  [pdf, other]

    cs.CY cs.AI cs.CV

    Drones4Good: Supporting Disaster Relief Through Remote Sensing and AI

    Authors: Nina Merkle, Reza Bahmanyar, Corentin Henry, Seyed Majid Azimi, Xiangtian Yuan, Simon Schopferer, Veronika Gstaiger, Stefan Auer, Anne Schneibel, Marc Wieland, Thomas Kraft

    Abstract: In order to respond effectively in the aftermath of a disaster, emergency services and relief organizations rely on timely and accurate information about the affected areas. Remote sensing has the potential to significantly reduce the time and effort required to collect such information by enabling a rapid survey of large areas. To achieve this, the main challenge is the automatic extraction of re…

    Submitted 9 August, 2023; originally announced August 2023.

  12. A comprehensive review of 3D convolutional neural network-based classification techniques of diseased and defective crops using non-UAV-based hyperspectral images

    Authors: Nooshin Noshiri, Michael A. Beck, Christopher P. Bidinosti, Christopher J. Henry

    Abstract: Hyperspectral imaging (HSI) is a non-destructive and contactless technology that provides valuable information about the structure and composition of an object. It can capture detailed information about the chemical and physical properties of agricultural crops. Due to its wide spectral range, compared with multispectral- or RGB-based imaging methods, HSI can be a more effective tool for monitorin…

    Submitted 15 June, 2023; originally announced June 2023.

    Journal ref: Smart Agricultural Technology 5 (2023) 100316

  13. arXiv:2303.05634  [pdf, other]

    cs.CV cs.LG eess.IV

    Fusarium head blight detection, spikelet estimation, and severity assessment in wheat using 3D convolutional neural networks

    Authors: Oumaima Hamila, Christopher J. Henry, Oscar I. Molina, Christopher P. Bidinosti, Maria Antonia Henriquez

    Abstract: Fusarium head blight (FHB) is one of the most significant diseases affecting wheat and other small grain cereals worldwide. The development of resistant varieties requires the laborious task of field and greenhouse phenotyping. The applications considered in this work are the automated detection of FHB disease symptoms expressed on a wheat plant, the automated estimation of the total number of spi…

    Submitted 9 March, 2023; originally announced March 2023.

  14. arXiv:2212.12056  [pdf, other]

    cs.CV cs.LG

    Semantically-consistent Landsat 8 image to Sentinel-2 image translation for alpine areas

    Authors: M. Sokolov, J. L. Storie, C. J. Henry, C. D. Storie, J. Cameron, R. S. Ødegård, V. Zubinaite, S. Stikbakke

    Abstract: The availability of frequent and cost-free satellite images is in growing demand in the research world. Such satellite constellations as Landsat 8 and Sentinel-2 provide a massive amount of valuable data daily. However, the discrepancy in the sensors' characteristics of these satellites makes it senseless to use a segmentation model trained on either dataset and applied to another, which is why do…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: 13 pages, 6 figures

  15. arXiv:2211.03854  [pdf, other]

    cs.CV cs.LG eess.IV

    Exploration of Convolutional Neural Network Architectures for Large Region Map Automation

    Authors: R. M. Tsenov, C. J. Henry, J. L. Storie, C. D. Storie, B. Murray, M. Sokolov

    Abstract: Deep learning semantic segmentation algorithms have provided improved frameworks for the automated production of Land-Use and Land-Cover (LULC) maps, which significantly increases the frequency of map generation as well as consistency of production quality. In this research, a total of 28 different model variations were examined to improve the accuracy of LULC maps. The experiments were carried ou…

    Submitted 7 November, 2022; originally announced November 2022.

  16. Inside Out: Transforming Images of Lab-Grown Plants for Machine Learning Applications in Agriculture

    Authors: A. E. Krosney, P. Sotoodeh, C. J. Henry, M. A. Beck, C. P. Bidinosti

    Abstract: Machine learning tasks often require a significant amount of training data for the resultant network to perform suitably for a given problem in any domain. In agriculture, dataset sizes are further limited by phenotypical differences between two plants of the same genotype, often as a result of differing growing conditions. Synthetically-augmented datasets have shown promise in improving existing…

    Submitted 5 November, 2022; originally announced November 2022.

    Comments: 35 pages, 23 figures

  17. High-resolution semantically-consistent image-to-image translation

    Authors: Mikhail Sokolov, Christopher Henry, Joni Storie, Christopher Storie, Victor Alhassan, Mathieu Turgeon-Pelchat

    Abstract: Deep learning has become one of remote sensing scientists' most efficient computer vision tools in recent years. However, the lack of training labels for the remote sensing datasets means that scientists need to solve the domain adaptation problem to narrow the discrepancy between satellite image datasets. As a result, image segmentation models that are then trained could better generalize and us…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 25 pages, 7 figures

  18. arXiv:2205.10955  [pdf, other]

    cs.LG

    Investigating classification learning curves for automatically generated and labelled plant images

    Authors: Michael A. Beck, Christopher P. Bidinosti, Christopher J. Henry, Manisha Ajmani

    Abstract: In the context of supervised machine learning, a learning curve describes how a model's performance on unseen data relates to the number of samples used to train the model. In this paper we present a dataset of plant images with representatives of crops and weeds common to the Manitoba prairies at different growth stages. We determine the learning curve for a classification task on this data with t…

    Submitted 30 June, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

  19. arXiv:2203.13691  [pdf, other]

    cs.CV cs.AI cs.DB cs.RO

    The TerraByte Client: providing access to terabytes of plant data

    Authors: Michael A. Beck, Christopher P. Bidinosti, Christopher J. Henry, Manisha Ajmani

    Abstract: In this paper we demonstrate the TerraByte Client, software for downloading user-defined plant datasets from a data portal hosted at Compute Canada. To that end the client offers two key functionalities: (1) It allows the user to get an overview of what data is available and a quick way to visually check samples of that data. For this the client receives the results of queries to a database and disp…

    Submitted 25 March, 2022; originally announced March 2022.

  20. arXiv:2203.02611  [pdf, other]

    cs.CV cs.LG

    Plant Species Recognition with Optimized 3D Polynomial Neural Networks and Variably Overlapping Time-Coherent Sliding Window

    Authors: Habib Ben Abdallah, Christopher J. Henry, Sheela Ramanna

    Abstract: Recently, the EAGL-I system was developed to rapidly create massive labeled datasets of plants intended to be commonly used by farmers and researchers to create AI-driven solutions in agriculture. As a result, a publicly available plant species recognition dataset composed of 40,000 images with different sizes consisting of 8 plant species was created with the system in order to demonstrate its ca…

    Submitted 29 August, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

  21. arXiv:2108.05789  [pdf, other]

    cs.CV cs.AI cs.RO

    Presenting an extensive lab- and field-image dataset of crops and weeds for computer vision tasks in agriculture

    Authors: Michael A. Beck, Chen-Yi Liu, Christopher P. Bidinosti, Christopher J. Henry, Cara M. Godee, Manisha Ajmani

    Abstract: We present two large datasets of labelled plant images that are suited to the training of machine learning and computer vision models. The first dataset encompasses, as of the day of writing, over 1.2 million images of indoor-grown crops and weeds common to the Canadian Prairies and many US states. The second dataset consists of over 540,000 images of plants imaged in farmland. All indoor plant im…

    Submitted 12 August, 2021; originally announced August 2021.

  22. arXiv:2103.14734  [pdf, other]

    eess.IV cs.CV cs.LG

    Fully Automated 2D and 3D Convolutional Neural Networks Pipeline for Video Segmentation and Myocardial Infarction Detection in Echocardiography

    Authors: Oumaima Hamila, Sheela Ramanna, Christopher J. Henry, Serkan Kiranyaz, Ridha Hamila, Rashid Mazhar, Tahir Hamid

    Abstract: Cardiac imaging known as echocardiography is a non-invasive tool utilized to produce data including images and videos, which cardiologists use to diagnose cardiac abnormalities in general and myocardial infarction (MI) in particular. Echocardiography machines can deliver abundant amounts of data that need to be quickly analyzed by cardiologists to help them make a diagnosis and treat cardiac condi…

    Submitted 3 August, 2022; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: Multimed Tools Appl (2022)

  23. arXiv:2009.04077  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    1-Dimensional polynomial neural networks for audio signal related problems

    Authors: Habib Ben Abdallah, Christopher J. Henry, Sheela Ramanna

    Abstract: In addition to being extremely non-linear, modern problems require millions if not billions of parameters to solve or at least to get a good approximation of the solution, and neural networks are known to assimilate that complexity by deepening and widening their topology in order to increase the level of non-linearity needed for a better approximation. However, compact topologies are always prefe…

    Submitted 12 January, 2022; v1 submitted 8 September, 2020; originally announced September 2020.

  24. arXiv:2007.06124  [pdf, other]

    cs.CV

    EAGLE: Large-scale Vehicle Detection Dataset in Real-World Scenarios using Aerial Imagery

    Authors: Seyed Majid Azimi, Reza Bahmanyar, Corentin Henry, Franz Kurz

    Abstract: Multi-class vehicle detection from airborne imagery with orientation estimation is an important task in the near and remote vision domains with applications in traffic monitoring and disaster management. In the last decade, we have witnessed significant progress in object detection in ground imagery, but it is still in its infancy in airborne imagery, mostly due to the scarcity of diverse and larg…

    Submitted 23 November, 2020; v1 submitted 12 July, 2020; originally announced July 2020.

    Comments: Accepted in ICPR 2020

  25. arXiv:2007.06102  [pdf, other]

    cs.CV

    SkyScapes -- Fine-Grained Semantic Understanding of Aerial Scenes

    Authors: Seyed Majid Azimi, Corentin Henry, Lars Sommer, Arne Schumann, Eleonora Vig

    Abstract: Understanding the complex urban infrastructure with centimeter-level accuracy is essential for many applications from autonomous driving to mapping, infrastructure monitoring, and urban management. Aerial images provide valuable information over a large area instantaneously; nevertheless, no current dataset captures the complexity of aerial scenes at the level of granularity required by real-world…

    Submitted 12 July, 2020; originally announced July 2020.

    Comments: Accepted in IEEE ICCV19

  26. An embedded system for the automated generation of labeled plant images to enable machine learning applications in agriculture

    Authors: Michael A. Beck, Chen-Yi Liu, Christopher P. Bidinosti, Christopher J. Henry, Cara M. Godee, Manisha Ajmani

    Abstract: A lack of sufficient training data, both in terms of variety and quantity, is often the bottleneck in the development of machine learning (ML) applications in any domain. For agricultural applications, ML-based models designed to perform tasks such as autonomous plant classification will typically be coupled to just one or perhaps a few plant species. As a consequence, each crop-specific task is v…

    Submitted 1 April, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: 35 pages, 8 figures, Preprint submitted to PLoS One

  27. ELRUNA: Elimination Rule-based Network Alignment

    Authors: Zirou Qiu, Ruslan Shaydulin, Xiaoyuan Liu, Yuri Alexeev, Christopher S. Henry, Ilya Safro

    Abstract: Networks model a variety of complex phenomena across different domains. In many applications, one of the most essential tasks is to align two or more networks to infer the similarities between cross-network vertices and discover potential node-level correspondence. In this paper, we propose ELRUNA (Elimination rule-based network alignment), a novel network alignment algorithm that relies exclusive…

    Submitted 23 February, 2021; v1 submitted 29 October, 2019; originally announced November 2019.

    Journal ref: ACM J. Exp. Algorithmics 26, 1, Article 1.7 (2021)

  28. arXiv:1810.02017  [pdf, other]

    cs.RO cs.HC

    Balancing Efficiency and Coverage in Human-Robot Dialogue Collection

    Authors: Matthew Marge, Claire Bonial, Stephanie Lukin, Cory Hayes, Ashley Foots, Ron Artstein, Cassidy Henry, Kimberly Pollard, Carla Gordon, Felix Gervits, Anton Leuski, Susan Hill, Clare Voss, David Traum

    Abstract: We describe a multi-phased Wizard-of-Oz approach to collecting human-robot dialogue in a collaborative search and navigation task. The data is being used to train an initial automated robot dialogue system to support collaborative exploration tasks. In the first phase, a wizard freely typed robot utterances to human participants. For the second phase, this data was used to design a GUI that includ…

    Submitted 7 October, 2018; v1 submitted 3 October, 2018; originally announced October 2018.

    Comments: Presented at AI-HRI AAAI-FSS, 2018 (arXiv:1809.06606)

    Report number: AI-HRI/2018/01

  29. arXiv:1807.08076  [pdf, ps, other]

    cs.CL cs.HC cs.RO

    Consequences and Factors of Stylistic Differences in Human-Robot Dialogue

    Authors: Stephanie M. Lukin, Kimberly A. Pollard, Claire Bonial, Matthew Marge, Cassidy Henry, Ron Artstein, David Traum, Clare R. Voss

    Abstract: This paper identifies stylistic differences in instruction-giving observed in a corpus of human-robot dialogue. Differences in verbosity and structure (i.e., single-intent vs. multi-intent instructions) arose naturally without restrictions or prior guidance on how users should speak with the robot. Different styles were found to produce different rates of miscommunication, and correlations were fo…

    Submitted 20 July, 2018; originally announced July 2018.

    Comments: Originally published in the Proceedings of the 19th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2018

  30. Road Segmentation in SAR Satellite Images with Deep Fully-Convolutional Neural Networks

    Authors: Corentin Henry, Seyed Majid Azimi, Nina Merkle

    Abstract: Remote sensing is extensively used in cartography. As transportation networks grow and change, extracting roads automatically from satellite images is crucial to keep maps up-to-date. Synthetic Aperture Radar satellites can provide high resolution topographical maps. However, roads are difficult to identify in these data as they look visually similar to targets such as rivers and railways. Most roa…

    Submitted 16 August, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: 5 pages, accepted for publication in IEEE Geoscience and Remote Sensing Letters

  31. arXiv:1710.06406  [pdf, other]

    cs.CL cs.AI cs.HC cs.RO

    Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz Interface for Collecting Human-Robot Dialogue

    Authors: Claire Bonial, Matthew Marge, Ron Artstein, Ashley Foots, Felix Gervits, Cory J. Hayes, Cassidy Henry, Susan G. Hill, Anton Leuski, Stephanie M. Lukin, Pooja Moolchandani, Kimberly A. Pollard, David Traum, Clare R. Voss

    Abstract: We describe the adaptation and refinement of a graphical user interface designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot dialogue data. The data collected will be used to develop a dialogue system for robot navigation. Building on an interface previously used in the development of dialogue systems for virtual agents and video playback, we add templates with open param…

    Submitted 17 October, 2017; originally announced October 2017.

    Comments: 7 pages, 2 figures, accepted for oral presentation at the Symposium on Natural Communication for Human-Robot Collaboration, AAAI Fall Symposium Series, November 9-11, 2017, https://www.aaai.org/ocs/index.php/FSS/FSS17

  32. arXiv:1310.6283  [pdf, other]

    cs.FL math.GR

    The (Nested) Word Problem

    Authors: Christopher S. Henry

    Abstract: In this article we provide a new perspective on the word problem of a group by using languages of nested words. These were introduced by Alur and Madhusudan as a way to model programming languages such as HTML. We demonstrate how a class of nested word languages called visibly pushdown can be used to study the word problem of virtually free groups in a natural way.

    Submitted 27 October, 2014; v1 submitted 23 October, 2013; originally announced October 2013.

    Comments: 1 figure

    MSC Class: 20F10; 20E05; 68Q45; 03D40

  33. arXiv:0810.1261  [pdf, ps, other]

    cs.CL cs.IR

    Soft Uncoupling of Markov Chains for Permeable Language Distinction: A New Algorithm

    Authors: Richard Nock, Pascal Vaillant, Frank Nielsen, Claudia Henry

    Abstract: Without prior knowledge, distinguishing different languages may be a hard task, especially when their borders are permeable. We develop an extension of spectral clustering -- a powerful unsupervised classification toolbox -- that is shown to resolve accurately the task of soft language distinction. At the heart of our approach, we replace the usual hard membership assignment of spectral clusteri…

    Submitted 7 October, 2008; originally announced October 2008.

    Comments: 6 pages, 7 embedded figures, LaTeX 2e using the ecai2006.cls document class and the algorithm2e.sty style file (+ standard packages like epsfig, amsmath, amssymb, amsfonts...). Extends the short version contained in the ECAI 2006 proceedings

    ACM Class: H.3.3; I.2.7

    Journal ref: ECAI 2006: 17th European Conference on Artificial Intelligence. Riva del Garda, Italy, 29 August - 1st September 2006

  34. arXiv:0810.1212  [pdf, ps, other]

    cs.CL cs.IR

    Analyse spectrale des textes: détection automatique des frontières de langue et de discours

    Authors: Pascal Vaillant, Richard Nock, Claudia Henry

    Abstract: We propose a theoretical framework within which information on the vocabulary of a given corpus can be inferred on the basis of statistical information gathered on that corpus. Inferences can be made on the categories of the words in the vocabulary, and on their syntactical properties within particular languages. Based on the same statistical data, it is possible to build matrices of syntagmatic…

    Submitted 7 October, 2008; originally announced October 2008.

    Comments: In French. 10 pages, 5 figures, LaTeX 2e using EPSF and custom package taln2006.sty (designed by Pierre Zweigenbaum, ATALA). Proceedings of the 13th annual French-speaking conference on Natural Language Processing: "Traitement Automatique des Langues Naturelles" (TALN 2006), Louvain (Leuven), Belgium, 10-13 April 2006

    ACM Class: H.3.3; I.2.7

    Journal ref: Verbum ex machina: Actes de la 13eme conference annuelle sur le Traitement Automatique des Langues Naturelles (TALN 2006), p. 619-629. Louvain (Leuven), Belgique, 10-13 avril 2006