THERIF: A Pipeline for Generating Themes for Readability with Iterative Feedback
Authors:
Tianyuan Cai,
Aleena Gertrudes Niklaus,
Michael Kraley,
Bernard Kerr,
Zoya Bylinskii
Abstract:
Digital reading applications give readers the ability to customize fonts, sizes, and spacings, all of which have been shown to improve the reading experience for readers from different demographics. However, tweaking these text features can be challenging, especially given how they interact to shape the final look and feel of the text. Our solution is to offer readers preset combinations of font choice and character, word, and line spacing, which we bundle together into reading themes. To arrive at a recommended set of reading themes, we present our THERIF pipeline, which combines crowdsourced text adjustments, ML-driven clustering of text formats, and design sessions. We show that after four iterations of our pipeline, we converge on a set of three COR themes (Compact, Open, and Relaxed) that meet diverse readers' preferences, as evaluated by the reading speeds, comprehension scores, and preferences of hundreds of readers with and without dyslexia in crowdsourced experiments.
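The abstract does not say which clustering method THERIF uses, so the sketch below assumes plain k-means (Lloyd's algorithm) as a stand-in for the "ML-driven clustering of text formats" step. It treats each crowdsourced adjustment as a vector of hypothetical features (font size in points plus character, word, and line spacing), standardizes them so the larger point-size values do not dominate, clusters them, and averages each cluster back into a candidate preset. All names and data are made up for illustration, and the design sessions and iterative feedback rounds of the pipeline are not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature order for one reader's adjusted text format:
# (font_size_pt, char_spacing_em, word_spacing_em, line_height_ratio)
Point = Tuple[float, ...]


def standardize(points: Sequence[Point]) -> List[Point]:
    """Z-score each feature so point sizes do not swamp em-unit spacings."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], init_centroids: Sequence[Point], iters: int = 50) -> List[int]:
    """Lloyd's k-means with caller-supplied initial centroids (deterministic)."""
    centroids = [list(c) for c in init_centroids]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else old)
        if updated == centroids:
            break  # converged: centroids stopped moving
        centroids = updated
    return labels


def theme_presets(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, Point]:
    """Average the original (unscaled) settings per cluster: one preset per theme."""
    presets = {}
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        presets[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return presets


# Made-up adjustments from six hypothetical readers.
readers = [
    (12, 0.00, 0.25, 1.20), (12, 0.01, 0.26, 1.25),
    (14, 0.05, 0.35, 1.50), (14, 0.06, 0.34, 1.55),
    (16, 0.10, 0.45, 1.80), (16, 0.11, 0.46, 1.85),
]
scaled = standardize(readers)
labels = kmeans(scaled, init_centroids=[scaled[0], scaled[2], scaled[4]])
presets = theme_presets(readers, labels)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Fixing the initial centroids keeps this toy run deterministic; in practice one would try several initializations (or a library implementation) and pick the number of clusters from the data, with designers then refining the resulting presets.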
Submitted 7 March, 2023;
originally announced March 2023.
Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network
Authors:
Xiao Yang,
Ersin Yumer,
Paul Asente,
Mike Kraley,
Daniel Kifer,
C. Lee Giles
Abstract:
We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of the underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune it on unlabeled real documents using a semi-supervised approach. We systematically study the optimal network architecture and show that both our multimodal approach and the synthetic-data pretraining significantly boost performance.
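The central idea of this abstract, classifying each pixel from both its visual appearance and the text underneath it, can be illustrated with a toy fusion step. This is a minimal sketch, not the authors' network: it assumes per-pixel visual features and a per-pixel text-embedding map have already been computed (in the paper these come from a learned fully convolutional model and the content of the underlying text), replaces the learned layers with hand-set weights, and uses hypothetical class names. It shows only channel-wise concatenation, a 1x1 convolution, and a per-pixel argmax.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # rows x cols x channels

# Hypothetical class names; the paper defines its own label set.
CLASSES = ["background", "paragraph", "heading"]


def fuse(visual: FeatureMap, text: FeatureMap) -> FeatureMap:
    """Concatenate visual and text-embedding channels at every pixel."""
    return [[v + t for v, t in zip(vrow, trow)] for vrow, trow in zip(visual, text)]


def conv1x1(features: FeatureMap, weights: List[Vector], bias: Vector) -> FeatureMap:
    """A 1x1 convolution: the same linear map applied independently at each pixel."""
    return [[[sum(w * x for w, x in zip(wk, px)) + bk for wk, bk in zip(weights, bias)]
             for px in row] for row in features]


def segment(scores: FeatureMap) -> List[List[int]]:
    """Pixel-wise prediction: index of the highest-scoring class at each pixel."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in scores]


# Toy 2x2 page: two visual channels per pixel and one text-embedding channel
# (0.0 where no text covers the pixel). All numbers are made up.
visual = [[[0.9, 0.1], [0.2, 0.8]],
          [[0.1, 0.9], [0.45, 0.55]]]
text = [[[0.0], [1.0]],
        [[0.0], [0.4]]]
no_text = [[[0.0], [0.0]],
           [[0.0], [0.0]]]

# Hand-set weights (num_classes x channels) standing in for learned ones.
W = [[1.0, 0.0, -1.0],  # background
     [0.0, 1.0, 0.0],   # paragraph
     [0.0, 0.0, 2.0]]   # heading
b = [0.0, 0.0, 0.0]

visual_only = segment(conv1x1(fuse(visual, no_text), W, b))
multimodal = segment(conv1x1(fuse(visual, text), W, b))
print(visual_only)  # [[0, 1], [1, 1]]
print(multimodal)   # [[0, 2], [1, 2]]
print([[CLASSES[i] for i in row] for row in multimodal])
```

In this toy run, two pixels change from `paragraph` to `heading` once the text channel is included, which is the kind of disambiguation that combining visual appearance with text content is meant to provide.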
Submitted 7 June, 2017;
originally announced June 2017.