
Showing 1–12 of 12 results for author: Zanibbi, R

Searching in archive cs.
  1. arXiv:2502.16865

    cs.IR

    Multimodal Search in Chemical Documents and Reactions

    Authors: Ayush Kumar Shah, Abhisek Dey, Leo Luo, Bryan Amador, Patrick Philippy, Ming Zhong, Siru Ouyang, David Mark Friday, David Bianchi, Nick Jackson, Richard Zanibbi, Jiawei Han

    Abstract: We present a multimodal search tool that facilitates retrieval of chemical reactions, molecular structures, and associated text from scientific literature. Queries may combine molecular diagrams, textual descriptions, and reaction data, allowing users to connect different representations of chemical information. To support this, the indexing process includes chemical diagram extraction and parsing…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 4 pages, 2 figures, SIGIR 2025 Demonstration Submission

  2. arXiv:2410.18555

    cs.CV cs.LG

    Local and Global Graph Modeling with Edge-weighted Graph Attention Network for Handwritten Mathematical Expression Recognition

    Authors: Yejing Xie, Richard Zanibbi, Harold Mouchère

    Abstract: In this paper, we present a novel approach to Handwritten Mathematical Expression Recognition (HMER) by leveraging graph-based modeling techniques. We introduce an end-to-end model with an Edge-weighted Graph Attention Mechanism (EGAT), designed to perform simultaneous node and edge classification. This model effectively integrates node and edge features, facilitating the prediction of symbol clas…

    Submitted 24 October, 2024; originally announced October 2024.

  3. arXiv:2408.13672

    cs.IR

    ColBERT's [MASK]-based Query Augmentation: Effects of Quadrupling the Query Input Length

    Authors: Ben Giacalone, Richard Zanibbi

    Abstract: A unique aspect of ColBERT is its use of [MASK] tokens in queries to score documents (query augmentation). Prior work shows that [MASK] tokens weight non-[MASK] query terms, emphasizing certain tokens over others, rather than introducing whole new terms as initially proposed. We begin by demonstrating that a term weighting behavior previously reported for [MASK] tokens in ColBERTv1 holds for ColBER…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 5 pages, 3 figures, 2 tables

    ACM Class: H.3.3

  4. arXiv:2408.11646

    cs.IR

    Mathematical Information Retrieval: Search and Question Answering

    Authors: Richard Zanibbi, Behrooz Mansouri, Anurag Agarwal

    Abstract: Mathematical information is essential for technical work, but its creation, interpretation, and search are challenging. To help address these challenges, researchers have developed multimodal search engines and mathematical question answering systems. This book begins with a simple framework characterizing the information tasks that people and systems perform as we work to answer math-related ques…

    Submitted 14 January, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: [DRAFT] Revised (3rd) draft

    ACM Class: H.3.3; H.5.1; H.5.2

  5. arXiv:2408.09283

    cs.IR

    A Study of PHOC Spatial Region Configurations for Math Formula Retrieval

    Authors: Matt Langsenkamp, Bryan Amador, Richard Zanibbi

    Abstract: A Pyramidal Histogram Of Characters (PHOC) represents the spatial location of symbols as binary vectors. The vectors are composed of levels that split a formula into equal-sized regions of one or more types (e.g., rectangles or ellipses). For each region type, this produces a pyramid of overlapping regions, where the first level contains the entire formula, and the final level the finest-grained r…

    Submitted 17 August, 2024; originally announced August 2024.

  6. ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing

    Authors: Ayush Kumar Shah, Bryan Manrique Amador, Abhisek Dey, Ming Creekmore, Blake Ocampo, Scott Denmark, Richard Zanibbi

    Abstract: Most molecular diagram parsers recover chemical structure from raster images (e.g., PNGs). However, many PDFs include commands giving explicit locations and shapes for characters, lines, and polygons. We present a new parser that uses these born-digital PDF primitives as input. The parsing model is fast and accurate, and does not require GPUs, Optical Character Recognition (OCR), or vectorization.…

    Submitted 26 February, 2025; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 20 pages without references, 12 figures, 4 tables, submitted to International Conference on Document Analysis and Recognition (ICDAR) - Journal Track

    Journal ref: IJDAR, vol. 27, no. 3, pp. 395-414, Sep. 2024

  7. arXiv:2111.10504

    cs.IR

    Effects of context, complexity, and clustering on evaluation for math formula retrieval

    Authors: Behrooz Mansouri, Douglas W. Oard, Anurag Agarwal, Richard Zanibbi

    Abstract: There are now several test collections for the formula retrieval task, in which a system's goal is to identify useful mathematical formulae to show in response to a query posed as a formula. These test collections differ in query format, query complexity, number of queries, content source, and relevance definition. Comparisons among six formula retrieval test collections illustrate that defining r…

    Submitted 19 November, 2021; originally announced November 2021.

  8. arXiv:2003.08005

    cs.CV

    ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images

    Authors: Parag Mali, Puneeth Kukkadapu, Mahshad Mahdavi, Richard Zanibbi

    Abstract: We introduce the Scanning Single Shot Detector (ScanSSD) for locating math formulas offset from text and embedded in textlines. ScanSSD uses only visual features for detection: no formatting or typesetting information such as layout, font, or character labels are employed. Given a 600 dpi document page image, a Single Shot Detector (SSD) locates formulas at multiple scales using sliding windows, a…

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: 8 pages, 7 figures

  9. arXiv:1912.04115

    cs.IR

    Query Auto Completion for Math Formula Search

    Authors: Shaurya Rohatgi, Wei Zhong, Richard Zanibbi, Jian Wu, C. Lee Giles

    Abstract: Query Auto Completion (QAC) is among the most appealing features of a web search engine. It helps users formulate queries quickly with less effort. Although there has been much effort in this area for text, to the best of our knowledge there is little work on mathematical formula auto completion. In this paper, we implement 5 existing QAC methods on mathematical formulae and evaluate them on the NTCIR…

    Submitted 9 December, 2019; originally announced December 2019.

  10. arXiv:1507.06235

    cs.IR

    The Tangent Search Engine: Improved Similarity Metrics and Scalability for Math Formula Search

    Authors: Richard Zanibbi, Kenny Davila, Andrew Kane, Frank Tompa

    Abstract: With the ever-increasing quantity and variety of data worldwide, the Web has become a rich repository of mathematical formulae. This necessitates the creation of robust and scalable systems for Mathematical Information Retrieval, where users search for mathematical information using individual formulae (query-by-expression) or a combination of keywords and formulae. Often, the pages that best sati…

    Submitted 22 July, 2015; originally announced July 2015.

    Comments: 10 pages

    ACM Class: H.2.4; H.3.3; H.3.4

  11. arXiv:1505.02798

    cs.IR

    Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval

    Authors: Richard Zanibbi, Awelemdy Orakwue

    Abstract: We summarize math search engines and search interfaces produced by the Document and Pattern Recognition Lab in recent years, and in particular the min math search interface and the Tangent search engine. Source code for both systems is publicly available. "The Masses" refers to our emphasis on creating systems for mathematical non-experts, who may be looking to define unfamiliar notation, or brow…

    Submitted 11 May, 2015; originally announced May 2015.

    Comments: Paper for Invited Talk at 2015 Conference on Intelligent Computer Mathematics (July, Washington DC)

  12. Detecting Figures and Part Labels in Patents: Competition-Based Development of Image Processing Algorithms

    Authors: Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Menietti, Jason Crusan, Ivan Metelsky, Karim R. Lakhani

    Abstract: We report the findings of a month-long online competition in which participants developed algorithms for augmenting the digital version of patent documents published by the United States Patent and Trademark Office (USPTO). The goal was to detect figures and part labels in U.S. patent drawing pages. The challenge drew 232 teams of two, of which 70 teams (30%) submitted solutions. Collectively, tea…

    Submitted 11 November, 2014; v1 submitted 24 October, 2014; originally announced October 2014.