Showing 1–50 of 51 results for author: Schubotz, M

  1. An Overview of zbMATH Open Digital Library

    Authors: Madhurima Deb, Isabel Beckenbach, Matteo Petrera, Dariush Ehsani, Marcel Fuhrmann, Yun Hao, Olaf Teschke, Moritz Schubotz

    Abstract: Mathematical research thrives on the effective dissemination and discovery of knowledge. zbMATH Open has emerged as a pivotal platform in this landscape, offering a comprehensive repository of mathematical literature. Beyond indexing and abstracting, it serves as a unified quality-assured infrastructure for finding, evaluating, and connecting mathematical information that advances mathematical r…

    Submitted 9 October, 2024; originally announced October 2024.

  2. arXiv:2407.06720  [pdf, ps, other]

    cs.DL cs.HC

    Author Intent: Eliminating Ambiguity in MathML

    Authors: David Carlisle, Paul Libbrecht, Moritz Schubotz, Neil Soiffer

    Abstract: MathML has been successful in improving the accessibility of mathematical notation on the web. All major screen readers support MathML to generate speech, allow navigation of the math, and generate braille. A troublesome area remains: handling ambiguous notations such as \( \vert x\vert\). While it is possible to speak this syntactically, anecdotal evidence indicates most people prefer semantic sp…

    Submitted 23 February, 2025; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution is published in Int. Conf. on Computers Helping People with Special Needs and will be available online at TBD

  3. arXiv:2406.03858  [pdf, ps, other]

    cs.IR cs.DL

    Reducing the climate impact of data portals: a case study

    Authors: Noah Gießing, Madhurima Deb, Ankit Satpute, Moritz Schubotz, Olaf Teschke

    Abstract: The carbon footprint share of the information and communication technology (ICT) sector has steadily increased in the past decade and is predicted to make up as much as 23% of global emissions in 2030. This shows a pressing need for developers, including the information retrieval community, to make their code more energy-efficient. In this project proposal, we discuss techniques to reduce the en…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 4 pages

  4. arXiv:2404.00344  [pdf, other]

    cs.CL cs.AI cs.IR

    Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange

    Authors: Ankit Satpute, Noah Giessing, Andre Greiner-Petter, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp

    Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities in various natural language tasks, often achieving performances that surpass those of humans. Despite these advancements, the domain of mathematics presents a distinctive challenge, primarily due to its specialized structure and the precision it demands. In this study, we adopted a two-step approach for investigating the profi…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted for publication at the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), July 14–18, 2024, Washington D.C., USA

  5. Taxonomy of Mathematical Plagiarism

    Authors: Ankit Satpute, Andre Greiner-Petter, Noah Gießing, Isabel Beckenbach, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp

    Abstract: Plagiarism is a pressing concern, even more so with the availability of large language models. Existing plagiarism detection systems reliably find copied and moderately reworded text but fail for idea plagiarism, especially in mathematical science, which heavily uses formal mathematical notation. We make two contributions. First, we establish a taxonomy of mathematical content reuse by annotating…

    Submitted 31 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: 46th European Conference on Information Retrieval (ECIR)

  6. arXiv:2401.16786  [pdf]

    cs.DL

    WikiTexVC: MediaWiki's native LaTeX to MathML converter for Wikipedia

    Authors: Johannes Stegmüller, Moritz Schubotz

    Abstract: MediaWiki and Wikipedia authors usually use LaTeX to define mathematical formulas in the wiki text markup. In the Wikimedia ecosystem, these formulas were processed by a long cascade of web services and finally delivered to users' browsers in rendered form for visually readable representation as SVG. With the latest developments of supporting MathML Core in Chromium-based browsers, MathML contin…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 9 pages

  7. arXiv:2309.11829  [pdf, ps, other]

    cs.DL

    Making Mathematical Research Data FAIR: A Technology Overview

    Authors: Tim Conrad, Eloi Ferrer, Daniel Mietchen, Larissa Pusch, Johannes Stegmuller, Moritz Schubotz

    Abstract: The sharing and citation of research data is becoming increasingly recognized as an essential building block in scientific research across various fields and disciplines. Sharing research data allows other researchers to reproduce results, replicate findings, and build on them. Ultimately, this will foster faster cycles in knowledge generation. Some disciplines, such as astronomy or bioinformatics…

    Submitted 21 September, 2023; originally announced September 2023.

  8. arXiv:2309.11484  [pdf, other]

    cs.DL cs.IR

    Bravo MaRDI: A Wikibase Powered Knowledge Graph on Mathematics

    Authors: Moritz Schubotz, Eloi Ferrer, Johannes Stegmüller, Daniel Mietchen, Olaf Teschke, Larissa Pusch, Tim OF Conrad

    Abstract: Mathematical world knowledge is a fundamental component of Wikidata. However, to date, no expertly curated knowledge graph has focused specifically on contemporary mathematics. Addressing this gap, the Mathematical Research Data Initiative (MaRDI) has developed a comprehensive knowledge graph that links multimodal research data in mathematics. This encompasses traditional research data items like…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at Wikidata'23: Wikidata workshop at ISWC 2023

  9. arXiv:2305.16433  [pdf, other]

    cs.CL cs.SC stat.AP

    Neural Machine Translation for Mathematical Formulae

    Authors: Felix Petersen, Moritz Schubotz, Andre Greiner-Petter, Bela Gipp

    Abstract: We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages. Compared to neural machine translation on natural language, mathematical formulae have a much smaller vocabulary and much longer sequences of symbols, while their translation requires extreme precision to satisfy mathematical information needs. In…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Published at ACL 2023

  10. arXiv:2305.13193  [pdf, other]

    cs.IR

    TEIMMA: The First Content Reuse Annotator for Text, Images, and Math

    Authors: Ankit Satpute, André Greiner-Petter, Moritz Schubotz, Norman Meuschke, Akiko Aizawa, Olaf Teschke, Bela Gipp

    Abstract: This demo paper presents the first tool to annotate the reuse of text, images, and mathematical formulae in a document pair -- TEIMMA. Annotating content reuse is particularly useful to develop plagiarism detection algorithms. Real-world content reuse is often obfuscated, which makes it challenging to identify such cases. TEIMMA allows entering the obfuscation type to enable novel classifications…

    Submitted 13 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  11. Methods and Tools to Advance the Retrieval of Mathematical Knowledge from Digital Libraries for Search-, Recommendation-, and Assistance-Systems

    Authors: Bela Gipp, André Greiner-Petter, Moritz Schubotz, Norman Meuschke

    Abstract: This project investigated new approaches and technologies to enhance the accessibility of mathematical content and its semantic information for a broad range of information retrieval applications. To achieve this goal, the project addressed three main research challenges: (1) syntactic analysis of mathematical expressions, (2) semantic enrichment of mathematical expressions, and (3) evaluation usi…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: The final report for the DFG-Project MathIR - July 1st, 2018 - December 31st, 2022

    Report number: GI 1259-1; ACM Class: H.3.0

  12. Introducing Peer Copy -- A Fully Decentralized Peer-to-Peer File Transfer Tool

    Authors: Dennis Trautwein, Moritz Schubotz, Bela Gipp

    Abstract: It allows any two parties that are either both on the same network or connected via the internet to transfer the contents of a file based on a particular sequence of words. Peer discovery happens via multicast DNS if both peers are on the same network or via entries in the distributed hash table (DHT) of the InterPlanetary File-System (IPFS) if both peers are connected across network boundaries. A…

    Submitted 3 May, 2023; originally announced May 2023.

    Journal ref: 2021 IFIP Networking Conference

  13. arXiv:2303.01994  [pdf, other]

    cs.IR cs.LG

    Discovery and Recognition of Formula Concepts using Machine Learning

    Authors: Philipp Scharpf, Moritz Schubotz, Howard S. Cohl, Corinna Breitinger, Bela Gipp

    Abstract: Citation-based Information Retrieval (IR) methods for scientific documents have proven effective for IR applications, such as Plagiarism Detection or Literature Recommender Systems in academic disciplines that use many references. In science, technology, engineering, and mathematics, researchers often employ mathematical concepts through formula notation to refer to prior knowledge. Our long-term…

    Submitted 19 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: Accepted by Scientometrics (Springer) journal

    MSC Class: 68P20 (Primary); 68T50 (Secondary). ACM Class: H.3.3; I.2.7

  14. Collaborative and AI-aided Exam Question Generation using Wikidata in Education

    Authors: Philipp Scharpf, Moritz Schubotz, Andreas Spitz, Andre Greiner-Petter, Bela Gipp

    Abstract: Since the COVID-19 outbreak, the use of digital learning or education platforms has significantly increased. Teachers now digitally distribute homework and provide exercise questions. In both cases, teachers need to continuously develop novel and individual questions. This process can be very time-consuming and should be facilitated and accelerated both through exchange with other teachers and by…

    Submitted 15 November, 2022; originally announced November 2022.

    MSC Class: 68Uxx; ACM Class: H.4

  15. arXiv:2211.06664  [pdf]

    cs.IR

    Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling

    Authors: Philipp Scharpf, Moritz Schubotz, Bela Gipp

    Abstract: The increasing number of questions on Question Answering (QA) platforms like Math Stack Exchange (MSE) signifies a growing information need to answer math-related questions. However, there is currently very little research on approaches for an open data QA system that retrieves mathematical formulae using their concept names or querying formula identifier relationships from knowledge graphs. In th…

    Submitted 12 November, 2022; originally announced November 2022.

    MSC Class: 68Uxx; ACM Class: H.4

  16. Caching and Reproducibility: Making Data Science experiments faster and FAIRer

    Authors: Moritz Schubotz, Ankit Satpute, Andre Greiner-Petter, Akiko Aizawa, Bela Gipp

    Abstract: Small to medium-scale data science experiments often rely on research software developed ad-hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and open access. The consequence is twofold. First, subsequent researchers must spend significant work hours building upon the proposed hypotheses or experimental framework. In the worst case, o…

    Submitted 9 November, 2022; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 8 pages, 1 table

    Journal ref: Frontiers in Research Metrics and Analytics, volume 7, 2022

  17. Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web

    Authors: Dennis Trautwein, Aravindh Raman, Gareth Tyson, Ignacio Castro, Will Scott, Moritz Schubotz, Bela Gipp, Yiannis Psaras

    Abstract: Recent years have witnessed growing consolidation of web operations. For example, the majority of web traffic now originates from a few organizations, and even micro-websites often choose to host on large pre-existing cloud infrastructures. In response to this, the "Decentralized Web" attempts to distribute ownership and operation of web services more evenly. This paper describes the design and im…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: 14 pages, 11 figures

    ACM Class: C.2.2; C.2.1

    Journal ref: SIGCOMM '22, August 22-26, 2022, Amsterdam, Netherlands

  18. arXiv:2205.01058  [pdf, other]

    cs.CY physics.data-an

    Electronic Laboratory Notebook: A lazy approach

    Authors: Simon Schubotz, Moritz Schubotz, Günter K Auernhammer

    Abstract: Good research data management is essential in modern-day lab work. Various solutions exist that are either highly specific or need a significant effort to be customized appropriately. This paper presents an integrated solution for individuals and small groups of researchers in data-driven deductive research. Our electronic lab book generates itself out of notes and files, which are generated by on…

    Submitted 27 April, 2022; originally announced May 2022.

    Comments: Paper was submitted to jors https://openresearchsoftware.metajnl.com/

  19. Comparative Verification of the Digital Library of Mathematical Functions and Computer Algebra Systems

    Authors: André Greiner-Petter, Howard S. Cohl, Abdou Youssef, Moritz Schubotz, Avi Trost, Rajen Dey, Akiko Aizawa, Bela Gipp

    Abstract: Digital mathematical libraries assemble the knowledge of years of mathematical research. Numerous disciplines (e.g., physics, engineering, pure and applied mathematics) rely heavily on compendia of gathered findings. Likewise, modern research applications rely more and more on computational solutions, which are often calculated and verified by computer algebra systems. Hence, the correctness, accurac…

    Submitted 31 March, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Journal ref: In: TACAS, Apr. 2022, pp. 87-105

  20. arXiv:2112.08110  [pdf]

    cs.DB

    Academic Storage Cluster

    Authors: Alexander von Tottleben, Cornelius Ihle, Moritz Schubotz, Bela Gipp

    Abstract: Decentralized storage is still rarely used in an academic and educational environment, although it offers better availability than conventional systems. It still happens that data is not available at a certain time due to heavy load or maintenance on university servers. A decentralized solution can help keep the data available and distribute the load among several peers. In our experiment, we crea…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: 2 pages, 2 figures, Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), poster paper

    ACM Class: H.3.2; H.3.7; E.2

  21. Detecting Cross-Language Plagiarism using Open Knowledge Graphs

    Authors: Johannes Stegmüller, Fabian Bauer-Marquart, Norman Meuschke, Terry Ruas, Moritz Schubotz, Bela Gipp

    Abstract: Identifying cross-language plagiarism is challenging, especially for distant language pairs and sense-for-sense translations. We introduce the new multilingual retrieval model Cross-Language Ontology-Based Similarity Analysis (CL-OSA) for this task. CL-OSA represents documents as entity vectors obtained from the open knowledge graph Wikidata. Opposed to other methods, CL-OSA does not require compu…

    Submitted 16 December, 2021; v1 submitted 18 November, 2021; originally announced November 2021.

    Comments: 10 pages, EEKE21, Preprint

  22. Automated Symbolic and Numerical Testing of DLMF Formulae using Computer Algebra Systems

    Authors: Howard S. Cohl, André Greiner-Petter, Moritz Schubotz

    Abstract: We have developed an automated procedure for symbolic and numerical testing of formulae extracted from the NIST Digital Library of Mathematical Functions (DLMF). For the NIST Digital Repository of Mathematical Formulae, we have developed conversion tools from semantic LaTeX to the Computer Algebra System (CAS) Maple which relies on Youssef's part-of-math tagger. We convert a test data subset of 4,…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: Appeared in the Proceedings of the 11th International Conference on Intelligent Computer Mathematics (CICM) 2018

  23. Semantic Preserving Bijective Mappings of Mathematical Formulae between Document Preparation Systems and Computer Algebra Systems

    Authors: Howard S. Cohl, Moritz Schubotz, Abdou Youssef, André Greiner-Petter, Jürgen Gerhard, Bonita V. Saunders, Marjorie A. McClain

    Abstract: Document preparation systems like LaTeX offer the ability to render mathematical expressions as one would write these on paper. Using LaTeX, LaTeXML, and tools generated for use in the National Institute of Standards and Technology (NIST) Digital Library of Mathematical Functions, semantically enhanced mathematical LaTeX markup (semantic LaTeX) is achieved by using a semantic macro set. Computer algebra systems…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: Proceedings of the 10th International Conference on Intelligent Computer Mathematics (CICM)

  24. MathTools: An Open API for Convenient MathML Handling

    Authors: André Greiner-Petter, Moritz Schubotz, Howard S. Cohl, Bela Gipp

    Abstract: Mathematical formulae carry complex and essential semantic information in a variety of formats. Accessing this information with different systems requires a standardized machine-readable format that is capable of encoding presentational and semantic information. Even though MathML is an official recommendation by W3C and an ISO standard for representing mathematical expressions, we could identify…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: Published in Proceedings of the International Conference on Intelligent Computer Mathematics (CICM) 2018

  25. arXiv:2109.00954  [pdf, other]

    cs.IR

    Towards Explaining STEM Document Classification using Mathematical Entity Linking

    Authors: Philipp Scharpf, Moritz Schubotz, Bela Gipp

    Abstract: Document subject classification is essential for structuring (digital) libraries and allowing readers to search within a specific field. Currently, the classification is typically made by human domain experts. Semi-supervised Machine Learning algorithms can support them by exploiting the labeled data to predict subject classes for unclassified new documents. However, while humans partly do, machin…

    Submitted 2 September, 2021; originally announced September 2021.

  26. arXiv:2107.13877  [pdf, ps, other]

    cs.DL math.HO

    10 Years Later: The Mathematics Subject Classification and Linked Open Data

    Authors: Susanne Arndt, Patrick Ion, Mila Runnwerth, Moritz Schubotz, Olaf Teschke

    Abstract: Ten years ago, the Mathematics Subject Classification MSC 2010 was released, and a corresponding machine-readable Linked Open Data collection was published using the Simple Knowledge Organization System (SKOS). Now, the new MSC 2020 is out. This paper recaps the last ten years of working on machine-readable MSC data and presents the new machine-readable MSC 2020. We describe the processing requi…

    Submitted 2 August, 2021; v1 submitted 29 July, 2021; originally announced July 2021.

    Comments: Extended version of the CICM article

    MSC Class: 00-01. ACM Class: G.m; E.m

  27. arXiv:2106.04664  [pdf, other]

    cs.DL

    zbMATH Open: API Solutions and Research Challenges

    Authors: Matteo Petrera, Dennis Trautwein, Isabel Beckenbach, Dariush Ehsani, Fabian Mueller, Olaf Teschke, Bela Gipp, Moritz Schubotz

    Abstract: We present zbMATH Open, the most comprehensive collection of reviews and bibliographic metadata of scholarly literature in mathematics. Besides our website https://zbMATH.org which is openly accessible since the beginning of this year, we provide API endpoints to offer our data. The API improves interoperability with others, i.e., digital libraries, and allows using our data for research purposes.…

    Submitted 23 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  28. arXiv:2104.05111  [pdf, other]

    cs.DL

    Fast Linking of Mathematical Wikidata Entities in Wikipedia Articles Using Annotation Recommendation

    Authors: Philipp Scharpf, Moritz Schubotz, Bela Gipp

    Abstract: Mathematical information retrieval (MathIR) applications such as semantic formula search and question answering systems rely on knowledge-bases that link mathematical expressions to their natural language names. For database population, mathematical formulae need to be annotated and linked to semantic concepts, which is very time-consuming. In this paper, we present our approach to structure and s…

    Submitted 11 April, 2021; originally announced April 2021.

  29. arXiv:2012.02413  [pdf]

    cs.DL

    ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?

    Authors: Philipp Scharpf, Moritz Schubotz, Andre Greiner-Petter, Malte Ostendorff, Olaf Teschke, Bela Gipp

    Abstract: The zbMATH database contains more than 4 million bibliographic entries. We aim to provide easy access to these entries. Therefore, we maintain different index structures, including a formula index. To optimize the findability of the entries in our database, we continuously investigate new approaches to satisfy the information needs of our users. We believe that the findings from the ARQMath evalua…

    Submitted 10 December, 2020; v1 submitted 4 December, 2020; originally announced December 2020.

    Comments: in Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020 http://ceur-ws.org/Vol-2696/paper_200.pdf

  30. AutoMSC: Automatic Assignment of Mathematics Subject Classification Labels

    Authors: Moritz Schubotz, Philipp Scharpf, Olaf Teschke, Andreas Kuehnemund, Corinna Breitinger, Bela Gipp

    Abstract: Authors of research papers in mathematics and other math-heavy disciplines commonly employ the Mathematics Subject Classification (MSC) scheme to search for relevant literature. The MSC is a hierarchical alphanumerical classification scheme that allows librarians to specify one or multiple codes for publications. Digital Libraries in Mathematics, as well as reviewing services, such…

    Submitted 9 November, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

    Journal ref: Intelligent Computer Mathematics - 13th International Conference, CICM 2020, Bertinoro, Italy, July 26-31, 2020, Proceedings

  31. arXiv:2005.11504  [pdf, ps, other]

    cs.CR cs.DL cs.IR

    A First Step Towards Content Protecting Plagiarism Detection

    Authors: Cornelius Ihle, Moritz Schubotz, Norman Meuschke, Bela Gipp

    Abstract: Plagiarism detection systems are essential tools for safeguarding academic and educational integrity. However, today's systems require disclosing the full content of the input documents and the document collection to which the input documents are compared. Moreover, the systems are centralized and under the control of individual, typically commercial providers. This situation raises procedural and…

    Submitted 23 May, 2020; originally announced May 2020.

    Comments: Submitted to JCDL 2020: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20), August 1-5, 2020, Virtual Event, China

    ACM Class: H.3.0; H.3.4; H.3.7

  32. arXiv:2005.11021  [pdf, other]

    cs.DL cs.CL cs.IR cs.LG

    Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language

    Authors: Philipp Scharpf, Moritz Schubotz, Abdou Youssef, Felix Hamborg, Norman Meuschke, Bela Gipp

    Abstract: In this paper, we show how selecting and combining encodings of natural and mathematical language affect classification and clustering of documents with mathematical content. We demonstrate this by using sets of documents, sections, and abstracts from the arXiv preprint server that are labeled by their subject class (mathematics, computer science, physics, etc.) to compare different encodings of t…

    Submitted 22 May, 2020; originally announced May 2020.

    Journal ref: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries JCDL 2020

  33. arXiv:2003.09881  [pdf, other]

    cs.DL cs.CL cs.IR

    Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles

    Authors: Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp

    Abstract: Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to distinguish what is the relationship that makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between do…

    Submitted 22 March, 2020; originally announced March 2020.

    Comments: Accepted at ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020)

  34. Mathematical Formulae in Wikimedia Projects 2020

    Authors: Moritz Schubotz, André Greiner-Petter, Norman Meuschke, Olaf Teschke, Bela Gipp

    Abstract: This poster summarizes our contributions to Wikimedia's processing pipeline for mathematical formulae. We describe how we have supported the transition from rendering formulae as coarse-grained PNG images in 2001 to providing modern semantically enriched language-independent MathML formulae in 2020. Additionally, we describe our plans to improve the accessibility and discoverability of mathematica…

    Submitted 6 May, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

    Comments: Submitted to JCDL 2020: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20), August 1-5, 2020, Virtual Event, China

  35. Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations

    Authors: Andre Greiner-Petter, Moritz Schubotz, Fabian Mueller, Corinna Breitinger, Howard S. Cohl, Akiko Aizawa, Bela Gipp

    Abstract: Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access…

    Submitted 22 June, 2021; v1 submitted 7 February, 2020; originally announced February 2020.

    Comments: Proceedings of The Web Conference 2020 (WWW'20), April 20–24, 2020, Taipei, Taiwan

  36. arXiv:1909.10266  [pdf]

    cs.IR

    NewsDeps: Visualizing the Origin of Information in News Articles

    Authors: Felix Hamborg, Philipp Meschenmoser, Moritz Schubotz, Bela Gipp

    Abstract: In scientific publications, citations allow readers to assess the authenticity of the presented information and verify it in the original context. News articles, however, do not contain citations and only rarely refer readers to further sources. Readers often cannot assess the authenticity of the presented information as its origin is unclear. We present NewsDeps, the first approach that analyzes…

    Submitted 23 September, 2019; originally announced September 2019.

  37. arXiv:1907.01642  [pdf, other]

    cs.IR cs.CL cs.DL

    Introducing MathQA -- A Math-Aware Question Answering System

    Authors: Moritz Schubotz, Philipp Scharpf, Kaushal Dudhat, Yash Nagar, Felix Hamborg, Bela Gipp

    Abstract: We present an open source math-aware Question Answering System based on Ask Platypus. Our system returns a single mathematical formula for a natural language question in English or Hindi. These formulae originate from the knowledge-base Wikidata. We translate these formulae to computable data by integrating the calculation engine sympy into our system. This way, users can enter numeric values fo…

    Submitted 28 June, 2019; originally announced July 2019.

    Comments: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Workshop on Knowledge Discovery (2018)

  38. Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations

    Authors: Norman Meuschke, Vincent Stange, Moritz Schubotz, Michael Karmer, Bela Gipp

    Abstract: Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However, reliably detecting concealed plagiarism, such as strong paraphrases, translations, and the reuse of nontextual content and ideas is an open research problem. In t…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) 2019. The data and code of our study are openly available at https://purl.org/hybridPD

  39. Semantic Preserving Bijective Mappings for Expressions involving Special Functions in Computer Algebra Systems and Document Preparation Systems

    Authors: Andre Greiner-Petter, Moritz Schubotz, Howard S. Cohl, Bela Gipp

    Abstract: Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. Our goal is to automate this translation. This paper uses Maple and Mathematica…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: This work was supported by the German Research Foundation (DFG, grant GI-1259-1)

  40. arXiv:1905.08359  [pdf, other]

    cs.DL cs.AI cs.IR

    Why Machines Cannot Learn Mathematics, Yet

    Authors: André Greiner-Petter, Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp

    Abstract: Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods. However, while mathematics is a precise and accurate science, it is usually expressed by less accurate and imprecise descriptions, contributing to the relative dearth of machine learning applications for IR in this domain. Generally, mathematical documents communica…

    Submitted 20 May, 2019; originally announced May 2019.

    Comments: Submitted to 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries colocated at the 42nd International ACM SIGIR Conference

    Journal ref: 2019 http://ceur-ws.org/Vol-2414/paper14.pdf

  41. Forms of Plagiarism in Digital Mathematical Libraries

    Authors: Moritz Schubotz, Olaf Teschke, Vincent Stange, Norman Meuschke, Bela Gipp

    Abstract: We report on an exploratory analysis of the forms of plagiarism observable in mathematical publications, which we identified by investigating editorial notes from zbMATH. While most cases we encountered were simple copies of earlier work, we also identified several forms of disguised plagiarism. We investigated 11 cases in detail and evaluated how current plagiarism detection systems perform in ide…

    Submitted 9 September, 2019; v1 submitted 8 May, 2019; originally announced May 2019.

    Journal ref: Intelligent Computer Mathematics - 12th International Conference, CICM 2019, Prague, Czech Republic, July 8-12, 2019, Proceedings

  42. arXiv:1904.00237  [pdf, other]

    cs.CY

    A decentralized method for making sensor measurements tamper-proof to support open science applications

    Authors: Patrick Wortner, Moritz Schubotz, Corinna Breitinger, Stephan Leible, Bela Gipp

    Abstract: Open science has become a synonym for modern, digital and inclusive science. Inclusion does not stop at open access. Inclusion also requires transparency through open datasets and the right and ability to take part in the knowledge creation process. This implies new challenges for digital libraries. Citizens should be able to contribute data in a curatable form to advance science. At the same time…

    Submitted 2 April, 2019; v1 submitted 30 March, 2019; originally announced April 2019.

  43. arXiv:1811.04234  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Towards Formula Translation using Recursive Neural Networks

    Authors: Felix Petersen, Moritz Schubotz, Bela Gipp

    Abstract: While it has become common to perform automated translations on natural language, performing translations between different representations of mathematical formulae has thus far not been possible. We implemented the first translator for mathematical formulae based on recursive neural networks. We chose recursive neural networks because mathematical formulae inherently include a structural encoding…

    Submitted 10 November, 2018; originally announced November 2018.

    Comments: 11 pages, Work-in-Progress paper in CICM-WS 2018 Workshop Papers at 11th Conference on Intelligent Computer Mathematics CICM 2018

    Journal ref: Conference on Intelligent Computer Mathematics (CICM) 2018, CEUR-WS Vol-2307, WiP3

  44. Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

    Authors: Moritz Schubotz, Andre Greiner-Petter, Philipp Scharpf, Norman Meuschke, Howard Cohl, Bela Gipp

    Abstract: Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable f…

    Submitted 13 April, 2018; originally announced April 2018.

    Comments: 10 pages, 4 figures

    Journal ref: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Jun. 2018, Fort Worth, USA

  45. VMEXT: A Visualization Tool for Mathematical Expression Trees

    Authors: Moritz Schubotz, Norman Meuschke, Thomas Hepp, Howard S. Cohl, Bela Gipp

    Abstract: Mathematical expressions can be represented as a tree consisting of terminal symbols, such as identifiers or numbers (leaf nodes), and functions or operators (non-leaf nodes). Expression trees are an important mechanism for storing and processing mathematical expressions as well as the most frequently used visualization of the structure of mathematical expressions. Typically, researchers and pract…

    Submitted 12 July, 2017; originally announced July 2017.

    Comments: 15 pages, 4 figures, Intelligent Computer Mathematics - 10th International Conference CICM 2017, Edinburgh, UK, July 17-21, 2017, Proceedings

    ACM Class: H.3.3; H.5.2

    Journal ref: Lecture Notes in Computer Science, Springer 2017

  46. arXiv:1505.01431  [pdf, ps, other]

    cs.DL cs.IR

    Growing the Digital Repository of Mathematical Formulae with Generic LaTeX Sources

    Authors: Howard S. Cohl, Moritz Schubotz, Marjorie A. McClain, Bonita V. Saunders, Cherry Y. Zou, Azeem S. Mohammed, Alex A. Danoff

    Abstract: One initial goal for the DRMF is to seed our digital compendium with fundamental orthogonal polynomial formulae. We used the data from the NIST Digital Library of Mathematical Functions (DLMF) as the initial seed for our DRMF project. The DLMF input LaTeX source already contains some semantic information encoded using a highly customized set of semantic LaTeX macros. Those macros could be converte…

    Submitted 10 May, 2015; v1 submitted 6 May, 2015; originally announced May 2015.

    Comments: I included an extra, unrelated PNG file in the zip directory, and it was mistakenly referenced on a page 9. I previously tried unsuccessfully to fix this. I have removed the PNG file, and the paper is now only 8 pages, as it should be

  47. arXiv:1407.0167  [pdf, other]

    cs.DL cs.CL cs.IR

    Mathematical Language Processing Project

    Authors: Robert Pagael, Moritz Schubotz

    Abstract: In natural language, words and phrases themselves imply the semantics. In contrast, the meaning of identifiers in mathematical formulae is undefined. Thus scientists must study the context to decode the meaning. The Mathematical Language Processing (MLP) project aims to support that process. In this paper, we compare two approaches to discover identifier-definition tuples. At first we use a simple…

    Submitted 1 July, 2014; originally announced July 2014.

    Comments: 8 pages, one figure, Conferences on Intelligent Computer Mathematics (CICM) 2014

    Report number: urn:nbn:de:0074-1186-1

  48. Digital Repository of Mathematical Formulae

    Authors: Howard S. Cohl, Marjorie A. McClain, Bonita V. Saunders, Moritz Schubotz, Janelle C. Williams

    Abstract: The purpose of the NIST Digital Repository of Mathematical Formulae (DRMF) is to create a digital compendium of mathematical formulae for orthogonal polynomials and special functions (OPSF) and of associated mathematical data. The DRMF addresses needs of working mathematicians, physicists and engineers: providing a platform for publication and interaction with OPSF formulae on the web. Using Media…

    Submitted 14 May, 2014; v1 submitted 25 April, 2014; originally announced April 2014.

  49. Mathoid: Robust, Scalable, Fast and Accessible Math Rendering for Wikipedia

    Authors: Moritz Schubotz, Gabriel Wicke

    Abstract: Wikipedia is the first address for scientists who want to recap basic mathematical and physical laws and concepts. Today, formulae in those pages are displayed as Portable Network Graphics images. Those images do not integrate well into the text, cannot be edited after copying, are inaccessible to screen readers for people with special needs, do not support line breaks for small screens and do no…

    Submitted 24 April, 2014; originally announced April 2014.

    Comments: 12 pages, accepted at Conferences on Intelligent Computer Mathematics CICM2014

  50. arXiv:1304.5475  [pdf, other]

    cs.DL

    Making Math Searchable in Wikipedia

    Authors: Moritz Schubotz

    Abstract: Wikipedia, the world's largest encyclopedia, contains a lot of knowledge that is expressed exclusively as formulae. Unfortunately, this knowledge is currently not fully accessible to intelligent information retrieval systems. This immense body of knowledge is hidden from value-added services, such as search. In this paper, we present our MathSearch implementation for Wikipedia that enables users to p…

    Submitted 19 April, 2013; originally announced April 2013.

    Comments: 7 pages, 2 figures, Conference on Intelligent Computer Mathematics, July 9-14 2012, Bremen, Germany. To be published in Lecture Notes, Artificial Intelligence, Springer

    MSC Class: 68T10