Discovery and Recognition of Formula Concepts using Machine Learning
Format: | Article |
Language: | English |
Abstract: | Citation-based Information Retrieval (IR) methods for scientific documents
have proven effective for IR applications, such as Plagiarism Detection or
Literature Recommender Systems, in academic disciplines that use many
references. In science, technology, engineering, and mathematics, researchers
often employ mathematical concepts through formula notation to refer to prior
knowledge. Our long-term goal is to generalize citation-based IR methods and
apply this generalized method to both classical references and mathematical
concepts. In this paper, we suggest how mathematical formulas could be cited
and define a Formula Concept Retrieval task with two subtasks: Formula Concept
Discovery (FCD) and Formula Concept Recognition (FCR). While FCD aims at the
definition and exploration of a 'Formula Concept' that bundles equivalent
representations of a formula under one name, FCR is designed to match a given
formula to a previously assigned unique mathematical concept identifier. We
present machine-learning-based approaches to address the FCD and FCR tasks. We
then evaluate these approaches on a standardized test collection (NTCIR arXiv
dataset). Our FCD approach yields a precision of 68% for retrieving equivalent
representations of frequent formulas and a recall of 72% for extracting the
formula name from the surrounding text. FCD and FCR enable the citation of
formulas within mathematical documents and facilitate semantic search and
question answering as well as document similarity assessments for plagiarism
detection or recommender systems. |
DOI: | 10.48550/arxiv.2303.01994 |
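The abstract defines FCR as matching a given formula to a previously assigned concept identifier, with a Formula Concept bundling equivalent representations under one name. The sketch below is not the paper's machine-learning approach. It is a hypothetical, rule-based illustration of the task's inputs and outputs, assuming that a Formula Concept can be stored as an identifier plus a set of LaTeX strings, and that trivially different strings (whitespace, spacing macros, braces around a single-token exponent) should map to the same lookup key. The names `normalize`, `FormulaConcept`, `FormulaConceptIndex` and the identifier `FC-0001` are invented for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Tokens: control words (\alpha), control symbols (\,), or any single non-space char.
_TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")
# Assumption: these spacing commands never change a formula's meaning.
_SPACING = {r"\,", r"\;", r"\:", r"\!", "\\ ", r"\quad", r"\qquad"}


def normalize(latex: str) -> str:
    """Hypothetical canonical key: drop spacing, collapse c^{2} to c^2."""
    tokens = [t for t in _TOKEN.findall(latex) if t not in _SPACING]
    out = []
    i = 0
    while i < len(tokens):
        t = tokens[i]
        # Collapse a braced single token after ^ or _, e.g. c^{2} -> c^2.
        if (t in ("^", "_") and i + 3 < len(tokens)
                and tokens[i + 1] == "{" and tokens[i + 3] == "}"
                and tokens[i + 2] not in ("{", "}")):
            out.extend([t, tokens[i + 2]])
            i += 4
        else:
            out.append(t)
            i += 1
    # Joining with single spaces keeps "\alpha b" distinct from "\alphab".
    return " ".join(out)


@dataclass
class FormulaConcept:
    """Hypothetical record: one identifier bundling equivalent representations."""
    concept_id: str
    name: str
    representations: Set[str] = field(default_factory=set)


class FormulaConceptIndex:
    """Exact-match lookup from normalized LaTeX to a concept identifier."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}
        self.concepts: Dict[str, FormulaConcept] = {}

    def add(self, concept: FormulaConcept) -> None:
        self.concepts[concept.concept_id] = concept
        for rep in concept.representations:
            self._by_key[normalize(rep)] = concept.concept_id

    def recognize(self, latex: str) -> Optional[str]:
        """Return the concept identifier for a formula, or None if unknown."""
        return self._by_key.get(normalize(latex))


index = FormulaConceptIndex()
index.add(FormulaConcept("FC-0001", "mass-energy equivalence",
                         {"E=mc^2", r"E = m c^{2}"}))
print(index.recognize(r"E=m\,c^{2}"))   # FC-0001
print(index.recognize("a^2+b^2=c^2"))   # None
```

An exact-key lookup like this only catches notational variants of a registered representation. Recognizing formulas that are equivalent but written differently in substance would need something beyond string normalization, which is where a learned matcher of the kind the abstract mentions would come in.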