Prompt learning-based semi-structured webpage attribute value extraction method and system

Bibliographic details
Main authors: CAO CONG, CAO YANAN, LI BAOKE, FENG JIALI, LU YUHAI, YUAN FANGFANG, LIU YANBING
Format: Patent
Language: Chinese; English
Description
Abstract: The invention discloses a prompt learning-based method and system for extracting attribute values from semi-structured webpages, and relates to the field of the Internet. First, a DOM-tree-view prompt for each variable node is retrieved using a DOM tree simplification algorithm. Next, a task template containing a task description is designed to obtain template-view prompt information, which is then extracted. Finally, a pre-trained language model with an encoder-decoder structure is introduced. Taking the prompt as the core operation, the method comprehensively analyzes the characteristics of the domain data and of the target task, designs prompt information from these two views, and fills and fuses the dual-view prompt information through a template to obtain the target object. Through prompt learning, the pre-trained language model is thus jointly guided to learn the task at both the semantic level and the task level, so that effective combination of the pre-tra
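The abstract does not specify the DOM tree simplification algorithm, how variable nodes are identified, or the wording of the task template. The following is a minimal sketch under stated assumptions, not the patented method. It assumes pages are already parsed into trees of a hypothetical `Node` type. It treats a "variable node" as a text node whose path appears on every page built from the same site template but whose text differs between pages, which is an assumed heuristic. It uses the nearest preceding constant text, often a field label, as the DOM-view context. Finally, it fills a hypothetical task-description template to fuse both views into one prompt string. All names (`simplify`, `variable_nodes`, `dom_view_prompt`, `TEMPLATE`, `build_prompt`, `product_page`) are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag, its own text, and child nodes."""
    tag: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def simplify(node: Node, path: str = "", index: int = 0) -> dict[str, str]:
    """Assumed simplification (not the patent's algorithm): flatten the tree in
    document order into {indexed path: text}, keeping only text-bearing nodes."""
    path = f"{path}/{node.tag}[{index}]"
    pairs: dict[str, str] = {}
    text = node.text.strip()
    if text:
        pairs[path] = text
    for i, child in enumerate(node.children):
        pairs.update(simplify(child, path, i))
    return pairs


def variable_nodes(pages: list[dict[str, str]]) -> list[str]:
    """Assumed heuristic: a node is variable if its path occurs on every
    page but its text is not identical across those pages."""
    common = set.intersection(*(set(p) for p in pages))
    return sorted(path for path in common if len({p[path] for p in pages}) > 1)


def dom_view_prompt(page: dict[str, str], var_paths: set[str]) -> str:
    """DOM-view context: pair each variable value with the nearest preceding
    constant text, which is often a field label such as 'Price:'."""
    parts, last_label = [], ""
    for path, text in page.items():  # dicts preserve insertion (document) order
        if path in var_paths:
            parts.append(f"{last_label} {text}".strip())
        else:
            last_label = text
    return " | ".join(parts)


# Hypothetical task-description template (the template view).
TEMPLATE = ("Task: extract the value of the attribute '{attr}' from the webpage. "
            "Page: {dom_view} Answer:")


def build_prompt(attr: str, dom_view: str) -> str:
    """Fuse the two views by filling the template's slots."""
    return TEMPLATE.format(attr=attr, dom_view=dom_view)


def product_page(price: str, brand: str) -> Node:
    """Toy page from one site template: fixed labels, varying values."""
    return Node("html", children=[Node("body", children=[
        Node("span", "Price:"), Node("span", price),
        Node("span", "Brand:"), Node("span", brand),
    ])])


if __name__ == "__main__":
    pages = [simplify(product_page("$10", "Acme")),
             simplify(product_page("$25", "Zeta"))]
    var_paths = set(variable_nodes(pages))
    print(build_prompt("price", dom_view_prompt(pages[0], var_paths)))
```

The resulting string is the kind of input an encoder-decoder pre-trained model (for example, T5) could consume, with the decoder generating the attribute value; the model call and the training loop are omitted. Pairing each value with its nearest label keeps the prompt far shorter than serializing the whole DOM, at the cost of discarding structure beyond that label.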