Difficulty-controllable question generation over knowledge graphs: A counterfactual reasoning approach
Published in: | Information Processing & Management 2024-07, Vol.61 (4), p.103721, Article 103721 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Difficulty-controllable question generation (DCQG) over knowledge graphs aims to generate questions with a given subgraph and a difficulty label, such as “easy” or “hard.” However, three significant challenges currently confront DCQG: (1) limited modes for modeling difficulty, (2) the inability to ensure causality between difficulty labels and generated outcomes, and (3) lack of difficulty-annotated datasets. To overcome these challenges, we present DiffQG, a DCQG model that uses soft templates and counterfactual reasoning. DiffQG utilizes a mixture of experts as soft template selectors to enhance the diversity of difficulty representation. Soft templates can efficiently capture the similarity among questions of different difficulties, avoiding the need for constructing explicit templates. A disentanglement module is introduced to separate triple representations in the input subgraph that are pertinent and extraneous to the current question’s difficulty. Disentanglement minimizes the interference of irrelevant information on the generated output in neural networks due to entanglement. More importantly, disentangled representations enable the model to create training samples for counterfactual reasoning, strengthening causality between inputs and outputs. Additionally, we propose a question difficulty estimation method that simultaneously considers the input subgraph, question, and answering process. Extensive experiments reveal that our model can successfully generate questions at desired difficulty levels, surpassing the baselines by at least 8% in terms of difficulty control. Furthermore, DiffQG exhibits superior generalizability and interpretability.
• We design a framework for difficulty-controllable question generation over KGs based on counterfactual reasoning, modeling the causality between input and output.
• A novel difficulty modeling method based on a mixture of soft templates. The proposed model dynamically selects the appropriate soft template according to the given input.
• We design an automatic question difficulty estimation method by analyzing subgraphs, questions, and answering patterns, focusing on entity attributes and textual features.
• Comprehensive experiments demonstrate our method’s superior performance in controlling question difficulty and generating diverse questions. |
---|---|
ISSN: | 0306-4573 |
DOI: | 10.1016/j.ipm.2024.103721 |
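
The abstract's idea of a mixture of experts acting as soft template selectors can be illustrated with a minimal sketch. This is not DiffQG's implementation, which this record does not describe in detail. The function names (`softmax`, `select_soft_template`), the dot-product gating, and the toy vectors are all assumptions made for illustration: each hypothetical expert owns one soft-template embedding, and a gate computed from the input decides how strongly each template contributes.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_soft_template(input_vec, gate_weights, templates):
    """Blend soft-template vectors using input-dependent gate values.

    Assumed layout (hypothetical, not taken from the paper):
    gate_weights[k] is the gating vector of expert k, and
    templates[k] is that expert's soft-template embedding.
    """
    # One gating score per expert: dot product of its gate vector and the input.
    scores = [sum(w * x for w, x in zip(gw, input_vec)) for gw in gate_weights]
    gates = softmax(scores)
    # Weighted sum of the template embeddings, dimension by dimension.
    dim = len(templates[0])
    mixed = [sum(g * t[i] for g, t in zip(gates, templates)) for i in range(dim)]
    return gates, mixed


# Toy input: two experts, a 2-d input encoding, 3-d template embeddings.
input_vec = [1.0, 0.0]
gate_weights = [[2.0, 0.0], [0.0, 2.0]]
templates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
gates, mixed = select_soft_template(input_vec, gate_weights, templates)
```

Because the selection is a weighted blend rather than a hard choice, the input above favors the first template while still drawing on the second, which is one way to read "soft" in this setting.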