Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision
Format: Article
Language: English
Abstract: Process supervision, using a trained verifier to evaluate the intermediate
steps generated by a reasoner, has demonstrated significant improvements in
multi-step problem solving. In this paper, to avoid the expensive effort of
human annotation of verifier training data, we introduce Model-induced
Process Supervision (MiPS), a novel method for automating data curation. MiPS
annotates an intermediate step by sampling completions of the solution from
the reasoning model and computing an accuracy defined as the proportion of
correct completions. Because inaccuracies of the reasoner cause MiPS to
underestimate the accuracy of intermediate steps, we suggest and empirically
show that verification should favor steps with high predicted verifier scores
over those with low scores, contrary to prior observations on human-curated
data. Our approach significantly improves the performance of PaLM 2 on math
and coding tasks (accuracy +0.67% on GSM8K, +4.16% on MATH, +0.92% on MBPP
compared with an output-supervision-trained verifier). Additionally, our
study demonstrates that the verifier exhibits strong generalization ability
across different reasoning models.
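The scoring rule the abstract describes, annotating a partial solution with the proportion of sampled completions that reach a correct answer, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_completion` and `is_correct` are hypothetical stand-ins for the reasoning model and the final-answer checker.

```python
def mips_step_score(prefix_steps, sample_completion, is_correct, n_samples=16):
    """Estimate the quality of a solution prefix, MiPS-style.

    Draws n_samples completions of the partial solution from the
    reasoning model and returns the fraction whose final answer is
    correct. This fraction serves as the (soft) supervision label
    for the intermediate step ending the prefix.
    """
    correct = 0
    for _ in range(n_samples):
        completion = sample_completion(prefix_steps)
        if is_correct(completion):
            correct += 1
    return correct / n_samples


# Toy usage with deterministic stubs in place of a real model:
always_right = mips_step_score(
    "step 1: ...",
    sample_completion=lambda prefix: prefix + " -> answer: 42",
    is_correct=lambda sol: sol.endswith("42"),
    n_samples=4,
)
```

Because an imperfect reasoner may fail to complete even a correct prefix, this estimate is biased downward, which is the paper's stated motivation for trusting high verifier scores more than low ones.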
DOI: 10.48550/arxiv.2402.02658