Loading jsMath...
Weizmann Logo
ECCC
Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

Login | Register | Classic Style



REPORTS > DETAIL:

Revision(s):

Revision #2 to TR24-098 | 7th June 2024 02:18

Models That Prove Their Own Correctness

RSS-Feed




Revision #2
Authors: Noga Amit, Shafi Goldwasser, Orr Paradise, Guy Rothblum
Accepted on: 7th June 2024 02:18
Downloads: 254
Keywords: 


Abstract:

How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured *on average* over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train *Self-Proving models* that prove the correctness of their output to a verification algorithm V via an Interactive Proof.

Self-Proving models satisfy that, with high probability over an input sampled from a given distribution, the model generates a correct output *and* successfully proves its correctness to V. The *soundness* property of V guarantees that, for *every* input, no model can convince V of the correctness of an incorrect output. Thus, a Self-Proving model proves correctness of most of its outputs, while *all* incorrect outputs (of any model) are detected by V. We devise a generic methods for learning Self-Proving models, and prove its convergence under certain assumptions.

The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.



Changes to previous version:

Fixed author listing.


Revision #1 to TR24-098 | 7th June 2024 02:15

Models That Prove Their Own Correctness





Revision #1
Authors: Noga Amit, Orr Paradise, Guy Rothblum, shafi goldwasser
Accepted on: 7th June 2024 02:15
Downloads: 149
Keywords: 


Abstract:

How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured *on average* over a distribution of inputs, giving no guarantee for any fixed input.
This paper proposes a theoretically-founded solution to this problem: to train *Self-Proving models* that prove the correctness of their output to
a verification algorithm V via an Interactive Proof.

Self-Proving models satisfy that, with high probability over an input sampled from a given distribution, the model generates a correct output *and* successfully proves its correctness to V\!. The *soundness* property of V guarantees that, for *every* input, no model can convince V of the correctness of an incorrect output. Thus, a
Self-Proving model proves correctness of most of its outputs, while *all* incorrect outputs (of any model) are detected by V. We devise a generic methods for learning Self-Proving models, and prove its convergence under certain assumptions.

The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.



Changes to previous version:

Corrected the statement of Theorem 4.1 and a typo in its proof, and added several clarifications:
- Added Table 1 and "Scope" paragraph to page 2.
- Added Remark 3.5.
- Improved Figure 2.
- Added Appendix B.
- Merged Appendices D and E.


Paper:

TR24-098 | 26th May 2024 18:16

Models That Prove Their Own Correctness


Abstract:

How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on\ average over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train Self-Proving\ models that prove the correctness of their output to a verification algorithm V via an Interactive Proof.

Self-Proving models satisfy that, with high probability over a random input, the model generates a correct output and successfully proves its correctness to V\!. The soundness property of V guarantees that, for every input, no model can convince V of the correctness of an incorrect output. Thus, a Self-Proving model proves correctness of most of its outputs, while all incorrect outputs (of any model) are detected by V. We devise a generic method for learning Self-Proving models, and we prove convergence bounds under certain assumptions.

The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD and proves the correctness of its answer.



ISSN 1433-8092 | Imprint