Human brain is a dopamine-based prediction machine, and learning can be seen as a process of creating goal-oriented prediction models. Thus, how do we formalize learning in uncertainty, where we cannot be sure our predictions would be useful in an evolving world? Leslie Valiant proposed one of the earliest such mathematical framework: PAC Learning.
PAC Learning refers to Probably Approximately Correct Learning: with high probability (Probably), a limited-goal-oriented-learner predicting (Learning) outcomes not too far off (Approximately Correct) from consistent experience of an objective reality. Now formalize these terms in the sentence: (1) Objective reality (2) Consistent experience (3) Limited-goal-oriented-learner (4) Far off (5) High probability.
(1) Objective reality: We assume Objective reality exists independently from experience, and not vice versa. We define Reality with two abstractions: a set (instance or input space) and a set (concept class) of set (concept). These two abstractions allow arbitrary knowledge representation and thus the possibility of learning and goal-seeking in it.
Specifically, is a set of encoding of objects in learner’s world, is subset of that positively exemplify certain simple or interesting rule in , and is a set of subset over , expressing some general knowledge representation. For example, say a learner is solely interested in learning people’s height, then would be the the set of positive real number, would be a rule mapping a group’s height range of to 1 and its complement to 0, and expresses the general knowledge of groups’ height ranges.
(2) Consistent Experience: It refers to tuples of the form with unknown target concept from , and drawn from with fixed target distribution . These tuples are generated from procedure that runs in unit time. The fixing of distribution makes the experience consistent, and such experience transcends subjectivity, since it does not depend on a learner.
(3) Limited-goal-oriented-learner: Such a learner receives units of consistent experience bounded by its running time. The goal of the learner is to utilize its received experience to mimic with its learned prediction , assuming . Quantitatively, the goal of the learner is to minimize . This learner aims to learn some fundamental rule of the reality by mimicking it, using consistent experience (intrinsically uncertain) limited by its running time.
(4) Far off: The learner’s predictions are not too far off if .
(5) High Probability: Probability of the learner’s predictions not too far off is above a threshold: . The source of randomness for probability comes both from and Learner’s possible randomization in its learning algorithm.
This formalization presumes metaphysical predispositions that, by themselves are good answers to some epistemological inquiries. This is the preliminary definition of PAC Learning:
Definition 1 Let be a concept class over . We say that is PAC learnable if there exists an algorithm with the following property: for every concept of , for every distribution on , and for all and , if is given access to and inputs and , then with probability at least , outputs a hypothesis concept of satisfying . This probability is taken over the random examples drawn by calls to , and any internal randomization of .
If runs in time polynomial in and , we say that is efficiently PAC learnable. We will sometimes refer to the input as the error parameter and as the confidence parameter.
— An Introduction to Computational Learning Theory by Michael J. Kearns and Umesh V. Vazirani
A couple important refinements to this preliminary definition can be made in the next post. We will introduce some toy problems below using definition limited in this post.