# PAC Learning

### Introduction

The human brain is a dopamine-based prediction machine, and learning can be seen as a process of building goal-oriented prediction models. How, then, do we formalize learning under uncertainty, where we cannot be sure our predictions will remain useful in an evolving world? Leslie Valiant proposed one of the earliest such mathematical frameworks: PAC Learning.

### Formulation

PAC Learning stands for Probably Approximately Correct Learning: with high probability (Probably), a limited goal-oriented learner predicts (Learning) outcomes not too far off (Approximately Correct) from consistent experience of an objective reality. Let us formalize the terms in this sentence: (1) Objective reality (2) Consistent experience (3) Limited goal-oriented learner (4) Far off (5) High probability.

(1) Objective reality: We assume objective reality exists independently of experience, and not vice versa. We define reality with two abstractions: a set $X$ (the instance or input space) and a set $C$ (the concept class) of subsets $c \subseteq X$ (concepts). These two abstractions allow arbitrary knowledge representation, and thus the possibility of learning and goal-seeking within it.

Specifically, $X$ is a set of encodings of objects in the learner’s world, $c$ is a subset of $X$ that positively exemplifies some simple or interesting rule in $X$, and $C$ is a set of such subsets $c$ of $X$, expressing some general knowledge representation. For example, say a learner is solely interested in learning people’s heights. Then $X$ would be the set of positive real numbers, $c$ would be a rule mapping a group’s height range $[a, b] \subseteq X$ to 1 and its complement to 0, and $C$ would express the general knowledge of groups’ height ranges.
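To make the height example concrete, here is a minimal sketch in Python (the interval endpoints are made-up values for illustration):

```python
# A sketch of the height example: each concept c is the indicator of a
# (hypothetical) height interval [a, b] within X, the positive reals.
def make_interval_concept(a, b):
    """Return c_{[a,b]}: maps x in [a, b] to 1 and its complement to 0."""
    return lambda x: 1 if a <= x <= b else 0

# C can be thought of as the family of all such interval concepts.
c = make_interval_concept(150.0, 190.0)  # heights in centimeters (assumed)
```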

(2) Consistent experience: Experience consists of tuples of the form $(x_i, c(x_i))$, with an unknown target concept $c$ from $C$ and $x_i$ drawn from $X$ according to a fixed target distribution $D$. These tuples are generated by a procedure $EX(c, D)$ that runs in unit time. Fixing the distribution $D$ makes the experience consistent, and such experience transcends subjectivity, since it does not depend on the learner.
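The oracle $EX(c, D)$ can be sketched as follows, with a Gaussian height distribution standing in for the fixed target distribution $D$ (the distribution and its parameters are assumptions for illustration):

```python
import random

def EX(c, D):
    """One unit-time call: draw x ~ D and return the labeled tuple (x, c(x))."""
    x = D()
    return (x, c(x))

random.seed(0)
D = lambda: random.gauss(170.0, 10.0)          # assumed fixed distribution D
c = lambda x: 1 if 150.0 <= x <= 190.0 else 0  # unknown target concept
sample = [EX(c, D) for _ in range(5)]          # five units of experience
```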

(3) Limited goal-oriented learner: Such a learner receives units of consistent experience bounded by its running time. The goal of the learner is to use its received experience to mimic $c(x)$ for $x \sim D$ with a learned prediction $h(x)$, assuming $h \in C$. Quantitatively, the goal of the learner is to minimize $error(h) = P_{x \sim D}[c(x) \neq h(x)]$. This learner aims to learn some fundamental rule of reality by mimicking it, using consistent experience (intrinsically uncertain) limited by its running time.
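Since the learner does not know $D$ in closed form, $error(h)$ can be estimated by sampling. A minimal Monte Carlo sketch, with an illustrative (assumed) distribution and pair of concepts:

```python
import random

def estimate_error(h, c, D, n=100_000):
    """Monte Carlo estimate of error(h) = P_{x ~ D}[c(x) != h(x)]."""
    return sum(c(x) != h(x) for x in (D() for _ in range(n))) / n

random.seed(0)
D = lambda: random.random()          # D = uniform on [0, 1] (assumed)
c = lambda x: 1 if x <= 0.5 else 0   # target concept
h = lambda x: 1 if x <= 0.6 else 0   # hypothesis; its true error is 0.1
est = estimate_error(h, c, D)
```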

(4) Far off: The learner’s predictions are not too far off if $error(h) \leq \epsilon$.

(5) High probability: The probability that the learner’s predictions are not too far off is above a threshold: $P(error(h) \leq \epsilon) \geq 1 - \delta$. The randomness comes both from the examples drawn according to $D$ and from the learner’s possible internal randomization in its learning algorithm.

This formalization presumes metaphysical predispositions that, by themselves, are good answers to some epistemological inquiries. Here is the preliminary definition of PAC Learning:

> **Definition 1** Let $C$ be a concept class over $X$. We say that $C$ is PAC learnable if there exists an algorithm $L$ with the following property: for every concept $c$ of $C$, for every distribution $D$ on $X$, and for all $0<\epsilon<1/2$ and $0<\delta<1/2$, if $L$ is given access to $EX(c, D)$ and inputs $\epsilon$ and $\delta$, then with probability at least $1-\delta$, $L$ outputs a hypothesis concept $h$ of $C$ satisfying $error(h) \leq \epsilon$. This probability is taken over the random examples drawn by calls to $EX(c, D)$, and any internal randomization of $L$.
>
> If $L$ runs in time polynomial in $1/\epsilon$ and $1/\delta$, we say that $C$ is efficiently PAC learnable. We will sometimes refer to the input $\epsilon$ as the error parameter and $\delta$ as the confidence parameter.
>
> — *An Introduction to Computational Learning Theory* by Michael J. Kearns and Umesh V. Vazirani
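As a sketch of the definition in action, consider the interval concepts from the height example: a learner that outputs the tightest interval around the positive examples PAC-learns this class. The sample size $m = \lceil (2/\epsilon)\ln(2/\delta) \rceil$ used below is a standard bound for intervals, not taken from the book:

```python
import math
import random

def learn_interval(EX, eps, delta):
    """Sketch of a PAC learner for interval concepts: output the tightest
    interval containing the positive examples among m labeled draws."""
    m = math.ceil((2 / eps) * math.log(2 / delta))
    positives = [x for x, label in (EX() for _ in range(m)) if label == 1]
    if not positives:            # saw no positive example:
        return lambda x: 0       # predict everything negative
    a, b = min(positives), max(positives)
    return lambda x: 1 if a <= x <= b else 0

random.seed(1)
target = lambda x: 1 if 0.3 <= x <= 0.7 else 0  # unknown concept c
D = lambda: random.random()                     # D = uniform on [0, 1]
EX = lambda: (lambda x: (x, target(x)))(D())    # oracle EX(c, D)
h = learn_interval(EX, eps=0.05, delta=0.05)
```

With high probability the tightest interval misses only a small probability mass near each true endpoint, which is exactly what the bound on $m$ controls.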

A couple of important refinements to this preliminary definition will be made in the next post. Below, we introduce some toy problems using the limited definition from this post.

### Example

Not finished.
