# PAC Learning

### Introduction

The human brain is a dopamine-based prediction machine, and learning can be seen as a process of building goal-oriented prediction models. How, then, do we formalize learning under uncertainty, where we cannot be sure our predictions will remain useful in an evolving world? Leslie Valiant proposed one of the earliest such mathematical frameworks: PAC Learning.

### Formulation

PAC Learning stands for Probably Approximately Correct Learning: with high probability (Probably), a limited goal-oriented learner predicts (Learning) outcomes not too far off (Approximately Correct) from consistent experience of an objective reality. Let us formalize the terms in this sentence: (1) Objective reality (2) Consistent experience (3) Limited goal-oriented learner (4) Far off (5) High probability.

(1) Objective reality: We assume objective reality exists independently of experience, and not vice versa. We define reality with two abstractions: a set $X$ (the instance or input space) and a set $C$ (the concept class) of subsets $c \subseteq X$ (concepts). These two abstractions allow arbitrary knowledge representation, and thus the possibility of learning and goal-seeking within it.

Specifically, $X$ is a set of encodings of objects in the learner’s world, $c$ is a subset of $X$ that positively exemplifies some simple or interesting rule in $X$, and $C$ is a set of such subsets $c$ of $X$, expressing some general knowledge representation. For example, say a learner is solely interested in learning people’s heights. Then $X$ would be the set of positive real numbers, $c$ would be a rule mapping a group’s height range $[a, b] \subseteq X$ to 1 and its complement to 0, and $C$ would express the general knowledge of groups’ height ranges.
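To make the height example concrete, here is a minimal sketch in Python (the interval endpoints are made-up values for illustration):

```python
# A sketch of the height example: each concept c is the indicator of a
# (hypothetical) height interval [a, b] within X, the positive reals.
def make_interval_concept(a, b):
    """Return c_{[a,b]}: maps x in [a, b] to 1 and its complement to 0."""
    return lambda x: 1 if a <= x <= b else 0

# C can be thought of as the family of all such interval concepts.
c = make_interval_concept(150.0, 190.0)  # heights in centimeters (assumed)
```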

(2) Consistent experience: Experience consists of tuples of the form $(x_i, c(x_i))$, with an unknown target concept $c$ from $C$ and $x_i$ drawn from $X$ according to a fixed target distribution $D$. These tuples are generated by a procedure $EX(c, D)$ that runs in unit time. Fixing the distribution $D$ makes the experience consistent, and such experience transcends subjectivity, since it does not depend on the learner.
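The oracle $EX(c, D)$ can be sketched as follows, with a Gaussian height distribution standing in for the fixed target distribution $D$ (the distribution and its parameters are assumptions for illustration):

```python
import random

def EX(c, D):
    """One unit-time call: draw x ~ D and return the labeled tuple (x, c(x))."""
    x = D()
    return (x, c(x))

random.seed(0)
D = lambda: random.gauss(170.0, 10.0)          # assumed fixed distribution D
c = lambda x: 1 if 150.0 <= x <= 190.0 else 0  # unknown target concept
sample = [EX(c, D) for _ in range(5)]          # five units of experience
```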

(3) Limited goal-oriented learner: Such a learner receives units of consistent experience bounded by its running time. The goal of the learner is to use its received experience to mimic $c(x)$ for $x \sim D$ with a learned prediction $h(x)$, assuming $h \in C$. Quantitatively, the goal of the learner is to minimize $error(h) = P_{x \sim D}[c(x) \neq h(x)]$. This learner aims to learn some fundamental rule of reality by mimicking it, using consistent experience (intrinsically uncertain) limited by its running time.
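Since the learner does not know $D$ in closed form, $error(h)$ can be estimated by sampling. A minimal Monte Carlo sketch, with an illustrative (assumed) distribution and pair of concepts:

```python
import random

def estimate_error(h, c, D, n=100_000):
    """Monte Carlo estimate of error(h) = P_{x ~ D}[c(x) != h(x)]."""
    return sum(c(x) != h(x) for x in (D() for _ in range(n))) / n

random.seed(0)
D = lambda: random.random()          # D = uniform on [0, 1] (assumed)
c = lambda x: 1 if x <= 0.5 else 0   # target concept
h = lambda x: 1 if x <= 0.6 else 0   # hypothesis; its true error is 0.1
est = estimate_error(h, c, D)
```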

(4) Far off: The learner’s predictions are not too far off if $error(h) \leq \epsilon$.

(5) High probability: The probability that the learner’s predictions are not too far off is above a threshold: $P(error(h) \leq \epsilon) \geq 1 - \delta$. The randomness comes both from the examples drawn according to $D$ and from the learner’s possible internal randomization in its learning algorithm.

This formalization presumes metaphysical predispositions that, by themselves, are good answers to some epistemological inquiries. Here is the preliminary definition of PAC Learning:

> **Definition 1** Let $C$ be a concept class over $X$. We say that $C$ is PAC learnable if there exists an algorithm $L$ with the following property: for every concept $c$ of $C$, for every distribution $D$ on $X$, and for all $0<\epsilon<1/2$ and $0<\delta<1/2$, if $L$ is given access to $EX(c, D)$ and inputs $\epsilon$ and $\delta$, then with probability at least $1-\delta$, $L$ outputs a hypothesis concept $h$ of $C$ satisfying $error(h) \leq \epsilon$. This probability is taken over the random examples drawn by calls to $EX(c, D)$, and any internal randomization of $L$.
>
> If $L$ runs in time polynomial in $1/\epsilon$ and $1/\delta$, we say that $C$ is efficiently PAC learnable. We will sometimes refer to the input $\epsilon$ as the error parameter and $\delta$ as the confidence parameter.
>
> — *An Introduction to Computational Learning Theory* by Michael J. Kearns and Umesh V. Vazirani
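As a sketch of the definition in action, consider the interval concepts from the height example: a learner that outputs the tightest interval around the positive examples PAC-learns this class. The sample size $m = \lceil (2/\epsilon)\ln(2/\delta) \rceil$ used below is a standard bound for intervals, not taken from the book:

```python
import math
import random

def learn_interval(EX, eps, delta):
    """Sketch of a PAC learner for interval concepts: output the tightest
    interval containing the positive examples among m labeled draws."""
    m = math.ceil((2 / eps) * math.log(2 / delta))
    positives = [x for x, label in (EX() for _ in range(m)) if label == 1]
    if not positives:            # saw no positive example:
        return lambda x: 0       # predict everything negative
    a, b = min(positives), max(positives)
    return lambda x: 1 if a <= x <= b else 0

random.seed(1)
target = lambda x: 1 if 0.3 <= x <= 0.7 else 0  # unknown concept c
D = lambda: random.random()                     # D = uniform on [0, 1]
EX = lambda: (lambda x: (x, target(x)))(D())    # oracle EX(c, D)
h = learn_interval(EX, eps=0.05, delta=0.05)
```

With high probability the tightest interval misses only a small probability mass near each true endpoint, which is exactly what the bound on $m$ controls.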

A couple of important refinements to this preliminary definition will be made in the next post. Below, we introduce some toy problems using the limited definition from this post.

### Example

Not finished.
