Description of the Best-Case Model

A learning algorithm (NN learning, for example) takes a training set S of examples:

S = {s_1, s_2, ..., s_m},

where each example s_i consists of a vector of d features plus a category label. The algorithm must use these examples to build a representation with which it will classify a test set T of new examples.
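
To make the setup concrete, here is a minimal sketch of such a learner, assuming 1-nearest-neighbor (1-NN) classification with Euclidean distance; the names and the tiny data set below are illustrative, not from the original text.

```python
import math

def nn_classify(S, t):
    """Classify test example t with the label of its nearest training example.
    S is a list of (features, label) pairs; features are tuples of d numbers."""
    nearest_features, nearest_label = min(S, key=lambda pair: math.dist(pair[0], t))
    return nearest_label

# Training set S: each example is (d-feature vector, category label).
S = [((0.2, 0.7), 0), ((0.8, 0.3), 1)]
# Test set T: unlabeled examples to classify.
T = [(0.1, 0.9), (0.9, 0.1)]
print([nn_classify(S, t) for t in T])  # -> [0, 1]
```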

A concept is a mapping from the set of all examples in a domain to a set of labels. A concept class is a set of concepts.

For example, in the domain defined by the unit square, the partition defined by the line x=0.5 might be a concept in which any point (x,y) with x<=0.5 is labeled 0 and any point with x>0.5 is labeled 1. A concept class in this domain might be the set of all partitions defined by lines of the form x=r, where 0<=r<=1. If the examples s_i are contained in R^d, then an equivalent term for concept is decision boundary, where the boundary includes all surfaces separating the classes.
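
A sketch of this example concept class, written as labeling functions: each value of r yields one concept, and the set of all such functions for 0<=r<=1 is the concept class.

```python
def make_concept(r):
    """Return the concept (labeling function) for the line x = r, 0 <= r <= 1."""
    def concept(point):
        x, y = point
        return 0 if x <= r else 1  # 0 on or left of the line, 1 to the right
    return concept

c = make_concept(0.5)   # the concept defined by the line x = 0.5
print(c((0.3, 0.9)))    # -> 0  (x <= 0.5)
print(c((0.7, 0.1)))    # -> 1  (x > 0.5)
```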

We will say that a learner has learned a concept exactly if it correctly classifies all possible examples. The distribution from which the new examples in T are drawn can differ from the distribution used for S, because our model is not statistical: the results are independent of any assumptions about these distributions.
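
For a finite domain, the exact-learning criterion can be written as a direct check; this is a sketch under that assumption (for a continuous domain such as the unit square, the check could only be approximated, e.g., on a dense grid).

```python
def learned_exactly(hypothesis, concept, domain):
    """True iff the hypothesis agrees with the target concept on every
    possible example in the domain -- the exact-learning criterion."""
    return all(hypothesis(x) == concept(x) for x in domain)
```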

In the best-case model, the learning algorithm and the concept are fixed and known to the teacher. The teacher then chooses the examples in S in the best way possible, where "best" means that the learning algorithm requires a minimum number of training examples to learn the concept. The term sample complexity refers to the number of examples required for learning. The teacher is able to compute a representation of the concept in the same language used by the learner.
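
A brute-force sketch of the best-case teacher, assuming a finite domain: the teacher, knowing both the learner and the concept, searches over training sets of increasing size until the learner classifies every example correctly. The function names here are illustrative.

```python
from itertools import combinations

def best_case_sample_complexity(learner, concept, domain):
    """Return the size of a minimum training set with which `learner`
    (a function from a training set to a hypothesis) learns `concept`
    exactly over `domain` -- the best-case sample complexity."""
    labeled = [(x, concept(x)) for x in domain]
    for k in range(1, len(labeled) + 1):
        for S in combinations(labeled, k):      # every candidate teaching set of size k
            hypothesis = learner(list(S))
            if all(hypothesis(x) == concept(x) for x in domain):
                return k
    return len(labeled)
```

With the 1-NN learner sketched earlier, `learner` could be `lambda S: (lambda t: nn_classify(S, t))`. The exhaustive search is exponential in the domain size, so this is a definitional sketch rather than a practical procedure.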

Even in the best-case model, some concept classes have exponential sample complexity for the NN algorithm.
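
A toy illustration consistent with this claim, assuming 1-NN with Hamming distance and the parity concept on {0,1}^d (this example is not taken from the original text): flipping any single bit flips the parity, so every point's nearest neighbors all carry the opposite label, and any point left out of the training set is misclassified. Exact learning therefore requires all 2^d examples.

```python
from itertools import product

def parity(bits):
    return sum(bits) % 2

def hamming_nn(S, t):
    """1-NN under Hamming distance over a training set of (point, label) pairs."""
    return min(S, key=lambda pair: sum(a != b for a, b in zip(pair[0], t)))[1]

d = 4
domain = list(product((0, 1), repeat=d))
for held_out in domain:
    # Teach with every example except one; the held-out point's nearest
    # neighbors are at Hamming distance 1 and all have the opposite parity.
    S = [(x, parity(x)) for x in domain if x != held_out]
    assert hamming_nn(S, held_out) != parity(held_out)  # always misclassified
print(f"1-NN misclassifies every held-out point; exact learning needs all 2^{d} examples")
```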

[fig. 4]