The optimal split is the one with the highest value of the evaluation function. Such a split tends to divide the data into two subsets with minimal within-subset variance and maximal between-subset variance. An important stage of tree learning is the split stopping criterion, which should be defined so that a balance between high generalization power and high fitting power is achieved. Shallow trees may not model the data well, and we might end up with leaves of high entropy. On the other hand, if the number of node levels is high, terminal nodes (leaves) might contain only a few samples, which may indicate low generalization power; a tree can always be built so that its terminal nodes are perfectly pure. Generalization power can be measured by the difference between the classification error on test samples and the classification error on training samples. A low misclassification error on the training data combined with a large gap between the test and training errors indicates over-fitting of the tree. Over-fitting is usually caused by a small dataset, too many feature-space dimensions, or bias introduced by the dataset.

To address this issue, two classes of pruning algorithms exist: pre-pruning and post-pruning. Pre-pruning is performed during tree building and is based on a statistical significance test: if the significance of splitting at a given node falls below some predefined threshold, the node is declared a terminal node. However, Breiman et al.14 showed that this stop-splitting approach may produce too many branches (an over-fitting problem) if the threshold is too small, while splitting might be stopped too early if the threshold is high. Moreover, even if the significance test over the data samples at a node is below the predefined threshold, it does not necessarily follow that, after splitting the data into two subsets, the test would be below the threshold in each of them.
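As a minimal sketch of pre-pruning by significance testing, the snippet below computes a chi-square statistic for a candidate binary split and declares the node terminal when the statistic falls below a critical value. The helper names and the 3.84 threshold (the 95% critical value of chi-square with one degree of freedom, appropriate for a binary split of two classes) are illustrative assumptions, not values prescribed by the text.

```python
def chi_square_statistic(left_counts, right_counts):
    """Chi-square statistic for a candidate binary split.

    left_counts / right_counts: dicts mapping class label to the number
    of samples that would land in each child node.
    """
    classes = set(left_counts) | set(right_counts)
    n_left = sum(left_counts.values())
    n_right = sum(right_counts.values())
    n = n_left + n_right
    chi2 = 0.0
    for c in classes:
        total_c = left_counts.get(c, 0) + right_counts.get(c, 0)
        for observed, n_child in ((left_counts.get(c, 0), n_left),
                                  (right_counts.get(c, 0), n_right)):
            expected = total_c * n_child / n  # count expected under independence
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def should_stop(left_counts, right_counts, threshold=3.84):
    # Pre-pruning rule: a statistic below the critical value means the
    # split is not statistically significant, so the node is declared
    # a terminal node instead of being split.
    return chi_square_statistic(left_counts, right_counts) < threshold
```

Note the trade-off the text describes: lowering `threshold` lets more weak splits through (more branches, risk of over-fitting), while raising it stops splitting earlier.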
Splitting may also stop when the number of samples in a node falls below a predefined minimum.