Thursday, March 20, 2008

Learning and changing

About two years ago, I wrote a post on learning from losing. There I mentioned one situation that triggers learning. It is almost always true that learning is triggered by losing something, or by the need to avoid losing something. That something can be anything, ranging from survival itself to the satisfaction gained by understanding Nature.

Life can be viewed as a function of several parameters. Optimizing this function has two aspects: first identifying the parameters, then adjusting their values. The more parameters one identifies, in other words the higher one's awareness, the better one can adjust, or try to adjust, the parameters to optimize the value of the function. Everyone is typically endowed with a certain level of awareness by default, as part of our intellectual heritage. Then there are some noble ones who add to that heritage.
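To make the framing concrete, here is a minimal sketch in Python. The life_value function, the parameter names, and the weights are all invented for illustration; nothing below is a real model of life.

# Hypothetical "life function": parameter names and weights are
# assumptions made up purely for illustration.
def life_value(params):
    return (2.0 * params["health"]
            + 1.5 * params["relationships"]
            - 1.0 * params["stress"])

# Two people live with the same real parameters...
params = {"health": 0.8, "relationships": 0.7, "stress": 0.9}

# ...but awareness decides which ones can be adjusted. Someone who has
# identified "stress" can lower it; someone who has not must leave it
# where it is.
identified = {"health", "relationships", "stress"}  # the more aware person
adjusted = dict(params)
if "stress" in identified:
    adjusted["stress"] = 0.2  # an adjustment possible only once identified

print(life_value(params), life_value(adjusted))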

Adjusting the parameters is also called learning. Now, from general observations of the population, different people find different parameters hard to learn. Why is that? Let's try to see. Typically, to learn a parameter we need to compute gradients in the parameter space; the gradient tells us how the value changes as the parameter changes. The harder it is to compute the gradient along a certain parameter (to detect those changes), the harder it is to adjust that parameter in the direction of convergence. We already know that not everyone can perceive all the relevant parameters, let alone the gradients along them. And even when one does perceive the gradients, the function might not be convex along that parameter, so avoiding local minima requires more sophistication: zig-zagging (most people adopt this approach) or trying to convexify the function (some of the spiritually inclined adopt this approach). This is why change can sometimes be hard.
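To make the gradient picture concrete, here is a minimal sketch in Python. The one-parameter objective, the learning rate, and the restart strategy are all assumptions chosen for illustration, with random restarts standing in for the zig-zagging described above.

import math
import random

# A hypothetical non-convex one-parameter "life function"; the formula
# is chosen only so that several local minima exist.
def f(x):
    return math.sin(3 * x) + 0.1 * x * x

# Numerical gradient: computing the gradient means detecting how the
# value changes as the parameter changes.
def grad(x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Plain descent settles into whichever local minimum is nearest the start.
print(descend(2.0), f(descend(2.0)))

# Restarting from several random points (zig-zagging) finds a better one.
best = min((descend(random.uniform(-5, 5)) for _ in range(20)), key=f)
print(best, f(best))

Running this shows plain descent stuck in the local minimum near x = 1.5 while the restarts land near the better minimum around x = -0.5.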

How hard a parameter is to learn depends on the capacity, measured by VC dimension, of the hypothesis class needed to fit the training examples for that parameter. The difficulty of learning grows with that VC dimension: the more complex the labelings we insist on fitting, the higher the capacity required, and the more examples and effort learning takes. VC dimension can be viewed as the size of the largest set of training examples the class can shatter, that is, fit with zero training error under every possible labeling. Being aware of this helps us admit that some parameters are easier to learn than others. So what can we psychologically do to keep our learning in life simple, or optimal? We can try to adjust the labels on our training data (our experiences) so that we don't unnecessarily increase the capacity our learning algorithm needs. We are better off spending our resources on the parameters that genuinely need hard algorithms, where 'genuinely' means as judged on independent samples from the successful and noble population.
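Here is a minimal sketch of shattering in Python, using two standard textbook hypothesis classes on the real line (thresholds, with VC dimension 1, and intervals, with VC dimension 2); the specific points and parameter grids are chosen only for illustration.

from itertools import product

# A set of points is shattered by a hypothesis class if every possible
# labeling of those points is realized by some hypothesis, i.e. the
# class can reach zero training error on all of them.
def shatters(hypotheses, points):
    achieved = {tuple(h(x) for x in points) for h in hypotheses}
    return len(achieved) == 2 ** len(points)

xs = [0.0, 1.0]

# Thresholds: label True iff x >= t. VC dimension 1: on two ordered
# points the labeling (True, False) can never be produced.
thresholds = [lambda x, t=t: x >= t for t in (-0.5, 0.5, 1.5)]
print(shatters(thresholds, xs[:1]))  # True: one point is shattered
print(shatters(thresholds, xs))      # False: two points are not

# Intervals: label True iff a <= x <= b. VC dimension 2: the richer
# class shatters both points, at the price of higher capacity.
intervals = [lambda x, a=a, b=b: a <= x <= b
             for a, b in product((-0.5, 0.5, 1.5), repeat=2)]
print(shatters(intervals, xs))       # True: all four labelings appear

In the post's terms: as long as our labeled experiences only demand patterns a threshold can realize, the simple class suffices; insist on the one labeling it cannot produce, and we are forced to reach for the richer interval class, with all the extra capacity and data that entails.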
