Machine Learning Simplified (Ch 1)
February 6, 2015
How does our brain work?!
It is the first day of first grade. You have stepped into a new world, out of the comfort zone of your home… You find yourself sitting with a bunch of kids just like you, all with new bags, new uniforms, and the fresh smell of new notebooks. Then a person walks in and introduces herself as your teacher. She draws something on the board like this…
…and says, "A for Apple". You are totally oblivious, but you are curious and credulous enough to tolerate whatever apparent "nonsense" you are presented with. Then she goes on with "B for Ball", "C for Cat", and so on. She reaches Z but does not stop there; the routine repeats day after day, almost the whole year round.
In this process you are presented with a plethora of styles of the same 26 letters: sometimes written by a different person (maybe a friend or another teacher), sometimes printed in a book. If you look back now, you will realize that you have seen countless instances of "A". When you see an "A" written by a stranger tomorrow, there is no guarantee it will exactly match any you have seen before, yet you know you will read it correctly.
In this whole scenario, you were shown something (like the A on the board) and told what it is called, then you were shown the next thing and its name, and so on… and this was repeated many times with different instances of the same characters.
This is precisely what we call ‘learning’ and when we do the same with the machines it is called ‘Machine Learning’.
Let's get technical!
Now consider a black box similar to your brain with one input x and one output y:
Suppose we feed in the input x = 1 and tell the black box that y should be 2. Then we feed x = 2 and tell it that y = 4, then x = 5 and y = 10, and so on. If this black box is designed so that it learns from the inputs and their corresponding desired outputs, and if it is given enough such samples, it learns the pattern. Now if we feed it x = 1.5, it will output y = 3, even though this pair was never taught to it explicitly. Moreover, the black box has absolutely no idea that what we are making it learn is a simple equation:
y = 2x
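The black box above can be sketched in a few lines. This is only a toy illustration, not how real neural networks are built: a single weight w is nudged by gradient descent on each (x, y) pair, and it settles on 2 without ever being told the equation.

```python
# A toy sketch of the black box: it never sees the rule y = 2x, only
# (x, y) pairs, and fits a single weight w so that w * x matches y.
xs = [1.0, 2.0, 5.0, 3.0]     # inputs shown to the box
ys = [2.0, 4.0, 10.0, 6.0]    # desired outputs we tell it

w = 0.0                       # its internal state, initially ignorant
lr = 0.01                     # learning rate (step size)

for _ in range(1000):         # repeat the samples many times
    for x, y in zip(xs, ys):
        error = w * x - y     # how far off the current guess is
        w -= lr * error * x   # nudge w to shrink the squared error

print(round(w, 2))            # -> 2.0: it has discovered the pattern
print(round(w * 1.5, 2))      # -> 3.0: an input it was never taught
```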
An alert reader might object that there is no need for all this work of feeding training samples to the black box; we could simply tell it the equation to use. However, the black box may take a whole set of inputs (x1, x2, x3, etc.) and produce a whole set of outputs (y1, y2, y3, etc.), and the underlying relationship can be so complex that we cannot write it down explicitly. This is usually the case when we design systems for face recognition, handwriting recognition, generic object recognition, and so on. That is why we need a system that can extract patterns from the training samples on its own and use those patterns to generalize a rule.
For instance, suppose we have certain images and need to classify each one as a table or a tree. Observe what our brain does when we do this ourselves: we first look at the shape of the object, and sometimes also at its color. For trees versus tables, the shape feature alone would suffice, but sometimes we need additional information like color. We can make a machine do exactly the same thing: detect the edges in the image (which captures the shape of the object), take the color information, pack all of it into a suitable array like (x1, x2, …), and feed it to the black box. For the first few images we tell the black box whether the image shows a tree or a table. Later, once the black box has learnt enough, it can classify images into the right class by itself. For this particular task one output is enough: if y = 0 it is a tree, and if y = 1 it is a table. If there are multiple classes into which the images need to be sorted, we simply increase the number of outputs.
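The tree-versus-table setup can be sketched with a tiny perceptron. The feature values here are hypothetical stand-ins invented for illustration: x1 might be a straight-edge measure and x2 a "greenness" measure, with y = 0 for tree and y = 1 for table, as in the text.

```python
# Toy tree-vs-table classifier. Features (x1, x2) are hypothetical:
# x1 ~ straight edges (shape), x2 ~ greenness (color).
samples = [
    ((0.2, 0.9), 0),  # leafy and green        -> tree  (y = 0)
    ((0.3, 0.8), 0),
    ((0.9, 0.1), 1),  # straight edges, brown  -> table (y = 1)
    ((0.8, 0.2), 1),
]

w1, w2, b = 0.0, 0.0, 0.0          # perceptron weights and bias

for epoch in range(20):            # repeat all the samples many times
    for (x1, x2), y in samples:
        pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        w1 += (y - pred) * x1      # classic perceptron update:
        w2 += (y - pred) * x2      # only changes when the guess is wrong
        b += (y - pred)

# An unseen object with table-like features:
print(1 if w1 * 0.85 + w2 * 0.15 + b > 0 else 0)   # -> 1 (table)
```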
If you find this black box interesting at this point, behold its technical name: it is what the robotics freaks call the Artificial Neural Network! It is called so because it is an artificially devised system that resembles the network of neurons present in our brain.
Like its counterpart in animals, a neural network keeps evolving with information during its learning phase and becomes more robust the more it learns. However, we need to train it in the right way for it to learn efficiently. Imagine your first-grade teacher had taught you only "A" and nothing else, no B, C, or D. Then in second grade she taught you only B and never referred back to A; after some time you would have forgotten A and remembered only B. Because she instead taught you A, then B, then C and so on, and then came back to A and repeated the whole cycle, you learnt all the characters and still remember them flawlessly. This is exactly how you need to train an Artificial Neural Network (ANN) to get the best results: you feed all the learning samples one after another, and then you repeat all the samples again and again, multiple times. One such round of training through all the samples is called an epoch.
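The training schedule just described can be sketched as a loop. The samples here are placeholders; the point is only the structure: one epoch is one full pass over every sample, repeated many times, with the samples interleaved rather than blocked by class so earlier ones (the "A") keep being revisited.

```python
import random

# Sketch of how training is organized into epochs.
samples = list("ABCDE")      # placeholder stand-ins for training samples
epochs = 50
steps = 0                    # total training updates performed

for epoch in range(epochs):
    random.shuffle(samples)  # interleave samples instead of blocking by class
    for sample in samples:
        # a real network would do a forward pass here, compute the
        # error against the desired output, and update its weights
        steps += 1

print(steps)                 # 50 epochs x 5 samples = 250 updates
```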
Also, if you had only ever been shown printed characters, you would have blindly memorized those exact shapes and failed to recognize an A that was hand-written or set in a different font. Because you were presented with many varieties of A, you generalized how an A looks, and now you can recognize one under any conditions. The same principle is followed in machine learning: you feed the ANN many different samples of the same class, which helps it generalize rather than memorize.
What's a Fuzzy Decision??
Now imagine you are shown a digital screen. Your brain instantly classifies it as a screen. However, only from whether it is accompanied by a keyboard, mouse, and CPU do we know if it is a computer monitor or a television. If there are no computer accessories beside it, it is more probably a television. Let's observe what our brain went through here. Its neural network recognized the thing placed in front of it as a screen, and it would have recognized it despite factors like the lighting conditions in the room or the color of the screen. However, the neural network could not resolve the uncertainty about whether it was a computer screen or a TV screen; you needed additional information, the presence or absence of the accessories, to conclude that. Other neural networks in your brain classified the CPU, the mouse, and the keyboard. All this information was then consolidated, and it was concluded that it is a computer screen. This uncertainty was resolved by something called fuzzy logic. You saw something that was definitely a screen, but there was a 50% chance of it being a TV screen and a 50% chance of it being a computer screen (are you sensing the "fuzziness" introduced by these degrees of confidence?). As you went on recognizing the accessories, the weightage of it being a computer screen kept increasing while the weightage of it being a TV screen kept decreasing.
This fuzzy logic helps a robotic system arbitrate when uncertainties arise. So in a nutshell, the neural network is responsible for the robustness and adaptation of the robot to its environment, whereas fuzzy logic is responsible for resolving the uncertainties in the classification.
One neural network says the object is 0.5/1 a computer screen and 0.5/1 a TV screen; this is where the neural network gives a fuzzy decision. Other neural networks run simultaneously, classifying the CPU, the mouse, and so on. A fuzzy decision logic at a later stage then consolidates all the fuzzy conclusions of the networks preceding it and gives the final decision.
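That consolidation step can be sketched as follows. Everything here is a simplified assumption for illustration: the 0.5/0.5 starting memberships come from the text, while the 0.2 evidence weight per detected accessory is an invented number, not a real fuzzy-inference rule.

```python
# Hypothetical sketch of the fuzzy arbitration stage.
membership = {"tv": 0.5, "computer": 0.5}   # the screen network's fuzzy output

# what the other networks reported (cpu not detected in this scene)
accessories = {"keyboard": True, "mouse": True, "cpu": False}

for item, present in accessories.items():
    if present:                              # each accessory is evidence
        membership["computer"] += 0.2        # for "computer screen"...
        membership["tv"] -= 0.2              # ...and against "TV screen"

for k in membership:                         # keep memberships within [0, 1]
    membership[k] = round(max(0.0, min(1.0, membership[k])), 2)

decision = max(membership, key=membership.get)
print(decision)                              # -> computer
```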
Wait a minute… Fuzziness and probability are absolutely disparate!!
It may appear that fuzzy logic is similar to probability. However, note that there is a fundamental difference between the two. Consider a house consisting of two rooms connected by a door, and suppose you sit in one of the two rooms at random. If we sample your status in the house (i.e. which room you are in) at random instants, your probability of being in Room 1 is 0.5 and of being in Room 2 is 0.5. This means that if your status were sampled 10 million times, you would be found in Room 1 for close to 5 million of those instants and in Room 2 the rest of the time. This is what probability tells us. Now consider instead that you are standing in the doorway between the two rooms with 10% of your body in Room 1 and 90% in Room 2. Then you are 0.1 in Room 1 and 0.9 in Room 2, at a single instant, with no randomness involved. This is what fuzzy logic tells us. This example shows how fundamentally different fuzzy logic and probability are.
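The rooms example can be made concrete in a few lines. The numbers are taken from the text: equal probability for the two rooms under repeated sampling, versus a single 10%/90% fuzzy membership for someone standing in the doorway.

```python
import random

# Probability: at each sampled instant you are wholly in ONE room.
random.seed(0)                               # fixed seed for repeatability
observations = [random.choice([1, 2]) for _ in range(10_000)]
frac_room1 = observations.count(1) / len(observations)

# Fuzzy membership: a SINGLE state that is partially in both rooms at once.
membership = {"room1": 0.1, "room2": 0.9}

print(round(frac_room1, 1))   # close to 0.5 only over many samples
print(membership)             # no sampling at all: one state, two degrees
```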