Matrix | The Vector of Intelligence

Maaz_Bin_Asad
3 min read · May 18, 2020

When we hear the word 'matrix', it pretty much brings to mind a rectangular array of numbers, which is exactly what the books call it.

When studying the matrix chapter in school mathematics, who might have guessed that this rectangular representation of numbers is exactly what AI is built on? For a better intuition of how matrices show up in Machine Learning, say you have 'm' features of an object that you want your Machine Learning algorithm to look at in order to classify that object as either 'A' or 'B'. If we represent these features as a column vector, we get a one-dimensional column matrix containing the features on which the classification will be based. Now, can we classify an unknown object into a category by looking at just one example? Clearly not; we need maybe thousands of similar examples, each identifying an object as either 'A' or 'B'. Let's extend the number of columns to 'n', each containing the features of one training example. Now we have a data matrix, say 'A', of order m x n, where each column is the feature vector of one training example and each row corresponds to a single feature across all examples.
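As a concrete sketch (the names and sizes here are illustrative placeholders, not from any real dataset), this is how such an m x n data matrix could be assembled in NumPy:

```python
import numpy as np

# Illustrative sizes: m features per example, n training examples.
m, n = 5, 1000

rng = np.random.default_rng(0)
A = rng.normal(size=(m, n))   # data matrix: each column is one example's feature vector

x = A[:, [0]]                 # one feature column 'x', kept as shape (m, 1)
print(A.shape, x.shape)       # (5, 1000) (5, 1)
```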

For instance, take a shallow neural network with only one hidden layer as our classifier, where the hidden layer comprises four neurons. One of the feature columns, say 'x', containing x1, x2, x3, ..., xm, is picked from the matrix, and we initialize a random matrix 'W' of order 4 x m, which is called the weight matrix. The matrix-vector product of the weight matrix and the column vector is added to another randomly initialized matrix 'b', known as the bias. Note that the dimensions of 'b' are (4, 1) because the product of W (4, m) and x (m, 1) has dimensions (4, 1). The resultant matrix, say 'z', is passed into an activation function (preferably the sigmoid function in the case of a shallow network), which is applied to each element of the matrix, giving the hidden activations, say 'a'. The sigmoid function squashes every element into the range (0, 1). Notice that this function has no vertical asymptote; its asymptotes are horizontal, at 0 and 1.
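A minimal NumPy sketch of that hidden-layer computation, assuming the shapes described above (four neurons, m input features; the random values are placeholders for learned parameters):

```python
import numpy as np

def sigmoid(z):
    # Elementwise sigmoid: squashes each entry into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

m = 5                          # number of features (illustrative)
rng = np.random.default_rng(1)

x = rng.normal(size=(m, 1))    # one feature column, shape (m, 1)
W = rng.normal(size=(4, m))    # randomly initialized weight matrix, shape (4, m)
b = np.zeros((4, 1))           # bias, shape (4, 1) to match W @ x

z = W @ x + b                  # pre-activation, shape (4, 1)
a = sigmoid(z)                 # hidden activations, each in (0, 1)
print(z.shape, a.shape)        # (4, 1) (4, 1)
```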

The work we did so far boils down to this: z = Wx + b, followed by a = σ(z), where σ is the sigmoid applied elementwise.

The same procedure is repeated in the next layer of the network, which finally classifies the features into one of the classes. The output can be passed through a softmax function, which converts the raw outputs into a probability distribution over the classes. The mean absolute error between the prediction and the true label is then calculated, and the backpropagation algorithm comes into play to make the neural network learn.
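Continuing the sketch for the output layer (two classes and all parameter values are assumptions for illustration; a real implementation would use an autodiff library to run backpropagation):

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability, then normalize to probabilities.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(2)
a = rng.random((4, 1))          # hidden activations from the previous layer

W2 = rng.normal(size=(2, 4))    # output weights: 2 classes, 4 hidden neurons
b2 = np.zeros((2, 1))           # output bias

y_hat = softmax(W2 @ a + b2)    # class probabilities, shape (2, 1), summing to 1
y = np.array([[1.0], [0.0]])    # one-hot label: the true class is 'A'

mae = np.abs(y_hat - y).mean()  # the mean absolute error mentioned above
print(y_hat.ravel(), mae)
# Backpropagation would now compute gradients of this error w.r.t. W2, b2, W, b.
```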

Conclusion

It seems that the definition 'rectangular representation' holds good in Machine Learning as well. The matrix is one of the most important concepts in linear algebra, and it is one of the building blocks of Machine Learning.

