Programming is a great way to gain insight into math concepts. Here you’ll see tips and tricks to learn math, more specifically linear algebra, from a coding perspective. You’ll see the relationship between Numpy functions and abstract linear algebra concepts.

At the end of this mini-tutorial, you’ll know what vectors and matrices are, and why they are at the core of machine learning and data science.


If you’re a bit into data science and machine learning, you probably hear the word vector all the time. Let’s clarify what vectors are.

Geometric and Coordinate Vectors

You can distinguish geometric vectors, which are arrows pointing in space, from coordinate vectors, which are lists of values stored in arrays. The relationship between the two is that you can take the coordinates of the endpoint of an arrow to get values that depend on the coordinate system.

Mathematically, you can refer to vectors with lowercase bold italic letters, as $\boldsymbol{v}$ for instance. Let’s have the following vector $\boldsymbol{v}$:

$$ \boldsymbol{v} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} $$

With Numpy, vectors are coordinate vectors: one-dimensional arrays of numerical values. You can create vectors with the function np.array():

import numpy as np
v = np.array([1, -1])
v
array([ 1, -1])

The variable v contains a Numpy one-dimensional array, that is, a vector, containing two values. From a geometric point of view, you can consider each of these values as coordinates. Since there are only two values, you can represent the vector in a Cartesian plane.

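Let’s use Matplotlib to represent this geometric vector. You can use the function quiver() to draw arrows. The first four parameters are, respectively: the $x$-coordinate of the starting point, the $y$-coordinate of the starting point, the $x$-component of the arrow, and the $y$-component of the arrow. Since the arrow starts at the origin here, these components are also the coordinates of its ending point.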

You can also draw axes with plt.axhline and plt.axvline (the parameter zorder allows you to set the axes behind the other elements).

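import matplotlib.pyplot as plt
from matplotlib import ticker

# Draw the vector as an arrow starting at the origin
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color="gray")
plt.xlim(-0.3, 2.2)
plt.ylim(-1.5, 1)
# Ensure that ticks are displayed with a specific step
# (the step of 1 used here is an assumption)
ax = plt.gca()
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
# Draw the axes behind the other elements
plt.axhline(0, c='#d6d6d6', zorder=0)
plt.axvline(0, c='#d6d6d6', zorder=0)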


You can see the arrow corresponding to the vector $\boldsymbol{v}$. This vector has two components (the values in the array): $1$, represented on the $x$-axis, and $-1$, represented on the $y$-axis.

It is convenient to take a vector with only two components as an example because you can represent it geometrically. However, the concepts you’ll learn also apply to vectors with more components.

You can also represent vectors as the ending point of the arrow only. For instance, let’s simulate some data:

x = np.random.normal(0, 1, 100)
y = x + np.random.normal(0, 1, 100)

You can represent each data sample as a geometric vector:

for i in range(x.shape[0]):
    plt.quiver(0, 0, x[i], y[i], angles='xy', scale_units='xy', scale=1, color="#A9A9A9", alpha=0.4)
plt.xlim(x.min() - 1, x.max() + 1)
plt.ylim(y.min() - 1, y.max() + 1)


However, it is easier to represent data samples as points corresponding to the ending of the arrows:

plt.scatter(x, y, s=30)


In data science, you can use vectors to store the values corresponding to different features. This allows you to leverage linear algebra tools and concepts on your data.
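As a minimal sketch of this idea, the snippet below stores two data samples as feature vectors and combines them feature by feature. The features (height, weight, age) and their values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical samples: [height in cm, weight in kg, age in years]
sample_a = np.array([170.0, 65.0, 30.0])
sample_b = np.array([180.0, 80.0, 45.0])

# Difference between the two samples, computed feature by feature
difference = sample_b - sample_a      # 10, 15, 15

# Average of the two samples, computed feature by feature
mean_sample = (sample_a + sample_b) / 2  # 175, 72.5, 37.5

print(difference)
print(mean_sample)
```

Because each sample is a vector, operations such as subtraction and averaging apply to all features at once, without writing a loop.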

Using Vectors with Numpy

You saw how to create a vector using the function np.array(). Note also that many Numpy functions return arrays. For instance, look at the following chunk of code:

random_vector = np.random.normal(0, 1, 2)
random_vector
array([-1.0856306 ,  0.99734545])

The function np.random.normal() is used to draw random values from a normal distribution. You can see that it returns a Numpy array with the random values.

Let’s consider this array as a geometric vector and plot it:

plt.quiver(0, 0, random_vector[0], random_vector[1], angles='xy', scale_units='xy', scale=1, color="gray")
# Set limits that contain both the origin and the tip of the arrow
plt.xlim(min(0, random_vector[0]) - 1, max(0, random_vector[0]) + 1)
plt.ylim(min(0, random_vector[1]) - 1, max(0, random_vector[1]) + 1)



Let’s now create a vector with more components to illustrate the basics of indexing in Numpy:

b = np.random.normal(0, 1, 10)
b
array([-1.0856306 ,  0.99734545,  0.2829785 , -1.50629471, -0.57860025,
        1.65143654, -2.42667924, -0.42891263,  1.26593626, -0.8667404 ])

You can get only part of the vector using indexing. You can use single values or lists of values as indexes. For instance:

b[[0, 2]]
array([-1.0856306,  0.2829785])

You can also use a colon to get the elements from one index to another: start:end. The element at index end is not included. For example:

b[2:5]
array([ 0.2829785 , -1.50629471, -0.57860025])

If you omit start or end, the slice starts at the first element or runs through the last element, respectively. For instance, [:3] returns the first three elements:

b[:3]
array([-1.0856306 ,  0.99734545,  0.2829785 ])

You can index from the last value using a negative sign. For instance, -1 corresponds to the last value, -2 to the one before, etc.:
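Here is a small sketch of negative indexing. It uses a short vector w with values chosen for illustration (not the random vector above), so that the results are easy to check by eye.

```python
import numpy as np

# A small example vector (values chosen for illustration)
w = np.array([10, 20, 30, 40, 50])

last = w[-1]         # last value: 50
second_last = w[-2]  # the value before the last one: 40
last_three = w[-3:]  # negative indexes also work in slices: 30, 40, 50

print(last, second_last, last_three)
```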


You can also look at the shape of an array with the attribute shape:

b.shape
(10,)

You can see that there are 10 components in the vector $\boldsymbol{b}$. Looking at the shape of your vectors tells you how many components they contain.


Say that you have multiple vectors corresponding to different observations from your dataset. You have one vector per observation, with a length corresponding to the number of features. Similarly, you can have one vector per feature, containing the value of that feature for each observation (you’ll see that transposition allows you to go from one view to the other).
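The two views can be sketched as follows, assuming three hypothetical observations with two features each (the values are made up for illustration). Stacking the observation vectors gives a two-dimensional array, and the attribute T returns its transpose.

```python
import numpy as np

# Three hypothetical observations, each with two features
obs_1 = np.array([1.0, 2.0])
obs_2 = np.array([3.0, 4.0])
obs_3 = np.array([5.0, 6.0])

# One row per observation: shape (3 observations, 2 features)
data = np.stack([obs_1, obs_2, obs_3])

# Transposing gives one row per feature: shape (2 features, 3 observations)
data_t = data.T

# Values of the first feature for all observations: 1, 3, 5
first_feature = data_t[0]

print(data.shape, data_t.shape)
```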

Matrices are two-dimensional arrays: they have rows and columns. You can denote a matrix with an uppercase bold italic letter, as $\boldsymbol{A}$. For instance, you can have:

$$ \boldsymbol{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} $$

The matrix $\boldsymbol{A}$ contains three rows and two columns. You can think of it as two column vectors or as three row vectors.
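As a quick sketch, you can write this matrix in Numpy as a list of rows and then pull out one of its column vectors and one of its row vectors. The indexing syntax used here, where a colon alone means "all", is an addition for illustration.

```python
import numpy as np

# The matrix A, written as a list of three rows with two values each
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

print(A.shape)  # three rows, two columns

first_column = A[:, 0]  # column vector with the values 1, 3, 5
second_row = A[1, :]    # row vector with the values 3, 4
```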

Let’s take an example creating a matrix containing random values:

C = np.random.normal(0, 1, (5, 3))
C
array([[-1.0856306 ,  0.99734545,  0.2829785 ],
       [-1.50629471, -0.57860025,  1.65143654],
       [-2.42667924, -0.42891263,  1.26593626],
       [-0.8667404 , -0.67888615, -0.09470897],
       [ 1.49138963, -0.638902  , -0.44398196]])

The matrix $\boldsymbol{C}$ has 5 rows and 3 columns. You can look at its shape using the shape attribute again:

C.shape
(5, 3)

Unlike with vectors, the shape of matrices is described by two numbers (instead of one): the first tells you the number of rows and the second the number of columns.


Like with vectors, you can get subsets of matrices using indexing. Since there are rows and columns, you need to use two indexes. For instance, remembering that Python uses zero-based indexing, to get the element in the second row and the third column of the preceding matrix $\boldsymbol{C}$, you do:

C[1, 2]

If you want to get the column 0, you need to take all rows (using :) for this column:

C[:, 0]
array([-1.0856306 , -1.50629471, -2.42667924, -0.8667404 ,  1.49138963])

If you want the last row, you can do the same (all columns using :) and use -1:

C[-1, :]
array([ 1.49138963, -0.638902  , -0.44398196])
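You can also combine slices on both axes to extract a block of a matrix. The sketch below uses a hypothetical matrix M filled with the integers 0 to 14 (instead of random values) so that the results are easy to verify.

```python
import numpy as np

# A 5 x 3 matrix with easy-to-read values (chosen for illustration)
M = np.arange(15).reshape(5, 3)

element = M[1, 2]          # second row, third column: 5
block = M[1:3, :2]         # rows 1 and 2, columns 0 and 1
last_two_rows = M[-2:, :]  # the last two rows, all columns

print(block)
print(last_two_rows)
```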


The norm of a vector, denoted with double vertical bars like $\lVert \boldsymbol{v} \rVert$, is a value (a scalar) associated with the vector that satisfies the following rules:

  • The value can’t be negative.
  • Only the zero vector (a vector that doesn’t change another vector when you add them) has a norm of zero.
  • Scalar multiplication: $\lVert k \cdot \boldsymbol{v} \rVert = |k| \cdot \lVert \boldsymbol{v} \rVert$.
  • Triangle inequality: $\lVert \boldsymbol{u} + \boldsymbol{v} \rVert \leq \lVert \boldsymbol{u} \rVert + \lVert \boldsymbol{v} \rVert$.
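These rules can be checked numerically on an example. The sketch below uses np.linalg.norm(), which computes the Euclidean length of a one-dimensional array by default. The vectors u and v and the scalar k are arbitrary choices for illustration, and checking one example is of course not a proof.

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, -1.0])
k = -2.0

norm_u = np.linalg.norm(u)  # sqrt(3**2 + 4**2) = 5
norm_v = np.linalg.norm(v)  # sqrt(2)

# Rule 1: norms are never negative
non_negative = norm_u >= 0 and norm_v >= 0

# Rule 2: the zero vector has a norm of zero
zero_vector_norm = np.linalg.norm(np.zeros(2))

# Rule 3: multiplying the vector by k multiplies the norm by |k|
scaling_holds = np.isclose(np.linalg.norm(k * u), abs(k) * norm_u)

# Rule 4: triangle inequality
triangle_holds = np.linalg.norm(u + v) <= norm_u + norm_v

print(non_negative, zero_vector_norm, scaling_holds, triangle_holds)
```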

The physical concept of length satisfies these rules, so the length of a vector is a kind of norm. This also means that you can have multiple kinds of norms.
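To illustrate that several norms exist, the following sketch computes three common ones for the same vector with the ord parameter of np.linalg.norm(). The example vector is an arbitrary choice.

```python
import numpy as np

v = np.array([3.0, -4.0])

l1 = np.linalg.norm(v, ord=1)          # sum of absolute values: 3 + 4 = 7
l2 = np.linalg.norm(v, ord=2)          # Euclidean length: sqrt(9 + 16) = 5
l_max = np.linalg.norm(v, ord=np.inf)  # largest absolute value: 4

print(l1, l2, l_max)
```

The three results differ, but each of them satisfies the rules listed above.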

Vector norms are used in machine learning in cost functions, for instance: the difference between the estimated value and the true value for each data sample is stored in a vector, and the norm of this vector tells you how good the estimation is. Another example is regularization, where you add the norm of a vector containing the parameters of your model to the cost function. This norm tells you how large the parameters are, allowing the algorithm to avoid values that are too large (and thus limit overfitting).
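Both uses can be sketched in a few lines. Everything in this example is hypothetical: the true values, the estimates, the parameters, and the regularization strength lam are made up, and the squared $L^2$ norm is only one possible choice of cost.

```python
import numpy as np

# Hypothetical true values and model estimates for four data samples
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0, 4.5])

# Vector of errors, with one component per data sample
errors = y_pred - y_true

# Squared L2 norm of the error vector, used here as the cost
cost = np.linalg.norm(errors) ** 2

# Hypothetical model parameters and regularization strength
params = np.array([0.5, -2.0])
lam = 0.1

# Adding the squared norm of the parameters penalizes large parameter values
regularized_cost = cost + lam * np.linalg.norm(params) ** 2

print(cost, regularized_cost)
```

A larger lam penalizes large parameters more strongly, at the price of a looser fit to the data.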

You’ll find more details about the mathematical definitions of norms ($L^1$ and $L^2$) and their use in machine learning in my upcoming session at ODSC, “Introduction to Linear Algebra for Data Science and Machine Learning With Python.” You’ll also learn about linear combinations, how to consider matrices as linear transformations, and how to understand least squares approximation using the matrix form of systems of linear equations.

About the author/ODSC Europe speaker: Hadrien Jean, PhD:

Hadrien Jean is a machine learning scientist. He’s currently working on the book “Essential Math for Data Science” with O’Reilly. He previously worked at Ava on speech diarization. He also works on a bird detection project using deep learning. He completed his Ph.D. in cognitive science at the École Normale Supérieure (Paris, France) on the topic of auditory perceptual learning with a behavioral and electrophysiological approach. He has published a series of blog articles aiming at building intuition on mathematics through code and visualization (