# C++ neural networks from scratch – Pt 1. building a matrix library

Creating a bare-bones linear algebra library to train a neural net.

Without having access to any linear algebra or autodifferentiation libraries, what’s the absolute **bare-minimum** required to train a neural network?
Specifically, what do we need to build and train a multilayer perceptron (a cascade of linear-nonlinear layers) to regress onto some data?

In this first part of a 3-part series of posts, I’m going to build a bare-bones linear algebra library from the ground up using modern C++. In the following parts, I’ll create a trainable simple neural network, and validate it on some toy data.

## Building our own matrix library

Considering for now the forward pass of a single layer neural network, \(y \leftarrow {\color{forestgreen}f(}{\color{crimson} Wx} {\color{blue}+ b}{\color{forestgreen})}\), From this we see we’ll need to build tools that can:

- store data in a matrix (or vector)
- perform matrix products
- perform matrix addition
- apply a function element-wise to a matrix

Other operations that will be helpful for training a simple neural net are:

- element-wise multiplication
- scalar multiplication as a special case

- matrix subtraction
- matrix transposition

### Constructor and printing methods

We’ll create a templated class to allow us to create matrices with arbitrary type (float, double, int, etc.).
To instantiate a 2x3 float `Matrix`

, we just need to call `Matrix<float> M(2, 3);`

.
The constructor will also instatiate an empty 1D `(2 * 3)`

`std::vector`

with which to store the data.
The shape and number of elements of the matrix are stored in a public tuple.
For simplicity, the minimum size of each dimension should be 1, like MATLAB, and unlike numpy.

```
// matrix.h
#pragma once
#include <vector>
#include <cmath>
#include <cassert>
#include <iostream>
#include <tuple>
#include <functional>
#include <random>
template<typename Type>
class Matrix {
size_t cols;
size_t rows;
public:
std::vector<Type> data;
std::tuple<size_t, size_t> shape;
int numel = rows * cols;
/* constructors */
Matrix(size_t rows, size_t cols)
: cols(cols), rows(rows), data({}) {
data.resize(cols * rows, Type()); // init empty vector for data
shape = {rows, cols};
}
Matrix() : cols(0), rows(0), data({}) { shape = {rows, cols}; };
/* [print methods go here] */
/* [linear algebra methods go here] */
};
```

### Helper print methods

We’ll also add some `print()`

methods to the class that will help for debugging and displaying the data of the matrix.
The `print()`

method simply loops through each element of a row and prints each row to the screen using `std::cout`

.

```
// shape printer for debugging, PyTorch style formatting
void print_shape() {
std::cout << "Matrix Size([" << rows << ", " << cols << "])" << std::endl;
}
void print() {
for (size_t r = 0; r < rows; ++r) {
for (int c = 0; c < cols; ++c) {
std::cout << (*this)(r, c) << " ";
}
std::cout << std::endl;
}
std::cout << std::endl;
}
```

### Accessing matrix elements

Storing the underlying data in a fixed-size vector is more efficient than storing it explicitly as a 2D array because the data is guaranteed to be contiguous in memory. We can just do some simple indexing to translate from (row, col) to linear vector index.

To access elements of the matrix MATLAB-style (except zero-indexed), we can define an overload for the `operator()`

that returns a reference to the element to allow us to edit the matrix.
Despite the data being stored in a 1D `std::vector`

for efficiency, we can easily find the `(row,col)`

element using some simple indexing.

```
Type& operator()(size_t row, size_t col) {
return data[row * cols + col];
}
```

Now we can access a matrix like `M(row, col)`

and get the corresponding element of the array.

### Basic linear algebra methods

#### Matrix multiplication

The most important method we’ll need is a matrix multiply.
If we’re given a target matrix with which to multiply our current instantiated matrix, we need to first assert (from `#include <cassert>`

) that the `cols`

dimension of our current matrix matches the `rows`

of our target matrix.

We’re going to implement the most naive/simple matrix multiplication algorithm without worrying about efficiency.
If the matrices were both dense and `(n x n)`

square, then this would be O(n^3) time complexity.
Optimal implentations lie somewhere between O(n^2.3) and O(n^3) (the best we could possibly hope for is O(n^2) since we’d have to loop through all the elements at least once).

```
//O(rows^2 * cols) time | O(rows*cols) space
Matrix matmul(Matrix &target) {
assert(cols == target.rows);
Matrix output(rows, target.cols);
for (size_t r = 0; r < output.rows; ++r) {
for (size_t c = 0; c < output.cols; ++c) {
for (size_t k = 0; k < target.rows; ++k)
output(r, c) += (*this)(r, k) * target(k, c);
}
}
return output;
}
```

A much more straightforward multiplication operation is element-wise multiplication.
Given some `target`

, we’ll assert that its size is the same as our current matrix
Then we’ll loop through and multiply each element together, and assign to the appropriate element of an `output`

matrix.

Using this same logic, we can easily create a `square()`

method that does an element-wise squaring of a matrix (useful for computing squared error), and a matrix-times-scalar operation.

```
// O(rows*cols) time | O(rows*cols) space
Matrix multiply_elementwise(Matrix &target){
assert(shape == target.shape);
Matrix output((*this));
for (size_t r = 0; r < output.rows; ++r) {
for (size_t c = 0; c < output.cols; ++c) {
output(r, c) = target(r,c) * (*this)(r, c);
}
}
return output;
}
// O(rows*cols) time | O(rows*cols) space
Matrix square() {
Matrix output((*this));
output = multiply_elementwise(output);
return output;
}
// O(rows*cols) time | O(rows*cols) space
Matrix multiply_scalar(Type scalar) {
Matrix output((*this));
for (size_t r = 0; r < output.rows; ++r) {
for (size_t c = 0; c < output.cols; ++c) {
output(r, c) = scalar * (*this)(r, c);
}
}
return output;
}
```

#### Matrix addition

Matrix addition is probably the simplest matrix operation apart from scalar multiplication.
Given a reference to a `target`

`Matrix`

, we’ll first assert that the shapes are the same as our current matrix.
Then, we’ll just loop through each component of both arrays, sum them together, and assign them to the corresponding position of our `output`

matrix.

```
// binary arg matrix addition
// O(rows*cols) time | O(rows*cols) space
Matrix add(Matrix &target) {
assert(shape == target.shape);
Matrix output(rows, get<1>(target.shape));
for (size_t r = 0; r < output.rows; ++r) {
for (size_t c = 0; c < output.cols; ++c) {
output(r, c) = (*this)(r, c) + target(r, c);
}
}
return output;
}
Matrix operator+(Matrix &target) {
return add(target);
}
```

#### Matrix subtraction

To do a matrix operation like \(A = B - C\), we can break it down and view it like \(A \leftarrow {\color{blue}(B +} {\color{red}(-C)}{\color{blue})}\).
So we can first negate `C`

, then use our existing method to add `-C`

to `B`

.

To negate a matrix, we can write a unary `operator-()`

that loops through the current matrix’s data, negates every element and assigns them each to the corresponding elements of an `output`

matrix.

```
// unary-arg matrix negation
// O(rows*cols) time | O(rows*cols) space
Matrix operator-() {
Matrix output(rows, cols);
for (size_t r = 0; r < rows; ++r) {
for (size_t c = 0; c < cols; ++c) {
output(r, c) = -(*this)(r, c);
}
}
return output;
}
```

Now that we have a unary negation operator, to subtract a `target`

matrix from our current matrix, we simply negate `target`

, then call our `add()`

method on the negated `target`

.

```
// binary-arg matrix subtraction
// O(rows*cols) time | O(rows*cols) space
Matrix sub(Matrix &target) {
Matrix neg_target = -target;
return add(neg_target);
}
Matrix operator-(Matrix &target) { // for cleaner usage
return sub(target);
}
```

This can be optimized by rewriting the `add()`

method entirely while replacing the `+`

with a `-`

, but for our purposes, doing it this way is a nice exercise in composing different methods together.

#### Matrix transpose

To transpose a matrix, we simply swap the elements of its rows and columns.
We can do this easily by instantiating a new `Matrix`

with the transposed size of our current matrix, assign the appropriate elements to it by looping through our current matrix, then returning the new matrix.

```
// swap rows and cols
// O(rows*cols) time | O(rows*cols) space
Matrix transpose() {
size_t new_rows{cols}, new_cols{rows};
Matrix transposed(new_rows, new_cols);
for (size_t r = 0; r < new_rows; ++r) {
for (size_t c = 0; c < new_cols; ++c) {
transposed(r, c) = (*this)(c, r); // swap row and col
}
}
return transposed;
}
Matrix T(){ // Similar to numpy, etc.
return transpose();
}
```

#### Element-wise function application

Neural networks would be pretty useless without nonlinearities, because a cascade of linear operations can be collapsed into a single linear operation. So we’ll need to create a way to apply a nonlinear function (i.e. activation function) to each element of a matrix.

To apply a function (e.g. sigmoid, Tanh, ReLU, etc.) to each element of the matrix, we’ll need a method that can take as input a reference to a function (from `#include <functional>`

).
We’ll then loop through every component of the array and assign the output of the function of that component to a new `output`

matrix.

```
// O(rows*cols* O(function)) time | space depends on function
Matrix apply_function(const std::function<Type(const Type &)> &function) {
Matrix output((*this));
for (size_t r = 0; r < rows; ++r) {
for (size_t c = 0; c < cols; ++c) {
output(r, c) = function((*this)(r, c));
}
}
return output;
}
```

#### Matrix initialization

Lastly, we’ll need to create matrices filled with structured data in order to build and train a neural net.
Different options include something analogous to `zeros()`

, or `ones()`

, or various random matrices like `random.randn()`

in numpy.
Here’s an implementation of a `randn()`

matrix function that we’ll use to initialize our weight and bias matrices to Normal-distributed values.

```
template <typename T>
struct mtx {
static Matrix<T> randn(size_t rows, size_t cols) {
Matrix<T> M(rows, cols);
std::random_device rd{};
std::mt19937 gen{rd()};
// init Gaussian distr. w/ N(mean=0, stdev=1/sqrt(numel))
T n(M.numel);
T stdev{1 / sqrt(n)};
std::normal_distribution<T> d{0, stdev};
// fill each element w/ draw from distribution
for (size_t r = 0; r < rows; ++r) {
for (int c = 0; c < cols; ++c) {
M(r, c) = d(gen);
}
}
return M;
}
```

## The fruits of our labour

Now let’s test all the functions we made:

```
#include "matrix.h"
int main(){
auto M = mtx<float>::randn(2, 2); // init randn matrix
M.print_shape();
M.print(); // print the OG matrix
(M-M).print(); // print M minus itself
(M+M).print(); // print its sum
(M.multiply_scalar(2.f)).print(); // print 2x itself
(M.multiply_elementwise(M)).print(); // mult M w itself
auto MT = M.T(); // transpose the matrix
MT.print();
(MT.matmul(M)).print(); // form symm. pos. def. matrix
(M.apply_function([](auto x){return x-x;} )).print(); // apply fun
}
```

**Outputs:**

```
// M.shape
Matrix Size([2, 2])
// M
0.0383611 -0.310235
-0.615681 -0.356848
// M - M
0 0
0 0
// M + M
0.0767223 -0.62047
-1.23136 -0.713695
// 2 * M
0.0767223 -0.62047
-1.23136 -0.713695
// M elementwise multiply M
0.00147158 0.0962457
0.379063 0.12734
// M transpose
0.0383611 -0.615681
-0.310235 -0.356848
// M.T() matrix-multiply M
0.380534 0.207803
0.207803 0.223586
// Apply f(x) = x-x to each element of M.
0 0
0 0
```

With this from-scratch linear algebra library, we’re ready to now build a neural network.