June 4, 2026 12 min read

Reading Neural Networks - Every Arrow Has an Address

Learn how to read a fully connected neural network diagram — from layers and activations to weight matrices, subscripts, and arrow labels.

Stories by Sagar Kharel

The Launch Log Comes First

A rocket team keeps a small launch log.

Before every launch attempt, Mission Control records a few clues and makes one call:

GO or HOLD?

L001

L002

HOLD

L003

L004

HOLD

Deep learning notation starts here.

Not with symbols.

From One Neuron to the Whole Network

In the last article, we used this launch log to learn how one neuron thinks:

inputs come in
weights decide how much to listen
bias nudges the score
activation turns the score into GO or HOLD

Start with Part 1: How one rocket neuron turns Wind and Fuel into GO or HOLD.

In this article, we keep the same rocket model and zoom out.

Instead of reading one neuron, we will learn how to read the whole network.

At first, the arrows look like spaghetti.

But they are not random.

Every arrow has an address.

The Three Parts of the Network

Now let’s zoom out from one neuron to a small network.

For our rocket model, the network has one hidden layer.

The input layer (in) receives the launch clues:

x = \begin{bmatrix} \text{Wind} \\ \text{Fuel} \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}

The hidden layer (h) sits between the input clues and the final answer.

It is called hidden because we do not directly program what each middle neuron should notice.

Mission Control provides the inputs:

\text{Wind and Fuel}

And the training data provides the target:

\text{GO or HOLD}

But no one tells the hidden layer:

watch Wind this way, combine Fuel that way, create this exact signal.

Those middle signals are learned.

That is why the layer is hidden: it is the model’s internal logic, not a column we manually design.

The output layer (out) turns the final signal into the prediction:

GO or HOLD?

So the network flow is:

\begin{array}{c} \text{input layer }(\text{in}) \\ \downarrow \\ \text{hidden layer }(\text{h}) \\ \downarrow \\ \text{output layer }(\text{out}) \end{array}

Our example has one hidden layer.

A network with zero or one hidden layer is often called a shallow neural network.

A network with more than one hidden layer is often called a deep neural network.

A deeper network simply repeats the hidden-layer idea:

\begin{array}{c} \text{input layer} \\ \downarrow \\ \text{hidden layer} \\ \downarrow \\ \cdots \\ \downarrow \\ \text{hidden layer} \\ \downarrow \\ \text{output layer} \end{array}

With one hidden layer, the network gets one middle round to combine Wind and Fuel.

With more hidden layers, it can build patterns on top of patterns.

In practice, networks can go very deep — dozens, hundreds, or even more layers.

But depth is not magic.

More layers can give the network more room to learn, but they can also make it harder to train and harder to understand.

The Notation Map

Layer Addresses

First we need to agree on how we count the layers.

Textbooks often number layers starting at 0:

layer 0 $\rightarrow$ input layer
layer 1 $\rightarrow$ 1st hidden layer
layer 2 $\rightarrow$ 2nd hidden layer
layer $L$ $\rightarrow$ output layer

We will use in and out for the input and output layer — the bookends of the network.

The middle hidden layers will use numbers: 1, 2, 3, and so on.

\begin{array}{c} \text{input layer }(0 \,/\, \text{in}) \\ \downarrow \\ \text{hidden layer }(1) \\ \downarrow \\ \text{output layer }(L \,/\, \text{out}) \end{array}

General Convention

Most neural network notation looks like:

{\text{symbol}\kern0.08em\vphantom{\Large|}}_{\text{subscript}}^{(\text{superscript})} \quad \begin{array}{l} \leftarrow \small \text{Layer address} \\ \leftarrow \small \text{Unit address} \end{array}

So the superscript tells us where in the network we are.

The subscript tells us which circle inside that layer we mean.

$W$ needs one extra rule because it connects two layers.

We will handle that when we get to the arrows.

1. Inputs: $x$

Inputs are the raw launch clues entering the network.

\Large x_i^{(in)} \quad \begin{array}{l} \leftarrow \small \text{input layer} \\ \leftarrow \small i\text{-th feature in the input} \end{array}

So $\large x_2^{(in)}$ means the second input feature.

This is the Fuel circle in our input layer.

2. Activations: $a$

Activations are the signals produced by neurons.

\Large a_i^{(l)} \quad \begin{array}{l} \leftarrow \small \text{layer } l \text{ (e.g., 1, 2, or } h\text{)} \\ \leftarrow \small i\text{-th unit inside that layer} \end{array}

So $\large a_2^{(1)}$ means the second activation in the first hidden layer.

This is the second circle in our layer 1.

3. Biases: $b$

Bias is the starting nudge for a layer.

\Large b^{(l)} \quad \begin{array}{l} \leftarrow \small \text{layer } l \text{ (e.g., } h \text{ or } out\text{)} \\ \phantom{\leftarrow} \end{array}

So $\large b^{(1)}$ means the bias vector for layer 1.

The number of elements in bias matches the number of neurons in that layer.

Our layer 1 has three neurons, and hence there are 3 entries:

\large b^{(1)} = \begin{bmatrix} b_1^{(1)} \\ b_2^{(1)} \\ b_3^{(1)} \end{bmatrix}
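As a rough sketch of that rule in code, assuming NumPy and made-up numbers (the values and the names `b1` and `scores` are illustrative, not from the launch log), a layer with three neurons carries a three-entry bias vector, one nudge per score:

```python
import numpy as np

# Hypothetical bias values for layer 1 (three neurons, so three entries).
b1 = np.array([0.5, -1.0, 2.0])

# Hypothetical raw scores for the same three neurons.
scores = np.array([1.0, 1.0, 1.0])

# One bias entry nudges one neuron's score.
nudged = scores + b1

# Math index b_2^{(1)} is Python index 1, because Python counts from 0.
b_2 = b1[1]

print(b1.shape)         # (3,)
print(nudged.tolist())  # [1.5, 0.0, 3.0]
print(b_2)              # -1.0
```

The one thing to watch is the off-by-one: the math counts units from 1, while array indices start at 0.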

4. Weights: $W$

Weights represent the connections between neurons.

They live on the arrows between layers.

Because weights connect two different layers, their notation is slightly different.

A weight connects a unit in layer $l$ to a unit in layer $l+1$:

\text{layer } l \rightarrow \text{layer } l+1

For the superscript, we use the destination layer:

\Large W^{(l+1)} \quad \begin{array}{l} \leftarrow \small \text{destination layer} \\ \phantom{\leftarrow \small \text{destination unit}} \end{array}

Since the superscript already points to the destination layer, the subscript follows the same idea:

\Large W_{\text{destination},\text{ source}}

So:

\large W_{j,k}^{(l+1)} \quad \begin{array}{l} \leftarrow \small \text{destination layer} \\ \leftarrow \small \text{destination unit } j,\text{ source unit } k \end{array}

So $\large W_{j,k}^{(l+1)}$ means the weight going to unit $j$ in layer $l+1$, from unit $k$ in layer $l$.

The matrix shape follows the same rule:

W = \text{output units} \times \text{input units}

In plain English:

how many outputs do we want from how many inputs?

Memory hook:

Where are you going? Destination first.
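The shape rule can be sketched in a few lines of plain Python. This is only an illustration: the helper name `weight_shapes` is made up, and the single output unit is an assumption, since the article only states 2 inputs and 3 hidden units.

```python
# Layer sizes for the rocket network: 2 inputs, 3 hidden units,
# and (as an assumption) 1 output unit for the GO or HOLD call.
layer_sizes = [2, 3, 1]

def weight_shapes(sizes):
    """Return {destination layer: (output units, input units)}."""
    shapes = {}
    for dest in range(1, len(sizes)):
        # Destination first: rows = units in the layer we are going to,
        # columns = units in the layer we are coming from.
        shapes[dest] = (sizes[dest], sizes[dest - 1])
    return shapes

shapes = weight_shapes(layer_sizes)
print(shapes)  # {1: (3, 2), 2: (1, 3)}
```

Each key is the superscript of a weight matrix, and each value reads as "outputs by inputs".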

In pictures

Think of notation as an address system.

For values like $x$, $a$, and $b$, we are describing where a value lives:

\large {\text{value}\kern0.1em\vphantom{\Big|}}_{\text{unit}}^{(\text{layer})} \quad \begin{array}{l} \leftarrow \small \text{layer where the value lives} \\ \leftarrow \small \text{unit in that layer} \end{array}

For weights $W$, we are describing an arrow between two places:

\large W_{j,k}^{(l+1)} \quad \begin{array}{l} \leftarrow \small \text{destination layer} \\ \leftarrow \small \text{destination unit } j,\text{ source unit } k \end{array}
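One way to see the arrow address in code is to store the weights so that each value spells out its own address. This is a sketch assuming NumPy; the numbers and the helper `weight` are hypothetical and exist only to make the lookup visible.

```python
import numpy as np

# Hypothetical layer-1 weights, shape (3, 2): 3 destination units, 2 sources.
# Each value encodes its own address as 10 * destination + source,
# so the weight going to unit 3 from input 2 holds the value 32.
W1 = np.array([
    [11, 12],
    [21, 22],
    [31, 32],
])

def weight(W, j, k):
    """Look up w_{j,k}: destination unit j, source unit k (1-based, as in the math)."""
    return W[j - 1, k - 1]

print(weight(W1, 3, 2))  # 32
print(weight(W1, 1, 2))  # 12
```

The row index is the destination and the column index is the source, which is the same order as the subscript.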

Building the Addresses on Every Arrow

In Matrix Form: Step by Step

1. The Shapes

At this point, we only know the shapes we are working with:

Our input $x$ is a single launch attempt.
- As a row, it carries 2 launch clues: Wind and Fuel.
- So its shape is $1 \times 2$.

We want our hidden layer to have 3 units.
- Based on our outputs $\times$ inputs rule, our weight matrix $W$ must be $3 \times 2$.

If we try to multiply a $1 \times 2$ row by a $3 \times 2$ matrix, the inner dimensions do not match:

(1 \times 2)(3 \times 2)

So we use the transpose, $W^T$.

Now the multiplication works:

x \cdot W^T = (1 \times 2)(2 \times 3) = 1 \times 3
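The same shape check can be run in NumPy. The clue values and the all-ones weights below are placeholders chosen only so the shapes are visible.

```python
import numpy as np

x = np.array([[4.0, 7.0]])   # one launch attempt as a row: shape (1, 2), made-up values
W = np.ones((3, 2))          # placeholder weights: 3 hidden units, 2 inputs

try:
    x @ W                    # (1, 2)(3, 2): inner dimensions 2 and 3 do not match
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

z = x @ W.T                  # (1, 2)(2, 3) -> (1, 3)

print(mismatch_raised)       # True
print(z.shape)               # (1, 3)
```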

The Elephant in the Room: Why use $W^T$?

Why did Part 1 use Xw? One neuron: one weight column. Many neurons: one weight matrix.

You might remember that in our single-neuron story, we multiplied the launch logbook by the weights like this:

Xw

That worked because $w$ was a simple column vector.

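But now the hidden layer has many neurons, so our weights have grown into a full matrix $W$.

Each row of $W$ holds the weights for one hidden neuron, because the destination comes first.

Transposing turns those rows into columns, so each column of $W^T$ plays the role that the single column $w$ played in Part 1.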
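A small sketch, assuming NumPy and made-up numbers, shows how the single-neuron product and the layer product relate. Here the first row of `W` deliberately reuses the single neuron's weights, so the first column of the layer output should match the Part 1 style result.

```python
import numpy as np

# A tiny made-up logbook: 2 launch attempts, 2 clues each (Wind, Fuel).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Part 1 style: one neuron, one weight column of shape (2, 1).
w = np.array([[0.5],
              [-1.0]])
single = X @ w                      # shape (2, 1)

# Many neurons: each ROW of W holds one neuron's weights (destination first).
# The first row reuses the single neuron's weights; the other rows are made up.
W = np.array([[0.5, -1.0],
              [2.0,  0.0],
              [1.0,  1.0]])         # shape (3, 2)
many = X @ W.T                      # shape (2, 3)

# Column 0 of the layer output is exactly the single-neuron result.
print(np.allclose(many[:, [0]], single))  # True
```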

2. The Linear Combinations

Now let’s open up the multiplication.

We are building the first hidden layer, so every weight gets a $(1)$ superscript.

For now, do not worry about the exact subscripts.

Just notice the shape:

x \cdot W^T = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} \textcolor{#E69F00}{w^{(1)}} & \textcolor{#009E73}{w^{(1)}} & \textcolor{#CC79A7}{w^{(1)}} \\ \textcolor{#E69F00}{w^{(1)}} & \textcolor{#009E73}{w^{(1)}} & \textcolor{#CC79A7}{w^{(1)}} \end{bmatrix}

If we multiply the row by each column, we get three scores, one for each hidden unit:

\begin{aligned} x_1\textcolor{#E69F00}{w^{(1)}} + x_2\textcolor{#E69F00}{w^{(1)}} &= \textcolor{#E69F00}{z_1^{(1)}} \\ x_1\textcolor{#009E73}{w^{(1)}} + x_2\textcolor{#009E73}{w^{(1)}} &= \textcolor{#009E73}{z_2^{(1)}} \\ x_1\textcolor{#CC79A7}{w^{(1)}} + x_2\textcolor{#CC79A7}{w^{(1)}} &= \textcolor{#CC79A7}{z_3^{(1)}} \end{aligned}

So those three scores sit side by side:

z^{(1)} = \begin{bmatrix} \textcolor{#E69F00}{z_1^{(1)}} & \textcolor{#009E73}{z_2^{(1)}} & \textcolor{#CC79A7}{z_3^{(1)}} \end{bmatrix}

One launch attempt gives one row of hidden-layer scores.

3. Mapping the Rows and Columns

Now the matrix starts to tell us what it is doing.

To make the map clear, let’s color-code the weights based on the score they build:

one color for the weights that build $z_1^{(1)}$
one color for the weights that build $z_2^{(1)}$
one color for the weights that build $z_3^{(1)}$

\normalsize \begin{array}{ccc} \begin{bmatrix} x_1 & x_2 \end{bmatrix} & \begin{bmatrix} \textcolor{#E69F00}{w^{(1)}} & \textcolor{#009E73}{w^{(1)}} & \textcolor{#CC79A7}{w^{(1)}} \\[0.65em] \textcolor{#E69F00}{w^{(1)}} & \textcolor{#009E73}{w^{(1)}} & \textcolor{#CC79A7}{w^{(1)}} \end{bmatrix} & \begin{array}{l} \scriptstyle \leftarrow \text{multiplies } x_1 \\[0.5em] \scriptstyle \leftarrow \text{multiplies } x_2 \end{array} \\[0.9em] {} & \begin{array}{ccc} \downarrow & \downarrow & \downarrow \\[0.35em] \textcolor{#E69F00}{z_1^{(1)}} & \textcolor{#009E73}{z_2^{(1)}} & \textcolor{#CC79A7}{z_3^{(1)}} \end{array} & {} \end{array}

Read the arrows:

The first row of weights meets $x_1$.
The second row of weights meets $x_2$.
Each column builds one hidden-layer score.

x_1 \textcolor{#E69F00}{w^{(1)}} + x_2 \textcolor{#E69F00}{w^{(1)}} = \textcolor{#E69F00}{z_1^{(1)}}

x_1 \textcolor{#009E73}{w^{(1)}} + x_2 \textcolor{#009E73}{w^{(1)}} = \textcolor{#009E73}{z_2^{(1)}}

x_1 \textcolor{#CC79A7}{w^{(1)}} + x_2 \textcolor{#CC79A7}{w^{(1)}} = \textcolor{#CC79A7}{z_3^{(1)}}

That is the map:

Rows line up with the input clues.
Columns build the hidden-layer scores.

4. Filling the Destination Slot: $w_{\color{#E69F00}{?},\_}^{(1)}$

We are already building the first hidden layer, so the superscript stays $(1)$.

Now add the first subscript: the destination unit.

The first column builds $\large z_\mathbf{1}^{(1)}$, so its weights get destination index 1.

The second column builds $\large z_\mathbf{2}^{(1)}$, so its weights get destination index 2.

The third column builds $\large z_\mathbf{3}^{(1)}$, so its weights get destination index 3.

\begin{array}{c} \begin{bmatrix} \textcolor{#E69F00}{w_{1,\_}^{(1)}} & \textcolor{#009E73}{w_{2,\_}^{(1)}} & \textcolor{#CC79A7}{w_{3,\_}^{(1)}} \\[1.0em] \textcolor{#E69F00}{w_{1,\_}^{(1)}} & \textcolor{#009E73}{w_{2,\_}^{(1)}} & \textcolor{#CC79A7}{w_{3,\_}^{(1)}} \end{bmatrix} \\[1.0em] \begin{array}{ccc} \downarrow & \downarrow & \downarrow \\[0.5em] \textcolor{#E69F00}{z_1^{(1)}} & \textcolor{#009E73}{z_2^{(1)}} & \textcolor{#CC79A7}{z_3^{(1)}} \end{array} \end{array}

Now the equations carry that same destination index.

x_1 \textcolor{#E69F00}{w_{1,\_}^{(1)}} + x_2 \textcolor{#E69F00}{w_{1,\_}^{(1)}} = \textcolor{#E69F00}{z_1^{(1)}}

x_1 \textcolor{#009E73}{w_{2,\_}^{(1)}} + x_2 \textcolor{#009E73}{w_{2,\_}^{(1)}} = \textcolor{#009E73}{z_2^{(1)}}

x_1 \textcolor{#CC79A7}{w_{3,\_}^{(1)}} + x_2 \textcolor{#CC79A7}{w_{3,\_}^{(1)}} = \textcolor{#CC79A7}{z_3^{(1)}}

That is the first subscript:

destination unit first

5. Filling the Source Slot: $w_{j,\color{#E69F00}{?}}^{(1)}$

The second subscript tells us where the weight is coming from.

weights in the first row come from source 1
weights in the second row come from source 2

\begin{array}{cc} \begin{bmatrix} \textcolor{#E69F00}{w_{1,1}^{(1)}} & \textcolor{#009E73}{w_{2,1}^{(1)}} & \textcolor{#CC79A7}{w_{3,1}^{(1)}} \\[1.1em] \textcolor{#E69F00}{w_{1,2}^{(1)}} & \textcolor{#009E73}{w_{2,2}^{(1)}} & \textcolor{#CC79A7}{w_{3,2}^{(1)}} \end{bmatrix} & \begin{array}{l} \leftarrow \small \text{source 1} \\[1.1em] \leftarrow \small \text{source 2} \end{array} \\[1.1em] \begin{array}{ccc} \downarrow & \downarrow & \downarrow \\[0.5em] \textcolor{#E69F00}{z_1^{(1)}} & \textcolor{#009E73}{z_2^{(1)}} & \textcolor{#CC79A7}{z_3^{(1)}} \end{array} & {} \end{array}

Now every weight has a full address:

x_1 \textcolor{#E69F00}{w_{1,1}^{(1)}} + x_2 \textcolor{#E69F00}{w_{1,2}^{(1)}} = \textcolor{#E69F00}{z_1^{(1)}}

x_1 \textcolor{#009E73}{w_{2,1}^{(1)}} + x_2 \textcolor{#009E73}{w_{2,2}^{(1)}} = \textcolor{#009E73}{z_2^{(1)}}

\underbrace{x_1\textcolor{#CC79A7}{w_{3,1}^{(1)}}}_{\text{source 1}} + \underbrace{x_2\textcolor{#CC79A7}{w_{3,2}^{(1)}}}_{\text{source 2}} = \textcolor{#CC79A7}{z_3^{(1)}}

That is the full weight address:

destination unit first, source input second
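The three equations can be written as two nested loops, one over destinations and one over sources. This is a sketch with made-up numbers, assuming NumPy; the weights again encode their own address (10 times destination plus source) so each term is easy to trace.

```python
import numpy as np

x = np.array([2.0, 3.0])            # made-up launch clues: x_1, x_2
W1 = np.array([[11.0, 12.0],        # hypothetical weights; each value spells
               [21.0, 22.0],        # its own address as 10*destination + source
               [31.0, 32.0]])

n_dest, n_src = W1.shape
z = np.zeros(n_dest)
for j in range(n_dest):             # destination unit (math index j + 1)
    for k in range(n_src):          # source input (math index k + 1)
        z[j] += x[k] * W1[j, k]     # x_k * w_{j,k}

print(z.tolist())                   # [58.0, 108.0, 158.0]
print(np.allclose(z, x @ W1.T))     # True
```

The loop version and the transposed matrix product give the same three scores; the loops just make the address of every weight explicit.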

Scoring the Entire Logbook at Once

So far, we scored one launch attempt:

x \cdot W^T = 1 \times 3

That gave us one row of hidden-layer scores:

z^{(1)} = \begin{bmatrix} z_1^{(1)} & z_2^{(1)} & z_3^{(1)} \end{bmatrix}

But the launch logbook has more than one row.

If we score the whole logbook, lowercase $x$ becomes capital $X$:

X = n \times 2

$n$ launch attempts.
2 input clues: Wind and Fuel.

Because we are still building 3 hidden units, the weight matrix stays the same.

W^T = 2 \times 3

So the multiplication becomes:

X \cdot W^T = (n \times 2)(2 \times 3) = n \times 3

The inner 2s still match.

The output is now:

one score row per launch attempt

For two launch attempts, it looks like this:

\underbrace{ \begin{bmatrix} x_1 & x_2 \\ x_1 & x_2 \end{bmatrix} }_{X} \cdot \underbrace{ \begin{bmatrix} w_{1,1}^{(1)} & w_{2,1}^{(1)} & w_{3,1}^{(1)} \\[0.8em] w_{1,2}^{(1)} & w_{2,2}^{(1)} & w_{3,2}^{(1)} \end{bmatrix} }_{W^T}

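= \begin{array}{cc} \begin{bmatrix} z_1^{(1)} & z_2^{(1)} & z_3^{(1)} \\ z_1^{(1)} & z_2^{(1)} & z_3^{(1)} \end{bmatrix} & \begin{array}{l} \leftarrow \small \text{launch 1} \\ \leftarrow \small \text{launch 2} \end{array} \end{array}

Both rows keep the superscript $(1)$ because both are layer-1 scores. The row position, not the superscript, tells us which launch a score belongs to.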

That is the trick.

We do not score Launch 1, then come back and score Launch 2.

One matrix multiplication gives hidden-layer scores for the whole logbook at once.
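As a small worked example with made-up numbers (neither the clue values nor the weights come from the launch log), take two launch attempts and a weight matrix chosen so the arithmetic is easy to check by hand:

\underbrace{ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} }_{X} \cdot \underbrace{ \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} }_{W^T} = \begin{array}{cc} \begin{bmatrix} 1 & 2 & 3 \\ 3 & 4 & 7 \end{bmatrix} & \begin{array}{l} \leftarrow \small \text{launch 1} \\ \leftarrow \small \text{launch 2} \end{array} \end{array}

With these weights, the first hidden score copies $x_1$, the second copies $x_2$, and the third adds them, so launch 2 gets $3$, $4$, and $3 + 4 = 7$.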

Vectorization: Scoring the Whole Page at Once

In the last section, we stopped scoring one launch attempt at a time and scored the whole logbook instead.

That move has a name: vectorization.

Vectorization is where linear algebra starts doing the heavy lifting.

Instead of treating the launch logbook as separate rows, we treat it as one matrix.

That lets one matrix multiplication score the whole logbook in one pass:

X \cdot W^T = Z^{(1)}

In our rocket story:

lowercase $x$ is one launch attempt
capital $X$ is the whole launch logbook

So vectorization is the jump from one row to many rows:

x \rightarrow X

One launch attempt becomes the whole logbook.

And the formula barely changes:

x \cdot W^T \quad \longrightarrow \quad X \cdot W^T

That is why linear algebra is so powerful in neural networks.

It lets the model stop doing this:

score launch 1
score launch 2
score launch 3

and start doing this:

score the whole logbook at once

The shapes show the speed trick:

(n \times 2)(2 \times 3) = n \times 3

That output means:

one score row per launch attempt

Vectorization is not just cleaner notation.

It is how the same calculation scales from one launch attempt to the whole logbook.
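To make the contrast concrete, here is a minimal sketch that scores a logbook both ways and checks that the results agree. It assumes NumPy; the random logbook, the random weights, and the function names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))   # made-up logbook: 1000 launches, 2 clues each
W = rng.normal(size=(3, 2))      # made-up weights: 3 hidden units, 2 inputs

def score_one_by_one(X, W):
    """Score launch 1, then launch 2, then launch 3, and so on."""
    rows = [x @ W.T for x in X]  # each x has shape (2,), each result (3,)
    return np.stack(rows)

def score_whole_logbook(X, W):
    """Score every launch with a single matrix multiplication."""
    return X @ W.T

Z_loop = score_one_by_one(X, W)
Z_vec = score_whole_logbook(X, W)

print(Z_vec.shape)                 # (1000, 3)
print(np.allclose(Z_loop, Z_vec))  # True
```

Both functions produce the same scores. The vectorized version hands the whole job to one optimized matrix routine instead of running a Python loop over rows, which is typically where the speedup comes from.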

Mission Control Briefing

Input Layer (in)

The nervous intern with the clipboard: Wind and Fuel. It does not decide anything; it just reports the facts.

