
June 4, 2026 12 min read

Reading Neural Networks - Every Arrow Has an Address

Learn how to read a fully connected neural network diagram — from layers and activations to weight matrices, subscripts, and arrow labels.

Stories by Sagar Kharel

The Launch Log Comes First

A rocket team keeps a small launch log.

Before every launch attempt, Mission Control records a few clues and makes one call:

GO or HOLD?

| Launch ID | Wind (mph) | Fuel (%) | Launch |
|-----------|------------|----------|--------|
| L001      | 8          | 92       | GO     |
| L002      | 22         | 35       | HOLD   |
| L003      | 12         | 81       | GO     |
| L004      | 28         | 41       | HOLD   |

Deep learning notation starts here.

Not with symbols.


From One Neuron to the Whole Network

In the last article, we used this launch log to learn how one neuron thinks:

  • inputs come in
  • weights decide how much to listen
  • bias nudges the score
  • activation turns the score into GO or HOLD

Start with Part 1: How one rocket neuron turns Wind and Fuel into GO or HOLD.

In this article, we keep the same rocket model and zoom out.

Instead of reading one neuron, we will learn how to read the whole network.

At first, the arrows look like spaghetti.

But they are not random.

Every arrow has an address.


The Three Parts of the Network

Now let’s zoom out from one neuron to a small network.

For our rocket model, the network has one hidden layer.

Figure: A 1-Hidden-Layer Neural Network. Layer 0 is the input layer (in), holding x₁ (Wind) and x₂ (Fuel). Layer 1 is the hidden layer (h), which combines the clues. Layer L is the output layer (out), which predicts GO / HOLD as ŷ.

The input layer (in) receives the launch clues:

$$x = \begin{bmatrix} \text{Wind} \\ \text{Fuel} \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

The hidden layer (h) sits between the input clues and the final answer.

It is called hidden because we do not directly program what each middle neuron should notice.

Mission Control provides the inputs:

Wind and Fuel

And the training data provides the target:

GO or HOLD

But no one tells the hidden layer:

watch Wind this way, combine Fuel that way, create this exact signal.

Those middle signals are learned.

That is why the layer is hidden: it is the model’s internal logic, not a column we manually design.

The output layer (out) turns the final signal into the prediction:

GO or HOLD?

So the network flow is:

$$\begin{array}{c} \text{input layer }(\text{in}) \\ \downarrow \\ \text{hidden layer }(\text{h}) \\ \downarrow \\ \text{output layer }(\text{out}) \end{array}$$

Our example has one hidden layer.

A network with zero or one hidden layer is often called a shallow neural network.

A network with more than one hidden layer is often called a deep neural network.

A deeper network simply repeats the hidden-layer idea:

$$\begin{array}{c} \text{input layer} \\ \downarrow \\ \text{hidden layer} \\ \downarrow \\ \cdots \\ \downarrow \\ \text{hidden layer} \\ \downarrow \\ \text{output layer} \end{array}$$

With one hidden layer, the network gets one middle round to combine Wind and Fuel.

With more hidden layers, it can build patterns on top of patterns.

In practice, networks can go very deep — dozens, hundreds, or even more layers.

But depth is not magic.

More layers can give the network more room to learn, but they can also make it harder to train and harder to understand.


The Notation Map

Layer Addresses

First we need to agree on how we count the layers.

Textbooks often number layers starting at 0:

  • layer 0 → input layer
  • layer 1 → 1st hidden layer
  • layer 2 → 2nd hidden layer
  • layer $L$ → output layer

We will use in and out for the input and output layer — the bookends of the network.

The middle hidden layers will use numbers: 1, 2, 3, and so on.

$$\begin{array}{c} \text{input layer }(0 \,/\, \text{in}) \\ \downarrow \\ \text{hidden layer }(1) \\ \downarrow \\ \text{output layer }(L \,/\, \text{out}) \end{array}$$

General Convention

Most neural network notation looks like:

$$\text{symbol}_{\text{subscript}}^{(\text{superscript})} \quad \begin{array}{l} \leftarrow \text{Layer address} \\ \leftarrow \text{Unit address} \end{array}$$

So the superscript tells us where in the network we are.

The subscript tells us which circle inside that layer we mean.

$W$ needs one extra rule because it connects two layers.

We will handle that when we get to the arrows.


1. Inputs: $x$

Inputs are the raw launch clues entering the network.

$$x_i^{(in)} \quad \begin{array}{l} \leftarrow \text{input layer} \\ \leftarrow i\text{-th feature in the input} \end{array}$$

So $x_2^{(in)}$ means the second input feature.

This is the Fuel circle in our input layer.


2. Activations: $a$

Activations are the signals produced by neurons.

$$a_i^{(l)} \quad \begin{array}{l} \leftarrow \text{layer } l \text{ (e.g., 1, 2, or } h\text{)} \\ \leftarrow i\text{-th unit inside that layer} \end{array}$$

So $a_2^{(1)}$ means the second activation in the first hidden layer.

This is the second circle in our layer 1.


3. Biases: $b$

Bias is the starting nudge for a layer.

$$b^{(l)} \quad \leftarrow \text{layer } l \text{ (e.g., } h \text{ or } out\text{)}$$

So $b^{(1)}$ means the bias vector for layer 1.

The number of entries in the bias vector matches the number of neurons in that layer.

Our layer 1 has three neurons, so its bias vector has 3 entries:

$$b^{(1)} = \begin{bmatrix} b_1^{(1)} \\ b_2^{(1)} \\ b_3^{(1)} \end{bmatrix}$$
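
As a minimal sketch, the bias vector can be held as a NumPy array with one entry per neuron. The values below are made up for illustration; the article does not give trained numbers.

```python
import numpy as np

# Hypothetical bias values for the three hidden neurons (not from the article).
b1 = np.array([0.5, -1.0, 0.25])   # b^(1): one entry per neuron in layer 1

n_hidden = 3
assert b1.shape == (n_hidden,)     # bias length matches the neuron count

# Math subscripts start at 1, NumPy indices start at 0,
# so b_2^(1) is stored at position 1.
b_2 = b1[1]
print(b_2)  # -1.0
```

The off-by-one between math subscripts and array indices is worth keeping in mind whenever the notation is translated into code.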

4. Weights: $W$

Weights represent the connections between neurons.

They live on the arrows between layers.

Because weights connect two different layers, their notation is slightly different.

A weight connects a unit in layer $l$ to a unit in layer $l+1$:

$$\text{layer } l \rightarrow \text{layer } l+1$$

For the superscript, we use the destination layer:

$$W^{(l+1)} \quad \leftarrow \text{destination layer}$$

Since the superscript already points to the destination layer, the subscript follows the same idea:

$$W_{\text{destination},\,\text{source}}$$

So:

$$W_{j,k}^{(l+1)} \quad \begin{array}{l} \leftarrow \text{destination layer} \\ \leftarrow \text{destination unit } j,\text{ source unit } k \end{array}$$

So $W_{j,k}^{(l+1)}$ means the weight going to unit $j$ in layer $l+1$, from unit $k$ in layer $l$.

The matrix shape follows the same rule:

$$W = \text{output units} \times \text{input units}$$

In plain English:

how many outputs do we want from how many inputs?

Memory hook:

Where are you going? Destination first.
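
The destination-first rule can be sketched in NumPy, assuming the weight matrix is stored as outputs × inputs as described above. The numbers and the `weight` helper are hypothetical, chosen only to make the lookup visible.

```python
import numpy as np

# Hypothetical weights for layer 1: 3 hidden units (rows) x 2 inputs (columns).
W1 = np.array([
    [0.1, 0.2],   # weights going to hidden unit 1
    [0.3, 0.4],   # weights going to hidden unit 2
    [0.5, 0.6],   # weights going to hidden unit 3
])

def weight(W, j, k):
    """Return W_{j,k}: the weight to destination unit j from source unit k (1-based)."""
    return W[j - 1, k - 1]

print(W1.shape)          # (3, 2) -> output units x input units
print(weight(W1, 3, 2))  # 0.6, the arrow from x_2 (Fuel) to hidden unit 3
```

With this layout, the row index is the destination and the column index is the source, which is the same order as the subscripts.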


In pictures

Think of notation as an address system.

For values like $x$, $a$, and $b$, we are describing where a value lives:

$$\text{value}_{\text{unit}}^{(\text{layer})} \quad \begin{array}{l} \leftarrow \text{layer where the value lives} \\ \leftarrow \text{unit in that layer} \end{array}$$

For weights $W$, we are describing an arrow between two places:

$$W_{j,k}^{(l+1)} \quad \begin{array}{l} \leftarrow \text{destination layer} \\ \leftarrow \text{destination unit } j,\text{ source unit } k \end{array}$$
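
Applied to this article's network (2 inputs, 3 hidden units, 1 output), the address rule would fill the two weight matrices as follows. This is a worked instance of the convention above, with row = destination unit and column = source unit:

$$W^{(1)} = \begin{bmatrix} W_{1,1}^{(1)} & W_{1,2}^{(1)} \\ W_{2,1}^{(1)} & W_{2,2}^{(1)} \\ W_{3,1}^{(1)} & W_{3,2}^{(1)} \end{bmatrix} \quad (3 \times 2), \qquad W^{(out)} = \begin{bmatrix} W_{1,1}^{(out)} & W_{1,2}^{(out)} & W_{1,3}^{(out)} \end{bmatrix} \quad (1 \times 3)$$

Six arrows enter the hidden layer and three enter the output unit, so the matrices hold six and three weights.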

Building the Addresses on Every Arrow

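Figure: the same network with an address on every arrow. The six arrows from the input layer (in) to hidden layer (1) carry $W_{1,1}^{(1)}$, $W_{2,1}^{(1)}$, $W_{3,1}^{(1)}$ from $x_1$ and $W_{1,2}^{(1)}$, $W_{2,2}^{(1)}$, $W_{3,2}^{(1)}$ from $x_2$. The hidden units $a_1^{(1)}$, $a_2^{(1)}$, $a_3^{(1)}$ add biases $b_1^{(1)}$, $b_2^{(1)}$, $b_3^{(1)}$. The three arrows into the output layer (out) carry $W_{1,1}^{(out)}$, $W_{1,2}^{(out)}$, $W_{1,3}^{(out)}$; the output unit $a_1^{(out)}$ adds $b_1^{(out)}$ and produces ŷ.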
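
To see how the labeled pieces fit together, here is a rough forward pass through the 2-3-1 network in the diagram. Several things are assumptions: the weights and biases are invented, the sigmoid activation and the 0.5 threshold are stand-ins (this article does not name an activation), and the transposes are explained in the matrix section that follows.

```python
import numpy as np

def sigmoid(z):
    """Squash scores into (0, 1). Assumed activation; the article does not name one."""
    return 1.0 / (1.0 + np.exp(-z))

# One launch attempt as a row: [Wind, Fuel] from launch L001.
x = np.array([[8.0, 92.0]])            # shape (1, 2)

# Hypothetical parameters (illustrative only, not trained).
W1 = np.array([[-0.10, 0.02],
               [-0.05, 0.01],
               [ 0.03, 0.04]])         # W^(1): 3 hidden units x 2 inputs
b1 = np.array([0.1, 0.0, -0.2])        # b^(1): one bias per hidden unit
W_out = np.array([[1.5, -0.5, 2.0]])   # W^(out): 1 output unit x 3 hidden units
b_out = np.array([-1.0])               # b^(out): one bias for the output unit

a1 = sigmoid(x @ W1.T + b1)            # hidden activations a^(1), shape (1, 3)
y_hat = sigmoid(a1 @ W_out.T + b_out)  # output activation a^(out), shape (1, 1)

# Assumed decision rule: 0.5 or higher means GO.
label = "GO" if y_hat[0, 0] >= 0.5 else "HOLD"
print(a1.shape, y_hat.shape, label)
```

Each array in the sketch corresponds to one group of labels in the figure, so the shapes can be checked against the arrow counts.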

In Matrix Form: Step by Step

1. The Shapes

At this point, we only know the shapes we are working with:

  • Our input $x$ is a single launch attempt.

    • As a row, it carries 2 launch clues: Wind and Fuel.
    • So its shape is $1 \times 2$.
  • We want our hidden layer to have 3 units.

  • Based on our outputs $\times$ inputs rule, our weight matrix $W$ must be $3 \times 2$.

If we try to multiply a $1 \times 2$ row by a $3 \times 2$ matrix, the inner dimensions do not match:

$$(1 \times 2)(3 \times 2)$$

So we use the transpose, $W^T$, which has shape $2 \times 3$.

Now the multiplication works:

$$x \cdot W^T = (1 \times 2)(2 \times 3) = 1 \times 3$$
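
The shape check can be reproduced in NumPy. This is only a sketch: the weights are zero placeholders, since only the shapes matter here.

```python
import numpy as np

x = np.array([[8.0, 92.0]])   # one launch attempt as a row, shape (1, 2)
W = np.zeros((3, 2))          # placeholder weights: 3 hidden units x 2 inputs

try:
    x @ W                     # (1 x 2)(3 x 2): inner dimensions 2 and 3 differ
except ValueError as err:
    print("shape mismatch:", err)

z = x @ W.T                   # (1 x 2)(2 x 3) -> (1 x 3)
print(z.shape)                # (1, 3)
```

NumPy refuses the first product for the same reason the hand calculation does: the inner dimensions have to agree.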

The Elephant in the Room: Why use $W^T$?

Why did Part 1 use Xw? One neuron: one weight column. Many neurons: one weight matrix.

You might remember that in our single-neuron story, we multiplied the launch logbook by the weights like this:

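$$Xw$$

That worked because $w$ was a simple column vector: one neuron, one column of weights.

But now the hidden layer has many neurons, so our weights have grown into a full matrix $W$.

Because $W$ is stored as outputs $\times$ inputs, each neuron's weights sit in a row. Transposing turns those rows into columns, so each column of $W^T$ plays the role that $w$ played in Part 1.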
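
One way to check the link between Part 1's $Xw$ and this article's $XW^T$ is to score the launch log one neuron at a time and compare. The weights below are hypothetical; the inputs are the Wind and Fuel columns from the launch log table.

```python
import numpy as np

# The launch log from the table: columns are Wind (mph) and Fuel (%).
X = np.array([[8.0, 92.0], [22.0, 35.0], [12.0, 81.0], [28.0, 41.0]])

# Hypothetical layer-1 weights, stored as outputs x inputs (3 x 2).
W = np.array([[-0.10, 0.02],
              [-0.05, 0.01],
              [ 0.03, 0.04]])

Z = X @ W.T                       # all three neurons at once, shape (4, 3)

for j in range(W.shape[0]):
    w_j = W[j]                    # row j of W = column j of W.T = one neuron's weights
    # Part 1 style: score the log with a single neuron's weight vector.
    assert np.allclose(X @ w_j, Z[:, j])

print(Z.shape)  # (4, 3)
```

So the matrix version is three single-neuron calculations placed side by side, one per column of the result.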


2. The Linear Combinations

Now let’s open up the multiplication.

We are building the first hidden layer, so every weight gets a $(1)$ superscript.

For now, do not worry about the exact subscripts.

Just notice the shape:

$$x \cdot W^T = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} \textcolor{#E69F00}{w^{(1)}} & \textcolor{#009E73}{w^{(1)}} & \textcolor{#CC79A7}{w^{(1)}} \\ \textcolor{#E69F00}{w^{(1)}} & \textcolor{#009E73}{w^{(1)}} & \textcolor{#CC79A7}{w^{(1)}} \end{bmatrix}$$

If we multiply the input row by each column, we get three separate equations, one score for each hidden unit:

$$\begin{aligned} x_1\textcolor{#E69F00}{w^{(1)}} + x_2\textcolor{#E69F00}{w^{(1)}} &= \textcolor{#E69F00}{z_1^{(1)}} \\ x_1\textcolor{#009E73}{w^{(1)}} + x_2\textcolor{#009E73}{w^{(1)}} &= \textcolor{#009E73}{z_2^{(1)}} \\ x_1\textcolor{#CC79A7}{w^{(1)}} + x_2\textcolor{#CC79A7}{w^{(1)}} &= \textcolor{#CC79A7}{z_3^{(1)}} \end{aligned}$$

So those three scores sit side by side:

$$z^{(1)} = \begin{bmatrix} \textcolor{#E69F00}{z_1^{(1)}} & \textcolor{#009E73}{z_2^{(1)}} & \textcolor{#CC79A7}{z_3^{(1)}} \end{bmatrix}$$

One launch attempt gives one row of hidden-layer scores.


3. Mapping the Rows and Columns

Now the matrix starts to tell us what it is doing.

To make the map clear, let’s color-code the weights based on the score they build:

  • one color for the weights that build $z_1^{(1)}$
  • one color for the weights that build $z_2^{(1)}$
  • one color for the weights that build $z_3^{(1)}$

$$\begin{array}{ccc} \begin{bmatrix} x_1 & x_2 \end{bmatrix} & \begin{bmatrix} \textcolor{#E69F00}{w^{(1)}} & \textcolor{#009E73}{w^{(1)}} & \textcolor{#CC79A7}{w^{(1)}} \\[0.65em] \textcolor{#E69F00}{w^{(1)}} & \textcolor{#009E73}{w^{(1)}} & \textcolor{#CC79A7}{w^{(1)}} \end{bmatrix} & \begin{array}{l} \leftarrow \text{multiplies } x_1 \\[0.5em] \leftarrow \text{multiplies } x_2 \end{array} \\[0.9em] {} & \begin{array}{ccc} \downarrow & \downarrow & \downarrow \\[0.35em] \textcolor{#E69F00}{z_1^{(1)}} & \textcolor{#009E73}{z_2^{(1)}} & \textcolor{#CC79A7}{z_3^{(1)}} \end{array} & {} \end{array}$$

Read the arrows:

  • The first row of weights meets $x_1$.
  • The second row of weights meets $x_2$.
  • Each column builds one hidden-layer score.

$$\begin{aligned} x_1 \textcolor{#E69F00}{w^{(1)}} + x_2 \textcolor{#E69F00}{w^{(1)}} &= \textcolor{#E69F00}{z_1^{(1)}} \\ x_1 \textcolor{#009E73}{w^{(1)}} + x_2 \textcolor{#009E73}{w^{(1)}} &= \textcolor{#009E73}{z_2^{(1)}} \\ x_1 \textcolor{#CC79A7}{w^{(1)}} + x_2 \textcolor{#CC79A7}{w^{(1)}} &= \textcolor{#CC79A7}{z_3^{(1)}} \end{aligned}$$

That is the map:

Rows line up with the input clues.
Columns build the hidden-layer scores.


4. Filling the Destination Slot: $w_{\textcolor{#E69F00}{?},\_}^{(1)}$

We are already building the first hidden layer, so the superscript stays $(1)$.

Now add the first subscript: the destination unit.

The first column builds $z_\mathbf{1}^{(1)}$, so its weights get destination index 1.

The second column builds $z_\mathbf{2}^{(1)}$, so its weights get destination index 2.

The third column builds $z_\mathbf{3}^{(1)}$, so its weights get destination index 3.

$$\begin{array}{c} \begin{bmatrix} \textcolor{#E69F00}{w_{1,\_}^{(1)}} & \textcolor{#009E73}{w_{2,\_}^{(1)}} & \textcolor{#CC79A7}{w_{3,\_}^{(1)}} \\[1.0em] \textcolor{#E69F00}{w_{1,\_}^{(1)}} & \textcolor{#009E73}{w_{2,\_}^{(1)}} & \textcolor{#CC79A7}{w_{3,\_}^{(1)}} \end{bmatrix} \\[1.0em] \begin{array}{ccc} \downarrow & \downarrow & \downarrow \\[0.5em] \textcolor{#E69F00}{z_1^{(1)}} & \textcolor{#009E73}{z_2^{(1)}} & \textcolor{#CC79A7}{z_3^{(1)}} \end{array} \end{array}$$

Now the equations carry that same destination index.

$$\begin{aligned} x_1 \textcolor{#E69F00}{w_{1,\_}^{(1)}} + x_2 \textcolor{#E69F00}{w_{1,\_}^{(1)}} &= \textcolor{#E69F00}{z_1^{(1)}} \\ x_1 \textcolor{#009E73}{w_{2,\_}^{(1)}} + x_2 \textcolor{#009E73}{w_{2,\_}^{(1)}} &= \textcolor{#009E73}{z_2^{(1)}} \\ x_1 \textcolor{#CC79A7}{w_{3,\_}^{(1)}} + x_2 \textcolor{#CC79A7}{w_{3,\_}^{(1)}} &= \textcolor{#CC79A7}{z_3^{(1)}} \end{aligned}$$

That is the first subscript:

destination unit first


5. Filling the Source Slot: $w_{j,\_}^{(1)}$

The second subscript tells us where the weight is coming from.

  • weights in the first row come from source 1
  • weights in the second row come from source 2

$$\begin{array}{cc} \begin{bmatrix} \textcolor{#E69F00}{w_{1,1}^{(1)}} & \textcolor{#009E73}{w_{2,1}^{(1)}} & \textcolor{#CC79A7}{w_{3,1}^{(1)}} \\[1.1em] \textcolor{#E69F00}{w_{1,2}^{(1)}} & \textcolor{#009E73}{w_{2,2}^{(1)}} & \textcolor{#CC79A7}{w_{3,2}^{(1)}} \end{bmatrix} & \begin{array}{l} \leftarrow \text{source 1} \\[1.1em] \leftarrow \text{source 2} \end{array} \\[1.1em] \begin{array}{ccc} \downarrow & \downarrow & \downarrow \\[0.5em] \textcolor{#E69F00}{z_1^{(1)}} & \textcolor{#009E73}{z_2^{(1)}} & \textcolor{#CC79A7}{z_3^{(1)}} \end{array} & {} \end{array}$$

Now every weight has a full address:

$$\begin{aligned} x_1 \textcolor{#E69F00}{w_{1,1}^{(1)}} + x_2 \textcolor{#E69F00}{w_{1,2}^{(1)}} &= \textcolor{#E69F00}{z_1^{(1)}} \\ x_1 \textcolor{#009E73}{w_{2,1}^{(1)}} + x_2 \textcolor{#009E73}{w_{2,2}^{(1)}} &= \textcolor{#009E73}{z_2^{(1)}} \\ \underbrace{x_1\textcolor{#CC79A7}{w_{3,1}^{(1)}}}_{\text{source 1}} + \underbrace{x_2\textcolor{#CC79A7}{w_{3,2}^{(1)}}}_{\text{source 2}} &= \textcolor{#CC79A7}{z_3^{(1)}} \end{aligned}$$

That is the full weight address:

destination unit first, source input second
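
The fully addressed equations can be spelled out term by term and compared with the matrix product. The weights are hypothetical; the input is launch L001 from the launch log.

```python
import numpy as np

x = np.array([8.0, 92.0])            # x_1 = Wind, x_2 = Fuel (launch L001)

# Hypothetical W^(1); entry [j-1, k-1] holds w_{j,k}^{(1)}.
W1 = np.array([[-0.10, 0.02],
               [-0.05, 0.01],
               [ 0.03, 0.04]])

# Spell out z_j = x_1 * w_{j,1} + x_2 * w_{j,2} for each destination unit j.
z_by_hand = np.array([
    sum(x[k] * W1[j, k] for k in range(2))
    for j in range(3)
])

z_matrix = x @ W1.T                  # the same three scores via x · W^T

print(np.allclose(z_by_hand, z_matrix))  # True
```

The loop index `j` runs over destinations and `k` over sources, matching the subscript order in the address.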


Scoring the Entire Logbook at Once

So far, we scored one launch attempt:

$$x \cdot W^T = 1 \times 3$$

That gave us one row of hidden-layer scores:

$$z^{(1)} = \begin{bmatrix} z_1^{(1)} & z_2^{(1)} & z_3^{(1)} \end{bmatrix}$$

But the launch logbook has more than one row.

If we score the whole logbook, lowercase $x$ becomes capital $X$:

$$X = n \times 2$$

  • $n$ launch attempts.
  • 2 input clues: Wind and Fuel.

Because we are still building 3 hidden units, the weight matrix stays the same.

$$W^T = 2 \times 3$$

So the multiplication becomes:

$$X \cdot W^T = (n \times 2)(2 \times 3) = n \times 3$$

The inner 2s still match.

The output is now:

one score row per launch attempt

For two launch attempts, it looks like this:

$$\underbrace{ \begin{bmatrix} x_1 & x_2 \\ x_1 & x_2 \end{bmatrix} }_{X} \cdot \underbrace{ \begin{bmatrix} w_{1,1}^{(1)} & w_{2,1}^{(1)} & w_{3,1}^{(1)} \\[0.8em] w_{1,2}^{(1)} & w_{2,2}^{(1)} & w_{3,2}^{(1)} \end{bmatrix} }_{W^T} = \begin{array}{cc} \begin{bmatrix} z_1^{(1)} & z_2^{(1)} & z_3^{(1)} \\ z_1^{(2)} & z_2^{(2)} & z_3^{(2)} \end{bmatrix} & \begin{array}{l} \leftarrow \text{launch 1} \\ \leftarrow \text{launch 2} \end{array} \end{array}$$

In this matrix, each row of $X$ holds that launch's own Wind and Fuel values, and the superscript on $z$ counts the launch attempt rather than the layer. Every score here still belongs to hidden layer 1.

That is the trick.

We do not score Launch 1, then come back and score Launch 2.

One matrix multiplication gives hidden-layer scores for the whole logbook at once.
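
A small check, using the four rows of the launch log and hypothetical weights, suggests that scoring row by row and scoring the whole logbook give the same numbers:

```python
import numpy as np

X = np.array([[8.0, 92.0],    # L001
              [22.0, 35.0],   # L002
              [12.0, 81.0],   # L003
              [28.0, 41.0]])  # L004

# Hypothetical layer-1 weights (3 hidden units x 2 inputs).
W = np.array([[-0.10, 0.02],
              [-0.05, 0.01],
              [ 0.03, 0.04]])

# One launch at a time: four separate (1 x 2)(2 x 3) products, stacked.
Z_loop = np.vstack([row.reshape(1, 2) @ W.T for row in X])

# Whole logbook: one (4 x 2)(2 x 3) product.
Z_all = X @ W.T

print(Z_all.shape)                 # (4, 3)
print(np.allclose(Z_loop, Z_all))  # True
```

Row `i` of `Z_all` holds the three hidden-layer scores for launch `i`, so nothing is lost by batching.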


Vectorization: Scoring the Whole Page at Once

In the last section, we stopped scoring one launch attempt at a time and scored the whole logbook instead.

That move has a name: vectorization.

Vectorization is where linear algebra starts doing the heavy lifting.

Instead of treating the launch logbook as separate rows, we treat it as one matrix.

That lets one matrix multiplication score the whole logbook in one pass:

$$X \cdot W^T = Z^{(1)}$$

In our rocket story:

  • lowercase $x$ is one launch attempt
  • capital $X$ is the whole launch logbook

So vectorization is the jump from one row to many rows:

$$x \rightarrow X$$

One launch attempt becomes the whole logbook.

And the formula barely changes:

$$x \cdot W^T \quad \longrightarrow \quad X \cdot W^T$$

That is why linear algebra is so powerful in neural networks.

It lets the model stop doing this:

  • score launch 1
  • score launch 2
  • score launch 3

and start doing this:

score the whole logbook at once

The shapes show the speed trick:

$$(n \times 2)(2 \times 3) = n \times 3$$

That output means:

one score row per launch attempt

Vectorization is not just cleaner notation.

It is how the same calculation scales from one launch attempt to the whole logbook.
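
In code, "the formula barely changes" can be taken literally: one function can serve both the single row and the whole logbook. The helper name `hidden_scores` and the weights are hypothetical.

```python
import numpy as np

def hidden_scores(X, W):
    """Return Z = X · W^T for any number of launch rows (hypothetical helper)."""
    return X @ W.T

# Hypothetical layer-1 weights (3 hidden units x 2 inputs).
W = np.array([[-0.10, 0.02],
              [-0.05, 0.01],
              [ 0.03, 0.04]])

x = np.array([[8.0, 92.0]])                  # one launch attempt, shape (1, 2)
X = np.array([[8.0, 92.0],
              [22.0, 35.0],
              [12.0, 81.0]])                 # three attempts, shape (3, 2)

print(hidden_scores(x, W).shape)  # (1, 3)
print(hidden_scores(X, W).shape)  # (3, 3)
```

Only the number of rows in the input changes; the weights and the line of arithmetic stay the same.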

Mission Control Briefing

Input Layer (in)

The nervous intern with the clipboard: Wind and Fuel. It does not decide anything; it just reports the facts.

