Hopfield Networks - Part 1

With John Hopfield recently receiving the Nobel Prize in Physics, it’s the perfect time to dive into his groundbreaking work on neural networks. In this series, I’ll explore the fascinating world of Hopfield networks—an elegant and biologically inspired model for associative memory. These networks not only provide insights into brain function but also have broad applications in computational neuroscience and machine learning. In this first post, we’ll break down the core principles and characteristics of Hopfield networks, and I’ll walk you through Python implementations to get you started with this powerful concept.

[Figure: Hopfield Nobel illustration. ©Johan Jarnestad/The Royal Swedish Academy of Sciences]

Introduction: Associative Memory and Hopfield Networks

Associative memory is a remarkable feature of the brain that allows us to retrieve information from partial or noisy inputs. In computational terms, the problem of associative memory can be stated as follows:

Store a set of $p$ patterns $x_i^\mu$ in a network such that when the network is presented with a new pattern $z_i$, it responds by recalling the stored pattern that is most similar to it.

Here, the index $\mu$ labels the stored patterns ($\mu = 1, 2, \dots, p$), while the index $i$ refers to the components of each pattern ($i = 1, 2, \dots, N$). We consider binary patterns, meaning each component of a pattern takes only two values: $0$ or $1$.

A straightforward way to solve this problem would be to compare the new pattern against the stored ones, using a similarity measure such as the Hamming distance [1], and select the closest match. However, our goal is to design a neural network capable of performing this task autonomously. This is where Hopfield networks come into play.
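
To make this baseline concrete, here is a minimal sketch of such a look-up approach (the helper names and the example patterns are my own, for illustration only):

import numpy as np

def hamming_distance(a, b):
    # number of components in which two binary patterns differ
    return int(np.sum(a != b))

def lookup_recall(stored_patterns, probe):
    # brute-force associative memory: return the stored pattern
    # closest to the probe in Hamming distance
    distances = [hamming_distance(p, probe) for p in stored_patterns]
    return stored_patterns[int(np.argmin(distances))]

stored = np.array([[1, 0, 1, 0],
                   [0, 0, 1, 1],
                   [1, 1, 1, 1]])
probe = np.array([1, 0, 0, 0])       # a corrupted copy of the first pattern
print(lookup_recall(stored, probe))  # -> [1 0 1 0]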

Hopfield networks [2] offer a biologically inspired way to model associative memory by leveraging the collective behavior of interconnected neurons. Unlike look-up based methods, Hopfield networks operate dynamically, evolving over time to retrieve stored information.

The key idea is that a Hopfield network consists of $N$ interconnected neurons, each capable of influencing the others based on predefined connection strengths. The network operates as a dynamical system, where given an initial configuration (which may be a noisy or partial version of a stored pattern), it will naturally evolve toward a stable state corresponding to the closest stored pattern. In this series, I will follow the presentation from [3], with some elements from [4]. You can also refer to this great lecture from MIT if you want a nice introduction with more context.

Formalization of the problem

For mathematical convenience, we use a model in which the states of the network are still binary, but equal to $-1$ and $+1$ instead of $0$ and $1$. We will denote the state of each neuron by $S_i$, where $i = 1, 2, \ldots, N$. Since these neurons are connected to each other, we encode the strength of these connections in a matrix with elements $w_{ij}$. The network evolves in time according to the following rule:

\[\begin{equation} S_i(t+1) = \text{sgn}\left(\sum_{j} w_{ij} S_j(t)\right) \end{equation}\]

Here, $\text{sgn}$ is the sign function, which returns $+1$ if its argument is positive and $-1$ otherwise.

Now, this update can be carried out in two ways: synchronously or asynchronously. In the synchronous case, all the neurons are updated at the same time, while in the asynchronous case the neurons are updated one at a time, with the neuron to update chosen at random. In this post we will focus on the synchronous case, as it is easier to implement; from the point of view of the theoretical analysis, the two are essentially equivalent.
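
For later contrast, here is a minimal sketch of what one asynchronous sweep could look like (my own illustration, not code from the references): the neurons are visited in random order, and each update uses the most recent values of the others.

def async_sweep(state, weights, rng):
    # asynchronous dynamics: update one neuron at a time, in random order,
    # always using the current (partially updated) state of the network
    state = state.copy()
    for i in rng.permutation(len(state)):
        local_field = weights[i] @ state
        state[i] = 1 if local_field > 0 else -1  # sgn as defined above
    return state

# usage sketch: async_sweep(S, W, np.random.default_rng())
# for a state vector S and weight matrix W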

We can now start implementing these elements in Python, beginning with a small binary pattern.

import numpy as np

# a small binary pattern, hand-picked for illustration
bin_pattern = np.array([1, 0, 1])

As we said, we will work with binary patterns, but encode them as $-1$ and $+1$. The following helper functions convert between the two representations.

def ztm(pattern): # zeros to minus ones
    return np.where(pattern == 0, -1, pattern)

def mtz(pattern): # minus ones to zeros
    return np.where(pattern == -1, 0, pattern)

pattern = ztm(bin_pattern)
print(pattern)
[ 1 -1  1]

One pattern

Let’s start by considering the case in which we have only one pattern. In this case, we want the network to converge to the pattern itself, which therefore needs to be stable. Stability means that the pattern is an equilibrium point of the dynamics: if we start from the pattern, we remain there. Formally:

\[\begin{equation} S_i(t+1) = S_i(t) \quad \forall i \end{equation}\]

Since our pattern is $x_i$ (we drop the $\mu$ index since we have only one pattern), the condition for the stability of the pattern is that:

\[\begin{equation} x_i = \text{sgn}\left(\sum_{j} w_{ij} x_j\right) \quad \forall i \end{equation}\]

One can see that this condition is satisfied if the element $w_{ij}$ is proportional to the product $x_i x_j$ (as $x_j^2 = 1$). It makes sense to take the constant of proportionality to be $1/N$, so that the matrix $w_{ij}$ is defined as:

\[\begin{equation} w_{ij} = \frac{1}{N} x_i x_j \end{equation}\]

In fact:

\[\begin{equation} x_i = \text{sgn}\left(\sum_{j} \frac{1}{N} x_i x_j x_j\right) = \text{sgn}\left(\frac{1}{N} x_i \sum_{j} x_j x_j\right) = \text{sgn}\left(\frac{1}{N} x_i N\right) = x_i \end{equation}\]

which explains the choice of the constant of proportionality.

In Python, this can easily be implemented using the np.outer function.

# weight matrix for a single stored pattern
weights = np.outer(pattern, pattern)
print(weights)
[[ 1 -1  1]
 [-1  1 -1]
 [ 1 -1  1]]
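
Note that the code above drops the $1/N$ factor from the definition of $w_{ij}$. Since the update only depends on the sign of the weighted sum, this normalization makes no difference; a quick check with the factor included:

# the 1/N factor rescales the local fields but does not change their signs
N = len(pattern)
print(np.sign((weights / N) @ pattern).astype(np.int64))
[ 1 -1  1]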

We can now verify that the pattern is stable by defining a step function that updates the state of the network according to the rule we defined above.

def step(pattern, weights):
    # one synchronous update: S(t+1) = sgn(W S(t))
    # (note: np.sign maps a local field of exactly 0 to 0,
    # while the sgn defined above would give -1)
    return np.sign(np.dot(weights, pattern)).astype(np.int64)

print(mtz(step(pattern, weights)))
[1 0 1]

As expected, applying the step function to the pattern gives back the same pattern as output (i.e., the pattern is stable). But as we said, we also require the pattern to be an attractor: if we start from a state that is close to the original pattern, the network should converge to the original pattern. To see this, let’s consider an initial state (with the same shape as the pattern) that differs from the original pattern. We will call it $k_i$. Let’s see what happens when the network evolves starting from this state.

\[\begin{equation} \text{sgn}\left(\sum_{j} w_{ij} k_j\right) = \text{sgn}\left(\sum_{j} \frac{1}{N}x_i x_j k_j\right) = \text{sgn}\left(\frac{1}{N} x_i \sum_{j} x_j k_j\right) \end{equation}\]

We can now notice that the sum can be divided into the neurons $k_j$ that have the same value as the stored pattern (i.e., they are correct) and the ones that are in the opposite state (incorrect). The sum can then be written as:

\[\begin{equation} \text{sgn}\left(\sum_{j} w_{ij} k_j\right) = \text{sgn}\left(\frac{1}{N} x_i ( \underbrace{\sum_{j=\text{correct}} x_j k_j + \sum_{j=\text{incorrect}} x_j k_j }_{A})\right) \end{equation}\]

So, if the number of correct neurons is greater than the number of incorrect neurons, the term $A$ is positive, and the sign of the result is given by $x_i$, which is the value of the stored pattern. Let’s see this:

new_pattern = ztm(np.array([1,0,0]))
print(mtz(step(new_pattern, weights)))
[1 0 1]
another_new_pattern = ztm(np.array([0,0,1]))
print(mtz(step(another_new_pattern, weights)))
[1 0 1]
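
We can make the argument above explicit by computing the overlap $\frac{1}{N}\sum_j x_j k_j$ directly for the two probes; when it is positive, the sign of the update is set by $x_i$ and the stored pattern is recovered:

# overlap between the stored pattern and each probe:
# positive when the majority of the components agree
for probe in [new_pattern, another_new_pattern]:
    print(pattern @ probe / len(pattern))
# -> both overlaps equal 1/3, which is positive, so both probes are
#    mapped back onto the stored pattern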

However, if we start from a state in which the majority of the elements differ from the original pattern (here, more than one out of three), the network will not converge to the original pattern.

different_pattern = ztm(np.array([0,1,0]))
print(mtz(step(different_pattern, weights)))
[0 1 0]

Indeed, the network converges to a different pattern, which in this case is the negation (all the values flipped) of the original pattern. We can verify that this pattern is also stable:

neg_pattern = ztm(np.array([0,1,0]))
print(mtz(step(neg_pattern, weights)))
[0 1 0]

So in a sense, the network has learned two memories: the original pattern and its negation. This is due to the sign symmetry of the dynamics: flipping every neuron flips every local field $\sum_j w_{ij} S_j$, so if a state is stable, its negation is stable as well. Each initial state will converge to the closer (in the sense of Hamming distance) of these two patterns.
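
We can also check this symmetry directly in the $\pm 1$ representation: applying one step to the negated pattern returns the negated pattern.

# flipping every neuron flips every local field, so -pattern is also stable
print(step(-pattern, weights))
[-1  1 -1]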

Multiple Patterns

Let’s now consider the case in which we have $p$ patterns. As we already said, we would like the network to converge to one of the stored patterns when we start from an initial state that is close to it. Let’s start by generating a set of random patterns:

n_patterns = 4
size = 5
# generate p = 4 random binary patterns of N = 5 components each
# (no random seed is fixed, so the patterns change between runs)
bin_patterns = np.random.randint(2, size=(n_patterns, size))
print(bin_patterns)
[[0 1 0 0 0]
 [1 0 0 0 1]
 [0 0 0 0 1]
 [0 1 1 1 0]]

For simplicity, we store them in a matrix whose rows are the patterns. In analogy with the single-pattern case, in which we built the matrix as the outer product of the pattern with itself, we can try building the matrix $W$ as the superposition of these outer products:

\[\begin{equation}\label{eqn:hebbian_w} w_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} x_i^\mu x_j^\mu \end{equation}\]

In Python:

# remember to convert the patterns to the -1/+1 encoding
patterns = ztm(bin_patterns)

# accumulate the outer products and divide by N
W = np.zeros((size, size))
for pattern in patterns:
    W += np.outer(pattern, pattern)
W /= size
print(W)
[[ 0.8 -0.4  0.   0.   0.4]
 [-0.4  0.8  0.4  0.4 -0.8]
 [ 0.   0.4  0.8  0.8 -0.4]
 [ 0.   0.4  0.8  0.8 -0.4]
 [ 0.4 -0.8 -0.4 -0.4  0.8]]

This is a form of the Hebbian learning rule, which states that the strength of the connection between two neurons should be proportional to the correlation [5] between their activities. To make that explicit, we can see that $W$ can be computed directly as:

# cool way: the same W as a single matrix product
W_2 = 1/size * (patterns.T @ patterns)
print(W_2)
[[ 0.8 -0.4  0.   0.   0.4]
 [-0.4  0.8  0.4  0.4 -0.8]
 [ 0.   0.4  0.8  0.8 -0.4]
 [ 0.   0.4  0.8  0.8 -0.4]
 [ 0.4 -0.8 -0.4 -0.4  0.8]]

Alternatively, the same computation can be written using the np.einsum function:

# cooler way: sum over the pattern index explicitly with einsum
W_1 = np.einsum('ij,ik->jk', patterns, patterns) / size
print(W_1)
[[ 0.8 -0.4  0.   0.   0.4]
 [-0.4  0.8  0.4  0.4 -0.8]
 [ 0.   0.4  0.8  0.8 -0.4]
 [ 0.   0.4  0.8  0.8 -0.4]
 [ 0.4 -0.8 -0.4 -0.4  0.8]]

We can now verify, using our step function, that a stored pattern is a fixed point; here we check the first one.

print(mtz(patterns[0]))
print(mtz(step(patterns[0],W)))
[0 1 0 0 0]
[0 1 0 0 0]
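
We can run the same check over all stored patterns with a short loop. Note that with $p = 4$ patterns stored in a network of only $N = 5$ neurons, there is no guarantee that every pattern remains a fixed point: the patterns interfere with each other, which is exactly the stability question deferred to the next post.

# check whether each stored pattern is reproduced by a single update step
for mu, p in enumerate(patterns):
    print(mu, np.array_equal(step(p, W), p))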

In this post, we’ve explored the foundational aspects of Hopfield networks and implemented the basic mechanics behind associative memory. While I had originally planned to also discuss the stability of stored patterns, I encountered a few discrepancies in the sources I was using. These errors require a more in-depth correction and clarification, so I’ll dedicate a separate post to thoroughly address the stability conditions and ensure a complete understanding. Stay tuned for the next installment, where we’ll dive deeper into the mathematical details and proper formulation of stability in Hopfield networks.


  1. The Hamming distance between two patterns is defined as the number of components in which they differ. 

  2. Hopfield, John J. “Neural networks and physical systems with emergent collective computational abilities.” Proceedings of the National Academy of Sciences 79.8 (1982): 2554-2558. 

  3. Hertz, John A. Introduction to the Theory of Neural Computation. CRC Press, 1991. 

  4. Rojas, Raúl. Neural Networks: A Systematic Introduction. Springer Science & Business Media, 1996. 

  5. You can convince yourself that if you have an array $X$ of $p$ data points of dimension $N$ (with zero mean), the covariance matrix is given by $\frac{1}{p}X^T X$. In our case we are doing the same but dropping the $p$ factor and replacing it with $N$. This doesn’t really matter, as the only thing we need is the proportionality. 

Written on February 9, 2025