The Shannon entropy is a statistical quantifier extensively used to characterize complex systems. It can be interpreted as:

  • Measure of Uncertainty: It quantifies the unpredictability of information content. Higher entropy indicates greater uncertainty or variability in the outcomes of a random variable.
  • Information Content: It represents the average number of bits needed to encode messages from a source. A source with uniform probability distribution (where all outcomes are equally likely) has maximum entropy, while a deterministic source (where one outcome is certain) has zero entropy.
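
Formally, for a discrete random variable X that takes values x_i with probabilities p(x_i), the Shannon entropy is

H(X) = - Σ_i p(x_i) · log_b p(x_i)

where the base b of the logarithm sets the unit of measure (bits for b = 2).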

When observed over time, entropy is frequently used for anomaly detection: significant variations in a system's entropy levels can indicate a meaningful change in the system itself.

import math
import numpy as np
import pandas as pd
from typing import Dict, Tuple


def message_entropy(msg: str, base: int = 2) -> Tuple[float, Dict[str, float]]:
    """Calculates the Shannon entropy of a string message and returns it
    together with each symbol's estimated probability."""
    add = 0.0
    symbols = {}
    n = len(msg)
    for char in set(msg):
        proba = msg.count(char) / n  # relative frequency as the probability estimate
        add += proba * math.log(proba, base)
        symbols[char] = proba
    return -add, symbols
h, symbols = message_entropy(msg="successful", base=2)
print(f"Entropy: {h:.4f}\nSymbols: {symbols}")
Entropy: 2.4464
Symbols: {'u': 0.2, 'f': 0.1, 'l': 0.1, 's': 0.3, 'c': 0.2, 'e': 0.1}
h, symbols = message_entropy(msg="successful", base=6)
print(f"Entropy: {h:.4f}\nSymbols: {symbols}")
Entropy: 0.9464
Symbols: {'u': 0.2, 'f': 0.1, 'l': 0.1, 's': 0.3, 'c': 0.2, 'e': 0.1}
h, symbols = message_entropy(msg="HELLO", base=2)
print(f"Entropy: {h:.4f}\nSymbols: {symbols}")
Entropy: 1.9219
Symbols: {'E': 0.2, 'O': 0.2, 'L': 0.4, 'H': 0.2}
  • 1.92 (~ 2) bits are needed to encode each symbol in the message; see the check after the code table below.
Symbol  Code
H       00
E       01
L       10
O       11
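
As a quick check of that claim, a minimal sketch reusing message_entropy defined above (the dictionary name code is illustrative):

code = {"H": "00", "E": "01", "L": "10", "O": "11"}  # fixed-length code from the table above
msg = "HELLO"
encoded = "".join(code[c] for c in msg)
h, _ = message_entropy(msg=msg, base=2)
print(f"Encoded message: {encoded} ({len(encoded)} bits)")
print(f"Entropy lower bound: {h * len(msg):.2f} bits")
Encoded message: 0001101011 (10 bits)
Entropy lower bound: 9.61 bits

The fixed 2-bit code needs 10 bits for the five symbols, slightly above the ≈ 9.6-bit lower bound given by the entropy.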

Entropy Rate of a Markov Chain

adj_mat_A = np.array([[0, 1, 0], [0.25, 0.5, 0.25], [0.5, 0.5, 0]])
pd.DataFrame(
    adj_mat_A,
    index=["(Origin) State 1", "(Origin) State 2", "(Origin) State 3"],
    columns=["State 1", "State 2", "State 3"],
)
                  State 1  State 2  State 3
(Origin) State 1     0.00      1.0     0.00
(Origin) State 2     0.25      0.5     0.25
(Origin) State 3     0.50      0.5     0.00
# changing the system's probabilities
adj_mat_B = np.array([[0, 1, 0], [0.05, 0.9, 0.05], [0.05, 0.95, 0]])
pd.DataFrame(
    adj_mat_B,
    index=["(Origin) State 1", "(Origin) State 2", "(Origin) State 3"],
    columns=["State 1", "State 2", "State 3"],
)
                  State 1  State 2  State 3
(Origin) State 1     0.00     1.00     0.00
(Origin) State 2     0.05     0.90     0.05
(Origin) State 3     0.05     0.95     0.00
# adding a new state
adj_mat_C = np.array(
    [[0, 1, 0, 0], [0.025, 0.8, 0.025, 0.15], [0.05, 0.95, 0, 0], [0, 0, 0, 0]]
)
pd.DataFrame(
    adj_mat_C,
    index=[
        "(Origin) State 1",
        "(Origin) State 2",
        "(Origin) State 3",
        "(Origin) State 4",
    ],
    columns=["State 1", "State 2", "State 3", "State 4"],
)
                  State 1  State 2  State 3  State 4
(Origin) State 1    0.000     1.00    0.000     0.00
(Origin) State 2    0.025     0.80    0.025     0.15
(Origin) State 3    0.050     0.95    0.000     0.00
(Origin) State 4    0.000     0.00    0.000     0.00
def estimate_markov_chain_entropy_rate(adj_mat: np.ndarray) -> float:
    """Estimates the Shannon entropy of a Markov chain, given its adjacency
    (transition) matrix, as the sum of the entropies of its transition rows."""
    m = adj_mat + 1e-10  # small constant avoids log(0) for missing transitions
    m_norm = np.apply_along_axis(
        arr=m, func1d=lambda row: row / sum(row), axis=1
    )  # re-normalize each row into a probability distribution
    return -1 * np.sum(m_norm * np.log2(m_norm))
estimate_markov_chain_entropy_rate(adj_mat_A)
2.5000000103485926
estimate_markov_chain_entropy_rate(adj_mat_B)
0.8553925621663911
estimate_markov_chain_entropy_rate(adj_mat_C)
3.2205806955477687
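
The function above sums the entropy of every transition row, which works as a simple change indicator. For reference, the textbook entropy rate of a Markov chain additionally weights each row's entropy by the stationary distribution pi, i.e. H = -Σ_i pi_i Σ_j P_ij log2(P_ij). A minimal sketch of that weighted version, assuming the chain is ergodic so that pi exists and is unique:

def markov_chain_entropy_rate(adj_mat: np.ndarray) -> float:
    """Entropy rate weighted by the stationary distribution pi, obtained as the
    left eigenvector of the transition matrix associated with eigenvalue 1."""
    p = adj_mat + 1e-10
    p = p / p.sum(axis=1, keepdims=True)  # row-normalize into a stochastic matrix
    eigvals, eigvecs = np.linalg.eig(p.T)  # left eigenvectors of P = right eigenvectors of P.T
    pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    pi = pi / pi.sum()  # scale the eigenvector into a probability vector
    row_entropies = -np.sum(p * np.log2(p), axis=1)
    return float(np.sum(pi * row_entropies))

markov_chain_entropy_rate(adj_mat_A)

For adj_mat_A this yields roughly 1.08 bits per transition, lower than the 2.5 obtained by summing the rows, because the states are not visited with equal frequency.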

There are two possible kinds of change in a system represented by a Markov chain:

  • Without the creation of new states - Temporal entropy variation
  • With the creation of new states - Spatial entropy variation (which has a greater impact on the entropy)

This strategy can be used to monitor concept drift in both the pre-modeling and post-modeling phases by detecting changes in a model’s response.
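
A minimal monitoring sketch along these lines, reusing estimate_markov_chain_entropy_rate from above (the window size, the 0.5-bit threshold, and the helper names are illustrative choices, not from the original analysis): re-estimate the transition matrix from consecutive windows of observed states, encoded as integers 0..n_states-1, and flag any window whose entropy moves away from the reference level.

def transition_counts(states: list, n_states: int) -> np.ndarray:
    """Counts observed transitions between consecutive states."""
    counts = np.zeros((n_states, n_states))
    for current, nxt in zip(states[:-1], states[1:]):
        counts[current, nxt] += 1
    return counts

def detect_entropy_drift(states: list, n_states: int, window: int = 500, threshold: float = 0.5) -> list:
    """Flags windows whose entropy deviates from the first window by more than `threshold` bits."""
    reference, alerts = None, []
    for start in range(0, len(states) - window + 1, window):
        mat = transition_counts(states[start:start + window], n_states)
        h = estimate_markov_chain_entropy_rate(mat)  # counts are row-normalized internally
        if reference is None:
            reference = h  # entropy of the baseline window
        elif abs(h - reference) > threshold:
            alerts.append((start, h))  # (window start index, entropy) of a suspected drift
    return alerts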

Spatial Entropy

Moreover, as proposed by von Neumann, the Shannon entropy can be used to describe spatial entropy, thus serving as a criterion for choosing data spaces.

Using normalized eigenvalues from Principal Component Analysis (PCA) as probabilities to estimate the entropy of a data space involves several key steps. This technique leverages the relationship between eigenvalues, variance, and information content in datasets.
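
A minimal sketch of those steps, assuming scikit-learn is available for the PCA (any eigendecomposition of the covariance matrix would do); the spatial_entropy helper and the synthetic data are illustrative:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def spatial_entropy(data: np.ndarray, base: int = 2) -> float:
    """Entropy of the normalized PCA eigenvalue spectrum of a data space."""
    scaled = StandardScaler().fit_transform(data)  # standardize the features
    pca = PCA().fit(scaled)
    probs = pca.explained_variance_ratio_  # normalized eigenvalues, summing to 1
    probs = probs[probs > 0]  # drop null components to avoid log(0)
    return float(-np.sum(probs * np.log(probs) / np.log(base)))

rng = np.random.default_rng(0)
redundant = rng.normal(size=(500, 1)) @ np.ones((1, 5)) + 0.01 * rng.normal(size=(500, 5))
independent = rng.normal(size=(500, 5))
print(f"Highly correlated space: {spatial_entropy(redundant):.4f}")
print(f"Independent space: {spatial_entropy(independent):.4f}")

The correlated space concentrates nearly all variance on a single component, so its entropy should be close to zero, while the independent space approaches the maximum of log2(5) ≈ 2.32 bits.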

The spatial entropy value provides insight into the complexity or disorder within the dataset. A higher entropy indicates a more complex structure with less predictability, while lower entropy suggests a more ordered and predictable structure.