The Shannon entropy is a statistical quantifier extensively used for the characterization of complex systems. It can be interpreted as:
- Measure of Uncertainty: It quantifies the unpredictability of information content. Higher entropy indicates greater uncertainty or variability in the outcomes of a random variable.
- Information Content: It represents the average number of bits needed to encode messages from a source. A source with uniform probability distribution (where all outcomes are equally likely) has maximum entropy, while a deterministic source (where one outcome is certain) has zero entropy.
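Formally, for a discrete source emitting symbols with probabilities $p_i$, the Shannon entropy in bits is

$$H = -\sum_{i} p_i \log_2 p_i,$$

which reaches its maximum, $\log_2 n$ for $n$ symbols, under the uniform distribution and equals zero for a deterministic source.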
When observed over time, entropy is frequently used for anomaly detection: pronounced variations in the entropy level of a system can indicate a significant change in the system itself.
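The outputs below can be reproduced with a minimal sketch like the following. The helper name `shannon_entropy`, the `normalized` flag, and the sample message "successful" are illustrative assumptions consistent with the printed symbol frequencies (the original code is not shown); the second figure appears to be the entropy divided by log2 of the alphabet size, which maps it to the [0, 1] range.

```python
from collections import Counter
from math import log2

def shannon_entropy(message, normalized=False):
    """Shannon entropy (in bits) of a message's symbol distribution."""
    counts = Counter(message)
    total = len(message)
    # Relative frequency of each symbol, used as its probability.
    probs = {symbol: count / total for symbol, count in counts.items()}
    entropy = -sum(p * log2(p) for p in probs.values())
    if normalized and len(probs) > 1:
        # Divide by the maximum possible entropy, log2(alphabet size).
        entropy /= log2(len(probs))
    label = "Normalized entropy" if normalized else "Entropy"
    print(f"{label}: {entropy:.4f}")
    print(f"Symbols: {probs}")
    return entropy

shannon_entropy("successful")                   # ~2.4464 bits
shannon_entropy("successful", normalized=True)  # ~0.9464, scaled to [0, 1]
shannon_entropy("HELLO")                        # ~1.9219 bits
```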
Entropy: 2.4464
Symbols: {'u': 0.2, 'f': 0.1, 'l': 0.1, 's': 0.3, 'c': 0.2, 'e': 0.1}
Normalized entropy: 0.9464
Symbols: {'u': 0.2, 'f': 0.1, 'l': 0.1, 's': 0.3, 'c': 0.2, 'e': 0.1}
Entropy: 1.9219
Symbols: {'E': 0.2, 'O': 0.2, 'L': 0.4, 'H': 0.2}
- 1.92 (~ 2) bits are needed to encode each symbol of the message, so a fixed 2-bit code per symbol suffices, as the table and the short check below illustrate:
Symbol | Code |
---|---|
H | 00 |
E | 01 |
L | 10 |
O | 11 |
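As a quick check, encoding HELLO with this fixed 2-bit code takes 10 bits, close to the ~9.6-bit lower bound given by the entropy (5 symbols x 1.92 bits):

```python
# Fixed 2-bit code from the table above.
code = {'H': '00', 'E': '01', 'L': '10', 'O': '11'}
encoded = ''.join(code[symbol] for symbol in "HELLO")
print(encoded)       # 0001101011
print(len(encoded))  # 10 bits for 5 symbols, vs the 5 * 1.92 = 9.6-bit entropy bound
```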
Entropy Rate of a Markov Chain
Baseline transition matrix:

| | State 1 | State 2 | State 3 |
|---|---|---|---|
| (Origin) State 1 | 0.00 | 1.00 | 0.00 |
| (Origin) State 2 | 0.25 | 0.50 | 0.25 |
| (Origin) State 3 | 0.50 | 0.50 | 0.00 |
The same chain after a change in transition probabilities (no new states):

| | State 1 | State 2 | State 3 |
|---|---|---|---|
| (Origin) State 1 | 0.00 | 1.00 | 0.00 |
| (Origin) State 2 | 0.05 | 0.90 | 0.05 |
| (Origin) State 3 | 0.05 | 0.95 | 0.00 |
The chain after a new state (State 4) is created:

| | State 1 | State 2 | State 3 | State 4 |
|---|---|---|---|---|
| (Origin) State 1 | 0.000 | 1.00 | 0.000 | 0.00 |
| (Origin) State 2 | 0.025 | 0.80 | 0.025 | 0.15 |
| (Origin) State 3 | 0.050 | 0.95 | 0.000 | 0.00 |
| (Origin) State 4 | 0.000 | 0.00 | 0.000 | 0.00 |
- Baseline chain: 2.5000000103485926
- After the change in transition probabilities: 0.8553925621663911
- After the new state is created: 3.2205806955477687
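The three figures above can be reproduced with a sketch like the one below, under an explicit assumption about how they were computed (the original code is not shown): the value for each chain is taken as the sum of the per-state (per-row) Shannon entropies of the transition matrix, and a state with no recorded outgoing transitions is treated as a uniform distribution over all states.

```python
import numpy as np

def chain_entropy(P, eps=1e-10):
    """Sum of per-state transition entropies (bits) of a transition matrix P.

    Assumptions (illustrative only): rows are not weighted by the stationary
    distribution, an all-zero row is treated as uniform, and eps avoids log2(0).
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[1]
    total = 0.0
    for row in P:
        if row.sum() == 0:                     # new/unseen state: assume uniform
            row = np.full(n, 1.0 / n)
        p = row + eps
        total += -np.sum(p * np.log2(p))
    return total

baseline = [[0.00, 1.00, 0.00],
            [0.25, 0.50, 0.25],
            [0.50, 0.50, 0.00]]

shifted = [[0.00, 1.00, 0.00],
           [0.05, 0.90, 0.05],
           [0.05, 0.95, 0.00]]

with_new_state = [[0.000, 1.00, 0.000, 0.00],
                  [0.025, 0.80, 0.025, 0.15],
                  [0.050, 0.95, 0.000, 0.00],
                  [0.000, 0.00, 0.000, 0.00]]

print(chain_entropy(baseline))        # ~2.50
print(chain_entropy(shifted))         # ~0.86
print(chain_entropy(with_new_state))  # ~3.22
```

Under this reading, changing the transition probabilities lowered the entropy, while adding a new state raised it sharply, which is exactly the distinction drawn below.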
There are two possibilities for changes in a system represented by a Markov chain:
- Without the creation of new states - temporal entropy variation
- With the creation of new states - spatial entropy variation (this has a greater impact on the entropy)
This strategy can be used to monitor concept drift in both the pre-modeling and post-modeling phases by detecting changes in a model's response.
Spatial Entropy
Moreover, as proposed by Von Neumann, the Shannon entropy can be used to describe spatial entropy, thus serving as a criterion for choosing spaces.
Using normalized eigenvalues from Principal Component Analysis (PCA) as probabilities to estimate the entropy of a data space involves several key steps. This technique leverages the relationship between eigenvalues, variance, and information content in datasets.
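A minimal sketch of these steps, assuming a covariance-based PCA and the hypothetical helper name `spatial_entropy` (the normalization and numerical guards are illustrative choices):

```python
import numpy as np

def spatial_entropy(X):
    """Entropy (bits) of a data space, estimated from normalized PCA eigenvalues."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # 1. center the data
    cov = np.cov(Xc, rowvar=False)          # 2. covariance matrix
    eigvals = np.linalg.eigvalsh(cov)       # 3. eigenvalues = variance per principal axis
    eigvals = np.clip(eigvals, 0.0, None)   #    guard against tiny negative values
    p = eigvals / eigvals.sum()             # 4. normalize so they act as probabilities
    p = p[p > 0]                            #    drop zero-variance axes
    return -np.sum(p * np.log2(p))          # 5. Shannon entropy of that distribution

# Example: variance concentrated in one direction gives low spatial entropy,
# variance spread evenly across directions gives high spatial entropy.
rng = np.random.default_rng(0)
elongated = rng.normal(size=(500, 3)) * np.array([10.0, 0.5, 0.5])
isotropic = rng.normal(size=(500, 3))
print(spatial_entropy(elongated))   # close to 0
print(spatial_entropy(isotropic))   # close to log2(3), about 1.58
```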
The spatial entropy value provides insight into the complexity or disorder within the dataset. A higher entropy indicates a more complex structure with less predictability, while lower entropy suggests a more ordered and predictable structure.