Information Theory

Now, we will show that Information Theory provides the language necessary to describe the metrization procedure in detail.

It is possible to introduce Information Theory axiomatically by a suitable generalization of the axioms[16] in Feinstein.[17] But to simplify the discussion here, we will use the less elegant but equivalent method of defining certain definite integrals. The probability density distribution p is defined from the cumulative probability distribution P by

P(X′) = ∫_{X′ measurable ⊂ X} p(x) dx.    (1)
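
As a quick numerical illustration of Eq. (1), the sketch below integrates an assumed density over a measurable subset X′; the Gaussian form, its scale, and the interval are illustrative choices, not values from the text.

```python
# A minimal sketch of Eq. (1), assuming a one-dimensional X and an
# illustrative Gaussian density p (not specified in the text).
from scipy import integrate, stats

sigma = 2.0                                  # illustrative scale of the density
p = stats.norm(loc=0.0, scale=sigma).pdf     # probability density p(x)

# Probability of the measurable subset X' = [-1, 3] of X = (-inf, inf).
P_Xprime, _ = integrate.quad(p, -1.0, 3.0)
print(P_Xprime)                              # P(X') = integral of p over X'
```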

Then the information rate H is defined as

H(X) = -∫ₓ p(x) ln[κ p(x)] dx    (2)

where κ carries the units of X, so that κ p(x) is dimensionless. Finally, the channel rate R is defined as

R(⨀ᵢXᵢ) = Σᵢ H(Xᵢ) - H(X),    (3)

where X is the denumerable[18] cartesian product space

X = ⨀ᵢ Xᵢ.    (4)
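
To make definitions (2)-(4) concrete, the following sketch evaluates H and R in closed form for a correlated pair of Gaussian components; the covariance, the correlation ρ, and the unit constants κ₁, κ₂ are illustrative assumptions rather than anything prescribed by the text.

```python
# A minimal sketch of Eqs. (2)-(4) for an assumed bivariate Gaussian example.
import numpy as np

def gaussian_H(cov, kappa):
    """Information rate H of Eq. (2) for a Gaussian with covariance `cov`,
    where `kappa` carries the units of the (product) space."""
    cov = np.atleast_2d(cov)
    n = cov.shape[0]
    # differential entropy of an n-dimensional Gaussian, minus ln(kappa)
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(cov)) - np.log(kappa)

rho = 0.8                                   # assumed correlation between the two components
cov = np.array([[1.0, rho], [rho, 1.0]])    # joint covariance on the product space X
kappa1, kappa2 = 3.0, 0.5                   # assumed units of X1 and X2

H1 = gaussian_H(cov[0, 0], kappa1)
H2 = gaussian_H(cov[1, 1], kappa2)
H_joint = gaussian_H(cov, kappa1 * kappa2)  # kappa of the product space

R = H1 + H2 - H_joint                       # Eq. (3); the kappa terms cancel
print(R, -0.5 * np.log(1 - rho**2))         # matches the Gaussian mutual information
```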

Next, we define the angle Θ

|Θ(⨀ᵢXᵢ)| = sin⁻¹ exp[-R(⨀ᵢXᵢ)]    (5)

and the norm

|X| = κ(2πe)^(-1/2) exp[H(X)].    (6)
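
Continuing the same illustrative Gaussian setting, the sketch below evaluates the angle of Eq. (5) for a correlated pair and checks that the norm of Eq. (6) recovers the standard deviation of a one-dimensional Gaussian; ρ, σ, and κ are again assumed values.

```python
# A minimal sketch of Eqs. (5) and (6) under assumed Gaussian statistics.
import numpy as np

rho = 0.8                     # assumed correlation of a two-component example
sigma, kappa = 2.5, 3.0       # assumed standard deviation and unit of a one-dimensional X

# Eq. (5): the angle associated with channel rate R
R = -0.5 * np.log(1 - rho**2)             # R for the correlated Gaussian pair
theta = np.arcsin(np.exp(-R))             # |Θ| = arcsin(exp(-R))
print(np.degrees(theta))                  # 90 degrees when R = 0 (statistical independence)

# Eq. (6): the norm of a one-dimensional Gaussian X recovers its standard deviation
H = np.log(sigma * np.sqrt(2 * np.pi * np.e) / kappa)       # Eq. (2) in closed form
norm_X = kappa * (2 * np.pi * np.e) ** -0.5 * np.exp(H)
print(norm_X)                                               # equals sigma
```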

Now, if[19] a statistically independent basis, i.e., one for which

R(⨀ᵢXᵢ) = constant,    (7)

can be provided in terms of one-dimensional components, i.e., components that cannot be decomposed further, then providing an orthogonal coordinate system is just the usual problem of diagonalizing a symmetric matrix by means of a congruence transformation. Furthermore, for uniqueness, we arrange the spectrum in decreasing order. Then, by means of the Radon-Nikodym theorem applied to each of these one-dimensional axes, the probability distribution may be made, e.g., Gaussian, if desired. Thus, we obtain the promised orthogonal Euclidean space.
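
The diagonalization step can be sketched as follows; the symmetric matrix below is an assumed covariance of second moments standing in for whatever symmetric form the basis construction yields, and the spectrum is sorted in decreasing order as described above.

```python
# A minimal sketch of diagonalizing a symmetric matrix by a congruence
# transformation; the matrix entries are illustrative, not from the text.
import numpy as np

cov = np.array([[4.0, 1.2, 0.3],
                [1.2, 2.0, 0.5],
                [0.3, 0.5, 1.0]])            # symmetric, positive definite

# Eigendecomposition of a symmetric matrix: Q^T cov Q = diag(spectrum) is a
# congruence (and, since Q is orthogonal, also a similarity) transformation.
spectrum, Q = np.linalg.eigh(cov)

# For uniqueness, arrange the spectrum in decreasing order.
order = np.argsort(spectrum)[::-1]
spectrum, Q = spectrum[order], Q[:, order]

print(spectrum)                              # decreasing eigenvalues
print(np.round(Q.T @ cov @ Q, 10))           # diagonal in the new orthogonal basis
```

Using a symmetric eigensolver keeps the transformation orthogonal, so the congruence Qᵀ cov Q simultaneously yields the orthogonal coordinate system referred to above.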