Problem Set 2
(4.1) Verify that the entropy function satisfies the required properties of continuity, non-negativity, boundedness, and independence.
We define the entropy of a discrete probability distribution \(p = (p_1, p_2, \ldots, p_X)\) over \(X\) outcomes as \[ H(p) \;=\; -\sum_{i=1}^{X} p_i \,\log p_i, \] where \(\sum_{i=1}^{X} p_i = 1\). We typically use the convention that \(0 \log 0 = 0\), and the base of the logarithm can be 2 (bits) or \(e\) (nats).
- Continuity in \(p\)
Each term \(-\,p_i \log p_i\) is continuous in \(p_i\) over the interval \([0,1]\): it is clearly continuous on \((0,1]\), and the convention \(0 \log 0 = 0\) extends it continuously to \(p_i = 0\), since \(\lim_{p \to 0^+} p \log p = 0\). A finite sum of continuous terms remains continuous, so \(H(p)\) is a continuous function of the probability vector \(p\): small perturbations in any \(p_i\) lead to correspondingly small changes in \(H(p)\).
- Non-negativity (\(H(p) \ge 0\))
- For any \(0 < p_i \le 1\), we have \(\log p_i \le 0\), so \(-\,p_i \log p_i \ge 0\). Summing these gives \(H(p) \ge 0\).
- Moreover, \(H(p) = 0\) if and only if the distribution places all its mass on a single outcome (say \(p_i = 1\) for some \(i\), and 0 elsewhere). In that degenerate case, observing the outcome yields no new information (it is known in advance), so the entropy must be zero.
- Boundedness, Maximum for Uniform Distribution
- A classic result (following from Jensen's inequality applied to the concave \(\log\), or equivalently from Gibbs' inequality) is that \[ H(p) \; \le \;\log X, \] with equality if and only if \(p_i = 1/X\) for all \(i\). That is, \(H(p)\) achieves its unique maximum when the distribution \(p\) is completely uniform (all outcomes equally likely). In that case, \[ H\bigl(\tfrac{1}{X}, \tfrac{1}{X}, \ldots, \tfrac{1}{X}\bigr) \;=\; -\sum_{i=1}^{X} \tfrac{1}{X} \,\log\tfrac{1}{X} \;=\; \log X. \] Thus, the entropy is bounded above by \(\log X\).
- Additivity for Independent Variables
Suppose \(x\) and \(y\) are independent random variables with distributions \(p(x)\) and \(q(y)\), respectively, over possibly different sets of outcomes. Then the joint distribution of \((x,y)\) is \(p(x)\,q(y)\). The joint entropy is \[ H(x, y) \;=\; -\sum_{x}\!\sum_{y} \; p(x)\, q(y) \,\log\bigl[p(x)\,q(y)\bigr]. \] Using \(\log(ab) = \log a + \log b\), we get \[ H(x, y) \;=\; -\sum_{x}\!\sum_{y} \; p(x)\,q(y)\, \bigl[\log p(x) + \log q(y)\bigr]. \] This expands to \[ H(x, y) \;=\; -\sum_{x}\!\sum_{y} p(x)\,q(y)\,\log p(x) \;-\sum_{x}\!\sum_{y} p(x)\,q(y)\,\log q(y). \] Since \(\sum_{y} q(y) = 1\) and \(\sum_{x} p(x) = 1\), the inner sums collapse: \[ H(x, y) \;=\; -\sum_{x} p(x)\,\log p(x) \;-\;\sum_{y} q(y)\,\log q(y) \;=\; H(x) \;+\; H(y). \] Hence, entropy is additive over independent distributions.
These four properties (continuity, non-negativity, boundedness with its maximum at the uniform distribution, and additivity for independent variables) are exactly the requirements to be verified, and they are the defining features of Shannon entropy as the standard measure of “information content” for discrete probability distributions.
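As a sanity check on these properties, here is a small numerical sketch (the `entropy` helper and the test distributions are illustrative, not from the text) verifying non-negativity, the \(\log X\) bound with its maximum at the uniform distribution, and additivity for an independent product distribution:

```python
# Illustrative numerical check of the entropy properties verified above.
import numpy as np

def entropy(p, base=2):
    """Shannon entropy H(p) = -sum p_i log p_i, with the 0 log 0 = 0 convention."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                      # drop zero-probability terms (0 log 0 = 0)
    return -np.sum(p[nz] * np.log(p[nz])) / np.log(base)

X = 4
uniform = np.full(X, 1.0 / X)
skewed = np.array([0.7, 0.2, 0.05, 0.05])
point_mass = np.array([1.0, 0.0, 0.0, 0.0])

# Non-negativity and boundedness: 0 <= H(p) <= log2(X), maximum at the uniform distribution.
assert 0.0 <= entropy(skewed) <= np.log2(X)
assert np.isclose(entropy(uniform), np.log2(X))
assert np.isclose(entropy(point_mass), 0.0)

# Additivity for independent variables: H(p x q) = H(p) + H(q).
q = np.array([0.3, 0.7])
joint = np.outer(skewed, q).ravel()
assert np.isclose(entropy(joint), entropy(skewed) + entropy(q))
print("all entropy properties check out numerically")
```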
(4.2) Prove the relationships in Equation (4.10).
Using the product rule \( p(x, y) = p(x \mid y)\,p(y) \) (the definition of conditional probability), the joint entropy decomposes as
\[ H(X, Y) = -\sum_{x,y} p(x,y)\,\log p(x,y) = -\sum_{x,y} p(x,y)\,\bigl[\log p(x \mid y) + \log p(y)\bigr] = H(Y) + H(X|Y). \]
Combining this with the definition of mutual information \( I(X;Y) = H(X) - H(X|Y) \) gives
\[ I(X;Y) = H(X) + H(Y) - H(X, Y). \]
(4.3) Consider a binary channel that has a small probability ϵ of making a bit error.
(a) What is the probability of an error if a bit is sent independently three times and the value determined by majority voting?
Majority voting fails when at least two of the three transmitted copies are in error:
\[ P_{\text{error}} = \binom{3}{2}\,\epsilon^2 (1 - \epsilon) + \epsilon^3 = 3 \epsilon^2 (1 - \epsilon) + \epsilon^3 \;\approx\; 3\epsilon^2 \quad \text{for small } \epsilon. \]
(b) How about if that is done three times, and majority voting is done on the majority voting?
Each first-level vote behaves like a single bit with error probability \( P^{(1)} = 3\epsilon^2(1-\epsilon) + \epsilon^3 \), so applying the same formula again gives
\[ P^{(2)} = 3\bigl(P^{(1)}\bigr)^2\bigl(1 - P^{(1)}\bigr) + \bigl(P^{(1)}\bigr)^3 \;\approx\; 3\,(3\epsilon^2)^2 = 27\,\epsilon^4 \quad \text{for small } \epsilon, \]
at the cost of sending each bit \(3 \times 3 = 9\) times.
(c) If majority voting on majority voting on ... on majority voting is done N times, how many bits are needed, and what is the probability of an error? How does this probability depend on ϵ?
Nesting the vote \(N\) levels deep requires \(3^N\) transmissions per bit. The error probability obeys the recursion \( P^{(n+1)} = 3\bigl(P^{(n)}\bigr)^2\bigl(1 - P^{(n)}\bigr) + \bigl(P^{(n)}\bigr)^3 \approx 3\bigl(P^{(n)}\bigr)^2 \) for small errors, which iterates to
\[ P^{(N)} \;\approx\; \tfrac{1}{3}\,(3\epsilon)^{2^N}. \]
Provided \(\epsilon < 1/3\), the error probability therefore falls off doubly exponentially in \(N\), scaling as \(\epsilon^{2^N}\) up to constants; the price is the exponential (\(3^N\)) growth in the number of bits sent.
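A short sketch, assuming the recursion above, tabulates the exact error probability and the small-\(\epsilon\) estimate \((3\epsilon)^{2^N}/3\) level by level (the starting value \(\epsilon = 0.01\) is just an example):

```python
# Sketch of nested majority voting: iterate P_{n+1} = 3 P_n^2 (1 - P_n) + P_n^3
# and compare against the small-eps estimate (3*eps)^(2^n) / 3.
def majority_error(p):
    """Probability that a 3-way majority vote is wrong when each vote errs with prob. p."""
    return 3 * p**2 * (1 - p) + p**3

eps = 0.01                      # example raw bit-error probability
p, bits = eps, 1
for n in range(1, 5):
    p, bits = majority_error(p), 3 * bits
    approx = (3 * eps) ** (2 ** n) / 3
    print(f"level {n}: bits = {bits:4d}, P_error = {p:.3e}, estimate = {approx:.3e}")
```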
(4.4) Calculate the differential entropy of a Gaussian process
The differential entropy of a Gaussian \( p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \mu)^2 / 2\sigma^2} \) with variance \( \sigma^2 \) is
\[ H(X) = -\int p(x) \ln p(x)\, dx = \int p(x) \left[ \tfrac{1}{2} \ln\bigl(2\pi\sigma^2\bigr) + \frac{(x-\mu)^2}{2\sigma^2} \right] dx = \tfrac{1}{2} \ln\bigl(2\pi\sigma^2\bigr) + \tfrac{1}{2}, \]
since \( \int p(x)\,(x-\mu)^2\, dx = \sigma^2 \). Hence
\[ H(X) = \frac{1}{2} \log (2\pi e \sigma^2). \]
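A minimal Monte Carlo sketch (with an arbitrary \(\sigma = 1.7\)) compares the sample average of \(-\ln p(x)\) against the closed form:

```python
# Illustrative Monte Carlo check of H = 0.5 * ln(2*pi*e*sigma^2) in nats.
import numpy as np

sigma = 1.7
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=sigma, size=1_000_000)

# Estimate E[-ln p(x)] by averaging the negative log-density over the samples.
log_p = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
h_mc = -log_p.mean()
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

print(f"Monte Carlo: {h_mc:.4f} nats, closed form: {h_closed:.4f} nats")
```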
(4.5) A standard telephone line is specified to have a bandwidth of 3300 Hz and an SNR of 20 dB.
(a) What is the capacity?
The Shannon-Hartley theorem states:
\[ C = B \log_2(1 + \text{SNR}) \]
Given:
- Bandwidth \( B = 3300 \) Hz
- Signal-to-noise ratio: 20 dB corresponds to a power ratio of \( \text{SNR} = 10^{20/10} = 100 \)
Calculation:
\[ C = 3300 \times \log_2(101) \]
\[ C \approx 3300 \times 6.6582 \]
\[ C \approx 21,972 \text{ bits per second} \]
Answer: The channel capacity is approximately 22 kbps.
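For reference, a few lines of Python reproduce the part (a) numbers (the variable names are mine, not from the text):

```python
# Shannon-Hartley capacity of a 3300 Hz channel at 20 dB SNR.
import math

B = 3300.0                     # bandwidth in Hz
snr_db = 20.0
snr = 10 ** (snr_db / 10)      # 20 dB -> power ratio of 100
C = B * math.log2(1 + snr)     # capacity in bits/s

print(f"SNR = {snr:.0f}, C = {C:.1f} bit/s")   # roughly 21,972 bit/s, i.e. about 22 kbit/s
```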
(b) What SNR would be necessary for the capacity to be 1 Gbit/s?
To achieve a capacity of 1 Gbps with \( B = 3300 \) Hz:
\[ 1 \times 10^9 = 3300 \times \log_2(1 + \text{SNR}) \]
Rearrange for \( \text{SNR} \):
\[ \log_2(1 + \text{SNR}) = \frac{1 \times 10^9}{3300} \]
\[ \log_2(1 + \text{SNR}) \approx 303,030.3 \]
Converting from base 2 logarithm to natural logarithm:
\[ \ln(1 + \text{SNR}) = 303,030.3 \times \ln(2) \]
\[ \ln(1 + \text{SNR}) \approx 303,030.3 \times 0.693147 \]
\[ \ln(1 + \text{SNR}) \approx 210,045 \]
Solving for \( \text{SNR} \):
\[ 1 + \text{SNR} = e^{210,045} \approx 10^{91,221} \]
Conclusion: The required \( \text{SNR} \) is roughly \( 10^{91,221} \), or about 912,000 dB: astronomically high, making it infeasible to achieve 1 Gbit/s over this bandwidth.
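Because \( 2^{303,030} \) overflows any floating-point type, a quick sketch works entirely in the log domain (variable names are illustrative):

```python
# Required SNR for C = 1 Gbit/s over a 3300 Hz channel, computed in the log domain.
import math

B = 3300.0
C_target = 1e9                              # 1 Gbit/s

log2_snr = C_target / B                     # log2(1 + SNR), and log2(SNR) to excellent approximation
snr_db = 10 * log2_snr * math.log10(2)      # 10 * log10(SNR), in dB

print(f"log2(1 + SNR) ≈ {log2_snr:,.0f}")
print(f"required SNR ≈ 10^{log2_snr * math.log10(2):,.0f}, i.e. about {snr_db:,.0f} dB")
```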
(4.6) Let \( (x_1, x_2, \ldots, x_n) \) be drawn from a Gaussian distribution with variance \( \sigma^2 \) and unknown mean value \( x_0 \). Show that \( f(x_1, \ldots, x_n) = n^{-1} \sum_{i=1}^{n} x_i \) is an estimator for \( x_0 \) that is unbiased and achieves the Cramér–Rao lower bound.
The estimator \( \hat{x} = n^{-1} \sum_{i=1}^{n} x_i \) is unbiased because \( \langle \hat{x} \rangle = n^{-1} \sum_{i=1}^{n} \langle x_i \rangle = x_0 \). Since the \( x_i \) are independent, its variance is
\[ \operatorname{Var}(\hat{x}) = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(x_i) = \frac{\sigma^2}{n}. \]
The Cramér–Rao bound states that any unbiased estimator satisfies
\[ \operatorname{Var}(\hat{x}) \;\ge\; \frac{1}{J(x_0)}, \]
where the Fisher information for \( n \) independent Gaussian samples with known variance is
\[ J(x_0) = -\,\Bigl\langle \frac{\partial^2}{\partial x_0^2} \ln p(x_1, \ldots, x_n \mid x_0) \Bigr\rangle = \frac{n}{\sigma^2}. \]
Since \( \operatorname{Var}(\hat{x}) = \sigma^2 / n = 1 / J(x_0) \), the sample mean attains the bound.
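A hypothetical Monte Carlo check (all parameter values are arbitrary) that the sample mean is unbiased and that its variance sits at the bound \( \sigma^2 / n \):

```python
# Illustrative Monte Carlo check that the sample mean is unbiased and that its
# variance matches the Cramér–Rao bound sigma^2 / n for a Gaussian with known variance.
import numpy as np

x0, sigma, n, trials = 2.5, 1.3, 50, 200_000
rng = np.random.default_rng(1)

samples = rng.normal(loc=x0, scale=sigma, size=(trials, n))
estimates = samples.mean(axis=1)            # f(x_1, ..., x_n) = (1/n) sum x_i

crlb = sigma**2 / n                         # 1 / J(x0), with J(x0) = n / sigma^2
print(f"mean of estimator: {estimates.mean():.4f}  (true x0 = {x0})")
print(f"variance of estimator: {estimates.var():.5f}  vs CRLB sigma^2/n = {crlb:.5f}")
```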