# 23.1. Random Vectors

A vector valued random variable, or more simply, a random vector, is a list of random variables defined on the same space. We will think of it as a column.

$\begin{split} \mathbf{X} ~ = ~ \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} \end{split}$

For ease of display, we will sometimes write $$\mathbf{X} = [X_1 ~ X_2 ~ \ldots ~ X_n]^T$$ where $$\mathbf{M}^T$$ is notation for the transpose of the matrix $$\mathbf{M}$$.

The mean vector of $$\mathbf{X}$$ is $$\boldsymbol{\mu} = [\mu_1 ~ \mu_2 ~ \ldots ~ \mu_n]^T$$ where $$\mu_i = E(X_i)$$.

The covariance matrix of $$\mathbf{X}$$ is the $$n \times n$$ matrix $$\boldsymbol{\Sigma}$$ whose $$(i, j)$$ element is $$Cov(X_i, X_j)$$.

The $$i$$th diagonal element of $$\boldsymbol{\Sigma}$$ is the variance of $$X_i$$. The matrix is symmetric because of the symmetry of covariance.
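As a small illustration of these definitions, the sketch below computes $$\boldsymbol{\mu}$$ and $$\boldsymbol{\Sigma}$$ exactly for a random vector with finitely many possible values. The two-coin example, the dictionary `joint`, and the helper names `mean_vector` and `covariance_matrix` are assumptions made up for this illustration; they are not part of the text above.

```python
from fractions import Fraction

# Hypothetical example: toss a fair coin twice.
# X_1 = number of heads on the first toss, X_2 = total number of heads.
# Keys are the possible values (x_1, x_2) of the random vector; values are probabilities.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

def mean_vector(dist):
    """Return [E(X_1), ..., E(X_n)] for a finite joint distribution."""
    n = len(next(iter(dist)))
    return [sum(p * x[i] for x, p in dist.items()) for i in range(n)]

def covariance_matrix(dist):
    """Return the n x n matrix whose (i, j) element is Cov(X_i, X_j)."""
    mu = mean_vector(dist)
    n = len(mu)
    return [
        [sum(p * (x[i] - mu[i]) * (x[j] - mu[j]) for x, p in dist.items())
         for j in range(n)]
        for i in range(n)
    ]

mu = mean_vector(joint)
sigma = covariance_matrix(joint)

print(mu)      # mean vector
print(sigma)   # covariance matrix; diagonal entries are Var(X_1) and Var(X_2)
```

In this example the off-diagonal entries agree, as symmetry of covariance requires, and the diagonal entries are the two variances.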

## 23.1.1. Linear Transformation: Mean Vector

Let $$\mathbf{A}$$ be an $$m \times n$$ numerical matrix and $$\mathbf{b}$$ an $$m \times 1$$ numerical vector. Consider the $$m \times 1$$ random vector $$\mathbf{Y} = \mathbf{AX} + \mathbf{b}$$. Then the $$i$$th element of $$\mathbf{Y}$$ is

$Y_i ~ = ~ \mathbf{A}_{i*}\mathbf{X} + \mathbf{b}(i)$

where $$\mathbf{A}_{i*}$$ denotes the $$i$$th row of $$\mathbf{A}$$ and $$\mathbf{b}(i)$$ denotes the $$i$$th element of $$\mathbf{b}$$. Written longhand,

$Y_i ~ = ~ a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{in}X_n + b_i$

where $$a_{ij}$$ is the $$(i, j)$$ entry of $$\mathbf{A}$$ and $$b_i = \mathbf{b}(i)$$.

Thus $$Y_i$$ is a linear combination of the elements of $$\mathbf{X}$$. Therefore by linearity of expectation,

$E(Y_i) ~ = ~ \mathbf{A}_{i*} \boldsymbol{\mu} + \mathbf{b}(i)$

Let $$\boldsymbol{\mu}_\mathbf{Y}$$ be the mean vector of $$\mathbf{Y}$$. Then by the calculation above,

$\boldsymbol{\mu}_\mathbf{Y} ~ = ~ \mathbf{A} \boldsymbol{\mu} + \mathbf{b}$
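The formula can be checked on a small case. The sketch below finds the distribution of $$\mathbf{Y} = \mathbf{AX} + \mathbf{b}$$ by pushing each possible value of $$\mathbf{X}$$ through the transformation, computes the mean vector of $$\mathbf{Y}$$ directly, and compares it with $$\mathbf{A}\boldsymbol{\mu} + \mathbf{b}$$. The two-coin distribution, the particular `A` and `b`, and the helper names are all hypothetical choices made for illustration.

```python
from fractions import Fraction

# Hypothetical example: toss a fair coin twice.
# X_1 = number of heads on the first toss, X_2 = total number of heads.
joint_x = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

A = [[1, 1], [1, -1], [2, 0]]   # 3 x 2 numerical matrix (arbitrary choice)
b = [0, 1, -1]                  # 3 x 1 numerical vector (arbitrary choice)

def affine(A, x, b):
    """Return Ax + b, with A stored as a list of rows."""
    return tuple(
        sum(a_ik * x_k for a_ik, x_k in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    )

def mean_vector(dist):
    """Return the list of expectations of the coordinates."""
    n = len(next(iter(dist)))
    return [sum(p * v[i] for v, p in dist.items()) for i in range(n)]

# Distribution of Y: accumulate probability in case two values of X map to the same y.
joint_y = {}
for x, p in joint_x.items():
    y = affine(A, x, b)
    joint_y[y] = joint_y.get(y, 0) + p

mu_x = mean_vector(joint_x)
mu_y_direct = mean_vector(joint_y)        # E(Y_i) from the distribution of Y
mu_y_formula = list(affine(A, mu_x, b))   # A mu + b

print(mu_y_direct == mu_y_formula)
```

Exact fractions are used so that the two answers can be compared with `==` rather than up to rounding error.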

## 23.1.2. Linear Transformation: Covariance Matrix

Adding the constants $$b_i$$ and $$b_j$$ does not affect covariance, so $$Cov(Y_i, Y_j)$$ can be calculated using bilinearity of covariance.

\begin{split} \begin{align*} Cov(Y_i, Y_j) ~ &= ~ Cov(\mathbf{A}_{i*}\mathbf{X}, \mathbf{A}_{j*}\mathbf{X}) \\ &= ~ Cov\big{(} \sum_{k=1}^n a_{ik}X_k, \sum_{l=1}^n a_{jl}X_l \big{)} \\ &= ~ \sum_{k=1}^n\sum_{l=1}^n a_{ik}a_{jl}Cov(X_k, X_l) \\ &= ~ \sum_{k=1}^n\sum_{l=1}^n a_{ik}Cov(X_k, X_l)t_{lj} ~~~~~ \text{where } t_{lj} = \mathbf{A}^T(l, j) \\ \end{align*} \end{split}

This is the $$(i, j)$$ element of $$\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$$. So if $$\boldsymbol{\Sigma}_\mathbf{Y}$$ denotes the covariance matrix of $$\mathbf{Y}$$, then

$\boldsymbol{\Sigma}_\mathbf{Y} ~ = ~ \mathbf{A} \boldsymbol{\Sigma} \mathbf{A}^T$
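As a sanity check on this formula, the sketch below computes the covariance matrix of $$\mathbf{Y} = \mathbf{AX} + \mathbf{b}$$ in two ways: directly from the distribution of $$\mathbf{Y}$$, and as the matrix product $$\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$$. The two-coin distribution, the particular `A` and `b`, and the helper names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical example: toss a fair coin twice.
# X_1 = number of heads on the first toss, X_2 = total number of heads.
joint_x = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

A = [[1, 1], [1, -1], [2, 0]]   # 3 x 2 numerical matrix (arbitrary choice)
b = [0, 1, -1]                  # 3 x 1 numerical vector (arbitrary choice)

def affine(A, x, b):
    """Return Ax + b, with A stored as a list of rows."""
    return tuple(
        sum(a_ik * x_k for a_ik, x_k in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    )

def covariance_matrix(dist):
    """Return the matrix of covariances of the coordinates."""
    n = len(next(iter(dist)))
    mu = [sum(p * v[i] for v, p in dist.items()) for i in range(n)]
    return [
        [sum(p * (v[i] - mu[i]) * (v[j] - mu[j]) for v, p in dist.items())
         for j in range(n)]
        for i in range(n)
    ]

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Distribution of Y = AX + b.
joint_y = {}
for x, p in joint_x.items():
    y = affine(A, x, b)
    joint_y[y] = joint_y.get(y, 0) + p

sigma_x = covariance_matrix(joint_x)
sigma_y_direct = covariance_matrix(joint_y)                   # from the distribution of Y
sigma_y_formula = matmul(matmul(A, sigma_x), transpose(A))    # A Sigma A^T

print(sigma_y_direct == sigma_y_formula)
```

Note that `b` plays no role in `sigma_y_formula`: shifting by constants changes the mean vector but not the covariance matrix.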

## 23.1.3. Constraints on $$\boldsymbol{\Sigma}$$

We know that $$\boldsymbol{\Sigma}$$ has to be symmetric and that all the elements on its main diagonal must be non-negative. Also, no matter what $$\mathbf{A}$$ is, the diagonal elements of $$\boldsymbol{\Sigma}_\mathbf{Y}$$ must all be non-negative as they are the variances of the elements of $$\mathbf{Y}$$. Taking $$\mathbf{A}$$ to be a single row $$\mathbf{a}$$ in the formula for $$\boldsymbol{\Sigma}_\mathbf{Y}$$, this means

$\mathbf{a} \boldsymbol{\Sigma} \mathbf{a}^T ~ \ge ~ 0 ~~~~ \text{for all } 1\times n \text{ vectors } \mathbf{a}$

which is the same as saying

$\mathbf{a}^T \boldsymbol{\Sigma} \mathbf{a} ~ \ge ~ 0 ~~~~ \text{for all } n\times 1 \text{ vectors } \mathbf{a}$

because $$\mathbf{a} \boldsymbol{\Sigma} \mathbf{a}^T$$ is a scalar and therefore the same as its transpose.

That is, $$\boldsymbol{\Sigma}$$ must be positive semidefinite. Usually, we will be working with positive definite covariance matrices. This is because if $$\mathbf{a}^T \boldsymbol{\Sigma} \mathbf{a} = 0$$ for some nonzero $$\mathbf{a}$$, then the linear combination $$\mathbf{a}^T\mathbf{X}$$ has variance 0 and is therefore constant with probability 1. In that case you can write some of the elements of $$\mathbf{X}$$ as linear functions of the others and just study a reduced set of elements.
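The difference between positive definite and merely positive semidefinite can be seen in a small numerical sketch. Below, `sigma_x` is a hypothetical $$2 \times 2$$ covariance matrix, and `sigma_y` is the covariance matrix of $$\mathbf{Y} = \mathbf{AX}$$ for an arbitrarily chosen $$3 \times 2$$ matrix `A`. Since the three elements of $$\mathbf{Y}$$ are built from only two random variables, they are linearly related. The helper names and the grid of test vectors are assumptions for illustration, and the grid is a spot check, not a proof.

```python
from fractions import Fraction
from itertools import product

def quadratic_form(a, sigma):
    """Return a^T Sigma a, the variance of a_1 X_1 + ... + a_n X_n."""
    n = len(a)
    return sum(a[i] * sigma[i][j] * a[j] for i in range(n) for j in range(n))

def congruence(A, sigma):
    """Return A Sigma A^T, with matrices stored as lists of rows."""
    n = len(sigma)
    m = len(A)
    return [
        [sum(A[i][k] * sigma[k][l] * A[j][l] for k in range(n) for l in range(n))
         for j in range(m)]
        for i in range(m)
    ]

# Hypothetical covariance matrix of a random vector X with two elements.
sigma_x = [[Fraction(1, 4), Fraction(1, 4)],
           [Fraction(1, 4), Fraction(1, 2)]]

# Y = AX has elements X_1 + X_2, X_1 - X_2, and 2 X_1, so Y_1 + Y_2 - Y_3 = 0.
A = [[1, 1], [1, -1], [2, 0]]
sigma_y = congruence(A, sigma_x)

# Nonzero integer vectors a with entries between -2 and 2.
grid_x = [a for a in product(range(-2, 3), repeat=2) if any(a)]
grid_y = [a for a in product(range(-2, 3), repeat=3) if any(a)]

min_x = min(quadratic_form(a, sigma_x) for a in grid_x)
min_y = min(quadratic_form(a, sigma_y) for a in grid_y)
zero_direction = quadratic_form((1, 1, -1), sigma_y)   # Var(Y_1 + Y_2 - Y_3)

print(min_x > 0, min_y == 0, zero_direction == 0)
```

On this grid the quadratic form for `sigma_x` stays strictly positive, while for `sigma_y` it reaches 0 at $$\mathbf{a} = [1 ~ 1 ~ {-1}]^T$$, which is exactly the linear combination of the elements of $$\mathbf{Y}$$ that is constant.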