Random Vectors
A vector-valued random variable, or more simply, a random vector, is a list of random variables defined on the same probability space. We will think of it as a column vector.
For ease of display, we will sometimes write $\mathbf{X} = [X_1 ~ X_2 ~ \ldots ~ X_n]^T$ where $\mathbf{M}^T$ is notation for the transpose of the matrix $\mathbf{M}$.
The mean vector of $\mathbf{X}$ is $\boldsymbol{\mu} = [\mu_1 ~ \mu_2 ~ \ldots ~ \mu_n]^T$ where $\mu_i = E(X_i)$.
The covariance matrix of $\mathbf{X}$ is the $n \times n$ matrix $\boldsymbol{\Sigma}$ whose $(i, j)$ element is $Cov(X_i, X_j)$.
The $i$th diagonal element of $\boldsymbol{\Sigma}$ is the variance of $X_i$. The matrix is symmetric because covariance is symmetric: $Cov(X_i, X_j) = Cov(X_j, X_i)$.
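As a quick numerical illustration of these definitions, here is a minimal simulation sketch in NumPy; the mixing matrix used to generate correlated coordinates is a made-up example, not something from the text. The estimated covariance matrix comes out symmetric with non-negative diagonal entries, as the definitions require.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: 1000 observations of a 3-dimensional random vector X,
# one observation per row. The mixing matrix below is arbitrary and only
# serves to make the coordinates of X correlated.
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.0]])
x = rng.normal(size=(1000, 3)) @ mix

mu_hat = x.mean(axis=0)               # empirical mean vector
sigma_hat = np.cov(x, rowvar=False)   # empirical covariance matrix

print(mu_hat)      # close to the zero vector
print(sigma_hat)   # symmetric, with non-negative diagonal entries
```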
Linear Transformation: Mean Vector
Let $\mathbf{A}$ be an $m \times n$ numerical matrix and $\mathbf{b}$ an $m \times 1$ numerical vector. Consider the $m \times 1$ random vector $\mathbf{Y} = \mathbf{AX} + \mathbf{b}$. Then the $i$th element of $\mathbf{Y}$ is

$$
Y_i ~ = ~ \mathbf{A}_{i*}\mathbf{X} + \mathbf{b}(i)
$$
where $\mathbf{A}_{i*}$ denotes the $i$th row of $\mathbf{A}$ and $\mathbf{b}(i)$ denotes the $i$th element of $\mathbf{b}$. Written longhand,

$$
Y_i ~ = ~ a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{in}X_n + b_i
$$
where $a_{ij}$ is the $(i, j)$ entry of $\mathbf{A}$ and $b_i = \mathbf{b}(i)$.
Thus $Y_i$ is a linear combination of the elements of $\mathbf{X}$. Therefore by linearity of expectation,

$$
E(Y_i) ~ = ~ a_{i1}\mu_1 + a_{i2}\mu_2 + \cdots + a_{in}\mu_n + b_i ~ = ~ \mathbf{A}_{i*}\boldsymbol{\mu} + \mathbf{b}(i)
$$
Let $\boldsymbol{\mu}_\mathbf{Y}$ be the mean vector of $\mathbf{Y}$. Then by the calculation above,

$$
\boldsymbol{\mu}_\mathbf{Y} ~ = ~ \mathbf{A}\boldsymbol{\mu} + \mathbf{b}
$$
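Here is a small simulation sketch of this result; the particular $\boldsymbol{\mu}$, $\mathbf{A}$, and $\mathbf{b}$ below are made-up values chosen only for illustration. The empirical mean of the simulated $\mathbf{Y}$ should be close to $\mathbf{A}\boldsymbol{\mu} + \mathbf{b}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-dimensional X with a known mean vector mu.
mu = np.array([1.0, 2.0, 3.0])
x = mu + rng.normal(size=(100_000, 3))   # each row is one realization of X

# A is 2 x 3 and b is 2 x 1, so Y = AX + b is 2-dimensional.
A = np.array([[1.0, -1.0, 0.0],
              [2.0,  0.5, 1.0]])
b = np.array([10.0, -5.0])

y = x @ A.T + b                  # each row is one realization of Y

print(y.mean(axis=0))            # empirical mean vector of Y
print(A @ mu + b)                # A mu + b, should be close to the line above
```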
Linear Transformation: Covariance Matrix
$Cov(Y_i, Y_j)$ can be calculated using bilinearity of covariance; the additive constants $\mathbf{b}(i)$ and $\mathbf{b}(j)$ have no effect.

$$
Cov(Y_i, Y_j) ~ = ~ Cov(\mathbf{A}_{i*}\mathbf{X}, \mathbf{A}_{j*}\mathbf{X}) ~ = ~ \sum_{k=1}^n \sum_{l=1}^n a_{ik}a_{jl}Cov(X_k, X_l) ~ = ~ \mathbf{A}_{i*} \boldsymbol{\Sigma} (\mathbf{A}_{j*})^T
$$
This is the $(i, j)$ element of $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$. So if $\boldsymbol{\Sigma}_\mathbf{Y}$ denotes the covariance matrix of $\mathbf{Y}$, then

$$
\boldsymbol{\Sigma}_\mathbf{Y} ~ = ~ \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T
$$
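This identity can likewise be checked by simulation; again, the numerical values of $\boldsymbol{\Sigma}$ and $\mathbf{A}$ below are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical covariance matrix Sigma for a 3-dimensional X.
sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

x = rng.multivariate_normal(mean=np.zeros(3), cov=sigma, size=200_000)

A = np.array([[1.0, -1.0, 0.0],
              [2.0,  0.5, 1.0]])

y = x @ A.T                        # rows are realizations of Y = AX

print(np.cov(y, rowvar=False))     # empirical covariance matrix of Y
print(A @ sigma @ A.T)             # A Sigma A^T, should be close to the above
```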
Constraints on $\boldsymbol{\Sigma}$
We know that $\boldsymbol{\Sigma}$ has to be symmetric and that all the elements on its main diagonal must be non-negative. Also, no matter what $\mathbf{A}$ is, the diagonal elements of $\boldsymbol{\Sigma}_\mathbf{Y}$ must all be non-negative, as they are the variances of the elements of $\mathbf{Y}$. By the formula for $\boldsymbol{\Sigma}_\mathbf{Y}$, this means

$$
\mathbf{a} \boldsymbol{\Sigma} \mathbf{a}^T ~ \ge ~ 0 ~~~ \text{for all } 1 \times n \text{ vectors } \mathbf{a}
$$
which is the same as saying

$$
\mathbf{a}^T \boldsymbol{\Sigma} \mathbf{a} ~ \ge ~ 0 ~~~ \text{for all } n \times 1 \text{ vectors } \mathbf{a}
$$
because $\mathbf{a} \boldsymbol{\Sigma} \mathbf{a}^T$ is a scalar and therefore the same as its transpose.
That is, $\boldsymbol{\Sigma}$ must be positive semidefinite. Usually, we will be working with positive definite covariance matrices, because if $\mathbf{a}^T \boldsymbol{\Sigma} \mathbf{a} = 0$ for some non-zero $\mathbf{a}$, then the linear combination $\mathbf{a}^T \mathbf{X}$ has variance $0$ and hence is constant. You can then write some of the elements of $\mathbf{X}$ as linear combinations of the others and just study a reduced set of elements.
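For completeness, here is one way to check positive semidefiniteness numerically, a minimal sketch using the fact that a symmetric matrix is positive semidefinite exactly when all of its eigenvalues are non-negative. The matrices below are made-up examples.

```python
import numpy as np

def is_psd(sigma, tol=1e-10):
    # A symmetric matrix is positive semidefinite exactly when all of its
    # eigenvalues are non-negative; eigvalsh is meant for symmetric matrices.
    eigenvalues = np.linalg.eigvalsh(sigma)
    return bool(np.all(eigenvalues >= -tol))

sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(is_psd(sigma))    # True: a valid covariance matrix

# Symmetric with non-negative diagonal, yet not a covariance matrix:
# for a = [1, -1], a Sigma a^T = -2 < 0.
bad = np.array([[1.0, 2.0],
                [2.0, 1.0]])
print(is_psd(bad))      # False
```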