随机变量(RANDOM VARIABLES)

Oct 15, 2021

RANDOM VARIABLES:

A random variable is a real-valued function defined on the sample space:

$$ X:\Omega\to\mathbb{R} $$

For a random variable $X$, the function $F$ defined by

$$ F(x)=P\{X \leq x\} \qquad -\infty < x < \infty $$

is called the cumulative distribution function, or, more simply, the distribution function of $X$. Thus, the distribution function specifies, for all real values $x$, the probability that the random variable is less than or equal to $x$.

DISCRETE RANDOM VARIABLES

A random variable that can take on at most a countable number of possible values is said to be discrete. For a discrete random variable $X$, we define the probability mass function $p(a)$ of $X$ by

$$ p(a)=P\{X=a\} $$

The probability mass function $p(a)$ is positive for at most a countable number of values of $a$. That is, if $X$ must assume one of the values $x_1, x_2, \dots,$ then

$$ p(x_i) \geq 0 \quad \text{for } i = 1, 2, \dots \qquad\qquad p(x) = 0 \quad \text{for all other values of } x $$

Since $X$ must take on one of the values $x_i$, we have

$$ \sum_{i=1}^{\infty}p(x_i)=1 $$

EXPECTED VALUE

If $X$ is a discrete random variable having a probability mass function $p(x)$, then the expectation, or the expected value, of $X$, denoted by $\mathbb{E}[X]$, is defined by

$$ \mathbb{E}[X]=\sum_{x:p(x)>0}xp(x) $$

In words, the expected value of $X$ is a weighted average of the possible values that $X$ can take on, each value being weighted by the probability that $X$ assumes it.
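As a concrete illustration (a fair six-sided die, a hypothetical example not from the text), the expected value can be computed directly from the definition:

```python
from fractions import Fraction

# pmf of a fair six-sided die: p(x) = 1/6 for x = 1, ..., 6
p = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(p.values()) == 1  # the p(x_i) sum to 1

# E[X] = sum over {x : p(x) > 0} of x * p(x)
mean = sum(x * q for x, q in p.items())
print(mean)  # 7/2
```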

Remark. The expected value of $X$ need not be a value that $X$ can actually assume; it is the long-run average of the values observed over many independent repetitions of the experiment.

The probability concept of expectation is analogous to the physical concept of the center of gravity of a distribution of mass. Consider a discrete random variable $X$ having probability mass function $p(x_i),i\geq1$. If we now imagine a weightless rod in which weights with mass $p(x_i),i\geq1$, are located at the points $x_i,i\geq1$ , then the point at which the rod would be in balance is known as the center of gravity. For those readers acquainted with elementary statics, it is now a simple matter to show that this point is at $\mathbb{E}[X]$.

EXPECTATION OF A FUNCTION OF A RANDOM VARIABLE

Suppose $g$ is a function of a random variable,

$$ X:\Omega\to\Bbb{R} \\ g:\Bbb{R}\to\Bbb{R} $$

so that the composition $g \circ X:\Omega\to\Bbb{R}$ is itself a random variable.

Proposition 1.

If $X$ is a discrete random variable that takes on one of the values $x_i, i\geq1$, with respective probabilities $p(x_i)$, then, for any real-valued function $g$,

$$ \mathbb{E}[g(X)]=\sum_{i}g(x_i)p(x_i) $$
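Continuing the fair-die example (hypothetical), Proposition 1 lets us compute $\mathbb{E}[g(X)]$ without first finding the distribution of $g(X)$; here with $g(x) = x^2$:

```python
from fractions import Fraction

p = {x: Fraction(1, 6) for x in range(1, 7)}  # fair-die pmf

# Proposition 1: E[g(X)] = sum_i g(x_i) p(x_i), with g(x) = x^2
def g(x):
    return x**2

eg = sum(g(x) * q for x, q in p.items())
print(eg)  # 91/6
```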


Corollary 1.

If $a$ and $b$ are constants, then

$$ \mathbb{E}[aX+b]=a\mathbb{E}[X]+b $$

Moment

The expected value of a random variable $X$, $\mathbb{E}[X]$, is also referred to as the mean or the first moment of $X$.

The quantity $\mathbb{E}[X^n]$, $n \geq 1$, is called the $n$th moment of $X$:

$$ \mathbb{E}[X^n]=\sum_{x:p(x)>0}x^np(x) $$

VARIANCE

Because we expect $X$ to take on values around its mean $\mathbb{E}[X]$, a reasonable way of measuring the possible variation of $X$ is to look at how far apart $X$ is from its mean, on the average. One possible measure is the quantity $\mathbb{E}[|X-\mu|]$, where $\mu = \mathbb{E}[X]$. It is mathematically more convenient, however, to work with the squared deviation, which leads to the variance:

$$ Var(X)=\mathbb{E}[(X-\mu)^2] $$

The variance of $X$ is equal to the expected value of $X^2$ minus the square of its expected value.

$$ Var(X)=\mathbb{E}[X^2]-(\mathbb{E}[X])^2 $$
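Both expressions for the variance can be checked against each other on the fair-die example (hypothetical):

```python
from fractions import Fraction

p = {x: Fraction(1, 6) for x in range(1, 7)}  # fair-die pmf
mu = sum(x * q for x, q in p.items())         # E[X] = 7/2

# Definition: Var(X) = E[(X - mu)^2]
var_def = sum((x - mu)**2 * q for x, q in p.items())

# Identity: Var(X) = E[X^2] - (E[X])^2
var_id = sum(x**2 * q for x, q in p.items()) - mu**2

assert var_def == var_id == Fraction(35, 12)
```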

If $a$ and $b$ are constants, then

$$ Var(aX+b)=a^2Var(X) $$

Remarks.

(a) Analogous to the mean being the center of gravity of a distribution of mass, the variance represents, in the terminology of mechanics, the moment of inertia.

(b) The square root of the $Var(X)$ is called the standard deviation of $X$, and we denote it by $SD(X)$. That is,

$$ SD(X) = \sqrt{Var(X)} $$

THE BERNOULLI AND BINOMIAL RANDOM VARIABLES

Suppose that a trial, whose outcome can be classified as either a success ($X = 1$) or a failure ($X = 0$), is performed.

The probability mass function of $X$ is given by

$$ p(0)=P\{X=0\}=1-p \\ p(1)=P\{X=1\}=p $$

where $p$, $0 \leq p \leq 1$, is the probability that the trial is a success.

The random variable $X$ is said to be a Bernoulli random variable.

$$ X\sim Ber(p) $$

means that $X$ is a Bernoulli random variable with parameter $p$.

Suppose now that $n$ independent trials, each of which results in a success with probability $p$ and in a failure with probability $1 - p$, are to be performed.

If $X$ represents the number of successes that occur in the $n$ trials, then $X$ is said to be a binomial random variable with parameters $(n, p)$.

Thus, a Bernoulli random variable is just a binomial random variable with parameters $(1, p)$.

The probability mass function of a binomial random variable having parameters $(n, p)$ is given by

$$ p(i)=\binom{n}{i}p^{i}(1-p)^{n-i} \qquad i=0,1,\dots,n $$

Note that, by the binomial theorem, the probabilities sum to $1$; that is,

$$ \sum_{i=0}^{n}{p(i)}=\sum_{i=0}^{n}{\binom{n}{i}}{p^i}{(1-p)^{n-i}}=[p+(1-p)]^n=1 $$
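A short numerical check of the binomial pmf (with illustrative parameters $n = 10$, $p = 0.3$, chosen here for the example):

```python
from math import comb

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial random variable with parameters (n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 10, 0.3
probs = [binomial_pmf(i, n, p) for i in range(n + 1)]
assert abs(sum(probs) - 1.0) < 1e-12  # binomial theorem: the p(i) sum to 1
```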

Properties of Binomial Random Variables

If $X$ is a binomial random variable with parameters $n$ and $p$, then

$$ \begin{aligned} \mathbb{E}[X]&=np \\ Var(X)&=np(1-p) \end{aligned} $$

The following proposition details how the binomial probability mass function first increases and then decreases.

Proposition 2.

If $X$ is a binomial random variable with parameters $(n, p)$, where $0 < p < 1$, then as $k$ goes from $0$ to $n$, $P\{X = k\}$ first increases monotonically and then decreases monotonically, reaching its largest value when $k$ is the largest integer less than or equal to $(n + 1)p$.

Computing the Binomial Distribution Function

Suppose that $X$ is binomial with parameters $(n, p)$. The key to computing its distribution function

$$ P\{X\leq{i}\}=\sum_{k=0}^{i}\binom{n}{k}p^{k}(1-p)^{n-k} $$

is to utilize the following relationship between $P\{X = k + 1\}$ and $P\{X = k\}$, which was established in the proof of Proposition 2:

$$ P\{X=k+1\}=\frac{p}{1-p}\frac{n-k}{k+1}P\{X=k\} $$
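The recursion gives an efficient way to accumulate the distribution function without recomputing binomial coefficients from scratch; a sketch (parameters are illustrative):

```python
from math import comb

def binomial_cdf(i, n, p):
    """P{X <= i} using P{X = k+1} = p/(1-p) * (n-k)/(k+1) * P{X = k}."""
    pk = (1 - p)**n   # start from P{X = 0}
    total = pk
    for k in range(i):
        pk *= (p / (1 - p)) * (n - k) / (k + 1)
        total += pk
    return total

# Cross-check against the direct sum of binomial pmf values.
direct = sum(comb(10, k) * 0.3**k * 0.7**(10 - k) for k in range(4))
assert abs(binomial_cdf(3, 10, 0.3) - direct) < 1e-12
```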

THE POISSON RANDOM VARIABLE

A random variable $X$ that takes on one of the values $0, 1, 2, \dots$ is said to be a Poisson random variable with parameter $\lambda$ if, for some $\lambda > 0$,

$$ p(i)=P\{X=i\}=e^{-\lambda}\frac{\lambda^i}{i!} \qquad i=0,1,2,\dots $$

This defines a probability mass function, since

$$ \sum_{i=0}^{\infty}p(i)=e^{-\lambda}\sum_{i=0}^{\infty}\frac{\lambda^i}{i!}=e^{-\lambda}e^{\lambda}=1 $$

The Poisson random variable may be used as an approximation for a binomial random variable with parameters $(n, p)$ when $n$ is large and $p$ is small enough so that $np$ is of moderate size.
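The quality of the approximation can be seen numerically; the parameters $n = 100$, $p = 0.02$ (so $\lambda = np = 2$) are illustrative values, not from the text:

```python
from math import comb, exp, factorial

n, p = 100, 0.02
lam = n * p  # lambda = np = 2

for i in range(5):
    binom = comb(n, i) * p**i * (1 - p)**(n - i)  # exact binomial pmf
    pois = exp(-lam) * lam**i / factorial(i)       # Poisson approximation
    print(i, round(binom, 4), round(pois, 4))      # the two columns agree closely
```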

Expected value and Variance

The expected value and variance of a Poisson random variable are both equal to its parameter $\lambda$.

$$ \begin{aligned} \mathbb{E}[X]&=\lambda \\ Var(X)&=\lambda \end{aligned} $$
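A truncated-sum check that the mean and variance both equal $\lambda$ (the value $\lambda = 3.5$ is illustrative):

```python
from math import exp, factorial

lam = 3.5
pmf = lambda i: exp(-lam) * lam**i / factorial(i)

# Truncate the sums far into the tail; the remainder is negligible for lam = 3.5.
mean = sum(i * pmf(i) for i in range(100))
second = sum(i**2 * pmf(i) for i in range(100))

assert abs(mean - lam) < 1e-9            # E[X] = lambda
assert abs(second - mean**2 - lam) < 1e-9  # Var(X) = E[X^2] - (E[X])^2 = lambda
```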

In fact, the Poisson approximation remains good even when the trials are not independent, provided that their dependence is weak.

Poisson Paradigm.

Consider $n$ events, with $p_i$ equal to the probability that event $i$ occurs. If all the $p_i$ are “small” and the trials are either independent or at most “weakly dependent”, then the number of these events that occur approximately has a Poisson distribution with mean $\sum_{i=1}^n{p_i}$.

OTHER DISCRETE PROBABILITY DISTRIBUTIONS

The Geometric Random Variable

Suppose that independent trials, each having a probability $p$, $0 < p < 1$, of being a success, are performed until a success occurs. If we let $X$ equal the number of trials required, then

$$ P\{X=n\}=(1-p)^{n-1}p \qquad n=1,2,\dots $$

Since

$$ \sum_{n=1}^{\infty}{P\{X=n\}}=p\sum_{n=1}^{\infty}{(1-p)^{n-1}}=\frac{p}{1-(1-p)}=1 $$

it follows that, with probability $1$, a success will eventually occur. Any random variable $X$ whose probability mass function is given by $(1-p)^{n-1}p$ is said to be a geometric random variable with parameter $p$.

Expected value and Variance

$$ \begin{aligned} \mathbb{E}[X]&=1/p \\ Var(X)&=\frac{1-p}{p^2} \end{aligned} $$
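A quick numerical check of the geometric pmf and its mean (the value $p = 0.25$ is illustrative):

```python
p = 0.25
# P{X = n} = (1 - p)^(n-1) * p,  n = 1, 2, ...
pmf = lambda n: (1 - p)**(n - 1) * p

# Truncate the geometric series deep in the tail.
total = sum(pmf(n) for n in range(1, 200))
mean = sum(n * pmf(n) for n in range(1, 200))

assert abs(total - 1.0) < 1e-12   # success eventually occurs with probability 1
assert abs(mean - 1 / p) < 1e-10  # E[X] = 1/p = 4
```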

The Negative Binomial Random Variable

Suppose that independent trials, each having probability $p$, $0 < p < 1$, of being a success, are performed until a total of $r$ successes is accumulated. If we let $X$ equal the number of trials required, then

$$ P\{X=n\}=\binom{n-1}{r-1}p^{r}(1-p)^{n-r} \qquad n=r,r+1,\dots $$

In order for the $r$th success to occur at the $n$th trial, there must be $r - 1$ successes in the first $n - 1$ trials and the $n$th trial must be a success. The probability of the first event is

$$ \binom{n-1}{r-1}p^{r-1}(1-p)^{n-r} $$

and the probability of the second is $p$; thus, by independence, the equation is established. To verify that a total of $r$ successes must eventually be accumulated, either we can prove analytically that

$$ \sum_{n=r}^{\infty}P\{X=n\}=\sum_{n=r}^{\infty}\binom{n-1}{r-1}p^{r}(1-p)^{n-r}=1 $$

or we can give a probabilistic argument as follows: The number of trials required to obtain $r$ successes can be expressed as $Y_1 + Y_2 + \cdots + Y_r$, where $Y_1$ equals the number of trials required for the first success, $Y_2$ the number of additional trials after the first success until the second success occurs, $Y_3$ the number of additional trials until the third success, and so on. Because the trials are independent and all have the same probability of success, $Y_1, Y_2, \dots, Y_r$ are all geometric random variables. Hence, each is finite with probability $1$, so $\sum\limits_{i=1}^{r}Y_i$ must also be finite. Any random variable $X$ whose probability mass function is given by the equation is said to be a negative binomial random variable with parameters $(r, p)$. Note that a geometric random variable is just a negative binomial with parameters $(1, p)$.

Expected value and Variance

$$ \begin{aligned} \mathbb{E}[X]&=\frac{r}{p}\\ Var(X)&=\frac{r(1-p)}{p^2} \end{aligned} $$

Since a geometric random variable is just a negative binomial with parameter $r = 1$, it follows from the preceding example that the variance of a geometric random variable with parameter $p$ is equal to $(1 − p)/p^2$
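The negative binomial pmf and its mean can be checked numerically (the parameters $r = 3$, $p = 0.4$ are illustrative):

```python
from math import comb

r, p = 3, 0.4
# P{X = n} = C(n-1, r-1) p^r (1-p)^(n-r),  n = r, r+1, ...
pmf = lambda n: comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

# Truncate the sums far into the tail.
total = sum(pmf(n) for n in range(r, 300))
mean = sum(n * pmf(n) for n in range(r, 300))

assert abs(total - 1.0) < 1e-9   # r successes eventually accumulate
assert abs(mean - r / p) < 1e-6  # E[X] = r/p = 7.5
```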

The Hypergeometric Random Variable

Suppose that a sample of size $n$ is chosen randomly (without replacement) from an urn containing $N$ balls, of which $m$ are white and $N - m$ are black. If $X$ denotes the number of white balls selected, then

$$ P\{X=i\}=\frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}} \qquad i=0,1,\dots,n $$

Expected value and Variance

$$ \begin{aligned} \mathbb{E}[X]&=\frac{nm}{N}\\ Var(X)&=\frac{nm}{N}[\frac{(n-1)(m-1)}{N-1}+1-\frac{nm}{N}] \end{aligned} $$

Letting $p = m/N$ and using

$$ \frac{m-1}{N-1}=\frac{Np-1}{N-1}=p-\frac{1-p}{N-1} $$

shows that

$$ Var(X)=np(1-p)(1-\frac{n-1}{N-1}) $$

If $N$ is large in relation to $n$ [so that $(N-n)/(N-1)$ is approximately equal to $1$], then

$$ Var(X) \approx np(1-p) $$
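A numerical check of the hypergeometric pmf and mean (the urn with $N = 20$, $m = 7$, sample size $n = 5$ is an illustrative choice):

```python
from math import comb

N, m, n = 20, 7, 5
pmf = lambda i: comb(m, i) * comb(N - m, n - i) / comb(N, n)

probs = [pmf(i) for i in range(n + 1)]
mean = sum(i * q for i, q in enumerate(probs))

assert abs(sum(probs) - 1.0) < 1e-12   # Vandermonde: the probabilities sum to 1
assert abs(mean - n * m / N) < 1e-9    # E[X] = nm/N = 1.75
```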

The Zeta (or Zipf) Distribution

A random variable is said to have a zeta (sometimes called the Zipf) distribution if its probability mass function is given by

$$ P\{X=k\}=\frac{C}{k^{\alpha+1}} \qquad k=1,2,\dots $$

for some value of $\alpha > 0$. Since the sum of the foregoing probabilities must equal $1$, it follows that

$$ C=\Bigg[\sum_{k=1}^{\infty}\Big(\frac{1}{k}\Big)^{\alpha+1}\Bigg]^{-1} $$
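For $\alpha = 1$ the normalizing series is $\sum_{k\geq1} 1/k^2 = \pi^2/6$, so $C = 6/\pi^2$; a truncated-series check (the choice $\alpha = 1$ is illustrative):

```python
from math import pi

alpha = 1.0
# C = [ sum_{k >= 1} (1/k)^(alpha+1) ]^(-1), series truncated far into the tail
s = sum(k ** -(alpha + 1) for k in range(1, 200000))
C = 1 / s

assert abs(C - 6 / pi**2) < 1e-4  # for alpha = 1, C = 6/pi^2
```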

EXPECTED VALUE OF SUMS OF RANDOM VARIABLES

Proposition 3.

For a random variable $X$ defined on a sample space $S$ with outcome probabilities $p(s)$, $s \in S$,

$$ \mathbb{E}[X]=\sum_{s\in S}X(s)p(s) $$

Corollary 3.

For random variables $X_1,X_2, \dots ,X_n,$

$$ \mathbb{E}\bigg[\sum_{i=1}^{n}X_i\bigg]=\sum_{i=1}^{n}\mathbb{E}[X_i] $$
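Linearity of expectation in action, for the sum of two fair dice (a hypothetical example): summing directly over the sample space gives the same answer as adding the two individual means.

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely ordered pairs (a, b).
mean_of_sum = sum(Fraction(a + b, 36) for a, b in product(range(1, 7), repeat=2))

# E[X1 + X2] = E[X1] + E[X2] = 7/2 + 7/2 = 7
assert mean_of_sum == 7
```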

PROPERTIES OF THE CUMULATIVE DISTRIBUTION FUNCTION

For the distribution function $F$ of $X$, $F(b)$ denotes the probability that the random variable $X$ takes on a value that is less than or equal to $b$. Following are some properties of the cumulative distribution function (c.d.f.) $F$:

  1. $F$ is a nondecreasing function; that is, if $a < b$,then $F(a) \leq F(b)$.
  2. $\lim\limits_{b\to\infty}F(b)=1$
  3. $\lim\limits_{b\to-\infty}F(b)=0$
  4. $F$ is right continuous. That is, for any $b$ and any decreasing sequence $b_n, n \geq 1$, that converges to $b$, $\lim\limits_{n\to\infty}F(b_n)=F(b)$