Multivariate Normal Distribution

J. Tacq , in International Encyclopedia of Education (Third Edition), 2010

Bivariate Normal Distribution

A special case of the multivariate normal distribution is the bivariate normal distribution with only two variables, so that we can show many of its aspects geometrically. (For more than two variables it becomes impossible to draw figures.) The probability density function of the univariate normal distribution contained two parameters: μ and σ. With two variables, say X₁ and X₂, the function will contain five parameters: two means μ₁ and μ₂, two standard deviations σ₁ and σ₂, and the product-moment correlation between the two variables, ρ. The probability density function (pdf) of the bivariate normal distribution is given by

f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})' \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right].

The constant term can be written in more compact notation once we notice that the determinant of the covariance matrix, |Σ|, simplifies to σ₁²σ₂²(1 − ρ²). Indeed, Σ contains on its principal diagonal the variances σ₁² and σ₂², and on its off-diagonal the covariance ρσ₁σ₂, so its determinant equals σ₁²σ₂² − ρ²σ₁²σ₂² = σ₁²σ₂²(1 − ρ²):

\Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}, \qquad |\Sigma| = \sigma_1^2 \sigma_2^2 (1-\rho^2)

Thus, the pdf of the bivariate normal distribution can also be expressed as

f(x_1, x_2) = (2\pi)^{-1} |\Sigma|^{-1/2} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})' \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]

We can see that this pdf displays a general bell-shaped appearance. It looks like a mountain of normal distribution curves. The surface is centered at the point (μ₁, μ₂), that is, the centroid. For each point on the bottom (X₁, X₂) plane, we have a point f(X₁, X₂) lying on the surface of the bell-shaped mountain (Figure 1).

Figure 1. Joint bivariate normal density function.
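As a quick numerical check of these two equivalent forms, here is a minimal R sketch that evaluates both at one point; the parameter values and the evaluation point are illustrative:

mu <- c(1, 2)                        # means mu1, mu2 (illustrative)
s1 <- 1.5; s2 <- 0.8; rho <- 0.6     # standard deviations and correlation
Sigma <- matrix(c(s1^2, rho*s1*s2, rho*s1*s2, s2^2), 2)  # covariance matrix
x <- c(1.7, 2.4)                     # point at which the density is evaluated

# Explicit form in the five parameters:
q  <- ((x[1]-mu[1])^2/s1^2 - 2*rho*(x[1]-mu[1])*(x[2]-mu[2])/(s1*s2) +
       (x[2]-mu[2])^2/s2^2) / (1 - rho^2)
f1 <- exp(-q/2) / (2*pi*s1*s2*sqrt(1 - rho^2))

# Matrix form (2*pi)^(-1) |Sigma|^(-1/2) exp(-(x-mu)' Sigma^(-1) (x-mu) / 2):
d  <- x - mu
f2 <- (2*pi)^(-1) * det(Sigma)^(-1/2) * exp(-0.5 * t(d) %*% solve(Sigma) %*% d)

c(f1, f2)                            # the two forms give the same density value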

URL: https://www.sciencedirect.com/science/article/pii/B9780080448947013518

Some Multivariate Methods

Rand Wilcox , in Introduction to Robust Estimation and Hypothesis Testing (Fourth Edition), 2017

6.16 Multivariate Discriminant Analysis

Roughly, multivariate discriminant analysis, or classification analysis, deals with the following problem. Imagine that an individual belongs to one of G groups. For example, the groups might correspond to different diseases and the p measures might be symptoms associated with a given individual. The goal is to find an effective rule for classifying an individual as belonging to one of the G groups. The data used to determine a classification rule are typically called the training set. There are classic methods for addressing this issue (e.g., Mardia, Kent, & Bibby, 1979; Huberty, 1994) that assume multivariate normality. This section summarizes a more robust approach. For other methods worth considering, see for example Schapire and Freund (2012) and Breiman (2001).

The basic strategy stems from Li, Cuesta-Albertos, and Liu (2012), who suggest transforming the data to some measure of depth, such as halfspace depth. For two groups, they search for the best separating polynomial based on the transformed data. Hubert, Rousseeuw, and Segaert (2015) suggest some modifications of the approach used by Li et al. First, they suggest using a distance measure that is based in part on halfspace depth, but which avoids measures of depth that become zero outside the convex hull of the data. (They focus on a particular measure of distance, which they call bagdistance.) They then use the kNN classification rule (Fix & Hodges, 1951) to classify individuals, rather than a separating polynomial; this was found to perform well and is computationally easier to use. Briefly, for each new observation, the kNN (k nearest neighbor) rule looks up the k training data points closest to it (typically using Euclidean distance), and then assigns it to the most prevalent group among those neighbors. The value of k is typically chosen by cross-validation to minimize the misclassification rate. Here, the default measure of depth is the projection-based depth computed via the R function prodepth in Section 6.2.8 rather than the distance used by Hubert et al. Currently, an R function for computing bagdistance is not readily available. But the R function in the next section, which applies the method described here, can be used with any measure of depth, as will be seen.

For every vector x in the training data, transform it to the G-variate point

(\mathrm{dist}(\mathbf{x}, P_1), \ldots, \mathrm{dist}(\mathbf{x}, P_G)),

where dist(x, P_g) is some depth measure associated with x and based on the training data in group P_g (g = 1, …, G). Based on these distance measures, classify some future x based on the kNN rule. (For a possible way of improving the correct classification rate, see Croux, Joossens, & Lemmens, 2007.)
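The next section describes an R function that applies this procedure with a projection-based depth. As a rough sketch of the structure of the method only, the following R code uses a simple Mahalanobis-type depth as a stand-in for prodepth; the depth measure, the generated data, and the choice of k here are illustrative assumptions:

library(class)   # knn()
library(MASS)    # mvrnorm(), used only to generate example data

depth.mah <- function(pts, ref) {
  # Mahalanobis depth of the rows of pts relative to the point cloud ref
  1 / (1 + mahalanobis(pts, colMeans(ref), cov(ref)))
}
depth.transform <- function(pts, train, g) {
  # map each row of pts to (dist(x, P_1), ..., dist(x, P_G))
  sapply(sort(unique(g)), function(grp) depth.mah(pts, train[g == grp, , drop = FALSE]))
}

set.seed(1)
train <- rbind(mvrnorm(50, c(0, 0), diag(2)), mvrnorm(50, c(3, 3), diag(2)))
g     <- rep(1:2, each = 50)
test  <- rbind(mvrnorm(5, c(0, 0), diag(2)), mvrnorm(5, c(3, 3), diag(2)))

Dtrain <- depth.transform(train, train, g)   # G-variate depth representation
Dtest  <- depth.transform(test,  train, g)
knn(Dtrain, Dtest, cl = g, k = 5)            # kNN classification of the test points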

6.16.1 R Function KNNdist

The R function

KNNdist(train,test,g, k=1, prob=TRUE, plotit=FALSE, xlab='Group 1', ylab='Group 2',
depthfun=prodepth, ...)

applies the classification method described in the previous section. The argument train is a matrix with n rows and p columns that contains the training data. The argument test contains the data to be classified and the argument g, having length n, contains the labels for the training set. For example, g[1]=3 means that the first row vector in train belongs to group 3 and g[2]=1 means the second row vector in train belongs to group 1. The argument k corresponds to the k used by the kNN classification rule as described in the previous section. For two groups, plotit=TRUE will create a scatterplot of the distances. The argument depthfun indicates the distance measure that will be used, which defaults to Zou's projection-based measure of depth. The function applies the kNN rule via the R function knn, which belongs to the R library class. (For a description of the argument prob=TRUE, use the R command ?knn.)

Example

The following R commands illustrate the function KNNdist with data generated from a bivariate normal distribution:

set.seed(54)
x=rmul(100)                     # 100 observations from a bivariate standard normal (training data)
x[51:100,]=x[51:100,]+3         # shift the second half so it has mean (3, 3)
g=c(rep(1,50),rep(2,50))        # group labels for the training data
test=rmul(10)                   # 10 bivariate normal observations to be classified
test[6:10,]=test[6:10,]+3       # the last five test points come from the second group
KNNdist(x,test,g)

That is, the training set has a sample of size n = 100 and was generated from a bivariate normal distribution for which the first half has mean (0, 0) and the second half has mean (3, 3). The first five vectors for the test set come from the first group and the remaining five come from the second group. The output is:

[1]   1   1   1   1   2   2   2   2   2   2

This indicates that the first four vectors in the test set were classified as coming from group 1 and the remaining were classified as coming from group 2. So the fifth vector was misclassified.

URL: https://www.sciencedirect.com/science/article/pii/B9780128047330000068

Joint and conditional p.d.f.'s, conditional expectation and variance, moment generating function, covariance, and correlation coefficient

George G. Roussas , in An Introduction to Probability and Statistical Inference (Second Edition), 2015

4.5.2 Bivariate Normal Distribution

The joint distribution of the r.v.'s X and Y is said to be the Bivariate Normal distribution with parameters μ 1, μ 2 in ℜ, σ 1, σ 2 positive, and ρ ∈ [−1, 1], if the joint p.d.f. is given by the formula:

(50) f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}}\, e^{-q/2}, \qquad x, y \in \Re,

where

(51) q = \frac{1}{1-\rho^2}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right].

This distribution is also referred to as two-dimensional Normal. The shape of f_{X,Y} looks like a bell sitting on the xy-plane, whose highest point is located at (μ₁, μ₂, 1/(2πσ₁σ₂√(1 − ρ²))) (see Figure 4.7).

Figure 4.7. Graphs of the p.d.f. of the Bivariate Normal distribution: (a) Centered at the origin; (b) Centered elsewhere in the (x, y)-plane.

That f X,Y integrates to 1 and therefore is a p.d.f. is seen by rewriting it in a convenient way. Specifically,

(52) \left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2 = \left(\frac{y-\mu_2}{\sigma_2}\right)^2 - 2\left(\rho\,\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\rho\,\frac{x-\mu_1}{\sigma_1}\right)^2 + (1-\rho^2)\left(\frac{x-\mu_1}{\sigma_1}\right)^2 = \left[\left(\frac{y-\mu_2}{\sigma_2}\right) - \left(\rho\,\frac{x-\mu_1}{\sigma_1}\right)\right]^2 + (1-\rho^2)\left(\frac{x-\mu_1}{\sigma_1}\right)^2.

Furthermore,

\frac{y-\mu_2}{\sigma_2} - \rho\,\frac{x-\mu_1}{\sigma_1} = \frac{y-\mu_2}{\sigma_2} - \frac{1}{\sigma_2}\times\rho\sigma_2\,\frac{x-\mu_1}{\sigma_1} = \frac{1}{\sigma_2}\left\{y - \left[\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x-\mu_1)\right]\right\} = \frac{y-b_x}{\sigma_2}, \quad \text{where } b_x = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x-\mu_1)

(see also Exercise 5.6).

Therefore, the right-hand side of (52) is equal to:

\left(\frac{y-b_x}{\sigma_2}\right)^2 + (1-\rho^2)\left(\frac{x-\mu_1}{\sigma_1}\right)^2,

and hence the exponent becomes

-\frac{(x-\mu_1)^2}{2\sigma_1^2} - \frac{(y-b_x)^2}{2\left(\sigma_2\sqrt{1-\rho^2}\right)^2}.

Then the joint p.d.f. may be rewritten as follows:

(53) f_{X,Y}(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-(x-\mu_1)^2 / 2\sigma_1^2} \times \frac{1}{\sqrt{2\pi}\left(\sigma_2\sqrt{1-\rho^2}\right)}\, e^{-(y-b_x)^2 / 2\left(\sigma_2\sqrt{1-\rho^2}\right)^2}.

The first factor on the right-hand side of (53) is the p.d.f. of N(μ₁, σ₁²) and the second factor is the p.d.f. of N(b_x, (σ₂√(1 − ρ²))²). Therefore, integration with respect to y produces the marginal N(μ₁, σ₁²) distribution, which, of course, integrates to 1. So, we have established the following two facts: ∫∫ f_{X,Y}(x, y) dx dy = 1, and

(54) X \sim N(\mu_1, \sigma_1^2), \quad \text{and, by symmetry,} \quad Y \sim N(\mu_2, \sigma_2^2).

The results recorded in (54) also reveal the special significance of the parameters μ₁, σ₁² and μ₂, σ₂². Namely, they are the means and the variances of the (Normally distributed) r.v.'s X and Y, respectively. Relations (53) and (54) also provide immediately the conditional p.d.f. f_{Y|X}; namely,

f_{Y|X}(y \mid x) = \frac{1}{\sqrt{2\pi}\left(\sigma_2\sqrt{1-\rho^2}\right)} \exp\left[-\frac{(y-b_x)^2}{2\left(\sigma_2\sqrt{1-\rho^2}\right)^2}\right].

Thus, in obvious notation:

(55) Y \mid X = x \sim N\left(b_x,\ \left(\sigma_2\sqrt{1-\rho^2}\right)^2\right), \qquad b_x = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x-\mu_1),

and, by symmetry:

(56) X \mid Y = y \sim N\left(b_y,\ \left(\sigma_1\sqrt{1-\rho^2}\right)^2\right), \qquad b_y = \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(y-\mu_2).

In Figure 4.8, the conditional p.d.f. f Y|X (∙ | x) is depicted for three values of x : x = 5,10, and 15.

Figure 4.8. Conditional probability density functions of the bivariate Normal distribution.

Formulas (53), (54), and (56) also allow us to calculate easily the covariance and the correlation coefficient of X and Y. Indeed, by (53):

E(XY) = \int\!\!\int xy\, f_{X,Y}(x, y)\, dx\, dy = \int x f_X(x)\left[\int y\, f_{Y|X}(y \mid x)\, dy\right] dx = \int x f_X(x)\, b_x\, dx = \int x f_X(x)\left[\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x-\mu_1)\right] dx = \mu_1\mu_2 + \rho\sigma_1\sigma_2

(see also Exercise 5.7). Since we already know that EX = μ₁, EY = μ₂, and Var(X) = σ₁², Var(Y) = σ₂², we obtain:

\mathrm{Cov}(X, Y) = E(XY) - (EX)(EY) = \mu_1\mu_2 + \rho\sigma_1\sigma_2 - \mu_1\mu_2 = \rho\sigma_1\sigma_2,

and therefore ρ(X, Y) = ρσ₁σ₂ / (σ₁σ₂) = ρ. Thus, we have:

(57) \mathrm{Cov}(X, Y) = \rho\sigma_1\sigma_2 \quad \text{and} \quad \rho(X, Y) = \rho.

Relation (57) reveals that the parameter ρ in (50) is, actually, the correlation coefficient of the r.v.'s X and Y.
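These two relations are easy to confirm by simulation. A minimal R sketch, with illustrative parameter values and using mvrnorm from the MASS package:

library(MASS)   # mvrnorm()
mu1 <- 1; mu2 <- -2; s1 <- 2; s2 <- 3; rho <- 0.5
Sigma <- matrix(c(s1^2, rho*s1*s2, rho*s1*s2, s2^2), 2)
xy <- mvrnorm(1e5, c(mu1, mu2), Sigma)    # draws from the Bivariate Normal
c(cov(xy[, 1], xy[, 2]), rho * s1 * s2)   # sample covariance vs. rho*sigma1*sigma2
c(cor(xy[, 1], xy[, 2]), rho)             # sample correlation vs. rho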

Example 27

If the r.v.'s X 1 and X 2 have the Bivariate Normal distribution with parameters μ 1 , μ 2 , σ 1 2 , σ 2 2 , and ρ:

(i)

Calculate the quantities: E(c 1 X 1 + c 2 X 2), Var(c 1 X 1 + c 2 X 2), where c 1, c 2 are constants.

(ii)

What do the expressions in part (i) become for μ₁ = −1, μ₂ = 3, σ₁² = 4, σ₂² = 9, and ρ = 1/2?

Discussion

(i)

E(c₁X₁ + c₂X₂) = c₁EX₁ + c₂EX₂ = c₁μ₁ + c₂μ₂, since Xᵢ ~ N(μᵢ, σᵢ²), so that EXᵢ = μᵢ, i = 1, 2. Also,

\mathrm{Var}(c_1X_1 + c_2X_2) = c_1^2\sigma_{X_1}^2 + c_2^2\sigma_{X_2}^2 + 2c_1c_2\sigma_{X_1}\sigma_{X_2}\,\rho(X_1, X_2) \ (\text{by (39)}) = c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + 2c_1c_2\sigma_1\sigma_2\rho,

since Xᵢ ~ N(μᵢ, σᵢ²), so that Var(Xᵢ) = σᵢ², i = 1, 2, and ρ(X₁, X₂) = ρ, by (57).
(ii)

Here, E(c₁X₁ + c₂X₂) = −c₁ + 3c₂, and Var(c₁X₁ + c₂X₂) = 4c₁² + 9c₂² + 2c₁c₂ × 2 × 3 × (1/2) = 4c₁² + 9c₂² + 6c₁c₂.

Example 28

Suppose that the heights of fathers and sons are r.v.'s X and Y, respectively, having (approximately) Bivariate Normal distribution with parameters (expressed in inches) μ 1 = 70, σ 1 = 2, μ 2 = 71, σ 2 = 2 and ρ = 0.90. If for a given pair (father, son) it is observed that X = x = 69, determine:

(i)

The conditional distribution of the height of the son.

(ii)

The expected height of the son.

(iii)

The probability that the height of the son is more than 72 in.

Discussion

(i)

According to (55), Y | X = x ~ N(b_x, (σ₂√(1 − ρ²))²), where

b_x = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1) = 71 + 0.90 \times (69 - 70) = 70.1, \qquad \sigma_2\sqrt{1-\rho^2} = 2\sqrt{1 - 0.90^2} \simeq 0.87.

That is, Y | X = 69 is distributed as N(70.1, (0.87)²).

(ii)

The (conditional) expectation of Y, given X = 69, is equal to b 69 = 70.1.

(iii)

The required (conditional) probability is

P(Y > 72 \mid X = 69) = P\left(\frac{Y - b_{69}}{\sigma_2\sqrt{1-\rho^2}} > \frac{72 - 70.1}{0.87}\right) \simeq P(Z > 2.18) = 1 - \Phi(2.18) = 1 - 0.985371 = 0.014629.
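A quick numerical check of parts (i)–(iii) in R, using the conditional distribution in (55):

mu1 <- 70; s1 <- 2; mu2 <- 71; s2 <- 2; rho <- 0.90; x <- 69
bx  <- mu2 + rho * (s2 / s1) * (x - mu1)   # conditional mean b_x
sdc <- s2 * sqrt(1 - rho^2)                # conditional standard deviation
c(bx, sdc)                                 # 70.1 and about 0.87
1 - pnorm(72, mean = bx, sd = sdc)         # P(Y > 72 | X = 69), about 0.0146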

Finally, it can be seen by integration that the joint m.g.f. of X and Y is given by the formula:

(58) M_{X,Y}(t_1, t_2) = \exp\left[\mu_1 t_1 + \mu_2 t_2 + \frac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right], \qquad t_1, t_2 \in \Re;

we choose not to pursue its justification (which can be found, e.g., on pages 158–159 of the book A Course in Mathematical Statistics, 2nd edition (1997), Academic Press, by G.G. Roussas). We see easily, however, that:

\frac{\partial}{\partial t_1} M_{X,Y}(t_1, t_2) = \left(\mu_1 + \sigma_1^2 t_1 + \rho\sigma_1\sigma_2 t_2\right) M_{X,Y}(t_1, t_2),

and hence:

\frac{\partial^2}{\partial t_1\, \partial t_2} M_{X,Y}(t_1, t_2) = \rho\sigma_1\sigma_2\, M_{X,Y}(t_1, t_2) + \left(\mu_1 + \sigma_1^2 t_1 + \rho\sigma_1\sigma_2 t_2\right)\left(\mu_2 + \sigma_2^2 t_2 + \rho\sigma_1\sigma_2 t_1\right) M_{X,Y}(t_1, t_2),

which, evaluated at t 1 = t 2 = 0, yields ρσ 1 σ 2 + μ 1 μ 2 = E(XY), as we have already seen.

URL: https://www.sciencedirect.com/science/article/pii/B9780128001141000044

Assessing structural relationships between distributions - a quantile process approach based on Mallows distance

G. Freitag , ... M. Vogt , in Recent Advances and Trends in Nonparametric Statistics, 2003

4.2 Acceleration model

For this model we used the symmetrized version of the test statistics, as indicated in (6). Several bivariate normal distribution settings were considered, and each test was performed with Δ₀² = 1. We found that in general trimming improves the performance of the BCₐ test. For smaller variances, it turns out to be liberal in case of no trimming and conservative in case of trimming. For larger variances, the BCₐ method is always liberal, although trimming reduces the liberality. It was seen that an increase in location difference (while keeping the true distance Γ_{A,β}(F, G) constant) results in more liberal tests for β = 0, and in less liberal tests for β = 0.05.

URL: https://www.sciencedirect.com/science/article/pii/B9780444513786500090

Ratio and Product Type Exponential Estimators for Population Mean Using Ranked Set Sampling

Gajendra K. Vishwakarma , ... Carlos N. Bouza-Herrera , in Ranked Set Sampling, 2019

18.4 A Simulation Study

To illustrate how one can gain insight into the application and properties of the proposed estimator, a computer simulation was conducted. Bivariate random observations were generated from a bivariate normal distribution with parameters μ_y, μ_x, σ_x, σ_y and correlation coefficient ρ. The sampling method explained above is used to pick RSS data with sets of size m and, after r repeated cycles, to get an RSS of size mr. A sample of size mr bivariate units is also randomly chosen from the population (we refer to these data as SRS data). The simulation was performed with m = 3, 4, 5 and with r = 3 and 6 (i.e., with total sample sizes of 9, 12, 15, 18, 24, and 30) for the RSS and SRS data sets. Here, we have ranked the auxiliary variate X, which induces a ranking of the study variate Y (the ranking of Y is perfect if ρ = 1 and is subject to ranking errors if ρ < 1). Using R software, we have conducted 5,000 replications for estimates of the means and mean square errors. The results of these simulations are summarized by the percentage relative efficiencies of the estimators, computed using the following formulas:

(18.28) \mathrm{PRE}\left[\ast, \hat{\bar{Y}}_{R}^{\mathrm{rss}}\right] = \frac{\mathrm{MSE}\left(\hat{\bar{Y}}_{R}^{\mathrm{rss}}\right)}{\mathrm{MSE}(\ast)} \times 100

(18.29) \mathrm{PRE}\left[\ast, \hat{\bar{Y}}_{P}^{\mathrm{rss}}\right] = \frac{\mathrm{MSE}\left(\hat{\bar{Y}}_{P}^{\mathrm{rss}}\right)}{\mathrm{MSE}(\ast)} \times 100

where \ast = \hat{\bar{Y}}_{\mathrm{Re}}^{\mathrm{rss}}, \hat{\bar{Y}}_{\mathrm{Pe}}^{\mathrm{rss}}, \hat{\bar{Y}}_{G}^{\mathrm{rss}}.
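A minimal R sketch of the RSS scheme described above (judgment ranking on X, m sets of size m per cycle, r cycles); the means, covariance structure, and sample sizes below are illustrative, and mvrnorm from the MASS package is assumed:

library(MASS)   # mvrnorm()

rss.bivariate <- function(m, r, mu = c(0, 0),
                          Sigma = matrix(c(1, 0.8, 0.8, 1), 2)) {
  out <- NULL
  for (cycle in 1:r) {
    for (i in 1:m) {
      set.i <- mvrnorm(m, mu, Sigma)         # draw a set of m (X, Y) pairs
      ord   <- order(set.i[, 1])             # rank the set on the auxiliary variate X
      out   <- rbind(out, set.i[ord[i], ])   # keep only the i-th ranked pair
    }
  }
  colnames(out) <- c("X", "Y")
  out                                        # an RSS of size m * r
}

rss <- rss.bivariate(m = 3, r = 6)                           # RSS data, size 18
srs <- mvrnorm(18, c(0, 0), matrix(c(1, 0.8, 0.8, 1), 2))    # SRS data of the same size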

URL: https://www.sciencedirect.com/science/article/pii/B9780128150443000186

Some Generalizations to k Random Variables, and Three Multivariate Distributions

George Roussas , in Introduction to Probability (Second Edition), 2014

9.4 Multivariate Normal Distribution

This chapter is concluded with a brief reference to the multivariate normal distribution without entering into any details. A relevant reference is given for the interested reader.

The multivariate normal distribution is the generalization of the bivariate normal distribution and can be defined in a number of ways; we choose the one given here. To this end, for k ≥ 2, let μ = (μ₁, …, μ_k) be a vector of constants, and let Σ be a k × k nonsingular matrix, so that the inverse Σ⁻¹ exists and the determinant |Σ| ≠ 0. Finally, set X for the vector of r.v.'s X₁, …, X_k; i.e., X = (X₁, …, X_k), and let x = (x₁, …, x_k) be any point in ℜ^k. Then, the joint p.d.f. of the Xᵢ's, or the p.d.f. of the random vector X, is said to be multivariate normal, or k-variate normal, if it is given by the formula:

f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right], \qquad \mathbf{x} \in \Re^k,

where, it is to be recalled, "′" stands for transpose.

It can be seen that EXᵢ = μᵢ, that Var(Xᵢ) = σᵢ² is the (i, i)th element of Σ, and that Cov(Xᵢ, Xⱼ) is the (i, j)th element of Σ, so that μ = (EX₁, …, EX_k) and Σ = (Cov(Xᵢ, Xⱼ)), i, j = 1, …, k. The quantities μ and Σ are called the parameters of the distribution. It can also be seen that the joint m.g.f. of the Xᵢ's, or the m.g.f. of the random vector X, is given by:

M_{\mathbf{X}}(\mathbf{t}) = \exp\left(\boldsymbol{\mu}'\mathbf{t} + \frac{1}{2}\,\mathbf{t}'\Sigma\,\mathbf{t}\right), \qquad \mathbf{t} \in \Re^k.

The k-variate normal distribution has properties similar to those of the 2-dimensional normal distribution, and the latter is obtained from the former by taking μ = (μ₁, μ₂) and \Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}, where ρ = ρ(X₁, X₂).

More relevant information can be found, for example, in Chapter 18 of the reference cited in Exercise 3.4.
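As a numerical illustration of the formula for f_X(x), the density can be evaluated directly and checked against the dmvnorm function of the mvtnorm package (assumed to be installed); the dimension, parameters, and evaluation point below are illustrative:

library(mvtnorm)   # dmvnorm()

k     <- 3
mu    <- c(0, 1, -1)
Sigma <- matrix(c(2.0, 0.5, 0.3,
                  0.5, 1.0, 0.2,
                  0.3, 0.2, 1.5), k, k)   # a symmetric, nonsingular covariance matrix
x     <- c(0.4, 1.2, -0.5)

d  <- x - mu
f1 <- (2*pi)^(-k/2) * det(Sigma)^(-1/2) *
      exp(-0.5 * t(d) %*% solve(Sigma) %*% d)   # k-variate normal p.d.f., by the formula
f2 <- dmvnorm(x, mean = mu, sigma = Sigma)      # the same value from mvtnorm
c(f1, f2)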

URL: https://www.sciencedirect.com/science/article/pii/B9780128000410000092

Quality of Analytical Measurements: Univariate Regression

M.C. Ortiz , ... L.A. Sarabia , in Comprehensive Chemometrics, 2009

1.05.4.4 Confidence Interval for the Prediction

For a concentration x 0 where no experiments were carried out, the response is estimated by

(57) \hat{y}_0 = b_0 + b_1 x_0

However, as b₀ and b₁ are random variables that follow a bivariate normal distribution, ŷ₀ is also a random variable that follows a normal distribution whose mean and variance, as computed with Equations (11)–(15), are

(58) E(\hat{y}_0) = \beta_0 + \beta_1 x_0

(59) \mathrm{Var}(\hat{y}_0) = \left(\frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}\right)\sigma^2

Var(ŷ₀) has a minimum when x₀ = x̄ and increases as x₀ moves away from x̄ in either direction. In other words, the most precise prediction is expected at the mean of the calibration range, and precision is lost as we move away from x̄.

The confidence interval at level (1 − α)100% for the true mean value of the response for a given x₀ is then computed as

(60) \left(\hat{y}_0 - t_{\alpha/2,\,N-2}\, s_{y_0},\ \hat{y}_0 + t_{\alpha/2,\,N-2}\, s_{y_0}\right), \qquad s_{y_0} = \sqrt{\widehat{\mathrm{Var}}(\hat{y}_0)} = s_{yx}\sqrt{\frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}}

If we join up all the lower endpoints and all the upper endpoints of the intervals defined in Equation (60) as x₀ changes, we obtain the two dotted hyperbolas shown in Figure 7 for the data in Example 1, Table 3.

Figure 7. Confidence intervals at 95% for the calibration data of Example 1, Table 3. Dotted hyperbolas are for the true mean. Continuous hyperbolas are for a new prediction with q = 1 in Equation (63). The squares are the experimental points.

The individual values of the response at x₀ are distributed around their mean with variance σ², independently of ŷ₀. Therefore, the variance of the prediction of an individual observation would be

(61) \sigma^2\left(1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}\right)

and the corresponding estimate is obtained by substituting s_{yx}² for σ² in Equation (61).

The confidence interval for a new observation is then

(62) \hat{y}_0 \pm t_{\alpha/2,\,N-2}\, s_{yx}\sqrt{1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}}

A confidence interval for the average of q new observations is obtained similarly as follows:

(63) \hat{y}_0 \pm t_{\alpha/2,\,N-2}\, s_{yx}\sqrt{\frac{1}{q} + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}}

Again, by joining the corresponding endpoints of the confidence intervals for one new observation, we obtain two hyperbolas (those drawn with a continuous line in Figure 7). The rest of the hyperbolas, obtained for q > 1, would be located between the two 'limiting' hyperbolas (the dotted and the continuous lines in Figure 7).
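In R, the intervals in Equations (60) and (62) are available directly from predict(); a minimal sketch with invented calibration data (not the data of Example 1, Table 3):

x   <- c(0, 1, 2, 3, 4, 5)                    # hypothetical concentrations
y   <- c(0.02, 0.21, 0.38, 0.62, 0.79, 1.01)  # hypothetical responses
fit <- lm(y ~ x)
new <- data.frame(x = seq(0, 5, by = 0.5))

predict(fit, new, interval = "confidence")    # intervals for the true mean, Eq. (60)
predict(fit, new, interval = "prediction")    # intervals for one new observation, Eq. (62)

# The half-width of Eq. (60) computed directly from the formula, at x0 = 2.5:
x0   <- 2.5
syx  <- summary(fit)$sigma
N    <- length(x)
s_y0 <- syx * sqrt(1/N + (x0 - mean(x))^2 / sum((x - mean(x))^2))
qt(0.975, N - 2) * s_y0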

URL: https://www.sciencedirect.com/science/article/pii/B9780444527011000910

Correlation and Tests of Independence

Rand Wilcox , in Introduction to Robust Estimation and Hypothesis Testing (Third Edition), 2012

9.4.3 The OP Correlation

The so-called OP correlation coefficient begins by eliminating any outliers using the projection method in Section 6.4.9. Then it merely computes some correlation coefficient with the data that remain. Pearson's correlation is assumed unless stated otherwise.

Imagine that data are randomly sampled from some bivariate normal distribution. If the goal is to use a skipped correlation coefficient that gives a reasonably accurate estimate of Pearson's correlation, ρ, relative to r, then the OP estimator is the only skipped estimator known to be reasonably satisfactory.

Let rp represent the skipped correlation coefficient and let m be the number of pairs of points left after outliers are removed. A seemingly simple method for testing the hypothesis of independence is to apply the usual T test for Pearson's correlation but with r replaced by rp and n replaced by m. But this simple solution fails because it does not take into account the dependence among the points remaining after outliers are removed. If this problem is ignored, unsatisfactory control over the probability of a type I error results (Wilcox, 2010f). However, let

T_p = r_p \sqrt{\frac{n-2}{1 - r_p^2}}

and suppose the hypothesis of independence is rejected at the α = 0.05 level if |Tp | ≥ c, where

c = \frac{6.947}{n} + 2.3197.

The critical value c was determined via simulations under normality: an appropriate critical value was determined for a range of n between 10 and 200, and a least squares regression line was then fit to the results. For nonnormal distributions, all indications are that this hypothesis testing method has an actual type I error probability reasonably close to the nominal 0.05 level.
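A minimal R sketch of the test statistic and critical value; the projection-based outlier removal of Section 6.4.9 is not reproduced here, so a simple Mahalanobis-distance screen is used as a stand-in, and r_p below is only an approximation to the OP correlation:

skipped.test <- function(x, y, crit = qchisq(0.975, 2)) {
  n  <- length(x)
  d2 <- mahalanobis(cbind(x, y), colMeans(cbind(x, y)), cov(cbind(x, y)))
  keep <- d2 <= crit                       # crude outlier screen (stand-in only)
  rp <- cor(x[keep], y[keep])              # correlation among the remaining pairs
  Tp <- rp * sqrt((n - 2) / (1 - rp^2))    # test statistic, using the full-sample n
  cval <- 6.947 / n + 2.3197               # approximate 0.05-level critical value
  list(rp = rp, Tp = Tp, crit = cval, reject = abs(Tp) >= cval)
}

set.seed(9)
x <- rnorm(40); y <- 0.5 * x + rnorm(40)
skipped.test(x, y)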

URL: https://www.sciencedirect.com/science/article/pii/B9780123869838000093

Copulas and quasi-copulas: An introduction to their properties and applications

Roger B. Nelsen , in Logical, Algebraic, Analytic and Probabilistic Aspects of Triangular Norms, 2005

14.3.3 Normal copulas

Let N_ρ(x, y) denote the standard bivariate normal joint distribution function with correlation coefficient ρ. Then C_ρ, the copula corresponding to N_ρ, is given by C_ρ(u, v) = N_ρ(Φ⁻¹(u), Φ⁻¹(v)) [where Φ denotes the standard normal distribution function]. Since there is no closed form expression for Φ⁻¹, there is no closed form expression for N_ρ. However, N_ρ can be evaluated approximately in order to construct bivariate distribution functions with the same dependence structure as the standard bivariate normal distribution function but with non-normal marginals.
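A minimal R sketch of both points, assuming the mvtnorm package is available: evaluating C_ρ numerically, and using it to build a joint distribution with normal-copula dependence but (illustrative) non-normal marginals:

library(mvtnorm)   # pmvnorm(), rmvnorm()

rho   <- 0.7
Sigma <- matrix(c(1, rho, rho, 1), 2)

# C_rho(u, v) = N_rho(Phi^-1(u), Phi^-1(v)), evaluated numerically:
C.rho <- function(u, v) pmvnorm(upper = c(qnorm(u), qnorm(v)), corr = Sigma)
C.rho(0.3, 0.8)

# Random pairs with normal-copula dependence but exponential and gamma marginals:
z  <- rmvnorm(1000, sigma = Sigma)
u  <- pnorm(z[, 1]); v <- pnorm(z[, 2])               # uniform marginals, copula C_rho
xy <- cbind(qexp(u, rate = 1), qgamma(v, shape = 2))  # non-normal marginals, same dependence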

14.3.1 Definition

For a pair X, Y of random variables with marginal distribution functions F, G, respectively, and joint distribution function H, the corresponding marginal survival functions F̄, Ḡ and joint survival function H̄ are given by F̄(x) = P(X > x), Ḡ(y) = P(Y > y), and H̄(x, y) = P(X > x, Y > y), respectively. The function Ĉ which couples the joint survival function to its marginal survival functions is called a survival copula:

H̄(x, y) = Ĉ(F̄(x), Ḡ(y)).

It is easy to show that Ĉ is a copula, and is related to the (ordinary) copula C of X and Y via the equation Ĉ(u, v) = u + v − 1 + C(1 − u, 1 − v). See [29] for details.

URL: https://www.sciencedirect.com/science/article/pii/B9780444518149500148

Markov Chain Monte Carlo

William L. Dunn , J. Kenneth Shultis , in Exploring Monte Carlo Methods, 2012

Example 6.3 A Modified Buffon's Needle Problem

Consider a modified Buffon's needle game in which needles of length L are dropped by a biased machine. In the original Buffon's needle problem, the PDF for the position of the needle endpoints is f(x) = 1/D, 0 ≤ x ≤ D, the PDF for the orientation angle is g(θ) = 1/π, 0 ≤ θ ≤ π, and Pcut is given by Eq. (1.7). The biased machine, however, drops needles such that the endpoints x and the orientation angles θ are distributed, within a unit cell, according to a bivariate normal distribution (see Appendix A.3.1), i.e.,

f(x, \theta) = \frac{K}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{z(x, \theta)}{2(1-\rho^2)}\right], \qquad 0 \le x < D,\ 0 \le \theta < \pi,

where K is a normalization constant,

z(x, \theta) = \frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(\theta-\mu_2)}{\sigma_1\sigma_2} + \frac{(\theta-\mu_2)^2}{\sigma_2^2},

and

\rho = \frac{\sigma_{12}}{\sigma_1\sigma_2}.

Employing an analysis similar to that taken in Example 1.1, the cut probability for this machine dropping needles on a grid with D ≥ L can be expressed as

P_{\mathrm{cut}} = \int_0^{\pi}\!\!\int_0^{D} \frac{L\sin\theta}{D}\, f(x, \theta)\, dx\, d\theta.

How might one estimate this cut probability for a given set of parameters?

Consider the case where D = 5, L = 2, μ1 = 3, μ2 = 1, σ1 = 1, σ2 = 1, and σ12 = 0.1. One could drop n needles into the machine and observe k crossings on the grid below, and approximate Pcut as k/n. However, it is much more efficient to run an experiment on a computer using MCMC.

Choose as proposal distributions functions that are easy to sample, say h₁(u) = 1/D and h₂(v) = 1/π. Then, sample u and v from these proposal functions. In this case, the Metropolis ratio is simply

R = \frac{f(u, v)}{f(x, \theta)}

because the proposals are both constants. After running 10⁶ histories, an estimate of Pcut = 0.2837 ± 0.0001 was obtained (a value statistically slightly higher than for uniform drops). Notice that the two-dimensional integral for Pcut is evaluated without having to sample from the bivariate normal distribution or having to determine the normalization constant K. All that is required is that the ratio of bivariate normals be evaluated at both current and candidate points sampled from the simple proposal functions.
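A minimal R sketch of this MCMC estimate, with the stated parameters; the per-needle cut probability L sin θ / D is tallied at each state (matching the integral above), the run length is shorter than the 10⁶ histories quoted in the text, and the starting state is an arbitrary choice:

D <- 5; L <- 2
mu1 <- 3; mu2 <- 1; s1 <- 1; s2 <- 1; s12 <- 0.1
rho <- s12 / (s1 * s2)

f.unnorm <- function(x, th) {                  # bivariate normal density, up to K
  z <- (x - mu1)^2 / s1^2 - 2 * rho * (x - mu1) * (th - mu2) / (s1 * s2) +
       (th - mu2)^2 / s2^2
  exp(-z / (2 * (1 - rho^2)))
}

set.seed(2)
n <- 1e5
x <- D / 2; th <- pi / 2                       # arbitrary starting state
score <- numeric(n)
for (i in 1:n) {
  u <- runif(1, 0, D); v <- runif(1, 0, pi)    # proposals from h1 and h2
  if (runif(1) < f.unnorm(u, v) / f.unnorm(x, th)) { x <- u; th <- v }   # Metropolis step
  score[i] <- L * sin(th) / D                  # cut probability for the current needle
}
mean(score)                                    # estimate of Pcut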

Results generated by global MCMC sampling from a multidimensional distribution are shown in Figs. 6.9 to 6.11, in which a bivariate normal PDF (see Appendix A.3.1) is sampled by MCMC for three different values of N. The large spikes in the tails evident for 10³ histories are "softened" by 10⁵ histories and almost completely removed by N = 10⁷ histories.

Figure 6.9. Bivariate normal PDF sampled using MCMC after 10³ histories.

Figure 6.10. Bivariate normal PDF sampled using MCMC after 10⁵ histories.

Figure 6.11. Bivariate normal PDF sampled using MCMC after 10⁷ histories.

URL: https://www.sciencedirect.com/science/article/pii/B9780444515759000063