9512.net

甜梦文库

甜梦文库

当前位置：首页 >> >> # A generalization of the Lindeberg principle

The Annals of Probability 2006, Vol. 34, No. 6, 2061–2076 DOI: 10.1214/009117906000000575 c Institute of Mathematical Statistics, 2006

arXiv:math/0508519v3 [math.PR] 27 Feb 2007

A GENERALIZATION OF THE LINDEBERG PRINCIPLE1 By Sourav Chatterjee University of California, Berkeley

We generalize Lindeberg’s proof of the central limit theorem to an invariance principle for arbitrary smooth functions of independent and weakly dependent random variables. The result is applied to get a similar theorem for smooth functions of exchangeable random variables. This theorem allows us to identify, for the ?rst time, the limiting spectral distributions of Wigner matrices with exchangeable entries.

1. Introduction and results. J. W. Lindeberg’s elegant proof of the central limit theorem [12], despite being in the shadow of Fourier analytic methods for a long time, is now well known. It was revived by Trotter [20] and has since been successfully used to derive CLT’s in in?nite-dimensional spaces, where the Fourier analytic methods are not so useful. While the original Lindeberg method and its extensions compare the distributions of convolutions in great generality (the history of which is irrelevant to our discussion, so we refer to [15] for details), it soon becomes clear that the same principle works not only for sums, but for more general smooth functions as well. Comparison of f (X1 , . . . , Xn ) and f (Y1 , . . . , Yn ) for polynomial f has been examined by Rotar [16] and Mossel, O’Donnell and Oleszkiewicz [13], and for general smooth f with bounded derivatives by Chatterjee [5]. In [5], it is shown how to apply the method to establish universality in physical models, including the Sherrington–Kirkpatrick model of spin glasses. It was recently observed by Tou?c Suidan [18] that the results in [5] can be used to give an immediate proof of the universality of last passage percolation in thin rectangles (originally a result of [3] and [4]). Developed independently, the Mossel, O’Donnell and Oleszkiewicz paper [13] is another repository of very striking modern applications of this very old idea.

Received August 2005; revised March 2006. Supported in part by NSF Grant DMS-04-06042. AMS 2000 subject classi?cations. Primary 60F17; secondary 60G09, 15A52. Key words and phrases. Lindeberg, de Finetti, Wigner, exchangeable, random matrices, invariance principle, semicircle law.

1

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Probability, 2006, Vol. 34, No. 6, 2061–2076. This reprint di?ers from the original in pagination and typographic detail. 1

2

S. CHATTERJEE

A closer examination of Lindeberg’s method reveals that there is a direct generalization by which the independence of the coordinates can be dispensed with. The argument, which may be possible to guess once the theorem has been stated, will be given in Section 2. Theorem 1.1. Suppose X and Y are random vectors in Rn with Y having independent components. For 1 ≤ i ≤ n, let Bi := E|E(Xi2 |X1 , . . . , Xi?1 ) ? E(Yi2 )|. Ai := E|E(Xi |X1 , . . . , Xi?1 ) ? E(Yi )|,

Let M3 be a bound on maxi (E|Xi |3 + E|Yi |3 ). Suppose f : Rn → R is a thrice continuously di?erentiable function, and for r = 1, 2, 3, let Lr (f ) be a ?nite r f (x)| ≤ L (f ) for each i and x, where ? r denotes the constant such that |?i r i r -fold derivative in the ith coordinate. Then

n

|Ef (X) ? Ef (Y )| ≤

1 (Ai L1 (f ) + 1 2 Bi L2 (f )) + 6 nL3 (f )M3 . i=1

Let us now say a bit about the condition of boundedness of third derivatives. The implications of this condition have been inspected in detail in the context of convolutions by Zolotarev (see, e.g., [23]) and other authors. Zolotarev de?nes the ζ3 -metric on the space of distributions as follows: ζ3 (F, G) = supf | f dF ? f dG|, where the sup is taken over all f with third derivative bounded by 1. The ζ3 metric has not been so popular in practice because of the di?culty in connecting this metric with the common notions of distance between measures. However, instead of taking supremum over a class of f ’s, we consider only individual functions of interest. For instance, in the random matrix scenario, our f will be the Stieltjes transform of a matrix at a ?xed z ∈ C\R, which is a nice C ∞ function of the original matrix. In the paper [5], the author considered the partition function of a disordered physical system (the Sherrington–Kirkpatrick model of spin glasses), which again turns out to be a C ∞ function of the disorder matrix, and has nicely bounded derivatives. The condition of boundedness of the derivatives can be dropped (as demonstrated in [16] and [13]) by careful examination of the remainder term; but our focus is di?erent: We are more concerned with ways to extend the method to the case of weakly dependent variables (as in Theorem 1.1 above), and more speci?cally, to exchangeable random variables, as below. Exchangeable random variables. Let us now present a surprisingly nontrivial application of the basic tool developed in the previous section. Suppose X is a vector with exchangeable components. Certainly, we cannot

LINDEBERG PRINCIPLE

3

expect to replace the components of X by independent Gaussians as we did in Theorem 1.1. For instance, all the components may be equal to the same random variable, in which case there is no hope of replacing these variables by something generic. However, not all is lost; our next theorem shows that the following “summarization” of X can still be carried through: Suppose X is a vector with exchangeable components, having ?nite fourth moments. Let (1) ? ? := 1 n Xi n i=1 and σ ? 2 := 1 n (Xi ? ? ?)2 . n i=1

? := Let Z be a standard Gaussian vector in Rn , independent of X. Let Z n 1 i=1 Zi and n (2) ? ), Yi := ? ?+σ ? (Zi ? Z i = 1, . . . , n. Then, for su?ciently well-behaved f (to be described below), we have Ef (X) ≈ Ef (Y ). That is, X can be “replaced” by the modi?ed vector Y for evaluation under suitably smooth f . Note that in the process, we summarized the random vector X into the couple (? ?, σ ? ). The precise statement is as follows:

Theorem 1.2. Suppose X is a random vector with exchangeable components, and ? ?, σ ? and Y are de?ned as in (1) and (2). Let f : Rn → R be a thrice continuously di?erentiable function, and for r = 1, 2, 3, let L′ r (f ) be a uniform bound on all r th partial derivatives of f , including mixed partials. For each p, let mp = E|X1 ? ? ?|p . Then we have the bound

1/2 |Ef (X) ? Ef (Y )| ≤ 9.5m4 L′ + 13m3 L′ 2 (f )n 3 (f )n. 1/2

We postpone the (somewhat long) proof of this theorem until Section 3, giving only a brief sketch at this point. The ?rst step is to show that there is no loss of generality in assuming that ? ? ≡ 0 and σ ? ≡ 1. Having assumed that, if we de?ne Ri = Xi +

i?1 1 Xj , n ? i + 1 j =1

then it is a straightforward exercise (which we will work out, nevertheless) that E(Ri |Fi?1 ) = 0, where Fi?1 is the sigma-algebra generated by X1 , . . . , Xi?1 . The next step is to prove

2 E(Ri |Fi?1 ) = 1 + O((n ? i + 1)?1/2 ),

which is computationally slightly harder. Having established these approximations, we can now replace Ri ’s by independent Gaussian variables V1 , . . . , Vn ,

4

S. CHATTERJEE

using Theorem 1.1. However, the inverse transform, which takes R to X, does not take V to Y or anything resembling Y , but to something that is close to Y in distribution. This will be formalized using Gaussian interpolation techniques. Before moving on to applications, let us quickly mention without detailed justi?cation that Theorem 1.2 has no apparent connection with de Finetti’s theorem [6] or its ?nite version due to Diaconis and Freedman [8]. An application : Wigner ’s law for exchangeable random variables. Let us begin with a very quick introduction to some necessary material from random matrix theory. Spectral measures. The empirical spectral distribution (ESD) of an N × N 1 N square matrix A is the probability distribution N i=1 δλi , where λ1 , . . . , λN are the eigenvalues of A repeated by multiplicities, and δx denotes the point mass at x. The weak limit of a sequence of ESDs is called the limiting spectral distribution (LSD) of the corresponding sequence of matrices. The existence and identi?cation of LSDs for various kinds of random matrices is one of the main goals of random matrix theory. For a Hermitian matrix, the ESD is supported on the real line and hence has a corresponding cumulative distribution function. We will denote the c.d.f. for the ESD of a Hermitian 1 #{i : λi ≤ x}. matrix A by FA . Explicitly, FA (x) = N Wigner matrices. A standard Gaussian Wigner matrix of order N is a matrix of the form AN = (N ?1/2 Xij )1≤i,j ≤N , where (Xij )1≤i≤j ≤N is a collection of i.i.d. standard Gaussian random variables, and Xij = Xji for i > j . Wigner [21] showed that the LSD for a sequence of standard Gaussian Wigner matrices (with order N → ∞) is the semicircle law, which has density √ (2π )?1 4 ? x2 in the interval [?2, 2]. It was later shown that the distribution of the entries does not play a signi?cant role; convergence to the semicircle law holds under more general conditions (cf. [1, 2, 10]). The weakest known condition was given by Pastur [14]. It is claimed that the condition was shown to be necessary by Girko [9]. Although most conditions require independence of the entries on and above the diagonal, there have been some advances (e.g., [7, 17]) allowing certain kinds of dependence. However, none of these cover the case of exchangeable entries. For a detailed exposition of the key results about the spectra of Wigner matrices and other results in the study of the spectral behavior of large random matrices, see [2] or [11]. Here we consider the question of identifying the limiting spectral distributions for Wigner matrices with exchangeable entries. The following theorem gives a precise answer under minimal assumptions.

LINDEBERG PRINCIPLE

5

Theorem 1.3. Suppose that for each N we have a random matrix AN = N N) (N ?1/2 Xij 1≤i,j ≤N , where the collection (Xij )1≤i≤j ≤N is exchangeable and N = X N for i > j . Let Xij ji ? ?N := 2 XN N (N + 1) i≤j ≤N ij and

2 σ ?N :=

2 (X N ? ? ?N )2 . N (N + 1) i≤j ≤N ij

N ?? ?N )/σ ?N |4 < ∞. Assume that σ ?N > 0 a.s. for all N and supN ≥1 E|(X11 ?1 Then the empirical spectral distribution of σ ?N AN converges weakly to the semicircle law in probability.

We will derive the above result from a quantitative bound (Lemma 4.1) on the di?erence between Stieltjes transforms (to be discussed in Section 4). The proof will show that it is actually enough to assume the weaker condition N ?? ? N )/σ ?N |4 = o(N 2/3 ) as N → ∞. It will also be evident that the that E|(X11 argument can be adapted to more complicated exchangeability assumptions than the most basic one assumed above. 2. Proof of Theorem 1.1. Throughout the remainder of this article, we ?f will use the notation ?i f instead of the more familiar ?x . Similarly, we will i

? f write ?i ?j f instead of ?x and so on. i ?xj Now let us begin with the proof. Without loss of generality, we can assume that X and Y are de?ned on the same probability space and are independent. For each i, 0 ≤ i ≤ n, let

2

Zi = (X1 , . . . , Xi , Yi+1 , . . . , Yn ) and Then clearly

n

Z0 i = (X1 , . . . , Xi?1 , 0, Yi+1 , . . . , Yn ).

Ef (X) ? Ef (Y ) =

i=1

(Ef (Zi ) ? Ef (Zi?1 )).

Now, by third-order Taylor approximation,

0 f (Zi ) ? f (Z0 i ) ? Xi ?i f (Zi ) ?

Xi2 2 |Xi |3 L3 (f ) ≤ ?i f (Z0 ) i 2 6 |Yi |3 L3 (f ) Yi2 2 ?i f (Z0 . i) ≤ 2 6

and similarly,

0 f (Zi?1 ) ? f (Z0 i ) ? Yi ?i f (Zi ) ?

Now, since the Yi ’s are independent, we have

0 E((Xi ? Yi ) ?i f (Z0 i )) = E((E(Xi |X1 , . . . , Xi?1 ) ? E(Yi )) ?i f (Zi )).

6

S. CHATTERJEE

Similarly, we also have Thus, for any i,

2 2 2 2 0 E((Xi2 ? Yi2 ) ?i f (Z0 i )) = E((E(Xi |X1 , . . . , Xi?1 ) ? E(Yi )) ?i f (Zi )).

3 3 |Ef (Zi ) ? E(Zi?1 )| ≤ 1 6 L3 (f )(E|Xi | + E|Yi | )

This completes the proof.

1 ≤1 6 L3 (f )M3 + Ai L1 (f ) + 2 Bi L2 (f ).

2 2 2 0 1 + |E((Xi ? Yi ) ?i f (Z0 i ))| + 2 |E((Xi ? Yi )?i f (Zi ))|

3. Proof of Theorem 1.2. First, note that X = Y on the event {σ ? = 0}. Thus, if P{σ ? = 0} = 1, there is nothing to prove. If P{σ ? = 0} < 1, we can condition on the event {σ ? > 0} and consequently assume, without loss of generality, that σ ? > 0 almost surely, because the conditioning retains the exchangeability of the Xi ’s. Thus, let us assume that P{σ ? > 0} = 1. ? i = (Xi ? ? ? i = 0 and n X ?2 For i = 1, . . . , n let X ?)/σ ? . Then n X i=1 i=1 i = n (we will be using these identities numerous times, often without mention). ?i = Zi ? Z ? , where Z ?= 1 n Let Z i=1 Zi . n In the following, we will use E0 and P0 to denote the expectation and ? is a vector with probability conditional on the pair (? ?, σ ? ). Observe that X exchangeable components under P0 for all values of (? ?, σ ? ). Now assume that (? ?, σ ? ) is given and ?xed. Let f0 (x1 , . . . , xn ) := f (? ?+σ ? x1 , . . . , ? ?+σ ? xn ). ′ ? ? Then f (X) = f0 (X) and f (Y ) = f0 (Z). Note that Lr (f0 ) = L′ σ r , where r (f )? ′ Lr (g) denotes a uniform bound on the r th order derivatives of a function g , including mixed partials. First, we need to do a list of computations. For 0 ≤ i ≤ n, let Fi be ?1 , . . . , X ? i } and (? ? j ’s are the sigma-algebra generated by {X ?, σ ? ). Since the X ?k |Fi?1 ) = E0 (X ? l |Fi?1 ) for every exchangeable given (? ?, σ ? ), we have E0 (X k, l > i ? 1, and hence ? i |Fi?1 ) = E0 E0 (X Now, since

n ? j =1 Xj n 1 ? j Fi?1 . X n ? i + 1 j =i

= 0,

n i?1 1 1 ?j = ? ?j , X X n ? i + 1 j =i n ? i + 1 j =1

which is Fi?1 -measurable. Thus, (3) ? i |Fi?1 ) = ? E0 (X

i?1 n 1 1 ?j = ?j . X X n ? i + 1 j =1 n ? i + 1 j =i

LINDEBERG PRINCIPLE

7

From the above identity and exchangeability, it follows that ? i |Fi?1 )2 ) = E0 (E0 (X 1 ? 2) [(n ? i + 1)E0 (X 1 (n ? i + 1)2 ?1 X ? 2 )]. + (n ? i + 1)(n ? i)E0 (X ? 2 ) = E0 ( 1 Now clearly E0 (X 1 n ?1 X ?2 ) = E0 (X

n ?2 i=1 Xi ) = 1

and

1 n(n ? 1)

?i X ?j ) E0 (X

i=j

=?

n 1 ?i2 ) = ? 1 . E0 (X n(n ? 1) i=1 n?1 n ? j =1 Xj

(The second equality holds because (4) ?i |Fi?1 )2 ) = E0 (E0 (X

= 0.) Combining, we get

(i ? 1) . (n ? i + 1)(n ? 1)

Again, by a similar argument as before (using the identity we have ? i2 |Fi?1 ) = E0 (X It follows that ? 2 |Fi?1 )) = Var0 (E0 (X i 1 ? 2) [(n ? i + 1) Var0 (X 1 (n ? i + 1)2

n 1 ? 2. X n ? i + 1 j =i j

n ?2 i=1 Xi

= n),

? 2, X ? 2 )]. + (n ? i + 1)(n ? i) Cov 0 (X 1 2 ? 4 ), and ? 2 ) ≤ E0 (X Now, Var0 (X 1 1

2 ?2 ?1 Cov0 (X , X2 ) =

1 n(n ? 1)

i=j

2 2 2 ?i2 X ?j ?1 ?2 E0 (X ) ? E0 (X )E0 (X )

= = Combining, we get (5)

n 1 ? 2 (n ? X ? 2 )) ? 1 E0 ( X i i n(n ? 1) i=1

? 4) 1 ? E0 (X 1 ≤ 0, n?1

? 4 ) ≥ (E0 (X ? 2 ))2 = 1. since E0 (X 1 1

?4 ?i2 |Fi?1 )) ≤ E0 (X1 ) . Var0 (E0 (X n?i+1

8

S. CHATTERJEE

Now let G be the matrix whose (i, j )th element is gij =

? ? 1/(n ? i + 1),

? . Then Ri is Fi -measurable, and from (3), we see that Let R = GX (6) Next, note that

2 ? i ? E0 (X ? i |Fi?1 ))2 |Fi?1 ) E0 (Ri |Fi?1 ) = E0 ((X

1, ? 0,

if i > j , if i = j , if i < j .

E0 (Ri |Fi?1 ) = 0.

? 2 ) = 1, we get Using (4), (5), the triangle inequality and the fact that E0 (X i

2 ?i2 |Fi?1 ) ? 1| + E0 (E0 (X ? i |Fi?1 )2 ) E0 |E0 (Ri |Fi?1 ) ? 1| ≤ E0 |E0 (X

?i2 |Fi?1 ) ? (E0 (X ?i |Fi?1 ))2 . = E0 (X

(7)

≤ ≤2

? 4) (i ? 1) E0 (X 1 + n ? i + 1 (n ? i + 1)(n ? 1) ? 4) E0 (X 1 n?i+1

4 ?1 since E0 (X ) ≥ 1.

3 1/3 , which is the We will now temporarily use the notation Ri 0 3 for (E0 |Ri | ) conditional L3 norm of Ri . By Minkowski’s inequality and exchangeability, we have

Ri

0 3

?i ≤ X

0 3

+

n 1 ?j X n ? i + 1 j =i

0 3

?1 0 . =2 X 3

This bound can be rewritten as (8) E0 |Ri |3 ≤ 8? σ ?3 E0 |X1 ? ? ?|3 .

Now de?ne the function f1 : Rn → R as f1 (x) := f0 (G?1 x). Then f1 (R) = ? ). Let gij denote the (i, j )th element of G?1 . It is a simple exercise to f0 (X verify that g =

ij

? ? ?1/(n ? j ), ?

1, 0,

if i > j , if i = j , if i < j .

Using the chain rule we see that for any j , r and x,

r ?j f1 (x) = 1≤i1 ,...,ir ≤n

(?i1 ?i2 · · · ?ir f0 )(G?1 x)gi1 j gi2 j · · · gir j .

LINDEBERG PRINCIPLE

9

Thus for any r ,

n r

(9)

Lr (f1 ) ≤ L′ r (f0 ) max

1≤j ≤n

i=1

|gij |

r ′ = L′ σ r 2r , r (f0 )2 = Lr (f )?

where Lr is de?ned as in the statement of Theorem 1.1. Now let V be a standard Gaussian vector in Rn . Using the bounds from (6)–(9) in Theorem 1.1, we get |E0 f1 (R) ? E0 f1 (V)|

n 1 1 2 E0 |E0 (Ri |Fi?1 ) ? 1| + nL3 (f1 ) max (E0 |Ri |3 + E|Vi |3 ) ≤ L2 (f1 ) 1≤i≤n 2 6 i=1 n

≤ 4L′ σ2 2 (f )?

i=1

? 1 |4 = σ Now E|X ? ?4 E0 |X1 ? ? ?√ |4 , and by comparing sums with integrals, we n ? 1 / 2 ≤ 2 n. Thus, have i=1 (n ? i + 1) |E0 f1 (R) ? E0 f1 (V)| (10) ≤ 8L′ ?|4 2 (f ) nE0 |X1 ? ?

′ ?|3 + σ ? 3 E|V1 |3 ). +4 3 nL3 (f )(8E0 |X1 ? ?

? 4) E0 (X 8 1 + nL′ (f )? σ 3 (8? σ ?3 E0 |X1 ? ? ?|3 + E|V1 |3 ). n?i+1 6 3

Let U = G?1 V. Explicitly,

Ui = Vi ?

Vj . n?j j =1

i?1

? ) = f (Y ) and f1 (R) = f0 (X ? ) = f (X). Now note that f1 (V) = f0 (U), f0 (Z Combining, we get ? )| |E0 f (X) ? E0 f (Y )| = |E0 f1 (R) ? E0 f0 (Z (11) ≤ |E0 f1 (R) ? E0 f1 (V)| ? )|. + |E0 f0 (U) ? E0 f0 (Z

We already have a bound on |E0 f1 (R) ? E0 f1 (V)| from (10). We will now ? )|, where recall that Z ?i = Zi ? Z ?, compute a bound on |E0 f0 (U) ? E0 f0 (Z and Z is a standard Gaussian vector. To do that, we ?rst need to do some computations. Let σ ?ij := Cov(Ui , Uj ). Then

? j ?1 ? ? ? 1 ? ? ?(n ? j ) + (n ? k)?2 , ? ? ? ? k =1

j ?1

if i > j , if i = j , if i < j .

σ ?ij =

? ? 1+ (n ? k)?2 , ? ? ? ? k =1 ? ?

σ ?ji ,

10

S. CHATTERJEE

Now, for i > j , we can rewrite the ?rst term in σ ?ij as a telescoping sum to get σ ?ij = ? =? Thus, if we de?ne ?i , Z ?j ) = σij := Cov(Z then

n i,j n i?1

1 1 1 ? ? n ? 1 k=1 n ? k ? 1 n ? k 1 1 ? . 2 n ? 1 k=1 (n ? k) (n ? k ? 1) ?1/n, (n ? 1)/n,

j ?1

j ?1

+

1 (n ? k)2 k =1

j ?1

if i = j , if i = j ,

|σij ? σ ?ij | =

i=1

|σii ? σ ?ii | + 2

n i?1

i=1 j =1

|σij ? σ ?ij |

≤2+ (12) +2

1 ( n ? k)2 i=1 k =1

n i?1 j ?1 i=1 j =1 k =1 n?1

1 (n ? k)2 (n ? k ? 1)

=2+

n?2 n?1 1 1 1 + =3+2 . n ? k k=1 n ? k k k =1 k =2

We will use the well-known “Gaussian interpolation technique” for bounding ? )|. This classical method for proving Slepian-type inequal|E0 f0 (U) ? E0 f0 (Z ities has been used extensively in recent years by Talagrand [19] in his e?orts to obtain a rigorous the cavity method for spin glasses. For each √version of√ ? . Then t ∈ [0, 1], let Wt = 1 ? tU + tZ ? ) ? E0 f0 (U) E0 f0 (Z

1

(13)

= E0 = E0

d f0 (Wt ) dt 0 dt 1 n ?i Ui Z √ ?i f0 (Wt ) dt . ? √ 2 1?t 0 i=1 2 t

Now, if a random vector ξ = (ξ1 , . . . , ξn ) has a centered Gaussian distribution, then it is not di?cult to show using integration by parts that for any di?erentiable function h with subexponential growth at in?nity, and any i,

LINDEBERG PRINCIPLE

11

the following identity holds:

n

E(ξi h(ξ )) =

j =1

E(ξi ξj )E(?j h(ξ )).

Since we do not want to expand our list of references, let us refer to Appendix A.6 of Talagrand’s book [19] for a proof. Applying this result to our problem (after noting that interchanging integrals is not an issue since everything is bounded), we get E0 (Ui ?i f0 (Wt )) = and similarly, ?i ?i f0 (Wt )) = E0 (Z Combining, we have ? ) ? E0 f0 (U) = E0 f0 (Z

1 2 1 0 1≤i,j ≤n

√ 1?t

n

σ ?ij E0 (?j ?i f0 (Wt ))

j =1

√ t

n

σij E0 (?j ?i f0 (Wt )).

j =1

E0 [?i ?j f0 (Wt )](σij ? σ ?ij ) dt.

Using the bound from (12), we get (14)

n?1 1 ? )| ≤ 1 L′ (f0 ) 3 + 2 |E0 f0 (U) ? E0 f0 (Z . 2 2 k k =2

Combining this with (10) and (14), we get |Ef (X) ? Ef (Y )| ≤ E|E0 f (X) ? E0 f (Y )| ≤ 8L′ ?|4 2 (f ) nE|X1 ? ? 4 (f )(8E|X1 ? ? ?|3 + E(? σ 3 )E|V1 |3 ) + nL′ 3 3

n?1 1 ′ 1 2 + L2 (f )E(? σ ) 3+2 . 2 k k =2

To complete the proof, we apply Jensen’s inequality to get E(? σ r ) ≤ E|X1 ? √ n?1 ?1 r ? ?| for r ≥ 2 and use the crude bounds 3 + 2 k=2 k ≤ 3 n and E|V1 |3 ≤ 1.7 to unify terms.

12

S. CHATTERJEE

4. Proof of Theorem 1.3. We will now prove Theorem 1.3 via an application of Theorem 1.2, by using the Stieltjes transform of the spectral measure as a smooth function of the matrix entries. The Stieltjes transform (or Cauchy transform, or resolvent) of a cumulative probability distribution function F on R is de?ned as ∞ 1 (15) dF (x) for every z ∈ C\R. mF (z ) := x ? z ?∞ Analogously, the Stieltjes transform of an N × N Hermitian matrix A at a number z ∈ C\R is de?ned as 1 Tr((A ? zI )?1 ), N where I is the identity matrix of order N . Note that this is just the Stieltjes transform of the empirical spectral distribution (ESD) of A. The ESDs of a sequence {AN }∞ N =1 of random Hermitian matrices converge weakly in probability to a distribution F if and only if (16) mA (z ) := mAN (z ) ?→ mF (z )

P

for every z ∈ C\R.

For the proof of this result and further details like Berry–Esseen-type error bounds, we refer to [2], pages 639–640. Now recall that if A(x) is a matrix-valued di?erentiable function of a scalar x, and G(x) := (A(x) ? zI )?1 , where z ∈ C\R and I is the identity matrix, then dG dA = ?G G. dx dx This standard result is obtained by di?erentiating both sides of the identity G(A ? zI ) ≡ I . Di?erentiability follows from the fact that the elements of the inverse of a matrix are all rational functions of the elements of the original matrix. Higher-order derivatives may be computed by repeatedly applying the above formula. The following lemma is the key to the proof of Theorem 1.3: (17) ?N = Lemma 4.1. Suppose that for each N we have a random matrix A ? 1 / 2 N N ? ? (N Xij )1≤i,j ≤N , where the collection (Xij )1≤i≤j ≤N is exchangeable and N ? ? N for i > j . Suppose Xij = X ji 2 ?N = 0 X N (N + 1) i≤j ≤N ij and 2 ? N )2 = 1 (X N (N + 1) i≤j ≤N ij a.s.

N) For each N , let (Zij 1≤i≤j ≤N be a collection of i.i.d. standard Gaussian 2 N N = ZN ? Z ? N , where Z ?N = random variables and let Yij i≤j Zij . ij N (N +1)

LINDEBERG PRINCIPLE

13

N = Y N for i > j , and let B = (N ?1/2 Y N ) Let Yij N ij 1≤i,j ≤N . Then, for any ji z ∈ C\R, and any g : R → R with bounded derivatives up to the third order, we have

|Eg(Re mA ?N (z )) ? Eg (Re mBN (z ))| where m is the Stieltjes transform as de?ned in (16) and C1 and C2 are constants depending only on g and z . The quantity |Eg(Im mA ?N (z )) ? Eg (Im mBN (z ))| also admits the same upper bound. To complete the proof of Theorem 1.3 using this lemma, we need the following fact about spectral distributions of Hermitian matrices: Lemma 4.2 (Quoted from [2], Lemma 2.2). Let A and B be two N × N Hermitian matrices, with empirical distribution functions FA and FB . Then 1 FA ? FB ∞ ≤ rank(A ? B ). N This lemma is an easy consequence of the well-known interlacing inequalities for eigenvalues of Hermitian matrices. Let us now complete the proof of Theorem 1.3 by combining Lemma 4.1 and Lemma 4.2. ?N = ? N = (X N ? ? ?N )/σ ?N , and let A Proof of Theorem 1.3. Let X ij ij ? 1 / 2 N ?N satis?es the hypotheses of Lemma 4.1. ? )1≤i,j ≤N . Clearly, A (N X ij Thus, we have |Eg(Re mA ?N (z )) ? Eg (Re mBN (z ))| The same bound holds for |Eg(Im mA ?N (z )) ? Eg (Im mBN (z ))| as well. The 4 N ? | = o(N 2/3 ), and thus under that condition, bound converges to zero if E|X 12 ?N and BN must have the same LSD. Finally, observe that by Lemma 4.2, A ?1 ?N }, and FB converges the sequence {σ ?N AN } has the same LSD as {A N weakly to the semicircle distribution in probability. This completes the proof. Proof of Lemma 4.1. To formalize things in a way that is suitable for our purpose, consider the map A which “constructs” Wigner matrices of order N . Let n = N (N + 1)/2 and write elements of Rn as x = (xij )1≤i≤j ≤N . For any x ∈ Rn , let A(x) be the matrix de?ned as (18) A(x)ij := N ?1/2 xij , N ?1/2 xji , if i ≤ j , if i > j .

N 4 1/2 N 3 ?12 ? 12 ≤ C1 N ?1 (E|X | ) + C2 N ?1/2 E|X | .

? N |4 )1/2 + C2 N ?1/2 E|X ? N |3 , ≤ C1 N ?1 (E|X 12 12

14

S. CHATTERJEE

√ Now let us ?x z = u + ?1v ∈ C, with v = 0. Let G(x) := (A(x) ? zI )?1 , and de?ne h : Rn → R as h(x) := N ?1 Tr(G(x)). For any α ∈ {(i, j )}1≤i≤j ≤n , we will write ?α h for ?h/?xα by our usual convention. From (17), it follows that for any α, (19) Now note that for any α, β ∈ {(i, j )}1≤i≤j ≤n , we have ?β ?α A ≡ 0. An easy computation involving repeated applications of (17) to the above expression for ?α h gives, for any α, β, γ ∈ {(i, j )}1≤i≤j ≤N , (20) ?β ?α h = N ?1 Tr(G(?β ′ A)G(?α′ A)G),

{β ′ ,α′ }={β,α}

?α h = ?N ?1 Tr(G(?α A)G).

(21) ?γ ?β ?α h = ?N ?1

Tr(G(?γ ′ A)G(?β ′ A)G(?α′ A)G).

{γ ′ ,β ′ ,α′ }={γ,β,α}

Note that the ?rst sum runs over all permutations of (β, α), which amounts to only two terms. Similarly, the second sum involves six terms. To bound (19), ?rst note that Tr(G(?α A)G) = Tr((?α A)G2 ). Since G2 has a spectral decomposition and all its eigenvalues are bounded by |v |?2 in magnitude, it follows in particular that the elements of G2 are also bounded by |v |?2 . Now, ?α A has at most two nonzero elements, which are equal to N ?1/2 . Hence, | Tr(G(?α A)G)| = | Tr((?α A)G2 )| ≤ 2|v |?2 N ?1/2 . To bound (20) and (21), we need to recall the properties of the Hilbert– Schmidt norm for matrices. For an N × N complex matrix B = (bij )1≤i,j ≤N , the Hilbert–Schmidt norm of B is de?ned as B := ( i,j |bij |2 )1/2 . Besides the usual properties of a matrix norm, it has the following additional features: (a) | Tr(BC )| ≤ B C , (b) if U is a unitary matrix, then for any C of the same order, CU = U C = C , and (c) for a matrix B admitting a spectral decomposition (i.e., a normal matrix) with eigenvalues λ1 , . . . , λN , and any other matrix C of the same order, max{ BC , CB } ≤ max1≤i≤N |λi | · C . For a proof of these standard facts one can look up, for example, [22], pages 55–58. Clearly, G and the derivatives of A are all normal matrices. Moreover, the eigenvalues of G are bounded by |v |?1 , where v = Im z . Thus, by the properties of the Hilbert–Schmidt norm listed above, we have | Tr(G(?β A)G(?α A)G)| ≤ G(?β A) G(?α A)G ≤ 2|v |?3 N ?1 . ≤ |v |?3 ?β A ?α A

LINDEBERG PRINCIPLE

15

Similarly, | Tr(G(?γ A)G(?β A)G(?α A)G)| ≤ G(?γ A) G(?β A)G(?α A)G ≤ G(?γ A) G(?β A)G (?α A)G ≤ |v |?4 ?γ A ?β A ?α A ≤ 23/2 |v |?4 N ?3/2 .

Finally, note that since the matrix entries are all real, therefore ?α (Re h) = Re ?α h and so on. Thus, if we let f = g ? (Re h), then substituting the bounds ?2 and L′ (f ) ≤ obtained above in (19), (20) and (21), we get L′ 3 2 (f ) ≤ K1 N ? 5 / 2 K2 N , where K1 and K2 are constants depending only on g and z . By Theorem 1.2, it now follows that ? ) ? Ef (Y )| ≤ 9.5K1 N ?1 (E|X ? 12 |4 )1/2 + 13K2 N ?1/2 E|X ? 12 |3 , |Ef (X

? , and Zij ’s are i.i.d. standard Gaussian random variables. where Yij = Zij ? Z This completes the proof of the lemma. Acknowledgments. The author thanks the anonymous referee for his very detailed and helpful comments on the material and the style of presentation. He is also grateful to his former thesis advisor Persi Diaconis for constant encouragement and support. REFERENCES

[1] Arnold, L. (1967). On the asymptotic distribution of the eigenvalues of random matrices. J. Math. Anal. Appl. 20 262–268. MR0217833 [2] Bai, Z. D. (1999). Methodologies in the spectral analysis of large-dimensional random matrices, a review. Statist. Sinica 9 611–677. MR1711663 [3] Baik, J. and Suidan, T. M. (2005). A GUE central limit theorem and universality of directed ?rst and last passage site percolation. Int. Math. Res. Not. 6 325–337. MR2131383 [4] Bodineau, T. and Martin, J. (2005). A universality property for last-passage percolation paths close to the axis. Electron. Comm. Probab. 10 105–112. MR2150699 [5] Chatterjee, S. (2004). A simple invariance theorem. Available at http://arxiv.org/math.PR/0508213 . [6] de Finetti, B. (1969). Sulla prosequibilit` a di processi aleatori scambiabili. Rend. Ist. Mat. Univ. Trieste 1 53–67. MR0292146 [7] Boutet de Monvel, A. and Khorunzhy, A. (1998). Limit theorems for random matrices. Markov Process. Related Fields 4 175–197. MR1641617 [8] Diaconis, P. and Freedman, D. (1980). Finite exchangeable sequences. Ann. Probab. 8 745–764. MR0577313 [9] Girko, V. L. (1988). Spectral Theory of Random Matrices. Nauka, Moscow. MR0955497 [10] Grenander, U. (1963). Probabilities on Algebraic Structures. Wiley, New York. MR0259969

16

S. CHATTERJEE

[11] Khorunzhy, A. M., Khoruzhenko, B. A. and Pastur, L. A. (1996). Asymptotic properties of large random matrices with independent entries. J. Math. Phys. 37 5033–5060. MR1411619 [12] Lindeberg, J. W. (1922). Eine neue herleitung des exponentialgesetzes in der wahrscheinlichkeitsrechnung. Math. Z. 15 211–225. MR1544569 [13] Mossel, E., O’Donnell, R. and Oleszkiewicz, K. (2005). Noise stability of functions with low in?uences: Invariance and optimality. Available at http://arxiv.org/math.PR/0503503 . [14] Pastur, L. A. (1972). The spectrum of random matrices. Teoret. Mat. Fiz. 10 102– 112. MR0475502 ˇkauskas, A. (1989). Approximation Theory in the Cen[15] Paulauskas, V. and Rac tral Limit Theorem. Exact Results in Banach Spaces. Kluwer, Dordrecht. MR1015294 [16] Rotar, V. I. (1979). Limit theorems for polylinear forms. J. Multivariate Anal. 9 511–530. MR0556909 [17] Schenker, J. H. and Schulz-Baldes, H. (2005). Semicircle law and freeness for random matrices with symmetries or correlations. Math. Res. Lett. 12 531–542. MR2155229 [18] Suidan, T. (2006). A remark on a theorem of Chatterjee and last passage percolation. J. Phys. A 39 8977–8981. MR2240468 [19] Talagrand, M. (2003). Spin Glasses: A Challenge for Mathematicians. Cavity and Mean Field Models. Springer, Berlin. MR1993891 [20] Trotter, H. F. (1959). Elementary proof of the central limit theorem. Archiv der Mathem. 10 226–234. MR0108847 [21] Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Ann. of Math. (2) 67 325–327. MR0095527 [22] Wilkinson, J. H. (1965). The Algebraic Eigenvalue Problem. Clarendon Press, Oxford. MR0184422 [23] Zolotarev, V. M. (1977). Ideal metrics in the problem of approximating the distributions of sums of independent random variables. Theory Probab. Appl. 22 449–465. MR0455066

Department of Statistics 367 Evans Hall #3860 University of California, Berkeley Berkeley, California 94720 USA E-mail: sourav@stat.berkeley.edu URL: http://www.stat.berkeley.edu/?sourav

- Generalization of the Principle of Chopper Stabilization
- Estimating the Generalization Performance of a SVM
- Generalization of the Atkinson-Wilcox Theorem and the Development of a Novel Scaled Boundar
- A novel generalization of the gray-scale histogram and its application
- Estimating the Generalization Performance of a SVM Efficiently
- A Generalization of the Least General Generalization
- 1 An Economic Theory of the GATT A Generalization
- THE GENERALIZATION ARGUMENT AND THE GENERALIZATION PRINCIPLE
- The B-Exponential Map A Generalization of the Logistic Map, and Its Applications In Generat
- A Generalization of Wilkie's Theorem of the Complement, and an Application to Pfaffian Clos

更多相关文章：
**
Anisotropic diffusion - Wikipedia, ***the* free encyclopedia.pdf

Both these cases can be described by*a* *generalization* *of* *the* usual ...^ *Lindeberg*, T., Scale-Space Theory in Computer Vision, Kluwer Academic ...**
Scale-Space Events for ***the* *Generalization* *of* 3D-Building Data....pdf

Scale-Space Events for*the* *Generalization* *of* 3D-Building Data Adjustment_专业...nition *of* *the* concept “scale” (*Lindeberg*, 1994) in terms *of* signals/...**
...statistics ***a* *generalization* *of* Bose-Einstein's *principle*_....pdf

Non-exclusion statistics*a* *generalization* *of* Bose-Einstein's *principle*_专业资料。By constructing *the* super-particle representation *of* *the* free boson gas, we ...**
***Generalization* *of* *the* *Principle* *of* Chopper Stabilization.pdf

50, NO. 8, AUGUST 2003 975*Generalization* *of* *the* *Principle* *of* Chopper Stabilization László Tóth and Yannis P. Tsividis, Fellow, IEEE Abstract*The* ...**
Constrained twistors ***principle* *of* *the* string theory.pdf

Constrained twistors*principle* *of* *the* string theory_专业资料。*The* new ...*The* *generalization* *of* these results to *the* case *of* six space-time ...**
Remark on ***the* Second *Principle* *of* Thermodynamics.pdf

Remark on*the* Second *Principle* *of* Thermodynamics_专业资料。All presently ...*A* variety *of* mathematical tools [11] (e.g., *the* q-*generalization* *of* ...**
...Aerodynamics for Complex Configurations ***A* *General* Theory_....pdf

*The* theory presented here is *a* *generalization* *of* *the* Huygens' *Principle* to ...Mangier, K. W. and Smith, J. H. B, "Behaviour *of* *the* Vortex pp. ...**
Resolution and ***the* weak pigeonhole *principle*.pdf

In some cases, ner distinctions can be made using generalized forms*of* *the* pigeonhole *principle*. One such *principle* is *the* \m into n" *generalization* ...**
...***the* Cosmological Parameters in *the* Relativistic Theory *of* ....pdf

27/387,*A*-1040 Vienna, Austria u Abstract *The* causality *principle* imposes ...*The* *generalization* *of* *the* ?eld equation by *the* means *of* *the* insertion *of* ...**
An analogue ***of* Beurling’s theorem for *the* Jacobi t....pdf

Keywords: Jacobi transform, Uncertainty*principle*, Beurling’s theorem, Abel ...*The* Jacobi transform can be regarded as *the* *generalization* *of* *the* Helgason-...**
... Part II ***a* Local-Global *Principle* for Lexicographic.pdf

*General* Edge-isoperimetric Inequalities, Part II *a* Local-Global *Principle* for... 1971. 10 ] G.F. Clements and B. Lindstrom, \*A* *generalization* *of* *a* ...**
***A* *Generalization* *of* *the* Maximum Noise Fraction Transform.pdf

Gnanadesikan [9] proposed*a* *generalization* *of* *the* *principle* component transform. Powers *of* *the* original bands were appended to *the* data set as new bands...**
***The* Cooperative *Principle*_图文.ppt

4. Flouting*the* maxim *of* Manner Vague,Over-*generalization* References ? Wikipedia, *the* free encyclopedia,“Cooperative *principle*” https://en.wikipedia.org/...**
$kappa$-***generalization* *of* Gauss' law *of* error.pdf

We show*the* κ-*generalization* *of* Gauss’ law *of* error, in which *the* κ-generalized maximum likelihood *principle* leads to *the* κ-generalized Gaussian ...**
***lindeberg* feller 中心极限定理的一个概率方法证明.pdf

*lindeberg* feller 中心极限定理的一个概率方法证明_理学_高等教育_教育专区。...From this *generalization* it now becomes somewhat clearer why various ...**
...***of* Special Relativity and *the* *Principle* *of* Conse....pdf

Foundations*of* Special Relativity and *the* *Principle* *of* Conservation *of* ...*The* third is *a* *generalization* *of* Newtonian dynamics. *The* two ?rst laws ...**
...Mechanics for Identical Particles ***The* Calogero *a*....pdf

It is pointed out that Haldane’s*generalization* *of* *the* Pauli *principle* can be deduced from *the* anyon model in *a* strong magnetic ?eld at low ...**
...and ***a* Limit on *the* Violation *of* *the* Pauli *Principle* f.pdf

Using this*generalization* we ?nd bounds on possible violations *of* *the* Pauli exclusion *principle* for nucleons and quarks based on such bounds for nuclei. ...**
...***a* Trapped Strongly Interacting Fermi Gases *of* Atoms.pdf

On*the* basis *of* *a* *generalization* *of* *the* variational Schwinger method, we ...4. Schwinger variational *principle* *The* Schwinger variational *principle* (SVP) [...**
Some theoretical properties ***of* interval-valued cond....pdf

Then,*a* ?exible approach can be obtained by using imprecise probabilities, based on *a* suitable *generalization* *of* *the* coherence *principle* *of* de Finetti, or... 更多相关标签：

Both these cases can be described by

Scale-Space Events for

Non-exclusion statistics

50, NO. 8, AUGUST 2003 975

Constrained twistors

Remark on

In some cases, ner distinctions can be made using generalized forms

27/387,

Keywords: Jacobi transform, Uncertainty

Gnanadesikan [9] proposed

4. Flouting

We show

Foundations

It is pointed out that Haldane’s

Using this

On

Then,