

The Annals of Probability 2006, Vol. 34, No. 6, 2061–2076. DOI: 10.1214/009117906000000575. © Institute of Mathematical Statistics, 2006.

arXiv:math/0508519v3 [math.PR] 27 Feb 2007

A GENERALIZATION OF THE LINDEBERG PRINCIPLE

By Sourav Chatterjee
University of California, Berkeley
We generalize Lindeberg's proof of the central limit theorem to an invariance principle for arbitrary smooth functions of independent and weakly dependent random variables. The result is applied to get a similar theorem for smooth functions of exchangeable random variables. This theorem allows us to identify, for the first time, the limiting spectral distributions of Wigner matrices with exchangeable entries.

1. Introduction and results. J. W. Lindeberg's elegant proof of the central limit theorem, despite being in the shadow of Fourier analytic methods for a long time, is now well known. It was revived by Trotter and has since been successfully used to derive CLTs in infinite-dimensional spaces, where the Fourier analytic methods are not so useful. While the original Lindeberg method and its extensions compare the distributions of convolutions in great generality (the history of which is irrelevant to our discussion, so we refer to the references for details), it soon becomes clear that the same principle works not only for sums, but for more general smooth functions as well. Comparison of $f(X_1,\ldots,X_n)$ and $f(Y_1,\ldots,Y_n)$ for polynomial $f$ has been examined by Rotar and by Mossel, O'Donnell and Oleszkiewicz, and for general smooth $f$ with bounded derivatives by Chatterjee, where it is also shown how to apply the method to establish universality in physical models, including the Sherrington–Kirkpatrick model of spin glasses. It was recently observed by Toufic Suidan that those results can be used to give an immediate proof of the universality of last passage percolation in thin rectangles (originally a result of Baik and Suidan and of Bodineau and Martin). Developed independently, the paper of Mossel, O'Donnell and Oleszkiewicz is another repository of very striking modern applications of this very old idea.
Received August 2005; revised March 2006. Supported in part by NSF Grant DMS-04-06042. AMS 2000 subject classifications. Primary 60F17; secondary 60G09, 15A52. Key words and phrases. Lindeberg, de Finetti, Wigner, exchangeable, random matrices, invariance principle, semicircle law.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Probability, 2006, Vol. 34, No. 6, 2061–2076. This reprint differs from the original in pagination and typographic detail.


A closer examination of Lindeberg's method reveals that there is a direct generalization by which the independence of the coordinates can be dispensed with. The argument, which may be possible to guess once the theorem has been stated, will be given in Section 2.

Theorem 1.1. Suppose $\mathbf{X}$ and $\mathbf{Y}$ are random vectors in $\mathbb{R}^n$ with $\mathbf{Y}$ having independent components. For $1 \le i \le n$, let
$$A_i := E|E(X_i \mid X_1,\ldots,X_{i-1}) - E(Y_i)|, \qquad B_i := E|E(X_i^2 \mid X_1,\ldots,X_{i-1}) - E(Y_i^2)|.$$
Let $M_3$ be a bound on $\max_i (E|X_i|^3 + E|Y_i|^3)$. Suppose $f:\mathbb{R}^n \to \mathbb{R}$ is a thrice continuously differentiable function, and for $r = 1,2,3$, let $L_r(f)$ be a finite constant such that $|\partial_i^r f(x)| \le L_r(f)$ for each $i$ and $x$, where $\partial_i^r$ denotes the $r$-fold derivative in the $i$th coordinate. Then
$$|Ef(\mathbf{X}) - Ef(\mathbf{Y})| \le \sum_{i=1}^n \Bigl(A_i L_1(f) + \tfrac{1}{2} B_i L_2(f)\Bigr) + \tfrac{1}{6}\, n L_3(f) M_3.$$
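The inequality is easy to sanity-check numerically. The sketch below is our own illustration, not taken from the paper; the smooth function and the matched distributions are arbitrary choices. With i.i.d. coordinates whose first two moments agree, $A_i = B_i = 0$ and the bound reduces to $\frac16 n L_3(f) M_3$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 200_000

# Smooth test function; every third partial derivative in a single
# coordinate is bounded by L3 = n**(-1.5).
def f(x):
    return np.cos(x.sum(axis=-1) / np.sqrt(n))

# X: i.i.d. centered uniforms with unit variance (non-Gaussian);
# Y: i.i.d. standard Gaussians.  A_i = B_i = 0 since moments match.
X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(reps, n))
Y = rng.standard_normal(size=(reps, n))

diff = abs(f(X).mean() - f(Y).mean())
M3 = 9 / (4 * np.sqrt(3)) + 2 * np.sqrt(2 / np.pi)  # E|X_i|^3 + E|Y_i|^3
bound = n * n**-1.5 * M3 / 6
print(diff, bound)  # the Monte Carlo difference sits well below the bound
```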

Let us now say a bit about the condition of boundedness of third derivatives. The implications of this condition have been inspected in detail in the context of convolutions by Zolotarev and other authors. Zolotarev defines the $\zeta_3$-metric on the space of distributions as follows: $\zeta_3(F,G) = \sup_f |\int f\,dF - \int f\,dG|$, where the supremum is taken over all $f$ with third derivative bounded by 1. The $\zeta_3$-metric has not been so popular in practice because of the difficulty of connecting it with the common notions of distance between measures. However, instead of taking a supremum over a class of $f$'s, we consider only individual functions of interest. For instance, in the random matrix scenario, our $f$ will be the Stieltjes transform of a matrix at a fixed $z \in \mathbb{C}\setminus\mathbb{R}$, which is a nice $C^\infty$ function of the original matrix. In an earlier paper, the author considered the partition function of a disordered physical system (the Sherrington–Kirkpatrick model of spin glasses), which again turns out to be a $C^\infty$ function of the disorder matrix, and has nicely bounded derivatives. The condition of boundedness of the derivatives can be dropped by careful examination of the remainder term; but our focus is different: we are more concerned with ways to extend the method to the case of weakly dependent variables (as in Theorem 1.1 above), and more specifically, to exchangeable random variables, as below.

Exchangeable random variables. Let us now present a surprisingly nontrivial application of the basic tool developed above. Suppose $\mathbf{X}$ is a vector with exchangeable components. Certainly, we cannot


expect to replace the components of $\mathbf{X}$ by independent Gaussians as we did in Theorem 1.1. For instance, all the components may be equal to the same random variable, in which case there is no hope of replacing these variables by something generic. However, not all is lost; our next theorem shows that the following "summarization" of $\mathbf{X}$ can still be carried through. Suppose $\mathbf{X}$ is a vector with exchangeable components, having finite fourth moments. Let
(1) $$\hat\mu := \frac{1}{n}\sum_{i=1}^n X_i \quad\text{and}\quad \hat\sigma^2 := \frac{1}{n}\sum_{i=1}^n (X_i - \hat\mu)^2.$$

Let $\mathbf{Z}$ be a standard Gaussian vector in $\mathbb{R}^n$, independent of $\mathbf{X}$. Let $\bar Z = \frac{1}{n}\sum_{i=1}^n Z_i$ and
(2) $$Y_i := \hat\mu + \hat\sigma (Z_i - \bar Z), \qquad i = 1,\ldots,n.$$
Then, for sufficiently well-behaved $f$ (to be described below), we have $Ef(\mathbf{X}) \approx Ef(\mathbf{Y})$. That is, $\mathbf{X}$ can be "replaced" by the modified vector $\mathbf{Y}$ for evaluation under suitably smooth $f$. Note that in the process, we summarized the random vector $\mathbf{X}$ into the couple $(\hat\mu, \hat\sigma)$. The precise statement is as follows:

Theorem 1.2. Suppose $\mathbf{X}$ is a random vector with exchangeable components, and $\hat\mu$, $\hat\sigma$ and $\mathbf{Y}$ are defined as in (1) and (2). Let $f:\mathbb{R}^n \to \mathbb{R}$ be a thrice continuously differentiable function, and for $r = 1,2,3$, let $L'_r(f)$ be a uniform bound on all $r$th partial derivatives of $f$, including mixed partials. For each $p$, let $m_p = E|X_1 - \hat\mu|^p$. Then we have the bound
$$|Ef(\mathbf{X}) - Ef(\mathbf{Y})| \le 9.5\, m_4^{1/2} L'_2(f)\, n^{1/2} + 13\, m_3 L'_3(f)\, n.$$
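To illustrate the theorem, here is a numerical sketch of our own (the dependence structure and the test function are arbitrary choices, not from the paper): take exchangeable but strongly dependent coordinates, form $\mathbf{Y}$ as in (1) and (2), and compare expectations of a smooth function.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 100_000

def f(x):
    # Smooth test function; all r-th partials are bounded by n**(-r/2).
    return np.cos(x.sum(axis=-1) / np.sqrt(n))

# Exchangeable but dependent: a shared random shift plus i.i.d. noise.
W = rng.standard_normal((reps, 1))
X = W + rng.uniform(-1, 1, size=(reps, n))

# The "summarization" (1)-(2): keep only (mu_hat, sigma_hat), then
# rebuild the vector from a fresh centered Gaussian pattern.
mu = X.mean(axis=1, keepdims=True)
sigma = X.std(axis=1, keepdims=True)     # 1/n normalization, as in (1)
Z = rng.standard_normal((reps, n))
Y = mu + sigma * (Z - Z.mean(axis=1, keepdims=True))

print(f(X).mean(), f(Y).mean())  # close, as Theorem 1.2 predicts
```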

We postpone the (somewhat long) proof of this theorem until Section 3, giving only a brief sketch at this point. The first step is to show that there is no loss of generality in assuming that $\hat\mu \equiv 0$ and $\hat\sigma \equiv 1$. Having assumed that, if we define
$$R_i = X_i + \frac{1}{n-i+1}\sum_{j=1}^{i-1} X_j,$$
then it is a straightforward exercise (which we will work out, nevertheless) that $E(R_i \mid \mathcal{F}_{i-1}) = 0$, where $\mathcal{F}_{i-1}$ is the sigma-algebra generated by $X_1,\ldots,X_{i-1}$. The next step is to prove
$$E(R_i^2 \mid \mathcal{F}_{i-1}) = 1 + O((n-i+1)^{-1/2}),$$
which is computationally slightly harder. Having established these approximations, we can now replace the $R_i$'s by independent Gaussian variables $V_1,\ldots,V_n$,


using Theorem 1.1. However, the inverse transform, which takes $\mathbf{R}$ to $\mathbf{X}$, does not take $\mathbf{V}$ to $\mathbf{Y}$ or anything resembling $\mathbf{Y}$, but to something that is close to $\mathbf{Y}$ in distribution. This will be formalized using Gaussian interpolation techniques. Before moving on to applications, let us quickly mention without detailed justification that Theorem 1.2 has no apparent connection with de Finetti's theorem or its finite version due to Diaconis and Freedman.

An application: Wigner's law for exchangeable random variables. Let us begin with a very quick introduction to some necessary material from random matrix theory.

Spectral measures. The empirical spectral distribution (ESD) of an $N \times N$ square matrix $A$ is the probability distribution $\frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i}$, where $\lambda_1,\ldots,\lambda_N$ are the eigenvalues of $A$ repeated by multiplicities, and $\delta_x$ denotes the point mass at $x$. The weak limit of a sequence of ESDs is called the limiting spectral distribution (LSD) of the corresponding sequence of matrices. The existence and identification of LSDs for various kinds of random matrices is one of the main goals of random matrix theory. For a Hermitian matrix, the ESD is supported on the real line and hence has a corresponding cumulative distribution function. We will denote the c.d.f. of the ESD of a Hermitian matrix $A$ by $F_A$. Explicitly, $F_A(x) = \frac{1}{N}\#\{i : \lambda_i \le x\}$.

Wigner matrices. A standard Gaussian Wigner matrix of order $N$ is a matrix of the form $A_N = (N^{-1/2} X_{ij})_{1\le i,j\le N}$, where $(X_{ij})_{1\le i\le j\le N}$ is a collection of i.i.d. standard Gaussian random variables, and $X_{ij} = X_{ji}$ for $i > j$. Wigner showed that the LSD for a sequence of standard Gaussian Wigner matrices (with order $N \to \infty$) is the semicircle law, which has density $(2\pi)^{-1}\sqrt{4 - x^2}$ on the interval $[-2,2]$. It was later shown that the distribution of the entries does not play a significant role; convergence to the semicircle law holds under more general conditions (cf. [1, 2, 10]). The weakest known condition was given by Pastur.
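As a quick numerical illustration of Wigner's theorem (our own sketch, not part of the paper), one can sample a standard Gaussian Wigner matrix and compare its empirical spectral distribution with the semicircle c.d.f.:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000

# Symmetric matrix with i.i.d. N(0,1) entries on and above the diagonal.
X = rng.standard_normal((N, N))
A = np.triu(X) + np.triu(X, 1).T
eigs = np.linalg.eigvalsh(A / np.sqrt(N))

# Closed-form c.d.f. of the semicircle law on [-2, 2].
def semicircle_cdf(x):
    x = np.clip(x, -2, 2)
    return 0.5 + (x * np.sqrt(4 - x**2) + 4 * np.arcsin(x / 2)) / (4 * np.pi)

grid = np.linspace(-2, 2, 9)
esd = np.array([(eigs <= t).mean() for t in grid])
print(np.max(np.abs(esd - semicircle_cdf(grid))))  # small for large N
```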
It is claimed that the condition was shown to be necessary by Girko. Although most conditions require independence of the entries on and above the diagonal, there have been some advances (e.g., [7, 17]) allowing certain kinds of dependence. However, none of these cover the case of exchangeable entries. For a detailed exposition of the key results about the spectra of Wigner matrices, and of other results in the study of the spectral behavior of large random matrices, we refer to the survey literature. Here we consider the question of identifying the limiting spectral distributions for Wigner matrices with exchangeable entries. The following theorem gives a precise answer under minimal assumptions.


Theorem 1.3. Suppose that for each $N$ we have a random matrix $A_N = (N^{-1/2} X_{ij}^N)_{1\le i,j\le N}$, where the collection $(X_{ij}^N)_{1\le i\le j\le N}$ is exchangeable and $X_{ij}^N = X_{ji}^N$ for $i > j$. Let
$$\hat\mu_N := \frac{2}{N(N+1)}\sum_{i\le j\le N} X_{ij}^N \quad\text{and}\quad \hat\sigma_N^2 := \frac{2}{N(N+1)}\sum_{i\le j\le N} (X_{ij}^N - \hat\mu_N)^2.$$
Assume that $\hat\sigma_N > 0$ a.s. for all $N$ and $\sup_{N\ge 1} E|(X_{11}^N - \hat\mu_N)/\hat\sigma_N|^4 < \infty$. Then the empirical spectral distribution of $\hat\sigma_N^{-1} A_N$ converges weakly to the semicircle law in probability.
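Theorem 1.3 can also be illustrated numerically (again our own sketch; the particular exchangeable law, i.i.d. uniforms all multiplied by one shared random scale, is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000

# Exchangeable but dependent upper-triangular entries: i.i.d. centered
# uniforms, all multiplied by one common random factor S.
S = np.exp(rng.standard_normal())
U = np.triu(rng.uniform(-1, 1, (N, N))) * S
X = U + np.triu(U, 1).T                     # symmetric Wigner-type matrix

iu = np.triu_indices(N)
sigma = X[iu].std()                         # the self-normalization sigma_N

# Spectrum of sigma_N^{-1} A_N with A_N = N^{-1/2} X: the ESD fills the
# semicircle support [-2, 2] despite the common random factor.
eigs = np.linalg.eigvalsh(X / (sigma * np.sqrt(N)))
print(eigs.min(), eigs.max())
```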

We will derive the above result from a quantitative bound (Lemma 4.1) on the difference between Stieltjes transforms (to be discussed in Section 4). The proof will show that it is actually enough to assume the weaker condition that $E|(X_{11}^N - \hat\mu_N)/\hat\sigma_N|^4 = o(N^{2/3})$ as $N \to \infty$. It will also be evident that the argument can be adapted to more complicated exchangeability assumptions than the most basic one assumed above.

2. Proof of Theorem 1.1. Throughout the remainder of this article, we will use the notation $\partial_i f$ instead of the more familiar $\frac{\partial f}{\partial x_i}$. Similarly, we will write $\partial_i \partial_j f$ instead of $\frac{\partial^2 f}{\partial x_i \partial x_j}$, and so on. Now let us begin with the proof. Without loss of generality, we can assume that $\mathbf{X}$ and $\mathbf{Y}$ are defined on the same probability space and are independent. For each $i$, $0 \le i \le n$, let

$$\mathbf{Z}_i = (X_1,\ldots,X_i, Y_{i+1},\ldots,Y_n) \quad\text{and}\quad \mathbf{Z}_i^0 = (X_1,\ldots,X_{i-1}, 0, Y_{i+1},\ldots,Y_n).$$
Then clearly
$$Ef(\mathbf{X}) - Ef(\mathbf{Y}) = \sum_{i=1}^n \bigl(Ef(\mathbf{Z}_i) - Ef(\mathbf{Z}_{i-1})\bigr).$$

Now, by third-order Taylor approximation,
$$\Bigl| f(\mathbf{Z}_i) - f(\mathbf{Z}_i^0) - X_i\,\partial_i f(\mathbf{Z}_i^0) - \frac{X_i^2}{2}\,\partial_i^2 f(\mathbf{Z}_i^0) \Bigr| \le \frac{|X_i|^3 L_3(f)}{6}$$
and similarly,
$$\Bigl| f(\mathbf{Z}_{i-1}) - f(\mathbf{Z}_i^0) - Y_i\,\partial_i f(\mathbf{Z}_i^0) - \frac{Y_i^2}{2}\,\partial_i^2 f(\mathbf{Z}_i^0) \Bigr| \le \frac{|Y_i|^3 L_3(f)}{6}.$$
Now, since the $Y_i$'s are independent, we have
$$E\bigl((X_i - Y_i)\,\partial_i f(\mathbf{Z}_i^0)\bigr) = E\bigl((E(X_i \mid X_1,\ldots,X_{i-1}) - E(Y_i))\,\partial_i f(\mathbf{Z}_i^0)\bigr).$$


Similarly, we also have
$$E\bigl((X_i^2 - Y_i^2)\,\partial_i^2 f(\mathbf{Z}_i^0)\bigr) = E\bigl((E(X_i^2 \mid X_1,\ldots,X_{i-1}) - E(Y_i^2))\,\partial_i^2 f(\mathbf{Z}_i^0)\bigr).$$
Thus, for any $i$,
$$|Ef(\mathbf{Z}_i) - Ef(\mathbf{Z}_{i-1})| \le \tfrac{1}{6} L_3(f)(E|X_i|^3 + E|Y_i|^3) + |E((X_i - Y_i)\,\partial_i f(\mathbf{Z}_i^0))| + \tfrac{1}{2}|E((X_i^2 - Y_i^2)\,\partial_i^2 f(\mathbf{Z}_i^0))| \le \tfrac{1}{6} L_3(f) M_3 + A_i L_1(f) + \tfrac{1}{2} B_i L_2(f).$$
This completes the proof.

3. Proof of Theorem 1.2. First, note that $\mathbf{X} = \mathbf{Y}$ on the event $\{\hat\sigma = 0\}$. Thus, if $P\{\hat\sigma = 0\} = 1$, there is nothing to prove. If $P\{\hat\sigma = 0\} < 1$, we can condition on the event $\{\hat\sigma > 0\}$ and consequently assume, without loss of generality, that $\hat\sigma > 0$ almost surely, because the conditioning retains the exchangeability of the $X_i$'s. Thus, let us assume that $P\{\hat\sigma > 0\} = 1$.

For $i = 1,\ldots,n$ let $\tilde X_i = (X_i - \hat\mu)/\hat\sigma$. Then $\sum_{i=1}^n \tilde X_i = 0$ and $\sum_{i=1}^n \tilde X_i^2 = n$ (we will be using these identities numerous times, often without mention). Let $\tilde Z_i = Z_i - \bar Z$, where $\bar Z = \frac{1}{n}\sum_{i=1}^n Z_i$.

In the following, we will use $E_0$ and $P_0$ to denote the expectation and probability conditional on the pair $(\hat\mu, \hat\sigma)$. Observe that $\tilde{\mathbf{X}}$ is a vector with exchangeable components under $P_0$ for all values of $(\hat\mu, \hat\sigma)$. Now assume that $(\hat\mu, \hat\sigma)$ is given and fixed. Let $f_0(x_1,\ldots,x_n) := f(\hat\mu + \hat\sigma x_1, \ldots, \hat\mu + \hat\sigma x_n)$. Then $f(\mathbf{X}) = f_0(\tilde{\mathbf{X}})$ and $f(\mathbf{Y}) = f_0(\tilde{\mathbf{Z}})$. Note that $L'_r(f_0) = L'_r(f)\,\hat\sigma^r$, where $L'_r(g)$ denotes a uniform bound on the $r$th order derivatives of a function $g$, including mixed partials.

First, we need to do a list of computations. For $0 \le i \le n$, let $\mathcal{F}_i$ be the sigma-algebra generated by $\{\tilde X_1,\ldots,\tilde X_i\}$ and $(\hat\mu, \hat\sigma)$. Since the $\tilde X_j$'s are exchangeable given $(\hat\mu, \hat\sigma)$, we have $E_0(\tilde X_k \mid \mathcal{F}_{i-1}) = E_0(\tilde X_l \mid \mathcal{F}_{i-1})$ for every $k, l > i-1$, and hence
$$E_0(\tilde X_i \mid \mathcal{F}_{i-1}) = E_0\Bigl(\frac{1}{n-i+1}\sum_{j=i}^n \tilde X_j \,\Big|\, \mathcal{F}_{i-1}\Bigr).$$
Now, since $\sum_{j=1}^n \tilde X_j = 0$,
$$\frac{1}{n-i+1}\sum_{j=i}^n \tilde X_j = -\frac{1}{n-i+1}\sum_{j=1}^{i-1} \tilde X_j,$$
which is $\mathcal{F}_{i-1}$-measurable. Thus,
(3) $$E_0(\tilde X_i \mid \mathcal{F}_{i-1}) = -\frac{1}{n-i+1}\sum_{j=1}^{i-1} \tilde X_j = \frac{1}{n-i+1}\sum_{j=i}^n \tilde X_j.$$


From the above identity and exchangeability, it follows that
$$E_0\bigl(E_0(\tilde X_i \mid \mathcal{F}_{i-1})^2\bigr) = \frac{1}{(n-i+1)^2}\bigl[(n-i+1)\, E_0(\tilde X_1^2) + (n-i+1)(n-i)\, E_0(\tilde X_1 \tilde X_2)\bigr].$$
Now clearly $E_0(\tilde X_1^2) = E_0\bigl(\frac{1}{n}\sum_{i=1}^n \tilde X_i^2\bigr) = 1$ and
$$E_0(\tilde X_1 \tilde X_2) = \frac{1}{n(n-1)}\sum_{i\ne j} E_0(\tilde X_i \tilde X_j) = -\frac{1}{n(n-1)}\sum_{i=1}^n E_0(\tilde X_i^2) = -\frac{1}{n-1}.$$
(The second equality holds because $\sum_{j=1}^n \tilde X_j = 0$.) Combining, we get
(4) $$E_0\bigl(E_0(\tilde X_i \mid \mathcal{F}_{i-1})^2\bigr) = \frac{i-1}{(n-i+1)(n-1)}.$$

Again, by a similar argument as before (using the identity $\sum_{i=1}^n \tilde X_i^2 = n$), we have
$$E_0(\tilde X_i^2 \mid \mathcal{F}_{i-1}) = \frac{1}{n-i+1}\sum_{j=i}^n \tilde X_j^2.$$
It follows that
$$\operatorname{Var}_0\bigl(E_0(\tilde X_i^2 \mid \mathcal{F}_{i-1})\bigr) = \frac{1}{(n-i+1)^2}\bigl[(n-i+1)\operatorname{Var}_0(\tilde X_1^2) + (n-i+1)(n-i)\operatorname{Cov}_0(\tilde X_1^2, \tilde X_2^2)\bigr].$$
Now, $\operatorname{Var}_0(\tilde X_1^2) \le E_0(\tilde X_1^4)$, and
$$\operatorname{Cov}_0(\tilde X_1^2, \tilde X_2^2) = \frac{1}{n(n-1)}\sum_{i\ne j} E_0(\tilde X_i^2 \tilde X_j^2) - E_0(\tilde X_1^2) E_0(\tilde X_2^2) = \frac{1}{n(n-1)}\sum_{i=1}^n E_0\bigl(\tilde X_i^2 (n - \tilde X_i^2)\bigr) - 1 = \frac{1 - E_0(\tilde X_1^4)}{n-1} \le 0,$$
since $E_0(\tilde X_1^4) \ge (E_0(\tilde X_1^2))^2 = 1$. Combining, we get
(5) $$\operatorname{Var}_0\bigl(E_0(\tilde X_i^2 \mid \mathcal{F}_{i-1})\bigr) \le \frac{E_0(\tilde X_1^4)}{n-i+1}.$$


Now let $G$ be the matrix whose $(i,j)$th element is
$$g_{ij} = \begin{cases} 1/(n-i+1), & \text{if } i > j, \\ 1, & \text{if } i = j, \\ 0, & \text{if } i < j. \end{cases}$$
Let $\mathbf{R} = G\tilde{\mathbf{X}}$. Then $R_i$ is $\mathcal{F}_i$-measurable, and from (3), we see that
(6) $$E_0(R_i \mid \mathcal{F}_{i-1}) = 0.$$
Next, note that
$$E_0(R_i^2 \mid \mathcal{F}_{i-1}) = E_0\bigl((\tilde X_i - E_0(\tilde X_i \mid \mathcal{F}_{i-1}))^2 \mid \mathcal{F}_{i-1}\bigr) = E_0(\tilde X_i^2 \mid \mathcal{F}_{i-1}) - \bigl(E_0(\tilde X_i \mid \mathcal{F}_{i-1})\bigr)^2.$$
Using (4), (5), the triangle inequality and the fact that $E_0(\tilde X_i^2) = 1$, we get
(7) $$E_0|E_0(R_i^2 \mid \mathcal{F}_{i-1}) - 1| \le E_0|E_0(\tilde X_i^2 \mid \mathcal{F}_{i-1}) - 1| + E_0\bigl(E_0(\tilde X_i \mid \mathcal{F}_{i-1})^2\bigr) \le \sqrt{\frac{E_0(\tilde X_1^4)}{n-i+1}} + \frac{i-1}{(n-i+1)(n-1)} \le 2\sqrt{\frac{E_0(\tilde X_1^4)}{n-i+1}},$$
since $E_0(\tilde X_1^4) \ge 1$.

We will now temporarily use the notation $\|R_i\|_3^0$ for $(E_0|R_i|^3)^{1/3}$, which is the conditional $L^3$ norm of $R_i$. By Minkowski's inequality and exchangeability, we have
$$\|R_i\|_3^0 \le \|\tilde X_i\|_3^0 + \Bigl\|\frac{1}{n-i+1}\sum_{j=i}^n \tilde X_j\Bigr\|_3^0 \le 2\|\tilde X_1\|_3^0.$$
This bound can be rewritten as
(8) $$E_0|R_i|^3 \le 8\hat\sigma^{-3} E_0|X_1 - \hat\mu|^3.$$

Now define the function $f_1:\mathbb{R}^n\to\mathbb{R}$ as $f_1(x) := f_0(G^{-1}x)$. Then $f_1(\mathbf{R}) = f_0(\tilde{\mathbf{X}})$. Let $g^{ij}$ denote the $(i,j)$th element of $G^{-1}$. It is a simple exercise to verify that
$$g^{ij} = \begin{cases} -1/(n-j), & \text{if } i > j, \\ 1, & \text{if } i = j, \\ 0, & \text{if } i < j. \end{cases}$$
Using the chain rule we see that for any $j$, $r$ and $x$,
$$\partial_j^r f_1(x) = \sum_{1\le i_1,\ldots,i_r\le n} (\partial_{i_1}\partial_{i_2}\cdots\partial_{i_r} f_0)(G^{-1}x)\, g^{i_1 j} g^{i_2 j} \cdots g^{i_r j}.$$
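The claimed form of $G^{-1}$ is easy to check numerically (a small sketch of our own, with the paper's 1-based indices shifted to 0-based):

```python
import numpy as np

n = 8
# G maps X-tilde to R; the claimed inverse has -1/(n-j) below the diagonal.
G = np.eye(n)
Ginv = np.eye(n)
for i in range(n):          # 0-based; paper indices are i+1, j+1
    for j in range(i):
        G[i, j] = 1.0 / (n - i)          # 1/(n - i + 1) in paper indices
        Ginv[i, j] = -1.0 / (n - j - 1)  # -1/(n - j) in paper indices

print(np.allclose(G @ Ginv, np.eye(n)))  # True
```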


Thus for any $r$,
(9) $$L_r(f_1) \le L'_r(f_0)\Bigl(\max_{1\le j\le n}\sum_{i=1}^n |g^{ij}|\Bigr)^r \le L'_r(f_0)\,2^r = L'_r(f)\,\hat\sigma^r 2^r,$$
where $L_r$ is defined as in the statement of Theorem 1.1. Now let $\mathbf{V}$ be a standard Gaussian vector in $\mathbb{R}^n$. Using the bounds from (6)–(9) in Theorem 1.1, we get
$$|E_0 f_1(\mathbf{R}) - E_0 f_1(\mathbf{V})| \le \frac12 L_2(f_1)\sum_{i=1}^n E_0|E_0(R_i^2\mid\mathcal F_{i-1}) - 1| + \frac16\, n L_3(f_1)\max_{1\le i\le n}\bigl(E_0|R_i|^3 + E|V_i|^3\bigr) \le 4L'_2(f)\,\hat\sigma^2 \sum_{i=1}^n \sqrt{\frac{E_0(\tilde X_1^4)}{n-i+1}} + \frac86\, n L'_3(f)\,\hat\sigma^3\bigl(8\hat\sigma^{-3} E_0|X_1-\hat\mu|^3 + E|V_1|^3\bigr).$$
Now $E_0|\tilde X_1|^4 = \hat\sigma^{-4} E_0|X_1-\hat\mu|^4$, and by comparing sums with integrals, we have $\sum_{i=1}^n (n-i+1)^{-1/2} \le 2\sqrt n$. Thus,
(10) $$|E_0 f_1(\mathbf{R}) - E_0 f_1(\mathbf{V})| \le 8L'_2(f)\bigl(n\,E_0|X_1-\hat\mu|^4\bigr)^{1/2} + \tfrac43\, n L'_3(f)\bigl(8E_0|X_1-\hat\mu|^3 + \hat\sigma^3 E|V_1|^3\bigr).$$

Let $\mathbf{U} = G^{-1}\mathbf{V}$. Explicitly,
$$U_i = V_i - \sum_{j=1}^{i-1} \frac{V_j}{n-j}.$$
Now note that $f_1(\mathbf{V}) = f_0(\mathbf{U})$, $f_0(\tilde{\mathbf{Z}}) = f(\mathbf{Y})$ and $f_1(\mathbf{R}) = f_0(\tilde{\mathbf{X}}) = f(\mathbf{X})$. Combining, we get
(11) $$|E_0 f(\mathbf{X}) - E_0 f(\mathbf{Y})| = |E_0 f_1(\mathbf{R}) - E_0 f_0(\tilde{\mathbf{Z}})| \le |E_0 f_1(\mathbf{R}) - E_0 f_1(\mathbf{V})| + |E_0 f_0(\mathbf{U}) - E_0 f_0(\tilde{\mathbf{Z}})|.$$

We already have a bound on $|E_0 f_1(\mathbf{R}) - E_0 f_1(\mathbf{V})|$ from (10). We will now compute a bound on $|E_0 f_0(\mathbf{U}) - E_0 f_0(\tilde{\mathbf{Z}})|$, where recall that $\tilde Z_i = Z_i - \bar Z$, and $\mathbf{Z}$ is a standard Gaussian vector. To do that, we first need to do some computations. Let $\hat\sigma_{ij} := \operatorname{Cov}(U_i, U_j)$. Then
$$\hat\sigma_{ij} = \begin{cases} \displaystyle -\frac{1}{n-j} + \sum_{k=1}^{j-1} (n-k)^{-2}, & \text{if } i > j, \\[1ex] \displaystyle 1 + \sum_{k=1}^{i-1} (n-k)^{-2}, & \text{if } i = j, \\[1ex] \hat\sigma_{ji}, & \text{if } i < j. \end{cases}$$


Now, for $i > j$, we can rewrite the first term in $\hat\sigma_{ij}$ as a telescoping sum to get
$$\hat\sigma_{ij} = -\frac{1}{n-1} - \sum_{k=1}^{j-1}\Bigl(\frac{1}{n-k-1} - \frac{1}{n-k}\Bigr) + \sum_{k=1}^{j-1}\frac{1}{(n-k)^2} = -\frac{1}{n-1} - \sum_{k=1}^{j-1}\frac{1}{(n-k)^2(n-k-1)}.$$
Thus, if we define
$$\sigma_{ij} := \operatorname{Cov}(\tilde Z_i, \tilde Z_j) = \begin{cases} (n-1)/n, & \text{if } i = j, \\ -1/n, & \text{if } i \ne j, \end{cases}$$
then
(12) $$\sum_{i,j}|\sigma_{ij} - \hat\sigma_{ij}| = \sum_{i=1}^n |\sigma_{ii} - \hat\sigma_{ii}| + 2\sum_{i=1}^n\sum_{j=1}^{i-1}|\sigma_{ij} - \hat\sigma_{ij}| \le 2 + \sum_{i=1}^n\sum_{k=1}^{i-1}\frac{1}{(n-k)^2} + 2\sum_{i=1}^n\sum_{j=1}^{i-1}\sum_{k=1}^{j-1}\frac{1}{(n-k)^2(n-k-1)} = 2 + \sum_{k=1}^{n-1}\frac{1}{n-k} + \sum_{k=1}^{n-2}\frac{1}{n-k} = 3 + 2\sum_{k=2}^{n-1}\frac1k.$$

We will use the well-known "Gaussian interpolation technique" for bounding $|E_0 f_0(\mathbf{U}) - E_0 f_0(\tilde{\mathbf{Z}})|$. This classical method for proving Slepian-type inequalities has been used extensively in recent years by Talagrand in his efforts to obtain a rigorous version of the cavity method for spin glasses. For each $t \in [0,1]$, let $\mathbf{W}_t = \sqrt{1-t}\,\mathbf{U} + \sqrt{t}\,\tilde{\mathbf{Z}}$. Then
(13) $$E_0 f_0(\tilde{\mathbf{Z}}) - E_0 f_0(\mathbf{U}) = E_0\Bigl(\int_0^1 \frac{d}{dt} f_0(\mathbf{W}_t)\, dt\Bigr) = E_0\Biggl(\int_0^1 \sum_{i=1}^n \Bigl(\frac{\tilde Z_i}{2\sqrt{t}} - \frac{U_i}{2\sqrt{1-t}}\Bigr)\,\partial_i f_0(\mathbf{W}_t)\, dt\Biggr).$$
Now, if a random vector $\xi = (\xi_1,\ldots,\xi_n)$ has a centered Gaussian distribution, then it is not difficult to show using integration by parts that for any differentiable function $h$ with subexponential growth at infinity, and any $i$, the following identity holds:
$$E(\xi_i h(\xi)) = \sum_{j=1}^n E(\xi_i \xi_j)\, E(\partial_j h(\xi)).$$
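In one dimension this reduces to the classical Stein identity $E(\xi h(\xi)) = E(h'(\xi))$ for $\xi \sim N(0,1)$; as a sketch (our own check, not in the paper) it can be verified to quadrature precision:

```python
import numpy as np

# One-dimensional instance of the identity: for xi ~ N(0,1),
# E[xi * h(xi)] = E[h'(xi)].  Checked with Gauss-Hermite quadrature.
nodes, weights = np.polynomial.hermite_e.hermegauss(40)  # weight e^{-x^2/2}
weights = weights / np.sqrt(2 * np.pi)                   # normalize to N(0,1)

h = lambda x: np.sin(x) + x**3
dh = lambda x: np.cos(x) + 3 * x**2

lhs = np.sum(weights * nodes * h(nodes))
rhs = np.sum(weights * dh(nodes))
print(lhs, rhs)  # equal up to quadrature error
```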

Since we do not want to expand our list of references, let us refer to Appendix A.6 of Talagrand's book for a proof. Applying this result to our problem (after noting that interchanging integrals is not an issue since everything is bounded), we get
$$E_0(U_i\,\partial_i f_0(\mathbf{W}_t)) = \sqrt{1-t}\,\sum_{j=1}^n \hat\sigma_{ij}\, E_0(\partial_j\partial_i f_0(\mathbf{W}_t))$$
and similarly,
$$E_0(\tilde Z_i\,\partial_i f_0(\mathbf{W}_t)) = \sqrt{t}\,\sum_{j=1}^n \sigma_{ij}\, E_0(\partial_j\partial_i f_0(\mathbf{W}_t)).$$
Combining, we have
$$E_0 f_0(\tilde{\mathbf{Z}}) - E_0 f_0(\mathbf{U}) = \frac12\int_0^1 \sum_{1\le i,j\le n} E_0[\partial_i\partial_j f_0(\mathbf{W}_t)]\,(\sigma_{ij} - \hat\sigma_{ij})\, dt.$$
Using the bound from (12), we get
(14) $$|E_0 f_0(\mathbf{U}) - E_0 f_0(\tilde{\mathbf{Z}})| \le \frac12 L'_2(f_0)\Bigl(3 + 2\sum_{k=2}^{n-1}\frac1k\Bigr).$$
Combining (10), (11) and (14), we get
$$|Ef(\mathbf{X}) - Ef(\mathbf{Y})| \le E|E_0 f(\mathbf{X}) - E_0 f(\mathbf{Y})| \le 8L'_2(f)\bigl(n\,E|X_1-\hat\mu|^4\bigr)^{1/2} + \tfrac43\, n L'_3(f)\bigl(8E|X_1-\hat\mu|^3 + E(\hat\sigma^3)E|V_1|^3\bigr) + \frac12 L'_2(f)\,E(\hat\sigma^2)\Bigl(3 + 2\sum_{k=2}^{n-1}\frac1k\Bigr).$$
To complete the proof, we apply Jensen's inequality to get $E(\hat\sigma^r) \le E|X_1 - \hat\mu|^r$ for $r \ge 2$ and use the crude bounds $3 + 2\sum_{k=2}^{n-1} k^{-1} \le 3\sqrt n$ and $E|V_1|^3 \le 1.7$ to unify terms.
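The two crude bounds invoked here are easy to check (our own sketch): the harmonic-sum bound numerically over a range of $n$, and $E|V_1|^3 = 2\sqrt{2/\pi} \approx 1.596$ in closed form:

```python
import numpy as np

# 3 + 2 * sum_{k=2}^{n-1} 1/k  <=  3 * sqrt(n), checked for many n.
for n in range(3, 2000):
    harm = 3 + 2 * sum(1.0 / k for k in range(2, n))
    assert harm <= 3 * np.sqrt(n)

# E|V|^3 for V ~ N(0,1) is 2*sqrt(2/pi), comfortably below 1.7.
print(2 * np.sqrt(2 / np.pi))
```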


4. Proof of Theorem 1.3. We will now prove Theorem 1.3 via an application of Theorem 1.2, by using the Stieltjes transform of the spectral measure as a smooth function of the matrix entries. The Stieltjes transform (or Cauchy transform, or resolvent) of a cumulative probability distribution function $F$ on $\mathbb{R}$ is defined as
(15) $$m_F(z) := \int_{-\infty}^\infty \frac{1}{x - z}\, dF(x) \quad\text{for every } z \in \mathbb{C}\setminus\mathbb{R}.$$
Analogously, the Stieltjes transform of an $N\times N$ Hermitian matrix $A$ at a number $z \in \mathbb{C}\setminus\mathbb{R}$ is defined as
(16) $$m_A(z) := \frac{1}{N}\operatorname{Tr}\bigl((A - zI)^{-1}\bigr),$$
where $I$ is the identity matrix of order $N$. Note that this is just the Stieltjes transform of the empirical spectral distribution (ESD) of $A$. The ESDs of a sequence $\{A_N\}_{N=1}^\infty$ of random Hermitian matrices converge weakly in probability to a distribution $F$ if and only if
$$m_{A_N}(z) \xrightarrow{P} m_F(z) \quad\text{for every } z \in \mathbb{C}\setminus\mathbb{R}.$$

For the proof of this result and further details like Berry–Esseen-type error bounds, we refer to the review of Bai, pages 639–640. Now recall that if $A(x)$ is a matrix-valued differentiable function of a scalar $x$, and $G(x) := (A(x) - zI)^{-1}$, where $z \in \mathbb{C}\setminus\mathbb{R}$ and $I$ is the identity matrix, then
(17) $$\frac{dG}{dx} = -G\,\frac{dA}{dx}\,G.$$
This standard result is obtained by differentiating both sides of the identity $G\,(A - zI) \equiv I$. Differentiability follows from the fact that the elements of the inverse of a matrix are all rational functions of the elements of the original matrix. Higher-order derivatives may be computed by repeatedly applying the above formula. The following lemma is the key to the proof of Theorem 1.3:

Lemma 4.1. Suppose that for each $N$ we have a random matrix $\tilde A_N = (N^{-1/2}\tilde X^N_{ij})_{1\le i,j\le N}$, where the collection $(\tilde X^N_{ij})_{1\le i\le j\le N}$ is exchangeable and $\tilde X^N_{ij} = \tilde X^N_{ji}$ for $i > j$. Suppose
$$\frac{2}{N(N+1)}\sum_{i\le j\le N}\tilde X^N_{ij} = 0 \quad\text{and}\quad \frac{2}{N(N+1)}\sum_{i\le j\le N}(\tilde X^N_{ij})^2 = 1 \quad\text{a.s.}$$
For each $N$, let $(Z^N_{ij})_{1\le i\le j\le N}$ be a collection of i.i.d. standard Gaussian random variables and let $Y^N_{ij} = Z^N_{ij} - \bar Z^N$, where $\bar Z^N = \frac{2}{N(N+1)}\sum_{i\le j} Z^N_{ij}$. Let $Y^N_{ji} = Y^N_{ij}$ for $i > j$, and let $B_N = (N^{-1/2} Y^N_{ij})_{1\le i,j\le N}$. Then, for any $z \in \mathbb{C}\setminus\mathbb{R}$, and any $g:\mathbb{R}\to\mathbb{R}$ with bounded derivatives up to the third order, we have
$$|Eg(\operatorname{Re} m_{\tilde A_N}(z)) - Eg(\operatorname{Re} m_{B_N}(z))| \le C_1 N^{-1}\bigl(E|\tilde X^N_{12}|^4\bigr)^{1/2} + C_2 N^{-1/2} E|\tilde X^N_{12}|^3,$$
where $m$ is the Stieltjes transform as defined in (16) and $C_1$ and $C_2$ are constants depending only on $g$ and $z$. The quantity $|Eg(\operatorname{Im} m_{\tilde A_N}(z)) - Eg(\operatorname{Im} m_{B_N}(z))|$ also admits the same upper bound.

To complete the proof of Theorem 1.3 using this lemma, we need the following fact about spectral distributions of Hermitian matrices:

Lemma 4.2 (Quoted from Bai's review, Lemma 2.2). Let $A$ and $B$ be two $N\times N$ Hermitian matrices, with empirical distribution functions $F_A$ and $F_B$. Then
$$\|F_A - F_B\|_\infty \le \frac{1}{N}\operatorname{rank}(A - B).$$
This lemma is an easy consequence of the well-known interlacing inequalities for eigenvalues of Hermitian matrices. Let us now complete the proof of Theorem 1.3 by combining Lemma 4.1 and Lemma 4.2.

Proof of Theorem 1.3. Let $\tilde X^N_{ij} = (X^N_{ij} - \hat\mu_N)/\hat\sigma_N$, and let $\tilde A_N = (N^{-1/2}\tilde X^N_{ij})_{1\le i,j\le N}$. Clearly, $\tilde A_N$ satisfies the hypotheses of Lemma 4.1. Thus, we have
$$|Eg(\operatorname{Re} m_{\tilde A_N}(z)) - Eg(\operatorname{Re} m_{B_N}(z))| \le C_1 N^{-1}\bigl(E|\tilde X^N_{12}|^4\bigr)^{1/2} + C_2 N^{-1/2} E|\tilde X^N_{12}|^3.$$
The same bound holds for $|Eg(\operatorname{Im} m_{\tilde A_N}(z)) - Eg(\operatorname{Im} m_{B_N}(z))|$ as well. The bound converges to zero if $E|\tilde X^N_{12}|^4 = o(N^{2/3})$, and thus under that condition, $\tilde A_N$ and $B_N$ must have the same LSD. Finally, observe that by Lemma 4.2, the sequence $\{\hat\sigma_N^{-1} A_N\}$ has the same LSD as $\{\tilde A_N\}$, and $F_{B_N}$ converges weakly to the semicircle distribution in probability. This completes the proof.

Proof of Lemma 4.1. To formalize things in a way that is suitable for our purpose, consider the map $\mathbf{A}$ which "constructs" Wigner matrices of order $N$. Let $n = N(N+1)/2$ and write elements of $\mathbb{R}^n$ as $x = (x_{ij})_{1\le i\le j\le N}$. For any $x \in \mathbb{R}^n$, let $\mathbf{A}(x)$ be the matrix defined as
(18) $$\mathbf{A}(x)_{ij} := \begin{cases} N^{-1/2} x_{ij}, & \text{if } i \le j, \\ N^{-1/2} x_{ji}, & \text{if } i > j. \end{cases}$$
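Formula (17), combined with this construction (where a partial derivative of $\mathbf{A}$ has at most two nonzero entries equal to $N^{-1/2}$), can be verified by finite differences; this is our own numerical sketch, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
N, z = 6, 1.0 + 2.0j

def resolvent(A):
    return np.linalg.inv(A - z * np.eye(N))

# Perturb one symmetric entry pair; dA/dx has two nonzero entries
# equal to N**-0.5, as in the construction (18).
A = rng.standard_normal((N, N))
A = (A + A.T) / np.sqrt(2 * N)
dA = np.zeros((N, N))
dA[0, 1] = dA[1, 0] = N**-0.5

eps = 1e-6
G = resolvent(A)
fd = (resolvent(A + eps * dA) - resolvent(A - eps * dA)) / (2 * eps)
print(np.max(np.abs(fd - (-G @ dA @ G))))  # ~0: matches formula (17)
```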


Now let us fix $z = u + \sqrt{-1}\,v \in \mathbb{C}$, with $v \ne 0$. Let $G(x) := (\mathbf{A}(x) - zI)^{-1}$, and define $h:\mathbb{R}^n\to\mathbb{R}$ as $h(x) := N^{-1}\operatorname{Tr}(G(x))$. For any $\alpha \in \{(i,j)\}_{1\le i\le j\le N}$, we will write $\partial_\alpha h$ for $\partial h/\partial x_\alpha$ by our usual convention. From (17), it follows that for any $\alpha$,
(19) $$\partial_\alpha h = -N^{-1}\operatorname{Tr}\bigl(G(\partial_\alpha \mathbf{A})G\bigr).$$
Now note that for any $\alpha, \beta \in \{(i,j)\}_{1\le i\le j\le N}$, we have $\partial_\beta\partial_\alpha \mathbf{A} \equiv 0$. An easy computation involving repeated applications of (17) to the above expression for $\partial_\alpha h$ gives, for any $\alpha, \beta, \gamma \in \{(i,j)\}_{1\le i\le j\le N}$,
(20) $$\partial_\beta\partial_\alpha h = N^{-1}\sum_{\{\beta',\alpha'\}=\{\beta,\alpha\}} \operatorname{Tr}\bigl(G(\partial_{\beta'}\mathbf{A})G(\partial_{\alpha'}\mathbf{A})G\bigr),$$
(21) $$\partial_\gamma\partial_\beta\partial_\alpha h = -N^{-1}\sum_{\{\gamma',\beta',\alpha'\}=\{\gamma,\beta,\alpha\}} \operatorname{Tr}\bigl(G(\partial_{\gamma'}\mathbf{A})G(\partial_{\beta'}\mathbf{A})G(\partial_{\alpha'}\mathbf{A})G\bigr).$$
Note that the first sum runs over all permutations of $(\beta, \alpha)$, which amounts to only two terms. Similarly, the second sum involves six terms. To bound (19), first note that $\operatorname{Tr}(G(\partial_\alpha \mathbf{A})G) = \operatorname{Tr}((\partial_\alpha \mathbf{A})G^2)$. Since $G^2$ has a spectral decomposition and all its eigenvalues are bounded by $|v|^{-2}$ in magnitude, it follows in particular that the elements of $G^2$ are also bounded by $|v|^{-2}$. Now, $\partial_\alpha \mathbf{A}$ has at most two nonzero elements, which are equal to $N^{-1/2}$. Hence,
$$|\operatorname{Tr}(G(\partial_\alpha \mathbf{A})G)| = |\operatorname{Tr}((\partial_\alpha \mathbf{A})G^2)| \le 2|v|^{-2}N^{-1/2}.$$
To bound (20) and (21), we need to recall the properties of the Hilbert–Schmidt norm for matrices. For an $N\times N$ complex matrix $B = (b_{ij})_{1\le i,j\le N}$, the Hilbert–Schmidt norm of $B$ is defined as $\|B\| := (\sum_{i,j}|b_{ij}|^2)^{1/2}$. Besides the usual properties of a matrix norm, it has the following additional features: (a) $|\operatorname{Tr}(BC)| \le \|B\|\,\|C\|$, (b) if $U$ is a unitary matrix, then for any $C$ of the same order, $\|CU\| = \|UC\| = \|C\|$, and (c) for a matrix $B$ admitting a spectral decomposition (i.e., a normal matrix) with eigenvalues $\lambda_1,\ldots,\lambda_N$, and any other matrix $C$ of the same order, $\max\{\|BC\|, \|CB\|\} \le \max_{1\le i\le N}|\lambda_i| \cdot \|C\|$. For a proof of these standard facts one can look up, for example, the book of Wilkinson, pages 55–58. Clearly, $G$ and the derivatives of $\mathbf{A}$ are all normal matrices. Moreover, the eigenvalues of $G$ are bounded by $|v|^{-1}$, where $v = \operatorname{Im} z$. Thus, by the properties of the Hilbert–Schmidt norm listed above, we have
$$|\operatorname{Tr}(G(\partial_\beta \mathbf{A})G(\partial_\alpha \mathbf{A})G)| \le \|G(\partial_\beta \mathbf{A})\|\,\|G(\partial_\alpha \mathbf{A})G\| \le |v|^{-3}\|\partial_\beta \mathbf{A}\|\,\|\partial_\alpha \mathbf{A}\| \le 2|v|^{-3}N^{-1}.$$


Similarly,
$$|\operatorname{Tr}(G(\partial_\gamma \mathbf{A})G(\partial_\beta \mathbf{A})G(\partial_\alpha \mathbf{A})G)| \le \|G(\partial_\gamma \mathbf{A})\|\,\|G(\partial_\beta \mathbf{A})G(\partial_\alpha \mathbf{A})G\| \le \|G(\partial_\gamma \mathbf{A})\|\,\|G(\partial_\beta \mathbf{A})\|\,\|(\partial_\alpha \mathbf{A})G\| \le |v|^{-4}\|\partial_\gamma \mathbf{A}\|\,\|\partial_\beta \mathbf{A}\|\,\|\partial_\alpha \mathbf{A}\| \le 2^{3/2}|v|^{-4}N^{-3/2}.$$
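Property (a) of the Hilbert–Schmidt norm, the one doing the work in these trace bounds, is just Cauchy–Schwarz for the trace pairing; a quick sketch of our own:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
C = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

hs = lambda M: np.sqrt((np.abs(M)**2).sum())   # Hilbert-Schmidt norm

# Property (a): |Tr(BC)| <= ||B|| ||C||.
print(abs(np.trace(B @ C)) <= hs(B) * hs(C))   # True
```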

Finally, note that since the matrix entries are all real, $\partial_\alpha(\operatorname{Re} h) = \operatorname{Re}\,\partial_\alpha h$ and so on. Thus, if we let $f = g \circ (\operatorname{Re} h)$, then substituting the bounds obtained above in (19), (20) and (21), we get $L'_2(f) \le K_1 N^{-2}$ and $L'_3(f) \le K_2 N^{-5/2}$, where $K_1$ and $K_2$ are constants depending only on $g$ and $z$. By Theorem 1.2, it now follows that
$$|Ef(\tilde{\mathbf{X}}) - Ef(\mathbf{Y})| \le 9.5 K_1 N^{-1}\bigl(E|\tilde X_{12}|^4\bigr)^{1/2} + 13 K_2 N^{-1/2} E|\tilde X_{12}|^3,$$
where $Y_{ij} = Z_{ij} - \bar Z$, and the $Z_{ij}$'s are i.i.d. standard Gaussian random variables. This completes the proof of the lemma.

Acknowledgments. The author thanks the anonymous referee for his very detailed and helpful comments on the material and the style of presentation. He is also grateful to his former thesis advisor Persi Diaconis for constant encouragement and support.

REFERENCES
[1] Arnold, L. (1967). On the asymptotic distribution of the eigenvalues of random matrices. J. Math. Anal. Appl. 20 262–268. MR0217833
[2] Bai, Z. D. (1999). Methodologies in the spectral analysis of large-dimensional random matrices, a review. Statist. Sinica 9 611–677. MR1711663
[3] Baik, J. and Suidan, T. M. (2005). A GUE central limit theorem and universality of directed first and last passage site percolation. Int. Math. Res. Not. 6 325–337. MR2131383
[4] Bodineau, T. and Martin, J. (2005). A universality property for last-passage percolation paths close to the axis. Electron. Comm. Probab. 10 105–112. MR2150699
[5] Chatterjee, S. (2004). A simple invariance theorem. Available at http://arxiv.org/math.PR/0508213.
[6] de Finetti, B. (1969). Sulla prosequibilità di processi aleatori scambiabili. Rend. Ist. Mat. Univ. Trieste 1 53–67. MR0292146
[7] Boutet de Monvel, A. and Khorunzhy, A. (1998). Limit theorems for random matrices. Markov Process. Related Fields 4 175–197. MR1641617
[8] Diaconis, P. and Freedman, D. (1980). Finite exchangeable sequences. Ann. Probab. 8 745–764. MR0577313
[9] Girko, V. L. (1988). Spectral Theory of Random Matrices. Nauka, Moscow. MR0955497
[10] Grenander, U. (1963). Probabilities on Algebraic Structures. Wiley, New York. MR0259969


[11] Khorunzhy, A. M., Khoruzhenko, B. A. and Pastur, L. A. (1996). Asymptotic properties of large random matrices with independent entries. J. Math. Phys. 37 5033–5060. MR1411619
[12] Lindeberg, J. W. (1922). Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Math. Z. 15 211–225. MR1544569
[13] Mossel, E., O'Donnell, R. and Oleszkiewicz, K. (2005). Noise stability of functions with low influences: Invariance and optimality. Available at http://arxiv.org/math.PR/0503503.
[14] Pastur, L. A. (1972). The spectrum of random matrices. Teoret. Mat. Fiz. 10 102–112. MR0475502
[15] Paulauskas, V. and Račkauskas, A. (1989). Approximation Theory in the Central Limit Theorem. Exact Results in Banach Spaces. Kluwer, Dordrecht. MR1015294
[16] Rotar, V. I. (1979). Limit theorems for polylinear forms. J. Multivariate Anal. 9 511–530. MR0556909
[17] Schenker, J. H. and Schulz-Baldes, H. (2005). Semicircle law and freeness for random matrices with symmetries or correlations. Math. Res. Lett. 12 531–542. MR2155229
[18] Suidan, T. (2006). A remark on a theorem of Chatterjee and last passage percolation. J. Phys. A 39 8977–8981. MR2240468
[19] Talagrand, M. (2003). Spin Glasses: A Challenge for Mathematicians. Cavity and Mean Field Models. Springer, Berlin. MR1993891
[20] Trotter, H. F. (1959). Elementary proof of the central limit theorem. Archiv der Math. 10 226–234. MR0108847
[21] Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Ann. of Math. (2) 67 325–327. MR0095527
[22] Wilkinson, J. H. (1965). The Algebraic Eigenvalue Problem. Clarendon Press, Oxford. MR0184422
[23] Zolotarev, V. M. (1977). Ideal metrics in the problem of approximating the distributions of sums of independent random variables. Theory Probab. Appl. 22 449–465. MR0455066
Department of Statistics
367 Evans Hall #3860
University of California, Berkeley
Berkeley, California 94720
USA
E-mail: sourav@stat.berkeley.edu
URL: http://www.stat.berkeley.edu/~sourav