9512.net

# Optimal scenario tree generation for multiperiod nancial optimization

Optimal scenario tree generation for multiperiod nancial optimization
G.Ch. P ugy A. wi tanowskiy December 1998

Multiperiod nancial optimization is usually based on a scenario model for the possible market situations. Whereas there is a rich literature about modelling and estimation of nancial processes, the scenario tree is typically constructed in an ad-hoc manner. (For interest-rate trees consistency with present observations is however often required). In this paper we show how an optimal scenario tree may be constructed on the basis of a simulation model of the underlying nancial process by using a stochastic approximation technique.

Abstract

1 Introduction

the approximation problem

Austrian Science Fund. y Department of Statistics, University of Vienna, Universit tsstrasse 5, A-1010 Wien, Austria, e-mail addresses: pflug@smc.univie.ac.at, swietanowski@bigfoot.com.

A multistage nancial optimization problem with decision periods 1; 2; : : : T is based on a stochastic model of the future development of the economic environment (prices, interests, cash- ows, etc.). This scenario process is expressed as a (possibly vector-valued) stochastic process 1 ; 2 ; : : : ; T ; a decision model for the actions to be taken. The decisions at time stage t, which may depend on the past observations 1 ; : : : ; t?1 are x1 , x2 ( 1 ), x2 ( 1 ; 2 ), : : : ; xT ( 1 ; : : : ; T ?1 ); the objective function, which expresses the long-term goals of the decision maker. Except for extremly simple and unrealistic cases, multiperiod nancial optimization problems can only be formulated, but not solved. The reason for practical unsolvability is the fact that the decisions are functions, making the problem a functional optimization problem, which cannot be numerically solved as it is. The usual way of reducing the problem to a solvable one is to restrict to discrete cases in which the random vector 1 ; : : : ; T can only take nitely many values, since in this case the decision functions reduce to large decision vectors. For a discrete scenario process, the pertaining history process (i.e. the process ( 1 ), ( 1 ; 2 ), ( 1 ; 2 ; 3 ), : : :) can be represented as a tree. It is this tree, that represents uncertainty in stochastic multiperiod optimization problems. The work described in this paper was supported by the Special Research Program SFB F011 Aurora of the

1

The natural question, which will be answered in this paper, is: Suppose a continuous-space stochastic process is given. How should the discrete-space approximation model be designed in an optimal way, i.ewith smallest approximation error? To begin with, let us consider the simple problem of optimally approximating a one-stage stochastic optimization problem. In section 3, we extend the method to multistage problems. Suppose that the stochastic optimization problem is to minimize

F (x) =

Z

where f (x; u) is the cost function, G is some continuous distribution function on R and X R is the feasible set. Denote by x = argminx F (x) its solution (We suppose for simplicity that it is unique). For the practical calculation, we use instead of the true distribution function G an approxi~ mate discrete distribution G, i.ewe minimize

x2X

f (x; u) dG(u)

(1)

~ F (x) =

Z

The approximation error e is evidently ~ ~ e(F; F ) := F ( argminxF (x)) ? F ( argminxF (x)) 0

x2X

~ f (x; u) dG(u)

(2) (3)

Typically, the error e de ned by (3) is di cult to calculate. It is easier, to get an upper bound for it by using the following Lemma.

Lemma 1.

~ ~ e(F; F ) 2 sup jF (x) ? F (x)j:
x

~ ~ Proof. Let x 2 argmin F and x 2 argmin F . Set = supx jF (x) ? F (x)j: Let M = fx : ~ F (x) F (x ) + 2 g. Suppose that x 2 M . Then ~ = ~x ~ F (x ) + 2 < F (~ ) F (~ ) + F (x ) + F (x ) + 2 : x This contradiction establishes x 2 M , i.e. ~ ~ e(F; F ) = F (~ ) ? F (x ) 2 : x

~ Lemma 1 tells us that we have to choose G in such a way that the sup-distance between F ~ small. and F We introduce the following de nition. De nition 1. (The Wasserstein-distance d1) Let L1 (f ) be the Lipschitz-constant of f , i.e.

2

L1 (f ) = inf fL : jf (u) ? f (v)j Lju ? vj for all u; vg (4) ~ (If there is no such L, we set L1 (f ) = 1.) The Wasserstein-distance d1 between G and G is
de ned as

~ ~ d1 (G; G) = supf f (u) dG(u) ? f (u) dG(u) : L1(f ) 1g:
2

Z

Z

(5)

It is well known, that this distance is related to the mass transportation problem (Monge's problem) through the following theorem.

Theorem 1.

The d1 distance has the following properties (i) (Theorem of Kantorovich-Rubinstein) ~ ~ ~ d1 (G; G) = inf fE (jX ? X j; where the joint distribution (X; X ) is arbitrary, but the marginal distributions are xed ~ ~ such that X G; X Gg

R ~ R ~ ~ (ii) d1 (G; G) = jG(u) ? G(u)j du = jG?1 (u) ? G?1(u)j du: ~ (iii) Among all G, which sit on the mass points z1 ; z2 ; : : : zk , the one closest to G in d1 -distance is X zi + zi+1 ~ G(x) = G( 2 ); fi:z xg
~ where zk+1 = 1 and G(1) = 1. For this G, the supremum in (5) is attained by
i

(6)

f (u) = min ju ? zi j; i
i.e. where z0 = ?1.

~ X d1(G; G) =
k i=1

Z

zi +zi+1
2

zi 1 +zi
2

?

ju ? zi j dG(u);

consequence of (ii). 2 ~ It can be shown that the in mum in (6) is attained. The best joint distribution (X; X ) describes how the mass with distribution G should be optimally transported to yield the new ~ mass distribution G. Suppose now that the cost functions u 7! f (x; u) are uniformly Lipschitz, i.e. for all x 2 X , L1(f (x; )) = inf fL : jf (x; u) ? f (x; v)j Lju ? vjg L1: Then, ~ ~ sup jF (x) ? F (x)j L1 d1 (G; G) (7)

Proof. For (i) see 7], Theorem 5.3.2 and Theorem 6.1.1. (ii) is proved in 6]. (iii) is an easy

~ and we see that our approximation problem reduces to minimizing the distance d1 (G; G), which ~ , which is closest to G in the mass transportation sense. is equivalent to nding the discrete G
Consider the well-known newsvendor problem (see 1], page 15). The newsvendor has to decide about the number x of newspapers he orders for the price of . He sells the paper for the price of + . Suppose that the demand is u. Then the cost function is x ? min(x; u) . If we add the decision independent function u we get the equivalent cost function f (x; u) = x ? u]+ + u ? x]+. Let G be the distribution function of the random demand. The stochastic R optimization problem is to minimize F (x) = f (x; u) dG(u). 3

x

Example 1.

Figure 1: The true objective function F of Example 1.
11 10

9

8

7

6

5

4

3

2

1 ?2

?1

0

1

2

3

4

After some calculation we nd that where The minimizer is

F (x) = xG(x) + G (x) + x(1 ? G(x)) + ( u dG(u) ? G (x));

Z

G (u) =

Zu
?1

G(v) dv:

and = 1, = 5. With this speci cation, the objective function F (x) is F (x) = x (x) + 5x( (x) ? 1) + p6 exp(?x2=2) 2 (see Figure 1.) and the minimizer is x = ?1 ( 5 ) = 0:9674. 6 Since u 7! f (x; u) is Lipschitz-continuous of order 1 (i.e. Lipschitz in the usual sense), it su ces to approximate G in the d1 -distance. ~ Let G be a discrete distribution and Zu ~ ~(u) = G(v) dv: G We have that and
?1

x = argminxF (x) = G?1 + : To give a numerical example, suppose that G is the standard normal distribution function

~ ~ ~ ~ ~ ~ F (x) = xG(x) + G (x) + x(1 ? G(x)) + ( u dG(u) ? G (x)) ~ ~ x = argminxF (x) = G?1 ~
4

Z

(8)

+

:

Figure 2: The d1 -distance between
0

~ and Gz for 0 z 5.

?0.05

?0.1

?0.15

?0.2

?0.25

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

+ ](~ G(~ ) ? x G(x ) + G (x ) ? G (~ )): x x x ~ Suppose we want to choose G such that it sits on exactly three points. For symmetricity reasons, the three points must be of the form ?z; 0; z . By Theorem 1 (iii), the masses sitting on ?z; 0; z must be (?z=2), (z=2) ? (?z=2), (1 ? (z=2)). Call the distribution with this ~ ~ mass distribution Gz . The d1 -distance between and Gz is found to be ~ d1 ( ; Gz ) = p2 (exp(?z2 =2) ? exp(?z2 =8)) + 2z (z) ? z (z=2) ? z: 2 This function is shown in Figure 2. The optimal z equals to 1.029. The masses sitting on ?1:029; 0; 1:029 are 0:3035; 0:3930; 0:3035 respectively. ~ ~ With this approximation, the objective function F looks as in Figure 3. The minimizer of F ~ ) = 0:0028. is x = 1:029. The approximation error is e(F; F ~ A widely used technique for approximation is to choose the discretization in such a way, that the rst two moments are matched. Suppose we want to approximate the standard normal distribution by a three point distribution, each with probability 1=3. If the latter should be symmetric around zero and have variance 1, then the distribution must sit on ?1:22; 0; 1:22. For this approximation, the minimizer is 1.22, resulting in an approximation error of e = 0:0439.
In many applications, the functions u 7! f (x; u) are not Lipschitz, because they grow faster than linear as juj ! 1. For including these cases, we introduce the Lipschitz-constant of order p:

The approximation error is ~ e(F; F ) =

2 Superlinear growth
De nition 2.

5

~ Figure 3: The approximate objective function F of Example 1.
10 9

8

7

6

5

4

3

2

1 ?2

?1

0

1

2

3

4

The Lipschitz-constant of order p of f is

Lp(f ) = inf fL : jf (u) ? f (v)j Lju ? vj max(1; jujp?1 ; jvjp?1 )g: The Fortet-Mourier distance dp for distribution functions is de ned as:
~ ~ dp(G; G) = supf f (u) dG(u) ? f (u) dG(u) : Lp(f ) 1g;
It turns out that this distance can be equivalently written as

Z

Z

~ ~ dp(G; G) = max(1; jujp )jG(u) ? G(u)j du:
(see 7], page 93). If p = 1, this coincides with the previously de ned Wasserstein-distance. The Fortet-Mourier metric was used by Dupacova and R misch ( 2]) in connection with stability results for stochastic programs. Suppose now that for all x 2 X ,

Z

Lp(f (x; )) = inf fL : jf (x; u) ? f (x; v)j Lju ? vj max(1; jujp?1 ; jvjp?1 )g Lp:
Then,

and we see that if the cost functions f ( ; u) are continuous with Lipschitz-constant of order p ~ (unifromly in x), then our approximation problem reduces to minimizing the distance dp (G; G). Now we show that we may reduce the problem to the minimization of the d1 -distance. Let u juj 1 p (u) = jujp sgn(u) juj 1 Notice that ?1 (u) = 1=p (u). p 6

~ ~ sup jF (x) ? F (x)j Lp dp (G; G) x

(9)

If L1 (f ) < 1, then

jf ( p(u)) ? f ( p(v))j
which implies that

L1(f )j p(u) ? p(v)j L1(f ) p max(1; jujp?1 ; jvjp?1 )ju ? vj
p)

On the other hand, if Lp(f ) < 1, then

Lp(f

p L1 (f ):

(10)

(u)) ? f ( 1=p(v))j Lp(f ) max(1; j 1=p (u)jp?1 ; j 1=p(v)jp?1 )j 1=p(u) ? 1=p(v)j Lp(f ) max(1; max(juj; jvj)p?1 ) max(1; max(juj; jvj)(1?p)=p ju ? vj Lp(f )ju ? vj
1=p

jf (

and therefore

) Lp(f ): We nally get for cost functions with superlinear growth ~ sup jF (x) ? F (x)j
1=p

L1(f

(11)

=

~ sup j f (x; 1=p (u)) dG( 1=p (u)) ? f (x; 1=p (u)) dG( 1=p (u))j x ~ d1 (G 1=p; G 1=p) sup Lp(f (x; )):
x

Z Z ~ sup j f (x; u) dG(u) ? f (x; u) dG(u)j x Z Z
x

~ ~ This shows that it is equivalent to minimize dp (G; G) or to minimize d1 (G 1=p ; G 1=p ). Notice that G 1=p is the distribution of p(Y ), where Y has distribution G. We summarize: If the cost functions u 7! f (x; u) obey a usual Lipschitz condition for each x, then no transformation is necessary. If however they are only Lipschitz of order p for some p > 1, then we change the geometry of the axis through the nonlinear transformation u 7! 1=p (u), arriving in both cases at the problem of minimizing the distance d1 between the true model and the discrete approximation. ~ From now on, we assume that G and G are already transformed if necessary. ~ In this section we assume that G(u) and G(u) are probability distribution functions on RT , T we introduce the distance where u = (u1 ; : : : ; uT ). On R

3 Optimal tree construction.

ku ? vk =

T X t=1

atjut ? vt j:

Here (a1 ; : : : ; aT ) is a vector of nonnegative constants, which weight the importance of the dimensions; typically a1 a2 : : : aT > 0. 7

In principle, the one-dimensional setup can be carried over to the multidimensional case by introducing

Lp(f ) = inf fL : jf (u) ? f (v)j L
and

T X t=1

atjut ? vtj max(1; jut jp?1; jvt jp?1 )g

~ ~ dp(G; G) = supf f (u) dG(u) ? f (u) dG(u) : Lp(f ) 1g:

Z

Z

As in section 2, componentwise application of the mapping ut 7! p (ut ) may reduce the problem to the case p = 1. Notice that for di erent dimensions, we may even apply di erent transformations. Therefore, we may and will w.l.o.g. assume that the functions f (x; u) are Lipschitz, i.e. inf fL : jf (x; u) ? f (x; v)j Lku ? vkg < 1. Our goal is to approximate G by the best tree-structured discrete multivariate distribution ~ G. To describe the tree-structure, we assume that the tree has height T . In layer 0, 1, 3, : : :, T there are 1; k1 ; k3 ; : : : ; kT nodes. The root of the tree is (0; 0). The other nodes are denoted by

f(t; i) : 1 i kt ; 1 t T g:
If node (t; j ) is the predecessor of the terminal node (T; i), we write

predt (i) = j:
Consider a ternary tree of height 2. For this tree we have T = 2, k1 = 3, k2 = 9. Its nodes are: (0,0), (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (2,7), (2,8), (2,9). (see Figure 4.) With each node of the tree (except for the root) we associate a real value zt;i . The vector of all these values is denoted by

Example 2.

Z = fzt;i : i = 1; : : : ; kt ; t = 1; : : : ; T g: We group the points in Z into vectors z1 ; : : : ; zk by setting 1 0 z 1;pred (i) B z2;pred (i) C C B C: B .. zi = B C Bz . A @ T ?1;pred ? (i) C zT;i
T
1 2

T

1

A discrete probability measure which sits on the vectors zi is called tree-structured. The mass points Z induce a partition A(Z ) = fAi(Z ) ; i = 1; : : : ; kT g of RT :

These partitions are not disjoint. They may, however, be made disjoint by the following convention: If u is in several sets Ai(Z ) , then we assign it to the set with minimal i. 8

A(iZ ) = fu 2 RT : ku ? zik = min ku ? zj kg: j

Figure 4: Ternary tree structure.
(2,1) (1,1) (2,2) (2,3)

(2,4) (0,0) (1,2) (2,5) (2,6)

(2,7) (1,3) (2,8) (2,9)

~ We use the notation G(Z ) for the distribution sitting on the points zi with masses p(Z ) , where

pi =
Let

(Z )

Z

A(Z) i

dG(u):

D(Z ) =
=

Z
T

k XZ

min ku ? zj k dG(u) j

Lemma 2. If f (x; ) are uniformly Lipschitz, i.e. there is a L1 < 1 such that L1(f (x; )) = inf fL : jf (x; u) ? f (x; v)j Lku ? vkg L1 for all x 2 X . Then Z Z ~ ~ sup jF (x) ? F (x)j = sup j f (x; u) dG(u) ? f (x; u) dG(Z ) (u) x x
L1 D(Z ):

(Z ) i=1 Ai

ku ? zik dG(u):

(12)

Proof.

X ~ jF (x) ? F (x)j = j f (x; u) dG(u) ? p(iZ )f (x; zi )j
kT i=1

Z

9

k XZ
T

i=1

L1

k XZ
T

A(Z) i

jf (x; u) ? f (x; zi )j dG(u) ku ? zi k dG(u)
2

= L1 D(Z ):

(Z ) i=1 Ai

Lemma 2 tells us that we have to minimize D(Z ) over all tree-structures Z . We discuss three algorithms for solving this minimization problem: (i) A deterministic iteration procedure, which is applicable, if G is completely known (ii) A stochastic approximation procedure, which is based on a sample from G and which converges to a local minimum of D (iii) A branch-and-bound procedure, which is also based on a sample from G and which converges to a global minimum of D. Before introducing the algorithms, we study the di erentiablility properties of Z 7! D(Z ). Let rZ D(Z ) be the column vector consisting of the entries

f

X

Z

(Z ) fj :predt(j )=ig Aj

aj sgn(zt;i ? ut) dG(u) : i = 1; : : : ; kt ; t = 1; : : : ; T g:

Proposition 1. If G has a Lebesgue density, then Z 7! D(Z ) is di erentiable with derivative rZ D(Z ). If G has bounded Lebesgue density g such that there is an integrable function k such that g(u) k(kuk), then Z 7! rZ D(Z ) is Lipschitz. Proof. Since

Z
4h

jut ? zt ? hj ? ju ? zj ? h sgn(zt ? ut )]2 dG(u)
2

Z

1lfjut ?zt j hg dG(u)

= o(h2 )
as h ! 0, we see that zt 7! jut ? zt j is L2 (dG)-di erentiable with derivative sgn(zt ? ut ). Hence also z 7! ku ? zk is L2 (dG)-di erentiable. The rst assertion follows now from 5], Corollary 3.52 on page 184. We prove now the second assertion: For any set B RT , let B := fu : inf fku ? vk : v 2 B g g be the -fattening of B . Let H be the family of all hyperspaces of RT of dimension T ? 1. The assumption implies that R dG(u) : H 2 H; > 0g < 1: supf H For a tree-structure Z , let

B (Z ) = fu 2 RT : min ku ? zj k is not unique g: j
10

B (Z ) is the boudary-set of the partition A(Z ). Let K be a large constant, such that B (Z ) coincides on fu : kuk > K g with a nite set of hyperplanes. Such K exists because of the structure of B (Z ). If kZ ? Z k , then the partitions A(Z ) and A(Z ) di er only for points in B (Z ). We get

jrZ D(Z ) ? rZ D(Z )j

Z

1lfkuk K g dG(u) + Z 1lfkuk>K g dG(u) B K1 + K 2
B(Z)
( )

ZB

(Z )

dG(u)

Z

for appropriately chosen constants K1 and K2 .

2

3.1 The deterministic iteration
1. Initialize

If G is completely known, we may use the following algorithm:

s = 0 (0) Z = fzt;i : 1 i kt ; 1 t T g
(0) (s 3. Let zt;i+1) 2 argminy f fj :predt(j )=ig A Z s jut ? yj dG(u)g, i.e. zt;i is the median of the j s S distribution G restricted to fj :predt(j )=ig A(Z ) in the t-th coordinate. j
( ( )) ( )

2. Find A(Z i

(s) )

for i t kT .

P

R

4. Stop if D(Z (s) ) D(Z (s?1) ). Otherwise set s := s + 1 and goto 2.

D(Z (s+1)) D(Z (s)) If D(Z (s +1) ) = D(Z (s ) ) for some s , then D(Z (s+1) ) = D(Z (s) ) for all s s and

rithm, then

Proposition. If Z (s) is the sequence of trees generated by the deterministic iteration algo-

rZ D(Z (s ) ) = 0:
Proof.
D(Z ) =
=
(s)

Z
T

k XZ

( min ku ? zjs) k dG(u) j

T X

t t=1 j =1 Aj kt T X X X Z (s jut ? zt;i)j dG(u) at = (Z (s) ) t=1 i=1 fj :predt(j )=ig Aj

(Z (s) )

(s) atjut ? zt;pred (j)j dG(u)

11

k T X X

t=1 i=1 fj :predt(j )=ig Aj kT T X XZ (s+1) atjut ? zt;predt(j)j dG(u) = (Z (s) ) t=1 j =1 Aj kT Z X = ku ? z(js+1) k dG(u) (Z (s) ) A j Z=1 j min ku ? z(s+1) k dG(u) j j = D(Z (s+1) )

at

t

X

Z
(Z (s) )

(s jut ? zt;i+1) j dG(u)

If D(Z (s +1) ) = D(Z (s ) ), then necessarily, for all t and all i,

zt;i 2 argminy f
which is equivalent to

(s )

X

Z

(Z (s fj :predt(j )=ig Aj

))

jut ? yj dG(u)g;

X

Z
(Z (s ) )

fj :predt(j )=ig Aj

(s sgn(zt;i ) ? ut ) dG(u) = 0;

i.e. rZ D(Z (s ) ) = 0 and evidently, the iteration has reached a xpoint. 2 We remark here that this method is related to the k-means method of cluster analysis (see e.g. Mc Queen 3]).

3.2 The stochastic approximation
(s)

Now we describe how we can estimate the best approximation on the basis of a sample from G. Suppose that we observe an i.i.d. sequence of vectors
( ( = ( 1s) ; : : : ; Ts) );

s = 1; 2; : : : each distributed according to G.
1. Initialize

The stochastic approximation algorithm is

s = 0 (0) Z = fzt;i : 1 i kt ; 1 t T g p(0) = 1=kT for 1 i kT i
(0)

2. Observe the next trajectory (s)
s 3. Find j 2 f1; 2; : : : ; kT g such that (s) 2 A(Z ) . j
( )

12

4. Change for 1 t T

zt;pred (j) = zt;pred (j) +
t t

(s+1)

(s)

(

? as

at s

t

if if

(s) (s) zt;predt(j) t (s) (s) t < zt;predt(j )

Leave all other points unchanged. 5. Estimate

pj
6. Set s := s + 1 and goto 2. algorithm, then

(s+1)

p(is+1)

sp(js) + 1 = s+1 sp(is) for i 6= j = s+1

Proposition. If Z (s) is the sequence of trees generated by the stochastic approximation

In particular, if D(Z ) has a unique minimizer Z , then

rZ D(Z (s)) ! 0
Z (s) ! Z

a:s:

a:s:

Proof. The vectors Z (s) satisfy the recursion
with Since

Z (s+1) = Z (s) ? 1 rZ D(Z (s)) ? 1 W (s) s s

(s Wt;i) =

(s) at sgn( t(s) ? zt;pred (j) )1lf 2A fj :pred (j )=ig
t
(s)

X
t

(Z (s) )

j

g

:

E (Z (s+1)

we see that E (W (s) jZ (s) ) = 0.PBy 5], Theorem 5.1. on page 282, the Lipschitz property of rZ D(Z ) and the fact that s s1 E ( W (s) ]2 ) < 1, W (s) is bounded, imply that D(Z (s) ) converges a.s. and rZ D(Z (s) ) ! 0 a.s. 2
2

jZ (s)) = Z (s) ? 1 rZ D(Z (s)): s

Let G be the bivariate normal distribution with mean 0 and covariance matrix 1 1 . 1 2 0 The problem is to approximate this distribution by a ternary tree-structured discrete distribution (see Example 2). Our stochastic approximation algorithm was implemented and tested with a sample size of 30000. We have chosen the constants a1 = 10; a2 = 1. The resulting partition is shown in Figure 5 and the pertaining tree is shown in Figure 6. Notice that the partition is not hierarchical (We call a partition hierarchical, if it can be generated by partitioning the rst coordinate rst, then conditionally partitioning the second 13

Example 3.

Figure 5: Partition A found by stochastic approximation (a1 = 10, a2 = 1, sample size: 30000).
1.5

1

0.5

0

?0.5

?1

?1.5 ?1.5 ?1 ?0.5 0 0.5 1 1.5

Table 1: Accuracy of stochastic approximation procedure. Exact value Result of stochastic approximation 1.01 0.008 -1.02 0.310 0.388 0.302

z1;1 z1;2 z1;3 p1;1 p1;2 p1;3

1.029 0.0 -1.029 0.3035 0.3930 0.3035

coordinate, and so on). The probabilities of the partitions were estimated by the stochastic approximation procedure as p1 = 0:197, p2 = 0:072, p3 = 0:033, p4 = 0:136, p5 = 0:121, p6 = 0:131, p7 = 0:034, p8 = 0:073, p9 = 0:203: The rst stage probabilities are p1 + p2 + p3 = 0:302, p4 + p5 + p6 = 0:388, p7 + p8 + p9 = 0:310. The rst stage probabilities, as well as the conditional second stage probabilities are shown on the arcs of the tree in Figure 5. Since the marginal distribution of the rst coordinate is standard normal, we may compare the result to Example 1. Notice, how close the values of the z1;i 's and the probabilities are to the best one-dimensional three-point approximation of the standard normal distribution, although the partition is not hierarchical (Table 1). By choosing the weight factors ai appropriately, one may nd partitions, which are nearly hierachical. In particular, if ai ! 1, such that aiai ! 1, the optimal partition tends to a hierachical one. As an Example, we have repeated Example 3 with the choice a1 = 1000; a2 = 1.
+1

14

Figure 6: The optimal tree found by stochastic approximation (a1 = 10; a2 = 1).
0.110 0.235 1.34 0.15 ?0.57

1.01

0.655 0.310 0.350 0.388 0.312

0.86 ?0.003 ?0.85

0.008

0.338 0.302 0.197 0.072

0.55 ?0.16 ?1.34

?1.02

0.033

The resulting partition is shown in Figure 7. The location of the mass points zt;i and the probabilities pt;i changed only very slightly. The optimal tree construction can be interpreted as a facility location problem, where the locations Z of the facilities have to be tree structured. For such type of problems, the stochastic branch and bound method can be applied. As was shown in 4], this method nds the global minimizer of D(Z ). However, it is much more complicated to implement and leads to very large execution times. For details we refer to the cited paper.

3.3 Stochastic branch-and-bound

4 Final remark
We have presented a method of estimating the optimal tree-structured approximation to a given stochastic process. In the nancial optimization literature it is often argued that consistency of the tree with present observations is the only criterion for the quality of the construction. We do not follow this argumentation here, because we do not believe in a discrete world as ultimate model. The reality, as well as the modelled stochastic processes have to be continuous in space and discrete in time. These processes must ful ll the required consistency, otherwise they would not serve as good models. 15

Figure 7: The partition A found by stochastic approximation (a1 = 1000; a2 = 1, sample size 30000).
1.5

1

0.5

0

?0.5

?1

?1.5 ?1.5 ?1 ?0.5 0 0.5 1 1.5

The problem of best discretization of these processes is not a modelling problem but an approximation problem. The criterion is the approximation error, making it a mathematical problem. It is this problem, which we solved here.

References
1] Birge J., Louveaux F. (1997). Introduction to Stochastic Programming. Springer Series in Operations Research, New York. 2] Dupacova J., R misch W. (1998). Quantitative stability of scenario-based stochastic programs. Prague Stochastics '98 (M. Huskova et al. Eds.) JCMF, Prague, pp. 119 124. 3] MacQueen J. (1966). Some methods for classi cation and analysis of multivariate observations. Proc. Fifth Berkeley Symp. University of California Press 1, pp. 281 297. 4] Norkin V., P ug G., Ruszczy ski A.(1998). A branch and bound method for stochastic global optimization. Mathematical Programming 83, pp. 425 450. 5] P ug G.Ch. (1996). Optimization of Stochastic Models: The Interface between Simulation and Optimization. Kluwer Academic Publishers, Boston. 6] Vallander S.S. (1973). Calculation of the Wasserstein distance between probability distributions on the line. Theor. Prob. Appl. 18, pp. 784 786. 7] Rachev S.T. (1991). Probability metrics and the stability of stochastic models. J. Wiley and Sons, New York. 16