9512.net

甜梦文库

甜梦文库

当前位置：首页 >> >> # On the effect of analog noise in discrete-time analog computations

On the E ect of Analog Noise in Discrete-Time Analog Computations

Pekka Orponen Wolfgang Maass Department of Mathematics Institute for Theoretical Computer Science Technische Universitat Graz University of Jyvaskylay September 23, 1996

We introduce a model for analog noise in analog computation with discrete time that is exible enough to cover the most important concrete cases, such as noisy analog neural nets and networks of spiking neurons. We show that the presence of arbitrarily small amounts of analog noise reduces the power of analog computational models to that of nite automata, and we also prove a new type of upper bound for the VC-dimension of computational models with analog noise.

Abstract

1 Introduction

Analog noise is obviously a serious issue in analog computation. In particular one has to be concerned about the e ect of omnipresent smaller (and occasionally larger) perturbations of the values of intermediate analog variables (for example given by some Gaussian distribution). However there exists no formal model which allows us to address this issue in an adequate manner. In addition there exists no noise-model that can also be applied to computations in networks of noisy spiking neurons. Because of the lack of an adequate mathematical model for analog noise it has so far been impossible to prove rigorous bounds for the computational power of discrete-time systems with analog noise. Previously Siegelmann, 1994] had investigated the e ect of \digital noise" in analog neural nets, where instead of perturbations of intermediate analog values only randomly occuring total failures of gates and wires are allowed (similar as in the quite large literature on noisy digital circuits von Neumann, 1956], Pippenger, 1989]). More recently Casey, 1996] has investigated a model for analog noise that allows perturbations of intermediate analog variables up to some > 0. However, this model does not even cover the case of a Gaussian distribution of noise where also larger deviations than may occur, although with low probability. Furthermore within his framework one can only analyze those computations which can give their binary output (accept/reject) in spite of noise with perfect reliability. Obviously this covers only a rather special class of computations. In addition the model from Casey, 1996]

Klosterwiesgasse 32/2, A{8010 Graz, Austria. E-mail: maass@igi.tu-graz.ac.at. P. O. Box 35, FIN{40351 Jyvaskyla, Finland. E-mail: orponen@math.jyu. . Part of this work was done while this author was at the University of Helsinki, and during visits to the Technische Universitat Graz and the University of Chile in Santiago.

y

1

is not exible enough to be applicable to networks of noisy spiking neurons, where apart from jitter in the ring times it also occurs that neurons re \spontaneously", or that neurons fail to re with a certain probability. In order to model such noise one also has to allow that a discrete state (\not- ring") is randomly replaced by an analog value (a ring time), and vice versa. As the general setup for our noise model, which is intended to cover all these and in addition any other reasonable concrete situations, let us consider some \analog" system M with d components that each have some bounded real-valued state. The total state space of the system is then some bounded set Rd. In the case of, say, d independent analog neurons, each with a state from the interval ?1; 1], the state space would be simply = ?1; 1]d , but we wish to allow also for possible correlations between the components, and \forbidden regions" in the state space, e.g. around the origin. Also, we want to cover systems with discrete and hybrid continuous-discrete state spaces, such as networks of spiking neurons Maass, 1996a] or analog machines with binary registers. The only properties of the state space that are essential to our results are that it is nite-dimensional and bounded, i.e. that the components have some hard limits beyond which their response cannot go. In a noise-free setting, the system M would move in a single time step from a state p 2 to its next state p+ 2 according to some deterministic rule. (We shall discuss the precise de nitions in Section 2.) In the more realistic noisy version we consider, the next state q is given only as some probability distribution over , with a density function (p; q). This transition density is determined as a function of the deterministic next state p+ , and the noise process a ecting the system; more precisely, (p; q) = z(p+; q), where z is the noise density function. In typical situations, this distribution would be centered around p+ , as when z is e.g. some Gaussian density function; however we shall make only the much weaker assumption that the noise process is \piecewise uniformly continuous". (For precise de nitions, see Section 2. An issue requiring some care here is that since the state space is bounded, any unbounded, e.g. Gaussian density would have to be \clipped" at the boundaries of , so that any noise leading a component beyond its response limit is \rounded" to that limit value. This causes a technical problem in the context of density functions, since a single state on the boundary of may be assumed with probability > 0 .) All the standard continuous noise processes (and their clipped versions) satisfy our assumptions, as well as noise processes that are smooth over continuous parts of , and de ned in an arbitrary manner over the possible discrete parts. We even allow the noise process to depend on the current state of the system in a rather arbitrary fashion. The rst result we prove in this paper (Section 3) is that analog systems of the type discussed above, and a ected by noise conforming to our general noise model, have at best the power of nite automata, even if they are allowed to give their binary output with low reliability. This is in stark contrast to e.g. the results in Siegelmann, Sontag, 1995], where noise-free analog neural nets are shown to be capable of universal computation, and the results in Maass, 1996a], where the same is proved of the spiking neural network model without noise. To state the result more precisely, let us assume that our systems are provided with some input-output mechanism, and say that a set L of input sequences is recognized by a system M with reliability 0, if for any input sequence x, the output of M agrees with whether x 2 L with probability at least 1=2 + . We prove that if a set L is recognized with any reliability > 0 by a nite-dimensional, bounded system M a ected by a piecewise 2

uniformly continuous noise process, then L is regular, i.e. it can also be recognized by some nite automaton. Conversely, we verify in Section 4 that if we only consider noise processes that are bounded by some constant < 1=2 (i.e. for any state p, z (p; q) > 0 only for states q such that jqi ? pij holds for all i), then any regular set L can be recognized with perfect ( = 1=2) reliability by a recurrent analog neural net of the type discussed in Siegelmann, Sontag, 1995], even in the presence of analog noise. In Section 5 we prove a new type of upper bound for the VC-dimension of computational models with analog noise. We show that in the presence of arbitrarily small amounts of analog noise there exists an upper bound for the VC-dimension of, e.g. neural nets that is independent of the total number of units in the case of a feedforward architecture, and independent of the computation time in the case of a recurrent neural net. This contrasts with the anomaly that in the noise-free setting the classes of nite recurrent analog neural nets Siegelmann, Sontag, 1995] and nite recurrent networks of spiking neurons Maass, 1996] have in nite VC-dimension, and are thus strongly \unlearnable" from the point of view of learning theory. We show that this anomaly disappears as soon as even an arbitrarily small amount of analog noise is present. More speci c hardware-oriented models for analog noise in analog neural nets have been discussed in Phatak, Koren 1995]. Noise models similar to ours have been used in general studies of the stability of dynamical systems a ected by random perturbations (e.g., Kifer, 1988]). Finally we would like to point out that model for analog noise can also be applied to completely di erent types of models for discrete time analog computation, such as arithmetical circuits Turan, Vatan, 1994], the random access machine (RAM) with analog inputs, the parallel random access machine (PRAM) with analog inputs, various computational discrete-time dynamical systems ( Moore, 1990], Asarin, Maler, 1994], Orponen, Matamala, 1996]), and (with some minor adjustments) also the BSS model Blum, Shub, Smale, 1989], Koiran et al., 1994]. This article provides for each of these quite important special cases a suitable framework for modeling analog noise, as well as signi cant results regarding the e ect of analog noise on their computational power.

2 Preliminaries: Computational Systems and Noise Processes

We shall de ne our computational model rst in the noise-free setting, and then consider the e ect of noise on computations separately. An analog discrete-time computational system (brie y: computational system) M is dened in a general way as a 5-tuple h ; p0 ; F; ; si, where , the set of states, is a bounded subset of Rd , p0 2 is a distinguished initial state, F is the set of accepting states, is the input domain, and s : ! is the transition function. To avoid unnecessary pathologies, we impose the conditions that and F are Borel subsets of Rd , and for each a 2 , s(p; a) is a measurable function of p. We also assume that contains a distinguished null value t, which may be used to pad the actual input to arbitrary length. The nonnull input domain is denoted by 0 = ? ftg. The intended noise-free dynamics of such a system M is as follows. The system starts its computation in state p0 , and on each single computation step on input element a 2 0 moves 3

from its current state p to its next state s(p; a). After the actual input sequence has been exhausted, M may still continue to make pure computation steps, which lead it from a state p to the state s(p; t). The system accepts its input if it enters a state in the class F at some point after the input has nished. (We shall give a more precise de nition of the dynamics, including the e ects of noise, later.) For instance, the recurrent analog neural net model of Siegelmann, Sontag, 1995] (also known as the \Brain State in a Box" model of Anderson et al., 1977]) is obtained from this general framework as follows. For a network N with d neurons and activation values between ?1 and 1, the state space is = ?1; 1]d . The input domain may be chosen as either = R or = f?1; 0; 1g (for \online" input) or = Rn (for \batch" input). In each case the value zero (or the zero vector) serves conveniently as the null value t. For simplicity, we treat here formally only the cases where R; the extensions to the case = Rn are straightforward. The transition function s : ! is in this model given in terms of a d d weight matrix W = (wij ), a d-component bias vector h = (hi ), a d-component input weight vector c = (ci ), and a neuron activation function : R ! ?1; 1]. For any p 2 and a 2 , we de ne s(p; a) = p+ , where for each i = 1; : : : ; d,

p+ = ( i

Xd

j =1 wij pj + hi + ci a)

Both Anderson et al., 1977] and Siegelmann, Sontag, 1995] use the saturated-linear sigmoid activation function 8 > ?1; if u < ?1; < (u) = > u; if ? 1 u 1; : 1; if u > 1; but one may obviously also de ne the model with respect to other activation functions, notably the \standard sigmoid" (u) = tanh u, or the discontinuous signum function sgn(u) =

(

?1; if u < 0;

1; if u 0;

the latter choice yielding the model of recurrent threshold logic networks. The initial state in each of these models may be chosen as p0 = (?1; : : : ; ?1), and the set of accepting states is determined by the activity of some speci c output unit, say unit 1, so that F = fp 2 j p1 > g, for some threshold value > 0. In the sequel, we shall use to denote also the componentwise extension of the chosen activation function to state vectors, so that for any p 2 , (p) := ( (p1 ); : : : ; (pd )). This convention lets us write the transition function s de ned above compactly as s(p; a) = (Wp + h + ac): Feedforward analog neural nets may also be modeled in the same manner, except that in this case one may wish to select as the state set := ( ?1; 1] fdormantg)d , where dormant is a distinguished value not in ?1; 1]. This special value is used to indicate the state of a unit whose inputs have not all yet been available at the beginning of a given computation step (e.g. for units on the l-th layer of a net at computation steps t < l). The completely di erent model of a network of m stochastic spiking neurons (see e.g. Gerstner, van Hemmen, 1994] or Maass, 1996]) is also a special case of our general frameS work. In this case one wants to set sp := ( lj =1 0; T )j fnot- ringg)m , where T > 0 is a su ciently large constant so that it su ces to consider only the ring history of the network 4

during a preceding time interval of length T in order to determine whether a neuron res (e.g. T = 30 ms for a biological neural system). If one partitions the time axis into discrete time windows 0; T ) ; T; 2T ) ; : : : ; then in the noise-free case the ring events during each time window are completely determined by those in the preceding one. A component pi 2 0; T )j of a state in this set sp indicates that the corresponding neuron i has red exactly j times during the considered time interval, and it also speci es the j ring times of this neuron during this interval. Due to refractory e ects one can choose l < 1 for biological neural systems, e.g. l = 15 for T = 30 ms. With some straightforward formal operations one can also write this state set sp as a bounded subset of Rd for d := l m. Let us then consider the e ect of noise on computations. Let Z (p; B ) be a function that for each state p 2 and Borel set B indicates the probability of noise corrupting state p into some state in B . The function Z is called the noise process a ecting M , and it should satisfy the mild conditions of being a stochastic kernel Feller, 1971, p. 205], i.e., for each p 2 , Z (p; ) should be a probability distribution, and for each Borel set B , Z ( ; B ) should be a measurable function. We assume that there is some measure over so that Z (p; ) is absolutely continuous with respect to for each p 2 , i.e. (B ) = 0 implies Z (p; B ) = 0 for every measurable B . By the Radon{Nikodym theorem Feller, 1971, p. 140], Z then possesses a density kernel with respect to , i.e. there exists a function z ( ; ) such that for any state p 2 and Borel set B , Z Z (p; B ) = z(p; q) d : We assume that this function z ( ; ) has values in 0; 1) and is measurable. (Actually, in view of our other conditions this can be assumed without loss of generality.) The dynamics of a computational system M a ected by a noise process Z is now de ned as follows. If the system starts in a state p, the distribution of states q obtained after a single computation step on input a 2 is given by the density kernel a (p; q) = z (s(p; a); q). (Note that as a composition of two measurable functions, a is again a measurable function.) The long-term dynamics of the system is given by a Markov process, where the distribution starting in state p is xa (p; q) of states after jxaj computation steps with input xa 2 de ned recursively by Z xa (p; q) = x (p; r) a (r; q) d : One easily can verify by induction on juj that

xu (p; q) =

Z

q2B

r2

for all x; u 2 of length 1 . Let us denote by x (q) the distribution x(p0 ; q), i.e. the distribution of states of M after it has processed string x, starting from the initial state p0 . Let > 0 be the required reliability level. In the most basic version the system M accepts (rejects) some input x 2 0 R 1 1 if F x(q) d 2 + (respectively 2 ? ). In less trivial cases the system may also perform pure computation steps after it has read all of the input. Thus, we de ne more generally that the system M recognizes a set L 0 with reliability if for any x 2 0 : Z 1 + for some u 2 ftg x2L , xu(q) d 2 F 5

r2

x (p; r)

u (r; q) d

1 ? for all z 2 ftg : 2 F This covers also the case of batch input, where jxj = 1 and 0 is typically quite large (e.g. 0 = Rn ). One gets a reasonably realistic model for noise in an analog neural net with state space = ?1; 1]d by de ning the noise process Z so that it re ects a \clipped Gaussian distribution". Without more speci c knowledge about the noise source, this appears to be the most appropriate model for analog noise in an analog neural net. One assumes in this model that for any computation step the intended output pi 2 ?1; 1] of the ith unit of the net is replaced by a \clipped Gaussian distribution" of values qi 2 ?1; 1], where values < ?1 (> 1) are rounded to ?1 (respectively 1). If one assumes that this rounding occurs independently for each of the d units i in the network, and if one assumes for simplicity that all the underlying Gaussians have the same variance, then one arrives in our general framework for a noisy computational system M at a noise process Z where Z (p; ) is de ned for each p 2 = ?1; 1]d by a symmetric Gaussian distribution with density z(p; q) = (k q ? p k) around p, but with all values qi < ?1 (qi > 1) of the occurring states hq1 ; : : : ; qd i rounded to ?1 (respectively 1). (Here k v k denotes the Euclidean norm of a vector v, and is the density function of some symmetric d-variate Gaussian distribution.) Since such a rounding process will assign probability > 0 to the lower-dimensional bounding hyperrectangles of , we cannot simply de ne as the Lebesgue measure over in order to subsume this type of analog noise under our general noise model. Rather one has to decompose into components 1 ; : : : ; k (representing the interior 1 and lower dimensional bounding hyperrectangles 2 ; : : : ; k of = ?1; 1]d ), and de ne as a sum of measures 1 + : : : + k , where 1 is the Lebesgue measure over 1 and 2 ; : : : ; k are Lebesgue measures for the lower dimensional spaces 2 ; : : : ; k . In the case of a network of spiking neurons, the noise model has to take into account that not only the ring time of a neuron is subject to some jitter (which can be modeled by a Gaussian distribution), but also neurons may randomly fail to re, or they may re \spontaneously" (i.e. even when they would not re in the corresponding deterministic model). All these e ects can be modeled by a suitable noise process Z de ned on the state space sp discussed earlier, with a measure over de ned via a decomposition of similarly as in the case of analog neural nets.

x2L , =

Z

xz (q) d

3 An Upper Bound for the Computational Power of Systems with Analog Noise

It has been shown for various concrete models for analog computation without noise (such as recurrent neural nets Siegelmann, Sontag, 1995] and networks of spiking neurons Maass, 1996a]) that they can simulate a universal Turing machine, and hence have immense computational power. It has long been conjectured that their computational power collapses as soon as one assumes that they are subject to even small amounts of analog noise. We provide in this section a proof of this conjecture. It turns out that one has to argue quite di erently than in the analysis of digital noise (e.g. gate failures) von Neumann, 1956, Pippenger, 1989] and also completely di erently than in the case of noisy analog computations 1 with perfect reliability (i.e. = 2 ) which had been considered in Casey, 1996]. A closer 6

relative is Rabin's Rabin, 1963, Paz, 1971] proof showing that probabilistic nite automata with \isolated cutpoints" are equivalent to deterministic nite automata; however in Rabin's case the niteness of the state space simpli es the setup considerably. Our proof requires a mild continuity assumption for the density functions z (r; ) , which is satis ed in all concrete cases that we have considered. We do not require any global continuity property over for the density functions z (r; ) because of the previously discussed concrete cases, where the state space is a disjoint union of subspaces 1 ; : : : ; k with di erent measures on each subspace. We only assume that for some arbitrary partition of into Borel sets 1 ; : : : ; k the density functions z (r; ) are uniformly continuous over each j , with moduli of continuity that can be bounded independently of r . In other words, we require that z ( ; ) satis es the following condition: A function ( ; ) from 2 into R is called piecewise uniformly continuous if for every " > 0 there is a > 0 such that for every r 2 , and for all p; q 2 j , j = 1; : : : ; k:

k p?q k

implies j (r; p) ? (r; q)j

":

(1)

If z ( ; ) satis es this condition, we say that the resulting noise process Z is piecewise uniformly continuous. Our preceding discussions suggest that the analog noise process Z is in all practically relevant cases piecewise uniformly continuous. Note that because the state space is bounded, any piecewise uniformly continuous density function on has a bounded range. To formulate our rst result, we need a notion of regular sets of sequences over arbitrary domains 0 , which we de ne as follows. Let L 0 be a set of sequences over an input domain 0 . Sequences x; y 2 0 are equivalent with respect to L if one has xw 2 L , yw 2 L for all w 2 0 . The set L is regular if this equivalence relation has only nitely many equivalence classes. By the Myhill{Nerode theorem Hopcroft, Ullman, 1979, pp. 65{67], for nite alphabets 0 this de nition coincides with the usual de nition of regular sets via nite automata. From the point of view of computational complexity theory, machine models that only accept regular sets belong to the most \primitive" class of models. In contrast to Turing machines and other \universal" computational models, the number of internal states of such machine models is xed, independently of the length of the input string.

Theorem 3.1 Let L

0 be a set of sequences over an arbitrary input domain 0 . Assume that some computational system M , a ected by a piecewise uniformly continuous noise process Z , recognizes L with reliability , for some arbitrary > 0. Then L is regular. 0

L. We shall show that there are only nitely many equivalence classes of sequences with respect to L. We begin by observing that if for two sequences x; y 2 0 , the distributions x ( ) and y ( ) are su ciently close, then x and y are equivalent. To see this, assume that R , and suppose for a contradiction that x and y are not equivr2 j x(r) ? y (r)j d = alent. Then there exists some w 2 0 with xw 2 L , yw 2 L.R Without loss of generality, 1 assume that xw 2 L. Thus there exists some u 2 ftg with F xwu(q) d 2 + and R 1 ? . This yields the contradiction F ywu (q) d 2

2

Z

Proof: Let M = h ; p0; F; ; si, where =

ftg, be the system in question recognizing

q2F

xwu(q) d ?

Z

q2F

ywu (q) d

7

= =

x(r) wu(r; q) d d ? y (r) wu(r; q) d d q2FZ r2 q2F r2 Z j x(r) ? y (r)j wu(r; q) d d

Z

Z

Z

Z

Z

q2F r2 r2

j x(r) ? y (r)j

R

Z

implies that x; y 2 0 are equivalent. Thus we have shown that r2 j x (r) ? y (r)j d Next we observe that all the density functions x ( ) for x 2 are piecewise uniformly continuous, with the same bounds on their moduli of continuity as the noise density functions z(r; ) have. This is veri ed by induction on jxj. Given " > 0, let > 0 be such that the density function z ( ; ) satis es condition (1) for all r 2 and j = 1; : : : ; k. We then have for any x 2 + , a 2 , and all p; q 2 j such that k p ? q k :

:

q2F

wu(r; q) d

d

j xa(p) ?

xa (q)j =

Z

=

r2 Z r2Z

x (r) x (r)

j a(r; p) ? a(r; q)j d jz(s(r; a); p) ? z(s(r; a); q)j d

The preceding observation now implies that the space of all functions x ( ) for x 2 0 can be partitioned into nitely many classes C so that any two functions x ( ), y ( ) in the R , and hence correspond to sequences that are same class C satisfy r2 j x (r) ? y (r)j d equivalent with respect to L. Such a partition can for example be achieved in the following way. Using the piecewise uniform continuity of the x ( ), choose from within each component j of a nite set (or \grid") Gj that is so dense that for each r 2 j , if tr 2 Gj is the gridpoint closest to r, then j x (r) ? x (tr )j =4 ( ). (To see that such a nite Gj always exists, note that given the value > 0 corresponding to " = =4 ( ) in condition (1), one can by the Bolzano{Weierstrass theorem choose only a nite number of points t from within the bounded set jSso that any two distinct chosen points t, t0 are more than a distance apart.) Take G = k=1 Gj . Now partition the (bounded!) range of all functions x( ) into j nitely many intervals I of length =2 ( ), and place two functions x ( ), y ( ) in the same class C if for every gridpoint t 2 G the values of x (t) and y (t) fall into the same interval I . Then for any two functions x ( ), y ( ) in the same class C it is the case that for any r 2 j , j = 1; : : : ; k, j x(r) ? y (r)j j x(r) ? x(tr )j + j x(tr ) ? y (tr )j + j y (tr ) ? y (r)j = ( ); R R ( = ( )) r2 d = . and thus r2 j x (r) ? y (r)j d

" r2 = ":

x (r) d

Remark 3.2 In stark contrast to the results of Siegelmann, Sontag, 1995] and

Maass, 1996] for the noise-free case, the preceding Theorem implies that both recurrent analog neural nets and recurrent networks of spiking neurons with online input from 0 can only recognize regular languages in the presence of any reasonable type of analog noise, even if their computation time is unlimited and if they employ arbitrary real-valued parameters.

8

Let us say that a noise process Z de ned on a set Rd is bounded by if it can move a state p only to other states q that have a distance from p in the L1 -norm over Rd , i.e. if its density kernel z has the property that for any p = hp1 ; : : : ; pd i and q = hq1 ; : : : ; qd i 2 , z(p; q) > 0 implies that jqi ? pi j for all i = 1; : : : ; d. As a partial converse to the upper bound result of the previous Section, we now prove that regular languages over the alphabet 1 f?1; 1g can be recognized with perfect reliability (i.e. = 2 ) by recurrent analog neural nets, as long as the noise process a ecting the computation is bounded by a certain constant > 0. The basic idea of this proof is simply to rst construct a threshold logic network T recognizing the regular language under consideration, and then simulate T with a noisetolerant analog neural net. However, in order to obtain the tolerance vs. delay tradeo results in a uniform manner, we derive them as corollaries from a general result on simulating threshold logic networks by noisy recurrent analog neural nets. Consider a d-unit threshold logic network T (cf. Section 2) with transition function s(p; a) = sgn(Wp + h + ac), where W 2 Rd d is the weight matrix of T , h 2 Rd is the bias vector, and c 2 Rd is the input weight vector. Let us say that T has separation , if at each unit, the argument to the signum function is always at least away from zero, i.e., if jwiT p + hi + ciaj always holds, for every i = 1; : : : ; d, p 2 f?1; 1gd , and a 2 f?1; 0; 1g. Any threshold logic network operating on the input alphabet f?1; 0; 1g may be modi ed to have some nonzero separation value by adjusting the bias values appropriately. An important special case are networks with integer weights, which may be adjusted to have separation 1. (On input values a 2 f?1; 1g this is straightforward; dealing with the value a = 0 may in some cases require modifying the network structure.)

4 Noisy Analog Neural Nets Recognize Regular Languages

(i.e. in states u such that (u) is within of 1, for some small constant ), so that they in e ect function as threshold logic units. This is achieved by multiplying the weights in N by a su ciently large constant m. Thus, let a language L be recognized by an d-unit threshold logic network T with transition function p+ = s(p; a) = sgn(Wp + h + ac), and separation . Let and u be constants such that the noise bound is < =wmax ? , and for all u u , j1 ? (u)j , and for all u ?u , j(?1) ? (u)j . Now consider the analog network N obtained from T by multiplying all the weights and thresholds by a constant m ? w u ( + ); max and replacing the signum nonlinearities by the sigmoids. We claim that N reproduces the 9

Theorem 4.1 Let a language L f?1; 1g be recognized by some d-unit threshold logic network T with separation P 0, and let wmax be the maximum total input weight to any > unit of T (i.e., wmax = maxi j jwij j). Let be a constant satisfying < =wmax. Then L can also be recognized by a d-unit recurrent analog neural net N that has perfect reliability 1 ( = 2 ) when a ected by any noise process Z bounded by . The activation function of N may be any function satisfying (u) ! ?1 for u ! ?1 and (u) ! 1 for u ! 1. Proof: The idea of the proof is simply to simulate the threshold logic network T with an analog neural network N by forcing the analog units to always operate close to saturation

behavior of T exactly, in the sense that the state of N at each time step, before noise is applied, is within of the corresponding state of T . Assume that the claim is true at some given time, when the state of T is some p 2 f?1; 1gd , and that of N correspondingly p = p + r, for some r 2 ? ; ]d . Consider then the update of ~ N rst with a noise vector e = q ? p, where q is generated according to some componentwise ~ ~ ~ -bounded noise density z (~; q), and then with the network transition function p~

p+ = ~

= =

(mW q + mh + mac) ~ (mW (p + r + e) + mh + mac) (m(Wp + h + ac) + mW (r + e)):

Considering the argument vector to the sigmoid componentwise, we obtain for each i = 1; : : : ; d the bound:

jm(wiT p + hi + ci a) + mwiT (r + e)j

m ? mwmax( + )

u:

By our choice of the value u , we are thus again assured that the components of the new state vector p+ of N are within of the corresponding components of the state vector p+ of ~ T . The claim follows by induction. One technicality concerning the choice of nal states in the network N still needs to be pointed out. Even though in the network T the nal states may be de ned as, say, FT = fp 2 f?1; 1gd j p1 = 1g, noise in the network N also a ects the state of the output unit, and so the nal states there should be de ned as FN = fp 2 ?1; 1]d j p1 1 ? g, if the noise is bounded by .

Corollary 4.2 For every regular language L f?1; 1g there is a constant > 0 such that L can be recognized with perfect reliability (i.e. = 1 ) by a recurrent analog neural net in 2 spite of any noise process Z bounded by . Proof: Let L be recognized by some nite automaton with m states. As presented in e.g. Minsky, 1972, pp. 55{57], one can easily construct from this automaton a threshold logic network T with 2m + 1 units that recognizes L. In Minsky's construction there is one threshold logic unit for each (state, input symbol) pair of the simulated automaton, plus one unit that tests for the acceptance condition. (Actually, our present model mandates testing also for input termination, which requires adding a few extra units.) A unit is activated (goes to state 1) when it receives an excitatory signal both from some preceding (state, symbol) unit and its input line. All the nonzero weights in T have absolute value 1, and the units have fan-in at most 2m + 1. Since this network satis es the conditions of Theorem 4.1 with = 1, wmax = 2m + 1, we may choose = 1=(2m + 1).

The next corollary shows that we can increase the noise tolerance of a network by slowing down the computation. Given an integer constant 1, let us say that a network N recognizes a language L with delay , if for every string x = a1 : : : ak 2 f?1; 1g , x 2 L if and only if N accepts the string a1 : : : ak (i.e. each input symbol ai is repeated times before the next one is presented). 10

Corollary 4.3 For every regular language L f?1; 1g there is a constant delay value

1 1 such that for any < 2 , L can be recognized with delay with perfect reliability (i.e. = 2 ) by a recurrent analog neural net that may be subject to any noise process Z bounded by .

Proof: Let again L be recognized by some nite automaton with m states. The threshold logic units used in the simulation of Corollary 4.2 simply test for the simultaneous activity on any one of the lines coming from the preceding (state, symbol) units, and the appropriate input line. Thus, each such unit can be replaced by a tree of fan-in 2 OR gates, and a concluding AND gate. Considering that the maximum fan-in of the original units is 2m + 1, the AND-OR trees may be constructed to have height = dlog2 me + 2. The resulting network then has integer weights, with wmax = 2, and recognizes the language L with delay : Remark 4.4 One can obtain di erent noise-tolerance vs. delay trado s using the

recent more advanced simulations of nite automata by threshold logic networks ( Alon, Dewdney, Ott, 1991], Horne, Hush, 1996], Indyk, 1995]). For instance, Horne, Hush, 1996] presents a simulation of m-state nite automata by threshp old logic networks with O( m log m) units, connection weights p1, and delay 4. Thus, one can in Corollary 4.3 achieve a noise-tolerance bound of = O(1= m log m) with delay = 4.

of the interval used to encode unit states in the analog neural net model. The results are here formulated using the interval ?1; 1], and changes in this interval would have the proportional e ects on the values. For instance, if the interval 0; 1] were used (as in e.g. 1 Siegelmann, Sontag, 1995]), the bound in Corollary 4.3 would decrease from 2 to 1 . 4

Remark 4.5 The precise values of the bounds obtained above are proportional to the width

5 A Novel Upper Bound for the VC-Dimension of Various Types of Neural Nets with Analog Noise

In this section we provide an example for the e ect of analog noise on discrete time analog computations with batch-input. We focus our attention on the most common types of analog neural nets and show that in the presence of arbitrarily small amounts of analog noise there exists an upper bound for the VC-dimension of such neural nets that is independent of the total number of gates in the case of a feedforward architecture, and independent of the computation time in the case of a recurrent neural net. This novel type of upper bound depends apart from the analog noise only on those parameters of the net that are relevant for its rst computation step, and it holds for arbitrary real-valued batch-inputs and arbitrary realvalued \programmable parameters" (i.e. weights etc.). It will become obvious from the proof that this upper bound is actually of a quite general nature, and it can also be applied to various other models for discrete-time analog computation with analog noise. The VC-dimension (abbreviated VC-dim(F )) of an arbitrary class F of functions f : Rn ! f0; 1g is de ned as follows. One says that F shatters a nite set S Rn if for every subset A S there exists a function f 2 F with f (x) = 1 for x 2 A and f (x) = 0 11

for x 2 S ? A. The VC-dimension of F is de ned as VC-dim(F ) := sup fjS j : S Rn is shattered by Fg. The VC-dimension of F may be viewed as a measure for the expressibility (or \degrees of freedom") of F . In particular it provides for arbitrary nite sets D Rn an upper bound of the form jDjO(VC-dim(F )) for the number of functions D ! f0; 1g that can be written as a restriction of a function in F to this nite domain D. As a consequence, the VC-dimension of F is the key parameter for estimating the number of randomly chosen examples that are needed to \learn" arbitrary target functions g : Rn ! f0; 1g from randomly chosen examples hx; g(x)i for g by a learning algorithm that uses functions from F as hypotheses (see Haussler, 1992], Vapnik, Chervonenkis, 1971], Blumer et al., 1989], Maass, 1995]). It should be noted that this does not only hold for the \classical" PAC-learning model where the target function g is required to belong to the class F , but according to Haussler, 1992] also in the general case of agnostic PAC-learning where g : Rn ! f0; 1g can be any function. Of course the latter case is much more relevant for the theory of learning with neural nets, where the class F of possible \hypotheses" is xed by the architecture of the neural net on which we run a learning algorithm, whereas the examples hx; g(x)i may arise from some arbitrary \real-world" classi cation problem for which we train the neural net. It is obvious from the results of Siegelmann, Sontag, 1995] and Maass, 1996] that there exist nite recurrent analog neural nets and nite recurrent networks of spiking neurons with batch-input and parameters from Q that have in nite VC-dimension (consider networks that can simulate a universal Turing machine, with each input bit-string encoded into a rational number). From the point of view of learning theory an in nite VC-dimension is commonly interpreted as information-theoretic evidence that there exists no \learning algorithm" for such networks (not even one with unlimited computation time). We will show in this section that this \anomaly" disappears as soon as one takes into account that the neural net is subject to analog noise, even if the amount of such noise is arbitrarily small. For technical reasons we will talk in this section also about the pseudo-dimension P-dim(G ) of a class G of real-valued functions g : Rn ! R. One can de ne P-dim(G ) as the VCdimension of the following associated class

F := ff : Rn+1 ! f0; 1g : 9 g 2 G (f (x; y) = 1 if g(x) y and f (x; y) = 0 if g(x) < y)g

of Boolean-valued functions. Consider now the computation of a system M = h ; p0 ; F; Rn ; si on a batch input vector x 2 Rn, a ected by some piecewise uniformly continuous noise process Z whose density function z has values in some range 0; B ]. The distribution of states of M after k 1 computation steps is given by the density function xu (p), where jxj = 1 and u = tk?1. R For k > 1, this density can be decomposed as q2 x (p0 ; q) u(q; p) d , and for k = 1 we have simply xu(p) = x (p0 ; q). This decomposition of the density function for the statedistribution of M will be essential for our subsequent results. We show in Theorem 5.1. that there exists a nite upper bound for the VC-dimension of the class F of functions computable by a class M of such systems M (which receive arbitrary real-valued batch input), that does not depend on the \complexity" of the class H of functions z ( ; ) that describe the second part of the computations of these systems M after their rst computation step. Let M be a class of such systems, a ected by the same piecewise uniformly continuous noise process Z . For example M can be the class of systems M that result from di erent weight-assignments to some feedforward or recurrent analog neural net with some xed ar12

chitecture. Denote by G the class of all density kernels of the form (x; q) := x (p0 ; q) for systems M 2 M, and by H the class of density kernels of the form !(q; p) := u (q; p), for systems M 2 M and sequences u 2 ftg . (As a special case, we include also the constant function 1 in H.) Then all the Boolean functions computed with reliability by the systems M 2 M are included in the class F of functions f : Rn ! f0; 1g that are composed of a function 2 G and a function ! 2 H so that for any x 2 Rn the integral R R 1 + if f (x) = 1; p2F q2 (x; q) !(q; p) d d has a value 2 (2) and else a value 1 ? : 2 Actually, the class F contains somewhat more than just the functions computed by systems from M, because the two component functions and ! in (2) may come from two di erent systems in M (for example from two di erent weight-assignments to a recurrent analog neural net). In Theorem 5.1 we consider an even more general set-up where one has two bounded state sets Rd and 0 Rd0 , measures over and 0 over 0 , as well as a Borel set F 0 of accepting nal states. (In applications is typically the set of possible intermediates states after a xed number l (e.g. l = 1) of computation steps, and 0 is the set of possible output states of a computation. One has d 6= d0 if for example the number d of units on the rst hidden layer of a feedforward sigmoidal neural net di ers from the number d0 of output nodes of the net; see Corollary 5.3.). We assume in Theorem 5.1 that G is an arbitrary class of piecewise uniformly continuous density kernels : Rn ! 0; B ] with uniformly bounded moduli of continuity (as in 0 ! R+ , that condition (1)), that H is an arbitrary class of density kernels ! : > 0 is an arbitrary given parameter, and that F is the class of functions f : Rn ! f0; 1g for which there exist functions 2 G and ! 2 H so that for any x 2 R the integral R R 1 0 p2F q2 (x; q) !(q; p) d d has a value 2 + if f (x) = 1 , and otherwise a value 1 2? . Because of our assumption about the function class G , one can (as in the proof of Theorem 3.1) superimpose on the space a nite grid G, such that for any ; ~ 2 G and x; x 2 Rn : ~ j (x; q) ? ~ (~; q)j =5 ( ) for all q 2 G implies that x R x q2 j (x; q) ? ~ (~; q)j d < =2: The size jGj of the grid (i.e. the number of gridpoints) depends in general on the reliability parameter , the common moduli of continuity of the functions in G , and the volume and shape of the state space . Theorem 5.1 Let G , H, and F be function classes as speci ed above, and assume in addition that the class G has nite pseudo-dimension . Then one can give a nite upper bound for the VC-dimension of F in terms of , B , jGj, , and ( ). Obviously this bound does not depend on the complexity of the function class H (except via parameters related to the state set ) . Proof: Let S Rn be some arbitrary nite set shattered by F . For any subset A S we x functions A 2 G and !A 2 H so that for any x 2 S the integral R R 1 0 p2F q2 A (x; q) !A (q; p) d d has a value 2 + if x 2 A; and else a value 1 ? : 2 13

We write GS for the class of all functions A 2 G for A S , and GS for the class of restrictions of these functions to the nite domain S G. We also consider for := =10 ( ) and any class A of functions with range R+ the class A of all \ -discretizations" g of functions g 2 A, where g (z) := g(z) for any z in the domain of g : In particular for the class GS the functions A 2 GS map S G into f0; : : : ; b ? 1g for b := bB= c + 1. One should note that by our assumptions on G , for any ; ~ 2 G and any x; x 2 S ~ R x the condition 8q 2 G (j (x; q) ? ~ (~; q)j 1) implies that j (x; q) ? ~ (~; q)j d < =2. x One can get an upper bound for the \complexity" of GS by applying to GS a general-

ization of \Sauer'sP ? due to Alon et al., 1993]. Given integers m, b, and , de ne Lemma" (m; b; ) := log2 i=1 m bi . Lemma 15 of Alon et al., 1993] states that if A is any class i of functions obtained as the discretizations of the functions in a class A of pseudo-dimension , such that the functions in A have a domain D of size m and range f0; : : : ; b ? 1g, then A must contain an \L1 2-cover" B A of size at most

jB j 2 (mb2 ) (m;b; ) : ~ That is, for every f 2 A there is some f~ 2 B such that jf (z ) ? f (z )j < 2 (and hence 1) for every z 2 D. (The result holds for general values of , , m, and b.) Applied to our context (with A := GS ) this result implies that there exists a set G GS whose cardinality can be bounded in terms of the pseudo-dimension of G as jG j 2 (jS j jGj b2 ) (jSj jGj;b; ); (3) such that for every 2 GS there exists some ~ 2 G with j (x; q) ? ~ (x; q)j 1 for all x 2 S and all q 2 G. With the help of the 2-cover of GS which is induced by G we can now show that the cardinality jS j of the shattered set S can be bounded through the inequality 2jS j 2 (jS j jGj b2 ) (jS j jGj;b; ) 2bjGj : (4) It is obvious that this inequality yields an upper bound for jS j which does not depend on the complexity of the function class H (except for parameters related to ) . Let us consider for each ! 2 H the discrete map ! : f0; : : : ; b ? 1gG ! f0; 1g which is ^

induced by ! through the following de nition: !(^ ) has value 0 for ^ 2 f0; : : : ; b ? 1gG ^ if there exist some 2 G and x 2 S with j ^ (q) ? (x; q)j 1 for all q 2 G and R R 0 1? : p2F q2 (x; q) !(q; p) d d 2 Else we set !(^ ) = 1: ^

Since we have the upper bound (3) on the size of the cover G , and there exist at most 2bjGj di erent functions !, it su ces for proving (4) to show that the following claim holds. ^ 14

moreover !A1 = !A2 , then A1 = A2 . ^ ^ In order to prove this claim, let us assume that A1 6= A2 , but both A1 and A2 are covered by the same function ~ 2 G . We shall show that then !A1 6= !A2 . ^ ^ Fix some x0 2 S so that either x0 2 A1 ? A2 or x0 2 A2 ? A1 ; without loss of generality we may assume that x0 2 A1 ? A2 . Let ^ : G ! f0; : : : ; b ? 1g be de ned by ^ (q) = ~ (x0 ; q). Then we have !A2 (^ ) = 0, since by assumption j ~ (x0 ; q) ? A2 (x0 ; q)j 1 for all q 2 G and ^ Z Z 1? : A2 (x0 ; q) !A2 (q; p) d d 0 2 p2F q2 Assume for a contradiction that also !A1 (^ ) = 0 for this function ^ . This implies that ^ there exist some 2 G and some x1 2 S with Z Z 1 (5) (x1 ; q) !A1 (q; p) d d 0 2 ? and p2F q2 j ^ (q) ? (x1 ; q)j 1 for all q 2 G: The latter implies by our choice of G and and the de nition of ^ that

Z

Claim: Let A1 ; A2 S . If some function ~ 2 G covers both A1 and A2 , in the sense that j ~ (x; q) ? A1 (x; q)j 1 and j ~ (x; q) ? A2 (x; q)j 1 for all x 2 S and all q 2 G, and

q 2 G, hence

On the other hand the assumptions on ~ 2 G imply that j ~ (x0 ; q) ? A1 (x0 ; q)j 1 for all

Z

q2

j ~ (x0 ; q) ? (x1 ; q)j d

<

=2:

(6)

Furthermore since x0 2 A1 , we have by choice of !A1 that

Z Z

q2

j ~ (x0 ; q) ?

A1 (x0 ; q)

Z

A1 (x0 ; q)j d

<

=2 :

1+ : 2

(7) (8)

p2F q2

!A1 (q; p) d d 0

A1 (x0 ; q)j d

Z Z

The inequalities (6) and (7) imply that

q2

Z Z

j (x1 ; q) ?

< :

This inequality yields in combination with (5) and (8) the contradiction (x1 ; q) !A1 (q; p) d d 0 ? A1 (x0 ; q) !A1 (q; p) d d 0 p2FZ q2 p2F q2 Z j (x1 ; q) ? A1 (x0 ; q)j !A1 (q; p) d d 0 =

Z

p2F q2 q2 q2

Z

j (x1; q) ? j (x1; q) ?

A1 (x0 ; q)j A1 (x0 ; q)j

Z

< :

d

p2F

!A1 (q; p) d 0 d

15

This contradiction implies that !A1 (^ ) = 1, hence !A1 6= !A2 . Thus we have veri ed the ^ ^ ^ preceding claim, and the proof of Theorem 5.1 is now complete.

Remark 5.2 It follows from Alon et al., 1993] that instead of a nite upper bound for the pseudo-dimension of G it su ces for Theorem 5.1 to assume a nite upper bound for the -dimension P -dim(G ) of G for = =20 ( ). Corollary 5.3 There exists a nite upper bound for the VC-dimension of layered feedforward

sigmoidal neural nets and feedforward networks of spiking neurons with piecewise uniformly continuous analog noise (for arbitrary real-valued inputs, Boolean output computed with some arbitrary reliability > 0, and arbitrary real-valued \programmable parameters") which does not depend on the size or structure of the network beyond its rst hidden layer.

layered feedforward sigmoidal neural nets with n input-nodes, d units on their rst hidden layer, and d0 output nodes. Thus the nets in N may have arbitrary numbers of layers and gates, and arbitrary real-valued weight-assignments. We assume that the d gates on the rst layer are a ected by some piecewise uniformly continuous noise process with density kernel z : 2 ! R+ , where := ?1; 1]d . Let F : Rn Rm ! be the function whose value F (x; w) is the vector of outputs of the d rst hidden layer units (without noise), for arbitrary network inputs x 2 Rn and arbitrary assignments w 2 Rm to the weights and biases of these units. We take as the class G of functions considered in the proof of Theorem 5.1 all functions of the form (x; q) = z (F (x; w); q) for arbitrary parameters w 2 Rm . The results presented in Karpinski, Macintyre, 1995] imply that the pseudo-dimension of this class G of functions is bounded by a polynomial in m, for all common choices of activation functions of the sigmoidal units and all practically relevant density kernels z for the noise process (even involving the exponential function). In the case where the activation functions and density kernels are piecewise polynomial one can apply the results of Goldberg, Jerrum, 1995] to get a slightly better nite upper bound for the pseudo-dimension of G . We de ne for = ?1; 1]d and 0 = ?1; 1]d0 the class H as the class of all density kernels 0 ! R+ that describe the computations of the remaining layers of networks in N !: with arbitrary noise processes (and arbitrary real-valued weights). It follows from Theorem 5.1 that the nite VC-dimension bound obtained for the class F of functions computed with reliability > 0 by networks in the class N does not depend on the complexity of the function class H , and hence not on the number of layers, the number of units beyond the rst layer, or the noise process on later layers of these networks. In the case of a network N of noisy spiking neurons the programmable parameters consist of the \weights" of synapses, time-delays for postsynaptic potentials, and parameters which determine other aspects of the functional form of response functions (i.e. postsynaptic potentials) and threshold functions. The pseudo-dimension of the class G which arises when one applies (as described in Section 2) the here considered framework to the rst layer of a network N of noisy spiking neurons can be bounded with the help of the same tools as for the case of sigmoidal neural nets. 16

Proof: We rst consider for some arbitrary given parameters n; d; d0 2 N the class N of all

Corollary 5.4 There exists a nite upper bound for the VC-dimension of recurrent sigmoidal

neural nets and networks of spiking neurons with analog noise (for arbitrary real valued inputs, Boolean output computed with some arbitrary reliability > 0, and arbitrary real valued \programmable parameters") which does not depend on the computation time of the network, even if the computation time is allowed to vary for di erent inputs.

Proof: One proceeds in the same manner as for the proof of Corollary 5.3, except that G

now consists of the class of all state-distributions that arise from the rst computation step of the total network, and H consists of all possible state-transformations that can arise from the rest of the computations of the same network.

6 Conclusions

We have introduced a new framework for the analysis of analog noise in discrete-time analog computations that is better suited for \real-world" applications and more exible than previous models. In contrast to preceding models it also covers important concrete cases such as analog neural nets with a Gaussian distribution of noise on analog gate outputs, noisy computations with less than perfect reliability, and computations in networks of noisy spiking neurons. Furthermore we have introduced adequate mathematical tools for analyzing the e ect of analog noise in this new framework. These tools di er quite strongly from those that have previously been used for the investigation of noisy computations. Finally we have established signi cant new results regarding general limitations for the computational power and VC-dimension of noisy computational systems. These results provide in particular new bounds for the computational power and VC-dimension of analog neural nets and networks of spiking neurons in the presence of analog noise.

Acknowledgement

We would like to thank Peter Auer for helpful conversations.

References

Alon et al., 1993] N. Alon, N. Cesa-Bianchi, S. Ben-David, D. Haussler, Scale-sensitive dimensions, uniform convergence, and learnability. Proceedings of the 34th Annual IEEE Symposium on Foundations of Computer Science, 292{301. IEEE Computer Science Press, New York, NY, 1993. Alon, Dewdney, Ott, 1991] N. Alon, A. K. Dewdney, T. J. Ott, E cient simulation of nite automata by neural nets. J. Assoc. Comput. Mach. 38 (1991), 495{514. Anderson et al., 1977] J. A. Anderson, J. W. Silverstein, S. A. Ritz, R. S. Jones. Distinctive features, categorical perception, and probability learning: some applications of a neural model. Psychological Review 84 (1977), 413{451. Reprinted in Anderson, Rosenfeld, 1988], pp. 287{325. 17

Anderson, Rosenfeld, 1988] J. A. Anderson and E. Rosenfeld (eds.), Neurocomputing: Foundations of Research. The MIT Press, Cambridge, MA, 1988. Asarin, Maler, 1994] E. Asarin, O. Maler. On some relations between dynamical systems and transition systems. Proceedings of the 21st International Colloquium on Automata, Languages, and Programming, 59{72. Lecture Notes in Computer Science 820, SpringerVerlag, Berlin, 1994. Blum, Shub, Smale, 1989] L. Blum, M. Shub, S. Smale, On a theory of computation over the real numbers: NP-completeness, recursive functions and universal machines. Bulletin of the Amer. Math. Soc. 21 (1989), 1{46. Blumer et al., 1989] A. Blumer, A. Ehrenfeucht, D. Haussler, M. K. Warmuth, Learnability and the Vapnik-Chervonenkis dimension. J. Assoc. Comput. Mach. 36 (1989), 929{965. Casey, 1996] M. Casey, The dynamics of discrete-time computation, with application to recurrent neural networks and nite state machine extraction. Neural Computation 8 (1996), 1135{1178. Feller, 1971] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 2, Second Edition. J. Wiley & Sons, New York, NY, 1971. Gerstner, van Hemmen, 1994] W. Gerstner, J. L. van Hemmen, How to describe neuronal activity: spikes, rates or assemblies? Advances in Neural Information Processing Systems 6, 463{470. Morgan Kaufmann, San Mateo, CA, 1994. Goldberg, Jerrum, 1995] P. W. Goldberg, and M. R. Jerrum, Bounding the VapnikChervonenkis dimension of concept classes parameterized by real numbers, Machine Learning 18 (1995), 131{148. Haussler, 1992] D. Haussler, Decision theoretic generalizations of the PAC-model for neural nets and other learning applications, Inf. and Comp. 100 (1992), 78{150. Hopcroft, Ullman, 1979] J. E. Hopcroft, J. D. Ullman, Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, MA, 1979. Horne, Hush, 1996] B. G. Horne, D. R. Hush, Bounds on the complexity of recurrent neural network implementations of nite state machines. Neural Networks 9 (1996), 243{252. Indyk, 1995] P. Indyk, Optimal simulation of automata by neural nets. Proceedings of the 12th Annual Symposium on Theoretical Aspects of Computer Science, 337{347. Lecture Notes in Computer Science 900, Springer-Verlag, Berlin, 1995. Karpinski, Macintyre, 1995] M. Karpinski, A. Macintyre, Polynomial bounds for VCdimension of sigmoidal and general Pfa an neural networks. J. of Computer and System Sciences, to appear. Kifer, 1988] Y. Kifer, Random Perturbations of Dynamical Systems. Birkhauser, Boston, MA, 1988. Koiran et al., 1994] P. Koiran, M. Cosnard, M. Garzon, Computability with low-dimensional dynamical systems. Theoret. Comput. Sci. 132 (1994), 113{128. 18

Maass, 1995] W. Maass, Vapnik-Chervonenkis dimension of neural nets. The Handbook of Brain Theory and Neural Networks (M. A. Arbib, ed.), 1000{1003. The MIT Press, Cambridge, MA, 1995. Maass, 1996] W. Maass, On the computational power of noisy spiking neurons. Advances in Neural Information Processing Systems 8, 211{217. The MIT Press, Cambridge, MA. Maass, 1996a] W. Maass, Lower bounds for the computational power of networks of spiking neurons. Neural Computation 8 (1996), 1{40. Minsky, 1972] M. L. Minsky, Computation: Finite and In nite Machines. Prentice-Hall, Englewood Cli s, NJ, 1972. Moore, 1990] C. Moore, Unpredictability and undecidability in physical systems. Phys. Review Letters 64 (1990), 2354{2357. Orponen, Matamala, 1996] P. Orponen, M. Matamala, Universal computation by nite twodimensional coupled map lattices. Manuscript, Proc. Workshop on Physics and Computation PhysComp'96, to appear. Paz, 1971] A. Paz, Introduction to Probabilistic Automata. Academic Press, New York, NY, 1971. Phatak, Koren 1995] D. S. Phatak, I. Koren, Complete and partial fault tolerance of feedforward neural nets. IEEE Transactions on Neural Networks 6 (1995), 446{456. Pippenger, 1989] N. Pippenger, Invariance of complexity measures for networks with unreliable gates. J. Assoc. Comput. Mach. 36 (1989), 531{539. Rabin, 1963] M. Rabin, Probabilistic automata. Information and Control 6 (1963), 230{245. Siegelmann, 1994] H. T. Siegelmann, On the computational power of probabilistic and faulty networks. Proc. 21st International Colloquium on Automata, Languages, and Programming, 23{34. Lecture Notes in Computer Science 820, Springer-Verlag, Berlin, 1994. Siegelmann, Sontag, 1995] H. T. Siegelmann, E. D. Sontag, On the computational power of neural nets. J. Comput. System Sciences 50 (1995), 132{150. Turan, Vatan, 1994] G. Turan, F. Vatan, On the computation of Boolean functions by analog circuits of bounded fan-in. Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science, 553{564. IEEE Computer Science Press, New York, NY, 1994. Vapnik, Chervonenkis, 1971] V. N. Vapnik, A. Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16 (1971), 264-280. von Neumann, 1956] J. von Neumann, Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata Studies (C. E. Shannon, J. E. McCarthy, eds.), 329{378. Annals of Mathematics Studies 34, Princeton University Press, Princeton, NJ, 1956. 19

赞助商链接

- 1 On the Effects of Noise and Speed on Computations
- Effect of Temperature on a Two-Phase Clock-Driven Discrete-Time Chaotic Circuit
- Effect of noise on real-time parameter estimation of multifrequency signals
- The Effect of Interaction on Shot Noise in The Quantum Limit
- A general formulation of discrete-time quantum mechanics, restrictions on the action and th
- On the Hedging of American Options in Discrete Time Markets with Proportional Transaction C
- Discrete-Time Linear Filtering in Arbitrary Noise W
- On the treatment of threshold effects in SUSY spectrum computations
- On the effect of discrete numbers of stars in chemical evolution models
- the effects of just-in-time systems on financial accounting metrics
- A Lower Bound on the Disconnection Time of a Discrete Cylinder
- On invariant measures of discrete time filters in the
- the effect of hourly payment on the evaluation of time
- Effect of Noise on the Standard Mapping
- A general formulation of discrete-time quantum mechanics, restrictions on the action and th

更多相关文章：
**
***Analog* *In*-circuit Testing

*Analog* *In*-circuit Testing_信息与通信_工程科技_专业资料。模拟测试的调试方法。 Agilent 技术 2001 测试方法: 类比 2-1 __书架 __语法叁考 __主人索引 __...**
脉冲编码调制PCM译文
**

and transmission.*In* *effect*,*the* system commonly ...*discrete-time*,discrete-level *analog* *of* a previous...all distortion and *noise* *in* a digital signal are...**
控制器隔离 - ***Analog* Devices

控制器隔离 -*Analog* Devices_建筑/土木_工程科技_专业资料。控制器隔离 - *Analog* Devices今日推荐 180份文档 2014证券从业资格考试...**
***ANALOG* INPUTDATE

8页 2财富值 ekr1000_*analog* 89页 5财富值如要投诉违规内容,请到百度文库投诉中心;如要提出功能问题或意见建议,请点击此处进行反馈。...**
连续时间信号的数字处理河南工业大学版
**

% Program P5_2 % Illustration*of* Aliasing *Effect*...('*Discrete-time* signal x[n]'); subplot(2,2,...*of* *Analog* Lowpass Filter clf; Fp = 3500;Fs =...**
一篇关于FPGA的英文文献及翻译
**

*discrete* and *analog* control, custom sensor ...You can specify *the* clock rate at compile *time*....*noise*, vibration, and harshness (NVH) analysis ...**
36e_***analog*_RevB

36e_*analog*_RevB_调查/报告_表格/模板_实用文档。TEAMWORK & TECHNOLOGY 1336 *IMPACT* *ANALOG* I/O Hands-*On* Lab Using AC vs. DC Dyne Hands-*on* Exercise ...**
***Analog*集成电路设计知识点

*Analog*集成电路设计知识点*Analog*集成电路设计知识点隐藏>> *Analog* 集成电路设计...2:phase *noise* 在带宽处上翘,是因为 pll 环路设计的相位裕度不够。 3:对的...**
3 ***Analog* Electronics

3*Analog* Electronics_电子/电路_工程科技_专业资料。3 *Analog* Electronics 模拟...(CMOS) logic circuits use field-*effect* transistors (FETs), which differ ...**
FPGA的文献翻译
**

*discrete* and *analog* control, custom sensor ...LabVIEW Real-*Time* systems provide deterministic ...Mobile/portable *noise*, vibration, and harshness (...
更多相关标签：

and transmission.

控制器隔离 -

8页 2财富值 ekr1000_

% Program P5_2 % Illustration

36e_

3