9512.net

甜梦文库

甜梦文库

当前位置：首页 >> >> # On the relationship between symbolic and neural computation

On the relationship between symbolic and neural computation

Whitney Tabor and Dalia Terhesiu

University of Connecticut 406 Babbidge Drive Storrs, CT 06269 tabor@uconnvm.uconn.edu

To appear as a 2004 AAAI Fall Symposium Series Technical Report (see http://www.sp.uconn.edu/ ps300vc/papers.html) Abstract

There is a need to clarify the relationship between traditional symbolic computation and neural network computation. We suggest that traditional context-free grammars are best understood as a special case of neural network computation; the special case derives its power from the presence of certain kinds of symmetries in the weight values. We describe a simple class of stochastic neural networks, Stochastic Linear Dynamical Automata (SLDAs), de?ne Lyapunov Exponents for these networks, and show that they exhibit a signi?cant range of dynamical behaviors—contractive and chaotic, with context free grammars at the boundary between these regimes. Placing context-free languages in this more general context has allowed us, in previous work, to make headway on the challenging problem of designing neural mechanisms that can learn them.

chaotic case. Along the way, we adopt a method of studying chaos in stochastic systems, demonstrating its wellbehavedness. We identify a particularly simple class of networks (Stochastic Linear Dynamical Automata or SLDAs) that provides a convenient domain for studying all three dynamical regimes (contractive, edge of chaos, and chaotic) and comparing them.

History

(Elman 1990; 1991; Pollack 1990) provided promising notes with regard to getting neural networks to learn context-free languages. (Levy, Melnik, & Pollack 2000; Melnik, Levy, & Pollack 2000) recognized the role of fractal sets in (Pollack 1990)’s proposal and were able to prove convergence for a simple case. (Wiles & Elman 1995; Rodriguez, Wiles, & Elman 1999; Rodriguez 2001) made headway in understanding how symbol counting could work in Elman’s Simple Recurrent Networks (SRNs) and (Rodriguez 2001) found that a linear approximation of the solution could be extracted which revealed the network’s counting technique. (Gers & Schmidhuber 2001) have created a very robust counting unit which can learn dependencies of great length. So far, the results only handle linear and quadratic stategrowth languages. Exponential state growth languages (where the number of causal states (Shalizi, Shalizi, & Crutch?eld 2002) grows exponentially with string length; palindrome languages are an example) are more dif?cult, but clearly central to interesting applications (e.g. human language). Meanwhile, (Moore 1996) showed how to recognize context-free grammars with dynamical recognizers with piecewise linear activation functions. (Tabor 2000) described a similar mechanism, pushdown dynamical automata (PDDAs), and suggested that such encodings might be a reachable state for complex grammar learning. (Tabor 2002b) provided evidence for this view by using a variant on a classi?cation technique from dynamical systems theory, Lyapunov Exponents (Oseledec 1968; Abarbanel, Gilpin, & Rotenberg 1996). Lyapunov Exponents measure the average expansiveness of the points along a trajectory of a dynamical system. Among discrete, deterministic dynamical systems, the maximal Lyapunov Exponent is negative for periodic attractors and positive for chaotic attractors, so the boundary

Introduction

Part of the interest in symbol composition lies in the fact that it supports recursive symbolic computation. A hope is that neural networks can exhibit the power of recursive computation, but do so in a way that maintains some of their advantageous properties, like graceful degradation, learning, and human-like generalization. There is also a sense that in addition to affording these desirable qualities, some different kind of insight may be offered if recursive neural network symbol processing can be made robustly successful. We suggest the following as one aspect of this ”different kind of insight”: symbolic grammars (i.e., Turing Machines) are a special case within the wide gamut of dynamical (i.e., real-valued, recurrent) computing devices. Context free grammars, for example, lie at the boundary between contractive and chaotic systems, which has been associated with complex processing in other work (Wolfram 1984; Langton 1992; Moore 1990; Crutch?eld 1994; Kaufmann 1993). Our prior work (Tabor 2002b) suggested this correspondence by identifying contractive neural networks that generate/recognize ?nite state languages and edge-of-chaos networks that generate/recognize context-free languages. The present work extends these results by including the

Copyright c 2004, American Association for Arti?cial Intelligence (www.aaai.org). All rights reserved.

between the periodic and chaotic regimes is at 0. Under a proposed extension of the de?nition to stochastic systems, (Tabor 2002b) showed that the maximal exponent was 0 for probabilistic PDDAs. Moreover, the maximal exponents for SRNs trained (only partially successfully) on stack- and queue-based languages were much closer to 0 than the maximal exponents for ?nite-state languages. These results suggest (1) it may be possible to design learning algorithms that converge on zero-Lyapunov solutions like the PDDAs; (2) neural networks seem to ?t the picture sketched by (Wolfram 1984; Moore 1990; Langton 1992), in which the computations of Turing machines lie in a special section of dynamical computation space on the border between periodic and chaotic processes (the so-called ”edge of chaos”)—see also (Siegelmann 1999; Griffeath & Moore 1996). Regarding (1), (Tabor 2003; 2002a) described a learning network called a Fractal Learner which shows, in simulations, evidence of convergence on certain PDDAs, including exponential state growth cases. Regarding (2), it must be noted that the results of (Tabor 2002b) only provide evidence for part of the proposed picture. That is, they show neural networks in the contractive and edge of chaos domains. They do not show neural networks in the chaotic domain. The present paper introduces a construction, Stochastic Linear Dynamical Automata (SLDAs), that helps ?esh out the picture for one simple class of neural dynamical systems: one-dimensional (i.e., one-hidden unit) maps with linear activation and gating functions. We (i) sketch a method of proving that the extended de?nition of Lyapunov Exponents given in (Tabor 2002b) is well-behaved in the sense of being independent of initial state and the randomness of the process and (ii) show that all three dynamical types (periodic, edge-of-chaos, and chaotic) are present among SLDAs. The results suggest a route to checking the proposed correspondences between Turing and dynamical computation.

only ?nitely many points. If at least one exponent is positive and the system is bounded, then the system is chaotic. The case in which the greatest Lyapunov exponent is 0 in a discrete system is a special case which can yield complex behavior (famously for the logistic map at the “edge of chaos”—(Crutch?eld & Young 1990)). To extend (2) to stochastic dynamical systems, we let the de?nition of eigenvalues depend not only on the initial condition, but also on the speci?c random sequence of inputs, S , that the autonomously functioning network chooses: OSL(h, S ) = lim ((Dft )T (Dft )) 2t (h)

t→∞

1

(3)

(Tabor 2002b) hypothesized that the logarithms of the eigenvalues of this matrix are essentially independent of the choice of h and corresponding symbol sequences S .

A simple case: Stochastic Linear Dynamical Automata (SLDAs).

Consider the following class of dynamical automata:1 M = (H, F, P, Σ, IM, h0 , F R) (4)

where H = {h ∈ RN : 0 ≤ hj ≤ 1, j ∈ {1, . . . , N }}, F = (m, b) is the set of K linear functions, {Fi : R → R : Fi (h) = mi · h + bi , 0 ≤ hj , (Fi )j (h) ≤ 1, i ∈ {1, . . . , K }, j ∈ {1, . . . , N }}, P is a partition of H the boundaries of whose compartments are the boundaries of the domains of the Fi , Σ = {1, . . . , K } is the alphabet of symbols, IM , the input map, speci?es that when h is in the domain of Fi then a possible next symbol is i and the corresponding next state is Fi (h), h0 ∈ H is the initial state, and F R ∈ H is the ?nal region. Here, we focus on a related class of devices M P = (H, F, P, Σ, IM, h0 , P D) (5)

Construction

Lyapunov Characteristic Exponents (Oseledec 1968; Abarbanel, Gilpin, & Rotenberg 1996) measure the average rate of contraction or divergence of trajectories near the attractor of a dynamical system. Let ht+1 = f (ht ) (1)

be a discrete, deterministic dynamical system with N dimensional state, h. The Lyapunov Exponents, λi for i = 1, . . . , N , of the trajectory starting at h are the logarithms of the eigenvalues of the matrix OSL(h) = lim ((Dft )T (Dft )) 2t (h)

t→∞

1

(2)

where there is no ?nal region (so the machine starts at h0 and runs forever) and where P D : H → F is a probability function which assigns probabilities to choices when the map is nondeterministic. We say that the probabilities are properly assigned if some function in F has positive probability at every point in H . We call these stochastic maps Stochastic Linear Dynamical Automata (SLDAs). One-dimensional SLDAs can be depicted graphically in a transparent way. We take languages to be distributions of strings. Table 1 shows a grammar for generating Language 1, a context free language which is not a ?nite state language. Figure 1 shows an SLDA for processing Language 1.

where T denotes transpose, Df (h) is the Jacobian of f at h. For h in a single attractor, the values of the eigenvalues are essentially independent of the choice of h so we may speak of the Lyapunov exponents of the attractor. If all the exponents are negative, then the system is a limit cycle and visits

Sketch of invariance demonstration for SLDAs

(See http://www.sp.uconn.edu/?ps300vc/papers.html#eocnn for details.)

1

Dynamical automata are de?ned in (Tabor 2000).

Table 1: Grammar 1, which generates Language 1. Parentheses denote optional constituents, which occur with probability 0.5. A slash between two productions means they each occur with probability 0.5. Generation of a string is accomplished by expanding S. The grammar generates an in?nite string by repeatedly generating a ?nite string and appending it to its previous output. S → AB/XY A → a (S) B → b (S) X → x (S) Y → y (S)

-0.0123. In fact, 0 is the maximum Lyapunov exponent that this system can achieve over all proper assignments of probabilities to transition functions. Example 2. A ?nite state contractive mapping. Setting, (m, b) = 1/2 1/2 0 1/2 (7)

We would like to know if the Lyapunov exponents which we measure for SLDAs are a reliable indicator of the dynamical status of the SLDAs. To this end, we ask if the values of the Lyapunov exponents are independent of the choice of initial state, h0 and its corresponding symbol sequence, S ∈ Σ∞ . We show that there is an invariant probability measure on the product space, H × S ∞ . This means that the case at hand falls under a generalization of (Oseledec 1968)’s Mulitplicative Ergodic Theorem to discretetime, stochastic systems (Arnold 1998). This guarantees the existence of Lyapunov exponents. We also show that the invariant measure is unique. Therefore, the Lyapunov exponents are independent of the initial state, h0 , for almost all choices of initial state (Walters 1986). Note that in the onedimensional case, there is only one Lyapunov exponent for each system, given by 1 λ = limn→∞ log |mn | (6) n where mn is the slope of the function applied on the n’th iteration.

where both functions are equally probable for all h yields an exponent of -log 2 (measured value -0.6931). Note that this case generates the same language as the ?nite state machine with one state and two equally probable paths from the state to itself. Note that the maximum Lyapunov exponent over all assignments of probabilities for this case is negative (= -log 2). Example 3. A chaotic case. Let ? ? 1/2 0 ? 1/2 1/2 ? (m, b) = ? 2 0 ? 2 ?1

(8)

with P = {[0, 1/2), [1/2, 1]} and the probability of each 1 and the probability of each excontractive map equal to 6 2 pansive map equal to 3 on its interval. Then the Lyapunov exponent is (log 2)/3 (measured value 0.2377). Since this number is greater than 0 and the system is bounded, the automaton is chaotic. Note that the maximum of the Lyapunov exponent over all assignments of probabilities is positive in this case (= log 2).

Conclusions

These cases are consistent with the conjecture of (Tabor 2002b) that contractive dynamical automata correspond to probabilistic ?nite-state automata while Turing computers lie on the boundary separating contractive from chaotic dynamical automata. An advantage of the current framework is that the full range of dynamical behaviors is exhibited in a very simple class of functions. Desirable and possibly reasonable goals for future work are (i) to fully establish the correspondences just mentioned for SLDAs, (ii) to probe the implications of the sensitivity of the Lyapunov exponents to assignments of probabilities to transition functions, (iii) to provide a symbolic characterization of the chaotic cases, and (iv) to generally establish the convergence of a learning algorithm along the lines of (Tabor 2003). An interesting idea, suggested by the foregoing, is that complex cognitive computation is rare because it depends on the occurrence of rare symmetries in neural parameter space.

SLDAs as Recurrent Neural Networks (RNNs)

An SLDA can be interpreted as a layered neural network. The coef?cients of the Fi correspond to ?rst and secondorder weights on connections into the second layer. Figure 2 shows the neural network version of the SLDA in Figure 1.

Various dynamical behaviors observed

By analogy with the deterministic case, we call an SLDA whose Lyapunov exponent is negative a contractive SLDA, and an SLDA whose Lyapunov exponent is positive a chaotic SLDA. Some examples show that there exist contractive, edge-of-chaos, and chaotic devices within the set of 1-dimensional SLDAs. Example 1. A context free grammar at the edge of chaos. Under the de?nition discussed in the previous sections, the Lyapunov exponent of SLDA1 is 0. This result can be derived by noting that SLDA1 satis?es the de?nition of a Pushdown Dynamical Automaton (Tabor 2000) and therefore undergoes equal contraction and expansion during the processing of any well-formed sentence. This implies convergence of the Lyapunov exponent on 0 as n → ∞. Measurement of the value on a corpus of 10000 words produced the value

References

Abarbanel, H. D. I.; Gilpin, M. E.; and Rotenberg, M. 1996. Analysis of Observed Chaotic Data. New York: Springer-Verlag. Arnold, L. 1998. Random Dynamical Systems. Berlin: Springer Verlag. Crutch?eld, J. P., and Young, K. 1990. Computation at the onset of chaos. In Zurek, W. H., ed., Complexity, Entropy,

and the Physics of Information. Redwood City, California: Addison-Wesley. 223–70. Crutch?eld, J. P. 1994. The calculi of emergence: Computation, dynamics, and induction. Physica D 75:11–54. In the special issue on the Proceedings of the Oji International Seminar, Complex Systems—from Complex Dynamics to Arti?cial Reality. Elman, J. L. 1990. Finding structure in time. Cognitive Science 14:179–211. Elman, J. L. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning 7:195–225. Gers, F. A., and Schmidhuber, J. 2001. Lstm recurrent networks learn simple context-free and contextsensitive languages. IEEE Transactions on Neural Networks 12(6):1333–1340. Griffeath, D., and Moore, C. 1996. Life without death is p-complete. Complex Systems 10:437–447. Kaufmann, S. 1993. Origins of Order. Oxford: Oxford University Press. Langton, C. 1992. Life at the edge of chaos. In Langton, C. G.; Taylor, C.; Farmer, J. D.; and Rasmussen, S., eds., Arti?cial Life II. Addison-Wesley. 41–91. Proceedings of the workshop on arti?cial life held February, 1990 in Santa Fe, New Mexico. Proceedings Volume X, Santa Fe Institute Studies in the Sciences of Complexity. Levy, S.; Melnik, O.; and Pollack, J. 2000. In?nite raam: A principled conectionist basis for grammatical competence. In Proceedings of the 22nd Annual Meeting of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates. 298–303. Melnik, O.; Levy, S.; and Pollack, J. B. 2000. RAAM for in?nite context-free languages. In Proceedings of the International Joint Conference on Arti?cial Neural Networks (IJCNN). IEEE. Moore, C. 1990. Unpredictability and undecidability in dynamical systems. Physical Review Letters 64:2354–2357. Moore, C. 1996. Recursion theory on the reals and continuous-time computation. Theoretical Computer Science 162(1):23–44. Oseledec, V. I. 1968. A multiplicative ergodic theorem. lyapunov characteristic numbers for dynamical systems. Trudy Mosk. Mat. Obsc. 19:197. Pollack, J. B. 1990. Recursive distributed representations. Arti?cial Intelligence 46:77–106. Special issue on Connectionist symbol processing edited by G. E. Hinton. Rodriguez, P.; Wiles, J.; and Elman, J. 1999. A recurrent neural network that learns to count. Connection Science 11(1):5–40. Rodriguez, P. 2001. Simple recurrent networks learn context-free and context-sensitive languages by counting. Neural Computation 13(9). Shalizi, C. R.; Shalizi, K. L.; and Crutch?eld, J. P. 2002. Pattern discovery in time series, part i: Theory, algorithm,

analysis, and convergence. Santa Fe Working Paper #0210-060. Siegelmann, H. T. 1999. Neural Networks and Analog Computation: Beyond the Turing Limit. Boston: Birkh¨ auser. Tabor, W. 2000. Fractal encoding of context-free grammars in connectionist networks. Expert Systems: The International Journal of Knowledge Engineering and Neural Networks 17(1):41–56. Tabor, W. 2002a. Fractal learning neural networks. Unpublished Manuscript, University of Connecticut. Tabor, W. 2002b. The value of symbolic computation. Ecological Psychology 14(1/2):21–52. Tabor, W. 2003. Learning exponential state-growth languages by hill climbing. IEEE Transactions on Neural Networks 14(2):444–446. Walters, P. 1986. Unique ergodicity and random matrix products. In Arnold, L., and Wihstutz, V., eds., Lecture notes in mathematics/Lyapunov exponents, number 1186, 37–56. Heidelberg: Springer Verlag. Wiles, J., and Elman, J. 1995. Landscapes in recurrent networks. In Moore, J. D., and Lehman, J. F., eds., Proceedings of the 17th Annual Cognitive Science Conference. Lawrence Erlbaum Associates. Wolfram, S. 1984. Universality and complexity in cellular automata. Physica D 10:1–35.

Figure 1: Graphical depiction of SLDA1, a one-dimensional Linear Dynamical Automaton that generates Language 1. The 1 2 1 2 1 initial state (h0 ) is 0.5; F1 : [0, 1] → [0, 3 ] = 1 3 h, F2 : [0, 1] → [ 3 , 1] = 3 h + 3 , F3 : [0, 3 ] → [0, 1] = 3h, F4 : 2 1 1 2 2 [ 3 , 1] → [0, 1] = 3h ? 2; P1 = [0, 3 ], P2 = [ 3 , 3 ], P3 = [ 3 , 1]; P D1 = (0.25, 0.25, 0.5, 0), P D2 = (0.5, 0.5, 0, 0), P D3 = (0.25, 0.25, 0, 0.5). P Di = (p1 , . . . pK ) means that in Pi , the probability of selecting Fj is pj , for 1 ≤ j ≤ K .

1 0.8 0.6 0.4 0.2 0 0

h(n + 1)

0.2

0.4 h(n)

0.6

0.8

1

Figure 2: Neural network implementation of SLDA 1. The second order connections from the input units to the hidden layer specify the weight on the hidden unit self-connection. The hidden unit activation function is identity. The output units are threshold units with thresholds indicated on them. The outputs are binary codes. An additional layer can be added to compute probabilities as a normalization of the weighted binary codes. The next word is picked by sampling the output distribution.

a

0

x

0

y

2/3

b

?1/3

Output

1

1

1

?1

Hidden

1/3

0 1/3

2/3

?2 3

0

3

Input a x y b

- ON THE RELATIONSHIP BETWEEN THE LL(k) AND LR(k) GRAMMARS
- Research on the Relationship between Real Estate Price and Financial Market
- A Note on the Relationship between Linguistic Theory and Linguistic Engineering
- On the Relationship between Thinking Pattern and Translation
- Comments On the Relationship Between Teaching and Testing
- On the relationship between the GLRT and UMPI tests for the detection of signals with unknown parame
- 父子关系分析On the Relationship between Father and Son
- On the relationship between the Fourier and fractional Fourier transforms
- On the Relationship Between J-Integral and Crack Tip
- Effect of supply chain integration on the relationship between diversification and performance

更多相关文章：
**
***On* *the* *relationship* *between* *symbolic* *and* *neural* *computation*_....pdf

*On* *the* *relationship* *between* *symbolic* *and* *neural* *computation*_专业资料。There is a need to clarify *the* *relationship* *between* traditional *symbolic* *computation* *and* ...**
Computational Templates, ***Neural*NetworkDynamics, *and* *Symbolic* ....pdf

*the* *relationship* *between* sub*symbolic* *neural* networks *and* *symbolic* logical ...state) can thus be taken to be *the* outcome of *the* inference (*computation*...**
...based learning A comparison of ***symbolic* *and* *neural* network....pdf

Explanation based learning A comparison of*symbolic* *and* *neural* network approaches_专业资料。Explanation based learning has typically been considered a *symbolic* ...**
***Neural* *Computation*, 11(8)1995-2016, 1999. How to Design a ....pdf

or complete*neural* network re-implementation of some *symbolic* algorithm only....As there exists a part-whole *relationship* *between* a complete sentence (e.g...**
***The* Development of a *Neural* Network's Sub-*Symbolic* Knowledge ....pdf

*The* Development of a *Neural* Network's Sub-*Symbolic* Knowledge Extraction ... by *the* Rule extraction module (illustrated in Fig. 1), for *computation*....**
NEULA A hybrid ***neural*-*symbolic* expert system shell.pdf

NEULA A hybrid*neural*-*symbolic* expert system shell_专业资料。Current expert ...*On* *the* other hand, *the* result of *the* *computation* is a complete model of...**
Integrating ***symbolic* reasoning with *neural*ly represented back....pdf

Integrating*symbolic* reasoning with *neural*ly represented background knowledge_...into xed values *and* running a simulated annealing *computation* *on* *the* network...**
Discovery of ***symbolic*, neuro-*symbolic* *and* *neural* networks ....pdf

Discovery of*symbolic*, neuro-*symbolic* *and* *neural* networks with parallel ...*symbolic* *and* algebraic networks (comparable with *the* *computation* e ort ...**
***Symbolic*, *Neural* *and* Neuro-fuzzy Approaches for Pat....pdf

*Symbolic*, *Neural* *and* Neuro-fuzzy Approaches for Pattern Recognition in ...However, in recent years *neural* *computation* *and* hybrid architectures for ...**
Combining prior ***symbolic* knowledge *and* constructive *neural* ....pdf

Combining prior*symbolic* knowledge *and* constructive *neural* network learning_专业...*computation* has performed poorly (e.g., ambiguous data or large contextual ...**
***Symbolic* processing in *neural* networks.pdf

can be translated into recurrent (analog, rational weighted)*neural* nets. ... in *the* recent efforts to merge *symbolic* *and* sub*symbolic* *computation*. ...**
...from recurrent ***neural* networks using a *symbolic* machine ....pdf

Rule extraction from recurrent*neural* networks using a *symbolic* machine ..." *Neural* *Computation*, vol. 4, no. 3, p. 406, 1992. 21] R. ...**
Control ***and* Communication in Mental *Computation*.pdf

Control*and* Communication in Mental *Computation*_专业...First, it suggests that *the* *neural* hardware ...(Commentary *on* R. Miikkulainen: “Sub*symbolic* ...**
***Symbolic* *and* numeric *computation* of *the* Barnes functions.pdf

*Symbolic* *and* numeric *computation* of *the* Barnes functions_专业资料。This paper...z 2k *and* *the* functional *relationship* with *the* Hurwitz zeta function ζ (t...**
Control ***and* Communication in Mental *Computation*.pdf

Control*and* Communication in Mental *Computation*_专业...First, it suggests that *the* *neural* hardware ...(Commentary *on* R. Miikkulainen: “Sub*symbolic* ...**
...Association-based Natural Language Processing with ***Neural* ....pdf

Natural Language Processing with*Neural* Networks_...Combining an associative net- to *symbolic* ...cost *computation* such as semantic/discourse analyses...**
A HYBRID MODEL FOR VISUAL PERCEPTION BASED ***ON* DYNAMIC ....pdf

(1986).*Neural* *Computation* by Concentrating Information in Time. Proc. Nat....system, is based *on* *the* integration *between* *symbolic* *and* sub*symbolic* ...**
optimality-from ***neural* networks to UG.pdf

Recent research has explored*the* implications of one principle of *neural* *computation*, optimization, for *the* theory of grammar. Optimization over *symbolic* ...**
Extracting Rules from ***Neural* Networks Using *Symbolic* ....pdf

Using*Symbolic* Algorithms Preliminary Results_专业... are combined to form a global *relationship*. ...*Neural* *Computation*, 1:541551. Moreno, J., ...**
***Symbolic*-*computation* study of integrable properties for *the*....pdf

In this paper, by virtue of*the* *symbolic* *computation*, we will investigate ...(5.15)?(5.20) establish *the* *relationship* *between* two singular manifolds φ... 更多相关标签：

Explanation based learning A comparison of

or complete

NEULA A hybrid

Integrating

Discovery of

Combining prior

can be translated into recurrent (analog, rational weighted)

Rule extraction from recurrent

Control

Control

Natural Language Processing with

(1986).

Recent research has explored

Using

In this paper, by virtue of