Evolution of syntax in groups of agents
– doctoraalscriptie (Master's thesis) –

W.H. Zuidema
Cognitieve Kunstmatige Intelligentie, Universiteit Utrecht

Supervised by:
Prof. Dr. Paulien Hogeweg, Theoretische Biologie, Universiteit Utrecht
Prof. Dr. Michael Moortgat, Computationele taalkunde, Universiteit Utrecht
Dr. Vincent van Oostrom, Logica, Universiteit Utrecht

The study described in this thesis was carried out at the department of theoretical biology and bioinformatics at Utrecht University.

The cover picture is “Vragende kinderen” (1949) by Karel Appel. The quotes on the next page are taken from James Gleick [“Chaos: making a new science”, 1987] and Pinker & Bloom [1990].

“The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work.” — John von Neumann

“Evolutionary theory is informative about many things, but it has little to say, as of now, of questions of this nature [e.g., the evolution of language]. The answers may well lie not so much in the theory of natural selection as in molecular biology, in the study of what kinds of physical systems can develop under the conditions of life on earth and why, ultimately because of physical principles.” — Noam Chomsky


Contents
1 Language & evolution, a general introduction
  1.1 Introduction
  1.2 Evolution
    1.2.1 fitness landscapes
    1.2.2 genotype–phenotype mapping
    1.2.3 coevolution
    1.2.4 multiple levels
    1.2.5 space
    1.2.6 selfstructuring
    1.2.7 dynamical systems
  1.3 Language
    1.3.1 phrase structure grammars
    1.3.2 productivity
    1.3.3 arguments of innateness
  1.4 Evolution of language
    1.4.1 the language instinct
    1.4.2 a false dichotomy
    1.4.3 emergence of speech sounds
    1.4.4 syntax without natural selection
    1.4.5 the emergence of grammar
    1.4.6 compositionality overcomes an error limit
    1.4.7 this thesis

2 Social patterns guide evolving grammars
  2.1 Introduction
  2.2 Model description
  2.3 Previous work
  2.4 Results
    2.4.1 evolution yields powerful grammars
    2.4.2 indexical and compositional dynamical regimes
    2.4.3 group effects
    2.4.4 punctuated equilibria
    2.4.5 not being recognized
  2.5 Conclusions

3 Some properties of finite grammars and languages
  3.1 Introduction
  3.2 Classification of routes
  3.3 Dynamical properties: expressiveness
    3.3.1 rules & routes
    3.3.2 expressiveness & rules
    3.3.3 characterization of changing grammars
    3.3.4 evaluation with simulation data
  3.4 Statistical properties: language
    3.4.1 substring frequencies
    3.4.2 co-occurrence patterns
  3.5 Conclusions

4 Selective advantages of syntax
  4.1 Introduction
  4.2 The global model
    4.2.1 model description
    4.2.2 the development of syntax is not trivial
    4.2.3 paradox of selfish language
  4.3 The spatial model
    4.3.1 model description
    4.3.2 perception in space
    4.3.3 communication in space
  4.4 Discussion
    4.4.1 homogeneous high expressiveness is optimal
    4.4.2 frequency dependent selection
    4.4.3 other aspects
    4.4.4 spatial patterns
  4.5 Conclusions

5 Summarizing conclusions
  5.1 Summary
  5.2 Implications
  5.3 Methodology
  5.4 Future work

A Chapter 2
  A.1 the algorithm
  A.2 Examples of indexical and recursive regimes

B Chapter 3
  B.1 the Chomsky hierarchy
  B.2 decimal coding of strings
  B.3 full string dendrogram

C Chapter 4
  C.1 Examples of communication and perception runs

Preface
There is a strange gap in cognitive science between the overwhelming amount of empirical studies and the piles of books discussing cognition: the role of mathematical and computational models remains underdeveloped in the study of cognition, unlike in, for instance, physics, biology and economics. This is unfortunate, because modeling pre-eminently can relate empirical findings to the diverse components of a theory. It can account for the many interactions between parts and evaluate internal coherence, much better, I believe, than a philosophical approach that fully depends on our human, and limited, analytical skills.

One reason for this gap is that formal accounts of cognition are often unsatisfactory, because they need to make unpleasant simplifications and idealizations to work. Few are willing to accept predicate logic as a useful model for every-day reasoning, game theory as a useful framework to describe decision making, or algebra as a useful description of human calculations. Science always simplifies, but one would like to be able to use formal tools without abstracting away the mechanisms that really matter. With present-day computer facilities a promising, new kind of modeling has emerged. Computational modeling has largely broadened the scope of what can be modeled.

Artificial intelligence has been viewed as providing exactly that: computational models for cognition. Its role as “theoretical cognitive science” could be similar to that of theoretical physics in physics and theoretical biology in biology. However, most work in artificial intelligence does not aspire to fulfill such a role. Rather, it focuses on applications and issues in computer science. The computational modeling work that does exist is done largely in isolation. An exception is the ambitious and popular research program of “connectionism”, which studies many different aspects of cognition in simple computational models. However, connectionism focuses too much on one particular representation, and fails to acknowledge that the question determines which assumptions and representations are acceptable (although some interesting developments are taking place; Elman et al. [1998], MacWhinney, personal communication). What is lacking, I believe, is a sound methodology of how computational models are studied, interpreted and compared, and of how such research should relate to other approaches.

This thesis reports on a seven-month project carried out between November 1998 and January 2000 at the department of theoretical biology at Utrecht University. It borrows many insights and intuitions from that field to study the origins of syntax in a computational model. Apart from being instructive about syntax and evolution, I also view this project as a personal orientation on how a methodology that has been successful in helping to understand biological evolution and animal behavior can be applied to the questions of cognitive science.

The approach taken consists of formulating a model as a computer program. The model is much simpler, of course, than the original system, but shares some of its fundamental characteristics. However, the model is usually too complex to study analytically. It is therefore studied much like experimental science studies real-world systems: with experiments to test hypotheses and mini-models to better understand what is going on. Such an approach is productive, in the sense that the model behaviors are often unexpected and can thus yield new hypotheses and concepts.
Computational models can further help to evaluate how generic some properties of a system are, and to tackle the supposed self-evidence of arguments.

Chapter 1 will discuss the theoretical background of the project in evolutionary biology and linguistics, and some of the related work. Chapter 2 presents a computational model of the evolution of syntax, and evaluates the consequences of evolution within a group. Chapter 3 studies some details of the rewriting grammar formalism that is used in the models. Chapter 4, finally, investigates in two variants of the first model which selection pressures lead to expressive syntax.

Acknowledgments
First of all, I would like to thank Paulien Hogeweg for her intense and stimulating supervision. I have already learned a lot from you over the past year and a half, and in the coming years I will undoubtedly rediscover much more of the things you have tried to make clear to me. I thank Michael Moortgat for his supervision from a distance and for the pleasant and interesting discussions, and Vincent van Oostrom, who became the third examiner at a late stage but was still able to give valuable criticism. I thank my entire examination committee, as well as Albert Visser and Annemarie Besselink, for their patience and cooperation, which allowed me, far behind schedule, to graduate just in time. Furthermore, I thank the students and PhD students of the department for their help and the stimulating environment. Without the patient tips of Ludo Pagie and Stan Marée, working with C++ and LaTeX would have been a disaster. André's endless knowledge of mathematics was always available and always came in handy. The discussions, dinners, wine, beer and olives at Club T, the wine club and other occasions made my internship a wonderful time. Thanks, then, to Bas, Kirsten, Ivo, Christian, Jorge, José, Martijn, Erik, Eltica, Marc and Roeland. Finally, thanks also to Onno Zoeter for endless discussions at the most impossible moments.

Chapter 1

Language & evolution, a general introduction
This chapter introduces some of the useful concepts and tools from evolutionary theory and linguistics that are used in the models of language evolution discussed in chapters 2, 3 and 4. It considers some of the ideas put forward from both these fields on the origins of language, and some of the related mathematical and computational models that have been developed.

1.1 Introduction
The emergence of grammatical language is described as one of the major transitions in evolution [Maynard-Smith & Szathmáry, 1995] and a turning point in the history of the human species. Grammar in our communication system allows humans to express an unlimited number of messages. It allows us to share experiences and insights in extreme detail, and to transmit knowledge over many generations. And it allows us to consider full scenarios (“narratives”) before acting. These features make human language fundamentally different from other animal communication systems. For both students of evolution and students of human language, the origins of language are an intriguing domain with many unsolved problems.

Evolution, throughout this thesis, is studied as a dynamical process, of which the dynamics and outcome are not trivial. Modern evolutionary theory in that tradition offers a framework to describe how variation and selection operate on different levels of information transmission, in systems with a spatial distribution of replicators and in domains with nonlinear genotype–phenotype mappings. The work on understanding evolution in both biology and artificial intelligence provides a perspective that is useful for studying the origins of language. The transition towards grammatical language is considered to be one of the major transitions in evolution, like the emergence of DNA-coding and multicellularity. It is thought to share properties with these transitions, like the opening of new levels of transmission and selection, division of labor and open-ended coding [Maynard-Smith & Szathmáry, 1995].

Linguistics, the second source for this thesis, offers a useful description of language processing, language acquisition and development, cultural differences and universals. Linguistic tools and concepts, like “phrase structure grammars”, “compositionality” and “recursion”, are exploited in the studies of this thesis. However, mainstream linguistics [e.g. Cook, 1993] builds on two important distinctions, which we believe do not necessarily provide the most useful perspective on language origins. Those are the distinction between competence and performance, and that between syntax and semantics. These distinctions have led researchers to study language competence as an idealized formal system. Because empirical and analytical observations have indicated a universal and innate base for that system, some have concluded that the essentials of syntax are explicitly coded in our genes and generated in a process of genetic evolution [Pinker & Bloom, 1990].

We believe that the underlying assumption that elements of syntax are either learned, or explicitly represented in the genome, underestimates the complexity of patterns that may spontaneously form in the many interactions involved in language acquisition and language evolution. Restricting oneself to the study of syntax and competence excludes many of these crucial interactions. Recent work in computational modeling of language origins has therefore focused on language in groups of agents in a meaningful context, and thus more on performance and semantics than on competence and syntax. Even though the languages of these agents are extremely simple, the new complication of studying language dynamics in a group structure yields intriguing new insights.

The picture that emerges from a diverse range of computational models is that, while at odds with some of the key assumptions in contemporary linguistics, communication systems can arise that are successful in several respects: they are shared by all members of a population without relying on an innate specification, and they can be expressive and show recursion and compositionality without an underlying formal system in the Chomskyan sense. However, this by no means excludes a role for evolution. The view of “selfstructuring as a substrate for evolution” [Boerlijst & Hogeweg, 1991a], as proposed in the field of evolutionary biology, might offer a way to study linguistic patterns that form spontaneously but are not necessarily accidental.

The next sections will review some of the work in contemporary evolutionary theory and linguistics, and discuss several ideas put forward from both these fields on the origins of human language. Where appropriate, I will include some work from machine learning in the discussion, because the fundamental questions in this field relate in many ways to evolutionary and linguistic theory. Section 1.4 will discuss some related work on the evolution of language.

1.2 Evolution
Crudely, one can distinguish two types of research in evolutionary biology. One class of research simply assumes that evolution leads to better solutions. This research is primarily interested in the end-product of evolution. It speculates about why zebras have stripes, why frogs are green and why humans have language. This type of research, often called adaptationism, studies how “adaptations” are beneficial for an organism, and in some sense takes evolution as nature's omniscient designer.

However, anyone with some experience with evolutionary algorithms on computers soon realizes that one cannot equate evolution and optimization. Whether or not a successful solution is found depends on many details of the algorithm and the problem domain. Similarly, the course of evolution in nature is not a trivial route to the top. These issues are studied in a second class of research, which is interested in “evolutionary dynamics”: the way evolution leads to different solutions, and the factors that influence this process.

Darwin's theory of evolution by natural selection from 1859, of course, still constitutes the framework of the research on evolution. However, rather than as the final word, one should consider the principle of “survival of the fittest” as defining a paradigm, in which a plethora of phenomena, mechanisms and often contradictory theories are grouped together. Research on evolutionary dynamics has brought many conceptual and mathematical tools that characterize the ways in which Darwin's framework can be filled in.[1]

[1] The discussion in this section originates for a large part in Paulien Hogeweg's course on Bioinformatics (unpublished).

1.2.1 fitness landscapes
A powerful image that is often invoked to understand evolutionary dynamics is that of a fitness landscape. A fitness landscape is the multidimensional landscape that is defined by fitness on one axis (the expected number of offspring of an individual, or some other quality measure) and the genotype space on all other axes. Some intuitions on types of evolutionary dynamics can be gained from thinking about the shapes such a landscape can have. One can, for instance, imagine that this landscape has a very smooth shape, with one global maximum in fitness and a smooth decrease in fitness proportional to the distance from this maximum in genotype space. On the other side of the spectrum, one can imagine a fitness landscape where maxima are surrounded by deep valleys and the landscape contains many local peaks. Such a landscape is said to be “rugged” (see figure 1.1). Kauffman developed a well-known procedure to produce high-dimensional landscapes (“NK landscapes”) with varying degrees of ruggedness [Kauffman, 1993].
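To make the notion of tunable ruggedness concrete, here is a minimal sketch in Python (our own illustration; the details are simplified and not Kauffman's exact formulation): the fitness of an N-bit genotype is the average of N random table lookups, each depending on one bit and its K neighbors. With K = 0 the contributions are independent and the landscape is smooth; larger K couples the loci and makes the landscape increasingly rugged.

    import random

    def make_nk_landscape(N, K, seed=0):
        """Return a fitness function over N-bit genotypes (a toy NK-style landscape).

        Each locus i contributes a random value that depends on bit i and its
        K right-hand neighbors (wrapping around); K = 0 yields a smooth,
        additive landscape, larger K a more rugged one.
        """
        rng = random.Random(seed)
        tables = [{} for _ in range(N)]

        def fitness(genotype):  # genotype: tuple of N bits
            total = 0.0
            for i in range(N):
                key = tuple(genotype[(i + j) % N] for j in range(K + 1))
                if key not in tables[i]:          # lazily fill the lookup table
                    tables[i][key] = rng.random()
                total += tables[i][key]
            return total / N

        return fitness

    # Example: compare the fitness of a genotype and a one-bit mutant.
    f = make_nk_landscape(N=10, K=2)
    rng = random.Random(1)
    g = tuple(rng.randint(0, 1) for _ in range(10))
    mutant = (1 - g[0],) + g[1:]
    print(f(g), f(mutant))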


Figure 1.1: An example of (a) a smooth fitness landscape, and (b) a rugged fitness landscape, plotting quality against solutions. Note that for all interesting systems the genotype space is high-dimensional, unlike the one-dimensional “solutions” axis in this example.

Fitness landscapes in some sense relate evolutionary research to machine learning problems. It is not hard to see that the higher the ruggedness of a landscape, the harder an optimization task becomes: a point in genotype space contains less information on the fitness values of the neighboring points. Learning algorithms, including evolutionary optimization, tend to get “stuck” on the many local maxima. However, “locality” in such a landscape is determined by the coding of the problem and the applied algorithm. Each combination of a problem and an algorithm will thus yield a different fitness landscape, with varying degrees of ruggedness and, consequently, of success in optimization.

In machine learning, fitness landscapes are called “error landscapes” and genotype space is called “hypothesis space”. Those concepts can be traced back to the idea of “energy landscapes” in physics, which are used to describe how some physical systems (e.g. “spin glasses”) move towards a minimal-energy distribution (see figure 1.2). The crucial concept to describe the behavior of a machine learning algorithm is “inductive bias” [Mitchell, 1997]. Inductive bias in some sense characterizes the shape of an error landscape. It can be divided into two components: representation bias and search bias. Representation bias describes all solutions (hypotheses) possible in the representational language of the algorithm (which determines the size of the hypothesis space). Solutions that cannot be represented can obviously not be learned. Search bias describes, in some sense, the order in which possible solutions are evaluated. This depends on the initial conditions, neighborhood relations in the hypothesis space and the kind of steps taken in this space.
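The tendency to get “stuck” is easy to demonstrate with a toy sketch (our own illustration, not a model from this thesis): a hill climber that only accepts uphill unit steps on a one-dimensional rugged function will typically halt on a different local maximum for each starting point.

    import math, random

    def rugged(x):
        # A smooth global trend plus fast oscillations: many local maxima.
        return -0.05 * (x - 50) ** 2 + 5 * math.sin(x)

    def hill_climb(x, steps=1000):
        for _ in range(steps):
            candidate = x + random.choice([-1, 1])   # local move only
            if rugged(candidate) > rugged(x):        # accept only improvements
                x = candidate
        return x

    random.seed(2)
    starts = [0, 20, 40, 60, 80]
    # Different starting points end on different local peaks, not the global one.
    print([hill_climb(x0) for x0 in starts])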

1.2.2 genotype–phenotype mapping
Fitness landscapes form a powerful conceptualization that helps in understanding the performance of algorithms and the differences between them. It is, however, dangerous to depend too much on intuitions one can form about 2D or 3D representations of fitness landscapes. One issue where such intuitions easily go wrong is “neutrality”: an equal fitness level for a subspace of genotypes. In a 3D landscape, a large plateau of equal fitness seems rather unlikely. However, an important insight that has emerged in the last few years is that the possibility of “neutral networks”, networks of connected points in genotype space that have equal fitness, increases with increasing dimensionality.

Figure 1.2: It is often useful (and sometimes dangerous) to think about energy, fitness or accuracy as something like this landscape. The z-axis is the energy, and the x-y plane represents the possible states of the system.

Chapter 2 discusses how neutrality might play a role in explaining the dynamics observed in the model of language evolution we studied. In this study we observe periods of stasis (“epochs” or “punctuated equilibria”), with sudden bursts of innovation and subsequent increases in fitness. Van Nimwegen et al. [1999] studied this phenomenon. They showed, in both a mathematical analysis and computer simulations of RNA evolution, that neutrality is likely to be the underlying mechanism. The crossing of “fitness barriers” (infrequent, stochastic escapes from local maxima), which is often assumed to be the explanation for the phenomenon, is unlikely to play a role.

Neutrality arises if there is redundancy and non-linearity in the genotype–phenotype mapping. In many evolving systems one can identify such redundancy. The 1D sequences of RNA, for instance, form a very redundant coding for its functional 3D foldings. Many different sequences lead to the same 3D folding. On the other hand, similar sequences often code for totally different foldings. Thus, from any part of the genotype space other parts of the space can be reached by following the neutral network, without a decrease in fitness. But also, specific “solutions” are spread over the genotype space, such that from any part of this space they can be found at a reasonable distance. These properties of the genotype–phenotype mapping are thought to facilitate evolutionary optimization: evolution does not get stuck on local peaks, because it can always escape over the neutral network. Moreover, in any part of the genotype space many different solutions are reachable within a few mutations.

Redundancy and neutrality can be important aspects of the mapping from a language generation system to the actual language, and are in fact observed in the models discussed in this thesis. Unlike the representations in many genetic algorithms, but similar to many biological genotype–phenotype mappings, the grammar–language mapping in the models of this thesis is very non-linear and redundant. This is an important aspect, as it might significantly alter the evolutionary dynamics. In such systems, different routes can be followed to the same solution.

It has been pointed out that language shares another property with genetic coding: the ability to take an indefinitely large number of forms [Maynard-Smith & Szathmáry, 1995]. In genetics, this leads to unlimited heredity, or open-ended evolution, which is thought to be a crucial property for the origins of life. In language, it opens up the possibility of “infinite expressiveness”, which is thought to be a crucial selective advantage in the development of human language.
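As a hedged illustration of how redundancy produces neutrality (our own toy mapping, not the RNA model of Van Nimwegen et al.), consider a genotype that is read out only through the majority bit of each block: many one-bit mutations then leave the phenotype, and hence the fitness, unchanged, and these neutral mutants form the local strands of a neutral network.

    import random

    def phenotype(genotype, block=4):
        # Redundant many-to-one mapping: each block of bits is read out
        # only through its (strict) majority value.
        return tuple(int(sum(genotype[i:i + block]) * 2 > block)
                     for i in range(0, len(genotype), block))

    def neutral_neighbors(genotype):
        # Count one-mutant neighbors whose phenotype (hence fitness) is unchanged.
        count = 0
        for i in range(len(genotype)):
            mutant = list(genotype)
            mutant[i] = 1 - mutant[i]
            if phenotype(mutant) == phenotype(genotype):
                count += 1
        return count

    random.seed(3)
    g = [random.randint(0, 1) for _ in range(20)]
    print(phenotype(g), neutral_neighbors(g), "of", len(g), "mutations are neutral")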


1.2.3 coevolution
Another deviation from the traditional fitness-landscape picture also plays a role in the models of chapters 2 and 4. The discussion above assumes that fitness is a static (though perhaps complicated) function of the genotype. This might be a reasonable assumption in artificial evolutionary optimization problems, but in many natural evolutionary processes fitness also depends on the population composition, a changing environment and other evolving species. Moreover, since fitness is an average over an enormous set of circumstances, individual “fitness evaluation” might often be based on only a sample from the complete set (“sparse evaluation”). This makes the fitness landscape in fact a fluctuating landscape (a “seascape”) that changes shape every time step, and many of the results for static landscapes do not necessarily hold in such a situation.

In general, when the fitness function changes over time, this is referred to as “coevolution”, although often this term is used to describe specifically the situation of two evolving species, where the fitness of one species depends on its own traits and the traits of the other species. The situation where members of only one species evolve, and their fitnesses depend on the composition of the population, is often referred to as “frequency dependent selection”. Chapter 2 discusses such a situation, in which the dynamic nature of the landscape leads to solutions that cannot be reached in a static landscape. On the other hand, in other circumstances it prevents the development of solutions one would expect from intuitions about static landscapes. Chapter 4 further elaborates on the issue of frequency dependence.

As such, this study is relevant for the recent optimism about “coevolutionary learning”. This optimism is based on the intuition that coevolution provides an inherent didactics that might be useful for many learning tasks. One aspect of this is the “arms race” that can result if two competing species (or competitors within one species) develop ever more efficient tools to hinder their competitors. The weak spots of a particular solution will therefore continuously be detected, exploited and subsequently “adjusted” [Hillis, 1991]. A related intuition is that coevolving species provide a “beginning small” environment for each other [Elman, 1991], and the environment only gets more complex when the species have evolved to deal with the simple cases. A third aspect is “sparse evaluation”: at each generation individuals are only confronted with a subset of all possible problems. Simulation results indicate that such sparse evaluation might lead to better and more generalizable solutions [Pagie & Hogeweg, 1997].
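A minimal sketch of frequency dependent selection (a standard hawk–dove-style example of our own, not one of the models of this thesis): the fitness of a strategy is recomputed each generation from the current population composition, so the “landscape” shifts as the population moves across it.

    # Replicator dynamics for two strategies whose fitness depends on the
    # population composition (frequency dependent selection).
    # payoff[i][j] = payoff of strategy i against strategy j.
    payoff = [[0.0, 3.0],   # "hawk" against (hawk, dove)
              [1.0, 2.0]]   # "dove" against (hawk, dove)

    x = 0.1  # initial frequency of strategy 0 ("hawk")
    for step in range(50):
        f_hawk = x * payoff[0][0] + (1 - x) * payoff[0][1]
        f_dove = x * payoff[1][0] + (1 - x) * payoff[1][1]
        mean = x * f_hawk + (1 - x) * f_dove
        x += 0.1 * x * (f_hawk - mean)   # discrete replicator step
        if step % 10 == 0:
            print(step, round(x, 3), round(f_hawk, 3), round(f_dove, 3))

With these payoffs the hawks' fitness falls as hawks become common, and the population settles at a mixed equilibrium (here x = 0.5) that is not a peak of any static landscape.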

1.2.4 multiple levels
Studies of evolution usually consider the situation in which the reproductive success of individuals depends on the set of traits they carry. Individuals thus are the replicators, and selection and information transmission operate at only one level: the individual level. However, in real life entities at a sub-individual level, such as genes, organelles, cells, organs and cultural elements (“memes”), also replicate, mutate and undergo selection. Similarly, groups of individuals can compete with other groups and undergo their own evolution. Recent developments in experimental and theoretical biology have shown that these multiple levels alter the evolutionary dynamics in a sometimes quite unexpected manner.

Richard Dawkins has championed the view of “selfish genes”, arguing that not the individual, but the gene is the level of selection [Dawkins, 1989]. Individuals, in this view, are merely the machinery for genes to procreate. This extreme view also fails to acknowledge the existence of multiple levels; it does, however, draw attention to interesting phenomena like “selfish elements”, which are in fact empirically observed. A nice example is B chromosomes: extra chromosomes, not part of the normal complementary sets of chromosomes, that are found in many species and in most well-studied cases lower the fitness of their carrier. At least in some flowering plants, their persistence can be explained by a super-Mendelian rate of inheritance, which results from actively “jumping the queue” in cell division [Burt & Trivers, 1997]. An elegant theoretical example is Boerlijst's work on spontaneous spiral wave patterns [Boerlijst & Hogeweg, 1991b]. The spirals establish a new level of selection, with the ultimate consequence of inverting selection pressures on the individual level. This will be further discussed in section 1.2.5.

Understanding the interaction between multiple levels is especially important for the evolution of language, as language typically is something that occurs in groups of individuals. Also, language is something that can be transmitted (at least partly) culturally, and can thus replicate and mutate, and itself become a “replicator” (cultural evolution). Both at a super- and at a sub-individual level one can expect evolutionary processes to alter the dynamics. Deacon [1997] presents a theory of the origins of language (“the coevolution of language and the human brain”), based on the intriguing idea that evolution at the level of languages (the super-individual level) has led to languages that best fit our cognitive peculiarities, while evolution at the level of the individual has led to brains that best deal with language.

1.2.5 space
Many models in biology ignore the spatial positions entities have. Including this information in a model often leads to surprisingly different behaviors. Maarten Boerlijst, for instance, studied Manfred Eigen's hypercycle model in a spatial embedding [Boerlijst & Hogeweg, 1991b]. Eigen's model concerns replicating molecules that catalyze each other's reproduction, and was originally proposed as a mechanism for the origins of life. However, the situation in the model cannot be maintained in evolution, because there exists no selection pressure for catalyzing the reproduction of other molecules. Molecules (“parasites”) that don't give catalysis, but do receive it, will sooner or later appear and take over the population (before going extinct themselves).

Boerlijst modeled Eigen's hypercycle on a 2D lattice, where each molecule occupies one square and interacts only with its 8 immediate neighbors. In this spatial embedding, rotating spiral structures emerge. The spirals yield a new level of competition and selection, and the dynamics on the spiral level feed back on the individual molecule level. As a result, there now is a selection pressure towards giving catalysis, as it contributes to the “fitness” of the spiral. Boerlijst's model in fact turns out not to be vulnerable to parasites.

This example shows one of the roles a spatial embedding can play in evolutionary models; a toy version is sketched below. In fact, it has been shown many times now that altruism and cooperation evolve much more easily in models that include space. This is a relevant observation for the evolution of language, as language is something that seems to require cooperation and shows at least some kind of spatial patterning. The example also shows how different levels of selection can naturally emerge and alter the dynamical properties of systems.
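To convey the flavor of such spatial embeddings, here is a drastically simplified toy of our own (not Boerlijst's actual hypercycle model): molecules of a few species live on a 2D lattice with wrap-around boundaries, reproduce into empty squares with a probability boosted by catalytic neighbors, and interact only with their 8 immediate neighbors.

    import random

    SIZE, SPECIES = 30, 4
    random.seed(4)
    # -1 = empty square; species s is catalyzed by species (s - 1) % SPECIES.
    grid = [[random.randrange(-1, SPECIES) for _ in range(SIZE)] for _ in range(SIZE)]

    def neighbors(i, j):
        # The 8 immediate (Moore) neighbors, with wrapping boundaries.
        return [grid[(i + di) % SIZE][(j + dj) % SIZE]
                for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]

    def step():
        # Updates are asynchronous and in place, for simplicity.
        for i in range(SIZE):
            for j in range(SIZE):
                if grid[i][j] == -1:                      # empty: maybe colonized
                    nbs = [s for s in neighbors(i, j) if s != -1]
                    if nbs:
                        s = random.choice(nbs)
                        catalysis = sum(1 for n in neighbors(i, j)
                                        if n == (s - 1) % SPECIES)
                        if random.random() < 0.1 * (1 + catalysis):
                            grid[i][j] = s
                elif random.random() < 0.05:              # occupied: maybe dies
                    grid[i][j] = -1

    for _ in range(100):
        step()
    counts = [sum(row.count(s) for row in grid) for s in range(SPECIES)]
    print(counts)  # local interactions typically keep all species alive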

1.2.6 selfstructuring
Boerlijst's results also show that quite complicated behavior can arise from very simple basic principles. This phenomenon is usually called “selforganization” or “selfstructuring”, and has received much attention in physics, theoretical biology and artificial life. Debates exist on the ontological status of “selforganization”; here we will adopt a pragmatic approach and simply speak of selfstructuring[2] if unexpected, complicated patterns arise from simple rules. The last decade has brought many examples of selforganization that should make one cautious of one's intuitions about the global behavior of systems with simple local rules. A nice example is the work of Stan Marée on Dictyostelium discoideum, unicellular amoebae that under scarce food conditions aggregate to form a multicellular organism. The aggregation, and a wide range of rather complex behaviors, can be explained by a small set of very simple principles and parameters [Marée et al., 1999].

Boerlijst & Hogeweg [1991a] propose to view selforganization as a substrate for evolution. That is, the parameters in selforganizing systems that determine which of a rich set of behaviors takes place are the ideal handle for evolutionary processes to work on. This is, also for the evolution of language, an appealing proposal, as it might offer a way out for those who accept an innate bias for learning language, but refuse to accept specific language genes.

[2] The term “selforganization” suggests an unpleasant teleology that is not intended here; hence “selfstructuring”.

1.2.7 dynamical systems
Evolutionary models in general, and the models of this thesis in particular, can be viewed as dynamical systems. Such systems are defined in terms of rates of change of variables, rather than in terms of values of the variables. In simple systems this is merely a different perspective that makes little difference to the understanding of a phenomenon. One can, for instance, describe the behavior of a pendulum easily with an equation that relates position (x) to time (t). Describing the behavior with equations that specify some initial condition (x(0)) and the rate of change over time (dx/dt) adds little to our knowledge of pendulums. More complex systems, however, are often much better understood when viewed as dynamical systems. Mathematical developments have yielded a rich language to describe dynamical phenomena, such as:

phase space: the space defined by the variables of the system on each dimension;
equilibria: the points in that space where all rates of change of variables are zero;
null-clines: the regions in the phase space where the rate of change of one variable is zero;
stability: whether or not infinitely small perturbations can drive a system out of an equilibrium;
trajectories: the path a system follows through the phase space when initialized at a certain position;
attractors: points or regions in the phase space that serve as the end points of trajectories;
basin of attraction: the region in phase space from which all trajectories lead in the limit to a particular attractor;
limit cycles: the circular trajectories some systems will follow in the limit; limit cycles can be both stable (attractors) and unstable;
bifurcations: qualitative changes in the structure of the phase space, such as the appearance of a new equilibrium or a change in stability;
chaos: the long-term unpredictability of a system, even though short-term behavior is known with infinite precision, because of sensitive dependence on initial conditions.

These concepts, and the underlying mathematics of the theory of bifurcations and chaos, have proven to be extremely useful in describing a wide variety of physical, chemical, biological and economic systems. In biology, especially the fields of immunology, ecology and neuroscience have profited from such mathematical models. However, thorough mathematical analysis remains restricted to systems with relatively few variables. Present-day computer facilities allow for the systematic study of computational models with many more variables. The models of language evolution studied in chapters 2 and 4 are examples of such models. They also benefit from the dynamical systems perspective. In our analyses we will use concepts like phase space and attractor, and study the stability of some of the observed regimes.
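A classic worked example of this vocabulary (our own illustration; the logistic map is not one of the thesis models) is the logistic map x(t+1) = r x(t) (1 - x(t)), which exhibits fixed-point attractors, limit cycles and chaos for different values of the parameter r:

    def trajectory(r, x0, steps=60):
        xs = [x0]
        for _ in range(steps):
            xs.append(r * xs[-1] * (1 - xs[-1]))   # one logistic-map step
        return xs

    # r = 2.8: trajectories from different starts reach the same fixed point
    # (an attractor with a large basin of attraction).
    print([round(x, 4) for x in trajectory(2.8, 0.1)[-3:]])
    print([round(x, 4) for x in trajectory(2.8, 0.7)[-3:]])

    # r = 3.2: after a bifurcation, a stable limit cycle of period two.
    print([round(x, 4) for x in trajectory(3.2, 0.1)[-4:]])

    # r = 4.0: chaos; two nearly identical initial conditions diverge,
    # illustrating sensitive dependence on initial conditions.
    a, b = trajectory(4.0, 0.4), trajectory(4.0, 0.4000001)
    print([round(abs(x - y), 4) for x, y in zip(a, b)][::10])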

1.3 Language
Modern linguistic theory grew out of disagreement with behaviorism, the psychological and biological paradigm of the first half of the twentieth century that refused to accept that a scientific theory of behavior could postulate internal cognitive states. Noam Chomsky, without doubt the 20th century's most influential linguist, in fact defined “linguistics” as the study of internal language: the language system internal to an individual, which can only indirectly be observed externally.

Fundamental to Chomsky's view on language is the distinction between “competence” and “performance”. The former concerns the system of internal language, and is thought to be of a “pure” logical and systematic nature. The latter concerns the actual observable external language, which through cognitive limitations, errors and all sorts of other noise does not necessarily show all fundamental properties of the internal language. A metaphor that makes this distinction clearer is given by Cook [1993]: one can compare competence with the Highway Code, known by all drivers on the highway, but not necessarily always obeyed in their actual performance.

A second important distinction is made between syntax and semantics: the form of expressions versus their meaning. These two aspects of language are studied rather separately in Chomskyan linguistics, with traditionally a large emphasis on syntax. The famous (and clichéd) example that illustrates the validity of this distinction is Chomsky's sentence “colorless green ideas sleep furiously”. This sentence conveys no meaning, yet is accepted as fully grammatically correct.

16

CHAPTER 1. LANGUAGE & EVOLUTION, A GENERAL INTRODUCTION

championed by Chomsky, and some observations about natural language, concerning its in?niteness, its learnability and its universality. Alternative accounts of linguistic abilities exist, ranging from cognitive grammars and categorial grammar to connectionist models. As they play no role in the studies of this thesis, we will not discuss them here. For models of language origins some general and non-controversial observations are important: i. Human language is (in?nitely) expressive, in the sense that it can express an inde?nite number of meanings with a ?nite set of forms; ii. Language is compositional. It is composed of meaningful parts, and the order in which these parts are put is relevant (structure sensitive); iii. Language is recursive. Parts can be nested in larger structures, like in “the dog that chases the cat that chases the mouse runs fast”; iv. Languages are diverse on a global scale, but uniform on a local scale, allowing for ef?cient communication with neighbors; v. Languages are dynamic, constantly subject to innovations; vi. Languages share universal tendencies; vii. Language is used for very diverse purposes, including information exchange (communication), but also expression, manipulation, intimidation and social cohesion.

1.3.1 phrase structure grammars
One of the most powerful aspects of Chomsky's theories is the fact that he introduced a sound mathematical framework along with a verbal, more intuitive reasoning. The combination of phrase structure grammars and transformational operations provided a rich enough representation to convince one linguist after another, until then primarily concerned with historical linguistics, of the potential of Chomsky's work [Gardner, 1987].

Phrase structure grammars describe the syntactic form, the “phrase structure”, of language competence. They consist of a list of rules that each rewrite some sequence of symbols to another sequence of symbols. Symbols are elements of a terminal alphabet or a non-terminal alphabet. There is one special non-terminal symbol S, called the start symbol. Language production, “derivation”, with this formalism is applying the rules iteratively to the start symbol until some sequence of terminal symbols is reached. Language comprehension, “parsing”, on the other hand, is, given such a terminal string, finding a series of rewriting steps that lead from S to this terminal string. Phrase structure grammars are also called rewriting grammars. The following rewriting grammar can, for instance, generate the English sentences “the child cries” and “the child laughs”:

S → NP VP        (1.1)
NP → the child   (1.2)
VP → cries       (1.3)
VP → laughs      (1.4)

The derivation of such a sentence can be represented as a tree. One can easily extend the grammar to deal with multiple subjects, or with more complicated verb phrases. For sentences more complicated than such “kernel sentences”, such as questions or passive forms, one also needs transformations that operate on the tree representations (“transformational generative grammars”). Efforts to describe the full grammar of a natural language like English yield enormous lists of rules and transformations. Such grammars give a reasonably adequate, though never complete, description of the sentences of English that a native speaker would classify as grammatical.
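The derivation process is easily made concrete. The sketch below (our own minimal implementation of the example grammar above, not code from the thesis models) repeatedly rewrites the leftmost non-terminal until only terminal symbols remain:

    import random

    # The example grammar: keys are non-terminals, values are alternative
    # right-hand sides; anything not in `rules` is a terminal symbol.
    rules = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "child"]],
        "VP": [["cries"], ["laughs"]],
    }

    def derive(symbols=("S",), rng=random.Random(0)):
        """Rewrite the leftmost non-terminal until only terminals remain."""
        symbols = list(symbols)
        while True:
            for i, sym in enumerate(symbols):
                if sym in rules:                          # found a non-terminal
                    expansion = rng.choice(rules[sym])    # pick one of its rules
                    symbols[i:i + 1] = expansion
                    break
            else:                                         # no non-terminals left
                return " ".join(symbols)

    for _ in range(3):
        print(derive())   # e.g. "the child cries" or "the child laughs"

Parsing would run the same rules in the opposite direction: searching for a sequence of rewriting steps that leads from S to a given terminal string.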


Although Chomsky and generative linguistics have moved beyond transformational generative grammars, rewriting grammars remain a popular representation in linguistics. Because of their mathematical simplicity and yet huge computational ability, they have proven their usefulness in other fields as well, ranging from computer science and logic to molecular biology and literary studies. The models studied in this thesis also use rewriting grammars as the representation of the language capacity; this is motivated not so much by support for Chomskyan linguistics as by some of the elegant properties of the grammars: a parsimonious, well-understood formalism with a redundant, non-linear genotype–phenotype mapping. Chapter 3 discusses some more relevant details of rewriting grammars.

1.3.2 productivity
Comparing human language to what is known of animal communication systems, the most striking difference is the huge expressiveness of human language. Most sentences one hears are sentences never expressed before. The child that said “My guinea pig died with its legs crossed” [Cook, 1993] formed a totally novel combination of known words. This creativity of users of language depends on its compositionality and recursiveness.

Compositionality in a sentence is the property of consisting of components that can be combined together in different ways. Words in human languages are such components; the grammar specifies the ways words can be combined, where each combination corresponds to a different meaning. In rewriting grammars, compositionality appears when the right-hand sides of rules contain non-terminal symbols that can themselves be rewritten in several ways (e.g. rule 1.1 in the example above).

Recursiveness in a sentence is the property of consisting of components of a certain type that can themselves be composed of elements of the same type. The noun phrase “the crying baby” in English, for instance, contains the component “the baby”, which itself is also a noun phrase. Conversely, “the crying baby” can itself again be a component of “the sad crying baby”. Recursiveness occurs in rewriting grammars, for instance, if there are rules that rewrite a non-terminal symbol to a string that contains the same non-terminal symbol.

Generativist linguists claim that although the external language will always be finite, because of cognitive limitations and limited life-span, the internal language is infinite because of recursiveness. Any acceptable representation of language must thus also yield an infinite expressiveness. This is known as the “productivity argument”. However, if one questions the usefulness of the distinction between competence and performance, the productivity argument loses much of its power. The established infiniteness of human language, and many of the subsequent inferences, depend on this distinction. Although results with neural network modeling of language are still very limited, the fact that some recurrent neural networks [Elman, 1991; Batali, 1997] can use a language that shares at least some properties with human communication should make one cautious about inferring the necessity of an underlying formal system. Postulating an idealized, infinite formal language internal to the networks does not seem to offer a meaningful level of description.

It is often assumed that without infiniteness acquiring a language becomes trivial, because one can simply enumerate all possible sentences. This is not the case, however. As in many machine learning problems, learning or evolving a language involves a high-dimensional search space that might be finite, but is nevertheless so huge that an exhaustive search is impossible in reasonable time, regardless of the resources. In this thesis, in fact, we restrict ourselves to finite domains. Nevertheless, sensible numerical and analytical results can be obtained.
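How a finite set of rules yields unbounded productivity can be seen in a small sketch (our own toy grammar, not one used in the thesis): a single recursive rule, NP → NP that chases NP, makes the number of derivable sentences grow without bound as more rewriting steps are allowed.

    # A toy grammar with one recursive rule: an NP can contain other NPs.
    rules = {
        "S":  [["NP", "runs"]],
        "NP": [["the", "dog"], ["the", "cat"],
               ["NP", "that", "chases", "NP"]],   # the recursive rule
    }

    def expand(symbols, budget):
        """All terminal strings derivable with at most `budget` rewriting steps."""
        for i, sym in enumerate(symbols):
            if sym in rules:
                if budget == 0:
                    return []                     # out of budget: derivation fails
                results = []
                for rhs in rules[sym]:
                    tail = symbols[:i] + rhs + symbols[i + 1:]
                    results.extend(expand(tail, budget - 1))
                return results
        return [" ".join(symbols)]                # only terminals left

    for d in range(2, 9, 2):
        print(d, len(set(expand(["S"], d))))      # 2, 6, 14, 30, ...: no bound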

1.3.3 arguments of innateness
A second observation of generative linguists is that children acquiring language learn it remarkably fast, from very few examples and with hardly any negative feedback. Children show in their speech knowledge of all kinds of specific grammatical rules, without ever having encountered examples of those rules. The only explanation, generativists claim, is that these aspects of language are innate. This is known as the “poverty of the stimulus argument”.


Gold [1967, discussed in Elman et al., 1998] provides a formal aspect to this argument. He shows that a broad class of (infinite) grammars is not learnable from (finite) data without giving the learner an extensive “innate” bias. Learning is even harder if only positive feedback is allowed. Because natural language grammars are thought to be in the class of “hard” grammars, and negative feedback is assumed to be absent, Gold's analysis is usually interpreted as strong support for the innateness hypothesis.

Other evidence for an innate “language instinct” comes from data showing universal features of all human languages. Bickerton studied pidgin and creole languages. Pidgin languages are simple communication systems with little grammar that emerge in situations where people with very different native languages need to communicate, such as the inhabitants of Hawaii in times of slavery. Creole languages are the languages of the second generation: the children that grow up in a pidgin-language environment. Surprisingly, the second generation seems to acquire a fully developed language that shares the “universals” with other languages [Bickerton, 1988, discussed in Pinker, 1994].

The poverty of the stimulus argument, and data showing the universality of some aspects of language, lead generative linguists to believe that language has an innate component, the universal grammar, that specifies the principles all human languages obey. Acquiring a language, in this view, is obtaining a lexicon and setting the parameters. Some critics argue that Gold's formal proof is not valid for “real” human language [Elman et al., 1998], and that no true language universals have been identified. However, given that all humans and no other species use language, the fact that there is some innate bias is not controversial. In fact, there exist good theoretical reasons why any learner has a bias to learn certain solutions more easily than others [Mitchell, 1997]. In the next sections we will explain why we believe one can accept such a bias without accepting the idea of an explicit, innate universal grammar.

1.4 Evolution of language
1.4.1 the language instinct
Probably the most well-known (and very well articulated) speculation about the origins of human language is Steven Pinker's book “The language instinct” [Pinker, 1994], based on an earlier paper by Pinker and Paul Bloom [Pinker & Bloom, 1990]. In that paper they summarize their view quite clearly:

Our conclusion is based on two facts that we would think would be entirely uncontroversial: language shows signs of complex design for the communication of propositional structures, and the only explanation for the origin of organs with complex design is the process of natural selection. [Pinker & Bloom, 1990]

This quote immediately reminds one of the approach in evolutionary studies described above as “adaptationism”. The paper argues that language is adaptive, and takes as a fact that evolution leads to such adaptive solutions. The paper does, however, consider in some detail the way language functions, how it could have functioned, and how it could have been advantageous. And it discusses some possible intermediate steps from an extensive “animal” communication system to the human linguistic abilities.

Pinker [1994] recapitulates the Chomskyan line of argument. Syntax is independent of semantics, as one can observe in syntactically correct but meaningless sentences. Underlying this syntax must be a productive, formal system, to obtain the infiniteness of human language. This system is acquired by infants so fast and with so little data, and all human languages share so many characteristics, that there must be a shared, innate component: the universal grammar.

Pinker & Bloom's original contribution is the claim that such a system cannot arise spontaneously or as a side effect of other (cognitive) developments: for that, language is too intricate and complex. Moreover, language clearly shows signs of adaptiveness: it allows humans to communicate infinitely many messages, including messages that refer to events that happened at other times and places than the present, messages that convey conditional and causal information, or complete narratives that allow one to draw lessons from someone else's experience. Each of these features can be beneficial in itself, and thus be an intermediate step in the evolution. Inevitably, Pinker & Bloom argue, one should conclude that genetic evolution is the explanation for humans' remarkable linguistic talent. The innate blueprint for an individual's “internal language”, the universal grammar, has been gradually updated to get from a non-grammatical protolanguage to the grammatical complexity of present human language.

In many ways Pinker & Bloom's paper reflects the competence–performance and syntax–semantics distinctions that are so dominant in generative linguistics. Their theory is in the first place a theory of how, in the development of our species, internal language obtained its formal characteristics. However, to make their argument, Pinker & Bloom repeatedly need observations that have a lot to do with semantics and performance, e.g. the references to the semantic needs to express information about time or causality, or the reference to anthropological data suggesting that gifted orators have more offspring in tribal societies.

1.4.2 a false dichotomy
Pinker & Bloom's work is symptomatic of the popular conviction that one can only choose between two types of explanations for the origins of human, grammatical language. One type refers to the powerful learning abilities of humans, and assumes that a human child, born as a tabula rasa, rapidly learns the regularities in its linguistic environment. Language specifics originate in the cultural transmission of language knowledge from one individual to the other. The other type of explanation assumes a large innate component, e.g. the “universal grammar”, that specifies the principles of human languages. This innate component reduces the load on learning to merely identifying the “parameters” of a particular language's grammar and building a lexicon of word–meaning relationships. The essential, universal aspects of language originate in the process of natural selection: genetic evolution has updated our “language organ” to meet our adaptive demands.

This two-sidedness of the origin-of-language debate echoes many of the unpleasant simplifications and much of the conceptual confusion of other “nurture–nature” (and symbolism–connectionism) discussions. The fact that there is something innate and something learned is never controversial. Chinese babies learn perfect English when brought up in England, and chimpanzees never learn to speak a human language fluently, no matter how carefully trained. In fact, any biological trait always arises from the interaction between genetic coding and an environment. Discussions of the “innateness” hypothesis hinder the understanding of such interactions by repeatedly suggesting that language knowledge is either learned, or explicitly coded for. For instance:

Any aspect of language that the speaker knows must either be learnable from positive evidence, that is to say, through exposure to sentences of the language, or be part of the innate equipment of the human mind. [Cook, 1993]

Underlying such reasoning is a strong intuition that the patterns observed in human language are too complicated to arise “spontaneously”. However, an impressive number of examples shows that intuitions about the complexity of underlying mechanisms are often flawed. These examples, ranging from Alan Turing's work on pattern formation and morphogenesis to Chris Langton's “Ant”, are usually described as selforganization or selfstructuring. Mechanisms of such spontaneous pattern formation in linguistics remain largely unexplored, although some recent studies suggest that their impact can be large, and thus their misunderstanding unfortunate.

“Spontaneous” should not be confused with “accidental”, though. Pinker & Bloom's metaphor of sunshades and television sets makes a valid point:

If someone told you that John uses X as a sunshade or a paperweight, you would certainly be hard-pressed to guess what X is or where X came from, because all sorts of things make good sunshades or paperweights. But if someone told you that John uses X to display television broadcasts, it would be a very good bet that X is a television set or is similar in structure to one, and that it was designed for that purpose. The reason is that it would be vanishingly unlikely for something that was not designed as a television set to display television programs; the engineering demands are simply too complex. [Pinker & Bloom, 1990]


However, we believe that by putting selfstructuring and evolution in opposition, Pinker & Bloom — but also [Steels, 1997c] — exclude the fun part of the story. Evolution needs a substrate to operate on, and selfstructuring needs a mechanism to set the right parameters. Boerlijst & Hogeweg [1991a] therefore propose to view selfstructuring as the substrate for evolution. Applying this view — for which a well-documented case has been made in evolutionary biology — to language offers a fresh perspective on some of the recurrent problems in linguistics. It might very well be that children use grammatical rules in their speech without ever having encountered them. But such rules need not be hard-wired in an infant's genome if they are a consequence of the interaction between the infant's brain structures, its perceptual and motor machinery, and its physical and cultural environment. Consequences of such interactions are often counterintuitive. They are therefore best studied in mathematical and computational models. In the remainder of this chapter we will discuss four such studies that relate to the work in this thesis. The first three study spontaneous pattern formation in language development and investigate the origins of language in groups of agents. The fourth studies the dynamics of language evolution from a game-theoretic perspective.

1.4.3 emergence of speech sounds
An excellent example of selfstructuring in linguistics is the work of De Boer [1999]. De Boer discusses the tendencies found in human vowel systems, which are thought to be an example of language universals. Humans are capable of producing at least 45 different vowel qualities. However, human languages use only a subset of these, and these subsets show some remarkable regularities. The nativist account of such universal patterns is that there is an innate component to the human phonological system that specifies the principles any specific vowel system must obey. Acquiring the vowels of a language is then setting the parameters that fill in the space the principles leave open. While not disputing that there might indeed be such universal tendencies in the structure of vowel systems, De Boer gives an elegant explanation based on the idea of selfstructuring. He studied a simulation of a population of agents that play a simple game, the imitation game. In this game, one agent makes a sound, chosen randomly from its repertoire. Another agent hears the sound, compares it with the sounds it has in its own repertoire and makes the sound that comes closest. The first agent hears this sound, and communicates non-verbally whether it believes the two sounds were similar (success) or not (failure). Subsequently, both agents adjust their repertoires. The possible sounds in the repertoire, the similarity measures and the adjustments are based on the physical properties of the human auditory and articulatory systems. Together, these assumptions are sufficient to explain the emergence of a sound system that has remarkable similarities with human sound systems. The different systems that result from different simulations, and their distribution, map almost perfectly onto the distribution of sound systems found in human languages. De Boer finds that these results are robust over a wide range of settings, including those with open systems where new agents continuously enter the population and others leave (die). Interestingly, De Boer finds that a vowel system is best preserved if he includes an aging mechanism: the older agents get, the less likely they are to change their settings. He remarks:

The results of the simulations with age structure show that the fact that older language users learn less quickly than younger ones might not just be an unfortunate consequence of aging, but might help to preserve sound systems (and possibly other aspects of language) across the generations.

This remark suggests that the aging mechanism is not accidental. Although we do not necessarily adopt this suggestion, the idea that such a parameter was perhaps set by an evolutionary process illustrates the view championed in this chapter, of selfstructuring as the substrate for evolution in a linguistic context.
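To make the mechanics of the game concrete, here is a minimal sketch of one imitation-game round. It is not De Boer's implementation: vowels are reduced to points in a two-dimensional formant space, and the threshold and update step are placeholder assumptions, whereas the real model uses an articulatory synthesizer and perceptually motivated distance measures.

    import math
    import random

    def dist(a, b):
        """Euclidean distance between two vowels in an (F1, F2) formant plane."""
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def imitation_game(speaker, hearer, threshold=0.1, step=0.05):
        """One round: the speaker utters a random vowel from its repertoire,
        the hearer answers with its closest vowel; on success the hearer
        shifts that vowel toward the heard sound, on failure it adopts it."""
        original = random.choice(speaker)
        imitation = min(hearer, key=lambda v: dist(v, original))
        success = dist(imitation, original) < threshold
        i = hearer.index(imitation)
        if success:
            hearer[i] = tuple(h + step * (o - h)
                              for h, o in zip(imitation, original))
        else:
            hearer.append(original)  # extend the repertoire with the new sound
        return success

    # two agents with small random vowel repertoires
    a = [(random.random(), random.random()) for _ in range(3)]
    b = [(random.random(), random.random()) for _ in range(3)]
    print(imitation_game(a, b), len(b))

Repeating such rounds over many agent pairs is what lets the repertoires self-organize into well-spread vowel systems.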


1.4.4 syntax without natural selection
A nice study of selfstructuring that is concerned with syntax is the work of Simon Kirby [1999]. He studied the language systems that arise in a simulation with agents that have quite sophisticated learning abilities and communicate about situations in a structured "meaning space". Similar to the models of this thesis, agents use context free grammars to produce and understand strings. Kirby extended these grammars with a semantics. The task of the agents is to acquire a grammar such that they best understand the utterances of other agents. The learning samples are form-meaning pairs; agents in Kirby's model thus have direct access to the "intentions" of the speaker. The "meanings" are expressions in a predicate logic that shows an explicit compositionality (e.g. "John finds Mary" vs. "John sees Mary"). Nevertheless, grammars can be non-compositional and still convey such compositional meanings. E.g.:

S = ⟨Agent = John; Patient = Mary; Predicate = Finds⟩ → johnfindsmary
(Between angled brackets is the semantic condition for rewriting S to johnfindsmary.) Kirby finds in his simulations that after a period in which the grammars are primarily non-compositional and have low expressiveness, a turbulent regime occurs with on average much more expressive grammars, culminating in a stable regime where agents reach maximum expressiveness with rather small grammars. The final solution is a highly structured, compositional grammar with a regular correspondence between meaning elements (e.g. "John") and forms (e.g. "da"). Agents in the model are not restricted to compositional grammars. Yet, simply by observing each other's behavior, learning from it and occasionally inventing new behaviors, syntactical language emerges. Kirby explains this phenomenon with the observation that the units of I-language (the internal knowledge of language), in the model the rewriting rules, only "survive" if they are faithfully preserved in the mapping to E-language (the external utterances in the "arena of use") and back to another individual's I-language. Kirby's model is a nice illustration of the idea that in a population of learners, culturally transmitted units ("memes" [Dawkins, 1989]) in some sense compete for survival, that is, for being learned. Rewriting rules themselves can become units of selection and, if there is variation, undergo cultural evolution. Compositional rules have a selective advantage in such a process, because they can be used to express many different messages and hence tend to be used more often. In a later paper, Kirby [2000] shows similar results for the emergence of recursive grammars, with a recursively structured meaning space.

1.4.5 the emergence of grammar
John Batali [1997] studied a population of agents with neural networks that, through social interactions, spontaneously develop a language with some striking formal regularities. His setup is similar to Kirby's, except that the messages to be conveyed are even simpler. Unlike Kirby, Batali uses a learning algorithm, Elman's simple recurrent neural network [Elman, 1991, see figure 1.3], that is not particularly well suited for obtaining compositional (or recursive) languages. Agents in Batali's model use their networks quite straightforwardly to map a received string to a "meaning vector". Each symbol is represented as an input vector of 0's and 1's; all input vectors are orthogonal. Symbols are presented to the network sequentially. The procedure to produce strings, inspired by Hurford [1989], is a little more complicated. For each position in the string, agents determine which symbol brings their output layer closest to the target meaning, and communicate the string thus generated. This enforces a strict relation between language production and reception. In linguistics, this is generally accepted to be a realistic assumption, and it is also present in the models of this thesis. Agents in Batali's model learn, using backpropagation, from the received message and the speaker's meaning vector; here too, agents thus have direct access to the intention of the speaker. The result is that after several thousands of communications, agents have developed a structured, compositional language. Batali performs a very simple linguistic analysis on the language generated by the networks. He finds that almost all utterances are composed of meaningful "building blocks". The


Figure 1.3: The network consists of four types of nodes: input nodes (I), hidden nodes (H), output nodes (O) and recurrent nodes (R). The input layer and the recurrent layer feed to the hidden layer; the hidden layer feeds to the output layer, all in a fully connected manner. The recurrent nodes obtain the activation of the hidden nodes; they form a copy of the hidden layer in the previous time step. The network is usually trained using backpropagation to predict, given an input symbol, the next symbol in a string.

formal properties of this language seem to be the consequence of the interactions between agents ("performance"), rather than some inherent property of their language production system ("competence"). We believe this casts doubt on the usefulness of a key assumption in linguistics: the choice to study competence rather than performance. Putting the explanatory burden completely on the formal properties of competence makes, at least in this example, the emergence of syntax difficult to understand.
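For concreteness, a minimal NumPy sketch of the forward pass of such a simple recurrent network; the layer sizes, weight scales and logistic activation are illustrative assumptions, and the backpropagation training that Batali uses is omitted.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class SRN:
        """Elman-style network: input plus a copy of the previous hidden
        layer feed the hidden layer; the hidden layer feeds the output."""
        def __init__(self, n_in, n_hidden, n_out, rng=np.random.default_rng(0)):
            self.w_ih = rng.normal(0, 0.1, (n_hidden, n_in))      # input -> hidden
            self.w_rh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # recurrent -> hidden
            self.w_ho = rng.normal(0, 0.1, (n_out, n_hidden))     # hidden -> output
            self.context = np.zeros(n_hidden)                     # recurrent layer

        def step(self, x):
            h = sigmoid(self.w_ih @ x + self.w_rh @ self.context)
            self.context = h.copy()  # recurrent nodes copy the hidden layer
            return sigmoid(self.w_ho @ h)

    # present a string of orthogonal one-hot symbols sequentially
    net = SRN(n_in=4, n_hidden=8, n_out=3)
    for symbol in [0, 2, 1]:
        meaning = net.step(np.eye(4)[symbol])
    print(meaning)  # the final output approximates the "meaning vector"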

1.4.6 compositionality overcomes an error limit
An interesting model study is Nowak & Krakauer [1999]. Using the matrix representations of Hurford [1989], they infer a "linguistic error limit". The active matrix P is a table that gives, for a speaker, the probabilities that a certain meaning is associated with a certain sound. The passive matrix Q, conversely, denotes the probability of inferring a certain object when hearing a certain sound. The evolution of referential systems with increasing expressiveness can be described with such a formalism. Given that an individual makes mistakes in distinguishing sounds, with a probability that depends on the similarity between those sounds, Nowak & Krakauer calculate a limit on the number of messages an individual can convey. They show mathematically that word formation can help overcome such a limit: if an individual knows all possible combinations of sounds, it can reliably recognize many more combinations of sounds than it can recognize distinctive sounds. The next step in the evolution of language, according to Nowak & Krakauer [1999], is the combination of words to make up sentences. They analyze a hypothetical example with n objects and h actions, a fraction of which is relevant for communication. A non-grammatical language would need N = nh different words for all relevant combinations of objects and actions. A grammatical language, on the other hand, would have n words for objects and h words for actions. After some calculations of the maximum fitness of a language, Nowak & Krakauer arrive at the rather trivial conclusion that a grammatical strategy is beneficial if there are more relevant events than there are words (N > n + h). A more interesting conclusion is drawn from an analysis of the adaptive dynamics of non-grammar and grammar strategies. The combinations of a small number of objects and actions can be adequately expressed by both a non-grammar and a grammar strategy. Both strategies are evolutionarily stable strategies


(i.e. they cannot be invaded by other strategies). However, Nowak & Krakauer show that every mixed strategy can be invaded by any mixed strategy that uses more grammatical sentences. Thus, the adaptive dynamics flow toward grammar. There is an apparent contradiction between this result and the results we will report in chapters 2 and 4, which show that in a population with a sufficiently large non-grammatical language, grammar cannot emerge. This difference can be explained by Nowak & Krakauer's assumption that hearers use a passive matrix that can deal with all expressions the speaker produces.
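The vocabulary-size argument above can be made concrete with a few lines of arithmetic, using hypothetical values for n and h:

    n, h = 20, 10        # hypothetical numbers of objects and actions
    N = n * h            # non-grammatical language: one word per object-action pair
    grammatical = n + h  # grammatical language: words for parts, combined by rule

    print(N, grammatical)  # 200 vs 30 words to memorize
    print(N > n + h)       # True: the grammatical strategy is beneficial here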

1.4.7 this thesis
The studies in this thesis should be viewed as a preliminary attempt at clarifying some of the many interactions involved in language evolution. Brain structures form in the morphogenesis of a human embryo; an infant hears language, imitates sounds and murmurs to itself; young children acquire language and use it to communicate about the surrounding world with parents, siblings and friends; languages themselves change within an individual and in the process of transmission from individual to individual; social functions, education and many other things influence the process. No formal, verbal or computational study of the evolution of language can make sense of all these interactions at the same time. The models studied in chapters 2 and 4 focus on one particular interaction: that between evolutionary dynamics and group dynamics. We study the evolution of very simple language-like communication in a population of agents with very limited abilities. We collapse the whole process of language acquisition by simply assuming that agents end up with a language system very similar to that of their parent. We collapse the whole problem of "meaning" by simply assuming that being able to speak, to understand and to be understood can contribute to one's fitness. Under these simplified conditions, we study the evolutionary dynamics that result from evaluating one's linguistic abilities with respect to performance in a group, rather than with respect to some abstract, objective "expressiveness". We will show that even without learning and cultural transmission, social patterns can influence the evolutionary dynamics. A social embedding can yield powerful, recursive grammars, but it can also prevent a population from obtaining them. Interestingly, because of these group effects, rules in one agent's grammar can influence the persistence of rules in other grammars, even though the mechanism of cultural evolution is excluded.


Chapter 2

Social patterns guide evolving grammars
This chapter discusses a model study of evolving grammars, based on the work of Hashimoto & Ikegami [1996]. The main results of their work are reproduced and some interesting new model behaviors are identified. The fitness of agents is determined by their communicative success in a group. The resulting dynamic fitness landscape can both facilitate and hinder the development of recursive grammars. We observe three qualitatively different behaviors, which we call the indexical, compositional and recursive regimes. Each regime is (to a limited extent) self-enforcing, and shows characteristic values of expressiveness, grammar size and mutational robustness.

2.1 Introduction
The transition from the short, finite communication systems found in many animal species to the open-ended language system of humans is considered to be one of the major transitions in evolution [Maynard Smith & Szathmáry, 1995]. There is broad agreement that the main qualitative difference is the syntax of human language: its compositional and recursive nature allows for the systematic production and interpretation of a tremendous number of different messages. Syntax therefore reconciles the need for large expressiveness with the limitations of learning and memory. This is, according to speculations about the origin of human language, what makes syntax selectively advantageous, and what caused the transition from an extensive non-syntactical "protolanguage" to a more efficient, syntactical language system [Pinker & Bloom, 1990; Nowak & Krakauer, 1999]. However, empirical evidence on e.g. animal communication, innateness and language universals remains controversial and inconclusive. An intriguing alternative approach has emerged: mathematical and computational modeling of language origins [Hurford, 1989; Steels, 1997b; Hashimoto & Ikegami, 1996; Nowak & Krakauer, 1999]. In this line of research an effort is made to understand the dynamics of language evolution by studying simple models ("minimal models") of communicating agents. These models help to generate new hypotheses, to evaluate how generic certain properties are, to tackle the supposed self-evidence of arguments and to find a minimal set of assumptions sufficient to explain a phenomenon. Their main contribution so far is that they have shown the plausibility of cultural evolution as a mechanism in the development of more complex languages [Steels, 1997b; De Jong, 1998; De Boer & Vogt, 1999; Batali, 1997; Kirby, 1999, 2000]. Fewer studies exist that model the genetic transmission of language capabilities. Following Hashimoto & Ikegami [1996], the model reported in this chapter studies the dynamics of genetic transmission of language. It takes an extreme position, as it ignores learning mechanisms and semantics, and models genetic adaptation of particular grammars. Language capabilities are described with "context free grammars", which make compositional and recursive structures very easy to obtain. However, unlike some other studies of genetic transmission [e.g. Batali, 1994; Nowak & Krakauer, 1999], no static fitness function is


defined; the grammars of all individuals in a group determine the environment in which an agent must survive. Under these simplified conditions, the interaction between evolutionary dynamics and group dynamics is studied. We will show that even without learning and cultural transmission, social patterns can influence the evolutionary dynamics. We observe that a social embedding can yield powerful, recursive grammars, but that it can also prevent a population from obtaining them. Interestingly, because of these group effects, rules in one agent's grammar can influence the persistence of rules in other grammars, even though the mechanism of cultural transmission is excluded.

2.2 Model description
The model consists of a small set of agents that play a language game. They communicate in a language of short sequences (maxl = 6) of 0's and 1's. Agents speak ("derive") and understand ("parse") these strings using a Chomskyan rewriting grammar, which they inherited — with some random mutations — from their parent (see table 2.1). In each language game, all agents speak once and try to understand each of the spoken strings. Agents receive scores depending on their success in speaking, understanding and (not) being understood. After a number of language games, scores are evaluated and offspring is produced. Successful agents have a higher chance of survival and reproduction (see figure 2.1).

mutation    probability         effect
adding      madd per grammar    add and modify a random rule from the population to the grammar
deleting    mdel per grammar    delete a random rule from the grammar
replacing   mrep per grammar    modify a random rule from the grammar

Modify by replacing the left hand side symbol with a random non-terminal, by replacing a random symbol from the right hand side with a random symbol, or by deleting a random symbol (if more than one) from the right hand side. Don't perform mutations that lead to rules with the same symbol on both left and right hand side.
Table 2.1: The evolution scheme. The adding mutation is, with the given probability, applied to agents with an above average score. The deleting and replacing mutations are applied to all agents with the given probabilities. Applying a mutation here means: replacing a random agent with a below average score with a copy of the replicating agent and mutating the copy as described. Default parameters: madd=0.04, mdel=0.04, mrep=0.04.

The grammars of the agents are context free grammars, with a small terminal alphabet (Vte = {0, 1}) and non-terminal alphabet (Vnt = {S, A, B}). In some simulations reported here, as an extra restriction, the non-terminal symbol of the left-hand side is not allowed on the right-hand side. In other simulations, the start symbol is not allowed on the right-hand side of rules. At the start of most simulations, grammars are randomly initialized with either S → 1 or S → 0. Derivation always starts with the start symbol, and iteratively applies random fitting rules for some maximum number of steps (maxd = 60; failure), until no fitting rule exists (failure), or until a string of only terminal symbols is reached (success). In parsing, rules are tried in the order they are stored, and fitting rules are applied recursively until the maximum number of steps (maxp = 500) is reached (failure), no other fitting rules apply to any intermediate string (failure), or the start symbol is reached (success). An example of a derivation tree and a sketch of the corresponding parsing tree is given in figure 2.2. The use of rewriting grammars in this model study is an extremely simple variant of the dominant way in linguistics to describe grammatical forms in natural language. A rule S → N V, for example, states that a correct sentence can be formed by rewriting the start symbol with the variables for noun and verb. N and V in their turn can be rewritten to strings that can actually be pronounced (terminals) by rules like N → the boy and V → cries, producing the correct English sentence "the boy cries".
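A minimal sketch of the derivation procedure just described, assuming grammars stored as lists of (left-hand side, right-hand side) string pairs; the parsing procedure, an exhaustive search back to S, is omitted for brevity.

    import random

    def derive(grammar, maxd=60, maxl=6):
        """Rewrite from 'S', repeatedly applying a random fitting rule, until
        a terminal string results (success), or no rule fits or maxd steps
        are exceeded (failure)."""
        s = "S"
        for _ in range(maxd):
            if all(c in "01" for c in s):    # only terminals left: success
                return s[:maxl]              # truncate at the maximum length
            fitting = [(l, r) for l, r in grammar if l in s]
            if not fitting:
                return None                  # failure: no fitting rule
            lhs, rhs = random.choice(fitting)
            i = s.index(lhs)
            s = s[:i] + rhs + s[i + 1:]      # rewrite one occurrence
        return None                          # failure: too many steps

    grammar = [("S", "0AB1"), ("A", "1"), ("B", "0B"), ("B", "1")]
    print(derive(grammar))                   # e.g. '0111' or '010011'

The example grammar is the one used in figure 2.2 below; because rule B → 0B is recursive, different runs yield strings of different lengths.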


Figure 2.1: The basic model architecture. In one language game (the inner circle) all agents (P) speak one string that all agents try to parse. Agents receive scores for speaking, understanding and being understood according to some scoring scheme. Each generation a number of games (R) is played. Depending on the scores, agents can reproduce, with some mutations, to the next generation. Default parameters: P=10, R=10.
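The architecture of figure 2.1, reduced to a skeleton loop. The speak and understands functions below are crude stand-ins for the derivation and parsing procedures sketched above, the scoring constants stand in for the scheme of table 2.2, and selection and mutation are similarly simplified.

    import random

    P, R, G = 10, 10, 50    # population size, games per generation, generations

    def random_agent():     # an agent is just a grammar: a list of rules
        return [("S", random.choice("01"))]

    def speak(agent):       # stand-in for the derivation procedure
        return "".join(r for l, r in agent if l == "S")[:6] or None

    def understands(agent, s):   # stand-in for the parsing procedure
        return any(l == "S" and r == s for l, r in agent)

    population = [random_agent() for _ in range(P)]
    for generation in range(G):
        scores = [0.0] * P
        for game in range(R):
            for i, speaker in enumerate(population):
                s = speak(speaker)
                if s is None:
                    continue
                scores[i] += 3.0               # speaking score
                for j, hearer in enumerate(population):
                    if j != i and understands(hearer, s):
                        scores[j] += 1.0       # recognizing score
                        scores[i] -= 2.0       # being-recognized score
        # simplified selection: the best agent replaces the worst one
        best = max(range(P), key=scores.__getitem__)
        worst = min(range(P), key=scores.__getitem__)
        if best != worst:
            population[worst] = list(population[best])  # mutations omitted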

                  Success                            Failure
speaking          rsp · string length / frequency    −rsp · M / 2
recognizing       rrec · string length / steps       −rrec · string length
being recognized  rbr · string length / P            −rbr · string length / P

Table 2.2: The scoring scheme of Hashimoto & Ikegami [1996]. The scores are updated after every derivation or parsing procedure. In the reported simulations that results in 1 speaking update and 10 recognizing and being recognized updates every language game. Frequency is the frequency of the spoken string in the last 10 generations only, including the present communication. M is the maximum string length at which strings are truncated after derivation. Steps is the number of steps the parsing algorithm needed to accept a string. P is the number of agents in the population. Default parameters: rsp=30.0, rrec=1.0, rbr=-2.0.


Example grammar: (1) S → 0AB1, (2) A → 1, (3) B → 0B, (4) B → 1. [Derivation tree (a random walk over fitting rules, e.g. S → 0AB1 → 01B1 → 010B1 → 01011) and parsing tree (a complete, depth-first search) omitted.]

Figure 2.2: An example of language processing using the rules from a grammar. Derivation consists of recursively applying a random fitting rule. Parsing consists of a complete search for a path back to the start symbol S. Rules that leave a string unchanged are ignored. Strings are truncated after derivation after the M'th symbol. Only a limited number of steps is allowed in the derivation (maxderive) and parsing (maxparse) algorithms. Note that in the example, rule 3 is a recursive rule (of the type not allowed in most simulations reported here, see figure 2.4). Default parameters: maxderive=60, maxparse=500, M=6.

The model is similar to the model introduced by Hashimoto & Ikegami [1996]. They use a very specific scoring scheme, where not being recognized is rewarded and scores depend on population size, the length of a string, and a string's novelty in the population (see table 2.2). This chapter shows new results obtained with similar parameters. Chapter 4 discusses results obtained with much simpler scoring schemes. Hashimoto & Ikegami discuss their results in terms of the Chomsky hierarchy of grammars and languages. In a domain of changing grammars and finite languages, we believe it is much more convenient to use a classification in terms of "routes". A route is a sequence of rewriting steps that connects the start symbol S to a string of terminal symbols. Routes can be categorized as indexical (directly from S to a terminal string), compositional (via non-terminal symbols from S to a terminal string) or recursive (leading from a non-terminal symbol via one or more rewriting steps to the same non-terminal symbol). The number of routes can be divided into three components, RI, RC and RR, one for each of these categories of routes. Similarly, expressiveness (the number of distinct strings a grammar can parse) can be divided into EI, EC and ER. Grammars can be characterized by these values, and classified according to the largest component. This classification is further discussed in chapter 3.

2.3 Previous work
Hashimoto and Ikegami's main results were that:

i. powerful (context free, recursive) grammars often evolve quickly;
ii. group effects in the population strongly interact with the evolutionary dynamics;
iii. the evolution shows punctuated equilibria in the total of exchanged information;
iv. a weak score for not being recognized accelerates the evolution towards more powerful grammars.

ad i Hashimoto & Ikegami define computational ability as the ratio of the number of strings a grammar is able to parse (the expressiveness) to all possible strings in the domain (i.e. 126). The simulation results they report show an increase in this computational ability from 1/126 at initialization to



around 0.5 after 600 time steps. They report two important stages in the evolution: module-type evolution (the emergence of compositionality) and the emergence of a loop structure (recursiveness). They describe this development as a climb in the Chomsky hierarchy from type 3 (regular grammars) to type 2 (context free grammars).¹

ad ii Frequently, however, they observe agents with high computational abilities that nevertheless cannot persist in the population. Thus, computational ability is not (as one would expect from the scoring scheme) a very good indicator of fitness, due to group effects. Hashimoto & Ikegami find that even though those agents have more powerful grammars, they are not able to efficiently process the dominant strings in the population. The group of agents that do process these strings is called an Ensemble with a Common set of Words (ECW). The set of strings Hashimoto & Ikegami call a "net-grammar", which they describe as an upper structure that emerges from the behavior of individuals, but in turn also restricts the behavior of individuals.

ad iii Hashimoto & Ikegami hypothesize that the ensembles hinder the development of more powerful grammars, but only for a limited amount of time. They define a measure for the amount of information exchanged in the population and show that this value develops stepwise over time. They describe these dynamics as an alternation between periods of stasis, when competing ensembles exist, and periods of "algorithmic evolution", when one ensemble has taken over the whole population and an increase in grammatical complexity can occur.

ad iv Hashimoto & Ikegami pay special attention to the parameter that weights the being-recognized score (rbr). They find that both the variety in spoken strings and the information exchange value reach high values when this parameter is small (between -3.0 and 2.0). They conclude that a slight negative value accelerates the evolution of powerful grammars.

These results show that if all the necessary apparatus for dealing with grammar is provided, evolution of rewriting grammars can indeed lead to more powerful structures. Furthermore, the results show that the coevolution in a group is an important aspect of language evolution, because patterns on the level of the group alter the fitness landscape significantly. Finally, the finding that selecting on not being recognized speeds up evolution leads one to carefully reconsider the silent assumptions about what language is for. However, it is important to realize that with the chosen representation of rewriting grammars, context free grammars (as defined in terms of the Chomsky hierarchy) are rather easy to obtain, and even efficient recursive grammars are always only a few mutations away. With all circumstances set up for recursiveness, the fact that it develops does not seem a very strong result. In section 2.4 we will discuss the more surprising finding that in very similar situations, recursiveness nevertheless does not develop. In our interpretation, the model serves as a starting point for understanding the dynamics of grammatical evolution, rather than as a validation of the genetic transmission scenario. Section 2.4 discusses the partial reproduction of these results, and some interesting new observations for further research. Of the many parameters of the model, only a limited number is varied. Many of the choices in this model study are rather arbitrary.
Such “over-determinism” is inescapable in any model study; because of lack of time and experience it is more apparent in this thesis than in published work.

2.4 Results
We studied the behavior of the model with the parameter settings of Hashimoto & Ikegami [1996], and a number of variations. As in their results, we find that evolution can quickly lead to grammars that can parse a large fraction of the 126 possible strings. However, under slightly different parameter settings we also find quite different results. We observe three types of behavior:
¹ This is not quite correct, though, as regular grammars can be recursive and context free grammars are not necessarily so (see chapter 3).

i. The first type of behavior is a quick growth of expressiveness, from 1 at initialization to over 100 after about 1000 generations. In the first stage the expressiveness depends only on indexical routes. Soon, however, compositional routes and recursive routes become more important. Eventually, recursive routes dominate the grammar's expressiveness. A striking feature of this type of run is that although the majority of agents have a very high expressiveness, mutants with a very low expressiveness frequently show up. When expressiveness is tested in a domain with larger maximum string length, grammars tend to generalize very well.

ii. In another type of run, it takes much longer to reach the high level of expressiveness, ranging from 2000 to many thousands of generations. In these runs, compositional routes quickly become important, but recursive routes are relatively infrequent. Grammars tend to be much more mutationally robust, and generalize worse to larger domains.

iii. A third type of run shows very little growth in expressiveness. After 3000 generations, only around 20 words can be parsed. In these runs, expressiveness depends almost exclusively on indexical routes. At any particular generation, there is hardly any variation in expressiveness (grammars are extremely robust). Grammars don't generalize at all to larger domains.

Example grammars shown in the figure: S → BBB, B → BB | 1 | 0; S → B10 | 1B0, B → 0 | B1 | B10; S → 110 | B1 | 111 | B10 | 1B101 | ..., B → 01 | 00 | 1100 | B1 | 111 | 1101 | ...; S → 011 | 1100 | 11110 | 111 | 0A00 | ..., A → 01.

(a) Expressiveness over 3000 generations for three example runs. (b) The same three runs in a "phase space" of expressiveness vs. the number of rules used in communication.

Figure 2.3: Three runs, typical for the indexical, compositional and recursive regimes, and some example grammars. Parameters are the same in all three runs, except for the random seed. Mutations: as in table 2.1, except that scores are evaluated with respect to the median score, rather than the average. Adding mutations add a random rule from the population unmodified to the end of the grammar. Scores: as in table 2.2. Rules: as in figure 2.4(b). The dashed curves in (b) are theoretical predictions of the relation between grammar size and expressiveness (see chapter 3).

With some particular parameter settings, each of the three types of behavior can occur, solely depending on the "seed" of the random generator (see figure 2.3). At different generations we restarted indexical runs with the original grammars but a different random seed. In early generations, a change of type of behavior occurs frequently. However, in restarts from later generations, the type of behavior seems fixed and a change of type becomes increasingly improbable. The types of behavior thus form self-enforcing, dynamical regimes. Indeed, simulations with a variety of initial indexical grammars confirm that there is a rather abrupt change in the probability of developing powerful grammars. The string length seems especially important for this bifurcation, as all initial grammars that contain a rule with a right-hand side of length 6 (the maximum) lead to an indexical regime. This suggests that it is especially the property of generating longer strings that gives compositional and recursive rules a selective advantage.


In all simulations we observe an increase in expressiveness. This is not unexpected, because in a random population, agents with more expressive grammars speak more novel strings, understand more strings and are less likely to be understood, and thus should receive higher scores (see table 2.2). However, large expressiveness in the model occurs only when compositional and recursive grammars are present. We observe that the development of such grammars is not at all trivial. These results crucially depend on the fact that the fitness of an agent is evaluated with respect to its performance in the group, rather than with respect to some static fitness function. We will discuss the ways in which "group patterns" can both facilitate and hinder the development of efficient recursive grammars.

2.4.1 evolution yields powerful grammars
The finding of Hashimoto & Ikegami [1996] that evolution in this model can yield highly expressive, recursive grammars was easily replicated. To test to what extent this result depends on the dynamic nature of the fitness landscape, we changed the model to yield a fixed fitness landscape. This alternative model consists of two subpopulations. One of these is initialized with a recursive grammar and does not change. The other subpopulation is initialized with the default single-rule grammars, and evolves under the same regime as before. Communication takes place only between members of different subpopulations. Agents in the second subpopulation thus evolve in a constant environment, except for some stochastic effects in which strings the non-evolving agents choose to speak. If a large number of language games is played per generation, the model approaches a fixed fitness evaluation. We initialized the fixed subpopulation with grammars that can parse a large subset of all 126 possible strings. Under these conditions, evolution is not successful. Sometimes random drift leads agents to have a rule that allows them to communicate a particular string with the fixed population. However, because of the high expressiveness of the fixed population, this string is used only seldom and gives the evolving agents little selective advantage. Random fluctuations cause this one rule to disappear again. If the fixed subpopulation is initialized with grammars that can parse only a few strings, performance seems a little better, but after a long time still few strings can be parsed. If the "target" grammar has very low expressiveness, the evolving subpopulation shows only random drift through the huge genotype space. Apparently, the fixed fitness landscape is too flat for evolution to optimize expressiveness. The dynamic fitness landscape, on the other hand, does not run into this problem. With exactly the same parameters as used in the fixed-fitness simulations, the dynamic fitness landscape always leads to highly expressive grammars within 1000 generations. Several ideas exist about why such "coevolutionary learning" could be successful:

i. Agents evolve under a regime where they at all times experience only a subset of all problems possible in the domain. This "sparse evaluation" shows different dynamics, such as leading to better and better generalizable solutions [Pagie & Hogeweg, 1997].

ii. At the beginning of the simulations the environment poses only simple problems. Only when agents start mastering these simple tasks do more challenging tasks naturally emerge ["beginning small", Elman, 1991].

iii. Evolution is continuously triggered to find better solutions, because an agent's competitors evolve to exploit its weak spots. The evolution can thus be described as an "arms race" [Hillis, 1991].

Mechanism (ii) implies mechanism (i), and mechanism (iii) implies mechanism (ii). To test which of these ideas best explains the success of evolution in a dynamic fitness landscape, we compared simulations with a fixed landscape and only 1 language game per generation (sparse evaluation) or 50 language games per generation (close to complete evaluation). If "sparseness" contributes to successful evolution, we expect the first simulations to yield higher expressiveness. Initial results show no difference in behavior between sparse and more complete evaluation. However, more work is needed to decide what causes the success of co-evolution. We leave this as future work.


2.4.2 indexical and compositional dynamical regimes
Simulations with exactly Hashimoto & Ikegami's rule scheme (see figure 2.4a) and the default initial grammar always show the recursive-regime behavior. In simulations with a rule scheme different from Hashimoto & Ikegami's, we also observe quite different behaviors. In this scheme, rules that rewrite non-terminal X to a string that contains X are allowed; however, the start symbol S is not allowed on the right-hand side (see figure 2.4b).² Surprisingly, we find that in a significant fraction of the simulations with this scheme no large increase in expressiveness takes place (see figure 2.3). In these grammars, all (or almost all) rules used in parsing are of the type S → terminal. We call this dynamical regime the "indexical regime". Alternatively, in some simulations large expressiveness does eventually develop, but it takes relatively long. Since expressiveness in these runs depends for the largest part on compositional routes, we call this the "compositional regime".

(a) the Hashimoto-Ikegami rule scheme; (b) our alternative rule scheme; (c) the no-restrictions rule scheme. [Node-and-edge diagrams over S, A, B and "terminal" omitted.]

Figure 2.4: Three possible rule schemes for context free grammars with Vnt = {S, A, B}. An edge between nodes represents the possibility to have a rule in the grammar that rewrites from the non-terminal corresponding to the first node to a string that contains the non-terminal of the second node. Note that one rule can thus be represented by multiple edges.

This phenomenon was not reported by Hashimoto & Ikegami. However, the indexical regime also exists with their rule scheme, if agents are initialized with a more extended indexical grammar. We believe these simulations are at least as interesting. The chosen representation (rewriting grammars) makes the use of compositionality and recursiveness very easy; the fact that such solutions are nevertheless sometimes not used is intriguing. We believe that the existence of the dynamical regimes, in a non-random population, can be explained by three mechanisms: a mutation bias, a context effect and a group effect. The mutation bias is that mutations of rules of a certain type are most likely to be of the same type. The context effect is that, when a mutation introduces a new rule, this rule is likely to be similar to existing rules, and it will in fact tend to be more successful if it is of a similar type. The group effect is that agents, because of family relationships, have similar grammars, and in fact tend to be most successful if they have similar grammars. This is because the features of a grammar map (in a very non-linear way) onto features of its language. The derive-languages of individuals jointly constitute a group language, which in turn determines the success of agents in parsing. This indirect feedback can best be described as a social pattern that emerges from individual behaviors, and in turn restricts individual success (see figure 2.5). Initial similarities (in terms of our classification) are enforced by these social patterns.
² Other settings that deviate from default are: the median evolution scheme is used (only below-median scoring agents can be replaced, above-median scoring agents can have the adding mutation), newly added rules are taken randomly from the population and placed at the end of the grammar.

[Diagram: individual grammar → individual language → group language → local fitness landscape → back to individual grammar.]

Figure 2.5: The "group effect": present individual grammars determine, via an indirect feedback loop, the success of individual agents, and thus what the next generation looks like.

Apparently, the larger an indexical grammar is, the less likely it is that evolution can lead to compositional and recursive grammars. This in some sense contradicts the traditional picture of the evolution of syntax, which states that syntax emerged only when indexical grammars became too large. A simple analysis leads to some qualitative predictions on how, given the existence of these regimes, different variables in the model should relate. One can show that the number of routes grows linearly with grammar size in the indexical regime. In a compositional regime it grows faster, and in a recursive regime extremely fast.³ A rough estimate of how expressiveness⁴ depends on R gives a qualitative explanation for the trajectories in the phase space in figure 2.3b. If a linear growth of grammar size over time is assumed, the shape of the curves in figure 2.3a can also be explained.
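A minimal sketch of that estimate (footnotes 3 and 4 give the formulas): expressiveness is approximated by the expected number of distinct strings hit when R routes are drawn, roughly uniformly and independently, from the Emax = 126 possible strings. The growth rates of R below are illustrative assumptions.

    E_MAX = 126  # distinct strings of length 1..6 over {0,1}: 2+4+8+16+32+64

    def expected_expressiveness(routes):
        """Expected number of distinct strings covered by `routes` draws,
        assuming routes hit strings uniformly and independently."""
        return E_MAX * (1 - (1 - 1 / E_MAX) ** routes)

    for n in (5, 10, 20):                    # grammar sizes
        r_indexical = n                      # routes grow linearly with size
        r_compositional = (n - 1) ** 2 / 2   # assumed faster, quadratic growth
        print(n,
              round(expected_expressiveness(r_indexical), 1),
              round(expected_expressiveness(r_compositional), 1))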

2.4.3 group effects
As Hashimoto & Ikegami describe, we frequently observe agents in the simulation with significantly larger E, which are nevertheless not able to persist in the population. Figure 2.6(a) shows an example of such a situation; figure 2.6(b) shows an example language game with the grammars at the point indicated with an arrow, and the score each agent gathers per string. Although agents 586 and 602 have the highest expressiveness (30), they gather a below-average score in the language game. Two mechanisms can explain this phenomenon: i) stochasticity and ii) group effects. Both effects depend on the population size. Stochasticity decreases with population size, because language games consist of more communications, which averages out stochastic variation. Group effects increase with population size, as innovations have a lower relative frequency and innovative agents will have more trouble gathering enough scores. A simple test can distinguish between these two effects: if a large number of language games is played with the same population, the stochastic effect disappears. In fact, in the given example, agents 602 and 586 have the highest cumulative scores after 500 language games. Therefore, it is stochasticity that explains their low performance in figure 2.6. The group effect, the context effect and stochasticity together explain the phenomena of the untimely death of powerful agents and the different dynamical regimes. The group effect is adequately described by Hashimoto & Ikegami's concept of "net-grammar": the group language that emerges from individual behaviors and itself restricts the evolutionary dynamics of individuals. We find their concept of "ensemble" less useful. It suggests a clearer separation between members and non-members than seems to exist, and is unnecessary for the explanation of these phenomena.
³ Take for example the simple case of grammars with Vnt = {S, A}, and at most one non-terminal and at least one terminal symbol on all right-hand sides of rules. Estimates of R in each of the regimes are: RI ∼ N, RC ∼ (N−1)²/2, RR ∼ (N−1)^(maxc+2)/3, where maxc is the maximum number of cycles.
⁴ E/Emax ≈ 1 − (1 − 1/Emax)^R, here Emax = 126.


[Plots omitted. (a) Expressiveness of individual agents and a 10-point running average over 1000 generations. (b) Cumulative scores per derived string for agents 603 (20), 600 (29), 585 (28), 593 (25), 602 (30), 556 (25), 572 (23), 586 (30), 604 (25) and 599 (22), with each agent's expressiveness in parentheses.]

Figure 2.6: (a) An example run, in which agents with above-average expressiveness are nevertheless not able to persist in the population, due to stochastic or group effects. (b) An example language game, played by the population at the time indicated by the arrow in (a). Four agents gather above-average scores, including agent 599, which has the lowest expressiveness in the population. Both agents with the highest expressiveness (30) have below-average scores. In this example, these are stochastic effects; if many more language games are played, these latter agents do receive the highest cumulative scores.

2.4.4 punctuated equilibria
Hashimoto and Ikegami describe the dynamics of the model as an alternation between periods of stasis and periods of "algorithmic evolution". They explain the stasis by competition between ECWs (ensembles with a common set of words). They provide neither a formal definition of an ECW nor quantitative results on how ECWs develop over time. The alternation is illustrated with a table of communications and a graph showing "punctuated equilibria" in the amounts of exchanged information. We have not been able to replicate a graph as convincing as Hashimoto and Ikegami's. Nor have we found evidence that ECWs are separate, definable patterns in our simulations. It is, however, clear that the power of grammars develops stepwise over time and that group effects alter the dynamics significantly. Punctuated equilibria in this context refers to periods of relative stasis in the evolution of phenotypic traits within a species (not to the long periods of little innovation between species, as Gould originally used the term). In theoretical studies of evolution, several alternative explanations for this phenomenon have appeared. The most important of these are explanations that involve a local fitness peak or, alternatively, neutral networks [Van Nimwegen et al., 1999]. Both types of explanations, however, involve fixed fitness landscapes, i.e. the fitness of agents is a unique function of their genotype, regardless of environmental circumstances or population structure. Clearly, this assumption does not hold in the model studied in this chapter. In short, we are left with three as yet unsatisfactory explanations for the epochal nature of the evolutionary dynamics, which need further study:

i. In the periods of relative stasis the population is stuck on some sort of local fitness maximum. Only when a rare sequence of mutations occurs is a new fitness peak discovered, innovation occurs and a new "epoch" starts.

ii. Alternatively, the population floats randomly through a "near-neutral network", a network of connected genotypes that have the same fitness value. Only when by chance a "portal" to a new fitness level is discovered does innovation occur and a new "epoch" start.



iii. In the periods of stasis, innovation is suppressed by specific group structures (a conflict between ECWs); only when the population dynamics have changed this group structure (one ECW has taken over the whole population) can the "algorithmic evolution" continue. In some sense, this explanation relates to explanation i, because it assumes that the specific group structures make the present genotypic situation temporarily a local maximum.

The consequences of these explanations still have to be evaluated and tested to reach a satisfactory explanation. For this, the theoretical results on fixed fitness landscapes have to be made useful for the fluctuating landscape that arises due to the coevolutionary setting. An important measure will be the rate of genotypic change: how many mutations are accumulated in the grammars during an epoch. Explanations i and iii predict large change between epochs, and little change during epochs (random fluctuations around the local peak).⁵ Explanation ii predicts large change during epochs, because of (more or less) random drift through a near-neutral network. Preliminary results suggest that neutral mutations — mutations that have no effect on the immediate fitness of individuals — are in fact frequent in our simulations. For example, rules that contain, on both the left-hand side and the right-hand side, a variable that is not present in other rules do not play any role in either the derivation or the parsing procedure. Random drift of such rules can, however, suddenly lead to a large change in processing power, e.g. when a rule with S as left-hand side mutates to have that variable on its right-hand side. Figure 2.7 shows an example of a run with a variant of the present model (see chapter 4). It shows very clear "epochs". In this particular case a "group effect" is needed to explain the long periods of stasis. The sudden jumps to a new level of expressiveness can best be understood with the concept of neutrality. Competition between ensembles definitely plays no role in the explanation of these jumps.
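A minimal sketch of one such neutrality check, reusing the (lhs, rhs) rule representation from section 2.2: a rule is trivially neutral when its left-hand non-terminal can never be reached from the start symbol S, so it plays no role until drift reconnects it.

    def reachable_nonterminals(grammar, nonterminals="SAB"):
        """Non-terminals reachable from S by following right-hand sides."""
        reached = {"S"}
        changed = True
        while changed:
            changed = False
            for lhs, rhs in grammar:
                if lhs in reached:
                    for c in rhs:
                        if c in nonterminals and c not in reached:
                            reached.add(c)
                            changed = True
        return reached

    grammar = [("S", "0A1"), ("A", "1"), ("B", "00A")]  # hypothetical grammar
    reached = reachable_nonterminals(grammar)
    neutral = [rule for rule in grammar if rule[0] not in reached]
    print(neutral)  # [('B', '00A')]: unused until a mutation links it up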

[Plots omitted: expressiveness of individual agents and a 20-point running average. (a) Expressiveness over 9000 generations in an example run. (b) The same run in a "phase space" of expressiveness vs. grammar size.]

Figure 2.7: Epochal evolution. In this example run very clear epochs can be seen. (a) During such an epoch, expressiveness stays at a fixed level. In fact, in the first stage (E=31) the dominant language stays exactly the same for thousands of generations. Individual agents with higher expressiveness occur, but are not able to survive in the group. (b) Grammars do vary, however, which is possible because of the neutrality in the grammar-language mapping (see text). In the phase space, one can clearly see that grammar size fluctuates during an epoch. All jumps to higher levels take place when grammars are relatively large. Such grammars are clearly larger than necessary and have neutral tails. Parameters: default "communication" run with innovation pressure; see chapter 4.

⁵ Of course, population dynamics may cause changes in the average genotype.


2.4.5 not being recognized
Hashimoto & Ikegami report that the speed of evolution is highest with the being-recognized parameter (rbr) between -3.0 and 2.0, with an optimum at a slightly negative value. We have not replicated this result, obtained with the complicated scoring scheme of table 2.2. Instead, we studied the impact of each of the three scoring dimensions by designing several very simple scoring schemes. The most striking result from this study is the finding that a score for recognizing alone leads to powerful grammars, while with a score for both recognizing and being recognized, language capacity remains extremely limited. As Hashimoto & Ikegami report, we find that the development of complex grammars is hindered by a positive score for being recognized. However, in our simulations we also observe that, if no score is given for being recognized, agents develop grammars that make it more likely to fail in derivation and thus to hold one's tongue. With the benefit of language being only at the side of the receiver, a classic altruism problem arises, because agents no longer benefit from speaking. Chapter 4 discusses this "paradox of selfish language" and some possible solutions.

2.5 Conclusions
The main finding in the model study of Hashimoto & Ikegami [1996] is that group dynamics in the evolution of language can show patterns that restrict the evolutionary dynamics in an unexpected manner. This can, as Hashimoto and Ikegami report, explain why agents with more expressive language capabilities frequently cannot persist in the population. In addition, we find that there exist distinct dynamical regimes that lead to qualitatively different grammars, and that group effects can explain their existence. Hashimoto and Ikegami describe these group patterns as "ensembles": groups of agents that share a common vocabulary. We argue that "ensemble" is a problematic concept here, and that it is unnecessary for explaining group effects. Instead, we consider their concept of "net-grammar": the vocabulary of the group, which restricts the success of expressions. Unlike the "productive grammars" that generate individual languages, this group vocabulary constitutes in some sense a "restrictive grammar". A second finding of Hashimoto and Ikegami is that there indeed exists — as is often assumed, but seldom shown — an evolutionary route from simple, indexical grammars to complicated, compositional and recursive grammars. However, we find that in the model this route disappears if the initial indexical grammar is too powerful. Interestingly, the model ignores learning and meaning, but nevertheless shows complicated and rich dynamics. These findings are in several ways suggestive for the evolution of language. They show that higher expressiveness cannot be equated with selective advantage. Theories on the evolution of language must thus explain how subsequent steps can be selectively advantageous in a group that shows social patterns. Furthermore, the model suggests that language evolution shows rich, inherent dynamics, even in this simplified context. Where other work has focused on the potential of social effects to generate complex languages, this study shows that in a system where language is transmitted genetically, social effects emerge and can both facilitate and hinder the development of such languages. We conclude that the work of Hashimoto and Ikegami forms a good starting point for studying the dynamics of language evolution using a computational modeling approach. We point out several issues for improvement and further study. First, as a technical point, we find the implementation choices often rather arbitrary and less elegant than one would prefer. E.g. the adding mutation follows a discontinuous step function that arbitrarily differs from the deleting and replacing mutations. The scoring and evolution scheme can be replaced by more elegant schemes, which contain smoother functions, mutation probabilities per rule rather than per grammar, and more meaningful scores. A model with these modifications is studied in chapter 4. Second, by focusing on genetic transmission and group dynamics, the model ignores many aspects that should be expected to play major roles in the development of complicated languages, like learning and meaning. These are obviously important issues for further study.

Chapter 3

Some properties of finite grammars and languages
This chapter introduces an alternative classification to describe finite grammars and languages. The classification is in terms of routes: the sequences of rewriting steps that get from S to a terminal string. We show that this classification allows for a meaningful characterization of grammars. Also, the classification leads to testable predictions on the dynamical behavior of such grammars and on the nature of their languages. The new concepts are evaluated with data from computational simulations of evolving grammars.

3.1 Introduction
Recent papers on the computational modeling of language evolution study the emergence of syntax [Hashimoto & Ikegami, 1996; Batali, 1994, 1997; Steels, 1997a, 1998; Kirby, 1999, 2000; Nowak & Krakauer, 1999; Hurford, 2000b,a]. Two fundamental aspects of syntax receive particular attention: compositionality, the feature of sentences to be built up of meaningful components, and recursiveness, the feature of sentences to show nested structures. These aspects of syntax are generally accepted to account for the huge, or infinite, expressiveness of human language.

The usage of these concepts, however, differs from study to study. Little use has been made of formal concepts and tools from logic and linguistics, with Hashimoto & Ikegami [1996] as an exception. The reason is that well-known formal results, such as the Chomsky hierarchy, do not seem to be particularly useful. Such results deal with exact properties of static grammars in infinite domains. In contrast, the languages in the computational models are usually dynamic and finite, and we are more interested in their statistical properties.

Hashimoto & Ikegami [1996] acknowledge these problems, but nevertheless choose to describe the evolution of grammars in their model study as an escalation in the Chomsky hierarchy. For several reasons we think this escalation is not very informative:

i. The representation Hashimoto & Ikegami [1996] use allows only for context-free grammars. The classification is therefore reduced to a binary opposition: either a grammar is regular (type 3) or it is not (type 2).

ii. The mutations in their model of language evolution make it extremely easy to force a grammar out of the class of regular grammars. A mutation that adds a second non-terminal, or a terminal symbol on the left-hand side of the existing non-terminal, already alters the type of the grammar (however, not necessarily the type of the language).

iii. The languages, of which the type is more informative than that of grammars, cannot automatically be classified, because it is often hard to find the minimal (highest type) grammar that generates a language. Moreover, in the simulations strings are truncated at a finite length; the resulting finite languages are all regular (type 3).

iv. The concepts of recursion and infiniteness, which are so important in the origin-of-language debates, cannot adequately be characterized in terms of this hierarchy, since all classes in the hierarchy contain recursive, infinite languages.

Other categorizations of grammars and languages exist. One can, for instance, algorithmically transform any rewriting grammar to its unique Chomsky Normal Form or other normal forms. Characterizations of grammars can be given in terms of some of the properties of this normal form (number of rules, non-terminal symbols, etc.). For our purposes this does not seem a very promising approach, however, and it is therefore not pursued further.

This chapter focuses on some observations and calculations that are useful for interpreting the simulation data in chapter 2. Relations with work and terminology in formal language theory and computational linguistics remain to be sorted out.

3.2 Classification of routes
The language $0^n$ grows in size proportionally to the maximum string length: its size linearly approaches infinity as the maximum string length goes to infinity. In contrast, the size of the language $(0,1)^n$ grows exponentially with the maximum string length. In finite domains, this constitutes a significant difference between languages. However, both languages are regular. Moreover, in the study of language learning and language evolution, the grammars and languages often undergo constant change. One would like to be able to classify grammars and languages in such a way that this change can be described as the extent to which a grammar or language belongs to a certain class. Ideally, a classification also contains information on the typical ways in which grammars and languages change as a result of learning or evolution. Formalizing the concepts of indexical, compositional and recursive language, and expressiveness, as they are already being used in verbal theories on language evolution, seems a promising approach to meet these demands.1

First, we need to define an extra concept, "route". For now, we restrict ourselves to a simpler subset, $G_s$, of all possible grammars, in which all right-hand sides of rules contain at most one non-terminal symbol.

Definition 1 A ROUTE $r$ of a grammar $G_s = \langle P_s, S, V_{nt}, V_{te} \rangle$ is a distinct, ordered sequence of rewriting rules from $P_s$ that lead from the start symbol $S$ to a terminal string $w$ over $V_{te}$.

We can define a classification of such routes. Later, we will describe grammars by what types of routes they allow.

Definition 2 A route $r$ of a grammar $G_s = \langle P_s, S, V_{nt}, V_{te} \rangle$ is classified according to the following restrictions:

i. A route $r$ is INDEXICAL (type I) if it contains only one rewriting rule $S \mapsto w$, of which $S$ is the start symbol and $w$ is a terminal string.

ii. A route $r$ is COMPOSITIONAL (type C) if it contains more than one rule, and if all non-terminal symbols appear at most once at a right-hand side (and consequently at most once at a left-hand side of the rules).

iii. A route $r$ is RECURSIVE (type R) if it contains more than one rule and if at least one non-terminal appears more than once at a right-hand side (and consequently also more than once at a left-hand side).
1 Terminology is loosely based on Deacon [1997]; Pinker [1994]; Kirby [1999, 2000]. A possible source of confusion is the fact that these terms are usually used to describe a specific type of relation between form and meaning. Here, we discuss only structural characteristics; however, there is a tight correspondence between the usage of the terms here and their usage in a semantic context.
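To make the classification concrete, the following minimal sketch classifies a single route. The rule representation (a list of (lhs, rhs) pairs) and the fixed non-terminal alphabet are assumptions of the sketch, not part of the model:

    # Classify one route (an ordered sequence of rules from S to a
    # terminal string) as indexical ('I'), compositional ('C') or
    # recursive ('R'), following definition 2.
    NONTERMINALS = {'S', 'A', 'B'}   # assumed alphabet for this sketch

    def classify_route(route):
        if len(route) == 1:
            return 'I'                   # a single rule S -> w
        lhs_counts = {}
        for lhs, rhs in route:           # count how often each
            lhs_counts[lhs] = lhs_counts.get(lhs, 0) + 1  # non-terminal is rewritten
        if any(n > 1 for n in lhs_counts.values()):
            return 'R'                   # some non-terminal recurs
        return 'C'                       # several rules, none recurring

    print(classify_route([('S', '001A'), ('A', '11')]))              # C
    print(classify_route([('S', '1B0'), ('B', '0B'), ('B', '0')]))   # R

Counting left-hand sides is equivalent to the right-hand-side condition of definition 2, since along a route every right-hand-side non-terminal is subsequently rewritten.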

3.3 Dynamical properties: expressiveness
3.3.1 rules & routes
The relation between $N$, the number of rules (productions), and the number of routes depends on the types of the routes. Let us restrict ourselves to an even simpler subset of grammars, $G_{es}$, where $V_{nt}$ contains, aside from $S$, only one symbol, and all right-hand sides in $P_{es}$ contain no $S$, at most one non-terminal and at least one terminal symbol (see figure 3.1).

[Figure 3.1: Graph representation of the possible types of rules in $G_{es}$: type $a$ ($S$ to a terminal string), type $b$ ($S$ to a string containing $A$), type $c$ ($A$ to a string containing $A$) and type $d$ ($A$ to a terminal string). Terminal symbols are not represented in the graph; every right-hand side contains at least one terminal symbol.]

Let $R_i$ be the number of possible routes of type $i$ in a grammar $G_{es}$, where $i \in \{I, C, R\}$. We can formulate the following upper bounds on the number of routes of type $i$, and estimate the values by assuming a random distribution of rules over all possible edges:

$$R_I \le N_a \approx \frac{1}{4}N \tag{3.1}$$

$$R_C \le N_b N_d \approx \left(\frac{1}{4}N\right)^2 \tag{3.2}$$

$$R_R \le N_b \sum_{n=1}^{max_c} N_c^{\,n}\, N_d \approx \sum_{n=1}^{max_c} \left(\frac{1}{4}N\right)^{n+2}, \tag{3.3}$$

where $max_c$ is the maximum number of cycles through the loop structures; here $max_c \le max_l$, the maximum string length, given that at least one terminal symbol is added to the string in every cycle. So we derive that for grammars in $G_{es}$, the number of indexical routes shows a linear, the number of compositional routes a quadratic, and the number of recursive routes a higher-order dependence on the number of rules. This means that, with a fixed distribution of types of rules, even if a high proportion of the rules contributes only to indexical routes, recursive routes will always dominate as the number of rules goes to infinity.
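These counts are easy to verify by brute force. The sketch below draws a random $G_{es}$ grammar (only the rule types matter for counting) and computes the route counts directly; the representation and helper names are assumptions of the sketch:

    import random

    def random_ges_types(n_rules, fractions=(0.25, 0.25, 0.25, 0.25)):
        # rule types: a (S->w), b (S->wAw'), c (A->wAw'), d (A->w)
        return random.choices('abcd', weights=fractions, k=n_rules)

    def count_routes(types, max_c):
        n = {t: types.count(t) for t in 'abcd'}
        r_i = n['a']                              # one a-rule per indexical route
        r_c = n['b'] * n['d']                     # a b-rule followed by a d-rule
        r_r = sum(n['b'] * n['c'] ** k * n['d']   # k cycles through the c-rules
                  for k in range(1, max_c + 1))
        return r_i, r_c, r_r

    print(count_routes(random_ges_types(20), max_c=4))

With the default fractions, doubling the number of rules roughly doubles $R_I$, quadruples $R_C$ and multiplies $R_R$ by a much larger factor, as equations 3.1-3.3 predict.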

3.3.2 expressiveness & rules
We can use the relation between rules and routes to estimate "expressiveness", the size of the language a grammar generates.

Definition 3 EXPRESSIVENESS ($E$) is the number of distinct strings a language consists of, or the number of distinct strings a grammar can accept (or generate).

If we assume that every string $w$ has the same chance of being reached by a random route, the probability that a given string is not reached with $R$ routes is given by:

$$P(w \notin L) = \left(1 - \frac{1}{E_{max}}\right)^{R} \tag{3.4}$$

The expected number of different strings is simply the maximum number of strings, $E_{max}$, times the probability of a string being reached:

$$E = E_{max}\left(1 - \left(1 - \frac{1}{E_{max}}\right)^{R}\right) \tag{3.5}$$

$E_{max}$ depends as follows on the maximum string length, $max_l$, and the number of terminal symbols, $te$:

$$E_{max} = \sum_{l=1}^{max_l} (te)^{l} \tag{3.6}$$

For finding the dependence of the expressiveness on $N$ we can use equation 3.5. $E$ shows an approximately linear dependence on $R$ for low values, and it approaches $E_{max}$ for high values of $R$. A very good approximation of equation 3.5 is:

$$E \approx E_{max}\left(1 - e^{-R/E_{max}}\right) \tag{3.7}$$

One can mathematically show that a linear approximation of this equation is reasonable up to $R \approx \frac{1}{2}E_{max}$. For languages with a reasonably long $max_l$, $E_{max}$ will be enormous and a linear approximation, $E \approx R$, seems reasonable. This implies that the expressiveness $E$ depends in the same order on $N$ as does $R$.

It seems reasonable to assume that the qualitative observations above also hold in a more general set of grammars: we expect $R_I$ to grow linearly with the number of rules, while $R_C$ grows much faster and $R_R$ grows extremely fast. Mathematically, this remains to be sorted out. For this, the representation of routes should be altered to trees instead of linear sequences, the classification altered accordingly, and the estimates of the relation between $R$ and $N$ should deal with more than one non-terminal.
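Equations 3.5-3.7 are simple enough to check numerically. A small standalone sketch (not part of the simulation code) for the domain used throughout this thesis ($te = 2$, $max_l = 6$):

    import math

    te, max_l = 2, 6
    e_max = sum(te ** l for l in range(1, max_l + 1))    # eq. 3.6: 126

    def expected_e(r):
        return e_max * (1 - (1 - 1 / e_max) ** r)        # eq. 3.5

    def approx_e(r):
        return e_max * (1 - math.exp(-r / e_max))        # eq. 3.7

    for r in (10, 63, 126, 500):
        print(r, round(expected_e(r), 1), round(approx_e(r), 1))

For $R = 10$ both expressions stay close to $R$ itself (the linear regime), while for $R = 500$ they approach $E_{max} = 126$.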

3.3.3 characterization of changing grammars
We can now characterize grammars using the developed concepts. Given a particular grammar $G$, one can calculate the exact number of distinct possible routes of each type, or the actual number of routes produced by some particular derivation or parsing algorithm: $R_I$, $R_C$, $R_R$. Another relevant measure is the probability that, if at every rewriting step a random fitting rule is chosen, a route of type $i$ results. Let $G$ be a grammar from our simple set $G_{es}$; then $P_i$ denotes the probability that a random walk from the start symbol $S$ to a terminal string produces a route $r_i$ of type $i$:

$$P_I = \frac{N_a}{N_a + N_b} \tag{3.8}$$

$$P_C = \frac{N_b}{N_a + N_b} \cdot \frac{N_d}{N_c + N_d} \tag{3.9}$$

$$P_R = \frac{N_b}{N_a + N_b} \cdot \frac{N_c}{N_c + N_d} \tag{3.10}$$

The estimated path lengths $T_i$ are 1 if $i = I$, and 2 if $i = C$. In the case of recursive routes this is slightly more complicated:

$$T_R = 2 + \sum_{n=1}^{max_c} n\,(P_c)^{n}(1 - P_c) \approx 2 + \frac{P_c}{1 - P_c} \tag{3.11}$$

(where $P_c = \frac{N_c}{N_c + N_d}$; the approximation is exact if there is no upper bound on the number of cycles, i.e. $max_c \to \infty$).

Expressiveness can also be separated into three components, $E_I$, $E_C$, $E_R$, for which $R_I$, $R_C$ and $R_R$ are good estimators if $R$ is small. For any particular grammar the values can be calculated directly:
Definition 4 $E_i(G)$ is the number of strings a grammar $G$ can accept (or generate) by following routes $r_i$ of type $i$, where $i \in \{I, C, R\}$.

A categorization of grammars can now be based on all three measures $R_i$, $P_i$ or $E_i$. For instance:

Definition 5 A grammar $G$ is defined to be of type $i$, where $i$ is the type of the largest component $E_i$ and $i \in \{I, C, R\}$.

For large enough $max_c$ ($= max_l$) and large enough $R$, and some restriction on the distribution of rules, we predict that any random grammar will be of type R. Speculatively, this suggests that a language system of organisms with enough cognitive skills to deal with long strings and many rules will necessarily be of a recursive type (see figure 3.2b).
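The probabilities of equations 3.8-3.10 follow directly from the four rule counts. A minimal sketch (the counts $N_a \ldots N_d$ are the only inputs):

    def route_type_probabilities(n_a, n_b, n_c, n_d):
        """Random-walk probabilities of producing an indexical,
        compositional or recursive route (eqs. 3.8-3.10)."""
        p_i = n_a / (n_a + n_b)
        p_c = n_b / (n_a + n_b) * n_d / (n_c + n_d)
        p_r = n_b / (n_a + n_b) * n_c / (n_c + n_d)
        return p_i, p_c, p_r

    # with many loop (c) rules, a random derivation is most often recursive:
    print(route_type_probabilities(n_a=2, n_b=6, n_c=5, n_d=3))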
[Figure 3.2: Indexical, compositional and recursive grammars. (a) A grammar taken from one of the simulations (see figure 3.3) and some of its properties; the three measures discussed in this section, E (parse), P (derive) and R (routes), differ significantly. $max_c$ is set to 4; only the rules that are actually used in parsing are considered. (b) Expressiveness per type as a function of grammar size: in large enough random grammars, recursive expressiveness will dominate, even if there are relatively few recursive rules. Parameters: a = 0.57, b = 0.2, c = 0.03, d = 0.2 (the relative fractions of rules of type a, b, c or d, as in figure 3.1), $max_c$ = 6.]

3.3.4 evaluation with simulation data
We evaluate the usefulness of the calculations above by applying them to some of the data obtained in the simulation study of chapter 2. In figure 3.3(a) both the analytical prediction of the relation between grammar size and expressiveness, and the observed values in three simulations typical of the regimes discussed in chapter 2, are plotted. The grammars in these simulations are not restricted in the number of terminal and non-terminal symbols that occur at the right-hand sides of rules. In our calculations we made some strong assumptions about these numbers, so we should not expect the data to match quantitatively. However, as one can see, the calculations do give a qualitative explanation of the ordering and the shape of the curves. If one assumes a monotonic increase in grammar size, the shape of the curves in (b) can also be understood. For a more quantitative analysis we would need to calculate the relation between $E$ and $N$ for the much more complicated types of grammars we allow in the simulations. Although some progress has been made in finding the techniques necessary for such calculations (which include "structure matrices"), we leave these issues as future work.

[Figure 3.3: Simulation data. (a) Linear, quadratic and higher-order dependence of E on N. Data from computer simulations of evolving grammars show a similar dependence, and can indeed be characterized as primarily indexical, compositional and recursive; example grammars from the runs are annotated in the plot. (b) The same simulations show rapid, intermediate and linear growth of E over time. The examples illustrate that the rate of change of E indeed seems to contain information on the grammar's type.]

3.4 Statistical properties: language
The dynamical behavior of a language (the rate of change of $E$), the distribution of strings over string space, and the distribution over string-length space all contain information on the type of grammar that generates the language. In chapter 2 we discussed three self-enforcing dynamical regimes, which yield primarily recursive, compositional and indexical grammars respectively. One of the hypothesized mechanisms underlying this behavior was a group effect: the languages of the agents jointly constitute a group language. This can restrict the evolutionary path to lead only towards grammars of the type that is already dominant. However, a prerequisite for such a mechanism is that one can find statistical patterns in an individual language that reflect the class of its corresponding grammar. This section discusses two efforts to identify such patterns, neither of which is conclusive. This work relates to work in the field of "grammar induction", but such relations remain to be sorted out.

3.4.1 substring frequencies
A first attempt to identify such patterns looks at the frequencies of occurrence of substrings in the strings of a language. If we assume that all rules of a grammar have a random sequence of terminal symbols (a "building block") at one side of the non-terminal symbol (if present), we can infer several expectations about the language generated. If all routes are of an indexical type, the language simply consists of these random sequences, and therefore shows no non-random distribution effects. However, if all routes are of a compositional type, we expect similarities between different strings, as they are built up from the same building blocks. If all routes are recursive, we also expect repetition of the same substring within strings of the language. The following "signatures" of the grammar type are thus expected in languages:

i. (random) indexical grammars yield a random distribution of strings in a language;

ii. compositional grammars yield non-random repetition of substrings between strings of a language;

iii. recursive grammars yield non-random repetition of substrings within and between strings of a language.

Whether or not such patterns are observable is a non-trivial question, especially because of the small number of terminal symbols (2) we allow in the simulations. To investigate to what extent the expectations hold, we use histograms of the frequencies of occurrence of each substring in the strings of a language $L$. In a domain with maximum string length $max_l = 6$ and $V_{te} = \{0, 1\}$ this leads to 126 values, which can be represented by a vector $\vec{h}$ of length 126. Multiple occurrences of a substring in a string are counted only once. We will consider two such vectors:

i. $\vec{h}$: the frequencies of the substrings in the set of all possible strings;

ii. $\vec{h}_a$: the frequencies of the substrings in language $L_a$.2

We define $\vec{h}_{exp}$ as the expected frequencies of substrings, given that these substrings are building blocks ("terminal additions") in a compositional grammar. An estimate of this vector is:

$$\vec{h}_{exp} = \frac{E}{E_{max}}\,\vec{h} \tag{3.12}$$

We expect that comparing the resulting vector with $\vec{h}_a$ yields information on which substrings are in fact compositional building blocks. Similarly, indications of recursiveness can be found in a matrix, in which each row is a vector containing the frequencies of $n$-time occurrences of a substring in the strings of a language. Recursive building blocks are likely to occur relatively often within a particular string.
[Figure 3.4: Substring frequencies for a simple example grammar (S → 001A | 10A; A → 100 | 11; language: 001100, 00111, 10100, 1011). Multiple occurrences of a substring in a string are counted only once. The "building blocks" in the grammar are not the most frequent substrings in the language.]

Results of such an analysis are disappointing. Consider for instance the case of a very simple grammar:

S → 001A | 10A
A → 100 | 11

2 Both vectors are calculated using the matrix $M$, in which each element $M_{ij}$ is 1 if string $i$ is a substring of string $j$, and 0 otherwise. $\vec{h}$ and $\vec{h}_a$ are calculated by multiplying an all-ones vector, $l_1$, and the language vector, $l_a$, with the matrix. $l_a$ is the vector in which each element $i$ is 1 if the string is present in the language, and 0 otherwise.
One can easily see that the strings such a grammar can produce are 001100, 00111, 10100 and 1011. Those are, of course, exactly the strings that can be built up from the building blocks 001 and 10 on the left side, and 100 and 11 on the right side. If one calculates the substring frequencies, however, these are not the most frequent substrings. In the substring histogram of figure 3.4 one can see that the highest frequencies are for substrings that are not building blocks at all, but accidental combinations of symbols from the rear and front of multiple building blocks. Also in the substring histograms of the grammars from our simulations, no clear pattern could be identified. Possibly, this failure is an artifact of the extremely small terminal alphabet we used. We did not explore this approach any further.
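The analysis itself is easy to reproduce. A minimal sketch (a reimplementation under the assumptions of the footnote, not the original analysis code) that computes $\vec{h}_a$ for the example grammar:

    from itertools import product

    max_l = 6
    strings = [''.join(bits) for l in range(1, max_l + 1)
               for bits in product('01', repeat=l)]      # all 126 strings

    def substring_frequencies(language):
        """h_a: for every possible substring, the number of strings of
        the language it occurs in (multiple occurrences counted once)."""
        return [sum(1 for w in language if s in w) for s in strings]

    language = ['001100', '00111', '10100', '1011']      # figure 3.4 example
    h_a = substring_frequencies(language)
    print(sorted(zip(h_a, strings), reverse=True)[:5])

The top of the list indeed contains substrings such as 01 rather than the building blocks 001, 10, 100 and 11.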

3.4.2 co-occurrence patterns
By simply counting the number of times substrings occur in a language, the analysis above ignores much information. Given some substrings that are "building blocks" in a compositional or recursive grammar, one expects a set of strings to be elements of the language. Absence of one of these is evidence that the hypothesized substrings are in fact not the building blocks. One can work out these ideas to evaluate, for every combination of building blocks in a compositional or recursive framework, how well they fit the observed language. We expect such an approach to quickly face a combinatorial explosion, due to the high dimensionality of the search space.

To get a grip on the information that is contained in the co-occurrence patterns, we used a well-known heuristic for looking at high-dimensional data: unsupervised cluster analysis, with the Ward cluster criterion. We used data from 100 simulations with the same parameters as in figure 3.3. From each simulation we took one of the first agents with an expressiveness above 9, and one with an expressiveness above 49. We thus have 200 agents. For each possible string, we calculated a vector of length 200, where each entry $i$ is 1 if that particular string can be parsed by the grammar of agent $i$, and 0 otherwise.

The results of clustering are shown in figure 3.5. As one can see, strings in a cluster are dominantly grouped together because of recursive relations. In part, the explanation for this dominance is trivial, as most of the 50+ agents have recursive grammars. However, some examples can be found of clusters of strings with compositional relations. The preliminary conclusion from the clustering data is that there indeed is a pattern in the language that reflects the structure of the grammar. Although one particular string can just as well be generated in an indexical, compositional or recursive manner, the co-occurrence of multiple strings holds information on which of those three types is actually used.
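A sketch of this procedure with standard tools (a hypothetical reimplementation on dummy data; the original analysis used the Ward's-averaging variant described in the caption of figure 3.5):

    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage

    # occurrence[s, a] = 1 if string s can be parsed by agent a
    # (random dummy data here; in the study: 126 strings x 200 agents)
    rng = np.random.default_rng(0)
    occurrence = rng.integers(0, 2, size=(126, 200))

    # hierarchical clustering of the 126 points in 200-dimensional space
    Z = linkage(occurrence, method='ward')
    dendrogram(Z, no_plot=True)      # or plot it to inspect the clusters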

3.5 Conclusions
Finite grammars and languages are often thought to be uninteresting. However, in models of language learning and evolution they play an important role, and in such studies their dynamical and statistical properties remain largely unexplored. We presented a preliminary attempt to understand the dynamics of expressiveness in our model of language evolution, by analyzing the relation between the number of routes and grammar size, and between expressiveness and the number of routes. The classification of grammars in terms of routes that we introduced seems useful, as it leads to predictions on how these variables depend on each other. Further, we presented a preliminary attempt to understand the apparent feedback loop from grammar to language to grammar in our model, by looking at some of the statistical patterns in the grammar-language mapping. Although very preliminary, we believe our analysis sheds some light on how statistical patterns in a language reflect the structure of the grammar. Neither of the discussions is conclusive. However, we consider this a first of many steps that need to be taken to bridge the gap between evolution-of-language studies and the fields of formal languages, finite state machines and graph theory.

[Figure 3.5: A fragment of the cluster analysis (a dendrogram over strings such as 0110, 01110, 011110, ...), based on data of 200 agents from 100 simulations (see text). Every string is represented by its "occurrence vector", in which element $a$ is 1 if that particular string is part of the language of agent $a$. All such vectors define points in a 200-dimensional space. Initially, every point forms one cluster. A cluster $C_i$ is merged with another cluster $C_j$ if the mean squared distance between all points in $C_i$ and $C_j$ is the lowest of all $(i, j)$-pairs ("Ward's averaging", Hogeweg, personal communication).]

Chapter 4

Selective advantages of syntax
This chapter studies the circumstances under which syntax develops, in two variants of the computational model of language evolution that was introduced in chapter 2. We designed several simple alternative scoring schemes that reflect hypotheses on the selection pressures that led to human linguistic abilities. The most important of these schemes are labeled "communication", which assigns scores to both speaker and hearer if a string is successfully parsed, and "perception", which assigns a score only to the hearer. We studied two types of models. In the first type, the global model, all agents can interact with all other agents. In the second type, the spatial model, agents are localized in space and interact only with their immediate neighbors. Under communication settings, syntax and high levels of expressiveness are hard to obtain, while under perception settings expressive syntax usually develops quickly. However, in the global model with the perception scheme, agents lose the willingness to speak. A paradox arises, as highly expressive syntax and willingness to speak seem mutually exclusive. In the spatial model this paradox can be resolved.

4.1 Introduction
Speculations on the origins of language usually attribute to language the function of a medium for exchanging information. The huge difference in expressiveness between animal communication and human language is thought to be explained by an evolutionary optimization of expressiveness, driven by the adaptive advantage of exchanging more and more information. Syntax, in this view, reconciles the need for high expressiveness with some of the natural boundary conditions on communication, such as memory limitations [Pinker & Bloom, 1990], an error limit in transmission [Nowak & Krakauer, 1999] or a transmission bottleneck [Kirby, 1999].

However, present-day language fulfills many more functions than exchanging information, including facilitating social relations, individual expression, increase of status and esthetic experience. Some argue that language is the primary means to internalize our knowledge of the world and to use symbolic reasoning. It is unclear in what way such functions are recent side-effects, or play an important role in explaining the origins of language. Discussions of such issues tend to be very unsatisfactory, because they seem hardly restricted by empirical or theoretical bounds. Computational modeling offers a novel approach to these issues, because such models are at least restricted by whether or not the combination of assumptions implemented in the model yields the hypothesized outcome: syntactical language.

For that reason, we modified the model that was studied in chapter 2 to evaluate which combinations of evolution schemes and selection pressures lead to highly expressive, syntactical language systems. Whereas chapter 2 studied some general model behaviors with a fixed scoring scheme, here we will vary the scoring schemes to implement different selection pressures. The model architecture resembles the model of Hashimoto & Ikegami [1996], and the study in this chapter relates to their surprising result that a negative score for being recognized speeds up the evolution of syntactical language. However, instead of the rather ad-hoc evolution and scoring schemes in their study, we will use much simpler model choices.

This chapter describes the details and results of two models. The first model, discussed in section 4.2, is "global" in the sense that every agent can interact with every other agent. In contrast, section 4.3 discusses a "spatial" model, in which agents have a position in a 2-dimensional space and interact only with their immediate neighbors. In such a model spatial patterns can occur. The behaviors of the two models differ significantly. The chapter shows some interesting model behaviors and gives a preliminary analysis to explain them. However, we emphasize that it presents only a starting point for a more complete model study.

4.2 The global model
4.2.1 model description
As in chapter 2, the model consists of a population of agents with an innate context-free grammar, which they inherit, with some mutations, from their parent. Agents have the ability to derive and parse strings of 0's and 1's of maximum length 6, using the rules of the grammar. In a language game, all agents can speak one string and try to recognize the strings produced by the other agents. Every generation a number of games is played, and scores are attributed in two scoring dimensions: recognizing and being recognized.

[Figure 4.1: The basic model architecture. (a) The language game: all agents (P) speak one string, which all agents try to parse; agents receive scores for speaking, understanding and being understood according to the scoring scheme used. (b) The evolutionary cycle: each generation a number of games (R) is played; depending on the scores, agents reproduce, with some mutations, into the next generation. Default parameters: P = 20, R = 20.]

Unlike chapter 2, in this study we replace all agents every generation with offspring of the present population. The number of offspring of an agent depends on the total score it has received, relative to the other agents. Random mutations are applied to the offspring, with fixed probabilities for modification of existing rules ("replace"), duplication of a random rule ("add") and deletion of a rule ("delete"; see table 4.1). We also implemented a mutation "shift", which swaps a rule with the previous rule in the grammar and occurs with a probability per rule. Note that, apart from being simpler than the evolution scheme of Hashimoto & Ikegami (see table 2.1), the scheme used here also differs in defining the replacing mutation probability per rule instead of per grammar. This prevents effective mutations from becoming increasingly improbable with increasing grammar size. Adding and deletion probabilities are defined per grammar.1

1 This was chosen to avoid a self-enforcing increase or decrease in grammar size if the probabilities for addition and deletion are not equal. Since they are preferably equal anyway, such that grammar size is determined by selective advantages only, this choice seems rather arbitrary in hindsight.

mutation   | probability        | effect
adding     | madd, per grammar  | duplicate a random rule from the grammar
deleting   | mdel, per grammar  | delete a random rule from the grammar
replacing  | mrep, per rule     | modify the rule
shift      | mshift, per rule   | swap a rule with the previous rule in the grammar

Modifying a rule means: replacing the left-hand-side symbol with a random non-terminal, replacing a random symbol from the right-hand side with a random symbol, or deleting a random symbol (if there is more than one) from the right-hand side. Mutations that would lead to an "S" at the right-hand side are not performed.

Table 4.1: The evolution scheme. Every generation, all agents reproduce and die. The population size is held constant. Each new agent is the child of an old agent with a probability proportional to the old agent's score divided by the total score. Mutations are applied to the offspring with the given probabilities. Default parameters: madd = 0.01, mdel = 0.01, mrep = 0.01, mshift = 0.5.
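A minimal sketch of this mutation scheme (the grammar representation, a list of (lhs, rhs) pairs, and the symbol alphabet are assumptions of the sketch):

    import random

    def modify(rule):
        """Modify a rule as described in table 4.1; 'S' is never
        introduced at the right-hand side."""
        lhs, rhs = rule
        u = random.random()
        if u < 1 / 3:
            lhs = random.choice('AB')          # random non-terminal as lhs
        elif u < 2 / 3:
            i = random.randrange(len(rhs))     # replace a random rhs symbol
            rhs = rhs[:i] + random.choice('01AB') + rhs[i + 1:]
        elif len(rhs) > 1:
            i = random.randrange(len(rhs))     # delete a random rhs symbol
            rhs = rhs[:i] + rhs[i + 1:]
        return (lhs, rhs)

    def mutate(grammar, madd=0.01, mdel=0.01, mrep=0.01, mshift=0.5):
        g = list(grammar)
        if random.random() < madd and g:           # per grammar: duplicate
            g.append(random.choice(g))
        if random.random() < mdel and len(g) > 1:  # per grammar: delete
            g.pop(random.randrange(len(g)))
        for i in range(len(g)):
            if random.random() < mrep:             # per rule: modify
                g[i] = modify(g[i])
            if i > 0 and random.random() < mshift: # per rule: shift
                g[i - 1], g[i] = g[i], g[i - 1]
        return g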

The scores in the scheme of Hashimoto & Ikegami [1996] depend in a rather complicated way on the string length, the number of parsing steps, and the number of times a string has already been used in the last 1000 communications ("trend", see table 2.2). We simplified the scores by giving a fixed score for a successful communication, and none for failure (except for "intimidation"). In most simulations, we use an explicit "innovation pressure". This pressure is implemented by discounting scores with the number of times a string has already been heard before. Note that, unlike chapter 2 and Hashimoto & Ikegami [1996], no scores are attributed for "speaking". The weight of this score was highest in those models, but its biological interpretation is unclear.

We designed five schemes that reflect hypotheses on the function of language. These schemes are labeled "communication", "perception", "intimidation", "manipulation" and "cognition".

communication corresponds to a selection pressure to optimize the total of exchanged information, such that both the speaker and the hearer benefit from successful communication. This pressure is implemented by a score for recognition and for being recognized;

perception corresponds to a selection pressure to optimize the total of information received, in order to make use of the knowledge of others (as if one indirectly shares someone else's perception). This pressure is implemented as a score for recognition;

intimidation corresponds to a selection pressure on using language to intimidate others, for instance by speaking long and novel strings. This pressure is implemented by giving a score for not being understood (or a score for speaking long strings);

manipulation corresponds to a selection pressure on using language to manipulate others. This pressure is implemented as a score for being recognized;

cognition corresponds to a selection pressure on using language to process information internally. It is implemented by giving a score for recognizing each novel string an agent itself has produced.

We will investigate under which circumstances highly expressive, syntactical grammars emerge in this model. "Syntax" here is interpreted as involving compositionality and recursiveness (as defined in chapter 3). A level of expressiveness close to 126, the upper bound that results from a maximum string length of 6, is a reliable indicator of syntax throughout this thesis. To reach such high expressiveness, non-syntactical grammars would need more rules than can be obtained within the time and mutation constraints of our simulations.

4.2.2 the development syntax is not trivial
Although compositional and recursive structures are always only a few mutations away, the development of syntactical grammars is not trivial at all. In fact, in less than half of the parameter combinations we tried did we observe the emergence of syntax. Even with an explicit pressure on innovation, implemented by explicitly discounting scores with the number of times a string has already been heard, expressive syntax does not necessarily emerge. This fact is surprising, because in such a case the average score per agent has its optimum at maximal expressiveness (see section 4.4.1). The results of the simulations are summarized in table 4.3.

scheme        | recognizing | being recognized
communication | 1           | 1
perception    | 1           | 0
intimidation  | 0           | failure: 1
manipulation  | 0           | 1
cognition     | self: 1     | 0

Table 4.2: The scoring schemes. Scores are only attributed for successful communications. In most simulations scores are only given for strings that are novel to the hearer. In some simulations these scores are multiplied by the string length. The intimidation score is attributed if the hearer fails to understand a string. The cognition score is only attributed for the novel strings that an agent itself has produced.
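A sketch of the score attribution of table 4.2 (function and argument names are assumptions; the innovation-pressure discount of section 4.2.1 would multiply these base scores):

    def attribute_scores(scheme, parsed_ok, self_produced=False):
        """Return (speaker_score, hearer_score) for one communication."""
        if scheme == 'communication':
            return (1, 1) if parsed_ok else (0, 0)
        if scheme == 'perception':
            return (0, 1) if parsed_ok else (0, 0)
        if scheme == 'intimidation':                 # scored for failure
            return (0, 0) if parsed_ok else (1, 0)
        if scheme == 'manipulation':                 # being recognized
            return (1, 0) if parsed_ok else (0, 0)
        if scheme == 'cognition':                    # own novel strings
            return (0, 1) if parsed_ok and self_produced else (0, 0)
        raise ValueError(scheme)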


(a) without innovation pressure

               | communication | perception | intimidation | manipulation | (cognition)
default init   | -             | +          | -            | -            |
indexical init | -             | +          | -            | -            |
recursive init | +             | +          | -            | -            |

(b) with innovation pressure

               | communication | perception | intimidation | manipulation | cognition
default init   | +             | +          | -            | -            | +
indexical init | -             | -          | -            | -            |
recursive init | +             | +          | -            | -            |

Table 4.3: Summary of results. Indicated is whether or not syntactical grammars with a sustainable high level of expressiveness (E > 100) are reached. The default initial grammar has E = 1, the indexical initial grammar has E ≈ 10 and the recursive initial grammar has E ≈ 100.

communication does not lead to highly expressive grammars with the default initial grammar and without the innovation pressure: expressiveness remains at a minimal level of 1 or 2. If scores are proportional to string length, one can observe an increase in string length. If the initial grammar is an expressive, recursive grammar, the high level of expressiveness can be maintained. In contrast, with an indexical initial grammar, grammars remain indexical and expressiveness remains limited (see figure 4.2a). With an innovation pressure, the behaviors are very similar, except that with the default initialization expressive syntax eventually does develop. In this type of run we observe a stepwise development, with typically long intervals at the same level of expressiveness. Expressive, syntactical grammars are reached only after very many generations (see figure 4.2b).

[Figure 4.2: Some typical examples of simulations with "communication" settings and the default initial grammar, the indexical initial grammar and the recursive initial grammar. Shown is expressiveness over time, (a) without an innovation pressure (see text) and (b) with such a pressure.]

perception shows rapid growth in expressiveness in most cases considered. With the default initialization and an innovation pressure, the growth is generally slower than without such a pressure. Infrequently, we even observe runs that show the "indexical regime" that was discussed in chapter 2. Also, when initialized with an indexical grammar, the runs with innovation pressure show such a regime. For perception settings, innovation pressure thus makes the development of syntax less likely; apparently, the fact that the hearer benefits from richer input hinders this development. Agents in perception runs show another, unexpected feature: while retaining high parsing abilities, they have evolved to fail frequently in derivation and thus not to speak (see figure 4.3). This aspect will be discussed below.

intimidation does not lead to high expressiveness in any of the cases considered. Occasionally a somewhat intermediate level of expressiveness is reached (E ≈ 20), but it cannot be retained.

manipulation does not give rise to expressive syntax either, although a minimal level is always maintained. With all initializations, and independent of innovation pressure, simulations lead to grammars that can parse a small number (usually 1) of strings.

cognition was only studied with the default initial grammar and an innovation pressure (here, only successfully parsed, novel strings yield a score). Under these circumstances a quick growth of expressiveness was observed, leading to short, efficient recursive grammars.

In the simulations without an innovation pressure, agents obtain high scores even at a low level of expressiveness. Because scores are not discounted with usage, repeating the same string over and over again is no less rewarding than using a richer vocabulary. Biologically this does not seem a very realistic situation. Note, however, that the perception scheme nevertheless yields high expressiveness. The competitive advantage of being understood less, while increasing the chance of understanding incidental novel strings, works as a strong incentive to develop richer grammars.

[Figure 4.3: A typical example of a simulation with "perception" settings, the default initial grammar and an innovation pressure. Shown are the expressiveness over time and the percentage of failures in derivation. After around 3000 generations this percentage approaches 100, indicating that very little communication is maintained.]

4.2.3 paradox of selfish language
As discussed above, the perception and communication schemes differ very much in the type of grammars that develop and, subsequently, in the level of expressiveness that is obtained. Another striking difference can be seen in figure 4.3 and has to do with the high number of failures that occur in derivation. Apparently, agents in the perception settings develop grammars that are able to parse a high number of strings, but that nevertheless frequently fail in derivation. This is possible because of the asymmetry between parsing and derivation. The derivation procedure applies, starting with an "S", a random fitting rule at every step, without backtracking. In contrast, the parsing procedure consists of a complete, depth-first search through the parsing tree, with some maximum number of steps. One can easily design a grammar with a corresponding tree that shows a "bottleneck": a rewriting step that can easily be found in a complete, "bottom-up" search, but is extremely unlikely to be chosen in a random "top-down" walk. For instance:

S → A | B1 | 00B | 1B0
B → 0B
A → AA | 0 | 1
In derivation, an agent with such a grammar will in 3 out of 4 cases choose a rule that rewrites S to some string that contains a B, enter a loop until the maximum number of derivation steps (60) is reached, and subsequently fail to speak. Nevertheless, the recursive structure with variable A enables the agent to parse all possible strings. In the simulations we find grammars that fail in 98 out of 100 derivation calls, and nevertheless can parse all 126 strings. This possibility was not implemented intentionally; nevertheless, the evolutionary process discovered it and "actively" exploits it.

This observation points at an important assumption in the model: agents are forced to participate in the language game. Under perception settings, participation is not beneficial for the speaker. A classic altruism problem thus arises: if speaking behavior is beneficial only for an individual's competitors, why would it be retained in evolution? We extended the model with a parameter for the probability to speak. Under perception settings this parameter indeed quickly evolves to zero. Interestingly, taken together the results of the global model constitute a paradox: under those circumstances in which syntactic expressiveness develops, willingness to speak disappears; under the circumstances in which willingness to speak is retained, syntactical language does not develop (starting with an indexical language of some size).

In theoretical biology the evolution of altruism is a well-known problem. Mechanisms proposed to explain the evolution of cooperation and altruism include kin selection, group selection and level formation. Spatial models, in which individuals are localized in a space and interactions occur locally, have repeatedly been shown to naturally yield altruism and cooperation. Spatial pattern formation makes multilevel evolution possible and kin selection more likely [Boerlijst & Hogeweg, 1991b; Nowak & May, 1992]. In the following, we will discuss an alternative model in which such spatial patterns can occur.
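The derivation bottleneck of the example grammar above is easy to reproduce. A standalone sketch of the random top-down derivation (the representation is an assumption; the step limit of 60 is taken from the model):

    import random

    RULES = {'S': ['A', 'B1', '00B', '1B0'], 'B': ['0B'], 'A': ['AA', '0', '1']}
    MAX_STEPS = 60

    def derive():
        """Random top-down derivation without backtracking."""
        string, steps = 'S', 0
        while any(nt in string for nt in RULES) and steps < MAX_STEPS:
            nt = next(s for s in string if s in RULES)  # leftmost non-terminal
            i = string.index(nt)
            string = string[:i] + random.choice(RULES[nt]) + string[i + 1:]
            steps += 1
        return None if any(nt in string for nt in RULES) else string

    failures = sum(derive() is None for _ in range(1000))
    print(failures / 1000)   # roughly 0.75: three of the four S-rules
                             # trap the derivation in the B-loop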

4.3 The spatial model
4.3.1 model description
Agents in the spatial model are very similar to the agents in the global model. They have a context-free grammar, inherited, with some mutations, from a parent, and they have the ability to parse and derive strings of 0's and 1's of maximum length 6. In this model there are many more agents, and the exact population size fluctuates. Agents are placed on a 2D square grid of cells; one cell can be occupied by one agent. The model is thus an example of a cellular automaton (with asynchronous updating). Interaction occurs only between agents of neighboring cells. All agents, in random order, can produce a string. All neighboring agents (the maximum is 8) try to parse the string. If parsing is successful, scores are attributed. As in the global model, we used a "communication" scheme, where both speaker and hearer receive a score. This score can be discounted with the number of times the hearer has already heard the string ("innovation pressure"). In the "perception" scheme, only the hearer receives a score.

Every time step an agent has a fixed probability (usually 0.2) to "die". Every empty square can be filled with a child of one of its neighbors. If an empty cell has neighbors, the probability of each neighbor being chosen to replicate into the empty square depends on its relative score: an agent's score ($S$) is taken to the power $x$ (the selection coefficient) and divided by the total of the scores to the power $x$ of all neighboring agents,

$$P_i(\text{replicating}) = \frac{(S_i)^x}{\sum_j (S_j)^x}$$

Because an empty square is filled almost immediately, and 20% of the agents die every time step, a field of 50x50 cells usually contains close to 2000 agents. All squares are updated every time step, in random order. Offspring in the model inherit the grammar of their parent, to which some mutations are applied, similar to the global model described in section 4.2: the adding mutation duplicates a random rule with some probability per grammar, the deleting mutation deletes a random rule with a probability per grammar, and the replacing mutation modifies a rule with a probability per rule. In some of the simulations reported here we also used the shift mutation, which swaps a rule with the previous rule in the grammar.

To investigate the effect of pattern formation, we use a procedure "shuffle" to destroy spatial patterns. This procedure consists of randomizing the positions of all agents every time step. The boundary conditions in the model are simple: boundaries are empty. Agents at the boundary of the field thus communicate with fewer neighbors: at most 5 at the edges and 3 at the corners. If the field is small, this might lead to artifacts; the simulations therefore use fields of at least 50x50 cells.

To observe the behavior of the model, we frequently save all features of the agents to disk. Such a file can be used to calculate average expressiveness, number of rules, etc. It can also be used to create a picture of the field, where every agent receives a color depending on its expressiveness, its type of language, etc. Such pictures can be viewed one by one ("snapshots") or as a movie. Changing patterns in the movies can reveal the kind of spatial pattern formation that is responsible for certain types of behavior.
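A sketch of the local replication step (the grid representation, the agent attribute score and the helper names are assumptions of the sketch; boundaries are empty, as described above):

    import random

    W = H = 50    # field size; None marks an empty cell

    def neighbors(field, x, y):
        """The at most 8 occupied neighboring cells of (x, y)."""
        return [field[i][j]
                for i in range(max(0, x - 1), min(W, x + 2))
                for j in range(max(0, y - 1), min(H, y + 2))
                if (i, j) != (x, y) and field[i][j] is not None]

    def choose_parent(field, x, y, sel_x=1.0):
        """Pick the neighbor that replicates into empty cell (x, y),
        with probability proportional to score**sel_x."""
        candidates = neighbors(field, x, y)
        if not candidates:
            return None
        weights = [agent.score ** sel_x for agent in candidates]
        if sum(weights) == 0:
            return random.choice(candidates)   # all scores zero: uniform
        return random.choices(candidates, weights=weights)[0]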

4.3.2 perception in space
Interestingly, the willingness to speak can be retained in the spatial model with perception settings. The parameter that determines the probability that an agent speaks at its turn in the language game is initialized at 0.1. As one can see in the example of figure 4.4(b), the average value rapidly evolves to a high value close to the maximum (the individual maximum value is 1.0; at the level of the population the maximum is lower, because mutations are frequent and, at the upper bound, can only lower the value). Two example snapshots of the field are shown in figure 4.5. Spatial patterns are responsible for this selection pressure towards altruistic behavior: if one destroys the spatial patterns, the willingness to speak also disappears. This can be seen in figure 4.4(a), where at each time step the positions of all agents on the field are randomized.
[Figure 4.4: Perception in space, (a) shuffled and (b) spatial. Shown is the average fraction of the maximum of expressiveness (maximum is 126) and willingness to speak (maximum is 1). Parameters: initial population size = 2000, number of games per generation = 1, maximum string length = 6, minimum number of understanders = 0, madd = 0.1, mrep = 0.01, mdel = 0.01, maximum number of parsing steps = 500, maximum number of derivation steps = 60, self-interaction not allowed, discount factor 1.0, scores proportional to string length.]

4.3.3 communication in space
The behavior of the spatial model under communication settings is very similar to that of the global model. In the few simulations that were studied, the general observation is that expressiveness stays low while willingness to speak quickly grows, in both shuffled and spatial settings, and both with and without innovation pressure (see figure 4.6b). Figure 4.6(a) shows an exception, where after around 800 generations a sudden increase in expressiveness is observed. In the simulations of figures 4.4 and 4.6, the probability of the adding mutation is much higher than the probability of the deleting mutation. This leads to a strong bias towards larger grammars. Such large grammars have a large unused, neutral tail that can accidentally, by some mutation, yield high expressiveness. As we have seen in the global model, "communication" can retain a high level of expressiveness once it is present. The surprising behavior thus might be explained as accidental. However, no very high levels of expressiveness were observed in the simulations with spatial patterns, although several of them had the same bias for larger grammars. More work needs to be done to see how common the "exceptional" behavior is and to understand the mechanisms underlying it.

[Figure 4.5: Perception in space. Snapshots of the field at (a) generation 700 and (b) generation 900 of the same simulation as in figure 4.4(b). The dark region in the left corner consists of agents with high expressiveness; they eventually take over the whole field. Black squares are empty.]

4.4 Discussion
To understand these results better, it is useful to simplify the language game to its essence, and then gradually add more model details to see how many are needed to explain the observed behaviors. In the following we will first show that in a homogeneous population with innovation pressure, a higher level of expressiveness yields higher average scores. The population, of course, does not evolve homogeneously: selection is "frequency dependent", and many additional features of the behavior can be understood from a simple game-theoretic model. Next, we will discuss some important aspects that are still left out. Finally, we will speculate on two mechanisms that could explain the difference between the global and the spatial model.

4.4.1 homogeneous high expressiveness is optimal
The fact that in many cases syntactic grammars do not develop is surprising. One can easily construct a short, syntactical grammar that can parse and produce all 126 possible strings ("minimal almighty"). For instance:

A → 1
A → 0
A → AA
S → A

This grammar has only 4 rules and can parse and derive all 126 possible strings. This particular grammar would not compete very well in the simulations, because derivation is strongly biased to short strings, and parsing capabilities dramatically collapse with any mutation (even most changes in the order of the rules would be very harmful, because of the maximum number of parsing steps).
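That these four rules cover the whole domain is easy to verify with a standalone sketch: the language of A is every non-empty binary string, so up to length 6 the grammar generates all 126 strings.

    from itertools import product

    max_l = 6
    all_strings = {''.join(p) for l in range(1, max_l + 1)
                   for p in product('01', repeat=l)}

    generated = {'0', '1'}                 # A -> 0 and A -> 1
    for _ in range(max_l):                 # A -> AA: close under concatenation
        generated |= {u + v for u in generated for v in generated
                      if len(u) + len(v) <= max_l}

    print(len(generated), generated == all_strings)   # 126 True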

[Figure 4.6: Communication in space, (a) shuffled and (b) spatial. Shown is the average fraction of the maximum of expressiveness and willingness to speak over time.]

However, the point here is that from any given grammar, only a limited number of mutations is necessary to obtain expressive, syntactical structures. Moreover, in all the scoring schemes with "innovation pressure", a homogeneous population of agents will obtain higher scores if the expressiveness of their shared language is higher. This is because scores are attributed for successful communications, and discounted for the number of times the hearer has already heard a particular string. Given that the language is shared, an extra string would not decrease the number of successful communications, but would be more novel and thus lessen the discount.

Figure 4.7(a) shows the average scores of agents in a homogeneous population, for three different scoring schemes. These curves are obtained using a hand-built indexical grammar that contains all possible strings in order of decreasing length. In such a grammar, all strings that are part of the language have equal probabilities of being spoken; the indicated scores are therefore close to the maximum. Syntactical grammars yield lower average scores. Compared to grammars from the simulations, hand-built recursive grammars tend to yield much lower expressiveness than anticipated, because of the counterintuitive effects of the maximum number of parsing steps, and tend to yield much lower scores because of the non-equal probabilities of strings being spoken. Figure 4.7(b) shows some examples of "communication" runs with both indexical and syntactical grammars in a space of score vs. expressiveness. Without an innovation pressure, average scores in a homogeneous population are independent of expressiveness and would show horizontal lines.

4.4.2 frequency dependent selection
The intuitive expectation is thus that expressive grammars are selectively advantageous. However, the model points at the need for a more differentiated analysis. The explicit role of expressiveness is that scores depend on the novelty of strings; implicitly, expressiveness also influences the scores in other ways: (i) expressive speakers are more likely not to be understood, while (ii) expressive listeners are more likely to understand. These aspects play a role if the population is not homogeneous, which an evolving population of course never is. The fitness contribution of a certain trait (i.e. speaking and understanding a particular string) is not a function of an agent's grammar alone, but also of the frequency of this trait in the rest of the population.

[Figure 4.7: Score vs. expressiveness for (a) a homogeneous population of agents with indexical grammars, and (b) four simulations under "communication" settings with innovation pressure. (a) Average scores of a homogeneous population of agents with indexical grammars of increasing expressiveness, with an "innovation pressure"; in all scoring schemes except "intimidation", the higher the expressiveness, the higher the scores. (b) Trajectories of four "communication" runs in a space of score vs. expressiveness. Note that with the recursive initial grammar (taken from a "perception" run with many failures in derivation), scores are initially very low even though expressiveness is high; soon, however, the agents reach a near-optimal situation.]

Such a situation is often referred to as "frequency-dependent selection", and is intensively studied in evolutionary game theory. Some of the striking differences between the results of the different scoring schemes can be better understood by looking at a very simple game-theoretic model. We look at just two agents, who communicate in some language with two levels of expressiveness: low (1) and high (5). We assume that at either level mutual understanding is complete; however, a low-expressive agent communicating with a high-expressive agent understands only a fraction of the spoken strings. In the communication scheme a score is attributed for every successful communication (out of 5 each) to both speaker and hearer; in the perception scheme only to the hearer. We assume that an agent speaks all strings it can speak, and that a score ($S$) is discounted for the number of times ($N$) a string has already been heard by the hearer, according to $S = (0.5)^N$.

If we work out the language games that take place in such a set-up, we find the behaviors as represented in table 4.4. It is easy to see that these lead to the net scores shown in table 4.5(a,b). For the sake of completeness, table 4.5(c,d) shows the net scores in a similar simplified language game without innovation pressure. As one can see, both the low/low and the high/high situations are equilibria in the communication case: if both agents have low expressiveness, it is disadvantageous for either of them to move towards higher expressiveness. In contrast, in the case of perception, agents have a competitive advantage when they move towards higher expressiveness; only the high/high situation is an equilibrium. The essential observation here is that, although homogeneous high expressiveness is the "best" solution, unilateral high expressiveness under communication settings is in fact disadvantageous.

The fact that there are two equilibria in the case of communication and one in the case of perception corresponds qualitatively to the results we obtained in the simulations. If initialized with a highly expressive, syntactical grammar, the communication runs can retain it, while under other circumstances expressive syntax is difficult (however, not impossible) to obtain. Under almost all circumstances, perception runs develop expressive syntax (not always, however).
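The net pay-offs of table 4.5 follow mechanically from these assumptions. A sketch that reproduces them (the representation of languages as string indices is an assumption; a low agent speaks its single string five times):

    def net_scores(e1, e2, scheme, discount=True):
        """Net scores for two agents with expressiveness e1, e2 in {1, 5}."""
        scores = [0.0, 0.0]
        heard = [{}, {}]          # per hearer: times each string was heard
        for speaker, hearer, e_s, e_h in ((0, 1, e1, e2), (1, 0, e2, e1)):
            spoken = list(range(e_s)) if e_s > 1 else [0] * 5
            for w in spoken:
                if w < e_h:                        # hearer understands w
                    n = heard[hearer].get(w, 0)
                    s = 0.5 ** n if discount else 1.0
                    heard[hearer][w] = n + 1
                    if scheme == 'communication':  # speaker scores too
                        scores[speaker] += s
                    scores[hearer] += s
        return tuple(round(v, 1) for v in scores)

    print(net_scores(1, 1, 'communication'))   # (3.9, 3.9), as in table 4.5(a)
    print(net_scores(5, 1, 'communication'))   # (2.9, 2.9)
    print(net_scores(5, 1, 'perception'))      # (1.9, 1.0), as in table 4.5(b)
    print(net_scores(5, 5, 'perception'))      # (5.0, 5.0)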


(a)

    low/low                                      high/high
    A1 speaks  comm. S1   S2    perc. S1   S2    A1 speaks  comm. S1   S2    perc. S1   S2
    w1               1    1           0    1     w1               1    1           0    1
    w1               .5   .5          0    .5    w2               1    1           0    1
    w1               .25  .25         0    .25   w3               1    1           0    1
    w1               .125 .125        0    .125  w4               1    1           0    1
    w1               .063 .063        0    .063  w5               1    1           0    1

(b)

    high/low
    A1 speaks  comm. S1   S2    perc. S1   S2    A2 speaks  comm. S1   S2    perc. S1   S2
    w1               1    1           0    1     w1               1    1           1    0
    w2               0    0           0    0     w1               .5   .5          .5   0
    w3               0    0           0    0     w1               .25  .25         .25  0
    w4               0    0           0    0     w1               .125 .125        .125 0
    w5               0    0           0    0     w1               .063 .063        .063 0

Table 4.4: Scores (S) per communication for two agents, A1 and A2, in a simplified language game with innovation pressure. Strings from the language are represented with a w; different strings have different numbers. Shown are the 6 qualitatively different situations, with (a) equally low expressiveness (low/low) and equally high expressiveness (high/high), where A2’s speaking turn mirrors A1’s, and (b) different levels of expressiveness (high/low), for both the communication and perception scoring schemes.

4.4.3 other aspects
However, such a simple analysis certainly does not capture all of the behaviors in our model: for instance, without the innovation pressure, perception runs also develop expressive syntax. We explained this above as a competition effect, but the game-theoretic analysis ignores competition. Further, with the novelty pressure and initialized with the default grammar (S → 1 or 0), communication runs can also develop high expressiveness, while initialized with a larger indexical grammar no expressive syntax can develop. With the novelty pressure, on the other hand, perception sometimes does not develop high expressiveness, but remains in an indexical regime. Here, a game-theoretic analysis runs into difficulties that have to do with some crude assumptions we made:

i. We looked at only two agents, instead of the ten, twenty or more in the simulations;
ii. We represented expressiveness with only 2 values, instead of the 126 in the simulations;
iii. We assumed the speaker uses all strings it can produce;
iv. We assumed small languages to be proper subsets of large languages, while in fact they can be totally disjoint;
v. We assumed fully deterministic dynamics, while the computational model is stochastic.

A game-theoretic analysis of the behavior studied here quickly becomes very complicated if one relaxes the assumptions above. It would be interesting to study a simplified computational model along these lines (a “model of the model”). We made a preliminary effort to extend the game-theoretic analysis to resemble the computational simulations more closely, by adjusting assumptions (ii) and (iv). We represented expressiveness as a continuous value between 0 and 126. We looked at two extremes of interdependence of languages: full dependence, where every language is a subset of every larger language, and full independence, where every language is some random subset of size E of all possible strings; a sketch of the independent-subset variant is given below. Results from this analysis are promising but as yet inconclusive, and are therefore left for future work.
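As a hedged illustration of how such a “model of the model” could be set up (this is a sketch under the assumptions just named, not the preliminary analysis itself; the total of 126 strings is taken from the text above), one can estimate the expected perception pay-off when every language is an independent random subset:

    import random

    TOTAL_STRINGS = 126   # size of the full string space in the simulations

    def expected_understood(e_speaker, e_hearer, trials=10_000):
        """Monte-Carlo estimate of how many of the speaker's strings the
        hearer understands, when both languages are independent random
        subsets (the 'full independence' extreme, dropping assumption iv)."""
        total = 0
        for _ in range(trials):
            spoken = set(random.sample(range(TOTAL_STRINGS), e_speaker))
            known = set(random.sample(range(TOTAL_STRINGS), e_hearer))
            total += len(spoken & known)
        return total / trials

    # Expected overlap is e_speaker * e_hearer / TOTAL_STRINGS:
    print(expected_understood(60, 60))   # ~28.6
    print(expected_understood(5, 5))     # ~0.2: small languages are nearly disjoint

The last line shows why dropping assumption (iv) matters: small independent languages hardly overlap at all, so mutual understanding between low-expressive agents is no longer guaranteed.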


    (a) communication / discounted        (b) perception / discounted
                     A2                                    A2
    A1         low         high           A1         low         high
    low        3.9, 3.9    2.9, 2.9       low        1.9, 1.9    1, 1.9
    high       2.9, 2.9    10, 10         high       1.9, 1      5, 5

    (c) communication / undiscounted      (d) perception / undiscounted
                     A2                                    A2
    A1         low         high           A1         low         high
    low        10, 10      6, 6           low        5, 5        1, 5
    high       6, 6        10, 10         high       5, 1        5, 5

Table 4.5: Net pay-off (S1, S2) in a simplified language game with two levels of expressiveness, with (a) scores for recognizing and being recognized and (b) scores only for recognizing, both with an innovation pressure (discount), and (c,d) the corresponding net scores without such an innovation pressure.
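To make the arithmetic behind tables 4.4 and 4.5 easy to check, the following minimal sketch recomputes the net pay-offs. It is an illustration under the assumptions stated above (each agent utters 5 strings per turn, a low-expressive agent knows only w1, a high-expressive agent knows w1 through w5, and small languages are subsets of large ones), not the code of the thesis model.

    def net_payoffs(lang1, lang2, scheme="communication", discounted=True):
        """Net scores (S1, S2) after each agent has taken one speaking turn."""
        langs = {1: lang1, 2: lang2}
        heard = {1: {}, 2: {}}            # per hearer: times each string was heard
        score = {1: 0.0, 2: 0.0}
        for speaker, hearer in ((1, 2), (2, 1)):
            lang = langs[speaker]
            # a low-expressive agent repeats its single string five times
            utterances = lang if len(lang) == 5 else list(lang) * 5
            for w in utterances:
                if w not in langs[hearer]:
                    continue              # not understood: nobody scores
                n = heard[hearer].get(w, 0)
                heard[hearer][w] = n + 1
                s = 0.5 ** n if discounted else 1.0
                score[hearer] += s        # the hearer scores in both schemes
                if scheme == "communication":
                    score[speaker] += s   # under communication, the speaker too
        return round(score[1], 1), round(score[2], 1)

    low, high = ["w1"], ["w1", "w2", "w3", "w4", "w5"]
    print(net_payoffs(low, low))                        # (3.9, 3.9)   table 4.5(a)
    print(net_payoffs(high, low))                       # (2.9, 2.9)
    print(net_payoffs(high, high))                      # (10.0, 10.0)
    print(net_payoffs(high, low, scheme="perception"))  # (1.9, 1.0)   table 4.5(b)
    print(net_payoffs(high, low, discounted=False))     # (6.0, 6.0)   table 4.5(c)

Changing the scheme and the discount in this way reproduces all four panels of table 4.5.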

We suspect that for a full understanding of the behavior of the global model, the very non-linear grammar–language mapping needs to be addressed. As discussed in chapter 3, results from formal language theory and other sources might offer useful tools and insights here and need to be explored. Interestingly, many of the key behaviors of the models in this thesis (such as the indexical regime of chapter 2 and the behavior of communication runs with innovation) seem to depend on the non-linearity and complexity of the mapping, and disappear if one simplifies this mapping too much.

4.4.4 spatial patterns
Spatial models have repeatedly been shown to quite naturally allow for the evolution of altruistic traits [Boerlijst & Hogeweg, 1991b; Nowak & May, 1992]. Two aspects of spatial models are thought to be important for this:

i. Agents tend to be localized near their relatives, which makes “kin selection” more likely to occur. Kin selection, propagating one’s own genetic information by helping relatives, can occur when Hamilton’s inequality holds: b/c > 1/r, where b is the benefit for the recipient, c is the cost for the actor (measured as changes in the expected number of offspring) and r is the relatedness of the actor to the recipient [Hamilton, 1963, discussed in Maynard-Smith & Szathmáry, 1995, sec. 16.2]. Crucial is the average value of r; in the spatial model this is inherently larger.

ii. Spatial patterns might lead to new levels of selection. Although locally (on the “micro-scale”) altruists are outcompeted by selfish agents, patches of altruistic agents grow faster than patches of selfish agents. Evolution on a “meso-scale” level (the level of the patches or other spatial patterns) might favor altruism. This relates to Wilson’s group selection [Wilson, 1980, discussed in Maynard-Smith & Szathmáry, 1995, sec. 4.6], although Wilson imposes a group level, while in spatial models new levels can spontaneously form.

An interesting mechanism in our spatial model would be that relatives tend to have a similar language, such that altruistic speaking is not too harmful, because non-relatives, with less similar languages, cannot profit from it much anyway. In terms of Hamilton’s inequality: b is larger and c smaller for relatives, such that the inequality can be satisfied more easily. The sketch below illustrates this with some hypothetical numbers.
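As a toy illustration of the mechanism just suggested (the numbers are hypothetical, not measured in the model), one can check Hamilton’s inequality for a distant stranger and for a nearby relative with a largely shared language:

    def altruism_favored(b, c, r):
        """Hamilton's rule: helping spreads when b/c > 1/r, with b the benefit
        to the recipient, c the cost to the actor and r their relatedness."""
        return b / c > 1 / r

    # Distant, unrelated hearer: little shared language, so low benefit b.
    print(altruism_favored(b=0.5, c=1.0, r=0.05))   # False: speaking does not pay
    # Nearby relative: shared language raises b and lowers c.
    print(altruism_favored(b=2.0, c=0.5, r=0.5))    # True: speaking can evolve
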

However, initial results suggest that willingness to speak can be retained even when the vast majority of agents can speak and understand all possible strings, and all differences between individual languages have thus faded. Many open questions remain. For instance, under perception settings there is an indirect benefit of speaking that leads to high values of the willingness to speak. Why, then, does this indirect benefit not result in the same disadvantage of unilateral high expressiveness that we observe under communication settings? Such intriguing issues are left for future work.

4.5 Conclusions
This chapter studied the selective advantages of syntax in a global and a spatial model of evolving rewriting grammars. We showed that syntax is not a trivial outcome of the evolution of language in circumstances where communicating about many things is beneficial, which is an implicit assumption in many “adaptationist” accounts of the origins of grammar. In fact, high expressiveness and willingness to speak seem to exclude one another in the global model: high expressiveness can be problematic in the case of “communication”, while willingness to speak can be problematic in the case of “perception”. Together, these results suggest an interesting paradox.

This paradox can be resolved in the spatial model. Spatial pattern formation can explain the evolution of willingness to speak under perception settings. Under some particular mutation parameters, evolution of expressive syntax under communication settings was also observed.

Much work still needs to be done to evaluate how general the observed behaviors are and to understand all underlying mechanisms. We believe the results reported here form a good starting point for such work. Moreover, we believe these results suggest that the fact that language evolution occurs in groups of agents that are not homogeneous and might show spatial patterns will be an important aspect of our understanding of the origins of language.

Chapter 5

Summarizing conclusions
5.1 Summary
The emergence of grammatical language is seen as a major transition in evolution and a turning point in the history of the human species. In chapter 1 we argued that speculations about the origins of the human linguistic abilities, such as Pinker & Bloom [1990], remain unsatisfactory for two main reasons: i) they don’t make use of formal and computational modeling techniques, which can enforce internal consistency and guide the understanding of systems that involve too many interactions to tackle without such tools; ii) they study language, in the tradition of mainstream Chomskyan linguistics, as “syntactic competence”, the hypothesized autonomous, logical system in an individual’s brain. In contrast, this thesis fits into a recent trend of computational and mathematical modeling of language in groups of agents. Results from these studies suggest that many aspects of language are better understood if one also considers the many interactions between language development and social, physical and cognitive constraints. This thesis studied the mechanisms that are inherent in an evolving system of communicating agents whose language potentially shows syntactic structure.

In chapter 2 we found that the fact that the success of one agent depends on the state of other agents, rather than on some objective quality measure, significantly influences the evolutionary process. We identified an indirect feedback loop, where an agent’s rewriting grammar (“internal language”) maps to an individual “external language”, all these individual languages jointly constitute a “group language”, and the group language in turn determines the success of individual grammars. In simulations we observed three different dynamical behaviors.

In chapter 3 we introduced a classification of grammars that enabled us to characterize these dynamical regimes as primarily indexical, compositional and recursive. Some simple calculations qualitatively explain the shape of the corresponding expressiveness vs. time and grammar size curves. Further, a simple analysis of the grammar–language mapping sheds some light on how statistical patterns in a language can reflect the structure of the grammar.

In chapter 4 we broadened the discussion by studying the circumstances under which syntactical (compositional & recursive) languages are likely to arise. We find that if both the speaker and the hearer benefit from successful communication, syntax is hard to obtain. If only the hearer benefits, expressive syntax usually develops, but agents lose the willingness to speak. On the other hand, in a variant of the model where agents are placed on a 2D spatial grid, this willingness to speak can be retained.

5.2 Implications
This study concerns the interaction between group dynamics and evolutionary dynamics. We have seen that social patterns influence the course of evolution. Under some conditions powerful, recursive grammars develop [Hashimoto & Ikegami, 1996]. Obtaining this type of grammar is in fact a hard problem;


the fact that we nevertheless find recursive grammars appears to be due to the social embedding that yields a dynamical fitness landscape. This relates to work showing the benefits of cultural transmission [Batali, 1997; Kirby, 1999], “starting small” [Elman, 1991] and sparse fitness evaluation [Pagie & Hogeweg, 1997].

However, in other circumstances social patterns hinder the development of such grammars. These results are particularly interesting, as these specific circumstances in some sense resemble the situation that is thought to precede the emergence of syntax: large indexical grammars and mutually beneficial communication. In the model we arrive at a paradox, where those selection pressures that lead to syntactical languages also lead to unwillingness to speak. This paradox can be solved if a spatial distribution of agents and local communication is assumed.

Relaxing the idea of explicit selection pressures for syntax, an observation in chapter 3 also points at an alternative mechanism for the development of recursion. The fact that recursive expressiveness (E_R) grows very fast with the number of rules (N) shows that the larger N (i.e. the “storage capacity”), the larger the expected relative fraction of recursive expressiveness. Whereas the traditional view emphasizes that cognitive limitations create the need for syntax, this observation indicates that larger cognitive abilities in fact make recursive expressiveness more likely to dominate; the sketch below illustrates this scaling with a toy grammar. This might explain the apparent paradox that the species with the most extended cognitive abilities is the only species that developed “efficient”, recursive communication.
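A toy count makes this scaling argument concrete. The grammar forms below are assumed for illustration only (they are not the exact grammar classes of chapter 3): an indexical grammar stores one holistic string per rule, while a minimal recursive grammar S → t S | t over N terminal symbols generates all strings up to some maximum length L.

    def indexical_expressiveness(n_rules):
        """One holistic string per rule."""
        return n_rules

    def recursive_expressiveness(n_terminals, max_len):
        """Strings generable by S -> t S | t over n_terminals symbols,
        up to the maximum string length: N + N^2 + ... + N^L."""
        return sum(n_terminals ** length for length in range(1, max_len + 1))

    for n in (2, 5, 10):
        print(n, indexical_expressiveness(n), recursive_expressiveness(n, max_len=5))
    # 2 2 62
    # 5 5 3905
    # 10 10 111110   # the recursive count explodes with n

The larger the storage capacity, the more the recursive count dominates the indexical one, in line with the observation above.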

5.3 Methodology
We started this thesis with some remarks on the particular methodology used in this study. We argued that computational models are relatively precise implementations of a set of assumptions, allowing one to evaluate their internal coherence. Moreover, we argued that such models are productive, because they can show unexpected behaviors and thus generate new concepts and hypotheses. Models can therefore sometimes lead verbal, formal and empirical approaches in new, interesting directions. Although the model studies discussed in this thesis are still very incomplete, we believe they illustrate some of the benefits of such a constructive modeling approach and give a fresh perspective on the issue of the origins of syntax:

i. Traditionally, the origins of language are thought to be explained as either the spontaneous result of human cognitive abilities and social interactions, or the result of an evolutionary updating of our innate language capacity. The model study of chapter 2 shows an example system where both social interaction and evolutionary updating play a role. Not because one part of language can be explained by “nurture” and another part by “nature”, but because they fundamentally interact: social interactions shape the evolutionary process and vice versa.

ii. Formal language theory and linguistics have deemed finite languages and grammars uninteresting. Nevertheless, we find that the grammar–language mappings in the models of chapters 2 and 4 are not trivial, even though string length, parsing and derivation steps are all necessarily finite. The analysis of chapter 3 suggests that some dynamical and statistical properties can in fact be quite interesting. Examples of such properties are the way expressiveness scales with grammar size and maximum string length, and the way statistical patterns in the language reflect grammatical structure.

iii. Traditionally, language and the evolution of language are studied in terms of how much information about the outside world can be transmitted. The results of chapter 4 suggest that this might not always be the most interesting way of looking at language. Under perception settings, not being understood by your competitors plays a role. Under communication settings, higher expressiveness is disadvantageous if the rest of the population lags behind. In the spatial model, communicating with relatives is beneficial, while communicating with non-relatives is not. These results show that language can have its own dynamics within a group, dynamics that are quite independent of how well the language represents the outside world.

5.4 Future work

Interestingly, the simple game-theoretic mini-model that explains many of these observations does not include “evolution”. Predictions about the course of evolution are made by looking at the scores agents obtain in the qualitatively different situations. Such simple observations are just as valid if the lang
