当前位置:首页 >> >>

ABSTRACT Learning to Map between Ontologies on the Semantic Web

Learning to Map between Ontologies on the Semantic Web
AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy
Computer Science and Engineering University of Washington, Seattle, WA, USA
{anhai, jayant, pedrod, alon}@cs.washington.edu

Ontologies play a prominent role on the Semantic Web. They make possible the widespread publication of machine understandable data, opening myriad opportunities for automated information processing. However, because of the Semantic Web’s distributed nature, data on it will inevitably come from many di?erent ontologies. Information processing across ontologies is not possible without knowing the semantic mappings between their elements. Manually ?nding such mappings is tedious, error-prone, and clearly not possible at the Web scale. Hence, the development of tools to assist in the ontology mapping process is crucial to the success of the Semantic Web. We describe GLUE, a system that employs machine learning techniques to ?nd such mappings. Given two ontologies, for each concept in one ontology GLUE ?nds the most similar concept in the other ontology. We give well-founded probabilistic de?nitions to several practical similarity measures, and show that GLUE can work with all of them. This is in contrast to most existing approaches, which deal with a single similarity measure. Another key feature of GLUE is that it uses multiple learning strategies, each of which exploits a di?erent type of information either in the data instances or in the taxonomic structure of the ontologies. To further improve matching accuracy, we extend GLUE to incorporate commonsense knowledge and domain constraints into the matching process. For this purpose, we show that relaxation labeling, a well-known constraint optimization technique used in computer vision and other ?elds, can be adapted to work e?ciently in our context. Our approach is thus distinguished in that it works with a variety of well-de?ned similarity notions and that it e?ciently incorporates multiple types of knowledge. We describe a set of experiments on several real-world domains, and show that GLUE proposes highly accurate semantic mappings.

Semantic Web, Ontology Mapping, Machine Learning, Relaxation Labeling.



The current World-Wide Web has well over 1.5 billion pages [3], but the vast majority of them are in humanreadable format only (e.g., HTML). As a consequence software agents (softbots) cannot understand and process this information, and much of the potential of the Web has so far remained untapped. In response, researchers have created the vision of the Semantic Web [6], where data has structure and ontologies describe the semantics of the data. Ontologies allow users to organize information into taxonomies of concepts, each with their attributes, and describe relationships between concepts. When data is marked up using ontologies, softbots can better understand the semantics and therefore more intelligently locate and integrate data for a wide variety of tasks. The following example illustrates the vision of the Semantic Web. Example 1.1. Suppose you want to ?nd out more about someone you met at a conference. You know that his last name is Cook, and that he teaches Computer Science at a nearby university, but you do not know which one. You also know that he just moved to the US from Australia, where he had been an associate professor at his alma mater. On the World-Wide Web of today you will have trouble ?nding this person. The above information is not contained within a single Web page, thus making keyword search ineffective. On the Semantic Web, however, you should be able to quickly ?nd the answers. A marked-up directory service makes it easy for your personal softbot to ?nd nearby Computer Science departments. These departments have marked up data using some ontology such as the one in Figure 1.a. Here the data is organized into a taxonomy that includes courses, people, and professors. Professors have attributes such as name, degree, and degree-granting institution. Such marked-up data makes it easy for your softbot to ?nd a professor with the last name Cook. Then by examining the attribute “granting institution”, the softbot quickly ?nds the alma mater CS department in Australia. Here, the softbot learns that the data has been marked up using an ontology speci?c to Australian universities, such as the one in Figure 1.b, and that there are many entities named Cook. However, knowing that “associate professor” is equivalent to “senior lecturer”, the bot can select the right subtree in the

Categories and Subject Descriptors
I.2.6 [Computing Methodologies]: Arti?cial Intelligence— Learning; H.2.5 [Information Systems]: Database Management—Heterogenous Databases, Data translation

General Terms
Algorithms, Design, Experimentation.
Copyright is held by the author/owner(s).

WWW2002, May 7–11, 2002, Honolulu, Hawaii, USA.
ACM 1-58113-449-5/02/0005.

departmental taxonomy, and zoom in on the old homepage of your conference acquaintance. 2 The Semantic Web thus o?ers a compelling vision, but it also raises many di?cult challenges. Researchers have been actively working on these challenges, focusing on ?eshing out the basic architecture, developing expressive and e?cient ontology languages, building techniques for e?cient marking up of data, and learning ontologies (e.g., [15, 8, 30, 23, 4]). A key challenge in building the Semantic Web, one that has received relatively little attention, is ?nding semantic mappings among the ontologies. Given the de-centralized nature of the development of the Semantic Web, there will be an explosion in the number of ontologies. Many of these ontologies will describe similar domains, but using di?erent terminologies, and others will have overlapping domains. To integrate data from disparate ontologies, we must know the semantic correspondences between their elements [6, 35]. For example, in the conference-acquaintance scenario described earlier, in order to ?nd the right person, your softbot must know that “associate professor” in the US corresponds to “senior lecturer” in Australia. Thus, the semantic correspondences are in e?ect the “glue” that hold the ontologies together into a “web of semantics”. Without them, the Semantic Web is akin to an electronic version of the Tower of Babel. Unfortunately, manually specifying such correspondences is time-consuming, error-prone [28], and clearly not possible on the Web scale. Hence, the development of tools to assist in ontology mapping is crucial to the success of the Semantic Web [35]. In this paper we describe the GLUE system, which applies machine learning techniques to semi-automatically create such semantic mappings. Since taxonomies are central components of ontologies, we focus ?rst on ?nding correspondences among the taxonomies of two given ontologies: for each concept node in one taxonomy, ?nd the most similar concept node in the other taxonomy. The ?rst issue we address in this realm is: what is the meaning of similarity between two concepts? Clearly, many di?erent de?nitions of similarity are possible, each being appropriate for certain situations. Our approach is based on the observation that many practical measures of similarity can be de?ned based solely on the joint probability distribution of the concepts involved. Hence, instead of committing to a particular de?nition of similarity, GLUE calculates the joint distribution of the concepts, and lets the application use the joint distribution to compute any suitable similarity measure. Speci?cally, for any two concepts A and B, we compute P (A, B), P (A, B), P (A, B), and P (A, B), where a term such as P (A, B) is the probability that an instance in the domain belongs to concept A but not to concept B. An application can then de?ne similarity to be a suitable function of these four values. For example, a similarity measure we use in this paper is P (A∩B)/P (A∪B), otherwise known as the Jaccard coe?cient [36]. The second challenge we address is that of computing the joint distribution of any two given concepts A and B. Under certain general assumptions (discussed in Section 4), a term such as P (A, B) can be approximated as the fraction of instances that belong to both A and B (in the data associated with the taxonomies or, more generally, in the probability distribution that generated it). Hence, the problem reduces to deciding for each instance if it belongs to A ∩ B. However, the input to our problem includes instances of A and

instances of B in isolation. GLUE addresses this problem using machine learning techniques as follows: it uses the instances of A to learn a classi?er for A, and then classi?es instances of B according to that classi?er, and vice-versa. Hence, we have a method for identifying instances of A ∩ B. Applying machine learning to our context raises the question of which learning algorithm to use and which types of information to use in the learning process. Many di?erent types of information can contribute toward deciding the membership of an instance: its name, value format, the word frequencies in its value, and each of these is best utilized by a di?erent learning algorithm. GLUE uses a multi-strategy learning approach [12]: we employ a set of learners, then combine their predictions using a meta-learner. In previous work [12] we have shown that multi-strategy learning is effective in the context of mapping between database schemas. Finally, GLUE attempts to exploit available domain constraints and general heuristics in order to improve matching accuracy. An example heuristic is the observation that two nodes are likely to match if nodes in their neighborhood also match. An example of a domain constraint is “if node X matches Professor and node Y is an ancestor of X in the taxonomy, then it is unlikely that Y matches AssistantProfessor”. Such constraints occur frequently in practice, and heuristics are commonly used when manually mapping between ontologies. Previous works have exploited only one form or the other of such knowledge and constraints, in restrictive settings [29, 26, 21, 25]. Here, we develop a unifying approach to incorporate all such types of information. Our approach is based on relaxation labeling, a powerful technique used extensively in the vision and image processing community [16], and successfully adapted to solve matching and classi?cation problems in natural language processing [31] and hypertext classi?cation [10]. We show that relaxation labeling can be adapted e?ciently to our context, and that it can successfully handle a broad variety of heuristics and domain constraints. In the rest of the paper we describe the GLUE system and the experiments we conducted to validate it. Speci?cally, the paper makes the following contributions: ? We describe well-founded notions of semantic similarity, based on the joint probability distribution of the concepts involved. Such notions make our approach applicable to a broad range of ontology-matching problems that employ di?erent similarity measures. ? We describe the use of multi-strategy learning for ?nding the joint distribution, and thus the similarity value of any concept pair in two given taxonomies. The GLUE system, embodying our approach, utilizes many di?erent types of information to maximize matching accuracy. Multi-strategy learning also makes our system easily extensible to additional learners, as they become available. ? We introduce relaxation labeling to the ontology-matching context, and show that it can be adapted to e?ciently exploit a broad range of common knowledge and domain constraints to further improve matching accuracy. ? We describe a set of experiments on several real-world domains to validate the e?ectiveness of GLUE. The results show the utility of multi-strategy learning and

CS Dept US

CS Dept Australia

UnderGrad Courses

Grad Courses Faculty

People Staff

Courses Academic Staff

Staff Technical Staff

Assistant Professor
- name - degree - granting -institution

Associate Professor
K. Burn Ph.D. Univ. of Michigan R.Cook Ph.D. Univ. of Sydney


- first-name - last-name - education

Senior Lecturer




Figure 1: Computer Science Department Ontologies relaxation labeling, and that GLUE can work well with di?erent notions of similarity. In the next section we de?ne the ontology-matching problem. Section 3 discusses our approach to measuring similarity, and Sections 4-5 describe the GLUE system. Section 6 presents our experiments. Section 7 reviews related work. Section 8 discusses future work and concludes. above. Given two ontologies, the ontology-matching problem is to ?nd semantic mappings between them. The simplest type of mapping is a one-to-one (1-1) mapping between the elements, such as “Associate-Professor maps to Senior-Lecturer”, and “degree maps to education”. Notice that mappings between di?erent types of elements are possible, such as “the relation AdvisedBy(Student,Professor) maps to the attribute advisor of the concept Student”. Examples of more complex types of mapping include “name maps to the concatenation of ?rst-name and last-name”, and “the union of UndergradCourses and Grad-Courses maps to Courses”. In general, a mapping may be speci?ed as a query that transforms instances in one ontology into instances in the other [9]. In this paper we focus on ?nding 1-1 mappings between the taxonomies. This is because taxonomies are central components of ontologies, and successfully matching them would greatly aid in matching the rest of the ontologies. Extending matching to attributes and relations and considering more complex types of matching is the subject of ongoing research. There are many ways to formulate a matching problem for taxonomies. The speci?c problem that we consider is as follows: given two taxonomies and their associated data instances, for each node (i.e., concept) in one taxonomy, ?nd the most similar node in the other taxonomy, for a prede?ned similarity measure. This is a very general problem setting that makes our approach applicable to a broad range of common ontology-related problems on the Semantic Web, such as ontology integration and data translation among the ontologies. Data instances: GLUE makes heavy use of the fact that we have data instances associated with the ontologies we are matching. We note that many real-world ontologies already have associated data instances. Furthermore, on the Semantic Web, the largest bene?ts of ontology matching come from matching the most heavily used ontologies; and the more heavily an ontology is used for marking up data, the more data it has. Finally, we show in our experiments that only a moderate number of data instances is necessary in order to obtain good matching accuracy.

We now introduce ontologies, then de?ne the problem of ontology matching. An ontology speci?es a conceptualization of a domain in terms of concepts, attributes, and relations [14]. The concepts provided model entities of interest in the domain. They are typically organized into a taxonomy tree where each node represents a concept and each concept is a specialization of its parent. Figure 1 shows two sample taxonomies for the CS department domain (which are simpli?cations of real ones). Each concept in a taxonomy is associated with a set of instances. For example, concept Associate-Professor has instances “Prof. Cook” and “Prof. Burn” as shown in Figure 1.a. By the taxonomy’s de?nition, the instances of a concept are also instances of an ancestor concept. For example, instances of Assistant-Professor, Associate-Professor, and Professor in Figure 1.a are also instances of Faculty and People. Each concept is also associated with a set of attributes. For example, the concept Associate-Professor in Figure 1.a has the attributes name, degree, and granting-institution. An instance that belongs to a concept has ?xed attribute values. For example, the instance “Professor Cook” has value name = “R. Cook”, degree = “Ph.D.”, and so on. An ontology also de?nes a set of relations among its concepts. For example, a relation AdvisedBy(Student,Professor) might list all instance pairs of Student and Professor such that the former is advised by the latter. Many formal languages to specify ontologies have been proposed for the Semantic Web, such as OIL, DAML+OIL, SHOE, and RDF [8, 2, 15, 7]. Though these languages di?er in their terminologies and expressiveness, the ontologies that they model essentially share the same features we described

To match concepts between two taxonomies, we need a notion of similarity. We now describe the similarity measures that GLUE handles; but before doing that, we discuss the motivations leading to our choices. First, we would like the similarity measures to be wellde?ned. A well-de?ned measure will facilitate the evaluation of our system. It also makes clear to the users what the system means by a match, and helps them ?gure out whether the system is applicable to a given matching scenario. Furthermore, a well-de?ned similarity notion may allow us to leverage special-purpose techniques for the matching process. Second, we want the similarity measures to correspond to our intuitive notions of similarity. In particular, they should depend only on the semantic content of the concepts involved, and not on their syntactic speci?cation. Finally, it is clear that many reasonable similarity measures exist, each being appropriate to certain situations. Hence, to maximize our system’s applicability, we would like it to be able to handle a broad variety of similarity measures. The following examples illustrate the variety of possible de?nitions of similarity. Example 3.1. In searching for your conference acquaintance, your softbot should use an “exact” similarity measure that maps Associate-Professor into Senior Lecturer, an equivalent concept. However, if the softbot has some postprocessing capabilities that allow it to ?lter data, then it may tolerate a “most-speci?c-parent” similarity measure that maps Associate-Professor to Academic-Sta?, a more general concept. 2 Example 3.2. A common task in ontology integration is to place a concept A into an appropriate place in a taxonomy T . One way to do this is to (a) use an “exact” similarity measure to ?nd the concept B in T that is “most similar” to A, (b) use a “most-speci?c-parent” similarity measure to ?nd the concept C in T that is the most speci?c superset concept of A, (c) use a “most-general-child” similarity measure to ?nd the concept D in T that is the most general subset concept of A, then (d) decide on the placement of A, based on B, C, and D. 2 Example 3.3. Certain applications may even have di?erent similarity measures for di?erent concepts. Suppose that a user tells the softbot to ?nd houses in the range of $300500K, located in Seattle. The user expects that the softbot will not return houses that fail to satisfy the above criteria. Hence, the softbot should use exact mappings for price and address. But it may use approximate mappings for other concepts. If it maps house-description into neighborhood-info, that is still acceptable. 2 Most existing works in ontology (and schema) matching do not satisfy the above motivating criteria. Many works implicitly assume the existence of a similarity measure, but never de?ne it. Others de?ne similarity measures based on the syntactic clues of the concepts involved. For example, the similarity of two concepts might be computed as the dot product of the two TF/IDF (Term Frequency/Inverse Document Frequency) vectors representing the concepts, or a function based on the common tokens in the names of the concepts. Such similarity measures are problematic because

they depend not only on the concepts involved, but also on their syntactic speci?cations.


Distribution-based Similarity Measures

We now give precise similarity de?nitions and show how our approach satis?es the motivating criteria. We begin by modeling each concept as a set of instances, taken from a ?nite universe of instances. In the CS domain, for example, the universe consists of all entities of interest in that world: professors, assistant professors, students, courses, and so on. The concept Professor is then the set of all instances in the universe that are professors. Given this model, the notion of the joint probability distribution between any two concepts A and B is well de?ned. This distribution consists of the four probabilities: P (A, B), P (A, B), P (A, B), and P (A, B). A term such as P (A, B) is the probability that a randomly chosen instance from the universe belongs to A but not to B, and is computed as the fraction of the universe that belongs to A but not to B. Many practical similarity measures can be de?ned based on the joint distribution of the concepts involved. For instance, a possible de?nition for the “exact” similarity measure in Example 3.1 is Jaccard-sim(A, B) = P (A ∩ B)/P (A ∪ B) P (A, B) (1) = P (A, B) + P (A, B) + P (A, B)

This similarity measure is known as the Jaccard coe?cient [36]. It takes the lowest value 0 when A and B are disjoint, and the highest value 1 when A and B are the same concept. Most of our experiments will use this similarity measure. A de?nition for the “most-speci?c-parent” similarity measure in Example 3.2 is M SP (A, B) = P (A|B) 0 if P (B|A) = 1 otherwise (2)

where the probabilities P (A|B) and P (B|A) can be trivially expressed in terms of the four joint probabilities. This definition states that if B subsumes A, then the more speci?c B is, the higher P (A|B), and thus the higher the similarity value M SP (A, B) is. Thus it suits the intuition that the most speci?c parent of A in the taxonomy is the smallest set that subsumes A. An analogous de?nition can be formulated for the “most-general-child” similarity measure. Instead of trying to estimate speci?c similarity values directly, GLUE focuses on computing the joint distributions. Then, it is possible to compute any of the above mentioned similarity measures as a function over the joint distributions. Hence, GLUE has the signi?cant advantage of being able to work with a variety of similarity functions that have well-founded probabilistic interpretations.



We now describe GLUE in detail. The basic architecture of GLUE is shown in Figure 2. It consists of three main modules: Distribution Estimator, Similarity Estimator, and Relaxation Labeler. The Distribution Estimator takes as input two taxonomies O1 and O2 , together with their data instances. Then it applies machine learning techniques to compute for every pair of concepts A ∈ O1 , B ∈ O2 their joint probability distribution. Recall from Section 3 that this joint distribution

Mappings for O 1 , Mappings for O 2

Relaxation Labeler Common knowledge & Domain constraints

Similarity Matrix

Similarity Estimator

the taxonomies may be overlapping, but are not necessarily so. To estimate P (A, B), we make the general assumption that the set of instances of each input taxonomy is a representative sample of the instance universe covered by the taxonomy.1 We denote by Ui the set of instances given for taxonomy Oi , by N (Ui ) the size of Ui , and by N (UiA,B ) the number of instances in Ui that belong to both A and B. With the above assumption, P (A, B) can be estimated by the following equation:2
A,B A,B P (A, B) = [N (U1 ) + N (U2 )] / [N (U1 ) + N (U2 )], (3)

Similarity function

Joint Distributions: P(A,B), P(A, notB ), ...

Meta Learner M Distribution Estimator Base Learner L 1 Base Learner Lk

Taxonomy O 1 (tree structure + data instances)

Taxonomy O 2 (tree structure + data instances)

Figure 2: The GLUE Architecture

consists of four numbers: P (A, B), P (A, B), P (A, B), and P (A, B). Thus a total of 4|O1 ||O2 | numbers will be computed, where |Oi | is the number of nodes (i.e., concepts) in taxonomy Oi . The Distribution Estimator uses a set of base learners and a meta-learner. We describe the learners and the motivation behind them in Section 4.2. Next, GLUE feeds the above numbers into the Similarity Estimator, which applies a user-supplied similarity function (such as the ones in Equations 1 or 2) to compute a similarity value for each pair of concepts A ∈ O1 , B ∈ O2 . The output from this module is a similarity matrix between the concepts in the two taxonomies. The Relaxation Labeler module then takes the similarity matrix, together with domain-speci?c constraints and heuristic knowledge, and searches for the mapping con?guration that best satis?es the domain constraints and the common knowledge, taking into account the observed similarities. This mapping con?guration is the output of GLUE. We now describe the Distribution Estimator. First, we discuss the general machine-learning technique used to estimate joint distributions from data, and then the use of multi-strategy learning in GLUE. Section 5 describes the Relaxation Labeler. The Similarity Estimator is trivial because it simply applies a user-de?ned function to compute the similarity of two concepts from their joint distribution, and hence is not discussed further.

A,B Computing P (A, B) then reduces to computing N (U1 ) A,B A,B and N (U2 ). Consider N (U2 ). We can compute this quantity if we know for each instance s in U2 whether it belongs to both A and B. One part is easy: we already know whether s belongs to B – if it is explicitly speci?ed as an instance of B or of any descendant node of B. Hence, we only need to decide whether s belongs to A. This is where we use machine learning. Speci?cally, we partition U1 , the set of instances of ontology O1 , into the set of instances that belong to A and the set of instances that do not belong to A. Then, we use these two sets as positive and negative examples, respectively, to train a classi?er for A. Finally, we use the classi?er to predict whether instance s belongs to A. In summary, we estimate the joint probability distribution of A and B as follows (the procedure is illustrated in Figure 3): A A 1. Partition U1 , into U1 and U1 , the set of instances that do and do not belong to A, respectively (Figures 3.ab). A A 2. Train a learner L for instances of A, using U1 and U1 as the sets of positive and negative training examples, respectively.

3. Partition U2 , the set of instances of taxonomy O2 , into B B U2 and U2 , the set of instances that do and do not belong to B, respectively (Figures 3.d-e).
B 4. Apply learner L to each instance in U2 (Figure 3.e). A,B A,B B This partitions U2 into the two sets U2 and U2 B shown in Figure 3.f. Similarly, applying L to U2 reA,B A,B sults in the two sets U2 and U2 .

5. Repeat Steps 1-4, but with the roles of taxonomies O1 A,B A,B and O2 being reversed, to obtain the sets U1 , U1 , A,B A,B U1 , and U1 . 6. Finally, compute P (A, B) using Formula 3. The remaining three joint probabilities are computed in a A,B A,B similar manner, using the sets U2 , . . . , U1 computed in Steps 4-5. This is a standard assumption in machine learning and statistics, and seems appropriate here, unless the available instances were generated in some unusual way. 2 Notice that N (UiA,B )/N (Ui ) is also a reasonable approximation of P (A, B), but it is estimated based only on the data of Oi . The estimation in (3) is likely to be more accurate because it is based on more data, namely, the data of both O1 and O2 .

4.1 The Distribution Estimator
Consider computing the value of P (A, B). This joint probability can be computed as the fraction of the instance universe that belongs to both A and B. In general we cannot compute this fraction because we do not know every instance in the universe. Hence, we must estimate P (A, B) based on the data we have, namely, the instances of the two input taxonomies. Note that the instances that we have for


A U1

G Trained Learner L U1
not A

B U2

A,B U2

not U2 A,B

A E t1, t2 F

C t5

D t6, t7

t1, t2, t3, t4 t5, t6, t7

s1 B I s2, s3 J s4

H s5, s6

s1, s2, s3, s4 s5, s6
not U2 B


s1, s3 s5

s2, s4 s6
not U2 A,not B

t3, t4

U A,not B

Taxonomy O 1 (a) (b) (c)

Taxonomy O 2 (d) (e) (f)

Figure 3: Estimating the joint distribution of concepts A and B By applying the above procedure to all pairs of concepts A ∈ O1 , B ∈ O2 we obtain all joint distributions of interest. where the wj are tokens. To make a prediction, the Content Learner needs to compute the probability that an input instance is an instance of A, given its tokens, i.e., P (A|d). Using Bayes’ theorem, P (A|d) can be rewritten as P (d|A)P (A)/P (d). Fortunately, two of these values can be estimated using the training instances, and the third, P (d), can be ignored because it is just a normalizing constant. Speci?cally, P (A) is estimated as the portion of training instances that belong to A. To compute P (d|A), we assume that the tokens wj appear in d independently of each other given A (this is why the method is called naive Bayes). With this assumption, we have P (d|A) = P (w1 |A)P (w2 |A) · · · P (wk |A) P (wj |A) is estimated as n(wj , A)/n(A), where n(A) is the total number of token positions of all training instances that belong to A, and n(wj , A) is the number of times token wj appears in all training instances belonging to A. Even though the independence assumption is typically not valid, the Naive Bayes learner still performs surprisingly well in many domains, notably text-based ones (see [13] for an explanation). We compute P (A|d) in a similar manner. Hence, the Content Learner predicts A with probability P (A|d), and A with the probability P (A|d). The Content Learner works well on long textual elements, such as course descriptions, or elements with very distinct and descriptive values, such as color (red, blue, green, etc.). It is less e?ective with short, numeric elements such as course numbers or credits. The Name Learner: This learner is similar to the Content Learner, but makes predictions using the full name of the input instance, instead of its content. The full name of an instance is the concatenation of concept names leading from the root of the taxonomy to that instance. For example, the full name of instance with the name s4 in taxonomy O2 (Figure 3.d) is “G B J s4 ”. This learner works best on speci?c and descriptive names. It does not do well with names that are too vague or vacuous. The Meta-Learner: The predictions of the base learners are combined using the meta-learner. The meta-learner assigns to each base learner a learner weight that indicates how much it trusts that learner’s predictions. Then it combines the base learners’ predictions via a weighted sum. For example, suppose the weights of the Content Learner and the Name Learner are 0.6 and 0.4, respectively. Suppose further that for instance s4 of taxonomy O2 (Figure 3.d) the Content Learner predicts A with probability 0.8 and A

4.2 Multi-Strategy Learning
Given the diversity of machine learning methods, the next issue is deciding which one to use for the procedure we described above. A key observation in our approach is that there are many di?erent types of information that a learner can glean from the training instances, in order to make predictions. It can exploit the frequencies of words in the text value of the instances, the instance names, the value formats, the characteristics of value distributions, and so on. Since di?erent learners are better at utilizing di?erent types of information, GLUE follows [12] and takes a multistrategy learning approach. In Step 2 of the above estimation procedure, instead of training a single learner L, we train a set of learners L1 , . . . , Lk , called base learners. Each base learner exploits well a certain type of information from the training instances to build prediction hypotheses. Then, to classify an instance in Step 4, we apply the base learners to the instance and combine their predictions using a meta-learner. This way, we can achieve higher classi?cation accuracy than with any single base learner alone, and therefore better approximations of the joint distributions. The current implementation of GLUE has two base learners, Content Learner and Name Learner, and a meta-learner that is a linear combination of the base learners. We now describe these learners in detail. The Content Learner: This learner exploits the frequencies of words in the textual content of an instance to make predictions. Recall that an instance typically has a name and a set of attributes together with their values. In the current version of GLUE, we do not handle attributes directly; rather, we treat them and their values as the textual content of the instance3 . For example, the textual content of the instance “Professor Cook” is “R. Cook, Ph.D., University of Sidney, Australia”. The textual content of the instance “CSE 342” is the text content of this course’ homepage. The Content Learner employs the Naive Bayes learning technique [13], one of the most popular and e?ective text classi?cation methods. It treats the textual content of each input instance as a bag of tokens, which is generated by parsing and stemming the words and symbols in the content. Let d = {w1 , . . . , wk } be the content of an input instance, However, more sophisticated learners can be developed that deal explicitly with the attributes, such as the XML Learner in [12].

with probability 0.2, and the Name Learner predicts A with probability 0.3 and A with probability 0.7. Then the MetaLearner predicts A with probability 0.8 · 0.6 + 0.3 · 0.4 = 0.6 and A with probability 0.2 · 0.6 + 0.7 · 0.4 = 0.4. In the current GLUE system, the learner weights are set manually, based on the characteristics of the base learners and the taxonomies. However, they can also be set automatically using a machine learning approach called stacking [37, 34], as we have shown in [12].

1 Sigmoid(x) 0.9 0.8 0.7 0.6 P(x) 0.5 0.4 0.3 0.2 0.1

We now describe the Relaxation Labeler, which takes the similarity matrix from the Similarity Estimator, and searches for the mapping con?guration that best satis?es the given domain constraints and heuristic knowledge. We ?rst describe relaxation labeling, then discuss the domain constraints and heuristic knowledge employed in our approach.

0 -10


0 x



Figure 4: The sigmoid function

P (X = L|?K )


P (X = L, MX |?K ) P (X = L|MX , ?K )P (MX |?K ) (4)

5.1 Relaxation Labeling
Relaxation labeling is an e?cient technique to solve the problem of assigning labels to nodes of a graph, given a set of constraints. The key idea behind this approach is that the label of a node is typically in?uenced by the features of the node’s neighborhood in the graph. Examples of such features are the labels of the neighboring nodes, the percentage of nodes in the neighborhood that satisfy a certain criterion, and the fact that a certain constraint is satis?ed or not. Relaxation labeling exploits this observation. The in?uence of a node’s neighborhood on its label is quanti?ed using a formula for the probability of each label as a function of the neighborhood features. Relaxation labeling assigns initial labels to nodes based solely on the intrinsic properties of the nodes. Then it performs iterative local optimization. In each iteration it uses the formula to change the label of a node based on the features of its neighborhood. This continues until labels do not change from one iteration to the next, or some other convergence criterion is reached. Relaxation labeling appears promising for our purposes because it has been applied successfully to similar matching problems in computer vision, natural language processing, and hypertext classi?cation [16, 31, 10]. It is relatively ef?cient, and can handle a broad range of constraints. Even though its convergence properties are not yet well understood (except in certain cases) and it is liable to converge to a local maxima, in practice it has been found to perform quite well [31, 10]. We now explain how to apply relaxation labeling to the problem of mapping from taxonomy O1 to taxonomy O2 . We regard nodes (concepts) in O2 as labels, and recast the problem as ?nding the best label assignment to nodes (concepts) in O1 , given all knowledge we have about the domain and the two taxonomies. Our goal is to derive a formula for updating the probability that a node takes a label based on the features of the neighborhood. Let X be a node in taxonomy O1 , and L be a label (i.e., a node in O2 ). Let ?K represent all that we know about the domain, namely, the tree structures of the two taxonomies, the sets of instances, and the set of domain constraints. Then we have the following conditional probability =

where the sum is over all possible label assignments MX to all nodes other than X in taxonomy O1 . Assuming that the nodes’ label assignments are independent of each other given ?K , we have P (MX |?K ) =
(Xi =Li )∈MX

P (Xi = Li |?K )


Consider P (X = L|MX , ?K ). MX and ?K constitutes all that we know about the neighborhood of X. Suppose now that the probability of X getting label L depends only on the values of n features of this neighborhood, where each feature is a function fi (MX , ?K , X, L). As we explain later in this section, each such feature corresponds to one of the heuristics or domain constraints that we wish to exploit. Then P (X = L|MX , ?K ) = P (X = L|f1 , . . . , fn ) (6)

If we have access to previously-computed mappings between taxonomies in the same domain, we can use them as the training data from which to estimate P (X = L|f1 , . . . , fn ) (see [10] for an example of this in the context of hypertext classi?cation). However, here we will assume that such mappings are not available. Hence we use alternative methods to quantify the in?uence of the features on the label assignment. In particular, we use the sigmoid or logistic function σ(x) = 1/(1 + e?x ), where x is a linear combination of the features fk , to estimate the above probability. This function is widely used to combine multiple sources of evidence [5]. The general shape of the sigmoid is as shown in Figure 4. Thus: P (X = L|f1 , . . . , fn ) ∝ σ(α1 · f1 + · · · + αn · fn ) (7)

where ∝ denotes “proportional to”, and the weight αk indicates the importance of feature fk . The sigmoid is essentially a smoothed threshold function, which makes it a good candidate for use in combining evidence from the di?erent features. If the total evidence is

Constraint Types

Examples Two nodes match if their children also match. Two nodes match if their parents match and at least x% of their children also match. Two nodes match if their parents match and some of their desce dants also match. n If all children of node X match node Y, then X also matches Y. If node Y is a descendant of node X, and Y matches PROFESSOR, then it unlikely that X matches ASST PROFESSOR. is If node Y is NOT a descendant of node X, and Y matches PROFESSOR, then it is unlikely that X matches FACULTY. There can be at most one node that matches DEPARTMENT CHAIR. If a node in the neighborhood of node X matches ASSOC PROFESSOR, then the chance thatX matches PROFESSOR is increased.



Union Subsumption




Table 1: Examples of constraints that can be exploited to improve matching accuracy. below a certain value, it is unlikely that the nodes match; above this threshold, they probably do. By substituting Equations 5-7 into Equation 4, we obtain

P (X = L|?K ) ∝


αk fk (MX , ?K , X, L) P (Xi = Li |?K )

× (8)

(Xi =Li )∈MX

The proportionality constant is found by renormalizing the probabilities of all the labels to sum to one. Notice that this equation expresses the probabilities P (X = L|?K ) for the various nodes in terms of each other. This is the iterative equation that we use for relaxation labeling. In our implementation, we optimized relaxation labeling for e?ciency in a number of ways that take advantage of the speci?c structure of the ontology matching problem. Space limitations preclude discussing these optimizations here, but see Section 6 for a discussion on the running time of the Relaxation Labeler.

f1 (MX , ?K , X, L) that is the percentage of X’s children that match a child of L, under the given MX mapping. Thus f1 is a numeric feature that takes values from 0 to 1. Next, we assign to fi a positive weight αi . This has the intuitive e?ect that, all other things being equal, the higher the value fi (i.e., the percentage of matching children), the higher the probability of X matching L is. As another example, consider the constraint c2 : “if node Y is a descendant of node X, and Y matches PROFESSOR, then it is unlikely that X matches ASST-PROFESSOR”. The corresponding feature, f2 (MX , ?K , X, L), is 1 if the condition “there exists a descendant of X that matches PROFESSOR” is satis?ed, given the MX mapping con?guration, and 0 otherwise. Clearly, when this feature takes value 1, we want to substantially reduce the probability that X matches ASST-PROFESSOR. We model this e?ect by assigning to f2 a negative weight α2 .



5.2 Constraints
Table 1 shows examples of the constraints currently used in our approach and their characteristics. We distinguish between two types of constraints: domain-independent and dependent constraints. Domain-independent constraints convey our general knowledge about the interaction between related nodes. Perhaps the most widely used such constraint is the Neighborhood Constraint: “two nodes match if nodes in their neighborhood also match”, where the neighborhood is de?ned to be the children, the parents, or both [29, 21, 26] (see Table 1). Another example is the Union Constraint: “if all children of a node A match node B, then A also matches B”. This constraint is speci?c to the taxonomy context. It exploits the fact that A is the union of all its children. Domain-dependent constraints convey our knowledge about the interaction between speci?c nodes in the taxonomies. Table 1 shows examples of three types of domain-dependent constraints. To incorporate the constraints into the relaxation labeling process, we model each constraint ci as a feature fi of the neighborhood of node X. For example, consider the constraint c1 : “two nodes are likely to match if their children match”. To model this constraint, we introduce the feature

We have evaluated GLUE on several real-world domains. Our goals were to evaluate the matching accuracy of GLUE, to measure the relative contribution of the di?erent components of the system, and to verify that GLUE can work well with a variety of similarity measures. Domains and Taxonomies: We evaluated GLUE on three domains, whose characteristics are shown in Table 2. The domains Course Catalog I and II describe courses at Cornell University and the University of Washington. The taxonomies of Course Catalog I have 34 - 39 nodes, and are fairly similar to each other. The taxonomies of Course Catalog II are much larger (166 - 176 nodes) and much less similar to each other. Courses are organized into schools and colleges, then into departments and centers within each college. The Company Pro?le domain uses ontologies from Yahoo.com and TheStandard.com and describes the current business status of companies. Companies are organized into sectors, then into industries within each sector4 . In each domain we downloaded two taxonomies. For each taxonomy, we downloaded the entire set of data instances,
4 Many ontologies are also available from research resources (e.g., DAML.org, semanticweb.org, OntoBroker [1], SHOE, OntoAgents). However, they currently have no or very few data instances.

Taxonomies Course Catalog I Course Catalog II Company Profiles Cornell Washington Cornell Washington Standard.com Yahoo.com

# nodes 34 39 176 166 333 115

# non-leaf nodes 6 8 27 25 30 13

depth 4 4 4 4 3 3

# instances in taxonomy 1526 1912 4360 6957 13634 9504

max # instances at a leaf 155 214 161 214 222 656

max # children of a node 10 11 27 49 29 25

# manual mappings created 34 37 54 50 236 104

Table 2: Domains and taxonomies for our experiments.
Name Learner
100 90
Matching accuracy (%)

Content Learner

Meta Learner

Relaxation Labeler

80 70 60 50 40 30 20 10 0 Cornell to Wash. Wash. to Cornell Cornell to Wash. Wash. to Cornell Standard to Yahoo Yahoo to Standard

Course Catalog I

Course Catalog II

Company Profile

Figure 5: Matching accuracy of GLUE. and performed some trivial data cleaning such as removing HTML tags and phrases such as “course not o?ered” from the instances. We also removed instances of size less than 130 bytes, because they tend to be empty or vacuous, and thus do not contribute to the matching process. We then removed all nodes with fewer than 5 instances, because such nodes cannot be matched reliably due to lack of data. Similarity Measure & Manual Mappings: We chose to evaluate GLUE using the Jaccard similarity measure (Section 3), because it corresponds well to our intuitive understanding of similarity. Given the similarity measure, we manually created the correct 1-1 mappings between the taxonomies in the same domain, for evaluation purposes. The rightmost column of Table 2 shows the number of manual mappings created for each taxonomy. For example, we created 236 one-to-one mappings from Standard to Yahoo!, and 104 mappings in the reverse direction. Note that in some cases there were nodes in a taxonomy for which we could not ?nd a 1-1 match. This was either because there was no equivalent node (e.g., School of Hotel Administration at Cornell has no equivalent counterpart at the University of Washington), or when it is impossible to determine an accurate match without additional domain expertise. Domain Constraints: We speci?ed domain constraints for the relaxation labeler. For the taxonomies in Course Catalog I, we speci?ed all applicable subsumption constraints (see Table 1). For the other two domains, because their sheer size makes specifying all constraints di?cult, we speci?ed only the most obvious subsumption constraints (about 10 constraints for each taxonomy). For the taxonomies in Company Pro?les we also used several frequency constraints. Experiments: For each domain, we performed two experiments. In each experiment, we applied GLUE to ?nd the mappings from one taxonomy to the other. The matching accuracy of a taxonomy is then the percentage of the manual mappings (for that taxonomy) that GLUE predicted correctly.


Matching Accuracy

Figure 5 shows the matching accuracy for di?erent domains and con?gurations of GLUE. In each domain, we show the matching accuracy of two scenarios: mapping from the ?rst taxonomy to the second, and vice versa. The four bars in each scenario (from left to right) represent the accuracy produced by: (1) the name learner alone, (2) the content learner alone, (3) the meta-learner using the previous two learners, and (4) the relaxation labeler on top of the metalearner (i.e., the complete GLUE system). The results show that GLUE achieves high accuracy across all three domains, ranging from 66 to 97%. In contrast, the best matching results of the base learners, achieved by the content learner, are only 52 - 83%. It is interesting that the name learner achieves very low accuracy, 12 - 15% in four out of six scenarios. This is because all instances of a concept, say B, have very similar full names (see the description of the name learner in Section 4.2). Hence, when the name learner for a concept A is applied to B, it will classify all instances of B as A or A. In cases when this class?cation is incorrect, which might be quite often, using the name learner alone leads to poor estimates of the joint distributions. The poor performance of the name learner underscores the importance of data instances and multi-strategy learning in ontology matching. The results clearly show the utility of the meta-learner and

relaxation labeler. Even though in half of the cases the metalearner only minimally improves the accuracy, in the other half it makes substantial gains, between 6 and 15%. And in all but one case, the relaxation labeler further improves accuracy by 3 - 18%, con?rming that it is able to exploit the domain constraints and general heuristics. In one case (from Standard to Yahoo), the relaxation labeler decreased accuracy by 2%. The performance of the relaxation labeler is discussed in more detail below. In Section 6.4 we identify the reasons that prevent GLUE from identifying the remaining mappings. In the current experiments, GLUE utilized on average only 30 to 90 data instances per leaf node (see Table 2). The high accuracy in these experiments suggests that GLUE can work well with only a modest amount of data.

Cornell to Wash.
100 90 80

Wash. To Cornell

Matching Accuracy (%)

70 60 50 40 30 20 10 0 0 0.1 0.2 0.3 0.4 0.5

6.2 Performance of the Relaxation Labeler
In our experiments, when the relaxation labeler was applied, the accuracy typically improved substantially in the ?rst few iterations, then gradually dropped. This phenomenon has also been observed in many previous works on relaxation labeling [16, 20, 31]. Because of this, ?nding the right stopping criterion for relaxation labeling is of crucial importance. Many stopping criteria have been proposed, but no general e?ective criterion has been found. We considered three stopping criteria: (1) stopping when the mappings in two consecutive iterations do not change (the mapping criterion), (2) when the probabilities do not change, or (3) when a ?xed number of iterations has been reached. We observed that when using the last two criteria the accuracy sometimes improved by as much as 10%, but most of the time it decreased. In contrast, when using the mapping criterion, in all but one of our experiments the accuracy substantially improved, by 3 - 18%, and hence, our results are reported using this criterion. We note that with the mapping criterion, we observed that relaxation labeling always stopped in the ?rst few iterations. In all of our experiments, relaxation labeling was also very fast. It took only a few seconds in Catalog I and under 20 seconds in the other two domains to ?nish ten iterations. This observation shows that relaxation labeling can be implemented e?ciently in the ontology-matching context. It also suggests that we can e?ciently incorporate user feedback into the relaxation labeling process in the form of additional domain constraints. We also experimented with di?erent values for the constraint weights (see Section 5), and found that the relaxation labeler was quite robust with respect to such parameter changes.


Figure 6: The accuracy of GLUE in the Course Catalog I domain, using the most-speci?c-parent similarity measure. added an factor to account for the error in approximating P (B|A). Figure 6 shows the matching accuracy, plotted against . As can be seen, GLUE performed quite well on a broad range of . This illustrates how GLUE can be e?ective with more than one similarity measure.



6.3 Most-Speci?c-Parent Similarity Measure
So far we have experimented only with the Jaccard similarity measure. We wanted to know whether GLUE can work well with other similarity measures. Hence we conducted an experiment in which we used GLUE to ?nd mappings for taxonomies in the Course Catalog I domain, using the following similarity measure: M SP (A, B) = P (A|B) 0 if P (B|A) ≥ 1 ? otherwise

This measure is the same as the the most-speci?c-parent similarity measure described in Section 3, except that we

The accuracy of GLUE is quite impressive as is, but it is natural to ask what limits GLUE from obtaining even higher accuracy. There are several reasons that prevent GLUE from correctly matching the remaining nodes. First, some nodes cannot be matched because of insu?cient training data. For example, many course descriptions in Course Catalog II contain only vacuous phrases such as “3 credits”. While there is clearly no general solution to this problem, in many cases it can be mitigated by adding base learners that can exploit domain characteristics to improve matching accuracy. And second, the relaxation labeler performed local optimizations, and sometimes converged to only a local maxima, thereby not ?nding correct mappings for all nodes. Here, the challenge will be in developing search techniques that work better by taking a more “global perspective”, but still retain the runtime e?ciency of local optimization. Further, the two base learners we used in our implementation are rather simple general-purpose text classi?ers. Using other leaners that perform domain-speci?c feature selection and comparison can also improve the accuracy. We note that some nodes cannot be matched automatically because they are simply ambiguous. For example, it is not clear whether “networking and communication devices” should match “communication equipment” or “computer networks”. A solution to this problem is to incorporate user interaction into the matching process [28, 12, 38]. GLUE currently tries to predict the best match for every node in the taxonomy. However, in some cases, such a match simply does not exist (e.g., unlike Cornell, the University of

Washington does not have a School of Hotel Administration). Hence, an additional extension to GLUE is to make it be aware of such cases, and not predict an incorrect match when this occurs.

sure, whereas GLUE allows for application-dependent similarity measures. Ontology Learning: Machine learning has been applied to other ontology-related tasks, most notably learning to construct ontologies from data and other ontologies, and extracting ontology instances from data [30, 23, 32]. Our work here provides techniques to help in the ontology construction process [23]. [22] gives a comprehensive summary of the role of machine learning in the Semantic Web e?ort.

GLUE is related to our previous work on LSD [12], whose goal was to semi-automatically ?nd schema mappings for data integration. There, we had a mediated schema, and our goal was to ?nd mappings from the schemas of a multitude of data sources to the mediated schema. The observation was that we can use a set of manually given mappings on several sources as training examples for a learner that predicts mappings for subsequent sources. LSD illustrated the e?ectiveness of multi-strategy learning for this problem. In GLUE since our problem is to match a pair of ontologies, there are no manual mappings for training, and we need to obtain the training examples for the learner automatically. Further, since GLUE deals with a more expressive formalism (ontologies versus schemas), the role of constraints is much more important, and we innovate by using relaxation labeling for this purpose. Finally, LSD did not consider in depth the semantics of a mapping, as we do here. We now describe other related work to GLUE from several perspectives. Ontology Matching: Many works have addressed ontology matching in the context of ontology design and integration (e.g., [11, 24, 28, 27]). These works do not deal with explicit notions of similarity. They use a variety of heuristics to match ontology elements. They do not use machine learning and do not exploit information in the data instances. However, many of them [24, 28] have powerful features that allow for e?cient user interaction, or expressive rule languages [11] for specifying mappings. Such features are important components of a comprehensive solution to ontology matching, and hence should be added to GLUE in the future. Several recent works have attempted to further automate the ontology matching process. The Anchor-PROMPT system [29] exploits the general heuristic that paths (in the taxonomies or ontology graphs) between matching elements tend to contain other matching elements. The HICAL system [17] exploits the data instances in the overlap between the two taxonomies to infer mappings. [18] computes the similarity between two taxonomic nodes based on their signature TF/IDF vectors, which are computed from the data instances. Schema Matching: Schemas can be viewed as ontologies with restricted relationship types. The problem of schema matching has been studied in the context of data integration and data translation (see [33] for a survey). Several works [26, 21, 25] have exploited variations of the general heuristic “two nodes match if nodes in their neighborhood also match”, but in an isolated fashion, and not in the same general framework we have in GLUE. Notions of Similarity: The similarity measure in [17] is based on κ statistics, and can be thought of as being de?ned over the joint probability distribution of the concepts involved. In [19] the authors propose an information-theoretic notion of similarity that is based on the joint distribution. These works argue for a single best universal similarity mea-



The vision of the Semantic Web is grand. With the proliferation of ontologies on the Semantic Web, the development of automated techniques for ontology matching will be crucial to its success. We have described an approach that applies machine learning techniques to propose such semantic mappings. Our approach is based on well-founded notions of semantic similarity, expressed in terms of the joint probability distribution of the concepts involved. We described the use of machine learning, and in particular, of multi-strategy learning, for computing concept similarities. This learning technique makes our approach easily extensible to additional learners, and hence to exploiting additional kinds of knowledge about instances. Finally, we introduced relaxation labeling to the ontology-matching context, and showed that it can be adapted to e?ciently exploit a variety of heuristic knowledge and domain-speci?c constraints to further improve matching accuracy. Our experiments showed that we can accurately match 66 - 97% of the nodes on several real-world domains. Aside from striving to improve the accuracy of our methods, our main line of future research involves extending our techniques to handle more sophisticated mappings between ontologies (i.e., non 1-1 mappings), and exploiting more of the constraints that are expressed in the ontologies (via attributes and relationships, and constraints expressed on them).

We thank Phil Bernstein, Geo? Hulten, Natasha Noy, Rachel Pottinger, Matt Richardson, Pradeep Shenoy, and the anonymous reviewers for their invaluable comments. This work is supported by NSF Grants 9523649, 9983932, IIS-9978567, and IIS-9985114. The third author is also supported by an IBM Faculty Partnership Award. The fourth author is also supported by a Sloan Fellowship and gifts from Microsoft Research, NEC and NTT.

[1] [2] [3] [4] [5]


http://ontobroker.semanticweb.org. www.daml.org. www.google.com. IEEE Intelligent Systems, 16(2), 2001. A. Agresti. Categorical Data Analysis. Wiley, New York, NY, 1990. [6] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scienti?c American, 279, 2001. [7] D. Brickley and R. Guha. Resource Description Framework Schema Speci?cation 1.0, 2000.

[8] J. Broekstra, M. Klein, S. Decker, D. Fensel, F. van Harmelen, and I. Horrocks. Enabling knowledge representation on the Web by Extending RDF Schema. In Proceedings of the Tenth International World Wide Web Conference, 2001. [9] D. Calvanese, D. G. Giuseppe, and M. Lenzerini. Ontology of Integration and Integration of Ontologies. In Proceedings of the 2001 Description Logic Workshop (DL 2001). [10] S. Chakrabarti, B. Dom, and P. Indyk. Enhanced Hypertext Categorization Using Hyperlinks. In Proceedings of the ACM SIGMOD Conference, 1998. [11] H. Chalupsky. Ontomorph: A Translation system for symbolic knowledge. In Principles of Knowledge Representation and Reasoning, 2000. [12] A. Doan, P. Domingos, and A. Halevy. Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach. In Proceedings of the ACM SIGMOD Conference, 2001. [13] P. Domingos and M. Pazzani. On the Optimality of the Simple Bayesian Classi?er under Zero-One Loss. Machine Learning, 29:103–130, 1997. [14] D. Fensel. Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce. SpringerVerlag, 2001. [15] J. He?in and J. Hendler. A Portrait of the Semantic Web in Action. IEEE Intelligent Systems, 16(2), 2001. [16] R. Hummel and S. Zucker. On the Foundations of Relaxation Labeling Processes. PAMI, 5(3):267–287, May 1983. [17] R. Ichise, H. Takeda, and S. Honiden. Rule Induction for Concept Hierarchy Alignment. In Proceedings of the Workshop on Ontology Learning at the 17th International Joint Conference on Arti?cial Intelligence (IJCAI), 2001. [18] M. Lacher and G. Groh. Facilitating the exchange of explixit knowledge through ontology mappings. In Proceedings of the 14th Int. FLAIRS conference, 2001. [19] D. Lin. An Information-Theoritic De?niton of Similarity. In Proceedings of the International Conference on Machine Learning (ICML), 1998. [20] S. Lloyd. An optimization approach to relaxation labeling algorithms. Image and Vision Computing, 1(2), 1983. [21] J. Madhavan, P. Bernstein, and E. Rahm. Generic Schema Matching with Cupid. In Proceedings of the International Conference on Very Large Databases (VLDB), 2001. [22] A. Maedche. A Machine Learning Perspective for the Semantic Web. Semantic Web Working Symposium (SWWS) Position Paper, 2001. [23] A. Maedche and S. Saab. Ontology Learning for the Semantic Web. IEEE Intelligent Systems, 16(2), 2001.

[24] D. McGuinness, R. Fikes, J. Rice, and S. Wilder. The Chimaera Ontology Environment. In Proceedings of the 17th National Conference on Arti?cial Intelligence (AAAI), 2000. [25] S. Melnik, H. Molina-Garcia, and E. Rahm. Similarity Flooding: A Versatile Graph Matching Algorithm. In Proceedings of the International Conference on Data Engineering (ICDE), 2002. [26] T. Milo and S. Zohar. Using Schema Matching to Simplify Heterogeneous Data Translation. In Proceedings of the International Conference on Very Large Databases (VLDB), 1998. [27] P. Mitra, G. Wiederhold, and J. Jannink. Semiautomatic Integration of Knowledge Sources. In Proceedings of Fusion’99. [28] N. Noy and M. Musen. PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. In Proceedings of the National Conference on Arti?cial Intelligence (AAAI), 2000. [29] N. Noy and M. Musen. Anchor-PROMPT: Using NonLocal Context for Semantic Matching. In Proceedings of the Workshop on Ontologies and Information Sharing at the International Joint Conference on Arti?cial Intelligence (IJCAI), 2001. [30] B. Omelayenko. Learning of Ontologies for the Web: the Analysis of Existent approaches. In Proceedings of the International Workshop on Web Dynamics, 2001. [31] L. Padro. A Hybrid Environment for Syntax-Semantic Tagging, 1998. [32] N. Pernelle, M.-C. Rousset, and V. Ventos. Automatic Construction and Re?nement of a Class Hierarchy over Semi-Structured Data. In Proceeding of the Workshop on Ontology Learning at the 17th International Joint Conference on Arti?cial Intelligence (IJCAI), 2001. [33] E. Rahm and P. Bernstein. On Matching Schemas Automatically. VLDB Journal, 10(4), 2001. [34] K. M. Ting and I. H. Witten. Issues in stacked generalization. Journal of Arti?cial Intelligence Research (JAIR), 10:271–289, 1999. [35] M. Uschold. Where is the semantics in the Semantic Web? In Workshop on Ontologies in Agent Systems (OAS) at the 5th International Conference on Autonomous Agents, 2001. [36] van Rijsbergen. Information Retrieval. London:Butterworths, 1979. Second Edition. [37] D. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992. [38] L. Yan, R. Miller, L. Haas, and R. Fagin. Data Driven Understanding and Re?nement of Schema Mappings. In Proceedings of the ACM SIGMOD, 2001.



All rights reserved Powered by 甜梦文库 9512.net

copyright ©right 2010-2021。