A Graph-based Framework for Relation Propagation and Its Application to Multi-Label Learning

Ming Wu
Dept. of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
wuming@msu.edu

Rong Jin
Dept. of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
rongjin@cse.msu.edu

ABSTRACT
Label propagation exploits the structure of the unlabeled documents by propagating the label information of the training documents to the unlabeled documents. The limitation of existing label propagation approaches is that they can only deal with a single type of object. We propose a framework, named "relation propagation", that allows information to be propagated among multiple types of objects. Empirical studies with multi-label text categorization showed that the proposed algorithm is more effective than several semi-supervised learning algorithms in that it is capable of exploring the correlation among different categories and the structure of the unlabeled documents simultaneously.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; I.2.6 [Artificial Intelligence]: Learning--Concept Learning

General Terms
Algorithms, Experimentation, Theory

Keywords
Label propagation, Relation propagation, Semi-supervised Learning, Graph Laplacian

1. INTRODUCTION
Label propagation has proven effective for both text categorization and information retrieval [2, 1, 4, 3]. The key idea is to estimate the confidence scores of the unlabeled documents by propagating the label information of the labeled documents through a similarity graph over the documents. The limitation of most previous work is that it does not consider the scenario, common in many applications, in which label information needs to be propagated not only among objects of the same type but also between objects of different types. For instance, the multi-label classification problem involves two types of objects: documents and categories.

We propose a generalized propagation framework for multiple types of objects (in particular, documents and categories) that propagates the relationship between documents and categories. We construct a weighted graph as follows: each node O_{(i,j)} = (d_i, c_j) in the graph represents the relationship that document d_i belongs to category c_j; any two nodes O_{(i,j)} and O_{(k,l)} in the graph are connected by an edge whose weight reflects the correlation between the two corresponding relationships. In other words, a large similarity between nodes O_{(i,j)} and O_{(k,l)} indicates that document d_i is likely to be assigned to class c_j if document d_k belongs to class c_l, and vice versa. It is important to note that classes c_j and c_l can be different in the above propagation scheme. Figure 1 illustrates an example of the graph for relation propagation with three documents and three categories. We refer to the proposed framework as "relation propagation", or RP for short.

Figure 1: An example of the connected graph for relation propagation.
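For concreteness, the sketch below (ours, not part of the original paper) constructs this relation graph in NumPy for the three-document, three-category example of Figure 1. The similarity values and the names S_D and S_C are invented placeholders, and weighting the edge between O_{(i,j)} and O_{(k,l)} by the product S^D_{ik} S^C_{jl} anticipates the direct-product construction formalized in Section 2.

# Illustrative sketch: relation graph over (document, category) pairs.
import itertools
import numpy as np

n_docs, n_cats = 3, 3

# Hypothetical precomputed similarity matrices (symmetric, non-negative).
S_D = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.2],
                [0.1, 0.2, 1.0]])   # document-document similarity
S_C = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.0]])   # category-category similarity

# One node O_{(i,j)} = (d_i, c_j) for every document-category pair.
nodes = list(itertools.product(range(n_docs), range(n_cats)))

# Edge weight between nodes (i, j) and (k, l): product of the two similarities.
W = np.zeros((len(nodes), len(nodes)))
for a, (i, j) in enumerate(nodes):
    for b, (k, l) in enumerate(nodes):
        W[a, b] = S_D[i, k] * S_C[j, l]

# This pairwise construction coincides with the Kronecker (direct) product
# of the two similarity matrices, which is how Section 2 writes it.
assert np.allclose(W, np.kron(S_D, S_C))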
2. RELATION PROPAGATION AND ITS APPLICATION TO MULTI-LABEL LEARNING
We first describe the framework of relation propagation for two types of objects, followed by the generalization to multiple types of objects. Let A = (a_1, a_2, ..., a_n) and B = (b_1, b_2, ..., b_m) denote the two types of objects, and let f(a_i, b_j): A \times B \to R denote the relation function between an object of type A and an object of type B. Let y denote the vector of size mn whose element y_{(i,j)} corresponds to the relation f(a_i, b_j); a double index (i, j) is used to refer to an element of y. For convenience of discussion, we assume that the first N_l elements of y, denoted by y_l, are the labeled relations, and that the remaining N_u = mn - N_l elements, denoted by y_u, are the unlabeled relations that need to be predicted. Finally, let S^A = [S^A_{i,j}]_{n \times n} and S^B = [S^B_{i,j}]_{m \times m} denote the matrices of similarities among the objects of type A and among the objects of type B, respectively. The energy function is defined as

    E_m = y^\top L^{A,B} y,                                          (1)

where L^{A,B} = D^{A,B} - S^{A,B}, S^{A,B} = S^A \otimes S^B, and \otimes stands for the direct product of matrices. D^{A,B} is a diagonal matrix whose diagonal elements are defined as D^{A,B}_{(i,j),(i,j)} = D^A_i D^B_j. The optimal solution for y_u that minimizes the energy function is

    y_u = (L^{A,B}_{u,u})^{-1} S^{A,B}_{u,l} y_l.                    (2)

Replacing the similarity matrices with normalized similarity matrices, y_u can be written as

    y_u = (I - \bar{S}^{A,B}_{u,u})^{-1} \bar{S}^{A,B}_{u,l} y_l.    (3)

It is straightforward to extend the above formalism to the case of multiple types of objects. Let T = (O^1, O^2, ..., O^t) denote the t different types of objects, let f: O^1 \times O^2 \times ... \times O^t \to R denote the relation function among the t types of objects, and let {S^k = [S^k_{i,j}]_{n_k \times n_k}, k = 1, 2, ..., t} denote the similarity matrices for the t types of objects. The energy function for the t types of objects is then written as

    E_m = y^\top L^{1,...,t} y,

where L^{1,...,t} is the graph Laplacian of S^{1,...,t} = \otimes_{k=1}^t S^k, and each element y_{(o^1_{i_1}, o^2_{i_2}, ..., o^t_{i_t})} of the vector y corresponds to the relation f(o^1_{i_1}, o^2_{i_2}, ..., o^t_{i_t}). A solution similar to the one in Equation (2) minimizes this energy function.

It is straightforward to apply the framework of relation propagation to multi-label learning. Denote the collection of documents by D = (d_1, d_2, ..., d_n) and the set of categories by C = (c_1, c_2, ..., c_m). The relation f(d_i, c_j) is then a binary function that outputs 1 when document d_i belongs to the j-th category and 0 otherwise. Computing \bar{S}^{D,C}_{u,l} and \bar{L}^{D,C}_{u,u} in a similar way, the solution in Equation (3) for multi-label learning can be expanded as

    y_u = (I - \bar{S}^D_{u,u} \otimes \bar{S}^C)^{-1} (\bar{S}^D_{u,l} \otimes \bar{S}^C) y_l.    (4)
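As an illustration only (again not from the paper), the following NumPy sketch evaluates the closed-form solution in Equation (4) by forming the Kronecker products explicitly. The function name, the argument layout (labeled documents listed first), and the assumption that the similarity matrices are already normalized (the paper marks normalization only with a bar) are ours.

# Minimal sketch of Equation (4):
#   y_u = (I - S_D_uu (x) S_C)^{-1} (S_D_ul (x) S_C) y_l,
# where (x) denotes the Kronecker (direct) product.
import numpy as np

def relation_propagation(S_D_norm, S_C_norm, Y_l):
    """Predict relation scores for the unlabeled documents.

    S_D_norm : (n, n) normalized document similarity matrix,
               with the n_l labeled documents listed first.
    S_C_norm : (m, m) normalized category similarity matrix.
    Y_l      : (n_l, m) binary labels of the labeled documents.
    Returns an (n_u, m) matrix of predicted relation scores.
    """
    n = S_D_norm.shape[0]
    n_l, m = Y_l.shape
    n_u = n - n_l

    S_uu = S_D_norm[n_l:, n_l:]   # unlabeled-unlabeled block of S_D
    S_ul = S_D_norm[n_l:, :n_l]   # unlabeled-labeled block of S_D

    # Relations are ordered document-major, so reshaping Y_l row-wise
    # matches the indexing of the Kronecker products.
    A = np.eye(n_u * m) - np.kron(S_uu, S_C_norm)
    b = np.kron(S_ul, S_C_norm) @ Y_l.reshape(-1)
    y_u = np.linalg.solve(A, b)
    return y_u.reshape(n_u, m)

Forming the N_u m x N_u m system explicitly, as above, is only practical for small problems; for larger collections one would exploit the Kronecker structure or solve the linear system iteratively.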
3. EXPERIMENTS
We compare the proposed algorithm to the label propagation approach (LP), the KNN method, and the spectral graph transducer (SGT) for multi-label learning. We randomly selected 100 categories and the textual descriptions of 1,000 images from ESTA (http://ir.shef.ac.uk/imageclef/2004/stand.html) as our testbed. In the selected testbed, the number of documents per category varies from 14 to around 176, with an average of around 44 documents per category; the number of categories per document varies from 1 to 12, with an average of around 4 categories per document. We evaluate the proposed algorithm using the F1 measure based on precision/recall across documents (micro-average) and precision/recall across categories (macro-average) at the first 20 ranks. 10% of the documents are used for training and 90% for testing, and the F1 measure averaged over 10 folds is used as the final evaluation metric. Figure 2 shows the classification results.

Figure 2: Classification results for 10% training examples (micro-average and macro-average F1 measure versus rank for RP, LP, KNN, and SGT).

4. CONCLUSION
In this paper, we presented a graph-based framework for label propagation, named "relation propagation", that allows label information to be propagated among multiple types of objects simultaneously. Empirical studies with multi-label text categorization verified that the proposed algorithm for multi-label learning is more effective than both label propagation using harmonic functions and the spectral graph transducer, particularly when the number of training examples is small.

5. REFERENCES
[1] W. Chu and Z. Ghahramani. Preference learning with Gaussian processes. In Proc. ICML, 2005.
[2] C. Williams. Computation with infinite neural networks. Neural Computation, 10(5), 1998.
[3] D. Zhou, B. Schölkopf, and T. Hofmann. Semi-supervised learning on directed graphs. In Proc. NIPS, 2005.
[4] X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proc. ICML, 2003.