Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure

Täckström, Oscar and McDonald, Ryan and Uszkoreit, Jakob (2012) Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure. In: The 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012), 3-8 June 2012, Montreal, Canada. (In Press)

PDF - Accepted Version


It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%.
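The abstract does not describe the cluster-induction algorithm itself. The following is a minimal sketch, under stated assumptions, of two ideas it relies on. First, a delexicalized feature set (part-of-speech only, no word forms) can be augmented with a cluster ID shared across languages, so a model trained on English features still fires on foreign-language tokens. Second, the standard definition of relative error reduction. The dictionary `CLUSTERS` and the functions `token_features` and `relative_error_reduction` are hypothetical illustrations. They are not the authors' code, clusters, or data, and the accuracy values in the usage note are made up.

```python
from typing import Dict, List

# Hypothetical cross-lingual cluster dictionary (illustrative only, not
# clusters from the paper): words from different languages that were
# induced into the same cluster share one integer ID.
CLUSTERS: Dict[str, int] = {
    "house": 17, "Haus": 17, "casa": 17,
    "dog": 42, "Hund": 42, "perro": 42,
}


def token_features(word: str, pos: str) -> List[str]:
    """Delexicalized features (POS tag only) plus an optional cluster feature.

    The word form itself is never emitted as a feature, so a model trained
    on one language can be applied to another; the cluster ID is the only
    lexical signal, and it is shared across languages.
    """
    feats = [f"pos={pos}"]
    cluster = CLUSTERS.get(word)
    if cluster is not None:  # words outside the dictionary get POS features only
        feats.append(f"cluster={cluster}")
    return feats


def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors removed by the new system.

    For a score reported as accuracy in [0, 1], error = 1 - accuracy.
    """
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err
```

For example, `token_features("Haus", "NOUN")` and `token_features("casa", "NOUN")` both yield `["pos=NOUN", "cluster=17"]`. An English-trained model therefore sees the same features it learned from "house". With made-up accuracies, `relative_error_reduction(0.60, 0.65)` is about 0.125. The error drops from 0.40 to 0.35, which removes 12.5% of the baseline's errors even though accuracy rises by only 5 points.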

Item Type: Conference or Workshop Item (Paper)
ID Code: 5251
Deposited By: Oscar Täckström
Deposited On: 16 Apr 2012 14:44
Last Modified: 16 Apr 2012 14:44
