Automatic Bilingual Lexicon Acquisition Using Random Indexing of Parallel Corpora

Sahlgren, Magnus and Karlgren, Jussi (2005) Automatic Bilingual Lexicon Acquisition Using Random Indexing of Parallel Corpora. Natural Language Engineering, 11 (3). pp. 327-341.

Full text not available from this repository.


This paper presents a very simple and effective approach to using parallel corpora for automatic bilingual lexicon acquisition. The approach uses the Random Indexing vector space methodology and finds correlations between terms from their distributional characteristics. It requires a minimum of preprocessing and linguistic knowledge, and is efficient, fast and scalable. In this paper, we explain how our approach differs from traditional cooccurrence-based word alignment algorithms, and we demonstrate how to extract bilingual lexica by applying Random Indexing to aligned parallel data. The acquired lexica are evaluated by comparing them to manually compiled gold standards, and we report an overlap of around 60%. We also discuss methodological problems with evaluating lexical resources of this kind.
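
The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each aligned sentence pair acts as one context, that every context receives a sparse random index vector of +1/-1 entries, and that a word's vector is the sum of the index vectors of the pairs it occurs in. Words from the two languages then share one vector space, and a translation candidate is the target word with the highest cosine similarity. The function names, the dimensionality, the number of non-zero entries and the toy English-Swedish data are all invented for illustration.

```python
import math
import random
from collections import defaultdict

# Toy aligned sentence pairs (invented for illustration, not from the paper).
PAIRS = [
    ("the cat sleeps", "katten sover"),
    ("the dog sleeps", "hunden sover"),
    ("the cat runs", "katten springer"),
    ("the dog runs", "hunden springer"),
]


def index_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def build_context_vectors(aligned_pairs, dim=512, nonzeros=8, seed=0):
    """Accumulate one vector per word; dim and nonzeros are assumed values."""
    rng = random.Random(seed)
    src_vecs = defaultdict(lambda: [0] * dim)
    tgt_vecs = defaultdict(lambda: [0] * dim)
    for src_sent, tgt_sent in aligned_pairs:
        # One random index vector per aligned pair, shared by both languages.
        idx = index_vector(dim, nonzeros, rng)
        for table, sent in ((src_vecs, src_sent), (tgt_vecs, tgt_sent)):
            # Count each word once per pair (presence, not frequency).
            for word in set(sent.lower().split()):
                vec = table[word]
                for i, value in enumerate(idx):
                    if value:
                        vec[i] += value
    return dict(src_vecs), dict(tgt_vecs)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translate(word, src_vecs, tgt_vecs):
    """Return the target word most similar to `word`, or None if unseen."""
    if word not in src_vecs:
        return None
    query = src_vecs[word]
    return max(tgt_vecs, key=lambda t: cosine(query, tgt_vecs[t]))


if __name__ == "__main__":
    src, tgt = build_context_vectors(PAIRS)
    for w in ("cat", "dog", "sleeps", "runs"):
        print(w, "->", translate(w, src, tgt))
```

On this toy corpus, each content word should pair with the target word that occurs in exactly the same aligned pairs. A function word such as "the", which occurs in every pair, has no such counterpart here. The sketch omits everything a realistic setup would need, such as frequency thresholds and evaluation against a gold standard.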

Item Type: Article
Subjects: I. Computing Methodologies > I.2 ARTIFICIAL INTELLIGENCE
I. Computing Methodologies > I.7 DOCUMENT AND TEXT PROCESSING (H.4, H.5)
J. Computer Applications > J.5 ARTS AND HUMANITIES
ID Code: 25
Deposited By: Userware Researcher
Deposited On: 20 Oct 2005
Last Modified: 18 Nov 2009 15:51