Building a Large Syntactically-Annotated Corpus of Vietnamese

Proc. LAW III, 2009

Distributed under a Creative Commons Attribution 4.0 International License


  • Title: Building a Large Syntactically-Annotated Corpus of Vietnamese
  • Author: Nguyen, Phuong Thai ; Vu, Xuan Luong ; Nguyen, Thi Minh Huyen ; Nguyen, Van Hiep ; Le, Hong Phuong
  • Subjects: Computer Science ; Document and Text Processing
  • Is Part Of: Proc. LAW III, 2009
  • Description: A treebank is an important resource for both research and applications in natural language processing. For Vietnamese, such corpora are still lacking. This paper presents up-to-date results of a project to construct a Vietnamese treebank. Because Vietnamese is an isolating language with no word delimiter, sentence analysis involves many ambiguities. We systematically applied a range of linguistic techniques to handle these ambiguities. Annotators are supported by automatic labeling tools and a tree-editor tool. Raw texts are extracted from Tuoi Tre (Youth), an online Vietnamese daily newspaper. The current annotation agreement is around 90 percent.
  • Language: English
  • Source: Hyper Article en Ligne (HAL) (Open Access)
