Concept Recognition with Convolutional Neural Networks to Optimize Keyphrase Extraction

DOI: 10.5281/zenodo.3727553

Digital Resources/Online E-Resources

  • Title: Concept Recognition with Convolutional Neural Networks to Optimize Keyphrase Extraction
  • Author: Waldis, Andreas ; Mazzola, Luca ; Kaufmann, Michael
  • Subjects: Concept recognition ; Convolutional neural networks ; Keyphrase extraction ; Keyword extraction ; Natural language processing
  • Description: For knowledge management purposes, it would be useful to automatically classify and tag documents based on their content. Keyphrase extraction is one way of achieving this, using statistical or semantic methods. While corpus-index-based keyphrase extraction can extract relevant concepts for documents, the inverse document index grows exponentially with the number of words that candidate concepts can have. Document-based heuristics can solve this issue, but often result in keyphrases that are not concepts. To increase concept precision, that is, the percentage of extracted keyphrases that represent actual concepts, we contribute a method to filter keyphrases based on a pre-trained convolutional neural network (CNN). We tested CNNs containing vertical and horizontal filters to decide whether an n-gram (i.e., a consecutive sequence of n words) is a concept or not, learning from a training set of labeled examples. The classification training signal is derived from the Wikipedia corpus, assuming that an n-gram certainly represents a concept if a corresponding Wikipedia page title exists. The CNN input feature is the vector representation of each word, derived from a word embedding model; the output is the probability that an n-gram represents a concept. Multiple configurations for vertical and horizontal filters are analyzed and optimized through a hyper-parameterization process. The results demonstrated concept precision for extracted keywords of between 60 and 80% on average. Consequently, by applying a CNN-based concept recognition filter, the concept precision of keyphrase extraction was significantly improved.
  • Publisher: Springer Nature Switzerland AG
  • Creation Date: 2019
  • Language: German
  • Identifier: DOI: 10.5281/zenodo.3727553
  • Source: LORY (Lucerne Open Repository)
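
The filtering idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.5 threshold, the lowercase title matching, and the toy scoring function standing in for the trained CNN are all assumptions made for illustration. The sketch enumerates n-grams, labels them by looking them up in a set of page titles (the training signal the description derives from Wikipedia), filters keyphrases by a predicted concept probability, and computes concept precision as the share of kept keyphrases that are actual concepts.

```python
from typing import Callable, Iterable, List, Set


def ngrams(tokens: List[str], max_n: int = 3) -> List[str]:
    """Return all consecutive word sequences of length 1..max_n."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def label_with_titles(candidates: Iterable[str], titles: Set[str]) -> List[int]:
    """Label a candidate 1 if a matching lowercased page title exists, else 0.

    `titles` is a hypothetical, pre-lowercased set of page titles.
    """
    return [1 if c.lower() in titles else 0 for c in candidates]


def filter_keyphrases(
    keyphrases: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the source
) -> List[str]:
    """Keep keyphrases whose predicted concept probability reaches the threshold.

    `score` stands in for the trained CNN; any callable returning a
    probability in [0, 1] can be plugged in.
    """
    return [k for k in keyphrases if score(k) >= threshold]


def concept_precision(
    keyphrases: List[str], is_concept: Callable[[str], bool]
) -> float:
    """Fraction of extracted keyphrases that represent actual concepts."""
    if not keyphrases:
        return 0.0
    return sum(1 for k in keyphrases if is_concept(k)) / len(keyphrases)


if __name__ == "__main__":
    titles = {"neural network", "keyphrase extraction"}  # toy title set
    tokens = "neural network for keyphrase extraction".split()
    candidates = ngrams(tokens, max_n=2)
    labels = label_with_titles(candidates, titles)

    # Toy scorer: pretends the classifier has perfectly learned the labels.
    toy_score = lambda k: 1.0 if k.lower() in titles else 0.0
    kept = filter_keyphrases(candidates, toy_score)

    print(list(zip(candidates, labels)))
    print(kept, concept_precision(kept, lambda k: k.lower() in titles))
```

In practice the scorer would be a model operating on the word-embedding vectors of each n-gram, and concept precision would be judged against an independent ground truth rather than the same title set used for training labels.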
