
Reflection of Demographic Background on Word Usage

Computational linguistics - Association for Computational Linguistics, 2023-06, Vol.49 (2), p.373-394 [Peer Reviewed Journal]

2023. This work is published under https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. ISSN: 0891-2017; EISSN: 1530-9312; DOI: 10.1162/coli_a_00475

  • Title:
    Reflection of Demographic Background on Word Usage
  • Author: Garimella, Aparna ; Banea, Carmen ; Mihalcea, Rada
  • Subjects: Categories ; Collocations ; Computational linguistics ; Demographics ; Demography ; Language usage ; Linguistics ; Psychology ; Topics ; Word meaning ; Words (language)
  • Is Part Of: Computational linguistics - Association for Computational Linguistics, 2023-06, Vol.49 (2), p.373-394
  • Description: The availability of personal writings in electronic format provides researchers in the fields of linguistics, psychology, and computational linguistics with an unprecedented chance to study, on a large scale, the relationship between language use and the demographic background of writers, allowing us to better understand people across different demographics. In this article, we analyze the relation between language and demographics by developing cross-demographic word models to identify words with , or words that are used in significantly different ways by speakers of different demographics. Focusing on three demographic categories, namely, location, gender, and industry, we identify words with significant usage differences in each category and investigate various approaches of encoding a word’s usage, allowing us to identify language aspects that contribute to the differences. Our word models using topic-based features achieve at least 20% improvement in accuracy over the baseline for all demographic categories, even for scenarios with classification into 15 categories, illustrating the usefulness of topic-based features in identifying word usage differences. Further, we note that for and , topics extracted from immediate context are the best predictors of word usages, hinting at the importance of and its for these demographics, while for , topics obtained from longer contexts are better predictors for word usage.
  • Publisher: One Broadway, 12th Floor, Cambridge, Massachusetts 02142, USA: MIT Press
  • Language: English
  • Identifier: ISSN: 0891-2017
    EISSN: 1530-9312
    DOI: 10.1162/coli_a_00475
  • Source: Alma/SFX Local Collection
    DOAJ Directory of Open Access Journals
