CatBoost for big data: an interdisciplinary review

Journal of big data, 2020-11, Vol.7 (1), p.94-94, Article 94 [Peer Reviewed Journal]

The Author(s) 2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. ISSN: 2196-1115; EISSN: 2196-1115; DOI: 10.1186/s40537-020-00369-8; PMID: 33169094

  • Title: CatBoost for big data: an interdisciplinary review
  • Author: Hancock, John T. ; Khoshgoftaar, Taghi M.
  • Subjects: Algorithms ; Big Data ; CatBoost ; Categorical variable encoding ; Classification ; Cognitive tasks ; Communications Engineering ; Computational Science and Engineering ; Computer Science ; Data Mining and Knowledge Discovery ; Database Management ; Decision tree ; Decision trees ; Ensemble methods ; Information Storage and Retrieval ; Interdisciplinary aspects ; Machine learning ; Mathematical Applications in Computer Science ; Networks ; Parameter sensitivity ; Survey Paper
  • Is Part Of: Journal of big data, 2020-11, Vol.7 (1), p.94-94, Article 94
  • Description: Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDTs in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and learn best practices from studies that cast CatBoost in a positive light, as well as from studies where CatBoost does not outshine other techniques, since both types of scenarios offer lessons. Furthermore, as a Decision Tree based algorithm, CatBoost is well-suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost’s effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in the literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach to cover studies related to CatBoost in a single work. This provides researchers with an in-depth understanding to help clarify proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.
  • Publisher: Cham: Springer International Publishing
  • Language: English
  • Identifier: ISSN: 2196-1115
    EISSN: 2196-1115
    DOI: 10.1186/s40537-020-00369-8
    PMID: 33169094
  • Source: SpringerOpen
    Coronavirus Research Database
    ROAD: Directory of Open Access Scholarly Resources
    ProQuest Central
    DOAJ Directory of Open Access Journals
