
Decision tree learning in Neo4j on homogeneous and unconnected graph nodes from biological and clinical datasets

BMC medical informatics and decision making, 2023-03, Vol.22 (Suppl 6), p.347-347, Article 347 [Peer Reviewed Journal]

The Author(s) 2023. Copyright 2023 BioMed Central Ltd. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. ISSN: 1472-6947; EISSN: 1472-6947; DOI: 10.1186/s12911-023-02112-8; PMID: 36879243

  • Title:
    Decision tree learning in Neo4j on homogeneous and unconnected graph nodes from biological and clinical datasets
  • Author: Mondal, Rahul ; Do, Minh Dung ; Ahmed, Nasim Uddin ; Walke, Daniel ; Micheel, Daniel ; Broneske, David ; Saake, Gunter ; Heyer, Robert
  • Subjects: Algorithms ; Analysis ; Biomarkers ; Biomedical Research ; Blood pressure ; Body Mass Index ; Care and treatment ; Cypher ; Database administration ; Databases, Factual ; Datasets ; Decision tree ; Decision Trees ; Diabetes ; Diabetes mellitus ; Diabetics ; Feature extraction ; Graph database ; Health aspects ; Health informatics ; Health risks ; Humans ; Hyperglycemia ; Hypertension ; Insulin ; Java ; Learning algorithms ; Libraries ; Machine learning ; Methods ; Neo4j ; Nodes ; Pathogenesis ; Python ; Queries ; Risk factors
  • Is Part Of: BMC medical informatics and decision making, 2023-03, Vol.22 (Suppl 6), p.347-347, Article 347
  • Description: Graph databases enable efficient storage of heterogeneous, highly interlinked data, such as clinical data. Researchers can then extract relevant features from these datasets and apply machine learning for diagnosis, biomarker discovery, or understanding pathogenesis. To facilitate machine learning and save the time otherwise spent extracting data from the graph database, we developed and optimized the Decision Tree Plug-in (DTP), which contains 24 procedures to generate and evaluate decision trees directly in the graph database Neo4j on homogeneous and unconnected nodes. Creating the decision tree for three clinical datasets directly in the graph database from the nodes required between 0.059 and 0.099 s, while calculating the decision tree with the same algorithm in Java from CSV files took 0.085-0.112 s. For small datasets, our approach was also faster than the standard decision tree implementation in R (0.62 s) and on par with the one in Python (0.08 s), both of which used CSV files as input. In addition, we explored the strengths of DTP by evaluating a large dataset (approx. 250,000 instances) to predict patients with diabetes and compared its performance against algorithms generated by state-of-the-art packages in R and Python. The results show that Neo4j is competitive in both prediction quality and time efficiency. Furthermore, we could show that high body-mass index and high blood pressure are the main risk factors for diabetes. Overall, our work shows that integrating machine learning into graph databases saves time on additional processes as well as external memory, and could be applied to a variety of use cases, including clinical applications. This provides users with the advantages of high scalability, visualization, and complex querying.
  • Publisher: England: BioMed Central Ltd
  • Language: English
  • Identifier: ISSN: 1472-6947
    EISSN: 1472-6947
    DOI: 10.1186/s12911-023-02112-8
    PMID: 36879243
  • Source: DOAJ Directory of Open Access Journals
    Geneva Foundation Free Medical Journals at publisher websites
    MEDLINE
    PubMed Central
    Springer Nature OA/Free Journals
    Coronavirus Research Database
    ROAD: Directory of Open Access Scholarly Resources
    ProQuest Central
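
Illustration for the Description above: a minimal sketch of the split-selection step at the core of decision tree learning. It is not the DTP code and does not use Neo4j. It assumes Gini impurity as the split criterion (the record does not state which criteria the plug-in supports), and the function names `gini` and `best_split` as well as the toy rows are hypothetical, not taken from the article's datasets.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    Candidate thresholds are the observed feature values; a row goes to the
    left branch when row[feature_index] <= threshold.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must send rows to both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Made-up toy rows with two numeric features each (not real clinical data).
rows = [[22.0, 110.0], [24.0, 135.0], [31.0, 120.0], [35.0, 150.0]]
labels = ["no", "no", "yes", "yes"]
feature, threshold, score = best_split(rows, labels)
```

A full tree learner would apply `best_split` recursively to the left and right partitions until the nodes are pure or a depth limit is reached.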
