When is a correlation between non-independent variables "spurious"?

Oikos, 2004-06, Vol.105 (3), p.647-656 [Peer Reviewed Journal]

Copyright 2004 Oikos ;2004 INIST-CNRS ;ISSN: 0030-1299 ;EISSN: 1600-0706 ;DOI: 10.1111/j.0030-1299.2004.12777.x ;CODEN: OIKSAA

  • Title:
    When is a correlation between non-independent variables "spurious"?
  • Author: Brett, Michael T.
  • Subjects: Animal, plant and microbial ecology ; Biological and medical sciences ; Correlation coefficients ; Correlations ; Ecological modeling ; Ecology ; Error rates ; Fundamental and applied biological sciences. Psychology ; General aspects. Techniques ; Hemic system ; Methods and techniques (sampling, tagging, trapping, modelling...) ; Monte Carlo methods ; Nitrogen ; Opinions ; Sample size ; Statistics
  • Is Part Of: Oikos, 2004-06, Vol.105 (3), p.647-656
  • Description: Correlations which are artifacts of various types of data transformations can be said to be spurious. This study considers four common types of analyses where the X and Y variables are not independent; these include regressions of the form X/Z vs Y/Z, X × Z vs Y × Z, X vs Y/X, and X+Y vs Y. These analyses were carried out using a series of Monte Carlo simulations while varying sample size and sample variability. The impact of disparities in variability between the shared and non-shared terms and measurement error for the shared term on the magnitude of the spurious correlations was also considered. The accuracy of equations previously derived to predict the magnitude of spurious correlations was also assessed. These results show the risk of producing spurious correlations when analyzing non-independent variables is very large. Spurious correlations occurred in all cases assessed, the mean spurious coefficient of determination (r²) frequently exceeded 0.50, and in some cases the 90% confidence interval for these simulations included all large r² values. The magnitude of spurious correlations was sensitive to differences in the variability of the shared and non-shared terms, with large spurious correlations obtained when the variability for the shared term was larger. Sample size had only a modest impact on the magnitude of spurious correlations. When measurement error for the shared variable was smaller than one half the coefficient of variation for that variable, which is generally the case, the measurement error did not generate large spurious correlations. The equations available to predict expected spurious correlations provided accurate predictions for the case of X × Z vs Y × Z, variable predictions for the case of X vs Y/X, and poor predictions for most cases of X/Z vs Y/Z, and X+Y vs Y.
  • Publisher: Copenhagen: Munksgaard International Publishers
  • Language: English
  • Identifier: ISSN: 0030-1299
    EISSN: 1600-0706
    DOI: 10.1111/j.0030-1299.2004.12777.x
    CODEN: OIKSAA
  • Source: Alma/SFX Local Collection
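
The kind of Monte Carlo experiment summarized in the Description can be sketched roughly as follows. This is an illustration only, not the procedure used in the article: the lognormal distribution, the spread parameters, the replicate count, and the names `pearson_r2`, `draw`, and `mean_spurious_r2` are all assumptions chosen for the example. Each replicate draws mutually independent terms, builds the four non-independent pairings around a shared term `s` (Z in X/Z vs Y/Z and X × Z vs Y × Z, X in X vs Y/X, Y in X+Y vs Y), and records r².

```python
import random


def pearson_r2(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab * sab / (saa * sbb)


def draw(n, sigma, rng):
    """n independent lognormal draws (assumed distribution; always positive).

    sigma is the standard deviation on the log scale, used here as a
    stand-in for sample variability.
    """
    return [rng.lognormvariate(0.0, sigma) for _ in range(n)]


def mean_spurious_r2(n=30, sigma_own=0.2, sigma_shared=0.2, reps=200, seed=1):
    """Mean r^2 over `reps` replicates for four non-independent pairings.

    a and b are the non-shared terms, s is the shared term; all three are
    drawn independently, so any correlation comes from the shared term.
    """
    rng = random.Random(seed)
    totals = {
        "independent (a vs b)": 0.0,  # baseline with no shared term
        "a/s vs b/s": 0.0,            # form X/Z vs Y/Z
        "a*s vs b*s": 0.0,            # form X*Z vs Y*Z
        "s vs a/s": 0.0,              # form X vs Y/X
        "s+a vs s": 0.0,              # form X+Y vs Y
    }
    for _ in range(reps):
        a = draw(n, sigma_own, rng)
        b = draw(n, sigma_own, rng)
        s = draw(n, sigma_shared, rng)
        totals["independent (a vs b)"] += pearson_r2(a, b)
        totals["a/s vs b/s"] += pearson_r2(
            [p / q for p, q in zip(a, s)], [p / q for p, q in zip(b, s)]
        )
        totals["a*s vs b*s"] += pearson_r2(
            [p * q for p, q in zip(a, s)], [p * q for p, q in zip(b, s)]
        )
        totals["s vs a/s"] += pearson_r2(s, [p / q for p, q in zip(a, s)])
        totals["s+a vs s"] += pearson_r2([p + q for p, q in zip(a, s)], s)
    return {name: total / reps for name, total in totals.items()}


if __name__ == "__main__":
    # Compare a shared term that is less variable vs more variable than the rest.
    for sigma_shared in (0.1, 0.4):
        result = mean_spurious_r2(sigma_own=0.2, sigma_shared=sigma_shared)
        print(f"sigma_shared={sigma_shared}")
        for name, r2 in result.items():
            print(f"  {name}: mean r2 = {r2:.2f}")
```

Running the script with a small and a large `sigma_shared` gives a rough feel for the sensitivity to shared-term variability that the Description reports; the "independent" row serves as a baseline for what chance alone produces at the chosen sample size.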
