Spurious relationship

The Conceptual Model

Correlation

Figure 1: Correlation

Spurious relationship

Figure 2: Spurious relationship


Explanation

In a conceptual model, concepts are normally placed in rectangles. We have three concepts: X, S and Y.
We assume an association between X and Y (see Figure 1). The observed association is the result of a common cause S (see Figure 2). If you want to depict a spurious relationship graphically, it may be useful to also show the model without the common cause, but this is largely a matter of taste.
It is not always necessary to label the paths, but for this tutorial it will turn out to be handy. Normally, when a path has no sign (or label), it is assumed to have a positive valence. It is, however, good practice to include the valence of the paths in your conceptual models.

Please note that in this example I hypothesize an association between X and Y. The same reasoning would hold if we hypothesized a direct effect of X on Y.


Abstract hypotheses

  • Hypo1: X is related to Y (\(d>0\)).
  • Hypo2: S leads to X (\(e>0\)).
  • Hypo3: S leads to Y (\(f>0\)).
  • Hypo4: S is the reason why X and Y are related (\(d' < d\)).
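
To see what Hypo4 implies, a minimal simulation sketch may help. It assumes a linear model in which S drives both X and Y and there is no direct link between X and Y; the sample size, the seed and the coefficient values 0.6 and 0.5 are illustrative assumptions, not taken from the data used later in this tutorial.

```r
set.seed(123)                       # arbitrary seed, for reproducibility only
n <- 10000                          # hypothetical sample size

S <- rnorm(n)                       # common cause
X <- 0.6 * S + rnorm(n)             # assumed path e = 0.6
Y <- 0.5 * S + rnorm(n)             # assumed path f = 0.5

# d: observed association between X and Y
d <- cov(X, Y)

# d': association that remains once S is controlled for,
# computed as the covariance of the residuals of X ~ S and Y ~ S
d_prime <- cov(resid(lm(X ~ S)), resid(lm(Y ~ S)))

round(c(d = d, d_prime = d_prime), 3)
```

In this setup d should be close to 0.6 × 0.5 = 0.3 and d' close to zero, which is the pattern Hypo4 describes.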

Real life example

X is occupational success.
S is educational success.
Y is health.

  • Hypo1: Occupational success is related to better health.
  • Hypo2: Educational success leads to occupational success.
  • Hypo3: Educational success leads to better health.
  • Hypo4: Occupational success and health are related at least in part because educational success causes both occupational success and better health.

Structural equations

  • Y=X
  • Y=S
  • X=S

or, following the syntax of the R package lavaan:

  • Y~~X
  • Y~S
  • X~S
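
Written out with explicit coefficients and error terms, the schematic equations above could look as follows. This is a sketch under assumptions not stated in the text: linear relations, \(e\) and \(f\) read as unstandardized regression coefficients, \(d\) as the observed covariance of X and Y, \(d'\) as the covariance of the error terms, and error terms that are uncorrelated with S.

\[
X = e\,S + \varepsilon_X, \qquad Y = f\,S + \varepsilon_Y, \qquad \operatorname{cov}(\varepsilon_X, \varepsilon_Y) = d'
\]
\[
d = \operatorname{cov}(X, Y) = e\,f\,\operatorname{var}(S) + d'
\]

Under these assumptions, \(d' < d\) whenever \(e\,f\,\operatorname{var}(S) > 0\), that is, whenever \(e\) and \(f\) have the same sign, as in Hypo2 and Hypo3.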

Formal test of hypotheses

Load the NELLS data.

rm(list = ls())  # empty the environment
library(haven)  # provides read_dta() for Stata files
nells <- read_dta("../static/NELLS panel nl v1_2.dta")  # change the path to where you stored the data

Operationalize concepts.

# We will use the data of wave 2.
nellsw2 <- nells[nells$w2cpanel == 1, ]

# As an indicator of occupational success we will use income in wave 2.
table(nellsw2$w2fa61, useNA = "always")
attributes(nellsw2$w2fa61)
# recode (I will start newly created variables with cm from conceptual models)
nellsw2$cm_income <- nellsw2$w2fa61
nellsw2$cm_income[nellsw2$cm_income == 1] <- 100
nellsw2$cm_income[nellsw2$cm_income == 2] <- 225
nellsw2$cm_income[nellsw2$cm_income == 3] <- 400
nellsw2$cm_income[nellsw2$cm_income == 4] <- 750
nellsw2$cm_income[nellsw2$cm_income == 5] <- 1250
nellsw2$cm_income[nellsw2$cm_income == 6] <- 1750
nellsw2$cm_income[nellsw2$cm_income == 7] <- 2250
nellsw2$cm_income[nellsw2$cm_income == 8] <- 2750
nellsw2$cm_income[nellsw2$cm_income == 9] <- 3250
nellsw2$cm_income[nellsw2$cm_income == 10] <- 3750
nellsw2$cm_income[nellsw2$cm_income == 11] <- 4250
nellsw2$cm_income[nellsw2$cm_income == 12] <- 4750
nellsw2$cm_income[nellsw2$cm_income == 13] <- 5250
nellsw2$cm_income[nellsw2$cm_income == 14] <- 5750
nellsw2$cm_income[nellsw2$cm_income == 15] <- 6500
nellsw2$cm_income[nellsw2$cm_income == 16] <- 7500
nellsw2$cm_income[nellsw2$cm_income == 17] <- NA
# rescale the variable so that income is expressed in units of 1000 euro
nellsw2$cm_income <- nellsw2$cm_income/1000

# from household income to personal income
attributes(nellsw2$w2fa62)
table(nellsw2$w2fa62, useNA = "always")
nellsw2$cm_income_per <- nellsw2$w2fa62
# index on the original codes: recoding cm_income_per in place would turn code 2 into 10 and
# then, at the test for code 10, into 90
nellsw2$cm_income_per[nellsw2$w2fa62 == 1] <- 0
nellsw2$cm_income_per[nellsw2$w2fa62 == 2] <- 10
nellsw2$cm_income_per[nellsw2$w2fa62 == 3] <- 20
nellsw2$cm_income_per[nellsw2$w2fa62 == 4] <- 30
nellsw2$cm_income_per[nellsw2$w2fa62 == 5] <- 40
nellsw2$cm_income_per[nellsw2$w2fa62 == 6] <- 50
nellsw2$cm_income_per[nellsw2$w2fa62 == 7] <- 60
nellsw2$cm_income_per[nellsw2$w2fa62 == 8] <- 70
nellsw2$cm_income_per[nellsw2$w2fa62 == 9] <- 80
nellsw2$cm_income_per[nellsw2$w2fa62 == 10] <- 90
nellsw2$cm_income_per[nellsw2$w2fa62 == 11] <- 100
nellsw2$cm_income_per[nellsw2$w2fa62 == 12] <- NA

nellsw2$cm_income_ind <- nellsw2$cm_income * nellsw2$cm_income_per/100

# As an indicator of educational success we will use the highest completed level of education in years.
# The rationale behind this coding: I take the maximum for university as 16.5 years (taking into
# account that some master's programmes take 2 years and some 1 year) and subsequently subtract the
# years still needed to obtain a university degree, given the degree under consideration.

attributes(nellsw2$w2fa102)
table(nellsw2$w2fa102, useNA = "always")
nellsw2$cm_education <- nellsw2$w2fa102
nellsw2$cm_education[nellsw2$w2fa102 == 1] <- 6
nellsw2$cm_education[nellsw2$w2fa102 == 2] <- 9
nellsw2$cm_education[nellsw2$w2fa102 == 3] <- 10
nellsw2$cm_education[nellsw2$w2fa102 == 4] <- 11
nellsw2$cm_education[nellsw2$w2fa102 == 5] <- 12
nellsw2$cm_education[nellsw2$w2fa102 == 6] <- 10
nellsw2$cm_education[nellsw2$w2fa102 == 7] <- 11
nellsw2$cm_education[nellsw2$w2fa102 == 8] <- 14
nellsw2$cm_education[nellsw2$w2fa102 == 9] <- 15
nellsw2$cm_education[nellsw2$w2fa102 == 10] <- 16.5
nellsw2$cm_education[nellsw2$w2fa102 == 11] <- 16.5
nellsw2$cm_education[nellsw2$w2fa102 == 12] <- 7
nellsw2$cm_education[nellsw2$w2fa102 == 13] <- 11
nellsw2$cm_education[nellsw2$w2fa102 == 14] <- 14.5
nellsw2$cm_education[nellsw2$w2fa102 == 15] <- 4


# As an indicator of health we will use self-rated health, from 5 (excellent) to 1 (bad); thus we
# have to reverse code the original variable.
attributes(nellsw2$w2scf1)
table(nellsw2$w2scf1, useNA = "always")
nellsw2$cm_health <- 6 - nellsw2$w2scf1
## 
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17 <NA> 
##   55   78  103  204  338  326  282  272  276  205  133   62   48   22   22   29  374    0 
## $label
## [1] " wat is het netto inkomen per maand van u en uw partner samen?/van u?/  "
## 
## $format.stata
## [1] "%8.0g"
## 
## $labels
##  Minder dan €150 per maand      €150 - €299 per maand      €300 - €499 per maand 
##                          1                          2                          3 
##      €500 - €999 per maand  €1.000 - €1.499 per maand  €1.500 - €1.999 per maand 
##                          4                          5                          6 
##  €2.000 - €2.499 per maand  €2.500 - €2.999 per maand  €3.000 - €3.499 per maand 
##                          7                          8                          9 
##  €3.500 - €3.999 per maand  €4.000 - €4.499 per maand  €4.500 - €4.999 per maand 
##                         10                         11                         12 
##  €5.000 - €5.499 per maand  €5.500 - €5.999 per maand  €6.000 - €6.999 per maand 
##                         13                         14                         15 
##   €7.000 of meer per maand weet niet, wil niet zeggen 
##                         16                         17 
## 
## $class
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $label
## [1] " hoe groot is uw bijdrage in dit inkomen ongeveer? kunt u een percentage noemen "
## 
## $format.stata
## [1] "%8.0g"
## 
## $labels
## vrijwel geen bijdrage          ongeveer 10%          ongeveer 20%          ongeveer 30% 
##                     1                     2                     3                     4 
##          ongeveer 40%          ongeveer 50%          ongeveer 60%          ongeveer 70% 
##                     5                     6                     7                     8 
##          ongeveer 80%          ongeveer 90%         ongeveer 100%             weet niet 
##                     9                    10                    11                    12 
## 
## $class
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## 
##    1    2    3    4    5    6    7    8    9   10   11   12 <NA> 
##  253   48   89  259  233  242  183  229  114   63  887  229    0 
## $label
## [1] " wat is uw hoogst voltooide opleiding, dat wil zeggen waarvan u een diploma heef"
## 
## $format.stata
## [1] "%8.0g"
## 
## $labels
##                                                   lagere school 
##                                                               1 
##                                               lbo, vmbo-kb\\bbl 
##                                                               2 
##                                                   mavo, vmbo-tl 
##                                                               3 
##                                                            havo 
##                                                               4 
##                                                  vwo\\gymnasium 
##                                                               5 
##    mbo-kort (kmbo), primair leerlingwezen, bol\\bbl niveau 1 of 
##                                                               6 
## mbo-tussen\\lang (mbo), secundair\\tertiar leerlingwezen, bol\\ 
##                                                               7 
##                                                             hbo 
##                                                               8 
##                                         universiteit (bachelor) 
##                                                               9 
##                                universiteit (master, doctoraal) 
##                                                              10 
##                                                 promotietraject 
##                                                              11 
##    buitenlandse opleiding, niet goed in te delen, lager onderwi 
##                                                              12 
##    buitenlandse opleiding, niet goed in te delen, middelbaar on 
##                                                              13 
##    buitenlandse opleiding, niet goed in te delen, hoger onderwi 
##                                                              14 
##                                                  geen opleiding 
##                                                              15 
## 
## $class
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## 
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 <NA> 
##  118  223  202  205  117  223  737  586   89  208   12    8   20   17   34   30 
## $label
## [1] " wat vindt u, over het algemeen genomen, van uw gezondheid? "
## 
## $format.stata
## [1] "%8.0g"
## 
## $labels
## uitstekend  zeer goed       goed      matig     slecht 
##          1          2          3          4          5 
## 
## $class
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## 
##    1    2    3    4    5 <NA> 
##  438  853 1211  247   48   32
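
The income recode in the operationalization step assigns class midpoints one code at a time. As a more compact alternative, the same mapping can be sketched with a lookup vector that is indexed by the category code. The object names `midpoints`, `codes` and `income_k` and the example codes are hypothetical; for a `haven_labelled` variable I assume the codes would first be converted with `as.numeric()`.

```r
# class midpoints in euro for income codes 1..16;
# code 17 ("weet niet, wil niet zeggen") becomes NA
midpoints <- c(100, 225, 400, 750, 1250, 1750, 2250, 2750, 3250,
               3750, 4250, 4750, 5250, 5750, 6500, 7500, NA)

# a few hypothetical answers, including a refusal (17) and a missing value
codes <- c(1, 5, 16, 17, NA)

# indexing the lookup vector by code recodes all values in one step,
# so an already recoded value can never be recoded a second time
income_k <- midpoints[codes] / 1000   # income in units of 1000 euro
income_k
```

Because every value is looked up from its original code, this approach sidesteps the risk that a recoded value coincides with a later category code.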

And test the model with lavaan.

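library(lavaan)
# observed variances and covariance of personal income and health (complete cases on these two variables)
var(cbind(nellsw2$cm_income_ind, nellsw2$cm_health), na.rm = TRUE)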

model <- "
  # association
  cm_health ~~ cm_income_ind  
  #regressions
  cm_health ~ cm_education  
  cm_income_ind ~ cm_education
"

# first check the sample covariance matrix
inspectSampleCov(model, data = nellsw2)

fit <- sem(model, data = nellsw2)
summary(fit, standardized = TRUE)
##            [,1]       [,2]
## [1,] 1.02349869 0.07399422
## [2,] 0.07399422 0.83887136
## $cov
##               cm_hlt cm_nc_ cm_dct
## cm_health     0.837               
## cm_income_ind 0.069  1.011        
## cm_education  0.378  0.900  6.570 
## 
## lavaan 0.6-7 ended normally after 13 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##                                                       
##                                                   Used       Total
##   Number of observations                          2326        2829
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   cm_health ~                                                           
##     cm_education      0.058    0.007    7.873    0.000    0.058    0.161
##   cm_income_ind ~                                                       
##     cm_education      0.137    0.008   17.972    0.000    0.137    0.349
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .cm_health ~~                                                          
##    .cm_income_ind     0.017    0.018    0.958    0.338    0.017    0.020
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .cm_health         0.816    0.024   34.103    0.000    0.816    0.974
##    .cm_income_ind     0.887    0.026   34.103    0.000    0.887    0.878

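We observe that, once education is taken into account, the (error) covariance between health and occupational success is small (0.017) and not statistically significant (z = 0.958, p = 0.338), whereas the sample covariance between the two was 0.069.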
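
As a rough check on this reading, the observed covariance can be rebuilt from the estimates in the output above. The sketch assumes the usual path-tracing decomposition for this model (covariance running through education plus residual covariance); the object names are my own and the numbers are copied, rounded, from the lavaan output.

```r
f_health  <- 0.058   # cm_health ~ cm_education
e_income  <- 0.137   # cm_income_ind ~ cm_education
var_educ  <- 6.570   # sample variance of cm_education
resid_cov <- 0.017   # residual covariance of health and income

via_education <- f_health * e_income * var_educ   # part of the covariance due to education
implied_cov   <- via_education + resid_cov        # should reproduce the sample covariance

round(c(via_education = via_education, implied_cov = implied_cov), 3)
```

The implied covariance comes out at about 0.069, matching the sample covariance of health and income, with roughly 0.052 of it running through education.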