Conceptual Models


Conceptual models may help a researcher to:

  • organize the thought process and improve structured thinking
  • critically think about causality
  • identify equivalent models
  • formulate (informative and testable) hypotheses1
  • summarize hypotheses graphically for a lay audience
  • summarize hypotheses graphically for an expert audience when translated into a path diagram

Although closely related, a conceptual model is not a path diagram.
A conceptual model is a tool for researchers to organize the thought process and to concisely summarize hypotheses. Although it is a good idea to follow some conventions there are no real rules in how to graphically represent the assumed relations between concepts. Researcher can (should?!) be creative in how they want to depict non-linear relations, feedback mechanisms, or complex ‘aggregation rules’.
A path diagram is a formal and graphical representation of a structural model (i.e. a set of structural equations). There are clear rules how to depict manifest and latent variables, (error)variances between exogenous and endogenous variables, random intercepts and slopes, et cetera.


The intended learning outcomes of this course are:

  1. Theoretical knowledge and insight: you will be able to define core concepts related to conceptual models (e.g. correlation, causality, mediation, moderation) and are able to explain the distinction with path diagrams.
  2. Academic attitude: you develop a positive attitude towards applying conceptual models in your own social science research projects.
  3. Research skills: you will be able to use conceptual models as a tool to: formulate strong hypotheses; summarize hypotheses; assess the system of formulated hypotheses in social science projects of others.


On each different page we discuss a different conceptual model.

  • Correlation
  • Direct effect
  • Two direct effects (partial correlation / multiple regression)
  • Mediation
  • Spuriousness
  • Moderation
  • Mediated Moderation
  • Moderated Mediation
  • Multi-level model
  • Multi-level mediation (2-1-1 model)
  • Multi-level moderation (cross-level interaction)

Each page contains the following subsections:

  • The Conceptual Model
  • Explanation
  • Abstract hypothesis/hypotheses
  • Real life example
  • Structural equations
  • Formal test of hypotheses

Feel free to jump to the page and subsection you are most interested in. But there is a clear order in the pages and subsections. The best way to reach the aim of this course is by going through the pages one by one from top to bottom.
Use the menu on the left to navigate through the main sections.
Use the menu on the right to navigate to subsections of each current page.

SEM and path diagrams

Although I will try to translate each Conceptual Model into a more formal set of structural equations and will show you how to test this structural model (with the R package Lavaan), this is not a tutorial on SEM or path diagrams.

For useful links see:

Correlation versus Causation

More often than not, the conceptual model will try to summarize causal mechanisms (e.g. X causes Y). And more often than not, it is not possible to give a causal interpretation of the estimates obtained after the hypotheses test. I will try to discuss the extent to which parameter estimates may be given a causal interpretation throughout this tutorial.
Also note that I use the label correlation but naturally the correct measure for the presumed (bivariate) association depends on the type of scales used for the measurement of the variables involved.

Levels of measurement

  • Categorical (binary and nominal)
  • Ordinal
  • Continuous:
    • Interval
    • Ratio

Covariance and Correlation

Social scientists try to explain variance and covariance. It is therefore a good idea to learn by heart the formula for variance and covariance. The sample variance of a random continuous variable X, VAR(X), is as follows:

\[VAR(X)=s_{xx}^2 = s_x^2= \frac{\Sigma^n_{i=1}(X_i-\overline{X})(X_i-\overline{X})}{n-1}= \frac{\Sigma^n_{i=1}(X_i-\overline{X})^2}{n-1}\]
(And the sample standard deviation is given by: \(STD(X)=\sqrt{s_x^2}=s_x\).)

The sample covariance of two random continuous variables X and Y, COV(X,Y) is as follows:

\[COV(X,Y)=s_{xy}^2 = \frac{\Sigma^n_{i=1}(X_i-\overline{X})(Y_i-\overline{Y})}{n-1}\] The sample correlation coefficient between two random continuous variables X and Y, COR(X,Y), is a covariance on standardized variables (\(z_x=X_{sd}=(X-\overline{X})/s_x\)) and hence:

\[COR(X,Y)=r_{xy} = \frac{s_{xy}^2}{s_x s_y}= \frac{\Sigma^n_i(X_i-\overline{X})(Y_i-\overline{Y})}{\sqrt{\Sigma^n_i(X_i-\overline{X})^2}\sqrt{\Sigma^n_i(Y_i-\overline{Y})^2}}\] Just to be complete. The population equivalent of the covariance:

\[\sigma _{xy}^2 = \frac{\Sigma^n_i(X_i - \mu_x)(Y_i-\mu_y)}{N},\] with \(\mu\) the population mean. And the correlation within the population is:

\[\rho_{xy} = \frac{\sigma_{xy}^2}{\sigma_x \sigma_y}\]


For the examples in this tutorial I will use the dateset [The NEtherlands Longitudinal Lifecourse Study (NELLS)] ( (Tolsma et al. (2014)). Please follow the link to download the codebook and dataset.


Tolsma, Jochem, G. L. M. Kraaykamp, Paul M. de Graaf, Matthijs Kalmijn, and Christiaan WS Monden. 2014. “The NEtherlands Longitudinal Lifecourse Study (NELLS, Panel): Codebook.”

  1. With hypothesis (plural hypotheses) I mean a proposed explanation for a (social) phenomenon. The hypothesis becomes scientific when the explanation is a prediction derived from a theory and it is testable with empirical data and a suitable scientific method.↩︎