Canonical Correlation Analysis of brain and behaviour

Effects of sample size/composition and analysis pipeline

Repository

Background

Canonical Correlation Analysis (CCA) is a multivariate technique that can describe how measures from different domains, such as brain and behaviour, vary together. Recent work suggests that this type of multivariate analysis requires thousands of individuals to obtain consistently reproducible results (Marek et al., 2022).

Objective

The goal of this project is to investigate the effects of sample size on brain-behaviour CCA. We will use imaging-derived phenotypes and cognitive measures from around 40,000 individuals from the UK Biobank. Specifically, we will focus on diffusion magnetic resonance imaging (dMRI) data and cognitive function, which have previously been shown to covary strongly (McPherson & Pestilli, 2021). We aim to assess the replicability of CCA correlations by fitting models on bootstrapped samples and testing the fitted models on held-out (validation) data. We will vary the analysis along the following axes:

Sample size of the training dataset
Sample composition (full sample, healthy participants, disease samples)
CCA pipeline (with or without cross-validation)

We expect this study to clarify how sample size, sample composition and analysis pipeline affect the replicability of multivariate methods.
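
To make the bootstrap-and-validate idea concrete, here is a minimal sketch in Python. It is an illustration only, not the project's pipeline: it runs on synthetic data rather than UK Biobank measures, uses a plain NumPy CCA (QR decomposition followed by an SVD) with no regularisation or cross-validation, and all function names (fit_cca, score_cca, bootstrap_cca) and parameter values are hypothetical. It assumes each bootstrap sample has more rows than features.

```python
import numpy as np


def fit_cca(X, Y):
    """Fit the first canonical pair with a QR + SVD formulation of CCA.

    Assumes more rows than columns and full column rank (no regularisation).
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(X - x_mean)
    Qy, Ry = np.linalg.qr(Y - y_mean)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    a = np.linalg.solve(Rx, U[:, 0])  # weights for X
    b = np.linalg.solve(Ry, Vt[0])    # weights for Y
    return x_mean, y_mean, a, b, s[0]


def score_cca(model, X, Y):
    """Correlation between canonical variates when fitted weights are applied to new data."""
    x_mean, y_mean, a, b, _ = model
    u = (X - x_mean) @ a
    v = (Y - y_mean) @ b
    return np.corrcoef(u, v)[0, 1]


def bootstrap_cca(X_train, Y_train, X_val, Y_val, sample_size, n_boot=20, seed=0):
    """Fit CCA on bootstrap samples of a given size; return in-sample and held-out correlations."""
    rng = np.random.default_rng(seed)
    in_sample, held_out = [], []
    for _ in range(n_boot):
        # Sample row indices with replacement from the training pool.
        idx = rng.integers(0, len(X_train), size=sample_size)
        model = fit_cca(X_train[idx], Y_train[idx])
        in_sample.append(model[4])
        held_out.append(score_cca(model, X_val, Y_val))
    return np.array(in_sample), np.array(held_out)


# Synthetic stand-in data: one shared latent factor links the two domains.
rng = np.random.default_rng(42)
n, p, q = 2000, 10, 5
latent = rng.normal(size=n)
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X[:, 0] += latent
Y[:, 0] += latent
X_train, Y_train, X_val, Y_val = X[:1500], Y[:1500], X[1500:], Y[1500:]

for size in (50, 200, 1000):
    fit_r, val_r = bootstrap_cca(X_train, Y_train, X_val, Y_val, sample_size=size)
    print(f"n={size}: in-sample r={fit_r.mean():.2f}, held-out r={val_r.mean():.2f}")
```

Comparing the in-sample and held-out correlations across training sizes is one simple way to see how much a fitted CCA model overfits; sample composition and pipeline variants would be additional loops around the same procedure.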

Preprint coming soon!

People

Michelle Wang

PhD student

Brent McPherson

Postdoctoral researcher