Canonical Correlation Analysis (CCA) is a multivariate technique that can describe how measures from different domains – such as brain and behaviour – vary together. Recent work suggests that thousands of individuals are required in this type of multivariate analysis to obtain consistently reproducible results (Marek et al., 2022).
The goal of this project is to investigate the effects of sample size on brain-behaviour CCA. We will use imaging-derived phenotypes and cognitive measures from around 40,000 individuals from the UK Biobank. Specifically, we will focus on diffusion magnetic resonance imaging (dMRI) data and cognitive function, which have been previously shown to covary strongly (McPherson & Pestilli, 2021). We aim to assess the replicability of CCA correlations by fitting models on bootstrapped samples and testing the fitted models on held-out (validation) data. We will vary our data samples along the following axes:
We expect this study to clarify how sample size, sample composition and analysis pipeline affect the replicability of multivariate methods.
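As a rough illustration of the procedure described above (fitting CCA on bootstrapped samples of varying size and testing each fitted model on held-out data), the following minimal sketch may help. It is not the project's pipeline: the data are synthetic rather than UK Biobank measures, only the first canonical pair is estimated, CCA is implemented directly with NumPy by whitening and SVD, and the small ridge term `reg` is an assumption added for numerical stability. The names `simulate`, `fit_cca` and `replicability_by_sample_size` are hypothetical.

```python
import numpy as np


def simulate(n, p=20, q=10, seed=0):
    """Synthetic stand-in for brain (X) and behaviour (Y) data.

    One shared latent variable is added to the first column of each
    block; all other columns are pure noise. Purely illustrative.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    X[:, 0] += z
    Y[:, 0] += z
    return X, Y


def fit_cca(X, Y, reg=1e-6):
    """Estimate the first pair of canonical weights (a, b)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both blocks with Cholesky factors: M = Lx^{-1} Cxy Ly^{-T}
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    # Map the leading singular vectors back to the original variables
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, mx, my


def canonical_corr(X, Y, a, b, mx, my):
    """Correlation between the two canonical variates on given data."""
    return float(np.corrcoef((X - mx) @ a, (Y - my) @ b)[0, 1])


def replicability_by_sample_size(X, Y, X_val, Y_val, sizes, n_boot=20, seed=0):
    """Mean in-sample and held-out correlation for each sample size."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        r_in, r_out = [], []
        for _ in range(n_boot):
            # Bootstrap: draw n rows with replacement from the discovery set
            idx = rng.integers(0, X.shape[0], size=n)
            model = fit_cca(X[idx], Y[idx])
            r_in.append(canonical_corr(X[idx], Y[idx], *model))
            r_out.append(canonical_corr(X_val, Y_val, *model))
        results[n] = (float(np.mean(r_in)), float(np.mean(r_out)))
    return results


# Discovery and held-out (validation) sets drawn from the same simulation
X, Y = simulate(10000, seed=1)
X_val, Y_val = simulate(5000, seed=2)
results = replicability_by_sample_size(X, Y, X_val, Y_val, sizes=[50, 2000])
for n, (r_in, r_out) in results.items():
    print(f"n={n}: in-sample r={r_in:.2f}, held-out r={r_out:.2f}")
```

In a sketch like this, the gap between the in-sample and held-out correlations serves as a simple index of overfitting, and it would be expected to shrink as the bootstrap sample size grows.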