Individual differences research typically involves datasets with many possible predictors (p), such as cognitive and linguistic abilities and demographic variables, but a limited number of observations per participant (n). This “small n, large p” problem makes a model prone to overfitting if all predictors are included in it at once. Machine learning methods offer a non-parametric, bottom-up approach to assessing variable importance that is especially effective when predictors are highly collinear.


Current Collaborators

We have applied the Random Forests technique in a number of studies. In this method, a forest of “decision trees” is grown from multiple random samples of the data and of the predictors. The importance of each predictor is assessed by randomly shuffling its values across the trees and measuring how much its presence or absence affects how well the data are modeled. The outcome is a classifier parameterized on specific values of the most important predictors among the full set. The method is particularly valuable because it preserves the identity of the individual predictors (unlike data compression methods such as PCA) and because it can uncover novel relationships within the data.
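The shuffling procedure described above can be illustrated with a minimal sketch. This is not the implementation used in our studies: it assumes a toy forest of one-split trees ("stumps"), synthetic data in which only the first predictor carries signal, and hypothetical names throughout (`fit_stump`, `permutation_importance`, `make_data`, and the predictor labels). Each tree is fit to a bootstrap sample using a random subset of predictors, and importance is the average drop in accuracy on the held-out (out-of-bag) cases after one predictor's values are shuffled.

```python
import random

def fit_stump(X, y, features):
    """Find the one-split tree (feature, threshold, labels) with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            # Majority label on each side of the split (0 if a side is empty).
            left_label = int(sum(left) * 2 >= len(left)) if left else 0
            right_label = int(sum(right) * 2 >= len(right)) if right else 0
            errors = (sum(yi != left_label for yi in left)
                      + sum(yi != right_label for yi in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def accuracy(stump, X, y):
    f, t, left_label, right_label = stump
    hits = sum((left_label if row[f] <= t else right_label) == yi
               for row, yi in zip(X, y))
    return hits / len(y)

def permutation_importance(X, y, n_trees=30, mtry=2, seed=0):
    """Mean out-of-bag accuracy drop per predictor after shuffling that predictor."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    drops = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]      # random sample of cases
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]   # held-out cases
        if not oob:
            continue
        features = rng.sample(range(p), mtry)            # random sample of predictors
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot], features)
        X_oob = [X[i] for i in oob]
        y_oob = [y[i] for i in oob]
        base = accuracy(stump, X_oob, y_oob)
        for j in range(p):
            shuffled = [row[j] for row in X_oob]
            rng.shuffle(shuffled)                        # break the link between predictor j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_oob, shuffled)]
            drops[j] += base - accuracy(stump, X_perm, y_oob)
    return [d / n_trees for d in drops]

def make_data(n=120, seed=1):
    """Synthetic data (an assumption for illustration): only predictor 0 determines y."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(n)]
    y = [int(row[0] > 0.5) for row in X]
    return X, y

X, y = make_data()
importance = permutation_importance(X, y)
for name, value in zip(["informative", "noise_a", "noise_b"], importance):
    print(f"{name}: {value:.3f}")
```

In this toy setting the informative predictor should receive a clearly larger importance score than the two noise predictors, whose scores should hover near zero. A full analysis would use deeper trees and an established statistical package rather than this hand-rolled forest.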

More recent work uses Support Vector Machines (SVMs) to build classifiers that distinguish low-proficiency bilinguals from bilingual children with specific language impairment.

Representative Publications

Jahn, A., Matsuki, K., Molfese, P.J., & Van Dyke, J.A. (under revision). Application of the random forests statistical technique to diffusion tensor imaging data. [poster]

Kuperman, V., Matsuki, K., & Van Dyke, J.A. (2018). Contributions of reader- and text-level characteristics to eye-movement patterns during passage reading. Journal of Experimental Psychology: Learning, Memory, and Cognition. [publisher | pubmed]

Matsuki, K., Kuperman, V., & Van Dyke, J.A. (2016). The Random Forests statistical technique as applied to the study of reading disability. Invited submission for special issue of Scientific Studies of Reading, 20(1), 20-33. [publisher | pubmed]