--- title: "Analysis script and documentation for Dynamic assessment of the effectiveness of digital game-based literacy training in beginning readers" author: Toivo Glatz1,2,3, Wim Tops1, Elisabeth Borleffs1,2, Ulla Richardson4, Natasha Maurits2,5, Annemie Desoete6, & Ben Maassen1,2,7;1Centre for Language and Cognition (CLCG),University of Groningen, The Netherlands, 2Behavioural and Cognitive Neuroscience (BCN), University of Groningen, The Netherlands, 3Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Public Health, 4Centre for Applied Language Studies, University of Jyväskylä, Finland, 5Department of Neurology, University of Groningen, the Netherlands, 6Department of Developmental Disorders, Ghent University, Belgium 7Department of Neuroscience, University Medical Center Groningen, University of Groningen, The Netherlands date: "Created on: `r format(Sys.time(), '%B %d, %Y - %H:%M:%S')`" output: html_document: number_sections: true fig_height: 8 fig_width: 11 code_folding: hide editor_options: chunk_output_type: console --- ```{r setup, include=FALSE} knitr::opts_chunk$set(error = FALSE, message = FALSE, warning = FALSE, cache = TRUE, size = "small") options(width = 200) options(warn=-1) packages <- c("lme4", "lmerTest", "effects", "lattice", "data.table", "mgcv", "itsadug", "psych", "boot", "Hmisc", "arsenal", "MuMIn", "optimx", "performance") if (length(setdiff(packages, rownames(installed.packages()))) > 0) { install.packages(setdiff(packages, rownames(installed.packages())), repos="https://cloud.r-project.org", dependencies = TRUE) } library(lmerTest) library(effects) #library(lattice) library(data.table) library(mgcv) library(itsadug) library(psych) library(boot) library(Hmisc) library(arsenal) library(MuMIn) #library(optimx) #library(performance) source("rsquared.glmm.r") # dependency to compute conditional and marginal R^2 of mixed models source("corstars.R") # dependency to display correlation tables and significance asterisks bootthreads = 5 # to speed up bootstrapping set to number of physical cores simulations = 1000 # number of bootstrapping replications # 1000 in paper bootparallel = "multicore" # method to set up use of multiple cores, use "snow" on windows and "multicore" on POSIX (MAC/LINUX) load("GG-NL_Data_PeerJ.rda") ``` # Documentation & Analyses {.tabset} ## Abstract In this paper, we report on a study evaluating the effectiveness of a digital game-based learning (DGBL) tool for beginning readers of Dutch, employing active (math game) and passive (no game) control conditions. This classroom-level randomized control trial included 247 first graders from 16 classrooms in the Netherlands and the Dutch-speaking part of Belgium. The intervention consisted of 10 to 15 minutes of daily playing during school time for a period of 4 to 7 weeks. Our outcome measures included reading fluency, as well as purpose built in-game proficiency levels to measure written lexical decision and letter speech sound association. After an average of 28 playing sessions, the literacy game improved letter knowledge at a scale generalizable for all children in the classroom compared to the other two conditions. In addition to a small classroom wide benefit in terms of reading fluency, we furthermore discovered that children who scored high on phonological awareness prior to training were more fluent readers after extensive exposure to the reading game. 
This study is among the first to exploit game-generated data for the evaluation of DGBL for literacy interventions. Second revision submitted to **PeerJ** in May 2021.

## R packages

The following R version and packages were used to generate this markdown:

```{r dependencies}
sessionInfo()
```

## Dataframes {.tabset}

Here we present the complete raw data of 286 children who initially consented to participate and who were subsequently allocated to a treatment condition via block randomization. Below are the reasons for exclusion, as shown in the flowchart (Figure 1). **ADD** = attention deficit disorder, **ADHD** = attention deficit hyperactivity disorder, **age** = age above 8 years, **consentGAM** = parents only gave consent to play the game, but not to carry out psychometric testing at school, **DCD** = developmental coordination disorder, **no** = 247 participants included in the subsequent analyses, **noT2** = lost to follow-up, **repeat** = repeated first grade, **sessions** = played fewer than 20 gaming sessions.

```{r exclude}
summary(BehavDat$Exclude)
```

### Offline assessments {.tabset}

Dataframe **BehavDat** contains the raw data based on questionnaires (e.g. familial risk for dyslexia) and psychometric tests (e.g. CELF-IV-NL PA). Variable names and their descriptions are provided below.

#### Variable Names

The dataset consists of `r nrow(BehavDat)` rows and `r ncol(BehavDat)` columns with the following column names:

```{r colnames}
str(BehavDat)
summary(BehavDat)
```

#### Variable Description

* Subj - unique subject ID
* School - unique school ID
* Class - unique classroom ID
* Country - country: Netherlands (NL) or Belgium (B)
* T1AgeF - fractional age in years at pretest (T1)
* Gender - gender (male/female)
* Exclude - exclusion criteria (e.g. ADHD or not enough sessions).
Participants with Exclude=="no" are used for the analyses
* Cond - gaming condition (Passive, Math, Read)
* HandScore - handedness score based on average of 11 questions (z-transformed)
* FamRisk - self-reported familial risk for dyslexia
* Lang - language spoken at home (monolingual/multilingual)
* SON_1st - analogies and categories subtest of SON-R 6-40 IQ test (z-transformed)
* hoursPlayed - time on task (read & math only)
* levelsPlayed - number of played levels (read & math only)
* sessionsPlayed - number of played sessions (read & math only)
* MaxLevel - highest level that was played (read & math only)
* MaxLevelS - highest level that was played (as above, but z-transformed)
* itemsSeen - total number of items seen during gameplay (read & math only)
* responsesGiven - number of responses given during gameplay (read & math only)
* T1CF_Total - CELF IV NL: total test score at T1 (raw scores 0-45)
* T1CF_TotalPc - CELF IV NL: total test score at T1 (percentiles 0-100)
* T2CF_Total - CELF IV NL: total test score at T2 (raw scores 0-45)
* T2CF_TotalPc - CELF IV NL: total test score at T2 (percentiles 0-100)
* T1PF_Total - PROEF PA: total test score at T1 (raw scores 0-10)
* T1PF_TotalPc - PROEF PA: total test score at T1 (percentiles 0-100)
* T2PF_Total - PROEF PA: total test score at T2 (raw scores 0-10)
* T2PF_TotalPc - PROEF PA: total test score at T2 (percentiles 0-100)
* T1RANcT - Rapid Automatized Naming of colours at T1 (time in seconds)
* T1RANoT - Rapid Automatized Naming of objects at T1 (time in seconds)
* T2RANcT - Rapid Automatized Naming of colours at T2 (time in seconds)
* T2RANoT - Rapid Automatized Naming of objects at T2 (time in seconds)
* T2.LetKen - Computerized letter speech sound identification at T2 (raw scores 0-32)
* T1.LetKen - Computerized letter speech sound identification at T1 (raw scores 0-32)
* T2.LexDec - Computerized written lexical decision at T2 (raw scores 0-32)
* EMT - One minute reading fluency scores at T2 (raw scores in correctly read words per minute)

#### Reliability of offline measures

##### Reading fluency

```{r}
readingdata <- droplevels(subset(BehavDat,EMT<100))
readingdata <- subset(readingdata, select=c(T2EMTaCor, T2EMTbCor))
(r = cor(readingdata$T2EMTaCor,readingdata$T2EMTbCor)) # 0.93 correlation between two lists
((2 * r) / (1 + r)) # 0.96 split-half reliability with Spearman-Brown correction
psych::alpha(readingdata) # 0.96 based on 272 children
```

### Online assessments {.tabset}

Dataframe **inGameDat** contains the raw data from the in-game assessments that took place during the very first and the very last gaming session.
#### Variable Names

The dataset consists of `r nrow(inGameDat)` rows and `r ncol(inGameDat)` columns with the following additional variables:

```{r ingame colnames}
str(inGameDat[1:18])
summary(inGameDat[1:18])
```

#### Variable Description

* idLevel - ingame level ID
* stream - pre- or post-test measure (T1/T2)
* lvl - ingame level name
* selectionsCounts - number of targets/distractors selected in a given level ID
* selectionsCorrect - number of correctly selected targets in a given level ID
* selectionsIncorrect - number of mistakenly selected distractors in a given level ID
* idTirla - ingame trial ID
* result - outcome of a single trial/click (correct/null[=incorrect])
* itemCountOnScreen - sum of targets + distractors displayed on screen in a given trial
* correctResponse - the correct response to the task the child has to solve
* playerResponse - the actual response to the task by the child
* correctSelections - number of correctly selected targets in a given trial ID
* incorrectSelections - number of mistakenly selected distractors in a given trial ID
* itemsOnScreen - character representation of the order of graphemes on screen. The written lexical decision task additionally encodes the order of the response buttons (correct/wrong)
* trialRTs - response time of a given trial ID (in seconds)
* TrialNum - sequential number of a given trial ID within a given level ID
* prevRT - response time to the previous trial ID

#### Reliability of online assessments

This section computes the reliability for the letter speech sound identification (LSSI) and written lexical decision (WLD) tasks based on all available data at pre-test. We present Cronbach's alpha as a measure of internal consistency, as well as split-half reliability. Furthermore, we present values for item-based reliability and for reliability aggregated by level. Although Henson (2001) suggested that speeded tests violate the assumption of measuring a single underlying construct, we can consider speed as part of the construct of interest (GPC automation and reading fluency).

##### LSSI aggregated by level

```{r Reliability}
# Letter speech sound identification. Aggregated by level
# Establish level variable to sort each trial into the level it belongs to
inGameDat$LSSI[inGameDat$itemsOnScreen=="k;d;e;oe;a;n;j;v;r;i"] <- "LK1"
inGameDat$LSSI[inGameDat$itemsOnScreen=="s;ee;b;aa;p;oo;u;w;g;ui;z"] <- "LK2"
inGameDat$LSSI[inGameDat$itemsOnScreen=="uu;m;l;f;h;t;ou;ie;ij;o;eu"] <- "LK3"
inGameDat$LSSI[inGameDat$lvl=="LetKenDifficult"] <- "LK4"
inGameDat$LSSI <- as.factor(inGameDat$LSSI)
inGameDat$correctSelections <- as.integer(inGameDat$correctSelections) - 1
# We have isolated cases of double pre-test assessments. Here we filter those out and only take the first.
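# Note on the two reliability indices reported in this and the following chunks
# (formulas as implemented here):
# * Split-half reliability: the correlation r between the two half scores is stepped up
#   with the Spearman-Brown correction, r_SB = (2 * r) / (1 + r).
# * Cronbach's alpha (psych::alpha): alpha = (k / (k - 1)) * (1 - sum(var_i) / var_total),
#   with k parts, var_i the variance of part i, and var_total the variance of the summed
#   score (see also the commented-out manual computation a few lines below).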
assessments <- aggregate(correctSelections~idLevel+Subj+LSSI,subset(inGameDat,stream=="T1"), FUN=sum) firstAssessed <- aggregate(idLevel~Subj+LSSI, assessments,FUN=min) assessment <- subset(assessments, idLevel %in% firstAssessed$idLevel) # internal consistency with cronbach's alpha LSSI <- reshape(assessment, idvar = "Subj", timevar = "LSSI", direction = "wide")[,c(3,5,7,9)] LSSIn <- LSSI[complete.cases(LSSI),] #a = (4 / (4 - 1)) * (1 - (var(LSSIn$correctSelections.LK1) + var(LSSIn$correctSelections.LK2) + var(LSSIn$correctSelections.LK3)+ var(LSSIn$correctSelections.LK4)) / var (LSSIn$correctSelections.LK1 +LSSIn$correctSelections.LK2 + LSSIn$correctSelections.LK3 + LSSIn$correctSelections.LK4)) # compute alpha from formula score_e <- rowMeans(LSSIn[, c(TRUE, FALSE)],na.rm=T) # with even items score_o <- rowMeans(LSSIn[, c(FALSE, TRUE)],na.rm=T) # with odd items r <- cor(score_e, score_o) (2 * r) / (1 + r) # 0.72 split half reliability with Spearman Brown correction psych::alpha(LSSI[complete.cases(LSSI),]) # 0.68 Cronbach's alpha ``` *** ##### LSSI by item ```{r} # Letter speech sound identification. Single trial data itemanalysis <- subset(inGameDat, idLevel %in% firstAssessed$idLevel & stream=="T1") items <- aggregate(correctSelections~correctResponse+Subj,itemanalysis,FUN=sum) item <- reshape(items, idvar = "Subj", timevar = "correctResponse", direction = "wide")[,-1] score_e <- rowMeans(item[, c(TRUE, FALSE)],na.rm=T) # with even items score_o <- rowMeans(item[, c(FALSE, TRUE)],na.rm=T) # with odd items r <- cor(score_e, score_o) (2 * r) / (1 + r) # 0.87 split half reliability with Spearman Brown correction psych::alpha(item) # 0.9 Cronbach's alpha ``` *** ##### Written Lexical Decision aggregated by level ```{r} # We have to subset data and remove bos and kos WLDreliab <- droplevels(subset(inGameDat, lvl=="LexDec")) WLDreliab <- subset(WLDreliab, itemsOnScreen != "bos;correct;wrong" & itemsOnScreen !="kos;wrong;correct") # Check how many unique levelIDs per subject WLDlevels <- aggregate(idLevel~Subj,WLDreliab,FUN=function(x) NROW(unique(x))) # only one subject has 3 times this level #StM105 - solution: remove trial 3991226 WLDreliab <- subset(WLDreliab, idLevel != "3991226") # only 2 levels per person! 
# now split up in min and max level per person WLDlevel <- aggregate(idLevel~Subj,WLDreliab,FUN=min) WLDlevel$max <- aggregate(idLevel~Subj,WLDreliab,FUN=max)$idLevel WLDlevel$diff <- WLDlevel$max - WLDlevel$idLevel WLDinclude <- droplevels(subset(WLDlevel, diff>0)$Subj) WLDrelia <- droplevels(subset(WLDreliab, Subj %in% WLDinclude)) WLDassess <- aggregate(correctSelections~Subj+idLevel,WLDrelia,FUN=sum) WLDassess$lvl <- "2" WLDassess$lvl[WLDassess$idLevel %in% WLDlevel$idLevel] <- "1" WLDassess$lvl <- as.factor(WLDassess$lvl) WLDassess$idLevel <- NULL WLD <- reshape(WLDassess, idvar = "Subj", timevar = "lvl", direction = "wide")[,-1] r <- cor(WLD$correctSelections.1,WLD$correctSelections.2) ((2 * r) / (1 + r)) # 0.65 split-half reliability with Spearman Brown correction psych::alpha(WLD) # 0.65 Cronbach's alpha ``` *** ##### Written Lexical Decision by item ```{r} items <- aggregate(correctSelections~itemsOnScreen+Subj,WLDreliab,FUN=sum) item <- reshape(items, idvar = "Subj", timevar = "itemsOnScreen", direction = "wide")[,-1] score_e <- rowMeans(item[, c(TRUE, FALSE)],na.rm=T) # with even items score_o <- rowMeans(item[, c(FALSE, TRUE)],na.rm=T) # with odd items r <- cor(score_e, score_o) (2 * r) / (1 + r) # 0.65 split half reliability with Spearman Brown correction psych::alpha(item,check.keys=TRUE) # Cronbach's alpha # 0.68 ``` ## Analyses {.tabset} ### Descriptives Descriptives for the 247 children included in the analyses by Condition, Country, Gender and Familal Risk. #### Tables by Condition, Country, Gender and status of familial risk ```{r} TableDat <- droplevels(subset(BehavDat, Exclude == "no")) tab <- tableby(Cond ~ EMT + FamRisk + Country + T1AgeF + HandScore + Gender + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_TotalPc + T1PF_TotalPc + T1.LetKen + T1RANcT + T1RANoT, data = TableDat, numeric.stats=c("meansd","medianrange")) summary(tab, text=TRUE, digits=2, test=F) tab <- tableby(Country ~ EMT + FamRisk + Cond + T1AgeF + HandScore + Gender + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_TotalPc + T1PF_TotalPc + T1.LetKen + T1RANcT + T1RANoT, data = TableDat, numeric.stats=c("meansd","medianrange")) summary(tab, text=TRUE, digits=2, test=F) tab <- tableby(Gender ~ EMT + FamRisk + Cond + T1AgeF + HandScore + Country + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_TotalPc + T1PF_TotalPc + T1.LetKen + T1RANcT + T1RANoT, data = TableDat, numeric.stats=c("meansd","medianrange")) summary(tab, text=TRUE, digits=2, test=F) tab <- tableby(FamRisk ~ EMT + Country + Cond + T1AgeF + HandScore + Gender + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_TotalPc + T1PF_TotalPc + T1.LetKen + T1RANcT + T1RANoT, data = TableDat, numeric.stats=c("meansd","medianrange")) summary(tab, text=TRUE, digits=2, test=F) TableDat$CountrCond <- interaction(TableDat$Country, TableDat$Cond) tab <- tableby(CountrCond ~ EMT + FamRisk + T1AgeF + HandScore + Gender + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_TotalPc + T1PF_TotalPc + T1.LetKen + T1RANcT + T1RANoT, data = TableDat, numeric.stats=c("meansd","medianrange")) summary(tab, text=TRUE, digits=2, test=F) TableDat$CountrFamRisk <- interaction(TableDat$Country, TableDat$FamRisk) tab <- tableby(CountrFamRisk ~ EMT + Cond + T1AgeF + HandScore + Gender 
+ Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_TotalPc + T1PF_TotalPc + T1.LetKen + T1RANcT + T1RANoT, data = TableDat, numeric.stats=c("meansd","medianrange"))
summary(tab, text=TRUE, digits=2, test=F)
```

***

#### Normality at T1

None of the variables were normally distributed at T1.

```{r Combined descriptives}
datComb <- droplevels(subset(BehavDat, Exclude=="no"))
shapiro.test(datComb$T1AgeF)
shapiro.test(datComb$T1.LetKen)
shapiro.test(datComb$T1CF_TotalPc)
shapiro.test(datComb$T1PF_TotalPc)
shapiro.test(datComb$T1RANoT)
shapiro.test(datComb$T1RANcT)
shapiro.test(datComb$SON_1st)
```

***

#### Differences by country and condition

Chi-squared/Fisher's exact tests compare the counts of Gender, Multilingualism, Handedness and familial risk of dyslexia in the combined sample. No significant differences were observed, although there is a trend towards more multilinguals in the math condition.

##### Chi-square test Gender x Condition

```{r}
chisq.test(matrix(c(21,31,44,60,47,44),ncol=3,dimnames=list(c("female","male"), c("Passive","Math","Read"))))
```

##### Fisher's exact test Multilingualism x Condition

```{r}
fisher.test(matrix(c(6,46,15,89,4,87),ncol=3,dimnames=list(c("multilingual","monolingual"), c("Passive","Math","Read"))))
```

##### Chi-square test Handedness x Condition

```{r}
chisq.test(matrix(c(9,43,8,96,10,81),ncol=3,dimnames=list(c("lefthander","righthander"), c("Passive","Math","Read"))))
```

##### Chi-square test Familial Risk x Condition

```{r}
chisq.test(matrix(c(7,45,18,86,16,75), ncol=3, dimnames=list(c("familial risk", "no familial risk"), c("Passive","Math","Read"))))
```

***

##### ANOVA for variables on interval scales

ANOVAs testing for a Condition x Country interaction for the variables Age, LK, CELF PA, PROEF PA, RAN objects, RAN colours, and abstract reasoning. We found a main effect of country for LK, CELF PA, PROEF PA, RAN objects, and RAN colours. In addition, there were main effects of condition for LK and RAN colours. For LK, all three groups differed from each other: the math group knew the most letters, followed by the passive group, with the reading group knowing the fewest. Further, the math group was faster in RAN colours than both other groups. We further observed a Condition x Country interaction for RAN objects: the math group did not differ between the two countries, but the Dutch reading group outperformed their Belgian peers.
```{r}
boxplot(T1AgeF~Cond+Country,data=datComb, main="Age at T1", ylab="Age in years")
summary(aov(T1AgeF~Cond*Country,data=datComb))
boxplot(T1.LetKen~Cond+Country,data=datComb, main="LK at T1", ylab="Letter knowledge")
summary(aov(T1.LetKen~Cond*Country,data=datComb))
TukeyHSD(aov(T1.LetKen~Cond*Country,data=datComb))
boxplot(T1CF_TotalPc~Cond+Country,data=datComb, main="CELF PA at T1", ylab="CELF PA (percentile)")
summary(aov(T1CF_TotalPc~Cond*Country,data=datComb))
boxplot(T1PF_TotalPc~Cond+Country,data=datComb, main="PROEF PA at T1", ylab="PROEF PA (percentile)")
summary(aov(T1PF_TotalPc~Cond*Country,data=datComb))
boxplot(T1RANoT~Cond+Country,data=datComb, main="RAN Objects at T1", ylab="Time in seconds")
summary(aov(T1RANoT~Cond*Country,data=datComb))
TukeyHSD(aov(T1RANoT~Cond*Country,data=datComb))
boxplot(T1RANcT~Cond+Country,data=datComb, main="RAN Colours at T1", ylab="Time in seconds")
summary(aov(T1RANcT~Cond*Country,data=datComb))
TukeyHSD(aov(T1RANcT~Cond*Country,data=datComb))
boxplot(SON_1st~Cond+Country,data=datComb, main="Abstract reasoning at T1", ylab="Abstract reasoning (z-score)")
summary(aov(SON_1st~Cond*Country,data=datComb))
```

### ANALYSIS 1 - Belgian Sample {.tabset}

This is the analysis for the Belgian sample (N = `r nrow(datB)`) which features passive (non-playing) and active (math) control conditions.

#### Descriptives

Here, we show the results of chi-squared/Fisher's exact tests for Gender, Multilingualism, Handedness and Familial Risk in the Belgian sample. The sample is balanced in these respects.

##### Chi-square test Gender x Condition

```{r}
chisq.test(matrix(c(21,31,21,35,24,29),ncol=3,dimnames=list(c("female","male"), c("Passive","Math","Read"))))
```

##### Fisher's exact test Multilingualism x Condition

```{r}
fisher.test(matrix(c(6,46,3,53,2,51),ncol=3,dimnames=list(c("multilingual","monolingual"), c("Passive","Math","Read"))))
```

##### Fisher's exact test Handedness x Condition

```{r}
fisher.test(matrix(c(9,43,5,51,4,49),ncol=3,dimnames=list(c("lefthander","righthander"), c("Passive","Math","Read"))))
```

##### Chi-square test Familial Risk x Condition

```{r}
chisq.test(matrix(c(7,45,7,49,10,43),ncol=3,dimnames=list(c("familial risk","no familial risk"), c("Passive","Math","Read"))))
```

***

ANOVAs checking group differences at T1 revealed effects of condition for LK and RANcT. Subsequent Tukey HSD tests revealed that the reading group knew fewer letters than the passive group, and that the math group was faster at RAN colours than the reading group. In both cases the math and passive groups did not differ from one another.

```{r}
summary(aov(T1AgeF~Cond,data=datB))
summary(aov(T1.LetKen~Cond,data=datB))
TukeyHSD(aov(T1.LetKen~Cond,data=datB))
summary(aov(T1CF_TotalPc~Cond,data=datB))
summary(aov(T1PF_TotalPc~Cond,data=datB))
summary(aov(T1RANoT~Cond,data=datB))
summary(aov(T1RANcT~Cond,data=datB))
TukeyHSD(aov(T1RANcT~Cond,data=datB))
summary(aov(SON_1st~Cond,data=datB))
```

A summary of the Belgian data with number of observations, means, range and standard deviation per variable and condition.
```{r}
tab <- tableby(Gender ~ EMT + Cond + T1AgeF + HandScore + FamRisk + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_Total + T1PF_Total + T1.LetKen + T1RANcT + T1RANoT, data = datB, numeric.stats=c("meansd","medianrange"))
summary(tab, text=TRUE, digits=2, test=F)
tab <- tableby(FamRisk ~ EMT + Cond + T1AgeF + HandScore + Gender + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_Total + T1PF_Total + T1.LetKen + T1RANcT + T1RANoT, data = datB, numeric.stats=c("meansd","medianrange"))
summary(tab, text=TRUE, digits=2, test=F)
tab <- tableby(Cond ~ EMT + FamRisk + T1AgeF + HandScore + Gender + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_Total + T1PF_Total + T1.LetKen + T1RANcT + T1RANoT, data = datB, numeric.stats=c("meansd","medianrange"))
summary(tab, text=TRUE, digits=2, test=F)
```

#### Correlations

Here, we show the correlation matrices of the main outcome variables and covariates at T1 and T2: EMT (one minute reading), CF_Total (CELF PA raw scores), PF_Total (PROEF PA raw scores), RAN (rapid automatized naming) of objects (o) or colours (c), as well as LK (in-game letter knowledge) and WLD (written lexical decision). An asterisk indicates significance at p < .05.

```{r Belgium: Correlations}
corB <- subset(datB,select=c(EMT, T1CF_Total,T2CF_Total, T1PF_Total,T2PF_Total, T1RANcT, T2RANcT, T1RANoT, T2RANoT, T1.LetKen, T2.LetKen, T2.LexDec))
names(corB) <- c("T2EMT","T1CF","T2CF","T1PF","T2PF","T1RANc","T2RANc","T1RANo","T2RANo","T1LK","T2LK","T2WLD")
corBP <- subset(datB,Cond=="Passive",select=c(EMT, T1CF_Total,T2CF_Total, T1PF_Total,T2PF_Total, T1RANcT, T2RANcT, T1RANoT, T2RANoT, T1.LetKen, T2.LetKen, T2.LexDec))
names(corBP) <- c("T2EMT","T1CF","T2CF","T1PF","T2PF","T1RANc","T2RANc","T1RANo","T2RANo","T1LK","T2LK","T2WLD")
corBM <- subset(datB,Cond=="Math",select=c(EMT, T1CF_Total,T2CF_Total, T1PF_Total,T2PF_Total, T1RANcT, T2RANcT, T1RANoT, T2RANoT, T1.LetKen, T2.LetKen, T2.LexDec))
names(corBM) <- c("T2EMT","T1CF","T2CF","T1PF","T2PF","T1RANc","T2RANc","T1RANo","T2RANo","T1LK","T2LK","T2WLD")
corBR <- subset(datB,Cond=="Read",select=c(EMT, T1CF_Total,T2CF_Total, T1PF_Total,T2PF_Total, T1RANcT, T2RANcT, T1RANoT, T2RANoT, T1.LetKen, T2.LetKen, T2.LexDec))
names(corBR) <- c("T2EMT","T1CF","T2CF","T1PF","T2PF","T1RANc","T2RANc","T1RANo","T2RANo","T1LK","T2LK","T2WLD")
```

##### Entire Belgian sample

```{r echo=FALSE}
corstars(corB)
```

***

##### Belgian Passive group

```{r echo=FALSE}
corstars(corBP)
```

***

##### Belgian Math group

```{r echo=FALSE}
corstars(corBM)
```

***

##### Belgian Read group

```{r echo=FALSE}
corstars(corBR)
```

#### Reading Fluency

Single-word reading fluency at T2 as measured by two custom lists with a time limit of one minute each.
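The modelling chunks below (here and for the other offline outcomes) all follow the same sequence: fit the model, remove observations with absolute standardized residuals of 2 or more, refit on the trimmed data, and bootstrap 95% confidence intervals. As a reading aid only, this sequence is summarized in the following non-evaluated sketch; the helper function `fitTrimRefit()` is hypothetical and not part of the analysis code.

```{r fit-trim-refit-sketch, eval=FALSE}
# Hypothetical helper (illustration only), mirroring the steps used in the chunks below.
fitTrimRefit <- function(formula, data, trim = 2) {
  m0   <- lmer(formula, data = data)          # initial fit on all observations
  keep <- abs(scale(resid(m0))) < trim        # flag model-based outliers
  m1   <- lmer(formula, data = data[keep, ])  # refit on the trimmed data
  ci   <- confint(m1, method = "boot", nsim = simulations, level = 0.95,
                  parallel = bootparallel, ncpus = bootthreads)
  list(initial = m0, trimmed = m1, bootCI = ci)
}
# Example (hypothetical call for the model fitted in the next chunk):
# fitTrimRefit(EMT ~ log(T1RANcT) + T1CF_TotalZ + T1.LetKenZ + Cond + (1|School), datBemt)
```

Explained variance for these models is reported as marginal and conditional R^2 (variance explained by the fixed effects alone vs. by fixed and random effects combined), computed with `rsquared.glmm()` or `MuMIn::r.squaredGLMM()`.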
```{r Belgium: Reading Fluency at T2, echo=FALSE, message=TRUE, warning=TRUE} datBemt <- droplevels(subset(datB,EMT < 100)) emtB <-lmer(EMT ~ log(T1RANcT) + T1CF_TotalZ + T1.LetKenZ + Cond + (1|School), data=datBemt) datBemt.2 = datBemt[abs(scale(resid(emtB))) < 2, ] emtB.2 <-lmer(EMT ~ log(T1RANcT) + T1CF_TotalZ + T1.LetKenZ + Cond + (1|School), data=datBemt.2) datBemt.2$Cond <- relevel(datBemt.2$Cond,ref="Passive") emtB.3 <-lmer(EMT ~ log(T1RANcT) + T1CF_TotalZ + T1.LetKenZ + Cond + (1|School), data=datBemt.2) plotdat <- as.data.frame(effect("Cond",emtB.3)) plotdat$Cond <- relevel(plotdat$Cond,ref="Passive") ggplot(plotdat, aes(Cond, fit)) + geom_point() + geom_errorbar(aes(ymin=fit-1.96*se, ymax=fit+1.96*se), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "Words read per minute at T2 (z-score)") pdf("Ch3_B_RF_GG.pdf", width=6,height=4) ggplot(plotdat, aes(Cond, fit)) + geom_point() + geom_errorbar(aes(ymin=fit-1.96*se, ymax=fit+1.96*se), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "Words read per minute at T2 (z-score)") dev.off() summary(emtB.2) writeLines('Explained variance') rsquared.glmm(list(emtB.2)) writeLines('Sample size by condition') summary(emtB.2@frame$Cond) writeLines('Trimmed observations:') nrow(datBemt[abs(scale(resid(emtB))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(datBemt[abs(scale(resid(emtB))) < 2 , ]) / nrow(datBemt)) * 100 performance::check_model(emtB.2) writeLines('Model summary for comparison Passive vs Read') summary(emtB.3) (emtB.2bs <- confint(emtB.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) (emtB.3bs <- confint(emtB.3, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` #### PA {.tabset} Phonological awareness at T2 as measured by the CELF-IV-NL and the PROEF. 
##### CELF-IV-NL Phonological Awareness ```{r Belgium: CELF Phonlogical Awareness, echo=FALSE, message=TRUE, warning=TRUE} datBcf <- droplevels(subset(datB,T2CF_TotalZ < 100)) cfB <-lmer(T2CF_TotalZ ~ SON_1st + T1PF_TotalZ + T1CF_TotalZ + Cond + (1+T1PF_TotalZ|School), data=datBcf, control = lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000) )) datBcf.2 = datBcf[abs(scale(resid(cfB))) < 2, ] cfB.2 <-lmer(T2CF_TotalZ ~ SON_1st + T1PF_TotalZ + T1CF_TotalZ + Cond + (1+T1PF_TotalZ|School), data=datBcf.2, control = lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000) )) datBcf.2$Cond <- relevel(datBcf.2$Cond,ref="Passive") cfB.3 <-lmer(T2CF_TotalZ ~ SON_1st + T1PF_TotalZ + T1CF_TotalZ + Cond + (1+T1PF_TotalZ|School), data=datBcf.2, control = lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000) )) plotdat <- as.data.frame(effect("Cond",cfB.3)) plotdat$Cond <- relevel(plotdat$Cond,ref="Passive") ggplot(plotdat, aes(Cond, fit)) + geom_point() + geom_errorbar(aes(ymin=fit-1.96*se, ymax=fit+1.96*se), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "CELF PA at T2 (z-score)") pdf("Ch3_B_CELF_GG.pdf", width=6,height=4) ggplot(plotdat, aes(Cond, fit)) + geom_point() + geom_errorbar(aes(ymin=fit-1.96*se, ymax=fit+1.96*se), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "CELF PA at T2 (z-score)") dev.off() summary(cfB.2) writeLines('Explained variance') rsquared.glmm(list(cfB.2)) writeLines('Sample size by condition') summary(cfB.2@frame$Cond) writeLines('Trimmed observations:') nrow(datBcf[abs(scale(resid(cfB))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(datBcf[abs(scale(resid(cfB))) < 2 , ]) / nrow(datBcf)) * 100 performance::check_model(cfB.2) writeLines('Model summary for comparison Passive vs Read') summary(cfB.3) (cfB.2bs <- confint(cfB.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) (cfB.3bs <- confint(cfB.3, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` ##### PROEF Phonological Awareness ```{r Belgium: PROEF Phonological Awareness, echo=FALSE, message=TRUE, warning=TRUE} datBpf <- droplevels(subset(datB,T2PF_TotalZ < 100)) pfB <-lmer(T2PF_TotalZ ~ T1CF_TotalPc + Cond + FamRisk + T1PF_TotalZ + (1+T1PF_TotalZ|School), data=datBpf, control = lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000) )) datBpf.2 = datBpf[abs(scale(resid(pfB))) < 2, ] pfB.2 <-lmer(T2PF_TotalZ ~ T1CF_TotalPc + Cond + FamRisk + T1PF_TotalZ + (1+T1PF_TotalZ|School), data=datBpf.2, control = lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000) )) datBpf.2$Cond <- relevel(datBpf.2$Cond,ref="Passive") pfB.3 <-lmer(T2PF_TotalZ ~ T1CF_TotalPc + Cond + FamRisk + T1PF_TotalZ + (1+T1PF_TotalZ|School), data=datBpf.2, control = lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000) )) plotdat <- as.data.frame(effect("Cond",pfB.3)) plotdat$Cond <- relevel(plotdat$Cond,ref="Passive") ggplot(plotdat, aes(Cond, fit)) + geom_point() + geom_errorbar(aes(ymin=fit-1.96*se, ymax=fit+1.96*se), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "PROEF PA at T2 (z-score)") pdf("Ch3_B_PFB_GG.pdf", width=6,height=4) ggplot(plotdat, aes(Cond, fit)) + geom_point() + geom_errorbar(aes(ymin=fit-1.96*se, ymax=fit+1.96*se), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "PROEF PA at T2 (z-score)") dev.off() summary(pfB.2) writeLines('Explained variance') rsquared.glmm(list(pfB.2)) writeLines('Sample size by condition') summary(pfB.2@frame$Cond) 
writeLines('Trimmed observations:')
nrow(datBpf[abs(scale(resid(pfB))) > 2 , ])
writeLines('Trimmed in percent:')
(1-nrow(datBpf[abs(scale(resid(pfB))) < 2 , ]) / nrow(datBpf)) * 100
performance::check_model(pfB.2)
writeLines('Model summary for comparison Passive vs Read')
summary(pfB.3)
(pfB.2bs <- confint(pfB.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads))
(pfB.3bs <- confint(pfB.3, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads))
```

#### RAN {.tabset}

Rapid Automatized Naming of objects and colours at T2. Both of these models are problematic, as they show high multicollinearity of the Cond x PreTest interaction, indicating that there is little change from pre- to post-test that can be attributed to condition.

##### RAN colours

```{r Belgium: RAN colours, echo=FALSE, message=TRUE, warning=TRUE}
datBrC <- droplevels(subset(datB,T2RANcT < 100))
rcB <- lm(T2RANcT ~ Cond * T1RANcT, data=datBrC)
datBrC.2 = datBrC[abs(scale(resid(rcB))) < 2, ]
rcB.2 <- lm(T2RANcT ~ Cond * T1RANcT, data=datBrC.2)
plot(allEffects(rcB.2),multiline=T,main="RAN colours (Cond*T1)", ylab="RAN colours time at T2 (seconds)", xlab="RAN colours time at T1 (seconds)", ci.style="bars", lines=list(col=c("black"), lty=c(2,3)))
summary(rcB.2)
performance::check_model(rcB.2)
(rcB.2bs <- confint(rcB.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads))
```

##### RAN objects

```{r Belgium: RAN objects, echo=FALSE, message=TRUE, warning=TRUE}
datBrO <- droplevels(subset(datB,T2RANoT < 100))
roB <- lm(T2RANoT ~ Cond * T1RANcT, data=datBrO)
datBrO.2 = datBrO[abs(scale(resid(roB))) < 2, ]
roB.2 <- lm(T2RANoT ~ Cond * T1RANcT, data=datBrO.2) # refit of the RAN objects model on the trimmed data
plot(allEffects(roB.2),multiline=T,main="RAN objects (Cond*T1)", ylab="RAN objects time at T2 (seconds)", xlab="RAN objects time at T1 (seconds)", ci.style="bars", lines=list(col=c("black"), lty=c(2,3)))
summary(roB.2)
performance::check_model(roB.2)
(roB.2bs <- confint(roB.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads))
```

#### In-Game {.tabset}

Apart from the offline pencil-and-paper tests in 4.1.3 and 4.1.4, we also did in-game assessments in the form of letter-speech-sound identification (LSSI) and written lexical decision (WLD). Both of these feature separate analyses for accuracy and response times.
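The response-time chunks in this tab (and in the Dutch sample) analyse power-transformed response times: the exponent $\lambda$ is the value that maximizes the Box-Cox profile likelihood returned by `MASS::boxcox()` for an auxiliary fixed-effects-only regression, and fitted values are back-transformed to seconds for the effect plots,

$$RT^{*} = RT^{\hat{\lambda}}, \qquad \widehat{RT} = \bigl(\widehat{RT^{*}}\bigr)^{1/\hat{\lambda}}.$$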
##### LSSI Accuracy ```{r Belgium: LSSI Accuracy, cache=TRUE} accLKB <-glmer(correctSelections ~ lvl + trialRTs + T1CF_TotalZ + stream * Cond + (1+trialRTs|correctResponse) + (0+T1CF_TotalZ|correctResponse) + (1|Subj), family=binomial, data=LKaccB, glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=50000) )) LKaccB.2 = LKaccB[abs(scale(resid(accLKB))) < 2 , ] accLKB.2 <-glmer(correctSelections ~ lvl + trialRTs + T1CF_TotalZ + stream * Cond + (1+trialRTs|correctResponse) + (0+T1CF_TotalZ|correctResponse) + (1|Subj), family=binomial, data=LKaccB.2, glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=50000) )) LKaccB.2$Cond <- relevel(LKaccB.2$Cond,ref="Passive") # refit the model to get passive as baseline for plotting accLKB.3 <-glmer(correctSelections ~ lvl + trialRTs + T1CF_TotalZ + stream * Cond + (1+trialRTs|correctResponse) + (0+T1CF_TotalZ|correctResponse) + (1|Subj), family=binomial, data=LKaccB.2, glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=50000) )) plot(effect("stream*Cond",accLKB.3), ylab="probability of correct response", main="Letter Knowledge Accuracy", multiline=TRUE, ci.style="bars", xlab="Session",lines=list(col=c("black"),lty=c(1,2,3)),x.var="stream", lattice=list(key.args=list(title="Condition",x=.05, y=.95,cex=1.2,cex.title=1.2))) pdf("Ch3_B_LSSIacc.pdf", width=6,height=4) plot(effect("stream*Cond",accLKB.3), ylab="probability of correct response", main="", multiline=TRUE, ci.style="bars", xlab="Session",lines=list(col=c("black"),lty=c(1,2,3)),x.var="stream", lattice=list(key.args=list(title="Condition",x=.05, y=.95,cex=1.2,cex.title=1.2))) dev.off() summary(accLKB.2) writeLines('Explained variance') rsquared.glmm(list(accLKB.2)) writeLines('Sample size by condition') summary(aggregate(Cond~Subj,data=accLKB.2@frame,FUN=unique)$Cond) writeLines('Trimmed observations:') nrow(LKaccB[abs(scale(resid(accLKB))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(LKaccB[abs(scale(resid(accLKB))) < 2 , ]) / nrow(LKaccB)) * 100 (accLKB.2bs <- confint(accLKB.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) summary(accLKB.3) (accLKB.3bs <- confint(accLKB.3, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` ##### LSSI Response Time ```{r Belgium: LSSI Response Time, cache=TRUE} bc <- MASS::boxcox(lm(trialRTs ~ lvl + log(T1RANcT) + T1.LetKenZ + TrialNum + prevRT + stream * Cond, data=LKrtB),plotit=FALSE) lambda <- bc$x[which.max(bc$y)] LKrtB$RTinv = LKrtB$trialRTs^lambda LKrtB$prevRTinv = LKrtB$prevRT^lambda rtLKB = lmer(RTinv ~ lvl + log(T1RANcT) + T1.LetKenZ + TrialNum + prevRTinv + stream * Cond + (1|itemsOnScreen) + (0+stream|Subj) + (1|correctResponse) + (1|Class), data=LKrtB) LKrtB.2 = LKrtB[abs(scale(resid(rtLKB))) < 2 , ] rtLKB.2 = lmer(RTinv ~ lvl + log(T1RANcT) + T1.LetKenZ + TrialNum + prevRTinv + stream * Cond + (1|itemsOnScreen) + (0+stream|Subj) + (1|correctResponse) + (1|Class), data=LKrtB.2 ) LKrtB.2$Cond <- relevel(LKrtB.2$Cond,ref="Passive") # refit the model to get passive as baseline for plotting rtLKB.3 = lmer(RTinv ~ lvl + log(T1RANcT) + T1.LetKenZ + TrialNum + prevRTinv + stream * Cond + (1|itemsOnScreen) + (0+stream|Subj) + (1|correctResponse) + (1|Class), data=LKrtB.2) LKrtef = effect("stream*Cond",rtLKB.3) LKrtef$fit = LKrtef$fit^(1/lambda) LKrtef$upper = LKrtef$upper^(1/lambda) LKrtef$lower = LKrtef$lower^(1/lambda) plot(LKrtef, ylab="Median response time (seconds)", main="Letter Knowledge Response Times", multiline=TRUE, 
confint=list(style="bars"), xlab="Session", lines=list(col=c("black"), lty=c(1,2,3)),x.var="stream", lattice=list(key.args=list(title="Condition",x=.62, y=.95,cex=1.2,cex.title=1.2))) summary(rtLKB.2) writeLines('Explained variance') r.squaredGLMM(rtLKB.2) writeLines('Sample size by condition') summary(aggregate(Cond~Subj,data=rtLKB.2@frame,FUN=unique)$Cond) writeLines('Trimmed observations:') nrow(LKrtB[abs(scale(resid(rtLKB))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(LKrtB[abs(scale(resid(rtLKB))) < 2 , ]) / nrow(LKrtB))*100 performance::check_model(rtLKB.2) acf(resid(rtLKB.2),main="autocorrelation of observations") (LKrtB.2bs <- confint(rtLKB.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) summary(rtLKB.3) (LKrtB.3bs <- confint(rtLKB.3, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` ##### WLD Accuracy ```{r Belgium: Written Lexical Decision Accuracy, cache=TRUE} accLDB <- glmer(correctSelections ~ Cond + targettype + T1CF_TotalZ + (1|target) + (1|Subj), family=binomial, data=LDaccB, glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=50000) )) LDaccB.2 = LDaccB[abs(scale(resid(accLDB))) < 2 , ] accLDB.2 <- glmer(correctSelections ~ Cond + targettype + T1CF_TotalZ + (1|target) + (1|Subj), family=binomial, data=LDaccB.2, glmerControl(optimizer = "bobyqa", optCtrl=list(maxfun=50000) )) LDaccB.2$Cond <- relevel(LDaccB.2$Cond,ref="Passive") # refit the model to get passive as baseline for plotting accLDB.3 <- glmer(correctSelections ~ Cond + targettype + T1CF_TotalZ + (1|target) + (1|Subj), family=binomial, data=LDaccB.2, glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000) )) plot(effect("Cond",accLDB.3), ylab="probability of correct response", main="Written Lexical Decision Accuracy",multiline=TRUE, ci.style="bars", xlab="",lines=list(col=c("black"),lty=0)) summary(accLDB.2) writeLines('Explained variance') rsquared.glmm(list(accLDB.2)) writeLines('Sample size by condition') summary(aggregate(Cond~Subj,data=accLDB.2@frame,FUN=unique)$Cond) writeLines('Trimmed observations:') nrow(LDaccB[abs(scale(resid(accLDB))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(LDaccB[abs(scale(resid(accLDB))) < 2 , ]) / nrow(LDaccB)) * 100 (accLDB.2bs <- confint(accLDB.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` ##### WLD Response Time ```{r Belgium: Written Lexical Decision Response Time, cache=TRUE} bc <- MASS::boxcox(lm(trialRTs ~ T1.LetKenZ + T1CF_TotalZ + targettype + T1AgeF + prevRT + Cond, data=LDrtB),plotit=FALSE) lambda <- bc$x[which.max(bc$y)] LDrtB$RTinv = LDrtB$trialRTs^lambda LDrtB$prevRTinv = LDrtB$prevRT^lambda rtLDB = lmer(RTinv ~ T1.LetKenZ + T1CF_TotalZ + targettype + T1AgeF + prevRTinv + Cond + (1|target) + (1+prevRTinv|Subj),data=LDrtB) LDrtB.2 = LDrtB[abs(scale(resid(rtLDB))) < 2 , ] rtLDB.2 = lmer(RTinv ~ T1.LetKenZ + T1CF_TotalZ + targettype + T1AgeF + prevRTinv + Cond + (1|target) + (1+prevRTinv|Subj),data=LDrtB.2) LDrtB.2$Cond <- relevel(LDrtB.2$Cond,ref="Passive") # refit the model to get passive as baseline for plotting rtLDB.3 = lmer(RTinv ~ T1.LetKenZ + T1CF_TotalZ + targettype + T1AgeF + prevRTinv + Cond + (1|target) + (1+prevRTinv|Subj),data=LDrtB.2) LKrtef = effect("Cond",rtLDB.3) LKrtef$fit = LKrtef$fit^(1/lambda) LKrtef$upper = LKrtef$upper^(1/lambda) LKrtef$lower = LKrtef$lower^(1/lambda) plot(LKrtef, ylab="Median response time (seconds)", main="Written Lexical Decision Response Times", 
multiline=TRUE, ci.style="bars", xlab="", lines=list(col=c("black"), lty=0))
# compute effect size for age
summary(rtLDB.2)
writeLines('Explained variance')
rsquared.glmm(list(rtLDB.2))
writeLines('Sample size by condition')
summary(aggregate(Cond~Subj,data=rtLDB.2@frame,FUN=unique))
writeLines('Trimmed observations:')
nrow(LDrtB[abs(scale(resid(rtLDB))) > 2 , ])
writeLines('Trimmed in percent:')
(1-nrow(LDrtB[abs(scale(resid(rtLDB))) < 2 , ]) / nrow(LDrtB)) * 100
performance::check_model(rtLDB)
acf(resid(rtLDB.2),main="autocorrelation of observations")
(rtLDB.2bs <- confint(rtLDB.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads))
```

### ANALYSIS 2 - Dutch Sample {.tabset}

This is the analysis for the Dutch sample (N = `r nrow(datNL)`) which only features an active (math) control condition.

#### Descriptives

Here, we show the results of chi-squared/Fisher's exact tests for Gender, Language, Handedness and Familial Risk in the Dutch sample. There are more multilingual children in the Dutch math cohort.

##### Chi-square test Gender x Condition

```{r}
chisq.test(matrix(c(23,25,23,15),ncol=2,dimnames=list(c("female","male"), c("Math","Read"))))
```

##### Fisher's exact test Multilingualism x Condition

```{r}
fisher.test(matrix(c(12,36,2,36),ncol=2,dimnames=list(c("multilingual","monolingual"), c("Math","Read"))))
```

##### Fisher's exact test Handedness x Condition

```{r}
fisher.test(matrix(c(3,45,6,32),ncol=2,dimnames=list(c("lefthander","righthander"), c("Math","Read"))))
```

##### Chi-square test Familial Risk x Condition

```{r}
chisq.test(matrix(c(11,37,6,32),ncol=2,dimnames=list(c("familial risk","no familial risk"), c("Math","Read"))))
```

***

ANOVAs comparing the remaining T1 measures revealed that the reading group knew marginally fewer letters at T1.

```{r}
summary(aov(T1AgeF~Cond,data=datNL))
summary(aov(T1.LetKen~Cond,data=datNL))
TukeyHSD(aov(T1.LetKen~Cond,data=datNL))
summary(aov(T1CF_TotalPc~Cond,data=datNL))
summary(aov(T1PF_TotalPc~Cond,data=datNL))
summary(aov(T1RANoT~Cond,data=datNL))
summary(aov(T1RANcT~Cond,data=datNL))
summary(aov(SON_1st~Cond,data=datNL))
```

***

Summaries of the Dutch data with number of observations, means, range and standard deviation per variable and gaming condition.
```{r}
tab <- tableby(Gender ~ EMT + Cond + T1AgeF + HandScore + FamRisk + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_Total + T1PF_Total + T1.LetKen + T1RANcT + T1RANoT, data = datNL, numeric.stats=c("meansd","medianrange"))
summary(tab, text=TRUE, digits=2, test=F)
tab <- tableby(FamRisk ~ EMT + Cond + T1AgeF + HandScore + Gender + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_Total + T1PF_Total + T1.LetKen + T1RANcT + T1RANoT, data = datNL, numeric.stats=c("meansd","medianrange"))
summary(tab, text=TRUE, digits=2, test=F)
tab <- tableby(Cond ~ EMT + FamRisk + T1AgeF + HandScore + Gender + Lang + SON_1st + hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven + T1CF_Total + T1PF_Total + T1.LetKen + T1RANcT + T1RANoT, data = datNL, numeric.stats=c("meansd","medianrange"))
summary(tab, text=TRUE, digits=2, test=F)
```

#### Correlations

Here, we show the correlation matrices of the main outcome variables and covariates at T1 and T2: EMT (one minute reading), CF_Total (CELF PA raw scores), PF_Total (PROEF PA raw scores), RAN (rapid automatized naming) of objects (o) or colours (c), as well as LK (in-game letter knowledge) and WLD (written lexical decision). An asterisk indicates significance at p < .05.

```{r Netherlands: Correlations}
corNL <- subset(datNL,select=c(EMT, T1CF_Total,T2CF_Total, T1PF_Total,T2PF_Total, T1RANcT, T2RANcT, T1RANoT, T2RANoT, T1.LetKen, T2.LetKen, T2.LexDec))
names(corNL) <- c("T2EMT","T1CF","T2CF","T1PF","T2PF","T1RANc","T2RANc","T1RANo","T2RANo","T1LK","T2LK","T2WLD")
corNLM <- subset(datNL,Cond=="Math",select=c(EMT, T1CF_Total,T2CF_Total, T1PF_Total,T2PF_Total, T1RANcT, T2RANcT, T1RANoT, T2RANoT, T1.LetKen, T2.LetKen, T2.LexDec))
names(corNLM) <- c("T2EMT","T1CF","T2CF","T1PF","T2PF","T1RANc","T2RANc","T1RANo","T2RANo","T1LK","T2LK","T2WLD")
corNLR <- subset(datNL,Cond=="Read",select=c(EMT, T1CF_Total,T2CF_Total, T1PF_Total,T2PF_Total, T1RANcT, T2RANcT, T1RANoT, T2RANoT, T1.LetKen, T2.LetKen, T2.LexDec))
names(corNLR) <- c("T2EMT","T1CF","T2CF","T1PF","T2PF","T1RANc","T2RANc","T1RANo","T2RANo","T1LK","T2LK","T2WLD")
```

##### Entire Dutch Sample

```{r echo=FALSE}
corstars(corNL)
```

***

##### Dutch Math Group

```{r echo=FALSE}
corstars(corNLM)
```

***

##### Dutch Read Group

```{r echo=FALSE}
corstars(corNLR)
```

#### Reading Fluency

Single-word reading fluency at T2 as measured by two custom lists with a time limit of one minute each.
```{r Netherlands: Reading Fluency at T2, echo=FALSE, message=TRUE, warning=TRUE} datNLemt <- droplevels(subset(datNL,EMT < 100)) emtNL <-lmer(EMT ~ HandScore + T1PF_TotalZ + T1.LetKenZ + Cond + (1|Class), data=datNLemt) datNLemt.2 = datNLemt[abs(scale(resid(emtNL))) < 2 , ] emtNL.2 <-lmer(EMT ~ HandScore + T1PF_TotalZ + T1.LetKenZ + Cond + (1|Class), data=datNLemt.2) plotdat <- effect("Cond",emtNL.2) ggplot(as.data.frame(plotdat), aes(Cond, fit)) + geom_point() + geom_errorbar(aes(ymin=fit-1.96*se, ymax=fit+1.96*se), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "Words read per minute (z-score)") summary(emtNL.2) writeLines('Explained variance') rsquared.glmm(list(emtNL.2)) writeLines('Sample size by condition') summary(emtNL.2@frame$Cond) writeLines('Trimmed observations:') nrow(datNLemt[abs(scale(resid(emtNL))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(datNLemt[abs(scale(resid(emtNL))) < 2 , ]) / nrow(datNLemt)) * 100 performance::check_model(emtNL.2) (emtNL.2bs <- confint(emtNL.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` #### PA {.tabset} Phonological awareness at T2 as measured by the CELF-IV-NL and the PROEF. ##### CELF-IV-NL Phonological Awareness ```{r Netherlands: CELF Phonlogical Awareness, echo=FALSE, warning=TRUE} datNLcf <- droplevels(subset(datNL,T2CF_TotalZ < 100)) cfNL <-lm(T2CF_TotalZ ~ T1.LetKenZ + T1CF_TotalZ + Cond, data=datNLcf) datNLcf.2 = datNLcf[abs(scale(resid(cfNL))) < 2, ] cfNL.2 <-lm(T2CF_TotalZ ~ T1.LetKenZ + T1CF_TotalZ + Cond, data=datNLcf.2) plotdat <- effect("Cond",cfNL.2) ggplot(as.data.frame(plotdat), aes(Cond, fit)) + geom_point() + geom_errorbar(aes(ymin=fit-1.96*se, ymax=fit+1.96*se), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "CELF PA at T2 (z-score)") summary(cfNL.2) writeLines('Sample size by condition') summary(cfNL.2$model$Cond) writeLines('Trimmed observations:') nrow(datNLcf[abs(scale(resid(cfNL))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(datNLcf[abs(scale(resid(cfNL))) < 2 , ]) / nrow(datNLcf)) * 100 performance::check_model(cfNL) (cfNL.2bs <- confint(cfNL.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` ##### PROEF Phonological Awareness ```{r Netherlands: PROEF Phonological Awareness, echo=FALSE, message=TRUE, warning=TRUE} datNLpf <- droplevels(subset(datNL,T2PF_TotalZ < 100)) pfNL <-lmer(T2PF_TotalZ ~ T1PF_TotalZ + T1CF_TotalZ + Cond + (1|Class), data=datNLpf) datNLpf.2 = datNLpf[abs(scale(resid(pfNL))) < 2, ] pfNL.2 <-lmer(T2PF_TotalZ ~ T1PF_TotalZ + T1CF_TotalZ + Cond + (1|Class), data=datNLpf.2) plotdat <- effect("Cond",pfNL.2) ggplot(as.data.frame(plotdat), aes(Cond, fit)) + geom_point() + geom_errorbar(aes(ymin=fit-1.96*se, ymax=fit+1.96*se), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "PROEF PA at T2 (z-score)") summary(pfNL.2) writeLines('Explained variance') rsquared.glmm(list(pfNL.2)) writeLines('Sample size by condition') summary(pfNL.2@frame$Cond) writeLines('Trimmed observations:') nrow(datNLpf[abs(scale(resid(pfNL))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(datNLpf[abs(scale(resid(pfNL))) < 2 , ]) / nrow(datNLpf)) * 100 performance::check_model(pfNL.2) (pfNL.2bs <- confint(pfNL.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` #### RAN {.tabset} Rapid Automatized Naming of objects and colours at T2. 
Both of these models are problematic, as they show high multicollinearity of the Cond x PreTest interaction, indicating that there is little change from pre- to post-test that can be attributed to condition.

##### RAN Colours

```{r Netherlands: RAN Colours, echo=FALSE, message=TRUE, warning=TRUE}
datNLrC <- droplevels(subset(datNL,T2RANcT < 100))
rcNL <- lm(T2RANcT ~ Cond * T1RANcT, data=datNLrC)
datNLrC.2 = datNLrC[abs(scale(resid(rcNL))) < 2, ]
rcNL.2 <- lm(T2RANcT ~ Cond * T1RANcT, data=datNLrC.2)
plot(allEffects(rcNL.2),multiline=T,main="RAN colours (Cond*T1)", ylab="RAN colours time at T2 (seconds)", xlab="RAN colours time at T1 (seconds)", ci.style="bars", lines=list(col=c("black"), lty=c(2,3)))
summary(rcNL.2)
performance::check_model(rcNL.2)
(rcNL.2bs <- confint(rcNL.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads))
```

##### RAN Objects

```{r Netherlands: RAN Objects, echo=FALSE, message=TRUE, warning=TRUE}
datNLrO <- droplevels(subset(datNL,T2RANoT < 100))
roNL <- lm(T2RANoT ~ Cond * T1RANcT, data=datNLrO)
datNLrO.2 = datNLrO[abs(scale(resid(roNL))) < 2, ]
roNL.2 <- lm(T2RANoT ~ Cond * T1RANcT, data=datNLrO.2) # refit of the RAN objects model on the trimmed data
plot(allEffects(roNL.2),multiline=T,main="RAN objects (Cond*T1)", ylab="RAN objects time at T2 (seconds)", xlab="RAN objects time at T1 (seconds)", ci.style="bars", lines=list(col=c("black"), lty=c(2,3)))
summary(roNL.2)
performance::check_model(roNL.2)
(roNL.2bs <- confint(roNL.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads))
```

#### In-Game {.tabset}

Apart from the offline pencil-and-paper tests in 4.2.3 and 4.2.4, we also did in-game assessments in the form of letter-speech-sound identification (LSSI) and written lexical decision (WLD). Both of these feature separate analyses for accuracy and response times.
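The accuracy models below (as in the Belgian sample) are binomial mixed models with a logit link, so the effect plots display fitted probabilities back-transformed from the log-odds scale,

$$\Pr(\text{correct}) = \frac{1}{1 + e^{-\eta}},$$

where $\eta$ is the model's linear predictor (in R, `plogis(eta)`).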
##### LSSI Accuracy ```{r Netherlands: LSSI Accuracy, cache=TRUE} accLKNL <-glmer(correctSelections ~ T1CF_TotalZ + Cond * stream + Gender + lvl + trialRTs + (1+trialRTs|correctResponse) + (1+trialRTs|Subj), family=binomial, data=LKaccNL, glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=50000) )) LKaccNL.2 = LKaccNL[abs(scale(resid(accLKNL))) < 2 , ] accLKNL.2 <-glmer(correctSelections ~ T1CF_TotalZ + Cond * stream + Gender + lvl + trialRTs + (1+trialRTs|correctResponse) + (1+trialRTs|Subj), family=binomial, data=LKaccNL.2, glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=50000) )) plot(effect("Cond*stream",accLKNL.2), ylab="probability of correct response", main="Letter Knowledge Accuracy", multiline=TRUE, ci.style="bars", xlab="", lines=list(col=c("black"), lty=c(2,3)),x.var="stream", lattice=list(key.args=list(title="Condition",x=.05, y=.95,cex=1.2,cex.title=1.2))) summary(accLKNL.2) writeLines('Explained variance') rsquared.glmm(list(accLKNL.2)) writeLines('Sample size by condition') summary(aggregate(Cond~Subj,data=accLKNL.2@frame,FUN=unique)$Cond) writeLines('Trimmed observations:') nrow(LKaccNL[abs(scale(resid(accLKNL))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(LKaccNL[abs(scale(resid(accLKNL))) < 2 , ]) / nrow(LKaccNL)) * 100 (accLKNL.2bs <- confint(accLKNL.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` ##### LSSI Response Time ```{r Netherlands: LSSI Response Time, cache=TRUE} bc <- MASS::boxcox(lm(trialRTs ~ T1PF_TotalZ + T1AgeF + prevRT + TrialNum + stream * Cond, data=LKrtNL),plotit=FALSE) lambda <- bc$x[which.max(bc$y)] LKrtNL$RTinv = LKrtNL$trialRTs^lambda LKrtNL$prevRTinv = LKrtNL$prevRT^lambda rtLKNL = lmer(RTinv ~ T1PF_TotalZ + T1AgeF + prevRTinv + TrialNum + stream * Cond + (1+T1.LetKenZ|itemsOnScreen) + (0+stream|Subj) + (0+stream|correctResponse) + (1|Class), data=LKrtNL, control=lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun=50000) )) LKrtNL.2 = LKrtNL[abs(scale(resid(rtLKNL))) < 2 , ] rtLKNL.2 = lmer(RTinv ~ T1PF_TotalZ + T1AgeF + prevRTinv + TrialNum + stream * Cond + (1+T1.LetKenZ|itemsOnScreen) + (0+stream|Subj) + (0+stream|correctResponse) + (1|Class), data=LKrtNL.2, control=lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun=50000) )) LKrtef = effect("stream*Cond",rtLKNL.2) LKrtef$fit = LKrtef$fit^(1/lambda) LKrtef$upper = LKrtef$upper^(1/lambda) LKrtef$lower = LKrtef$lower^(1/lambda) plot(LKrtef, ylab="Median response time (seconds)", main="Letter Knowledge Response Times", multiline=TRUE, ci.style="bars", xlab="Session", lines=list(col=c("black"), lty=c(2,3)),x.var="stream", lattice=list(key.args=list(title="Condition",x=.62, y=.95,cex=1.2,cex.title=1.2))) pdf("Ch3_NL_LSSIrt.pdf", width=6,height=4) plot(LKrtef, ylab="Median response time (seconds)", main="", multiline=TRUE, ci.style="bars", xlab="Session", lines=list(col=c("black"), lty=c(2,3)),x.var="stream", lattice=list(key.args=list(title="Condition",x=.62, y=.95,cex=1.2,cex.title=1.2))) dev.off() pdf("Ch3_NL_LSSIrtGG.pdf", width=6,height=4) ggplot(as.data.frame(LKrtef), aes(Cond, fit, color=stream)) + geom_point() + geom_errorbar(aes(ymin=lower, ymax=upper), width=0.3) + theme_bw(base_size=12) + labs(x = "Condition", y = "Median response time (seconds)",colour="Session") + scale_colour_grey() + theme(panel.grid.major = element_blank(), axis.line = element_line(colour = "black")) dev.off() # compute effect size for age summary(rtLKNL.2) #writeLines('Explained variance') #rsquared.glmm(list(rtLKB.2)) writeLines('Sample 
size by condition') summary(aggregate(Cond~Subj,data=rtLKNL.2@frame,FUN=unique)$Cond) writeLines('Trimmed observations:') nrow(LKrtNL[abs(scale(resid(rtLKNL))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(LKrtNL[abs(scale(resid(rtLKNL))) < 2 , ]) / nrow(LKrtNL)) * 100 performance::check_model(rtLKNL.2) acf(resid(rtLKNL.2),main="autocorrelation of observations") (rtLKNL.2bs <- confint(rtLKNL.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` ##### WLD Accuracy ```{r Netherlands: Written Lexical Decision Accuracy, cache=TRUE} accLDNL <-glmer(correctSelections ~ Cond + (1|target) + (1|Subj), family=binomial, data=LDaccNL, glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 50000) )) LDaccNL.2 = LDaccNL[abs(scale(resid(accLDNL))) < 2 , ] accLDNL.2 <-glmer(correctSelections ~ Cond + (1|target) + (1|Subj), family=binomial, data=LDaccNL.2, glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun=50000) )) plot(effect("Cond",accLDNL.2), ylab="probability of correct response", main="Lexical Decision Accuracy", multiline=TRUE, ci.style="bars", xlab="", lines=list(col=c("black"), lty=c(1,2))) summary(accLDNL.2) writeLines('Explained variance') rsquared.glmm(list(accLDNL.2)) writeLines('Sample size by condition') summary(aggregate(Cond~Subj,data=accLDNL.2@frame,FUN=unique)$Cond) writeLines('Trimmed observations:') nrow(LDaccNL[abs(scale(resid(accLDNL))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(LDaccNL[abs(scale(resid(accLDNL))) < 2 , ]) / nrow(LDaccNL)) * 100 ``` ##### WLD Response Time ```{r Netherlands: Written Lexical Decision Response Time, cache=TRUE} bc <- MASS::boxcox(lm(trialRTs ~ targettype + T1.LetKenZ + prevRT + T1PF_TotalZ + Cond, data=LDrtNL),plotit=FALSE) lambda <- bc$x[which.max(bc$y)] if (lambda == 0) lambda <- 0.1 LDrtNL$RTinv = LDrtNL$trialRTs^lambda LDrtNL$prevRTinv = LDrtNL$prevRT^lambda rtLDNL = lmer(RTinv ~ targettype + T1.LetKenZ + prevRTinv + T1PF_TotalZ + Cond + (1+prevRTinv|target) + (1|Subj),data=LDrtNL) LDrtNL.2 = LDrtNL[abs(scale(resid(rtLDNL))) < 2 , ] rtLDNL.2 = lmer(RTinv ~ targettype + T1.LetKenZ + prevRTinv + T1PF_TotalZ + Cond + (1+prevRTinv|target) + (1|Subj),data=LDrtNL.2) LKrtef = effect("Cond",rtLDNL.2) LKrtef$fit = LKrtef$fit^(1/lambda) LKrtef$upper = LKrtef$upper^(1/lambda) LKrtef$lower = LKrtef$lower^(1/lambda) plot(LKrtef, ylab="Median response time (seconds)", main="Lexical Decision Response Times", multiline=TRUE, ci.style="bars", xlab="", lines=list(col=c("black"), lty=c(1,2))) summary(rtLDNL.2) writeLines('Explained variance') rsquared.glmm(list(rtLDNL.2)) writeLines('Sample size by condition') summary(aggregate(Cond~Subj,data=rtLDNL.2@frame,FUN=unique)$Cond) writeLines('Trimmed observations:') nrow(LDrtNL[abs(scale(resid(rtLDNL))) > 2 , ]) writeLines('Trimmed in percent:') (1-nrow(LDrtNL[abs(scale(resid(rtLDNL))) < 2 , ]) / nrow(LDrtNL)) * 100 performance::check_model(rtLDNL) acf(resid(rtLDNL.2),main="autocorrelation of observations") (rtLDNL.2bs <- confint(rtLDNL.2, method = "boot", nsim = simulations, level = 0.95, parallel = bootparallel, ncpus = bootthreads)) ``` ### ANALYSIS 3 - Game Exposure {.tabset} Here, we also include subjects which were excluded for other analysis (e.g. not playing enough, or repeating first grade). The variables **hoursPlayed, levelsPlayed, sessionsPlayed, MaxLevel, itemsSeen, responsesGiven** are combined by principal component analyses. 
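How the exposure score itself was built is not repeated in this document. As an illustration only, a first principal component of the six exposure variables could be extracted per condition roughly as sketched below; treating the standardized first-component scores as `ExposurePCA` is an assumption about how the variable in `datExposure` was derived, not a quotation of the original code.

```{r exposure-pca-sketch, eval=FALSE}
# Illustration only (see lead-in above): first principal component of the six exposure
# variables per condition, using the same columns (12:17) as the loadings shown below.
pcaRead <- prcomp(readExposure[, 12:17], scale. = TRUE)
pcaMath <- prcomp(mathExposure[, 12:17], scale. = TRUE)
readExposure$ExposurePCA <- as.numeric(scale(pcaRead$x[, 1]))  # standardized PC1 scores
mathExposure$ExposurePCA <- as.numeric(scale(pcaMath$x[, 1]))
```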
#### Descriptives

Here, we show a summary of the combined data with the number of observations, mean, standard deviation, median, and range per variable and group.

```{r}
tab <- tableby(Cond ~ EMT + FamRisk + Country + T1AgeF + HandScore + Gender + Lang + SON_1st +
                 hoursPlayed + levelsPlayed + sessionsPlayed + MaxLevel + itemsSeen + responsesGiven +
                 T1CF_TotalPc + T1PF_TotalPc + T1.LetKen + T1RANcT + T1RANoT,
               data = datExposure, numeric.stats = c("meansd", "medianrange"))
summary(tab, text = TRUE, digits = 2)
```

#### PCA

Principal component analysis component loadings, plus histograms of the relevant variables to show data coverage.

```{r}
writeLines('PCA component loadings of reading condition')
prcomp(readExposure[, 12:17], scale = TRUE)
writeLines('PCA component loadings of math condition')
prcomp(mathExposure[, 12:17], scale = TRUE)

hist(datExposure$ExposurePCA, main = "Exposure distribution", xlab = "Exposure principal component")
hist(datExposure$T1CF_TotalZ, main = "CELF PA distribution at T1", xlab = "CELF PA (z-score)")
hist(datExposure$EMT, main = "Reading fluency distribution at T2", xlab = "One minute reading (z-score)")
```

#### READ vs. MATH

```{r game exposure read vs. math, fig.height = 6, fig.width = 10}
RMexp <- gam(EMT ~ s(T1RANcT) + s(T1CF_TotalZ) + ti(T1CF_TotalZ, ExposurePCA, by = Cond) + Country,
             data = datExposure)

# refit after removing observations with absolute standardized residuals > 2
datExposure.2 <- datExposure[abs(scale(resid(RMexp))) < 2, ]
RMexp.2 <- gam(EMT ~ s(T1RANcT) + s(T1CF_TotalZ) + ti(T1CF_TotalZ, ExposurePCA, by = Cond) + Country,
               data = datExposure.2)

par(mfcol = c(1, 3))
par(mar = c(4, 4, 2, 3))

# panel A: fitted surface for the math condition
fvisgam(RMexp.2, view = c("T1CF_TotalZ", "ExposurePCA"), ylim = c(-2.1, 2.1), zlim = c(-2.1, 2.1),
        xlim = c(-2, 2), n.grid = 250, cond = list(Cond = "Math"), xlab = "PreTest PA (z-score)",
        ylab = "Exposure PCA (z-score)", show.diff = T, col.diff = "white", alpha.diff = 0.7,
        main = "A) Math", print.summary = F, add.color.legend = F, nCol = 200,
        color = c("blue", "green", "red"), col = 1)
fadeRug(subset(datExposure.2, Cond == "Math")$T1CF_TotalZ, subset(datExposure.2, Cond == "Math")$ExposurePCA,
        alpha = 0.5, n.grid = 1000, use.data.range = F, col = "white")
box()
gradientLegend(valRange = c(-2, 2), nCol = 200, pos = 0.87, side = 4,
               color = c("blue", "green", "red"), col = 1)

# panel B: fitted surface for the reading condition
fvisgam(RMexp.2, view = c("T1CF_TotalZ", "ExposurePCA"), ylim = c(-2.1, 2.1), zlim = c(-2, 2),
        xlim = c(-2, 2), n.grid = 250, cond = list(Cond = "Read"), xlab = "PreTest PA (z-score)",
        ylab = "", show.diff = T, col.diff = "white", alpha.diff = 0.7, main = "B) Read",
        print.summary = F, add.color.legend = F, nCol = 200, color = c("blue", "green", "red"), col = 1)
fadeRug(subset(datExposure.2, Cond == "Read")$T1CF_TotalZ, subset(datExposure.2, Cond == "Read")$ExposurePCA,
        alpha = 0.5, n.grid = 1000, use.data.range = F)
box()
gradientLegend(valRange = c(-2, 2), nCol = 200, pos = 0.87, side = 4,
               color = c("blue", "green", "red"), col = 1)

# panel C: difference surface (Read minus Math)
plot_diff2(RMexp.2, view = c("T1CF_TotalZ", "ExposurePCA"), ylim = c(-2.1, 2.1), zlim = c(-1, 1),
           xlim = c(-2, 2), n.grid = 250, comp = list(Cond = c("Read", "Math")),
           xlab = "PreTest PA (z-score)", ylab = "", nCol = 200, nlevels = 4, show.diff = T,
           col.diff = "white", alpha.diff = 0.7, main = "C) B-A Difference", print.summary = F,
           add.color.legend = F, color = c("blue", "green", "red"), col = 1)
fadeRug(datExposure.2$T1CF_TotalZ, datExposure.2$ExposurePCA, alpha = 0.5, n.grid = 1000,
        use.data.range = F)
box()
gradientLegend(valRange = c(-1, 1), nCol = 200, pos = 0.87, side = 4,
               color = c("blue", "green", "red"), col = 1)
par(mfcol = c(1, 1))

summary(RMexp.2)
writeLines('Sample size by condition')
aggregate(EMT ~ Cond, RMexp.2$model, FUN = NROW)
writeLines('Trimmed observations:')
nrow(datExposure[abs(scale(resid(RMexp))) > 2, ])
writeLines('Trimmed in percent:')
(1 - nrow(datExposure[abs(scale(resid(RMexp))) < 2, ]) / nrow(datExposure)) * 100
gam.check(RMexp.2)

# also plot the individual smooths
plot(RMexp.2, select = 1, ylab = "Reading fluency (z-score)")
abline(0, 0)
plot(RMexp.2, select = 2, ylab = "Reading fluency (z-score)")
abline(0, 0)
```

#### READ by Gender

```{r game exposure read by gender, fig.height = 6, fig.width = 10}
RGexp <- gam(EMT ~ s(T1CF_TotalZ) + ti(T1CF_TotalZ, ExposurePCA, by = Gender), data = readExposure)

# refit after removing observations with absolute standardized residuals > 2
readExposure.2 <- readExposure[abs(scale(resid(RGexp))) < 2, ]
RGexp.2 <- gam(EMT ~ s(T1CF_TotalZ) + ti(T1CF_TotalZ, ExposurePCA, by = Gender), data = readExposure.2)

par(mfcol = c(1, 3))

# panel D: fitted surface for boys
fvisgam(RGexp.2, view = c("T1CF_TotalZ", "ExposurePCA"), ylim = c(-2, 2), zlim = c(-2, 2), xlim = c(-2, 2),
        n.grid = 250, cond = list(Gender = "male"), xlab = "Phonological skills at pre-test (z-score)",
        ylab = "Exposure PCA (z-score)", show.diff = T, col.diff = "white", alpha.diff = 0.7,
        main = "D) Read: Male", print.summary = F, add.color.legend = F, nCol = 200)
fadeRug(subset(readExposure.2, Gender == "male")$T1CF_TotalZ, subset(readExposure.2, Gender == "male")$ExposurePCA,
        alpha = 0.75, n.grid = 1000, use.data.range = F)
box()
gradientLegend(valRange = c(-2, 2), nCol = 200, pos = 0.87, side = 4)

# panel E: fitted surface for girls
fvisgam(RGexp.2, view = c("T1CF_TotalZ", "ExposurePCA"), ylim = c(-2, 2), zlim = c(-2, 2), xlim = c(-2, 2),
        n.grid = 250, cond = list(Gender = "female"), xlab = "Phonological skills at pre-test (z-score)",
        ylab = "", show.diff = T, col.diff = "white", alpha.diff = 0.7, main = "E) Read: Female",
        print.summary = F, add.color.legend = F, nCol = 200)
fadeRug(subset(readExposure.2, Gender == "female")$T1CF_TotalZ, subset(readExposure.2, Gender == "female")$ExposurePCA,
        alpha = 0.75, n.grid = 1000, use.data.range = F)
box()
gradientLegend(valRange = c(-2, 2), nCol = 200, pos = 0.87, side = 4)

# panel F: difference surface (female minus male)
plot_diff2(RGexp.2, view = c("T1CF_TotalZ", "ExposurePCA"), ylim = c(-2, 2), zlim = c(-2, 2), xlim = c(-2, 2),
           n.grid = 250, comp = list(Gender = c("female", "male")),
           xlab = "Phonological skills at pre-test (z-score)", ylab = "", nCol = 200, nlevels = 8,
           show.diff = T, col.diff = "white", alpha.diff = 0.7, main = "F) E-D Difference",
           print.summary = F, add.color.legend = F, se = F)
fadeRug(readExposure.2$T1CF_TotalZ, readExposure.2$ExposurePCA, alpha = 0.75, n.grid = 1000,
        use.data.range = F, col = "white")
box()
gradientLegend(valRange = c(-2, 2), nCol = 200, pos = 0.87, side = 4)
par(mfcol = c(1, 1))

summary(RGexp.2)
writeLines('Sample size by gender')
aggregate(EMT ~ Gender, RGexp.2$model, FUN = NROW)
writeLines('Trimmed observations:')
nrow(readExposure[abs(scale(resid(RGexp))) > 2, ])
writeLines('Trimmed in percent:')
(1 - nrow(readExposure[abs(scale(resid(RGexp))) < 2, ]) / nrow(readExposure)) * 100
gam.check(RGexp.2)
plot(RGexp.2, select = 1, ylab = "Reading fluency (z-score)")
abline(0, 0)
gc()
```

## Extra Analyses {.tabset}

### LK Belgium

This analysis is not shown in the manuscript but presents a conventional approach to evaluating potential changes in letter knowledge: an ANOVA on the aggregated in-game assessment. Here, we only find a main effect of session, which is confirmed by a post-hoc Wilcoxon test showing an overall increase from T1 to T2.
```{r Belgium: LK}
LKdatB <- subset(datB, T2.LetKen > 0 & T1.LetKen > 0, select = c("Subj", "Cond", "T1.LetKen", "T2.LetKen"))
LKdatB <- melt(LKdatB, id.vars = c("Subj", "Cond"))
LKdatB$variable <- sub(".LetKen", "", LKdatB$variable)
names(LKdatB) <- c("Subj", "Cond", "Session", "LettersCorrect")

LKB <- aov(LettersCorrect ~ Cond * Session, LKdatB) # main effect of session
summary(LKB)

shapiro.test(LKdatB$LettersCorrect) # not normal, so wilcox.test
hist(LKdatB$LettersCorrect)
boxplot(LKdatB$LettersCorrect ~ LKdatB$Cond + LKdatB$Session)
wilcox.test(LettersCorrect ~ Session, data = LKdatB, paired = T) # T2 > T1
```

### LK Netherlands

This analysis is not shown in the manuscript but presents a conventional approach to evaluating potential changes in letter knowledge: an ANOVA on the aggregated in-game assessment. Here, we only find a main effect of session and a main effect of condition; post-hoc Wilcoxon tests show an overall increase from T1 to T2 and higher scores in the math condition than in the reading condition.

```{r Netherlands: LK}
LKdatNL <- subset(datNL, T2.LetKen > 0 & T1.LetKen > 0, select = c("Subj", "Cond", "T1.LetKen", "T2.LetKen"))
LKdatNL <- melt(LKdatNL, id.vars = c("Subj", "Cond"))
LKdatNL$variable <- sub(".LetKen", "", LKdatNL$variable)
names(LKdatNL) <- c("Subj", "Cond", "Session", "LettersCorrect")

LKNL <- aov(LettersCorrect ~ Cond * Session, LKdatNL) # main effect of session and main effect of condition
summary(LKNL)

shapiro.test(LKdatNL$LettersCorrect) # not normal
hist(LKdatNL$LettersCorrect)
boxplot(LKdatNL$LettersCorrect ~ LKdatNL$Cond + LKdatNL$Session)
wilcox.test(LettersCorrect ~ Session, data = LKdatNL, paired = T) # T2 > T1
wilcox.test(LettersCorrect ~ Cond, data = LKdatNL) # Math > Read
```

### Raw PrePost

Here, we present the raw means for the outcome measures at both testing points (T1 and T2).

```{r}
datB$EMTraw <- (datB$T2EMTaCor + datB$T2EMTbCor) / 2
tab <- tableby(Cond ~ EMTraw + T1CF_Total + T2CF_Total + T1CF_TotalPc + T2CF_TotalPc +
                 T1PF_Total + T2PF_Total + T1PF_TotalPc + T2PF_TotalPc + T1.LetKen + T2.LetKen +
                 T1RANcT + T2RANcT + T1RANoT + T2RANoT,
               data = datB, numeric.stats = c("meansd", "medianrange"))
summary(tab, text = TRUE, digits = 2, test = F)

datNL$EMTraw <- (datNL$T2EMTaCor + datNL$T2EMTbCor) / 2
tab <- tableby(Cond ~ EMTraw + T1CF_Total + T2CF_Total + T1CF_TotalPc + T2CF_TotalPc +
                 T1PF_Total + T2PF_Total + T1PF_TotalPc + T2PF_TotalPc + T1.LetKen + T2.LetKen +
                 T1RANcT + T2RANcT + T1RANoT + T2RANoT,
               data = datNL, numeric.stats = c("meansd", "medianrange"))
summary(tab, text = TRUE, digits = 2, test = F)
```