I try to find the relationship between test scores and kindergarten experience, taking background variables into account. The conclusions might have implications for parents.

## Question

Find the relationship between test scores and kindergarten experience, taking background variables into account.

**Data**

- The dependent variables are `math, literacy, social`. These are test results.
- The explanatory variables are divided into
  - variables of interest `kindergarten_type, kindergarten_amt`, and
  - control variables `gender, parents_educ, birth_year, birth_month, ID`.

## **Conclusions**

- Students who attend kindergarten tend to have higher test scores.
- Students who spend more time in kindergarten per week tend to perform better on the test.

## **Method**

Multivariate multiple linear regression (MultiMLR) is used. This method allows for several response (Y) variables and several explanatory (X) variables, as opposed to multiple linear regression (MLR), which has a single Y variable. I used MultiMLR instead of the more common MLR approach because the Y variables are correlated. (Further details on the difference between MLR and MultiMLR can be found here and here.)
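In matrix notation, the model can be sketched as follows. The symbols are my own shorthand and are not taken from the RPubs document: $n$ is the number of students, $q = 3$ the number of test scores, and $p$ the number of explanatory variables.

$$
\underbrace{Y}_{n \times q} = \underbrace{X}_{n \times (p+1)} \, \underbrace{B}_{(p+1) \times q} + \underbrace{E}_{n \times q}
$$

Each column of $Y$ is one test score, $X$ includes a column of ones for the intercept, and each column of $B$ holds the coefficients for one score. The rows of $E$ are allowed to be correlated across the $q$ scores. MLR is the special case $q = 1$. The least-squares estimate of each column of $B$ is the same as from a separate MLR on that score; the multivariate model matters for joint tests across the scores.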

## **Script**

The R script, including output graphs and tables, can be found in this RPubs document, which is written in R Markdown. Comments, and some variable names, are in Swedish.

## **Todo**

Make a screencast going over the RPubs document in English, since some variables are coded in Swedish.

## **Analysis outline**

Until I have done a proper screencast to explain the analysis, I provide some pointers to the key outputs below.

- A major part of the code transforms the variables, because the dataset we were given should not be used as is. Firstly, I recode the variables `birth_year, birth_month` into which quarter of the year the student was born. Secondly, I reshape the two categorical variables `kindergarten_type, kindergarten_amt` into two dummy variables measuring (a) whether the student went to kindergarten or not, and (b) whether they spent extra time in the kindergarten during a typical week.

- A scatter plot displays the relationship among the Y variables. Combined with `cor()`, we can tell that the dependent variables are correlated. `pairs(~ math + literacy + social, df)` produces the scatter plot matrix; the graph itself is in the RPubs document.

- Then I create three boxplots to visualize the distributions; they look reasonable, without any outliers to handle. `par(mfrow=c(1,3))` aligns the boxplots in three columns, and `boxplot(df$variable_name)` produces each plot.

- Below is a detailed argument for why a MultiMLR is a good method. I use a decision tree of questions to come up with the model choice.
*Q: What type of relationship is being examined, dependent or independent?* A: Dependent.

*Q: Number of dependent variables?* A: Several.

*Q: Measurement scale of the dependent variables?* A: Numerical; Y goes from 0 to 50.

*Q: Measurement scale of the explanatory variables?* A: Numerical and categorical.

- Based on the questions above we end up with multivariate multiple linear regression as our model choice. The method is also suitable given the research question.
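Before the model is fitted, the recoding from the first bullet of this outline can be sketched in R. This is a minimal sketch on simulated data, not the code from the RPubs document. The raw codings (`"none"`, `"regular"`, `"special"` for `kindergarten_type`, and hours per week for `kindergarten_amt`) are assumptions for illustration, as is the choice of the fourth quarter as reference category; the real dataset may be coded differently.

```r
# Minimal sketch of the recoding step on simulated data.
# ASSUMPTION: the raw codings below are hypothetical, not the real dataset's.
set.seed(1)
n <- 200

type <- sample(c("none", "regular", "special"), n,
               replace = TRUE, prob = c(0.2, 0.6, 0.2))

df <- data.frame(
  birth_month       = sample(1:12, n, replace = TRUE),
  kindergarten_type = type,
  # hours per week; students without kindergarten get 0 hours
  kindergarten_amt  = ifelse(type == "none", 0,
                             sample(c(10, 20), n, replace = TRUE)),
  stringsAsFactors  = FALSE
)

# Birth month -> quarter of the year (1 to 4)
df$quarter <- (df$birth_month - 1) %/% 3 + 1

# Quarter dummies; quarter 4 is the (assumed) reference category
df$kva1 <- as.integer(df$quarter == 1)
df$kva2 <- as.integer(df$quarter == 2)
df$kva3 <- as.integer(df$quarter == 3)

# Kindergarten dummies
df$kinderg      <- as.integer(df$kindergarten_type != "none")    # attended at all
df$spec_kinderg <- as.integer(df$kindergarten_type == "special") # special kindergarten
df$extratime    <- as.integer(df$kindergarten_amt > 15)          # over 15 hours per week

# Quick check of the recoded variables
print(table(df$quarter))
print(colMeans(df[, c("kinderg", "spec_kinderg", "extratime")]))
```

Using 0/1 integer columns rather than factors keeps the coefficient names in the model output identical to the dummy names.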

- The fitted model is `lm(cbind(math, literacy, social) ~ kinderg + spec_kinderg + extratime + parents_educ + boy + kva1 + kva2 + kva3, data = df)`. Here `kinderg`, `spec_kinderg` and `extratime` are the variables of interest, and `parents_educ`, `boy`, `kva1`, `kva2` and `kva3` are the control variables.

(I have translated the variable names above, compared to the ones in the document.)
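To show what such a fit looks like in practice, here is a minimal, self-contained sketch on simulated data. Everything about the data is an assumption: the sample size, the effect sizes, and the numeric coding of `parents_educ` are made up, so the printed numbers are not the study's results. Only the model formula follows the one above.

```r
# Minimal sketch: multivariate multiple linear regression on simulated data.
# ASSUMPTION: all data and effect sizes below are invented for illustration.
set.seed(42)
n <- 300

df <- data.frame(
  kinderg      = rbinom(n, 1, 0.8),
  parents_educ = sample(1:3, n, replace = TRUE),  # hypothetical numeric coding
  boy          = rbinom(n, 1, 0.5)
)
# Special kindergarten and extra time are only possible for attendees
df$spec_kinderg <- df$kinderg * rbinom(n, 1, 0.2)
df$extratime    <- df$kinderg * rbinom(n, 1, 0.5)

# Birth-quarter dummies, with quarter 4 as reference
quarter <- sample(1:4, n, replace = TRUE)
df$kva1 <- as.integer(quarter == 1)
df$kva2 <- as.integer(quarter == 2)
df$kva3 <- as.integer(quarter == 3)

# A shared "ability" term makes the three simulated scores correlated
ability <- rnorm(n, sd = 3)
df$math     <- 20 + 5 * df$kinderg + 2 * df$extratime + ability + rnorm(n, sd = 3)
df$literacy <- 15 + 2 * df$kinderg + 5 * df$extratime + ability + rnorm(n, sd = 3)
df$social   <- 30 + 8 * df$kinderg + 4 * df$extratime + ability + rnorm(n, sd = 3)

# The responses should be positively correlated
print(round(cor(df[, c("math", "literacy", "social")]), 2))

# cbind() on the left-hand side gives one regression per response
fit <- lm(cbind(math, literacy, social) ~
            kinderg + spec_kinderg + extratime +       # variables of interest
            parents_educ + boy + kva1 + kva2 + kva3,   # control variables
          data = df)

# One column of coefficients per test score
print(round(coef(fit), 2))
```

On the fitted object, `summary(fit)` prints one coefficient table per score, and `anova(fit)` gives multivariate tests across the three scores (Pillai's trace by default).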

- The regression measures how much the three variables of interest affect the three Y variables, whilst controlling for parents' education, gender, and which quarter of the year the student was born.
- The three variables of interest (i) `kinderg`, (ii) `spec_kinderg`, (iii) `extratime` are all dummy variables. They measure whether the student has

  (i) gone to kindergarten at all,

  (ii) gone to a special kindergarten called “Montessori” or “Ur och Skur”,

  (iii) spent over 15 hours in the kindergarten per week.

- Estimated coefficients of the model are found below. I pasted them from the document, translated the Swedish variable names and rounded the estimates.

| Coefficients | math | literacy | social |
|---|---|---|---|
| (Intercept) | 23.73265 | 16.07515 | 32.51889 |
| kinderg | 7.40438 | 1.92834 | 9.49967 |
| spec_kinderg | Not sign. | -2.41103 | -3.19622 |
| extratime | 1.52378 | 6.57021 | 5.55214 |

From this table of coefficients we can draw some conclusions:

- Students who attend kindergarten tend to have higher test scores. How much higher is seen from the coefficients above: 7.4 for math, 1.9 for literacy and 9.5 for social. Notably, *literacy* has the smallest increase.
- Students who spend over 15 hours per week in kindergarten tend to perform better on the test. Notably, spending more hours affects the *literacy* and *social* scores much more than the *math* results.
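As a small worked example of reading the coefficients, the sketch below adds the `kinderg` and `extratime` estimates. Under the model, this is the estimated score difference between a student who went to a regular kindergarten for more than 15 hours per week and a student who did not attend at all, with the control variables held fixed. The numbers are copied from the coefficient output above; the matrix layout and the name `gain` are just my illustration.

```r
# Coefficients copied from the output above; the "Not sign." entry is stored as NA
coefs <- matrix(
  c(23.73265, 16.07515, 32.51889,
     7.40438,  1.92834,  9.49967,
          NA, -2.41103, -3.19622,
     1.52378,  6.57021,  5.55214),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("(Intercept)", "kinderg", "spec_kinderg", "extratime"),
    c("math", "literacy", "social")
  )
)

# Combined estimated difference: kindergarten plus more than 15 hours per week
gain <- coefs["kinderg", ] + coefs["extratime", ]
print(round(gain, 2))

# Extra time alone, per score
print(round(coefs["extratime", ], 2))
```

The combined difference comes out at roughly 8.9 points for math, 8.5 for literacy and 15.1 for social, so the small kindergarten coefficient for literacy is largely made up for by the extra-time coefficient.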
