---
title: "Analysis of Edwinâ€™s K01 Data"
author: "Adam Davey"
date: "December 4, 2015"
output:
html_document:
toc: true
toc_depth: 3
theme: journal
highlight: zenburn
---
# Preamble
Within **R**, Rmarkdown syntax is a useful way to accomplish some version of 'literate programming' in which code and documentation are interwoven. This helps in the process of ensuring that your work is reproducible, and can easily be shared with others.
To get oriented, let's first look at Edwin's request (thank you, Edwin!) before looking at his data. It is formatted as a block quote.
> I have attached a CSV dataset for one of my experiments (basically, a reaction time experiment) for which I'd like to figure out how to do and interpret linear mixed effects models analyses. This dataset will have several issues to contend with (e.g., unbalanced groups, missing data, between- and within-variables, maybe non-normally distributed data) that I find fairly commonly in my work and that others might also find relevant to their work.
## Clear the workspace
First, it is always good practice to clear the workspace prior to running the analysis to ensure that working syntax does not depend on an object that coincidentally exists in the workspace.
```{r}
rm(list=ls())
```
## Read and examine the data
I have verified that the data are comma separated and contain a heder. However, I set the option here explicitly.
```{r}
rtdata <- read.csv(file='~/Downloads/K01_Exp1_AP_raw-data_ABS-screened.csv', header = TRUE)
dim(rtdata)
names(rtdata)
head(rtdata)
table(rtdata$group)
```
I know nothing about the design of this study, but I can now see that it contains data on 5 different groups. One thing that would be interesting is to know how many trials each subject participates in. I could table subject. It would be a mess. First, how many different subjects? Next, how many trials do subjects participate in?
```{r}
length(unique(rtdata$subject))
table(table(rtdata$subject))
```
Before even thinking about analyses, we need to look at the distribution of reaction times, which would be expexted to be very highly positively skewed.
```{r}
hist(rtdata$RT, breaks='scott', probability=T, main='Reaction Time')
lines(density(rtdata$RT), lwd=2, col='darkorange1')
```
This looks like a candidate for log transformation. We can evaluate it directly prior to conducting any transformations.
```{r}
hist(log(rtdata$RT + 1), breaks='scott', probability=T, main='ln(Reaction Time + 1)')
lines(density(log(rtdata$RT + 1)), lwd=2, col='darkorange1')
```
Of course, this still looks terribly lumpy which reminds us that we are forgetting something important about the design -- groups! The "smartest" way to do this is probably using ```ggplot2``` or ```lattice``` but we will stick with the base graphics package here. We can also use **R**'s ```split``` function to temporarily separate the data file by group.
```{r}
tmprt <- split(rtdata$RT, rtdata$group)
nrows <- ceiling((length(tmprt) + 1)/2)
par(mfrow=c(nrows,2))
for (i in 1:length(tmprt)) {
hist(log(tmprt[[i]] + 1), probability = T, breaks='scott', main=paste0('Histogram for ln(', names(tmprt)[i], ')'))
lines(density(log(tmprt[[i]] + 1)), lwd=2, col=i)
}
par(mfrow=c(1,1))
```
Before running any analyses such as growth curve models (a kind of mixed-effects model), we need to plot individual-level data. A preferred type of plot is the "spaghetti plot" which shows individual values over trials/time. It can get quite crowded and so is best done on a small subset of observations (I find that more than 25 can get crowded).
```{r}
# Take a random sample of subject ids
set.seed(1234567)
who <- sample(unique(rtdata$subject), size=10, replace=F)
tmp <- subset(rtdata, subject %in% who)
interaction.plot(tmp$trial, tmp$subject, log(tmp$RT + 1))
```
```{r}
# Now with fewer trials
set.seed(1234567)
who <- sample(unique(rtdata$subject), size=10, replace=F)
tmp <- subset(rtdata, subject %in% who & trial <= 10)
interaction.plot(tmp$trial, tmp$subject, log(tmp$RT + 1))
```
```{r}
# Now with fewer trials
set.seed(1234567)
who <- sample(unique(rtdata$subject), size=30, replace=F)
tmp <- subset(rtdata, subject %in% who & trial <= 10 & group =="SSD")
interaction.plot(tmp$trial, tmp$subject, log(tmp$RT + 1))
```
Here's a strange thing. The range of trials is from 4 to 83. Are individuals seen under different conditions?
```{r}
table(rtdata$trial)
```
```{r}
require('lme4')
library(lme4)
linmod0 <- lm(log(RT + 1) ~ trial + group + overlap + prime + SOA + block + part, data=rtdata)
mixmod0 <- lmer(log(RT + 1) ~ trial + group + overlap + prime + as.factor(SOA) + as.factor(block)*as.factor(part) + (1|subject), data=rtdata)
```