How could I write the code below more efficiently (i.e. in fewer lines)?
I can't seem to define a function that takes a Year argument and lets me write something like Data{Year} = read.csv('{Year}.csv') or Data{Year}${Year+1} = 0, etc.
## Load data ##
Data2011 = read.csv('2011.csv')
Data2012 = read.csv('2012.csv')
Data2013 = read.csv('2013.csv')
Data2014 = read.csv('2014.csv')
## Year dummies ##
Data2011$D2011 = 1
Data2011$D2012 = 0
Data2011$D2013 = 0
Data2011$D2014 = 0
Data2012$D2011 = 0
Data2012$D2012 = 1
Data2012$D2013 = 0
Data2012$D2014 = 0
Data2013$D2011 = 0
Data2013$D2012 = 0
Data2013$D2013 = 1
Data2013$D2014 = 0
Data2014$D2011 = 0
Data2014$D2012 = 0
Data2014$D2013 = 0
Data2014$D2014 = 1
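For reference, here is a minimal sketch of the literal Data{Year} pattern the question describes. Base R's assign() creates an object from a constructed name, and [[ on a data frame creates a column from a string. So that the example runs anywhere, it writes small made-up CSV files into tempdir() instead of the working directory. The column x and its values are hypothetical stand-ins for the real files.

```r
# Hypothetical toy files 2011.csv ... 2014.csv in a temporary directory
dir <- tempdir()
years <- 2011:2014
for (y in years) {
  write.csv(data.frame(x = 1:3), file.path(dir, paste0(y, ".csv")),
            row.names = FALSE)
}

for (y in years) {
  df <- read.csv(file.path(dir, paste0(y, ".csv")))
  # One dummy column per year: 1 for this file's own year, 0 for the others
  for (d in years) {
    df[[paste0("D", d)]] <- as.integer(d == y)
  }
  # Create an object named Data2011, Data2012, ... in the current environment
  assign(paste0("Data", y), df)
}
```

get(paste0("Data", y)) retrieves such an object by name. Keeping the data frames in one list, as in the lapply() approach of the answer, avoids juggling generated names altogether.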
Well, I'd first read in the data in a loop and add a new column holding the year.
dat <- lapply(2011:2014, function(y) cbind(Year=y, read.csv(paste0(y, '.csv'))))
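A self-contained sketch of that step: it first writes hypothetical toy files into tempdir() (the real files would sit in the working directory), then reads each one and prepends a Year column. Naming the list elements by year is an optional extra, not part of the answer's code.

```r
# Hypothetical toy files so the example runs without the real data
dir <- tempdir()
for (y in 2011:2014) {
  write.csv(data.frame(x = 1:2), file.path(dir, paste0(y, ".csv")),
            row.names = FALSE)
}

# Read each year's file and add a Year column in front of its columns
dat <- lapply(2011:2014, function(y) {
  cbind(Year = y, read.csv(file.path(dir, paste0(y, ".csv"))))
})

# Optional: name the list so a single year can be pulled out with dat[["2012"]]
names(dat) <- 2011:2014
```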
Now, the most common use for dummies is when you're fitting a model, so I'm guessing you want to combine all the years into a single data frame.
dat <- do.call(rbind, dat)
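do.call(rbind, dat) is equivalent to rbind(dat[[1]], dat[[2]], dat[[3]], dat[[4]]), so every data frame in the list needs the same column names. A small sketch with hypothetical in-memory data frames standing in for the files:

```r
# Hypothetical per-year data frames, shaped like the output of the lapply() step
dat <- lapply(2011:2014, function(y) data.frame(Year = y, x = c(10, 20)))

# Stack them into one data frame (columns must match across elements)
dat <- do.call(rbind, dat)

nrow(dat)        # 4 years x 2 rows each
table(dat$Year)  # row count per year
```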
Then, for most model fitting, you'd never make the dummies yourself; that's a job for the computer. You'd just make the variable of interest a factor, and modelling functions such as lm() and glm() will build the dummy columns for you.
dat$Year <- factor(dat$Year)
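To see the computer "do the right thing", here is a sketch with made-up data (three rows per year and a random response y, both hypothetical). lm() expands the factor into dummies internally. With R's default treatment contrasts the first level (2011) becomes the baseline absorbed into the intercept, which also avoids the perfect collinearity you would get from including all four hand-made dummies alongside an intercept.

```r
# Hypothetical combined data: 3 observations per year and a numeric response
set.seed(1)
dat <- data.frame(Year = rep(2011:2014, each = 3), y = rnorm(12))
dat$Year <- factor(dat$Year)

# The factor is turned into dummy columns automatically; 2011 is the baseline
fit <- lm(y ~ Year, data = dat)
names(coef(fit))
# "(Intercept)" "Year2012" "Year2013" "Year2014"
```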
That's all I'd normally do. But if for some reason I really wanted to make those dummies myself, I'd still let the computer do it with model.matrix(), like this, and then add the result to the data set.
dums <- model.matrix(~0+Year, data=dat)
dat <- cbind(dat, dums)
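A runnable sketch of the model.matrix() step on hypothetical toy data. The 0 in the formula removes the intercept, so every level gets its own 0/1 column, named by pasting the variable name and the level.

```r
# Hypothetical data: two rows per year
dat <- data.frame(Year = factor(rep(2011:2014, each = 2)), x = 1:8)

# ~ 0 + Year: no intercept, one indicator column per level of Year
dums <- model.matrix(~ 0 + Year, data = dat)
colnames(dums)   # "Year2011" "Year2012" "Year2013" "Year2014"

# Each row has exactly one 1 across the dummy columns
dat <- cbind(dat, dums)
```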
For the purpose of learning how to loop and access variables, you could also do something like this, where [[ lets you use a character string to access or create a column by name, and *1 converts a logical (TRUE/FALSE) vector to 0/1.
for(y in unique(dat$Year)) {
  dat[[paste0("Year", y)]] <- (dat$Year==y)*1
}
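The for loop above can also be collected into a single vapply() call that returns one 0/1 column per factor level. This is a sketch on hypothetical data; note that levels() only contains years that actually occur, so a year with no rows (2013 here) gets no column.

```r
# Hypothetical data with 2013 missing
dat <- data.frame(Year = factor(c(2011, 2012, 2012, 2014)))

# One integer 0/1 vector per level, bound column-wise into a matrix
lv <- levels(dat$Year)
dums <- vapply(lv, function(y) as.integer(dat$Year == y), integer(nrow(dat)))
colnames(dums) <- paste0("Year", lv)

dat <- cbind(dat, dums)
```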