data.frameには、行のSUBJECTに3つの列があります。行ごとに一意の件名を持つ追加の列が必要です。まず、私のデータはどのように見えるか:
DATE <- c("1","2","3","4","5","6","7","1","2","3","4","5","6","7")
COMP <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B")
RET <- c(-2.0,1.1,3,1.4,-0.2, 0.6, 0.1, -0.21, -1.2, 0.9, 0.3, -0.1,0.3,-0.12)
CLASS <- c("positive", "negative", "aneutral", "positive", "positive", "negative", "aneutral", "positive", "negative", "negative", "positive", "aneutral", "aneutral", "aneutral")
SUBJECT.1 <- c("LITIGATION","LAYOFF","POLLUTION","CHEMICAL DISASTER","PRESS RELEASE","PEOPLE","EMISSIONS","ENERGY","WASTE MANAGEMENT","EMPLOYEES","MANAGEMENT","PRESS RELEASE","HOTELS","POLLUTION")
SUBJECT.2 <- c("POLLUTION","EMPLOYEES","NUCLEAR","FUELS","STOCK OPTION PLAN","EXECUTIVES","CO2","SOLAR","POLLUTION","EXECUTIVES","PRESS RELEASE","CELEBRITIES","CELEBRITIES","LITIGATION")
SUBJECT.3 <- c("ENVIRONMENT","JOB REDUCTIONS","POWER PLANTS","POLLUTION","EMPLOYEES","FRAUD","CLIMATE CHANGE","SUSTAINABILITY","HAZARDOUS WASTE","BONUS PAY","LITIGATION","EMISSIONS","SCANDALS","SCANDALS")
CONTROLVAR <- c("11","13","13","14","13","14","12","11","13","13","14","13","14","12")
mydf <- data.frame(DATE, COMP, RET, CLASS, SUBJECT.1, SUBJECT.2, SUBJECT.3, CONTROLVAR, stringsAsFactors=F)
mydf
# DATE COMP RET CLASS SUBJECT.1 SUBJECT.2 SUBJECT.3 CONTROLVAR
# 1 1 A -2.00 positive LITIGATION POLLUTION ENVIRONMENT 11
# 2 2 A 1.10 negative LAYOFF EMPLOYEES JOB REDUCTIONS 13
# 3 3 A 3.00 aneutral POLLUTION NUCLEAR POWER PLANTS 13
# 4 4 A 1.40 positive CHEMICAL DISASTER FUELS POLLUTION 14
# 5 5 A -0.20 positive PRESS RELEASE STOCK OPTION PLAN EMPLOYEES 13
# 6 6 A 0.60 negative PEOPLE EXECUTIVES FRAUD 14
# 7 7 A 0.10 aneutral EMISSIONS CO2 CLIMATE CHANGE 12
# 8 1 B -0.21 positive ENERGY SOLAR SUSTAINABILITY 11
# 9 2 B -1.20 negative WASTE MANAGEMENT POLLUTION HAZARDOUS WASTE 13
# 10 3 B 0.90 negative EMPLOYEES EXECUTIVES BONUS PAY 13
# 11 4 B 0.30 positive MANAGEMENT PRESS RELEASE LITIGATION 14
# 12 5 B -0.10 aneutral PRESS RELEASE CELEBRITIES EMISSIONS 13
# 13 6 B 0.30 aneutral HOTELS CELEBRITIES SCANDALS 14
# 14 7 B -0.12 aneutral POLLUTION LITIGATION SCANDALS 12
後の回帰のためにサブジェクトをダミー変数(排他的である必要があります)として含めたいので、各行に一意のサブジェクトを持つ単一の列SUBJECTが必要です。訴訟、汚染、レイオフのテーマに焦点を当てたいと思います。
左から右に移動して、各SUBJECT列でLITIGATION、POLLUTION、LAYOFFを確認したいと思います。
最初の列にLITIGATION、POLLUTION、またはLAYOFFの3つの主題のいずれかがある場合、この主題が採用されます。最初の列に別の主題がある場合は、2番目の列をチェックします。3つのサブジェクト列のいずれにもLITIGATION、POLLUTION、またはLAYOFFが含まれていない場合、サブジェクトはOTHERと呼ばれる必要があります。また、一部の科目はグループ化する必要があります。この例では、EMISSIONSはPOLLUTIONであるかのように扱う必要があります。
出力は次のようになります。
# DATE COMP RET CLASS SUBJECT.1 SUBJECT.2 SUBJECT.3 SUBJECT CONTROLVAR
# 1 1 A -2.00 positive LITIGATION POLLUTION ENVIRONMENT LITIGATION 11
# 2 2 A 1.10 negative LAYOFF EMPLOYEES JOB REDUCTIONS LAYOFF 13
# 3 3 A 3.00 aneutral POLLUTION NUCLEAR POWER PLANTS POLLUTION 13
# 4 4 A 1.40 positive CHEMICAL DISASTER FUELS POLLUTION POLLUTION 14
# 5 5 A -0.20 positive PRESS RELEASE STOCK OPTION PLAN EMPLOYEES OTHER 13
# 6 6 A 0.60 negative PEOPLE EXECUTIVES FRAUD OTHER 14
# 7 7 A 0.10 aneutral EMISSIONS CO2 CLIMATE CHANGE POLLUTION 12
# 8 1 B -0.21 positive ENERGY SOLAR SUSTAINABILITY OTHER 11
# 9 2 B -1.20 negative WASTE MANAGEMENT POLLUTION HAZARDOUS WASTE POLLUTION 13
# 10 3 B 0.90 negative EMPLOYEES EXECUTIVES BONUS PAY OTHER 13
# 11 4 B 0.30 positive MANAGEMENT PRESS RELEASE LITIGATION LITIGATION 14
# 12 5 B -0.10 aneutral PRESS RELEASE CELEBRITIES EMISSIONS POLLUTION 13
# 13 6 B 0.30 aneutral HOTELS CELEBRITIES SCANDALS OTHER 14
# 14 7 B -0.12 aneutral POLLUTION LITIGATION SCANDALS POLLUTION 12
ありがとうございました!
mydf$SUBJECT <- "OTHER"
sapply(c("SUBJECT.3", "SUBJECT.2", "SUBJECT.1"), function(x) mydf[mydf[, x] %in% c("LITIGATION", "POLLUTION", "LAYOFF", "EMISSIONS"), "SUBJECT"] <<- mydf[mydf[, x] %in% c("LITIGATION", "POLLUTION", "LAYOFF", "EMISSIONS"), x])
mydf$SUBJECT[mydf$SUBJECT == "EMISSIONS"] <- "POLLUTION"
この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。
侵害の場合は、連絡してください[email protected]
コメントを追加