Split a string based on multiple conditions

Posted by Blake on Dev

Blake

Suppose I have strings like these:

example 1: email:[email protected],username:noneusername, token:nonetoken21309r9023, user_id:nonuserid

example 2: username:slkfsoi,email:[email protected],username:oiwoie,token:asfkjsdf0

example 3: email:[email protected],user_id:lkasflk

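I want to split on email, username, token, and user_id. In some cases not all four are present. In other cases a string may contain several instances of a field (email, token, email, user_id, token). When that happens, I want to take only the first instance of each field.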

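This is what I have so far in R. However, if I use it in a loop, it is not efficient when the data frame contains thousands of strings. I tried using this function with apply, but it did not work. I think that is because my function is not vectorized?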

match_value <- function(x,z){
  b <- head(grep(z,unlist(strsplit(x,","))),1)
  c <- strsplit(x, ",")
  d <- unlist(c)[b]
  e <- gsub(z,"",d)
  if((length(e) == 0) && (typeof(e) == "character")){
   e = ""
  }
  return(e)
}

For the examples above, I would call this function with x = the data frame column of string values and z = the string I want to match, e.g. email: or token:.

Thanks!

拉尔

I would use your approach, but with gsub and a regular expression:

x <- c('email:[email protected],username:noneusername, token:nonetoken21309r9023, user_id:nonuserid',
       'username:slkfsoi,email:[email protected],username:oiwoie,token:asfkjsdf0',
       'email:[email protected],user_id:lkasflk')

f <- function(what, string = x) {
  gsub(sprintf('%s\\:\\s*([^,]*)|.', what), '\\1', string, perl = TRUE)
}


f('email', x)
# [1] "[email protected]"      "[email protected]"   "[email protected]"

f('username', x)
# [1] "noneusername"  "slkfsoioiwoie" ""             

f('token', x)
# [1] "nonetoken21309r9023" "asfkjsdf0"           ""                   

f('user_id', x)
# [1] "nonuserid" ""          "lkasflk"  
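Note that f('username', x) returns "slkfsoioiwoie" for the second string, because gsub keeps every username match and the two values end up pasted together. If only the first instance is wanted, as the question asks, one possible variation is sketched below. It is not part of the answer above: first_match is a hypothetical helper name, and y is made-up sample data with the same shape as x. It relies on regexec, which reports only the first match in each string.

```r
# Made-up sample data in the same format as x (values are placeholders).
y <- c('email:a@example.com,username:u1, token:t1, user_id:id1',
       'username:u2,email:b@example.com,username:u3,token:t2',
       'email:c@example.com,user_id:id3')

# Hypothetical helper, not from the original answer.
first_match <- function(what, string) {
  # "<what>:" must sit at the start of the string or directly after a comma
  # (optionally followed by spaces); the group captures up to the next comma.
  pattern <- sprintf('(?:^|,)\\s*%s:\\s*([^,]*)', what)
  # regexec() locates only the first match per string, vectorised over string.
  hits <- regmatches(string, regexec(pattern, string, perl = TRUE))
  # Each element is c(full match, captured value), or character(0) if absent.
  vapply(hits, function(h) if (length(h) >= 2) h[2] else "",
         character(1), USE.NAMES = FALSE)
}

first_match('username', y)
# [1] "u1" "u2" ""

# Same data frame construction as with f(), one column per field.
n <- c('email', 'username', 'token', 'user_id')
res <- data.frame(setNames(lapply(n, first_match, string = y), n))
```

Anchoring on the start of the string or a comma also keeps a short key from matching inside a longer one.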


n <- c('email', 'username', 'token', 'user_id')
data.frame(setNames(lapply(n, f), n))

#                 email      username               token   user_id
# 1      [email protected]  noneusername nonetoken21309r9023 nonuserid
# 2   [email protected] slkfsoioiwoie           asfkjsdf0          
# 3 [email protected]                                     lkasflk
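On the apply part of the question: apply() is meant for matrices and arrays and iterates over their rows or columns, and when the whole column is passed to match_value at once, unlist(strsplit(x, ",")) pools the pieces of every row together, so only one value comes back. A minimal sketch of calling the posted function once per string with vapply follows. It assumes match_value exactly as in the question, uses made-up sample data y, and adds trimws only to drop the leading space left by ", token:". This is still an element-by-element loop, so the gsub approach is likely the faster option on large columns.

```r
# match_value() as posted in the question; only comments were added.
match_value <- function(x, z){
  b <- head(grep(z,unlist(strsplit(x,","))),1)   # index of first piece containing z
  c <- strsplit(x, ",")
  d <- unlist(c)[b]                              # that piece, or character(0)
  e <- gsub(z,"",d)                              # strip the "key:" prefix
  if((length(e) == 0) && (typeof(e) == "character")){
   e = ""
  }
  return(e)
}

# Made-up sample data (placeholder values).
y <- c('email:a@example.com,username:u1, token:t1, user_id:id1',
       'username:u2,email:b@example.com,username:u3,token:t2',
       'email:c@example.com,user_id:id3')

# Call the function once per string; character(1) asserts one value per call.
users  <- vapply(y, match_value, character(1), z = "username:", USE.NAMES = FALSE)
tokens <- trimws(vapply(y, match_value, character(1), z = "token:", USE.NAMES = FALSE))

users
# [1] "u1" "u2" ""
tokens
# [1] "t1" "t2" ""
```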