嗨,大家好:我已经在堆栈溢出和互联网的其余部分中搜索了该问题的答案,但是我找不到的答案似乎都适合我。
我有成千上万行的json数据,其中包含来自相机陷阱研究的图像信息。解压缩数据时遇到很多麻烦。我jsonlite::fromJSON
无济于事。与as.tbl_json
tidyjson相同。
我的目标是编写一些代码,该代码将为我提供一个数据框,其中包含以json格式存储的每个变量的列。你能帮我吗?
这是我正在处理的数据向量,尽管实际上我将数据作为一个较大.csv
文件中的单个列存储。第一行是列名。
annotations<-c(annotations,
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""DEERWHITETAILED"",""answers"":{""HOWMANY"":""1"",""YOUNGPRESENT"":""NO"",""ANTLERSPRESENT"":""NO"",""WHATBEHAVIORSDOYOUSEE"":[""ALERT""],""ESTIMATEOFSNOWDEPTHSEETUTORIAL"":""NOSNOWBAREGROUND"",""ISITACTIVELYRAININGORSNOWINGINTHEPICTURE"":""NO""},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""FISHER"",""answers"":{""HOWMANY"":""1"",""YOUNGPRESENT"":""NO"",""WHATBEHAVIORSDOYOUSEE"":[""WALKINGRUNNING"",""ALERT""],""ESTIMATEOFSNOWDEPTHSEETUTORIAL"":""1020CM"",""ISITACTIVELYRAININGORSNOWINGINTHEPICTURE"":""NO""},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]")
这是我运行dput(annotations)会得到的结果:
structure(list(annotations = c("[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"DEERWHITETAILED\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"ANTLERSPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"NOSNOWBAREGROUND\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"FISHER\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"WALKINGRUNNING\",\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"1020CM\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]"
)), class = "data.frame", row.names = c(NA, -10L))
我不清楚您要寻找哪种输出格式。您可以通过多种不同的方式来执行此操作。此外,数据结构中的数组(每个数组中只有一个对象)会使事情变得复杂,因为它们中可能包含更多的对象。
无论如何,tidyjson
都不需要太多的代码spread_all()
。您还可以使用来仅传播特定的值spread_values()
,或者enter_object(answers)
仅传播答案,等等。希望这会有所帮助!
library(tidyjson)
#>
#> Attaching package: 'tidyjson'
#> The following object is masked from 'package:stats':
#>
#> filter
library(tibble)
annotations <- structure(list(annotations = c("[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"DEERWHITETAILED\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"ANTLERSPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"NOSNOWBAREGROUND\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"FISHER\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"WALKINGRUNNING\",\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"1020CM\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]",
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]"
)), class = "data.frame", row.names = c(NA, -10L))
ant <- tibble(raw = annotations$annotations)
as.tbl_json(ant, json.column = "raw") %>%
gather_array("object_id") %>%
spread_all() %>%
enter_object("value") %>%
gather_array("value_id") %>%
spread_all() %>%
as_tibble()
#> # A tibble: 10 x 9
#> object_id task value_id choice answers.HOWMANY answers.YOUNGPR…
#> <int> <chr> <int> <chr> <chr> <chr>
#> 1 1 T0 1 NOTHI… <NA> <NA>
#> 2 1 T0 1 NOTHI… <NA> <NA>
#> 3 1 T0 1 DEERW… 1 NO
#> 4 1 T0 1 NOTHI… <NA> <NA>
#> 5 1 T0 1 NOTHI… <NA> <NA>
#> 6 1 T0 1 NOTHI… <NA> <NA>
#> 7 1 T0 1 NOTHI… <NA> <NA>
#> 8 1 T0 1 NOTHI… <NA> <NA>
#> 9 1 T0 1 FISHER 1 NO
#> 10 1 T0 1 NOTHI… <NA> <NA>
#> # … with 3 more variables: answers.ANTLERSPRESENT <chr>,
#> # answers.ESTIMATEOFSNOWDEPTHSEETUTORIAL <chr>,
#> # answers.ISITACTIVELYRAININGORSNOWINGINTHEPICTURE <chr>
由reprex软件包(v0.3.0)创建于2020-03-14
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句