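So I'm using the "rgbif" package to mine occurrence data for fish in the class Actinopterygii in Brazil, but the group has so many occurrences that I can't retrieve them all at once. With these two lines of code we can see that there are 323,200 occurrences: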
#install.packages("rgbif")
library(rgbif)
actinopterygii<-name_backbone(name="Actinopterygii")
occ_count(taxonKey = actinopterygii$classKey,country="BR")
The thing is, the function that retrieves the occurrences has a maximum limit of 2000 records per call:
actinopterygii_oc<-occ_search(taxonKey = actinopterygii$classKey,country="BR",limit=2000,start=0)
#the start argument refers to the index of the record we are starting at so we can page through all the results
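I'm basically trying to avoid repeating this line 60 times and bumping the start value by 2000 each time, so I tried a for loop, but it isn't working. I created an interval over the number of occurrences so that the retrieval runs 2000 records at a time: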
interval<-seq(from = 0, to = 323200, by = 2000)
for (value in interval){
actinopterygii_oc<-occ_search(taxonKey = actinopterygii$classKey,country="BR",limit=2000,start=value)
}
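The problem is that this code overwrites the same single dataset on every iteration. So is there a way to create several datasets, one for each value in the interval, while looping through the values in the interval?
Sorry if this is confusing, I couldn't find a better way to phrase it. Thanks in advance for your answers.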
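Instead of a for loop, try purrr::map to get a list of tibbles of 2,000 rows each. I probably don't have to tell you that this is going to take a long time.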
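# start is a zero-based offset, so begin at 0 to keep the first record
# (the sample output further down was produced with from = 1)
interval <- seq(from = 0, to = 323200, by = 2000)
# one occ_search() call per offset; each result becomes one list element
list_of_tibbles <-
  purrr::map(interval,
             ~ occ_search(taxonKey = actinopterygii$classKey,
                          country = "BR",
                          limit = 2000,
                          start = .x)
  )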
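If you'd rather stay with a for loop, the same idea can be sketched in base R: give each iteration its own slot in a list instead of reassigning one variable. In the sketch below, fetch_page() is a hypothetical stand-in for occ_search() (it only mimics a result with a data element) so that the example runs without network access, and the interval is shortened to five pages on purpose.

```r
# Hypothetical stand-in for occ_search(): returns a list with a `data`
# element, roughly the shape of a real result, without calling GBIF.
fetch_page <- function(start, limit = 2000) {
  list(
    meta = list(offset = start, limit = limit),
    data = data.frame(key = seq(start + 1, start + limit))
  )
}

# Five zero-based offsets: 0, 2000, 4000, 6000, 8000
interval <- seq(from = 0, to = 8000, by = 2000)

# Pre-allocate one list slot per offset so nothing gets overwritten
pages <- vector("list", length(interval))
for (i in seq_along(interval)) {
  pages[[i]] <- fetch_page(start = interval[i])
}

# Pull out each `data` element and stack them into one data frame
all_rows <- do.call(rbind, lapply(pages, `[[`, "data"))
dim(all_rows)
```

With the real function you would swap fetch_page(start = interval[i]) for the occ_search() call from the question, using start = interval[i].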
I wasn't going to pull all of the data myself, but you will get something like this:
[[1]]
Records found [323200]
Records returned [2000]
No. unique hierarchies [661]
No. media records [2000]
No. facets [0]
Args [limit=2000, offset=1, taxonKey=204, country=BR, fields=all]
# A tibble: 2,000 x 145
key scientificName decimalLatitude decimalLongitude issues datasetKey publishingOrgKey
<chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 2550… Chaetodipteru… -7.91 -34.8 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
2 2550… Myrichthys oc… -7.90 -34.8 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
3 2550… Mugil curema … -7.91 -34.8 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
4 2550… Centropomus u… -7.91 -34.8 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
5 2550… Trachinotus c… -7.91 -34.8 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
6 2550… Phractocephal… -3.18 -59.9 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
7 2550… Diapterus aur… -7.91 -34.8 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
8 2550… Chaetodipteru… -7.91 -34.8 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
9 2550… Centropomus u… -7.91 -34.8 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
10 2550… Calophysus ma… -3.18 -59.9 cdrou… 50c9509d-… 28eb1a3f-1c15-4…
# … with 1,990 more rows, and 138 more variables: installationKey <chr>,
# publishingCountry <chr>, protocol <chr>, lastCrawled <chr>, lastParsed <chr>,
# crawlId <int>, extensions <chr>, basisOfRecord <chr>, occurrenceStatus <chr>,
# taxonKey <int>, kingdomKey <int>, phylumKey <int>, classKey <int>, orderKey <int>,
# familyKey <int>, genusKey <int>, speciesKey <int>, acceptedTaxonKey <int>,
# acceptedScientificName <chr>, kingdom <chr>, phylum <chr>, order <chr>, family <chr>,
# genus <chr>, species <chr>, genericName <chr>, specificEpithet <chr>, taxonRank <chr>,
# taxonomicStatus <chr>, dateIdentified <chr>, coordinateUncertaintyInMeters <dbl>,
# stateProvince <chr>, year <int>, month <int>, day <int>, eventDate <chr>,
# modified <chr>, lastInterpreted <chr>, references <chr>, license <chr>,
# identifiers <chr>, facts <chr>, relations <chr>, gadm.level0.gid <chr>,
# gadm.level0.name <chr>, gadm.level1.gid <chr>, gadm.level1.name <chr>,
# gadm.level2.gid <chr>, gadm.level2.name <chr>, gadm.level3.gid <chr>,
# gadm.level3.name <chr>, geodeticDatum <chr>, class <chr>, countryCode <chr>,
# recordedByIDs <chr>, identifiedByIDs <chr>, country <chr>, rightsHolder <chr>,
# identifier <chr>, http...unknown.org.nick <chr>, verbatimEventDate <chr>,
# datasetName <chr>, collectionCode <chr>, gbifID <chr>, verbatimLocality <chr>,
# occurrenceID <chr>, taxonID <chr>, catalogNumber <chr>, recordedBy <chr>,
# http...unknown.org.occurrenceDetails <chr>, institutionCode <chr>, rights <chr>,
# eventTime <chr>, identifiedBy <chr>, identificationID <chr>, name <chr>,
# occurrenceRemarks <chr>, gadm <chr>, informationWithheld <chr>,
# recordedByIDs.type <chr>, recordedByIDs.value <chr>, individualCount <int>,
# establishmentMeans <chr>, continent <chr>, organismQuantityType <chr>, habitat <chr>,
# http...rs.tdwg.org.dwc.terms.organismQuantity <chr>,
# georeferenceVerificationStatus <chr>, verbatimSRS <chr>, verbatimCoordinateSystem <chr>,
# county <chr>, locality <chr>, taxonRemarks <chr>, preparations <chr>, disposition <chr>,
# vernacularName <chr>, organismName <chr>, fieldNotes <chr>, originalNameUsage <chr>,
# http...rs.tdwg.org.dwc.terms.organismQuantityType <chr>, …
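You'll notice that what comes back contains not just the data but also other metadata. To glue all of the data elements back together into one big data frame, map over the list again: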
library(dplyr)  # provides %>% and bind_rows()
glued_data <-
  purrr::map(list_of_tibbles, "data") %>%  # extract the "data" element of each result
  bind_rows()
dim(glued_data)
[1] 10000 162
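As a rough check on how long the full run will take, here is the paging arithmetic for the numbers in the question. One caveat, to the best of my knowledge and worth confirming in the rgbif documentation: the GBIF search API stops paging once start + limit goes past 100,000 records, and larger pulls are meant to go through the download route (occ_download() in rgbif). The 100,000 figure in the sketch is therefore an assumption, not something taken from the question.

```r
total     <- 323200   # occurrence count reported by occ_count()
page_size <- 2000     # records requested per call

# Number of calls needed, and the zero-based offset for each call
n_pages <- ceiling(total / page_size)
offsets <- (seq_len(n_pages) - 1) * page_size

# The last page is only partly filled
last_page_rows <- total - max(offsets)

# Assumed search cap (see the note above): keep offsets whose page fits under it
search_cap        <- 100000
reachable_offsets <- offsets[offsets + page_size <= search_cap]

n_pages
last_page_rows
length(reachable_offsets)
```

If that cap applies, the map() approach will only reach the first 100,000 records, and a download request is the way to get the rest.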