How do I assign value to a row in a dataframe based on another in r?

LJA Published at Dev

LJA

I am looking at data on butterflies that have been caught in different samples. My problem is that the 'names' (numbers) used for the same species have been inconsistent. Each species has been assigned a number to identify it.

I have two dataframes. The first, "mydata", is a dataset of counts of each species, but some species in it have been recorded under several different IDs instead of a single correct one. So two different numbers may refer to the same species, and I need to make sure my names are standardised.

IDs <- c(10,8,3,42,7,23,42,2)
sample1 <- c(0,0,2,0,3,0,0,2)
sample2 <- c(0,1,0,2,4,0,3,1)
sample3 <- c(0,1,1,0,2,0,3,1)
sample4 <- c(0,2,0,2,0,1,2,1)
sample5 <- c(3,1,0,0,1,0,0,1)
mydata <- cbind(IDs,sample1,sample2,sample3,sample4,sample5)

I have a second dataframe, "specieslist", that I am using as a reference. It contains the correct ID plus all alternative IDs that may have been used.

ID1 <- c(10,34,20,2,7,38)
ID2 <- c(22,3,42,NA,6,23)
ID3 <- c(NA,8,NA,NA,1,NA)
correct.ID <- c(10,3,20,2,1,23)
specieslist <- cbind(ID1,ID2,ID3,correct.ID)
splist <- replace(specieslist,is.na(specieslist),0)

I want to search specieslist to find out which number should be used in mydata, and assign the correct ID to a new column in mydata.
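For a lookup like this, a loop is not strictly needed. The following is a minimal base-R sketch (not code from the original post) that flattens the reference table into one vector and uses match(), which returns the position of the first occurrence of each ID, or NA when the ID is absent. It compares the numbers directly instead of converting them to character, and it assumes that each alternative ID belongs to only one species.

```r
# Data from the question
IDs <- c(10, 8, 3, 42, 7, 23, 42, 2)
sample1 <- c(0, 0, 2, 0, 3, 0, 0, 2)
sample2 <- c(0, 1, 0, 2, 4, 0, 3, 1)
sample3 <- c(0, 1, 1, 0, 2, 0, 3, 1)
sample4 <- c(0, 2, 0, 2, 0, 1, 2, 1)
sample5 <- c(3, 1, 0, 0, 1, 0, 0, 1)
mydata <- cbind(IDs, sample1, sample2, sample3, sample4, sample5)

# Reference table from the question (NA = no alternative ID)
specieslist <- cbind(ID1 = c(10, 34, 20, 2, 7, 38),
                     ID2 = c(22, 3, 42, NA, 6, 23),
                     ID3 = c(NA, 8, NA, NA, 1, NA),
                     correct.ID = c(10, 3, 20, 2, 1, 23))

# Flatten every column (column by column) into one vector of known IDs
alt <- as.vector(specieslist)
# Repeat correct.ID once per column so each known ID lines up with its row's correct ID
correct <- rep(specieslist[, "correct.ID"], times = ncol(specieslist))

# Position of each mydata ID in alt (NA if not found), then the matching correct ID
corr.sp <- correct[match(mydata[, "IDs"], alt)]
corr.sp
# [1] 10  3  3 20  1 23 20  2

mydata.corrsps <- cbind(mydata, corr.sp)
```

Any ID that is missing from specieslist simply becomes NA in corr.sp instead of stopping with an error, so unmatched IDs are easy to spot afterwards with which(is.na(corr.sp)).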

I have been trying to write a loop that finds which row of specieslist contains each value in mydata and then selects the value in the correct.ID column for that row.

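corr.sp <- c(NULL)
rws <- length(mydata[,1])                 # number of rows in mydata
for(s in 1:rws){
  dat <- as.character(mydata[s,1])        # ID to look up
  pos <- which(splist==dat, arr.ind=TRUE) # (row, col) of every match in splist
  ind <- pos[1,1]                         # row of the first match
  corr <- as.matrix(splist[ind,4])        # correct.ID for that row
  corr.sp <- c(corr.sp,corr)
}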

mydata.corrsps <- cbind(mydata,corr.sp)

What I expect is for corr.sp and mydata.corrsps to look like this:

corr.sp <- c(10,3,3,20,1,23,20,2)
mydata.corrsps <- cbind(mydata,corr.sp)

This demo code seems to work, but on some of my real data an error appears when I run the loop, saying that my row index (pos[1,1]) is out of bounds. I've had this error before when the loop searched for species that weren't in the reference dataset, but I have been through and removed any rows where this applies, then saved the file as a CSV and re-imported it to avoid row-index mix-ups (which seem to happen when subsetting data in R). I have also checked that the maximum value of pos[1,1] does not exceed the number of rows available for selection, and that every value the loop searches for is present in the data.
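One way to check this is sketched below; it is a diagnostic suggestion, not the original poster's code. When which(..., arr.ind = TRUE) finds no match it returns a matrix with zero rows, and indexing it with pos[1,1] then fails with "subscript out of bounds". The sketch lists the IDs that occur nowhere in splist and guards the loop against the no-match case. The ID 99 is hypothetical, added only to reproduce the error.

```r
# Reference table from the question, with NAs replaced by 0 as in splist
ID1 <- c(10, 34, 20, 2, 7, 38)
ID2 <- c(22, 3, 42, NA, 6, 23)
ID3 <- c(NA, 8, NA, NA, 1, NA)
correct.ID <- c(10, 3, 20, 2, 1, 23)
specieslist <- cbind(ID1, ID2, ID3, correct.ID)
splist <- replace(specieslist, is.na(specieslist), 0)

# IDs from the question plus a hypothetical unlisted ID (99)
mydata_IDs <- c(10, 8, 3, 42, 7, 23, 42, 2, 99)

# IDs in mydata that occur nowhere in splist
missing_IDs <- setdiff(mydata_IDs, as.vector(splist))
missing_IDs
# [1] 99

# Same lookup as the loop, but return NA instead of indexing an empty match matrix
corr.sp <- numeric(length(mydata_IDs))
for (s in seq_along(mydata_IDs)) {
  pos <- which(splist == mydata_IDs[s], arr.ind = TRUE)
  corr.sp[s] <- if (nrow(pos) > 0) splist[pos[1, 1], "correct.ID"] else NA
}
corr.sp
```

If setdiff() returns anything on the real data, those are the IDs that trigger the error. Printing them (for example with dput()) may also reveal values that look identical but are stored differently, such as text with stray spaces.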

I would be very grateful if anyone could suggest a better way of doing what I am unsuccessfully trying to do, or point out where I am going wrong.

mtoto

You could convert splist to long format and then merge the relevant columns with mydata:

library(tidyr)
library(dplyr)

# splist to long format
long.splist <- data.frame(splist) %>% gather(key, IDs, ID1:ID3)

# merge
merge(mydata,long.splist[,c(3,1)])
#  IDs sample1 sample2 sample3 sample4 sample5 correct.ID
#1   2       2       1       1       1       1          2
#2   3       2       0       1       0       0          3
#3   7       3       4       2       0       1          1
#4   8       0       1       1       2       1          3
#5  10       0       0       0       0       3         10
#6  23       0       0       0       1       0         23
#7  42       0       2       0       2       0         20
#8  42       0       3       3       2       0         20

The result is ordered by IDs, as that's the column on which the join was performed.
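If the original row order matters, or if some IDs might be missing from the reference table, a variation of the same idea is sketched below in base R (the row helper column is an illustrative addition, not part of the answer above). merge() performs an inner join by default, so rows whose ID has no match are silently dropped. all.x = TRUE keeps them with NA in correct.ID, and the helper column restores the original order.

```r
# Data from the question
IDs <- c(10, 8, 3, 42, 7, 23, 42, 2)
sample1 <- c(0, 0, 2, 0, 3, 0, 0, 2)
sample2 <- c(0, 1, 0, 2, 4, 0, 3, 1)
sample3 <- c(0, 1, 1, 0, 2, 0, 3, 1)
sample4 <- c(0, 2, 0, 2, 0, 1, 2, 1)
sample5 <- c(3, 1, 0, 0, 1, 0, 0, 1)
mydata <- cbind(IDs, sample1, sample2, sample3, sample4, sample5)
specieslist <- cbind(ID1 = c(10, 34, 20, 2, 7, 38),
                     ID2 = c(22, 3, 42, NA, 6, 23),
                     ID3 = c(NA, 8, NA, NA, 1, NA),
                     correct.ID = c(10, 3, 20, 2, 1, 23))

# Long lookup table: one row per (alternative ID, correct ID) pair, NAs dropped
long.sp <- data.frame(IDs = as.vector(specieslist[, c("ID1", "ID2", "ID3")]),
                      correct.ID = rep(specieslist[, "correct.ID"], times = 3))
long.sp <- long.sp[!is.na(long.sp$IDs), ]

# Record the original row order, then left-join so unmatched IDs are kept as NA
md <- data.frame(mydata, row = seq_len(nrow(mydata)))
res <- merge(md, long.sp, by = "IDs", all.x = TRUE)

# Restore the original order and drop the helper column
res <- res[order(res$row), setdiff(names(res), "row")]
rownames(res) <- NULL
res
```

This assumes each alternative ID appears only once in the lookup table; if the same ID were listed for two species, the join would duplicate that row of mydata.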

edited at 2021-02-25