Automating (looping) Euclidean distance measurements in R

Posted by debugcn in Dev

皮肤科医生

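Aim: I want to automate (loop) the code below so that I don't have to run it by hand for each sample. I have a bad habit of writing everything out longhand in base R and need to start using loops, which I find hard to implement.

Data: I have two data frames: the sample data (samples) and the reference data (ref). Both contain the same variables (x, y, z).

Code description: For each sample (samples$sample_name), I want to compute its Euclidean distance to every case in the reference data. The result is then used to re-order the reference data, showing which points are "closest" to the sample point in Euclidean (3-dimensional) space.

My current code lets me simply replace the sample name (i.e. "s1") and re-run the code, with a final change to the file name of the .csv file. The output is the reference data listed in order of closeness to the sample (in Euclidean space).

I would like to automate the process (put it in a loop?) so that I can run it on the two data frames using the list of sample names (samples$sample_name), and ideally export the results to .csv files automatically as well.

Any help would be greatly appreciated!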

# Reference data
country<-c("Austria","Austria","Italy","Italy","Turkey","Romania","France")
x<-c(18.881,18.881,18.929,19.139,19.008,19.083,18.883)
y<-c(15.627,15.627,15.654,15.772,15.699,15.741,15.629)
z<-c(38.597,38.597,38.842,39.409,39.048,39.224,38.740)
pb_age<-c(-106,-106,-87,-6,-55,-26,-104)
ref<-data.frame(country,x,y,z,pb_age) # Reference data

# Sample data (for euclidean measurements against Reference data)
sample_name<-c("s1","s2","s3")
x2<-c(18.694,18.729,18.731)
y2<-c(15.682,15.683,15.677)
z2<-c(38.883,38.989,38.891)
pb_age2<-c(120,97,82)
samples<-data.frame(sample_name,x2,y2,z2,pb_age2) # Sample data
colnames(samples)<-c("sample_name","x","y","z","pb_age") # To match Reference data headings

# Euclidean distance measurements
library(fields) # Need package for Euclidean distances

# THIS IS WHAT I WANT TO AUTOMATE/LOOP (BELOW)...
# Currently, I have to update the 'id' for each sample to get a result (for each sample)

id<-"s1"  # Sample ID - this is simply changed so the following code can be re-run for each sample

# The code
x1<-samples[which(samples$sample_name==id),c("x","y","z")]
x2<-ref[,c("x","y","z")]

result_distance<-rdist(x1,x2) # Computing the Euclidean distance
result_distance<-as.vector(result_distance) # Saving the results as a vector

euclid_ref<-data.frame(result_distance,ref) # Creating a new data.frame adding the Euclidean distances to the original Reference data
colnames(euclid_ref)[1]<-"euclid_distance" # Updating the column name for the result

# Saving and exporting the results
results<-euclid_ref[order(euclid_ref$euclid_distance),] # Re-ordering the data.frame by the Euclidean distances, smallest to largest
write.csv(results, file="s1.csv")   # Ideally, I want the file name to be the same as the SAMPLE id, i.e. s1, s2, s3...

达卡森

A loop would be simple enough, but a more R-like solution is to take advantage of vectorization and the apply family of functions:

result_distances <- data.frame(t(rdist(samples[, 2:4], ref[, 2:4])), ref)
colnames(result_distances)[1:3] <- rep("euclid_distance", 3)
# str(result_distances)
# 'data.frame': 7 obs. of  8 variables:
#  $ euclid_distance: num  0.346 0.346 0.24 0.695 0.355 ...
#  $ euclid_distance: num  0.424 0.424 0.25 0.594 0.286 ...
#  $ euclid_distance: num  0.334 0.334 0.205 0.666 0.319 ...
#  $ country        : chr  "Austria" "Austria" "Italy" "Italy" ...
#  $ x              : num  18.9 18.9 18.9 19.1 19 ...
#  $ y              : num  15.6 15.6 15.7 15.8 15.7 ...
#  $ z              : num  38.6 38.6 38.8 39.4 39 ...
#  $ pb_age         : num  -106 -106 -87 -6 -55 -26 -104
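
As a quick sanity check on the first value in the str() output above, the distance between sample s1 and the first Austria row can be worked out by hand. This is only a sketch in base R (it does not use fields); the coordinates are copied from the question, and the names s1 and austria are chosen here purely for illustration.

```r
# Coordinates copied from the question's data
s1      <- c(x = 18.694, y = 15.682, z = 38.883)  # sample "s1"
austria <- c(x = 18.881, y = 15.627, z = 38.597)  # first row of ref

# Euclidean distance: square root of the summed squared differences
d <- sqrt(sum((s1 - austria)^2))
round(d, 3)
```

The rounded result agrees with the 0.346 shown in the first euclid_distance column.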

Normally we would not give several columns the same name, but we plan to pull them apart in the next step:

results <- lapply(1:3, function(i) data.frame(result_distances[order(result_distances[, i]), c(i, 4:8)]))
names(results) <- samples$sample_name

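We now have a list, results, containing three data frames named "s1", "s2" and "s3". Lists make it easy to apply a function to many similarly organized data sets. For example, results[["s1"]] or results[[1]] prints the data frame for the first sample. Now we write out the results: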

sapply(names(results), function(x) write.csv(results[[x]], file=paste0(x, ".csv")))

This creates three files in the working directory: "s1.csv", "s2.csv" and "s3.csv".
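
For comparison, the plain loop mentioned at the start of this answer might look like the sketch below. It is not the answerer's code: it swaps rdist() for a base R calculation so that it runs without the fields package, and out_dir is a hypothetical output folder (a temporary directory here). The data are copied from the question.

```r
# Data copied from the question
ref <- data.frame(
  country = c("Austria", "Austria", "Italy", "Italy", "Turkey", "Romania", "France"),
  x = c(18.881, 18.881, 18.929, 19.139, 19.008, 19.083, 18.883),
  y = c(15.627, 15.627, 15.654, 15.772, 15.699, 15.741, 15.629),
  z = c(38.597, 38.597, 38.842, 39.409, 39.048, 39.224, 38.740),
  pb_age = c(-106, -106, -87, -6, -55, -26, -104),
  stringsAsFactors = FALSE
)
samples <- data.frame(
  sample_name = c("s1", "s2", "s3"),
  x = c(18.694, 18.729, 18.731),
  y = c(15.682, 15.683, 15.677),
  z = c(38.883, 38.989, 38.891),
  pb_age = c(120, 97, 82),
  stringsAsFactors = FALSE
)

coords  <- c("x", "y", "z")
out_dir <- tempdir()   # hypothetical output folder; change as needed
results <- list()

for (id in samples$sample_name) {
  # Coordinates of the current sample as a plain numeric vector
  p <- unlist(samples[samples$sample_name == id, coords])

  # Subtract the sample from every reference row, then take row-wise norms
  diffs <- sweep(as.matrix(ref[, coords]), 2, p)
  euclid_distance <- sqrt(rowSums(diffs^2))

  # Attach the distances to the reference data and sort nearest first
  out <- data.frame(euclid_distance, ref)
  out <- out[order(out$euclid_distance), ]

  # Keep the table in a named list and write it to <id>.csv
  results[[id]] <- out
  write.csv(out, file = file.path(out_dir, paste0(id, ".csv")))
}

# Nearest reference country for each sample
sapply(results, function(d) d$country[1])
```

The loop keeps the same sorted tables in results that the lapply() version builds, so either approach can feed later steps; the vectorized version simply computes all distances in one call.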

Edited on 2021-04-05