我赞赏下面提出了类似的问题,但我所看到的问题都没有超过数万行。即使代码很慢,这些虽然“大”也相对琐碎。就我而言,我正在处理数以百万计的行,因此即使是很小的优化也可能非常有用。
到目前为止我一直依赖的答案在这里和这里。我进行的升级并不令人惊讶...
大致而言,我有一个大约2,000,000行(格式为easting-northing-value)的参考文件,从中我寻找最接近的值ProcessedLineData
(我不能保证列表中存在完全匹配的内容,因此我需要在整个列出每个实例以找到最接近的可行点)。
我选择的第一个选择是:非常慢,它是:
AsciiFile.Sample _closestSample = Modifications.Value_and_Ref_Calc.ReferenceFile.Dataset[0].Data
.OrderBy(t => Public.CalculateDistance(t.Easting, t.Northing, ProcessedLineData.Easting, ProcessedLineData.Northing))
.First();
此后,我认为第二个示例Aggregate
会更好,所以我这样做了:
AsciiFile.Sample _closestSample = Modifications.Value_and_Ref_Calc.ReferenceFile.Dataset[0].Data
.Aggregate((x, y) =>
Public.CalculateDistance(x.Easting, x.Northing, ProcessedLineData.Easting, ProcessedLineData.Northing)
<
Public.CalculateDistance(y.Easting, y.Northing, ProcessedLineData.Easting, ProcessedLineData.Northing)
? x : y);
这仍然很慢(与上一个几乎相同)。然后,我想到了使用ToLookup()
其中一个示例中也介绍过的,但是我没有看到它会带来什么特定的好处。
如果相关,Public.CalculateDistance
则为:
public static double CalculateDistance(double _x1, double _y1, double _x2, double _y2)
{
return Math.Sqrt(Math.Pow(_x1 - _x2, 2) + Math.Pow(_y1 - _y2, 2));
}
在这些示例中,Modifications.Value_and_Ref_Calc.ReferenceFile.Dataset[0].Data
我列出了2,000,000个参考点。
如果我没有ProcessedLineData
找到最接近的值的数亿个点(那么,从具有数万个点的小文件到具有数千万个点的大文件的众多文件),这将变得不那么重要。 ..
最终目的:我正在寻找最接近的值,以便能够使用与特定对象关联的海拔高度,Modifications.Value_and_Ref_Calc.ReferenceFile.Dataset[0].Data
并使用它来修改my中的值ProcessedLineData
。整个过程中最慢的部分肯定是我最接近的值的搜索。
有什么明显的方法可以优化我所缺少的代码吗?
在不了解值的范围和精度的更多信息的情况下,或者对于查找的分布与参考点列表更改的假设不了解太多的情况下,对for
循环进行一些简单的优化后,相对于更快的OrderBy
/First
代码,100次查找的速度提高了约30倍:
pld
为您使用ProcessedLineData
,data
为Modifications.Value_and_Ref_Calc.ReferenceFile.Dataset[0].Data
您得到:
var _closestSample = data[0];
var dist = (_closestSample.Easting - pld.Easting) * (_closestSample.Easting - pld.Easting) + (_closestSample.Northing - pld.Northing) * (_closestSample.Northing - pld.Northing);
for (int j2 = 1; j2 < data.Count; ++j2) {
var y = data[j2];
var ydist = (y.Easting - pld.Easting) * (y.Easting - pld.Easting) + (y.Northing - pld.Northing) * (y.Northing - pld.Northing);
if (ydist < dist) {
dist = ydist;
_closestSample = y;
}
}
以我的时间超过2,000,000个条目data
列表和100次查找,OrderBy
/First
需要2.22秒,而for
0.06秒需要32倍的加速。
因此,我确定有比暴力破解更好的方法,并且经过一番研究,发现了莫顿密码和希尔伯特曲线。进行一些工作即可SpatialIndex
使用希尔伯特曲线生成一个类,并SpatialIndexMorton
使用莫顿索引生成一个类。我还将希尔伯特索引调整为仅索引位16-32,这提供了每秒的最佳查找。对于我的数据,莫顿曲线现在要快一些。
使用相同的随机数据测试,我发现该for
方法可以进行每秒147次查询,每秒希尔伯特索引5634次查询和每秒莫尔顿索引7370次查询,超过10,000次使用2,000,000个参考点的查询。请注意,空间索引大约需要3秒钟的设置时间,因此对于极少数的查找来说,执行暴力破解的速度更快for
-我在468个查找附近获得了收支平衡时间。
为了使这个(某种程度上)通用,我从一个用于地球坐标的(C#8.0)接口开始,该接口提供了一些辅助方法:
public interface ICoordinate {
double Longitude { get; set; }
double Latitude { get; set; }
public ulong MortonCode() {
float f = (float)Latitude;
uint ui;
unsafe { // perform unsafe cast (preserving raw binary)
float* fRef = &f;
ui = *((uint*)fRef);
}
ulong ixl = ui;
f = (float)Longitude;
unsafe { // perform unsafe cast (preserving raw binary)
float* fRef = &f;
ui = *((uint*)fRef);
}
ulong iyl = ui;
ixl = (ixl | (ixl << 16)) & 0x0000ffff0000ffffL;
iyl = (iyl | (iyl << 16)) & 0x0000ffff0000ffffL;
ixl = (ixl | (ixl << 8)) & 0x00ff00ff00ff00ffL;
iyl = (iyl | (iyl << 8)) & 0x00ff00ff00ff00ffL;
ixl = (ixl | (ixl << 4)) & 0x0f0f0f0f0f0f0f0fL;
iyl = (iyl | (iyl << 4)) & 0x0f0f0f0f0f0f0f0fL;
ixl = (ixl | (ixl << 2)) & 0x3333333333333333L;
iyl = (iyl | (iyl << 2)) & 0x3333333333333333L;
ixl = (ixl | (ixl << 1)) & 0x5555555555555555L;
iyl = (iyl | (iyl << 1)) & 0x5555555555555555L;
return ixl | (iyl << 1);
}
const int StartBitMinus1 = 31;
const int EndBit = 16;
//convert (x,y) to 31-bit Hilbert Index
public ulong HilbertIndex() {
float f = (float)Latitude;
uint x;
unsafe { // perform unsafe cast (preserving raw binary)
float* fRef = &f;
x = *((uint*)fRef);
}
f = (float)Longitude;
uint y;
unsafe { // perform unsafe cast (preserving raw binary)
float* fRef = &f;
y = *((uint*)fRef);
}
ulong hi = 0;
for (int bitpos = StartBitMinus1; bitpos >= EndBit; --bitpos) {
// extract s'th bit from x & y
var rx = (x >> bitpos) & 1;
var ry = (y >> bitpos) & 1;
hi <<= 2;
hi += (rx << 1) + (rx ^ ry);
//rotate/flip a quadrant appropriately
if (ry == 0) {
if (rx == 1) {
x = ~x;
y = ~y;
}
//Swap x and y
uint t = x;
x = y;
y = t;
}
}
return hi;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public double DistanceTo(ICoordinate b) =>
Math.Sqrt((Longitude - b.Longitude) * (Longitude - b.Longitude) + (Latitude - b.Latitude) * (Latitude - b.Latitude));
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public double Distance2To(ICoordinate b) => (Longitude - b.Longitude) * (Longitude - b.Longitude) + (Latitude - b.Latitude) * (Latitude - b.Latitude);
public ICoordinate MakeNew(double plat, double plong);
}
public static class ICoordinateExt {
public static ICoordinate Minus(this ICoordinate a, ICoordinate b) =>
a.MakeNew(a.Latitude - b.Latitude, a.Longitude - b.Longitude);
public static ICoordinate Plus(this ICoordinate a, ICoordinate b) =>
a.MakeNew(a.Latitude + b.Latitude, a.Longitude + b.Longitude);
}
然后,使用一个真实的类来实现该接口(它将被您的真实的类替换):
public class PointOfInterest : ICoordinate {
public double Longitude { get; set; }
public double Latitude { get; set; }
public PointOfInterest(double plat, double plong) {
Latitude = plat;
Longitude = plong;
}
public ICoordinate MakeNew(double plat, double plong) => new PointOfInterest(plat, plong);
}
和一个使用Hilbert曲线将aIEnumerable<ICoordinate>
转换为空间索引集合的类ICoordinate
:
public class SpatialIndex {
SortedList<ulong, List<ICoordinate>> orderedData;
List<ulong> orderedIndexes;
public SpatialIndex(IEnumerable<ICoordinate> data) {
orderedData = data.GroupBy(d => d.HilbertIndex()).ToSortedList(g => g.Key, g => g.ToList());
orderedIndexes = orderedData.Keys.ToList();
}
public ICoordinate FindNearest(ICoordinate aPoint) {
var hi = aPoint.HilbertIndex();
var nearestIndex = orderedIndexes.FindNearestIndex(hi);
var nearestGuess = orderedData.Values[nearestIndex][0];
var guessDist = (nearestGuess.Longitude - aPoint.Longitude) * (nearestGuess.Longitude - aPoint.Longitude) + (nearestGuess.Latitude - aPoint.Latitude) * (nearestGuess.Latitude - aPoint.Latitude);
if (nearestIndex > 0) {
var tryGuess = orderedData.Values[nearestIndex-1][0];
var tryDist = (tryGuess.Longitude - aPoint.Longitude) * (tryGuess.Longitude - aPoint.Longitude) + (tryGuess.Latitude - aPoint.Latitude) * (tryGuess.Latitude - aPoint.Latitude);
if (tryDist < guessDist) {
nearestGuess = tryGuess;
guessDist = tryDist;
}
}
var offsetPOI = new PointOfInterest(guessDist, guessDist);
var minhi = (aPoint.Minus(offsetPOI)).HilbertIndex();
var minhii = orderedIndexes.FindNearestIndex(minhi);
if (minhii > 0)
--minhii;
var maxhi = (aPoint.Plus(offsetPOI)).HilbertIndex();
var maxhii = orderedIndexes.FindNearestIndex(maxhi);
for (int j2 = minhii; j2 < maxhii; ++j2) {
var tryList = orderedData.Values[j2];
for (int j3 = 0; j3 < tryList.Count; ++j3) {
var y = tryList[j3];
var ydist = (y.Longitude - aPoint.Longitude) * (y.Longitude - aPoint.Longitude) + (y.Latitude - aPoint.Latitude) * (y.Latitude - aPoint.Latitude);
if (ydist < guessDist) {
nearestGuess = y;
guessDist = ydist;
}
}
}
return nearestGuess;
}
}
还有一个类似的使用莫顿曲线的类:
public class SpatialIndexMorton {
SortedList<ulong, List<ICoordinate>> orderedData;
List<ulong> orderedIndexes;
public SpatialIndexMorton(IEnumerable<ICoordinate> data) {
orderedData = data.GroupBy(d => d.MortonCode()).ToSortedList(g => g.Key, g => g.ToList());
orderedIndexes = orderedData.Keys.ToList();
}
public ICoordinate FindNearest(ICoordinate aPoint) {
var mc = aPoint.MortonCode();
var nearestIndex = orderedIndexes.FindNearestIndex(mc);
var nearestGuess = orderedData.Values[nearestIndex][0];
var guessDist = (nearestGuess.Longitude - aPoint.Longitude) * (nearestGuess.Longitude - aPoint.Longitude) + (nearestGuess.Latitude - aPoint.Latitude) * (nearestGuess.Latitude - aPoint.Latitude);
if (nearestIndex > 0) {
var tryGuess = orderedData.Values[nearestIndex-1][0];
var tryDist = (tryGuess.Longitude - aPoint.Longitude) * (tryGuess.Longitude - aPoint.Longitude) + (tryGuess.Latitude - aPoint.Latitude) * (tryGuess.Latitude - aPoint.Latitude);
if (tryDist < guessDist) {
nearestGuess = tryGuess;
guessDist = tryDist;
}
}
var offsetPOI = new PointOfInterest(guessDist, guessDist);
var minmc = (aPoint.Minus(offsetPOI)).MortonCode();
var minmci = orderedIndexes.FindNearestIndex(minmc);
if (minmci > 0)
--minmci;
var maxmc = (aPoint.Plus(offsetPOI)).MortonCode();
var maxmci = orderedIndexes.FindNearestIndex(maxmc);
for (int j2 = minmci; j2 < maxmci; ++j2) {
var tryList = orderedData.Values[j2];
for (int j3 = 0; j3 < tryList.Count; ++j3) {
var y = tryList[j3];
var ydist = (y.Longitude - aPoint.Longitude) * (y.Longitude - aPoint.Longitude) + (y.Latitude - aPoint.Latitude) * (y.Latitude - aPoint.Latitude);
if (ydist < guessDist) {
nearestGuess = y;
guessDist = ydist;
}
}
}
return nearestGuess;
}
}
还有一些辅助扩展方法:
public static class ListExt {
public static int FindNearestIndex<T>(this List<T> l, T possibleKey) {
var keyIndex = l.BinarySearch(possibleKey);
if (keyIndex < 0) {
keyIndex = ~keyIndex;
if (keyIndex == l.Count)
keyIndex = l.Count - 1;
}
return keyIndex;
}
}
public static class IEnumerableExt {
public static SortedList<TKey, TValue> ToSortedList<T, TKey, TValue>(this IEnumerable<T> src, Func<T, TKey> keySelector, Func<T, TValue> valueSelector) =>
new SortedList<TKey, TValue>(src.ToDictionary(keySelector, valueSelector));
}
最后,一些使用此示例代码,您的引用在中data
,您的查找值在中plds
:
var hilbertIndex = new SpatialIndex(data);
var ans = new (ICoordinate, ICoordinate)[lookups];
for (int j1 = 0; j1 < lookups; ++j1) {
ICoordinate pld = plds[j1];
ans[j1] = (pld, hilbertIndex.FindNearest(pld));
}
更新:我修改了查找最近的算法,以获取索引上目标点上方和下方的点的最接近点,而不仅仅是尝试上面的点。这提供了另一个不错的加速。
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句