我有一个字符串数组,用于存储旧单词和相应的新单词。
string[,] arrayWords = new string[,] { { "daniel", "dany" }, { "ebrid", "ebraham" }, { "orlang", "lang" }, { "edison", "edwaid" } };
现在我想用这个数组检查输入字符串的每个单词,并需要用新单词替换旧单词。什么是最快的检查方法?
例如:如果string InputString = "ebrid jackson";
我想用 ebraham 替换那个 ebrid 并且需要将结果输出作为“ebraham jackson”
我有一个字符串数组,用于存储旧单词和相应的新单词。
string[,] arrayWords = new string[,] { { "daniel", "dany" }, { "ebrid", "ebraham" }, { "orlang", "lang" }, { "edison", "edwaid" } };
现在我想检查输入字符串的每个单词 (...)
因此,您有一个输入字符串(取自您的示例):
string InputString = "ebrid jackson";
此外,您要检查它的每个单词。因此,我们首先将该字符串拆分为单词枚举。例如:
var words = InputString.Split(' ');
然后我们将检查每个条件是否存在:
var words = InputString.Split(' ');
foreach (var word in words)
{
if (someCondition(word))
{
//...
}
}
(...) 使用这个数组 (...)
啊,你正在寻找。好的…
var words = InputString.Split(' ');
foreach (var word in words)
{
for (int index = 0; index < arrayWords.GetLength(0); index++)
{
if (word == arrayWords[index, 0])
{
//...
}
}
}
实际上,我们可以做得更好……而且您已经被告知:
我强烈建议使用字典而不是字符串 [,]。
你可以像这样声明它:
var dictionary = new Dictionary<string,string> { { "daniel", "dany" }, { "ebrid", "ebraham" }, { "orlang", "lang" }, { "edison", "edwaid" } };
或者,您可以根据您拥有的内容构建它:
var dictionary = new Dictionary<string,string> ();
for (int index = 0; index < arrayWords.GetLength(0); index++)
{
dictionary[arrayWords[index, 0]] = arrayWords[index, 1];
}
然后查字典:
var words = InputString.Split(' ');
foreach (var word in words)
{
if (dictionary.ContainsKey(word))
{
//...
}
}
这段代码不仅更简单,而且速度更快,因为字典优化了密钥访问。在字典中查找键的时间是常量 ( O(1)
)[*],因为它内部使用了一个哈希表。另一方面,在数组中查找键的时间是线性的 ( O(N)
),因为您必须逐项检查。请注意,如果您对数组进行了排序,并且要执行二分查找,则结果为O(log(N))
.
[*]:除了哈希冲突,它是线性的。
并且需要用新词替换旧词。
哦,你要换了。我们能做到这一点。
var words = InputString.Split(' ');
var result = new StringBuilder();
var first = true;
foreach (var word in words)
{
if (first)
{
first = false;
}
else
{
result.Append(' ');
}
if (dictionary.ContainsKey(word))
{
result.Append(dictionary[word]);
}
else
{
result.Append(word);
}
}
Console.WriteLine(result.ToString());
输出: ebraham jackson
我想提一下,使用first
这种方式将保留原始空间。或者,您可以Join
:
var words = InputString.Split(' ');
var result = new List<string>();
foreach (var word in words)
{
if (dictionary.ContainsKey(word))
{
result.Add(dictionary[word]);
}
else
{
result.Add(word);
}
}
Console.WriteLine(string.Join(" ", result));
List 的变体更易于阅读和维护。但是,它有额外的成本,因为它需要分配比使用更多的字符串StringBuilder
。
我想我们可以继续使用大家最喜欢的:Linq:
var words = InputString.Split(' ');
var result = string.Join(" ", words.Select(word => dictionary.ContainsKey(word) ? dictionary[word] : word));
当然,我们可以结合 Linq 和 StringBuilder:
var words = inputString.Split(' ');
var str = words.Aggregate((StringBuilder: new StringBuilder(), First: true), (result, word) =>
{
if (!result.First)
{
result.StringBuilder.Append(" ");
}
result.StringBuilder.Append(dictionary.ContainsKey(word) ? dictionary[word] : word);
return (StringBuilder: result.StringBuilder, First: false);
}).ToString();
GC.KeepAlive(str);
什么是最快的检查方法?
让我们考虑一下我们上面所做的替代方案......
好吧,我们可以尝试使用 good old String.Replace
,但我们正在尝试匹配整个单词,并且String.Replace
不会这样做。但是,Regex.Replace
会使用单词边界:
foreach (var pair in dictionary)
{
InputString = Regex.Replace(InputString, @"\b" + Regex.Escape(pair.Key) + @"\b", pair.Value);
}
Console.WriteLine(InputString);
输出: ebraham jackson
我想指出,这个替代代码与我之前发布的代码不完全相同。不同之处在于它甚至会替换它确实替换的内容。也就是说,如果你{ "ebraham", "something" }
在最后有一对,这个替代代码会产生something jackson
而前一个会产生,ebraham jackson
因为逐字逐句它不会替换它已经替换的内容。
如果我们在输入字符串中有 N 个单词,并且有 M 个替换。在前面的代码中,我们会检查O(N)
单词,并为每个单词替换O(log(M))
. 因此,复杂度为O(N*log(M))
。
在这个替代代码中,我们O(M)
检查每个替换,对于每个替换,我们检查O(N)
要替换的整个单词。因此,复杂度为O(N*M)
。
并且O(N*log(M))
比 更好地扩展O(N*M)
。
对于心理锻炼,让我们假设我们有一个方法可以在 1 分钟内处理任何大小的列表,我们有另一个方法可以完成相同的工作,但每个项目需要 2 秒。第一种方法(需要 1 分钟的方法)是,O(1)
而另一种方法是O(N)
. O(1)
比 更好地扩展O(N)
,但是,如果您拥有的列表长度为 29 项或更少,则第二种方法会更快。我发布这个例子是为了表明更好的代码复杂性并不意味着更快的运行时间。
我写了一些我在下面展示的测试。它们将执行 13 个替代方法,每个方法执行 100000000 次。输入与 OP 中显示的相同,这意味着它忽略了缩放。
var arrayWords = new[,] { { "daniel", "dany" }, { "ebrid", "ebraham" }, { "orlang", "lang" }, { "edison", "edwaid" } };
var inputString = "ebrid jackson";
Stopwatch sw;
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 100000000; x > 0; x--)
{
var words = inputString.Split(' ');
var result = new List<string>();
foreach (var word in words)
{
var found = false;
for (var index = arrayWords.GetLength(0) - 1; index >= 0; index--)
{
if (word != arrayWords[index, 0])
{
continue;
}
result.Add(arrayWords[index, 1]);
found = true;
break;
}
if (!found)
{
result.Add(word);
}
}
var str = string.Join(" ", result);
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"array & foreach & List - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 100000000; x > 0; x--)
{
var words = inputString.Split(' ');
var result = new List<string>();
for (int i = 0, wordsLength = words.Length; i < wordsLength; i++)
{
var word = words[i];
var found = false;
for (var index = arrayWords.GetLength(0) - 1; index >= 0; index--)
{
if (word != arrayWords[index, 0])
{
continue;
}
result.Add(arrayWords[index, 1]);
found = true;
break;
}
if (!found)
{
result.Add(word);
}
}
var str = string.Join(" ", result);
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"array & for & List - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 100000000; x > 0; x--)
{
var words = inputString.Split(' ');
var result = new StringBuilder();
var first = true;
foreach (var word in words)
{
if (first)
{
first = false;
}
else
{
result.Append(" ");
}
var found = false;
for (var index = arrayWords.GetLength(0) - 1; index >= 0; index--)
{
if (word != arrayWords[index, 0])
{
continue;
}
result.Append(arrayWords[index, 1]);
found = true;
break;
}
if (!found)
{
result.Append(word);
}
}
var str = result.ToString();
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"array & foreach & StringBuilder - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 100000000; x > 0; x--)
{
var words = inputString.Split(' ');
var result = new StringBuilder();
var first = true;
for (int i = 0, wordsLength = words.Length; i < wordsLength; i++)
{
var word = words[i];
if (first)
{
first = false;
}
else
{
result.Append(" ");
}
var found = false;
for (var index = arrayWords.GetLength(0) - 1; index >= 0; index--)
{
if (word != arrayWords[index, 0])
{
continue;
}
result.Append(arrayWords[index, 1]);
found = true;
break;
}
if (!found)
{
result.Append(word);
}
}
var str = result.ToString();
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"array & for & StringBuilder - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 100000000; x > 0; x--)
{
var str = inputString;
for (var index = arrayWords.GetLength(0) - 1; index >= 0; index--)
{
str = Regex.Replace(str, @"\b" + Regex.Escape(arrayWords[index, 0]) + @"\b", arrayWords[index, 1]);
}
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"array & Regex - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
// Notice I am not cointing the time to build the Dictionary
// I decided this because I suppose it is possible you get
// the input into the Dictionary directly,
// And if you cannot, then this should be fast anyway
// Run your own tests if you disagree.
var dictionary = new Dictionary<string, string>();
for (var index = 0; index < arrayWords.GetLength(0); index++)
{
dictionary[arrayWords[index, 0]] = arrayWords[index, 1];
}
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 0; x < 100000000; x++)
{
var words = inputString.Split(' ');
var result = new List<string>();
foreach (var word in words)
{
result.Add(dictionary.ContainsKey(word) ? dictionary[word] : word);
}
var str = string.Join(" ", result);
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"Dictionary & foreach & List - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 0; x < 100000000; x++)
{
var words = inputString.Split(' ');
var result = new List<string>();
for (int i = 0, wordsLength = words.Length; i < wordsLength; i++)
{
var word = words[i];
result.Add(dictionary.ContainsKey(word) ? dictionary[word] : word);
}
var str = string.Join(" ", result);
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"Dictionary & for & List - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 0; x < 100000000; x++)
{
var words = inputString.Split(' ');
var str = string.Join(" ", words.Select(word => dictionary.ContainsKey(word) ? dictionary[word] : word));
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"Dictionary & Linq - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 0; x < 100000000; x++)
{
var words = inputString.Split(' ');
var result = new StringBuilder();
var first = true;
foreach (var word in words)
{
if (first)
{
first = false;
}
else
{
result.Append(" ");
}
result.Append(dictionary.ContainsKey(word) ? dictionary[word] : word);
}
var str = result.ToString();
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"Dictionary & foreach & StringBuilder - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 0; x < 100000000; x++)
{
var words = inputString.Split(' ');
var result = new StringBuilder();
var first = true;
for (int i = 0, wordsLength = words.Length; i < wordsLength; i++)
{
var word = words[i];
if (first)
{
first = false;
}
else
{
result.Append(" ");
}
result.Append(dictionary.ContainsKey(word) ? dictionary[word] : word);
}
var str = result.ToString();
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"Dictionary & for & StringBuilder - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 0; x < 100000000; x++)
{
var words = inputString.Split(' ');
var str = words.Aggregate((StringBuilder: new StringBuilder(), First: true), (result, word) =>
{
if (!result.First)
{
result.StringBuilder.Append(" ");
}
result.StringBuilder.Append(dictionary.ContainsKey(word) ? dictionary[word] : word);
return (StringBuilder: result.StringBuilder, First: false);
}).ToString();
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"Dictionary & Linq & StringBuilder - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 100000000 - 1; x >= 0; x--)
{
var str = inputString;
foreach (var pair in dictionary)
{
str = Regex.Replace(str, @"\b" + Regex.Escape(pair.Key) + @"\b", pair.Value);
}
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"Dictionary & foreach & Regex - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
sw = Stopwatch.StartNew();
for (var x = 100000000 - 1; x >= 0; x--)
{
var str = dictionary.Aggregate(inputString, (current, pair) => Regex.Replace(current, @"\b" + Regex.Escape(pair.Key) + @"\b", pair.Value));
GC.KeepAlive(str);
}
sw.Stop();
Console.WriteLine($"Dictionary & Linq & Regex - Time taken: {sw.Elapsed.TotalMilliseconds}ms");
GC.Collect();
GC.WaitForPendingFinalizers();
/*----------------------------------*/
输出:
array & foreach & List - Time taken: 28987.198ms
array & for & List - Time taken: 27749.3735ms
array & foreach & StringBuilder - Time taken: 18953.0578ms
array & for & StringBuilder - Time taken: 18682.1843ms
array & Regex - Time taken: 224451.3549ms
Dictionary & foreach & List - Time taken: 30875.7747ms
Dictionary & for & List - Time taken: 31131.463ms
Dictionary & Linq - Time taken: 30594.4158ms
Dictionary & foreach & StringBuilder - Time taken: 22467.4147ms
Dictionary & for & StringBuilder - Time taken: 21804.911ms
Dictionary & Linq & StringBuilder - Time taken: 39030.8662ms
Dictionary & foreach & Regex - Time taken: 233443.6416ms
Dictionary & Linq & Regex - Time taken: 245098.4077ms
你可以看到那里 - 如果你选择相信我的结果 - StringBuilder 是要走的路。
回到缩放问题,我选择了使用数组的更快版本和使用字典的更快版本,并使用四种不同的输入对它们进行了测试,每一种都比前一种大。
我不会在这里发布代码,因为它超过了本网站上答案的最大大小。事实上,我能够测试的所有声称可以在线运行 C# 的东西最终都会超时……所以pastebin。事实上,我最终使用的迭代次数减少了 100 倍,因为大数据集花费的时间太长。
此测试的数据大小为:
initial: 4 matches, 2 words
small: 8 matches, 6 words
medium: 32 matches, 14 words
large: 256 matches, 57 words
输出:
initial
array & for & StringBuilder - Time taken: 191.7385ms
Dictionary & for & StringBuilder - Time taken: 215.2904ms
small
array & for & StringBuilder - Time taken: 622.6672ms
Dictionary & for & StringBuilder - Time taken: 664.1003ms
medium
array & for & StringBuilder - Time taken: 2393.254ms
Dictionary & for & StringBuilder - Time taken: 1353.2854ms
large
array & for & StringBuilder - Time taken: 47555.9804ms
Dictionary & for & StringBuilder - Time taken: 6548.9997ms
如您所见,对于初始数据集和小数据集使用数组更快,但从那里开始使用 Dictionary 更快。
测试规格:
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句