Imagine you have a list of numbers (or letters), such as
1177783777297461145777267337774652113777236237118777
I want to find the most frequent combinations of numbers in this list:
for 1-digit-long combinations - it is the most frequent number in this list
for 2-digit-long combinations - probably '11'
for 3-digits-long combinations - probably '777' etc
Is there some special algorythm for such a tasks?
UPDATE Well, by myself I have coded the following (Java). Looks like execution time is proportional to data size multiplied by pattern size:
public static void main(String[] args)
{
int DATA_SIZE = 10000;
int[] data = new int[DATA_SIZE];
for (int i = 0; i < DATA_SIZE; i++)
{
data[i] = (int) (10 * Math.random()) % 10;
System.out.print(data[i]);
}
int[] pattern1 = new int[]{1, 2, 3};
int[] pattern2 = new int[]{7, 7, 7};
int[] pattern3 = new int[]{7, 7};
System.out.println();
System.out.println(match(data, pattern1));
System.out.println(match(data, pattern2));
System.out.println(match(data, pattern3));
}
static int match(int[] data, int[] pattern)
{
int matches = 0;
int i = 0;
while (i < data.length)
{
matches = isEqual(data, i, pattern) ? matches + 1 : matches;
i++;
}
return matches;
}
static boolean isEqual(int[] a, int startIndex, int[] a2)
{
if (a == a2)
{
return true;
}
if (a == null || a2 == null)
{
return false;
}
for (int i = 0; i < a2.length; i++)
{
if (a[startIndex + i] != a2[i])
{
return false;
}
}
return true;
}
To find the largest number of repeats of a sequence of length at least k in a string of length n, you can build a suffix tree (http://en.wikipedia.org/wiki/Suffix_tree) in linear time, then find the node that has the most children that describes sequences of length k (or more) from the root.
Overall, this is linear time in the length of the input string.
For small k, you're better off with a naive algorithm:
from collections import Counter
def input(s, k):
c = Counter()
for i in xrange(len(s) - k):
c[s[i : i + k]] += 1
return c.most_common(1)[0][0]
for k in xrange(1, 4):
print input('1177783777297461145777267337774652113777236237118777', k)
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句