Project Euler prob 10 using haskell

deejay

Okay, so I made a small modification that seems to have made a whole lot of difference to haskell. Here is how it goes:

I implemented the Sieve of Eratosthenes for Prob 10 from project euler. Here's how it goes:

primesBelowN :: Integer -> [Integer]
primesBelowN n = 2:3:filter f [6*k+i | k <- [1..(n-1)`div`6], i <- [-1, 1]]
                 where f x = foldr g True [2..truncate(sqrt(fromInteger x))]
                             where g t ac = (x `rem` t /= 0) && ac

main = print $ sum $ primesBelowN 2000000

Compiling it with ghc, I get a runtime of 8.95 seconds:

$ ghc -O3 010SummationOfPrimes.hs
$ time 010SummationOfPrimes
142913828922
8.739u 0.122s 0:08.95 98.8% 0+0k 2384+0io 1pf+0w

I thought I could optimize the code by taking advantage of haskell's lazy evaluation in the g function. Could be done (or so I thought) by simply placing the ac as the first argument to &&, so that it does not compute the inequality if ac == False:

primesBelowN :: Integer -> [Integer]
primesBelowN n = 2:3:filter f [6*k+i | k <- [1..(n-1)`div`6], i <- [-1, 1]]
                 where f x = foldr g True [2..truncate(sqrt(fromInteger x))]
                             where g t ac = ac && (x `rem` t /= 0)

main = print $ sum $ primesBelowN 2000000

Surprisingly, it makes the program 4X Slower!. The runtime now, noticeably larger, is 30.94s:

$ ghc -O3 010SummationOfPrimes.hs
$ time 010SummationOfPrimes
142913828922
30.765u 0.157s 0:30.94 99.9%    0+0k 2384+0io 1pf+0w

I have no idea what went wrong... any hints/suggestions/answers? Thanks in advance!

EDIT

So, I was playing with this when I toppled over another thing. Looks like it is easy to land into infinite loops if I just modulate my function like this:

primesBelowN :: Integer -> [Integer]
primesBelowN n = 2:3:filter f [6*k+i | k <- [1..(n-1)`div`6], i <- [-1, 1]]
                 where f x = foldr g True [m | m <- [2..], 
                                               m <= truncate(sqrt(fromInteger x))]
                             where g t ac = (x `rem` t /= 0) && ac

main = print $ sum $ primesBelowN 2000000

The process memory just keeps exploding in this case to huge numbers (80Gig, before I killed it), without any outputs:

$ ghc -O3 010SummationOfPrimes.hs
$ time 010SummationOfPrimes
^C20.401u 7.994s 0:28.42 99.8%  0+0k 2384+0io 1pf+0w

Any ideas what's going wrong now?

Veritas

For the first part of the question notice how foldr operates:

foldr g x0 [x1, x2, x3, .., xn] = g x1 (g x2 (g x3 (..(..(g xn x0)))..)

In our case,

foldr g True [2..truncate(sqrt(fromInteger x))]
      where g t ac = (x `rem` t /= 0) && ac

is equivalent to:

foldr g True (map h [2..truncate(sqrt(fromInteger x))])
      where g t ac = t && ac
            h t = x `rem` t /= 0

which is equivalent to:

foldr (&&) True (map h [2..truncate(sqrt(fromInteger x))])
      where h t = x `rem` t /= 0

and which in turn is equivalent to:

(h x1) && ((h x2) && ((h x3) &&(....((h xn) && True)))..)

Remember that we are dealing with a lazy language in which evaluation is led by pattern matching. && is strict only in its first argument which means the second argument expression will not have to be generated unless necessary and even then it will only have to proceed until (h x2) is at the outermost level. As such a more /appropriate/ representation is one such as this (partial pseudocode):

(h x1) && {instructions to generate (h x2) && ((h x3) && (....((h xn) && True)))..)}

As a result the memory requirement to calculate the full && expression is mostly constant. We only need to keep (h x1) and the instructions to generate the next values. If (h x1) is False we stop, otherwise we discard it and generate another pair of values and instructions. This is obviously not how haskell is actually implemented but suffices as a model.

Now if you switch the argument order, && will have to first evaluate the second argument's expression in which && will in turn have to fully evaluate the next sub-expression and so on, requiring all of the intermediate values to stay in the stack until the whole expression is fully expanded and evaluated as necessary. Also notice that the remainder checks will be done in reverse order, making the procedure even more inefficient as it is less likely that a composite number is a multiple of a larger prime compared to a smaller one. As a result the total running time and memory requirements are worse.

Regarding the second (edited) part of the question, the problem is that you are no longer using a finite list. The restriction on the list comprehension works as filtering instead of as a limit. In order for foldr using && to finish, you will either need to provide an empty list (which is impossible with the infinite list's definition) or pattern match against single element of the list for which the predicate returns False. Unfortunately there are cases (x being prime) for which the predicate will not return False and foldr will continue attempting to pattern match against another element for which it will. All this in futility since no more elements will be produced after a point because of the guard. It fails for pretty much the same reasons that this fails:

head $ filter (const False) [1..]

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related