Integer factorization

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Integer factorization

Sergey V. Mikhanov
   Hello,

I am solving a problem of finding the largest prime factor of
600851475143 (Project Euler is great for learning new language!), and
came with the following solution (it uses the most ineffective
approach to finding prime numbers, however is able to factor the above
number in fraction of second):

factors :: Integer -> [Integer]

factors n = doFactors n (eratosthenesFilter [1..n])

doFactors n primes
    | null newPrimes = []
    | otherwise      =
        let topPrime = head newPrimes in
        if topPrime == n
            then [topPrime]
            else topPrime : (doFactors (n `quot` topPrime) primes)
    where
        newPrimes = snd $ break (\x -> (n `rem` x) == 0) primes

eratosthenesFilter :: [Integer] -> [Integer]

eratosthenesFilter [] = []
eratosthenesFilter (num : nums)
    | num == 1  = eratosthenesFilter nums
    | otherwise = num : (eratosthenesFilter $ doFilter num nums)
    where
        doFilter num nums = filter (\x -> x > num && (x `rem` num) /= 0) nums

What would you do different (including stylistic changes)? What are
the general comments about efficiency (not of the algorithm, but of
the implementation: for example, is it fine to use break at every
invocation of doFactors?) and elegance of the solution?

Thanks and regards,
Sergey
Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Christian Maeder-2
On
http://www.haskell.org/haskellwiki/Prime_numbers

"primeFactors" should do what you want (although I don't like the
the pattern matching on "1")

Cheers Christian

Sergey V. Mikhanov wrote:

>    Hello,
>
> I am solving a problem of finding the largest prime factor of
> 600851475143 (Project Euler is great for learning new language!), and
> came with the following solution (it uses the most ineffective
> approach to finding prime numbers, however is able to factor the above
> number in fraction of second):
>
> factors :: Integer -> [Integer]
>
> factors n = doFactors n (eratosthenesFilter [1..n])
>
> doFactors n primes
>     | null newPrimes = []
>     | otherwise      =
>         let topPrime = head newPrimes in
>         if topPrime == n
>             then [topPrime]
>             else topPrime : (doFactors (n `quot` topPrime) primes)
>     where
>         newPrimes = snd $ break (\x -> (n `rem` x) == 0) primes
>
> eratosthenesFilter :: [Integer] -> [Integer]
>
> eratosthenesFilter [] = []
> eratosthenesFilter (num : nums)
>     | num == 1  = eratosthenesFilter nums
>     | otherwise = num : (eratosthenesFilter $ doFilter num nums)
>     where
>         doFilter num nums = filter (\x -> x > num && (x `rem` num) /= 0) nums
>
> What would you do different (including stylistic changes)? What are
> the general comments about efficiency (not of the algorithm, but of
> the implementation: for example, is it fine to use break at every
> invocation of doFactors?) and elegance of the solution?
>
> Thanks and regards,
> Sergey
Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Heinrich Apfelmus
In reply to this post by Sergey V. Mikhanov
Sergey V. Mikhanov wrote:

>
> factors :: Integer -> [Integer]
> factors n = doFactors n (eratosthenesFilter [1..n])
>
> doFactors n primes
>     | null newPrimes = []
>     | otherwise      =
>         let topPrime = head newPrimes in
>         if topPrime == n
>             then [topPrime]
>             else topPrime : (doFactors (n `quot` topPrime) primes)
>     where
>         newPrimes = snd $ break (\x -> (n `rem` x) == 0) primes
>
> eratosthenesFilter :: [Integer] -> [Integer]
> eratosthenesFilter [] = []
> eratosthenesFilter (num : nums)
>     | num == 1  = eratosthenesFilter nums
>     | otherwise = num : (eratosthenesFilter $ doFilter num nums)
>     where
>         doFilter num nums = filter (\x -> x > num && (x `rem` num) /= 0) nums
>
> What would you do different (including stylistic changes)? What are
> the general comments about efficiency (not of the algorithm, but of
> the implementation: for example, is it fine to use break at every
> invocation of doFactors?) and elegance of the solution?

Stylistically, one usually uses shorter variable names in Haskell. Also,
the guards in  doFactors  are better expressed as pattern matching and
the if can be turned into guards.

    factors :: Integer -> [Integer]
    factors n = go n $ eratosthenes [2..n]
        where
        go n []              = []
        go n (p:ps)
           | n `mod` p == 0  = p : go (n `div` p) ps
           | otherwise       = go n ps


    eratosthenes :: [Integer] -> [Integer]
    eratosthenes []     = []
    eratosthenes (p:ps) = p : erathostenes ps'
       where
       ps' = filter (\x -> x > p && (x `mod` p) /= 0) ps


Other than that, efficiency is best understood as algorithmic
efficiency; there are not straightforward "tweaks" that give you the
warm fuzzy feeling of "writing efficient code".


Regards,
apfelmus

--
http://apfelmus.nfshost.com

Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Francesco Bochicchio
2009/3/10 Heinrich Apfelmus <[hidden email]>

> Sergey V. Mikhanov wrote:
> >
>

... some code ...


>
> >
> > What would you do different (including stylistic changes)? What are
> > the general comments about efficiency (not of the algorithm, but of
> > the implementation: for example, is it fine to use break at every
> > invocation of doFactors?) and elegance of the solution?
>
> Stylistically, one usually uses shorter variable names in Haskell.


<beginner rant>
Sometime too short peraphs?  At least, this is one of the things that slows
down my understanding of code posted
on this list on or on various haskell tutorial. In any other language I
know, programmers learn to give meaningful names to variable and functions,
so when one reads a program, one can use the name to remember what the
function does. Then one cames to haskell ...
I guess the short names comes from mathematic background, but  still ...
haskell is already very succint - even more so
when you use pointfree programming - and if one also uses names like a,e,i (
look at Array function definitions ), ...
</beginner rant>

Rant apart, I notice that in my own excercises I tend to shorten names, so
maybe there is a reason for that.
Nevertheless readability tends to be a big issue in languages used in IT
industry, and my feeling is that haskell
tends to err on the laconic side of the balance.


Out of curiosity, there is any reason why you called the auxiliary function
'go' ?

Ciao
-------
FB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20090311/a2dae0f4/attachment-0001.htm
Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Colin Paul Adams
>>>>> "Francesco" == Francesco Bochicchio <[hidden email]> writes:

    Francesco> Nevertheless readability tends to be a big issue in
    Francesco> languages used in IT industry, and my feeling is that
    Francesco> haskell tends to err on the laconic side of the
    Francesco> balance.

Strongly seconded.
--
Colin Adams
Preston Lancashire
Reply | Threaded
Open this post in threaded view
|

Integer factorization

David Frey
In reply to this post by Sergey V. Mikhanov
On 3/10/2009, "Sergey V. Mikhanov" <[hidden email]> wrote:

>   Hello,
>
>I am solving a problem of finding the largest prime factor of
>600851475143 (Project Euler is great for learning new language!), and
>came with the following solution (it uses the most ineffective
>approach to finding prime numbers, however is able to factor the above
>number in fraction of second):
>
>factors :: Integer -> [Integer]
>
>factors n = doFactors n (eratosthenesFilter [1..n])
>
>doFactors n primes
>    | null newPrimes = []
>    | otherwise      =
>        let topPrime = head newPrimes in
>        if topPrime == n
>            then [topPrime]
>            else topPrime : (doFactors (n `quot` topPrime) primes)
>    where
>        newPrimes = snd $ break (\x -> (n `rem` x) == 0) primes
>
>eratosthenesFilter :: [Integer] -> [Integer]
>
>eratosthenesFilter [] = []
>eratosthenesFilter (num : nums)
>    | num == 1  = eratosthenesFilter nums
>    | otherwise = num : (eratosthenesFilter $ doFilter num nums)
>    where
>        doFilter num nums = filter (\x -> x > num && (x `rem` num) /= 0) nums
>
>What would you do different (including stylistic changes)? What are
>the general comments about efficiency (not of the algorithm, but of
>the implementation: for example, is it fine to use break at every
>invocation of doFactors?) and elegance of the solution?
>
>Thanks and regards,
>Sergey




This is my solution to the same problem.  I'm just a beginner with
Haskell as well, so just consider this as an alternate solution, not an
ideal solution.

The bottom 2 functions were pulled out of support code that I use for all
my Project Euler solutions, so that's why they seem unnecessarily
generic.

I think the only real advantages of my solution over yours is that I take
advantage of the fact that primes are always odd (except for the number
2) and that the largest prime factor of a number will always be <= half
its value.


main = putStrLn output
output = show result

result = largestPrimeFactor 600851475143


largestPrimeFactor n = last $ primeFactors n

{-
 - Gets the prime factors of an integer.
 -}
primeFactors :: (Integral a) => a -> [a]
primeFactors n = primeFactorsUsingPrimesList (2:[3, 5 .. n `div` 2]) n


{-
 - Gets the prime factors of a number.  The primes list passed as the
first
 - argument is not required to be a list of primes.  It is simply
required to be
 - a list of values to try to divide the input from.  If this list
contains
 - non-prime values, they should be ordered.  If the list does not
contain all
 - of the primes that are divisors of the input value, then the result
will be
 - incorrect.
 -}
primeFactorsUsingPrimesList :: (Integral a) => [a] -> a -> [a]
primeFactorsUsingPrimesList _      1 = []
primeFactorsUsingPrimesList (x:xs) n = if n `rem` x == 0
     then x : primeFactorsUsingPrimesList (x:xs) (n `div` x)
     else primeFactorsUsingPrimesList xs n
primeFactorsUsingPrimesList []     n = [n]
Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Heinrich Apfelmus
In reply to this post by Francesco Bochicchio
Francesco Bochicchio wrote:

> Heinrich Apfelmus wrote:
>>
>> Stylistically, one usually uses shorter variable names in Haskell.
>
> <beginner rant>
> Sometime too short peraphs?  At least, this is one of the things that slows
> down my understanding of code posted
> on this list on or on various haskell tutorial. In any other language I
> know, programmers learn to give meaningful names to variable and functions,
> so when one reads a program, one can use the name to remember what the
> function does. Then one cames to haskell ...
> I guess the short names comes from mathematic background, but  still ...
> haskell is already very succint - even more so
> when you use pointfree programming - and if one also uses names like a,e,i (
> look at Array function definitions ), ...
> </beginner rant>
>
> Rant apart, I notice that in my own excercises I tend to shorten names, so
> maybe there is a reason for that.
> Nevertheless readability tends to be a big issue in languages used in IT
> industry, and my feeling is that haskell
> tends to err on the laconic side of the balance.

The goal is of course to make code readable, that's why I recommend
short names. :D

Abstraction is the one driving force for very short names. For example,
take the definition of  foldr

   foldr f z []     = z
   foldr f z (x:xs) = f x (foldr f z xs)

Since this function is polymorphic, so f , z  and the  xs  can be
anything, using more "descriptive" variable names is simply not
possible; the key point of fold is its generality.


A second, and my main reason for short names, or rather against long
names, is that names should be to the point. None of the names

   newPrimes
   topPrime
   doFactors
   doFilter

accurately describe the object they represent. The primes are not "new",
the prime is not "on top". The "do" is a prefix does not carry a meaning
either, it just conveys that  doFactors  has something to do with
factors . This is best expressed by making  doFactors  a local
definition in the where-block of  factors .

The name  eratosthenesFilter  is ok, but since there is no other
eratosthenes around, no meaning is lost by shortening it to simply
eratosthenes . Not to mention that the conventional term is "sieve", not
"filter". The documentation has to elaborate on it anyway.

The generality of the name  num  hints that a single letter name is
preferable.

The names that I think are great because they are to the point are

  factors
  primes



> Out of curiosity, there is any reason why you called the auxiliary function
> 'go' ?

Convention. Often, an auxiliary function that does basically the same
thing as the main function  factors  but with an extra parameter will be
named  factors' . The apostrophe has the drawback that it's easy to
forget, so some people now name such auxiliary functions  go  instead.



Regards,
apfelmus

--
http://apfelmus.nfshost.com

Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Sergey V. Mikhanov
Greetings!


> A second, and my main reason for short names, or rather against long
> names, is that names should be to the point. None of the names
>
>   newPrimes
>   topPrime
>   doFactors
>   doFilter
>
> accurately describe the object they represent. The primes are not "new",
> the prime is not "on top". The "do" is a prefix does not carry a meaning
> either, it just conveys that  doFactors  has something to do with
> factors . This is best expressed by making  doFactors  a local
> definition in the where-block of  factors.
>

Those remarks are fine with me! I asked about the stylistic changes because
I came from the, hm, Java world and would like to avoid "writing familiar
things in unfamiliar language". In Java, factors() and doFactors() would be
a perfectly named methods: factors() is public, auxiliary doFactors() is
private and essentially _does_ the factoring.


> > Out of curiosity, there is any reason why you called the auxiliary
> function
> > 'go' ?
>
> Convention. Often, an auxiliary function that does basically the same
> thing as the main function  factors  but with an extra parameter will be
> named  factors' . The apostrophe has the drawback that it's easy to
> forget, so some people now name such auxiliary functions  go  instead.
>

I think having a local function 'go' in 'factors' is aboslutely plausible:
it is local, there's no ambiguity.

Regards,
Sergey
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20090313/74eea645/attachment.htm
Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Colin Paul Adams
In reply to this post by Heinrich Apfelmus
>>>>> "Heinrich" == Heinrich Apfelmus <[hidden email]> writes:

    Heinrich> Abstraction is the one driving force for very short
    Heinrich> names. For example, take the definition of foldr

    Heinrich>    foldr f z [] = z foldr f z (x:xs) = f x (foldr f z
    Heinrich> xs)

    Heinrich> Since this function is polymorphic, so f , z and the xs
    Heinrich> can be anything, using more "descriptive" variable names
    Heinrich> is simply not possible; the key point of fold is its
    Heinrich> generality.

Wouldn't unit be a better descriptive name than z?
--
Colin Adams
Preston Lancashire
Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Heinrich Apfelmus
Colin Paul Adams wrote:

>>>>>> "Heinrich" == Heinrich Apfelmus <[hidden email]> writes:
>
>     Heinrich> Abstraction is the one driving force for very short
>     Heinrich> names. For example, take the definition of foldr
>
>     Heinrich>    foldr f z [] = z foldr f z (x:xs) = f x (foldr f z
>     Heinrich> xs)
>
>     Heinrich> Since this function is polymorphic, so f , z and the xs
>     Heinrich> can be anything, using more "descriptive" variable names
>     Heinrich> is simply not possible; the key point of fold is its
>     Heinrich> generality.
>
> Wouldn't unit be a better descriptive name than z?

I have never heard of a unit in relation to  fold , I'm afraid. Monoids
and groups have units, as do physicists and a few other mathematical
structures.

While  z  is indeed quite often the unit of a monoid, for instance in

    sum     = foldr (+) 0
    product = foldr (*) 1
    concat  = foldr (++) []

it doesn't have to be the unit of a monoid.


Regards,
apfelmus

--
http://apfelmus.nfshost.com

Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Francesco Bochicchio
In reply to this post by Heinrich Apfelmus
2009/3/13 Heinrich Apfelmus <[hidden email]>

> Francesco Bochicchio wrote:
> > Heinrich Apfelmus wrote:
> >>
> >> Stylistically, one usually uses shorter variable names in Haskell.
> >
> > <beginner rant>
>  ...
> > </beginner rant>
> >
> > Rant apart, I notice that in my own excercises I tend to shorten names,
> so
> > maybe there is a reason for that.
> > Nevertheless readability tends to be a big issue in languages used in IT
> > industry, and my feeling is that haskell
> > tends to err on the laconic side of the balance.
>
> The goal is of course to make code readable, that's why I recommend
> short names. :D
>
> Abstraction is the one driving force for very short names. For example,
> take the definition of  foldr
>
>   foldr f z []     = z
>   foldr f z (x:xs) = f x (foldr f z xs)
>
> Since this function is polymorphic, so f , z  and the  xs  can be
> anything, using more "descriptive" variable names is simply not
> possible; the key point of fold is its generality.



Ok but one could still hint at their structure or purpose:

foldr function value (x:xs) =  function x ( foldr function value xs )

I believe this would give a little more information to the casual reader.


>
>
> A second, and my main reason for short names, or rather against long
> names, is that names should be to the point. None of the names
>
>   newPrimes
>   topPrime
>   doFactors
>   doFilter
>
> accurately describe the object they represent. The primes are not "new",
> the prime is not "on top". The "do" is a prefix does not carry a meaning
> either, it just conveys that  doFactors  has something to do with
> factors . This is best expressed by making  doFactors  a local
> definition in the where-block of  factors .


I agree that well-documented shared name conventions are better than
roll-your-own.
 (x:xs)  is one example of such convention, although I tend to adopt slight
variations
like (n:nums) for list of numbers and (ch:chars) for list of characters.

But roll-your-own is still better than cryptic.


>
>
> The name  eratosthenesFilter  is ok, but since there is no other
> eratosthenes around, no meaning is lost by shortening it to simply
> eratosthenes . Not to mention that the conventional term is "sieve", not
> "filter". The documentation has to elaborate on it anyway.
>
> The generality of the name  num  hints that a single letter name is
> preferable.
>
> The names that I think are great because they are to the point are
>
>  factors
>  primes
>
> I have some resistance to use nouns for functions. In the imperative world,
nouns are for
variables, verbs are for functions. I know that in pure functional
programming there is not such a thing
as variables, but still I would reserve nouns for function parameters and
bound expressions. Hence if I have a function that
find factors, I would call it findFactors rather than just factors.

One such example of misnaming - from a beginner point of view -  is the
length function in prelude: if it was called
count, I believe more beginners would have realized that works by actually
counting the elements of
a list and not  by accessing to some already available 'property' of the
list.


>
>
> > Out of curiosity, there is any reason why you called the auxiliary
> function
> > 'go' ?
>
> Convention. Often, an auxiliary function that does basically the same
> thing as the main function  factors  but with an extra parameter will be
> named  factors' . The apostrophe has the drawback that it's easy to
> forget, so some people now name such auxiliary functions  go  instead.
>
>
I tend to use _ instead of '. Is more visible and keep conveying the idea
that the auxiliary function
is just a slight variation of the main one.



>
>
> Regards,
> apfelmus
>


Ciao
------
FB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20090314/c0692edd/attachment-0001.htm
Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Ertugrul Söylemez
In reply to this post by Sergey V. Mikhanov
"Sergey V. Mikhanov" <[hidden email]> wrote:

> I am solving a problem of finding the largest prime factor of
> 600851475143 (Project Euler is great for learning new language!),
> [...]

You may find my little auxilliary library [1] for ProjectEuler.net
problems handy.  For the specific problem of factoring an integer, it
uses the wheel factoring method, which uses the idea of Eratosthenes
sieving to filter out most non-primes with almost no performance loss
compared to the more naive trial division method.

[1] http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2388


Greets,
Ertugrul.


--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://blog.ertes.de/


Reply | Threaded
Open this post in threaded view
|

Re: Integer factorization

Heinrich Apfelmus
In reply to this post by Francesco Bochicchio
Francesco Bochicchio wrote:

> Heinrich Apfelmus wrote:
>>
>> Abstraction is the one driving force for very short names. For example,
>> take the definition of  foldr
>>
>>   foldr f z []     = z
>>   foldr f z (x:xs) = f x (foldr f z xs)
>>
>> Since this function is polymorphic, so f , z  and the  xs  can be
>> anything, using more "descriptive" variable names is simply not
>> possible; the key point of fold is its generality.
>
> Ok but one could still hint at their structure or purpose:
>
> foldr function value (x:xs) =  function x ( foldr function value xs )
>
> I believe this would give a little more information to the casual reader.

Sure, though there is already the convention from mathematics that
functions are named  f, g  and values are denoted by  x, y or z  .

> I have some resistance to use nouns for functions. In the imperative world,
> nouns are for variables, verbs are for functions.
> I know that in pure functional programming there is not such a thing
> as variables, but still I would reserve nouns for function parameters and
> bound expressions. Hence if I have a function that
> find factors, I would call it findFactors rather than just factors.
>
> One such example of misnaming - from a beginner point of view -  is the
> length function in prelude: if it was called
> count, I believe more beginners would have realized that works by actually
> counting the elements of
> a list and not  by accessing to some already available 'property' of the
> list.

IMHO, I think that using nouns like  length  or  factors  is a good
idea, for they sounds nice when used in compound expressions. Compare
for instance

   take (length xs `div` 2) xs

   take (count  xs `div` 2) xs

The first can be put into pseudo-english as

   "take (the length of xs divided by 2) elements of xs"

while the verb in the second makes it difficult to use the expression in
parenthesis as object to another verb. In contrast, the runtime (O(n)
versus O(1) for say a record field) is rather secondary.

Implicitly adding the preposition "of" makes the nouns flow, i.e.
"factors of n", "length of xs".


>> Convention. Often, an auxiliary function that does basically the same
>> thing as the main function  factors  but with an extra parameter will be
>> named  factors' . The apostrophe has the drawback that it's easy to
>> forget, so some people now name such auxiliary functions  go  instead.
>>
>>
> I tend to use _ instead of '. Is more visible and keep conveying the idea
> that the auxiliary function is just a slight variation of the main one.

The convention of using an apostrophe for variations comes from
mathematics and it seems that it was put into the Haskell syntax
specifically for this purpose.



In any case, the rationales and style that I presented are how I
interpret the naming conventions from classic papers like

    John Hughes. Why functional programming matters.
    http://www.cs.chalmers.se/~rjmh/Papers/whyfp.pdf

    John Hughes. The Design of a Pretty-Printing Library
    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.8777

Most of these conventions root in mathematics, of course. I like them
for they encourage thinking purely functionally, but these are of course
not the only possibilities.


Regards,
apfelmus

--
http://apfelmus.nfshost.com