bitSize

bitSize

Andrew Coppin
Quoting the Haddock documentation for Data.Bits.bitSize:

"Return the number of bits in the type of the argument. The actual value
of the argument is ignored. The function bitSize is undefined for types
that do not have a fixed bitsize, like Integer."

Does anybody else think it would be *far* more useful if bitSize applied
to an Integer would tell you how many bits that particular Integer is
using? Especially given that it can vary?

Is there a way to actually determine how many bits are in an Integer?

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: bitSize

Gabor Greif

Am 25.08.2011 um 19:57 schrieb Andrew Coppin:

Is there a way to actually determine how many bits are in an Integer?


occupiedBits :: Integer -> Int
occupiedBits = ceiling . logBase 2 . fromInteger . (+ 1)

Caveat: untested :-)
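[A floating-point logarithm can misround for very large Integers; an exact, if slower, shift-based sketch (my code and naming, not Gabor's):

```haskell
import Data.Bits (shiftR)

-- Exact count of bits in the binary representation of the magnitude;
-- 0 is mapped to 0 by convention, negatives are counted by absolute value.
occupiedBits :: Integer -> Int
occupiedBits n = go (abs n) 0
  where
    go 0 acc = acc
    go m acc = go (m `shiftR` 1) (acc + 1)
```

This walks the number one bit at a time, so it is O(bit length), but it cannot be thrown off by rounding.]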

Cheers,

Gabor


Re: bitSize

Daniel Fischer
In reply to this post by Andrew Coppin
On Thursday 25 August 2011, 19:57:37, Andrew Coppin wrote:
> Quoting the Haddock documentation for Data.Bits.bitSize:
>
> "Return the number of bits in the type of the argument. The actual value
> of the argument is ignored. The function bitSize is undefined for types
> that do not have a fixed bitsize, like Integer."
>
> Does anybody else think it would be *far* more useful if bitSize applied
> to an Integer would tell you how many bits that particular Integer is
> using? Especially given that it can vary?

I'm not sure about that.

>
> Is there a way to actually determine how many bits are in an Integer?
>

Sure. The exact method depends on what result you want and which integer-*
package you use.

You have to handle 0 and negative n first (how to treat negative n is not
obvious).

Then, for n > 0, if you want 'index of highest set bit' (+1),

1 + integerLogBase 2 n

does the trick (integerLogBase is available from GHC.Float; as of 7.2.1
it's decently fast).

If you want 'WORD_SIZE_IN_BITS * number of words used', you could use the
above and round up to the next multiple of WORD_SIZE_IN_BITS, or you could
make use of the representation of Integers; with integer-gmp,

data Integer
    = S# Int#
    | J# Int# ByteArray#

so

usedWords (S# _) = 1
usedWords (J# s# _) = I# s#

(nota bene, I don't think it's positively guaranteed that the highest order
word in a (J# _ _) is nonzero, so usedWords could give a higher answer than
rounding up from integerLogBase 2 n).
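[The rounding-up variant can be sketched without touching the representation; this is my own helper, using today's finiteBitSize where the thread's era would have used bitSize on a Word:

```haskell
import Data.Bits (finiteBitSize, shiftR)
import Data.Word (Word)

-- Exact bit length of the magnitude (0 for 0).
bitLen :: Integer -> Int
bitLen n = go (abs n) 0
  where
    go 0 acc = acc
    go m acc = go (m `shiftR` 1) (acc + 1)

-- 'WORD_SIZE_IN_BITS * number of words used', estimated by rounding
-- the bit length up to the next multiple of the machine word size.
usedWordBits :: Integer -> Int
usedWordBits n = ((max 1 (bitLen n) + w - 1) `div` w) * w
  where
    w = finiteBitSize (0 :: Word)
```

Unlike usedWords above, this never over-counts a zero high-order word, at the cost of scanning the bits.]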


Re: bitSize

Albert Y. C. Lai
In reply to this post by Andrew Coppin
On 11-08-25 01:57 PM, Andrew Coppin wrote:
> Does anybody else think it would be *far* more useful if bitSize applied
> to an Integer would tell you how many bits that particular Integer is
> using? Especially given that it can vary?

It is useful to know the number of bits used in a value.

It is useful to know the maximum number of bits storable in a type.

It is useless to conflate the two into one single method.

The name "bitSize" is already taken. Come up with another name, and I
will agree with you.


Re: bitSize

Daniel Peebles
And as Daniel mentioned earlier, it's not at all obvious what we mean by "bits used" when it comes to negative numbers. GMP pretends that any negative number has infinite bits in the two's-complement representation. Should we count the highest _unset_ bit when the number is negative?

Or to put it another way, what invariants should the proposed bit-counting function satisfy?

On Thu, Aug 25, 2011 at 6:32 PM, Albert Y. C. Lai <[hidden email]> wrote:
On 11-08-25 01:57 PM, Andrew Coppin wrote:
Does anybody else think it would be *far* more useful if bitSize applied
to an Integer would tell you how many bits that particular Integer is
using? Especially given that it can vary?

It is useful to know the number of bits used in a value.

It is useful to know the maximum number of bits storable in a type.

It is useless to conflate the two into one single method.

The name "bitSize" is already taken. Come up with another name, and I will agree with you.



Re: bitSize

Andrew Coppin
On 26/08/2011 02:40 AM, Daniel Peebles wrote:
> And as Daniel mentioned earlier, it's not at all obvious what we mean by
> "bits used" when it comes to negative numbers.

I guess part of the problem is that the documentation asserts that
bitSize will never depend on its argument. (So one would write things
like "bitSize undefined :: ThreadID" or similar.)

I can think of several possible results one might want from a bit size
query:

1. The number of bits of precision which will be kept for values of this
type. (For Word16, this is 16. For Integer, this is [almost] infinity.)

2. The amount of RAM that this value is using up. (But that would surely
be measured in bytes, not bits. And processor registers make the picture
more complicated.)

3. The bit count to the most significant bit, ignoring sign.

4. The bit count to the sign bit.

Currently, bitSize implements #1. I'm not especially interested in #2. I
would usually want #3 or #4.

Consider the case of 123 (decimal). The 2s complement representation of
+123 is

...0000000000000001111011

The 2s complement representation of -123 is

...1111111111111110000101

For query #3, I would expect both +123 and -123 to yield 7. For query
#4, I would expect both to yield 8. (Since if you truncate both of those
strings to 8 bits, then the positive value starts with 0, and the
negative one starts with 1.)

Then of course, there's the difference between "count of the bits" and
"bit index", which one might expect to be zero-based. (So that the Nth
bit represents 2^N.)
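[Those two queries can be sketched directly; the names are mine, and the behaviour for negatives follows the truncation argument above:

```haskell
import Data.Bits (shiftR)

-- Query #3: bit count up to and including the most significant bit
-- of the magnitude, ignoring sign; 0 yields 0.
magnitudeBits :: Integer -> Int
magnitudeBits n = go (abs n) 0
  where
    go 0 acc = acc
    go m acc = go (m `shiftR` 1) (acc + 1)

-- Query #4: the smallest width w such that n fits in w-bit
-- two's complement, i.e. -2^(w-1) <= n < 2^(w-1).
signedBits :: Integer -> Int
signedBits n
  | n >= 0    = magnitudeBits n + 1
  | otherwise = magnitudeBits (negate n - 1) + 1
```

With these definitions both ±123 give 7 for query #3 and 8 for query #4, matching the expectations above; the boundary cases (-128 needs 8 bits, 127 needs 8 bits) also come out right.]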


Re: bitSize

Daniel Fischer
On Friday 26 August 2011, 19:24:37, Andrew Coppin wrote:
> On 26/08/2011 02:40 AM, Daniel Peebles wrote:
> > And as Daniel mentioned earlier, it's not at all obvious what we mean
> > by "bits used" when it comes to negative numbers.
>
> I guess part of the problem is that the documentation asserts that
> bitSize will never depend on its argument. (So one would write things
> like "bitSize undefined :: ThreadID" or similar.)

I don't think that's a problem; it's natural for what bitSize does. And it
would be bad if bitSize did something different for Integer than for Int,
Word, ...

>
> I can think of several possible results one might want from a bit size
> query:

Yup, though for some of them there are better names than bitSize.

>
> 1. The number of bits of precision which will be kept for values of this
> type. (For Word16, this is 16. For Integer, this is [almost] infinity.)

Not "almost infinity"; rather, whatever your RAM or Int allows, whichever
gives out first, or "enough, unless you [try to] do really extreme stuff".

>
> 2. The amount of RAM that this value is using up. (But that would surely
> be measured in bytes, not bits. And processor registors make the picture
> more complicated.)
>
> 3. The bit count to the most significant bit, ignoring sign.
>
> 4. The bit count to the sign bit.
>
> Currently, bitSize implements #1. I'm not especially interested in #2. I
> would usually want #3 or #4.

I'd usually be more interested in #2 than in #4.

>
> Consider the case of 123 (decimal). The 2s complement representation of
> +123 is
>
> ...0000000000000001111011
>
> The 2s complement representation of -123 is
>
> ...1111111111111110000101
>
> For query #3, I would expect both +123 and -123 to yield 7.

One could make a case for the answer 3 for -123, I wouldn't know what to
expect without it being stated in the docs.

> For query
> #4, I would expect both to yield 8. (Since if you truncate both of those
> strings to 8 bits, then the positive value starts with 0, and the
> negative one starts with 1.)

#4 would then generally be #3 + 1 for signed types, I think, so not very
interesting. But what about unsigned types?

>
> Then of course, there's the difference between "count of the bits" and
> "bit index", which one might expect to be zero-based. (So that the Nth
> bit represents 2^N.)

Yes, but that shouldn't be a problem with good names.

So, which of them are useful and important enough to propose for inclusion?


Re: bitSize

Steve Schafer
In reply to this post by Andrew Coppin
On Fri, 26 Aug 2011 18:24:37 +0100, you wrote:

>I would usually want #3 or #4.

Out of curiosity, what for? While I do occasionally need to get a
"logarithmic size estimate" of a number (which is basically what #3 and
#4 are), the specific requirements in each case tend to vary, enough so
that it's unlikely that a single function (other than simply taking the
logarithm) can handle the majority of applications.

-Steve Schafer


Re: bitSize

Andrew Coppin
On 26/08/2011 07:36 PM, Steve Schafer wrote:
> On Fri, 26 Aug 2011 18:24:37 +0100, you wrote:
>
>> I would usually want #3 or #4.
>
> Out of curiosity, what for? While I do occasionally need to get a
> "logarithmic size estimate" of a number (which is basically what #3 and
> #4 are), the specific requirements in each case tend to vary, enough so
> that it's unlikely that a single function (other than simply taking the
> logarithm) can handle the majority of applications.

You wouldn't want to know how many bits you need to store on disk to
reliably recreate the value? Or how many bits of randomness you need to
compute a value less than or equal to this one?

I suppose I could use a binary logarithm. I'm just concerned that it
would be rather slow. After all, I'm not interested in the exact
logarithm (which is fractional), just the number of bits (which is a
small integer)...


Re: bitSize

Daniel Fischer
On Friday 26 August 2011, 21:30:02, Andrew Coppin wrote:
> You wouldn't want to know how many bits you need to store on disk to
> reliably recreate the value? Or how many bits of randomness you need to
> compute a value less than or equal to this one?
>
> I suppose I could use a binary logarithm. I'm just concerned that it
> would be rather slow. After all, I'm not interested in the exact
> logarithm (which is fractional), just the number of bits (which is a
> small integer)...

As of GHC 7.2, there's GHC.Integer.Logarithms in both integer-gmp and
integer-simple, providing

integerLog2# :: Integer -> Int#

(the integer-* packages are built before base, so there's no Int yet).
It exploits the representation of Integers and should be fast enough
[at least for integer-gmp, where it runs in bounded time for normally
represented values; the representation of Integers in integer-simple
forces it to be O(log n)].

Caution: integerLog2# expects its argument to be positive, havoc might
ensue if it isn't.

GHC.Float exports

integerLogBase :: Integer -> Integer -> Int

which also existed before 7.2; now it calls the above when the base is 2,
so it should have decent performance. (It requires that the base is > 1
and the second argument positive, of course.)


Re: bitSize

Steve Schafer
In reply to this post by Andrew Coppin
On Fri, 26 Aug 2011 20:30:02 +0100, you wrote:

>You wouldn't want to know how many bits you need to store on disk to
>reliably recreate the value?

I can't say that I have cared about that sort of thing in a very long
time. Bits are rather cheap these days. I store data on disk, and the
space it occupies is whatever it is; I don't worry about it.

>Or how many bits of randomness you need to compute a value less than or
>equal to this one?

I'm not sure I understand this one. I generally don't need _any_
randomness to compute a value. On the other hand, if the desire is to
take an integer n and generate a set of pseudorandom numbers ranging
from 0 to n-1, that's easily done using the standard random number
methods.

-Steve Schafer


Re: bitSize

Andrew Coppin
On 26/08/2011 10:51 PM, Steve Schafer wrote:
> On Fri, 26 Aug 2011 20:30:02 +0100, you wrote:
>
>> You wouldn't want to know how many bits you need to store on disk to
>> reliably recreate the value?
>
> I can't say that I have cared about that sort of thing in a very long
> time. Bits are rather cheap these days. I store data on disk, and the
> space it occupies is whatever it is; I don't worry about it.

I meant if you're trying to *implement* serialisation. The Bits class
allows you to access bits one by one, but surely you'd want some way to
know how many bits you need to keep?

Likewise, you say the standard PRNG can be used to generate random
Integers, but what if you're trying to implement a new PRNG?


Re: bitSize

Steve Schafer
On Sat, 27 Aug 2011 11:57:57 +0100, you wrote:

>I meant if you're trying to *implement* serialisation. The Bits class
>allows you to access bits one by one, but surely you'd want some way to
>know how many bits you need to keep?

For fixed-size types (e.g., Int), I might use a simple byte-for-byte
serialization. But these days, I tend to avoid binary serializations,
and use string conversions for all types, taking advantage of whatever
built-in conversions the language offers. There's obviously more
overhead, but the advantages usually outweigh the disadvantages. For one
thing, I can come back a couple of years later and still figure out what
the data are supposed to be.

>Likewise, you say the standard PRNG can be used to generate random
>Integers, but what if you're trying to implement a new PRNG?

I'm not aware of any PRNGs that use variable-length types (and I
would think that such a PRNG wouldn't be very efficient), so I'm still
not sure that I understand the question.

-Steve Schafer


Re: bitSize

Brandon Allbery
In reply to this post by Andrew Coppin
On Sat, Aug 27, 2011 at 06:57, Andrew Coppin <[hidden email]> wrote:
On 26/08/2011 10:51 PM, Steve Schafer wrote:
On Fri, 26 Aug 2011 20:30:02 +0100, you wrote:
You wouldn't want to know how many bits you need to store on disk to
reliably recreate the value?

I can't say that I have cared about that sort of thing in a very long
time. Bits are rather cheap these days. I store data on disk, and the
space it occupies is whatever it is; I don't worry about it.

I meant if you're trying to *implement* serialisation. The Bits class allows you to access bits one by one, but surely you'd want some way to know how many bits you need to keep?

I think that falls into the realm of protocol design; if you're doing it in your program at runtime, you're probably doing it wrong.  (The fixed size version makes sense for marshaling; it's *dynamic* sizes that need to be thought out beforehand.)

--
brandon s allbery                                      [hidden email]
wandering unix systems administrator (available)     (412) 475-9364 vm/sms



Re: bitSize

Andrew Coppin
>     I meant if you're trying to *implement* serialisation. The Bits
>     class allows you to access bits one by one, but surely you'd want
>     some way to know how many bits you need to keep?
>
>
> I think that falls into the realm of protocol design; if you're doing it
> in your program at runtime, you're probably doing it wrong.  (The fixed
> size version makes sense for marshaling; it's *dynamic* sizes that need
> to be thought out beforehand.)

If you're doing, say, cryptography, then thousand-bit random integers
that need to be serialised are fairly common...


Re: bitSize

Brandon Allbery
On Mon, Aug 29, 2011 at 03:40, Andrew Coppin <[hidden email]> wrote:
   I meant if you're trying to *implement* serialisation. The Bits
   class allows you to access bits one by one, but surely you'd want
   some way to know how many bits you need to keep?

I think that falls into the realm of protocol design; if you're doing it
in your program at runtime, you're probably doing it wrong.  (The fixed
size version makes sense for marshaling; it's *dynamic* sizes that need
to be thought out beforehand.)

If you're doing, say, cryptography, then thousand-bit random integers that need to be serialised are fairly common...

Sure, and there are encodings that let you do this without needing bitSize (BER).  You need a word count, but that's usually part of the structure holding the integer.
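[For instance, the base-256 payload that such an encoding wraps could be produced like this (a sketch of my own for non-negative values; the length prefix is left to the surrounding structure, as Brandon says):

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Big-endian bytes of a non-negative Integer; the surrounding
-- protocol structure carries the byte count.
toBytes :: Integer -> [Word8]
toBytes 0 = [0]
toBytes n = go n []
  where
    go 0 acc = acc
    go m acc = go (m `shiftR` 8) (fromInteger (m .&. 0xff) : acc)
```

No bitSize is needed: the byte count falls out as `length (toBytes n)`.]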

--
brandon s allbery                                      [hidden email]
wandering unix systems administrator (available)     (412) 475-9364 vm/sms



Re: bitSize

Alexander Kjeldaas
In reply to this post by Brandon Allbery


On 27 August 2011 21:57, Brandon Allbery <[hidden email]> wrote:
On Sat, Aug 27, 2011 at 06:57, Andrew Coppin <[hidden email]> wrote:
On 26/08/2011 10:51 PM, Steve Schafer wrote:
On Fri, 26 Aug 2011 20:30:02 +0100, you wrote:
You wouldn't want to know how many bits you need to store on disk to
reliably recreate the value?

I can't say that I have cared about that sort of thing in a very long
time. Bits are rather cheap these days. I store data on disk, and the
space it occupies is whatever it is; I don't worry about it.

I meant if you're trying to *implement* serialisation. The Bits class allows you to access bits one by one, but surely you'd want some way to know how many bits you need to keep?

I think that falls into the realm of protocol design; if you're doing it in your program at runtime, you're probably doing it wrong.  (The fixed size version makes sense for marshaling; it's *dynamic* sizes that need to be thought out beforehand.)


All search engines deal with compressed integers, as do all compressors and most people doing bit manipulation. Golomb, gamma, Elias, and Rice coding all need this. Heck, even the Intel engineers chose to optimize this function by including the BSR instruction in the 386 architecture.  This is a basic building block.
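[As an illustration of how directly those codes sit on the bit length, here is an Elias gamma encoder (my own sketch, not from the thread; assumes n > 0):

```haskell
import Data.Bits (shiftR, testBit)

-- Exact bit length of a non-negative Integer (0 for 0).
bitLen :: Integer -> Int
bitLen = go 0
  where
    go acc 0 = acc
    go acc m = go (acc + 1) (m `shiftR` 1)

-- Elias gamma code of n > 0: (len - 1) zeros, then the len bits of n
-- from the most significant bit down.
eliasGamma :: Integer -> [Bool]
eliasGamma n = replicate (len - 1) False
            ++ [testBit n i | i <- [len - 1, len - 2 .. 0]]
  where
    len = bitLen n
```

The very first thing the encoder does is compute the bit length, which is exactly the operation the thread is asking the standard library to provide.]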

Don't underestimate the bit, it is coming back with a vengeance. Bit-coding is everywhere now, because of the memory hierarchy.  No haskeller should be left behind.

Alexander



Re: bitSize

Andrew Coppin
In reply to this post by Brandon Allbery
On 29/08/2011 09:00 AM, Brandon Allbery wrote:

> On Mon, Aug 29, 2011 at 03:40, Andrew Coppin
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>             I meant if you're trying to *implement* serialisation. The Bits
>             class allows you to access bits one by one, but surely you'd
>         want
>             some way to know how many bits you need to keep?
>
>         I think that falls into the realm of protocol design; if you're
>         doing it
>         in your program at runtime, you're probably doing it wrong.
>           (The fixed
>         size version makes sense for marshaling; it's *dynamic* sizes
>         that need
>         to be thought out beforehand.)
>
>
>     If you're doing, say, cryptography, then thousand-bit random
>     integers that need to be serialised are fairly common...
>
>
> Sure, and there are encodings that let you do this without needing
> bitSize (BER).  You need a word count, but that's usually part of the
> structure holding the integer.

OK. But since there's no way of getting a byte count for an Integer
either...


Re: bitSize

Brandon Allbery
On Mon, Aug 29, 2011 at 04:32, Andrew Coppin <[hidden email]> wrote:
OK. But since there's no way of getting a byte count for an Integer either...

The count depends on how you're serializing it; unless you are literally serializing to individual bits and sending them over a synchronous serial link, the correct answer is an integer logarithm by your word size + the current definition of bitSize applied to said word.

--
brandon s allbery                                      [hidden email]
wandering unix systems administrator (available)     (412) 475-9364 vm/sms



Re: bitSize

Brandon Allbery
In reply to this post by Alexander Kjeldaas
On Mon, Aug 29, 2011 at 04:08, Alexander Kjeldaas <[hidden email]> wrote:
All search engines deal with compressed integers, all compressors do, and most people doing bit-manipulation. Golomb, gamma, elias, rice coding, they all need this. Heck, even the Intel engineers chose to optimize this function by including the BSR instruction in the 386 architecture.  This is a basic building block.

Don't underestimate the bit, it is coming back with a vengeance. Bit-coding is everywhere now, because of the memory hierarchy.  No haskeller should be left behind.

Is it so basic that we must throw out the current meaning of bitSize, with all *its* uses, to satisfy yours?

--
brandon s allbery                                      [hidden email]
wandering unix systems administrator (available)     (412) 475-9364 vm/sms

