Quoting the Haddock documentation for Data.Bits.bitSize:
"Return the number of bits in the type of the argument. The actual value of the argument is ignored. The function bitSize is undefined for types that do not have a fixed bitsize, like Integer." Does anybody else think it would be *far* more useful if bitSize applied to an Integer would tell you how many bits that particular Integer is using? Especially given that it can vary? Is there a way to actually determine how many bits are in an Integer? _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
On 25 August 2011, at 19:57, Andrew Coppin wrote:
    occupiedBits = (+1) . truncate . logBase 2 . (+1)

Caveat: untested :-)

Cheers,
Gabor
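Gabor's one-liner only typechecks for floating-point input, and it also over-counts at boundaries: even with exact arithmetic, truncate (logBase 2 (n + 1)) + 1 gives 8 for n = 127, which occupies only 7 bits. A minimal integer-only sketch of the same idea (bitLen is a hypothetical helper name, not a library function; negative numbers are deliberately not handled, since the thread leaves that choice open):

```haskell
-- Number of bits up to and including the most significant set bit.
-- Assumption: n >= 0; the thread leaves negative numbers undecided.
bitLen :: Integer -> Int
bitLen 0 = 0
bitLen n = 1 + bitLen (n `div` 2)
```

For example, bitLen 123 and bitLen 127 are both 7, while bitLen 128 is 8.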
On Thursday 25 August 2011, 19:57:37, Andrew Coppin wrote:
> Quoting the Haddock documentation for Data.Bits.bitSize:
>
> "Return the number of bits in the type of the argument. The actual value
> of the argument is ignored. The function bitSize is undefined for types
> that do not have a fixed bitsize, like Integer."
>
> Does anybody else think it would be *far* more useful if bitSize applied
> to an Integer would tell you how many bits that particular Integer is
> using? Especially given that it can vary?

I'm not sure about that.

> Is there a way to actually determine how many bits are in an Integer?

Sure. The exact method depends on what result you want and which integer-* package you use. You have to handle 0 and negative n first (how to treat negative n is not obvious). Then, for n > 0, if you want 'index of highest set bit' (+ 1),

    1 + integerLogBase 2 n

does the trick (integerLogBase is available from GHC.Float; as of 7.2.1, it's decently fast).

If you want 'WORD_SIZE_IN_BITS * number of words used', you could use the above and round up to the next multiple of WORD_SIZE_IN_BITS, or you could make use of the representation of Integers; with integer-gmp,

    data Integer = S# Int# | J# Int# ByteArray#

so

    usedWords (S# _)    = 1
    usedWords (J# s# _) = I# s#

(Nota bene: I don't think it's positively guaranteed that the highest-order word in a (J# _ _) is nonzero, so usedWords could give a higher answer than rounding up from integerLogBase 2 n.)
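Daniel's first suggestion can be sketched as follows (a sketch only: bitsUsed is a hypothetical name, and measuring negative numbers by their magnitude is just one of the defensible choices the thread discusses; integerLogBase is fast from GHC 7.2 onwards):

```haskell
import GHC.Float (integerLogBase)

-- Index of the highest set bit, plus one; 0 for 0.
-- Assumption: negative numbers are measured by their magnitude.
bitsUsed :: Integer -> Int
bitsUsed 0 = 0
bitsUsed n = 1 + integerLogBase 2 (abs n)
```

For example, bitsUsed 123 is 7, matching the 7-bit representation of 123 discussed later in the thread.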
On 11-08-25 01:57 PM, Andrew Coppin wrote:
> Does anybody else think it would be *far* more useful if bitSize applied
> to an Integer would tell you how many bits that particular Integer is
> using? Especially given that it can vary?

It is useful to know the number of bits used in a value. It is useful to know the maximum number of bits storable in a type. It is useless to conflate the two into one single method.

The name "bitSize" is already taken. Come up with another name, and I will agree with you.
And as Daniel mentioned earlier, it's not at all obvious what we mean by "bits used" when it comes to negative numbers. GMP pretends that any negative number has infinite bits in the two's-complement representation. Should we count the highest _unset_ bit when the number is negative?
Or to put it another way, what invariants should the proposed bit-counting function satisfy?
On Thu, Aug 25, 2011 at 6:32 PM, Albert Y. C. Lai <[hidden email]> wrote:
On 26/08/2011 02:40 AM, Daniel Peebles wrote:
> And as Daniel mentioned earlier, it's not at all obvious what we mean by
> "bits used" when it comes to negative numbers.

I guess part of the problem is that the documentation asserts that bitSize will never depend on its argument. (So one would write things like "bitSize undefined :: ThreadID" or similar.)

I can think of several possible results one might want from a bit size query:

1. The number of bits of precision which will be kept for values of this type. (For Word16, this is 16. For Integer, this is [almost] infinity.)

2. The amount of RAM that this value is using up. (But that would surely be measured in bytes, not bits. And processor registers make the picture more complicated.)

3. The bit count to the most significant bit, ignoring sign.

4. The bit count to the sign bit.

Currently, bitSize implements #1. I'm not especially interested in #2. I would usually want #3 or #4.

Consider the case of 123 (decimal). The 2s complement representation of +123 is

    ...0000000000000001111011

The 2s complement representation of -123 is

    ...1111111111111110000101

For query #3, I would expect both +123 and -123 to yield 7. For query #4, I would expect both to yield 8. (Since if you truncate both of those strings to 8 bits, then the positive value starts with 0, and the negative one starts with 1.)

Then of course, there's the difference between "count of the bits" and "bit index", which one might expect to be zero-based. (So that the Nth bit represents 2^N.)
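Queries #3 and #4 can be made precise as follows. This is only a sketch with hypothetical names (msbCount, signBitCount); the convention assumed here for #4 is the smallest width at which a two's-complement truncation keeps the right sign, which matches the 7 and 8 expected for +123 and -123:

```haskell
-- #3: count of bits up to the most significant magnitude bit, ignoring sign.
msbCount :: Integer -> Int
msbCount n = go (abs n)
  where
    go 0 = 0
    go m = 1 + go (m `div` 2)

-- #4: smallest width w such that n fits in w-bit two's complement,
-- i.e. -2^(w-1) <= n <= 2^(w-1) - 1.
signBitCount :: Integer -> Int
signBitCount n
  | n >= 0    = msbCount n + 1
  | otherwise = msbCount (negate n - 1) + 1
```

With these definitions, msbCount yields 7 for both 123 and -123, and signBitCount yields 8 for both, as in the example above.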
On Friday 26 August 2011, 19:24:37, Andrew Coppin wrote:
> On 26/08/2011 02:40 AM, Daniel Peebles wrote:
> > And as Daniel mentioned earlier, it's not at all obvious what we mean
> > by "bits used" when it comes to negative numbers.
>
> I guess part of the problem is that the documentation asserts that
> bitSize will never depend on its argument.

I don't think that's a problem; it's natural for what bitSize does. And it would be bad if bitSize did something different for Integer than for Int, Word, ...

> I can think of several possible results one might want from a bit size
> query:

Yup, though for some there are better names than bitSize would be.

> 1. The number of bits of precision which will be kept for values of this
> type. (For Word16, this is 16. For Integer, this is [almost] infinity.)

Not "almost infinity": what your RAM or Int allows, whichever cops out first, or "enough, unless you [try to] do really extreme stuff".

> 2. The amount of RAM that this value is using up. (But that would surely
> be measured in bytes, not bits. And processor registers make the picture
> more complicated.)
>
> 3. The bit count to the most significant bit, ignoring sign.
>
> 4. The bit count to the sign bit.
>
> Currently, bitSize implements #1. I'm not especially interested in #2. I
> would usually want #3 or #4.

I'd usually be more interested in #2 than in #4.

> Consider the case of 123 (decimal). The 2s complement representation of
> +123 is
>
>     ...0000000000000001111011
>
> The 2s complement representation of -123 is
>
>     ...1111111111111110000101
>
> For query #3, I would expect both +123 and -123 to yield 7.

One could make a case for the answer 3 for -123; I wouldn't know what to expect without it being stated in the docs.

> For query #4, I would expect both to yield 8. (Since if you truncate
> both of those strings to 8 bits, then the positive value starts with 0,
> and the negative one starts with 1.)

#4 would then generally be #3 + 1 for signed types, I think, so not very interesting; but for unsigned types?

> Then of course, there's the difference between "count of the bits" and
> "bit index", which one might expect to be zero-based. (So that the Nth
> bit represents 2^N.)

Yes, but that shouldn't be a problem with good names.

So, which of them are useful and important enough to propose for inclusion?
On Fri, 26 Aug 2011 18:24:37 +0100, you wrote:
> I would usually want #3 or #4.

Out of curiosity, what for? While I do occasionally need to get a "logarithmic size estimate" of a number (which is basically what #3 and #4 are), the specific requirements in each case tend to vary, enough so that it's unlikely that a single function (other than simply taking the logarithm) can handle the majority of applications.

-Steve Schafer
On 26/08/2011 07:36 PM, Steve Schafer wrote:
> On Fri, 26 Aug 2011 18:24:37 +0100, you wrote:
>
> > I would usually want #3 or #4.
>
> Out of curiosity, what for? While I do occasionally need to get a
> "logarithmic size estimate" of a number (which is basically what #3 and
> #4 are), the specific requirements in each case tend to vary, enough so
> that it's unlikely that a single function (other than simply taking the
> logarithm) can handle the majority of applications.

You wouldn't want to know how many bits you need to store on disk to reliably recreate the value? Or how many bits of randomness you need to compute a value less than or equal to this one?

I suppose I could use a binary logarithm. I'm just concerned that it would be rather slow. After all, I'm not interested in the exact logarithm (which is fractional), just the number of bits (which is a small integer)...
On Friday 26 August 2011, 21:30:02, Andrew Coppin wrote:
> You wouldn't want to know how many bits you need to store on disk to
> reliably recreate the value? Or how many bits of randomness you need to
> compute a value less than or equal to this one?
>
> I suppose I could use a binary logarithm. I'm just concerned that it
> would be rather slow. After all, I'm not interested in the exact
> logarithm (which is fractional), just the number of bits (which is a
> small integer)...

As of GHC 7.2, there's GHC.Integer.Logarithms in both integer-gmp and integer-simple, providing

    integerLog2# :: Integer -> Int#

(integer-* lives before base, so there's no Int yet), which exploits the representation of Integers and should be fast enough [at least for integer-gmp, where it's bounded time for normally represented values; the representation of Integers in integer-simple forces it to be O(log n)]. Caution: integerLog2# expects its argument to be positive; havoc might ensue if it isn't.

GHC.Float exported

    integerLogBase :: Integer -> Integer -> Int

also before 7.2; now it calls the above if the base is 2, so it should have decent performance. (It requires that the base is > 1 and the second argument positive, of course.)
On Fri, 26 Aug 2011 20:30:02 +0100, you wrote:
> You wouldn't want to know how many bits you need to store on disk to
> reliably recreate the value?

I can't say that I have cared about that sort of thing in a very long time. Bits are rather cheap these days. I store data on disk, and the space it occupies is whatever it is; I don't worry about it.

> Or how many bits of randomness you need to compute a value less than or
> equal to this one?

I'm not sure I understand this one. I generally don't need _any_ randomness to compute a value. On the other hand, if the desire is to take an integer n and generate a set of pseudorandom numbers ranging from 0 to n-1, that's easily done using the standard random number methods.

-Steve Schafer
On 26/08/2011 10:51 PM, Steve Schafer wrote:
> On Fri, 26 Aug 2011 20:30:02 +0100, you wrote:
>
> > You wouldn't want to know how many bits you need to store on disk to
> > reliably recreate the value?
>
> I can't say that I have cared about that sort of thing in a very long
> time. Bits are rather cheap these days. I store data on disk, and the
> space it occupies is whatever it is; I don't worry about it.

I meant if you're trying to *implement* serialisation. The Bits class allows you to access bits one by one, but surely you'd want some way to know how many bits you need to keep?

Likewise, you say the standard PRNG can be used to generate random Integers, but what if you're trying to implement a new PRNG?
On Sat, 27 Aug 2011 11:57:57 +0100, you wrote:
> I meant if you're trying to *implement* serialisation. The Bits class
> allows you to access bits one by one, but surely you'd want some way to
> know how many bits you need to keep?

For fixed-size types (e.g., Int), I might use a simple byte-for-byte serialization. But these days, I tend to avoid binary serializations, and use string conversions for all types, taking advantage of whatever built-in conversions the language offers. There's obviously more overhead, but the advantages usually outweigh the disadvantages. For one thing, I can come back a couple of years later and still figure out what the data are supposed to be.

> Likewise, you say the standard PRNG can be used to generate random
> Integers, but what if you're trying to implement a new PRNG?

I'm not aware of any PRNGs that use variable-length types (and I would think that such a PRNG wouldn't be very efficient), so I'm still not sure that I understand the question.

-Steve Schafer
On Sat, Aug 27, 2011 at 06:57, Andrew Coppin <[hidden email]> wrote:
I think that falls into the realm of protocol design; if you're doing it in your program at runtime, you're probably doing it wrong. (The fixed size version makes sense for marshaling; it's *dynamic* sizes that need to be thought out beforehand.)
brandon s allbery [hidden email]
wandering unix systems administrator (available)
(412) 475-9364 vm/sms
> > I meant if you're trying to *implement* serialisation. The Bits
> > class allows you to access bits one by one, but surely you'd want
> > some way to know how many bits you need to keep?
>
> I think that falls into the realm of protocol design; if you're doing it
> in your program at runtime, you're probably doing it wrong. (The fixed
> size version makes sense for marshaling; it's *dynamic* sizes that need
> to be thought out beforehand.)

If you're doing, say, cryptography, then thousand-bit random integers that need to be serialised are fairly common...
On Mon, Aug 29, 2011 at 03:40, Andrew Coppin <[hidden email]> wrote:
Sure, and there are encodings that let you do this without needing bitSize (BER). You need a word count, but that's usually part of the structure holding the integer.
On 27 August 2011 21:57, Brandon Allbery <[hidden email]> wrote:
All search engines deal with compressed integers, all compressors do, and so do most people doing bit manipulation. Golomb, gamma, Elias, and Rice coding all need this. Heck, even the Intel engineers chose to optimize this function by including the BSR instruction in the 386 architecture. This is a basic building block.
Don't underestimate the bit; it is coming back with a vengeance. Bit-coding is everywhere now, because of the memory hierarchy. No Haskeller should be left behind.

Alexander
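As an illustration of why a bit-length primitive is such a building block: the Elias gamma code (one of the codes mentioned above) writes a number's bit length in unary, followed by the number itself in binary. A sketch, with hypothetical helper names bitLength and eliasGamma, rendering bits as a '0'/'1' string for readability:

```haskell
import Data.Bits (testBit)

-- Count of bits up to the most significant set bit (assumes n >= 0).
bitLength :: Integer -> Int
bitLength 0 = 0
bitLength n = 1 + bitLength (n `div` 2)

-- Elias gamma code of n >= 1: (bitLength n - 1) zeros, then n in binary.
eliasGamma :: Integer -> String
eliasGamma n
  | n < 1     = error "eliasGamma: argument must be >= 1"
  | otherwise = replicate (len - 1) '0'
             ++ [ if testBit n i then '1' else '0' | i <- [len - 1, len - 2 .. 0] ]
  where len = bitLength n
```

For example, eliasGamma 4 is "00100": two zeros announce a three-bit payload, and the decoder recovers the length from the run of leading zeros.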
On 29/08/2011 09:00 AM, Brandon Allbery wrote:
> On Mon, Aug 29, 2011 at 03:40, Andrew Coppin <[hidden email]> wrote:
>
> > > > I meant if you're trying to *implement* serialisation. The Bits
> > > > class allows you to access bits one by one, but surely you'd want
> > > > some way to know how many bits you need to keep?
> > >
> > > I think that falls into the realm of protocol design; if you're
> > > doing it in your program at runtime, you're probably doing it wrong.
> > > (The fixed size version makes sense for marshaling; it's *dynamic*
> > > sizes that need to be thought out beforehand.)
> >
> > If you're doing, say, cryptography, then thousand-bit random
> > integers that need to be serialised are fairly common...
>
> Sure, and there are encodings that let you do this without needing
> bitSize (BER). You need a word count, but that's usually part of the
> structure holding the integer.

OK. But since there's no way of getting a byte count for an Integer either...
On Mon, Aug 29, 2011 at 04:32, Andrew Coppin <[hidden email]> wrote:
The count depends on how you're serializing it; unless you are literally serializing to individual bits and sending them over a synchronous serial link, the correct answer is an integer logarithm by your word size + the current definition of bitSize applied to said word.
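A sketch of that calculation (wordsNeeded and bytesNeeded are hypothetical names; finiteBitSize is the modern spelling of what the thread's bitSize gives for a fixed-width word, and, as an assumption here, negative numbers are measured by their magnitude):

```haskell
import Data.Bits (finiteBitSize, shiftR)

-- Bits up to and including the most significant set bit of |n|.
bitLen :: Integer -> Int
bitLen n = go (abs n)
  where
    go 0 = 0
    go m = 1 + go (m `shiftR` 1)

-- Machine words needed to hold the magnitude of n (at least one).
wordsNeeded :: Integer -> Int
wordsNeeded n = max 1 ((bitLen n + w - 1) `div` w)
  where w = finiteBitSize (0 :: Word)

-- Bytes needed, computed the same way with 8-bit units.
bytesNeeded :: Integer -> Int
bytesNeeded n = max 1 ((bitLen n + 7) `div` 8)
```

The word count Brandon mentions is then just a rounded-up division of the bit length by the word size, which is why only the fixed-size bitSize is needed.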
On Mon, Aug 29, 2011 at 04:08, Alexander Kjeldaas <[hidden email]> wrote:
Is it so basic that we must throw out the current meaning of bitSize, with all *its* uses, to satisfy yours?