the purpose of QuickCheck's size parameter

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

the purpose of QuickCheck's size parameter

Petr Pudlák
Hi everyone,

I'd like to better understand the principles behind the 'size' parameter. Looking at quickCheckWithResult [1], its computation seems to be somewhat non-trivial, or even arbitrary. As far as I understand it, the size is varied throughout tests, increasing from small to larger values. I see two main purposes:

- Test on smaller as well as larger values. But with generators having proper distribution of values, this should happen anyway, just as if we had a constant, larger 'size' parameter.
- Starting with smaller sizes allows to find smaller count-examples first. But with shrinking, it doesn't matter that much, big counter-examples are shrunk to smaller ones anyway in most cases.

So is this parameter actually necessary? Would anything change considerably if it was dropped?

Thanks,
Petr


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: the purpose of QuickCheck's size parameter

David Feuer
data Foo a = Leaf a | Node [Foo a]

Without the size parameter, it's a bit tricky to control the distribution to avoid generating extremely large trees. I certainly agree, however, that the size parameter is an ugly and ill-specified hack.

On Thu, Jun 14, 2018, 4:20 PM Petr Pudlák <[hidden email]> wrote:
Hi everyone,

I'd like to better understand the principles behind the 'size' parameter. Looking at quickCheckWithResult [1], its computation seems to be somewhat non-trivial, or even arbitrary. As far as I understand it, the size is varied throughout tests, increasing from small to larger values. I see two main purposes:

- Test on smaller as well as larger values. But with generators having proper distribution of values, this should happen anyway, just as if we had a constant, larger 'size' parameter.
- Starting with smaller sizes allows to find smaller count-examples first. But with shrinking, it doesn't matter that much, big counter-examples are shrunk to smaller ones anyway in most cases.

So is this parameter actually necessary? Would anything change considerably if it was dropped?

Thanks,
Petr

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: the purpose of QuickCheck's size parameter

Li-yao Xia-2
In reply to this post by Petr Pudlák
The purpose of the size parameter is not well-defined formally, but it
is a very convenient knob to easily tune the test suite in various
situations, that is definitely worth the negligible cost of having it
around unconditionally.

Without a size parameter, a fixed distribution means that if we want the
generator to cover all possible cases, we have to keep a small
probability of generating humongous examples and thus go OOM. We can
avoid that by making the size of generated values bounded by the size
parameter (or a function thereof), which we can increase easily when
more resources become available.

Furthermore, if we really want to generate very large examples, the only
way with a fixed distribution is to wait longer. Instead, using a size
parameter, we can make smaller values less likely to target further
regions in the search space more accurately.

Some properties can take a while to test on larger examples (either
because of the generators or the actual verification process) so we
might like to keep the size small during development, and raise it again
once we're done.

The Arbitrary class assigns a "default" generator for every type. While
it is not always a good choice, having a parameter to tweak makes
Arbitrary more generally useful.

As for your last point, small examples are faster to generate and check,
so it seems like a decent strategy to start small by default.

Li-yao
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: the purpose of QuickCheck's size parameter

Oleg Grenrus
In reply to this post by David Feuer
Not only avoid extremely large trees, but in general guarantee termination of the generation process

Sent from my iPhone

On 15 Jun 2018, at 0.31, David Feuer <[hidden email]> wrote:

data Foo a = Leaf a | Node [Foo a]

Without the size parameter, it's a bit tricky to control the distribution to avoid generating extremely large trees. I certainly agree, however, that the size parameter is an ugly and ill-specified hack.

On Thu, Jun 14, 2018, 4:20 PM Petr Pudlák <[hidden email]> wrote:
Hi everyone,

I'd like to better understand the principles behind the 'size' parameter. Looking at quickCheckWithResult [1], its computation seems to be somewhat non-trivial, or even arbitrary. As far as I understand it, the size is varied throughout tests, increasing from small to larger values. I see two main purposes:

- Test on smaller as well as larger values. But with generators having proper distribution of values, this should happen anyway, just as if we had a constant, larger 'size' parameter.
- Starting with smaller sizes allows to find smaller count-examples first. But with shrinking, it doesn't matter that much, big counter-examples are shrunk to smaller ones anyway in most cases.

So is this parameter actually necessary? Would anything change considerably if it was dropped?

Thanks,
Petr

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: the purpose of QuickCheck's size parameter

Petr Pudlák
In reply to this post by Li-yao Xia-2
Thank you for your extended explanation. Let me reply my thoughts inline below.

pá 15. 6. 2018 v 0:22 odesílatel Li-yao Xia <[hidden email]> napsal:
The purpose of the size parameter is not well-defined formally, but it
is a very convenient knob to easily tune the test suite in various
situations, that is definitely worth the negligible cost of having it
around unconditionally.

That's what has been disturbing me. Is it a bound on a data structure size? The amount of memory it occupies (which can be quite different due to laziness)? Or a time bound on the tested algorithm running time? 

If we take a specific example, a typical recursive data structure is some kind of a tree. When generating a random binary tree, should the size bound it's height? Or its average height? Or the number of its elements?

As the size bound it it's not really defined, everyone uses it differently, so in the end all we now that there is some bound and that we can make it larger or smaller, but that's really all.

 

Without a size parameter, a fixed distribution means that if we want the
generator to cover all possible cases, we have to keep a small
probability of generating humongous examples and thus go OOM. We can
avoid that by making the size of generated values bounded by the size
parameter (or a function thereof), which we can increase easily when
more resources become available.

That's true, but I'd assume that if we use something like the exponential distribution, the probability of an example that'd cause OOM can be really tiny. (Possibly smaller than the change that there is a human-introduced bug causing OOM, for example.)
 

Furthermore, if we really want to generate very large examples, the only
way with a fixed distribution is to wait longer. Instead, using a size
parameter, we can make smaller values less likely to target further
regions in the search space more accurately.

That's probably the most convincing argument for me. Without a size bound, as the size of a structure increases, the likelihood of it's appearance must approach zero probability. To test larger structures as well, it makes sense to something like "pick a size between 1 and 1M and then generate a structure of that size", so we have a naturally occurring bound in such tests. But still I have doubts if it wouldn't be better to either:
  • Express this limit explicitly for generators of such structures of variable size, like in vectorOf, or
  • Define the meaning of 'size' more rigorously to make the behavior more consistent among different data structures.
 

Some properties can take a while to test on larger examples (either
because of the generators or the actual verification process) so we
might like to keep the size small during development, and raise it again
once we're done.

The Arbitrary class assigns a "default" generator for every type. While
it is not always a good choice, having a parameter to tweak makes
Arbitrary more generally useful.

Here I'd argue that not having a precisely defined meaning of size, tweaking Arbitrary instances by modifying its size parameter before produces very vague results. If one needs at least some level of confidence in the behavior of a test, it's usually necessary to use parametrized functions that document the distribution of the values depending on parameters
 

As for your last point, small examples are faster to generate and check,
so it seems like a decent strategy to start small by default.

I'd say that this applies only during fixing a bug, as you mentioned above. As long as the test is failing, starting with smaller example indeed makes is fail faster. But for example for tests in continuous integration, we expect all of them to succeed and we need to test both small and large, so here the advantage of starting with small ones vanishes.

Best,
Petr
 

Li-yao
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: the purpose of QuickCheck's size parameter

Petr Pudlák
In reply to this post by Oleg Grenrus
PS: Just to make clear, it's not that I have something against QuickCheck or similar libraries, on the contrary, they're great! I'm just playing the devil's advocate to analyze and understand the concept.

ne 17. 6. 2018 v 4:05 odesílatel Oleg Grenrus <[hidden email]> napsal:
Not only avoid extremely large trees, but in general guarantee termination of the generation process

Sent from my iPhone

On 15 Jun 2018, at 0.31, David Feuer <[hidden email]> wrote:

data Foo a = Leaf a | Node [Foo a]

Without the size parameter, it's a bit tricky to control the distribution to avoid generating extremely large trees. I certainly agree, however, that the size parameter is an ugly and ill-specified hack.

On Thu, Jun 14, 2018, 4:20 PM Petr Pudlák <[hidden email]> wrote:
Hi everyone,

I'd like to better understand the principles behind the 'size' parameter. Looking at quickCheckWithResult [1], its computation seems to be somewhat non-trivial, or even arbitrary. As far as I understand it, the size is varied throughout tests, increasing from small to larger values. I see two main purposes:

- Test on smaller as well as larger values. But with generators having proper distribution of values, this should happen anyway, just as if we had a constant, larger 'size' parameter.
- Starting with smaller sizes allows to find smaller count-examples first. But with shrinking, it doesn't matter that much, big counter-examples are shrunk to smaller ones anyway in most cases.

So is this parameter actually necessary? Would anything change considerably if it was dropped?

Thanks,
Petr

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: the purpose of QuickCheck's size parameter

Oliver Charles-3
Is SmallCheck more principles in this regard, or would people consider that equally hacky?

On Sun, 17 Jun 2018, 10:18 am Petr Pudlák, <[hidden email]> wrote:
PS: Just to make clear, it's not that I have something against QuickCheck or similar libraries, on the contrary, they're great! I'm just playing the devil's advocate to analyze and understand the concept.

ne 17. 6. 2018 v 4:05 odesílatel Oleg Grenrus <[hidden email]> napsal:
Not only avoid extremely large trees, but in general guarantee termination of the generation process

Sent from my iPhone

On 15 Jun 2018, at 0.31, David Feuer <[hidden email]> wrote:

data Foo a = Leaf a | Node [Foo a]

Without the size parameter, it's a bit tricky to control the distribution to avoid generating extremely large trees. I certainly agree, however, that the size parameter is an ugly and ill-specified hack.

On Thu, Jun 14, 2018, 4:20 PM Petr Pudlák <[hidden email]> wrote:
Hi everyone,

I'd like to better understand the principles behind the 'size' parameter. Looking at quickCheckWithResult [1], its computation seems to be somewhat non-trivial, or even arbitrary. As far as I understand it, the size is varied throughout tests, increasing from small to larger values. I see two main purposes:

- Test on smaller as well as larger values. But with generators having proper distribution of values, this should happen anyway, just as if we had a constant, larger 'size' parameter.
- Starting with smaller sizes allows to find smaller count-examples first. But with shrinking, it doesn't matter that much, big counter-examples are shrunk to smaller ones anyway in most cases.

So is this parameter actually necessary? Would anything change considerably if it was dropped?

Thanks,
Petr

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: the purpose of QuickCheck's size parameter

Simon  Thompson
Because it works by increasing size, yes, it doesn’t need guidance about the order. On the other hand, you’re exploring a different part of the space of possible inputs. There’s also Lazy SmallCheck, too.

Which is best? There’s no clear answer to this, but a reasonable principle is to try a bundle of approaches if you want to argue that you have used a limited amount of testing resource in as prudent as possible a way.

Simon

On 17 Jun 2018, at 11:28, Oliver Charles <[hidden email]> wrote:

Is SmallCheck more principled in this regard, or would people consider that equally hacky?

On Sun, 17 Jun 2018, 10:18 am Petr Pudlák, <[hidden email]> wrote:
PS: Just to make clear, it's not that I have something against QuickCheck or similar libraries, on the contrary, they're great! I'm just playing the devil's advocate to analyze and understand the concept.

ne 17. 6. 2018 v 4:05 odesílatel Oleg Grenrus <[hidden email]> napsal:
Not only avoid extremely large trees, but in general guarantee termination of the generation process

Sent from my iPhone

On 15 Jun 2018, at 0.31, David Feuer <[hidden email]> wrote:

data Foo a = Leaf a | Node [Foo a]

Without the size parameter, it's a bit tricky to control the distribution to avoid generating extremely large trees. I certainly agree, however, that the size parameter is an ugly and ill-specified hack.

On Thu, Jun 14, 2018, 4:20 PM Petr Pudlák <[hidden email]> wrote:
Hi everyone,

I'd like to better understand the principles behind the 'size' parameter. Looking at quickCheckWithResult [1], its computation seems to be somewhat non-trivial, or even arbitrary. As far as I understand it, the size is varied throughout tests, increasing from small to larger values. I see two main purposes:

- Test on smaller as well as larger values. But with generators having proper distribution of values, this should happen anyway, just as if we had a constant, larger 'size' parameter.
- Starting with smaller sizes allows to find smaller count-examples first. But with shrinking, it doesn't matter that much, big counter-examples are shrunk to smaller ones anyway in most cases.

So is this parameter actually necessary? Would anything change considerably if it was dropped?

Thanks,
Petr

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Simon Thompson | Professor of Logic and Computation 
School of Computing | University of Kent | Canterbury, CT2 7NF, UK
[hidden email] | M +44 7986 085754 | W www.cs.kent.ac.uk/~sjt


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.