
Haskell platform proposal: split package

Haskell platform proposal: split package

Brent Yorgey-2
Hello everyone,

This is a proposal for the split package [1] to be included in the
next major release of the Haskell platform.

Everyone is invited to review this proposal, following the standard
procedure [2] for proposing and reviewing packages.
 
Review comments should be sent to the libraries mailing list by August
20 (arbitrarily chosen; there's plenty of time before the October 1
deadline [3]). The Haskell Platform wiki will be kept up-to-date with
the results of the review process:

  http://trac.haskell.org/haskell-platform/wiki/Proposals/split

[1] http://hackage.haskell.org/package/split
[2] http://trac.haskell.org/haskell-platform/wiki/AddingPackages 
[3] http://trac.haskell.org/haskell-platform/wiki/ReleaseTimetable

Credits
=======

Proposal author and package maintainer:
  Brent Yorgey <byorgey at cis.upenn.edu>

Abstract
========

The Data.List.Split module contains a wide range of strategies for
splitting lists with respect to some sort of delimiter, mostly
implemented through a unified combinator interface. The goal is to be
a flexible yet simple alternative to the standard 'split' function
found in some other mainstream languages.

Documentation and tarball from the hackage page:

  http://hackage.haskell.org/package/split

Development repo:

  darcs get http://code.haskell.org/~byorgey/code/split

Rationale
=========

Splitting a list into chunks based on some sort of delimiter(s) is a
common need, and is provided in the standard libraries of several
mainstream languages (e.g. Python [4], Ruby [5], Java [6]).  Haskell
beginners routinely ask whether such a function exists in the standard
libraries.  For a long time, the answer was no.  Adding such a
function to Haskell's standard libraries has been proposed multiple
times over the years, but consensus was never reached on the design of
such a function. (See, e.g. [7, 8, 9].)

[4] http://docs.python.org/py3k/library/stdtypes.html?highlight=split#str.split
[5] http://www.ruby-doc.org/core-1.9.3/String.html#method-i-split
[6] http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
[7] http://www.haskell.org/pipermail/libraries/2006-July/005504.html
[8] http://www.haskell.org/pipermail/libraries/2006-October/006072.html
[9] http://www.haskell.org/pipermail/libraries/2008-January/008922.html

In December 2008 the split package was released, implementing not just
a single split method, but a wide range of splitting strategies.

Since then the split package has gained wide acceptance, with almost
95 reverse dependencies [10], putting it in the top 40 for number of
reverse dependencies on Hackage.

[10] http://packdeps.haskellers.com/reverse/split 

The package is quite stable. Since the 0.1.4 release in April 2011
only very minor updates have been made.  It has a large suite of
QuickCheck properties [11]; to my recollection no bugs have ever been
reported.

[11] http://code.haskell.org/~byorgey/code/split/Properties.hs

API
===

For a detailed description of the package API and example usage, see
the Haddock documentation:

  http://hackage.haskell.org/packages/archive/split/0.1.4.3/doc/html/Data-List-Split.html

Design decisions
================

Most of the library is based around a (rather simple) combinator
interface.  Combinators are used to build up configuration records
(recording options such as whether to keep delimiters, whether to keep
blank segments, etc).  A configuration record is finally handed off to
a function which performs a generic maximally-information-preserving
splitting algorithm and then does various postprocessing steps (based
on the configuration) to selectively throw information away.  It is
probably not the fastest way to implement these methods, but speed is
explicitly not a design goal: the aim is to provide a reasonably wide
range of splitting strategies which can be used simply.  Blazing speed
(or more complex processing), when needed, can be obtained from a
proper parsing package.
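
As a rough illustration of the two-stage design described above (a
configuration record built by combinators, an information-preserving
split, then postprocessing), here is a minimal sketch using only base.
It is not the package's actual code: the names Chunk, Config, onElt,
noDelims, noBlanks, splitRaw and runSplit are all hypothetical and
chosen for this example, and it only handles single-element delimiters.

```haskell
-- Simplified sketch of the design, NOT the split package's source.
-- Every name below is hypothetical.

-- A piece of the input, tagged with what it is.
data Chunk a = Delim [a] | Text [a]
  deriving (Eq, Show)

-- The configuration record that the combinators build up.
data Config a = Config
  { isDelim    :: a -> Bool  -- which elements count as delimiters
  , keepDelims :: Bool       -- keep delimiters in the output?
  , keepBlanks :: Bool       -- keep empty segments in the output?
  }

-- Starting point: split on a predicate and keep everything.
onElt :: (a -> Bool) -> Config a
onElt p = Config { isDelim = p, keepDelims = True, keepBlanks = True }

-- Combinators that adjust the configuration.
noDelims, noBlanks :: Config a -> Config a
noDelims c = c { keepDelims = False }
noBlanks c = c { keepBlanks = False }

-- Stage 1: information-preserving split. Concatenating the chunk
-- contents gives back the original list.
splitRaw :: (a -> Bool) -> [a] -> [Chunk a]
splitRaw p xs = case break p xs of
  (pre, [])       -> [Text pre]
  (pre, d : rest) -> Text pre : Delim [d] : splitRaw p rest

-- Stage 2: throw information away according to the configuration.
runSplit :: Config a -> [a] -> [[a]]
runSplit c = map contents . filter keep . splitRaw (isDelim c)
  where
    keep (Delim _) = keepDelims c
    keep (Text t)  = keepBlanks c || not (null t)
    contents (Delim d) = d
    contents (Text t)  = t
```

With these definitions, runSplit (noDelims (onElt (== ','))) "a,,b"
gives ["a","","b"], and adding noBlanks to the pipeline gives ["a","b"].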

Open issues
===========

Use of GHC.Exts
---------------

At the request of a user, the 0.1.4.3 release switched from defining
its own version of the standard 'build' function, to importing it from
GHC.Exts.  This allows GHC to do more optimization, resulting in
reported speedups to uses of splitEvery, splitPlaces, and
splitPlacesBlanks.  However, this makes the library GHC-specific.  If
any reviewers think this is an issue I would be willing to go back to
defining build by hand, or use CPP macros to select between build
implementations based on the compiler.
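
For readers unfamiliar with build, the sketch below shows roughly what
"defining build by hand" could look like. It is an assumption for
illustration, not a copy of the package's source; buildList and
chunksSketch are hypothetical names. buildList is given a plain
Haskell 98 type here rather than the rank-2 type of GHC.Exts.build, and
a hand-written definition like this carries none of GHC's foldr/build
rewrite rules, which is the optimization trade-off mentioned above.

```haskell
-- Hypothetical hand-written stand-in for GHC.Exts.build.
-- Portable Haskell 98; no rewrite rules are attached to it.
buildList :: ((a -> [a] -> [a]) -> [a] -> [a]) -> [a]
buildList g = g (:) []

-- A chunking function written against the cons/nil abstraction,
-- in the spirit of splitEvery. Assumes n is positive.
chunksSketch :: Int -> [a] -> [[a]]
chunksSketch n xs = buildList (go xs)
  where
    go [] _    nil = nil
    go ys cons nil = cons (take n ys) (go (drop n ys) cons nil)
```

For example, chunksSketch 3 [1..7] evaluates to [[1,2,3],[4,5,6],[7]].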

Missing strategies
------------------

The specific way that the generic splitting algorithm is implemented
does preclude some imaginable splitting strategies.  For example, a
few years ago I tried adding a strategy that used a predicate on pairs
of elements, splitting down the middle of any pairs that satisfy the
predicate, but gave up because it simply did not fit.
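
To make the missing strategy concrete, here is a standalone sketch of
what splitting "down the middle" of adjacent pairs could mean. It is
written directly against lists and is not part of the package or its
combinator interface; splitBetweenSketch is a hypothetical name.

```haskell
-- Hypothetical standalone function: start a new chunk between any two
-- adjacent elements x, y for which the predicate p x y holds.
splitBetweenSketch :: (a -> a -> Bool) -> [a] -> [[a]]
splitBetweenSketch _ []       = []
splitBetweenSketch p (x : xs) = go [x] xs
  where
    -- acc holds the current chunk in reverse order; it is never empty.
    go acc [] = [reverse acc]
    go acc@(prev : _) (y : ys)
      | p prev y  = reverse acc : go [y] ys
      | otherwise = go (y : acc) ys
    go [] (y : ys) = go [y] ys  -- unreachable, kept for totality
```

For example, splitBetweenSketch (>) [1,2,3,1,2,0,5] splits at each
descent and yields [[1,2,3],[1,2],[0,5]].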

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries

Re: Haskell platform proposal: split package

Henning Thielemann

On Fri, 20 Jul 2012, Brent Yorgey wrote:

> Use of GHC.Exts
> ---------------
>
> At the request of a user, the 0.1.4.3 release switched from defining
> its own version of the standard 'build' function, to importing it from
> GHC.Exts.  This allows GHC to do more optimization, resulting in
> reported speedups to uses of splitEvery, splitPlaces, and
> splitPlacesBlanks.  However, this makes the library GHC-specific.  If
> any reviewers think this is an issue I would be willing to go back to
> defining build by hand, or use CPP macros to select between build
> implementations based on the compiler.

You could provide two private modules with the same name in different
directories, one that re-exports 'build' from GHC.Exts and one with a
custom definition of 'build' for non-GHC compilers. Then set
'Hs-Source-Dirs' in Cabal according to the impl(ghc). No CPP needed, only
Cabal. One could even think of a separate package for this purpose.


The only type extension you use is GADTs, right? It looks like you use it
for an Eq constraint in Delimiter/DelimSublist. That is, you actually need
only ExistentialQuantification. Is it necessary?


Re: Haskell platform proposal: split package

Henning Thielemann

On Fri, 20 Jul 2012, Henning Thielemann wrote:

> The only type extension you use is GADTs, right? It looks like you use it
> for an Eq constraint in Delimiter/DelimSublist. That is, you actually need
> only ExistentialQuantification. Is it necessary?

Hm, this (Eq a) isn't even part of an existential quantification. So where
is GADT needed?


Re: Haskell platform proposal: split package

Brent Yorgey-2
In reply to this post by Henning Thielemann
On Fri, Jul 20, 2012 at 11:25:47PM +0200, Henning Thielemann wrote:

>
> On Fri, 20 Jul 2012, Brent Yorgey wrote:
>
> >Use of GHC.Exts
> >---------------
> >
> >At the request of a user, the 0.1.4.3 release switched from defining
> >its own version of the standard 'build' function, to importing it from
> >GHC.Exts.  This allows GHC to do more optimization, resulting in
> >reported speedups to uses of splitEvery, splitPlaces, and
> >splitPlacesBlanks.  However, this makes the library GHC-specific.  If
> >any reviewers think this is an issue I would be willing to go back to
> >defining build by hand, or use CPP macros to select between build
> >implementations based on the compiler.
>
> You could provide two private modules with the same name in different
> directories, one that re-exports 'build' from GHC.Exts and one with a
> custom definition of 'build' for non-GHC compilers. Then set
> 'Hs-Source-Dirs' in Cabal according to the impl(ghc). No CPP needed,
> only Cabal. One could even think of a separate package for this
> purpose.

Ah, this is a good idea.  I'd still like to hear from other reviewers
whether they think it is worth the trouble.  To what extent should the
Haskell Platform try to be compiler-agnostic (even though it includes
GHC)?

> The only type extension you use is GADTs, right? It looks like you
> use it for an Eq constraint in Delimiter/DelimSublist. That is, you
> actually need only ExistentialQuantification. Is it necessary?

You are right, actually, only ExistentialQuantification is necessary,
as long as we also stop using GADT syntax.  I didn't realize before
that this syntax is accepted:

  {-# LANGUAGE ExistentialQuantification #-}

  data Delimiter a =         DelimEltPred (a -> Bool)
                   | Eq a => DelimSublist [a]

I do agree that this is a bit weird: what's going on here is not
exactly existential quantification.  But in any case the
ExistentialQuantification extension turns on this ability to embed
class constraints in data constructors -- at least in GHC.  I am happy
to make this change, but I guess it again raises the issue of
GHC-specificity.

As to whether there is a way to do this without using
ExistentialQuantification, I don't see an obvious solution (though
there probably is one).  The issue is that we certainly don't want to
require an (Eq a) constraint when using a predicate, but we need one
when matching sublists.  I'm open to suggestions.

-Brent


Re: Haskell platform proposal: split package

Roman Cheplyaka-2
In reply to this post by Brent Yorgey-2
+1. People often reinvent splitting utilities because adding a dependency
on a package for simple functions like these seems like overkill.
Hopefully, inclusion in the HP will help with that and make the overhead
smaller.

Regarding the API: I'm a bit concerned about the presence of synonyms there
(e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it harder
to learn the API (which is not small even without the synonyms), and
especially to read other people's code if their preferences in naming
differ from yours.

--
Roman I. Cheplyaka :: http://ro-che.info/


Re: Haskell platform proposal: split package

Henning Thielemann
In reply to this post by Brent Yorgey-2

On Fri, 20 Jul 2012, Brent Yorgey wrote:

> As to whether there is a way to do this without using
> ExistentialQuantification, I don't see an obvious solution (though
> there probably is one).  The issue is that we certainly don't want to
> require an (Eq a) constraint when using a predicate, but we need one
> when matching sublists.  I'm open to suggestions.

If you do not embed the Eq dictionary into DelimSublist then you have to
add the Eq constraint to some functions. Is this a problem? In turn you
would get a perfectly portable Haskell 98 module.

See attached patch. You would still need to do a major version bump.


split.diff (4K) Download Attachment

Re: Haskell platform proposal: split package

Gábor Lehel
In reply to this post by Brent Yorgey-2
On Sat, Jul 21, 2012 at 2:34 AM, Brent Yorgey <[hidden email]> wrote:

> You are right, actually, only ExistentialQuantification is necessary,
> as long as we also stop using GADT syntax.  I didn't realize before
> that this syntax is accepted:
>
>   {-# LANGUAGE ExistentialQuantification #-}
>
>   data Delimiter a =         DelimEltPred (a -> Bool)
>                    | Eq a => DelimSublist [a]
>
> I do agree that this is a bit weird: what's going on here is not
> exactly existential quantification.  But in any case the
> ExistentialQuantification extension turns on this ability to embed
> class constraints in data constructors -- at least in GHC.

GADTs and ExistentialQuantification are pretty similar. There are only
two differences:
- Syntax
- GADTs enable equality constraints, ExistentialQuantification does not
But if you have equality constraints from somewhere else (say,
TypeFamilies) then ExistentialQuantification is equivalent to GADTs.

An unrelated suggestion: you can give type signatures to the various
functions which are synonyms of each other as a group and they will
show up as a single item in the Haddocks.

For example, instead of

-- | some docs
splitOn :: Eq a => [a] -> [a] -> [[a]]

-- | some other docs
sepBy :: Eq a => [a] -> [a] -> [[a]]

-- | different docs
unintercalate :: Eq a => [a] -> [a] -> [[a]]

you can have

-- | one and only docs
splitOn, sepBy, unintercalate :: Eq a => [a] -> [a] -> [[a]]

I don't know if you consider this an improvement. I think I do.

--
Your ship was caught in a monadic eruption.


Re: Haskell platform proposal: split package

Brent Yorgey-2
In reply to this post by Roman Cheplyaka-2
On Sat, Jul 21, 2012 at 09:24:16AM +0300, Roman Cheplyaka wrote:
>
> Regarding the API: I'm a bit concerned about the presence of synonyms there
> (e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it harder
> to learn the API (which is not small even without the synonyms), and
> especially to read other people's code if their preferences in naming
> differ from yours.

Would your concern be addressed by Gábor's suggestion to group
synonyms together in the documentation?

-Brent


Re: Haskell platform proposal: split package

Simon Hengel
In reply to this post by Gábor Lehel
> An unrelated suggestion: you can give type signatures to the various
> functions which are synonyms of each other as a group and they will
> show up as a single item in the Haddocks.

Have you tried this with a recent version of Haddock?  I think this was only
in the Haddock version that was released with GHC 7.2 (introduced due to
a change in GHC).  But it had issues with explicit export lists, so we
now always expand such signatures.

Cheers,
Simon


Re: Haskell platform proposal: split package

Brent Yorgey-2
In reply to this post by Henning Thielemann
On Sat, Jul 21, 2012 at 09:34:13AM +0200, Henning Thielemann wrote:

>
> On Fri, 20 Jul 2012, Brent Yorgey wrote:
>
> >As to whether there is a way to do this without using
> >ExistentialQuantification, I don't see an obvious solution (though
> >there probably is one).  The issue is that we certainly don't want to
> >require an (Eq a) constraint when using a predicate, but we need one
> >when matching sublists.  I'm open to suggestions.
>
> If you do not embed the Eq dictionary into DelimSublist then you have
> to add the Eq constraint to some functions. Is this a problem? In
> turn you would get a perfectly portable Haskell 98 module.

No, I do not want to do this.  The reason is that you would no longer
be able to do splitting over lists of elements with no Eq instance,
even if you are using a predicate.  For example, it is currently
possible to do

  fs = [(+1), (subtract 7), (*6)]
  fs' = splitWhen (\f -> f 7 == 0) fs
 
but this would be no longer possible with your patch.  This is an
admittedly contrived example, but on principle I don't want to
unnecessarily restrict the API in this way.
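
The point can be checked with a small self-contained sketch. The
function below is a hypothetical stand-in for splitWhen written with
base only (splitWhenSketch and parts are names invented for this
example); what matters is that its type has no Eq constraint, so it
accepts a list of functions.

```haskell
-- Hypothetical stand-in for splitWhen: note there is no Eq constraint.
-- Delimiters (elements satisfying p) are dropped from the result.
splitWhenSketch :: (a -> Bool) -> [a] -> [[a]]
splitWhenSketch p xs = case break p xs of
  (pre, [])       -> [pre]
  (pre, _ : rest) -> pre : splitWhenSketch p rest

-- The example from the message: functions have no Eq instance.
fs :: [Int -> Int]
fs = [(+ 1), subtract 7, (* 6)]

-- Splits at 'subtract 7', since subtract 7 7 == 0.
parts :: [[Int -> Int]]
parts = splitWhenSketch (\f -> f 7 == 0) fs
```

Here parts contains two one-element chunks, holding (+1) and (*6)
respectively. Adding Eq a to the signature of splitWhenSketch would make
the definition of parts fail to typecheck.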

-Brent


Re: Haskell platform proposal: split package

Twan van Laarhoven
In reply to this post by Brent Yorgey-2
On 2012-07-21 02:34, Brent Yorgey wrote:
> You are right, actually, only ExistentialQuantification is necessary,
> as long as we also stop using GADT syntax.  I didn't realize before
> that this syntax is accepted:
>
>    {-# LANGUAGE ExistentialQuantification #-}
>
>    data Delimiter a =         DelimEltPred (a -> Bool)
>                     | Eq a => DelimSublist [a]
>

Would the following type work?

     data Delimiter a
         = DelimEltPred (a -> Bool)
         | DelimSublistPred [a -> Bool]

You can go from the current DelimSublist to DelimSublistPred with just `map
(==)`. And is the distinction between the DelimEltPred and DelimSublistPred then
still needed at all?
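
A sketch of where this idea leads, assuming the two constructors are
merged into a single list of predicates. This is an illustration only
and not the attached code; Delimiter as a newtype, oneElt, sublist and
matchDelim are hypothetical names for this example. The Eq constraint
moves from the data type to the one smart constructor that needs it.

```haskell
-- Hypothetical simplified type: a delimiter is a sequence of
-- predicates that must match consecutive elements. Haskell 98 only.
newtype Delimiter a = Delimiter [a -> Bool]

-- A single-element delimiter given by a predicate (no Eq needed).
oneElt :: (a -> Bool) -> Delimiter a
oneElt p = Delimiter [p]

-- A sublist delimiter; Eq is only required here, via map (==).
sublist :: Eq a => [a] -> Delimiter a
sublist = Delimiter . map (==)

-- Try to match the delimiter at the front of the list, returning the
-- matched elements and the remainder.
matchDelim :: Delimiter a -> [a] -> Maybe ([a], [a])
matchDelim (Delimiter ps) = go ps
  where
    go []        xs = Just ([], xs)
    go _         [] = Nothing
    go (q : qs) (x : xs)
      | q x       = fmap (\(m, rest) -> (x : m, rest)) (go qs xs)
      | otherwise = Nothing
```

For example, matchDelim (sublist "ab") "abc" gives Just ("ab","c"),
while matchDelim (sublist "ab") "acb" gives Nothing.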


Twan


Re: Haskell platform proposal: split package

Twan van Laarhoven
On 2012-07-21 14:30, Twan van Laarhoven wrote:
> Would the following type work?
>
> data Delimiter a = DelimEltPred (a -> Bool) | DelimSublistPred [a -> Bool]
>
> You can go from the current DelimSublist to DelimSublistPred with just `map
> (==)`. And is the distinction between the DelimEltPred and DelimSublistPred
> then still needed at all?

I attached some code that uses the simplified Delimiter type.



As an aside, the documentation for splitPlaces contains this unhelpful remark:

> The behavior of splitPlaces ls xs when sum ls /= length xs can be inferred
> from the above examples and the fact that splitPlaces is total.

To me that reads like "the documentation of this function is left as an exercise
to the reader". Perhaps say something like:

     If the input list is longer than the total of the given lengths, then
     the remaining elements are dropped. If the list is shorter than the
     total of the given lengths, then the result may contain fewer chunks,
     and the last chunk may be shorter.

While `splitPlacesBlanks` could say something like:

     If the input list is longer than the total of the given lengths, then
     the remaining elements are dropped. If the list is shorter than the
     total of the given lengths, then the last several chunks will be
     shorter or empty.
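
The two behaviours described in these suggested docs can be sketched
with base only. The functions below are hypothetical stand-ins
(splitPlacesSketch and splitPlacesBlanksSketch are names invented here),
written to match the wording above rather than copied from the package.

```haskell
-- Stops as soon as either the lengths or the input run out, so the
-- result may contain fewer chunks than there are lengths.
splitPlacesSketch :: [Int] -> [a] -> [[a]]
splitPlacesSketch []       _  = []
splitPlacesSketch _        [] = []
splitPlacesSketch (n : ns) xs = chunk : splitPlacesSketch ns rest
  where (chunk, rest) = splitAt n xs

-- Always produces one chunk per length; once the input runs out the
-- remaining chunks are shorter or empty.
splitPlacesBlanksSketch :: [Int] -> [a] -> [[a]]
splitPlacesBlanksSketch []       _  = []
splitPlacesBlanksSketch (n : ns) xs = chunk : splitPlacesBlanksSketch ns rest
  where (chunk, rest) = splitAt n xs
```

With lengths [4,9,3] and input [1..10], the first function returns
[[1,2,3,4],[5,6,7,8,9,10]] and the second returns
[[1,2,3,4],[5,6,7,8,9,10],[]]. With lengths [2,3], both drop the
elements 6 through 10.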


Twan


2012-07-21-split.hs (1K) Download Attachment

Re: Haskell platform proposal: split package

Brent Yorgey-2
On Sat, Jul 21, 2012 at 02:59:45PM +0200, Twan van Laarhoven wrote:

> On 2012-07-21 14:30, Twan van Laarhoven wrote:
> >Would the following type work?
> >
> >data Delimiter a = DelimEltPred (a -> Bool) | DelimSublistPred [a -> Bool]
> >
> >You can go from the current DelimSublist to DelimSublistPred with just `map
> >(==)`. And is the distinction between the DelimEltPred and DelimSublistPred
> >then still needed at all?
>
> I attached some code that uses the simplified Delimiter type.

Brilliant!  Yes, this is a big improvement, and does indeed allow
getting rid of the ExistentialQuantification (or GADTs) extension.  I
will make this change for sure, regardless of the outcome of the
review process.

> As an aside, the documentation for splitPlaces contains this unhelpful remark:
>
> >The behavior of splitPlaces ls xs when sum ls /= length xs can be inferred
> >from the above examples and the fact that splitPlaces is total.
>
> To me that reads like "the documentation of this function is left as
> an exercise to the reader". Perhaps say something like:
>
>     If the input list is longer than the total of the given lengths,
> then the remaining elements are dropped. If the list is shorter than
> the total of the given lengths, then the result may contain fewer
> chunks, and the last chunk may be shorter.
>
> While `splitPlacesBlanks` could say something like:
>
>     If the input list is longer than the total of the given lengths,
> then the remaining elements are dropped. If the list is shorter than
> the total of the given lengths, then the last several chunks will be
> shorter or empty.

Ah, yes, I agree.  I must have been in somewhat of a cheeky mood when
I wrote that.  I will improve the documentation, thanks for the
suggestions.

-Brent


Re: Haskell platform proposal: split package

Gábor Lehel
In reply to this post by Simon Hengel
On Sat, Jul 21, 2012 at 2:13 PM, Simon Hengel <[hidden email]> wrote:

>> An unrelated suggestion: you can give type signatures to the various
>> functions which are synonyms of each other as a group and they will
>> show up as a single item in the Haddocks.
>
> Have you tried this with a recent version of Haddock?  I think this was only
> in the Haddock version that was released with GHC 7.2 (introduced due to
> a change in GHC).  But it had issues with explicit export lists, so we
> now always expand such signatures.
>
> Cheers,
> Simon

It seems to work with Haddock 2.10 / GHC 7.4. I remember I initially
tried it with whatever old version of Haddock I had installed and I
was annoyed that it didn't work (it got split into two declarations
with one of them missing documentation entirely -- quite suboptimal),
but then following a suggestion I tried it again with GHC 7.4 and was
happy that it seemed to have improved and gave the expected behaviour,
and impressively even worked across module boundaries. I would be sad
if it stopped working again. :)

Some examples (which Hackage built using 7.4):
http://hackage.haskell.org/packages/archive/repa/3.2.1.1/doc/html/Data-Array-Repa.html
(append, (++))
http://hackage.haskell.org/packages/archive/type-eq/0.1.2/doc/html/Type-Eq.html
(cast, (|>), TypeEq(..))

--
Your ship was caught in a monadic eruption.


Re: Haskell platform proposal: split package

Brent Yorgey-2
On Sat, Jul 21, 2012 at 04:37:48PM +0200, Gábor Lehel wrote:

> On Sat, Jul 21, 2012 at 2:13 PM, Simon Hengel <[hidden email]> wrote:
> >> An unrelated suggestion: you can give type signatures to the various
> >> functions which are synonyms of each other as a group and they will
> >> show up as a single item in the Haddocks.
> >
> > Have you tried this with a recent version of Haddock?  I think this was only
> > in the Haddock version that was released with GHC 7.2 (introduced due to
> > a change in GHC).  But it had issues with explicit export lists, so we
> > now always expand such signatures.
> >
> > Cheers,
> > Simon
>
> It seems to work with Haddock 2.10 / GHC 7.4. I remember I initially
> tried it with whatever old version of Haddock I had installed and I
> was annoyed that it didn't work (it got split into two declarations
> with one of them missing documentation entirely -- quite suboptimal),
> but then following a suggestion I tried it again with GHC 7.4 and was
> happy that it seemed to have improved and gave the expected behaviour,
> and impressively even worked across module boundaries. I would be sad
> if it stopped working again. :)

Indeed, from the CHANGES file in the haddock distribution:

  Changes in version 2.9.3:

    * A type signature for multiple names generates one signature in the output

Neato, I didn't know about this feature.  I think grouping synonyms
together like this is a good idea.

-Brent


Re: Haskell platform proposal: split package

Simon Hengel
In reply to this post by Gábor Lehel
(CCing David Waern, Haddock's maintainer)

On Sat, Jul 21, 2012 at 04:37:48PM +0200, Gábor Lehel wrote:

> On Sat, Jul 21, 2012 at 2:13 PM, Simon Hengel <[hidden email]> wrote:
> >> An unrelated suggestion: you can give type signatures to the various
> >> functions which are synonyms of each other as a group and they will
> >> show up as a single item in the Haddocks.
> >
> > Have you tried this with a recent version of Haddock?  I think this was only
> > in the Haddock version that was released with GHC 7.2 (introduced due to
> > a change in GHC).  But it had issues with explicit export lists, so we
> > now always expand such signatures.
> >
> > Cheers,
> > Simon
>
> It seems to work with Haddock 2.10 / GHC 7.4.

Oh, then the patch did not make it into Haddock 2.10 / GHC 7.4.1.  But
Haddock 2.11 (which comes with GHC 7.4.2) expands such signatures.  The
corresponding ticket is at [1].

> I remember I initially tried it with whatever old version of Haddock I
> had installed and I was annoyed that it didn't work

AFAIK, prior to GHC 7.2, ghc did not retain that information in the AST
(see [2]).

> (it got split into two declarations with one of them missing
> documentation entirely -- quite suboptimal),

Maybe some solace: you now at least get the documentation on all the
declarations ;)

> I would be sad if it stopped working again. :)
>
> Some examples (which Hackage built using 7.4):
> http://hackage.haskell.org/packages/archive/repa/3.2.1.1/doc/html/Data-Array-Repa.html
> (append, (++))

Looks like a valid use case.  I never thought about that, but if you
have synonyms for a function, it really makes sense. (Personally, I do
not really like synonyms, but for this particular case even that makes
sense to me.)

The issue was with stuff like:

    module Foo (

    -- * Foo
      foo

    -- * Bar
    , bar

    -- * Baz
    , baz
    ) where

    foo, bar, baz :: Int
    ...

Or what if you change the order of identifiers in the export list?

There is another case that would need special treatment.  We now
include deprecation messages for deprecated stuff in documentation.  So
for the following example the documentation for `foo` and `bar` would be
different:

    -- | Documentation for `foo` and `bar`.
    foo, bar :: Int
    {-# DEPRECATED foo "use `bar` instead" #-}
    ...

Cheers,
Simon

[1] http://trac.haskell.org/haddock/ticket/192
[2] http://hackage.haskell.org/trac/ghc/ticket/1595


Re: Haskell platform proposal: split package

Edward Z. Yang
In reply to this post by Brent Yorgey-2
I haven't reviewed the technical content of the proposal yet (beyond
having used split in prior projects), but I generally approve of
having a split-like package in the Haskell Platform.

Cheers,
Edward

Excerpts from Brent Yorgey's message of Fri Jul 20 16:35:57 -0400 2012:

> Hello everyone,
>
> This is a proposal for the split package [1] to be included in the
> next major release of the Haskell platform.
>
> Everyone is invited to review this proposal, following the standard
> procedure [2] for proposing and reviewing packages.
>  
> Review comments should be sent to the libraries mailing list by August
> 20 (arbitrarily chosen; there's plenty of time before the October 1
> deadline [3]). The Haskell Platform wiki will be kept up-to-date with
> the results of the review process:
>
>   http://trac.haskell.org/haskell-platform/wiki/Proposals/split
>
> [1] http://hackage.haskell.org/package/split
> [2] http://trac.haskell.org/haskell-platform/wiki/AddingPackages 
> [3] http://trac.haskell.org/haskell-platform/wiki/ReleaseTimetable
>
> Credits
> =======
>
> Proposal author and package maintainer:
>   Brent Yorgey <byorgey at cis.upenn.edu>
>
> Abstract
> ========
>
> The Data.List.Split module contains a wide range of strategies for
> splitting lists with respect to some sort of delimiter, mostly
> implemented through a unified combinator interface. The goal is to be
> a flexible yet simple alternative to the standard 'split' function
> found in some other mainstream languages.
>
> Documentation and tarball from the hackage page:
>
>   http://hackage.haskell.org/package/split
>
> Development repo:
>
>   darcs get http://code.haskell.org/~byorgey/code/split
>
> Rationale
> =========
>
> Splitting a list into chunks based on some sort of delimiter(s) is a
> common need, and is provided in the standard libraries of several
> mainstream languages (e.g. Python [4], Ruby [5], Java [6]).  Haskell
> beginners routinely ask whether such a function exists in the standard
> libraries.  For a long time, the answer was no.  Adding such a
> function to Haskell's standard libraries has been proposed multiple
> times over the years, but consensus was never reached on the design of
> such a function. (See, e.g. [7, 8, 9].)
>
> [4] http://docs.python.org/py3k/library/stdtypes.html?highlight=split#str.split
> [5] http://www.ruby-doc.org/core-1.9.3/String.html#method-i-split
> [6] http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
> [7] http://www.haskell.org/pipermail/libraries/2006-July/005504.html
> [8] http://www.haskell.org/pipermail/libraries/2006-October/006072.html
> [9] http://www.haskell.org/pipermail/libraries/2008-January/008922.html
>
> In December 2008 the split package was released, implementing not just
> a single split method, but a wide range of splitting strategies.
>
> Since then the split package has gained wide acceptance, with almost
> 95 reverse dependencies [10], putting it in the top 40 for number of
> reverse dependencies on Hackage.
>
> [10] http://packdeps.haskellers.com/reverse/split 
>
> The package is quite stable. Since the 0.1.4 release in April 2011
> only very minor updates have been made.  It has a large suite of
> QuickCheck properties [11]; to my recollection no bugs have ever been
> reported.
>
> [11] http://code.haskell.org/~byorgey/code/split/Properties.hs
>
> API
> ===
>
> For a detailed description of the package API and example usage, see
> the Haddock documentation:
>
>   http://hackage.haskell.org/packages/archive/split/0.1.4.3/doc/html/Data-List-Split.html
>
> Design decisions
> ================
>
> Most of the library is based around a (rather simple) combinator
> interface.  Combinators are used to build up configuration records
> (recording options such as whether to keep delimiters, whether to keep
> blank segments, etc).  A configuration record is finally handed off to
> a function which performs a generic maximally-information-preserving
> splitting algorithm and then does various postprocessing steps (based
> on the configuration) to selectively throw information away.  It is
> probably not the fastest way to implement these methods, but speed is
> explicitly not a design goal: the aim is to provide a reasonably wide
> range of splitting strategies which can be used simply.  Blazing speed
> (or more complex processing), when needed, can be obtained from a
> proper parsing package.
>
> Open issues
> ===========
>
> Use of GHC.Exts
> ---------------
>
> At the request of a user, the 0.1.4.3 release switched from defining
> its own version of the standard 'build' function, to importing it from
> GHC.Exts.  This allows GHC to do more optimization, resulting in
> reported speedups to uses of splitEvery, splitPlaces, and
> splitPlacesBlanks.  However, this makes the library GHC-specific.  If
> any reviewers think this is an issue I would be willing to go back to
> defining build by hand, or use CPP macros to select between build
> implementations based on the compiler.
>
> Missing strategies

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries

Re: Haskell platform proposal: split package

Roman Cheplyaka-2
In reply to this post by Brent Yorgey-2
* Brent Yorgey <[hidden email]> [2012-07-21 08:09:09-0400]

> On Sat, Jul 21, 2012 at 09:24:16AM +0300, Roman Cheplyaka wrote:
> >
> > Regarding the API: I'm a bit concerned about the presence of synonyms there
> > (e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it harder
> > to learn the API (which is not small even without the synonyms), and
> > especially to read other people's code if their preferences in naming
> > differ from yours.
>
> Would your concern be addressed by Gábor's suggestion to group
> synonyms together in the documentation?

This could be a minor improvement (I haven't checked what these
grouped functions look like in Haddock), but I don't see the need for
synonyms in the first place. Could you maybe explain the motivation
behind them?

--
Roman I. Cheplyaka :: http://ro-che.info/


Re: Haskell platform proposal: split package

Brent Yorgey-2
On Sat, Jul 21, 2012 at 11:40:19PM +0300, Roman Cheplyaka wrote:

> * Brent Yorgey <[hidden email]> [2012-07-21 08:09:09-0400]
> > On Sat, Jul 21, 2012 at 09:24:16AM +0300, Roman Cheplyaka wrote:
> > >
> > > Regarding the API: I'm a bit concerned about the presence of synonyms there
> > > (e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it harder
> > > to learn the API (which is not small even without the synonyms), and
> > > especially to read other people's code if their preferences in naming
> > > differ from yours.
> >
> > Would your concern be addressed by Gábor's suggestion to group
> > synonyms together in the documentation?
>
> This could be a minor improvement (I haven't checked what these
> grouped functions look like in Haddock), but I don't see the need for
> synonyms in the first place. Could you maybe explain the motivation
> behind them?

Certainly.  The idea is to provide synonyms whenever there are
multiple common names in use, as well as a consistent system of names
within the package itself.  The goal is for new users to be able to
get started using the library as quickly as possible -- users will
usually come looking for some particular function and they may already
have an idea about what it might be called.

To be concrete, the split package has three sets of synonyms:

  * splitOn / sepBy / unintercalate

    Here 'splitOn' is an internally consistent name, which matches
    with the naming scheme used in the rest of the package.  'sepBy'
    is a name from parsec and other parser combinator libraries;
    'unintercalate' emphasizes that this function is right inverse to
    'intercalate'.

  * splitOneOf / sepByOneOf

  * splitEvery / chunk

    Again, 'splitEvery' matches the internal naming scheme; 'chunk' is
    a name commonly used for this function within the community.
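
As a rough illustration of the "right inverse" remark above, here is a
base-only sketch. It is not the package's implementation: the name
splitOnSketch is made up for this example, and it only assumes the
behaviour described in this thread (split on a delimiter sublist, drop
the delimiters, keep blank segments). It does not try to mimic how the
package treats an empty delimiter.

```haskell
import Data.List (intercalate, isPrefixOf)

-- Hypothetical stand-in for splitOn / sepBy / unintercalate, written
-- against base only.  Assumption: a non-empty delimiter; for an empty
-- one this sketch simply returns the input unsplit.
splitOnSketch :: Eq a => [a] -> [a] -> [[a]]
splitOnSketch []    xs = [xs]
splitOnSketch delim xs = go xs
  where
    n = length delim

    -- Emit one segment, then continue after the delimiter (if any).
    go s = case breakOn s of
      (piece, Nothing)   -> [piece]
      (piece, Just rest) -> piece : go rest

    -- Split at the first occurrence of the delimiter, dropping it.
    breakOn [] = ([], Nothing)
    breakOn s@(y:ys)
      | delim `isPrefixOf` s = ([], Just (drop n s))
      | otherwise            = let (piece, r) = breakOn ys
                               in (y : piece, r)

-- The law the name 'unintercalate' refers to: joining the pieces with
-- the same delimiter gives back the original list.
rightInverseHolds :: Eq a => [a] -> [a] -> Bool
rightInverseHolds delim xs =
  intercalate delim (splitOnSketch delim xs) == xs
```

For example, splitOnSketch "," "a,b,,c" evaluates to
["a","b","","c"], and intercalate "," applied to that result gives
back "a,b,,c".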

I don't see much harm in this (modulo making the documentation
clearer, which I plan to do).  And I really don't want to *remove*
existing names because that would force a major version bump and
potentially break any code depending on split.

-Brent


Re: Haskell platform proposal: split package - concerns about synonyms in API

Ben Moseley
Firstly, I agree that adding splitting functionality to Platform would be very useful - so thanks for all the work you've done.

On the naming side of things, I'd just like to say I agree with Roman here - I think having synonyms in an API is a bad idea. (It's one of the things I've disliked most about Coq).

I think there might be an argument that synonyms improve things for those who know the API really well - but that comes at the cost of worsening the experience for those (i.e. most) who know it moderately or fairly well and thus - when reading code written by others - find themselves struggling to remember whether the variants that they don't use personally are subtly different or not.

Anyway, I'd be strongly in favour of removing the synonyms (or at least side-lining them into a separate not-exported-by-default module) before adding to the platform.

--Ben
P.S. I think "intercalate" is an awful name, and "unintercalate" is certainly no better ;-) - so I'd be in favour of choosing one of the other two.

On 22 Jul 2012, at 01:43, Brent Yorgey wrote:

> On Sat, Jul 21, 2012 at 11:40:19PM +0300, Roman Cheplyaka wrote:
>> * Brent Yorgey <[hidden email]> [2012-07-21 08:09:09-0400]
>>> On Sat, Jul 21, 2012 at 09:24:16AM +0300, Roman Cheplyaka wrote:
>>>>
>>>> Regarding the API: I'm a bit concerned about the presence of synonyms there
>>>> (e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it harder
>>>> to learn the API (which is not small even without the synonyms), and
>>>> especially to read other people's code if their preferences in naming
>>>> differ from yours.
>>>
>>> Would your concern be addressed by Gábor's suggestion to group
>>> synonyms together in the documentation?
>>
>> This could be a minor improvement (I haven't checked what these
>> grouped functions look like in Haddock), but I don't see the need for
>> synonyms in the first place. Could you maybe explain the motivation
>> behind them?
>
> Certainly.  The idea is to provide synonyms whenever there are
> multiple common names in use, as well as a consistent system of names
> within the package itself.  The goal is for new users to be able to
> get started using the library as quickly as possible -- users will
> usually come looking for some particular function and they may already
> have an idea about what it might be called.
>
> To be concrete, the split package has three sets of synonyms:
>
>  * splitOn / sepBy / unintercalate
>
>    Here 'splitOn' is an internally consistent name, which matches
>    with the naming scheme used in the rest of the package.  'sepBy'
>    is a name from parsec and other parser combinator libraries;
>    'unintercalate' emphasizes that this function is right inverse to
>    'intercalate'.
>
>  * splitOneOf / sepByOneOf
>
>  * splitEvery / chunk
>
>    Again, 'splitEvery' matches the internal naming scheme; 'chunk' is
>    a name commonly used for this function within the community.
>
> I don't see much harm in this (modulo making the documentation
> clearer, which I plan to do).  And I really don't want to *remove*
> existing names because that would force a major version bump and
> potentially break any code depending on split.
>
> -Brent

