|
Hello everyone,

This is a proposal for the split package [1] to be included in the
next major release of the Haskell platform.

Everyone is invited to review this proposal, following the standard
procedure [2] for proposing and reviewing packages.

Review comments should be sent to the libraries mailing list by August
20 (arbitrarily chosen; there's plenty of time before the October 1
deadline [3]). The Haskell Platform wiki will be kept up-to-date with
the results of the review process:

  http://trac.haskell.org/haskell-platform/wiki/Proposals/split

[1] http://hackage.haskell.org/package/split
[2] http://trac.haskell.org/haskell-platform/wiki/AddingPackages
[3] http://trac.haskell.org/haskell-platform/wiki/ReleaseTimetable

Credits
=======

Proposal author and package maintainer:
  Brent Yorgey <byorgey at cis.upenn.edu>

Abstract
========

The Data.List.Split module contains a wide range of strategies for
splitting lists with respect to some sort of delimiter, mostly
implemented through a unified combinator interface. The goal is to be
a flexible yet simple alternative to the standard 'split' function
found in some other mainstream languages.

Documentation and tarball from the hackage page:

  http://hackage.haskell.org/package/split

Development repo:

  darcs get http://code.haskell.org/~byorgey/code/split

Rationale
=========

Splitting a list into chunks based on some sort of delimiter(s) is a
common need, and is provided in the standard libraries of several
mainstream languages (e.g. Python [4], Ruby [5], Java [6]). Haskell
beginners routinely ask whether such a function exists in the standard
libraries. For a long time, the answer was no. Adding such a
function to Haskell's standard libraries has been proposed multiple
times over the years, but consensus was never reached on the design of
such a function. (See, e.g. [7, 8, 9].)

[4] http://docs.python.org/py3k/library/stdtypes.html?highlight=split#str.split
[5] http://www.ruby-doc.org/core-1.9.3/String.html#method-i-split
[6] http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
[7] http://www.haskell.org/pipermail/libraries/2006-July/005504.html
[8] http://www.haskell.org/pipermail/libraries/2006-October/006072.html
[9] http://www.haskell.org/pipermail/libraries/2008-January/008922.html

In December 2008 the split package was released, implementing not just
a single split method, but a wide range of splitting strategies.

Since then the split package has gained wide acceptance, with almost
95 reverse dependencies [10], putting it in the top 40 for number of
reverse dependencies on Hackage.

[10] http://packdeps.haskellers.com/reverse/split

The package is quite stable. Since the 0.1.4 release in April 2011
only very minor updates have been made. It has a large suite of
QuickCheck properties [11]; to my recollection no bugs have ever been
reported.

[11] http://code.haskell.org/~byorgey/code/split/Properties.hs

API
===

For a detailed description of the package API and example usage, see
the Haddock documentation:

  http://hackage.haskell.org/packages/archive/split/0.1.4.3/doc/html/Data-List-Split.html

Design decisions
================

Most of the library is based around a (rather simple) combinator
interface. Combinators are used to build up configuration records
(recording options such as whether to keep delimiters, whether to keep
blank segments, etc). A configuration record is finally handed off to
a function which performs a generic maximally-information-preserving
splitting algorithm and then does various postprocessing steps (based
on the configuration) to selectively throw information away. It is
probably not the fastest way to implement these methods, but speed is
explicitly not a design goal: the aim is to provide a reasonably wide
range of splitting strategies which can be used simply. Blazing speed
(or more complex processing), when needed, can be obtained from a
proper parsing package.

Open issues
===========

Use of GHC.Exts
---------------

At the request of a user, the 0.1.4.3 release switched from defining
its own version of the standard 'build' function, to importing it from
GHC.Exts. This allows GHC to do more optimization, resulting in
reported speedups to uses of splitEvery, splitPlaces, and
splitPlacesBlanks. However, this makes the library GHC-specific. If
any reviewers think this is an issue I would be willing to go back to
defining build by hand, or use CPP macros to select between build
implementations based on the compiler.

Missing strategies
------------------

The specific way that the generic splitting algorithm is implemented
does preclude some imaginable splitting strategies. For example, a
few years ago I tried adding a strategy that used a predicate on pairs
of elements, splitting down the middle of any pairs that satisfy the
predicate, but gave up because it simply did not fit.

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
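To make the 'build' question above concrete, here is a minimal sketch of what "defining build by hand" could look like, together with a chunking function written on top of it. This is an illustration only, not the package's source: the name chunksSketch is hypothetical, the hand-written build is given a plain Haskell 98 type (the one in GHC.Exts has a rank-2 type), and the chunk size is assumed to be positive.

```haskell
-- Hand-written stand-in for GHC.Exts.build, with a Haskell 98 type.
-- (Assumption: this simpler type is enough for the use below.)
build :: ((a -> [a] -> [a]) -> [a] -> [a]) -> [a]
build g = g (:) []

-- Hypothetical chunking function in the spirit of splitEvery.
-- Assumes n > 0; with n <= 0 it would not terminate on non-empty input.
chunksSketch :: Int -> [a] -> [[a]]
chunksSketch n xs = map (take n) (build (splitter xs))
  where
    -- Emit the remaining list at every n-th position; 'take n' above
    -- then trims each emitted suffix down to a single chunk.
    splitter [] _    nil = nil
    splitter l  cons nil = l `cons` splitter (drop n l) cons nil
```

Swapping the local definition for an import of build from GHC.Exts is what lets GHC apply its list-fusion rules, which is the trade-off against portability described above.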
|
On Fri, 20 Jul 2012, Brent Yorgey wrote:

> Use of GHC.Exts
> ---------------
>
> At the request of a user, the 0.1.4.3 release switched from defining
> its own version of the standard 'build' function, to importing it from
> GHC.Exts. This allows GHC to do more optimization, resulting in
> reported speedups to uses of splitEvery, splitPlaces, and
> splitPlacesBlanks. However, this makes the library GHC-specific. If
> any reviewers think this is an issue I would be willing to go back to
> defining build by hand, or use CPP macros to select between build
> implementations based on the compiler.

You could provide two private modules with the same name in different
directories, one that re-exports 'build' from GHC.Exts and one with a
custom definition of 'build' for non-GHC compilers. Then set
'Hs-Source-Dirs' in Cabal according to impl(ghc). No CPP needed,
only Cabal. One could even think of a separate package for this
purpose.

The only type extension you use is GADTs, right? It looks like you
use it for an Eq constraint in Delimiter/DelimSublist. That is, you
actually need only ExistentialQuantification. Is it necessary?
|
On Fri, 20 Jul 2012, Henning Thielemann wrote:

> The only type extension you use, is GADTs, right? It looks like you use it
> for an Eq constraint in Delimiter/DelimSublist. That is, you actually need
> only ExistentialQuantification. Is it necessary?

Hm, this (Eq a) isn't even part of an existential quantification. So
where is GADT needed?
|
In reply to this post by Henning Thielemann
On Fri, Jul 20, 2012 at 11:25:47PM +0200, Henning Thielemann wrote:
> On Fri, 20 Jul 2012, Brent Yorgey wrote:
>
> >Use of GHC.Exts
> >---------------
> >
> >At the request of a user, the 0.1.4.3 release switched from defining
> >its own version of the standard 'build' function, to importing it from
> >GHC.Exts. This allows GHC to do more optimization, resulting in
> >reported speedups to uses of splitEvery, splitPlaces, and
> >splitPlacesBlanks. However, this makes the library GHC-specific. If
> >any reviewers think this is an issue I would be willing to go back to
> >defining build by hand, or use CPP macros to select between build
> >implementations based on the compiler.
>
> You could provide two private modules with the same name in different
> directories, one that re-exports 'build' from GHC.Exts and one with a
> custom definition of 'build' for non-GHC compilers. Then set
> 'Hs-Source-Dirs' in Cabal according to the impl(ghc). No CPP needed,
> only Cabal. One could even think of a separate package for this
> purpose.

Ah, this is a good idea. I'd still like to hear from other reviewers
whether they think it is worth the trouble. To what extent should the
Haskell Platform try to be compiler-agnostic (even though it includes
GHC)?

> The only type extension you use, is GADTs, right? It looks like you
> use it for an Eq constraint in Delimiter/DelimSublist. That is, you
> actually need only ExistentialQuantification. Is it necessary?

You are right, actually, only ExistentialQuantification is necessary,
as long as we also stop using GADT syntax. I didn't realize before
that this syntax is accepted:

  {-# LANGUAGE ExistentialQuantification #-}

  data Delimiter a = DelimEltPred (a -> Bool)
                   | Eq a => DelimSublist [a]

I do agree that this is a bit weird; what's going on here is not
exactly existential quantification. But in any case the
ExistentialQuantification extension turns on this ability to embed
class constraints in data constructors -- at least in GHC. I am happy
to make this change, but I guess it again raises the issue of
GHC-specificity.

As to whether there is a way to do this without using
ExistentialQuantification, I don't see an obvious solution (though
there probably is one). The issue is that we certainly don't want to
require an (Eq a) constraint when using a predicate, but we need one
when matching sublists. I'm open to suggestions.

-Brent
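As an illustration of why the constraint is embedded in the constructor, the sketch below shows a matching function over the Delimiter type quoted above. It is only a sketch under stated assumptions: matchDelim is a hypothetical name and signature chosen for this example, not necessarily what the package uses. The point is that pattern matching on DelimSublist brings the stored Eq dictionary into scope, so the function itself needs no (Eq a) constraint.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (stripPrefix)

data Delimiter a = DelimEltPred (a -> Bool)
                 | Eq a => DelimSublist [a]

-- Hypothetical helper: try to match a delimiter at the front of a list,
-- returning the matched delimiter and the remaining input.
-- Note the absence of an (Eq a) constraint in the signature.
matchDelim :: Delimiter a -> [a] -> Maybe ([a], [a])
matchDelim (DelimEltPred p) (x:xs)
  | p x = Just ([x], xs)
matchDelim (DelimEltPred _) _ = Nothing
-- Matching on DelimSublist makes Eq a available for stripPrefix.
matchDelim (DelimSublist d) xs = fmap (\rest -> (d, rest)) (stripPrefix d xs)
```

A caller using DelimEltPred never has to supply an Eq instance, while DelimSublist captures one at construction time.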
|
In reply to this post by Brent Yorgey-2
+1. People often reinvent splitting utilities because adding a
dependency on a package for simple functions like these seems overkill.
Hopefully, inclusion into HP will help it and make the overhead
smaller.

Regarding the API: I'm a bit concerned with the presence of synonyms
there (e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it
harder to learn the API (which is not small even without the synonyms),
and especially to read other people's code if their preferences in
naming differ from yours.

--
Roman I. Cheplyaka :: http://ro-che.info/
|
In reply to this post by Brent Yorgey-2
On Fri, 20 Jul 2012, Brent Yorgey wrote:

> As to whether there is a way to do this without using
> ExistentialQuantification, I don't see an obvious solution (though
> there probably is one). The issue is that we certainly don't want to
> require an (Eq a) constraint when using a predicate, but we need one
> when matching sublists. I'm open to suggestions.

If you do not embed the Eq dictionary into DelimSublist then you have
to add the Eq constraint to some functions. Is this a problem? In
turn you would get a perfectly portable Haskell 98 module.

See attached patch. You would still need to do a major version bump.
|
In reply to this post by Brent Yorgey-2
On Sat, Jul 21, 2012 at 2:34 AM, Brent Yorgey <[hidden email]> wrote:
> You are right, actually, only ExistentialQuantification is necessary,
> as long as we also stop using GADT syntax. I didn't realize before
> that this syntax is accepted:
>
> {-# LANGUAGE ExistentialQuantification #-}
>
> data Delimiter a = DelimEltPred (a -> Bool)
>                  | Eq a => DelimSublist [a]
>
> I do agree that this is a bit weird, what's going on here is not
> exactly existential quantification. But in any case the
> ExistentialQuantification extension turns on this ability to embed
> class constraints in data constructors -- at least in GHC.

GADTs and ExistentialQuantification are pretty similar. There are only
two differences:

- Syntax
- GADTs enable equality constraints, ExistentialQuantification does not

But if you have equality constraints from somewhere else (say,
TypeFamilies) then ExistentialQuantification is equivalent to GADTs.

An unrelated suggestion: you can give type signatures to the various
functions which are synonyms of each other as a group and they will
show up as a single item in the Haddocks.

For example, instead of

  -- | some docs
  splitOn :: Eq a => [a] -> [a] -> [[a]]

  -- | some other docs
  sepBy :: Eq a => [a] -> [a] -> [[a]]

  -- | different docs
  unintercalate :: Eq a => [a] -> [a] -> [[a]]

you can have

  -- | one and only docs
  splitOn, sepBy, unintercalate :: Eq a => [a] -> [a] -> [[a]]

I don't know if you consider this an improvement. I think I do.

--
Your ship was caught in a monadic eruption.
|
In reply to this post by Roman Cheplyaka-2
On Sat, Jul 21, 2012 at 09:24:16AM +0300, Roman Cheplyaka wrote:
> Regarding the API: I'm a bit concerned with presence of synonyms there
> (e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it harder
> to learn the API (which is not small even without the synonyms), and
> especially to read other people's code if their preferences in naming
> differ from yours.

Would your concern be addressed by Gábor's suggestion to group
synonyms together in the documentation?

-Brent
|
In reply to this post by Gábor Lehel
> An unrelated suggestion: you can give type signatures to the various
> functions which are synonyms of each other as a group and they will
> show up as a single item in the Haddocks.

Have you tried this with a recent version of Haddock? I think this was
only in the Haddock version that was released with GHC 7.2 (introduced
due to a change in GHC). But it had issues with explicit export lists,
so we now always expand such signatures.

Cheers,
Simon
|
In reply to this post by Henning Thielemann
On Sat, Jul 21, 2012 at 09:34:13AM +0200, Henning Thielemann wrote:
> On Fri, 20 Jul 2012, Brent Yorgey wrote:
>
> >As to whether there is a way to do this without using
> >ExistentialQuantification, I don't see an obvious solution (though
> >there probably is one). The issue is that we certainly don't want to
> >require an (Eq a) constraint when using a predicate, but we need one
> >when matching sublists. I'm open to suggestions.
>
> If you do not embed the Eq dictionary into DelimSublist then you have
> to add the Eq constraint to some functions. Is this a problem? In
> turn you would get a perfectly portable Haskell 98 module.

No, I do not want to do this. The reason is that you would no longer
be able to do splitting over lists of elements with no Eq instance,
even if you are using a predicate. For example, it is currently
possible to do

  fs = [(+1), (subtract 7), (*6)]
  fs' = splitWhen (\f -> f 7 == 0) fs

but this would no longer be possible with your patch. This is an
admittedly contrived example, but on principle I don't want to
unnecessarily restrict the API in this way.

-Brent
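The example above can be tried without the package using a small stand-in for splitWhen. This is a sketch only: splitWhenSketch is a hypothetical name, it drops the delimiters and keeps empty chunks, and it is not claimed to match the package's behaviour in every corner case. What matters is that its signature carries no (Eq a) constraint, so it works on a list of functions.

```haskell
-- Hypothetical stand-in for splitWhen: split at every element that
-- satisfies the predicate, dropping those elements.
-- No (Eq a) constraint is needed, so 'a' may be a function type.
splitWhenSketch :: (a -> Bool) -> [a] -> [[a]]
splitWhenSketch p xs = case break p xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitWhenSketch p rest

-- The list from the message above; functions have no Eq instance.
fs :: [Int -> Int]
fs = [(+1), subtract 7, (*6)]

-- 'subtract 7' is the only element sending 7 to 0, so it acts as the
-- delimiter and the result has two one-element chunks.
fs' :: [[Int -> Int]]
fs' = splitWhenSketch (\f -> f 7 == 0) fs
```

Since functions cannot be printed or compared, one way to inspect the result is to apply every function to a sample argument, for instance with map (map ($ 1)) fs'.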
|
In reply to this post by Brent Yorgey-2
On 2012-07-21 02:34, Brent Yorgey wrote:
> You are right, actually, only ExistentialQuantification is necessary,
> as long as we also stop using GADT syntax. I didn't realize before
> that this syntax is accepted:
>
> {-# LANGUAGE ExistentialQuantification #-}
>
> data Delimiter a = DelimEltPred (a -> Bool)
>                  | Eq a => DelimSublist [a]
>

Would the following type work?

  data Delimiter a = DelimEltPred (a -> Bool)
                   | DelimSublistPred [a -> Bool]

You can go from the current DelimSublist to DelimSublistPred with just
`map (==)`. And is the distinction between the DelimEltPred and
DelimSublistPred then still needed at all?

Twan
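A minimal sketch of where this suggestion leads, assuming the two constructors are merged into a single list of predicates. The names fromPred, fromSublist and matchDelim are hypothetical and chosen only for this illustration. A single-element predicate becomes a one-element list, and a sublist delimiter becomes `map (==)` over the sublist, so the Eq constraint is needed only when the delimiter is built and no language extension is required.

```haskell
-- A delimiter is a sequence of predicates that must match consecutive
-- elements of the input. Plain Haskell 98.
newtype Delimiter a = Delimiter [a -> Bool]

-- Hypothetical smart constructors.
fromPred :: (a -> Bool) -> Delimiter a
fromPred p = Delimiter [p]

-- Eq is needed here only, at construction time.
fromSublist :: Eq a => [a] -> Delimiter a
fromSublist = Delimiter . map (==)

-- Try to match the delimiter at the front of the list, returning the
-- matched elements and the remaining input.
matchDelim :: Delimiter a -> [a] -> Maybe ([a], [a])
matchDelim (Delimiter []) xs = Just ([], xs)
matchDelim (Delimiter _)  [] = Nothing
matchDelim (Delimiter (p:ps)) (x:xs)
  | p x       = fmap (\(h, t) -> (x : h, t)) (matchDelim (Delimiter ps) xs)
  | otherwise = Nothing
```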
|
On 2012-07-21 14:30, Twan van Laarhoven wrote:
> Would the following type work?
>
> data Delimiter a = DelimEltPred (a -> Bool) | DelimSublistPred [a -> Bool]
>
> You can go from the current DelimSublist to DelimSublistPred with just `map
> (==)`. And is the distinction between the DelimEltPred and DelimSublistPred
> then still needed at all?

I attached some code that uses the simplified Delimiter type.

As an aside, the documentation for splitPlaces contains this unhelpful
remark:

> The behavior of splitPlaces ls xs when sum ls /= length xs can be inferred
> from the above examples and the fact that splitPlaces is total.

To me that reads like "the documentation of this function is left as
an exercise to the reader". Perhaps say something like:

  If the input list is longer than the total of the given lengths,
  then the remaining elements are dropped. If the list is shorter than
  the total of the given lengths, then the result may contain fewer
  chunks, and the last chunk may be shorter.

While `splitPlacesBlanks` could say something like:

  If the input list is longer than the total of the given lengths,
  then the remaining elements are dropped. If the list is shorter than
  the total of the given lengths, then the last several chunks will be
  shorter or empty.

Twan
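The two proposed descriptions can be read as executable specifications. The sketch below implements them directly; the names splitPlacesSketch and splitPlacesBlanksSketch are hypothetical, and the code follows the wording suggested above rather than the package's source, so it should not be taken as a statement about the real functions' behaviour for every input (zero or negative lengths in particular).

```haskell
-- Follows the first description: leftover elements are dropped, and
-- once the input runs out no further chunks are produced.
splitPlacesSketch :: [Int] -> [a] -> [[a]]
splitPlacesSketch []     _  = []
splitPlacesSketch _      [] = []
splitPlacesSketch (n:ns) xs =
  let (chunk, rest) = splitAt n xs
  in  chunk : splitPlacesSketch ns rest

-- Follows the second description: always one chunk per requested
-- length, with trailing chunks shorter or empty if input runs out.
splitPlacesBlanksSketch :: [Int] -> [a] -> [[a]]
splitPlacesBlanksSketch []     _  = []
splitPlacesBlanksSketch (n:ns) xs =
  let (chunk, rest) = splitAt n xs
  in  chunk : splitPlacesBlanksSketch ns rest
```

With lengths [4,9,3] and the input [1..10], the first sketch yields two chunks (the second one shortened to six elements), while the second yields three chunks, the last of which is empty.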
|
On Sat, Jul 21, 2012 at 02:59:45PM +0200, Twan van Laarhoven wrote:
> On 2012-07-21 14:30, Twan van Laarhoven wrote:
> >Would the following type work?
> >
> >data Delimiter a = DelimEltPred (a -> Bool) | DelimSublistPred [a -> Bool]
> >
> >You can go from the current DelimSublist to DelimSublistPred with just `map
> >(==)`. And is the distinction between the DelimEltPred and DelimSublistPred
> >then still needed at all?
>
> I attached some code that uses the simplified Delimiter type.

Brilliant! Yes, this is a big improvement, and does indeed allow
getting rid of the ExistentialQuantification (or GADTs) extension. I
will make this change for sure, regardless of the outcome of the
review process.

> As an aside, the documentation for splitPlaces contains this unhelpful remark:
>
> >The behavior of splitPlaces ls xs when sum ls /= length xs can be inferred
> >from the above examples and the fact that splitPlaces is total.
>
> To me that reads like "the documentation of this function is left as
> an exercise to the reader". Perhaps say something like:
>
> If the input list is longer than the total of the given lengths,
> then the remaining elements are dropped. If the list is shorter than
> the total of the given lengths, then the result may contain fewer
> chunks, and the last chunk may be shorter.
>
> While `splitPlacesBlanks` could say something like:
>
> If the input list is longer than the total of the given lengths,
> then the remaining elements are dropped. If the list is shorter than
> the total of the given lengths, then the last several chunk will be
> shorter or empty.

Ah, yes, I agree. I must have been in somewhat of a cheeky mood when
I wrote that. I will improve the documentation, thanks for the
suggestions.

-Brent
|
In reply to this post by Simon Hengel
On Sat, Jul 21, 2012 at 2:13 PM, Simon Hengel <[hidden email]> wrote:
>> An unrelated suggestion: you can give type signatures to the various
>> functions which are synonyms of each other as a group and they will
>> show up as a single item in the Haddocks.
>
> Have you tried this a recent version of Haddock? I think this was only
> in the Haddock version that was released with GHC 7.2 (introduced due to
> a change in GHC). But it had issues with explicit export lists, so we
> now always expand such signatures.
>
> Cheers,
> Simon

It seems to work with Haddock 2.10 / GHC 7.4. I remember I initially
tried it with whatever old version of Haddock I had installed and I
was annoyed that it didn't work (it got split into two declarations
with one of them missing documentation entirely -- quite suboptimal),
but then following a suggestion I tried it again with GHC 7.4 and was
happy that it seemed to have improved and gave the expected behaviour,
and impressively even worked across module boundaries. I would be sad
if it stopped working again. :)

Some examples (which Hackage built using 7.4):

http://hackage.haskell.org/packages/archive/repa/3.2.1.1/doc/html/Data-Array-Repa.html
(append, (++))

http://hackage.haskell.org/packages/archive/type-eq/0.1.2/doc/html/Type-Eq.html
(cast, (|>), TypeEq(..))

--
Your ship was caught in a monadic eruption.
|
On Sat, Jul 21, 2012 at 04:37:48PM +0200, Gábor Lehel wrote:
> On Sat, Jul 21, 2012 at 2:13 PM, Simon Hengel <[hidden email]> wrote:
> >> An unrelated suggestion: you can give type signatures to the various
> >> functions which are synonyms of each other as a group and they will
> >> show up as a single item in the Haddocks.
> >
> > Have you tried this a recent version of Haddock? I think this was only
> > in the Haddock version that was released with GHC 7.2 (introduced due to
> > a change in GHC). But it had issues with explicit export lists, so we
> > now always expand such signatures.
> >
> > Cheers,
> > Simon
>
> It seems to work with Haddock 2.10 / GHC 7.4. I remember I initially
> tried it with whatever old version of Haddock I had installed and I
> was annoyed that it didn't work (it got split into two declarations
> with one of them missing documentation entirely -- quite suboptimal),
> but then following a suggestion I tried it again with GHC 7.4 and was
> happy that it seemed to have improved and gave the expected behaviour,
> and impressively even worked across module boundaries. I would be sad
> if it stopped working again. :)

Indeed, from the CHANGES file in the haddock distribution:

  Changes in version 2.9.3:

    * A type signature for multiple names generates one signature in
      the output

Neato, I didn't know about this feature. I think grouping synonyms
together like this is a good idea.

-Brent
|
In reply to this post by Gábor Lehel
(CCing David Waern, Haddocks maintainer)
On Sat, Jul 21, 2012 at 04:37:48PM +0200, Gábor Lehel wrote:
> On Sat, Jul 21, 2012 at 2:13 PM, Simon Hengel <[hidden email]> wrote:
> >> An unrelated suggestion: you can give type signatures to the various
> >> functions which are synonyms of each other as a group and they will
> >> show up as a single item in the Haddocks.
> >
> > Have you tried this a recent version of Haddock? I think this was only
> > in the Haddock version that was released with GHC 7.2 (introduced due to
> > a change in GHC). But it had issues with explicit export lists, so we
> > now always expand such signatures.
> >
> > Cheers,
> > Simon
>
> It seems to work with Haddock 2.10 / GHC 7.4.

Oh, then the patch did not make it into Haddock 2.10 / GHC 7.4.1. But
Haddock 2.11 (which comes with GHC 7.4.2) expands such signatures. The
corresponding ticket is at [1].

> I remember I initially tried it with whatever old version of Haddock I
> had installed and I was annoyed that it didn't work

AFAIK, prior to GHC 7.2, ghc did not retain that information in the AST
(see [2]).

> (it got split into two declarations with one of them missing
> documentation entirely -- quite suboptimal),

Maybe some solace, you now at least get the documentation on all the
declarations ;)

> I would be sad if it stopped working again. :)
>
> Some examples (which Hackage built using 7.4):
> http://hackage.haskell.org/packages/archive/repa/3.2.1.1/doc/html/Data-Array-Repa.html
> (append, (++))

Looks like a valid use case. I never thought about that, but if you
have synonyms for a function, it really makes sense. (Personally, I do
not really like synonyms, but for this particular case even that makes
sense to me.)

The issue was with stuff like:

  module Foo (
    -- * Foo
      foo
    -- * Bar
    , bar
    -- * Baz
    , baz
    ) where

  foo, bar, baz :: Int
  ...

Or what if you change the order of identifiers in the export list?

There is another case that would need special treatment. We now
include deprecation messages for deprecated stuff in documentation. So
for the following example the documentation for `foo` and `bar` would
be different:

  -- | Documentation for `foo` and `bar`.
  foo, bar :: Int
  {-# DEPRECATED foo "use `bar` instead" #-}
  ...

Cheers,
Simon

[1] http://trac.haskell.org/haddock/ticket/192
[2] http://hackage.haskell.org/trac/ghc/ticket/1595
|
In reply to this post by Brent Yorgey-2
I haven't reviewed the technical content of the proposal yet (beyond
having used split in prior projects), but I generally approve of
having a split-like package in the Haskell Platform.

Cheers,
Edward

Excerpts from Brent Yorgey's message of Fri Jul 20 16:35:57 -0400 2012:
> Hello everyone,
>
> This is a proposal for the split package [1] to be included in the
> next major release of the Haskell platform.
>
> [...]
|
In reply to this post by Brent Yorgey-2
* Brent Yorgey <[hidden email]> [2012-07-21 08:09:09-0400]
> On Sat, Jul 21, 2012 at 09:24:16AM +0300, Roman Cheplyaka wrote:
> >
> > Regarding the API: I'm a bit concerned with presence of synonyms there
> > (e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it harder
> > to learn the API (which is not small even without the synonyms), and
> > especially to read other people's code if their preferences in naming
> > differ from yours.
>
> Would your concern be addressed by Gábor's suggestion to group
> synonyms together in the documentation?

This could be a minor improvement (I haven't checked what these
grouped functions look like in haddock), but I don't see the need for
synonyms in the first place. Could you maybe explain the motivation
behind them?

--
Roman I. Cheplyaka :: http://ro-che.info/
|
On Sat, Jul 21, 2012 at 11:40:19PM +0300, Roman Cheplyaka wrote:
> * Brent Yorgey <[hidden email]> [2012-07-21 08:09:09-0400]
> > On Sat, Jul 21, 2012 at 09:24:16AM +0300, Roman Cheplyaka wrote:
> > >
> > > Regarding the API: I'm a bit concerned with presence of synonyms there
> > > (e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it harder
> > > to learn the API (which is not small even without the synonyms), and
> > > especially to read other people's code if their preferences in naming
> > > differ from yours.
> >
> > Would your concern be addressed by Gábor's suggestion to group
> > synonyms together in the documentation?
>
> This could be a minor improvement (I haven't checked how these
> grouped functions look like in haddock), but I don't see the need for
> synonyms in the first place. Could you maybe explain the motivation
> behind them?

Certainly. The idea is to provide synonyms whenever there are
multiple common names in use, as well as a consistent system of names
within the package itself. The goal is for new users to be able to
get started using the library as quickly as possible -- users will
usually come looking for some particular function and they may already
have an idea about what it might be called.

To be concrete, the split package has three sets of synonyms:

  * splitOn / sepBy / unintercalate

    Here 'splitOn' is an internally consistent name, which matches
    with the naming scheme used in the rest of the package. 'sepBy'
    is a name from parsec and other parser combinator libraries;
    'unintercalate' emphasizes that this function is right inverse to
    'intercalate'.

  * splitOneOf / sepByOneOf

  * splitEvery / chunk

    Again, 'splitEvery' matches the internal naming scheme; 'chunk' is
    a name commonly used for this function within the community.

I don't see much harm in this (modulo making the documentation
clearer, which I plan to do). And I really don't want to *remove*
existing names because that would force a major version bump and
potentially break any code depending on split.

-Brent

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
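For readers following the thread, the synonym sets and the "right inverse to 'intercalate'" remark can be illustrated with a small sketch. It uses only base and does not reproduce the split package's combinator-based implementation: every name ending in Sketch below is a hypothetical stand-in written for this example, and the handling of an empty delimiter is an assumption made here, not the package's documented behaviour.

```haskell
import Data.List (intercalate, isPrefixOf)

-- Hypothetical stand-in for the package's splitOn (NOT its real implementation).
-- Splits a list on every occurrence of a delimiter sublist.
splitOnSketch :: Eq a => [a] -> [a] -> [[a]]
splitOnSketch delim = go
  where
    n = length delim
    go xs
      | null delim = [xs]  -- assumption for this sketch: empty delimiter means no split
      | otherwise  = case breakOn xs of
          (piece, Nothing)   -> [piece]
          (piece, Just rest) -> piece : go rest
    -- Return the text before the first delimiter and, if one was found,
    -- the text after it.
    breakOn [] = ([], Nothing)
    breakOn s@(y:ys)
      | delim `isPrefixOf` s = ([], Just (drop n s))
      | otherwise            = let (c, r) = breakOn ys in (y : c, r)

-- Synonyms in the style discussed above: same function, different names.
sepBySketch, unintercalateSketch :: Eq a => [a] -> [a] -> [[a]]
sepBySketch         = splitOnSketch
unintercalateSketch = splitOnSketch

-- Hypothetical stand-in for splitEvery / chunk: fixed-size pieces.
splitEverySketch :: Int -> [a] -> [[a]]
splitEverySketch k xs
  | k <= 0    = error "splitEverySketch: chunk size must be positive"
  | null xs   = []
  | otherwise = let (h, t) = splitAt k xs in h : splitEverySketch k t

main :: IO ()
main = do
  let input = "a,b,,c"
      parts = splitOnSketch "," input
  print parts                                   -- ["a","b","","c"]
  print (sepBySketch "," input == parts)        -- True: a synonym, not a variant
  -- Joining the pieces again recovers the input:
  -- intercalate d (unintercalate d xs) == xs
  print (intercalate "," (unintercalateSketch "," input) == input)  -- True
  print (splitEverySketch 3 "abcdefgh")         -- ["abc","def","gh"]
```

The identity only holds in one direction: splitting the result of 'intercalate' does not in general return the original pieces, because a piece may itself contain the delimiter.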
Firstly, I agree that adding splitting functionality to the Platform would be very useful - so thanks for all the work you've done.
On the naming side of things, I'd just like to say I agree with Roman here - I think having synonyms in an API is a bad idea. (It's one of the things I've disliked most about Coq). I think there might be an argument that synonyms improve things for those who know the API really well - but that comes at the cost of worsening the experience for those (i.e. most) who know it moderately or fairly well and thus - when reading code written by others - find themselves struggling to remember whether the variants that they don't use personally are subtly different or not.

Anyway, I'd be strongly in favour of removing the synonyms (or at least side-lining them into a separate not-exported-by-default module) before adding to the platform.

--Ben

P.S. I think "intercalate" is an awful name, and "unintercalate" is certainly no better ;-) - so I'd be in favour of choosing one of the other two.

On 22 Jul 2012, at 01:43, Brent Yorgey wrote:

> On Sat, Jul 21, 2012 at 11:40:19PM +0300, Roman Cheplyaka wrote:
>> * Brent Yorgey <[hidden email]> [2012-07-21 08:09:09-0400]
>>> On Sat, Jul 21, 2012 at 09:24:16AM +0300, Roman Cheplyaka wrote:
>>>>
>>>> Regarding the API: I'm a bit concerned with presence of synonyms there
>>>> (e.g. chunk = splitEvery, sepBy = splitOn etc.) IMO it makes it harder
>>>> to learn the API (which is not small even without the synonyms), and
>>>> especially to read other people's code if their preferences in naming
>>>> differ from yours.
>>>
>>> Would your concern be addressed by Gábor's suggestion to group
>>> synonyms together in the documentation?
>>
>> This could be a minor improvement (I haven't checked how these
>> grouped functions look like in haddock), but I don't see the need for
>> synonyms in the first place. Could you maybe explain the motivation
>> behind them?
>
> Certainly. The idea is to provide synonyms whenever there are
> multiple common names in use, as well as a consistent system of names
> within the package itself.
