The amount of CPP we have to use is getting out of hand


Johan Tibell-2
Hi,

(This was initially written as a Google+ post, but I'm reposting it here to raise awareness of the issue.)

The amount of CPP we have to use in Haskell is getting a bit out of hand. Here are the number of modules, per library, that use CPP for some of the libraries I maintain:

containers 18/18
hashable 4/5
unordered-containers 6/9
network 3/7
cassava 4/16
cabal/cabal-install 13/75
cabal/Cabal 7/78
ekg 1/15

If this doesn't look like a lot to you (I hope it does!), consider that some languages don't use CPP at all (e.g. Java).

CPP really sucks from a maintenance perspective:

 * It's not Haskell, but this bizarre string concatenation language.
 * The code is harder to read, bitrots more easily, and is harder to test.
 * The code can't be compiled without using Cabal (which generates some of the CPP macros for us). This hurts e.g. ad-hoc testing/benchmarking.
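As a concrete illustration of the kind of code this produces, here is a small, hypothetical module in the style described above. Note that __GLASGOW_HASKELL__ is defined by GHC itself, so this particular file compiles standalone; the MIN_VERSION_* macros, by contrast, exist only after Cabal has preprocessed the source, which is exactly the ad-hoc-compilation problem in the last bullet.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- Two implementations of the same function, selected at
-- preprocessing time. This is the "string concatenation
-- language" at work: the branch that is not selected is
-- never seen by the compiler at all.
#if __GLASGOW_HASKELL__ >= 708
import Data.Coerce (coerce)  -- Data.Coerce exists since GHC 7.8

newtype Age = Age Int

toInt :: Age -> Int
toInt = coerce
#else
newtype Age = Age Int

toInt :: Age -> Int
toInt (Age n) = n
#endif

main :: IO ()
main = print (toInt (Age 42))
```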

There are a couple of reasons we use CPP, but the main one is breaking changes in GHC and the libraries we depend on. We need to reduce these kinds of breakages in the future. Dealing with breakages and maintaining the resulting CPP-ed code is costing us time we could spend on other things, such as improving our libraries or writing new ones. I for one would like to get on with writing applications instead of spending time on run-of-the-mill libraries.

Often these breaking changes are done in the name of "making things cleaner". Breaking changes, no matter how well-intended, don't make code cleaner; they make it less clean*. Users end up having to use both the old "unclean" API and the new "clean" API.

The right way to evolve an API is to add new functions and data types, not to modify old ones, whenever possible.

* It takes about 3 major GHC releases (~3 years) before you can remove the CPP, but since new things keep breaking all the time you always have a considerable amount of CPP.

-- Johan


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: The amount of CPP we have to use is getting out of hand

Michael Snoyman

+1, you have my full support and agreement.


Re: The amount of CPP we have to use is getting out of hand

Erik Hesselink
I agree in principle, however, I'm not sure how feasible it is in
practice. I quickly grepped through some of our (internal and public)
packages and checked why we need CPP (I only checked MIN_VERSION_foo
macros):

data type change: 11
newly added function/type: 6
added instance: 4
function change/rename: 3
function remove: 1
deprecated function: 1

As you can see, most instances are either data type changes (these are
almost all in Template Haskell or haskell-src-exts) or newly added
functions where we also want to support old versions without the
function. So avoiding changing or removing functions would reduce the
CPP usage in our case by about 15%, which is welcome, but doesn't
really change anything fundamental. For data type changes, I don't
really see an alternative: if new features are added to the AST, the
AST type changes. For additions of functions, I guess we could use
a local definition even for newer versions, but that makes it less
clear when you can remove it. For instance additions, I again see no
alternative.
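The "local definition" workaround mentioned above can be sketched as follows (sortOn is purely an illustrative choice here; base only exports it from 4.8 on):

```haskell
module Main (main) where

import Data.List (sortBy)
import Data.Ord (comparing)

-- A CPP-free shim: define the helper unconditionally instead of
-- importing it, so the module builds against both old and new
-- base. The downside noted above: it is less clear when the
-- local copy can finally be deleted in favour of the real import.
sortOn' :: Ord b => (a -> b) -> [a] -> [a]
sortOn' f = sortBy (comparing f)

main :: IO ()
main = print (sortOn' negate [1, 2, 3])  -- prints [3,2,1]
```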

So while I think we can do slightly better, and I would love it if
that happened, it's probably not going to be significant.

Erik

P.S. Just to add some more data, here are the packages we're CPPing for:

   9 base
   7 template_haskell
   4 network
   4 haskell_src_exts
   2 uuid
   2 time
   1 wai
   1 json_schema
   1 containers
   1 HTTP


Re: The amount of CPP we have to use is getting out of hand

Johan Tibell-2
In reply to this post by Johan Tibell-2
If anyone would like to compute the CPP usage for their packages, they can use this command:

for lib in hashable cabal/Cabal cabal/cabal-install containers unordered-containers cassava ekg network; do
  echo $lib
  find $lib -type d \( -name tests -o -name benchmarks -o -name dist -o -name .cabal-sandbox -o -name tests-ghc \) -prune \
    -o -name Setup.hs -prune -o -name '*.hs' -exec grep -l 'LANGUAGE.*CPP' {} \; | wc -l
  find $lib -type d \( -name tests -o -name benchmarks -o -name dist -o -name .cabal-sandbox -o -name tests-ghc \) -prune \
    -o -name Setup.hs -prune -o -name '*.hs' -print | wc -l
done

Replace the list in the 'in' clause with your list of packages (which should all be in a per-package directory under $CWD).


Re: The amount of CPP we have to use is getting out of hand

Johan Tibell-2
To complete the list, here are my other three packages:

network-uri 1/1
ekg-statsd 1/3
ekg-core 1/8



Re: The amount of CPP we have to use is getting out of hand

Nicola Gigante
In reply to this post by Johan Tibell-2


Hi

I’m an outsider, so this may sound naive, but why not
think about an in-language feature to solve the
problems addressed by CPP?

I think these all fall into:
Enable some top-level declaration only if the XYZ feature is
available.

This feature should allow the user to specify different definitions
of the same symbol depending on the availability of compiler
features, but also _module_ features.

So if I can declare and export a newFunc function from my module 
only if DataKinds is supported, I can do it explicitly instead of
relying on GHC version X.Y.Z. On the other hand, the users of
my module can decide if they want to compile some code depending
on the fact that my module exports the function or not.

This should not be limited to “the module exports the function”. Other
types of “features” could be tested for, and modules should be
able to declare the new features they add which deserve to be tested
in this way. For example, if in the 2.0 version of my module I’ve
increased the laziness of my data structure, I can export the feature
“MyModule.myFunc is lazy” (encoded in some way). Then the
user can decide which implementation of its algorithm to use depending
on this.

I think a system like this should solve the majority of maintenance burden
because:
- Dependencies on libraries features are explicit and the version
  numbers needed to support them can be inferred by cabal.
  For example cabal could support a syntax like containers(with:foo_is_lazy)
  instead of containers >= x.y.z
- GHC can automatically warn about features that are supported by all the 
  currently supported versions of GHC so that the checks can be removed.
- Code is more testable because the test suite could run tests multiple times,
  each time “faking” the availability of certain features, with the GHC support 
  of a “fake old version mode” where it has simply to pretend to not know
  the existence of a certain extension (not at all a “compatibility mode”, to be
  clear). As for library features, by integrating with cabal sandboxes one could
  automatically switch library versions to run the tests against.
- Other?

I repeat: I’m an outsider to the world of Haskell package maintenance,
so I could be missing something obvious. Hope this can be useful though.



Bye,
Nicola



Re: The amount of CPP we have to use is getting out of hand

David Fox-12
I wonder how much of the CPP functionality could be implemented using Template Haskell?


Re: The amount of CPP we have to use is getting out of hand

Michael Orlitzky
In reply to this post by Johan Tibell-2

I looked at a few of these, and some of the CPP could be avoided.
Whether or not the alternatives involve more work -- well, you be the judge.

1. TESTING constant. Here CPP is used to export internal stuff during
   testing, for example:

     module Data.Set (
     #if !defined(TESTING)
       Set
     #else
       Set(..)
     #endif

   This allows you to put the tests in a separate module, but give them
   access to internal functions. I'm torn on which solution is better,
   but I've settled on putting the tests in the module with the
   functions they test. You then have to depend on e.g. tasty, but who
   cares -- Cabal should be running the test suites anyway and bail out
   if they fail. That's (half of..) what they're for.

   You also have to import the Test.Crap in each module, but this
   bothers me less than I thought it would. If you use doctest to test
   your examples, then those tests have to go in the module with the
   functions themselves, so at that point there's no additional
   uncleanliness felt.

2. Optionally enable new features with newer GHCs. One example:

     #if MIN_VERSION_base(4,8,0)
     import Data.Coerce
     #endif

   These are better addressed with git branches. Do your development on
   the master branch targeting the latest GHC, but also keep a branch
   for older GHC. The master branch would have "import Data.Coerce", but
   the "old_ghc" branch would not. It doesn't produce much extra work
   -- git is designed to do exactly this. Whenever you make a new
   commit on master, it's trivial to merge it back into the old_ghc
   branch.

   Suppose your library foo is at version 1.5.0 when a new GHC is
   released. You can use the master branch for 1.6.0, using the new
   features. The next time you make a release, just release two new
   packages: 1.5.1 and 1.6.1 that target the old and new GHC
   respectively.

   This way you at least *work* off of a clean code base. Your new
   tricks in the master branch just look like a patch on top of
   what's in the old_ghc branch.


Re: The amount of CPP we have to use is getting out of hand

Andrey Chudnov
In reply to this post by Johan Tibell-2
Johan,
I hear and agree. Even though I've never used CPP in my packages, I've
read code that does ---and it's horrible. And I, too, have experienced
new GHC stable versions breaking foundational libraries, including
Cabal. But, my impression was that most of these breakages are due to
certain GHC extensions being deprecated, and not because the compiler
stops respecting the standard. Is my understanding correct? If so, then
why not disable extensions and limit yourself to Haskell 2010? Yes, you
get used to the good stuff quickly, and it's painful to give it up ---
but isn't that what standards are for?

Re: The amount of CPP we have to use is getting out of hand

Thomas DuBuisson
It is not my experience that encouraging stagnation in library APIs
will either reduce CPP or improve the community.

My project modules using CPP:

0/12
0/6
3/9    - Changes to base/tagged, conditionally including expensive code needed by few users
10/116 - gmp arch-specific issues, RecursiveDo vs DoRec, bitSizeMaybe, debug trace
0/7
4/5 - Platform specific code selection
0/1
0/7
1/4 - Architecture specific unsafeness for performance gains
5/207 - File and line number enhanced error messages
0/1


Re: The amount of CPP we have to use is getting out of hand

Ben Gamari-2
In reply to this post by Nicola Gigante
Nicola Gigante <[hidden email]> writes:


> Hi
>
> I’m an outsider so this could probably sound ingenuous but,
> why not thinking about an in-language feature to solve the
> problems addressed by CPP?
>
This might be a good time to bring in the data point provided by
Rust [1], whose attribute system allows conditional
compilation. For instance,

    #[cfg(not(a_feature))]
    pub fn my_function() { ... }

    #[cfg(a_feature)]
    pub fn my_function() { ... }

The build system can then detect whether the feature in question is
available, and potentially pass `--cfg a_feature` to the compiler. `cfg`
items can also have string values which can be tested for equality
(although I think they intend on extending this at some point).

This works well for them as it is flexible and fits nicely into the
language, reusing the attribute syntax that Rust users are already
familiar with.
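A self-contained sketch of the mechanism (the feature and function names are illustrative): compiled plainly with rustc, a_feature is not set, so only the not(a_feature) item exists; building with `rustc --cfg a_feature` flips the choice.

```rust
// Two alternative items guarded by cfg attributes; exactly one
// is compiled, depending on whether `--cfg a_feature` was passed.
#[cfg(not(a_feature))]
fn my_function() -> &'static str {
    "fallback implementation"
}

#[cfg(a_feature)]
fn my_function() -> &'static str {
    "feature-enabled implementation"
}

fn main() {
    // Without --cfg a_feature, this prints the fallback.
    println!("{}", my_function());
}
```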

The closest thing Haskell has to this is the conventional `{-# ... #-}`
pragma syntax, but leveraging this would almost certainly require
compiler support and a language extension.

Cheers,

- Ben


[1] http://doc.rust-lang.org/reference.html#conditional-compilation
