the last mile in haskell performance

15 messages

the last mile in haskell performance

Alberto G. Corona
Looking at this:

https://downloads.haskell.org/~ghc/6.12.3/docs/html/users_guide/primitives.html

It seems that it is impossible to manage data in Haskell within a core without L1 cache misses, except for unboxed arrays of primitive types,

since it is impossible to have unboxed arrays of user-defined types.

Am I right?

This is definitely very bad for tasks that are inherently single-threaded, and in general for the image of Haskell as a practical language.

I have more to say about that, but I would like to know, first, if I'm right, and second, if there is some idea in progress to permit user-defined unboxed datatypes. Or if there is some low-level trick for getting them using foreign calls and unsafeCoerce in some way.

I know that the language ATS has unboxing à la carte...

--
Alberto.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Re: the last mile in haskell performance

David Kraeutmann
This might be of interest to you:
https://hackage.haskell.org/package/structs


On 11/12/2015 6:49 PM, Alberto G. Corona wrote:
> [...]

Re: the last mile in haskell performance

Will Yager
In reply to this post by Alberto G. Corona

On Thu, Nov 12, 2015 at 11:49 AM, Alberto G. Corona <[hidden email]> wrote:
> [...]

Re: the last mile in haskell performance

Alberto G. Corona
A built-in derivation of the Storable instance would be a solution to the problem. Perhaps a meaningful Summer of Code project?
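For reference, here is roughly the boilerplate such a derivation would have to generate. This is a hand-written sketch for a hypothetical two-field type, with sizes and offsets hard-coded for a typical 64-bit layout; a real derivation would also have to compute alignment and padding per field:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Foreign.Storable
import Foreign.Ptr (Ptr)
import Foreign.Marshal.Alloc (alloca)

-- A hypothetical user-defined type we would like to store unboxed.
data Vec2 = Vec2 !Double !Double deriving (Eq, Show)

instance Storable Vec2 where
  sizeOf    _ = 16      -- two 8-byte Doubles, no padding needed here
  alignment _ = 8       -- alignment of the widest field
  peek p = do
    x <- peekByteOff p 0
    y <- peekByteOff p 8
    return (Vec2 x y)
  poke p (Vec2 x y) = do
    pokeByteOff p 0 x
    pokeByteOff p 8 y

main :: IO ()
main = alloca $ \(p :: Ptr Vec2) -> do
  poke p (Vec2 3 4)     -- write the record into raw memory
  v <- peek p           -- and read it back
  print v
```

With such an instance in place, Vec2 can already be kept unboxed in a StorableArray or a storable vector; the pain a derivation would remove is writing the offsets by hand.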

2015-11-12 20:38 GMT+01:00 William Yager <[hidden email]>:

> [...]

--
Alberto.


Re: the last mile in haskell performance

Alberto G. Corona
In reply to this post by David Kraeutmann
There are no examples, so it is hard to judge the functionality and the maturity of the approach.

2015-11-12 18:56 GMT+01:00 David Kraeutmann <[hidden email]>:
> [...]





--
Alberto.


Re: the last mile in haskell performance

Michael Snoyman
How about vector-th-unbox [1]?

[1] https://hackage.haskell.org/package/vector-th-unbox


On Thu, Nov 12, 2015 at 2:53 PM, Alberto G. Corona <[hidden email]> wrote:
> [...]




Re: the last mile in haskell performance

Alberto G. Corona
That is more practical.  

It is a pity that the Unbox class does not interact with the UNPACK pragma. Or so it seems.

The problem that has been going around my head since being exposed to the Google philosophy of "count the bytes" is the trade-off when choosing containers: either boxed, pure, linked, and multithread-friendly, or unboxed, mutable, and single-threaded.

Haskell chose the first, while almost all mainstream languages chose the second. That is another obstacle to the adoption of Haskell. The Google people say: we don't need multithreaded programs, because we run many single-threaded programs on one machine and that uses all the cores.
Only when there is a single application to run is the extra effort of multithreading justified, and that happens rarely in the real world. It is usual in scientific, engineering, and financial settings. It also happens in distributed settings, but in that case performance per thread and per core is also critical, so Haskell is ruled out.

But it doesn't have to be that way, or at least that is what I think. DiffUArrays are internally mutable but have a pure interface. They use a kind of versioning; in single-threaded use they theoretically perform at mutable speeds. The versioning of DiffArray is a blend of packed and linked structure that can mix the two worlds.

If unboxing were extended to any kind of user-defined data, the versioning idea could be used to build containers that perform at C speeds when single-threaded but preserve purity when multithreaded. So it is possible to have the cake and eat it too.

It is even possible to encode balanced binary trees in a compact diff array, so very fast maps could be used in single-threaded applications while remaining pure and working multithreaded.
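Setting the versioning question aside, the "balanced tree in a compact array" part can be sketched today with a plain unboxed array: lay the tree out implicitly, heap-style, with the children of node i at indices 2i+1 and 2i+2, so a search follows index arithmetic instead of chasing pointers. A minimal sketch; the names are mine, not from an existing library:

```haskell
import Data.Array.Unboxed

-- Implicit binary search tree in an unboxed array:
-- node 0 is the root, children of node i live at 2i+1 and 2i+2.
type Tree = UArray Int Int

-- Build from a sorted list: an in-order walk of the implicit tree
-- visits the slots in ascending key order.
fromSorted :: [Int] -> Tree
fromSorted xs = array (0, n - 1) (zip inorder xs)
  where
    n = length xs
    inorder = walk 0
    walk i | i >= n    = []
           | otherwise = walk (2*i + 1) ++ [i] ++ walk (2*i + 2)

-- Standard BST descent, but over array indices.
member :: Int -> Tree -> Bool
member x t = go 0
  where
    (_, top) = bounds t
    go i | i > top   = False
         | v == x    = True
         | x < v     = go (2*i + 1)
         | otherwise = go (2*i + 2)
      where v = t ! i

main :: IO ()
main = do
  let t = fromSorted [1..7]
  print (member 5 t, member 8 t)
```

The whole structure is one contiguous block of unboxed Ints, so a lookup touches a handful of nearby cache lines; the limitation, as discussed above, is that the element type must be primitive.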

The goal is to remove the objections to Haskell coming from that side of the computer industry by making such containers available, without forcing the user to know a lot about the internals of Haskell and GHC.

I do not know if anything is going on in this direction. Maybe I am being simplistic and there is something I am missing.

From experience I know that what sells a language is not the raw performance numbers, but the approach the language takes and how much that promises for the future:

For example, I could develop a container along these lines that performs badly both single- and multithreaded; the early versions surely would. But if people understand that the design has the potential to be optimized in the future, they will buy the idea and will happily accept Haskell for performance-semi-critical applications, because they will have arguments against objections of this kind.

2015-11-12 23:56 GMT+01:00 Michael Snoyman <[hidden email]>:
> [...]





--
Alberto.


Re: the last mile in haskell performance

Alberto G. Corona
Michael,

Thanks to your insight, I took a look at Data.Vector.Unboxed.

I was very happy, but there is a caveat. Looking at the documentation:

"In particular, unboxed vectors of pairs are represented as pairs of unboxed vectors". https://tldrify.com/cfj

it seems that derivingUnbox and all the unboxed Vector code do as much as possible with what GHC offers to make things unboxed, and that is limited to the primitive types.

So the only way to pack a user-defined data type is to create one vector of an unboxed primitive type for each field of the data type. So if I have a data type with five fields, the result is five vectors.

This is nice in some cases, but most of the time it is not. It does not solve the CPU-cache problem, since the fields of one element are at least length(vector) apart: if the vector is moderately long and the first field is in the cache, the second or third etc. may not be. Usually the fields of a record are handled together.

This solution will not match C/C++ speed consistently for most single-threaded programs. Moreover, there is an additional cost in the necessary packing/unpacking.

However, this is the best that can be done with what GHC offers now.
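Written out by hand, the representation described above (one unboxed array per field, a "structure of arrays") looks like this. A sketch with hypothetical names, using plain unboxed arrays instead of the vector package:

```haskell
import Data.Array.Unboxed

-- Logical element: a record of two Double fields, e.g.
--   data Particle = Particle { px :: Double, mass :: Double }
-- Structure-of-arrays encoding: one unboxed array per field,
-- instead of one array of Particle records.
data Particles = Particles
  { pxs    :: UArray Int Double
  , masses :: UArray Int Double
  }

mkParticles :: [(Double, Double)] -> Particles
mkParticles ps =
  Particles (listArray bnds (map fst ps)) (listArray bnds (map snd ps))
  where bnds = (0, length ps - 1)

-- Reading one logical record touches two arrays that sit
-- length(vector) apart in memory:
particle :: Particles -> Int -> (Double, Double)
particle (Particles xs ms) i = (xs ! i, ms ! i)

-- But a sweep over a single field is perfectly cache friendly:
totalMass :: Particles -> Double
totalMass (Particles _ ms) = sum (elems ms)

main :: IO ()
main = do
  let ps = mkParticles [(1, 2), (3, 4), (5, 6)]
  print (particle ps 1)
  print (totalMass ps)
```

Whether this layout hurts or helps depends on the access pattern: per-record access pays for the scattered fields, while per-field scans benefit from them.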

The solution would be the possibility of unboxed user-defined types:

    data MyUnboxedData = MyUnboxedData# Int# Float# ...

where MyUnboxedData# is an unboxed constructor.
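A note in passing: for a single value (though not for arrays of them), GHC's existing UNPACK pragma on strict fields already stores the primitive payloads directly in the constructor, with no pointers to separate Int/Double boxes. A sketch with a made-up type:

```haskell
-- With -O, the Int and Double payloads are stored inline in the
-- MyData heap object rather than behind pointers.
data MyData = MyData {-# UNPACK #-} !Int {-# UNPACK #-} !Double
  deriving Show

scale :: Double -> MyData -> MyData
scale k (MyData n x) = MyData n (k * x)

main :: IO ()
main = print (scale 2 (MyData 1 3.5))
```

What is missing, and what the proposal above asks for, is the ability to lay such records out unboxed element-after-element inside an array.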

Then there are some questions:
1) Am I wrong?
2) Is it feasible?
3) Is it worth the pain? (My answer: yes. The current situation is not satisfactory; this is very important for Haskell's success and should be addressed unless there is an insurmountable barrier.)
4) Are there alternatives, before making a formal proposal?

Thanks

2015-11-13 12:42 GMT+01:00 Alberto G. Corona <[hidden email]>:
> [...]



--
Alberto.


Re: the last mile in haskell performance

Tom Ellis
On Sat, Nov 14, 2015 at 12:30:12PM +0100, Alberto G. Corona  wrote:
> 4) Are there alternatives before making a formal proposal?

I think you should talk to Edward Kmett and Johan Tibell about this.
They are interested in such matters.

Tom

Re: the last mile in haskell performance

Will Yager
In reply to this post by Alberto G. Corona
This is why CPUs have many independent cache lines. Splitting a vector of records into one vector per field is usually fine for performance; I have seen it actually increase performance, because it simplifies addressing.

-Will

On Nov 14, 2015, at 05:30, Alberto G. Corona <[hidden email]> wrote:


> [...]

Re: the last mile in haskell performance

Carter Schonwald
Indeed! There's no one true best format.
There was some work to add native support for C structs to the Haskell FFI, as a Storable format for non-recursive records, but some really gnarly subtleties around alignment and packing come up that are likely impossible to characterize in a simple algorithmic fashion that serves everyone well.

On Saturday, November 14, 2015, Will Yager <[hidden email]> wrote:
> [...]

Re: the last mile in haskell performance

Alberto G. Corona
In reply to this post by Will Yager
Hi Will,
Right, I'm not an expert on low-level things, but yes, each cache line can hold part of a different vector and that can even be faster, especially if the algorithm uses only a few fields of a large structure. I was wrong about that.

But anyway, unboxing needs more native support to give Haskell credibility for performance-critical problems. Right now there is some conversion overhead for user-defined data. That may be optimized away, but the whole thing is second class: via an instance instead of a language feature.

Maybe automatically deriving Unbox instances would be the right compromise.

2015-11-14 18:53 GMT+01:00 Will Yager <[hidden email]>:
> [...]

--
Alberto.


Re: the last mile in haskell performance

Alberto G. Corona
In reply to this post by Carter Schonwald
Hi Carter

Now that the Strict pragma is being implemented, perhaps it is time to take up that work again.

I have the leksah IDE installed; it is a Haskell application. What I can verify with my performance meter on Windows is that, merely scrolling text, leksah produces many more page faults than any non-Haskell application.

2015-11-15 1:45 GMT+01:00 Carter Schonwald <[hidden email]>:
> [...]




--
Alberto.


Re: the last mile in haskell performance

Kosyrev Serge-2
In reply to this post by Alberto G. Corona
"Alberto G. Corona " <[hidden email]> writes:

> [...]

Two things immediately suggest themselves:

  1. performance nuances dictated by the context might call for different
     preferred Unboxed encodings..

  2. and, on the other hand, always engaging in a full-blown instance
     definition is undesirable, for various reasons..

..which suggests that a language feature would be helpful that allows one
to gradually inform the derivation GHC makes, without fully specifying
the complete details.

Can we imagine that a derivation could proceed, for example, from:

  - a /subset/ of a minimal typeclass definition, or alternatively
  - direct parameters

..?

Also, a set of deeper questions could be asked, if you can forgive me this
daydreaming..

If we follow the spirit of self-defined languages (won't name them):

 1. does it make sense to "open up" the process of derivation?

 2. what prevents us from describing derivations as a functional transformation?

 3. what is the possible language for these -- TH? something else?


--
с уважениeм / respectfully,
Косырев Сергей

Re: the last mile in haskell performance

Janis Voigtländer

Kosyrev Serge wrote:
> Also, a set of deeper questions could be asked, if you can forgive me this
> daydreaming..
>
> If we follow the spirit of self-defined languages (won't name them):
>
>   1. does it make sense to "open up" the process of derivation?
>
>   2. what prevents us from describing derivations as a functional transformation?
>
>   3. what is the possible language for these -- TH? something else?

https://wiki.haskell.org/GHC.Generics?
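A rudimentary version of such an "opened up" derivation can indeed be written with GHC.Generics: the shape of a type is available as its Rep, and a default method can fold over that shape. The sketch below derives the byte size a flat, field-by-field encoding would need; the FlatSize class and the fixed 8-byte field sizes are illustrative assumptions, not an existing API:

```haskell
{-# LANGUAGE DeriveGeneric, DefaultSignatures, TypeOperators #-}
{-# LANGUAGE ScopedTypeVariables, FlexibleContexts, KindSignatures #-}
import GHC.Generics
import Data.Kind (Type)
import Data.Proxy

-- Bytes a flat unboxed encoding of a value of this type would occupy.
class FlatSize a where
  flatSize :: Proxy a -> Int
  default flatSize :: (Generic a, GFlatSize (Rep a)) => Proxy a -> Int
  flatSize _ = gFlatSize (Proxy :: Proxy (Rep a ()))

-- Fold over the generic representation: sum the sizes of all fields.
class GFlatSize (f :: Type -> Type) where
  gFlatSize :: Proxy (f p) -> Int

instance GFlatSize U1 where                      -- nullary constructor
  gFlatSize _ = 0
instance (GFlatSize f, GFlatSize g) => GFlatSize (f :*: g) where
  gFlatSize _ = gFlatSize (Proxy :: Proxy (f ()))
              + gFlatSize (Proxy :: Proxy (g ()))
instance GFlatSize f => GFlatSize (M1 i c f) where  -- skip metadata
  gFlatSize _ = gFlatSize (Proxy :: Proxy (f ()))
instance FlatSize a => GFlatSize (K1 i a) where     -- a field
  gFlatSize _ = flatSize (Proxy :: Proxy a)

-- Base cases (assumed 64-bit sizes, for illustration only).
instance FlatSize Int    where flatSize _ = 8
instance FlatSize Double where flatSize _ = 8

data Particle = Particle Int Double Double deriving Generic
instance FlatSize Particle   -- uses the generic default

main :: IO ()
main = print (flatSize (Proxy :: Proxy Particle))
```

A user can override any instance by hand, which is exactly the "gradually inform the derivation" knob asked for above; handling sum types and alignment would be the next, harder steps.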


2015-11-17 10:50 GMT+01:00 Kosyrev Serge <[hidden email]>:
"Alberto G. Corona " <[hidden email]> writes:
> Hi Will,
> Right, I'm not an expert on low level things, but yes, each memory
> page can cache a different vector and even can work faster. Specially
> if the algoritm uses a few fields of a large structure. I was wrong on
> that.
>
> But anyway, Unboxed need more native support to give Haskell more
> credibility in performance critical problems. Now it has some
> conversion overhead for user defined data. That may be optimized away
> but the whole thing is second class, via an instance instead of a
> language feature.
>
> Maybe automatic deriving Unboxed instances can be the right compromise

Given how the imagination immediately suggests that:

  1. performance nuances dictated by the context might suggest different
     preferences for Unboxed encodings..

  2. and, on the other hand, given the undesirability of always engaging
     oneself into full-blown instance definition -- for various reasons..

..it suggests that a language feature would be helpful, that would allow
to gradually inform the derivation GHC makes, without fully engaging into
specifying complete details.

Can we imagine that a derivation could proceed, for example, from:

  - a /subset/ of minimal typeclass definition, or alternatively
  - direct parameters

..?

Also, a set of deeper questions could be asked, if you can forgive me this
daydreaming..

If we follow the spirit of self-defined languages (won't name them):

 1. does it make sense to "open up" the process of derivation?

 2. what prevents us from describing derivations as a functional transformation?

 3. what is the possible language for these -- TH? something else?


--
с уважениeм / respectfully,
Косырев Сергей
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe