[ANN] sparkle: native Apache Spark applications in Haskell

[ANN] sparkle: native Apache Spark applications in Haskell

Alp Mestanogullari
Hello -cafe!

Recently at Tweag I/O we've been working on sparkle, a library for writing (distributed) Apache Spark applications directly in Haskell!

We have published a blog post introducing the project (and some of its challenges) here: http://www.tweag.io/blog/haskell-meets-large-scale-distributed-analytics

The corresponding repository lives at https://github.com/tweag/sparkle

While this is still early-stage work, we can already write non-trivial Spark applications in Haskell and have them run across an entire cluster. We obviously do not cover the whole Spark API yet (very, very far from that) but would be glad to already get some feedback.

Cheers

--
Alp Mestanogullari

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Re: [ANN] sparkle: native Apache Spark applications in Haskell

Lyndon Maydwell
Hi Alp,


Just wanted to say that there's interest here in Melbourne in Spark+Haskell too and we'll definitely be trying this out to see what it's like.

One of the problems that some of the more exotic language bindings to Spark have is that while they include RDD support, they lack a language-idiomatic interpretation of DataFrames. Does sparkle attempt to tackle this?


Many thanks to Tweag I/O for doing this. It must have been a lot of work!


Regards,

 - Lyndon 

Re: [ANN] sparkle: native Apache Spark applications in Haskell

Alp Mestanogullari
Hello Lyndon,

Glad to hear this is of interest to you. Let us know if you have any kind of feedback -- just keep in mind we only cover a ridiculously small fraction of the Spark API at the moment, but this can easily be expanded. The implementation of the Spark classes/methods that we have can be a guide for implementing ones that are not there yet.

Regarding data frames, well, as a Haskeller, Spark's data frame implementation feels a bit unsafe to me, as the type (which is just 'DataFrame') doesn't indicate how many columns there are or what type the values stored in those columns have. But Spark provides a bunch of algorithms that use those data frames, so if you happen to need one of those algorithms, you can quickly expose it to Haskell and then wrap it all in a type-safe and Haskell-y way once you've made sure everything works. This all means that, at the moment, sparkle doesn't do anything smart there. If you have any ideas or suggestions, we're all ears though!
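To make the "wrap it all in a type-safe way" idea concrete, here is a minimal sketch of one possible approach: a phantom-typed wrapper whose type-level list records the column names. It is not sparkle's API. `UntypedFrame`, `Frame`, `KnownColumns`, `typed` and `firstColumn` are all hypothetical names made up for illustration, and `UntypedFrame` is a plain Haskell stand-in for Spark's `DataFrame`, not a binding to it. The sketch only tracks column names, not the types of the values in each column.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Hypothetical stand-in for an untyped data frame: the schema is only
-- known at runtime, just like with Spark's 'DataFrame'.
data UntypedFrame = UntypedFrame
  { columnNames :: [String]
  , rows        :: [[String]]
  } deriving (Eq, Show)

-- Phantom-typed wrapper: 'cols' records the column names in the type.
newtype Frame (cols :: [Symbol]) = Frame UntypedFrame

-- Reflect a type-level list of column names back to a runtime list.
class KnownColumns (cols :: [Symbol]) where
  columnsVal :: Proxy cols -> [String]

instance KnownColumns '[] where
  columnsVal _ = []

instance (KnownSymbol c, KnownColumns cs) => KnownColumns (c ': cs) where
  columnsVal _ = symbolVal (Proxy :: Proxy c) : columnsVal (Proxy :: Proxy cs)

-- Smart constructor: check the runtime schema once, at the boundary.
typed :: forall cols. KnownColumns cols => UntypedFrame -> Maybe (Frame cols)
typed df
  | columnNames df == columnsVal (Proxy :: Proxy cols) = Just (Frame df)
  | otherwise = Nothing

-- Only accepts frames whose type says they have at least one column.
firstColumn :: Frame (c ': cs) -> [String]
firstColumn (Frame df) = concatMap (take 1) (rows df)

main :: IO ()
main = do
  let df = UntypedFrame ["name", "age"] [["alice", "30"], ["bob", "25"]]
  case (typed df :: Maybe (Frame '["name", "age"])) of
    Just frame -> print (firstColumn frame)
    Nothing    -> putStrLn "schema mismatch"
  case (typed df :: Maybe (Frame '["id"])) of
    Just _  -> putStrLn "unexpected match"
    Nothing -> putStrLn "schema mismatch"
```

The schema check happens once in `typed`; after that, functions such as `firstColumn` can state their column requirements in their types instead of failing at runtime.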

--
Alp Mestanogullari
