[ANN] Karps-0.2.0 Haskell frontend and optimizer for Apache Spark's dataframes and datasets



Karps
Hello,
 
I would like to announce the release of Karps, an experimental Haskell frontend to Spark's dataframes and datasets. Apache Spark [1] is a popular framework for distributed programming that comes with several APIs. The excellent Sparkle project [2] from Tweag integrates well with Spark's low-level ("RDD") API, while Karps focuses only on the more recent dataframe and dataset API. In that sense, the two projects are complementary in their goals and scope.
 
What can you do with it? So far, it supports simple queries such as number manipulation, importing lists of data, and reading JSON files. To facilitate debugging, it integrates with Google's TensorBoard [3] to provide rich visualizations of the dataflow. In addition, thanks to Haskell, it includes a full-program analyzer and optimizer that can automate common tasks such as cache management and query optimization. Some IHaskell notebooks give a flavor of what is possible; they are linked from the project's GitHub page.
 
The main motivation of the author (a Spark developer) is that writing Spark frontends for new programming languages is very hard. Karps explores a language-agnostic API that is simple enough to make building frontends easy (for example in JavaScript or Julia), yet still allows Spark to perform rich optimizations under the hood. If you want to know more, a talk on this topic will take place at the San Francisco Spark Users meetup.
 
Since this is my first Haskell project (I wrote my first line of Haskell nine months ago), I would appreciate any feedback on both form and substance. For example, some questions still puzzle me:
- How do I integrate a style checker? (I use atom+ghc-mod.)
- What are the best practices for integration testing?
- Can I have tests that depend on internal modules, yet hide these internal modules from the Haddock documentation?
 
Thank you for your feedback.
 
[1] http://spark.apache.org/
[2] https://github.com/tweag/sparkle
[3] https://www.tensorflow.org/get_started/summaries_and_tensorboard
 

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.