[ANN] Karps-0.2.0 Haskell frontend and optimizer for Apache Spark's dataframes and datasets
I would like to announce the release of Karps, an experimental Haskell frontend to Spark DataFrames and Datasets. Apache Spark is a popular framework for distributed computing that comes with several APIs. The excellent Sparkle project from Tweag integrates well with Spark's low-level ("RDD") API, while Karps focuses only on the more recent DataFrame and Dataset API. In that sense, the two projects are complementary in their goals and scope.
What can you do with it? So far, simple queries: number manipulation, importing lists of data, reading JSON files, and so on. To facilitate debugging, it integrates with Google's TensorBoard to provide rich visualizations of the dataflow. In addition, thanks to Haskell, it includes a full-program analyzer and optimizer that can automate common tasks such as cache management and query optimization. Some IHaskell notebooks give a flavor of what is possible; see the links on the GitHub page.
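To give an idea of what a whole-program cache-management pass can do, here is a toy sketch in plain Haskell (this is not Karps's actual API or implementation; all names here are made up for illustration). It walks a small dataflow expression, counts how often each subcomputation occurs, and reports the non-trivial ones used more than once, which are exactly the nodes an optimizer could mark for caching:

```haskell
import qualified Data.Map.Strict as M

-- A tiny dataflow expression: data sources and binary combinations.
data Expr = Source String
          | Join Expr Expr
          deriving (Eq, Ord, Show)

-- Count how often each subexpression occurs in the whole program.
useCounts :: Expr -> M.Map Expr Int
useCounts = go M.empty
  where
    go m e = case e of
      Source _ -> M.insertWith (+) e 1 m
      Join a b -> go (go (M.insertWith (+) e 1 m) a) b

-- Caching candidates: non-trivial nodes referenced at least twice.
cacheCandidates :: Expr -> [Expr]
cacheCandidates e =
  [ sub | (sub@(Join _ _), n) <- M.toList (useCounts e), n >= 2 ]

main :: IO ()
main = do
  let shared  = Join (Source "a") (Source "b")
      program = Join shared shared  -- 'shared' would be computed twice
  print (cacheCandidates program)
```

In a real dataframe optimizer the analysis runs over Spark's logical plan rather than a toy datatype, but the principle is the same: with the full program in hand, repeated subplans can be detected and cached automatically instead of by hand.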
Since this is my first Haskell project (I wrote my first line of Haskell nine months ago), I would appreciate any feedback on both form and substance. For example, some questions still puzzle me:
- How do I integrate a style checker? (I use Atom with ghc-mod.)
- What are the best practices for integration testing?
- Can I have tests that depend on internal modules, yet hide these internal modules from the Haddock documentation?