ANN:DBFunctor-0.1.1.0 - Functional Data Management / ETL Data Processing in Haskell

ANN:DBFunctor-0.1.1.0 - Functional Data Management / ETL Data Processing in Haskell

Nikos Karagiannidis
Dear all,

I am pleased to announce the release of DBFunctor 0.1.1.0.

DBFunctor is a Haskell library for ETL (Extract, Transform, and Load) data processing. It comes with an embedded type-level DSL called Julius that enables full SQL functionality over tabular data (e.g., CSV files) and also lets you write a full ETL data processing flow. Currently, DBFunctor can be used for in-memory data processing in Haskell, without the need for an external database.

The most notable change in this new release is full support for DML (Data Manipulation Language) operations: Insert (single tuple), Insert-Into-Select, Delete, Update, and Upsert (Merge) have been implemented, along with the corresponding Julius clauses.
Other changes include:
  • Implemented string aggregate function string_agg (listagg in Oracle) and the corresponding Julius clause
  • Implemented Julius Aggregate clauses: CountDist and CountStar (count(distinct col) clause and count(*))
  • Implemented semi-join operation and corresponding Julius clause
  • Implemented anti-join operation and corresponding Julius clause
  • Added support for UTCTime values
  • Solved CSV orphan instances problem
  • Various other fixes
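
For readers unfamiliar with the semi-join and anti-join operations listed above, their semantics can be sketched on plain Haskell lists. This is only an illustration: the names semiJoin, antiJoin, customers, orders and sameCustomer are hypothetical and are not DBFunctor's API or Julius clauses.

```haskell
-- Plain-list sketch of semi-join and anti-join semantics.
-- All names here are illustrative assumptions, not DBFunctor's API.

-- | Keep the left rows that have at least one matching row on the right.
semiJoin :: (a -> b -> Bool) -> [a] -> [b] -> [a]
semiJoin p xs ys = [x | x <- xs, any (p x) ys]

-- | Keep the left rows that have no matching row on the right.
antiJoin :: (a -> b -> Bool) -> [a] -> [b] -> [a]
antiJoin p xs ys = [x | x <- xs, not (any (p x) ys)]

-- Example data: (customerId, name) and (orderId, customerId).
customers :: [(Int, String)]
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]

orders :: [(Int, Int)]
orders = [(10, 1), (11, 1), (12, 3)]

-- Join condition: the order belongs to the customer.
sameCustomer :: (Int, String) -> (Int, Int) -> Bool
sameCustomer c o = fst c == snd o
```

In GHCi, semiJoin sameCustomer customers orders returns the customers with at least one order (each of them once, however many orders match), and antiJoin sameCustomer customers orders returns the customers with none.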
For any issues or problems with the DBFunctor package, please open an issue on GitHub.

Happy data processing!

Thank you.
Best Regards,
Nikos




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Re: ANN:DBFunctor-0.1.1.0 - Functional Data Management / ETL Data Processing in Haskell

Olaf Klinke
Dear Nikos,

(not CC'ing the list yet because I don't know whether my remarks miss the design goals of DBFunctor)

As a data scientist I have some questions regarding the design of DBFunctor.

1.
Why is RTable, or rather RDataType, a closed type? Was this forced by compatibility with other databases? There might be RTabular types whose elements are hard to represent as RDataType, or for which the conversion loses some of the data's semantics. What is the essence that makes the operations in the Julius language work? For example, the inner join could have the more general type
MonadPlus m => (a -> b -> Bool) -> m a -> m b -> m (a,b)
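
A minimal sketch of a function with exactly this type, written against base only, might look as follows. The name innerJoin is an assumption made for illustration and is not taken from DBFunctor.

```haskell
import Control.Monad (MonadPlus, guard)

-- | Inner join over any MonadPlus container, using the type given above.
-- The name innerJoin is hypothetical; it is not a DBFunctor function.
innerJoin :: MonadPlus m => (a -> b -> Bool) -> m a -> m b -> m (a, b)
innerJoin p ma mb = do
  a <- ma          -- draw an element from the left side
  b <- mb          -- draw an element from the right side
  guard (p a b)    -- keep the pair only if the join condition holds
  pure (a, b)
```

For lists, innerJoin (\(k, _) (k', _) -> k == k') [(1, "a"), (2, "b")] [(2, True), (3, False)] evaluates to [((2, "b"), (2, True))]. Instantiated at lists this is a nested-loop join, so it examines every pair of rows.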

2.
Is there a way of saying that a column may not contain NULL values?

3.
What about the efficiency of the operations? I see no complexities stated in the documentation of Etl.Julius.

4.
I like the feature of named operations (:=>). But can we bind a sub-operation to a Haskell variable without leaving Julius? After all, takeNamedResult can lead to exceptions because the user may reference a name that has not been defined. With Haskell variables, the compiler prevents this. Why not make the Julius language a monad, so that one can use do-notation for building the result? E.g.

type JuliusExpr = JuliusLang RTable
instance Monad JuliusLang where ...

Kind regards,
Olaf
Re: ANN:DBFunctor-0.1.1.0 - Functional Data Management / ETL Data Processing in Haskell

Olaf Klinke
In reply to this post by Nikos Karagiannidis
Apologies Nikos, apparently clicking on the sender in the mail archives replaces the address shown with "[hidden email]", a behaviour I was unaware of.

Olaf
