relational data representation in memory using haskell?

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

relational data representation in memory using haskell?

Marc Weber
I'm kind of stuck that's why I'm posting here to ask wether this makes
sense at all, maybe someone else has already done it?

What I'd like to have:

Some way representing relational data which is typically stored in
databases such as Postgresql..

Rewriting something like Postgresql in haskell would take ages..
So I'd be satisfied with having in memory representation only (this
would fit the HAppS state system very well .. :)
Why ?
* type safety
* less conversions compared to SQL data
* no need to switch processes, parse SQL etc so maybe it's even faster?
  (a small benchmark showed that inserting 20000 Ints into a list was 8
   times faster than using MySQL parsing 20000 INSERT INTO x (1)
   statements )


I'd like to illustrate two different ideas using a small example:
(A)
        data CD = CD { title :: String, tracks :: [ Track ] }
        data Track = Track { track :: String, cd :: CD }
        data PDB = PDB { cds :: Set CD, tracks :: Set Track }

because it's not using foreign ids but kind of pointers I'll call this
the pointer method

using uniq ids it would look like this:
(B)
        data CD = CD { id : Int,  title :: String, tracks :: [Int ] }
        data Track = Track { trackId :: Int,  track :: String, cd :: Int }
        data IDB = IDB { cds :: Map Int CD, tracks :: Map Int Track }
I will call it I DB (I = using ids)

PDB: pro : * less work when doing joins (no need to look foreign rows up)
     con : * you need uniq ids or such when serializing to disk
           * When updating a track you'll also have to update the pointer
             stored in cds. and if you had another table shelfs.. this
             had to be updated as well..

IDB: the other way round


I find the idea not using any lookups when using joins appealing.

Of course having a simple
        data Table = Table Map UniqId Rec

isn't enough, sometimes you need more than one index or even a multi index:
        data Table = Table { byId :: Map Int Rec
                     , byNameAndAge :: Map String (Map Int (Set Rec)) }

Note that I've used Set here as well because this index does'nt have to
be uniq! starting to write an
insertTable :: Table -> Rec -> Table
more than twice is getting tedious..

Of course you can start using some type hackery to insert a rec
into all maps automatically.. but you'll get into trouble making
the type system apply the best index not the first matching one.
(I bet this could be done using HList etc somehow as well.. )
So my current attempt is defining the database using some data types and
make template haskell derive those insertIntoTable and update functions.

I've added the draft below. But before continuing spending much time on
it I'd like to get your advice: Is there a chance that it will pay off?

Some general considerations:
haskell solution con:
        haskell can get close to C but in general it may be >10 times slower when
        not caring too much about design or writing low level (see recent thread
        about md5 or one where David Roundy has said something about a matrix thread:
        only 10 times slower?)

        Using a garbage collector on database data (some hundred MB)
        might not be the optimal way because I feel you can tell exactly
        when you no longer need a piece of allocated memory here?
        So some time might be wasted.

        projects tend to run longer as expected.. And if data no longer
        fits into memory .. :(... -> bad performance
        I think systems such as postgresql do scale much better if you
        have some gbs of data and only use the most recent X records
        frequently.. So maybe you'll have to spend time later which
        you've won by using a haskell relational data representation in
        memory only.. Another solution: use clusters - I don't have any
        experience.
         
pro:
        much more safety (STM, type system ..) there are less
        possibilities making compared to C / PHP etc

Do you also think (A) is more interesting because some load (looking up
foreign keys) is moved on insert / delete and update operations taking
less time in but are called more frequently thus maybe reducing peak
load on queries?

Of course some time would have to be spend on queries wich might
look like this:
let queryresult = $(query ( tables + constraints + relations ) ) db
automatically generating the query function taking into account expected
index cardinality etc..

Any comments, suggestions, links to existing solutions (except coddfish,
haskelldb) ?

Marc Weber


draft
============= types represeting tables and db ========================
module RDMH.Types where
import Language.Haskell.TH
import Language.Haskell.TH.Syntax

data Uniqueness = Uniq | NotUniq deriving (Show, Eq)
data ModifyMode = InsertOnly | UpdateInsert | UpdateInsertDelete deriving (Show, Eq)

type TypeS = String -- a name of a data type (data A = ..)

data Index = I {
    uniqueness :: Uniqueness
    , key :: Exp  -- a fuction rec -> key
    , subkeys ::  [ Index ]
    , keyType ::  TypeS  -- the key type (this *should* be determined by the function.. But I don't know yet how to do it.. Probably using some GHC API magic
    -- , expectedCardinality :: Int -- how should this look like exactly? This will be used to select the best indexes
  } deriving (Show, Eq)

  -- Int is negative weight / cardinality.
  -- If you issue a query having two contstraints on types beeing indexed the one having biggest cardinality is taken first
  -- first type:`

-- convinient function to get the type of a name - see TestCase.hs
t = ConT . mkName -- type (either key or row)
k = VarE . mkName -- gets the key
-- app (VarE $ mkName "fst" )

-- a table representation
data Table = Table {
      tableName :: String
    , row :: TypeS -- this row type must be an instance of Eq
    , indexes :: [ Index ] --  [ ( [  mkName "Int", mkName "String" ]] is represented as Map Int ( Map String rec )
    -- , history :: Bool -- to be implemented. Each time a record is updated the original row should be saved somehow somewhere
    , modify :: ModifyMode
    -- , insertTimeStamp :: Bool
    -- , updateTimeStamp :: Bool
  }
  deriving (Eq, Show)

data DB = DB {
    dbName :: String
  , tables :: [ Table ]
  , oneToN :: [ (Table, Table, String)  ]
  -- String must be set if you want two relations to the same table (eg one ticket has foreign inbound and outbound flight)
  -- , nToM :: [ ( Table, Table ) ]
}
============= example use case =======================================

{-# OPTIONS_GHC -XTemplateHaskell #-}
module TestCase where
import Language.Haskell.TH
import RDMH.Types
import RDMH.TH
import Data.Time


-- the data
data RCD = CD { title :: String
             , artist :: String
             , year :: Int
             }
data RTrack = Track {
               trackTitle :: String
             , recordingDate :: UTCTime
             }

-- a track table with two indexes title and recordingDate
trackT = Table "track"
                 (t "RTrack") -- row type
                 -- indexes:
                 [ I NotUniq (k "trackTitle" ) [] (t "String")
                 , I NotUniq (k "recordingDate" ) [] (t "UTCTime")
                 ]
                 UpdateInsertDelete
-- a cd table with two indexes title and year
cdT = Table "cd"
                 (t "RCD" ) -- row type
                 -- indexes:
                 [ I Uniq (k "title" ) [] (t "String")
                 , I NotUniq (k "year" ) [] (t "Int")
                 ]
                 UpdateInsertDelete


-- the db: both tables and a simple relation
db = DB "cdDB"
        [ cdT, trackT ]
        [ OneToN cdT trackT "" ) ]


$(createDB db)
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Jeremy Shaw-3
At Thu, 22 May 2008 01:04:24 +0200,
Marc Weber wrote:

> Some way representing relational data which is typically stored in
> databases such as Postgresql..
>
> Rewriting something like Postgresql in haskell would take ages..
> So I'd be satisfied with having in memory representation only (this
> would fit the HAppS state system very well .. :)

Are you familiar with the HAppS IxSet library?

You would do something like:



$( deriveAll [''Ord,''Eq,''Read,''Show,''Default]
   [d|
        data CD = CD AlbumTitle Artist [Track]
        newtype Artist = Artist String
        newtype AlbumTitle = AlbumTitle String
        data Track = Track TrackTitle TrackIndex
        newtype TrackIndex = TrackIndex Int
        newtype TrackTitle = TrackTitle String
  |])

$(inferIxSet "CDS" 'noCalcs [''AlbumTitle, ''TrackTitle, ''Artist])

This creates a table with indexs on AlbumTitle, TrackTitle, and
Artist.

You can do a simple query like:

  mycds @= (Artist "Wesley Willis")

to get all the tracks by Wesley Willis.

You should be able to build joins, etc on top of that.

j.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Marc Weber
On Wed, May 21, 2008 at 05:05:21PM -0700, Jeremy Shaw wrote:

> At Thu, 22 May 2008 01:04:24 +0200,
> Marc Weber wrote:
>
> > Some way representing relational data which is typically stored in
> > databases such as Postgresql..
> >
> > Rewriting something like Postgresql in haskell would take ages..
> > So I'd be satisfied with having in memory representation only (this
> > would fit the HAppS state system very well .. :)
>
> Are you familiar with the HAppS IxSet library?
Yes - not with all that sybwith-class stuff though.
There are some issues:
its dynamic : doesn't this waste some CPU cycles?
no multi indexes..
maybe some space leaks because the data type containing the Maps is
build after each filter maybe leaving unevaluating chunks - Saizan has
told me about it on HAppS.. And you can't extend it to the degree I'd
like to (eg throw a query at it and let the system figure out which
indexes to use)
And last but not least: It does'nt support relations at all yet.
So all the effort adding / checking foreign keys etc has to be done
anyway.

Thanks Marc
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Dan Weston
Consider SQLite [1], which is "a software library that implements a
self-contained, serverless, zero-configuration, transactional SQL
database engine."

It is embeddable, can reside completely in memory (including the data),
and can be saved and restored to disk when needed. It neatly fills the
niche between maps and a client/server database model.

It has a C API which you can wrap as needed with the FFI, and you
wouldn't need more than a dozen or so functions to start with (it
understands SQL too).

[1] http://www.sqlite.org/

Marc Weber wrote:

> On Wed, May 21, 2008 at 05:05:21PM -0700, Jeremy Shaw wrote:
>> At Thu, 22 May 2008 01:04:24 +0200,
>> Marc Weber wrote:
>>
>>> Some way representing relational data which is typically stored in
>>> databases such as Postgresql..
>>>
>>> Rewriting something like Postgresql in haskell would take ages..
>>> So I'd be satisfied with having in memory representation only (this
>>> would fit the HAppS state system very well .. :)
>> Are you familiar with the HAppS IxSet library?
> Yes - not with all that sybwith-class stuff though.
> There are some issues:
> its dynamic : doesn't this waste some CPU cycles?
> no multi indexes..
> maybe some space leaks because the data type containing the Maps is
> build after each filter maybe leaving unevaluating chunks - Saizan has
> told me about it on HAppS.. And you can't extend it to the degree I'd
> like to (eg throw a query at it and let the system figure out which
> indexes to use)
> And last but not least: It does'nt support relations at all yet.
> So all the effort adding / checking foreign keys etc has to be done
> anyway.
>
> Thanks Marc
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Salvatore Insalaco
In reply to this post by Marc Weber
2008/5/22 Marc Weber <[hidden email]>:
> I'd like to illustrate two different ideas using a small example:
> (A)
>        data CD = CD { title :: String, tracks :: [ Track ] }
>        data Track = Track { track :: String, cd :: CD }
>        data PDB = PDB { cds :: Set CD, tracks :: Set Track }
>
> because it's not using foreign ids but kind of pointers I'll call this
> the pointer method

This doesn't look like a relational structure at all in Haskell.
Let's take the CD and Track relations. In a relational database you
have something like:
CD (1, 'Querying about you')
Track (1, 'Inserting some love', 1)
Track (2, 'Updating my feelings', 1)
Track (3, 'Deleting my hopes', 1)

In an imperative language you can do something similar in memory using
objects (you can in haskell to with IORefs and so on, but let's stay
on "data"). You get something like:

0x000 CD('Querying about you')
0x004 Track('Inserting some love, 0x004)
...

In Haskell when you say:
>        data Track = Track { track :: String, cd :: CD }

You are not storing in Track a reference, a pointer or something
similar to a CD, you are storing a *value* (low level you probably
have a pointer, but you have not pointer semantics). As you noticed,
you cannot "update" the CD title without changing each Track. That's a
way to store information, and a good way too, but it's not a
relational structure by any extent.

If you want to use this structure for your relational data you need two things:
1) Something that will convert from a value-based representation of
data to something relational (aka ORM in the OO world... a FRM? VRM?).
2) A relational storage (internal or external).

If you want to use "normal" Haskell ADT, are you sure that a
relational storage is what you want? Keeping that in memory doesn't
give you some advantages of relational databases (e.g. uniform
representation), and the impedance between the functional and the
relational world is not easy to manage.

Maybe I misunderstood what you are trying to accomplish, and you only
want to do a generic data structure with fast lookups on the content
of the items? Or do you really need relational semantics?

Salvatore
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Marc Weber
On Thu, May 22, 2008 at 08:16:54AM +0200, Salvatore Insalaco wrote:

> 2008/5/22 Marc Weber <[hidden email]>:
> > I'd like to illustrate two different ideas using a small example:
> > (A)
> >        data CD = CD { title :: String, tracks :: [ Track ] }
> >        data Track = Track { track :: String, cd :: CD }
> >        data PDB = PDB { cds :: Set CD, tracks :: Set Track }
> >
> > because it's not using foreign ids but kind of pointers I'll call this
> > the pointer method
>
> This doesn't look like a relational structure at all in Haskell.
> Let's take the CD and Track relations. In a relational database you
> have something like:
> CD (1, 'Querying about you')
> Track (1, 'Inserting some love', 1)
> Track (2, 'Updating my feelings', 1)
> Track (3, 'Deleting my hopes', 1)
>
> In an imperative language you can do something similar in memory using
> objects (you can in haskell to with IORefs and so on, but let's stay
> on "data"). You get something like:
>
> 0x000 CD('Querying about you')
> 0x004 Track('Inserting some love, 0x004)
You are right. But ghc does represent those things as pointers.. :)
So it is indeed unless you ask ghc to not use (eg by using strict fields
etc), correct? IORefs are not that good because you can't read them
within STM. But you are right: Using IORefs was my first idea.

> similar to a CD, you are storing a *value* (low level you probably
> have a pointer, but you have not pointer semantics). As you noticed,
> you cannot "update" the CD title without changing each Track. That's a
> way to store information, and a good way too, but it's not a
> relational structure by any extent.
I agree it's not in general and higly implementation dependant
Maybe I'm totally wrong. I imagine ghc having some internal
representation of data types which come close to

struct CD {
  Track * tracks;
  title ** char;
} cd;

struct Track {
  cd * CD;
  title ** char;
} track;

(does'nt compile but you get the idea. My C knowldge is good enough for
reading only).

So if you start seeing the whole database as list of connected structs
vio pointers adding / deleting/ inserting is quite the same as adding
some nodes to a Data.Map. You replace the nodes you have to replace and
finally get a poiter pointing to te new database state. As long as you
don't loose the original "pointer" you can easily rollpack.
Consider

let x = Cd ...
forkIO $ ( do something with x } -- (1)
print x -- (2)

How can ghc know when running line (2) that (1) hasen't changed the
record? I see two solutions:
a) give the forked process a copy (Then my design will collapse)
   but this is expensive to copy data without knowing you ned to
b) use pointers and replace x ony on updating. Thus if (1) changes the
   title a new struct wil be created poiting to the old list but a new
   title String. line (2) doesn't have to care at all.
I'm guessing that analyzing a) when values have to be copied is
complicated - thus b) is implemented ? quicksilver has told me so as
well - But I'm not sure that's why I'm asking.

> If you want to use "normal" Haskell ADT, are you sure that a
> relational storage is what you want? Keeping that in memory doesn't
> give you some advantages of relational databases (e.g. uniform
> representation), and the impedance between the functional and the
> relational world is not easy to manage.
>
> Maybe I misunderstood what you are trying to accomplish, and you only
> want to do a generic data structure with fast lookups on the content
> of the items?
Exactly.. Of course I only want a generic data structure with fast
lookups and content of items.. But I don't want to write the insert
delete and update functions for each table again and again..
Does this already exist?

Marc Weber
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Marc Weber
In reply to this post by Dan Weston
On Wed, May 21, 2008 at 06:07:15PM -0700, Dan Weston wrote:
>  Consider SQLite [1], which is "a software library that implements a
[..]
>  It has a C API which you can wrap as needed with the FFI, and you wouldn't
>  need more than a dozen or so functions to start with (it understands SQL
>  too).
So it has kind of API enabling me inserting rows without using SQL?
I still have to do some marshalling to / from C and synchronize db
layout and haskell data types.

Marc Weber
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Ketil Malde-5
In reply to this post by Salvatore Insalaco
"Salvatore Insalaco" <[hidden email]> writes:

> This doesn't look like a relational structure at all in Haskell.

I believe you are abusing terminology here.  'Relation' refers to a
table (since it represents a subset of AxBxC.., i.e. a relation), not
to references between tables.

Mutability and mutability of references is of course important in most
relational databases, but I'm not convinced an immutable database
wouldn't be interesting and useful in a functional programming
language.

I've always (well not that I use them often) been annoyed at RDBMS
lack of discriminated unions.

The TH based approach by HAppS looks cool, but I think simply a
slightly more general Data.Map (supporting multiple indices, search
by named field and so on) could be a useful thing.

-k
--
If I haven't seen further, it is by standing in the footprints of giants
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Salvatore Insalaco
In reply to this post by Marc Weber
> Consider
>
> let x = Cd ...
> forkIO $ ( do something with x } -- (1)
> print x -- (2)
>
> How can ghc know when running line (2) that (1) hasen't changed the
> record? I see two solutions:
> a) give the forked process a copy (Then my design will collapse)
>   but this is expensive to copy data without knowing you ned to
> b) use pointers and replace x ony on updating. Thus if (1) changes the
>   title a new struct wil be created poiting to the old list but a new
>   title String. line (2) doesn't have to care at all.

GHC knows that because in Haskell isn't possible to "update" x. x is
not a variable, it's a binding.
To put it simply: with IORefs (and STRefs, MVars, ...) you have
references to values that you can change (inside their respective
monads), much like variables, but data declarations are values, not
references to values (even if GHC stores them as pointers you cannot
treat them as such), so you cannot update it.

So, in your example, you have more or less a relation (CD) where all
the "columns" are part of primary key (and so they are not mutable).

Salvatore
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Marc Weber
In reply to this post by Ketil Malde-5
On Thu, May 22, 2008 at 10:56:03AM +0200, Ketil Malde wrote:
> "Salvatore Insalaco" <[hidden email]> writes:
>
> > This doesn't look like a relational structure at all in Haskell.
>
> I believe you are abusing terminology here.  'Relation' refers to a
Yes. Sorry. I thought the relational in relational databases refers to
references between tables. But you a right a relation is a set of rows
forming a table. Thanks for clarifying.

Thanks Marc
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Marc Weber
In reply to this post by Salvatore Insalaco
Hi Salvatore

On Thu, May 22, 2008 at 11:01:01AM +0200, Salvatore Insalaco wrote:

> > Consider
> >
> > let x = Cd ...
> > forkIO $ ( do something with x } -- (1)
> > print x -- (2)
> >
> > How can ghc know when running line (2) that (1) hasen't changed the
> > record? I see two solutions:
> > a) give the forked process a copy (Then my design will collapse)
> >   but this is expensive to copy data without knowing you ned to
> > b) use pointers and replace x ony on updating. Thus if (1) changes the
> >   title a new struct wil be created poiting to the old list but a new
> >   title String. line (2) doesn't have to care at all.
>
> GHC knows that because in Haskell isn't possible to "update" x. x is
> not a variable, it's a binding.
> To put it simply: with IORefs (and STRefs, MVars, ...) you have
> references to values that you can change (inside their respective
> monads), much like variables, but data declarations are values, not
> references to values (even if GHC stores them as pointers you cannot
> treat them as such), so you cannot update it.
Sorry - maybe I'm unable to express using the correct terminology..
So I'll just give a small example how I think it could magically work?

  data CD = CD { title :: String, tracks :: [ Track ] }
  data Track = Track { track :: String, cd :: CD }
  data PDB = PDB { cds :: Set CD, tracks :: Set Track }

Let's fill the database with 1 track and a cd:

0x3 = pointer to DB rec
0x1: adress of CD
0x5: adress of Track
0x4, 0x9, 0x9: start adress of linked list connected by pointers..
    In the final solution should use finger trees or such to speed up
    deletion / replacing elements

0x3 database:
        0x8 cds :
                tuple1 0x1 : (0x6 "My song")     (0x4 [ 0x5, ... ])
                              ^ pointer to str    ^ pointer to track list, 0x5 = pointer to track

        0x9 tracks:
                tuple1 0x5 : ( 0x7 "track 1") 0x1
                                        ^ reference to cd
       
Now I query the track, and "update" it (replacing the title)..
It's a little bit tricky, because when updating the track I need to
update the cd as well (circular referency). All new pointers are
starting from 0x20

So in haskell it would look like this:
        let updatedCd = 0x22  CD (0x6 "My song")  (0x20 ( 0x23 : ...)
            updatedTrack = 0x23 Track ( 0x21 "updated track title" ) 0x22
        in (0x27) DB (0x24 (updatedCd:otherCds)) (0x25 (updatedTrack:otherTracks))

Now my new address to access the database is 0x25. So pretty every
adress has been changed but 0x6, ..., otherCds and otherTracks
A query running using db 0x3 will not notice any change on its snapshot.
Are these actions called rebinding?

Of course if you have a lot of relations writing this
        let
        in
will become tedious and error prone.. That's why I'd like to use
template haskell to automatically derive it.

Thanks for listening

Marc Weber
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Salvatore Insalaco
2008/5/22 Marc Weber <[hidden email]>:
> So in haskell it would look like this:
>        let updatedCd = 0x22  CD (0x6 "My song")  (0x20 ( 0x23 : ...)
>            updatedTrack = 0x23 Track ( 0x21 "updated track title" ) 0x22
>        in (0x27) DB (0x24 (updatedCd:otherCds)) (0x25 (updatedTrack:otherTracks))

Mmmm I don't think that this is a good way to go.
Let me do a counter-example:

data A = A String
data B = B String [A]
data C = C String [B]
data D = D String [C]

Suppose to have some As, Bs, Cs, Ds in your database. Now you want to
"update" the String of A. As you cannot "update" stuff in Haskell
mantaining the same pointer, you've got a "new A". So you must find
all Bs that had this A in their list, and update that.
Unfortunately lists are not mutable too, so you are creating a new
list; so you need to create new containing Bs too. But then you must
change Cs... and so on.
A little change like changing the String in A requires updating the whole "DB".

Salvatore
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Marc Weber
On Thu, May 22, 2008 at 12:48:42PM +0200, Salvatore Insalaco wrote:

> 2008/5/22 Marc Weber <[hidden email]>:
> > So in haskell it would look like this:
> >        let updatedCd = 0x22  CD (0x6 "My song")  (0x20 ( 0x23 : ...)
> >            updatedTrack = 0x23 Track ( 0x21 "updated track title" ) 0x22
> >        in (0x27) DB (0x24 (updatedCd:otherCds)) (0x25 (updatedTrack:otherTracks))
>
> Mmmm I don't think that this is a good way to go.
> Let me do a counter-example:
>
> data A = A String
> data B = B String [A]
> data C = C String [B]
> data D = D String [C]
>
> A little change like changing the String in A requires updating the whole "DB".
You're right. Very bad idea unless you only insert once a year and only
have queries the whole day.
The only way to fix this is by separating relational data from record
data.
> data A = Map RecordDataA RelationalDataA
> data B = Map RecordDataB RelationalDataB

So when changing a field in RecordDataA only the relational data B would
have to be updated.. but I see that that's not that good either.
Fine. Then the only way to go is using uniq ids as keys the way it's
already done everywhere

Thanks
Marc W
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Isaac Dupree
In reply to this post by Ketil Malde-5
to whoever in this thread hasn't realized it:

Map String (Map Int Foo) == Map (String,Int) Foo

(at least to an approximation)

-Isaac
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Marc Weber
On Thu, May 22, 2008 at 09:11:28AM -0400, Isaac Dupree wrote:
>  to whoever in this thread hasn't realized it:
>  Map String (Map Int Foo) == Map (String,Int) Foo
>  (at least to an approximation)

That's note quite right..
if you care only about the Int -> String -> [ Rec ] lookup you are
right. But When using Map String (Map Int Rec) you can still use the
first index also you'll have to spend some time joining all sets found
in the second one..

Marc
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Marc Weber
On Thu, May 22, 2008 at 03:34:36PM +0200, Marc Weber wrote:
> On Thu, May 22, 2008 at 09:11:28AM -0400, Isaac Dupree wrote:
> >  to whoever in this thread hasn't realized it:
> >  Map String (Map Int Foo) == Map (String,Int) Foo
> >  (at least to an approximation)
There is another difference if you want to query <,<=,>,>=
say String = city and Int = age.
Now take Map (Int, String) rec and use this index to filter all tuples
having an age >= 80 and beeing "city"

Marc
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: relational data representation in memory using haskell?

Isaac Dupree
Marc Weber wrote:
> On Thu, May 22, 2008 at 03:34:36PM +0200, Marc Weber wrote:
>> On Thu, May 22, 2008 at 09:11:28AM -0400, Isaac Dupree wrote:
>>>  to whoever in this thread hasn't realized it:
>>>  Map String (Map Int Foo) == Map (String,Int) Foo
>>>  (at least to an approximation)
> There is another difference if you want to query <,<=,>,>=
> say String = city and Int = age.
> Now take Map (Int, String) rec and use this index to filter all tuples
> having an age >= 80 and beeing "city"

there are two problems:
Data.Map doesn't have a very good API for that (splitLookup is about the
best you can get for ranges)
Whether tupled or not, the order of the two indices matters (Int,String)
vs. (String,Int) for what you can look up efficiently.  It's essentially
a binary tree either way (Map x (Map y rec)) or (Map (x,y) rec), sorted
in the same order.  (tuples sort by lexicographical order)

-Isaac
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe