Quantcast

Parsing a bunch of strings without backtracking

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Parsing a bunch of strings without backtracking

Olaf Klinke
The tree with internal nodes of type Char and leaves of type v is a representation of a finite state machine, a common tool in parsing certain grammars. I'm not sure your type Compact v captures exactly this, since a value v may be followed by more Chars down in the tree. You'd want

type Compact v = [Either (More v) v] -- akin to Data.Tree.Forest
data More v = More Char (Compact v) -- akin to Data.Tree.Tree

What you may want to look at is a so-called trie [1,2].
In textbooks, it is common to assume that there is a charcter '$' not element of your alphabet. Then '$' is appended to each string before compressing all into a tree structure. This way, one can put both a string and a prefix of it into the same tree.

A special kind of trie is the suffix trie. It holds all suffixes of a string and facilitates fast full-text search. More efficient variants are extensively used in bioinformatics [3].

Olaf

[1] http://hackage.haskell.org/package/word-trie-0.3.0/docs/Data-Trie.html
[2] https://en.wikipedia.org/wiki/Trie
[3] http://mummer.sourceforge.net
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Loading...