Dear Simon,

Since you'll be talking to genomics people, you might want to look at a Bachelor's thesis about sequence alignment [1]. The author is now a PhD student in Cambridge.

In a nutshell, the ListT monad transformer [2] is a good abstraction for full-text index structures [*] that are in heavy use among the genomics people. ListT represents the basic state transformation when a symbol is added to the query string. It also facilitates alignment of sequences with uncertain values [+]. Swapping out the monad lets you change the model of uncertainty, but you write your algorithm only once. I can provide code if required.
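To make the "write the algorithm once, swap the monad" point concrete, here is a minimal sketch using only base. It is not the code from the thesis [1] and does not use the list-t API [2]. Everything in it is an assumption made for illustration: the names extend and occurrences, the naive list of text offsets standing in for a real index structure, and the choice to model an uncertain query symbol as a value of type m Char. The result type m [Int] has the shape of a classic list transformer applied to m, written here without a newtype.

```haskell
import Control.Monad (foldM)
import Data.Functor.Identity (Identity (..))

-- Toy stand-in for a full-text index (hypothetical, not a real FM-index):
-- the search state is the list of text offsets at which the query read
-- so far ends.
type Ends = [Int]

-- Basic state transformation: add one symbol to the query string.
extend :: String -> Char -> Ends -> Ends
extend text c ends = [e + 1 | e <- ends, e < length text, text !! e == c]

-- Written once for any monad m that models uncertainty about query symbols.
-- Returns the start offsets of all occurrences, inside m.
occurrences :: Monad m => String -> [m Char] -> m [Int]
occurrences text query =
  map (subtract (length query)) <$> foldM step [0 .. length text] query
  where
    step ends symbol = do
      c <- symbol                   -- resolve one uncertain symbol in m
      return (extend text c ends)   -- then apply the state transformation

-- m = Identity: every symbol is certain, so this is exact matching.
exact :: [Int]
exact = runIdentity (occurrences "GATTACA" (map Identity "TA"))

-- m = []: each query position lists its possible symbols; there is one
-- result per resolved query ("TA", then "TT").
ambiguous :: [[Int]]
ambiguous = occurrences "GATTACA" ["T", "AT"]
```

Here exact evaluates to [3] and ambiguous to [[3],[2]]. Replacing the list monad with, say, a probability monad would change the model of uncertainty without touching occurrences.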

The edit distance algorithm [3] might also go down well at Sanger. It is an example where laziness and a subtle rearrangement turn a quadratic algorithm into an optimal one.
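For readers who have not seen lazy dynamic programming, the quadratic starting point can be sketched as below. This is only the plain baseline, assuming unit costs for insertion, deletion and substitution. It is not the rearranged algorithm of [3], and the name editDistance is chosen for illustration. Laziness shows up in the table being defined in terms of itself: each cell is a thunk that is evaluated only when another cell demands it.

```haskell
import Data.Array (Array, listArray, range, (!))

-- Edit distance with unit costs, via a lazily evaluated table.
-- Quadratic in time and space: the baseline, not the algorithm of [3].
editDistance :: String -> String -> Int
editDistance xs ys = table ! (m, n)
  where
    m = length xs
    n = length ys
    xa = listArray (1, m) xs :: Array Int Char
    ya = listArray (1, n) ys :: Array Int Char
    bnds = ((0, 0), (m, n))

    -- The table refers to itself; lazy array elements make this well defined.
    table :: Array (Int, Int) Int
    table = listArray bnds [cell i j | (i, j) <- range bnds]

    cell :: Int -> Int -> Int
    cell i 0 = i                      -- delete all of the first i symbols
    cell 0 j = j                      -- insert all of the first j symbols
    cell i j
      | xa ! i == ya ! j = table ! (i - 1, j - 1)
      | otherwise =
          1 + minimum [ table ! (i - 1, j)      -- deletion
                      , table ! (i, j - 1)      -- insertion
                      , table ! (i - 1, j - 1)  -- substitution
                      ]
```

For example, editDistance "kitten" "sitting" evaluates to 3.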

Olaf

[1] https://pp.ipd.kit.edu/uploads/publikationen/kuhnle13bachelorarbeit.pdf

[2] http://hackage.haskell.org/package/list-t

[3] http://users.monash.edu/~lloyd/tildeStrings/Alignment/92.IPL.html

[*] Burrows-Wheeler transform, FM-index, suffix arrays, suffix trees, etc.

[+] Current index structures hold only one sequence, while known genomic polymorphisms are stored separately.

_______________________________________________

Haskell-Cafe mailing list

To (un)subscribe, modify options or view archives go to:

http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Only members subscribed via the mailman list are allowed to post.