I currently have a task at hand which requires to find a n-tuple of sentences s, where the biggest distance of sentences is searched(So the most dis-similar sentences shall be chosen).
Especially with growing n, this problem is causing me trouble, as in a naive implementation the number I need to calculate the distances grows exponentially by n and linear by s.
The distance function itself is drop-in distance :: Text -> Text -> Double
For the questions the distance can be expected to behave in constant time, and the list of n-tuples to be deepseq ready.
Given my real world problems, this is giving me a hard time waiting.
Coming from a more imperative environment, one approach I'd take would be to calculate a dictionary of predefined distances, so instead of calculating the distance new I'd only have to lookup the distance.
This would - in my understanding - mean that I create the dictionary which takes s * (s-1) steps to calculate the full dictionary, and then for s^n only dictionary lookups.
But from the little I remember "Real World Haskell", the functions and variables are only evaluated once and hang around until the garbage collector comes around.
is this right, and does this mean, that if there is no GC in between, the function distance(a,b) is only calculated once, and therefore the dictionary approach is mostly useless?
The function distance is symmetric, is there a way to help my Haskell (and the evaluation) to understand that it only needs to be run once? Iff not, does it make sense to iterate the over the tuples first and "order" them, to help distances only occur in a certain order? Is there a way to create unique, ordered n-tuples in the first place?
Additionally, I am grateful for any kind of help regarding performance-tricks to this case.
I know that this is a typical case for multi threading, but I first want to try and find "the best" base-algorithm. Also I think it's a nice question :)