I'm looking for a bit of help with a library design choice.

The streaming package currently offers a slidingWindow function

converting a stream into a stream of fixed-size windows of that

stream[1]:

slidingWindow

:: Monad m

=> Int -- Window size

-> Stream (Of a) m b

-> Stream (Of (Seq a)) m b

This is based directly on a similar function in conduit. Using a rough

translation into the world of lists, we have

slidingWindow 3 "abcdef" = ["abc","bcd","cde","def"]

The awkward case where the stream is shorter than the window is

handled by potentially producing a short sequence at the end:

slidingWindow 3 "ab" = ["ab"]

slidingWindow 3 "" = [""]

I recently merged a pull request that adds variations on sliding

window maxima and minima using what's apparently a "folklore"

algorithm. For example

slidingWindowMax 3 "abcbab" = "abcccb"

This is basically like

slidingWindowMax k = map maximum . slidingWindow k

except that an empty stream doesn't yield anything, to avoid undefined values.

The big advantage of these specialized functions is that rather than

having to take a maximum over a sequence of length `k` at each step,

they only do a constant (amortized) amount of work at each step. Nice!

But not very general. Suppose we want to take a moving average of some

sort, like an arithmetic mean, geometric mean, harmonic mean, or

median? That thought leads quite naturally to a data structure: a

queue holding elements of some arbitrary *semigroup* that efficiently

keeps track of the sum of all the elements in the queue[2].

While the choice of *data structure* is moderately obvious, the choice

of *sliding window function* is less so. The tricky bit is, again,

what happens when the stream is too short for the window. If you work

in the Sum semigroup and divide the results by the window size to get

a moving average, then a too-short stream will give a (single) result

that's completely wrong! Oof. What would be the most useful way to

deal with this? The streams in `streaming` give us the option of

producing a distinguished "return" value that comes after all the

yields. Would it make sense to *return* the incomplete sum, and the

number of elements that went into it, instead of *yielding* it into

the result stream? That seems flexible, but maybe a tad annoying. What

do y'all think?

[1]

https://hackage.haskell.org/package/streaming-0.2.3.0/docs/Streaming-Prelude.html#v:slidingWindow[2] See the AnnotatedQueue in

https://github.com/haskell-streaming/streaming/pull/99/files which

basically modifies Okasaki's implicit queues using some of the basic

ideas that appear in Hinze-Paterson 2–3 trees.

_______________________________________________

Libraries mailing list

[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries