I'm happy this is getting some attention! I've habitually overpromised on this and feel bad about that. My work/life/hobby-programming balance remains a mystery to me.
Even so, I'll answer whatever questions come up as best I can. As I recall, I ultimately got stuck trying to identify a broadly useful heuristic. There are lots of parameters to tune, and I was seeing massive runtime differences on nofib benchmarks across different CPU architectures with the same LLF parameters. I never got a good handle on the crucial independent variables (e.g. code cache sizes, RTS parameters). I'm hoping the join point work has revealed some easier wins. Best of luck, and don't hesitate to ping me with questions; I hope I can help.