No Title
As I've been working with CRM114 and patching a problem with inter-process deadlock (I wrote the developer, he explained the problem, and I altered my copy to fix it, dusting off my C skills for the first time in a while), I have been thinking about other applications of this thing. It does statistical matching based on the distribution of words and combinations of words. That's how the spam filter works: you have a set of characteristics for things you have called "spam" and "nonspam", and then you ask CRM114, "Which set is this thing more like?" However, it doesn't have to be limited to spam.
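To make the "which set is this thing more like?" question concrete, here is a minimal sketch in Python. It is not CRM114's algorithm: CRM114 is scripted in its own language and its classifiers use more elaborate phrase features and weighting. This is a plain two-class naive Bayes classifier over single words and adjacent word pairs, with add-one smoothing, assumed here only as an illustration of the idea. The class `TwoSetClassifier` and its `learn`, `unlearn`, and `classify` methods are hypothetical names.

```python
import math
import re
from collections import Counter


def features(text):
    """Single words plus adjacent word pairs: a rough stand-in for
    'words and combinations of words' (not CRM114's actual features)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


class TwoSetClassifier:
    """Hypothetical two-set naive Bayes sketch, not CRM114's classifier."""

    def __init__(self, labels=("spam", "nonspam")):
        self.counts = {label: Counter() for label in labels}  # feature counts per set
        self.docs = {label: 0 for label in labels}  # documents learned per set

    def learn(self, label, text):
        self.counts[label].update(features(text))
        self.docs[label] += 1

    def unlearn(self, label, text):
        # Undo an earlier learn(). Counter.subtract can leave zero or
        # negative entries; unary + keeps only the positive counts.
        self.counts[label].subtract(features(text))
        self.counts[label] = +self.counts[label]
        self.docs[label] = max(0, self.docs[label] - 1)

    def _log_score(self, label, text, vocab_size):
        counts = self.counts[label]
        total = sum(counts.values())
        total_docs = sum(self.docs.values())
        # Smoothed log prior, so an empty set does not produce log(0).
        logp = math.log((self.docs[label] + 1) / (total_docs + len(self.docs)))
        for f in features(text):
            # Add-one (Laplace) smoothing for features a set has never seen.
            logp += math.log((counts[f] + 1) / (total + vocab_size))
        return logp

    def classify(self, text):
        """Return the label of the set the text is 'more like'."""
        vocab = set(features(text))
        for c in self.counts.values():
            vocab.update(c)
        return max(self.counts,
                   key=lambda label: self._log_score(label, text, len(vocab)))


if __name__ == "__main__":
    clf = TwoSetClassifier()
    clf.learn("spam", "cheap pills buy now limited offer")
    clf.learn("nonspam", "meeting notes for the crm114 deadlock patch")
    print(clf.classify("buy cheap pills now"))
```

For the newsgroup idea, the labels would simply be "interesting" and "not interesting", with one classifier instance per group.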
I once had a script that did regex searching through newsgroup postings, typically looking for things in .forsale groups. At the time I was deep into Rocketeer toys, so anything in rec.toys.forsale with the term "Rocketeer" in it would get mailed to me. The problem is that maintaining the list is a pain, and it is limited to what you literally put in it. I'm thinking of a newsgroup filter, written in the CRM114 command language, through which you run newsgroup posts. You train two data sets, "interesting" and "not interesting", and ask which one a post is more like. You don't have to know which words are matching, or even care; over time the classifier is trained on the things you like. Given the training sets, you could then have a script that periodically downloads all the posts in the target groups and forwards you every post that is more like the "interesting" set than the "not interesting" one. I would envision this being kind of a pain at first, while you train the sets, and then getting really cool. Just like with the spam filter, if something gets marked as relevant and you don't think it is, you can unlearn it from one set and add it to the other. The wrinkle is that each group needs its own training set, because what makes a post interesting in ga.forsale is quite different from what makes one interesting in rec.arts.music.zappa. It is an interesting idea.

e.com#link4) to the accusation he might be an impostor. Sorry for the long links, my friends who read these in the newsgroups. What kind of combination of cluelessness and brass monkeys does it take to assume that you can spot as an impostor someone with whom you have never interacted (the only evidence of foul play being his Hotmail e-mail address), particularly when that person has such a unique writing style? And why does the first thing you post even have to be "This person might not be who they say they are"? Isn't that the risk with every Usenet post ever that isn't PGP-signed?
The risk didn't just appear, so there is no reason to unleash the guns on poor George. The fact that Usenet posts might be forged is in the "No shit, Sherlock" category of insights. Checks might be forged too, but that isn't the default assumption. Can we not begin by giving people a break and waiting for them to do something fishy before jumping on them?