Comment Spammers Walk the Earth

I was sitting here working on the laptop when I got a big wad of email. It was the writeback system telling me I had new comments, six of them. Examining them, I could see they were all comment spam. Since I had installed the Blosxom port of the MT anti-spam plugin I hadn’t had a problem, but then I hadn’t weathered many attempts either. My first thought was that the cron job that updates the blacklist had failed. I checked the blacklist and, sure enough, the URL in question was in the list. Curious, I thought. Even as I examined it, spam was coming in at the pace of one a minute (hilarious: a comment spammer that obeys netiquette about the rate of robot accesses to a web page). I temporarily blocked the IP address via .htaccess while I tried to sort this out.

I went to my comments page and tried to duplicate the spammer’s comments, ones that should have been caught. They went through! Damn. I put some print statements into the plugin to write to STDERR what was happening, and kept reloading the page. I could see that it correctly loaded the lines from the blacklist and that the spam should have been caught. However, I noticed that the printout was double spaced. That ain’t right. I ended my print with a newline, so the extra one meant there was a newline in the actual variable. I looked at the routine that loaded the list, and sure enough he did not chomp the newline off the end of each line. Since each value was then used as the pattern of a regular expression, it would only catch spam if the text it was matched against had a newline right after the URL in question.

I added a chomp to the routine so that the newline disappeared, and tried to spam it again. Voila, it was caught! Since the spammer was still running amok despite being forbidden, I unblocked him in .htaccess to let the spam fighter have a go. Double voila! I began getting the “we rejected comment spam” mails immediately. I don’t know if this bug has been fixed in the mainline version of this plugin, but it certainly needs to be. I’ll be contacting Doug Alcorn shortly.
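A minimal Python sketch of the failure (the plugin itself is Perl; `rstrip("\n")` stands in for Perl's `chomp`). The helper names and blacklist entries are hypothetical. The buggy loader keeps each line's trailing newline, so the compiled pattern only matches when the URL is immediately followed by a newline:

```python
import re

# Hypothetical blacklist file contents, one regex per line, as read from disk.
BLACKLIST_LINES = ["cheap-pills\\.example\n", "online-casino\\.example\n"]


def load_blacklist(lines, strip_newline=True):
    """Compile one regex per non-blank blacklist line.

    strip_newline=False mimics the bug: the newline stays in the pattern
    and must then be matched literally.
    """
    patterns = []
    for line in lines:
        if strip_newline:
            line = line.rstrip("\n")  # roughly what Perl's chomp does
        if line.strip():
            patterns.append(re.compile(line))
    return patterns


def is_spam(comment, patterns):
    return any(p.search(comment) for p in patterns)


comment = "Great post! Visit http://cheap-pills.example/ today."
print(is_spam(comment, load_blacklist(BLACKLIST_LINES, strip_newline=False)))  # False
print(is_spam(comment, load_blacklist(BLACKLIST_LINES)))  # True
```

The buggy version would still match a comment that happened to end a line right after the URL, which is why the filter looked like it worked in principle but let these through.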

Blosxom Writeback Vulnerability

My high school buddy Kevin forwards along a reference to a vulnerability in the Blosxom writeback plugin. The advisory concerns insufficient HTML filtering that fails to strip out malicious script code. However, I’m not sure what version of writeback they are looking at, because a visual inspection of my copy doesn’t show this problem. They say the issue is:

In the writeback plugin, the code to filter out tags is a simple regular expression: “s/<.*?>//mg”. So entering scripts as “<script>alert('test');</script>” will get filtered into “alert('test');” and no code will be executed by the client.

Digging further, I see that I’m using writebackplus, which doesn’t appear to have this vulnerability. It is also what was stripping out the paragraph tags that so irked some correspondents. Just to be sure, I’ll leave myself a writeback to test it. If you are using the original writeback plugin, I’d recommend either fixing the regular expression yourself, if you can, or switching to writebackplus. If in doubt, temporarily remove the plugin while you sort it out.
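For anyone fixing the regex by hand, here is a Python sketch of one weakness of the quoted filter (it may or may not be the one the advisory has in mind). Perl's `/m` only changes how `^` and `$` behave; it does not let `.` match a newline, so a tag split across lines slips through untouched. Adding `/s` (Python's `re.DOTALL`) closes that particular hole, but escaping markup instead of trying to delete it is the more robust approach. The function names are hypothetical:

```python
import html
import re


def strip_tags_naive(text):
    # Transliteration of Perl s/<.*?>//mg: MULTILINE does NOT let "."
    # cross a newline, so "<script\n>" is never matched.
    return re.sub(r"<.*?>", "", text, flags=re.MULTILINE)


def strip_tags_dotall(text):
    # Equivalent of adding Perl's /s: "." now matches newlines inside a tag.
    return re.sub(r"<.*?>", "", text, flags=re.DOTALL)


def escape_markup(text):
    # Neutralize markup rather than trying to remove it.
    return html.escape(text)


one_line = "<script>alert('test');</script>"
split_tag = "<script\n>alert('test');</script\n>"  # still a valid tag to a browser

print(strip_tags_naive(one_line))         # alert('test');
print(strip_tags_naive(split_tag) == split_tag)  # True: left intact
print(strip_tags_dotall(split_tag))       # alert('test');
print(escape_markup(one_line))
```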

Technorati Plugin Works!

This perhaps isn’t the best blog to test my Technorati plugin on, because I don’t have many inbound links and the ones I do have are fairly static. Today, however, thanks to the trackback I left earlier, Ernest Miller’s blog popped onto my list and was duly represented. Nice! I hadn’t been thinking about it and then noticed it showing up in my right rail. It’s supposed to work, but I’m not so jaded that I can’t enjoy it when it does.

One thing I have gained an appreciation for is the relative flakiness of Technorati. Through the Technorati API I see oddly different results from time to time. Sometimes when my refresh happens, the call returns nothing. Sometimes every item in the list is a reference from this weblog to itself rather than an incoming link. I suppose it is good that it flakes a lot now, so I can code the plugin to deal with the weirdness. I considered having it not write out the cache file when nothing is returned, but then the cache would stay stale, every page load would retry the fetch, and while Technorati was down every page would wait for the plugin to time out, slowing the blog to a crawl. I learned my lesson with the XML-RPC pinging plugin: don’t let the actions that happen during remote server outages have side effects that cause them to be called even more often. I’d rather have an empty list now and then than have a Technorati outage really hose my site.
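The policy described above, sketched in Python. The `fetch` callable is a hypothetical stand-in for the Technorati API request, and the JSON cache format is an assumption (the plugin's real cache format isn't described here). The key point is that the cache is rewritten even when the fetch fails or returns nothing, so the file's timestamp moves forward and the next attempt waits a full refresh interval:

```python
import json
import os
import tempfile


def refresh_cosmos_cache(cache_path, fetch):
    """Fetch the inbound-link list and ALWAYS rewrite the cache file.

    `fetch` (hypothetical) returns a list of links or raises on an outage.
    Rewriting even an empty result bumps the file's modification time, so
    a Technorati outage doesn't turn into a retry (and timeout) per page load.
    """
    try:
        links = fetch() or []
    except Exception:
        links = []  # outage: settle for an empty list until the next refresh
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(links, f)
    os.replace(tmp_path, cache_path)  # swap in whole, so readers never see half a file
    return links


def technorati_down():
    raise TimeoutError("no answer from Technorati")


with tempfile.TemporaryDirectory() as cache_dir:
    path = os.path.join(cache_dir, "cosmos.json")
    print(refresh_cosmos_cache(path, technorati_down))  # []
    print(os.path.exists(path))  # True: cache written anyway
```

Writing to a temporary file and renaming it is an extra design choice here, not something the original plugin is known to do.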

Technorati Plugin Working

So far so good: the Technorati cosmos plugin seems to be working just fine. It is correctly refreshing the cache every six hours. If only I had a more active cosmos, I could see things moving around and changing in there. On a lark, I tried pointing the URL at the beta of the new API, but it has some problems: every link in the list was from this weblog pointing back at itself, which isn’t tremendously useful. So far I’m pretty pleased that this thing works, considering I have only about 40 to 60 minutes of work in it.

Technorati Cosmos Blosxom Plugin

I took a wag at a blosxom plugin that assembles the list of a blog’s Technorati cosmos. Other blogging packages have had this for a while, and it is generally pretty simple to do, so why shouldn’t blosxom? My cosmos should appear at the bottom of the right rail. The plugin fetches it from the server only once every six hours (configurable); the rest of the time it comes from the cache. After a few days of shakedown I’ll release it publicly and announce it on the blosxom plugin registry.
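A Python sketch of the "only once every six hours" check, assuming the cache's age is judged by its file modification time. `REFRESH_HOURS` and `cache_is_stale` are hypothetical names, not the plugin's actual configuration variables:

```python
import os
import tempfile
import time

REFRESH_HOURS = 6  # hypothetical name for the configurable interval


def cache_is_stale(cache_path, max_age_hours=REFRESH_HOURS, now=None):
    """True if the cache file is missing or older than max_age_hours."""
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(cache_path)
    except OSError:  # no cache file yet
        return True
    return age > max_age_hours * 3600


with tempfile.TemporaryDirectory() as cache_dir:
    path = os.path.join(cache_dir, "technorati_cosmos")
    print(cache_is_stale(path))  # True: nothing cached yet
    open(path, "w").close()      # pretend we just refreshed
    written = os.path.getmtime(path)
    print(cache_is_stale(path, now=written + 5 * 3600))  # False
    print(cache_is_stale(path, now=written + 7 * 3600))  # True
```

Passing `now` explicitly keeps the check testable without waiting six hours.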

Page Titler 0.2 Released

Page Titler 0.2 is now available for download. This is the version that keeps a title per request URL. Instead of a single title for everything, you can have different titles on the day pages, the writeback pages, and so on, each based on the top story in that particular view. If you download it and use it, please give me some feedback.

Page Titler 0.2

I’ve made some modifications to my page_titler plugin. Before, it acted only upon the main unqualified page. It had some funky logic to skip its work on a page view, a category view, and the like, but that never seemed to work quite right anyway. Loading the page of recent writebacks would always corrupt the cache, and there were other situations like that. Now there are configuration variables that let you decide whether to turn it on in the page and/or category views. If you do, it will title those pages too, caching a title for each one keyed by its subdirectory path.

The whole rationale of this plugin is that I like to have the title of the top story in my HTML title tag. However, by the time blosxom knows what the top story is, the top of the web page has already been written. This plugin caches the top story for each URL and uses it as the title on the next load of that URL. This means that the first load of a given URL shows either no title (if that URL has never been cached) or the title of the previous top story; after that, the titles are right. By keeping a list of titles keyed by URL, not only does the main page work a lot better, but the subpages have titles too!
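The one-load lag can be sketched in Python as follows. Everything here is hypothetical: the titles live in an in-memory dict for brevity (the plugin writes them to a file), and `request_key` just guesses at a key scheme consistent with the cache sample in this post (the path without slashes, or `?query` for query-string views):

```python
def request_key(path_info, query_string=""):
    """Hypothetical cache key: path without slashes, plus '?query' if any."""
    key = path_info.strip("/")
    if query_string:
        key += "?" + query_string
    return key


class PageTitler:
    """Remember the top story's title for each request key.

    The page head is written before blosxom knows the top story, so the head
    uses the title recorded on the previous load of the same key, and the
    current top story is recorded for next time.
    """

    def __init__(self):
        self.titles = {}  # request key -> top-story title from the last load

    def head_title(self, key):
        return self.titles.get(key, "")  # empty until the key has been seen

    def record_top_story(self, key, title):
        self.titles[key] = title


titler = PageTitler()
key = request_key("/arts/books/")
print(repr(titler.head_title(key)))  # '' on the very first load
titler.record_top_story(key, "Book Crossing")
print(titler.head_title(key))        # Book Crossing on the next load
```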

The cache is stored as a very simple colon-delimited file, not with Storable or anything fancy like that. Colons are rare in blosxom paths (though strictly speaking a URL path may contain one), so this shouldn’t be a problem in practice, and colons in the title won’t hurt anything because the key ends at the first colon on the line. Here is an example from my current cache:

2003/08/01:Cringely is a True Evil Genius
:One and Done
arts:The Crisis Continues
misc:Quote of the Day
arts/books:Book Crossing
2003/09/04:Winding Down
life:There and Back Again
2003/10/23:More on Elliott Smith
misc/030909_04.writeback:RIAA PSA
arts/sciencefiction/conventions:OryCon
?recent=7:Book Crossing
2004/02/17:Rip Off the Hood of Your Desoto and Send it In
arts/books/031126_02.writeback:The War on Copying
fitness:This Old Body
2004/03/19:Cajuns and Yellowjackets

The unqualified main page is currently titled “One and Done” and has the empty key (the second line). Note that this also covers subpages that come from query parameters, such as the recent writebacks view, which is keyed by the parameter string “?recent=7”. This simple change turned it from a fragile, marginal, only somewhat useful plugin into a cool thing that is generally applicable and much more useful. I believe it will even supersede a few plugins; for example, it automatically covers what the “story_title” plugin does.
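Reading and writing a file in this format takes only a few lines. Here is a Python sketch (not the plugin's Perl code) that splits each line on the first colon only, which is what lets titles contain colons. The last sample entry is hypothetical:

```python
def parse_title_cache(text):
    """Parse 'key:title' lines into a dict.

    Only the first colon separates key from title, so titles may contain
    colons; keys must not, and neither may contain a newline.
    """
    titles = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, title = line.split(":", 1)
        titles[key] = title
    return titles


def format_title_cache(titles):
    return "".join(f"{key}:{title}\n" for key, title in titles.items())


sample = (
    ":One and Done\n"
    "?recent=7:Book Crossing\n"
    "arts/books:Review: A Title With a Colon\n"  # hypothetical entry
)
cache = parse_title_cache(sample)
print(cache[""])            # One and Done
print(cache["arts/books"])  # Review: A Title With a Colon
print(format_title_cache(cache) == sample)  # True
```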

The updated page_titler is running on this blog now. I’m going to wait a little while to make sure there isn’t some lingering bug, and then make version 0.2 available for download. I’ll post an announcement to the blosxom mailing list and ping the plugin registry. If you use it, please give me some feedback!

Blosxom Date Problem Fixed

With a little digging in the code, I found out why my blosxom dates were screwing up. This morning was particularly bad: the first entry said “Wednesday, March 10”, the second said “Thursday, March 11”, and the third was back to the 10th. I’ve long suspected a timezone issue, because the wrong entries were always ones posted after 6 PM, and I’m 6 hours from GMT when we are on standard time. By adding some debugging prints in the middle of blosxom, I figured out that the first time a date was “prettified” it was correct, and every subsequent one was off by 6 hours. I was also seeing the print in the date function twice per file, yet it was only called once in the blosxom code. That, plus the fact that on the very first file the first print was right and the second was wrong, led me to believe a plugin was doing it. I grepped for “nice_date” in my plugin directory and found one and only one plugin that uses it: atomfeed (which I only ever installed on a whim anyway; I don’t really use it or publicize it). I removed it from the plugin directory and voila, everything was fixed.

Examining what the plugin does, I tracked it down to the fact that atomfeed wants the date in both local time and UTC. It gets them by setting the environment’s TZ value first to “GMT” and then back to the original. Well, on my box TZ isn’t set in the environment at all, so after the first time through, all the times from then on are in UTC. That explains why the day-by-day views showed the right entries, since that filtering happened before atomfeed rewrote the timezone, and why the evening entries showed up on the following day. For now, I’ve just gotten rid of atomfeed. If I can see a way to change their code to something less destructive, I’ll put it back in and send my changes to the developers.
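A less destructive pattern, sketched in Python rather than atomfeed's Perl. This is a guess at the mechanism, not atomfeed's actual code: if the saved TZ was undefined and gets written back as an empty string, C libraries such as glibc treat an empty TZ as UTC. The hypothetical `temporary_tz` below deletes TZ afterwards when it was unset, and `utc_and_local` shows that UTC can be had from `gmtime()` without touching the environment at all. `time.tzset()` is Unix-only:

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def temporary_tz(tz):
    """Set TZ for the duration of a block, then restore the exact prior state.

    If TZ was unset beforehand it is deleted afterwards, rather than being
    left as an empty string (which glibc, for one, treats as UTC).
    """
    saved = os.environ.get("TZ")  # None means "was not set at all"
    os.environ["TZ"] = tz
    time.tzset()
    try:
        yield
    finally:
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()


def utc_and_local(timestamp):
    # Simpler still: gmtime() is always UTC and localtime() uses the
    # process's existing zone, so no environment changes are needed.
    return time.gmtime(timestamp), time.localtime(timestamp)


before = os.environ.get("TZ")
with temporary_tz("GMT0"):  # POSIX spelling of GMT
    epoch_hour = time.localtime(0).tm_hour  # 0 inside the block
print(os.environ.get("TZ") == before)  # True
```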

Date Problem with Blosxom

There’s a problem I’m having with blosxom right now. It’s a low-level annoyance, not important enough to scramble the jets over, but it really has been irking me. If I make a post in the evening (my home and the server are both on US Central time), it appears under the correct day when I first post it. Later, once there is a post the next day, the evening post shows up as if it had been posted on that next day, just ahead of that day’s actual post. Oddly, in the day views it shows up in both days: as the last entry of one day and the first entry of the next.

I went a long time without changing or adding any plugins, but when I upgraded from the standard writeback to writebackplus I also installed the interpolateconditional plugin. I’m wondering whether that has something to do with this. It looks as if a post shows up under a day if it falls on that day in either local time or UTC; the late posts would land on the next day at a +0000 offset. That’s just conjecture; I’m only trying to think of theories that would explain this behavior. 6 PM does seem to be the breaking point, and Central Time is -0600, which is what got me thinking in this direction. Any blosxom heads out there have any suggestions?
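The arithmetic behind that hunch, as a quick Python check. It assumes a fixed -06:00 offset for Central Standard Time and ignores daylight saving time: any post at or after 6 PM local time falls on the next calendar day in UTC.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-6 offset for US Central Standard Time (no DST handling).
CST = timezone(timedelta(hours=-6), "CST")


def post_days(local_dt):
    """Calendar date of a post in local time and in UTC."""
    return local_dt.date(), local_dt.astimezone(timezone.utc).date()


for hour, minute in [(17, 59), (18, 0), (19, 30)]:
    post = datetime(2004, 3, 10, hour, minute, tzinfo=CST)
    local_day, utc_day = post_days(post)
    print(f"{hour:02d}:{minute:02d} local -> {local_day} local, {utc_day} UTC")
```

Only the 5:59 PM post stays on March 10 in both zones; the 6:00 PM and 7:30 PM posts land on March 11 in UTC.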

Writebackplus

Fletcher Penney has rewritten and expanded the blosxom writeback plugin as writebackplus. He says it should be a drop-in replacement for writeback, seewritebacks and recentwritebacks, and so far so good. He says it handles the combined email/URL field that so bedevils writeback leavers better, records IP addresses, has comment anti-spam measures, and is generally an improvement. So far, it seems to work with my old writebacks; we’ll see how it does with new ones.

page_titler

I’ve been working on this for a while, and it has been running on my weblog for two weeks now. No sense stalling any more: here is version 0.1 of page_titler! It is a simple plugin that looks at the top story on the unqualified (i.e., main) page of the weblog, caches its title to disk, and exports it in a variable for use in the header. I like my HTML <title> tag to contain the actual title of the newest story. That is how it used to be in Blogmax, and I’m duplicating that bit of behavior here. Download the code here, and if you use it or have any suggestions, leave me a writeback. My first plugin, I’m so excited!