What Happened to Crawler Etiquette?

Looking at the server logs for this weblog a few minutes ago, I noticed the Everest crawler from Vulcan (which namechecks owner Paul Allen on that page) downloading pages. I also noticed that its requests were distressingly frequent, fetching pages 30K and larger every few seconds. I filled out their feedback form, letting them know that I have denied them at the server level for as long as they keep imposing this high a resource burden. I always thought one request per minute was the most frequent ethical limit on crawler accesses, and I consider hitting more often than once every ten seconds to be clearly abusive.

Am I alone in being concerned about this? More and more I see crawlers that hit at this level. Does every one of these crawler authors think they are the only one out there? When you have several dozen crawlers all hitting your site every few seconds, it becomes a big issue for an average citizen. I get a little pissed off when I have to increase the size of my iron just to service your fricking abusive swarms of robots. Uncool, dudes, uncool.

Back when I last did crawler programming with Perl’s RobotUA module, its default was to enforce that you couldn’t hit the same domain more than once a minute. Has this completely dropped off the radar? I think anyone building a robot or a crawler or even a crawler module should institute this minimum as a default. As crawlers you are guests on these servers, so be good ones. Nothing sucks worse than a project whose value depends on the resources of others but is then a shitty steward of them. I just did one of these projects that involves consuming RSS feeds, and when I can I use the modification times to avoid fetching feeds that haven’t changed. I try not to fetch too often, even though my project’s responsiveness would be improved by checking everyone’s feed more often. We’ve all got to coexist, so sometimes you have to bust out the golden rule.
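For anyone writing a crawler today, here is a minimal Python sketch of the two habits described above: waiting at least a minute between hits on the same host, and sending the last known modification time so an unchanged feed need not be downloaded again. This is not RobotUA’s code or anyone’s actual crawler. The names `PoliteScheduler` and `build_headers`, and the example UserAgent string, are made up for illustration, and the 60-second default is just the one-minute rule of thumb from this post.

```python
import time
from urllib.parse import urlsplit


class PoliteScheduler:
    """Enforces a minimum gap between requests to the same host."""

    def __init__(self, min_delay=60.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay      # seconds between hits on one host
        self._clock = clock             # injectable so it can be tested
        self._sleep = sleep
        self._last_hit = {}             # host -> time of the most recent request

    @staticmethod
    def _host(url):
        return urlsplit(url).netloc.lower()

    def seconds_to_wait(self, url):
        """How long we still owe this host before hitting it again."""
        last = self._last_hit.get(self._host(url))
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self._clock() - last))

    def wait_turn(self, url):
        """Sleep if needed, then record that we are hitting the host now."""
        delay = self.seconds_to_wait(url)
        if delay > 0:
            self._sleep(delay)
        self._last_hit[self._host(url)] = self._clock()
        return delay


def build_headers(user_agent, last_modified=None):
    """Identify the crawler and, when we have one, pass the stored
    Last-Modified value back so the server can say "nothing new"."""
    headers = {"User-Agent": user_agent}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


if __name__ == "__main__":
    scheduler = PoliteScheduler()
    # Hypothetical UserAgent; use one that lets site owners find you.
    print(build_headers("ExampleFeedBot/0.1", "Tue, 29 Nov 2005 00:00:00 GMT"))
    print(scheduler.wait_turn("http://example.com/feed.xml"))  # first hit, no wait
```

Call `wait_turn(url)` right before each request and attach the headers from `build_headers` to it. If the server answers 304 Not Modified, the copy you already have is current and there is nothing to download.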

Update: A member of the Everest team was stand-up enough to leave an apologetic comment on this post, for which I thank them. It should also be noted that they were ethical enough to have an identifiable UserAgent that allowed me to find them. Since I wrote this, I have had two other crawlers with just the default Java UserAgent string hit at a much higher rate than Everest. Uncool, uncool.

X hits the Spot

Listening to this week’s Personality Crisis (via the RSS feed of course), Jon Kincaid played a long long set of music from X. Dang, I love this band and I love that show. He played a bunch of their songs, and yet didn’t hit my favorite – “The Phone is Off the Hook (but You’re Not)”.

I think that Punk: Attitude documentary I watched on IFC was pretty shameful in spending 40 minutes on London punk, 40 minutes on New York punk, and like 2 and a half on California punk. At least half of the bands I really care about from the scene are Cali bands – X, Dead Kennedys, Circle Jerks, Black Flag, (really old) Suicidal Tendencies, etc. I know the director was actually a player in the NY punk scene and has loyalties that way, but jeez.

Excerpted RSS Feeds

I’ve been beating the drum for full text RSS feeds for years, enough to have burned myself out on the subject multiple times. Years later, it’s still about where it always has been and still swirls at about the same rate. Let me strip down my end of the argument to a non-moral, completely empirical set of observations.

  • When I read postings in my RSS reader, it takes effectively no time to move from item to item because they have all already been downloaded before I look at them.
  • When I open the webpage of an item from that feed it takes time, usually from 1 to 10 seconds per item.
  • When I sit down to read my feeds, I typically have between 40 and 200 individual items in there. At an average load time of 3 seconds per item, that would add from 2 to 10 minutes to my reading time just in waiting for pages to load if everyone did this.
  • Most excerpted feeds are really excerpted. Here’s a real world example of something that came down a feed, the information I was given to decide whether I want to pursue reading this or not:

    While Wharton claims he may now have been “assimilated” into the culture of Action Greensboro, I seriously doubt it. While I, too, attended last night’s follow-up meetin

  • If you knew how often I looked at the first 18 words of your post and decided that although I care enough to subscribe to your RSS feed I don’t care enough to chase this post down, it would probably hurt your feelings. Sorry kids, you have to make tough calls in this life.
  • I’m actually becoming a full-text hardass again, and by the end of the week will be purging out all the excerpted feeds from my newsreader. If you don’t care enough to make it easy on me trying to follow lots of information, I don’t care enough to read your stuff. That’s harsh, but quid pro quo often is.
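The waiting-time arithmetic in the list above is easy to check. Here is a tiny Python sketch of it. The function name is made up for illustration, and the 3-second figure is the average load time assumed in the list, not a measurement.

```python
def extra_wait_minutes(items, seconds_per_item=3.0):
    """Minutes spent just waiting for pages to load when every item
    in the reader has to be opened in the browser."""
    return items * seconds_per_item / 60.0


if __name__ == "__main__":
    for items in (40, 200):
        print(items, "items:", extra_wait_minutes(items), "minutes of waiting")
```

At 40 items that is 2 minutes and at 200 items it is 10, which is where the range in the list comes from.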

Don’t Trust iTunes Lists

I saw this article about podcasting affecting the business models of public radio. It’s an interesting read, but I want to focus on one paragraph:

Talk about pent-up demand. According to Maria Thomas, vice president and general manager of NPR Online, it took only six days after launch for NPR’s “Story of the Day” podcast to reach the coveted No. 1 spot on iTunes for most downloaded podcast. On Nov. 21, NPR’s podcasts held down 11 spots on the iTunes Top 100, more than any other media outlet.

People always assume that’s what the top of the iTunes list means – “most downloaded shows.” That’s a sensible thing to believe, because that’s how any rational person would set up that system. In fact, unless I missed it we don’t actually know what that list means. At least in the recent past, it was “the number of times the subscribe button has been hit for this podcast in the recent past”, a vastly different metric. As much as I use and like (most) Apple products, they are a completely opaque company. For all any of us knows, they changed how those lists are generated yesterday and will again tomorrow.

This is just another, longer way of saying that those top N lists mean nothing, and the ones from the iTunes Music Store mean even less than that. Don’t overinterpret data when you don’t know what it actually is.

AmigoFish

It’s time that I announced the project I’ve been working on in my evenings and weekends. It is a collaborative filter for new media – podcasts and videoblogs mostly. Behold, the mighty AmigoFish! You can create an account, rate the things you care about and get predictions for other things you might like. I’ve been using it that way pretty much every day for the last month, and have found all kinds of new things to listen to. Try it out and let me know what you think. I’m not going to burden everyone with the “beta” or not nomenclature. It is a work in progress, much like everything like this. I appreciate any feedback you might have about how it works for you, what features you might like, and so on. I’m nervous about going public with this, but it seems like it is time. It’s not like it is finished – it never will be – but I realized that the main reason for not putting it out there was only my own fear. Put it all on black and let it ride!

There is also a blog I’ve been keeping for the project. Obviously, I haven’t sunk a lot of time into things like changing from the default template. If you follow that blog, you can get a little insight into the inner workings. I’ve been asking people that were early users to keep quiet about it so that I could stay off the radar until I was ready to go public. That time is now, so you are all welcome to do what you want. If you could throw a brother a little blog link love, I’d be highly appreciative.

Great big thanks to the alpha users, some of whom suffered through the really bad interface that was on the early iterations of this project. The input of those people was really helpful. I tried to do what the people like Jason Fried say you should – I set up the shell of a useful thing and let people’s use of it drive where it should go. That will be what continues to drive it forward. Here’s hoping some of you sign up and use it and find it useful.

I’m scared, but excited. It’s time to push this bird out of the nest and let it fly or not.

Backbeat

In my podcast earlier this evening, I went public with my recent signing up with Back Beat Media. I’ve been giving a lot of static to the various podcast networks, so here is their opportunity to fire back. I am in fact in a network per se, but one organized around the one function I’m not good at and don’t like to do – getting sponsors.

My rules aren’t changing. I’m still not taking preproduced announcements, and I’m only taking sponsors who will let me riff on their message, make fun of it, and so forth. From my perspective, I’m outsourcing something I’d prefer to leave to those who are better at it than me. They have no say in the editorial end or in the content or delivery in any way. For me, this is the best of all worlds. I’m still playing my own game my own way, you just deal with my agents when you want to sponsor it.

If you’d be interested in sponsoring this here shenanigan, you can request a quote online. However, bear in mind that we are sold out through the end of the year so it can’t start right away.

EGC Clambake for November 29, 2005

Here is the Bittorrent link and direct MP3 download for the EGC clambake for November 29, 2005.

I talk about the Backbeat Media deal for sponsorships; I play a song by American Heartbreak; I present my interview with JD Lasica from the Portable Media Expo; I play a song by Big Machine; I talk about how weird it is to say that AJAX as a tool is “all hype”; I play a song from the upcoming Michelle Malone CD; hasta manana, iguana.

You can subscribe to this feed via RSS.

This episode is sponsored in part by the fine folks at iPod Observer and Reel Reviews! Don’t forget, you can fly your EGC flag by buying the stuff package. For the month of November, $25 of your purchase goes to the Mercy Corps.

This show as a whole is Creative Commons licensed Attribution-NonCommercial-ShareAlike 1.0.

Links mentioned in this episode:

iPodderX Name Change

My former sponsor and the podcatching client I use every day, iPodderX, must change its name, due to the heavy handedness of some pricks at Apple. I think that sucks, but they are making the best of it by holding a fun contest to rename the product. I have thus far recused myself from every single podcast contest because I didn’t want it to look (or really be) shaky by winning anything. This one, however, I’m going for. I don’t know exactly what I’m doing, but it should be far afield from most things because in doing things like this I turn off my conscious mind and try to reach down into my inner lizard brain with a hint of Tourette’s. I don’t want to give away anything that will let you poach my entry, but I guarantee anything I submit will not have “pod” in it. This should be fun.

As We Mean To Go On

When Kevin Smokler’s book Bookmark Now: Writing in Unreaderly Times came out, it had a joint essay by two of our dear friends, Nicola Griffith and Kelley Eskridge. They are both writers and have been a couple for 17 years. I am beamingly proud to have been their friend for 13 of those years. That perhaps betrays a certain lack of judgment on their part, but everyone has their flaws.

Their essay, “As We Mean to Go On”, is now available online. I rave and rave about both of their fiction, as you can find in the history of this weblog. I actually love the writing of Kelley in this essay. Her prose style is beautiful and often borders on poetry, but it is not necessarily direct. That’s why I was tickled to find such in-your-face bits as this:

And there’s the occasional truly nasty questioner who can’t quite hide the hope that writing and love are two horses fighting in harness, pulling in opposite directions, that our work is the slow bullet in the brain of our relationship. Don’t you ever worry that she’ll be more successful? I mean… Yes, sunshine, we know what you mean. Fuck you.

At our ages, sometimes you have to let out your inner angry 17 year old punk in black boots to stomp some shit. I highly recommend this essay as a great look into the creative process and into doing work informed by your moral values. She also touches on the same vein I’ve been talking about in the context of podcasting:

[A]s much as I want to be a rock star, I’m resisting the impulse toward the Cult of Me. My connections with readers are about the work: how it is to read, to write, to become part of each other’s story for a little while.

Later on, Nicola brings it home with this insight:

Here’s another paradox: I believe firmly that it’s a mistake for a reader to assume she knows the details of a writer’s life from reading her work, but I also believe that if you have read all of my novels you have an essential grasp of how I regard the world. The details are fictional, but the essence shines through. I can’t hide it: most of me doesn’t want to. Trying to hide is probably the major contributing factor to bad fiction.

Please do read the whole essay, and if you find it valuable or touching or meaningful, you’ll surely like any of the novels either of them have written. They are all fabulous.

MP3 Weirdness

The most recent show, the one I published yesterday, has something odd about it. Several people have said they can’t copy it to their iPod and I’ve confirmed this myself. I didn’t deliberately do anything different and I can’t see any particular reason why this should be this way. Can anyone figure this out? Any Apple engineers reading this that feel like taking a crack at it? Why should any standard MP3 resist being copied to an iPod? It’s the man, trying to keep me down!

Me and Tura Satana

A while ago I posted my comment to the Reel Reviews Radio show notes about Faster Pussycat Kill Kill in which I tell the silly story of my meeting with the women from that movie. In an odd thing to happen on Thanksgiving, Tura Satana herself left a response saying that she remembers that day. Crazy, daddyo! I’d be surprised if my friend Suzie remembered it. Luckily, I only had nice things to say about Tura, who was the nice one who was very sweet that day. It was the other two that seemed like they hated me.

This is an object lesson in handling our modern information age. Assume anything you say, even about the Russ Meyer girls in a comment thread, will be read by anyone that you reference. This post-Google/Technorati/Feedster world is highly Kibo-esque.

EGC Clambake for November 24, 2005

Here is the Bittorrent link and direct MP3 download for the EGC clambake for November 24, 2005.

This show’s glaring error: I refer to a Jill Sobule song as “unreleased” and 3 seconds later name the album it came from. Where are my continuity people when I need them?

I talk about what I’m thankful for; I play a holiday song from Jill Sobule; I talk a little more about the Portable Media Expo; I have a distressing mouth; I play a song by Big Leg Emma; I tell a little story of my youth and play a song by Camper Van Beethoven; I try to sign off but my laptop runs out of battery and force sleeps 15 seconds before I finish.

This episode is sponsored in part by the fine folks at iPod Observer and Reel Reviews! Don’t forget, you can fly your EGC flag by buying the stuff package. For the month of November, $25 of your purchase goes to the Mercy Corps.

This show as a whole is Creative Commons licensed Attribution-NonCommercial-ShareAlike 1.0.

Links mentioned in this episode:

Return of the Attack of the Inbox

These last six weeks have been a whirlwind, with the day job and PME and Converge South, et al. I had an email inbox relapse, flirting with the 300 message mark again. I’m going to try to take advantage of the long weekend by hammering that back down. When you have so few things in the inbox that you don’t need to scroll to see them all, it’s just a nice feeling.

Happy Thanksgiving

Here’s hoping that everyone has a happy day tomorrow, whether you celebrate it or not. I’m thankful that I get to have my career while living in this small town, whilst connected to people around the world. I’m thankful that anyone cares to listen to what I have to say or take the time to be interviewed by me, and I’m thankful so many of you are returning the favor by providing me with insightful and entertaining things to listen to.

Free Culture Interview

Mark Forman’s interview with me has been published. As an obligatory old media dig, I will note that very few of the print reporters I have spoken to in the last year were this engaged. In fact, Dan Conover might be the only one. My experience with the national press is that they generally weren’t very insightful or curious, nor did they even seem to be listening very hard. Check out this interview and let me know what you think.

Update: In retrospect, the reporters from Wired were pretty good too. The big national daily newspapers from your really big cities, they were the ones that most underimpressed me.

Update: Part 2 is up as well. Thanks, Mark!

Time Flies

Man, it seems like yesterday was the day I got off the plane from the Portable Media Expo. Time is flying so fast I can barely notice it go by. The day job remains busy, the various stuff I do all evening and weekend stays busy. Hard to believe that Thanksgiving is almost upon us already. It’s also hard to believe that despite all the stuff I have in the can and ready to go, I’ve only done one show since coming back from the Expo. Must fix that over the holiday weekend!

Nappy Times

I’ve been fading in and out of wakefulness this afternoon, on our first stretch of real chilly weather. It’s just that kind of day, a lazy fall Saturday with nothing urgent to do. Right now, I’ve woken back up but every other person and creature in the house is asleep. I think I might rejoin them.

Leaves

Friend of the show Mark Welker (the infamous guy in the EGC shirt at Converge South) has posted a beautiful little vlog entry. It is just footage of trees with leaves that looks like it was shot straight up from the back window well of his car. The combination of the music and the leaves and the blue sky and the utility wires moving past gives it a dreamy feel. He too is using the CVS camcorder and I think this turned out great. This is yet another bit of evidence that tells me the “only professional standards video will be watched” crowd is wrong. I loved this.

I have a video interview I shot with Mark that I need to publish. I also have all kinds of footage of Converge South, some of PME and like 5 audio interviews in the can. I’m in no danger of running out of material anytime soon.