Photo for Glenn Fleishman

GlennLog

Turning technology from mumbo-jumbo into rich tasty gumbo

November 20, 2004

Throttling RSS

I've put into place a throttling mechanism for my RSS feed from Wi-Fi Networking News. I'll post the code soon. I use an Apache server and essentially route requests for several RSS, RDF, and Atom feed files through a script, so the feeds are still retrieved seamlessly.

The script uses a MySQL database to record the agent name and IP address of requests. If the RSS feed hasn't changed in the last hour or, if longer, since the last time the same IP and agent requested a feed, the RSS aggregator gets a 304 (not modified) instead of a full dump.
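Until the real code is posted, here is a minimal Python sketch of one literal reading of that rule: the look-back window is the longer of one hour and the time since the same IP and agent last asked, and the client gets a 304 only if the feed is older than that window. The names `should_send_304`, `handle_request`, and `last_seen` are made up for illustration, an in-memory dictionary stands in for the MySQL table, and serving a never-before-seen client the full feed is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_WINDOW = timedelta(hours=1)  # the one-hour floor described in the post

# Stand-in for the MySQL table: (ip, agent) -> time of that client's last request.
last_seen: dict[tuple[str, str], datetime] = {}


def should_send_304(now: datetime, feed_modified: datetime,
                    last_request: Optional[datetime]) -> bool:
    """Return True if the client should get 304 instead of the full feed."""
    if last_request is None:
        # Assumption: a client with no recorded request gets the full feed.
        return False
    # Window is one hour or, if longer, the time since this client last asked.
    window = max(MIN_WINDOW, now - last_request)
    # 304 only when the feed has not changed anywhere inside that window.
    return feed_modified < now - window


def handle_request(ip: str, agent: str, now: datetime,
                   feed_modified: datetime) -> int:
    """Record the request and return the HTTP status to send (200 or 304)."""
    key = (ip, agent)
    status = 304 if should_send_304(now, feed_modified, last_seen.get(key)) else 200
    last_seen[key] = now  # every request is recorded, whatever the answer
    return status
```

Under this reading, a feed that changed within the past hour is always served in full, so the savings come from the long quiet stretches between posts.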

I'm willing to take a small hit on testing this--losing some RSS aggregators that don't interpret this behavior correctly--in order to test whether I reduce my overall RSS feed suck.

As I noted a few days ago, my RSS feed traffic (the vast majority of it from Wi-Fi Networking News) averages nearly 400 MB per day. But a substantial minority of that comes from stupid aggregators that don't check for modifications and always request the full feed.

This is my way of fooling them. We'll see if it breaks anything, or just makes it more efficient.

Later: I have some early observations about which aggregators are really, really stinky at understanding what "please don't retrieve a page because it hasn't changed" means. As far as I understand how Apache works, I'm only intercepting GET requests, not HEAD requests, so I'm only recording hits from aggregators that keep taking and taking and taking bandwidth.

Here are the top villains (I'd be glad to get more information about them -- please drop me a line). Where an agent appears multiple times, it is being accessed from different IP addresses.

Great news/update (11/22)! The folks at XMission, whose XMission RPC Agent was one of my top offenders, responded to an email in which I asked them to take a look at how their engine works. They said there was a bug causing this kind of repetition, which they've fixed. What a win-win situation: they use less bandwidth and computational time, and I don't lose readers! I've written the SmartBarXP people and hope to get a response from them, too.

Another update (later on 11/22): Greg from NewsGator wrote to find out why I was seeing such high usage from NewsGatorOnline. It's a well-behaved 'gator, it turns out: my script captures all GET requests, and NewsGator makes all the right moves to not retrieve a non-modified page. But these are recorded in my logs as zero-byte 200 (OK) HTTP transactions. Thus NewsGatorOnline shows up with a lot of requests, but isn't pulling down traffic. Scratch 'em off the list!
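For readers wondering what "the right moves" look like, the usual HTTP pattern is that the aggregator sends back the Last-Modified date it saw as an If-Modified-Since header, and the server compares it with the feed's modification time. The Python sketch below shows that comparison only; it is not the script behind this feed, and the function name `conditional_status` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(if_modified_since: Optional[str],
                       feed_modified: datetime) -> int:
    """Return 304 if the client's copy is current, otherwise 200.

    feed_modified must be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200  # no validator sent, so this is an unconditional GET
    try:
        client_copy = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header and serve the feed
    if client_copy.tzinfo is None:
        # Treat a date without a zone as UTC, as HTTP dates are meant to be.
        client_copy = client_copy.replace(tzinfo=timezone.utc)
    # HTTP dates carry whole seconds only, so drop microseconds before comparing.
    if feed_modified.replace(microsecond=0) <= client_copy:
        return 304
    return 200


# Usage: the header value a well-behaved client would echo back.
feed_time = datetime(2004, 11, 20, 19, 34, tzinfo=timezone.utc)
header_value = format_datetime(feed_time, usegmt=True)
print(conditional_status(header_value, feed_time))
```

An aggregator that never sends the header always falls into the first branch, which is the behavior the table below is measuring.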

Agent name | Requests over a few hours
XMission RPC Agent (Fixed! 11/22) | 253
NewsFire/0.28 | 36
SmartBarXP WinInet | 27
SmartBarXP WinInet | 22
NewsGatorOnline/2.0 (Not a problem, turns out) | 19
NewsFire/0.28 | 17
SmartBarXP WinInet | 16
SmartBarXP WinInet | 16
SmartBarXP WinInet | 16
SmartBarXP WinInet | 16
SmartBarXP WinInet | 15
SmartBarXP WinInet | 15
curl/7.9.8 (i386-portbld-freebsd4.6.2) libcurl 7.9.8 (OpenSSL 0.9.6g) (ipv6 enabled) | 15
SmartBarXP WinInet | 15
lwp-trivial/1.35 | 15
SharpReader/0.9.4.1 (.NET CLR 1.1.4322.2032; WinNT 5.1.2600.0) | 15
SmartBarXP WinInet | 15
SmartBarXP WinInet | 15
SmartBarXP WinInet | 14
Oddbot/1.0 (+http://oddpost.com/oddbot.html) | 13
NONE | 12
IdeareNews/0.8 | 12
NewsFire/0.28 | 12
FeedOnFeeds/0.1.7 (+http://minutillo.com/steve/feedonfeeds/) | 11

It looks like my next plan may be to entirely block certain aggregators by replying with an XML "pllllllllhhhhbbbbtt" and an item-encoded note saying, "Please ask your aggregator's software developer to correct behavior in using requests to determine changed syndication feeds. You will then be allowed to use this feed again." I might offend some readers, but it looks to me like I might save a number of gigabytes a month now and much more in the future as usage grows. If you use RSS with Wi-Fi Networking News, please let me know if you're seeing errors, by the way.
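As a rough idea of what that reply could look like, here is a small Python sketch that builds a one-item RSS 2.0 document carrying the note. Nothing here is decided: the function name `build_block_feed`, the channel and item titles, and the example.com link are placeholders, and only the note text comes from the paragraph above.

```python
from xml.etree import ElementTree as ET

BLOCK_NOTE = (
    "Please ask your aggregator's software developer to correct behavior "
    "in using requests to determine changed syndication feeds. "
    "You will then be allowed to use this feed again."
)


def build_block_feed(feed_title: str, feed_link: str) -> str:
    """Return a minimal RSS 2.0 document whose only item is the block note."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    # Placeholder wording for the channel description and item title.
    ET.SubElement(channel, "description").text = "Feed unavailable to this aggregator"
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = "This aggregator has been blocked"
    ET.SubElement(item, "description").text = BLOCK_NOTE
    return ET.tostring(rss, encoding="unicode")


# Usage with a placeholder link; the real feed URL would go here.
print(build_block_feed("Wi-Fi Networking News", "https://example.com/"))
```

Sending this as a normal feed means the note shows up where the reader actually looks, inside the aggregator, rather than as an HTTP error the software might swallow.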

Posted by Glennf at November 20, 2004 11:34 AM

Trackback Pings

Listed below are links to weblogs that reference Throttling RSS:

Nice hack... from Teal Sunglasses
Glenn comes up with a tool to rate limit stupid aggregators. Nice hack. If it doesn't work, maybe we can convince him to modify it to return a default XML doc that sends back a "please fix your broken aggregator" page instead....

Tracked on November 20, 2004 12:33 PM

Throttling RSS Seems to Work from The RSS Blog
Randy: Glenn is finding out that well-behaved aggregators mean that RSS scales.

Tracked on December 8, 2004 8:35 AM

My Posts on Throttling from Regular Sucking Schedule
I posted three items over the last few weeks about RSS bandwidth use and my attempts to throttle it back. The first one, on Nov. 13, shows a chart of usage and how rapidly it's grown. My second, on Nov. 20, shows my attempts to throttle usage through a...

Tracked on December 11, 2004 2:13 PM
