Last week, I started playing around with a project to create my own (Python) site search, including a crawler and Whoosh-based search. I’d seen the implementation of a Lucene search in Zend go fairly easy-peasy, and liked the idea of a self-hosted search.
One of the problems is that the crawl time for a site with 1200 posts (most of which are low-priority) is a deal-breaker on a shared hosting provider. It takes far longer than 5 minutes just to collect the links, even with multiple threads. Add the parse time to get indexable content for 1200 pages, and I was stuck contemplating how to crawl and index the site in parts.
This sounds like a great, fun project. …Except that it’s already been done and I have other things I’d rather be doing. Google did it; their index for my site updates surprisingly quickly and doesn’t make me afraid that Dreamhost will smite me. (I’ve been with Dreamhost for several years now, and while I’ve learned how to properly deploy a site since moving here from Brinkster, I think it was, I don’t relish the idea of learning a new environment for all the stuff I run here.)
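For the curious, crawling "in parts" could look something like the sketch below. It is not the crawler I actually wrote; it's a minimal, single-threaded illustration that assumes a hypothetical `fetch` callable (URL in, HTML string out) and a made-up time budget of 240 seconds to stay under a host's limit. The names `crawl_batch`, `LinkParser`, and the shape of the `state` dict are all mine, invented for the example.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl_batch(state, fetch, budget_seconds=240, clock=time.monotonic):
    """Crawl until the queue is empty or the time budget runs out.

    state: dict with "queue" (URLs still to visit) and "seen" (every URL
           that has ever been queued). Both are plain lists.
    fetch: hypothetical callable taking a URL and returning its HTML.
    Returns a new state dict to pick up from on the next run.
    """
    deadline = clock() + budget_seconds
    seen = set(state["seen"])
    queue = list(state["queue"])
    while queue and clock() < deadline:
        url = queue.pop(0)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            link = urljoin(url, href).split("#")[0]
            # Stay on the same host and skip anything already queued.
            if urlparse(link).netloc == urlparse(url).netloc and link not in seen:
                seen.add(link)
                queue.append(link)
    return {"queue": queue, "seen": sorted(seen)}
```

The returned state is just lists of strings, so a scheduled job could dump it to a JSON file at the end of each run and load it at the start of the next, working through the site one batch at a time.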
So instead of the 4-5 hours I’d spent screwing with the Ikea-esque assembly of a site crawler and search, I spent two this week really making Google’s Custom Search Engine (CSE) work for me. Yes, there are ads. Yes, it’s not a solution that I own. (Then again, neither is my email, in that sense.)
Continue reading Search-building: custom or Google
So, I set up a Google Profile in the hopes that I could centralize my identity on the intarwubs using this new-fangled OpenID thing.
My first target was Yahoo(!). I know, Yahoo is so not cool, but the Charlotte Camarilla LARPs have their mailing lists there. (Side note: I would totally host and maintain forums on a server for them. Mailing lists are so 1998.) I just wanted to start signing in with my Google info, because I hate making 15 different profiles. Even Facebook Connect would be fine.
Turns out I’m stuck using my antiquated Yahoo(!) account for that (and Flickr). Of course, it wants to be my OpenID, too.
I know OpenID is supposed to be (in part) about taking ownership of URIs, but I’m not greedy. I only want ownership of a couple: irrsinn.net and some other beast with all that stupid profile info (like the Google profile I constructed, or Facebook) that can identify me in other places.
Continue reading OpenID: why is this so complicated?!
A combination of my recent job interview and reading Coding Horror reminded me that I’ve been slack on putting title attributes on my links here on the blagonet. The interview because it touched on accessibility, and Coding Horror because Atwood crosslinks a lot of his posts but doesn’t give the name of the post in the title attribute, which means I end up with 30 open tabs to blog entries that I couldn’t cherry-pick thanks to non-semantic URIs and no titles. I also don’t know which of Atwood’s links are internal or external without checking the URIs. Kinda frustrating.
So I reactivated my External Links plugin (which puts that nice little icon next to external links) and am being non-lazy about titling links.
I feel better already. Hopefully that’ll make things a little easier to read around here.
My initial pass at the story management plugin is in place. It doesn’t do the fancy things I want it to do yet, but it can note the root page of a series, output a list of stories in story-date order on that root page, note the arcs, and give Next/Previous links using the actual story names in the individual stories.
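The plugin itself is PHP inside WordPress, but the ordering and Next/Previous logic boils down to something like this rough Python sketch. The function name `series_navigation`, the tuple layout, and the story titles in the usage example are all made up for illustration; the real plugin pulls its data from WP.

```python
from datetime import date


def series_navigation(stories):
    """Order stories by in-story date and work out Next/Previous titles.

    stories: list of (title, story_date) tuples, in any order.
    Returns a list of dicts in story-date order, each naming its
    neighbours by title (None at either end of the series).
    """
    ordered = sorted(stories, key=lambda story: story[1])
    nav = []
    for i, (title, story_date) in enumerate(ordered):
        nav.append({
            "title": title,
            "date": story_date,
            "previous": ordered[i - 1][0] if i > 0 else None,
            "next": ordered[i + 1][0] if i < len(ordered) - 1 else None,
        })
    return nav


# Hypothetical series, entered out of order.
series = series_navigation([
    ("Aftermath", date(2051, 3, 2)),
    ("First Contact", date(2049, 11, 15)),
    ("The Long Winter", date(2050, 1, 8)),
])
for entry in series:
    print(entry["previous"], "<-", entry["title"], "->", entry["next"])
```

Sorting on the story date rather than the post date is the whole point: the root page lists things in the order they happen, not the order I wrote them.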
Not bad for about 3 hours of work. Especially having to code around my elbows in WP.
Interesting note: PHP stores timestamps as integers, and on a 32-bit build that integer runs out of room in January 2038. To get dates beyond that, I used this nifty little Date class. Dropped in like a charm, and it doesn’t seem to have caused any conflicts with other parts of WordPress.
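If you want to see where that 2038 wall comes from, here is a small Python sketch (not the PHP Date class, just the arithmetic). It assumes the timestamp is a signed 32-bit count of seconds since 1 January 1970 UTC; the helper names `from_timestamp` and `fits_in_int32` are mine.

```python
import struct
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_timestamp(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def fits_in_int32(seconds):
    """Return True if the value can be stored as a signed 32-bit integer."""
    try:
        struct.pack("<i", seconds)
        return True
    except struct.error:
        return False


# The last moment a signed 32-bit timestamp can represent.
print(from_timestamp(INT32_MAX))

# A story set in the year 2100 needs more seconds than 32 bits can count.
story_seconds = int((datetime(2100, 1, 1, tzinfo=timezone.utc) - EPOCH).total_seconds())
print(fits_in_int32(INT32_MAX), fits_in_int32(story_seconds))
```

Anything that represents dates as a calendar object instead of a 32-bit second count sidesteps the problem, which is presumably what the drop-in class does.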
I think I’ll polish the plugin up a bit (give it a description, for instance) and toss it up in the WP plugin library. It’s not very configurable at the moment, but that’ll probably wait until I get in the sorting features that I want.
I started working up the story engine last night in Ruby. Greg suggested that XSLT (Extensible Stylesheet Language Transformations) would be perfect for it if I stored my stories as XML. Do a simple transformation, and voilà, a story in XHTML with layout and everything. I was hoping I could knock it out in a single evening.
Turns out Ruby and XSLT are more separate critters than I thought: there’s no default library to handle it, and the three or so available are older and lack documentation. Better yet, the one easily available as a gem doesn’t use Ruby’s nice REXML; it uses LibXML.
Combine all that with never having used XSLT before, and I’m in for some annoyance. Drat. Then, if I want any WP integration (like searching, for instance), I’ll need to write a WP plugin of some sort.
I think I’ll switch to writing a WP plugin. 🙂