Tag Archives: sitemaps

Search-building: custom or Google

Until earlier this week, I had a lousy site search in place. It was one of Google’s Custom Search Engines, barely configured and only on its own page, due to its hefty (and blocking!) JavaScript. I’d long since disabled WordPress’s search since my stories aren’t being run in WordPress, and I didn’t feel like trying to chew on the internal search mechanisms to include the stories.

Last week, I started playing around with a project to create my own (Python) site search, including a crawler and Whoosh-based search. I’d seen a Lucene search implementation in Zend come together fairly easy-peasy, and liked the idea of a self-hosted search.

The problem (well, one of the problems) is that the crawl time for a site with 1200 posts, most of which are low-priority, is a deal-breaker on a shared hosting provider. Just collecting the links takes far longer than 5 minutes, even with multiple threads. Add the time to parse 1200 pages into indexable content, and I was stuck contemplating how to crawl and index the site in parts.
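
A minimal sketch of crawling and indexing in parts, assuming the URL list comes from the site’s own sitemap instead of link discovery, and that each cron run handles one fixed-size batch and records its progress in a small JSON state file. The function names, the batch size, and the state-file format are all hypothetical, and index_page stands in for whatever fetch, parse, and Whoosh indexing step you’d plug in.

```python
import json
import os
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org protocol 0.9 sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> URL listed in a sitemap document (str or bytes)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]


def load_offset(state_path):
    """Read how far the previous run got (hypothetical JSON state file)."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as f:
        return json.load(f).get("offset", 0)


def save_offset(state_path, offset):
    with open(state_path, "w") as f:
        json.dump({"offset": offset}, f)


def index_in_parts(urls, state_path, index_page, batch_size=100):
    """Index one batch of URLs per call, resuming where the last call stopped.

    index_page(url) is a placeholder for fetch + parse + add-to-index.
    After a full pass over the URL list, the next call starts over at 0.
    """
    offset = load_offset(state_path)
    if offset >= len(urls):
        offset = 0  # finished a full pass; start a fresh one
    batch = urls[offset:offset + batch_size]
    for url in batch:
        index_page(url)
    # Record progress only after the whole batch has been indexed.
    save_offset(state_path, offset + len(batch))
    return batch
```

Saving the offset only after the batch finishes means a run that gets killed partway through simply redoes that batch next time, rather than silently skipping it.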

This sounds like a great, fun project. …Except that it’s already been done and I have other things I’d rather be doing. Google did it; their index for my site updates surprisingly quickly and doesn’t make me afraid that Dreamhost will smite me. (I’ve been with Dreamhost for several years now, and while I’ve learned how to properly deploy a site since moving here from… Brinkster, was it?, I don’t relish the idea of learning a new environment for all the stuff I run here.)

So instead of the 4-5 hours I’d spent screwing with the Ikea-esque assembly of a site crawler and search, I spent two this week really making Google’s Custom Search Engine (CSE) work for me. Yes, there are ads. Yes, it’s not a solution that I own. (Then again, neither is my email, in that sense.)

Quasi-daily linkage

  • Broadsheet – Salon.com – Very well-written article on overweight folks having to buy two seats on planes. "See, those of us who are and/or love people to whom airlines' 'person of size policies' apply don't automatically envision the discomfort of getting stuck next to a fatty; we envision the physical and emotional pain of being the fatty crammed between two potentially hostile strangers, at the mercy of flight attendants who might decide we're fine on one flight and a 'safety risk' on the next."
  • Four Fabulous Video Band-Aids YOU need NOW! | Bitch Magazine – "My secret ritual is to curl up in my favorite chair (preferably with a hot cup of tea in hand and a kitty on my lap) to devour a few substantive video clips from insightful women."
  • The Most Common HTML and CSS Mistakes to Avoid | Tips – I'm very glad I rarely make these. I'd like to see a more advanced form of this list, though, for those of us who use HTML/CSS often enough to break them in more interesting ways.
  • Create a Professional Web 2.0 Layout | Psdtuts+ – Quite awesome guide, especially since I'm puttering around with a site at the moment. I'll have to translate it to Fireworks-speak, though, since that's what I have.
  • sitemap-generators – Project Hosting on Google Code – I definitely recommend this for generating sitemaps. I've got it cron-jobbed on Dreamhost, and it generates a gzipped sitemap of 2500+ links in about 1 second.
  • WILL THE iPAD BE A FLOP? APPLE SEEMS TO THINK SO. – Brain – Points out something interesting about how Apple is marketing the iPad.
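
For a sense of what a sitemap generator like the one linked above produces, here is a minimal sketch. It is not the Google Code project's code, whose internals aren't covered here; it assumes you already have a list of URLs and writes them as a gzipped sitemap in the sitemaps.org 0.9 format. The function names and the sitemap.xml.gz path are hypothetical; the 50,000-URL cap is the protocol's per-file limit.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # sitemaps.org limit on URLs in a single sitemap file


def build_sitemap(urls):
    """Return sitemap XML as UTF-8 bytes, one <url><loc> entry per URL."""
    if len(urls) > MAX_URLS:
        raise ValueError("too many URLs: split into several sitemaps plus an index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serializing.
        ET.SubElement(entry, "loc").text = url
    # xml_declaration requires Python 3.8+.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def write_gzipped_sitemap(urls, path):
    """Write the sitemap gzip-compressed, e.g. to sitemap.xml.gz."""
    with gzip.open(path, "wb") as f:
        f.write(build_sitemap(urls))
```

A cron job would then just call something like write_gzipped_sitemap(url_list, "sitemap.xml.gz") on a schedule. Letting ElementTree serialize the XML, instead of building strings by hand, takes care of the entity escaping the sitemap format requires for URLs with query strings.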