Description: tech notes and web hackery from the guy that brought you bzero, python community server, the blogging ecosystem and the internet topic exchange
Last Update: 10:11:59 03/02/2006
Additional Info
First Fetched: 00:18:16 01/31/2004
Last Updated: 10:11:59 03/02/2006
Headlines
|
|
|
I just heard from Layered Tech that my server (which hosts this blog, topicexchange.com, pycs.net, and so on) is going to be moved into a new datacenter; right now it's at The Planet, and it's going to SAVVIS. This'll happen in a couple of days, so there may be some downtime... Comment
|
| 16:29:41 February 23, 2006, Thursday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
I'm sure there is an option somewhere to do this, but I can't find it (and it looks like nobody on this wordpress.org support thread could either), so I hacked up a tiny plugin to let you get Wordpress archives sorted in ascending date order. Download: ascending-date-order-archives.php.txt Rename that to ascending-date-order-archives.php (i.e. remove the .txt extension) and save it in your wp-content/plugins directory, then go to your Wordpress plugins page and click 'Activate' on the Ascending date order archives plugin. From then on, your archives should appear in ascending order. Leave a comment here if you have any issues. Comment
|
| 13:35:21 February 23, 2006, Thursday (PST) |
Source: Second p0st |
 |
|
|
|
|
Here we go - view source for the key → edgeio-key: 6f1b18873f8d65b2ce8f1e0a66d68c2ba662e7a0 ← there it is! Comment
|
| 13:27:29 February 20, 2006, Monday (PST) |
Source: Second p0st |
 |
|
|
|
|
Interesting, slightly-off definitions of stuff around blogging by William Safire (via Dave).
- Is that true about the "bye-line"?
- "blog" is short for "weblog" (online journal), not "web server log file" (list of hits on a web server). Peter Merholz is responsible for the abbreviation.
- A "meme" is a catchy idea. Chain letters count too, but the meaning is broader than that. The first time I heard of memes was when reading Snow Crash.
Comment
|
| 09:10:37 February 19, 2006, Sunday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
I just got back today from the 7th IAPR International Workshop on Document Analysis Systems (proceedings), held in Nelson from 13-15 Feb. The presentations were all about document or image analysis, but the heavy use of AI techniques could make some of it relevant to what I work on these days.
Some of the interesting people I met or caught up with:
- Adam Behringer (Exbiblio)
- Abdel Belaid (Loria)
- Jim Fruchterman (Benetech)
- Koichi Kise (Osaka Prefecture University)
- Bertin Klein (DFKI)
- Marcus Liwicki (University of Bern)
- Larry Spitz (DocRec Ltd)
- Noorazrin Zakaria (Université de la Rochelle)
Projects I should take a look at:
- GroupLens (recommendation system)
- Semantic Wikis (workshop)
- PRImA Research document database
- OpenCV
- Digital Library of India
- IUPR camera-captured document archive
- Tohoku University's OCR web service
- IAM-OnDB pen-captured writing database
Techniques I should learn (or re-learn):
- Gabor filters
- Hidden Markov models
- Standard classifiers: NNC, LDC
- Analytical segmentation
...
|
| 18:34:05 February 15, 2006, Wednesday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
Just so I don't forget - found this free OCR engine on SourceForge, a "commercial quality OCR engine originally developed at HP between 1985 and 1995". Rumour has it that Google will be developing open source OCR soon... Comment
|
| 16:10:58 February 15, 2006, Wednesday (PST) |
Source: Second p0st |
 |
|
|
|
|
I just realised that, since installing IE7, I haven't used the Google Toolbar -- IE7's Firefox-like search box is all I need. So I've turned it off. Comment
|
| 16:07:47 February 8, 2006, Wednesday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
In dom4j, once you have parsed a document, you can execute XPath expressions like so:
    List list = doc.selectNodes("/path/to/elements/you/want");
The doc.selectNodes function is a thin wrapper around the jaxen XPath library. This saves you from having to write:
    XPath xp = new org.jaxen.dom4j.Dom4jXPath("/path/to/elements/you/want");
    List list = xp.selectNodes(doc);
However, if the document uses namespaces, you have to tell jaxen about them, or you won't have any way of specifying namespaced elements. Until yesterday, I thought the only way to do this was:
    XPath xp = new org.jaxen.dom4j.Dom4jXPath("/x:path/x:to/x:elements/x:you/x:want");
    xp.addNamespace("x", "http://example.org/some-random-namespace");
    List list = xp.selectNodes(doc);
This is a pain in the ass - three lines of code every time you want to use XPath! I did it this way for a little while, but finally got sick of it yesterday and dug through the dom4j code until eventually finding out how to do it properly:
    Map namespaces = new ...
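The underlying idea (declare the prefix-to-URI mapping once, then reuse it for every query) isn't specific to dom4j. Here is a rough Python analogue using the standard library's ElementTree. It is not dom4j's API; the sample document and the NAMESPACES name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix map, declared once and reused for every query.
NAMESPACES = {"x": "http://example.org/some-random-namespace"}

# Made-up document whose elements all live in the namespace above.
DOC = """<path xmlns="http://example.org/some-random-namespace">
  <to><elements><you><want>first</want><want>second</want></you></elements></to>
</path>"""

root = ET.fromstring(DOC)

# Without the prefix map, the plain element names don't match anything.
unqualified = root.findall("to/elements/you/want")

# With the map, the "x:" prefix resolves to the namespace URI.
found = root.findall("x:to/x:elements/x:you/x:want", NAMESPACES)
texts = [el.text for el in found]
```

The prefix in the query ("x") only has to match the key in the map, not whatever prefix the document itself uses.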
|
| 14:21:24 February 8, 2006, Wednesday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
Finally I have a reason to write something in Java - a servlet to make queries against the eBay API and return the results to a Javascript application in a browser. So now I can add dom4j (Java DOM library), jaxen (Java XPath library) and JSON / org.json* (Javascript-friendly serialisation protocol) to my list of technologies I have used. Also, today I built my first WAR file, to send an early version of the code to the client. Easier than expected, a pleasant surprise! Comment
|
| 18:29:14 February 7, 2006, Tuesday (PST) |
Source: Second p0st |
 |
|
|
|
|
Another thing I have to blog because I always lose it and it seems to be getting harder to find on Google (this time I got it by searching for katamari nomiya kabata): Tim Rogers' review/commentary on Katamari Damashii 2 ("minna daisuki katamari damashii"). Here's his post about the original Katamari Damashii too, for reference. Comment
|
| 23:21:30 February 4, 2006, Saturday (PST) |
Source: Second p0st |
 |
|
|
|
|
To make pretty nice plum muffins, use this blueberry muffin recipe, substituting well-chopped plums for blueberries. (I made some a few weeks back but lost the printout, so I'm blogging the URL so I won't lose it. It seems that it's hard to find muffin recipes these days that use butter rather than oil - so I don't want to lose this one!) Comment
|
| 17:26:56 February 4, 2006, Saturday (PST) |
Source: Second p0st |
 |
|
|
|
|
Useful: a whole book on processing XML in Java, free. (I decided to get the eBay Java code to use the XML API rather than the Java API, which means I need to read and write XML manually, but it'll mean my object code will consist of a few kB of class files rather than a few kB of class files and 20MB+ of jars.) I haven't found an equivalent of ElementTree for Java, though... although here's a lightweight DOM implementation. Comment
|
| 01:48:10 February 3, 2006, Friday (PST) |
Source: Second p0st |
 |
|
 |
|
|
(How to debug PHP crashes (previously: "PHP: don't define functions inside switch statements"))
Hmm, looks like some versions of PHP, including the one running on outputthis.org, crash if you call a function defined inside a switch statement. Oops!
---
BTW, if you find that your script is crashing PHP (i.e. your browser gives an error, the server drops the HTTP connection after receiving the headers, and you get an error in /var/log/apache/error.log like [notice] child pid 2706 exit signal Segmentation fault (11)), here's how you figure out what's going wrong - or at least get something to Google.
* Step 1 - enable core dumping in Apache
Edit /etc/apache/conf/httpd.conf (it might be at /etc/httpd/conf/httpd.conf) and set the CoreDumpDirectory option to a directory that Apache can write to. I created a new directory:
    mkdir /var/run/httpd-core
    chown apache.apache /var/run/httpd-core
... then put this line in httpd.conf:
    CoreDumpDirectory /var/run/httpd-core
Now tell Apache to reload config:
    apachectl graceful
* Step 2 - trigger the error
Browse to the page or do whatever you ...
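Before going near core dumps, it can be handy to confirm which Apache children actually segfaulted. A minimal sketch in Python, assuming the error log lines look like the one quoted above; the timestamp prefix in the sample and the crashed_pids name are illustrative assumptions.

```python
import re

# Matches the message quoted in the post; anything before it on the line is ignored.
SEGFAULT_RE = re.compile(r"child pid (\d+) exit signal Segmentation fault \(11\)")


def crashed_pids(log_lines):
    """Return the pids of Apache children that exited with a segfault."""
    pids = []
    for line in log_lines:
        match = SEGFAULT_RE.search(line)
        if match:
            pids.append(int(match.group(1)))
    return pids


# Made-up sample lines; the timestamp format is an assumption.
sample = [
    "[Thu Feb 02 20:30:01 2006] [notice] child pid 2706 exit signal Segmentation fault (11)",
    "[Thu Feb 02 20:30:05 2006] [notice] Graceful restart requested, doing restart",
]
pids = crashed_pids(sample)
```

To run it against a real log, pass an open file object instead of the sample list, e.g. crashed_pids(open("/var/log/apache/error.log")).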
|
| 20:35:10 February 2, 2006, Thursday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
Hybrid (amzn) feat. Sheryl Crow (amzn). Listen to Crow's "Chances Are" and tell me that she wouldn't do a better job of "Blackout" than Kirsty Hawkshaw... Comment
|
| 17:20:49 February 2, 2006, Thursday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
For a project at work, I've got to write a servlet that gets info from eBay via the eBay API and dumps it out in HTML or JSON for some Javascript to read later on. Naively assuming that the eBay Java SDK would be the easiest way to do this, I download it (all 55 megs). It unpacks to about 250 megs on my hard disk, most of which is documentation, apart from the 20 megs or so of compiled Java code that forms the actual SDK. Now, I expect that this documentation answers just about any question I might possibly have, only it seems that all the useful bits are spread all over the place... argh! So now I'm going straight to the XML API and writing the thing in Python first, then I'll port it back to Java when it does what I want. I've got more done in minutes with Python and the XML API docs than I managed in over a day with the Java SDK. I can't imagine that it'll be *that* hard to port it back to Java, either... and doing it this way will result in a few hundred kilobytes of class ...
|
| 18:53:23 February 1, 2006, Wednesday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
Looks like it considers the contents of RSS's channel <description> element to be text, not HTML. Hmm. Maybe I'm finally going to have to strip those links out of there :-) Comment
|
| 13:20:54 February 1, 2006, Wednesday (PST) |
Source: Second p0st |
 |
|
|
|
|
If you're in Christchurch, you'll want to head down to the banks of the Avon river on Saturday 11 Feb - less than 2 weeks away - to see Kazu Negishi's installation, 1000 Windmills Garden. Our neighbours have been helping him out by building windmills - they've been spending months working on them. It takes a lot of work to make 1000 windmills... Comment
|
| 20:58:04 January 30, 2006, Monday (PST) |
Source: Second p0st |
 |
|
|
|
|
Is "dweeb" turning into a common synonym for "geek"? To me, "dweeb" sounds very negative... but it sounds like people are starting to refer to tech-types as dweebs, in a friendly sort of a way. Comment
|
| 00:18:41 January 30, 2006, Monday (PST) |
Source: Second p0st |
 |
|
|
|
|
Nice - KritX is an aggregator for hReviews. That means it trawls the blogosphere for reviews marked up with the hReview microformat. Comment
|
| 22:22:19 January 29, 2006, Sunday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
I remember reading a report about a proposed RDF-enabled blogging platform; it was a project from the Semantic Web group at HP Labs Bristol. It looks like they built it - it's called Semantic Blogging. I think the goal here is the same as that of Structured Blogging, although they've approached it from a different direction. Both projects have built publishing tools, but the StrB (which I can't just abbreviate to SB any more!) plugins focus on being able to publish as many different types of data as possible, and making it machine readable, whereas the SemB tool tries to do as much as possible with one microcontent type. This makes the StrB plugins rather more useful to users... while the SemB tool is a nice vision of the future. I guess now we have to fold those features into StrB and all will be well :-) Comment
|
| 23:55:58 January 28, 2006, Saturday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
Ah, cool, Aaron Swartz's Python web framework, web.py, is out. Also, it seems like the standard way to write Python web apps nowadays is to use WSGI and a server like flup to hook them up to SCGI servers (like Apache / mod_scgi, or my front-end proxy). Comment
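For reference, this is roughly what the WSGI side of that looks like: an app is just a callable taking environ and start_response. A minimal sketch using only the standard library; the SCGI/flup hookup is left out, and call_app is a made-up helper that drives the app in-process so it can be tried without any server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Tiny WSGI app: report the requested path as plain text."""
    path = environ.get("PATH_INFO", "/")
    body = ("Hello from " + path + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path):
    """Hypothetical helper: call a WSGI app directly, with no server involved."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required environ keys
    body = b"".join(app(environ, start_response))
    return captured["status"], dict(captured["headers"]), body


status, headers, body = call_app(application, "/blog")
```

Any WSGI-capable server or gateway should be able to take the same application callable unchanged, which is the whole appeal.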
|
| 18:41:01 January 28, 2006, Saturday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
I don't have enough time to do more than start hacking on the OPML Editor - Wordpress connection, so here's just that: a very small plugin that adds a handler to Wordpress for the opmlCommunityServer.saveFile XML-RPC method and parses just enough of OPML to be able to extract blogroll links. Rename the .php.txt file linked above to opmlcommunityserver.php and drop it in your wp-content/plugins directory, then activate the plugin, and go into your OPML Editor's dotOpmlData.root database and change the XML-RPC URL to point to your Wordpress blog's xmlrpc.php file instead of /RPC2 on the community server. Now when you save your blogroll.opml file, you should see your blogroll change on your WP blog. Note that it will blow away any existing links on the WP blog, so please only install this on your testing blog. It's intended as a starting point so someone else can build what Dave's asking for, not as something for "end users" to use! Comment
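The "parse just enough OPML to get blogroll links" part is small enough to sketch. This is not the plugin's code (that's PHP); it's a rough Python illustration, assuming blogroll entries are outline elements carrying text plus htmlUrl or xmlUrl attributes. The sample document and the blogroll_links name are made up.

```python
import xml.etree.ElementTree as ET


def blogroll_links(opml_text):
    """Return (title, url) pairs for every outline that carries a URL."""
    root = ET.fromstring(opml_text)
    links = []
    for outline in root.iter("outline"):
        # Assumption: prefer the site link, fall back to the feed link.
        url = outline.get("htmlUrl") or outline.get("xmlUrl")
        if url:
            links.append((outline.get("text", url), url))
    return links


# Made-up blogroll; the folder outline has no URL, so it is skipped.
SAMPLE = """<opml version="1.1">
  <head><title>blogroll</title></head>
  <body>
    <outline text="Example Blog" htmlUrl="http://example.org/" xmlUrl="http://example.org/rss.xml"/>
    <outline text="Folder">
      <outline text="Another" htmlUrl="http://example.com/blog/"/>
    </outline>
  </body>
</opml>"""

links = blogroll_links(SAMPLE)
```

Nested outlines are flattened here, which matches a flat blogroll; keeping the folder structure would need a recursive walk instead of iter().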
|
| 01:43:49 January 26, 2006, Thursday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
"... makes Spam Assassin look like Spam Geisha". Excellent. Comment
|
| 15:00:29 January 23, 2006, Monday (PST) |
Source: Second p0st |
 |
|
|
|
|
Some good ones today: Richard Jones (who may be helping out with the simple media upload project) has got pyweek.org up and running. Ian Bicking continues to Paste-ify stuff. Nice little recipe - hierarchical split. So you can split a string by something, then split all the bits by something else, and so on. Convenient for some simple recursive descent parsing problems. Comment
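The hierarchical split idea fits in a few lines. This is my own minimal sketch of the technique as described above, not the linked recipe's code; hsplit is a made-up name.

```python
def hsplit(text, separators):
    """Split text by the first separator, then each piece by the next, and so on."""
    if not separators:
        return text
    first, rest = separators[0], separators[1:]
    return [hsplit(piece, rest) for piece in text.split(first)]


# Records separated by ";", fields by ",", key/value by "=".
parsed = hsplit("a=1,b=2;c=3", [";", ",", "="])
# parsed == [[["a", "1"], ["b", "2"]], [["c", "3"]]]
```

Each separator adds one level of list nesting, so the depth of the result always equals the number of separators.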
|
| 10:37:42 January 22, 2006, Sunday (PST) |
Source: Second p0st |
 |
|
|
|
|
I've read a lot of whining from Perl hackers recently about Perl's slow decline and replacement by PHP for web dev and Python for everything else. Sounds like it's now Python's turn. Ivan Krstic: Rails is a warning that when it comes to web programming, we failed as a community, and the punishment is severe: innovation is now taking place elsewhere. Ian Bicking comments, and follows on with a web dev environment wishlist. Comment
|
| 18:46:58 January 19, 2006, Thursday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
Daniel Chudnov and Ed Summers have been working out a microformat for print citations based on OpenURL, a standard for encoding that sort of data inside a URL. Talking to Daniel has set off a bit of a "mind bomb" for me; I've been reading up about some of the work that's been done by the academic community, and it's quite amazing how much thought people are putting into it. They have a huge volume of data that they're trying to index / keep track of. It already has some metadata, but it's stored differently in every library. Tricky. I've thought for a long time that people have low expectations, software-wise, in the blogging community. Hardly anybody in "Web 2.0" (myself included) is building stuff that I would consider impressive from a programming POV. Time consuming to build, perhaps, but not anything special. The exceptions are outfits like Google (working on a massive scale), PubSub (super-fast matching with clever algorithms), Riya (image processing), Microsoft (handwriting ...
|
| 14:31:23 January 19, 2006, Thursday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
Salim Ismail (PubSub chairman / co-founder) is blogging, and badly needs some googlejuice for his new blog - "You've Got Ismail!" He's just posted a mammoth post about the evolution of the internet, and different modes of communication: messaging, request/response and publish/subscribe. I expect he'll have more to say about the third mode in due course :-) Comment
|
| 01:32:51 January 19, 2006, Thursday (PST) |
Source: Second p0st |
 |
|
|
|
|
I remember a widely-linked mail from Mark Pilgrim about a project he was working on to produce a personal database based on everything you do in Firefox. What happened to this? I don't remember hearing anything about it. Comment
|
| 19:45:51 January 18, 2006, Wednesday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
Alf Eaton sent me code today for the Structured Blogging plugins that would let them look up journal articles on PubMed, using his HubMed server. He did it by building an alternate lookup interface on HubMed that produces XML in the same format as the Amazon API, so it can easily be consumed by the SB plugins. Thanks, Alf! (You can download the code from Alf's server right now, or wait a few days and get it with the next version of the Structured Blogging plugins - it'll be there from v1.0pre13.) It sounds like Alf may be considering including the Structured Blogging plugin as part of NotePress, which is pretty cool. The SB code is open source, there for the taking, so if anyone wants to build structured publishing into software written in PHP or Perl, you're welcome to use it. Comment
|
| 19:02:21 January 12, 2006, Thursday (PST) |
Source: Second p0st |
 |
|
 |
|
|
|
One current issue with the Structured Blogging plugins is that they produce HTML that doesn't validate on the W3C validator and feeds that produce warnings on the Feed Validator. This is because of the method used to embed the structured post's XML source in the HTML output. How the output looks The current output looks like this, with the XML source for the post shown in bold: <script type="application/x-subnode; charset=utf-8"> <!-- the following is structured blog data for machine readers. --> <subnode alternate-for-id="sbentry_5" xmlns:data-view="http://www.w3.org/2003/g/data-view#" data-view:interpreter="http://structuredblogging.org/subnode-to-rdf-interpreter.xsl" xmlns="http://www.structuredblogging.org/xmlns#subnode"> <xml-structured-blog-entry xmlns="http://www.structuredblogging.org/xmlns"> <generator id="wpsb-1" type="x-wpsb-post" version="1"/> <event type="event/conference"> <name>Doc's show</name> <image>/~phil/sb_latest/images/syndicate_logo.gif</image> <person ...
|
| 14:28:30 January 11, 2006, Wednesday (PST) |
Source: Second p0st |
 |
|
 |
|