From Serendipity to WordPress

My former blog ran on Serendipity. I wanted to move it so that I would no longer have to maintain it myself, and to improve its visibility. I compared Blogger and WordPress hosting. Blogger has a very powerful way to customize the look and feel of a blog, but it’s a bit hard to use if you are not a web developer. Moreover, it doesn’t have any import or export features, so I chose WordPress. On WordPress you have to pay for the CSS customization feature, but I don’t really need it.

The WordPress import/export format (WXR) is a bit broken. It claims to be an RSS extension, but looking at the importer’s source code, it’s not even parsed as XML: the WXR file is read line by line using the PHP fgets function.
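
To illustrate why line-by-line reading matters, here is a rough Python analogue. It is only a sketch: `titles_by_line` is a hypothetical stand-in for an fgets-style loop, not WordPress's actual code, and the two sample documents are made up. It shows that a line-based reader only finds a tag when the whole element sits on one line, while a real XML parser does not care about layout.

```python
import re
from xml.dom import minidom


def titles_by_line(text):
    """Hypothetical line-based reader: it only sees a <title> when the
    whole element sits alone on a single line."""
    found = []
    for line in text.splitlines():
        m = re.match(r"\s*<title>(.*)</title>\s*$", line)
        if m:
            found.append(m.group(1))
    return found


def titles_by_xml(text):
    """Real XML parsing: the line layout does not matter."""
    doc = minidom.parseString(text)
    return [t.firstChild.nodeValue.strip()
            for t in doc.getElementsByTagName("title")]


# The same document, once with one tag per line and once wrapped differently.
one_per_line = "<channel>\n<item>\n<title>Hello</title>\n</item>\n</channel>"
wrapped = "<channel><item><title>\nHello\n</title></item></channel>"

print(titles_by_line(one_per_line))  # ['Hello']
print(titles_by_line(wrapped))       # []
print(titles_by_xml(wrapped))        # ['Hello']
```

This is why a converter targeting such an importer is safer writing each tag on its own line, even though any layout would be valid XML.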

Serendipity’s RSS export is also broken, as it only exports the summary of each post rather than the full content. Here is how I got around those problems.

I used phpMyAdmin’s XML export to create a decent export of my Serendipity blog. Then I wrote this Python script to convert it to WXR:

#!/usr/bin/env python3
"""Convert a phpMyAdmin XML export of a Serendipity blog to WXR."""
from time import gmtime, localtime, strftime
from xml.dom import minidom
from xml.sax.saxutils import escape


def add_CDATA_node(out, tag_name, content):
    "Write a new tag with a CDATA content to the output file"
    # A literal "]]>" would end the CDATA section early, so split it in two.
    content = (content or '').replace(']]>', ']]]]><![CDATA[>')
    out.write("<%s><![CDATA[%s]]></%s>\n" % (tag_name, content, tag_name))


def add_node(out, tag_name, content):
    "Write a new tag with a text content to the output file"
    # Each tag goes on its own line because the importer reads line by line.
    out.write("<%s>%s</%s>\n" % (tag_name, escape(content or ''), tag_name))


def get_node(element, tag_name):
    "Get the text or CDATA content of a tag within a given element"
    n = element.getElementsByTagName(tag_name)[0].firstChild
    if n:
        return n.nodeValue.strip(' \r\n\t')
    return None


def parse_categories(document):
    """Parse categories from a Serendipity DOM.
    The output maps post ids to category names."""
    # Category id -> category name
    categories = {}
    for e in document.getElementsByTagName("serendipity_category"):
        categories[get_node(e, "categoryid")] = get_node(e, "category_name")

    # Post id -> category name ('' when the category id is unknown)
    entry_cat = {}
    for e in document.getElementsByTagName("serendipity_entrycat"):
        cid = get_node(e, "categoryid")
        n = get_node(e, "entryid")
        entry_cat[n] = categories.get(cid, '')

    return entry_cat


inputDoc = minidom.parse("xrunhprof-s9y.xml")
entry_cat = parse_categories(inputDoc)
with open('/tmp/xrunhprof-wp.xml', 'w', encoding='utf-8') as out:
    out.write("<?xml version='1.0' encoding='UTF-8'?>\n<channel>\n")
    for e in inputDoc.getElementsByTagName("serendipity_entries"):
        date = float(get_node(e, "timestamp"))
        ltime = localtime(date)
        gtime = gmtime(date)
        iid = get_node(e, "id")

        # One <item> per post
        out.write("<item>\n")
        add_node(out, "title", get_node(e, "title"))
        add_node(out, "pubDate", strftime("%a, %d %b %Y %H:%M:%S +0000", gtime))
        body = get_node(e, "body") or ''
        extend = get_node(e, "extended")
        if extend:
            # Serendipity's extended body becomes the part after "more"
            body = body + '\n<!--more-->\n' + extend
        add_CDATA_node(out, "content:encoded", body)
        if iid in entry_cat:
            add_node(out, "category", entry_cat[iid])
        add_node(out, "wp:post_id", iid)
        add_node(out, "wp:status", "publish")
        add_node(out, "wp:post_date", strftime("%Y-%m-%d %H:%M:%S", ltime))
        add_node(out, "wp:post_date_gmt", strftime("%Y-%m-%d %H:%M:%S", gtime))
        out.write("</item>\n")
    out.write("</channel>\n")
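
For reference, here is a small sketch of the kind of input the script reads and how a post's fields are pulled out of it. The layout is a guess inferred from the tag names the script looks up (one element per table row, one child per column); the `<export>` root element, the `text_of` helper and all the values are made up for illustration.

```python
from time import gmtime, strftime
from xml.dom import minidom

# Assumed phpMyAdmin export layout; root element name and values are invented.
SAMPLE = """<export>
  <serendipity_entries>
    <id>1</id>
    <title>First post</title>
    <timestamp>1230768000</timestamp>
    <body>Summary</body>
    <extended>Full text</extended>
  </serendipity_entries>
</export>"""


def text_of(element, tag_name):
    """Return the stripped text of the first <tag_name> child, or None."""
    node = element.getElementsByTagName(tag_name)[0].firstChild
    return node.nodeValue.strip() if node else None


doc = minidom.parseString(SAMPLE)
entry = doc.getElementsByTagName("serendipity_entries")[0]

# Serendipity stores the post date as a Unix timestamp.
stamp = float(text_of(entry, "timestamp"))
post_date_gmt = strftime("%Y-%m-%d %H:%M:%S", gmtime(stamp))

# The short body and the extended body are joined with WordPress's "more" marker.
body = text_of(entry, "body") + "\n<!--more-->\n" + text_of(entry, "extended")

print(text_of(entry, "title"), post_date_gmt)  # First post 2009-01-01 00:00:00
print(body)
```

If your own export uses a different layout, only the tag names passed to the lookups should need to change.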


#python, #serendipity, #wordpress