From Serendipity to WordPress

My former blog was http://xrunhprof.free.fr/serendipity/. I wanted to stop maintaining it myself and to make it more visible, so I compared Blogger and WordPress hosting. Blogger offers a very powerful way to customize the look and feel of a blog, but it is a bit hard to use if you are not a web developer. Moreover, it doesn't have any import or export features, so I chose WordPress. On WordPress you have to pay for the CSS customization feature, but I don't really need it.

The WordPress import/export format (WXR) is a bit broken. It claims to be an RSS extension, but judging from the importer's source code, it is not even parsed as XML:

wordpress-2.3.1/wp-admin/import/wordpress.php

You will see that WXR files are parsed line by line with PHP's fgets function.
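To illustrate why reading WXR line by line is fragile, here is a minimal Python sketch. `line_based_items` is a hypothetical, simplified imitation of a line-oriented reader, not WordPress's actual code: a line containing `<item>` starts an item and anything else on that line is thrown away. Two serializations of the same XML give different results, while a real XML parser treats them identically. This is also why the conversion script puts `<item>` and every field on its own line.

```python
import xml.etree.ElementTree as ET

# The same channel serialized two ways; as XML they are equivalent.
PRETTY = "<channel>\n<item>\n<title>Hello</title>\n</item>\n</channel>"
COMPACT = "<channel><item><title>Hello</title></item></channel>"


def line_based_items(text):
    """Hypothetical line-oriented reader (simplified, not WordPress's code).

    A line containing <item> starts an item, a line containing </item> ends it,
    and only the lines strictly in between are kept as the item body.
    """
    items, body, inside = [], [], False
    for line in text.splitlines():
        if "<item>" in line:
            inside, body = True, []
            continue  # anything else on this line is silently dropped
        if "</item>" in line:
            if inside:
                items.append("\n".join(body))
            inside = False
            continue
        if inside:
            body.append(line)
    return items


def xml_titles(text):
    """Real XML parsing: line breaks between tags do not matter."""
    return [item.findtext("title") for item in ET.fromstring(text).iter("item")]


print(len(line_based_items(PRETTY)), len(line_based_items(COMPACT)))  # 1 0
print(xml_titles(PRETTY), xml_titles(COMPACT))  # ['Hello'] ['Hello']
```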

The Serendipity RSS export is also broken: it only exports the summary of each post, not its full content. Here is how I worked around these problems.

I used phpMyAdmin's XML export to get a decent dump of my Serendipity blog, then wrote this Python script to convert it to WXR:

#!/usr/bin/env python3
# Convert a phpMyAdmin XML export of a Serendipity database into a WXR file.
from time import gmtime, localtime, strftime
from xml.dom import minidom
from xml.sax.saxutils import escape

def add_CDATA_node(out, tag_name, content):
    "Write a tag whose content is wrapped in a CDATA section"
    # A literal "]]>" would close the CDATA section early, so split it in two
    content = content.replace(']]>', ']]]]><![CDATA[>')
    out.write('<'+tag_name+'><![CDATA['+content+']]></'+tag_name+'>\n')

def add_node(out, tag_name, content):
    "Write a tag with XML-escaped text content"
    out.write('<'+tag_name+'>'+escape(content)+'</'+tag_name+'>\n')

def get_node(element, tag_name):
    "Get the stripped text or CDATA content of the first tag_name in element ('' if absent)"
    nodes = element.getElementsByTagName(tag_name)
    if not nodes:
        return ''
    # Join every text/CDATA child in case the parser split the content
    text = ''.join(n.data for n in nodes[0].childNodes
                   if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE))
    return text.strip(' \r\n\t')

def parse_categories(document):
    """Parse categories from a Serendipity DOM.
    Return a dict mapping each post id to its category name"""
    categories = {}
    for e in document.getElementsByTagName("serendipity_category"):
        categories[get_node(e, "categoryid")] = get_node(e, "category_name")

    entry_cat = {}
    for e in document.getElementsByTagName("serendipity_entrycat"):
        cid = get_node(e, "categoryid")
        n = get_node(e, "entryid")
        # Unknown category ids map to '' (no category); a post with several
        # categories keeps only the last one listed
        entry_cat[n] = categories.get(cid, '')

    return entry_cat

inputDoc = minidom.parse("xrunhprof-s9y.xml")
entry_cat = parse_categories(inputDoc)
with open('/tmp/xrunhprof-wp.xml', 'w', encoding='utf-8') as out:
    out.write("<?xml version='1.0' encoding='UTF-8'?>\n<channel>\n")
    for e in inputDoc.getElementsByTagName("serendipity_entries"):
        out.write('<item>\n')
        date = float(get_node(e, "timestamp"))  # Unix timestamp
        iid = get_node(e, "id")

        #title
        add_node(out, "title", get_node(e, "title"))
        add_node(out, "pubDate", strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime(date)))
        #content: the extended part goes after WordPress's <!--more--> marker
        body = get_node(e, "body")
        extend = get_node(e, "extended")
        if extend:
            body = body + '\n<!--more-->\n' + extend
        add_CDATA_node(out, "content:encoded", body)
        #category (posts without one get no <category> tag)
        if entry_cat.get(iid):
            add_node(out, "category", entry_cat[iid])
        #id
        add_node(out, "wp:post_id", iid)
        add_node(out, "wp:status", "publish")
        #date: local time of the machine running the script, and UTC
        add_node(out, "wp:post_date", strftime("%Y-%m-%d %H:%M:%S", localtime(date)))
        add_node(out, "wp:post_date_gmt", strftime("%Y-%m-%d %H:%M:%S", gmtime(date)))
        out.write('</item>\n')

    out.write('</channel>\n')
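The script writes `content:` and `wp:` prefixed tags without declaring those namespaces. That is harmless for a line-based importer, but a namespace-aware XML parser rejects the file. The following is a minimal sketch for sanity-checking generated items with Python's `xml.etree.ElementTree`, assuming you wrap the output in an `<rss>` root that declares the namespaces. The URIs are those used by WXR 1.0 exports (later WXR versions change the `wp` URI), and `SAMPLE`, `parse_wxr` and `summarize` are hypothetical names; `SAMPLE` is made-up data in the layout the script writes, not a real export.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by WXR 1.0; the conversion script does not declare them.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.0/",
}

# Hypothetical sample in the layout the script writes (not real export data).
SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<channel>
<item>
<title>Hello &amp; welcome</title>
<pubDate>Sat, 01 Dec 2007 10:00:00 +0000</pubDate>
<content:encoded><![CDATA[<p>Intro</p>
<!--more-->
<p>Rest</p>]]></content:encoded>
<category>Misc</category>
<wp:post_id>1</wp:post_id>
<wp:status>publish</wp:status>
<wp:post_date_gmt>2007-12-01 10:00:00</wp:post_date_gmt>
</item>
</channel>
"""


def parse_wxr(text):
    """Wrap the script's output in an <rss> root declaring the namespaces, then parse it."""
    # Drop the XML declaration: it is only allowed at the very start of a document.
    if text.startswith("<?xml"):
        text = text.split("?>", 1)[1]
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    return ET.fromstring(f'<rss version="2.0" {decls}>{text}</rss>')


def summarize(root):
    """Return (post id, title, has <!--more-->) for every item."""
    return [
        (item.findtext("wp:post_id", namespaces=NS),
         item.findtext("title"),
         "<!--more-->" in item.findtext("content:encoded", default="", namespaces=NS))
        for item in root.iter("item")
    ]


try:
    ET.fromstring(SAMPLE)  # undeclared prefixes: rejected by a namespace-aware parser
except ET.ParseError as err:
    print("as written:", err)

print(summarize(parse_wxr(SAMPLE)))  # [('1', 'Hello & welcome', True)]
```

Wrapping rather than editing the script keeps the one-tag-per-line layout that a line-based importer expects, while still letting you check the items with a real parser.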