Gzip files in a git history

July 3, 2017

I’m migrating git repositories with large files to LFS. LFS does not compress files locally, and my application can read gzip-compressed files, so I think it is better to compress them first. I therefore wrote a git filter-branch --index-filter script that gzips the matching files in every commit. It uses a pickle file to avoid recompressing files that were already compressed. It compresses a 4 GiB repository (size of .git) with 200 commits in 2.5 hours, which is not great. Going faster would probably require hacking BFG Repo-Cleaner or using pigz instead of the Python zlib module.

# Python 2, run once per commit by git filter-branch --index-filter
from subprocess import call, check_output, Popen, PIPE
import os
import gzip
import pickle
import shutil

def should_compress(name):
    # To be adapted to your needs
    return name.endswith('.app')

def git(*args):
    res = call(['git'] + list(args))
    assert res == 0

# Map uncompressed file SHA1 to compressed file SHA1
db_file = os.path.join(os.environ["GIT_DIR"], "gzip_rewrite.pickle")
try:
    with open(db_file, 'rb') as f:
        gzipdb = pickle.load(f)
except IOError:
    gzipdb = {}

commit = os.environ['GIT_COMMIT']

# Iterate over all files in the current commit (-z: NUL separated, paths not quoted)
for l in check_output(['git', 'ls-tree', '-r', '-z', commit]).split('\0'):
    if not l:
        continue
    # "<mode> <type> <sha1>\t<path>": split on the tab so paths may contain spaces
    meta, f_name = l.split('\t', 1)
    f_mod, f_type, f_sha1 = meta.split()
    if f_type == 'blob' and should_compress(f_name):
        if f_sha1 not in gzipdb:
            print 'Compressing', f_name
            p_show = Popen(['git', 'cat-file', 'blob', f_sha1], stdout=PIPE)
            p_hash = Popen(['git', 'hash-object', '-w', '--stdin'], stdin=PIPE, stdout=PIPE)
            # No file name and mtime=0 keep the gzip header, hence the blob SHA1, stable
            gz = gzip.GzipFile(filename='', mode='wb', fileobj=p_hash.stdin, mtime=0)
            shutil.copyfileobj(p_show.stdout, gz, 2**20)
            gz.close()  # writes the gzip trailer; p_hash.stdin itself stays open
            assert p_show.wait() == 0
            out, err = p_hash.communicate()
            assert p_hash.returncode == 0
            gzipdb[f_sha1] = out.strip()  # drop the trailing newline
        git('update-index', '--add', '--cacheinfo', f_mod, gzipdb[f_sha1], f_name + '.gz')
        git('update-index', '--remove', f_name)

with open(db_file, 'wb') as f:
    pickle.dump(gzipdb, f, -1)
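The pickle maps uncompressed blob SHA1s to compressed ones. Git names a blob by the SHA1 of a short header ("blob", the size in bytes, a NUL byte) followed by the content, so the mapping can be reproduced in memory. The Python 3 sketch below uses hypothetical helper names. It also shows why a fixed gzip header matters: gzip stores a timestamp, so without mtime=0, compressing the same file twice would produce two different blobs.

```python
import gzip
import hashlib
import io

def git_blob_sha1(data):
    """Object id that `git hash-object` gives to a blob holding `data`."""
    # Header is b"blob <size>" plus a NUL byte, then the raw content
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

def gzip_bytes(data):
    """gzip with a fixed header (empty name, mtime=0): equal input, equal output."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()

def compressed_blob_id(data, cache):
    """Uncompressed blob id -> compressed blob id, compressing each content once."""
    key = git_blob_sha1(data)
    if key not in cache:
        cache[key] = git_blob_sha1(gzip_bytes(data))
    return cache[key]
```

An in-memory cache only lasts as long as one process. filter-branch starts the index filter once per commit, which is why the script saves the dictionary to a pickle file.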

Shrinking pg_xlog in a Gitlab container

May 28, 2017

I’m running a Gitlab container, and my disk is filling up because of the Postgres pg_xlog directory. This seems to be a common problem, and this blog post provides a solution. Here it is, with trivial adjustments for the Gitlab container. First, stop Gitlab and read the checkpoint IDs:

docker exec -it gitlab_web_1 gitlab-ctl stop
docker exec -it gitlab_web_1 /opt/gitlab/embedded/bin/pg_controldata /var/opt/gitlab/postgresql/data/ |grep Next.ID

Note the two checkpoint IDs, which the next command needs:

Latest checkpoint's NextXID:          0:86136
Latest checkpoint's NextOID:          19342

Then run the actual shrink and restart Gitlab:

docker exec -it -u gitlab-psql gitlab_web_1 /opt/gitlab/embedded/bin/pg_resetxlog -o 19342 -x 86136 -f /var/opt/gitlab/postgresql/data/
docker exec -it gitlab_web_1 gitlab-ctl start
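Copying the IDs by hand is error-prone. Below is a Python 3 sketch that reads the pg_controldata output and builds the pg_resetxlog call from it. The helper name is hypothetical, and the container name and paths are the ones used above. -it is omitted because the command is meant to be run from a script rather than a terminal. The NextXID pattern accepts an optional "epoch:" or "epoch/" prefix, on the assumption that other PostgreSQL versions may print a different separator.

```python
import re

def resetxlog_command(controldata,
                      container="gitlab_web_1",
                      datadir="/var/opt/gitlab/postgresql/data/"):
    """Build the pg_resetxlog argv from `pg_controldata` output (hypothetical helper)."""
    # "Latest checkpoint's NextXID:  0:86136" -> keep only the xid after the epoch
    xid = re.search(r"NextXID:\s+(?:\d+[:/])?(\d+)", controldata).group(1)
    oid = re.search(r"NextOID:\s+(\d+)", controldata).group(1)
    return ["docker", "exec", "-u", "gitlab-psql", container,
            "/opt/gitlab/embedded/bin/pg_resetxlog",
            "-o", oid, "-x", xid, "-f", datadir]
```

Pass it the output of the pg_controldata command above. Then run the resulting list with subprocess.run(..., check=True) while Gitlab is still stopped.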

Docker container of host OS without image

March 12, 2017

I often need to check that the binary distribution of my applications embeds all its shared libraries, so that it works on every distro. I used to do this with chroot and mount --bind, but it’s much easier with Docker:

tar cT /dev/null | docker import - empty
docker run -v /lib:/lib:ro -v /path/to/my/app:/opt/app -it empty /opt/app/bin/app

This creates an empty Docker image, then runs a container in which only /lib is mounted from the host.
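Scripting the same check from Python can help when testing several builds. Below is a sketch that assumes Python 3.5+ and a local Docker daemon. The helper names are hypothetical. tarfile writes the same kind of empty archive as tar cT /dev/null, and docker import - empty reads it from stdin. -it is dropped because the container is not attached to a terminal. --rm, which removes the container afterwards, is an addition to the original command. If your binaries use a dynamic loader outside /lib (for example under /lib64), you may need to add that path to host_ro.

```python
import io
import subprocess
import tarfile

def empty_tar():
    """Same result as `tar cT /dev/null`: a valid tar archive with no members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w"):
        pass
    return buf.getvalue()

def run_command(app_dir, binary="/opt/app/bin/app", image="empty", host_ro=("/lib",)):
    """docker run argv: host dirs mounted read-only, the app mounted at /opt/app."""
    cmd = ["docker", "run", "--rm"]
    for path in host_ro:
        cmd += ["-v", "%s:%s:ro" % (path, path)]
    cmd += ["-v", "%s:/opt/app" % app_dir, image, binary]
    return cmd

def check_app(app_dir, image="empty", **kwargs):
    """Create the empty image, then run the app inside it; returns its exit status."""
    subprocess.run(["docker", "import", "-", image], input=empty_tar(), check=True)
    return subprocess.run(run_command(app_dir, image=image, **kwargs)).returncode
```

For example, check_app("/path/to/my/app") returns a non-zero status when the dynamic loader cannot find a library that the distribution should have shipped.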


GTK 3 Adwaita tabs are too high

December 18, 2016

xfce4-terminal now uses GTK 3 in Debian Stretch, which gives it GTK 3 Adwaita tabs.

I think they are too high for my 768-pixel-high laptop screen. Fortunately, GTK 3 themes are CSS-based and easy to tweak. Adding:

notebook tab {
    padding: 0px;
    font-weight: normal;
}
to the ~/.config/gtk-3.0/gtk.css file removes the tab padding from all GtkNotebook widgets, which makes the tabs more compact.

GtkInspector is an easy way to try out more CSS rules.

This is fixed in xfce4-terminal 0.8.4.


Mantis to Gitlab converter

October 2, 2016

I’m still migrating my old Mantis/Tiki/gitweb forge to Gitlab. Below is the script I used to migrate my Mantis server. It uses the Mantis WSDL API through Zeep, and python-gitlab, which gets the best out of the Gitlab web API and Requests.

It’s far from perfect. For example, it does not restore the original authorship of issues and notes.

#! /usr/bin/env python

import zeep
import gitlab
import re

# Read a cheap configuration file from the current directory
with open('mantis2gitlab.conf') as f:
    mantis_url = f.readline().strip()
    mantis_user = f.readline().strip()
    mantis_passwd = f.readline().strip()
    gitlab_url = f.readline().strip()
    gitlab_token = f.readline().strip()
    gitlab_group = f.readline().strip()

mantis = zeep.Client(mantis_url + '/api/soap/mantisconnect.php?wsdl').service
gl = gitlab.Gitlab(gitlab_url, gitlab_token)
gl_group = gl.groups.search(gitlab_group)[0]
gl_projects = {}
for p in gl.group_projects.list(group_id=gl_group.id, all=True):
    gl_projects[p.name] = p

mantis_projects = mantis.mc_projects_get_user_accessible(mantis_user, mantis_passwd)
for mantis_project in mantis_projects._value_1:
    # This is my naming rule, adapt to yours
    gitlab_project_name = mantis_project.name.lower()
    gl_project_id = gl_projects[gitlab_project_name].id
    mantis_issues = mantis.mc_project_get_issues(mantis_user, mantis_passwd, mantis_project.id, 0, -1)
    print "Project", gitlab_project_name
    for mantis_issue in mantis_issues._value_1:
        print mantis_issue.summary
        description = mantis_issue.description
        # avoid spurious reference
        if re.search(r"(#[0-9]+\s)", description):
            description = '```\n' + description + '\n```\n'
        if mantis_issue.reporter.name.lower() != mantis_user.lower():
            description += '\n(Mantis bug reported by ' + mantis_issue.reporter.real_name + ')'
        gl_issue = gl.project_issues.create({
            'title': mantis_issue.summary,
            'description': description,
            'created_at': mantis_issue.date_submitted.isoformat()
        }, project_id = gl_project_id)

        if mantis_issue.notes:
            for note in mantis_issue.notes._value_1:
                body = note.text
                if note.reporter.name and note.reporter.name.lower() != mantis_user.lower():
                    body += '\n(Mantis note by ' + note.reporter.real_name + ')'
                # use raw API to set the date
                gl._raw_post('/projects/%d/issues/%d/notes' % (gl_project_id, gl_issue.id), data = {
                    'body': body,
                    'created_at': note.date_submitted.isoformat()
                })

        if mantis_issue.status.name in ['resolved', 'closed']:
            gl._raw_put('/projects/%d/issues/%d' % (gl_project_id, gl_issue.id), data = {
                'state_event': 'close',
                'updated_at': mantis_issue.last_updated.isoformat()
            })

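The description handling in the issue loop can be moved into a pure function, which makes it easy to test without a Mantis or Gitlab server. Below is a Python 3 sketch with a hypothetical helper name. It swaps the script’s #[0-9]+\s pattern for #[0-9]+\b, so that a reference at the very end of the text or just before punctuation (such as "#12.") is also caught. It also treats a missing description as empty. Both are deviations from the original rule.

```python
import re
from datetime import datetime

# Gitlab would render "#<number>" in the imported text as a link to another issue.
# \b (instead of the script's \s) also catches "#12" at the end of the text.
SPURIOUS_REF = re.compile(r"#[0-9]+\b")
FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable

def issue_payload(summary, description, reporter_name, reporter_real_name,
                  submitted, importer):
    """Build the dict given to the Gitlab issue creation call (hypothetical helper)."""
    description = description or ""
    if SPURIOUS_REF.search(description):
        # Inside a code block, Gitlab does not turn "#12" into a reference
        description = FENCE + "\n" + description + "\n" + FENCE + "\n"
    if reporter_name.lower() != importer.lower():
        description += "\n(Mantis bug reported by " + reporter_real_name + ")"
    return {
        "title": summary,
        "description": description,
        "created_at": submitted.isoformat(),
    }

if __name__ == "__main__":
    print(issue_payload("Crash on load", "Same as #12", "bob", "Bob Smith",
                        datetime(2016, 10, 2, 9, 30), "admin"))
```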
While writing this script, I had to revert the conversion several times. Here is how I did it on my Gitlab 8.12.3 instance:

docker exec -it gitlab_web_1 su - gitlab-psql
export COLUMNS=200
/opt/gitlab/embedded/bin/psql -h /var/opt/gitlab/postgresql -d gitlabhq_production
select id,name from projects ;

To clear the issues of the project with id 2:

delete from issues where project_id=2 ;
delete from notes where project_id=2 and (noteable_type='Issue' or noteable_type='Commit') ;

Tikiwiki to markdown converter

October 1, 2016

Here is the script I used to move my Tiki wiki to a Gitlab wiki. It only requires Requests and Pandoc.

#! /usr/bin/env python

import requests
from requests.auth import HTTPBasicAuth
import re
import subprocess

with open("tikiwiki2md.conf") as f:
    wiki_url = f.readline().strip()
    http_user = f.readline().strip()
    http_passwd = f.readline().strip()
    wiki_user = f.readline().strip()
    wiki_passwd = f.readline().strip()

session = requests.Session()
session.auth = (http_user, http_passwd)
response = session.get(wiki_url + '/tiki-login.php')
response = session.post(wiki_url + '/tiki-login.php', data = {'user': wiki_user, 'pass': wiki_passwd})
response = session.post(wiki_url + '/tiki-listpages.php', data = {'maxRecords': 2**30})

page_list = re.findall(r'tiki-index.php\?page=([^\"]+)', response.content)

for page in page_list:
    response = session.get(wiki_url + '/tiki-index.php?page=' + page)
    print_url = re.search(r'a title="Print" href="([^\"]+)', response.content).group(1)
    response = session.get(wiki_url + '/' + print_url)
    print "converting", page
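    process = subprocess.Popen(['pandoc', '-f', 'html', '-t', 'markdown_strict-raw_html', '-o', page + '.md'], stdin=subprocess.PIPE)
    # Feed the printable HTML to pandoc on stdin and wait for it to finish
    process.communicate(response.content)
    assert process.returncode == 0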
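The page listing may link to the same page more than once, and page names are URL-encoded, which makes for odd .md file names. Below is a Python 3 sketch of a hypothetical page_names helper. It assumes that Tiki’s links look like href="tiki-index.php?page=Some+Page", possibly followed by other query parameters after an "&". It keeps one entry per page and maps the raw name, used in the request URL, to a decoded name that could be used for the output file.

```python
import re
from urllib.parse import unquote_plus

# Assumption about Tiki's markup: the page name ends at the closing quote or at "&"
PAGE_LINK = re.compile(r'tiki-index\.php\?page=([^"&]+)')

def page_names(listing_html):
    """Map each URL-encoded page name to its decoded form, dropping duplicates."""
    pages = {}
    for raw in PAGE_LINK.findall(listing_html):
        # raw form for tiki-index.php?page=..., decoded form for the .md file name
        pages.setdefault(raw, unquote_plus(raw))
    return pages
```

With Requests on Python 3, this would be called on response.text rather than response.content.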


For images, download the img/wiki_up directory from the Tiki wiki server, add it to the git repository of the Gitlab wiki, and run sed -i 's^img/wiki_up/^^g' *.md.

Fast Python VTK import

August 24, 2016

The Python VTK module is slow to import because it loads all the native VTK libraries, most of which you usually won’t need. With VTK already in memory (disk cache):

$ time python -c "import vtk"

real 0m2.951s
user 0m2.964s
sys 0m1.860s

Here is a way to make it fast:

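import sys
import types

class FastVTKLoader(object):
    """Meta path hook returning empty modules for the VTK kits we don't need."""
    def __init__(self, modules):
        self.modules = modules

    def imp(self):
        # Run before the default importers so we can intercept vtk.vtk* modules
        sys.meta_path.insert(0, self)
        import vtk
        return vtk

    def load_module(self, name):
        # PEP 302: the loader must register the module in sys.modules
        return sys.modules.setdefault(name, types.ModuleType(name))

    def find_module(self, name, path=None):
        # "vtk.vtkCommonCore" -> "CommonCore"; the native *Python modules are left alone
        if name.startswith("vtk.vtk") and not name.endswith("Python"):
            vm = name[7:]
            if vm not in self.modules:
                return self

vtk = FastVTKLoader(["CommonCore", "CommonDataModel", "IOXML", "FiltersExtraction"]).imp()

With this code saved as myvtkloader.py:

$ time python myvtkloader.py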

real 0m0.077s
user 0m0.076s
sys 0m0.000s

Inspired by this Stackoverflow answer.
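The find_module/load_module hooks above follow the old PEP 302 protocol. Python 3.4 introduced find_spec, and as of Python 3.12 the import system no longer calls find_module on meta path finders. Below is a sketch of the same trick with find_spec and exec_module. The class and function names are hypothetical. I have not checked it against a particular VTK release; newer VTK versions organize their modules differently (the vtkmodules package), so the prefix may need adapting.

```python
import importlib.abc
import importlib.machinery
import sys

class StubLoader(importlib.abc.Loader):
    """Default module creation, and nothing is executed: the module stays empty."""
    def exec_module(self, module):
        pass

class FastVTKFinder(importlib.abc.MetaPathFinder):
    """Return empty modules for <prefix><Kit> unless Kit is listed in `keep`."""
    def __init__(self, keep, prefix="vtk.vtk"):
        self.keep = set(keep)
        self.prefix = prefix
        self.loader = StubLoader()

    def find_spec(self, fullname, path=None, target=None):
        if fullname.startswith(self.prefix) and not fullname.endswith("Python"):
            if fullname[len(self.prefix):] not in self.keep:
                return importlib.machinery.ModuleSpec(fullname, self.loader)
        return None  # not ours: let the regular finders handle it

def fast_import_vtk(keep):
    sys.meta_path.insert(0, FastVTKFinder(keep))
    import vtk
    return vtk
```

Usage would mirror the original: vtk = fast_import_vtk(["CommonCore", "CommonDataModel", "IOXML", "FiltersExtraction"]).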
