
PHP - How to track down a memory leak

written by rory, on Jan 25, 2010 4:04:40 PM.

I had to track down a bug in a PHP application. It was a long-running maintenance script called from the command line. It had a memory leak, so its memory usage grew slowly the longer it ran.

I created this little top-level function:

$last_mem_usage = 0;
function mem_print($line_num) {
        global $last_mem_usage;
        // memory_get_usage() returns the number of bytes currently allocated to PHP
        $total = memory_get_usage();
        print " at line $line_num  we have used ".number_format($total - $last_mem_usage). " since last total=$total last=$last_mem_usage\n";
        $last_mem_usage = $total;
}

Call it with the current line number, for example mem_print(__LINE__), and it will show you how much the memory usage has changed since you last made that call.

If you suspect a certain function (or loop, or ...) of swallowing up your memory, you can put a call before and after it and compare the two readings.


Show which directories have the most files (regardless of size)

written by rory, on Jan 21, 2010 11:51:00 AM.

I was rsync-ing a directory to a remote host using an old version of rsync, so it had to count all the files on my local machine before it could start copying (newer versions of rsync start copying before reading all the files). However, this was a large directory tree with thousands and thousands of files, and it was taking rsync a long time to count. There are tools like ncdu to show how much disk space each folder is using. In this case, though, I wanted to reduce the number of files, regardless of their size, so ncdu was useless to me.

This bash snippet will show how many files & directories are under each top-level folder of the current directory, with the most 'populous' directories at the bottom.

find . | cut -d/ -f2 | uniq -c | sort -n
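To see what each stage of that pipeline contributes, here is a small sketch that builds a throwaway directory tree and runs a variant of the same pipeline over it. The helper name count_entries and the demo tree are made up for illustration. It also adds two things that are my own assumptions rather than part of the original one-liner: -mindepth 1, so the starting directory itself is not counted, and a sort before uniq -c, so the result does not rely on find listing each directory's entries contiguously.

```shell
# Count entries under each top-level directory of "$1" (default: current dir).
# count_entries is a hypothetical helper name, not part of any tool.
count_entries() {
    ( cd "${1:-.}" && find . -mindepth 1 | cut -d/ -f2 | sort | uniq -c | sort -n )
}

# Build a throwaway tree to try it on (demo data only).
demo=$(mktemp -d)
mkdir -p "$demo/small" "$demo/big/sub"
touch "$demo/small/a" "$demo/big/b" "$demo/big/c" "$demo/big/sub/d"

result=$(count_entries "$demo")
printf '%s\n' "$result"
rm -rf "$demo"
```

Each count includes the top-level directory's own entry, so a folder holding one file shows up as 2.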

How to easily force lots of files to be downloaded in Apache

written by rory, on Jan 14, 2010 11:23:00 AM.

I had a website, serving up many files. Sometimes you want to 'encourage' a user to download a file instead of their browser displaying a file. You can do this by setting the "Content-Disposition" header. However I had thousands of files that I wanted to force the user to save instead of display. I also needed to be able to let people display them normally.

I came up with this solution, by adding the following to my apache configuration file.

    Alias /download /path/to/my/docroot
    <Location /download>
        SetEnvIf Request_URI "^.*/([^/]*)$" FILENAME=$1
        Header set "Content-disposition" "attachment; filename=%{FILENAME}e"
        UnsetEnv FILENAME
    </Location>

If I request an image at its normal URL, it will display in my web browser. However, if I request the same path under /download, Apache will automatically add the correct HTTP header to force the file to be downloaded. It will also include the correct filename (in this case "myfile.jpg").

It works by adding a new alias for your DocRoot ("/download") and then for all requests that start with that, it'll scan the requested URL and pull out the filename into an apache environment variable. Then a new apache header is added to the response based on this filename to tell the browser to download the file.
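If you want to check what that pattern pulls out of a request path before touching your Apache config, the capture group can be tried from the shell. This is only a sketch: it uses sed's extended regular expressions rather than Apache's regex engine (for a pattern this simple I'd expect them to agree), and the helper name extract_filename and the sample paths are made up for illustration.

```shell
# Mimic the capture group from the SetEnvIf pattern "^.*/([^/]*)$":
# it keeps everything after the last slash of the request path.
# extract_filename is a hypothetical helper for illustration only.
extract_filename() {
    printf '%s\n' "$1" | sed -E 's|^.*/([^/]*)$|\1|'
}

# Sample request paths (assumed, not taken from a real site).
extract_filename "/download/images/myfile.jpg"
extract_filename "/download/myfile.jpg"
```

Both calls should print just the filename, which is the value that ends up in the Content-disposition header.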

There are a few other ways to do this. One solution would be to write a simple PHP wrapper script that adds the header, then reads the file in and writes it out to the user. There are some advantages to this apache header solution, though. Since apache just thinks it's sending a normal file, it will add the correct ETag and Content-Length HTTP headers.

There already is a Navit format of the OSM planet!

written by rory, on Nov 30, 2009 1:09:40 PM.

I wrote previously about my difficulties in creating a navit file for the whole OpenStreetMap planet file.

However, I discovered today that the Navit project provides a dump of the OSM planet file on their website!

Navit formatted OSM planet file

osm2sqlite - A programme to convert from OSM files to SQLite

written by rory, on Nov 29, 2009 7:50:00 PM.

I've created osm2sqlite, a programme to convert from OSM files (OpenStreetMap's XML file format) into a SQLite database. It's not very good for displaying OpenStreetMap data in a GIS programme. It's good if you want to muck around with raw OpenStreetMap data in a familiar SQL environment.

I'm using git to store the code, you can get it here:

Sample usage:

  • Checkout the code
    git clone git:// .
  • For this example, we'll download the OSM data from Ireland.
  • Unzip the file. Currently it has to be unzipped. Support for converting bzipped files is planned.
    bunzip2 ireland.osm.bz2
  • And finally run the script to convert the OpenStreetMap file to a database called ireland.sqlite.
    ./ -o ireland-2009-11-17.osm -d ireland.sqlite

How to diff RTF files

written by rory, on Nov 27, 2009 10:24:29 AM.

I recently got 2 RTF files and needed to diff them. As a command line jockey, I know how to diff normal text files. However I didn't know how to diff RTF files.

I discovered unrtf, a programme for converting RTF files into plain text (it has some bugs, like all software).

To do a word diff, install dwdiff and use the following command:

dwdiff <(unrtf --text file1.rtf) <(unrtf --text file2.rtf)

What would *you* put on a harddrive going to Kenya?

written by rory, on Nov 27, 2009 9:38:00 AM.

When I was in Kenya in July 2008 with Camara teaching people how to use Ubuntu Linux, I brought a hard drive mirror of apt with me. Bandwidth in Africa is slow and expensive, so I was able to use this apt mirror to install new software easily while I was there. I gave a talk at OSSBarCamp about Using Free Culture in an Internet Free World.

I'm doing it again. A friend of mine in Kenya asked for a new harddrive. So what would you put on a harddrive going to Kenya?

I asked on the Ubuntu NGO mailing list and got some great responses, including Ubuntu Screencasts, printer drivers, and Wikipedia dumps. What else in the free culture world is there?

This post also appeared on the Ubuntu NGO Blog

Video of "Using Free Culture in an Internet Free World" from OSSBarCamp

written by rory, on Nov 16, 2009 11:12:32 AM.

In September 2009 I did a talk at OSSBarCamp in Dublin, Ireland entitled "Using Free Culture in an Internet Free World". It was about my experiences using Free Culture / FLOSS / Linux in Africa where the internet access is very bad. Free Culture (e.g.: Wikipedia) and Linux can help you get around this. Find out more by watching the video. page | Direct download link to the video (534MB)

The video is copyrighted but released under a Creative Commons Licence, CC-BY-SA

Using Android SDK on Ubuntu 9.10 Karmic Koala

written by rory, on Nov 4, 2009 11:30:46 AM.

I have an Android Dev G1 phone. It's still running firmware 1.0, which is positively ancient. I want to upgrade it. I plugged in my android phone and ran "adb devices" to check that it was detected. It didn't come up. It used to work. I described how to do it in a previous blog post. Some googling showed that things have changed recently. Despite what this page from Google says, you can't get it to work that way. Thanks to this blog post for pointing me in the right direction.

You need to create a new udev file:

$ sudo gedit /etc/udev/rules.d/51-android.rules

Then put the following line into it:

SUBSYSTEM=="usb", SYSFS{idVendor}=="0bb4", SYMLINK+="android_adb", MODE="0666"

Then restart udev (either of these commands should do it):

$ sudo /etc/init.d/udev restart
$ sudo service udev restart

Then plug your device in and you should see it with "adb devices".

How to import current Wikipedia dumps

written by rory, on Oct 21, 2009 10:31:00 PM.

Wikipedia provides database dumps, and the main way to process this 5GB compressed XML file is with a C programme called xml2sql, which converts that file into a few raw text files representing the text of wikipedia articles. However, the XML schema changed and the current xml2sql programme doesn't work. If you run it using a recent dump (e.g. from October 2009), you'll get this error:

$ bzcat enwiki-latest-pages-articles.xml.bz2 | ./xml2sql
unexpected element <redirect>
./xml2sql: parsing aborted at line 33 pos 16.

The problem is the "<redirect />" element in the XML file. xml2sql doesn't know what to do with it and so stops. Each article has a "<redirect>" tag, and it doesn't change for any of the articles. I've managed to run xml2sql by stripping out this tag. You can do it like this:

$ bzcat enwiki-latest-pages-articles.xml.bz2 | grep -v '    <redirect />' | ./xml2sql
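Before running that over the whole 5GB dump, it can be worth trying the filter on a few lines to see exactly what it drops. The sketch below uses a made-up sample rather than real Wikipedia data, and the helper name strip_redirect is hypothetical. It also uses a slightly looser pattern than the one above: the original matches exactly four leading spaces, whereas this variant accepts any indentation, which is an assumption on my part in case the dump's indentation differs.

```shell
# A tiny stand-in for the dump (made-up sample, not real Wikipedia data).
sample='  <page>
    <title>Example</title>
    <redirect />
    <text>#REDIRECT [[Other]]</text>
  </page>'

# Drop any line that holds only the self-closing tag, whatever its indentation.
# strip_redirect is a hypothetical helper name for illustration.
strip_redirect() {
    grep -v '^[[:space:]]*<redirect />[[:space:]]*$'
}

filtered=$(printf '%s\n' "$sample" | strip_redirect)
printf '%s\n' "$filtered"
```

Only the line containing the tag itself is removed; the rest of the page element, including the article text, passes through untouched.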