2D slices and arrays in Golang

Subtle, unexpected behaviour of Golang's slice "append"

Python leaky unittests Gotcha - Too Many MySQL Connections

OpenWorm - a long way to go

Open Source Software Presentation

I am a software engineer (lapsed physicist) who likes working with open source technology. I spend most of my time writing code in Python but I am currently also learning Go.

In Golang it is possible to allocate a 2D array, as long as the size of the array is known at compile time. For instance, the following code:

package main

import (
    "fmt"
)

func main() {
    var MultiDimensionalArray [3][5]int

    MultiDimensionalArray[2][1] = 5
    MultiDimensionalArray[1][1] = 2

    fmt.Println(MultiDimensionalArray)

}

will print the following:

[[0 0 0 0 0] [0 2 0 0 0] [0 5 0 0 0]]

However, if we want to decide the size of the 2D array at runtime (dynamic allocation), the code fails to compile:

package main

import (
    "fmt"
    "math/rand"
)

func main() {
    numRows := rand.Intn(30) // the outer dimension is the number of rows
    var MultiDimensionalArray [numRows][5]int // Will Error: non-constant array bound numRows
    fmt.Println(MultiDimensionalArray)
}

This is because the size of the array needs to be known at compile time. Technically, it needs to be a constant expression; there is a more detailed discussion of this on the go-nuts mailing list.
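As a minimal sketch of the "constant at compile time" rule, the following compiles because both bounds are declared with const. The names numRows, numCols and newGrid are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// numRows and numCols are hypothetical names. Because they are constants,
// the compiler accepts them as array bounds.
const (
	numRows = 3
	numCols = 5
)

// newGrid returns a zeroed array whose dimensions are fixed at compile time.
func newGrid() [numRows][numCols]int {
	var grid [numRows][numCols]int
	return grid
}

func main() {
	grid := newGrid()
	grid[2][1] = 5
	fmt.Println(len(grid), len(grid[0])) // 3 5
	fmt.Println(grid)                    // [[0 0 0 0 0] [0 0 0 0 0] [0 5 0 0 0]]
}
```

Swapping either const for a variable assigned at runtime brings back the compile error described above.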

In Golang, when working with sequences of typed data it is much more common to utilise slices rather than arrays. When used correctly they have little to no overhead compared to arrays.

If we want a "2D slice" we actually want to construct a "slice of slices", as shown in this example:

package main

import (
    "fmt"
    "math/rand"
)

func main() {
    rand.Seed(10) //Initialize the random number generator with a seed
    numRows := rand.Intn(10) //4
    numColumns := rand.Intn(10) //8

    sliceOfSlices := make([][]int, numRows) //initialize the slice of slices
    for i := range sliceOfSlices {
        sliceOfSlices[i] = make([]int, numColumns) // initialize every slice within the slice of slices
    }

    fmt.Println(sliceOfSlices) // will print a 4x8 slice of slices
}

The builtin make function does not require the length of a slice to be known at compile time.
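One way to tidy this up is to wrap the allocation in a helper. The sketch below is not taken from the example above: newMatrix is a hypothetical name, and it uses the single-backing-array variant of the idiom, where every row is a window onto one large slice.

```go
package main

import "fmt"

// newMatrix is a hypothetical helper that returns a rows x cols slice of slices.
// All rows share one backing array, so only two allocations are made.
func newMatrix(rows, cols int) [][]int {
	backing := make([]int, rows*cols)
	matrix := make([][]int, rows)
	for i := range matrix {
		// The three-index slice caps each row at cols elements, so an
		// append to one row cannot overwrite the start of the next row.
		matrix[i] = backing[i*cols : (i+1)*cols : (i+1)*cols]
	}
	return matrix
}

func main() {
	m := newMatrix(2, 3) // sizes could equally come from runtime values
	m[1][2] = 7
	fmt.Println(m) // [[0 0 0] [0 0 7]]
}
```

The per-row make in the original loop is simpler and lets rows grow independently; the shared backing array mainly helps when the matrix shape is fixed after creation.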

If, like me, you find this slightly messy, you may want to look at Gonum Matrix. Although I haven't used it yet, it appears to be the nearest thing Golang has to Python's numpy.

In Go, a slice describes a segment of an array. It consists of

  1. a pointer to the array
  2. the length of the segment
  3. its capacity (the maximum length of the segment).
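The three parts listed above can be inspected with the builtin len and cap functions. This is a small sketch; window is a hypothetical helper that simply slices a fixed-size array.

```go
package main

import "fmt"

// window is a hypothetical helper returning the segment arr[lo:hi].
func window(arr *[5]int, lo, hi int) []int {
	return arr[lo:hi]
}

func main() {
	backing := [5]int{10, 20, 30, 40, 50}
	segment := window(&backing, 1, 3) // elements at indices 1 and 2

	fmt.Println(segment)      // [20 30]
	fmt.Println(len(segment)) // 2
	fmt.Println(cap(segment)) // 4: from index 1 to the end of the array

	// Writing through the slice changes the array it points to.
	segment[0] = 99
	fmt.Println(backing[1]) // 99
}
```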


When calling append on a slice, if the resulting length would exceed the capacity of the underlying array, a new array is allocated to hold the data (the new array is usually larger than strictly needed, to leave room for further appends). As a consequence, the slice returned by append no longer shares an array with any other slice that referred to the original one. This gives rise to some slightly strange behaviour, best shown in this example:

package main

import (
    "fmt"
)

func main() {
    a := []int{1,2,3}
    slice1 := a[:2]
    slice2 := a[:2]
    slice2 = append(slice2,4) // first append
    slice2[0] = 8 // underlying array still the same, both slice 1 and slice 2 have element 0 == 8
    fmt.Println("slice 1:", slice1) // slice 1: [8 2]
    fmt.Println("slice 2:", slice2) // slice 2: [8 2 4]
    slice2 = append(slice2,4) // array capacity now exceeded, new underlying array created
    slice2[0] = 3 // slice1 still has element 0 == 8, but slice 2 has element 0 == 3
    fmt.Println("slice 1:", slice1) // slice 1: [8 2]
    fmt.Println("slice 2:", slice2) // slice 2: [3 2 4 4]
}

At the beginning, slice1 and slice2 above refer to the same underlying array. When the first append is performed on slice2, the underlying array still has sufficient capacity, so an in-place change to slice2 shows up in slice1. However, when the second append happens, the underlying array is not big enough, so slice2 now refers to a new array with a larger capacity. From then on, any changes to slice2 are "out of sync" with slice1.

This kind of thing is something you're unlikely to encounter too often, but is good to be aware of. In general, it seems like having slices refer to the same underlying array is probably best avoided, unless it's for a very specific reason.
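If two slices really do need to be independent, one option is to copy the data up front. The sketch below assumes a hypothetical helper, cloneInts, and also shows the three-index slice expression, which limits capacity so that the first append is forced to allocate.

```go
package main

import "fmt"

// cloneInts is a hypothetical helper that copies s into a fresh backing array.
func cloneInts(s []int) []int {
	out := make([]int, len(s))
	copy(out, s)
	return out
}

func main() {
	a := []int{1, 2, 3}
	slice1 := a[:2]
	slice2 := cloneInts(a[:2]) // independent copy, shares nothing with a

	slice2 = append(slice2, 4)
	slice2[0] = 8

	fmt.Println("a:", a)            // a: [1 2 3]
	fmt.Println("slice 1:", slice1) // slice 1: [1 2]
	fmt.Println("slice 2:", slice2) // slice 2: [8 2 4]

	// Alternative: a[:2:2] has capacity 2, so append must allocate
	// a new array instead of overwriting a[2].
	slice3 := a[:2:2]
	slice3 = append(slice3, 9)
	fmt.Println("a:", a)            // a: [1 2 3]
	fmt.Println("slice 3:", slice3) // slice 3: [1 2 9]
}
```

Copying costs an extra allocation, so it is only worth doing when the slices are expected to diverge.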

Some time ago I noticed that some of my integration tests were failing with a MySQL Too many connections error. The tests were being run with Python's unittest package. Each integration test was opening a MySQL connection, but I couldn't understand the cause of this error - each test was run with a new TestInstance class and I (wrongly) assumed Python's garbage collector would come along and clear the old TestInstance, closing the MySQL connection in the process.

Heading over to Stack Overflow eventually gave me the answer (thanks Kara!)

It turns out that a reference to each test case is kept until the entire test suite has run, preventing the garbage collector from freeing the object.

So the following TestCase will have opened 3 simultaneous MySQL connections by the time it terminates:

import unittest
import MySQLdb

class TestMySQLConn(unittest.TestCase):
    def setUp(self):
        self.db_connection = MySQLdb.connect(some_credentials)

    def test1(self):
        # a MySQL connection opens here
        pass

    def test2(self):
        # another MySQL connection opens here
        pass

    def test3(self):
        # yet another MySQL connection opens here
        pass

The best solution is to use a tearDown method in the TestCase:

def tearDown(self):
    self.db_connection.close()

This problem was eventually discussed and resolved in Python 3.4.

OpenWorm is a fantastic project which it has been my pleasure to work on for two years. In that time I have seen the community go from strength to strength and our prospects of making meaningful scientific contributions continue to grow.

It's recently been driven home to me how cautious one has to be when reading articles in the media about science. It all started with a video I released to our mailing list. This was intended as a very early proof of concept, showing that our method of simulating a worm muscular system as blocks of PCISPH contractile matter had the potential to reproduce undulatory thrust. It was an exciting discovery, but one filled with subtle caveats.

A few days later this BBC news article about OpenWorm surfaced. It's fantastic to see how much interest there is in our project, but the article was (while generally OK) inaccurate in a few ways. John Hurliman, for instance, is incorrectly described as project leader (OpenWorm has no "leader"), and the project has been going on for much longer than suggested. The scientific claims were broadly correct, though. Over time more articles were released, with the claims made becoming a bit outlandish, culminating in this. There were also some really good articles - the one on phys.org being my personal favourite.

We have released a blog post addressing our concerns about reporting.


I recently gave a workshop on Open Source Software development at the Malta Information Technology Agency.

It was a great pleasure and I would like to thank all who came for participating. The slides are located here.