Dangers of unit testing undefined behavior

Recently I participated in an intracompany discussion about a concurrency defect. A piece of code looked something like this:

Thread1:

Thread2:

The problem? ‘keepRunning’ is a plain boolean without any synchronization. According to the JMM, a whole slew of things could go wrong with the above code, one of which is for Thread1 to run forever because it never sees Thread2’s update to ‘keepRunning’.
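As a rough illustration of the usual fix, here is a minimal sketch that assumes the flag is a static field shared by two threads. The class name StopFlagSketch and the method stopsWhenAsked are hypothetical and not taken from the code under discussion; only the ‘keepRunning’ name comes from it. Declaring the flag volatile is one way to get the visibility guarantee that the plain boolean lacks.

```java
class StopFlagSketch {
    // Assumption: the flag is shared between exactly two threads.
    // volatile guarantees that a write by one thread becomes visible to
    // later reads by another; a plain boolean gives no such guarantee.
    private static volatile boolean keepRunning = true;

    /** Starts a worker, asks it to stop, and reports whether it terminated. */
    static boolean stopsWhenAsked() {
        keepRunning = true;
        // Plays the role of Thread1: spin until told to stop.
        Thread worker = new Thread(() -> {
            while (keepRunning) {
                // stand-in for real work
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);     // let the worker spin for a moment
            keepRunning = false;  // plays the role of Thread2: request shutdown
            worker.join(2_000);   // wait (bounded) for the worker to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsWhenAsked());
    }
}
```

Note that removing volatile and seeing this still terminate on one machine would prove nothing, which is exactly the trap described here.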

What’s interesting about this is that there was an integration test that inadvertently tested this scenario and had always passed, so the problem was never caught. Once the code started running on a production box with different hardware characteristics (a lot more cores/memory), it blew up.

This is one of those examples where unit testing doesn’t produce good results. It’s very dangerous to get an intuition about incorrect concurrent code by running simple unit tests. These one-off unit tests run for a short period of time on a box that potentially has few cores: the JIT doesn’t kick in, inlining doesn’t happen, the system isn’t under heavy load, and the hardware configuration could be favorable to not surfacing the error. As shown in this post (C and Java examples), just because incorrect concurrent code works the way the author expects doesn’t mean it will continue to work. I’m not sure if that was the author’s intention, but that’s what I got out of it.

This is why I am pessimistic about types and unit tests when it comes to catching interesting errors found in production. Unit tests/types are good for catching obvious things like “this method doesn’t accept arguments of this category” or “what happens when this method gets passed an empty string instead of a string that I expect?”. I have yet to see a language/test framework that can help with concurrency problems.

I know of two partial solutions to the concurrency problem.

1) Try to avoid errors by construction, i.e. have good design that makes doing the wrong thing harder. Immutable data structures by default is a big first step in that direction.

2) The Feynman method. Think really hard and write code that doesn’t contain concurrency bugs. If that’s not possible, try to convince a friend or co-worker to think very hard with you.

The first method is really just a special case of the Feynman method.

Everyone uses open source, but no one talks about it

About a month ago, I donated some money to OpenBSD because they were in trouble financially. After reading about the response to their call for help, I started wondering why it had to get so bad for a popular open source project before people rallied around them.

I’m a Java developer. Over the course of my career so far, I’ve worked both for Fortune 500 companies and companies that had less than 20 people. I’ve worked for companies that are only 100 years younger than America, and some that were founded only a few years after the tech bubble. The one thing that unites these companies is their use of open source software.

For small companies, it’s usually because they’re too small and cash-poor to afford commercial software like an Oracle licence, so they use MySQL or PostgreSQL. Older companies are usually trying to modernize, and the technical leadership decides that they no longer want to use WebSphere/WebLogic, and would rather move to something with less cognitive overhead like Apache Tomcat.
I’ve seen dozens of cases where the only reason a project succeeds in a given time frame is because of easily-available mature open source solutions.

For example, every Java contract I’ve ever worked on has used Apache Commons. Tomcat, Jetty, JBoss, and increasingly Netty are prevalent for application servers and web service construction. Outside of Java, RabbitMQ, ZeroMQ, and ActiveMQ are standard for queueing. PostgreSQL, MySQL, HBase, and Cassandra are common for persistence. In Javascript, I don’t think there are any proprietary frameworks left worth mentioning.

What I find much rarer is for somebody in a technical leadership position to acknowledge the huge role that open source plays in the success of these projects, and not just to acknowledge the platform, but to do the right thing and donate some money to these projects. If even 1% of the companies that use these projects were to donate 1% of the costs these projects save them, I’m confident there would be no funding issues for any of these projects.

These projects don’t exist for the money, and the developers who work on them aren’t doing it to become rich. But showing a bit more goodwill on the part of decision makers in businesses that rely on these projects strikes me as the right thing to do.

In defense of PHP

This past weekend, I took a quick overnight trip to NYC to attend Code Montage’s Coder Day of Service in Manhattan. It’s a great event where people donate their technical skills and time to help others who don’t have technical skills but have needs in the nonprofit realm. I worked with a woman who wanted to build a website to bring awareness to a cause she was passionate about.

She wanted the website to include posts with links to resources around the issue and testimonials from others involved with the same issue to build a community. It was great working with her and helping her address her needs.

What I found the most interesting from a technical standpoint was some of the previous advice she received about how to go about building such a website. Someone else at the hackathon had previously recommended that she look into Victory-Kit, a static website generation project which hasn’t seen any pull requests or other activity for over four months. It’s generally a bad idea to build any kind of website for end-users based on a dormant project: abandoned code means numerous unpatched bugs and undeveloped features. These sites are also really hard to support once the hackathon is over and the end user has to go to someone else for code maintenance.

Another person at the table recommended we do it using Jekyll with Chef. Chef’s a really great option for programmers because it allows maximum flexibility and control, but it’s only great for people who really know what they’re doing and have the time to devote to development. This was not a great idea for a non-technical user working on non-profit projects with tons of other things on her plate.

Interestingly enough, although it was the obvious solution, nobody thought to use WordPress. It just works, it’s a mature technology, and it has a vibrant community around it, tons of plugins, many free themes and many more premium themes that end-users can customize fairly easily without any programming knowledge.

There is a distaste for any PHP-based technology in the web development community and I suspect I know why. A lot of blogs and comment boards disparage PHP, recommending Ruby, Python, NodeJS or anything else, as long as it’s not PHP. I wonder if most of these haters formed their opinion of PHP a decade ago and haven’t looked at it since.

I bet most critics have no idea what modern PHP or PHP frameworks look like. I am glad the PHP community largely ignores the hate and continues to produce excellent code that just works.

Exercise 1.45 of SICP

Exercise 1.45: We saw in section 1.3.3 that attempting to compute square roots by naively finding a fixed point of $y \mapsto \frac{x}{y}$ does not converge, and that this can be fixed by average damping. The same method works for finding cube roots as fixed points of the average-damped $y \mapsto \frac{x}{y^2}$. Unfortunately, the process does not work for fourth roots -- a single average damp is not enough to make a fixed-point search for $y \mapsto \frac{x}{y^3}$ converge. On the other hand, if we average damp twice (i.e., use the average damp of the average damp of $y \mapsto \frac{x}{y^3}$) the fixed-point search does converge. Do some experiments to determine how many average damps are required to compute nth roots as a fixed-point search based upon repeated average damping of $y \mapsto \frac{x}{y^{n-1}}$. Use this to implement a simple procedure for computing nth roots using fixed-point, average-damp, and the repeated procedure of exercise 1.43. Assume that any arithmetic operations you need are available as primitives.

Experimenting with the number of average damps needed for the fixed-point procedure to converge gives the following:

Degree of root | Power of 2 | Number of average damps
2              | 2^1        | 1
4              | 2^2        | 2
8              | 2^3        | 3
16             | 2^4        | 4
32             | 2^5        | 5

On this basis it makes sense to write a procedure discrete-log that counts the powers of 2 in a given number by repeated halving. It’s a bit like a very primitive implementation of a log2 function, hence the name. Discrete-log is $\Theta(\log_2 n)$ because it halves its argument on every iteration.
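To make the idea concrete, here is a minimal sketch in Java rather than Scheme. It assumes that discrete-log amounts to the floor of log2 n for n >= 1 and that the convergence test is a relative one, in the spirit of exercise 1.7. The class name NthRootSketch, the tolerance, and the iteration cap are illustrative choices, not part of the original solution.

```java
import java.util.function.DoubleUnaryOperator;

class NthRootSketch {
    static final double TOLERANCE = 1e-10;   // assumed relative tolerance
    static final int MAX_STEPS = 1_000_000;  // assumed safety cap

    // Counts how many times n can be halved before dropping below 2,
    // i.e. floor(log2(n)) for n >= 1 (assumed meaning of discrete-log).
    static int discreteLog(int n) {
        return n < 2 ? 0 : 1 + discreteLog(n / 2);
    }

    // Average damping: y -> (y + f(y)) / 2.
    static DoubleUnaryOperator averageDamp(DoubleUnaryOperator f) {
        return y -> (y + f.applyAsDouble(y)) / 2.0;
    }

    // Applies averageDamp to f the given number of times (recursively).
    static DoubleUnaryOperator repeatedDamp(DoubleUnaryOperator f, int times) {
        return times == 0 ? f : repeatedDamp(averageDamp(f), times - 1);
    }

    // Iterates f from guess until successive values agree to a relative tolerance.
    static double fixedPoint(DoubleUnaryOperator f, double guess) {
        for (int i = 0; i < MAX_STEPS; i++) {
            double next = f.applyAsDouble(guess);
            if (Math.abs(next - guess) <= TOLERANCE * Math.abs(next)) {
                return next;
            }
            guess = next;
        }
        throw new ArithmeticException("fixed-point search did not converge");
    }

    // nth root of x (n >= 1) as a fixed point of y -> x / y^(n-1),
    // average-damped discreteLog(n) times.
    static double root(double x, int n) {
        DoubleUnaryOperator f = y -> x / Math.pow(y, n - 1);
        return fixedPoint(repeatedDamp(f, discreteLog(n)), 1.0);
    }

    public static void main(String[] args) {
        System.out.println(root(2.0, 2));
        System.out.println(root(81.0, 4));
    }
}
```

The sketch only handles n >= 1 and positive x; degree 0 is left out because its behaviour depends on choices not visible here.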

> (root 2 0)
1
> (root 2 1)
2
> (root 2 2)
1.4142135623746899
> (root (expt 2 50) 50)
1.9999956962054166
> (root 0.00000001 2)
1.0000000000082464e-4

This program summarizes most of Chapter 1. It uses lexical scope, functions passed as parameters, functions that construct and return other functions, and the square root algorithm with an improved close-enough? procedure from exercise 1.7.
The repeated-compose and discrete-log procedures could also be written iteratively instead of recursively.
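As a hedged illustration of the iterative style, here is a small sketch in Java rather than Scheme. The class name IterativeHelpers is hypothetical, and discreteLog again assumes the floor-of-log2 reading for n >= 1. The loops replace the recursive calls, so no stack depth builds up.

```java
import java.util.function.UnaryOperator;

class IterativeHelpers {
    // Iterative discrete-log: count halvings until n drops below 2.
    static int discreteLog(int n) {
        int count = 0;
        while (n >= 2) {
            n /= 2;
            count++;
        }
        return count;
    }

    // Iterative repeated composition: returns x -> f(f(...f(x)...)),
    // with f applied the given number of times.
    static <T> UnaryOperator<T> repeatedCompose(UnaryOperator<T> f, int times) {
        return x -> {
            T result = x;
            for (int i = 0; i < times; i++) {
                result = f.apply(result);
            }
            return result;
        };
    }

    public static void main(String[] args) {
        UnaryOperator<Integer> square = v -> v * v;
        // Squaring twice raises the input to the fourth power.
        System.out.println(repeatedCompose(square, 2).apply(3));
        System.out.println(discreteLog(32));
    }
}
```

Because repeatedCompose is generic, the same helper could be applied to a function-transforming operation such as average damping.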

tailor svn to hg

A few weeks ago I discovered tailor. After installing the latest tarball and following this tutorial, I was able to check out a svn repository from work into my hg repo. I was not interested in pushing back changes, since in my case the project had ended; however, it should be easy enough to do.

I called my config file: projectname.tailor

At the terminal:

This will create two directories in /home/me/projectname/: hgside and svnside.