The moment you’ve been waiting for has arrived: Kijiji Coding Contest 2013 has a winner!
First, we would like to thank again everyone who participated. The problem was tricky to solve and required a significant effort. With a total of 119 submissions, we consider this first contest a huge success and we’re looking forward to organizing the next one.
And the winner is …
After intense reviews and heated debates, our official Top 3 is the following:
1 – Shida Li – wins a 15–inch MacBook Pro with Retina display
Shida provided one of the fastest solution, with a ton of low level optimizations and clever tricks (for instance, parsing each line backwards to avoid processing columns that weren’t needed).
It ran in 297ms on average on our test environment, the second fastest we received.
While doing that, Shida managed to keep his code clean and well encapsulated with a clear design.
This is an impressive piece of code that really blew us away!
Dive into the code:
- Clean and clear map-reduce strategy with discrete algorithm separation.
- Nicely implemented file-chunker that does not lose any data by carefully chunking only at newlines, which are the CSV record boundaries.
- Very interesting re-implementation of Integers and Strings to avoid object creation whenever possible.
2 – Keith Kim – wins a 32GB iPad with Retina display
With 480ms on average, Keith’s submission is the 5th fastest we received.
With an overall clean design, Keith squeezed out performance gains by implementing a few rather interesting classes.
It was by far the fastest among the 21 submissions which passed 100% of our test cases.
Dive into the code:
- You’ll find an interesting custom implementation of a hash map, designed for speed in this specific case. Although, we would have liked to see proper handling of hash collisions, even if it was verified that there were none in the sample data.
- Uh oh! Is there an unused class? Tsk-tsk… maybe made up for by the mechanical sympathy in the rest of the implementation.
3 – Donny Nadolny – wins a 16GB iPad mini
This one is very different. Donny gamed our public test case by sending back a lazy map, which gets loaded only after the first call to get.
With our full test suite, it fails most test cases as we didn’t call any get before iterating through the entry set.
Still, we found this trick clever, and also liked the concision of the code – we found it very readable, with clear intent and structure.
After adding a well placed get in our test suite, the average time is 3946ms and all the test cases are green.
This solution illustrates well the benefit of using a higher level language, in terms of expressiveness, to the expense of performance.
Dive into the code:
Donny’s entire entry is a quick read; clocking-in at only 58 lines of code in two files: ParkingTicketsStats.scala and ValueComparator.scala.
In our opinion, performance could have been improved by focusing some attention on the regex used for parsing the street names, and possibly reconsidering the line-by-line file-chunking strategy.
Congratulations to the three of you! We hope you’re going to like your prize, but if you ever decide to sell it on Kijiji, we promise we won’t take offense.
A note on performance
After the SortedMap trickiness which we explained in our last blog post, the topic that generated the most comments was performance.
Here is the range of performance we observed, with the slowest (56098 ms) 300 times slower than the fastest (186 ms).
So You Think You Can Code Durations
Here is a closer look at the top 80%. Only 8 submissions made it below 500 ms on our test environment.
The fastest submission was interesting with a clever trick of considering every word shorter than 4 characters to be a street suffix. This was sent by Yuan Tao who published his code here.
Unfortunately, ignoring the data integrity issues caused by this trick, this solution presented some flaws in the returned map.
The secret test suite
With the 3 submissions we published, you can see all the additional test cases that we executed and here are some details about those.
First, you’ll notice that we ran the submissions multiple times to compute an average duration after JVM warm-up. Some submissions didn’t support these multiple calls. We still looked at them but considered this as a problem since a useful “production ready” parser should handle these multiple calls, potentially with different files as input.
Another important criteria for us was the correctness of the results, meaning the returned map should be properly sorted and not missing data, either streets (loosely tested through the presence of duplicate values) or money (the total dollar amount reported should match the total found in the file within 5%).
Additional awesomeness which didn’t remain unnoticed
We want to acknowledge some really cool stuff that we’ve seen while going through all the submissions.
In no particular order:
Two people resorted to the Damerau–Levenshtein distance algorithm in order to fix the typos in the test file.
Two people solved the problem just for fun, sending solutions that were not integrated with our test case, one in PHP and one in Clojure.
Many people, while parsing byte arrays, skipped the first fixed size columns to gain some speed.
We’ve seen all kinds of CSV java library used in the submissions, one based on super-csv made it pretty far in the selection process.
Some people created a cache to retrieve street names faster from the address.
Here is a one-liner that returned a decent SortedMap from a HashMap, using Guava:
return ImmutableSortedMap.copyOf(streets, Ordering.natural()
Small issue with it, get(“UNKNOWN”) throws an IllegalArgumentException.
Finally someone wrote us to announce the imminent launch of www.parkintoronto.com which is going to use the same public data to try and help Torontonians avoid getting tickets, how awesome is that?