Apr 3, 2016

How to Make C++ Code Run 90x Faster

Written by Katie Gomez

I’m always concerned with performance.  I’m always looking for the best of what’s out there.  I recently saw a YouTube video where Bjarne Stroustrup compared two data structures.  The one that seemed like it should have been faster was almost 100x slower.

I had to try that one out for myself.  I hear this type of claim all the time.  It doesn’t usually pan out.  But this time it did.  My results were very similar to Stroustrup’s.  Sometimes C++’s std::list is much slower than std::vector, even when you’re expecting the opposite.
He’s comparing two data structures, but he’s also comparing two styles of programming.  In this test both options were implemented in C++.  But the vector, the more efficient option, looks more like the normal C++ style.  The linked list uses more pointers and looks more like the style used by LISP / C# / Java and many other popular programming languages.
I created this chart to summarize the results.  As you can see the line for the linked list is much higher than the line for the vector.  That means that vectors are much faster.

[Chart: std::list vs. std::vector run time, as Bjarne Stroustrup described; both curves look like parabolas.]

I added the yellow line to show what would happen if you had two computers, one of them 90x faster than the other, and you ran the vector test on the slower computer.  The yellow and blue lines are almost identical.  (It almost looks like one green line instead of a yellow line and a blue line.)  So you can see that the list solution is almost exactly 90x slower than the vector solution.
We’re assuming that the time will be proportional to the square of the number of items.  The chart above looks a lot like a parabola, which suggests that assumption is correct.  To check it, in the next chart I took the square root of each time; if the run time really grows like n*n, those square roots should grow in a straight line.
[Chart: square root of run time vs. n, removing the n*n growth; both lines are nearly straight.]

As expected, this chart shows almost perfectly straight lines.  So our assumption looks good.  Both of these solutions are O(n*n).  That means if you have 2x as many inputs, the program will take about 4x as long to run.
The promise years ago was that we wouldn’t have to care so much about performance because computers are getting so much faster and cheaper.  Of course you still have to worry about bad algorithms.  You don’t want to use an O(n*n) algorithm if you have an O(n) algorithm.  A faster computer can’t fix that.  But this compares two algorithms that look the same in Big-O notation.  One data structure is almost 100x faster than the other.  This is caused by the way modern computers cache memory.  And there is every indication that this difference will only grow, if computers keep evolving the way they have been!
90x is a big deal!
If one algorithm / programming language / data structure was 10% more efficient than another, I’d probably have trouble proving it; each time you repeat a test the results will naturally vary some.  If one choice was twice as efficient as the other, it might be cheaper to buy twice as many computers rather than rewriting and retesting the code.  But I’m not buying 90x as many computers.
Of course each data structure, algorithm, and programming language has its pluses and minuses.  I often choose LISP, C#, Java or similar.  Often I think that’s the best tool for the job at hand.  But none of those types of languages will completely replace C++ any time soon.  If you need to do a lot of computing for a lot of people (and you don’t have unlimited funds) sometimes you need C++.
I’ve included my code and my results below in case you want to duplicate this experiment.
[phil@daisy-mae-128k ~]$ cat LinkedListTest.C
#include <list>
#include <vector>
#include <iostream>
#include <stdint.h>
#include <sys/time.h>
#include <stdlib.h>
#include <assert.h>

// Repeat the test described here:
// Bjarne Stroustrup: Why you should avoid Linked Lists

// To compile with Linux/GCC:  g++ -std=c++0x -O4 LinkedListTest.C

int64_t getMicroTime()
{ // I copied this function from shared/MiscSupport.h.
  timeval inPieces;
  int error = gettimeofday(&inPieces, NULL);
  assert(!error);
  int64_t result = inPieces.tv_sec;
  result *= 1000000;
  result += inPieces.tv_usec;
  return result;
}

int testCounts[] = { 1000, 10000, 20000, 30000, 40000, 50000, 60000, 70000,
    80000, 90000, 100000, 0};

// Store the final result in a volatile variable so the optimizer won’t skip
// that step.  I’ve seen the optimizer throw away large sections of benchmark
// code when it realized we were throwing away the result.
volatile int result;

template < typename T >
int64_t testOnce(int count)
{
  const int64_t start = getMicroTime();
  T sequence;
  for (int i = 0; i < count; i++)
  { // Pick a random number.
    const int newValue = rand();
    for (auto it = sequence.begin(); ; it++)
      if ((it == sequence.end()) || (newValue <= *it))
      { // The new item should come immediately before this item.
        // Insert it here so the list stays in order.
        sequence.insert(it, newValue);
        break;
      }
  }
  while (sequence.size() > 1)
  { // Pick a random index.
    const int index = rand() % sequence.size();
    // Find the corresponding iterator.
    auto it = sequence.begin();
    advance(it, index);
    // Remove the item at that index.
    sequence.erase(it);
  }
  // Store the final result.
  result = *sequence.begin();
  // Return the run time, in microseconds.
  const int64_t stop = getMicroTime();
  return stop - start;
}

void testBothOnce(int count)
{
  const int64_t listTime = testOnce< std::list< int > >(count);
  const int64_t vectorTime = testOnce< std::vector< int > >(count);
  // Print the results as soon as we have them so you know how much longer the
  // entire program will take.  So you don’t give up waiting!
  std::cout<<count<<","<<listTime<<","<<vectorTime<<std::endl;
}

void runAllTestsOnce()
{ // Perfect format for a CSV file to import into a spread sheet.
  std::cout<<"n,list µs,vector µs"<<std::endl;
  for (int *count = testCounts; *count > 0; count++)
    testBothOnce(*count);
}

int main(int, char **)
{
  runAllTestsOnce();
  return 0;
}
[phil@daisy-mae-128k ~]$ ./a.out 
n,list µs,vector µs
[phil@daisy-mae-128k ~]$