July 18, 2016

C is not magically fast, part 2

I wrote a blog post before about how C is not magically fast, but the sentiment that C has properties lacking in other languages that make it so is still widespread. It was with no surprise at all then that a colleague mentioned something resembling that recently at lunch break, and I attempted to tell him why it wasn’t (at least always) true.

He asked for an example where C++ would be faster, and I resorted to the old sort classic: C++ sort is faster than C’s qsort because of templates and inlining. He then asked me if I’d ever measured it myself, and since I hadn’t, I did just that after lunch. I included D as well because, well, it’s my favourite language. Taking the minimum time after ten runs each to sort a random array of 10M simple structs on my laptop yielded the results below:

  • D: 1.147s
  • C++: 1.723s
  • C: 1.789s

I expected  C++ to be faster than C, I didn’t expect the difference to be so small. I expected D to be the same speed as C++, but for some reason it’s faster. I haven’t investigated the reason why for lack of interest, but maybe because of how strings are handled?

I used the same compiler backend for all 3 so that wouldn’t be an influence: LLVM. I also seeded all of them with the same number and used the same random number generator: the awful srand from C’s standard library. It’s terrible, but it’s the only easy way to do it in standard C and the same function is available from the other two languages. I also only timed the sort, not counting init code.

The code for all 3 implementations:

// sort.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

typedef struct {
    int i;
    char* s;
} Foo;

double get_time() {
    struct timeval t;
    struct timezone tzp;
    gettimeofday(&t, &tzp);
    return t.tv_sec + t.tv_usec*1e-6;
}

int comp(const void* lhs_, const void* rhs_) {
    const Foo *lhs = (const Foo*)lhs_;
    const Foo *rhs = (const Foo*)rhs_;
    if(lhs->i < rhs->i) return -1;
    if(lhs->i > rhs->i) return 1;
    return strcmp(lhs->s, rhs->s);
}

int main(int argc, char* argv[]) {
    if(argc < 2) {
        fprintf(stderr, "Must pass in number of elements\n");
        return 1;
    }

    srand(1337);
    const int size = atoi(argv[1]);
    Foo* foos = malloc(size * sizeof(Foo));
    for(int i = 0; i < size; ++i) {
        foos[i].i = rand() % size;
        foos[i].s = malloc(100);
        sprintf(foos[i].s, "foo%dfoo", foos[i].i);
    }

    const double start = get_time();
    qsort(foos, size, sizeof(Foo), comp);
    const double end = get_time();
    printf("Sort time: %lf\n", end - start);

    free(foos);
    return 0;
}


// sort.cpp
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <chrono>
#include <cstring>

using namespace std;
using namespace chrono;

struct Foo {
    int i;
    string s;

    bool operator<(const Foo& other) const noexcept {
        if(i < other.i) return true;
        if(i > other.i) return false;
        return s < other.s;
    }

};


template<typename CLOCK, typename START>
static double getElapsedSeconds(CLOCK clock, const START start) {
    //cast to ms first to get fractional amount of seconds
    return duration_cast<milliseconds>(clock.now() - start).count() / 1000.0;
}

#include <type_traits>
int main(int argc, char* argv[]) {
    if(argc < 2) {
        cerr << "Must pass in number of elements" << endl;
        return 1;
    }

    srand(1337);
    const int size = stoi(argv[1]);
    vector<Foo> foos(size);
    for(auto& foo: foos) {
        foo.i = rand() % size;
        foo.s = "foo"s + to_string(foo.i) + "foo"s;
    }

    high_resolution_clock clock;
    const auto start = clock.now();
    sort(foos.begin(), foos.end());
    cout << "Sort time: " << getElapsedSeconds(clock, start) << endl;
}


// sort.d
import std.stdio;
import std.exception;
import std.datetime;
import std.algorithm;
import std.conv;
import core.stdc.stdlib;


struct Foo {
    int i;
    string s;

    int opCmp(ref Foo other) const @safe pure nothrow {
        if(i < other.i) return -1;
        if(i > other.i) return 1;
        return s < other.s
            ? -1
            : (s > other.s ? 1 : 0);
    }
}

void main(string[] args) {
    enforce(args.length > 1, "Must pass in number of elements");
    srand(1337);
    immutable size = args[1].to!int;
    auto foos = new Foo[size];
    foreach(ref foo; foos) {
        foo.i = rand % size;
        foo.s = "foo" ~ foo.i.to!string ~ "foo";
    }

    auto sw = StopWatch();
    sw.start;
    sort(foos);
    sw.stop;
    writeln("Elapsed: ", cast(Duration)sw.peek);
}