Alson Kemp

[Synthetic] Performance of the Go frontend for GCC

First, a note: this is a tiny synthetic bench.  It’s not intended to answer the question: is GCCGo a good compiler?  It is intended to answer the question: as someone investigating Go, should I also investigate GCCGo?

While reading some announcements about the impending release of Go 1.1, I noticed that GCC was implementing a Go frontend.  Interesting.  So the benefits of the Go language coupled with the GCC toolchain?  Sounds good.  The benefits of the Go language combined with GCC’s decades of x86 optimization?  Sounds great.

So I grabbed GCCGo and built it.  Instructions here: http://golang.org/doc/install/gccgo

Important bits:

  • Definitely follow the instructions to build GCC in a separate directory from the source.
  • My configuration was:

/tmp/gccgo/configure --disable-multilib --enable-languages=c,c++,go

I used the Mandelbrot program from The Benchmarks Game at mandelbrot.go.  Compiled using go and gccgo, respectively:

go build -o mandel.golang mandel.go
gccgo -v -lpthread -B /tmp/gccgo-build/gcc/ -B /tmp/gccgo-build/lto-plugin/ \
  -B /tmp/gccgo-build/x86_64-unknown-linux-gnu/libgo/ \
  -I /tmp/gccgo-build/x86_64-unknown-linux-gnu/libgo/ \
  -m64 -fgo-relative-import-path=_/home/me/apps/go/bin \
  -o ./mandel.gccgo ./mandel.go -O3

Since I hadn’t installed GCCGo, and after flailing at compiler options to get “go build” to find includes, libraries, etc., I gave up on the simple “go build -compiler=gccgo” syntax. So the above gccgo command is the sausage-making version.
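For readers who haven’t seen the benchmark, here is a rough idea of the kind of work it does. This is a minimal sketch of the standard escape-time Mandelbrot iteration, not the Benchmarks Game source; the names escapes and countInside, the grid bounds and the iteration limit are all my own choices for illustration.

```go
package main

import "fmt"

// escapes reports whether c = (cr, ci) leaves the circle of radius 2
// within maxIter iterations of z = z*z + c, starting from z = 0.
func escapes(cr, ci float64, maxIter int) bool {
	var zr, zi float64
	for i := 0; i < maxIter; i++ {
		// Tuple assignment evaluates the right-hand side with the old zr, zi.
		zr, zi = zr*zr-zi*zi+cr, 2*zr*zi+ci
		if zr*zr+zi*zi > 4 {
			return true
		}
	}
	return false
}

// countInside samples an n x n grid over [-1.5, 0.5) x [-1, 1) and
// counts the points that do not escape within maxIter iterations.
func countInside(n, maxIter int) int {
	inside := 0
	for y := 0; y < n; y++ {
		ci := 2*float64(y)/float64(n) - 1
		for x := 0; x < n; x++ {
			cr := 2*float64(x)/float64(n) - 1.5
			if !escapes(cr, ci, maxIter) {
				inside++
			}
		}
	}
	return inside
}

func main() {
	fmt.Println(countInside(200, 50))
}
```

The inner loop is nothing but floating point multiplies and adds, which is why a benchmark like this mostly measures how well a compiler optimizes tight numeric code.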

So the two files:

4,532,110 mandel.gccgo  - Compiled in 0.3s
1,877,120 mandel.golang - Compiled in 0.5s

As a HackerNewser noted, stripping the executables could be good. Stripped:

1,605,472 mandel.gccgo
1,308,840 mandel.golang

Note: the stripped GCCGo executables don’t actually work, so take the “stripped” value with a grain of salt for the moment. Bug here.

GCCGo produced an *unstripped* executable about 2.4x as large as Go produced. Stripped, the executables were similar in size (about 1.2x), but the GCCGo executable didn’t work. So far the Go compiler is winning.
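The size ratios can be checked directly from the byte counts listed above. A small sketch; the helper name ratio is just for illustration:

```go
package main

import "fmt"

// ratio returns a/b as a float64.
func ratio(a, b int64) float64 {
	return float64(a) / float64(b)
}

func main() {
	// Sizes in bytes, copied from the listings above.
	const (
		gccgoUnstripped = 4532110
		gcUnstripped    = 1877120
		gccgoStripped   = 1605472
		gcStripped      = 1308840
	)
	fmt.Printf("unstripped: %.2fx\n", ratio(gccgoUnstripped, gcUnstripped)) // 2.41x
	fmt.Printf("stripped:   %.2fx\n", ratio(gccgoStripped, gcStripped))     // 1.23x
}
```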

Performance [on a tiny, synthetic, CPU bound, floating point math dominated program]:

time ./mandel.golang 16000 > /dev/null 

real  0m10.610s
user  0m41.091s
sys  0m0.068s

time ./mandel.gccgo 16000 > /dev/null 

real  0m9.719s
user  0m37.758s
sys  0m0.064s

So GCCGo produces an executable that is about 9% faster than Go’s (9.7s vs. 10.6s wall-clock), but the unstripped executable is about 2.4x the size.  I think I’ll stick with the Go compiler for now, especially since the tooling built into/around Go is very solid.
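For anyone who wants to redo the arithmetic on the timings above, here is a small sketch. The type and function names are my own, and reading user/real as “average busy cores” is a rule of thumb rather than anything time itself reports.

```go
package main

import "fmt"

// timing holds the "real" (wall-clock) and "user" (CPU) seconds printed by time.
type timing struct {
	wall, user float64
}

// speedup returns how many times faster b finished than a, by wall-clock time.
func speedup(a, b timing) float64 {
	return a.wall / b.wall
}

// parallelism estimates the average number of busy cores as user/wall.
func parallelism(t timing) float64 {
	return t.user / t.wall
}

func main() {
	// Values copied from the time output above.
	gc := timing{wall: 10.610, user: 41.091}
	gccgo := timing{wall: 9.719, user: 37.758}

	fmt.Printf("gccgo speedup over gc: %.2fx\n", speedup(gc, gccgo))
	fmt.Printf("gc busy cores:         %.2f\n", parallelism(gc))
	fmt.Printf("gccgo busy cores:      %.2f\n", parallelism(gccgo))
}
```

Both runs come out at a little under 4 busy cores, so the two binaries parallelize about equally well and the gap is in the generated code, roughly 1.09x in GCCGo’s favor.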

Additional notes from HN discussion:

  • GCC was 4.8.0.  Go was 1.1rc1.  Both AMD64.

Written by alson

May 5th, 2013 at 2:35 pm


8 Responses to '[Synthetic] Performance of the Go frontend for GCC'


  1. I believe GCC tends to inline code much more aggressively than other compilers; this may explain the larger executable.


    5 May 13 at 4:08 pm

  2. Did you try “strip mandel.gccgo” ?


    5 May 13 at 6:12 pm

  3. Doh. No, I did not. Just did. Much better. I’ll update.


    5 May 13 at 6:55 pm

  4. BTW, if you set the appropriate environment variables, you can use go build -compiler=gccgo to compile your executables. Also, it seems that you link the Go libraries statically. Since gccgo also pulls in glibc, no wonder the executable is so huge. Maybe try dynamic linking. For benchmarking, try https://github.com/davecheney/autobench


    5 May 13 at 7:18 pm

  5. I’ll have another go [har har] at that. I did flail at “go build -compiler=gccgo” for a bit, but I had quite a time getting go/gccgo to find libraries and such, so I gave up and ran gccgo by hand.

    I agree on the point about static linking, size and such, but folks who poke at gccgo are going to have similar problems to the ones I experienced. The default behavior is the one I used and, while you could argue that I shouldn’t have used the default behavior, most people will use the defaults…


    5 May 13 at 7:24 pm

  6. May I contribute a few more numbers?

    I was playing with Google’s Code Jam problems (world finals 2012, shifting paths). At the time, the performance of gccgo (4.7) and gc (1.0) was very asymmetrical: gccgo was 2 to 3 times faster in brute-force calculation, but the memoised version (using a lot of map lookups) was much slower. It seems that the current versions of gc (1.1) and gccgo (4.8.1) are more in line.

    I wrote the Go version first, and then out of interest implemented equivalent code in C++.

    Run #1 — brute force solution, single problem from large input. This basically has some low-level bit twiddling.
    gc: 1m35s
    gccgo: 49s
    g++: 41s

    Run #2 — this version uses a lot of hash tables to memoise calculations. Run on a full E-large dataset. C++ version uses std::unordered_map for memoisation.
    gc: 3.2s
    gccgo: 2.25s
    c++: 1.18s

    Run #3 — run memoised solvers in parallel on 8 cpus. Go version only.
    gc: 0.87s
    gccgo: 0.95s


    4 Jun 13 at 11:45 am

  7. Alex,

    Definitely contribute more numbers! The only reason I wrote my original post was because I hadn’t seen any numbers, so more are a good thing. Can you share your code? Perhaps on Github?


    4 Jun 13 at 12:09 pm

  8. I have uploaded code here: https://github.com/usovalx/st

    Go version is a bit of a mess — I was still trying to figure out how to solve it.

    Building (stc is c++ version, stg is gccgo, st is gc):
    make stc stg st

    time ./st [-c] [-t] < E-large-practice.in

    -c — use memoisation
    -t — threads (go version only)


    4 Jun 13 at 1:05 pm
