Check out my
new game!

jlongster

Profiling compiled Gambit Scheme code

February 17, 2010

Last night I was trying to track down a major bottleneck in my iPhone game. I tried injecting timers throughout my game, but that was slow and I couldn't get a very detailed description of what was going on. I decided to see how much I could get from a real profiling tool like Shark or Instruments.

Obscurity from Scheme

The difficulty is that I'm using Gambit Scheme which compiles to C, and I want to profile my code as compiled. Using a standard profiler or debugger which pokes around in a binary won't be as helpful because of our abstraction away from C land.

However, Gambit Scheme does a pretty good job of keeping things in tact. The compiler even supports a -track-scheme option which generates "#line" directives which refer back to Scheme code. Nonetheless, because we are using Scheme->C compiler, we are going to lose some precision.

Shark

I was still curious what would show up. I fired up Shark and connected it to my iPhone according to this article, and took some samples of my game. The results were really cool.

http://jlongster.com/media/images/shark.png

The results generally make sense. Obviously, the core rendering function DrawTriangles is the most expensive. The symbols beginning with ___H_ are my Scheme functions. My scheme function mesh-object-render is definitely expensive, and it looks like I should check out vector->float-array and vec4d-component-mul to see if there's any way I can optimize those.

However, there is a lot of Scheme code which is compiled without symbols. You'll notice that my second and third most expensive functions can't be tracked.

Instruments

Now lets see what Instruments shows for the same sampling. Instruments is really cool because it shows real-time sampling of your running app, while Shark post-processes its samples (and takes a long time doing so). It's much faster to use Instruments.

The only thing I miss from Shark is the ability to "start" and "stop" my sampling. I don't care about the loading time of game right now, and it seriously skews the results if I include it. With Shark, I could start sampling after the game was loaded, but Instruments samples the complete run.

I tried applying an "Inspection Range" which is the highlighted part of the graph, but the results still look strange. I'm not sure why mach_msg_trap was still taking 78% of the time (it's usual at loading time becauase of resource loading off the disk causing blocked threads in the run loop).

http://jlongster.com/media/images/instruments.png

The results are similar to Shark's.

You'll notice mark_array and ___garbage_collect which is Gambit's garbage collector. I wonder if that is what Shark was showing as the untracked second and third most expensive calls. I'm not sure why Instruments would be able to resolve the symbols and not Shark though.

statprof

There is also a profiler specifically built for Gambit Scheme called statprof. It tracks what's happening in the Scheme code and generates colored HTML files to show to hotspots in the program. It's really neat, but sometimes has problems with compiled code.

Here's example output from my program when I interpret most of the code:

http://jlongster.com/media/images/statprof.png

I like the way it reports data. However, if I compile my program, I lose some precision. I compiled it and even ran it straight on my iPhone and for some reason it couldn't track the code at all. It uses some deep internal functions of Gambit to lookup source locations, so I don't really know what's going on, and I'm not sure if there's a way to get it better. It would be nice to see precise profile output straight from my iPhone.

For those of you who know Gambit, I did compile my Scheme code with the -debug-location option, and it didn't help at all. The compiled version on my computer seemed to track things roughly, but nothing showed up when I ran it on my iPhone.

Conclusion

I'm not really offering any helpful information, but just reporting what I did and my results. It's interesting that we can still gleam basic information from hooking up our compiled Scheme app to a normal profiler.

I'm sure there are ways of compiling Gambit Scheme code which let these profilers track it more closely. If you have any experience with this, send me an email and let me know!

Gambit supports "declarations" which configure how the code is compiled, and adding a few of these can drastically increase performance. One of them is block, and I think it might be compiling functions together and throwing away symbols. All of my code begins with:

(declare (block))

It's possible that I can play around with these and other Gambit Scheme compiler flags to make these profile results better reflect my Scheme code.

Comments are not implemented yet. In the meanwhile, send me an email.