
Trivial Destructors Grief

Recently I had a really strange issue with VS 2008: a method marked __forceinline was not being inlined in Release mode. Messing with #pragma inline_depth and #pragma inline_recursion had no effect whatsoever. This was a big performance issue, since a call chain five levels deep was not being inlined inside a tight inner loop - VTune showed that the call overhead was larger than the amount of work in the methods. The code itself was trivial; the deep call chain is the unfortunate result of a specific recursive template.

All of the un-inlined calls look like:

T operator()(const T &in) {
    T result(in);
    // do something trivial to result
    return result;
}

The problem turned out to be that T declares an empty destructor:

struct T {
    ~T() { /* empty */ }
};

The compiler does not figure out that the destructor is useless, and so it avoids doing the NRV optimization in each of the five calls. It tries to ensure that any side effects of calling the destructor (none in this case) are preserved; compilers are conservative this way. This in itself is not enough to prevent __forceinline from working, but for some reason ~T() ends up with a different linkage than the methods that call it, which prevents inlining.
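One way to see the distinction the compiler is working with: in the language's own terms, a destructor with a user-written body is non-trivial even when that body is empty. The sketch below checks this with std::is_trivially_destructible, which assumes a C++11 compiler (so it would not build on VS 2008 as-is). The type names are hypothetical stand-ins for T, and a plain float array stands in for the __m128 members.

```cpp
#include <type_traits>

// Hypothetical stand-in for T as described: an empty, user-provided destructor.
struct WithEmptyDtor {
    float v[4];
    ~WithEmptyDtor() { /* empty, but still user-provided */ }
};

// Same data, destructor left to the compiler.
struct WithoutDtor {
    float v[4];
};

// Destructor defaulted on its first declaration: not user-provided.
struct WithDefaultedDtor {
    float v[4];
    ~WithDefaultedDtor() = default;
};

// An empty body does not make a destructor trivial.
static_assert(!std::is_trivially_destructible<WithEmptyDtor>::value,
              "user-provided destructor is non-trivial even if empty");
static_assert(std::is_trivially_destructible<WithoutDtor>::value,
              "implicit destructor is trivial here");
static_assert(std::is_trivially_destructible<WithDefaultedDtor>::value,
              "defaulted destructor is trivial here");
```

This only shows how the types are classified; it says nothing about what a particular optimizer does with that classification.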

I do not feel I fully understand what happened there (the code has many __m128 members, and some posts on Microsoft Connect seem to indicate issues with inlining such code due to the requirement for a 16-byte-aligned stack). However, removing the empty destructor did fix the issue: the methods are all inlined (even without __forceinline), and NRV and copy elision both happen in the right spots. The code ran 2x faster as a result.
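For reference, here is a minimal sketch of the general shape described above: a recursive template that produces a five-level chain of operator() calls over a type with no user-written destructor. It is not the original code. Vec4, Chain, run_chain and the SKETCH_FORCEINLINE macro are hypothetical names, and a float array stands in for the __m128 members. Whether a given compiler actually inlines the chain still has to be confirmed in the disassembly or a profiler.

```cpp
#include <cassert>

// Portable spelling of "force inline"; the macro name is an assumption.
#if defined(_MSC_VER)
#  define SKETCH_FORCEINLINE __forceinline
#elif defined(__GNUC__)
#  define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#else
#  define SKETCH_FORCEINLINE inline
#endif

// Hypothetical stand-in for T: note there is no destructor declared.
struct Vec4 {
    float v[4];
};

// Level N does its own trivial step, then hands the result to level N-1.
template <int N>
struct Chain {
    SKETCH_FORCEINLINE Vec4 operator()(const Vec4 &in) const {
        Vec4 result(in);
        for (int i = 0; i < 4; ++i) {
            result.v[i] += 1.0f; // the "something trivial"
        }
        return Chain<N - 1>()(result);
    }
};

// Base case: ends the recursion and returns a copy unchanged.
template <>
struct Chain<0> {
    SKETCH_FORCEINLINE Vec4 operator()(const Vec4 &in) const {
        Vec4 result(in);
        return result;
    }
};

// Runs the five-level chain and returns the first lane of the output.
inline float run_chain(float start) {
    Vec4 in = {{start, start, start, start}};
    Vec4 out = Chain<5>()(in);
    return out.v[0];
}
```

Calling run_chain(1.0f) applies the increment once per level from 5 down to 1, so the value returned is the input plus five.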