Optimizing C++/Code Optimization/faster Operations: Structure Fields Order
struct {
char msg[400];
double d;
int i;
};
you can speed up the computation by replacing the structure with the following one:
struct {
double d;
int i;
char msg[400];
};
On some processors, addressing a member is more efficient if its distance from the beginning of the structure
is less than 128 bytes.
In the first example, addressing the d and i fields through a pointer to the beginning of the structure requires an
offset of at least 400 bytes.
In the second example, which contains the same fields in a different order, the offsets of d and i are only a
few bytes, which allows the use of more compact instructions.
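The offsets discussed above can be inspected with the standard offsetof macro. The following is a minimal sketch; the names MessageFirst, NumbersFirst and print_offsets are hypothetical (the original structures are anonymous), and the exact offsets depend on the compiler and platform.

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstdio>

// Hypothetical name: the first layout, with the large array in front.
struct MessageFirst {
    char msg[400];
    double d;
    int i;
};

// Hypothetical name: the second layout, with the numeric members in front.
struct NumbersFirst {
    double d;
    int i;
    char msg[400];
};

// Prints the byte offset of d and i in both layouts.
void print_offsets() {
    std::printf("MessageFirst: d at %zu, i at %zu\n",
                offsetof(MessageFirst, d), offsetof(MessageFirst, i));
    std::printf("NumbersFirst: d at %zu, i at %zu\n",
                offsetof(NumbersFirst, d), offsetof(NumbersFirst, i));
    // msg comes first, so d and i lie at least 400 bytes into the object.
    assert(offsetof(MessageFirst, d) >= 400);
    // With the numeric members first, both stay well below 128 bytes.
    assert(offsetof(NumbersFirst, i) < 128);
}
```

Whether the smaller offsets actually result in shorter instructions depends on the target processor and the compiler.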
Now, let's assume you wrote the following structure:
struct {
bool b;
double d;
short s;
int i;
};
Because of field alignment, it typically occupies 1 (bool) + 7 (padding) + 8 (double) + 2 (short) + 2 (padding) + 4
(int) = 24 bytes.
The following structure is obtained from the previous one by sorting the fields from the longest to the shortest:
struct {
double d;
int i;
short s;
bool b;
};
It typically occupies 8 (double) + 4 (int) + 2 (short) + 1 (bool) + 1 (padding) = 16 bytes. Sorting minimizes the
padding (or holes) caused by alignment requirements, and so produces a more compact structure.
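The effect of the field order on the size can be checked with sizeof. This is a minimal sketch; the names Unsorted, Sorted and print_sizes are hypothetical, and the figures of 24 and 16 bytes are only typical (for example, on common 64-bit platforms), not guaranteed by the language.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical name: fields in the original, unsorted order.
struct Unsorted {
    bool b;
    double d;
    short s;
    int i;
};

// Hypothetical name: the same fields sorted from the longest to the shortest.
struct Sorted {
    double d;
    int i;
    short s;
    bool b;
};

// Prints the size of both layouts as reported by the compiler in use.
void print_sizes() {
    std::printf("Unsorted: %zu bytes\n", sizeof(Unsorted));
    std::printf("Sorted:   %zu bytes\n", sizeof(Sorted));
    // On common ABIs the sorted layout needs no more space than the unsorted one.
    assert(sizeof(Sorted) <= sizeof(Unsorted));
}
```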
n = int(floor(x + 0.5f));
Using such a technique, if x is exactly equidistant between two integers, n will be the upper integer (for example, 0.5
generates 1, 1.5 generates 2, -0.5 generates 0, and -1.5 generates -1).
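The behavior on ties can be verified with a small wrapper around the expression. This is a sketch; the function names round_half_up and check_round_half_up are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Rounds x to the nearest integer; exact ties go to the upper integer.
int round_half_up(float x) {
    return int(std::floor(x + 0.5f));
}

// Checks the four tie cases listed in the text.
void check_round_half_up() {
    assert(round_half_up(0.5f) == 1);
    assert(round_half_up(1.5f) == 2);
    assert(round_half_up(-0.5f) == 0);
    assert(round_half_up(-1.5f) == -1);
}
```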
Unfortunately, on some processors (in particular, the Pentium family), such an expression is compiled into very slow
machine code. Some processors have specific instructions to round numbers.
In particular, the Pentium family has the fistp instruction, which, used as in the following code, gives much faster,
albeit not exactly equivalent, code:
The above code rounds x to the nearest integer, but if x is exactly equidistant between two integers, n will be the
nearest even integer (for example, 0.5 generates 0, 1.5 generates 2, -0.5 generates 0, and -1.5 generates -2).
If this result is tolerable or even desired, and you are allowed to use embedded assembly, then use this code.
Obviously, it is not portable to other processor families.
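For comparison, the standard library function std::lrint rounds according to the current floating-point rounding mode. In the default mode (round to nearest, ties to even) it gives the same results on ties as described above. The following is a portable sketch, not the embedded-assembly technique discussed in the text; the function names are hypothetical, and whether the call is translated into a single fast instruction depends on the compiler and the target.

```cpp
#include <cassert>
#include <cmath>

// Rounds x using the current rounding mode.
// Assumption: the default mode (round to nearest, ties to even) is in effect.
int round_to_nearest_even(double x) {
    return static_cast<int>(std::lrint(x));
}

// Checks the four tie cases listed in the text.
void check_round_to_nearest_even() {
    assert(round_to_nearest_even(0.5) == 0);
    assert(round_to_nearest_even(1.5) == 2);
    assert(round_to_nearest_even(-0.5) == 0);
    assert(round_to_nearest_even(-1.5) == -2);
}
```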
f *= pow(2, n);
Explicit inlining
If you don't use the compiler options for whole-program optimization and for allowing the compiler to inline any
function, try moving the functions called in bottlenecks to header files, and declare them inline.
As explained in the guideline "Inlined functions" in section 3.1, every inlined function is faster, but many inlined
functions slow down the whole program.
Try declaring a couple of functions inline at a time, as long as you get significant speed improvements (at least 10%)
in a single command.
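A minimal sketch of the guideline follows. All names (the header guard GEOMETRY_H, squared_length, sum_squared_lengths, check_inline_example) are hypothetical. The first part stands for a header file and the second part for a source file that includes it; they are shown in one block for brevity.

```cpp
// --- Contents of a hypothetical header file ---
#ifndef GEOMETRY_H
#define GEOMETRY_H

// Defined in the header and declared inline, so every source file that
// includes the header sees the body and the compiler may expand calls in place.
inline double squared_length(double x, double y) {
    return x * x + y * y;
}

#endif  // GEOMETRY_H

// --- Contents of a hypothetical source file that includes the header ---
#include <cassert>

// A bottleneck loop that calls the small function once per element.
double sum_squared_lengths(const double* xs, const double* ys, int n) {
    double total = 0.0;
    for (int k = 0; k < n; ++k) {
        total += squared_length(xs[k], ys[k]);
    }
    return total;
}

// Simple sanity check of the two functions.
void check_inline_example() {
    const double xs[] = {3.0, 1.0};
    const double ys[] = {4.0, 0.0};
    assert(squared_length(3.0, 4.0) == 25.0);
    assert(sum_squared_lengths(xs, ys, 2) == 26.0);
}
```

The inline keyword is only a request, so the compiler may still decide not to expand a call; measure the program before and after each change.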
License
Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/