
Utility to patch binaries generated by the Intel C++ compiler to get the maximum performance on AMD CPUs.
The Intel C++ compiler adds a CPUID test to the binaries it generates. The test checks whether they are running on an Intel CPU, so the binaries don't run with full optimizations on non-Intel CPUs. This utility patches those CPUID tests, so the binaries can run on an AMD CPU as if they were on an Intel CPU.
Tested on Linux with Intel C++ compiler 10.1 (it may work with future releases of ICC). It may also work with the Intel Fortran compiler if it uses the same CPUID test, but this is not confirmed.
You must have the libelf library installed. On Ubuntu 8.04, just install the package libelfg0-dev.
A version around 0.8.6 should work well. Then compile with the command:
make
In the source code tarball there is a file called benchmark-partial-sums.c (taken from
The Computer Language Shootout http://shootout.alioth.debian.org). This code can be optimized
with SSE2 by the Intel compiler.
Compile this code with:
icc -O3 -xW -o benchmark-partial-sums benchmark-partial-sums.c
To run the benchmark use:
time ./benchmark-partial-sums 100000000
These were the average results on my AMD64 CPU:
To patch an executable, just run:
patch-AuthenticAMD <executable_name>
The directory /path/to/icc/lib contains the shared libraries used by the compiler. It seems that
after patching all of them, the binaries generated by ICC no longer contain the CPUID test, so they
run at full speed on AMD. Probably only one of the shared libraries is responsible for adding the
test. I can't confirm any of this, because I haven't tried it.
Be warned that modifying, disassembling or reverse engineering the Intel C++ compiler goes against the Intel EULA (End User License Agreement), so do this at your own risk.
If you want to try, run this command in /path/to/icc/lib:
for i in *; do patch-AuthenticAMD -ev "$i"; done
This tool seems to work well, but it has not been tested much. Please send me an email with your results. Questions about the code, suggestions, or anything else are also welcome. My email is:
jimenezrick@gmail.com
doc directory
libelf by Example.mht: http://people.freebsd.org/~jkoshy/download/libelf/article.html
a tutorial for `libelf' in FreeBSD. Almost everything it says is valid for Linux.
naughty-intel.html: the person who wrote this article explains everything one needs to know about
the subject.
Here is the dump of a binary compiled with ICC 10.1 (objdump -d icc_binary).
################## DISASSEMBLY OF AN ICC GENERATED BINARY #########################
0000000000402c5c <__intel_cpu_indicator_init>:
...
...
# Get CPU vendor string (EAX = 0)
402c84: 48 33 c0 xor %rax,%rax
402c87: 0f a2 cpuid
402c89: 89 45 f8 mov %eax,-0x8(%rbp)
402c8c: 89 5d fc mov %ebx,-0x4(%rbp)
402c8f: 89 4d ec mov %ecx,-0x14(%rbp)
402c92: 89 55 f4 mov %edx,-0xc(%rbp)
402c95: 48 c7 c0 01 00 00 00 mov $0x1,%rax
# Get CPU capabilities (EAX = 1)
402c9c: 0f a2 cpuid
402c9e: 89 45 f0 mov %eax,-0x10(%rbp)
402ca1: 89 5d e0 mov %ebx,-0x20(%rbp)
402ca4: 89 4d e8 mov %ecx,-0x18(%rbp)
402ca7: 89 55 e4 mov %edx,-0x1c(%rbp)
...
...
402cca: 8b 45 fc mov -0x4(%rbp),%eax
# Compare the first four bytes of your vendor string with "Genu"
402ccd: 3d 47 65 6e 75 cmp $0x756e6547,%eax
402cd2: bb 01 00 00 00 mov $0x1,%ebx
402cd7: 75 1b jne 402cf4 <__intel_cpu_indicator_init+0x98>
402cd9: 8b 45 f4 mov -0xc(%rbp),%eax
# Compare the next four bytes of your vendor string with "ineI"
402cdc: 3d 69 6e 65 49 cmp $0x49656e69,%eax
402ce1: 75 11 jne 402cf4 <__intel_cpu_indicator_init+0x98>
402ce3: 8b 45 ec mov -0x14(%rbp),%eax
# Compare the last four bytes of your vendor string with "ntel"
402ce6: 3d 6e 74 65 6c cmp $0x6c65746e,%eax
402ceb: 75 07 jne 402cf4 <__intel_cpu_indicator_init+0x98>
402ced: ba 01 00 00 00 mov $0x1,%edx
402cf2: eb 02 jmp 402cf6 <__intel_cpu_indicator_init+0x9a>
402cf4: 33 d2 xor %edx,%edx
# If you have "GenuineIntel" everything goes OK. Later there are more tests
# to see the capabilities of your CPU, and they are taken into account.
...
...
# Here it loads into RAX the address of a global variable (_DYNAMIC+0x1d8)
# where a value representing the capabilities of your CPU is stored.
# This value also says if your CPU is non-INTEL, which means that the
# true capabilities of your CPU are not fully used (i.e. SSE).
402d7e: 48 8b 05 a3 56 20 00 mov 0x2056a3(%rip),%rax # 608428 <_DYNAMIC+0x1d8>
# In EBX the value of this global variable is ready to be copied to
# memory. An INTEL CPU with SSE and SSE2 has EBX = 0x800. An AMD CPU
# with SSE and SSE2 has EBX = 0x1 which means that the SSE and SSE2
# capabilities are not recognized.
402d85: 89 18 mov %ebx,(%rax)
...
...
################## DISASSEMBLY OF AN ICC GENERATED BINARY #########################
The patch-AuthenticAMD utility replaces those three CMP instructions with three other CMPs that look
for the vendor string AuthenticAMD. The libelf library is used to analyze the structure of the
ELF binary to be patched, so that the executable sections can be found and the replacements made
only in those sections. This makes sure that what gets replaced is a machine instruction and not something else.
It is also possible to bypass `libelf' and make the replacements across the whole binary.
Binaries generated with the Intel C++ compiler usually contain several execution branches: some are for maximum compatibility with x86 processors and others are for maximum speed with SSE optimizations. With this utility, the executable will take the fastest path your CPU supports.
Last modified date: Wed May 5 21:44:58 CEST 2010