There are many well-known open-source cryptography libraries available, which implement many different ciphers. So which library and which cipher(s) should one use for a new program? This comparison presents a wealth of experimentally determined speed test results to allow an educated answer to this question.
The speed tests encompass eight open-source cryptography libraries, from which 15 different ciphers are examined. The performance experiments were run on five different computers which had up to six different Linux distributions installed, leading to tests on ten CPU / distribution combinations. Finally, the cipher code was also compiled using four different C++ compilers with 35 different optimization flag combinations.
Two different test programs were written: the first to verify cipher implementations against each other, the second to perform timed speed tests on the ciphers exported by the different libraries. A cipher speed test run is composed of both encryption and decryption of a buffer. The buffer length is varied from 16 bytes to 1 MB in size.
Many of the observed results are unexpected. Blowfish turned out to be the fastest cipher. But cipher selection cannot be based solely on speed; other parameters like (perceived) strength and age are more important. However, raw speed data is important for further discussion.
When regarding the eight selected cryptography libraries, one would expect all libraries to contain approximately the same core cipher implementation, as all calculation results have to be equal. However, the libraries' performance varies greatly. OpenSSL and Beecrypt contain implementations with the highest optimization levels, but these libraries implement only a few ciphers. Tomcrypt, Botan and Crypto++ implement many different ciphers with consistently good performance on all of them. The smaller Nettle library trails somewhat behind, probably due to its age.
The first real surprise of the speed comparison is the extremely slow test results measured on all ciphers implemented in libmcrypt and libgcrypt. libmcrypt's ciphers show an extremely long start-up overhead, but once it is amortized the cipher's throughput is equal to faster libraries. libgcrypt's results on the other hand are really abysmal and trail far behind all the other libraries. This does not bode well for GnuTLS's SSL performance. And libmcrypt's slow start promises bad performance for thousands of PHP applications encrypting small chunks of user data.
Most of the speed test experiments were run on Gentoo Linux, which compiles all programs from source with user-defined compiler flags. This contrasts with most other Linux distributions, which ship pre-compiled binary packages. To verify that previous results stay valid on other distributions, the experiments were rerun in chroot-jailed installations. As expected, Gentoo Linux showed the highest performance, closely followed by the newer versions of Ubuntu (hardy) and Debian (lenny). The oldest distribution in the test, Debian etch, showed nearly 15% slower speed results than Gentoo.
To make the results transferable onto other computers and CPUs the speed test experiments were run on five different computers, which all had Debian etch installed. No unexpected results were observable: all results show the expected scaling with CPU speed. Most importantly no cache effects or special speed-ups were detectable. Most robust cipher was CAST5 and the one most fragile to CPU architecture was Serpent.
Most interesting for applications outside the scope of cipher algorithms was the compiler and optimization flags comparison. The speed test code and the cipher library Crypto++ were compiled with many different compiler / flags combinations. The code was even compiled and its speed measured on Windows to compare Microsoft's compiler with those available on Linux.
The experimental results showed that Intel's C++ compiler produces by far the most optimized code for all ciphers tested. Second and third place go to Microsoft Visual C++ 8.0 and , which generate code that is roughly 16.5% and 17.5% slower than that generated by Intel's compiler. gcc's performance is highly dependent on the number of optimization flags enabled: a simple is not sufficient to produce well optimized binary code. Relative to the older compiler version is about 10% slower on most tests.
All in all the experimental results provide some hard numbers on which to base further discussion. Hopefully some of the libraries' spotlighted deficits can be corrected or at least explained. Lastly the most concrete result: the cipher and library I will use for my planned application is Serpent from the Botan library.
Table of Contents
- Description of Libraries, Ciphers and Compilers
- Libraries Tested
- Ciphers Tested
- Compiler and Flags Tested
- Test Method
- Test Environment
- CPUs and Distributions
- Test Program Runs and Plots
- Compiler / Flags Test Program Runs and Plots
- Observation and Discussion
- Ciphers Compared
- Libraries Compared by Cipher
- Findings On Different Distributions
- Ciphers compared by CPU
- Compiler and Optimization Flags
- Detailed Distribution Package Versions
- Full Speed Table Listing for All Distributions
- Detailed Speed Table Listing for All CPUs
- Full Speed Table Listing for All Compiler Flags
Currently I am working on a program dubbed CryptoTE. It is a text editor which automatically saves documents and attachments in an encrypted container file. The idea is to transparently encrypt sensitive passwords and other data so other, possibly malicious programs (and users) cannot read the text. Yes, I know there are many "Password Keeper" programs available on the Internet. However CryptoTE, being a text editor, will be much simpler: it will not force you to structure your password data, no tables, attributes, etc. Last reason: I need it myself. CryptoTE will be available on idlebox.net when finished.
During current development of CryptoTE I have to decide which cryptography library and which cipher(s) to choose for encrypting data. Currently I don't plan on having the user select one of 100 different ciphers, and thus leave cipher selection to some arbitrary choice of the user. ("Blowfish looks pretty, reminds me of my last diving trip, I'll take that one.") So the list of available ciphers will be very short. I also don't care for the following misleading entry on the features list: "This super program has 1000 different ciphers" (which are actually just implemented by the library it uses).
The basic idea before starting this extensive comparison was to use one of the currently strongest (public) ciphers: Rijndael (AES), Serpent or Twofish. Easy so far, but which library to use? Probably libgcrypt or libmcrypt, because the first is used by GnuTLS and the second is a long existing PHP extension used by many, many web applications.
However, the results of this speed comparison test show that this choice would not have been optimal. It turned out that there are substantial differences in the different libraries' encryption speeds.
Once the speed test was written, the initial results showed such surprising differences, that I extended the test. I ran the library speed test on different Linux distributions and different CPUs / computers. This should determine if the differences were specific to my favorite distribution (Gentoo) or to my desktop computer's CPU architecture.
Testing different distributions however is not really fair. Most important criterion for the cipher speed are the compiler flags used to compile the library sources during packaging. So I expected a distribution using the flag to show lower speeds than a distribution compiled with (like my Gentoo is).
Thus I further extended the speed test to compare three different custom cipher implementations across different compilers and compiler flags, in the end even running the speed test on Windows (to satisfy the curiosity of a friend of mine). Here too the speed test results are unexpected.
2 Description of Libraries, Ciphers and Compilers
The speed comparison test was performed using many different ciphers found in well-known open source cryptography libraries. It was run on five different CPUs and six different Linux distributions to reveal details about distribution packaging, compiler flags and CPU attributes.
This section will describe in short which libraries, ciphers and compilers were compared.
2.1 Libraries Tested
|Library||Versions||Language||License||Comments|
|libgcrypt||1.2.3 / 1.2.4 / 1.4.0||C||LGPL||Used by GnuTLS, which I prefer over OpenSSL because it throws no valgrind memory errors.|
|libmcrypt||2.5.7 / 2.5.8||C||LGPL||Long existing PHP extension. Used by lots and lots of web sites|
|Botan||1.6.1 / 1.6.2 / 1.6.3||C++||BSD||Newer library. More liberal license. Good C++ interface instead of old-fashioned C.|
|Crypto++||5.2.1c2a / 5.5 / 5.5.1 / 5.5.2||C++||Special||Another C++ library which seems to have a more Win32-ish background.|
|OpenSSL||0.9.8b / 0.9.8c / 0.9.8e / 0.9.8g||C||Special||Well, it's OpenSSL. Just the low-level cipher interface is tested.|
|Nettle||1.14.1 / 1.15||C||LGPL||Very small(!) low-level library.|
|Beecrypt||4.1.2||C||LGPL||Another small and possibly fast library.|
|Tomcrypt||1.06 / 1.17||C||Public Domain||Least entangled library of cipher implementations.|
The licenses of all these libraries are problematic, because the actual encryption cipher source code is often in the public domain. However, that is some lawyer's job to figure out. For a detailed listing of each library's versions see the extra page: Distribution Package Versions.
Furthermore three custom cipher implementations were included in the speed test. These custom implementations are basically the publicly available original cipher source code modified and extended by myself for direct inclusion in my C++ programs. Included are:
- Optimized Rijndael (AES) by Vincent Rijmen, Antoon Bosselaers and Paulo Barreto.
- Serpent cipher optimized by Dr. Brian Gladman.
- Another implementation of the Serpent cipher extracted from Botan. This is included to compare compiler settings and also because this implementation will be used in CryptoTE.
2.2 Ciphers Tested
The ciphers available in the different libraries vary greatly. Mostly I chose to run a speed test on the strongest ciphers included in the library. All ciphers are tested in ECB (Electronic Codebook) mode, because it is available everywhere and best tests the cipher implementation itself.
|Cipher||Blocksize (bits)||Keysize (bits)||Libgcrypt||Libmcrypt||Botan||Crypto++||OpenSSL||Nettle||Beecrypt||Tomcrypt|
2.3 Compiler and Flags Tested
Quite late during this speed test process, I decided to also test different compilers and compiler flag combinations. gcc was available in two different versions on my Gentoo system. Further I installed the Intel C/C++ Compiler using their "Non-Commercial Software Development" license. Lastly a friend wanted me to compare it with Visual C++, of which I have an academic edition.
|GNU Compiler Collection||4.1.2||Gentoo Linux||, , , , ,|
|GNU Compiler Collection||3.4.6||Gentoo Linux||, , , , ,|
|Intel C/C++ Compiler||10.0||Gentoo Linux||, , , ,|
|Microsoft Visual C++||8.0 (2005)||Windows XP||, , ,|
The basic optimization flags were tested on all three compilers. Some further gcc flags were also tested, as the defaults are still quite restrictive. Furthermore (not included in the preceding list) I ran the speed tests on MinGW to double-check the timer resolution on Windows.
3 Test Method
Two programs are used to test and compare the cipher implementations.
The first test is not a speed measurement; instead the program is used to validate the different libraries against each other. Some fixed input is run through different libraries and the encrypted output is compared. This is done to check the different implementations (especially those which I modified) for correctness.
The verification program only tests five ciphers: Rijndael, Serpent, Twofish, Blowfish and 3DES. Rijndael, Blowfish and 3DES are implemented in almost every library and Serpent is the cipher I ultimately chose. Twofish and Blowfish are surprisingly fast in some results, so I had to check that they actually did some work.
For each library or custom implementation, the program takes a 128 KB buffer filled with a specific pattern. It then encrypts the buffer and compares the result with the other encrypted buffers, thus checking that both (or more) implementations returned the same results. Then the cipher is used to decrypt the buffer again, and the buffer contents are verified to be the original data pattern.
The following implementations are checked against each other:
- Rijndael (AES): Custom(Rijmen), libgcrypt, libmcrypt, Botan, Crypto++, OpenSSL, Nettle, Beecrypt, Tomcrypt. (That is all libraries.)
- Serpent: Custom(Gladman), Custom(Botan), libgcrypt, libmcrypt, Botan, Crypto++, Nettle.
- Twofish: libgcrypt, libmcrypt, Botan, Crypto++, Tomcrypt.
- Blowfish: libgcrypt, libmcrypt, Botan, Crypto++, Nettle, Tomcrypt.
- 3DES: libgcrypt, libmcrypt, Botan, Crypto++, OpenSSL, Nettle, Tomcrypt. (All except Beecrypt)
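The cross-checking procedure described above can be sketched roughly as follows. This is only an illustration and not the actual verification program: the function names (`make_pattern`, `cross_verify`), the byte pattern, and the two toy XOR "ciphers" standing in for real library bindings are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

# An implementation is modelled as an (encrypt, decrypt) pair of functions.
Cipher = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def make_pattern(size: int) -> bytes:
    """Fill a buffer with a repeating byte pattern (the real pattern is not specified)."""
    return bytes(i % 251 for i in range(size))


def cross_verify(implementations: Dict[str, Cipher], size: int = 128 * 1024) -> bool:
    """Return True if all implementations produce identical ciphertext
    and each one decrypts its ciphertext back to the original pattern."""
    plain = make_pattern(size)
    reference = None
    for name, (encrypt, decrypt) in implementations.items():
        encrypted = encrypt(plain)
        if reference is None:
            reference = encrypted          # first implementation sets the reference
        elif encrypted != reference:
            return False                   # ciphertexts differ between libraries
        if decrypt(encrypted) != plain:
            return False                   # round trip did not restore the pattern
    return True


# Two toy stand-ins computing the same XOR transformation in different ways.
KEY = 0x5A


def xor_loop(data: bytes) -> bytes:
    return bytes(b ^ KEY for b in data)


def xor_table(data: bytes) -> bytes:
    return data.translate(bytes(i ^ KEY for i in range(256)))


def xor_wrong_key(data: bytes) -> bytes:
    return bytes(b ^ (KEY + 1) for b in data)
```

With real libraries, each entry would wrap one library's ECB encrypt and decrypt calls for the same key.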
The core of each speed test consists of one encryption pass directly followed by a decryption pass. Thus both encryption and decryption speed of the cipher are tested and results will reflect the time to encrypt plus decrypt. The passes are performed on one buffer filled with a pattern.
The first statistical variable is the size of the buffer en/decrypted. It ranges from 16 bytes to 1 MB. Only the buffer sizes 2^(4+n) with n = 0 .. 16 are measured. By also testing very small buffers, library overhead and cipher key preprocessing/initialization time is measured indirectly. This start-up overhead becomes smaller as the buffers get larger.
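As a small worked example, the measured buffer sizes are the powers of two from 16 bytes up to 1 MB. The helper name `buffer_sizes` is made up for this sketch.

```python
def buffer_sizes(first_exponent: int = 4, steps: int = 17) -> list:
    """Buffer sizes 2^(4+n) for n = 0 .. 16, in bytes."""
    return [2 ** (first_exponent + n) for n in range(steps)]


sizes = buffer_sizes()
print(sizes[0], sizes[-1], len(sizes))
```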
To make results more accurate despite the inaccurate time measurement device, en/decryption of small buffers is repeated a large number of times. The total run time of all repeats is then divided by the number of repetitions. The number of repetitions starts out so that at least 64 KB of data is processed. If one repeated run takes less than 0.7 seconds, the same test is redone with twice the amount of data processed. This way the repetition loop is increased until processing takes a sufficiently long time to allow good measurement with only moderate timer resolution.
Furthermore each buffer size (including all internal repetitions) is tested 16 times. The 16 tests of one buffer size are not run back to back; instead all buffer sizes are tested consecutively, and this whole sweep is then repeated.
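The doubling scheme for the repetition count might look roughly like the sketch below. It is not the original test code: `timed_run`, `work` and the injectable `clock` parameter are hypothetical names, and `time.perf_counter` is only a stand-in for whatever timer the test program actually used.

```python
import time


def timed_run(work, buffer_size, min_bytes=64 * 1024, min_seconds=0.7,
              clock=time.perf_counter):
    """Time one call of work() by repeating it until the run is long enough.

    work        -- callable performing one encrypt+decrypt pass on the buffer
    buffer_size -- size of that buffer in bytes
    Returns (seconds per single pass, number of repetitions used).
    """
    # Start with enough repetitions to process at least min_bytes of data.
    repeats = max(1, -(-min_bytes // buffer_size))  # ceiling division
    while True:
        start = clock()
        for _ in range(repeats):
            work()
        elapsed = clock() - start
        if elapsed >= min_seconds:
            return elapsed / repeats, repeats
        repeats *= 2  # too short to measure well: redo with twice the data
```

Passing the clock as a parameter is a design choice of this sketch; it allows the loop to be exercised with a simulated timer instead of waiting for real time to pass.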
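The ordering of the 16 rounds can be illustrated as follows: sizes are swept in the inner loop and the whole sweep is repeated in the outer loop. The names `sweep` and `measure` are assumptions for this sketch.

```python
from collections import defaultdict


def sweep(measure, sizes, rounds=16):
    """Call measure(size) for every size, then repeat the whole sweep.

    Returns a dict mapping each buffer size to its list of measurements.
    """
    samples = defaultdict(list)
    for _ in range(rounds):          # outer loop: repeat the complete sweep
        for size in sizes:           # inner loop: all sizes consecutively
            samples[size].append(measure(size))
    return samples
```

Interleaving the sizes this way means that a temporary disturbance on the machine affects one sample of many sizes rather than all samples of a single size.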
The time is measured on Linux using and on Windows using . The results are written out to a text file for further processing with gnuplot. Each result includes the buffer size, average, standard deviation, minimum and maximum; both the absolute time measured and the reached throughput speed are printed into the result file.
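The per-size summary written to the result file could be computed along these lines. This is a sketch under assumptions: the field names are invented, the speed is taken as buffer size divided by the measured time, and the sample standard deviation is used, none of which is confirmed by the text.

```python
import statistics


def summarize(buffer_size, times):
    """Summarize repeated timings (in seconds) for one buffer size (in bytes)."""
    speeds = [buffer_size / t for t in times]  # assumed definition: bytes per second
    return {
        "size": buffer_size,
        "time_avg": statistics.mean(times),
        "time_stddev": statistics.stdev(times),   # sample standard deviation (assumption)
        "time_min": min(times),
        "time_max": max(times),
        "speed_avg": statistics.mean(speeds),
        "speed_stddev": statistics.stdev(speeds),
        "speed_min": min(speeds),
        "speed_max": max(speeds),
    }
```

One such record per buffer size, printed as a whitespace-separated line, is a format that gnuplot can read directly.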
4 Test Environment
4.1 CPUs and Distributions
The speed measurements were performed on five different computers available to me. They have five different CPUs:
- Intel Pentium 4 at 3.2 GHz with 1024 KB L2 cache - Short: p4-3200
- Intel Pentium 3 (Mobile) at 1.0 GHz with 512 KB L2 cache - Short: p3-1000
- Intel Pentium 2 at 300 MHz with 512 KB L2 cache - Short: p2-300
- Intel Celeron at 2.66 GHz with 256 KB L2 cache - Short: cel-2660
- AMD Athlon XP 2000+ with 256 KB L2 cache - Short: ath-2000
To compare distribution package speed six different Linux distributions were used:
- Gentoo stable
- Debian 4.0 etch (currently )
- Debian lenny (currently )
- Ubuntu 7.10 Gutsy Gibbon
- Ubuntu 8.04 Hardy Heron
- Fedora 8
For a detailed listing of the different libraries' package versions used in the speed tests, see the extra page: Distribution Package Versions.
4.2 Basic Test Program Runs and Plots
The program was run many times. Small code changes and adaptations required many re-runs during the whole testing process. The final runs were performed from 2008-04-09 to 2008-04-22. They produced the text result files found in the downloadable package.
The text result files contain the raw time and speed numbers. Two different gnuplot scripts are included, which visualize the numbers to show different aspects.
The directory of the package contains PDFs named and (e.g. ). These graphs read result files from only one run of all speedtests; the first plots contain the different ciphers contained in each library. The second part then groups the results by cipher: displaying the speed of the different libraries.
The PDFs contain all libraries and all ciphers run on a single CPU/distribution combination. These graphs contain 57 plot lines and are really full. Their size is trimmed to be printed on A4 paper.
To compare the different CPU/Distribution combinations against each other, two further PDFs are included: and .
The first contains eight plots on each page. Plots displaying the same cipher/libraries are grouped together and put on one page. This way a direct side-by-side comparison can be done.
The second contains more individual plots, which show the same library as run on different CPU/distro combinations. Not all combinations are included; only those run on my p4-3200 desktop computer are compared.
4.3 Compiler / Flags Test Program Runs and Plots
The test runs to compare different compilers and compiler flag sets are also included in the package under a different results directory. The final runs of this result set were performed on 2008-05-26. All compiler tests were run on the same CPU / computer: p4-3200 - Pentium 4 3.2 GHz.
The biggest issue was to automate compilation of both the speedtest code and the cryptography libraries with all the different flags and compilers. This was not done for all cryptography libraries, but only for Crypto++. Its configuration script was simple and allowed an exact definition of the compiler and flags (other libraries' configure scripts stripped out or automatically added optimization flags). Crypto++ also provided project files for Visual C++.
The directory contains some compilation automation scripts and a perl/gnuplot script. The script calls gnuplot subprograms and feeds generated gnuplot commands into the plotter to create the two PDFs named and .
The first PDF is the primary result file and compares the different compilers and compiler flags for all the different ciphers available.
The second PDF was only used to check MinGW's special timer against the one on Gentoo Linux. Thus the timer resolution of Windows and Linux was double-checked so the results of Visual C++ are comparable to those run on Linux.
5 Observation and Discussion
This section describes the observations and results found in the different graphs. Please note that all these results are subjective and statistically irrelevant because of the small number of computers tested. However they do give insight into the problems of encryption performance.
All plot bitmaps in the following text are linked to their full-scale PDF originals.
5.1 Ciphers Compared
The first set of plots contains straightforward performance data of the different ciphers provided by each library.
The plot above displays the absolute time in seconds required to run one unit of the speed test. One speed test unit consists of encryption and decryption of a buffer with a specific length. The length of the buffer tested is the value on the x-axis and ranges from 16 to 1,048,576 bytes. The buffer lengths are plotted logarithmically, meaning each step to the right actually doubles the length. This way the small lengths are also shown in detail. In the above graph the average absolute time and the standard deviation (only visible as the small horizontal dashes) are plotted.
Much more informative is the above plot, which shows speed instead of absolute time, where speed is the buffer length divided by the measured time. The speed is displayed in megabytes per second. The above plot shows some ciphers available in the libgcrypt library.
A first observation identifies Twofish as the fastest cipher, once buffers are larger than about 9000 bytes. It achieves more than 20 MB/s throughput.
All ciphers require a start-up overhead, which explains the lower speed for small buffers. This start-up overhead mainly consists of cipher key-schedule context precalculations, but other things like library overhead, memory allocation and initialization also take their toll. Twofish and Blowfish need the longest to start up; all others are about the same. The start-up overhead is visible in the graph by regarding how large a buffer must be to amortize the precalculations. This is where the plot line reaches its horizontal value.
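The shape of these curves can be reproduced with a very simple model, assuming a fixed start-up cost plus a constant per-byte rate. This model and the numbers used with it are illustrative assumptions, not measured values from the tests.

```python
def throughput(size, overhead_seconds, peak_rate):
    """Effective bytes per second when processing `size` bytes.

    Model: total time = fixed start-up overhead + size / peak_rate.
    """
    return size / (overhead_seconds + size / peak_rate)


def half_peak_size(overhead_seconds, peak_rate):
    """Buffer size at which the effective speed reaches half the peak rate."""
    return overhead_seconds * peak_rate


# Hypothetical cipher: 0.1 ms start-up overhead, 20 MB/s peak rate.
overhead, peak = 1e-4, 20e6
for n in (16, 2000, 1048576):
    print(n, round(throughput(n, overhead, peak) / 1e6, 2), "MB/s")
```

In this model a larger start-up overhead shifts the point where the curve flattens to the right, which matches what the plots show for Twofish and Blowfish.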
The above plot shows the ciphers tested in the Botan library. This plot shows a totally different picture than the previous one. This time Blowfish is the "winner". But, more importantly, all ciphers perform significantly better than the implementations in libgcrypt; of course one can only directly compare ciphers available in both libraries. Note the y-axis scale going up to 40 MB/s.
Similar speeds are observable in the above plot of the ciphers from the Crypto++ library. The best performing cipher is again Blowfish with almost 50 MB/s throughput. However, it is also the slowest to start up and reach its peak performance. All other ciphers perform similarly to their counterparts in the Botan library, with the exception of Serpent. For some reason Serpent is less than half as fast as in the Botan library.
The real surprise of the speedtest is the above plot showing ciphers implemented in the libmcrypt library. The plot shows a massively higher start-up time for all ciphers in the library. Performance of libmcrypt for small buffers from 1000 to 10000 bytes is abysmally lower than for all other libraries. However, after the start-up overhead is amortized, the cipher implementations reach their expected speeds. I have no idea why libmcrypt has such an overhead during cipher allocation and initialization. This cannot be due to key schedule setup or similar cipher-related aspects, because they are common to all libraries. It must be something with (possibly special secure) memory allocation, cipher look-up, multi-thread mutex locking or other aspects of the library's organization. I'd rather not think about the myriads of web applications using libmcrypt via PHP to encrypt small bits of user data, which is stored in some SQL database.
During my search for encryption libraries, I noted that the ubiquitous OpenSSL library also exports low-level cipher functions. Obviously the selection of ciphers in OpenSSL is directly linked to those required for SSL communication channels. It only provides 3DES, Blowfish, CAST5 and, in the newer OpenSSL versions, also AES. However the comparison of different libraries below will show that the relatively few cipher implementations in OpenSSL are highly optimized.
The Nettle library contains well-performing implementations of the most common ciphers.
Last but one library in this first list is Tomcrypt. It contributes 11 ciphers to the speed test, some quite exotic like Noekeon, Skipjack and Anubis. Wikipedia brands Noekeon as a rather vulnerable cipher. Skipjack seems to have been a classified NSA cipher. Most interesting is Anubis which was (co-)created by the same person who initially designed AES (Rijndael).
Last library is Beecrypt, which contains only two block ciphers. Thus the data plot contains only two lines. These results appear again in a better context in the library comparison below.
So which is the fastest cipher? That is a difficult question to answer. The main problem is that all test results above were generated on Gentoo. Gentoo is a Linux distribution compiled from source on each installation. So each Gentoo installation is to some degree different from others because compiler flags, used system libraries and other aspects can change quickly.
This is why the real "best" cipher speed comparison table is postponed to one of the following sections, in which different distributions are compared. Jump to the "best cipher" table if you are impatient.
The following table shows the maximum speed in KB/s of each cipher implementation:
5.2 Libraries Compared by Cipher
The second set of plots compares the eight cryptography libraries against each other. One cipher is selected for comparison and all libraries providing this cipher are plotted into one chart. Obviously not all libraries provide all ciphers, so the plots have different numbers of lines.
The first cipher to compare is Rijndael (AES). It is provided in all eight libraries, plus one extra custom implementation. The custom implementation is basically the original Rijndael code as released by the author. The only modification was to adapt it into a convenient C++ class.
The plot shows that the different libraries vary greatly in performance. In the range from 10 MB/s to more than 40 MB/s the libraries' performances are fairly distributed. Lowest in speed is libmcrypt, while the highest speed was achieved by OpenSSL. My custom implementation came in third. Start-up overhead was also highest in libmcrypt. Most other libraries show low start-up overhead.
All Rijndael implementations were verified against each other, which means that all work as expected and output the same cipher text for equal input. Thus the above results cannot show totally different calculations; the output is always the same.
This is maybe the most surprising result of the whole speed test: all cipher implementations' calculation results are verified to be exactly the same, yet the performance of the tested libraries varies so greatly that this seems absurd.
The second plot shows how fast the Serpent cipher is performed by the different libraries. For Serpent two different custom implementations are included. The first is optimized by Dr. Brian Gladman using different theoretic methods. The second was extracted from Botan, it will be used by my CryptoTE editor.
Serpent is a slower (and more secure) cipher than Rijndael. The average libraries all show a performance of less than 15 MB/s. However, the big exception turned out to be Botan, showing almost twice the speed of all other libraries. With almost 30 MB/s it surpasses many Rijndael implementations. This is why I extracted it from Botan into a stand-alone C++ class for use in my programs. The speed of Botan was retained and for small buffers the start-up overhead introduced by Botan was eliminated. Whether this amazing performance is due to special CPU features or compiler flags will be discussed in the following sections.
Twofish is another candidate from the AES-contest. It is implemented by six of the studied libraries. All show the same slow start-up of the cipher. It requires much preprocessing of the key material but achieves a higher throughput than Serpent for larger buffers. The speed achieved by all libraries is larger than 20 MB/s.
The predecessor of Twofish is the Blowfish cipher. Implemented by all eight examined libraries, it shows a similarly slow start-up to Twofish. After amortizing the start-up overhead, Blowfish performs faster than Twofish. However, the two should not be compared directly, because they perform in different security classes: Blowfish is old and Twofish is much newer and is generally regarded as more secure.
With almost 50 MB/s, Beecrypt's Blowfish implementation presented the highest achieved speed in the complete speed test on Gentoo. Close behind are Crypto++, Tomcrypt and OpenSSL. Compared to 50 MB/s libgcrypt's speed of roughly 6 MB/s, even on large buffer sizes, is really bad.
The cipher CAST5 is rather old, but still used e.g. by PGP / GnuPG for symmetric encryption. It is implemented by all libraries except Beecrypt. This time all libraries perform similarly with an average speed of around 32 MB/s. Only libgcrypt falls out of line.
The last cipher to be compared is 3DES. Triple DES is very old compared to the others; however, it is still widely used in VPN, SSL and hardware circuits. It is implemented by all libraries except Beecrypt. Most unexpected is the speed of OpenSSL's implementation of 3DES. It beats all others by far. Obviously much optimization has been put into this implementation, probably because 3DES is one of the encryption ciphers routinely used for SSL connections.
So which is the best / fastest library? That question can be answered here only for the Gentoo distribution. Comparing the libraries on Gentoo has the advantage that Gentoo, being source-compiled, can enable all optimizations and does not introduce performance problems imposed by pre-compiled binary packages or other problems which the binary package maintainer may have created.
However, how does one compare a library like Beecrypt, which implements only two ciphers, to a library which implements eleven ciphers? Obviously only the ciphers actually available can be scored. The scoring analysis was done as follows: First the average speed for each cipher was calculated. Then each library's speed delta (difference to the average) was calculated and added up. Thus the total difference over all implemented ciphers was taken for the following ranking. Note that in this analysis, if a cipher is implemented by only one library, the cipher adds zero score to the total. All speeds are in KB/s:
|Blowfish||47,299 / +6,884||52,662 / +12,247||49,685 / +9,269||40,781 / +365||50,448 / +10,033||32,170 / -8,246||44,355 / +3,940||5,922 / -34,493||40,415|
|CAST5 (128)||37,528 / +4,480||41,494 / +8,446||36,062 / +3,014||32,018 / -1,030||35,514 / +2,466||33,618 / +570||15,103 / -17,945||33,048|
|Noekeon||31,621 / +0||31,621|
|Anubis||27,898 / +0||27,898|
|Rijndael AES||35,817 / +7,929||44,153 / +16,265||23,588 / -4,300||40,245 / +12,356||21,807 / -6,082||27,155 / -734||34,625 / +6,737||10,145 / -17,743||13,459 / -14,429||27,888|
|Twofish||36,545 / +9,462||26,189 / -894||28,224 / +1,141||25,903 / -1,180||23,352 / -3,731||22,283 / -4,799||27,082|
|XTEA||26,910 / +3,768||23,844 / +702||20,595 / -2,547||21,218 / -1,924||23,142|
|Khazad||17,221 / +0||17,221|
|GOST||17,912 / +735||18,736 / +1,559||14,885 / -2,293||17,178|
|Serpent||29,171 / +12,112||30,775 / +13,715||12,266 / -4,794||10,914 / -6,145||14,962 / -2,097||6,910 / -10,149||17,059|
|CAST6 (256)||13,349 / -3,647||18,824 / +1,827||18,816 / +1,820||16,996|
|Loki97||9,637 / +0||9,637|
|Skipjack||8,683 / +0||8,683|
|3DES||12,070 / +5,649||5,644 / -776||6,698 / +277||6,744 / +323||4,940 / -1,481||3,834 / -2,587||5,015 / -1,406||6,421|
|Safer+||3,463 / -712||4,888 / +712||4,175|
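The scoring procedure described before the table can be sketched as follows. The function name and the library names `lib_a`, `lib_b`, `lib_c` are placeholders, since the table's column headers are not available here; the speed values are taken from the GOST, Safer+ and Loki97 rows, but their assignment to the placeholder libraries is arbitrary.

```python
def score_libraries(speeds):
    """Rank libraries by their summed speed delta to each cipher's average.

    speeds -- {cipher: {library: maximum speed in KB/s}}
    Returns a list of (library, total delta), best first.
    """
    totals = {}
    for cipher, by_library in speeds.items():
        average = sum(by_library.values()) / len(by_library)
        for library, speed in by_library.items():
            totals[library] = totals.get(library, 0.0) + (speed - average)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Placeholder library names; speeds in KB/s from three rows of the table.
example = {
    "GOST":   {"lib_a": 17912, "lib_b": 18736, "lib_c": 14885},
    "Safer+": {"lib_a": 3463, "lib_c": 4888},
    "Loki97": {"lib_b": 9637},   # only one implementation: contributes zero
}
ranking = score_libraries(example)
for library, delta in ranking:
    print(f"{library}: {delta:+.0f} KB/s")
```

Because every delta is taken against the per-cipher average, the deltas of each cipher sum to zero, so a library can only gain score by beating other libraries on ciphers they share.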
The winning "library" are my custom implementations. No surprise there, I wouldn't have included them in the test if they were slow.
So the real winner is OpenSSL. Its implementations are on average 8,320 KB/s faster than the average implementation. Second and third place are very close and go to Beecrypt and Tomcrypt.
5.3 Findings On Different Distributions
First problem of the last two sections was that all libraries were taken from my Gentoo system. Gentoo however is a distribution where all packages are compiled from source using individual compiler flags. This approach is not shared by most other Linux distributions, which ship pre-compiled binary packages.
So are the finding above specific to my Gentoo system? Or even to the flags specified in my configurations?
To clarify this issue, five other Linux distributions were installed in chroot jails on the same computer. The speed test compiled in the chroot thus use the binary-distributed library versions.
The chart above plots the selected ciphers from libgcrypt run on the six Linux distributions on the same CPU. Each cipher has one distinct color, and the six distributions are distinguished by different line styles: solid, dashed, dot-line-dot, etc.
One can see that some ciphers, namely Rijndael, Serpent and Blowfish, perform very similarly on all platforms: their colored lines follow about the same path. Twofish also performs similarly on all distributions except Gentoo, probably due to extra compiler optimization. CAST5 shows a rather large variation of speeds; CAST5 also has large standard deviations compared to the others. 3DES also shows a rather large speed range.
All in all, there are no real surprises in the above chart. Perhaps the strangest result is that compiler-optimized Gentoo (the solid line) performs a lot better on Twofish but also a lot worse on CAST5.
The next library is libmcrypt. This chart verifies that mcrypt has very slow start-up times, not only on Gentoo but on all distributions. The chart excludes some ciphers (XTEA, Safer+ and Loki97) to increase readability. Again some ciphers show very little variation in throughput: Rijndael, CAST6 and 3DES. The others show no great surprises either.
Botan and Tomcrypt show the same effects. Some cipher implementations perform nearly identically on all distributions; others show a larger, but not huge, variation.
The corresponding chart for Crypto++ is very full and shows a wide variation even of ciphers previously unvarying. Crypto++ seems to be very sensitive to optimization. Nettle's chart shows the same observations as before.
The two remaining libraries are OpenSSL and Beecrypt. OpenSSL shows that its cipher implementations perform almost unvaryingly well on all distributions. This promises good performance for SSL secure sockets on all distributions.
Beecrypt shows only one new aspect: the Blowfish implementation on Debian-lenny shows a serious fall as compared to Debian-etch. This is probably due to the gcc compiler version change to 4.2. More about compilers and compiler flags later.
So which distribution performs best? To analyze this question, a speed table was created for each distribution. It contains the maximum value of each plot, that is, the maximum speed the cipher reached. Then the average speed of all cipher / library test runs performed on one Linux distribution is calculated. The table below shows this average and the average over all test runs. The values below the average are the (minimum - maximum) speeds across all ciphers implemented in the library. Again all values are in KB/s.
|(5,015 - 22,283)||(3,834 - 44,355)||(6,698 - 40,781)||(6,744 - 50,448)||(12,070 - 47,299)||(4,940 - 35,514)||(23,588 - 52,662)||(3,463 - 49,685)||(29,171 - 35,817)|
|(6,490 - 19,975)||(3,468 - 40,554)||(6,007 - 39,000)||(3,304 - 41,518)||(11,707 - 45,198)||(2,438 - 30,706)||(23,285 - 49,538)||(3,570 - 52,340)||(26,935 - 37,219)|
|(6,564 - 20,171)||(3,017 - 32,796)||(5,920 - 38,985)||(3,345 - 41,500)||(11,972 - 45,110)||(2,393 - 34,414)||(23,591 - 26,597)||(3,523 - 52,293)||(27,296 - 36,341)|
|(3,804 - 21,024)||(3,461 - 40,597)||(6,296 - 37,651)||(4,336 - 31,028)||(11,878 - 45,176)||(2,434 - 34,379)||(23,293 - 49,416)||(3,036 - 51,821)||(28,183 - 35,626)|
|(2,246 - 20,412)||(3,337 - 41,066)||(7,311 - 35,071)||(7,241 - 34,668)||(11,974 - 45,253)||(3,868 - 43,517)||(20,742 - 45,177)||(3,051 - 47,499)||(24,503 - 33,573)|
|(3,660 - 19,020)||(2,990 - 32,804)||(6,425 - 32,452)||(4,439 - 42,647)||(11,805 - 45,298)||(2,559 - 34,795)||(23,517 - 49,558)||(3,448 - 51,800)||(29,106 - 36,155)|
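The per-distribution summary described above (peak speed per plot, average over all cipher / library runs, and the minimum - maximum range per library) can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the script used for the test: the function name, the data layout and the sample numbers are all made up.

```python
def summarize_distribution(runs):
    """runs maps (library, cipher) -> list of speeds in KB/s, one per buffer size."""
    # Peak speed of each cipher / library run (the maximum of its plot).
    peaks = {key: max(series) for key, series in runs.items()}
    # Average peak speed over all runs on this distribution.
    average = sum(peaks.values()) / len(peaks)
    # (minimum, maximum) peak speed per library across its ciphers.
    ranges = {}
    for (library, _cipher), peak in peaks.items():
        lo, hi = ranges.get(library, (peak, peak))
        ranges[library] = (min(lo, peak), max(hi, peak))
    return average, ranges

# Made-up sample measurements, not taken from the speed test.
runs = {
    ("lib_a", "cipher_x"): [1000, 8000, 12000],
    ("lib_a", "cipher_y"): [500, 3000, 4000],
    ("lib_b", "cipher_x"): [2000, 9000, 20000],
}
average, ranges = summarize_distribution(runs)
```

Taking the peak of each plot ignores start-up overhead on small buffers, so a library with a slow start (as seen with libmcrypt) can still score well in such a summary.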
Obviously Gentoo is the fastest distribution. No surprise here: the libraries were compiled from source with high optimization levels.
The only other result seen here is that "newer" distributions (ubuntu-hardy and debian-lenny) perform better than older ones. This is probably due to the compiler version bump from to . More about that in the section Compiler and Optimization Flags.
See the external table file for a detailed speed table listing for all distributions.
5.4 Ciphers compared by CPU
The tests discussed in the last three sections (ciphers, libraries and distribution comparisons) were all performed on my development computer. It has a Pentium 4 CPU at 3.2 GHz. To determine if any of the previous results are due to special attributes of the Pentium 4 architecture, the speed test was repeated on four other CPUs / computers. To make the comparison independent of the Linux distribution, Debian etch was installed on all computers (chrooted on some). The plots below display results of the speed test on the five CPUs side by side. The sixth plot shows results from my Pentium 4 run with Gentoo, instead of Debian etch; these are the same plots as in the section "Ciphers Compared" just for comparison.
Again we address libgcrypt's results first. All five results from Debian etch look similar. From p2-300 to p3-1000 the ciphers' speeds increase roughly threefold (Rijndael from 2 MB/s to 7 MB/s), but all relative speeds are unchanged. Also cel-2660 and p4-3200 show very much the same picture, scaled only by the increased CPU speed. Yet these two chart pairs (p2-300/p3-1000 vs. cel-2660/p4-3200) show different relative speeds: most obviously, Twofish is best on p2-300/p3-1000 but CAST5 wins on cel-2660/p4-3200. More interesting is the fact that the three ciphers Blowfish, Serpent and 3DES don't scale with CPU speed as well as the other three do. ath-2000 shows a third picture, different from p2-300/p3-1000 and cel-2660/p4-3200. 3DES and Serpent are actually faster on the ath-2000 than on p4-3200. These implementations seem to work better with AMD's CPUs than with Intel's.
On all CPUs libmcrypt on Debian etch shows the same slow start-up. Since this rules out the library for almost all purposes, I will not go into more detail on the CPU comparison.
Next we look at a faster library: Botan. Again the CPUs' results form three distinct groups with equal relative speeds: p2-300/p3-1000, cel-2660/p4-3200 and ath-2000. But compared to libgcrypt the relative speeds change less: only Serpent and Twofish show large changes from CPU to CPU. Again ath-2000 shows better relative speed results for these ciphers than the faster CPUs cel-2660/p4-3200. Also interesting is the comparison of Debian etch with Gentoo on the p4-3200: the plots show almost equal relative performance despite Gentoo's higher optimization, with one exception: Serpent performs four times as fast on Gentoo.
Crypto++ is the next library in the speed test. We already saw that Crypto++ is very sensitive to optimization flags. Looking at the five charts, p3-1000 immediately stands out: Twofish is the fastest cipher only on that CPU, while all others show very high Blowfish speeds instead. Blowfish is almost twice as fast on those CPUs as the next fastest cipher, Rijndael. For a more detailed analysis the above plots were regenerated without the Blowfish data set.
Without Blowfish the other ciphers show almost equal relative performance on the four CPUs p2-300, ath-2000, cel-2660 and p4-3200. But even on the other ciphers the p3-1000 performs differently. Most notably the Twofish cipher reaches almost 12 MB/s on p3-1000, but only 5 MB/s on ath-2000. Why this CPU steps out of line is beyond me.
OpenSSL's highly optimized cipher implementations perform very well on all tested CPUs. Again this promises very good SSL socket speeds on all x86 CPUs. No further important observations are found on these charts.
Beecrypt's results can again be grouped into three similar charts: p2-300/p3-1000, ath-2000 and cel-2660/p4-3200. Like on libgcrypt, some ciphers (Serpent, 3DES) do not speed-up as well as others: Rijndael, CAST5, Blowfish and Twofish utilize the faster CPUs better. And the Athlon does a better job with the less-scalable ciphers than Intel's CPUs.
Tomcrypt shows the same results as already seen on libgcrypt, Beecrypt and less prominently on the other result comparisons.
What do we conclude from the cross-CPU examination? The first and most important point is that the performance of an individual cipher does not depend on the specific CPU architecture. The speed usually scales well with CPU speed. However there are exceptions: some cipher implementations do not scale as well as others. Most often 3DES and Serpent show less relative performance gain.
Second interesting point is to determine the cipher which scales best. This requires a short calculation, because we need to account for the CPU's speedup. Thus the first step is to calculate the relative speed-up of each CPU. So first the average speed over all speed tests on all CPUs is taken: all-average in the following table. Then the average speed over all tests on each individual CPU is calculated and from that the relative speed to all-average is calculated: e.g. p3-1000 reaches only 65.2% of the all-average speed.
Then the average performance of each cipher is calculated again across all CPUs and for each CPU individually. Of course only the libraries are taken into account which actually implement the cipher.
In the last step, for each cipher the average performance over all CPUs is scaled by the speed-up multiplier calculated above to get the linearly scaled, expected speed of the cipher on each CPU. This expected speed is then compared to the actually measured speed: negative values show less than the expected speed, positive values show a larger speed-up. The difference is shown in the table below. The sum of all differences to the expected performance signifies how well the CPU is suited for (the tested) cryptography algorithms.
|relative to average||133.7%||63.4%||28.3%||175.5%||132.2%||85.0%||117.2%||58.0%|
From the differences to the expected performance, the cipher best suited for all tested CPUs can be determined by comparing the worst case (the largest negative difference). However, because the min values are speeds in KB/s, a direct comparison is not valid: faster ciphers produce larger differences to the expected speed. The minimum speed difference has to be normalized by the cipher's average speed to allow a direct comparison. The same normalization is done for the (min - max) range size, which shows how large the cipher's speed fluctuation is.
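The calculation described in the last few paragraphs can be sketched as follows. This Python snippet is a simplified illustration, not the original analysis script: it assumes the input has already been reduced to one average speed per cipher and CPU, and the function name, CPU labels and sample numbers are made up.

```python
def scaling_analysis(measured):
    """measured maps cipher -> {cpu: average speed in KB/s}."""
    cpus = sorted({cpu for by_cpu in measured.values() for cpu in by_cpu})
    all_values = [s for by_cpu in measured.values() for s in by_cpu.values()]
    all_average = sum(all_values) / len(all_values)

    # Relative speed of each CPU compared to the all-average.
    relative = {}
    for cpu in cpus:
        vals = [by_cpu[cpu] for by_cpu in measured.values() if cpu in by_cpu]
        relative[cpu] = (sum(vals) / len(vals)) / all_average

    report = {}
    for cipher, by_cpu in measured.items():
        cipher_avg = sum(by_cpu.values()) / len(by_cpu)
        # Measured speed minus the linearly scaled, expected speed.
        diffs = {cpu: s - cipher_avg * relative[cpu] for cpu, s in by_cpu.items()}
        worst, best = min(diffs.values()), max(diffs.values())
        report[cipher] = {
            "diffs": diffs,
            "min_normalized": worst / cipher_avg,
            "range_normalized": (best - worst) / cipher_avg,
        }

    # Sum of differences per CPU: how well the CPU suits the tested ciphers.
    cpu_sums = {cpu: sum(r["diffs"].get(cpu, 0.0) for r in report.values())
                for cpu in cpus}
    return relative, report, cpu_sums

# Made-up sample data with two ciphers on two CPUs.
measured = {
    "cipher_x": {"slow": 1000, "fast": 3000},
    "cipher_y": {"slow": 3000, "fast": 5000},
}
relative, report, cpu_sums = scaling_analysis(measured)
```

In this made-up example `cipher_x` gains more than expected on the fast CPU and `cipher_y` gains less, so the two differences cancel out in the per-CPU sums.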
So obviously p3-1000 is the CPU most suited for cipher algorithms. It performs on average 113 KB/s faster than the others. However compared to the actual speed of 2-10 MB/s this speed-up is not substantial.
The cipher performing best relative to all CPUs is CAST5. It shows the smallest worst-case drop in speed across all CPUs. Next are CAST6 and 3DES, which also show solid performance regardless of the CPU. Most sensitive to CPU architecture is Serpent; it shows almost 1,676 KB/s less speed on the p4-3200 than expected.
Surprising is that 3DES shows the least fluctuation: the range of its speed differences is only 281 KB/s. On all CPUs 3DES performs almost exactly as expected by the average. However relative to 3DES's slow speed this range is not that small. The normalized ranges of CAST5, 3DES, Blowfish and Rijndael all show that these ciphers are quite independent of the CPU. Again Serpent shows the largest range of speed differences.
See the external table file for a detailed speed table listing for all CPUs.
5.5 Compiler and Optimization Flags
The last collection of test results is centered on the question "How important are the compiler and compiler flags for the encryption speed?". This question already arose above in the comparison of different distributions. There, the binary package maintainer or, in the case of Gentoo, the distribution user sets the (gcc) compiler flags used to compile the library source code.
To examine the compiler flags' influence, the cipher source code was compiled using all 35 different flag combinations shown in the table "Compiler and Flags Tested". As stated above, the biggest problem was to verify that the build scripts (configure + make) of the library actually passed the flags on to the compiler.
To improve readability of the following plots, only a subset of all compiler flags is displayed. The longer gcc compiler flag sequences are shortened to allow compact display in the legend:
Of the above flags, only the variants are included in the following plots.
The first three plots compare compiler flags based on the three custom cipher implementations. First is the Rijndael implementation, which already shows the main trends of the compiler and flags comparison: Intel's C++ compiler generates the fastest code. Next best is gcc with the highest level of optimization. Microsoft Visual C++ lands somewhere in the middle of the field.
Another important observation is that the flag combination performs nearly equal to "" and "". This means that the flags and does not change performance.
An outlier result is the one generated by : it shows far faster performance than all other gcc results. The reason for this fast result is unknown: less optimization seems to do some ciphers (here Rijndael) good.
Second custom implementation is Gladman's Serpent code. Again Intel's C++ compiler wins the race by a long shot. This time the second place goes to Microsoft's Visual C++, which also shows a large winning margin against gcc.
Interesting here is that all three compilers perform nearly the same when optimization is disabled: the red lines are almost equal.
gcc again shows large performance gains from more compiler flags, peaking again with .
The custom cipher code extracted from Botan is an interesting candidate for optimization: it mainly contains eight substitution box functions, the transformation and support functions of which all are declared .
This leaves lots of room for optimizations like instruction scheduling, reordering and register allocation. However the cipher code contains only a few branches and loops. Except for the loop over the 128-bit blocks, no branches are contained in the main execution part.
The plot shows again Intel's compiler to provide highest optimization. The second place goes this time to gcc, but only with the highest optimization flags level in the test. Third is Visual C++.
Remarkable is the large difference between the winning combinations, which are above 20 MB/s, and the middle field of gcc flag combinations: they all show speeds below 10 MB/s. The jump from 10 MB/s to more than 20 MB/s happens when is added to the flags. This was also visible in the last two plots, but the jump is really large in the current plot.
Again shows a result breaking out of the middle field. This time it does not reach levels.
Now we study the results of cipher implementations in the Crypto++ library. First up is Rijndael.
The plot shows a much larger spread of results than the three custom implementations. Again the winning order is Intel's followed by Visual C++ and gcc. However the winning speed results are much closer together than in the last three tests.
again shows larger speed optimization than combinations. But again the difference is smaller than before.
Crypto++'s implementation of Serpent shows very much the same results as MyBotan Serpent: best, gcc with second and third. Again shows a special performance.
This time also shows good speed results, nearly reaching . In the preceding tests did not show good performance compared to the other results.
The plot above compares the compilers and flags using the Twofish implementation in Crypto++. It shows the same findings as in the previous plots.
The PDF plot file contains six more comparisons with different ciphers from Crypto++. All show the same observations as the first six and are therefore omitted here. Check the PDF or tarball for the other charts.
The central point of interest in this section is to find the fastest compiler / compiler flags combination for all ciphers. For this comparison the speeds of all ciphers are averaged for each compiler flags combination. The only other calculation of interest is to see how much slower the other compilers are. So each total average is also displayed relative to the fastest compiler / flags combination.
The table above shows only the first rows and columns of the complete table. See the external web page for the full speed table listing for all compiler flags.
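The averaging and the "relative to the fastest" column can be illustrated with a short Python sketch. It is a hedged example only: the combination labels `combo_1` and `combo_2`, the cipher names and the speeds are made up and do not correspond to any compiler or flag set from the test.

```python
def rank_flag_combinations(speeds):
    """speeds maps a compiler/flags label -> {cipher: speed in KB/s}."""
    # Average speed over all ciphers for each compiler / flags combination.
    averages = {label: sum(by_cipher.values()) / len(by_cipher)
                for label, by_cipher in speeds.items()}
    fastest = max(averages.values())
    # Percentage by which each combination is slower than the fastest one.
    slowdown = {label: 100.0 * (1.0 - avg / fastest)
                for label, avg in averages.items()}
    ranking = sorted(averages, key=averages.get, reverse=True)
    return ranking, averages, slowdown

# Made-up sample data; the labels are hypothetical placeholders.
speeds = {
    "combo_1": {"cipher_x": 20000, "cipher_y": 10000},
    "combo_2": {"cipher_x": 15000, "cipher_y": 9000},
}
ranking, averages, slowdown = rank_flag_combinations(speeds)
```

Note that a plain average weights fast ciphers more heavily than slow ones, so a combination that mainly accelerates the fastest cipher can rank higher than one that helps all ciphers a little.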
Obviously Intel's C++ compiler is the fastest; it shows about the same performance gain for , and . When size optimization is enabled, the speed drops by about 7%.
Second best compiler in the test is Microsoft's Visual C++ 8.0: the code it creates performs roughly 16.5% slower than that created by Intel's compiler. Again the maximum optimization flag and shows about equal performance.
But close behind is with the flags combination , which creates 17.5% slower code than Intel's compiler. The older gcc version is an amazing 26% slower than Intel's top mark.
However relative to the older compiler version is only 10% slower. This is an interesting result, especially in view of early reports on to show poorer optimization than the tried-and-true old version . This opinion was very popular for versions of gcc. At least in the cipher code case, this does not hold for .
According to some reports you'd think the security sky was falling. Yes, GnuTLS, an open-source "secure" communications library that implements Secure Sockets Layer (SSL) and Transport Layer Security (TLS), has serious flaws. The good news? Almost no one uses it. OpenSSL has long been everyone's favorite open-source security library.
Red Hat discovered the latest in a long series of GnuTLS bugs.
Latest? Yes, latest.
You see, GnuTLS has long been regarded as being a poor SSL/TLS security library. A 2008 message on the OpenLDAP mailing list had "GnuTLS considered harmful" as its subject — which summed it up nicely.
In it, Howard Chu, chief architect for OpenLDAP, the open-source implementation of the Lightweight Directory Access Protocol (LDAP), wrote, "In short, the code is fundamentally broken; most of its external and internal APIs are incapable of passing binary data without mangling it. The code is completely unsafe for handling binary data, and yet the nature of TLS processing is almost entirely dependent on secure handling of binary data. I strongly recommend that GnuTLS not be used. All of its APIs would need to be overhauled to correct its flaws and it's clear that the developers there are too naive and inexperienced to even understand that it's broken."
With GnuTLS's most recent and perhaps biggest failure to date, Red Hat found that GnuTLS, when shown a specially rigged kind of bogus SSL certificate, would fail to see that the certificate was a fake.
The project itself, despite its name, is no longer associated with GNU or GNU/Linux. Its chief designer, Nikos Mavrogiannopoulos, had "a major disagreement with the Free Software Foundation's (FSF) decisions and practices." He then made it an independent project.
None of this has stopped some people from using GnuTLS. The usual reason is that its license, the GNU Lesser General Public License (LGPL), is considered more compatible with GPL-licensed software, such as Linux, than OpenSSL's BSD-style open-source license.
There have been claims that "more than 200 different operating systems or applications rely on GnuTLS to implement crucial SSL and TLS operations." This statement was based on a single Debian user group discussion.
When I looked at this message thread, the examples cited were multiple Debian network programs that rely on GnuTLS, such as exim4, a mail transfer agent; cups, a print server; wget, a file retrieval program; and network-manager, a program used to set up network connections. Doing my own digging, I also found that Ubuntu uses GnuTLS with OpenLDAP. Whoops!
Now, make no mistake about it, these are all important programs, but none of them are used for financial transfers or other situations where a man-in-the-middle attack is likely to cause significant damage. In short, while the code's a real mess, it's highly unlikely anyone is in danger of losing credit-card numbers to it. The Apple iOS and Mac OS X goto problem was much more serious.
In the real world almost all open-source based Web servers use OpenSSL. Of the two most popular open-source Web servers, Apache uses OpenSSL by default and nginx requires OpenSSL.
To sum up, no one should be using GnuTLS. There are far better security programs out there, starting with the far more popular OpenSSL. If for some reason you must use GnuTLS for now, either upgrade to the latest GnuTLS version (3.2.12) or apply the GnuTLS 2.12.x patch. Oh, and developers? Start weaning your programs from GnuTLS. You, and your users, will be glad you did.