Benchmarking HotSpot, Graal CE and Graal EE on matrix multiplication
The HotSpot JVM will eventually be replaced by Graal, which is about to be released at v1.0. Graal comes in two flavours: a free community edition (CE) and a commercial enterprise edition (EE). The EE has some performance optimisations that the CE doesn’t, in particular auto-vectorization.
This is a follow-up to the earlier post Quick test to compare HotSpot and OpenJ9. This time the main objective is to compare Graal CE and Graal EE. The general message on the performance of Graal CE seems to be that it is roughly on par with HotSpot: better at some things and worse at others, but about the same on average. Presumably Graal EE will perform better than HotSpot.
The JVMs tested:
- JDK 1.8.0_161, Java HotSpot(TM) 64-Bit Server VM, 25.161-b12
- JDK 1.8.0_192, GraalVM 1.0.0-rc12, 25.192-b12-jvmci-0.54 GraalVM CE
- JDK 1.8.0_192, GraalVM 1.0.0-rc12, 25.192-b12-jvmci-0.54 GraalVM EE
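The entries above look like the values of the standard system properties java.version, java.vm.name and java.vm.version. That is an assumption, since the post does not say how the strings were obtained. A minimal sketch that prints the same kind of line for whichever JVM runs it (the class name JvmInfo is made up for illustration):

```java
// Hypothetical helper, not part of the benchmark code.
final class JvmInfo {

    // Builds a one-line description from standard Java system properties.
    static String describe() {
        return "JDK " + System.getProperty("java.version")
                + ", " + System.getProperty("java.vm.name")
                + ", " + System.getProperty("java.vm.version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once per JVM is a quick way to confirm that a benchmark run is actually using the runtime you think it is.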
Let’s test those JVMs using code that will definitely benefit from auto-vectorization: matrix multiplication. The test uses 3 commonly used pure Java linear algebra libraries:
- Apache Commons Math (ACM) v3.6.1
- Efficient Java Matrix Library (EJML) v0.37.1
- ojAlgo v47.0.1-SNAPSHOT
The test performs matrix-matrix multiplication on square dense matrices of various sizes: 100, 150, 200, 350, 500, 750 and 1000. The libraries have made different choices on how to store the elements and how to implement multiplication. In particular ojAlgo is multithreaded, and the others are not. The goal here is not to determine which library is the fastest, but rather to get an idea about what happens when you change the JVM.
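To make concrete what kind of code is being exercised, here is a minimal sketch of dense square matrix multiplication. It is not taken from any of the three libraries; the row-major flat array layout, the i-k-j loop order and the name DenseMultiply are all assumptions chosen for illustration. The innermost loop walks both arrays with stride 1, which is the sort of loop an auto-vectorizing JIT compiler can, in principle, turn into SIMD instructions.

```java
// Illustrative only: none of ACM, EJML or ojAlgo is claimed to work this way.
final class DenseMultiply {

    // Multiplies two n x n matrices stored row-major in flat arrays.
    static double[] multiply(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                // Stride-1 access to both b and c: a candidate for vectorization.
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};
        double[] b = {5, 6, 7, 8};
        System.out.println(java.util.Arrays.toString(multiply(a, b, 2)));
    }
}
```

Whether a given JVM actually vectorizes that inner loop depends on its JIT compiler, which is exactly the difference this test is meant to expose.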
The chart below shows the speed (throughput, ops/min) for the various library/JVM combinations for each of the different matrix sizes.
The chart speaks for itself and the results are as expected. Graal CE is typically slower than HotSpot, and Graal EE faster. The difference between the CE and EE versions of Graal is significant.
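For anyone unfamiliar with the unit: throughput in ops/min is simply how many complete operations (here, one matrix multiplication) finish per minute. The sketch below shows the idea with a hand-rolled timer. It is not the harness used for these results; the name Throughput, the dummy workload and the durations are assumptions, and a real benchmark needs proper warm-up and forking, which this only hints at.

```java
// Illustrative only: a naive throughput timer, not the benchmark's harness.
final class Throughput {

    // Written by the workload so the JIT cannot discard the work as dead code.
    static volatile double sink;

    // Runs the task repeatedly for roughly durationNanos and returns ops/min.
    static double opsPerMinute(Runnable task, long durationNanos) {
        long start = System.nanoTime();
        long ops = 0;
        long elapsed;
        do {
            task.run();
            ops++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < durationNanos);
        // Guard against a zero elapsed time on coarse clocks.
        return ops * 60.0e9 / Math.max(elapsed, 1L);
    }

    public static void main(String[] args) {
        double[] data = new double[10_000];
        java.util.Arrays.fill(data, 1.5);
        Runnable task = () -> {
            double sum = 0;
            for (double v : data) {
                sum += v;
            }
            sink = sum;
        };
        opsPerMinute(task, 500_000_000L); // warm-up pass, result ignored
        System.out.println(opsPerMinute(task, 1_000_000_000L) + " ops/min");
    }
}
```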
How much will Oracle charge for the EE version?
This test was executed on a Google Cloud Platform Compute Engine instance: n1-highmem-4 (4 vCPUs, Intel Skylake Xeon, 26 GB memory).
The benchmark code is here: https://github.com/optimatika/ojAlgo-linear-algebra-benchmark
The raw results (*.log and *.csv files) can be found here: https://github.com/optimatika/ojAlgo-linear-algebra-benchmark/tree/develop/results/2019/02/oracles-jvms-hotspot-graal-ce-graal-ee