
Pitfalls When Upgrading from Java 8 to Java 17

January 30, 2024



Motivation

As you can see from the other posts on the CleverTap engineering blog, many of them are connected with the performance of Java applications, and this article is no exception: at the core of our motivation to upgrade from Java 8 to Java 17 was, once again, performance. At CleverTap we use Amazon Corretto, a distribution of the Open Java Development Kit (OpenJDK), and we were running Amazon Corretto 8. The interesting thing is that although Java 8 is an LTS release that still receives both performance and security patches upstream, the Amazon Corretto 8 distribution only picks up security fixes; no performance patches are included. This started to cause problems for us: we wanted some of those performance improvements, but we could not get them through the official Corretto 8 distribution. So we started thinking about upgrading our Java version.

Why did we choose 17 and not 21? Because 17 is an LTS version that has been around for a while, so we hoped that most of the critical issues in it had already been fixed.

Expectations

When we started migrating most of our services, we already had one service running on Java 17: the most resource-intensive and important service at CleverTap, our internal data store TesseractDB. So we expected the upgrade of the remaining services to be easy, since we already had one service migrated and running successfully on Java 17. For most of our services this was indeed the case and we had no issues, but our message delivery service proved our expectations wrong. In this article, I would like to show you the biggest pitfalls we encountered while upgrading our message delivery service to Java 17.

Pitfall 1: NIO Sockets vs Plain Sockets

During the gradual rollout of the Java 17 upgrade, we experienced force kills by the Docker OOM killer due to an increase in the resident set size (RSS). The correlation between the increase in RSS and the growth of the direct buffer pool was clear, as the charts below show.

At first, we thought we were leaking memory. However, after running with the jemalloc profiler this turned out not to be the case; the problem was elsewhere. The resident set size is the sum of heap and off-heap memory. In the message delivery service we eagerly allocate the entire heap up front, so its size is fixed, which means the space left for off-heap allocations is also more or less fixed. Consequently, if some Java library started allocating more off-heap memory because of some optimization, we would run over our memory bounds fairly easily. That is exactly what happened here. We ran async-profiler configured to monitor Unsafe_allocateMemory0 and saw that the excessive memory was being allocated by the new NIO socket implementation.
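As a side note, the growth of the direct buffer pool can also be watched from inside the JVM through the standard BufferPoolMXBean. The small sketch below (a hypothetical class name, and not the exact tooling we used) simply prints the pool statistics periodically:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class DirectBufferPoolWatcher {
    public static void main(String[] args) throws InterruptedException {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        while (true) {
            for (BufferPoolMXBean pool : pools) {
                // "direct" is the pool backing DirectByteBuffer allocations,
                // "mapped" covers memory-mapped files
                System.out.printf("%s: count=%d used=%d bytes capacity=%d bytes%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
            }
            Thread.sleep(10_000); // sample every 10 seconds
        }
    }
}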

The problem can be further broken down into two differences between Java 8 and Java 17.

New Socket Implementation

The first one is the new socket implementation, which is based on NIO instead of the old native socket implementation. This change was introduced in Java 13 (JEP 353) and led to an increase in DirectByteBuffer allocations. Example stack trace from a single thread:

java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:332)
java.base/sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:243)
java.base/sun.nio.ch.NioSocketImpl.tryRead(NioSocketImpl.java:258)
java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:279)
java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:309)
java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)

In the stack trace you can see that the new Socket implementation, NioSocketImpl, is in use. Internally, it calls sun.nio.ch.Util.getTemporaryDirectBuffer. If we look at the code of this Util class, we can see that it keeps an array of DirectByteBuffers for caching and reuse. In our case each such cache holds up to 1024 slots, and the size of each cached buffer is uncapped. On top of that, the cache is a thread local, so its lifespan equals the lifespan of the thread that called the method. This means that if you have long-running threads, the allocated byte buffers stay allocated and look like a leak.
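To make the mechanism concrete, here is a minimal, hypothetical sketch (the class name and sizes are made up for illustration) of how a single blocking read into a large heap array ends up parking a direct buffer of the same size in the reading thread's cache:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class TemporaryDirectBufferDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {

            OutputStream out = client.getOutputStream();
            InputStream in = accepted.getInputStream();
            out.write(1); // one byte so the read below can return

            byte[] largeHeapBuffer = new byte[8 * 1024 * 1024]; // 8 MB read buffer
            // On Java 13+ this blocking read goes through NioSocketImpl, which copies
            // via a temporary DirectByteBuffer sized to the requested read length
            // (8 MB here, uncapped as described above) and then parks that buffer
            // in the current thread's buffer cache.
            in.read(largeHeapBuffer);

            // The 8 MB direct buffer now stays in this thread's cache for as long as
            // the thread lives, unless capped with -Djdk.nio.maxCachedBufferSize.
            Thread.sleep(60_000); // keep the thread alive to observe the retained memory
        }
    }
}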

To solve this problem we used the option -Djdk.net.usePlainSocketImpl to enable plain sockets, and as a side improvement we capped the cached buffers at 256 KB using the option -Djdk.nio.maxCachedBufferSize.

Improved Encapsulation of JDK Internals

In Java 17 the encapsulation of JDK internals was significantly strengthened, and some packages became unavailable for reflection. This led to different behavior in the Netty library. More specifically, Netty decides whether to allocate direct buffers with or without a Cleaner. On Java 8 it used direct allocation without a Cleaner, while on Java 17 we could see it going through the ByteBuffer.allocateDirect method instead. This happens because the method useDirectBufferNoCleaner returns false when Netty cannot gain access to one of the constructors of DirectByteBuffer, and at DEBUG level we were seeing:

Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module

The solution to this problem was to open the java.nio package for reflection by using this option:
--add-opens=java.base/java.nio=ALL-UNNAMED
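The failure is easy to reproduce outside Netty with a few lines of reflection. The hypothetical check below mirrors the kind of probe a library performs; it fails on Java 17 unless the JVM is started with the --add-opens option above:

import java.lang.reflect.Constructor;

public class DirectByteBufferAccessCheck {
    public static void main(String[] args) {
        try {
            // The same private constructor from the error message: DirectByteBuffer(long, int)
            Class<?> clazz = Class.forName("java.nio.DirectByteBuffer");
            Constructor<?> ctor = clazz.getDeclaredConstructor(long.class, int.class);
            // Throws InaccessibleObjectException on Java 17 unless the JVM is started
            // with --add-opens=java.base/java.nio=ALL-UNNAMED
            ctor.setAccessible(true);
            System.out.println("DirectByteBuffer(long, int) is accessible");
        } catch (Exception e) {
            System.out.println("No access: " + e);
        }
    }
}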

Pitfall 2: String Deduplication

The second issue we encountered was around the string deduplication option, -XX:+UseStringDeduplication. It turns out that the version we were running on most other services, Java 17.0.1, has a bug in this logic. What we observed was that, as time passed, native memory kept increasing. We took a baseline, did a diff the next day, and saw that the String Deduplication native memory category was constantly growing. The bug is fixed in newer releases, so we moved to 17.0.6 and the problem was resolved.

Pitfall 3: Object Monitors Deflation

The rollout of the upgraded version was going well on most of our message delivery clusters, but on some of them we again experienced an increase in RSS. The interesting thing was that it happened whenever there was a spike in thread creation; in the charts below this correlation can be seen quite easily. So what was going on, and why did this happen?

Again, by inspecting the Native Memory Tracking output, we saw that the Object Monitors category was growing during these periods and was not being reclaimed.

It turned out that there was another bug in the Java version we were using, this time around object monitor deflation. Without going too deep, I want to give you a summary of what object monitors are, what they are used for, and how and when they are cleaned up.

The monitor is essentially the main mechanism Java uses to support synchronization. ObjectMonitors are associated with Java objects on an as-needed basis; this is called “inflation”. Operations like Object.wait() or contended synchronization cause inflation. When an ObjectMonitor becomes idle, it is eligible for “deflation”. Inflation and deflation of ObjectMonitors are invisible to a Java program.

Historically, in Java 8 the deflation of ObjectMonitors happened during a safepoint, but since Java 15 it is done asynchronously. Two parameters control this cleanup: the configurable AsyncDeflationInterval and the MonitorUsedDeflationThreshold (MUDT). The first has a default value of 250 ms, which means that every 250 ms the JVM checks whether monitor usage is above the MUDT, and only then deflates. That threshold may not be easily reached in a system that is spawning a lot of threads, so idle monitors pile up. This was fixed by introducing a new option, GuaranteedAsyncDeflationInterval, set to a larger interval than AsyncDeflationInterval, which guarantees that deflation runs at the specified interval regardless of the MUDT threshold. If you are interested in more details, you can see this issue in the JDK bug tracker for an example of how it is reproduced, how it was introduced, and how it was fixed. Finally, we upgraded to the latest Java version available at the time, 17.0.9, and the service has been running smoothly.
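To illustrate the inflation side of this, here is a hypothetical stress sketch (the class name and counts are made up, and it is not a reproduction of our production workload) in which every Object.wait() call inflates one monitor, leaving thousands of idle ObjectMonitors behind for the deflation logic to clean up:

import java.util.ArrayList;
import java.util.List;

public class MonitorInflationDemo {
    public static void main(String[] args) throws Exception {
        List<Object> locks = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            Object lock = new Object();
            locks.add(lock); // keep the lock objects alive so their monitors stay idle
            synchronized (lock) {
                lock.wait(1); // Object.wait() forces inflation of this monitor
            }
        }
        // Thousands of idle ObjectMonitors now exist; on the affected builds they are
        // deflated only once the MonitorUsedDeflationThreshold is crossed, which shows
        // up as a growing Object Monitors category in Native Memory Tracking.
        Thread.sleep(120_000); // keep the process alive so it can be inspected
    }
}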

Learnings

During this process we learned a lot about Java and its internals, but the two most important lessons are these: lower your expectations when doing major release upgrades, and observability is key to resolving many issues. During our debugging we ran the Native Memory Tracking tool directly on the machines a lot in order to see these issues. It would have been very nice to have a Grafana dashboard for these metrics; correlating OOM events with the respective increases in native memory would have been much easier. There was no ready-made solution out there, so we wrote a small library that does exactly that. Soon you will see an article about improving our native memory tracking observability.
