All the objects which are created dynamically (using new in C++ and Java) are allocated memory in the heap. At some point in time we might get OutOfMemory error once heap memory is full and still we continue to go on creating objects. So we need to clear heap memory by releasing memory for all those objects which are no longer referenced by the program (or the unreachable objects) so that the space is made available for subsequent new objects. Here garbage collection comes to our rescue, and it automatically releases the heap memory for all the unreferenced objects. In our previous journal entry, we have talked about different GC Collectors, but the actual concepts behind the algorithms which are used by these collectors to do the actual job are quite similar. Lets take a look at different Java garbage collection algorithms.
Java Garbage Collection Algorithms
Before diving into the practical implementation of details of the algorithms lets first understand what are GC Roots.
GC Roots are the objects which are accessible directly outside from the heap. Example:
- Active Threads.
- Static Variables.
- Local variable and input parameters of the currently executing methods
- JNI References.
In general all GC Collectors focus in two areas:
- Find out all objects that are still alive
- Get rid of everything else – the supposedly dead and unused objects.
First part, the census on live objects, is implemented in all collectors with the help of a process called Marking.
All of the algorithms discussed have the same mark phase. Marking phase is about traversing the whole object graph, starting from GC Roots. When GC visits the object, it marks it as accessible and thus alive. All the objects which are not reachable from GC Roots are garbage. Marking requires stop-the-world (STW) pauses, because the running application threads could interfere. How long the STW pause is, depends mostly on the number of visited objects.
GC proceed to start removing the unreachable objects once the Mark phase is completed.
Removing Unused Objects
Garbage Collection algorithms can be divided into three broad groups:
After the marking phase, we have the memory space which is occupied by visited and unvisited objects. All of the unvisited objects are considered free and can be re-used to allocate new objects. It is simple, but because the dead objects are not necessarily next to each other, we end up having a fragmented memory. That’s not bad, but if no single region is large enough to accommodate the allocation, the allocation is still going to fail (with an OutOfMemoryError in Java).
Mark-Sweep-Compact algorithms solve the shortcomings of Mark and Sweep (Fragmented Memory). This algorithm moves all marked objects to the beginning of the memory region. Compacting the heap isn’t for free, it comes with some downsides also. This approach increases the GC pause duration as we need to copy all objects to a new place and update all references of such objects. Using such approach the location of the free space is always known and no fragmentation issues are triggered either.
Mark and Copy algorithms are very similar to the Mark and Compact as they too relocate all live objects. The important difference is that the target of relocation is a different memory region as a new home for survivors. The previously occupied region is considered to be free. The advantages of this algorithm is that copying can occur simultaeously with marking during the same phase. Next time mark-copy is executed, all the alive objects are moved back to the previous memory region. As you can imagine, this of course leads to a memory compaction. Unfortunately, it requires additional extra region large enough to fit all live objects at any given point in time.