Friday, July 31, 2009

A way to find a thread hang problem

Got a conference call early in the morning two days ago; it was from the operations team. They reported that one of our apps in production had a problem with threads hanging.

I figured a thread dump would tell me something, so I took more than 10 thread dumps within 20 minutes. Examining them, I quickly found a thread that had been running for over 20 minutes; it was trying to retrieve data from one of our internal servers. After restarting that internal server, the thread hang was gone.

During working hours, I looked at the piece of code that calls that internal server and found that it uses URLConnection. Since we are on JDK 1.4, URLConnection has no timeout support.

A Google search turned up a potential solution:
http://www.tek-tips.com/viewthread.cfm?qid=1219068&page=1
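
I won't copy the thread's solution here, but one JDK 1.4-era workaround I know of is the Sun networking system properties, which apply a default connect/read timeout to every URLConnection created by the default protocol handlers (setConnectTimeout()/setReadTimeout() only arrived in Java 5). A minimal sketch, with a hypothetical internal URL; the properties are usually passed as -D arguments on the JVM command line and must be set before the first connection is made:

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        // Equivalent to -Dsun.net.client.defaultConnectTimeout=10000
        //               -Dsun.net.client.defaultReadTimeout=10000
        System.setProperty("sun.net.client.defaultConnectTimeout", "10000"); // 10 seconds
        System.setProperty("sun.net.client.defaultReadTimeout", "10000");    // 10 seconds

        // Hypothetical endpoint, standing in for our internal server.
        URLConnection conn = new URL("http://internal-server.example.com/data").openConnection();
        InputStream in = conn.getInputStream(); // now fails with an exception instead of hanging forever
        try {
            // ... read and process the response ...
        } finally {
            in.close();
        }
    }
}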

Tuesday, July 28, 2009

Add cURL to Cygwin

http://cygwin.com/

Run setup.exe any time you want to update or install a Cygwin package.

To install cURL for the first time, choose "cURL" from the packages list, in the "Web" category.

If you already have it installed, the update will be pre-selected.

Tuesday, July 14, 2009

Java Out Of Memory (OOM) types


A memory leak is not hard to detect: you will get java.lang.OutOfMemoryError when the system runs short of memory for normal operations.
Tracking it down to its root cause, however, is pretty hard, partly because it usually shows up only in production after the application has been running for several days.

In this post, I will classify the types of OOM I have faced and the ways to track down the cause of each.
  1. Perm generation, too many classes loaded or being generated.
  2. Perm generation, String.intern() overloaded.
  3. Old generation, memory overused.
  4. Old generation, objects are not released as they are supposed to be.
  5. Old generation, HTTP session objects are not cleared out after the session timeout.
Reference document:
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html is mainly about GC performance tuning, but it also gives a nice description of what the perm generation and the old generation are.

Good tools to start with
  • JProfiler
  • jmap
  • Eclipse Memory Analyzer (MAT)

There are many documents online about how to use these tools, so I will skip that part.

You can view my other post on setting up JProfiler:
http://michaeltechzone.blogspot.com/2009/01/run-setup-steps-for-remote-profiling-on.html

A heap dump is a dump of all the live objects and classes. If you take it in binary format as below, you can use Memory Analyzer to examine those objects:
jmap -heap:format=b <PID>
where <PID> is the process id, which can be obtained with jps.

My client runs high-traffic websites, over 4 million hits per day. They had memory issues for a long time, and many experts had been consulted but failed to solve the OOM. Memory issues happened in both the perm and the old space. The servers had to be restarted every day, and increasingly they had to be restarted during the day as well.

The client had a dashboard that displayed memory usage visually, so we could see the memory usage charts easily. The application runs on the ATG Dynamo server.

Type 1: Perm generation, too many classes loaded or being generated.
Here's the chart before the fix:


Two things were done to help find the issue:
  • -verbose:class, to find the offending class: look for a class that is repeatedly getting loaded but *not* getting unloaded.
  • Multiple heap dumps taken at different times: one right after server start, one in the middle of the day, and one at OOM.
After comparing the classes loaded across those heap dumps and examining the class load/unload output, I found that a class loader from one of our third-party libraries loaded the same type of class multiple times.
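
For illustration only (the offending code was inside the third-party library, so this is not its actual source): the general pattern is that a fresh class loader is created again and again, each one defines its own copy of the same class, and something holds on to those classes so they can never be unloaded. A hypothetical sketch:

import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class PermGenClassLeak {
    public static void main(String[] args) throws Exception {
        URL[] classpath = { new URL("file:lib/third-party.jar") }; // hypothetical jar
        List loadedClasses = new ArrayList();
        while (true) {
            // A brand-new loader defines the class again instead of reusing an
            // already-loaded copy; keeping the Class objects keeps their loaders
            // alive, so nothing can ever be unloaded and perm gen keeps growing.
            ClassLoader loader = new URLClassLoader(classpath, null);
            loadedClasses.add(loader.loadClass("com.example.GeneratedHelper")); // hypothetical class
        }
    }
}

With -verbose:class this shows up as the same class name being loaded over and over with no matching unload messages.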

With the fix, we see a nice chart:


Type 2: Perm generation, String.intern() overloaded

Personally, I have not met this problem myself; I list it here to make the set of OOM types complete. The following link describes such a case: http://www.thesorensens.org/2006/09/09/java-permgen-space-stringintern-xml-parsing/
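
As a rough illustration of the pattern that article describes (not code taken from it): interning a large number of distinct strings, for example values pulled out of parsed XML documents, fills the interned-string pool, which lives in the perm generation on the HotSpot JVMs of that era:

import java.util.ArrayList;
import java.util.List;

public class InternOverload {
    public static void main(String[] args) {
        List values = new ArrayList();
        for (int i = 0; ; i++) {
            // Every interned value here is unique, so the perm-gen string pool
            // only grows and eventually triggers
            // "java.lang.OutOfMemoryError: PermGen space".
            values.add(("element-value-" + i).intern());
        }
    }
}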

Type 3: Old generation, memory over used.

This kind of OOM is the most frequent; I have met and fixed it several times. It is the easiest one to identify and fix. Normally it happens when developers push too many objects into an in-memory cache.

Looking at a heap dump taken when the OOM happens, you will find the culprit at the top of the heap, sorted by retained size, in Memory Analyzer's Dominator Tree view.

The problem I fixed was that even though we only used one attribute of a big object, the whole object was being cached. With many instances of that big object cached, there was not much memory left. After the fix of caching only the required attribute, memory usage improved.
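
A minimal sketch of the kind of change we made, with hypothetical class and attribute names (the real objects were much bigger):

import java.util.HashMap;
import java.util.Map;

class Product {                          // stand-in for the "big object"
    private final String displayName;    // the only attribute we actually used
    // ... many other heavy fields in the real class ...
    Product(String displayName) { this.displayName = displayName; }
    String getDisplayName() { return displayName; }
}

public class ProductNameCache {
    private final Map nameById = new HashMap();   // productId -> displayName

    public void put(String productId, Product product) {
        // Cache only the attribute we need, not the whole object graph.
        nameById.put(productId, product.getDisplayName());
    }

    public String getDisplayName(String productId) {
        return (String) nameById.get(productId);
    }
}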


Type 4: Old generation, objects are not released as they are supposed to be

This is the true memory leak type: memory keeps running away as the server runs.

Our memory issue was related to the Oracle connection pool. At first, I saw that the connection, result set, and statement were all closed after use in the finally clause.
So it should have worked, but it did not, which was pretty odd.

With further investigation, I found that one SQL Statement variable was being reused.
Something like this:

Statement st = conn.createStatement();
ResultSet rs = st.executeQuery(query_1);

if (!rs.next()) {
    // the first statement is never closed before st is reassigned
    st = conn.createStatement();
    rs = st.executeQuery(query_2);
}

This left the query_1 statement dangling without ever being closed, yet still live in the connection pool. That caused the memory leak.

After I changed the logic to close the statement after running the first query, before reassigning it for query_2, memory went back to normal.
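
Here is a minimal sketch of the corrected pattern, with hypothetical query strings (the real code sat behind the connection pool, but the shape of the fix is the same):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TwoQueryLookup {
    // Hypothetical queries standing in for query_1 and query_2.
    private static final String QUERY_1 = "SELECT id FROM item WHERE sku = 'ABC'";
    private static final String QUERY_2 = "SELECT id FROM archived_item WHERE sku = 'ABC'";

    public void lookup(Connection conn) throws SQLException {
        Statement st = conn.createStatement();
        try {
            ResultSet rs = st.executeQuery(QUERY_1);
            boolean found = rs.next();
            rs.close();
            if (!found) {
                st.close();                    // the fix: release the first statement
                st = conn.createStatement();   // before replacing the reference
                rs = st.executeQuery(QUERY_2);
                // ... read the fallback result ...
                rs.close();
            }
        } finally {
            st.close();   // the finally clause now always closes the current statement
        }
    }
}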


Before fix:


After fix:



Type 5: Old generation, HTTP session objects are not cleared out after the session timeout

For all the above types of OOM, the memory charts of each JVM instance look almost the same.

A few months after I fixed those memory issues for my client, they hit OOM again.

This type of OOM behaved differently. It happened randomly, and the server died very quickly. The heap dump analysis did not show much, except that the session data took up a lot of memory.

Before fix




I suspected a deadlock, but did not find one with the IBM thread dump analyzer.

Then I suspected there was an infinite loop. I took a thread dump every minute for half an hour. Examining those thread dumps, I noticed that a few thread IDs kept appearing in the same place every time. Yes, it was an infinite loop in the function that clears the user session. So the session data was never cleared up, which in turn caused the out of memory.
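
This is not the client's actual code, just a purely illustrative sketch of how a session-clearing loop can spin forever (hypothetical UserSession type and logic):

import java.util.Iterator;
import java.util.List;

interface UserSession {        // hypothetical stand-in for the app's session type
    boolean isExpired();
    void invalidate();
}

public class SessionCleaner {
    public void clearExpired(List sessions) {
        Iterator it = sessions.iterator();
        while (it.hasNext()) {                              // never becomes false, because
            UserSession s = (UserSession) sessions.get(0);  // it.next() is never called and
            if (s.isExpired()) {                            // the iterator never advances
                s.invalidate();
            }
        }
    }
}

In repeated thread dumps, such a thread keeps showing up in the same method, which is the pattern the repeated dumps exposed.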

With that information about where the infinite loop happened, it did not take long to find the real problem and provide the fix.

After the fix, the memory chart went back to normal.


Find the number of files created and the total lines of code written

In Cygwin on Windows, or on Unix, use the following commands:


Number of Java files

prompt>find . -name "*.java" > java_name.txt

Then look at java_name.txt in an editor to get the number of files.



Count the number of lines of code inside the Java files

prompt>wc -l `find . -regex ".*\.\(java\)"`