Weblogic Admin Tutorials: Weblogic Trouble Shooting Issues

________________________________________________________________________________

1. How to set the classpath?

WL_HOME\server\bin\setWLSEnv.cmd (Windows)

WL_HOME/server/bin/setWLSEnv.sh (Unix; source it with ". ./setWLSEnv.sh" so the variables stay in your shell)

2. How to set the domain environment?

DOMAIN_HOME\bin\setDomainEnv.cmd (Windows)

DOMAIN_HOME/bin/setDomainEnv.sh (Unix)

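3. How to increase WLS memory?

Set the minimum and maximum heap to the same size so the heap is not resized at run time.

$ java -Xms32m -Xmx32m ... -> allocates a fixed 32 MB heap (-ms and -mx are the legacy spellings of these flags).

Example for a 2 GB heap: -Xms2048m -Xmx2048m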
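As a minimal sketch of the "same minimum and maximum" rule, the helper below builds both heap flags from a single size in megabytes. The function name heap_opts is hypothetical, and the JAVA_OPTIONS variable in the usage comment is an assumption about how your start script picks up JVM arguments.

```shell
# heap_opts SIZE_MB: print matching -Xms/-Xmx flags (hypothetical helper).
heap_opts() {
  size_mb="$1"
  case "$size_mb" in
    # Reject empty input and anything that is not a whole number.
    ''|*[!0-9]*) echo "heap_opts: size must be a whole number of MB" >&2; return 1 ;;
  esac
  printf '%s\n' "-Xms${size_mb}m -Xmx${size_mb}m"
}

# Usage (assumes the start script honors JAVA_OPTIONS):
# JAVA_OPTIONS="$(heap_opts 2048)"; export JAVA_OPTIONS
```

Deriving both flags from one number makes it hard to end up with a minimum and maximum that differ by accident.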

4. How to increase PermGen space?

Increase the maximum PermGen size: -XX:MaxPermSize=256m (default = 64m). This applies to HotSpot JVMs up to Java 7; Java 8 replaced PermGen with Metaspace.

5. How to enable verbose GC?

JRockit: JAVA_OPTIONS="-Xverbose:memory,gcreport,gcpause -Xverbosetimestamp"

Sun JVM: JAVA_OPTIONS="-verbose:gc"

6. How to enable a core dump?

Sun JVM: -XX:+ShowMessageBoxOnError

JRockit JVM: -Djrockit.WaitOnError

Windows: Dr. Watson

7. How to check whether the server is listening on the specified port?

telnet <IP> <port>

Ex: telnet 199.129.212.1 8080

8. How to check whether the server host is alive?

ping <IP>

Ex: ping 199.129.212.1

9. How to start a Managed Server in Managed Server Independence (MSI) mode?

nohup ./startManagedWebLogic.sh <ManagedServerName> &

10. How to check the WebLogic Server process ID?

/usr/ucb/ps -auxwww | grep java

11. How to check whether a port is already in use?

netstat -na | grep <port>

lsof -i tcp:<port> (for example, lsof -i tcp:7001)
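A plain grep for the port number also matches longer ports (7001 matches 17001) and connections that are not listening. The sketch below is one way to tighten the check. It reads netstat -na style output on standard input and succeeds only when a LISTEN socket ends in the exact port. The function name port_in_use is hypothetical, and the column layout of netstat varies by platform, so treat this as a starting point.

```shell
# port_in_use PORT: read `netstat -na` style output on stdin and succeed
# if any socket in LISTEN state is bound to exactly PORT (hypothetical helper).
port_in_use() {
  port="$1"
  awk -v p="$port" '
    /LISTEN/ {
      for (i = 1; i <= NF; i++) {
        # A local address ends in ":PORT" (Linux) or ".PORT" (Solaris).
        n = length($i) - length(p)
        if (n > 1 && substr($i, n + 1) == p && substr($i, n, 1) ~ /[:.]/) found = 1
      }
    }
    END { exit(found ? 0 : 1) }
  '
}

# Usage: netstat -na | port_in_use 7001 && echo "port 7001 is in use"
```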

12. How to run the multicast test in a cluster?

java utils.MulticastTest -n <name> -a <multicast-address> -p <multicast-port> -s <send-interval>

13. How to access the admin console?

http://<hostname-or-IP>:<port>/console (or https:// when SSL is enabled)

Ex: http://localhost:7001/console

14. How to find the WebLogic version?

Admin console: Environment > Servers > <server> > Monitoring > WebLogic Version.

Command line:
---------------
Go to <domain>/bin
Run setDomainEnv (.cmd on Windows; source setDomainEnv.sh on Unix)
Run "java weblogic.version"

15. How to take a thread dump?

ps -ef | grep java

kill -3 <PID> (Windows: Ctrl+Break in the server's console window)

A more convenient way to take a thread dump:

jstack <PID>

16. How to start the servers?

Admin Server: ./startWebLogic.sh

Managed Server: ./startManagedWebLogic.sh <ManagedServerName>

Node Manager: ./startNodeManager.sh

17. How to copy a file from one server to another?
scp <filename> <user>@<destination-server>:<destination-path>

18. How do you list all open files at any given moment?
lsof

19. How to check the elevated privileges that you hold?
sudo -l (lists the commands you may run with sudo; sudo -i opens a root login shell)

20. How to list all files, oldest first?
ls -ltr

_________________________________________________________________________________

Some useful commands:

ls: lists the files and directories in the current directory.

pwd: prints the working directory.

top: displays CPU and memory utilization.

df -h: free disk space in human-readable form.

du -k: disk usage in kilobytes.

free -m: how much RAM is used and available, in megabytes.

ps -ef | grep weblogic: lists the running WebLogic processes.

uptime: how long the system has been up, plus load averages.

vmstat: displays virtual memory statistics.

cat <filename>: prints the file.

tail -50f <filename>: shows the last 50 lines and follows new output.

wc -l <filename>: counts the lines of a text file.

touch <filename>: creates an empty file.

cp <oldfile> <newfile>: copies a file.

keytool -list -v -keystore <keystore>: lists the certificates in a keystore.

kill -9 <PID>: forcibly kills the process.

scp: secure copy between hosts.

______________________________________________________________________________

===========================================

1) How to search for a string from top to bottom in the vi editor (string = weblogic)?

A) /weblogic

2) How to save and quit from the vi editor?

A) :wq!

3) What is the advantage of the nohup command?

A) nohup keeps the process running after you log out of the system.

Syntax: nohup <command> &

4) Difference between ping and tracert?

A)

ping: 1) Checks connectivity to the destination. 2) Displays the result for the destination as a whole.

tracert: 1) Shows the route packets take from one place to the destination, hop by hop. 2) Displays at most 30 hops by default.

5) How to execute Unix commands from within vi?

A) :!ls

6) How to tar dir1, dir2 and dir3 into new_dir.tar?

A) Syntax: tar -cvf new_dir.tar dir1 dir2 dir3

7) How to display the IP addresses and port numbers in use?

A) netstat -anp

8) How to delete a directory recursively and forcibly?

A) rm -rf <dirname>

9) How to open a file page by page?

A) more filename

10) How to hide a file (file name = tuxedo)?

A) mv tuxedo .tuxedo

11) How to undo in the vi editor?

A) u

12) How to go to line 100 in the vi editor?

A) 100G (or :100)

13) How to display the last 100 lines of a file?

A) tail -100 filename

14) How to retrieve fields from a file?

A) cut

15) How to compress the files in a directory (/home/directory)?

A) gzip -r /home/directory

16) How to go to the end of the line in vi?

A) $ (G goes to the last line of the file)

17) How to display the directory count in the current directory?

A) ls -l | grep '^d' | wc -l

18) How to display all files ending with "log"?

A) ls *log

19) How to append data to an existing file?

A) cat >> filename

20) How to find the disk space of the file system?

A) df -h

21) Write the syntax for the scp command.

A) scp filename root@ipaddress:filename

22) How to display lines as they are added to a file?

A) tail -f filename

23) How to display the IP addresses and port numbers in use?

A) netstat -anp

24) Which command is used to connect to a remote server?

A) telnet ipaddress

25) How to replace a string in the vi editor?

A) :%s/oldstring/newstring/g

25) How to replace a string in Unix?

A) sed 's/oldstring/newstring/g' filename

26) How to display the first 10 lines of a file?

A) head -10 filename

27) Syntax to tar and untar files?

A) tar -cvf filename.tar file1 file2 file3

tar -xvf filename.tar

28) How to display hidden files?

A) ls -a

29) How to delete blank lines from a file?

A) grep -v "^$" filename > temp

mv temp filename

30) How to display all currently running processes?

A) ps -ef

31) Syntax to zip and unzip a file?

A) gzip filename

gunzip filename.gz

=============================================================================================

Troubleshooting Tips on Windows

Hung, Deadlocked, or Looping Process

Post-mortem Diagnostics, Memory Leaks

Monitoring

Other Functions

Hung, Deadlocked, or Looping Process

· Print thread stack for all Java threads:

o Control-Break

o jstack pid

· Detect deadlocks:

o Request deadlock detection: JConsole tool, Threads tab

o Print information on deadlocked threads: Control-Break

o Print lock information for a process: jstack -l pid

· Get a heap histogram for a process:

o Start Java process with -XX:+PrintClassHistogram, then Control-Break

o jmap -histo pid

· Dump Java heap for a process in binary format to file:

o jmap -dump:format=b,file=filename pid

Post-mortem Diagnostics, Memory Leaks

· Examine the fatal error log file. The default file name is hs_err_pid<pid>.log in the working directory.

· Create a heap dump:

o Start the application with HPROF enabled: java -agentlib:hprof=file=file,format=b application; then Control-Break

o Start the application with HPROF enabled: java -agentlib:hprof=heap=dump application

o JConsole tool, MBeans tab

o Start VM with -XX:+HeapDumpOnOutOfMemoryError; if OutOfMemoryError is thrown, VM generates a heap dump.

· Browse Java heap dump:

o jhat heap-dump-file

· Get a heap histogram for a process:

o Start Java process with -XX:+PrintClassHistogram, then Control-Break

o jmap -histo pid

Monitoring

( jstat is not available on Windows 98 or Windows ME.)

Note: The vmID argument for the jstat command is the virtual machine identifier. See the jstat man page for a detailed explanation.

· Print statistics on the class loader:

o jstat -class vmID

· Print statistics on the compiler:

o Compiler behavior: jstat -compiler vmID

o Compilation method statistics: jstat -printcompilation vmID

· Print statistics on garbage collection:

o Summary of statistics: jstat -gcutil vmID

o Summary of statistics, with causes: jstat -gccause vmID

o Behavior of the gc heap: jstat -gc vmID

o Capacities of all the generations: jstat -gccapacity vmID

o Behavior of the new generation: jstat -gcnew vmID

o Capacity of the new generation: jstat -gcnewcapacity vmID

o Behavior of the old and permanent generations: jstat -gcold vmID

o Capacity of the old generation: jstat -gcoldcapacity vmID

o Capacity of the permanent generation: jstat -gcpermcapacity vmID

· Monitor objects awaiting finalization:

o JConsole tool, VM Summary tab

o getObjectPendingFinalizationCount method in the java.lang.management.MemoryMXBean class

· Monitor memory:

o Heap allocation profiles via HPROF: java -agentlib:hprof=heap=sites

o JConsole tool, Memory tab

o Control-Break prints generation information.

· Monitor CPU usage:

o By thread stack: java -agentlib:hprof=cpu=samples application

o By method: java -agentlib:hprof=cpu=times application

o JConsole tool, Overview and VM Summary tabs

· Monitor thread activity:

o JConsole tool, Threads tab

· Monitor class activity:

o JConsole tool, Classes tab

Other Functions

· Interface with the instrumented Java virtual machines:

o Monitor for the creation and termination of instrumented VMs (not Windows 98 or Windows ME): jstatd daemon

o List the instrumented VMs (not Windows 98 or Windows ME): jps

o Provide interface between remote monitoring tools and local VMs (not Windows 98 or Windows ME): jstatd daemon

o Request garbage collection: JConsole tool, Memory tab

· Dynamically set, unset, or change the value of certain Java VM flags for a process:

o jinfo -flag flag pid

· Pass a Java VM flag to the virtual machine:

o jconsole -J flag ...

o jhat -J flag ...

· Report on monitor contention:

o java -agentlib:hprof=monitor=y application

· Evaluate or execute a script in interactive or batch mode:

o jrunscript

· Interface dynamically with an MBean, via JConsole tool, MBean tab:

o Show tree structure.

o Set an attribute value.

o Invoke an operation.

o Subscribe to notification.

· Run interactive command-line debugger:

o Launch a new VM for the class: jdb class

o Attach debugger to a running VM: jdb -attach address

__________________________________________________________________________________________

Troubleshooting Tips on Solaris OS and Linux

Monitoring Tools

Java VisualVM

Java™ VisualVM is a new monitoring and profiling tool for troubleshooting Java applications. It incorporates various technologies, including jvmstat and JMX, as well as CPU and memory profiling, to provide one easy-to-use integrated visualization tool. Developers can rapidly create their own extensions using a public API, and may share them with the community on a central repository.

· Display local and remote Java applications.

· Display application configuration and runtime environment.

· Monitor application memory consumption and runtime behavior.

· Monitor application threads.

· Profile application performance or analyze memory allocation. (Local applications only.)

· Create and display thread dumps.

· Create and browse heap dumps.

· Analyze core dumps.

· Analyze applications offline.

· Get additional plugins contributed by the community.

· Write and share your own plugins.

JConsole

Launch a GUI to monitor and manage Java applications and Java VMs on a local or remote machine.

· Connection to Java process, host, or JMX agent.

· Graphical overview of CPU usage, heap memory, threads, classes.

· Summary of key data, for example, uptime, compilation time, objects pending finalization, and more.

· Memory statistics, including garbage collection.

· Request garbage collection.

· Thread statistics.

· Deadlock detection.

· Class statistics.

· Tree structure of all platform and application MBeans.

· Set the value of an MBean attribute.

· Invoke operation on an MBean, for example, perform a heap dump.

· Subscribe to notification for an MBean.

· Information about the virtual machine, the compiler, the operating system.

· Pass flags to VM on which JConsole is running.

jps

List instrumented Java virtual machines.

jstat

Display performance statistics for an instrumented Java VM:

· Behavior of the class loader

· Behavior of the HotSpot Just-in-Time compiler, totals and by method

· Behavior of the GC heap

· Behavior and sizes of the generation areas

jstatd

· Monitor for creation and termination of instrumented HotSpot Java VMs.

· Provide an interface for remote monitoring tools to attach to Java VMs.

Debugging Tools

HPROF profiler

Writes class profiling information to a file or a socket, in ASCII or binary.

· Heap allocation profiling.

· Heap dump.

· CPU usage - for threads, methods.

· Monitor contention profiling.

To invoke the HPROF tool: java -agentlib:hprof ToBeProfiledClass

To print the complete list of options: java -agentlib:hprof=help

jdb

Launch a simple interactive command-line debugger.

· Display Java objects and primitive values.

· List currently running threads.

· Dump the current thread stack.

· Set breakpoints.

· Step through execution.

· Examine exceptions.

jhat

Parse a binary heap dump, launch a web browser, and present standard queries.

· Execute standard queries, for example, classes, objects, class instances, reference chains from object rootset, reachable objects, and more.

· Turn on or off object allocation tracking.

· Turn on or off object reference tracking.

· Specify objects to exclude from "reachable objects" query.

· Pass flags to the Java VM on which jhat is running.

· Develop custom queries with the built-in Object Query Language.

· Compare objects in two dumps.

jinfo

· Print command line flags and system properties for a running process, from a core file, or for a remote debug server.

· Dynamically set, unset, or change the value of certain Java VM flags for a process.

jmap

· Print shared object mappings for a process, a core file, or a remote debug server.

· Dump the Java heap (all objects or only live objects) of a process, a core file, or a remote debug server in binary format to a file.

· Force the Java heap dump of a process if the process does not respond.

· Print a heap summary for a process, a core file, or a remote debug server.

· Print a histogram (all objects or only live objects) of a process, a core file, or a remote debug server.

· Print information on objects awaiting finalization.

· Print class loader statistics of the permanent generation.

· Pass flags to the VM.

jsadebugd

Serviceability Agent Debug Daemon, which acts as debug server.

· Attach to a Java process or a core file.

· Remote clients can attach to the server using RMI.

jstack

· Print stack traces of threads for a process, core file, or remote debug server

· Print information about concurrent locks for a process, core file, or remote debug server

· Force the stack dump for a process if the process does not respond

Scripting Tools

jrunscript

· Command-line script shell: Evaluate or execute a script in interactive or batch mode.

· Pass flag to the VM.

· Set Java system properties.

_________________________________________________________________________________

SV1: OOM, Native OOM, Server Crash, High CPU Utilization, Server down/Unknown

SV2: 404, 403, Users Unable to access some application and URL, application errors, application responding slowly, application not working , application not opening, not getting authenticated, blank page.

SV3: Log file not rotating, high disk space usage on servers, stack overflow, thread count, SiteScope alert, error while uploading WAR file.

SV4: User creation errors.

________________________________________________________________________________

1. OOM

Ø Log in to the corresponding server through PuTTY.

Ø Check the status of the server instances.

Ø Check the server logs and out logs for OutOfMemoryError.

Ø Collect the access logs from the time of the OOM; if the server(s) are still in the running state, also take a thread dump.

Ø Analyze the thread dump to find whether the OutOfMemoryError was caused by the application or the server.

Ø If the server is not in the running state, restart it.

OutOfMemory during deployment:

Ø If the application is large (more than 100 JSPs), we might encounter this problem with the default JVM settings.

Ø The reason is that the permanent generation (PermGen) fills up.

Ø The JVM uses this space to store its internal data structures as well as class definitions. JSP-generated class definitions are also stored here.

Ø PermGen is outside the Java heap and cannot expand dynamically beyond its configured maximum.

Ø The fix is to increase it by passing this argument in the server's startup script: -XX:MaxPermSize=128m (default is 64m)

2. SiteScope alerts:

Ø Log in to the server.

Ø Check the server status, particularly at the time of the SiteScope alert.

Ø Check the logs (server/out) for any errors and exceptions at the time of the SiteScope alert.

3. High CPU utilization:

Ø Log in to the corresponding server through PuTTY.

Ø Check the CPU utilization of the server instances:

Ø ps -ef, top, or prstat

Ø AIX: topas or psstat

Ø Make sure that the instances are running as the weblogic user:

Ø ps -ef | grep java

Ø Check the logs for any findings regarding the high utilization.

Ø Check the queue threads.

Ø If CPU utilization is at 100%: kill -9 <pid>

Ø Restart the instances to bring the CPU utilization back down.

4. High disk space usage on servers:

Ø Log in to the server.

Ø Check the disk space of the mount that is consuming the most space:

Ø df -kh

Ø Compress log files, or remove the oldest logs, backup WAR files, and access logs:

Ø gzip <filename> or compress <filename>

Ø or rm -rf <filename>

Ø Backup example: mv /apps/bea/domains/gwmp_desktop/ads_web.war /apps/back_up/ads_web.war_bak

Ø Backup syntax: mv <sourcepath> <destinationpath>
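To spot the mounts worth cleaning up first, the df output can be filtered by usage percentage. The sketch below assumes the POSIX single-line layout produced by df -kP (capacity in column 5, mount point in column 6) and mount points without spaces. The function name full_mounts and the 90 percent threshold are illustrative choices, not anything the steps above prescribe.

```shell
# full_mounts THRESHOLD: read `df -kP` style output on stdin and print
# "mount use%" for every file system at or above THRESHOLD percent.
full_mounts() {
  awk -v limit="$1" '
    NR > 1 {                      # skip the header line
      use = $5
      sub(/%$/, "", use)          # "90%" -> "90"
      if (use + 0 >= limit + 0) print $6, $5
    }
  '
}

# Usage: df -kP | full_mounts 90
```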

5. Thread count:

Ø Check the logs for any errors and exceptions.

Ø Check the status of the instances and connection pools.

Ø Check the CPU usage.

Ø Take a thread dump if possible and analyze it.

Ø Check with other subsystems.

Ø Check with the DB team whether there are any database-related issues.

6. Stack overflow:

Ø Check the server logs, the out logs, and the access logs from the time the stack overflow occurred. Restart the instance if required.

Ø If needed, adjust the thread stack size with the -Xss JVM option.

7. Log files not rotating:

Ø Check the status of the server; start it if it is down:

Ø ./startWebLogic.sh

Ø ./startManagedWebLogic.sh <managedservername>

Ø or check through the console.

Ø Check the disk space (if it is full, delete old logs and then restart the server):

Ø du -kh (folder)

Ø df -kh (file system)

Ø Look at the avail and capacity columns (for example, 45% or 90%).

Ø If full, move files away: mv <source path> <destination path>

Ø or delete them: rm -rf <filename, for example adminserver.log>

8. Server errors:

Ø Check the status of the servers; start them if they are down:

Ø ./startWebLogic.sh

Ø ./startManagedWebLogic.sh <managedservername>

Ø or check through the console.

Ø Check the server logs:

Ø /apps/bea/domain/gwmp_destop/logs

Ø Adminserver.log

Ø Managedserver.log

Ø If there are any database errors, check the connection pool and data source:

Ø Services -> JDBC -> Connection Pools, Data Sources

Ø Check the deployment descriptors:

Ø weblogic.xml, web.xml

Ø If the logs show that configuration changes are required, make the changes and then restart the instances one by one if they are in a cluster.

9. Server down/unknown:

Ø Log in to the server through PuTTY and open the admin console.

Ø Check the instance's process from PuTTY and its status in the admin console.

Ø If the process does not exist and the instance status is Unknown, check the logs of the server instance as well as the admin logs:

Ø Admin and managed server logs.

Ø Node Manager status.

Ø Find the root cause from the logs and restart the required instances.

10. URL not working:

Ø Access the URL.

Ø Check the status of the server instances on which this application is deployed.

Ø Check whether the idle threads in the default queue (or the application-specific queue, if any) are zero. Then check the server logs and application logs (out logs) for errors and exceptions.

Ø If idle threads are zero, check which application is consuming all the threads; if it is the same application you are accessing, check with the application owner.

Ø (To resolve this issue the corresponding instances need a restart; before that, check with the application owner why the threads are being consumed.)

Ø If there are application-related exceptions, check with the application owner or check the server logs for exceptions.

Ø If there are DB exceptions related to the application you are accessing, check whether the corresponding connection pool and data source are running fine.

11. Application errors:

Ø Access the application URL.

Ø Check the instances and their status for errors.

Ø Check the server logs as well as the application (out) logs.

Ø Check the connection pool parameters and data source.

12. Users unable to access some application/URL:

Ø Try accessing the URL yourself.

Ø Check whether the users are using the correct URL.

Ø Check the logs of both WebLogic and the web server.

Ø Check the status of the server instances.

Ø Test the connection pools.

Ø Check the DB connectivity.

Ø Check whether the deployment was done properly; if not, redeploy the application while watching the logs for errors.

Ø Check the connection pool user name.

Ø Restart the instance if required.

13. Application error, responding slowly, application not working/not opening, not getting authenticated, blank page:

Ø Check the web server and application server instance status.

Ø Check the logs for any errors/exceptions in both the web server and WebLogic Server.

Ø Check the queue threads, connection pool status, connections, and data source.

Ø Check disk space.

Ø Check whether the log4j property is enabled.

Ø Check whether the deployment was done properly.

14. Error while uploading WAR file:

Ø Check the available disk space.

15. Log locations:

1) Server log

WebLogic Server creates the server log file by default under:

/<domain-name>/<server name>/<server name>.log

The location is configurable.

2) JDBC log

All SQL statements and DB-related exceptions/errors.

This file is created under /<server name>/jdbc.log

3) STDOUT log (if the process output is redirected to a file)

4) Domain log

All domain-level information is logged into this file.

It contains a subset of the server log messages.

<domain name>/<domain name>.log

5) Access log

All HTTP requests are recorded in this log file.

/<server name>/access.log

6) Transaction log

All servers record transactions in the tlog file.

/<server name>/<server name>.tlog

16. Server crash:

Ø A server crash means the WebLogic Java process no longer exists.

Ø A server crash can occur only because of native code. (Java code cannot cause a process to crash.)

Ø Determine all potential sources of native code used by WebLogic Server:

Ø Native IO.

Ø Type 2 JDBC drivers (Type 4 drivers are pure Java and contain no native code).

Ø Native libraries accessed with JNI calls.

Ø SSL native libraries.

Ø The JVM itself. Most of the time the crash comes from the JVM.

Sometimes the JVM produces a small log file (hs_err_pid*.log) that may contain useful information about which library the crash originated from.

Server Crash Analysis

When a JVM crashes, a core file (a binary image of the process) is created. Run pmap and pstack against the core file to find the library that caused the crash.

Demo to figure out offending library using existing pmap & pstack out files.

Check list:

1) hs_err_pid*.log (look for the library that caused the crash)

2) pmap core (core file created in the JVM root dir)

pstack core

3) Use a debugger (gdb, dbx, adb) if the two steps above do not provide any information.

17. Server hang:

A server is said to be hung when:

Ø The process is still alive.

Ø The server does not accept any requests because all the execute threads are busy or stuck for some reason.

Ø No response is sent to clients.

Ø The java weblogic.Admin PING command doesn't return a normal response.

Server Hang Analysis:

The first step is to take multiple thread dumps.

Ø A thread dump is a snapshot of the JVM at the particular instant.

Ø Multiple thread dumps are necessary to conclude that the threads are stuck and not progressing.
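Taking several dumps a fixed interval apart is easy to script. The sketch below uses jstack rather than kill -3 so that each dump lands in its own file. The function name take_thread_dumps, the file naming scheme, and the JSTACK override variable are all illustrative assumptions. It also assumes jstack is on the PATH and is run as the same user as the server process.

```shell
# take_thread_dumps PID COUNT INTERVAL_SECS OUT_DIR
# Writes COUNT jstack dumps, INTERVAL_SECS apart, into OUT_DIR.
# JSTACK may be set to a full path if jstack is not on the PATH.
take_thread_dumps() {
  pid="$1"; count="$2"; interval="$3"; out_dir="$4"
  mkdir -p "$out_dir" || return 1
  i=1
  while [ "$i" -le "$count" ]; do
    "${JSTACK:-jstack}" "$pid" > "$out_dir/threaddump_${pid}_${i}.txt" || return 1
    # Sleep between dumps, but not after the last one.
    [ "$i" -lt "$count" ] && sleep "$interval"
    i=$((i + 1))
  done
}

# Usage: take_thread_dumps <PID> 3 10 /tmp/dumps
```

Comparing the same thread across the resulting files shows whether its stack is changing or whether it is stuck at the same frame.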

Procedure to take thread dumps:

Unix:

Ø Open a shell window and issue the command kill -3 <PID>, where PID is the Java process ID of WebLogic. Thread dumps are written to the STDOUT file.

Windows:

Ø Press Ctrl+Break in the command window where WebLogic is running.

Ø Thread dumps are printed in the same command window.

Windows Service:

Ø Open a command prompt and issue the command (make sure beasvc.exe is in the PATH):

Ø c:\> beasvc -dump -svcname:service-name

Ø Thread dumps are written to the defined log file.

Ø While creating the service, we can provide the log option in the installservice script as:

Ø -log:"d:\bea\domains\mydomain\myserver-stdout.txt"

• Before we analyze thread dumps, it is important to know the common thread states:

1)Runnable [marked as R in some VMs]:

This state indicates that the thread is either running currently or is ready to run the next time the OS thread scheduler schedules it.

2) Object.wait() [marked as CW in some VMs]:

Indicates that the thread is waiting for some condition to be fulfilled.

3) Waiting for monitor entry [marked as MW in some VMs]:

Indicates that the thread is waiting to enter a synchronized block.

These threads are the ones to watch, because they indicate lock contention: the thread is waiting for a lock on an object while some other thread holds it.

In WebLogic, the main worker threads belong to the group weblogic.kernel.Default:

"ExecuteThread: '1' for queue: 'weblogic.kernel.Default'" ...

This is the set of threads to examine for hang and slow-performance issues.

Below is a snapshot of an idle thread waiting for work to be assigned.

On an idle system you would see a lot of threads in this state:

"ExecuteThread: '1' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x031a6308 nid=0x980 in Object.wait() [2dff000..2dffd8c]

at java.lang.Object.wait(Native Method)

- waiting on <0x112cf2c0> (a weblogic.kernel.ExecuteThread)

at java.lang.Object.wait(Object.java:429)

at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:153)

- locked <0x112cf2c0> (a weblogic.kernel.ExecuteThread)

at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:172)
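For a quick overview before reading a dump in detail, the thread header lines can be tallied by state. The sketch below assumes a HotSpot-style dump in which each thread starts with a quoted name on one line, as in the idle-thread sample. Other VMs mark states differently (for example R, CW, MW), so the patterns would need adjusting there. The function name thread_state_counts is hypothetical.

```shell
# thread_state_counts: read a HotSpot thread dump on stdin and print how many
# thread header lines are runnable, in Object.wait(), or waiting for a monitor.
thread_state_counts() {
  awk '
    /^"/ && /runnable/                  { r++ }
    /^"/ && /in Object\.wait\(\)/       { w++ }
    /^"/ && /waiting for monitor entry/ { m++ }
    END { printf "runnable=%d object_wait=%d monitor_entry=%d\n", r, w, m }
  '
}

# Usage: thread_state_counts < threaddump.txt
```

A monitor_entry count that stays high across successive dumps points to the lock contention case described for "waiting for monitor entry".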

• For thread dump analysis and conclusions, let's look at a sample thread dump and drill into it further.

Demo of RSD thread dump (Thread stuck issue on UAT)

Server Performing Slowly

There are many reasons for a server to perform slowly.

The first step is to take thread dumps and see what the threads are doing. If there is nothing wrong with the threads, there are other possible reasons:

Process runs out of memory:

If the Java heap is full, the server process appears to be hung and does not accept requests, because each request needs heap for allocating objects.

So if the heap is full, none of the requests get served; they all fail with java.lang.OutOfMemoryError.

OutOfMemory Analysis:

OutOfMemory can occur because of a real memory crunch or because of a memory leak that fills the heap with objects that are no longer needed but are still referenced.

The first step is to enable GC logging and run the server again:

-XX:+PrintGCDetails

The STDOUT file will then show the garbage collection details.

If the error is caused by a memory leak, we need to use profilers such as Introscope or OptimizeIt to find the source of the leak.

Process size = Java heap + native memory + memory occupied by the executables and libraries.

On 32-bit operating systems, the virtual address space of a process can go up to 4 GB. This is an address-width limitation (2^32 bytes).

Out of this 4 GB, the OS kernel reserves some part for itself (typically 1 to 2 GB).

This is not a limitation on 64-bit machines such as Solaris (SPARC) or Windows on Itanium.

OOM can also occur due to fragmentation. In this situation we can see free memory available but still get OutOfMemory errors.

To understand fragmentation, we need to know the following fact:

Heap allocation can only be contiguous (as per the JVM spec). If a request needs 2 MB of memory, the JVM has to provide a contiguous 2 MB chunk.

Over time, memory allocation becomes scattered and there might not be enough contiguous memory available.

A full GC might not be able to reclaim contiguous space.

This is called fragmentation.

For example, the verbose:gc output might look like the following if the heap is fragmented. There is free memory available, but the JVM still throws an OOM error.

(Most of the fragmentation bugs are resolved in Sun JDK1.4.2_xx)

[GC 4673K->3017K(32576K), 0.0050632 secs]

[GC 5047K->3186K(32576K), 0.0028928 secs]

[GC 5232K->3296K(32576K), 0.0019779 secs]

[GC 5309K->3210K(32576K), 0.0004447 secs]

java.lang.OutOfMemoryError
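To read such output at a glance, each GC line can be reduced to the heap in use before and after the collection, the total heap, and the percentage still occupied afterwards. The sketch below handles only the simple [GC ...K->...K(...K), ... secs] layout shown in the sample; other collectors and logging flags print different formats. The function name gc_summary is hypothetical.

```shell
# gc_summary: read -verbose:gc lines such as
#   [GC 4673K->3017K(32576K), 0.0050632 secs]
# on stdin and print "before_kb after_kb total_kb percent_used_after".
gc_summary() {
  # Keep only GC lines and extract the three K values; other lines are dropped.
  sed -n 's/^\[\(Full \)\{0,1\}GC \([0-9]*\)K->\([0-9]*\)K(\([0-9]*\)K), .*$/\2 \3 \4/p' |
    awk '{ printf "%d %d %d %.1f\n", $1, $2, $3, 100 * $2 / $3 }'
}

# Usage: gc_summary < server.out
```

In the sample, the heap is only about 10 percent occupied after each collection, which is why an OutOfMemoryError at that point suggests fragmentation rather than a genuinely full heap.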

Fragmentation-related issues are caused by bugs in the JVM.

The best approach is to try the latest minor version of the JVM; if that does not work, we need to work with the vendor to get it fixed.

• The following commands on Solaris provide useful information:

vmstat:

Reports statistics about kernel threads, virtual memory, disks, traps, and CPU activity.

sar:

The system activity reporter, an OS utility.

• If the application uses SSL, the server performs more slowly than without SSL.

SSL reduces the capacity of the server by about 33 to 50 percent, depending on the strength of encryption used in the SSL connections.

• Process running out of file descriptors: the server cannot accept further requests because sockets cannot be created. (Each socket consumes a file descriptor.)

The following exceptions are thrown in such cases:

java.net.SocketException: Too many open files

java.io.IOException: Too many open files

In this case the lsof utility helps: it lists all open file descriptors. From that list, we (the application owner) can work out whether it is a bug or expected behavior. If it is expected behavior, the number of file descriptors needs to be increased (the default is 1024).
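A rough way to see how close a process is to its limit is to count the rows lsof reports for it and compare that with the limit. The sketch below is an approximation: lsof -p also lists entries that are not numbered descriptors (such as cwd, txt, and mem), so the count runs slightly high. The function names are hypothetical, and the usage line assumes ulimit -n returns a number rather than "unlimited".

```shell
# fd_count_from_lsof: read `lsof -p <PID>` output on stdin and print the
# number of open-file rows (the header line is skipped).
fd_count_from_lsof() {
  awk 'NR > 1 { n++ } END { print n + 0 }'
}

# fd_headroom USED LIMIT: print how many descriptors remain before LIMIT.
fd_headroom() {
  echo $(( $2 - $1 ))
}

# Usage:
# used=$(lsof -p <PID> | fd_count_from_lsof); fd_headroom "$used" "$(ulimit -n)"
```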

• GC taking a long time (more than 20 seconds).

This looks like a hang to end users.

In this case we need to tune the GC parameters and try the other GC options available. In some cases where GC took very long, incremental GC (-Xincgc) has been useful.

WebLogic Troubleshooting

Communication between Apache and WebLogic

If there is an issue between Apache and WebLogic and the cause is not obvious, enable debugging at the Apache layer. In the httpd.conf file add:

Debug ALL

This creates a file called wlproxy.log under /tmp on the Apache machine. The log contains all the request/response headers exchanged between Apache and WebLogic.

Most of the plug-in issues in WLS 8.1 centered on the attribute "KeepAliveEnabled".

For most socket-related errors, it is worth turning off "KeepAliveEnabled" and repeating the test.

Apache restart and checking the connection counts:

APACHE_HOME\bin\Apache -t        Syntax check

APACHE_HOME\bin\Apache start     Start the server

APACHE_HOME\bin\Apache stop      Stop the server

APACHE_HOME\bin\Apache Restart   Restart the server

APACHE_HOME\bin\Apache -l        List compiled-in modules

_______________________________________________________________________

Error while restarting a WebLogic Server instance

####<Sep 13, 2007 6:45:44 PM IST> <Error> <EmbeddedLDAP> <bng1web2prod> <itms> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1189689344635> <000000> <Error opening the Transaction Log: ./servers/itms/data/ldap/ldapfiles/EmbeddedLDAP.tran (Permission denied)>

####<Sep 13, 2007 6:45:44 PM IST> <Error> <EmbeddedLDAP> <bng1web2prod> <itms> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1189689344637> <000000> <Error Instantiating 'dc=web2prod-Domain': nul

####<Sep 13, 2007 6:45:44 PM IST> <Critical> <EmbeddedLDAP> <bng1web2prod> <itms> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1189689344653> <BEA-171521> <An error occurred while initializing the Embedded LDAP Server. The exception thown is java.lang.ClassCastException: com.octetstring.vde.backend.BackendRoot. This may indicate a problem with the data files for the Embedded LDAP Server. This managed server has a replica of the data contained on the Master Embedded LDAP Server in the Admin server. This replica has been marked invalid and will be refreshed on the next boot of the managed server. Retry the reboot of this server.>

####<Sep 13, 2007 6:45:44 PM IST> <Critical> <WebLogicServer> <bng1web2prod> <itms> <main> <<WLS Kernel>> <> <> <1189689344667> <BEA-000362> <Server failed. Reason:

While restarting a WebLogic 9.2 instance I got the above error. The server starts, but is then forced to shut down again.

Solution:

Go to the server instance directory and browse to the following path:

/local/BEA/weblogic92/domain-name/servers/server-name/data/ldap/ldapfiles

You will find the files listed below in that directory:

-rw-r--r-- 1 weblogic weblogic 79649 Sep 13 18:48 EmbeddedLDAP.data

-rw-r--r-- 1 weblogic weblogic 0 Sep 13 18:48 EmbeddedLDAP.delete

-rw-r--r-- 1 weblogic weblogic 648 Sep 13 18:48 EmbeddedLDAP.index

-rw-r--r-- 1 weblogic weblogic 0 Sep 13 18:48 EmbeddedLDAP.lok

-rw-r--r-- 1 weblogic weblogic 80126 Sep 13 18:48 EmbeddedLDAP.tran

-rw-r--r-- 1 weblogic weblogic 8 Sep 13 18:48 EmbeddedLDAP.trpos

Delete the two files listed below from that directory:

-rw-r--r-- 1 weblogic weblogic 0 Sep 13 18:48 EmbeddedLDAP.delete

-rw-r--r-- 1 weblogic weblogic 0 Sep 13 18:48 EmbeddedLDAP.lok

Now restart the instance from the bin directory. The server should come up and run without the error.
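The cleanup step above can also be scripted. Below is a minimal Python sketch, assuming the instance is stopped and the script runs as the user that owns the files. The function name is hypothetical and the file names are the ones from the listing above.

```python
from pathlib import Path

# File names taken from the directory listing above.
STALE_FILES = ("EmbeddedLDAP.delete", "EmbeddedLDAP.lok")


def remove_stale_ldap_files(ldap_dir):
    """Delete the two marker files if present; return the names removed."""
    removed = []
    for name in STALE_FILES:
        path = Path(ldap_dir) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

Call it with the ldapfiles path of the affected instance. All other EmbeddedLDAP files are left untouched.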

_______________________________________________________________________________

Issue 1: JMS Issue 1

EOP messaging bridges were failing frequently with the error "(java.lang.Exception: javax.resource.ResourceException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found)". Because of this, messages were piling up on MQ and were not being picked up by the bridges.

Soln:

Domain: eopdom1 (1 admin + 2 managed servers spread across 2 servers). Checked the bridge configuration (70 bridges in total), then checked the pool-params in jms-xa-adp.rar (120 on m1 and 20 on m2). Changed this to 150 on both servers, as each bridge needs at least 2 connections from the adapter pool, then redeployed and restarted the WebLogic instances. Also applied patch WB1E (CR326720_920.jar) to resolve the known issue behind the error mentioned above.
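The sizing logic is simple arithmetic, sketched below in Python. It assumes the worst case where all 70 bridges run on one server. The function names are made up for illustration.

```python
def min_adapter_pool_size(bridge_count, connections_per_bridge=2):
    """Smallest adapter pool that gives every bridge its connections."""
    return bridge_count * connections_per_bridge


def pool_is_sufficient(pool_size, bridge_count):
    """True if the pool can serve every bridge at the same time."""
    return pool_size >= min_adapter_pool_size(bridge_count)


if __name__ == "__main__":
    bridges = 70
    print("minimum pool size:", min_adapter_pool_size(bridges))
    for size in (120, 20, 150):  # the old values on m1 and m2, then the new value
        verdict = "sufficient" if pool_is_sufficient(size, bridges) else "too small"
        print(size, verdict)
```

With 70 bridges the minimum is 140 connections, so neither 120 nor 20 is enough, and 150 leaves a little headroom.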

Notes:

Live is running on 9.2.0 and test is running on 9.2.3; these should be brought into sync. Also planning a quick round of WLS health checks on EOP.

Issue2: JMS issue 2

The messaging bridge failed to connect to the source and target destinations and reported the error below: "failed to get one of the adapters from JNDI (javax.naming.NameNotFoundException: Unable to resolve 'eis.jms.WLSConnectionFactoryJNDIXA'. Resolved 'eis.jms'; remaining name 'WLSConnectionFactoryJNDIXA')". This suggests that the adapter file jms-xa-adp.rar was either not targeted to the required managed server instance, or that the deployment of the adapter had failed with an error.

Soln:

Found that the adapter was only targeted to managed2 server whereas the bridge was configured to run on managed1 server. Targeted the adapter to managed1 server as well and restarted the instances.

Issue3: JMS issue 3

A newly configured messaging bridge failed to become Active, and the following two error messages were seen: "Unable to connect to source destination" and "Configured QoS is not reachable".

Soln:

"Unable to connect to source destination": the source URL had a space between the "//" and the IP address. Once the space was removed, the bridge was able to connect to the source destination.

"Configured QoS is not reachable": "QoS Degradation Allowed" was checked for the earlier bridges but unchecked for the new bridge, and the QoS was configured for "Exactly-once" delivery. After enabling it, the messaging bridge became Active once the WebLogic instances were bounced.

Notes:

The "Exactly-once" QoS requires the messaging to be XA-enabled, i.e. the connection factory must be XA-enabled and the destinations must be configured to use the JMS XA adapter.
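A stray space in a connection URL, as in the first error of this issue, is easy to miss by eye. Below is a minimal Python sketch of a check for it. The function name and the sample URLs are hypothetical; the t3 scheme and the 192.0.2.10 address are placeholders, not values from the actual bridge.

```python
import re


def find_whitespace(url):
    """Return the index of the first whitespace character in url, or -1."""
    match = re.search(r"\s", url)
    return match.start() if match else -1


if __name__ == "__main__":
    for url in ("t3:// 192.0.2.10:7001", "t3://192.0.2.10:7001"):
        pos = find_whitespace(url)
        print(repr(url), "ok" if pos < 0 else "whitespace at index %d" % pos)
```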

Issue4: Deployer

Unable to deploy an application from the console; the following error appeared on the console page: "[Deployer:149150]An IOException occurred while reading input.; nested exception is: java.net.SocketException: Connection reset; nested exception is: java.net.SocketException: Connection reset".

Soln:

The only error message in the logs indicated that the application was attempting to connect to java.sun.com on port 80 over the internet, which was blocked by firewall restrictions. This was reported to the application team. As a workaround, an entry for the application was added manually to config.xml and the admin and managed server instances were restarted, after which the application deployed successfully.

Notes:

One argument was that the application deployed properly on another test instance even with the same error. We were never able to replicate this, but one theory is that when deploying through the console the deployer attempted to connect to java.sun.com again and again and eventually timed out, whereas with the entry in config.xml and a restart it attempted the connection only once and then moved on to other tasks, which have higher priority during a restart.

Issue4: Startup

“/wls_domains/wlmrtnept/servers/managed3_wlmrtnept/tmp/managed3_wlmrtnept.lok : java.io.IOException: No locks available”

Soln:

This could be due to an incorrect NFS setup (if an NFS filesystem is used). Check that the hosts have the correct permissions on the NFS server, and check that the NFS libraries below are installed:

yum list | grep nfs

*Note*: Red Hat Network repositories are not listed below. You must run this command as root to access RHN repositories.
nfs-utils.x86_64 1:1.0.9-40.el5 installed
nfs-utils-lib.x86_64 1.0.8-7.2.z2 installed

Also, the rpc.statd and rpc.idmapd processes should be running.
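To confirm whether file locking works on the filesystem in question, independently of WebLogic, a quick probe can help. Below is a minimal Python sketch for Linux/Unix hosts. It creates a scratch file in the given directory and tries to take a lock on it. The function name is hypothetical, and it is only an approximation of what the JVM does when it locks the .lok file.

```python
import fcntl
import os
import tempfile


def can_lock_files(directory):
    """Try to take and release an exclusive lock on a scratch file in directory."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".lok")
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError as err:
        # On a broken NFS lock setup this is typically ENOLCK,
        # whose message is "No locks available".
        print("locking failed:", err)
        return False
    finally:
        os.close(fd)
        os.remove(path)
```

Run it against the server's tmp directory (the one holding the .lok file). A False result points at the NFS locking setup rather than at WebLogic.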

Issue5: Cluster

For quite some time we observed multicast packet loss that triggered various other problems on WebLogic, such as managed servers dropping out of the cluster and JMS messages not being delivered properly to distributed queues.

A recurring message similar to the one below appears in the logs. Although it is only an informational message, it acts as a trigger for various other issues, so messages like this should not be neglected.

<Mar 23, 2010 12:14:04 PM GMT> <Info> <Cluster> <host1> <managed1> <[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1269346444069> <BEA-000112> <Removing managed2 jvmid:-1616739071273980991S:host2:[61002,61002,-1,-1,-1,-1,-1]:host1:61001,host2:61002,host3:61003,host4:61004:domain-name:managed2 from cluster view due to timeout.>
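To see how often this happens and to which server, the log can be scanned for the message ID. Below is a minimal Python sketch, assuming the message text always has the shape shown in the sample line above. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches the BEA-000112 message shown above and captures the server name.
REMOVED = re.compile(r"<BEA-000112> <Removing (\S+) jvmid:")


def count_cluster_removals(lines):
    """Count, per server name, how often it was removed from the cluster view."""
    counts = Counter()
    for line in lines:
        match = REMOVED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feed it the lines of a managed server log, for example with open(logfile) as the iterable, and print the resulting counter.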

Soln:

We used the multicast test utility to check whether there was in fact a multicast problem:

java utils.MulticastTest -n <name> -a <multicast-address> -p <multicast-port>

The result showed that multicast packets were intermittently being dropped within the VLAN, causing the above issue. We then liaised with the OS experts to narrow down the issue and to check whether the multicast packets were being transmitted correctly among the servers. This did not help much, as from the server perspective all the packets were being transmitted correctly.

Next we involved network experts to seek their help. After thorough investigations of the network logs and various switch configurations it was concluded that this was down to the multicast address range being used and the way the local switches acknowledged that multicast range. They also suggested that in future we make use of Link Local Multicast IP Addresses for Weblogic multicasting purposes.

A note on Link Local IP can be found at: http://www.iana.org/assignments/multicast-addresses/

In short, multicast link-local addresses (actually, the link-local MAC addresses) are treated as broadcasts by the local switches, so all WebLogic servers on the same VLAN will see them. Other multicast addresses are dropped by the switches by default unless one of the following actions is taken:

Disable IGMP snooping on the VLAN or on the whole switch. Otherwise the switch just drops the multicast packet, because WebLogic does not use IGMP, so the switch never sees an IGMP join request for the multicast group (and thus never maps the MAC address to the switch port).

OR

Configure static multicast MAC addresses for the relevant switch ports.

Both options add network complexity and are costly to implement, test and maintain. Link-local multicast addresses avoid these issues completely. Some previous implementations using non-link-local multicast addresses may have worked if the switch had IGMP snooping disabled globally or per VLAN.
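Whether a given cluster multicast address falls in the link-local range can be checked quickly. The Python sketch below treats the IANA Local Network Control Block, 224.0.0.0/24, as the link-local multicast range. The function name is hypothetical. Many addresses in this block are assigned to specific protocols, so the actual choice should still be agreed with the network team.

```python
import ipaddress

# IANA Local Network Control Block: multicast that is not forwarded off the link.
LINK_LOCAL_MULTICAST = ipaddress.ip_network("224.0.0.0/24")


def is_link_local_multicast(address):
    """True if address lies in 224.0.0.0/24."""
    return ipaddress.ip_address(address) in LINK_LOCAL_MULTICAST


if __name__ == "__main__":
    for addr in ("224.0.0.1", "239.192.0.0"):
        print(addr, is_link_local_multicast(addr))
```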

Issue6: JDBC

Stale connections causing high CPU and high memory utilization and eventually breakdown of database.

Soln:

The hardware and software could not be easily replaced due to the cost it incurred and the complexity of the application. So the challenge was to make the best utilization of the database, reduce the connections as much as possible, effectively use the connections made and refine the code, if possible.

The majority of the connections to the database were being made by the connection pools configured on WebLogic, so WebLogic was the main target for tuning. During the periods when the issue occurred, the number of sessions rose rapidly from 1400 to 1800. The database could handle 1400 sessions but could not support 1800. Most of these 1800 sessions were connections created by WebLogic in response to application requests, so it was clear that we needed to go back to pen and paper and tune the connection pools as much as possible.

A look at the configuration of the connection pools pointed to a major issue. The application had an admin server and 14 managed servers, with four connection pools in total. Each connection pool, whether it was required there or not, was targeted to the admin server and all the managed servers. This created many unwanted stale connections on the database that could easily have been avoided. A few connection pools were required only on the admin server, and the others only on the managed servers.

So, as the first step in tuning the WebLogic connection pools, we removed all such unwanted targets. This effort paid off: the total maximum number of connections to the database came down from 2250 to 1520 (a saving of 730 connections).
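The effect of targeting on the worst-case connection count is easy to work out: every pool can open up to its maximum capacity on each server it is targeted to. Below is a small Python sketch of that arithmetic. The pool names and capacities are hypothetical toy values, not the real configuration; only the server count (1 admin + 14 managed) comes from the description above.

```python
def total_max_connections(pools):
    """pools maps a pool name to (max capacity per server, list of target servers)."""
    return sum(capacity * len(targets) for capacity, targets in pools.values())


# 1 admin server + 14 managed servers, as described above.
servers = ["admin"] + ["managed%d" % i for i in range(1, 15)]

# Hypothetical pools, each targeted everywhere.
everywhere = {
    "adminPool": (10, servers),
    "appPool": (20, servers),
}

# The same pools, targeted only where they are needed.
trimmed = {
    "adminPool": (10, ["admin"]),
    "appPool": (20, servers[1:]),
}

if __name__ == "__main__":
    before = total_max_connections(everywhere)
    after = total_max_connections(trimmed)
    print("before:", before, "after:", after, "saved:", before - after)
```

With these toy numbers the worst case drops from 450 to 290 connections, simply by removing targets that were never needed.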

Next we resorted to tuning various parameters available for connection pool. We mainly concentrated on two parameters, Shrink Frequency and Inactive Connection Timeout. A short description below:

Shrink Frequency: The number of seconds (between 0 and a positive 32-bit integer) before WebLogic Server shrinks the connection pool to the original number of connections or the number of connections currently in use. (This field is relevant only if you check the Allow Shrinking box.)

Inactive Connection Timeout: The number of inactive seconds on a reserved connection (between 0 and a positive 32-bit integer) before WebLogic Server reclaims the connection and releases it back into the connection pool.

Shrink Frequency was set to its default value of 900s. The connection pool expanded to its maximum size during peak loads, but on average a transaction took about 100 to 200s to complete. So we decided to reduce the Shrink Frequency to 300s, so that the pool is shrunk every 300s and any idle connections are closed.

Also, Inactive Connection Timeout was set to its default value of 0s, which meant that inactive reserved connections were never released back to the pool, causing WebLogic to open new connections. This was later set to 300s so that inactive connections are released back to the pool and can be reused.

The above actions proved quite effective in terms of reducing the overall load on the database.

Issue7: Startup

While trying to start WebLogic as a Windows service, the Service Manager throws an exception: “Error 1067 the process terminated unexpectedly.”

When this happens, nothing is recorded in the WebLogic logs, because the Service Manager failed to initiate the WebLogic start-up process itself. The Windows Event logs can be checked for more information.

Soln:
