Real-Time WebLogic Troubleshooting Issues

WebLogic Troubleshooting Issues
________________________________________________________________________________

1. How to set the classpath?

WL_HOME\server\bin\setWLSEnv.cmd (Windows)
WL_HOME/server/bin/setWLSEnv.sh (Unix; source it with ". ./setWLSEnv.sh" so the variables stay in the current shell)

2. How to set the domain environment?

DOMAIN_HOME\bin\setDomainEnv.cmd (Windows)
DOMAIN_HOME/bin/setDomainEnv.sh (Unix)

3. How to increase WLS memory?
Set the minimum and maximum heap to the same size.
$ java ... -Xms32m -Xmx32m      -> allocates a fixed 32 MB heap.
For example, for a 2 GB heap: -Xms2048m -Xmx2048m

4. How to increase PermGen space?
Increase the maximum PermGen size: -XX:MaxPermSize=256m (default = 64m). This applies to HotSpot JVMs before Java 8.
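
The heap and PermGen settings from items 3 and 4 are usually passed to the server through an environment variable rather than typed on the java command line. The following is a minimal sketch, assuming a domain whose setDomainEnv.sh picks up USER_MEM_ARGS and a HotSpot JVM older than Java 8; the sizes and the HEAP_SIZE and PERM_SIZE helper variables are illustrative, not recommendations.

```shell
# Sketch: heap and PermGen settings for a HotSpot JVM (Java 7 or earlier).
# Sizes are illustrative assumptions; tune them for your own server.
HEAP_SIZE="2048m"      # same value for -Xms and -Xmx, as item 3 recommends
PERM_SIZE="256m"       # upper bound for the permanent generation (item 4)

# Assumption: the domain's setDomainEnv.sh honours USER_MEM_ARGS when it is set.
USER_MEM_ARGS="-Xms${HEAP_SIZE} -Xmx${HEAP_SIZE} -XX:MaxPermSize=${PERM_SIZE}"
export USER_MEM_ARGS

# Show the arguments that the start script would receive.
echo "$USER_MEM_ARGS"
```

Export the variable in the same shell session, before running the server start script, so the JVM is launched with these values.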

5. How to enable verbose GC?
JRockit JVM: JAVA_OPTIONS="-Xverbose:memory,gcreport,gcpause -Xverbosetimestamp"
Sun JVM:     JAVA_OPTIONS="-verbose:gc"

6. How to enable core dump?
Sun JVM:     -XX:+ShowMessageBoxOnError
JRockit JVM: -Djrockit.waitonerror
Windows:     Dr. Watson

7. How to check whether the server is listening on the specified port?
telnet <IP> <Port>
Ex: telnet 199.129.212.1 8080

8. How to check whether the server host is alive?
ping <IP>
Ex: ping 199.129.212.1

9. How to start a Managed Server in Managed Server Independence (MSI) mode?
nohup ./startManagedWebLogic.sh <ManagedServerName> &
(A Managed Server enters MSI mode when it is started while the Admin Server is unreachable.)

10. How to check the WebLogic Server process ID?
/usr/ucb/ps -auxwww | grep java

11. How to check whether a port is already in use?
netstat -na | grep <port>
lsof -itcp:7001 
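
A plain grep for the port number in netstat output also matches established connections and unrelated numbers. The following is a rough sketch of a stricter check that only counts sockets in LISTEN state; the function name port_listening and the sample lines are illustrative assumptions, and the parsing is meant for `netstat -an` style output.

```shell
# port_listening PORT
# Reads `netstat -an` style output on stdin and succeeds (exit status 0) when
# a socket in LISTEN state is bound to PORT. It handles both the "host:port"
# (Linux) and the "host.port" (Solaris) address notation.
port_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                n = split($i, parts, "[:.]")
                if (n > 1 && parts[n] == port) { found = 1 }
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Illustrative sample input (not captured from a real server).
sample='tcp        0      0 0.0.0.0:7001            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:51234          10.0.0.9:1521           ESTABLISHED'

if printf '%s\n' "$sample" | port_listening 7001; then
    echo "port 7001 is in use"
fi
```

On a live host the same function can be fed directly: netstat -an | port_listening 7001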

12. How to run the multicast test in a cluster?
java utils.MulticastTest -n <name> -a <multicast-address> -p <multicast-port> -s <send-interval>

13. How to access the admin console?
http(s)://<hostname or IP>:<port>/console
Ex: http://localhost:7001/console

14. How to find the WebLogic version?
Admin console > Environment > Servers > <server> > Monitoring > WebLogic Version.

Command line:
---------------
Go to <domain>/bin
Run setDomainEnv (.cmd on Windows, .sh on Unix)
Run "java weblogic.version"

15. How to take a thread dump?

ps -ef | grep java
kill -3 <PID>             Windows: Ctrl + Break

A better way to take a thread dump is jstack:

 jstack <PID>   
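
A single dump rarely shows whether a thread is stuck, so several dumps a few seconds apart are usually compared. The following is a minimal sketch of such a loop; the function name take_thread_dumps, the output file names, and the default count and interval are assumptions made for illustration.

```shell
# take_thread_dumps PID [COUNT] [INTERVAL_SECONDS] [DUMP_CMD]
# Captures COUNT thread dumps of the JVM with the given PID, INTERVAL_SECONDS
# apart, into threaddump_<pid>_<n>.txt in the current directory.
# DUMP_CMD defaults to jstack; it is a parameter only so that the loop can be
# tried out on a machine without a JDK.
take_thread_dumps() {
    pid=$1
    count=${2:-3}
    interval=${3:-10}
    dump_cmd=${4:-jstack}
    n=1
    while [ "$n" -le "$count" ]; do
        "$dump_cmd" "$pid" > "threaddump_${pid}_${n}.txt" 2>&1
        # Wait between dumps, but not after the last one.
        [ "$n" -lt "$count" ] && sleep "$interval"
        n=$((n + 1))
    done
}

# Example (hypothetical PID): three dumps, ten seconds apart
#   take_thread_dumps 4321 3 10
```

Comparing the same thread across the files shows whether it is still executing the same stack frames, which is the usual sign of a stuck or looping thread.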

16. How to start the servers?
Admin Server:   ./startWebLogic.sh
Managed Server: ./startManagedWebLogic.sh
Node Manager:   ./startNodeManager.sh

17. How to copy a file from one server to another?
scp <filename> <user at destination>@<destination server>:<destination path>

18. How do you list all open files at any given moment?
   lsof

19. How do you check the elevated privileges that you hold?
    sudo -l      (lists the commands you may run through sudo; "sudo -i" opens a root login shell instead)

20. How to list all files?
 ls -ltr
_________________________________________________________________________________

Some Useful commands:


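ls                       : lists the files and directories in the current directory.
pwd                      : prints the working directory.
top                      : displays CPU and memory utilization.
df -h                    : free disk space in human-readable form.
du -k                    : disk usage in kilobytes.
free -m                  : how much RAM is available, in megabytes.
ps -ef | grep weblogic   : displays the running WebLogic processes.
uptime                   : how long the system has been up, plus the load averages.
vmstat                   : displays virtual memory statistics.
cat <filename>           : prints the file contents.
tail -50f <filename>     : shows the last 50 lines and follows new output.
wc -l <filename>         : counts the lines of a text file (wc without options also counts words and bytes).
touch <filename>         : creates a blank file.
cp <oldfile> <newfile>   : copies a file.
keytool -list -v -keystore <keystore> : lists the certificates in a keystore.
kill -9 <PID>            : kills the process.
scp                      : secure copy.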

______________________________________________________________________________

   Issue: Application/URL is not working


  How to troubleshoot an application that is down?
When a user reports that the application is not working, first access the application URL from our side.
We may get one of the following errors:
  • "Page cannot be displayed" error.
  • Blank page.
  • Redirection error: the application has moved to another server.
  • 503 error (service unavailable).
  • 404 error: the requested file is not found on the app server.
  • 403 error: the user is not authorized (forbidden).
  • NSAPI error (back end not available).
  • Network (TCP/IP) error.
  • Packet loss: packets are lost while data is transferred between servers.
  • Intermittent failure of the application.

To be checked:
  • Access logs and error logs of the web server (Apache/Tomcat/IIS).
  • Access logs of the app server (WebLogic/JBoss/WebSphere).
  • Access log of the web agent.
  • In Internet Explorer, disable friendly error pages so that the real HTTP error is shown.
          Navigation: Internet Options -> Advanced -> "Show friendly HTTP error messages" -> disable
  • Use the weblogic.Admin utility to test the connection between the web server and the WebLogic app server.
  • Use ping to check whether the application server host responds.
  • Check the connection between the app server and the database server through the connection pool status.
  • Test the WebLogic Server instance.
  • Check the app server log to see whether the back-end server is available.
  • Check the command given by Tim to check the connection between web server and app server.






HTTP status codes returned by the web server (constant names as in java.net.HttpURLConnection):

/** 2XX: success */
HTTP_OK = 200;
HTTP_CREATED = 201;
HTTP_ACCEPTED = 202;
HTTP_NOT_AUTHORITATIVE = 203;
HTTP_NO_CONTENT = 204;
HTTP_RESET = 205;
HTTP_PARTIAL = 206;

/** 3XX: relocation/redirect */
HTTP_MULT_CHOICE = 300;
HTTP_MOVED_PERM = 301;
HTTP_MOVED_TEMP = 302;
HTTP_SEE_OTHER = 303;
HTTP_NOT_MODIFIED = 304;
HTTP_USE_PROXY = 305;

/** 4XX: client error */
HTTP_BAD_REQUEST = 400;
HTTP_UNAUTHORIZED = 401;
HTTP_PAYMENT_REQUIRED = 402;
HTTP_FORBIDDEN = 403;
HTTP_NOT_FOUND = 404;
HTTP_BAD_METHOD = 405;
HTTP_NOT_ACCEPTABLE = 406;
HTTP_PROXY_AUTH = 407;
HTTP_CLIENT_TIMEOUT = 408;
HTTP_CONFLICT = 409;
HTTP_GONE = 410;
HTTP_LENGTH_REQUIRED = 411;
HTTP_PRECON_FAILED = 412;
HTTP_ENTITY_TOO_LARGE = 413;
HTTP_REQ_TOO_LONG = 414;
HTTP_UNSUPPORTED_TYPE = 415;

/** 5XX: server error */
HTTP_INTERNAL_ERROR = 500;
HTTP_SERVER_ERROR = 500;   /* deprecated; same as HTTP_INTERNAL_ERROR */
HTTP_NOT_IMPLEMENTED = 501;
HTTP_BAD_GATEWAY = 502;
HTTP_UNAVAILABLE = 503;
HTTP_GATEWAY_TIMEOUT = 504;
HTTP_VERSION = 505;
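
When scanning access logs, the first digit of the status code is usually enough to decide where to look next. The following is a small sketch that maps a code to its class; the function name http_class and the class labels are illustrative assumptions.

```shell
# http_class CODE
# Prints the class of an HTTP status code, following the 2XX/3XX/4XX/5XX
# grouping of the list above.
http_class() {
    case "$1" in
        2[0-9][0-9]) echo "success" ;;
        3[0-9][0-9]) echo "redirect" ;;
        4[0-9][0-9]) echo "client error" ;;
        5[0-9][0-9]) echo "server error" ;;
        *)           echo "unknown" ;;
    esac
}

http_class 404
http_class 503
```

A 4XX result points at the request (wrong URL, missing permissions), while a 5XX result points at the web server or the app server behind it.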

1. Check the server status from the admin console


The State should be RUNNING and the Health should be OK. If it is not (for example UNKNOWN, ADMIN, or FAILED_NOT_RESTARTABLE),
open PuTTY, log in to the affected server, verify whether the process exists and is running, and check the logs.
Command to find the running processes for all or specific WebLogic Server instances (admin/managed) on the Sun Solaris platform:

The regular command
#> ps -ef
finds processes, but on Solaris it truncates the command line, so it often does not show which WebLogic instance a java process belongs to.

Use this command to capture the full command line of specific WebLogic processes:

#> /usr/ucb/ps -auxwww | grep admin-server-name
#> /usr/ucb/ps -auxwww | grep managed-server-name

If several managed servers share a name prefix,
for example managedserver, managedserver1, managedserver2, enclose the server name in single quotes.

Note: Leave a space after the server name, inside the closing single quote.
This matches only the processes whose name is followed by a space, that is, the exact server name.

ex: /usr/ucb/ps -auxwww | grep 'managed-server-name '
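
The exact-name match can also be done on the JVM argument instead of relying on a trailing space. The following is a rough sketch, assuming the server was started by the standard start scripts, which pass the server name as -Dweblogic.Name=<name>; the function name wls_pid and the sample process lines are illustrative.

```shell
# wls_pid SERVER_NAME
# Reads `ps -ef` or `/usr/ucb/ps -auxwww` output on stdin and prints the PID
# (second column in both formats) of the JVM whose command line contains
# exactly -Dweblogic.Name=SERVER_NAME. Because the whole argument is compared,
# "managedserver" does not match "managedserver1".
wls_pid() {
    awk -v name="$1" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == ("-Dweblogic.Name=" name)) { print $2 }
            }
        }
    '
}

# Illustrative sample input (not captured from a real server).
sample='weblogic  4321     1  2 10:01 ?  00:05:12 /usr/java/bin/java -server -Xms2048m -Dweblogic.Name=managedserver weblogic.Server
weblogic  4400     1  1 10:02 ?  00:03:40 /usr/java/bin/java -server -Xms2048m -Dweblogic.Name=managedserver1 weblogic.Server'

printf '%s\n' "$sample" | wls_pid managedserver
```

On Solaris, feed it the untruncated listing: /usr/ucb/ps -auxwww | wls_pid managedserver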

Note: If the process is not available, restart the server with proper approvals.
           If the process is available but the server is in UNKNOWN state, take 3-5 thread dumps, kill the process, and restart the server.
2. Check the application availability
Note: The application should always be in Active state. If it is in Prepared state, start it from the Start option on the console's Deployments page.





3. Use telnet <IP> <port> to check whether the server is listening on the specified port.

If port 8080 is not open, telnet reports a connection failure (for example "Connection refused" or "Could not open connection to the host").

If port 8080 is open, you get a blank window with a flashing cursor.



The same test verifies whether HTTP port 80 is opened on a firewall: telnet to the destination IP on destination port 80.
4. ping <ip-address or hostname>: to see whether the host is alive.
If the host is alive, ping prints a reply line with the round-trip time for each packet.
If the host is not alive, ping reports timeouts or 100% packet loss.



5. Check the free disk space on all file systems
df command:
df stands for "disk free". It shows disk space information for a Linux (or Unix) system: space used, space remaining, and where each file system is mounted.
df reports not only the local disks but also every network file system mounted on the machine.
With the '-h' (human-readable) option, df reports sizes in KB/MB/GB units instead of the block counts it prints without options.
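
Reading the df report by eye works for a handful of mounts; on servers with many file systems it helps to print only the ones that are nearly full. The following is a minimal sketch, assuming the POSIX output format of `df -P -k`; the function name full_filesystems, the 90% threshold, and the sample report are illustrative.

```shell
# full_filesystems THRESHOLD
# Reads `df -P -k` output on stdin and prints "<mount point> <usage>%" for
# every file system whose usage is at or above THRESHOLD percent.
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            used = $5                 # Capacity column, for example "97%"
            sub(/%/, "", used)
            if (used + 0 >= limit + 0) { print $6, used "%" }
        }
    '
}

# Illustrative sample input (not captured from a real server).
sample='Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         20511312  9120404  10326052      47% /
/dev/sdb1         51475068 49901236    573832      97% /logs'

printf '%s\n' "$sample" | full_filesystems 90
```

On a live host: df -P -k | full_filesystems 90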


6. top command:
  • top shows how much processing power and memory are being used, along with other information about the running processes.

  • By default the processes are ordered by percentage of CPU usage, with only the top CPU consumers shown.

  • By default the output refreshes dynamically every 5 seconds (the default interval differs between top versions).


  • While top is running, press M (upper case) to sort the processes by memory usage.



top displays a variety of information about the processor state. The display is updated every 5 seconds by default, but you can change that with the -d command-line option.
1. Uptime: the first line displays the time the system has been up and the three load averages for the system. The load averages are the average number of processes ready to run during the last 1, 5 and 15 minutes.
2. Processes: the second line displays the total number of processes at the time of the last update, broken down into the number of tasks that are running, sleeping, stopped, or zombie.
3. CPU states: the next line shows the percentage of CPU time in user mode, system mode, niced tasks, iowait and idle. (Niced tasks are only those whose nice value is positive.) Time spent in niced tasks is also counted in system and user time, so the total can be more than 100%. The processes and states display may be toggled with the t interactive command.
4. Mem: the next line displays statistics on memory usage, including total available memory, free memory, used memory, shared memory, and memory used for buffers.
5. Swap: statistics on swap space, including total swap space, available swap space, and used swap space. This and Mem are just like the output of free(1).
7. tail -100f <name>.std (or <name>.log)
Used to view the log files as they grow.


Identify the messages with ERROR severity.
Note: If server performance is slow, check the log files for stuck threads.
8. Take a thread dump if needed


If the server is running slow, check the log file for stuck threads, which look like this:
<Warning> <WebLogicServer> <BEA-000337> <ExecuteThread: '7' for queue: 'weblogic.kernel.Default' has been busy for "630" seconds working on the request "weblogic.ejb20.internal.JMSMessagePoller@d64412", which is
more than the configured time (StuckThreadMaxTime) of "600" seconds.>
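
Stuck-thread warnings are easy to miss in a busy log. The following is a rough sketch that pulls the thread number and busy time out of every BEA-000337 message; the function name stuck_threads and the output format are illustrative assumptions, and the pattern is written against the message layout shown above, with each message on a single line in the log file.

```shell
# stuck_threads
# Reads a WebLogic server log on stdin and prints "thread=<id> busy=<seconds>s"
# for every BEA-000337 (stuck thread) message.
stuck_threads() {
    grep 'BEA-000337' |
    sed -n "s/.*ExecuteThread: '\([0-9][0-9]*\)'.*busy for \"\([0-9][0-9]*\)\" seconds.*/thread=\1 busy=\2s/p"
}

# Demonstration on a temporary file. The second line is the message quoted
# above; the first line is an illustrative non-matching entry.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
<Notice> <WebLogicServer> <BEA-000365> <Server state changed to RUNNING>
<Warning> <WebLogicServer> <BEA-000337> <ExecuteThread: '7' for queue: 'weblogic.kernel.Default' has been busy for "630" seconds working on the request "weblogic.ejb20.internal.JMSMessagePoller@d64412", which is more than the configured time (StuckThreadMaxTime) of "600" seconds.>
EOF

result=$(stuck_threads < "$log_file")
rm -f "$log_file"
echo "$result"
```

The thread number it prints can then be looked up in a thread dump to see what that ExecuteThread is doing.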
9. View the GC statistics in the server log file if needed. The same information is available from the console.


10. Out of memory


In production, the immediate solution is to restart the server.
Note: Do not change any heap parameters without proper approval from the client via a ticket.


  • Log in to the corresponding server through PuTTY.
  • Check the status of the server instances.
  • Check the server logs and out logs for OutOfMemoryError.
  • Collect the access logs from the time of the OOM; if the server is still in Running state, take a thread dump too.
  • Analyze the thread dump for the cause of the OutOfMemoryError (application or server).
  • If the server is not in Running state, restart it.

  SiteScope alerts:

  • Log in to the server.
  • Check the server status, particularly at the time of the SiteScope alert.
  • Check the logs (server/out) for any errors and exceptions at the time of the SiteScope alert.

  High CPU utilization:

  • Log in to the corresponding server through PuTTY.
  • Check the CPU utilization of the server instances: prstat -a (Solaris) or top -b (Linux)
  • Make sure that the instances are running as the weblogic user: ps -ef | grep <domain>
  • Check the logs for any findings regarding the high utilization: tail -100f <name>.std (or <name>.log)
  • Check the queue threads: Admin console -> Monitoring ->
  • Restart the instances to bring the CPU utilization down: ./stop, then ./start (the instance's stop and start scripts)

  High disk space usage on servers:

  • Log in to the server.
  • Find the mount that is consuming the most disk space: df -h, then du -sk * | sort -rn inside that mount
  • Zip the log files, or remove the oldest logs, backup WAR files, and access logs:
gzip <file>.log  /  rm -f <file>.log  /  mv <file>.log /archive

  Thread count:

  • Check the logs for any errors and exceptions: grep -i error <name>.std (or <name>.log)
  • Check the status of the instances and connection pools: Admin console
  • Check the CPU usage: prstat -a / top -b
  • Take a thread dump if possible and analyze it:
/usr/ucb/ps -auxww | grep <instance>    then    jstack <pid>  or  kill -3 <pid>
  • Check with other subsystems.
  • Check with the DB team for any database-related issues.

  Stack overflow:

  • Check the server logs, the out logs, and the access logs from the time the stack overflow occurred. Restart the instance if required.



  • Check the status of the server: Admin console / ps -ef | grep
  • Check the disk space (if it is full, delete or compress the logs and then restart the server):
df -k /logs -> 100%  ->  du -sk * | sort -nr | head  ->  gzip <file>.log
  Server errors:

  • Check the status of the servers.
  • Check the server logs.
  • For any database errors, check the connection pool and data source.
  • Check the deployment descriptors.
  • If the logs show that configuration changes are required, make the changes and then restart the instances one by one if they are in a cluster.

  Server down/unknown:

  • Log in to the server through PuTTY and open the admin console.
  • Check the instance's process from PuTTY and the instance's status from the admin console.
  • If the process does not exist and the instance status is Unknown, check the logs of the server instance as well as the admin logs.
  • Find the root cause from the logs and restart the required instances.

  URL not working:

  • Access the URL.
  • Check the status of the server instances on which the application is deployed.
  • Check the default queue threads (or the application-specific queue, if any) to see whether the idle thread count is zero. Then check the server logs and application (out) logs for errors and exceptions.
  • If there are no idle threads, find out which application is consuming all the threads; if it is the application you are accessing, check with the application owner.
      (Resolving this requires restarting the corresponding instances; before that, check
      with the application owner why the threads are being consumed.)
  • For any application-related exceptions, check with the application owner or check the server logs.
  • For any DB exceptions related to the application you are accessing, check that the corresponding connection pool and data source are running fine.
  Application errors:

  • Access the application URL.
  • Check the instances and their status for any errors.
  • Check the server logs as well as the application (out) logs.
  • Check the connection pool parameters and data source.

  Users unable to access some application/URL:

  • Try accessing the URL yourself.
  • Check whether the users are using the correct URL.
  • Check the logs of both WebLogic and the web server.
  • Check the status of the server instances.
  • Test the connection pools.
  • Check the DB connectivity.
  • Check whether the deployment was done properly; if not, redeploy the application and watch the logs for errors while doing so.
  • Check the connection pool user name.
  • Restart the instance if required.

  Application error, responding slowly, application not working/not
opening, not getting authenticated:
  • Check the status of the web server and app server instances.
  • Check the logs for any errors/exceptions in both the web server and WebLogic Server.
  • Check the queue threads, connection pool status, connections, and data source.
  • Check the disk space.
  • Check whether the log4j property is enabled.
  • Check whether the deployment was done properly.
  Error while uploading WAR file:
  • Check the available disk space.

Updating license.bea:
  • Copy the new license file to a tmp directory, then cp it to the server directory.
  • Check the series of the old license.
  • If it starts with 6, it will not update. Contact BEA for a new license, rename the old license, save the new license as license.bea, and stop and start the server (there is no need to run UpdateLicense.sh for the 6 series).
  • If it starts with 4, rename the old license, rename the new license to license.bea, run ./UpdateLicense.sh, and restart.



===========================================

Unix commands Interview questions

unix commands :-


1. Search for a file from the root directory (filename = sample.txt).
  find / -name sample.txt

2. Display numbers in sorted order.
   sort -n filename        (add -u to drop duplicate values)

3. Display lines as they are appended to a file.
  tail -f filename

4. How to zip a directory?
  gzip -r /home/bea/app              (compresses every file under the directory individually)
  tar -czf app.tar.gz /home/bea/app  (GNU tar: creates a single compressed archive)

5. How to kill a process?
   kill -9 pid

6. Display all Java processes in Unix.
    ps -ef | grep java

7. How to delete 10 lines in the vi editor?
    10dd

8. How to search for and replace a string in the vi editor?
     :%s/old string/new string/gi

9.How to search a string in vi editor from bottom to top.
     ?string name

10. How to insert a line above the current line in the vi editor?
      Esc, then O (capital O; lowercase o opens a line below)

11. How to rename a file?
     mv file1 file2

12. Copy the contents of one directory to another directory.
        cp -r dir1/* dir2/

13. How to delete blank lines in a file?
      grep -v "^$" sample > temp
      mv temp sample

14. How to replace a string in Unix?
     sed "s/oldstring/newstring/g" filename

15. Syntax to zip and unzip a file.
     gzip filename
     gunzip filename.gz

16. How to display the top 10 files/directories by disk usage?
  du -sk * | sort -nr | head -10

17. How to retrieve fields from a file?
   cut -f 1,2 stud        (fields are tab-separated by default; use -d for another delimiter)

18. Which command searches for a string in a file?
     grep

19). How to go to the end of the line in the vi editor?
     $        (G goes to the last line of the file)

20). How to copy 10 lines in the vi editor?
      10yy

21). How to go to insert mode in the vi editor?
      Esc, then i

22).How to search a string in a file(filename=sample,string=weblogic)?
     grep weblogic sample

23).How to copyfile from one unix to other unix system and syntax (file=sample.txt,target host=192.168.11.128,target file path=/home/bea)?
     scp -rp sample.txt username@192.168.11.128:/home/bea

24).Copy file from path to other tree structure is given below?

25). How to find out CPU utilization?
      top

26). How to execute a script using nohup (script name = startWebLogic.sh)?
      nohup ./startWebLogic.sh &

27). Which command is used to search for and replace a string?
       sed

28). Which command is used to search for a string in multiple files?
      grep string file1 file2 ...   (fgrep, i.e. grep -F, does the same for fixed strings)


1) How to search for a string from top to bottom in the vi editor (string = weblogic)?
A) /weblogic

2) How to save and quit from the vi editor?
A) :wq!

3) What is the advantage of the nohup command?
A) nohup keeps the process running after you log out of the system.
Syn: nohup <command> &

4) Difference between ping and tracert?
A) ping                                tracert
1) Checks connectivity to a host.      1) Shows the route packets take from one
                                          place to the destination, hop by hop.
2) Reports only the end-to-end result. 2) Displays at most 30 hops by default.

5) How to execute Unix commands in vi?
A) :!ls

6) How to tar dir1, dir2 and dir3 into new_dir?
A) Syn: tar -cvf new_dir.tar dir1 dir2 dir3

7) How to display the IP addresses and port numbers?
A) netstat -anp

8) How to delete a directory recursively and forcibly?
A) rm -rf dirname

9) How to open a file page by page?
A) more filename

10) How to hide a file (file name = tuxedo)?
A) mv tuxedo .tuxedo

11) How to undo in the vi editor?
A) u

12) How to go to line 100 in the vi editor?
A) 100G (or :100)

13) How to display the last 100 lines of a file?
A) tail -100 filename

14) How to retrieve fields from a file?
A) cut

15) How to zip a directory (/home/directory)?
A) gzip -r /home/directory

16) How to go to the end of the line in vi?
A) $        (G goes to the last line of the file)

17) How to display the directory count in the current directory?
A) ls -l | grep -c '^d'

18) How to display all files ending with "log"?
A) ls *log

19) How to append data to an existing file?
A) cat >> filename

20) How to find out the disk space of the file system?
A) df -h

21) Write the syntax for the scp command.
A) scp filename root@ipaddress:filename

22) How to display lines as they are appended to a file?
A) tail -f filename

23) How to display the IP addresses and port numbers?
A) netstat -anp

24) Which command is used to connect to a remote server?
A) telnet ipaddress

25) How to replace a string in the vi editor?
A) :%s/oldstring/newstring/g

25) How to replace a string in Unix?
A) sed s/oldstring/newstring/g filename

26) How to display the top 10 lines of a file?
A) head -10 filename

27) Syntax to tar and untar a file?
A) tar -cvf filename.tar file1 file2 file3
   tar -xvf filename.tar

28) How to display hidden files?
A) ls -a

29) How to delete blank lines from a file?
A) grep -v "^$" filename > temp
   mv temp filename

30) How to display all currently running processes?
A) ps -ef

31) Syntax to zip and unzip a file?
A) gzip filename
   gunzip filename.gz


=============================================================================================

Troubleshooting Tips on Windows 

Hung, Deadlocked, or Looping Process
Post-mortem Diagnostics, Memory Leaks
Monitoring
Other Functions
Hung, Deadlocked, or Looping Process
·         Print thread stack for all Java threads:
o    Control-Break
o    jstack pid
·         Detect deadlocks:
o    Request deadlock detection: JConsole tool, Threads tab
o    Print information on deadlocked threads: Control-Break
o    Print lock information for a process: jstack -l pid
·         Get a heap histogram for a process:
o    Start Java process with -XX:+PrintClassHistogram, then Control-Break
o    jmap -histo pid
·         Dump Java heap for a process in binary format to file:
o    jmap -dump:format=b,file=filename pid
Post-mortem Diagnostics, Memory Leaks
·         Examine the fatal error log file. Default file name is hs_err_pid<pid>.log in the working directory.
·         Create a heap dump:
o    Start the application with HPROF enabled: java -agentlib:hprof=file=file,format=b application; then Control-Break
o    Start the application with HPROF enabled: java -agentlib:hprof=heap=dump application
o    JConsole tool, MBeans tab
o    Start VM with -XX:+HeapDumpOnOutOfMemoryError; if OutOfMemoryError is thrown, VM generates a heap dump.
·         Browse Java heap dump:
o    jhat heap-dump-file
·         Get a heap histogram for a process:
o    Start Java process with -XX:+PrintClassHistogram, then Control-Break
o    jmap -histo pid
Monitoring
(jstat is not available on Windows 98 or Windows ME.)
Note: The vmID argument for the jstat command is the virtual machine identifier. See the jstat man page for a detailed explanation.
·         Print statistics on the class loader:
o    jstat -class vmID
·         Print statistics on the compiler:
o    Compiler behavior: jstat -compiler vmID
o    Compilation method statistics: jstat -printcompilation vmID
·         Print statistics on garbage collection:
o    Summary of statistics: jstat -gcutil vmID
o    Summary of statistics, with causes: jstat -gccause vmID
o    Behavior of the gc heap: jstat -gc vmID
o    Capacities of all the generations: jstat -gccapacity vmID
o    Behavior of the new generation: jstat -gcnew vmID
o    Capacity of the new generation: jstat -gcnewcapacity vmID
o    Behavior of the old and permanent generations: jstat -gcold vmID
o    Capacity of the old generation: jstat -gcoldcapacity vmID
o    Capacity of the permanent generation: jstat -gcpermcapacity vmID
·         Monitor objects awaiting finalization:
o    JConsole tool, VM Summary tab
o    getObjectPendingFinalizationCount method in java.lang.management.MemoryMXBean class
·         Monitor memory:
o    Heap allocation profiles via HPROF: java -agentlib:hprof=heap=sites
o    JConsole tool, Memory tab
o    Control-Break prints generation information.
·         Monitor CPU usage:
o    By thread stack: java -agentlib:hprof=cpu=samples application
o    By method: java -agentlib:hprof=cpu=times application
o    JConsole tool, Overview and VM Summary tabs
·         Monitor thread activity:
o    JConsole tool, Threads tab
·         Monitor class activity:
o    JConsole tool, Classes tab
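
The jstat -gcutil summary listed above prints a header row followed by rows of utilization percentages. The following is a minimal sketch that picks out the old-generation figure, which is usually the first number to watch when a memory leak is suspected. The function name old_gen_usage and the sample rows are illustrative assumptions; the column is looked up by its header "O" because the column layout differs between JDK versions.

```shell
# old_gen_usage
# Reads `jstat -gcutil <vmID>` output on stdin and prints the old-generation
# utilization (column "O", in percent) of every data row.
old_gen_usage() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "O") col = i }
        NR > 1 && col { print $col }
    '
}

# Illustrative sample input in the layout of an older JDK (not real output).
sample='  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  99.74  13.49   7.86  95.82      3    0.124     0    0.000    0.124'

printf '%s\n' "$sample" | old_gen_usage
```

On a live JVM: jstat -gcutil <vmID> | old_gen_usage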
Other Functions
·         Interface with the instrumented Java virtual machines:
o    Monitor for the creation and termination of instrumented VMs (not Windows 98 or Windows ME): jstatd daemon
o    List the instrumented VMs (not Windows 98 or Windows ME): jps
o    Provide interface between remote monitoring tools and local VMs (not Windows 98 or Windows ME): jstatd daemon
o    Request garbage collection: JConsole tool, Memory tab
·         Dynamically set, unset, or change the value of certain Java VM flags for a process:
o    jinfo -flag flag pid
·         Pass a Java VM flag to the virtual machine:
o    jconsole -J flag ...
o    jhat -J flag ...
·         Report on monitor contention:
o    java -agentlib:hprof=monitor=y application
·         Evaluate or execute a script in interactive or batch mode:
o    jrunscript
·         Interface dynamically with an MBean, via JConsole tool, MBean tab:
o    Show tree structure.
o    Set an attribute value.
o    Invoke an operation.
o    Subscribe to notification.
·         Run interactive command-line debugger:
o    Launch a new VM for the class: jdb class
o    Attach debugger to a running VM: jdb -attach address
 __________________________________________________________________________________________

Troubleshooting Tips on Solaris OS and Linux 

Monitoring Tools
Java™ VisualVM is a new monitoring and profiling tool for troubleshooting Java applications. It incorporates various technologies, including jvmstat and JMX, as well as CPU and memory profiling, to provide one easy-to-use integrated visualization tool. Developers can rapidly create their own extensions using a public API, and may share them with the community on a central repository.
·         Display local and remote Java applications.
·         Display application configuration and runtime environment.
·         Monitor application memory consumption and runtime behavior.
·         Monitor application threads.
·         Profile application performance or analyze memory allocation. (Local applications only.)
·         Create and display thread dumps.
·         Create and browse heap dumps.
·         Analyze core dumps.
·         Analyze applications offline.
·         Get additional plugins contributed by the community.
·         Write and share your own plugins.
JConsole
Launch a GUI to monitor and manage Java applications and Java VMs on a local or remote machine.
·         Connection to Java process, host, or JMX agent.
·         Graphical overview of CPU usage, heap memory, threads, classes.
·         Summary of key data, for example, uptime, compilation time, objects pending finalization, and more.
·         Memory statistics, including garbage collection.
·         Request garbage collection.
·         Thread statistics.
·         Deadlock detection.
·         Class statistics.
·         Tree structure of all platform and application MBeans.
·         Set the value of an MBean attribute.
·         Invoke operation on an MBean, for example, perform a heap dump.
·         Subscribe to notification for an MBean.
·         Information about the virtual machine, the compiler, the operating system.
·         Pass flags to VM on which JConsole is running.
jps
List instrumented Java virtual machines.
jstat
Display performance statistics for an instrumented Java VM:
·         Behavior of the class loader
·         Behavior of the HotSpot Just-in-Time compiler, totals and by method
·         Behavior of the GC heap
·         Behavior and sizes of the generation areas
jstatd
·         Monitor for creation and termination of instrumented HotSpot Java VMs.
·         Provide an interface for remote monitoring tools to attach to Java VMs.


Debugging Tools
HPROF profiler
Writes class profiling information to a file or a socket, in ASCII or binary.
·         Heap allocation profiling.
·         Heap dump.
·         CPU usage - for threads, methods.
·         Monitor contention profiling.
To invoke the HPROF tool: java -agentlib:hprof ToBeProfiledClass
To print the complete list of options: java -agentlib:hprof=help
jdb
Launch a simple interactive command-line debugger.
·         Display Java objects and primitive values.
·         List currently running threads.
·         Dump the current thread stack.
·         Set breakpoints.
·         Step through execution.
·         Examine exceptions.
jhat
Parse a binary heap dump, launch a web browser, and present standard queries.
·         Execute standard queries, for example, classes, objects, class instances, reference chains from object rootset, reachable objects, and more.
·         Turn on or off object allocation tracking.
·         Turn on or off object reference tracking.
·         Specify objects to exclude from "reachable objects" query.
·         Pass flags to the Java VM on which jhat is running.
·         Develop custom queries with built-in Object Query Language.
·         Compare objects in two dumps.
jinfo
·         Print command line flags and system properties for a running process, from a core file, or for a remote debug server.
·         Dynamically set, unset, or change the value of certain Java VM flags for a process.
jmap
·         Print shared object mappings for a process, a core file, or a remote debug server.
·         Dump the Java heap (all objects or only live objects) of a process, a core file, or a remote debug server in binary format to a file.
·         Force the Java heap dump of a process if the process does not respond.
·         Print a heap summary for a process, a core file, or a remote debug server.
·         Print a histogram (all objects or only live objects) of a process, a core file, or a remote debug server.
·         Print information on objects awaiting finalization.
·         Print class loader statistics of the permanent generation.
·         Pass flags to the VM.
jsadebugd
Serviceability Agent Debug Daemon, which acts as debug server.
·         Attach to a Java process or a core file.
·         Remote clients can attach to the server using RMI.
jstack
·         Print stack traces of threads for a process, core file, or remote debug server
·         Print information about concurrent locks for a process, core file, or remote debug server
·         Force the stack dump for a process if the process does not respond


Scripting Tools
jrunscript
·         Command-line script shell: Evaluate or execute a script in interactive or batch mode.
·         Pass flag to the VM.
·         Set Java system properties.


_________________________________________________________________________________

SV1:   OOM, Native OOM, Server Crash, High CPU Utilization, Server down/Unknown
SV2:   404, 403, Users Unable to access some application and URL, application errors, application responding slowly, application not working , application not opening, not getting authenticated, blank page.
SV3:   Log file not rotating, high disk space usage on servers, Stack overflow, Thread count, Site scope alert, Error while uploading war file.
SV4:   User creation errors.

________________________________________________________________________________


  1. OOM

    • Log in to the corresponding server through PuTTY.
    • Check the status of the server instances.
    • Check the server logs and out logs for OutOfMemoryError.
    • Collect the access logs from the time of the OOM; if the server is still in Running state, take a thread dump too.
    • Analyze the thread dump for the cause of the OutOfMemoryError (application or server).
    • If the server is not in Running state, restart it.

     OutOfMemory during deployment:

    • If the application is huge (contains more than 100 JSPs), we might encounter this problem with the default JVM settings.
    • The reason is that the permanent generation (PermGen) fills up.
    • The JVM uses this space to store its internal data structures as well as class definitions. Class definitions generated from JSPs are also stored here.
    • PermGen is outside the Java heap and cannot grow beyond its configured maximum.
    • The fix is to increase it by passing this argument in the server's startup script: -XX:MaxPermSize=128m (default is 64m)


    2.Site Scope alerts:

    Ø  Login to the Server
    Ø  Check the server status, particularly at the time of the Site Scope alert
    Ø  Check the logs (Server/Out) for any errors and exceptions at the time of the Site Scope alert.

    3.High CPU utilization:

    Ø  Login to the corresponding server through PuTTY
    Ø  Check the CPU utilization of the server instances
    Ø  ps -ef  [or] top [or] prstat
    Ø  AIX: topas or psstat
    Ø  Make sure that the instances are running as the weblogic user.
    Ø  ps -ef | grep java
    Ø  Check the logs for any findings regarding the high utilization
    Ø  Check the queue threads
    Ø  If CPU utilization stays at 100%, kill the process: kill -9 <pid>
    Ø  Restart the instances to bring CPU utilization back down.


    4.High disk space usage on servers:


    Ø  Login to the Server.
    Ø  Check the disk space of the mount that is consuming the most space.
    Ø  df -kh
    Ø  Compress log files, or remove the oldest logs, backup war files and access logs.
    Ø  gzip <filename> or compress <filename>
    Ø  [or] rm -rf <filename>
    Ø  Backup: mv /apps/bea/domains/gwmp_desktop/ads_web.war /apps/back_up/ads_web.war_bak
    Ø  Backup: mv <sourcepath> <destinationpath>
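
The disk space check above can also be scripted. The following is a minimal Python sketch, not part of the original procedure: the function names and the 90% threshold are assumptions chosen for illustration, and the percentage is computed as used/total, which can differ slightly from the capacity column that df prints.

```python
import shutil


def percent_used(total_bytes: int, used_bytes: int) -> float:
    """Return used space as a percentage of total, rounded to one decimal."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return round(100.0 * used_bytes / total_bytes, 1)


def mount_is_nearly_full(path: str, threshold_percent: float = 90.0) -> bool:
    """True if the filesystem holding `path` is at or above the threshold.

    The 90% default is an assumed alert level, not a WebLogic setting.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.total, usage.used) >= threshold_percent
```

For example, mount_is_nearly_full("/apps") would check the filesystem that holds /apps, the directory used in the backup example above.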

    5.Threads count :

    Ø  Check the logs for any  Errors and Exceptions
    Ø  Check the status of instances & connection pools
    Ø  Check the CPU usage.
    Ø  Take the thread dump if possible and Analyze the thread dump
    Ø  Check with Other Subsystems
    Ø  Check with the DB team if any Issues related to Database.

    6.Stack overflow:

    Ø  Check the server logs, the out logs and the access logs at the time the stack overflow occurred. Restart the instance if required.
    Ø  If the thread stack size is too small, raise it with the -Xss<size> JVM option.

    7. Log files not rotating:


    Ø  Check the status of the server; if it is down, start it:
    Ø  ./startWebLogic.sh
    Ø  ./startManagedWebLogic.sh <managedservername>
    Ø  [or]
    Ø  Check through console.
    Ø  Check the disk space (if full, delete old logs and then restart the server)
    Ø  du -kh (folder)
    Ø  df -kh (filesystem)
    Ø  In the df output, look at the avail and capacity columns (for example 45% versus 90% used)
    Ø  If full, move files elsewhere: mv <source path> <destination path>
    Ø  Or delete them: rm -rf <filename, e.g. adminserver.log>

    8.Server Errors:

    Ø  Check the status of the servers; start any that are down:
    Ø  ./startWebLogic.sh
    Ø  ./startManagedWebLogic.sh <managedservername>
    Ø  [or]
    Ø  Check through console.
    Ø  Check the server logs
    Ø  /apps/bea/domains/gwmp_desktop/logs
    Ø  Adminserver.log
    Ø  Managedserver.log
    Ø  If there are any database errors, check the connection pool and datasource.
    Ø  Services->jdbc->connectionpool,datasource
    Ø  Check the deployment descriptors.
    Ø  weblogic.xml, web.xml
    Ø  If the logs show that configuration changes are required, make the changes and then restart the instances one by one if they are in a cluster.

    9.Server Down/Unknown:

    Ø  Login to the server through PuTTY and also open the Admin Console
    Ø  Check the instance's process from PuTTY and the instance status from the Admin Console
    Ø  If the process does not exist and the instance status is Unknown, check the logs of the server instance as well as the admin logs.
    Ø  Admin and managed server logs.
    Ø  Node Manager status.
    Ø  Find the root cause from the logs and restart the required instances

    10. URL not working:


    Ø  Access the URL
    Ø  Check the status of the server instances on which this application is deployed.
    Ø  Then check the default queue threads (or the application-specific queue, if any) to see whether idle threads are zero. Then check the server logs and application logs (out logs) for errors and exceptions.
    Ø  If idle threads are zero, check which application is consuming all the threads; if it is the same application you are accessing, check with the application owner.
    Ø  (Resolving this requires restarting the corresponding instances; before that, check with the app owner why the threads are being consumed.)
    Ø  If there are any application-related exceptions, check with the application owner or check the server logs for exceptions.
    Ø  If there are any DB exceptions related to the application you are accessing, check whether the corresponding connection pool and datasource are running fine.

    11.Application errors:

    Ø  Access the Application URL
    Ø  Check the instances and their status if any Errors
    Ø  Check the logs of the Server as well as Application (Out) logs
    Ø  Check out the Connection pool Parameters and Datasource



    12.Users unable to access some application/URL:

    Ø  Check out by  accessing the url
    Ø  Check out whether they are using Correct URL or not
    Ø  Check the logs of both Weblogic and Webserver
    Ø  Check the Server Instances status.
    Ø  Test the pools.
    Ø  Check the DB connectivity.
    Ø  Check if the deployment is done properly or not, else redeploy the application and check for errors in the logs simultaneously.
    Ø  Check out the Connection pool user name.
    Ø  Restart the instance if required.

    13.Application error, responding slowly, Application not working/not Opening, not getting authenticated,Blank page

    Ø  Check the Web server and App server instance status.
    Ø  Check the logs for any errors/exceptions both in Webserver as well as in Weblogic Server.
    Ø  Check the Queue threads, Connection pool Status, Connections and Datasource.
    Ø  Check disk space
    Ø  Check that log4j logging is enabled in the log4j properties.
    Ø  Check that the deployment was done properly.


    14.Error while uploading war file:

    Check the available disk space.


    15.Log locations:

    1) Server log
    WebLogic server creates server log file by default under:
    /<domain-name>/<server name>/<server name>.log
    The location is configurable.

    2) JDBC log
    All SQL statements and DB related exceptions/errors.
    This file is created under /<server name>/jdbc.log

    3) STDOUT log (if the process output is redirected to STDOUT)

    4) Domain log
    All domain-level information is logged into this file.
    This is a subset of the server log file.
    <domain name>/<domain name>.log

    5) Access log
    All HTTP requests are recorded in this log file.
    /<server name>/access.log

    6) Transaction log
    All servers record transactions in the tlog file.
    /<server name>/<server name>.tlog

    16.Server Crash:
    Ø  A server crash means that the WebLogic Java process no longer exists.
    Ø  A server crash can occur only because of native code. (Java code by itself cannot cause a process to crash.)
    Ø  Determine all potential sources of native code used by the WebLogic Server:
    Ø  Native I/O (performance pack).
    Ø  Type 2 JDBC driver (Type 4 drivers are pure Java).
    Ø  Native libraries accessed with JNI calls.
    Ø  SSL native libraries.
    Ø  The JVM itself. Most of the time the crash comes from the JVM.

    Sometimes the JVM produces a small log file (hs_err_pid*.log) that may contain useful information about which library the crash originated from.

    Server Crash Analysis
    When a JVM crashes, a core file (a binary image of the process) is created. Run pmap and pstack against the core file to find the library that caused the crash.

    Demo to figure out offending library using existing pmap & pstack out files.
    Check list:

    1) hs_err_pid*.log (look for the library that caused the crash)

    2) pmap core (core file created in the JVM root dir)
    pstack core

    3) Use a debugger (gdb, dbx, adb) if the above two steps do not provide any information
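
Step 1 of the check list can be partly automated. The sketch below (Python, an illustration only) pulls the library name out of the "Problematic frame" entry that HotSpot writes into hs_err_pid*.log. The function name is hypothetical, and frames that do not name a library (for example a compiled Java frame) simply return None.

```python
import re
from typing import Optional

# Matches the two-line entry, e.g.
#   # Problematic frame:
#   # C  [libexample.so+0x1a2b]  some_function+0x1c
_FRAME_RE = re.compile(
    r"^#\s*Problematic frame:\s*\n#\s*[A-Za-z]\s+\[(?P<lib>.+?)\+0x[0-9a-fA-F]+\]",
    re.MULTILINE,
)


def crashing_library(hs_err_text: str) -> Optional[str]:
    """Return the library named in the problematic frame, or None if absent."""
    match = _FRAME_RE.search(hs_err_text)
    return match.group("lib") if match else None
```

A typical use would be crashing_library(open(path).read()) on the hs_err file found in the directory the server was started from; if it returns None, fall back to pmap, pstack or a debugger as listed above.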

    17.Server Hang:

    A server is said to be hung when:
    Ø  The process is still alive
    Ø  The server does not accept any requests because all the execute threads are busy or stuck for some reason.
    Ø  No response is sent to clients.
    Ø  The java weblogic.Admin PING command doesn't return a normal response

    Server Hang Analysis:
    The first step is to take multiple thread dumps.
    Ø  A thread dump is a snapshot of the JVM at the particular instant.
    Ø  Multiple thread dumps are necessary to conclude that the threads are  stuck and not progressing.

    Procedure to take thread dumps:
    Unix:
    Ø  Open a shell window and issue the command kill -3 <PID>, where PID is the Java process ID of WebLogic.
    Ø  Thread dumps are written to the STDOUT file.
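
Because several dumps are needed, the Unix step is often wrapped in a small loop. This is a minimal Python sketch under stated assumptions: the function name, the default of 3 dumps and the 10 second gap are illustrative choices, not values from WebLogic. It sends SIGQUIT, the signal behind kill -3, so it works on Unix only and the dumps still land in the server's STDOUT file.

```python
import os
import signal
import time
from typing import Callable


def request_thread_dumps(
    pid: int,
    count: int = 3,
    interval_seconds: float = 10.0,
    send_signal: Callable[[int, int], None] = os.kill,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send SIGQUIT to `pid` `count` times, pausing between sends.

    Returns the number of signals sent. `send_signal` and `sleep` can be
    swapped out, which makes a dry run possible without a real process.
    """
    sent = 0
    for i in range(count):
        send_signal(pid, signal.SIGQUIT)  # same effect as: kill -3 <pid>
        sent += 1
        if i < count - 1:
            sleep(interval_seconds)  # give the threads time to make progress
    return sent
```

Comparing the resulting dumps shows whether the same threads sit at the same stack frames each time, which is what distinguishes stuck threads from busy ones.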
    Windows:
    Ø  Press Ctrl-Break in the command window where WebLogic is running.
    Ø  Thread dumps are printed in the same command window.

    Windows Service:
    Ø  Open a command prompt and issue the command (make sure beasvc.exe is in the PATH)
    Ø  c:\> beasvc -dump -svcname:service-name
    Ø  Thread dumps are created in the defined log file.
    Ø  While creating the service, we can provide the log option in the installservice script as:
    Ø  -log:"d:\bea\domains\mydomain\myserver-stdout.txt"

    •             Before we analyze thread dumps, it is important to know the common thread states:
    1)Runnable [marked as R in some VMs]:
    This state indicates that the thread is either running currently or is ready to run the next time the OS thread scheduler schedules it.
    2)Object.wait() [marked as CW in some VMs]:
    Indicates that the thread is waiting for some condition to be fulfilled.
    3)Waiting for monitor entry [marked as MW in some VMs]:
    Indicates that the thread is waiting to enter a synchronized block.

    Threads in this state need watching because they indicate lock contention: the thread is waiting for a lock on an object while some other thread holds that lock.

    In WebLogic, the main worker threads belong to the group weblogic.kernel.Default:
    "ExecuteThread: '1' for queue: 'weblogic.kernel.Default'" ...
    This is the set of threads to examine for hang/slow performance issues.
    Below is a snapshot of an idle thread waiting for some work to be assigned.
    On an idle system you would see a lot of threads in this state:

    "ExecuteThread: '1' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x031a6308 nid=0x980 in Object.wait() [2dff000..2dffd8c]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x112cf2c0> (a weblogic.kernel.ExecuteThread)
    at java.lang.Object.wait(Object.java:429)
    at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:153)
    - locked <0x112cf2c0> (a weblogic.kernel.ExecuteThread)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:172)
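
To get a quick picture of a dump, the execute threads can be tallied by the state shown on their header line. The following Python sketch is only an illustration: the function name is made up, and it recognises just the three state phrases described above as they appear in Sun-style dumps (other JVMs print R, CW and MW instead, which this sketch does not handle).

```python
from collections import Counter

# Phrases that Sun-style thread dumps print on the thread header line.
KNOWN_STATES = ("in Object.wait()", "waiting for monitor entry", "runnable")


def count_thread_states(dump_text: str, queue: str = "weblogic.kernel.Default") -> Counter:
    """Count the threads of one execute queue by the state on their header line."""
    counts = Counter()
    for line in dump_text.splitlines():
        header = line.strip()
        # Thread header lines start with the quoted thread name.
        if not header.startswith('"') or queue not in header:
            continue
        state = next((s for s in KNOWN_STATES if s in header), "other")
        counts[state] += 1
    return counts
```

A large "waiting for monitor entry" count that stays high across successive dumps points at lock contention, while a count dominated by "in Object.wait()" matches the idle picture shown in the sample above.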

    •             As for thread dump analysis & conclusion, let's look at a sample thread dump and drill into it further
    Demo of RSD thread dump (Thread stuck issue on UAT)

    Server performing slow
    There are many reasons why a server can perform slowly.
    The first step is to take thread dumps and see what the threads are doing. If there is nothing wrong with the threads, there are other possible reasons why the server performs slowly:

    Process runs OutOfMemory:
    If the Java heap is full, the server process appears to be hung and does not accept any requests, because each request needs heap for allocating objects.
    So if the heap is full, none of the requests get served; they all fail with java.lang.OutOfMemoryError


    OutOfMemory Analysis:
    OutOfMemory can occur because of a real memory crunch, or because a memory leak fills the heap with objects that are no longer needed but are still referenced.
    The first step is to enable GC logging and run the server again
    (-XX:+PrintGCDetails).
    The STDOUT file will then show the garbage collection details.
    If the error is caused by a memory leak, we need a profiler such as Introscope or OptimizeIt to find the source of the leak.
    Process size = Java heap + native memory + memory occupied by the executables and libraries.
    On 32-bit operating systems, the virtual address space of a process can go up to 4 GB. This limit comes from the 32-bit address width (2^32 bytes).

    Out of this 4 GB, the OS kernel reserves some part for itself (typically 1 to 2 GB).
    This limit does not apply on 64-bit machines such as Solaris (SPARC) or Windows running on Itanium (64-bit).
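
The arithmetic behind the 4 GB figure can be checked in a few lines. This Python sketch is purely illustrative; the function name is an assumption, and the kernel reservation is passed in because it varies by operating system.

```python
GIB = 1024 ** 3  # bytes in one gigabyte (binary)


def user_address_space_gib(address_bits: int, kernel_reserved_gib: float) -> float:
    """Address space left to the process after the kernel's share, in GB."""
    total_gib = (2 ** address_bits) / GIB  # 2^32 bytes = 4 GB for 32-bit
    return total_gib - kernel_reserved_gib
```

With 32 address bits and a 1 to 2 GB kernel reservation, this leaves 2 to 3 GB, and the Java heap, native memory, executables and libraries all have to fit into that remainder.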

    OOM can also occur due to fragmentation. In this situation, free memory is available but we still get OutOfMemory errors.
    To understand fragmentation, keep the following fact in mind:
    Each allocation must come from a single contiguous block of heap. If a request needs 2MB of memory, the JVM has to provide a contiguous 2MB chunk.
    Over a period of time, allocated memory becomes scattered and there might not be enough contiguous memory available.
    A full GC might not be able to reclaim a large enough contiguous space.
    This is called fragmentation.
    For example, the verbose:gc output might look like the following if the heap is fragmented. There is free memory available, but the JVM still throws an OOM error.
    (Most of the fragmentation bugs are resolved in Sun JDK1.4.2_xx)

    [GC 4673K->3017K(32576K), 0.0050632 secs]
    [GC 5047K->3186K(32576K), 0.0028928 secs]
    [GC 5232K->3296K(32576K), 0.0019779 secs]
    [GC 5309K->3210K(32576K), 0.0004447 secs]
    java.lang.OutOfMemoryError
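
The "free memory but still OOM" pattern can be read off such a log programmatically. The sketch below is a minimal Python illustration that assumes the simple verbose:gc line format shown above; the class and function names are made up, and newer JVMs print different formats that it does not cover.

```python
import re
from typing import List, NamedTuple


class GcEvent(NamedTuple):
    before_kb: int    # heap in use before the collection
    after_kb: int     # heap in use after the collection
    heap_kb: int      # total heap size (the value in parentheses)
    pause_secs: float

    @property
    def free_after_kb(self) -> int:
        """Heap left free once the collection has finished."""
        return self.heap_kb - self.after_kb


_GC_LINE = re.compile(r"\[(?:Full )?GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def parse_gc_lines(text: str) -> List[GcEvent]:
    """Extract every GC event found in the given log text."""
    return [
        GcEvent(int(before), int(after), int(heap), float(pause))
        for before, after, heap, pause in _GC_LINE.findall(text)
    ]
```

For the four sample lines above, the last collection leaves 3210K in use out of 32576K, that is 29366K free, and yet an OutOfMemoryError follows. That combination is what suggests fragmentation rather than a full heap.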

    Fragmentation-related issues are usually caused by bugs in the JVM.
    The best approach is to try the latest minor version of the JVM; if that does not help, we need to work with the vendor to get it fixed.
    •             The following commands on Solaris provide useful information:
    vmstat:
    Reports statistics about kernel threads, virtual memory, disks, traps and CPU activity.
    sar:
    An OS utility known as the system activity reporter.
    •             If the application uses SSL, the server performs more slowly than without SSL.
    SSL reduces the capacity of the server by about 33 to 50 percent, depending on the strength of encryption used in the SSL connections.

    •             Process running out of file descriptors. The server cannot accept further requests because sockets cannot be created. (Each socket consumes a file descriptor.)
    One of the following exceptions is thrown in such cases:
    java.net.SocketException: Too many open files
    OR
    java.io.IOException: Too many open files
    In this case the lsof utility helps: it lists all open file descriptors. From the list of open files, we (the application owner) can easily figure out whether it is a bug or expected behavior. If it is expected behavior, the number of FDs needs to be increased (the default is 1024).
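
As a lighter-weight complement to lsof, the descriptor count of a process can be compared with the limit. The following Python sketch is an illustration under assumptions: the function names are made up, counting via /proc/<pid>/fd works on Linux only (not Solaris or AIX), and the limit read here is the one of the process running the script, which matches the server's limit only if both were started with the same ulimit settings.

```python
import os
import resource


def fd_usage_percent(open_count: int, soft_limit: int) -> float:
    """Share of the descriptor limit already in use, as a percentage."""
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return round(100.0 * open_count / soft_limit, 1)


def count_open_fds(pid: int) -> int:
    """Linux only: every entry in /proc/<pid>/fd is one open descriptor."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def current_soft_limit() -> int:
    """Soft limit on open files for the current process (see ulimit -n)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

A usage figure that keeps climbing towards 100% between checks suggests a descriptor leak, whereas a stable high figure suggests the limit is simply too low for the load.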

    •             GC pauses taking a long time (more than 20 secs).
    To end users this looks like a hang.
    In this case we need to tune the GC parameters and try the other GC options available. In some cases of very long GC pauses, incremental GC (-Xincgc) has been useful.




    WebLogic Troubleshooting
    Communication between Apache and WebLogic

    If there is an issue between Apache and WebLogic and the cause is not obvious, enable debug at the Apache layer. In the httpd.conf file add:
    Debug ALL
    This creates a file called wlproxy.log under /tmp on the Apache machine. The log contains all the request/response headers exchanged between Apache and WebLogic.
    Most of the plug-in issues in WLS 8.1 centered on the attribute "KeepAliveEnabled".
    For most socket-related errors, it is worth turning off "KeepAliveEnabled" and repeating the test.

    Apache Restart and Check the Connection counts:

    APACHE_HOME\bin\Apache -t        Syntax check
    APACHE_HOME\bin\Apache start     Start the server
    APACHE_HOME\bin\Apache stop      Stop the server
    APACHE_HOME\bin\Apache restart   Restart the server
    APACHE_HOME\bin\Apache -l        List compiled-in modules
_______________________________________________________________________

Getting an error while restarting one of the WebLogic server instances


####<Sep 13, 2007 6:45:44 PM IST> <Error> <EmbeddedLDAP> <bng1web2prod> <itms> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1189689344635> <000000> <Error opening the Transaction Log: ./servers/itms/data/ldap/ldapfiles/EmbeddedLDAP.tran (Permission denied)>
####<Sep 13, 2007 6:45:44 PM IST> <Error> <EmbeddedLDAP> <bng1web2prod> <itms> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1189689344637> <000000> <Error Instantiating 'dc=web2prod-Domain': null>
####<Sep 13, 2007 6:45:44 PM IST> <Critical> <EmbeddedLDAP> <bng1web2prod> <itms> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1189689344653> <BEA-171521> <An error occurred while initializing the Embedded LDAP Server. The exception thown is java.lang.ClassCastException: com.octetstring.vde.backend.BackendRoot. This may indicate a problem with the data files for the Embedded LDAP Server. This managed server has a replica of the data contained on the Master Embedded LDAP Server in the Admin server. This replica has been marked invalid and will be refreshed on the next boot of the managed server. Retry the reboot of this server.>
####<Sep 13, 2007 6:45:44 PM IST> <Critical> <WebLogicServer> <bng1web2prod> <itms> <main> <<WLS Kernel>> <> <> <1189689344667> <BEA-000362> <Server failed. Reason:

While restarting a WebLogic 9.2 instance I got the error above: the server starts, but is then forced to shut down again.

Solution:
Go to that server instance's directory and browse to the
/local/BEA/weblogic92/domain-name/servers/server-name/data/ldap/ldapfiles directory path

You will find the files listed below in that directory

-rw-r--r--   1 weblogic weblogic   79649 Sep 13 18:48 EmbeddedLDAP.data
-rw-r--r--   1 weblogic weblogic       0 Sep 13 18:48 EmbeddedLDAP.delete
-rw-r--r--   1 weblogic weblogic     648 Sep 13 18:48 EmbeddedLDAP.index
-rw-r--r--   1 weblogic weblogic       0 Sep 13 18:48 EmbeddedLDAP.lok
-rw-r--r--   1 weblogic weblogic   80126 Sep 13 18:48 EmbeddedLDAP.tran
-rw-r--r--   1 weblogic weblogic       8 Sep 13 18:48 EmbeddedLDAP.trpos

Delete the following files from that directory
-rw-r--r--   1 weblogic weblogic       0 Sep 13 18:48 EmbeddedLDAP.delete
-rw-r--r--   1 weblogic weblogic       0 Sep 13 18:48 EmbeddedLDAP.lok

Now restart the instance from the bin directory; this should bring your server up and running without issue.
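
If this cleanup has to be repeated on several managed servers, it can be scripted. The following Python sketch is an illustration only: the function name and the dry-run switch are assumptions, it should be run only while the server instance is stopped, and it touches nothing except the EmbeddedLDAP.lok and EmbeddedLDAP.delete files named above.

```python
from pathlib import Path
from typing import List

# Only these two files are removed; the data, index and tran files stay.
STALE_SUFFIXES = (".lok", ".delete")


def remove_stale_ldap_files(ldapfiles_dir: str, dry_run: bool = True) -> List[str]:
    """Remove EmbeddedLDAP.lok / EmbeddedLDAP.delete from the given directory.

    With dry_run=True (the default) nothing is deleted; the function only
    reports which files it would remove.
    """
    affected = []
    for path in sorted(Path(ldapfiles_dir).glob("EmbeddedLDAP.*")):
        if path.suffix in STALE_SUFFIXES and path.is_file():
            if not dry_run:
                path.unlink()
            affected.append(path.name)
    return affected
```

Calling it first with the default dry_run=True shows what would be deleted; passing dry_run=False then performs the deletion before the instance is restarted.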
_______________________________________________________________________________

Issue 1: JMS Issue 1

EOP messaging bridges were failing frequently with the error: "(java.lang.Exception: javax.resource.ResourceException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found)". Because of this issue, messages were piling up on MQ and were not being picked up by the bridge.

Soln:

    Domain: eopdom1 (1 admin + 2 managed servers spread across 2 servers). Checked the bridge configuration (70 bridges in total). Then checked the pool-params in jms-xa-adp.rar (120 on m1 and 20 on m2). Changed this to 150 on both servers, as each bridge needs at least 2 connections from the adapter pool (up to 70 x 2 = 140 if every bridge runs on the same server), then redeployed and restarted the WebLogic instances. Also applied patch WB1E (CR326720_920.jar) to resolve the known issue behind the error mentioned.

Notes:

Live is running on 9.2.0 and test is running on 9.2.3; these should be brought in sync. Also planning a quick round of WLS health checks on EOP.


Issue2: JMS issue 2

The messaging bridge failed to connect to the source and target destinations and gave the error below: "failed to get one of the adapters from JNDI (javax.naming.NameNotFoundException: Unable to resolve 'eis.jms.WLSConnectionFactoryJNDIXA'. Resolved 'eis.jms'; remaining name 'WLSConnectionFactoryJNDIXA')". This suggests that the adapter file jms-xa-adp.rar was either not targeted to the required managed server instance or that the deployment of the adapter failed with some error.

Soln:

Found that the adapter was only targeted to managed2 server whereas the bridge was configured to run on managed1 server. Targeted the adapter to managed1 server as well and restarted the instances.

Issue3: JMS issue 3

A newly configured messaging bridge failed to become Active and the following 2 error messages were seen: "Unable to connect to source destination" and "Configured QoS is not reachable".

Soln:

"Unable to connect to source destination": the source URL had a space between the "//" and the IP. After removing it, the bridge was able to connect to the source destination.

"Configured QoS is not reachable": "QoS degradation allowed" was checked for the earlier bridges but was unchecked for this new bridge, and QoS was configured for "Exactly Once" delivery. After enabling it, the messaging bridge became Active upon bounce of the WebLogic instances.

Notes:

 QoS "Exactly Once" requires the messaging to be XA enabled, i.e. the connection factory should be XA enabled and the destinations should be configured to use the JMS XA adapter.



Issue4: Deployer

Unable to deploy an application from the console; the console page showed the following error: "[Deployer:149150]An IOException occurred while reading input.; nested exception is: java.net.SocketException: Connection reset; nested exception is: java.net.SocketException: Connection reset".

Soln:

The only error message in the logs indicated that the application was attempting to connect to java.sun.com on port 80 over the internet, which was blocked by firewall restrictions; this was reported to the application team. As a work-around, added a manual entry for the application in config.xml and restarted the admin and managed server instances, after which the application was deployed successfully.

Notes:

One argument was that the application deployed properly on another test instance even with the same error. Though we were never able to replicate this, one theory is that when deploying through the console the deployer attempted to connect to java.sun.com again and again, eventually timing out, whereas with the config.xml entry and a restart it attempted the connection only once and then moved on to other tasks that have higher priority during a restart.


Issue4: Startup

“/wls_domains/wlmrtnept/servers/managed3_wlmrtnept/tmp/managed3_wlmrtnept.lok : java.io.IOException: No locks available”

Soln:

This could be due to an incorrect NFS setup (if an NFS filesystem is used); check that the hosts have the correct permissions on the NFS server. Also check that the NFS libraries below are installed:
yum list | grep nfs

*Note*: Red Hat Network repositories are not listed below. You must run this command as root to access RHN repositories.
nfs-utils.x86_64                           1:1.0.9-40.el5         installed
nfs-utils-lib.x86_64                       1.0.8-7.2.z2           installed
Also, the rpc.statd and rpc.idmapd processes should be running.


Issue5:  Cluster

For quite some time we observed multicast packet loss triggering various other problems on WebLogic, such as managed servers dropping out of the cluster and JMS messages not being delivered properly to distributed queues.

A recurring message similar to the one below appears in the logs. Although it is only an informational message, it acts as a trigger for various other issues, so messages like this should not be neglected.

<Mar 23, 2010 12:14:04 PM GMT> <Info> <Cluster> <host1> <managed1> <[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1269346444069> <BEA-000112> <Removing managed2 jvmid:-1616739071273980991S:host2:[61002,61002,-1,-1,-1,-1,-1]:host1:61001,host2:61002,host3:61003,host4:61004:domain-name:managed2 from cluster view due to timeout.>

Soln:

We used the multicast test utility to see whether there was in fact an issue with multicasting:

java utils.MulticastTest -n <name> -a <multicast-address> -p <multicast-port>

The result showed that multicast packets were intermittently being dropped within the VLAN, causing the above issue. We then liaised with the OS experts to narrow down the issue and to see whether the multicast packets were being transmitted correctly among the servers. This did not help much, as from the server perspective all the packets were being transmitted correctly.

Next we involved network experts. After thorough investigation of the network logs and various switch configurations, it was concluded that the problem came down to the multicast address range being used and the way the local switches handled that range. They also suggested that in future we use Link Local Multicast IP Addresses for WebLogic multicasting.

A note on Link Local IP can be found at: http://www.iana.org/assignments/multicast-addresses/

In short, link-local multicast addresses (actually, the link-local MAC addresses) are treated as broadcasts by the local switches, so all WebLogic servers on the same VLAN will see them. Other multicast addresses are dropped by the switches by default unless further action is taken:
1) Disable IGMP snooping on the VLAN or the whole switch. Otherwise the switch just drops the multicast packet, because WebLogic doesn't use IGMP, so the switch never sees an IGMP join request for the multicast group (and thus never maps the MAC address to the switch port).  OR
2) Configure static multicast MAC addresses for the relevant switch ports.

Both options add network complexity and are costly to implement, test and maintain. Link-local multicast addresses avoid these issues completely. Some previous implementations using non-link-local multicast addresses may have worked OK if the switch had IGMP snooping disabled globally or per VLAN.
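
Whether a given cluster multicast address falls in the link-local range can be checked quickly. This is a small Python sketch for illustration; the function name is made up, and the range used is the IANA Local Network Control Block, 224.0.0.0/24. Note that many addresses in that block are assigned to specific protocols, so being in the range does not by itself mean an address is free to use.

```python
import ipaddress

# IANA Local Network Control Block: 224.0.0.0 - 224.0.0.255
LINK_LOCAL_MULTICAST = ipaddress.ip_network("224.0.0.0/24")


def is_link_local_multicast(address: str) -> bool:
    """True if the address lies in the link-local multicast block."""
    return ipaddress.ip_address(address) in LINK_LOCAL_MULTICAST
```

For example, an address such as 239.192.0.1 is a valid multicast address but lies outside this block, so it would be subject to the switch behaviour described above.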


Issue6:   JDBC

Stale connections causing high CPU and high memory utilization and eventually breakdown of database.

Soln:


The hardware and software could not be easily replaced because of the cost involved and the complexity of the application. So the challenge was to make the best use of the database: reduce the number of connections as much as possible, use the connections made effectively, and refine the code if possible.

The majority of the connections to the database were made by the connection pools configured on WebLogic, so WebLogic was the main target for refinement. During the periods when the issue occurred, the number of sessions rose rapidly from 1400 to 1800; while the database was capable of handling 1400 sessions, it could not support 1800 at all. Most of these 1800 sessions were connections created by WebLogic in response to application requests. So it was clear that we needed to go back to pen and paper and tune the connection pools as much as possible.

A look at the configuration of the connection pools pointed to a major issue. The application had an admin server and 14 managed servers, with four connection pools in total. Each connection pool, whether it was required there or not, was targeted to the admin server and all the managed servers. This created many unwanted stale connections on the database that could easily have been avoided. A few connection pools were required only on the admin server and the others only on the managed servers.

So, as the first step in tuning the WebLogic connection pools, we got rid of all such unwanted connections. This effort paid off: the total maximum number of connections to the database was brought down from 2250 to 1520 (a saving of 730 connections).
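
The effect of pool targeting on the worst-case session count can be estimated with simple arithmetic: each server a pool is targeted to can open up to that pool's maximum capacity. The Python sketch below illustrates the calculation; the function name, pool names, server names and capacities are all hypothetical and are not the figures from this incident.

```python
from typing import Dict, List


def total_max_connections(max_capacity: Dict[str, int], targets: Dict[str, List[str]]) -> int:
    """Upper bound on database sessions across all pools and their targets."""
    return sum(max_capacity[pool] * len(servers) for pool, servers in targets.items())


# Hypothetical example: two pools, one admin server and two managed servers.
capacity = {"adminPool": 10, "appPool": 20}
everywhere = {"adminPool": ["admin", "ms1", "ms2"], "appPool": ["admin", "ms1", "ms2"]}
trimmed = {"adminPool": ["admin"], "appPool": ["ms1", "ms2"]}

before = total_max_connections(capacity, everywhere)  # (10 + 20) * 3 servers
after = total_max_connections(capacity, trimmed)      # 10 * 1 + 20 * 2
```

In this made-up example, retargeting alone lowers the ceiling from 90 to 50 connections without changing any pool's capacity, which is the same kind of saving described above.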

Next we tuned the parameters available for the connection pools. We concentrated mainly on two parameters, Shrink Frequency and Inactive Connection Timeout. A short description of each:

Shrink Frequency: The number of seconds (between 0 and a positive 32-bit integer) before WebLogic Server shrinks the connection pool to the original number of connections or the number of connections currently in use. (This field is relevant only if you check the Allow Shrinking box.)

Inactive Connection Timeout: The number of inactive seconds on a reserved connection (between 0 and a positive 32-bit integer) before WebLogic Server reclaims the connection and releases it back into the connection pool.

Shrink Frequency was set to its default value of 900s. We found that the connection pool expanded to its maximum size during peak loads, but on average any transaction took about 100 to 200s to complete. So we reduced the Shrink Frequency to 300s, so that the pool is shrunk every 300s and any idle connections are closed.

Also, Inactive Connection Timeout was set to its default value of 0s, which meant that inactive connections were not being released back to the pool, causing WebLogic to open new connections. This was later set to 300s so that inactive connections are released back to the pool and can be reused.

The above actions proved quite effective in reducing the overall load on the database.


Issue7: Startup

While trying to start WebLogic as a Windows service, the Service Manager throws an exception: "Error 1067 the process terminated unexpectedly."

Soln:

When this happens, nothing is recorded in the WebLogic logs, because the Service Manager failed to initiate the WebLogic start-up process itself. Check the Windows Event logs for more information.
___________________________________________________________________