Tuning Apache and Tomcat for Web 2.0 Comet applications


Disclaimer (Added on 25 December 2011)


This is a very old page, written sometime in 2006. A lot of the information may be out of date. I have personally switched to Nginx. However, this page remains a good chronicle of an Apache, Tomcat, Linux and Java tuning effort. To put things in perspective: C10K was considered insurmountable, Jetty had just got continuation support, Tomcat 6 was not out, we all aspired to be "MySpace" and nobody knew of Facebook. The management was too timid to try async and event-driven solutions and finally went with the ugly design of supporting a lot of open connections for a chat application instead of an event-driven one.

On the bright side, though, a lot of the problems and techniques described here are generic and can be plugged into your context. If you are into serious web application tuning then it may well be worth a read.


Tuning Apache and Tomcat for Web 2.0 Comet applications

Introduction to the problem

Traditional web applications are request oriented. The client, typically a browser, sends a request for a resource to a web server. The server has a listening thread that keeps track of incoming connections. When a request arrives, the server uses one process or thread to process the request. The resource is returned to the client and the connection is closed.

[ Browser ..=> web server ..=> response ..=> close ]
In this model, the number of requests that can be served in a second depends on two things:
(A) How many threads are available to handle client requests.
(B) How long it takes to serve one request.


If all server threads are busy then incoming requests are put in a queue. The server returns to the queued requests when server threads become free. The number of requests handled per second (reported by tools like Apache ab) is typically much greater than the number of allowed simultaneous connections. All this is made possible because the time required to process a request is very short. In other words, you can serve more requests in a second than you have threads. The out-of-box Apache 2 configuration on my machine allows 256 maximum simultaneous connections. This is the web as we know it.

However, there is one breed of applications that needs to hold onto the connection. Think of applications that require real-time data to reach clients (stock tickers), or applications where low latency is required. In the traditional web model above, the browser has to re-connect to get the new data (polling). If new updates can happen with high frequency (e.g. a chat application) then the polling frequency also has to increase. An alternative to high-frequency polling is a push-based application.

What is a push-based application? Here the browser connects to the server. The server maintains the connection until the browser times out (the server response stream is not closed) and keeps flushing data down the connection as and when it becomes available.

Push-based applications hold onto the connection. When the request arrives, the server allocates a thread for processing. However, this thread is not freed immediately: it blocks waiting for output from the backend. So in a response-oriented push application, the requests per second would be approximately the number of allowed simultaneous connections, a very low number. And now we have a problem, because a stock Apache installation only allows 256 MaxClients, i.e. simultaneous connections.

Let me try to explain the problem as if you were writing a servlet to run in the Tomcat container.

Say you write a servlet. In its service() or doGet() method you write all the processing instructions. The servlet container grabs a thread from the thread pool and tells it to run through your service method. The connection is established and the thread enters the service method. Now, if you are a push-based application, you just cannot fall through to the end of the doGet() method. If you do that, you have run through the service() method and the response stream will be closed. You do not want that.

So what you do is block the thread on some condition within the service method. When push data becomes available, this thread writes to the response stream and again enters a blocked state. So as long as you hold onto the connection, you cannot return this thread to the thread pool. And as more and more "push" connections are established, you run out of threads!
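The blocking pattern described above can be sketched outside the servlet API with a small, self-contained class. The names here (PushConnection, deliver, awaitNext) are illustrative, not part of Tomcat or the servlet spec; one queue per connection stands in for the condition the servlet thread blocks on:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// One instance per open push connection. The "servlet thread" blocks in
// awaitNext(); the backend wakes it by calling deliver(). After writing
// the event to the response stream, the thread calls awaitNext() again,
// so it never returns to the pool while the connection is open.
class PushConnection {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();

    // Backend side: hand an event to the blocked connection thread.
    void deliver(String event) {
        events.offer(event);
    }

    // Servlet side: block until an event arrives or the timeout
    // (the browser time-out in the text) expires; null means timed out.
    String awaitNext(long timeoutMillis) {
        try {
            return events.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

A doGet() body would loop on awaitNext() and write each event to the response, which is exactly why the thread cannot go back to the pool.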

The Apache + Tomcat web interface is provided using servlets that operate in a one-request-one-thread mode. So when we block the servlet, we actually block one thread per connection. Even on open connections we are not writing data in a linear fashion; the data from the backend is delivered in response to events that are not known a priori. If we have dedicated threads that block on data from backend events, then we are limited by the number of threads allowed in a JVM.

Possible solutions

How can we remedy the above situation? What possible solutions are there?

(A) Increase the number of server threads (MaxClients for Apache). However, if you increase the number of server threads, you increase the resources consumed on the server as well. You have to overcome the limits of stock Apache and Tomcat installations. In short, you need to do some serious tuning.

(B) Provide support in the web server to suspend a connection and resume it again. Two examples are Jetty continuations and the Netscape personalization engine (NPE). This way you can do traditional web programming and still be happy.

(C) Do something about Java threads. Can we suspend a thread without returning from a method and then resume it again?

(D) Use a language that provides support for 100K threads or use different kind of thread implementation.

(E) Leave the whole mess of an HTML/JavaScript front end behind. Code the front end in Flash or applets, which allow you to open direct sockets.
[ links to this kind of applications ]

(F) Can non-blocking I/O help? (This could be related to point B above.) Use a different HTTP connector like Grizzly.

(G) Do not use the HTTP protocol; use a protocol that allows you to maintain sessions easily.

(H) Change the model to use polling.

[ This was the mail I first sent to tomcat-users list.]

List: tomcat-user
Subject: Supporting maximum number of keep-alive connections
From: “Rajeev Jha”
Date: 2006-03-17 19:47:44
Message-ID: f25d79b70603171135t7452403ao23a0eccb552cd78e () mail ! gmail ! com
[Download message RAW]

I would like to try out Tomcat for my new application. I have used Tomcat for quite some time, but the nature of the new application is very different from the traditional request-response model. We want to build an application that supports about 1024 keep-alive connections per machine (2 GB, x86, running Linux 2.6.x). We need keep-alive support because the application is push-oriented. Browsers send a request and the server keeps the connection open till the browser time-out (10 mins).

Meanwhile, data is pushed to the client as JavaScript commands. When the browser time-out happens, a command for a new connection is sent back. The application is dynamic in nature, using servlets to publish data. After googling and going through the mailing archives, I would like to collect feedback from other Tomcat users.

Option A) This option is one thread per connection, so not very scalable. Just use the Coyote HTTP/JK connector, and set the thread stack size, VM heap size and maximum threads in the config file. For my other applications, the defaults worked fine for me. But this time I am looking for as many threads as I can support.

What is the maximum number of threads people have had success with on "NORMAL" machines (2 GB RAM, single processor, running Linux)? On such a machine, what is the maximum possible number of threads? What parameters should be tuned?

It looks like with a thread stack size of 256 KB, 1024 threads should be possible.

Option B) Use Tomcat APR. Adjust the poller size to 1024. I don't know how this works out in real life or what the maximum possible numbers are.

Option C) Use Glassfish's Grizzly as the HTTP connector. I have seen blog posts about Grizzly. Can this connector be used with Tomcat 5.5.x? Has anyone had success using it with this kind of requirement?

Then, the other bottleneck is integrating with Apache. If we run Tomcat with the AJP/APR connector and Apache is the front end, then Apache would also be hard pressed to do 1024 keep-alive connections. I can try the event MPM or worker modules. Does this look OK?

Active child processes – 32
Server Limit – 64
ThreadsPerChild – 32
MaxClients – 1024

I would appreciate it if any of you can speak from experience. Our first aim is to support 1024 keep-alives with Tomcat, then 1024 keep-alives with the Apache/Tomcat combo, then maybe run multiple instances, then multiple machines ... (long shot).

I have some dev machines and time for this job. So if you guys help and want me to run some benchmarking tool , i can do that also.



The whole thread is available in the MARC tomcat-user archive.

After going through the thread you can see that people advise against using stock Apache and Tomcat installations because:

(1) To service 1024 requests you need 1024 threads, and there is no guarantee that the application would work with so many threads.
(2) Most of our threads would be in a blocked state most of the time, and that just does not look okay!

Why we did what we did

Now I would like to provide some details on why we still went ahead with option (A).

(1) Writing a custom connector was error-prone and likely to take a longer time. In today's world hardware is cheap, so why not try the blocked-thread model of servlets? Our production guys were ready to increase the number of server machines. The strategy was to use a large number of server machines and load balance the requests.

(2) I tried Jetty continuations. Here, you suspend the servlet thread when no data is available. You resume the request when data becomes available and then suspend it again. So you keep doing suspend/[resume on data] till the browser time-out. I could do one suspend/resume but could not get the suspend => resume => suspend cycle to work.

[ mail to jetty-discuss] This is still pending from my side.

(3) CPS/pico threads, Erlang and Yaws. I never sat down to do it. All my middleware code is in Java; the push portion of the application is just a part of it. And most important of all, I could never convince the production guys to install an Erlang server on their boxes. Somehow all this work still looked confined to the academic realm.

(4) Changing the model to a polling model was never an option. I am not a front-end guy, and switching to polling would have meant a really major design change which product management would never have bought.
(5) There has been a lot of talk about non-blocking I/O (NBIO) connectors and how they are a cure for everything. I do not claim to be an expert on the subject; what follows in the next paragraph are my impressions only. The promise of NBIO connectors is that you can watch a large number of sockets with a small number of threads. But that is the housekeeping code of the servlet container. In our case, we block threads inside our servlet code, so we need N threads for N connections and we cannot change that, at least not until the one-request-one-thread model of servlets changes. I am planning to expand this section after doing some more reading.

[ An Aside on keep-alive]

One more thing to understand is that the requirement of push applications cannot be met by keep-alives. Keep-alives are request oriented: since the client may request the same resource again, the connection is kept open for some more time (KeepAliveTimeout). The open connection of a push application is not there so that multiple requests can reuse the same server socket and minimize the overhead of creating new sockets. In a push application the server needs to push data down the open connection for a length of time comparable to the browser time-out.

Here, everything is response-oriented: you should not close the response stream. In theory our push application should work without the keep-alive directive. In reality, however, it works with keep-alives only. This could be due to other reasons and I need to debug this issue. Until then, we keep using keep-alives in our Apache configuration.

[ An Aside on keep-alive] 

(6) And the number one reason, the most important of them all: when I read about the improvements in the 2.6.x Linux kernel, running 1k threads looked very, very possible. Our first and last goal was to achieve 1024 users per CPU.

So now we asked ourselves: what would it take to do it the one-connection-one-thread way? How do we do it the wrong way, against the advice of everyone? This is what I would like to describe next, step by step, one piece after another. To the learned, all this will look like a lot of waste, so the wise can just ponder the numbers and provide their comments. For the mortals, here is how we did it!

Our Application details

I would like to explain our application in some detail. We have an HTML interface. We follow standard web practices until they start beating the practical side of things. Most of the UI glue/magic is achieved by JavaScript code, and it would take a while to figure out all the UI code. This UI code interfaces to the web server and servlet container. Our front end is Apache, and Tomcat plays the role of servlet container. We have tried to make this layer as lightweight as possible. The threads entering the servlet doGet() method do not do much. They do not instantiate thousands of variables or do hundreds of computations. Most of the time they are just passing messages between the UI layer and our middleware code. You can think of this layer as a message router. All the heavy lifting is done by our backend code, written in C. Keeping this layer simple allows for a simple design.

There was some debate in our minds about front-ending the Tomcats with the Apache web server. We had heard about APR and how it obviates the need for a separate Apache front end. However, we still decided to go with Apache because we wanted the support of a few modules like mod_cache, mod_status and mod_rewrite, and our production install policy dictates having Apache as the front end. The connection between Apache and Tomcat is through the JK connector. Again, the production guys were against using Apache 2.2, so we went ahead with Apache 2.0.x. We have the Tomcat 5.5.x series and JDK 1.5.x series installed. Finally, everything is running on Red Hat Enterprise Linux Advanced Server (RHEL AS4).

A schematic diagram would look like this:
[ Apache => mod_jk=> tomcat => JVM => Linux ]

Tuning our web server layer means tuning everything involved in the stack. We should talk about tuning Apache so that it can take 1024 simultaneous connections. We should talk about tuning the AJP/13 connector between Apache and Tomcat. We should talk about tuning Tomcat and the JVM it is running in. Finally, we should tune our OS as well.

[ Tuning Apache ]
* Install and configure *

Installing apache is always easy. You just download the source code from apache.org and do

./configure --prefix=/path/to/prefix --enable-rewrite --with-mpm=worker --enable-cache --enable-status
make install

We need to explain --with-mpm=worker. We wanted a hybrid model of processes and threads in our Apache server, so we compiled with this switch. We wanted to try out the Apache 2.2 event MPM as well, but the production guys were against it, so we could not give it a spin. The results obtained with worker looked better than prefork (the default). For more details, please see [ http://httpd.apache.org/docs/2.0/mod/worker.html ]
The rest are modules specific to our work.

* Apache configuration *

Apache 2 has a 256 MaxClients limit out of the box, but the compiled-in hard limit of MaxClients (ServerLimit) is 20,000, way beyond our purpose.

Our aim was to support 1024 simultaneous Apache connections. The parameter that governs simultaneous connections is MaxClients. For a prefork Apache, MaxClients is the number of processes. Worker is a hybrid MPM, with a number of threads per child process. We could not scale our prefork Apache (frankly, we did not try much either). We decided to go ahead with the worker MPM, and we now describe the tuning of the worker.c module in some detail.

(1) ServerLimit: This goes hand-in-hand with MaxClients. For the prefork MPM, this is the maximum value of MaxClients. For the worker MPM, together with ThreadsPerChild it determines MaxClients: MaxClients = ThreadsPerChild x ServerLimit.

(2) ThreadsPerChild: the number of threads created per Apache child process. The default is 25. The upper bound is determined by ThreadLimit (default 64).

(3) MaxClients : This is the number of simultaneous requests that apache can handle.

Listed below is one real world example (http://www.stdlib.net/colmmacc/Apachecon-EU2005/scaling-apache-handout.pdf)
ServerLimit 900
StartServers 10
MaxClients 28800
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 32
MaxRequestsPerChild 100

As you can see, MaxClients = ServerLimit x ThreadsPerChild

If we go with prefork, I do not know if we can start so many Apache processes on one machine, even theoretically! On my Linux machine:

Linux Kernel thread and process limits

[rajeevj01@bdc31035e ]$ cat /proc/sys/kernel/pid_max
So this number needs to be tuned first.

After going through all such examples, we settled on a configuration that will allow us to run 1k simultaneous connections. Any perspective gained here will determine how we go ahead with 2k simultaneous connections. So this is the setup we try first:

ServerLimit 4
ThreadLimit 512
StartServers 2
MaxClients 1024
MinSpareThreads 128
MaxSpareThreads 256
ThreadsPerChild 256

We get 1024 connections using 256 threads x 4 processes. The same number of threads can be achieved with 128 threads x 8 processes or 512 threads x 2 processes. Why did we decide on 256 x 4? It just looks sort of balanced. As it is, in all probability you will not be able to run this example configuration with the default thread stack size on Linux machines (so many threads!). To understand better, we need to explore the following topics:
+ support for a large number of threads in Linux (NPTL)
+ limiting the thread stack size using ulimit
+ changing the number of open file descriptors per shell using ulimit

[ NPTL support and threads in linux kernel 2.6.x]

There have been major improvements in thread support in the Linux kernel. Earlier, people could have choked you to death for asking for 10k threads; now it is okay to ask. It is worth doing some study and googling of the topic; search terms are NPTL, Ingo Molnar, Linux 2.6 kernel thread support, etc. In summary, the 2.6.x kernel series has better threading support. For this reason, we use the 2.6.x kernel.

 [How many threads can be supported by your RAM and linux ?]

The system limits can be checked in following places

+ cat /proc/sys/kernel/threads-max [32464]
+ grep totalpages /var/log/dmesg (totalpages/16 is the max number of concurrent threads possible, see also http://nptl.bullopensource.org/Tests/NPTL-limits.html)
+ cat /proc/sys/kernel/pid_max
+ cat /proc/sys/vm/max_map_count (need to figure out what exactly this is)

You will find that on 2.6.x kernels these numbers are fairly large. I have listed the values from my Fedora Core 4 machine. On one of my production boxes, /proc/sys/kernel/threads-max is 163840! A better way to find the number of threads supported is to use the threads-limit program from the Volano chat site. However, when I run this threads-limit program on my Fedora Core 4 box, which should support 32k threads, this is the output:

Volano.org - Threads Limit program

Thread stack size

Message from threads-limit
[rajeevj01@bdc31035e sandbox]$ ./threads
Creating threads …
100 threads so far …
200 threads so far …
300 threads so far …
Failed with return code 12 creating thread 304.

Only 300 threads; what is wrong here? The thread stack size. By default the thread stack size is around 10 MB. We need to reduce the thread stack size in order to create more threads. Use the ulimit command to reduce it to 256 KB:

$ ulimit -s 256
Now when we run the program again, we get 12,000 threads!
11800 threads so far …
11900 threads so far …
12000 threads so far …
Failed with return code 12 creating thread 12081.
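The same arithmetic is visible from inside the JVM. `ulimit -s` sets the native default, and the JVM-side analogues are the -Xss flag or the stackSize hint on the four-argument Thread constructor (which the JVM is free to ignore). The class name below is illustrative:

```java
// Sketch: request a 256 KB stack per thread, mirroring `ulimit -s 256`.
// A smaller per-thread stack means more threads fit in the address space.
class SmallStackThreads {
    // Run the task on a thread with a 256 KB stack hint and wait for it.
    static boolean runToCompletion(Runnable task) {
        Thread t = new Thread(null, task, "small-stack", 256 * 1024);
        t.start();
        try {
            t.join();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```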

Now is the time to learn what ulimit is and how it affects our tuning effort!

[ Effect of ULIMIT command on tuning]

ulimit is a command that provides control over the resources available to the shell and to processes started by it. On a Linux system, do man ulimit for more details. For us, the number of threads and the number of sockets are the important parameters. We need ulimit to decrease the default thread stack size and to increase the number of file descriptors available to the shell and the processes started by it. We decrease the thread stack size to start more and more threads.

At the same time, we need to bump up the file descriptor limit to create the large number of sockets required by our servers and connectors. 1024 simultaneous connections means 1024 sockets on Apache, and 1024 sockets each for the AJP/13 connector on the Tomcat side and the Apache side. [ Need to fix ]

The ulimit command has two variants: ulimit -aH and ulimit -aS. H shows the hard limits (what is set in the config files for a normal user) and S shows the soft (current) values. If you are root, it is all pretty okay: you can just use the ulimit command to increase whatever you want. Normal logins cannot bump resources above the hard limits. For such cases, we first have to increase the hard limits by changing /etc/security/limits.conf on Linux 2.6 systems. If you are using PAM, you also need to do the following for the limits.conf file to be honored:

Edit /etc/security/limits.conf, and add the following to any config file in /etc/pam.d/ that handles logins ("login" and "ssh", for example): session required /lib/security/pam_limits.so

We have no issue decreasing the thread stack size, as we want a value less than the current soft limit. File descriptors, however, are a different thing altogether: here we want more than the hard limit set in limits.conf (given by ulimit -n, typically 1024). So the system administrator needs to bump up the open file descriptors hard limit first. Afterwards you can raise the number of open file descriptors available to processes in the shell using the ulimit -n command.

After making these changes, your apache should be able to create 1024 concurrent connections.

Now we turn our attention to the Tomcat servlet container and the JVM hosting it. To a large extent, tuning the JVM is application dependent, so we start with Tomcat tuning and AJP connector tuning. The point is, we should be running a lean and mean Tomcat, in a JVM tuned for the application we are hosting.

[ Tuning tomcat container]

Application design has an important role in resource-hungry code. Most of the time people blame the container for performance woes when they should be looking hard inside their own applications. If you tell the container to do 10,000 tricks, of course it will take time. I firmly believe that we were able to run with so many threads because of a very simple design. Our application is stateless and totally request oriented.

No heavy lifting happens inside the servlet code. Messages from the backend are delivered asynchronously to a listener that awakens the threads blocked in servlet code. The servlet threads then just pass the message tokens to the UI front end, where they are evaluated in JavaScript functions. We are able to run happily with a thread stack size of 256 KB.

The second point is configuration. Earlier we were running with an [ Apache => mod_jk => tomcat ] configuration. One servlet container was holding the long-lived threads and also doing message passing from the UI. The servlet doing message delivery to the backend (1 in the diagram below) is pretty normal. The code doing message delivery from the backend to the UI uses long-living threads. Both were running inside one container. Performance improved significantly when we separated (1) and (2).

UI => message => servlet (1) => backend => servlet thread blocks (2) => message => UI 
Now we have one container for normal message delivery and one dedicated container for holding the long-living threads.
                ( Apache ) =>  [Normal | long-living threads| ]

We are not doing "heavy" synchronization. Messages from the backend are delivered to a queue and the writer is never blocked. All the cleanup etc. is done by the queue reader thread.
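A minimal sketch of that hand-off, with illustrative names (BackendMailbox, publish, drainOne): on an unbounded concurrent queue, offer() never waits, so the backend writer can never be held up by a slow reader:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Backend-to-servlet hand-off. publish() never blocks the writer;
// only the reader side polls, and the reader owns all cleanup.
class BackendMailbox {
    private final Queue<String> inbox = new ConcurrentLinkedQueue<>();

    void publish(String msg) {   // backend side: never blocks
        inbox.offer(msg);
    }

    String drainOne() {          // reader side: null when empty
        return inbox.poll();
    }
}
```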

Apart from that, here are some points about Tomcat tuning:

(0) Get rid of large XML configuration files, especially if the parsing is a per-request thing. (Someone please kill me for saying so!)
(1) Switch off DNS lookups.
(2) Switch on gzip compression.
(3) Do not load unnecessary applications.
(4) Disable the servlet auto-reloading feature.
(5) Keep tracing and debugging minimal (no big writes to disk).
(6) Adjust the thread stack size with ulimit -s 256 before running the container, because we need something like 1024 threads for our AJP/13 connector.
(7) Tune the JVM parameters as well. This is the subject matter of the next section.

And finally, always do profiling to identify the "hot-spots" inside your code. In the appendix, we describe setting up an eclipse profiler.

JVM tuning

JVM tuning could fill a separate book, so I will not go into all the details. All tuning information is very application dependent; what works for me may not work for you. We made use of the jps and jstat commands to procure GC data (please check the appendix for more details). After one round of GC data capture we could see that:

(A) The eden space was small, so minor collections were happening every now and then.
(B) The survivor space was not big, so after minor collections objects were overflowing into the tenured space at a very young age.

Keeping that in mind, we start with a bigger young generation. Also, since we are running a server application, we keep -Xms and -Xmx the same. We use a 768 MB heap for the container holding the connections and a 512 MB heap for the other container. Here are our JVM parameters:

-Xss256K -Xms768M -Xmx768M -XX:NewSize=384M -XX:SurvivorRatio=6 -XX:+UseParallelGC
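One small way to confirm at startup that the -Xms/-Xmx values actually took effect is the standard java.lang.management API; the class name here is illustrative:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch: read the configured heap ceiling at runtime, e.g. to log it
// at container startup and confirm the -Xmx flag was picked up.
class HeapCheck {
    static long maxHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getMax();   // -1 if the maximum is undefined
    }
}
```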

Tuning mod_jk connector

After Tomcat, we turn our attention to the mod_jk AJP connector. Many how-tos explain the usage of this plug-in; for a quick setup please check the appendix. There are a few things worth noting with the mod_jk connector.
see this posting on tomcat-users list for more:

[ http://marc.theaimsgroup.com/?l=tomcat-user&m=114080334306171&w=2 ]

(A) For each Apache thread, mod_jk opens a socket to Tomcat. For a push application, we are not writing any data on this socket most of the time; we only write data down the mod_jk socket as and when it becomes available. However, we still want this connection between Apache and Tomcat to stay open till the browser time-out happens. That is the requirement of our push application. This means we should not use directives that drop the connection when no data is sent or the connection becomes inactive [ if no data is written till a timeout ]. When the re-connect happens from the browser, we need to recycle the old connections that still persist on Tomcat.

(B) The value of the maxThreads parameter for the AJP/13 connector should be 1024 in the server.xml file. This is needed to ensure that we do not run out of threads on the Tomcat side. The number of threads controls how many concurrent requests this connector can serve.

(C) For 1024 connections, we need 1024 x 3 = 3072 sockets at least, so make sure the ulimit -n value is more than 4k or you will run out of sockets.
(D) We are intentionally running an application that requires one thread per connection, so in our case we do not worry too much about "wasting" threads and sockets on the Tomcat side. We are running a 1:1 configuration, so let's go ahead with 1024 maxThreads.
(E) You should run your application for long durations to check that connections are indeed recycled. You are in trouble if your thread count and socket count keep increasing.
(F) The Apache side of the connections are keep-alive connections.

With the above points in mind, we come up with the following configuration in server.xml:

<Connector port="48090"
               maxThreads="1024" minSpareThreads="100" maxSpareThreads="200"
               enableLookups="false" protocol="AJP/1.3"
               tomcatAuthentication="false" connectionTimeout="600000"/>

The requests from Apache are forwarded to Tomcat via the mod_jk connector. We do not want a long wait for the Tomcat reply, so we have the following in our workers.properties file [ this is application dependent and may not work for you ]:

# persistent worker handles long live connections
#worker for handling all other type of requests

For a push application you keep the response stream open and write data when it becomes available. When the data is not available, the thread inside the servlet doGet() method just blocks. Suppose the browser drops the connection while the servlet thread is blocked, before the browser re-connection time. Now we cannot push data to the browser, because the browser side of the connection is closed. We get a ClientAbortException in such cases, and we handle this exception inside our servlet code.
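Tomcat's ClientAbortException is a subclass of java.io.IOException, so a container-independent sketch of the handling is to catch IOException around the push write. The names (PushWriter, tryPush) are illustrative, and a plain Writer stands in for the servlet response writer:

```java
import java.io.IOException;
import java.io.Writer;

// Returns true if the push succeeded, false if the client went away
// (in Tomcat this surfaces as ClientAbortException, an IOException).
class PushWriter {
    static boolean tryPush(Writer out, String data) {
        try {
            out.write(data);
            out.flush();
            return true;
        } catch (IOException clientGone) {
            // Browser dropped the connection: stop pushing and let the
            // servlet thread return so its resources can be recycled.
            return false;
        }
    }
}
```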

Caching issues, problems in the Safari browser, and mod_deflate

Push applications require data to be pushed as and when it becomes available. However, many servers will keep buffering data till the response stream is closed. This also happens when you have enabled compression: to compress the data, the server wants the "full" response. So you have to keep the URL mapped to the push stream out of your mod_deflate (or other compression module) settings. Buffering happens at the browser end too. To instruct the browser not to cache and to show data immediately, we send the following headers:

    Header set Expires "Sat, 6 May 1995 12:00:00 GMT"
    Header set cache-control "no-store, no-cache"
    Header append cache-control "must-revalidate , no-transform"
    Header append cache-control "pre-check=0, post-check=0"
    Header set Pragma "no-cache"

=>  How to collect performance related data  <=

We need to monitor resources on the box where we are running our web server and applications. What should we look out for?

A) Apache status. There is a module called mod_status that gives you the status of Apache.

B) You need to watch the Tomcat container. You can do that via JMX tools (something like MC4J) or just use the manager application bundled with Tomcat. Using MC4J you can watch the active thread count, JVM heap and GC data all in one place.

C) Number of open sockets and their states. We can use the netstat command for this purpose:
$netstat -a -n|grep -E "^(tcp)"| cut -c 68-|sort|uniq -c|sort -n
This will show you a sorted list of how many sockets are in each connection state.

D) To see the CPU status, you can use the top command:
$ top -b -d 2 > top_stats
This redirects the output of top to a plain-text file every 2 seconds.
See man ps, free and watch as well.

E) Memory (I should not be swapping):
vmstat <seconds>; e.g. "vmstat 2" gives stats every 2 seconds.
Do a man vmstat for more details.

F) JVM GC data
You can use the jps and jstat command-line tools to gather GC data.

G) Install the Red Hat sysstat package.

[ Appendix ]

+ How to install mod_jk connector with apache
+ Make the eclipse java profiler work for you
+ Installing tomcat with (A)pache (P)ortable (R)untime
+ Program to check file descriptor limit
+ Program to check threads limit
[ References ]

Benchmarking tools
The Tomcat site lists three of them. ab is also supposed to be very good.

Mailing lists
Tomcat dev provides a host of information.
MARC archive for tomcat-users and tomcat-dev http://marc.theaimsgroup.com/?l=tomcat-dev&r=1&w=2
Comet and push applications 

Alex Russell on comet
Remote scripting
Apple developer site article on remote
Event driven architectures – Google for SEDA, MINA, ACE 

JVM profiling and tuning
Java tuning whitepaper
Location of BEA tuning docs

USENIX paper on why Events are a bad idea

HTTP interface to jabber implements JEP 0025

NBIO connectors
Grizzly is used in project glassfish
The C10K problem

Jetty continuations

Here you can understand why we require support from connector and server to scale and why simple NBIO connector would not work.

Google for Pico threads

Erlang and Yaws

Concurrency constructs to do thread blocking in servlets. Check Doug Lea's page.