This bug is related to bug 787, but describes a slightly different problem: If a round trip timeout policy has been added to a remote object reference, and the network connection to the remote side is interrupted while data are being sent within a synchronous method invocation, the thread that issued the call blocks for a duration that is unrelated to the timeout. The thread does not return until the underlying TCP implementation gives up resending the data, which depends on the TCP stack and may take hours.

Another face of this bug: If the data can be sent, but sending takes longer than the timeout (because network throughput is degraded), the calling thread is blocked until all of the data has been sent, regardless of the timeout. It then receives a TIMEOUT exception, even though the complete data has actually been transmitted.

The reason for this behaviour: Deep inside JacORB, the call that actually puts the data on the socket blocks in java.net.SocketOutputStream.write(). The timeout is implemented somewhat naively by class org.jacorb.orb.ReplyReceiver.Timer, which starts a new thread that just waits until the timeout has elapsed (or the remote call has finished) and sets a flag if the timeout expired. Thus the timeout expires as designed, but as long as the calling thread is blocked in SocketOutputStream.write() this has no effect.

As far as I can see, the actual problem is this: java.net.SocketOutputStream.write() is always a blocking operation, and the Java socket API offers no way to specify a timeout for write operations. Thus a write cannot be made to return by a timeout: SocketOutputStream.write() returns only when all data have been transferred or the TCP stack gives up.

BTW, with asynchronous calls the problem is less visible, but it is present as well: the sending thread lingers in the background until java.net.SocketOutputStream.write() returns.
To fix this, I see several options:

1) If org.jacorb.orb.ReplyReceiver.Timer.run() detects a timeout, call close() on the socket that is in use. According to the documentation of Socket.close(), this raises an exception in all threads blocked in I/O on this socket.

2) If a timeout applies to a synchronous call, perform the call in a separate thread (as is always done for asynchronous calls). Let the calling thread wait until either sending is done or the timeout expires. This option was already implemented in the fix for bug 787 submitted by my colleague Peter Nikol. I could verify that the fix also avoids the problem described here. However, the thread blocked in SocketOutputStream.write() will not die until the call returns, which may take hours. As in 1), it might be a good idea to additionally call close() on the socket to prevent this.

For both 1) and 2): IMHO it is generally questionable to start a separate thread for every outgoing call if this can be avoided. This applies to the current implementation as well.

3) Use java.nio.channels.SocketChannel instead of the 'traditional' Java sockets, at least for writing to the socket. With SocketChannels, we can use non-blocking mode and select() to implement write timeouts without having to create other threads. For reading from the socket with a timeout we could either use the same means or use the read timeout facility of 'traditional' Java sockets. Thus, for synchronous calls we could implement timeouts entirely without additional threads. Of course, this option requires a lot of work, but I feel it is the best one with respect to efficient resource utilization.
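Option 3 can be sketched roughly as follows. This is only an illustration of the non-blocking write plus select() idea using plain java.nio, not JacORB code: the class NioWriteTimeout and the methods writeFully and timesOutOnStalledPeer are hypothetical names, and the 300 ms timeout and 32 MB payload are arbitrary values chosen so that a peer that never reads fills the loopback socket buffers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class NioWriteTimeout {

    /** Writes all of buf, or throws SocketTimeoutException once timeoutMillis has elapsed. */
    static void writeFully(SocketChannel ch, ByteBuffer buf, long timeoutMillis) throws IOException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        ch.configureBlocking(false); // write() now returns immediately, possibly with 0 bytes written
        try (Selector selector = Selector.open()) {
            ch.register(selector, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new SocketTimeoutException(
                            "write timed out, " + buf.remaining() + " bytes unsent");
                }
                selector.select(remainingMillis); // wait until writable or until the deadline
                selector.selectedKeys().clear();
                ch.write(buf);
            }
        }
    }

    /** Hypothetical demo: returns true if a write to a peer that never reads is abandoned. */
    static boolean timesOutOnStalledPeer() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel stalledPeer = server.accept()) { // accepted, but never read from
                writeFully(client, ByteBuffer.allocate(32 * 1024 * 1024), 300);
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("write timed out: " + timesOutOnStalledPeer());
    }
}
```

Note that no helper thread is involved: the calling thread itself waits in select() and gives up at the deadline. After a timeout the channel holds a partially written message, so a real implementation would presumably have to close the connection anyway.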
New write-timeout options are part of the CVS trunk. There is one for timing out the sending of requests and a separate one for timing out the writing of replies. Both properties take the number of milliseconds to wait for a write:

jacorb.connection.request.write_timeout
jacorb.connection.reply.write_timeout

The timeout enforces a deadline on the act of performing a blocking write. The implementation uses a single thread that manages a timer queue and signals callback handlers once their wait time has elapsed. When using RTT (the round trip timeout), or either of the above timeout options, a callback is registered with the timer queue. This enables a single thread to handle all waiting functions. In the case of the write timeouts with traditional blocking I/O, the callback handler closes the connection, which triggers the existing error handling code. Note that if the new NIO feature is used, the write timeouts will never fire because the writer will never block. In that case, the RTT will perform as expected.
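The mechanism described above, a single timer thread whose callback closes the connection when a blocking write misses its deadline, can be illustrated with a small standalone sketch. This is not the JacORB implementation: it substitutes a java.util.concurrent.ScheduledExecutorService for the timer queue, and WriteDeadline, writeWithDeadline and closesStalledWrite are hypothetical names. The 300 ms deadline and 32 MB payload are arbitrary demo values.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class WriteDeadline {

    // One shared daemon thread serves all pending deadlines (stand-in for the timer queue).
    static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "write-deadline-timer");
                t.setDaemon(true);
                return t;
            });

    /** Blocking write; if it is still running after timeoutMillis, the socket gets closed. */
    static void writeWithDeadline(Socket socket, byte[] data, long timeoutMillis)
            throws IOException {
        ScheduledFuture<?> watchdog = TIMER.schedule(() -> {
            try {
                socket.close(); // makes the blocked write() fail with an exception
            } catch (IOException ignored) {
                // nothing useful to do if close itself fails
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(data);
            out.flush();
        } finally {
            watchdog.cancel(false); // write finished (or failed): drop the pending callback
        }
    }

    /** Hypothetical demo: returns true if a write to a peer that never reads is aborted. */
    static boolean closesStalledWrite() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket stalledPeer = server.accept()) { // accepted, but never read from
            try {
                writeWithDeadline(client, new byte[32 * 1024 * 1024], 300);
                return false;
            } catch (IOException expected) {
                return client.isClosed();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stalled write aborted: " + closesStalledWrite());
    }
}
```

The sketch ignores one race: if the callback fires just as the write completes, a healthy connection is closed. That is tolerable only if the caller treats a closed connection as a timeout and reconnects.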
We hope this is fixed now.