When the Roundtrip Timeout (RTT) is used, a call does not return within the expected time.

The first problem has already been discussed. When the connection cannot be established, the effective timeout is not the one set by the RTT but connection timeout * retries (1 s * 5 = 5 s) from the JacORB properties file (see bug report 549).

The second problem is much worse. If N clients try to send requests to the same server, request processing queues up in the synchronized ClientIIOPConnection.connect method. Each client has to wait until the previous client's timeout is reached and all of its retries have been attempted (5 s). At that point nothing checks whether the RTT has already expired, so this client and all the clients queued behind it also try to establish the connection, one by one. The last client (say, of 30) ends up waiting N * connection timeout * retries (30 * 5 s = 150 s). If the application continuously creates new requests with a cycle shorter than 5 s, this also leads to resource exhaustion.

As far as I know, the RTT is defined to limit the lifetime of a request. I did not find anything stating that the RTT guarantees the application is informed exactly when the request times out. But isn't that exactly what the application programmer expects?
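The queuing effect described above can be reproduced without an ORB. The following is only a minimal sketch under stated assumptions: SerializedConnectDemo and all of its methods are hypothetical names, the synchronized block merely stands in for the synchronized connect method, and a failed connection attempt is simulated with a sleep. It is not JacORB code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of the queuing effect; this is not JacORB code.
class SerializedConnectDemo {
    private final Object connectLock = new Object();

    // Stand-in for a synchronized connect(): the caller holds the lock for all retries.
    void connect(long connectTimeoutMs, int retries) {
        synchronized (connectLock) {
            for (int i = 0; i < retries; i++) {
                try {
                    Thread.sleep(connectTimeoutMs); // simulated failed connection attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    // Worst-case wait of the last of N queued clients, in milliseconds.
    static long worstCaseMs(int clients, long connectTimeoutMs, int retries) {
        return clients * connectTimeoutMs * retries;
    }

    // Starts N clients at once and returns the longest observed latency in milliseconds.
    static long measureMaxLatencyMs(int clients, long connectTimeoutMs, int retries) {
        SerializedConnectDemo demo = new SerializedConnectDemo();
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        long start = System.nanoTime();
        List<Future<Long>> results = new ArrayList<>();
        for (int c = 0; c < clients; c++) {
            results.add(pool.submit(() -> {
                demo.connect(connectTimeoutMs, retries);
                return (System.nanoTime() - start) / 1_000_000L;
            }));
        }
        long max = 0;
        try {
            for (Future<Long> f : results) {
                max = Math.max(max, f.get());
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return max;
    }

    public static void main(String[] args) {
        // Scaled-down values (10 ms instead of 1 s) so the demo finishes quickly.
        System.out.println("measured max latency (ms): " + measureMaxLatencyMs(4, 10, 5));
        System.out.println("worst case, 30 clients, 1 s, 5 retries (ms): " + worstCaseMs(30, 1000, 5));
    }
}
```

Because every caller runs all of its retries while holding the lock, the last caller's latency grows linearly with the number of waiting clients, regardless of any per-request timeout the caller may have had in mind.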
Created attachment 315 [details] Example Application to show the multiplication effect of the timeouts
Using the example application:

console 1: java server ping.ior
console 2: java PingClient ping.ior
console 1: Control-C

At the end of each output line on console 2 there is a number that gives the latency in ms of one invocation of the operation doSimplePing. PingClient runs 4 clients in parallel, each calling the Ping interface. The latencies are as expected while the server hosting the Ping implementation is running. After the server application is stopped, the output changes to something like "retries exceeded ..." and the number becomes something like N * CONNECTION_TIMEOUT * RETRIES. With the given jacorb_properties file and the 4 threads used in the current PingClient application you can see effective latencies of up to 4 * 500 * 5 = 10 000 ms, and sometimes more.
Created attachment 325 [details] patch for correct client timeouts (RTT)
Created attachment 326 [details] short description of the patch
*** Bug 549 has been marked as a duplicate of this bug. ***
See also Bug 814, which is closely related
Created attachment 340 [details] Improved patch (don't use the former one - take this instead).
Bug: #787
Subject: Improved patch for client timeouts (RTT)

This patch is an enhancement of Peter Nikol's last patch and includes his latest changes.

Test case:
--------------------------
Description: We want to send a blocking CORBA call with a timeout. If the call cannot be completed within the specified time, an org.omg.CORBA.TIMEOUT exception is thrown.

Used CORBA interface:
interface Ping { void doSimplePing(in long timeoutMillis); };

The client calls the method 'doSimplePing(500)'. The server receives the request and delays the response for 500 ms.

Example:
1. doSimplePing(500), request timeout 250ms --> org.omg.CORBA.TIMEOUT
2. doSimplePing(500), request timeout 700ms --> no timeout, call returned within 700ms
3. doSimplePing(5000), request timeout 7000ms --> no timeout, call returned within 7000ms

With this test application it is possible to provoke timeouts explicitly.

Test procedure 1:
- Start the server application on machine 1
- Copy the ping.ior from the server to the client
- Start the client application on machine 2
  --> Test behaviour is as described in the example above
- Unplug the network cable
  --> All CORBA requests lead to an org.omg.CORBA.TIMEOUT exception or to an org.omg.CORBA.COMM_FAILURE exception.
- Plug in the network cable
  --> After catching an org.omg.CORBA.COMM_FAILURE exception the application reinitializes the connection by rereading the IOR file.

Test procedure 2:
- Start the server application on machine 1
- Copy the ping.ior from the server to the client
- Start the client application on machine 2
  --> Test behaviour is as described in the example above
- Unplug the network cable
  --> All CORBA requests lead to an org.omg.CORBA.TIMEOUT exception or to an org.omg.CORBA.COMM_FAILURE exception.
- Restart the server application (a new port is used)
- Copy the ping.ior from the server to the client
- Plug in the network cable
  --> After catching an org.omg.CORBA.COMM_FAILURE exception the application reinitializes the connection by rereading the IOR file.
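The expected behaviour of the example calls in the test case can be mimicked without an ORB. This is only a rough sketch under stated assumptions: PingTimeoutDemo and callWithTimeout are hypothetical names, the server delay is simulated with a sleep, and a returned false stands in for the point where a real client would receive org.omg.CORBA.TIMEOUT.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the Ping test case; no ORB is involved.
class PingTimeoutDemo {
    // Simulates the server side of doSimplePing: delays the reply by delayMillis.
    static void doSimplePing(long delayMillis) throws InterruptedException {
        Thread.sleep(delayMillis);
    }

    // Returns true if the call completed within requestTimeoutMillis and false
    // if it timed out (where a real client would see org.omg.CORBA.TIMEOUT).
    static boolean callWithTimeout(long delayMillis, long requestTimeoutMillis) {
        ExecutorService sender = Executors.newSingleThreadExecutor();
        Future<?> reply = sender.submit(() -> {
            doSimplePing(delayMillis);
            return null;
        });
        try {
            reply.get(requestTimeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            reply.cancel(true); // interrupt the simulated call
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            sender.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Example 1 and 2 from the test case; example 3 is omitted only to keep the run short.
        System.out.println("500 ms delay, 250 ms timeout, completed: " + callWithTimeout(500, 250));
        System.out.println("500 ms delay, 700 ms timeout, completed: " + callWithTimeout(500, 700));
    }
}
```

The caller is released when its own timeout expires, independent of how long the simulated call would still take.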
Solved problems:
----------------
- Timeout problem as Peter Nikol described in bug #787
- If the request timeout was smaller than the connection timeout, an org.omg.CORBA.TIMEOUT exception was thrown. If the server was restarted in the meantime and the server port number had changed, the application never got an org.omg.CORBA.COMM_FAILURE exception when the timeout value was small. While the Executor detects the org.omg.CORBA.COMM_FAILURE, the application receives only an org.omg.CORBA.TIMEOUT, which is not an indicator of a defective connection. There is no trigger to reread the IOR file and create a new connection.
  ---> Therefore a connection with an org.omg.CORBA.COMM_FAILURE exception is marked as invalid. If the client wants to send a CORBA call through the same connection, an org.omg.CORBA.COMM_FAILURE exception (instead of org.omg.CORBA.TIMEOUT) is thrown immediately.
- Sporadic deadlocks after unplugging the network cable (patch by Marc Heide, bug #708)
- While the connection is invalid, it will not be returned on request to a client thread. In this case the ClientConnectionManager creates a new connection with the same profile and returns it.

Modified classes:
- ClientConnection
- ClientConnectionManager
- ClientGIOPConnection
- GIOPConnection
- ClientIIOPConnection
- Delegate

The changes are based on JacORB release 2.3.0

Description of the patch:
-------------------------
- ClientConnection:
  The method 'isConnectionInvalid()' has been added to check whether the connection is valid.
- ClientConnectionManager:
  1. The method 'getConnection(org.omg.ETF.Profile profile)' checks whether a VALID connection with the specified profile already exists in the connection pool. Otherwise a new connection is created and returned.
  2. The method 'releaseConnection(ClientConnection connection)' checks whether the connection can be closed and removed from the connection pool. It can be closed if there is no client using this connection [connection.decClients()] or the connection is invalid [connection.isConnectionInvalid()]. An invalid connection is removed from the connection pool.
- ClientGIOPConnection:
  In the method 'closeAllowReopen()', acquiring the write lock has been moved before the 'synchronized(connect_sync)' statement to avoid deadlocks. (Bug #759, Richard Ridgeway)
- GIOPConnection:
  1. The write lock implementation [getWriteLock() and releaseWriteLock()] has been changed, because we assume that holding a write lock and requesting it a second time in the same thread can lead to deadlocks.
  2. If an 'org.omg.CORBA.COMM_FAILURE' exception occurred, the connection is marked as invalid. The next time this connection is used, an 'org.omg.CORBA.COMM_FAILURE' exception is thrown immediately.
  3. Closing the connection in the methods 'getMessage()' and 'sendMessage(MessageOutputStream out)' after an error occurred has been removed, according to the patch from Marc Heide (Bug #708).
- ClientIIOPConnection:
  A 'java.net.ConnectException' in the method 'connect(org.omg.ETF.Profile server_profile, long time_out)' is caught as an 'IOException' and rethrown as an 'org.omg.CORBA.COMM_FAILURE' exception.
- Delegate:
  1. The 'bind_sync' object has been replaced by a 'ReentrantLock' object from the concurrent package. With this object it is possible to set a blocking timeout. (Enhanced patch from Peter Nikol)
  2. Sending CORBA messages [method 'invoke_internal(...)'] has been decoupled from the application thread. For this purpose an 'Executor' object is used, which sends the messages in its own thread. One sending thread (Executor) is used per Delegate object.

We hope that our bug fixes are helpful for the JacORB community and will find their way into the next major release. We tested the bug fixes carefully. Please understand that the fixes come without any guarantee.
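Two of the ideas in the patch description, waiting for the bind lock only as long as the request's time budget allows and failing fast on a connection marked invalid, might look roughly like the sketch below. This is not the patch itself: BoundedBindDemo, Outcome, bind, markInvalid and bindWhileBusy are hypothetical names chosen for illustration, and the actual connection setup is left out. Only java.util.concurrent.locks.ReentrantLock and its tryLock(long, TimeUnit) method are real APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; class and method names are illustrative, not JacORB's.
class BoundedBindDemo {
    enum Outcome { BOUND, TIMEOUT, COMM_FAILURE }

    private final ReentrantLock bindLock = new ReentrantLock();
    private volatile boolean connectionInvalid = false;

    // Called after a communication failure has been detected on the connection.
    void markInvalid() {
        connectionInvalid = true;
    }

    // Tries to bind before the absolute deadline (based on System.nanoTime()).
    Outcome bind(long deadlineNanos) {
        if (connectionInvalid) {
            return Outcome.COMM_FAILURE; // fail fast instead of waiting for a timeout
        }
        long remaining = deadlineNanos - System.nanoTime();
        try {
            if (remaining <= 0 || !bindLock.tryLock(remaining, TimeUnit.NANOSECONDS)) {
                return Outcome.TIMEOUT; // budget used up while queued behind other callers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.TIMEOUT;
        }
        try {
            if (connectionInvalid) {
                return Outcome.COMM_FAILURE;
            }
            // A real implementation would establish the connection here.
            return Outcome.BOUND;
        } finally {
            bindLock.unlock();
        }
    }

    // Demo helper: another thread holds the lock for holdMillis while the caller
    // tries to bind with a budget of budgetMillis.
    static Outcome bindWhileBusy(long holdMillis, long budgetMillis) {
        BoundedBindDemo demo = new BoundedBindDemo();
        CountDownLatch held = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            demo.bindLock.lock();
            try {
                held.countDown();
                Thread.sleep(holdMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                demo.bindLock.unlock();
            }
        });
        holder.setDaemon(true);
        holder.start();
        try {
            held.await(); // make sure the lock is really taken before binding
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return demo.bind(System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis));
    }

    public static void main(String[] args) {
        System.out.println("lock busy 300 ms, budget 50 ms: " + bindWhileBusy(300, 50));
        System.out.println("lock busy 20 ms, budget 2000 ms: " + bindWhileBusy(20, 2000));
    }
}
```

Unlike a synchronized block, tryLock with a timeout lets a queued caller give up once its own budget is spent, so a waiting client does not go on to start a fresh round of connection attempts after its request has already expired.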
Created attachment 359 [details] Improved patch for client timeouts (RTT) This is the attachment (diff files) for changes described in comment 8
Created attachment 360 [details] Improved patch for client timeouts (RTT) [Java source files] This is the attachment (Java source files) for changes described in comment 8
Created attachment 382 [details] Improved patch for client timeouts (RTT) Improved patch for client timeouts (RTT). A thread pool is used for CORBA timeout calls. [Text diff files]
[Emailed submitter on 11/11/11 requesting they test CVS head and whether their changes are still necessary (given OCI's NIO addition) or is a subset of the changes required]
(In reply to comment #12)
> [Emailed submitter on 11/11/11 requesting they test CVS head and whether their
> changes are still necessary (given OCI's NIO addition) or is a subset of the
> changes required]

Hello Nick,

sorry for the delayed answer. We are interested in the fixes for this bug and will test the current head. Due to heavy project load, we will not be able to start immediately. Our intention is to have first test results by the end of November.

Thanks
Peter
Have you made any progress testing the new version?
Created attachment 394 [details] Recent corbaping test software Inside this zip file there is a Readme.txt that you should follow.
We are currently experiencing timeout problems using JacORB 3.0. Can you confirm whether this bug has been addressed or not? This is just so I don't waste a lot of time trying to determine if our issue exactly matches this one.
I have not rerun the tests on the current 3.6.1 version. It would be useful to verify whether it is still an issue. @Rudolf Visagie: if you have a reproducible test case, that would be helpful.
(In reply to Nick Cross from comment #18)
> I have not rerun the tests on the current 3.6.1 version. It would be useful
> to verify if it is still an issue.
> @Rudolf Visagie : if you have a reproducible test case that would be helpful

Unfortunately we are seeing the problems in our production environment under special conditions and therefore do not have a reproducible test case. I'm not even sure that it matches this bug. Does this bug affect the request reply timeout after a connection has already been established (jacorb.connection.client.pending_reply_timeout), or only timeouts when establishing a connection? If it's only when establishing connections, it's definitely not related to our problem.
@Rudolf : I believe Peter has put a description of the problem in the ticket. If you feel it doesn't meet your criteria please enter a separate ticket.