Read-only archive; use https://github.com/JacORB/JacORB/issues for new issues

Bug 787

Summary: RTT is not working as expected when connection isn't established
Product: JacORB Reporter: Peter.Nikol
Component: ORB Assignee: Gerald Brose <gerald.brose>
Status: NEW ---    
Severity: critical CC: conny.krappatsch, jacorb, robert.bienias, rolfgall, rudolf.visagie, ze_corsaire
Priority: P1    
Version: 2.2.2   
Hardware: PC   
OS: All   
Attachments: Example Application to show the multiplication effect of the timeouts
patch for correct client timeouts (RTT)
short description of the patch
Improved patch (don't use the former one - take this instead).
Improved patch for client timeouts (RTT)
Improved patch for client timeouts (RTT) [Java source files]
Improved patch for client timeouts (RTT)
Recent corbaping test software

Description Peter.Nikol 2007-07-04 13:52:34 CEST
When the Round Trip Timeout (RTT) is used, a call does not return within the expected time.

The first problem has already been discussed. When the connection isn't
established, the effective timeout is not the one set by the Roundtrip Timeout
(RTT) but the connection timeout * retries (1s * 5 = 5s) from the jaco property
file (see bug report 549).

The second problem is much worse. If, let's say, N clients try to send requests
to the same server, request processing queues up in the synchronized
ClientIIOPConnection.connect method. Each client has to wait until the previous
client's timeout is reached and all of its retries have been tried (5s).

After that wait it is not checked whether the RTT has already expired, so this
client and all those queued behind it also try to establish the connection, one
by one. The last of N clients (let's say 30) ends up waiting N * connection
timeout * retries (30 * 5s = 150s).

If the application continuously creates new requests with a cycle shorter than
5s, this also ends in a resource shortage.

As far as I know, the RTT is defined to limit the lifecycle of a request. I
didn't find anything saying that the RTT guarantees the application is informed
exactly when the request times out. But isn't that exactly what the application
programmer expects?
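
To make the missing check concrete, here is a minimal Java sketch. It is not JacORB code: the class DeadlineAwareConnector, its fields, and the simulated always-failing connect attempt are assumptions made for illustration. It only shows the idea of re-checking the caller's deadline after entering the synchronized connect step and before every retry, so that a queued client stops as soon as its RTT budget is spent.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustration only; all names are hypothetical and the server is always "down". */
final class DeadlineAwareConnector {
    private final long attemptMillis; // simulated duration of one failed connect attempt
    private final int retries;        // maximum number of attempts
    private int attemptsMade;

    DeadlineAwareConnector(long attemptMillis, int retries) {
        this.attemptMillis = attemptMillis;
        this.retries = retries;
    }

    /** Serialized connect that gives up as soon as the caller's deadline has passed. */
    synchronized void connect(long deadlineNanos)
            throws TimeoutException, IOException, InterruptedException {
        for (int attempt = 1; attempt <= retries; attempt++) {
            // Check the deadline once the lock is held and again before each retry.
            if (System.nanoTime() - deadlineNanos >= 0) {
                throw new TimeoutException("deadline passed before attempt " + attempt);
            }
            attemptsMade++;
            Thread.sleep(attemptMillis); // stands in for a connect attempt that times out
        }
        throw new IOException("retries exceeded");
    }

    synchronized int attemptsMade() {
        return attemptsMade;
    }

    public static void main(String[] args) throws InterruptedException {
        DeadlineAwareConnector connector = new DeadlineAwareConnector(100, 5);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(250);
        try {
            connector.connect(deadline);
        } catch (TimeoutException | IOException e) {
            System.out.println(e.getMessage() + " after "
                    + connector.attemptsMade() + " attempt(s)");
        }
    }
}
```

With such a check, a client that has waited behind other clients fails with a timeout right away instead of starting its own full round of retries.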
Comment 1 Peter.Nikol 2007-07-04 14:03:47 CEST
Created attachment 315 [details]
Example Application to show the multiplication effect of the timeouts
Comment 2 Peter.Nikol 2007-07-04 15:18:50 CEST
Using the example application:

console 1: java server ping.ior
console 2: java PingClient ping.ior

console 1: Control-C

At the end of each output line on console 2 there is a number giving the
latency in ms of one invocation of the operation doSimplePing.

In PingClient there are 4 clients running in parallel, calling the Ping interface.

The latency is as expected while the server hosting the Ping implementation is running.

After the server application is stopped, the output changes to something like
"retries exceeded ..." and the number becomes something like
N * CONNECTION_TIMEOUT * RETRIES.

With the given jacorb_properties file and the 4 threads in the current
PingClient application you can see effective latencies of up to
4 * 500 * 5 = 10 000 ms and sometimes more.
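
The arithmetic can be written down as a small Java helper. This is only a worked example of the N * connection timeout * retries product; the class and method names are made up for illustration and are not part of the example application.

```java
/** Worked example of the timeout multiplication; names are hypothetical. */
final class TimeoutMultiplication {
    /** Worst-case wait of the last queued client: clients * connection timeout * retries. */
    static long worstCaseWaitMillis(int clients, long connectTimeoutMillis, int retries) {
        return (long) clients * connectTimeoutMillis * retries;
    }

    public static void main(String[] args) {
        // 4 threads, 500 ms connection timeout, 5 retries (values from this comment)
        System.out.println(worstCaseWaitMillis(4, 500, 5));   // prints 10000
        // 30 clients, 1 s connection timeout, 5 retries (values from the description)
        System.out.println(worstCaseWaitMillis(30, 1000, 5)); // prints 150000
    }
}
```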
Comment 3 Peter.Nikol 2007-09-18 15:58:24 CEST
Created attachment 325 [details]
patch for correct client timeouts (RTT)
Comment 4 Peter.Nikol 2007-09-18 16:02:14 CEST
Created attachment 326 [details]
short description of the patch
Comment 5 Richard Ridgway 2007-11-13 11:51:39 CET
*** Bug 549 has been marked as a duplicate of this bug. ***
Comment 6 Martin Cornelius 2007-11-21 16:01:33 CET
See also Bug 814, which is closely related
Comment 7 Peter.Nikol 2008-01-17 16:05:34 CET
Created attachment 340 [details]
Improved patch (don't use the former one - take this instead).
Comment 8 Robert Bienias 2009-02-06 17:27:37 CET
Bug: #787
Subject: Improved patch for client timeouts (RTT)

This patch is an enhancement of Peter Nikol's last patch and includes his latest changes.

Test case:
--------------------------
Description:

We want to send a blocking CORBA call with a timeout.
If the call cannot be completed within the specified time, an org.omg.CORBA.TIMEOUT exception is thrown.

Used CORBA interface:
interface Ping
{
  void doSimplePing(in long timeoutMillis);
};

The client calls the method 'doSimplePing(500)'. 
The server receives the request and delays the response for 500ms.

Example:
1. doSimplePing(500),  request timeout  250ms
   --> org.omg.CORBA.TIMEOUT
2. doSimplePing(500),  request timeout  700ms
   --> no timeout, call returned within 700ms
3. doSimplePing(5000),  request timeout 7000ms
   --> no timeout, call returned within 7000ms
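
The expected behaviour of these examples can be simulated without an ORB. The following Java sketch is an assumption-laden stand-in, not the test application: the class PingTimeoutDemo is hypothetical, a sleeping task plays the role of the server delaying the reply, and java.util.concurrent.TimeoutException stands in for org.omg.CORBA.TIMEOUT.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Simulation only; no CORBA involved, all names are hypothetical. */
final class PingTimeoutDemo {
    /** Returns true if the simulated call did not complete within the request timeout. */
    static boolean timesOut(long serverDelayMillis, long requestTimeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<Object> reply = worker.submit(() -> {
                Thread.sleep(serverDelayMillis); // server delays the response
                return null;
            });
            try {
                reply.get(requestTimeoutMillis, TimeUnit.MILLISECONDS);
                return false; // call returned within the timeout
            } catch (TimeoutException e) {
                reply.cancel(true); // stand-in for org.omg.CORBA.TIMEOUT
                return true;
            }
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Example 1: server delay 500 ms, request timeout 250 ms
        System.out.println("timed out: " + timesOut(500, 250));
        // Example 2: server delay 500 ms, request timeout 700 ms
        System.out.println("timed out: " + timesOut(500, 700));
    }
}
```

Example 3 (delay 5000 ms, timeout 7000 ms) can be run the same way with timesOut(5000, 7000); it is left out of main only to keep the run short.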

With this test application it is possible to create timeouts explicitly.


Test procedure 1:
- Start the server application on machine 1
- Copy the ping.ior from the server to the client
- Start the client application on machine 2
  --> Test behaviour is as described in the example above
- Unplug the network cable
  --> All CORBA requests lead to an org.omg.CORBA.TIMEOUT exception or
      to an org.omg.CORBA.COMM_FAILURE exception.
- Plug in the network cable
  --> After catching an org.omg.CORBA.COMM_FAILURE exception the application
      reinitializes the connection by rereading the IOR file.
  

Test procedure 2:
- Start the server application on machine 1
- Copy the ping.ior from the server to the client
- Start the client application on machine 2
  --> Test behaviour is as described in the example above
- Unplug the network cable
  --> All CORBA requests lead to an org.omg.CORBA.TIMEOUT exception or
      to an org.omg.CORBA.COMM_FAILURE exception.
- Restart the server application (a new port is used)
- Copy the ping.ior from the server to the client
- Plug in the network cable
  --> After catching an org.omg.CORBA.COMM_FAILURE exception the application
      reinitializes the connection by rereading the IOR file.


Solved problems:
----------------
- Timeout problem as Peter Nikol described in bug #787
- If the request timeout was smaller than the connection timeout, 
  an org.omg.CORBA.TIMEOUT exception was thrown. 
  If the server was restarted in the meantime and its port number has changed,
  the application never gets an org.omg.CORBA.COMM_FAILURE exception
  if the timeout value is small.
  While the Executor detects the org.omg.CORBA.COMM_FAILURE, the application 
  receives only an org.omg.CORBA.TIMEOUT, which does not indicate a 
  defective connection.
  There is no trigger to reread the IOR file and create a new connection.
  
  ---> Therefore a connection with an org.omg.CORBA.COMM_FAILURE exception
       is marked as invalid. 
       If the client wants to send a CORBA call through the same connection,
       an org.omg.CORBA.COMM_FAILURE exception 
       (instead of org.omg.CORBA.TIMEOUT) is thrown immediately.
- Sporadic deadlocks after unplugging the network cable 
  (Patch Marc Heide, bug #708)
- While a connection is invalid, 
  it is not handed out to a client thread on request.
  In this case the ClientConnectionManager creates a new connection with
  the same profile and returns it.
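
The invalid-connection handling listed above can be pictured with a small Java sketch. It is not taken from the patch: ConnectionPoolSketch, its nested Connection class, and the CommFailure exception (a stand-in for org.omg.CORBA.COMM_FAILURE) are hypothetical, and a plain String plays the role of the profile.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only; not the patched ClientConnectionManager. */
final class ConnectionPoolSketch {
    /** Stand-in for org.omg.CORBA.COMM_FAILURE, to avoid a CORBA dependency. */
    static final class CommFailure extends RuntimeException {
        CommFailure(String message) {
            super(message);
        }
    }

    static final class Connection {
        final String profile;
        private volatile boolean invalid;

        Connection(String profile) {
            this.profile = profile;
        }

        void markInvalid() {
            invalid = true;
        }

        boolean isInvalid() {
            return invalid;
        }

        /** Fails immediately once the connection has been marked invalid. */
        void send(String request) {
            if (invalid) {
                throw new CommFailure("connection to " + profile + " is invalid");
            }
            // A real transport would write the request here.
        }
    }

    private final Map<String, Connection> pool = new HashMap<>();

    /** Hands out the pooled connection only while it is valid, otherwise a fresh one. */
    synchronized Connection getConnection(String profile) {
        Connection connection = pool.get(profile);
        if (connection == null || connection.isInvalid()) {
            connection = new Connection(profile);
            pool.put(profile, connection);
        }
        return connection;
    }

    public static void main(String[] args) {
        ConnectionPoolSketch manager = new ConnectionPoolSketch();
        Connection first = manager.getConnection("server-a");
        first.markInvalid(); // e.g. after a communication failure was detected
        Connection second = manager.getConnection("server-a");
        System.out.println("new connection created: " + (second != first));
    }
}
```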


Modified classes:
- ClientConnection
- ClientConnectionManager
- ClientGIOPConnection
- GIOPConnection
- ClientIIOPConnection
- Delegate

The changes are based on JacORB release 2.3.0 

Description of the patch:
-------------------------

- ClientConnection:
  The method 'isConnectionInvalid()' has been added to check
  whether the connection is still valid.

- ClientConnectionManager:
  1. The method 'getConnection(org.omg.ETF.Profile profile)' checks
     whether a VALID connection with the specified profile already exists
     in the connection pool.
     Otherwise a new connection is created and returned.
  2. The method 'releaseConnection(ClientConnection connection)' checks
     whether the connection can be closed and removed from the connection pool.
     It can be closed if no client is using the connection any more
     [connection.decClients()] 
     or if the connection is invalid [connection.isConnectionInvalid()].
     An invalid connection is removed from the connection pool.

- ClientGIOPConnection:
  In the method 'closeAllowReopen()', acquiring the write lock has been moved
  before the 'synchronized(connect_sync)' statement to avoid deadlocks.
  (Bug #759, Richard Ridgway)

- GIOPConnection:
  1. The write lock implementation [getWriteLock() and releaseWriteLock()] 
     has been changed because we assume that holding a write lock and 
     requesting it a second time in the same thread can lead to deadlocks.
  2. If an 'org.omg.CORBA.COMM_FAILURE' exception occurred, 
     the connection is marked as invalid.
     The next time this connection is used, an 'org.omg.CORBA.COMM_FAILURE'
     exception is thrown immediately.
  3. Closing the connection in the methods 'getMessage()' and
     'sendMessage(MessageOutputStream out)' after an error occurred
     has been removed, in line with the patch from Marc Heide (Bug #708).

- ClientIIOPConnection:
  A 'java.net.ConnectException' in method 
  'connect(org.omg.ETF.Profile server_profile, long time_out)' is caught as
  an 'IOException' and is rethrown as an 'org.omg.CORBA.COMM_FAILURE' exception.
  
- Delegate:
  1. The 'bind_sync' object has been replaced by a 'ReentrantLock' object from
     the concurrent package.
     With this object it is possible to set a timeout for blocking.
     (Enhanced patch from Peter Nikol)
  2. Sending CORBA messages [method 'invoke_internal(...)'] has been decoupled
     from the application thread.
     For this an 'Executor' object is used,
     which sends the messages in its own thread.
     One sending thread (Executor) is used per Delegate object.
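
As a rough illustration of why a 'ReentrantLock' allows a blocking timeout where a 'synchronized' block does not, here is a minimal Java sketch. It does not reproduce the patched Delegate: the class TimedBindSketch and the method runBound are hypothetical, and java.util.concurrent.TimeoutException stands in for org.omg.CORBA.TIMEOUT.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

/** Illustration only; not the patched Delegate. */
final class TimedBindSketch {
    private final ReentrantLock bindLock = new ReentrantLock();

    /** Runs the action under the lock, or gives up once waitMillis has elapsed. */
    void runBound(Runnable action, long waitMillis)
            throws TimeoutException, InterruptedException {
        // Unlike entering a synchronized block, tryLock returns false after the timeout.
        if (!bindLock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("lock not acquired within " + waitMillis + " ms");
        }
        try {
            action.run();
        } finally {
            bindLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        TimedBindSketch sketch = new TimedBindSketch();
        // Uncontended case: the lock is free, so the action runs immediately.
        sketch.runBound(() -> System.out.println("bound"), 100);
    }
}
```

A thread that finds the lock held by a slow bind can thus report a timeout after its own budget instead of waiting for the holder to finish.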


We hope that our bugfixes are helpful for the JacORB community and will find their way into the next major release.
We tested the bugfixes carefully. Please understand that the fixes come without any guarantee.

Comment 9 Robert Bienias 2009-02-06 17:37:47 CET
Created attachment 359 [details]
Improved patch for client timeouts (RTT)

This is the attachment (diff files) for changes described in comment 8
Comment 10 Robert Bienias 2009-02-06 17:45:32 CET
Created attachment 360 [details]
Improved patch for client timeouts (RTT) [Java source files]

This is the attachment (Java source files) for changes described in comment 8
Comment 11 Robert Bienias 2010-12-03 09:25:46 CET
Created attachment 382 [details]
Improved patch for client timeouts (RTT)

Improved patch for client timeouts (RTT).
A thread pool is used for CORBA timeout calls.  [Text diff files]
Comment 12 Nick Cross 2011-11-14 22:54:50 CET
[Emailed submitter on 11/11/11 requesting they test CVS head and whether their changes are still necessary (given OCI's NIO addition) or is a subset of the changes required] 
Comment 13 Peter.Nikol 2011-11-16 08:56:23 CET
(In reply to comment #12)
> [Emailed submitter on 11/11/11 requesting they test CVS head and whether their
> changes are still necessary (given OCI's NIO addition) or is a subset of the
> changes required] 
Hello Nick,
sorry for the delayed answer. We are interested in the fixes for this bug and will test the current head.

Due to heavy project load, we will not be able to start immediately. Our intention is to have first test results by the end of November.

Thanks

Peter

Comment 15 Nick Cross 2011-12-08 21:10:16 CET
Have you made any progress testing the new version?
Comment 16 Robert Bienias 2011-12-16 18:16:05 CET
Created attachment 394 [details]
Recent corbaping test software

Inside this zip file there is a Readme.txt that you should follow.
Comment 17 Rudolf Visagie 2015-06-29 07:35:17 CEST
We are currently experiencing timeout problems using JacORB 3.0. Can you confirm whether this bug has been addressed or not? This is just so I don't waste a lot of time trying to determine if our issue exactly matches this one.
Comment 18 Nick Cross 2015-06-29 08:33:50 CEST
I have not rerun the tests on the current 3.6.1 version. It would be useful to verify if it is still an issue. 
@Rudolf Visagie : if you have a reproducible test case that would be helpful
Comment 19 Rudolf Visagie 2015-06-29 09:42:45 CEST
(In reply to Nick Cross from comment #18)
> I have not rerun the tests on the current 3.6.1 version. It would be useful
> to verify if it is still an issue. 
> @Rudolf Visagie : if you have a reproducible test case that would be helpful

Unfortunately we are having the problems in our production environment under special conditions and therefore do not have a reproducible test case. I'm not even sure that it matches this bug. Does this bug affect the request reply timeout after a connection has already been established (jacorb.connection.client.pending_reply_timeout), or only timeouts when establishing a connection? If it's only when establishing connections, it's definitely not related to our problem.
Comment 20 Nick Cross 2015-07-01 04:10:21 CEST
@Rudolf : I believe Peter has put a description of the problem in the ticket. If you feel it doesn't meet your criteria please enter a separate ticket.