Read-only archive; use https://github.com/JacORB/JacORB/issues for new issues
Bug 1000 - COMM_FAILUREs in simple multithreaded client/server test starting with JacORB 3.4
Status: RESOLVED FIXED
Alias: None
Product: JacORB
Classification: Unclassified
Component: ORB
Version: 3.4
Hardware: PC Linux
Importance: P5 major
Assignee: Mailinglist to track bugs
Reported: 2015-01-19 09:36 UTC by Martin Corino
Modified: 2015-01-21 09:30 UTC
Attachments
Debug log (68.99 KB, text/plain)
2015-01-19 09:36 UTC, Martin Corino
Looping failure after jacorb.connection.client.disconnect_after_systemexception workaround (370.43 KB, text/x-log)
2015-01-19 10:14 UTC, Martin Corino
Patched regression test with multithreaded testcase (10.92 KB, text/x-java)
2015-01-19 14:09 UTC, Martin Corino

Description Martin Corino 2015-01-19 09:36:22 UTC
Created attachment 453
Debug log

I have a simple test with a multithreaded client and a single-threaded server that works fine with JacORB 2.4 through 3.3 but started throwing COMM_FAILUREs with JacORB 3.4 (and also with 3.5).

The test implements a trivial client for the following IDL:
interface Foo
{
  string get_string(in long tid);
};

The server creates and activates a single servant and executes ORB#run in the main thread.
The client resolves the servant reference, starts 2 threads and calls the servant's get_string() method 10 times in each thread.
The client uses DII and the server DSI.
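
The client-side threading pattern described above can be sketched roughly as follows. This is only an illustrative sketch, not the actual test: the class name MultithreadedClientSketch, the runClient helper, and the IntFunction stand-in for the remote get_string() call are all hypothetical, and the stand-in replaces the real DII request so that the sketch runs without an ORB on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

public class MultithreadedClientSketch {
    static final int THREADS = 2;           // the test uses 2 client threads
    static final int CALLS_PER_THREAD = 10; // each thread calls get_string() 10 times

    /**
     * Starts threadCount threads; each calls getString callsPerThread times,
     * passing its own thread id. getString is a hypothetical stand-in for the
     * remote Foo::get_string(in long tid) invocation (made via DII in the real test).
     */
    static List<String> runClient(IntFunction<String> getString,
                                  int threadCount,
                                  int callsPerThread) throws InterruptedException {
        List<String> replies = Collections.synchronizedList(new ArrayList<>());
        List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();

        for (int t = 0; t < threadCount; t++) {
            final int tid = t;
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < callsPerThread; i++) {
                        replies.add(getString.apply(tid));
                    }
                } catch (RuntimeException e) {
                    // CORBA system exceptions such as COMM_FAILURE are
                    // RuntimeExceptions, so a failing call would land here.
                    failures.add(e);
                }
            });
            workers.add(worker);
            worker.start();
        }

        // Wait for all client threads to finish before checking results.
        for (Thread worker : workers) {
            worker.join();
        }
        if (!failures.isEmpty()) {
            throw new IllegalStateException(
                failures.size() + " thread(s) failed", failures.get(0));
        }
        return replies;
    }

    public static void main(String[] args) throws InterruptedException {
        // Local stand-in for the servant; the real test goes through the ORB.
        List<String> replies =
            runClient(tid -> "reply for thread " + tid, THREADS, CALLS_PER_THREAD);
        System.out.println(replies.size() + " calls completed");
    }
}
```

In the real test the stand-in would be replaced by a DII request on the resolved Foo reference, with both threads sharing that one reference.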

When running this with either JacORB 3.4 or 3.5, COMM_FAILUREs appear at some point during the iterations. Detailed logging suggests that connections are being closed unexpectedly.

I have attached a debug log of a fairly typical test run.

Any ideas why this may have started happening from JacORB 3.4?
Comment 1 Nick Cross 2015-01-19 09:54:21 UTC
Hi,

Would you be able to supply your test case please?

Related tickets would be bug 986, bug 967 and bug 975
Comment 2 Martin Corino 2015-01-19 10:13:41 UTC
I'll try to put together a Java test shortly. 
My current test environment is not using Java directly. Instead I use jR2CORBA which is a JRuby CORBA implementation using JacORB as underlying ORB implementation.

In the meantime I tested with the suggested workaround from the related tickets you mentioned but that only resulted in some sort of unending failure loop.
Attaching a sample of the resulting debug log.
Comment 3 Martin Corino 2015-01-19 10:14:51 UTC
Created attachment 454
Looping failure after jacorb.connection.client.disconnect_after_systemexception workaround
Comment 4 Nick Cross 2015-01-19 10:46:31 UTC
I think that loop matches the problem from bug 967.

Could you adapt a unit test from https://github.com/JacORB/JacORB/tree/master/test/regression/src/test/java/org/jacorb/test/dii to replicate the issue?
Comment 5 Martin Corino 2015-01-19 14:09:49 UTC
Created attachment 455
Patched regression test with multithreaded testcase

Please find attached a (very) simple test case that reliably triggers the COMM_FAILURE as I encountered it.
I simply updated the standard regression test, commenting out all existing tests and adding a new one that runs two threads performing 10 requests each.
In my environment, every time I run this test one of the threads fails with a COMM_FAILURE at some point.
Comment 6 Nick Cross 2015-01-21 09:30:44 UTC
Fixed by SHA 4343aeb31a7ba80c44280ffa7857e249eb4734f4