Read only archive ; use https://github.com/JacORB/JacORB/issues for new issues

Bug 1002

Summary: infinite loop in GIOPConnection.receiveMessages() after SocketException
Product: JacORB Reporter: Christian Bindseil <christian.bindseil>
Component: ORBAssignee: Mailinglist to track bugs <jacorb-bugs>
Status: RESOLVED FIXED    
Severity: major CC: jacorb
Priority: P5    
Version: 3.4   
Hardware: PC   
OS: Windows NT   

Description Christian Bindseil 2015-01-29 10:12:46 UTC
Currently we are testing our system on Java 8 and are facing a mysterious problem. In an automated performance-test after some hours and several thousand connections (sequential, not parallel) jacorb gets locked in an infinite loop. We are using JacORB 3.4, but with 3.5 the problem persists. On Java 7 everything works well. I tried to debug the problem, but it isn't easy because it happens only after hours and JacORBs code is complex. Therefore I can't provide a test case. 

My analysis of the problem:
The deadly event seems to be a SocketException thrown when GIOPConnection.close() calls transport.close() (see line 1140) where the transport is a ServerIIOPConnection:
(line numbers for jacorb 3.4)

org.omg.CORBA.COMM_FAILURE: IOException: java.net.SocketException: Socket is closed  vmcid: 0x0  minor code: 0  completed: No
org.jacorb.orb.etf.ConnectionBase.to_COMM_FAILURE(ConnectionBase.java:152)
org.jacorb.orb.iiop.IIOPConnection.handleCommFailure(IIOPConnection.java:78)
org.jacorb.orb.iiop.ServerIIOPConnection.close(ServerIIOPConnection.java:111)
org.jacorb.orb.giop.GIOPConnection.close(GIOPConnection.java:1140)
org.jacorb.orb.giop.ServerGIOPConnection.close(ServerGIOPConnection.java:237)
org.jacorb.orb.giop.ServerGIOPConnection.streamClosed(ServerGIOPConnection.java:229)
org.jacorb.orb.giop.GIOPConnection.getMessage(GIOPConnection.java:398)
org.jacorb.orb.giop.GIOPConnection.receiveMessagesLoop(GIOPConnection.java:540)
org.jacorb.orb.giop.GIOPConnection.receiveMessages(GIOPConnection.java:529)
org.jacorb.orb.giop.MessageReceptor.doWork(MessageReceptor.java:69)
org.jacorb.util.threadpool.ConsumerTie.run(ConsumerTie.java:60)
java.lang.Thread.run(Thread.java:745)

Because of this exception the do_close flag isn't set to true in line 1141 of GIOPConnection. Afterwards the loop in GIOPConnection.receiveMessages() never ends, although it gets a SocketException in every iteration. It is catching it and only logging "Unexpected error during receiveMessages. Lost a message!" - then it enters the next iteration, infinitely.

I have no idea about the cause of the closed socket and why it doesn't seem to be closed when running on Java 7. But the infinite loop is a serious issue preventing our client from switching to Java 8, which is planned for the next weeks.

Our config looks like this:

## set to false to resolve some instability we were suffering after migrating to jacorb 3.4 
jacorb.connection.client.disconnect_after_systemexception=false

ORBInitRef.NameService=http://www.x.y.z/~user/NS_Ref
jacorb.orb.print_version=on
jacorb.log.default.verbosity=2
jacorb.logfile.maxLogSize=4
jacorb.logfile=<not shown for privacy reason>
jacorb.logfile.append=false
jacorb.debug.dump_outgoing_messages=off
jacorb.debug.dump_incoming_messages=off
jacorb.giop_minor_version=2
jacorb.retries=5
jacorb.retry_interval=500
jacorb.maxManagedBufSize=28
jacorb.deferredArrayQueue=8
jacorb.hashtable_class=java.util.Hashtable
jacorb.use_bom=off
jacorb.giop.add_1_0_profiles=off
jacorb.dns.enable=off
jacorb.dns.eager_resolve=true
jacorb.compactTypecodes=off
jacorb.cacheTypecodes=off
jacorb.cachePoaNames=off
jacorb.disableClientOrbPolicies=off
jacorb.acceptor_exception_listener=org.jacorb.orb.listener.DefaultAcceptorExceptionListener
jacorb.enhanced_thread_name=off
jacorb.interop.indirection_encoding_disable=off
jacorb.interop.comet=off
jacorb.interop.lax_boolean_encoding=off
jacorb.interop.strict_check_on_tc_creation=on
jacorb.interop.chunk_custom_rmi_valuetypes=on
jacorb.interop.sun=off
jacorb.interop.null_string_encoding=off
OAIAddr=<not shown for privacy reason>
OASSLPort=<not shown for privacy reason>
org.omg.PortableInterceptor.ORBInitializerClass.standard_init=org.jacorb.orb.standardInterceptors.IORInterceptorInitializer
jacorb.use_imr=off
jacorb.use_imr_endpoint=on
jacorb.imr.allow_auto_register=off
jacorb.imr.check_object_liveness=off
ORBInitRef.ImplementationRepository=http://www.x.y.z/~user/ImR_Ref
jacorb.imr.table_file=<not shown for privacy reason>
jacorb.imr.backup_file=<not shown for privacy reason>
jacorb.imr.ior_file=<not shown for privacy reason>
jacorb.imr.timeout=
jacorb.imr.endpoint_host=
jacorb.imr.endpoint_port_number=
jacorb.imr.connection_timeout=2000
jacorb.imr.object_activation_retries=5
jacorb.imr.object_activation_sleep=50
jacorb.implname=StandardImplName
jacorb.java_exec=java -Dorg.omg.CORBA.ORBClass=org.jacorb.orb.ORB -Dorg.omg.CORBA.ORBSingletonClass=org.jacorb.orb.ORBSingleton
jacorb.security.support_ssl=on
jacorb.ssl.socket_factory=org.jacorb.security.ssl.sun_jsse.SSLSocketFactory
jacorb.ssl.server_socket_factory=org.jacorb.security.ssl.sun_jsse.SSLServerSocketFactory
jacorb.security.ssl.client.supported_options=20
jacorb.security.ssl.client.required_options=20
jacorb.security.ssl.server.supported_options=20
jacorb.security.ssl.server.required_options=20
jacorb.security.keystore=<not shown for privacy reason>
jacorb.security.keystore_password=<not shown for privacy reason>
jacorb.security.jsse.trustees_from_ks=off
jacorb.security.ssl.server.cipher_suites=SSL_DH_anon_WITH_RC4_128_MD5
jacorb.security.ssl.client.cipher_suites=SSL_DH_anon_WITH_RC4_128_MD5
jacorb.security.randomClassPlugin=
jacorb.security.ssl.ssl_listener=
jacorb.poa.monitoring=off
jacorb.poa.log.verbosity=2
jacorb.poa.thread_pool_max=20
jacorb.poa.thread_pool_min=5
jacorb.poa.threadtimeout=0
jacorb.poa.queue_wait=off
jacorb.poa.queue_max=100
jacorb.poa.queue_min=10
jacorb.poa.useServantClassLoader=off
jacorb.naming.log.verbosity=2
jacorb.naming.purge=on
jacorb.naming.ping=on
jacorb.naming.ior_filename=c:/NS_Ref
jacorb.notification.filter.thread_pool_size = 2
jacorb.notification.proxyconsumer.thread_pool_size = 2
jacorb.notification.proxysupplier.thread_pool_size = 4
jacorb.notification.supplier.poll_intervall = 1000
jacorb.notification.max_batch_size = 1
jacorb.notification.max_events_per_consumer = 100
jacorb.notification.order_policy = PriorityOrder
jacorb.notification.discard_policy = PriorityOrder
jacorb.notification.consumer.backout_interval = 5000
jacorb.notification.consumer.error_threshold = 3
jacorb.notification.proxysupplier.threadpolicy = ThreadPool
jacorb.notification.default_filter_factory = builtin
jacorb.notification.stop_time_supported = on
jacorb.notification.proxy.destroy_causes_disconnect = on
jacorb.security.sas.log.verbosity=3
jacorb.security.sas.TSS.log.verbosity=3
jacorb.security.sas.CSS.log.verbosity=3
Comment 1 Nick Cross 2015-01-30 08:24:25 UTC
(Emailed submitter for more details)
Comment 2 Nick Cross 2015-02-04 18:20:05 UTC
Discussed the issue with Christian who used Byteman (https://www.jboss.org/byteman) to assist in reliably diagnosing the problem and testing a solution.
Comment 3 Nick Cross 2015-02-05 08:33:05 UTC
Fixed by d28e0e429cb1e70f6b6a6ff0f73c29dc3d2023b0