Read-only archive; use https://github.com/JacORB/JacORB/issues for new issues

Bug 708

Summary: GIOPConnection is not safe against malicious or malfunctioning peers
Product: JacORB
Reporter: Marc Heide <marc.heide>
Component: ORB
Assignee: André Spiegel <spiegel>
Status: RESOLVED FIXED
Severity: normal
CC: asi, crotwell, kent.au, Richard_Ridgway, rnc
Priority: P2
Version: 2.3.0 beta 2
Hardware: All
OS: All
Attachments:
 - patches getMessage() and sendMessage() to close connection in described situations
 - patch for deadlock problem introduced by former patches (against 2.3 beta2)
 - test client/server for deadlock/timeout
 - patch against 2.3 beta to remove all calls to close() operation
 - Solve potential deadlock in GIOPConnection.
 - Solve potential deadlock in GIOPConnection. Last update 08 Jun 2009.

Description Marc Heide 2006-07-20 09:34:14 CEST
The problem affects getMessage() and sendMessage() (read and write directions).

getMessage():
If a malfunctioning or malicious peer writes garbage to the socket that is not
a valid GIOP header, the ORB goes into a permanent read() loop, which may
result in very high CPU load. Even if this is not an attack but a wrong
protocol or anything else, it makes no sense to keep this connection open,
since the chance of getting back in sync with the peer is minimal once an
invalid header (12 bytes) has been read. This was already discussed and
submitted to André Spiegel (including a test case).
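
For context, a GIOP message begins with a fixed 12-byte header whose first four
bytes are the ASCII magic "GIOP". The following sketch is purely illustrative
(plain java.net sockets, not JacORB's transport classes) and only shows the
idea of closing the connection once that header fails validation instead of
spinning on read():

    // Illustrative only -- not JacORB's actual transport code. The idea from
    // the patch: once the fixed 12-byte GIOP header fails validation, close
    // the connection instead of looping on read() at full CPU.
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.Socket;

    public class GiopHeaderCheck
    {
        /** Reads the 12-byte GIOP header; closes the socket if it is invalid. */
        public static byte[] readHeaderOrClose(Socket socket) throws IOException
        {
            InputStream in = socket.getInputStream();
            byte[] header = new byte[12];
            int read = 0;
            while (read < header.length)
            {
                int n = in.read(header, read, header.length - read);
                if (n < 0)
                {
                    socket.close();
                    throw new IOException("stream ended before a full GIOP header");
                }
                read += n;
            }
            // The first four bytes must be the ASCII magic "GIOP"; once they
            // are not, there is little chance of getting back in sync with the peer.
            if (header[0] != 'G' || header[1] != 'I' || header[2] != 'O' || header[3] != 'P')
            {
                socket.close();
                throw new IOException("invalid GIOP header -- connection closed");
            }
            return header;
        }
    }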

sendMessage():
IOExceptions occurring during send are currently ignored; this may lead to
hanging resources or extremely slow processing of requests in overload
situations. Instead, the connection should be closed in these situations,
since the peer obviously is not able to receive the response messages
properly.
Scenario:
 - the peer gets into overload
 - the peer doesn't read data from its socket in time
 - the system buffers fill up with written data
 - once the system buffers are full, the write() operation blocks (possibly
   forever, since this operation is not bounded by the SO_TIMEOUT parameter!)
 - even when providing custom sockets that enable timeouts during write(),
   request processing is extremely slow, since every thread responding to a
   request runs into the timeout

a patch is attached (against 2.2.3)
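
A minimal sketch of the send-side idea (a hypothetical helper on plain sockets,
not JacORB's GIOPConnection): when the write fails, close the transport and
surface the failure instead of silently ignoring it.

    // Illustrative sketch only. An IOException during write() means the peer
    // cannot take our data, so close the transport and rethrow the failure.
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.Socket;

    public class SendOrClose
    {
        public static void send(Socket socket, byte[] message) throws IOException
        {
            try
            {
                OutputStream out = socket.getOutputStream();
                out.write(message);
                out.flush();
            }
            catch (IOException e)
            {
                // SO_TIMEOUT only bounds reads, so a blocked write() can stall
                // for a long time; once it finally fails, do not keep the
                // connection around.
                try
                {
                    socket.close();
                }
                catch (IOException ignored)
                {
                    // best effort -- the transport is already broken
                }
                throw e;
            }
        }
    }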
Comment 1 Marc Heide 2006-07-20 09:36:16 CEST
Created attachment 243 [details]
patches getMessage() and sendMessage() to close connection in described situations
Comment 2 André Spiegel 2006-08-01 11:35:01 CEST
Applied both parts of the patch on CVS HEAD.  Thanks.
Comment 3 Marc Heide 2006-10-26 14:09:52 CEST
There was an error introduced in the promoted and applied patch for
GIOPConnection. This error may lead to deadlocks between reader and writer
threads which might try to close this socket concurrently.

This is because two sync objects are relevant for closing the socket:
connect_sync
write_sync

The implementation of streamClosed() in ClientGIOPConnection acquires them in
the order:
connect_sync -> write_sync (reader thread)

but while sending a message via sendMessage() they are acquired in the order:
write_sync -> connect_sync (the writer thread already holds the write lock
when it calls streamClosed())

Solution: call releaseWriteLock() before calling streamClosed() in
sendMessage() in error situations.

The patch is against 2.3 beta2.
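
For readers following along, the cycle can be illustrated with two plain Java
monitors standing in for connect_sync and write_sync (illustrative code only,
not the actual JacORB classes):

    // Illustration of the lock-ordering cycle described above.
    public class LockOrderDemo
    {
        private final Object connectSync = new Object();
        private final Object writeSync   = new Object();

        // Reader-thread path (streamClosed): connect_sync -> write_sync.
        void readerClosesStream()
        {
            synchronized (connectSync)
            {
                synchronized (writeSync)
                {
                    // close the transport, wake up waiters ...
                }
            }
        }

        // Writer-thread path (sendMessage already holds the write lock when
        // it calls streamClosed): write_sync -> connect_sync -- the reverse
        // order, so the two threads can deadlock. Releasing the write lock
        // before streamClosed() breaks the cycle.
        void writerHandlesSendFailure()
        {
            synchronized (writeSync)
            {
                synchronized (connectSync)
                {
                    // handle the failed send ...
                }
            }
        }
    }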
Comment 4 Marc Heide 2006-10-26 14:12:30 CEST
Created attachment 253 [details]
patch for deadlock problem introduced by former patches (against 2.3 beta2)
Comment 5 André Spiegel 2006-10-31 17:37:59 CET
Second patch is installed.  Thanks, Marc.
Comment 6 Philip Crotwell 2006-12-01 20:46:49 CET
*** Bug 701 has been marked as a duplicate of this bug. ***
Comment 7 Philip Crotwell 2006-12-01 21:59:56 CET
I believe that there is still a deadlock situation caused by this code. Here
is the test I did.

First, in StreamConnectionBase.java, at about line 126, I changed the code as
shown below, so that every so often it throws an IOException that looks like
the one we were seeing in our clients. Unfortunately, the real socket
exception has proven hard to reproduce, but I believe this is a good first
test. I only did this in the client's jacorb.jar.

Next I ran a simple "hello world" client and server (attached). The client ORB
sees the IOException, marks the socket as bad, and creates a new socket to the
server successfully. However, the local call that triggered the IOException
doesn't return with the COMM_FAILURE, and the client hangs until the client
timeout is reached. Subsequent calls to the same server also time out.

The output of both the client and server is in client.out and server.out in
the tarball, as well as a thread dump on the client showing the deadlocked
threads.

It is not clear to me how to resolve this, but at least there is a test case
to demonstrate it.

StreamConnectionBase.java, around line 126:

    int USC_count = 0;

    public void write (boolean is_first,
                       boolean is_last,
                       byte[] data,
                       int offset,
                       int length,
                       long time_out )
    {
        try
        {
            // fault injection for the test: every 10th call fakes the
            // "Broken pipe" SocketException seen in the field
            if (USC_count++ % 10 == 9) {
                java.net.SocketException s =
                    new java.net.SocketException("Broken pipe");
                java.io.IOException io = new java.io.IOException();
                io.initCause(s);
                System.out.println("Fake 'Broken pipe' " + io.getMessage());
                throw io;
            }
            // ... the original write() implementation continues here ...
Comment 8 Philip Crotwell 2006-12-01 22:02:40 CET
Created attachment 256 [details]
test client/server for deadlock/timeout
Comment 9 Marc Heide 2006-12-04 14:57:13 CET
Yes, you are right: at least the client locks up, but this is not caused by
this code; it is caused by wrong behavior in Delegate.java. The Delegate
simply ignores the COMM_FAILURE exception, and on the next call the Delegate
is still in the state bound=true, which means it simply sends the request
(including opening a new transport). BUT: the previous close() has forced the
MessageReceptor (which reads answers from the socket) to quit, so there is no
reader any more; that's why your ReplyReceiver hangs forever. Possible
solutions:

The Delegate should be adapted to handle COMM_FAILURE exceptions like
TRANSIENT exceptions (try_rebind); that would be the best solution.

As a workaround you could throw away a narrowed object reference (to garbage
collection) once you get a COMM_FAILURE exception in client code, but this
cannot be the final solution.
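
The interim workaround could look roughly like the sketch below; it uses only
the standard org.omg.CORBA API, and the retry policy and helper names are
illustrative, not part of JacORB:

    // Sketch of the interim client-side workaround only, not a fix: after a
    // COMM_FAILURE, discard the broken reference and rebuild it from the IOR
    // string, so the next invocation goes through a freshly bound Delegate.
    import org.omg.CORBA.COMM_FAILURE;
    import org.omg.CORBA.ORB;

    public class ReResolveOnCommFailure
    {
        /** Callback that narrows the reference and performs the actual call. */
        public interface Invocation
        {
            void invoke(org.omg.CORBA.Object ref);
        }

        public static void callWithReResolve(ORB orb, String ior, Invocation action)
        {
            org.omg.CORBA.Object ref = orb.string_to_object(ior);
            try
            {
                action.invoke(ref);
            }
            catch (COMM_FAILURE e)
            {
                ref = orb.string_to_object(ior);   // throw the old reference away
                action.invoke(ref);                // single retry, for illustration
            }
        }
    }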

Comment 10 Philip Crotwell 2006-12-06 17:20:11 CET
I tried the idea of making the Delegate treat COMM_FAILUREs in the same way
as TRANSIENTs, i.e. try_rebind(), but it did not seem to help.

In addition, I tried creating a new reference by reloading each CORBA object
from the IOR via string_to_object, and this seemed to work occasionally but
not always. I expect there is a race condition with the garbage collector:
if the gc cleans up the old object in time, then there will be a new
Delegate, but if not, you are still dealing with the old one. Since you can
never force an object to be garbage collected, this doesn't seem like a good
solution.

Here is the client-side logging with debug messages, starting at the point
when the fake 'Broken pipe' is generated, until it hits the timeout.

[jacorb.orb.giop] INFO : ClientConnectionManager: found ClientGIOPConnection to 129.252.35.30:9876 (3da850)
Fake 'Broken pipe' null
[jacorb.orb.iiop] DEBUG : Caught exception
java.io.IOException
        at org.jacorb.orb.etf.StreamConnectionBase.write(StreamConnectionBase.java:139)
        at org.jacorb.orb.giop.GIOPConnection.write(GIOPConnection.java:887)
        at org.jacorb.orb.CDROutputStream.write(CDROutputStream.java:438)
        at org.jacorb.orb.giop.ServiceContextTransportingOutputStream.write_to(ServiceContextTransportingOutputStream.java:148)
        at org.jacorb.orb.giop.RequestOutputStream.write_to(RequestOutputStream.java:251)
        at org.jacorb.orb.giop.GIOPConnection.sendMessage(GIOPConnection.java:971)
        at org.jacorb.orb.giop.GIOPConnection.sendRequest(GIOPConnection.java:923)
        at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:323)
        at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:304)
        at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1024)
        at org.jacorb.orb.Delegate.invoke(Delegate.java:939)
        at org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
        at demo.hello._GoodDayStub.hello_simple(_GoodDayStub.java:32)
        at demo.hello.Client$1.run(Client.java:26)
        at java.lang.Thread.run(Thread.java:613)
Caused by: java.net.SocketException: Broken pipe
        at org.jacorb.orb.etf.StreamConnectionBase.write(StreamConnectionBase.java:138)
        ... 14 more
[jacorb.giop.conn] ERROR : Failed to write GIOP message due to COMM_FAILURE
org.omg.CORBA.COMM_FAILURE: IOException: java.io.IOException  vmcid: 0x0  minor code: 0  completed: No
        at org.jacorb.orb.etf.ConnectionBase.to_COMM_FAILURE(ConnectionBase.java:143)
        at org.jacorb.orb.etf.StreamConnectionBase.write(StreamConnectionBase.java:153)
        at org.jacorb.orb.giop.GIOPConnection.write(GIOPConnection.java:887)
        at org.jacorb.orb.CDROutputStream.write(CDROutputStream.java:438)
        at org.jacorb.orb.giop.ServiceContextTransportingOutputStream.write_to(ServiceContextTransportingOutputStream.java:148)
        at org.jacorb.orb.giop.RequestOutputStream.write_to(RequestOutputStream.java:251)
        at org.jacorb.orb.giop.GIOPConnection.sendMessage(GIOPConnection.java:971)
        at org.jacorb.orb.giop.GIOPConnection.sendRequest(GIOPConnection.java:923)
        at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:323)
        at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:304)
        at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1024)
        at org.jacorb.orb.Delegate.invoke(Delegate.java:939)
        at org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
        at demo.hello._GoodDayStub.hello_simple(_GoodDayStub.java:32)
        at demo.hello.Client$1.run(Client.java:26)
        at java.lang.Thread.run(Thread.java:613)
[jacorb.giop.conn] ERROR : GIOP connection closed due to errors during sendMessage
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): close()
[jacorb.giop.conn] WARN : Abnormal connection termination. Lost 1 outstanding replie(s)!
[jacorb.orb.iiop] INFO : Client-side TCP transport to 129.252.35.30:9876 closed.
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): streamClosed()
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): closeAllowReopen()
[jacorb.orb.delegate] DEBUG : invoke: SystemException
[jacorb.orb.delegate] DEBUG : Delegate.try_rebind
Caught exception in 18th loopIOException: java.io.IOException
[jacorb.orb.iiop] DEBUG : Transport to 129.252.35.30:9876: stream closed Socket closed
[jacorb.orb.iiop] DEBUG : Caught exception
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.jacorb.orb.etf.StreamConnectionBase.read(StreamConnectionBase.java:81)
        at org.jacorb.orb.giop.GIOPConnection.getMessage(GIOPConnection.java:393)
        at org.jacorb.orb.giop.GIOPConnection.receiveMessages(GIOPConnection.java:517)
        at org.jacorb.orb.giop.MessageReceptor.doWork(MessageReceptor.java:71)
        at org.jacorb.util.threadpool.ConsumerTie.run(ConsumerTie.java:61)
        at java.lang.Thread.run(Thread.java:613)
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): getMessage() -- COMM_FAILURE
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): streamClosed()
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): closeAllowReopen()
[jacorb.util.tpool] DEBUG : [1/1] job queue empty
[jacorb.orb.delegate] DEBUG : Delegate.getReference with POA < empty>
[jacorb.orb.delegate] DEBUG : Delegate.getReference with POA < empty>
[jacorb.orb.giop] INFO : ClientConnectionManager: found ClientGIOPConnection to 129.252.35.30:9876 (3da850)
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): sendMessage() -- opening transport
[jacorb.orb.iiop] DEBUG : Trying to connect to 129.252.35.30:9876 with timeout=90000.
[jacorb.orb.giop] DEBUG : ClientConnectionManager: cannot release ClientGIOPConnection to 129.252.35.30:9876 (3da850) (still has 4 client(s))
[jacorb.orb.delegate] DEBUG : Delegate released!
[jacorb.orb.giop] DEBUG : ClientConnectionManager: cannot release ClientGIOPConnection to 129.252.35.30:9876 (3da850) (still has 3 client(s))
[jacorb.orb.delegate] DEBUG : Delegate released!
[jacorb.orb.giop] DEBUG : ClientConnectionManager: cannot release ClientGIOPConnection to 129.252.35.30:9876 (3da850) (still has 2 client(s))
[jacorb.orb.delegate] DEBUG : Delegate released!
[jacorb.orb.iiop] INFO : Connected to 129.252.35.30:9876 from local port 54778
Caught exception in 19th loopclient timeout reached
[jacorb.orb.delegate] DEBUG : Delegate.getReference with POA < empty>
[jacorb.orb.delegate] DEBUG : Delegate.getReference with POA < empty>
[jacorb.orb.giop] INFO : ClientConnectionManager: found ClientGIOPConnection to 129.252.35.30:9876 (3da850)
Comment 11 Marc Heide 2006-12-11 15:41:25 CET
Well, after some testing with your client and mine, I think I found the root
cause of this behavior. The GIOPConnection and the ClientConnection on top of
it are designed as, let's say, "outliving" objects, so calling close() on a
GIOPConnection should not be done from lower layers or from the GIOPConnection
itself. Instead, to indicate a problem with the underlying transport it is
quite sufficient just to call the streamClosed() operation and leave the real
close() to the outer objects (Delegates or GIOPConnectionManagers). Calling
close() before all Delegates have finished their work with this connection
leads to error situations where a new transport connection is opened but no
MessageReceptor is created to read the answers.

So I simply removed all calls to close() introduced by my patches from
GIOPConnection, and now it seems to work fine. See my attachment.
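
A compact sketch of that layering rule (illustrative names only, not the
actual JacORB classes): the connection merely signals that its stream is gone,
while the real close() stays with the owner that tracks how many clients still
use it.

    // Illustrative only. Lower layers report a broken stream via
    // streamClosed(); close() is reserved for the owner (connection manager /
    // Delegates), which knows when the last client has released the connection.
    public class OwnershipSketch
    {
        interface StreamListener
        {
            void streamClosed();    // "the stream broke" -- no teardown here
        }

        static class Connection
        {
            private final StreamListener listener;
            private int clients;

            Connection(StreamListener listener) { this.listener = listener; }

            synchronized void retain()  { clients++; }

            synchronized void release()
            {
                if (--clients == 0)
                {
                    close();        // only the owner-driven path really closes
                }
            }

            void onSendFailure()
            {
                listener.streamClosed();   // signal upward, do NOT call close()
            }

            void close() { /* stop the receptor thread, close the socket ... */ }
        }
    }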
Comment 12 Marc Heide 2006-12-11 15:44:09 CET
Created attachment 257 [details]
patch against 2.3 beta to remove all calls to close() operation
Comment 13 Christian Sarow 2007-11-14 14:28:53 CET
Are these patches included in the 2.3 release? I couldn't find them in the
release notes...
Maybe someone could tell me...

Regards,
Christian
Comment 14 Christian Sarow 2007-11-14 16:35:38 CET
In addition to my previous entry: I compared 2.3 beta (unpatched) to the
patched 2.3 beta and to 2.3 final and found a mix of both old and patched
code... so I'm a little bit confused about this... are there any comments on
the patch from 12/11/06 and/or on 2.3?

Regards,
Chris
Comment 15 Marc Heide 2007-11-15 08:46:38 CET
The latest patch from 12/11/06 is not part of 2.3.0.
But with this patch it now works very stably here.

Regards
Marc
Comment 16 André Spiegel 2008-05-05 15:40:07 CEST
I have installed the final patch from 2006-12-11 into the main code base now.  Sorry for the delay in getting this sorted out.  It will be part of the next release.
Comment 17 Kent 2009-06-01 04:42:01 CEST
Dear all,

A deadlock occurred in my environment: JacORB 2.3 with the latest patch applied (all calls to close() removed from GIOPConnection). Here is the log message.

Name: RMI TCP Connection(103)-10.0.10.101
Status: WAITING at java.lang.Object@20d65

Stack: 
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:474)
org.jacorb.orb.giop.GIOPConnection.getWriteLock(GIOPConnection.java:812)
org.jacorb.orb.giop.ClientGIOPConnection.closeAllowReopen(ClientGIOPConnection.java:130)
org.jacorb.orb.giop.ClientGIOPConnection.streamClosed(ClientGIOPConnection.java:106)
org.jacorb.orb.giop.GIOPConnection.sendMessage(GIOPConnection.java:989)
org.jacorb.orb.giop.GIOPConnection.sendRequest(GIOPConnection.java:900)
org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:323)
org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:304)
org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1024)
org.jacorb.orb.Delegate.invoke(Delegate.java:939)
org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
CustomerModule._CustomerStub.getProvisioning(_CustomerStub.java:485)
...

Name: RMI TCP Connection(91)-10.0.10.101
Status: BLOCKED at java.lang.Object@5e320f, Owner: RMI TCP Connection(103)-10.0.10.101

Stack: 
org.jacorb.orb.giop.GIOPConnection.sendMessage(GIOPConnection.java:930)
org.jacorb.orb.giop.GIOPConnection.sendRequest(GIOPConnection.java:900)
org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:323)
org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:304)
org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1024)
org.jacorb.orb.Delegate.invoke(Delegate.java:939)
org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
CustomerModule._CustomerStub.getProvisioning(_CustomerStub.java:485)
...

My workaround is:
1) keep all calls to close() removed, as before
2) create a new method streamClosedWithoutWriteLock() (just remove the call
   to getWriteLock())

It works.

Kent
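
What Kent describes might look roughly like the sketch below (an approximation
of the idea only, not the contents of attachment 363): a caller that already
holds the write lock uses a streamClosed() variant that does not try to
acquire it again.

    // Approximation of the described workaround, not attachment 363. A thread
    // in the sendMessage() error path already owns the write lock, so it must
    // use a streamClosed variant that does not call getWriteLock() again.
    public abstract class StreamClosedSketch
    {
        protected final Object connectSync = new Object();

        protected abstract void getWriteLock();
        protected abstract void releaseWriteLock();
        protected abstract void closeTransport();

        /** For threads that do not hold the write lock yet. */
        public void streamClosed()
        {
            getWriteLock();
            try
            {
                streamClosedWithoutWriteLock();
            }
            finally
            {
                releaseWriteLock();
            }
        }

        /** For the sendMessage() error path, which already owns the write lock. */
        public void streamClosedWithoutWriteLock()
        {
            synchronized (connectSync)
            {
                closeTransport();
            }
        }
    }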
Comment 18 Abhishek Singh 2009-06-01 07:10:21 CEST
Hi All,

Please let me know if this issue is similar to http://www.jacorb.org/cgi-bin/bugzilla/show_bug.cgi?id=858 and the same solution can be applied here.
Kent, can you explain where to implement the method streamClosedWithoutWriteLock()?
Comment 19 Kent 2009-06-02 08:47:40 CEST
Created attachment 363 [details]
Solve potential deadlock in GIOPConnection.

My test environment is a client on Windows XP (along with TcpView) plus a server on Linux. Test steps: 1) start up the client, the server and TcpView  2) trigger a request from the client, which will create a connection  3) close the connection with TcpView.  Expected result: a new connection is created and the system still functions.
Comment 20 Kent 2009-06-02 08:59:32 CEST
Dear Abhishek Singh,

I'm not 100% sure it is the same as #858. I've got COMM_FAILURE (broken pipe) in my software only when there is high stress, but I don't know why.

My approach was to use TcpView to close the connection in order to mock the failure, and then try to fix it. I tested my patch yesterday and did some minor tuning. See attachment. Hope it can work.

Kent
Comment 21 Kent 2009-06-08 09:01:33 CEST
Created attachment 364 [details]
Solve potential deadlock in GIOPConnection. Last update 08 Jun 2009.

The bad news is that the previous patch did not survive a weekend's stability test: there is still a deadlock between write_lock and connect_sync. Thus I made another patch. This time, my strategy is to acquire write_lock before synchronizing on connect_sync whenever we need both of them. See the attachment for details.
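
In other words, every path that needs both monitors takes them in the same
order, so the cycle from comment 3 cannot form; a minimal illustration (not
the code in the attachment):

    // Minimal illustration of the consistent acquisition order: write_lock
    // first, connect_sync second, on every path that needs both.
    public class ConsistentOrder
    {
        private final Object writeLock   = new Object();
        private final Object connectSync = new Object();

        void closeAllowReopenPath()
        {
            synchronized (writeLock)          // always acquired first
            {
                synchronized (connectSync)    // always acquired second
                {
                    // close or reopen the transport ...
                }
            }
        }

        void sendFailurePath()
        {
            synchronized (writeLock)
            {
                synchronized (connectSync)
                {
                    // handle the failed send ...
                }
            }
        }
    }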
Comment 22 Kent 2009-06-12 03:54:07 CEST
FYI, the patch from 8 Jun has worked perfectly so far.
Cheers!
Comment 23 André Spiegel 2010-04-22 16:02:26 CEST
Reopen to make sure we get Kent's final patch into main code base.
Comment 24 André Spiegel 2010-09-06 20:25:42 CEST
Final part of patch is now committed (only the synchronization part in ClientGIOPConnection, as the other changes had already been applied).
Comment 25 Phil Mesnier 2010-09-24 18:44:34 CEST
(In reply to comment #24)
> Final part of patch is now committed (only the synchronization part in
> ClientGIOPConnection, as the other changes had already been applied).
> 

Andre,

I think there is still a problem here. I've been working with a customer who has a heavily threaded client which happens to do quite a lot of Name Service resolving. They ran into a race condition that your fix ought to remedy, but I have thought of another possible problem.

They have many, perhaps a dozen, threads all waiting to send a request to the same name service. This means a whole bunch of threads in GIOPConnection.getWriteLock(), called from GIOPConnection.sendMessage().

Something happens and causes a COMM_FAILURE to be thrown to the thread that got the write lock and tried to send. This thread then calls releaseWriteLock() on its way to calling ClientGIOPConnection.closeAllowReopen(). A new thread returns from getWriteLock() and proceeds to try to send its message. However, the transport is corrupt and unclosed, since the first thread hasn't had a chance to close the transport, as it is blocked in getWriteLock(). So a second COMM_FAILURE is thrown, and another thread enters closeAllowReopen() and blocks. Since there are many threads waiting for the write lock, this scenario can occur many times before a thread in closeAllowReopen() actually gets the write lock and proceeds to close the transport. At this point, you've got at least one thread waiting in closeAllowReopen().

It is entirely possible that you also have threads waiting in sendMessage(). Let's say one of those gets a chance to go: it successfully opens a new transport, sends its request, and releases the write lock. Now one of the threads waiting to close gets the write lock and closes the transport, which by this time is the new one that was just opened. Oops! Now you have a client thread that successfully sent a message waiting forever (or until its timeout) for a reply it will never receive, because another thread closed the transport out from under it.
Comment 26 Phil Mesnier 2011-01-14 21:41:37 CET
I've finally had a chance to test my theory about an additional deadlock scenario. I now believe this is not a problem, as sendMessage() calls ClientGIOPConnection.streamClosed(), which calls ConnectionListener.streamClosed() after closeAllowReopen(). This ensures that even if a race such as I hypothesized occurs, the result will be a COMM_FAILURE exception rather than a deadlock.

Now, I have observed that it is possible to have multiple threads waiting in closeAllowReopen(), and even for one thread to close the stream and another to reopen it, and then for a third, previously blocked in closeAllowReopen(), to close the stream again. It might be possible to do something a little more efficient here to ensure such a "reclose" doesn't happen, but I think the conditions leading to this are fairly rare, so I won't propose any change now.
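
One hypothetical way to make such a "reclose" easy to avoid (purely a sketch,
not proposed or existing JacORB code) is a generation counter: a thread that
queued up to close only closes the stream it actually observed as broken.

    // Hypothetical sketch only. The counter is bumped on every (re)open; a
    // thread that queued up to close skips the close if the broken stream it
    // saw has already been replaced by a newer one.
    public class RecloseGuard
    {
        private long generation;          // incremented on each successful (re)open
        private boolean open;

        public synchronized long observe()
        {
            return generation;
        }

        public synchronized void reopened()
        {
            generation++;
            open = true;
        }

        /** Close only if the stream observed as broken is still the current one. */
        public synchronized boolean closeIfUnchanged(long observedGeneration)
        {
            if (open && generation == observedGeneration)
            {
                open = false;             // tear the transport down here
                return true;
            }
            return false;                 // a newer stream exists -- leave it alone
        }
    }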
Comment 27 Nick Cross 2011-01-19 14:40:37 CET
*** Bug 759 has been marked as a duplicate of this bug. ***
Comment 28 Nick Cross 2011-10-06 15:36:46 CEST
*** Bug 858 has been marked as a duplicate of this bug. ***