Re: large numbers of sockets in CLOSE_WAIT state?


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Steve Williams wrote:
> Hi All
> 
> We have run the NLANR beacon on Solaris 9 for many months. You must
> install the 'TCP scaling' patch on Solaris 9 (I am not sure about 10).
> Despite the name, it is not a scaling patch: it fixes a flaw in the way
> the beacon handles packets. Before this patch we had to hack a few
> things to get it to run reliably.

I couldn't get the 'tcp scaling' patch to apply. I've applied all the
other patches (that seemed to make sense) and I've generated a
consistent set of patches (based on diff -c from a consistent place in
the directory hierarchy).

What state was the beacon code distribution in when you applied the
scaling patch to it?  I tried various permutations for a while and got
nowhere.

> 
> Our Solaris 9 beacons have been running for many months - but for many
> reasons will likely move to dbeacon unless someone fancies putting quite
> some effort into this one.

I've thought of that too.   If it's going to be maintained long-term, I
think that might be the best option....

		--eli

> 
> Cheers
> 
> Steve
> 
> ------------------------------------------------------------
> Steve Williams
> Technical Specialist Network Measurement and Monitoring
> Advanced Technology Group
> UKERNA
> Atlas Centre
> Didcot
> Chilton
> Oxfordshire
> OX11 0QS
> ------------
> Tel: 01235 822245
> E-mail: S.Williams@ukerna.ac.uk
> 
> 
>> -----Original Message-----
>> From: owner-beacon@dast.nlanr.net [mailto:owner-beacon@dast.nlanr.net] On Behalf Of debbie fligor
>> Sent: 04 August 2006 15:07
>> To: Eli Dart
>> Cc: debbie fligor; beacon@dast.nlanr.net
>> Subject: Re: large numbers of sockets in CLOSE_WAIT state?
>>
> 
> On Jul 28, 2006, at 18:15, Eli Dart wrote:
> 
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Hi all,
>>>>
>>>> I just set up a beacon central server on Solaris10.  There are large
>>>> numbers of sockets stuck in CLOSE_WAIT state involving the beacon
>>>> server talking to itself on TCP port 10004.  The number is slowly
>>>> growing, at a rate of approximately one socket per 3 minutes (see the
>>>> quick-and-dirty script output, below).
>>>>
>>>> This only seems to appear on the central server (though I don't have
>>>> a Solaris10 client just now).
> My experience with Solaris (9) and MacOS is:
> 
> 1) the central server code has lots of TCP checksum failures, and so
> the TCP state gets funky (both OSes)
> 2) the client will only stay up a few hours; I would run it in a
> shell script to respawn it when it exited (Solaris)
> 3) the central server will continue to display old and bad data if it
> doesn't get new data in from the broken TCP connections (both OSes)
> 4) most of the time the "local_loss.html" page's data will be
> accurate even if the central server's isn't, but not always (both
> OSes)
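
A respawn wrapper of the kind mentioned in point 2 could look roughly
like the following. This is only a minimal sketch in Python, not the
shell script that was actually used: the command name ./beacon-client,
the 5-second delay, and the max_runs, run and sleep parameters are all
assumptions made for illustration.

```python
import subprocess
import time


def respawn(cmd, max_runs=None, delay=5.0,
            run=subprocess.call, sleep=time.sleep):
    """Run cmd, and start it again every time it exits.

    cmd      -- argument list, e.g. ["./beacon-client"] (hypothetical name)
    max_runs -- stop after this many runs; None means loop forever
    delay    -- seconds to wait before restarting, to avoid a tight loop
    run      -- callable that runs cmd and returns its exit status
    sleep    -- callable used to wait between runs
    Returns the number of times cmd was run.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        status = run(cmd)  # blocks until the client exits
        runs += 1
        print(f"{time.ctime()}: {cmd[0]} exited with status {status}")
        sleep(delay)
    return runs


# Example (assumed command name, runs until interrupted):
#     respawn(["./beacon-client"])
```

The delay is there so that a client which fails immediately on startup
does not get restarted in a tight loop. Note that a wrapper like this
only hides the symptom; it does nothing about stale data on the central
server.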
> 
> For my local campus I finally gave up on beacon and switched to
> dbeacon instead, as I had only Solaris and MacOS clients and servers.
> If you want to run the NLANR beacon I suggest sticking to Linux, where
> it mostly works most of the time.  The caveat with dbeacon is that
> there is no central server that gets a unicast update, so you can't
> tell who is trying to use the beacon if it's not working -- not a
> problem on my campus, where I run all the beacon clients and know what
> is supposed to be there.
> 
> The NLANR beacon is no longer under development, so there's not much
> hope for bug fixes.
>>>> Thoughts?
>>>>
>>>> 		--eli
>>>>
>>>>
>>>> dart@beacon % while ( 1 )
>>>> while? set count = `netstat -an | grep 10004 | grep CLOSE_WAIT | wc -l`
>>>> while? echo -n "$count    "
>>>> while? date
>>>> while? sleep 60
>>>> while? end
>>>> 60    Fri Jul 28 15:21:02 PDT 2006
>>>> 60    Fri Jul 28 15:22:02 PDT 2006
>>>> 61    Fri Jul 28 15:23:02 PDT 2006
>>>> 61    Fri Jul 28 15:24:02 PDT 2006
>>>> 61    Fri Jul 28 15:25:02 PDT 2006
>>>> 62    Fri Jul 28 15:26:03 PDT 2006
>>>> 62    Fri Jul 28 15:27:03 PDT 2006
>>>> 62    Fri Jul 28 15:28:03 PDT 2006
>>>> 63    Fri Jul 28 15:29:03 PDT 2006
>>>> 63    Fri Jul 28 15:30:03 PDT 2006
>>>> 63    Fri Jul 28 15:31:03 PDT 2006
>>>> 64    Fri Jul 28 15:32:03 PDT 2006
>>>> 64    Fri Jul 28 15:33:03 PDT 2006
>>>> 64    Fri Jul 28 15:34:04 PDT 2006
>>>> 65    Fri Jul 28 15:35:04 PDT 2006
>>>> 65    Fri Jul 28 15:36:04 PDT 2006
>>>> 65    Fri Jul 28 15:37:04 PDT 2006
>>>> 66    Fri Jul 28 15:38:04 PDT 2006
>>>> 66    Fri Jul 28 15:39:04 PDT 2006
>>>> 66    Fri Jul 28 15:40:04 PDT 2006
>>>> 67    Fri Jul 28 15:41:05 PDT 2006
>>>> 67    Fri Jul 28 15:42:05 PDT 2006
>>>> 67    Fri Jul 28 15:43:05 PDT 2006
>>>> 68    Fri Jul 28 15:44:05 PDT 2006
>>>> 68    Fri Jul 28 15:45:05 PDT 2006
>>>> 68    Fri Jul 28 15:46:05 PDT 2006
>>>> 69    Fri Jul 28 15:47:05 PDT 2006
>>>> 69    Fri Jul 28 15:48:06 PDT 2006
>>>>
>>>>
>>>>
>>>> - --
>>>> Eli Dart                                         Office: (510) 486-5629
>>>> ESnet Network Engineering Group                  Fax:    (510) 486-6712
>>>> Lawrence Berkeley National Laboratory
>>>> PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3
>>>> 
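
The quoted loop counts CLOSE_WAIT sockets with a plain "grep 10004",
which would also match that string anywhere else on a netstat line. A
slightly stricter count could be sketched in Python as below. This is a
sketch under assumptions, not part of the beacon distribution: it
assumes the connection state is the last field of each line and that
addresses are printed as addr.port (Solaris/BSD style) or addr:port
(Linux style). The function names are made up for illustration.

```python
import subprocess


def count_close_wait(netstat_output, port=10004):
    """Count lines in CLOSE_WAIT whose local or remote port is `port`.

    Assumes the state is the last whitespace-separated field and that
    an endpoint looks like "127.0.0.1.10004" or "127.0.0.1:10004".
    """
    suffixes = (f".{port}", f":{port}")
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or fields[-1] != "CLOSE_WAIT":
            continue
        # Plain numeric fields (queue and window sizes) cannot match,
        # because a suffix always starts with "." or ":".
        if any(field.endswith(suffixes) for field in fields[:-1]):
            count += 1
    return count


def sample(port=10004):
    """Run `netstat -an` once and return the current CLOSE_WAIT count."""
    result = subprocess.run(["netstat", "-an"],
                            capture_output=True, text=True, check=True)
    return count_close_wait(result.stdout, port)
```

Calling sample() once a minute would give the same kind of series as
the output above, where the count rises from 60 at 15:21 to 69 at
15:47, i.e. 9 sockets in about 26 minutes, or roughly one every 3
minutes.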


- --
Eli Dart                                         Office: (510) 486-5629
ESnet Network Engineering Group                  Fax:    (510) 486-6712
Lawrence Berkeley National Laboratory
PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (FreeBSD)

iD8DBQFE03jSLTFEeF+CsrMRAkZFAKCXmzp96DDcyMVv+wGxvVqd7M55nwCdGUaN
Vm3BZ83vJfKudOLoaft0lQI=
=aNMo
-----END PGP SIGNATURE-----


