Re: large numbers of sockets in CLOSE_WAIT state?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Steve Williams wrote:
> Hi All
>
> We have run the NLANR beacon on Solaris 9 for many months. You must
> install the TCP scaling patch on Solaris 9 (not sure about 10). Despite
> its name, this is not a scaling patch: it fixes a flaw in the way the
> beacon handles packets. Before this patch we had to hack a few things
> to get it to run reliably.
I couldn't get the 'tcp scaling' patch to apply - I've put all the
other ones on (seemed to make sense) and I've generated a consistent
set of patches (based on diff -c from a consistent place in the
directory hierarchy).
What state was the beacon code distribution in when you applied the
scaling patch to it? I permuted for a while and got nowhere.
>
> Our Solaris 9 beacons have been running for many months - but for many
> reasons will likely move to dbeacon unless someone fancies putting quite
> some effort into this one.
I've thought of that too. If it's going to be maintained long-term, I
think that might be the best option....
--eli
>
> Cheers
>
> Steve
>
> ------------------------------------------------------------
> Steve Williams
> Technical Specialist Network Measurement and Monitoring
> Advanced Technology Group
> UKERNA
> Atlas Centre
> Didcot
> Chilton
> Oxfordshire
> OX11 0QS
> ------------
> Tel: 01235 822245
> E-mail: S.Williams@ukerna.ac.uk
>
>
>> -----Original Message-----
>> From: owner-beacon@dast.nlanr.net [mailto:owner-beacon@dast.nlanr.net] On
>> Behalf Of debbie fligor
>> Sent: 04 August 2006 15:07
>> To: Eli Dart
>> Cc: debbie fligor; beacon@dast.nlanr.net
>> Subject: Re: large numbers of sockets in CLOSE_WAIT state?
>>
>
> On Jul 28, 2006, at 18:15, Eli Dart wrote:
>
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Hi all,
>>>>
>>>> I just set up a beacon central server on Solaris10. There are large
>>>> numbers of sockets stuck in CLOSE_WAIT state involving the beacon
>>>> server talking to itself on TCP port 10004. The number is slowly
>>>> growing, at a rate of approximately one socket per 3 minutes (see
>>>> quick and dirty script output, below).
>>>>
>>>> This only seems to appear on the central server (though I don't have a
>>>> Solaris10 client just now).
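For anyone following along: a socket sits in CLOSE_WAIT when the remote
end has closed the connection (its FIN has arrived) but the local
application has not yet called close() on its own descriptor. The Python
sketch below reproduces one such socket on loopback. It is only an
illustration: the function name leak_one_close_wait is made up here,
nothing in it comes from the beacon code, and whether the beacon server
is doing the equivalent is a guess I have not verified.

```python
import socket

def leak_one_close_wait():
    """Open a loopback TCP connection, close only the client side, and
    return the accepted socket, which is left open on purpose."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    accepted, _ = listener.accept()
    listener.close()
    client.close()                       # the peer closes first and sends FIN
    return accepted

accepted = leak_one_close_wait()
accepted.settimeout(5)

# An empty read is how the application learns that the peer has closed.
# Until we close our end, the kernel keeps this socket in CLOSE_WAIT.
peer_closed = (accepted.recv(1024) == b"")

# Closing our end is what lets the socket leave CLOSE_WAIT.
accepted.close()
```

If the server never reaches the equivalent of that last close(), each
finished connection would leave one more CLOSE_WAIT entry behind, which
would at least be consistent with a slowly growing count.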
> my experience with solaris (9) and MacOS is:
>
> 1) the central server code has lots of tcp checksum failures, and so
> the tcp state gets funky (both OSes)
> 2) the client will only stay up a few hours; I would run it in a
> shell script to respawn it when it closed (solaris)
> 3) the central server will continue to display old and bad data if it
> doesn't get new data in from the broken tcp connections (both OSes)
> 4) the majority of the time the "local_loss.html" page's data would
> be accurate even if the central server isn't, but not always. (both
> OSes)
>
> for my local campus I finally gave up on beacon and switched to
> dbeacon instead as I had only solaris and MacOS clients and servers.
> If you want to run the NLANR beacon I suggest sticking to linux and
> then it mostly works most of the time. The caveat with dbeacon is
> there is no central server that gets a unicast update, so you can't
> tell who is trying to use the beacon if it's not working -- not a
> problem on my campus where I run all the beacon clients and know what
> is supposed to be there.
>
> The NLANR beacon is no longer under development, so there's not much
> hope for bug fixes.
>>>> Thoughts?
>>>>
>>>> --eli
>>>>
>>>>
>>>> dart@beacon % while ( 1 )
>>>> while? set count = `netstat -an | grep 10004 | grep CLOSE_WAIT | wc -l`
>>>> while? echo -n "$count "
>>>> while? date
>>>> while? sleep 60
>>>> while? end
>>>> 60 Fri Jul 28 15:21:02 PDT 2006
>>>> 60 Fri Jul 28 15:22:02 PDT 2006
>>>> 61 Fri Jul 28 15:23:02 PDT 2006
>>>> 61 Fri Jul 28 15:24:02 PDT 2006
>>>> 61 Fri Jul 28 15:25:02 PDT 2006
>>>> 62 Fri Jul 28 15:26:03 PDT 2006
>>>> 62 Fri Jul 28 15:27:03 PDT 2006
>>>> 62 Fri Jul 28 15:28:03 PDT 2006
>>>> 63 Fri Jul 28 15:29:03 PDT 2006
>>>> 63 Fri Jul 28 15:30:03 PDT 2006
>>>> 63 Fri Jul 28 15:31:03 PDT 2006
>>>> 64 Fri Jul 28 15:32:03 PDT 2006
>>>> 64 Fri Jul 28 15:33:03 PDT 2006
>>>> 64 Fri Jul 28 15:34:04 PDT 2006
>>>> 65 Fri Jul 28 15:35:04 PDT 2006
>>>> 65 Fri Jul 28 15:36:04 PDT 2006
>>>> 65 Fri Jul 28 15:37:04 PDT 2006
>>>> 66 Fri Jul 28 15:38:04 PDT 2006
>>>> 66 Fri Jul 28 15:39:04 PDT 2006
>>>> 66 Fri Jul 28 15:40:04 PDT 2006
>>>> 67 Fri Jul 28 15:41:05 PDT 2006
>>>> 67 Fri Jul 28 15:42:05 PDT 2006
>>>> 67 Fri Jul 28 15:43:05 PDT 2006
>>>> 68 Fri Jul 28 15:44:05 PDT 2006
>>>> 68 Fri Jul 28 15:45:05 PDT 2006
>>>> 68 Fri Jul 28 15:46:05 PDT 2006
>>>> 69 Fri Jul 28 15:47:05 PDT 2006
>>>> 69 Fri Jul 28 15:48:06 PDT 2006
>>>>
>>>>
>>>>
>>>> - --
>>>> Eli Dart Office: (510) 486-5629
>>>> ESnet Network Engineering Group Fax: (510) 486-6712
>>>> Lawrence Berkeley National Laboratory
>>>> PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3
>>>>
- --
Eli Dart Office: (510) 486-5629
ESnet Network Engineering Group Fax: (510) 486-6712
Lawrence Berkeley National Laboratory
PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (FreeBSD)
iD8DBQFE03jSLTFEeF+CsrMRAkZFAKCXmzp96DDcyMVv+wGxvVqd7M55nwCdGUaN
Vm3BZ83vJfKudOLoaft0lQI=
=aNMo
-----END PGP SIGNATURE-----