RE: large numbers of sockets in CLOSE_WAIT state?
Hi All
We have run the NLANR beacon on Solaris 9 for many months. You must
install the so-called TCP scaling patch on Solaris 9 (I am not sure
about Solaris 10). Despite the name, it is not really a scaling patch:
it fixes a flaw in the way the beacon handles packets. Before this
patch we had to hack a few things to get the beacon to run reliably.
Our Solaris 9 beacons have been running for many months, but for a
number of reasons we will likely move to dbeacon unless someone fancies
putting considerable effort into the NLANR beacon.
Cheers
Steve
------------------------------------------------------------
Steve Williams
Technical Specialist Network Measurement and Monitoring
Advanced Technology Group
UKERNA
Atlas Centre
Didcot
Chilton
Oxfordshire
OX11 0QS
------------
Tel: 01235 822245
E-mail: S.Williams@ukerna.ac.uk
> -----Original Message-----
> From: owner-beacon@dast.nlanr.net [mailto:owner-beacon@dast.nlanr.net]
> On Behalf Of debbie fligor
> Sent: 04 August 2006 15:07
> To: Eli Dart
> Cc: debbie fligor; beacon@dast.nlanr.net
> Subject: Re: large numbers of sockets in CLOSE_WAIT state?
>
> -----BEGIN PGP SIGNED MESSAGE-----
>
>
> On Jul 28, 2006, at 18:15, Eli Dart wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi all,
> >
> > I just set up a beacon central server on Solaris 10. There are large
> > numbers of sockets stuck in CLOSE_WAIT state involving the beacon
> > server talking to itself on TCP port 10004. The number is slowly
> > growing, at a rate of approximately one socket every 3 minutes (see
> > the quick-and-dirty script output below).
> >
> > This only seems to appear on the central server (though I don't have
> > a Solaris 10 client just now).
>
> My experience with Solaris 9 and MacOS is:
>
> 1) the central server code has lots of TCP checksum failures, and so
> the TCP state gets funky (both OSes)
> 2) the client will only stay up a few hours; I would run it from a
> shell script to respawn it when it exited (Solaris)
> 3) the central server will continue to display old and bad data if it
> doesn't get new data in from the broken TCP connections (both OSes)
> 4) most of the time the "local_loss.html" page's data would
> be accurate even if the central server's isn't, but not always (both
> OSes)
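
The respawn workaround in point 2 could look roughly like the sketch
below. This is a minimal POSIX sh sketch, not the script that was
actually used: the function name respawn, its arguments, and the
./beacon-client command in the usage comment are hypothetical
placeholders.

```shell
#!/bin/sh
# respawn DELAY MAX CMD [ARGS...]
# Run CMD, and start it again each time it exits.
#   DELAY  seconds to sleep after each exit
#   MAX    maximum number of runs; 0 means keep going forever
respawn() {
    delay=$1
    max=$2
    shift 2
    runs=0
    while [ "$max" -eq 0 ] || [ "$runs" -lt "$max" ]; do
        # Capture the exit status without tripping `set -e`, if enabled.
        if "$@"; then status=0; else status=$?; fi
        runs=$((runs + 1))
        echo "$(date): '$1' exited with status $status (run $runs)" >&2
        sleep "$delay"
    done
}

# Hypothetical usage (the command name is a placeholder):
#   respawn 10 0 ./beacon-client
```

The delay is there so that a client which fails immediately does not
spin in a tight loop. Logging each exit to stderr gives a rough record
of how often the client actually dies.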
>
> For my local campus I finally gave up on beacon and switched to
> dbeacon instead, as I had only Solaris and MacOS clients and servers.
> If you want to run the NLANR beacon I suggest sticking to Linux, and
> then it mostly works most of the time. The caveat with dbeacon is
> there is no central server that gets a unicast update, so you can't
> tell who is trying to use the beacon if it's not working -- not a
> problem on my campus where I run all the beacon clients and know what
> is supposed to be there.
>
> The NLANR beacon is no longer under development, so there's not much
> hope for bug fixes.
>
> >
> > Thoughts?
> >
> > --eli
> >
> >
> > dart@beacon % while ( 1 )
> > while? set count = `netstat -an | grep 10004 | grep CLOSE_WAIT | wc -l`
> > while? echo -n "$count "
> > while? date
> > while? sleep 60
> > while? end
> > 60 Fri Jul 28 15:21:02 PDT 2006
> > 60 Fri Jul 28 15:22:02 PDT 2006
> > 61 Fri Jul 28 15:23:02 PDT 2006
> > 61 Fri Jul 28 15:24:02 PDT 2006
> > 61 Fri Jul 28 15:25:02 PDT 2006
> > 62 Fri Jul 28 15:26:03 PDT 2006
> > 62 Fri Jul 28 15:27:03 PDT 2006
> > 62 Fri Jul 28 15:28:03 PDT 2006
> > 63 Fri Jul 28 15:29:03 PDT 2006
> > 63 Fri Jul 28 15:30:03 PDT 2006
> > 63 Fri Jul 28 15:31:03 PDT 2006
> > 64 Fri Jul 28 15:32:03 PDT 2006
> > 64 Fri Jul 28 15:33:03 PDT 2006
> > 64 Fri Jul 28 15:34:04 PDT 2006
> > 65 Fri Jul 28 15:35:04 PDT 2006
> > 65 Fri Jul 28 15:36:04 PDT 2006
> > 65 Fri Jul 28 15:37:04 PDT 2006
> > 66 Fri Jul 28 15:38:04 PDT 2006
> > 66 Fri Jul 28 15:39:04 PDT 2006
> > 66 Fri Jul 28 15:40:04 PDT 2006
> > 67 Fri Jul 28 15:41:05 PDT 2006
> > 67 Fri Jul 28 15:42:05 PDT 2006
> > 67 Fri Jul 28 15:43:05 PDT 2006
> > 68 Fri Jul 28 15:44:05 PDT 2006
> > 68 Fri Jul 28 15:45:05 PDT 2006
> > 68 Fri Jul 28 15:46:05 PDT 2006
> > 69 Fri Jul 28 15:47:05 PDT 2006
> > 69 Fri Jul 28 15:48:06 PDT 2006
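
For anyone wanting to repeat this measurement, a slightly stricter
count can be sketched in POSIX sh (the transcript above is csh). A
plain `grep 10004` also matches lines where 10004 is only part of a
longer port number or address. The sketch below counts only lines whose
last field is CLOSE_WAIT and which contain an address ending in .10004
or :10004. The function name is made up for illustration, and the
sketch assumes the state is the last column of `netstat -an` output,
which is worth checking on your platform.

```shell
#!/bin/sh
# count_close_wait PORT
# Read `netstat -an`-style text on stdin and print how many lines have
# CLOSE_WAIT as their last field and an address field ending in
# ".PORT" (Solaris/BSD style) or ":PORT" (Linux style).
# Note: on Solaris, /usr/bin/awk may not accept -v; nawk or
# /usr/xpg4/bin/awk can be substituted.
count_close_wait() {
    awk -v port="$1" '
        $NF == "CLOSE_WAIT" {
            for (i = 1; i < NF; i++) {
                if ($i ~ ("[.:]" port "$")) { n++; break }
            }
        }
        END { print n + 0 }
    '
}

# Usage, once a minute as in the transcript:
#   while :; do
#       printf "%s %s\n" "$(netstat -an | count_close_wait 10004)" "$(date)"
#       sleep 60
#   done
```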
> >
> >
> >
> > - --
> > Eli Dart                                  Office: (510) 486-5629
> > ESnet Network Engineering Group           Fax:    (510) 486-6712
> > Lawrence Berkeley National Laboratory
> > PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486  343A 2D31 4478 5F82 B2B3
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.4 (FreeBSD)
> >
> > iD8DBQFEypqcLTFEeF+CsrMRAismAJ9dl4+v93bA+g6864jM2A7ov4LHzQCdGtng
> > D+H31LIhJgFmK83bPtU+MmM=
> > =gYAf
> > -----END PGP SIGNATURE-----
> >
> >
>
> - -----
> - -debbie
> Debbie Fligor, n9dn Network Engineer, CITES, Univ. of Il
> email: fligor@uiuc.edu <http://www.uiuc.edu/ph/www/fligor>
> "Every keystroke can be monitored. And the computers never forget."
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.3 (Darwin)
>
> iQCVAwUBRNNUgJEN6XnnHVONAQGi1AP9HhP+jaK0cSLwKIKKvSjmXjJx+75PuPzE
> ocIRB7/QS6HUwBtnwJemVh99Wvr87xBEKBPDyS2LaktsSvNGG4uIXRPnrWMmNQLk
> EFvVqyB7zWzoQx7UPEdn2Ym6EExI5pdMUOkdoXDfyNpRoei/lvbYAmHRjR/tvSjP
> j+I2+UxT48g=
> =0v5i
> -----END PGP SIGNATURE-----