RE: large numbers of sockets in CLOSE_WAIT state?


Hi All

We have run the NLANR beacon on Solaris 9 for many months. You must
install the TCP scaling patch on Solaris 9 (not sure about 10). For our
purposes this is not about scaling: the patch fixes a flaw in the way
the beacon handles packets. Before this patch we had to hack a few
things to get it to run reliably.

Our Solaris 9 beacons have been running for many months, but for many
reasons we will likely move to dbeacon unless someone fancies putting
considerable effort into the NLANR beacon.

Cheers

Steve

------------------------------------------------------------
Steve Williams
Technical Specialist Network Measurement and Monitoring
Advanced Technology Group
UKERNA
Atlas Centre
Didcot
Chilton
Oxfordshire
OX11 0QS
------------
Tel: 01235 822245
E-mail: S.Williams@ukerna.ac.uk


> -----Original Message-----
> From: owner-beacon@dast.nlanr.net [mailto:owner-beacon@dast.nlanr.net] On
> Behalf Of debbie fligor
> Sent: 04 August 2006 15:07
> To: Eli Dart
> Cc: debbie fligor; beacon@dast.nlanr.net
> Subject: Re: large numbers of sockets in CLOSE_WAIT state?
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> 
> 
> On Jul 28, 2006, at 18:15, Eli Dart wrote:
> 
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi all,
> >
> > I just set up a beacon central server on Solaris10.  There are large
> > numbers of sockets stuck in CLOSE_WAIT state involving the beacon
> > server talking to itself on TCP port 10004.  The number is slowly
> > growing, at a rate of approximately one socket per 3 minutes (see
> > quick and dirty script output, below).
> >
> > This only seems to appear on the central server (though I don't have a
> > Solaris10 client just now).
> 
> My experience with Solaris (9) and MacOS is:
> 
> 1) the central server code has lots of TCP checksum failures, and so
> the TCP state gets funky (both OSes)
> 2) the client will only stay up a few hours; I would run it in a
> shell script to respawn it when it closed (Solaris)
> 3) the central server will continue to display old and bad data if it
> doesn't get new data in from the broken TCP connections (both OSes)
> 4) the majority of the time the "local_loss.html" page's data would
> be accurate even if the central server isn't, but not always (both
> OSes)
> 
> for my local campus I finally gave up on beacon and switched to
> dbeacon instead as I had only solaris and MacOS clients and servers.
> If you want to run the NLANR beacon I suggest sticking to linux and
> then it mostly works most of the time.  The caveat with dbeacon is
> there is no central server that gets a unicast update, so you can't
> tell who is trying to use the beacon if it's not working -- not a
> problem on my campus where I run all the beacon clients and know what
> is supposed to be there.
> 
> It's no longer under development so there's not much hope for bug fixes.
> 
> >
> > Thoughts?
> >
> > 		--eli
> >
> >
> > dart@beacon % while ( 1 )
> > while? set count = `netstat -an | grep 10004 | grep CLOSE_WAIT | wc -l`
> > while? echo -n "$count    "
> > while? date
> > while? sleep 60
> > while? end
> > 60    Fri Jul 28 15:21:02 PDT 2006
> > 60    Fri Jul 28 15:22:02 PDT 2006
> > 61    Fri Jul 28 15:23:02 PDT 2006
> > 61    Fri Jul 28 15:24:02 PDT 2006
> > 61    Fri Jul 28 15:25:02 PDT 2006
> > 62    Fri Jul 28 15:26:03 PDT 2006
> > 62    Fri Jul 28 15:27:03 PDT 2006
> > 62    Fri Jul 28 15:28:03 PDT 2006
> > 63    Fri Jul 28 15:29:03 PDT 2006
> > 63    Fri Jul 28 15:30:03 PDT 2006
> > 63    Fri Jul 28 15:31:03 PDT 2006
> > 64    Fri Jul 28 15:32:03 PDT 2006
> > 64    Fri Jul 28 15:33:03 PDT 2006
> > 64    Fri Jul 28 15:34:04 PDT 2006
> > 65    Fri Jul 28 15:35:04 PDT 2006
> > 65    Fri Jul 28 15:36:04 PDT 2006
> > 65    Fri Jul 28 15:37:04 PDT 2006
> > 66    Fri Jul 28 15:38:04 PDT 2006
> > 66    Fri Jul 28 15:39:04 PDT 2006
> > 66    Fri Jul 28 15:40:04 PDT 2006
> > 67    Fri Jul 28 15:41:05 PDT 2006
> > 67    Fri Jul 28 15:42:05 PDT 2006
> > 67    Fri Jul 28 15:43:05 PDT 2006
> > 68    Fri Jul 28 15:44:05 PDT 2006
> > 68    Fri Jul 28 15:45:05 PDT 2006
> > 68    Fri Jul 28 15:46:05 PDT 2006
> > 69    Fri Jul 28 15:47:05 PDT 2006
> > 69    Fri Jul 28 15:48:06 PDT 2006
> >
> >
> >
> > - --
> > Eli Dart                                         Office: (510) 486-5629
> > ESnet Network Engineering Group                  Fax:    (510) 486-6712
> > Lawrence Berkeley National Laboratory
> > PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.4 (FreeBSD)
> >
> > iD8DBQFEypqcLTFEeF+CsrMRAismAJ9dl4+v93bA+g6864jM2A7ov4LHzQCdGtng
> > D+H31LIhJgFmK83bPtU+MmM=
> > =gYAf
> > -----END PGP SIGNATURE-----
> >
> >
> 
> - -----
> - -debbie
> Debbie Fligor, n9dn       Network Engineer, CITES, Univ. of Il
> email: fligor@uiuc.edu          <http://www.uiuc.edu/ph/www/fligor>
> "Every keystroke can be monitored. And the computers never forget."
> 
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.3 (Darwin)
> 
> iQCVAwUBRNNUgJEN6XnnHVONAQGi1AP9HhP+jaK0cSLwKIKKvSjmXjJx+75PuPzE
> ocIRB7/QS6HUwBtnwJemVh99Wvr87xBEKBPDyS2LaktsSvNGG4uIXRPnrWMmNQLk
> EFvVqyB7zWzoQx7UPEdn2Ym6EExI5pdMUOkdoXDfyNpRoei/lvbYAmHRjR/tvSjP
> j+I2+UxT48g=
> =0v5i
> -----END PGP SIGNATURE-----
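
For anyone wanting to repeat Eli's measurement on another box, here is a
rough Python sketch of the same count. It is only an illustration, not
part of the beacon: the function names are made up, and it assumes the
TCP state is the last field of each `netstat -an` line and that the port
is appended to the address with "." (Solaris style) or ":" (Linux style).
It is a little stricter than the quoted grep, which matches "10004"
anywhere on the line.

```python
import re
import subprocess


def count_sockets(netstat_text, port=10004, state="CLOSE_WAIT"):
    """Count netstat lines in `state` with an address ending in `port`.

    Assumes the state is the last whitespace-separated field and that
    addresses look like 127.0.0.1.10004 or 127.0.0.1:10004.
    """
    suffix = re.compile(r"[.:]%d$" % port)
    count = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        if not fields or fields[-1] != state:
            continue
        # Match the port on either the local or the remote address.
        if any(suffix.search(field) for field in fields[:-1]):
            count += 1
    return count


def growth_per_minute(samples):
    """Average growth between the first and last (minute, count) sample."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    return (c1 - c0) / (t1 - t0)


def live_count(port=10004):
    """Run `netstat -an` on this host and count CLOSE_WAIT sockets."""
    result = subprocess.run(["netstat", "-an"],
                            capture_output=True, text=True)
    return count_sockets(result.stdout, port)
```

Feeding the first and last samples from the quoted output,
growth_per_minute([(21, 60), (48, 69)]), gives 9 sockets over 27
minutes, which matches the "one socket per 3 minutes" estimate.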


