RE: Trying to get beacon 1.1-0 working
That is interesting as I was under the impression that it was only Solaris that had a problem.
While waiting for a fix, or a comment, from the beacon guys, I added quite a large delay in front of the select to ensure it had time to get a full data set. This is in the server only, leaving the clients all as they were. A bad fix, but it works OK.
It seemed to me that the software really should not make assumptions about how the OS will pass/build packets.
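For what it is worth, a rough sketch of what I mean: keep a buffer per connection and only hand a report on once the end marker has arrived, however the bytes were split up on the way. The names feed_chunk and $state, and the "END\n" marker value, are my own assumptions for illustration and are not taken from the beacon script.

```perl
use strict;
use warnings;

# Hypothetical end-of-report marker; the real $ENDMESSAGE value in the
# beacon script may differ.
my $ENDMESSAGE = "END\n";

# Append a chunk of received bytes to a per-connection buffer and return
# every report that is now complete, each as an array ref of lines
# (without the end marker).
sub feed_chunk {
    my ($state, $chunk) = @_;
    $state->{buf} .= $chunk;
    my @reports;
    # Only consume whole lines; a partial line stays in the buffer.
    while ($state->{buf} =~ s/\A([^\n]*\n)//) {
        my $line = $1;
        if ($line eq $ENDMESSAGE) {
            push @reports, $state->{lines};
            $state->{lines} = [];
        } else {
            push @{ $state->{lines} }, $line;
        }
    }
    return @reports;
}

# Simulate one report arriving in two separate chunks.
my $state  = { buf => '', lines => [] };
my @first  = feed_chunk($state, "server group port version\n");
my @second = feed_chunk($state, "sender info\nreport A\nEND\n");
printf("reports after first chunk: %d, after second chunk: %d\n",
       scalar(@first), scalar(@second));
```

In a real server the chunks would presumably come from sysread on whichever socket select reports as readable, so the main loop never has to sleep waiting for the rest of a report.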
Just seen 1.3 is out - wonder if that fixes this issue?
Steve
-----------------------------------------------
Steve Williams
Technical Specialist Measurement and Monitoring
Advanced Technology Group
UKERNA
Atlas Centre, Chilton, Didcot, Oxon OX11 0QS
-----------------------------------------------
S.Williams@ukerna.ac.uk
Tel: +44 (0)1235 822245
GDS Video: 0044 01100 107
> -----Original Message-----
> From: owner-beacon@dast.nlanr.net [mailto:owner-beacon@dast.nlanr.net] On
> Behalf Of Havard Eidnes
> Sent: 25 August 2005 21:42
> To: beacon@dast.nlanr.net
> Subject: Trying to get beacon 1.1-0 working
>
> Hi,
>
> I'm trying to get the beacon software, version 1.1-0 working on
> my own central server running NetBSD 2.0_STABLE, and I'm finding
> that the beacon script appears to make some non-portable
> assumptions about the host's TCP stack.
>
> For a long time I had the problem that the beacon server rejected
> all (!) the TCP reports. After a bit of digging, I found the reason.
> In my small setup (http://beacon.nordu.net/), the clients typically
> end up sending the reports in *two* TCP segments -- the first
> contains the "line 0" identity of the central beacon server, the
> second segment contains the beacon info about the sender, and then
> the reports for the other beacons it sees. The time between
> the first and second TCP segments can be considerable -- upwards of
> 25ms does not appear to be uncommon. Clients in my case are
> NetBSD, FreeBSD, and Solaris.
>
> This particular behaviour appears to interact quite badly with the
> following piece of code:
>
> while (defined ($line = <$fh>) && ($line ne $ENDMESSAGE)) {
>     push(@lines, $line);
> }
>
> What happens is that these two TCP segments end up as two separate
> sets of lines. The first set of lines consists of a single line, so
> it validates as being sent to the correct beacon server, but the
> report itself is otherwise empty. When the beacon server comes
> around to process the second TCP segment, it rejects the report
> because the first line does not match the beacon centralserver/
> group/port/version line (it was already processed in the first
> round).
>
> It appears that placement of constructs such as
>
> my $oldfh = select($client);
> $| = 0;
> select($oldfh);
>
> and
>
> $oldfh = select($client);
> $| = 1;
> select($oldfh);
>
> before and after the first and last $client print statements in
> send_tcp_report() makes the data (in my case, with few participants)
> fit in a single segment. However, I can see the same problem
> cropping up when the number of participants grows, as one can then
> no longer rely on the data fitting in a single TCP segment, though I
> have no observations about what would happen in that case.
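A variation on the same idea, as a sketch only: build the whole report in one scalar and hand it to the socket with a single print, rather than toggling $| around many small prints. The names build_report and send_report and the "END\n" marker are assumptions for illustration, and an in-memory filehandle stands in for $client. One write still does not guarantee one TCP segment, so the server has to cope with split reports either way.

```perl
use strict;
use warnings;

# Hypothetical end-of-report marker; the real $ENDMESSAGE may differ.
my $ENDMESSAGE = "END\n";

# Join all report lines plus the end marker into one scalar.
sub build_report {
    my (@lines) = @_;
    return join('', @lines, $ENDMESSAGE);
}

# Write the whole report with one print call to an unbuffered handle.
sub send_report {
    my ($fh, @lines) = @_;
    my $oldfh = select($fh);
    $| = 1;
    select($oldfh);
    return print {$fh} build_report(@lines);
}

# Demonstration against an in-memory filehandle instead of a socket.
open(my $fh, '>', \my $sent) or die "open: $!";
send_report($fh, "server group port version\n", "sender info\n");
close($fh);
print $sent;
```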
>
> In order to inter-operate with the unmodified 1.1-0 clients others
> have installed to participate in "my" group, I also have an ugly
> workaround for the server part of the code which in my local copy
> presently looks like this:
>
> my $count = 0;
> $line = "";
> while ($count < 100 && ($line ne $ENDMESSAGE)) {
>     $line = <$fh>;
>     if (!defined($line)) {
>         usleep(20000);
>         $count++;
>         $line="";
>         next;
>     }
>     if ($DEBUG>2) {
>         if ($count > 0) {
>             printf("Re-read %d times\n", $count);
>         }
>     }
>     $count = 0;
>     if ($line ne $ENDMESSAGE) {
>         if ($DEBUG>4) {
>             printf("Adding line: %s", $line);
>         }
>         push(@lines, $line);
>     }
> }
> if ($DEBUG>4) {
>     printf("Processing %d lines\n", scalar(@lines));
> }
>
> In my current setup, this code often ends up reporting "Re-read"
> values of up towards 10, and this appears to result in bad receive
> stats for the multicast data at the central server, since it is most
> probably dropping the UDP packets while processing the TCP-received
> data.
>
> It seems to me that it would probably have been better to send the
> unicast reports using unicast UDP with application-level framing
> rather than TCP. That way, the central beacon would have a fighting
> chance of participating on a reasonably level playing field in
> processing the multicast packets, instead of being bogged down and
> unresponsive while working around the above problem.
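To make the framing idea concrete, here is a minimal sketch of one possible datagram layout. The "BCN1" magic, the field sizes, and the names pack_frame and unpack_frame are entirely hypothetical and are not part of any beacon protocol. It only shows how a receiver could validate a report from a single datagram without waiting on a stream.

```perl
use strict;
use warnings;

# Hypothetical frame layout (not from the beacon protocol):
#   4-byte magic "BCN1", 32-bit sequence number, 16-bit payload length,
#   then the payload itself. Integers are in network byte order.
my $MAGIC = "BCN1";

# Build one datagram from a sequence number and a report payload.
sub pack_frame {
    my ($seq, $payload) = @_;
    return pack('a4 N n a*', $MAGIC, $seq, length($payload), $payload);
}

# Return (seq, payload), or an empty list if the datagram is malformed.
sub unpack_frame {
    my ($datagram) = @_;
    return if length($datagram) < 10;    # shorter than the fixed header
    my ($magic, $seq, $len, $payload) = unpack('a4 N n a*', $datagram);
    return if $magic ne $MAGIC || $len != length($payload);
    return ($seq, $payload);
}

my $frame = pack_frame(7, "sender info\nreport A\n");
my ($seq, $payload) = unpack_frame($frame);
print "seq=$seq, ", length($payload), " payload bytes\n";
```

A report too large for one datagram would need to be split across several frames, which is where the sequence number would come in. That part is left out here.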
>
> Comments?
>
> I wonder: is 1.3 alpha any better in this regard?
>
> Regards,
>
> - Håvard