Trying to get beacon 1.1-0 working
Hi,
I'm trying to get the beacon software, version 1.1-0 working on
my own central server running NetBSD 2.0_STABLE, and I'm finding
that the beacon script appears to make some non-portable
assumptions about the host's TCP stack.
For a long time I had the problem that the beacon server rejected
all (!) of the TCP reports. After a bit of digging, I found the reason.
In my small setup (http://beacon.nordu.net/), the clients typically
end up sending the reports in *two* TCP segments -- the first
contains the "line 0" identity of the central beacon server, the
second segment contains the beacon info about the sender, and then
the reports for the other beacons it sees. The time between the
first and second TCP segments can be considerable -- upwards of
25 ms does not appear to be uncommon. The clients in my case are
NetBSD, FreeBSD, and Solaris.
This particular behaviour appears to interact quite badly with the
following piece of code:
    while (defined ($line = <$fh>) && ($line ne $ENDMESSAGE)) {
        push(@lines, $line);
    }
What happens is that these two TCP segments end up as two separate
sets of lines. The first set of lines consists of a single line, so
it validates as being sent to the correct beacon server, but the
report itself is otherwise empty. When the beacon server comes
around to process the second TCP segment, it rejects the report
because its first line does not match the beacon
centralserver/group/port/version line (that line was already
consumed in the first round).
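To illustrate the failure mode and one way around it, here is a minimal
sketch of a reader that keeps a per-connection buffer and only hands a
report on once the end marker has been seen, however many segments the
data arrived in. The name feed_chunk, the state hash, and the value
"END\n" for $ENDMESSAGE are my own assumptions for illustration; none
of them are taken from the beacon 1.1-0 source.

```perl
use strict;
use warnings;

# Hypothetical end-of-report marker; the real value of $ENDMESSAGE in
# beacon 1.1-0 is not shown in this message.
my $ENDMESSAGE = "END\n";

# Append newly received bytes to a per-connection buffer and return the
# lines of every report that is now complete (terminated by $ENDMESSAGE).
# Incomplete data stays buffered until more bytes arrive.
sub feed_chunk {
    my ($state, $chunk) = @_;
    $state->{buf} .= $chunk;
    my @reports;
    # Consume only whole lines; a trailing partial line stays in the buffer.
    while ($state->{buf} =~ s/\A([^\n]*\n)//) {
        my $line = $1;
        if ($line eq $ENDMESSAGE) {
            push @reports, [ @{ $state->{lines} } ];
            $state->{lines} = [];
        } else {
            push @{ $state->{lines} }, $line;
        }
    }
    return @reports;
}

# Simulate the two TCP segments described above.
my $state  = { buf => "", lines => [] };
my @first  = feed_chunk($state, "central group port version\n");
my @second = feed_chunk($state, "sender info\nreport A\nEND\n");
printf("after segment 1: %d complete report(s)\n", scalar(@first));
printf("after segment 2: %d complete report(s), %d lines\n",
       scalar(@second), scalar(@{ $second[0] }));
```

With this structure the first segment yields no report at all, and the
"line 0" header is still the first line of the report when the second
segment completes it.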
It appears that placement of constructs such as
    my $oldfh = select($client);
    $| = 0;
    select($oldfh);
and
    $oldfh = select($client);
    $| = 1;
    select($oldfh);
respectively before the first and after the last $client print
statement in send_tcp_report() makes the data (in my case, with few
participants) fit in a single segment. However, I can see the same
problem cropping up when the number of participants grows, as one
can then no longer rely on the data fitting in a single TCP segment,
though I have no observations of what would happen in that case.
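An alternative to toggling $| on the client side would be to assemble
the whole report in one string and hand it to the kernel in as few
writes as possible. The following is only a sketch under assumptions:
build_report and send_all are hypothetical helpers, "END\n" stands in
for the real $ENDMESSAGE, and a socketpair stands in for the TCP
connection so that the example runs on its own.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical end-of-report marker, not taken from the beacon source.
my $ENDMESSAGE = "END\n";

# Join the header line, the body lines and the end marker into one string.
sub build_report {
    my ($header, @body) = @_;
    return join("", $header, @body, $ENDMESSAGE);
}

# Write the whole string, looping in case syswrite accepts only part of it.
sub send_all {
    my ($sock, $data) = @_;
    my $off = 0;
    while ($off < length($data)) {
        my $n = syswrite($sock, $data, length($data) - $off, $off);
        die "syswrite failed: $!" unless defined $n;
        $off += $n;
    }
    return $off;
}

# A local socketpair plays the role of the client/server TCP connection.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $report = build_report("central group port version\n",
                          "sender info\n", "report A\n");
send_all($client, $report);
close($client);

my @received = <$server>;
printf("server read %d lines\n", scalar(@received));
```

Note that even a single write gives no guarantee about how TCP
segments the data, so this only makes the split less likely; the
receiving side still has to cope with a report arriving in pieces.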
In order to inter-operate with the unmodified 1.1-0 clients others
have installed to participate in "my" group, I also have an ugly
workaround for the server part of the code which in my local copy
presently looks like this:
    my $count = 0;
    $line = "";
    while ($count < 100 && ($line ne $ENDMESSAGE)) {
        $line = <$fh>;
        if (!defined($line)) {
            usleep(20000);
            $count++;
            $line="";
            next;
        }
        if ($DEBUG>2) {
            if ($count > 0) {
                printf("Re-read %d times\n", $count);
            }
        }
        $count = 0;
        if ($line ne $ENDMESSAGE) {
            if ($DEBUG>4) {
                printf("Adding line: %s", $line);
            }
            push(@lines, $line);
        }
    }
    if ($DEBUG>4) {
        printf("Processing %d lines\n", scalar(@lines));
    }
In my current setup, this code often reports "Re-read" values of up
to about 10, and this appears to result in bad receive statistics
for the multicast data at the central server, most probably because
the server drops UDP packets while it is processing the TCP-received
data.
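A somewhat less wasteful variant of the workaround might wait in
select() with a timeout instead of sleeping in fixed 20 ms steps, so
that the read resumes as soon as the next segment arrives. This is an
untested-in-beacon sketch: read_report is a hypothetical function,
"END\n" is an assumed value for $ENDMESSAGE, and it uses sysread
rather than <$fh> because buffered reads do not mix well with select.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# Hypothetical end-of-report marker, not taken from the beacon source.
my $ENDMESSAGE = "END\n";

# Read one report from $fh. Waits at most $timeout seconds for each new
# batch of bytes; returns a reference to the report lines, or nothing on
# timeout, error, or EOF before the end marker.
sub read_report {
    my ($fh, $timeout) = @_;
    my $sel    = IO::Select->new($fh);
    my $marker = quotemeta($ENDMESSAGE);
    my $buf    = "";
    # Keep reading until the marker appears at the start of a line.
    while ($buf !~ /(?:\A|\n)$marker/) {
        my @ready = $sel->can_read($timeout);
        return unless @ready;                    # timed out
        my $n = sysread($fh, $buf, 4096, length($buf));
        return unless $n;                        # error or EOF
    }
    my @lines;
    for my $line ($buf =~ /([^\n]*\n)/g) {
        last if $line eq $ENDMESSAGE;
        push @lines, $line;
    }
    return \@lines;
}

# A local socketpair stands in for the TCP connection; the report is
# written in two pieces, as in the two-segment case described above.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($client, "central group port version\n");
syswrite($client, "sender info\nreport A\nEND\n");

my $lines = read_report($server, 0.5);
printf("Processing %d lines\n", $lines ? scalar(@$lines) : 0);
```

This still ties up the server while it waits for one client, so it
would only shorten the stall, not remove it. Watching the UDP sockets
in the same select() set would be the more thorough change.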
It seems to me that it would probably have been better to send the
unicast reports using unicast UDP with application-level framing
rather than TCP. That way, the central beacon would have a fighting
chance of participating on a reasonably level playing field in
processing the multicast packets, instead of being bogged down and
unresponsive while working around the above problem.
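To make the framing idea a bit more concrete, here is a rough sketch
of what I have in mind. Everything in it is an assumption of mine and
not part of any beacon version: the 8-byte header (report id, fragment
index, fragment count), the 1200-byte payload limit, and the names
frame_report and accept_datagram. It only shows the packing and
reassembly; loss and retransmission are not handled.

```perl
use strict;
use warnings;

# Hypothetical payload limit, chosen to stay below a typical Ethernet MTU.
my $MAX_PAYLOAD = 1200;

# Split a report into datagrams, each carrying a small header:
# 32-bit report id, 16-bit fragment index, 16-bit fragment count.
sub frame_report {
    my ($id, $report) = @_;
    my @chunks;
    for (my $off = 0; $off < length($report); $off += $MAX_PAYLOAD) {
        push @chunks, substr($report, $off, $MAX_PAYLOAD);
    }
    @chunks = ("") unless @chunks;       # an empty report is one fragment
    my $total = scalar(@chunks);
    my $idx   = 0;
    return map { pack("N n n a*", $id, $idx++, $total, $_) } @chunks;
}

# Record one received datagram in %$pending. Returns the reassembled
# report once all of its fragments are present, otherwise nothing.
sub accept_datagram {
    my ($pending, $datagram) = @_;
    return if length($datagram) < 8;     # too short to hold the header
    my ($id, $idx, $total, $payload) = unpack("N n n a*", $datagram);
    my $slot = $pending->{$id} ||= { total => $total, parts => {} };
    $slot->{parts}{$idx} = $payload;
    return unless keys(%{ $slot->{parts} }) == $slot->{total};
    delete $pending->{$id};
    return join("", map { $slot->{parts}{$_} } 0 .. $slot->{total} - 1);
}

# Frame a report that needs several datagrams and deliver them out of order.
my $report    = "central group port version\n" . ("report line\n" x 250);
my @datagrams = frame_report(42, $report);
my %pending;
my $result;
for my $d (reverse @datagrams) {
    my $r = accept_datagram(\%pending, $d);
    $result = $r if defined $r;
}
printf("%d datagrams, reassembled ok: %s\n",
       scalar(@datagrams), (defined $result && $result eq $report) ? "yes" : "no");
```

Since each datagram is self-describing, the server can process
whatever has arrived and go straight back to its multicast sockets,
without ever waiting on a half-delivered report.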
Comments?
I wonder: is 1.3 alpha any better in this regard?
Regards,
- Håvard