Re: reproducible URLs



On September 3, 1998 at 13:10, Claire McNab wrote:

> > For this reason I prefer MD5 sums of the message body -- there is
> > statistically only a 1:18446744073709551616 chance of matching a
> > false positive.  For anyone interested, I've written some sendmail
> > 8.9.1 patches to add md5sums at the sendmail level (based on the work of
> > Martin Hamilton) and also have a procmail recipe to do the same.
> 
> This sounds like a useful issue to tackle.  I've been hit by it a few 
> times, when from other parts of my site, or in other messages to 
> the list, I have referred to articles by filename ... only to find 
> that later, when I have rebuilt the archives to roll out new .rc 
> files, the filename has changed :(
> 
> However, I wonder about the MD5 method.  Without knowing anything 
> about MD5, could it work with 8.3 filenames?  I build my archives on a 
> DOS box, so am constrained to that format.

Such a method will not work under 8.3 filenames.  The current method
is friendly to 8.3 systems.
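
To illustrate why: an MD5 sum is 128 bits, which comes to 32 characters
when written in hex, while an 8.3 filename leaves only 8 characters for
the base name.  A minimal Python sketch follows.  The helper names and
the ".htm" extension are assumptions for illustration only and are not
taken from MHonArc.

```python
import hashlib

def md5_hex(body: bytes) -> str:
    """Return the MD5 digest of a message body as a hex string."""
    return hashlib.md5(body).hexdigest()

def fits_8_3(filename: str) -> bool:
    """True if the name has a 1-8 character base and an extension of at most 3."""
    base, _, ext = filename.partition(".")
    return 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in ext

digest = md5_hex(b"example message body\n")
print(len(digest))                   # 32 hex characters
print(fits_8_3(digest + ".htm"))     # False: the base name is too long
print(fits_8_3("msg00042.htm"))      # True: a short numbered name fits
```

Truncating the digest to 8 characters would make it fit, but at the
cost of a much higher chance of two messages sharing a name.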

> I am also not concerned about the lack of message-IDs: this problem 
> becomes visible quite quickly in my setup, as articles are repeatedly 
> added to the database on each archive build.  When I spot this, I 
> just edit the mbox file and add a message-id of the form
> poster's_name_YYMMDDHHMMSS_something_random@no-valid-msg-id
> (I know this is a prob for others, and I recognise the difficulty -- 
> I'm just saying it's not a prob for me, though I hope it would be 
> supported for the benefit of others, esp those with more heavily 
> automated systems).

v2.3 will create a message-id for messages w/o one.  The id has
the string "NO-ID-FOUND" in it so one can tell the id was generated
by MHonArc.
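
As a rough idea of what this amounts to, here is a Python sketch.  It
is not the MHonArc code: the function name, the layout of the id, and
the choice of deriving it from an MD5 of the raw message are all
assumptions.  Only the "NO-ID-FOUND" marker comes from the description
above.

```python
import hashlib
from email import message_from_bytes

def message_id_or_generated(raw: bytes) -> str:
    """Return the message's own Message-ID, or a generated placeholder id."""
    msg = message_from_bytes(raw)
    mid = msg.get("Message-ID")
    if mid and mid.strip():
        # Use the real id, without the surrounding angle brackets.
        return mid.strip().strip("<>")
    # Hypothetical layout: hash of the raw message plus a visible marker.
    # Hashing makes the id stable, so rebuilding the archive from the same
    # mbox yields the same id instead of a fresh one each time.
    digest = hashlib.md5(raw).hexdigest()
    return f"{digest}@NO-ID-FOUND"
```

Deriving the id from the content is one possible design choice; it
means a message without an id is not added again as a new article on
each build.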

> So it occurred to me that one way of implementing this would be to 
> create a new .db file (e.g. filename.db), which would record the 
> filenames used for each message ID and for each MD5 sum.  That way 
> the chances of a duplicate occurring are *very* low: it would require 
> a duplicate MD5 sum *and* a duplicate or missing msg-id.
> 
> AFAICS, mhonarc.db is wiped when the archives are rebuilt ... and all 
> that would be needed is to ensure that filename.db is not wiped on a 
> rebuild, and its data reused.   That way, we could retain the current 
> flexibility of filename format (which has other advantages, such as 
> being reasonably transparent) and add permanency.
> 
> How does that sound?

Changing the v2.x code base to support different filenames from the
current convention will take some work.  Also, if such a feature
were to be added to v2.x, the current filename style should still be
supported.  I.e., alternate schemes would be triggered by a resource.

Using message-ids (or MD5 sums) is something I will look into
for v2.x, but after v2.3 is released.
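
For discussion purposes, the persistent-map idea could be sketched in
Python roughly as follows.  This is only a sketch under stated
assumptions, not a design for MHonArc: the class name, the JSON storage
format, and the numbered "msgNNNNN.html" naming are hypothetical, and
the key may be a message-id or an MD5 sum.

```python
import json
import os

class FilenameMap:
    """Persistent map from a message key (message-id or MD5 sum) to a filename."""

    def __init__(self, path: str):
        self.path = path
        self.names = {}
        # Reuse the data from earlier builds if the file survived the rebuild.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.names = json.load(fh)

    def filename_for(self, key: str) -> str:
        """Return the filename recorded for key, or assign the next number."""
        if key not in self.names:
            # Entries are never removed, so the count is the next free number.
            self.names[key] = f"msg{len(self.names):05d}.html"
        return self.names[key]

    def save(self) -> None:
        """Write the map back to disk so the next build can reuse it."""
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.names, fh, indent=1, sort_keys=True)
```

A build would create `FilenameMap("filename.db")`, call
`filename_for()` for each message, and call `save()` at the end.  As
long as that file is not wiped on a rebuild, each message keeps the
name it was first given.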

	--ewh

----
             Earl Hood              | University of California: Irvine
      ehood@medusa.acs.uci.edu      |      Electronic Loiterer
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME