Alex Harford wrote:

>>>> and there's nothing problematic with it. The repository is at any
>>>> point in time in a consistent state, and so is the backup.
>>>
>>> However, this is still incorrect in regards to CVS. While you are
>>> doing a backup (tar or what have you)
>>
>> Well, maybe this is the difference. I don't use tar for backups :)
>
> In order for file backups to work, the directory you are backing up is
> going to have to remain *exactly* the same between the start of the
> backup and when the backup completes. Can we agree that the backup is
> 'bad' if a file gets added or changed while the backup is occurring?

No, I don't agree with this in this generality. The reasons should become clear in the following.

>>> At this point, the repo is not in a consistent state, and restoring
>>> from this backup would require manual labour to get it working.
>>
>> Is your claim that the repository is not in a consistent state
>> something you know or something you think might be so?
>
> Consider the scenario:
>
> - Alex the Admin starts a backup
> - Dan the Developer checks in changes to a Makefile and also adds
> foo.c to the repo.
> - The backup software stores the new copy of the Makefile but it
> doesn't grab foo.c, since filesystem operations are not atomic!
> - CVS Server dies and gets restored from the backup
> - Dan the Developer checks out the code, builds, and it fails because
> the Makefile is looking for foo.c
> * Since Dan the Dev has a local copy, can he check in foo.c again
> without any problems? How does CVS handle the file history? Does it
> know about foo.c in the revision history, but can't find it on the
> filesystem?

A few things to consider with this scenario:

- With CVS(NT), files are "atomic". With this I mean that the complete history of a file is contained in its corresponding repository (RCS based) file. So Dan could easily run a diff, see what's different (i.e. what made it into the backup and what not), and based on that, check in what's missing (a rough sketch of what that could look like follows after this list). He would have to do that anyway (atomic commits or not), as he couldn't know whether or not his commit made it in before the backup.

- Ideally, a commit is an atomic operation on a logical set of files. In practice, this is not always the case; in my observation, it more often is not than is, and that not only for CVS(NT). For example, I value good commit comments. Unfortunately, even though I can commit several files with one command (and in theory this could be made atomic), all the files then get the same comment (which is of course no general limitation; it is a limitation of CVS(NT)). Anyway, this has the effect that I quite often commit several files that belong to one logical set with separate commands -- so that I can commit them with different comments. (AFAIK, SVN also allows only one commit comment per commit, not per file to be committed.)

- Given that, I don't have the expectation that the repository is consistent from a logical point of view at any point in time. I know that this is something like the "holy grail of commits", but in my experience reality differs, especially when you use the repository also as a central backup and communication tool and are not writing exclusively command line tools in C :). This is not really a question of whether a single commit command is atomic, it's more a question of whether it's even possible or desirable to defer commits until a logical set is complete. (Just think about a change that involves schematic and board changes, component/BOM/purchasing info changes, case drawing changes, firmware changes, PC software changes, production documentation changes, user documentation changes, all done by different people -- there's no reasonable way this can get committed in an atomic manner as one change set, even though it is one logical change set and none of its parts makes sense without the others.)

- IMO this "logical consistency" is even less important in repository restore situations. They should be (very) rare; if they are not (very) rare, you have another problem that you should fix urgently. And whenever I have to restore a repository, I'd tell all people who check in stuff to the repository to check carefully whether everything they have on disk is actually in the repository, and to commit what's not. I don't think that this would be different between CVS(NT) and SVN -- even with the better atomicity of the SVN commit process, nobody knows exactly which changes made it into the backup and which didn't.

- A side note: If you want to play it safe, you should commit foo.c before you commit the Makefile. At least in this example, this is always safe.
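To make the first point a bit more concrete, here is a rough sketch of what Dan's check from his existing sandbox could look like after the restore. This is only an illustration under the assumptions of the scenario above (Makefile and foo.c are the names from the scenario); the exact steps depend on what the sandbox metadata looks like after the restore and on the particular CVS/CVSNT setup:

  # Dry-run update: shows how the sandbox differs from the restored
  # repository without changing any local files.
  cvs -n -q update

  # Show the actual differences for files that are still in the repo.
  cvs -q diff -u Makefile

  # foo.c never made it into the backup, so it has to be added and
  # committed again from the local copy (possibly after clearing the
  # stale entry for it in the sandbox metadata).
  cvs add foo.c
  cvs commit -m "re-add foo.c after repository restore" foo.c

The dry-run update is the quickest way to see at a glance which files the repository and the sandbox disagree about; whatever only exists locally simply gets committed again.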
> This situation doesn't occur if your backup software checks the
> filesystem again after backing up the files, but that means that you
> have to keep trying to back up until there are no checkins, or you
> need to lock the users out, and it's no longer a 'live' backup at that
> point.

Well, yes and no. Whoever is really concerned about that can stop the server process (or lock out the users), run the backup, and restart the server process (or let them back in); a rough sketch of such a "cold" backup follows below. This doesn't seem to be a problem in most situations. (Which isn't to say that it's necessary.)
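For illustration, such a "cold" backup could look roughly like the following. The init script name and the paths are placeholders only (how the CVS or CVSNT server is actually started and stopped depends entirely on the installation); the point is simply that nothing can write to the repository while it is being archived:

  # Stop the server process (or otherwise lock the users out) so the
  # repository cannot change during the backup.  "cvsd" and the paths
  # below are placeholders for whatever the installation actually uses.
  /etc/init.d/cvsd stop

  # Copy or archive the repository directory; tar is used here purely
  # as an example, any backup tool will do.
  tar czf /backup/cvsrepo-$(date +%Y%m%d).tar.gz /var/lib/cvs

  # Let the users back in again.
  /etc/init.d/cvsd start

(For Subversion, svnadmin dump or svnadmin hotcopy can take a consistent snapshot of a live repository without stopping anything, which is essentially what the scenario below relies on.)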
> Now consider this scenario:
>
> - Alex the Admin starts an SVN dump of 0-1234 of the Subversion repository
> - Dan the Devel checks in version 1235 into the repo
> - Server dies and gets restored from the backup.
> - Dan the Devel still has a local copy
> - Dan the Devel checks in again

Except for the explicit references to SVN, this could be a description of the process with a CVS(NT) repository.

> In both cases we require that Dan has a local copy that is applied again,
> but in the CVS case we have a repo that doesn't build properly and
> requires tweaking from the CVS admin, while in the SVN case the
> developer checks in his code again.

Now here you say something I can't follow; it doesn't seem to be mentioned anywhere before in your scenario: "in the CVS case we have a repo that ... requires tweaking from the CVS admin". What tweaking are you talking about?

Other than this mysterious "tweaking", I don't see much of a difference: people with write privileges need to verify whether their data is in the repository, and commit it if it isn't, whether that's SVN or CVS(NT). The exact process of re-committing data that had already been committed but was then lost on the repository side is probably quite different between the two, as the sandbox philosophy and commit process seem to be quite different. But in principle it seems to be similar. It may be easier on the SVN user; I don't know, and we'd have to compare the exact procedures necessary for both. But considering how rare the process should be, I don't think it matters much whether it's a bit easier or not, as long as it's possible and safe.

Look, I don't want this to become a pissing contest ("pissing" is not in my spell checker -- can you believe that? :). I know that there are a number of shortcomings with the CVS concept; nobody who runs a CVS(NT) repository can avoid learning about them :) And since SVN was created by exactly such people, I'm sure a number of them are being addressed. But then, we all know that this is no guarantee that other shortcomings aren't being introduced :)

Some time ago, I looked into it, and the main reasons why I did not switch to SVN were that it doesn't permit file locking (which I consider essential for working with non-mergeable files) and that it doesn't have ACLs. I need both features, so that's a no-go right from the start, no matter how convincing other features may be.

Also note that I don't know traditional CVS that well. I use CVSNT, which is built upon the same code base as CVS (still, though not for much longer) and is therefore structurally similar, but has improved upon CVS in many details.

Gerhard

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist