Alex Harford wrote:

>>>> and there's nothing problematic with it. The repository is at any
>>>> point in time in a consistent state, and so is the backup.
>>>
>>> However, this is still incorrect in regards to CVS. While you are
>>> doing a backup (tar or what have you)
>>
>> Well, maybe this is the difference. I don't use tar for backups :)
>
> In order for file backups to work, the directory you are backing up is
> going to have to remain *exactly* the same between the start of the
> backup and when the backup completes. Can we agree that the backup is
> 'bad' if a file gets added or changed while the backup is occurring?

No, I don't agree with this in this generality. The reasons should become clear in the following.

>>> At this point, the repo is not in a consistent state, and restoring
>>> from this backup would require manual labour to get it working.
>>
>> Is your claim that the repository is not in a consistent state
>> something you know or something you think might be so?
>
> Consider the scenario:
>
> - Alex the Admin starts a backup
> - Dan the Developer checks in changes to a Makefile and also adds
> foo.c to the repo.
> - The backup software stores the new copy of the Makefile but it
> doesn't grab foo.c, since filesystem operations are not atomic!
> - CVS Server dies and gets restored from the backup
> - Dan the Developer checks out the code, builds, and it fails because
> the Makefile is looking for foo.c
> * Since Dan the Dev has a local copy, can he check in foo.c again
> without any problems? How does CVS handle the file history? Does it
> know about foo.c in the revision history, but can't find it on the
> filesystem?

A few things to consider with this scenario:

- With CVS(NT), files are "atomic". With this I mean that the complete history of a file is contained in its corresponding repository (RCS based) file. So Dan could easily run a diff, see what's different (i.e. what made it into the backup and what not), and based on that, check in what's missing (a rough sketch of what that could look like follows after this list). He would have to do that anyway (atomic commits or not), as he couldn't know whether or not his commit made it in before the backup.

- Ideally, a commit is an atomic operation on a logical set of files. In practice, this is not always the case; in my observation, it more often is not than is, and that not only for CVS(NT). For example, I value good commit comments. Unfortunately, even though I can commit several files with one command (and in theory this could be made atomic), all the files then get the same comment (which is of course no general limitation; it is a limitation of CVS(NT)). Anyway, this has the effect that I quite often commit several files that belong to one logical set with separate commands -- so that I can commit them with different comments. (AFAIK, SVN also allows only one commit comment per commit, not per file to be committed.)

- Given that, I don't have the expectation that the repository is consistent from a logical point of view at any point in time. I know that this is something like the "holy grail of commits", but in my experience reality differs, especially when you use the repository also as a central backup and communication tool and are not writing exclusively command line tools in C :). This is not really a question of whether a single commit command is atomic, it's more a question of whether it's even possible or desirable to defer commits until a logical set is complete. (Just think about a change that involves schematic and board changes, component/BOM/purchasing info changes, case drawing changes, firmware changes, PC software changes, production documentation changes, user documentation changes, all done by different people -- there's no reasonable way this can get committed in an atomic manner as one change set, even though it is one logical change set and none of its parts makes sense without the others.)

- IMO this "logical consistency" is even less important in repository restore situations. They should be (very) rare; if they are not (very) rare, you have another problem that you should fix urgently. And whenever I have to restore a repository, I'd tell all people who check in stuff to the repository to check carefully whether everything they have on disk is actually in the repository, and to commit what's not. I don't think that this would be different between CVS(NT) and SVN -- even with the better atomicity of the SVN commit process, nobody knows exactly which changes made it into the backup and which didn't.

- A side note: If you want to play it safe, you should commit foo.c before you commit the Makefile. At least in this example, this is always safe.
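To make the first point a bit more concrete, here is a rough sketch of what Dan's check from his existing sandbox could look like after the restore. This is only an illustration under the assumptions of the scenario above (Makefile and foo.c are the names from the scenario); the exact steps depend on what the sandbox metadata looks like after the restore and on the particular CVS/CVSNT setup:

  # Dry-run update: shows how the sandbox differs from the restored
  # repository without changing any local files.
  cvs -n -q update

  # Show the actual differences for files that are still in the repo.
  cvs -q diff -u Makefile

  # foo.c never made it into the backup, so it has to be added and
  # committed again from the local copy (possibly after clearing the
  # stale entry for it in the sandbox metadata).
  cvs add foo.c
  cvs commit -m "re-add foo.c after repository restore" foo.c

The dry-run update is the quickest way to see at a glance which files the repository and the sandbox disagree about; whatever only exists locally simply gets committed again.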
> This situation doesn't occur if your backup software checks the
> filesystem again after backing up the files, but that means that you
> have to keep trying to back up until there are no checkins, or you
> need to lock the users out, and it's no longer a 'live' backup at that
> point.

Well, yes and no. Whoever is really concerned about that can stop the server process (or lock out the users), run the backup, and restart the server process (or let them back in); a rough sketch of such a "cold" backup follows below. This doesn't seem to be a problem in most situations. (Which isn't to say that it's necessary.)
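For illustration, such a "cold" backup could look roughly like the following. The init script name and the paths are placeholders only (how the CVS or CVSNT server is actually started and stopped depends entirely on the installation); the point is simply that nothing can write to the repository while it is being archived:

  # Stop the server process (or otherwise lock the users out) so the
  # repository cannot change during the backup.  "cvsd" and the paths
  # below are placeholders for whatever the installation actually uses.
  /etc/init.d/cvsd stop

  # Copy or archive the repository directory; tar is used here purely
  # as an example, any backup tool will do.
  tar czf /backup/cvsrepo-$(date +%Y%m%d).tar.gz /var/lib/cvs

  # Let the users back in again.
  /etc/init.d/cvsd start

(For Subversion, svnadmin dump or svnadmin hotcopy can take a consistent snapshot of a live repository without stopping anything, which is essentially what the scenario below relies on.)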
> Now consider this scenario:
>
> - Alex the Admin starts an SVN dump of 0-1234 of the Subversion repository
> - Dan the Devel checks in version 1235 into the repo
> - Server dies and gets restored from the backup.
> - Dan the Devel still has a local copy
> - Dan the Devel checks in again

Except for the explicit references to SVN, this could be a description of the process with a CVS(NT) repository.

> In both cases we require that Dan has a local copy that is applied again,
> but in the CVS case we have a repo that doesn't build properly and
> requires tweaking from the CVS admin, while in the SVN case the
> developer checks in his code again.

Now here you say something I can't follow; it doesn't seem to be mentioned anywhere before in your scenario: "in the CVS case we have a repo that ... requires tweaking from the CVS admin". What tweaking are you talking about?

Other than this mysterious "tweaking", I don't see much of a difference: people with write privileges need to verify whether their data is in the repository, and commit it if it isn't, whether that's SVN or CVS(NT). The exact process of re-committing data that had already been committed but was then lost on the repository side is probably quite different between the two, as the sandbox philosophy and commit process seem to be quite different. But in principle it seems to be similar. It may be easier on the SVN user; I don't know, and we'd have to compare the exact procedures necessary for both. But considering how rare the process should be, I don't think it matters much whether it's a bit easier or not, as long as it's possible and safe.

Look, I don't want this to become a pissing contest ("pissing" is not in my spell checker -- can you believe that? :). I know that there are a number of shortcomings with the CVS concept; nobody who runs a CVS(NT) repository can avoid learning about them :) And since SVN was created by exactly such people, I'm sure a number of them are being addressed. But then, we all know that this is no guarantee that other shortcomings aren't being introduced :)

Some time ago, I looked into it, and the main reasons why I did not switch to SVN were that it doesn't permit file locking (which I consider essential for working with non-mergeable files) and that it doesn't have ACLs. I need both features, so that's a no-go right from the start, no matter how convincing other features may be.

Also note that I don't know traditional CVS that well. I use CVSNT, which is built upon the same code base as CVS (still, though not for much longer) and is therefore structurally similar, but has improved upon CVS in many details.

Gerhard

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist