CVS is still around for many important projects, making it difficult to scale their development. Tutorials available for this topic are not robust enough for ease of use. Thus, here I would like to describe CVS to Git migration steps good enough for copy-pasting and just basic changes.
A very good starting point is given by this post for CVS repository of eCos:
rsync -r rsync://sourceware.org/ecos-cvs cvs > logs/rsync_out.txt 2> logs/rsync_err.txt
The point is we need to make a full copy (clone) of CVS repository, not just a checkout. Most likely, for CVS there always exists a copyable URL, but its path might be not so obvious, and knowing the right URL is the key.
The cvs2git tool still does a good job, but as of 2023 is already lost in the past. Therefore, I’m picking up a time-machine cvs2git Docker container, which has Python 2 to run it:
docker pull jwiebalk/cvs2git
docker run -it -v $(pwd)/cvs:/cvs -v $(pwd)/git/repo:/git -v $(pwd)/log_output:/tmp/cvs_migration jwiebalk/cvs2git
Inside the Docker, we first navigate to /cvs
folder to perform the dump, and then to /git
to import the dump. By default, the final git state will be in “deleting all content”, as when creating an orphanned branch. Therefore, we finish by doing git reset --hard
and exit container:
cd /cvs
cvs2git --blobfile=/git/git-blob.dat --dumpfile=/git/git-dump.dat --fallback-encoding=ascii ecos >> /tmp/cvs_migration/ecos.log
cvs2git --blobfile=/git/git-blob.dat --dumpfile=/git/git-dump.dat --username=dmikushin --fallback-encoding=ascii ecos >> /tmp/cvs_migration/ecos.log
cd /git
git init ecos
cd ecos
cat /git/git-blob.dat /git/git-dump.dat | git fast-import
git gc --prune=now
git reset --hard
exit
On the host side, adjust the newly created Git repository ownership with chown
and chgrp
.
In principle that’s all: Git tree is fully constructed and ready for pushing. But the log imported from CVS does not hold relevant information about the contributors:
git log | grep Author: | sort -u
Author: asl <>
Author: bartv <>
Author: grante <>
Author: gthomas <>
Author: jani <>
Author: jlarmour <>
Author: jld <>
Author: msalter <>
Author: nickg <>
Author: sergeig <>
Author: vae <>
So I’m going to use a Perl script to replace these names with better ones I’ve collected manually with a background search. The idea is to batch-process the Git repository with a git filter-branch
, as shown in this post:
#!/usr/bin/env perl -w
my(@authors) = (
"asl",
"bartv",
"grante",
"gthomas",
"jani",
"jlarmour",
"jld",
"msalter",
"nickg",
"sergeig",
"vae"
);
my(@authors_new) = (
# New names go here
);
my($i) = 0;
for ( ; $i < scalar(@authors); $i++)
{
my($name) = $authors[$i];
my($name_new) = $authors_new[$i];
my($email_new) = $authors_new[$i];
$name_new =~ s/\s*<.*$//g;
$email_new =~ s/^.*<//g;
$email_new =~ s/>//g;
print "Renaming " . $authors[$i] . " to " . $authors_new[$i] . "\n";
my($cmd) = "FILTER_BRANCH_SQUELCH_WARNING=1 " .
"git filter-branch -f --env-filter '#!/bin/bash\n" .
"OLD_NAME=\"$name\" \n" .
"CORRECT_NAME=\"$name_new\" \n" .
"CORRECT_EMAIL=\"$email_new\" \n" .
"if [ \"\$GIT_COMMITTER_NAME\" = \"\$OLD_NAME\" ]; then \n" .
" export GIT_COMMITTER_NAME=\"\$CORRECT_NAME\" \n" .
" export GIT_COMMITTER_EMAIL=\"\$CORRECT_EMAIL\" \n" .
"fi \n" .
"if [ \"\$GIT_AUTHOR_NAME\" = \"\$OLD_NAME\" ]; then \n" .
" export GIT_AUTHOR_NAME=\"\$CORRECT_NAME\" \n" .
" export GIT_AUTHOR_EMAIL=\"\$CORRECT_EMAIL\" \n" .
"fi \n" .
"' --tag-name-filter cat -- --branches --tags";
system("$cmd");
}
We simply execute this script to process the whole Git repo (may take several minutes):
perl ./authors.pl
Check out the nice new names with git log
! Now the repository is really ready for pushing. Happy migrating!
Dmitry Mikushin
If you need help with Machine Learning, Computer Vision or with GPU computing in general, please reach out to us at Applied Parallel Computing LLC.