Darcs-bridge bug tracker and updates.
I've recently been checking-in with my GSoC darcs-bridge work. I've re-worked the horrible state-file format I originally used, and have been making various other tweaks to the code.
Unfortunately, I've come across a few bugs (thanks to Brent Yorgey and Jesper Reenburg for reporting some initial problems they've hit), and have been trying to iron them out. With that in mind, I've added a bug tracker "Topic" for darcs-bridge on the darcs.net bug tracker: Bug tracker.
Hopefully, I'll be able to sort out the remaining bugs, and then move on to a tidy-up and refactor of the existing code.
GSoC: Darcs Bridge – Results
So, that's it, GSoC is over! Sorry I've been slack on the updates, I should have definitely kept on top of them better...
Anyway, it's been an interesting summer, frustrating at (most of the! :-)) times, confusing and hard work... sounds like fun? Well, it was! I've learnt a great deal about Darcs, Haskell, VCSs and working alone (and that I need to force myself to write blog updates in a more timely fashion!) and I've had an ultimately very rewarding experience.
So, what have I actually achieved? (If you want to jump right into how to use the bridge, check out the wiki page I've created here: http://wiki.darcs.net/DarcsBridgeUsage)
I suppose the easiest way to judge this is to take the list of pre-project targets and see where we are with them:
- Allow automatic incremental conversion: Check!
This definitely works well.
- Create a mapping/encoding of multi-head repositories: Check!
Using a special tagging-scheme as described on the wiki, we are able to import (and re-export) branches/merges, mapping individual Git branches onto separate Darcs repositories.
- Import and export foreign patch formats generated by VCS “send” commands: Check 1/2!
This works for applying Git patches to a Darcs repository, but the reverse operation turned out to be difficult to handle correctly. (We're not convinced that there are particularly compelling use-cases either.) If there was particular demand, I would hope that with a bit of further effort/thinking it would be possible to code up a solution.
- Solve the problem of efficiently translating to/from Darcs patches: Check!
This goal is a bit of a strange one; given that we've decided to do complete translations of the repositories, rather than on-the-fly conversions, we've basically side-stepped any after-conversion performance problems, since the generated repositories are repositories-proper and do not have any translation-associated performance problems.
- “Roundtripping”, whereby information may be lost when converting to and back from another repository format; in particular, translation to and from Darcs-specific patch-types, e.g. replace patches: Check (mostly)!
Exported replace patches can be recovered on re-import, assuming the replace primitives are at the "start" or "end" of a patch, rather than in the middle (intermediate states cannot be recovered/transmitted, hence the restriction - see the wiki page for more). Currently (as described in the wiki page) Darcs conflicts that are resolved using >1 patch (assuming the tagging-scheme is used) will be coalesced into a single resolution patch upon exporting. In practice I imagine this isn't really a problem, but that said I'm hopeful it'd not be particularly difficult to fix.
- The cycle problem, in the presence of multiple bridges.
N/A. Again, due to choosing one-off translations I don't think that multiple bridges will cause any issues.
- Create a consistent mapping between Darcs2 and Darcs1 format repositories.
Unchecked. I never got around to this feature; given that there exists a tool to do one-time conversions to Darcs-2 I'm not particularly concerned.
So, all-in-all, pretty good! There are a few things I'd like to get tidied up though...
TODOs:
- Further investigation with Darcs/Haskell gurus (Ganesh and Petr, I'm looking at you!) as to how I can improve the performance and resource-usage of darcs-bridge. Currently, exporting Darcs repositories is too memory-hungry, something I definitely want to improve.
- Attempting to import Git's own source repository manages to trip Darcs itself up; the changes/conflicts/resolutions in that repo hit a corner case of the patch-theory implementation in current Darcs (http://bugs.darcs.net/issue1520). I suspect that others who are much cleverer than I am have spent hours looking at the problem with no luck (since it's still not fixed)... I wonder if a fresh pair of eyes will spot anything or if the code's too opaque?
- Release! Once the performance has been tweaked a bit more, it'll be time to actually release darcs-bridge to the wider-world! Maintainership and bug-reporting and naming (currently the code/packaging is all centred around the darcs-fastconvert name, I think darcs-bridge should be separate to signify its improvements/differences) are some things I can think of that need discussion beforehand.
So, what next, Darcs-wise? I think I want to look into understanding and hopefully continuing Petr's work on the next-generation primitive-patches, particularly how they fit into a repo-model (things like conflicts, duplicates and the issues their design throws up).
And finally...
A big thank you to Google for running the summer-of-code programme, Haskell.org for accepting my project (gaining a very keen Haskeller and Darcs-hacker in the process :-)) and the #darcs inhabitants: kowey, gh, mornfall, sm and iago to name a few, and particularly Ganesh for his advice throughout the project; all three groups were invaluable and this project couldn't have gone ahead without any one of them.
GSoC: Darcs Bridge – Detecting Merges
As I mentioned in my previous post, a problem with exporting multiple Darcs branches is that the patch-based model of Darcs makes it particularly difficult to detect merges of two branches.
We want to be able to detect merges, and export them in the fast-import data stream, for import by Git or other fast-import aware VCSs, since otherwise, it would appear as if two branches never converged, even if they had been merged in Darcs.
As an example, say we have two Darcs repos:
master : ABCD'E'
branch1: ADE
where D' and E' are the commuted versions of D and E, having been pulled into master from branch1. We'd like to see this history in git:
  _ B _ C _ M
 /         /
A _ D _ E _/
i.e. a merge of ABC and ADE; Git makes these merge points explicit (a commit with >1 parent SHA1s is a merge commit), whereas Darcs does not.
So, the question is, how do we detect merges that are clean (i.e. no conflicts) and non-clean (with conflicts)?
First, a discussion on re-ordering and selective-pulling:
To Darcs, the repositories containing {ABC} and {ACB} are equal and are already sharing all their patches. Git (and thus, the fast-import format) does not see it this way - branches only share patches if the SHA1s of those patches match. This means that we can only treat Darcs branches as equal if they share their patches and ordering.
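To make the difference concrete, here is a toy Python sketch (not the bridge's code). It assumes a Darcs repository can be modelled as a plain set of patch names, ignoring commutation details, and that a Git-style commit id is a hash over the patch plus its parent's id; the function names are made up for illustration.

```python
import hashlib

def darcs_equal(a, b):
    """Toy Darcs view: two repos are equal if they hold the same set of patches."""
    return set(a) == set(b)

def git_ids(patches):
    """Toy Git view: each id hashes the patch together with its parent's id."""
    ids, parent = [], ""
    for name in patches:
        parent = hashlib.sha1(f"{parent}:{name}".encode()).hexdigest()
        ids.append(parent)
    return ids

def git_shared_commits(a, b):
    """Count the leading commits whose ids match in both histories."""
    shared = 0
    for x, y in zip(git_ids(a), git_ids(b)):
        if x != y:
            break
        shared += 1
    return shared

same_to_darcs = darcs_equal(list("ABC"), list("ACB"))
shared_in_git = git_shared_commits(list("ABC"), list("ACB"))
```

In this model the two orderings are equal as far as Darcs is concerned, yet only the first commit is shared once ordering is baked into the ids.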
Darcs always provides the user with the ability to cherry-pick patches - selectively choosing the patches to operate on. If some patches are not pulled in, we cannot treat the resulting repository as a merge of branches.
So the resulting constraints are: to use the darcs-bridge effectively, patches should not be unpulled or otherwise re-ordered, and pulls between branches should always use the "--all|-a" option to pull in all changes.
Finally, how do we attempt to find the merge points? What we essentially need to know is where does a merge start in a sequence of patches, and where do the merged patches come from? So in our merged example, we'd need to know that D' is the start of a merge, and the patches are coming from branch1. Knowing this information allows us to export the patches of branch1, and then create a merge commit between the state of branch1 and the state of master.
One potential way of doing this is to create a tag, immediately before pulling a branch, and after the pull has completed. In our example, we'd get:
master: {A,B,C,T1,D',E',T1}
branch1: {A,D,E}
To export these branches, we follow these steps, starting with master:
- While the head-patches of all branches are equal, output a copy of that patch and move the 'reset point' of the branches to the state after exporting the patch.
- When we find a non-equal patch, save the current state as the branches' reset point, and keep exporting the current branch.
- When we hit a merge tag (identified by a unique tag message), we read patches until we hit the corresponding merge tag (with an equal message). We now have a set of merged patches, and need to find the origin branch.
- For each branch, try to match the list of merged patches. If we find a match, we can export that branch's patches (having reset the current state to that of the branch reset-state). If we don't find the patches, something has been changed, so the only thing we can do is output the patches as they are on the merge target branch (since we can't find the source branch).
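The tag-matching part of these steps can be sketched in a few lines of Python. This is only an illustration under some assumptions: patches are represented by identifiers that commutation does not change (so D' on master still compares equal to D on branch1), merge tags are recognised by a caller-supplied predicate, and `find_merges` and `find_source_branch` are hypothetical names.

```python
def find_merges(patches, is_merge_tag):
    """Return (index_of_opening_tag, merged_patches) for each pair of equal merge tags."""
    merges, open_tag, start = [], None, None
    for i, patch in enumerate(patches):
        if not is_merge_tag(patch):
            continue
        if open_tag is None:
            open_tag, start = patch, i          # opening tag of a merge
        elif patch == open_tag:
            merges.append((start, patches[start + 1:i]))
            open_tag = None                     # closing tag found
    return merges

def find_source_branch(merged, branches):
    """Name of the first branch whose history ends with the merged patches, else None."""
    for name, branch_patches in branches.items():
        if merged and branch_patches[-len(merged):] == merged:
            return name
    return None

master = ["A", "B", "C", "T1", "D", "E", "T1"]
branches = {"branch1": ["A", "D", "E"]}
merges = find_merges(master, lambda p: p.startswith("T"))
source = find_source_branch(merges[0][1], branches)
```

If `find_source_branch` returns None, the fallback described above applies: the patches are simply exported as they appear on the merge target.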
The next question is how we obtain these "merge marker" tags, which is what I am currently pondering... watch this space!
GSoC: Darcs Bridge – Week 6/7/8
Another very late blog post, argh! But, good news - I've completed several of the targets of the project:
- Branches that the bridge manages can now be tracked/untracked and listed.
- Incremental imports/exports with branches now work correctly (bar the merging issues).
- Git patches can be applied to a Darcs repo.
We can now tell the bridge to track new Git/Darcs branches, such that they are synced when commits/patches are exported/imported and can also tell the bridge to no longer track branches, so they will no longer be exported/imported.
# Init a test repo.
$ darcs init --repo foo_project
$ cd foo_project
$ echo 1 > a
$ darcs add a
$ darcs rec -am 'add a'
Finished recording patch 'add a'
$ cd ..
# Create a branch of the repo.
$ darcs get foo_project foo_project_branch1
Copying patches, to get lazy repository hit ctrl-C...
Finished getting.
# Init a bridge from the repo.
$ darcs-fastconvert create-bridge foo_project
Identified darcs repo at /tmp/throwaways/foo_project
Cloning source repo from /tmp/throwaways/foo_project to /tmp/throwaways/foo_project_bridge/foo_project
Initialised target git repo at /tmp/throwaways/foo_project_bridge/foo_project_git
Created .darcs_bridge in /tmp/throwaways/foo_project_bridge
Wrote new marks files.
Wrote hook.
Wired up hook in both repos. Now syncing from darcs
Copying old sourcemarks: /tmp/throwaways/foo_project_bridge/.darcs_bridge/marks/darcs_export_marks
Doing export.
Doing import.
Copying old targetmarks: /tmp/throwaways/foo_project_bridge/.darcs_bridge/marks/git_export_marks
Doing mark update export.
Diffing marks.
1 marks to append.
Import marks updated.
Bridge successfully synced.
# Start tracking the branch we created.
$ darcs-fastconvert branch track foo_project_bridge/ foo_project_branch1/
Copying old sourcemarks: /tmp/throwaways/foo_project_bridge/.darcs_bridge/marks/darcs_export_marks
Doing export.
Doing import.
Copying old targetmarks: /tmp/throwaways/foo_project_bridge/.darcs_bridge/marks/git_export_marks
Doing mark update export.
Diffing marks.
0 marks to append.
Import marks updated.
Bridge successfully synced.
# Print a list of all tracked branches.
$ darcs-fastconvert branch list foo_project_bridge/
Tracked branches:
Name: master ~ Darcs path: foo_project_bridge/foo_project
Name: foo_project_branch1 ~ Darcs path: foo_project_bridge/foo_project_branch1
# Show that the branch was correctly imported into git.
$ (cd foo_project_bridge/foo_project_git/ && git log -p foo_project_branch1)
commit 893e08f44b0de658e00a49bc61a51c6a6621d59e
Author: Owen Stephens
Date: Fri Jul 15 11:53:53 2011 +0000
add a
diff --git a/a b/a
new file mode 100644
index 0000000..d00491f
--- /dev/null
+++ b/a
@@ -0,0 +1 @@
+1
Another main piece of work that I've completed is to properly handle incremental import/export with branches.
Since Darcs uses a patch-based representation, and the fast-import format uses a snapshot-based representation, we have to jump through some hoops to properly import/export the state required.
To demonstrate, consider the Darcs history with 2 patches:
Fri Jul 15 13:11:04 BST 2011 Owen Stephens
* Amend shopping list, :-(
hunk ./shopping_list.txt 4
+Apples
+Pears
Fri Jul 15 13:10:44 BST 2011 Owen Stephens
* Create shopping list.
addfile ./shopping_list.txt
hunk ./shopping_list.txt 1
+Beer
+Pizza
+Chips
To output this repository in the fast-import format, we need to recreate each intermediate state of the repository, and list the file contents at that state. This is easy - we simply apply each patch and dump the affected files' contents, one after another. At each state, we save the pristine hash (a hash that identifies the pristine state of the repository) and the inventory (the list of patches that have been applied to create the pristine), which allows us to 'reset' ourselves to a previous state at a later point in time, by restoring the pristine/inventory.
Imagine another repository, which has a different final patch:
Fri Jul 15 13:21:06 BST 2011 Owen Stephens
* More treats needed!
hunk ./shopping_list.txt 4
+Cake
+Cider
Fri Jul 15 13:10:44 BST 2011 Owen Stephens
* Create shopping list.
addfile ./shopping_list.txt
hunk ./shopping_list.txt 1
+Beer
+Pizza
+Chips
We want to output the following 2 repos:
original : AB
better : AC
the snapshot-based history graph would look something like:
/-- B
A --
\-- C
We export A, since it is shared by both branches, and then export B. However, we need to reset ourselves back to the state before we applied B - we do so by restoring the pristine/inventory that we stored when we first applied A.
Once we have output C, we throw away the state we have generated/saved, since we have no further need for it, and it could potentially consume a large amount of space.
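Here is a toy Python model of that export, to show the save/reset idea in miniature. It is a sketch only: a patch is assumed to be a (label, function) pair that maps one file-contents dictionary to the next, the saved dictionaries stand in for the pristine/inventory pairs, and `export_branches` is a made-up name.

```python
def export_branches(branches):
    """Emit (branch, patch label, snapshot) triples, exporting shared prefixes once."""
    saved = {(): {}}      # patch-label prefix -> file contents after applying it
    stream = []
    for branch, patches in branches.items():
        prefix = ()
        for label, apply_patch in patches:
            parent, prefix = prefix, prefix + (label,)
            if prefix in saved:
                continue  # already exported via another branch
            # "Reset" to the parent state, then apply the patch to get a snapshot.
            saved[prefix] = apply_patch(saved[parent])
            stream.append((branch, label, saved[prefix]))
    return stream

LIST = "shopping_list.txt"
A = ("A", lambda files: {**files, LIST: "Beer\nPizza\nChips\n"})
B = ("B", lambda files: {**files, LIST: files[LIST] + "Apples\nPears\n"})
C = ("C", lambda files: {**files, LIST: files[LIST] + "Cake\nCider\n"})

stream = export_branches({"original": [A, B], "better": [A, C]})
labels = [label for _, label, _ in stream]
```

C is applied on top of the state saved after A, not on top of B, which is the reset described above; the `saved` table is what gets thrown away at the end of a run.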
The fly in the ointment is the need to handle incremental imports/exports. Incremental imports/exports are supported by the fast-import format, through the use of "marks" files. Marks files contain a list of mark->patch_hash mappings[1], where a mark is an integer that is output along with each commit/patch in the format stream. In our example above, A would have mark 1, B 2 and C 3, along with their corresponding patch-hashes (and branch names) in the marks file.
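A marks file is easy to picture as a small lookup table. The Python sketch below parses one; the `:<mark> <hash>` prefix follows the layout Git writes for its own marks files, while the trailing branch-name field and the sample hashes are invented for illustration, since the exact Darcs-side layout isn't spelled out here.

```python
def parse_marks(text):
    """Parse lines of the (assumed) form ':<mark> <patch_hash> [<branch>]'."""
    marks = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        mark, patch_hash, *branch = line.split()
        marks[int(mark.lstrip(":"))] = (patch_hash, branch[0] if branch else None)
    return marks

# Hypothetical marks for the example: A and B on one branch, C on the other.
sample = ":1 hash-of-A original\n:2 hash-of-B original\n:3 hash-of-C better\n"
marks = parse_marks(sample)
```

Looking up a mark then gives the patch hash (and, on the Darcs side, the branch) that a later incremental run needs.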
Imagine that we exported the repository incrementally: we would first export A and B, and then, in a separate stream, C. The problem is the fact that to export C, we need the state, as it was after exporting A (remember we've thrown it away after exporting B, to save space). The solution is simple, but fairly inelegant - we simply run through all the thus-far exported patches, and recreate the state for each, which is, as expected, expensive.
On the import side, it's a little more difficult - consider reading the incremental stream containing just C - it'll contain a single commit that looks something like:
commit refs/heads/demo2
mark :3
committer Owen Stephens
data 20
More treats needed!
from :1
M 100644 inline shopping_list.txt
data 28
Beer
Pizza
Chips
Cake
Cider
This commit object names the branch on which it should be recreated ("refs/heads/demo2"), the mark for the commit, the committer, the commit message, the ancestor (from) commit mark and the commit modifications.
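For the curious, a record like this is simple to generate. The Python sketch below is illustrative rather than the bridge's code: `data_block` and `commit_record` are hypothetical helpers, and the committer string is passed through untouched (a real stream also carries an email address and timestamp on that line). The one subtle point is that each `data` header counts bytes, including the trailing newline.

```python
def data_block(text):
    """A fast-import 'data' section: byte count header followed by the payload."""
    payload = text.encode("utf-8")
    return b"data %d\n" % len(payload) + payload

def commit_record(ref, mark, committer, message, from_mark, files):
    """Assemble one commit record with inline file contents."""
    parts = [
        f"commit {ref}\n".encode(),
        f"mark :{mark}\n".encode(),
        f"committer {committer}\n".encode(),
        data_block(message),
    ]
    if from_mark is not None:
        parts.append(f"from :{from_mark}\n".encode())
    for path, content in files.items():
        parts.append(f"M 100644 inline {path}\n".encode())
        parts.append(data_block(content))
    return b"".join(parts)

record = commit_record(
    "refs/heads/demo2", 3, "Owen Stephens", "More treats needed!\n", 1,
    {"shopping_list.txt": "Beer\nPizza\nChips\nCake\nCider\n"},
)
```

The byte counts come out as 20 and 28, matching the `data` lines in the record shown above.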
Note the line "from :1" - this line tells the importer that this commit should be based on the state as it was at mark 1 - i.e. commit A. We need to recreate that state - as mentioned earlier, as we import each commit, we stash the state for later use; however, as in export, we throw away this state, once we've finished a particular import stream. To recreate the state in a later stream, we take the ancestor mark (the from mark) and read the corresponding branch-name and patch-hash from the marks file. We then issue an internal command that performs the equivalent of "darcs get branch-name temporary_location --to-match = 'hash: PATCH_HASH'"; once we have get'd a new copy of the repo at the required state, we simply read the pristine and (entire) inventory, which allows us to reset our current state to that of the marked patch.
The final piece of work completed is being able to directly apply git-formatted patches to a Darcs repository:
$ darcs init
$ git init -q
$ echo -e '1\n2\n3' > a
$ git add a && git commit -m 'Add a'
[master (root-commit) 564e9e2] Add a
1 files changed, 3 insertions(+), 0 deletions(-)
create mode 100644 a
$ echo -e '4\n5' > a
$ echo -e 'a\nb\nc' > b
$ git add b
$ git add a
$ git commit -m 'Modify a and add b'
[master 411e599] Modify a and add b
2 files changed, 5 insertions(+), 3 deletions(-)
create mode 100644 b
# Revert changes, so we can apply the patch to the Darcs repo.
$ rm a b
# Create a Git patch of all the repos commits.
$ git format-patch --all --stdout > git.patch
# Apply the Git patch to Darcs.
$ darcs-fastconvert apply-patch . git.patch
Attempting to parse input.
Successfully parsed 2 patches.
Attempting to apply patches.
Applying patch 1 of 2: Add a
Applying patch 2 of 2: Modify a and add b
Succesfully applied patches.
Git patches contain a SHA1 hash of each affected file, which we can use to verify that the files are in the same state that they were in Git, prior to the commit. The patch-apply code computes the target files' SHA1 hashes (Git computes the SHA1 of the string "blob LEN\0CONTENT" of a file) to detect if the files are in the same state as in the Git patch. If the hashes differ, the user is prompted to apply anyway, with any non-applying patches being completely rolled back (the unrecorded state of the repository is also unaffected).
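The hash computation itself is tiny. Here it is as a short Python sketch (the bridge does the equivalent in Haskell; `git_blob_sha1` is just an illustrative name):

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """SHA1 of 'blob LEN\\0CONTENT', which is how Git names a file's contents."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

empty_hash = git_blob_sha1(b"")
one_line_hash = git_blob_sha1(b"1\n")
```

The result for a file containing just "1" and a newline should agree with the abbreviated `d00491f` in the `index 0000000..d00491f` line of the `git log -p` output earlier in this post.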
# Create another Git commit.
$ echo -e 'd\ne' >> b
$ git add b && git commit -m 'Modify b'
[master 59046a6] Modify b
1 files changed, 2 insertions(+), 0 deletions(-)
# Revert, so Darcs can apply the patch.
$ darcs rev -a
Finished reverting.
# Change b, so the expected hash doesn't match, but the patch will still apply cleanly.
$ sed -i 's/a/d/' b
$ git format-patch HEAD~1 --stdout > git.patch
$ darcs-fastconvert apply-patch . git.patch
Attempting to parse input.
Successfully parsed 1 patches.
Attempting to apply patches.
Applying patch 1 of 1: Modify b
WARNING: Hash of b does not match patch
No changes will be recorded, if the patch does not apply.
Continue anyway? [yn]y
Succesfully applied patches.
This means someone can make their own local git clone of a Darcs repo, and send patches to the Darcs-repo owner, who will be able to directly apply them.
Still outstanding in the next week:
- Apply Darcs patches to a Git repo.
- Merge detection - we still need to try to detect and output clean Darcs merges, else we'll lose them when exporting to Git.
- Performance - the performance of import is somewhat slow - we need to work out where and why it is performing badly.
[1] Darcs marks also contain the branch name that the given patch is part of - since Darcs doesn't yet natively support branches, we have to provide this information manually.
GSoC: Darcs Bridge – Week 4/5
I've had a fairly busy couple of weeks outside of GSoC, so I didn't have a particularly interesting blogpost to make last week. That said, the bridge is coming along nicely; we are now able to export Darcs branches into the fast-import format (other than a TODO on detecting merges - more on that later).
Some interesting topics that have come up recently:
Prefix sharing of Darcs branches when exporting - given branches ABCD and ABCE we can "share" the patches A,B and C between the branches rather than simply exporting ABCD and A'B'C'E, which would lose the common history of the two branches. The current implementation exports the longest prefix of patches between branches and then (to use the Git terminology) "rebases" any extra patches on top. E.g. branches ABCD and ABDE will be exported as ABCD and ABD'E (N.B. that D and D' are not the same). The current behaviour is somewhat a "best-effort" (it has some complicated "reproducibility" issues) but after a long discussion with Ganesh and Petr, a better approach wasn't found, so for now, it is how it is.
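The prefix-finding step is the easy part, and can be sketched in Python as below. This is a simplification: patches are compared by name only, `split_on_shared_prefix` is a made-up helper, and the actual "rebasing" of the leftover patches (where D becomes D') is not modelled.

```python
def split_on_shared_prefix(first, second):
    """Return (shared prefix, rest of first, rest of second)."""
    n = 0
    while n < min(len(first), len(second)) and first[n] == second[n]:
        n += 1
    return first[:n], first[n:], second[n:]

shared, rest_one, rest_two = split_on_shared_prefix(list("ABCD"), list("ABDE"))
```

For ABCD and ABDE only A and B are shared, so D has to be exported twice, once after C and once directly after B, which is why the two exported copies are not the same commit.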
Encoding replace patches (and other incompatible patch-types) is tricky. The fast-import stream format simply stores file contents at each commit (just as Git does internally), which is fine for exporting patches once - we just apply the Darcs patches in turn, listing the changed files in full for each patch/commit. However, a property we are keen to keep with the bridge is that of reproducibility - multiple exports or repeated import/exporting should yield the same changes. For example, if a Darcs replace patch was exported to Git, and then the Git repo exported back into Darcs, we'd like to be able to recover the same replace patch (rather than a large hunk patch).
To illustrate, imagine the Darcs patch: [hunk file1 "foo\nfoo\nfoo" 1, replace file1 foo bar] that adds some foos to file1 and then replaces foos with bars in file1. It is important to know exactly where the "replace" was in the sequence of low-level patches - if we don't know the position we will create the wrong patch when re-importing (e.g. the newly added foos won't be changed to bars, if we place the "replace" before the hunk). It is difficult to encode positions other than "first" or "last", since we are unable to easily represent the intermediate states in Git (to ensure that the states are re-exported later), so for now, these changes will only be handled if they are first or last in a patch. N.B. the only way to force a replace into the middle of a sequence is by using amend-record, so the impact of this decision is *somewhat* limited.
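A small Python model shows why the position matters. It is only an approximation: a file is a list of lines, a hunk inserts lines at a 1-based position, and the replace is approximated with a word-boundary regex rather than Darcs' real token rules.

```python
import re

def hunk_add(lines, at, new_lines):
    """Insert new_lines so that the first one becomes line number `at` (1-based)."""
    return lines[:at - 1] + new_lines + lines[at - 1:]

def replace_token(lines, old, new):
    """Rough stand-in for a Darcs replace: swap whole-word occurrences of old."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return [pattern.sub(new, line) for line in lines]

foos = ["foo", "foo", "foo"]
hunk_then_replace = replace_token(hunk_add([], 1, foos), "foo", "bar")
replace_then_hunk = hunk_add(replace_token([], "foo", "bar"), 1, foos)
```

With the hunk first the file ends up as three bars; with the replace first it has nothing to act on and the file keeps its three foos, so a re-import that guesses the wrong position produces a different patch.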
Upcoming TODOs:
- Add branches to the bridge commands (add, rm, list branches etc.) - since we now support multi-head import/export on both sides, these commands will be very useful.
- Detecting, and making explicit in the fast-import stream, merges of Darcs branches. Currently, re-exported Git merges are lost, since they are not detected on the Darcs side.
- Performance (import, especially, is sometimes slow).
- Perhaps a way of showing progress without piping into git fast-import? Currently, bridge progress is mostly ignored.
- Accepting foreign patch-formats e.g. be able to apply emailed Darcs patches to a Git repo and vice-versa?
GSoC: First simple merging branches git import
A quick update post to show my first import of a branching git repo that contains merges.
It's a very simple import, but it works! :)
$ git log --graph --pretty='%ad %an <%ae>%n * %s%n%n %b'
* Tue Jun 14 16:19:34 2011 +0100 Owen Stephens
|\ * Merge branch 'branch1'
| |
| | Conflicts:
| | b
| |
| * Tue Jun 14 16:18:54 2011 +0100 Owen Stephens
| | * b branch1
| |
| |
* | Tue Jun 14 16:19:08 2011 +0100 Owen Stephens
|/ * b master
|
|
* Tue Jun 14 16:18:34 2011 +0100 Owen Stephens
* a master
$ git fast-export --all | darcs-fastconvert import darcs
$ darcs cha --repo darcs
Tue Jun 14 16:19:34 BST 2011 Owen Stephens
* Merge branch 'branch1'
Conflicts:
b
Tue Jun 14 16:18:54 BST 2011 Owen Stephens
* b branch1
Tue Jun 14 16:19:08 BST 2011 Owen Stephens
* b master
Tue Jun 14 16:18:34 BST 2011 Owen Stephens
* a master
The "conflicts b" message is generated by Git, and shows up in the Darcs patch, even though the patch isn't a conflicting patch; also, the merge commit seen by Git is actually 2 patches in Darcs: a 'merge' patch, which contains conflicts, and a 'resolution' patch that contains the resolution as per the Git merge commit. This is to ensure that we preserve the entire patch history, rather than simply "diffing" the end state and the branches.
The following rather verbose commands show the actual patch/commit content:
$ darcs cha --repo darcs -v
Tue Jun 14 16:19:34 BST 2011 Owen Stephens
* Merge branch 'branch1'
Conflicts:
b
hunk ./b 1
+1
+c
Tue Jun 14 16:18:54 BST 2011 Owen Stephens
* b branch1
duplicate
|hunk ./b 1
|-1
|-2
|-3
|rmfile ./b
|:
addfile ./b
conflictor [
hunk ./b 1
+1
+2
+3
]
|:
hunk ./b 1
+a
+b
+c
addfile ./c
hunk ./c 1
+a
+b
+c
Tue Jun 14 16:19:08 BST 2011 Owen Stephens
* b master
addfile ./b
hunk ./b 1
+1
+2
+3
Tue Jun 14 16:18:34 BST 2011 Owen Stephens
* a master
addfile ./a
hunk ./a 1
+1
+2
+3
$ git log --graph -p
* commit 6802fcd03d3ddf69cfb33a803211fe4f22da9542
|\ Merge: f085e43 c5dc576
| | Author: Owen Stephens
| | Date: Tue Jun 14 16:19:34 2011 +0100
| |
| | Merge branch 'branch1'
| |
| | Conflicts:
| | b
| |
| * commit c5dc576b33e8125153c9337b6b2dbf99a2de1a60
| | Author: Owen Stephens
| | Date: Tue Jun 14 16:18:54 2011 +0100
| |
| | b branch1
| |
| | diff --git a/b b/b
| | new file mode 100644
| | index 0000000..de98044
| | --- /dev/null
| | +++ b/b
| | @@ -0,0 +1,3 @@
| | +a
| | +b
| | +c
| | diff --git a/c b/c
| | new file mode 100644
| | index 0000000..de98044
| | --- /dev/null
| | +++ b/c
| | @@ -0,0 +1,3 @@
| | +a
| | +b
| | +c
| |
* | commit f085e43571e1d19dd345c7e3ec7a0a57efaaba26
|/ Author: Owen Stephens
| Date: Tue Jun 14 16:19:08 2011 +0100
|
| b master
|
| diff --git a/b b/b
| new file mode 100644
| index 0000000..01e79c3
| --- /dev/null
| +++ b/b
| @@ -0,0 +1,3 @@
| +1
| +2
| +3
|
* commit 83888d0729210fd84c4557467c6548fcc99aae2c
Author: Owen Stephens
Date: Tue Jun 14 16:18:34 2011 +0100
a master
diff --git a/a b/a
new file mode 100644
index 0000000..01e79c3
--- /dev/null
+++ b/a
@@ -0,0 +1,3 @@
+1
+2
+3
GSoC: Darcs Bridge – Week 3
So, time for a (somewhat delayed - oops!) week 3 update on darcs-bridge:
By the end of last week, I had coded up a poorly-implemented approach to importing branches, but had hit a few final problems (mostly arising due to my flawed approach). Petr and Ganesh both helped me to see the light (and also a better method for handling branches!) which I've spent the weekend on-and-off hacking up.
The result is this rather innocuous-looking transcript:
$ git branch -a
foo_branch
* master
$ git log --graph --all
* commit 48e3317aad56df72977c80c2b40a34b87349e435
| Author: Owen Stephens <git@owenstephens.co.uk>
| Date: Mon Jun 13 14:28:53 2011 +0100
|
| add b
|
| * commit e85e1bae42b1647bc08bdac2aaec8e402152bce0
|/ Author: Owen Stephens <git@owenstephens.co.uk>
| Date: Mon Jun 13 14:28:53 2011 +0100
|
| add c
|
* commit d8b3bafad054acdd8307fbefdc95287dc715e9a7
Author: Owen Stephens <git@owenstephens.co.uk>
Date: Mon Jun 13 14:28:53 2011 +0100
add a
$ (cd git_repo && git fast-export --all) | darcs-fastconvert import darcs
[...]
$ darcs cha --repo darcs
Mon Jun 13 14:28:53 BST 2011 Owen Stephens <git@owenstephens.co.uk>
* add c
Mon Jun 13 14:28:53 BST 2011 Owen Stephens <git@owenstephens.co.uk>
* add a
$ darcs cha --repo darcs-branch_foo_branch
Mon Jun 13 14:28:53 BST 2011 Owen Stephens <git@owenstephens.co.uk>
* add b
Mon Jun 13 14:28:53 BST 2011 Owen Stephens <git@owenstephens.co.uk>
* add a
Which shows (somewhat unclearly) a multi-head git repo with a base commit - "add a" and then two branching commits: "add b" and "add c"; "add b" is made on a new branch, and should not show up in the log of the master branch. Importing this git repo creates a base repo directory and any branches are created in adjacent directories. I've not quite finished off this branch-importing work - I still need to handle merges, but it should not be too much trouble, and it's good to see that the code works on simple cases already!
Other things I've completed over the last few days:
- Correctly handling renames/moves, rather than simply diffing before/after (which would give patches that see fully removed/added files).
- I no longer shell out to darcs-fastconvert (yuck!), when syncing a bridge, as I've reworked the input/output handling that was preventing me from calling the import code internally.
- Testing: I've added a basic test-suite, that will catch the most egregious of foul-ups due to any changes that I make. I'll add more tests in the future, to catch tricky cases, especially with handling branches.
Things coming up: handling merges and exporting multiple darcs "branches" into a single git repository. Onwards!
GSoC: Darcs Bridge – Week 2
So, it's the end of the 2nd week of GSoC, but only my first full week of work, due to my uni finals; thankfully, they're over and GSoC is going well! I didn't write a week 1 post, since I'd only done < 2 days of work, but this week I've got some good things to discuss.
This week, I've created a working (but not stringently tested, yet!) automatic Darcs<->Git bridge, by extending darcs-fastconvert. The bridge creates a Git clone of an input Darcs repository (vice versa for a Git input repo), using the fast-import data format to import the data from Darcs. The bridge inserts a "hook" into both repos (pre-receive for Git, and pre-apply for Darcs) that ensures that the bridge is synced before allowing new patches to be pushed. If the bridge is out-of-sync, the new commits will be imported and the push/apply will be rejected; the user should then pull in the imported changes and resolve any conflicts locally. We use a mutex to disallow concurrent pushes to the two repos.
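The control flow of such a hook can be sketched as follows. This is a toy Python model, not the bridge's implementation: the lock is assumed to be a file created exclusively, `in_sync` and `sync` are caller-supplied stand-ins for the real checks, and `bridge_lock` and `pre_receive` are illustrative names.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def bridge_lock(path):
    """Hold an exclusive lock file; raises FileExistsError if already held."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)

def pre_receive(lock_path, in_sync, sync):
    """Return True to accept the push, False to reject it."""
    try:
        with bridge_lock(lock_path):
            if in_sync():
                return True
            sync()          # import the other side's new commits first
            return False    # the user must pull and retry
    except FileExistsError:
        return False        # another push holds the bridge lock

lock = os.path.join(tempfile.mkdtemp(), "bridge.lock")
calls = []
accepted_when_synced = pre_receive(lock, lambda: True, lambda: calls.append("sync"))
rejected_when_stale = pre_receive(lock, lambda: False, lambda: calls.append("sync"))
```

A push is accepted only when the lock is free and nothing needs importing; otherwise the bridge syncs (if it can take the lock) and rejects the push.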
If you'd like to test the bridge, you can get a copy of my darcsden repo. The bridge can be created and tested as follows:
darcs get http://darcsden.com/owst/darcs-fastconvert-gsoc
cabal configure; cabal build; cabal install
cd DIR_CONTAINING_REPO
darcs-fastconvert create-bridge --input-repo=REPO_DIR
A directory named REPO_DIR_bridge should have been created, with a clone of the input repo, and a Git copy. These repos should be used as the master repos, and should be pushed to, not edited directly (otherwise, the bridge-syncing commands won't run).
The current tool has limitations, particularly with regard to branches in Darcs and Git, but on simple, linear history repos, the bridge should have no problems.
Some TODOs, for the bridge:
- Improve the help (possibly by modifying cmdlib, the command-line argument parser being used), particularly removing flags for mandatory parameters.
- Create a typeclass for the monad that the "export" command runs in, to allow easy redirection of output. Currently I have to shell-out to my own executable, due to the design of darcs-fastconvert, when I want to internally run an export/import command. This sucks (but at least the hack works as is!), and I will need to do some re-engineering to fix it.
- Create some tests! I need to create some simple shell-tests that will ensure I don't introduce regression errors, when adding features to the bridge.
At the end of the week, I started work on adding Rename/Move handling to the import mechanism of darcs-fastconvert. Git is able to infer moves/copies, using the -C and -M options to git-fast-export, but darcs-fastconvert cannot currently import them. Adding this behaviour will reduce the likelihood of losing information, if a previously converted repo with moves/renames was converted back to its original format.
My next task is to implement simple multi-head importing - currently darcs-fastconvert linearises Git repos with multiple branches, usually leading to "strange" patch contents. I will map Git branches to multiple Darcs repos (the method of branching in Darcs). One particularly tricky problem to solve is that of creating "good" Darcs patches, from a Git merge commit.
GSoC: Darcs Bridge – Week 0
This week I've been participating in some productive revision-procrastination, by starting work on my GSoC project - Darcs Bridge; I still have university exams until next week, so wanted to make sure I lost no GSoC time (and besides, Haskell hacking is much more fun than revising Elliptic curves!).
- First up, I created a darcsden repository, to host my work-in-progress over the summer, here.
- To kick things off code-wise, I wanted to make sure I could build darcs-fastconvert, mornfall's existing method of converting to/from darcs/git, using the "fast-import" de facto standard. This meant updating the code, to build against the latest dev-version of Darcs - 2.7.3 and GHC 7.
- I found, and fixed a bug in Darcs itself, which manifested itself when attempting to add non-existent files within a newly added folder within a repo.
Ensuring darcs-fastconvert compiled was a great first challenge for my project; fixing the new typing requirements forced me to understand how many of Darcs' low-level concepts are implemented, particularly how types are used to represent the contexts of a patch.
Earlier today, my mentor Ganesh and I had an online chat about ideas for the project:
- We want conversion between darcs<->git to be as seamless as possible - a "sync" command.
- Darcs-fastconvert can do this to some extent, but requires manually managing "marks" files (for git and darcs), to persist the state of the conversion. These are somewhat fragile and prone to errors (and also make the export/import command invocations more noisy than they should be).
- We can "shell-out" to git, rather than relying on the user to manage the git side of the conversion. This also allows us to easily get hold of data such as the git commit ids (something that is not exported in the fast-import data-format), which will hopefully allow us to better manage multi-head repos.
- We can assume that the bridge will be responsible for "managing" the darcs and git repositories; particularly, we envisage a "bridge lock" that will allow us to ensure that users cannot commit to the darcs and git repositories simultaneously - the pre-commit hooks in both git and darcs shall fail, if this lock is active. (We initially thought we could use the separate darcs and git locks, but this could well lead to lock ordering or race problems.)
- We can use git "hooks" to ensure that a darcs and git repository are in sync, before new commits can be pushed to the git repo.
That's it for now, but I'll make sure to keep this blog updated, as my work progresses.
GSoC project accepted!
Yesterday I received confirmation that my Darcs GSoC project proposal has been accepted; this summer I'll be working on creating/improving a "bridge" between Darcs and other VCSs, such as Git (for more information, see the Darcs wiki page of my project).
This blog will host my weekly GSoC updates, when the coding period starts in late May.
Thanks to all in the Darcs team who helped me flesh out my proposal (special mentions to Eric, Ganesh and Jason - thanks guys!), I'm very much looking forward to my summer!