Monotone options rework

As promised earlier, I’ll continue my little series on noteworthy changes and additions in the upcoming monotone release. What I’ll blog about today may sound like mere syntactic sugar for command-line users, but it serves a greater purpose when put in perspective: the introduction of overridable and negatable options.

Let’s start with a simple example: monotone creates a special type of cert when a user explicitly calls the `suspend` command to mark a particular branch (more precisely, its head revision(s)) as end-of-life or uninteresting. This is useful when a feature branch has been merged back into the main branch and its name should no longer confuse other people. Since monotone’s distributed nature means it cannot physically delete data that is no longer needed, the internal revision machinery now simply ignores branches and revisions carrying these certs and effectively hides them from the user: they are not picked as update candidates when `update` is called, and they no longer show up in the list of branches (`list branches`).

Of course, people might still want to see at some point which suspended branches are actually available, perhaps to revive a dead development line or for other historical purposes. For this use case monotone has a global option, `--ignore-suspend-certs`, which simply deactivates the automatic hiding so that all commands behave as if no suspend certs existed at all. So far, so good, but there was a nitty-gritty practical problem with this (and other, similar) functionality:

If a user had decided to ignore suspend certs permanently, for example by adding a specific section to his global `get_default_command_options` hook, he could not easily deactivate or override this setting again on the command line. The same applied to long-running processes like `automate stdio`: once a client had run the list branches command there with the `--ignore-suspend-certs` option, it was unable to switch this flag off again when it wanted to query only active branches with the next invocation.

Up to and including 0.48, only ugly workarounds existed, such as restarting the stdio process, disabling the loading of hooks, or temporarily commenting out the particular section in the Lua file. With the recent merge of the options branch this has finally become possible: every revertible boolean flag gained a corresponding cancel flag, which in most cases is simply the original name prefixed with `--no-`:

$ mtn ls branches --ignore-suspend-certs --no-ignore-suspend-certs

(Can you guess what this command will now actually do? Little hint: The last occurrence counts…)
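To make the “last occurrence counts” rule concrete, here is a minimal Python sketch using the standard library’s `argparse`, not monotone’s own C++ option parser. `argparse.BooleanOptionalAction` (Python 3.9+) registers both `--ignore-suspend-certs` and `--no-ignore-suspend-certs` for a single destination, and because arguments are applied left to right, whichever appears last decides the final value. The program name `toy-mtn` and the helper `effective_flag` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Toy parser with one negatable flag (hypothetical, not monotone's code)."""
    parser = argparse.ArgumentParser(prog="toy-mtn")
    # BooleanOptionalAction adds both --ignore-suspend-certs and
    # --no-ignore-suspend-certs, writing to the same destination.
    parser.add_argument(
        "--ignore-suspend-certs",
        action=argparse.BooleanOptionalAction,
        default=False,
    )
    return parser


def effective_flag(argv: list[str]) -> bool:
    """Parse argv and return the final value of the flag."""
    return build_parser().parse_args(argv).ignore_suspend_certs


if __name__ == "__main__":
    # Options are applied left to right, so the last occurrence wins.
    print(effective_flag(["--ignore-suspend-certs", "--no-ignore-suspend-certs"]))  # False
    print(effective_flag(["--no-ignore-suspend-certs", "--ignore-suspend-certs"]))  # True
    print(effective_flag([]))  # False (the default)
```

For options whose cancel flag is not a plain `--no-` prefix (like the `--norc` case mentioned next), the same effect can be had in `argparse` by registering two separate options with a shared `dest`, one using `store_true` and the other `store_false`.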

But as I mentioned above, this is only half the truth: not all cancel flags are simply prefixed with “no-”. Some look completely different, and in some cases we even renamed the original option to keep the UI syntax flowing and to avoid inventing ugly pseudo options (or do you think `--no-norc` is a good name for the cancel option of `--norc`?).

Before you scream “Oh my god, will I ever get used to these new options?! You broke monotone for me!”, let me offer a few words of relief:

  1. For common options which have been changed or removed, there is a new deprecation functionality which points you at the new option syntax.
  2. monotone’s inline command help now shows not only the full syntax of each option, but also the name of its cancel option.
  3. Just as partial command names are completed, option names are now completed as well. If a prefix has multiple expansions, all possible options are listed with a short description. (For my example above, the shortest unique prefix of the long `--ignore-suspend-certs` option is `--ignore-`, since there is also an option named `--ignored`, albeit for a completely different use case.)
  4. Finally, if you’re still puzzled by all the new and changed options and the general calling syntax, and you want a single page you can just skim or search for what you’re looking for, I’m pleased to tell you that the next monotone version will again have a manual page, but this time it is auto-generated from the internal command tree and options. Maybe I’ll tell you a bit more about this in one of the next blog posts; if not, just try it out with `mtn manpage` as soon as 0.99 hits the streets.
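Prefix completion as described in item 3 can be sketched in a few lines of Python. This is a minimal sketch, not monotone’s implementation: it assumes that an exact option name always wins over longer names sharing it as a prefix (so `--message` stays usable next to `--message-file`), which I have not checked against monotone’s actual behaviour. The option table is hypothetical, and all descriptions except the first are placeholders.

```python
# Hypothetical option table: names taken from the post, descriptions made up.
OPTIONS = {
    "--ignore-suspend-certs": "do not hide suspended branches and revisions",
    "--ignored": "(placeholder description)",
    "--message": "(placeholder description)",
    "--message-file": "(placeholder description)",
}


def complete_option(prefix: str, options: dict[str, str]) -> list[str]:
    """Return the options matching prefix; an exact name always wins."""
    if prefix in options:
        return [prefix]
    return sorted(name for name in options if name.startswith(prefix))


def expand_option(prefix: str, options: dict[str, str]) -> str:
    """Expand prefix to a full option name, or raise listing the candidates."""
    matches = complete_option(prefix, options)
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option '{prefix}'")
    listing = "\n".join(f"  {name}  {options[name]}" for name in matches)
    raise ValueError(f"option '{prefix}' is ambiguous:\n{listing}")


def shortest_unique_prefix(name: str, options: dict[str, str]) -> str:
    """Shortest prefix of name that expands unambiguously to name."""
    for length in range(1, len(name) + 1):
        if complete_option(name[:length], options) == [name]:
            return name[:length]
    return name


if __name__ == "__main__":
    print(shortest_unique_prefix("--ignore-suspend-certs", OPTIONS))  # --ignore-
    try:
        expand_option("--ign", OPTIONS)
    except ValueError as err:
        print(err)  # lists both --ignore-suspend-certs and --ignored
```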

I hope we’ll get a lot of feedback on this for the 0.99 release, as I firmly believe that this overhaul will make the functional and consistent foundations of monotone even stronger.

I also hope that we don’t scare too many people away; we did this on purpose for 0.99 and not for 1.0 or 1.1, so trust me, we don’t plan to mess around with the UI to this extent again anytime soon 🙂

Thanks for reading. Your comments are welcome on IRC and via mail.

Search and replace multiple lines across many files

sed is usually my favourite tool for searching and replacing things on the command line, but sometimes Perl’s regexes are far more convenient. Recently I found another reason why Perl’s -pi -e is superior to plain sed: changing multiple lines in a document at once!

Imagine you have hundreds of source code files where somebody once had the great idea to add a ___version___ property into each class:

public class Foo
{
    private static final String ___version___ = "$Version:$";
    
    // other stuff
}

With Perl the line in question is easy to remove:

$ for file in $(find . -name "*.java"); do \
   perl -pi -e 's/\s*private.+___version___.+\n//g' "$file"; done

But there is one problem: with -p, Perl reads and processes the file line by line, so the substitution never sees more than one line at a time and the blank line following the removed declaration survives:

public class Foo
{
    
    // other stuff
}

Then I stumbled upon this article, and the solution is to change the input record separator so that Perl slurps in the whole file at once:

$ for file in $(find . -name "*.java"); do \
   perl -0777 -pi -e 's/\s*private.+___version___.+\n(\s*\n)*/\n/g' "$file"; done

and voilà, we get what we want:

public class Foo
{
    // other stuff
}

Digging a little deeper into what -0777 actually means leads us to perlrun(1):

The special value 00 will cause Perl to slurp files in paragraph mode. The value 0777 will cause Perl to slurp files whole because there is no legal byte with that value.
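For comparison, the same whole-file substitution can be sketched in Python: reading each file completely plays the role of -0777, and Python’s `re` treats `\s` and `.` the same way Perl does here (`.` does not match a newline unless asked to). This is a minimal sketch; the helper names are hypothetical, and unlike the one-liner it only rewrites files whose content actually changed.

```python
import re
from pathlib import Path

# Same pattern as the -0777 one-liner: whitespace before the declaration
# (including the preceding newline), the declaration line itself and any
# blank lines following it. "." does not match "\n" by default.
VERSION_DECL = re.compile(r"\s*private.+___version___.+\n(\s*\n)*")


def strip_version(source: str) -> str:
    """Remove the ___version___ declaration and the blank lines after it."""
    return VERSION_DECL.sub("\n", source)


def strip_version_in_tree(root: str) -> None:
    """Rewrite every *.java file below root, reading each file as a whole."""
    for path in Path(root).rglob("*.java"):
        text = path.read_text(encoding="utf-8")
        new_text = strip_version(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")


if __name__ == "__main__":
    example = (
        "public class Foo\n"
        "{\n"
        '    private static final String ___version___ = "$Version:$";\n'
        "    \n"
        "    // other stuff\n"
        "}\n"
    )
    print(strip_version(example), end="")
```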

Another day saved – thanks to Perl!

And while we’re at it, have a look at Rakudo Star, the best Perl 6 compiler, which was released just recently. Perl 6 is in my humble opinion one of the best-designed languages I’ve come across so far, so if you find some time, go over and read the last Christmas special; it’s really worth it!

On monotone selectors

This is the first post in a small series showing off some of the new functionality you can expect in the next major version of monotone. While there is no fixed release date yet, we plan to release it this fall. If you look at the roadmap, you’ll see that most things have already been implemented and merged into mainline, so we’re definitely on track 🙂

Anyway, let’s begin this little series with the selector rewrite Tim merged a couple of weeks ago. Selectors are one of monotone’s main concepts for picking revisions other than by their 40-character hash ID, and are therefore very useful for “navigating” between different development lines.

Up to 0.48, monotone already knows many selectors: you can select revisions by tag, by branch, by author, by custom cert values and so on. Selectors can be combined to calculate the intersection of two sets, like “show me all revisions by author ‘Jon’ on branch ‘my.project’”, which would essentially look like this:

$ mtn automate select "a:jon/b:my.project"

The syntax for these selectors is nice and simple: each selector is prefixed with a unique character, and multiple selectors are concatenated with a single slash. While these old-style selectors solved many use cases, some remained unsolved, and users coming from other DVCS like Darcs had a rather hard time figuring out how to accomplish a certain selection in monotone.
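To illustrate the old-style syntax, here is a small Python sketch that splits a selector like `a:jon/b:my.project` into (prefix, value) terms and intersects the revisions matched by each term. It is a simplification under stated assumptions: values are compared for plain equality (monotone’s own matching rules are not reproduced), escaped slashes are not handled, only the `a` and `b` prefixes are modelled, and the revision data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    rev_id: str
    author: str
    branch: str


# Hypothetical mapping from selector prefix to the revision field it tests.
PREFIX_FIELDS = {"a": "author", "b": "branch"}


def parse_selector(expr: str) -> list[tuple[str, str]]:
    """Split 'a:jon/b:my.project' into [('a', 'jon'), ('b', 'my.project')]."""
    terms = []
    for part in expr.split("/"):
        prefix, sep, value = part.partition(":")
        if not sep or prefix not in PREFIX_FIELDS:
            raise ValueError(f"unsupported selector term '{part}'")
        terms.append((prefix, value))
    return terms


def select(expr: str, revisions: list[Revision]) -> set[str]:
    """Intersect the sets of revision ids matched by each term of expr."""
    result = {r.rev_id for r in revisions}
    for prefix, value in parse_selector(expr):
        field = PREFIX_FIELDS[prefix]
        result &= {r.rev_id for r in revisions if getattr(r, field) == value}
    return result


if __name__ == "__main__":
    revs = [
        Revision("r1", "jon", "my.project"),
        Revision("r2", "jon", "other.project"),
        Revision("r3", "ann", "my.project"),
    ]
    print(select("a:jon/b:my.project", revs))  # {'r1'}
```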

A particularly good example is “how can I easily view the changes on a development branch since the last merge point?”. Until now you either had to figure out the revision of the merge point manually by looking at the output of log, or use some scary construct like the following:

$ mtn au common_ancestors $(mtn au select h:main.branch) \
    $(mtn au select h:) | mtn au erase_ancestors -@-

Enter selector functions

Luckily, you don’t have to write these things anymore from 0.99 onwards. Give the new selector functions a warm round of applause!

$ mtn au select "lca(h:main.branch;h:feature.branch)"

In this example “lca” stands for the “least common ancestors” function which takes two arguments, i.e. two other selectors. The syntax is extra short in a workspace where an empty head selector h: defaults to the branch recorded in the workspace options, so if you’re in the feature.branch workspace, just type:
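Conceptually, the lca call does what the scary pipeline above did: collect the ancestors common to both sides, then erase those that are ancestors of other common ancestors. Here is a minimal Python sketch of that idea on a made-up revision graph (a dict from each revision to its parents); it is not monotone’s implementation, which works on its own revision database.

```python
# Hypothetical history: revision -> list of parent revisions.
GRAPH: dict[str, list[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],        # main.branch
    "D": ["B"],        # feature.branch forks off at B
    "E": ["D"],        # feature.branch
    "M": ["E", "C"],   # feature.branch merges main.branch (C)
    "F": ["C"],        # main.branch head
    "G": ["M"],        # feature.branch head
}


def ancestors(graph: dict[str, list[str]], revs: set[str]) -> set[str]:
    """All ancestors of revs, including revs themselves."""
    seen: set[str] = set()
    stack = list(revs)
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(graph[rev])
    return seen


def erase_ancestors(graph: dict[str, list[str]], revs: set[str]) -> set[str]:
    """Drop every revision that is a proper ancestor of another one in revs."""
    result = set(revs)
    for rev in revs:
        # ancestors of the parents are exactly the proper ancestors of rev
        result -= ancestors(graph, set(graph[rev]))
    return result


def lca(graph: dict[str, list[str]], left: set[str], right: set[str]) -> set[str]:
    """Least common ancestors of two revision sets."""
    common = ancestors(graph, left) & ancestors(graph, right)
    return erase_ancestors(graph, common)


if __name__ == "__main__":
    print(lca(GRAPH, {"F"}, {"G"}))  # {'C'}: the last merge point
    print(lca(GRAPH, {"F"}, {"E"}))  # {'B'}: before the merge, the fork point
```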

$ mtn au select "lca(h:main.branch;h:)"

Quite convenient, eh? This is not only short, but up to five times faster than the above complex command line. Of course the selector can be directly used in a call to diff or log, like so:

$ mtn diff -r "lca(h:main.branch;h:)"
$ mtn log --to "children(lca(h:main.branch;h:))"

But huh, what’s that nested children call, you ask? Well, the lca function picks the merge point in the _main branch_, and if the revision graph goes around it, log would otherwise happily continue logging parents (earlier revisions) on the feature branch. The call to children ensures that we pick the merge revision in the feature branch and therefore really stop logging at this revision.

Test drive

There are many more of these selector functions, and explaining them all in detail is out of scope here; please have a look at “composite selectors” in the nightly-built manual.
And if you want to have an early look at this and play around without having to compile it yourself – at least if you’re on openSUSE or Fedora – just download the binaries from our nightly builds.

MySQL partitioning benchmark

I had a little research task at work today: evaluating which MySQL storage engine and technique would be fastest for retrieving large amounts (millions of rows) of log data. I stumbled upon this post, which explains the new horizontal partitioning features of MySQL 5.1, and what I read there made me curious to test it myself, also because the original author did not include a test with an indexed (non-)partitioned table.

This is my test setup: Linux 2.6.34, MySQL Community Server 5.1.46, Intel Pentium D CPU at 3.2 GHz, 2 GB RAM

Test MyISAM tables

The table definitions are copied and adapted from the aforementioned article:

CREATE TABLE myi_no_part (
      c1 int default NULL,
      c2 varchar(30) default NULL,
      c3 date default NULL
) engine=MyISAM;

CREATE TABLE myi_no_part_index (
      c1 int default NULL,
      c2 varchar(30) default NULL,
      c3 date default NULL,
      index(c3)
) engine=MyISAM;

CREATE TABLE myi_part (
  c1 int default NULL,
  c2 varchar(30) default NULL,
  c3 date default NULL
) PARTITION BY RANGE (year(c3))
(PARTITION p0 VALUES LESS THAN (1995),
 PARTITION p1 VALUES LESS THAN (1996),
 PARTITION p2 VALUES LESS THAN (1997),
 PARTITION p3 VALUES LESS THAN (1998),
 PARTITION p4 VALUES LESS THAN (1999),
 PARTITION p5 VALUES LESS THAN (2000),
 PARTITION p6 VALUES LESS THAN (2001),
 PARTITION p7 VALUES LESS THAN (2002),
 PARTITION p8 VALUES LESS THAN (2003),
 PARTITION p9 VALUES LESS THAN (2004),
 PARTITION p10 VALUES LESS THAN (2010),
 PARTITION p11 VALUES LESS THAN MAXVALUE) 
 engine=MyISAM;

CREATE TABLE myi_part_index (
  c1 int default NULL,
  c2 varchar(30) default NULL,
  c3 date default NULL,
  index(c3)
) PARTITION BY RANGE (year(c3))
(PARTITION p0 VALUES LESS THAN (1995),
 PARTITION p1 VALUES LESS THAN (1996),
 PARTITION p2 VALUES LESS THAN (1997),
 PARTITION p3 VALUES LESS THAN (1998),
 PARTITION p4 VALUES LESS THAN (1999),
 PARTITION p5 VALUES LESS THAN (2000),
 PARTITION p6 VALUES LESS THAN (2001),
 PARTITION p7 VALUES LESS THAN (2002),
 PARTITION p8 VALUES LESS THAN (2003),
 PARTITION p9 VALUES LESS THAN (2004),
 PARTITION p10 VALUES LESS THAN (2010),
 PARTITION p11 VALUES LESS THAN MAXVALUE) 
 engine=MyISAM;

Test Archive tables

Since MySQL’s Archive engine only supports a single index, which can only be placed on an AUTO_INCREMENT column, I left out the indexed variants for that engine:

CREATE TABLE ar_no_part (
      c1 int default NULL,
      c2 varchar(30) default NULL,
      c3 date default NULL
) engine=Archive;

CREATE TABLE ar_part (
  c1 int default NULL,
  c2 varchar(30) default NULL,
  c3 date default NULL
) PARTITION BY RANGE (year(c3))
(PARTITION p0 VALUES LESS THAN (1995),
 PARTITION p1 VALUES LESS THAN (1996),
 PARTITION p2 VALUES LESS THAN (1997),
 PARTITION p3 VALUES LESS THAN (1998),
 PARTITION p4 VALUES LESS THAN (1999),
 PARTITION p5 VALUES LESS THAN (2000),
 PARTITION p6 VALUES LESS THAN (2001),
 PARTITION p7 VALUES LESS THAN (2002),
 PARTITION p8 VALUES LESS THAN (2003),
 PARTITION p9 VALUES LESS THAN (2004),
 PARTITION p10 VALUES LESS THAN (2010),
 PARTITION p11 VALUES LESS THAN MAXVALUE) 
 engine=Archive;

Test data

I re-used the procedure to create 8 million test records with dates spread pseudo-randomly over the ten years from 1995 to 2004, and subsequently copied the generated data to the other tables:

delimiter //

CREATE PROCEDURE load_part_tab()
     begin
      declare v int default 0;
              while v < 8000000
      do
      insert into myi_no_part
      values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520) mod 3652));
      set v = v + 1;
      end while;
     end
     //

delimiter ;

call load_part_tab;

insert into myi_no_part_index select * from myi_no_part;

...
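The date expression in the procedure adds an offset of less than 3652 days to 1995-01-01, so all rows land between 1995 and 2004. The following Python sketch, assuming integer day offsets (a simplification of MySQL’s floating-point `rand()` and `mod` arithmetic), maps every possible offset to the partition its year falls into under the `PARTITION BY RANGE (year(c3))` scheme above. It shows that p1 to p10 each cover one year of data, while p0 and p11 stay empty.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date, timedelta

START = date(1995, 1, 1)
# Exclusive upper bounds of p0..p10 from the CREATE TABLE statements;
# years >= 2010 go to p11 (VALUES LESS THAN MAXVALUE).
BOUNDS = [1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2010]


def partition_for_year(year: int) -> str:
    """Name of the range partition that holds rows of the given year."""
    return f"p{bisect_right(BOUNDS, year)}"


def days_per_partition(max_offset: int = 3652) -> Counter:
    """Count how many possible day offsets end up in each partition."""
    counts: Counter = Counter()
    for offset in range(max_offset):
        day = START + timedelta(days=offset)
        counts[partition_for_year(day.year)] += 1
    return counts


if __name__ == "__main__":
    counts = days_per_partition()
    for name in sorted(counts, key=lambda n: int(n[1:])):
        print(name, counts[name])
```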

Test query and the results

I used the same query to retrieve data from all of the tables:

select count(*) from TABLE_NAME 
where c3 > date '1995-01-01' and c3 < date '1995-12-31';

and these were the results (mean values of several executions):

Table                  Mean exec time
`myi_no_part`          ~6.4s
`myi_no_part_index`    ~1.2s
`myi_part`             ~0.7s
`myi_part_index`       ~1.3s
`ar_no_part`           ~10.2s
`ar_part`              ~1.1s

These results actually surprised me, for various reasons:

  • I would not have thought that clever partitioning would beat an index on the queried column while at the same time saving the disk space for the index (roughly 1/3 of the total data size in this test case).
  • The values for `myi_no_part` were actually better than expected. I would have thought they would be much worse, especially compared with the values reported by the author of the original article.
  • The Archive engine adds nothing to the mix but disadvantages. Maybe my test case is flawed because I "only" tested with 8 million rows, but a partitioned Archive table clearly takes more than 50% longer than a partitioned MyISAM table (1.1s vs. 0.7s). So using the Archive engine gives you no advantages, only disadvantages, such as not being able to delete records or add additional indexes.
  • Apparently, partitioning and indexing the column in question is slightly slower rather than faster. However, if the query only covers a subset of a partition (like restricting to where c3 > date '1995-06-01' and c3 < date '1995-08-31'), the index does pay off: ~0.3s with index vs. ~0.7s without.
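One plausible explanation for `myi_part`’s speed is partition pruning: when a table is partitioned by `year()` of a date column, MySQL 5.1 can skip partitions that cannot satisfy the WHERE clause. The following is a simplified Python model of that decision, not MySQL’s optimizer. It suggests the test query only needs partition p1, roughly one tenth of the rows, since the data spans ten yearly partitions. Running `EXPLAIN PARTITIONS` on the real query would confirm which partitions are actually read.

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds of p0..p10 as in the CREATE TABLE statements above;
# p11 (VALUES LESS THAN MAXVALUE) holds everything from 2010 onwards.
BOUNDS = [1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2010]


def partition_index(year: int) -> int:
    """Index of the partition holding rows whose year(c3) equals year."""
    return bisect_right(BOUNDS, year)


def pruned_partitions(low: date, high: date) -> list[str]:
    """Partitions that can contain rows with low <= c3 <= high."""
    first, last = partition_index(low.year), partition_index(high.year)
    return [f"p{i}" for i in range(first, last + 1)]


if __name__ == "__main__":
    # c3 > '1995-01-01' and c3 < '1995-12-31' means 1995-01-02 .. 1995-12-30
    print(pruned_partitions(date(1995, 1, 2), date(1995, 12, 30)))  # ['p1']
    # the narrower query from the last bullet hits the same single partition
    print(pruned_partitions(date(1995, 6, 2), date(1995, 8, 30)))  # ['p1']
```

In this model, the full-year query reads all of p1 anyway, so an index has nothing to skip. The narrower query still touches only p1 but needs just a fraction of it, which would be consistent with the index paying off only in that case.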

Conclusion

MySQL's partitioning is a great new feature in 5.1 and should be used as a complement to careful, well-chosen indexing.

New local pre-commit hook in monotone

Until now there was only one hook in monotone that could be “reused” to interact with the commit process and validate the changeset to be committed: the `validate_commit_message` hook. But this was a bit clumsy, as it was actually designed to validate the commit message (as the name suggests) and not the changeset, so the hook was called _after_ the commit message had been entered in the editor (or given with `--message` or `--message-file`).

Now monotone (from 0.99 onwards) has gained a new commit hook which is called before the commit message is processed, but after the changeset and the target branch have been validated. It is simply named `validate_changes` and takes two parameters: first, the full text of the revision to be committed (which the hook can parse with `parse_basic_io`), and second, the name of the branch it should be committed to. Just like `validate_commit_message`, it is expected to return two values: a boolean denoting whether the change is valid, and an optional string explaining why not, which is then displayed to the committer.

With this new hook it should feel natural to, for example, write a pre-commit hook which ensures that none of the patched or added source files contains Windows line endings:

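function validate_changes(revdata, branchname)
  -- revdata is the revision text in basic_io format; split it into stanzas
  local parsed = parse_basic_io(revdata)
  for _, stanza in ipairs(parsed) do
    -- only look at files that are added or patched by this revision
    if stanza.name == "add_file" or
       stanza.name == "patch" then
      local file = stanza.values[1]
      -- skip files that monotone guesses to be binary
      if not guess_binary_file_contents(file) then
        -- binary mode, so "\r\n" is not translated on Windows
        local fp = assert(io.open(file, "rb"))
        local contents = fp:read("*all")
        fp:close()
        -- plain substring search (4th argument), no Lua pattern matching
        if string.find(contents, "\r\n", 1, true) then
          return false, "CRLF line endings detected in " .. file
        end
      end
    end
  end
  return true, ""
end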

Unfortunately it’s not yet possible to call `mtn_automate`, the Lua interface to monotone’s automation commands, from hooks like this. Otherwise we could have saved the `read("*all")` call and would only have had to scan the output of `automate content_diff`, which should be a little faster than a full string search in Lua for bigger files. We, i.e. the monotone devs, are aware of the problem though and will come up with a solution sooner or later.

I hope this new hook will still be useful for some of you until then.

Mailing list roundup

I’ve just set up a new mailing list specifically for monotone users who find the (sometimes endless) developer discussions too boring or are annoyed by ticket spam. You can find the new list’s interface here.

The plan is to provide basic first-level support on this list and to cross-post developer-relevant parts to the old monotone-devel list. While I’m already subscribed to the new list, I encourage a couple of other developers to subscribe there as well, in case I’m not available.

I also registered monotone-users and the pre-existing list for the Debian packaging team with Gmane, but it will take a bit more time until they are set up over there, so please be patient.

Better late than never

ISO has finally revised its voting directives for open standards after the OOXML debacle of 2007/2008. One of the changes is that national bodies should no longer vote “Yes, with comments” if they encounter serious flaws and rely on the ballot resolution meeting to get their issues solved (which evidently did not happen for OOXML), but should vote “No, with comments” instead.

Furthermore, if the “standard” receives more than 25% disapproval, it should now officially “be over” as well. Had these rules been applied in the past, OOXML would not be an ISO-certified standard, as it unfortunately is today.

There are also smaller, less substantial changes. For example, the dedication to Jan van den Beld, the former head of Ecma, for his “unwavering dedication to the development and evolution of the JTC 1 procedures”, has been removed. Ironically, both Ecma and Microsoft have indeed made long-term contributions to the evolution of Fast Track in JTC1, but probably not the way they intended.

(Source)

The only thing I’m still missing in the new rules is obviously a way to revoke a broken standard, but I guess this won’t happen. Let’s just hope that OOXML sinks into insignificance over the next couple of years.

guitone and monotone 0.48

The current (fourth) release candidate of guitone doesn’t work out of the box with monotone 0.48. The reason is that the minor interface version changed slightly, and my version check is too strict in this regard. But there is an option to the rescue: simply check “relaxed version check” in the preferences, and guitone will happily work with monotone 0.48 and later versions, unless a major change breaks something there.
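The difference between the two checks can be sketched as follows. This is a hypothetical illustration, not guitone’s actual code: it assumes interface versions of the form `major.minor` (as reported by `mtn automate interface_version`), and it assumes a relaxed check accepts any minor version at or above the expected one within the same major version. The version numbers in the usage lines are made up.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Parse an interface version like '12.1' into (12, 1)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def strict_check(actual: str, expected: str) -> bool:
    """Accept only the exact interface version the client was built for."""
    return parse_version(actual) == parse_version(expected)


def relaxed_check(actual: str, expected: str) -> bool:
    """Accept newer minor versions, but never a different major version."""
    a_major, a_minor = parse_version(actual)
    e_major, e_minor = parse_version(expected)
    return a_major == e_major and a_minor >= e_minor


if __name__ == "__main__":
    print(strict_check("12.1", "12.0"))   # False: a minor bump fails
    print(relaxed_check("12.1", "12.0"))  # True: a minor bump is tolerated
    print(relaxed_check("13.0", "12.0"))  # False: a major bump still fails
```

Under these assumptions, a backwards-compatible minor bump passes the relaxed check while a major bump is still rejected, which fits the “unless a major change breaks something” caveat.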

The final version of guitone will probably take a little longer, since I want to synchronize its release with the release of monotone 0.99 / 1.0, so stay tuned. Other development continues in the meantime: I’m currently working on support for querying remote databases from guitone, which will likely make it into guitone 1.1.

Land of Standstill

The sparrows are tweeting, er, whistling it from the rooftops: Wulff is apparently through. I find that very nice, since the last figure in Berlin with some rough edges (Köhler) has thus been replaced by yet another loyal CDU/CSU vassal. What do I personally expect from Mr. Wulff? Well, certainly not much more than the annual New Year’s address…

[Update: Wulff was still 32 votes short after all; Gauck got 599. Looks like the tweets were wrong after all. If Die Linke now voted for Gauck en bloc… yes, that would be something.]