Search and replace multiple lines across many files

sed is usually my favourite tool to search and replace things from the command line, but sometimes Perl’s regexes are far more convenient to use. Recently I found another reason why Perl’s -pi -e is superior to plain sed: when you want to change multiple lines in a document!

Imagine you have hundreds of source code files where somebody once had the great idea to add a ___version___ property to each class:

public class Foo
{
    public static final String ___version___ = "$Version:$";
    
    // other stuff
}

With Perl the line in question is easy to remove:

$ for file in $(find . -name "*.java"); do \
   perl -pi -e "s/\s*public.+___version___.+\n//g" "$file"; done

But there is one problem: by default Perl reads and processes the file one line at a time, so the regex never sees the blank line that follows the removed field, and an unwanted empty line remains:

public class Foo
{
    
    // other stuff
}

Then I stumbled upon this article, and the solution is to set a special input record separator (the -0 switch) so that Perl slurps in the whole file at once:

$ for file in $(find . -name "*.java"); do \
   perl -0777 -pi -e \
     "s/\s*public.+___version___.+\n(\s*\n)*/\n/g" "$file"; done

and voilà, we get what we want:

public class Foo
{
    // other stuff
}
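To see the difference between the two commands outside of Perl, here is a small Python sketch that applies the same two regular expressions once per line (like `perl -p` without `-0`) and once to the whole text (like `-0777`). It only emulates the two reading modes and is not what Perl does internally. The sample class is an assumption and declares the field as `public`, since that is what both patterns look for.

```python
import re

# Assumed sample input, mirroring the Java class above
SOURCE = (
    "public class Foo\n"
    "{\n"
    '    public static final String ___version___ = "$Version:$";\n'
    "    \n"
    "    // other stuff\n"
    "}\n"
)

# The same patterns as in the two Perl one-liners
LINE_PATTERN = r"\s*public.+___version___.+\n"
SLURP_PATTERN = r"\s*public.+___version___.+\n(\s*\n)*"


def line_mode(text: str) -> str:
    """Substitute on every line separately, like `perl -pe` does by default."""
    # keepends=True keeps each "\n", so the pattern can consume it
    lines = text.splitlines(keepends=True)
    return "".join(re.sub(LINE_PATTERN, "", line) for line in lines)


def slurp_mode(text: str) -> str:
    """Substitute on the whole text at once, like `perl -0777 -pe`."""
    return re.sub(SLURP_PATTERN, "\n", text)


if __name__ == "__main__":
    print(repr(line_mode(SOURCE)))   # the blank "    \n" line is still there
    print(repr(slurp_mode(SOURCE)))  # field and blank line are both gone
```

In line mode the pattern can never reach past the end of the current line, which is exactly why the following blank line survives.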

Digging a little deeper into what -0777 actually means leads us to perlrun(1):

The special value 00 will cause Perl to slurp files in paragraph mode. The value 0777 will cause Perl to slurp files whole because there is no legal byte with that value.
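Put differently, the value given to `-0` only decides what one input record is, and the substitution then runs once per record. Here is a rough Python model of the three cases. The `records` helper is hypothetical, not a Perl API. It also simplifies paragraph mode: the blank-line separators are dropped instead of being kept the way Perl keeps them.

```python
import re
from typing import List


def records(text: str, mode: str) -> List[str]:
    """Rough model of how `perl -p` splits its input for different -0 values."""
    if mode == "line":
        # default: the separator is "\n", every line (with its newline) is a record
        return text.splitlines(keepends=True)
    if mode == "paragraph":
        # -00: runs of two or more newlines (empty lines) separate records;
        # simplified here by dropping the separators entirely
        return [p for p in re.split(r"\n{2,}", text) if p.strip()]
    if mode == "slurp":
        # -0777: no byte has that value, so the whole input is a single record
        return [text]
    raise ValueError(f"unknown mode: {mode}")
```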

Another day saved – thanks to Perl!

And while we’re at it, have a look at Rakudo Star, the best Perl 6 compiler, which was released just recently. Perl 6 is in my humble opinion one of the best-designed languages I’ve come across so far, so if you find some time, go over and read the last Christmas special; it’s really worth it!

On monotone selectors

This is the first post in a small series which will show off some of the new functionality you can expect in the next major version of monotone. While no release date has been fixed yet, we plan to release it this fall. If you look at the roadmap, you’ll see that most things have already been implemented and merged into mainline, so we’re definitely on schedule 🙂

Anyway, let’s begin this little series with the selector rewrite Tim merged a couple of weeks ago. Selectors are one of the main concepts in monotone for picking revisions by something other than their 40-character hash ID, and are therefore very useful to “navigate” between different development lines.

Monotone up to 0.48 already knows many selectors: you can select revisions by tag, by branch, by author, by custom cert values and so on. Selectors can be combined to calculate the intersection of two sets, like “show me all revisions from author ‘Jon’ on branch ‘my.project’”, which would essentially look like this:

$ mtn automate select "a:jon/b:my.project"

The syntax for these selectors is nice and simple: each selector is prefixed with a unique character, and multiple selectors are concatenated with a single slash. While these old-style selectors solved many use cases, some remained unresolved, and users coming from other DVCSs like Darcs had a rather hard time figuring out how to accomplish a certain selection in monotone.
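To make the intersection semantics concrete, here is a toy Python model of old-style selectors. Everything in it is hypothetical. The revision IDs are short placeholders instead of real hashes, and only the `a:` and `b:` prefixes are handled. Cert values are also compared exactly, which simplifies what monotone does.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: revision id -> certs
REVISIONS: Dict[str, Dict[str, str]] = {
    "r1": {"author": "jon", "branch": "my.project"},
    "r2": {"author": "ann", "branch": "my.project"},
    "r3": {"author": "jon", "branch": "other.project"},
}

# Subset of the one-letter prefixes used by old-style selectors
PREFIXES = {"a": "author", "b": "branch"}


def select_one(term: str) -> Set[str]:
    """Evaluate a single 'prefix:value' term to a set of revisions."""
    prefix, _, value = term.partition(":")
    cert = PREFIXES[prefix]
    return {rev for rev, certs in REVISIONS.items() if certs.get(cert) == value}


def select(selector: str) -> List[str]:
    """Intersect the sets selected by each '/'-separated term."""
    result: Optional[Set[str]] = None
    for term in selector.split("/"):
        matches = select_one(term)
        result = matches if result is None else result & matches
    return sorted(result or set())
```

With this data, `select("a:jon/b:my.project")` keeps only `r1`, the one revision that is in both sets.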

A particularly good example is “how can I easily view the changes of a development branch since the last merge point?”. Up until now you either had to figure out the revision of the merge point manually by looking at the output of log, or use some scary construct like the following:

$ mtn au common_ancestors $(mtn au select h:main.branch) \
    $(mtn au select h:) | mtn au erase_ancestors -@-

Enter selector functions

Luckily, starting from 0.99 you don’t have to write these things anymore. Give the new selector functions a warm round of applause!

$ mtn au select "lca(h:main.branch;h:feature.branch)"

In this example “lca” stands for the “least common ancestors” function, which takes two arguments, i.e. two other selectors. The syntax is even shorter in a workspace, where an empty head selector h: defaults to the branch recorded in the workspace options, so if you’re in the feature.branch workspace, just type:

$ mtn au select "lca(h:main.branch;h:)"

Quite convenient, eh? This is not only shorter, but also up to five times faster than the complex command line above. Of course the selector can be used directly in a call to diff or log, like so:

$ mtn diff -r "lca(h:main.branch;h:)"
$ mtn log --to "children(lca(h:main.branch;h:))"

But huh, what’s that nested children call, you ask? Well, the lca function picks the merge point in the _main branch_, and if the revision graph goes around that, log would otherwise happily log more parents (earlier revisions) on the feature branch. The call to children ensures that we pick the merge revision in the feature branch and therefore really stop logging at this revision.
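The graph logic behind `lca` and `children` can be illustrated with a toy revision graph in Python. This is only a sketch under my own assumptions, not monotone’s implementation: the revision names are made up, and a revision counts as its own ancestor. It mirrors the common_ancestors / erase_ancestors pipeline from above. In this graph feature.branch forked off main at m2, main was merged into the feature branch at f3, and both branches continued afterwards.

```python
from typing import Dict, List, Set

# Hypothetical graph: revision -> parents (m* on main, f* on the feature branch)
PARENTS: Dict[str, List[str]] = {
    "m1": [], "m2": ["m1"], "m3": ["m2"], "m4": ["m3"],
    "f1": ["m2"], "f2": ["f1"], "f3": ["f2", "m3"], "f4": ["f3"],
}


def ancestors(rev: str) -> Set[str]:
    """All ancestors of rev; in this sketch a revision is its own ancestor."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def erase_ancestors(revs: Set[str]) -> Set[str]:
    """Drop every revision that is a proper ancestor of another one in the set."""
    return {r for r in revs
            if not any(r != other and r in ancestors(other) for other in revs)}


def lca(a: str, b: str) -> Set[str]:
    """Least common ancestors: common ancestors minus their own ancestors."""
    return erase_ancestors(ancestors(a) & ancestors(b))


def children(revs: Set[str]) -> Set[str]:
    """All revisions that have at least one parent in revs."""
    return {child for child, parents in PARENTS.items()
            if any(p in revs for p in parents)}
```

With the heads m4 and f4, `lca` returns the main-branch revision m3. `children` of it yields both m4 and the merge revision f3 on the feature branch, which is the revision log should stop at.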

Test drive

There are many more of these selector functions, and explaining them all in detail is out of scope here; please have a look at “composite selectors” in the nightly-built manual.
And if you want to take an early look at this and play around without having to compile it yourself, at least if you’re on openSUSE or Fedora, just download the binaries from our nightly builds.

New local pre-commit hook in monotone

Until now there was only one hook in monotone which could be “reused” to interact with the commit process and validate the changeset to be committed: the `validate_commit_message` hook. But this was a bit clumsy, as it was actually designed to validate the commit message (as the name suggests) and not the changeset, so the hook was only called _after_ the commit message had been entered in the editor (or given with `--message` or `--message-file`).

Now monotone (from 0.99 onwards) has gained a new commit hook which is called before the commit message processing takes place, but after the logic has validated the changeset and the branch it should be committed to. It’s simply named `validate_changes` and takes two parameters: the revision to be committed as full text (parsable in the hook via `parse_basic_io`) and the name of the branch to which the revision should be committed. Just like `validate_commit_message`, it is expected to return two values: a boolean which denotes whether the change is valid, and an optional string which explains the reason if it is not and which is then displayed to the committer.
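If you haven’t seen basic_io before: the revision text is a sequence of lines, each made of a symbol followed by quoted strings or bracketed hex IDs. The following Python sketch is a simplified, line-based stand-in for `parse_basic_io`, and `parse_basic_io_lines` is a hypothetical name. It does not handle strings spanning several lines, and the sample revision with its shortened IDs is made up. It produces (name, values) pairs comparable to the `stanza.name` / `stanza.values` fields the Lua hook relies on, then picks out the added and patched files.

```python
import re
from typing import List, Tuple

# A value is either a quoted string (with \" and \\ escapes) or a [hex id]
VALUE = re.compile(r'"((?:[^"\\]|\\.)*)"|\[([0-9a-f]*)\]')

# Made-up revision text with shortened ids, just for illustration
SAMPLE_REVISION = """format_version "1"

new_manifest [0a1b]

old_revision [2c3d]

patch "src/main.c"
 from [4e5f]
   to [6a7b]

add_file "src/util.c"
 content [8c9d]
"""


def parse_basic_io_lines(text: str) -> List[Tuple[str, List[str]]]:
    """Turn each non-empty line into a (name, values) pair."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, rest = line.partition(" ")
        values = []
        for match in VALUE.finditer(rest):
            if match.group(1) is not None:
                # undo the backslash escaping inside quoted strings
                values.append(re.sub(r"\\(.)", r"\1", match.group(1)))
            else:
                values.append(match.group(2))
        items.append((name, values))
    return items


def changed_files(items: List[Tuple[str, List[str]]]) -> List[str]:
    """Files a validate_changes hook like the one below would inspect."""
    return [values[0] for name, values in items if name in ("add_file", "patch")]
```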

With this new addition it should feel natural, e.g., to create a pre-commit hook which ensures that none of the patched or added sources contains Windows line endings:

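function validate_changes(revdata, branchname)
  -- revdata is the full revision text in basic_io format
  local parsed = parse_basic_io(revdata)
  for _,stanza in ipairs(parsed) do
    -- only look at files which are added or modified by this commit
    if stanza.name == "add_file" or
       stanza.name == "patch" then
      local file = stanza.values[1]
      if not guess_binary_file_contents(file) then
        -- binary mode, otherwise CRLF may be translated to LF on Windows
        local fp = assert(io.open(file, "rb"))
        local contents = fp:read("*all")
        fp:close()
        -- plain search (4th argument true), no Lua pattern needed
        if string.find(contents, "\r\n", 1, true) then
          return false, "CRLF detected"
        end
      end
    end
  end
  -- the change is valid, so no reason is needed
  return true, ""
end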

Unfortunately it’s not yet possible to call `mtn_automate`, the Lua interface to monotone’s automation commands, from hooks like this. Otherwise we could have saved the `read("*all")` call and would only have to scan the output of `automate content_diff`, which should be a little faster than doing a full string search in Lua for bigger files. We, i.e. the monotone devs, are aware of the problem though and will come up with a solution sooner or later.

I hope this new hook will still be useful for some of you until then.

Mailing list roundup

I’ve just set up a new mailing list specifically for monotone users who find the (sometimes endless) developer discussions too boring or are annoyed by ticket spam. You can find the new list’s interface here.

The plan is to do basic first-level support on this list and to move developer-relevant parts over to the old monotone-devel list via cross-posting. While I’m already subscribed to the new list, I encourage a couple of other developers to subscribe there as well, in case I’m not available.

I also registered monotone-users and the pre-existing list of the Debian packaging team on Gmane, but it will take a bit more time until they are set up over there, so please be patient.

guitone and monotone 0.48

The current fourth release candidate of guitone doesn’t work out of the box with monotone 0.48. The reason is that the minor interface version changed slightly and my version check is too strict in this regard. But there is an option to the rescue: simply check “relaxed version check” in the preferences and guitone will happily work with monotone 0.48 and later versions, unless a major change breaks something there.
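The difference between a strict and a relaxed check is easy to show in code. This Python sketch is hypothetical and not guitone’s actual implementation. It assumes interface versions of the form major.minor and uses made-up version numbers.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split an interface version like '12.1' into (major, minor)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def compatible(required: str, found: str, relaxed: bool) -> bool:
    req_major, req_minor = parse_version(required)
    got_major, got_minor = parse_version(found)
    if relaxed:
        # same major version and at least the required minor version
        return got_major == req_major and got_minor >= req_minor
    # strict: the interface version has to match exactly
    return (got_major, got_minor) == (req_major, req_minor)
```

A minor bump only adds to the interface, so accepting it is usually safe. A major bump can still break things, which is why the relaxed check rejects it.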

The final version of guitone will probably take a little longer, since I want to synchronize its release with the release of monotone 0.99 / 1.0, so stay tuned. Other development continues in the meantime: I’m currently working on adding support for querying remote databases from guitone, which will likely make it into guitone 1.1.

Sender verification failed – or How you should treat your customers correctly

For a couple of years now, one of the easiest, yet very powerful anti-spam techniques has been sender verification. Spam is often sent from bogus email addresses which contain random strings and are therefore also hard to blacklist. In this process the receiving mail server checks the sender address of every mail (the envelope sender and, depending on the configuration, also the `From:` header) and asks the mail server of the sender’s domain whether it actually knows that user. If not, the receiving mail server will most likely reject the message immediately with a “550: sender verify failed”. To avoid putting a high load on the sending server, the result is cached on the receiving one, so if you receive 20 mails from bob@foo.com, its mail server is probably only asked once (or not at all if it has been asked before).
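Here is a minimal Python sketch of the caching behaviour. `probe` is a placeholder for the real callout to the sender’s mail server. The class name is made up, and unlike a real MTA cache this one never expires.

```python
from typing import Callable, Dict


class SenderVerifier:
    """Caches the answer of the sender's mail server per address."""

    def __init__(self, probe: Callable[[str], bool]):
        # probe stands in for asking the sender's mail server about the address
        self.probe = probe
        self.cache: Dict[str, bool] = {}
        self.probes = 0

    def verify(self, sender: str) -> str:
        if sender not in self.cache:
            # only unknown addresses cause a request to the remote server
            self.probes += 1
            self.cache[sender] = self.probe(sender)
        return "accept" if self.cache[sender] else "550 sender verify failed"
```

Twenty mails from the same known address cause a single probe. A random bogus address is rejected after its first lookup.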

My exim instance has sender verification enabled by default and I like it, because ~90% of the spam doesn’t even need to be scanned by SpamAssassin, which in turn means lower server load. However, sender verification sometimes causes problems, especially when automatically crafted emails from, let’s call them “legacy”, systems should reach me. You can of course replace “legacy” with “simple PHP mail script” or “shop frontend” if you like, as administrators or developers of these systems are apparently completely unaware of the bad job they do when they fulfil the requirement “hey, the user should get this notification email, but ensure that he won’t spam the support with questions about it, so use an invalid address…”

You know what follows: the novice (or sometimes not so novice) developer or administrator simply goes ahead and sets `noreply@host.com` as the `From:` address. Especially in shared hosting environments there is usually an MX server configured for the hosted domain which allows local relaying, so sending a mail from a PHP script like this

mail("joe@otherhost.com", "Hello", "It works!",
     "From: noreply@host.com\r\n");

seems so simple. Of course, most of the time nobody remembers to tell the mail server of the `host.com` domain that there is suddenly a new (bogus) mail user in one of the domains it manages! So how do you fulfil the “don’t spam the support” requirement then?

Well, the simplest way is to use an existing mail address which is known to the sending mail server and to add a `Reply-To:` header to your mail which can then contain the bogus address. If the user clicks on “reply” in his mail client, this reply-to address will pop up in the `To:` field and you practically achieve the same effect.
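In Python, for example, the standard library’s `email.message.EmailMessage` makes this split explicit. The sketch assumes `shop@host.com` is a hypothetical address that really exists on the sending server, while the bogus address only appears in `Reply-To:`.

```python
from email.message import EmailMessage


def build_notification(to_addr: str) -> EmailMessage:
    msg = EmailMessage()
    # hypothetical address the sending mail server really knows,
    # so sender verification on the receiving side succeeds
    msg["From"] = "shop@host.com"
    # replies from the customer's mail client end up here instead
    msg["Reply-To"] = "noreply@host.com"
    msg["To"] = to_addr
    msg["Subject"] = "Hello"
    msg.set_content("It works!")
    return msg
```

Handing the message to a local MTA would then be `smtplib.SMTP("localhost").send_message(msg)`. The envelope sender the MTA ends up using should be a real address as well, since that is what sender verification usually looks at first.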

But probably the best way is of course to convince your management that it should not ignore customer inquiries by stupid procedures like this…

As a customer of several online services I have encountered this and similar mail problems a lot in the past. I cannot remember exactly when I stopped informing the individual webmaster or support team about the problems they had with their mail setup, simply because my inquiries had been ignored most of the time. See this blog post as a silent rant against all the crappily configured setups out there.

monotone 0.48 released

We, the monotone developers, are very proud to announce the new 0.48 release of our distributed version control system.

This release comes with dozens of bug fixes – a fall-out of joint efforts during a bug hunt fest earlier this year – and some interesting new features, such as an improved changelog editing view and new database management features.

As always, please check the NEWS file for a detailed list of changes and improvements. Binaries will be posted as they come in and will be available from the Downloads page.

For the next version of monotone expect further stabilization work and UI improvements as well as completed localizations. We plan to make another minor release and are approaching 1.0… finally!

monotone translators needed

So you can’t code C++, but still want to help out our little version control system? Fine, then maybe you’re fluent or even native in a foreign language; if so, our translation team could really use your help!

Right now monotone ships with five active translations: Swedish (maintained by Richard Levitte), Italian (maintained by Lapo Luchini), Spanish (maintained by Nicolas Ruiz), Portuguese (maintained by Américo Monteiro) and German (maintained by myself, Thomas Keller). The first three maintainers in particular are currently a bit behind and short on time, so if you are able to help out, just drop me a note or send a message to monotone-i18n@nongnu.org.

We also have two more “inactive” translations, French and Japanese, lurking around in our source tree which you could pick up and complete, but it’s a bit more work to finish these.

Besides that, you can also start a completely new translation and I’d be happy to assist you with everything you need for that. Again, just drop me a note or send a message to the group, we’ll quickly set you up!

Makefile-based InnoSetup automation with QMake

Over the last couple of weeks I made several major improvements to the QMake-based build setup guitone uses: the project file now comes with one target to create a tarball, one to create a Mac OS X disk image containing all the needed Qt libraries and one to install the application, which can be configured with all the options you know from autotools-based projects (like PREFIX, BINDIR or DESTDIR, to name a few).

But there was still one task missing: automatically creating a Win32 installer. So far, the steps to produce one had been:

  1. enter the to-be-packaged version in the InnoSetup script file
  2. convert the supplied text files from Unix to DOS line endings, while giving them a .txt extension
  3. call the InnoSetup compiler on the script file and create the executable
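Steps 1 and 2 on their own are simple text and byte juggling. Here is a minimal Python sketch of what they have to do. The function names are made up for illustration, and it assumes the script template marks the version with a placeholder like the `@@VERSION@@` used in the QMake target further down.

```python
def fill_version(template: str, version: str) -> str:
    """Step 1: put the version into the InnoSetup script template."""
    return template.replace("@@VERSION@@", version)


def to_dos(data: bytes) -> bytes:
    """Step 2: convert Unix (LF) line endings to DOS (CRLF) line endings."""
    # normalise first, so lines already ending in CRLF do not become CR CR LF
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def txt_name(name: str) -> str:
    """Step 2: README -> README.txt"""
    return name + ".txt"
```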

The first and second steps in particular looked hard to automate, given that Windows does not come with a rich set of tools to process text streams, and requiring a Cygwin installation just for using sed seemed awkward to me. Obviously other people had similar problems before, and somebody proposed emulating sed with a VBScript executed by the Windows Scripting Host (WSH). Wow, cool thing, if only I remembered my rusty Visual Basic. But didn’t Microsoft have this JavaScript look-alike, JScript? Shouldn’t that be executable as well?

Apparently it was, and I sat down to hack an improved JScript sed version:

var patterns = new Array();
var replacements = new Array();
var argcount = 0;

for (var i=0; i …

… cscript and to combine everything for a proper QMake target. Here we go:

DOCFILES="NEWS README README.driver COPYING"
...
win32 {
    isEmpty(QTDIR):QTDIR           = "c:\Qt\4.6.2"
    isEmpty(MINGWDIR):MINGWDIR     = "c:\MinGW"
    isEmpty(OPENSSLDIR):OPENSSLDIR = "c:\OpenSSL"
    isEmpty(ISCC):ISCC = "c:\Program Files\Inno Setup 5\ISCC.exe"
    
    win32setup.depends  = make_first
    win32setup.target   = win32setup
    win32setup.commands = \
        cscript //NoLogo res\win32\sed.js \
            s/@@VERSION@@/$${VERSION}/ \
            s/@@QTDIR@@/$${QTDIR}/ \
            s/@@MINGWDIR@@/$${MINGWDIR}/ \
            s/@@OPENSSLDIR@@/$${OPENSSLDIR}/ \
            < res\win32\guitone.iss.in > res\win32\guitone.iss && \
        ( for %%f in ($$DOCFILES) do \
            cscript //NoLogo res\win32\sed.js \
                s/\n\$$/\r\n/ \
                < %%f > %%f.txt ) && \
        \"$$ISCC\" res\win32\guitone.iss && \
        ( for %%f IN ($$DOCFILES) do del %%f.txt )
    
    QMAKE_EXTRA_TARGETS += win32setup
}
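Here is the same idea as a Python sketch, which is not the author’s JScript: apply several `s/pattern/replacement/` expressions in order to each input line. It only understands the `g` flag and cannot handle a `/` inside the pattern or replacement. It also inserts the replacement literally, with no `&` or `\1`, which happens to be what you want for Windows paths like the ones in the QMake block.

```python
import re
from typing import List, Tuple


def parse_expr(expr: str) -> Tuple[re.Pattern, str, int]:
    """Split a sed-style 's/pattern/replacement/flags' expression."""
    # naive split: neither pattern nor replacement may contain '/'
    _, pattern, replacement, flags = expr.split("/")
    count = 0 if "g" in flags else 1  # for re.sub, count=0 means "all"
    return re.compile(pattern), replacement, count


def sed(lines: List[str], exprs: List[str]) -> List[str]:
    """Apply every expression, in order, to every line."""
    rules = [parse_expr(e) for e in exprs]
    out = []
    for line in lines:
        for pattern, replacement, count in rules:
            # a function as replacement inserts the text literally, so a
            # path like c:\Qt\4.6.2 is not read as a group reference
            line = pattern.sub(lambda _m: replacement, line, count=count)
        out.append(line)
    return out
```

Wired to stdin and stdout, a script would call `sys.stdout.writelines(sed(sys.stdin.readlines(), sys.argv[1:]))`.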

So if you know enough JavaScript, you can probably emulate whatever tool you’re missing on Win32 without any external dependencies. Very cool!

mtn-browse 0.70 and accompanying Perl monotone library released

Tony Cooper writes on monotone-devel:

I would like to announce the 0.70 release of mtn-browse:

Monotone browser (mtn-browse) is an application for browsing Monotone VCS databases without the need for a workspace. The interface allows one to:

  • Easily select a revision from within a branch
  • Find a revision using complex queries
  • Navigate the contents of a revision using a built-in file manager
  • Display file contents, either using the internal viewer or an external helper application
  • Compare the changes between different revisions or versions of a file either using the internal difference viewer or an external application
  • Find files within a revision based on detailed search criteria
  • Display file annotations and easily refer back to the corresponding change documentation
  • Save files to disk
  • Browse remote databases via the netsync protocol
  • Support for Monotone version 0.35 up to 0.47
  • Extensive built-in help
  • In English with additional German locale

This version brings many bug fixes and locale support improvements along with support for the newer versions of Monotone. The source can be downloaded from here.

Monotone::AutomateStdio is an object oriented Perl library module that allows Perl applications to easily interface with Monotone’s automate stdio interface. This library supports Monotone versions 0.35 up to and including 0.47. All of the automate stdio functions are available via this library. The source and documentation can be downloaded from here.

Both projects currently support Linux and Mac OS X, but should also work on Solaris and other Unixes. They are considered stable (well, at least by me), so let me know if you run into problems.
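For the curious: the automate stdio interface such a library talks to takes commands encoded as length-prefixed strings, e.g. `l6:leavese` for `automate leaves`. Below is a minimal Python encoder. It assumes the input format described in the monotone manual: an optional `o…e` block of key/value options followed by an `l…e` block holding the command and its arguments. The function names are made up, and you should check the manual of your monotone version before relying on it.

```python
from typing import Dict, List, Optional


def encode_string(s: str) -> bytes:
    """Length-prefixed string: <byte length>:<bytes>."""
    data = s.encode("utf-8")
    return str(len(data)).encode("ascii") + b":" + data


def encode_command(words: List[str],
                   options: Optional[Dict[str, str]] = None) -> bytes:
    """Encode one command (plus optional options) for automate stdio."""
    out = b""
    if options:
        # option block: alternating keys and values between 'o' and 'e'
        out += b"o" + b"".join(encode_string(k) + encode_string(v)
                               for k, v in options.items()) + b"e"
    # command block: the command name followed by its arguments
    out += b"l" + b"".join(encode_string(w) for w in words) + b"e"
    return out
```

Lengths are counted in bytes rather than characters, which matters as soon as a file name contains non-ASCII characters.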

Keep up the good work, Tony!