如果不需要对邮件正文的快速搜索,mu提供的主题、发件人、收件人、日期、标记等搜索功能应该足够了。

基于xapian的mu虽然用UTF8编码索引了邮件正文,但是由于分词问题,cjk正文搜索的功能几乎不能用。

同样基于xapian的recoll却实现了,虽然自带的QT界面没什么吸引力,但是命令行界面就不一样拉。

1$ sudo apt-get install recoll
2$ recoll
3$ cat > ~/.recoll/recoll.conf <<-EOF
# 希望被索引的邮箱目录
topdirs = /your/maildir/
# 保存索引结果的目录
dbdir = /your/folder
EOF
4$ recollindex
5$ recoll -t -q "中文任意关键字"

先安装 recoll<1>,感受一下图形界面<2>,确认配置(图形界面里面也可以配置)<3>,创建索引(图形界面也可以创建)<4>,搜索关键字(图形界面也可以)<5>。

接下来就是通过 recoll 的命令行输出结果,把搜索到的目标邮件的绝对路径软链接到一个mutt的虚拟邮箱(假设虚拟邮箱名是 bingo)。

6$ export MAILDIR="/your/maildir/"
7$ recoll -t -q "中文任意关键字" | grep message/rfc822 | sed -s 's,^.*\('${MAILDIR}'[^]]*\)\].*$,"\1",' | xargs ln -sft ${MAILDIR}bingo/cur

不希望每次都从mutt切换到终端执行这么一长串命令的话,可以自己写一个bash脚本,然后再写一个mutt的宏并设定一个快捷键(假设是数字1左边的按键)。

8$ cat > ~/.mutt/mutt-grep.bash <<-EOF
#!/bin/bash

MAILDIR=/your/maildir
MFOLDER=bingo
recoll -t -q ${*} | grep message/rfc822 | sed -s 's,^.*\('$MAILDIR'[^]]*\)\].*$,\"\1\",' | xargs ln -sft $MAILDIR/$MFOLDER/cur/
EOF
9$ echo 'macro index,pager \` "<shell-escape>~/.mutt/mutt-grep.bash " "recoll find"' >> ~/.muttrc

打开mutt,按下快捷键,输入要搜索的内容,回车,切换到bingo邮箱,浏览搜索结果。

这样好像还是有点麻烦,可惜我对 bash 不熟悉,所以只针对 zsh 做了一个脚本,实现了下面这些功能

  • 集成了 mu 和 recoll
  • 自动清除上次的搜索结果
  • 自动切换到虚拟邮箱
  • 高亮显示搜索关键字
  • 支持搜索历史记录
  • 输入搜索关键字时mutt界面不退出

recoll 不止是邮件索引工具,他是一个类似 Google Desktop 的全文索引工具。squeeze的版本(1.13.04)比较旧,我打了一个squeeze-backports(1.16.2)。

  • Openoffice files need unzip and xsltproc.
  • PDF files need pdftotext which is part of the Xpdf or Poppler packages.
  • Postscript files need pstotext. The original version has an issue with shell character in file names, which is corrected in recent packages. See the the Recoll helper applications page for more detail.
  • MS Word needs antiword. It is also useful to have wvWare installed as it may be be used as a fallback for some files which antiword does not handle.
  • MS Excel and PowerPoint need catdoc.
  • MS Open XML (docx) needs xsltproc.
  • Wordperfect files need wpd2html from the libwpd (or libwpd-tools on Ubuntu) package.
  • RTF files need unrtf, which, in its standard version, has much trouble with non-western character sets. Check the Recoll helper applications page.
  • TeX files need untex or detex. Check the Recoll helper applications page for sources if it’s not packaged for your distribution.
  • dvi files need dvips.
  • djvu files need djvutxt and djvused from the DjVuLibre package.
  • Audio files: Recoll releases before 1.13 used the id3info command from the id3lib package to extract mp3 tag information, metaflac (standard flac tools) for flac files, and ogginfo (vorbis tools) for ogg files. Releases 1.14 and later use a single Python filter based on mutagen for all audio file types.
  • Pictures: Recoll uses the Exiftool Perl package to extract tag information. Most image file formats are supported. Note that there may not be much interest in indexing the technical tags (image size, aperture, etc.). This is only of interest if you store personal tags or textual descriptions inside the image files.
  • chm: files in microsoft help format need Python and the pychm module (which needs chmlib).
  • ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar module. icalendar is not needed for newer versions, which use internal code.
  • Zip archives need Python (and the standard zipfile module).
  • Rar archives need Python, the rarfile Python module and the unrar utility.
  • Midi karaoke files need Python and the Midi module
  • Konqueror webarchive format with Python (uses the Tarfile module).
  • mimehtml web archive format (support based on the mail filter, which introduces some mild weirdness, but still usable).
  • Text, HTML, mail folders, and Scribus files are processed internally.
  • Lyx is used to index Lyx files.
  • Many filters need iconv and the standard sed and awk.