aboutsummaryrefslogtreecommitdiffstats
path: root/doc/white-papers/mail/ibex.sgml
diff options
context:
space:
mode:
Diffstat (limited to 'doc/white-papers/mail/ibex.sgml')
-rw-r--r--doc/white-papers/mail/ibex.sgml158
1 files changed, 0 insertions, 158 deletions
diff --git a/doc/white-papers/mail/ibex.sgml b/doc/white-papers/mail/ibex.sgml
deleted file mode 100644
index dcb8f5ca4b..0000000000
--- a/doc/white-papers/mail/ibex.sgml
+++ /dev/null
@@ -1,158 +0,0 @@
-<!doctype article PUBLIC "-//Davenport//DTD DocBook V3.0//EN" [
-<!entity Evolution "<application>Evolution</application>">
-<!entity Camel "Camel">
-<!entity Ibex "Ibex">
-]>
-
-<article class="whitepaper" id="ibex">
-
- <artheader>
- <title>Ibex: an Indexing System</title>
-
- <authorgroup>
- <author>
- <firstname>Dan</firstname>
- <surname>Winship</surname>
- <affiliation>
- <address>
- <email>danw@helixcode.com</email>
- </address>
- </affiliation>
- </author>
- </authorgroup>
-
- <copyright>
- <year>2000</year>
- <holder>Helix Code, Inc.</holder>
- </copyright>
-
- </artheader>
-
- <sect1 id="introduction">
- <title>Introduction</title>
-
- <para>
- &Ibex; is a library for text indexing. It is being used by
- &Camel; to allow it to quickly search locally-stored messages,
- either because the user is looking for a specific piece of text,
- or because the application is contructing a vFolder or filtering
- incoming mail.
- </para>
- </sect1>
-
- <sect1 id="goals">
- <title>Design Goals and Requirements for Ibex</title>
-
- <para>
- The design of &Ibex; is based on a number of requirements.
-
- <itemizedlist>
- <listitem>
- <para>
- First, obviously, it must be fast. In particular, searching
- the index must be appreciably faster than searching through
- the messages themselves, and constructing and maintaining
- the index must not take a noticeable amount of time.
- </para>
- </listitem>
-
- <listitem>
- <para>
- The indexes must not take up too much space. Many users have
- limited filesystem quotas on the systems where they read
- their mail, and even users who read mail on private machines
- have to worry about running out of space on their disks. The
- indexes should be able to do their job without taking up so
- much space that the user decides he would be better off
- without them.
- </para>
-
- <para>
- Another aspect of this problem is that the system as a whole
- must be clever about what it does and does not index:
- accidentally indexing a "text" mail message containing
- uuencoded, BinHexed, or PGP-encrypted data will drastically
- affect the size of the index file. Either the caller or the
- indexer itself has to avoid trying to index these sorts of
- things.
- </para>
- </listitem>
-
- <listitem>
- <para>
- The indexing system must allow data to be added to the index
- incrementally, so that new messages can be added to the
- index (and deleted messages can be removed from it) without
- having to re-scan all existing messages.
- </para>
- </listitem>
-
- <listitem>
- <para>
- It must allow the calling application to explain the
- structure of the data however it wants to, rather than
- requiring that the unit of indexing be individual files.
- This way, &Camel; can index a single mbox-format file and
- treat it as multiple messages.
- </para>
- </listitem>
-
- <listitem>
- <para>
- It must support non-ASCII text, given that many people send
- and receive non-English email, and even people who only
- speak English may receive email from people whose names
- cannot be written in the US-ASCII character set.
- </para>
- </listitem>
- </itemizedlist>
-
- <para>
- While there are a number of existing indexing systems, none of
- them met all (or even most) of our requirements.
- </para>
- </sect1>
-
- <sect1 id="implementation">
- <title>The Implementation</title>
-
- <para>
- &Ibex; is still young, and many of the details of the current
- implementation are not yet finalized.
- </para>
-
- <para>
- With the current index file format, 13 megabytes of Info files
- can be indexed into a 371 kilobyte index file&mdash;a bit under
- 3% of the original size. This is reasonable, but making it
- smaller would be nice. (The file format includes some simple
- compression, but <application>gzip</application> can compress an
- index file to about half its size, so we can clearly do better.)
- </para>
-
- <para>
- The implementation has been profiled and optimized for speed to
- some degree. But, it has so far only been run on a 500MHz
- Pentium III system with very fast disks, so we have no solid
- benchmarks.
- </para>
-
- <para>
- Further optimization (of both the file format and the in-memory
- data structures) awaits seeing how the library is most easily
- used by &Evolution;: if the indexes are likely to be kept in
- memory for long periods of time, the in-memory data structures
- need to be kept small, but the reading and writing operations
- can be slow. On the other hand, if the indexes will only be
- opened when they are needed, reading and writing must be fast,
- and memory usage is less critical.
- </para>
-
- <para>
- Of course, to be useful for other applications that have
- indexing needs, the library should provide several options, so
- that each application can use the library in the way that is
- most suited for its needs.
- </para>
- </sect1>
-</article>