aboutsummaryrefslogtreecommitdiffstats
path: root/devel-docs/query/virtual-folder-in-depth.txt
diff options
context:
space:
mode:
Diffstat (limited to 'devel-docs/query/virtual-folder-in-depth.txt')
-rw-r--r--devel-docs/query/virtual-folder-in-depth.txt309
1 files changed, 0 insertions, 309 deletions
diff --git a/devel-docs/query/virtual-folder-in-depth.txt b/devel-docs/query/virtual-folder-in-depth.txt
deleted file mode 100644
index 01718a5f05..0000000000
--- a/devel-docs/query/virtual-folder-in-depth.txt
+++ /dev/null
@@ -1,309 +0,0 @@
-TITLE: An in-depth look at the virtual folder mechanism
-AUTHOR: Giao Nguyen <grail@cafebabe.org>
-
-* introduction
-
-This document describes a different way of approaching mail
-organization and how all things are possible in this brave new
-world. This document does not describe physical storage issues nor
-interface issues.
-
-Historically mail has been organized into folders. These folders
-usually mapped to a single storage medium. The relationship between
-mail organization and storage medium was one to one. There was one
-mail organization for every storage medium. This scheme had its
-limitations.
-
-Efforts at categorizations are only meaningful at the instance that
-one categorized. To find any piece of data, regardless of how well
-it was categorized, required some amount of searching. Therefore, any
-attempts to nullify searching is doomed to fail. It's time to embrace
-searching as a way of life.
-
-These are the terms and their definitions. The example rules used are
-based on the syntax for VM (http://www.wonderworks.com/vm/) by Kyle
-Jones whose ideas form the basis for this. I'm only adding the
-existence of summary files to aid in scaling. I currently use VM and
-it's virtual-folder rules for my daily mail purposes. To date, my only
-complaints are speed (it has no caches) and for the unitiated, it's
-not very user-friendly.
-
-Comments, questions, rants, etc. should be directed at Giao Nguyen
-<grail@cafebabe.org> who will try to address issues in a timely
-manner.
-
-* Definitions
-
-** store
-
-A location where mail can be found. This may be a file (Berkeley
-mbox), directory (MH), IMAP server, POP3 server, Exchange server,
-Lotus Notes server, a stack of Post-Its by your monitor fed through
-some OCR system.
-
-** message
-
-An individual mail message.
-
-** vfolder
-
-A group of messages sharing some commonality. This is the result of a
-query. The vfolder maybe contained in a store, but it is not necessary
-that a store holds only one vfolder. There is always an implicit
-vfolder rule which matches all messages. A store contains the vfolder
-which is the result of the query (any). It's short for virtual folder
-or maybe view folder. I dunno.
-
-** default-vfolder
-
-The vfolder defined by (any) applied to the store. This is not the
-inbox. The inbox could easily be defined by a query. A default rule
-for the inbox could be (new) but it doesn't have to be. Mine happens
-to be (or (unread) (new)).
-
-** folder
-
-The classical mail folder approach: one message organization per
-store.
-
-** query
-
-A search for messages. The result of this is a vfolder. There are two
-kinds of queries: named queries and lambda queries. More on this
-later.
-
-** summary file
-
-An external file that contains pointers to messages which are matches
-for a named query. In addition to pointers, the summary file should
-also contain signatures of the store for sanity checks. When the term
-"index" is used as a verb, it means to build a summary file for a
-given name-value pair.
-
-* Queries
-
-Named queries are analogous to classical mail folders. Because named
-queries maybe reused, summary files are kept as caches to reduce
-the overall cost of viewing a vfolder. Summary files are superior to
-folders in that they allow for the same messages to appear in multiple
-vfolders without message duplications. Duplications of messages
-defeats attempts at tagging a message with additional user information
-like annotations. Named queries will define folders.
-
-Lambda queries are similar to named queries except that they have no
-name. These are created on the fly by the user to filter out or
-include certain messages.
-
-All queries can be layered on top of each other. A lambda query can be
-layered on a named query and a named query can be layered on a lambda
-query. The possibilities are endless.
-
-The layerings can be done as boolean operations (and, or, not). Short
-circuiting should be used.
-
-Examples:
-
-(and (author "Giao")
- (unread))
-
-The (unread) query should only be evaluated on the results of (author
-"Giao").
-
-(or (author "Giao")
- (unread))
-
-Both of these queries should be evaluated. Any matches are added to the
-resulting vfolder.
-
-* Summary files
-
-Summary files are only meaningful when applied to the context of the
-default-vfolder of a store.
-
-Summary files should be generated for queries of the form:
-
-(function "constant value")
-
-Summary files should never be generated for queries of the form:
-
-(function (function1))
-
-(and (function "value")
- (another-function "another value"))
-
-Given a query of the form:
-
-(and (function "value")
- (another-function "another value"))
-
-The system should use one summary file for (function "value") and
-another summary file for (another-function "another value"). I will
-call the prior form the "plain form".
-
-It should be noted that the signature of the store should be based on
-the assumption that new data may have been added to the store since
-the application generated the summary file. Signatures generated on
-the entirety of the store will most likely be meaningless for things
-like POP/IMAP servers.
-
-* Incremental indexing
-
-When new messages are detected, all known queries should be evaluated
-on the new messages. vfolders should be notified of new messages that
-are positive matches for their queries. The indexes generated by this
-process should be merged into the current indexes for the vfolder.
-
-* Can I have multiple stores?
-
-I don't see why not. Again, the inbox is a vfolder so you can get a
-unified inbox consisting of all new mail sent to all your stores or
-your can get inboxes for each store or any combination your heart
-desire. You get your cake, eat it, and someone else cleans the dishes!
-
-* Why all this?
-
-Consider the dynamic nature of the following query:
-
-(and (author "Giao")
- (sent-after (today-midnight)))
-
-today-midnight would be a function that is evaluated at run-time to
-calculate the appropriate object.
-
-* Scenarios of usage and their solutions
-
-** Mesage alterations
-
-This is a fuzzy area that should be left to the UI to handle. Messages
-are altered. Read status are altered when a new message is read for
-example. How do we handle this if our query is for unread messages?
-Upon viewing the state would change.
-
-One idea is to not evaluate the queries unless we're changing between
-vfolder views. This assumes that one can only view a particular
-vfolder at a time. For multi-vfolder viewing, a message change should
-propagate through the vfolder system. Certain effects (as in our
-example) would not be intuitive.
-
-It would not be a clean solution to make special cases but they may be
-necessary where certain defined fields are ignored when they are
-changed. Some combination of the above rules can be used. I don't
-think it's an easy solution.
-
-** Message inclusion and exclusion
-
-Messages are included and excluded also with queries. The final query
-will have the form of:
-
-(and (author "Giao")
- (criteria value)
- (not (criteria other-value)))
-
-Userland criterias may be a label of some sort. These may be userland
-labels or Message-IDs. What are the performance issues involved in
-this? With short circuiting, it's not a major problem.
-
-The criterias and values are determined by the UI. The vfolder
-mechanism isn't concerned with such issues.
-
-Messages can be included and excluded at will. The idea is often
-called "arbitrary inclusion/exclusion". This can be done by
-Message-IDs or other fields. It's been noted that Message-IDs are not
-unique.
-
-I propose that any given vfolder is allocated an inclusion label and an
-exclusion label. These should be randomly generated. This should be
-part of the vfolder description. It should be noted that the vfolder
-description has not been drafted yet.
-
-The result is such that the rules for a given named query is:
-
-(and (user-query)
- (label inclusion-label)
- (not exclusion-label))
-
-** Query scheduling
-
-Consider the following extremely dynamic queries:
-
-A:
-(and (author "Giao")
- (sent-after (today-midnight)))
-
-B:
-(and (sent-after (today-midnight))
- (author "Giao"))
-
-C:
-(or (author "Giao")
- (sent-after (today-midnight)))
-
-Query A would be significantly faster because (author "Giao") is not
-dynamic. A summary file could be generated for this query. Query B is
-slow and can be optimized if there was a query compiler of some
-sort. Query C demonstrates a query in which there is no good
-optimization which can be applied. These come with a certain amount of
-baggage.
-
-It seems then that for boolean 'and' operations, plain forms should be
-moved forward and other queries should be moved such that they are
-evaluated later. I would expect that the majority of queries would be
-of the plain form.
-
-First is that the summary file is tied to the query and the store
-where the query originates from. Second, a hashing function for
-strings needs to be calculated for the query so that the query and the
-summary file can be associated. This hashing function could be similar
-to the hashing function described in Rob Pike's "The Practice of
-Programming". (FIXME: Stick page number here)
-
-** Archives
-
-Many people are concerned that archives won't be preserved, archives
-aren't supported, and many other archive related issues. This is the
-short version.
-
-Archives are just that, archives. Archives are stores. Take your
-vfolder, export it to a store. You are done. If you load up the store
-again, then the default-vfolder of that store is the view of the
-vfolder, except the query is different.
-
-The point to vfolder is not to do away with classical folder
-representation but to move the queries to the front where it would
-make data management easier for people who don't think in terms of
-files but in terms of queries because ordinary people don't think in
-terms of files.
-
-* Miscellany
-
-** Annotations
-
-There should be a scheme to add annotations to messages. Common mail
-user agents have used a tag in the message header to mark messages as
-read/unread for example. Extending on this we have the ability to add
-our own data to a message to add meaning to it. If we have a good
-scheme for doing this, new possibilities are opened.
-
-*** Keywords
-
-When sending a message, a message could have certain keywords attached
-to it. While this can be done with the subject line, the subject line
-has a tendency to be munged by other mail applications. One popular
-example is the "[rR]e:" prefix. Using the subject line also breaks the
-"contract" with other mail user agents. Using keywords in another
-field in the message header allows the sender to assist the recipient
-in organizing data automatically. Note that the sender can only
-provide hints as the sender is unlikely to know the organization
-schemes of the recipient.
-
-** Scope
-
-Let us assume that we have multiple stores. Does a query work on a
-given store? Or does it work on all stores? Or is it configurable such
-that a query can work on a user-selected list of stores?
-
-* Alternatives to the above
-
-Jim Meyer <purp@selequa.com> is putting some notes on where
-annotations needs to be located. They'll be located here as well as
-any contributions I may have to them.