mail/README.async


*******
This document is absurdly, obscenely out of date. Don't read it.

    -- Peter Williams 7/2/2001
*******

Asynchronous Mailer Information
Peter Williams <peterw@helixcode.com>
8/4/2000

1. INTRODUCTION

It's pretty clear that the Evolution mailer needs to be asynchronous in
some manner. Blocking the UI completely on IMAP calls or large disk reads
is unacceptable, and it's really nice to be able to thread the message
view in the background, or do other things while a mailbox is downloading.

The problem in making Evolution asynchronous is Camel. Camel is not 
asynchronous for a few reasons. All of its interfaces are synchronous --
calls like camel_store_get_folder, camel_folder_get_message, etc. can
take a very long time if they're being performed over a network or with
a large mbox mailbox file. However, these functions have no mechanism
for specifying that the operation is in progress but not complete, and
no mechanism for signaling when the operation does complete.

2. WHY I DIDN'T MAKE CAMEL ASYNCHRONOUS

It seems like it would be a good idea, then, to rewrite Camel to be
asynchronous. This presents several problems:

    * Many interfaces must be rewritten to support "completed"
      callbacks, etc. Some of these interfaces are connected to
      synchronous CORBA calls.
    * Everything must be rewritten to be asynchronous. This includes
      the CamelStore, CamelFolder, CamelMimeMessage, CamelProvider,
      every subclass thereof, and all the code that touches these.
      These include the files in camel/, mail/, filter/, and
      composer/. The change would be a complete redesign for any
      provider implementation.
    * All the work on providers (IMAP, mh, mbox, nntp) up to this
      point would be wasted. While they were being rewritten
      evolution-mail would be useless.

However, it is worth noting that the solution I chose is not optimal,
and I think that it would be worthwhile to write a libcamel2 or some
such thing that was designed from the ground up to work asynchronously.
Starting fresh from such a design would work, but trying to move the
existing code over would be more trouble than it's worth.

3. WHY I MADE CAMEL THREADED

If Camel was not going to be made asynchronous, really the only other
choice was to make it multithreaded. [1] No one has been particularly
excited by this plan, as debugging and writing MT-safe code is hard.
But there wasn't much of a choice, and I believed that a simple thread
wrapper could be written around Camel.

The important thing to know is that while Camel is multithreaded, we
DO NOT and CANNOT share objects between threads. Instead,
evolution-mail sends a request to a dispatching thread, which performs
the action or queues it to be performed. (See section 4 for details.)

The goal that I was working towards is that there should be no calls
to Camel made, ever, in the main thread. I didn't expect to achieve
this and didn't, but that was the intent.

[1]. Well, we could fork off another process, but they share so much
data that this would be pretty impractical.

4. IMPLEMENTATION

a. CamelObject

Threading presented a major problem regarding Camel. Camel is based
on the GTK Object system, and uses signals to communicate events. This
is okay in a nonthreaded application, but the GTK Object system is
not thread-safe.

Particularly, signals and object allocations use static data. Using
either one inside Camel would guarantee that we'd be stuck with 
random crashes forevermore. That's Bad (TM).

There were two choices: make sure to limit our usage of GTK, or stop
using the GTK Object system. I decided to do the latter, as the
former would lead to a mess of "what GTK calls can we make" and
GDK_THREADS_ENTER and accidentally calling UI functions and upgrades
to GTK breaking everything.

So I wrote a very very simple CamelObject system. It had three goals:

    * Be really straightforward, just encapsulate the type
      hierarchy without all that GtkArg silliness or anything.
    * Be as compatible as possible with the GTK Object system
      to make porting easy
    * Be threadsafe

It supports:

    * Type inheritance
    * Events (signals)
    * Type checking
    * Normal refcounting
    * No unref/destroy messiness
    * Threadsafety
    * Class functions
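
As a rough illustration of what "normal refcounting" combined with
thread safety can look like, here is a minimal sketch in plain C with
pthreads. It is not the code in camel/camel-object.c: the sketch_object
type and the sketch_object_* functions are hypothetical names made up
for this example, and guarding the count with a per-object mutex is an
assumption about one reasonable way to do it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object header; NOT the real CamelObject layout. */
typedef struct sketch_object {
    pthread_mutex_t lock;                          /* protects ref_count */
    int ref_count;
    void (*finalize) (struct sketch_object *obj);  /* may be NULL */
} sketch_object;

/* Counts finalize calls so the behaviour can be observed. */
int sketch_finalized = 0;

void sketch_count_finalize (sketch_object *obj)
{
    (void) obj;
    sketch_finalized++;
}

/* Allocate an object holding one reference; NULL on allocation failure. */
sketch_object *sketch_object_new (void (*finalize) (sketch_object *))
{
    sketch_object *obj = malloc (sizeof (*obj));

    if (obj == NULL)
        return NULL;
    pthread_mutex_init (&obj->lock, NULL);
    obj->ref_count = 1;
    obj->finalize = finalize;
    return obj;
}

void sketch_object_ref (sketch_object *obj)
{
    pthread_mutex_lock (&obj->lock);
    obj->ref_count++;
    pthread_mutex_unlock (&obj->lock);
}

/* Drop one reference. Returns 1 if the object was finalized and freed,
 * 0 if other references remain. */
int sketch_object_unref (sketch_object *obj)
{
    int remaining;

    pthread_mutex_lock (&obj->lock);
    remaining = --obj->ref_count;
    pthread_mutex_unlock (&obj->lock);

    if (remaining > 0)
        return 0;

    /* Last reference is gone: finalize runs exactly once, then free. */
    if (obj->finalize != NULL)
        obj->finalize (obj);
    pthread_mutex_destroy (&obj->lock);
    free (obj);
    return 1;
}
```

There is a single unref path and no separate destroy step, which is
the property the list above calls "no unref/destroy messiness".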

The entire code to the object system is in camel/camel-object.c. It's
a naive implementation and not full of features, but that is intentional.
The main differences between GTK Objects and Camel Objects are:

    * s,gtk,camel,i of course
    * Finalize is no longer a GtkObjectClass function. You specify
      a finalize function along with an init function when declaring
      a type, and it is called automatically and chained automatically.
    * Declaring a type is a slightly different API
    * The signal system is replaced with a not-so-clever event system.
      Every event is equivalent to a NONE__POINTER signal. The syntax
      is slightly different: a class "declares" an event and specifies
      a name and a "prep func" that is called before the event is
      triggered and can cancel it.
    * There is only one CamelXXXClass in existence for every type.
      All objects share it.
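
The event mechanism described in the list above can be sketched in a
few lines of plain C. This is only an illustration of the idea (an
event with a name, a prep func that may cancel it, and hooks that take
a single pointer argument); the sketch_event type, the function names
and the fixed-size hook table are all assumptions made for this
example, not the API in camel/camel-object.c.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_HOOKS 8

/* A hook gets the object, one pointer of event data, and its own
 * user data; it returns nothing (the NONE__POINTER shape). */
typedef void (*sketch_hook_func) (void *object, void *event_data, void *user_data);

/* A prep func runs before the hooks; returning 0 cancels the event. */
typedef int (*sketch_prep_func) (void *object, void *event_data);

typedef struct {
    const char *name;
    sketch_prep_func prep;                    /* may be NULL */
    sketch_hook_func hooks[SKETCH_MAX_HOOKS];
    void *hook_data[SKETCH_MAX_HOOKS];
    int n_hooks;
} sketch_event;

/* "Declare" an event: give it a name and an optional prep func. */
void sketch_event_declare (sketch_event *ev, const char *name, sketch_prep_func prep)
{
    ev->name = name;
    ev->prep = prep;
    ev->n_hooks = 0;
}

/* Attach a hook. Returns 0 on success, -1 if the table is full. */
int sketch_event_hook (sketch_event *ev, sketch_hook_func func, void *user_data)
{
    if (ev->n_hooks == SKETCH_MAX_HOOKS)
        return -1;
    ev->hooks[ev->n_hooks] = func;
    ev->hook_data[ev->n_hooks] = user_data;
    ev->n_hooks++;
    return 0;
}

/* Trigger the event. Returns the number of hooks that ran, which is 0
 * when the prep func cancelled the event. */
int sketch_event_trigger (sketch_event *ev, void *object, void *event_data)
{
    int i;

    if (ev->prep != NULL && !ev->prep (object, event_data))
        return 0;
    for (i = 0; i < ev->n_hooks; i++)
        ev->hooks[i] (object, event_data, ev->hook_data[i]);
    return ev->n_hooks;
}

/* Example prep func: cancel the event when there is no event data. */
int sketch_require_data (void *object, void *event_data)
{
    (void) object;
    return event_data != NULL;
}

/* Example hook: increment the int that user_data points to. */
void sketch_count_hook (void *object, void *event_data, void *user_data)
{
    (void) object;
    (void) event_data;
    (*(int *) user_data)++;
}
```

Because every hook has the same signature, no marshalling layer is
needed, which is what lets a system like this stay small.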

There is a shell script, tools/make-camel-object.sh that will do all of
the common substitutions to make a file CamelObject-compatible. Usually
all that needs to be done is move the implementation of the finalize
event out of the class init, modify the get_type function, and replace
signals with events.

Pitfalls in the transition that I ran into were:

    * gtk_object_ref -> camel_object_ref or you coredump
    * some files return 'guint' instead of GtkType and must be changed
    * Remove the #include <gtk/gtk.h>
    * gtk_object_set_datas must be changed (This happened once; I 
      added a static hashtable)
    * signals have to be fudged a bit to match the gpointer input
    * the BAST_CASTARD option is on, meaning failed typecasts will
      return NULL, almost guaranteeing a segfault -- gets those
      bugs fixed double-quick!

b. API -- mail_operation_spec

I worked by creating a very specific definition of a "mail operation"
and wrote an engine to queue and dispatch them.

A mail operation is defined by a structure mail_operation_spec
prototyped in mail-threads.h. It comes in three logical parts -- a 
"setup" phase, executed in the main thread; a "do" phase, executed
in the dispatch thread; and a "cleanup" phase, executed in the main
thread. These three phases are guaranteed to be performed in order
and atomically with respect to other mail operations.

Each of these phases is represented by a function pointer in the
mail_operation_spec structure. The function mail_operation_queue() is
called and passed a pointer to a mail_operation_spec and a user_data-style
pointer that fills in the operation's parameters. The "setup" callback
is called immediately, though that may change.

Each callback is passed three parameters: a pointer to the user_data,
a pointer to the "operation data", and a pointer to a CamelException.
The "operation data" is allocated automatically and freed when the operation
completes. Internal data that needs to be shared between phases should
be stored here. The size allocated is specified in the mail_operation_spec
structure.
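
To make the three phases and the automatically managed "operation
data" concrete, here is a minimal single-threaded sketch. Everything
in it is a stand-in: sketch_exception is a toy replacement for
CamelException, sketch_operation_spec leaves out the describe
function, and sketch_operation_run simply calls the phases in order in
the calling thread instead of handing the "do" phase to a dispatch
thread. Skipping "do" and "cleanup" when setup fails is an assumption
of this sketch, not something the text above specifies.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for CamelException. */
typedef struct {
    int is_set;
    const char *desc;
} sketch_exception;

typedef void (*sketch_phase_func) (void *in_data, void *op_data, sketch_exception *ex);

/* Hypothetical, trimmed-down analogue of mail_operation_spec. */
typedef struct {
    size_t opdata_size;          /* bytes of operation data to allocate */
    sketch_phase_func setup;     /* would run in the main thread */
    sketch_phase_func do_op;     /* would run in the dispatch thread */
    sketch_phase_func cleanup;   /* would run in the main thread */
} sketch_operation_spec;

/* Run all three phases in order. Returns 0 on success, -1 if the
 * allocation failed or any phase set the exception. */
int sketch_operation_run (const sketch_operation_spec *spec, void *in_data)
{
    sketch_exception ex = { 0, NULL };
    void *op_data = NULL;

    /* The operation data is allocated (zeroed) on the caller's behalf. */
    if (spec->opdata_size > 0) {
        op_data = calloc (1, spec->opdata_size);
        if (op_data == NULL)
            return -1;
    }

    spec->setup (in_data, op_data, &ex);
    if (ex.is_set) {
        free (op_data);
        return -1;
    }

    spec->do_op (in_data, op_data, &ex);
    spec->cleanup (in_data, op_data, &ex);

    /* ... and freed once the operation completes. */
    free (op_data);
    return ex.is_set ? -1 : 0;
}

/* Example operation: add two non-negative numbers. */
typedef struct { int a, b, result; } sketch_add_input;
typedef struct { int sum; } sketch_add_data;

void sketch_add_setup (void *in_data, void *op_data, sketch_exception *ex)
{
    sketch_add_input *input = in_data;

    (void) op_data;
    if (input->a < 0 || input->b < 0) {   /* verify parameters */
        ex->is_set = 1;
        ex->desc = "negative parameter";
    }
}

void sketch_add_do (void *in_data, void *op_data, sketch_exception *ex)
{
    sketch_add_input *input = in_data;
    sketch_add_data *data = op_data;

    (void) ex;
    data->sum = input->a + input->b;      /* the "slow" work */
}

void sketch_add_cleanup (void *in_data, void *op_data, sketch_exception *ex)
{
    sketch_add_input *input = in_data;
    sketch_add_data *data = op_data;

    (void) ex;
    input->result = data->sum;            /* publish the result */
}

const sketch_operation_spec sketch_add_op = {
    sizeof (sketch_add_data),
    sketch_add_setup,
    sketch_add_do,
    sketch_add_cleanup
};
```

The intermediate sum lives only in the operation data, so nothing has
to be shared between phases through globals.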

Because all of the callbacks use Camel calls at some point, the 
CamelException is provided as a utility. The dispatcher will catch exceptions
and display error dialogs, unlike the synchronous code which lets
exceptions fall through the cracks fairly easily.

I tried to implement all the operations following this convention. Basically
I used this skeleton code for all the operations, just filling in the 
specifics:

===================================

typedef struct operation_name_input_s {
    /* parameters to operation */
} operation_name_input_t;

typedef struct operation_name_data_s {
    /* internal data to operation, if any
     * (if none, omit the structure and set opdata_size to 0) */
} operation_name_data_t;

static gchar *describe_operation_name (gpointer in_data, gboolean gerund);
static void setup_operation_name   (gpointer in_data, gpointer op_data, CamelException *ex);
static void do_operation_name      (gpointer in_data, gpointer op_data, CamelException *ex);
static void cleanup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex);

static gchar *describe_operation_name (gpointer in_data, gboolean gerund)
{
    operation_name_input_t *input = (operation_name_input_t *) in_data;

    if (gerund) {
        /* return a g_strdup'ed string describing what we're doing */
    } else {
        /* return a g_strdup'ed string describing what we're about to do */
    }
}

static void setup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex)
{
    operation_name_input_t *input = (operation_name_input_t *) in_data;
    operation_name_data_t *data = (operation_name_data_t *) op_data;

    /* verify that parameters are valid */

    /* initialize op_data */

    /* reference objects */
}

static void do_operation_name (gpointer in_data, gpointer op_data, CamelException *ex)
{
    operation_name_input_t *input = (operation_name_input_t *) in_data;
    operation_name_data_t *data = (operation_name_data_t *) op_data;

    /* perform camel operations */
}

static void cleanup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex)
{
    operation_name_input_t *input = (operation_name_input_t *) in_data;
    operation_name_data_t *data = (operation_name_data_t *) op_data;

    /* perform UI updates */

    /* free allocations */

    /* dereference objects */
}

static const mail_operation_spec op_operation_name = {
    describe_operation_name,
    sizeof (operation_name_data_t),
    setup_operation_name,
    do_operation_name,
    cleanup_operation_name
};

void
mail_do_operation_name (/* parameters */)
{
    operation_name_input_t *input;

    input = g_new (operation_name_input_t, 1);

    /* store parameters in input */

    mail_operation_queue (&op_operation_name, input, TRUE);
}

===========================================

c. mail-ops.c

Has been drawn and quartered. It has been split into:

    * mail-callbacks.c: the UI callbacks
    * mail-tools.c: useful sequences wrapping common Camel operations
    * mail-ops.c: implementations of all the mail_operation_specs

An important part of mail-ops.c is the pair of global functions
mail_tool_camel_lock_{up,down}. These simulate a recursive mutex around
Camel. There are a very few, supposedly safe, calls to Camel made in
the main thread. These functions should go around every call to Camel or
group thereof. I don't think they're necessary but it's nice to know
they're there.

If you look at mail-tools.c, you'll notice that all the Camel calls are
protected with these functions. Remember that a mail tool is really 
just another Camel call, so don't use them in the main thread either.
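
For readers unfamiliar with the idea, a lock of this kind can be
sketched with a pthreads recursive mutex. This is not the
implementation of mail_tool_camel_lock_{up,down}; the sketch_* names
are hypothetical, and using PTHREAD_MUTEX_RECURSIVE directly (rather
than simulating recursion by hand) is an assumption made to keep the
example short. The depth counter exists only so the nesting can be
observed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t sketch_camel_lock;
static pthread_once_t sketch_lock_once = PTHREAD_ONCE_INIT;
static int sketch_lock_depth = 0;   /* only touched while the lock is held */

/* One-time setup: create the mutex with the recursive attribute so the
 * owning thread can lock it again without deadlocking. */
static void sketch_lock_init (void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init (&attr);
    pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init (&sketch_camel_lock, &attr);
    pthread_mutexattr_destroy (&attr);
}

/* Take the lock. Returns the nesting depth after locking. */
int sketch_camel_lock_up (void)
{
    pthread_once (&sketch_lock_once, sketch_lock_init);
    pthread_mutex_lock (&sketch_camel_lock);
    return ++sketch_lock_depth;
}

/* Release one level. Returns the nesting depth after unlocking; the
 * lock is only free for other threads once this reaches 0. */
int sketch_camel_lock_down (void)
{
    int depth = --sketch_lock_depth;

    pthread_mutex_unlock (&sketch_camel_lock);
    return depth;
}
```

A helper that takes the lock can then call another helper that also
takes it, as long as every lock_up is paired with a lock_down.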

All the mail operations are implemented in mail-ops.c EXCEPT:

    * filter-driver.c: the filter_mail operation
    * message-list.c: the regenerate_messagelist operation
    * message-thread.c: the thread_messages operation

d. Using the operations

The mail operations as implemented are very specific to evolution-mail. I
was thinking about leaving them mostly generic and then allowing extra
callbacks to be added to perform the more specific UI touches, but this
seemed kind of pointless.

I basically looked through the code, found references to Camel, and split
the code into three parts -- the bit before the Camel calls, the bit after,
and the Camel calls. These were mapped onto the template, given a name,
and added to mail-ops.c. Additionally, I simplified the common tasks that
were taken care of in mail-tools.c, making some functions much simpler.

Ninety-nine percent of the time, whatever operation is being done is being
done in a callback, so all that has to be done is this:

==================

void my_callback (GtkObject *obj, gchar *uid)
{
    camel_do_something (uid);
}

==== becomes ====

void my_callback (GtkObject *obj, gchar *uid)
{
    mail_do_do_something (uid);
}

=================

There are, however, a few belligerents. Particularly, the function
mail_uri_to_folder returns a CamelFolder and yet should really be
asynchronous. This is called in a CORBA call that is synchronous, and
additionally is used in the filter code.

I changed the first usage to return the folder immediately but
still fetch the CamelFolder asynchronously, and in the second case,
made filtering asynchronous, so the fact that the call is synchronous
doesn't matter.

The function was renamed to mail_tool_uri_to_folder to emphasize that
it's a synchronous Camel call.

e. The dispatcher

mail_operation_queue () takes its parameters and assembles them in a
closure_t structure, which I abbreviate clur. It sets a timeout to
display a progress window if an operation is still running one second
later (we're not smart enough to check if it's the same operation,
but the issue is not a big deal). The other thread and some communication
pipes are created.

The dispatcher thread sits in a loop reading from a pipe. Every time
the main thread queues an operation, it writes the closure_t into the pipe.
The dispatcher reads the closure, sends a STARTING message to the main
thread (see below for explanation), calls the callback specified in the
closure, and sends a FINISHED message. It then goes back to reading
from its pipe; it will either block until another operation comes along,
or find one right away and start it. Thus the pipe takes care of queueing
operations.
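
The pipe-as-queue idea can be sketched in plain C with pthreads. This
is an illustration only, not the code in mail-threads.c: the
sketch_closure type and the sketch_dispatcher_* functions are
hypothetical, the STARTING and FINISHED messages are left out, and
passing a pointer to the closure through the pipe (rather than the
closure itself) is an assumption made to keep each write small.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical, minimal closure: what to call and with what. */
typedef struct {
    void (*callback) (void *data);
    void *data;
} sketch_closure;

static int sketch_request_pipe[2];   /* [0]: dispatcher reads, [1]: main writes */
static pthread_t sketch_dispatch_thread;

/* Dispatcher: block on the pipe, run closures in arrival order, and
 * exit when the write end is closed and the pipe is drained. */
static void *sketch_dispatch_loop (void *unused)
{
    sketch_closure *clur;

    (void) unused;
    while (read (sketch_request_pipe[0], &clur, sizeof (clur))
           == (ssize_t) sizeof (clur))
        clur->callback (clur->data);
    return NULL;
}

/* Create the pipe and the dispatch thread. Returns 0 on success. */
int sketch_dispatcher_start (void)
{
    if (pipe (sketch_request_pipe) < 0)
        return -1;
    if (pthread_create (&sketch_dispatch_thread, NULL,
                        sketch_dispatch_loop, NULL) != 0)
        return -1;
    return 0;
}

/* Queue an operation. The closure must stay valid until it has run. */
int sketch_dispatcher_queue (sketch_closure *clur)
{
    return write (sketch_request_pipe[1], &clur, sizeof (clur))
           == (ssize_t) sizeof (clur) ? 0 : -1;
}

/* Close the write end, let the dispatcher finish what is queued, and
 * wait for it to exit. Returns 0 on success. */
int sketch_dispatcher_stop (void)
{
    close (sketch_request_pipe[1]);
    if (pthread_join (sketch_dispatch_thread, NULL) != 0)
        return -1;
    close (sketch_request_pipe[0]);
    return 0;
}

/* Example callback: record the order in which closures ran by
 * appending a decimal digit to sketch_order. */
int sketch_order = 0;

void sketch_append_digit (void *data)
{
    sketch_order = sketch_order * 10 + *(int *) data;
}
```

The pipe preserves write order, so operations run strictly one after
another in the order they were queued; no separate queue structure or
condition variable is needed.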

The dispatch thread communicates with the main thread with another pipe;
however, the main thread has other things to do than read from the pipe,
so it registers a GIOReader that checks for messages in the glib
main loop. In addition to starting and finishing messages, the other
thread can communicate to the user using messages and a progress bar.
(This is currently implemented but unused.)
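
The reply direction can be sketched the same way. The example below
uses poll() with a zero timeout as a stand-in for the main-loop watch,
so the "main thread" side never blocks; in a glib program the check
would more likely be driven by a watch on the pipe's file descriptor.
The message layout, the SKETCH_* message types and the sketch_reply_*
functions are all assumptions made for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <poll.h>
#include <unistd.h>

typedef enum {
    SKETCH_STARTING,
    SKETCH_FINISHED,
    SKETCH_PROGRESS
} sketch_msg_type;

/* Fixed-size message, small enough to be written in one piece. */
typedef struct {
    sketch_msg_type type;
    int percent;            /* only meaningful for SKETCH_PROGRESS */
} sketch_msg;

static int sketch_reply_pipe[2];   /* [0]: main reads, [1]: dispatcher writes */

/* Create the reply pipe. Returns 0 on success, -1 on failure. */
int sketch_reply_open (void)
{
    return pipe (sketch_reply_pipe);
}

/* Dispatch-thread side: send one message to the main thread. */
int sketch_reply_send (sketch_msg_type type, int percent)
{
    sketch_msg msg;

    msg.type = type;
    msg.percent = percent;
    return write (sketch_reply_pipe[1], &msg, sizeof (msg))
           == (ssize_t) sizeof (msg) ? 0 : -1;
}

/* Main-thread side: check for a message without blocking.
 * Returns 1 and fills *out if a message was read, 0 if nothing is
 * pending, -1 on error. */
int sketch_reply_poll (sketch_msg *out)
{
    struct pollfd pfd;
    int ready;

    pfd.fd = sketch_reply_pipe[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    ready = poll (&pfd, 1, 0);      /* timeout 0: return immediately */
    if (ready < 0)
        return -1;
    if (ready == 0 || !(pfd.revents & POLLIN))
        return 0;
    return read (sketch_reply_pipe[0], out, sizeof (*out))
           == (ssize_t) sizeof (*out) ? 1 : -1;
}
```

Since the main thread only ever reads when data is already waiting,
the UI stays responsive no matter how long the dispatch thread takes
between messages.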

5. ISSUES

    * Operations are queued and dequeued stupidly. For example, if you click
      on one message then click on another, the first will be retrieved
      and displayed then overwritten by the second. Operations that could
      be performed at the same time safely aren't.
    * The CamelObject system is workable, but it'd be nice to work with
      something established like the GtkObject
    * The whole threading idea is not great. Consensus is that an
      asynchronous interface is the Right Thing, eventually.
    * Care still needs to be taken when designing evolution-mail code to
      work with the asynchronous mail_do_ functions
    * Some of the operations are extremely hacky.
    * IMAP's timeout to send a NOOP had to be removed because we can't
      use GTK. We need an alternative for this.