ephemerons (long message, with a brief preamble)
Taylor R Campbell
2010-08-10 21:28:56 UTC
Here's a pseudocode implementation of ephemerons which I wrote and
translated into C for MIT Scheme this past weekend; there are also
some explanatory comments about ephemerons and the algorithm. It
would be neat if Scheme48 supported ephemerons, and it probably
wouldn't be hard to adapt this pseudocode (which I wrote to clarify to
myself what was going on in the algorithm, but which is not far from
Pre-Scheme) to Scheme48 for a real implementation, at least for the
semispace garbage collector, by calling POST-MARK-HOOK somewhere in
REAL-COPY-OBJECT. Adding support for ephemerons to the bibop
collector may be hairy; I haven't thought about that.

Note: Some implementations of ephemerons (such as the implementation
of weak hash tables in SBCL) make the garbage collector's running time
quadratic in the number of ephemerons. This algorithm is linear. It
is roughly equivalent to the algorithm described by Bruno Haible in
English in


where he calls ephemerons `weak mappings'.

;;; Background: What are ephemerons?
;;; An ephemeron is an object with two subobjects called its key and
;;; its datum, somewhat, but not exactly, like this:
;;; (define (make-ephemeron key datum)
;;; (cons (make-weak-pointer key) datum))
;;; (define (ephemeron.key ephemeron)
;;; (weak-pointer-ref (car ephemeron)))
;;; (define (ephemeron.datum ephemeron)
;;; (cdr ephemeron))
;;; In real ephemerons, however, not only is the reference to the key
;;; weak, but the reference to the datum is too. The key and datum
;;; are both considered live iff the key is considered live. Thus,
;;; the key is proven dead iff the only transitive strong references
;;; to the key from the roots of garbage collection pass through the
;;; ephemeron -- even if the datum has a strong reference to the key;
;;; and if the key is proven dead, then the reference to the datum is
;;; dropped, and its storage may be reclaimed.
;;; In other words, the references to the key and datum are preserved
;;; iff somebody other than the ephemeron cares about the key.
;;; So, for example, the following program may yield (#F #F):
;;; (let ((ephemeron
;;; (let ((datum (cons 0 (cons 1 2))))
;;; (make-ephemeron (cdr datum) datum))))
;;; (collect-garbage)
;;; (list (ephemeron.key ephemeron)
;;; (ephemeron.datum ephemeron)))
;;; However, if ephemerons were implemented with weak pointers and
;;; pairs as above, then it would always return ((1 . 2) (0 1 . 2)),
;;; because the cdr of the pair would have a strong reference to (0
;;; 1 . 2), the datum, which has a strong reference to (1 . 2), the
;;; key.

;;; Given:
;;; (OBJECT-HASH <object> <modulus>) -> non-negative integer below <modulus>
;;; Returns the least non-negative residue modulo <modulus> of an
;;; integer unique to <object>.
;;; (MARKED? <object>) -> boolean
;;; (FORWARDED? <object>) -> boolean
;;; (TRACE <object>) -> object'
;;; (FOLLOW <marked-object>) -> object'
;;; In a copying collector, MARKED? tells whether <object> is an
;;; object in the old space with a forwarding pointer. Otherwise,
;;; it tells whether <object> has been marked. Immediate objects
;;; may be considered always marked.
;;; In a copying collector, FORWARDED? tells whether <object> is in
;;; the new space. Otherwise, it returns #T.
;;; TRACE guarantees that <object> and all objects transitively
;;; strongly referenced by it are marked. In a copying collector,
;;; TRACE returns the object in the new space to which <object> has
;;; been forwarded. Otherwise, it returns <object>.
;;; In a copying collector, if <marked-object> is an object in the
;;; old space with a forwarding pointer, FOLLOW returns the object
;;; in the new space to which it has been forwarded; it is an error
;;; if <marked-object> is not an object in the old space with a
;;; forwarding pointer. In a non-copying collector, FOLLOW returns
;;; <marked-object>; it is an error if <marked-object> is not
;;; marked.
;;; (RECYCLE-VECTOR! <vector> <length> <fill>) -> vector
;;; <Vector> must be an unmarked vector of length at least
;;; <length>. In a copying collector, returns a vector in the new
;;; space of length <length> all of whose elements are initialized
;;; to <fill>. Otherwise, shrinks <vector> in place to <length>,
;;; releasing any storage formerly but no longer occupied by the
;;; vector, and replaces all its elements by <fill>.
;;; (NEXT-HASH-TABLE-SIZE <size>) -> non-negative integer
;;; Returns a next good size for a hash table's array after <size>.
;;; (TRACE-IS-IDEMPOTENT?) -> boolean
;;; True if (TRACE (TRACE x)) is equal to (TRACE x) and cheap to
;;; compute. If false, this code won't try to evaluate (TRACE
;;; (TRACE x)) at all.
;;; Assumptions:
;;; 1. The garbage collector runs in one swell foop -- that is, we
;;; are not using an incremental or generational or regional
;;; garbage collector, or anything of the sort.
;;; 2. Tracing an unmarked object marks it. Tracing a marked object
;;; has no effect. Thus, every object is marked at most once.
;;; 3. After marking an object, the garbage collector applies
;;; POST-MARK-HOOK to the object's copy in the new space, in a
;;; copying collector; or to the object, in a non-copying
;;; collector.
;;; 4. UPDATE-EPHEMERONS is not called until after the bulk of the
;;; garbage collector's work is done -- that is, all strong paths
;;; from the roots have been traversed. The garbage collector may
;;; update ordinary weak references only after UPDATE-EPHEMERONS.
;;; Constraints:
;;; 1. We are in a garbage collector -- we are out of memory and
;;; cannot allocate more than a constant amount of auxiliary
;;; storage (which must be pre-allocated).

;;; Claims:
;;; 1. The storage for the key of an ephemeron is reclaimed (or, the
;;; key of an ephemeron is not traced) iff the only references to
;;; the key are through the datum.
;;; 2. If the storage for the key of an ephemeron is reclaimed, then
;;; the ephemeron is destroyed -- its key and datum are both
;;; replaced by #F.
;;; 3. POST-MARK-HOOK runs in O(1) time, so that handling ephemerons
;;; during garbage collection adds only constant factors to the
;;; running time.
;;; 4. UPDATE-EPHEMERONS runs in O(n + m) time, where n is the number
;;; of ephemerons and m is the number of objects referenceable
;;; only through ephemerons. Thus, handling ephemerons and
;;; calling UPDATE-EPHEMERONS adds only constant factors to the
;;; running time of the garbage collector as a whole.
;;; 5. The size of the new space never increases beyond the size of
;;; the old space as a consequence of collecting ephemerons.
;;; (This requires RECYCLE-VECTOR! to behave as requested.)
;;; Questions, in descending order of importance:
;;; 1. Can this be done with smaller constant factors in space?
;;; That is, can this be done with less space reserved for each
;;; ephemeron?
;;; 2. Can this be done with smaller constant factors in time? For
;;; example, can it be done with less pointer-chasing?

;;;; Ephemerons and GC State

;;; The number of words used by n ephemerons is
;;; (+ vector-overhead (* n (+ record-overhead 4))),
;;; because each ephemeron has four words for record fields, and one
;;; word reserved in the ephemeron hash table.

(define-record-type <ephemeron>
(%make-ephemeron key datum next-listed next-queued)
(key ephemeron.key set-ephemeron.key!)
(datum ephemeron.datum set-ephemeron.datum!)

;; While tracing, the garbage collector links up a list of the live
;; ephemerons through this field; *EPHEMERON-LIST* heads the list.
(next-listed ephemeron.next-listed set-ephemeron.next-listed!)

;; The NEXT-QUEUED field is multiplexed for buckets in the ephemeron
;; hash table and for the ephemeron queue, used to process ephemerons
;; whose keys have just been marked.
(next-queued ephemeron.next-queued set-ephemeron.next-queued!))

;;; Invariant: (<= *ephemeron-count* (vector-length *ephemeron-hash-table*))

(define (make-ephemeron key datum)
(set! *ephemeron-count* (+ 1 *ephemeron-count*))
(if (< (vector-length *ephemeron-hash-table*) *ephemeron-count*)
(set! *ephemeron-hash-table*
(make-vector (next-hash-table-size *ephemeron-count*))))
(%make-ephemeron key datum #f #f #f))

(define *ephemeron-count* 0)
(define *ephemeron-hash-table* (make-vector 0))
(define *ephemeron-list* #f)
(define *ephemeron-queue* #f)
(define *updating-ephemerons?* #f)

(define (for-each-ephemeron procedure)
(let loop ((ephemeron *ephemeron-list*))
(if (ephemeron? ephemeron)
(let ((next (ephemeron.next-listed ephemeron)))
(procedure ephemeron)
(loop next)))))

(define (walk-ephemeron-queue procedure)
(let loop ()
(let ((ephemeron *ephemeron-queue*))
(if (ephemeron? ephemeron)
(begin (set! *ephemeron-queue* (ephemeron.next-queued ephemeron))
(procedure ephemeron)

;;;; GC Post-Mark Hook

;;; An ephemeron is put in the hash table at most once (when it is
;;; marked), looked up from the hash table at most once (when its key
;;; is marked), and then removed from the hash table. An ephemeron is
;;; put on the queue at most once (when its key is marked), found on
;;; the queue at most once (in UPDATE-EPHEMERONS), and then removed
;;; from the queue. An ephemeron is put on the list at most once
;;; (when it is marked), found on the list at most twice (both times
;;; in UPDATE-EPHEMERONS), and then removed from the list. Thus,
;;; there is a constant bound on the number of times any ephemeron can
;;; pass through any part of this code -- provided that there are no
;;; collisions in the hash table.

(define (post-mark-hook object)
(if (ephemeron? object) (post-mark-ephemeron object))
(if *updating-ephemerons?* (queue-ephemerons-for-key object)))

(define (queue-ephemerons-for-key object)
(let* ((hash-table *ephemeron-hash-table*)
(index (object-hash object (vector-length hash-table)))
(bucket (vector-ref hash-table index)))
(let loop ((ephemeron bucket) (previous #f))
(if (ephemeron? ephemeron)
(let ((next (ephemeron.next-queued ephemeron)))
(if (eq? object (ephemeron.key ephemeron))
(if previous
(set-ephemeron.next-queued! previous next)
(vector-set! hash-table index next))
(set-ephemeron.next-queued! ephemeron *ephemeron-queue*)
(set! *ephemeron-queue* ephemeron)))
(loop next ephemeron))))))

(define (post-mark-ephemeron ephemeron)
(set-ephemeron.next-listed! ephemeron *ephemeron-list*)
(set! *ephemeron-list* ephemeron)
(let ((key (ephemeron.key ephemeron)))
(if (not (marked? key))
(let* ((hash-table *ephemeron-hash-table*)
(index (object-hash key (vector-length hash-table))))
(set-ephemeron.next-queued! ephemeron (vector-ref hash-table index))
(vector-set! hash-table index ephemeron)))))

;;;; GC Hooks

;;; The GC must call INITIALIZE-EPHEMERONS before marking any
;;; ephemerons, and UPDATE-EPHEMERONS after tracing all strong
;;; references in the heap.

(define (initialize-ephemerons)
(set! *ephemeron-count* 0))

(define (update-ephemerons)
(set! *updating-ephemerons?* #t)
(for-each-ephemeron pre-process-ephemeron)
(walk-ephemeron-queue process-ephemeron)
(for-each-ephemeron post-process-ephemeron)
;; Invariant: post-GC ephemeron count <= pre-GC ephemeron count.
;; Thus, if there was space pre-GC for the ephemeron hash table,
;; there is space post-GC for it.
(set! *ephemeron-hash-table*
(recycle-vector! *ephemeron-hash-table* *ephemeron-count* #f))
(set! *updating-ephemerons?* #f))

(define (pre-process-ephemeron ephemeron)
(let ((key (ephemeron.key ephemeron)))
(if (marked? key)
(if (trace-is-idempotent?)
(process-ephemeron ephemeron)
(queue-ephemerons-for-key key)))))

(define (process-ephemeron ephemeron)
(set-ephemeron.datum! ephemeron (trace (ephemeron.datum ephemeron))))

(define (post-process-ephemeron ephemeron)
(set! *ephemeron-count* (+ 1 *ephemeron-count*))
(set-ephemeron.next-listed! ephemeron #f)
(set-ephemeron.next-queued! ephemeron #f)
(let ((key (ephemeron.key ephemeron)))
(if (marked? key)
(set-ephemeron.key! ephemeron (follow key))
(begin (assert (not (forwarded? (ephemeron.datum ephemeron))))
(set-ephemeron.key! ephemeron #f)
(set-ephemeron.datum! ephemeron #f)))))