mirror of
https://git.tartarus.org/simon/putty.git
synced 2025-01-09 17:38:00 +00:00
Document PuTTY's local idiom for OO / traits.
A user mentioned having found this confusing recently, and fair enough, because it's done in a way that doesn't quite match the built-in OO system of any language I know about. But after the rewriting in recent years, I think pretty much everything in PuTTY that has a system of interchangeable implementations of the same abstract type is now done basically the same way, so this seems like a good moment to document the idiom we use and explain all its ins and outs.
parent
1f8b3b5535
commit
79de16732a
379
doc/udp.but
@ -411,6 +411,385 @@ ensure that when you free that memory you reset the pointer field to
is called, it can reliably free the memory if there is any, and not
crash if there isn't.

\H{udp-traits} Explicit vtable structures to implement traits

A lot of PuTTY's code is written in a style that looks structurally
rather like an object-oriented language, in spite of PuTTY being a
pure C program.

For example, there's a single data type called \cw{ssh_hash}, which is
an abstraction of a secure hash function, and a bunch of functions
called things like \cw{ssh_hash_}\e{foo} that do things with those
data types. But in fact, PuTTY supports many different hash functions,
and each one has to provide its own implementation of those functions.

In C++ terms, this is rather like having a single abstract base class,
and multiple concrete subclasses of it, each of which fills in all the
pure virtual methods in a way that's compatible with the data fields
of the subclass. The implementation is more or less the same, as well:
in C, we do explicitly in the source code what the C++ compiler will
be doing behind the scenes at compile time.

But perhaps a closer analogy in functional terms is the Rust concept
of a \q{trait}, or the Java idea of an \q{interface}. C++ supports a
multi-level hierarchy of inheritance, whereas PuTTY's system \dash
like traits or interfaces \dash has only two levels, one describing a
generic object of a type (e.g. a hash function) and another describing
a specific implementation of that type (e.g. SHA-256).

The PuTTY code base has a standard idiom for doing this in C, as
follows.

Firstly, we define two \cw{struct} types for our trait. One of them
describes a particular \e{kind} of implementation of that trait, and
it's full of (mostly) function pointers. The other describes a
specific \e{instance} of an implementation of that trait, and it will
contain a pointer to a \cw{const} instance of the first type. For
example:

\c typedef struct MyAbstraction MyAbstraction;
\c typedef struct MyAbstractionVtable MyAbstractionVtable;
\c
\c struct MyAbstractionVtable {
\c     MyAbstraction *(*new)(const MyAbstractionVtable *vt);
\c     void (*free)(MyAbstraction *);
\c     void (*modify)(MyAbstraction *, unsigned some_parameter);
\c     unsigned (*query)(MyAbstraction *, unsigned some_parameter);
\c };
\c
\c struct MyAbstraction {
\c     const MyAbstractionVtable *vt;
\c };

Here, we imagine that \cw{MyAbstraction} might be some kind of object
that contains mutable state. The associated vtable structure shows
what operations you can perform on a \cw{MyAbstraction}: you can
create one (dynamically allocated), free one you already have, or call
the example methods \q{modify} (to change the state of the object in
some way) and \q{query} (to return some value derived from the
object's current state).

(In most cases, the vtable structure has a name ending in \cq{vtable}.
But for historical reasons a lot of the crypto primitives that use
this scheme \dash ciphers, hash functions, public key methods and so
on \dash instead have names ending in \cq{alg}, on the basis that the
primitives they implement are often referred to as \q{encryption
algorithms}, \q{hash algorithms} and so forth.)

Now, to define a concrete instance of this trait, you'd define a
\cw{struct} that contains a \cw{MyAbstraction} field, plus any other
data it might need:

\c typedef struct MyImplementation MyImplementation;
\c
\c struct MyImplementation {
\c     unsigned internal_data[16];
\c     SomeOtherType *dynamic_subthing;
\c
\c     MyAbstraction myabs;
\c };

Next, you'd implement all the necessary methods for that
implementation of the trait, in this kind of style:

\c static MyAbstraction *myimpl_new(const MyAbstractionVtable *vt)
\c {
\c     MyImplementation *impl = snew(MyImplementation);
\c     memset(impl, 0, sizeof(*impl));
\c     impl->dynamic_subthing = allocate_some_other_type();
\c     impl->myabs.vt = vt;
\c     return &impl->myabs;
\c }
\c
\c static void myimpl_free(MyAbstraction *myabs)
\c {
\c     MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
\c     free_other_type(impl->dynamic_subthing);
\c     sfree(impl);
\c }
\c
\c static void myimpl_modify(MyAbstraction *myabs, unsigned param)
\c {
\c     MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
\c     impl->internal_data[param] += do_something_with(impl->dynamic_subthing);
\c }
\c
\c static unsigned myimpl_query(MyAbstraction *myabs, unsigned param)
\c {
\c     MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
\c     return impl->internal_data[param];
\c }

Having defined those methods, now we can define a \cw{const} instance
of the vtable structure containing pointers to them:

\c const MyAbstractionVtable MyImplementation_vt = {
\c     .new = myimpl_new,
\c     .free = myimpl_free,
\c     .modify = myimpl_modify,
\c     .query = myimpl_query,
\c };

\e{In principle}, this is all you need. Client code can construct a
new instance of a particular implementation of \cw{MyAbstraction} by
digging out the \cw{new} method from the vtable and calling it (with
the vtable itself as a parameter), which returns a \cw{MyAbstraction
*} pointer that identifies a newly created instance, in which the
\cw{vt} field will contain a pointer to the same vtable structure you
passed in. And once you have an instance object, say \cw{MyAbstraction
*myabs}, you can dig out one of the other method pointers from the
vtable it points to, and call that, passing the object itself as a
parameter.

But in fact, we don't do that, because it looks pretty ugly at all the
call sites. Instead, what we generally do in this code base is to
write a set of \cw{static inline} wrapper functions in the same header
file that defined the \cw{MyAbstraction} structure types, like this:

\c static inline MyAbstraction *myabs_new(const MyAbstractionVtable *vt)
\c { return vt->new(vt); }
\c static inline void myabs_free(MyAbstraction *myabs)
\c { myabs->vt->free(myabs); }
\c static inline void myabs_modify(MyAbstraction *myabs, unsigned param)
\c { myabs->vt->modify(myabs, param); }
\c static inline unsigned myabs_query(MyAbstraction *myabs, unsigned param)
\c { return myabs->vt->query(myabs, param); }

And now call sites can use those reasonably clean-looking wrapper
functions, and shouldn't ever have to directly refer to the \cw{vt}
field inside any \cw{myabs} object they're holding. For example, you
might write something like this:

\c MyAbstraction *myabs = myabs_new(&MyImplementation_vt);
\c myabs_modify(myabs, 10);
\c unsigned output = myabs_query(myabs, 2);
\c myabs_free(myabs);

and then all this code can use a different implementation of the same
abstraction simply by changing which vtable pointer it passes in on
the first line.

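The pieces above can be put together into one small compilable sketch. Everything in it is hypothetical and exists only for illustration: the \cw{Accum} trait, its two implementations, and the local \cw{container_of} definition, which stands in for PuTTY's own macro. It also uses plain \cw{malloc} and \cw{free} in place of \cw{snew} and \cw{sfree}.

```c
#include <stddef.h>   /* offsetof */
#include <stdlib.h>   /* malloc, free, abort */

/* Stand-in for PuTTY's container_of; an assumption, not the real macro. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* The trait: a vtable type, plus an instance type that points at it. */
typedef struct Accum Accum;
typedef struct AccumVtable AccumVtable;
struct AccumVtable {
    Accum *(*new)(const AccumVtable *vt);
    void (*free)(Accum *);
    void (*modify)(Accum *, unsigned value);
    unsigned (*query)(Accum *);
};
struct Accum {
    const AccumVtable *vt;
};

/* Inline wrappers, so that call sites never touch the vt field. */
static inline Accum *accum_new(const AccumVtable *vt) { return vt->new(vt); }
static inline void accum_free(Accum *a) { a->vt->free(a); }
static inline void accum_modify(Accum *a, unsigned v) { a->vt->modify(a, v); }
static inline unsigned accum_query(Accum *a) { return a->vt->query(a); }

/* Implementation 1: keeps a running sum. */
typedef struct SumAccum {
    unsigned total;
    Accum accum;              /* deliberately not the first field */
} SumAccum;
static Accum *sum_new(const AccumVtable *vt)
{
    SumAccum *s = malloc(sizeof(*s));
    if (!s)
        abort();
    s->total = 0;
    s->accum.vt = vt;
    return &s->accum;         /* up-cast */
}
static void sum_free(Accum *a) { free(container_of(a, SumAccum, accum)); }
static void sum_modify(Accum *a, unsigned v)
{ container_of(a, SumAccum, accum)->total += v; }
static unsigned sum_query(Accum *a)
{ return container_of(a, SumAccum, accum)->total; }
const AccumVtable SumAccum_vt = {
    .new = sum_new, .free = sum_free,
    .modify = sum_modify, .query = sum_query,
};

/* Implementation 2: remembers the largest value seen. */
typedef struct MaxAccum {
    unsigned largest;
    Accum accum;
} MaxAccum;
static Accum *max_new(const AccumVtable *vt)
{
    MaxAccum *m = malloc(sizeof(*m));
    if (!m)
        abort();
    m->largest = 0;
    m->accum.vt = vt;
    return &m->accum;
}
static void max_free(Accum *a) { free(container_of(a, MaxAccum, accum)); }
static void max_modify(Accum *a, unsigned v)
{
    MaxAccum *m = container_of(a, MaxAccum, accum);   /* down-cast */
    if (v > m->largest)
        m->largest = v;
}
static unsigned max_query(Accum *a)
{ return container_of(a, MaxAccum, accum)->largest; }
const AccumVtable MaxAccum_vt = {
    .new = max_new, .free = max_free,
    .modify = max_modify, .query = max_query,
};

/* Client code: identical for every implementation of the trait. */
unsigned run_accum(const AccumVtable *vt)
{
    Accum *a = accum_new(vt);
    accum_modify(a, 3);
    accum_modify(a, 10);
    accum_modify(a, 4);
    unsigned result = accum_query(a);
    accum_free(a);
    return result;
}
```

Calling \cw{run_accum(&SumAccum_vt)} and \cw{run_accum(&MaxAccum_vt)} runs the same client code against the two implementations; only the vtable pointer differs.
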
Some things to note about this system:

\b The implementation instance type (here \cq{MyImplementation})
contains the abstraction type (\cq{MyAbstraction}) as one of its
fields. But that field is not necessarily at the start of the
structure. So you can't just \e{cast} pointers back and forth between
the two types. Instead:

\lcont{

\b You \q{up-cast} from implementation to abstraction by taking the
address of the \cw{MyAbstraction} field. You can see the example
\cw{new} method above doing this, returning \cw{&impl->myabs}. All
\cw{new} methods do this on return.

\b Going in the other direction, each method that was passed a generic
\cw{MyAbstraction *myabs} parameter has to recover a pointer to the
specific implementation type \cw{MyImplementation *impl}. The idiom
for doing that is to use the \cq{container_of} macro, also seen in the
Linux kernel code. Generally, \cw{container_of(p, Type, field)} says:
\q{I'm confident that the pointer value \cq{p} is pointing to the
field called \cq{field} within a larger \cw{struct} of type \cw{Type}.
Please return me the pointer to the containing structure.} So in this
case, we take the \cq{myabs} pointer passed to the function, and
\q{down-cast} it into a pointer to the larger and more specific
structure type \cw{MyImplementation}, by adjusting the pointer value
based on the offset within that structure of the field called
\cq{myabs}.

This system is flexible enough to permit \q{multiple inheritance}, or
rather, multiple \e{implementation}: having one object type implement
more than one trait. For example, the \cw{Proxy} type implements both
the \cw{Socket} trait and the \cw{Plug} trait that connects to it,
because it has to act as an adapter between another instance of each
of those types.

It's also perfectly possible to have the same object implement the
\e{same} trait in two different ways. At the time of writing this I
can't think of any case where we actually do this, but a theoretical
example might be if you needed to support a trait like \cw{Comparable}
in two ways that sorted by different criteria. There would be no
difficulty doing this in the PuTTY system: simply have your
implementation \cw{struct} contain two (or more) fields of the same
abstraction type. The fields will have different names, which makes it
easy to explicitly specify which one you're returning a pointer to
during up-casting, or which one you're down-casting from using
\cw{container_of}. And then both sets of implementation methods can
recover a pointer to the same containing structure.

}

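As a rough illustration of the up-cast and down-cast rules, here is one plausible way to write \cw{container_of} using \cw{offsetof}, together with a single object that implements two traits at once. The macro body is an assumption (PuTTY's real definition may differ in detail), and the \cw{Sink}, \cw{Source} and \cw{Buffer} names are invented for this sketch.

```c
#include <stddef.h>   /* offsetof */
#include <stdlib.h>   /* malloc, free, abort */

/* Subtract the field's offset from the field pointer to get back to
 * the start of the containing structure. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Two unrelated (hypothetical) traits. */
typedef struct Sink Sink;
typedef struct SinkVtable { void (*put)(Sink *, int value); } SinkVtable;
struct Sink { const SinkVtable *vt; };

typedef struct Source Source;
typedef struct SourceVtable { int (*get)(Source *); } SourceVtable;
struct Source { const SourceVtable *vt; };

/* One object implementing both. Neither trait field is at offset 0,
 * so a plain pointer cast would give the wrong address. */
typedef struct Buffer {
    int stored;
    Sink sink;
    Source source;
} Buffer;

static void buffer_put(Sink *sink, int value)
{
    Buffer *buf = container_of(sink, Buffer, sink);      /* down-cast */
    buf->stored = value;
}

static int buffer_get(Source *source)
{
    Buffer *buf = container_of(source, Buffer, source);  /* down-cast */
    return buf->stored;
}

static const SinkVtable Buffer_sinkvt = { .put = buffer_put };
static const SourceVtable Buffer_sourcevt = { .get = buffer_get };

/* Writes a value through one trait and reads it back through the other. */
int buffer_roundtrip(int value)
{
    Buffer *buf = malloc(sizeof(*buf));
    if (!buf)
        abort();
    buf->stored = 0;
    buf->sink.vt = &Buffer_sinkvt;
    buf->source.vt = &Buffer_sourcevt;

    Sink *sink = &buf->sink;          /* up-cast to one trait */
    Source *source = &buf->source;    /* up-cast to the other */
    sink->vt->put(sink, value);
    int result = source->vt->get(source);

    free(buf);
    return result;
}
```

Because each method names the field it was reached through, both \cw{buffer_put} and \cw{buffer_get} recover the same \cw{Buffer} even though they start from different addresses inside it.
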
\b Unlike in C++, all objects in PuTTY that use this system are
dynamically allocated. The \q{constructor} functions (whether they're
virtualised across the whole abstraction or specific to each
implementation) always allocate memory and return a pointer to it. The
\q{free} method (our analogue of a destructor) always expects the
input pointer to be dynamically allocated, and frees it. As a result,
client code doesn't need to know how large the implementing object
type is, because it will never need to allocate it (on the stack or
anywhere else).

\b Unlike in C++, the abstraction's \q{vtable} structure does not only
hold methods that you can call on an instance object. It can also
hold several other kinds of thing:

\lcont{

\b Methods that you can call \e{without} an instance object, given
only the vtable structure identifying a particular implementation of
the trait. You might think of these as \q{static methods}, as in C++,
except that they're \e{virtual} \dash the same code can call the
static method of a different \q{class} given a different vtable
pointer. So they're more like \q{virtual static methods}, which is a
concept C++ doesn't have. An example is the \cw{pubkey_bits} method in
\cw{ssh_keyalg}.

\b The most important case of a \q{virtual static method} is the
\cw{new} method that allocates and returns a new object. You can think
of it as a \q{virtual constructor} \dash another concept C++ doesn't
have. (However, not all types need one of these: see below.)

\b The vtable can also contain constant data relevant to the class as
a whole \dash \q{virtual constant data}. For example, a cryptographic
hash function will contain an integer field giving the length of the
output hash, and most crypto primitives will contain a string field
giving the identifier used in the SSH protocol that describes that
primitive.

The effect of all of this is that you can make other pieces of code
able to use any instance of one of these types, by passing it an
actual vtable as a parameter. For example, the \cw{hash_simple}
function takes an \cw{ssh_hashalg} vtable pointer specifying any hash
algorithm you like, and internally, it creates an object of that type,
uses it, and frees it. In C++, you'd probably do this using a
template, which would mean you had multiple specialisations of
\cw{hash_simple} \dash and then it would be much more difficult to
decide \e{at run time} which one you needed to use. Here,
\cw{hash_simple} is still just one function, and you can decide as
late as you like which vtable to pass to it.

}

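The following sketch shows a vtable carrying a virtual constructor, ordinary methods and constant data side by side, with a single generic function in the spirit of \cw{hash_simple}. The \cw{ChecksumAlg} trait, the two toy checksums and \cw{checksum_simple} are all made up for illustration and are not PuTTY APIs.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdlib.h>   /* malloc, free, abort */

#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Checksum Checksum;
typedef struct ChecksumAlg ChecksumAlg;
struct ChecksumAlg {
    /* "Virtual static method": needs only the vtable. */
    Checksum *(*new)(const ChecksumAlg *alg);
    /* Ordinary methods: need an instance. */
    void (*free)(Checksum *);
    void (*update)(Checksum *, const unsigned char *data, size_t len);
    void (*digest)(Checksum *, unsigned char *out);
    /* "Virtual constant data". */
    size_t outlen;        /* number of bytes digest() writes */
    const char *name;
};
struct Checksum { const ChecksumAlg *alg; };

/* Toy algorithm 1: XOR of all bytes, one byte of output. */
typedef struct Xor8 { unsigned char acc; Checksum cs; } Xor8;
static Checksum *xor8_new(const ChecksumAlg *alg)
{
    Xor8 *x = malloc(sizeof(*x));
    if (!x)
        abort();
    x->acc = 0;
    x->cs.alg = alg;
    return &x->cs;
}
static void xor8_free(Checksum *cs) { free(container_of(cs, Xor8, cs)); }
static void xor8_update(Checksum *cs, const unsigned char *data, size_t len)
{
    Xor8 *x = container_of(cs, Xor8, cs);
    for (size_t i = 0; i < len; i++)
        x->acc ^= data[i];
}
static void xor8_digest(Checksum *cs, unsigned char *out)
{ out[0] = container_of(cs, Xor8, cs)->acc; }
const ChecksumAlg xor8_alg = {
    .new = xor8_new, .free = xor8_free, .update = xor8_update,
    .digest = xor8_digest, .outlen = 1, .name = "xor8",
};

/* Toy algorithm 2: sum of all bytes, two bytes of big-endian output. */
typedef struct Sum16 { unsigned acc; Checksum cs; } Sum16;
static Checksum *sum16_new(const ChecksumAlg *alg)
{
    Sum16 *s = malloc(sizeof(*s));
    if (!s)
        abort();
    s->acc = 0;
    s->cs.alg = alg;
    return &s->cs;
}
static void sum16_free(Checksum *cs) { free(container_of(cs, Sum16, cs)); }
static void sum16_update(Checksum *cs, const unsigned char *data, size_t len)
{
    Sum16 *s = container_of(cs, Sum16, cs);
    for (size_t i = 0; i < len; i++)
        s->acc += data[i];
}
static void sum16_digest(Checksum *cs, unsigned char *out)
{
    Sum16 *s = container_of(cs, Sum16, cs);
    out[0] = (unsigned char)((s->acc >> 8) & 0xFF);
    out[1] = (unsigned char)(s->acc & 0xFF);
}
const ChecksumAlg sum16_alg = {
    .new = sum16_new, .free = sum16_free, .update = sum16_update,
    .digest = sum16_digest, .outlen = 2, .name = "sum16",
};

/* One function for every algorithm; `out` must hold alg->outlen bytes. */
void checksum_simple(const ChecksumAlg *alg, const void *data, size_t len,
                     unsigned char *out)
{
    Checksum *cs = alg->new(alg);
    cs->alg->update(cs, data, len);
    cs->alg->digest(cs, out);
    cs->alg->free(cs);
}
```

A caller can read \cw{alg->outlen} to size the output buffer before calling \cw{checksum_simple}, and can choose \cw{alg} at run time.
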
\b The \e{implementation} structure can also contain publicly visible
data fields (this time, usually treated as mutable). For example,
\cw{BinaryPacketProtocol} has lots of these.

\b Not all abstractions of this kind want virtual constructors. It
depends on how different the implementations are.

\lcont{

With a crypto primitive like a hash algorithm, the constructor call
looks the same for every implementing type, so it makes sense to have
a standardised virtual constructor in the vtable and a
\cw{ssh_hash_new} wrapper function which can make an instance of
whatever vtable you pass it. And then you make all the vtable objects
themselves globally visible throughout the source code, so that any
module can call (for example) \cw{ssh_hash_new(&ssh_sha256)}.

But with other kinds of object, the constructor for each implementing
type has to take a different set of parameters. For example,
implementations of \cw{Socket} are not generally interchangeable at
construction time, because constructing different kinds of socket
requires totally different kinds of address parameter. In that
situation, it makes more sense to keep the vtable structure itself
private to the implementing source file, and instead, publish an
ordinary constructing function that allocates and returns an instance
of that particular subtype, taking whatever parameters are appropriate
to that subtype.

}

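A minimal sketch of the second arrangement, with no virtual constructor, might look like this. The \cw{Shape} trait and both implementations are hypothetical, and everything is shown in a single file for brevity, whereas in a real layout each implementation and its \cw{static} vtable would live in its own source file.

```c
#include <stddef.h>   /* offsetof */
#include <stdlib.h>   /* malloc, free, abort */

#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Public header part: the trait has no `new` method at all. */
typedef struct Shape Shape;
typedef struct ShapeVtable {
    void (*free)(Shape *);
    unsigned (*area)(Shape *);
} ShapeVtable;
struct Shape { const ShapeVtable *vt; };

static inline void shape_free(Shape *s) { s->vt->free(s); }
static inline unsigned shape_area(Shape *s) { return s->vt->area(s); }

/* --- would be rect.c: vtable is static, constructor is public --- */
typedef struct Rect { unsigned w, h; Shape shape; } Rect;
static void rect_free(Shape *s) { free(container_of(s, Rect, shape)); }
static unsigned rect_area(Shape *s)
{
    Rect *r = container_of(s, Rect, shape);
    return r->w * r->h;
}
static const ShapeVtable Rect_vt = { .free = rect_free, .area = rect_area };

Shape *rect_new(unsigned w, unsigned h)     /* takes two parameters */
{
    Rect *r = malloc(sizeof(*r));
    if (!r)
        abort();
    r->w = w;
    r->h = h;
    r->shape.vt = &Rect_vt;                 /* hard-coded vtable */
    return &r->shape;
}

/* --- would be square.c --- */
typedef struct Square { unsigned side; Shape shape; } Square;
static void square_free(Shape *s) { free(container_of(s, Square, shape)); }
static unsigned square_area(Shape *s)
{
    Square *q = container_of(s, Square, shape);
    return q->side * q->side;
}
static const ShapeVtable Square_vt = {
    .free = square_free, .area = square_area,
};

Shape *square_new(unsigned side)            /* takes one parameter */
{
    Square *q = malloc(sizeof(*q));
    if (!q)
        abort();
    q->side = side;
    q->shape.vt = &Square_vt;
    return &q->shape;
}

/* Once constructed, the two kinds of object are interchangeable. */
unsigned area_then_free(Shape *s)
{
    unsigned a = shape_area(s);
    shape_free(s);
    return a;
}
```

Client code has to know which constructor to call, but from then on it only sees a \cw{Shape *}.
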
\b If you do have virtual constructors, you can choose whether they
take a vtable pointer as a parameter (as shown above), or an
\e{existing} instance object. In the latter case, they can refer to
the object itself as well as the vtable. For example, you could have a
trait come with a virtual constructor called \q{clone}, meaning
\q{Make a copy of this object, no matter which implementation it is.}

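A \q{clone} constructor of that kind could be sketched as follows. The \cw{Counter} trait and \cw{StepCounter} implementation are invented for this example; the point is only that \cw{clone} receives an existing instance, so it can copy that instance's state along with its vtable pointer.

```c
#include <stddef.h>   /* offsetof */
#include <stdlib.h>   /* malloc, free, abort */

#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Counter Counter;
typedef struct CounterVtable CounterVtable;
struct CounterVtable {
    Counter *(*clone)(Counter *orig);   /* constructor taking an instance */
    void (*free)(Counter *);
    void (*bump)(Counter *);
    unsigned (*value)(Counter *);
};
struct Counter { const CounterVtable *vt; };

static inline Counter *counter_clone(Counter *c) { return c->vt->clone(c); }
static inline void counter_free(Counter *c) { c->vt->free(c); }
static inline void counter_bump(Counter *c) { c->vt->bump(c); }
static inline unsigned counter_value(Counter *c) { return c->vt->value(c); }

typedef struct StepCounter {
    unsigned count;
    unsigned step;
    Counter counter;
} StepCounter;

static Counter *step_clone(Counter *c)
{
    StepCounter *orig = container_of(c, StepCounter, counter);
    StepCounter *copy = malloc(sizeof(*copy));
    if (!copy)
        abort();
    *copy = *orig;            /* copies the state and the vtable pointer */
    return &copy->counter;
}
static void step_free(Counter *c)
{ free(container_of(c, StepCounter, counter)); }
static void step_bump(Counter *c)
{
    StepCounter *sc = container_of(c, StepCounter, counter);
    sc->count += sc->step;
}
static unsigned step_value(Counter *c)
{ return container_of(c, StepCounter, counter)->count; }

static const CounterVtable StepCounter_vt = {
    .clone = step_clone, .free = step_free,
    .bump = step_bump, .value = step_value,
};

Counter *stepcounter_new(unsigned step)
{
    StepCounter *sc = malloc(sizeof(*sc));
    if (!sc)
        abort();
    sc->count = 0;
    sc->step = step;
    sc->counter.vt = &StepCounter_vt;
    return &sc->counter;
}

/* Returns 1 if a clone evolves independently of its original. */
int clone_is_independent(void)
{
    Counter *a = stepcounter_new(3);
    counter_bump(a);
    counter_bump(a);                 /* a is now 6 */
    Counter *b = counter_clone(a);   /* b starts at 6 as well */
    counter_bump(b);                 /* b is now 9, a is still 6 */
    int ok = counter_value(a) == 6 && counter_value(b) == 9;
    counter_free(a);
    counter_free(b);
    return ok;
}
```

The caller of \cw{counter_clone} never needs to know which implementation it is copying.
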
\b Sometimes, a single vtable structure type can be shared between two
completely different object types, and contain all the methods for
both. For example, \cw{ssh_compression_alg} contains methods to
create, use and free \cw{ssh_compressor} and \cw{ssh_decompressor}
objects, which are not interchangeable \dash but putting their methods
in the same vtable means that it's easy to create a matching pair of
objects that are compatible with each other.

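To make the shared-vtable idea concrete, here is a toy sketch in which one vtable type serves two distinct object types. The \cw{CodecAlg}, \cw{Encoder} and \cw{Decoder} names and the XOR-chaining scheme are all hypothetical and unrelated to PuTTY's actual compression code.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdlib.h>   /* malloc, free, abort */

#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Encoder Encoder;
typedef struct Decoder Decoder;
typedef struct CodecAlg CodecAlg;

/* One vtable type holding the methods for both object types. */
struct CodecAlg {
    Encoder *(*encoder_new)(const CodecAlg *alg);
    void (*encoder_free)(Encoder *);
    unsigned char (*encode)(Encoder *, unsigned char in);
    Decoder *(*decoder_new)(const CodecAlg *alg);
    void (*decoder_free)(Decoder *);
    unsigned char (*decode)(Decoder *, unsigned char in);
};
struct Encoder { const CodecAlg *alg; };
struct Decoder { const CodecAlg *alg; };

/* Toy scheme: each output byte is the input XORed with the previous
 * byte seen on the wire, so both sides must keep matching state. */
typedef struct ChainEncoder { unsigned char prev; Encoder enc; } ChainEncoder;
typedef struct ChainDecoder { unsigned char prev; Decoder dec; } ChainDecoder;

static Encoder *chain_encoder_new(const CodecAlg *alg)
{
    ChainEncoder *e = malloc(sizeof(*e));
    if (!e)
        abort();
    e->prev = 0x5A;
    e->enc.alg = alg;
    return &e->enc;
}
static void chain_encoder_free(Encoder *enc)
{ free(container_of(enc, ChainEncoder, enc)); }
static unsigned char chain_encode(Encoder *enc, unsigned char in)
{
    ChainEncoder *e = container_of(enc, ChainEncoder, enc);
    e->prev = (unsigned char)(in ^ e->prev);
    return e->prev;
}

static Decoder *chain_decoder_new(const CodecAlg *alg)
{
    ChainDecoder *d = malloc(sizeof(*d));
    if (!d)
        abort();
    d->prev = 0x5A;
    d->dec.alg = alg;
    return &d->dec;
}
static void chain_decoder_free(Decoder *dec)
{ free(container_of(dec, ChainDecoder, dec)); }
static unsigned char chain_decode(Decoder *dec, unsigned char in)
{
    ChainDecoder *d = container_of(dec, ChainDecoder, dec);
    unsigned char out = (unsigned char)(in ^ d->prev);
    d->prev = in;
    return out;
}

const CodecAlg chain_codec = {
    .encoder_new = chain_encoder_new, .encoder_free = chain_encoder_free,
    .encode = chain_encode,
    .decoder_new = chain_decoder_new, .decoder_free = chain_decoder_free,
    .decode = chain_decode,
};

/* Builds a matching pair from one vtable; returns 1 on a clean round trip. */
int codec_roundtrip_ok(const CodecAlg *alg, const unsigned char *msg,
                       size_t len)
{
    Encoder *enc = alg->encoder_new(alg);
    Decoder *dec = alg->decoder_new(alg);
    int ok = 1;
    for (size_t i = 0; i < len; i++) {
        unsigned char wire = enc->alg->encode(enc, msg[i]);
        if (dec->alg->decode(dec, wire) != msg[i])
            ok = 0;
    }
    enc->alg->encoder_free(enc);
    dec->alg->decoder_free(dec);
    return ok;
}
```

Because both constructors come from the same \cw{CodecAlg}, a caller holding one vtable pointer cannot accidentally pair an encoder with an incompatible decoder.
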
\b Passing the vtable itself as an argument to the \cw{new} method is
not compulsory: if a given \cw{new} implementation is only used by a
single vtable, then that function can simply hard-code the vtable
pointer that it writes into the object it constructs. But passing the
vtable is more flexible, because it allows a single constructor
function to be shared between multiple slightly different object
types. For example, SHA-384 and SHA-512 share the same \cw{new} method
and the same implementation data type, because they're very nearly the
same hash algorithm \dash but a couple of the other methods in their
vtables are different, because the \q{reset} function has to set up
the initial algorithm state differently, and the \q{digest} method has
to write out a different amount of data.

\lcont{

One practical advantage of having the \cw{myabs_}\e{foo} family of
inline wrapper functions in the header file is that if you change your
mind later about whether the vtable needs to be passed to \cw{new},
you only have to update the \cw{myabs_new} wrapper, and then the
existing call sites won't need changing.

}

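The shared-constructor pattern can be sketched with a much smaller example than SHA-384 and SHA-512. Here two hypothetical vtables, \cw{tally_from_zero} and \cw{tally_from_hundred}, share one data type and one \cw{new} function, and differ only in their \cw{reset} method; all of these names are assumptions made for illustration.

```c
#include <stddef.h>   /* offsetof */
#include <stdlib.h>   /* malloc, free, abort */

#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Tally Tally;
typedef struct TallyVtable TallyVtable;
struct TallyVtable {
    Tally *(*new)(const TallyVtable *vt);
    void (*free)(Tally *);
    void (*reset)(Tally *);
    void (*add)(Tally *, unsigned n);
    unsigned (*total)(Tally *);
};
struct Tally { const TallyVtable *vt; };

/* One implementation structure, used by both variants. */
typedef struct TallyImpl { unsigned acc; Tally tally; } TallyImpl;

/* Shared constructor: it cannot hard-code a vtable, so it stores
 * whichever one the caller passed in. */
static Tally *tally_impl_new(const TallyVtable *vt)
{
    TallyImpl *t = malloc(sizeof(*t));
    if (!t)
        abort();
    t->tally.vt = vt;
    vt->reset(&t->tally);     /* variant-specific initial state */
    return &t->tally;
}
static void tally_impl_free(Tally *t)
{ free(container_of(t, TallyImpl, tally)); }
static void tally_impl_add(Tally *t, unsigned n)
{ container_of(t, TallyImpl, tally)->acc += n; }
static unsigned tally_impl_total(Tally *t)
{ return container_of(t, TallyImpl, tally)->acc; }

/* The only methods that differ between the two variants. */
static void reset_to_zero(Tally *t)
{ container_of(t, TallyImpl, tally)->acc = 0; }
static void reset_to_hundred(Tally *t)
{ container_of(t, TallyImpl, tally)->acc = 100; }

const TallyVtable tally_from_zero = {
    .new = tally_impl_new, .free = tally_impl_free,
    .reset = reset_to_zero,
    .add = tally_impl_add, .total = tally_impl_total,
};
const TallyVtable tally_from_hundred = {
    .new = tally_impl_new, .free = tally_impl_free,
    .reset = reset_to_hundred,
    .add = tally_impl_add, .total = tally_impl_total,
};

/* Creates a tally of the requested variant, adds 5, returns the total. */
unsigned tally_demo(const TallyVtable *vt)
{
    Tally *t = vt->new(vt);
    t->vt->add(t, 5);
    unsigned result = t->vt->total(t);
    t->vt->free(t);
    return result;
}
```

If \cw{tally_impl_new} ignored its argument and hard-coded \cw{&tally_from_zero}, objects created through \cw{tally_from_hundred} would silently behave like the other variant.
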
\b Another piece of \q{stunt object orientation} made possible by this
scheme is that you can write two vtables that both use the same
structure layout for the implementation object, and have an object
\e{transform from one to the other} part way through its lifetime, by
overwriting its own vtable pointer field. For example, the
\cw{sesschan} type that handles the server side of an SSH terminal
session will sometimes transform in mid-lifetime into an SCP or SFTP
file-transfer channel in this way, at the point where the client sends
an \cq{exec} or \cq{subsystem} request that indicates that that's what
it wants to do with the channel.

\lcont{

This concept would be difficult to arrange in C++. In Rust, it
wouldn't even \e{make sense}, because in Rust, objects implementing a
trait don't even contain a vtable pointer at all \dash instead, the
\q{trait object} type (identifying a specific instance of some
implementation of a given trait) consists of a pair of pointers, one
to the object itself and one to the vtable. In that model, the only
way you could make an existing object turn into a different trait
would be to know where all the pointers to it were stored elsewhere in
the program, and persuade all their owners to rewrite them.

}

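A heavily simplified sketch of such a mid-lifetime transformation is shown below. It is not how \cw{sesschan} is actually written: the \cw{Channel} trait, the \cw{ChanState} structure and the \cq{start-transfer} request string are all invented for the example.

```c
#include <stddef.h>   /* offsetof */
#include <stdlib.h>   /* malloc, free, abort */
#include <string.h>   /* strcmp */

#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Channel Channel;
typedef struct ChannelVtable ChannelVtable;
struct ChannelVtable {
    void (*free)(Channel *);
    void (*request)(Channel *, const char *req);
    const char *(*kind)(Channel *);
};
struct Channel { const ChannelVtable *vt; };

/* One structure layout, shared by both vtables. */
typedef struct ChanState { unsigned requests_seen; Channel chan; } ChanState;

static void chan_free(Channel *c) { free(container_of(c, ChanState, chan)); }

/* The "transfer" personality. */
static void transfer_request(Channel *c, const char *req)
{
    (void)req;
    container_of(c, ChanState, chan)->requests_seen++;
}
static const char *transfer_kind(Channel *c) { (void)c; return "transfer"; }
static const ChannelVtable transfer_vt = {
    .free = chan_free, .request = transfer_request, .kind = transfer_kind,
};

/* The "session" personality, which can turn into the other one. */
static void session_request(Channel *c, const char *req)
{
    container_of(c, ChanState, chan)->requests_seen++;
    if (strcmp(req, "start-transfer") == 0)
        c->vt = &transfer_vt;     /* overwrite our own vtable pointer */
}
static const char *session_kind(Channel *c) { (void)c; return "session"; }
static const ChannelVtable session_vt = {
    .free = chan_free, .request = session_request, .kind = session_kind,
};

Channel *session_channel_new(void)
{
    ChanState *st = malloc(sizeof(*st));
    if (!st)
        abort();
    st->requests_seen = 0;
    st->chan.vt = &session_vt;
    return &st->chan;
}

/* Returns the channel's kind after it has processed one request. */
const char *kind_after_request(const char *req)
{
    Channel *c = session_channel_new();
    c->vt->request(c, req);
    const char *kind = c->vt->kind(c);   /* points at a string literal */
    c->vt->free(c);
    return kind;
}
```

Every existing \cw{Channel *} pointer to the object stays valid across the change, because the vtable pointer lives inside the object rather than alongside each reference to it.
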
\b Another stunt you can do is to have a vtable that doesn't have a
corresponding implementation structure at all, because the only
methods implemented in it are the constructors, and they always end up
returning an implementation of some other vtable. For example, some of
PuTTY's crypto primitives have a hardware-accelerated version and a
pure software version, and decide at run time which one to use (based
on whether the CPU they're running on supports the necessary
acceleration instructions). So, for example, there are vtables for
\cw{ssh_sha256_sw} and \cw{ssh_sha256_hw}, each of which has its own
data layout and its own implementations of all the methods; and then
there's a top-level vtable \cw{ssh_sha256}, which only provides the
\q{new} method, and implements it by calling the \q{new} method on one
or other of the subtypes depending on what it finds out about the
machine it's running on. That top-level selector vtable is nearly
always the one used by client code. (Except for the test suite, which
has to instantiate both of the subtypes in order to make sure they
both pass the tests.)

\lcont{

As a result, the top-level selector vtable \cw{ssh_sha256} doesn't
need to implement any method that takes an \cw{ssh_hash *}
parameter, because no \cw{ssh_hash} object is ever constructed whose
\cw{vt} field points to \cw{&ssh_sha256}: they all point to one of the
other two full implementation vtables.

}

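The selector arrangement can be sketched as follows. This is an illustration only: the \cw{Mixer} trait is invented, the \cw{pretend_hw_available} flag stands in for a real CPU feature probe, and for brevity the two full implementations share a data layout, which the real software and hardware SHA-256 implementations do not.

```c
#include <stddef.h>   /* offsetof */
#include <stdlib.h>   /* malloc, free, abort */

#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Mixer Mixer;
typedef struct MixerVtable MixerVtable;
struct MixerVtable {
    Mixer *(*new)(const MixerVtable *vt);
    void (*free)(Mixer *);
    unsigned (*mix)(Mixer *, unsigned x);
};
struct Mixer { const MixerVtable *vt; };

static inline Mixer *mixer_new(const MixerVtable *vt) { return vt->new(vt); }
static inline void mixer_free(Mixer *m) { m->vt->free(m); }
static inline unsigned mixer_mix(Mixer *m, unsigned x)
{ return m->vt->mix(m, x); }

/* The two full implementations. */
typedef struct MixerImpl { unsigned calls; Mixer mixer; } MixerImpl;

static Mixer *impl_new(const MixerVtable *vt)
{
    MixerImpl *mi = malloc(sizeof(*mi));
    if (!mi)
        abort();
    mi->calls = 0;
    mi->mixer.vt = vt;
    return &mi->mixer;
}
static void impl_free(Mixer *m) { free(container_of(m, MixerImpl, mixer)); }
static unsigned sw_mix(Mixer *m, unsigned x)
{
    container_of(m, MixerImpl, mixer)->calls++;
    return x * 31u + 7u;              /* "plain" version */
}
static unsigned hw_mix(Mixer *m, unsigned x)
{
    container_of(m, MixerImpl, mixer)->calls++;
    return (x << 5) - x + 7u;         /* same result, "fast" version */
}
const MixerVtable mixer_sw = {
    .new = impl_new, .free = impl_free, .mix = sw_mix,
};
const MixerVtable mixer_hw = {
    .new = impl_new, .free = impl_free, .mix = hw_mix,
};

/* Hypothetical stand-in for a run-time CPU capability check. */
int pretend_hw_available = 0;

/* The selector's constructor never builds an object of its own: it
 * delegates to one of the real vtables. */
static Mixer *select_new(const MixerVtable *vt)
{
    (void)vt;
    const MixerVtable *real = pretend_hw_available ? &mixer_hw : &mixer_sw;
    return real->new(real);
}

/* Only `new` is filled in. No object's vt field ever points here, so
 * the NULL free and mix entries can never be reached. */
const MixerVtable mixer_auto = { .new = select_new };
```

Client code calls \cw{mixer_new(&mixer_auto)} and gets back an object whose \cw{vt} field points to \cw{mixer_sw} or \cw{mixer_hw}, never to \cw{mixer_auto} itself.
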
\H{udp-compile-once} Single compilation of each source file

The PuTTY build system for any given platform works on the following