![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Just like with everything else, you'd probably end up asking yourself: how's it done? So here's this chapter, just to wheten your appetite...
6.12.1 How Arrays Work Delving into something old 6.12.2 Two flavors of equality Delving into something new 6.12.3 The truth about metaclasses Or, the truth on metaclasses 6.12.4 The truth of Smalltalk performance Hmm... they told me Smalltalk is slow...
The Array Class. Our examples have already shown the Array class, and its use is fairly obvious. For many applications, it will fill all your needs--when you need an array in a new class, you keep an instance variable, allocate a new Array and assign it to the variable, and then send array accesses via the instance variable.
This technique even works for string-like objects, although it is wasteful of storage. An Array object uses a Smalltalk pointer for each slot in the array; its exact size is transparent to the programmer, but you can generally guess that it'll be roughly the word size of your machine. (41) For storing an array of characters, therefore, an Array works but is inefficient.
Arrays at a Lower Level. So let's step down to a lower level of data structure. A ByteArray is much like an Array, but each slot holds only an integer from 0 to 255-and each slot uses only a byte of storage. If you only needed to store small quantities in each array slot, this would therefore be a much more efficient choice than an Array. As you might guess, this is the type of array which a String uses.
Aha! But when you go back to chapter 9 and look at the Smalltalk hierarchy, you notice that String does not inherit from ByteArray. To see why, we must delve down yet another level, and arrive at the basic methods for setting up the structure of the instances of a class.
When we implemented our NiledArray example, we used
<shape: #inherit>
. The shape is exactly that:
the fundamental structure of Smalltalk objects created within a given
class. Let's consider the differences in the next sub-sections.
#pointer
new:
message.
The number passed as an argument to new:
causes the new
object, in addition to the space for instance variables, to
also have that many slots of unnamed (indexed) storage allocated.
The analog in C would be to have a dynamically allocated structure
with some scalar fields, followed at its end by a array of pointers.
#byte
char
.
#word
long
. This kind of subclass is only used in a few
places in Smalltalk.
#character
char
.
There are many more shapes for more specialized usage. All of them specify the same kind of "array-like" behavior, with different data types.
How to access this new arrays? You already know how to access instance
variables--by name. But there doesn't seem to be a name for this new
storage. The way an object accesses it is to send itself
array-type messages like at:
, at:put:
, and so forth.
The problem is when an object wants to add a new level
of interpretation to these messages. Consider
a Dictionary--it is a pointer-holding object
but its at:
message is in terms of a key, not an integer
index of its storage. Since it has redefined the at:
message, how
does it access its fundamental storage?
The answer is that Smalltalk has defined basicAt:
and
basicAt:put:
, which will access the basic storage even when
the at:
and at:put:
messages have been defined to provide
a different abstraction.
This can get pretty confusing in the abstract, so let's do an example to show how it's pretty simple in practice. Smalltalk arrays tend to start at 1; let's define an array type whose permissible range is arbitrary.
ArrayedCollection subclass: RangedArray [ | offset | <comment: 'I am an Array whose base is arbitrary'> RangedArray class >> new: size [ <category: 'instance creation'> ^self new: size base: 1 ] RangedArray class >> new: size base: b [ <category: 'instance creation'> ^(super new: size) init: b ] init: b [ <category: 'init'> offset := (b - 1). "- 1 because basicAt: works with a 1 base" ^self ] rangeCheck: i [ <category: 'basic'> (i <= offset) | (i > (offset + self basicSize)) ifTrue: [ 'Bad index value: ' printOn: stderr. i printOn: stderr. Character nl printOn: stderr. ^self error: 'illegal index' ] ] at: [ self rangeCheck: i. ^self basicAt: i - offset ] at: i put: v [ self rangeCheck: i. ^self basicAt: i- offset put: v ] ] |
The code has two parts; an initialization, which simply records what index you wish the array to start with, and the at: messages, which adjust the requested index so that the underlying storage receives its 1-based index instead. We've included a range check; its utility will demonstrate itself in a moment:
a := RangedArray new: 10 base: 5. a at: 5 put: 0 a at: 4 put: 1 |
Since 4 is below our base of 5, a range check error occurs. But this check can catch more than just our own misbehavior!
a do: [:x| x printNl] |
Our do: message handling is broken! The stack backtrace pretty much tells the story:
RangedArray>>#rangeCheck: RangedArray>>#at: RangedArray>>#do: |
Our code received a do: message. We didn't define one, so we inherited the existing do: handling. We see that an Integer loop was constructed, that a code block was invoked, and that our own at: code was invoked. When we range checked, we trapped an illegal index. Just by coincidence, this version of our range checking code also dumps the index. We see that do: has assumed that all arrays start at 1.
The immediate fix is obvious; we implement our own do:
RangedArray extend [ do: aBlock [ <category: 'basic'> 1 to: (self basicSize) do: [:x| aBlock value: (self basicAt: x) ] ] ] |
But the issues start to run deep. If our parent class believed that it knew enough to assume a starting index of 1(44), why didn't it also assume that it could call basicAt:? The answer is that of the two choices, the designer of the parent class chose the one which was less likely to cause trouble; in fact all standard Smalltalk collections do have indices starting at 1, yet not all of them are implemented so that calling basicAt: would work.(45)
Object-oriented methodology says that one object should be entirely opaque to another. But what sort of privacy should there be between a higher class and its subclasses? How many assumption can a subclass make about its superclass, and how many can the superclass make before it begins infringing on the sovereignty of its subclasses?
Alas, there are rarely easy answers, and this is just an example. For this particular problem, there is an easy solution. When the storage need not be accessed with peak efficiency, you can use the existing array classes. When every access counts, having the storage be an integral part of your own object allows for the quickest access--but remember that when you move into this area, inheritance and polymorphism become trickier, as each level must coordinate its use of the underlying array with other levels.
In C, if you have a function for comparing strings, you might try to write it:
streq(char *p, char *q) { return (p == q); } |
But clearly this is wrong! The reason is that you can have two copies of a string, each with the same contents but each at its own address. A correct string compare must walk its way through the strings and compare each element.
In Smalltalk, exactly the same issue exists, although the details of manipulating storage addresses are hidden. If we have two Smalltalk strings, both with the same contents, we don't necessarily know if they're at the same storage address. In Smalltalk terms, we don't know if they're the same object.
The Smalltalk dictionary is searched frequently. To speed the search, it would be nice to not have to compare the characters of each element, but only compare the address itself. To do this, you need to have a guarantee that all strings with the same contents are the same object. The String class, created like:
y := 'Hello' |
does not satisfy this. Each time you execute this line, you may well get a new object. But a very similar class, Symbol, will always return the same object:
y := #Hello |
In general, you can use strings for almost all your tasks. If you ever get into a performance-critical function which looks up strings, you can switch to Symbol. It takes longer to create a Symbol, and the memory for a Symbol is never freed (since the class has to keep tabs on it indefinitely to guarantee it continues to return the same object). You can use it, but use it with care.
This tutorial has generally used the strcmp()-ish kind of
checks for equality. If you ever need to ask the question
"is this the same object?", you use the ==
operator
instead of =
:
x := y := 'Hello' (x = y) printNl (x == y) printNl y := 'Hel', 'lo' (x = y) printNl (x == y) printNl x := #Hello y := #Hello (x = y) printNl (x == y) printNl |
Using C terms, =
compares contents like strcmp()
.
==
compares storage addresses, like a pointer comparison.
Everybody, sooner or later, looks for the implementation of the
#new
method in Object class. To their surprise, they
don't find it; if they're really smart, they search for implementors
of #new in the image and they find out it is implemented by
Behavior
... which turns out to be a subclass of Object! The
truth starts showing to their eyes about that sentence that everybody
says but few people understand: "classes are objects".
Huh? Classes are objects?!? Let me explain.
Open up an image; then type the text following the
st>
prompt.
st> Set superclass! HashedCollection st> HashedCollection superclass! Collection st> Collection superclass! Object st> Object superclass! nil |
Nothing new for now. Let's try something else:
st> #(1 2 3) class! Array st> '123' class! String st> Set class! Set class st> Set class class! Metaclass |
You get it, that strange Set class
thing is something
called "a meta-class"... let's go on:
st> ^Set class superclass! Collection class st> ^Collection class superclass! Object class |
You see, there is a sort of `parallel' hierarchy between classes and metaclasses. When you create a class, Smalltalk creates a metaclass; and just like a class describes how methods for its instances work, a metaclass describes how class methods for that same class work.
Set
is an instance of the metaclass, so when you invoke
the #new
class method, you can also say you are invoking
an instance method implemented by Set class
. Simply put,
class methods are a lie: they're simply instance methods that
are understood by instances of metaclasses.
Now you would expect that Object class superclass
answers
nil class
, that is UndefinedObject
. Yet you saw that
#new
is not implemented there... let's try it:
st> ^Object class superclass! Class |
Uh?!? Try to read it aloud: the Object class
class inherits
from the Class
class. Class
is the abstract superclass
of all metaclasses, and provides the logic that allows you to create
classes in the image. But it is not the termination point:
st> ^Class superclass! ClassDescription st> ^ClassDescription superclass! Behavior st> ^Behavior superclass! Object |
Class is a subclass of other classes. ClassDescription
is
abstract; Behavior
is concrete but lacks the methods
and state that allow classes to have named instance variables,
class comments and more. Its instances are called
light-weight classes because they don't have separate
metaclasses, instead they all share Behavior
itself as
their metaclass.
Evaluating Behavior superclass
we have worked our way up to
class Object again: Object is the superclass of all instances as well
as all metaclasses. This complicated system is extremely powerful,
and allows you to do very interesting things that you probably did
without thinking about it--for example, using methods such as
#error:
or #shouldNotImplement
in class methods.
Now, one final question and one final step: what are metaclasses instances of? The question makes sense: if everything has a class, should not metaclasses have one?
Evaluate the following:
st> meta := Set class st> 0 to: 4 do: [ :i | st> i timesRepeat: [ Transcript space ] st> meta printNl st> meta := meta class st> ] Set class Metaclass Metaclass class Metaclass Metaclass class 0 |
If you send #class
repeatedly, it seems that you end up
in a loop made of class Metaclass
(46) and its
own metaclass, Metaclass class
. It looks like class
Metaclass is an instance of an instance of itself.
To understand the role of Metaclass
, it can be useful
to know that the class creation is implemented there.
Think about it.
Random class
implements creation and
initialization of its instances' random number seed;
analogously, Metaclass class
implements creation and
initialization of its instances, which are metaclasses.
Metaclass
implements creation and initialization of
its instances, which are classes (subclasses of Class
).
The circle is closed. In the end, this mechanism implements a clean, elegant and (with some contemplation) understandable facility for self-definition of classes. In other words, it is what allows classes to talk about themselves, posing the foundation for the creation of browsers.
Everybody says Smalltalk is slow, yet this is not completely true for at least three reasons. First, most of the time in graphical applications is spent waiting for the user to "do something", and most of the time in scripting applications (which GNU Smalltalk is particularly well versed in) is spent in disk I/O; implementing a travelling salesman problem in Smalltalk would indeed be slow, but for most real applications you can indeed exchange performance for Smalltalk's power and development speed.
Second, Smalltalk's automatic memory management is faster than C's manual one. Most C programs are sped up if you relink them with one of the garbage collecting systems available for C or C++.
Third, even though very few Smalltalk virtual machines are as optimized as, say, the Self environment (which reaches half the speed of optimized C!), they do perform some optimizations on Smalltalk code which make them run many times faster than a naive bytecode interpreter. Peter Deutsch, who among other things invented the idea of a just-in-time compiler like those you are used to seeing for Java(47), once observed that implementing a language like Smalltalk efficiently requires the implementor to cheat... but that's okay as long as you don't get caught. That is, as long as you don't break the language semantics. Let's look at some of these optimizations.
For certain frequently used 'special selectors', the compiler emits a send-special-selector bytecode instead of a send-message bytecode. Special selectors have one of three behaviors:
#do:
for example.
#at:
, #at:put:
, #size
) are
assigned to special bytecodes because they are subject to a special
caching optimization. These selectors often result in calling a
virtual machine primitive, so GNU Smalltalk remembers which primitve
was last called as the result of sending them. If we send #at:
100 times for the same class, the last 99 sends are directly mapped
to the primitive, skipping the method lookup phase.
No-lookup methods do contain a primitive number specification,
<primitive: xx>
, but it is used only when the method is
reached through a #perform:...
message send. Since
the method is not normally looked up, deleting the primitive name
specification cannot in general prevent this primitive from running.
No-lookup pairs are listed below:
Integer /Integer |
for | + - * = ~= > < >= <=
|
Integer /Integer |
for | // \\ bitOr: bitShift: bitAnd:
|
Any pair of objects | for | == isNil notNil class
|
BlockClosure | for | value value: blockCopy: (48)
|
Other messages are open coded by the compiler. That is, there are no message sends for these messages--if the compiler sees blocks without temporaries and with the correct number of arguments at the right places, the compiler unwinds them using jump bytecodes, producing very efficient code. These are:
to:by:do: if the second argument is an integer literal to:do: timesRepeat: and:, or: ifTrue:ifFalse:, ifFalse:ifTrue:, ifTrue:, ifFalse: whileTrue:, whileFalse: |
Other minor optimizations are done. Some are done by a peephole optimizer which is ran on the compiled bytecodes. Or, for example, when GNU Smalltalk pushes a boolean value on the stack, it automatically checks whether the following bytecode is a jump (which is a common pattern resulting from most of the open-coded messages above) and combines the execution of the two bytecodes. All these snippets can be optimized this way:
1 to: 5 do: [ :i | ... ] a < b and: [ ... ] myObject isNil ifTrue: [ ... ] |
That's all. If you want to know more, look at the virtual machine's source code in `libgst/interp-bc.inl' and at the compiler in `libgst/comp.c'.
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |