Extends the the Java™ Collections Framework by providing type-specific maps, sets, lists and priority queues with a small memory footprint and fast access and insertion.

See:
          Description

Packages
it.unimi.dsi.fastutil  
it.unimi.dsi.fastutil.booleans Provides type-specific classes for boolean elements or keys.
it.unimi.dsi.fastutil.bytes Provides type-specific classes for byte elements or keys.
it.unimi.dsi.fastutil.chars Provides type-specific classes for character elements or keys.
it.unimi.dsi.fastutil.doubles Provides type-specific classes for double elements or keys.
it.unimi.dsi.fastutil.floats Provides type-specific classes for float elements or keys.
it.unimi.dsi.fastutil.ints Provides type-specific classes for integer elements or keys.
it.unimi.dsi.fastutil.longs Provides type-specific classes for long elements or keys.
it.unimi.dsi.fastutil.objects Provides type-specific classes for object elements or keys.
it.unimi.dsi.fastutil.shorts Provides type-specific classes for short elements or keys.

 

Extends the the Java™ Collections Framework by providing type-specific maps, sets, lists and priority queues with a small memory footprint and fast access and insertion. It is free software distributed under the GNU Lesser General Public License.

Warning: As of 4.0, the introduction of the new high-performance addElements() methods to type-specific lists made previous implementations of type-specific lists incomplete. However, the new methods are implemented by type-specific abstract lists, so a recompilation should be sufficient.

Moreover, the abstract generic version of get(), put() and remove() for maps with non-object keys or values now always return null to denote a missing key. In a few cases they used to return an object-wrapped default return value.

As of 3.1, the introduction of several new static methods made it clear that accumulating all such methods in a single static container class would not have been feasible (Collections would have more than a thousand methods). Thus, static methods are spread, whenever appropriate, in type-specific classes such as IntSets. The old it.unimi.dsi.fastutil.Iterators has been fit into this framework, too, and now all its methods are distributed in type-specific classes (e.g., IntIterators).

As a consequence, source code compatibility is broken for methods contained in the old it.unimi.dsi.fastutil.Iterators class.

Moreover, after facing of a number of ambiguities in method invocation, it has been decided to rename all removal methods in type-specific versions of Collection to rem() (this was previously true just of IntCollection.rem(int)).

Package Specification

The classes of this package specialize the most useful HashSet, HashMap, LinkedHashSet, LinkedHashMap, TreeSet, TreeMap, IdentityHashMap, ArrayList and Stack classes to versions that accept a specific kind of key or value (e.g., integers). Besides, there are also several types of priority queues and a large collection of static objects and methods (such as immutable empty containers, comparators implementing the opposite of the natural order, iterators obtained by wrapping an array and so on.

To understand what's going on at a glance, the best thing is to look at the examples provided. If you already used the Collections Framework, everything should look rather natural. If, in particular, you use an IDE such as Eclipse, which can suggest you the method names, all you need to know is the right name for the class you need.

All data structures in fastutil implement their standard counterpart interface (e.g., Map for maps). Thus, they can be just plugged into existing code, using the standard access methods (of course, any attempt to use the wrong type for keys or values will produce a ClassCastException). However, they also provide (whenever possible) many polymorphic versions of the most used methods that lessen the tedious "type juggling" that is well known to Java programmers. In doing so, they implement more stringent interfaces that extend the standard ones (e.g., Int2IntSortedMap or IntListIterator).

Of course, the main point of type-specific data structures is that the absence of wrappers around primitive types can increas speed and reduce space occupancy by several times. The implementation of generics in Java will not change this fact, since there will be no genericity for primitive types: thanks to automatic boxing/deboxing of primitive types, type juggling will be eliminated, but the data structures themselves will remain much larger and slower than needed.

The implementation techniques used in fastutil are quite different than those of java.util: for instance, open-addressing hash tables, threaded AVL trees, threaded red-black trees and exclusive-or lists. An effort has also been made to provide powerful derived objects: for instance, the keys of sorted maps are sorted and iterators on sorted containers are always bidirectional.

More generally, the rationale behing fastutil is that you should never need to code explicitly natural transformations. You do to not need to define an anonymous class to iterate over an array of integers—just wrap it. You do not need to write a loop to put the characters returned by an iterator into a set—just use the right constructor. And so on.

The Names

In general, class names adhere to the general pattern

valuetype collectiontype

for collections, and

keytype 2 valuetype maptype

for maps.

By "type" here I mean a capitalized primitive type, Object or Reference. In the latter case, we are treating objects, but their equality is established by reference equality (that is, without invoking equals()), similarly to IdentityHashMap. Of course, reference-based classes are significantly faster.

Thus, an IntOpenHashSet stores integers efficiently and implements IntSet, whereas a Long2IntAVLTreeMap does the same for maps from longs to integers (but the map will be sorted, tree based, and balanced using the AVL criterion), implementing Long2IntMap. If you need additional flexibility in choosing your hash strategy, you can put, say, arrays of integers in a ObjectOpenCustomHashSet, maybe using the ready-made hash strategy for arrays. A LongLinkedOpenHashSet stores longs in a hash table, but provides a predictable iteration order (the insertion order) and access to first/last elements of the order. A Reference2ReferenceOpenHashMap is similar to an IdentityHashMap. You can manage a priority queue of characters in a heap using a CharHeapPriorityQueue, which implements CharPriorityQueue. Front-coded lists are highly specialized immutable data structures that store compactly a large number of arrays: if you don't know them you probably don't need them.

Since there are eight primitive types in Java, and we support reference-based containers, we get 1310 (!) classes (some nonsensical classes, such as Boolean2BooleanAVLTreeMap, are not generated). Many classes are generated just to mimic the hierarchy of java.util so to redistribute common code in a similar way. There are also several abstract classes that ease significantly the creation of new type-specific classes by providing automatically generic methods based on the type-specific ones.

The huge number of classes required a suitable division in subpackages (more than anything else, to avoid crashing browsers with a preposterous package summary). Each subpackage is characterized by the type of elements or keys: thus, for instance, IntSet belongs to it.unimi.dsi.fastutil.ints (the plural is required, as int is a keyword and cannot be used in a package name), as well as Int2ReferenceRBTreeMap. Note that all classes for non-primitive elements and keys are gathered in it.unimi.dsi.fastutil.objects. Finally, a number of non-type-specific classes have been gathered in it.unimi.dsi.fastutil.

An In–Depth Look

The following table summarizes the available interfaces and implementations. To get more information, you can look at a specific implementation in it.unimi.dsi.fastutil or, for instance, it.unimi.dsi.fastutil.ints.

InterfacesAbstract ImplementationsImplementations
CollectionAbstractCollection
SetAbstractSetOpenHashSet, OpenCustomHashSet
SortedSetAbstractSortedSetRBTreeSet, AVLTreeSet, LinkedOpenHashSet
MapAbstractMapOpenHashMap, OpenCustomHashMap
SortedMapAbstractSortedMapRBTreeMap, AVLTreeMap, LinkedOpenHashMap
ListAbstractListArrayList, ArrayFrontCodedList
PriorityQueue†AbstractPriorityQueue†HeapPriorityQueue, ArrayPriorityQueue
IndirectPriorityQueue‡AbstractIndirectPriorityQueue‡HeapSemiIndirectPriorityQueue, HeapIndirectPriorityQueue, ArrayIndirectPriorityQueue
IndirectDoublePriorityQueue‡AbstractIndirectDoublePriorityQueue‡HeapSesquiIndirectDoublePriorityQueue, HeapIndirectDoublePriorityQueue, ArrayIndirectDoublePriorityQueue
Stack†AbstractStack†ArrayList
IteratorAbstractIterator
ComparatorAbstractComparator
BidirectionalIterator†AbstractBidirectionalIterator
ListIteratorAbstractListIterator

†: this class has also a non-type-specific implementation in it.unimi.dsi.fastutil.

‡: this class has only a non-type-specific implementation in it.unimi.dsi.fastutil.

Note that abstract implementations are named by prefixing the interface name with Abstract. Thus, if you want to define a type-specific structure holding a set of integers without the hassle of defining object-based methods, you should inherit from AbstractIntSet.

The following table summarizes static containers, which usually give rise both to a type-specific and to a generic class:

Static Containers
Collections†
Sets†
SortedSets†
Maps†
SortedMaps
Lists†
Arrays†
Heaps
SemiIndirectHeaps
IndirectHeaps
PriorityQueues†
IndirectPriorityQueues‡
IndirectDoublePriorityQueues‡
Iterators†
Comparators
Hash‡
HashCommon‡

†: this class has also a non-type-specific implementation in it.unimi.dsi.fastutil.

‡: this class has only a non-type-specific implementation in it.unimi.dsi.fastutil.

The static containers provide also special-purpose implementations for all kinds of empty structures (including arrays) and singletons.

Warnings

All classes are not synchronized. If multiple threads access one of these classes concurrently, and at least one of the threads modifies it, it must be synchronized externally. Iterators will behave unpredictably in the presence of concurrent modifications. Reads, however, can be carried out concurrently.

Reference-based classes violate the Map contract. They intentionally compare objects by reference, and do not use the equals() method. They should be used only when reference-based equality is desired (for instance, if all objects involved are canonized, as it happens with interned strings).

Linked classes do not implement wholly the SortedMap interface. They provide methods to get the first and last element in iteration order, but any submap or subset method will cause an UnsupportedOperationException (this may change in future versions).

Substructures in sorted classes allow the creation of arbitrary substructures. In java.util, instead, you can only create contained sub-substructures (BTW, why?). For instance, (new TreeSet()).tailSet(new Integer(1)).tailSet(new Integer(0)) will throw an exception, but (new IntRBTreeSet()).tailSet(1).tailSet(0) won't.

Immutability is syntactically based (as opposed to semantically based). All methods that are known not to be causing modifications to the structure at compile time will not throw exceptions (e.g., EMPTY_SET.clear()). All other methods will cause an UnsupportedOperationException. Note that (as of Java 1.4) the situation in java.util is definitely different, and inconsistent: for instance, in singletons add() always throws an exception, whereas remove() does it only if the singleton would be modified. This behaviour agrees with the interface documentation, but it is nonetheless confusing.

Additional Features and Methods

The new interfaces add some very natural methods. Moreover, whenever possible, the object returned is type-specific, or even implements a more powerful interface.

Due to some limitations of Java (you cannot override covariantly the return value of an interface, i.e., with a method returning a more specific value), however, sometimes these features are available only by means of type casting.

More in detail:

There are a few quirks, however, that you should be aware of:

Static Container Classes

fastutil provides a number of static methods and singletons, much like Collections. To avoid creating classes with hundreds of methods, there are separate containers for sets, lists, maps and so on. Generic containers are placed in it.unimi.dsi.fastutil, whereas type-specific containers are in the appropriate package. You should look at the documentation of the static classes contained in it.unimi.dsi.fastutil, and in type-specific static classes such as CharSets, Float2ByteSortedMaps, LongArrays, FloatHeaps and DoublePriorityQueues. Presently, you can easily obtain empty collections, empty type-specific collections, singletons, synchronized versions of any type-specific container and unmodifiable versions of containers and iterators (of course, unmodifiable containers always return unmodifiable iterators).

On a completely different side, the type-specific static container classes for arrays provide several useful methods that allow to treat an array much like an array-based list, hiding completely the growth logic. In many cases, using this methods and an array is even simpler then using a full-blown type-specific array-based list because elements access is syntactically much simpler. The version for objects uses reflection to return arrays of the same type of the argument.

For the same reason, fastutil provides a full implementation of methods that manipulate arrays as type-specific heaps, semi-indirect heaps and indirect heaps.

Iterators and Comparators

fastutil provides type-specific iterators and comparators. The interface of a fastutil iterator is slightly more powerful than that of a java.util iterator, as it contains a skip() method that allows to skip over a list of elements (an analogous method is provided for bidirectional iterators). For objects (even those managed by reference), the extended interface is named ObjectIterator; it is the return type, for instance, of ObjectCollection.objectIterator(). fastutil provides also classes and methods that makes it easy to create type-specific iterators and comparators. There are abstract versions of each (type-specific) iterator and comparator that implement in the obvious way some of the methods (see, e.g., AbstractIntIterator or AbstractIntComparator).

A plethora of useful static methods is also provided by various type-specific static containers (e.g., IntIterators) and by Iterators: among other things, you can wrap arrays and standard iterators in type-specific iterators, generate them giving an interval of elements to be returned, concatenate them or pour them into a set.

Queues

fastutil offers three types of queues: direct queues, indirect queues and double indirect queues. A direct queue offers type-specific method to enqueue and dequeue elements. An indirect queue needs a reference array, specified at construction time: enqueue and dequeue operations refer to indices in the reference array. The advantage is that it may be possible to notify the change of any element of the reference array, or even to remove an arbitrary element. Finally, double indirect queues maintain at the same time two orders (very commonly, opposite orders).

Queues have two type of implementations: a trivial array-based implementation, and a heap-based implementation. In particular, heap-based indirect queues may be fully indirect or just semi-indirect: in the latter case, there is no need for an explicit indirection array (which saves one integer per queue entry), but not all operations will be avalable. Correspondingly, double priority queues have a sesqui (i.e., one-and-a-half) indirect implementation and a fully indirect implementation.

Custom Hashing

Sometimes, the behaviour of the built-in equality and hashing methods is not what you want. In particular, this happens if you store in a hash-based collection arrays, and you would like to compare them by equality. For this kind of applications, fastutil provides custom hash strategies, which define new equality and hashing methods to be used inside the collection. There are even ready-made strategies for arrays. Note, however, that fastutil containers do not cache hash codes, so custom hash strategies must be efficient.

Abstract Classes

fastutil provides a wide range of abstract classes, to help in implementing its interfaces. They take care, for instance, of providing wrappers for non-type-specific method calls, so that you have to write just the (usually simpler) type-specific version.

Performance

The main reason behind fastutil is performance, both in time and in space. The relevant methods of type-specific hash maps and sets are something like 2 to 10 times faster than those of the standard classes. Note that performance of hash-based classes on object keys is usually worse (from a few percent to doubled time) than that of java.util, because fastutil classes do not cache hash codes (albeit it will not be that bad if keys cache internally hash codes, as in the case of String). Of course, you can try to get more speed from hash tables using a small load factor (say, 1/2).

For tree-based classes you have two choices: AVL and red-black trees. The essential difference is that AVL trees are more balanced (their height is at most 1.44 log n), whereas red-black trees have faster deletions (but their height is at most 2 log n). So on small trees red-black trees could be faster, but on very large sets AVL trees will shine. In general, AVL trees have slightly slower updates but faster searches; however, on very large collections the smaller height may lead in fact to faster updates, too.

fastutil reduces enormously the creation and collection of objects. First of all, if you use the polymorphic methods and iterators no wrapper objects have to be created. Moreover, since fastutil uses open-addressing hashing techniques, creation and garbage collection of hash-table entries are avoided (but tables have to be rehashed whenever they are filled beyond the load factor). The major reduction of the number of objects around has a definite (but very difficult to measure) impact on the whole application (as garbage collection runs proportionally to the number of alive objects).

Whenever possible, fastutil tries to gain some speed by checking for faster interfaces: for instance, the various set-theoretic methods addAll(), retainAll(), ecc. check whether their arguments are type-specific and use faster iterators and accessors accordingly.

Deletions in Hash Tables

Since deletions in hash tables are handled simply by tagging, they are very fast per se, but they tend to slow down subsequent accesses (with respect to a table with no deleted entries). In highly dynamical situations, where entries are continuously created and deleted, unsuccessful searches may take linear time (as all entries must be probed).

A partial solution to this problem (which has no known complete solution if you use open addressing with double hashing—cfr. Knuth's section on hashing in the third volume of The Art of Computer Programming) is to call the rehash() method, which will try to rebuild the table remapping all keys. There are also trim() methods that will reduce the table size if possible.

In other words, if your application requires inextricably interleaved insertions, deletions and queries open-addressing hash-table implementations (and in particular fastutil classes) are not the right choice.

Note, however, that fastutil implements a special optimization, usually not found elsewhere, that speeds up probes for recently deleted entries. More details can be found in the documentation of the Hash interface.

The case of linked tables is even more problematic: the deletion of an item requires a linear probe of the links until the item is found, and thus it has potentially linear cost (however, this is not true if the deletion is performed by means of an iterator, or if you delete the last element).

Memory Usage

The absence of wrappers makes data structures in fastutil much smaller: even in the case of objects, however, data structures in fastutil try to be space-efficient.

Hash Tables

To avoid memory waste, (unlinked) hash tables in fastutil keep no additional information about elements (such as a list of keys). In particular, this means that enumerations are always linear in the size of the table (rather than in the number of keys). Usually, this would imply slower iterators. Nonetheless, the iterator code includes a single, tight loop; moreover, it is possible to avoid the creation of wrappers. These two facts make in practice fastutil iterators faster than java.util's.

The memory footprint for a table with n keys is exactly the memory required for the related types times n, plus a overhead of n bytes to store the state of each entry. The absence of wrappers around primitive types can reduce space occupancy by several times (this applies even more to serialized data, e.g., when you save such a data structure in a file). These figures can greatly vary with your virtual machine, JVM versions, CPU etc.

More precisely, when you ask for a map that will hold n elements with load factor 0 < f ≤ 1, p entries are allocated, where p is first prime in Hash.PRIMES larger than n / f. Primes in Hash.PRIMES are roughly multiplicatively spaced by 21/16, so you lose on average about 2% with respect to n / f.

When the table is filled up beyond the load factor, it is rehashed to a larger size. The growth is controlled by the growth factor, which can be set at any time. By default, the table size is doubled (for more information, see IntOpenHashSet), but you can trade speed for memory occupancy by setting a slower growth rate.

In the case of linked hash tables, there is an additional vector of p integers that is used to store link information. Each element records the next and previous element indexes exclusive-or'd together. As a result, linked tables provide bidirectional iterators without having to store two pointers per entry (however, iterators starting from a given element require a linear probe to be initialized, unless the element is the last one).

Since hash codes are not cached, equality on objects is checked first by checking equality of their hashCode(), and then using equals(). This turns out to increase slightly the performance, as many classes (including String) cache their hash codes; in any case, the speed cannot reach that of java.util's hash classes, which cache hash codes.

Balanced Trees

The balanced trees implementation is also very parsimonious. fastutil is based on the excellent (and unbelievably well documented) code contained in Ben Pfaff's GNU libavl, which describes in detail how to handle balanced trees with threads. Thus, the overhead per entry is two pointers and one integer, which compares well to three pointers plus one boolean of the standard tree maps. The trick is that we use the integer bit by bit, so we consume two bits to store thread information, plus one or two bits to handle balancing. As a result, we get bidirectional iterators in constant space and amortized constant time without having to store references to parent nodes.

It should be mentioned that all tree-based classes have a fixed overhead for some arrays that are used as stacks to simulate recursion; in particular, we need 48 booleans for AVL trees and 64 pointers plus 64 booleans for red-black trees.

An Example

Suppose you want to store a sorted map from longs to integers. The first step is to define a variable of the right interface, and assign it a new tree map (say, of the AVL type):

Long2IntSortedMap m = new Long2IntAVLTreeMap();
    

Now we can easily modify and access its content:

m.put( 1, 5 );
m.put( 2, 6 );
m.put( 3, 7 );
m.put( 1000000000L, 10 );
m.get( 1 ); // This method call will return 5
m.get( 4 ); // This method call will return 0
    

We can also try to change the default return value:

m.defaultReturnValue( -1 );
m.get( 4 ); // This method call will return -1
    

By suitable type casting, we can obtain a very powerful iterator:

LongListIterator i = (LongListIterator)((LongSortedSet)m.keySet()).iterator();
// Now we sum all keys
long s = 0;
while( i.hasNext() ) s += i.nextLong();
    

If one just needs a type-specific iterator, there is a special method that avoids casting:

LongIterator i = ((LongSortedSet)m.keySet()).longIterator();
// Now we sum all keys
long s = 0;
while( i.hasNext() ) s += i.nextLong();
    

We now generate a head map, and iterate bidirectionally over it starting from a given point:

// This map contains only keys smaller than 4
Long2IntSortedMap m1 = m.headMap( 4 );
// This iterator is positioned between 2 and 3
LongBidirectionalIterator t = ((LongSortedSet)m1.keySet()).iterator( 2 );
t.previous(); // This method call will return 2 (t.next() would return 3)
    

Should we need to access the map concurrently, we can wrap it:

// This map can be safely accessed by many threads
Long2IntSortedMap m2 = Longs2IntSortedMaps.synchronize( m1 );
    

Linked maps are very flexible data structures which can be used to implement, for instance, queues whose content can be probed efficiently:

// This map remembers insertion order (note that we are using the array-based constructor)
IntSortedSet s = new IntLinkedOpenHashSet( new int[] { 4, 3, 2, 1 } );
s.firstInt(); // This method call will return 4
s.lastInt(); // This method call will return 1
s.contains(5); // This method will return false
IntBidirectionalIterator i = s.iterator( s.lastInt() ); // We could even cast it to a list iterator 
i.previous(); // This method call will return 1
i.previous(); // This method call will return 2
s.remove(s.lastInt()); // This will remove the last element in constant time
    

Finally, we play with iterators. It is easy to create iterators over intervals or over arrays, and combine them:

IntIterator i = Iterators.fromTo( 0, 10 ); // This iterator will return 0, 1, ..., 9.
int a[] = new int[] { 5, 1, 9 };
IntIterator j = Iterators.wrap( a ); // This iterator will return 5, 1, 9.
IntIterator k = Iterators.concat( new IntIterator[] { i , j } ); // This iterator will return 0, 1, ..., 9, 5, 1, 9.