The subroutines in the lib directory are compiled into the shared library, libbthash.so.
Subroutine | Description |
---|---|
hinit | Initialize a hash database |
hopen | Open the database |
hopent | Create a temporary database |
hclose | Close the database |
hcloset | Remove a temporary database from disk |
hisrt | Insert a record in the database |
hupdt | Replace a record in the database |
hdlet | Delete a record from the database |
hget | Read a record in the database |
hsweep | Read the next physical record in the database |
hdbck | Check every pointer in the database |
hchain | Count length of chains in the database |
hprime | Calculate a prime number |
hreorg | Reorganize the database |
hlock | Lock the database (UNIX) |
hunlock | Unlock the database (UNIX) |
The following calls may be used together to create an update transaction.
Read
Initialize a Hash Database
about initializing a database.
UNIX syntax:
hashfmt *hopen(char *dbname, int rwflag,
int lock, int *reason);
Windows syntax:
hashfmt *hopen(char *dbname, int rwflag, int *reason);
Parameters:
Processing:
hopen tests for locks. If no lock is pending,
it opens the database according to the read/write parameter.
Then it reads the first 256 bytes of the database to
gather schema information about the database. It validates
the signature of the database before formatting the
hashfmt structure.
Then it allocates memory for the read and write buffer.
The hashfmt structure is not freed until
hclose.
If more than one database is open, the buffers are
kept separate.
Returns:
hashfmt *hopent(int totrcds, int rcdlen, int keylen,
int keyofst, int *reason,
unsigned char *seed);
Parameters:
Processing:
hopent creates a temporary database name.
It then calls hinit to create the temporary
database and initialize it.
Finally, it calls hopen to make it available
for update. The temporary database is locked until
it is closed with hcloset.
Using a temporary database:
A temporary database is useful for joins and views
of a permanent database. It is also useful for
creating summary databases for reports.
A hashing temporary database is useful for data,
where the program knows the total number of records
being processed and the exact keys of the records.
A hashing temporary database is also useful for
data that doesn't need to be sorted.
Initializing a hashing temporary database is
expensive, because all the records need to be
initialized with blank data. A btree database
may be more suitable as a temporary database, because
it initializes only the root record in the
database. A btree database is designed for
fast growth. A hash database is designed for
stable amounts of data, even though the turnover
rate of the data may be high.
Returns:
int hclose(hashfmt *hshhdr);
Parameters:
Processing:
hclose closes the database. Then it
frees all allocated memory in the
hashfmt structure.
Finally, it frees the hashfmt structure,
itself.
Returns:
Zero for good close.
int hcloset(hashfmt *hshhdr);
Parameters:
Processing:
hcloset closes the database by calling
hclose. It then removes the temporary
database from disk.
The link count for the temporary database must go
to zero before it can be removed. If the program
pipes to another process, the link count may not
go to zero. In that case the file may not be removed.
Returns:
Zero for good close.
int hisrt(hashfmt *hshhdr, unsigned char *rcd);
Parameters:
Processing:
Locking is mandatory during insert. If a lock
is pending, hisrt returns with an error.
After a successful insert, the file is unlocked
only if it was not opened with a lock. If the
file was opened with a lock, the lock remains
in effect after a good insert.
hisrt first checks the key to see if
it is reserved. If it is not reserved, it
continues.
hisrt looks for an existing record with
the same key. If no record exists with the same
key, it tries to fit the record in the alloted
hash location.
The
Sedgewick algorithm
is adapted to calculate the hash location.
First the record key is scrambled with the
md5 one-way hash.
Then the digest from the hash is divided
by the total number of records. The remainder
becomes the location of the record.
If several records hash to the same location, there
is a collision.
The colliding record fills the free block at the
head of the free block stack and the free block
stack is popped.
If the free block stack is empty, the database is
extended by one record and the record is added to
the beginning of the overflow chain.
In either case, the colliding record is logically
inserted at the beginning of the overflow chain for
the hashing address.
Returns:
int hupdt(hashfmt *hshhdr, unsigned char *rcd);
Parameters:
Processing:
Locking is mandatory during update. If a lock
is pending, hupdt returns with an error.
After a successful update, the file is unlocked
only if it was not opened with a lock. If the
file was opened with a lock, the lock remains
in effect after a good update.
hupdt looks for the record matching the
embedded key. If the record is found, it is replaced
in the same physical location and written back to
disk.
Returns:
int hdlet(hashfmt *hshhdr, unsigned char *key);
Parameters:
Processing:
Locking is mandatory during delete. If a lock
is pending, hdlet returns with an error.
After a successful delete, the file is unlocked
only if it was not opened with a lock. If the
file was opened with a lock, the lock remains
in effect after a good delete.
hdlet looks for the record matching the
key. If the record is not found, hdlet
returns with an error code of 1.
If the record is found, hdlet sees if
the record to delete is in the hash space.
If so, it checks to see if the record to be
deleted will orphan the overflow chain.
If so, it copies the first record in the
overflow chain into its location and puts
the first overflow location on the free stack.
If the record to be deleted does not orphan
the overflow chain, it empties the record and
returns.
If the record to be deleted is on the overflow chain,
hdlet puts the record at the head of
the free block stack.
hisrt will use the record on
the free block stack the next time a record
is inserted.
Returns:
int hget(hashfmt *hshhdr, unsigned char *key,
unsigned char *buf);
Parameters:
Processing:
hget searches for a record matching the key.
If the record is not found at the hash address,
the overflow chain is searched.
Returns:
int hsweep(hashfmt *hshhdr, int position,
unsigned char *buf);
Parameters:
Processing:
hsweep reads the database in physical
sequence. The node is saved in hshhdr->node.
The disk address of the record is saved in
hshhdr->ofst.
hsweep bypasses empty records in the
hashing address space and deleted records in
the overflow address space.
hsweep verifies that the record is in
the correct location if it is in the hashing
address space. It does not perform this
verification if the record is in the overflow
address space.
hsweep returns one record per call. This
subroutine is useful for matching criteria in the
data portion of the record.
It is also useful for doing mass updates to the
database.
The keys will always be in random sequence.
Returns:
int hdbck(hashfmt *hshhdr);
Parameters:
Processing:
hdbck reads the database in physical
sequence. Every pointer is mapped to a bit
map. If a pointer is mapped twice, it is
an error. If a pointer points to the hash
address space, it is an error.
If the bit map lacks a pointer to a record
in the overflow address space, it is an error.
Returns:
void hchain(hashfmt *hshhdr, int *count, int *smlen,
int *smlensq, int *freelen);
Parameters:
Processing:
hchain reads the hashing part of the
database in physical sequence. Every overflow
chain is walked to determine its length and to
gather statistics. Finally the free block
chain is walked to determine its length.
Returns:
The output parameters are count, smlen,
smlensq, and freelen. The output parameters
allow you to calculate the standard deviation
of the overflow chains.
int hprime(int *maxnum);
Parameters:
Processing:
hprime calculates the number of records
to allocate to the hashing space in your database.
This number should be a prime number.
The input parameter is an estimated prime number.
The output from hprime is the largest
prime number equal or less than the estimated
number.
Returns:
The largest prime number equal or less than
the estimated number.
int hreorg(hashfmt *hshhdr, int totrcds,
double freespc);
Parameters:
Processing:
Back up your database before calling
hreorg. There is a risk of
losing your data during the renames.
If the total number of records parameter is
zero, hreorg changes the database
size to the file size plus the free space
percent. The totrcds parameter is
calculated by dividing the record length
into the file size.
hreorg initializes a temporary
database with the new database size.
It then copies the current database to
the temporary database. Finally, it renames
the temporary database back to the original.
When the totrcds parameter changes, the
hashing location of each record also changes.
hreorg closes the original database with
hclose before the rename occurs. When
you receive control back from hreorg,
you will have to open the database to use it.
Returns:
Zero - good reorganization.
int hlock(hashfmt *hshhdr);
Parameters:
Processing:
hlock returns if the database is locked
by another user. If the database is not locked,
the lock is applied and the database status is
now LOCK;
Returns:
hinit
hopen
hopent
hclose
hcloset
hisrt
hupdt
hdlet
hget
hsweep
hdbck
hchain
hprime
hreorg
hlock (UNIX)