DictionaryBasedBreakIterator Class Reference

A subclass of RuleBasedBreakIterator that adds the ability to use a dictionary to further subdivide ranges of text beyond what is possible using just the state-table-based algorithm.

#include <dbbi.h>

Inheritance diagram for DictionaryBasedBreakIterator:

DictionaryBasedBreakIterator -> RuleBasedBreakIterator -> BreakIterator -> UObject -> UMemory

Public Member Functions

virtual ~DictionaryBasedBreakIterator ()
 Destructor.
 DictionaryBasedBreakIterator ()
 Default constructor.
 DictionaryBasedBreakIterator (const DictionaryBasedBreakIterator &other)
 Copy constructor.
DictionaryBasedBreakIterator & operator= (const DictionaryBasedBreakIterator &that)
 Assignment operator.
virtual BreakIterator * clone (void) const
 Returns a newly-constructed RuleBasedBreakIterator with the same behavior, and iterating over the same text, as this one.
virtual int32_t previous (void)
 Advances the iterator backwards, to the last boundary preceding this one.
virtual int32_t following (int32_t offset)
 Sets the iterator to refer to the first boundary position following the specified position.
virtual int32_t preceding (int32_t offset)
 Sets the iterator to refer to the last boundary position before the specified position.
virtual UClassID getDynamicClassID (void) const
 Returns a unique class ID POLYMORPHICALLY.

Static Public Member Functions

UClassID U_EXPORT2 getStaticClassID (void)
 Returns the class ID for this class.

Protected Member Functions

virtual int32_t handleNext (void)
 This method is the actual implementation of the next() method.
virtual void reset (void)
 Removes the cache of break positions (usually in response to a change in position of some sort).
void init ()
 Initializes a DictionaryBasedBreakIterator.
virtual BreakIterator * createBufferClone (void *stackBuffer, int32_t &BufferSize, UErrorCode &status)

Friends

class DictionaryBasedBreakIteratorTables
class BreakIterator

Detailed Description

A subclass of RuleBasedBreakIterator that adds the ability to use a dictionary to further subdivide ranges of text beyond what is possible using just the state-table-based algorithm.

This is necessary, for example, to handle word and line breaking in Thai, which doesn't use spaces between words. The state-table-based algorithm used by RuleBasedBreakIterator is used to divide up text as far as possible, and then contiguous ranges of letters are repeatedly compared against a list of known words (i.e., the dictionary) to divide them up into words.
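The dictionary step described above can be illustrated with a small sketch. It is not ICU's implementation: it uses a greedy longest-match strategy over plain ASCII strings, and the names dictionaryBreaks and maxWordLength are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Toy greedy longest-match segmentation (an assumption of this sketch,
// not the algorithm ICU uses). Returns the break offsets inside `run`;
// the last offset is always run.size() for a non-empty run.
std::vector<std::size_t> dictionaryBreaks(
        const std::string &run,
        const std::unordered_set<std::string> &dictionary,
        std::size_t maxWordLength) {
    std::vector<std::size_t> breaks;
    std::size_t pos = 0;
    while (pos < run.size()) {
        // Fallback: consume one character so the scan always advances.
        std::size_t step = 1;
        std::size_t limit = std::min(maxWordLength, run.size() - pos);
        // Try the longest candidate first, then shorter ones.
        for (std::size_t len = limit; len >= 1; --len) {
            if (dictionary.count(run.substr(pos, len)) != 0) {
                step = len;
                break;
            }
        }
        pos += step;
        breaks.push_back(pos);
    }
    return breaks;
}
```

With a dictionary containing "theme" and "park", the run "themepark" yields breaks at offsets 5 and 9. A greedy match can choose a poor split when words overlap, which is one reason a production iterator needs a more careful search.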

Applications do not normally need to include this header.

This class will probably be deprecated in a future release of ICU, and replaced with a more flexible and capable dictionary based break iterator. This change should be invisible to applications, because creation and use of instances of DictionaryBasedBreakIterator is through the factories and abstract API on class BreakIterator, which will remain stable.

This class is not intended to be subclassed.

DictionaryBasedBreakIterator uses the same rule language as RuleBasedBreakIterator, but adds one more special substitution name: <dictionary>. This substitution name is used to identify characters in words in the dictionary. The idea is that if the iterator passes over a chunk of text that includes two or more characters in a row that are included in <dictionary>, it goes back through that range and derives additional break positions (if possible) using the dictionary.
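The "two or more characters in a row" condition can be sketched as a scan for qualifying ranges. The function name dictionaryRuns and the use of a plain character set in place of the <dictionary> substitution are assumptions made for this illustration only.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Returns the [start, end) ranges in which two or more consecutive
// characters belong to `dictionaryChars`. Only such ranges would be
// handed to the dictionary for further subdivision.
std::vector<std::pair<std::size_t, std::size_t>> dictionaryRuns(
        const std::string &text,
        const std::unordered_set<char> &dictionaryChars) {
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t i = 0;
    while (i < text.size()) {
        if (dictionaryChars.count(text[i]) == 0) {
            ++i;  // not a dictionary character, keep scanning
            continue;
        }
        std::size_t start = i;
        while (i < text.size() && dictionaryChars.count(text[i]) != 0) {
            ++i;
        }
        if (i - start >= 2) {  // single characters are left alone
            runs.emplace_back(start, i);
        }
    }
    return runs;
}
```

For the text "ab, c def" and the character set a to f, the ranges [0, 2) and [6, 9) qualify, while the isolated "c" does not.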

DictionaryBasedBreakIterator is also constructed with the filename of a dictionary file. It follows a prescribed search path to locate the dictionary (right now, it looks for it in /com/ibm/text/resources in each directory in the classpath, and won't find it in JAR files, but this location is likely to change). The dictionary file is in a serialized binary format. We have a very primitive (and slow) BuildDictionaryFile utility for creating dictionary files, but aren't currently making it public. Contact us for help.

NOTE: The DictionaryBasedBreakIterator class is still under development. The APIs are not yet stable.

Definition at line 62 of file dbbi.h.


Constructor & Destructor Documentation

virtual DictionaryBasedBreakIterator::~DictionaryBasedBreakIterator () [virtual]
 

Destructor.

ICU_Stable:
ICU 2.0

DictionaryBasedBreakIterator::DictionaryBasedBreakIterator ()
 

Default constructor.

Creates an "empty" break iterator. Such an iterator can subsequently be assigned to.

Returns:
the newly created DictionaryBasedBreakIterator.
ICU_Stable:
ICU 2.0

DictionaryBasedBreakIterator::DictionaryBasedBreakIterator (const DictionaryBasedBreakIterator & other)
 

Copy constructor.

Parameters:
other The DictionaryBasedBreakIterator to be copied.
Returns:
the newly created DictionaryBasedBreakIterator.
ICU_Stable:
ICU 2.0


Member Function Documentation

virtual BreakIterator* DictionaryBasedBreakIterator::clone (void) const [virtual]
 

Returns a newly-constructed RuleBasedBreakIterator with the same behavior, and iterating over the same text, as this one.

Returns:
A newly-constructed RuleBasedBreakIterator.
ICU_Stable:
ICU 2.0

Reimplemented from RuleBasedBreakIterator.

virtual BreakIterator* DictionaryBasedBreakIterator::createBufferClone (void * stackBuffer, int32_t & BufferSize, UErrorCode & status) [protected, virtual]
 

Parameters:
stackBuffer User-allocated space for the new clone. If NULL, new memory will be allocated. If the buffer is not large enough, new memory will be allocated.
BufferSize Reference to the size of the allocated space. If BufferSize == 0, a sufficient size for use in cloning will be returned ('pre-flighting'). If BufferSize is not enough for a stack-based safe clone, new memory will be allocated.
status Indicates whether the operation succeeded or there were errors. An informational status value, U_SAFECLONE_ALLOCATED_ERROR, is used if any allocations were necessary.
Returns:
pointer to the new clone
ICU_Internal:

Reimplemented from RuleBasedBreakIterator.
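The buffer contract described by these parameters can be modelled in a few lines. The sketch below does not use ICU types: ToyIterator, CloneStatus and cloneInto are hypothetical stand-ins, and returning a null pointer from the pre-flighting call is an assumption of this sketch.

```cpp
#include <cstdint>
#include <new>

// Hypothetical stand-in for an iterator; not an ICU class.
struct ToyIterator {
    std::int32_t position = 0;
};

// Hypothetical stand-in for the informational status value.
enum class CloneStatus { Ok, AllocatedOnHeap };

// Models the documented contract:
//  - bufferSize == 0: report the size needed (pre-flighting), no clone
//  - buffer missing or too small: allocate on the heap and flag it
//  - otherwise: construct the clone inside the caller's buffer
// The caller must pass a buffer that is suitably aligned for ToyIterator.
ToyIterator *cloneInto(const ToyIterator &src, void *stackBuffer,
                       std::int32_t &bufferSize, CloneStatus &status) {
    const std::int32_t needed = static_cast<std::int32_t>(sizeof(ToyIterator));
    status = CloneStatus::Ok;
    if (bufferSize == 0) {
        bufferSize = needed;  // tell the caller how much space to provide
        return nullptr;
    }
    if (stackBuffer == nullptr || bufferSize < needed) {
        status = CloneStatus::AllocatedOnHeap;
        return new ToyIterator(src);  // caller must delete this clone
    }
    return new (stackBuffer) ToyIterator(src);  // placement new, no heap use
}
```

A caller would typically pre-flight once with a size of 0, reserve that many bytes on the stack, and then call again. Only a clone flagged as heap-allocated should be released with delete.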

virtual int32_t DictionaryBasedBreakIterator::following (int32_t offset) [virtual]
 

Sets the iterator to refer to the first boundary position following the specified position.

Parameters:
offset The position from which to begin searching for a break position.
Returns:
The position of the first break after the specified offset.
ICU_Stable:
ICU 2.0

Reimplemented from RuleBasedBreakIterator.

virtual UClassID DictionaryBasedBreakIterator::getDynamicClassID (void) const [virtual]
 

Returns a unique class ID POLYMORPHICALLY.

Pure virtual override. This method implements a simple version of RTTI, since not all C++ compilers support genuine RTTI. Polymorphic operator==() and clone() methods call this method.

Returns:
The class ID for this object. All objects of a given class have the same class ID. Objects of other classes have different class IDs.
ICU_Stable:
ICU 2.0

Reimplemented from RuleBasedBreakIterator.

UClassID U_EXPORT2 DictionaryBasedBreakIterator::getStaticClassID (void) [static]
 

Returns the class ID for this class.

This is useful only for comparing to a return value from getDynamicClassID(). For example:

    Base* polymorphic_pointer = createPolymorphicObject();
    if (polymorphic_pointer->getDynamicClassID() == Derived::getStaticClassID()) ...

Returns:
The class ID for all objects of this class.
ICU_Stable:
ICU 2.0

Reimplemented from RuleBasedBreakIterator.
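The class-ID comparison can be tried out without ICU. In the sketch below, ClassID is a hypothetical stand-in for UClassID, and using the address of a function-local static as the identifier is one common way to obtain a per-class unique value; it is shown as an illustration, not as a statement of how this class is implemented.

```cpp
// Hypothetical stand-in for UClassID.
typedef const void *ClassID;

class Base {
public:
    virtual ~Base() {}
    // Each concrete class reports its own ID through the virtual call.
    virtual ClassID getDynamicClassID() const = 0;
};

class Derived : public Base {
public:
    static ClassID getStaticClassID() {
        static char classID = 0;  // distinct object, so a distinct address
        return &classID;
    }
    ClassID getDynamicClassID() const override { return getStaticClassID(); }
};

class Other : public Base {
public:
    static ClassID getStaticClassID() {
        static char classID = 0;
        return &classID;
    }
    ClassID getDynamicClassID() const override { return getStaticClassID(); }
};

// True only when the object's most-derived class is exactly Derived.
bool isDerived(const Base *object) {
    return object->getDynamicClassID() == Derived::getStaticClassID();
}
```

Unlike dynamic_cast, this check matches the exact class only: an object of a class derived from Derived would report a different ID unless it inherits the override unchanged.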

virtual int32_t DictionaryBasedBreakIterator::handleNext (void) [protected, virtual]
 

This method is the actual implementation of the next() method.

All iteration is routed through this method. It initializes the state machine to state 1 and advances through the text character by character until it reaches the end of the text or the state machine transitions to state 0. The return value is updated every time the state machine passes through a possible end state.

ICU_Internal:

Reimplemented from RuleBasedBreakIterator.
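The loop described above can be sketched with a small hand-written table. The table, the three character categories and the name toyHandleNext are all invented for this illustration; real break rules compile to far larger tables.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum Category { kOther = 0, kLetter = 1, kDigit = 2 };

static Category classify(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isalpha(u)) return kLetter;
    if (std::isdigit(u)) return kDigit;
    return kOther;
}

// Rows are states, columns are categories (other, letter, digit).
static const int kTransitions[5][3] = {
    {0, 0, 0},  // state 0: stop
    {4, 2, 3},  // state 1: start
    {0, 2, 0},  // state 2: inside a run of letters
    {0, 0, 3},  // state 3: inside a run of digits
    {0, 0, 0},  // state 4: just consumed one "other" character
};
static const bool kAccepting[5] = {false, false, true, true, true};

// Returns the next boundary at or after `start`: begin in state 1, read
// one character at a time, stop at end of text or on entering state 0,
// and remember the position each time an accepting state is reached.
std::size_t toyHandleNext(const std::string &text, std::size_t start) {
    int state = 1;
    std::size_t pos = start;
    std::size_t result = start;
    while (pos < text.size()) {
        state = kTransitions[state][classify(text[pos])];
        if (state == 0) {
            break;  // the current character does not extend the match
        }
        ++pos;
        if (kAccepting[state]) {
            result = pos;
        }
    }
    return result;
}
```

For the text "ab12 c", repeated calls starting from 0 return the boundaries 2, 4, 5 and 6.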

void DictionaryBasedBreakIterator::init () [protected]
 

Initializes a DictionaryBasedBreakIterator.

Common routine for use by constructors.

ICU_Internal:

Reimplemented from RuleBasedBreakIterator.

DictionaryBasedBreakIterator& DictionaryBasedBreakIterator::operator= (const DictionaryBasedBreakIterator & that)
 

Assignment operator.

Parameters:
that The object to be copied.
Returns:
the newly set DictionaryBasedBreakIterator.
ICU_Stable:
ICU 2.0

virtual int32_t DictionaryBasedBreakIterator::preceding (int32_t offset) [virtual]
 

Sets the iterator to refer to the last boundary position before the specified position.

Parameters:
offset The position from which to begin searching for a break.
Returns:
The position of the last boundary before the starting position.
ICU_Stable:
ICU 2.0

Reimplemented from RuleBasedBreakIterator.
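The relationship between following() and preceding() can be pictured over a precomputed, sorted list of boundaries. This sketch assumes strict comparisons (a boundary equal to the offset is not returned) and uses -1 as the "no boundary" result; kDone, toyFollowing and toyPreceding are names invented for this illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumed sentinel for "no such boundary" in this sketch.
const std::int32_t kDone = -1;

// First boundary strictly greater than `offset`, or kDone.
// `boundaries` must be sorted in ascending order.
std::int32_t toyFollowing(const std::vector<std::int32_t> &boundaries,
                          std::int32_t offset) {
    auto it = std::upper_bound(boundaries.begin(), boundaries.end(), offset);
    return it == boundaries.end() ? kDone : *it;
}

// Last boundary strictly less than `offset`, or kDone.
std::int32_t toyPreceding(const std::vector<std::int32_t> &boundaries,
                          std::int32_t offset) {
    auto it = std::lower_bound(boundaries.begin(), boundaries.end(), offset);
    return it == boundaries.begin() ? kDone : *(it - 1);
}
```

With boundaries at 0, 3, 7 and 10, toyFollowing(4) gives 7 and toyPreceding(4) gives 3. A real iterator computes boundaries on demand rather than from a stored list, so this shows only the expected results, not the search itself.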

virtual int32_t DictionaryBasedBreakIterator::previous (void) [virtual]
 

Advances the iterator backwards, to the last boundary preceding this one.

Returns:
The position of the last boundary preceding this one.
ICU_Stable:
ICU 2.0

Reimplemented from RuleBasedBreakIterator.

virtual void DictionaryBasedBreakIterator::reset (void) [protected, virtual]
 

Removes the cache of break positions (usually in response to a change in position of some sort).

ICU_Internal:

Reimplemented from RuleBasedBreakIterator.


Friends And Related Function Documentation

friend class BreakIterator [friend]
 

ICU_Internal:

Reimplemented from RuleBasedBreakIterator.

Definition at line 279 of file dbbi.h.


The documentation for this class was generated from the following file:
dbbi.h
Generated on Tue Nov 16 10:03:17 2004 for ICU 3.2 by  doxygen 1.3.9.1