pitt.search.semanticvectors
Class TermVectorsFromLucene
java.lang.Object
pitt.search.semanticvectors.TermVectorsFromLucene
- All Implemented Interfaces:
- VectorStore
public class TermVectorsFromLucene
- extends java.lang.Object
- implements VectorStore
Implementation of a vector store that creates term vectors by
iterating through all the terms in a Lucene index. Uses a sparse
representation for the basic document vectors, which saves
considerable space for collections with many individual documents.
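The sparse representation can be made concrete with a small example. A basic vector with seedLength nonzero entries (see the constructor's seedLength parameter) only needs its nonzero positions stored, not a full dense array. The sketch below is a hypothetical illustration, not this class's code. The class name SparseBasicVector, the split into separate +1 and -1 index arrays, and the use of java.util.Random are all assumptions.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch (not the SemanticVectors implementation): a sparse
 * ternary basic vector that stores only the positions of its seedLength
 * nonzero entries, half of them +1 and half -1.
 */
class SparseBasicVector {
    final int dimension;
    final int[] plusIndices;   // positions holding +1
    final int[] minusIndices;  // positions holding -1; all others are 0

    SparseBasicVector(int dimension, int seedLength, Random random) {
        if (seedLength % 2 != 0 || seedLength > dimension) {
            throw new IllegalArgumentException("seedLength must be even and <= dimension");
        }
        this.dimension = dimension;
        // Partial Fisher-Yates shuffle: the first seedLength slots end up
        // holding distinct, randomly chosen positions.
        int[] positions = new int[dimension];
        for (int i = 0; i < dimension; i++) positions[i] = i;
        for (int i = 0; i < seedLength; i++) {
            int j = i + random.nextInt(dimension - i);
            int tmp = positions[i];
            positions[i] = positions[j];
            positions[j] = tmp;
        }
        plusIndices = Arrays.copyOfRange(positions, 0, seedLength / 2);
        minusIndices = Arrays.copyOfRange(positions, seedLength / 2, seedLength);
    }

    /** Expands the sparse form into a dense float[] of length dimension. */
    float[] toDense() {
        float[] dense = new float[dimension];
        for (int i : plusIndices) dense[i] = 1f;
        for (int i : minusIndices) dense[i] = -1f;
        return dense;
    }

    public static void main(String[] args) {
        SparseBasicVector v = new SparseBasicVector(200, 10, new Random(42));
        float[] dense = v.toDense();
        int plus = 0, minus = 0;
        for (float x : dense) {
            if (x > 0) plus++;
            else if (x < 0) minus++;
        }
        System.out.println(plus + " +1 entries, " + minus + " -1 entries"); // 5 +1 entries, 5 -1 entries
    }
}
```

Here 10 stored indices stand in for 200 floats. With one basic vector per document, that saving multiplies across the whole collection.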
- Author:
- Dominic Widdows, Trevor Cohen.
Constructor Summary
TermVectorsFromLucene(java.lang.String indexDir,
int seedLength,
int minFreq,
int nonAlphabet,
VectorStore basicDocVectors,
java.lang.String[] fieldsToIndex)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TermVectorsFromLucene
public TermVectorsFromLucene(java.lang.String indexDir,
int seedLength,
int minFreq,
int nonAlphabet,
VectorStore basicDocVectors,
java.lang.String[] fieldsToIndex)
throws java.io.IOException,
java.lang.RuntimeException
- Parameters:
indexDir
- Directory containing the Lucene index.
seedLength
- Number of +1 or -1 entries in the basic vectors. Should be even, so that there are equal numbers of +1 and -1 entries.
minFreq
- The minimum term frequency for a term to be indexed.
basicDocVectors
- The store of basic document vectors. May be null, in which case the constructor builds this table. If non-null, its identifiers must correspond to the Lucene document numbers.
fieldsToIndex
- The fields to be indexed. If null, all fields are indexed.
- Throws:
java.io.IOException
java.lang.RuntimeException
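The constructor's work can be pictured as random indexing: each term's vector is accumulated from the basic vectors of the documents the term occurs in. The sketch below is a hypothetical illustration, not this class's code, and rests on several assumptions:

- An in-memory map from term to per-document frequencies replaces the Lucene index.
- Each basic document vector is weighted by the raw term frequency.
- minFreq is compared against the term's total frequency across the collection.
- The result is normalized to unit length.

The names TermVectorSketch and buildTermVectors are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of building term vectors from basic document vectors. */
class TermVectorSketch {
    /**
     * postings: term -> (document number -> frequency of the term in that document).
     * This map is an assumed stand-in for iterating over a Lucene index.
     * basicDocVectors is keyed by document number, mirroring the requirement
     * that identifiers correspond to Lucene doc numbers.
     */
    static Map<String, float[]> buildTermVectors(
            Map<String, Map<Integer, Integer>> postings,
            Map<Integer, float[]> basicDocVectors,
            int dimension,
            int minFreq) {
        Map<String, float[]> termVectors = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> term : postings.entrySet()) {
            // Total frequency across the collection; skip terms below minFreq.
            int total = 0;
            for (int f : term.getValue().values()) total += f;
            if (total < minFreq) continue;

            // Sum of basic document vectors, each weighted by term frequency.
            float[] vector = new float[dimension];
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                float[] docVector = basicDocVectors.get(posting.getKey());
                for (int i = 0; i < dimension; i++) {
                    vector[i] += posting.getValue() * docVector[i];
                }
            }
            termVectors.put(term.getKey(), normalize(vector));
        }
        return termVectors;
    }

    /** Scales a vector in place to unit Euclidean length (all-zero vectors are left unchanged). */
    static float[] normalize(float[] v) {
        double sumSq = 0;
        for (float x : v) sumSq += x * x;
        if (sumSq == 0) return v;
        float norm = (float) Math.sqrt(sumSq);
        for (int i = 0; i < v.length; i++) v[i] /= norm;
        return v;
    }

    public static void main(String[] args) {
        Map<Integer, float[]> docs = new HashMap<>();
        docs.put(0, new float[] {1, 0, 0, -1});
        docs.put(1, new float[] {0, 1, -1, 0});
        Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        postings.put("vector", Map.of(0, 2, 1, 1)); // total frequency 3
        postings.put("rare", Map.of(1, 1));         // total frequency 1
        Map<String, float[]> result = buildTermVectors(postings, docs, 4, 2);
        System.out.println(result.keySet()); // [vector]
        System.out.println(Arrays.toString(result.get("vector")));
    }
}
```

With minFreq = 2, "rare" is dropped. "vector" becomes the normalized sum 2·doc0 + 1·doc1 = [2, 1, -1, -2] / √10.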
getBasicDocVectors
public VectorStore getBasicDocVectors()
- Returns:
- The object's basicDocVectors.
getIndexReader
public org.apache.lucene.index.IndexReader getIndexReader()
- Returns:
- The object's indexReader.
getFieldsToIndex
public java.lang.String[] getFieldsToIndex()
- Returns:
- The object's list of Lucene fields to index.
getVector
public float[] getVector(java.lang.Object term)
- Specified by:
getVector
in interface VectorStore
- Parameters:
term
- the object whose vector you want to look up
- Returns:
- a vector (of floats)
getAllVectors
public java.util.Enumeration getAllVectors()
- Specified by:
getAllVectors
in interface VectorStore
- Returns:
- an enumeration of all the object vectors in the store.
getNumVectors
public int getNumVectors()
- Specified by:
getNumVectors
in interface VectorStore
- Returns:
- a count of the number of vectors in the store.
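The three VectorStore methods above (getVector, getAllVectors, getNumVectors) can be illustrated with a minimal in-memory store. The sketch below is hypothetical and makes the following assumptions:

- The interface SimpleVectorStore re-declares only the three signatures documented on this page, using generics. The real VectorStore interface uses a raw Enumeration and may declare more methods.
- getAllVectors() enumerates term-to-vector map entries here, which is an assumed element type. The actual class enumerates its own object vectors.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical map-backed store; LinkedHashMap keeps enumeration order predictable. */
class InMemoryVectorStore implements SimpleVectorStore {
    private final Map<Object, float[]> vectors = new LinkedHashMap<>();

    void put(Object term, float[] vector) {
        vectors.put(term, vector);
    }

    @Override
    public float[] getVector(Object term) {
        return vectors.get(term); // null if the term has no vector
    }

    @Override
    public Enumeration<Map.Entry<Object, float[]>> getAllVectors() {
        return Collections.enumeration(vectors.entrySet());
    }

    @Override
    public int getNumVectors() {
        return vectors.size();
    }

    public static void main(String[] args) {
        InMemoryVectorStore store = new InMemoryVectorStore();
        store.put("lucene", new float[] {0.6f, 0.8f});
        store.put("vector", new float[] {1f, 0f});
        System.out.println(store.getNumVectors()); // 2
        for (Enumeration<Map.Entry<Object, float[]>> e = store.getAllVectors(); e.hasMoreElements(); ) {
            Map.Entry<Object, float[]> entry = e.nextElement();
            System.out.println(entry.getKey() + " -> " + Arrays.toString(entry.getValue()));
        }
    }
}

/** Hypothetical interface mirroring the three methods documented above. */
interface SimpleVectorStore {
    float[] getVector(Object term);
    Enumeration<Map.Entry<Object, float[]>> getAllVectors();
    int getNumVectors();
}
```

A map keeps lookup by term constant-time on average. The Enumeration return type matches the page's use of java.util.Enumeration rather than the newer Iterator.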