pitt.search.semanticvectors
Class Search

java.lang.Object
  extended by pitt.search.semanticvectors.Search

public class Search
extends java.lang.Object

Command-line entry point listing the different types of search that can be performed. Most involve processing combinations of vectors in different ways: when building a query expression, when scoring candidates against that query expression, or both. Most options here correspond directly to a particular subclass of VectorSearcher.

See Also:
The search option is set using the --searchtype flag. Options include:
sum: Default option - build a query by adding together (weighted) vectors for each of the query terms, and search using cosine similarity.
sparsesum: Build a query as with SUM option, but quantize to sparse vectors before taking scalar product at search time. This can be used to give a guide to how much similarities are changed by only using the most significant coordinates of a vector.
subspace: "Quantum disjunction" - get vectors for each query term, create a representation for the subspace spanned by these vectors, and score by measuring cosine similarity with this subspace.
maxsim: "Closest disjunction" - get vectors for each query term, score by measuring distance to each term and taking the minimum.
tensor: A product similarity that trains on ordered pairs of terms, takes a target query term, and searches for the term whose tensor product with the target term gives the largest similarity with the training tensor. Will almost certainly not work well until convolution / tensor relations are built into the indexing phase.
convolution: Similar to TENSOR, a product similarity that trains on ordered pairs of terms, takes a target query term, and searches for the term whose convolution product with the target term gives the largest similarity with the training convolution.
permutation: Based on Sahlgren et al. (2008). Searches for the term that best matches the position of a "?" in a sequence of terms. For example, 'martin ? king' should retrieve luther as the top-ranked match. Requires the index queried to contain unpermuted vectors, either random vectors or previously learned term vectors, and the index searched to contain permuted vectors.
balanced_permutation: Based on Sahlgren et al. (2008). Searches for the term that best matches the position of a "?" in a sequence of terms. For example, 'martin ? king' should retrieve luther as the top-ranked match. Requires the index queried to contain unpermuted vectors, either random vectors or previously learned term vectors, and the index searched to contain permuted vectors. This variant of the method takes the mean of the two possible search directions (searching permuted vectors with index vectors, or vice versa).
printquery: Build an additive query vector (as with SUM) and print out the query vector for debugging.
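
The sum and maxsim options above can be illustrated with a small, self-contained sketch. This is not the SemanticVectors implementation: the class and method names (SearchTypeSketch, sumQuery, cosine, maxSim) are hypothetical, vectors are plain double arrays, and maxsim is assumed here to mean "keep the best cosine score over the query terms", which matches taking the minimum distance only when vectors are normalized.

```java
/** Illustrative sketch only; names and data layout are assumptions, not library API. */
final class SearchTypeSketch {

  /** "sum" idea: add the (weighted) term vectors together to form one query vector. */
  static double[] sumQuery(double[][] termVectors, double[] weights) {
    double[] query = new double[termVectors[0].length];
    for (int t = 0; t < termVectors.length; t++) {
      for (int i = 0; i < query.length; i++) {
        query[i] += weights[t] * termVectors[t][i];
      }
    }
    return query;
  }

  /** Cosine similarity; returns 0 if either vector has zero length. */
  static double cosine(double[] a, double[] b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
      return 0.0;
    }
    return dot / Math.sqrt(normA * normB);
  }

  /** "maxsim" idea: compare the candidate with each query term and keep the best score. */
  static double maxSim(double[][] termVectors, double[] candidate) {
    double best = Double.NEGATIVE_INFINITY;
    for (double[] term : termVectors) {
      best = Math.max(best, cosine(term, candidate));
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] terms = {{1, 0}, {0, 1}};       // two toy query term vectors
    double[] candidate = {1, 0};               // one toy candidate vector
    double[] query = sumQuery(terms, new double[] {1, 1});
    System.out.println("sum score:    " + cosine(query, candidate));
    System.out.println("maxsim score: " + maxSim(terms, candidate));
  }
}
```

In this toy case the candidate matches one query term exactly, so the maxsim-style score is higher than the sum-style score, which is diluted by the second term.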

Constructor Summary
Search()
           
 
Method Summary
static ObjectVector[] getSearchResultVectors(java.lang.String[] args, int numResults)
          Search wrapper that returns the list of ObjectVectors.
static void main(java.lang.String[] args)
          Takes a user's query, creates a query vector, and searches a vector store.
static java.util.LinkedList<SearchResult> RunSearch(java.lang.String[] args, int numResults)
          Takes a user's query, creates a query vector, and searches a vector store.
static void usage()
          Prints the usage message for this class.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Search

public Search()
Method Detail

usage

public static void usage()
Prints the following usage message:
Search class in package pitt.search.semanticvectors
Usage: java pitt.search.semanticvectors.Search [-queryfile query_vector_file]
[-searchfile search_vector_file]
[-luceneindexpath path_to_lucene_index]
[-searchtype TYPE]
[-numsearchresults num_results]
[-lowercasequery]
<QUERYTERMS>
-luceneindexpath argument may be used to get term weights from
term frequency, doc frequency, etc. in lucene index.
-searchtype can be one of SUM, SPARSESUM, SUBSPACE, MAXSIM,
TENSOR, CONVOLUTION, PERMUTATION, BALANCED_PERMUTATION, PRINTQUERY
<QUERYTERMS> should be a list of words, separated by spaces.
If the term NOT is used, terms after that will be negated.
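
The NOT rule in the usage message can be sketched as a simple split of the query terms. This is an illustration under assumptions, not the library's parser: the class QueryTermsSketch is hypothetical, the NOT marker is assumed to be matched exactly in upper case, and how the negated terms are then applied to the query vector is not shown because the usage message does not describe it.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the SemanticVectors query parser. */
final class QueryTermsSketch {
  final List<String> positive = new ArrayList<>();
  final List<String> negated = new ArrayList<>();

  /** Splits query terms into those before NOT (positive) and those after it (negated). */
  static QueryTermsSketch parse(String[] queryTerms) {
    QueryTermsSketch result = new QueryTermsSketch();
    boolean afterNot = false;
    for (String term : queryTerms) {
      if (term.equals("NOT")) {   // assumption: exact, case-sensitive marker
        afterNot = true;
        continue;                 // the marker itself is not a query term
      }
      if (afterNot) {
        result.negated.add(term);
      } else {
        result.positive.add(term);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    QueryTermsSketch parsed = parse(new String[] {"rock", "NOT", "band"});
    System.out.println("positive: " + parsed.positive);
    System.out.println("negated:  " + parsed.negated);
  }
}
```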


RunSearch

public static java.util.LinkedList<SearchResult> RunSearch(java.lang.String[] args,
                                                           int numResults)
                                                    throws java.lang.IllegalArgumentException
Takes a user's query, creates a query vector, and searches a vector store.

Parameters:
args - Command-line arguments; see usage().
numResults - Number of search results to be returned in a ranked list.
Returns:
Linked list containing numResults search results.
Throws:
java.lang.IllegalArgumentException
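
Since RunSearch takes the same argument array as the command line, a caller has to assemble that array first. The sketch below shows one way to do so using only the -queryfile and -searchtype flags from the usage message. The helper class SearchArgsSketch and the file name termvectors.bin are hypothetical, and the single-dash flag spelling follows the usage message (the class description writes --searchtype, so check which form your version accepts).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; a hypothetical helper, not part of SemanticVectors. */
final class SearchArgsSketch {

  /** Builds an argument array in the order shown by the usage message. */
  static String[] buildArgs(String queryFile, String searchType, String... queryTerms) {
    List<String> args = new ArrayList<>();
    args.add("-queryfile");
    args.add(queryFile);
    args.add("-searchtype");
    args.add(searchType);
    args.addAll(Arrays.asList(queryTerms));   // query terms come last
    return args.toArray(new String[0]);
  }

  public static void main(String[] unused) {
    // "termvectors.bin" is a placeholder file name chosen for this example.
    String[] args = buildArgs("termvectors.bin", "SUM", "martin", "luther", "king");
    System.out.println(String.join(" ", args));
    // With SemanticVectors on the classpath, the array could then be passed on, e.g.:
    //   java.util.LinkedList<SearchResult> results = Search.RunSearch(args, 10);
  }
}
```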

getSearchResultVectors

public static ObjectVector[] getSearchResultVectors(java.lang.String[] args,
                                                    int numResults)
                                             throws java.lang.IllegalArgumentException
Search wrapper that returns the list of ObjectVectors.

Throws:
java.lang.IllegalArgumentException

main

public static void main(java.lang.String[] args)
                 throws java.lang.IllegalArgumentException
Takes a user's query, creates a query vector, and searches a vector store.

Parameters:
args - Command-line arguments; see usage().
Throws:
java.lang.IllegalArgumentException