java.lang.Object
  pitt.search.semanticvectors.Search
public class Search
List of different types of searches that can be performed. Most
involve processing combinations of vectors in different ways, in
building a query expression, scoring candidates against these
query expressions, or both. Most options here correspond directly
to a particular subclass of VectorSearcher.
The search option is set using the --searchtype flag. Options include:
sum:
Default option - build a query by adding together (weighted)
vectors for each of the query terms, and search using cosine
similarity.
sparsesum:
Build a query as with the SUM option, but quantize to
sparse vectors before taking the scalar product at search time.
This can be used to give a guide to how much similarities are
changed by only using the most significant coordinates of a
vector.
subspace:
"Quantum disjunction" - get vectors for each query term, create a
representation for the subspace spanned by these vectors, and
score by measuring cosine similarity with this subspace.
maxsim:
"Closest disjunction" - get vectors for each query term, score
by measuring distance to each term and taking the minimum.
tensor:
A product similarity that trains on ordered pairs of terms, takes
a target query term, and searches for the term whose tensor
product with the target term gives the largest similarity with the training tensor.
Will almost certainly not work well until convolution / tensor relations are
built into the indexing phase.
convolution:
Similar to TENSOR: a product similarity that trains
on ordered pairs of terms, takes a target query term, and
searches for the term whose convolution product with the target
term gives the largest similarity with the training convolution.
permutation:
Based on Sahlgren et al. (2008). Searches for the term that best matches
the position of a "?" in a sequence of terms. For example,
'martin ? king' should retrieve luther as the top-ranked match.
Requires the index queried to contain unpermuted vectors, either
random vectors or previously learned term vectors, and the index searched must contain
permuted vectors.
balanced permutation:
Based on Sahlgren et al. (2008). Searches for the term that best matches
the position of a "?" in a sequence of terms. For example,
'martin ? king' should retrieve luther as the top-ranked match.
Requires the index queried to contain unpermuted vectors, either
random vectors or previously learned term vectors, and the index searched must contain
permuted vectors. This is a variant of the method that takes the mean
of the two possible search directions (search with index vectors for permuted vectors,
or vice versa).
printquery:
Build an additive query vector (as with SUM) and
print out the query vector for debugging.
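The sum and maxsim options above can be illustrated with a minimal sketch on plain `double[]` arrays. This is not the library's implementation: the class `QuerySketch` and its methods are hypothetical names introduced here, and it assumes dense real-valued vectors of equal length, with "distance" in maxsim read as cosine similarity (so the minimum distance is the maximum similarity).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of pitt.search.semanticvectors.
class QuerySketch {

    // "sum" style query: add together the weighted vectors of the query terms.
    static double[] weightedSum(List<double[]> termVectors, double[] weights) {
        double[] query = new double[termVectors.get(0).length];
        for (int t = 0; t < termVectors.size(); t++) {
            double[] v = termVectors.get(t);
            for (int i = 0; i < query.length; i++) {
                query[i] += weights[t] * v[i];
            }
        }
        return query;
    }

    // Cosine similarity; returns 0 when either vector has zero length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    // "maxsim" style score: compare the candidate with each query term
    // separately and keep the closest match (assumption: closeness is
    // measured by cosine similarity).
    static double maxSim(List<double[]> termVectors, double[] candidate) {
        double best = Double.NEGATIVE_INFINITY;
        for (double[] v : termVectors) {
            best = Math.max(best, cosine(v, candidate));
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> terms = Arrays.asList(new double[] {1, 0}, new double[] {0, 1});
        double[] query = weightedSum(terms, new double[] {1, 1});
        double[] candidate = {1, 0};
        System.out.println("sum score:    " + cosine(query, candidate));
        System.out.println("maxsim score: " + maxSim(terms, candidate));
    }
}
```

In this toy case the candidate matches one query term exactly, so the maxsim score is higher than the sum score, which is diluted by the second term.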
Constructor Summary | |
---|---|
Search() | |
Method Summary | |
---|---|
static ObjectVector[] | getSearchResultVectors(java.lang.String[] args, int numResults) Search wrapper that returns the list of ObjectVectors. |
static void | main(java.lang.String[] args) Takes a user's query, creates a query vector, and searches a vector store. |
static java.util.LinkedList<SearchResult> | RunSearch(java.lang.String[] args, int numResults) Takes a user's query, creates a query vector, and searches a vector store. |
static void | usage() Prints the following usage message: |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Search()
Method Detail |
---|
public static void usage()
Search class in package pitt.search.semanticvectors
Usage: java pitt.search.semanticvectors.Search [-queryfile query_vector_file]
[-searchfile search_vector_file]
[-luceneindexpath path_to_lucene_index]
[-searchtype TYPE]
[-numsearchresults num_results]
[-lowercasequery]
<QUERYTERMS>
-luceneindexpath argument may be used to get term weights from
term frequency, doc frequency, etc. in lucene index.
-searchtype can be one of SUM, SPARSESUM, SUBSPACE, MAXSIM,
TENSOR, CONVOLUTION, PERMUTATION, BALANCED_PERMUTATION, PRINTQUERY
<QUERYTERMS> should be a list of words, separated by spaces.
If the term NOT is used, terms after that will be negated.
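The NOT convention in the usage message can be sketched as a simple split of the query terms. This is an illustration under stated assumptions, not the library's parser: `QueryTermsSketch` is a hypothetical class, `NOT` is matched exactly in upper case, and every `NOT` token is skipped rather than treated as a term. How negated terms then change the query vector is outside the scope of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of pitt.search.semanticvectors.
class QueryTermsSketch {
    final List<String> positive = new ArrayList<>();
    final List<String> negated = new ArrayList<>();

    // Terms before the first NOT are positive; terms after it are negated.
    static QueryTermsSketch parse(String[] queryTerms) {
        QueryTermsSketch result = new QueryTermsSketch();
        boolean afterNot = false;
        for (String term : queryTerms) {
            if (term.equals("NOT")) {
                afterNot = true; // assumption: the NOT token itself is never a query term
                continue;
            }
            if (afterNot) {
                result.negated.add(term);
            } else {
                result.positive.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        QueryTermsSketch q = parse(new String[] {"king", "NOT", "queen", "prince"});
        System.out.println("positive: " + q.positive);
        System.out.println("negated:  " + q.negated);
    }
}
```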
public static java.util.LinkedList<SearchResult> RunSearch(java.lang.String[] args, int numResults) throws java.lang.IllegalArgumentException
Parameters:
args - See usage().
numResults - Number of search results to be returned in a ranked list.
Returns:
List of numResults search results.
Throws:
java.lang.IllegalArgumentException
public static ObjectVector[] getSearchResultVectors(java.lang.String[] args, int numResults) throws java.lang.IllegalArgumentException
Throws:
java.lang.IllegalArgumentException
public static void main(java.lang.String[] args) throws java.lang.IllegalArgumentException
Parameters:
args - See usage().
Throws:
java.lang.IllegalArgumentException