pitt.search.semanticvectors
Class VectorUtils

java.lang.Object
  extended by pitt.search.semanticvectors.VectorUtils

public class VectorUtils
extends java.lang.Object

This class provides standard vector methods, e.g., cosine measure, normalization, tensor utils.


Constructor Summary
VectorUtils()
           
 
Method Summary
static float[] addVectors(float[] vector1, float[] vector2, int weight)
          Add two vectors.
static float[] addVectors(float[] vector1, short[] sparseVector, int weight)
          Add two vectors.
static float[][] createZeroTensor(int dim)
           
static float euclideanDistance(float[] vec1, float[] vec2)
           
static float[] Floats(double[] vector)
           
static float[][] Floats(double[][] matrix)
           
static short[] floatVectorToSparseVector(float[] floatVector, int seedLength)
          Take a vector of floats and simplify by quantizing to a sparse format.
static short[] generateRandomVector(int seedLength, java.util.Random random)
          Generates a basic sparse vector (dimension = Flags.dimension) with mainly zeros and some 1 and -1 entries (seedLength/2 of each). Each vector is an array of length seedLength containing 1 + the index of a non-zero value, signed according to whether the entry is +1 or -1.
static float[] getConvolutionFromTensor(float[][] tensor)
          Returns the convolution of two vectors; see Plate, Holographic Reduced Representation, p. 76.
static float[] getConvolutionFromVectors(float[] vec1, float[] vec2)
          Returns the convolution of two vectors; see Plate, Holographic Reduced Representation, p. 76.
static float getInnerProduct(float[][] ten1, float[][] ten2)
          Returns the inner product of two tensors.
static int getNearestVector(float[] vector, float[][] candidates)
          Get nearest vector from list of candidates.
static short[] getNLargestPositions(float[] values, int numResults)
          Given an array of floats, return an array of indices to the n largest values.
static float[][] getNormalizedTensor(float[][] tensor)
          Returns the normalized version of a 2-tensor, i.e. an array of arrays of floats.
static float[] getNormalizedVector(float[] vec)
          Returns the normalized version of a vector, i.e. same direction, unit length.
static float[][] getOuterProduct(float[] vec1, float[] vec2)
          Returns a 2-tensor which is the outer product of 2 vectors.
static float getSumScalarProduct(float[] testVector, java.util.ArrayList<float[]> vectors)
          Sums the scalar products of a vector and each member of a list of vectors.
static float[][] getTensorSum(float[][] ten1, float[][] ten2)
          Returns the sum of two tensors.
static boolean isZeroTensor(float[][] ten)
           
static boolean isZeroVector(float[] vec)
           
static boolean orthogonalizeVectors(java.util.ArrayList<float[]> vectors)
          The orthogonalize function takes an array of vectors and orthogonalizes them using the Gram-Schmidt process.
static float[] permuteVector(float[] indexVector, int rotation)
          This method implements rotation as a form of vector permutation, as described in Sahlgren, Holst and Kanerva 2008.
static short[] permuteVector(short[] indexVector, int rotation)
          This method implements rotation as a form of vector permutation, as described in Sahlgren, Holst and Kanerva 2008.
static void printMatrix(float[][] matrix)
           
static void printVector(float[] vector)
           
static float scalarProduct(float[] vec1, float[] vec2)
          Returns the scalar product (dot product) of two vectors. For normalized vectors this is the same as cosine similarity.
static float[] sparseVectorToFloatVector(short[] sparseVector, int dimension)
          Translate sparse format (listing of offsets) into full float vector.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

VectorUtils

public VectorUtils()
Method Detail

printVector

public static void printVector(float[] vector)

printMatrix

public static void printMatrix(float[][] matrix)

Floats

public static float[] Floats(double[] vector)

Floats

public static float[][] Floats(double[][] matrix)

isZeroVector

public static boolean isZeroVector(float[] vec)

isZeroTensor

public static boolean isZeroTensor(float[][] ten)

createZeroTensor

public static float[][] createZeroTensor(int dim)

scalarProduct

public static float scalarProduct(float[] vec1,
                                  float[] vec2)
Returns the scalar product (dot product) of two vectors. For normalized vectors this is the same as cosine similarity.

Parameters:
vec1 - First vector.
vec2 - Second vector.
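
The relationship between the scalar product and cosine similarity can be illustrated with a minimal sketch. The class name ScalarProductSketch and the length check are assumptions made for the example; this is not the library's source.

```java
/** Illustrative sketch only; the class name is hypothetical. */
final class ScalarProductSketch {
    /** Dot product of two equal-length vectors. */
    static float scalarProduct(float[] vec1, float[] vec2) {
        if (vec1.length != vec2.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        float sum = 0f;
        for (int i = 0; i < vec1.length; i++) {
            sum += vec1[i] * vec2[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both inputs already have unit length, so the result is their cosine.
        float[] a = {0.6f, 0.8f};
        float[] b = {0.8f, 0.6f};
        System.out.println(scalarProduct(a, b)); // approximately 0.96
    }
}
```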

euclideanDistance

public static float euclideanDistance(float[] vec1,
                                      float[] vec2)

getNearestVector

public static int getNearestVector(float[] vector,
                                   float[][] candidates)
Get nearest vector from list of candidates.

Parameters:
vector - The vector whose nearest neighbor is to be found.
candidates - The list of vectors from which the nearest is to be chosen.
Returns:
Integer value referencing the position in the candidate list of the nearest vector.
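
A possible reading of this lookup is sketched below. The page does not say which measure defines "nearest"; the sketch assumes the candidate with the highest scalar product wins, and the class name NearestVectorSketch is hypothetical.

```java
/** Illustrative sketch only; similarity by scalar product is an assumption. */
final class NearestVectorSketch {
    /** Returns the index of the candidate with the largest dot product, or -1 if there are none. */
    static int getNearestVector(float[] vector, float[][] candidates) {
        int best = -1;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            float score = 0f;
            for (int j = 0; j < vector.length; j++) {
                score += vector[j] * candidates[i][j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[][] candidates = {{0f, 1f}, {0.9f, 0.1f}, {-1f, 0f}};
        System.out.println(getNearestVector(query, candidates)); // 1
    }
}
```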

getNormalizedVector

public static float[] getNormalizedVector(float[] vec)
Returns the normalized version of a vector, i.e. same direction, unit length.

Parameters:
vec - Vector whose normalized version is requested.
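
Normalization divides each coordinate by the vector's Euclidean length. The sketch below shows the idea; returning a zero vector for zero input is an assumption, as is the class name NormalizeSketch.

```java
/** Illustrative sketch only; zero-vector handling is an assumption. */
final class NormalizeSketch {
    /** Returns a new vector with the same direction and unit length. */
    static float[] getNormalizedVector(float[] vec) {
        double sumOfSquares = 0.0;
        for (float v : vec) {
            sumOfSquares += v * v;
        }
        float norm = (float) Math.sqrt(sumOfSquares);
        float[] result = new float[vec.length];
        if (norm == 0f) {
            return result; // nothing to scale; keep all zeros
        }
        for (int i = 0; i < vec.length; i++) {
            result[i] = vec[i] / norm;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] unit = getNormalizedVector(new float[] {3f, 4f});
        System.out.println(unit[0] + ", " + unit[1]); // approximately 0.6, 0.8
    }
}
```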

getNormalizedTensor

public static float[][] getNormalizedTensor(float[][] tensor)
Returns the normalized version of a 2-tensor, i.e. an array of arrays of floats.


getOuterProduct

public static float[][] getOuterProduct(float[] vec1,
                                        float[] vec2)
Returns a 2-tensor which is the outer product of 2 vectors.
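
As a small illustration, entry [i][j] of the outer product is vec1[i] * vec2[j]. The class name OuterProductSketch is hypothetical and the code is a sketch, not the library's implementation.

```java
import java.util.Arrays;

/** Illustrative sketch only; the class name is hypothetical. */
final class OuterProductSketch {
    /** Builds the matrix whose [i][j] entry is vec1[i] * vec2[j]. */
    static float[][] getOuterProduct(float[] vec1, float[] vec2) {
        float[][] tensor = new float[vec1.length][vec2.length];
        for (int i = 0; i < vec1.length; i++) {
            for (int j = 0; j < vec2.length; j++) {
                tensor[i][j] = vec1[i] * vec2[j];
            }
        }
        return tensor;
    }

    public static void main(String[] args) {
        float[][] t = getOuterProduct(new float[] {1f, 2f}, new float[] {3f, 4f, 5f});
        System.out.println(Arrays.deepToString(t)); // [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
    }
}
```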


getTensorSum

public static float[][] getTensorSum(float[][] ten1,
                                     float[][] ten2)
Returns the sum of two tensors.


getInnerProduct

public static float getInnerProduct(float[][] ten1,
                                    float[][] ten2)
Returns the inner product of two tensors.
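
The page does not define the tensor inner product. A common definition, assumed in the sketch below, is the sum of the products of corresponding entries. The class name TensorInnerProductSketch is hypothetical.

```java
/** Illustrative sketch only; the entry-wise definition is an assumption. */
final class TensorInnerProductSketch {
    /** Sums ten1[i][j] * ten2[i][j] over all entries of two same-shaped tensors. */
    static float getInnerProduct(float[][] ten1, float[][] ten2) {
        float sum = 0f;
        for (int i = 0; i < ten1.length; i++) {
            for (int j = 0; j < ten1[i].length; j++) {
                sum += ten1[i][j] * ten2[i][j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        float[][] a = {{1f, 2f}, {3f, 4f}};
        float[][] b = {{5f, 6f}, {7f, 8f}};
        System.out.println(getInnerProduct(a, b)); // 70.0
    }
}
```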


getConvolutionFromTensor

public static float[] getConvolutionFromTensor(float[][] tensor)
Returns the convolution of two vectors; see Plate, Holographic Reduced Representation, p. 76.


getConvolutionFromVectors

public static float[] getConvolutionFromVectors(float[] vec1,
                                                float[] vec2)
Returns the convolution of two vectors; see Plate, Holographic Reduced Representation, p. 76.
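
Plate's holographic reduced representations use circular convolution, which can be computed by summing the wrapped diagonals of the outer product. The sketch below shows that construction. Whether this class uses the circular form or an unwrapped variant is not stated on this page, so treat the wrap-around as an assumption; the class and method names are hypothetical and the tensor is assumed to be square.

```java
import java.util.Arrays;

/** Illustrative sketch of circular convolution; not the library's source. */
final class ConvolutionSketch {
    static float[][] outerProduct(float[] a, float[] b) {
        float[][] tensor = new float[a.length][b.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                tensor[i][j] = a[i] * b[j];
            }
        }
        return tensor;
    }

    /** Adds entry [i][j] of a square tensor into slot (i + j) mod n. */
    static float[] convolutionFromTensor(float[][] tensor) {
        int n = tensor.length;
        float[] conv = new float[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                conv[(i + j) % n] += tensor[i][j];
            }
        }
        return conv;
    }

    static float[] convolutionFromVectors(float[] a, float[] b) {
        return convolutionFromTensor(outerProduct(a, b));
    }

    public static void main(String[] args) {
        float[] c = convolutionFromVectors(new float[] {1f, 2f, 3f}, new float[] {4f, 5f, 6f});
        System.out.println(Arrays.toString(c)); // [31.0, 31.0, 28.0]
    }
}
```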


getSumScalarProduct

public static float getSumScalarProduct(float[] testVector,
                                        java.util.ArrayList<float[]> vectors)
Sums the scalar products of a vector and each member of a list of vectors. If the list is orthonormal, this gives the cosine similarity of the test vector and the subspace generated by the orthonormal vectors.
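
The summation itself can be sketched as follows. The class name SumScalarProductSketch is hypothetical, and the sketch accepts any List rather than only an ArrayList.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the class name is hypothetical. */
final class SumScalarProductSketch {
    /** Adds up the dot products of testVector with every vector in the list. */
    static float getSumScalarProduct(float[] testVector, List<float[]> vectors) {
        float total = 0f;
        for (float[] vector : vectors) {
            for (int i = 0; i < testVector.length; i++) {
                total += testVector[i] * vector[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<float[]> basis = new ArrayList<>();
        basis.add(new float[] {1f, 0f});
        basis.add(new float[] {0f, 1f});
        System.out.println(getSumScalarProduct(new float[] {1f, 2f}, basis)); // 3.0
    }
}
```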


orthogonalizeVectors

public static boolean orthogonalizeVectors(java.util.ArrayList<float[]> vectors)
The orthogonalize function takes an array of vectors and orthogonalizes them using the Gram-Schmidt process. The vectors are orthogonalized in place, so no new list of vectors is returned. Note that the output of this function is order dependent: the jth vector in the array is made orthogonal to all the previous vectors. Since this means that the last vector is orthogonal to all the others, this can be used as a negation function to give a vector for vectors[last] NOT (vectors[0] OR ... OR vectors[last - 1]).

Parameters:
vectors - ArrayList of vectors (which are themselves arrays of floats) to be orthogonalized in place.
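
The in-place, order-dependent behaviour can be illustrated with a minimal Gram-Schmidt sketch. Normalizing each vector after subtracting its projections is an assumption, as are the names GramSchmidtSketch and orthogonalize. The sketch returns nothing because the meaning of the real method's boolean result is not documented here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative Gram-Schmidt sketch; not the library's source. */
final class GramSchmidtSketch {
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Makes vector j orthogonal to vectors 0..j-1, modifying the arrays in place. */
    static void orthogonalize(List<float[]> vectors) {
        for (int j = 0; j < vectors.size(); j++) {
            float[] v = vectors.get(j);
            for (int k = 0; k < j; k++) {
                float[] u = vectors.get(k); // already unit length (or zero)
                float projection = dot(v, u);
                for (int i = 0; i < v.length; i++) {
                    v[i] -= projection * u[i];
                }
            }
            float norm = (float) Math.sqrt(dot(v, v));
            if (norm > 0f) {
                for (int i = 0; i < v.length; i++) {
                    v[i] /= norm;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<float[]> vectors = new ArrayList<>();
        vectors.add(new float[] {1f, 0f});
        vectors.add(new float[] {1f, 1f});
        orthogonalize(vectors);
        // The last vector now represents "vectors[1] NOT vectors[0]".
        System.out.println(dot(vectors.get(0), vectors.get(1))); // 0.0
    }
}
```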

generateRandomVector

public static short[] generateRandomVector(int seedLength,
                                           java.util.Random random)
Generates a basic sparse vector (dimension = Flags.dimension) with mainly zeros and some 1 and -1 entries (seedLength/2 of each). Each vector is an array of length seedLength containing 1 + the index of a non-zero value, signed according to whether the entry is +1 or -1.
For example, +20 indicates a +1 in position 19 and +1 indicates a +1 in position 0; -20 indicates a -1 in position 19 and -1 indicates a -1 in position 0.
The extra offset of +1 is needed because position 0 cannot carry a sign and would therefore be wasted. The code is slightly more complicated as a result, but the representation is slightly more space efficient.

Returns:
Sparse representation of basic ternary vector. Array of short signed integers, indices to the array locations where a +/-1 entry is located.
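
The signed (index + 1) encoding can be sketched as below. The real method reads the dimension from Flags.dimension; the sketch takes it as an explicit parameter instead. Choosing distinct positions by rejection sampling and placing the positive entries first are assumptions, and the class name RandomSparseVectorSketch is hypothetical.

```java
import java.util.Random;

/** Illustrative sketch of the sparse ternary encoding; not the library's source. */
final class RandomSparseVectorSketch {
    /** Returns seedLength signed entries, each equal to +/-(position + 1). */
    static short[] generateRandomVector(int seedLength, int dimension, Random random) {
        if (seedLength > dimension || dimension > Short.MAX_VALUE) {
            throw new IllegalArgumentException("seedLength or dimension out of range");
        }
        boolean[] occupied = new boolean[dimension];
        short[] sparse = new short[seedLength];
        int filled = 0;
        while (filled < seedLength) {
            int position = random.nextInt(dimension);
            if (occupied[position]) {
                continue; // position already used; draw again
            }
            occupied[position] = true;
            int entry = position + 1; // offset so that position 0 can carry a sign
            // First half of the entries encode +1, the rest encode -1.
            sparse[filled] = (short) (filled < seedLength / 2 ? entry : -entry);
            filled++;
        }
        return sparse;
    }

    public static void main(String[] args) {
        short[] v = generateRandomVector(10, 200, new Random(42));
        System.out.println(v.length); // 10
    }
}
```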

getNLargestPositions

public static short[] getNLargestPositions(float[] values,
                                           int numResults)
Given an array of floats, return an array of indices to the n largest values.
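
One simple way to find these indices is to sort the positions by value, as in the sketch below. Returning the indices in descending order of value is an assumption, and the class name NLargestSketch is hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; result ordering is an assumption. */
final class NLargestSketch {
    /** Returns the indices of the numResults largest values, largest first. */
    static short[] getNLargestPositions(float[] values, int numResults) {
        Integer[] indices = new Integer[values.length];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }
        // Sort the index array by the value each index refers to, descending.
        Arrays.sort(indices, (a, b) -> Float.compare(values[b], values[a]));
        int n = Math.min(numResults, values.length);
        short[] result = new short[n];
        for (int i = 0; i < n; i++) {
            result[i] = (short) indices[i].intValue();
        }
        return result;
    }

    public static void main(String[] args) {
        short[] top = getNLargestPositions(new float[] {0.1f, 0.9f, 0.5f, 0.7f}, 2);
        System.out.println(Arrays.toString(top)); // [1, 3]
    }
}
```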


floatVectorToSparseVector

public static short[] floatVectorToSparseVector(float[] floatVector,
                                                int seedLength)
Take a vector of floats and simplify by quantizing to a sparse format. Lossy.


sparseVectorToFloatVector

public static float[] sparseVectorToFloatVector(short[] sparseVector,
                                                int dimension)
Translate sparse format (listing of offsets) into full float vector. The random vector is in condensed (signed index + 1) representation, and is converted to a full float vector by adding -1 or +1 to the location (index - 1) according to the sign of the index. (The index offset of 1 is necessary because there is no signed version of 0, so we'd have no way of telling whether the zeroth position in the array should be plus or minus 1.)
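
The decoding step described above can be sketched as follows. Skipping entries equal to 0 is an assumption, and the class name SparseToFloatSketch is hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch of decoding the signed (index + 1) format. */
final class SparseToFloatSketch {
    /** Expands signed (index + 1) entries into a dense vector of the given dimension. */
    static float[] sparseVectorToFloatVector(short[] sparseVector, int dimension) {
        float[] dense = new float[dimension];
        for (short entry : sparseVector) {
            if (entry == 0) {
                continue; // 0 carries no sign, so it is treated as "no entry"
            }
            int position = Math.abs(entry) - 1;
            dense[position] += entry > 0 ? 1f : -1f;
        }
        return dense;
    }

    public static void main(String[] args) {
        float[] dense = sparseVectorToFloatVector(new short[] {3, -1}, 4);
        System.out.println(Arrays.toString(dense)); // [-1.0, 0.0, 1.0, 0.0]
    }
}
```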


permuteVector

public static short[] permuteVector(short[] indexVector,
                                    int rotation)
This method implements rotation as a form of vector permutation, as described in Sahlgren, Holst and Kanerva 2008. This supports encoding of N-grams, as rotating random vectors serves as a convenient alternative to random permutation.

Parameters:
indexVector - the sparse vector to be permuted
rotation - the direction and number of places to rotate
Returns:
sparse vector with permutation
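
Rotating a sparse vector only requires shifting each encoded position and keeping its sign, as the sketch below shows. The real method presumably takes the dimension from a global setting; the sketch passes it explicitly. The rotation direction (positive moves entries to higher positions) is an assumption, and the class name SparsePermuteSketch is hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch of rotating the signed (index + 1) format. */
final class SparsePermuteSketch {
    /** Shifts every encoded position by rotation, wrapping around at dimension. */
    static short[] permuteVector(short[] indexVector, int rotation, int dimension) {
        short[] rotated = new short[indexVector.length];
        for (int i = 0; i < indexVector.length; i++) {
            int entry = indexVector[i];
            int sign = entry < 0 ? -1 : 1;
            int position = Math.abs(entry) - 1;
            // floorMod keeps the result in [0, dimension) for negative rotations too.
            int newPosition = Math.floorMod(position + rotation, dimension);
            rotated[i] = (short) (sign * (newPosition + 1));
        }
        return rotated;
    }

    public static void main(String[] args) {
        short[] rotated = permuteVector(new short[] {20, -1}, 1, 20);
        System.out.println(Arrays.toString(rotated)); // [1, -2]
    }
}
```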

permuteVector

public static float[] permuteVector(float[] indexVector,
                                    int rotation)
This method implements rotation as a form of vector permutation, as described in Sahlgren, Holst and Kanerva 2008. This supports encoding of N-grams, as rotating random vectors serves as a convenient alternative to random permutation.

Parameters:
indexVector - the vector to be permuted
rotation - the direction and number of places to rotate
Returns:
vector with permutation
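
For a dense vector, rotation moves the value at position i to position (i + rotation) mod n. The direction of a positive rotation is an assumption in this sketch, and the class name DensePermuteSketch is hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch of rotating a dense vector; not the library's source. */
final class DensePermuteSketch {
    /** Returns a copy of the vector with every element shifted by rotation places. */
    static float[] permuteVector(float[] vector, int rotation) {
        int n = vector.length;
        float[] rotated = new float[n];
        for (int i = 0; i < n; i++) {
            rotated[Math.floorMod(i + rotation, n)] = vector[i];
        }
        return rotated;
    }

    public static void main(String[] args) {
        float[] rotated = permuteVector(new float[] {1f, 2f, 3f, 4f}, 1);
        System.out.println(Arrays.toString(rotated)); // [4.0, 1.0, 2.0, 3.0]
    }
}
```

Rotating by k and then by -k restores the original vector, which is what makes rotation usable as an invertible permutation.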

addVectors

public static float[] addVectors(float[] vector1,
                                 float[] vector2,
                                 int weight)
Add two vectors. Overloaded method to handle either float[] + float[] or float[] + sparse vector.

Parameters:
vector1 - initial vector
vector2 - vector to be added
weight - weight (presently only term frequency implemented - may need this to take floats later)
Returns:
sum of two vectors

addVectors

public static float[] addVectors(float[] vector1,
                                 short[] sparseVector,
                                 int weight)
Add two vectors. Overloaded method to handle either float[] + float[] or float[] + sparse vector.

Parameters:
vector1 - initial vector
sparseVector - vector to be added
weight - weight (presently only term frequency implemented - may need this to take floats later)
Returns:
sum of two vectors
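
Both overloads can be sketched side by side. Returning a new array instead of updating vector1 in place is an assumption, as is the reading of weight as a multiplier on the second vector. The class name AddVectorsSketch is hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch of weighted addition; not the library's source. */
final class AddVectorsSketch {
    /** Dense case: result[i] = vector1[i] + weight * vector2[i]. */
    static float[] addVectors(float[] vector1, float[] vector2, int weight) {
        float[] sum = vector1.clone();
        for (int i = 0; i < sum.length; i++) {
            sum[i] += weight * vector2[i];
        }
        return sum;
    }

    /** Sparse case: each signed (index + 1) entry contributes +weight or -weight. */
    static float[] addVectors(float[] vector1, short[] sparseVector, int weight) {
        float[] sum = vector1.clone();
        for (short entry : sparseVector) {
            if (entry == 0) {
                continue; // 0 is not a valid encoded position
            }
            int position = Math.abs(entry) - 1;
            sum[position] += entry > 0 ? weight : -weight;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] dense = addVectors(new float[] {1f, 1f, 1f}, new float[] {1f, 2f, 3f}, 2);
        System.out.println(Arrays.toString(dense)); // [3.0, 5.0, 7.0]
        float[] sparse = addVectors(new float[3], new short[] {3, -1}, 2);
        System.out.println(Arrays.toString(sparse)); // [-2.0, 0.0, 2.0]
    }
}
```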