CoordinateMatrix
- class pyspark.mllib.linalg.distributed.CoordinateMatrix(entries, numRows=0, numCols=0)
Represents a matrix in coordinate format.
- Parameters
- entries : pyspark.RDD
An RDD of MatrixEntry inputs or (int, int, float) tuples.
- numRows : int, optional
Number of rows in the matrix. A non-positive value (the default is 0) means unknown, in which case the number of rows is determined by the maximum row index plus one.
- numCols : int, optional
Number of columns in the matrix. A non-positive value (the default is 0) means unknown, in which case the number of columns is determined by the maximum column index plus one.
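The dimension rule above can be illustrated without Spark. The following plain-Python sketch is not PySpark's implementation: `Entry` (a local stand-in for `MatrixEntry`) and `infer_shape` are hypothetical names, and the entries are held in an ordinary list rather than an RDD.

```python
from typing import Iterable, NamedTuple, Tuple, Union


class Entry(NamedTuple):
    """Hypothetical local stand-in for MatrixEntry (not the PySpark class)."""
    i: int
    j: int
    value: float


def infer_shape(
    entries: Iterable[Union[Entry, Tuple[int, int, float]]],
    num_rows: int = 0,
    num_cols: int = 0,
) -> Tuple[int, int]:
    """Return (num_rows, num_cols), inferring any non-positive dimension."""
    # Accept Entry objects or plain (int, int, float) tuples, as the docs allow.
    normalized = [e if isinstance(e, Entry) else Entry(*e) for e in entries]
    if num_rows <= 0:
        # Unknown row count: maximum row index plus one (0 if there are no entries).
        num_rows = max((e.i for e in normalized), default=-1) + 1
    if num_cols <= 0:
        # Unknown column count: maximum column index plus one.
        num_cols = max((e.j for e in normalized), default=-1) + 1
    return num_rows, num_cols


entries = [(0, 0, 1.2), (1, 0, 2.0), Entry(2, 1, 3.7)]
print(infer_shape(entries))        # (3, 2), as in the numRows/numCols examples
print(infer_shape(entries, 7, 6))  # (7, 6): explicit positive sizes are kept
```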
Methods
- numCols(): Get or compute the number of columns.
- numRows(): Get or compute the number of rows.
- toBlockMatrix([rowsPerBlock, colsPerBlock]): Convert this matrix to a BlockMatrix.
- toIndexedRowMatrix(): Convert this matrix to an IndexedRowMatrix.
- toRowMatrix(): Convert this matrix to a RowMatrix.
- transpose(): Transpose this CoordinateMatrix.
Attributes
- entries: Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.
Methods Documentation
- numCols()
Get or compute the number of columns.
Examples
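>>> from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry
>>> # `sc` is assumed to be an active SparkContext.
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numCols())
2
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numCols())
6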
- numRows()
Get or compute the number of rows.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numRows())
3
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numRows())
7
- toBlockMatrix(rowsPerBlock=1024, colsPerBlock=1024)
Convert this matrix to a BlockMatrix.
- Parameters
- rowsPerBlock : int, optional
Number of rows that make up each block (default 1024). The blocks in the final block row are not required to have this many rows.
- colsPerBlock : int, optional
Number of columns that make up each block (default 1024). The blocks in the final block column are not required to have this many columns.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toBlockMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # BlockMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # BlockMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
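The block layout can be sketched in plain Python. This is a hypothetical illustration (`to_blocks` and `block_shape` are not PySpark functions) assuming fixed-size blocks: each global coordinate maps to a block key plus a local offset, and the blocks in the final block row or column come out smaller when the matrix size is not a multiple of the block size.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def to_blocks(
    entries: Iterable[Tuple[int, int, float]],
    rows_per_block: int = 1024,
    cols_per_block: int = 1024,
) -> Dict[Cell, Dict[Cell, float]]:
    """Group (i, j, value) entries by block; each block maps local (row, col) to value."""
    blocks: Dict[Cell, Dict[Cell, float]] = {}
    for i, j, value in entries:
        key = (i // rows_per_block, j // cols_per_block)  # which block
        local = (i % rows_per_block, j % cols_per_block)  # offset inside that block
        # This sketch assumes each (i, j) appears at most once.
        blocks.setdefault(key, {})[local] = value
    return blocks


def block_shape(block_row: int, block_col: int, num_rows: int, num_cols: int,
                rows_per_block: int, cols_per_block: int) -> Cell:
    """Height and width of one block; the last block row/column may be smaller."""
    height = min(rows_per_block, num_rows - block_row * rows_per_block)
    width = min(cols_per_block, num_cols - block_col * cols_per_block)
    return height, width


# Same entries as the example above, with small blocks so the split is visible.
blocks = to_blocks([(0, 0, 1.2), (6, 4, 2.1)], rows_per_block=3, cols_per_block=2)
print(blocks)                         # {(0, 0): {(0, 0): 1.2}, (2, 2): {(0, 0): 2.1}}
print(block_shape(2, 2, 7, 5, 3, 2))  # (1, 1): the final block is truncated
```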
- toIndexedRowMatrix()
Convert this matrix to an IndexedRowMatrix.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toIndexedRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # IndexedRowMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # IndexedRowMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
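Conceptually, converting to an IndexedRowMatrix means collecting each row's entries under that row's index. The plain-Python sketch below illustrates that grouping only; the function name `to_indexed_rows` is hypothetical, and it returns index/value lists rather than PySpark vectors. It yields data only for rows that actually contain entries, whereas the example above still reports 7 rows for the resulting IndexedRowMatrix.

```python
from typing import Dict, Iterable, List, Tuple


def to_indexed_rows(
    entries: Iterable[Tuple[int, int, float]],
) -> Dict[int, Tuple[List[int], List[float]]]:
    """Group (i, j, value) entries by row index i.

    Each row becomes (sorted column indices, matching values), i.e. the data a
    sparse row vector would hold. Rows with no entries do not appear.
    """
    grouped: Dict[int, List[Tuple[int, float]]] = {}
    for i, j, value in entries:
        grouped.setdefault(i, []).append((j, value))
    rows = {}
    for i, cells in grouped.items():
        cells.sort()  # order each row's cells by column index
        rows[i] = ([j for j, _ in cells], [v for _, v in cells])
    return rows


print(to_indexed_rows([(0, 0, 1.2), (6, 4, 2.1), (0, 3, 5.0)]))
# {0: ([0, 3], [1.2, 5.0]), 6: ([4], [2.1])}
```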
- toRowMatrix()
Convert this matrix to a RowMatrix.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, but the ensuing RowMatrix
>>> # will only have 2 rows since there are only entries on 2
>>> # unique rows.
>>> print(mat.numRows())
2
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing RowMatrix
>>> # will have 5 columns as well.
>>> print(mat.numCols())
5
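The row-count difference comes down to whether row indices survive the conversion. In the hypothetical plain-Python sketch below, `to_rows` groups entries by row and then discards the indices, so only non-empty rows remain; that mirrors why the RowMatrix example above reports 2 rows while the maximum-index rule gives 7. It illustrates the counting, not PySpark's implementation.

```python
from typing import Dict, Iterable, List, Tuple


def to_rows(entries: Iterable[Tuple[int, int, float]]) -> List[List[Tuple[int, float]]]:
    """Return one [(col, value), ...] list per non-empty row, without row indices."""
    grouped: Dict[int, List[Tuple[int, float]]] = {}
    for i, j, value in entries:
        grouped.setdefault(i, []).append((j, value))
    # Row indices are dropped here; this sketch orders rows by index only to
    # make the output deterministic.
    return [sorted(cells) for _, cells in sorted(grouped.items())]


entries = [(0, 0, 1.2), (6, 4, 2.1)]
rows = to_rows(entries)
print(rows)                               # [[(0, 1.2)], [(4, 2.1)]]
print(len(rows))                          # 2 non-empty rows
print(max(i for i, _, _ in entries) + 1)  # 7 under the max-index rule
```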
- transpose()
Transpose this CoordinateMatrix.
New in version 2.0.0.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> mat_transposed = mat.transpose()
>>> print(mat_transposed.numRows())
2
>>> print(mat_transposed.numCols())
3
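Transposing a matrix in coordinate format only requires swapping each entry's row and column index, and the inferred dimensions swap with them. A plain-Python sketch with hypothetical helper names, using local lists instead of an RDD and the same entries as the example above:

```python
from typing import List, Tuple

Triple = Tuple[int, int, float]


def transpose_entries(entries: List[Triple]) -> List[Triple]:
    """Swap the row and column index of every (i, j, value) entry."""
    return [(j, i, value) for i, j, value in entries]


def inferred_shape(entries: List[Triple]) -> Tuple[int, int]:
    """(max row index + 1, max column index + 1), the rule used when sizes are unknown."""
    return (max(i for i, _, _ in entries) + 1, max(j for _, j, _ in entries) + 1)


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
transposed = transpose_entries(entries)
print(transposed)                  # [(0, 0, 1.2), (0, 1, 2.0), (1, 2, 3.7)]
print(inferred_shape(entries))     # (3, 2)
print(inferred_shape(transposed))  # (2, 3), matching numRows() 2 and numCols() 3 above
```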
Attributes Documentation
- entries
Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.
Examples
>>> mat = CoordinateMatrix(sc.parallelize([MatrixEntry(0, 0, 1.2),
...                                        MatrixEntry(6, 4, 2.1)]))
>>> entries = mat.entries
>>> entries.first()
MatrixEntry(0, 0, 1.2)