25.2 Basic Statistical Functions

Octave provides a number of basic statistical functions. Many are useful as initial steps to prepare a data set for further analysis. Others provide measures that differ from the basic descriptive statistics.

— Function File: center (x)
— Function File: center (x, dim)

If x is a vector, subtract its mean. If x is a matrix, do the above for each column. If the optional argument dim is given, operate along this dimension.

See also: studentize.
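
As a minimal sketch of the operation described above, the centering can be written out with mean and broadcasting instead of calling center itself. The sample matrix is made up for illustration.

```octave
% Illustrative data (an assumption): 3 observations of 2 variables.
x = [1 10; 2 20; 6 30];

% Column-wise centering: mean (x) is a 1-by-2 row vector that is
% broadcast down the rows of x.
xc = x - mean (x);

% Centering along dimension 2 instead: each row has its own mean removed.
xr = x - mean (x, 2);
```

After centering, every column of xc (and every row of xr) has mean zero.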

— Function File: studentize (x)
— Function File: studentize (x, dim)

If x is a vector, subtract its mean and divide by its standard deviation.

If x is a matrix, do the above along the first non-singleton dimension. If the optional argument dim is given, operate along this dimension.

See also: center.
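
The same computation can be sketched by hand for a vector. This assumes the standard deviation is the one returned by std with its default normalization; the data are illustrative.

```octave
% Illustrative column vector (an assumption).
x = [2; 4; 4; 4; 5; 5; 7; 9];

% Subtract the mean, then divide by the standard deviation.
z = (x - mean (x)) / std (x);

% The result has mean 0 and standard deviation 1 (up to rounding).
m = mean (z);
s = std (z);
```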

— Function File: n = histc (x, edges)
— Function File: n = histc (x, edges, dim)
— Function File: [n, idx] = histc (...)

Produce histogram counts.

When x is a vector, the function counts the number of elements of x that fall in the histogram bins defined by edges. edges must be a vector of monotonically increasing values that define the edges of the histogram bins. n(k) contains the number of elements in x for which edges(k) <= x < edges(k+1). The final element of n contains the number of elements of x exactly equal to the last element of edges.

When x is an N-dimensional array, the computation is carried out along dimension dim. If not specified, dim defaults to the first non-singleton dimension.

When a second output argument is requested, an index matrix is also returned. The idx matrix has the same size as x. Each element of idx contains the index of the histogram bin in which the corresponding element of x was counted.

See also: hist.
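
A short example may help with the bin conventions. The data and edges below are illustrative, and every value lies inside the range covered by edges.

```octave
% Illustrative data and bin edges (assumptions).
x     = [0.5 1 1.5 2 2.5 3 3];
edges = [0 1 2 3];

[n, idx] = histc (x, edges);

% n   is [1 2 2 2]: one value in [0,1), two in [1,2), two in [2,3),
%     and two values exactly equal to the last edge, 3.
% idx is [1 2 2 3 3 4 4]: the bin in which each element was counted.
```

Note that the last element of n is not a half-open interval; it only counts values equal to edges(end).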

— Function File: cut (x, breaks)

Create categorical data from numerical or continuous data by cutting into intervals.

If breaks is a scalar, the data is cut into that many equal-width intervals. If breaks is a vector of break points, the data is cut into length (breaks) - 1 groups.

The returned value is a vector of the same size as x telling which group each point in x belongs to. Groups are labelled from 1 to the number of groups; points outside the range of breaks are labelled by NaN.

See also: histc.
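
The grouping for a vector of break points can be sketched with a loop. This is not the implementation of cut. The text above does not say which side of each interval is closed, so the sketch assumes left-closed intervals and uses data that avoid the break points themselves.

```octave
% Illustrative data and break points (assumptions).
x      = [0.5 1.5 2.5 3.5 9];
breaks = [0 1 2 4];              % 4 break points give 3 groups

group = NaN (size (x));          % points outside the breaks stay NaN
nb = numel (breaks);
for i = 1:nb-1
  % Assumed convention: breaks(i) <= x < breaks(i+1).
  in_bin = (x >= breaks(i)) & (x < breaks(i+1));
  group(in_bin) = i;
endfor

% group is [1 2 3 3 NaN]; the value 9 lies outside the range of breaks.
```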

— Function File: c = nchoosek (n, k)

Compute the binomial coefficient or all combinations of n. If n is a scalar, then calculate the binomial coefficient of n and k, defined as

           /   \
           | n |    n (n-1) (n-2) ... (n-k+1)       n!
           |   |  = ------------------------- =  ---------
           | k |               k!                k! (n-k)!
           \   /

If n is a vector, generate all combinations of the elements of n, taken k at a time, one row per combination. The resulting c has size [nchoosek (length (n), k), k].

nchoosek works only for non-negative integer arguments; use bincoeff for non-integer scalar arguments, or to compute many coefficients at once from vector arguments.

See also: bincoeff.
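
The two modes of nchoosek, and the vectorized alternative bincoeff, can be compared on small illustrative inputs:

```octave
% Scalar n: the binomial coefficient 5*4 / 2! = 10.
c = nchoosek (5, 2);

% Vector n: every 2-element combination of the elements, one per row.
combos = nchoosek ([1 2 3 4], 2);
sz = size (combos);              % [nchoosek(4, 2), 2], i.e. 6 rows, 2 columns

% Many coefficients at once with bincoeff: k = 0, 1, ..., 5 for n = 5.
b = bincoeff (5, 0:5);
```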

— Function File: perms (v)

Generate all permutations of v, one row per permutation. The result has factorial (n) rows and n columns, where n is the length of v.

As an example, perms([1, 2, 3]) returns the matrix

            1   2   3
            2   1   3
            1   3   2
            2   3   1
            3   1   2
            3   2   1
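
The size of the result can be checked directly. The order in which the rows are returned may differ between Octave versions, so this sketch sorts the rows before inspecting them.

```octave
v = [1 2 3];
p = perms (v);

sz = size (p);                   % factorial (3) rows and 3 columns

% Sort the rows so the listing does not depend on the order perms uses.
sorted = sortrows (p);
```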

— Function File: ranks (x, dim)

Return the ranks of x along the first non-singleton dimension adjusted for ties. If the optional argument dim is given, operate along this dimension.

See also: spearman, kendall.
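
The following sketch shows one common way to adjust ranks for ties: tied values share the average of the ranks they would otherwise occupy. That this is the convention ranks uses is an assumption here, and the code is an illustration rather than the function's implementation.

```octave
% Illustrative data with one tie (an assumption).
x = [10 20 20 30];

% Ranks ignoring ties: position of each element in sorted order.
[~, order] = sort (x);
r = zeros (size (x));
r(order) = 1:numel (x);

% Give tied values the average of their ranks.
vals = unique (x);
for i = 1:numel (vals)
  tied = (x == vals(i));
  r(tied) = mean (r(tied));
endfor

% r is [1 2.5 2.5 4].
```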

— Function File: run_count (x, n)
— Function File: run_count (x, n, dim)

Count the upward runs of length 1, 2, ..., n-1, and of length greater than or equal to n, along the first non-singleton dimension of x.

If the optional argument dim is given, then operate along this dimension.
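
To make the counting concrete, the sketch below tallies upward runs of a vector by hand. It is not the implementation of run_count. It treats a run as ending wherever the next value is smaller; how ties are handled is an assumption, and the illustrative data contain none.

```octave
% Illustrative data (an assumption): runs are [1 2 3], [1 5], [4 6 7 8].
x = [1 2 3 1 5 4 6 7 8];
n = 3;

% A run ends wherever the next element is smaller.
ends    = find (diff (x) < 0);
run_len = diff ([0 ends numel(x)]);      % run lengths: [3 2 4]

% Count runs of length 1, ..., n-1, and of length >= n.
counts = zeros (1, n);
for k = 1:n-1
  counts(k) = sum (run_len == k);
endfor
counts(n) = sum (run_len >= n);

% counts is [0 1 2]: no runs of length 1, one of length 2, two of length >= 3.
```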

— Function File: probit (p)

For each component of p, return the probit (the quantile of the standard normal distribution) of p.
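
Because the standard normal quantile can be written in terms of the inverse error function, the probit can be sketched without calling probit itself. The probabilities below are illustrative.

```octave
% Illustrative probabilities (an assumption).
p = [0.025 0.5 0.975];

% Standard normal quantile: z = sqrt(2) * erfinv (2p - 1).
z = sqrt (2) * erfinv (2 * p - 1);

% Mapping back through the standard normal CDF recovers p.
p_back = 0.5 * (1 + erf (z / sqrt (2)));
```

For these inputs z is approximately [-1.96 0 1.96].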

— Function File: logit (p)

For each component of p, return the logit of p defined as

          logit(p) = log (p / (1-p))

See also: logistic_cdf.
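
The definition can be evaluated element by element. The sketch also applies the logistic function, which inverts the logit; the probabilities are illustrative.

```octave
% Illustrative probabilities (an assumption).
p = [0.1 0.5 0.9];

% logit(p) = log (p / (1-p)), computed for each component.
y = log (p ./ (1 - p));

% The logistic function maps the values back to probabilities.
p_back = 1 ./ (1 + exp (-y));
```

The logit of 0.5 is 0, and the values for 0.1 and 0.9 are equal in magnitude with opposite signs.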

— Function File: cloglog (x)

Return the complementary log-log function of x, defined as

          cloglog(x) = - log (- log (x))
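
Written out element by element, the definition above and its inverse look like this. The inputs are illustrative and lie strictly between 0 and 1, so that the inner logarithm is negative and the outer one is defined.

```octave
% Illustrative inputs in (0, 1) (an assumption).
x = [0.1 0.5 0.9];

% cloglog(x) = - log (- log (x)), as defined above.
y = -log (-log (x));

% Inverting the definition recovers x.
x_back = exp (-exp (-y));
```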

— Function File: mahalanobis (x, y)

Return the Mahalanobis' D-square distance between the multivariate samples x and y, which must have the same number of components (columns), but may have a different number of observations (rows).
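
One common definition of the D-square distance between two samples uses the difference of the sample means and a pooled covariance matrix. The sketch below follows that definition; whether mahalanobis uses exactly this pooling and normalization is an assumption, and the data are illustrative.

```octave
% Illustrative samples (assumptions): same number of columns,
% different numbers of rows.
x = [1 2; 2 1; 3 4; 4 3];
y = [5 6; 6 8; 7 7];

xm = mean (x);
ym = mean (y);
xc = x - xm;                     % centered observations
yc = y - ym;

% Pooled covariance estimate (assumed normalization: total rows minus 2).
w = (xc' * xc + yc' * yc) / (rows (x) + rows (y) - 2);

% Squared distance between the mean vectors, scaled by w.
d  = xm - ym;
D2 = d * (w \ d');
```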

— Function File: [t, l_x] = table (x)
— Function File: [t, l_x, l_y] = table (x, y)

Create a contingency table t from data vectors. The l_x and l_y vectors are the corresponding levels.

Currently, only 1- and 2-dimensional tables are supported.
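
What a two-way contingency table contains can be sketched with unique and accumarray instead of table itself. The data vectors are illustrative.

```octave
% Illustrative data vectors (assumptions).
x = [1 2 2 3 1 2];
y = [1 1 2 2 1 2];

% Levels of each variable, and the level index of every observation.
[l_x, ~, ix] = unique (x);
[l_y, ~, iy] = unique (y);

% t(i, j) counts the observations with x == l_x(i) and y == l_y(j).
t = accumarray ([ix(:), iy(:)], 1, [numel(l_x), numel(l_y)]);

% t is [2 0; 1 2; 0 1], with l_x = [1 2 3] and l_y = [1 2].
```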