% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ml-prepare-dataset.R
\name{ml_prepare_dataset}
\alias{ml_prepare_dataset}
\title{Creates the 'label' and 'features' columns}
\usage{
ml_prepare_dataset(
  x,
  formula = NULL,
  label = NULL,
  features = NULL,
  label_col = "label",
  features_col = "features",
  keep_original = TRUE,
  ...
)
}
\arguments{
\item{x}{A \code{tbl_pyspark} object.}

\item{formula}{An R formula. Used when \code{x} is a \code{tbl_spark}.}

\item{label}{The name of the outcome column in \code{x}.}

\item{features}{The name(s) of the predictor columns in \code{x}, as a character vector.}

\item{label_col}{Label column name, as a length-one character vector.}

\item{features_col}{Features column name, as a length-one character vector.}

\item{keep_original}{Logical flag indicating whether the output keeps the
original columns from \code{x}. Defaults to \code{TRUE}.}

\item{...}{Added for backwards compatibility; currently unused.}
}
\value{
A \code{tbl_pyspark} containing the 'label' and 'features' columns, plus the
original columns from \code{x} when \code{keep_original = TRUE}.
}
\description{
Creates the 'label' and 'features' columns
}
\details{
At this time, 'Spark ML Connect' does not include a Vector Assembler
transformer. Pipelines require a 'label' column and a 'features' column, so the
main job of this function is to create a 'PySpark' array column. Even though it
is a single column in the dataset, the 'features' column contains all of the
predictors inside an array. This function also creates a new 'label' column
that copies the outcome variable. This makes it much easier to remove the
'label' and 'outcome' columns.
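
As an illustration only, the following plain-Python sketch shows the row-level
layout described above, assuming each row is a dictionary of column values. The
function name \code{prepare_rows} is hypothetical and is not part of this
package or of PySpark; a Python list stands in for the PySpark array column.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def prepare_rows(
    rows: List[Row],
    label: str,
    features: List[str],
    label_col: str = "label",
    features_col: str = "features",
    keep_original: bool = True,
) -> List[Row]:
    """Hypothetical sketch of the label/features layout, not the real implementation."""
    out: List[Row] = []
    for row in rows:
        # Start from a copy of the original columns only when keep_original is True.
        new_row: Row = dict(row) if keep_original else {}
        # Copy the outcome variable into the label column.
        new_row[label_col] = row[label]
        # Pack every predictor, in order, into a single array-like column.
        new_row[features_col] = [row[name] for name in features]
        out.append(new_row)
    return out


if __name__ == "__main__":
    data = [{"y": 1.0, "x1": 0.5, "x2": 2.0}]
    # Prints: [{'label': 1.0, 'features': [0.5, 2.0]}]
    print(prepare_rows(data, "y", ["x1", "x2"], keep_original=False))
```

With \code{keep_original = FALSE}, the sketch returns only the 'label' and
'features' columns, mirroring the two possible outputs described under Value.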
}
