| Title: | Variable Selection for Binary Data Using the EM Algorithm | 
| Version: | 0.1 | 
| Description: | Implements variable selection for high dimensional datasets with a binary response variable using the EM algorithm. Both probit and logit models are supported. Also included is a useful function to generate high dimensional data with correlated variables. | 
| Depends: | R (≥ 3.1.3) | 
| License: | GPL-3 | 
| LazyData: | true | 
| RoxygenNote: | 5.0.1 | 
| NeedsCompilation: | no | 
| Packaged: | 2016-01-12 23:02:10 UTC; jcs8v_000 | 
| Author: | John Snyder [aut, cre] | 
| Maintainer: | John Snyder <jcs8v6@mail.missouri.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2016-01-13 08:49:37 | 
Variable Selection For Binary Data Using The EM Algorithm
Description
Conducts EM variable selection (EMVS) for a binary response under a probit or logit model.
Usage
BinomialEMVS(y, x, type = "probit", epsilon = 5e-04, v0s = ifelse(type ==
  "probit", 0.025, 5), nu.1 = ifelse(type == "probit", 100, 1000),
  nu.gam = 1, lambda.var = 0.001, a = 1, b = ncol(x),
  beta.initial = NULL, sigma.initial = 1, theta.inital = 0.5, temp = 1,
  p = ncol(x), n = nrow(x), SDCD.length = 50)
Arguments
| y | responses in 0-1 coding | 
| x | design matrix of predictors, with n rows and p columns | 
| type | model type, either "probit" or "logit" | 
| epsilon | tuning parameter | 
| v0s | tuning parameter, can be vector | 
| nu.1 | tuning parameter | 
| nu.gam | tuning parameter | 
| lambda.var | tuning parameter | 
| a | tuning parameter | 
| b | tuning parameter | 
| beta.initial | starting values | 
| sigma.initial | starting value | 
| theta.inital | starting value | 
| temp | not sure | 
| p | number of columns of x; defaults to ncol(x) | 
| n | number of rows of x; defaults to nrow(x) | 
| SDCD.length | not sure | 
Value
probs contains the posterior probabilities.
Examples
#Generate data
set.seed(1)
n=25;p=500;pr=10;cor=.6
X=data.sim(n,p,pr,cor)
#Randomly generate related beta coefficients from U(-1,1)
beta.Vec=rep(0,times=p)
beta.Vec[1:pr]=runif(pr,-1,1)
y=scale(X%*%beta.Vec+rnorm(n,0,sd=sqrt(3)),center=TRUE,scale=FALSE)
prob=1/(1+exp(-y))
y.bin=t(t(ifelse(rbinom(n,1,prob)>0,1,0)))
result.probit=BinomialEMVS(y=y.bin,x=X,type="probit")
result.logit=BinomialEMVS(y=y.bin,x=X,type="logit")
which(result.probit$posts>.5)
which(result.logit$posts>.5)
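The response-generation step in the example above can also be written with base R's `plogis()`, which computes the same inverse-logit transform as `1/(1+exp(-y))`. The sketch below is illustrative only: it replaces `data.sim()` with independent standard normal columns (an assumption, made so the block runs without the package) and does not call `BinomialEMVS()`.

```r
# Illustrative sketch only: independent N(0,1) predictors stand in for data.sim()
set.seed(1)
n <- 25; p <- 500; pr <- 10
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Only the first pr coefficients are nonzero, drawn from U(-1, 1)
beta.Vec <- rep(0, times = p)
beta.Vec[1:pr] <- runif(pr, -1, 1)

# Linear predictor plus noise, centered as in the example
eta <- as.vector(X %*% beta.Vec + rnorm(n, 0, sd = sqrt(3)))
eta <- eta - mean(eta)

# plogis() is the inverse logit, equal to 1 / (1 + exp(-eta))
prob <- plogis(eta)

# rbinom() already returns 0/1 values; store them as an n x 1 matrix
y.bin <- matrix(rbinom(n, 1, prob), ncol = 1)
```

Because `rbinom()` returns 0/1 draws directly, the `ifelse()` and double transpose used in the example are not needed to obtain a column of binary responses.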
High Dimensional Correlated Data Generation
Description
Generates a high-dimensional dataset in which a subset of columns is related to the response, while controlling the maximum correlation between related and unrelated variables.
Usage
data.sim(n = 100, p = 1000, pr = 3, cor = 0.6)
Arguments
| n | sample size | 
| p | total number of variables | 
| pr | the number of variables related to the response | 
| cor | the maximum correlation between related and unrelated variables | 
Value
Returns an n x p matrix in which the first pr columns have maximum correlation cor with the remaining p - pr columns.
Examples
data=data.sim(n=100,p=1000,pr=10,cor=.6)
max(abs(cor(data))[abs(cor(data))<1])
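The check above takes the maximum over every correlation with absolute value below 1, which includes correlations among the unrelated variables themselves. To inspect only the correlations between the first `pr` columns and the remaining columns, the cross-correlation block can be extracted directly. The helper name `max_cross_cor` below is hypothetical and not part of the package, and the matrix is simulated with `rnorm()` (an assumption) so the block runs without `data.sim()`.

```r
# Hypothetical helper (not part of the package): largest absolute correlation
# between the first pr columns of x and the remaining columns
max_cross_cor <- function(x, pr) {
  stopifnot(is.matrix(x), pr >= 1, pr < ncol(x))
  related <- x[, seq_len(pr), drop = FALSE]
  unrelated <- x[, -seq_len(pr), drop = FALSE]
  # cor(a, b) returns the pr x (p - pr) matrix of cross-correlations
  max(abs(cor(related, unrelated)))
}

# Stand-in data: independent N(0,1) columns instead of data.sim() output
set.seed(1)
x <- matrix(rnorm(100 * 50), nrow = 100, ncol = 50)
m <- max_cross_cor(x, pr = 10)
```

Applied to a matrix returned by `data.sim()`, the same call with the matching `pr` would report the quantity that the `cor` argument is described as controlling.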