To compare dissolution profiles, the most widely used method is the similarity factor $f_2$, which is usually defined in regulatory guidelines as
$$f_2 = 50\log\left(\frac{100}{\sqrt{1 + \frac{\sum_{t=1}^{n}\left(\bar{R}_t - \bar{T}_t\right)^2}{n}}}\right),$$
where $\bar{R}_t$ and $\bar{T}_t$ are the mean dissolution profiles of the reference and test product at time $t$, and $n$ is the number of time points.
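As a quick numerical illustration (a standalone sketch, not using the package; `f2_guideline` is a hypothetical helper name), the definition above can be computed directly. A well-known property of $f_2$ is that a constant 10% difference at all time points gives a value close to 50, the conventional similarity cut-off:

```r
# Hypothetical helper: guideline f2 from two mean dissolution profiles.
# r_bar, t_bar: mean percent dissolved of reference and test at each time point.
# Note: log in the f2 definition is base 10.
f2_guideline <- function(r_bar, t_bar) {
  n <- length(r_bar)  # number of time points
  50 * log10(100 / sqrt(1 + sum((r_bar - t_bar)^2) / n))
}

# Identical profiles give the maximum f2 of 100.
f2_guideline(c(20, 45, 70, 90), c(20, 45, 70, 90))

# A constant 10% difference at every time point gives f2 of about 49.9.
f2_guideline(c(20, 45, 70, 90), c(30, 55, 80, 100))
```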
Nevertheless, there are several prerequisites for the use of the $f_2$ method according to the regulatory guidelines, such as that no more than one time point above 85% dissolved should be used, and that the variability, expressed as the coefficient of variation (CV), should be no more than 20% and 10% for earlier and later time points, respectively. See the vignette Introduction to bootf2 for details.
Recently, several guidelines recommended the confidence interval of $f_2$ approach using bootstrap when the traditional $f_2$ method cannot be applied1–4. However, none of the guidelines specifies in detail which estimators or types of confidence intervals should be used. Therefore, the function `bootf2()` outputs various confidence intervals by default for several $f_2$ estimators.
According to Shah et al., the population $f_2$ for the inference is
$$f_2 = 100 - 25\log\left(1 + \frac{1}{P}\sum_{i=1}^{P}\left(\mu_{T,i} - \mu_{R,i}\right)^2\right),$$
where $P$ is the number of time points; $\mu_{T,i}$ and $\mu_{R,i}$, typically unknown, are the population means of the test and reference product at time point $i$, respectively5. Several estimators for the population $f_2$ have been described in the literature5–8. The function is able to calculate the following five estimators for $f_2$, along with their corresponding 90% confidence intervals using bootstrap.
The estimated $f_2$ ($\hat{f}_2$, denoted by `est.f2` in the function) is the one written in various regulatory guidelines. It is expressed differently, but mathematically equivalently, as
$$\hat{f}_2 = 100 - 25\log\left(1 + \frac{1}{P}\sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2\right),$$
where $P$ is the number of time points; $\bar{X}_{T,i}$ and $\bar{X}_{R,i}$ are the mean dissolution data at the $i$th time point of random samples chosen from the test and the reference population, respectively. Compared to the equation of the population $f_2$ above, the only difference is that in the equation of $\hat{f}_2$ the sample means of the dissolution profiles replace the population means for the approximation. In other words, a point estimate is used for the statistical inference in practice.
The bias-corrected $f_2$ ($\hat{f}_{2,\mathrm{bc}}$, denoted by `bc.f2` in the function) was described in the article of Shah et al5, as
$$\hat{f}_{2,\mathrm{bc}} = 100 - 25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2 - \frac{1}{n}\sum_{i=1}^{P}\left(S_{T,i}^2 + S_{R,i}^2\right)\right)\right),$$
where $S_{T,i}^2$ and $S_{R,i}^2$ are unbiased estimates of the variance at the $i$th time point of random samples chosen from the test and reference population, respectively, and $n$ is the sample size. $\bar{X}_{T,i}$, $\bar{X}_{R,i}$, and $P$ are the same as described previously. As the domain of the $\log(\cdot)$ function needs to be positive, when the variance is sufficiently high, i.e., when the inequality
$$\frac{1}{n}\sum_{i=1}^{P}\left(S_{T,i}^2 + S_{R,i}^2\right) \geq \sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2 + P$$
holds, $\hat{f}_{2,\mathrm{bc}}$ cannot be calculated.
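A minimal sketch of this estimator (assuming individual profiles are stored as rows of a matrix; `bc_f2` is a hypothetical helper, not the package API), returning `NA` when the argument of the logarithm is non-positive:

```r
# Hypothetical helper: bias-corrected f2 from individual profiles.
# x_t, x_r: n-by-P matrices (rows = units, columns = time points).
bc_f2 <- function(x_t, x_r) {
  p  <- ncol(x_t)                          # number of time points P
  n  <- nrow(x_t)                          # sample size
  d2 <- sum((colMeans(x_t) - colMeans(x_r))^2)
  v  <- sum(apply(x_t, 2, var) + apply(x_r, 2, var)) / n
  arg <- 1 + (d2 - v) / p
  if (arg <= 0) return(NA_real_)           # variance too high: not calculable
  100 - 25 * log10(arg)
}
```

With zero within-batch variance the correction term vanishes and `bc_f2()` reduces to the estimated $\hat{f}_2$.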
The bias-corrected $f_2$ described above assumes equal weight of variance between test and reference, which is not necessarily true; therefore, the function also includes the following variance- and bias-corrected $f_2$ ($\hat{f}_{2,\mathrm{vcbc}}$, denoted by `vc.bc.f2` in the function):
$$\hat{f}_{2,\mathrm{vcbc}} = 100 - 25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2 - \frac{1}{n}\sum_{i=1}^{P}\left(w_{T,i}\cdot S_{T,i}^2 + w_{R,i}\cdot S_{R,i}^2\right)\right)\right),$$
where $w_{T,i}$ and $w_{R,i}$ are weighting factors for the variance of the test and reference products, respectively, which can be calculated as
$$w_{T,i} = 0.5 + \frac{S_{T,i}^2}{S_{T,i}^2 + S_{R,i}^2}, \qquad w_{R,i} = 0.5 + \frac{S_{R,i}^2}{S_{T,i}^2 + S_{R,i}^2}.$$
Similar to $\hat{f}_{2,\mathrm{bc}}$, $\hat{f}_{2,\mathrm{vcbc}}$ cannot be estimated when the variance is sufficiently high, i.e., when the inequality
$$\frac{1}{n}\sum_{i=1}^{P}\left(w_{T,i}\cdot S_{T,i}^2 + w_{R,i}\cdot S_{R,i}^2\right) \geq \sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2 + P$$
holds.
The mathematical expectation of $\hat{f}_2$ is
$$\begin{aligned}
E\left(\hat{f}_2\right) &= E\left(100 - 25\log\left(1 + \frac{1}{P}\sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2\right)\right)\\
&\approx 100 - 25\log\left(1 + E\left(\frac{1}{P}\sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2\right)\right)\\
&= 100 - 25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^{P}\left(\mu_{T,i} - \mu_{R,i}\right)^2 + \frac{1}{n}\sum_{i=1}^{P}\left(\sigma_{T,i}^2 + \sigma_{R,i}^2\right)\right)\right),
\end{aligned}$$
where $\sigma_{T,i}^2$ and $\sigma_{R,i}^2$ are the population variances of the test and the reference product, respectively.
The expected $f_2$ ($\hat{f}_{2,\mathrm{exp}}$, denoted by `exp.f2` in the function) is calculated based on the mathematical expectation of $\hat{f}_2$, using the mean dissolution profiles and variances from the samples for the approximation6–8:
$$\hat{f}_{2,\mathrm{exp}} = 100 - 25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2 + \frac{1}{n}\sum_{i=1}^{P}\left(S_{T,i}^2 + S_{R,i}^2\right)\right)\right).$$
Similarly, since $\hat{f}_{2,\mathrm{exp}}$ assumes equal weight of variance between test and reference, a variance-corrected version ($\hat{f}_{2,\mathrm{vcexp}}$, denoted by `vc.exp.f2` in the function) is also included in the function:
$$\hat{f}_{2,\mathrm{vcexp}} = 100 - 25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2 + \frac{1}{n}\sum_{i=1}^{P}\left(w_{T,i}\cdot S_{T,i}^2 + w_{R,i}\cdot S_{R,i}^2\right)\right)\right).$$
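Both expected variants can be sketched in one hypothetical helper (`exp_f2`, not the package API); the `vc` flag switches on the weighting factors defined above. Note the weights are undefined when both products have zero variance at a time point, an edge case not handled here:

```r
# Hypothetical helper: expected f2, optionally variance-corrected.
# x_t, x_r: n-by-P matrices of individual profiles (rows = units).
exp_f2 <- function(x_t, x_r, vc = FALSE) {
  p   <- ncol(x_t)
  n   <- nrow(x_t)
  d2  <- sum((colMeans(x_t) - colMeans(x_r))^2)
  s2t <- apply(x_t, 2, var)
  s2r <- apply(x_r, 2, var)
  if (vc) {
    w_t <- 0.5 + s2t / (s2t + s2r)         # weighting factors w_T,i
    w_r <- 0.5 + s2r / (s2t + s2r)         # and w_R,i
    v   <- sum(w_t * s2t + w_r * s2r) / n
  } else {
    v   <- sum(s2t + s2r) / n
  }
  100 - 25 * log10(1 + (d2 + v) / p)
}
```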
The following 90% confidence intervals are included in the function:
The normal interval (denoted by `normal` in the function) with bias correction was estimated according to Davison and Hinkley9:
$$\hat{f}_{2,L} = \hat{f}_{2,S} - E_B - \sqrt{V_B}\cdot Z_{1-\alpha}, \qquad \hat{f}_{2,U} = \hat{f}_{2,S} - E_B + \sqrt{V_B}\cdot Z_{1-\alpha},$$
where $\hat{f}_{2,L}$ and $\hat{f}_{2,U}$ are the lower and upper limits of the confidence interval estimated from bootstrap samples; $\hat{f}_{2,S}$ denotes the estimators described in Section Types of $f_2$, calculated using the randomly selected samples from the populations; $Z_{1-\alpha}$ is $\Phi^{-1}(1-\alpha)$, where $\Phi(\cdot)$ represents the standard normal cumulative distribution function and $\Phi^{-1}(\cdot)$ denotes its inverse, the quantile function; $\alpha$ is the type I error rate, equal to 0.05 in the current study; $E_B$ and $V_B$ are the resampling estimates of bias and variance, calculated as
$$E_B = \frac{1}{B}\sum_{b=1}^{B}\hat{f}_{2,b}^\star - \hat{f}_{2,S} = \bar{f}_2^\star - \hat{f}_{2,S}, \qquad V_B = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{f}_{2,b}^\star - \bar{f}_2^\star\right)^2,$$
where the superscript $\star$ denotes that the $f_2$ estimates are calculated from bootstrap samples; $B$ is the number of bootstrap samples, equal to 10000 in the function by default; $\hat{f}_{2,b}^\star$ is the $f_2$ estimate from the $b$th bootstrap sample, and $\bar{f}_2^\star$ is their mean value.
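Given a vector of bootstrap $f_2$ estimates, this interval is straightforward to compute; a sketch (with a hypothetical helper name `normal_ci`, not the package API):

```r
# Hypothetical helper: bias-corrected normal interval.
# f2_boot: bootstrap f2 estimates; f2_s: estimate from the original samples.
normal_ci <- function(f2_boot, f2_s, alpha = 0.05) {
  eb <- mean(f2_boot) - f2_s      # resampling estimate of bias, E_B
  vb <- var(f2_boot)              # resampling estimate of variance, V_B
  z  <- qnorm(1 - alpha)          # Z_{1 - alpha}
  c(lower = f2_s - eb - sqrt(vb) * z,
    upper = f2_s - eb + sqrt(vb) * z)
}
```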
The basic interval (denoted by `basic` in the function) was calculated according to Davison and Hinkley9:
$$\hat{f}_{2,L} = 2\hat{f}_{2,S} - \hat{f}_{2,(B+1)(1-\alpha)}^\star, \qquad \hat{f}_{2,U} = 2\hat{f}_{2,S} - \hat{f}_{2,(B+1)\alpha}^\star,$$
where $\hat{f}_{2,(B+1)\alpha}^\star$ and $\hat{f}_{2,(B+1)(1-\alpha)}^\star$ are the $(B+1)\alpha$th and the $(B+1)(1-\alpha)$th ordered resampling estimates of $f_2$, respectively. When $(B+1)\alpha$ is not an integer, the following equation is used for interpolation:
$$\hat{f}_{2,(B+1)\alpha}^\star = \hat{f}_{2,k}^\star + \frac{\Phi^{-1}(\alpha) - \Phi^{-1}\left(\frac{k}{B+1}\right)}{\Phi^{-1}\left(\frac{k+1}{B+1}\right) - \Phi^{-1}\left(\frac{k}{B+1}\right)}\left(\hat{f}_{2,k+1}^\star - \hat{f}_{2,k}^\star\right),$$
where $k = \left[(B+1)\alpha\right]$, the integer part of $(B+1)\alpha$, and $\hat{f}_{2,k+1}^\star$ and $\hat{f}_{2,k}^\star$ are the $(k+1)$th and the $k$th ordered resampling estimates of $f_2$, respectively.
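A sketch of the basic interval with the normal-quantile interpolation above (hypothetical helpers `interp_quantile` and `basic_ci`, not the package API; the edge cases where $k = 0$ or $k = B$ are not handled here):

```r
# Hypothetical helper: the (B+1)*alpha-th ordered bootstrap estimate,
# with normal-quantile interpolation when (B+1)*alpha is not an integer.
interp_quantile <- function(f2_boot, alpha) {
  b  <- length(f2_boot)
  fs <- sort(f2_boot)
  h  <- (b + 1) * alpha
  k  <- floor(h)                   # k = integer part of (B+1)*alpha
  if (k == h) return(fs[k])        # integer: no interpolation needed
  fs[k] + (qnorm(alpha) - qnorm(k / (b + 1))) /
    (qnorm((k + 1) / (b + 1)) - qnorm(k / (b + 1))) *
    (fs[k + 1] - fs[k])
}

# Hypothetical helper: basic bootstrap interval.
basic_ci <- function(f2_boot, f2_s, alpha = 0.05) {
  c(lower = 2 * f2_s - interp_quantile(f2_boot, 1 - alpha),
    upper = 2 * f2_s - interp_quantile(f2_boot, alpha))
}
```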
The percentile intervals (denoted by `percentile` in the function) were estimated using nine different types of quantiles, percentile Type 1 to Type 9, as summarized in Hyndman and Fan's article10 and implemented in R's `quantile` function. Using R's `boot` package, the program `bootf2BCA` outputs a percentile interval using the equation above for interpolation. To be able to compare the results among different programs, the same interval, denoted by `Percentile (boot)` in the function, is also calculated in this study.
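For reference, the nine quantile definitions are available directly through the `type` argument of R's `quantile()` function; here applied to a small, hypothetical vector of bootstrap $f_2$ values:

```r
# Nine quantile definitions (Hyndman & Fan types 1-9) applied to the same
# hypothetical vector of bootstrap f2 estimates, at the 5th/95th percentiles.
f2_boot <- c(45.2, 48.1, 49.5, 50.3, 51.0, 51.8, 52.4, 53.9, 55.6, 58.7)
ci_by_type <- sapply(1:9, function(ty)
  quantile(f2_boot, probs = c(0.05, 0.95), type = ty, names = FALSE))
colnames(ci_by_type) <- paste0("type", 1:9)
ci_by_type  # 2-by-9 matrix: one percentile interval per quantile type
```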
The bias-corrected and accelerated (BCa) intervals were estimated as follows according to the literature11:
$$\hat{f}_{2,L} = \hat{f}_{2,\alpha_1}^\star, \qquad \hat{f}_{2,U} = \hat{f}_{2,\alpha_2}^\star,$$
where $\hat{f}_{2,\alpha_1}^\star$ and $\hat{f}_{2,\alpha_2}^\star$ are the $100\alpha_1$th and the $100\alpha_2$th percentiles of the resampling estimates of $f_2$, respectively. The adjusted levels $\alpha_1$ and $\alpha_2$ were obtained as
$$\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + \hat{z}_\alpha}{1 - \hat{a}\left(\hat{z}_0 + \hat{z}_\alpha\right)}\right), \qquad \alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + \hat{z}_{1-\alpha}}{1 - \hat{a}\left(\hat{z}_0 + \hat{z}_{1-\alpha}\right)}\right),$$
where $\hat{z}_0$ and $\hat{a}$ are called the bias-correction and acceleration, respectively.
There are different methods to estimate $\hat{z}_0$ and $\hat{a}$. Shah et al.5 used the jackknife technique (denoted by `bca.jackknife` in the function):
$$\hat{z}_0 = \Phi^{-1}\left(\frac{\#\left\{\hat{f}_{2,b}^\star < \hat{f}_{2,S}\right\}}{B}\right), \qquad \hat{a} = \frac{\sum_{i=1}^{n}\left(\hat{f}_{2,m} - \hat{f}_{2,i}\right)^3}{6\left(\sum_{i=1}^{n}\left(\hat{f}_{2,m} - \hat{f}_{2,i}\right)^2\right)^{3/2}},$$
where $\#\{\cdot\}$ denotes the number of elements in the set; $\hat{f}_{2,i}$ is the $i$th jackknife statistic; $\hat{f}_{2,m}$ is the mean of the jackknife statistics; and $n$ is the sample size.
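A sketch of the jackknife flavor (hypothetical helper `bca_levels`, not the package API; `f2_jack` stands for the vector of jackknife statistics produced by one of the `jackknife.type` schemes):

```r
# Hypothetical helper: adjusted levels alpha1/alpha2 for the BCa interval.
# f2_boot: bootstrap estimates; f2_s: original-sample estimate;
# f2_jack: jackknife estimates of f2.
bca_levels <- function(f2_boot, f2_s, f2_jack, alpha = 0.05) {
  z0 <- qnorm(sum(f2_boot < f2_s) / length(f2_boot))  # bias-correction z0
  d  <- mean(f2_jack) - f2_jack
  a  <- sum(d^3) / (6 * sum(d^2)^1.5)                 # acceleration a
  za <- qnorm(alpha)
  zb <- qnorm(1 - alpha)
  c(alpha1 = pnorm(z0 + (z0 + za) / (1 - a * (z0 + za))),
    alpha2 = pnorm(z0 + (z0 + zb) / (1 - a * (z0 + zb))))
}
```

When exactly half of the bootstrap estimates fall below $\hat{f}_{2,S}$ and the jackknife statistics are symmetric, $\hat{z}_0 = \hat{a} = 0$ and the levels reduce to the plain percentile levels $\alpha$ and $1-\alpha$.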
The program `bootf2BCA` gives a slightly different BCa interval with R's `boot` package8. This approach, denoted by `bca.boot` in the function, was also implemented in `bootf2()` for estimating the interval.
The complete list of arguments of the function is as follows:

```r
bootf2(test, ref, path.in, file.in, path.out, file.out,
       n.boots = 10000L, seed = 306, digits = 2L, alpha = 0.05,
       regulation = c("EMA", "FDA", "WHO"), min.points = 1L,
       both.TR.85 = FALSE, print.report = TRUE,
       report.style = c("concise", "intermediate", "detailed"),
       f2.type = c("all", "est.f2", "exp.f2", "bc.f2",
                   "vc.exp.f2", "vc.bc.f2"),
       ci.type = c("all", "normal", "basic", "percentile",
                   "bca.jackknife", "bca.boot"),
       quantile.type = c("all", 1:9, "boot"),
       jackknife.type = c("nt+nr", "nt*nr", "nt=nr"),
       time.unit = c("min", "h"), output.to.screen = FALSE,
       sim.data.out = FALSE)
```
- `test`, `ref`, `path.in`, `file.in`:
  - Typically, `test` and `ref` are data frames with the time as the first column and individual units for the rest of the columns. In such cases, the arguments `path.in` and `file.in` should not be used.
  - The data can also be read from an Excel file with the extension `.xlsx` or `.xls`. In this case, the data of test and reference should be stored in separate worksheets. The first column should be time; the rest of the columns are individual units. The first row is the column head indicating the names, such as 'time', 'unit 01', 'unit 02', .... It does not matter what the names are, as the columns will be renamed internally by the function. The important point is that the first row will not be read, so do not put dissolution data on the first row.
  - When `path.in` and `file.in` are provided, the arguments `test` and `ref` should be the worksheet names inside quotation marks, e.g., `"lot ABCD1234 pH 6.8"`.
  - `path.in` can be an absolute path such as `"/home/myname/my.project/dat/"`, or a relative path such as `"../dat/"` if the working directory is in the folder `"/home/myname/my.project/analysis/"` and the data file is in the folder `"/home/myname/my.project/dat/"`. On Windows, `"C:\user\myname\my.project\dat\"` cannot be the `path.in`; you have to change it to `"C:\\user\\myname\\my.project\\dat\\"`, or to `"C:/user/myname/my.project/dat/"`, the same format as used in Linux systems.
- `path.out`, `file.out`, and `output.to.screen`:
  - If `path.out` and `file.out` are not provided, but the argument `print.report` is `TRUE`, then a plain text report will be generated automatically in the current working directory with the file name `test_vs_ref_TZ_YYYY-MM-DD_HHMMSS.txt`, where `test` and `ref` are the data set names of test and reference, `TZ` is the time zone such as CEST, `YYYY-MM-DD` is the numeric date format, and `HHMMSS` is the numeric time format for hour, minute, and second.
  - If `output.to.screen = TRUE`, a summary report very similar to the concise style report will be printed on screen.
- `jackknife.type`: three schemes are available for generating the jackknife statistics:
  - Delete one unit from the test and reference data sets simultaneously, denoted by `nt=nr` in the function. So if there are 12 units in the test and reference data sets, there will be 12 jackknife estimators.
  - Delete one unit from the test and reference data sets separately, denoted by `nt+nr` in the function. This is the default method. So if there are 12 units in the test and reference data sets, there will be 12 + 12 = 24 jackknife estimators.
  - Delete one unit from the test and reference data sets in all possible combinations, denoted by `nt*nr` in the function. So if there are 12 units in the test and reference data sets, there will be 12 × 12 = 144 jackknife estimators.
- `f2.type` and `ci.type` are explained in the previous section.
- To obtain the bootstrapped data sets in the output, set `sim.data.out = TRUE`.
- Read the function manual with `help("bootf2")` for more details of each argument.

The minimum required arguments are `test` and `ref`. First, let's simulate some dissolution profiles to play with.
```r
# time points
tp <- c(5, 10, 15, 20, 30, 45, 60)

# model.par for reference with low variability
par.r <- list(fmax = 100, fmax.cv = 3, mdt = 15, mdt.cv = 14,
              tlag = 0, tlag.cv = 0, beta = 1.5, beta.cv = 8)

# simulate reference data
dr <- sim.dp(tp, model.par = par.r, seed = 100, plot = FALSE)

# model.par for test
par.t <- list(fmax = 100, fmax.cv = 3, mdt = 12.29, mdt.cv = 12,
              tlag = 0, tlag.cv = 0, beta = 1.727, beta.cv = 9)

# simulate test data with low variability
dt <- sim.dp(tp, model.par = par.t, seed = 100, plot = FALSE)
```
Use the default settings for most arguments. To reduce the compilation time of the vignette, the number of bootstrap samples is set to only 100.
```r
# output file will be generated automatically
bootf2(dt$sim.disso, dr$sim.disso, n.boots = 100, print.report = FALSE,
       output.to.screen = TRUE)
```