--- title: "Plotting proportions with ``superb``" bibliography: "../inst/REFERENCES.bib" csl: "../inst/apa-6th.csl" output: rmarkdown::html_vignette description: > This vignette shows how to plot proportions using superb. vignette: > %\VignetteIndexEntry{Plotting proportions with ``superb``} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r, echo = FALSE, warning=FALSE, message = FALSE, results = 'hide'} cat("this will be hidden; use for general initializations.\n") library(superb) library(ggplot2) options(superb.feedback = c('design','warnings') ) ``` In this vignette, we show how to plot proportions. Proportions is one way to summarize observations that are composed of succes and failure. Success can be a positive reaction to a drug, an accurate completion of a task, a survival after a dangerous ilness, etc. Failure are the opposite of success. A proportion is the number of success onto the total number of trials. Similarly, if the success are coded with "1"s and failure, with "0"s, then the proportion can be obtained indirectly by computing the mean. ## An example Consider an example where three groups of participants where examined. The raw data may look like: | |Group |Score | |-----------------|---------|---------| |subject 1 |1 | 1 | |subject 2 |1 | 1 | |... |... |... | |subject n1 |1 | 0 | |subject n1+1 |2 | 0 | | ... |... |... | |subject n1+n2 |2 | 1 | |subject n1+n2+1 |3 | 1 | |... |... |... | |subject n1+n2+n3 |3 | 0 | in which there is $n_1$ participant in Group 1, $n_2$ in Group 2, and $n_3$ in Group 3. The data can be compiled by reporting the number of success (let's call them $s$) and the number of participants. One example of results could be | | s | n | proportion | |---------------|------|------|------------| | Group 1 | 10 | 30 | 33.3% | | Group 2 | 18 | 28 | 64.3% | | Group 3 | 10 | 26 | 38.5% | Although making a plot of these proportions is easy, how can you plot error bars around these proportions? ## The arcsine transformation First proposed by Fisher, the arcsine transform is one way to represent proportions. This transformation stretches the extremities of the domain (near 0% and near 100%) so that the sampling variability is constant for any observed proportion. Also, this transformation make the sampling distribution nearly normal so that $z$ test can be used. An improvement over the Fisher transformation was proposed by Anscombe (1948). It is given by $$ A(s, n) = \sin^{-1}\left( \sqrt{\frac{s + 3/8}{n + 3/4}} \right) $$ The variance of such transformation is also theoretically given by $$ Var_A = \frac{1}{4(n+1/2)} $$ As such, we have all the ingredients needed to make confidence intervals! ## Defining the data In what follows, we assume that the data are available in compiled form, as in the second table above. Because ``superb()`` only takes raw data, we will have to convert these into a long sequence of zeros and ones. ```{r} # enter the compiled data into a data frame: compileddata <- data.frame(cbind( s = c(10, 18, 10), n = c(30, 28, 26) )) ``` The following converts the compiled data into a long data frame containing ones and zeros so that ``superb()`` can be fed raw data: ```{r} group <- c() scores <- c() for (i in 1: (dim(compileddata)[1])) { group <- c( group, rep(i, compileddata$n[i] ) ) scores <- c( scores, rep(1, compileddata$s[i]), rep(0, compileddata$n[i] - compileddata$s[i]) ) } dta <- data.frame( cbind(group = group, scores = scores ) ) ``` ## Defining the transformation in R In the following, we define the A (Anscombe) transformation, the standard error of the transformed scores, and the confidence intervals: ```{r} # the Anscombe transformation for a vector of binary data 0|1 A <-function(v) { x <- sum(v) n <- length(v) asin(sqrt( (x+3/8) / (n+3/4) )) } SE.A <- function(v) { 0.5 / sqrt(length(v+1/2)) } CI.A <- function(v, gamma = 0.95){ SE.A(v) * sqrt(qchisq(gamma, df=1)) } ``` This is all we need to make a basic plot with ``superb()`` ... but we need a few libraries, so let's load them here: ```{r, echo = TRUE, message = FALSE} library(superb) library(ggplot2) library(scales) # for asn_trans() non-linear scale ``` Here we go: ```{r, message=FALSE, echo=TRUE, fig.width = 3, fig.cap="**Figure 1**. Anscombe-transformed scores as a function of group."} # ornate to decorate the plot a little bit... ornate = list( theme_bw(base_size = 10), labs(x = "Group" ), scale_x_discrete(labels=c("Group A", "Group B", "Group C")) ) superb( scores ~ group, dta, statistic = "A", error = "CI", adjustment = list( purpose = "difference"), plotStyle = "line", errorbarParams = list(color="blue") # just for the pleasure! ) + ornate + labs(y = "Anscombe-transformed scores" ) ``` ## Reversing the transformation to see proportions. The above plot shows Anscombe-transform scores. This may not be very intuitive. It is then possible to undo the transformation so as to plot proportions instead. The complicated part is to undo the confidence limits. ```{r} # the proportion of success for a vector of binary data 0|1 prop <- function(v){ x <- sum(v) n <- length(v) x/n } # the de-transformed confidence intervals from Anscombe-transformed scores CI.prop <- function(v, gamma = 0.95) { y <- A(v) n <- length(v) cilen <- CI.A(v, gamma) ylo <- y - cilen yhi <- y + cilen # reverse arc-sin transformation: naive approach cilenlo <- ( sin(ylo)^2 ) cilenhi <- ( sin(yhi)^2 ) c(cilenlo, cilenhi) } ``` Nothing more is needed. We can make the plot with these new functions: ```{r, message=FALSE, echo=TRUE, fig.width = 3, fig.cap="**Figure 2**. Proportion as a function of group."} superb( scores ~ group, dta, statistic = "prop", error = "CI", adjustment = list( purpose = "difference"), plotStyle = "line", errorbarParams = list(color="blue") ) + ornate + labs(y = "Proportions" ) + scale_y_continuous(trans=asn_trans()) ``` This new plot is actually identical to the previous one as we plotted the proportions using a non-linear scale (the ``asn_trans()`` scale for arcsine). However, the vertical axis is now showing graduations between 0% and 100% as is expected of proportions. ## Returning to the example What can we conclude from the plot? You noted that we plotted difference-adjusted confidence intervals. Hence, if at least one result is not included in the confidence interval of another result, then the chances are good that they differ significantly. Running an analysis of proportions, it indicates the presence of a main effect of Group ($F(2,\infty)= 3.06, p = .047$). How to perform an analysis of proportions (ANOPA) is explained in @lc22. We see from the plot that the length of the error bars are about all the same, suggesting homogeneous variance (because all the sample are of comparable size). This is always the case as Anscombe transform is a 'variance-stabilizing' transformation in the sense that it makes all the variances identical. ## In summary The ``superb`` framework can be used to display any summary statistics. Here, we showed how ``superb()`` can be used with proportions. For within-subject designs involving proportions, it is also possible to use the correlation adjustments [as demonstrated in @lc22]. # References