Great stuff Mozz. Good to get a refresher on the higher recruit numbers and lower P values from 005. Agree that when we do higher numbers in 002 (e.g. 117 per cohort in stage 1) we will get tighter clustering of the data and more 'significance' compared with 60 recruits in 005 and <20 in 008.
I was looking at the charts we present and the error bars. Presumably we are trying to show that the error bars don't overlap, so that treatment results in an improvement over non-treatment (and non-overlapping bars loosely imply significance). Looking closer at these error bars, the fine print says they are "standard error"...
Interesting to look into what this actually is (I was unaware of it until looking it up). It looks like it's the spread/distribution or standard deviation of the data, but it's actually not. It's a measure of the uncertainty in the sample mean as an estimate of the population mean. This seemed quite important so I did a little work on it.
When you have a big population, any sample is only a subset, so the sample will have its own mean, which differs from the true population mean.
The typical size of this difference, between the mean you measure in a sample and where the entire population mean "could" be, is the standard error (our error bar).
If you have a small sample, its mean can vary a lot from the population mean (e.g. inferring the average age of all Australians by sampling just 20 people). The more you sample, the more representative and accurate your mean becomes relative to the whole population, and hence the lower the standard error. The stats say standard error is proportional to 1/sqrt(N), so if you want to halve your error, you need to do 4 times as many samples. Our 002 trial (stage 1) at 117 samples is about 6 times the sample size of 008 at 20 samples, so our error bars should shrink to well under half their size (by a factor of roughly sqrt(6), about 2.4). 002 stage 2 adds even more samples to the mix (another 234) and less 'error' again. Shrinking the error bars obviously reduces the chance they overlap, so it increases the significance (in broad terms at least).
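The 1/sqrt(N) scaling above is easy to sanity-check with a few lines of Python. The sigma value here is made up purely for illustration (the real WOMAC spread would come from the trial data); only the N values (20, 117, 234) are from our trials.

```python
import math

# Illustrative only: sigma is a made-up within-group standard
# deviation, not a real trial number.
sigma = 10.0

def standard_error(sigma, n):
    """Standard error of the mean for a sample of size n: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

se_008 = standard_error(sigma, 20)       # 008 trial, 20 patients
se_002_s1 = standard_error(sigma, 117)   # 002 stage 1, 117 per cohort
se_002_s2 = standard_error(sigma, 234)   # 002 stage 2, 234 more

# Quadrupling N halves the error bar:
print(standard_error(sigma, 80) / standard_error(sigma, 20))  # ~0.5
# ~6x the sample size shrinks the bar to well under half:
print(se_002_s1 / se_008)  # ~0.41
```

Note the ratio only depends on the Ns, not on sigma, which is why the "4x samples to halve the bar" rule holds whatever the underlying spread is.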
So all we need to show is that the mean of the treatment group is better than the mean of the placebo group. We don't need to show that the treatment bell curve is always better than the placebo bell curve. This chart, from a presso of a different ASX-listed biotech, shows the concept. It's illustrative only, but it suggests the individual results can overlap a lot; the main thing is that there is a clear and significant difference between the means of the two groups.
So the two main factors are the treatment effect (how far apart the means are) and the consistency or spread (S) of the treatment. As long as there is 'some' treatment effect, it's largely just a question of doing enough numbers to show it statistically (like your example of aspirin needing massive numbers to prove a small effect).
It's hard to argue we have not seen a treatment effect of some sort with iPPS, so it's just down to the numbers. Having achieved significance in the WOMAC data at 56 days (the endpoint of our P3) for 20 patients is great, but it's for 2mg/kg twice a week. Say for example we took that 1.5mg/kg twice-a-week arm in 002 and presumed 25% less effect: to shrink those bars by 25% to compensate, we need less than twice the number of recruits (since SE goes as 1/sqrt(N), N scales by 1/0.75^2, about 1.8x). Say 40 should do it. Double it again to 80 to be safe, and it's hard to imagine that 350 (stages 1 and 2), or 234 (stage 2), or even 117 (stage 1 alone!) won't also clear that hurdle. You might even ask whether we would have to do stage 2 at all if we get significance in stage 1?
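The back-of-envelope scaling here is worth writing out. Assuming the 1.5mg/kg arm really does have 25% less effect (that figure is our hypothetical, not data), the error bars need to shrink by 25%, and since SE goes as 1/sqrt(N), N has to scale by 1/0.75^2:

```python
import math

# Hypothetical: 25% less effect means the bars must shrink by 25%
# to keep the same separation in SE units. SE ~ 1/sqrt(N), so:
n_008 = 20                 # patients that hit significance in 008
scale = 1 / 0.75 ** 2      # ~1.78, i.e. less than double
n_needed = n_008 * scale

print(scale)
print(math.ceil(n_needed))  # 36, so 40 per arm is comfortable
```

So "less than twice the number of recruits" checks out: ~36 patients, which is why 40 looks safe and 80 is a big margin, let alone 117+.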
One question on all of this is the effect of rescue medication. We know the placebo group took many times more rescue medication than the treatment group, so it's not a like-for-like comparison (more a comparison against standard of care than against no treatment). Can we back out the effect of rescue medication somehow, or argue that we need even less significance (not that it's hard for us to achieve), because all we might need to show is that we are at least as good as standard of care, with added benefits in anti-inflammation and cartilage preservation? If we get significance above standard of care, we seem to be doubly significant against no treatment?