3.8 years ago

freuv

I am interested in using the ‘maxstat’ package to run univariate survival analysis based on ‘high’ and ‘low’ gene expression.

https://cran.r-project.org/web/packages/maxstat/maxstat.pdf

However, I am looking for clarification on what the output of the maxstat package means.

From my understanding, the gene expression ‘cutoff’ it reports is the suggested optimal value for classifying ‘high’ versus ‘low’ expression. Does the p-value then indicate whether the difference in survival between those two groups is significant?

Please confirm/clarify.

Thank you, F

Why should one use a cut-point to classify 'high' and 'low' expression versus, say, selecting the patients with the top 10% highest expression and lowest 10% expression as your 'high' and 'low' cohort?

The main aim is to find an optimal cutoff for a continuous variable. What if you wanted a 50%-50% split instead of the top and bottom 10%? Any such choice is arbitrary, whereas maxstat gives you a statistically derived optimal cutpoint.
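To make the idea concrete, here is a minimal sketch in plain Python of what a maximally selected statistic does: it tries every candidate cutpoint within a range and keeps the one with the largest absolute standardized log-rank statistic. This is only an illustration. It uses the ordinary two-sample log-rank statistic rather than maxstat's own implementation. The function names `logrank_z` and `max_selected_cutpoint` and the toy data are made up, and `minprop`/`maxprop` simply mirror the arguments of the same name in `maxstat.test`. It also does not compute a p-value. The p-value maxstat reports accounts for the fact that many cutpoints were scanned, which a naive log-rank p-value at the chosen cutpoint would not.

```python
import math

def logrank_z(times, events, in_high):
    """Standardized two-sample log-rank statistic (high vs. low group)."""
    obs_minus_exp = 0.0
    var = 0.0
    # Walk over the distinct times at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(in_high[i] for i in at_risk)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_high = sum(1 for i in at_risk
                     if times[i] == t and events[i] and in_high[i])
        # Observed minus expected events in the high group.
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            var += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return obs_minus_exp / math.sqrt(var) if var > 0 else 0.0

def max_selected_cutpoint(expr, times, events, minprop=0.1, maxprop=0.9):
    """Scan candidate cutpoints; return (cutpoint, max |Z|)."""
    n = len(expr)
    best_cut, best_stat = None, 0.0
    for c in sorted(set(expr))[:-1]:
        frac_low = sum(x <= c for x in expr) / n
        if not (minprop <= frac_low <= maxprop):
            continue  # keep the split away from the extremes
        z = abs(logrank_z(times, events, [x > c for x in expr]))
        if z > best_stat:
            best_cut, best_stat = c, z
    return best_cut, best_stat

# Toy data (invented): expression, follow-up time, event indicator (1 = death).
expr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
times = [20, 18, 19, 17, 16, 5, 4, 6, 3, 2]
events = [1] * 10

cut, stat = max_selected_cutpoint(expr, times, events)
print("cutpoint:", cut, "max |Z|:", round(stat, 3))
```

Patients with expression above the returned cutpoint would be called 'high' and the rest 'low'. On a data set this small the scan is very sensitive to `minprop` and `maxprop`, so treat the output as a demonstration only.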

Hi arta, thank you for your quick response.

If I am understanding correctly, are they not two different approaches? In the maxstat approach, a single cut-point is chosen from within the 10th to 90th percentiles of gene expression and applied to all patients. What is above that value is classified as 'high' expression and what is below as 'low.'

The other approach selects two extremes within the data, i.e. top 10/20/30 and bottom 10/20/30.

I'm unclear on which is the more robust approach for obtaining an unbiased association with survival.
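One practical difference between the two approaches can be sketched as follows: a single cut-point assigns every patient to a group, while the extremes approach leaves the middle of the distribution unlabelled. The helper names `split_by_cutpoint` and `split_by_extremes` and the expression values are hypothetical, and the cut-point of 5.0 is just a placeholder, not something estimated from data.

```python
def split_by_cutpoint(expr, cut):
    """Single cut-point: every patient is either 'high' or 'low'."""
    return ["high" if x > cut else "low" for x in expr]

def split_by_extremes(expr, frac=0.1):
    """Extremes: bottom `frac` is 'low', top `frac` is 'high', rest unused."""
    ranked = sorted(expr)
    k = max(1, round(len(expr) * frac))  # patients per extreme group
    low_max, high_min = ranked[k - 1], ranked[-k]
    labels = []
    for x in expr:
        if x <= low_max:
            labels.append("low")
        elif x >= high_min:
            labels.append("high")
        else:
            labels.append(None)  # middle patients are dropped from the analysis
    return labels

# Invented expression values for ten patients.
expr = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.3, 7.1, 8.8, 9.5]

by_cut = split_by_cutpoint(expr, cut=5.0)
by_ext = split_by_extremes(expr, frac=0.2)

print("cut-point uses", sum(l is not None for l in by_cut), "of", len(expr))
print("extremes use  ", sum(l is not None for l in by_ext), "of", len(expr))
```

With tied expression values the extreme groups can end up slightly larger than `frac` of the cohort, since ties at the threshold are all included.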

Hello Arta. Sorry I'm late, but I would like to know whether maxstat can be used on standardized data, that is, when the gene expression values have been converted to z-values. Can maxstat determine an optimal cut-off point from such data?
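For what it is worth, a z-transformation applied per gene across patients is strictly increasing, so it preserves the ordering of patients. A cut-point scan that depends only on which patients fall above or below each candidate value should therefore find the same split, with the cutoff reported on the z scale. The sketch below checks that on invented data. The `zscores` helper is hypothetical, and the sketch assumes the sample standard deviation was used for standardization.

```python
import statistics

def zscores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values], mean, sd

# Invented expression values for ten patients.
expr = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.3, 7.1, 8.8, 9.5]
z, mean, sd = zscores(expr)

# For every candidate cut-point on the raw scale, the matching cut-point on
# the z scale is (cut - mean) / sd, and it splits the patients identically.
same_split = all(
    [x > cut for x in expr] == [zi > (cut - mean) / sd for zi in z]
    for cut in sorted(expr)[:-1]
)
print("identical high/low groups on both scales:", same_split)
```

A raw-scale cutoff `c` corresponds to `(c - mean) / sd` on the z scale, and a z-scale cutoff can be mapped back with `z_cut * sd + mean`.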