Basics of statistics: From a clinician’s perspective

Karthik Raman

Consultant cardiac surgeon, Kauvery Heart City, Trichy

Measures of central tendency

Mean

The mean is a fundamental measure of central tendency in statistics, calculated by summing all values in a dataset and dividing by the total number of values. This simple yet powerful metric provides an average that represents a typical value within a distribution. However, caution should be exercised when interpreting the mean, particularly in the presence of outliers or skewed data. In such cases, the mean may not accurately represent the true central value of the dataset because extreme values can disproportionately influence the result. The mean is usually reported together with the standard deviation, which quantifies the spread of the data points around the mean. Together, these measures offer a more comprehensive understanding of the characteristics of the dataset, allowing for more informed statistical analysis and decision making.
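The effect of a single outlier on the mean can be illustrated with a short sketch. The ICU-stay values below are hypothetical and chosen only for illustration; `statistics.stdev` gives the sample standard deviation.

```python
from statistics import mean, stdev

# Hypothetical ICU stay (days) for seven patients; the last value is an outlier.
icu_days = [2, 3, 3, 4, 4, 5, 30]

mean_all = mean(icu_days)           # pulled upward by the single long stay
mean_trimmed = mean(icu_days[:-1])  # the same data without the outlier
sd_all = stdev(icu_days)            # sample standard deviation (n - 1 in the denominator)

print(f"mean with outlier    = {mean_all:.2f}")
print(f"mean without outlier = {mean_trimmed:.2f}")
print(f"SD with outlier      = {sd_all:.2f}")
```

One long stay roughly doubles the mean here, which is why the mean alone can mislead in skewed data.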

Median

The median is a crucial measure of central tendency in statistics, representing the middle value in a dataset when arranged in ascending order. It is particularly useful for analyzing skewed data distributions because, unlike the mean, it is not influenced by extreme outliers. To find the median, one must first sort the data points from lowest to highest. If the dataset contains an odd number of values, the median is the middle value. For an even number of data points, the median is the average of the two middle values. The median is often used in conjunction with the interquartile range to provide a comprehensive view of data dispersion, especially in non-normally distributed datasets. This combination offers valuable insights into the central tendency and spread of the data, making it an essential tool for statisticians and researchers across various fields.
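The odd and even cases described above can be sketched as follows. The values are hypothetical, and the quartile method (`method="inclusive"`) is only one of several conventions, so other software may report slightly different quartiles.

```python
from statistics import median, quantiles

odd_n = [7, 3, 5, 30, 4]       # five values (hypothetical)
even_n = [7, 3, 5, 30, 4, 6]   # six values (hypothetical)

# median() sorts internally: 3, 4, 5, 7, 30 -> middle value
median_odd = median(odd_n)
# sorted: 3, 4, 5, 6, 7, 30 -> average of the two middle values
median_even = median(even_n)

# Quartiles of the six-value dataset and the interquartile range
q1, q2, q3 = quantiles(even_n, n=4, method="inclusive")
iqr = q3 - q1

print(median_odd, median_even, q1, q3, iqr)
```

Note that the outlier (30) moves neither the median nor the interquartile range very far.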

Mode

The mode is the value that appears most frequently in a dataset. It is particularly useful for identifying the most common occurrence within a set of observations. When no single value appears more frequently than the others, there is no unique mode, which indicates the lack of a clear central tendency. For example, in medical research, the mode can be used to determine the most frequent complication following a specific procedure. This information is valuable for clinicians as it allows them to focus on a particular area of concern in patient outcomes. By highlighting the most common issue, healthcare providers can prioritize their efforts, allocate resources more effectively, and develop targeted strategies to address the predominant complication, ultimately improving patient care and treatment outcomes.
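A minimal sketch of finding the most frequent complication, using a made-up list of post-operative complications. `statistics.multimode` returns every value that shares the highest frequency, so a tie is visible instead of being hidden.

```python
from statistics import multimode

# Hypothetical complications recorded after a procedure
complications = [
    "atrial fibrillation", "wound infection", "atrial fibrillation",
    "stroke", "atrial fibrillation", "wound infection",
]

most_common = multimode(complications)

# When every value occurs once, all of them are returned: no unique mode
no_clear_mode = multimode(["stroke", "bleeding", "wound infection"])

print(most_common)
print(no_clear_mode)
```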

Statistical terminology

RELATIVE RISK (RISK RATIO)

ODDS RATIO

HAZARD RATIO

Study comparing incidence of stroke between technique a and technique b

  • Relative risk = (Probability of stroke after technique a) / (Probability of stroke after technique b)
  • Odds ratio = (Odds of developing stroke after technique a) / (Odds of developing stroke after technique b)

In a centre, 1000 patients underwent CABG, of whom 300 developed stroke post-operatively.

Probability

Probability = (Those who developed stroke) / (Total number of patients) = 300/1000 = 0.3

When we compare two probabilities, the result is the relative risk.

Odds

Odds = (Those who developed stroke) / (Those who DID NOT develop stroke) = 300/700 = 0.43

When we compare two odds, the result is the odds ratio.
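The arithmetic of the CABG example can be checked with a few lines. This is only a sketch of the calculation above; the variable names are illustrative.

```python
strokes = 300
total_patients = 1000

# Probability: events divided by everyone at risk
probability = strokes / total_patients

# Odds: events divided by non-events
odds = strokes / (total_patients - strokes)

# The two are linked: odds = p / (1 - p)
odds_from_probability = probability / (1 - probability)

print(f"probability = {probability:.2f}")
print(f"odds        = {odds:.2f}")
```

The same 300 events give a probability of 0.30 but odds of about 0.43, because the denominators differ.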

  • Odds ratio = (Odds of developing stroke after technique a) / (Odds of developing stroke after technique b)
  • Relative risk = (Probability of stroke after technique a) / (Probability of stroke after technique b)
  • Hazard ratio = (Rate of stroke over time after technique a, for example at 1 year) / (Rate of stroke over time after technique b, at the same time point). Unlike relative risk, it takes the timing of events into account.
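As a sketch, relative risk and odds ratio can be computed from the counts in each arm. The counts used here (30 strokes in 1000 patients with technique a, 60 in 1000 with technique b) are hypothetical, and the function names are illustrative. The hazard ratio is not computed because it needs time-to-event data and a survival model.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Ratio of the two probabilities (risks) of the event."""
    return (events_a / total_a) / (events_b / total_b)


def odds_ratio(events_a, total_a, events_b, total_b):
    """Ratio of the two odds of the event."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# Hypothetical counts: strokes and patients for each technique
rr = relative_risk(30, 1000, 60, 1000)
odds_r = odds_ratio(30, 1000, 60, 1000)

print(f"RR = {rr:.3f}, OR = {odds_r:.3f}")
```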

Why so much confusion?

  • In general, when the event rate is low, OR ≈ RR.
  • When the event rate is high, OR overestimates RR (it lies further from 1).
  • OR is used for case-control studies (retrospective), which start from cases and controls and look back at who was exposed and who was not. The number of people at risk is not known.
  • RR is generally used for cohort studies (prospective), where the number of people at risk is known. OR can also be used.
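The first two points can be illustrated numerically. In this sketch the event rates are hypothetical; both pairs have the same relative risk of 2, but one pair is rare and the other common.

```python
def rr_and_or(risk_a, risk_b):
    """Return (relative risk, odds ratio) for two event probabilities."""
    rr = risk_a / risk_b
    odds_ratio = (risk_a / (1 - risk_a)) / (risk_b / (1 - risk_b))
    return rr, odds_ratio


# Rare event: 2% versus 1%
rr_rare, or_rare = rr_and_or(0.02, 0.01)

# Common event: 60% versus 30%
rr_common, or_common = rr_and_or(0.60, 0.30)

print(f"rare:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"common: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```

With rare events the two ratios nearly coincide; with common events the odds ratio is much further from 1 than the relative risk.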

Interpretation of these ratios [RR/HR/OR]

Odds ratio (OR) = (Odds of developing stroke after technique a) / (Odds of developing stroke after technique b)

  • OR < 1: the odds of developing stroke after technique a are LESS than after technique b
  • OR > 1: the odds of developing stroke after technique a are GREATER than after technique b
  • OR = 1: the odds of developing stroke after technique a and technique b are the same

The man behind the enigma

R. A. Fisher

The P value is defined as the probability under the assumption of no effect or no difference (null hypothesis), of obtaining a result equal to or more extreme than what was actually observed.

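P value: an enigma

  • R. A. Fisher, 1930s
  • Measures the strength of evidence against the null hypothesis.
  • P stands for probability.
  • It is the probability that a difference at least as large as the one observed arises by chance alone, assuming the null hypothesis is true.
  • 0.05 is the conventional threshold.
  • The smaller the value, the stronger the evidence against the null hypothesis.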
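To make the definition concrete, here is a minimal sketch of Fisher's exact test for a 2x2 table, written with the standard library only. The counts are hypothetical (2 strokes among 20 patients with technique a, 9 among 20 with technique b), and the function name is illustrative. The two-sided P value is taken as the total probability of all tables that are as likely as, or less likely than, the observed one under the null hypothesis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x events in row 1, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table as extreme as, or more extreme than, the observed one
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical trial: rows are techniques, columns are stroke / no stroke
p_value = fisher_exact_two_sided(2, 18, 9, 11)
p_identical = fisher_exact_two_sided(5, 15, 5, 15)

print(f"P value (different arms) = {p_value:.3f}")
print(f"P value (identical arms) = {p_identical:.3f}")
```

When the two arms have identical results, every possible table is at least as extreme as the observed one, so the P value is 1.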

P value: fallacies

  • Statistical significance is not the same as clinical significance.
  • The threshold of 0.05 is arbitrary.
  • The choice of significance test matters a lot.
  • The P value depends on the size of the sample.
  • The P value depends on the spread of the data.
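The dependence on sample size can be sketched with a pooled two-proportion z-test, used here as one simple choice of test. The event rates (10% versus 12%) and sample sizes are hypothetical. The same 2% absolute difference is tested in a small and a large trial.

```python
from math import erfc, sqrt


def two_proportion_p_value(events_1, n_1, events_2, n_2):
    """Two-sided P value from a pooled two-proportion z-test."""
    p_1, p_2 = events_1 / n_1, events_2 / n_2
    pooled = (events_1 + events_2) / (n_1 + n_2)
    se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se
    # Two-sided tail area of the standard normal distribution
    return erfc(abs(z) / sqrt(2))


# Same event rates (10% versus 12%), different sample sizes
p_small = two_proportion_p_value(10, 100, 12, 100)
p_large = two_proportion_p_value(1000, 10000, 1200, 10000)

print(f"100 per arm:    P = {p_small:.3f}")
print(f"10000 per arm:  P = {p_large:.6f}")
```

The difference is identical in both trials; only the sample size changes the P value, which is why statistical significance says nothing by itself about clinical importance.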

Confidence Interval

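  • 95% CI: if the study were repeated 100 times, about 95 of the resulting intervals would contain the true value.
  • The width of the CI is a measure of reliability (precision).
  • The narrower the CI, the more reliable the estimate.
  • Sample size affects the width: larger samples give narrower intervals.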
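The effect of sample size on the width can be sketched with the simple normal-approximation (Wald) interval for a proportion, which is one of several available methods. The counts are hypothetical: the same 30% stroke rate observed in 100 and in 1000 patients.

```python
from math import sqrt


def wald_ci_95(events, n):
    """Approximate 95% CI for a proportion (normal approximation)."""
    p = events / n
    margin = 1.96 * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Same observed proportion (0.30), different sample sizes
low_100, high_100 = wald_ci_95(30, 100)
low_1000, high_1000 = wald_ci_95(300, 1000)

print(f"n = 100:  {low_100:.3f} to {high_100:.3f}")
print(f"n = 1000: {low_1000:.3f} to {high_1000:.3f}")
```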

Randomized Controlled Trials- Analysis of results

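  • Intention to treat (ITT): patients are analysed in the group to which they were randomized.
  • As treated (AT): patients are analysed according to the treatment they actually received.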

Points to remember

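  • Crossover can happen: a patient randomized to one arm receives the other treatment.
  • Attrition: patients drop out or are lost to follow-up.
  • ITT keeps the original numbers in each arm.
  • AT has altered numbers in each arm, but the study design is maintained.
  • If there are many crossovers, present a separate as-treated analysis along with the ITT analysis.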
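The difference between the two analyses can be sketched on a toy dataset. The eight patients below are entirely hypothetical; one patient in each arm crosses over to the other technique.

```python
# Each record: (assigned arm, arm actually received, had the event)
patients = [
    ("a", "a", False), ("a", "a", False), ("a", "a", True), ("a", "b", True),
    ("b", "b", True), ("b", "b", False), ("b", "b", True), ("b", "a", False),
]


def event_rate(rows, index, arm):
    """Event rate among patients whose field `index` equals `arm`."""
    group = [row for row in rows if row[index] == arm]
    return sum(row[2] for row in group) / len(group)


# Intention to treat: group by the arm assigned at randomization (index 0)
itt_a = event_rate(patients, 0, "a")
itt_b = event_rate(patients, 0, "b")

# As treated: group by the arm actually received (index 1)
at_a = event_rate(patients, 1, "a")
at_b = event_rate(patients, 1, "b")

print(f"ITT: a = {itt_a:.2f}, b = {itt_b:.2f}")
print(f"AT:  a = {at_a:.2f}, b = {at_b:.2f}")
```

In this toy example two crossovers are enough to change the picture from no difference (ITT) to an apparent large difference (AT), which is why both analyses are worth reading when crossovers are frequent.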

 

Key points

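  • CI: for a ratio (RR, OR or HR), if the 95% CI contains 1, there is no statistically significant difference.
  • In Kaplan-Meier (K-M) curves: if the curves, with their confidence intervals, do not overlap, a significant difference is very likely.
  • If they overlap, significance depends on the degree of overlap and is settled by a formal test such as the log-rank test.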
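The first key point can be sketched by computing an approximate 95% CI for a relative risk on the log scale, a commonly used large-sample method. The counts are hypothetical: the same relative risk of 0.5 is observed in a large and in a small study.

```python
from math import exp, log, sqrt


def rr_with_ci_95(events_a, n_a, events_b, n_b):
    """Relative risk with an approximate 95% CI (log method)."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    low = exp(log(rr) - 1.96 * se_log)
    high = exp(log(rr) + 1.96 * se_log)
    return rr, low, high


# Large study: 30/1000 versus 60/1000 strokes
rr_large, low_large, high_large = rr_with_ci_95(30, 1000, 60, 1000)

# Small study: 3/100 versus 6/100 strokes (same relative risk)
rr_small, low_small, high_small = rr_with_ci_95(3, 100, 6, 100)

for label, low, high in [("large", low_large, high_large),
                         ("small", low_small, high_small)]:
    verdict = "contains 1" if low <= 1 <= high else "excludes 1"
    print(f"{label}: 95% CI {low:.2f} to {high:.2f} ({verdict})")
```

The large study's interval excludes 1, while the small study's interval contains 1 even though the point estimate is the same.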
