Glossary of Statistical Methodology for Laypersons

This glossary explains key statistical concepts in simple terms to help make research easier to understand.


Aggregate Data

Information collected from multiple sources and combined to provide an overall picture rather than looking at individual details. For example, averaging the responses from a patient survey to understand general health trends.

Agreement (Inter-rater Reliability)

A way to measure whether different people reviewing the same information reach the same conclusions. It helps check if a decision-making process is consistent and reliable.


Bayesian Statistics

An approach to statistics that starts from an existing belief and updates it as new information becomes available. For example, a weather forecast issued in the morning may be revised in the afternoon once new measurements come in.
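As a rough illustration of updating a belief with new data, the sketch below uses one common textbook case (a Beta prior updated with yes/no observations). The function names `update_beta` and `beta_mean` and all of the numbers are hypothetical, chosen only for illustration.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Return the updated Beta parameters after seeing new yes/no data."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the current best guess for the rate."""
    return a / (a + b)


# Starting belief: about a 50% chance of rain, held weakly (Beta(2, 2)).
a, b = 2, 2
prior_guess = beta_mean(a, b)

# New information (made up): it rained on 7 of the last 10 similar days.
a, b = update_beta(a, b, successes=7, failures=3)
updated_guess = beta_mean(a, b)

print(f"before: {prior_guess:.2f}, after: {updated_guess:.2f}")
```

The updated guess sits between the starting belief and the rate seen in the new data, which is the general pattern in this kind of update.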

Bootstrapping

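A technique for judging how uncertain an estimate is when only one sample of data is available. It repeatedly draws new samples, with replacement, from the original sample and recalculates the estimate each time. The spread of those results shows how much the estimate might vary.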
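The resampling idea can be sketched in a few lines. This is a minimal example of one common variant (a percentile interval for an average), not the only way to bootstrap. The function name `bootstrap_interval` and the blood pressure readings are made up for illustration.

```python
import random
import statistics


def bootstrap_interval(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `data`."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Draw a new sample of the same size, with replacement.
        resample = rng.choices(data, k=len(data))
        means.append(statistics.mean(resample))
    means.sort()
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical blood pressure readings from ten patients.
readings = [118, 125, 131, 122, 140, 115, 128, 135, 120, 126]
low, high = bootstrap_interval(readings)
print(f"sample mean: {statistics.mean(readings)}")
print(f"approximate 95% interval: {low:.1f} to {high:.1f}")
```

A wide interval suggests the average could change noticeably with a different sample; a narrow one suggests it is fairly stable.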

Borrowing Strength

A method of improving an estimate for one group by also using information from similar groups. For example, if we want to know how well a small school is performing, we can consider its test scores alongside scores from other schools in the area to get a more stable picture.

Burn-in

The early runs of a simulation-based method (such as Markov chain Monte Carlo) that are discarded because the method has not yet settled down and its results are not yet reliable. It’s like warming up before exercising.


Calibration (Measurement Tools)

Checking whether a measuring device gives accurate readings. For example, making sure a thermometer measures temperature correctly.

Calibration (Predictive Models)

Testing whether a model’s predictions match what actually happens. If a weather forecast predicts a 70% chance of rain on 100 different days and it rains on about 70 of them, the model is well-calibrated.
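A simple calibration check compares each predicted probability with how often the event actually happened. The sketch below assumes a small, made-up record of forecasts and outcomes; the function name `observed_frequency` is hypothetical.

```python
def observed_frequency(forecasts, outcomes, predicted):
    """Share of days on which it rained, among days given forecast `predicted`."""
    matching = [rained for p, rained in zip(forecasts, outcomes) if p == predicted]
    return sum(matching) / len(matching)


# Made-up record: ten days forecast at 70% and ten days forecast at 20%.
# An outcome of 1 means it rained; 0 means it stayed dry.
forecasts = [0.7] * 10 + [0.2] * 10
outcomes = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0] + [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

rain_at_70 = observed_frequency(forecasts, outcomes, 0.7)
rain_at_20 = observed_frequency(forecasts, outcomes, 0.2)
print(f"forecast 70%: rained on {rain_at_70:.0%} of days")
print(f"forecast 20%: rained on {rain_at_20:.0%} of days")
```

In this invented record the observed frequencies equal the forecasts, which is what good calibration looks like. Real data would rarely match this neatly.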

Causal Inference

A way of figuring out if one thing directly causes another. For example, if people who exercise have lower blood pressure, causal inference helps determine whether exercise is the actual cause, rather than another factor like diet.

Censoring

A situation where we know only part of the answer about when an event happens. For example, if we are studying how long light bulbs last, but some bulbs are still working when the study ends, we know only that they lasted at least that long, not their exact failure time.

Competing Risks

When several different events could happen and one of them prevents the others from being observed. For example, in a study on heart disease, a person might die from another cause before developing heart disease, making it difficult to study the primary risk.

Composite Outcome

A way of combining multiple results into one measure. For example, in a study on heart disease, researchers might track heart attacks, strokes, and deaths together rather than separately.

Continuous Outcome

A result that can take any value within a range. For example, a person’s height can be 170 cm, 170.5 cm, or 170.55 cm, rather than just short or tall.

Convergence

When an estimate settles on a stable answer, either as more data is added or as a step-by-step calculation is run for longer. It’s like guessing someone’s age: your guesses settle on a number as you get more clues.

Coverage (Simulation Studies)

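In a simulation study, a measure of how often a method’s range of plausible values (such as a 95% confidence interval) contains the true value. If 95% intervals contain the true value in about 95% of simulated datasets, the method has good coverage.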
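Coverage can be checked by generating many artificial datasets where the true answer is known and counting how often a 95% interval contains it. The sketch below is one minimal way to do this, assuming normally distributed data and the usual 1.96 multiplier; the function name `estimate_coverage` and the default settings are illustrative choices.

```python
import random
import statistics


def estimate_coverage(true_mean=50.0, sd=10.0, n=30, n_sims=2000, seed=1):
    """Share of simulated 95% intervals that contain the known true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Generate one artificial dataset with a known true mean.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.mean(sample)
        standard_error = statistics.stdev(sample) / n ** 0.5
        # 1.96 is the normal-approximation multiplier for a 95% interval.
        lower = mean - 1.96 * standard_error
        upper = mean + 1.96 * standard_error
        if lower <= true_mean <= upper:
            hits += 1
    return hits / n_sims


coverage = estimate_coverage()
print(f"intervals containing the true value: {coverage:.1%}")
```

A result close to 95% suggests the interval method behaves as advertised under these assumptions. With small samples the 1.96 multiplier tends to give slightly less than 95%.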

Cross-validation

A method of testing how well a model works by repeatedly building it on one part of the data and testing it on the remaining part. It’s like perfecting a recipe with some guests and then serving it to guests who have never tasted it.
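The splitting step can be sketched as follows. This is a minimal version of one common scheme (k-fold cross-validation) that only produces the train and test groups; in practice the data would usually be shuffled first, and a model would be built and scored on each split. The function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    fold_size, remainder = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # Spread any leftover items over the first few folds.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


splits = list(k_fold_splits(n_items=10, k=5))
for train, test in splits:
    print("train on", train, "test on", test)
```

Each item is used for testing exactly once and for training in every other round, so no result depends on data the model has already seen.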


Diagnostic Model

A tool used to estimate the likelihood that someone has a disease based on symptoms or test results.

Discrimination (Prognostic Models)

A measure of how well a model distinguishes between different outcomes. For example, a good model should correctly predict which patients will recover and which won’t.

Deviance

A way of measuring how well a model fits the data. Lower deviance means the predictions are closer to reality.


Electronic Health Records (EHRs)

A digital version of a patient’s medical history, including test results, diagnoses, and treatments.

Estimand

The precise quantity a study is trying to estimate. For example, if we want to know whether a new diet helps with weight loss, the estimand could be the average weight difference between people who followed the diet and those who didn’t.

External Validation

Testing a model using data from a different group than the one it was created with to see if it still works well.


Frequentist Statistics

A statistical approach that treats probability as how often something would happen if the same study were repeated many times. P-values and confidence intervals are frequentist tools.


Generalized Linear Model

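A family of methods that extends linear regression to outcomes that are not simple measurements, such as yes/no results or counts. For example, studying how exercise and diet affect the chance of having high blood pressure.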

Granular Data

Detailed information rather than summaries. For example, instead of knowing that 30% of patients have high blood pressure, granular data tells us each patient’s exact blood pressure reading.


Hazard Ratio

A way of comparing how quickly an event happens over time in two groups. For example, if people taking a new drug have strokes at half the rate of people not taking it, the hazard ratio is 0.5.


Individual Patient Data

Detailed data collected from each person in a study, rather than just overall statistics.

Interpolation

A method of estimating missing values by using surrounding data points. For example, if you know temperatures at 10 AM and 12 PM, interpolation helps estimate the temperature at 11 AM.
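The simplest version, linear interpolation, assumes the value changes at a steady rate between the two known points. The sketch below uses made-up temperatures; the function name `interpolate` is hypothetical.

```python
def interpolate(x0, y0, x1, y1, x):
    """Linearly interpolate the value at x between (x0, y0) and (x1, y1)."""
    fraction = (x - x0) / (x1 - x0)
    return y0 + fraction * (y1 - y0)


# Made-up readings: 15 degrees at 10:00 and 19 degrees at 12:00.
temp_at_11 = interpolate(10, 15.0, 12, 19.0, 11)
print(temp_at_11)  # 17.0
```

Because 11:00 is halfway between the two readings, the estimate is halfway between the two temperatures.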

Internal Validation

Testing how well a model works using the same dataset it was created from, often by re-using the data in different ways (for example, cross-validation or bootstrapping), before testing it on new data.


Likelihood

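A measure of how well a particular explanation or model matches the data that were actually observed. The higher the likelihood, the better that explanation fits the data.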

Linear Regression

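A method of describing the relationship between an outcome and one or more other variables using a straight line. For example, estimating how much blood pressure changes for each extra hour of weekly exercise.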
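A straight line can be fitted with the standard least-squares formulas, which choose the line that keeps the gaps between actual and predicted values as small as possible overall. The sketch below uses invented exercise and blood pressure figures; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: weekly exercise hours and systolic blood pressure.
hours = [0, 1, 2, 3, 4]
pressure = [140, 137, 135, 131, 129]
slope, intercept = fit_line(hours, pressure)

# Gaps between each actual value and the value the line predicts.
residuals = [y - (intercept + slope * x) for x, y in zip(hours, pressure)]
print(f"slope: {slope:.1f}, intercept: {intercept:.1f}")
```

For these invented figures, the slope says blood pressure is about 2.8 points lower for each extra hour of exercise. That describes a pattern in the data and does not by itself show that exercise is the cause.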

Logistic Regression

A method used to predict whether an event will happen or not, like whether a patient will develop a disease (yes/no).


Mean Difference

The average difference between two groups. For example, if one group has an average blood pressure of 120 and another has 130, the mean difference is 10.

Model Fit

How well a statistical model matches the actual data. A good fit means the model makes accurate predictions.

Multiple Imputation

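A way of handling missing data by filling in the gaps several times with different plausible values, analyzing each completed dataset, and combining the results so that the uncertainty about the missing values is taken into account.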


Network Meta-analysis

A method for comparing multiple treatments at once by combining data from different studies.


Parametric Model

A model that assumes the data follows a specific pattern that can be described by a small number of values, such as a straight line or a bell-shaped curve.

Precision

A measure of how close repeated results are to one another. Precise results are consistent, but they are not necessarily accurate.

Prognostic Model

A tool that predicts future outcomes, such as the likelihood of a patient developing a disease.


Regression

A family of methods for describing how an outcome is related to other variables and for predicting the outcome from them.

Residuals

The differences between the values a model predicts and the values actually observed. Small residuals mean the model’s predictions are close to the data.

Risk Difference

The difference in risk between two groups. For example, if 10% of smokers and 5% of non-smokers develop lung disease, the risk difference is 5 percentage points.

Risk Ratio

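A measure of how much more likely an event is in one group compared to another. For example, if 10% of smokers and 5% of non-smokers develop lung disease, the risk ratio is 2: smokers are twice as likely to develop the disease.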
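Risk difference and risk ratio are two ways of comparing the same pair of risks: one subtracts, the other divides. The sketch below assumes made-up groups of 100 smokers and 100 non-smokers; the function name `risk` is hypothetical.

```python
def risk(events, total):
    """Proportion of people in a group who experienced the event."""
    return events / total


# Made-up counts: 10 of 100 smokers and 5 of 100 non-smokers develop lung disease.
risk_smokers = risk(10, 100)
risk_non_smokers = risk(5, 100)

risk_difference = risk_smokers - risk_non_smokers
risk_ratio = risk_smokers / risk_non_smokers

print(f"risk difference: {risk_difference * 100:.0f} percentage points")
print(f"risk ratio: {risk_ratio:.1f}")
```

The two measures can give different impressions: a risk ratio of 2 sounds large, while a difference of 5 percentage points shows how many extra cases to expect per 100 people.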


Shrinkage

A method that pulls extreme estimates toward a more typical value so that a model does not read too much into limited data and works better on new data.

Simulation Study

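Using a computer to create artificial data where the true answer is known, in order to test how well statistical methods perform in different scenarios instead of running real-world experiments.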

Survival Analysis

A method of studying how long it takes for an event to happen, such as how long patients live after a treatment.


Type I Error (False Positive)

Concluding that an effect or condition is present when it is not. For example, a medical test that incorrectly says someone has a disease.

Type II Error (False Negative)

Concluding that an effect or condition is absent when it is actually present. For example, a test that incorrectly says someone does not have a disease.
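Both kinds of error can be counted by comparing what a test says with what is actually true. The sketch below uses an invented group of eight people; the function name `count_errors` is hypothetical.

```python
def count_errors(truth, test_result):
    """Count false positives and false negatives for paired yes/no values."""
    false_positives = sum(1 for t, r in zip(truth, test_result) if not t and r)
    false_negatives = sum(1 for t, r in zip(truth, test_result) if t and not r)
    return false_positives, false_negatives


# Invented data: whether each of eight people has the disease,
# and whether the test came back positive for them.
has_disease = [True, True, True, False, False, False, False, False]
test_positive = [True, True, False, True, False, False, False, False]

false_positives, false_negatives = count_errors(has_disease, test_positive)
print(f"false positives: {false_positives}, false negatives: {false_negatives}")
```

Here one healthy person tested positive (a false positive) and one person with the disease tested negative (a false negative).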


Validation

Checking if a model or prediction method is accurate and reliable.


This glossary simplifies complex statistical terms to make research methods more accessible.