A brief primer on scientific and mathematical notations

Last updated on Dec 1, 2020 3 min read Educational

As I finished writing the final draft of my first first-author paper, survClust, there were a lot of other firsts! In my opinion, writing the methods section and a crisp conclusion and discussion were the most difficult parts.

Below, I share my notes that really came in handy while I was writing the methods section of my manuscript.

What this is?

Notes on how to describe a statistical methodology. Some basic rules and notations that you should keep in mind.

Scientific notations

Random variables are usually written in uppercase roman letters: $X, Y$ , etc.
Probability density functions (pdfs) and probability mass functions (pmfs) are denoted by lowercase letters, e.g. $f(x)$, or $f_{X}(x)$.
Cumulative distribution functions (cdfs) are denoted by uppercase letters, e.g. $F (x)$ , or $F_{X} (x)$ .

Let's summarize the above three points with an example -

A random variable $X$ has density $f_{X}$ if, for any $a \leq b$ -

$\Pr[a \leq X \leq b] = \int_{a}^{b} f_{X}(x) \, dx$

Hence, if $F_{X}$ is the cumulative distribution function of $X$ then:

$F_{X} (x) = \int_{- \infty}^{x} f_{X} (u) d u,$

and

$f_{X} (x) = \frac{d}{d x} F_{X} (x) .$
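To see these three relations at work, here is a minimal numerical sketch. It assumes $X$ is a standard normal random variable, which is my choice for illustration only; the helper names (`normal_pdf`, `normal_cdf`, `integrate`) are made up for this example.

```python
import math


def normal_pdf(x):
    """Density f_X(x) of a standard normal random variable X (assumed example)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Distribution function F_X(x) of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


a, b = -1.0, 2.0

# Pr[a <= X <= b] computed two ways: integrating the pdf, and differencing the cdf.
prob_from_pdf = integrate(normal_pdf, a, b)
prob_from_cdf = normal_cdf(b) - normal_cdf(a)

# f_X(x) = d/dx F_X(x), checked with a central finite difference at x0.
x0, eps = 0.5, 1e-5
slope_of_cdf = (normal_cdf(x0 + eps) - normal_cdf(x0 - eps)) / (2.0 * eps)

print(prob_from_pdf, prob_from_cdf)
print(slope_of_cdf, normal_pdf(x0))
```

Each printed pair should agree up to a small numerical error, which is just the statement that $F_{X}$ is the integral of $f_{X}$ and $f_{X}$ is the derivative of $F_{X}$.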

Now, let's go over some quick statistical nitty-gritties:

Greek letters $θ, β$ are commonly used to denote unknown parameters.
Placing a hat, or caret, over a true parameter denotes an estimator of it, e.g., $\hat{θ}$ is an estimator for $θ$ .
Building on the above point, the sample mean, variance, and correlation coefficient are denoted by $\bar{x}$, $s^{2}$, and $r$, respectively. The corresponding population parameters are the population mean $μ$, the population variance $σ^{2}$, and the population correlation $ρ$.
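The sample quantities above can be computed as in the following small sketch. The two data vectors are made up for illustration, and the variance uses the usual $n - 1$ denominator (an assumption on my part, since the notes do not specify it).

```python
import math
import statistics

# Made-up observations x_1..x_n and y_1..y_n, for illustration only.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0]
n = len(x)

# Sample means: estimates of the population means.
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Sample variances s^2 (n - 1 denominator): estimates of sigma^2.
s2_x = statistics.variance(x)
s2_y = statistics.variance(y)

# Sample covariance, then the sample correlation coefficient r: an estimate of rho.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = cov_xy / math.sqrt(s2_x * s2_y)

print(x_bar, s2_x, r)
```

In a methods section you would report these as $\bar{x}$, $s^{2}$ and $r$, keeping $μ$, $σ^{2}$ and $ρ$ for the unknown population values they estimate.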

Finally, most of the time you will need the following notational conventions while drafting the methods section of your manuscript -

Input or independent variables are denoted by $X$ , output or dependent variables are denoted by $Y$ , and qualitative outputs by $G$ .
If $X$ is a vector, its components are accessed by subscripts, e.g. $X_{j}$.
Observed values are written in lowercase; hence the $i^{th}$ observed value of $X$ is written as $x_{i}$, where $x_{i}$ is a scalar or vector.
Matrices are represented by bold uppercase letters; for example, a matrix $\mathbf{X}$ with dimensions $N \times p$, i.e. a set of $N$ input $p$-vectors. In general, vectors are not bold, except when they have $N$ components. Note that all vectors are assumed to be column vectors.

Let's break it down with an example -

Given a vector of inputs $X^{T} = (X_{1}, X_{2}, \ldots, X_{p})$, we predict the output $Y$ via a linear model -

$\hat{Y} = \hat{β}_{0} + \sum_{j = 1}^{p} X_{j} \hat{β}_{j}$

Or, including the constant $1$ in $X$ and the intercept $\hat{β}_{0}$ in the coefficient vector $\hat{β}$, we can write this in vector form as an inner product -

$\hat{Y} = X^{T} \hat{β}$

To fit the model, we need to estimate a value of $β$ that minimizes the residual sum of squares (RSS) -

$\mathrm{RSS}(β) = \sum_{i = 1}^{N} (y_{i} - x_{i}^{T} β)^{2}$

Or in matrix notation we can write it as,

$\mathrm{RSS}(β) = (\mathbf{y} - \mathbf{X} β)^{T} (\mathbf{y} - \mathbf{X} β)$

where $\mathbf{X}$ is an $N \times p$ matrix with each row an input vector, and $\mathbf{y}$ is an $N$-vector of the outputs. See how $\mathbf{y}$ is in bold in the above equation.
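As a rough sketch of what minimizing the RSS means in practice, the snippet below fits a line to four made-up points. It assumes a single predictor plus an intercept, so each row $x_{i}$ is $(1, x_{i1})$ and the least squares estimate has a simple closed form; the function names `rss` and `fit_one_predictor` are hypothetical and exist only for this example.

```python
def rss(beta, X, y):
    """RSS(beta) = sum_i (y_i - x_i^T beta)^2, where X is a list of rows x_i."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        fitted = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))  # x_i^T beta
        total += (y_i - fitted) ** 2
    return total


def fit_one_predictor(x, y):
    """Closed-form least squares estimates [beta_0_hat, beta_1_hat] for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_1 = s_xy / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    return [beta_0, beta_1]


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

# N x p matrix X (here N = 4, p = 2): a leading 1 in each row carries the intercept.
X = [[1.0, xi] for xi in x]

beta_hat = fit_one_predictor(x, y)
rss_at_estimate = rss(beta_hat, X, y)

# Moving away from beta_hat in any direction should not decrease the RSS.
rss_nearby = [
    rss([beta_hat[0] + d0, beta_hat[1] + d1], X, y)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
]

print(beta_hat, rss_at_estimate, min(rss_nearby))
```

The lowercase `x` and `y` lists hold observed values, while `X` plays the role of the bold $N \times p$ matrix, which mirrors the notation rules above.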

Or take one of your favorite papers and go through its methods section to work out other key details!


Arshi Arora

Research Biostatistician