A probability mass function (PMF) is a function that gives the probability that a discrete random variable is exactly equal to some value.
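As a small illustration of this definition, the sketch below stores the PMF of a fair six-sided die as a dictionary. The die and the names die_pmf and pmf are assumptions made for this example only.

```python
from fractions import Fraction

# Hypothetical example: PMF of a fair six-sided die, stored as a mapping
# from each possible value to its probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def pmf(value):
    """Return P(X = value); values outside the support get probability 0."""
    return die_pmf.get(value, Fraction(0))

print(pmf(3))                 # probability of rolling exactly 3
print(pmf(7))                 # 7 is not a possible outcome
print(sum(die_pmf.values()))  # the probabilities of all values add up to 1
```

Exact fractions are used instead of floats so that the sum over all values is exactly 1.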

A probability mass function differs from a probability density function (PDF) in that the latter is associated with continuous rather than discrete random variables. A PDF must be integrated over an interval to yield a probability.
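To make the integration step concrete, here is a minimal sketch that integrates a density numerically. It assumes a standard normal distribution and a simple midpoint rule; both are choices made for this illustration, not something the notes prescribe.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Probability that the variable falls in the interval [-1, 1].
prob = integrate(normal_pdf, -1.0, 1.0)
print(prob)

# The density at a single point is not itself a probability.
print(normal_pdf(0.0))
```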

The probability distribution of a random variable is a function that takes outcomes from the sample space as input and returns probabilities: in other words, it maps possible outcomes to their probabilities.

The joint probability distribution is useful when we are interested in the probability that x takes a specific value while y takes another specific value. For instance, what is the probability of getting a 1 with the first die and a 2 with the second die? The probabilities corresponding to every pair of values are written P(x=x, y=y) or P(x, y). This is what we call the joint probability.
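The two-dice question can be worked out with a short sketch. It assumes both dice are fair and independent, so every pair of faces has probability 1/36; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumption: two fair, independent dice, so each (x, y) pair is equally likely.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Joint probability of a 1 on the first die and a 2 on the second.
p_1_and_2 = joint[(1, 2)]
print(p_1_and_2)

# Summing the joint probabilities over y recovers P(x = 1).
p_x1 = sum(p for (x, y), p in joint.items() if x == 1)
print(p_x1)
```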

P(y=y|x=x) describes the conditional probability: it is the probability that the random variable y takes the specific value y given that the random variable x took the specific value x. It is different from P(y=y, x=x), which corresponds to the probability of getting both the outcome y for the random variable y and the outcome x for the random variable x. In the case of conditional probability, the event associated with the random variable x has already produced its outcome (x).

The probability that the random variable y takes the value y given that the random variable x took the value x is the ratio of the probability that both events occur (y takes the value y and x takes the value x) to the probability that x takes the value x: P(y=y|x=x) = P(y=y, x=x) / P(x=x).
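The ratio described above can be checked on a small example. In this sketch, x is the first of two fair dice and y is the total of both dice; this setup and the helper name conditional are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# joint[(x, y)] = P(x = x, y = y), where x is the first die and y is the total.
joint = {}
for first, second in product(range(1, 7), repeat=2):
    key = (first, first + second)
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

def conditional(y_value, x_value):
    """Return P(y = y_value | x = x_value) as joint probability over marginal."""
    p_x = sum(p for (x, _), p in joint.items() if x == x_value)
    return joint.get((x_value, y_value), Fraction(0)) / p_x

print(joint[(1, 3)])      # P(x = 1, y = 3): both events occur
print(conditional(3, 1))  # P(y = 3 | x = 1): x is already known to be 1
```

The two printed values differ because the conditional probability divides by P(x = 1), which removes the uncertainty about the first die.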

Bernoulli Trial

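A Bernoulli trial is a random experiment with exactly two possible outcomes, success (with probability p) and failure (with probability 1-p).

A binomial experiment consists of a fixed number n of statistically independent Bernoulli trials, each with the same probability of success p, and counts the number of successes. If X denotes the number of successes, the probability of exactly k successes in the experiment is P(X=k) = C(n,k) p^k (1-p)^(n-k), where C(n,k) = n! / (k! (n-k)!) is the binomial coefficient.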
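A minimal sketch of this calculation, using the binomial coefficient from Python's standard library. The function name binomial_pmf and the coin-flip numbers are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: exactly 2 heads in 4 flips of a fair coin.
print(binomial_pmf(2, 4, 0.5))

# The probabilities over k = 0..n add up to 1 (up to rounding).
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))
```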

Binomial Experiment

When multiple Bernoulli trials are performed, each with its own probability of success, these are sometimes referred to as Poisson trials.
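When the trials have different success probabilities, the binomial formula no longer applies directly. The sketch below builds the distribution of the number of successes one trial at a time. It assumes the trials are independent; the function name and the three probabilities are made up for the example.

```python
def success_count_pmf(probs):
    """Return a list whose entry k is P(exactly k successes), given one
    success probability per trial. Assumes the trials are independent."""
    dist = [1.0]  # before any trial: zero successes with certainty
    for p in probs:
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1 - p)  # this trial fails
            updated[k + 1] += mass * p    # this trial succeeds
        dist = updated
    return dist

# Hypothetical example: three trials with different success probabilities.
count_pmf = success_count_pmf([0.1, 0.5, 0.9])
for k, prob in enumerate(count_pmf):
    print(k, prob)
```

If every entry of probs is the same value p, the result matches the binomial probabilities for n = len(probs).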

Data scientists use probability distributions as models for how their data are generated. In this context, a model is a set of assumptions involving probabilities. Almost invariably, models are simplified representations of complex real scenarios. As an example, consider the problem of counting the cars that pass in an hour.

Assumption 1: what happens in one time interval is independent of what happens in any other time interval.

Assumption 2: the measurements are independent.

Here n is the number of trials and p is the probability of success in each trial.

Let us consider an infinite number of trials (n goes to infinity).

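In this limit, with the expected number of successes np held fixed at a value λ, the probability of getting k successes in the given interval is P(X=k) = λ^k e^(-λ) / k!, which is the Poisson distribution.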
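The limit can be checked numerically. The sketch below assumes the expected number of successes lam = n * p is held fixed while n grows, and compares the binomial probability with the Poisson probability lam^k * e^(-lam) / k!. The value lam = 3.0 (an average of three cars per hour) is a made-up number for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when lam are expected."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0  # hypothetical average number of cars per hour
k = 2      # number of cars we ask about

# Split the hour into n sub-intervals, each with success probability lam / n.
for n in (10, 100, 1_000, 10_000):
    print(n, binomial_pmf(k, n, lam / n))

print("Poisson:", poisson_pmf(k, lam))
```

As n increases, the binomial values move toward the Poisson value, which is what the limit n goes to infinity describes.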