Suppose that I perform a random experiment n times (like tossing a coin) and that p is the probability that a particular outcome occurs on each trial. If K is the random variable that counts how many times this outcome occurred during the whole experiment, and if all the trials are mutually independent, then the probability that K is equal to a specific k, with 0\leq k\leq n, is
\mathrm{Pr}\{K=k\} = \binom n k p^k (1-p)^{n-k} =: b_n(k),
which makes intuitive sense to me.
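For a concrete sanity check, here is a minimal Python sketch (the parameters n = 10, p = 0.5, k = 4 are arbitrary examples, not taken from anything above) that compares this formula against the empirical frequency from simulated independent trials:

```python
import math
import random

def binom_pmf(n, k, p):
    """Pr{K = k} = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p, k = 10, 0.5, 4          # example parameters, chosen arbitrarily
trials = 100_000

# Simulate `trials` experiments of n independent Bernoulli(p) tosses each,
# and count how often exactly k successes occur.
hits = sum(
    1 for _ in range(trials)
    if sum(random.random() < p for _ in range(n)) == k
)

print(f"formula:   {binom_pmf(n, k, p):.4f}")   # ~0.2051 for these values
print(f"simulated: {hits / trials:.4f}")
```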
Now suppose that I want to know what the average, or expected, value of K is going to be: the formula tells me that
\begin{split}
\langle K\rangle &= \sum_{k=0}^n k b_n(k) = \sum_{k=1}^n k \binom n k p^k (1-p)^{n-k} \\
&= \sum_{k=1}^n k \frac{n(n-1)!}{k(k-1)!(n-k)!} p\, p^{k-1} (1-p)^{n-k} \\
&= np \sum_{\kappa=0}^\nu \frac{\nu!}{\kappa!(\nu-\kappa)!} p^\kappa (1-p)^{\nu-\kappa} = np(p + 1 - p)^{\nu} \\ &= np,
\end{split}
where I’ve dropped the vanishing k=0 term and made the substitutions \kappa = k-1 and \nu = n-1. Notwithstanding the mathematical certainty of this derivation, it also makes perfect intuitive sense to me that np should be the expected value of K, since it is the product of the probability of the outcome and the number of trials performed: if there’s a 1/6 chance that I roll a 5 on a fair die, and I throw it 600 times, then I expect to see a 5 about 100 of those times.
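The die example is easy to check numerically; here is a short Python sketch (the number of repetitions is arbitrary) that rolls a fair die 600 times, counts the 5s, and averages that count over many repetitions:

```python
import random

n, p = 600, 1 / 6             # 600 rolls, probability 1/6 of rolling a 5
reps = 10_000                 # number of simulated experiments (arbitrary)

# For each repetition, count how many of the n rolls come up 5,
# then average those counts across repetitions.
counts = [
    sum(random.randint(1, 6) == 5 for _ in range(n))
    for _ in range(reps)
]
mean_K = sum(counts) / reps

print(f"empirical mean: {mean_K:.2f}")   # should be close to n*p = 100
print(f"n * p:          {n * p:.2f}")
```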
If instead I want to know how I should expect the outcomes to vary around the expected value, I may compute the variance of K: with the same substitutions as before,
\begin{split}
\mathrm{Var}[K] &= \langle K^2\rangle - \langle K \rangle^2 = \left(\sum_{k=0}^n k^2 b_n(k) \right) -n^2p^2 \\
&= \left( \sum_{k=1}^n k^2 \binom n k p^k (1-p)^{n-k}\right) -n^2p^2 \\
&= \left( np \sum_{\kappa = 0}^\nu (\kappa +1) \binom \nu \kappa p^\kappa (1-p)^{\nu-\kappa} \right)-n^2p^2 \\
&= \left(np \Big(\sum_{\kappa=0}^\nu \kappa b_\nu(\kappa) +(p + 1-p)^\nu \Big) \right) -n^2p^2 \\
&= np(\nu p + 1) - n^2 p^2 = np( np - p + 1 - np) \\ &= np(1-p).
\end{split}
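As with the mean, the closed form can be verified empirically; this Python sketch (again with the same arbitrary die parameters) compares the sample variance of the simulated counts against np(1-p):

```python
import random

n, p = 600, 1 / 6             # same example parameters as before
reps = 10_000                 # arbitrary number of simulated experiments

counts = [
    sum(random.random() < p for _ in range(n))
    for _ in range(reps)
]
mean_K = sum(counts) / reps

# Sample variance of the simulated counts of successes.
var_K = sum((c - mean_K) ** 2 for c in counts) / (reps - 1)

print(f"empirical variance: {var_K:.2f}")
print(f"n * p * (1 - p):    {n * p * (1 - p):.2f}")  # = 83.33 here
```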
Again, the derivation is mathematically crystalline; but why should I expect this to be the formula for the variance of K? Why does multiplying the expected value of K by the probability that my outcome doesn’t occur give me a measure of the dispersion of K? In other words, how can I justify this formula for the variance intuitively, in the same way as I justified the formula for the mean above?
EDIT. Up until now, I’ve received answers that are just perfectly good explanations of how to derive the formula for the variance of K in ways that differ from the one presented above. That’s not what I’m asking for. The ideal answer should contain as few formulae as possible, and use simple enough words to explain not why the formula is mathematically true, but why it’s reasonable and couldn’t possibly be otherwise – something like the intuitive explanation for \langle K\rangle = np that I gave above.