I'm taking a course in Discrete Mathematics this summer, and my book doesn't offer a very good explanation of Big-O notation. I understand that if $f(x)$ is $\mathcal O(g(x))$, it means there exist constants $C$ and $k$ such that $|f(x)| \le C\,|g(x)|$ whenever $x > k$. However, I don't understand how to actually prove a statement like that.
The book's example shows that the function $f(x) = x^2 + 2x + 1$ is $\mathcal O(x^2)$. My book states that "we observe that we can readily estimate the size of $f(x)$ when $x > 1$ because $x < x^2$ and $1 < x^2$ when $x > 1$. It follows that $0 \le x^2 + 2x + 1 \le x^2 + 2x^2 + x^2 = 4x^2$."
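If I expand that quoted step term by term (my own rewriting of it, assuming $x > 1$), I think it amounts to

$$
\begin{aligned}
x^2 &\le x^2, \\
2x  &\le 2x^2 \quad \text{(since $x < x^2$ when $x > 1$)}, \\
1   &\le x^2  \quad \text{(since $1 < x^2$ when $x > 1$)},
\end{aligned}
$$

and adding these three inequalities gives

$$
x^2 + 2x + 1 \;\le\; x^2 + 2x^2 + x^2 \;=\; 4x^2 \qquad \text{for all } x > 1.
$$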
By that example, I can see that the constant $C$ is $4$ and $k$ is $1$. However, I don't understand at all why they just replaced every term with a multiple of $x^2$ and then summed the coefficients to get $C = 4$.
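For what it's worth, here is the quick numerical check I ran to convince myself that the bound $x^2 + 2x + 1 \le 4x^2$ really does hold once $x > 1$ (Python, purely as a sanity check; the function names are my own):

```python
# Sanity check of the book's bound: f(x) = x^2 + 2x + 1 versus 4*x^2,
# sampled at many points with x > 1 (the claimed witnesses are C = 4, k = 1).

def f(x):
    return x**2 + 2*x + 1

def bound(x, C=4):
    return C * x**2

# Sample x = 1.1, 1.2, ..., 100.0 and record any point where the bound fails.
violations = [x / 10 for x in range(11, 1001) if f(x / 10) > bound(x / 10)]
print("violations found:", violations)  # prints an empty list
```

The check turns up no violations, but a numerical sweep obviously isn't a proof, and the proof is exactly what I'm stuck on. Please help!!!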