Complete data sets are rare; in practice, some values are almost always missing. There are typically four strategies for handling missing data:
1. Ignore the variables with missing data.
2. Delete the records that have some missing values.
3. Impute the missing values (i.e., fill in each missing value with some other value, such as the mean of similar records that do have data).
4. Use a data mining technique that can handle missing values, such as CART, one of the few techniques that handle missing data well.
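As a rough illustration, the first three strategies can be sketched in a few lines of standard-library Python. The records, field names, and helper functions below are hypothetical and exist only for this example; the fourth strategy is left out because it depends on the modeling tool being used.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "stays": 3},
    {"age": 45, "income": None, "stays": 7},
    {"age": 29, "income": 41000, "stays": 1},
    {"age": 51, "income": 87000, "stays": 9},
]


def drop_variable(rows, field):
    """Strategy 1: ignore the variable that has missing data."""
    return [{k: v for k, v in row.items() if k != field} for row in rows]


def drop_incomplete(rows):
    """Strategy 2: delete every record with at least one missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute_mean(rows, field):
    """Strategy 3: fill missing values with the mean of the observed ones."""
    observed = [row[field] for row in rows if row[field] is not None]
    fill = mean(observed)
    return [
        {**row, field: fill if row[field] is None else row[field]}
        for row in rows
    ]


without_income = drop_variable(records, "income")  # all rows, no income field
complete_only = drop_incomplete(records)           # only fully observed rows
imputed = impute_mean(records, "income")           # all rows, gap filled in
```

Each helper returns a new list and leaves `records` unchanged, so the three results can be compared side by side: the first keeps every record but loses a variable, the second keeps every variable but loses a record, and the third keeps both at the cost of an estimated value.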
Suppose you are working on a project to model a hotel company's customer base. The goal is to determine which customer attributes are correlated with a high number of hotel stays. You have access to data from a customer survey, but one optional field, yearly income, is missing in roughly 5% of the records. Which of the four strategies should be used? Explain why. Now suppose you chose the third strategy (imputation). Explain how you would impute the missing incomes.
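One possible way to impute the incomes, offered as a sketch rather than as the expected answer, is to fill each gap with the median income of similar customers and to record which values were filled in. The `segment` attribute, the sample rows, and the `impute_by_group` helper are assumptions made for this illustration; the survey described above may offer different attributes for defining "similar" records.

```python
from collections import defaultdict
from statistics import median

# Hypothetical survey rows; "segment" is an assumed grouping attribute.
survey = [
    {"segment": "business", "income": 90000, "stays": 12},
    {"segment": "business", "income": 110000, "stays": 15},
    {"segment": "business", "income": None, "stays": 10},
    {"segment": "leisure", "income": 40000, "stays": 2},
    {"segment": "leisure", "income": 60000, "stays": 3},
    {"segment": "leisure", "income": None, "stays": 1},
]


def impute_by_group(rows, field, group_key):
    """Fill missing `field` values with the median of the same group.

    Falls back to the overall median when a group has no observed values.
    Assumes at least one value of `field` is observed somewhere in `rows`.
    """
    # Collect the observed values of the field for each group.
    observed = defaultdict(list)
    for row in rows:
        if row[field] is not None:
            observed[row[group_key]].append(row[field])

    overall = median(v for values in observed.values() for v in values)

    filled = []
    for row in rows:
        missing = row[field] is None
        group_values = observed.get(row[group_key])
        fill = median(group_values) if group_values else overall
        filled.append({
            **row,
            field: fill if missing else row[field],
            # Flag imputed rows so later analysis can tell them apart.
            f"{field}_missing": missing,
        })
    return filled


result = impute_by_group(survey, "income", "segment")
```

The median is used here instead of the mean because income distributions tend to be skewed, and the `income_missing` flag makes it possible to check later whether customers who skipped the optional question differ from those who answered it.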