You will likely cycle back and forth between question and data in the very early stages.
Some questions are great, but we do not have the data to answer them.
Sometimes we start looking at the data and realize the preliminary question is not quite the right one. That's okay: go back and revise it!
This is where you will spend a lot of time. Survey results reported by Forbes show that data cleaning is both the most time-consuming and the most unpleasant task. The better we are at these tasks, the easier our lives will be.
Time to get to work answering our question. Do we see patterns that suggest an answer to our question?
At this point, your plots are for you and your team, so they do not have to be perfect. You should, however, be thinking about how the final plots will look.
Again, we may find things here that send us back to step 3 (data cleaning) or step 1 (asking questions).
Hugo Bowne-Anderson via Harvard Business Review
"The vast majority of my guests tell [me] that the key skills for data scientists are...the abilities to learn on the fly and to communicate well in order to answer business questions, explaining complex results to nontechnical stakeholders."
The whole article is interesting...