Data Analytics Workflow

  • How do research projects proceed?

  • Every project is a bit different, but there are some general patterns.

Workflow

  1. Pose a preliminary question

  2. Find the appropriate data

  3. Get the data into usable form

  4. Preliminary analysis

  5. Final analysis

  6. Report your results

1. Pose a preliminary question

  • What do you want to know? Why is it interesting?

  • Your question may (and probably will) change depending on what you find in the data.

2. Find the appropriate data

  • You will likely cycle back and forth between question and data in the very early stages.

  • Some questions are great, but we do not have the data to answer them.

  • Sometimes we start looking at the data and realize the preliminary question is not quite the right one. That's okay: go back and revise it!

3. Get the data into usable form

  • Load the data
  • Clean the data
  • Deal with missing values and outliers
  • Sanity checks: Are the types correct? Do the variables have the right signs?

This is where you will spend a lot of time. Survey results reported by Forbes show that data cleaning is both the most time-consuming and the most unpleasant task. The better we are at these tasks, the easier our lives will be.
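The four steps above can be sketched in a few lines of Python. This is only an illustration using the standard library: the data, the column names (region, price, quantity), and the helper functions are all made up for the example, and dropping incomplete rows is just one of several reasonable ways to handle missing values.

```python
import csv
import io

# Hypothetical raw data; in practice this would be read from a file.
RAW_CSV = """region,price,quantity
north,10.5,100
south,,80
east,9.75,-5
west,11.0,120
north,not available,90
"""


def load_rows(text):
    """Load the data: parse CSV text into a list of dicts (every value is a string)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_float(value):
    """Type check: convert to float, returning None for blank or unparseable entries."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean(rows):
    """Fix types, drop rows with missing values, and drop rows that fail a sign check."""
    cleaned = []
    for row in rows:
        price = to_float(row["price"])
        quantity = to_float(row["quantity"])
        if price is None or quantity is None:
            continue  # missing value: dropped here (imputing is another option)
        if price <= 0 or quantity < 0:
            continue  # sanity check: assumed that prices and quantities cannot be negative
        cleaned.append(
            {"region": row["region"].strip().lower(), "price": price, "quantity": quantity}
        )
    return cleaned


rows = load_rows(RAW_CSV)
data = clean(rows)
print(f"kept {len(data)} of {len(rows)} rows")
```

Reporting how many rows survive cleaning is a cheap sanity check in its own right: if most of the data disappears, the cleaning rules deserve a second look.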

4. Preliminary analysis

Time to get to work answering our question. Do we see patterns that suggest an answer to our question?

  1. Plots...lots of plots
  2. Summary statistics
  3. Simple correlations

At this point, your plots are for you and your team, so they do not have to be perfect. You should be thinking, however, about how the final plots will look.

Again, we may find things here that send us back to step 3 (data cleaning) or step 1 (asking questions).
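A rough first pass at a data set might look like the sketch below. The numbers and variable names (ad_spend, sales) are invented for illustration, and the text-based plot stands in for whatever plotting tool your team actually uses; at this stage speed matters more than polish.

```python
import statistics

# Hypothetical cleaned data: advertising spend and sales for eight stores.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]


def summarize(values):
    """Summary statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Simple (Pearson) correlation between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def text_plot(x, y, width=40):
    """Quick-and-dirty plot: one row per observation, bar length proportional to y."""
    top = max(y)
    for a, b in zip(x, y):
        print(f"{a:>5.1f} | {'#' * round(b / top * width)}")


print(summarize(sales))
print(f"correlation: {pearson(ad_spend, sales):.3f}")
text_plot(ad_spend, sales)
```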

5. Final analysis

  • By this point you are pretty sure you can answer your question, or at least some version of it.
  • Develop more formal models. Regression analysis? Machine learning?
  • Create the final visualizations you will use in your report.
  • Perform robustness checks.
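As one possible illustration of a formal model plus a robustness check, the sketch below fits a one-variable least-squares regression and then re-estimates the slope with each observation dropped in turn. The data are invented (the last point is a deliberate outlier), the function names are hypothetical, and leave-one-out is only one of many robustness checks you might run.

```python
def ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out_slopes(x, y):
    """Robustness check: re-estimate the slope with each observation dropped in turn."""
    return [
        ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])[1]
        for i in range(len(x))
    ]


# Hypothetical data; the last observation is a deliberate outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]

intercept, slope = ols(x, y)
slopes = leave_one_out_slopes(x, y)
print(f"full sample slope: {slope:.2f}")
print(f"leave-one-out slopes range from {min(slopes):.2f} to {max(slopes):.2f}")
```

If the estimate moves a lot when a single observation is removed, as it does here, the headline result depends on that observation and the report should say so.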

6. Report your results

  • Reporting is the most important part of your analysis.
  • If people cannot understand what you have found, your work loses value.
  • This is where some economic training can help. We ask and answer questions.
  • Know your audience. Is this for technical readers? CEOs? Marketing execs?
  • Your figures should be immaculate.
  • Report only what is necessary. Do not get bogged down in details, unless the audience wants them.

More on communication

Hugo Bowne-Anderson via Harvard Business Review

"The vast majority of my guests tell [me] that the key skills for data scientists are...the abilities to learn on the fly and to communicate well in order to answer business questions, explaining complex results to nontechnical stakeholders."

The whole article is interesting...