Exploratory Data Analysis Walk-Through: Part 3

Aug 24, 2020


We’ll wrap up our EDA walk-through with some final questions and a summary of our analysis. (View Part 1 and Part 2)

In this post, we’ll answer the last questions:

  1. Which genre has the highest ROI on production costs and box office sales?
  2. Which month experienced the most box office success in terms of gross sales?

As in Part 2, we will use the same DataFrame that was introduced in the “ROI vs. Popularity” section, df_roi_pop.

1. Which genre has the highest ROI on production costs and box office sales?

I. Top 10 genres with highest total gross sales
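
A minimal sketch of this step, assuming df_roi_pop has a genres column and a worldwide_gross column (both names are assumptions, and the rows below are made-up placeholders rather than the real data):

```python
import pandas as pd

# Toy stand-in for df_roi_pop. Column names and values are illustrative assumptions.
df_roi_pop = pd.DataFrame({
    "genres": [
        "Action,Adventure,Sci-Fi", "Horror", "Action,Adventure,Sci-Fi",
        "Adventure,Animation,Comedy", "Horror",
    ],
    "worldwide_gross": [900.0, 60.0, 700.0, 500.0, 40.0],
})

# Sum gross sales per genre combination and keep the 10 largest totals.
top_gross = (
    df_roi_pop.groupby("genres")["worldwide_gross"]
    .sum()
    .nlargest(10)
)
print(top_gross)
```

Note that the genre strings are treated as whole combinations here, which matches how they are named in the interpretation further down.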

II. Top 10 genres with highest average ROI on production costs
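
One way this could look, assuming ROI is defined as (gross - budget) / budget and that the columns are called production_budget and worldwide_gross. The formula, the column names, and the toy rows are all assumptions for illustration:

```python
import pandas as pd

# Toy stand-in for df_roi_pop. Column names and values are illustrative assumptions.
df_roi_pop = pd.DataFrame({
    "genres": [
        "Horror", "Horror", "Action,Adventure,Sci-Fi",
        "Action,Adventure,Sci-Fi", "Adventure,Animation,Comedy",
    ],
    "production_budget": [5.0, 10.0, 200.0, 200.0, 100.0],
    "worldwide_gross": [50.0, 40.0, 900.0, 700.0, 500.0],
})

# Assumed ROI definition: profit over production budget, expressed as a multiple.
df_roi_pop["ROI"] = (
    df_roi_pop["worldwide_gross"] - df_roi_pop["production_budget"]
) / df_roi_pop["production_budget"]

# Average ROI per genre combination, top 10.
top_roi = df_roi_pop.groupby("genres")["ROI"].mean().nlargest(10)
print(top_roi)
```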

III. Visualize the data
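
A rough sketch of side-by-side bar charts with matplotlib. The two Series below are placeholders standing in for the per-genre totals and averages; their names and numbers are assumptions, not the actual results:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder aggregates (illustrative values only).
top_gross = pd.Series(
    {"Action,Adventure,Sci-Fi": 1600.0, "Adventure,Animation,Comedy": 500.0, "Horror": 100.0}
)
top_roi = pd.Series(
    {"Horror": 6.0, "Adventure,Animation,Comedy": 4.0, "Action,Adventure,Sci-Fi": 3.0}
)

fig, (ax_gross, ax_roi) = plt.subplots(1, 2, figsize=(14, 5))

# Sort ascending so the largest bar ends up at the top of each horizontal chart.
top_gross.sort_values().plot(kind="barh", ax=ax_gross, color="steelblue")
ax_gross.set_title("Total gross sales by genre")
ax_gross.set_xlabel("Worldwide gross")

top_roi.sort_values().plot(kind="barh", ax=ax_roi, color="darkorange")
ax_roi.set_title("Average ROI by genre")
ax_roi.set_xlabel("ROI (multiple of production budget)")

fig.tight_layout()
```

In a notebook the figure renders on its own and the backend line can be dropped.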

IV. Interpretation

We can see that the Action,Adventure,Sci-Fi and Adventure,Animation,Comedy genres do very well at the box office, but genres that include Horror have the best ROI on production costs. Horror movies often have low production costs, while action and adventure movies tend to have soaring production costs (but very high box office numbers), so which metric to pursue is up for discussion. Action/Adventure movies most likely outperform other genres due to blockbusters like the Avengers series and Jurassic World that resonate with younger audiences, but further exploration would be necessary to confirm this hypothesis.

2. Which month had the most box office success?

We will revisit the same DataFrame, df_roi_pop, to analyze the data. But first, the release_date column will have to be cleaned up.

I. Clean the data
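
A hedged sketch of the cleanup, assuming release_date is stored as strings such as "Dec 18, 2009". That format, the helper column names release_month and release_month_name, and the toy rows are assumptions:

```python
import pandas as pd

# Toy stand-in for df_roi_pop with string dates (assumed format: "Mon DD, YYYY").
df_roi_pop = pd.DataFrame({
    "movie": ["Movie A", "Movie B", "Movie C"],
    "release_date": ["Dec 18, 2009", "May 20, 2011", "Jul 10, 2015"],
})

# Parse the strings into datetimes; unparseable values become NaT instead of raising.
df_roi_pop["release_date"] = pd.to_datetime(
    df_roi_pop["release_date"], format="%b %d, %Y", errors="coerce"
)

# Derive the month as a number (for sorting) and as a name (for labels).
df_roi_pop["release_month"] = df_roi_pop["release_date"].dt.month
df_roi_pop["release_month_name"] = df_roi_pop["release_date"].dt.month_name()
print(df_roi_pop)
```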

II. Group movies by release month with gross sales
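
A small sketch of the grouping, assuming a parsed release_date column and a worldwide_gross column (assumed names, placeholder values):

```python
import calendar
import pandas as pd

# Toy stand-in for df_roi_pop. Column names and values are illustrative assumptions.
df_roi_pop = pd.DataFrame({
    "release_date": pd.to_datetime(
        ["2015-05-01", "2015-07-10", "2016-05-06", "2016-12-16"]
    ),
    "worldwide_gross": [300.0, 200.0, 400.0, 250.0],
})
df_roi_pop["release_month"] = df_roi_pop["release_date"].dt.month

# Total gross per release month, with every calendar month present and in order.
gross_by_month = (
    df_roi_pop.groupby("release_month")["worldwide_gross"]
    .sum()
    .reindex(range(1, 13), fill_value=0.0)
)
gross_by_month.index = [calendar.month_abbr[m] for m in gross_by_month.index]
print(gross_by_month)
```

Reindexing to 1 through 12 keeps the months in calendar order, which reads more naturally on a chart than sorting by value.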

III. Group movies by release month with average ROI

Let’s view the average ROI on production costs by month as well.
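
This could be sketched as a mean over an ROI column grouped by month. The release_month and ROI column names, and the numbers, are assumptions:

```python
import pandas as pd

# Toy stand-in for df_roi_pop. Column names and values are illustrative assumptions.
df_roi_pop = pd.DataFrame({
    "release_month": [5, 5, 7, 12],
    "ROI": [2.0, 4.0, 9.0, 3.0],
})

# Average ROI per release month, highest first.
roi_by_month = (
    df_roi_pop.groupby("release_month")["ROI"]
    .mean()
    .sort_values(ascending=False)
)
print(roi_by_month)
```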

IV. Visualize the data
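
A possible layout is two stacked bar charts sharing the month axis. The Series below are placeholders for the monthly aggregates, with assumed names and made-up numbers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

months = ["May", "Jun", "Jul", "Nov", "Dec"]
# Placeholder aggregates (illustrative values only).
gross_by_month = pd.Series([700.0, 650.0, 600.0, 550.0, 580.0], index=months)
roi_by_month = pd.Series([3.0, 2.8, 9.0, 2.5, 2.7], index=months)

fig, (ax_gross, ax_roi) = plt.subplots(2, 1, figsize=(10, 8), sharex=True)

ax_gross.bar(gross_by_month.index, gross_by_month.values, color="steelblue")
ax_gross.set_title("Total gross sales by release month")
ax_gross.set_ylabel("Worldwide gross")

ax_roi.bar(roi_by_month.index, roi_by_month.values, color="darkorange")
ax_roi.set_title("Average ROI by release month")
ax_roi.set_ylabel("ROI (multiple of production budget)")

fig.tight_layout()
```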

V. Interpretation

As expected, the summer months of May, June, and July have historically had the highest box office sales, along with the holiday months of November and December.

The summer months do well at the box office because studios release their ‘tent-pole’ movies, which feature big-budget special effects, franchise installments, and all-star casts to draw in the summer crowds. November and December also do well, as families gather more often during the holiday season and studios release their awards-season contenders. Further exploration could look at the amount of competition (number of other films released) during these peak seasons.

Interestingly, July has a far greater ROI on production costs than any other month. Whether this is due to outliers needs further investigation.
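
One quick way to start that investigation is to compare the mean and median ROI per month, since a single extreme film pulls the mean up but barely moves the median. This is a sketch on made-up numbers with assumed column names, not a result from the actual data:

```python
import pandas as pd

# Toy data: July contains one extreme ROI value (illustrative assumption).
df_roi_pop = pd.DataFrame({
    "release_month": [6, 6, 6, 7, 7, 7],
    "ROI": [2.0, 3.0, 4.0, 1.0, 2.0, 100.0],
})

# Mean, median, and film count per month.
summary = df_roi_pop.groupby("release_month")["ROI"].agg(["mean", "median", "count"])

# A large gap between mean and median hints that outliers drive the average.
summary["mean_minus_median"] = summary["mean"] - summary["median"]
print(summary)
```

If the gap for July shrinks once the top few films are set aside, the high average is likely driven by outliers.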


While there is no positive relationship between a movie’s production budget and how well it is received by fans and critics, I have identified several factors that have historically gone hand in hand with both large box office sales and high multiples of ROI on production costs.

First, movie studios like WB (NL), which is New Line Cinema, a label of Warner Bros. Pictures Group, and UTV Motion Pictures (UTV), a subsidiary of the Walt Disney Company India, show great ability in executing the various processes of filmmaking, production, post-production, and distribution.

Second, movies with lead roles played by Jennifer Lawrence, Kristen Wiig, or Mark Ruffalo show the greatest likelihood of achieving a high ROI on production costs and box office success.

Interestingly, movies with horror in the genre have a high ROI on production costs but do not necessarily perform best at the box office. Movies in the Action,Adventure,Sci-Fi and Adventure,Animation,Comedy genres do extremely well and draw the biggest box office numbers.

As for release schedule, movies released during May, June, and July have historically been big box office winners, as have those released in the holiday months of November and December.

On a final note, movies released during July show a much higher ROI on production costs compared to other months. Whether this is due to a few outliers or is statistically meaningful will have to be further investigated.

The above analysis presents the current successes in the film industry and the types of movies Microsoft can work on to jumpstart its new movie studio’s success.

Final Note

While the analysis above provides a great starting point for exploring next actionable steps, some caution is needed regarding the statistical significance of the results; in other words, whether the results obtained are due solely to chance.
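
As one illustration of how such a check might be set up, here is a sketch of a two-sided permutation test on the difference in mean ROI between two groups of films. The technique is my choice for the example rather than something used in the analysis, and the function name, group labels, and numbers are all hypothetical:

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    hits = 0
    for _ in range(n_permutations):
        # Shuffle the labels: any split of the pooled values is equally likely
        # if group membership has no effect on ROI.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up ROI values for July releases versus all other months.
july_roi = [1.0, 2.0, 100.0, 3.0]
other_roi = [2.0, 3.0, 4.0, 2.5, 3.5, 1.5]
p = permutation_p_value(july_roi, other_roi)
print(f"p-value: {p:.3f}")
```

A small p-value would suggest the gap is unlikely to come from chance alone; a large one would suggest it could easily arise from how a few films happen to fall across months.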

Furthermore, an ROI metric of return on production costs was utilized; however, this does not account for marketing and/or distribution costs. For some studios and certain films, these costs can be exorbitant and possibly outpace production costs, which would significantly decrease the true ROI on a film’s budget.
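
A worked example with hypothetical numbers shows how much the metric can move. It assumes ROI is computed as (gross - cost) / cost, and every figure below is made up for illustration:

```python
def roi(gross, total_cost):
    """ROI as a multiple of cost: (gross - cost) / cost (assumed definition)."""
    return (gross - total_cost) / total_cost

# Hypothetical film, figures in millions of dollars.
production_budget = 10.0
marketing_and_distribution = 15.0
worldwide_gross = 50.0

roi_production_only = roi(worldwide_gross, production_budget)
roi_all_costs = roi(worldwide_gross, production_budget + marketing_and_distribution)

print(f"ROI on production costs only: {roi_production_only:.1f}x")
print(f"ROI including marketing and distribution: {roi_all_costs:.1f}x")
```

With these numbers the ROI drops from 4.0x to 1.0x once marketing and distribution are counted, which is why the production-only figure should be read as an upper bound.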






Written by Eric

How do you put out a fire in your office wastebasket? First, set fire to more wastebaskets to get a larger sample size. Setting wastebasket fires since 2020.
