L1 Training Video Example and the Importance of Denominator Choice
The User Story for L1, slide 8.7.4, Challenge #1 (Catering Analysis) provides the SUM Demo Model and asks for a list of goals scored summed by each player's favorite breakfast, in order to plan the menu. Summing the goals as requested by the assignment reveals that the players who chose Cold Cereal collectively scored 4 goals, which was more than the goals collectively scored by players who ate any other single breakfast. This seems to imply that Cold Cereal is the best breakfast to eat before the big game.
However, this approach fails to ask an important question: how many players actually chose each breakfast?
For example, suppose there were 20 people on the team, and one of them ate Magic Beans for breakfast that gave him super-enhanced athletic abilities, and he single-footedly scored 15 goals. The rest ate rice for breakfast and scored only one goal each. If we analyzed it the way the problem asks, what would the data say?
19 players ate rice for breakfast, and they each scored 1 goal, so the players that ate rice collectively scored 19 goals. The players that ate Magic Beans only scored 15 goals. Therefore, rice is a better breakfast than magic beans. So, even though Magic Beans are clearly the best choice, (barring any regulations against magical doping) the data analysis places them second to rice. Aggregating the data as requested in the problem could give the Catering Manager the wrong impression of which breakfast was the most effective.
Instead, the table should have been set up something like this:
Line Item | Formula | Summary |
Breakfast Goals | 'DATA01 Scoring by Player'.Goals[SUM: 'SYS01 Roster Details'.Favorite Breakfast] | Sum |
Breakfast Popularity | 'DATA01 Scoring by Player'.count[SUM: 'SYS01 Roster Details'.Favorite Breakfast] | Sum |
Breakfast Average | Breakfast Goals / Breakfast Popularity | Formula |
Note the ".count" in the formula for "Breakfast Popularity". This is not a built-in function (though I think it probably should be.) Instead, I manually added a line item "count" to the DATA01 Scoring by Player module with the simple formula "1", and Sum as the summary function. This allowed the players to be tallied and used as a denominator for aggregating the effectiveness of each breakfast.
This method, with the given data, produces the following table:
Line Item | Eggs | Meat | Fruit | Rice | Hot Cereal | Cold Cereal | All Foods |
Breakfast Goals | 3 | 3 | 1 | 1 | 2 | 4 | 14 |
Breakfast Popularity | 5 | 4 | 6 | 4 | 6 | 7 | 32 |
Breakfast Average | 0.6000 | 0.7500 | 0.1667 | 0.2500 | 0.3333 | 0.5714 | 0.4375 |
By analyzing the popularity of each breakfast we can see that 7 players chose Cold Cereal, and only 4 chose meat. In other words, the cold-cereal-eaters scored more goals simply because there were more of them. Dividing the total goals scored per breakfast by the number of players who chose that breakfast reveals that Meat was actually the better choice.
This is an important concept to keep in mind. Always make sure you're choosing an appropriate denominator, or you could miss out on Magic Beans!