Aggregated Data Insights – Airline Delay Data
Airline | Status | Los Angeles | Phoenix | San Diego | San Francisco | Seattle
ALASKA | On Time | 497 | 221 | 212 | 503 | 1841
ALASKA | Delayed | 62 | 12 | 20 | 102 | 305
AM WEST | On Time | 694 | 4840 | 383 | 320 | 201
AM WEST | Delayed | 117 | 412 | 65 | 129 | 61
Overall Percentage Delay Rate (ALASKA) = 501 delayed / 3,775 flights = 13.3%
Overall Percentage Delay Rate (AM WEST) = 784 delayed / 7,222 flights = 10.9%
The airline with the lower delay rate: At every individual destination in the data above, Alaska Airlines has the lower delay rate. Overall, however, America West Airlines has the lower percentage delay rate. The data record, for each airline and destination, the number of flights that arrived on time and the number that were delayed.
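The rates quoted above can be checked with a short script. This is a minimal sketch, assuming the counts are copied exactly from the table; the names `FLIGHTS`, `delay_rate`, `city_rates`, and `overall_rate` are illustrative and not part of the original post.

```python
# Counts copied from the table above: (on_time, delayed) per destination.
FLIGHTS = {
    "ALASKA": {
        "Los Angeles": (497, 62),
        "Phoenix": (221, 12),
        "San Diego": (212, 20),
        "San Francisco": (503, 102),
        "Seattle": (1841, 305),
    },
    "AM WEST": {
        "Los Angeles": (694, 117),
        "Phoenix": (4840, 412),
        "San Diego": (383, 65),
        "San Francisco": (320, 129),
        "Seattle": (201, 61),
    },
}


def delay_rate(on_time, delayed):
    """Fraction of flights that were delayed."""
    return delayed / (on_time + delayed)


def city_rates(airline):
    """Delay rate for each destination served by one airline."""
    return {city: delay_rate(o, d) for city, (o, d) in FLIGHTS[airline].items()}


def overall_rate(airline):
    """Delay rate after pooling all destinations for one airline."""
    on_time = sum(o for o, _ in FLIGHTS[airline].values())
    delayed = sum(d for _, d in FLIGHTS[airline].values())
    return delay_rate(on_time, delayed)


# Print the per-destination comparison, then the pooled comparison.
alaska, am_west = city_rates("ALASKA"), city_rates("AM WEST")
for city in alaska:
    print(f"{city:14s} ALASKA {alaska[city]:6.1%}   AM WEST {am_west[city]:6.1%}")
print(f"{'Overall':14s} ALASKA {overall_rate('ALASKA'):6.1%}   "
      f"AM WEST {overall_rate('AM WEST'):6.1%}")
```

The per-destination rows and the pooled row use the same `delay_rate` function, so the only thing that changes between them is whether the counts are summed before dividing.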
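Summary data can sometimes be interpreted in more than one way, because a comparison can reverse direction depending on whether the data are grouped or combined. This situation is known as Simpson’s Paradox (Fung, 2013). In the current context, the paradox appears in the percentage delay rates: Alaska has the lower delay rate at each destination, yet America West has the lower rate once the destinations are combined. The reversal comes from the mix of flights. Most of America West’s flights (5,252 of 7,222) go to Phoenix, where delays are least common for both airlines, while most of Alaska’s flights (2,146 of 3,775) go to Seattle, where its delay rate is above its own average. The combined result therefore may not accurately represent the underlying trend.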
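One way to see how combining the data changes the picture is to treat each airline's overall delay rate as a weighted average of its per-destination rates, with each destination weighted by its share of that airline's flights. The sketch below is illustrative only: the names `COUNTS`, `shares_and_rates`, and `weighted_overall` are assumptions, and the counts are taken from the table above.

```python
# (on_time, delayed) per destination, copied from the table above.
COUNTS = {
    "ALASKA": {"Los Angeles": (497, 62), "Phoenix": (221, 12),
               "San Diego": (212, 20), "San Francisco": (503, 102),
               "Seattle": (1841, 305)},
    "AM WEST": {"Los Angeles": (694, 117), "Phoenix": (4840, 412),
                "San Diego": (383, 65), "San Francisco": (320, 129),
                "Seattle": (201, 61)},
}


def shares_and_rates(airline):
    """Map each city to (share of the airline's flights, delay rate there)."""
    total = sum(o + d for o, d in COUNTS[airline].values())
    return {city: ((o + d) / total, d / (o + d))
            for city, (o, d) in COUNTS[airline].items()}


def weighted_overall(airline):
    """Overall delay rate as the share-weighted average of city rates."""
    return sum(share * rate for share, rate in shares_and_rates(airline).values())


for airline in COUNTS:
    print(airline)
    for city, (share, rate) in shares_and_rates(airline).items():
        print(f"  {city:14s} share of flights {share:6.1%}   delay rate {rate:6.1%}")
    print(f"  {'Overall':14s} weighted average {weighted_overall(airline):6.1%}")
```

Because the weights differ between the two airlines, the weighted averages can rank the airlines differently from the destination-by-destination comparison.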
As a data analyst or data scientist, the responsibility when dealing with such data is to study the situation and decide whether to analyze the data separately or combined. This requires knowing the objective of the analysis in order to choose the most appropriate approach. In the above context, the analyst can choose to analyze the data by individual destination.
Reference
Fung, K. (2013). Numbersense: How to Use Big Data to Your Advantage. New York: McGraw-Hill Education. ISBN: 978-0-07-179967-6
Question
We often aggregate data in SQL, Tableau, and other environments. Aggregated data can provide insights, but these aggregated insights must be studied with care. Take a few minutes and study the data shown below. You might start by entering the data into a spreadsheet. Then answer the question, “Which airline had a lower delay rate?”
airline delay data
Also, are you familiar (through direct experience or having heard or read a story) with a case where the same summary data might reasonably be interpreted in two different ways? Where do your responsibilities as a data analyst or data scientist begin and end when analyzing and interpreting data like this?
Please try to work this out before looking things up. You’ll receive full credit for any reasonable post! You can read more about this data and Simpson’s Paradox in the excellent book Numbersense: How to Use Big Data to Your Advantage by Kaiser Fung. The material relevant to this discussion is available online at amazon.com in the book’s free “Look inside” preview (pp. 1-3 and 12-14). Kaiser Fung also covered this here: http://nymetro.chapter.informs.org/prac_cor_pubs/09-2013%20Kaiser%20Fung.pdf. And it’s been covered elsewhere, including here: http://www.philender.com/courses/intro/simpson.html