Videotaped Lecture: L05: Measuring Spread and Variation
يَا أَيُّهَا النَّاسُ إِنَّا خَلَقْنَاكُم مِّن ذَكَرٍ وَأُنثَى وَجَعَلْنَاكُمْ شُعُوبًا وَقَبَائِلَ لِتَعَارَفُوا إِنَّ أَكْرَمَكُمْ عِندَ اللَّهِ أَتْقَاكُمْ إِنَّ اللَّهَ عَلِيمٌ خَبِيرٌ
49:13 O men! Behold, We have created you all out of a male and a female, and have made you into nations and tribes, so that you might come to know one another. Verily, the noblest of you in the sight of God is the one who is most deeply conscious of Him. Behold, God is all-knowing, all-aware.
A huge amount of diversity exists among human beings. The purpose of this diversity is to allow us to know each other – if people were identical, we would have a hard time telling them apart. However, this diversity is not meant to be a source of pride; Yusuf Ali adds a parenthetical phrase explaining this: "ye may know each other (not that ye may despise each other)". This counters the widely held idea of nationalism, in which people are encouraged to take pride in their own nations and to hate other nations. This idea has been responsible for a huge amount of bloodshed in two world wars.
We also find diversity in data sets. For some purposes, it is useful to measure this diversity: how different are the data points from each other? There is no unique, clear, and unambiguous answer to this question. Different types of measures exist, and which one is useful depends on the purpose and on the kind of argument we want to make. This chapter introduces the basic concepts of how to measure diversity, and when doing so can be useful.
CHAPTER 5
MEASURING DIVERSITY & VARIATION
Diversity is difference – the first question is: difference from what? Before we measure diversity, we must decide what is “normal”. This is usually taken to be the measure of central location. As we discussed in the previous chapter, there are many possible choices for the center. The traditional choice is the mean (average) – but, as we have discussed earlier, this is not a good choice for many data sets.
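To see why the choice of center matters before any measure of diversity is computed, consider the following minimal sketch in Python. The numbers are made up for illustration and do not come from the lecture; the variable names are likewise hypothetical. It computes each data point's difference from the mean and from the median for a small data set containing one extreme value.

```python
from statistics import mean, median

# Hypothetical data (not from the lecture): five incomes, one of them extreme.
incomes = [20, 22, 25, 27, 400]

# Two candidate definitions of what is "normal".
center_mean = mean(incomes)      # 98.8, pulled upward by the single value 400
center_median = median(incomes)  # 25, the middle value after sorting

# "Difference from what?": deviations of each point from each center.
dev_from_mean = [x - center_mean for x in incomes]
dev_from_median = [x - center_median for x in incomes]

# Count how many points lie within 10 units of each center.
near_mean = sum(abs(d) <= 10 for d in dev_from_mean)      # 0 points
near_median = sum(abs(d) <= 10 for d in dev_from_median)  # 4 points

print("deviations from mean:  ", dev_from_mean)
print("deviations from median:", dev_from_median)
print("points near mean:", near_mean, "| points near median:", near_median)
```

With the mean as the center, every point in this example appears far from "normal", even though four of the five values are close to one another. With the median as the center, only the extreme value stands out. Any measure of diversity built on these deviations inherits this difference.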