2. A Few Keywords and Definitions for Understanding Statistics – II (contd.)
by Dr. Prafulla Dikshit (6-8 minute read)
2.6 Types of Data
Now that we know something about data and statistics, let’s talk a little about the
types of data. There are many different categorizations of data, which we will
eventually study. However, at the most basic level, we can see data in terms of
the characteristics we are trying to measure through the data. Thus, a broad
categorization of the attributes of the population or the sample would give us
different types of data. In statistics, there are specific terms for these
characteristics. The characteristics of a population are called parameters,
whereas the corresponding characteristics of a sample drawn from that
population are called statistics. Both may measure the
same basic attributes or properties, but their values will generally differ depending on the source
(population or sample) from which the data are drawn. Parameters
are numbers that summarize data from an entire population, whereas statistics
are numbers summarizing the data from a sample. Thus, data may be referred to
as (a) population data, summarized by parameters (the population-parameters
pairing), or (b) sample data, summarized by statistics (the sample-statistics
pairing). You may remember these as p-p and s-s, respectively, as
a memory aid. So, if it is parameters you are measuring through your data, it
is population data and not sample data that we are referring to, and vice versa. We
will discuss this in detail in future posts, as it is a basis for many topics
such as parametric versus non-parametric testing in statistics; however, for
now, it suffices to know that this distinction is critical to understanding the
more advanced statistical concepts.
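To make the p-p and s-s idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the heights are made-up numbers, the names `population_heights` and `sample_heights` are hypothetical, and a class of ten students is treated as the whole population.

```python
import random
from statistics import mean

# Hypothetical data: heights (in cm) of every student in a small class.
# For this sketch, the whole class is the population.
population_heights = [150, 152, 155, 158, 160, 162, 165, 168, 170, 175]

# Parameter: a number that summarizes the entire population.
population_mean = mean(population_heights)

# Draw a sample of 4 students; the fixed seed only makes the run repeatable.
rng = random.Random(0)
sample_heights = rng.sample(population_heights, 4)

# Statistic: the same kind of summary, computed from the sample alone.
sample_mean = mean(sample_heights)

print("Population mean (parameter):", population_mean)
print("Sample mean (statistic):", sample_mean)
```

Both numbers measure the same attribute (average height), but the sample mean will usually not match the population mean exactly, because it is computed from only part of the data.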
We now come to what I refer to as a more functional categorization of data. At the broadest level, such a categorization points to the stuff that we collect as part of the research. The stuff or information we collect may either have to do with numbers, or it may be plain descriptive facts that have nothing to do with numbers. We have some fancy terms here as well: the stuff that has to do with numbers is referred to as quantitative data, and the other type as qualitative data. Qualitative data may also be called categorical data. Examples of qualitative data include the race, religion, gender, or cultural backgrounds of students in a class, the color tones of flowers in a garden, or the responses of participants in an in-depth interview. Examples of quantitative data include the heights, weights, or ages of students in the class, the percentage of, say, white flowers in a garden, or the scores of participants in a poll survey.
So far, it appears that non-numeric data is qualitative, whereas numeric data is quantitative. However, what about something like a zip code? Can we call it quantitative, or should we categorize it as qualitative? A zip code or postal code for a location, such as 36925, seems like a numeric piece of data. After all, all it has is numbers! But what if I take another zip code, like 99051, and add it to the previous one? Does the sum make sense? Or, if we multiply the two, do they yield anything meaningful? No. More examples: phone numbers, class roll numbers, social security numbers, etc. So here's the thing: qualitative data is data on which mathematical operations are meaningless. It does not matter whether it contains numbers or not.
On the other hand, let's consider the other type of data, the one we call quantitative, such as the ages of students in a class. If we subtract the age of one student from that of another, does the result mean something, or tell us something about the two relative to each other? Yes, of course: it gives the age difference. Or, if we add their ages and divide the sum by two, it gives us their average age. What about the heights of students in the class? They can be added and divided by the number of students to get an idea of the general height in the class. So, as we see, when data is quantitative, it implies that (a) it is numeric, and (b) some form of mathematical operation on the data produces meaningful results. More examples: weight, wages, speed, area, time, temperature, etc. So, the bottom line is that the real difference between qualitative and quantitative data is that math operations are meaningful in the latter, but not in the former.
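The zip code versus age contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `students` records are made up, and storing zip codes as strings is a choice made here to signal that they are labels rather than quantities.

```python
from collections import Counter
from statistics import mean

# Hypothetical class records: (zip code, age in years).
# Zip codes are kept as strings because they are labels, not quantities.
students = [("36925", 19), ("99051", 21), ("36925", 20), ("36925", 22)]

zip_codes = [zip_code for zip_code, _ in students]
ages = [age for _, age in students]

# Quantitative data: arithmetic on ages gives meaningful results.
average_age = mean(ages)      # average age of the four students
age_gap = ages[1] - ages[0]   # age difference between the first two students

# Qualitative data: adding zip codes means nothing, but counting how often
# each category occurs is still a sensible summary.
zip_counts = Counter(zip_codes)
most_common_zip, occurrences = zip_counts.most_common(1)[0]

print("Average age:", average_age)
print("Age gap:", age_gap)
print("Most common zip code:", most_common_zip, "seen", occurrences, "times")
```

The ages support sums, differences, and averages, whereas the only thing done with the zip codes is counting how many students fall into each category.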
When we think of quantitative or numeric data, we come across two distinct types of such data: continuous data and discrete data.
Discrete data are those that can be counted and are mostly represented by whole numbers. Only certain separate values can be picked from a range of values. For example, there can be only a definite number of students in a class, say 30, 35, or 40, or fewer or more, but not 20 and a half or 40 and three-fourths; there is no such thing as half a student in a class. Or, if we have to choose students with roll numbers between, say, 25 and 40, we will be able to choose only a certain finite number of students within that range. Or, for instance, a die has only six sides. Thus, upon throwing the die, we can get only one of six whole numbers, from 1 through 6, on top.
Continuous data, on the other hand, are those that have an infinite number of possible values within a given range and are not countable. For example, consider the speed of a car or an airplane. While both have a minimum speed of zero and their respective top speeds, the range of all values between the minimum and top speeds is infinite and not countable. We may, for our convenience, set the values at broad levels such as 50, 60, and so on up to, say, 200 miles per hour (mph) for a car. However, there can be an uncountable number of values between any two identified values, such as between 50 and 60 mph. For example, the car could be instantaneously traveling at 50.112, 50.113, or 50.869 mph, or at any other possible value within the range. We may continue in this fashion, making the difference between values smaller and smaller without it ever having to reach zero. A surprisingly simple example is the markings on a physical one-foot ruler! If asked how many inches there are on a one-foot ruler, one may answer 12. However, each inch can be further divided and expressed as ½, ¼, 1/8, 1/16, 1/32, or 1/64 of an inch, and in principle we may go on dividing the length into smaller and smaller parts, indefinitely. Such data is continuous since there is no discreteness or disjointedness here.
To put it more simply, continuous data is mostly ‘measurements’, whereas discrete data is mostly ‘counts.’
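The die and ruler examples above can be sketched in Python as follows. This is a rough illustration rather than a formal definition: the names `die_faces` and `ruler_marks` are hypothetical, and the loop stops after six halvings only because a real ruler stops somewhere, not because the subdivision has to.

```python
from fractions import Fraction

# Discrete: a six-sided die has a fixed, countable set of outcomes.
die_faces = list(range(1, 7))

# Continuous: a length on a ruler can always be divided further.
# Start from one inch and keep halving the gap between markings.
ruler_marks = []
gap = Fraction(1)
for _ in range(6):
    gap = gap / 2
    ruler_marks.append(gap)

print("Possible die outcomes:", die_faces)
print("Ruler subdivisions (inches):", [str(mark) for mark in ruler_marks])
```

The list of die faces is complete at six entries, whereas the halving loop could be run for as many steps as we like and would keep producing new, smaller fractions of an inch.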
***
That's about it for this post. In the next post, we will talk about different levels of data measurement in statistics. Questions and comments are welcome in the comments section below. You may also suggest ways to improve this blog. Signing off now. Have a happy weekend ahead!


Effortlessly simple explanation 👌
Thanks. I would also appreciate questions, if any.
What kind of data is better for average, continuous or discrete?
While we can calculate an average from both continuous and discrete data, this is a slightly unfair comparison, since a given entity or property is measured through either continuous or discrete data, and that is not for us to choose; it depends on whether the underlying property or variable being measured is continuous or discrete. However, in general, continuous data is more suitable for the calculation of an average than discrete data. For example, the average of two time periods, say 1 minute and 2 minutes, is 1.5 minutes, which is meaningful, since time is a continuous variable and we can actually comprehend a value of 1.5 minutes. However, if a hen lays 1 egg in the morning and 2 in the evening, the average number of eggs per laying comes to 1.5 eggs. Well, a hen can't lay 1.5 eggs, so this average, though admissible for estimation purposes, is not a good representation of reality, since the number of eggs is discrete. Hope this answers your question.
I really enjoy your blogs. I don’t know what I’d do without your help! Thank you so much.