Librarian by trade, geek by choice, artist by nature.
The word “data” is pretty meaningless to me right now. I’m in the data analysis phase of my dissertation, plus Alex and I are re-watching Star Trek: TNG Season 5 (Worf’s family name has honor again! Tasha Yar’s half-Romulan daughter is nefarious! Huzzah!), so I hear/say/breathe “data” a lot. Data-data-data. Was it ever a real word?
Among the many, many “Things I’ve Learned That It Would Have Been Useful To Know Before Starting My Dissertation” is that Likert-scaled data (questions coded with ranked response options, for instance: very unsatisfied, unsatisfied, satisfied, very satisfied) can/should be analyzed by nonparametric methods. Erm. So let me just say that the basic statistical analyses my courses tended to focus on (particularly t-tests, ANOVAs, and ANCOVAs) are parametric. Nonparametric stats were mentioned, most particularly chi-square, but not in nearly as much detail. It was usually a long step-by-step analysis of how to perform an ANOVA, then, “you can also do this by nonparametric stats, but most of the time you won’t need to, so moving along to Chapter 13…” Egads. So I’m flipping through books, skimming articles, haunting “step-by-step statistical nonparametric SPSS analysis” web search results, and I’ve made an appointment with the UNT Stats help department to see if they can help walk me through this Land of Nonparametric Crazy. (PhD Comics doesn’t give the real explanation for an ANOVA, but it’s pretty revealing nonetheless…)
So whyfore this Land of Nonparametric Crazy? Let me explain. Basically, Likert-scaled data is coded into sequential numbers (in the example above, very unsatisfied = 1, unsatisfied = 2, satisfied = 3, very satisfied = 4). Since the responses are recorded numerically, you can do all sorts of statistical numeric math-y mumbo-jumbo on them. BUT: essentially the numbers are just CODES for certain attitudes/feelings/etc. Sure, they look sequential (like points on a “continuous” scale, if you want to be all math-y about it), and in most cases they are ranked. For instance, the example above has a ranked order from negative attitude to positive attitude that corresponds to the numbers. But it isn’t a scale with an absolute equal distance between the intervals (between each response). Now, if you’re recording temperatures, or people’s heights in inches, that’s truly continuous interval data. Those are scales with defined, unchanging points. But who can say where the cutoff is between “satisfied” and “very satisfied”?
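For the code-minded, here is a minimal Python sketch of that coding step. The coding scheme comes from the example above; the list of responses is invented purely for illustration, and the variable names are my own. It summarizes the codes with the median and mode, which only rely on the ranking, rather than the mean, which would quietly assume equal spacing between the response options.

```python
from statistics import median, mode

# Coding scheme from the example above (1 = most negative, 4 = most positive).
LIKERT_CODES = {
    "very unsatisfied": 1,
    "unsatisfied": 2,
    "satisfied": 3,
    "very satisfied": 4,
}

# Hypothetical survey responses, made up for illustration only.
responses = [
    "satisfied",
    "very satisfied",
    "unsatisfied",
    "satisfied",
    "very unsatisfied",
    "satisfied",
    "very satisfied",
]

# Turn each text response into its numeric code.
codes = [LIKERT_CODES[r] for r in responses]

# Median and mode only depend on the order/frequency of the codes,
# not on the distance between them.
median_code = median(codes)
mode_code = mode(codes)

print(codes, median_code, mode_code)
```

The numbers are still just labels for attitudes: swapping the codes for 10, 20, 30, 40 would leave the rankings (and so the median response) pointing at the same answer option.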
Because of that strange property of Likert-scaled data, we refer to it as either “ordinal” (meaning it’s ranked, but there aren’t equal measurable distances between the response options) or as “nominal” (meaning that the number is really just a code, indicating a category of response rather than a numeric value). That’s a super-non-technical explanation, but I’m trying to make this as non-jargon-y as possible.
Aaaand voila, you can’t (well, you theoretically shouldn’t) analyze ordinal/nominal data by parametric methods. Parametric stats are really powerful, but the catch is that they rely on a bunch of assumptions, things like your data being normally distributed (a whole other ball of wax), and that your data is interval/continuous. When you violate those assumptions, you turn to nonparametric statistical methods.
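To give a feel for what a nonparametric method actually does, here is a rough Python sketch of the Mann-Whitney U statistic, one common rank-based alternative to an independent-samples t-test. This is not the analysis from my dissertation, just an illustration: the function names and the two groups of coded responses are made up, and it only computes the U statistic, not a p-value. For real work you would want a proper stats package (SPSS, or something like `scipy.stats.mannwhitneyu` in Python).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i..j are 0-based, ranks are 1-based.
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) computed from the pooled ranks of both groups."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical coded Likert responses (1-4) from two made-up groups.
group_a = [1, 2, 2, 3]
group_b = [3, 4, 4, 4]

print(mann_whitney_u(group_a, group_b))
```

Notice that the test never uses the codes as quantities. It only asks how the two groups' responses rank against each other, which is exactly the information ordinal data can be trusted to carry.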
Lt. Cmdr. Data himself would be able to explain this much more quickly and correctly, but it would go waaay over my head. Then again, if he was sitting in my living room, I’d just plug my USB drive of data into his ear and ask him to crunch all the numbers for me. He’d ask me why I persist in using odd English slang like “dude” and “y’all” and I’d explain it’s a social convention that connotes my playful, laid-back demeanor. He’d nod knowingly and try to incorporate “dude” into his speech during the next meeting of the senior staff, Picard would get huffy, and by the end of the episode we’d all have a good laugh at poor Data’s expense.
Wait… wasn’t I talking about statistics?