WHY HYPOTHESIS?

INTRODUCTION

Have you ever collected some data, looked at it, drawn up questions you intended to answer, and only then found out that your data can’t properly answer those questions? Well, I have. I downloaded some data on Amazon book sales from 2009 to 2019 to get my hands “dirty”, as some would say. I drew up some interesting questions and was excited to begin. Then I opened the Excel sheet and looked through the data, only to find that the data available to me could really answer…what??...two or so of the 8-10 questions I had drawn up for myself. It was fun. When you intend to perform an experiment or do some research, it is important that you properly state (or frame) the problem and define exactly which questions you seek to answer and what success would look like within the context of that experiment. This will then guide you on what data you actually need to source.

Now that we’ve gotten that amateur mistake of mine out of the way, we can talk about how this is done. On that note, I introduce to you a very popular and interesting topic in statistics and research called hypothesis testing. A hypothesis is basically a guess or assumption, albeit an educated one. It is this that will guide the formulation of your problem statement and the techniques you’ll use to test or treat the subjects of the experiment, and it is what your study ultimately sets out to answer. A hypothesis test is applied to help us check whether an observation could plausibly have arisen by mere chance, or whether our treatment is a more plausible explanation for it.
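To make “mere chance” a little more concrete, here is a minimal sketch using a coin rather than a real experiment (the coin, the numbers and the function name `prob_at_least` are illustrative assumptions, not part of the original study). Suppose we assume the coin is fair and then observe 60 heads in 100 flips. The question a hypothesis test asks is: how often would chance alone give a result at least that extreme?

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    This is how likely a result at least this extreme is if only chance is
    at work (here: the coin really is fair, so p = 0.5).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical observation: 60 heads in 100 flips of a coin we assume is fair.
p_value = prob_at_least(60, 100)
print(f"P(at least 60 heads | fair coin) = {p_value:.4f}")
```

The probability comes out at under 3%, so “the coin is fair and we just got lucky” starts to look like a less plausible explanation than “something other than chance is going on”.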


In the previous paragraph, I mentioned subjects. Subjects are the people on whom the tests are carried out (or who take the test). Ok, but why is this piece of information important? Well, in research or experiments, tests are carried out on subjects, but there’s just one little problem. At times the number of subjects in question can be very large, making it impractical (expensive and unrealistic) to perform the treatment (test) on all of them. This “large” group of people is often referred to as a population (no, not the entire world population, but the group that is the focus of your study). A not-so-foolproof example: if you wanted to test the effect of social media ads on the buying habits of adolescents in a country or region, administering the test to every single one of them would be expensive and time-consuming. However, you could give this treatment (or test) to a subset of that population and generalize the results to the whole population (this is basically the whole idea behind inferential statistics). This subset is known as a sample, and it must be representative of the population for the generalization to be acceptable. A representative sample is one that reflects the characteristics of the entire population; drawing it at random, so that every member of the population has an equal chance of being chosen, is the usual way to get one.
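The sample-to-population idea can be sketched in a few lines. Everything here is hypothetical: a made-up “population” of 100,000 adolescents’ weekly spending, generated from a normal distribution purely for illustration. We draw a simple random sample of 1,000 and compare its average to the average of the whole population, which in a real study we usually couldn’t compute.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch gives the same result every run

# Hypothetical population: weekly spending of 100,000 adolescents
# (mean 50, standard deviation 10 are invented numbers).
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Simple random sample: every member of the population has the same chance of being picked.
sample = rng.sample(population, k=1_000)

mu = statistics.fmean(population)  # population mean (usually unknown in practice)
x_bar = statistics.fmean(sample)   # sample mean, our estimate of mu
print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
```

With a random sample of this size, the two averages land close together, which is what lets us generalize from the sample to the population.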

NULL AND ALTERNATE (RESEARCH) HYPOTHESES

So far, we have a pretty good idea of the following concepts: hypothesis, hypothesis testing, and subjects (population and samples). With this, we can safely (hopefully) move on to two more very important concepts in hypothesis testing: the null and the alternate (or research) hypothesis…these are some fancy names, right? Well, stay with me. These two concepts are extremely important to any research study.

Null hypothesis rejection meme via memegenerator.net

The null hypothesis, H0, whose whole purpose, by the way, is to be rejected (kinda), is formally defined as a statement of equality or of no relationship. In the context of our previous example, the null hypothesis would read, “social media ads have no influence on the buying habits of adolescents”. The null hypothesis is important because it serves as the baseline and benchmark from which we begin our study and against which the final outcome is “checked”. Now you’re probably tempted to ask why (maybe not, but still…). The null hypothesis ensures that the researcher doesn’t start his/her study or experiment leaning toward either outcome (in essence, unbiased). This means that before any test is done in anger, the null hypothesis is assumed to be true until the evidence says otherwise. Using our example (once again, not so foolproof), we have no real reason to believe that social media ads have a significant effect on adolescents’ buying habits, especially as we’ve not done proper research and have no facts to back up such an assertion. This is why the null hypothesis is regarded as the starting point: the most appealing argument in the absence of statistically significant and meaningful evidence (more on this some other time). For all we know, any observed relationship between the variables, in this example or in any other scenario, could be purely coincidental as long as it hasn’t been tested (insert good old correlation ≠ causation).
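One way to see the null hypothesis at work is a permutation test. This is my choice of technique for illustration, not something prescribed in this article. If H0 is true (ads have no influence), it shouldn’t matter which adolescents we label “saw ads” and which “no ads”. Shuffling the labels many times therefore shows how big a difference chance alone tends to produce. The data, the function name `permutation_test` and the 0.05 significance level are all hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under H0 the group labels are interchangeable, so we shuffle them and count
    how often a random split gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign labels at random, as if H0 were true
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed split itself
    return (extreme + 1) / (n_shuffles + 1)


# Hypothetical data: items bought per month by adolescents who saw the ads vs. those who did not.
saw_ads = [12, 15, 14, 18, 16, 17, 13, 19]
no_ads = [10, 11, 9, 12, 10, 13, 11, 8]

p = permutation_test(saw_ads, no_ads)
alpha = 0.05  # assumed significance level
print(f"p = {p:.4f}:", "reject H0" if p < alpha else "fail to reject H0")
```

A small p-value means a difference this large rarely shows up when the labels are shuffled, so we reject H0. A large one means we fail to reject it, which is not the same as proving H0 true.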

So, that was a lot, but we’re not done just yet. If the null hypothesis exists to be rejected (or not rejected; again…kinda), then surely there has to be a basis for rejecting it, right? Enter the alternate or research hypothesis, Ha. By its formal definition, it is the polar opposite of the null hypothesis: a statement of inequality or, more explicitly, a statement that a relationship exists between the variables in question. So, in essence, the burden of proof rests on the alternate hypothesis. Now take a look at our example one more time: “the effect of social media ads on the buying habits of adolescents in a country or region”. Let’s actually frame it as a research hypothesis:

“social media ads have an effect on the buying habits of adolescents”.

Once again, this example is just to drive home my point; it’s not a perfect one.

Now, take a look at that research hypothesis for a minute. What do you notice? (It’s ok to not get it right, but try.) What you’ll notice is that the alternate hypothesis does not state what kind of effect social media ads have on adolescents (do they make them more likely to buy stuff, or less likely?). This is referred to as a non-directional research hypothesis and, yes, as you’ve guessed, there is also a directional research hypothesis, which points explicitly to the direction (nature) of the relationship.
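Written in symbols, the difference between the two looks like this. Here μ_ads and μ_no ads are hypothetical labels for the population average of some measure of buying (say, purchases per month) among adolescents who are and are not exposed to the ads. A non-directional Ha leads to a two-tailed test, while a directional one leads to a one-tailed test.

```latex
\begin{aligned}
H_0 &: \mu_{\text{ads}} = \mu_{\text{no ads}}    && \text{(no effect)}\\
H_a &: \mu_{\text{ads}} \neq \mu_{\text{no ads}} && \text{(non-directional)}\\
H_a &: \mu_{\text{ads}} > \mu_{\text{no ads}}    && \text{(directional: ads increase buying)}\\
H_a &: \mu_{\text{ads}} < \mu_{\text{no ads}}    && \text{(directional: ads decrease buying)}
\end{aligned}
```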

HOW DO I KNOW MY RESEARCH HYPOTHESIS IS “FOOL-PROOF”?

We previously touched on subjects (population and samples), and this leads us to the next section of this article, where we’ll discuss the differences between the null and alternate hypotheses as well as what makes a good alternate hypothesis. So, back to subjects. The first thing to note is actually something the two have in common. Both the null and the alternate hypothesis are statements about the entire population. The test itself, however, is carried out on a sample of that population, and the results are generalized to the population. Where they differ is in what they claim and in how they are expressed (mathematically): the null hypothesis is written with an equal sign, while the alternate hypothesis is written with a not-equal-to, greater-than, or less-than sign. For example, suppose we want to test whether the average height of first-year college kids is 175 cm.

The null hypothesis would be H0 : μ = 175

The alternate hypothesis would be Ha : μ ≠ 175 OR Ha : μ > 175 OR Ha : μ < 175

That’s pretty simple, isn’t it? Note the symbols: the population mean, as in the example above, is written μ (“mu”), while the mean of a sample is written x̄ (“x-bar”). The hypotheses are always stated in terms of μ; x̄ is what we compute from our sample to test them.
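The height example can be turned into a small calculation. This sketch uses a one-sample z-test, which assumes the population standard deviation σ is known. The value σ = 7 cm, the ten heights and the function name `one_sample_z_test` are made up for illustration. When σ is unknown, as is usual in practice, a one-sample t-test (for example SciPy’s `scipy.stats.ttest_1samp`) is the more common choice. The `alternative` argument maps to the three forms of Ha shown above.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma, alternative="two-sided"):
    """z-test of H0: mu = mu0, assuming the population standard deviation sigma is known."""
    x_bar = fmean(sample)                            # sample mean ("x-bar")
    z = (x_bar - mu0) / (sigma / sqrt(len(sample)))  # distance from mu0 in standard errors
    std_normal = NormalDist()                        # standard normal: mean 0, sd 1
    if alternative == "two-sided":    # Ha: mu != mu0 (non-directional)
        p = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":    # Ha: mu > mu0 (directional)
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: mu < mu0 (directional)
        p = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p


# Hypothetical heights (cm) of ten first-year students; sigma = 7 cm is an assumed value.
heights = [178, 172, 181, 176, 169, 183, 177, 174, 180, 179]
for alt in ("two-sided", "greater", "less"):
    z, p = one_sample_z_test(heights, mu0=175, sigma=7, alternative=alt)
    print(f"{alt:>9}: z = {z:.2f}, p = {p:.3f}")
```

Same data, same H0, but three different p-values: which one you report depends on which Ha you committed to before looking at the data.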

Now to the most interesting part: here we’re looking to spot the imposter amongst us. So just how do we know that a research hypothesis is a good one? Firstly, a good hypothesis is stated in a brief, clear and declarative form, for example, “consistent exposure of the eyes to light rays from a computer device leads to headaches”. A badly formed hypothesis looks like a question, i.e. a statement of uncertainty (which a hypothesis shouldn’t be), like this: “does consistent exposure to light rays from a computer lead to headaches?”

Another vital quality of a good hypothesis, and probably the most important one, is that it is testable. Take our example about the first-year college kids:

“If we want to test whether the average height of first-year college kids is 175 cm”

This is testable: height can be measured, and first-year college kids (well, a sample of them) can be found, so it’s obviously testable. Certain statements, however, aren’t. I remember seeing an incredibly hilarious tweet, which I’ll attempt to paraphrase as an example: “95% of kids were turned ‘gay’ through boarding schools”. Not only does that number come from the Bureau of Imaginary Statistics (yes, it’s a thing), it is obviously untestable: where do we even begin to find all those students, past and present, and how would we begin to ask them, let alone test, such a question?

One last requirement for the road. For a hypothesis to be considered a good one, its results or findings must be reproducible. Reproducibility refers to an attempt to replicate the original observation using the same methods as a previous investigation but collecting unique observations (Open Science Collaboration, 2014, pp. 300-301). In essence, an independent researcher must be able to carry out the study using your prescribed methods and achieve the same or similar results.
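As a toy illustration of that idea (not the Open Science Collaboration’s procedure), the sketch below repeats the same method, a two-sided z-test of H0: μ = 175 cm, on fresh, independently drawn samples and reports the share of “studies” that reach the same conclusion. The true population mean of 180 cm, σ = 7 cm, the sample size of 50 and the function names are all invented assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def two_sided_z_p(sample, mu0, sigma):
    """Two-sided p-value for H0: mu = mu0 with known sigma (z-test)."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def replicate_study(n_studies=20, n=50, true_mu=180, sigma=7, mu0=175, alpha=0.05, seed=1):
    """Run the same test on independent new samples; return the share that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        # Each "study" collects its own, new observations from the same hypothetical population.
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if two_sided_z_p(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / n_studies


print(f"Share of replications rejecting H0: {replicate_study():.2f}")
```

When the effect is real and the method is sound, independent replications keep landing on the same conclusion; a finding that only shows up once is a warning sign.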

So to recap, because this has been a lot, a good hypothesis is declarative and concise, testable, and reproducible. Hypothesis formulation and testing is without a doubt a key component of any research as it helps us to answer important questions about our problem statement with greater clarity and confidence.

References

Open Science Collaboration. (2014). The reproducibility project. In V. Stodden, F. Leisch, & R. D. Peng (Eds.), Implementing Reproducible Research (pp. 300-301).