Wednesday, April 26, 2017

Published 09:58

mtcars - Hypothesis Testing

Having explored the mtcars dataset and built a good understanding of it, it's time for us to do some "Hypothesis Testing" (without going too deep into the math and statistics behind it).

Hypothesis Testing

So, the objective of this testing is to state our hypothesis about cars and check whether the data support it, that is, whether the observed result is unlikely to have arisen by chance alone. If it is, we reject the null hypothesis in favour of our hypothesis. Otherwise we fail to reject the null hypothesis (which does not prove the null hypothesis true either).

Let's define the steps of hypothesis testing:
  • Define the hypothesis
  • Collect the data (this is already done)
  • Use the data and statistical measures to bolster/bust the hypothesis

Hypothesis Definition

Let us frame our hypothesis as follows: cars fitted with manual transmission have higher fuel efficiency than cars fitted with automatic transmission.

Null hypothesis: the true difference in mean fuel efficiency (mpg) between the two groups of cars is equal to 0.

Alternative hypothesis: the true difference in mean fuel efficiency between the two groups is not equal to 0.
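Written symbolically, with μ_M and μ_A the true mean mpg of manual and automatic cars, the hypotheses and the test statistic look roughly like this. This assumes the default Welch (unequal variances) form that R's t.test() uses; x̄, s² and n denote each group's sample mean, sample variance and size.

```latex
H_0:\ \mu_M - \mu_A = 0 \qquad H_1:\ \mu_M - \mu_A \neq 0

t = \frac{\bar{x}_M - \bar{x}_A}{\sqrt{\dfrac{s_M^2}{n_M} + \dfrac{s_A^2}{n_A}}}
```

R's formula interface orders the groups as am = 0 (automatic) then am = 1 (manual), so it reports this statistic with the opposite sign.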

Testing the Hypothesis

Computing the average mpg for each transmission type, we see that cars with manual transmission run ~7.25 more miles per gallon than their peers fitted with automatic transmission (about 24.39 vs 17.15 mpg).
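The post shows only the output, not the R code. Assuming base R and the mtcars data frame that ships with it (where am is coded 0 = automatic, 1 = manual), the group means could be computed with a sketch like the one below; the variable names are illustrative.

```r
# mtcars is built into base R (datasets package); am: 0 = automatic, 1 = manual
group_means <- aggregate(mpg ~ am, data = mtcars, FUN = mean)
print(group_means)

# aggregate() sorts the groups by am, so row 2 minus row 1 is manual minus automatic
mean_diff <- diff(group_means$mpg)
print(round(mean_diff, 3))
```

This gives means of about 17.15 mpg (automatic, 19 cars) and 24.39 mpg (manual, 13 cars), a difference of about 7.245 mpg.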



Here is a visual of fuel efficiency by transmission type.
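The post doesn't say which kind of chart was used. A box plot is one reasonable way to draw this comparison in base R; the sketch below assumes that choice, and the axis labels and title are illustrative.

```r
# Box plot of mpg for each transmission type (am: 0 = automatic, 1 = manual)
bp <- boxplot(mpg ~ am, data = mtcars,
              names = c("Automatic", "Manual"),
              xlab = "Transmission type",
              ylab = "Miles per gallon (mpg)",
              main = "Fuel efficiency by transmission type")

# boxplot() also returns its summary statistics, e.g. the number of cars per group
print(bp$n)
```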



The t.test() result summary from R looks like this:


Since the p-value is 0.001374 (less than the conventional 0.05 significance level), we can reject the null hypothesis. Before concluding, though, let's quickly quantify the effect by building a simple linear regression model and checking how much of the variability in mpg it explains.
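Assuming the test was run on the built-in mtcars data with R's default settings, the call would be along the lines of the sketch below. By default t.test() performs Welch's two-sample t-test (it does not assume equal variances), which is consistent with the p-value of 0.001374 quoted above; the names tt and alpha are illustrative.

```r
# Welch two-sample t-test of mpg between automatic (am = 0) and manual (am = 1) cars
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)

# Reject the null hypothesis at the 5% level if the p-value is below 0.05
alpha <- 0.05
print(tt$p.value < alpha)
```

Because the groups are ordered am = 0 then am = 1, the reported t statistic (about -3.77) and the confidence interval refer to automatic minus manual, hence the negative sign.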

Simple Linear Regression


Looking at the coefficients in the result summary, we get the same information: the coefficient for am shows cars with manual transmission getting ~7.25 more mpg. However, the R-squared value shows that the model explains only 36% of the variability in mpg, so we should dig a little deeper to understand which other feature(s) explain the rest.
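A sketch of the simple model, assuming it regressed mpg on am alone (the name fit_simple is illustrative). With a single 0/1 predictor, the intercept equals the mean mpg of automatic cars and the am coefficient equals the manual-minus-automatic difference, which is why the numbers match the t-test section.

```r
# Simple linear regression of mpg on transmission type (am: 0 = automatic, 1 = manual)
fit_simple <- lm(mpg ~ am, data = mtcars)
print(summary(fit_simple))

# Intercept = mean mpg of automatic cars; am = extra mpg for manual cars
print(coef(fit_simple))

# Share of the variability in mpg explained by the model (about 0.36)
print(summary(fit_simple)$r.squared)
```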

What can we do next?

From the correlation tests, we saw that mpg is strongly negatively correlated with wt, hp and disp, in addition to being positively correlated with am.
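The correlation tests themselves aren't shown in this post. Assuming Pearson correlation (the default of cor() and cor.test()), the relevant numbers can be recomputed with a sketch like this; the names predictors, r and p are illustrative.

```r
# Candidate explanatory variables
predictors <- c("am", "wt", "hp", "disp")

# Pearson correlation of mpg with each predictor
r <- sapply(mtcars[predictors], function(x) cor(mtcars$mpg, x))

# p-value of the corresponding correlation test
p <- sapply(mtcars[predictors], function(x) cor.test(mtcars$mpg, x)$p.value)

print(round(r, 2))
print(signif(p, 3))
```

wt, hp and disp come out strongly negative (about -0.87, -0.78 and -0.85), while am is positive (about 0.60).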

A better idea is to build a multiple linear regression model that adds wt, hp and disp as explanatory variables alongside am, and see how much of the variability in the data it explains.
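A sketch of the multiple regression, assuming am is kept in the model alongside wt, hp and disp (the am coefficient in the observations that follow implies it was). The names fit_multi and r2_multi are illustrative.

```r
# Multiple linear regression: transmission type plus weight, horsepower and displacement
fit_multi <- lm(mpg ~ am + wt + hp + disp, data = mtcars)
print(summary(fit_multi))

# Share of the variability in mpg explained by the model
r2_multi <- summary(fit_multi)$r.squared
print(r2_multi)
```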

Multiple Linear Regression

A few observations:

  1. 84% of the variability in mpg is explained by this multiple linear regression model (R-squared of about 0.84), compared with only 36% for the simple model with am alone
  2. Interestingly, once weight, horsepower and displacement are accounted for, the estimated difference in fuel efficiency between cars with manual and automatic transmission drops to about 2.15 miles per gallon
  3. Horsepower (hp) is the most significant predictor of fuel efficiency, followed by the weight of the vehicle (wt), while transmission type (am) is not a statistically significant predictor
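The numbers behind these observations can be pulled out of the fitted model's coefficient table. This sketch refits the same assumed model and ranks the predictors by p-value, which is one way to read "influences the most"; note that a p-value ranking measures statistical significance, not the size of the effect. The variable names are illustrative.

```r
fit_multi <- lm(mpg ~ am + wt + hp + disp, data = mtcars)
coefs <- coef(summary(fit_multi))  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# Adjusted manual-minus-automatic difference in mpg, holding wt, hp and disp fixed
am_effect <- coefs["am", "Estimate"]
print(round(am_effect, 2))

# p-values of the predictors (intercept row dropped), most significant first
p_vals <- sort(coefs[-1, "Pr(>|t|)"])
print(signif(p_vals, 3))
```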

Image from LiveJournal found via xkcd

Conclusion

From the data, we were able to conclude that although manual transmission cars have higher average fuel efficiency than cars fitted with automatic transmission, the transmission type has no significant impact on fuel efficiency once horsepower, weight and displacement are accounted for (in the best model, which explains most of the variability in the data). Along the way, we were also able to identify the factors that do impact fuel efficiency.

P.S.: It is worth noting that the dataset we have used is at least four decades old, and with technological improvements in the automobile industry our findings might not hold for today's cars. Nevertheless, the same approach can be applied to similar datasets from today.