It is worth spending some time on the basics, and this page is dedicated to that.
Why do we care about variable type (in the context of Machine Learning)?
Understanding variable types lets us treat each variable appropriately. For instance, it helps us avoid mistakes such as averaging postal codes or taking the ratio of two pH values. It also helps us choose an appropriate operation for a variable based on the context. For example, in a psychological study of perception, different colors would be regarded as nominal. In a physics study, color is quantified by wavelength, so color would be considered a ratio variable.
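The postal code mistake mentioned above can be sketched in a few lines of Python. The postal codes here are made-up values used only for illustration. The point is that the arithmetic runs without any error, so nothing but knowledge of the variable type tells us the result is meaningless.

```python
from statistics import mean, mode

# Hypothetical postal codes, stored as integers by mistake.
postal_codes = [94103, 10001, 60614, 10001]

# The average is computable, but it does not describe any real place.
average_code = mean(postal_codes)

# Treated as nominal labels, the most frequent value is a meaningful summary.
labels = [str(code) for code in postal_codes]
most_common = mode(labels)

print(average_code)
print(most_common)
```

Storing such codes as strings from the start is one simple way to keep accidental arithmetic from happening.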
Feature
A feature (also known as a variable) is a measurable property or attribute of an observation. For example, when studying the performance of cars, the possible features would be:
- Number of Cylinders
- Miles Per Gallon (or Kilometers Per Liter)
- Horsepower
- Weight
- Number of Gears
- Type of Transmission
Label
A label represents the outcome, or the output whose variation is being studied. For example, in a learning problem where the examples are labeled '1' and '0', animals are marked with '1' and everything else with '0'. The label is also referred to as the 'Explained Variable' or 'Dependent Variable'. It responds to the independent variable, and for this reason it is called the 'dependent' variable.
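As a rough sketch of how features and labels sit side by side, the snippet below builds a few labeled examples and then separates the feature values from the labels. The example objects and the feature names `legs` and `has_fur` are hypothetical and chosen only for illustration. The labels follow the convention above: 1 for animals, 0 for everything else.

```python
# Hypothetical labeled examples: 1 marks an animal, 0 marks anything else.
examples = [
    {"name": "dog", "legs": 4, "has_fur": True, "label": 1},
    {"name": "sparrow", "legs": 2, "has_fur": False, "label": 1},
    {"name": "chair", "legs": 4, "has_fur": False, "label": 0},
    {"name": "bicycle", "legs": 0, "has_fur": False, "label": 0},
]

# Columns treated as features (independent variables) in this sketch.
feature_names = ["legs", "has_fur"]

# X holds one row of feature values per observation; y holds the labels.
X = [[row[name] for name in feature_names] for row in examples]
y = [row["label"] for row in examples]

print(X)
print(y)
```

Each row is one observation, and the label column is the one a learning algorithm would try to predict from the others.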
Dataset
A dataset (or data set) is simply a collection of observations. Typically it is organized in rows and columns, where columns correspond to features or labels and rows correspond to observations. The most popular dataset format is the spreadsheet, which is powerful for quick analysis.
Data (Variable) Type
Data is often classified as below, based on the nature of the data.
Please Note: This is different from the 'datatype' defined in the database realm.
Quantitative Variable
A quantitative variable is expressed in numerical form, and therefore arithmetic operations can be performed on it. Quantitative variables can be further classified as follows:
Continuous Variable
A continuous variable can take any value between two numbers, so it can take infinitely many values.
Example: Time taken by top five athletes to complete 100m in Rio Olympics: 9.81, 9.89, 9.91, 9.93, 9.94.
- Interval Variable
Interval variables take numeric values and can be measured along a continuum. The intervals between the values of an interval variable are equally spaced.
Example: Temperature measured in degrees Celsius or Fahrenheit. The difference between 20 °C and 30 °C is the same as the difference between 30 °C and 40 °C.
- Ratio Variable
Ratio variables are interval variables, with the difference that they possess a clearly defined zero (0), which indicates that there is none of that variable.
Example: Temperature measured in Kelvin, because 0 Kelvin (also called absolute zero) is a true zero rather than an arbitrary reference point. Temperature measured in Celsius or Fahrenheit is NOT a ratio variable, because zero on those scales is an arbitrary point. Other examples include height, mass, and distance.
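The difference between interval and ratio scales can be checked with a small calculation. The sketch below uses two illustrative temperatures and the standard Celsius-to-Kelvin offset of 273.15. Differences agree on both scales, but only the Kelvin ratio is physically meaningful.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

# Illustrative temperatures in degrees Celsius.
cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale and match in both units.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# The Celsius ratio suggests "twice as hot", which is an artifact of where
# the Celsius zero happens to sit.
naive_ratio = warm_c / cool_c

# The Kelvin ratio is measured from a true zero and is only slightly above 1.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference_c, difference_k)
print(naive_ratio, kelvin_ratio)
```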
Discrete Variable
A discrete variable does not admit intermediate values between two specific numbers. It is typically a count, represented by whole-number (integer) values.
Example: Total medals by the USA at last three summer Olympics: 121 (Rio 2016), 103 (London 2012), 110 (Beijing 2008).
Qualitative Variable
A qualitative variable takes non-numeric values. It describes data that fit into categories, and it is also referred to as a categorical variable. Qualitative variables can be further classified as follows:
Nominal Variable
A nominal variable is one that has two or more categories, but there is no intrinsic ordering to the categories.
Example: The blood type of a person has multiple categories A, B, AB or O but there is no intrinsic order.
- Dichotomous (Binary) Variable
Dichotomous variables are nominal variables that have only two categories or levels.
Example: Asking a person whether s/he owns a car. The response can be either 'Yes' or 'No'.
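Because nominal categories have no order, the usual operations are counting and encoding rather than arithmetic. The sketch below uses made-up blood type and car ownership responses. It shows a frequency count, a one-hot encoding (one 0/1 column per category), and a simple 0/1 encoding for the binary case. This is one common way to prepare such variables, not the only one.

```python
from collections import Counter

# Hypothetical blood type observations (a nominal variable).
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]

# Counting categories is a valid summary for nominal data.
counts = Counter(blood_types)
most_common_type, _ = counts.most_common(1)[0]

# Sorting only fixes a stable column order; it does not imply a ranking.
categories = sorted(set(blood_types))

# One-hot encoding: each observation becomes a row with a single 1.
one_hot = [
    [1 if value == category else 0 for category in categories]
    for value in blood_types
]

# A dichotomous variable needs only one 0/1 column.
owns_car = ["Yes", "No", "Yes"]
owns_car_encoded = [1 if answer == "Yes" else 0 for answer in owns_car]

print(counts)
print(categories)
print(one_hot[0])
print(owns_car_encoded)
```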
Ordinal Variable
An ordinal variable is similar to a nominal variable, the difference being that there is a clear ordering (or ranking) of the categories.
Example: Clothing Size having values like S, M, L, XL where there is an order (S < M)
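For an ordinal variable the order has to be stated explicitly, since sorting the category names as text does not respect it. The sketch below uses a made-up list of observed clothing sizes and maps each size to its rank. It then sorts by rank and picks the median size, an operation that makes sense for ordinal data but not for nominal data.

```python
# The intended order of the categories, from smallest to largest.
SIZE_ORDER = ["S", "M", "L", "XL"]

# Map each size to its position in the order (S=0, M=1, L=2, XL=3).
rank = {size: position for position, size in enumerate(SIZE_ORDER)}

# Hypothetical observed sizes.
observed = ["L", "S", "XL", "M", "M"]

# Plain string sorting is alphabetical and ignores the real order.
alphabetical = sorted(observed)

# Sorting by rank respects the ordinal scale.
by_size = sorted(observed, key=lambda size: rank[size])

# With an odd number of observations, the median is the middle element.
median_size = by_size[len(by_size) // 2]

print(alphabetical)
print(by_size)
print(median_size)
```

Note that the ranks only encode order. The gap between S and M is not assumed to equal the gap between L and XL, so averaging the ranks would be treating the variable as interval data.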