Bivariate Analysis - Categorical & Numerical

Line Chart with Error Bars

A line chart with error bars displays information as a series of data points connected by straight line segments. Each data point is the average of the numerical variable for one category of the categorical variable, and its error bar shows the standard error of that average. The chart summarizes how the numerical variable varies from one category to the next (iris_linechart.xlsx).
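The values behind such a chart can be sketched as follows. This is a minimal illustration, not the workbook's method: it borrows the Humidity-by-Outlook data from the ANOVA example on this page, assumes the standard error is the sample standard deviation divided by the square root of the group size, and uses a hypothetical helper name (`mean_and_standard_error`).

```python
import math
import statistics

# Humidity readings grouped by Outlook, borrowed from the ANOVA example on this page.
humidity_by_outlook = {
    "overcast": [86, 65, 90, 75],
    "rainy": [96, 80, 70, 80, 91],
    "sunny": [85, 90, 95, 70, 70],
}

def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for one category."""
    mean = statistics.mean(values)
    # Assumption: standard error = sample standard deviation / sqrt(n).
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, std_err

# One (mean, standard error) pair per category: the points and error bars to plot.
chart_points = {
    outlook: mean_and_standard_error(values)
    for outlook, values in humidity_by_outlook.items()
}

for outlook, (mean, std_err) in chart_points.items():
    print(f"{outlook:>8}: mean = {mean:.1f}, standard error = {std_err:.2f}")
```

Each pair gives the height of one point on the line and the half-length of its error bar.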

Combination Chart

A combination chart uses two or more chart types to emphasize that the chart contains different kinds of information. Here, a bar chart shows the distribution of a binned numerical variable, and a line chart shows the percentage of a selected category of the categorical variable within each bin. The combination chart is an effective way to show the predictive power of a predictor (X-axis) against a target (Y-axis).
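The two series of a combination chart can be computed as in the sketch below. It borrows the O-Ring data from the t-test example on this page; the bin edges and the helper name `summarize_bins` are assumptions chosen for illustration only.

```python
# O-Ring data from the t-test example on this page.
temps_failed = [53, 56, 57, 70, 70, 70, 75]
temps_intact = [63, 66, 67, 67, 67, 68, 69, 70, 72, 73, 75, 76, 76, 78, 79, 80, 81]
records = [(t, "Y") for t in temps_failed] + [(t, "N") for t in temps_intact]

# Hypothetical bin edges (an assumption): [50, 60), [60, 70), [70, 80), [80, 90).
bin_edges = [50, 60, 70, 80, 90]

def summarize_bins(records, bin_edges, selected="Y"):
    """Per bin, return (label, count, percent of records in the selected category)."""
    summary = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [flag for temp, flag in records if low <= temp < high]
        count = len(in_bin)
        # An empty bin gets 0% rather than raising a division error.
        pct = 100.0 * in_bin.count(selected) / count if count else 0.0
        summary.append((f"{low}-{high}", count, pct))
    return summary

bin_summary = summarize_bins(records, bin_edges)
for label, count, pct in bin_summary:
    print(f"{label}: count = {count:2d} (bar), failure rate = {pct:5.1f}% (line)")
```

The counts feed the bars and the percentages feed the line. A line that changes markedly from bin to bin suggests the predictor carries information about the target.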

Z-test and t-test

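The Z-test and the t-test serve the same purpose: they assess whether the averages of two groups are statistically different from each other. They differ in the reference distribution, which is the standard normal distribution for Z and Student's t distribution for t. This analysis is appropriate for comparing the averages of a numerical variable for two categories of a categorical variable.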

The smaller the probability associated with Z, the more significant the difference between the two averages.
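The text does not spell out the Z statistic. A minimal sketch, assuming the standard two-sample form Z = (mean1 - mean2) / sqrt(var1/n1 + var2/n2) and a two-tailed probability; the function name and the summary statistics are hypothetical and not taken from the text.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean1, var1, n1, mean2, var2, n2):
    """Return (Z, two-tailed probability) from group means, variances, and sizes."""
    # Standard error of the difference between the two averages.
    std_err = math.sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2) / std_err
    # Two-tailed probability under the standard normal distribution.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical summary statistics (an assumption), both groups with n >= 30.
z, p = two_sample_z_test(mean1=10.0, var1=4.0, n1=32, mean2=9.0, var2=4.0, n2=32)
print(f"Z = {z:.2f}, probability = {p:.4f}")
```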

t-test

When n₁ or n₂ is less than 30, we use the t-test instead of the Z-test.
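The text does not state which form of the t statistic it uses. A sketch, assuming the pooled-variance (equal-variance) two-sample t-test, which is consistent with the df = 22 = n₁ + n₂ - 2 reported in the example on this page:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
df = n_1 + n_2 - 2
```

Here the x̄ terms are the group means, the s² terms are the group sample variances, and n₁ and n₂ are the group counts.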

Example:

Is there a significant difference between the means (averages) of the numerical variable (Temperature) in two different categories of the categorical variable (O-Ring Failure)?

O-Ring Failure	Temperature
Y	53	56	57	70	70	70	75
N	63	66	67	67	67	68	69	70	72	73	75	76	76	78	79	80	81

t-test: Temperature by O-Ring Failure	Y	N
Count	7	17
Mean	64.43	72.18
Variance	76.95	30.78
t	-2.62
df	22
Probability (two-tailed)	0.0156

The low probability (0.0156, below the conventional 0.05 threshold) means that the difference between the average temperature for failed O-Rings and the average temperature for intact O-Rings is statistically significant.
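The t and df values in the table can be reproduced with the sketch below. It assumes a pooled-variance two-sample t-test, an assumption that matches the reported df of 22; the function name `pooled_t_test` is hypothetical. It stops at t and df because the Python standard library has no Student's t distribution from which to derive the probability.

```python
import math
import statistics

temps_failed = [53, 56, 57, 70, 70, 70, 75]            # O-Ring Failure = Y
temps_intact = [63, 66, 67, 67, 67, 68, 69, 70, 72,
                73, 75, 76, 76, 78, 79, 80, 81]        # O-Ring Failure = N

def pooled_t_test(a, b):
    """Return (t, df) for a pooled-variance two-sample t-test (assumed form)."""
    n1, n2 = len(a), len(b)
    mean1, mean2 = statistics.mean(a), statistics.mean(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)  # sample variances
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

t, df = pooled_t_test(temps_failed, temps_intact)
print(f"t = {t:.2f}, df = {df}")
```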

Analysis of Variance (ANOVA)

The ANOVA test assesses whether the averages of more than two groups are statistically different from each other. This analysis is appropriate for comparing the averages of a numerical variable for more than two categories of a categorical variable.
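The quantities in an ANOVA table follow the standard one-way decomposition, sketched here with assumed notation: k groups, n_j values in group j, N values in total, group means x̄_j, and grand mean x̄.

```latex
SS_{\text{between}} = \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{x}\right)^2,
\qquad
SS_{\text{within}} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2,
\qquad
F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
```

Each mean square is a sum of squares divided by its degrees of freedom, k - 1 between groups and N - k within groups.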

Example:

Is there a significant difference between the averages of the numerical variable (Humidity) in the three categories of the categorical variable (Outlook)?

Outlook	Humidity
overcast	86	65	90	75
rainy	96	80	70	80	91
sunny	85	90	95	70	70

Outlook	Count	Mean	Variance
overcast	4	79.0	127.3
rainy	5	83.4	104.8
sunny	5	82.0	132.5

Source of Variation	Sum of Squares	Degrees of Freedom	Mean Square	F Value	Probability
Between Groups	44.0	2	22.0	0.182	0.836
Within Groups	1331.2	11	121.0
Total	1375.2	13

The high probability (0.836) means that there is no significant difference between the averages of Humidity in the three categories of Outlook.
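The sums of squares, degrees of freedom, and F value in the table can be reproduced with the sketch below, which assumes a standard one-way ANOVA. The function name `one_way_anova` and the dictionary keys are hypothetical. The probability is left out because it requires the F distribution, which the Python standard library does not provide.

```python
import statistics

humidity_by_outlook = {
    "overcast": [86, 65, 90, 75],
    "rainy": [96, 80, 70, 80, 91],
    "sunny": [85, 90, 95, 70, 70],
}

def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Between groups: squared distance of each group mean from the grand mean,
    # weighted by the group size.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Within groups: squared distance of each value from its own group mean.
    ss_within = sum(
        (x - statistics.mean(group)) ** 2 for group in groups for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_value": f_value,
    }

anova = one_way_anova(list(humidity_by_outlook.values()))
print(f"SS between = {anova['ss_between']:.1f}, SS within = {anova['ss_within']:.1f}")
print(f"df = ({anova['df_between']}, {anova['df_within']}), F = {anova['f_value']:.3f}")
```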