Key Concepts Overview
- Scatter Diagrams: Tools for visualizing the relationship, or correlation, between two variables ($x$ and $y$).
- Plotting points: Each observed $(x, y)$ pair is a point on the Cartesian plane.
- Purpose: Determine pattern (linear, curved, none) and strength/direction of association.
Understanding Correlation
What it measures: The degree to which two variables change together. It does not imply causation.
Types of Correlation
- Positive Correlation: As $x$ increases, $y$ tends to increase (pattern slopes up and right).
- Negative Correlation: As $x$ increases, $y$ tends to decrease (pattern slopes down and right).
- No Correlation: No discernible pattern linking changes in one variable to changes in the other (a diffuse cloud of points).
Strength of Correlation
- Strong: Points cluster tightly around a potential line or curve.
- Moderate/Weak: Points are spread more widely but still show a tendency.
- None: Random scatter, no visual pattern.
Syllabus Deep Dives & Theory Focus
I. Graphical Analysis
- Identifying Pattern: Ability to quickly determine if the relationship is linear or non-linear by inspection of the scatter diagram.
- Outliers: Identify points that deviate markedly from the general trend. Do not base conclusions on a single outlier.
- Line of Best Fit (LOBF): The single straight line that best represents the overall trend of the data. Points should generally fall close to this line.
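One common way to make "best represents" precise is the least-squares line, which minimises the sum of squared vertical distances from the points to the line. The sketch below assumes that definition (a line drawn by eye in an exam will only approximate it); the function name `line_of_best_fit` and the data are made up for illustration.

```python
def line_of_best_fit(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x-values are identical; slope is undefined")
    m = s_xy / s_xx
    # The least-squares line always passes through the mean point (x_bar, y_bar)
    c = y_bar - m * x_bar
    return m, c


# Hypothetical data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
m, c = line_of_best_fit(hours, scores)
print(f"y = {m:.2f}x + {c:.2f}")
```

Because the line passes through $(\bar{x}, \bar{y})$, plotting the mean point first is a useful check when drawing the LOBF by hand.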
II. Correlation Measurement ($r$)
- Coefficient of Correlation ($r$): Pearson’s $r$, a numerical measure of the strength and direction of a linear relationship. It ranges from $-1$ to $+1$.
- Calculation: $r$ is the covariance of $x$ and $y$ divided by the product of their standard deviations, which is equivalent to the sums-of-squares form $$r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}$$ In exams, $r$ is typically found using a statistical calculator function or a provided formula sheet. Understand the method conceptually; interpreting the resulting value is the key skill.
- $r \approx +1$: Strong positive linear relationship.
- $r \approx -1$: Strong negative linear relationship.
- $r \approx 0$: Little to no linear relationship.
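The sums-of-squares formula for Pearson's coefficient can be traced step by step in code. This is a minimal sketch in plain Python; the function name `pearson_r` and the data set are illustrative assumptions, not part of any syllabus material.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient via sums of deviations from the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data set
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))
```

For this data the sums are $\sum(x_i - \bar{x})(y_i - \bar{y}) = 6$, $\sum(x_i - \bar{x})^2 = 10$ and $\sum(y_i - \bar{y})^2 = 6$, so $r = 6/\sqrt{60} \approx 0.775$, a fairly strong positive linear relationship.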
III. Making Predictions (Linear Interpolation)
- Method: Use the linear model $y = mx + c$ obtained from the LOBF to estimate $y$ for a given $x$.
- Domain Matters: Prediction/interpolation must remain within the observed range of $x$-values (the domain). Extrapolating outside this range is unreliable.
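The interpolation/extrapolation distinction can be made explicit when computing an estimate. The sketch below is one possible way to do it: the function name `predict`, the slope and intercept values, and the observed range are all hypothetical, chosen only to illustrate the check.

```python
def predict(m, c, x, x_min, x_max):
    """Estimate y = m*x + c and report whether x is inside the observed x-range."""
    y = m * x + c
    is_interpolation = x_min <= x <= x_max
    return y, is_interpolation


# Hypothetical model y = 4.1x + 47.7, fitted to data with x between 1 and 5
m, c = 4.1, 47.7
x_min, x_max = 1, 5

for x in (3.5, 8):
    y, ok = predict(m, c, x, x_min, x_max)
    label = "interpolation" if ok else "extrapolation (unreliable)"
    print(f"x = {x}: estimated y = {y:.2f} [{label}]")
```

Returning the flag alongside the estimate, rather than refusing to compute, mirrors what an exam answer should do: give the value and state its limitation.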
Study Checklist & Practice Areas
- Define Correlation: Distinguish between correlation and causation.
- Analyze Diagrams: Given various scatter diagrams, correctly state the type (positive/negative/none) and strength of correlation visually.
- Calculate $r$: Compute Pearson’s coefficient from provided data sets and interpret its value (e.g., “There is a strong negative correlation…”).
- Draw LOBF: Accurately draw the line of best fit onto given data plots.
- Predict/Estimate: Use the derived linear relationship to estimate missing data points, explicitly stating limitations (interpolation vs. extrapolation).
Remember: Always describe the correlation observed before presenting any numerical findings or predictions.