by Dane Miller – 4/9/18
Here is a popular dataset on Old Faithful geyser eruptions in Yellowstone, WY. The dataset comes from Weisberg's (2005) Applied Linear Regression. This type of dataset can be extremely useful to National Park Service rangers for predicting eruptions for visiting tourists. I would highly recommend visiting Yellowstone and seeing Old Faithful in person; it is truly amazing!
Source of the data: http://www.stat.cmu.edu/~larry/all-of-statistics/=data/faithful.dat
Weisberg, S. (2005). Applied Linear Regression, 3rd edition. New York: Wiley, Problem 1.4.
Yellowstone NPS https://www.nps.gov/yell/planyourvisit/exploreoldfaithful.htm
seaborn.jointplot https://seaborn.pydata.org/generated/seaborn.jointplot.html
This dataset contains only two variables: the duration of the current eruption and the wait time between eruptions.
Let’s look at a theoretical model: $\mu = \beta_0 + \beta_1 X_i$
$\mu$: wait time; $X_i$: duration
Empirical model: $\hat{y}_i = b_0 + b_1 x_{i1}$
$y_i$: observed wait time; $x_{i1}$: observed duration
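To make the empirical model concrete, here is a minimal sketch of how b_0 and b_1 can be estimated, assuming ordinary least squares (the usual method behind a regression table like the one that follows). The function name `fit_simple_ols` and the four data points are made up for illustration; they are not values from the Old Faithful file.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (b0, b1) that minimise the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical observations, NOT taken from the Old Faithful data.
duration = [1.0, 2.0, 3.0, 4.0]
wait = [45.0, 57.0, 66.0, 78.0]

b0, b1 = fit_simple_ols(duration, wait)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 34.50, b1 = 10.80
```

The closed-form solution is enough for a single predictor; a statistics package would additionally report the standard errors and confidence intervals.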
Term | coef | std err | t | P>|t| | [0.025 | 0.975] |
Intercept | 35.0774 | 1.184 | 29.630 | 0.000 | 32.748 | 37.407 |
duration_sec | 10.7499 | 0.325 | 33.111 | 0.000 | 10.111 | 11.389 |
Wait time = 35.0774 + 10.7499 × Duration
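As a quick illustration of how the fitted equation could be used, the sketch below plugs a duration into it. The function name `predict_wait` and the example durations are hypothetical, and the duration is assumed to be in the same units that were used for the fit.

```python
# Coefficients copied from the regression table above.
INTERCEPT = 35.0774
SLOPE = 10.7499


def predict_wait(duration):
    """Predicted wait time for a given eruption duration.

    `duration` must be in the same units as the data used for the fit.
    """
    return INTERCEPT + SLOPE * duration


# Hypothetical example durations.
for d in (2.0, 4.5):
    print(f"duration {d}: predicted wait {predict_wait(d):.1f}")
```

A longer eruption gives a longer predicted wait, which is what the positive slope says.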
When I was first introduced to this dataset in a graduate school stats course, my focus was on completing the problems as quickly as possible so that I could get back to my graduate research. As a result, I missed some important subtleties in this simple dataset.
Rushing through the dataset in graduate school with Microsoft Excel. Looks pretty crappy! What was I thinking?!
Plotting the residuals:
The data is separating into two groups.
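A residual is the observed wait time minus the wait time predicted by the fitted line. Here is a minimal sketch of that calculation using the coefficients from the regression table; the function name `residuals` and the two observations are made up for illustration and are not rows from the dataset.

```python
# Coefficients copied from the regression table.
INTERCEPT = 35.0774
SLOPE = 10.7499


def residuals(durations, waits):
    """Observed wait minus fitted wait, one value per observation."""
    if len(durations) != len(waits):
        raise ValueError("durations and waits must have the same length")
    return [w - (INTERCEPT + SLOPE * d) for d, w in zip(durations, waits)]


# Hypothetical observations, NOT taken from the Old Faithful data.
example_durations = [2.0, 4.0]
example_waits = [55.0, 80.0]

for d, r in zip(example_durations, residuals(example_durations, example_waits)):
    print(f"duration {d}: residual {r:+.3f}")
```

Plotting these values against duration is what makes any grouping in the data visible.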
The same Old Faithful dataset, now plotted using seaborn.jointplot in Python.
Focus your efforts on learning Python or R; it will drastically improve your work. And there you have it: a rebooted Old Faithful dataset plotted with seaborn.jointplot in Python.
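For readers who want to try this, here is a rough sketch of a seaborn.jointplot call. It is not the exact code behind the figure: the function name `make_jointplot`, the axis labels, and the choice of `kind="reg"` (scatter plot with a regression line) are my assumptions, and you need to supply the duration and wait values yourself.

```python
def make_jointplot(durations, waits):
    """Scatter plot with a regression line and marginal distributions."""
    if len(durations) != len(waits):
        raise ValueError("durations and waits must have the same length")

    # Imported inside the function so the check above still runs on a
    # machine where the plotting libraries are not installed.
    import numpy as np
    import seaborn as sns

    x = np.asarray(durations, dtype=float)
    y = np.asarray(waits, dtype=float)

    # kind="reg" is an assumption; "scatter", "kde" and "hex" also exist.
    grid = sns.jointplot(x=x, y=y, kind="reg")
    grid.set_axis_labels("Duration", "Wait time")
    return grid
```

Calling `make_jointplot(durations, waits)` returns a seaborn `JointGrid`, which you can write to disk with `grid.savefig("old_faithful.png")` (the file name is only an example).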