Internet Weather Forecast Accuracy

Weather forecasting has a secure and popular online presence, which is understandable. The weather affects nearly everyone's life, and the Internet can provide information on just about any location at any hour of the day or night. But how accurate is this information? How much can we trust it? Perhaps it is just my skeptical nature (or maybe the seeming unpredictability of nature), but I've never put much stock in weather forecasts - especially those made more than three days in advance. That skepticism reached a new high in the Summer of 2004, but I have only now done the research necessary to test the accuracy of online weather forecasts. First the story, then the data.

h2. An Internet Weather Forecast Gone Terribly Awry


It was the Summer of 2004 and my wife and I were gearing up for a trip with another couple to Schlitterbahn in New Braunfels - one of the (if not _the_) best waterparks ever created.[1] As a matter of course when embarking on a 2.5-hour drive to spend the day in a swimsuit, and given the tendency of the area for natural disasters,[2] we checked the weather. The temperatures looked ideal and, most importantly, the chance of rain was a nice round goose egg.

A couple of hours into our Schlitterbahn experience, we got on a bus to leave the "old section" for the "new section." Along the way, clouds gathered and multiple claps of thunder sounded. "So much for the 0% chance of rain," I commented. By the time we got to our destination, lightning sightings had led to the slides and pools being evacuated and soon the rain began coming down in torrents - accompanied by voluminous lightning flashes. After at least half an hour the downpour had subsided, but the lightning showed no sign of letting up, so we began heading back to our vehicles. A hundred yards into the parking lot, we passed a tree that had apparently been split in two during the storm (whether by lightning or wind, I'm not sure). Not but a few yards later, there was a distinct thud and the husband of the couple accompanying us cried out as a nearly racquetball-sized hunk of ice rebounded off of his head and onto the concrete. Soon, similarly sized hail was falling all around us as everyone scampered for cover. Some cowered under overturned trashcans while others were more fortunate and made it indoors.

The hail, rain and lightning eventually subsided, but the most alarming news was waiting on cell phone voicemail. A friend who lived in the area had called frantically, knowing we were at the park, as the local news was reporting multiple people had been struck by lightning at Schlitterbahn during the storm.

"So much for the 0% chance of rain," I repeated.

h2. Testing the Skepticism


After that experience, I gave up using online weather forecasts (actually _any_ weather forecast) for more than getting a reasonable idea of the "temperature decade" for the next day. I've recently begun to be a little skeptical of my own skepticism, however. What if I was the victim of a freak waterpark occurrence and was missing out on the typically reliable weather information online? Using a spreadsheet, observed data and straightforward statistics, I was set to find out.

My plan was to record the weather forecasts of some of the most popular Internet weather sites as well as actual temperatures and then to analyze the data to determine each site's accuracy. I would then be able to draw supported conclusions to apply to future use of Internet weather forecasts (if any).

h2. Data Mining Internet Weather Forecasts


Doing an Internet search for various weather related keywords, and then cross-referencing to avoid duplication,[3] I selected the top ten weather forecast sites to be included in my survey using their Google Toolbar[4] PageRank (PR).[5] Additionally, I selected Houston, Texas as the location and _The Weather Channel_ as my "actual temperature" source.[6]
* _The National Weather Service_[7] - PR9
* _BBC Weather_[8] - PR9
* _The Weather Channel_[9] - PR8
* _The Weather Underground_[10] - PR8
* _IntelliCast_[11] - PR8
* _CNN Weather_[12] - PR8
* _MSN Weather_[13] - PR8
* _The Weather Network_[14] - PR7
* _Unisys_[15] - PR7
* _AccuWeather_[16] - PR6
* _Actual_ (as reported on weather.com)[17]

Then, on a daily basis I recorded the predicted low and high temperatures on each weather forecast site, looking as far ahead as each site made available. This varied greatly from site to site, with _CNN Weather_, _BBC Weather_, _The Weather Underground_ and _The Weather Network_ providing only the current day and four days into the future, and _AccuWeather_ providing the current day and four_teen_ days into the future. I usually logged the data at 12pm CST, but occasionally as late as 5pm CST, which resulted in some high temperature predictions for the current day not being available, as well as (oddly enough) the low temperature not being available in a few cases. I also recorded average and record temperatures for all days considered.

[[Image:Weather_data.png]]

h2. Calculations on Forecast Accuracy and Consistency


In order to assess the accuracy and consistency of each weather forecast site, I first found the absolute values of the differences between the predicted and actual temperatures. For example, considering the data presented in Table 2 above, the actual high temperature in Houston, TX on Thursday, December 21[^st^] was 70° F. At noon on Thursday, December 14[^th^], _The Weather Channel_ online predicted the high on that day would be 60° F, 10° off of the actual and yielding an "accuracy value" of _10_. On the same day, _MSN Weather_ online predicted a high of 45° F, corresponding to a value of _25_ - the higher number indicating a poorer performance. The tables turned somewhat two days later when _The Weather Channel_ predicted 66° and _MSN Weather_ predicted 68°, resulting in accuracy values of _4_ and _2_, respectively.
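The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the calculation, not the spreadsheet I actually used; the function name @accuracy_value@ and the dictionary layout are assumptions made for the sketch, while the temperatures are the ones quoted in the paragraph.

```python
def accuracy_value(predicted_f, actual_f):
    """Absolute difference between a predicted and an actual temperature (deg F)."""
    return abs(predicted_f - actual_f)

# Actual high in Houston on December 21 (from the example above).
actual_high = 70

# Keys are (site, days in advance); values are the predicted highs.
# December 14 is 7 days ahead of December 21; two days later is 5 days ahead.
forecasts = {
    ("The Weather Channel", 7): 60,
    ("MSN Weather", 7): 45,
    ("The Weather Channel", 5): 66,
    ("MSN Weather", 5): 68,
}

accuracy = {key: accuracy_value(pred, actual_high) for key, pred in forecasts.items()}

for (site, days_ahead), value in accuracy.items():
    print(f"{site}, {days_ahead} days ahead: {value}")
```

A lower value is better, so in this sketch _MSN Weather_ goes from the worse score at seven days out to the better score at five.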

Next, I calculated the mean and standard deviation of these accuracy values for each weather forecast site and predictive period (e.g., _AccuWeather_ two days in advance, _The Weather Network_ four days in advance, etc.). The mean value represents the average accuracy and the standard deviation represents the consistency, or "spread," of the accuracy values.[18]
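As a rough sketch of this step, the two summary numbers for one site and one predictive period could be computed as follows. The accuracy values here are made up for illustration, and the use of the _sample_ standard deviation (@statistics.stdev@) is an assumption; a spreadsheet's population formula would give a slightly smaller number.

```python
import statistics

# Hypothetical accuracy values (absolute errors, deg F) for one site
# at one predictive period, e.g. highs forecast two days in advance.
accuracy_values = [2, 4, 3, 1, 5]

# Mean absolute error: the "accuracy" figure (lower is better).
mean_accuracy = statistics.mean(accuracy_values)

# Sample standard deviation: the "consistency" figure (lower is better).
consistency = statistics.stdev(accuracy_values)

print(f"accuracy = {mean_accuracy:.1f}, consistency = {consistency:.2f}")
```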

The following tables and graphs summarize the gathered weather forecast accuracy and consistency data by organizing it into columns by the number of days previous. Note that in both cases a _lower_ number represents a better performance.

[[Image:Accuracy.png]]


[[Image:Accuracy_high_bar.png]]


[[Image:Accuracy_low_bar.png]]


[[Image:Consistency.png]]


[[Image:Consistency_high_bar.png]]


[[Image:Consistency_low_bar.png]]

h2. Ranking Forecasts by Accuracy and Consistency


I then ranked the accuracy and consistency of each weather forecast site as compared to the competing sites (i.e., the other sites providing forecasts). Note that days 10 through 14 were omitted as _AccuWeather_ was the only site providing a weather forecast.

[[Image:Accuracy_rank.png]]


[[Image:Consistency_rank.png]]

Additionally, I organized the accuracy and consistency rankings with respect to short, mid and long-term weather forecasts as dictated by the data groupings. I scored each weather forecast site with points corresponding to each ranking it received within the specified time period. For example, in order to rank weather forecast sites in the short-term grouping (0-4 days in advance), I multiplied the number of first place ranks by 10, added the number of second place ranks multiplied by 9, and continued in this manner through adding the number of tenth place rankings multiplied by 1. The higher the score, the higher the ranking. Mid and long-term group rankings were similarly determined with the calculations modified to fit the number of participating weather forecast sites.
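The point scheme described above can be sketched as follows. The function name @group_score@ and the example ranks are hypothetical; the only thing taken from the text is the rule that, with ten sites, first place is worth 10 points and tenth place is worth 1. Generalizing this to "last place is worth 1 point" for groups with fewer sites is my reading of "modified to fit the number of participating sites," so treat it as an assumption.

```python
def group_score(ranks, n_sites):
    """Sum rank points: 1st place earns n_sites points, last place earns 1."""
    return sum(n_sites - rank + 1 for rank in ranks)

# Hypothetical daily ranks for one site on days 0-4 among ten competing sites.
example_ranks = [1, 2, 1, 10, 3]
example_score = group_score(example_ranks, n_sites=10)

print(example_score)  # 10 + 9 + 10 + 1 + 8 = 38
```

Sites are then ordered within the group by this score, highest first.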

[[Image:Accuracy_rank_group.png]]


[[Image:Consistency_rank_group.png]]

h2. Correlation of Variables to Weather Forecast Accuracy and Consistency


I also ran correlation analysis on various factors to see if they explained any of the accuracy differences observed.[19] Specifically, I analyzed the following variables in order to check for the listed corresponding correlation trends:
* _Number_ - trends over time. I numbered the days for which the temperature was being forecast from 1 to 62, starting with December 1, 2006 (#1) and ending with January 31, 2007 (#62).
* _Day_ - trends with the day of the week.
* _Hi/Lo_ - trends between high and low forecasts. If the forecast was for a daytime high, this value was 0; if it was for an overnight low, it was 1.
* _Site_ - trends between different sites. Again a column for each site was used with a 1 value for when the particular site was making the prediction and a 0 value when it was another site.
* _Previous_ - trends between the number of days ahead the forecast was predicting. Starting with predictions made on the same day (i.e., the forecast for today's high or tonight's low), this value ran from 0 to 14.

[[Image:Regression_data.png]]
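To make the 0/1 coding of the _Site_ and _Hi/Lo_ variables concrete, here is a minimal sketch of correlating one such column with the accuracy values. The rows are invented for illustration, and the hand-written Pearson formula stands in for whatever the spreadsheet computed; @pearson_r@ and @rows@ are names assumed for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical rows: (site, hi_lo, days previous, accuracy value).
# hi_lo is 0 for a daytime high and 1 for an overnight low.
rows = [
    ("MSN", 1, 3, 6),
    ("MSN", 0, 3, 5),
    ("CNN", 1, 3, 2),
    ("CNN", 0, 3, 3),
    ("Unisys", 1, 0, 1),
    ("Unisys", 0, 0, 2),
]

accuracy = [row[3] for row in rows]

# Site column for MSN: 1 when MSN made the prediction, 0 otherwise.
is_msn = [1 if row[0] == "MSN" else 0 for row in rows]

r_msn = pearson_r(is_msn, accuracy)
r_previous = pearson_r([row[2] for row in rows], accuracy)
print(f"r(MSN, accuracy) = {r_msn:.2f}")
print(f"r(previous, accuracy) = {r_previous:.2f}")
```

Because a higher accuracy value means a worse forecast, a positive correlation for a site column would suggest that site tends to be less accurate than the rest.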

I compared the resulting correlation values with some standard values[20] to determine if there were _small_, _medium_, _large_ or _no_ trends correlating with the weather forecast accuracy numbers.

[[Image:correlation_interpretation.png]]
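A lookup along these lines can be sketched in code. The cutoffs below (0.10, 0.30 and 0.50 on the absolute value of the correlation) are the conventional benchmarks usually attributed to Cohen; I am assuming they match the table above, so adjust them if the table differs. The function name @interpret_correlation@ is hypothetical.

```python
def interpret_correlation(r):
    """Label a correlation coefficient using conventional size benchmarks.

    Assumed cutoffs: |r| >= 0.50 large, >= 0.30 medium, >= 0.10 small.
    """
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "none"

for value in (0.05, -0.12, 0.35, -0.60):
    print(value, interpret_correlation(value))
```

The sign is ignored when sizing the trend; it only tells you the direction.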


[[Image:correlation_all.png]]

These results indicated a trend for more accurate weather forecasts closer to the temperature in question and when a high temperature is being predicted. No weather forecast site was shown to be significantly more accurate than another, though - something that does not seem to jibe with the previously generated tables and graphs. It is important to note, however, that these values are generalized over all forecasting periods and for both high and low temperatures. To remove these variables, I ran another correlation analysis on a single slice of the data: in this case, just the high temperature forecasts published 0 days previous.

[[Image:correlation_one_type.png]]

These numbers again showed no significant correlation between the weather forecast site and the accuracy of the weather forecast. Looking back at Figure 1 and Table 2, however, this wasn't too surprising. The forecast accuracies in this selection were reasonably tightly grouped, with the exception of _BBC_, and _BBC_ was on the verge of having a small correlation. I made another selection and ran a third correlation analysis, this time on the more loosely grouped accuracy values for low temperature forecasts published 3 days previous. These numbers showed small correlations for _MSN_ and _Unisys_, which are reflected by the relatively large separation from the pack in Figure 2 and Table 2.

[[Image:correlation_two_type.png]]


h2. Conclusion


While the tabled rankings brought out the competitor in me, it was obvious from the correlation analysis that only the numbers clearly "separate from the pack" in Figures 1-4 are better or worse by enough to be statistically significant. Thus, there were a few shiners and a few duds, but the variation among the rest can be explained away by chance. The trends I observed included:
* In seeking high temperature forecasts, it looked best to use _IntelliCast_ or _The Weather Channel_ in the long term, but there wasn't a clear leader in the short to mid term. _BBC_ seemed unreliable in all cases, as well as _MSN_ in the long term. _The Weather Network_, _CNN_ and _Unisys_ all had blemishes (3, 4 and 0 days in advance, respectively), but were generally in with the pack.
* In seeking low temperature forecasts, _IntelliCast_ and _The Weather Channel_ were again the choice in the long term, joined by _Unisys_ in the short term. _BBC_ was still a dud in anything but the very short term, and _MSN_ performed horribly in nearly all cases, as did _AccuWeather_ in the long term.
* _AccuWeather_ was the clear leader in anything greater than 10 days in advance, being the only site providing a weather forecast.

In addition to the above observations/recommendations, it was clear from the correlation analysis that the further removed a weather forecast is, the less accurate it will likely be. Much more unexpectedly, however, it was also clear that predictions of the overnight low temperature are less accurate than those of the daytime high.

Overall, the accuracy and consistency values prescribed caution - even when considering the most accurate and consistent weather forecasts. For example, if I wanted to know the high temperature for tomorrow, the numbers showed _CNN Weather_ to be the most accurate Internet weather resource. Its weather forecast, however, comes with an average accuracy value of over 3° and a consistency value of over 2°. Thus, the conscientious browser would need to mentally append "with an accuracy of 3°±2°" to the temperature prediction and realize this results in a two degree span at best and a _ten_ degree span at worst. This means a pessimist would be justified in reading a prediction of "75°" for tomorrow's high as nothing more than "70°-80°" - and this using the _best_ online resource available! Granted, the optimist would also be justified in reading the same prediction as "74°-76°," but it's always best to plan for the worst case - especially when going to Schlitterbahn.
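The optimist and pessimist readings above follow from simple arithmetic, sketched here. The function name @forecast_ranges@ is hypothetical, and treating "accuracy ± consistency" as a best-case and worst-case error is my informal reading rather than a formal confidence interval.

```python
def forecast_ranges(forecast_f, accuracy, consistency):
    """Return (optimist, pessimist) temperature ranges for a forecast.

    The optimist assumes the smallest likely error (accuracy - consistency);
    the pessimist assumes the largest (accuracy + consistency).
    """
    best_error = max(accuracy - consistency, 0)
    worst_error = accuracy + consistency
    optimist = (forecast_f - best_error, forecast_f + best_error)
    pessimist = (forecast_f - worst_error, forecast_f + worst_error)
    return optimist, pessimist

# Tomorrow's high forecast as 75 deg F, with accuracy 3 and consistency 2.
optimist, pessimist = forecast_ranges(75, accuracy=3, consistency=2)
print("optimist:", optimist)
print("pessimist:", pessimist)
```

With these inputs the optimist's range is 74 to 76 (a two degree span) and the pessimist's is 70 to 80 (a ten degree span), matching the figures in the paragraph.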

Many of the other less accurate weather forecasts, then, seem to be practically worthless for all but the most optimistic. Take, for example, the best option for determining the overnight low temperature a week from today, _The Weather Channel_. The appropriate accuracy baggage on this Internet weather forecast site would be ~5.6°±4.4°, pessimistically reducing a forecast of "50°" to "40°-60°" (!!). Perhaps this explains why only four sites ventured to provide weather forecasts more than a week in advance, and four others didn't even push beyond four days.

So, what of my skepticism? I'd say it's going strong. While the difference between online weather forecast sites was less than I expected, the accuracy and consistency results support a strong dose of skepticism anytime you look up the weather on the Internet.[21]

h2. Notes

fn1. "Schlitterbahn Waterpark Resort." _Schlitterbahn.com._ Accessed January 2007 from "http://www.schlitterbahn.com/nb/":http://www.schlitterbahn.com/nb/. According to this site, "Schlitterbahn Waterpark Resort® received top awards in the World’s Best Waterpark, World’s Best Waterpark Landscaping and the World’s Best Waterpark Ride categories during the 2006 Golden Ticket Award ceremony at Holiday World amusement park."

fn2. "'Devastating' Texas floods kill 9." _CNN.com_. Accessed January 2007 from "http://archives.cnn.com/2002/WEATHER/07/05/texas.flooding/index.html":http://archives.cnn.com/2002/WEATHER/07/05/texas.flooding/index.html. I personally participated in the cleanup efforts following this flooding as a member of a group of about 15 people that spent an entire day tearing apart a house that had been picked up in this flood and dropped on its site. You bet we were going to check the weather.

fn3. One example of such duplication is that both _Yahoo Weather_ (accessed January 2007 from "http://weather.yahoo.com/":http://weather.yahoo.com/) and _USAToday Weather_ (accessed January 2007 from "http://asp.usatoday.com/weather/weatherfront.aspx":http://asp.usatoday.com/weather/weatherfront.aspx) use _The Weather Channel_ (accessed January 2007 from "http://www.weather.com/":http://www.weather.com/) as their source.

fn4. "Google Toolbar Features." _Google_. Accessed December 2007 from "http://toolbar.google.com/button_help.html":http://toolbar.google.com/button_help.html.

fn5. "Our Search: Google Technology." _Google_. Accessed December 2007 from "http://www.google.com/technology/index.html":http://www.google.com/technology/index.html. According to Google, "PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value."

fn6. While some may suspect biased results due to selecting one of the weather forecast sites included in my survey for the actual temperature comparison, these values are reported from third-party measuring stations such as airports without regard to the reporting site.

fn7. _The National Weather Service_. Main URL: "http://www.nws.noaa.gov/":http://www.nws.noaa.gov/. Data gathered from: "http://www.srh.noaa.gov/forecast/MapClick.php?CityName=Houston&state=TX&site=HGX":http://www.srh.noaa.gov/forecast/MapClick.php?CityName=Houston&state=TX&site=HGX. Accessed January 2007.

fn8. _BBC Weather_. Main URL: "http://www.bbc.co.uk/weather/":http://www.bbc.co.uk/weather/. Data gathered from: "http://www.bbc.co.uk/weather/5day_f.shtml?world=0268":http://www.bbc.co.uk/weather/5day_f.shtml?world=0268. Accessed January 2007.

fn9. _The Weather Channel_. Main URL: "http://www.weather.com/":http://www.weather.com/. Data gathered from: "http://www.weather.com/weather/tenday/USTX0617?from=month_topnav_undeclared":http://www.weather.com/weather/tenday/USTX0617?from=month_topnav_undeclared. Accessed January 2007.

fn10. _The Weather Underground_. Main URL: "http://www.wunderground.com/":http://www.wunderground.com/. Data gathered from: "http://www.wunderground.com/cgi-bin/findweather/getForecast?query=houston%2C+tx":http://www.wunderground.com/cgi-bin/findweather/getForecast?query=houston%2C+tx. Accessed January 2007.

fn11. _IntelliCast_. Main URL: "http://www.intellicast.com/IcastPage/LoadPage.aspx":http://www.intellicast.com/IcastPage/LoadPage.aspx. Data gathered from: "http://www.intellicast.com/IcastPage/LoadPage.aspx?seg=LocalWeather&SearchResults=True&loc=kiah&product=Forecast&prodgrp=Forecasts&prodnav=none":http://www.intellicast.com/IcastPage/LoadPage.aspx?seg=LocalWeather&SearchResults=True&loc=kiah&product=Forecast&prodgrp=Forecasts&prodnav=none. Accessed January 2007.

fn12. _CNN Weather_. Main URL: "http://www.cnn.com/WEATHER/":http://www.cnn.com/WEATHER/. Data gathered from: "http://weather.cnn.com/weather/forecast.jsp?locCode=HOU":http://weather.cnn.com/weather/forecast.jsp?locCode=HOU. Accessed January 2007.

fn13. _MSN Weather_. Main URL: "http://weather.msn.com/":http://weather.msn.com/. Data gathered from: "http://weather.msn.com/tenday.aspx?wealocations=wc:USTX0617":http://weather.msn.com/tenday.aspx?wealocations=wc:USTX0617. Accessed January 2007.

fn14. _The Weather Network_. Main URL: "http://www.theweathernetwork.com/":http://www.theweathernetwork.com/. Data gathered from: "http://www.theweathernetwork.com/weather/cities/usa/Pages/USTX0617.htm#longTerm":http://www.theweathernetwork.com/weather/cities/usa/Pages/USTX0617.htm#longTerm. Accessed January 2007.

fn15. _Unisys_. Main URL: "http://weather.unisys.com/":http://weather.unisys.com/. Data gathered from: "http://weather.unisys.com/forecast.cgi?Name=houston%2C+tx&Go.x=0&Go.y=0":http://weather.unisys.com/forecast.cgi?Name=houston%2C+tx&Go.x=0&Go.y=0. Accessed January 2007.

fn16. _AccuWeather_. Main URL: "http://home.accuweather.com/index.asp?partner=accuweather":http://home.accuweather.com/index.asp?partner=accuweather. Data gathered from: "http://wwwa.accuweather.com/forecast-15day.asp?partner=accuweather&traveler=0&zipChg=1&zipcode=77001&metric=0":http://wwwa.accuweather.com/forecast-15day.asp?partner=accuweather&traveler=0&zipChg=1&zipcode=77001&metric=0. Accessed January 2007.

fn17. _The Weather Channel_. Main URL: "http://www.weather.com/":http://www.weather.com/. Data gathered from: "http://www.weather.com/weather/pastweather/USTX0617?from=36hr_topnav_undeclared":http://www.weather.com/weather/pastweather/USTX0617?from=36hr_topnav_undeclared. Accessed January 2007.

fn18. It may be best to picture these values as if you were at a shooting range. Someone who shoots close to the center is accurate, while someone who shoots with a tight grouping at any location is consistent.

fn19. Weiss, Neil A. "Elementary Statistics: Descriptive Methods in Regression and Correlation." Addison Wesley Longman, Inc. 1999. See this text for more information on the methods and calculations involved in correlation analysis.

fn20. Cohen, J. _Statistical power analysis for the behavioral sciences, 2nd ed._ Hillsdale, NJ: Lawrence Erlbaum Associates. 1988.

fn21. I'd be willing to bet skepticism would be warranted for weather forecasts on television, also, but that's another article for another time.
