Statistical analysis on 2011 NCAA meet

Post by sagehen1 » Wed Dec 21, 2011 2:51 pm

So I figured this community might appreciate this more than most - for my final project in my statistical computing class, I analyzed the difference between swimmers' seed times and their actual performance at the 2011 D3 National Championship Meet. More specifically, I wanted to see which factors best predicted performance in an event: having a seed time achieved before January (i.e. the swimmer likely rested in December), having an A cut, distance of the event, gender, stroke, and class year. I used classification and regression trees and random forests to run this analysis (if you're interested, ask, but these are basically just techniques for determining which variables are the most important predictors).
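
For readers unfamiliar with classification trees, here is a minimal sketch of the core idea: at each step the tree picks the variable whose split best separates the outcomes, measured here by the drop in Gini impurity. This is not the code from the project. The function names, the field names (`seed_before_jan`, `a_cut`, `faster`), and the toy rows are all made up for illustration and are not data from the meet.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all labels agree)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gain(rows, feature, target="faster"):
    """Impurity reduction from splitting rows on one boolean feature."""
    parent = [r[target] for r in rows]
    left = [r[target] for r in rows if r[feature]]
    right = [r[target] for r in rows if not r[feature]]
    n = len(rows)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

def best_split(rows, features, target="faster"):
    """Feature with the largest impurity reduction, i.e. the first tree split."""
    return max(features, key=lambda f: split_gain(rows, f, target))

# Hypothetical toy rows, NOT results from the 2011 meet.
toy_rows = [
    {"seed_before_jan": True,  "a_cut": True,  "faster": True},
    {"seed_before_jan": True,  "a_cut": False, "faster": True},
    {"seed_before_jan": True,  "a_cut": False, "faster": True},
    {"seed_before_jan": False, "a_cut": True,  "faster": True},
    {"seed_before_jan": False, "a_cut": False, "faster": False},
    {"seed_before_jan": False, "a_cut": True,  "faster": False},
]

first_split = best_split(toy_rows, ["seed_before_jan", "a_cut"])
```

A random forest repeats this kind of split search over many trees, each built on a resampled subset of the rows and variables, and then averages how much each variable helped across all of them.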

- The most significant variable (out of the six that I was able to consider from the USA Swimming database of times) was having a seed time achieved before January. In the 171 races where swimmers had seed times from before January, they dropped an average of .1435 seconds per 50 and swam faster 71.35 percent of the time. If your seed time was from after December, on average you just matched your seed time and you were about 50/50 on swimming faster.
- Having an A cut going in also seemed significant, even if your seed time was from after December; swimmers with an A cut and a seed time from 2011 dropped an average of .08627 seconds per 50, while those with a seed time from 2011 but no A cut gained .02529 seconds.
- There was an interesting pattern between distance and likelihood of dropping time in your event. The event where you were least likely to drop time was the 50 free, where only 38.95% of swimmers went faster. However, discounting the 50, the shorter the race, the more likely you were to drop time.
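
The per-50 figures above normalize the time change by race length so that different events can be compared. A rough sketch of that arithmetic follows, assuming the drop is computed as (seed time minus final time) divided by the number of 50s in the race. The helper names and the toy swims are hypothetical and are not rows from the meet data.

```python
from statistics import mean

def drop_per_50(seed_s, final_s, distance):
    """Seconds dropped per 50 of race distance; positive means faster than seed."""
    return (seed_s - final_s) / (distance / 50)

def summarize(swims, key):
    """Group swims by a factor and report count, mean drop per 50, and % faster."""
    groups = {}
    for s in swims:
        drop = drop_per_50(s["seed"], s["final"], s["distance"])
        groups.setdefault(s[key], []).append(drop)
    return {
        value: {
            "n": len(drops),
            "mean_drop_per_50": mean(drops),
            "pct_faster": 100 * sum(d > 0 for d in drops) / len(drops),
        }
        for value, drops in groups.items()
    }

# Hypothetical toy swims (times in seconds), NOT data from the meet.
toy_swims = [
    {"seed": 100.0, "final": 99.0,  "distance": 100, "seed_before_jan": True},
    {"seed": 240.0, "final": 239.0, "distance": 200, "seed_before_jan": True},
    {"seed": 25.0,  "final": 25.5,  "distance": 50,  "seed_before_jan": False},
    {"seed": 60.0,  "final": 59.0,  "distance": 100, "seed_before_jan": False},
]

summary = summarize(toy_swims, "seed_before_jan")
```

The same `summarize` call with a different `key` (for example a hypothetical `a_cut` or `distance` field) would give the other breakdowns.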

I attached the pdf if anyone's interested. I'd be interested to hear any feedback.
D3 Meet Analysis.pdf

Re: Statistical analysis on 2011 NCAA meet

Post by N Dynamite » Wed Dec 21, 2011 3:23 pm

Interesting stuff - thanks for sharing. I'm curious what conclusions can truly be drawn. For instance, would the people who got a selectable cut in December have performed as well in March had they tapered only in February? Was there any way to account for the people who tapered both in December and February/December only/February only? If so, could you rank those? I don't think you'd get much argument that it would be best to taper in December, get your cut, and then train through til March. But which would be better - tapering in December and needing to retaper in February to improve your time, or just tapering in February? This may generate some strong opinions, but if you had to choose between tapering in both December and February or in February only, I'd still think it would be better to do one taper prior to nationals and do it in February than to have to taper twice.
