Researching Sponsors: Part 3: Data Analysis
Author’s note: This is part 3 of a 4-part series on deriving the sponsor formula. Part 1 on Tuesday discussed the genesis of the project and the early stages of research. Part 2 on Wednesday discussed the remainder of the controlled research. Part 4 on Saturday will reveal and discuss the formula itself.
And now for something completely different. Back in late February, as I was beginning this research properly with Charisma tests, I posted on the United States forum asking users to send me their own observations. Three users responded and, with varying frequency, submitted data via a Google Form I created. At the time, I viewed this data not as a means of deriving the formula, but as a means of testing it.
Now, however, I recognized that it could help speed me toward my goal. I had three untested scenarios: rating + Charisma, rating + PR managers, and everything at once. While most of the user-provided data fell into the last category, I had a few data points in the second category, both from myself (collected during my breaks when a test driver was not replaced for a couple of weeks) and from one user who was running a short-term driver with a Charisma of 50.
Analyzing that data allowed me to combine the Charisma/PR manager formula and the rating formula into a Frankenformula that described the sponsor formula for all variables, but only worked for a Charisma of 50. However, I also had over 50 data points that included Charisma as well. By calculating the difference between the actual weekly payment and the result of this weird formula for several of those data points, I was able to quickly deduce that the two terms of the formula involving rating both scaled with Charisma. Adjusting the coefficients and including the Charisma variable C allowed me to factor C out of the expression entirely.
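One way to spot this kind of scaling, not necessarily the exact method used here, is to compare each observed payment with the Charisma-50 prediction and see whether the ratio tracks Charisma divided by 50. The sketch below is in Python. The field names, the toy formula, and the toy data are all made up for illustration; none of them come from the actual research.

```python
from typing import Callable, List, NamedTuple, Tuple


class Observation(NamedTuple):
    rating: float   # driver rating at the moment of the offer
    pr_skill: int   # PR manager skill (hypothetical field name)
    charisma: int   # driver Charisma
    payment: float  # weekly payment actually offered


def charisma_ratios(
    observations: List[Observation],
    formula_at_50: Callable[[float, int], float],
) -> List[Tuple[int, float, float]]:
    """For each observation, pair observed/predicted with charisma/50."""
    rows = []
    for obs in observations:
        predicted = formula_at_50(obs.rating, obs.pr_skill)
        rows.append((obs.charisma, obs.payment / predicted, obs.charisma / 50))
    return rows


# Made-up stand-in for the Charisma-50 formula; NOT the real sponsor formula.
def toy_formula_at_50(rating: float, pr_skill: int) -> float:
    return 50 * (2 * rating + 3 * pr_skill)


# Made-up observations generated from charisma * (2 * rating + 3 * pr_skill).
toy_data = [
    Observation(rating=10.0, pr_skill=1, charisma=60, payment=1380.0),
    Observation(rating=20.0, pr_skill=0, charisma=96, payment=3840.0),
]

for charisma, observed, expected in charisma_ratios(toy_data, toy_formula_at_50):
    print(charisma, observed, expected)
```

If the two ratio columns agree for every row, the whole expression scales with Charisma and C can be pulled out front. If only some terms scaled, the ratios would drift apart as rating or PR skill changed.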
After nearly two seasons of work, I had a hypothesis for the final formula. I also had a battery of tests ready to go. I added a column to the spreadsheet, plugged in the formula, and filled it down the entire column. To make errors easier to spot, I also used a little conditional formatting to highlight each row in green for a match and red for a mismatch.
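The same green/red check can be sketched outside a spreadsheet. The Python below is only an illustration: the field names are assumptions, and `toy_formula` is a made-up placeholder standing in for the hypothesized formula, which is not given in this part of the series.

```python
from typing import Callable, List, NamedTuple


class Offer(NamedTuple):
    rating: float        # driver rating when the offer arrived
    charisma: int        # driver Charisma
    pr_skill: int        # PR manager skill (hypothetical field name)
    actual_payment: int  # weekly payment actually offered, in euros


def flag_rows(offers: List[Offer], formula: Callable[[Offer], int]) -> List[str]:
    """Mark each row green when the formula reproduces the payment exactly."""
    return ["green" if formula(o) == o.actual_payment else "red" for o in offers]


# Made-up placeholder; NOT the real sponsor formula.
def toy_formula(offer: Offer) -> int:
    return offer.charisma * 100 + offer.pr_skill * 10 + int(offer.rating)


# Made-up rows: the first matches the toy formula, the second is off by 1.
toy_offers = [
    Offer(rating=12.0, charisma=50, pr_skill=3, actual_payment=5042),
    Offer(rating=20.5, charisma=60, pr_skill=0, actual_payment=6021),
]

flags = flag_rows(toy_offers, toy_formula)
print(flags.count("red"), "of", len(flags), "rows are red")
```

Requiring an exact match rather than a tolerance mirrors the spreadsheet approach: any row that is off by even €1 shows up as red and gets a closer look.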
Out of 153 total data points, 21 were red. I studied them. Seven of the red rows were my own data points, collected from my F1 driver at the tail end of his career. A quick examination showed that I had put the total rating in the season rating column and vice versa. Switching them instantly turned all 7 green, leaving me with 14 red rows, all of which were submitted by other users.
One of the difficulties of getting data for the sponsor offer is that you need to look at the exact data at the moment of the sponsor offer. For example, if your sponsor offer is received in the afternoon (US time) and you don’t check until evening, then the rating is likely to be wrong (ratings update at 5:50 PM US Eastern Time). The same could happen with an employee who improves their PR Manager skill after the offer and before you check. As such, when I initially began collecting user data, I anticipated this type of user error and included specific instructions and warnings on the form to try to combat it.
To paraphrase a famous saying: “No plan survives contact with the public.” Those instructions and warnings were not always followed, and sometimes users simply submitted what they saw when they checked. Thankfully, rating histories are public information in this game. Eight of the red rows were turned green by checking the rating history and correcting the rating to the previous day’s value. Three more rows were turned green by reverting a single PR Manager skill increase. That left 3 red rows.
Looking closely at these three red rows, one was off by only €1, and the other two could be brought within €1 by correcting the rating as described above. I carried the formula out a few additional decimal places and noticed that each result was within three eurocents of rounding down instead of up, which would fix all three. I have seen evidence of rounding errors in driver rating histories and on the driver profile (where the season and cumulative ratings are added together) before, so I know that the system carries more than three decimal places. I guessed that the rounding of the displayed driver rating in these three cases was pushing the formula result over the threshold so that it rounded the other way. Indeed, two rows were turned green by subtracting just 0.0001 from the rating, and the last was turned green by subtracting 0.0003, all within the range of a rounding error.
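This rounding check can be sketched as a small search: lower the displayed rating in steps of 0.0001 and stop as soon as the formula reproduces the observed payment. The Python below is a sketch under assumptions. `toy_formula` is a made-up stand-in (1000 times the rating, rounded half up), not the real sponsor formula, and the step size and search limit are illustrative choices.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Callable, Optional


def smallest_fixing_nudge(
    rating: Decimal,
    actual_payment: int,
    formula: Callable[[Decimal], int],
    step: Decimal = Decimal("0.0001"),
    max_steps: int = 5,
) -> Optional[Decimal]:
    """Return the smallest amount to subtract from rating so that the
    formula matches actual_payment, or None if no nudge within range works."""
    for k in range(max_steps + 1):
        nudge = k * step
        if formula(rating - nudge) == actual_payment:
            return nudge
    return None


# Made-up stand-in; NOT the real sponsor formula.
def toy_formula(rating: Decimal) -> int:
    return int((rating * 1000).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Made-up case: the displayed rating sits right on a rounding boundary.
print(smallest_fixing_nudge(Decimal("12.3455"), 12345, toy_formula))
```

Using `Decimal` instead of floats keeps the nudges exact, so a result that flips at 0.0001 reflects the formula's rounding and not binary floating-point noise.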
I was mostly satisfied that I had my formula, but I wanted a few more controlled tests before calling it good. I hired a PR manager with my existing driver and ran him for one more week, collecting data that fit with the formula. I then started a new driver with maximum Charisma (now 96) and ran him for two weeks. Again, the data fit with the formula. While I have not done properly controlled tests on the interaction between all three, I have literally dozens of data points in that category that all fit the formula. It’s also really, really hard to test the interaction between three variables at once in a controlled fashion, especially when one of them (rating) is extremely time-consuming to vary. Therefore, I decided that I was satisfied with the formula.
On Tuesday of last week, I received a message from Numpty with some thoughts on the formula. After reading my thread in the United States forum, Numpty had independently derived the effect of rating and added it to the formula I had posted for everything but rating. At first glance, his formula appeared different from my own. Careful examination, however, showed that the two formulas were equivalent, and mine could be converted to his by factoring. Over the next several days, Numpty and I discussed the subject further and did some more factoring, which I believe has led to the simplest and easiest-to-understand form.
Look for Part 4 tomorrow, in which the actual formula will be revealed and discussed.