Winning And Losing In Figure Skating

 

Edmund L. Russell III

 

The ISU has proposed a new system for marking and for placements, which will be called the CoP (Code of Points) System. The WSF has proposed using the Best of Majority (BOM) system instead of the CoP or the One-By-One (OBO) system. This paper endeavors to compare these systems to a universal standard – principles by which marking and placement systems should be compared.

 

In this paper I will use some examples from other sports to clarify some of the concepts – principally because these other sports are generally simpler in both marking and placements than Figure Skating.  However the principles learned from these other sports remain the same.  I will also,  of course,  discuss the implications of these in the marking and placement systems in Figure Skating.

 

No value judgments will be made in this analysis,  just a simple statement of facts and comparisons.  It is left to the reader to determine which of the systems is better suited for figure skating.

 

 

Marking And Placement – Two Separate Concepts

 

Determining who wins and who places in any competition is a function of both the marking and the placement systems.   The marking system (or scoring system) has a dual purpose:  awarding marks for event completions and defining which event outcome is more important than which other event outcome in a competition. The placement system serves two purposes as well:   defining the order of finish and defining the sense of what it means to be a winner.

 

For example, in American football, the marking system assigns various numbers of points to completed events such as touchdowns, conversions, field goals, and safeties. The usual placement system simply adds up the points, and the team with the highest number of points is the winner. In the Super Bowl, the placements are determined in the same manner as in a single game. The sense of the winner in a football game is the team which earns the most points on a particular day. The sense of the winner in the Super Bowl is the best team on the particular day of the Super Bowl. The winner overall may or may not have the best win-loss record in football for that year.

 

In contrast,  in Baseball,   all events are worth the same number of points in the marking system – a single point each time a player crosses home plate.  In a single game the placement is determined by the total of the points as in Football.   However,  unlike Football,  the placement within the World Series is determined by the win-loss record of the two teams in the World Series.  The sense of the winner in a single game is the best team of the day.  The sense of the winner of the World Series is the team which has the best win-loss record in the World Series.

 

In both Baseball and Football, changes in the relative importance of different scoring events can be made by modifying the marking (scoring) system. Football would be greatly different if field goals were worth more points than touchdowns – we would see far more field goals attempted instead of the riskier activity of trying for a touchdown. In baseball, if home runs were worth 6 runs, many more players would try to hit home runs, and the team with the greatest number of home runs would generally win any given game.
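
The effect of reweighting can be sketched in a few lines of Python. This is only an illustration: the event counts for the two teams are invented, and FIELD_GOALS_FAVORED is a hypothetical alternative table, not a proposal from any rule book. Only the standard values (touchdown 6, field goal 3) reflect the game as played, and conversions are ignored for brevity.

```python
# Marking system: points awarded per scoring event.
STANDARD = {"touchdown": 6, "field_goal": 3}
# Hypothetical alternative in which field goals outweigh touchdowns.
FIELD_GOALS_FAVORED = {"touchdown": 3, "field_goal": 6}


def total_points(events, marking):
    """Apply a marking table to a team's count of scoring events."""
    return sum(marking[name] * count for name, count in events.items())


def winner(teams, marking):
    """Placement system: the team with the most points wins."""
    return max(teams, key=lambda team: total_points(teams[team], marking))


# Invented event counts for one game.
teams = {
    "A": {"touchdown": 3, "field_goal": 1},  # 21 standard, 15 alternative
    "B": {"touchdown": 1, "field_goal": 4},  # 18 standard, 27 alternative
}

print(winner(teams, STANDARD))             # A
print(winner(teams, FIELD_GOALS_FAVORED))  # B
```

The same events on the field produce a different winner once the marking table changes, before any team has even adjusted its strategy to the new table.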

 

In these conceptually simpler games of football and baseball,  it is easy to see how changes in either the awarding of marks (scores) or the requirements to determine a winner can affect the outcome of the game.  In Figure Skating,   the marking system and placement systems have a similar effect.   We can predict that if jumps are given more weight than spins,  there will be more emphasis on jumps.   We can readily imagine the potential changes in outcome if the short program were given more weight than the long program – it is likely that both the routines and sometimes the winners would be very different.  Even more so if there were no penalty at all for missed jumps -- there would be many more missed jumps.  And if there were no penalty for extra jumps we would probably see a lot more jumps in the programs.   

 

It is now easy to see that every change in the manner of awarding marks and placements will eventually change a sport to some extent. The problem is to determine how it will change and by how much. And of course, the more radical the changes in the marking and placement system, the higher the odds that the changes in the sport will be equally dramatic.

 

The proposed Code of Points system for Figure Skating is a radical change. This proposal weights artistic and technical elements differently than the current marking system – there is much greater weight given to jumps. In addition, failed jumps still earn points and incur what will probably be viewed as only a minor penalty. Finally, it changes the placement system and thus the sense of the winner. In this system the sense of the winner is that of the greatest point earner in a competition. This winner may or may not be the majority favorite, as is chosen in the OBO and Majority systems. The impact on the sport is likely to be substantial, and the character of the sport should be expected to change significantly over several years if the Code of Points system is adopted.

 

 

Testing New Systems

 

In addition to this examination of the impact of changes in the marking and placement systems and how the athletes  will eventually focus their skills,  we should examine what occurs when the scoring and placement system proposals are “tested” by the ISU staff.   They perform a short (often badly inadequate) series of what they call “simulations” to determine if the winners and top placements will be radically different from whatever the current scoring system and placement system yields.  If those winners change radically,  the proposed system under development will be modified to try to bring it back into balance with the existing system.

 

However, note that there are two fatal flaws in this method of "simulation" testing. The most critical one is that the skaters are still competing under the current system. Thus, their strategies and skill sets have not been adjusted to the new scoring system. This does not change substantially even if a short series of real competitions is held under the proposed system, as optimal competition preparation and strategy will take several years to develop under full time use of the new system. The second fatal flaw is that the new system is changed or "tweaked" after the “simulation.” It is often easy to modify the system being “simulated” so that the results come reasonably close to the outcome of the current system.

 

The goal behind this "simulation" is not to fix the new system as much as it is to attempt to answer the question of who would have won such and such an event if the new system were being used.   The developers of a new system don’t want to mention that the winner was someone who actually ended up in 10th place!    If too many "simulations" are held,  it becomes difficult to impossible to tweak the proposed system to be consistent with the real competition results. To this end, the number of simulations is, predictably,  small.

 

As a final comment on the testing of a new system,  under current practice,  the system actually put in place usually does not match the developing system which has been used for the majority of the testing.   Final tweaks are made and a brand new,  untested system is put into service.   It is guaranteed that any system which is tweaked to match prior sense of the winner as closely as possible will not continue to match the prior sense of the winner after it goes into full time use.  

 

I would strongly recommend locking down the proposed new system, not allowing any changes (minor or major), and using it side by side with the current system to see what may occur once the system gets into general use. Of course it is critical that the developers do this in good faith and openly post the lists of their placements instead of keeping them secret as has been done in the past. I would think that 10 locked down “simulations” of each event type (singles, pairs, dance, synchro) would be sufficient for all but the most radical of proposed changes in scoring and placement.

 

 

Properties Of Marking Systems

 

In this section,  the focus will be on the scoring systems which have been in recent use or proposed for figure skating although references to other systems may be made to illustrate concepts more clearly.    In particular the Best of Majority system will be contrasted to the proposed CoP system.

 

There are four basic categories of marking systems and many different implementations of these four basic categories.  The categories are:  objective-absolute,  objective-relative,  subjective-absolute,  and subjective-relative.   The first modifier is objective or subjective;  this modifier indicates the presence of objective equipment (ruler, balance, clock) or no objective equipment (subjective,  trained observer).  The second modifier is absolute or relative;  this modifier indicates whether the comparison is being made to a fixed table or metric (absolute scale) vs not having a fixed table or metric (relative scale). 

 

The proposed CoP system is a subjective-absolute system for marks while the BOM and OBO systems are subjective-relative systems for assigning marks. None of the systems, proposed or current, uses objective measurements -- all use trained observers known as Judges to assign marks.
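
The two modifiers can be written down as a small data structure. The sketch below is only a bookkeeping aid: the class name MarkingSystem and its two flags are my own labels for the categories described above, and the timed track race is included as an example of an objective-absolute system because its mark is a clock reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkingSystem:
    name: str
    objective: bool  # True: measured by equipment; False: trained observer
    absolute: bool   # True: compared to a fixed table or metric

    def category(self):
        first = "objective" if self.objective else "subjective"
        second = "absolute" if self.absolute else "relative"
        return f"{first}-{second}"


systems = [
    MarkingSystem("CoP", objective=False, absolute=True),
    MarkingSystem("BOM", objective=False, absolute=False),
    MarkingSystem("OBO", objective=False, absolute=False),
    MarkingSystem("track (timed race)", objective=True, absolute=True),
]

for system in systems:
    print(f"{system.name}: {system.category()}")
```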

 

 

Absolute Scale Vs Relative Scale:   

 

Systems of marking may use either an absolute scale or a relative scale. Absolute scales for marking are scales which award a fixed mark (points) for completion of a particular element or task. For instance, in track, marks are awarded which are equal to the time it took to complete the course. Relative scales for marking are scales which award marks (points) relative to the performance of others in the competition. Examples of relative marking systems are the BOM and OBO systems currently used in figure skating.

 

It is possible to convert absolute marks to relative marks by simply referencing the absolute marks to one (probably the first) of the competitors and reporting +/- marks based on the index mark. Inherently, no information is lost in this conversion from absolute to relative scales. Relative marking scales are not inherently inferior to absolute marking scales.
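
A minimal sketch of this conversion follows. The marks are invented, the function names are my own, and higher marks are assumed to be better. The two assertions check the claims made above: the order of finish is unchanged, and the original marks can be recovered from the relative ones.

```python
def to_relative(absolute_marks, index=0):
    """Express every mark as a +/- difference from the index competitor."""
    reference = absolute_marks[index]
    return [mark - reference for mark in absolute_marks]


def ranking(marks):
    """Competitor positions ordered from highest mark to lowest."""
    return sorted(range(len(marks)), key=lambda i: marks[i], reverse=True)


absolute = [52, 47, 55, 50]  # invented marks for four competitors
relative = to_relative(absolute)
print(relative)  # [0, -5, 3, -2]

# The order of finish is identical on either scale.
assert ranking(absolute) == ranking(relative)
# Adding the index mark back recovers the original marks.
assert [r + absolute[0] for r in relative] == absolute
```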

 

One major difference in typical application between using absolute vs relative marking,  especially when used to mark performances (such as in a figure skating competition),  is that the absolute scale assumes that the total performance is equal to the sum of the elements.  In contrast,  the relative scale can be used in the same manner and in addition it allows the marking of the performance to be more than the sum of the elements. 

 

Because of this inherent flexibility,  relative marking is generally felt to be superior in marking performances since a performance is frequently seen as more than a simple sum of the parts.  For instance,  a “polished” presentation does in fact look better than an “unpolished” presentation and may in fact be much more difficult even though the same elements are completed.

 

To elucidate a bit, there is no difference when using an absolute scale system of marks between doing pieces of a figure skating program in isolation or doing them in a program. It would be easier to measure performance under an absolute system if we lined up the competitors and asked them to do, one at a time, a particular element (say, a jump) and then -- using automated equipment -- measured various attributes such as height, distance covered in the air, speed into jump, speed out of jump, amount of pre-rotation on the ground, amount of post-rotation on the ground, direction of travel out of jump, etc.

 

Various methodologies may be used to implement relative marking scales. The most common method is to assign the best mark to the best performance, the next best mark to the next best, and so on. In general this is implemented in a fashion such that each athlete's performance is fitted into its proper place among the performances which have already occurred. Often points are assigned as placekeepers, but in theory almost any sort of marker or set of symbols can be used.
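
Fitting each performance into place among those already seen can be sketched as an insertion into a running list. This is an illustration only, not the procedure of any particular rule book: the skater names are invented, and the hidden quality number is a hypothetical stand-in for a Judge's opinion, which in practice is never written down as a number.

```python
def place_skater(ranked, skater, is_better):
    """Insert skater into ranked (best first) using direct comparisons."""
    for position, other in enumerate(ranked):
        if is_better(skater, other):
            ranked.insert(position, skater)
            return position
    ranked.append(skater)
    return len(ranked) - 1


# Hypothetical stand-in for the Judge's opinion of each performance.
quality = {"Ann": 7, "Bea": 9, "Cat": 5, "Dee": 8}


def judge_prefers(a, b):
    return quality[a] > quality[b]


ranked = []
for name in ["Ann", "Bea", "Cat", "Dee"]:  # skating order
    place_skater(ranked, name, judge_prefers)

# The placekeepers: any ordered set of symbols would serve equally well.
ordinals = {name: place + 1 for place, name in enumerate(ranked)}
print(ranked)  # ['Bea', 'Dee', 'Ann', 'Cat']
```

Note that the Judge never needs the quality numbers themselves, only the answer to "is this performance better than that one?"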

 

A less common method is to make pairwise comparisons.  Pairwise comparisons often suffer from indeterminacy when more than one aspect of an outcome is to be compared,  especially if the aspects are equally weighted in importance.  For example, it may be judged that A is better than B and B may be judged to be better than C and yet C may be judged to be better than A.   This indeterminacy usually precludes the use of pairwise comparisons when multiple aspects of a performance are being judged.   When a single aspect of a performance is being judged,  pairwise comparisons do not usually suffer from indeterminacy although it is theoretically possible.
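
The indeterminacy is easy to reproduce. In the sketch below the three aspects and their scores are invented, and "better" is assumed to mean better on a majority of the equally weighted aspects. Every skater then beats one rival and loses to the other, so no order of finish exists.

```python
# Invented scores on three equally weighted aspects (higher is better).
scores = {
    "A": {"jumps": 3, "spins": 2, "footwork": 1},
    "B": {"jumps": 2, "spins": 1, "footwork": 3},
    "C": {"jumps": 1, "spins": 3, "footwork": 2},
}


def beats(x, y):
    """x beats y if x is better on a majority of the aspects."""
    wins = sum(scores[x][aspect] > scores[y][aspect] for aspect in scores[x])
    losses = sum(scores[x][aspect] < scores[y][aspect] for aspect in scores[x])
    return wins > losses


print(beats("A", "B"), beats("B", "C"), beats("C", "A"))  # True True True
```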

 

Another feature of an absolute scale is that there is a fixed level of fineness of difference between two outcomes which can be measured.  Relative scales generally do not suffer from this problem.   For instance,  if skater A does a jump with a 6 inch deliberate change of edge and skater B does the same jump with a 12 inch deliberate change of edge,  the absolute system will apply the same deduction for both skaters while relative marking can award a lower mark to the skater with the longer change of edge.
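
The change-of-edge example can be sketched as follows. The flat deduction of 1 point is a hypothetical schedule value chosen for illustration, not a figure taken from any published table; the 6 inch and 12 inch lengths are those of the example.

```python
def absolute_deduction(edge_change_inches):
    """Hypothetical fixed schedule: one flat deduction for any change of edge."""
    return 1 if edge_change_inches > 0 else 0


def relative_order(edge_changes):
    """Direct comparison: the shorter change of edge ranks ahead."""
    return sorted(edge_changes, key=edge_changes.get)


edge_changes = {"A": 6, "B": 12}  # inches
deductions = {s: absolute_deduction(x) for s, x in edge_changes.items()}

print(deductions)                    # {'A': 1, 'B': 1}
print(relative_order(edge_changes))  # ['A', 'B']
```

The fixed schedule cannot separate the two skaters, while the direct comparison can.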

 

The proposed CoP system for figure skating compares the outcome of key elements in a program to a schedule of points and therefore uses an absolute scale.  Because it uses an absolute scale,  the CoP system assumes that a skating performance is equal to the sum of the pieces.  And because the CoP system uses an absolute scale,  there is a lower limit to the level of fineness in performance differences the system can measure. 

 

The BOM and OBO systems for figure skating use a relative scale. These two systems can assign marks as either the sum of the pieces or more than the sum of the pieces. In addition, because of the type of relative scale used in BOM and OBO, these systems are capable of measurements much finer than the CoP system is capable of making.

 

 

Objective Vs Subjective Marking:

 

The vast majority of sports measure only one aspect of a performance while some sports measure multiple aspects of a performance. How the measurements are made determines whether the measurement is objective or subjective.

 

An objective measurement of an event is a measurement made by equipment – human judgment does not enter into the measurement to any great extent.   Ideally, no human judgment is required at all,  but this is rare in practice.  Common types of objective measurements in sports include distance and time.   In some sports,  such as baseball and basketball,  the event being measured is very well defined and does not take a lot of human judgment but the observation of the event is often left up to a person.  Such measurements may also be considered to be objective most of the time.  There are exceptions of course and during those exceptions,  baseball and basketball have some points (marks) added due to clearly subjective calls by referees. Objective measurements,  especially those made by equipment,  are often very accurate.

 

Subjective measurements imply observation by a trained observer.  In subjective measurements human judgment is a significant component of the measurement.  When a subjective measurement is made,  it is generally made through a comparison to some standard.  The standard is usually either a minimum standard (at least as good as outcome),  average standard (usual outcome),  or the maximum standard (ideal outcome).  

 

Of these,  the weakest subjective comparison is to the average standard since average cannot usually be defined concisely.  Each Judge has their own concept of what “average” means.  In fact,  their concept of “average” may change over time and frequently will change during the course of a competition.   That is,  what was considered “average” for a triple toe-loop at the beginning of an event may no longer be considered average by the time 20 skaters have completed their programs.  

 

Another possible method of comparison for a subjective system is to compare one observation to one or more other observations. With this particular kind of subjective comparison, it is possible to discriminate at a much finer level between outcomes than it is when comparing to a standard. This is because when comparison is to a standard, the increments between the finest measurements are generally fixed. That is, differences smaller than the defined difference fall in a category of “not a significant difference.”

 

There is also a difference between the accuracy of comparing to a standard and comparing directly to another outcome.   When comparing two skaters to a standard to determine the ranking of the skaters,  there are two opportunities for error for each of the skaters for a total of four opportunities for error:  for each skater we have the error in the observation itself and in the match to the standard.   When comparing two skaters directly with each other,  there are three opportunities for error:  the two observations and the direct comparison.  

 

If the error sizes are equal for all of these opportunities for error, the direct comparison of skaters against each other has less error. Mathematically, the standard deviation of the ranking of the skaters using the comparison to a standard is 15% larger than that of a direct comparison. That is, direct comparisons are inherently better than comparisons to a standard. Since the goal of a marking and placement system is to place the skater, the added activity of comparing to a standard is both wasted effort and inherently inferior.
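
The 15% figure follows from adding variances. The sketch below assumes, in addition to equal error sizes, that the errors are independent of one another, which is what allows their variances to be summed; the function name is my own.

```python
import math


def combined_sd(n_sources, sd_each=1.0):
    """Standard deviation of the sum of n independent errors of equal size."""
    return math.sqrt(n_sources) * sd_each


to_standard = combined_sd(4)  # two observations + two matches to the standard
direct = combined_sd(3)       # two observations + one direct comparison
ratio = to_standard / direct

print(f"{ratio:.3f}")  # 1.155, i.e. about 15% larger
```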

 

In the case of the proposed system, trained observers identify components of a program and award points. Therefore the proposed CoP system is a subjective system for awarding points. In addition, the CoP system uses the weakest of these methodologies. When the CoP system of points is used, the Judges must compare what the skater has accomplished to a mental image of what an “average” outcome is for each particular element to be marked. The Judge may then assign additional points ranging up to +3 (for exceptional) and down to –3 (for very poor). No concise definitions are given for very poor through exceptional – a fair amount of discretion is left to each of the Judges. Part of the problem with developing a concise definition is that the elements have many properties which are considered to have value (height, flow, cleanness, position in the air). Thus the CoP system is a subjective system of marking. In addition, the CoP system inherently assumes a performance by a skater is just the sum of particular pieces of the performance.

 

In the BOM and OBO systems, trained observers compare the performances of skaters directly with each other. The BOM and OBO systems are subjective scoring systems. Typically the number of direct comparisons is small in that the task of the Judge is to work the skater into her proper position among other skaters. In practice, the actual comparison of any one skater is generally at most to the performances of two other skaters. The direct comparisons may be very accurate for the typical differences between skaters, and comparisons can be made which are much finer than in the CoP system when such distinctions are required. In addition, in the BOM and OBO systems, the placement is sometimes decided upon using more than just the sum of the elements performed. That is, placements will reward a superior total performance with a higher placement than a performance which is equivalent in elements completed but inferior in total performance.

 

 

Marking Errors:

 

In this section,  we will be concerned with errors in marking and not with intentionally biased marking which is explored fully in different sections.   We will also not address the occurrence of what may be called "maverick marks."  Maverick marks are not  intentionally biased but are so far from what may be called the "correct" mark that they are obviously wrong.

 

When an athlete's performance is awarded marks, the performance has received a measurement from a measurement system. A measurement system consists of the people, the equipment, the methods employed, and the environment of use. Not all measurement systems are equivalent, and the performance of a measurement system can be changed by altering any of its components. Two key properties which impact the performance of any measurement system are accuracy and precision.

 

There are well defined techniques used throughout the scientific community and industry to determine the source of errors in accuracy and precision of measurement (marking) systems.   These techniques are sometimes called "gauge R and R studies" or "components of variance studies." 

 

The definition of accurate is that the long term average mark or measurement is correct and can be compared to a well defined, controlled standard.  In statistical terms,  a highly accurate measurement system has a long term average measurement (mark) which is close to the true measurement (mark) while a less accurate system has a long term average which is not close to the true measurement (mark). 

 

The definition of precise is that the measurements (marks) are consistent with each other. A highly precise measurement system has a small standard deviation (all marks close to each other) for the measurements about the average measurement while a less precise measurement system has a larger standard deviation (marks far apart from each other) about the average measurement. Note that accuracy is not used in the definition of precision.

 

Precision does not imply accuracy and accuracy does not imply precision. To elucidate, consider the sport of target shooting. If the shots all land in the bull's eye, the shooting is both accurate and precise. If the shots all land in a small group to the upper left of the target, the shots are precise but not accurate. If the shots are all over the target but on average are centered on the bull's eye, the shots are accurate but not precise. If the shots are all over the target and are centered around the upper left of the target, the shots are neither precise nor accurate.
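
The two properties can be computed separately from a set of shots. In this sketch the bull's eye is placed at (0, 0), the shot coordinates are invented, accuracy is summarized as the distance from the bull's eye to the average point of impact, and precision as the root-mean-square scatter of the shots about their own average. These particular summaries are my choice for illustration; other measures would serve as well.

```python
import math
import statistics


def accuracy_error(shots):
    """Distance from the bull's eye (0, 0) to the average point of impact."""
    mean_x = statistics.fmean(x for x, _ in shots)
    mean_y = statistics.fmean(y for _, y in shots)
    return math.hypot(mean_x, mean_y)


def spread(shots):
    """Root-mean-square distance of the shots from their own average point."""
    mean_x = statistics.fmean(x for x, _ in shots)
    mean_y = statistics.fmean(y for _, y in shots)
    return math.sqrt(statistics.fmean(
        (x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in shots))


# Invented shot positions: (horizontal, vertical) offsets from the bull's eye.
tight_upper_left = [(-5.1, 5.0), (-4.9, 5.1), (-5.0, 4.9), (-5.0, 5.0)]
wide_centered = [(-4.0, 0.0), (4.0, 0.0), (0.0, -4.0), (0.0, 4.0)]

for name, shots in [("tight, upper left", tight_upper_left),
                    ("wide, centered", wide_centered)]:
    print(f"{name}: accuracy error {accuracy_error(shots):.2f}, "
          f"spread {spread(shots):.2f}")
```

The first group is precise but not accurate (small spread, large accuracy error); the second is accurate but not precise.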

 

Obviously, if we want to improve target shooting performance and the problem is one of accuracy, we simply aim in a different place. If the problem is one of precision, we have to fix whatever is causing the lack of precision, perhaps the breathing or timing with the individual's pulse. A problem with accuracy is typically easier to correct than a problem with precision. It is well known, however, that attempting to fix a problem of precision using a fix for accuracy or vice versa actually worsens the total performance.

 

Let us now consider an example with a measurement:  measuring the length of a pencil using a ruler.  When we measure the length of a pencil with a ruler,  many factors can come into play to affect the quality of the measurement.   The ruler itself may not be correct – that is the marks on the ruler may be systematically different from the international standard for length.  In addition, the measurement technique can cause errors of differing amounts.  Some people begin measurements at 0 while others start at 1 (or 10),  some people close one eye while others use both eyes,  some interpolate between the marks while others round to the nearest mark,  some look straight down while others look from the side -- there are dozens of variations.  In addition,  there may be time constraints and issues with lighting and physical accessibility.

 

Obviously, if we are going to measure the length of a pencil with a ruler (an objective-absolute measurement) to determine if it is longer than a particular length,  both accuracy and precision are important.  However,  if we are going to compare two pencils to determine which is longer,  accuracy is not important.   In this case,  each measurement is assumed to have a precision equal to 1/2 of the finest tick mark on the ruler.

 

As an alternative to measuring the length of the pencils using a ruler,  we could make a tick mark on a straight piece of wood at the length of one of the pencils and then determine if the second pencil extends beyond that tick mark.  Such a measurement is objective-relative and may be more precise than making measurements with rulers.

 

We could also place the two pencils side by side and simply observe which is longer. Such a measurement is subjective-relative. As noted before, subjective-relative measurements often discriminate more finely between two outcomes than an objective-absolute measurement if the absolute measurement scale is not divided finely enough.

 

It is easy to see that, in the example of comparing the length of two pencils to determine which is longer, there is added opportunity for error in making two independent length measurements instead of a single direct comparison. If the standard deviation for a single length measurement is the same as the standard deviation for a single direct comparison, then the error for two length measurements to make a decision as to which is longer is 41% larger than for the single direct comparison. Unless some special circumstance is in effect which makes the direct comparison inherently less precise, the direct comparison (subjective - relative) is better than the two measurements using a ruler (objective - absolute).
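
The 41% figure is the square root of 2 (about 1.41), since the difference of two independent measurements carries both of their errors. A short simulation shows the same thing. Everything in it is an assumption made for illustration: the errors are taken to be independent and normally distributed, and the error size, the true difference in length, and the number of trials are invented.

```python
import random
import statistics

random.seed(0)
SD = 1.0         # assumed error of one ruler reading and of one direct comparison
TRUE_DIFF = 0.5  # invented true difference in length between the pencils
N = 40000        # number of simulated trials

# Two independent ruler readings: each reading carries its own error.
by_ruler = [(TRUE_DIFF + random.gauss(0, SD)) - random.gauss(0, SD)
            for _ in range(N)]

# One direct comparison: a single error on the observed difference.
direct = [TRUE_DIFF + random.gauss(0, SD) for _ in range(N)]

ratio = statistics.stdev(by_ruler) / statistics.stdev(direct)
print(f"ratio of standard deviations: {ratio:.2f}")
```

The printed ratio should come out close to 1.41, with small variation from the random draws.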

 

The CoP system asks the Judges to make an independent measurement of the quality of each element in an athlete's program in isolation from the other elements in the same program and that of the programs of other athletes.  There are a large number of measurements being made during each program.  Since the CoP compares to a standard,   there are errors both in precision and accuracy for each of the measurements of each element and each skater.  Furthermore,  the Judge’s attention is divided between two tasks:  getting the measurement assignment correct in accordance with the standard and making sure that the right skater gets the higher marks in the grade of effort (since the grades of effort are not clearly defined).   Because of the large number of marks given during each program,  it is likely that the Judge will not be able to execute the second of these tasks well.

 

The BOM and OBO systems on the other hand are making direct comparisons of skaters against each other.  Precision in marking in BOM and OBO is important,  accuracy in marking is irrelevant.   That is, the only thing which is required of the Judge is that the correct skater should get the higher marks.  This removes one type of error from marking in these two systems.  In addition,  in BOM and OBO,  the Judge's attention is focused on what is really important – making sure that the skater is placed correctly.  None of the Judge’s attention is deflected to a secondary task which is irrelevant to placement.

 

In most measurement or marking situations where comparisons are being made, precision is important and accuracy generally is not. In the CoP system, both accuracy and precision are required because the measurements (marks) for each element are being made in isolation from other elements and are then being combined afterwards. This has implications for Judges' training and for fixing marking problems. In both cases for CoP, the determination must be made as to whether a difficulty a specific Judge or group of Judges has is one of precision or one of accuracy. The determination is critical because the type of fix is inherently different. Attempting to fix a precision problem using fixes for accuracy will not work and will make the performance of the Judge(s) worse.

 

The impact of these issues with precision and accuracy will probably not be seen when the CoP system is tested using small fields of skaters or among the top skaters in a field when the leading skaters are well separated in ability from the rest of the field. They will however become more apparent as the size of the field increases and more leading skaters are close in ability, as happens at Worlds and in the Olympics. The precision of the OBO and BOM systems in large fields is fairly well known. Under BOM and OBO, Judges generally can place the leading skaters within +/- one place of the final placement and within +/- several placements for skaters in the middle of the field. The greatly differing marks which sometimes occur under OBO and BOM are more properly considered maverick marks.

 

What may be called "churning" in results from one competition to another (the same skaters frequently moving up or down several places) should be viewed as a strong indication of a poor measurement system -- and not as an indication of a better system.   While it is possible for athletes to have a performance which is poor from time to time,  it is the case in most individual sports that outcomes are often fairly predictable.   Skating is no different.

 

 

Balance Of The Marks:

 

The balance of the marks in a scoring system affects the strategies the players of a sport will employ.   In some systems,  like baseball,  all scoring events are awarded with the same number of points.   Any one run counts just as much as any other run.   The best thing for a baseball player to attempt at any given time is whatever is most likely to produce a run.  In other systems,  like American football,  different scoring events are awarded with different numbers of points.   In football,  given an opportunity to make a touchdown (6 points) or a field goal (3 points) the touchdown will be attempted unless some special circumstance exists.

 

In figure skating under the CoP system, if a jump such as a quad Salchow scores higher than a triple Axel, the athlete will logically choose to do a quad Salchow. In addition, if the athlete typically has a fall in performing the quad Salchow and the penalty is not severe enough, the athlete will still choose to do the quad Salchow. Under the OBO and BOM systems, the athlete should choose to avoid the quad Salchow because no credit is given for a jump which is not completed.
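
The athlete's choice can be framed as a comparison of average points per attempt. All of the numbers below are invented for illustration and are not values from the CoP schedule; the function name is my own. Setting the value of a failed jump to zero is only a rough stand-in for "no credit is given for a jump which is not completed," since the OBO and BOM systems do not award points per element.

```python
def expected_points(base_value, success_rate, fall_value):
    """Average points per attempt given the chance of landing the jump."""
    return success_rate * base_value + (1 - success_rate) * fall_value


# Mild penalty: a fall still earns most of the jump's value (invented numbers).
quad_salchow = expected_points(base_value=10, success_rate=0.5, fall_value=6)
triple_axel = expected_points(base_value=7, success_rate=0.9, fall_value=4)
print(f"mild penalty: quad {quad_salchow:.1f}, triple Axel {triple_axel:.1f}")

# No credit for a failed jump: the same skater does better with the triple Axel.
quad_no_credit = expected_points(base_value=10, success_rate=0.5, fall_value=0)
axel_no_credit = expected_points(base_value=7, success_rate=0.9, fall_value=0)
print(f"no credit:    quad {quad_no_credit:.1f}, triple Axel {axel_no_credit:.1f}")
```

With these numbers the quad is the better bet under the mild penalty and the worse bet when a failed jump earns nothing, even though the skater's ability is the same in both cases.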

 

The CoP system has a very heavy emphasis on jumps as opposed to what may be called "artistic" elements such as step sequences, spins, use of edges, flow and general presentation. Since the jumps are worth more points, it should be expected that the skaters will place a heavier emphasis on jumps than on artistry and presentation under CoP. In contrast, in the BOM and OBO systems, the skaters need to achieve a balance between the purely technical performance of elements and the artistic performance of the elements. BOM and OBO place slightly more than 50% of the weight of the marks in the artistic mark (in the free program the presentation mark is the tie-breaker).

 

The obvious goal of each skater under any of the systems is to maximize their score and therefore their placement.  The nature of a performance under CoP will change,  probably radically,  as the skaters and coaches learn what is required.  

 

 

Historical Record:

 

One of the benefits of a scoring system is the possibility of developing a historical record through which the performance of athletes can be compared over time. In sports where the measurement is objective, such as in track and swimming where time to complete a set distance is recorded, this is achievable. However, if the rules or equipment change sufficiently, valid comparisons cannot be made. Controversies will occur around attempts to make comparisons of things which are not really comparable.

 

An example of this came up recently in baseball, in counting the number of home runs hit in a single season. If the number of games in a season changes over time (more games have in fact been added), then a simple comparison of the number of home runs in a season is not a valid comparison. Neither is the comparison valid if the distance a ball has to be hit changes, or the distance from the pitcher’s mound to home plate. The same is true if, as many believe to be the case, the ability of the pitchers has changed significantly over time.

 

So, if the rules change or the equipment changes, the comparisons are not directly meaningful. Given past history in Figure Skating, it is certain that the system is going to change over time. In the not too distant past, figure skating events included Figures, rules have been changed to limit the number of jumps vs other elements, and the importance of the various programs (short and long) has been altered, as has their use in qualifying rounds. All of these have some impact on both the strategy and the resulting performance/placement of skaters in major competitions.

 

While valid direct comparisons cannot be made,  under the BOM and OBO systems it is possible to track the marks awarded to the skaters over time.   Skaters routinely have marks which increase over time,  they reach a peak,  and if they continue to skate the marks start to decline.  Records do exist as to which skaters have completed which jumps.  And records do exist for the number of awarded 6.0s and win/loss records.

 

The tendency to change the rules will be true regardless of whether the CoP is adopted or not.  It is quite possible that the Code of Points will require many substantive changes of the marking rules in the next decade as the system has not been “tested” long enough under real conditions for anything other than extreme problems to have emerged and become noticeable.   It is highly likely that both the number of points awarded for different elements and the +/- grade of effort points are going to change.  It is likely that there will be pressure to inflate points over time since points are the basis for the marking.  As long as substantive change occurs,  no valid historical comparisons will be possible. 

 

However,  under CoP as under BOM and OBO,  there will still be available a record of which skaters have completed which jumps – although under CoP there may not be quite as stringent a requirement as to what is considered a completed jump as under BOM and OBO.   In addition,  there will be a win/loss record in major competitions over time for individual skaters.   What will be lost by the CoP system is the concept of the mark of 6.0 – performing at a level considered to be perfection at that point in the history of skating.  

 

 

Flexibility:

 

Different scoring systems have differing types of flexibility. Some systems are essentially static, have no inherent flexibility, and may be considered to be “rigid”, as in track and field where time and distance are commonly measured. Other systems undergo rule changes to achieve flexibility, as occurred in basketball when a 3 point basket was added. Such a system of marks may be considered to be “stiff.”

 

Rigid systems do not invite change. Stiff systems may permit some change to occur, but the change really must be implemented at the level of the rules to take effect in a consistent and appropriate manner.

 

The CoP system will have to achieve flexibility by adding points awarded for new elements and changing the points awarded if the importance weighting of the elements requires change. Thus the CoP system is categorized as “stiff.” The CoP system, therefore, will slow innovation in skating and in general resist change that is not adopted in the rules. This is because the point schedule is a set schedule and the best possible strategy for a skater can be known given the skater's set of skills and a copy of the point schedule.

 

The stiffness of the CoP system implies that it cannot be changed successfully while in application during a competition.  If a new element is performed by the skater,  the system cannot appropriately cope with the change – points may not be awarded even for a relatively difficult new element.   In fact,  it may take several seasons for the new element to be worked into the CoP system properly.  

 

In addition,  the stiffness of the CoP system may magnify problems due to inconsistent training of the Judges.   As noted earlier,  much of the scoring in the CoP system is subjective.  Inadequate training or preparedness of some Judges under a stiff system implies that the full dynamic range of the marks for any given element (7 different levels) may be inappropriately applied in a relatively inflexible manner.

 

One other concern with the stiffness of the CoP system is that it may not be properly applied if a skater modifies her program during execution of the program. There are a large number of individual elements to be judged in isolation and an essentially declared, known order in which they will be performed. Today, skaters will freely modify their programs as they are being executed. Depending on the method of implementation of the technical marking, the accuracy of the awarding of base points for modified program elements may or may not be as good as if the program had been executed as planned.

 

The BOM and OBO systems are inherently flexible.   They are fully capable of coping with added new elements within a competition since they directly compare performances of skaters to each other.   New elements can be worked into the marks as they are encountered and program modifications during the performance are automatically incorporated.   These systems do not artificially resist change,  in fact they invite skater creativity and change.

 

 

Properties Of Placement Systems

 

 

The placement system is the system by which the marks are changed into placements.  In many sports there is only one placement system,  in others there are several.   For example,  under BOM and OBO in figure skating,  the placement system for the short program is different from the placement system for the long program and both of these are different from the placement system for the overall result. 

 

Regardless of the sport,  I don't know of any placement system which does not convert the marks into ordinals -- first, second, third, and so on.  

 

In most sports, this conversion from marks to ordinal placements is straightforward and often trivial. For instance, in track, the placements are the ordinals derived from the measured times to complete the course: the athlete who has the shortest time receives the 1st place ordinal. In baseball the placements are the ordinals based on the number of points (runs) acquired: the team with the largest number of points receives the first place ordinal.

 

In both track and baseball,  the conversion from the awarded marks to ordinal placements is a one dimensional function.   That is,  there is only one mark for each team or athlete to convert to a placement ordinal.   This type of conversion is clear and unambiguous.

 

Some sports have more than one set of marks to convert to a placement. Examples of this situation arise in gymnastics, figure skating, diving, and freestyle skiing. These sports all have to take a multidimensional measurement and assign a single dimensional placement. The problems associated with doing this well are non-trivial and are not generally fully understood by the spectators, athletes, coaches and officials.

 

For example, suppose there are just two sets of marks for each competitor, one from each of two Judges. Each mark can independently provide a potential placement ordinal. The trivial solution is often to add up the marks and then convert the total to a placement, as is done in most sports. This solution is an example of a placement function which takes a two-dimensional input and yields a one-dimensional placement.

 

But the question which eventually has to be addressed is:  "what do we do when the marks are very different?"   This trivial solution seems appropriate until different types of errors in marking (precision and accuracy), biased marks (cheating by a Judge or block of Judges), and maverick marks (extreme error in the marks) are encountered.

 

Suppose for instance that one of the Judges made a serious error and awarded low marks for a competitor.   If the marks are simply added up,  then the result for that athlete will be incorrect.    We could use the results of a single Judge among the two,  but without knowing that an error occurred,  how do we know which one is the right one to use?  How do we fix this placement system?  Do we add more Judges?  If so,  what do we do with their marks?
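
The effect of a single erroneous mark on a sum-based placement can be sketched in a few lines of Python. The marks, the skater names, and the function name `sum_winner` are hypothetical and chosen only for illustration; this is a minimal sketch of the two-Judge case described above, not a model of any real competition.

```python
def sum_winner(marks):
    """Return the skater with the highest total of the Judges' marks."""
    return max(marks, key=lambda skater: sum(marks[skater]))

# Hypothetical marks from two Judges; both Judges place skater A ahead of B.
correct_marks = {"A": [5.8, 5.8], "B": [5.6, 5.6]}

# Same event, but the second Judge makes a serious error on skater A.
marks_with_error = {"A": [5.8, 5.2], "B": [5.6, 5.6]}

print(sum_winner(correct_marks))     # A
print(sum_winner(marks_with_error))  # B
```

With only the totals to look at, nothing in the result itself reveals which of the two Judges erred.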

 

There are many different properties of placement systems.   In the following sections we will explore some of the key properties of how placement systems function. 

 

 

Random Vs Non-Random Placement:

 

I know of no other system in sports which employs randomness in its selection of a winner while placements are being awarded.  The CoP system inherently inserts a random element into who wins the event and which places the other competitors get.

 

The CoP system randomly selects a subset of the Judges to use for placements – however all of the Judges have given marks.   Mathematically, to determine the winner, it makes no difference if the random subset of Judges is chosen after the event is over or while it is in progress.

 

This is analogous to having two baseball teams play 14 games and selecting at random 7 of the games to determine the overall winner.   In such a scenario,  if the outcome of 4 of the 7 randomly selected games determines which team wins,  it is possible for the winner of the 14 game series to have won only 4 games and the losing team to have won 10 games.  On average such a system will select the correct winner,  but there may be a substantial number of incorrect selections.

 

Most objective observers will inherently agree that the overall winner should be the team which wins the most games.   They will regard the result as inherently wrong if the team which wins the least number of games is declared the winner.

 

This oddness in placements is precisely what happens under the CoP system. The outcome of skating under the CoP system is only weakly determined by the performance of the skaters in the event. 14 Judges will judge the event and 7 (or 9) will determine who wins. The overall winner can be, and at some point will be, decided by a small minority (4) of the full panel of 14 Judges.

 

The BOM and OBO systems are said by some to have a similar randomness in outcome.  This claim arises because of the random appointment of Judges to the panel prior to the event.   The argument of some spectators is that Judges favor one skater over another and will mark accordingly.  Hence the random Judge appointment translates to a random outcome.  This is by no means a similar setting.   This claim is equivalent to saying that the great majority of the Judges are dishonest.

 

The actual randomness under the CoP system is inherent in the system. It exists whether the Judges are dishonest or not. It cannot be removed. Under BOM and OBO, if dishonesty exists among the Judges, it can be removed. There is no scoring or placement system fix if a majority of the Judges are believed to be dishonest.

 

How random is the CoP outcome?   We will examine the randomness of the selection process assuming a subset of  7 of 14 Judges.  A similar analysis can be done for the random selection of 9 Judges.  In this analysis, we will assume that the Judges are all 100% honest and are not cheating.  

 

If, given the original panel of 14 Judges, there are 7 Judges who place skater A best and 7 Judges who place skater B best, then A and B will each win 50% of the time. The correct result should be a tie between A and B. In the case of a 6/8 split among the panel of 14, the skater placed higher by the minority will win about 1/3 of the time. In the extreme case, where only 4 Judges among the panel of 14 place a particular skater higher, that minority skater will win in approximately 1 of 25 such events.
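
These chances can be checked with a short hypergeometric calculation. The sketch below assumes a simplified model that is not spelled out in the text: 7 of the 14 Judges are drawn at random without replacement, and the skater preferred by a simple majority of the drawn Judges wins. The function name `minority_win_probability` and its parameters are hypothetical.

```python
from math import comb

def minority_win_probability(panel=14, selected=7, minority=4):
    """Chance that a skater preferred by `minority` of `panel` Judges is
    preferred by a simple majority of `selected` randomly drawn Judges."""
    needed = selected // 2 + 1          # smallest majority of the drawn subset
    total = comb(panel, selected)       # all equally likely subsets
    favorable = sum(
        comb(minority, k) * comb(panel - minority, selected - k)
        for k in range(needed, min(minority, selected) + 1)
    )
    return favorable / total

for minority in (7, 6, 4):
    print(minority, round(minority_win_probability(minority=minority), 3))
# 7 0.5
# 6 0.296
# 4 0.035
```

Under this model the 7/7 split gives exactly one half, the 6/8 split gives a little under one third, and the 4/10 split gives roughly 1 chance in 29, which is of the same order as the approximate figures quoted above.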

 

In all sports, controls can be put in place to limit dishonesty among the Officials. These controls do not belong in the marking and placement systems, although some marking and placement systems can control some dishonesty. The inherent randomness in the CoP system, however, is not correctable if a random selection of the Judges is going to continue.

 

 

Consistency With Marking Intent:

 

We would expect that a good placement system will award a placement which is at least no lower if one or more of the marks for a skater is raised.   In addition,  we would expect that a good placement system will award a placement which is at least no higher if one or more of the marks for a skater is lowered.

 

Systems which toss out one or more high marks and one or more low marks do not meet this expectation.  That is,  under some circumstances (not unusual),  the actual choice of a single Judge’s mark may not reflect the Judge’s intent.  A higher mark may cause another winner to be selected.   Or a lower mark may cause the skater to win.

 

The CoP system,  if it retains the tossing of high and low marks,  does not meet the expectation.  This is an inherent flaw in the methodology.

 

The OBO and BOM placement systems meet the expectation that awarding a higher mark will not lower the placement and that awarding a lower mark will not raise the placement.

 

 

Compromise Vs Majority:

 

Some placement systems are majority systems and others are compromise systems.   A majority system is defined as a placement system in which the winner is the athlete who is felt by the majority to be a winner.   This is a natural sense of what is a winner in Judged competitions and in most other types of competitions as well.

 

Placement systems which are compromise systems produce some winners who are not the choice of the majority.   There is a “compromise” between the majority and the minority and the athlete who best satisfies that compromise is declared the winner.   Such winners may also be called “technical” winners.   The majority knows which competitor really won,  but also recognizes that the declared winner met some specific criteria.

 

An example of a compromise winner is easily found when the total points (marks) of the Judges are used to declare a winner. This type of winner is simply the skater who has been awarded the most points. Having the most points overall does not imply that the majority of the Judges favor that particular athlete. It is possible that the athlete is favored by a single Judge and is in second according to all other Judges.
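
The difference between the two kinds of winner can be illustrated with a small sketch. The marks below are hypothetical, and `majority_winner` is a deliberately simplified stand-in (the skater placed first by the most Judges); it is not the actual BOM or OBO procedure.

```python
# Hypothetical marks from five Judges: the first Judge strongly favors A,
# the other four place B narrowly ahead of A.
marks = {
    "A": [6.0, 5.4, 5.4, 5.4, 5.4],
    "B": [5.0, 5.5, 5.5, 5.5, 5.5],
}

def total_points_winner(marks):
    """Compromise-style winner: highest sum of all Judges' marks."""
    return max(marks, key=lambda skater: sum(marks[skater]))

def majority_winner(marks):
    """Simplified majority-style winner: placed first by the most Judges."""
    n_judges = len(next(iter(marks.values())))
    firsts = {skater: 0 for skater in marks}
    for j in range(n_judges):
        best = max(marks, key=lambda skater: marks[skater][j])
        firsts[best] += 1
    return max(firsts, key=firsts.get)

print(total_points_winner(marks))  # A
print(majority_winner(marks))      # B
```

Skater A wins on total points although four of the five Judges placed B first.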

 

 In addition,  if one or two high marks and low marks are tossed out, as is done in some systems,  the sum or average of the remaining marks still produces a compromise winner.

 

The CoP system is a compromise system. The current BOM and OBO systems always produce a majority winner.

 

 

Robustness To Marking Scale Shifts:

 

There are two types of marking scale shifts under consideration in this section.   The first is a bulk shift,  from Judge to Judge,  in what the marks mean in general.   This may occur due to regional differences and / or educational differences.  For example,  there may be regional differences (due to training inconsistency) in when rotation is considered to be complete for a jump or where the rotation begins and ends in a spin.    As another example,  there may be regional differences of opinion in what the “finish” for a jump or other element is supposed to look like.

 

Another type of marking scale shift may be called a spread shift.   An example of this is that a Judge may allow a small "cheat" for a very difficult quad jump and not allow any "cheat" for an easier triple or double jump.

 

Under the CoP, BOM and OBO systems, the requirements for what is to be considered a completed element are not fully specified. In fact, much is left to the interpretation of the Judges. Under the CoP system, the grade of effort marks are poorly defined. The Judges should be expected to have both bulk and spread shifts.

 

As with marking systems,  placement systems may be categorized as rigid, stiff, or robust (similar to flexible).   Rigidity in the placement system concerns how closely the actual marks are honored.   If the marks themselves, through a mathematical operation such as addition or averaging, are the basis of the placement,  then the placement system is rigid.   A rigid system cannot correct for scale shifts caused by differences in training, region, or rule interpretation.   Rigid placement systems require uniform training and exact or nearly exact usage of the rules for good performance.

 

The CoP system is rigid in making placements,  which is curious because the CoP system also uses random selection of the winner and may toss high and low marks.   In any case,  the CoP system does not allow for differences in regional preferences, training,  or interpretation of the rules to exist between the Judges on a panel.   Such differences can and do exist in Figure Skating.  The CoP placement system retains the information on how close the skaters were within a Judge but loses (makes no use of) the information about the differences between the Judges.

 

The OBO and BOM placement systems are robust to scale shifts of both types. These systems convert the marks to ordinals. The ordinals place the skaters a fixed distance apart within each Judge. Any scale differences between the Judges are therefore corrected; however, information within a Judge regarding the closeness of the skaters is lost. In the case of using ordinals, the information loss has been heavily studied in what is known as non-parametric statistics. In general, the information loss within a Judge will be relatively minor and will most likely be made up by one or two additional Judges.
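
The way ordinals remove scale shifts can be sketched as follows. The three sets of marks are hypothetical, and the helper `ordinals` is an illustrative name; ties are not handled in this minimal version.

```python
def ordinals(judge_marks):
    """Convert one Judge's marks to ordinals: highest mark gets ordinal 1."""
    order = sorted(judge_marks, key=judge_marks.get, reverse=True)
    return {skater: place for place, skater in enumerate(order, start=1)}

# Hypothetical marks for three skaters from three Judges.
judge_1 = {"A": 5.8, "B": 5.6, "C": 5.5}
judge_2 = {"A": 5.4, "B": 5.2, "C": 5.1}  # bulk shift: same opinions, marked 0.4 lower
judge_3 = {"A": 5.9, "B": 5.3, "C": 4.9}  # spread shift: same order, wider gaps

for judge in (judge_1, judge_2, judge_3):
    print(ordinals(judge))
# Each line prints {'A': 1, 'B': 2, 'C': 3}
```

All three Judges yield the same ordinals, so the shifts between Judges disappear. The cost is also visible: the fact that judge_1 saw B and C as much closer than judge_3 did is no longer recorded.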

 

Both sets of placement systems lose information. In the case of BOM and OBO, it has been shown that these systems, using 9 Judges, are more efficient than a total marks system of 7 Judges (assuming none are tossed). This is consistent with expectations from non-parametric statistics, which imply that we should expect at least 86% efficiency from an ordinals based system. The CoP system with 7 Judges can do no better than OBO and BOM, and can only equal their performance if a) the marking is perfect (no abnormal errors), b) there are no scale shifts between Judges, and c) all 7 Judges are used. If any of the 7 Judges are dropped, scale shifts occur, or abnormal marks occur, the CoP system will not perform as well as BOM and OBO.

 

 

Biased Judging And Block Judging:

 

In placement systems which involve Judging, concern exists for the possibility of both biased Judging and what is now called block Judging. Mathematically, biased and block Judging are the same and should have the same mathematical controls. In both cases, marks are artificially raised or lowered for particular skaters by one or more Judges.

 

The OBO and BOM systems control for bias and blocking through their emphasis on the majority. In both of these systems the majority is defined by the middle Judge of the panel. The only way to move the middle Judge is for the biased Judge, or Judging block members, to cross the middle Judge and therefore change which Judge is in the middle. If the biased or block Judge naturally provides marks on the high side of the panel for a skater, raising their marks has no effect at all regardless of how high their marks are raised. Similarly, if they are naturally on the low side of the panel, lowering their marks further has no effect at all on the results. This is because the Judge in the middle does not change.
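
The middle-Judge argument can be illustrated with the median of a panel's marks. This is a simplified stand-in for the majority principle, not the actual BOM or OBO calculation, and the marks are hypothetical.

```python
from statistics import median

# Hypothetical marks for one skater from a 9-Judge panel, lowest to highest.
honest_panel = [5.2, 5.3, 5.4, 5.5, 5.5, 5.6, 5.7, 5.8, 5.8]

# A Judge already on the high side (5.7) inflates the mark to 6.0.
high_side_bias = [5.2, 5.3, 5.4, 5.5, 5.5, 5.6, 6.0, 5.8, 5.8]

# A Judge from the low side (5.3) crosses the middle by awarding 6.0.
crossing_bias = [5.2, 6.0, 5.4, 5.5, 5.5, 5.6, 5.7, 5.8, 5.8]

print(median(honest_panel))    # 5.5
print(median(high_side_bias))  # 5.5 (the middle Judge is unchanged)
print(median(crossing_bias))   # 5.6 (the middle Judge has changed)
```

Only the Judge who crosses the middle has any effect, and even then the result moves no further than the next honest Judge's mark.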

 

In the CoP system, there is apparent confusion about what constitutes proper controls for block Judging and biased Judging. Although block and biased Judging are accomplished in mathematically the same manner, the developers of the CoP system believe they are using random Judge selection to break up the blocks and tossing high and low marks to remove bias.

 

The interesting part of this is that,  while it is the case that the influence of a block of Judges is reduced by random selection some of the time,  the same random selection also increases the influence of block Judging some of the time.  On average random selection has no effect on block Judging and cannot have a long term effect unless the randomization is biased against particular Judges known to form blocks!

 

Thus the only meaningful control for biased and block Judging in the CoP system comes from tossing the high and low marks. If the CoP system begins with 14 Judges, and tosses 5 or 7 through random selection, then all that is accomplished in the long run is that either 36% (select 9 Judges) or 50% (select 7 Judges) of the information in the panel is lost by random selection. On top of this, it is now expected that the CoP system will further reduce the amount of information by tossing two high and two low Judges (from a panel of 9). Therefore, the CoP system makes placement decisions using only 36% of the information available to it from the full panel. The CoP system cannot properly handle a combined total of more than 2 block or biased Judges (in the same direction). This is because the control for blocking and bias is the tossing of highs and lows. Add to this the randomness in the winner selection and the problems with marking consistency.

 

So the CoP system loses 64% of the information in the panel to control at most, on a long term average, two biased or two block Judges. In contrast, the OBO and BOM systems, in controlling bias and block Judging, lose at most 14% of the information on a panel and can routinely control biased and block Judging from as many as 4 Judges on a 9 Judge panel or up to 7 Judges on a 14 Judge panel.
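
The CoP percentages quoted in this section follow from simple counting, on the assumption (implied by the figures, not stated as a formal model) that each Judge contributes an equal share of the panel's information. The variable names below are illustrative.

```python
from fractions import Fraction

full_panel = 14
randomly_selected = 9                  # Judges kept after random selection
used_after_toss = randomly_selected - 4  # two high and two low tossed

lost_by_selection = Fraction(full_panel - randomly_selected, full_panel)
used = Fraction(used_after_toss, full_panel)

print(f"lost by random selection: {float(lost_by_selection):.0%}")  # 36%
print(f"used for placement:       {float(used):.0%}")               # 36%
print(f"lost in total:            {float(1 - used):.0%}")           # 64%
```

That the share lost to random selection and the share finally used are both 5/14 is a coincidence of the panel sizes chosen.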

 

 

Robustness To Error:

 

Robustness to marking error by a single Judge is mathematically the same as intentional bias and has been covered in the preceding section.

 

 

Fixed Vs Reversals:

 

The BOM and OBO systems suffer from one problem which is somewhat unusual: interim results during a competition are not stable. That is, it appears to the viewer that these placement systems allow the performance of a skater to affect the running placement of skaters who have already skated. What is actually occurring is that, since the OBO and BOM systems remove scale shifts, they no longer “know” how far apart the skaters really are on any individual Judge’s scorecard. For these two systems, the placement results do not really exist until all skaters have skated. It is then, and only then, that the correct ordinals can be created, and these correct ordinals are the basis for the placement. What exists while the event is underway is interim estimated ordinals.

 

The OBO system is somewhat less sensitive to reversals than the BOM system but that reduction in frequency of reversals comes at a price.  The BOM system achieves higher accuracy in determining the correct winner in close events by about 15%.

 

The CoP system does not suffer from reversals since it has reduced the entire skating performance to one fixed number.  If a skater achieves higher points than another skater,  he will from that point forward remain ahead.

 

The CoP placement system essentially provides a single (one dimensional) mark for each skater upon completion of her program. The CoP placement system then takes the one dimensional score and assigns a placement. The root of the problem with the BOM and OBO systems is that these two placement systems, at any given time in the competition, retain a 9 dimensional set of marks for each skater who has already skated (e.g., 9 Judges by 10 skaters = 90 dimensions) and reduce this to a one dimensional outcome (placement). As each skater completes their program and marks are awarded, the dimensionality of the projection of the marking space into outcome space changes. The real projection does not occur until all skaters have skated.

 

 

Summary

 

 

 

 

 

 

 

Property                                  BOM/OBO                CoP

Fully Tested                              Yes                    No

Type of Marking                           Subjective – Relative  Subjective – Absolute

Precision                                 High                   Questionable

Accuracy                                  Not Applicable         Questionable

Balance of Marks                          No Concern (tested)    High Concern (new)

Major Changes to Performances Expected    None                   Emphasis on Jumps, Less Artistry

Historical Record                         Yes                    Yes

Flexibility                               High                   Very Stiff

Random Winner                             No                     Yes

Placement Consistent with Marking Intent  Yes                    No

Compromise vs Majority

Majority