Winning And Losing In Figure Skating
Edmund L. Russell III
The ISU has proposed a new system for marking and for placements, which will be called the
CoP (Code of Points) System. The WSF has proposed using the Best of Majority System (BOM)
instead of the CoP or the One-By-One (OBO) system. This paper endeavors to
compare these systems to a universal standard – principles by which marking and
placement systems should be compared.
In
this paper I will use some examples from other sports to clarify some of the
concepts – principally because these other sports are generally simpler in both
marking and placements than Figure Skating.
However the principles learned from these other sports remain the same. I will also,
of course, discuss the implications
of these in the marking and placement systems in Figure Skating.
No
value judgments will be made in this analysis,
just a simple statement of facts and comparisons. It is left to the reader to determine which
of the systems is better suited for figure skating.
Determining
who wins and who places in any competition is a function of both the marking
and the placement systems. The marking
system (or scoring system) has a dual purpose:
awarding marks for event completions and defining which event outcome is
more important than which other event outcome in a competition. The placement
system serves two purposes as well:
defining the order of finish and defining the sense of what it means to
be a winner.
For example, in American football, the marks (scores) assign
various numbers of points to completed events such as touchdowns, conversions,
field goals, and safeties. The usual
placement system simply adds up the points, and the team with the
highest number of points is the winner.
In the Super Bowl, the placements
are determined in the same manner as in a single game. The sense of the winner in a football game
is the team which earns the most points on a particular day. The sense of the winner in the Super Bowl is
the best team on the particular day of the Super Bowl. The overall winner may or may not have the
best win-loss record in football for that year.
In
contrast, in Baseball, all events are worth the same number of
points in the marking system – a single point each time a player crosses home
plate. In a single game the placement is
determined by the total of the points as in Football. However,
unlike Football, the placement
within the World Series is determined by the win-loss record of the two teams
in the World Series. The sense of the
winner in a single game is the best team of the day. The sense of the winner of the World Series
is the team which has the best win-loss record in the World Series.
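The difference between these two placement systems can be sketched in a few lines of Python. This is only an illustration: the function names and the three-game series (shortened from a real best-of-seven for brevity) are assumptions, not data from any actual competition.

```python
from collections import Counter

def winner_by_total_points(scores):
    """Return the side with the most points, or None for a tie or empty input."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def winner_by_win_loss(games):
    """Return the side that wins the most individual games in a series."""
    wins = Counter()
    for game in games:
        game_winner = winner_by_total_points(game)
        if game_winner is not None:
            wins[game_winner] += 1
    return winner_by_total_points(wins)

def aggregate_points(games):
    """Add up every point scored across the whole series."""
    totals = Counter()
    for game in games:
        totals.update(game)
    return totals

# Hypothetical three-game series between teams "A" and "B".
hypothetical_series = [{"A": 10, "B": 0}, {"A": 1, "B": 2}, {"A": 0, "B": 1}]

series_winner = winner_by_win_loss(hypothetical_series)
points_winner = winner_by_total_points(aggregate_points(hypothetical_series))
print(series_winner, points_winner)
```

With the same marks, the win-loss placement picks team B while a total-points placement would pick team A, so the placement system alone changes the sense of the winner.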
In
both Baseball and Football, changes in
the relative importance of different scoring events can be made by modifying
the marking (scoring) system. Football
would be greatly different if field goals were worth more points than
touchdowns – we would see far more field goals attempted instead of the riskier
activity of trying for a touchdown. In
baseball, if home runs were worth 6
runs, many more players would try to hit
home runs, and the team with the greatest number of
home runs would generally win any given game.
In
these conceptually simpler games of football and baseball, it is easy to see how changes in either the
awarding of marks (scores) or the requirements to determine a winner can affect
the outcome of the game. In Figure
Skating, the marking system and
placement systems have a similar effect.
We can predict that if jumps are given more weight than spins, there will be more emphasis on jumps. We can readily imagine the potential changes
in outcome if the short program were given more weight than the long program –
it is likely that both the routines and sometimes the winners would be very
different. Even more so if there were no
penalty at all for missed jumps -- there would be many more missed jumps. And if there were no penalty for extra jumps
we would probably see a lot more jumps in the programs.
It
is now easy to see that every change in the manner of awarding marks and
placements will eventually change a sport to some extent. The problem is to determine how it will
change and by how much. And of
course, the more radical the
changes in the marking and placement
system, the higher the odds that the
changes in the sport will be equally dramatic.
The
proposed Code of Points system for Figure Skating is a radical change. This proposal weights artistic and technical
elements differently than the current marking system – there is much greater
weight given to jumps. In addition it rewards failed jumps with what will
probably be viewed as a minor penalty.
Finally it changes the placement system and thus the sense of the
winner. In this system the sense of
winner is that of greatest point earner in a competition. This winner may or may not be the majority
favorite – as is chosen in the OBO and Majority systems. The impact on the sport is likely to be
substantial and the character of the sport should be expected to change
significantly over several years if the Code of Points system is adopted.
In
addition to this examination of the impact of changes in the marking and
placement systems and how the athletes
will eventually focus their skills,
we should examine what occurs when the scoring and placement system
proposals are “tested” by the ISU staff.
They perform a short (often badly inadequate) series of what they call
“simulations” to determine if the winners and top placements will be radically
different from whatever the current scoring system and placement system
yields. If those winners change
radically, the proposed system under
development will be modified to try to bring it back into balance with the
existing system.
However, note that there are two fatal flaws in this
method of "simulation" testing.
The most critical one is that the skaters are still competing under the
current system. Thus, their strategies and skill sets have not been
adjusted to the new scoring system.
This does not change substantially even if a short series of real
competitions are held under the proposed system as optimal competition
preparation and strategy will take several years to develop under full time use
of the new system. The second fatal flaw
is that the new system is changed or "tweaked" after the
“simulation.” It is often easy to
modify the system being “simulated” so that the results come reasonably close
to the outcome of the current system.
The
goal behind this "simulation" is not to fix the new system as much as
it is to attempt to answer the question of who would have won such and such an
event if the new system were being used.
The developers of a new system don’t want to mention that the winner was
someone who actually ended up in 10th place! If too many "simulations" are
held, it becomes difficult or even impossible
to tweak the proposed system to be consistent with the real competition
results. For this reason, the number of simulations is, predictably, kept small.
As
a final comment on the testing of a new system,
under current practice, the
system actually put in place usually does not match the developing system which
has been used for the majority of the testing.
Final tweaks are made and a brand new,
untested system is put into service.
It is guaranteed that any system which is tweaked to match prior sense
of the winner as closely as possible will not continue to match the prior sense
of the winner after it goes into full time use.
I would strongly recommend locking down the proposed new system, not allowing any changes (minor or
major), and using it side by side with the
current system to see what may occur once the system gets into general
use. Of course it is critical that the
developers do this in good faith and openly post the lists of their placements
instead of keeping them secret as has been done in the past. I would think that 10 locked down
“simulations” of each event type (singles, pairs, dance, synchro) would be
sufficient for all but the most radical of proposed changes in scoring and
placement.
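A side-by-side comparison of this kind could be tabulated along the following lines. This is a minimal sketch: the function names, the summary fields, and the two orders of finish are hypothetical, and a real study would repeat it for each locked-down event.

```python
def placement_shifts(current_order, proposed_order):
    """Map each skater to (proposed place - current place); both lists are best first."""
    current_place = {skater: i + 1 for i, skater in enumerate(current_order)}
    return {
        skater: (i + 1) - current_place[skater]
        for i, skater in enumerate(proposed_order)
    }

def summarize(current_order, proposed_order):
    """Report how far the proposed system departs from the current one."""
    shifts = placement_shifts(current_order, proposed_order)
    return {
        "same_winner": current_order[0] == proposed_order[0],
        "largest_shift": max(abs(s) for s in shifts.values()),
        "skaters_moved": sum(1 for s in shifts.values() if s != 0),
    }

# Hypothetical orders of finish for one event under each system.
hypothetical_current = ["A", "B", "C", "D"]
hypothetical_proposed = ["A", "C", "B", "D"]

report = summarize(hypothetical_current, hypothetical_proposed)
print(report)
```

Publishing a report like this for every event, without adjusting the proposed system in between, is what keeps the comparison honest.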
In
this section, the focus will be on the
scoring systems which have been in recent use or proposed for figure skating
although references to other systems may be made to illustrate concepts more
clearly. In particular the Best of
Majority system will be contrasted to the proposed CoP system.
There
are four basic categories of marking systems and many different implementations
of these four basic categories. The
categories are: objective-absolute, objective-relative, subjective-absolute, and subjective-relative. The first modifier is objective or subjective; this modifier indicates the presence of
objective equipment (ruler, balance, clock) or no objective equipment
(subjective, trained observer). The second modifier is absolute or
relative; this modifier indicates
whether the comparison is being made to a fixed table or metric (absolute
scale) vs not having a fixed table or metric (relative scale).
The proposed CoP system is a subjective-absolute system for marks, while the BOM and OBO systems are subjective-relative systems for assigning marks. None of the systems, proposed or current, uses objective measurements -- all use trained observers known as Judges to assign marks.
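The two modifiers can be written down as a small classification, sketched here in Python. The function and parameter names are illustrative assumptions; the classifications themselves are the ones stated above.

```python
def classify(uses_equipment, uses_fixed_scale):
    """Name a marking system by the two modifiers described above."""
    first = "objective" if uses_equipment else "subjective"
    second = "absolute" if uses_fixed_scale else "relative"
    return f"{first}-{second}"

# Classifications as stated in the text: all three rely on trained observers.
systems = {
    "CoP": classify(uses_equipment=False, uses_fixed_scale=True),
    "BOM": classify(uses_equipment=False, uses_fixed_scale=False),
    "OBO": classify(uses_equipment=False, uses_fixed_scale=False),
}
print(systems)
```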
Absolute Scale Vs Relative Scale:
Systems
of marking may use either an absolute scale or a relative scale. Absolute scales
for marking are scales which award a fixed mark (points) for completion of a
particular element or task. For
instance in track, marks are awarded
which are equal to the time it took to complete the course. Relative scales for marking are scales which
award marks (points) relative to the performance of others in the
competition. Examples of relative
marking systems are the BOM and OBO systems currently used in figure skating.
It is possible to convert absolute marks to relative marks by simply referencing
the absolute marks to one (probably the first) of the competitors and reporting
+/- marks relative to that index mark.
Inherently, no information is lost in this conversion from absolute to
relative scales. Relative marking scales
are not inherently inferior to absolute marking scales.
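The conversion can be sketched as follows. The marks are hypothetical and are held as whole tenths of a point so that the arithmetic is exact; the function names are assumptions made for illustration.

```python
def to_relative(absolute_marks, index_competitor):
    """Report every mark as +/- the mark of the chosen index competitor."""
    index_mark = absolute_marks[index_competitor]
    return {name: mark - index_mark for name, mark in absolute_marks.items()}

def to_absolute(relative_marks, index_mark):
    """Undo the conversion, given the index competitor's original mark."""
    return {name: offset + index_mark for name, offset in relative_marks.items()}

def order_of_finish(marks):
    """Best mark first."""
    return sorted(marks, key=marks.get, reverse=True)

# Hypothetical marks in tenths of a point; "A" skated first and is the index.
hypothetical_marks = {"A": 54, "B": 57, "C": 51}

relative = to_relative(hypothetical_marks, "A")
print(relative)
print(order_of_finish(hypothetical_marks) == order_of_finish(relative))
```

The order of finish is identical on both scales, and the original marks can be recovered from the relative ones as long as the index mark is kept.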
One
major difference in typical application between using absolute vs relative
marking, especially when used to mark
performances (such as in a figure skating competition), is that the absolute scale assumes that the
total performance is equal to the sum of the elements. In contrast,
the relative scale can be used in the same manner and in addition it
allows the marking of the performance to be more than the sum of the
elements.
Because
of this inherent flexibility, relative
marking is generally felt to be superior in marking performances since a
performance is frequently seen as more than a simple sum of the parts. For instance,
a “polished” presentation does in fact look better than an “unpolished”
presentation and may in fact be much more difficult even though the same
elements are completed.
To
elucidate a bit, there is no difference
when using an absolute scale system of marks between doing pieces of a figure
skating program in isolation or doing them in a program. It would be easier to measure performance
under an absolute system if we lined up the competitors and asked them to
do, one at a time, a particular element (say, a jump) and then
-- using automated equipment -- measured various attributes such as height,
distance covered in the air, speed into
jump, speed out of jump, amount of pre-rotation on the ground, amount of post-rotation on the ground,
direction of travel out of jump, etc.
Various
methodologies may be used to implement relative marking scales. The most common method is to assign the best
mark to the best performance, the next
best mark to the next best, and so
on. In general this is implemented in a
fashion such that each athlete's performance is fitted into its proper place
among the performances which have already occurred. Often points are assigned as
placekeepers, but in theory almost any
sort of marker or set of symbols can be used.
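Fitting each performance into its place among those already seen can be sketched as an insertion into a running order. The binary search used here is an implementation choice of this sketch, not a description of how Judges actually work, and the hidden quality numbers exist only to stand in for the Judge's direct comparison.

```python
def place_skater(running_order, newcomer, is_better):
    """Insert newcomer into a best-first list using only direct comparisons."""
    low, high = 0, len(running_order)
    while low < high:
        middle = (low + high) // 2
        if is_better(newcomer, running_order[middle]):
            high = middle
        else:
            low = middle + 1
    running_order.insert(low, newcomer)
    return running_order

# Hypothetical stand-in for the Judge's opinion; never shown as a mark.
hypothetical_quality = {"W": 3, "X": 9, "Y": 5, "Z": 7}

def judge_prefers(first, second):
    return hypothetical_quality[first] > hypothetical_quality[second]

running_order = []
for skater in ["W", "X", "Y", "Z"]:  # skating order
    place_skater(running_order, skater, judge_prefers)

# Ordinals are only placekeepers: 1 for the best performance, 2 for the next.
ordinals = {skater: place for place, skater in enumerate(running_order, start=1)}
print(ordinals)
```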
A
less common method is to make pairwise comparisons. Pairwise comparisons often suffer from
indeterminacy when more than one aspect of an outcome is to be compared, especially if the aspects are equally
weighted in importance. For example, it
may be judged that A is better than B and B may be judged to be better than C
and yet C may be judged to be better than A.
This indeterminacy usually precludes the use of pairwise comparisons
when multiple aspects of a performance are being judged. When a single aspect of a performance is
being judged, pairwise comparisons do
not usually suffer from indeterminacy although it is theoretically possible.
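The indeterminacy is easy to reproduce. In this sketch three hypothetical skaters are rated on three equally weighted aspects, and one skater is judged better than another when she wins on more aspects; the ratings and the majority-of-aspects rule are assumptions chosen to show the cycle.

```python
def beats(first, second):
    """True if first is better than second on a majority of aspects."""
    first_wins = sum(a > b for a, b in zip(first, second))
    second_wins = sum(b > a for a, b in zip(first, second))
    return first_wins > second_wins

# Hypothetical ratings on three equally weighted aspects (higher is better).
skater_a = (3, 1, 2)
skater_b = (2, 3, 1)
skater_c = (1, 2, 3)

print(beats(skater_a, skater_b))  # A over B
print(beats(skater_b, skater_c))  # B over C
print(beats(skater_c, skater_a))  # and yet C over A
```

All three comparisons come out true, so no consistent order of finish exists for these three skaters under pairwise comparison.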
Another
feature of an absolute scale is that there is a fixed level of fineness of
difference between two outcomes which can be measured. Relative scales generally do not suffer from
this problem. For instance, if skater A does a jump with a 6 inch
deliberate change of edge and skater B does the same jump with a 12 inch
deliberate change of edge, the absolute
system will apply the same deduction for both skaters while relative marking
can award a lower mark to the skater with the longer change of edge.
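The example above can be put in code form. The size of the fixed deduction is a hypothetical value, and the edge-change lengths stand in for what a Judge would observe directly rather than measure.

```python
def absolute_deduction(edge_change_inches, fixed_deduction=1):
    """Schedule-style marking: any deliberate change of edge costs the same."""
    return fixed_deduction if edge_change_inches > 0 else 0

def relative_order(edge_changes):
    """Direct comparison: the shorter change of edge is placed ahead."""
    return sorted(edge_changes, key=edge_changes.get)

# Skater A: 6 inch change of edge; skater B: 12 inch change of edge.
hypothetical_edge_changes = {"A": 6, "B": 12}

deductions = {
    skater: absolute_deduction(length)
    for skater, length in hypothetical_edge_changes.items()
}
print(deductions)
print(relative_order(hypothetical_edge_changes))
```

The fixed schedule cannot tell the two jumps apart, while the direct comparison still places skater A ahead of skater B.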
The
proposed CoP system for figure skating compares the outcome of key elements in
a program to a schedule of points and therefore uses an absolute scale. Because it uses an absolute scale, the CoP system assumes that a skating
performance is equal to the sum of the pieces.
And because the CoP system uses an absolute scale, there is a lower limit to the level of
fineness in performance differences the system can measure.
The BOM and OBO systems for figure skating use a relative scale. These two systems can assign marks as either
the sum of the pieces or more than the sum of the pieces. In addition,
because of the type of relative scale used in BOM and OBO, these systems are capable of measurements
much finer than the CoP system is capable of making.
Objective Vs Subjective Marking:
The
vast majority of sports measure only one aspect of a performance while some
sports measure multiple aspects of a performance. How the measurements are made
determines whether the measurement is objective or subjective.
An
objective measurement of an event is a measurement made by equipment – human
judgment does not enter into the measurement to any great extent. Ideally, no human judgment is required at
all, but this is rare in practice. Common types of objective measurements in
sports include distance and time. In
some sports, such as baseball and
basketball, the event being measured is
very well defined and does not take a lot of human judgment but the observation
of the event is often left up to a person.
Such measurements may also be considered to be objective most of the
time. There are exceptions of course and
during those exceptions, baseball and
basketball have some points (marks) added due to clearly subjective calls by
referees. Objective measurements,
especially those made by equipment,
are often very accurate.
Subjective measurements imply observation by a trained observer. In subjective measurements human judgment is a significant component of the measurement. When a subjective measurement is made, it is generally made through a comparison to some standard. The standard is usually either a minimum standard (at least as good as outcome), average standard (usual outcome), or the maximum standard (ideal outcome).
Of these, the weakest subjective comparison is to the average standard since average cannot usually be defined concisely. Each Judge has their own concept of what “average” means. In fact, their concept of “average” may change over time and frequently will change during the course of a competition. That is, what was considered “average” for a triple toe-loop at the beginning of an event may no longer be considered average by the time 20 skaters have completed their programs.
Another possible method of comparison for a subjective system is to compare one
observation to one or more other observations.
With this particular kind of subjective comparison, it is possible to discriminate at a much finer
level between outcomes than it is when comparing to a standard. This is because when comparison is to a
standard, the increments between the
finest measurements are generally fixed.
That is, differences smaller than
the defined difference fall in a category of “not a significant difference.”
There
is also a difference between the accuracy of comparing to a standard and
comparing directly to another outcome.
When comparing two skaters to a standard to determine the ranking of the
skaters, there are two opportunities for
error for each of the skaters for a total of four opportunities for error: for each skater we have the error in the
observation itself and in the match to the standard. When comparing two skaters directly with
each other, there are three
opportunities for error: the two
observations and the direct comparison.
If
the error sizes are equal for all of these opportunities for error, the direct comparison of skaters against each
other has less error. Mathematically, the standard deviation of the ranking of the
skaters using the comparison to a standard is 15% larger than a direct
comparison. That is, direct comparisons are inherently better than
comparisons to a standard. Since the
goal of a marking and placement system is to place the skater, the added activity of comparing to a standard
is both wasted and is inherently inferior.
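The 15% figure can be reproduced with a short calculation. The sketch assumes, beyond the equal error sizes stated above, that the error sources are independent, so that their variances add; the symbol \sigma for the common standard deviation is introduced here for illustration.

```latex
\begin{align*}
\sigma_{\text{standard}} &= \sqrt{4\sigma^{2}} = 2\sigma
  && \text{(two observations plus two matches to the standard)} \\
\sigma_{\text{direct}} &= \sqrt{3\sigma^{2}} = \sqrt{3}\,\sigma
  && \text{(two observations plus one direct comparison)} \\
\frac{\sigma_{\text{standard}}}{\sigma_{\text{direct}}}
  &= \frac{2}{\sqrt{3}} \approx 1.155
\end{align*}
```

Under these assumptions the comparison to a standard carries roughly 15% more error than the direct comparison.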
In
the case of the proposed system, trained
observers identify components of a program and award points. Therefore the proposed CoP system is a
subjective system for awarding points.
In addition, the CoP system uses
the weakest of these methodologies. When
the CoP system of points is used, the
Judges must compare what the skater has accomplished to a mental image of what
an “average” outcome is for each particular element to be marked. The Judge may then assign additional points
ranging up to +3 (for exceptional) and down to –3 (for very poor). No concise definitions are given for very
poor through exceptional – a fair amount of discretion is left to each of
the Judges. Part of the problem with
developing a concise definition is that the elements have many properties which
are considered to have value (height, flow, cleanness, position in the air). Thus the CoP system is a subjective system of
marking. In addition, the CoP system inherently assumes a
performance by a skater is just the sum of particular pieces of the
performance.
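The "sum of the pieces" assumption can be made concrete with a short sketch. The element names and base values below are hypothetical and are not taken from any published schedule of points; only the range of -3 to +3 for the Judge's adjustment comes from the description above.

```python
# Hypothetical schedule of points; real values would come from the published code.
HYPOTHETICAL_BASE_VALUES = {
    "triple toe-loop": 4.0,
    "spin": 2.0,
    "step sequence": 3.0,
}

def element_score(base_value, adjustment):
    """Base value plus the Judge's adjustment of -3 (very poor) to +3 (exceptional)."""
    if not -3 <= adjustment <= 3:
        raise ValueError("adjustment must lie between -3 and +3")
    return base_value + adjustment

def program_score(elements):
    """The program is scored as nothing more than the sum of its elements."""
    return sum(
        element_score(HYPOTHETICAL_BASE_VALUES[name], adjustment)
        for name, adjustment in elements
    )

hypothetical_program = [("triple toe-loop", 1), ("spin", 0), ("step sequence", -2)]
total = program_score(hypothetical_program)
print(total)
```

Because the total is a plain sum, the same elements with the same adjustments score the same in any order and in any setting; nothing in the arithmetic can reward the program as a whole.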
In
the BOM and OBO systems, trained
observers compare the performances of skaters directly with each other. The BOM and OBO systems are subjective
scoring systems. Typically the number of direct comparisons is small in that
the task of the Judge is to work the skater into her proper position among
other skaters. In practice, the actual comparison of any one skater is
generally at most to the performances of two other skaters. The direct comparisons may be very accurate
for the typical differences between skaters and comparisons can be made which
are much finer than in the CoP system when such distinctions are required. In addition, in the BOM and OBO systems, the placement is sometimes decided upon using
more than just the sum of the elements performed. That is,
placements will reward a superior total performance with a higher
placement than a performance which is equivalent in elements completed but
inferior in total performance.
In this section, we will be concerned with errors in marking and not with intentionally biased marking which is explored fully in different sections. We will also not address the occurrence of what may be called "maverick marks." Maverick marks are not intentionally biased but are so far from what may be called the "correct" mark that they are obviously wrong.
When an athlete's performance is awarded marks, the performance has received a measurement from a measurement system. A measurement system consists of the people, the equipment, the methods employed, and the environment of use. Not all measurement systems are equivalent, and the performance of a measurement system can be changed by altering any of its components. Two key properties which impact the performance of any measurement system are accuracy and precision.
There are well defined techniques used throughout the scientific community and industry to determine the source of errors in accuracy and precision of measurement (marking) systems. These techniques are sometimes called "gauge R and R studies" or "components of variance studies."
The definition of accurate is that the long term average mark or measurement is correct and can be compared to a well defined, controlled standard. In statistical terms, a highly accurate measurement system has a long term average measurement (mark) which is close to the true measurement (mark) while a less accurate system has a long term average which is not close to the true measurement (mark).
The definition of precise is that the measurements (marks) are consistent with each other. A highly precise measurement system has a small standard deviation (all marks close to each other) for the measurements about the average measurement while a less precise measurement system has a larger standard deviation (marks far apart from each other) about the average measurement. Note that accuracy is not used in the definition of precision.
Precision does not imply accuracy and accuracy does not imply precision. To elucidate, consider the sport of target shooting. If the shots all land in the bull's-eye, the shooting is both accurate and precise. If the shots all land in a small group to the upper left of the target, the shots are precise but not accurate. If the shots are all over the target but on average are centered on the bull's-eye, the shots are accurate but not precise. If the shots are all over the target and are centered around the upper left of the target, the shots are neither precise nor accurate.
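The four cases can be told apart numerically by looking at the average of the shots (accuracy) and their standard deviation (precision). In this sketch the shot positions are hypothetical offsets from the bull's-eye along a single axis, and the tolerance of 1.0 is an arbitrary threshold chosen for illustration.

```python
from statistics import mean, pstdev

def describe(shots, true_value=0.0, tolerance=1.0):
    """Return (accurate, precise) for a group of shots.

    Accurate: the average shot lies within tolerance of the true value.
    Precise: the standard deviation of the shots is within tolerance.
    """
    bias = abs(mean(shots) - true_value)
    spread = pstdev(shots)
    return bias <= tolerance, spread <= tolerance

# Hypothetical offsets from the bull's-eye along one axis.
hypothetical_groups = {
    "all in the bull's-eye": [-0.2, 0.1, 0.0, 0.1],
    "tight group, upper left": [4.9, 5.1, 5.0, 5.0],
    "scattered, centered": [-6.0, 6.0, -5.0, 5.0],
    "scattered, upper left": [-1.0, 11.0, 0.0, 10.0],
}

results = {label: describe(shots) for label, shots in hypothetical_groups.items()}
for label, (accurate, precise) in results.items():
    print(label, "accurate" if accurate else "not accurate",
          "precise" if precise else "not precise")
```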
Obviously, if we want to improve target shooting performance and the problem is one of accuracy, we simply aim in a different place. If the problem is one of precision, we have to fix whatever is causing the lack of precision, perhaps the breathing or timing with the individual's pulse. A problem with accuracy is typically easier to correct than a problem with precision. It is well known, however, that attempting to fix a problem of precision using a fix for accuracy, or vice versa, actually worsens the total performance.
Let
us now consider an example with a measurement:
measuring the length of a pencil using a ruler. When we measure the length of a pencil with a
ruler, many factors can come into play
to affect the quality of the measurement.
The ruler itself may not be correct – that is the marks on the ruler may
be systematically different from the international standard for length. In addition, the measurement technique can
cause errors of differing amounts. Some
people begin measurements at 0 while others start at 1 (or 10), some people close one eye while others use
both eyes, some interpolate between the
marks while others round to the nearest mark,
some look straight down while others look from the side -- there are dozens
of variations. In addition, there may be time constraints and issues with
lighting and physical accessibility.
Obviously,
if we are going to measure the length of a pencil with a ruler (an
objective-absolute measurement) to determine if it is longer than a particular
length, both accuracy and precision are
important. However, if we are going to compare two pencils to
determine which is longer, accuracy is
not important. In this case, each measurement is assumed to have a
precision equal to 1/2 of the finest tick mark on the ruler.
As
an alternative to measuring the length of the pencils using a ruler, we could make a tick mark on a straight piece
of wood at the length of one of the pencils and then determine if the second
pencil extends beyond that tick mark.
Such a measurement is objective-relative and may be more precise than
making measurements with rulers.
We
could also place the two pencils side by side and simply observe which is
longer. Such a measurement is
subjective-relative. As noted before, subjective-relative measurements often
discriminate more finely between two outcomes than an objective-absolute
measurement if the absolute measurement scale is not divided finely
enough.
It is easy to see that, in the example of comparing the length of two pencils to
determine which is longer, there is added opportunity for error in making
two independent length measurements instead of a single direct comparison. If the standard deviation for a single
length measurement is the same as the standard deviation for a single direct
comparison, then the error for two
length measurements to make a decision as to which is longer is 41% larger than
for the single direct comparison.
Unless some special circumstance is in effect which makes the direct
comparison inherently less precise, the
direct comparison (subjective-relative) is better than the two measurements
using a ruler (objective-absolute).
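The 41% figure is the square root of 2, which is what results when two independent errors of equal size are combined in a difference. A small simulation, sketched below, shows the same thing; the normal error model, the seed, and the number of trials are all assumptions of the sketch.

```python
import math
import random

def rms(values):
    """Root mean square; estimates the standard deviation of zero-mean errors."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def error_ratio(sigma=1.0, trials=100_000, seed=0):
    """Error of (measurement 1 - measurement 2) relative to one direct comparison."""
    rng = random.Random(seed)
    # One direct comparison: a single error with standard deviation sigma.
    direct = [rng.gauss(0.0, sigma) for _ in range(trials)]
    # Two ruler measurements: the decision rests on the difference of two errors.
    two_measurements = [
        rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma) for _ in range(trials)
    ]
    return rms(two_measurements) / rms(direct)

ratio = error_ratio()
print(round(ratio, 2), round(math.sqrt(2), 2))
```

The simulated ratio should land close to 1.41, that is, about 41% more error for the two-measurement approach.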
The
CoP system asks the Judges to make an independent measurement of the quality of
each element in an athlete's program in isolation from the other elements in
the same program and that of the programs of other athletes. There are a large number of measurements
being made during each program. Since
the CoP compares to a standard, there
are errors both in precision and accuracy for each of the measurements of each
element and each skater.
Furthermore, the Judge’s
attention is divided between two tasks:
getting the measurement assignment correct in accordance with the
standard and making sure that the right skater gets the higher marks in the
grade of execution (since the grades of execution are not clearly defined). Because of the large number of marks given
during each program, it is likely that
the Judge will not be able to execute the second of these tasks well.
The BOM and OBO systems, on the other hand, make direct comparisons of skaters against each other. Precision in marking in BOM and OBO is important; accuracy in marking is irrelevant. That is, the only thing which is required of the Judge is that the correct skater should get the higher marks. This removes one type of error from marking in these two systems. In addition, in BOM and OBO, the Judge's attention is focused on what is really important – making sure that the skater is placed correctly. None of the Judge’s attention is deflected to a secondary task which is irrelevant to placement.
In most measurement or marking situations where comparisons are being made, precision is important and accuracy generally is not. In the CoP system, both accuracy and precision are required because the measurements (marks) for each element are being made in isolation from other elements and are then being combined afterwards. This has implications for training Judges and for fixing marking problems. In both cases for CoP, the determination must be made as to whether a difficulty a specific Judge or group of Judges has is one of precision or one of accuracy. The determination is critical because the type of fix is inherently different. Attempting to fix a precision problem using fixes for accuracy will not work and will make the performance of the Judge(s) worse.
The impact of these issues with precision and accuracy will probably not be seen when the CoP system is tested using small fields of skaters or among the top skaters in a field when the leading skaters are well separated in ability from the rest of the field. They will however become more apparent as the size of the field increases and more leading skaters are close in ability, as happens at Worlds and in the Olympics. The precision of the OBO and BOM systems in large fields is fairly well known. Under BOM and OBO, Judges generally can place the leading skaters within +/- one place of the final placement and within +/- several placements for skaters in the middle of the field. The greatly differing marks which sometimes occur under OBO and BOM are more properly considered maverick marks.
What may be called "churning" in results from one competition to another (the same skaters frequently moving up or down several places) should be viewed as a strong indication of a poor measurement system -- and not as an indication of a better system. While it is possible for athletes to have a performance which is poor from time to time, it is the case in most individual sports that outcomes are often fairly predictable. Skating is no different.
Balance Of The Marks:
The
balance of the marks in a scoring system affects the strategies the players of
a sport will employ. In some
systems, like baseball, all scoring events are awarded with the same
number of points. Any one run counts
just as much as any other run. The best
thing for a baseball player to attempt at any given time is whatever is most
likely to produce a run. In other
systems, like American football, different scoring events are awarded with
different numbers of points. In
football, given an opportunity to make a
touchdown (6 points) or a field goal (3 points) the touchdown will be attempted
unless some special circumstance exists.
In
figure skating under the CoP system, if
a jump such as a quad Salchow scores higher than a triple Axel, the athlete will logically choose to do a
quad Salchow. In addition if the athlete
typically has a fall in performing the quad Salchow and the penalty is not
severe enough, the athlete will still
choose to do the quad Salchow. Under the
OBO and BOM systems, the athlete should
choose to avoid the quad Salchow because no credit is given for a jump which is
not completed.
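The athlete's choice can be framed as a comparison of expected points. Every number in the sketch below (jump values, success rates, points kept after a fall) is hypothetical, and treating BOM and OBO as "zero points for a failed jump" is a simplification, since those systems do not award points per element.

```python
def expected_points(value, success_probability, points_kept_on_failure):
    """Average return from attempting a jump many times."""
    return (success_probability * value
            + (1.0 - success_probability) * points_kept_on_failure)

def best_choice(options):
    """Pick the jump with the highest expected points."""
    return max(options, key=lambda name: expected_points(*options[name]))

# (value, chance of landing it, points still earned after a fall) -- all hypothetical.
mild_penalty = {
    "quad Salchow": (10.0, 0.4, 7.0),
    "triple Axel": (8.0, 0.9, 5.0),
}
no_credit_for_failure = {
    "quad Salchow": (10.0, 0.4, 0.0),
    "triple Axel": (8.0, 0.9, 0.0),
}

print(best_choice(mild_penalty))
print(best_choice(no_credit_for_failure))
```

With a mild penalty the unreliable quad Salchow is still the better bet; when a failed jump earns nothing, the reliable triple Axel wins out.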
The
CoP system has a very heavy emphasis on jumps as opposed to what may be called
"artistic" elements such as step sequences, spins, use of edges, flow
and general presentation. Since the
jumps are worth more points, it should
be expected that the skaters will place a heavier emphasis on jumps than on
artistry and presentation under CoP. In
contrast, in the BOM and OBO
systems, the skaters need to achieve a
balance between the purely technical performance of elements and the artistic
performance of the elements. BOM and
OBO place slightly more than 50% of the weight of the marks in the artistic
mark (in the free program the presentation mark is the tie-breaker).
The
obvious goal of each skater under any of the systems is to maximize their score
and therefore their placement. The
nature of a performance under CoP will change,
probably radically, as the
skaters and coaches learn what is required.
Historical Record:
One of the benefits of a scoring system is the possibility of developing a historical record through which the performance of athletes can be compared over time. In sports where the measurement is objective, such as in track and swimming where time to complete a set distance is recorded, this is achievable. However, if the rules or equipment change sufficiently, valid comparisons cannot be made. Controversies will occur around attempts to make comparisons of things which are not really comparable.
An
example of this came up recently in baseball counting the number of home runs
in a single season. If the number of
games in a season changes over time (more games have in fact been added) then a
simple comparison of number of home runs in a season is not a valid
comparison. Neither is the comparison
valid if the distance a ball has to be hit changes – or the distance from the
pitcher’s mound to home plate. The same
is true if, as many believe to be the
case, the ability of the pitchers has
changed significantly over time.
So, if the rules change or the equipment
changes, the comparisons are not
directly meaningful. Given past history in Figure
Skating, it is certain that the system is going to change over time. In the not too distant past, figure skating events included Figures, rules have been changed to limit the number
of jumps vs other elements, and the importance
of various programs (short and long) has been altered, as has their use in
qualifying rounds. All of these have
some impact on both the strategy and the resulting performance/placement of
skaters in major competitions.
While
valid direct comparisons cannot be made,
under the BOM and OBO systems it is possible to track the marks awarded
to the skaters over time. Skaters
routinely have marks which increase over time,
reach a peak, and, if the skaters
continue to skate, start to decline.
Records do exist as to which skaters have completed which jumps. And records do exist for the number of
awarded 6.0s and win/loss records.
The
tendency to change the rules will be true regardless of whether the CoP is
adopted or not. It is quite possible that
the Code of Points will require many substantive changes of the marking rules
in the next decade as the system has not been “tested” long enough under real
conditions for anything other than extreme problems to have emerged and become
noticeable. It is highly likely that
both the number of points awarded for different elements and the +/- grade of
execution points are going to change. It is
likely that there will be pressure to inflate points over time since points are
the basis for the marking. As long as
substantive change occurs, no valid
historical comparisons will be possible.
However, under CoP as under BOM and OBO, there will still be available a record of
which skaters have completed which jumps – although under CoP there may not be
quite as stringent a requirement as to what is considered a completed jump as
under BOM and OBO. In addition, there will be a win/loss record in major
competitions over time for individual skaters.
What will be lost by the CoP system is the concept of the mark of 6.0 –
performing at a level considered to be perfection at that point in the history
of skating.
Different
scoring systems have differing types of flexibility. Some systems are essentially static and have
no inherent flexibility and may be considered to be “rigid” – such as in track
and field where time and distance are commonly measured. Other systems undergo rule changes to
achieve flexibility as occurred in basketball when a 3 point basket was added.
Such a system of marks may be considered to be “stiff.”
Rigid
systems do not invite change. Stiff
systems may permit some change to occur,
but the change really must be implemented at the level of the rules to
take effect in a consistent and appropriate manner.
The
CoP system will have to achieve flexibility by adding points awarded for new
elements and changing the points awarded if the importance weighting of the
elements requires change. Thus the CoP
system is categorized as “stiff.” The
CoP system, therefore, will slow innovation is skating and in
general resist change that is not adopted in the rules. This is because the point schedule is a set
schedule and the best possible strategy for a skater can be known given the
skater's set of skills and a copy of the point schedule.
The
stiffness of the CoP system implies that it cannot be changed successfully
while in application during a competition.
If a new element is performed by the skater, the system cannot appropriately cope with the
change – points may not be awarded even for a relatively difficult new
element. In fact, it may take several seasons for the new
element to be worked into the CoP system properly.
In
addition, the stiffness of the CoP
system may magnify problems due to inconsistent training of the Judges. As noted earlier, much of the scoring in the CoP system is
subjective. Inadequate training or
preparedness of some Judges under a stiff system implies that the full dynamic
range of the marks for any given element (7 different levels) may be
inappropriately applied in a relatively inflexible manner.
One
other concern with the stiffness of the CoP system is that it may not be
properly applied if a skater modifies her program during execution of the
program. There are a large number of individual
elements to be judged in isolation and an essentially declared, known order in
which they will be performed.
Today, skaters will freely modify
their programs as they are being executed.
Depending on the method of implementation of the technical marking, the accuracy of the awarding of base points
for modified program elements may or may not be as good as if the program were
executed as planned.
The
BOM and OBO systems are inherently flexible.
They are fully capable of coping with added new elements within a
competition since they directly compare performances of skaters to each
other. New elements can be worked into
the marks as they are encountered and program modifications during the
performance are automatically incorporated.
These systems do not artificially resist change; in fact, they invite skater creativity and
change.
Properties Of Placement Systems
The
placement system is the system by which the marks are changed into
placements. In many sports there is only
one placement system, in others there
are several. For example, under BOM and OBO in figure skating, the placement system for the short program is
different from the placement system for the long program and both of these are
different from the placement system for the overall result.
Regardless
of the sport, I don't know of any
placement system which does not convert the marks into ordinals -- first,
second, third, and so on.
In
most sports, this conversion from
marks to ordinal placements is straightforward and often trivial. For instance, in track, the placements are the ordinals derived from
the measured times to complete the course: the athlete who has the shortest time
receives the 1st place ordinal. In
baseball the placements are the ordinals based on the number of points (runs)
acquired: the team with the largest number of points receives the first place
ordinal.
In
both track and baseball, the conversion
from the awarded marks to ordinal placements is a one dimensional
function. That is, there is only one mark for each team or
athlete to convert to a placement ordinal.
This type of conversion is clear and unambiguous.
Some
sports have more than one set of marks to convert to a placement. Examples of
this situation arise in gymnastics, figure skating, diving, and freestyle skiing. These sports all have to take a
multidimensional measurement and assign a single dimensional placement. The problems associated with doing this well
are non-trivial and are not generally fully understood by the spectators,
athletes, coaches and officials.
For
example, suppose there are just two
sets of marks, one from each of two
Judges, for each competitor. Each mark
can independently provide a potential placement ordinal. The trivial solution often is to add up the
marks and then convert to a placement as is done in most sports. This solution is an example of a
placement function which takes a two-dimensional input and yields a one-dimensional
placement.
But
the question which eventually has to be addressed is: "what do we do when the marks are very
different?" This trivial solution
seems appropriate until different types of errors in marking (precision and
accuracy), biased marks (cheating by a Judge or block of Judges), and maverick
marks (extreme error in the marks) are encountered.
Suppose
for instance that one of the Judges made a serious error and awarded low marks
for a competitor. If the marks are
simply added up, then the result for
that athlete will be incorrect. We
could use the results of a single Judge among the two, but without knowing that an error
occurred, how do we know which one is
the right one to use? How do we fix this
placement system? Do we add more
Judges? If so, what do we do with their marks?
There
are many different properties of placement systems. In the following sections we will explore
some of the key properties of how placement systems function.
Random Vs Non-Random Placement:
I
know of no other system in sports which employs randomness in its selection of
a winner while placements are being awarded.
The CoP system inherently inserts a random element into who wins the
event and which places the other competitors get.
The
CoP system randomly selects a subset of the Judges to use for placements –
however all of the Judges have given marks.
Mathematically, to determine the winner, it makes no difference if the
random subset of Judges is chosen after the event is over or while it is in
progress.
This
is analogous to having two baseball teams play 14 games and selecting at random
7 of the games to determine the overall winner. In such a scenario, if the outcome of 4 of the 7 randomly
selected games determines which team wins,
it is possible for the winner of the 14 game series to have won only 4
games and the losing team to have won 10 games.
On average such a system will select the correct winner, but there may be a substantial number of
incorrect selections.
Most
objective observers will inherently agree that the overall winner should be the
team which wins the most games. They
will regard the result as inherently wrong if the team which wins the least
number of games is declared the winner.
This
oddness in placements is precisely what happens under the CoP system. The outcome of skating under the CoP system
is only weakly determined by the performance of the skaters in the event. 14 Judges will Judge the event and 7 (or 9)
will determine who wins. The overall
winner can be, and will at some point
be, due to a small minority (4) of the
Judges of the full panel of 14.
The
BOM and OBO systems are said by some to have a similar randomness in
outcome. This claim arises because of
the random appointment of Judges to the panel prior to the event. The argument of some spectators is that
Judges favor one skater over another and will mark accordingly. Hence the random Judge appointment translates
to a random outcome. This is by no means
a similar setting. This claim is
equivalent to saying that the great majority of the Judges are dishonest.
The
actual randomness under the CoP system is inherent in the system. It exists whether the Judges are dishonest or
not. It cannot be removed. Under BOM and OBO if dishonesty exists among
the Judges it can be removed. There is
no scoring or placement system fix if a majority of the Judges are believed to
be dishonest.
How
random is the CoP outcome? We will
examine the randomness of the selection process assuming a subset of 7 of 14 Judges. A similar analysis can be done for the random
selection of 9 Judges. In this analysis,
we will assume that the Judges are all 100% honest and are not cheating.
If,
given the original panel of 14 Judges,
there are 7 Judges who place skater A best and 7 Judges who place skater
B best, then A and B will win 50% of the time each. The correct result should be a tie between A
and B. In the case of a 6/8 split among
the panel of 14, the skater placed
higher by the minority will win about
1/3 of the time. In the extreme case,
where 4 Judges among the panel of 14 place a particular skater higher, that minority skater will win approximately
in 1 of 25 such events.
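The figures above can be checked with a short calculation. The sketch below is only an illustration under stated assumptions: 7 of the 14 Judges are drawn uniformly at random, each Judge simply prefers one of two skaters, and the skater preferred by at least 4 of the 7 drawn Judges wins. The function name minority_win_probability is hypothetical and is not part of any proposal.

```python
from math import comb


def minority_win_probability(panel_size, selected, minority):
    """Chance that the skater preferred by `minority` Judges of the full
    panel is preferred by a majority of the randomly selected Judges.

    Assumes a two-skater contest, a uniform random draw without
    replacement, and an odd number of selected Judges (so no ties).
    """
    needed = selected // 2 + 1            # e.g. 4 of 7
    majority = panel_size - minority      # Judges preferring the other skater
    total = comb(panel_size, selected)    # all equally likely draws
    favorable = sum(
        comb(minority, k) * comb(majority, selected - k)
        for k in range(needed, min(minority, selected) + 1)
    )
    return favorable / total


for minority in (7, 6, 4):
    p = minority_win_probability(14, 7, minority)
    print(f"{minority} of 14 Judges: minority skater wins with probability {p:.3f}")
```

Under these assumptions the 7/7 split gives exactly 0.5, the 6/8 split gives about 0.30, and the 4/10 split gives about 0.035 (roughly 1 event in 29), which is the same order as the approximate figures quoted above.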
In
all sports, controls can be put in place
to limit dishonesty among the Officials.
These controls do not belong in the marking and placement systems,
although some marking and placement systems can control some dishonesty. The inherent randomness in the CoP system,
however, is not correctable if a random selection of the Judges is going to
continue.
Consistency With Marking Intent:
We
would expect that a good placement system will award a placement which is at
least no lower if one or more of the marks for a skater is raised. In addition,
we would expect that a good placement system will award a placement
which is at least no higher if one or more of the marks for a skater is
lowered.
Systems
which toss out one or more high marks and one or more low marks do not meet
this expectation. That is, under some circumstances (not unusual), the actual choice of a single Judge’s mark
may not reflect the Judge’s intent. A
higher mark may cause another winner to be selected. Or a lower mark may cause the skater to win.
The
CoP system, if it retains the tossing of
high and low marks, does not meet the
expectation. This is an inherent flaw in
the methodology.
The
OBO and BOM placement systems meet the expectation that awarding a higher mark
will not lower the placement and that awarding a lower mark will not raise the
placement.
Compromise Vs Majority:
Some
placement systems are majority systems and others are compromise systems. A majority system is defined as a placement
system in which the winner is the athlete who is felt by the majority to be a
winner. This is a natural sense of what
is a winner in Judged competitions and in most other types of competitions as
well.
Placement
systems which are compromise systems produce some winners who are not the
choice of the majority. There is a
“compromise” between the majority and the minority and the athlete who best
satisfies that compromise is declared the winner. Such winners may also be called “technical”
winners. The majority knows which
competitor really won, but also
recognizes that the declared winner met some specific criteria.
An
example of a compromise winner is easily found when the total points (marks) of
the Judges are used to declare a winner.
The type of winner is simply the skater who has been awarded the most
points. Having the most points overall
does not imply that the majority of the Judges favor that particular
athlete. It is possible that the athlete
is favored by a single Judge and is in second according to all other Judges.
In addition,
if one or two high marks and low marks are tossed out, as is done in
some systems, the sum or average of the
remaining marks still produces a compromise winner.
The
CoP system is a compromise system. The
current BOM and OBO placement systems always produce a majority winner.
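The difference between a compromise winner and a majority winner can be seen in a small made-up example. The sketch below assumes two skaters and five Judges with hypothetical marks; total_points_winner and majority_winner are illustrative helper names, and neither function reproduces the actual CoP, BOM or OBO rules.

```python
def total_points_winner(marks):
    """marks: dict skater -> list of marks, one per Judge.
    Returns the skater with the largest sum of marks."""
    return max(marks, key=lambda skater: sum(marks[skater]))


def majority_winner(marks):
    """Returns the skater placed first by more than half of the Judges,
    or None if no skater has such a majority."""
    skaters = list(marks)
    n_judges = len(marks[skaters[0]])
    firsts = {skater: 0 for skater in skaters}
    for j in range(n_judges):
        best = max(skaters, key=lambda skater: marks[skater][j])
        firsts[best] += 1
    for skater in skaters:
        if firsts[skater] > n_judges / 2:
            return skater
    return None


# Hypothetical marks: four Judges narrowly prefer A, one strongly prefers B.
marks = {
    "A": [5.7, 5.7, 5.7, 5.7, 5.2],
    "B": [5.6, 5.6, 5.6, 5.6, 5.9],
}

print("total points winner:", total_points_winner(marks))
print("majority winner:    ", majority_winner(marks))
```

With these marks, B has the higher total although four of the five Judges placed A first, which is the situation described above: the points winner is favored by a single Judge and is second according to all the others.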
Robustness To Marking Scale Shifts:
There
are two types of marking scale shifts under consideration in this section. The first is a bulk shift, from Judge to Judge, in what the marks mean in general. This may occur due to regional differences
and / or educational differences. For
example, there may be regional
differences (due to training inconsistency) in when rotation is considered to
be complete for a jump or where the rotation begins and ends in a spin. As another example, there may be regional differences of opinion
in what the “finish” for a jump or other element is supposed to look like.
Another
type of marking scale shift may be called a spread shift. An example of this is that a Judge may allow
a small "cheat" for a very difficult quad jump and not allow any
"cheat" for an easier triple or double jump.
Under
the CoP, BOM and OBO systems, the
requirements for what is to be considered a completed element are not fully
specified. In fact, much is left to the interpretation of the
Judges. Under the CoP system, the grade of execution marks are poorly
defined. The Judges should be expected to have both
bulk and spread shifts.
As
with marking systems, placement systems
may be categorized as rigid, stiff, or robust (similar to flexible). Rigidity in the placement system concerns
how closely the actual marks are honored.
If the marks themselves, through a mathematical operation such as
addition or averaging, are the basis of the placement, then the placement system is rigid. A rigid system cannot correct for scale
shifts caused by differences in training, region, or rule interpretation. Rigid placement systems require uniform
training and exact or nearly exact usage of the rules for good performance.
The
CoP system is rigid in making placements,
which is curious because the CoP system also uses random selection of
the winner and may toss high and low marks.
In any case, the CoP system does
not allow for differences in regional preferences, training, or interpretation of the rules to exist
between the Judges on a panel. Such
differences can and do exist in Figure Skating.
The CoP placement system retains the information on how close the
skaters were within a Judge but loses (makes no use of) the information about
the differences between the Judges.
The
OBO and BOM placement systems are robust to scale shifts of both types. These systems convert the marks to
ordinals. The ordinals place the skaters
a fixed distance apart within each Judge.
Any scale differences between the Judges are therefore corrected; however information within a Judge regarding
the closeness of the skaters is lost.
In the case of using ordinals,
the information loss has been heavily studied in what is known as
non-parametric statistics. In general,
the information loss within a Judge will be relatively minor and will most
likely be made up by one or two additional Judges.
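The effect of converting marks to ordinals can be sketched in a few lines. The example below uses hypothetical marks for three skaters from three Judges who agree on the order but mark on different scales; to_ordinals is an illustrative helper and assumes a Judge gives no tied marks.

```python
def to_ordinals(judge_marks):
    """Convert one Judge's marks (dict skater -> mark) into ordinals,
    with 1 for the highest mark. Assumes no tied marks."""
    ranked = sorted(judge_marks, key=judge_marks.get, reverse=True)
    return {skater: place for place, skater in enumerate(ranked, start=1)}


# Hypothetical marks. Judge 2 marks everything half a point lower
# (a bulk shift); Judge 3 spreads the skaters further apart (a spread shift).
judge_1 = {"A": 5.8, "B": 5.6, "C": 5.3}
judge_2 = {skater: mark - 0.5 for skater, mark in judge_1.items()}
judge_3 = {"A": 5.8, "B": 5.2, "C": 4.1}

for name, marks in (("Judge 1", judge_1), ("Judge 2", judge_2), ("Judge 3", judge_3)):
    print(name, "marks:", marks, "ordinals:", to_ordinals(marks))
```

All three Judges produce the same ordinals, so the scale differences between them disappear. The price is visible as well: the ordinals no longer show that Judge 1 saw A and B as close while Judge 3 saw them as far apart.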
Both
sets of placement systems lose information.
In the case of BOM and OBO, it
has been shown that these systems are, using 9 Judges, more efficient than a total marks system of 7
Judges (assuming none are tossed). This
is consistent with expectations from non-parametric statistics which would
imply that we should expect at least 86% efficiency from an ordinals based
system. The CoP system, with 7 Judges,
can do no better than OBO and BOM, and can only equal the performance of
OBO and BOM if a) the marking is perfect (no abnormal errors), b) there are no scale
shifts between Judges, and c) all 7 Judges are used.
If any of the 7 Judges are dropped, scale shifts occur, or abnormal marks occur, the CoP system will not perform as well as
BOM and OBO.
Biased Judging And Block Judging:
In
placement systems which involve Judging,
concern exists for the possibility of both biased Judging and what is
now called block Judging.
Mathematically, biased and block
Judging are the same and should have the same mathematical controls. In both cases, marks are artificially raised or lowered for
particular skaters by one or more Judges.
The
OBO and BOM systems control for bias and blocking through their emphasis on the
majority. In both of these systems the
majority is defined by the middle Judge of the panel. The only way to move the middle Judge is for
the biased Judge, or Judging block
members, to cross the middle Judge and therefore change which Judge is in the middle. If the biased or block Judge naturally
provides marks on the high side of the panel for a skater, raising their marks has no effect at all
regardless of how high their marks are raised.
Similarly if they are naturally on the low side of the panel, lowering their marks further has no effect at
all on the results. This is because the
Judge in the middle does not change.
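The role of the middle Judge can be illustrated with the median of a panel's marks for a single skater. This is a deliberate simplification, since the real systems work with ordinals across all skaters; the marks below are hypothetical and middle_mark is an illustrative name.

```python
from statistics import median


def middle_mark(panel_marks):
    """Mark of the middle Judge for one skater (odd-sized panel assumed)."""
    return median(panel_marks)


# Hypothetical marks from a 9 Judge panel for one skater.
honest = [5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.8, 5.9]

# A Judge already on the high side raises a 5.8 to 6.0.
high_side_raised = [5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 6.0, 5.9]

# A Judge on the low side crosses the middle, raising a 5.3 to 6.0.
crossed_middle = [5.2, 6.0, 5.4, 5.5, 5.6, 5.7, 5.8, 5.8, 5.9]

print("honest panel:        ", middle_mark(honest))
print("high side raised:    ", middle_mark(high_side_raised))
print("low side crosses mid:", middle_mark(crossed_middle))
```

Raising a mark that is already above the middle leaves the middle mark at 5.6. Only the Judge who crosses from the low side to the high side moves it, here to 5.7, and even then only as far as the next Judge's mark.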
In
the CoP system, there is apparent
confusion about what constitutes proper controls for block Judging and biased
Judging. Although block and biased
Judging are accomplished in mathematically the same manner, the developers of the CoP system believe
they are using random Judge selection to break up the blocks and tossing high
and low marks to remove bias.
The
interesting part of this is that, while
it is the case that the influence of a block of Judges is reduced by random
selection some of the time, the same
random selection also increases the influence of block Judging some of the
time. On average random selection has no
effect on block Judging and cannot have a long term effect unless the
randomization is biased against particular Judges known to form blocks!
Thus
the only meaningful control for biased and block Judging in the CoP system
comes from tossing the high and low marks.
If the CoP system begins with 14 Judges,
and tosses 5 or 7 through random selection, then all that is accomplished in the long run
is that either 36% (select 9 Judges) or 50% (select 7 Judges) of the
information in the panel is lost by random selection. On top of this, it is now expected that the CoP system will
further reduce the amount of information by tossing two high and two low Judges
(from a panel of 9). Therefore, the CoP system makes placement decisions
using only 36% of the information available to it from the full panel. The CoP system cannot properly handle a
combined total of more than 2 block or biased Judges (in the same
direction). This is because the control for blocking and bias is the
tossing of highs and lows. Add to this
the randomness in the winner selection and the problems with marking
consistency.
So
the CoP system loses 64% of the information in the panel to control at
most, on a long term average, two biased or two block Judges. In contrast, the OBO and BOM systems, in controlling bias and block Judging, lose at most 14% of the information on a
panel and can routinely control biased and block Judging from as many as 4
Judges on a 9 Judge panel or up to 7 Judges on a 14 Judge panel.
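The CoP percentages quoted above follow from simple head counts. The sketch below treats each Judge as an equal share of the panel's information, which is the simplification used in the text; fraction_used is an illustrative name.

```python
def fraction_used(full_panel, selected, tossed_high, tossed_low):
    """Share of the full panel whose marks still count after random
    selection and after tossing the high and low marks."""
    counted = selected - tossed_high - tossed_low
    return counted / full_panel


# Random selection alone, from a panel of 14 Judges.
print(f"select 9 of 14: {1 - 9 / 14:.0%} lost")
print(f"select 7 of 14: {1 - 7 / 14:.0%} lost")

# Select 9, then toss two high and two low: 5 of 14 Judges remain.
used = fraction_used(14, 9, 2, 2)
print(f"select 9, toss 2 high and 2 low: {used:.0%} used, {1 - used:.0%} lost")
```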
Robustness To Error:
Robustness
to marking error by a single Judge is mathematically the same as intentional
bias and has been covered in the preceding section.
Fixed Vs Reversals:
The
BOM and OBO systems suffer from one problem which is somewhat unique – interim
results during a competition are not stable. That is, it appears to the viewer that these placement
systems allow for the performance of a skater to affect the running placement
of skaters who have already completed.
What is actually occurring is that,
since the OBO and BOM systems remove scale shifts, they no longer “know” how far apart the
skaters really are on any individual Judge’s scorecard. For these two systems, the placement results do not really exist
until all other skaters have skated. It
is then, and only then, that the correct ordinals can be created –
and these correct ordinals are the basis for the placement. What exists while the event is underway is
interim estimated ordinals.
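A reversal of this kind can be reproduced with a small made-up example. The sketch below uses a simplified majority-of-ordinals rule as a stand-in; it is an assumption made for illustration and not the official BOM or OBO procedure. The marks and the names ordinals and majority_placement are hypothetical.

```python
def ordinals(judge_marks):
    """Rank one Judge's marks: highest mark gets ordinal 1 (no ties assumed)."""
    ranked = sorted(judge_marks, key=judge_marks.get, reverse=True)
    return {skater: place for place, skater in enumerate(ranked, start=1)}


def majority_placement(panel):
    """Order skaters by a simplified majority-of-ordinals rule.

    panel: list with one dict per Judge, mapping skater -> mark.
    A skater's key is the best place k for which more than half of the
    Judges give an ordinal of k or better. Ties are broken by the size of
    that majority, then by the sum of the ordinals in it, then by the sum
    of all ordinals. Illustrative stand-in only.
    """
    per_judge = [ordinals(marks) for marks in panel]
    skaters = list(panel[0])
    needed = len(panel) // 2 + 1

    def key(skater):
        ords = [judge[skater] for judge in per_judge]
        for k in range(1, len(skaters) + 1):
            within = [o for o in ords if o <= k]
            if len(within) >= needed:
                return (k, -len(within), sum(within), sum(ords))

    return sorted(skaters, key=key)


# Hypothetical marks from 5 Judges; skater C skates last.
panel = [
    {"A": 5.8, "B": 5.7, "C": 5.6},
    {"A": 5.7, "B": 5.6, "C": 5.8},
    {"A": 5.7, "B": 5.6, "C": 5.8},
    {"A": 5.5, "B": 5.8, "C": 5.6},
    {"A": 5.5, "B": 5.8, "C": 5.6},
]

interim = majority_placement([{s: marks[s] for s in ("A", "B")} for marks in panel])
final = majority_placement(panel)
print("after A and B have skated:", interim)
print("after C has skated:       ", final)
```

With these marks A leads B while only two skaters have skated, yet B finishes ahead of A once C's marks are in, even though no Judge changed a mark for A or B. The interim order was only an estimate built from incomplete ordinals.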
The
OBO system is somewhat less sensitive to reversals than the BOM system but that
reduction in frequency of reversals comes at a price. The BOM system achieves higher accuracy in
determining the correct winner in close events by about 15%.
The
CoP system does not suffer from reversals since it has reduced the entire
skating performance to one fixed number.
If a skater achieves higher points than another skater, he will from that point forward remain ahead.
The
CoP placement system essentially provides a single (one dimensional) mark for
each skater upon completion of her program.
The CoP placement system then takes the one dimensional score and
assigns a placement. The root of the
problem with the BOM and OBO systems is that these two placement systems, at any
given time in the competition, retain a 9 dimensional marking times the number
of skaters who have already skated (e.g.: 9 Judges by 10 skaters = 90 dimensions)
and reduce this to a one dimensional outcome (placement). As each skater completes their program and
marks are awarded, the dimensionality of
the projection of the marking space into outcome space changes. The real projection does not occur until all
skaters have skated.
| Property | BOM/OBO | CoP |
| Fully Tested | Yes | No |
| Type of Marking | Subjective - Relative | Subjective - Absolute |
| Precision | High | Questionable |
| Accuracy | Not Applicable | Questionable |
| Balance of Marks | No Concern (tested) | High Concern (new) |
| Major Changes to Performances Expected | None | Emphasis on Jumps, Less Artistry |
| Historical Record | Yes | Yes |
| Flexibility | High | Very Stiff |
| Random Winner | No | Yes |
| Placement Consistent with Marking Intent | Yes | No |
|
Compromise vs Majority |
Majority |
|