What is the arithmetic mean of an empty sequence?

Martin Ba :

Disclaimer: No, I didn't find any obvious answer, contrary to what I expected!

When looking for code examples of the arithmetic mean, the first several I can turn up via Google seem to be defined such that the empty sequence generates a mean value of 0.0 (e.g. here and here ...).

Looking at Wikipedia however, the Arithmetic mean is defined such that an empty sequence would yield 0.0 / 0 --

 A = 1/n ∑[i=1 -> n](a[i])

-- so, possibly, that is NaN in the general case.

So if I write a utility function that calculates the arithmetic mean of a set of floating point values, should I, in the general case:

  • return 0. for the empty sequence?
  • return (Q)NaN for the empty sequence?
  • "throw an exception" in case of empty sequence?

R.M. :

There isn't an obvious answer because the handling depends on how you want to inform calling code of the error. (Or even if you want to interpret this as an "error".)

Some libraries/programs really don't like raising exceptions, so they do everything with signal values. In that case, returning NaN (because the value of the expression is technically undefined) is a reasonable choice.

You might also want to return NaN if you want to "silently" bring the value forward through multiple other calculations. (Relying on the behavior that NaN combined with anything else is "silently" NaN.)
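As a rough illustration of the signal-value approach, here is a minimal Python sketch. The name `mean_or_nan` is made up for this example and does not come from any library.

```python
import math

def mean_or_nan(values):
    """Arithmetic mean; returns NaN for an empty sequence (hypothetical helper)."""
    values = list(values)
    if not values:
        return math.nan
    return math.fsum(values) / len(values)

# NaN flows silently through later arithmetic: no error is raised here.
scaled = mean_or_nan([]) * 2.0 + 1.0
print(math.isnan(scaled))  # True
```

Nothing in the later arithmetic complains; the NaN only becomes visible when something finally inspects the result.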

But note that if you return NaN for the mean of an empty sequence, you impose the burden on calling code that they need to check the return value of the function to make sure that it isn't NaN - either immediately upon return or later on. This is a requirement that is easy to miss, depending on how fastidious you are in checking return values.
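One reason the check is easy to get wrong: NaN compares unequal to everything, including itself, so an equality test never catches it. A small Python sketch, using a plain `float("nan")` as a stand-in for the returned mean:

```python
import math

result = float("nan")  # stand-in for the mean of an empty sequence

# An equality comparison against NaN is always False, even NaN == NaN.
naive_check = (result == float("nan"))

# A dedicated predicate is needed instead.
proper_check = math.isnan(result)

print(naive_check, proper_check)  # False True
```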

Because of this, other libraries/programs take the viewpoint that error conditions should be "noisy" - if you passed an empty sequence to a function that's finding the mean of the sequence, then you're obviously doing something majorly wrong, and it should be made abundantly clear to you that you've messed up.

Of course, if exceptions can be raised, they need to be handled, but you can do that at a higher level, potentially centralized at the point where it makes the most sense. Depending on your program, this may be easier, or more in line with your standard error handling scheme, than double checking return values.
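Python's standard library takes this "noisy" route: `statistics.mean` raises `statistics.StatisticsError` when given empty data. The sketch below shows the exception passing through an intermediate function and being handled in one place; `report_mean` and `run_reports` are hypothetical names used only for illustration.

```python
import statistics

def report_mean(label, values):
    # No local error handling: an empty input simply propagates upward.
    return f"{label}: {statistics.mean(values):.2f}"

def run_reports(datasets):
    """Hypothetical top-level driver with a single, central handler."""
    lines = []
    for label, values in datasets.items():
        try:
            lines.append(report_mean(label, values))
        except statistics.StatisticsError as exc:
            lines.append(f"{label}: no data ({exc})")
    return lines

print(run_reports({"a": [1.0, 2.0], "b": []}))
```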

Other people would argue that your functions should be robust to the error. For maximum robustness, you probably shouldn't use either NaN or an exception - you need to choose an actual number which "makes sense" as a value for the average of an empty list.

Which value that is will be highly specific to your use case. For example, if your sequence is a list of differences/errors, you might want to return 0. If you're averaging test scores (scored 0-100), you might want to return 100 for an empty list ... or 0, depending on what your philosophy of the "starting" score is. It all depends on what the return value is going to be used for.

Given that this "neutral" value is going to vary widely with the exact use case, you might want to actually implement it in two functions - one general function which returns NaN or raises an exception, and another that wraps the general function and recognizes the 'error' case. This way you can have multiple versions, each with a different "default" case. Or, if this is something you're doing a lot of, you might even have the "default" value be a parameter you can pass.
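A minimal Python sketch of that layering follows. All function names are hypothetical, and the defaults of 0 and 100 are just the examples mentioned above, not recommendations.

```python
import math

def strict_mean(values):
    """General version: raises on an empty sequence."""
    values = list(values)
    if not values:
        raise ValueError("mean of an empty sequence is undefined")
    return math.fsum(values) / len(values)

def mean_or_default(values, default):
    """Wrapper: the caller chooses the value for the empty case."""
    values = list(values)
    return strict_mean(values) if values else default

# Use-case-specific versions, each with its own "neutral" value.
def mean_error(errors):
    return mean_or_default(errors, 0.0)

def mean_score(scores):
    return mean_or_default(scores, 100.0)

print(mean_error([]), mean_score([]), mean_score([50, 100]))  # 0.0 100.0 75.0
```

The strict function stays honest about the undefined case, while each wrapper documents in one place which default its use case assumes.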

Again, there isn't a single answer to this question: the average of an empty sequence is undefined. How you want to handle it depends intimately on what the result of the calculation is being used for: Just display, or further calculation? Should an empty list be exceptional, or should it be handled quietly? Do you want to handle the special case at the point in time it occurs, or do you want to hoist/defer the error handling?
