Extremizing with the logistic model

Author

Jonas Moss

Published

Mar 13, 2023

This is a post about forecast aggregation, and extremization in particular. If you’re unfamiliar with forecast aggregation, see e.g. this post.

Suppose we have $n$ equally skilled and knowledgeable forecasters with incomplete information, intending to forecast a binary event. The available information is completely encoded in $\mu \in \mathbb{R}$, and the probability of the event occurring is $p = F(\mu)$, where $F$ is the cumulative distribution function of the logistic distribution.

The forecasters do not load completely on the available information. In other words, they are not aware of all the available information at any one point in time – or are not able to effectively compute what to do with it. The forecasters do, however, load on several idiosyncratic and incorrect sources of information.

Summing these idiosyncratic sources of information together with an error term, we can treat them as a single source of error, denoted $s$.

Let’s model the situation using a logistic model:

$$z_i \sim \operatorname{Logis}(\lambda\mu, s), \qquad z_i = \log\left(\frac{p_i}{1-p_i}\right).$$

Here $\log(p/(1-p))$ is the quantile function of the logistic distribution, hence we assume that $p_i = F(z_i)$, where $F$ is the cumulative distribution function of the logistic distribution. I used the same transform to go from $\mu$ to $p$.
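To make the model concrete, here is a minimal simulation sketch in Python. The parameter values ($\mu$, $\lambda$, $s$, $n$) are arbitrary illustrative choices of mine, not from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters (not from the post).
mu, lam, s, n = 1.0, 0.6, 1.0, 100_000

# Each forecaster reports log-odds z_i ~ Logistic(lambda * mu, s).
z = rng.logistic(loc=lam * mu, scale=s, size=n)

# Probabilities via the logistic CDF: p_i = F(z_i) = 1 / (1 + exp(-z_i)).
p = 1.0 / (1.0 + np.exp(-z))

# The mean log-odds concentrates around lambda * mu, not mu.
print(z.mean(), lam * mu)
```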

Since the logistic distribution is symmetric, we have $EZ_i = \lambda\mu$. It follows that the available information equals $\mu = EZ_i/\lambda$. A natural estimator of $EZ_i$ is $\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{p_i}{1-p_i}\right)$, and a natural estimator of $\mu$ is

$$\hat{\mu} = \frac{1}{\lambda}\cdot\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{p_i}{1-p_i}\right).$$

This is an extremizing estimator of the log-odds provided $0 < \lambda < 1$, which is very reasonable. Moreover, if $\lambda = 1/\sqrt{3} \approx 0.58$, the estimator approximates the extremizing estimator derived by Neyman and Roughgarden, discussed by Jamie Sevilla here. My argument is similar to that of Satopää et al., 2014.
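A minimal sketch of this aggregator in Python, using $\lambda = 1/\sqrt{3}$ as the default (the function name and the example inputs are mine):

```python
import numpy as np

def extremized_aggregate(p, lam=1 / np.sqrt(3)):
    """Average the log-odds, divide by lam, and map back to a
    probability. Extremizes whenever 0 < lam < 1."""
    p = np.asarray(p, dtype=float)
    z = np.log(p / (1 - p))           # individual log-odds
    mu_hat = z.mean() / lam           # extremized estimate of mu
    return 1 / (1 + np.exp(-mu_hat))  # back to a probability

# Three forecasters leaning the same way: the aggregate is pushed
# further from 1/2 than the naive average.
print(extremized_aggregate([0.7, 0.75, 0.8]))  # roughly 0.87
```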

Comments

  1. Multiplicative bias. I am assuming a multiplicative bias in $\lambda\mu$. The same kind of result does not occur if I assume the bias is additive. Moreover, don’t people usually assume there is no bias of this kind when talking about the wisdom of the crowds and so on? Why should there be a multiplicative bias in this case? I believe the answer is simple: you can’t expect anyone to know all the available information, or to know what to do with it. Of course, it is possible to overload on the information too, having $\lambda > 1$, but it seems unlikely for the entire population of forecasters to have an average $\lambda$ above 1.
  2. Extensions. There are so many ways to extend this model. For instance, it doesn’t allow for forecasters with different $\lambda_i$s. Evidently, if we knew the $\lambda_i$s of all forecasters, we could modify the formula easily. Moreover, we don’t use different $s_i$s. Both these parameters can potentially be estimated for each forecaster, giving estimates of something like forecaster knowledge ($\lambda_i$) and forecaster skill (how good he is at applying his knowledge, or $s_i$).
  3. Question-specific. Consider the problem of estimating the weight of a cow. In this experiment, the true weight was 1272 lbs, and the mean forecasted weight was 1355 lbs, suggesting a $\lambda$ of 1.06. This $\lambda$ appears not to be a statistical artifact, as the standard deviation appears to be less than 1000. A conservative 95% confidence interval would be about $1272 \pm 16$; see the arithmetic check after this list.
  4. Sensitivity. The same kind of extremizing happens in every model of this kind (e.g., using a normal instead of a logistic), as the relationship $E(Z_i) = \lambda\mu$ still holds. The formula won’t be the same, but the behavior will be similar; see the simulation sketch after this list.
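Regarding comment 3: the confidence-interval arithmetic is $1.96 \cdot \mathrm{sd}/\sqrt{n}$. The post doesn’t state the number of guesses, so the $n$ below is a hypothetical value I chose that reproduces the $\pm 16$ figure.

```python
import numpy as np

true_weight = 1272    # lbs, from the experiment
mean_forecast = 1355  # lbs, from the experiment
sd_bound = 1000       # conservative bound on the sd, from the post
n = 15_000            # hypothetical number of guesses, my assumption

half_width = 1.96 * sd_bound / np.sqrt(n)
print(f"95% CI for an unbiased crowd mean: {true_weight} +/- {half_width:.0f}")

# The observed mean of 1355 lies far outside 1272 +/- 16, so the
# multiplicative bias looks real rather than like sampling noise.
print(mean_forecast / true_weight)  # about 1.065
```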
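Regarding comment 4, a small simulation sketch: swap the logistic for a normal distribution, and the same recipe – dividing the mean of the $z_i$ by $\lambda$ – still recovers $\mu$. Parameter values are again arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters (not from the post).
mu, lam, s, n = 1.0, 0.6, 1.0, 100_000

# Normal variant of the model: z_i ~ N(lambda * mu, s^2).
z = rng.normal(loc=lam * mu, scale=s, size=n)

# E(Z_i) = lambda * mu still holds, so dividing by lambda
# again recovers mu: the extremizing step carries over.
print(z.mean() / lam, mu)  # both approximately 1.0
```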