As I noted earlier, reading the book with pen in hand jogged loose various thoughts. . . . The book is about unexpected events (“black swans”) and the problems with statistical models such as the normal distribution that don’t allow for these rarities. From a statistical point of view, let me say that multilevel models (often built from Gaussian components) can model various black swan behavior. In particular, self-similar models can be constructed by combining scaled pieces (such as wavelets or image components) and then assigning a probability distribution over the scalings, sort of like what is done in classical spectrum analysis of 1/f noise in time series. For some interesting discussion in the context of “texture models” for images, see the chapter by Yingnian Wu in my book with Xiao-Li on applied Bayesian modeling and causal inference. (Actually, I recommend this book more generally; it has lots of great chapters in it.)
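To illustrate the point that Gaussian building blocks can yield fat-tailed behavior, here is a minimal simulation sketch (my own toy example, not from the texture-modeling chapter): giving each Gaussian draw its own randomly drawn scale produces a Student-t, whose polynomial tails throw off "black swan" extremes far more often than a single fixed-scale Gaussian does.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# White swans: a single Gaussian with one fixed scale.
white = rng.normal(0.0, 1.0, size=n)

# A simple multilevel construction: draw a variance for each
# observation from an inverse-gamma distribution, then draw a
# Gaussian with that scale. This scale mixture is exactly a
# Student-t with nu degrees of freedom; small nu means fat tails.
nu = 3.0
variances = 1.0 / rng.gamma(nu / 2.0, 2.0 / nu, size=n)
mixed = rng.normal(0.0, np.sqrt(variances), size=n)

# Count extreme ("black swan") draws beyond 5 units.
p_white = np.mean(np.abs(white) > 5)
p_mixed = np.mean(np.abs(mixed) > 5)
print("P(|x| > 5), single Gaussian:", p_white)
print("P(|x| > 5), scale mixture:  ", p_mixed)
```

The single Gaussian essentially never exceeds 5 units, while the mixture does so regularly, even though every individual component is Gaussian.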
That said, I admit that my two books on statistical methods are almost entirely devoted to modeling “white swans.” My only defense here is that Bayesian methods allow us to fully explore the implications of a model, the better to improve it when we find discrepancies with data. Just as a chicken is an egg’s way of making another egg, Bayesian inference is just a theory’s way of uncovering problems that can lead to a better theory. I firmly believe that what makes Bayesian inference really work is a willingness (if not eagerness) to check fit with data and to abandon and improve models often.
update: Gelman follows up on his comments with:
In their paper, Dan Goldstein and Nassim Taleb write: “Finance professionals, who are regularly exposed to notions of volatility, seem to confuse mean absolute deviation with standard deviation, causing an underestimation of 25% with theoretical Gaussian variables. In some fat-tailed markets the underestimation can be up to 90%. The mental substitution of the two measures is consequential for decision making and the perception of market variability.”
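The 25% figure for the Gaussian case is easy to check by simulation: for a normal distribution the standard deviation exceeds the mean absolute deviation by a factor of sqrt(pi/2) ≈ 1.25, so reporting the MAD while thinking "standard deviation" understates the spread by about 25%. A quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)

sd = x.std()
mad = np.mean(np.abs(x - x.mean()))  # mean absolute deviation

# For a Gaussian, sd/mad = sqrt(pi/2) ≈ 1.253, i.e. the standard
# deviation is about 25% larger than the mean absolute deviation.
ratio = sd / mad
print(f"sd/mad = {ratio:.3f}  (theory: {np.sqrt(np.pi / 2):.3f})")
```

In fat-tailed distributions the gap widens, since the standard deviation is driven by the extreme draws while the MAD is much less sensitive to them.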
This interests me, partly because I’ve recently been thinking about summarizing variation by the mean absolute difference between two randomly sampled units (in mathematical notation, E(|x_i - x_j|)), because that seems like the clearest thing to visualize. Fred Mosteller liked the interquartile range, but that’s a little too complicated for me; also, I like to do some actual averaging, not just take medians, which miss some important information. I agree with Goldstein and Taleb that there’s not necessarily any good reason for using the sd (except for mathematical convenience in the Gaussian model).
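This summary is straightforward to estimate by pairing random resamples of the data; here is a sketch of that (with the Gaussian reference value, E(|x_i - x_j|) = 2*sigma/sqrt(pi) ≈ 1.128*sigma, since the difference of two independent normals has variance 2*sigma^2):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=200_000)

# Estimate E(|x_i - x_j|) by drawing many random index pairs.
i = rng.integers(0, x.size, size=500_000)
j = rng.integers(0, x.size, size=500_000)
mean_abs_diff = np.mean(np.abs(x[i] - x[j]))

# Gaussian theory: x_i - x_j ~ N(0, 2 sigma^2), so
# E(|x_i - x_j|) = 2 sigma / sqrt(pi) ≈ 1.128 sigma.
print(f"E|x_i - x_j| ≈ {mean_abs_diff:.3f}  "
      f"(theory: {2 / np.sqrt(np.pi):.3f})")
```

One appeal of this summary is its direct interpretation: pick two units at random, and this is how far apart you expect them to be, with no reference to a model.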