Wednesday, April 07, 2010
The Knotty Question of Sensitivity
See below the gizmo for the explanation!
I wanted to discuss Mark's new analytical angle (measuring elements of the economy against durable goods consumption). And I was all set to discuss the whys and wherefores with a healthy dollop of data served up for your analysis via gizmo from the Consumer Expenditure Survey (aka Consumer Units Az U Have Not Known Them). And I was wafting merrily onwards, having finally almost figured out how to cram this stuff in blogger to my grumpy, lazy satisfaction. (My standards are low.)
And then I realized that I might be writing econo-babble as a result of some discussion on another blog over a contention that was apparently based on an exercise in bogosity.
So now I am starting a little further back. Skip this post if you and econo-babble are dear friends of long acquaintance. The dice-throwing exercise is here because I want to use it to demonstrate some issues about econometrics, sensitivity and robusticity. Without math.
Definitions:
In economic discussions, "sensitivity" is often used casually to refer to how strongly the economy will react to certain types of forcings.
There is, however, a deeper and more structural meaning of "sensitivity" used in many types of statistical studies and modeling. So I quote from Chapter 12 of the Interactive Textbook on Clinical Symptom Research:
What is Sensitivity Analysis?

In short, a sensitivity analysis is an attempt to measure the degree to which it is reasonably likely that the conclusions of the study or model are basically nonsense. The math may be perfect, the eyes may be bright, advanced degrees may adorn the authors' and contributors' names, and funding can be flooding through the door - but nevertheless, you should never, never believe any of it unless assumptions are clearly disclosed and some sort of discussion of the sensitivity analysis is available. If the authors didn't bother to find out whether they were wasting their time, they are either not very good at study design/modeling or they wanted to get to a certain conclusion in the first place. Either way, it's apt to be nonsense.
No matter how well-executed or comprehensive an economic evaluation, the data on costs and outcomes will inevitably contain various degrees of uncertainty and potential bias. Investigators often make best estimates of unknown variables based on available information from experts and the literature.
Sensitivity analyses are performed to test the robustness of study results and conclusions when these underlying assumptions or estimates are varied. This process reveals the degree of uncertainty, imprecision, or methodological controversy in the evaluation.
Examples of questions addressed in sensitivity analysis include the following:
* What if a discount rate of 6% was used instead of 2%?
* What if the compliance rate for influenza vaccination was 10% higher than originally assumed?
* What if the per diem hospital cost underestimated the true economic cost of the health care program by $100?
* What if indirect and intangible costs were not considered?
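The first of those what-ifs can be sketched in a few lines of Python. The numbers here are purely hypothetical (a made-up program costing $1,000 up front and saving $130 a year for ten years); the point is only that the conclusion can flip sign depending on which discount rate you assume, which is exactly what a sensitivity analysis is meant to expose.

```python
# Toy sensitivity analysis: does the conclusion survive a change in the
# assumed discount rate? All figures are hypothetical.

def npv(annual_saving, years, discount_rate, upfront_cost):
    """Net present value: discounted savings stream minus the upfront cost."""
    pv = sum(annual_saving / (1 + discount_rate) ** t
             for t in range(1, years + 1))
    return pv - upfront_cost

for rate in (0.02, 0.06):
    result = npv(annual_saving=130, years=10, discount_rate=rate,
                 upfront_cost=1000)
    verdict = "worth it" if result > 0 else "not worth it"
    print(f"discount rate {rate:.0%}: NPV = {result:+.2f} ({verdict})")
```

With these made-up inputs, the program looks worthwhile at a 2% discount rate and a money-loser at 6% - same data, same math, opposite conclusion. A study that never varied the rate would never know.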
It is really HARD to do this sort of thing well. The study of systems with many unknowns is generally incremental - one has to observe, compile effects, theorize, and test to work out how the system is functioning before one can safely make predictions.
In economics, the system you are studying is constantly changing. You are always chasing a moving target. So relationships that have held true for some time may have shifted, which means that the body of accumulated knowledge on which one is relying may be far less applicable than accumulated experience and research lead one to believe. This is a major source of error even for very good economists.
Aside from the basic lack of understanding of the entire system, there are other pretty predictable sources of error which are generally unavoidable:
1) Testing too small a sample. Here is the first gizmo exercise for the math-allergic. Leave the defaults as is and run the program. Note that you will get large variances. By definition, three rolls of a six-sided die cannot come out with an even probability distribution.
Suppose (as is usually true during times of extreme economic change) that you do not know the numbers on the dice. The dice are handled by another person. You know that there are six dice, and you get to see only three after a random roll. You are not even going to find out the range of numbers on the six dice, since you will only see three. Not only is this too small a sample, the sample chosen (or available) is too short to even give you an idea of the components of the system.
Try another experiment. Roll the dice 18 times. Repeat that a few times. Most often you will at least get each side represented in the outcome set for 18 rolls. However you will notice that the distribution of outcomes varies substantially for each set of 18 rolls. If the dice are not loaded, in theory each number should have come up three times (18/6). In practice, you won't get close to an even distribution until you start rolling the dice a thousand times. At 10,000 rolls, your variances should mostly be in the low single digits. At 100,000 times, you begin to approach the expected result.
Now change the number of sides to 15. Try rolling the dice 1,000 times, 10,000 times and 100,000 times. You'll see that the variances group considerably higher than for a six-sided die at 1,000 times, but that the variances converge for the higher roll counts.
A 15-sided die rolled 10,000 times is roughly comparable to the measurements we perform on the US economy to calculate GDP. In other words, we are not very good at measuring aspects of the economy that are smaller than about 6% of the total. There is a reason why these numbers are revised for years. We certainly don't know true annual GDP to 50 bps (1/2 a percent) even after years of recalculation.
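The gizmo itself isn't reproduced here, but the same experiment can be sketched in Python (an assumed reconstruction, not the gizmo's actual code): roll a fair die some number of times and see how far the worst face strays from its expected share. The deviations shrink as the roll count grows, and the 15-sided die needs more rolls to converge than the 6-sided one.

```python
import random
from collections import Counter

def max_deviation_pct(sides, rolls, seed=0):
    """Roll a fair `sides`-sided die `rolls` times; return the largest
    deviation of any face's observed share from the expected 1/sides,
    in percentage points."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, sides) for _ in range(rolls))
    expected = 1 / sides
    return max(abs(counts.get(face, 0) / rolls - expected)
               for face in range(1, sides + 1)) * 100

for sides in (6, 15):
    for rolls in (18, 1_000, 10_000, 100_000):
        print(f"{sides:>2}-sided die, {rolls:>7,} rolls: "
              f"max deviation {max_deviation_pct(sides, rolls):.2f} pct pts")
```

Run it a few times with different seeds: at 18 rolls the deviations bounce around wildly, and only in the tens of thousands of rolls do they settle toward zero.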
2) Testing too short an interval: Suppose that the dice are loaded, but they are loaded in such a way as to vary their face bias based on the amount of sunlight available. Clearly, running your sample at 11:00 AM is not going to give you results predictive of an entire 24 hour cycle.
Economics is dominated by cycles. The most basic is the year. Because of this, most important measures are seasonally adjusted, but the seasonal adjustments are a running average of those effects as observed in recent cycles. During times of rapid change, the seasonal adjustments may be disproportionately off for a time. Annual (YoY) changes are the strongest basis for economic analysis for this reason.
There are also effects related to longer economic cycles such as expansions and contractions. Components of the economy that may have relatively steady impacts may abruptly shift to exert unusually strong or weak effects at various stages of cycles. Sometimes the shift is relatively predictable, and sometimes it isn't. The shift can be quite strong - roughly equivalent to a 20-sided die suddenly having 2 faces marked with the same number. Because you are sampling numbers and weighting your sample data based on your belief about the die having 20 separate faces, you can aggregate your sample data and create a very unrealistic result.
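The sunlight-loaded die is easy to simulate too. In this sketch (the bias scheme is invented for illustration), the die favors face 6 during daylight hours and is fair at night. Sampling only at 11 AM gives a markedly different picture of the die than sampling across the full 24-hour cycle - the analogue of measuring an economy at only one stage of its cycle.

```python
import random

def roll(hour, rng):
    """One roll of a hypothetical 'sunlight-loaded' die at a given hour."""
    if 6 <= hour < 18:                       # daylight: face 6 twice as likely
        faces = [1, 2, 3, 4, 5, 6, 6]
    else:                                    # night: fair die
        faces = [1, 2, 3, 4, 5, 6]
    return rng.choice(faces)

def share_of_sixes(hours, rolls_per_hour=10_000, seed=0):
    """Fraction of sixes observed when sampling only at the given hours."""
    rng = random.Random(seed)
    results = [roll(h, rng) for h in hours for _ in range(rolls_per_hour)]
    return results.count(6) / len(results)

print("sampled at 11 AM only:   ", share_of_sixes([11]))
print("sampled across 24 hours: ", share_of_sixes(range(24)))
```

The 11 AM sample converges on the daytime bias (about 2/7 sixes), while the full-cycle sample lands noticeably lower. An analyst who only ever observed the morning would confidently misstate how the die behaves.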
These two sampling problems, combined with the need to extrapolate and predict, can produce some surprisingly large errors. I will continue this tomorrow.
Comments:
So, on the first 3 rolls that d20 rolled a 1, a 12, and a 19. I didn't like that, but the next six rolls were a 12, 14, 15, 17, and two 18s. I think I'll keep it ...
:-)
Those of us required to take stat in college are well aware of these issues, which only highlights how limited stat is in the financial world. But our always-hungry content media's misuse of data knows no boundary!
MOM,
"And then I realized that I might be writing econo-babble as a result of some discussion on another blog over a contention that was apparently based on a an exercise in bogosity."
I am fluent in econo-babble! Many of my charts would cease to have any meaning if Mr. Fusion was rolled out for private consumption in the morning.
I look through data seeking out trends that might support what I already think might be going on, but really it's just data mining. Data mining is very risky. It's easy to spot trends in completely random data as you point out in this post.
I'd also be among the first to point out that the economy is far too complex for any one chart to do it justice.
That said, I have mined Who Struck John's dice data (while trying to ignore how he came up with it) and would suggest the Paladin as a decent class choice. AD&D for the win! ;)
"And then I realized that I might be writing econo-babble as a result of some discussion on another blog over a contention that was apparently based on a an exercise in bogosity."
I am fluent in econo-babble! Many of my charts would cease to have any meaning if Mr. Fusion was rolled out for private consumption in the morning.
I look through data seeking out trends that might support what I already think might be going on, but really it's just data mining. Data mining is very risky. It's easy to spot trends in completely random data as you point out in this post.
I'd also be among the first to point out that the economy is far too complex for any one chart to do it justice.
That said, I have mined Who Struck John's dice data (while trying to ignore how he came up with it) and would suggest the Paladin as a decent class choice. AD&D for the win! ;)
I also want to point to my Trend Line Disclaimer.
If finding a suitable wobbly line that matched past data was all it took to predict the future, I could tell you what the weather will be like 10 years from now. Not going to happen, lol.
I'm also a fan of the book "A Random Walk Down Wall Street". Its author describes showing a chart to one of his chartist friends.
"What is this company?" he exclaimed. "We've got to buy immediately. This pattern's a classic. There's no question the stock will be up 15 points next week." He did not respond kindly to me when I told him this chart had been produced by flipping a coin.
This post is an excellent explanation of the reason why centrally planned economies don't work. No one is smart enough or can ever know enough to make the decisions. Adam Smith, bless his Scottish heart, continues to be pertinent.