Backtesting and Cognitive Dissonance

Cognitive dissonance is the term used to describe the discomfort people feel when they are expected to hold two or more beliefs that, on their face, seem to contradict each other.

Backtesting is the application of a trading system to a historical data series.

Expert opinions about backtesting can bring about cognitive dissonance.

One opinion is that backtesting, particularly in its extreme form as optimizing over an in-sample set of data, is bad. The opinion continues that in-sample results have little value and cannot be used to estimate future performance. I often express that opinion.

Another opinion is that a trader must have confidence that the optimization process chooses the best set of logic and parameters. That set is the one that best identifies patterns, conditions, and signals that precede profitable trading opportunities. I regularly express that opinion.

I do believe both points of view. How can I resolve the dissonance?

In the first case, my criticism is of blindly mining a set of data in search of the best fit, expecting the result to be a robust and reliable trading system.

If the data being processed is not a financial time series, then that best fit probably is both descriptive of the in-sample data and predictive of the out-of-sample data. But financial time series data is different. Its characteristics change, influenced by economic cycles, government policies, political actions, global events, and corporate actions. Models that accurately fit one time period lose accuracy as time passes and the important patterns in the data shift.

Proper development of trading systems requires that the model be able, or be allowed, to adapt to changes in the data. That can be accomplished in either of two ways: by including logic that recognizes changes and applies different rules or different parameter values to different conditions, or by periodically repeating the search for the best fit to the recent data, that is, by reoptimizing.

For some indicators, such as moving averages, adaptive algorithms are well known and efficiently implemented. Attempting to recognize and adapt to more general changes greatly increases the complexity of the logic and is usually not practical.
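As one illustration of an adaptive moving average, here is a minimal sketch in the style of Kaufman's adaptive moving average. It is my choice of example, not a method prescribed above. The function name, parameter names, and default values are assumptions made for the illustration. The idea is that the smoothing speeds up when price moves efficiently in one direction and slows down when price is choppy.

```python
def adaptive_moving_average(prices, er_period=10, fast=2, slow=30):
    """Kaufman-style adaptive moving average (illustrative sketch).

    The parameter names and defaults are assumptions for this example.
    """
    if er_period < 1 or len(prices) <= er_period:
        raise ValueError("need more prices than er_period")

    fast_sc = 2.0 / (fast + 1)   # smoothing constant used in clean trends
    slow_sc = 2.0 / (slow + 1)   # smoothing constant used in choppy data

    # Seed the first er_period values with the raw prices.
    out = [float(p) for p in prices[:er_period]]
    prev = out[-1]

    for t in range(er_period, len(prices)):
        # Net movement over the window versus total bar-to-bar movement.
        change = abs(prices[t] - prices[t - er_period])
        volatility = sum(abs(prices[i] - prices[i - 1])
                         for i in range(t - er_period + 1, t + 1))
        efficiency = change / volatility if volatility > 0 else 0.0

        # Efficiency near 1 gives fast smoothing; near 0 gives slow smoothing.
        sc = (efficiency * (fast_sc - slow_sc) + slow_sc) ** 2
        prev = prev + sc * (prices[t] - prev)
        out.append(prev)
    return out
```

The adaptation here is narrow: only the smoothing speed responds to the data. That is why it is easy to implement, and also why it does not address the more general changes in the data.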

The other approach, to periodically reoptimize, requires that the length of the in-sample period be consistent with the length of time the characteristics of the data remain stationary. There is no general rule for determining that length. It can only be determined by experimentation. Given the length of time the data is in a stable state, relative to the specific model, the system must use the early portion to synchronize the logic to the data and the later portion to trade it. That is, the length of the in-sample period plus the length of the out-of-sample period must be no greater than the length of the period of stability.

Toward the end of the out-of-sample period, system performance degrades as the characteristics of the data change beyond the ability of the logic to adapt. The trader must either reoptimize (resynchronize) on a time schedule such that system performance is not allowed to degrade, or he must be able to monitor the health of the system and resynchronize it as necessary.

The trading system developer gains the necessary confidence in the resynchronization process by practicing. He tries different combinations of logic and parameters, optimizes over an in-sample period, selects the best alternative of those tested, and tests over an out-of-sample period. The walk forward technique is an automated process that does exactly that. It repeatedly searches an in-sample period, selects the best, and tests the following out-of-sample period. Each step is a practice event in anticipation of the time when the developer moves the system from development to trading. The success of the walk forward process depends on accurately fitting the model to the data over the in-sample period, and choosing the alternative that the developer prefers.
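The walk forward steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production implementation. The toy momentum rule, the function names, the window lengths, and the default objective (the sum of in-sample returns) are all hypothetical. The developer would substitute his own logic and whatever objective function expresses his preferences.

```python
import random


def strategy_returns(prices, lookback, start, end):
    """Per-bar returns of a toy momentum rule over bars start..end-1.

    Hypothetical rule: be long over bar t if the close of bar t-1 is above
    the close `lookback` bars before it; otherwise be flat. The signal uses
    only data available before bar t, so there is no look-ahead.
    """
    rets = []
    for t in range(max(start, lookback + 1), end):
        signal = prices[t - 1] > prices[t - 1 - lookback]
        bar_return = prices[t] / prices[t - 1] - 1.0
        rets.append(bar_return if signal else 0.0)
    return rets


def walk_forward(prices, candidates, is_len, oos_len, objective=sum):
    """Repeatedly fit in-sample, then test on the following out-of-sample bars.

    Returns (steps, oos_returns). Each step is a tuple
    (is_start, is_end, oos_end, best_lookback); oos_returns is the
    concatenation of every out-of-sample period's per-bar returns.
    """
    if is_len <= max(candidates) + 1:
        raise ValueError("in-sample period too short for the longest lookback")

    steps, oos_returns = [], []
    start = 0
    while start + is_len + oos_len <= len(prices):
        is_end = start + is_len
        oos_end = is_end + oos_len

        # Search the in-sample period and select the preferred alternative.
        best = max(candidates, key=lambda p: objective(
            strategy_returns(prices, p, start, is_end)))

        # Test the selected alternative on data it has never seen.
        oos_returns.extend(strategy_returns(prices, best, is_end, oos_end))
        steps.append((start, is_end, oos_end, best))

        # Slide forward by one out-of-sample period and resynchronize.
        start += oos_len
    return steps, oos_returns


if __name__ == "__main__":
    rng = random.Random(42)          # synthetic prices, for illustration only
    prices = [100.0]
    for _ in range(499):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))

    steps, oos = walk_forward(prices, [5, 10, 20, 40], is_len=120, oos_len=30)
    for is_start, is_end, oos_end, best in steps:
        print(f"fit bars {is_start}-{is_end - 1}, "
              f"trade bars {is_end}-{oos_end - 1}, lookback {best}")
    print("combined out-of-sample bars:", len(oos))
```

Only the out-of-sample returns are accumulated. The in-sample results are used to choose among the alternatives and are then discarded.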

The final result of the walk forward process is a set of several out-of-sample periods, each of which resulted from using the best in-sample fit. That set of combined out-of-sample trades is the very best estimate of future performance available. It is the gold standard. We can use it as a baseline with which to determine the best position size and against which to compare actual performance.

Having successful results from walk forward tests is what gives me the confidence I express in the second opinion. We arrived at it by having confidence in backtesting, but only after providing for the unique considerations encountered in developing trading systems. This removes the dissonance, at least for me.


Reprinted with permission of Dr. Howard B. Bandy.