Sir Francis Galton coined the term whilst investigating genetic inheritance, having observed that the children of individuals with extreme attributes displayed, on average, levels that had 'regressed' towards the population mean. Children of extremely tall parents tended to drift slightly back towards the average rather than growing indiscriminately more extreme in stature. The observation extends to other inherited traits, and indeed appears in many natural processes in which a random variable takes on an extreme deviation.
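The effect needs no biology at all; it falls out of any process in which an observed value mixes a persistent component with independent noise. A minimal simulation (my own illustration with made-up parameters, not Galton's data) makes the point: with a parent-child correlation below one, the children of the tallest parents average closer to the population mean than their parents do.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, rho = 175.0, 7.0, 0.5  # assumed mean height (cm), spread, parent-child correlation

parents = rng.normal(mu, sigma, 100_000)
# Child height: partially inherited deviation plus independent noise
children = mu + rho * (parents - mu) + rng.normal(0, sigma * np.sqrt(1 - rho**2), parents.size)

tall = parents > np.quantile(parents, 0.95)  # top 5% of parents by height
print(f"Tall parents average:   {parents[tall].mean():.1f} cm")
print(f"Their children average: {children[tall].mean():.1f} cm")
# The children's average sits between the parents' average and mu: regression to the mean.
```

The children remain taller than average (the deviation is partly persistent), but less extreme than their parents (the noise does not repeat); nothing "pulls" heights back, the extreme draw was simply unlikely to recur.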
Consider the data below on San Francisco tides, sourced from the National Oceanic and Atmospheric Administration (NOAA). If one identifies the extreme highs and lows of the distribution, and then examines the tides a fixed timestep after these extreme measurements, a clear pattern appears: the subsequent observations cluster closer to the mean, pulled away from the extreme points initially observed. Reproducing the same analysis on the daily returns of the S&P 500 yields a similar result - contrary to popular intuition, a day in the red is most likely to be followed by a day of positive returns, especially when one considers that the most extreme days tend to skew negative in financial time-series.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
# Load tidal data from NOAA
data = pd.read_csv('tides_san_francisco.csv')
# Convert date time strings to datetime objects
data['Date Time'] = pd.to_datetime(data['Date Time'])
# Extract date and time information
data['Date'] = data['Date Time'].dt.date
data['Time'] = data['Date Time'].dt.time
# Identify extreme tides (top and bottom 1%)
data['High Tides'] = (data['Prediction'] > data['Prediction'].quantile(0.99)).astype('int')
data['Low Tides'] = (data['Prediction'] < data['Prediction'].quantile(0.01)).astype('int')
# Record the tide prediction a fixed number of observations (60) after each extreme
data['Post High'] = np.where(data['High Tides']==1, data['Prediction'].shift(-60), np.nan)
data['Post Low'] = np.where(data['Low Tides']==1, data['Prediction'].shift(-60), np.nan)
# Plotting the data
plt.figure(figsize=(12, 6))
# Plot extreme high tides
plt.subplot(1, 2, 1)
plt.hist(data.loc[data['High Tides']==1, 'Post High'].dropna(), bins=30, edgecolor='black', alpha=0.7)
plt.axvline(data['Prediction'].mean(), color='red', linestyle='--', label='Mean Tide Level')
plt.title('Subsequent Tides After Extreme High Tides')
plt.xlabel('Tide (meters)')
plt.ylabel('Frequency')
plt.legend()
# Plot extreme low tides
plt.subplot(1, 2, 2)
plt.hist(data.loc[data['Low Tides']==1, 'Post Low'].dropna(), bins=30, edgecolor='black', alpha=0.7)
plt.axvline(data['Prediction'].mean(), color='red', linestyle='--', label='Mean Tide Level')
plt.title('Subsequent Tides After Extreme Low Tides')
plt.xlabel('Tide (meters)')
plt.ylabel('Frequency')
plt.legend()
plt.tight_layout()
plt.show()
import yfinance as yf
# Fetch historical stock data
ticker = 'SPY'
data = yf.download(ticker, start='1990-01-01', end='2025-01-01', progress=False, auto_adjust=False)
data['Daily Return'] = data['Adj Close'].pct_change() * 100
# Identify days with exceptionally high/low returns (top and bottom 0.5%)
high_return_days = data[data['Daily Return'] > data['Daily Return'].quantile(0.995)].copy()
low_return_days = data[data['Daily Return'] < data['Daily Return'].quantile(0.005)].copy()
# Calculate subsequent day returns
high_return_days['Next Day Return'] = data['Daily Return'].shift(-1).loc[high_return_days.index]
low_return_days['Next Day Return'] = data['Daily Return'].shift(-1).loc[low_return_days.index]
# Plotting the data
plt.figure(figsize=(12, 6))
# Plot next-day returns after extreme high return days
plt.subplot(1, 2, 1)
plt.hist(high_return_days['Next Day Return'], bins=30, edgecolor='black', alpha=0.7)
plt.axvline(data['Daily Return'].mean(), color='red', linestyle='--', label='Mean Daily Return')
plt.title('Distribution of Next Day Returns Following High Return Days')
plt.xlabel('Next Day Return (%)')
plt.ylabel('Frequency')
plt.legend()
# Plot next-day returns after extreme low return days
plt.subplot(1, 2, 2)
plt.hist(low_return_days['Next Day Return'], bins=30, edgecolor='black', alpha=0.7)
plt.axvline(data['Daily Return'].mean(), color='red', linestyle='--', label='Mean Daily Return')
plt.title('Distribution of Next Day Returns Following Low Return Days')
plt.xlabel('Next Day Return (%)')
plt.ylabel('Frequency')
plt.legend()
plt.tight_layout()
plt.show()
If you find yourself dubious of this proposition, consider a very simple trading strategy that takes the reverse position of the previous day's return, sized proportionally to its magnitude (i.e. if the S&P 500 dropped by 3% yesterday, buy with 3x notional exposure today). Repeating this procedure daily results in a surprisingly performant strategy, with a risk-adjusted return in line with the market itself (excluding trading costs, which may well detract from its implementation). Even more surprisingly, the strategy excels during volatile periods, precisely when equity markets tend to struggle most (although perhaps not all that surprising in light of the previous analysis). I was amused to find Edwin Lefèvre describe a scene in Reminiscences of a Stock Operator in which a renowned trader of the early 1900s made a small fortune buying up all the shares he could get while the market was in turmoil and selling pressure was high.
More sophisticated indicators may be used to identify the extreme bounds of a price series, such as the Internal Bar Strength (IBS) indicator, which measures where an instrument closes relative to its daily range: IBS = (Close - Low) / (High - Low). Using this indicator in a very simplistic trading set-up shows merit, and highlights the potential for extracting information from datapoints other than the closing price alone.
import math
# Download historical data for the S&P 500 index
start_date = '2000-01-01'
end_date = '2025-01-01'
df = yf.download('^GSPC', start=start_date, end=end_date, progress=False, auto_adjust=False)
# Create signals DataFrame
signals = pd.DataFrame(index=df.index)
signals['Price'] = df['Adj Close']
signals['Return'] = signals['Price'].pct_change()
signals['Position'] = -signals['Return'].shift(1)
signals['Strategy_Return'] = signals['Position'] * signals['Return']
signals['Cumulative_Return'] = (1 + signals['Strategy_Return']).cumprod()
# Calculate performance metrics
first_valid_index = signals['Cumulative_Return'].first_valid_index()
last_valid_index = signals['Cumulative_Return'].index[-1]
total_days = (last_valid_index - first_valid_index).days
annReturn = (signals['Cumulative_Return'].iloc[-1] / signals['Cumulative_Return'].loc[first_valid_index]) ** (365 / total_days) - 1
annVol = math.sqrt(252) * signals['Cumulative_Return'].pct_change(1).std()
sharpeRatio = annReturn / annVol
# Plot cumulative returns
plt.figure(figsize=(8, 4))
plt.plot(np.log(signals['Cumulative_Return']))
plt.title('Next Day Reversion Strategy Performance', fontsize=14)
plt.xlabel('Date', fontsize=10)
plt.ylabel('Log Cumulative Return', fontsize=10)
plt.tight_layout()
# Annotate the plot with performance metrics
plt.text(0.01, 0.97, f"Annualized Return: {annReturn:.2%}\n"
                     f"Annualized Volatility: {annVol:.2%}\n"
                     f"Sharpe Ratio: {sharpeRatio:.2f}",
         transform=plt.gca().transAxes, fontsize=8, verticalalignment='top',
         bbox=dict(facecolor='white', alpha=0.8, edgecolor='gray'))
plt.show()
# Download price data
ticker = '^GSPC'
data = yf.download(ticker, start='2000-01-01', end='2025-01-01', progress=False, auto_adjust=False)
# Compute the IBS indicator
data['IBS'] = (data['Close'] - data['Low']) / (data['High'] - data['Low'])
# Initialize columns for signals and positions
data['Signal'] = 0
data['Position'] = 0
# Generate buy signals
data.loc[data['IBS'] < 0.10, 'Signal'] = 1
# Generate sell signals (when to close the position)
data['Prev_High'] = data['High'].shift(1)
data.loc[data['Close'] > data['Prev_High'], 'Signal'] = -1
# Implement the logic to maintain the position until a sell signal is generated
in_position = False
for i in range(len(data)):
    if data['Signal'].iloc[i] == 1 and not in_position:
        data.loc[data.index[i], 'Position'] = 1
        in_position = True
    elif data['Signal'].iloc[i] == -1 and in_position:
        data.loc[data.index[i], 'Position'] = 0
        in_position = False
    elif in_position:
        data.loc[data.index[i], 'Position'] = 1
data['Position'] = data['Position'].ffill().fillna(0)
# Calculate the strategy returns
data['Strategy_Returns'] = data['Position'].shift(1) * data['Adj Close'].pct_change()
data['Cumulative_Strategy_Returns'] = (1 + data['Strategy_Returns']).cumprod()
# Calculate strategy performance metrics
trading_days = 252
first_valid = data['Cumulative_Strategy_Returns'].first_valid_index()
total_days = (data['Cumulative_Strategy_Returns'].index[-1] - first_valid).days
annualized_return = (data['Cumulative_Strategy_Returns'].iloc[-1] /
                     data['Cumulative_Strategy_Returns'].loc[first_valid]) ** (365 / total_days) - 1
annualized_volatility = data['Strategy_Returns'].std() * np.sqrt(trading_days)
sharpe_ratio = annualized_return / annualized_volatility
# Plot the cumulative returns
plt.figure(figsize=(8, 4))
plt.plot(np.log(data['Cumulative_Strategy_Returns']))
plt.title('IBS Indicator Strategy Performance', fontsize=14)
plt.xlabel('Date', fontsize=10)
plt.ylabel('Log Cumulative Returns', fontsize=10)
plt.tight_layout()
# Annotate the plot with performance metrics
textstr = '\n'.join((
    f"Strategy Annualized Return: {annualized_return:.2%}",
    f"Strategy Annualized Volatility: {annualized_volatility:.2%}",
    f"Strategy Sharpe Ratio: {sharpe_ratio:.2f}",
))
# Place a text box in upper left in axes coords
plt.gca().text(0.01, 0.97, textstr, transform=plt.gca().transAxes, fontsize=8,
               verticalalignment='top', bbox=dict(facecolor='white', alpha=0.8, edgecolor='gray'))
plt.show()
Traditional statistical arbitrage had its heyday in the 1980s, when Morgan Stanley and other outfits deployed huge amounts of capital to identify mean reversion opportunities amongst spreads of instruments, as opposed to singular stocks. Unfortunately, not many financial series mean-revert in a meaningful and predictable manner individually; their combination, however, can yield more favourable statistical edges by synthetically creating time-series which are stationary.
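The idea of a stationary combination can be demonstrated without any market data. In the sketch below (simulated series, my own illustration), two 'prices' share a common random-walk driver: each is non-stationary on its own, yet their spread cancels the driver and reduces to bounded noise - exactly the kind of synthetic series a mean reversion strategy wants to trade.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
common = np.cumsum(rng.normal(0, 1, n))   # shared random-walk driver (non-stationary)
a = common + rng.normal(0, 0.5, n)        # instrument A = driver + idiosyncratic noise
b = common + rng.normal(0, 0.5, n)        # instrument B = driver + idiosyncratic noise
spread = a - b                            # the common driver cancels, leaving stationary noise

# The individual walks wander far from the origin; the spread's dispersion stays bounded
print(f"Range of A:    {a.max() - a.min():.1f}")
print(f"Std of spread: {spread.std():.2f}")
```

In practice one would test the candidate spread formally (e.g. an augmented Dickey-Fuller test) and estimate a hedge ratio rather than assume a one-for-one relationship, but the mechanism is the same: combine co-moving instruments so that the non-stationary component drops out.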
A simple mean-displacement strategy is tested below on five oil majors, which should exhibit strong relationships given their competing natures. Momentary bouts of increased enthusiasm or neglect allow the strategy to profit from narrowing spreads. Once again, the cost of trading would likely prevent this strategy from operating successfully in its current form, but the case is made for more sophisticated iterations, which might include additional basket components, ensemble reversion signals and more targeted hedging using a risk model.
# Download historical data for oil majors
tickers = ['XOM', 'CVX', 'BP', 'SHEL', 'TTE']
price_data = yf.download(tickers, start='2000-01-01', end='2025-01-01', progress=False, auto_adjust=False)['Adj Close']
# Calculate daily returns
daily_returns = price_data.pct_change().dropna()
# Construct the 2-day returns signal based on mean displacement
lookback_window = 2
signals = -(daily_returns.rolling(lookback_window).sum().subtract(
    daily_returns.rolling(lookback_window).sum().mean(axis=1), axis=0))
# Normalize the signal
signals = signals.divide(signals.abs().sum(axis=1), axis=0)
strategy_returns = signals.shift().multiply(daily_returns).sum(axis=1)
cum_returns = (1+strategy_returns).cumprod()
# Calculate performance metrics
ann_return = (cum_returns.iloc[-1] / cum_returns.loc[cum_returns.first_valid_index()]) ** (
    365 / (cum_returns.index[-1] - cum_returns.first_valid_index()).days) - 1
ann_vol = math.sqrt(252) * cum_returns.pct_change().std()
sharpe_ratio = ann_return / ann_vol
# Plot cumulative strategy returns
plt.figure(figsize=(8, 4))
plt.plot(np.log(cum_returns), label='Strategy Returns')
plt.title('Oil Majors Stat Arb Strategy Performance', fontsize=14)
plt.xlabel('Date', fontsize=10)
plt.ylabel('Log Cumulative Returns', fontsize=10)
# Annotate the plot with performance metrics
textstr = '\n'.join((
    f"Annualized Return: {ann_return:.2%}",
    f"Annualized Volatility: {ann_vol:.2%}",
    f"Sharpe Ratio: {sharpe_ratio:.2f}"
))
# Place a text box in upper left in axes coords
plt.gca().text(0.012, 0.97, textstr, transform=plt.gca().transAxes, fontsize=8,
               verticalalignment='top', bbox=dict(facecolor='white', alpha=0.8, edgecolor='gray'))
plt.tight_layout()
plt.show()
Like most natural processes, financial time-series obey certain characteristic movement patterns. The first follows from a simple perturbation - a news announcement, a large market order, or simply an aggregation of factors we cannot distill into a single explanation. The law of inertia states that a body will continue along the same vector until acted upon by an opposing force. Perhaps this seems like an oversimplification, but I would suggest that most of us often fall prey to this modus operandi in our daily lives. We will continue to work a dead-end job or stick around in a toxic relationship until we are nudged by a more powerful call to action. The second movement is the correction: the previous trending behaviour has dislodged any notion of fair value, and must be corrected through an active pursuit of equilibrium. There are times for extended sloth, and times when we must roll up our sleeves and hunt down a result not driven solely by a de facto set of rules.
To complicate things somewhat, these two movements occur simultaneously at different timescales. A simple time-series conceals a mass of information on behaviours, each with its own probable causes. Price-discriminating market participants reinforce efficient markets by profiting from price dislocations, but only when they have the capacity to do so. Arbitrage at a high-frequency scale is possible thanks to more sophisticated market access, allowing for low-risk mean reversion trades. As we zoom out, markets start to trend, as most market players are now able to participate. There are limits to arbitrage here, even if we identify price movements that seem far removed from reality. Bubbles form, and as foolish as it may seem, the only thing more foolish would be to trade against the price movement - you simply cannot counteract the momentum of the masses. Finally, one timescale further out, mean reversion takes hold once again. Bubbles collapse, bull cycles end and the proverbial party reaches its climax. Equilibrium is restored, at least temporarily, until the Gatsby-esque charade starts up all over again.
Something that often weighs on my conscience is stating things which, upon further learning, turn out to be either partially incorrect or outright wrong. Although I doubt anyone should entertain my musings, I still try to correct the materials here when I find an inconsistency in my writings. Over the past months, I have been pleasantly surprised to find more than a couple of independent algo traders posting mean reversion strategies on Reddit, in a similar vein to the strategies backtested in this notebook. Unfortunately, this has also compelled me to dig a little deeper into how persistent the observed behaviours would be in the long term. Fortunately, when it comes to the S&P 500, there is extensive history going back to the late 1920s with which to perform this analysis. Let's now re-run the 'Next Day Reversion Strategy' on the full available history.
# Download the full available history for the S&P 500 index
start_date = '1900-01-01'
end_date = '2025-01-01'
df = yf.download('^GSPC', start=start_date, end=end_date, progress=False, auto_adjust=False)
# Create signals DataFrame
signals = pd.DataFrame(index=df.index)
signals['Price'] = df['Adj Close']
signals['Return'] = signals['Price'].pct_change()
signals['Position'] = -signals['Return'].shift(1)
signals['Strategy_Return'] = signals['Position'] * signals['Return']
signals['Cumulative_Return'] = (1 + signals['Strategy_Return']).cumprod()
# Calculate performance metrics
first_valid_index = signals['Cumulative_Return'].first_valid_index()
last_valid_index = signals['Cumulative_Return'].index[-1]
total_days = (last_valid_index - first_valid_index).days
annReturn = (signals['Cumulative_Return'].iloc[-1] / signals['Cumulative_Return'].loc[first_valid_index]) ** (365 / total_days) - 1
annVol = math.sqrt(252) * signals['Cumulative_Return'].pct_change(1).std()
sharpeRatio = annReturn / annVol
# Plot cumulative returns
plt.figure(figsize=(8, 4))
plt.plot(np.log(signals['Cumulative_Return']))
plt.title('Next Day Reversion Strategy Performance (1928-Present)', fontsize=14)
plt.xlabel('Date', fontsize=10)
plt.ylabel('Log Cumulative Return', fontsize=10)
plt.tight_layout()
# Annotate the plot with performance metrics
plt.text(0.01, 0.97, f"Annualized Return: {annReturn:.2%}\n"
                     f"Annualized Volatility: {annVol:.2%}\n"
                     f"Sharpe Ratio: {sharpeRatio:.2f}",
         transform=plt.gca().transAxes, fontsize=8, verticalalignment='top',
         bbox=dict(facecolor='white', alpha=0.8, edgecolor='gray'))
plt.show()
Needless to say, I was taken aback by this result - it's almost as if I had conveniently started the prior backtest at the very moment I knew it would start performing! More worrying is how smooth the 'anti-strategy' performance seems to be (not the random walk I would expect from an overfit strategy). Upon confirming that this result persists across other equity index mean reversion formulations, I had to concede that the daily mean reversion qualities that currently exist are perhaps a relatively recent phenomenon. This led me to posit that the market undergoes structural periods of extended positive or negative autocorrelation in daily returns, as confirmed in the chart below. The only other period in history with the same degree of negative autocorrelation was the pre-1940s, during which the above reversion strategy once again outperforms.
And just in case you were curious, I have also plotted the cumulative returns of the anti-strategy (next day momentum) during the period of positive returns autocorrelation. In any case, I hope this serves as a teaching moment: even a 20 year period is not sufficient to permanently characterise a time-series - the rules are forever in flux; I suppose that's what keeps things interesting.
def rolling_autocorrelation(returns, window):
    return returns.rolling(window).apply(lambda x: x.autocorr(), raw=False)
daily_returns = df['Adj Close'].pct_change()
acf = rolling_autocorrelation(daily_returns, 500).dropna()
plt.figure(figsize=(8, 4))
plt.plot(acf)
plt.title('Rolling Autocorrelation of S&P 500 Returns', fontsize=14)
plt.xlabel('Date', fontsize=10)
plt.ylabel('Autocorrelation', fontsize=10)
plt.axhline(y=0, color='r', linestyle='--')
plt.tight_layout()
plt.show()
df = yf.download('^GSPC', '1940-01-01', '1990-01-01', progress=False, auto_adjust=False)['Adj Close']
signals = (df.pct_change()>0).astype(int)
strategyReturns = signals.shift() * df.pct_change()
cumReturns = (1+strategyReturns).cumprod()
annReturn = cumReturns.iloc[-1] ** (365 / (cumReturns.index[-1] - cumReturns.index[0]).days) - 1
annVol = math.sqrt(252) * strategyReturns.std()
sharpeRatio = annReturn / annVol
plt.figure(figsize=(8, 4))
plt.plot(np.log(cumReturns))
plt.title('Next Day Momentum Strategy Performance (1940-1990)', fontsize=14)
plt.xlabel('Date', fontsize=10)
plt.ylabel('Log Cumulative Return', fontsize=10)
plt.text(0.01, 0.97, f"Annualized Return: {annReturn:.2%}\n"
                     f"Annualized Volatility: {annVol:.2%}\n"
                     f"Sharpe Ratio: {sharpeRatio:.2f}",
         transform=plt.gca().transAxes, fontsize=8, verticalalignment='top',
         bbox=dict(facecolor='white', alpha=0.8, edgecolor='gray'))
plt.tight_layout()
plt.show()