We'll develop our strategy logic interactively in a notebook, peeking at the DataFrames as we go, then transfer the code to a .py
file for backtesting with Moonshot.
As a reminder, here are the rules of the QVAL strategy as outlined in the Alpha Architect white paper:

* Universe: NYSE stocks, excluding financials, REITs, and ADRs
* Liquidity: limit the universe to the top 60% of stocks by market cap
* Value: select the cheapest 10% of stocks by enterprise multiple (EV/EBIT)
* Quality: rank the value stocks by Piotroski F-Score and keep the top 50%
* Portfolio: equal-weighted, rebalanced quarterly
Start by querying historical prices from your Sharadar history database. We specify our universe of NYSE stocks as well as the universes we wish to exclude.
For now we limit ourselves to a couple of years of data to make it easier to work with. Later we'll run a backtest using a larger date range.
from quantrocket import get_prices
DB = "sharadar-us-stk-1d"
UNIVERSES = "nyse-stk"
EXCLUDE_UNIVERSES = ["nyse-financials", "nyse-reits", "nyse-adrs"]
prices = get_prices(
    DB,
    start_date="2014-01-01",
    end_date="2016-01-01",
    universes=UNIVERSES,
    exclude_universes=EXCLUDE_UNIVERSES,
    fields=["Close", "Volume"])
prices.tail()
Sid | FIBBG0000018G2 | FIBBG000001J87 | FIBBG000001JC2 | FIBBG000001JD1 | FIBBG000001NT5 | FIBBG000001NV2 | FIBBG000001SF9 | FIBBG000002791 | FIBBG0000027B8 | FIBBG000002WJ5 | ... | QA000000001978 | QA000000001981 | QA000000001995 | QA000000014708 | QA000000014977 | QA000000017129 | QA000000018169 | QA000000020127 | QA000000021599 | QA000000021660 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Field | Date | |||||||||||||||||||||
Volume | 2015-12-24 | 344.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 367.0 | 275.0 | 0.0 | ... | NaN | NaN | NaN | NaN | 151783.0 | 44931.0 | 10039591.0 | 198010.0 | 19740.0 | 2487841.0 |
2015-12-28 | 0.0 | 0.0 | 110.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1200.0 | 2305.0 | 645.0 | ... | NaN | NaN | NaN | NaN | 190260.0 | 144889.0 | 22684172.0 | 499439.0 | 21078.0 | 4331514.0 | |
2015-12-29 | 970.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4404.0 | 20.0 | 200.0 | 1789.0 | 0.0 | ... | NaN | NaN | NaN | NaN | 243947.0 | 148636.0 | 26242804.0 | 478452.0 | 29483.0 | 5834458.0 | |
2015-12-30 | 6.0 | 0.0 | 1010.0 | 0.0 | 0.0 | 140.0 | 0.0 | 9781.0 | 512.0 | 0.0 | ... | NaN | NaN | NaN | NaN | 301970.0 | 120767.0 | 30679962.0 | 344818.0 | 22549.0 | 4393923.0 | |
2015-12-31 | 15.0 | 0.0 | 1000.0 | 0.0 | 3.0 | 0.0 | 50.0 | 8824.0 | 849.0 | 0.0 | ... | NaN | NaN | NaN | NaN | 304484.0 | 167048.0 | 33392960.0 | 261484.0 | 21686.0 | 6288810.0 |
5 rows × 1600 columns
The QVAL white paper calls for limiting the universe to the top 60% of stocks by market cap. Although market cap is available in the Sharadar fundamental data, here we will use dollar volume as a proxy for market cap.
The code below will compute daily ranks by dollar volume and give us a boolean mask indicating which stocks have adequate dollar volume.
closes = prices.loc["Close"]
volumes = prices.loc["Volume"]
# calculate 90 day average dollar volume
avg_dollar_volumes = (closes * volumes).rolling(90).mean()
# rank biggest to smallest; pct=True gives percentile ranks between 0-1
dollar_volume_ranks = avg_dollar_volumes.rank(axis=1, ascending=False, pct=True)
have_adequate_dollar_volumes = dollar_volume_ranks <= (0.60)
have_adequate_dollar_volumes.tail()
Sid | FIBBG0000018G2 | FIBBG000001J87 | FIBBG000001JC2 | FIBBG000001JD1 | FIBBG000001NT5 | FIBBG000001NV2 | FIBBG000001SF9 | FIBBG000002791 | FIBBG0000027B8 | FIBBG000002WJ5 | ... | QA000000001978 | QA000000001981 | QA000000001995 | QA000000014708 | QA000000014977 | QA000000017129 | QA000000018169 | QA000000020127 | QA000000021599 | QA000000021660 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | |||||||||||||||||||||
2015-12-24 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
2015-12-28 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
2015-12-29 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
2015-12-30 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
2015-12-31 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
5 rows × 1600 columns
We'll use this filter in the next step.
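The ranking logic is easier to see on a toy DataFrame (the dollar volume figures below are made up):

```python
import pandas as pd

# Toy example with hypothetical values: percentile-rank three stocks
# by average dollar volume, biggest first, then keep the top 60%
avg_dollar_volumes = pd.DataFrame(
    {"A": [9e6, 9e6], "B": [5e5, 5e5], "C": [2e4, 2e4]},
    index=pd.to_datetime(["2015-12-30", "2015-12-31"]))
ranks = avg_dollar_volumes.rank(axis=1, ascending=False, pct=True)
have_adequate = ranks <= 0.60
# A ranks 1/3, B ranks 2/3, C ranks 3/3, so only A passes the filter
```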
Next, we use Sharadar fundamentals to identify the cheapest 10% of stocks as measured by the enterprise multiple (EV/EBIT).
The function get_sharadar_fundamentals_reindexed_like returns a DataFrame of fundamentals matching the dates and sids (security IDs) of the input DataFrame (in this case closes). The resulting DataFrame gives us the latest fundamental values as of any given date.
from quantrocket.fundamental import get_sharadar_fundamentals_reindexed_like
# Request EV/EBIT. The dimension "ART" (= "As reported - trailing twelve months") excludes restatements.
fundamentals = get_sharadar_fundamentals_reindexed_like(closes, fields=["EVEBIT"], dimension="ART")
enterprise_multiples = fundamentals.loc["EVEBIT"]
# Ignore negative enterprise multiples, which indicate negative earnings
enterprise_multiples = enterprise_multiples.where(enterprise_multiples > 0)
enterprise_multiples.tail()
Sid | FIBBG0000018G2 | FIBBG000001J87 | FIBBG000001JC2 | FIBBG000001JD1 | FIBBG000001NT5 | FIBBG000001NV2 | FIBBG000001SF9 | FIBBG000002791 | FIBBG0000027B8 | FIBBG000002WJ5 | ... | QA000000001978 | QA000000001981 | QA000000001995 | QA000000014708 | QA000000014977 | QA000000017129 | QA000000018169 | QA000000020127 | QA000000021599 | QA000000021660 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | |||||||||||||||||||||
2015-12-24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 38.0 | 20.0 | NaN | 49.0 | 8.0 | 16.0 |
2015-12-28 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 38.0 | 20.0 | NaN | 49.0 | 8.0 | 16.0 |
2015-12-29 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 38.0 | 20.0 | NaN | 49.0 | 8.0 | 16.0 |
2015-12-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 38.0 | 20.0 | NaN | 49.0 | 8.0 | 16.0 |
2015-12-31 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 38.0 | 20.0 | NaN | 49.0 | 8.0 | 16.0 |
5 rows × 1600 columns
We identify value stocks by ranking on enterprise multiple, but we only apply the rankings to stocks with adequate dollar volume:
value_ranks = enterprise_multiples.where(have_adequate_dollar_volumes).rank(axis=1, ascending=True, pct=True)
are_value_stocks = value_ranks <= (0.10)
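A toy example (with made-up enterprise multiples) shows how the mask confines the ranking to liquid stocks:

```python
import pandas as pd

# Hypothetical enterprise multiples for three stocks on one date
multiples = pd.DataFrame({"A": [8.0], "B": [4.0], "C": [20.0]})
# Suppose B failed the dollar volume filter
liquid = pd.DataFrame({"A": [True], "B": [False], "C": [True]})
value_ranks = multiples.where(liquid).rank(axis=1, ascending=True, pct=True)
# B becomes NaN and is excluded; A ranks 1/2, C ranks 2/2
```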
Next we measure quality using the Piotroski F-Score. First, request the fundamental fields that the F-Score components are built from:
fundamentals = get_sharadar_fundamentals_reindexed_like(
closes,
dimension="ART", # As-reported trailing twelve months reports
fields=[
"ROA", # Return on assets
"ASSETS", # Total Assets
"NCFO", # Net Cash Flow from Operations
"DE", # Debt to Equity Ratio
"CURRENTRATIO", # Current ratio
"SHARESWA", # Outstanding shares
"GROSSMARGIN", # Gross margin
"ASSETTURNOVER", # Asset turnover
])
return_on_assets = fundamentals.loc["ROA"]
total_assets = fundamentals.loc["ASSETS"]
operating_cash_flows = fundamentals.loc["NCFO"]
leverages = fundamentals.loc["DE"]
current_ratios = fundamentals.loc["CURRENTRATIO"]
shares_out = fundamentals.loc["SHARESWA"]
gross_margins = fundamentals.loc["GROSSMARGIN"]
asset_turnovers = fundamentals.loc["ASSETTURNOVER"]
Many Piotroski F-score components compare current to previous values, so next we need to get DataFrames of the previous period's values. To do this in pandas, we identify which dates represent new fiscal periods, shift the previous period's value into the new period, and fill forward:
fundamentals = get_sharadar_fundamentals_reindexed_like(
closes,
dimension="ART", # As-reported trailing twelve month reports
fields=["REPORTPERIOD"])
# get a boolean mask of the first day of each newly reported fiscal period
fiscal_periods = fundamentals.loc["REPORTPERIOD"]
are_new_fiscal_periods = fiscal_periods != fiscal_periods.shift()
# shift the ROAs forward one fiscal period by (1) shifting the ratios one day,
# (2) keeping only the ones that fall on the first day of the newly reported
# fiscal period, and (3) forward-filling
previous_return_on_assets = return_on_assets.shift().where(are_new_fiscal_periods).ffill()
# repeat for the other indicators
previous_leverages = leverages.shift().where(are_new_fiscal_periods).ffill()
previous_current_ratios = current_ratios.shift().where(are_new_fiscal_periods).ffill()
previous_shares_out = shares_out.shift().where(are_new_fiscal_periods).ffill()
previous_gross_margins = gross_margins.shift().where(are_new_fiscal_periods).ffill()
previous_asset_turnovers = asset_turnovers.shift().where(are_new_fiscal_periods).ffill()
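The shift-where-fill-forward trick is easier to see on a toy Series (the fiscal periods and ROA values below are made up):

```python
import pandas as pd

# Daily ROA that updates when a new fiscal period is reported
fiscal_periods = pd.Series(["Q1", "Q1", "Q2", "Q2", "Q3"])
roas = pd.Series([0.05, 0.05, 0.08, 0.08, 0.06])
# First day of each newly reported fiscal period
are_new_fiscal_periods = fiscal_periods != fiscal_periods.shift()
previous_roas = roas.shift().where(are_new_fiscal_periods).ffill()
# previous_roas: NaN, NaN, 0.05, 0.05, 0.08
```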
Now we can calculate the F-Score components; each resulting component is a DataFrame of booleans:
have_positive_return_on_assets = return_on_assets > 0
have_positive_operating_cash_flows = operating_cash_flows > 0
have_increasing_return_on_assets = return_on_assets > previous_return_on_assets
have_more_cash_flow_than_incomes = operating_cash_flows / total_assets > return_on_assets
have_decreasing_leverages = leverages < previous_leverages
have_increasing_current_ratios = current_ratios > previous_current_ratios
have_no_new_shares = shares_out <= previous_shares_out
have_increasing_gross_margins = gross_margins > previous_gross_margins
have_increasing_asset_turnovers = asset_turnovers > previous_asset_turnovers
Lastly, we convert the booleans to integers and sum to get the F-Score, which is a number between 0 and 9:
f_scores = (
have_positive_return_on_assets.astype(int)
+ have_positive_operating_cash_flows.astype(int)
+ have_increasing_return_on_assets.astype(int)
+ have_more_cash_flow_than_incomes.astype(int)
+ have_decreasing_leverages.astype(int)
+ have_increasing_current_ratios.astype(int)
+ have_no_new_shares.astype(int)
+ have_increasing_gross_margins.astype(int)
+ have_increasing_asset_turnovers.astype(int)
)
f_scores.tail()
Sid | FIBBG0000018G2 | FIBBG000001J87 | FIBBG000001JC2 | FIBBG000001JD1 | FIBBG000001NT5 | FIBBG000001NV2 | FIBBG000001SF9 | FIBBG000002791 | FIBBG0000027B8 | FIBBG000002WJ5 | ... | QA000000001978 | QA000000001981 | QA000000001995 | QA000000014708 | QA000000014977 | QA000000017129 | QA000000018169 | QA000000020127 | QA000000021599 | QA000000021660 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | |||||||||||||||||||||
2015-12-24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 5 | 1 | 5 | 2 | 7 | 7 | 5 |
2015-12-28 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 5 | 1 | 5 | 2 | 7 | 7 | 5 |
2015-12-29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 5 | 1 | 5 | 2 | 7 | 7 | 5 |
2015-12-30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 5 | 1 | 5 | 2 | 7 | 7 | 5 |
2015-12-31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 5 | 1 | 5 | 2 | 7 | 7 | 5 |
5 rows × 1600 columns
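The boolean-to-integer summation above can be checked on a toy example (with hypothetical component values):

```python
import pandas as pd

# Two stocks, three hypothetical F-Score components
have_positive_roa = pd.DataFrame({"A": [True], "B": [False]})
have_positive_ocf = pd.DataFrame({"A": [True], "B": [True]})
have_no_new_shares = pd.DataFrame({"A": [True], "B": [False]})
scores = (
    have_positive_roa.astype(int)
    + have_positive_ocf.astype(int)
    + have_no_new_shares.astype(int)
)
# A scores 3, B scores 1
```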
Now that we have the F-scores we can rank the value stocks by quality and select the top 50%. This gives us our DataFrame of long signals, which by convention we cast from boolean to int.
quality_ranks = f_scores.where(are_value_stocks).rank(axis=1, ascending=False, pct=True)
long_signals = quality_ranks <= (0.50)
long_signals = long_signals.astype(int)
The QVAL strategy trades an equal-weighted portfolio. By convention, for an unlevered strategy the daily weights should add up to 1 (=100% invested), so we divide each day's signals by the number of signals to get the individual position weights:
daily_signal_counts = long_signals.abs().sum(axis=1)
daily_signal_counts.tail()
Date
2015-12-24    49
2015-12-28    49
2015-12-29    49
2015-12-30    49
2015-12-31    49
dtype: int64
weights = long_signals.div(daily_signal_counts, axis=0).fillna(0)
weights.where(weights!=0).stack().tail()
Date        Sid
2015-12-31  FIBBG0029SNR63    0.020408
            FIBBG002B917C3    0.020408
            FIBBG00BN961G4    0.020408
            FIBBG00GTKSSK4    0.020408
            FIBBG00R2NHQ65    0.020408
dtype: float64
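The equal weighting can be verified on a toy DataFrame (made-up signals):

```python
import pandas as pd

# One signal on day 0, two signals on day 1
long_signals = pd.DataFrame({"A": [1, 1], "B": [0, 1]})
daily_signal_counts = long_signals.abs().sum(axis=1)  # 1, then 2
weights = long_signals.div(daily_signal_counts, axis=0).fillna(0)
# day 0: A=1.0, B=0.0; day 1: A=0.5, B=0.5
```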
Currently we have a DataFrame of signals that change every day, but the QVAL strategy calls for quarterly rebalancing.
To accomplish this with pandas, we resample the DataFrame to quarterly, take the last signal of the quarter, then reindex back to daily and fill forward:
# Resample daily to quarterly, taking the last day's signal
# For pandas offset aliases, see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
weights = weights.resample("Q").last()
# Reindex back to daily and fill forward
weights = weights.reindex(closes.index, method="ffill")
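The reindex-and-fill-forward step can be checked on a toy quarterly DataFrame (made-up weights):

```python
import pandas as pd

# Quarter-end weights, reindexed back to a daily calendar
quarterly = pd.DataFrame(
    {"A": [0.5, 0.25]},
    index=pd.to_datetime(["2015-03-31", "2015-06-30"]))
dates = pd.date_range("2015-03-30", "2015-07-02", freq="D")
daily = quarterly.reindex(dates, method="ffill")
# 2015-03-30 is NaN (no prior quarter-end); 2015-04-15 carries 0.5;
# 2015-07-01 carries 0.25
```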
The DataFrame of weights represents what we want to own, as calculated at the end of the day. Assuming we enter positions the next day, we simply shift the weights forward to simulate our positions:
positions = weights.shift()
To calculate the return (before costs), we multiply the security's percent change over the period by the size of the position. Since positions represents when we enter the position, we must shift positions forward to get the "end" of the position, since that is when we collect the percent change, not when we first enter it.
position_ends = positions.shift()
gross_returns = closes.pct_change() * position_ends
TIP: Proper alignment is important to avoid lookahead bias. Inspect your DataFrames as you go to validate your logic. If your DataFrames are large, as in this strategy, creating some toy DataFrames can be a good way to see what's happening:
import pandas as pd
toy_closes = pd.Series([50,100,50])
toy_pct_changes = toy_closes.pct_change()
toy_positions = pd.Series([0,1,0])
toy_position_ends = toy_positions.shift()
toy_returns = toy_pct_changes * toy_position_ends
pd.concat({
    "close": toy_closes,
    "pct_change": toy_pct_changes,
    "position": toy_positions,
    "position_ends": toy_position_ends,
    "return": toy_returns},
    axis=1)
close | pct_change | position | position_ends | return | |
---|---|---|---|---|---|
0 | 50 | NaN | 0 | NaN | NaN |
1 | 100 | 1.0 | 1 | 0.0 | 0.0 |
2 | 50 | -0.5 | 0 | 1.0 | -0.5 |