Assignment 1 - Event Driven Backtesting for Small-Cap Bio Stock

Empirical Finance (BU.232.640.W1.SP24)

April 11, 2023

Assignment Objectives:

• Source PDUFA and Phase 3 catalyst event data for small cap biotech stocks.

• Develop a robust investment strategy and analyze the summary statistics of the strategy.

• Discuss whether the “Sell the News” phenomenon is evident in the data.

• Strategize for the upcoming PDUFA and Phase 3 catalyst events in the second half of 2024 and develop an investment strategy.

Source PDUFA and Phase 3 catalyst event data for small cap biotech stocks

I have sourced PDUFA and Phase 3 catalyst event data from the following websites:

• https://www.fdatracker.com/fda-calendar/

• https://www.biopharmcatalyst.com

This is an example of the data from www.fdatracker.com:

FDATracker provides the PDUFA dates for all biotech stocks going back around 10 years. The only problem with the data is that the FDA decision needs to be sourced from company news announcements. I have manually sourced the FDA decision for a small sample of companies.

Biopharmcatalyst.com has a very comprehensive database on Phase 3 clinical trials. This is an example of the data:

Zoomed in view:

Biopharmcatalyst.com provides information on whether the company has met endpoints or surpassed targets set for the Phase 3 trial. I manually assigned positive and negative ratings based on whether endpoints were achieved. I also used Bloomberg to calculate the market capitalization of all companies and retained only companies with a market capitalization below $2 billion. Investopedia (n.d.) considers companies to be small cap if their market cap is between $250 million and $2 billion. I also included companies with a market capitalization below $250 million, as they tend to be more sensitive to news events and are therefore relevant to this study.

This is a small sample of my data which can be found in the Phase3_And_PDUFA_Data.xlsx spreadsheet:

I have collected data on 204 catalyst events categorized as either PDUFA events or Phase 3 clinical trial results. My data has 101 negative news events and 103 positive news events.

Structuring the Data and Collecting Stock Data

I used a Python script linked to the Bloomberg API to download the stock data for all companies listed in the Phase3_And_PDUFA_Data.xlsx spreadsheet:

The stock data covered 132 different companies. Although I had 204 catalyst events, some companies had more than one catalyst event on different dates.

Methodology for Structuring the Data

I wrote a Python script, Empirical Finance Project - Equity Curves - Mark Tanzer.py, which outputs a spreadsheet titled Structured_Stock_Event_Data. This workbook has two sheets called Returns and Prices.

The Returns sheet calculates the percentage returns 5, 10, 30, and 60 days before and after the catalyst event date. The Python script is flexible and outputs the percentage returns for whichever horizons the user specifies. There is no limit on the number of columns that can be output. This is a small sample of the data:
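The Returns calculation can be illustrated with a minimal sketch. It is not the script submitted with this report: the function name, the column labels and the convention that each "before" return ends at the event-day close are assumptions made for illustration.

```python
def window_returns(closes, event_idx, horizons=(5, 10, 30, 60)):
    """Percentage returns over each horizon before and after an event.

    closes:    daily closing prices in date order (hypothetical input).
    event_idx: position of the catalyst event date within `closes`.
    """
    out = {}
    event_close = closes[event_idx]
    for h in horizons:
        # Return from the close h trading days before the event to the event close.
        if event_idx - h >= 0:
            out[f"Before_{h}"] = event_close / closes[event_idx - h] - 1.0
        # Return from the event close to the close h trading days after the event.
        if event_idx + h < len(closes):
            out[f"After_{h}"] = closes[event_idx + h] / event_close - 1.0
    return out
```

Horizons that fall outside the available price history are simply skipped, so the user can pass any list of horizons without the function failing.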

The Prices sheet goes through every CSV file and locates the closing prices around the catalyst event date. It then outputs the daily closing prices for 90 days before the event and 90 days after the event. Because 180 columns cannot be displayed here, the picture below shows only the output for 5 days before and 5 days after the event date:

The following screenshots show that the Python script is extracting the correct closing prices (highlighted in red below):

The structured output data is consistent with the highlighted closing price stock data above:

I then wrote a Python script called Empirical Finance Project - Equity Curves - Mark Tanzer.py. This script outputs an Excel workbook titled Equity_Curves.xlsx. It has 4 sheets titled All_Returns, Individual Daily Returns, Cumulative_Returns and Summary_Statistics.

The All_Returns sheet calculates the daily returns for the 90 days before and the 90 days after the trade event:

Day_0_Return is the return on the trade event date: it represents holding a position from the close of the day before the trade event to the close on the trade event date. The first 4 stocks show large percentage changes on the catalyst event date. Not every catalyst event produces a large move, however. In my data there were around 30 events with a move between -5% and 5%. All other events were outside this range, which I consider reasonable for this project.

The All_Returns sheet identifies stock price changes around catalyst event dates. This is not, however, sufficient to construct an equity curve, as the individual returns need to be allocated to a full list of dates spanning the entire data set. These returns should also reflect the trade entry date and the trade’s holding period. This is a difficult piece of code to design because a strategy that enters 5 days before the event and holds for 2 days needs to find the dates corresponding to the Day_-5_Return and the Day_-4_Return and then allocate those returns to the full list of dates. It needs to do this for all securities with a catalyst event date and for all holding periods.

The Individual Daily Returns sheet was able to achieve this result for different trade entry dates and holding periods (defined by Days_offset and Holding):

Let’s check the data for one security, namely CERS. The Day_-5_Return of -2.28% represents a daily return calculated from holding the stock from the close on day -6 to the close on day -5. It is allocated to 14 March 2024, which is 5 days before the trade event. However, AQST and OPTN overlap on this date, so their returns of -1.6% and -3.5% on 14 March 2024 need to be added to the -2.28%. This gives -7.38% (-2.28% - 1.6% - 3.5%); any difference from the value in the sheet is due to rounding. The same process is applied to Day_-4_Return, Day_-3_Return, Day_-2_Return and Day_-1_Return, which match exactly with the previous individual daily returns calculated. Note: the column heading specifies Days_offset = 5 (enter 5 days in the past) and Holding = 5 (hold the trade for 5 days).
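The allocation step for a "Before" trade can be sketched as follows. This is a simplified illustration rather than the submitted script: the function name and the event dictionary layout are hypothetical, and placing AQST and OPTN on relative day -5 is an assumption made only so that all three returns land on 14 March 2024.

```python
from collections import defaultdict

def allocate_daily_returns(events, days_offset, holding):
    """Sum event-relative daily returns onto calendar dates.

    Each event (hypothetical layout) maps relative days to calendar dates
    ("dates") and to daily returns ("returns"). Overlapping trades on the
    same calendar date are added together, as in the CERS example.
    """
    calendar = defaultdict(float)
    for event in events:
        # Relative days covered: -days_offset, ..., -days_offset + holding - 1.
        for rel_day in range(-days_offset, -days_offset + holding):
            if rel_day in event["returns"]:
                calendar[event["dates"][rel_day]] += event["returns"][rel_day]
    # ISO date strings sort chronologically.
    return dict(sorted(calendar.items()))

events = [
    {"ticker": "CERS", "dates": {-5: "2024-03-14"}, "returns": {-5: -0.0228}},
    {"ticker": "AQST", "dates": {-5: "2024-03-14"}, "returns": {-5: -0.016}},  # relative day assumed
    {"ticker": "OPTN", "dates": {-5: "2024-03-14"}, "returns": {-5: -0.035}},  # relative day assumed
]
by_date = allocate_daily_returns(events, days_offset=5, holding=5)
```

With these inputs the three returns sum to roughly -7.38% on 14 March 2024.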

Compounding these individual returns then gives the cumulative equity curve, or cumulative returns, of the trades. This is output into the Cumulative_Returns sheet based on a notional investment of 100,000. The full notional investment plus capital gains and losses is assumed to be invested in each trade. This means that if any investment falls 90%, it is almost impossible for the equity curve to recover. Here is a screenshot of the Cumulative_Returns sheet:

Finally, for every equity curve, I calculate summary trade statistics, which are output into the sheet called Summary_Statistics. This is a small sample of the output; most of the spreadsheets extend to hundreds of columns.
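The compounding can be written in a few lines. This is a minimal sketch assuming the daily returns are already summed per calendar date; the function name is hypothetical.

```python
def equity_curve(daily_returns, notional=100_000.0):
    """Compound a sequence of daily returns from a starting notional."""
    curve = []
    equity = notional
    for r in daily_returns:
        # The whole balance, including prior gains and losses, is reinvested.
        equity *= 1.0 + r
        curve.append(equity)
    return curve
```

For example, a +10% day followed by a -10% day leaves 99,000, and a 90% loss needs a subsequent 900% gain to return to the starting notional, which is why a single very large loss is so hard to recover from.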
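The report does not list which statistics the Summary_Statistics sheet contains, so the following is only an illustrative sketch of three common ones. The function name, the 252-day annualisation and the zero risk-free rate are assumptions, not details taken from the submitted script.

```python
import math
import statistics

def summary_statistics(daily_returns, periods_per_year=252):
    """Total return, annualised Sharpe ratio and maximum drawdown."""
    equity, peak, max_drawdown = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        # Drawdown is the fractional fall from the running peak.
        max_drawdown = max(max_drawdown, 1.0 - equity / peak)
    mean = statistics.fmean(daily_returns)
    sd = statistics.stdev(daily_returns) if len(daily_returns) > 1 else 0.0
    # Sharpe ratio with an assumed zero risk-free rate.
    sharpe = mean / sd * math.sqrt(periods_per_year) if sd > 0 else float("nan")
    return {
        "total_return": equity - 1.0,
        "sharpe": sharpe,
        "max_drawdown": max_drawdown,
    }
```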

Backtesting Methodology

The previous section explained the methodology for calculating equity curves for different holding periods. I used a brute force approach to simulate thousands of different equity curves and then filtered by highest return and highest Sharpe ratio to select my top 8 strategies.
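A brute force search of this kind can be sketched as a loop over every combination of scenario, days offset and holding period. The function below is an assumption about how such a search might be organised, not the submitted code: `backtest` is a hypothetical callable returning a dictionary with "sharpe" and "total_return" keys, and ranking by Sharpe ratio first and return second is one possible reading of the filter.

```python
from itertools import product

def rank_strategies(backtest, scenarios, offsets, holdings, top_n=8):
    """Evaluate every parameter combination and keep the best `top_n`."""
    results = []
    for scenario, offset, holding in product(scenarios, offsets, holdings):
        stats = backtest(scenario, offset, holding)  # hypothetical callable
        results.append(
            {"scenario": scenario, "days_offset": offset, "holding": holding, **stats}
        )
    # Highest Sharpe ratio first, ties broken by highest total return.
    results.sort(key=lambda r: (r["sharpe"], r["total_return"]), reverse=True)
    return results[:top_n]
```

Selecting the best few curves out of thousands invites overfitting, so the sample size behind each selected strategy matters when interpreting the ranking.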

The equity curves have been simulated for the following scenarios:

• Only trade before the catalyst event (Before)

• Only trade after the catalyst event (After)

• Only trade after the catalyst event with positive news from the company (After – Positive News)

• Only trade after the catalyst event with negative news from the company (After – Negative News)

• Only trade after the catalyst event with positive news and positive momentum (After – Positive News – Positive Momentum)

• Only trade after the catalyst event with positive news and negative momentum (After – Positive News – Negative Momentum)

• Only trade after the catalyst event with negative news and positive momentum (After – Negative News – Positive Momentum)

• Only trade after the catalyst event with negative news and negative momentum (After – Negative News – Negative Momentum)

Positive momentum is defined as a +10% total return from 20 days before the catalyst event to the stock’s closing price on the catalyst event date. Negative momentum is defined as a -10% return from 20 days before the catalyst event to the stock’s closing price on the catalyst event date.
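The momentum rule can be expressed as a small classifier. This sketch assumes the 10% figures are thresholds (a return of at least +10% or at most -10%) and that events in between carry no momentum label; the function name and the label strings are hypothetical.

```python
def classify_momentum(close_20_days_before, event_close, threshold=0.10):
    """Label the 20-day run-up into the catalyst event."""
    ret = event_close / close_20_days_before - 1.0
    if ret >= threshold:
        return "Positive"
    if ret <= -threshold:
        return "Negative"
    return "None"  # neither momentum filter applies
```

For example, a stock that closed at 100 twenty days before the event and at 112 on the event date is labelled "Positive", while one that closed at 88 is labelled "Negative".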

The plot of all the individual equity curves can be found in the following folders which accompany this report:

These are the output excel spreadsheets from running all the simulations:

This is a picture of the python code supporting this project (python code submitted with this report):

Top Selected Equity Curves and Trade Summary Statistics

The previous section described how I calculated thousands of equity curves and then filtered by highest return and highest Sharpe ratio to determine the top 8 strategies. The top strategies are summarized below:

It should be noted that for backtesting “After” scenarios, days offset = X and holding period = Y means the trader should go X days into the future from the event and then back Y days. A trade is entered on the date reached after stepping back Y days and is held until the date X days after the event. This approach was adopted because I needed a consistent methodology for “Before” events and “After” events, with the flexibility to change the backtest scenario. In the case of a “Before” scenario, days offset = X and holding period = Y means the trader should enter the trade X days in the past and then hold for Y days. A trade would be entered on the date X days before the event and held until the date Y days after entry.
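The two conventions can be summarised by mapping days offset and holding period to entry and exit days relative to the event (day 0). This is a sketch of the description above, with a hypothetical function name; whether the entry day’s own return is counted is left open here.

```python
def trade_window(scenario, days_offset, holding):
    """Return (entry_day, exit_day) relative to the catalyst event at day 0."""
    if scenario == "Before":
        # Enter X days in the past, then hold for Y days.
        entry_day = -days_offset
        exit_day = entry_day + holding
    elif scenario == "After":
        # Go X days into the future, then step back Y days to find the entry.
        exit_day = days_offset
        entry_day = exit_day - holding
    else:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return entry_day, exit_day
```

Under this reading, a “Before” strategy with days offset = 5 and holding period = 2 runs from day -5 to day -3, and an “After” strategy with days offset = 10 and holding period = 3 runs from day 7 to day 10.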

The top trading strategies were selected from thousands of equity curves and generally involve positive news with negative momentum or negative news with positive momentum. This suggests “Sell the News” is an effective strategy in the market.

The following plot displays all equity curves next to each other, in the order shown in the trade statistics summary table above. The strategies run from top left to top right, then continue on the next row down, starting from the left column. The plot is small because I have combined all the equity curves on the same pane. It does, however, provide a high-level overview of the structure of the equity curves for each strategy. Please refer to the Appendix for enlarged screenshots of these equity curves.

Future Catalyst Events and Trading Strategy for H2 of 2024

I have selected strategies 5, 6, 7 and 8 for trading H2 of 2024. I have excluded strategies 1 and 3 because a sample size of 12 is very small. Strategies 2 and 4 are subsets of strategies 5 and 6, which suggests some robustness around these strategies. Strategies 5 and 6 show a steady positive performance from the beginning of 2023, which suggests there may be an edge. The sample size is, however, small (28), which means there will not be many trades taken based on these strategies. Strategies 7 and 8 have a much bigger trade sample, which gives their results more statistical weight. Their equity curves are upward sloping. There are some sharp drawdowns, which suggest there could be events not captured in the backtest data. I would advise carefully checking for unforeseen catalyst events that could impact trading results in the future. I would also suggest setting a stop loss of around 13% on all trades. This is close to the max loss percentage on the different strategies and will not change the distribution of the equity curves while still providing downside protection.

In the Appendix section titled Future Catalyst Events, I have taken screenshots of an additional 72 future catalyst events. Below is an illustration of 10 potential trade entries and exits. For all of the strategies, the trader would need to wait for the news release. If momentum is a trading rule in the strategy, then the trader would also need to check whether there is 10% positive momentum or -10% negative momentum from exactly 20 days before the catalyst event through to the closing price on the day of the catalyst event.
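The suggested stop loss can be illustrated with a short sketch. It assumes the stop is checked only at each daily close, with no intraday or gap handling, so the realised loss can be larger than 13%; the function name is hypothetical and this rule was not part of the backtest described above.

```python
def apply_stop_loss(daily_returns, stop=0.13):
    """Return (realised trade return, days held) with a close-based stop."""
    growth = 1.0
    for day, r in enumerate(daily_returns, start=1):
        growth *= 1.0 + r
        # Exit at the first close where the cumulative loss reaches the stop.
        if growth - 1.0 <= -stop:
            return growth - 1.0, day
    return growth - 1.0, len(daily_returns)
```

For example, daily returns of -5%, -10% and +20% would be stopped out after the second day at a loss of about 14.5%, so the third day’s gain is never realised.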