Listening to the Baby Driver movie track "Was He Slow" and hoping my code will run faster. This part won't; it is data validation. We start off with 3,437 stocks and perform initial data validation. The first run takes 24 minutes in Python with Yahoo Finance data (thank you, Yahoo Finance). We run 10 automated checks on each stock ticker and remove any stocks with missing or invalid data. That part takes 15+ minutes and is tedious. From work earlier today we also had one stock with BETA < 0, and we removed it. I would guess more stocks with negative BETA values are coming.
We had to run data validation a second time to remove another ~25 stocks without valid data. I found them after the first validation (same checks): the overall download had failed to deliver enough data for them (e.g., 252 days of adjusted close per ticker).
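The post doesn't list its 10 checks, so here is a minimal stand-in sketch. It assumes adjusted closes are held in a pandas DataFrame with one column per ticker and applies three illustrative checks: no infinite values, at least 252 valid closes, and no non-positive prices. The function name `validate_prices` and the choice of checks are hypothetical. This is not the actual validation code.

```python
import numpy as np
import pandas as pd

MIN_DAYS = 252  # one trading year of adjusted closes, as in the post


def validate_prices(adj_close: pd.DataFrame, min_days: int = MIN_DAYS):
    """Return (clean_frame, rejected), where rejected maps ticker -> reason.

    Hypothetical: these three checks are illustrative, not the post's ten checks.
    """
    kept, rejected = [], {}
    for ticker in adj_close.columns:
        series = adj_close[ticker]
        if np.isinf(series.to_numpy(dtype=float)).any():
            rejected[ticker] = "infinite values"
        elif series.notna().sum() < min_days:
            rejected[ticker] = f"fewer than {min_days} valid closes"
        elif (series.dropna() <= 0).any():
            rejected[ticker] = "non-positive price"
        else:
            kept.append(ticker)
    return adj_close[kept], rejected


# Toy example: one clean ticker, one with too little history, one with a zero price.
dates = pd.bdate_range("2020-01-01", periods=260)
prices = pd.DataFrame(
    {
        "GOOD": np.linspace(100.0, 120.0, 260),
        "SHORT": [np.nan] * 20 + list(np.linspace(10.0, 12.0, 240)),
        "BAD": np.r_[np.linspace(5.0, 6.0, 259), 0.0],
    },
    index=dates,
)
clean, rejected = validate_prices(prices)
print(list(clean.columns), rejected)
```

Returning the reason for each rejection makes a second validation pass easier to audit: you can see which tickers dropped out because the download came up short.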
BTW, it is after midnight and I really want to see which stocks come through the analysis. We have had a very volatile week since the election. Also, our two latest picks, AX/MCRB and PDCE, have torn the cover off the baseball... hit it out of the park. In other words, they seem to continue to outperform the benchmarks.
Today was a busy day of coding on our CQNS model. We made a few changes and found that it runs faster and better. Those changes were:
1. We went back to the QUBO-friendly formulation of the CQNS. It is a little less 'true' to the economics, but it lets our QUBO and algebraic solvers work from the same formulation. In this version we take the individual stocks, raise their expected returns to a power, and then calculate the portfolio return. We had previously switched to taking the expected return of the portfolio and then raising that to a power. It is truly a minor point, only a few basis points of difference, but it makes the code run better.
2. We scrubbed the first part of the code, which handles data validation and the first solution of the problem. We removed all extra code, charting, and reporting from this first section. This problem is too large for quantum computing to solve (it goes from 3,200 stocks down to a smaller number of stocks, say N). I ran it before dinner, and it screamed with speed on my 2013 iMac with 16GB RAM. It might be time to buy a new iMac... or to stop being lazy and dust off my HPE server.
3. We dug a little deeper into the D-Wave Simulated Annealer and got it to find N stocks (out of a smaller set of 422 stocks) by adding back all the system parameters and optimizing them. Let's hope the same tuning works for 3,200 stocks. Tuning the Tabu sampler still did not give us a matching portfolio (e.g., N stocks out of 422), but it came closer. We are not giving up on the MultiStart Tabu solver.
4. We decided to keep SPY as the ETF for the S&P 500 instead of using the ^GSPC index. It works well.
5. We found a very small bug in our calculation of log returns. The first day of the sample has no prior close to compare against, so its log return comes out empty (NaN) and throws an error. A few weeks ago, we decided to 'fix' that error by filling the first day with a value of 1.0, switching from .dropna to .fillna(1). The problem is that this gives every stock a huge artificial gain on the first day of the sample. A log return of 1.0 is not a 100% gain; it is a price ratio of e ≈ 2.72, roughly a 172% gain. I believe this is what caused the 68-stock solution before election day. It had artificially reduced all the BETA scores and expected returns, and it likely had other unintended consequences. This is now fixed, and we have a more normal, wider range of BETA values.
6. We have also put our solvers to work in different ways, and this is helping us get slightly better answers overall. For example, we now run our bespoke simulated annealer in two ways: starting from the best answer already found, and from a random seed. As a pair, they do better. We are also training our genetic algorithms to do better in three ways: adding different types of mutation to the breeding process, continuing to run the GA against the best answer seen so far, and doubling both the number of generations and the initial random seed size.
7. The one issue we cannot code around is the D-Wave Advantage (TM) 1.1 solver, which allows us to go from 64 to 134 stocks at a time on the quantum annealer. If we want to run more than 64 stocks at a time in the second program, then we need it. However, tonight the wait time to try to embed 3,200 stocks on the D-Wave was over 25 minutes. The embedding did not fail; it timed out, and it was then going to time out for each of the 1,000 runs. So we are going back to the D-Wave Systems 2000Q Chimera architecture to run our quantum algorithm. Its wait times are measured in seconds, not minutes.
8. Therefore, we are going to use our solvers in the first program to bring the list down to N stocks, where N is 64 or fewer. After that, we will run those stocks through our set of classical solvers and through a battery of runs on the D-Wave Systems 2000Q (solver #6 tonight). This lets us scale back up to the D-Wave Advantage later and hopefully see better performance on a 64-stock solution, with fewer chain breaks and less stress on the QUBO / qubits.
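The trade-off in item 1 can be shown with a short sketch. We assume equal weights over the N selected stocks and a hypothetical exponent p = 3; the post gives neither the CQNS weights nor its exponent. If you raise each stock's expected return to p first and then average, the result is linear in the binary selection vector x for a fixed N, so a QUBO can encode it. If you average first and then raise to p, you get a degree-p polynomial in x, which a QUBO (at most quadratic) cannot hold when p > 2. This does not reproduce the CQNS formula itself.

```python
import numpy as np


def per_stock_power(expected: np.ndarray, x: np.ndarray, p: float) -> float:
    """QUBO-friendly: raise each stock's expected return to p, then average.
    For a fixed count N = sum(x), this is linear in the binary vector x.
    Assumes equal weights 1/N (hypothetical)."""
    return float((x * expected**p).sum() / x.sum())


def portfolio_power(expected: np.ndarray, x: np.ndarray, p: float) -> float:
    """Average first, then raise to p: a degree-p polynomial in x."""
    return float(((x * expected).sum() / x.sum()) ** p)


# Hypothetical expected returns; stocks 0, 1 and 3 are selected.
expected = np.array([0.10, 0.12, 0.08, 0.15])
x = np.array([1, 1, 0, 1])
p = 3
a = per_stock_power(expected, x, p)
b = portfolio_power(expected, x, p)
print(a, b, a - b)
```

With non-negative returns and p ≥ 1, Jensen's inequality guarantees the per-stock form is never smaller than the portfolio form. How large the gap is depends on how spread out the selected returns are.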
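Items 3, 6, and 8 all come down to picking N stocks from a larger set with an annealer. The sketch below is a generic stand-in and NOT the CQNS objective. It uses a mean-variance QUBO, risk_aversion * x'Σx - μ'x, plus a penalty (Σx_i - N)² that enforces the stock count. A plain single-bit-flip simulated annealer solves it, starting either from a random state or from a best-known answer (`x0`), similar to the random-seed and best-answer runs described in item 6. All function names, the penalty heuristic, and the temperature schedule are hypothetical. They are the kind of knobs the post describes tuning, not D-Wave's sampler or the post's own annealer.

```python
import numpy as np


def build_qubo(mu, sigma, n_select, risk_aversion=1.0, penalty=None):
    """Hypothetical stand-in mean-variance QUBO (not the CQNS objective).
    Energy x'Qx + offset = risk_aversion*x'(sigma)x - mu'x + penalty*(sum(x) - N)^2."""
    m = len(mu)
    if penalty is None:
        # Hypothetical heuristic: exceeds any change in the unpenalized objective.
        penalty = 2.0 * (np.abs(mu).sum() + risk_aversion * np.abs(sigma).sum())
    q = risk_aversion * sigma - np.diag(mu)  # x_i^2 = x_i puts mu on the diagonal
    q = q + penalty * (np.ones((m, m)) - 2 * n_select * np.eye(m))
    return q, penalty * n_select**2


def energy(q, offset, x):
    return float(x @ q @ x + offset)


def anneal(q, offset, x0=None, sweeps=2000, t_start=1.0, t_end=1e-3, seed=None):
    """Single-bit-flip Metropolis annealing on a symmetric Q; returns best state seen."""
    rng = np.random.default_rng(seed)
    m = q.shape[0]
    x = rng.integers(0, 2, m) if x0 is None else np.array(x0, dtype=int)
    e = energy(q, offset, x)
    best_x, best_e = x.copy(), e
    for t in np.geomspace(t_start, t_end, sweeps):  # geometric cooling schedule
        for k in rng.permutation(m):
            d = 1 - 2 * x[k]  # +1 turns bit k on, -1 turns it off
            delta = d * (q[k, k] + 2 * (q[k] @ x - q[k, k] * x[k]))
            if delta <= 0 or rng.random() < np.exp(-delta / t):
                x[k] += d
                e += delta
                if e < best_e:
                    best_x, best_e = x.copy(), e
    return best_x, energy(q, offset, best_x)


# Toy problem: pick 4 of 12 stocks using made-up returns and covariance.
rng = np.random.default_rng(7)
m, n_select = 12, 4
mu = rng.uniform(0.02, 0.15, m)
a = rng.normal(size=(m, m))
sigma = 0.01 * a @ a.T / m
q, offset = build_qubo(mu, sigma, n_select)
cold_x, cold_e = anneal(q, offset, seed=1)  # random start
warm_x, warm_e = anneal(q, offset, x0=cold_x, sweeps=500, t_start=0.5, seed=2)  # best-so-far start
print(cold_x, cold_e, warm_e)
```

The warm-started run counts its starting state as the best seen, so its result can never be worse than the first run's. Whether it actually improves depends on the temperature schedule, which, like the penalty weight, needs tuning for each problem.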
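Items 4 and 5 can be illustrated with a small pandas sketch. It assumes a DataFrame of adjusted closes that includes an `SPY` column. The sketch computes log returns by dropping the empty first row, estimates each stock's BETA as Cov(r_i, r_SPY) / Var(r_SPY), and shows what the old `.fillna(1)` did to day one. The helper names `log_returns` and `betas` are hypothetical.

```python
import numpy as np
import pandas as pd


def log_returns(adj_close: pd.DataFrame) -> pd.DataFrame:
    """Daily log returns; the first row has no prior close, so it is dropped."""
    return np.log(adj_close / adj_close.shift(1)).dropna()


def betas(returns: pd.DataFrame, benchmark: str = "SPY") -> pd.Series:
    """BETA_i = Cov(r_i, r_SPY) / Var(r_SPY) for every non-benchmark column."""
    market = returns[benchmark]
    cov = returns.apply(lambda col: col.cov(market))
    return (cov / market.var()).drop(benchmark)


# Toy prices: XYZ is built so its log returns are exactly 2x SPY's (BETA = 2).
spy = np.array([100.0, 101.0, 100.0, 102.0, 103.0])
prices = pd.DataFrame(
    {"SPY": spy, "XYZ": 50.0 * (spy / 100.0) ** 2},
    index=pd.bdate_range("2020-01-01", periods=5),
)
rets = log_returns(prices)
beta = betas(rets)
print(beta)

# The old approach: fillna(1) puts a log return of 1.0 on day one for every stock.
buggy = np.log(prices / prices.shift(1)).fillna(1)
first_day_gain = np.exp(buggy["SPY"].iloc[0]) - 1.0
print(first_day_gain)  # about 1.718, i.e. a +172% first-day gain
```

`.dropna()` removes any row with a missing value in any column. It therefore trims only the first day when every ticker has a full history, which is the point of the validation step.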
Oh, and our new research into the higher moments of stock prices is progressing. We have a simple way to identify our outliers for kurtosis and skew. These metrics, along with expected return and variance, can give us a better understanding of each individual stock. We are still working on a formulation for calculating these metrics for a portfolio, as opposed to a single stock; the portfolio versions are called coskewness and cokurtosis.
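For reference, here is a minimal NumPy sketch of the standard textbook definitions. It assumes a T x M array of returns and uses population (divide-by-T) moments. For each stock it computes skewness and excess kurtosis. For a portfolio with weights w, it builds the coskewness tensor M3[i,j,k] = E[d_i d_j d_k] and the cokurtosis tensor M4[i,j,k,l] = E[d_i d_j d_k d_l] from the de-meaned returns d, then contracts both with w. This is not the formulation still being worked on, and the function names are hypothetical.

```python
import numpy as np


def stock_moments(returns):
    """Per-column skewness and excess kurtosis of a T x M return array
    (population moments, dividing by T)."""
    d = returns - returns.mean(axis=0)
    var = (d**2).mean(axis=0)
    skew = (d**3).mean(axis=0) / var**1.5
    kurt = (d**4).mean(axis=0) / var**2 - 3.0
    return skew, kurt


def portfolio_higher_moments(returns, w):
    """Portfolio skewness and excess kurtosis via coskewness/cokurtosis tensors."""
    t = returns.shape[0]
    d = returns - returns.mean(axis=0)
    cov = d.T @ d / t
    m3 = np.einsum("ti,tj,tk->ijk", d, d, d) / t  # coskewness tensor, M x M x M
    m4 = np.einsum("ti,tj,tk,tl->ijkl", d, d, d, d) / t  # cokurtosis tensor, M^4 entries
    var_p = w @ cov @ w
    skew_p = np.einsum("ijk,i,j,k->", m3, w, w, w) / var_p**1.5
    kurt_p = np.einsum("ijkl,i,j,k,l->", m4, w, w, w, w) / var_p**2 - 3.0
    return float(skew_p), float(kurt_p)


# Made-up fat-tailed returns for 5 stocks, equally weighted.
rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_t(df=5, size=(500, 5))
w = np.full(5, 0.2)
skew, kurt = stock_moments(returns)
skew_p, kurt_p = portfolio_higher_moments(returns, w)
print(skew, kurt, skew_p, kurt_p)
```

The cokurtosis tensor grows as M^4 (about 16.8 million entries for 64 stocks), so this dense version is only practical for modest M.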
The final word...I spent many hours this week fixing our website and optimizing it for search and advertisements. Please take a look when you have a chance, and I welcome your feedback.
Everyone please stay safe out there. We are likely heading for a 'stay at home' order in Illinois.
President, Chicago Quantum, US Advanced Computing Infrastructure, Inc.
Strategic IT Management Consultant with a strong interest in Quantum Computing. Consulting for 29 years and this looks as interesting as cloud computing was in 2010.