Pair Trading - Exploring The Low Risk Statistical Arbitrage Trading Concepts

VJAY

Well-Known Member
Yes, SS1 and SS2 need to be defined before calling the get_beta() function. The parameter lb=20 is the lookback period; I have kept the default at 20 days. Usually it should be the same as the zScore lookback period.
I think I have done something wrong here ...
[Attachment: 1534943199256.png]
 

VJAY

Well-Known Member
Run all the cells from the beginning first. As you have newly added the get_beta function to the function cell, you need to run that cell so that Python registers it.
Yes, I did it, thanks ... I had not run the stockdata file :)
 
Python:
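import pandas as pd
from itertools import combinations

# Load prices, skipping the first column (assumed to be the date/index column)
df = pd.read_csv('stock_data.csv').iloc[:, 1:]
# Pairwise correlations as a Series indexed by (stock_a, stock_b)
corr_map = df.corr().stack()
# Keep one entry per unordered pair of columns
result = {pair: corr_map[pair] for pair in combinations(df.columns, 2)}
# Write the pairs sorted from lowest to highest correlation
pd.Series(result).sort_values().to_csv('pairs.csv')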
Hi,
@UberMachine gave this code to get correlation values for pairs.
@UberMachine, @ncube - A request: please write code that generates the p-value and z-score for all possible pairs from the "stock_data.csv" file and writes them to another file.
Thank you.
 

ncube

Well-Known Member
@Vevensa_P ,
The exercise of finding all possible pairs from stock_data.csv will not help, as the combinations will run into lakhs. The p-value is the co-integration significance score, and it depends on the order in which the pair is considered: WIPRO-TCS and TCS-WIPRO are different pairs, so WIPRO-TCS can have a good p-value while TCS-WIPRO has a worse one. Hence, to find the p-value for all 500 Nifty shares, one needs to test every ordered pair, which is 500 × 499 = 249,500 tests, roughly 2.5 lakhs.

Hence the best way to find suitable pairs for analysis is either fundamental analysis or some machine learning approach. Once the pairs are identified, we can check the p-value and zScore for only that set of pairs.

Coming to correlation values for pairs, please note that correlation and co-integration (p-value) are not the same. A pair can be highly correlated and still not be co-integrated, so one needs to be careful in selecting pairs: focus on finding co-integrated pairs rather than correlated ones.

What is the difference between correlation and cointegration?

When talking about statistical arbitrage, many people confuse correlation and cointegration.
  • Correlation – If two stocks are correlated, then when stock A has an up day, stock B tends to have an up day too
  • Cointegration – If two stocks are cointegrated, then it is possible to form a stationary pair from some linear combination of stock A and B
One of the best explanations of cointegration is as follows: “A man leaves a pub to go home with his dog, the man is drunk and goes on a random walk, the dog also goes on a random walk. They approach a busy road and the man puts his dog on a lead, the man and the dog are now cointegrated. They can both go on random walks but the maximum distance they can move away from each other is fixed, i.e. the length of the lead”.

So in essence the distance/spread between the man and his dog is bounded by the length of the lead. Also note from the story that the man and dog are still on a random walk; there is nothing to say whether their movements are correlated or uncorrelated.

Correlated stocks will move in the same direction most of the time, but the magnitude of the moves is unknown. This means that if you’re trading the spread between two stocks, the spread can keep growing and growing, showing no signs of mean reversion. This is in contrast to cointegration, where we say the spread is “fixed” and that if the spread deviates from the “fixing” then it will mean revert.

Source: http://gekkoquant.com/2012/10/21/statistical-arbitrage-correlation-vs-cointegration/
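
To make the pub story concrete, here is a minimal simulation sketch in plain Python. Everything in it is an illustrative assumption and not taken from the source article: the function names, the lead length of 2.0, the step sizes and the seed. The first pair is tied together by a lead, so its spread can never exceed the lead length. The second pair shares a common daily shock, so the daily moves are correlated, but nothing ties the two paths together.

```python
import random


def simulate_man_and_dog(n_steps=1000, lead=2.0, seed=42):
    """Man follows a random walk; the dog wanders but stays within `lead` of him."""
    rng = random.Random(seed)
    man, dog = [0.0], [0.0]
    for _ in range(n_steps):
        man_next = man[-1] + rng.gauss(0, 1)
        # The dog takes its own random step, then the lead pulls it back in range.
        dog_next = dog[-1] + rng.gauss(0, 1)
        dog_next = max(man_next - lead, min(man_next + lead, dog_next))
        man.append(man_next)
        dog.append(dog_next)
    return man, dog


def simulate_correlated_moves(n_steps=1000, seed=42):
    """Two walks whose daily moves share a common shock, with no lead between them."""
    rng = random.Random(seed)
    a, b = [0.0], [0.0]
    for _ in range(n_steps):
        common = rng.gauss(0, 1)
        # Each walk adds its own independent noise, which accumulates in the spread.
        a.append(a[-1] + common + rng.gauss(0, 0.5))
        b.append(b[-1] + common + rng.gauss(0, 0.5))
    return a, b


def max_abs_spread(x, y):
    """Largest absolute gap between the two paths."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))


man, dog = simulate_man_and_dog()
a, b = simulate_correlated_moves()
print("leashed pair, max |spread|:", round(max_abs_spread(man, dog), 2))
print("correlated-only pair, max |spread|:", round(max_abs_spread(a, b), 2))
```

In the second pair the spread is itself a random walk (the sum of the independent noise terms), which is why correlation of daily moves alone gives no guarantee of mean reversion.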
 
@ncube,
Yes, I agree, it wasn't a good idea to compute the p-value and z-score for "all possible pairs from stock_data.csv file".
Say I want to monitor the pairs from the nine private bank stocks which are part of the Bank Nifty index. Those nine stocks make seventy-two ordered pairs (9 × 8). The only way I can imagine monitoring them is to create a cell for each pair in a Jupyter notebook and run them every day.
The Python code requested would make it easy to filter pairs that satisfy the condition "(p-value < 0.05) AND ((z-score > 2.0) OR (z-score < -2.0))".
You have clearly explained the difference between correlation and cointegration. Thank you. A few days ago I checked a pair with a p-value below 0.01 whose correlation was only 0.39. Until then, I was assuming that correlation and p-value express the same thing differently, e.g. that a 0.97 correlation is approximately equal to a 0.03 p-value.
 

VJAY

Well-Known Member
P-value: 0.0183
[Attachment: 1535190198851.png]
 

VJAY

Well-Known Member
Dear ncube,
Can you please share the code for this formula if you have it? I want to get the last 20 days' average daily range for the scrips in a pair. Is it possible?
 

ncube

Well-Known Member
@Vevensa_P , Yes, pairs can easily be screened using a Python loop, but it will take some dedicated coding effort as I do not have any ready-made code for it. I am sorry, I will not be able to help you in this regard as I will be busy with my office work for the next few months.
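
For anyone who wants a starting point, a screening loop of the kind discussed above could look roughly like the sketch below. It is not code from this thread, and several things in it are assumptions: the function names are made up, the p-value comes from the `coint` function in statsmodels (an Engle-Granger test, imported only when needed), and the z-score is taken on the simple price ratio of the two stocks over a 20-day window, which may differ from the zScore function used earlier in the thread. The thresholds are the ones from the condition quoted in the request.

```python
from itertools import permutations
from statistics import mean, pstdev


def coint_pvalue(y, x):
    """Engle-Granger cointegration p-value (assumes statsmodels is installed)."""
    from statsmodels.tsa.stattools import coint
    return coint(y, x)[1]


def latest_zscore(series, lookback=20):
    """z-score of the last value against the trailing `lookback` window."""
    window = series[-lookback:]
    sd = pstdev(window)
    return 0.0 if sd == 0 else (window[-1] - mean(window)) / sd


def screen_pairs(prices, pvalue_fn=coint_pvalue, lookback=20,
                 max_p=0.05, min_abs_z=2.0):
    """Return (a, b, p_value, z_score) for every ordered pair passing both filters.

    `prices` is a dict mapping symbol -> list of closing prices of equal length.
    """
    hits = []
    # permutations, not combinations: A-B and B-A are tested separately
    for a, b in permutations(prices, 2):
        p_value = pvalue_fn(prices[a], prices[b])
        if p_value >= max_p:
            continue
        # Assumption: the spread is the simple price ratio a / b
        ratio = [pa / pb for pa, pb in zip(prices[a], prices[b])]
        z = latest_zscore(ratio, lookback)
        if abs(z) > min_abs_z:
            hits.append((a, b, p_value, z))
    return hits
```

With the data loaded as in the correlation snippet, `prices` could be built as `{col: df[col].tolist() for col in df.columns}`. For nine stocks the loop runs 72 tests, which is quick; for 500 stocks it would run about 2.5 lakh tests, which is why narrowing the list first is still the better idea.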
 

ncube

Well-Known Member
@VJAY, I think what you want is the 20-day ATR (Average True Range) value for each stock in the pair. You can easily add this indicator for the stocks in any standard charting platform, as it is a common indicator.

However, if you need it in Python, you can use the TA-Lib library mentioned by @UberMachine in his thread. ATR requires the High, Low and Close values, and hence needs to be calculated on the complete stock data before running the pair trading functions.
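
If TA-Lib is installed, the call would be along the lines of `talib.ATR(high, low, close, timeperiod=20)`. For readers without it, below is a minimal plain-Python sketch of the same idea. The function names are hypothetical, and it takes a simple average of the last 20 true ranges, whereas TA-Lib, as far as I know, applies Wilder's smoothing, so the two values will not match exactly.

```python
def true_ranges(highs, lows, closes):
    """True range per day, starting from the second bar.

    True range = max(high - low, |high - previous close|, |low - previous close|).
    """
    ranges = []
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        ranges.append(max(highs[i] - lows[i],
                          abs(highs[i] - prev_close),
                          abs(lows[i] - prev_close)))
    return ranges


def average_true_range(highs, lows, closes, period=20):
    """Simple average of the last `period` true ranges (no Wilder smoothing)."""
    ranges = true_ranges(highs, lows, closes)
    if len(ranges) < period:
        raise ValueError("need at least period + 1 bars of data")
    return sum(ranges[-period:]) / period
```

The three lists would be the High, Low and Close columns of one stock, oldest bar first, so the function has to be run once per stock in the pair.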