Machine Learning (ML) master thread

#1
Hi folks,

I just joined TJ, and am glad to see a lively community here :)

I thought it would be a good idea to have a centralized ML-related thread in the algo trading section where we can ask and answer questions about ML: which methods to use, which software packages, which programming languages, and so on. There appears to be a lot of interest in this area, particularly among algo traders. I have extensive experience (10+ yrs) in ML and am happy to answer questions when possible.

Sorry if it is not appropriate for a new member to create such a thread. Feel free to delete it in that case (I think the mods might do it anyway) :)

Thanks for reading.
 
#2
I recently came across ML and boy, it's awesome. Anyway, I am using R.
 

NJ23

Well-Known Member
#3
Would you mind sharing where (which aspect of trading, i.e. entries/exits or asset allocation) and how (which methods/algorithms) you have used, or are planning to use, ML for trading?
 
#4
I haven't started trading using ML - in fact I think that's quite a long way away.

Currently, I am trying to test a few hypotheses, and am collecting data and building ML models for them. One hypothesis I am testing is whether correlation between global indices (if any) can be combined with sentiment analysis of world news and social media (Twitter, FB, etc.) to generate buy/sell signals. Think of it as pairs trading supplemented with additional sources of information (news, social media).

I am writing distributed web crawlers to crawl news stories from Google News and other archiving websites (for older news stories). Sentiment analysis software packages work quite well for news stories (they can give a rating from 1 to 10, with 1 meaning the story is extremely positive and 10 extremely negative). For Twitter and FB, I am writing simple decision-tree-ensemble classifiers based on a predefined set of sentiment keywords (freely available online).

Correlation-based measures can be combined with sentiment measures in simple ways; something as simple as linear or logistic regression could do the trick.
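To make the regression idea concrete, here is a minimal sketch rather than anyone's actual model. It assumes two features per day, a rolling correlation between two indices and a sentiment score rescaled to [-1, 1] (positive meaning good news), and it fits a plain logistic regression by gradient descent using only the standard library. The data, labels, and function names are all invented for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability of the positive class (here: a buy signal)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up rows: (rolling index correlation, sentiment rescaled to [-1, 1]).
X = [(0.8, 0.6), (0.7, 0.9), (0.9, 0.4), (0.6, 0.7),
     (0.2, -0.5), (0.1, -0.8), (0.3, -0.2), (0.4, -0.6)]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = buy signal, 0 = no signal (invented labels)

w, b = fit_logistic(X, y)
p_buy = predict_proba(w, b, (0.75, 0.65))
p_none = predict_proba(w, b, (0.25, -0.525))
```

In practice a library implementation would replace the hand-written fit, and the hard part remains building trustworthy features and labels, not the regression itself.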

Hope that at least gives you a general idea of what I am trying to do. It might sound complicated, but really the only challenge is data gathering :) For everything else, existing software packages will work pretty well.

A nice book that came out recently that talks about quite a few of these themes is http://www.amazon.com/Finding-Alphas-Quantitative-Approach-Strategies/dp/1119057868. It's very general but provides a good outline and pointers for further study.

Good luck!
 
#5
How are you deploying your web crawlers? On servers, or on your own computer? I have written a web crawler that crawls deep into websites for text sentiment analysis and data download. The problem is that even a single query takes at least 5 minutes to crawl, and because I run it on my own machine it uses a lot of system resources and heats up my system.

Any advice?
 
#6
I have written my own web crawler; it fetches data from 30+ websites every 5 seconds and writes it to a database. I run the crawler on my own laptop, though no sentiment analysis is performed yet. Why is your web crawler taking 5 minutes for a single query?
 
#7
For one thing, my web crawler crawls a link, grabs the whole text at that link (after parsing), and writes it to a database. It then also runs sentiment analysis on the grabbed text.

For each query, this process repeats for several links (minimum 20), and with my system configuration it takes some time.
 
#8
I think a better approach is to fetch the links in parallel (asynchronously). You can save a lot of time.
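As a rough illustration of parallel fetching, the sketch below uses a thread pool from the standard library. The fetch function is a hypothetical stand-in that only sleeps to mimic network latency, and the URLs are placeholders, so the example runs offline; a real crawler would swap in an actual HTTP request.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    """Hypothetical stand-in for an HTTP request: sleeps to mimic latency."""
    time.sleep(0.2)
    return url, f"<html>page for {url}</html>"

def fetch_all(urls, fetch=fake_fetch, max_workers=10):
    # Threads suit this job because each worker mostly waits on I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch, urls))

urls = [f"http://example.com/story/{i}" for i in range(20)]  # placeholder URLs

start = time.perf_counter()
pages = fetch_all(urls)
elapsed = time.perf_counter() - start
print(f"fetched {len(pages)} pages in {elapsed:.2f}s")
```

Fetched one after another, 20 links at 0.2 seconds each would take about 4 seconds; with 10 workers the waits overlap, so the total should be a small fraction of that. An asyncio-based client would achieve the same overlap with a single thread.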
 
#9
Parallelization will help. Additionally, you don't need to download the HTML to your disk for storage. Grab the page, parse it on the fly (using BeautifulSoup or something similar) to strip noise and ads from the HTML and get the raw text, then do some preprocessing (stemming, stopword removal, etc.), also on the fly. Once you do this, your storage requirements should come down by 50-90%.
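A minimal sketch of that on-the-fly pipeline follows. To keep it dependency-free it uses the standard library's HTMLParser as a stand-in for BeautifulSoup, a tiny illustrative stopword list, and a crude suffix stripper in place of a real stemmer (such as a Porter stemmer); all three are assumptions made for the example.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are", "was", "for"}

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def crude_stem(word):
    # Placeholder for a real stemmer: strips a few common suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(html):
    """HTML in, list of lowercased, stopword-free, stemmed tokens out."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]

html = ("<html><head><style>p{color:red}</style></head><body>"
        "<script>var ad=1;</script>"
        "<p>Markets rallied on the news of falling rates.</p></body></html>")
tokens = preprocess(html)
print(tokens)
```

Only the token list needs to be written to the database, which is where the storage saving comes from.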

You don't need to crawl deep into websites. Just go one level deeper from the landing page, and only within the same (sub)domain, and then stop. Also do some basic sanity checks to make sure that the outlinks do not lead you out to some other website/domain :)

You could easily rent some time on a cloud service such as Amazon's EC2 or Rackspace if you intend to do this seriously. Amazon provides ML libraries too (including for sentiment analysis) on its cloud. This has become really affordable these days.

On a side note, I don't think sentiment analysis would work for Indian markets. There's simply too much market manipulation going on.
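The same-domain check can be sketched with the standard library's URL helpers. The function name, landing URL, and link list below are made up for illustration; the idea is to resolve each link against the landing page and keep only http(s) links on the same host.

```python
from urllib.parse import urljoin, urlparse

def same_host_links(landing_url, hrefs):
    """Resolve hrefs against the landing page; keep only same-host http(s) links."""
    host = urlparse(landing_url).netloc.lower()
    kept = []
    for href in hrefs:
        absolute = urljoin(landing_url, href)
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc.lower() == host:
            clean = absolute.split("#", 1)[0]  # drop in-page anchors
            if clean != landing_url and clean not in kept:
                kept.append(clean)
    return kept

landing = "http://news.example.com/markets/"
hrefs = [
    "story-1.html",                       # relative link, same host
    "/world/story-2.html",                # root-relative link, same host
    "http://ads.example.net/click?id=7",  # different domain
    "http://other.example.com/page",      # different subdomain
    "#top",                               # anchor back to the landing page
    "mailto:editor@example.com",          # not a web page
]
to_crawl = same_host_links(landing, hrefs)
print(to_crawl)
```

To stay one level deep, fetch the pages in `to_crawl` but do not follow the links found on them.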
 
