Saturday, September 21, 2013

Machine Learning to identify Ad Servers in Clickstream

Sifting through the clickstream of thousands of people in search of insight is a fun task.

However, clickstream data is full of noise:

  • Ad servers
  • Iframes
  • Analytics trackers (AdSense, comScore, etc.)
Ad servers are particularly annoying: they are shape-shifters, continually adding and changing both domains and subdomains.

However, noise has patterns:


  • Referrers
  • Redirect Codes
  • HTTP Headers
  • Indiscriminate appearances across the web
These are the sort of patterns that machine learning could eat for breakfast!
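As a rough illustration of how those patterns might be turned into something a classifier can consume, here is a minimal Python sketch. Everything in it is an assumption, not an actual pipeline: the hypothetical `Request` record layout, the four per-domain features (redirect rate, share of hits with a third-party referrer, share of responses that set a cookie as a stand-in for "HTTP headers", and the fraction of referring sites a domain turns up on), the toy `.example` domains, and the choice of a hand-rolled logistic regression trained with stochastic gradient descent.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Request:
    """One clickstream hit. Hypothetical schema, not a real log format."""
    domain: str                                  # host that served the hit
    referrer: str                                # Referer host, "" if absent
    status: int                                  # HTTP status code
    headers: dict = field(default_factory=dict)  # response headers (case-sensitive here)


def domain_features(clickstream):
    """Aggregate raw hits into one feature vector per domain."""
    referring_sites = {r.referrer for r in clickstream if r.referrer}
    stats = defaultdict(lambda: {"n": 0, "redirects": 0, "third_party": 0,
                                 "cookies": 0, "seen_on": set()})
    for r in clickstream:
        s = stats[r.domain]
        s["n"] += 1
        if 300 <= r.status < 400:                  # redirect codes
            s["redirects"] += 1
        if r.referrer and r.referrer != r.domain:  # loaded from another site
            s["third_party"] += 1
            s["seen_on"].add(r.referrer)
        if "Set-Cookie" in r.headers:              # one example header signal
            s["cookies"] += 1
    return {
        d: [
            s["redirects"] / s["n"],
            s["third_party"] / s["n"],
            s["cookies"] / s["n"],
            # share of all referring sites this domain shows up on
            len(s["seen_on"]) / max(len(referring_sites), 1),
        ]
        for d, s in stats.items()
    }


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain logistic regression fitted with stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def score(w, b, x):
    """Probability that a domain with features x is an ad server."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy traffic: three content sites, one known ad server, one unlabeled domain.
clickstream = [
    Request("news.example", "", 200),
    Request("ads.example", "news.example", 302, {"Set-Cookie": "id=1"}),
    Request("blog.example", "", 200),
    Request("ads.example", "blog.example", 302, {"Set-Cookie": "id=2"}),
    Request("shop.example", "news.example", 200),
    Request("ads.example", "shop.example", 200, {"Set-Cookie": "id=3"}),
    Request("cdn-ads.example", "news.example", 302, {"Set-Cookie": "id=4"}),
]

feats = domain_features(clickstream)
labels = {"ads.example": 1, "news.example": 0, "blog.example": 0, "shop.example": 0}
w, b = train_logistic([feats[d] for d in labels], list(labels.values()))
scores = {d: score(w, b, x) for d, x in feats.items()}
for d, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{d:18s} {p:.3f}")
```

In this toy run the unlabeled `cdn-ads.example` scores above the legitimate `shop.example`, because it shares the redirect and cookie behavior of the one known ad server. Real traffic would need far more labeled domains, case-insensitive header matching, and most likely a library classifier; the hand-rolled model only keeps the sketch dependency-free.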
