Knowsis is a social-intelligence company providing social media analytics for finance.
Topics of research:
A system that promptly alerts the user when there is a novel anomaly in social activity
# Display the example chart inline (IPython.display is the public import path)
from IPython.display import Image, display
display(Image('images/VW_chart.jpg', unconfined=True))
A natural way to model counts of events happening in time is to use a Poisson process $N(t)$ with parameter $\lambda$, the rate at which the events happen
$$ \mathbb{E}[N(t)] = \text{expected number of events that have occurred up to time } t = \lambda t $$
A Poisson regression models the logarithm of the expected number of events as a linear combination of predictor variables
$$ \log(\mathbb{E}[N]) = \log(\lambda) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k $$
with $\beta_i$ the coefficients of the model, estimated from the volume history, and $x_i$ the observable quantities of the datapoint.
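To make the regression concrete, here is a minimal sketch of fitting $\log(\lambda) = \beta_1 x_1 + \beta_2 x_2$ with $x_1 = 1$ (a constant term) and a single predictor $x_2$. It uses Newton's method on the Poisson log-likelihood in plain Python. The function names, the toy counts and the choice of a single predictor are assumptions made for illustration only; in practice a statistics library would normally do the fitting.

```python
import math

def fit_poisson_regression(xs, ys, iterations=25):
    """Fit log(lambda) = b0 + b1 * x by Newton's method (illustrative sketch)."""
    # Start from the constant model: b0 = log(mean count), b1 = 0.
    b0 = math.log(sum(ys) / len(ys))
    b1 = 0.0
    for _ in range(iterations):
        lams = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood: sum of (y - lambda) * feature.
        g0 = sum(y - lam for y, lam in zip(ys, lams))
        g1 = sum((y - lam) * x for x, y, lam in zip(xs, ys, lams))
        # Negative Hessian: sum of lambda * feature * feature^T (2x2, symmetric).
        h00 = sum(lams)
        h01 = sum(lam * x for x, lam in zip(xs, lams))
        h11 = sum(lam * x * x for x, lam in zip(xs, lams))
        det = h00 * h11 - h01 * h01  # non-zero if xs has at least two distinct values
        # Newton step: beta += H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def predict_rate(b0, b1, x):
    """Expected count lambda for predictor value x."""
    return math.exp(b0 + b1 * x)

# Hypothetical history: a time index and made-up tweet counts per interval.
time_index = [0, 1, 2, 3, 4, 5]
tweet_counts = [2, 2, 3, 4, 6, 7]
b0, b1 = fit_poisson_regression(time_index, tweet_counts)
lambda_pred = predict_rate(b0, b1, 6)  # expected count for the next interval
```

The log link keeps the predicted rate positive whatever the coefficients are, which is the main reason to prefer this over an ordinary linear regression on counts.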
Consider seasonality at different scales
Consider recent behaviour
Other variables?
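One possible way to turn the considerations above into predictor variables $x_i$ is sketched below. The specific encoding is an assumption, not something the slides prescribe: daily and weekly seasonality are encoded as sine/cosine pairs, and recent behaviour as the log of the mean count over the last few intervals. The function name `build_features` is hypothetical.

```python
import math
from datetime import datetime

def build_features(timestamp, recent_counts):
    """Return a feature vector [1, daily sin/cos, weekly sin/cos, recent level]."""
    # Seasonality at two scales: position within the day and within the week.
    hour_angle = 2 * math.pi * timestamp.hour / 24
    day_angle = 2 * math.pi * timestamp.weekday() / 7
    # Recent behaviour: average count over the supplied recent intervals.
    recent_mean = sum(recent_counts) / len(recent_counts) if recent_counts else 0.0
    return [
        1.0,                                        # constant term
        math.sin(hour_angle), math.cos(hour_angle),  # time-of-day seasonality
        math.sin(day_angle), math.cos(day_angle),    # day-of-week seasonality
        math.log1p(recent_mean),                     # log scale, matching the log link
    ]

# Example call with a made-up timestamp and made-up recent counts.
features = build_features(datetime(2016, 5, 9, 6, 0), [3, 5, 4])
```

The cyclic encoding keeps 23:00 and 00:00 close together, which a raw hour number would not. One-hot indicators per hour or per weekday are an equally reasonable alternative.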
Given the expected rate $\lambda_{pred}$, how (un)likely is it that we see a number of tweets $N$ at least as high as the number observed so far, $n_{obs}$?
If this event has probability lower than $\alpha$, it is anomalous!
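The tail probability behind this test can be computed directly from the Poisson distribution. The sketch below does it in plain Python; the function names and the default $\alpha = 0.01$ are assumptions chosen for illustration. With SciPy available, `scipy.stats.poisson.sf(n_obs - 1, lam)` gives the same quantity and is more robust for very large rates.

```python
import math

def poisson_tail(n_obs, lam):
    """Return P(N >= n_obs) for N ~ Poisson(lam)."""
    if n_obs <= 0:
        return 1.0
    # Accumulate P(N = k) for k = 0 .. n_obs - 1 without computing factorials.
    term = math.exp(-lam)  # P(N = 0); underflows to 0 for very large lam
    cdf = term
    for k in range(1, n_obs):
        term *= lam / k    # P(N = k) from P(N = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def is_anomalous(n_obs, lam_pred, alpha=0.01):
    """Flag the observation if the tail probability falls below alpha."""
    return poisson_tail(n_obs, lam_pred) < alpha

# Made-up numbers: 30 tweets observed where about 10 were expected.
flag = is_anomalous(30, 10.0)
```

Because the test is one-sided, only unusually high volumes are flagged; an unusually quiet period would need the lower tail instead.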
$$\mathbb{P}(N \geq n_{obs} | \lambda_{pred}) < \alpha$$
Given an anomaly period, identify which tweets are novel
Observation: tweets tend to cluster in stories
Given an anomaly period, identify the tweets which are novel, i.e. not part of old stories
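As a rough illustration of this idea, the sketch below treats a tweet as novel when it is not similar enough to any earlier tweet. This is an assumed stand-in, not necessarily the method used in the talk: it uses Jaccard similarity on lower-cased word sets, and the function names, the threshold of 0.5 and the example strings are all made up.

```python
def tokens(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def novel_tweets(anomaly_tweets, old_tweets, threshold=0.5):
    """Keep tweets whose best match among old tweets is below the threshold."""
    old_sets = [tokens(t) for t in old_tweets]
    novel = []
    for tweet in anomaly_tweets:
        toks = tokens(tweet)
        # Similarity to the closest old tweet; 0.0 when there is no history.
        best = max((jaccard(toks, old) for old in old_sets), default=0.0)
        if best < threshold:
            novel.append(tweet)
    return novel

# Made-up strings standing in for tweets.
old = ["acme shares fall after earnings report"]
new = ["ACME shares fall after earnings report today",
       "acme recalls products in europe"]
result = novel_tweets(new, old)
```

Comparing every new tweet with every old one is quadratic in the number of tweets, so a real system would likely cluster old tweets into stories first and compare against story representatives.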
Advantage:
Disadvantage:
general anomaly detection algorithm which is:
two-step novelty detection
Presentation & Notebooks:
https://github.com/knowsis/novel-twitter-anomalies-pydatalondon2016/