Functionality: The Alert & Automation System detects anomalies rapidly and responds to them by alerting administrators and/or triggering an automated response using RESTful queries. Alerts are typically set up to detect underperforming network equipment, security attacks, or excessive usage patterns. The system works as follows: it ingests incoming JSON events, maps them into streams identified by their keys (selected document attributes), and checks whether a stream exceeds the alerting criteria. If so, it pushes an alert via one or more of its RESTful plugins. A key feature of the system is that it operates in real time: it adds the minimum possible delay because it uses real-time algorithms, often referred to as streaming or online algorithms. Real-time capability is generally useful, and it is essential when it comes to security or automation.
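The processing steps described above (ingest an event, map it to a stream by key, check the criteria, push an alert) can be pictured with a minimal JavaScript sketch. It is not the system's implementation: the createAlert function, the option names (key, threshold), the simple per-stream counter, and the notify callback standing in for a RESTful plugin are all assumptions made for illustration.

```javascript
// Hypothetical sketch: names and shapes are illustrative, not the product's API.
function resolve(event, path) {
  // Look up a dotted attribute path such as "attrs.from" in a JSON event.
  return path.split(".").reduce((obj, part) => (obj == null ? undefined : obj[part]), event);
}

function createAlert({ key, threshold, notify }) {
  const counts = new Map(); // one counter per stream; a stream is identified by its key value
  return function processEvent(event) {
    const streamKey = resolve(event, key);
    if (streamKey === undefined) return; // the event carries no key, so it joins no stream
    const count = (counts.get(streamKey) || 0) + 1;
    counts.set(streamKey, count);
    // The check runs on every event as it arrives (online), without batching.
    if (count > threshold) notify({ streamKey, count });
  };
}

// Usage: raise an alert once a URI has produced more than 2 events.
const raised = [];
const processEvent = createAlert({
  key: "attrs.from",
  threshold: 2,
  notify: (alert) => raised.push(alert), // stand-in for a RESTful plugin call
});
for (const from of ["alice", "bob", "alice", "alice"]) {
  processEvent({ attrs: { type: "auth-failed", from } });
}
console.log(raised);
```

Only the third event of the "alice" stream crosses the threshold; the "bob" stream is counted separately and stays silent.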
Ingestion: Alerts can only be as good as the information gathered. The system supports several data sources. It has been extensively tested against vendor-specific events coming from frafos SBC. Examples of situations and respective alert types used to detect them include abnormally long calls (matching expression), a URI failing to authenticate too often (rate alert), low trunk ASR (ratio alert), disconnected registrar (rapid change in registration expirations), unusual country of origin for a URI (memory alert), or an overloaded trunk (too many parallel calls).
Configuration: Specific alerting use cases are implemented by configuring an appropriate alert type. Several alert types are available. To set up an actual alert, it must be described in the JSON configuration format and submitted to alert processing using an API. Alert types fall into two classes: predefined and custom. Predefined alert types come with only a few parameters and are thus easy to set up. Custom alert types take more parameters, such as the stream key and filtering expressions. Configuration examples are shown below (encoded as opaque import strings). The configuration process requires three design steps: formulating the logic (such as "a user's authentication must not fail more than three times a minute"), mapping it to the actual event attributes, and turning both into an alert specification.
A demo video showing how an alert is raised for an incoming call through Opsgenie and Slack is available at https://www.youtube.com/watch?v=jEinikRqeSA
This alert is based on “parallel calls” session tracking, which keeps track of both incoming and outgoing calls of a trunk (key attrs.dst_ca_name|attrs.src_ca_name) and raises an alert if the number of sessions exceeds a threshold for the respective trunk.
eJx1kE9LxEAMxb%252FKkPMgqwcPvS0F8SIUVs9LnElp2flTknTdstvvLq0Va8Vb8n4v4SVXoITvgaCoMQhZ6JAxkhILFFeo26DEU3XG0BMUAKOFEw1rCVVZ7rzo0eExYaTblyLsvpVl6gXltF0W8bKS7kcLTKJ1H944bL0LeYr6D3n2f8IuZN9rs0XacFYNVBG32a%252Fo4243WhA6E7e6vvXhZ2HZYEq0iTha8CSO207bnKCAcm%252BE3c2LmogXU5VgQYducpeH17md%252F1NAhYwhUDAlhiCmzmzQuF40RzM93EKb6gwFaM4mYhqMm42aP5C9%252FDaPnxt0oI0%253D
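The session tracking behind the parallel-calls alert can be sketched as a per-trunk counter. This is a simplified illustration under assumptions, not the product's code: it assumes that "call-start" and "call-end" events delimit a session, and the function name, the threshold value, and the notify callback are hypothetical.

```javascript
// Hypothetical sketch of per-trunk session counting; not the product's code.
function createParallelCallTracker(threshold, notify) {
  const active = new Map(); // trunk name -> number of calls currently in progress
  return function track(event) {
    const { type, src_ca_name, dst_ca_name } = event.attrs;
    const delta = type === "call-start" ? 1 : type === "call-end" ? -1 : 0;
    if (delta === 0) return; // other event types do not open or close a session
    // A call occupies both its source and its destination trunk;
    // the Set avoids double counting when both names are the same.
    for (const trunk of new Set([src_ca_name, dst_ca_name])) {
      if (!trunk) continue;
      const sessions = Math.max(0, (active.get(trunk) || 0) + delta);
      active.set(trunk, sessions);
      if (delta > 0 && sessions > threshold) notify({ trunk, sessions });
    }
  };
}

// Usage: three simultaneous calls towards "trunk-a" with a threshold of 2.
const overloaded = [];
const track = createParallelCallTracker(2, (alert) => overloaded.push(alert));
for (const src of ["pbx-1", "pbx-2", "pbx-3"]) {
  track({ attrs: { type: "call-start", src_ca_name: src, dst_ca_name: "trunk-a" } });
}
console.log(overloaded);
```

The third call start pushes "trunk-a" over the threshold, while each source trunk holds a single session and raises nothing.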
This alert uses the “string-match” alert type and looks for call-end events in which the average MOS of either direction dropped below 3.0. The matching expression is: ! attrs.rtp-MOScqex-avg-a>3.0 & rtp-MOScqex-avg-b>3.0
eJx1kd9Kw1AMxl8l5mK76cpQ8KKgIAPxZqgUHyDtctbDzj9z0rkx6rNLx8Qy8C75fuHLR3JCDtQ4xsqQy1xgIiHPypKxOqFNjaN252zWsd2T639HhwKNdcoyAXgDpCq5FE2L9WvdfvJhQfvtgh7vyiXM4FpvRh2HAoWzmt59iJvaTciz13%252FIy0b%252BIU%252B9dtdIO4mqjt9YbNxM6P1yORSYec9i9TgBt3%252BGq45C4KuIQ4Ebzq3YpDYGrNDFL3iPNRaoxzTO1GssMJAf66xiwxY8adthgTaYiBWu%252BqzRAzkWBe1IwdOOM1iFFHO2jWPQCJlJ2g5MlPOdbdMrwzlJLqFO3FpjWxpjQHMEPpBPjqvLT4xE%252Fz03MZYNyRxmF7k9r37ow4YNDj9trbWP
Ratio alerts detect whether a subset of events exceeds a percentage and are often used to detect situations in which the percentage of failures is beyond what is acceptable. Here we observe the share of "call-attempt" events with specific SIP codes (regular expression match attrs.sip-code~"404|408|500|503") in the mix of all failed call attempts and call starts (attrs.type~"call-attempt|call-start"). The percentage is determined and the alert raised individually for every Call-Agent (trunk), as the alert key is attrs.dst_ca_name.
eJx1U02L20AM%252FStiDs3FCW43W0qghRAohVIopb0tLLItx8LjGSPJScMm%252B9vL2FvWpM3J9vvQPEnjJ0cBC09uU6NXylyPgh0ZibrNk2vplB4H9AO5jUMz0VWl9ljiY8CO3CVLmm%252Bo7VyXYO4Lj2XrWW1GjadcMnfkUMXjjLi%252FZK4Tzx3P5Xd5qrQPUWiGvk2oDkXN3kj%252BDajcL8tY0fODW%252Bfr8zr%252FcL7P8%252FN9fvcwRrvls1OfPCV6v0Qz6no7jx9qKDZ5Yz%252F3fUqQkqfSrvsXUqsH%252F0v8DeZzd8vzpZIbzHaw5pqyRqKZp%252B8kHKsZ%252Bz4f50TCpP9ZgdKBhG2%252B4HevJ%252B0aDIGusl8yV5GWwr1xDGlu0PC%252BgUoNdluokf0gBIJGLnNpnG7jdl9%252FuMyNt2XjykEtdsuWTknF0WWOQx2T7JVBT2JgDRp02JICG%252FRRlQtPYBFK9OXg0WiqAbEGBB0KJUvv0z6oAjpQMF3Bz4auQcBQ%252FbVYQ0KphhBoTyXXTBUUJ5juCYc90O9eSJVjmGIZtgR1lA48twTT9aklds%252BLOsZVgbKANy%252Fw1PLHIVRUr2AL4zghiacsU4nFy781jC2FBXR4goJgUKoyUO56T1DGYcwuEw4xRT%252Bykrv8AQhNUFk%253D
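The ratio logic described above can be sketched in JavaScript as follows. It is an illustration under assumptions rather than the system's algorithm: the counters here are cumulative with no time window, and the option names (threshold, minEvents), their values, and the notify callback are hypothetical.

```javascript
// Hypothetical sketch of a ratio alert keyed by destination trunk.
function createRatioAlert({ threshold, minEvents, notify }) {
  const stats = new Map(); // trunk -> { total, matched }
  const failureCodes = /^(404|408|500|503)$/;
  return function observe(event) {
    const a = event.attrs;
    // Population: all call attempts and call starts.
    if (a.type !== "call-attempt" && a.type !== "call-start") return;
    const key = a.dst_ca_name;
    const s = stats.get(key) || { total: 0, matched: 0 };
    s.total += 1;
    // Subset: call attempts that failed with one of the observed SIP codes.
    if (a.type === "call-attempt" && failureCodes.test(String(a["sip-code"]))) s.matched += 1;
    stats.set(key, s);
    const ratio = s.matched / s.total;
    // minEvents keeps a single early failure from producing a 100 % ratio.
    if (s.total >= minEvents && ratio > threshold) notify({ key, ratio });
  };
}

// Usage: one successful start and three failed attempts on "trunk-a".
const ratioAlerts = [];
const observe = createRatioAlert({ threshold: 0.5, minEvents: 4, notify: (a) => ratioAlerts.push(a) });
const sample = [["call-start", undefined], ["call-attempt", 503], ["call-attempt", 503], ["call-attempt", 404]];
for (const [type, code] of sample) {
  observe({ attrs: { type, "sip-code": code, dst_ca_name: "trunk-a" } });
}
console.log(ratioAlerts);
```

Because the key is the trunk name, each trunk accumulates its own ratio and a failing trunk does not affect the others.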
Long calls can be easily detected by checking the attribute "attrs.duration" in "call-end" events. Use the alert type “string-match” to look for call-ends over one hour in length with the following matching expression: attrs.type='call-end' & attrs.duration>3600
Note an important nuance: the basic string-match alert type is keyed to a tenant, and all incoming events are thus processed as a single "stream". If multiple long-call events arrive for different From URIs, the later events can be absorbed by throttling. If each URI is to be handled individually, the "custom-key stringmatch" alert type should be used instead.
eJx1kc9KA0EMxl8l5GAPjmVV6GGhghTEiyAUHyA7m3GHzp8lky2Wsj67bFW6FHpL8gtJvi9H5ERNYKwdhcIGexKKrCwF6yP6vglkd8EXndI9heG%252FdTTofFCWGUBSlbLUQ8%252FrhaUQ7ji1C7iB33o7CKnP6elxVVU4GhQu6obwIWE%252BZEZeol4hr61cIc%252BDdpdIO8mqgd9ZfG5ndFVVo8HCexavhxl4OA%252FcdJQSX5w4Gmy5WPH9pAhrvO9uYZKMBif9WOP2DQ0milNcVHz6hEhqOzTok8tY42YomiNQYFHQjhQi7biAV%252BhzKb4JDJqhMIntwGU5GembQRlOp5QlbHu23nl7chaaA%252FAXxT5w%252FWe6kxy%252FFy7nZUNy%252FoU9rV4PqWWH4w8o07OI
A rapid decrease or increase in registrations typically indicates a connectivity issue with a registrar or a registration "avalanche" afterwards. The "sudden-change" alert type observes the occurrence of selected events (here filtered by attrs.type="reg-expired"). An alert is raised if the short-term occurrence departs from the smoothed mid-term occurrence by more than a threshold. The alert key is the tenant ID tls-cn, so the alert covers all of a domain's events in a single stream.
eJx1kEFrwzAMhf%252BK0TkZ3WBjFHYYhbHLYDB260WJlcTUkY2spA0l%252F324LTTb6MnofU%252FWk45AjJUnWDfoExUQUbAnJUmwPsKOpvyM6AeCNahPZc0wFxl8YNotYZaFkjaD%252FxZ%252Fg7z1eoO8W7lBXgft%252FqLGeaVfDagq6U6nSC9bEGpLOkQnZLcn%252F96xDfuF%252F3EuwNLoUF3ghf68mgtwLQehhXqf1USe6n%252FxtZOg6umTxAW7oE%252Brc9NI4nR5xofrapsOmenPsU7JUi0unrMBHWIp1CZTUYtsNJi6Q27JCEZn%252FQQF5L1hDZuvDRTA2OciDdYSX7xQgOMm5EOZ06jrZ4o7MljJEPViTgaTacLA1oRBTTWZOvQRxXFrUhdESyXpDbI1PnB7rvowZo4jCbaUYP4BgV3OhQ%253D%253D
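One way to picture the comparison of short-term and smoothed mid-term occurrence is the sketch below. The source does not state which smoothing the system uses; this example assumes fixed-length counting intervals and an exponentially weighted moving average, and the names alpha, threshold, and notify are hypothetical.

```javascript
// Hypothetical sketch: sudden-change detection with an exponentially weighted baseline.
function createSuddenChangeDetector({ alpha, threshold, notify }) {
  let baseline = null; // smoothed mid-term occurrence (events per interval)
  // Call once per interval with the number of matching events seen in it.
  return function closeInterval(count) {
    if (baseline !== null && baseline > 0) {
      // Relative departure of the latest (short-term) count from the baseline.
      const deviation = Math.abs(count - baseline) / baseline;
      if (deviation > threshold) notify({ count, baseline, deviation });
    }
    // Fold the latest interval into the baseline after the comparison.
    baseline = baseline === null ? count : alpha * count + (1 - alpha) * baseline;
  };
}

// Usage: a steady flow of 100 "reg-expired" events per interval, then a burst of 300.
const changes = [];
const closeInterval = createSuddenChangeDetector({
  alpha: 0.25,
  threshold: 0.5,
  notify: (c) => changes.push(c),
});
for (const count of [100, 100, 100, 300]) closeInterval(count);
console.log(changes);
```

Using the absolute departure means that both a rapid increase and a rapid decrease are reported, which matches the two situations named in the description.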
A telemarketer is easily detected by an abnormally high ratio of calls that are declined or terminated within the first few seconds of a call (attrs.type~'call-attempt|call-end', attrs.duration<5). The alert relates to users, and thus the key is set to attrs.from. The detection is easily implemented using the ratio alert by alerting on a high percentage of such short-duration calls. One thing needs to be fixed, though: call-attempt events do not include a duration. A matching expression "attrs.duration<5" would work for call-ends but would disqualify call-attempt events from processing. Call-attempt events must therefore first be fixed using a Transformation that adds a duration of 0.
transformation:
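function addDuration(event) {
  // Only call-attempt events lack a duration; leave all other events untouched.
  if (event.attrs.type !== "call-attempt") return;
  // A zero duration lets the expression attrs.duration<5 match call attempts too.
  event.attrs.duration = 0;
}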
alert config:
eJx1Ul2L1EAQ%252FCvNPJiX3HIKCi4qHAciiCCiP6CTVC5D5iN0d3Zdzr3fLrNRNu65TzNTVd1TU9OPDombALftOShqN7FwhEHUbR%252FdiENZdhxmuK1jM9FNLzm6Y13IL6zjWlBgPzWB2zF4tRV1an%252Bs3d6nLu9XxMvbY%252B2iBB%252F9Wv%252B2wP4hZcGlWOem98Egz611s7D5nN69Lk6uqeww4alqOYQbNkOc7NfpgNRVpS5P65oPBVIEtHb5VIFaP4cfEq4wH%252BO1mk%252BdXGHuZhsuKRskmwV8hfjcrdg3t6dEIB76n7QVO4i39Se%252BOt90P3BKuPB%252BrF0HbcVPJUi3dYaAyDKWoXC1K%252BG5rbv%252F%252FM3VLnEsh3ZWy%252FFmxIFO%252Bbva%252BdTnIjszHCBGNrBR5BFK3mjKqr4JIMvUcmjnwIalB%252BWemHRuFFb2yw%252BgI%252ByQTDf0fcAlSJy6vyU2QFB6CEgntL736Kg50DIVPj0Qfk4CVZ%252FTYst4BPVZIgU%252Fgs7T%252FlT1OW8alope%252FIGXJ7%252BfU4d%252BQ3d0CpCKePGytKj%252BHcuKIh%252BoAc2Krib1cQqgNs8n77LglIv1vVe4428wwUOh
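The effect of the transformation on the short-call test can be checked with a small JavaScript sketch. The isShortCall function is a hypothetical JavaScript rendering of the expressions attrs.type~'call-attempt|call-end' and attrs.duration<5, not the system's expression engine, and the sample events are made up for illustration.

```javascript
// Transformation as described in the text: add a zero duration to call attempts.
function addDuration(event) {
  if (event.attrs.type !== "call-attempt") return;
  event.attrs.duration = 0;
}

// Hypothetical JavaScript equivalent of the alert's two expressions.
function isShortCall(event) {
  return /^(call-attempt|call-end)$/.test(event.attrs.type) && event.attrs.duration < 5;
}

// Sample events (made up): a declined attempt, a 3-second call, a 2-minute call.
const calls = [
  { attrs: { type: "call-attempt", from: "sip:dialer@example.com" } },
  { attrs: { type: "call-end", from: "sip:dialer@example.com", duration: 3 } },
  { attrs: { type: "call-end", from: "sip:dialer@example.com", duration: 120 } },
];

const before = calls.map(isShortCall); // the attempt has no duration, so it does not match
calls.forEach(addDuration);
const after = calls.map(isShortCall); // the attempt now carries duration 0 and matches
console.log(before, after);
```

Without the transformation the declined attempt would silently drop out of the ratio, which would understate the share of short calls for that user.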
An abnormally high number of regions from which a user accesses a service may be a sign of a frequent traveller or, worse, a remote attacker. The "memory" alert type observes users keyed by attrs.from and associates them with their geographic regions as found in geoip.region_code. An alert is raised if too many new regions are discovered.
eJx1kEFrwzAMhf9K0GkDU8oGO%252BQ2CmNQCrvsPNzkJTWxLSMrhVDy30eywUxGT0L63kNPuhGiPXtQ3VmfYShZsQEKyVTfaMC0lKv1I6gmqyp51wkHms0CTzYPpWAZuz6yoJg%252Bz4YyPBotpT3YpZ2gdxy%252FGm6xetPZ22bwLpfaNdpsSJC1G%252F2n%252BO3OX%252FIW9A55b%252BUOeR31skWd84p%252FBr0Iq3p8QBy3BX3Z79cTrxCn5cOe%252FtYcLjZGbILPhlrkRlxSx5Fq%252BnlHlQeXEqR6cCGxKNpHMqRTWlyH44kMRRuWphmzcqgGTFVAYJlo%252FgasjZ8w
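The memory alert described above can be sketched as a per-key set of remembered values. This is an illustrative simplification, not the product's implementation: nothing is ever forgotten here, and the option names (key, value, maxKnown), the notify callback, and the sample region codes are hypothetical.

```javascript
// Hypothetical sketch of a "memory" alert: remember the values seen for each key.
function createMemoryAlert({ key, value, maxKnown, notify }) {
  const memory = new Map(); // key -> Set of values seen so far
  return function observe(event) {
    const k = key(event);
    const v = value(event);
    if (k == null || v == null) return;
    const seen = memory.get(k) || new Set();
    memory.set(k, seen);
    if (seen.has(v)) return; // already known, nothing new to report
    seen.add(v);
    // Alert only when the number of distinct values exceeds the allowed maximum.
    if (seen.size > maxKnown) notify({ key: k, value: v, known: seen.size });
  };
}

// Usage: allow two regions per user and alert on the third distinct one.
const regionAlerts = [];
const observeRegion = createMemoryAlert({
  key: (e) => e.attrs.from,
  value: (e) => e.geoip.region_code,
  maxKnown: 2,
  notify: (a) => regionAlerts.push(a),
});
for (const region of ["PR", "PR", "BE", "CA"]) {
  observeRegion({ attrs: { from: "sip:alice@example.com" }, geoip: { region_code: region } });
}
console.log(regionAlerts);
```

Setting maxKnown to 0 would report the very first occurrence of any value for a key.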
Too-frequent failures to authenticate from an IP address may be caused by an innocently forgotten password. At a higher rate, however, they indicate an attempt to guess a password. This rate alert looks at failing authentications (attrs.type='auth-failed') and relates them to the originating IP addresses (attrs.source). The alert is raised when 5 failures occur within 5 minutes. An alternative to this configuration may be the use of the From URI as the key: this avoids alerting on IPs shared by multiple users with different URIs.
eJx1kEFrwzAMhf9K0KUXd5TBdgjsMMpKdxiUsf0ALXlpTB27yHK6UPzfh9kltOQk9D4h6b0rwfOPA9UqCYbOLDxAIZHqK50wlTKyS6CaWFXiQwxJGlA2BX9wPM1HihwhFnGmduwisqGL9W24zMBTNjTw79sIr%2FFG76xTyP15nc54WXHSft2xdWhX%2FzcdGr39RBC1S%2B5b3ALZ9uw9luhuWNq4b2WBvCbt7wMZIVbnWT5mQ9pLUHU4QGxoZ%2FB5s8nZUIvYiD2rDZ5qej9Uxa%2F1x0pDVfzDq21YQYZKKFTT9mv%2FSYY8D6VrUtQwrE%2BYqt4e%2B0rKbP4DDvqpnw%3D%3D
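The rule "5 failures within 5 minutes" can be sketched as a sliding window of timestamps per source address. The source does not say how the system counts rates, so this is only one possible reading: the function name, the option names (limit, windowMs), and the notify callback are hypothetical, and the sketch leaves out the throttling of repeated alerts.

```javascript
// Hypothetical sketch of a rate alert using a sliding time window per key.
function createRateAlert({ limit, windowMs, notify }) {
  const history = new Map(); // key -> timestamps (ms) of recent matching events
  return function observe(key, timestampMs) {
    // Keep only the events that still fall inside the window, then add the new one.
    const recent = (history.get(key) || []).filter((t) => timestampMs - t < windowMs);
    recent.push(timestampMs);
    history.set(key, recent);
    if (recent.length >= limit) notify({ key, count: recent.length });
  };
}

// Usage: 5 failures within 5 minutes, keyed by the source IP (attrs.source).
const MINUTE = 60 * 1000;
const rateAlerts = [];
const observeFailure = createRateAlert({
  limit: 5,
  windowMs: 5 * MINUTE,
  notify: (a) => rateAlerts.push(a),
});
// One address fails once a minute and reaches the limit with its fifth failure.
for (let i = 0; i < 5; i++) observeFailure("192.0.2.10", i * MINUTE);
// Another address fails only once every ten minutes and never reaches it.
for (let i = 0; i < 5; i++) observeFailure("198.51.100.7", i * 10 * MINUTE);
console.log(rateAlerts);
```

Keying by the From URI instead of the IP address only requires passing a different key to observeFailure.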
It may be useful to alert on traffic from a new User-Agent. This may indicate an attack, a firmware update, or a new type of device. The "memory" alert type observes UAs keyed by attrs.from-ua and associates the UA name. An alert is raised on the very first occurrence of a user-agent.
eJx9kN9Kw1AMxl8lxIvdnIlTNrB3YyCCDATZA6Rduh56%252FpScdKPM%252BuxyZGCd6F3IL9%252BXLzljIM9YYNUnjR5aHsCzjzKgQRvqiAW%252BMYOtgQKQqtiyVwaboGoo2HAAG4CyjvfARw4KSYXJo0Edumy9edmiQQ5UOsZCpWeDe06V2E5tDFhg4BPs1hADaMNQuli1aLAjIc%252FKkrA4Y22dsuTqSK7PvjlNus1LPmbCh3ng03tFzs2TkugMR4MtD78VtUQ%252F7%252BnCt5Ta6Uxu20OIwpPuYjSY2HGl%252F9rZrnRUtc6m6VxNLvFoUDhp3buduOt9F%252FLk9Q%252FyvJc%252FyLrX5hppI1HV8SuLjfsJfVgsl6vH%252B9XXNUcWq9P33H27bhoKgX%252FkTPmymwWO4%252FgJ1sjA4Q%253D%253D
If a traffic probe supports detection of attacker signatures, a high presence of failed calls coming from a SIP user-agent with a specific signature indicates a distributed attack.
eJx9UtFq3DAQ%252FJVFD70X5zgKLeQghXClL6VQSn9gbY9zi2XJ3V0lPdLLtxedW%252BKE0idJs7Oj2ZEeQ%252BIJYR%252B6Yp6nqxEnUnbJoQmShhz24fBc4Qh18iM7TTzCSJzmbCZtBHmmjmNXIjsWDcoDMVlpDV73hojO0RPukdy29P2I1yBx6v%252B2%252BBGKqqEgm9HJIOipPdEg0aGS7gg%252FZ4WZ5LTYch5BQ9aJoowgdlfbDpqnp82Q87Zl3dCbP%252FAy8k1JPYYt3dI9xwKq5MXLIrFZyH25jJQ2NPGJWlAx9A2ZTHMEdblcvOuCU67WH8QQmuCnuQZ8%252BPwtNAGJ24iwdy1oQg%252FrVOaqG%252Fbho5irtKWGceAY6daduzE0YWblCQ61sH8MI051udgN%252B2AybxU%252FCsy3Jnfh3FTGF7ZxzaqwQQW2QgeOhnMTZG4jd2MU839UHyT1%252BWFVeHduwqRRJlnTr6%252Br0l3KihX6dlcvLu3yZGtHS641nZtNxzFesTum2TfV6v%252FoTy%252Fovy4Hc9alM8%252Frrg%252FL3PWLvU5DYT6U%252BGl6UVm%252BuLIY%252Bsryo2b3iK9Qyf2K%252BX53mQz3UPH1g%252ByetQ9HTgnx5c3n828fzDcY