Duo Security Analyzes 88 Million Twitter Accounts to Reveal How to Identify Bot Handles
Duo Security has published technical research and methodology detailing how to identify automated Twitter accounts, known as bots, at scale.
The leading provider of unified access security and multi-factor authentication used machine learning algorithms to identify bot accounts across its dataset. From May to July 2018, researchers collected and analyzed 88 million public Twitter accounts comprising more than half a billion tweets, one of the largest random datasets of Twitter accounts studied to date.
Duo Labs researchers also unraveled a sophisticated cryptocurrency scam botnet consisting of at least 15,000 bots, and identified tactics used by malicious bots to appear legitimate and avoid detection, among other findings.
Duo’s dataset is built from information collected through the publicly available Twitter API and includes each account's screen name, tweet count, follower and following counts, avatar, and bio. Tweet content and accounts' social network connections were also gathered as the API's limits allowed.
Duo Principal R&D Engineer Jordan Wright and Data Scientist Olabode Anise will present their research, “Don't @ Me: Hunting Twitter Bots at Scale,” on Wednesday, August 8, at 2:40 p.m. PDT at the 2018 Black Hat USA security conference in Las Vegas. Following the presentation, Wright and Anise will make their research tools available on GitHub so that other researchers can identify automated Twitter accounts at scale.
Key findings of the research
Analysis of one of the largest random Twitter datasets to date, including the application of 20 unique account characteristics in a machine learning model that differentiates a human Twitter account, classified as “genuine” in the study, from a bot. These characteristics include, among others, the time between tweets, the number of distinct tweet sources, and the average number of hours per day an account is active.
New open-source tools and techniques that can be used to discover and unravel large-scale botnets.
Discovery and details of a sophisticated cryptocurrency scam botnet, consisting of at least 15,000 bots, including how it siphons money from unsuspecting users by spoofing cryptocurrency exchanges, celebrities, news organizations, verified accounts and more.
Accounts in the cryptocurrency scam botnet were programmed to deploy deceptive behaviors in an attempt to appear genuine and evade automatic detection.
Mapping of the cryptocurrency scam botnet’s three-tiered, hierarchical structure, consisting of scam-publishing bots, “hub” accounts that other bots often followed, and amplification bots that like tweets to artificially inflate their popularity and make the scam link appear legitimate.
Duo researchers actively observed Twitter suspending cryptocurrency scam bots and quickly identifying hijacked verified accounts, which it returned to their rightful owners. Despite these ongoing efforts, portions of the studied cryptocurrency botnet remain active.
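To make the account characteristics in the findings more concrete, here is a minimal Python sketch of how three of them (time between tweets, distinct tweet sources, and average hours per day an account is active) might be computed from one account's tweet history. It is not Duo's implementation: the Tweet record, its field names, the account_features function, and the exact feature definitions (for example, counting distinct active hours only on days with at least one tweet) are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Tweet:
    # Hypothetical minimal tweet record; field names are illustrative only.
    created_at: datetime
    source: str  # client that posted the tweet, e.g. "Twitter Web Client"


def account_features(tweets: list[Tweet]) -> dict[str, float]:
    """Compute three behavioral features for a single account.

    These are assumed definitions for illustration, not the ones used in
    the Duo research.
    """
    if not tweets:
        return {
            "mean_seconds_between_tweets": 0.0,
            "distinct_sources": 0.0,
            "avg_active_hours_per_day": 0.0,
        }

    ordered = sorted(tweets, key=lambda t: t.created_at)

    # Time between tweets: mean gap in seconds between consecutive tweets.
    gaps = [
        (later.created_at - earlier.created_at).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    mean_gap = mean(gaps) if gaps else 0.0

    # Distinct tweet sources: number of different clients used by the account.
    distinct_sources = len({t.source for t in ordered})

    # Average active hours per day: for each calendar day with activity,
    # count the distinct clock hours containing at least one tweet, then average.
    hours_by_day: dict = defaultdict(set)
    for t in ordered:
        hours_by_day[t.created_at.date()].add(t.created_at.hour)
    avg_hours = mean(len(hours) for hours in hours_by_day.values())

    return {
        "mean_seconds_between_tweets": float(mean_gap),
        "distinct_sources": float(distinct_sources),
        "avg_active_hours_per_day": float(avg_hours),
    }
```

For instance, an account that tweets at 9:00, 9:30, and 14:00 on one day and at 10:00 the next, using two different clients, yields a mean gap of 30,000 seconds, two distinct sources, and 1.5 active hours per day. In a classifier, vectors like this would presumably be computed for every account and combined with the remaining characteristics before training a model.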
In response to the research, which was shared with Twitter prior to publishing, a Twitter spokesperson said:
“Twitter is aware of this form of manipulation and is proactively implementing a number of detections to prevent these types of accounts from engaging with others in a deceptive manner. Spam and certain forms of automation are against Twitter's rules. In many cases, spammy content is hidden on Twitter on the basis of automated detections. When spammy content is hidden on Twitter from areas like search and conversations, that may not affect its availability via the API. This means certain types of spam may be visible via Twitter's API even if it is not visible on Twitter itself. Less than 5% of Twitter accounts are spam-related.”
“Malicious bot detection and prevention is a cat-and-mouse game,” said Wright. “We anticipate that enlisting the help of the research community will enable discovery of new and improving techniques for tracking bots. However, this is a more complex problem than many realize, and as our paper shows, there is still work to be done.”