Ever wondered why Botnets such as Conficker generate domain names that look like gibberish, as if they came from a language with no sensible mix of vowels and consonants? For all its sophistication, Conficker has this one Achilles heel, and I recently helped develop a method that uses it to detect Conficker. Our method is “zero-day” in that it should be able to detect even future “domain fluxing” botnets automatically by exploiting that one flaw in their design.
Recently, several Botnets such as Conficker, Kraken and Torpig have used DNS-based “domain fluxing” for command-and-control, where each Bot queries for the existence of a series of domain names and the owner has to register only one of them. Unfortunately, in each case, someone had to reverse engineer the bot executable and then determine the sequence of domain strings that would be generated every day. As one can imagine, this process is time- and resource-intensive, and too much valuable time may pass before someone works out all the domain names a Botnet could register and races ahead to register them all. We thought hard about this problem: how could we build a first-alarm system of sorts that looks at all DNS traffic and provides instantaneous feedback on whether domain flux activity is present in it?
After several months of work, we recently presented a paper on our methodology at the ACM Internet Measurement Conference on Nov. 1, 2010 in Melbourne, Australia, describing how to detect such domain flux botnets in real time, i.e. in a zero-day fashion.
The methodology is based on an interesting observation. Anyone who has tried to register a domain name knows how hard it can be to find one that is not already taken. Botnet owners have to ensure that whatever domain name they choose to register is easily available, and given that almost all pronounceable names are taken nowadays, they were left with no better choice than to generate names randomly. For instance, Conficker bots generate names such as joftvvtvmx.org, gcvwknnxz.biz, vddxnvzqjks.ws. Kraken bots generate domains such as bltjhzqp.dyndns.org, ejfjyd.mooo.com, mnkzof.dyndns.org. The difference from ordinary domain names is striking: these names consist of strings that are absolutely unpronounceable. In most cases, the developers did not even bother to generate strings with a distribution of vowels and consonants similar to that of normal words written in the Latin alphabet. This forms the hypothesis behind our domain flux detection: we look at the distribution of characters in a domain name to infer whether it is good or bad.
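To make the observation concrete, here is a minimal Python sketch that computes the character distribution and the share of vowels in a domain name. The function names, the choice to inspect only the leftmost label, and the vowel set are my assumptions for illustration; this is not the code from the paper.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: plain English vowels only


def leftmost_label(domain):
    """Return the leftmost label, e.g. 'joftvvtvmx' for 'joftvvtvmx.org'.

    Assumption: the generated string sits in the leftmost label, as in the
    Conficker and Kraken examples above.
    """
    return domain.lower().split(".")[0]


def char_distribution(domain):
    """Relative frequency of each alphanumeric character in the label."""
    counts = Counter(c for c in leftmost_label(domain) if c.isalnum())
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def vowel_fraction(domain):
    """Share of letters in the label that are vowels."""
    letters = [c for c in leftmost_label(domain) if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


if __name__ == "__main__":
    for name in ["joftvvtvmx.org", "gcvwknnxz.biz", "google.com"]:
        print(name, round(vowel_fraction(name), 2))
```

A single name is weak evidence on its own, which is why the paper looks at groups of domains rather than one name at a time.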
Title: Detecting Algorithmically Generated Malicious Domain Names
Authors: Sandeep Yadav, Ashwath Reddy, A.L. Narasimha Reddy, Supranamaya Ranjan
Presented at ACM Internet Measurement Conference 2010
Recent Botnets such as Conficker, Kraken and Torpig have used DNS based “domain fluxing” for command-and-control, where each Bot queries for existence of a series of domain names and the owner has to register only one such domain name. In this paper, we develop a methodology to detect such “domain fluxes” in DNS traffic by looking for patterns inherent to domain names that are generated algorithmically, in contrast to those generated by humans. In particular, we look at distribution of alphanumeric characters as well as bigrams in all domains that are mapped to the same set of IP-addresses. We present and compare the performance of several distance metrics, including KL-distance, Edit distance and Jaccard measure. We train by using a good data set of domains obtained via a crawl of domains mapped to all IPv4 address space and modeling bad data sets based on behaviors seen so far and expected. We also apply our methodology to packet traces collected at a Tier-1 ISP and show we can automatically detect domain fluxing as used by Conficker botnet with minimal false positives.
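As a rough illustration of two of the measures named in the abstract, the sketch below computes a symmetric KL divergence between the character distributions of two groups of domain labels, and a Jaccard index over the bigram sets of two labels. The alphabet, the add-one smoothing, the symmetric form of KL and all function names are my assumptions; the paper's exact formulation may differ.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def unigram_dist(labels, smoothing=1.0):
    """Smoothed character distribution over a group of domain labels.

    Smoothing keeps every probability non-zero so the KL terms stay finite.
    """
    counts = Counter(
        c for label in labels for c in label.lower() if c in ALPHABET
    )
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return {c: (counts[c] + smoothing) / total for c in ALPHABET}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same alphabet."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p)


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def bigrams(label):
    """Set of adjacent character pairs in a label."""
    label = label.lower()
    return {label[i:i + 2] for i in range(len(label) - 1)}


def jaccard_index(a, b):
    """Shared bigrams divided by all bigrams seen in either label."""
    sa, sb = bigrams(a), bigrams(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


if __name__ == "__main__":
    good = unigram_dist(["google", "wikipedia", "weather"])
    test = unigram_dist(["joftvvtvmx", "gcvwknnxz", "vddxnvzqjks"])
    print(symmetric_kl(good, test))
    print(jaccard_index("weather", "leather"))
```

In practice the reference distribution would be built from a large set of known-good domains, not the three toy names used here.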