Identifying a Domain Generation Algorithm

 



Threat actors often deploy Domain Generation Algorithms (DGAs) to bypass detection mechanisms. A threat actor or malware developer can use a DGA to generate a large number of pseudo-random domain names, which the malware then tries in turn to reach its command and control (C2) servers. Most generated domains never resolve, because the operator registers only a few of them, but DGAs remain a widely used technique among cyber adversaries to evade detection.

The Key Question:
How can we identify DGA-generated domains manually, without relying on external monitoring software?

To do so, we must recognize several suspicious characteristics that indicate a domain may not be legitimate. Let’s explore some common signs:


1. Random Characters

Legitimate domains are typically readable and meaningful (e.g., google.com). In contrast, DGA-generated domains often consist of seemingly random characters, such as:
ww0pm65l0s68o[.]com


2. Digits or Numbers

Legitimate domains rarely include many digits. DGA domains often contain several, sometimes more than 3 or 4 in a single name, either scattered between letters or in a row:
Example: ww0pm65l0s68o[.]com
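
To make the digit check repeatable, the counting can be scripted. The following is a minimal sketch, not a definitive detector: the function name `digit_stats` is hypothetical, and it assumes the part before the first dot is the name worth inspecting.

```python
import re
from typing import Tuple

def digit_stats(domain: str) -> Tuple[int, int]:
    """Return (total digits, longest run of consecutive digits) in the name.

    Assumption: everything before the first dot is the part to inspect.
    """
    name = domain.lower().replace("[.]", ".").split(".")[0]
    runs = re.findall(r"\d+", name)          # every run of consecutive digits
    total = sum(len(run) for run in runs)
    longest = max((len(run) for run in runs), default=0)
    return total, longest

print(digit_stats("ww0pm65l0s68o[.]com"))
print(digit_stats("google.com"))
```

Reporting both numbers is useful because a name can contain many digits overall without any long consecutive run.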


3. Pronounceability

Legitimate domain names are generally easy to pronounce and remember. DGA-generated domains tend to be hard or impossible to pronounce, which is a strong signal of suspicious activity.
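
Pronounceability is subjective, but two rough proxies can support the judgment: the share of vowels among the letters, and the longest run of consecutive consonants. The sketch below is an illustration under those assumptions; the helper names are hypothetical and any cut-off values are left to the analyst.

```python
VOWELS = set("aeiou")

def vowel_ratio(name: str) -> float:
    """Fraction of the letters in `name` that are vowels (digits are ignored)."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in VOWELS for ch in letters) / len(letters)

def longest_consonant_run(name: str) -> int:
    """Length of the longest run of consecutive consonant letters."""
    longest = current = 0
    for ch in name.lower():
        if ch.isalpha() and ch not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # vowels, digits, and punctuation break the run
    return longest

for name in ("google", "ww0pm65l0s68o"):
    print(name, round(vowel_ratio(name), 2), longest_consonant_run(name))
```

A very low vowel ratio or a long consonant run suggests a name that is hard to say aloud, though abbreviations and non-English words can trigger the same signals.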


4. Top-Level Domains (TLDs)

While clean domains commonly use TLDs like .com, .org, .net, and sometimes .io, DGA domains often use unusual or less common TLDs such as:
.gg, .xyz, .biz, .top
Note: A common TLD does not guarantee safety, but uncommon ones used in random-looking domains can be a red flag.
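
The TLD check can be expressed as a simple lookup. This is a sketch only: the two sets mirror the examples listed above and are not an authoritative reputation list, and the names `tld_of` and `tld_flag` are hypothetical.

```python
# Illustrative sets taken from the examples in this section, not a real blocklist.
COMMON_TLDS = {"com", "org", "net", "io"}
WATCHLIST_TLDS = {"gg", "xyz", "biz", "top"}

def tld_of(domain: str) -> str:
    """Return the last label of a (possibly defanged) domain name."""
    cleaned = domain.lower().replace("[.]", ".").rstrip(".")
    return cleaned.rsplit(".", 1)[-1]

def tld_flag(domain: str) -> str:
    """Classify the TLD as 'watchlist', 'common', or 'other'."""
    tld = tld_of(domain)
    if tld in WATCHLIST_TLDS:
        return "watchlist"
    if tld in COMMON_TLDS:
        return "common"
    return "other"

print(tld_flag("ww0pm65l0s68o[.]com"))
print(tld_flag("example.xyz"))
```

As the note above says, the result is only meaningful in combination with the other signs.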


5. Shannon Entropy

Shannon entropy measures how unpredictable the characters in a string are, which makes it a useful metric for spotting DGA domains. As a rough rule of thumb:

  • Entropy < 2.5: Likely normal

  • Entropy > 3.5: Potentially suspicious

For example, google.com has comparatively low entropy, while ww0pm65l0s68o[.]com scores noticeably higher, indicating a more random (and potentially DGA-generated) domain.
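
For readers who want to compute the score themselves, character-level Shannon entropy is H = -Σ p(c) · log2 p(c), where p(c) is the relative frequency of character c in the string. The sketch below implements that formula; the function name `shannon_entropy` is an illustrative choice.

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

for name in ("google", "ww0pm65l0s68o"):
    print(name, round(shannon_entropy(name), 2))
```

The score depends on exactly which string is measured, so decide up front whether to include the TLD and apply the same choice to every domain being compared. Short strings also cap the achievable entropy, which is why thresholds are best treated as approximate.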

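6. Regular Expressions

A regular expression is a pattern that the regular expression engine attempts to match in input text. The two patterns below flag, respectively, names made up of 20 or more lowercase letters and digits, and names made up of six or more consecutive consonants:

  • \b[a-z0-9]{20,}\.(com|net|org|info|biz|top)\b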
  • \b[b-df-hj-np-tv-z]{6,}\.(com|net|org|info)\b
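
The two patterns can be applied to a list of domains with Python's `re` module. This is a minimal sketch: the TLD group is written without an empty alternative so that a TLD is always required, input is lowercased first because the character classes only cover lowercase letters, and the function name `matches_dga_pattern` is hypothetical.

```python
import re

# Long alphanumeric label (20+ characters) followed by one of the listed TLDs.
LONG_LABEL = re.compile(r"\b[a-z0-9]{20,}\.(com|net|org|info|biz|top)\b")
# Six or more consecutive consonants directly before the TLD.
CONSONANT_RUN = re.compile(r"\b[b-df-hj-np-tv-z]{6,}\.(com|net|org|info)\b")

def matches_dga_pattern(domain: str) -> bool:
    """Return True if either pattern matches the (refanged, lowercased) domain."""
    cleaned = domain.lower().replace("[.]", ".")
    return bool(LONG_LABEL.search(cleaned) or CONSONANT_RUN.search(cleaned))

candidates = ["google.com", "xkqzvbnm.net", "ww0pm65l0s68o[.]com"]
print([d for d in candidates if matches_dga_pattern(d)])
```

Both patterns are deliberately narrow. A shorter random name such as ww0pm65l0s68o[.]com has fewer than 20 characters and is not consonant-only, so neither pattern catches it; regular expressions work best as a first filter alongside the other signs.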

7. WHOIS Lookup Results

A WHOIS lookup can also help identify DGA-generated domains. One of the most suspicious signs to look for is a very recent registration date, for example a domain registered within the last week or month.
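
Once the creation date has been read from a WHOIS record, the age check is simple date arithmetic. The sketch below assumes the date is available as an ISO 8601 string; real WHOIS output varies by registry and may need reformatting first. The function names, the 30-day default, and the sample dates are all illustrative.

```python
from datetime import datetime, timezone
from typing import Optional

def domain_age_days(creation_date: str, now: Optional[datetime] = None) -> int:
    """Whole days between an ISO 8601 creation date and `now` (UTC by default)."""
    created = datetime.fromisoformat(creation_date.replace("Z", "+00:00"))
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)  # assume UTC if unspecified
    now = now or datetime.now(timezone.utc)
    return (now - created).days

def is_recently_registered(creation_date: str, max_age_days: int = 30,
                           now: Optional[datetime] = None) -> bool:
    """True if the domain is at most `max_age_days` old (threshold is an assumption)."""
    return domain_age_days(creation_date, now) <= max_age_days

# Hypothetical dates, with a fixed "now" so the result is reproducible.
reference = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(domain_age_days("2024-01-01T00:00:00Z", now=reference))
```

Passing `now` explicitly keeps the check reproducible when reviewing older findings.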


Practical Scenario

As a threat hunter working on a proactive threat intelligence team, you've been tasked with identifying potential Domain Generation Algorithm (DGA) domains being used by an active malware campaign.

Your manager asks you to:

  1. Go to https://domains-monitor.com/ and download a list of newly registered domains from the past 24 hours.

  2. Manually analyze the list and identify which domains may have been generated by a DGA.

  3. Document patterns, anomalies, and any characteristics that raise suspicion.

Your goal is not to rely on automated tools or DGA classifiers, but to use your intuition, experience, and basic investigative techniques to triage and flag domains that stand out due to randomness, structure, or suspicious trends.

Solution

I have downloaded the newly registered domains from the website that my manager requested, as shown in the screenshot below:



Next, I use the regular expressions provided earlier to search the list for names that match our criteria, in particular the long runs of random-looking characters commonly found in DGA-generated domains. The results are shown below:


Next, I apply some of the signs described earlier to the matched results, such as the presence of more than 3 or 4 digits and random-looking characters. The following domain matches several indicators commonly associated with DGA-generated domains: 0538c3060b94774da3ee011552255739.top

The suspicious signs present in this domain include:

  • More than 4 digits

  • Random characters

  • Uncommon TLD (.top)
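
The same checklist can be applied to every domain in the list with a small helper. This is a sketch under stated assumptions: the function name `indicators` is hypothetical, "random characters" is approximated by a label of 20 or more letters and digits, and the TLD set repeats the examples from earlier in this article.

```python
import re
from typing import List

WATCHLIST_TLDS = {"gg", "xyz", "biz", "top"}  # examples from this article only

def indicators(domain: str) -> List[str]:
    """Return the checklist items that a domain triggers."""
    name, _, tld = domain.lower().replace("[.]", ".").rpartition(".")
    found = []
    if sum(ch.isdigit() for ch in name) > 4:
        found.append("more than 4 digits")
    # Rough proxy for "random characters": a long, purely alphanumeric label.
    if re.fullmatch(r"[a-z0-9]{20,}", name):
        found.append("random characters")
    if tld in WATCHLIST_TLDS:
        found.append("uncommon TLD")
    return found

print(indicators("0538c3060b94774da3ee011552255739.top"))
print(indicators("google.com"))
```

Returning the triggered items, rather than a single yes/no verdict, keeps the reasoning visible when documenting why a domain was flagged.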

PoC:


As we can see, suspicious domains are difficult to pronounce. They cannot be read or spoken as words and have to be spelled out character by character.

In addition, we can see that the domain contains a long string of random characters, such as c3060b94774da3ee011, which is a strong indicator that it may be a DGA-generated domain.

Once you've used regular expressions or manual inspection to filter potentially suspicious domains from your list, a good next step is to check their Shannon entropy, a measure of randomness that can help flag algorithmically generated domain names.

You don’t need any additional tools for this. Just follow these steps:

  1. Go to CyberChef.org.

  2. In the left panel, search for "Entropy" and drag the "Entropy" operation into your recipe.

  3. In the right-side input box, paste one of the domains you want to analyze.

  4. Observe the Shannon entropy score.

A higher entropy value usually means the domain is more random and potentially the result of a DGA. Compare it against the scores of known legitimate domains to get a baseline.

To validate the findings, we can submit the matched domains to VirusTotal and examine the results for any threat indicators:


An additional step we can take is checking the WHOIS record to see the domain's registration date, which can provide further indication of whether it might be DGA-generated. However, I recommend applying the previously mentioned signs first before checking the WHOIS data, since a recent registration date alone is not a definitive indicator. For example, just because a domain was registered today doesn’t necessarily mean it was generated by a DGA. To strengthen the analysis, I performed a WHOIS lookup on the domain that matched our regular expression pattern:


