Repeated DNA Sequences - Practice Coding Problems

Repeated DNA Sequences - Problem

The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'. For example, "ACGAATTCCG" is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA. Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

You may return the answer in any order.

Input & Output

Example 1 — Basic Repeated Sequences

$ Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

› Output: ["AAAAACCCCC","CCCCCAAAAA"]

💡 Note: AAAAACCCCC appears at positions 0 and 10. CCCCCAAAAA appears at positions 5 and 16 (the second run of C's is six letters long, so the repeat starts one position later). Both sequences are 10 characters long and occur more than once.

Example 2 — Overlapping Repeats

$ Input: s = "AAAAAAAAAAA"

› Output: ["AAAAAAAAAA"]

💡 Note: The 11-character string contains the 10-letter sequence AAAAAAAAAA twice, at positions 0 and 1. Occurrences may overlap.

Example 3 — Short String

$ Input: s = "ACGT"

› Output: []

💡 Note: The string is too short (4 characters) to contain any 10-letter sequence, so the result is an empty array.

Constraints

1 ≤ s.length ≤ 10⁵
s[i] is either 'A', 'C', 'G', or 'T'

Visualization

Understanding the Visualization

Input DNA

Long DNA string with nucleotides A, C, G, T

Extract Sequences

Get all 10-character substrings using sliding window

Find Repeats

Return sequences that occur more than once

Key Takeaway

🎯 Key Insight: Use sliding window with hash map to efficiently track 10-character substring occurrences in one pass

Asked in

LinkedIn (8), Amazon (6), Microsoft (4)

The key insight is to use a sliding window to extract all 10-character substrings and track their occurrences with a hash map. The optimal approach uses bit manipulation for O(1) rolling hash updates. Best time complexity: O(n), space: O(n).

Common Approaches

Approach	Time	Space	Notes
✓ Bit Manipulation with Rolling Hash	O(n)	O(n)	Use bit manipulation to encode DNA sequences as integers for faster comparison
Brute Force - Compare All Substrings	O(n²)	O(1)	Extract all 10-letter substrings and compare each with every other
Sliding Window with Hash Map	O(n)	O(n)	Use sliding window to extract substrings and hash map to count occurrences
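
For comparison with the linear-time rows, the brute-force row of the table can be sketched as follows. This is only an illustration: the names `bruteForceRepeats`, `sameWindow`, and `SEQ_LEN` are made up for this sketch, and the "no earlier duplicate" check is one possible way to report each repeated sequence once without extra bookkeeping.

```c
#include <stdlib.h>
#include <string.h>

#define SEQ_LEN 10  /* hypothetical name for the fixed window length */

/* Returns 1 if the windows starting at a and b hold the same 10 letters. */
static int sameWindow(const char *s, size_t a, size_t b) {
    return memcmp(s + a, s + b, SEQ_LEN) == 0;
}

/*
 * Brute force: report window i when it has a later duplicate and no earlier
 * one, so every repeated sequence is emitted exactly once.
 * Returns a malloc'd array of malloc'd strings (caller frees), or NULL.
 */
char **bruteForceRepeats(const char *s, int *count) {
    *count = 0;
    size_t n = strlen(s);
    if (n <= SEQ_LEN) return NULL;            /* fewer than two windows */

    size_t windows = n - SEQ_LEN + 1;
    /* Each reported sequence uses at least two windows. */
    char **out = malloc((windows / 2) * sizeof *out);
    if (!out) return NULL;

    for (size_t i = 0; i < windows; i++) {
        int seenEarlier = 0, seenLater = 0;
        for (size_t j = 0; j < i && !seenEarlier; j++)
            seenEarlier = sameWindow(s, i, j);
        for (size_t j = i + 1; j < windows && !seenEarlier && !seenLater; j++)
            seenLater = sameWindow(s, i, j);
        if (seenLater) {                      /* first occurrence of a repeat */
            char *copy = malloc(SEQ_LEN + 1);
            if (!copy) break;
            memcpy(copy, s + i, SEQ_LEN);
            copy[SEQ_LEN] = '\0';
            out[(*count)++] = copy;
        }
    }
    return out;
}
```

Every window is compared against every other window, which is where the O(n²) time comes from. Apart from the returned strings, the function needs no storage that grows with the input.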

Bit Manipulation with Rolling Hash — Algorithm Steps

Encode DNA characters to 2-bit values
Build initial 20-bit hash for first 10 characters
Use rolling hash to update hash value as window slides
Track hash occurrences and decode repeated sequences

Visualization

Step-by-Step Walkthrough

Encode DNA

A=00, C=01, G=10, T=11 (2 bits each)
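
As a small illustration of this encoding, the sketch below packs a short sequence into an integer, two bits per letter, with the leftmost letter in the highest bits. The helper names `nucleotideCode` and `encodeSequence` are hypothetical and chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* 2-bit code for one nucleotide: A=00, C=01, G=10, T=11. */
static uint32_t nucleotideCode(char c) {
    switch (c) {
        case 'C': return 1u;
        case 'G': return 2u;
        case 'T': return 3u;
        default:  return 0u;   /* 'A'; other input is not expected */
    }
}

/* Pack the first len letters of s (len <= 16) into a 32-bit integer. */
uint32_t encodeSequence(const char *s, size_t len) {
    uint32_t code = 0;
    for (size_t i = 0; i < len; i++)
        code = (code << 2) | nucleotideCode(s[i]);
    return code;
}
```

With this layout, `encodeSequence("ACGT", 4)` is binary 00011011 = 27, and `encodeSequence("AAAAACCCCC", 10)` is binary 0101010101 = 341, since the five leading A's contribute only zero bits.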

Rolling Hash

Slide window updating 20-bit hash value
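
The update for one slide can be sketched as a single expression: shift the hash left by two bits, mask it back to 20 bits so the oldest letter falls off, and OR in the code of the new letter. The names `rollHash`, `WINDOW_BITS`, and `WINDOW_MASK` are assumptions made for this sketch.

```c
#include <stdint.h>

#define WINDOW_BITS 20u                                   /* 10 letters x 2 bits */
#define WINDOW_MASK (((uint32_t)1 << WINDOW_BITS) - 1u)

/* Slide the window one letter to the right.
   newCode is the 2-bit code of the incoming letter (A=0, C=1, G=2, T=3). */
uint32_t rollHash(uint32_t hash, uint32_t newCode) {
    return ((hash << 2) & WINDOW_MASK) | (newCode & 3u);
}
```

For example, the window "AAAAACCCCC" has the value 341. Appending 'A' gives `rollHash(341, 0)` = 1364, which is the value obtained by encoding "AAAACCCCCA" from scratch. The update costs the same no matter how long the string is.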

Track & Decode

Count hash occurrences, decode repeated ones
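
Decoding reverses the packing: read two bits at a time from the low end and fill the output from right to left. The sketch below assumes the A=00, C=01, G=10, T=11 mapping described above; `decodeSequence` is a hypothetical helper name.

```c
#include <stdint.h>

/* Write the 10 letters encoded in hash into out.
   out needs room for 11 bytes, including the terminator. */
void decodeSequence(uint32_t hash, char out[11]) {
    static const char letters[] = "ACGT";
    for (int k = 9; k >= 0; k--) {
        out[k] = letters[hash & 3u];   /* lowest 2 bits are the last letter */
        hash >>= 2;
    }
    out[10] = '\0';
}
```

Decoding 341 yields "AAAAACCCCC" and decoding 349184 yields "CCCCCAAAAA", so only the integer keys need to be stored while scanning.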

Code -

solution.c — C

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WINDOW 10                       /* sequence length in nucleotides */
#define TABLE_SIZE (1 << (2 * WINDOW))  /* one slot per possible 20-bit code */

/* Map a nucleotide to its 2-bit code: A=00, C=01, G=10, T=11. */
static int charToNum(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    return 0;
}

/* Inverse of charToNum. */
static char numToChar(int n) {
    return "ACGT"[n & 3];
}

/* Returns a malloc'd array of malloc'd strings; the caller frees both. */
char** solution(const char* s, int* returnSize) {
    *returnSize = 0;
    size_t n = strlen(s);
    if (n <= WINDOW) {
        return NULL;  /* fewer than two windows, so nothing can repeat */
    }

    /* counts[code] is how often a window was seen, capped at 2.
       Direct indexing keeps each lookup O(1). */
    unsigned char* counts = calloc(TABLE_SIZE, 1);
    /* Each repeated sequence uses at least two of the n - 9 windows. */
    char** result = malloc(((n - WINDOW + 1) / 2) * sizeof(char*));
    if (!counts || !result) {
        free(counts);
        free(result);
        return NULL;
    }

    int mask = TABLE_SIZE - 1;
    int hashVal = 0;

    for (size_t i = 0; i < n; i++) {
        /* Rolling update: drop the oldest 2 bits, append the new letter. */
        hashVal = ((hashVal << 2) & mask) | charToNum(s[i]);
        if (i + 1 < WINDOW) continue;  /* window not full yet */

        /* Report a sequence the moment its second occurrence is seen. */
        if (counts[hashVal] < 2 && ++counts[hashVal] == 2) {
            char* decoded = malloc(WINDOW + 1);
            if (!decoded) break;
            int temp = hashVal;
            for (int k = WINDOW - 1; k >= 0; k--) {
                decoded[k] = numToChar(temp & 3);
                temp >>= 2;
            }
            decoded[WINDOW] = '\0';
            result[(*returnSize)++] = decoded;
        }
    }

    free(counts);
    return result;
}

int main(void) {
    /* Room for 10^5 letters, the newline, and the terminator. */
    static char s[100002];
    if (!fgets(s, sizeof(s), stdin)) return 1;
    s[strcspn(s, "\r\n")] = '\0';

    int returnSize;
    char** result = solution(s, &returnSize);

    printf("[");
    for (int i = 0; i < returnSize; i++) {
        printf("\"%s\"", result[i]);
        if (i < returnSize - 1) printf(",");
        free(result[i]);
    }
    printf("]\n");
    free(result);

    return 0;
}

Time & Space Complexity

Time Complexity

⏱️

O(n)

Single pass with O(1) rolling hash updates per position

✓ Linear Growth

Space Complexity

O(n)

Tracks up to n-9 distinct 20-bit keys (never more than 4¹⁰ = 1,048,576 possible sequences) plus the result strings

⚡ Linear Space
