DNA Pattern Recognition - Problem

Biologists are studying basic patterns in DNA sequences stored in a database. Given a table Samples containing DNA sequences, you need to identify which samples contain specific genetic patterns.

Pattern Requirements:

  • Start Codon: Sequences that start with ATG (a common start codon)
  • Stop Codons: Sequences that end with either TAA, TAG, or TGA (stop codons)
  • ATAT Motif: Sequences containing the motif ATAT (a simple repeated pattern)
  • Triple G: Sequences that contain at least three consecutive G characters (like GGG or GGGG)

Return a result table showing each sample with boolean flags (1/0) indicating which patterns are present, ordered by sample_id in ascending order.
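
The four pattern rules map onto plain string checks. As a minimal sketch, written in Python rather than SQL and using a hypothetical helper name `pattern_flags` that is not part of the problem statement, the flags for a single sequence could be computed like this:

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def pattern_flags(dna_sequence: str) -> dict:
    """Return the four 1/0 flags for one DNA sequence (illustrative helper)."""
    return {
        # Start codon: the sequence begins with ATG.
        "has_start": int(dna_sequence.startswith("ATG")),
        # Stop codon: the sequence ends with TAA, TAG, or TGA.
        "has_stop": int(dna_sequence.endswith(STOP_CODONS)),
        # ATAT motif: the substring may appear anywhere in the sequence.
        "has_atat": int("ATAT" in dna_sequence),
        # Triple G: any run of three or more G characters contains "GGG".
        "has_ggg": int("GGG" in dna_sequence),
    }


print(pattern_flags("ATGGGGTCATCATAA"))
# {'has_start': 1, 'has_stop': 1, 'has_atat': 0, 'has_ggg': 1}
```

Checking for the substring GGG is enough for the "at least three consecutive G" rule, because any longer run of G characters also contains GGG.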

Table Schema

Samples
Column Name      Type      Description
sample_id (PK)   int       Unique identifier for each DNA sample
dna_sequence     varchar   DNA sequence represented as a string of A, T, G, C characters
species          varchar   Species from which the DNA sample was collected
Primary Key: sample_id
Note: Each row contains a DNA sequence and its corresponding species information

Input & Output

Example 1 — Multiple Pattern Detection
Input Table:
sample_id dna_sequence species
1 ATGCTAGCTAGCTAA Human
2 GGGTCAATCATC Human
3 ATATATCGTAGCTA Human
4 ATGGGGTCATCATAA Mouse
Output:
sample_id dna_sequence species has_start has_stop has_atat has_ggg
1 ATGCTAGCTAGCTAA Human 1 1 0 0
2 GGGTCAATCATC Human 0 0 0 1
3 ATATATCGTAGCTA Human 0 0 1 0
4 ATGGGGTCATCATAA Mouse 1 1 0 1
💡 Note:

Sample 1: Starts with ATG (has_start=1), ends with TAA (has_stop=1), no ATAT motif, no triple G.

Sample 2: Starts with GGG (has_ggg=1), but no other patterns match.

Sample 3: Contains ATAT at the beginning (has_atat=1), but no other patterns.

Sample 4: Has all patterns except ATAT: starts with ATG, contains GGGG, ends with TAA.

Example 2 — Partial and No Pattern Matches
Input Table:
sample_id dna_sequence species
5 TCAGTCAGTCAG Mouse
6 ATATCGCGCTAG Zebrafish
7 CGTATGCGTCGTA Zebrafish
Output:
sample_id dna_sequence species has_start has_stop has_atat has_ggg
5 TCAGTCAGTCAG Mouse 0 0 0 0
6 ATATCGCGCTAG Zebrafish 0 1 1 0
7 CGTATGCGTCGTA Zebrafish 0 0 0 0
💡 Note:

Sample 5: No patterns match - all flags are 0.

Sample 6: Starts with ATAT (has_atat=1) and ends with TAG (has_stop=1).

Sample 7: Contains ATG, but not at the start, so has_start=0. No other pattern matches, so all flags are 0.

Constraints

  • 1 ≤ sample_id ≤ 1000
  • dna_sequence contains only characters 'A', 'T', 'G', 'C'
  • 1 ≤ dna_sequence.length ≤ 1000
  • species is a non-empty string

Visualization

Figure: DNA Pattern Recognition Overview. Sequences from the Samples table (input) pass through pattern detection (ATG start codon, TAA/TAG/TGA stop codons, ATAT motif, GGG+ pattern) to produce one list of flags per sample (output), for example 1: [1,1,0,0] and 5: [0,0,0,0].

SQL Pattern Matching Functions

  • LEFT(dna_sequence, 3) = 'ATG' → Start codon detection
  • RIGHT(dna_sequence, 3) IN ('TAA','TAG','TGA') → Stop codon detection
  • dna_sequence LIKE '%ATAT%' → ATAT motif detection
  • dna_sequence LIKE '%GGG%' → Triple G detection
  • CASE WHEN condition THEN 1 ELSE 0 END → Boolean flags
  • ORDER BY sample_id → Ascending order

Understanding the Visualization

  1. Input: DNA sequences in the Samples table
  2. Pattern Detection: Apply string matching functions
  3. Output: Boolean flags for each pattern type

Key Takeaway
🎯 Key Insight: SQL string functions (LEFT, RIGHT, LIKE) can detect all four genetic patterns in a single query, with one CASE expression per flag.
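
One way to try the single-query approach end to end is to run it against an in-memory SQLite database from Python. The sketch below is an illustration, not a reference solution: the function name `run_example` is hypothetical, and because SQLite does not provide LEFT and RIGHT, `substr` stands in for them. On engines such as MySQL, the LEFT/RIGHT forms should work in their place.

```python
import sqlite3

# One CASE expression per flag; substr(x, 1, 3) plays the role of LEFT(x, 3)
# and substr(x, -3) plays the role of RIGHT(x, 3) in SQLite.
QUERY = """
SELECT
    sample_id,
    dna_sequence,
    species,
    CASE WHEN substr(dna_sequence, 1, 3) = 'ATG' THEN 1 ELSE 0 END AS has_start,
    CASE WHEN substr(dna_sequence, -3) IN ('TAA', 'TAG', 'TGA') THEN 1 ELSE 0 END AS has_stop,
    CASE WHEN dna_sequence LIKE '%ATAT%' THEN 1 ELSE 0 END AS has_atat,
    CASE WHEN dna_sequence LIKE '%GGG%' THEN 1 ELSE 0 END AS has_ggg
FROM Samples
ORDER BY sample_id
"""

# Rows taken from Example 1 of the problem statement.
EXAMPLE_ROWS = [
    (1, "ATGCTAGCTAGCTAA", "Human"),
    (2, "GGGTCAATCATC", "Human"),
    (3, "ATATATCGTAGCTA", "Human"),
    (4, "ATGGGGTCATCATAA", "Mouse"),
]


def run_example():
    """Load Example 1 into an in-memory table and return the flagged rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Samples ("
        "sample_id INTEGER PRIMARY KEY, dna_sequence TEXT, species TEXT)"
    )
    conn.executemany("INSERT INTO Samples VALUES (?, ?, ?)", EXAMPLE_ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in run_example():
    print(row)
```

SQLite's LIKE is case-insensitive for ASCII letters by default. That does not change the result here, because the constraints limit sequences to the uppercase characters A, T, G, and C.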