Count Occurrences in Text - Problem

You are given a table Files containing file names and their text content.

Table Structure:

  • file_name (varchar): Unique file identifier
  • content (text): The text content of the file

Write a SQL query to find the number of files that contain the words 'bull' and 'bear' as standalone words. The words must be surrounded by spaces or be at the beginning/end of the content.

Important: Words like 'bullet', 'bears', 'bull.', or 'bear,' should NOT be counted as they are not standalone occurrences.

Return the result showing each word ('bull' and 'bear') along with the count of files containing that word.

Table Schema

Files
Column Name Type Description
file_name PK varchar Unique identifier for each file
content text Text content of the file
Primary Key: file_name
Note: Each row represents one file with its complete text content

Input & Output

Example 1 — Mixed Word Matches
Input Table:
file_name content
doc1.txt The bull market is trending upward
doc2.txt A bear was spotted in the woods
doc3.txt The bullet train is very fast
doc4.txt Bull and bear are both animals
Output:
word n_files
bear 2
bull 2
💡 Note:

Analysis: Files doc1.txt and doc4.txt contain standalone 'bull'. Files doc2.txt and doc4.txt contain standalone 'bear'. File doc3.txt contains 'bullet' which is not counted as it's not a standalone 'bull'.

Example 2 — Edge Cases with Punctuation
Input Table:
file_name content
file1.txt bull.
file2.txt bear!
file3.txt bears are dangerous
file4.txt no matches here
Output:
word n_files
bear 1
bull 1
💡 Note:

Word Boundaries: 'bull.' and 'bear!' are counted as standalone words since punctuation marks serve as word boundaries. 'bears' is not counted as it contains additional letters.

Constraints

  • 1 ≤ number of files ≤ 1000
  • file_name contains unique values
  • content can be empty or contain up to 10,000 characters
  • Words are case-sensitive

Visualization

Tap to expand
Count Occurrences in Text - Word Boundary AnalysisFiles Tablefile_namecontentdoc1.txtbull market risingdoc2.txtbear in woodsdoc3.txtbullet traindoc4.txtbull and bearWord BoundaryPattern MatchResultwordn_filesbear2bull2Match Analysis:✓ bull (docs 1,4)✓ bear (docs 2,4)✗ bullet (partial)
Understanding the Visualization
1
Input Files
Files table with text content
2
Pattern Match
Apply word boundary regex patterns
3
Count Results
Count files per word
Key Takeaway
🎯 Key Insight: Use word boundary regex patterns to distinguish complete words from partial matches in text search operations
Asked in
Amazon 23 Microsoft 18 Google 15
28.5K Views
Medium Frequency
~12 min Avg. Time
892 Likes
Ln 1, Col 1
Smart Actions
💡 Explanation
AI Ready
💡 Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen