Top Percentile Fraud - Practice Coding Problems

Table Schema

Fraud

Column Name	Type	Description
`policy_id` PK	int	Unique policy identifier
`state`	varchar	State where policy is issued
`fraud_score`	int	ML model fraud likelihood score

Primary Key: policy_id

Note: Each row represents a policy with its fraud risk assessment

Input & Output

Example 1 — Multiple States with Top 5%

Input Table:

policy_id	state	fraud_score
1	CA	95
2	CA	88
3	CA	75
4	NY	92
5	NY	89
6	NY	82
7	TX	98

Output:

policy_id	state	fraud_score
1	CA	95
4	NY	92
7	TX	98

💡 Note:

CA (3 policies): the percentile ranks are 0.0, 0.5 and 1.0, so only policy 1 (score 95) falls at or below 0.05. NY (3 policies): likewise, only policy 4 (score 92) qualifies. TX (1 policy): a single-row partition has percentile rank 0, so policy 7 is always included. Results are ordered by state ASC, fraud_score DESC, policy_id ASC.

Example 2 — Tied Fraud Scores

Input Table:

policy_id	state	fraud_score
1	FL	90
2	FL	90
3	FL	85
4	FL	80

Output:

policy_id	state	fraud_score
1	FL	90
2	FL	90

💡 Note:

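Policies 1 and 2 both have a fraud_score of 90, so they tie for first place and both get percentile rank 0.00. Together they make up 2 of FL's 4 policies (50%), but the filter is applied to the percentile rank, not to the share of rows: 0.00 ≤ 0.05, so both are included. Policies 3 and 4 have percentile ranks of about 0.67 and 1.00 and are excluded.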
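To make the tie behaviour in Example 2 concrete, here is a minimal Python sketch of the standard SQL definition, PERCENT_RANK = (rank - 1) / (rows - 1), where tied rows share the lowest rank. The function name `percent_rank_desc` is hypothetical and only illustrates one partition; it is not part of the original solution.

```python
from typing import List


def percent_rank_desc(scores: List[int]) -> List[float]:
    """PERCENT_RANK for one partition, ordered by score descending.

    Uses (rank - 1) / (rows - 1). Tied scores share the lowest rank,
    and a single-row partition gets 0.0.
    """
    n = len(scores)
    if n == 1:
        return [0.0]
    result = []
    for s in scores:
        # Rank is 1 plus the number of strictly higher scores.
        rank = 1 + sum(1 for other in scores if other > s)
        result.append((rank - 1) / (n - 1))
    return result


# FL scores from Example 2
fl_scores = [90, 90, 85, 80]
print([round(r, 2) for r in percent_rank_desc(fl_scores)])
# [0.0, 0.0, 0.67, 1.0]
```

Only the two tied rows are at or below 0.05, which matches the expected output for Example 2.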

Constraints

1 ≤ policy_id ≤ 100000
1 ≤ fraud_score ≤ 100
state consists of valid US state abbreviations
Each state has at least 1 policy

Visualization

Understanding the Visualization

Input

Fraud table with policy_id, state, fraud_score

Percentile Rank

Calculate PERCENT_RANK per state

Filter

Keep top 5% (≤ 0.05) per state

Key Takeaway

🎯 Key Insight: Use PERCENT_RANK() to identify relative position within each group partition

Asked in

Amazon 15, Microsoft 12, Google 8

Use PERCENT_RANK() window function with PARTITION BY state ORDER BY fraud_score DESC to calculate percentiles within each state, then filter for percentile_rank ≤ 0.05.

Common Approaches

Approach	Time	Space	Notes
✓ Window Function with PERCENT_RANK	O(n log n)	O(n)	Use PERCENT_RANK() window function to calculate percentile within each state

Window Function with PERCENT_RANK — Algorithm Steps

Step 1: Partition data by state and rank by fraud_score descending
Step 2: Filter for records with PERCENT_RANK <= 0.05 (top 5%)
Step 3: Order results by state, fraud_score desc, policy_id asc
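The three steps above can be sketched as a single query. The page does not name a SQL dialect, so this example assumes SQLite (3.25 or later, for window-function support) driven through Python's standard `sqlite3` module. The column alias `pct_rank` and the helper `top_fraud` are hypothetical names chosen for illustration.

```python
import sqlite3

# Step 1: PERCENT_RANK per state; Step 2: filter; Step 3: order.
QUERY = """
WITH ranked AS (
    SELECT policy_id,
           state,
           fraud_score,
           PERCENT_RANK() OVER (
               PARTITION BY state
               ORDER BY fraud_score DESC
           ) AS pct_rank
    FROM Fraud
)
SELECT policy_id, state, fraud_score
FROM ranked
WHERE pct_rank <= 0.05
ORDER BY state ASC, fraud_score DESC, policy_id ASC;
"""


def top_fraud(rows):
    """Load rows into an in-memory Fraud table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Fraud ("
        "policy_id INTEGER PRIMARY KEY, "
        "state VARCHAR(2), "
        "fraud_score INTEGER)"
    )
    conn.executemany("INSERT INTO Fraud VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Input table from Example 1
example_1 = [
    (1, "CA", 95), (2, "CA", 88), (3, "CA", 75),
    (4, "NY", 92), (5, "NY", 89), (6, "NY", 82),
    (7, "TX", 98),
]
print(top_fraud(example_1))
# [(1, 'CA', 95), (4, 'NY', 92), (7, 'TX', 98)]
```

The window function cannot be referenced directly in a WHERE clause, which is why the ranking is computed in a CTE and filtered in the outer query.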

Visualization

Step-by-Step Walkthrough

Partition

Group by state

Rank

Order by fraud_score DESC

Filter

Keep top 5% (≤ 0.05)

Time & Space Complexity

Time Complexity

O(n log n)

Sorting required for ranking within each state partition

⚡ Linearithmic

Space Complexity

O(n)

Window function temporary storage

⚡ Linear Space
