Target Diversity: US vs China

Comparing the spread of drug targets across 126,348 patent-target entries

Overview

Which country's biopharma patent landscape is more diverse? We classified every patent family by its origin country and compared the distribution of drug targets (gene symbols) between China (CN; 62,463 target entries) and the United States (US; 63,885 target entries).

Country classification uses the filing jurisdiction from DOCDB patent family records. For direct national filings, this is the canonical country. For PCT (WO) filings — which account for a large share of US-origin patents — we resolve the receiving office from the WO application number (e.g. US0010384 = filed at USPTO), recovering ~148K US-origin and ~30K CN-origin families that would otherwise be lost.
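The receiving-office lookup can be sketched as follows. This is a minimal illustration, assuming the WO application number starts with a two-letter office code followed by digits, as in the US0010384 example. The function name, the regular expression, and the treatment of IB (International Bureau) and EP (European Patent Office) as not resolvable to a single country are assumptions, not the pipeline's actual rules.

```python
import re
from typing import Optional

# Assumption: two uppercase letters (receiving office) followed only by digits.
_WO_APP_RE = re.compile(r"^([A-Z]{2})\d+$")

# Assumption: offices that do not map to a single origin country.
_NON_NATIONAL_OFFICES = {"IB", "EP"}


def receiving_office(wo_application_number: str) -> Optional[str]:
    """Return the two-letter receiving-office code, or None if unresolved."""
    match = _WO_APP_RE.match(wo_application_number.strip().upper())
    if match is None:
        return None
    code = match.group(1)
    return None if code in _NON_NATIONAL_OFFICES else code


if __name__ == "__main__":
    for number in ["US0010384", "CN2019123456", "IB2015050001", "n/a"]:
        print(number, "->", receiving_office(number))
```

Families that return None would stay unclassified rather than being guessed, which keeps the CN and US counts conservative.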

Key finding: At comparable volume, China's patent landscape is markedly more concentrated: a handful of checkpoint and oncology targets dominate, with the top 10 targets capturing 13.7% of all CN entries vs 9.0% for the US. The US distribution is flatter, with patent activity spread more evenly across targets. Both follow power-law-like distributions, but China's curve is steeper.

The numbers

| Metric | CN | US | Interpretation |
| --- | --- | --- | --- |
| Total target entries | 62,463 | 63,885 | Nearly equal volume once WO origins are resolved |
| Unique targets | 6,700 | 6,213 | CN explores 8% more distinct genes |
| Shannon entropy (bits) | 10.32 | 10.69 | US is more diverse (higher = more spread) |
| Normalized entropy | 0.812 | 0.848 | US distributes effort more evenly relative to its target count |
| Simpson diversity | 0.9965 | 0.9982 | Two random US patents are more likely to target different genes |
| Gini coefficient | 0.773 | 0.748 | CN targets are more unequally distributed |
| HHI | 0.0036 | 0.0018 | CN is 2x more concentrated |
| CR-10 (top 10 share) | 13.7% | 9.0% | CN's top 10 targets capture 50% more of the landscape |
| CR-50 (top 50 share) | 28.6% | 20.9% | Same pattern extends further down the ranking |
| Singleton targets (n=1) | 2,813 | 2,302 | Both have long tails of explored-once genes |
| Mega-targets (n>100) | 91 | 89 | Similar count of heavily-patented targets; CN concentrates more per target |
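For readers who want to reproduce metrics of this kind, the sketch below computes most of the table's rows from a list of per-target entry counts. It uses the textbook definitions (Shannon entropy in bits, HHI as the sum of squared shares, Simpson diversity as 1 minus HHI, CR-k as the share held by the top k targets). The function name and dictionary keys are hypothetical, and the report's own implementation may differ in detail.

```python
import math


def diversity_metrics(counts, top_k=(10, 50)):
    """Summarize concentration and diversity for per-target entry counts."""
    # Keep positive counts only, largest first, so CR-k can slice the head.
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    total = sum(ordered)
    richness = len(ordered)
    shares = [c / total for c in ordered]

    entropy = -sum(p * math.log2(p) for p in shares)
    hhi = sum(p * p for p in shares)

    result = {
        "total_entries": total,
        "unique_targets": richness,
        "shannon_bits": entropy,
        # Assumption: normalization by log2 of the number of unique targets.
        "normalized_entropy": entropy / math.log2(richness) if richness > 1 else 0.0,
        "hhi": hhi,
        "simpson": 1.0 - hhi,
        "singletons": sum(1 for c in ordered if c == 1),
        "mega_targets": sum(1 for c in ordered if c > 100),
    }
    for k in top_k:
        result[f"cr_{k}"] = sum(shares[:k])
    return result


if __name__ == "__main__":
    # Toy counts for four hypothetical targets, not data from the report.
    print(diversity_metrics([4, 2, 1, 1], top_k=(2,)))
```

Running the function once per country on the same kind of count vector gives directly comparable columns, since every metric is computed from shares rather than raw totals.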

Rank-frequency distribution

Both countries follow a power-law (Zipf-like) distribution: a few targets attract enormous patent volume while thousands of genes have only a handful of filings. On a log-log scale, a steeper drop indicates higher concentration. China's curve sits higher at the head (more entries in its top-ranked targets) but drops off faster toward the tail; because total volume is nearly equal, it cannot sit above the US curve at every rank.

Rank-frequency distribution for CN and US targets
Log-log rank vs. frequency plot. CN (red) has higher counts at the top ranks but a steeper slope.
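One rough way to put a number on "steeper" is to fit a straight line to log frequency against log rank. The sketch below does this with ordinary least squares using only the standard library. The function name is hypothetical, and this kind of fit is a descriptive summary rather than a rigorous power-law estimate, so small differences in slope should be read with caution.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log10(frequency) against log10(rank)."""
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive counts to fit a slope")

    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic example: frequency proportional to 1/rank, so the slope is about -1.
    synthetic = [1000.0 / rank for rank in range(1, 51)]
    print(zipf_slope(synthetic))
```

A more negative slope for the CN count vector than for the US one would correspond to the steeper curve described above.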

Head-to-head: top targets

The diverging bar chart below shows the union of each country's top 20 targets. China's landscape is led by the immune checkpoints PDCD1 (PD-1) and CD274 (PD-L1) together with the oncology target EGFR; each of the three exceeds 1,000 CN families. The US top targets are more balanced: TNF, EGFR, and ERBB2 (HER2) lead with counts comparable to one another, and a broader mix of inflammatory, metabolic, and oncology targets fills out the top 20.

Top 20 targets comparison CN vs US
Union of top-20 targets from each country. CN bars extend left, US right.

Target overlap

Of the 8,687 unique targets across both countries, 4,226 (49%) are shared. China has 2,474 targets not found in US filings, while the US has 1,987 exclusive targets. With WO origins resolved, the overlap is much larger than it appeared before resolution: nearly half of all targets are pursued in both countries.

Venn diagram of target overlap between CN and US
2,474 CN-only targets, 4,226 shared, 1,987 US-only.
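The overlap figures are plain set arithmetic on the two collections of gene symbols. A minimal sketch, with a hypothetical function name and toy inputs, followed by a consistency check of the reported counts against the unique-target totals (6,700 for CN and 6,213 for US):

```python
def overlap_summary(cn_targets, us_targets):
    """Count CN-only, shared, and US-only targets from two iterables of symbols."""
    cn, us = set(cn_targets), set(us_targets)
    shared = cn & us
    union = cn | us
    return {
        "cn_only": len(cn - us),
        "shared": len(shared),
        "us_only": len(us - cn),
        "union": len(union),
        "shared_fraction": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Toy example with a few gene symbols, not the real target lists.
    print(overlap_summary({"PDCD1", "CD274", "EGFR", "TNF"}, {"TNF", "EGFR", "ERBB2"}))

    # The reported counts are internally consistent.
    assert 2_474 + 4_226 == 6_700            # CN unique targets
    assert 4_226 + 1_987 == 6_213            # US unique targets
    assert 2_474 + 4_226 + 1_987 == 8_687    # union of both
    assert round(100 * 4_226 / 8_687) == 49  # shared percentage
```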

Concentration curves

The Lorenz curve plots cumulative share of patent entries against cumulative share of targets (sorted from most to least frequent). A curve bowing further from the diagonal indicates higher concentration. China's Gini coefficient of 0.773 vs. the US's 0.748 indicates that a smaller fraction of targets captures a larger share of Chinese patent entries, though the Gini gap (0.025) is modest compared with the gaps in HHI and CR-10.

Lorenz concentration curves for CN and US
Lorenz curves with Gini coefficients. CN (red) is more concentrated than US (blue).
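The construction described above can be sketched in a few lines: sort targets from most to least frequent, accumulate shares, and take twice the area between the curve and the diagonal as the Gini coefficient. The function names are hypothetical, and the report's Gini values may come from a slightly different estimator (for example, one with a small-sample correction).

```python
def lorenz_points(counts):
    """Return (cumulative target share, cumulative entry share) points."""
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    if not ordered:
        raise ValueError("need at least one positive count")
    total = sum(ordered)
    n = len(ordered)

    points = [(0.0, 0.0)]
    running = 0
    for k, count in enumerate(ordered, start=1):
        running += count
        points.append((k / n, running / total))
    return points


def gini_from_lorenz(points):
    """Gini coefficient from a Lorenz curve that lies on or above the diagonal."""
    area = 0.0
    # Trapezoid rule over consecutive points.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    # With a most-to-least-frequent sort the curve bows above the diagonal,
    # so the area between curve and diagonal is (area - 0.5).
    return 2.0 * area - 1.0


if __name__ == "__main__":
    print(gini_from_lorenz(lorenz_points([5, 5, 5, 5])))  # perfectly even counts
    print(gini_from_lorenz(lorenz_points([1, 2, 3, 4])))  # mildly uneven counts
```

The points returned by the first function are also what one would plot to draw the curves themselves.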

Diversity metrics at a glance

Side-by-side comparison of diversity metrics
Six diversity and concentration metrics compared between CN and US filings.
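As a reading aid, the standard definitions behind these metrics are written out below, with a quick check against the reported values. Here n_i is the number of entries for target i, S is the number of unique targets, and p_(i) is the share of the i-th largest target. The report does not state its formulas, so these are assumed textbook forms. The reported numbers are consistent with them: normalized entropy matches H / log2(S), and Simpson diversity matches 1 - HHI up to rounding of the four-decimal HHI.

```latex
\begin{aligned}
p_i &= \frac{n_i}{\sum_{j=1}^{S} n_j}, \qquad
H = -\sum_{i=1}^{S} p_i \log_2 p_i, \qquad
H_{\mathrm{norm}} = \frac{H}{\log_2 S}, \\
\mathrm{HHI} &= \sum_{i=1}^{S} p_i^{2}, \qquad
D_{\mathrm{Simpson}} = 1 - \mathrm{HHI}, \qquad
\mathrm{CR}\text{-}k = \sum_{i=1}^{k} p_{(i)}, \\[4pt]
\text{CN:}\quad & \frac{10.32}{\log_2 6700} \approx \frac{10.32}{12.71} \approx 0.812,
\qquad 1 - 0.0036 = 0.9964 \approx 0.9965, \\
\text{US:}\quad & \frac{10.69}{\log_2 6213} \approx \frac{10.69}{12.60} \approx 0.848,
\qquad 1 - 0.0018 = 0.9982 .
\end{aligned}
```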

Takeaways

At near-equal volume, the concentration gap persists. With WO origins resolved, both countries have ~63K target entries, so the comparison is like-for-like. CN and US now have similar mega-target counts (91 vs 89), but China concentrates far more heavily on its top targets: CN's HHI is twice the US's (0.0036 vs 0.0018).

The US distributes effort more evenly. Every diversity metric (Shannon entropy, Simpson index, normalized entropy) is higher for the US, while every reported concentration metric (Gini, HHI, CR-10, CR-50) is lower. This suggests US patent filings reflect a broader spread of therapeutic bets rather than clustering around consensus targets.

Immuno-oncology drives the CN concentration. The top 3 CN targets (PDCD1, CD274, EGFR) are oncology workhorses, two of them immune checkpoints, and together account for ~8.5% of all CN target entries. The equivalent top 3 in the US (TNF, EGFR, ERBB2) account for ~4.5%. The pattern is consistent with "me-too" development around checkpoint inhibitors in China.