MolWatch is a searchable index of drug-related patent families filed at the USPTO, with structured annotations extracted by gpt-5.4-nano.
Data pipeline
1. Patent XML bulk data downloaded from the EPO DOCDB back-file archive, filtered to biotech IPC classes (A61K, A61P).
2. Patents grouped into families with a canonical representative selected per family.
3. Claims and applicant names enriched from USPTO bulk application XML.
4. Drug annotations (modality, gene target, disease indications) extracted from abstracts using GPT-5.4 nano via the OpenAI Batch API, validated against MeSH disease terms and HGNC gene symbols.