A practitioner's framework for choosing the right RWD provider — covering population coverage, clinical depth, therapeutic fit, governance, and total cost of ownership.

Choosing the Right Real-World Data Provider: Why Most Selection Processes Fail

The real-world data market has never been more crowded. IQVIA, Symphony Health, Komodo Health, Optum Life Sciences, HealthVerity, Flatiron Health, Definitive Healthcare, Datavant, TriNetX, PurpleLab, Truveta, Forian, Quest Diagnostics, LabCorp — the list of providers grows every year, each claiming broad coverage, deep longitudinal data, and the analytics capabilities to answer whatever question you bring.

For a pharma or healthcare organization trying to make a sound data decision, the landscape is genuinely difficult to navigate.

This confusion is not accidental. Most RWD providers market to the broadest possible audience, which obscures the fact that every data asset has structural strengths and structural limitations. A provider that is ideal for oncology outcomes research may be poorly suited for rare disease patient identification. A dataset that gives you comprehensive commercial claims coverage will have significant gaps in Medicaid populations. A platform that excels at longitudinal patient journey mapping may have weak site-of-care detail.

I have spent 20+ years building analytics systems inside pharma and diagnostics organizations — including enterprise platforms deployed across 30+ global markets — and the question of which data to use, when to use it, and how to evaluate a provider has come up in every engagement. This post shares the framework I use when advising pharma and biotech clients on RWD onboarding decisions.

Why Most RWD Selection Processes Fail

Before getting to the framework, it is worth naming the failure modes I see most often.

Selecting on brand rather than fit. IQVIA is the largest RWD provider in the world. That does not mean it is always the right choice. Many organizations default to IQVIA because the name provides internal cover — "no one ever got fired for buying IQVIA" — rather than because it is the best fit for the specific analytical question at hand. Brand familiarity is a reasonable starting point; it is a poor basis for a data decision.

Confusing data volume with data quality. A provider claiming coverage of 300 million patients sounds impressive until you examine what "coverage" actually means. Most large claims databases have dense coverage for commercially insured, working-age adults and thin coverage for Medicaid patients, the uninsured, and elderly populations on Medicare fee-for-service. Rare disease and pediatric populations are frequently undercounted regardless of headline patient counts.

Ignoring the question before evaluating the data. The most important input to any RWD selection decision is a precisely stated analytical question. "We need RWD" is not a question. "We need to size the treated population of adult IgAN patients on stable ACEi/ARB therapy in the United States, segmented by geography and payer type, for a pre-launch market sizing exercise" is a question. The right data follows directly from the question. Without it, you are shopping without a list.

Buying a subscription instead of solving a problem. Annual enterprise RWD subscriptions can run from $200,000 to several million dollars. Organizations often buy broad access and then struggle to generate specific value from it. A targeted, project-scoped data license — a defined cohort extraction for a defined analytical question — frequently produces better output at a fraction of the cost.

The Symetrique Perspective

At Symetrique, we believe RWD selection must start with a precisely stated analytical question — not a vendor shortlist. Every engagement begins by defining what decision the data is meant to inform and the minimum viable evidence standard required. The framework that follows is the one we use with every pharma and healthcare client navigating this market.

Madhav Kathikar — Founder, Symetrique
#RealWorldData #RWD #PharmaAnalytics #DataStrategy #Symetrique

The Five Dimensions of RWD Evaluation

When evaluating any RWD provider, I assess five dimensions. No single provider scores perfectly on all five. The goal is to find the best match for your specific question, patient population, and governance requirements.

1. Population Coverage and Representativeness

The first question is not how many patients are in the database — it is whether your patients are in the database.

Key questions to ask any provider: What is the payer mix of your covered population (commercial, Medicare, Medicaid, uninsured)? What is the geographic distribution, and are certain states or regions underrepresented? What is the time lag between care delivery and data availability? How are specialty drug patients captured? Many commercial claims databases have significant gaps in specialty pharmacy data.

For rare disease, this dimension is often disqualifying. For a condition affecting one in 50,000 people, prevalence alone implies about 6,000 patients among 300 million covered lives, and underdiagnosis, inconsistent coding, and enrollment gaps can leave only a few hundred confirmed cases, which is not enough for meaningful analysis. In these cases, disease-specific registries, EHR networks with deep specialty coverage, or diagnostic lab data are often the only viable options.
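The gap between a headline patient count and a usable rare-disease cohort is simple arithmetic. The sketch below is illustrative only: the 300 million lives and the one-in-50,000 prevalence come from the example above, while the three retention rates are hypothetical placeholders that would need to be replaced with figures from the provider and the published literature.

```python
def expected_cohort(covered_lives, one_in_n, retention_rates=()):
    """Expected patient count after applying each retention step in turn."""
    count = covered_lives / one_in_n
    for rate in retention_rates:
        count *= rate
    return count


# Hypothetical retention at each step (assumptions, not measured values).
RETENTION = (
    0.5,  # share of prevalent patients who are diagnosed and coded
    0.4,  # share visible in this provider's payer and site mix
    0.5,  # share meeting the study's continuous-enrollment rule
)

prevalent = expected_cohort(300_000_000, 50_000)
usable = expected_cohort(300_000_000, 50_000, RETENTION)
print(f"prevalent: {prevalent:.0f}, usable: {usable:.0f}")
```

Under these assumed rates, about 6,000 prevalent patients shrink to about 600 analyzable ones. Asking a provider to supply its own numbers for each step is a quick way to test whether your patients are actually in the database.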

2. Data Type and Clinical Depth

Different questions require different data. The major categories:

Administrative claims capture billing events — diagnoses, procedures, pharmacy fills, and provider encounters. Claims are longitudinal, comprehensive for covered services, and strong for treatment pattern analysis, market share measurement, and adherence tracking. Their limitation is clinical thinness: claims tell you what happened but rarely why, and they do not capture lab values, imaging findings, clinical notes, or disease severity.
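As one concrete instance of adherence tracking from pharmacy fills, the sketch below computes a simplified proportion of days covered (PDC). The fill records are invented for illustration, and the function is an assumption about how one might implement the measure, not any provider's method. It counts each calendar day at most once and does not carry overlapping supply forward, which fuller PDC implementations typically do.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, period_start, period_end):
    """fills: iterable of (fill_date, days_supply) pairs.

    Returns the share of days in the period with drug on hand.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if period_start <= day <= period_end:
                covered.add(day)  # a set counts overlapping days only once
    total_days = (period_end - period_start).days + 1
    return len(covered) / total_days


# Hypothetical fill history for one patient.
fills = [
    (date(2024, 1, 1), 30),
    (date(2024, 2, 5), 30),
    (date(2024, 3, 20), 30),
]
pdc = proportion_of_days_covered(fills, date(2024, 1, 1), date(2024, 3, 31))
print(f"PDC: {pdc:.2f}")
```

Here 72 of the 91 days in the first quarter are covered, a PDC of about 0.79. The calculation needs nothing beyond fill dates and days supply, which is why claims handle this kind of question well.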

Electronic health records (EHR) provide richer clinical detail — lab values, vital signs, physician notes, biomarkers, disease severity measures. This depth is essential for outcomes research and studies requiring granular patient characterization. The limitation is fragmentation: a patient treated across multiple health systems appears as multiple incomplete records unless those records are linked.

Specialty pharmacy data captures dispensing and refill records for specialty drugs — the most accurate source for specialty drug adherence, persistence, and real-world dosing patterns. This data is proprietary to the manufacturer and typically requires a hub or specialty pharmacy aggregation arrangement.

Patient registries are purpose-built for specific diseases, collecting standardized clinical data prospectively. Registry data has the highest clinical quality but the smallest population size. For rare diseases, well-curated registries (such as the Immune Deficiency Foundation registry, the NHF Bleeding Disorders registry, and HAE patient databases) are often the only credible source for outcomes research.

Diagnostic and laboratory data is the most underappreciated data type in pharma RWD strategy — and the one that diagnostic companies like Quest Diagnostics and LabCorp have quietly built into significant data businesses. Lab data contains something neither claims nor EHR reliably provides: actual measured biomarker values. Creatinine trends, HbA1c trajectories, eGFR progression, PSA levels, CBC results, genetic panel findings — this is the clinical signal that drives treatment decisions and defines patient eligibility for many therapies. Lab data is also a powerful patient identification engine: an abnormal result often precedes an ICD-10 diagnosis code by months or years.
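To make the patient identification point concrete, the sketch below flags a patient once two eGFR results below 60, at least 90 days apart, are on record, and then measures how long that signal preceded the first diagnosis code. The records, field layout, and diagnosis date are hypothetical. The 60 threshold and three-month persistence follow the commonly used chronic kidney disease definition, simplified here because intervening normal results are ignored.

```python
from datetime import date


def first_lab_signal(results, threshold=60.0, min_gap_days=90):
    """results: iterable of (result_date, egfr_value) pairs.

    Returns the date of the first low result that falls at least
    min_gap_days after the earliest low result, or None if there is none.
    """
    low_dates = sorted(d for d, value in results if value < threshold)
    if not low_dates:
        return None
    earliest = low_dates[0]
    for later in low_dates[1:]:
        if (later - earliest).days >= min_gap_days:
            return later
    return None


# Hypothetical lab history and first coded diagnosis for one patient.
egfr_results = [
    (date(2021, 3, 1), 72.0),
    (date(2021, 9, 10), 57.0),
    (date(2022, 1, 15), 54.0),
    (date(2022, 6, 1), 51.0),
]
first_diagnosis_code = date(2023, 2, 1)

signal = first_lab_signal(egfr_results)
lead_days = (first_diagnosis_code - signal).days
print(signal, lead_days)
```

In this invented history the lab signal appears on 2022-01-15, 382 days before the diagnosis code. Running the same comparison on a pilot extract shows how much lead time lab data buys for a given indication.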

Linked datasets combine multiple data types — claims linked to EHR, EHR linked to genomics, claims linked to specialty pharmacy — to produce a more complete longitudinal record. HealthVerity and Datavant specialize in this linkage infrastructure. Linked data is more powerful but more expensive, and the linkage methodology requires careful scrutiny.
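When scrutinizing linkage methodology, it helps to know the basic mechanics. The sketch below shows one generic approach: each source derives a keyed hash (a token) from normalized identifiers, and records are joined on the token rather than on the identifiers themselves. This is a toy illustration under stated assumptions, not how HealthVerity, Datavant, or any other vendor implements linkage. The field names, key, and records are hypothetical, and production systems also have to handle typos, name changes, and key management.

```python
import hashlib
import hmac


def patient_token(first, last, dob, secret_key):
    """Keyed SHA-256 hash of normalized identifiers."""
    normalized = "|".join([first.strip().lower(), last.strip().lower(), dob])
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link(claims_rows, ehr_rows, secret_key):
    """Inner-join two sources on the token, keeping only non-identifying fields."""
    ehr_by_token = {
        patient_token(r["first"], r["last"], r["dob"], secret_key): r
        for r in ehr_rows
    }
    linked = []
    for row in claims_rows:
        token = patient_token(row["first"], row["last"], row["dob"], secret_key)
        match = ehr_by_token.get(token)
        if match is not None:
            linked.append({"token": token, "drug": row["drug"], "egfr": match["egfr"]})
    return linked


KEY = b"hypothetical-shared-secret"  # placeholder; real keys need managed storage
claims = [{"first": "Ana ", "last": "RIVERA", "dob": "1980-04-02", "drug": "drug_x"}]
ehr = [
    {"first": "ana", "last": "Rivera", "dob": "1980-04-02", "egfr": 54},
    {"first": "Ben", "last": "Okafor", "dob": "1975-11-30", "egfr": 88},
]
linked = link(claims, ehr, KEY)
print(len(linked), linked[0]["drug"], linked[0]["egfr"])
```

Exact-match tokens like these miss any patient whose identifiers differ between sources, which is one reason to ask providers for their measured match rates and false-link rates.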

3. Therapeutic Area and Indication Fit

Every major RWD provider has therapeutic area strengths that are not widely advertised — and weaknesses that are even less so.

Flatiron Health has unmatched depth in structured oncology EHR data, built from a network of community oncology practices. For oncology outcomes research — real-world progression-free survival, treatment sequencing, biomarker-outcome associations — Flatiron is the standard. For any non-oncology indication, it is the wrong choice.

Komodo Health has invested heavily in rare disease patient identification and longitudinal journey mapping, using its Healthcare Map to trace fragmented patient records across care settings. For rare disease market sizing, patient finding, and undiagnosed patient identification, Komodo is one of the strongest options.

Optum Life Sciences brings a unique payer perspective — its data derives from UnitedHealth Group's claims, giving it exceptional visibility into cost, utilization, and access dynamics. For market access analytics, HEOR studies, and budget impact modeling, Optum's payer-linked data is a significant advantage.

IQVIA has the broadest geographic coverage globally, the deepest prescription data (NPA and DDD datasets cover virtually all dispensed prescriptions in the US), and the most comprehensive commercial analytics suite. For market sizing, competitive share analysis, and prescription trend monitoring, IQVIA is the most complete single-source solution — at a price to match.

Symphony Health (ICON) offers strong prescription and sales data with broad channel coverage — retail, institutional, and integrated delivery networks. For commercial analytics teams tracking market share and competitive dynamics, Symphony is a credible IQVIA alternative.

HealthVerity specializes in data linkage — connecting claims, EHR, specialty pharmacy, lab, and other sources using privacy-preserving identity resolution. When no single source is sufficient and you need a linked longitudinal view of the patient journey, HealthVerity is the platform of choice.

TriNetX provides a federated network of real-time EHR data from academic medical centers and health systems, with a particular strength in clinical trial feasibility — identifying eligible patient populations at specific sites before a trial begins.

Definitive Healthcare focuses on provider and facility intelligence — who practices where, what procedures they perform, what their referral networks look like. For site-of-care analytics, physician targeting, and commercial field force optimization, Definitive is the specialist.

PurpleLab is one of the fastest-growing mid-tier claims platforms, built around its HealthNexus no-code analytics environment. It aggregates over 50 billion medical and pharmaceutical claims covering more than 330 million patient lives and 98% of US payers. PurpleLab's strongest differentiator is accessibility — its no-code interface allows teams to move from question to insight in minutes. A 2025 acquisition of KAID Health added AI-powered NLP on clinical notes to its structured claims foundation. Best fit for commercial pharma teams that need rapid, self-service analytics without deep data engineering resources.

Truveta is purpose-built for regulatory-grade RWE, founded as a consortium of major US health systems contributing daily-updated EHR data to a shared research platform. Truveta Data covers EHR data from more than 120 million patients linked with closed claims for more than 200 million patients across 100+ payers. Its Truveta Language Model normalizes unstructured clinical notes, lab values, and imaging data at scale. For regulatory-grade HEOR, safety monitoring, and comparative effectiveness work that requires daily-updated data with documented provenance, Truveta is the right choice.

Forian occupies a distinctive position through its CHRONOS hybrid ecosystem, which combines open claims, closed claims, EHR, remittance data, and social determinants of health (SDoH) covering over 350 million de-identified patients since 2015. Open claims offer longer patient follow-up, less data lag, and more frequent refresh than closed claims alone — making Forian especially valuable for tracking patients across payer transitions. Its SDoH variables (race/ethnicity, income, education, occupation) expand the analytical frame beyond what claims or EHR alone can provide.

Quest Diagnostics has built a systematic data business — Quest Data Insights — on one of the largest clinical laboratory networks in the United States. Its de-identified lab data covers hundreds of millions of tests annually and can be merged with data from Datavant, HealthVerity, Komodo, and Symphony Health. Quest's geographic breadth captures results from patients seen at independent physician offices and retail clinics that do not appear in hospital-centric EHR networks. Best for oncology biomarker research, rare disease patient identification, and studies requiring actual measured lab values rather than diagnosis codes as proxies.

LabCorp is Quest's closest peer — a national laboratory network with over 45 billion annual test results across 6,500 diagnostic assays and 160 million patient encounters. LabCorp's acquisition of Invitae adds genetic and variant data for studies that connect genotype to phenotype to treatment outcome. Its strategic investment in HealthVerity enables lab data to be linked with a broader RWD ecosystem. For pharma teams in precision medicine, oncology, cardiovascular, and rare genetic disease, LabCorp's combination of diagnostic depth, genomics, and external linkage capability is difficult to match from any other single provider.

4. Data Governance, Compliance, and Provenance

This dimension is non-negotiable for organizations in regulated environments, yet it is the one most frequently skipped.

HIPAA Safe Harbor and Expert Determination are the two accepted de-identification methods — understand which your provider uses and what the re-identification risk assessment shows, especially when combining datasets. Ask about data provenance: some providers are transparent about their source networks; others keep sources proprietary. Opaque provenance makes it impossible to assess representativeness or systematic bias. Review contractual use restrictions carefully — most licenses prohibit individual patient identification and regulatory submissions without additional validation. For regulatory-grade RWD, the FDA's fit-for-purpose assessment framework is explicit about what is required, and not all commercial providers have validated their data to that standard.
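For a sense of what Safe Harbor de-identification does to analytic variables, the sketch below applies three of its rules: ZIP codes are truncated to three digits (or set to 000 where the three-digit area is sparsely populated), ages of 90 and above are pooled, and dates are reduced to the year. It is a partial illustration, not a compliance implementation. Safe Harbor covers 18 identifier categories, the record layout is hypothetical, and the low-population ZIP set is a placeholder that would have to be derived from current Census data.

```python
# Placeholder values only; build the real set from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059"}


def generalize_record(record, low_population_zip3=LOW_POPULATION_ZIP3):
    """Apply three Safe Harbor generalizations to one record."""
    zip3 = record["zip"][:3]
    if zip3 in low_population_zip3:
        zip3 = "000"  # sparsely populated three-digit areas are masked
    age = record["age"]
    return {
        "zip3": zip3,
        "age": "90+" if age >= 90 else str(age),  # ages over 89 are pooled
        "service_year": record["service_date"][:4],  # keep the year only
    }


# Hypothetical input records with ISO-formatted dates.
raw = [
    {"zip": "03601", "age": 93, "service_date": "2024-05-17"},
    {"zip": "60614", "age": 47, "service_date": "2023-11-02"},
]
deidentified = [generalize_record(r) for r in raw]
print(deidentified)
```

The analytical cost is visible immediately: sub-state geography, exact ages in the oldest patients, and event timing within a year are all lost. Whether a provider uses Safe Harbor or Expert Determination therefore affects what questions the data can answer.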

5. Access Model and Total Cost of Ownership

Annual enterprise subscriptions can run from $200,000 to several million dollars and are frequently underutilized. For defined, project-specific questions, a targeted cohort extraction license is usually more cost-effective. Beyond the data fee, account for analytical infrastructure: raw RWD requires substantial processing before it is analytically useful, and some providers deliver analytics-ready extracts while others deliver raw files that require significant capability to use. Finally, understand actual delivery timelines — for time-pressured decisions, the difference between two weeks and six weeks matters.
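A rough break-even calculation makes the subscription-versus-project trade-off explicit. In the sketch below, only the $200,000 subscription figure comes from the text. The project license fee, processing hours, and hourly rate are hypothetical inputs chosen for illustration and should be replaced with actual quotes and internal cost estimates.

```python
import math


def total_cost(data_fee, processing_hours, hourly_rate):
    """Data fee plus the internal effort needed to make the data usable."""
    return data_fee + processing_hours * hourly_rate


HOURLY_RATE = 150  # hypothetical blended analyst and engineer rate, USD

# Enterprise subscription delivering raw files (hours are hypothetical).
subscription = total_cost(200_000, 800, HOURLY_RATE)

# Project-scoped, analytics-ready cohort extract (fee and hours are hypothetical).
per_project = total_cost(60_000, 40, HOURLY_RATE)

# Smallest number of projects per year at which the subscription
# costs no more than buying each project separately.
breakeven_projects = math.ceil(subscription / per_project)
print(subscription, per_project, breakeven_projects)
```

With these assumed inputs the subscription costs $320,000 all-in against $66,000 per scoped project, so it pays off only at five or more projects a year. Organizations that cannot name that many concrete questions are the ones most likely to underutilize a subscription.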

The Symetrique Perspective

No single dimension should dominate the evaluation in isolation. We have seen organizations over-index on cost and discover governance gaps after procurement — and others over-invest in clinical depth for a question that claims data alone could have answered in weeks. Assess all five in sequence, weighted to the specific analytical question at hand.

#RWDEvaluation #DataGovernance #ClinicalData #HIPAA #PharmaRWD #Symetrique

Matching Use Case to Provider and How Symetrique Approaches RWD Onboarding

The table below maps the most common pharma analytics use cases to appropriate data sources. The right choice always depends on your specific question, population, and governance requirements.

Matching Use Case to Provider: A Reference Framework

Use Case | Primary Data Type | Best-Fit Providers | Key Watch-Outs
Commercial market sizing | Claims | IQVIA, Symphony Health, PurpleLab | Medicaid and uninsured gaps; retail vs. specialty channel split
Rare disease population sizing | EHR + registry + lab | Komodo Health, TriNetX, Quest, registries | Small N; ICD-10 coding variability; undiagnosed patient gap
Launch monitoring / share tracking | Prescription claims | IQVIA NPA/DDD, Symphony Health | Data lag; specialty vs. retail split
Patient journey / treatment seq. | Linked claims + EHR | HealthVerity, Komodo, Optum, Forian | Linkage methodology; payer transition tracking
Oncology outcomes research | Structured oncology EHR | Flatiron Health | Non-oncology gaps; limited claims linkage
HEOR and budget impact modeling | Payer claims + cost | Optum Life Sciences, Truveta | Payer mix representativeness; cost coding variability
Clinical trial feasibility | EHR networks | TriNetX, IQVIA Site Intelligence | Academic center bias; community practice gaps
Market access / formulary | Claims + formulary | IQVIA, MMIT, Komodo Health | Formulary lag; payer segmentation completeness
Site-of-care / provider analytics | Claims + facility data | Definitive Healthcare, IQVIA | Site attribution methodology; ownership changes
Biomarker / lab-based patient ID | Diagnostic lab data | Quest Diagnostics, LabCorp | Test ordering patterns vary; claims linkage needed
Genomics / precision medicine | Lab + genomic data | LabCorp (Invitae), Truveta Genome Project | Consent requirements; genotype-phenotype linkage
Regulatory-grade RWE | Validated claims + EHR | IQVIA, Optum, HealthVerity, Truveta | FDA fit-for-purpose requirements; audit-ready provenance

When to Use Multiple Sources — and When Not To

One of the most common mistakes in RWD strategy is assuming that combining more sources always produces better answers. It does not.

Use multiple sources when: your target population is fragmented across care settings; no single source provides adequate population coverage; you need to link clinical depth with treatment breadth; or a regulatory submission requires triangulation across independent sources.

Use a single source when: the question is well-defined and one data type can answer it; budget is constrained and the marginal value of a second source does not justify the cost; combining sources would introduce linkage complexity that creates more uncertainty than it resolves; or speed matters and multi-source linkage would take months you do not have.

The core principle: use the minimum data necessary to answer the question with confidence. More data is not better analytics. Better questions — combined with the right data for those questions — are.

How Symetrique Approaches RWD Onboarding

RWD provider selection and onboarding is a core part of how Symetrique works with pharma and healthcare organizations. Our process follows six steps.

  1. Start with the question. Before evaluating any provider, we define the precise analytical question, the decision it is meant to inform, and the minimum viable evidence standard required. This prevents both over-engineering and under-scoping.
  2. Assess internal data first. Most organizations underutilize the real-world data they already own: specialty pharmacy feeds, hub operational data, patient services program data, commercial CRM data. We map internal assets before recommending any external purchase. External data fills gaps; it does not replace internal context.
  3. Define the gap precisely. Where internal data cannot answer the question, we identify the specific deficit (population coverage, data type, geography, clinical depth) that external data must fill. Vague gaps lead to overbroad subscriptions.
  4. Evaluate providers against the five dimensions. We assess coverage, data type fit, therapeutic area strength, governance, and access model against the specific analytical requirements. Not against general reputation.
  5. Start small and validate empirically. Before any enterprise subscription commitment, we recommend a scoped pilot: a single cohort extraction, a defined analytical output, a validation against known benchmarks. A provider that looks strong in a demo may perform poorly on your specific population.
  6. Build governance infrastructure that lasts. We help organizations build the data governance, compliance, and documentation frameworks that make RWD assets usable across teams and defensible to legal, regulatory, and IT stakeholders. Data without governance creates liability, not insight.
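The evaluation step in the list above can be sketched as a weighted scorecard. This is one possible way to operationalize the five dimensions, not a description of a Symetrique tool. The provider names, the 1 to 5 scores, the weights, and the governance floor are all hypothetical. The weights would change with each analytical question, and the floor reflects the idea that governance is non-negotiable rather than something a high score elsewhere can offset.

```python
DIMENSIONS = ("coverage", "data_type", "therapeutic_fit", "governance", "access_cost")


def weighted_score(scores, weights, governance_floor=3):
    """Weighted mean of 1-5 scores; None if governance is below the floor."""
    if scores["governance"] < governance_floor:
        return None  # disqualified regardless of the other dimensions
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


# Hypothetical weights for a rare-disease patient-finding question.
weights = {
    "coverage": 35,
    "data_type": 25,
    "therapeutic_fit": 20,
    "governance": 10,
    "access_cost": 10,
}

# Hypothetical scores assigned after a scoped pilot.
providers = {
    "Provider A": {"coverage": 2, "data_type": 4, "therapeutic_fit": 3, "governance": 5, "access_cost": 3},
    "Provider B": {"coverage": 4, "data_type": 3, "therapeutic_fit": 4, "governance": 3, "access_cost": 2},
    "Provider C": {"coverage": 5, "data_type": 5, "therapeutic_fit": 5, "governance": 2, "access_cost": 4},
}

results = {name: weighted_score(s, weights) for name, s in providers.items()}
for name, score in results.items():
    print(name, "disqualified" if score is None else f"{score:.2f}")
```

In this invented comparison Provider B edges out Provider A because coverage carries the most weight for the question, and Provider C is excluded despite top marks elsewhere. Writing the weights down before seeing any vendor demo keeps the evaluation tied to the question rather than to reputation.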

A Final Thought

Real-world data is not a commodity. The same budget spent on two different providers for the same analytical question can produce answers that differ by a factor of two or more — not because one provider is dishonest, but because every dataset has structural coverage biases that compound in ways that are predictable if you know what to look for, and invisible if you do not.

The organizations that get the most value from RWD are not the ones with the largest subscriptions. They are the ones who ask the right questions first, understand their data well enough to know its limits, and build analytics on a foundation that reflects the reality of their patient populations rather than the coverage map of a particular data vendor.

That is the work Symetrique does. If you are evaluating RWD providers for a specific program, or building a data strategy for a new therapeutic area, we would be glad to help you navigate it.

Madhav Kathikar
Founder & Principal — Symetrique Inc.

Madhav Kathikar is the Founder and Principal of Symetrique Inc., a healthcare analytics company based in the Greater Chicago Area, Illinois. Symetrique is a Women-Owned and Minority-Owned Business (WBE/MBE certified, Illinois 2026).

#RWDProviders #RWE #HEOR #RareDisease #Oncology #DataGovernance #Symetrique

Provider characterizations reflect publicly available information as of April 2026. Capabilities evolve — verify current offerings directly with each provider before making procurement decisions.