Executive summary – a single shift in diagnostic design
AI-driven peptide design shifts protease sensor development from empirical screening to on-demand creation, enabling sparser multiplex urine diagnostics for early cancer signals. MIT and Microsoft researchers’ CleaveNet model proposes short peptides tuned to cancer-linked proteases, with lab assays showing that model-designed sequences produced stronger cleavage signals than many training-set peptides. When displayed on nanoparticles, these peptides release urine-excreted reporters detectable by paper-strip tests. This structural change promises fewer cross-reactive sensors, faster iteration cycles—from months to days in lab workflows—and a potential pathway to noninvasive, home-based cancer screening.
Breaking down CleaveNet’s announcement
CleaveNet is a generative AI model trained on peptide-protease interaction data to propose novel cleavage substrates. Instead of screening vast peptide libraries over months, researchers prompt CleaveNet for sequences optimized for a target protease’s recognition motifs and kinetic preferences. In their reported lab assays, several CleaveNet-designed peptides—none present in the training set—exhibited higher fluorescence cleavage signals against proteases such as MMP13 and MMP9 than many of the original library members.
The designed peptides are conjugated to nanoparticle surfaces, where each peptide acts as a cleavable linker holding a small reporter. When a target protease cleaves the peptide, the reporter fragment detaches, circulates, and is filtered by the kidneys into urine. In murine models, urine samples yielded distinct paper-strip readouts correlated with administered nanoparticles and tumor presence. The research team reports on-demand sequence generation cycles that can take hours for initial proposals and days for iterative validation, compared with traditional empirical screening workflows that span months.
An ARPA-H-funded effort led by MIT’s Sangeeta Bhatia aims to translate this approach into an at-home multiplex kit targeting up to 30 cancer types. The kit concept leverages fewer, highly specific peptides to reduce cross-reactivity and simplify interpretation—an explicit move away from prior panels that relied on dozens of overlapping sensors to achieve coverage.
Industry context: from empirical panels to generative design
Protease-based diagnostics have long relied on trial-and-error selection of peptide substrates that can be cleaved by disease-linked enzymes. Traditional approaches employ randomized peptide libraries or phage display, followed by multistage screening to identify cleavage profiles. These methods often require broad panels—20 to 50 peptides—to capture protease dysregulation patterns, introducing cross-reactivity and complex data analytics.

CleaveNet’s generative framework reflects a broader shift in protein engineering, where models such as graph neural networks and diffusion-based architectures propose functional biomolecules without exhaustive wet-lab screening. At Lawrence Berkeley Lab, generative methods produced over 2,600 tumor-targeting TCR-mimic candidates in 30 hours, with crystallographic validation showing precise antigen binding and low off-target affinity—a process that historically took years.
Compared with these therapeutic efforts, CleaveNet focuses on diagnostics by tuning peptide sequences exclusively for cleavage efficiency and specificity. The promise is a smaller, higher-fidelity sensor set that can be multiplexed in a single urine assay, potentially reducing reagent costs, sample volume, and signal deconvolution complexity. This structural change—on-demand sensor design—could reshape the balance between sensor count and diagnostic accuracy.
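To make the deconvolution point concrete, here is a minimal sketch assuming a simple linear model in which each reporter signal is a weighted sum of protease activities. The two-sensor panels, the cleavage weights, and the noise level are all hypothetical values chosen for illustration; they are not CleaveNet data, and the linear model itself is an assumption.

```python
# Hypothetical linear model: signal_i = sum_j C[i][j] * activity_j,
# where C[i][j] is how strongly protease j cleaves sensor i.
# All numbers below are invented for illustration.

def solve_2x2(c, s):
    """Solve the 2x2 system c * a = s for a using Cramer's rule."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    a0 = (s[0] * c[1][1] - c[0][1] * s[1]) / det
    a1 = (c[0][0] * s[1] - s[0] * c[1][0]) / det
    return [a0, a1]

def signals(c, a):
    """Reporter signals produced by protease activities a."""
    return [c[0][0] * a[0] + c[0][1] * a[1],
            c[1][0] * a[0] + c[1][1] * a[1]]

def worst_case_error(c, true_activity, noise):
    """Largest error in recovered activity when each signal is off by +/- noise."""
    clean = signals(c, true_activity)
    worst = 0.0
    for d0 in (-noise, noise):
        for d1 in (-noise, noise):
            est = solve_2x2(c, [clean[0] + d0, clean[1] + d1])
            err = max(abs(est[0] - true_activity[0]),
                      abs(est[1] - true_activity[1]))
            worst = max(worst, err)
    return worst

# Mostly specific sensors: small off-diagonal (cross-reactive) weights.
specific_panel = [[1.0, 0.05], [0.05, 1.0]]
# Overlapping sensors: each is cleaved substantially by both proteases.
cross_reactive_panel = [[1.0, 0.8], [0.8, 1.0]]

true_activity = [2.0, 1.0]
err_specific = worst_case_error(specific_panel, true_activity, noise=0.1)
err_cross = worst_case_error(cross_reactive_panel, true_activity, noise=0.1)
print(f"specific panel error:       {err_specific:.3f}")
print(f"cross-reactive panel error: {err_cross:.3f}")
```

With these made-up numbers, the same measurement noise of 0.1 produces an activity error of about 0.11 for the specific panel and 0.50 for the cross-reactive one. That is one way to see why fewer, more selective sensors could simplify interpretation, under the stated linear assumption.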
Technical caveats: evidence gaps and performance metrics
While CleaveNet’s initial results indicate a capability to surpass training-set peptides in controlled assays, several unknowns remain in translating this work to human diagnostics:
- Clinical sensitivity and specificity: Animal model performance does not guarantee diagnostic accuracy in diverse human cohorts. Metrics to watch include true-positive rates across early-stage versus late-stage cancers and false-positive rates in patients with inflammatory or benign conditions.
- Peptide kinetics and off-target cleavage: Reported lab assays measure relative fluorescence increase, but full Michaelis-Menten parameters (kcat and Km, and the resulting catalytic efficiency kcat/Km) across target and off-target proteases are needed to quantify specificity ratios. Persistent cross-reactivity in complex biological fluids could undermine assay fidelity.
- Reporter biodistribution and safety: Nanoparticle size, surface chemistry, and peptide immunogenicity influence in vivo circulation, tissue accumulation, and clearance routes. Biodistribution endpoints—organ retention half-life, complement activation markers, and peptide-specific antibody titers—will signal safety profiles for systemic reporter delivery.
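The kinetics caveat above can be sketched numerically. The snippet below assumes standard Michaelis-Menten behavior and uses invented kcat and Km values for one target and one off-target protease; none of these figures come from the CleaveNet work. At substrate concentrations well below Km, the cleavage rate is approximately (kcat/Km)[E][S], so the ratio of catalytic efficiencies approximates the relative cleavage rates at equal enzyme concentration.

```python
def mm_rate(kcat, km, enzyme, substrate):
    """Michaelis-Menten initial rate: v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)

def catalytic_efficiency(kcat, km):
    """Catalytic efficiency (specificity constant) kcat / Km."""
    return kcat / km

def specificity_ratio(target, off_target):
    """Ratio of kcat/Km for the target protease over an off-target protease.

    Each argument is a (kcat, km) tuple.
    """
    return catalytic_efficiency(*target) / catalytic_efficiency(*off_target)

# Hypothetical parameters: kcat in 1/s, Km in micromolar.
target_protease = (2.0, 10.0)      # (kcat, Km) for the intended protease
off_target_protease = (0.5, 50.0)  # (kcat, Km) for a cross-reacting protease

ratio = specificity_ratio(target_protease, off_target_protease)
# Rate at [S] equal to Km is half of Vmax = kcat * [E].
rate_at_km = mm_rate(2.0, 10.0, enzyme=0.01, substrate=10.0)

print(f"specificity ratio: {ratio:.1f}")
print(f"rate at [S] = Km: {rate_at_km:.4f} uM/s")
```

A relative fluorescence increase alone cannot separate a high kcat from a low Km, which is why both parameters, measured against each candidate off-target protease, would be needed to compute a ratio like this one.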
Regulatory and governance considerations
An at-home urine assay requiring nanoparticle administration intersects multiple regulatory domains. It may be classified as a combination product: the administered nanoparticle sensor would likely be regulated as a drug or biologic, and the urine test strip as a medical device. Key pathways and oversight points include:
- FDA submission requirements for drugs or biologics and for diagnostics, including Investigational New Drug (IND) applications before human trials and 510(k) or De Novo device pathways.
- Manufacturing controls under current Good Manufacturing Practice (cGMP) for nanoparticles and peptide conjugates.
- Labeling and user-safety standards for home diagnostics that introduce exogenous reporters into the body.
- Post-market surveillance frameworks to monitor real-world performance, adverse events, and false-positive prevalence.
Data privacy and user consent in at-home sampling also introduce governance questions around health data storage, transmission of results, and integration with clinical records.
Signals to watch: validating on-demand design
Adoption of AI-driven protease sensors hinges on evidence across several axes. Observational signals that will indicate the viability of this structural shift include:
- Sensitivity and specificity curves from human trials, stratified by cancer type, stage, and comorbidity.
- Multiplex panel size versus diagnostic accuracy, measured by area under the ROC curve (AUC) for differing numbers of peptides per assay.
- Peptide immunogenicity assays reporting anti-peptide antibody levels post-exposure and incidence of hypersensitivity reactions.
- Nanoformulation scalability metrics such as batch yield, size distribution consistency, and sterility assurance levels in GMP production.
- Regulatory milestones including IND applications taking effect, De Novo requests being granted, and combination-product designations.
- Cost-of-goods analysis comparing reagent, manufacturing, and distribution expenses against traditional screening panels.
- Data on urine matrix effects covering variability in pH, protein content, and interfering substances across demographic groups.
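Several of the signals above reduce to a few standard calculations. The sketch below shows sensitivity and specificity at a threshold, AUC as the probability that a random case scores above a random control, and positive predictive value at a given prevalence. The assay scores, threshold, and prevalence are hypothetical and exist only to exercise the functions.

```python
def sensitivity_specificity(cases, controls, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(1 for s in cases if s >= threshold)
    true_neg = sum(1 for s in controls if s < threshold)
    return true_pos / len(cases), true_neg / len(controls)

def auc(cases, controls):
    """AUC as P(case score > control score), counting ties as one half."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical urine-assay scores (arbitrary units).
case_scores = [0.9, 0.8, 0.6, 0.4]
control_scores = [0.7, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(case_scores, control_scores, threshold=0.5)
panel_auc = auc(case_scores, control_scores)
# Screening scenario: an assumed 1% prevalence in the tested population.
screening_ppv = ppv(sensitivity=0.9, specificity=0.95, prevalence=0.01)

print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={panel_auc:.3f}")
print(f"PPV at 1% prevalence: {screening_ppv:.3f}")
```

In this toy example, even a test with 90% sensitivity and 95% specificity yields a PPV of roughly 15% at 1% prevalence, which illustrates why false-positive rates matter so much for a home screening product. Repeating the AUC calculation for panels of different sizes would give the panel-size-versus-accuracy comparison listed above.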
Broader implications for diagnostics and therapeutics
The transition from empirical sensor panels to on-demand design marks a structural pivot in diagnostic development. If generative models like CleaveNet consistently yield high-specificity peptide sensors, diagnostic workflows can compress iteration timelines, reduce reagent libraries, and tailor panels to emerging protease biomarkers—whether for cancer, infectious diseases, or inflammatory conditions.
Beyond diagnostics, similar generative frameworks may inform therapeutic payload release strategies. Peptide sensors that detect protease activity could be paired with drug-conjugate systems to achieve tumor-localized activation, illustrating a dual-use paradigm. The same AI architectures could extend to de novo enzyme inhibitors or peptides that modulate protease networks in complex disease states.
Conclusion: a new axis in diagnostic design
CleaveNet exemplifies a structural realignment in protease sensor development: moving from exhaustive experimental screening to AI-driven, on-demand sequence generation. This shift underpins the thesis that generative models can compress development cycles and enable sparser multiplex urine diagnostics, but the ultimate impact will depend on rigorous human validation, thorough safety profiling, and a clear regulatory pathway. Observing sensitivity/specificity outcomes, immunogenicity data, manufacturing scale metrics, and regulatory milestones will reveal whether on-demand peptide design transforms noninvasive cancer screening.



