Subset Stability, Mixed Models, and the Importance of Ontological/Epistemological Boundaries showing up in our Colvin Diagram
Ronald Loui
Published Jun 1, 2026

Quine got famous a century ago for showing the "nominalization" fluidity in logic: you could move objects into predicates and vice versa. This vexes anyone who cares to count objects in deductive logical representations. (Interestingly, you can't count predicates because P&Q is one predicate? But you can count objects, a1, a2, a3, until someone calls a1 = (ix)A1(x), "the x such that A1 is true of x", and a1 disappears from your "universe"! Though I would start counting rigid designators at that point.)

Hart got the non-equivalence of rule "qualification" and "open textured terms" right, then backed off because no one understood computation at the time: if p then q unless r, in the hands of Cambridge logicians, was just if p AND not-r then q. We now know better. Sometimes you can't compute all the truth values and you still need an inference based on default or convention.
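A minimal sketch of the difference, under assumptions of my own choosing: r is allowed to be unknown (None), and the two function names are hypothetical labels, not Hart's or anyone's standard notation. Each function answers whether q may be concluded.

```python
from typing import Optional

def strict_reading(p: bool, r: Optional[bool]) -> bool:
    """'if p AND not-r then q': q is licensed only when r is known to be false."""
    return p and r is False

def default_reading(p: bool, r: Optional[bool]) -> bool:
    """'if p then q unless r': q holds by default, defeated only when r is known true."""
    return p and r is not True

# The two readings agree whenever r has been computed,
# and come apart exactly when r is unknown (None).
for r in (False, True, None):
    print(f"r={r}: strict={strict_reading(True, r)}, default={default_reading(True, r)}")
```

The only row where the two disagree is the one where r has not been computed, which is the case the conjunction rewrite has nothing to say about.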

Anyone who knows me knows I am an epistemologist and begrudgingly talk ontology only when I have to. At Cleveland Clinic the ontologists define the columns of the xls spreadsheet; I joke that you can't always tell the ontologist from the oncologist. Cheap humor. My MD brother is an oncologist.

But there is a pretty good connection to all of this through our new invention, the Colvin Diagram, which visualizes the hierarchy of relevant reference classes in your training set at query time. This returns the ML-view to the classical stats view from UCL, Neyman, Pearson, Fisher. But it advances classical stats by putting front and center the philosophical question that they always try to hide. Reichenbach's question. Which reference class are you using? What criterion of similarity, to project from past instances to this instance?

Bayesians hide it too, but at least they have meta-distributions which can address the problem rather than just hide it.

So I was puzzling over my contribution to the Festschrift for Guillermo Simari. Yes, my doctoral student got a Festschrift. Because his sons were PhDs, and he birthed a great group of AI researchers in Argentina and spent two decades bringing leading Europeans to visit. So yes, he deserves and gets a Festschrift.

I was in Café Muñoz in Bahía Blanca in the mid-90s puzzling about dialectical moves over café con leche while watching the waitress and sobbing over Rosana's wrenching ballad, Si Tu No Estas Aqui. It dawned on me that day that the effect of argument and counter-argument isn't mainly retraction of assertion (the old model), nor final judgement of the contest (the new model), nor adding qualifiers to rules (the fast-fading nonmonotonic AI model). Those are epistemological moves.

No, the usual effect is something no one talks about. It's revision of the concept.

I assert, you push back, I say, hmm, you're right, thank you for forcing that clarification. I re-assert with more precision, a better scoped, less general, less sweeping claim. Like real-time open-textured term refinement.

Go back to Hart and imagine the picture in ML vector space: lazy-learning cutting planes in high dimensional space as open-textured terms (not yet fully specified by legislation, waiting for judges to decide cases) are evolving a more precise extent with each judicial decision reforming a concept. Hard cases set precedent, determine meaning at the boundary, in the interstices, with nuance. Can't do it without defeasibility, and can't do it without dialectical dialogue (or dialectical monologue). So one slices off part of what the predicate means, in order to survive the counterexamples.

This is an ontological move.

Remind us what is ontological? It's about what vocabulary you choose to use. Usually in dialogue, in a "conversational game" according to Princeton's David Lewis, with another person or I suppose chat-generating-entity. It's what columns you put on your spreadsheet or database table: you can merge, you can split, you can mix and match, you can refuse precision, you can scope to a context. Probably a few more things you can do with those labels of concepts. As you might imagine, ontological moves are important in appellate law-giving, while ontological understanding is important in more mundane law-ruling. Fuzziness is about ontology. Prototypes and deformations is how my friend and colleague Thorne McCarty would talk about this. Cycorp was all about getting ontologies right, but Doug Lenat had waaaaay too rigid a view, a database view, of what was needed.

Remind us what is epistemological? It's about knowing, believing, doubting. Once you can assert something, a predication of an object, P(a), i.e., P is true of a, do you believe it? Modal logicians invent modalities that represent the propositional attitudes or epistemic stance toward such a piece of information. "I suspect." "I surmise." "I am certain." Probabilists shrug and say, what took you so long? Probability is all about epistemology and vice versa. In fact, meta-probability is where the action seems to be. You could go to qualitative probabilities or discrete degrees of belief. That's ok in a lot of situations, and sometimes more honest. Problem with the Bayesians and especially the objective Bayesians seems to be they can pull epistemic precision for anything anywhere anytime out of their lower backside anatomy. Some of us find that obscene.

In a medical record, an ICD code is recorded and billed with certainty of precision (epistemological issue), but also certainty-or-not of correct concept (ontological issue). "Or not" because sometimes the ICD numbering allows "or other" such as F60.89 "other specified specific personality disorders." Includes narcissism! Or Z34.8, which represents an "encounter for supervision of other normal pregnancy" (for instance, supervision of a twin pregnancy). I wrote a little paper once about adding belief qualifiers and prototypicality qualifiers to medical coding. Anyone read it? It's a good idea. Had a good trip to Dallas ICHI to present that one.

Obviously there is an interplay between epistemics and ontics: choose well, know well. It's at the heart of Kyburg's SCIENCE AND REASON theory-formation and theory-revision model.

So the idea in the Festschrift is this: if P(a) is rebutted by saying not-P(a), or even a class of not-P(x), x in D, then instead of retracting P(a) or saying "I lose!", the dialogue continues, thank you very much, by revising P. P[-b](a). P, with the exceptional class carved out, is now asserted of a. Change of meaning of P. You got me, but I still wanna argue the point.

I gave an example of Jeremy Lin being from Harvard and the rule being that Asian Harvard peeps don't do well socially. Before you knee-jerk qualify the rule to say Asian Harvard non-athletes don't do well socially, which is an epistemic move, let's just say more carefully what I mean by Asian or social well-doing. Is he really Asian, under the connotation of that predicate? He's famously ABC (American-born Chinese) from Palo Alto who speaks relatively well for himself. And you don't know his dating life or induction into the finals dining clubs. You see the move.

The problem is that one could have qualified the rule, rather than revising the predicate. They seem equally good representations of the claimant's position. Why choose the ontological if you are inherently epistemological?

Then the bathtub, thinking about the Colvin Diagram. Ah yes, the Colvin Diagram. What doesn't it solve?

What you want to look at is the subset stability. I see a reference class R1 with propensity to P. But it has subclasses Q1, Q2, and Q3, each with a different propensity to P. In the extreme case, you can see Simpson's paradox emerge; in the usual case, mixed models is the term statisticians like. If the enumeration of subclasses is complete wrt dialogue focus, you may even see causal (as probabilistic influence) statements you might want to make.
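To make subset stability concrete, here is a small sketch with counts I invented purely for illustration (they come from no real dataset). It compares the pooled propensity of R1 with the propensities of Q1, Q2, and Q3; the "spread" number is just one crude, assumed way to summarize instability, not a statistic the Colvin Diagram is claimed to report.

```python
# Hypothetical counts: subclass -> (sample size n, number of cases with outcome P)
subclasses = {"Q1": (40, 32), "Q2": (40, 30), "Q3": (120, 24)}

def propensity(n: int, k: int) -> float:
    """Observed relative frequency of P in a class with n members, k of them P."""
    return k / n

# R1 is the union of its subclasses, so its counts are the pooled counts.
r1_n = sum(n for n, _ in subclasses.values())
r1_k = sum(k for _, k in subclasses.values())
r1_propensity = propensity(r1_n, r1_k)

by_subclass = {name: propensity(n, k) for name, (n, k) in subclasses.items()}

# Crude instability summary: range of the subclass propensities.
spread = max(by_subclass.values()) - min(by_subclass.values())

print(f"R1: n={r1_n}, propensity={r1_propensity:.3f}")
for name, p in by_subclass.items():
    print(f"  {name}: n={subclasses[name][0]}, propensity={p:.3f}")
print(f"spread across subclasses: {spread:.3f}")
```

With these made-up numbers the pooled figure for R1 describes none of its subclasses well: Q3 is large and pulls the mixture far below Q1 and Q2.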

So this is why you want to make the ontological move, not just the epistemological.

Instead of allowing that R1 has a diluting subclass Q3, which differs from Q1 and Q2, and leaving the table frustrated, thinking "I wish they could understand what I meant!", the move is simply to split off Q3, merge Q1 and Q2, and reassert: if R1[-Q3] then propensity to P. Then the epistemics change: the propensity of the newly sliced and remixed class is the new meaning of the statement about R1 being P.
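The split-and-merge move can be sketched in a few lines. The counts are invented for illustration, and carve_out is a hypothetical helper name of mine, not part of any published tool.

```python
# Hypothetical counts: subclass -> (sample size n, number of cases with outcome P)
r1 = {"Q1": (40, 32), "Q2": (40, 30), "Q3": (120, 24)}

def carve_out(subclasses: dict, exclude: set) -> tuple:
    """Drop the excluded subclasses and merge the rest into one class (n, k)."""
    kept = [counts for name, counts in subclasses.items() if name not in exclude]
    n = sum(n_i for n_i, _ in kept)
    k = sum(k_i for _, k_i in kept)
    return n, k

# Original claim: R1 has a propensity to P (all subclasses pooled).
n_all, k_all = carve_out(r1, exclude=set())
# Revised claim: R1[-Q3], i.e. Q3 split off and Q1, Q2 merged.
n_rev, k_rev = carve_out(r1, exclude={"Q3"})

print(f"R1:      n={n_all}, propensity={k_all / n_all:.3f}")
print(f"R1[-Q3]: n={n_rev}, propensity={k_rev / n_rev:.3f}")
```

The vocabulary changed (what the class label covers) and the number attached to the claim changed with it, at the price of a smaller sample.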

That's ontology affecting epistemology.

It's so simple. Most of what the Colvin Diagram enables is simple, clear, common sense, intuitive. It's absolutely shocking how it eluded theorists, mathematicians, and data scientists for so long. Well, it hasn't been that long for data science.

If you think of data science generally as the rewrite of 100yrs of stats with the advent of computation, it makes sense. In this case, we are not throwing epochs of tons of computation at a problem. We are doing a lot of computation on the first filter. But then the innovation becomes the user interface: visualization. Older generations could not deliver it, including my COMPUTING REFERENCE CLASSES mid-80s paper and Kyburg's book with Choh Man Teng a decade later, simply because book publishers did not have color and graphics control.

You must be able to plot the hierarchy that shows nodes with sample size AND propensity, and edges with risk difference (or risk ratio difference, or some other test of significant difference, Fisher, Chi-Square, etc.). If you can't do all of that simultaneously, you don't get the insight. Color is not strictly necessary, but really helps.
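As a rough sketch of the data behind such a plot (not the Colvin Diagram's actual implementation, which is not described here), the following computes node and edge annotations for a one-level hierarchy. Several things are my assumptions: the counts are invented, risk difference is taken as child propensity minus parent propensity, and the significance test is a Pearson chi-square (1 d.f., no continuity correction) comparing each child against the rest of its parent.

```python
import math

# Hypothetical counts: (name, sample size n, number of cases with outcome P)
parent = ("R1", 200, 86)
children = [("Q1", 40, 32), ("Q2", 40, 30), ("Q3", 120, 24)]

def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square statistic and p-value (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 d.f. is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def edge(par: tuple, child: tuple) -> dict:
    """Edge annotation comparing a child class with its parent."""
    _, pn, pk = par
    _, cn, ck = child
    rest_n, rest_k = pn - cn, pk - ck  # members of the parent outside the child
    p_par, p_child = pk / pn, ck / cn
    stat, p_value = chi2_2x2(ck, cn - ck, rest_k, rest_n - rest_k)
    return {
        "risk_difference": p_child - p_par,
        "risk_ratio": p_child / p_par if p_par > 0 else math.nan,
        "chi2": stat,
        "p_value": p_value,
    }

# Nodes carry sample size AND propensity; edges carry the comparisons.
nodes = {name: {"n": n, "propensity": k / n} for name, n, k in [parent, *children]}
edges = {(parent[0], c[0]): edge(parent, c) for c in children}

for (src, dst), e in edges.items():
    print(f"{src} -> {dst}: n={nodes[dst]['n']}, p={nodes[dst]['propensity']:.2f}, "
          f"RD={e['risk_difference']:+.2f}, RR={e['risk_ratio']:.2f}, p-value={e['p_value']:.2g}")
```

Drawing it is a separate step; any graph-plotting tool could size nodes by n, shade them by propensity, and label edges with these numbers.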

Kyburg probably saw this in his mind's eye every time the reference class question arose. But he was trying too hard. He wanted to solve the problem of selecting the best ref class. The Colvin Diagram brings epistemic humility to the table. It's a human audit, not an automation. Oh, we could report meta-stats about the heterogeneity, what we call t-Chaos of the Ackerman subgraph (I spelled Ackerman correctly there: Mollie Ackerman, Thao Nguyen also being ack'ed). Maybe even train a NN to look at the cutsets and spit out a meta-opinion.

But we prefer you keep your ontology and epistemology closer to the chest and not outsource the art of good judgment.