Rumsfeld’s Right – ‘Unknown Unknowns’ in Data Science Apps

Donald Rumsfeld drew criticism for his ‘unknown unknowns’ statement of February 12, 2002, made while serving as U.S. Secretary of Defense. His comments came in response to a press question on the lack of evidence linking the Iraqi government, terrorist groups, & weapons of mass destruction.

Skipping the WMD & Iraq War debate, we examine the ‘unknown unknowns’ statement in the context of data science and, specifically, how the concept applies in business intelligence, data mining, and yes, security-oriented applications. Note that Mr. Rumsfeld identified three of the four binary possibilities: ‘known knowns’, ‘known unknowns’, & ‘unknown unknowns’. The missing combination is ‘unknown knowns’.

Philosophically speaking, although Rumsfeld was vilified in some circles for the statement, military usage dates to at least 1984, and the concept is fully described in Persian literature at least as far back as 1368.

We provide a more modern discussion framed in terms of data science applications. Since we’re quasi-objectifying philosophical statements, we first map the concepts onto the following 2-D graph:

[Figure: Availability × Awareness – Known Knowns ↔ Unknown Unknowns]

Although Mr. Rumsfeld and other sources don’t explicitly specify corresponding dimensions, we can infer these two axes to further motivate discussion:

  • Availability – capability to obtain, collect, or otherwise gain access to the information required to make a decision
  • Awareness – capability to correctly parse, process, understand &/or interpret the collected information

Thus, availability can be broadly considered an extrinsic factor that can be increased via access to additional collection resources, whereas awareness can be broadly considered an intrinsic factor that can be increased via access to additional processing resources, to include training and education. With these thoughts in mind, we rename the dimensions, at the risk of redefining their semantics, where:

  • Reality of Perception ← Availability – how correct are our ‘facts’ – are our true statements really true, are our false statements really false – in data science circles, high availability roughly corresponds to high sensitivity & specificity (we return to this concept later)
  • Perception of Reality ← Awareness – how good is our ability to interpret and link the ‘facts’ – are we correctly parsing their semantic meaning, have we removed our personal biases – in data science circles, high awareness would correspond to high levels of accuracy & precision (again, we return to this concept later).

Thus, by simple substitution, we obtain this subsequent visualization:

[Figure: Availability × Awareness – Known Knowns ↔ Unknown Unknowns]

So what of finer granularity? Our first attempt in this direction, inspired by the ‘clock-style’ divisions associated with defining psychological states of ‘flow’, yielded the following ‘reality × perception’ spinner.

[Figure: Availability × Awareness – Known Knowns ↔ Unknown Unknowns]

So what of yet finer granularity? Our second attempt yielded this ‘reality × perception’ spinner, at which point we got stuck trying to identify a term to further disambiguate among unknown, known, & unsure:

[Figure: Availability × Awareness – Known Knowns ↔ Unknown Unknowns]

We punted on fleshing out this level of granularity and instead mapped the ‘reality × perception’ states onto associated actions using the prior level of granularity. These states are similarly inspired by the ‘clock-style’ divisions associated with creating psychological flow [since the alignment of reality & perception is closely coupled to our ability to exhibit true ‘flow’]:

[Figure: Availability × Awareness – Known Knowns ↔ Unknown Unknowns. Regions of the image link to the Wikipedia articles on Problem Solving, Inductive Reasoning, Knowledge, Deductive Reasoning, ‘Just the facts, ma'am’, Rationalization, Delusion, and Intuition.]

Each state in the image above links to an associated Wikipedia article. We’ve generated these approximate correspondences:

  • know we know – knowledge (philosophize)
  • unsure we know – deduction (puzzle solve)
  • don’t know we know – narrative generation (narrate)
  • don’t know we’re unsure – rationalization (justify)
  • don’t know we don’t know – delusion (confusion)
  • unsure we don’t know – intuition (guess)
  • know we don’t know – problem solve (analyze)
  • know we’re unsure – induction (hypothesize)
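The correspondences above amount to a small lookup table. As a rough sketch in Python, they could be encoded as shown here. The names `Level`, `ACTIONS`, and `action_for` are made up for illustration and do not come from the original post, and the reading of each phrase as an (outer, inner) pair, e.g. ‘unsure we know’ as (unsure, know), is our assumption:

```python
from enum import Enum


class Level(Enum):
    """Hypothetical three-way scale used for both halves of each phrase."""
    KNOW = "know"
    UNSURE = "unsure"
    DONT_KNOW = "don't know"


# (outer, inner) -> (label, action), copied from the list above.
# Example: "unsure we know" is keyed as (UNSURE, KNOW).
ACTIONS = {
    (Level.KNOW, Level.KNOW): ("knowledge", "philosophize"),
    (Level.UNSURE, Level.KNOW): ("deduction", "puzzle solve"),
    (Level.DONT_KNOW, Level.KNOW): ("narrative generation", "narrate"),
    (Level.DONT_KNOW, Level.UNSURE): ("rationalization", "justify"),
    (Level.DONT_KNOW, Level.DONT_KNOW): ("delusion", "confusion"),
    (Level.UNSURE, Level.DONT_KNOW): ("intuition", "guess"),
    (Level.KNOW, Level.DONT_KNOW): ("problem solve", "analyze"),
    (Level.KNOW, Level.UNSURE): ("induction", "hypothesize"),
}


def action_for(outer, inner):
    """Return (label, action) for a state, or None if the state is not listed."""
    return ACTIONS.get((outer, inner))
```

Only eight of the nine possible pairs appear in the list, so `action_for(Level.UNSURE, Level.UNSURE)` returns `None` in this sketch.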

The ‘trick’ is that we can interpret the ‘reality × perception’ concept in several other ways. For instance, foregoing the unknown unknowns, we can obtain the following visualization roughly corresponding to the ‘1 – 5’ star ratings, e.g., vis-à-vis Likert scales, that folks may associate with judging truthiness:

[Figure: Reality × Perception – Likert Scale – ‘Star Ratings’]

We attempted to combine visualizations, yielding this confusing mess:

[Figure: Reality × Perception – Likert Scale – ‘Star Ratings’]

Overlaying with the ‘state-of-mind’ actions is similarly unhelpful:

[Figure: Reality × Perception – Actions + ‘Star Ratings’]

However, both overlays are similar to the Klout-style ‘influence matrix’, motivating another attempt to further granularize the ‘reality × perception’ matrix via classic block subdivision.

[Figure: Reality × Perception – Perceived Reality Matrix]

Where does that leave us? Getting access to sufficient and accurate data is difficult, especially as data becomes increasingly subjective: sales figures, email records, transcribed conversations, images, video interviews. Before rushing to judgement, appreciate that your data may not be accurate, and further appreciate that each person on your team brings unique filters and biases when interpreting the data.

Still not convinced? Reality is the ability of a pregnancy test to generate the right shade of blue. Perception is your ability to read the shade of blue, the shade of pink, or the word ‘pregnant’. Since a true positive, false positive, false negative, and true negative each prompt a different action, we want tests to be accurate & the user to interpret test results correctly, as reflected in the mapping onto a traditional confusion matrix:

[Figure: Reality × Perception – Perceived Reality Matrix]
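The four confusion-matrix outcomes can be turned into the measures mentioned earlier: sensitivity, specificity, accuracy, & precision. Here is a minimal Python sketch using the standard textbook definitions. The function name `confusion_metrics` and the counts are hypothetical, chosen purely for illustration, and are not data from this post:

```python
import math


def _ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def confusion_metrics(tp, fp, fn, tn):
    """Compute standard measures from confusion-matrix counts.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of real positives detected
        "specificity": _ratio(tn, tn + fp),  # share of real negatives detected
        "precision": _ratio(tp, tp + fp),    # share of positive calls that are right
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),  # share of all calls that are right
    }


# Hypothetical counts for 100 test readings (illustrative only).
metrics = confusion_metrics(tp=45, fp=5, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that these counts describe the combined result of the test and the person reading it. A sketch like this cannot tell you whether an error came from the signal generator or from the signal analyzer, so the two would need to be measured separately.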

More simply, ensure your data signal generators & your data signal analyzers are operating correctly. We may return to this ‘unknown unknowns’ concept in a future post, perhaps in the context of information retrieval along with:

Bottom line: Donald Rumsfeld did have it right, and ‘unknown unknowns’ do exist. Ensure your data mining apps, business intelligence generators, &/or competitive intelligence analysts are aware of their own limits and the limits of their data gathering tools. Similarly, ensure your decision-making processes have measures to detect &/or counteract ‘unknown unknowns’ when your organization encounters them.

If you’ve read this far, you may enjoy a post on reality, perception & influence in social media applications. For more quasi-philosophy articles and to discover more about ‘technology simplified’, subscribe to Thought Puzzle.
