hierarchical thought

working memory limits the number of concrete details we can hold in our minds at any one time. depending on the type of detail, the consensus is somewhere between 4 and 10, usually quoted as 7. from first-hand experience, I believe this idea has some legitimacy. but there’s a hidden detail that changes the implications of this limited-workspace hypothesis.

a detail in this context is the answer to a question: the number of dollars in your bank account, the name of someone you want to communicate with, or the second digit of a novel phone number. that last one is the odd one. the number of dollars in your bank account is a single detail; you don’t recall the digits individually. nor do you recall the nth letter of someone’s name, but the name as a whole. so why divide up the phone number in this way? because we remember things based on connections to other concepts we’ve already established. names are recognizable as names, and we’ve heard most of the names we’re likely to encounter. when we haven’t encountered a name, or anything like it, before, it is indeed harder to remember, though we probably fall back on the larger chunks of syllables rather than the letters that make up the name. but how did we establish those other concepts if they are also expressed in terms of connections to other concepts we’ve previously learned?

in a real sense, it’s connections all the way down. at the bottom of all of this is a set of subjective, first-person experiences. every individual will have a unique, incomparable set of these experiences. it is reasonable to believe that these experiences are roughly similar, since many of us seem to be able to communicate and infer what is meant without total elaboration, but we may never really know just what it is that we’re each uniquely experiencing. somehow, through composition of those experiences over a lifetime, we’re able to build up a complex set of ideas. if we had to recall the exact details of each of these ideas, we’d be lost.

instead, as I mention in my post on attention:

in human learning the process called chunking is where we learn the specific low-level steps of an activity, and our brain is able to bundle them together such that we don’t need to address each step individually anymore, but can access the chunk as a whole, thereby reducing the amount of mental effort required to perform complex tasks. a good example is driving. when one first learns how to drive, it seems difficult to manage all the various activities that need to be monitored and adjusted to keep the car safely on the road making progress towards a destination. but as one practices, these activities are gradually pushed down below the layer of conscious effort, such that we can get in the car and simply will ourselves to “drive to the store”, and not have to break that down into pieces like “cross over left and right hands on the steering wheel so I can turn it more than 90 degrees in any one direction”.

now think back to the limited workspace. imagine it as a table on which 7 items are arrayed in physical space. those items are the details we are trying to remember. let’s layer in chunking: imagine that one of the items in your workspace is a pointer to another workspace, which also has specific items arrayed in some physical space. a pointer in this context is meant to be the address of something containing much more detail. in order to know the time of sunrise on March 21st 2021, we don’t need to memorize the almanac, just store a pointer, the knowledge that such questions have answers in almanacs, and follow the pointer until we get to the answer.
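as a rough sketch of the table-and-pointer picture above, here is one way it could look in python. the names `Workspace`, `follow`, and `CAPACITY` are made up for illustration, the capacity of 7 is just the commonly quoted figure, and the almanac entry is a placeholder rather than a real sunrise time. nothing here is a claim about how memory is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

CAPACITY = 7  # assumed workspace size (the commonly quoted figure)


@dataclass
class Workspace:
    """a table holding at most CAPACITY items.

    each item is either a concrete detail (a string) or a pointer
    to another Workspace that holds more detail.
    """
    label: str
    items: Dict[str, Union[str, "Workspace"]] = field(default_factory=dict)

    def put(self, key: str, value: Union[str, "Workspace"]) -> None:
        # refuse new items once the table is full
        if key not in self.items and len(self.items) >= CAPACITY:
            raise ValueError(f"{self.label} is full ({CAPACITY} items)")
        self.items[key] = value


def follow(start: Workspace, path: List[str]) -> Union[str, Workspace]:
    """follow a chain of keys, one workspace at a time."""
    current: Union[str, Workspace] = start
    for key in path:
        if not isinstance(current, Workspace):
            raise KeyError(f"{key!r}: reached a plain detail, nothing left to follow")
        current = current.items[key]
    return current


# the almanac holds the detail; working memory only holds a pointer to it
almanac = Workspace("almanac")
almanac.put("sunrise on 2021-03-21", "<time listed in the almanac>")

mind = Workspace("working memory")
mind.put("where sunrise times live", almanac)

print(follow(mind, ["where sunrise times live", "sunrise on 2021-03-21"]))
```

the point of the sketch is that `mind` spends only one of its seven slots on the almanac, no matter how many entries the almanac itself contains.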

by use of pointers, working memory is converted from a harsh limit into the branching factor of our hierarchical memory. to illustrate the point, let’s simplify from a branching factor of 7 to 2:

graph TD
    A([A])-->A1
    A([A])-->A2
    B([B])-->B1
    B([B])-->B2
    A1([A1])-->A1a([A1a])
    A1([A1])-->A1b([A1b])
    A2([A2])-->A2a([A2a])
    A2([A2])-->A2b([A2b])
    B1([B1])-->B1a([B1a])
    B1([B1])-->B1b([B1b])
    B2([B2])-->B2a([B2a])
    B2([B2])-->B2b([B2b])

here two top-level details (A & B), when expanded, each lead to two additional details (A1 & A2 under A, B1 & B2 under B), which themselves each lead to two additional details, and so on. I don’t believe the brain works as a perfect hierarchy, but this binary tree at least illustrates one way in which information can be organized without requiring us to sort through all the uncoordinated facts contained in our long-term memory.
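to put rough numbers on this, a small python sketch can count how many details sit at each level of a perfect hierarchy. it assumes every item expands into exactly `branching_factor` more items, which is the same idealization as the binary tree above and not a claim about real memory. the function name `reachable_details` is hypothetical.

```python
def reachable_details(branching_factor: int, depth: int) -> int:
    """count the details at level `depth` of a perfect hierarchy.

    depth 1 is the workspace itself; each further level multiplies
    the count by the branching factor.
    """
    return branching_factor ** depth


# compare the simplified binary tree with the commonly quoted 7
for depth in range(1, 5):
    print(depth, reachable_details(2, depth), reachable_details(7, depth))
```

with a branching factor of 2, the third level holds 8 details, matching the bottom row of the tree. with 7, the fourth level already holds 2401, while at any one moment we only ever look at a single table of at most 7 items.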