Last October, in a live demo, Neuralink finally revealed their new prototype: a coin-shaped device that connects with the brain by means of electrodes thinner than a human hair. The device is primarily designed to help people with neurological problems, such as memory loss following strokes, spinal injuries and other diseases and accidents, but it can also monitor users’ health and record brain activity. In the future, Elon Musk expects the device to be able to read our minds, allow us to share our thoughts and perhaps even achieve a human–AI symbiosis. For now, however, it has provided barely any new scientific insights.
The search to understand the brain’s role in our behaviour, and how it gives rise to our cognitive abilities, has been ongoing for a long time. In the nineteenth century, Italian physician Angelo Mosso (who is now known for his studies of people with skull defects) developed a method of measuring blood pressure. The Mosso method allowed him to record blood flow in the brain through pulsations. He noted that subjects who engaged in cognitive tasks, such as mathematical calculations, showed an increase in brain pulsations and inferred that mental activity was accompanied by an increase in blood flow.
Mosso knew that his method had its limitations because he couldn’t test healthy subjects. So he built a platform, resting on a fulcrum, which he called a human circulation balance. The idea is simple. People are placed on the table and adjusted such that their weight is perfectly balanced. If brain activity increases the amount of blood in the brain, this causes a change in the weight distribution of the body (the head becomes heavier). Mosso was extremely careful to take into account every variable that might affect his results. Since breathing could produce observable movement, the platform was linked to a heavy counterweight to dampen respiratory fluctuations. Because head movements could alter the equilibrium, reference points marked the subject’s original position. After controlling for all these factors, Mosso exposed his subjects to a variety of conditions and measured the tilt in the platform. He observed that the platform tilted more heavily towards the head when his subjects read complex materials, such as manuals, than when they read newspaper articles. A recent replication study suggests that Mosso may have built the first crude version of a neuroimaging device.
In 1932, physicist Carl Anderson was studying cosmic rays in a cloud chamber when he found that some particle trails were bent by the same amount as those of electrons, but in the opposite direction. Only a particle with identical mass but opposite charge from the electron could behave in that way. He called these particles positrons. Some radioactive isotopes give off positrons as they decay. Positron emission tomography (PET) takes advantage of this by allowing researchers to record the path that a radioactive substance called a tracer follows through the brain. Neuroscientists can thereby reconstruct an image of brain activity (although radioactive tracers cannot be used on certain subjects, such as children).
When a video camera has a high spatial resolution, it can capture tiny details, such as a ladybird resting on a flower or the letters of a distant sign. If it has a high temporal resolution, you can use it to capture a sharp image of a passing car or a hummingbird in flight. PET’s temporal resolution is poor (activity must be integrated over tens of seconds or more) and, at only around one centimetre (much larger than most regions of interest in the brain), its spatial resolution is also relatively low. Another neuroimaging technique is electroencephalography (EEG). Like the transistors in your PC, neurons can exist in only two states: on or off. When they reach a certain level of electric charge, they fire, instantly releasing their action potential: a subtle but abrupt electrical current that is transmitted through the neural network of your brain. EEG records these spikes in electrical activity by means of electrodes placed on the scalp. Its temporal resolution is high (in the millisecond range). Yet, at around 5–9 centimetres, its spatial resolution is low, since the electric current must pass through a number of cranial layers that blur the signal by the time it reaches the scalp.
While Anderson was discovering positrons, Linus Pauling was studying haemoglobin, the molecule that carries oxygen in the blood and gives it its reddish colour. Pauling discovered that oxygenated haemoglobin is weakly diamagnetic, while deoxygenated haemoglobin is paramagnetic, meaning that it becomes magnetic in the presence of an external magnetic field. This discovery allowed for the development of functional magnetic resonance imaging (fMRI), which measures the proportions of oxygenated and deoxygenated haemoglobin, allowing scientists to record blood flow in the brain. Although it’s commonly believed that fMRI directly records brain activity, research has shown that it actually measures neurons’ inputs rather than their firing.
Although fMRI has low temporal resolution (in the order of seconds, rather than milliseconds), it has high spatial resolution and has therefore become the most popular neuroimaging technique. When you scan the brain using an fMRI machine, the image is divided into cubes called voxels (3D pixels). Each voxel measures around 1–3 mm on each side and displays the average activity of millions of neurons. As Russell Poldrack puts it in The New Mind Readers, imagine you are observing from a distance as a crowd applauds three different political candidates. You can’t tell which candidate each individual person supports, but you can separate the crowd into various sections and see whether each section shows a detectable preference—for example, by using a microphone to record the clapping and measuring the decibels of each section in response to each candidate. If we imagine that each person represents a neuron and each section of the crowd a voxel, this is how fMRI works. Some sections of the crowd probably clap much louder for one particular candidate than the others, just as some sections of the brain respond more strongly to faces than to other types of stimuli. However, scattered across the crowd, you are likely to find people whose preferences are atypical of their sections. Thus, even in segments of the crowd that do not strongly favour any one candidate, you may be able to detect a difference in the pattern of applause that provides a clue as to which candidate is speaking.
Likewise, we can compare a new activity pattern in the brain to the known ones and infer which pattern it most closely fits. Neuroscientists call this neural decoding and, to a certain extent, it allows researchers to know what a person is seeing—perhaps even thinking—just by looking at her brain scans. There’s some debate as to whether the cheering crowd analogy aptly describes this phenomenon, but researchers have shown that activity distribution across the neural network can reflect whether a rat is drinking or turning around at the end of a track, for instance, just as you can infer which political candidate a particular section of the crowd prefers.
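To make the analogy concrete, here is a toy simulation, not a real fMRI pipeline (real analyses use packages such as nilearn and involve far more preprocessing): a linear classifier learns to tell which of two made-up stimulus classes a simulated “brain” is viewing, even though each individual voxel carries only a weak signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 500

# Each simulated voxel has only a slight preference for one stimulus class,
# like the atypical individuals scattered across a crowd section.
preference = rng.normal(scale=0.1, size=n_voxels)
labels = rng.integers(0, 2, size=n_trials)          # 0 = faces, 1 = houses
signs = np.where(labels == 1, 1.0, -1.0)
activity = rng.normal(size=(n_trials, n_voxels))    # background noise
activity += np.outer(signs, preference)             # add the weak class signal

# A linear decoder pools hundreds of weak per-voxel signals into a
# reliable guess about what the "subject" is seeing.
decoder = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(decoder, activity, labels, cv=5).mean()
print(f"decoding accuracy: {accuracy:.0%}")  # well above the 50% chance level
```

No single voxel is informative on its own, but the distributed pattern is, and that is what makes decoding possible at all.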
Researchers, then, can decode behaviour and perception by looking at the brain. No wonder some enthusiasts have argued for the use of fMRI in legal contexts. In theory, you could tell whether a person was lying by reading her brain scan. Alternatively, you could use a brain scan to discover her shopping preferences, for neuromarketing purposes.
In theory. In practice, there are a number of caveats. For one, when studying behaviour, we must ask ourselves how reliable our tests are. The reliability of a test is reflected in how consistent the results are when the same person takes the same test at different times, under similar conditions. It is measured on a scale from 0 to 1: if a test’s reliability is 0, scores from different sessions are unrelated; if it is 1, they agree perfectly. Along with validity (i.e. whether the measure predicts some outcome of interest), reliability is critical in psychological research.
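As a minimal sketch of what this measure looks like in practice (researchers more often use the intraclass correlation coefficient; a plain correlation between two sessions is a simplification), consider simulated test scores in which each person has a stable “true score” overlaid with measurement noise:

```python
import numpy as np

def test_retest_reliability(session1, session2):
    """Correlate scores from two administrations of the same test:
    a simple estimate of test-retest reliability."""
    return np.corrcoef(session1, session2)[0, 1]

# Simulate 200 people: each has a stable true score, and each session
# adds independent measurement noise on top of it.
rng = np.random.default_rng(1)
true_score = rng.normal(size=200)
session1 = true_score + rng.normal(scale=0.8, size=200)
session2 = true_score + rng.normal(scale=0.8, size=200)

# Expected reliability = true variance / total variance = 1 / (1 + 0.8**2), about 0.61
print(f"reliability = {test_retest_reliability(session1, session2):.2f}")
```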
What is the reliability of fMRI? Ten researchers carried out a meta-analysis of 66 tasks used in fMRI studies. A broad range of tasks were examined: from measures of episodic memory and face processing, to measures of sexual picture perception and alcohol use reactivity. The researchers found that more than half of the tasks had poor reliability—lower than 0.4. About a quarter showed fair reliability of 0.4–0.6, and only a fifth had good reliability of higher than 0.6. This means that our ability to make individual predictions is severely constrained.
Reliability is not the only limitation of fMRI. Cost is also a limiting factor. A single fMRI study can cost over $20,000 (around $555 per subject per hour). This is why researchers rarely scan large numbers of people: most studies have to rely on a small number of participants, usually fewer than fifty. When you design an experiment or carry out an observational study, you are measuring a phenomenon, such as the correlation between different variables or the difference a treatment makes. This correlation or difference is called the effect size. There are many ways of measuring effect sizes, but the one that concerns us here is correlation, which ranges from -1 to 1, with 0 indicating no relationship. (The reliability of a test is a measure of the correlation between the results obtained during different sessions.)
A study’s ability to capture a given effect size is called its statistical power, and it depends on at least three main factors: the real size of the effect in the population, the number of variables you’re measuring and the sample size. Say you’re measuring 5–9 variables and the real correlations you’re looking for are 0.5, 0.3 and 0.1. Clearly, you won’t know the real effect size until you measure it. However, in this hypothetical example, you would need 40–50 participants for the largest effect size, 70–100 for the medium one and nearly 800 for the smallest. When your statistical power is low, because your sample size is small and the effect size you’re looking for is also small, at least two things can happen, neither of them good: either you will overestimate the effect size, or you will fail to find one at all.
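As a rough sketch of where such numbers come from, here is the standard Fisher z approximation for the sample size needed to detect a correlation, assuming a two-tailed test at a significance level of 0.05 with 80% power. The exact figures shift with these choices, and with any correction for testing multiple variables, which is why they only bracket the ranges quoted above.

```python
import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a true correlation r,
    via the Fisher z-transformation (two-tailed test)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value of the test
    z_beta = norm.ppf(power)           # quantile corresponding to desired power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

for r in (0.5, 0.3, 0.1):
    print(f"r = {r}: n = {n_for_correlation(r)}")
# r = 0.5: n = 30
# r = 0.3: n = 85
# r = 0.1: n = 783
```

Note how steeply the required sample grows as the effect shrinks: for small correlations, halving the effect size roughly quadruples the number of participants needed.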
Given that every brain image is composed of hundreds of thousands of voxels, there are so many variables involved that you need a large sample size to discover even a large effect size (the sample size needed increases with the number of variables). Recently, researchers have been able to estimate the sample sizes required to measure correlations between brain structure and behavioural measures using data from the Adolescent Brain Cognitive Development Study, a US project involving nearly 12,000 brain scans. The largest effect size found was a correlation of 0.14, between cognitive ability and functional connectivity. A sample of more than 4000 individuals would be needed to detect such a correlation accurately. This explains why previous associations between brain structure and behaviour have proven so hard to replicate.
The study also showed that studies relying on 25 participants could reach entirely opposite conclusions—such as a correlation of 0.6 between two variables in one study, but a correlation of -0.6 between the same two variables in another. These estimates apply to structural neuroimaging. In functional neuroimaging, it has been estimated that over 600 participants would be needed to increase the reliability of task-based scans.
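A toy simulation (not the study’s own analysis) shows why: with 25 participants, sample correlations scatter widely around even a realistic true effect, and a non-trivial share of studies land far from it, in either direction.

```python
import numpy as np

rng = np.random.default_rng(0)
true_r = 0.14            # the largest brain-behaviour correlation cited above
n, n_studies = 25, 10_000
cov = [[1.0, true_r], [true_r, 1.0]]

# Simulate thousands of small "studies" drawn from the same population
# and record the correlation each one happens to observe.
sample_rs = np.array([
    np.corrcoef(*rng.multivariate_normal([0, 0], cov, size=n).T)[0, 1]
    for _ in range(n_studies)
])

print(f"observed correlations range from {sample_rs.min():.2f} to {sample_rs.max():.2f}")
print(f"share at least 0.4 in magnitude: {np.mean(np.abs(sample_rs) >= 0.4):.0%}")
```

With samples this small, a single study’s estimate tells you very little about the true effect.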
Neuroimaging is not as simple as getting a subject into the scanner and looking at the results. Data have to be cleaned and processed, and statistical analyses must be performed. The methodological decisions every scientist makes during this process can have huge consequences (this is why, most of the time, researchers work hard to validate their measures). In one study, seventy research teams were asked to analyse the same data set and test nine hypotheses. Four hypotheses showed remarkably consistent results across all teams, but the remaining five achieved less consistency, with about a third of the teams reporting results different from the rest. The researchers found that the main factors driving the variability of the results were the choices made about how to analyse the data and the statistical software used. Earlier research has also shown that some of the most common software packages increase the odds of obtaining spurious results.
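To see how an apparently innocuous analytic choice can matter, here is a toy illustration (in no way the seventy teams’ actual pipelines, which involved full fMRI preprocessing). Two defensible pipelines, one keeping all participants and one first excluding “outliers” beyond two standard deviations, are applied to thousands of simulated small datasets, and we count how often they disagree about statistical significance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, disagreements = 2000, 0

for _ in range(n_sims):
    x = rng.normal(size=30)
    y = 0.2 * x + rng.normal(size=30)        # a weak true relationship
    # Pipeline A: test the correlation on everyone.
    p_a = stats.pearsonr(x, y)[1]
    # Pipeline B: first drop participants beyond 2 SD on either variable.
    keep = (np.abs(x) < 2) & (np.abs(y) < 2)
    p_b = stats.pearsonr(x[keep], y[keep])[1]
    disagreements += (p_a < 0.05) != (p_b < 0.05)

print(f"pipelines disagree about significance in {disagreements / n_sims:.0%} of datasets")
```

Neither pipeline is obviously wrong, yet on identical data they can support opposite conclusions. Multiply this by dozens of preprocessing and modelling choices, and the variability across the seventy teams becomes less surprising.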
There are additional, external factors that affect the inferences that can be drawn from fMRI. Coffee, for instance, has been shown to influence brain activity, mainly by increasing blood flow to areas associated with attention and vigilance. Whether the subject’s eyes are open or closed during the scan also alters the activity patterns seen in the scan. Neuroscientists make corrections for these sorts of phenomena all the time, but this lowers the statistical power of the studies and decreases confidence in any results. Even the best available technologies still present several obstacles.
As for Neuralink’s device, it’s not even clear whether it “could last for decades in a corrosive environment like the brain,” as a member of the team has acknowledged. To be capable of reading minds would be amazing. We may be ready for that in the future. But, alas, not yet.