At the LHC, the physics starts with a glass bottle of hydrogen. The electrons are stripped from the atoms, and the isolated protons are fed through a chain of particle accelerators, each boosting them to higher energies, before they enter the LHC. Within the LHC, two beams of protons travelling in opposite directions collide at energies of up to 13 TeV, with the particles moving at close to the speed of light. Each collision event produces a number of particles, which then decay into other particles in cascades known as showers or jets. These particles are observed by detectors. Around the LHC ring sit seven detectors: cylindrical arrays of instruments that record the position, momentum and identity of particles. These detectors are also known as experiments, and the most massive of them is the Compact Muon Solenoid, or CMS. It weighs 14,000 tonnes, and its distinguishing feature is a single large solenoid magnet. By analysing the debris of the collisions, scientists can work out what kind of interactions took place within the LHC, and whether any expected or unexpected new particles were formed. The CMS produces about 1 GB of data per second. In 2010 alone, it yielded 29 terabytes of data and recorded 300 million proton-proton collision events. In 2014, through the CERN Open Data portal, an unprecedented amount of data from these collision events was released to the general public, not just the scientific community. It was quite a surprising move, since nothing like it had ever been done before in high energy particle physics.
Typically, physicists look for particular patterns and detections that are the telltale signs of a new particle or physical phenomenon, based on theoretical predictions. This was the approach that led to the discovery of the Higgs boson. The particle was predicted back in 1964, to explain the masses of the W and Z bosons. Simulations of collision events told scientists exactly what to look for. Once the particle was understood at a theoretical level, it was already caught in a kind of trap, even before it was detected: if it existed, it would inevitably be found by the LHC; if it was not detected, the scientists would have to come up with new physics, because their understanding was wrong. Because the researchers knew exactly what they were looking for, the Higgs boson could be isolated from the trillions of collision events within the LHC, and its discovery was announced in 2012.
However, there remains the question of particles and physics that are not explained by the Standard Model. There may yet be phenomena and particles that are not anticipated by any current theory, which means the scientists working with the detectors don’t even know what to look for. One way forward is to mine the data for events that can only be explained by physics beyond what is currently accepted. Understandably, combing through all the data for collision events where something unexpected happened is a herculean task. Researchers from MIT have now figured out a way to isolate the outlier cases in the mountains of data, using algorithms similar to those used by social networking sites. The degree of similarity between two events is mapped out geometrically: similar events clump together, while rare or unlikely ones sit at a distance in this map. According to the researchers, this is the first time a map of the relationships between so many collision events has been produced. Jesse Thaler, associate professor of physics at MIT, explains, “Maps of social networks are based on the degree of connectivity between people, and for example, how many neighbors you need before you get from one friend to another. It’s the same idea here.”
One way to interpret and visualise the collision-event data is to convert it into point clouds: the sprays of particles detected by the experiments become collections of dots. There are a number of computational approaches that allow machines to make sense of point cloud data; robots and autonomous cars, for example, use point clouds to navigate their surroundings. The team from MIT used a technique for comparing point clouds that estimates the amount of work needed to convert one point cloud into another. The basis of the algorithm is an idea known as the “earth mover’s distance”. Think of an area over which a pile of dirt is distributed. Given that distribution and a target distribution to compare it to, the algorithm calculates the amount of work needed to shift the dirt around until it matches the target. The approach only works if the amounts of dirt in the two distributions are exactly the same, which holds in this case because there are always two protons to start with. Thaler says, “You can imagine deposits of energy as being dirt, and you’re the earth mover who has to move that dirt from one place to another. The amount of sweat that you expend getting from one configuration to another is the notion of distance that we’re calculating.”
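To make the idea concrete, here is a minimal sketch of an earth mover's distance between two toy events treated as point clouds of energy deposits. It assumes both events contain the same number of equally weighted deposits, so the optimal transport reduces to a one-to-one matching; the real calculation weights deposits by their energy, and all names here are illustrative rather than the researchers' actual code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_equal_weights(event_a, event_b):
    """Earth mover's distance between two events, each an (N, 2) array of
    detector coordinates for N equally weighted energy deposits.
    With equal counts and equal weights, the optimal transport plan is a
    one-to-one matching, found here with the Hungarian algorithm."""
    # Cost of moving deposit i of event_a onto deposit j of event_b:
    # the Euclidean distance between their detector coordinates.
    cost = np.linalg.norm(event_a[:, None, :] - event_b[None, :, :], axis=-1)
    row, col = linear_sum_assignment(cost)   # cheapest one-to-one matching
    return cost[row, col].mean()             # average "work" per deposit

# Two toy events with 20 deposits each: a two-pronged spray vs. a three-pronged one.
rng = np.random.default_rng(0)
event_a = np.concatenate([rng.normal(loc, 0.05, (10, 2))
                          for loc in ([0, 0], [1, 1])])
event_b = np.concatenate([rng.normal(loc, 0.05, (n, 2))
                          for loc, n in (([0, 0], 7), ([1, 0], 7), ([0, 1], 6))])

print(emd_equal_weights(event_a, event_b))
```

Under these simplifying assumptions the Hungarian algorithm finds the cheapest rearrangement directly; with unequal energies per deposit, a full optimal-transport solver would be needed instead.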
The researchers treated each point cloud, the recorded spray of particle jets from a collision event, as a single point. If more work is needed to convert one point cloud into another, the distance between the two points is greater; if less work is needed, they sit closer together. These points were then arranged into a social-network-style map built from 100,000 pairs of collision events. Visualising the collision events this way makes it easy to isolate the ones that stand out: events at the edges of the map, the ones requiring the most work to be rearranged into any other event, may point towards previously unexpected phenomena. The team hopes to scale up the effort to include more collision events in the network. The dataset released to the public is itself only a selection of the detected collision events, which number in the trillions, and mapping such large volumes may help in understanding the relationships between particle collisions.
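As an illustration of how such a map could surface outliers, the sketch below scores each event by its distance to its nearest neighbour, so the most isolated events, the ones farthest from everything else, come out on top. It assumes a small batch of events and a pairwise distance function like the one sketched above; this is a hypothetical example, not the team's actual pipeline.

```python
import numpy as np

def outlier_scores(events, distance):
    """Score each event by its distance to its nearest neighbour:
    the larger the score, the more isolated the event is in the map."""
    n = len(events)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = distance(events[i], events[j])
    np.fill_diagonal(dist, np.inf)   # ignore self-distance
    return dist.min(axis=1)          # nearest-neighbour distance per event

# Hypothetical usage with the emd_equal_weights sketch above:
# scores = outlier_scores(events, emd_equal_weights)
# candidates = np.argsort(scores)[::-1][:10]   # the ten most isolated events
```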
Patrick Komiske, a graduate student working on the team, said, “We’d like to have an Instagram page for all the craziest events, or point clouds, recorded by the LHC on a given day. This technique is an ideal way to determine that image. Because you just find the thing that’s farthest away from everything else.”
The researchers want to test their approach on historical data. In 1995, the top quark was detected at the Tevatron at Fermilab. The first detections actually happened in 1992, but it took further events for particle physicists to confirm the existence of the elusive subatomic particle, the most massive elementary particle currently known to science. If the top quark were flagged by the algorithm in a social-network visualisation of this archival accelerator data, it would validate the approach developed by the researchers. Thaler explains, “The top quark is an object that gives rise to these funny, three-pronged sprays of radiation, which are very dissimilar from typical sprays of one or two prongs. If we could rediscover the top quark in this archival data, with this technique that doesn’t need to know what new physics it is looking for, it would be very exciting and could give us confidence in applying this to current datasets, to find more exotic objects.”