How does NASA use big data?

Published February 15, 2019

Did you know that the term “big data” was first used in a 1997 paper by scientists at NASA, describing a challenge they had with visualizing large datasets? The volume of data NASA has to manage is mind-boggling. According to Kevin Murphy, NASA Program Executive for Earth Science Data Systems, NASA, one of the biggest generators of data, produces 12.1 TB of data every single day from nearly 100 currently active missions and thousands of sensors and systems on Earth and in space. Some missions could generate as much as 24 TB in a single day. Handling, storing, and managing this data is a massive challenge.

Where does NASA store all this data?

Big data sits at the heart of NASA’s most critical projects. While the Caribbean islands are battered by some of the most destructive and powerful Atlantic hurricanes and the state of Florida braces for a Category 5 storm, sixteen NASA Earth science satellites are gathering the relevant climate data while also monitoring air quality, the oceans, and hurricanes, among other areas of climatic concern. Some of the critical projects that demonstrate how NASA uses big data to achieve its goals are:

  • The Quantum Artificial Intelligence Laboratory (QuAIL) and the space agency’s quantum computers;
  • The agency’s supercomputer, Pleiades, which performs simulation and modeling;
  • Bulk storage of Earth science data in the Distributed Active Archive Centers (DAACs);
  • Cybersecurity of its networks and the Network Activity Cybersecurity Risk Assessment (NACRA);
  • Expert medical care and the Exploration Medical Capability (ExMC).

This article, however, focuses only on how NASA approaches its current use of big data.

How does NASA use big data to process and manage missions?

NASA’s management and processing of big data is evident in the MPCS (Mission Data Processing and Control System). The Curiosity rover recently used this system on its mission to Mars. During the expedition, MPCS drew on data from NASA’s Mars Reconnaissance Orbiter as well as the Deep Space Network to guide the Curiosity rover in real time. In earlier years, this process would have taken several hours or even days.

NASA also uses big data in navigation. NASA’s Flight Operation Team uses custom data visualizations, built by MPCS, to guide NASA teams during missions.

Data Storage

NASA has numerous active missions at any given time, from robotic spacecraft capturing and sending high-resolution images and other data from great distances to Earth-based projects that survey the ice at the poles or examine climate change around the globe. As one might imagine, the data generated by all these projects is staggeringly voluminous, and NASA stores most of it. For instance, the NCCS (NASA Center for Climate Simulation) is a huge storage facility by any standard. How huge exactly? It holds 32 petabytes of data out of a total capacity of 37 petabytes.
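As a quick sanity check on those NCCS figures, the short sketch below works out how full the store is and how much headroom is left. The function name `utilization` is hypothetical and used only for illustration; the 32 and 37 petabyte values are the only inputs taken from the text.

```python
def utilization(used_pb, capacity_pb):
    """Return (fraction of capacity in use, petabytes still free)."""
    if capacity_pb <= 0:
        raise ValueError("capacity_pb must be positive")
    return used_pb / capacity_pb, capacity_pb - used_pb


# Figures quoted above for the NCCS: 32 PB stored, 37 PB total capacity.
fraction_used, free_pb = utilization(32, 37)
print(f"{fraction_used:.1%} used, {free_pb} PB free")
```

In other words, the store would be roughly 86% full, with about 5 petabytes to spare.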

Currently, NASA manages several hundred petabytes of data once all the domain sciences and Earth disciplines are considered. Handling such astronomical amounts of data is routine for the space agency’s missions: a single project can collect hundreds of terabytes of data. NASA’s Goddard Institute primarily uses the information stored at the NCCS to carry out its everyday operations. The NCCS also has a 17 ft by 6 ft visualization wall that provides a high-resolution surface where scientists can present videos, animations, and images built from NCCS data.

As recently as three years ago, NASA was generating about 12.1 terabytes of data each day from numerous sensors and systems positioned across the globe and in space. As NASA upgrades its spacecraft with optical lasers to handle data transmissions that are larger and faster by a factor of about one thousand, some of the space agency’s missions are expected to generate as much as twenty-four terabytes of data every single day. So, how exactly does NASA manage to store all this data? Simply put, much as any organization’s IT department would: the agency estimates the volume of data it expects to generate and then plans its storage accordingly.
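The planning step described above is mostly arithmetic. Here is a minimal sketch of it, assuming decimal units (1 PB = 1,000 TB), a 365-day year, and a constant daily rate. The function names are hypothetical, and only the two daily rates come from the article.

```python
TB_PER_PB = 1_000      # assumption: decimal units, 1 PB = 1,000 TB
DAYS_PER_YEAR = 365    # assumption: leap years ignored


def annual_volume_pb(daily_tb):
    """Convert a daily data rate in TB into a yearly volume in PB."""
    return daily_tb * DAYS_PER_YEAR / TB_PER_PB


def years_until_full(free_pb, daily_tb):
    """Years before free_pb of spare capacity is used up at a constant rate."""
    yearly_pb = annual_volume_pb(daily_tb)
    if yearly_pb <= 0:
        raise ValueError("daily_tb must be positive")
    return free_pb / yearly_pb


# Daily rates quoted in the article: 12.1 TB today, up to 24 TB for some missions.
for rate in (12.1, 24.0):
    print(f"{rate} TB/day is about {annual_volume_pb(rate):.2f} PB/year")
```

Under these assumptions, 12.1 TB a day comes to roughly 4.4 PB a year, and 24 TB a day to roughly 8.8 PB a year. A call such as `years_until_full(5, 24.0)` then shows that a hypothetical 5 PB of spare capacity would last a little under seven months at the higher rate.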

To store the voluminous amount of data it collects, NASA has adopted a diverse storage system that includes a sophisticated cloud platform of the kind mostly used by giant commercial organizations such as Amazon and Google.

NASA also runs another data storage project, the Earth Observing System Data and Information System (EOSDIS). It is devoted to better understanding Earth’s surface and atmosphere, and it relies on satellite measurements to support informed decisions.

Within the space agency, one question remains: how is it equipping itself to handle data that grows by about ten petabytes per year? For many in the agency, machine learning algorithms and artificial intelligence solutions are expected to play an integral role.

Distribution and Archiving of Information

When dealing with data volumes as astronomical as NASA’s, it is not surprising to run into formidable challenges, including the fundamental big data question: what should we store?

In NASA’s case, not every bit of data received is stored. The trick lies in determining which data should be saved and which should be mined for useful insights and then discarded. At NASA, the chief objective of some big data projects is, essentially, to archive the data. In other words, the agency keeps the data it collects for the purpose of data stewardship. For instance, the data gathered from the agency’s Earth Observing System satellites and other field measurement programs is stored in NASA’s Distributed Active Archive Center (DAAC) facilities. Here, the data is processed, archived, and then distributed.

NASA also makes use of big data through the ASDC (Atmospheric Science Data Center). The ASDC, located at NASA’s Langley Research Center, is responsible for archiving, processing, and distributing NASA’s Earth science data.

The information from the ASDC is crucial in helping scientists understand atmospheric processes and the causes of climate change. ASDC insights can also help people understand the effects human actions have had on the climate over past years.

Another way NASA is leveraging big data is through the PDS (Planetary Data System). The PDS is responsible for archiving scientific information and presenting it through a single website. This system provides access to more than 100 TB of space models, telemetry, images, and other useful information gathered from planetary missions over the past 30 years.

Project Analysis

Some of NASA’s big data projects are carried out primarily to acquire data for analysis rather than stewardship. A good example from radio astronomy is the planned Square Kilometer Array (SKA), which comprises numerous telescopes positioned in South Africa and Australia for exploring the early stages of galaxy formation, the universe’s origins, and other mysteries. In this particular case, researchers at NASA are more interested in running multiple analyses on the data than in simply storing it in the agency’s systems.

Another example where the agency acquires data specifically for analysis is the US National Climate Assessment. This is a federal climate-change research project whose principal role is to generate more accurate measurements of snow-covered areas, including regions where black carbon, dust, and other pollutants affect how satellites view the snow.

NASA’s Pleiades supercomputer taps into big data to help analyze complex projects such as comprehensive space shuttle designs, solar flares, and space weather. Recently, it was used to evaluate large amounts of star data gathered by NASA’s Kepler spacecraft. Through this analysis, NASA was able to discover Earth-size planets within the Milky Way galaxy.

In addition, this supercomputer helped in the development of the Bolshoi cosmological simulation, which models how galaxies and large-scale structures evolved. And lest we forget, at least 1,200 people across the US depend on Pleiades to solve large and complicated calculations.

In conclusion

NASA uses big data well beyond the functions highlighted here. In fact, NASA is arguably the world’s leading user of big data. But despite the honors, it is vital to note that the agency is still in its infancy when it comes to exploring big data. And given the enormous strides big data has already helped NASA make, we can only imagine the endless opportunities ahead.