Among the plethora of disturbing news, some "long-term" topics easily get lost. Some days ago I realized that I had too many files in my backups.
I have since deleted what I believe I won't need any more -- or those things I had forgotten even existed. It took me several days. While I waited for the files to be deleted, I pondered: How big is really big data?
The German weekly Die Zeit revealed in 2015 that the German Federal Intelligence Service (BND) rakes in telecommunication metadata from 220 million phone records worldwide every day and passes them on to its U.S. counterparts, the National Security Agency and Central Intelligence Agency. The German agency states that it keeps the data for "only" half a year [1]. Tens of thousands of people are employed for this kind of nontargeted mass screening.
In this era of artificial intelligence, there is a collecting and archiving mania. Today, everything in radiology has to be archived. Data do not really age, we are told, although in reality they do, and data storage carriers do too -- rapidly so.
Suppose we are in the year 2040. Google has finally been broken into 50 smaller, independent companies by antitrust authorities. Because of the ludicrous amount of data, a Google search no longer shows any data or publications created before 2020. If you published a paper in 2014, it's lost in the cloud -- if there still is a cloud.
If you haven't paid your cloud fees, your data pool has gone anyway. Or perhaps somebody has accessed your data, processed them for purposes unknown to you, or altered them. Perhaps your data have been destroyed without your knowledge. Whom can you trust? Nobody.
There is another problem with the cloud. The term sounds rather pleasant: white, puffy clouds in front of blue skies; the perfect picture selling a green and clean environment. But these facilities for data storage, data crunching, or, often, data cemeteries are definitely neither clean nor environmentally friendly; they are not sustainable. On the contrary, they need an outrageous amount of energy for the servers and for cooling and air conditioning.
In addition, the wide-scale potential of online banking, social networking, e-commerce, e-government, information processing, and other services results in server workloads nobody had previously imagined.
Then this question arises: Once we have placed our trust in a cloud provider, are we then completely at its mercy? It remains a fact that you place your data into the hands of strangers. What can we do against dependence?
Cloud computing can be an incalculable risk. Of course you can keep your data under your control if you don't want to hand them over to the big monopolies. However, which hospital, which private radiology office, has the capacity and the financial resources to store all image and written data for 30 years? Handing out copies of the images on CDs to the patients is also impractical because CDs are not a reliable storage medium.
The explosion of data is matched by a growing ignorance of how the data came into being. We have more and more information, but less and less information about the information itself. How do you sort out data garbage? Old formats are no longer readable. People create enormous archives of digital content, but after a short while they don't know what's inside.
I have had the unpleasant experience of not being able to read images made in scientific studies 30 years ago: They were stored on magnetic tapes, then on floppy disks, later on diskettes, then on CDs, and then on USB sticks or hard disks. The half-life of digital media carriers is getting shorter and shorter.
Just think of a CD-ROM or a VHS cassette. They are significantly less resistant to aging than books, and the data can no longer be read after just two to three decades.
Moreover, there is no software that can decipher the early image formats. This holds true not only for images but also for text files. For instance, Adobe PageMaker was a leading layout program for publications, among them scientific papers and books. In the meantime, Adobe has discontinued the PageMaker format; it can no longer be deciphered today.
Future generations will suffer from a kind of digital amnesia because old formats are no longer readable. Will they have to return to printed books?
There are only unlikely or unappealing solutions -- thus, the topic will be adjourned sine die, which means indefinitely. Let's shoot it into the cloud to be processed there.
References
1. Biermann K. BND stores 220 million telephone data – every day. Zeit Online (in English). 2 February 2015. https://www.zeit.de/digital/datenschutz/2015-02/bnd-nsa-mass-surveillance. Accessed 10 April 2020.