Algorithms of Big Data Processing as a Trump Card in a New De Beers of IT (Part I)
Subject: why not DBMS?
Good morning and thank you very much for your comments regarding our project.
Two points remain unclear, however:
why shouldn’t we stick to the well-beaten track of the classic commercial database management systems? Data is data regardless of the volume or the type of a database (big data, “not so big” data, etc.). Please comment;
your analogy with diamonds is beyond my experience. I’ve heard about De Beers, of course, and I can guess they have also rushed into digital transformation like many other “non-digital” companies throughout the world. But I feel you meant something different with regard to software development, something closer to diamonds as such. Please clarify.
Subject: Diamonds and BD
My young friend, it was a little weird that your advanced geeky team of outsourced developers could not recognize the actual scale of the challenge. Your client is expecting to get ready-to-use SQL tables from you. Tell me, how are you going to cope with terabytes of their dark data? How will you extract value from the huge mass of their unstructured text documents? You are an app maker, right? So, which tool are you planning to offer your client to monetize that information? By the way, do you know what dark data is by definition?
Scientists from Stanford liken it to dark matter. Dark data is a great mass of scattered information that lacks structure and therefore cannot be processed by conventional software (I hope that is clear to you as a software development company). Those guys know what they are talking about, since their DeepDive data management system extracts, processes, and integrates valuable information from dark data in a variety of domains, from genomics to human trafficking.
Make an effort, my friend, and switch on your imagination. The analogy with De Beers is a fitting way to grasp the idea. Imagine a giant pile of loose rock taken out of a kimberlite pipe during diamond mining. The loose rock stands for the billions of signals generated by the vast variety of computers and sensors embedded in various IoT devices, for example. The dump trucks moving that loose rock are the networks that transfer data through the Internet. Your rich experience in web development should help you picture it. Rough diamonds are hidden in the loose rock just as valuable information is buried in the mass of dark data. DeepDive and similar systems are aimed at extracting the valuable data diamonds from the loose rock of dark data. And the Big Data processing algorithms can be likened to the operators driving the extraction systems somewhere in a De Beers diamond mine.
Your client does not need rock refuse; he needs rough diamonds. Think it over with your software development team. And don’t laugh at my old-guy metaphors, please.
Subject: Diamond cutters for Dark Data
Dear Mr. Rootlord,
Thank you very much for your message. We don’t mean to laugh at your suggestions. The episode with those bank frameworks, when your COBOL skills saved our mobile development project, actually set many of us straight. To tell the truth, I’m trying to get our Indeema management to focus on working out a different approach to data mining and Big Data processing that is adequate for the present project. I showed them your previous message in order to explain the dark data concept. Besides, one of the recent big deals, in which Apple purchased Lattice Data (a DeepDive-type platform capable of putting dark data in order by means of machine learning) for $200 million, pushed our managers to take the concept seriously.
To put it in your words, Apple hired professional (and expensive!) diamond cutters to polish the rough diamonds of Apple’s dark data.
Although I found your analogy with diamonds very illustrative, one aspect of it still confuses me. Why do you equate processed big data with rough diamonds instead of finished ones? I don’t want you to think we are wasting time on trifles, but a deeper understanding of your metaphor can help me persuade our outsource development managers more effectively. Please explain.
Subject: Finished diamonds & actionable insights
My dear friend, I’m glad to know that your leaders proved flexible enough to accept the obvious :). With regard to the price that digital giants are ready to pay for Big Data processing, you are right: the diamond cutters are expensive. Especially when they involve artificial intelligence and machine learning technologies. The Big Data analytics industry is worth more than $120 billion. And it is growing because businesses are looking for a miracle capable of cracking the code of the latent patterns hidden inside disparate, messy, and diverse data in order to get “actionable insights”. The Kaggle platform (about 500K data scientists have joined it since 2010) is rumored to be acquired by Google (the loudly ringing Industrial Internet of Things makes Google get moving). The transaction amount is kept secret, but I would guess it exceeds the $12.5 million Kaggle has raised from investors since its launch in 2010.
As for the rough diamonds from my metaphor, it’s easy. The end value a business can get from any kind of Big Data processing is the well-worn concept of “actionable insights”: the ideas and decisions capable of elevating the business above average. Such insights are the finished diamonds. So, it’s more accurate to say that the business executives themselves should act as the diamond cutters, polishing their processed Big Data in order to achieve a “business epiphany” by means of data analytics. That’s why I call data processed with even the most advanced algorithms only rough diamonds. Applying my analogy with De Beers, Kaggle and Lattice Data can be likened to the “extractors” able to effectively separate rough diamonds from rock refuse. It’s up to Google and Apple to decide what to do with the rough diamonds of the processed data: to polish them into “actionable insights” or to bury them back in the rock refuse. I hope you see the difference.
Subject: Our Decision
Dear Mr. Rootlord,
Thank you very much for your explanation. Now everything is clear to both our software development team and me. In order not to bore you with my questions, I’d just like to inform you of the final decision we’ve made. We are diving deep into Big Data algorithms, being fully aware that it is inevitable. Time is money, so we cannot sit still. And the technology we have chosen is Hadoop. I hope you endorse the selection.
Subject: Too hasty to be correct
Goddamn, my boy, this is my fault! …
Part II explains why swift decisions about big data can lead to a mess rather than to order.