The Google Docs Case: When the Human Factor Compromises Privacy Settings
It is hardly a revelation that almost anything in the digital environment can act as a vacuum for our private data. Facebook sells your data to advertisers, Google reads your email, and your smartphone works as a beacon able to disclose your location at any moment with high accuracy. We seem to have accepted such transparency as a matter of course in these days of a Great Digital Eye, lidless and omnipresent.
In most cases, we cannot afford to give up email and smartphones. For much of the "always connected" population, a ban on their Facebook account feels like either imprisonment or the death of a friend. The workflows of numerous businesses have been reorganized around online collaboration in the cloud. And the millions of dollars the corporate sector spends on digital security are spent for a reason.
Nevertheless, we stay almost calm when yet another report about a massive data leak appears in the media: hackers will keep hacking, and there is nothing to be done. We tend to tune such news out as long as our own private information remains untouched. Yet we become frustrated when the ordinary mode of interaction with some digital service undergoes even a slight change. In many cases, we ignore the new rules and regulations that content giants issue when they update their privacy policies. Reading pages of fine print is boring unless you are a lawyer.
Google warned us beforehand
In 2009, Google announced that search engines would index public documents from Google Docs in their results. At the time, there was almost no reaction from the wider audience, since people preferred to keep their documents offline. Indeed, until quite recently, Microsoft Office, backed by our desktop hard drives, reigned almost supreme in document workflows. After nearly a decade of documents migrating from hard drives to the cloud, however, Google's warning gained a certain resonance. In early July 2018, people began finding quite sensitive private data among the Google Docs indexed by Google, Yandex, Yahoo, and other search engines. And it was no hack at all; the security capabilities of Google Drive were not compromised. What Google warned us about in 2009 simply happened: improper sharing settings started playing tricks on neglectful users of Google Docs. How significant might this incident be for the future popularity of Google Docs in particular and of cloud services in general? And what should we do to avoid a sorrowful self-induced data leak?
Precedents are now available
As Meduza, one of the popular CIS news portals, reported, many Russian-speaking internet users began discovering a lot of very peculiar documents through Yandex and Google starting July 5. Journalists joined the "private secrets safari" when the first astonished reports from disconcerted users appeared online. It soon turned out that neither a targeted media operation nor a leak from hackers was the cause of the excitement. Random users had simply found documents that should never, under any circumstances, have appeared in public access, and nothing but conventional search engines were used in the search. Both Google's indexing rules and users' carelessness made the "leak" possible. From a purely technical perspective, any document from Google Docs can be identified as public on the internet if its sharing settings allow it. Hence, the simplest and briefest recommendation for users in such a case is "watch your settings". However, the problem has some special aspects behind the simple explanations.
What Yandex keeps silent about
Since the majority of the "leaks" surfaced through Yandex search, the company was asked to explain how Yandex could access documents that hardly belonged to the open part of the internet, i.e., the part accessible via hyperlinks without a login and password. For example, the disclosed secret instructions regarding gender and nationality discrimination against clients of one Russian bank could hardly count as publicly accessible documents. They were obviously part of internal bank correspondence addressed to quite a narrow circle of the bank's officers. Yandex explained nothing definite in that regard. The company assured journalists that Yandex complied with the robots exclusion standard, which prevents the indexing of websites where the standard is applied. In simple words, the standard tells web crawlers what to scan and what to ignore on the internet. Since Google Docs uses the robots exclusion standard, it should have been impossible to index private documents under any scenario. Such an explanation, which actually explains nothing, suggests that quite a voluntarist approach takes place when search engines, along with content giants, decide which information is publicly accessible and which is not.
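To make the mechanism concrete, here is a minimal sketch of how the robots exclusion standard works, using Python's standard-library parser. The robots.txt content below is a hypothetical example for illustration, not Google's actual file.

```python
# Demonstrates the robots exclusion standard: a site publishes rules
# in robots.txt, and a well-behaved crawler checks them before fetching.
# The rules below are hypothetical, not taken from any real site.
from urllib.robotparser import RobotFileParser

# Rules a site might publish at https://example.com/robots.txt:
robots_txt = """User-agent: *
Disallow: /private/
Allow: /public/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler consults the rules for every URL:
print(parser.can_fetch("*", "https://example.com/public/report"))   # True
print(parser.can_fetch("*", "https://example.com/private/report"))  # False
```

The catch, of course, is that the standard is purely advisory: nothing technically stops a crawler from ignoring the file, which is why "we comply with robots.txt" explains little about how a private document ends up in a search index.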
Corporate complicity or the human factor?
Assume your corporate documents containing commercial secrets appeared on the internet. Moreover, no hacking happened. Assume it turned out that something was wrong with the sharing settings of the documents. What a shame that staff who had been working with Google Docs for years made such a primitive mistake. And any individual's personal guilt can hardly be proved in full. What would you do in this case? Would you sue Google? It seems pointless, because Google would have a concrete counter-argument: millions of users all over the world have no problems with our Docs; watch your settings!
What is the deep, fundamental problem behind the situation? How do you find a reliable antidote to the quite plausible paranoia of continuously watching your staff, who in turn must monitor and recheck document settings?
One way or another, it seems the notorious human factor is responsible for the effects described above. It is hardly reasonable to suspect some multi-stage corporate conspiracy in which Google confidentially grants search engines temporary access to particular documents in the hidden interests of some secret entities. Technically speaking, such a conspiracy cannot be excluded, but even then, universal human weaknesses would still be behind the scenes.
The root of the problem
The efficiency race is gaining momentum in the contemporary business environment. Since an absolutely exclusive winning technology is almost impossible in these days of the knowledge society, only improved efficiency can help you succeed in commercial competition. Booming robotization, as well as total automation, aims at making all production and business processes run faster and more effectively. And the smaller the human factor involved, the better the results that can be achieved. Inanimate workers have none of the emotions that so often lead humans to stupid errors and odd mistakes. Even at a similar production rate, human-free assembly lines are more cost-effective for manufacturers than human workers, simply due to the error-free mode the machines can constantly maintain. Hence, the current development of AI and machine learning is grounded in a very pragmatic quest for higher efficiency. Which technology could exclude the negative aspects of the human factor from such a sensitive sector as data security?
Immutable ledgers against leakage
The technology capable of excluding both human errors and targeted attacks on sensitive data was invented a decade ago. In fact, there are several technologies of this kind, known under the common name of Distributed Ledger Technology (DLT), among which blockchain is the most popular. It so happens that blockchain is best known in connection with cryptocurrencies. In addition to blockchain, a platform such as Hashgraph, along with a protocol such as IPFS, can address various data security issues. The common feature of all distributed ledger technologies is cryptographically secured control over files with no central entity in charge. In other words, everything that happens to a piece of information in such a system relies on a consensus of different users whose rights are distributed in one manner or another. For example, a financial transaction is accepted as valid when a group of independent users comes to a joint conclusion about it through a certain type of consensus. Despite the huge number of cases to which this working algorithm can be applied, the core advantage of DLT is a crucial minimization of the human factor wherever data immutability is important. Besides, the very essence of DLT as a software phenomenon comes down to the definition of decentralized applications:
Decentralized applications are a radically new form of software that enables services no single entity operates.
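The immutability that makes DLT interesting here comes from a simple idea: each record carries the cryptographic hash of the previous one, so changing history breaks the chain. The following is an illustrative toy, not any production blockchain; real systems add consensus among peers, digital signatures, and networking on top of this linking.

```python
# A toy hash-linked ledger showing why tampering is detectable.
# Not a real blockchain: consensus, signatures, and networking omitted.
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, data: str) -> None:
    """Link a new block to the hash of the previous one."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "data": data})

def chain_is_valid(chain: list) -> bool:
    """Any change to an earlier block breaks every later link."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

ledger = []
append_block(ledger, "Alice pays Bob 10")
append_block(ledger, "Bob pays Carol 4")
print(chain_is_valid(ledger))   # True

ledger[0]["data"] = "Alice pays Bob 1000"  # someone quietly "fixes" history
print(chain_is_valid(ledger))   # False: the tampering is detected
```

Because every copy of the ledger can rerun this check independently, no single careless or malicious participant can silently rewrite a record, which is exactly the property the human factor keeps breaking in centralized systems.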
Sometimes sh*t happens
Imagine a document that has to be created by a group of users over a certain period. A lot of alterations, editorial changes, and various modifications are made while the teammates are engaged in collaborative processing of the document. This is an ordinary workflow that many of us know well thanks to such collaboration software as Google Docs and Trello, for example. The document is created and stored on a hard drive somewhere in the data center of a cloud service provider. Shared access to the document makes it partially public. Nothing prevents us from imagining a situation in which one of the users changes the document's settings to make it fully publicly accessible, whether through a misuse of the settings or with deliberate intent. Yet another situation, in which the file is damaged or becomes inaccessible for purely technical reasons on the data center's hardware, cannot be excluded either. But if the file is sharded into pieces, with each cryptographically protected piece distributed among numerous computers in a peer-to-peer network, hardly any human-factor-induced error can lead to either a data leak or inaccessibility of the file. Thus, distributed control over information enables another level of data security, where privacy settings are far less dependent on the human factor.
Why we all aren’t there yet
This begs the question of why world content giants such as Google have not yet adopted DLT for their services. To find an adequate answer, we should consider the entire digital environment, where centralized and decentralized applications coexist today. The general idea is that they are not mutually exclusive. After a decade of vigorous development, cryptocurrencies have not replaced fiat currencies. The hundreds of decentralized applications available now cannot kill the thousands of their centralized analogs. Numerous IoT developers use blockchain in their software solutions, but many do not. This technological pluralism is possible due to the large diversity of final purposes that various kinds of software serve. Vertically integrated organizations such as national banking systems and governmental bodies face numerous operational constraints when a "no-central-entity" application is to be adopted. Content giants and leading cloud providers such as Google, Facebook, and Amazon could easily lose their leadership if content were distributed within peer-to-peer networks. This is why, by the way, we are still using Web 2.0 instead of the fully distributed Web 3.0 whose arrival is so eagerly anticipated by many blockchain proponents. In other words, far from all of us are ready to accept the approach proposed by IPFS, where "the entire World Wide Web can be considered as one torrent file that everyone shares".
There's no point asking the developers of Google Docs why they ignore the capabilities of DLT. They are professionals, and every professional, even an enthusiastic proponent of blockchain, should admit, hand on heart, that in comparison with their centralized analogs, decentralized applications:
are more expensive;
are less scalable;
have a poorer user experience;
have ambivalent governance;
show worse performance.
Moreover, every true professional most probably realizes that the above-mentioned drawbacks of decentralized applications stem neither from their immaturity nor from improper development. Lightning networks, bigger (or smaller) blocks, forking, sharding, self-amending, or any other technical solutions can hardly change the characteristics of blockchain-based apps in any critical way. This is so because of the primary design of every decentralized application, in which decentralization, censorship resistance, and a crucially minimized human factor are prioritized over any other capability.
Is the status quo here to stay?
The broad world audience will not give up cloud-based centralized software, at least over the medium term. However odd it may sound, the network effect so highly appreciated by blockchain proponents works well far beyond DLT. Google Docs can hardly suffer too much from the private data leak described above, because millions and millions of users throughout the world have been using the cloud service for years without significant problems. Most probably, the majority of Google Docs users worldwide will regard the incident as a single disconcerting setback rather than an alarming trend.
Does this mean that a decentralized analog of Google Docs has no chance to compete with the content behemoth? It depends, after all, on the end customers of such an application. The multifaceted environment of contemporary digitization includes numerous groups, cohorts, and companies ready to sacrifice speed, scalability, and cost in favor of the immutability that DLT gives their software. Wherever errors and leaks caused by the human factor threaten the very existence of a business, a relatively low-performance but highly decentralized software solution can become a silver bullet. Hence, this is about striking the proper balance between what you value most and what one or another software developer offers you.
Who can help you make the right choice
When someone is hesitating between a centralized service and a decentralized application, the conflicting recommendations of developers from both camps can only add extra hesitation. This is quite natural, since every argument can be counterbalanced with a relevant counter-argument. But truly valuable advice can come from a software development company that feels equally at home in both realms. In other words, only a developer whose self-interest does not depend on a particular type of application is worth listening to. Something is wrong with a developer who always tries to convince a customer to choose a decentralized application under any circumstances. The ability to grasp a customer's business goals without regard to any favorite technology distinguishes a true professional from the rest. Both the project portfolio and customers' feedback can suggest whether a development company deserves attention when it comes to such sensitive issues as human factor effects and data security.