You’re a cloud-based company: your entire business model relies on cloud data, and your cloud environment is precious to you. That’s why you pour resources into protecting that environment from the outside world, making sure the perimeter between your cloud data and everything beyond it is airtight. Understandable.
But what if I told you that your real goal should be protecting the crown jewels themselves, that is, the data?
Do you even know where your data is stored? Your sensitive data? Where it flows and who can access it? If you’re unaware of your sensitive data’s existence and can’t follow its actual and potential data flows, it simply cannot be protected. If you answered even one of the questions above with ‘no’ or ‘maybe’, you need to read on about DSPM, urgently.
Getting Familiar With Cloud Data Security Problems
Organized Data vs Data Chaos
Until recently, most companies managed their data with a centralized data architecture, typically paired with the traditional approach of a single database shared across all services. Data was maintained and secured by one main entity within the organization, usually the security or data governance team, while all of it was ‘dumped’ into one place. Developers who wanted to create new data stores needed that entity’s approval, ensuring nothing fell through the cracks. This methodology made life much easier for data, compliance and security teams in terms of data management.

Shift Left in Data Creation
Fast-forward a few years, and many companies, especially cloud-first companies handling large volumes of data, are rapidly transitioning to a microservices approach, which is usually paired with a decentralized data architecture. In practice, that means any developer with the appropriate permissions can create their own data stores with little to no supervision. The ownership of data management and security has shifted from one main entity (centralized) to each developer managing their own data store (decentralized). This creates a situation we define as Data Chaos. Why chaos? Because developers can now spin up their own data stores (e.g. managed S3, RDS, or unmanaged MySQL, PostgreSQL, MongoDB) so quickly that it’s becoming impossible to keep track of the full picture: data flows between applications, services, cloud-native users, third-party vendors and even countries. On top of that, cloud applications produce an enormous amount of byproduct data, which contributes further to the Data Chaos phenomenon. Add the fact that developers are not trained to deal with data security and compliance, and you get exactly that: chaos.
Known Data vs Shadow Data
Nowadays, cloud service providers offer relatively basic data management solutions for data sources (e.g. RDS, S3, DynamoDB, etc.). While these solutions let organizations see how many data stores they currently have, they provide no information about the types of data residing in them or whether that data is sensitive. This lack of visibility creates shadowed areas within your data stores: unknown data, such as unused or abandoned stores created by developers, along with application byproducts. This unknown data is what we define as Shadow Data.
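To make that gap concrete, here is a minimal sketch, with entirely hypothetical store names and a deliberately simplified pattern set (not any CSP’s actual API): an inventory tells you which stores exist and how big they are, while knowing whether they hold sensitive data requires sampling the contents and classifying them.

```python
import re

# Hypothetical inventory, as a basic CSP console might report it:
# store names and sizes, but nothing about the data inside.
inventory = {
    "orders-db-backup": 512,   # size in MB
    "marketing-exports": 48,
    "tmp-etl-scratch": 1024,
}

# What classification adds on top: scan sampled records for
# sensitive patterns (just emails and card-like numbers here).
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify(sample_records):
    """Return the set of sensitive data types found in the records."""
    found = set()
    for record in sample_records:
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(record):
                found.add(label)
    return found

# Fabricated rows sampled from one of the stores above.
sample = [
    "order 1143, shipped",
    "contact: jane.doe@example.com",
    "card on file: 4111 1111 1111 1111",
]
print(sorted(classify(sample)))  # -> ['card_number', 'email']
```

Real DSPM tools use far richer classifiers, but the principle is the same: the inventory alone would never have told you which of those three stores holds personal or payment data.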

The emergence of highly distributed cloud-native apps based on microservices, containers, and serverless functions has brought the issue of shadow data even more to the forefront, as decentralized workload-based data stores are a major contributor to data sprawl.
The problem: since no one within the company knows this data exists, it isn’t monitored, leaving a potential backdoor for hackers and a source of compliance violations alike.
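A first step toward surfacing shadow data is simply diffing what is actually running against what is known and monitored. A toy sketch follows; the store names are made up, and in practice the discovered list would come from scanning cloud accounts and workloads rather than a hard-coded set:

```python
# Stores the security/governance team knows about and monitors
# (hypothetical names, e.g. from an asset inventory or tagging policy).
known_stores = {"orders-rds", "customers-s3", "analytics-redshift"}

# Stores actually discovered in the cloud account, e.g. by enumerating
# managed services and self-hosted databases running on workloads.
discovered_stores = {
    "orders-rds",
    "customers-s3",
    "analytics-redshift",
    "tmp-mongo-on-ec2",    # unmanaged MongoDB spun up by a developer
    "etl-scratch-bucket",  # pipeline byproduct, never cleaned up
}

# Shadow data candidates: discovered but unknown, hence unmonitored.
shadow = discovered_stores - known_stores
print(sorted(shadow))  # -> ['etl-scratch-bucket', 'tmp-mongo-on-ec2']
```

The hard part, of course, is building an accurate `discovered_stores` set in the first place, which is exactly what DSPM tooling automates.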