Data Segregation: The Missing Piece in Securing Enterprise Content

This is how customers describe the Data Segregation problem to me.

“We have millions of documents sitting in enterprise application servers that we know are accessible to all our users. We are subject to regulations that require us to identify classified documents within these large sets of data and segregate them into restricted locations. It is also important for us to restrict users from storing documents in the wrong locations. This data is not static; it is constantly being created, modified, duplicated, and incorporated into other documents. This isn’t a snapshot problem; it is a problem subject to continuous organic growth, especially in a global company. We are not sure how to segregate this data and put it into the right physical repositories. This project is so complex, we do not even know where to start.”

This is the problem of Content Segregation. We would all agree that Enterprise Content Management applications are very important in the day-to-day operations of the business; they are an essential part of the global collaborative business process. However, these applications have not adequately addressed the need for content segregation. Enterprise and Security Architects are looking for more sophisticated ways to secure and manage the data that is created, stored, and shared in these applications at an exponentially growing pace.

Many organizations are fearful of, or prohibited from, placing data in certain physical locations or cloud storage services due to restrictions on data access or compliance with business or industry regulations. These are often referred to as Data Residency or Data Sovereignty regulations.

For example, in the US, ITAR/EAR-regulated data cannot be stored, backed up, or transferred through a server physically located outside the US. Similarly, European data protection laws restrict the movement of personal data outside the European Union (EU) or even outside specific country borders.

These regulations are different from the well-known and well-understood access control rules. The problem these data owners and security architects are facing is not about access controls, but about the physical storage of data when it is created, the caching of data when it is accessed, and the storage of data in transit.
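To make the distinction concrete, here is a minimal sketch in Python of what a residency check looks at, as opposed to an access check. Everything in it is an assumption made for illustration: the classification labels, the region names, the `ALLOWED_REGIONS` table, and the document fields (`stored_in`, `cached_in`, `transits`) are hypothetical and do not describe any particular product or regulation text.

```python
# Hypothetical residency policy: classification label -> regions where
# copies of the data are allowed to reside. Labels and regions are made up.
ALLOWED_REGIONS = {
    "itar": {"us"},
    "eu_personal": {"eu"},
    "public": {"us", "eu", "apac"},
}


def residency_violations(documents):
    """Return (doc_id, location_kind, region) for each copy outside its allowed regions."""
    violations = []
    for doc in documents:
        # An unknown classification gets an empty allow-list, so every
        # location it appears in is flagged for review.
        allowed = ALLOWED_REGIONS.get(doc["classification"], set())
        # Check every place a copy can live, not only the primary store.
        for kind in ("stored_in", "cached_in", "transits"):
            for region in doc.get(kind, []):
                if region not in allowed:
                    violations.append((doc["id"], kind, region))
    return violations


docs = [
    {"id": "spec-001", "classification": "itar",
     "stored_in": ["us"], "cached_in": ["eu"], "transits": ["us"]},
    {"id": "hr-042", "classification": "eu_personal",
     "stored_in": ["eu"], "cached_in": ["eu"], "transits": ["eu"]},
]

print(residency_violations(docs))
# prints [('spec-001', 'cached_in', 'eu')]
```

Note that the check never asks who the user is. The first document is stored in the right place and may be opened only by authorized users, yet it is still flagged because a cached copy sits in the wrong region.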

Enterprise and Security architects are often asked to implement solutions to make sure unauthorized users cannot access or use classified, sensitive data.

Most content management applications, such as SharePoint, address security concerns through features like access controls, rights management, and audit logs. However, these controls only work to restrict users from accessing or using content, and try to provide a trail of any such access. They don’t prevent the user from storing or caching the data on an unauthorized physical server. They are also limited in visibility to their own system; when documents move outside an application like SharePoint, the management chain is often broken.

Other applications provide data encryption or tokenization. These options try to get around the issues of data security, residency, and privacy by obfuscating the data that goes into the servers. These techniques mask the content from end users, but the data still sits on the same physical servers, so they do not address data residency requirements.

That leaves us with the following questions:

  1. Are data residency and content segregation a challenge for us?
  2. How do we make sure our data is stored and cached in the right place?
  3. How can we identify and segregate data automatically?

At MinerEye, we have harnessed AI technology to automate the identification of sensitive data, in both legacy and day-forward data, and to monitor data segregation based on continuous automated categorization of data. The system supports users in defining the relevant policy based on its findings of data sovereignty issues.