In the popular Grimms’ fairy tale, Hansel and Gretel are abandoned in a dark forest where a wicked witch lies in wait. Quick-thinking Hansel leaves a trail of breadcrumbs so they can retrace their path back home.
A bit like Hansel and Gretel in the forest, a company’s data leaves a trail of breadcrumbs—metadata—to record where it came from and where it’s going. In data management circles, this technique is called data lineage, and it can help enterprise data management professionals “get out of the woods.”
Data Lineage Meets DLP
Data lineage traces the journey of an organization’s data as it flows from its origins through its IT systems, tracking how and where it’s used, moved, and stored along the way. It allows businesses to monitor their data with precision, regardless of the various transformations it undergoes during its lifecycle.
Data lineage has emerged as an essential tool in modern data loss prevention (DLP) professionals’ toolkits. That’s because companies’ DLP efforts too often fall short due to blind spots: it’s unclear what data the organization needs to safeguard and how that data is being used. Only with a solution that traces how data is actually used can IT teams:
- Define what’s risky for their organization
- Enforce actions to protect their data
- Investigate unsafe or malicious activity
- Educate users to handle data better
In short, data lineage can mitigate the risk of data loss, compromise, or theft and help businesses avoid falling afoul of increasingly complex compliance regulations.
Let’s look at some ways in which data lineage enables IT professionals to tackle some of today’s most pressing DLP challenges head-on:
3 Use Cases that Prove Data Lineage is the Future of DLP
1. Balancing the AI Productivity-Risk Equation
The scale of productivity gains that generative AI brings to employees in all roles is matched only by the level of new risk it introduces to confidential company information. Generative AI tools heighten the potential for sensitive data exposure because some models may incorporate user input into their training data, where it can later surface in output generated for users outside the company.
Businesses need to strike the delicate balance between capitalizing on the productivity benefits promised by the use of AI tools and safeguarding their company’s confidential data. With new flavors of AI launching almost every day, it falls to IT teams to craft a security approach that can keep up in understanding and controlling AI’s usage in the enterprise.
Until recently, security products only recognized and protected a limited range of data types as they relied on finding patterns in the content itself. Fortunately, data lineage tools can analyze billions of events surrounding every piece of data to better understand and classify it, allowing for protection of a much broader range of sensitive data in any form, anywhere it goes.
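To make the contrast with content-pattern matching concrete, here is a minimal Python sketch of lineage-based classification. It assumes a heavily simplified event model in which every export or copy records a source and a destination. The names `LineageEvent` and `propagate_labels`, along with the sample sources and labels, are hypothetical and do not reflect any particular product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEvent:
    """One hypothetical lineage event: data moved from source to destination."""
    source: str
    destination: str
    action: str  # e.g. "export", "copy", "screenshot"


def propagate_labels(events, seed_labels):
    """Carry sensitivity labels from known origins to everything derived from them.

    `events` must be in chronological order. `seed_labels` maps a known
    origin to its label, e.g. {"salesforce:accounts": "customer-data"}.
    """
    labels = {name: {label} for name, label in seed_labels.items()}
    for event in events:
        inherited = labels.get(event.source, set())
        if inherited:
            labels.setdefault(event.destination, set()).update(inherited)
    return labels


# Illustrative events only; real tools would collect these automatically.
events = [
    LineageEvent("salesforce:accounts", "accounts.csv", "export"),
    LineageEvent("accounts.csv", "chart.png", "screenshot"),
    LineageEvent("lunch-menu.docx", "menu-copy.docx", "copy"),
]
labels = propagate_labels(events, {"salesforce:accounts": "customer-data"})

# chart.png has no text for a content scanner to match, yet it inherits the label.
print(sorted(labels["chart.png"]))
# The unrelated document never touched a sensitive source, so it stays unlabeled.
print("menu-copy.docx" in labels)
```

In this sketch the screenshot is classified because of where it came from, not because of what a scanner can read inside it, which is the property that lets lineage cover formats such as images or design files.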
These mechanisms enable out-of-the-box visibility and control over sensitive data flowing to and from generative AI applications. IT teams can share these insights with business leaders and co-create company policies to govern the responsible usage of AI in the workplace.
The best tools accurately classify sensitive data and identify unsafe activity in real time. This helps IT teams configure policies that block the pasting of sensitive data (for example, to ChatGPT) while allowing non-sensitive data to pass through.
Look for solutions that include proactive user coaching and guidance in the form of customizable messages to:
- Alert employees to the risks of pasting sensitive data
- Direct them to approved alternatives
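As a rough illustration of how such a policy and its coaching message might fit together, the Python sketch below evaluates a paste event against a small rule set. Everything here is an assumption made for the example: the `PasteEvent` and `Decision` types, the placeholder destination `genai.example.com`, the label names, and the message wording. A real DLP product would expose its own policy configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PasteEvent:
    user: str
    destination: str   # domain receiving the paste
    labels: frozenset  # sensitivity labels attached to the pasted data


@dataclass(frozen=True)
class Decision:
    allowed: bool
    coaching_message: Optional[str] = None


# Hypothetical policy settings; replace with your organization's own values.
GENAI_DESTINATIONS = {"genai.example.com"}
BLOCKED_LABELS = {"source-code", "customer-data"}
APPROVED_ALTERNATIVE = "the company-approved AI assistant"


def evaluate_paste(event: PasteEvent) -> Decision:
    """Block sensitive pastes into generative AI tools; allow everything else."""
    if event.destination not in GENAI_DESTINATIONS:
        return Decision(allowed=True)
    matched = sorted(event.labels & BLOCKED_LABELS)
    if not matched:
        return Decision(allowed=True)
    message = (
        f"Paste blocked: this content is labeled {', '.join(matched)}. "
        "Sharing it with external AI tools can expose confidential data. "
        f"Please use {APPROVED_ALTERNATIVE} instead."
    )
    return Decision(allowed=False, coaching_message=message)


blocked = evaluate_paste(
    PasteEvent("alice", "genai.example.com", frozenset({"source-code"}))
)
allowed = evaluate_paste(
    PasteEvent("alice", "genai.example.com", frozenset({"public"}))
)
print(blocked.allowed, blocked.coaching_message)
print(allowed.allowed)
```

The decision carries the coaching text alongside the verdict, so the same rule that blocks the paste also explains the risk and points the employee to an approved alternative.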
2. Anticipating and Closing Compliance Gaps
Industries where data privacy is of critical importance, such as the healthcare, financial, and legal sectors, face a raft of compliance requirements that translate to the need for robust DLP solutions.
In the year ahead, expanding global regulations on data governance will heighten the need for DLP. Businesses in all industries will need to sharpen their compliance practices to shield sensitive data and adjust their toolsets so they can cover any new compliance standards that emerge. Increasingly, confidentiality will need to be a concern for every document and communication, even as they circulate within the organization.
Data lineage lays the essential policy and control foundations necessary for proactively securing your ecosystem of company, employee, customer, and partner data without getting in the way of business.
Importantly, data lineage reveals “unknown unknowns” about how employees are accessing, creating, and using sensitive data. These insights help IT teams craft more robust data compliance programs through better workforce education and complete visibility into the impact of any policies before they’re deployed.
Should incidents occur, the historical context provided by data lineage instantly reveals the entire lead-up to the event. This lets analysts quickly differentiate between malicious intent and honest mistakes and reveal gaps and misconfigurations that may have contributed to a potential breach or compliance misstep.
3. Tightening Up SaaS Data Sprawl
SaaS environments like Office 365 naturally present DLP challenges, notably in the form of data sprawl. This can result in files with sensitive information changing hands and potentially becoming accessible to unauthorized parties. Employees, for example, may set file permissions too broadly, making business-critical intellectual property like an R&D roadmap available to end users of any permission level.
Over time, companies end up with an unknown number of files containing sensitive data circulating. Loose permissions and a lack of visibility into the location of this content introduce potential data breach and exfiltration risks. The situation becomes more complex in hybrid environments where employees can move files between OneDrive and other sanctioned and unsanctioned apps and devices.
Microsoft’s coverage within the Office ecosystem for addressing DLP risks is decent but not comprehensive.
One of the primary limitations is that policy options and scanning are limited to specific file types, mainly Office files. The result is that files containing proprietary intellectual property, such as source code or design files like CAD, images, and videos, can’t be truly secured with Microsoft DLP alone.
Additionally, Microsoft provides a cloud access security broker (CASB) solution, Microsoft Defender for Cloud Apps, to give visibility into other clouds like Google Workspace, Box, or other cloud file shares. Even so, companies using multiple clouds lack uniformity in the enforceable actions they can take to identify and protect sensitive data.
For these reasons, many businesses elect to augment Microsoft DLP with a data loss prevention solution that includes data lineage.
For example, this functionality would allow you to know that a finding containing a customer credit card number in a SharePoint site message originated from a CSV that was exported from Salesforce and moved into your Microsoft cloud environment by a user who did not originally have access to Salesforce.
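The kind of trace described above can be sketched in a few lines of Python. This is an illustrative model only: it assumes each item records a single parent, the action that produced it, and the user who performed that action. The item names, users, and the `SALESFORCE_USERS` access list are all hypothetical.

```python
# Hypothetical lineage records: item -> (parent item, action, user).
DERIVED_FROM = {
    "sharepoint:site-message": ("onedrive:customers.csv", "paste", "dave"),
    "onedrive:customers.csv": ("salesforce:customer-report", "export", "carol"),
}

# Hypothetical list of users who are allowed to access the origin system.
SALESFORCE_USERS = {"carol"}


def trace_to_origin(item, derived_from):
    """Walk parent links from a finding back to its earliest known source."""
    path = []
    seen = set()  # guards against accidental cycles in the records
    while item in derived_from and item not in seen:
        seen.add(item)
        parent, action, user = derived_from[item]
        path.append((item, action, user))
        item = parent
    return item, path


origin, path = trace_to_origin("sharepoint:site-message", DERIVED_FROM)

# Users who handled the data downstream without access to the origin system.
handlers = {user for _, _, user in path}
without_access = sorted(handlers - SALESFORCE_USERS)

print(origin)
for item, action, user in path:
    print(f"{item} <- {action} by {user}")
print(without_access)
```

Under these assumptions the trace ends at the Salesforce report and flags `dave`, who handled the data in SharePoint without having Salesforce access. That is the sort of context that turns a bare content match into an actionable finding.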
This level of detail means that notifications give admins useful information and that false positives remain low. It also means you can configure data loss prevention policies that take end-user actions into account, not just the type of content you wish to protect.
No company is immune to data security threats, and as modern IT environments become more complex, the case for building a multi-layered DLP strategy is compelling.
Organizations that fail to establish a clear and comprehensive data lineage trail as part of this effort could risk becoming the data loss witch’s next meal.