What could you do if you knew how every piece of data was being used?

By Ross Moore

Buckets, Containers, and Blobs – Oh, My!

Two common sources of data leaks are buckets and containers. GrayhatWarfare – one of several services dedicated to scanning for public-facing buckets – has a “database of 416712 buckets, 316000 s3 buckets, 49900 open Azure containers, 6800k Digital Ocean Buckets, 44000 Google buckets and a total of 10.4 Billion publicly accessible files.”

Many data sources are intended to be open to the public and are therefore an inherent part of the surface web. Other data repositories and sources are less open, but are still intended for fairly wide distribution among their members, e.g., those who have a login. These are part of the deep web – hidden behind a paywall or requiring a username and password.

But many sources are open to the public that were never intended to be. One example from 2022 is a health treatment facility that was found to have 93,000 patient files exposed to the internet.

It would be enormously helpful to know what’s going on with each bit of data.

More than putting the pieces together

What good is putting a puzzle together if one doesn’t know what it’s supposed to look like? Sure, some people like to do that, but puzzles with no picture, no corners, or no edges, such as the Impossible Jigsaw Puzzles, are not nearly as popular as those that give the typical buyer what they are looking for.

For companies, the challenge isn’t simply to put the puzzle together. There is no official “information security” or “business vision” puzzle. Each organization needs to decide, “What is our puzzle picture?” What does the org want: a bigger client list? Shopping trends? Giving trends? Optimized supply chain processes? Decreased warehouse expenses? Increased sales in a particular part of the country or world?

The picture a company wants to paint will include a cycle of decision-making, data collection, data analysis, and refinement or correction of the previous decisions.

A difficulty that rears its ugly head is this: taking stock of all the puzzle pieces may come long after a company is in full motion. People have already implemented disparate systems that hold duplicate data; systems haven’t been designed with interoperability in mind; policies and procedures aren’t in place to provide a framework for daily business; and staff members have likely changed, leaving new personnel with a veritable train wreck of data pieces and nowhere to put them.

Dangers of Being Unaware

Having as comprehensive an inventory as possible is the primary piece of the puzzle. But just as vital is knowing what is being done with the data. Who’s using it? Who’s moving it and how often? To where is it being moved?

Data currency – ensuring data is viable and updated – is important. What would happen if someone in your company used an outdated version of the contract? What if, when you needed to email the whole company about something vital, you used an old email list that didn’t include new employees?

The value of Value

The value of data is not just to have tons of data. “…value is derived from discovering patterns, appreciating impact of change and time, and that data requires enrichment not just discovery.”

Data by itself is of little use – it’s just letters scattered on a page (like when you drop the Scrabble® tiles in a pile). What’s missing, and what is arguably the hardest aspect of data value, is the effort that people (authors) have to make not simply to make words and paragraphs out of the letters, but to know what the topic of the business (book) is. The data won’t be valuable until the story is known.

Keeping Tabs

When the story is known and the authors are at work on it, the initial phase is easy (kind of like the honeymoon phase, when all is bright and cheery). But before long comes the everyday minutiae of reality – Who’s changing what? What’s the next step? Who’s leading the project? What is the current project? Is it OK to keep things in email? How many shared folders are there? And for security folks – is anyone accessing that data who shouldn’t be?

Knowing where the data is and who has access; keeping track of where data came from, where it moved, and where it’s going; and managing the data locations – these are some of the necessary tasks in protecting employee information, customer identities, and corporate intellectual property.

Knowing these aspects of data is called “Data Lineage.”

Preventing Trouble

More specifically, data lineage is “the process of tracking data as it moves within an organization to understand its origins, the ways it’s been modified, as well as who is using it and how. These are effectively the ‘What, Where, Who, and Why’ of the data being created, modified, and shared within your organization. This added context about data, allows security teams to better protect it from theft and misuse.”
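To make the “What, Where, Who, and Why” concrete, here is a minimal sketch of how lineage events might be recorded and replayed. It is an illustration only, not how any particular data lineage product works: the `LineageEvent` and `LineageLog` names, their fields, and the in-memory list are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One recorded action on a piece of data (all field names are hypothetical)."""
    what: str       # identifier of the data item, e.g. a file name
    where: str      # location of the item after the action
    who: str        # user or service that performed the action
    why: str        # stated purpose of the action
    action: str     # e.g. "created", "modified", "moved", "shared"
    when: datetime  # time the action took place


class LineageLog:
    """Append-only, in-memory record of lineage events (illustrative only)."""

    def __init__(self) -> None:
        self._events: list[LineageEvent] = []

    def record(self, what: str, where: str, who: str, why: str,
               action: str, when: datetime | None = None) -> LineageEvent:
        # Default to the current UTC time when no timestamp is supplied.
        event = LineageEvent(what, where, who, why, action,
                             when or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, what: str) -> list[LineageEvent]:
        """Return every event for one data item, oldest first."""
        matching = [e for e in self._events if e.what == what]
        return sorted(matching, key=lambda e: e.when)


if __name__ == "__main__":
    log = LineageLog()
    log.record("contract.docx", "/legal/drafts", "alice", "initial draft",
               "created", datetime(2023, 1, 5, tzinfo=timezone.utc))
    log.record("contract.docx", "/legal/final", "bob", "approved version",
               "moved", datetime(2023, 2, 1, tzinfo=timezone.utc))
    for event in log.history("contract.docx"):
        print(event.when.date(), event.action, event.where, event.who, event.why)
```

Replaying `history()` for a single item gives the trail of where it came from and where it went, which is the kind of record an investigator would want after an incident.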

Whether viewed from a legal, security, reputational, compliance, or contractual viewpoint, knowing how each piece of data is being used does more than bring benefits: that knowledge helps prevent trouble in all of those same categories.

People – whether prospects, customers, or personnel – don’t only want a better digital experience (who doesn’t want better and better ways to live and work?); they also want to be protected while doing those things.

Improving Life

Even small businesses will have many thousands, if not millions, of pieces of data (not just files) to manage. Finding a data lineage solution is a way to keep all that data as secure as possible. Alerting security personnel when data is placed in the wrong folder, graphing the previous movement of a file when an incident occurs, and knowing when someone has moved a sensitive file to a personal online location are just a few ways that data lineage helps secure information.
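As a rough illustration of two of the alerts described above (data placed in the wrong folder, and a sensitive file moved to a personal online location), the following sketch checks a file's destination against a simple policy. Everything in it is an assumption for the sake of the example: the sensitivity labels, the approved folder prefixes, the placeholder host names, and the `check_move` function itself. A real solution would draw these from its own inventory and policy engine.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical policy: folder prefixes in which each sensitivity label may live.
APPROVED_PREFIXES = {
    "sensitive": ("/finance/", "/hr/"),
    "internal": ("/finance/", "/hr/", "/shared/"),
}

# Hypothetical personal storage hosts (placeholder names, not real services).
PERSONAL_HOSTS = {"personal-drive.example.com", "files.example.net"}


def check_move(label: str, destination: str) -> list[str]:
    """Return alert messages for moving a file with this label to destination."""
    alerts: list[str] = []
    host = urlparse(destination).hostname
    if host is not None:
        # The destination is a URL rather than an internal folder path.
        if host in PERSONAL_HOSTS:
            alerts.append(f"{label} file sent to personal online location: {host}")
        return alerts
    # Labels with no entry in the policy are not checked in this sketch.
    allowed = APPROVED_PREFIXES.get(label, ())
    if allowed and not destination.startswith(allowed):
        alerts.append(f"{label} file placed in unapproved folder: {destination}")
    return alerts


if __name__ == "__main__":
    for dest in ("/finance/q3.xlsx",
                 "/shared/q3.xlsx",
                 "https://personal-drive.example.com/upload/q3.xlsx"):
        print(dest, "->", check_move("sensitive", dest) or "no alert")
```

The point of the sketch is only that such alerts depend on already knowing what the data is (its label) and where it is allowed to be, which is the inventory and lineage work described earlier.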

You can’t protect things you don’t know about. Taking time to discover and know will go a long way in letting people know their data is safe with you.

ABOUT THE AUTHOR

Ross Moore is the Cyber Security Support Analyst with Passageways. He has experience with ISO 27001 and SOC 2 Type 2 implementation and maintenance. Over the course of his 20+ years in IT and security, Ross has served in a variety of operations and infosec roles for companies in the manufacturing, healthcare, real estate, business insurance, and technology sectors. He holds (ISC)²’s SSCP along with CompTIA’s PenTest+ and Security+ certifications, a B.S. in Cyber Security and Information Assurance from WGU, and a B.A. in Bible/Counseling from Johnson University. He is also a regular writer at Bora.

Cyber Security Review online – October 2023