Product Analysis: Acaveo DataClassifier

Bonnie Surma, January 9, 2014

Product analysis by Joel Oleson, requested by Acaveo, but these thoughts are my own. I invite anyone to share their experience in the comments.

When I sat down to discuss DataClassifier with the Acaveo team, I was impressed with the timeliness of the product. This past year has been a big reminder of the importance of governance and data classification for all of us. It’s really been a wake-up call that there needs to be a time and place for conversations, from the most social to the most sensitive and classified. Stories this past year, including WikiLeaks, the Snowden NSA disclosures, the Target credit card breach, and the leaked Snapchat accounts and phone numbers, have all reminded us that data privacy is one of the most important things for ensuring trust with our customers. All of this news brings extra scrutiny on SharePoint and enterprise systems. What documents may be stored in SharePoint that should NOT be? What documents should be stored in encrypted databases or have additional auditing requirements? What should NOT be moved to the cloud? Many of these questions were brought to Acaveo. Security and encryption are great, but when you try to treat all data equally, you’re dealing with multiple terabytes of information.

Security by obscurity does not work. It’s too easy these days to do a search and find documents that should be better secured. I recently worked at a company where executive travel and contact information was very sensitive. Executives were not to be contacted by employees, so that Personally Identifiable Information (PII) should not have been readily available. When SharePoint search rolled out, it exposed documents containing sensitive data that should not have been exposed. Rather than having interns or someone on the security team search random terms to find data that might be exposed, wouldn’t it be great to have a system that could use pattern matching and regular expressions to discover credit cards, phone numbers and more, and then flag and track them?

Needs Identified by SharePoint Customers

  1. Finding data that shouldn’t be stored in SharePoint
  2. Addressing worries around data privacy
  3. Getting solutions that work with the existing environment

Figure 1: DataClassifier by Acaveo Policy History Report

Understanding Acaveo DataClassifier

The design of DataClassifier focuses on identifying the data that’s stored in SharePoint, optionally managing metadata in the SharePoint search schema, providing a power-user search interface to SharePoint and offering some useful reporting. Acaveo aims to simplify information governance for unstructured data, helping businesses achieve better compliance and privacy with less cost and risk.

While some know exactly what they are looking for and will jump into the regular expressions builder, others will find the pattern matching that’s already set up as exactly what they need. Here are a few examples of the twenty templates:

  1. Credit card numbers
  2. Social security numbers
  3. IP addresses
  4. Phone numbers
  5. Product keys
  6. US ZIP codes
  7. Driver’s license
  8. Email addresses
  9. German postal codes
  10. Canadian health card

Figure 2: Acaveo DataClassifier Regular Expression Builder
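
To make the idea of template-based pattern matching concrete, here is a minimal Python sketch. The patterns, the `PATTERNS` dictionary, and the `scan` and `luhn_ok` functions are my own illustrative assumptions, not Acaveo’s actual templates or API. It pairs a loose credit card pattern with a Luhn checksum, a common way to cut down false positives on long digit strings.

```python
import re

# Illustrative patterns only (assumptions, not Acaveo's shipped templates).
PATTERNS = {
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),  # 13 to 16 digits
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text: str) -> list:
    """Return (pattern_name, matched_text) pairs found in `text`."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Drop digit runs that look like cards but fail the checksum.
            if name == "credit_card" and not luhn_ok(value):
                continue
            hits.append((name, value))
    return hits


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111, SSN 123-45-6789, mail bob@example.com"
    for name, value in scan(sample):
        print(name, value)
```

Real templates would need to be stricter (card issuer prefixes, valid SSN ranges, octet bounds for IP addresses), which is part of the argument for using prebuilt ones rather than writing your own.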

When the data is captured, you can then take action by using the Smart Information Server to migrate, defensibly delete, capture, compress and track it, among other options. As the content is being indexed, it can be automatically detected, and remediation can then begin by flagging content with additional properties or metadata that can set off alerts or change access to the documents that match a policy. The policies are a set of powerful regular expression pattern matching and advanced filtering capabilities that leverage SharePoint Search in SharePoint 2013 and FAST Search in SharePoint 2010. It’s really a lightweight solution that doesn’t require additional indexing or hardware. Competitive solutions require additional infrastructure and additional indexing, while DataClassifier leverages existing SharePoint infrastructure. Because it is built on SharePoint search, it can capture data based on policies.
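
The flag-on-match idea described above can be sketched in a few lines of Python. This is a conceptual illustration only: the `Policy` class, the `enrich` function, and the property names `SensitiveDataFlags` and `NeedsReview` are hypothetical, and none of this is Acaveo’s or SharePoint’s actual API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical policy: a named pattern plus the flag it sets."""
    name: str
    pattern: re.Pattern
    flag: str


def enrich(properties: dict, body: str, policies: list) -> dict:
    """Return a copy of `properties` with flags for every matching policy."""
    matched = [p.flag for p in policies if p.pattern.search(body)]
    enriched = dict(properties)
    if matched:
        # Hypothetical property names; downstream alerts or permission
        # changes would key off values like these.
        enriched["SensitiveDataFlags"] = matched
        enriched["NeedsReview"] = True
    return enriched


if __name__ == "__main__":
    policies = [
        Policy("US SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "us_ssn"),
    ]
    doc = {"Title": "HR export"}
    print(enrich(doc, "Employee SSN: 123-45-6789", policies))
```

The point of the design is that detection only adds metadata to the indexed item; what happens next (an alert, a permission change, a migration) is a separate decision driven by those flags.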

Four Points of Business Value

To me, the business scenarios were quite easy to understand: identify the documents or items showing up in search that either shouldn’t be there or need to be actively managed, then devise a monitoring plan to ensure compliance with various standards and policies. Working closely with corporate security and information security policies is really what’s needed here to ensure both legal compliance and appropriate industry compliance.

  • Discover documents that would violate privacy and compliance policies.
  • Eliminate sensitive documents from public or anonymous sites.
  • Monitor SharePoint data for files that should not be shared.
  • Ensure compliance by managing, eliminating or migrating privacy-related data.

Figure 3: Acaveo Power Search

How Do I Set Up DataClassifier?

DataClassifier installs directly into the SharePoint farm. It requires a SQL Server for configuration and reporting and can leverage the existing SharePoint SQL Server. It is installed on all crawl servers. When you break it down, it is essentially the DataClassifier database and two Windows services: one for content enrichment and the other for the DataClassifier software. It’s designed to take advantage of SharePoint Search or FAST Search web services. It’s a simple deployment, and the default configuration and reporting are designed to help you start realizing value on day one. Once set up and indexing, it can be used to identify and automatically classify documents from Exchange, file shares and SharePoint. It does need search admin rights or read access through the FAST Search admin group. In 2013, it also needs SharePoint shell admin rights to run its cmdlets. This is all covered in the admin guide.

Pricing and Licensing

Pricing is simple. It’s licensed at $4,995 per crawl component (i.e. crawl or indexing server). In some five-server farms, that might be one server, and in others it may be three, so it really depends on how the farm is configured. With this licensing you aren’t being charged any extra for indexing Exchange, SharePoint, and file shares. You’re just paying for the SharePoint servers related to indexing. As an alternative to buying DataClassifier as a standalone product, Acaveo bundles it with its information governance solution, Smart Information Server.
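
As a quick worked example of the per-crawl-component pricing, here is a small Python sketch. It assumes the $4,995 list price quoted above applies flat per component with no volume discounts or bundles, which is my assumption; `license_cost` is a hypothetical helper, not anything Acaveo provides.

```python
PRICE_PER_CRAWL_COMPONENT = 4995  # USD, the list price quoted in this article


def license_cost(crawl_components: int) -> int:
    """Estimate the standalone license cost for a farm (flat pricing assumed)."""
    if crawl_components < 1:
        raise ValueError("a farm needs at least one crawl component")
    return PRICE_PER_CRAWL_COMPONENT * crawl_components


if __name__ == "__main__":
    # The two five-server farm layouts mentioned above.
    for n in (1, 3):
        print(f"{n} crawl component(s): ${license_cost(n):,}")
```

So the same five-server farm could cost $4,995 or $14,985 depending on how many of its servers host crawl components.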

Reality Check and My Thoughts on Acaveo DataClassifier

OK, I admit I hadn’t even heard of Acaveo DataClassifier until recently. (Apparently, it was only released back in October.) I was excited to hear about the solution when I started writing this review, and I wasn’t really surprised to hear the problem they were trying to solve. It started with customers that approached them and said, “We really don’t know what’s in our SharePoint environment. Build a product that will help identify the documents that expose us to risk.” I personally have heard the need crop up on many occasions. I’ve heard of customers who’ve built on FAST to identify PII, such as phone numbers and social security numbers, but I hadn’t seen a solution that did a great job of automating and helping classify the data so action could be taken. Based on this, I think the niche that they’ve identified is very valid.

In my past, I’ve worked in companies where corporate security was breathing down our necks, saying SharePoint exposed the company to risk due to lack of security and not knowing what was actually being stored in it. The kitchen sink is in SharePoint… that’s for sure. As more and more people put data in SharePoint and the power of search exposes that information, governance and policies can help, but how do you ensure that people are appropriately securing documents that may put customers’ private data at risk? Search? Yes! You could build highly complex solutions on search to do the discovery they’ve done here, but most people do not know regular expressions, and even as a dev project, the devs don’t know the regulatory issues or what PII they should be searching for.

Honestly, I feel better knowing this product exists. There really is a lot here, and the remediation components I haven’t dug into also help in providing a great governance story. Each customer knows what kind of risk there is in their environment. Many simply want to keep that kind of data out of SharePoint, but how can they ensure that’s the case? In my experience, it’s security that ends up having to force the issue.

When I heard it deployed Windows services on the index server/crawl components, at first I was concerned that it broke the SharePoint rule of not installing anything on the servers, but the more I dug into what it would really need to do to take advantage of the search pipeline, the more reassured I was. Still, you should be aware of the permissions it needs and what it deploys. This is not your typical WSP solution. This is a component that’s really working with Search to ensure compliance as part of the pipeline.

What I saw was impressive, and I do think customers that are exposed to risks should take a look. Those thinking that this product will encrypt the data or make the existing data harder to crack are looking at the product wrong. This one simply identifies the documents or results as an extension to indexing and it includes a power-user search capability along with reporting. It’s up to the administrator or the person running the tool to change permissions or enforce a policy.


Acaveo DataClassifier really fills a niche. It helps you discover documents that contain PII or match a pattern such as credit card numbers, phone numbers, social security numbers or zip codes, covering 20 different types of data that could be identified as a privacy concern. With this data it can flag the items, which can then be migrated, deleted, compressed or archived. It’s up to the customer to decide what should be done.

In my analysis I confirmed there are a number of business scenarios where this might be necessary, depending on your industry. Some customers are more concerned than others about privacy data being stored in SharePoint. If you are concerned, I think you’ll be pleased to know there are solutions like Acaveo DataClassifier that can help you discover documents that may contain personally identifiable information. If you’re looking to reduce your risk by either eliminating these kinds of documents or simply ensuring they are classified and appropriately managed with stricter permissions, I think you’ve found a product that’s worth your time to evaluate.

For more information about Acaveo DataClassifier, see This includes the free trial download form (which gives you an instant download with a 30-day license). All the manuals, video tutorials and other resources are available here:


This product analysis is designed to be an unbiased review by Joel Oleson for vendors and the community. My hope is that this provides value to SharePoint customers in eliminating privacy concerns and ensuring compliance.

