Clean Data In, Useful Data Out: Data Cleansing Essential to Contact Management Success
8/29/2007
Contact Management (CM) systems have been a topic of great discussion within the legal industry. A seemingly simple idea - allowing a law firm to combine all of its client and contact information from a wide range of sources into a single central database - is in reality a complicated endeavor.
At its core, a CM system can be used for the simple but important job of creating accurate correspondence and documents for clients. However, in most cases, the ultimate goal is to use a CM system for more sophisticated marketing, business development and strategic analysis purposes. Firms that successfully implement integrated CM systems enjoy tremendous benefits for themselves - and for their clients.
Unfortunately, more than half of all traditional law firm CRM initiatives fail to meet expectations. Some of this is due to a lack of attorney participation, but more often it is due to bad data. According to Gartner Research, the number one reason for the failure of traditional CRM systems is poor data quality.
The high failure rate of CM initiatives is often caused by careless importation of unfiltered data from external sources into the new centralized CM system. In too many cases, firms suffer from garbage in and garbage out. To get the optimal return on a firm's investment in a CM system, a well-thought-out data strategy is essential.
CLEANSING DATA: CRITICAL TO CONTACT MANAGEMENT SUCCESS
Data cleansing is a process that analyzes the data collected from multiple sources within a law firm, detects errors, makes corrections based on predetermined standards and highlights non-routine records for closer attention by a human data steward.
In the legal industry, where lawyers manage high-net-worth relationships, having very accurate client and contact data is essential to success. Automated data cleansing software can assist with much of this process; however, human review is still essential.
Obviously, the more automated the data cleansing process, the less expensive and more consistent the end result will be. The combination of contact data from many sources introduces myriad opportunities for error. Because the records in each database were entered by a different person or persons, each person may have introduced typos, placed data in an incorrect field or used different conventions for addresses and abbreviations.
This problem is only exacerbated in a merger situation - where the combined databases are even more diverse. Contact data from some sources - like an attorney's Microsoft® Outlook contacts - may be fairly up-to-date and correct, while data from other sources - like the mailing list from a 2005 seminar - is more likely to be out-of-date and incorrect.
If a CM system is installed and data from various sources dumped into it with no cleansing, a senior partner's valued client could easily receive four copies of an invitation to an upcoming event based simply on name variation - one for James Steward, one for Jim Steward, one for James W. Steward and one for J.W. Steward. And who knows how many combinations and permutations might be generated by the address fields?
When he gets a complaint from the client ("After all these years and all the money I've sent your way, don't you guys know who I am?"), the senior partner will be highly annoyed - and less likely to use the CM system in the future.
Lawyers are trained to develop a highly critical eye for their client work. Even simple errors in the data contained in a newly launched CM system will cause them to lose confidence and opt out - leading to the system's costly failure.
DATA CLEANSING IS ESSENTIAL
The first step in the data cleansing process is a search of a law firm's database records to identify and collect all of the firm's existing data sources. One major source of good contact information is the Microsoft Outlook contacts of the firm's attorneys. The average attorney has 300 to 400 contacts. In Contact Manager, the solution implemented by Hubbard One, all individual attorney contacts will be combined and synchronized to create a single version of the record in the centralized database (there is also an option that allows an attorney to keep parts of the contact record private).
Client and matter information is also acquired from a law firm's time and billing system.
Initially, in the interest of cost savings and efficiency, most firms decide to focus on cleansing the data from their top 500 to 1,000 clients - selecting the number that best represents a majority of revenue over the past three to five years. The Pareto Principle, which states that 20 percent of clients will likely yield 80 percent of revenue, seems to apply to most corporate law firms.
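As a rough illustration of how that cutoff might be chosen, the sketch below ranks clients by revenue and keeps adding them until a target share of total revenue is covered. The function name, the client names, the revenue figures and the 80 percent target are all invented for this example; a firm would substitute figures from its own time and billing system.

```python
def clients_covering_share(revenue_by_client, target_share=0.80):
    """Return the top-revenue clients whose combined revenue first
    reaches target_share of the total (hypothetical helper)."""
    total = sum(revenue_by_client.values())
    ranked = sorted(revenue_by_client.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for client, revenue in ranked:
        # Stop once the clients already selected cover the target share.
        if total == 0 or running / total >= target_share:
            break
        selected.append(client)
        running += revenue
    return selected


# Invented figures for illustration only.
sample_revenue = {
    "Client A": 500_000,
    "Client B": 350_000,
    "Client C": 80_000,
    "Client D": 40_000,
    "Client E": 30_000,
}
top_clients = clients_covering_share(sample_revenue)
print(top_clients)
```

In this made-up data set, two of the five clients account for 85 percent of revenue, so only those two would be in the first cleansing pass.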
Another major source of contact information is the marketing database that has been developed for specific firm activities - like seminars, open houses, alumni events, newsletters (electronic and print), announcements and holiday cards. Finally, contact data is often found in isolated Excel spreadsheets and Access databases used by various practice groups or client teams.
Once all of the external data sources have been located, the format of each is identified. Common formats and languages used by these sources include SQL (Structured Query Language), XML (Extensible Markup Language, widely used on the Internet), CSV (comma-separated values) and PST (the Microsoft Outlook data file format).
Not all of these databases speak the same language. To communicate with each other for comparison purposes, they need a translator - which is where the Hubbard One data cleansing software comes in. Each data source is mapped and prepared for import into a consistent internal format. The data cleansing software is built on Microsoft .NET technology. During this phase, the data cleanser will also parse the name and address data. Parsing ensures that bits of data - such as first, middle and last names or parts of company names - are in the correct field.
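To make the idea of parsing concrete, here is a minimal sketch that splits a free-text personal name into first, middle and last fields. It is not the Hubbard One parser, whose internals are not described here; the function name and the two layouts it recognizes are assumptions chosen for illustration, and real-world names (suffixes, titles, multi-word surnames) need many more rules.

```python
def parse_person_name(raw):
    """Split a free-text name into first, middle and last fields.

    Handles only two simple layouts: "First [Middle] Last" and
    "Last, First [Middle]". Hypothetical helper for illustration.
    """
    text = " ".join(raw.split())  # collapse stray whitespace
    if "," in text:
        # "Last, First Middle" layout
        last, _, rest = text.partition(",")
        parts = rest.split()
        first = parts[0] if parts else ""
        middle = " ".join(parts[1:])
        return {"first": first, "middle": middle, "last": last.strip()}
    parts = text.split()
    if not parts:
        return {"first": "", "middle": "", "last": ""}
    if len(parts) == 1:
        # A single token is treated as a last name.
        return {"first": "", "middle": "", "last": parts[0]}
    return {"first": parts[0], "middle": " ".join(parts[1:-1]), "last": parts[-1]}


print(parse_person_name("James W. Steward"))
print(parse_person_name("Steward,  James W."))
```

Both calls place "James", "W." and "Steward" in the same three fields, which is what allows records from differently formatted sources to be compared later.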
CONSOLIDATING EXISTING DATA SOURCES AND IMPORTING INTO THE DATA CLEANSER
Once all of the external data sources have been imported, the analysis phase begins. The data cleanser relies on rules that govern processing. A rule can be as complex or simple as it needs to be. It can be specified to execute during any part of the process - import, analysis, cleanse and/or export.
There may be hundreds of rules governing the cleansing process. Each rule will control a single definitive decision. Most computer applications were designed with the express purpose of making these kinds of routine decisions automatically.
Rules can ensure that a client (company or individual) name appears consistently in each use. Over the years, for example, a firm might have accumulated contacts from PricewaterhouseCoopers - or its predecessors Coopers & Lybrand and Price Waterhouse - or even their predecessors.
In addition, the correct version of the current name has unusual spacing and capitalization. The created rule would change all mentions of all versions to PricewaterhouseCoopers.
For street addresses, does the firm prefer 17th St., 17th Street, Seventeenth St. or Seventeenth Street? Should there also be a rule variation for 17th Ave., 17th Avenue, Seventeenth Ave. or Seventeenth Avenue? Once the decision is made, the rule will make sure the entries using this street name always appear the same. If the firm wants to allow for some variation, the rules can allow for that option as well.
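The sketch below shows what such rules could look like in their simplest form: lookup tables that map every known variant to the firm's preferred version. The table contents, the function names and the choice of "17th Street" as the preferred style are assumptions made for this example, not the actual rule format of any product.

```python
# Hypothetical rule tables; a firm would define its own preferences.
COMPANY_ALIASES = {
    "price waterhouse": "PricewaterhouseCoopers",
    "coopers & lybrand": "PricewaterhouseCoopers",
    "pricewaterhouse coopers": "PricewaterhouseCoopers",
    "pricewaterhousecoopers": "PricewaterhouseCoopers",
}
ORDINAL_WORDS = {"seventeenth": "17th"}
# Caution: a real rule set must tell "St." (Street) from "St." (Saint).
STREET_SUFFIXES = {
    "st": "Street", "st.": "Street", "street": "Street",
    "ave": "Avenue", "ave.": "Avenue", "avenue": "Avenue",
}


def normalize_company(name):
    """Map any known variant of a company name to its preferred form."""
    key = " ".join(name.lower().split())
    return COMPANY_ALIASES.get(key, name.strip())


def normalize_street(address):
    """Rewrite ordinal words and street suffixes to the preferred style."""
    words = []
    for word in address.split():
        lowered = word.lower()
        if lowered in ORDINAL_WORDS:
            words.append(ORDINAL_WORDS[lowered])
        elif lowered in STREET_SUFFIXES:
            words.append(STREET_SUFFIXES[lowered])
        else:
            words.append(word)
    return " ".join(words)


print(normalize_company("Coopers & Lybrand"))
print(normalize_street("100 Seventeenth St."))
```

Each table entry corresponds to one rule making one decision, which keeps the rules easy to review and change when the firm's preferences change.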
In addition, abbreviations, titles and capitalizations will be made consistent in conjunction with firm preferences as they are expressed in rules. Once the data is combined, the data cleanser can also fill empty fields by automatically pulling data from an internal source or an external database - like the United States Postal Service database. Most firms choose to augment their client data with industry codes and other valuable research from sources like Dun & Bradstreet or Standard & Poor's.
Rules can also be created to identify relationships for importation into the CM system. Relationships can be matched by name, so that the record will indicate everyone in a firm who knows the contact, and/or by client number, which will indicate everyone in a firm who has done work (and the nature of the work) for the contact.
One of the most challenging and resource-intensive phases of data cleansing is the detection and removal of duplicate records of the same entity.
Some firms will combine all client and contact lists into an Excel spreadsheet, sort the spreadsheet and manually attempt to remove duplicates. This process neither maintains the association of the contact to the contributing attorney nor helps the reviewer choose the correct address from the alternatives listed. The Hubbard One data cleanser solves these problems - and many others.
There might be multiple entries with correct but different variations (like the "James Steward" example above) or multiple entries that match exactly on all attributes (like three completely identical entries for "Jane C. Doe"). As opposed to "proper name and address" cleansing, which is fairly straightforward once the rules are set, "duplicate records" cleansing relies on a more sophisticated process that uses mathematics and artificial intelligence techniques.
Based on a sophisticated mathematical vector model, duplicates will be detected and reports will be run only for those "suspect" items that need to be manually checked by data stewards during the review phase. These individuals can then determine the "winning" (most correct) version of the record.
In searching for duplicates, certain data sources are given a heavier "weight" than others. For example, data retrieved from Microsoft Outlook files or a recent open house mailing list can be weighted more heavily than other data sources - because they are more up-to-date.
In addition, certain fields within a data source - such as the e-mail address field, which is almost always unique to one person - are given more weight than others. Using this information, the data is processed and reports are generated for both positive duplicate matches and possible duplicate matches, depending on the threshold values that have been set. The reports are reviewed and - if necessary - the rules are modified.
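The weighting and threshold ideas can be sketched in a few lines. The example below is a simplified stand-in, not the vector model used by the data cleanser: it scores each field with the string similarity from Python's standard `difflib` module and combines the scores as a weighted average. The field weights, the two thresholds, the source names and the function names are all assumptions chosen for illustration. How source weights enter the real model is not described in this article, so here they are used only to suggest which record a steward might prefer.

```python
from difflib import SequenceMatcher

# Hypothetical weights: e-mail counts most because it is nearly unique.
FIELD_WEIGHTS = {"email": 0.5, "last": 0.3, "first": 0.2}
POSITIVE_THRESHOLD = 0.85   # at or above: reported as a positive match
POSSIBLE_THRESHOLD = 0.60   # at or above: reported as a possible match
# Hypothetical source weights: more recent sources are trusted more.
SOURCE_WEIGHTS = {"outlook": 3, "open_house": 2, "seminar_2005": 1}


def field_similarity(a, b):
    """Similarity in [0, 1] between two field values, ignoring case."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a, rec_b):
    """Weighted average of per-field similarities."""
    return sum(
        weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in FIELD_WEIGHTS.items()
    )


def classify_pair(rec_a, rec_b):
    """Label a pair of records as positive, possible or distinct."""
    score = match_score(rec_a, rec_b)
    if score >= POSITIVE_THRESHOLD:
        return "positive"
    if score >= POSSIBLE_THRESHOLD:
        return "possible"
    return "distinct"


def preferred_record(records):
    """Suggest the record from the most trusted source as the likely winner."""
    return max(records, key=lambda rec: SOURCE_WEIGHTS.get(rec.get("source"), 0))


james = {"first": "James", "last": "Steward", "email": "jsteward@example.com", "source": "outlook"}
jim = {"first": "Jim", "last": "Steward", "email": "jsteward@example.com", "source": "seminar_2005"}
print(classify_pair(james, jim), preferred_record([jim, james])["source"])
```

Raising or lowering the two thresholds changes how many pairs land in the "possible" report, which is the trade-off between automation and the data steward's review workload.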
When all of the reports have been reviewed and accepted, the final data cleansing pass is performed to merge and purge records. Important associated information for each record - such as which Outlook user provided the name and the number of the contact - remains intact within the record. Once cleansed, the data is mapped and exported into the law firm's new CM system.
Once all the client and contact data have been collected, cleansed and imported into the CM system, the job is not yet done. According to The Data Warehousing Institute, unattended business data deteriorates at a rate of 1.5 to 3 percent each month. In addition, each time someone adds data - as simple as adding a new contact in Microsoft Outlook - the accuracy of the system is compromised. The Data Quality Manager component of Contact Manager helps keep data clean after the initial cleansing process is complete.
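To see what that monthly deterioration rate means over a year, the short calculation below assumes the decay compounds month over month. That compounding assumption, and the function name, are ours and are not part of the cited figure.

```python
def share_still_accurate(monthly_decay, months):
    """Fraction of records still accurate, assuming decay compounds monthly."""
    return (1.0 - monthly_decay) ** months


low = share_still_accurate(0.015, 12)   # roughly 0.83
high = share_still_accurate(0.03, 12)   # roughly 0.69
print(f"After 12 months: {low:.0%} to {high:.0%} of records still accurate")
```

Under that assumption, roughly 17 to 31 percent of an unattended database would be out of date after a single year, which is why ongoing maintenance matters as much as the initial cleansing.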
Most law firms that are serious about their CM systems create the position of data steward and choose a talented person to fill it. A data steward is responsible for monitoring the quality of data coming into and going out of the system. Additions, deletions and changes to a record are flagged by the Data Quality Manager module of the CM system and put into a queue, where the data steward can double-check the entry to verify that it is valid before accepting or rejecting the entry.
Only after this process is complete is data added to the centralized system. Of course, attorneys and staff can still enter and save information in Outlook even if it is not contributed to the marketing database and accepted by the data steward.
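The review workflow described above can be pictured as a simple queue. The sketch below is only a conceptual model, not the Data Quality Manager's actual interface: the class names, field names and sample values are invented for illustration. Proposed changes wait in the queue, and only those the data steward accepts reach the central store.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingChange:
    """One proposed change to a contact record (hypothetical structure)."""
    contact_id: str
    field: str
    new_value: str
    submitted_by: str


class ReviewQueue:
    """Holds proposed changes until a data steward accepts or rejects them."""

    def __init__(self):
        self.pending = deque()
        self.central = {}  # contact_id -> {field: value}

    def submit(self, change):
        """Flag a change for review instead of applying it directly."""
        self.pending.append(change)

    def review_next(self, accept):
        """Pop the oldest change; apply it centrally only if accepted."""
        change = self.pending.popleft()
        if accept:
            self.central.setdefault(change.contact_id, {})[change.field] = change.new_value
        return change


queue = ReviewQueue()
queue.submit(PendingChange("c1", "email", "jsteward@example.com", "attorney_1"))
queue.submit(PendingChange("c1", "phone", "555-0100", "attorney_2"))
queue.review_next(accept=True)    # steward verifies the e-mail address
queue.review_next(accept=False)   # steward rejects the phone number
print(queue.central)
```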
CONCLUSION
A CM system is a large investment for any law firm. For this investment to pay off, data from existing sources must be thoroughly cleansed before it is added to the new system. Failure to do so will result in errors and duplicates. First, all data from external sources must be consolidated into one file that speaks a common language. Then, rules that will provide consistency and prevent duplication must be created, run and approved. With good rules in place, all data is run through the cleanser. The correct, de-duplicated results are then ready to be added to a CM system. "Clean data in" leads to "useful data out." Data cleansing is essential to CM success.
Nancy Manzo, Senior Consultant
Nancy Manzo is a senior consultant in the marketing center practice at Hubbard One. She has 17 years of experience working with large law firms on using technology to support their business development objectives, including ten years as a marketing director at a global law firm. Prior to joining Hubbard One, Nancy led a successful consulting firm that worked with over 100 US, Canadian and Italian law firms on client relationship management initiatives. At Hubbard One, Nancy's group provides consulting in the areas of requirements gathering, business case development, implementation strategy, change management, metrics and return on investment analysis. Nancy received her bachelor of arts degree in history and political science from Eugene Lang College of the New School for Social Research in New York City.