Introduction
You have probably noticed how much attention your enterprise is paying to optimising the use of, or “monetising”, the data that flows through your organisation’s IT systems.
One of the problems, of course, is that some of the information you want to include is likely to sit in what are called data silos. Which applications are most likely to become silos, why does that happen, and what can you do about it?
According to Gartner, “Data Monetization refers to using data for quantifiable economic benefit. This can include indirect methods such as: measurable business performance improvements, beneficial terms or conditions from business partners, information bartering, productizing information (i.e., new information-based offerings), “informationalizing” products (i.e., including information as a value-add component of an existing offering), or selling data outright (via a data broker or independently).”
A 2017 McKinsey study into data monetisation found that Sales and Marketing was the functional area most often able to point to improved performance as a result of broader use of data and analytics tools, followed by Research and Development, with Supply Chain and Distribution third.
Clearly, trying to use data to enhance profitability and effectiveness makes eminent sense. After all, you are already investing a lot of money in your IT infrastructure; if the insight you gain from that data helps you become more competitive, gain and retain customers, innovate more effectively and optimise your supply chain and business processes, you will be more successful.
If you can access and combine data from multiple sources, perhaps in a data lake or enterprise data warehouse, your data scientists and analysts can use advanced tools to identify patterns and trends which help you anticipate the future and make decisions. Doing the same for a data catalog or data glossary project can help to democratise data and give business users access to accurate information, helping them work more effectively and reducing the time they spend finding the data they need.
Depending on the characteristics of your particular organisation you may have many different types of sources. Data may reside in databases, files, external systems, ERP or CRM packages, or may even be machine or sensor data. Finding and accessing the data you need from some of these sources can be very simple; others present more of a challenge.
Sources whose data is more difficult to find and access can leave your organisation with what are referred to as data silos: discrete systems which can rarely, if ever, be included in data and information management projects without significant resource and effort.
Which of your IT systems suffer most from this, and why?
In our experience, the data sources most commonly considered to be data silos are the large, complex and customised ERP and CRM packages from vendors such as SAP, Oracle and Microsoft. We are also seeing increasing numbers of larger, enterprise-wide Salesforce implementations presenting the same challenges.
If your enterprise runs one or more of these, or similar, systems, then you may already be familiar with the problems associated with data silos. What is it about these applications that causes these issues?
Beyond simply gaining access to their data, the problems are largely to do with the characteristics of their metadata: where it is stored, its quantity and complexity, its accessibility, and the ease, or otherwise, with which it can be analysed and then shared with other tools and technologies.
The reason metadata is so critical is that you use it to find and understand the data you need in virtually any data and information management project, including a data monetisation initiative. For example, in a data warehouse or data lake project, if you do not know which tables in your source applications contain the data your business users need, you will struggle to deliver accurate information to them. Similarly, the success of a data catalog or glossary relies on its ability to provide users with accurate information about data in context. If you cannot find relevant, business-friendly metadata in your source systems to support these activities, the project may fail to live up to expectations. You can probably think of other data management projects which rely on access to usable metadata for their success.
Large, complex ERP and CRM applications display three characteristics when it comes to their metadata which hinder or preclude their inclusion in enterprise data initiatives and make it more likely that they will gain the status of untouchable data silos.
Most ERP and CRM packages are based on a relational database platform. In principle, therefore, you might think you could use a database scanner, or perhaps a data modeling tool, to pull out all the table and attribute names automatically.
The problem with that approach is that the database system catalogs for these packages contain no meaningful metadata, by which I mean ‘business names’ and descriptions for tables and fields. In addition, no relationships between tables are declared in the database, so it is virtually impossible to identify how tables are related. This matters if you want to know, for example, how “order” is related to “customer”.
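You can see the gap for yourself by querying the database’s own catalog. Below is a minimal sketch, assuming a SQL Server- or PostgreSQL-hosted package and an ODBC data source named erp_db (both illustrative, not from any particular product); the standard information_schema views expose physical names, but for most ERP and CRM packages no descriptions and no declared foreign keys.

```python
# Minimal sketch: inspect an ERP database through the standard
# information_schema views. The DSN name is a placeholder.
import pyodbc

conn = pyodbc.connect("DSN=erp_db")
cur = conn.cursor()

# Physical table names only -- no business names or descriptions here.
cur.execute("SELECT COUNT(*) FROM information_schema.tables")
print("tables:", cur.fetchone()[0])

# Declared foreign keys: for most large packages this returns zero,
# because relationships live in the application layer, not the database.
cur.execute("""
    SELECT COUNT(*)
    FROM information_schema.table_constraints
    WHERE constraint_type = 'FOREIGN KEY'
""")
print("declared foreign keys:", cur.fetchone()[0])
```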
To support your data monetisation initiative, you need to be able to find, quickly and accurately, which tables in your ERP and CRM packages represent the business context you are interested in, and to confirm that they actually contain useful information.
Even if you could access the data dictionary tables which do contain the rich metadata, and extract it into perhaps a spreadsheet or a database, you would still have to analyse and scope that metadata to find what you need.
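SAP is a concrete example: its data dictionary is itself stored in tables, such as DD02L (the table directory) and DD02T (table descriptions). The sketch below is illustrative only; it assumes direct SQL access to the underlying database, which many SAP licences restrict, and a placeholder connection as before.

```python
# Sketch: pull business-friendly table descriptions from SAP's data
# dictionary. DD02L lists tables; DD02T holds their language-dependent
# descriptions. Direct database access is assumed for illustration.
import pyodbc

conn = pyodbc.connect("DSN=sap_db")  # placeholder DSN
cur = conn.cursor()
cur.execute("""
    SELECT t.TABNAME, x.DDTEXT
    FROM DD02L t
    JOIN DD02T x
      ON x.TABNAME = t.TABNAME
     AND x.DDLANGUAGE = 'E'       -- English texts
    WHERE t.TABCLASS = 'TRANSP'   -- transparent (physical) tables only
""")
rows = cur.fetchall()
print(len(rows), "described tables; first few:", rows[:3])
```

Even then, you are left with tens of thousands of rows to sift through, which is exactly the analysis and scoping problem just described.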
When you consider this in the context of the size of these systems’ data models, the scale of the challenge becomes even more apparent. For example, SAP has over 90,000 tables, a typical Oracle eBusiness Suite implementation has over 20,000 tables, and even JD Edwards systems have in excess of 4,000 tables. We also have experience of Salesforce applications with over 3,000 tables.
It is unlikely that you will need all the metadata in your ERP or CRM system, even for a data catalog project. For example, does it make sense to include 90,000 tables and a million attributes from an SAP system if, say, half of them contain no data?
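One practical first cut is to drop tables that hold no rows at all. A hedged sketch, reusing the cursor from the first example and assuming a SQL Server host so row counts can be read cheaply from the sys.partitions catalog view rather than counting every table; other databases expose similar statistics.

```python
# Sketch: find which user tables actually contain data, using SQL
# Server's partition statistics (approximate, but cheap to read).
cur.execute("""
    SELECT o.name, SUM(p.rows) AS row_count
    FROM sys.objects o
    JOIN sys.partitions p ON p.object_id = o.object_id
    WHERE o.type = 'U' AND p.index_id IN (0, 1)
    GROUP BY o.name
""")
non_empty = [name for name, n in cur.fetchall() if n > 0]
print(len(non_empty), "tables actually contain data")
```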
Traditional methods of discovering metadata include searching through whatever documentation is available, engaging consulting services, or relying on internal technical specialists. There are some vendor tools; however, these are not designed for business or data analysts who have little or no technical knowledge of the application, and some of them do not provide a full representation of the available metadata.
Some users resort to searching the internet for data models of their ERP and CRM systems, or even to guesswork!
Even solutions based on machine learning need to be taught how the metadata or data is structured before they can be effective, and that can only really happen with clear insight into the ERP and CRM metadata.
Finally, these methods and tools do not share metadata easily with other technology platforms such as data catalogs, ETL or data modeling products.
The problem with these approaches is that they are long-winded, time-consuming, expensive and often inaccurate. In some cases you may find that this causes business users to lose faith in the ability of IT to deliver and tempts them to start looking for their own solutions.
Once you have identified the relevant metadata you require from these packages, it is likely that you will need to reuse it quickly and easily in the other technologies being used on the project.
However you access and scope the metadata, if the results are not produced in a format which can easily be brought into those other products, you may have to resort to hand-keying or copying and pasting it. This is a time-consuming, costly and potentially inaccurate method, especially for large numbers of tables or tables with many columns.
Imagine, for example, trying to rekey the information from a single SAP table, MARA (General Material Data), into a data catalog or data model. It has over 240 attributes (columns), each with a business name and a more detailed description, and is related to over 1,500 other tables in the SAP system. We normally estimate that it takes about one day to rekey the metadata from five tables accurately into another system.
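The arithmetic makes the point. Using that estimate of roughly five tables per person-day, even a modest, purely hypothetical scope of 200 tables works out at eight working weeks of manual effort:

```python
# Back-of-envelope rekeying estimate using the rate quoted above.
# The 200-table scope is an illustrative assumption, not a benchmark.
TABLES_PER_PERSON_DAY = 5
scope_tables = 200
effort_days = scope_tables / TABLES_PER_PERSON_DAY
print(f"{scope_tables} tables ~ {effort_days:.0f} person-days")  # 40 person-days
```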
Since you are likely to need metadata from far more than a few tables for your project, this is not really a viable option.
Obviously the quickest, most effective and most accurate method is to import the metadata using whatever mechanisms are available. However, this is not necessarily a simple task, especially when the metadata is complex, as you will also need to understand the data structures of the target products in order to create the necessary source-to-target mappings. If you have large quantities of metadata, there may be issues with processing speed. And if you have not been able to access and analyse the rich metadata in the application, then loading bare physical table and column names into the data catalog is of dubious value.
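If the extraction step has been done, the sharing step can at least be automated. Below is a minimal sketch that writes table-level metadata to a generic CSV; the column headings and file name are invented for illustration, since every catalog, ETL or modeling product defines its own import template and mapping rules.

```python
# Sketch: export extracted metadata to a generic CSV for import into
# another tool. Headings and file name are illustrative assumptions.
import csv

# (physical name, business name) pairs, e.g. from the dictionary query above
rows = [
    ("MARA", "General Material Data"),
    ("KNA1", "General Data in Customer Master"),
]

with open("catalog_import.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["physical_name", "business_name"])
    writer.writerows(rows)
```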
What are the implications of not being able to access, analyse and share this metadata?
It could mean that data from your ERP or CRM packages is not included in data monetisation projects at all, rendering those initiatives of limited value.
Or the anticipated benefits are delayed whilst the work to include that data is completed; in that case, costs are also likely to be higher than budgeted.
Unwanted outcomes such as these can reduce the trust and faith the business has in IT’s ability to deliver, leading to reduced budgets in future, or to departments and business units deciding to take control for themselves.
Take some time to consider or ask your team:
- Can data from your ERP and CRM systems be included in data and information management projects in acceptable timescales and with an appropriate level of accuracy? If not, why not?
- Do these enterprise applications act as a drag on your digital transformation or data monetisation initiatives? If so, what is it about them that makes that happen, and what is the cost to your organisation?
- How is metadata discovery or source data analysis done for your CRM and ERP packages now?
- Who does that work? Are they internal technical specialists or data analysts? Do you have to use third-party external resources?
- What is the cost of this work? What tools and techniques do they use?
- How long does it take to find the tables and related tables which contain the data needed?
- How accurate are the results?
- How easy is it to check accuracy and how much rework is needed to correct mistakes?
- How far-reaching is any rework, and what does it cost?
Finally, ask your vendors of data catalog, ETL, data warehouse, data lake or master data management tools what solutions they have for accessing, analysing and, most importantly, using the rich and helpful metadata that does exist in your ERP or CRM packages.
If you recognise this problem and would like to avoid it for current or future projects, contact us to learn more about how Safyr can help.
You can use Safyr to help you overcome the three obstacles above and accelerate the delivery of your projects whilst managing costs and accuracy.
With Safyr you extract the rich, useful metadata from your own ERP and CRM systems, including customisations, and store it in a repository. You can then navigate, analyse and scope that metadata to create subsets which represent the business topics you are interested in. Finally, those subsets can be shared automatically with a variety of technology platforms and formats, eliminating the need for rekeying or manual integration.
There is more information on Safyr here, or you can follow this link to download a trial version of the software.