Spare a thought for your data engineers, analysts and architects. They are often, it seems to me, in that difficult place where an irresistible force meets an immovable object: the point where the business's expectations of ever faster access to quality data come face to face with the realities of data management in corporate IT. To meet those demands, one of the most significant things the data architect needs is access to the enterprise metadata in your organization's technology applications. This metadata provides the definitions and data structures which are vital for complex projects. And, as any of them will tell you, accessing, understanding and using this information is not a trivial task.
The reason this can be such a challenge, of course, is that corporate IT has spent the past 50 years or so building an ever more complex landscape of applications and systems, some or all of which may be required to contribute to data-intensive projects. That means more and more metadata to access, understand and exploit. Naturally, there are no standards which vendors have followed in building the data foundations of their applications.
Of course, the computing industry has made great strides in building and selling ever more advanced analytics, ETL, data warehouse and data lake tools, as well as sophisticated data catalog, lineage, governance and quality solutions, amongst others. However, until recently, metadata discovery has largely been ignored and left to technical specialists to deliver.
Enterprise metadata is important because it describes the building blocks of the source systems, and comprehensive knowledge of it is critical for all projects involving data. But actually finding, making sense of and using that metadata can be difficult, time consuming and expensive, and can ultimately act as a drag on project delivery.
Perhaps this is one reason why it is sometimes still so difficult to meet the expectations of the business? Perhaps a significant part of the problem lies in the challenge of finding and understanding the enterprise metadata which is so critical if new data transformation programs are to deliver against their expectations?
And this is where some of the challenges and delays occur: while the force of the business imperatives is well understood, the realities of enterprise metadata discovery in a complex IT landscape often act as an impediment to delivering the data needed.
Of course, some metadata is easy to access and understand. For example, the database system catalog supporting a limited, home-grown database application can usually be reverse engineered or accessed by crawlers and used for ETL processes, within a data catalog solution, or to support a data lake. This is because the metadata in the system catalog contains both technical and logical (or business-friendly) names for tables and attributes. It may also include the relationships between tables by way of Primary and Foreign Key constraints, so the task of a data analyst or enterprise architect is made easier when they come to use this metadata to support the business. Other sources such as files, spreadsheets, some smaller applications, machine data, and even social media tools also give up their metadata relatively easily.
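To illustrate how little effort this "easy" case takes, here is a minimal sketch of reading table names, column definitions and foreign key relationships from a database's own catalog. SQLite stands in for any relational database, and the example schema (customer and sales_order tables) is invented for illustration; other databases expose the same information through views such as information_schema.

```python
# Minimal sketch: recovering metadata from a database system catalog.
# SQLite is used as a stand-in; the schema below is invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        customer_name TEXT
    );
    CREATE TABLE sales_order (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id)
    );
""")

# Table names come straight from the system catalog.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Column names and types, per table.
columns = {t: [(c[1], c[2]) for c in conn.execute(f"PRAGMA table_info({t})")]
           for t in tables}

# Foreign keys expose the relationships between tables:
# (referenced table, local column, referenced column).
fks = [(fk[2], fk[3], fk[4])
       for fk in conn.execute("PRAGMA foreign_key_list(sales_order)")]

print(tables)  # ['customer', 'sales_order']
print(fks)     # [('customer', 'customer_id', 'customer_id')]
```

A crawler or data catalog tool does essentially this, at scale, against every schema it can reach, which is why such sources give up their metadata so readily.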
Some sources present a much more difficult challenge. Good examples are the large, complex, and often heavily customised packaged applications from vendors such as SAP, Oracle, Microsoft and Salesforce. Some of the newer breed of SaaS-based packages also make it tough to access and understand their enterprise metadata. These applications are often critical to a project because they store so much of the data that is required.
These problems are caused by characteristics which these packages have in common. Most of them have many thousands of data tables and millions of attributes, which means there is a lot of metadata to find and analyse. As an illustration, a standard SAP implementation has over 90,000 base tables. Typically, these applications do not store the metadata which is really valuable to data architects, such as business names for tables and attributes, and the relationships between tables, in the database system catalog. This renders database scanners virtually useless. Instead, the metadata in each of these applications is usually held in a series of data dictionary tables. It is structured differently in each system, and each system has its own method for exposing the metadata to third-party products. These approaches include reading the data dictionary tables via SQL or an API, or, in the case of SAP, via ABAP.
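The shape of the problem can be sketched in a few lines. In SAP, for example, table definitions live in the data dictionary table DD02L and their business descriptions in the language-dependent text table DD02T, rather than in the database catalog. The rows below are invented for illustration (the descriptions shown are the standard SAP texts for these tables), and a real extraction would read the dictionary via SQL, an API or ABAP rather than from in-memory lists.

```python
# Sketch of why packaged-application metadata is harder: business names live
# in the application's own dictionary tables (here modelled on SAP's DD02L
# and DD02T), not in the database system catalog. Rows are illustrative only.

# DD02L: one row per table, technical name only.
dd02l = [{"TABNAME": "KNA1"}, {"TABNAME": "VBAK"}]

# DD02T: business descriptions, keyed by table name and language ("E" = English).
dd02t = {
    ("KNA1", "E"): "General Data in Customer Master",
    ("VBAK", "E"): "Sales Document: Header Data",
}

def business_names(tables, texts, language="E"):
    """Join technical table names to their business-friendly descriptions."""
    return {t["TABNAME"]: texts.get((t["TABNAME"], language), "<no text>")
            for t in tables}

catalog = business_names(dd02l, dd02t)
print(catalog["KNA1"])  # General Data in Customer Master
```

The join itself is trivial; the difficulty is doing it across tens of thousands of tables, in a dictionary structure that is different for every vendor's package.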
A further problem is that once the source of the enterprise metadata has been identified and accessed, the data architect needs to be able to analyse and curate it in order to isolate what is valuable in the context of the business-facing application. Doing this with systems which have thousands, or tens of thousands, of tables with complex relationships is not a simple task.
Finally, enterprise metadata should never live in a vacuum. It needs to be shared and used with other tools and products: for example, data catalog and governance applications, ETL and data warehouse or data lake products, and data modeling tools for enterprise architecture.
Before embarking on acquiring more technology, the first question to ask is whether the methods and techniques your teams have been using to do this work up to now are still fit for purpose in a business environment which is rapidly changing and becoming ever more competitive.
Do they enable your data engineers, analysts and architects to deliver what your business needs in terms of up to date, accurate data, or metadata for your data catalog, data governance, data warehouse and analytics projects?
Do these methods contribute to the delayed delivery of critical initiatives or cost overruns?
Have problems with data discovery meant that you have had to limit the scope of some projects, or even cancel them?
If the answers to the questions above are ‘yes’, then it is time to start thinking differently and to ensure that your data teams have access to specialist tools designed to make it easier to discover and work with large quantities of complex, difficult-to-access metadata and to exploit it in other technology platforms.
Without these, their job is harder, takes longer and may mean that the data your business is relying on is not accurate or takes too long to deliver.
To learn how Safyr could help you change the way your data analysts discover and make use of metadata from SAP, Oracle, Microsoft and Salesforce applications, visit our website.