Building a Data Dictionary for SQL Server, MySQL, and Postgres

Table of Contents

TL;DR - Give me the code, now!

Introduction

As a data analyst in a large enterprise, we have to touch tens of application databases, and each database would have tens and hundreds of tables. Building a data dictionary is a crucial step to help discover what data is available.

At first sight, this might seem like a trivial problem involving queries to many information schema tables, but as we have identified several additional attributes which would be useful to determine which table/data source is useful, we found those information simply not available in information schemas (such as column descriptions in SQL Server).

And to take this to a further level, since our databases are centrally registered on one main SQL Server as linked server, we would like to fully automate and implement a stored procedure to crawl and collect all schema information into a single data dictionary table.

In this post, we will automate the building of data dictionary on a SQL Server, by collecting database schemas from local databases and databases from SQL Server, MySQL and PostgreSQL linked servers. Since the script is written in declarative style, you could easily expand and adapt to other linked DBMS solutions. You can also use the individual SQL per DBMS if you are not interested in building a central data dictionary on a SQL server.

This post is tested against SQL Server 2016

What we will be collecting

With the basics such as table name, column name and data type in mind, we have decided to collect the following information which would be beneficial to identify the purpose of the column and overall quality of the data source:

  • Foreign Key relationships - This will be helpful to identify table references and the corresponding primary key column, especially when the foreign key column is not named predictably
  • Table and Column description - In each DBMS, comments and description could be added to tables and columns, and those might already be populated by the application developer. For example, when I designed one of my department’s application, the columns’ description are written in SSDT and subsequently written to the database
  • (Estimated) Table Row count - This will be helpful to identify (1) in case of replication, tables that were not replicated; and (2) tables belonging to features which were not used. Collecting actual row count would be a performance nightmare, so we will turn to alternative methods as we outline per each DBMS below

Note: The foreign key information is not collected for Postgres because I am not familiar enough with Postgres to write that part of the query, you are welcomed to contribute!

The code

DataDictionary table structure

First of all, we need to define the data dictionary’s schema, below is the schema we will be using. In additional to the fields and information named above, we have added [ColumnDescriptionDD], which is a special column for us to manually put in our remarks, and this field will be persisted across automatic schema pulling.

CREATE TABLE [dbo].[DataDictionary]
(
	[SystemName] VARCHAR(255) NOT NULL , 
	[ServerName] VARCHAR(255) NOT NULL, 
	[DatabaseName] VARCHAR(255) NOT NULL, 
	[SchemaName] VARCHAR(255) NOT NULL, 
	[TableName] VARCHAR(255) NOT NULL, 
	[ColumnName] VARCHAR(255) NOT NULL, 
	[OrdinalPosition] INT NULL, 
	[DataType] VARCHAR(255) NULL, 
	[ForeignKeyTo] VARCHAR(255) NULL, 
	[TableDescription] NVARCHAR(1024) NULL, 
	[ColumnDescription] NVARCHAR(1024) NULL, 
	[TableRowCount] BIGINT NULL, 
	[ColumnDescriptionDD] NVARCHAR(1024) NULL,
	[FirstSeenAt] DATETIME2(0) NOT NULL DEFAULT CURRENT_TIMESTAMP, 
	[LastSeenAt] DATETIME2(0) NULL, 
	CONSTRAINT [PK_DataDictionary] PRIMARY KEY NONCLUSTERED ([ServerName], [DatabaseName], [SchemaName], [TableName], [ColumnName]) 
)

CREATE CLUSTERED INDEX [IX_DataDictionary_Clustered] ON [dbo].[DataDictionary] ([ServerName], [DatabaseName], [SchemaName], [TableName], [OrdinalPosition])

GO
EXEC sp_addextendedproperty @name = N'MS_Description',
	@value = N'The name of the last linked server in a linked server chain (not necessarily the hostname of the server itself)',
	@level0type = N'SCHEMA',
	@level0name = N'dbo',
	@level1type = N'TABLE',
	@level1name = N'DataDictionary',
	@level2type = N'COLUMN',
	@level2name = N'ServerName'
GO
EXEC sp_addextendedproperty @name = N'MS_Description',
	@value = N'Populated when the column disappears (i.e. when table, schema, or server is gone). Can be reverted to NULL when column reappears',
	@level0type = N'SCHEMA',
	@level0name = N'dbo',
	@level1type = N'TABLE',
	@level1name = N'DataDictionary',
	@level2type = N'COLUMN',
	@level2name = N'LastSeenAt'
GO
EXEC sp_addextendedproperty @name = N'MS_Description',
	@value = N'NULL means the column has been deleted',
	@level0type = N'SCHEMA',
	@level0name = N'dbo',
	@level1type = N'TABLE',
	@level1name = N'DataDictionary',
	@level2type = N'COLUMN',
	@level2name = N'OrdinalPosition'

Collecting schema from SQL Server

The following query is for collecting the data dictionary fields from SQL Server, with $DatabaseName being the placeholder for specifying the SQL Database name.

To filter out irrelevant tables, a WHERE clause has been added to exclude tables under a user schema (you should adjust the DOMAIN\ accordingly), tables tagged with TEST or BAK, and internal tables created by Master Data Service (MDS).

SELECT
	schemas.name AS SchemaName,
	tables.name AS TableName,
	columns.name AS ColumnName,
	ROW_NUMBER() OVER (PARTITION BY columns.object_id ORDER BY columns.column_id) AS OrdinalPosition,
	types.name + CASE
		WHEN types.name IN ('binary', 'char', 'varbinary', 'varchar') THEN '(' + CASE WHEN columns.max_length = -1 THEN 'MAX' ELSE CAST(columns.max_length AS VARCHAR(255)) END + ')'
		WHEN types.name IN ('nchar', 'nvarchar') THEN '(' + CASE WHEN columns.max_length = -1 THEN 'MAX' ELSE CAST(columns.max_length / 2 AS VARCHAR(255)) END + ')'
		WHEN types.name IN ('decimal', 'numeric') THEN '(' + CAST(columns.precision AS VARCHAR(255)) + ',' + CAST(columns.scale AS VARCHAR(255)) + ')'
		WHEN types.name IN ('datetime2', 'datetimeoffset', 'time') THEN '(' + CAST(columns.scale AS VARCHAR(255)) + ')'
		ELSE ''
	END AS DataType,
	fk_ref.ReferencedColumnQN AS ForeignKeyTo,
	table_desc.Description AS TableDescription,
	col_desc.Description AS ColumnDescription,
	tstat.TableRowCount
FROM $DatabaseName.sys.columns columns
INNER JOIN (
	SELECT name, object_id, schema_id, is_ms_shipped FROM $DatabaseName.sys.tables
	UNION ALL
	SELECT name, object_id, schema_id, is_ms_shipped FROM $DatabaseName.sys.views
) tables ON tables.object_id = columns.object_id
INNER JOIN $DatabaseName.sys.schemas schemas ON schemas.schema_id = tables.schema_id
INNER JOIN $DatabaseName.sys.types types ON types.user_type_id = columns.user_type_id
LEFT JOIN ( -- Get table row count (LEFT JOIN because VIEWs do not have a row count)
	-- Ref: https://stackoverflow.com/a/28917736
	SELECT object_id, SUM([rows]) AS TableRowCount
	FROM $DatabaseName.sys.partitions
	WHERE index_id in (0,1)
	GROUP BY object_id
) tstat ON tstat.object_id = tables.object_id
LEFT JOIN ( -- Get foreign key references (If one column has multiple FK, one will be chosen to display arbitrarily, however this should be rare in any sane DB design)
	SELECT
		parent_object_id,
		parent_column_id,
		ReferencedColumnQN
	FROM (
		SELECT
			fkc.parent_object_id,
			fkc.parent_column_id,
			QUOTENAME(s_child.name) + '.' + QUOTENAME(t_child.name) + '.' + QUOTENAME(c_child.name) AS ReferencedColumnQN, -- Qualified Name
			ROW_NUMBER() OVER (PARTITION BY fkc.parent_object_id, fkc.parent_column_id ORDER BY fkc.constraint_object_id) AS Priority -- ORDER BY is arbitrary
		FROM $DatabaseName.sys.foreign_keys fk
		INNER JOIN $DatabaseName.sys.foreign_key_columns  fkc		ON fkc.constraint_object_id = fk.object_id
		INNER JOIN $DatabaseName.sys.tables			   t_child	ON t_child.object_id = fkc.referenced_object_id
		INNER JOIN $DatabaseName.sys.columns			  c_child	ON c_child.object_id = t_child.object_id		 AND fkc.referenced_column_id = c_child.column_id
		INNER JOIN $DatabaseName.sys.schemas			  s_child	ON s_child.schema_id = t_child.schema_id
	) t
	WHERE Priority = 1
) fk_ref ON fk_ref.parent_object_id = columns.object_id AND fk_ref.parent_column_id = columns.column_id
LEFT JOIN (
	-- Ref: https://learn.microsoft.com/en-us/sql/relational-databases/system-catalog-views/extended-properties-catalog-views-sys-extended-properties?view=sql-server-ver16
	SELECT
		major_id AS object_id,
		CAST(value AS NVARCHAR(1024)) AS Description
	FROM $DatabaseName.sys.extended_properties
	WHERE class = 1 AND minor_id = 0 AND name = 'MS_Description'
) table_desc ON table_desc.object_id = columns.object_id
LEFT JOIN (
	-- See table_desc reference
	SELECT
		major_id AS object_id,
		minor_id AS column_id,
		CAST(value AS NVARCHAR(1024)) AS Description
	FROM $DatabaseName.sys.extended_properties
	WHERE class = 1 AND name = 'MS_Description'
) col_desc ON col_desc.object_id = columns.object_id AND col_desc.column_id = columns.column_id
WHERE	
	tables.is_ms_shipped = 0
	AND NOT (schemas.name LIKE 'DOMAIN\%') -- Ignore personal tables
	AND NOT (tables.name LIKE '%_BAK_%' OR tables.name LIKE '%_TEST_%') -- Intermediate tables
	AND NOT (schemas.name = 'mdm' AND (tables.name LIKE 'viw_%' OR tables.name LIKE 'tbl_%')) -- MDS internal tables
	AND NOT (schemas.name = 'stg' AND (tables.name LIKE 'viw_%' OR tables.name LIKE '%_Leaf')) -- MDS internal tables
	AND NOT (schemas.name = 'stg' AND tables.name LIKE '[[]stg].[[]viw_%') -- MDS internal tables, name looks malformed but it exists

Collecting schema from MySQL

The following query is for collecting the data dictionary fields from MySQL. Since there is no catalog or database in MySQL, there is no $DatabaseName placeholder in this SQL, and as a result the SQL for pulling MySQL schema is much simpler.

It should be noted that, the row count estimation could be a very rough figure, as documented in MySQL Reference Manual:

The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40% to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.

SELECT
	columns.TABLE_SCHEMA AS SchemaName,
	columns.TABLE_NAME AS TableName,
	columns.COLUMN_NAME AS ColumnName,
	columns.ORDINAL_POSITION AS OrdinalPosition,
	columns.COLUMN_TYPE AS DataType, -- Can list unassigned for number and enum type
	fk_ref.ForeignKeyQN AS ForeignKeyTo,
	LEFT(tables.TABLE_COMMENT, 1024) AS TableDescription,
	LEFT(columns.COLUMN_COMMENT, 1024) AS ColumnDescription,
	tables.TABLE_ROWS AS TableRowCount
FROM INFORMATION_SCHEMA.COLUMNS columns
INNER JOIN INFORMATION_SCHEMA.TABLES tables ON tables.TABLE_CATALOG = columns.TABLE_CATALOG AND tables.TABLE_SCHEMA = columns.TABLE_SCHEMA AND tables.TABLE_NAME = columns.TABLE_NAME
LEFT JOIN (
	SELECT
		rc.CONSTRAINT_SCHEMA,
		rc.TABLE_NAME,
		kcu.COLUMN_NAME,
		GROUP_CONCAT(CONCAT_WS(''.'', kcu.REFERENCED_TABLE_SCHEMA, kcu.REFERENCED_TABLE_NAME, kcu.REFERENCED_COLUMN_NAME)) AS ForeignKeyQN
	FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
	JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu ON
		rc.CONSTRAINT_SCHEMA = kcu.CONSTRAINT_SCHEMA
		AND rc.TABLE_NAME = kcu.TABLE_NAME
		AND rc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
	GROUP BY rc.CONSTRAINT_SCHEMA, rc.TABLE_NAME, kcu.COLUMN_NAME
) fk_ref ON fk_ref.CONSTRAINT_SCHEMA = columns.TABLE_SCHEMA AND fk_ref.TABLE_NAME = columns.TABLE_NAME AND fk_ref.COLUMN_NAME = columns.COLUMN_NAME
WHERE columns.TABLE_SCHEMA NOT IN ('sys', 'mysql', 'information_schema', 'performance_schema', 'mysql_innodb_cluster_metadata')

Collecting schema from Postgres

The following query is for collecting the data dictionary fields from Postgres. $DatabaseName is the placeholder for specifying the SQL Database name.

Note that:

  • The row count does work for partitioned table (equals to the sum of row count of the individual table partitions)
  • The row count accuracy “depends on whether ANALYZE or VACUUM are run enough - where “enough” is defined by the level of write activity to your table”. See this StackOverflow answer
  • The foreign key information is not collected (because I am not familiar enough with Postgres)
SELECT
	schemas.nspname AS SchemaName,
	tables.relname AS TableName,
	columns.attname AS ColumnName,
	ROW_NUMBER() OVER (PARTITION BY columns.attrelid ORDER BY columns.attnum) AS OrdinalPosition,
	format_type(columns.atttypid, columns.atttypmod) AS DataType,
	NULL AS ForeignKeyTo, -- Not implemented
	left(obj_description(columns.attrelid, 'pg_class'), 1024) AS TableDescription,
	left(col_description(columns.attrelid, columns.attnum), 1024) AS ColumnDescription,
	tables.reltuples::bigint + COALESCE(tstat2.children_reltuples::bigint, 0) AS TableRowCount
FROM $DatabaseName.pg_catalog.pg_attribute columns
INNER JOIN (
	SELECT oid, relname, relnamespace, reltuples
	FROM $DatabaseName.pg_catalog.pg_class
	WHERE relkind IN ('p', 'r', 'v', 'm') AND NOT relispartition
) tables ON tables.oid = columns.attrelid
INNER JOIN $DatabaseName.pg_catalog.pg_namespace schemas ON schemas.oid = tables.relnamespace
LEFT JOIN (
	SELECT pg_inherits.inhparent AS oid, SUM(reltuples::bigint) AS children_reltuples
	FROM $DatabaseName.pg_catalog.pg_inherits
	INNER JOIN $DatabaseName.pg_catalog.pg_class children ON children.oid = pg_inherits.inhrelid
	GROUP BY pg_inherits.inhparent
) tstat2 ON tstat2.oid = columns.attrelid
WHERE
	columns.attnum > 0
	AND NOT columns.attisdropped
	AND schemas.nspname NOT IN ('pg_catalog', 'information_schema')

Run the queries against linked servers

As mentioned in the beginning, we have some tens of linked servers (or even double linked servers) and databases to crawl, therefore a declarative script is crucial to fetch all the schemas without repeating the bulky SQLs, this is the reason why $DatabaseName is used throughout the standalone SQLs above.

Here we have hardcoded to support only 2 levels of linked server, this limitation came from the fact that we are using SQL Server 2016, and we cannot reliably split notations like JUMP1.JUMP2.JUMP3.APP_DB because the STRING_SPLIT function does not guarantee order until SQL Server 2022 with <em>enable_ordinal</em> argument, and since in our environment max is second level so we went with 2. It should be trivial to duplicate relevant lines to support more levels.

We also used a table variable instead of directly writing into the data dictionary table to avoid locks and black out period.

DECLARE @SQLServerCols VARCHAR(MAX) = '...';
DECLARE @MySQLCols VARCHAR(MAX) = '...';
DECLARE @PostgresCols VARCHAR(MAX) = '...';

DECLARE @Targets TABLE(
	SystemName VARCHAR(255) PRIMARY KEY,
	ServerNameL1 VARCHAR(255),
	ServerNameL2 VARCHAR(255),
	DatabaseName VARCHAR(255),
	ColsSQLTemplate VARCHAR(MAX)
);

INSERT INTO @Targets VALUES
	-- Local databases
	('Application 1',      NULL,           NULL,          'App1',       @SQLServerCols),

	-- Linked Servers
	('Application 2',      'SVR_APP2_DB',  NULL,          'App2',       @SQLServerCols),

	-- Double-Linked Server
	('Application 3',     'SVR_WAREHOUSE', 'SVR_APP3_DB', 'App3',       @SQLServerCols),
	('Application 4',     'SVR_WAREHOUSE', 'SVR_APP4_DB', 'App4',       @SQLServerCols),
	('Application 5',     'SVR_JUMPHOST',  'SVR_APP5_DB', 'myreportdb', @PostgresCols),
	('WordPress Website', 'SVR_JUMPHOST',  'SVR_WP_DB',   'def',        @MySQLCols)
;

DECLARE @SystemName VARCHAR(255);
DECLARE @ServerNameL1 VARCHAR(255);
DECLARE @ServerNameL2 VARCHAR(255);
DECLARE @DatabaseName VARCHAR(255);
DECLARE @ColsSQLTemplate VARCHAR(MAX);

DECLARE @DataDictionary TABLE -- See DataDictionary table structure
(
	[SystemName] VARCHAR(255) NOT NULL , 
	[ServerName] VARCHAR(255) NOT NULL, 
	[DatabaseName] VARCHAR(255) NOT NULL, 
	[SchemaName] VARCHAR(255) NOT NULL, 
	[TableName] VARCHAR(255) NOT NULL, 
	[ColumnName] VARCHAR(255) NOT NULL, 
	[OrdinalPosition] INT NULL, 
	[DataType] VARCHAR(255) NULL, 
	[ForeignKeyTo] VARCHAR(255) NULL, 
	[TableDescription] NVARCHAR(1024) NULL, 
	[ColumnDescription] NVARCHAR(1024) NULL, 
	[TableRowCount] BIGINT NULL
)

DECLARE MyCursor CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR
SELECT * FROM @Targets;

OPEN MyCursor
FETCH NEXT FROM MyCursor INTO @SystemName, @ServerNameL1, @ServerNameL2, @DatabaseName, @ColsSQLTemplate
WHILE @@FETCH_STATUS = 0
BEGIN 

	DECLARE @TSQL VARCHAR(MAX) = REPLACE(@ColsSQLTemplate, '$DatabaseName', @DatabaseName);

	IF @ServerNameL2 IS NOT NULL
	SET @TSQL = 'SELECT * FROM OPENQUERY(' + @ServerNameL2 + ',''' + REPLACE(@TSQL, '''', '''''') + ''')';

	IF @ServerNameL1 IS NOT NULL
	SET @TSQL = 'SELECT * FROM OPENQUERY(' + @ServerNameL1 + ',''' + REPLACE(@TSQL, '''', '''''') + ''')';

	SET @TSQL = 'SELECT ''' + @SystemName + ''' AS SystemName, ''' + COALESCE(@ServerNameL2, @ServerNameL1, @@SERVERNAME) + ''' AS ServerName, ''' + @DatabaseName + ''' AS DatabaseName, *
	FROM (' + @TSQL + ') t
	ORDER BY SchemaName, TableName, OrdinalPosition';

	RAISERROR ('Processing System "%s"', 0, 1, @SystemName) WITH NOWAIT;

	INSERT INTO @DataDictionary
	EXEC (@TSQL);

	FETCH NEXT FROM MyCursor INTO @SystemName, @ServerNameL1, @ServerNameL2, @DatabaseName, @ColsSQLTemplate
END
CLOSE MyCursor
DEALLOCATE MyCursor

Writing the final result into DataDictionary

Here we will make use of SQL Server’s MERGE statement, which allows us to simultaneously write newly found columns, update existing column definitions, and mark existing column as deleted, while simultaneously retaining the [ColumnDescriptionDD] column we defined and populated manually.

MERGE
	INTO DataDictionary AS dd
	USING @DataDictionary AS ddnew
	ON
		dd.ServerName = ddnew.ServerName
		AND dd.DatabaseName = ddnew.DatabaseName
		AND dd.SchemaName = ddnew.SchemaName
		AND dd.TableName = ddnew.TableName
		AND dd.ColumnName = ddnew.ColumnName
	WHEN MATCHED
		THEN
			UPDATE
			SET OrdinalPosition = ddnew.OrdinalPosition,
				DataType = ddnew.DataType,
				ForeignKeyTo = ddnew.ForeignKeyTo,
				TableDescription = ddnew.TableDescription,
				ColumnDescription = ddnew.ColumnDescription,
				TableRowCount = ddnew.TableRowCount,
				LastSeenAt = NULL
	WHEN NOT MATCHED
		THEN
			INSERT (SystemName, ServerName, DatabaseName, SchemaName, TableName, ColumnName, OrdinalPosition, DataType, ForeignKeyTo, TableDescription, ColumnDescription, TableRowCount)
			VALUES (SystemName, ServerName, DatabaseName, SchemaName, TableName, ColumnName, OrdinalPosition, DataType, ForeignKeyTo, TableDescription, ColumnDescription, TableRowCount)
	WHEN NOT MATCHED BY SOURCE
		THEN
			UPDATE
			SET OrdinalPosition = NULL,
				LastSeenAt = CURRENT_TIMESTAMP

Putting it altogether

Now that we have all the puzzle pieces, we will combine them into this gigantic SQL enclosed in a stored procedure below.

Before you run the script, make sure to:

  • Inspect and update the WHERE clauses, adjust to your environment’s need
  • Change the DOMAIN\ filter to match your domain’s name (or remove if you are not in a domain environment)
  • Update the @Targets variable to specify linked servers and database names you would like to crawl from

And you are good to go! You can then deploy the procedure and set up a scheduled task or SQL Agent Job to routinely pull the schemas for you.

CREATE PROCEDURE [dbo].[usp_CollectDataDictionary]
AS
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	DECLARE @SQLServerCols VARCHAR(MAX) = '
	SELECT
		schemas.name AS SchemaName,
		tables.name AS TableName,
		columns.name AS ColumnName,
		ROW_NUMBER() OVER (PARTITION BY columns.object_id ORDER BY columns.column_id) AS OrdinalPosition,
		types.name + CASE
			WHEN types.name IN (''binary'', ''char'', ''varbinary'', ''varchar'') THEN ''('' + CASE WHEN columns.max_length = -1 THEN ''MAX'' ELSE CAST(columns.max_length AS VARCHAR(255)) END + '')''
			WHEN types.name IN (''nchar'', ''nvarchar'') THEN ''('' + CASE WHEN columns.max_length = -1 THEN ''MAX'' ELSE CAST(columns.max_length / 2 AS VARCHAR(255)) END + '')''
			WHEN types.name IN (''decimal'', ''numeric'') THEN ''('' + CAST(columns.precision AS VARCHAR(255)) + '','' + CAST(columns.scale AS VARCHAR(255)) + '')''
			WHEN types.name IN (''datetime2'', ''datetimeoffset'', ''time'') THEN ''('' + CAST(columns.scale AS VARCHAR(255)) + '')''
			ELSE ''''
		END AS DataType,
		fk_ref.ReferencedColumnQN AS ForeignKeyTo,
		table_desc.Description AS TableDescription,
		col_desc.Description AS ColumnDescription,
		tstat.TableRowCount
	FROM $DatabaseName.sys.columns columns
	INNER JOIN (
		SELECT name, object_id, schema_id, is_ms_shipped FROM $DatabaseName.sys.tables
		UNION ALL
		SELECT name, object_id, schema_id, is_ms_shipped FROM $DatabaseName.sys.views
	) tables ON tables.object_id = columns.object_id
	INNER JOIN $DatabaseName.sys.schemas schemas ON schemas.schema_id = tables.schema_id
	INNER JOIN $DatabaseName.sys.types types ON types.user_type_id = columns.user_type_id
	LEFT JOIN ( -- Get table row count (LEFT JOIN because VIEWs do not have a row count)
		-- Ref: https://stackoverflow.com/a/28917736
		SELECT object_id, SUM([rows]) AS TableRowCount
		FROM $DatabaseName.sys.partitions
		WHERE index_id in (0,1)
		GROUP BY object_id
	) tstat ON tstat.object_id = tables.object_id
	LEFT JOIN ( -- Get foreign key references (If one column has multiple FK, one will be chosen to display arbitrarily, however this should be rare in any sane DB design)
		SELECT
			parent_object_id,
			parent_column_id,
			ReferencedColumnQN
		FROM (
			SELECT
				fkc.parent_object_id,
				fkc.parent_column_id,
				QUOTENAME(s_child.name) + ''.'' + QUOTENAME(t_child.name) + ''.'' + QUOTENAME(c_child.name) AS ReferencedColumnQN, -- Qualified Name
				ROW_NUMBER() OVER (PARTITION BY fkc.parent_object_id, fkc.parent_column_id ORDER BY fkc.constraint_object_id) AS Priority -- ORDER BY is arbitrary
			FROM $DatabaseName.sys.foreign_keys fk
			INNER JOIN $DatabaseName.sys.foreign_key_columns  fkc		ON fkc.constraint_object_id = fk.object_id
			INNER JOIN $DatabaseName.sys.tables			   t_child	ON t_child.object_id = fkc.referenced_object_id
			INNER JOIN $DatabaseName.sys.columns			  c_child	ON c_child.object_id = t_child.object_id		 AND fkc.referenced_column_id = c_child.column_id
			INNER JOIN $DatabaseName.sys.schemas			  s_child	ON s_child.schema_id = t_child.schema_id
		) t
		WHERE Priority = 1
	) fk_ref ON fk_ref.parent_object_id = columns.object_id AND fk_ref.parent_column_id = columns.column_id
	LEFT JOIN (
		-- Ref: https://learn.microsoft.com/en-us/sql/relational-databases/system-catalog-views/extended-properties-catalog-views-sys-extended-properties?view=sql-server-ver16
		SELECT
			major_id AS object_id,
			CAST(value AS NVARCHAR(1024)) AS Description
		FROM $DatabaseName.sys.extended_properties
		WHERE class = 1 AND minor_id = 0 AND name = ''MS_Description''
	) table_desc ON table_desc.object_id = columns.object_id
	LEFT JOIN (
		-- See table_desc reference
		SELECT
			major_id AS object_id,
			minor_id AS column_id,
			CAST(value AS NVARCHAR(1024)) AS Description
		FROM $DatabaseName.sys.extended_properties
		WHERE class = 1 AND name = ''MS_Description''
	) col_desc ON col_desc.object_id = columns.object_id AND col_desc.column_id = columns.column_id
	WHERE	
		tables.is_ms_shipped = 0
		AND NOT (schemas.name LIKE ''DOMAIN\%'') -- Ignore personal tables
		AND NOT (tables.name LIKE ''%_BAK_%'' OR tables.name LIKE ''%_TEST_%'') -- Intermediate tables
		AND NOT (schemas.name = ''mdm'' AND (tables.name LIKE ''viw_%'' OR tables.name LIKE ''tbl_%'')) -- MDS internal tables
		AND NOT (schemas.name = ''stg'' AND (tables.name LIKE ''viw_%'' OR tables.name LIKE ''%_Leaf'')) -- MDS internal tables
		AND NOT (schemas.name = ''stg'' AND tables.name LIKE ''[[]stg].[[]viw_%'') -- MDS internal tables, name looks malformed but it exists
	';

	DECLARE @MySQLCols VARCHAR(MAX) = /* For MySQL, there is no catalog or database, please use 'def' as the DatabaseName in-line with MySQL's internal naming */ '
	SELECT
		columns.TABLE_SCHEMA AS SchemaName,
		columns.TABLE_NAME AS TableName,
		columns.COLUMN_NAME AS ColumnName,
		columns.ORDINAL_POSITION AS OrdinalPosition,
		columns.COLUMN_TYPE AS DataType, -- Can list unassigned for number and enum type
		fk_ref.ForeignKeyQN AS ForeignKeyTo,
		LEFT(tables.TABLE_COMMENT, 1024) AS TableDescription,
		LEFT(columns.COLUMN_COMMENT, 1024) AS ColumnDescription,
		tables.TABLE_ROWS AS TableRowCount -- "... InnoDB, this value is an approximation, and may vary from the actual value by as much as 40% to 50% ..." -- Ref: https://dev.mysql.com/doc/refman/8.0/en/information-schema-tables-table.html
	FROM INFORMATION_SCHEMA.COLUMNS columns
	INNER JOIN INFORMATION_SCHEMA.TABLES tables ON tables.TABLE_CATALOG = columns.TABLE_CATALOG AND tables.TABLE_SCHEMA = columns.TABLE_SCHEMA AND tables.TABLE_NAME = columns.TABLE_NAME
	LEFT JOIN (
		-- Ref: https://www.db-fiddle.com/f/sryFQ4D9fSwf9xPXh16S8S/0
		SELECT
			rc.CONSTRAINT_SCHEMA,
			rc.TABLE_NAME,
			kcu.COLUMN_NAME,
			GROUP_CONCAT(CONCAT_WS(''.'', kcu.REFERENCED_TABLE_SCHEMA, kcu.REFERENCED_TABLE_NAME, kcu.REFERENCED_COLUMN_NAME)) AS ForeignKeyQN
		FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
		JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu ON
			rc.CONSTRAINT_SCHEMA = kcu.CONSTRAINT_SCHEMA
			AND rc.TABLE_NAME = kcu.TABLE_NAME
			AND rc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
		GROUP BY rc.CONSTRAINT_SCHEMA, rc.TABLE_NAME, kcu.COLUMN_NAME
	) fk_ref ON fk_ref.CONSTRAINT_SCHEMA = columns.TABLE_SCHEMA AND fk_ref.TABLE_NAME = columns.TABLE_NAME AND fk_ref.COLUMN_NAME = columns.COLUMN_NAME
	WHERE columns.TABLE_SCHEMA NOT IN (''sys'', ''mysql'', ''information_schema'', ''performance_schema'', ''mysql_innodb_cluster_metadata'')
	';

	DECLARE @PostgresCols VARCHAR(MAX) = '
	SELECT
		schemas.nspname AS SchemaName,
		tables.relname AS TableName,
		columns.attname AS ColumnName,
		ROW_NUMBER() OVER (PARTITION BY columns.attrelid ORDER BY columns.attnum) AS OrdinalPosition,
		format_type(columns.atttypid, columns.atttypmod) AS DataType,
		NULL AS ForeignKeyTo, -- Not implemented
		left(obj_description(columns.attrelid, ''pg_class''), 1024) AS TableDescription,
		left(col_description(columns.attrelid, columns.attnum), 1024) AS ColumnDescription,
		tables.reltuples::bigint + COALESCE(tstat2.children_reltuples::bigint, 0) AS TableRowCount
	FROM $DatabaseName.pg_catalog.pg_attribute columns
	INNER JOIN (
		SELECT oid, relname, relnamespace, reltuples
		FROM $DatabaseName.pg_catalog.pg_class
		WHERE relkind IN (''p'', ''r'', ''v'', ''m'') AND NOT relispartition
	) tables ON tables.oid = columns.attrelid
	INNER JOIN $DatabaseName.pg_catalog.pg_namespace schemas ON schemas.oid = tables.relnamespace
	LEFT JOIN (
		SELECT pg_inherits.inhparent AS oid, SUM(reltuples::bigint) AS children_reltuples
		FROM $DatabaseName.pg_catalog.pg_inherits
		INNER JOIN $DatabaseName.pg_catalog.pg_class children ON children.oid = pg_inherits.inhrelid
		GROUP BY pg_inherits.inhparent
	) tstat2 ON tstat2.oid = columns.attrelid
	WHERE
		columns.attnum > 0
		AND NOT columns.attisdropped
		AND schemas.nspname NOT IN (''pg_catalog'', ''information_schema'')
	';

	DECLARE @Targets TABLE(
		SystemName VARCHAR(255) PRIMARY KEY,
		ServerNameL1 VARCHAR(255),
		ServerNameL2 VARCHAR(255),
		DatabaseName VARCHAR(255),
		ColsSQLTemplate VARCHAR(MAX)
	);

	INSERT INTO @Targets VALUES
		-- Local databases
		('Application 1',      NULL,           NULL,          'App1',       @SQLServerCols),

		-- Linked Servers
		('Application 2',      'SVR_APP2_DB',  NULL,          'App2',       @SQLServerCols),

		-- Double-Linked Server
		('Application 3',     'SVR_WAREHOUSE', 'SVR_APP3_DB', 'App3',       @SQLServerCols),
		('Application 4',     'SVR_WAREHOUSE', 'SVR_APP4_DB', 'App4',       @SQLServerCols),
		('Application 5',     'SVR_JUMPHOST',  'SVR_APP5_DB', 'myreportdb', @PostgresCols),
		('WordPress Website', 'SVR_JUMPHOST',  'SVR_WP_DB',   'def',        @MySQLCols)
	;

	DECLARE @SystemName VARCHAR(255);
	DECLARE @ServerNameL1 VARCHAR(255);
	DECLARE @ServerNameL2 VARCHAR(255);
	DECLARE @DatabaseName VARCHAR(255);
	DECLARE @ColsSQLTemplate VARCHAR(MAX);

	DECLARE @DataDictionary TABLE -- See DataDictionary table structure
	(
		[SystemName] VARCHAR(255) NOT NULL , 
		[ServerName] VARCHAR(255) NOT NULL, 
		[DatabaseName] VARCHAR(255) NOT NULL, 
		[SchemaName] VARCHAR(255) NOT NULL, 
		[TableName] VARCHAR(255) NOT NULL, 
		[ColumnName] VARCHAR(255) NOT NULL, 
		[OrdinalPosition] INT NULL, 
		[DataType] VARCHAR(255) NULL, 
		[ForeignKeyTo] VARCHAR(255) NULL, 
		[TableDescription] NVARCHAR(1024) NULL, 
		[ColumnDescription] NVARCHAR(1024) NULL, 
		[TableRowCount] BIGINT NULL
	)

	DECLARE MyCursor CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
	FOR
	SELECT * FROM @Targets;

	OPEN MyCursor
	FETCH NEXT FROM MyCursor INTO @SystemName, @ServerNameL1, @ServerNameL2, @DatabaseName, @ColsSQLTemplate
	WHILE @@FETCH_STATUS = 0
	BEGIN 

		DECLARE @TSQL VARCHAR(MAX) = REPLACE(@ColsSQLTemplate, '$DatabaseName', @DatabaseName);

		IF @ServerNameL2 IS NOT NULL
		SET @TSQL = 'SELECT * FROM OPENQUERY(' + @ServerNameL2 + ',''' + REPLACE(@TSQL, '''', '''''') + ''')';

		IF @ServerNameL1 IS NOT NULL
		SET @TSQL = 'SELECT * FROM OPENQUERY(' + @ServerNameL1 + ',''' + REPLACE(@TSQL, '''', '''''') + ''')';

		SET @TSQL = 'SELECT ''' + @SystemName + ''' AS SystemName, ''' + COALESCE(@ServerNameL2, @ServerNameL1, @@SERVERNAME) + ''' AS ServerName, ''' + @DatabaseName + ''' AS DatabaseName, *
		FROM (' + @TSQL + ') t
		ORDER BY SchemaName, TableName, OrdinalPosition';

		RAISERROR ('Processing System "%s"', 0, 1, @SystemName) WITH NOWAIT;

		INSERT INTO @DataDictionary
		EXEC (@TSQL);

		FETCH NEXT FROM MyCursor INTO @SystemName, @ServerNameL1, @ServerNameL2, @DatabaseName, @ColsSQLTemplate
	END
	CLOSE MyCursor
	DEALLOCATE MyCursor

	MERGE
		INTO DataDictionary AS dd
		USING @DataDictionary AS ddnew
		ON
			dd.ServerName = ddnew.ServerName
			AND dd.DatabaseName = ddnew.DatabaseName
			AND dd.SchemaName = ddnew.SchemaName
			AND dd.TableName = ddnew.TableName
			AND dd.ColumnName = ddnew.ColumnName
		WHEN MATCHED
			THEN
				UPDATE
				SET OrdinalPosition = ddnew.OrdinalPosition,
					DataType = ddnew.DataType,
					ForeignKeyTo = ddnew.ForeignKeyTo,
					TableDescription = ddnew.TableDescription,
					ColumnDescription = ddnew.ColumnDescription,
					TableRowCount = ddnew.TableRowCount,
					LastSeenAt = NULL
		WHEN NOT MATCHED
			THEN
				INSERT (SystemName, ServerName, DatabaseName, SchemaName, TableName, ColumnName, OrdinalPosition, DataType, ForeignKeyTo, TableDescription, ColumnDescription, TableRowCount)
				VALUES (SystemName, ServerName, DatabaseName, SchemaName, TableName, ColumnName, OrdinalPosition, DataType, ForeignKeyTo, TableDescription, ColumnDescription, TableRowCount)
		WHEN NOT MATCHED BY SOURCE
			THEN
				UPDATE
				SET OrdinalPosition = NULL,
					LastSeenAt = CURRENT_TIMESTAMP
	;

RETURN 0