6. Phase C: Information Systems Architectures — Data Architecture
This chapter describes the Data Architecture part of Phase C.
6.1 Objectives
The objectives of the Data Architecture part of Phase C are to:
- Develop the Target Data Architecture that enables the Business Architecture and the Architecture Vision, in a way that addresses the Statement of Architecture Work and stakeholder concerns
- Identify candidate Architecture Roadmap components based upon gaps between the Baseline and Target Data Architectures
6.2 Inputs
This section defines the inputs to Phase C (Data Architecture).
6.2.1 Reference Materials External to the Enterprise
- Architecture reference materials (see the TOGAF Standard — Architecture Content)
- TOGAF® Series Guide: Information Architecture: Customer Master Data Management
- TOGAF® Series Guide: Information Architecture: Business Intelligence & Analytics
- TOGAF® Series Guide: Information Architecture: Metadata Management
6.2.2 Non-Architectural Inputs
- Request for Architecture Work (see the TOGAF Standard — Architecture Content)
- Capability Assessment (see the TOGAF Standard — Architecture Content)
- Communications Plan (see the TOGAF Standard — Architecture Content)
6.2.3 Architectural Inputs
- Organizational Model for Enterprise Architecture (see the TOGAF Standard — Architecture Content), including:
  - Scope of organizations impacted
  - Maturity assessment, gaps, and resolution approach
  - Roles and responsibilities for architecture team(s)
  - Constraints on architecture work
  - Budget requirements
  - Governance and support strategy
- Tailored Architecture Framework (see the TOGAF Standard — Architecture Content), including:
  - Tailored architecture method
  - Tailored architecture content (deliverables and artifacts)
  - Configured and deployed tools
- Data principles (see the TOGAF Standard — ADM Techniques), if existing
- Statement of Architecture Work (see the TOGAF Standard — Architecture Content)
- Architecture Vision (see the TOGAF Standard — Architecture Content)
- Architecture Repository (see the TOGAF Standard — Architecture Content), including:
  - Re-usable building blocks (in particular, definitions of current data)
  - Publicly available reference models
  - Organization-specific reference models
  - Organization standards
- Draft Architecture Definition Document, which may include Baseline and/or Target Architectures of any architecture domain
- Draft Architecture Requirements Specification (see the TOGAF Standard — Architecture Content), including:
  - Gap analysis results (from Business Architecture)
  - Relevant technical requirements that will apply to this phase
- Business Architecture components of an Architecture Roadmap (see the TOGAF Standard — Architecture Content)
6.3 Steps
The level of detail addressed in Phase C will depend on the scope and goals of the overall architecture effort.
New data building blocks being introduced as part of this effort will need to be defined in detail during Phase C. Existing data building blocks to be carried over and supported in the target environment may already have been adequately defined in previous architectural work; but, if not, they too will need to be defined in Phase C.
The order of the steps in this phase as well as the time at which they are formally started and completed should be adapted to the situation at hand in accordance with the established Architecture Governance. In particular, determine whether in this situation it is appropriate to conduct Baseline Description or Target Architecture development first, as described in the TOGAF Standard — Applying the ADM.
All activities that have been initiated in these steps should be closed during the Finalize the Data Architecture step (see 6.3.8 Finalize the Data Architecture). The documentation generated from these steps must be formally published in the Create/Update the Architecture Definition Document step (see 6.3.9 Create/Update the Architecture Definition Document).
The steps in Phase C (Data Architecture) are as follows:
- Select reference models, viewpoints, and tools (see 6.3.1 Select Reference Models, Viewpoints, and Tools)
- Develop Baseline Data Architecture Description (see 6.3.2 Develop Baseline Data Architecture Description)
- Develop Target Data Architecture Description (see 6.3.3 Develop Target Data Architecture Description)
- Perform gap analysis (see 6.3.4 Perform Gap Analysis)
- Define candidate roadmap components (see 6.3.5 Define Candidate Roadmap Components)
- Resolve impacts across the Architecture Landscape (see 6.3.6 Resolve Impacts Across the Architecture Landscape)
- Conduct formal stakeholder review (see 6.3.7 Conduct Formal Stakeholder Review)
- Finalize the Data Architecture (see 6.3.8 Finalize the Data Architecture)
- Create/update the Architecture Definition Document (see 6.3.9 Create/Update the Architecture Definition Document)
6.3.1 Select Reference Models, Viewpoints, and Tools
Review and validate (or generate, if necessary) the set of data principles. These will normally form part of an overarching set of Architecture Principles. Guidelines for developing and applying principles, and a sample set of data principles, are given in the TOGAF Standard — ADM Techniques.
Select relevant Data Architecture resources (reference models, patterns, etc.) on the basis of the business drivers, stakeholders, concerns, and Business Architecture.
Select relevant Data Architecture viewpoints (for example, stakeholders of the data — regulatory bodies, users, generators, subjects, auditors, etc.; various time dimensions — real-time, reporting period, event-driven, etc.; locations; business processes); i.e., those that will enable the architect to demonstrate how the stakeholder concerns are being addressed in the Data Architecture.
Identify appropriate tools and techniques (including forms) to be used for data capture, modeling, and analysis, in association with the selected viewpoints. Depending on the degree of sophistication warranted, these may comprise simple documents or spreadsheets, or more sophisticated modeling tools and techniques such as data management models, data models, etc.
Examples of data modeling techniques are:
- Entity relationship diagram
- Class diagram
Further guidance on Information Architecture reference models can be found in the following documents:
- TOGAF® Series Guide: Information Architecture: Customer Master Data Management
- TOGAF® Series Guide: Information Architecture: Business Intelligence & Analytics
- TOGAF® Series Guide: Information Architecture: Metadata Management
6.3.1.1 Determine Overall Modeling Process
For each viewpoint, select the models needed to support the specific view required, using the selected tool or method.
Ensure that all stakeholder concerns are covered. If they are not, create new models to address concerns not covered, or augment existing models (see above).
The recommended process for developing a Data Architecture is as follows:
- Collect data-related models from existing Business Architecture and Application Architecture materials
- Rationalize data requirements and align with any existing enterprise data catalogs and models; this allows the development of a data inventory and entity relationship model
- Update and develop matrices across the architecture by relating data to business service, business capability, business function, access rights, and application
- Elaborate Data Architecture views by examining how data is created, distributed, migrated, secured, and archived
6.3.1.2 Identify Required Catalogs of Data Building Blocks
Descriptions of data may be captured as a catalog showing decomposition across related model entities (e.g., data entity -> logical data component -> physical data component).
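The decomposition described above can be pictured with a small sketch. This is an illustration only, not a structure prescribed by the standard: the class names mirror the three metamodel entities, while the sample entity, component names, and the `catalog_rows` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDataComponent:
    name: str  # e.g. a database table or message schema (hypothetical names)


@dataclass
class LogicalDataComponent:
    name: str
    physical: list[PhysicalDataComponent] = field(default_factory=list)


@dataclass
class DataEntity:
    name: str
    logical: list[LogicalDataComponent] = field(default_factory=list)


def catalog_rows(entities):
    """Flatten the decomposition into (entity, logical, physical) rows."""
    rows = []
    for entity in entities:
        for ldc in entity.logical:
            for pdc in ldc.physical:
                rows.append((entity.name, ldc.name, pdc.name))
    return rows


# Illustrative sample data, not taken from any real catalog.
customer = DataEntity(
    "Customer",
    [
        LogicalDataComponent(
            "Customer Profile",
            [
                PhysicalDataComponent("crm.customer"),
                PhysicalDataComponent("dw.dim_customer"),
            ],
        )
    ],
)

for row in catalog_rows([customer]):
    print(" -> ".join(row))
```

In this sketch an entity with no logical or physical components yields no rows, which is one simple way such a catalog could expose entities that have not yet been decomposed.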
During the Business Architecture phase, a Business Service/Information diagram was created showing the key data entities required by the main business services. This is a prerequisite to successful Data Architecture activities.
Using the traceability from business function/business capability to application and data entity, it is possible to create an inventory of the data needed to support the Architecture Vision.
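As a rough sketch of how that traceability might be followed, the example below joins a function-to-application mapping with an application-to-entity mapping to derive a data inventory. Both mappings, every name in them, and the `data_inventory` helper are assumptions made for illustration.

```python
# Hypothetical traceability data; all names are illustrative.
function_to_apps = {
    "Order Management": ["OrderApp", "BillingApp"],
    "Customer Care": ["CRM"],
}
app_to_entities = {
    "OrderApp": ["Order", "Customer"],
    "BillingApp": ["Invoice", "Customer"],
    "CRM": ["Customer", "Contact"],
}


def data_inventory(function_to_apps, app_to_entities):
    """Map each data entity to the business functions that rely on it."""
    inventory = {}
    for function, apps in function_to_apps.items():
        for app in apps:
            for entity in app_to_entities.get(app, []):
                inventory.setdefault(entity, set()).add(function)
    return inventory


inventory = data_inventory(function_to_apps, app_to_entities)
for entity in sorted(inventory):
    print(entity, "<-", sorted(inventory[entity]))
```

Entities that appear under several business functions (here, "Customer") are natural candidates for the semantic consistency review that follows.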
Once the data requirements are consolidated in a single location, it is possible to refine the data inventory to achieve semantic consistency and to remove gaps and overlaps.
The TOGAF Standard — Architecture Content contains a detailed description of catalogs which should be considered for development within a Data Architecture, describing them in detail and relating them to entities, attributes, and relationships in the TOGAF Enterprise Metamodel.
6.3.1.3 Identify Required Matrices
At this stage, an entity-to-applications matrix could be produced to validate this mapping. How data is created, maintained, transformed, and passed to other applications, or used by other applications, will now start to be understood. Obvious gaps, such as entities that never seem to be created by an application or data that is created but never used, need to be noted for later gap analysis.
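One way to spot such gaps is to record create/read/update/delete (CRUD) letters in each cell of the matrix and scan it. The sketch below assumes that convention; the matrix contents and the `find_obvious_gaps` helper are hypothetical, and "used" is simplified to mean "read by at least one application".

```python
# Hypothetical entity-to-application matrix: each cell holds CRUD letters.
matrix = {
    "Customer": {"CRM": "CRUD", "Billing": "R"},
    "Invoice": {"Billing": "CRU"},
    "Lead": {"Reporting": "R"},   # read, but no application creates it
    "AuditTrail": {"CRM": "C"},   # created, but never read
}


def find_obvious_gaps(matrix):
    """Return (never_created, never_used) entity lists."""
    never_created, never_used = [], []
    for entity, usage in matrix.items():
        ops = set("".join(usage.values()))  # all operations across applications
        if "C" not in ops:
            never_created.append(entity)
        elif "R" not in ops:
            never_used.append(entity)
    return never_created, never_used


never_created, never_used = find_obvious_gaps(matrix)
print("Never created:", never_created)
print("Created but never used:", never_used)
```

Findings of this kind are only candidates: an entity may legitimately be created outside the applications in scope, so each one would still need review before it is carried into the gap analysis.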
The rationalized data inventory can be used to update and refine the architectural diagrams of how data relates to other aspects of the architecture.
Once these updates have been made, it may be appropriate to drop into a short iteration of the Application Architecture to resolve the changes identified.
The TOGAF Standard — Architecture Content contains a detailed description of matrices which should be considered for development within a Data Architecture, describing them in detail and relating them to entities, attributes, and relationships in the TOGAF Enterprise Metamodel.
6.3.1.4 Identify Required Diagrams
Diagrams present the Data Architecture information from a set of different perspectives (viewpoints) according to the requirements of the stakeholders.
Once the data entities have been refined, a diagram of the relationships between entities and their attributes can be produced.
It is important to note at this stage that information may be a mixture of enterprise-level data (from system service providers and package vendor information) and local-level data held in personal databases and spreadsheets.
The level of detail modeled needs to be carefully assessed. Some physical system data models will exist down to a very detailed level; others will only have core entities modeled. Not all data models will have been kept up-to-date as applications were modified and extended over time. It is important to achieve a balance in the level of detail provided; the two extremes are the reproduction of existing detailed physical data schemas and the presentation of only high-level process maps and data requirements.
The TOGAF Standard — Architecture Content contains a detailed description of diagrams which should be considered for development within a Data Architecture, describing them in detail and relating them to entities, attributes, and relationships in the TOGAF Enterprise Metamodel.
6.3.1.5 Identify Types of Requirement to be Collected
Once the Data Architecture catalogs, matrices, and diagrams have been developed, architecture modeling is completed by formalizing the data-focused requirements for implementing the Target Architecture.
These requirements may:
- Relate to the data domain
- Provide requirements input into the Application and Technology Architectures
- Provide detailed guidance to be reflected during design and implementation to ensure that the solution addresses the original architecture requirements
Within this step, the architect should identify requirements that should be met by the architecture (see 13.5.2 Requirements Development).
6.3.2 Develop Baseline Data Architecture Description
Develop a Baseline Description of the existing Data Architecture, to the extent necessary to support the Target Data Architecture. The scope and level of detail to be defined will depend on the extent to which existing data elements are likely to be carried over into the Target Data Architecture, and on whether architectural descriptions exist, as described in 6.5 Approach. To the extent possible, identify the relevant Data Architecture building blocks, drawing on the Architecture Repository (see the TOGAF Standard — Architecture Content).
Where new architecture models need to be developed to satisfy stakeholder concerns, use the models identified within Step 1 (see 6.3.1 Select Reference Models, Viewpoints, and Tools) as a guideline for creating new architecture content to describe the Baseline Architecture.
6.3.3 Develop Target Data Architecture Description
Develop a Target Description for the Data Architecture, to the extent necessary to support the Architecture Vision and Target Business Architecture. The scope and level of detail to be defined will depend on the relevance of the data elements to attaining the Target Architecture, and on whether architectural descriptions exist. To the extent possible, identify the relevant Data Architecture building blocks, drawing on the Architecture Repository (see the TOGAF Standard — Architecture Content).
Where new architecture models need to be developed to satisfy stakeholder concerns, use the models identified within Step 1 (see 6.3.1 Select Reference Models, Viewpoints, and Tools) as a guideline for creating new architecture content to describe the Target Architecture.
If appropriate, investigate different Target Architecture alternatives and discuss these with stakeholders using the Architecture Alternatives and Trade-offs technique (see the TOGAF Standard — ADM Techniques).
6.3.4 Perform Gap Analysis
Verify the architecture models for internal consistency and accuracy:
- Perform trade-off analysis to resolve conflicts (if any) among the different views
- Validate that the models support the principles, objectives, and constraints
- Note and document any changes to the viewpoint represented in the selected models from the Architecture Repository
- Test architecture models for completeness against requirements
Identify gaps between the Baseline and Target, using the gap analysis technique as described in the TOGAF Standard — ADM Techniques.
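As a much-simplified illustration of comparing Baseline and Target, the sketch below matches building blocks by name and classifies them as retained, eliminated, or new. It is not the full technique described in the TOGAF Standard; the building block names and the `gap_analysis` helper are hypothetical, and matching by exact name is an assumption.

```python
def gap_analysis(baseline, target):
    """Classify building blocks by comparing Baseline and Target names."""
    baseline, target = set(baseline), set(target)
    return {
        "retained": sorted(baseline & target),
        "eliminated": sorted(baseline - target),  # in Baseline only
        "new": sorted(target - baseline),         # in Target only
    }


# Illustrative building block names.
baseline_blocks = ["Customer Master", "Order Store", "Legacy Contact List"]
target_blocks = ["Customer Master", "Order Store", "Consent Register"]

gaps = gap_analysis(baseline_blocks, target_blocks)
for category, blocks in gaps.items():
    print(f"{category}: {blocks}")
```

Each eliminated or new item would still need a decision by the architect, for example whether an elimination is intentional or an accidental omission from the Target.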
6.3.5 Define Candidate Roadmap Components
Following the creation of a Baseline Architecture, Target Architecture, and gap analysis, a data roadmap is required to prioritize activities over the coming phases.
This initial Data Architecture roadmap will be used as raw material to support more detailed definition of a consolidated, cross-discipline roadmap within the Opportunities & Solutions phase.
6.3.6 Resolve Impacts Across the Architecture Landscape
Once the Data Architecture is finalized, it is necessary to understand any wider impacts or implications.
At this stage, other architecture artifacts in the Architecture Landscape should be examined to answer the following questions:
- Does this Data Architecture create an impact on any pre-existing architectures?
- Have recent changes been made that impact the Data Architecture?
- Are there any opportunities to leverage work from this Data Architecture in other areas of the organization?
- Does this Data Architecture impact other projects (including those planned as well as those currently in progress)?
- Will this Data Architecture be impacted by other projects (including those planned as well as those currently in progress)?
6.3.7 Conduct Formal Stakeholder Review
Check the original motivation for the architecture project and the Statement of Architecture Work against the proposed Data Architecture. Conduct an impact analysis to identify any areas where the Business and Application Architectures (e.g., business practices) may need to change to cater for changes in the Data Architecture (for example, changes to forms or procedures, applications, or database systems).
If the impact is significant, this may warrant the Business and Application Architectures being revisited.
Identify any areas where the Application Architecture (if generated at this point) may need to change to cater for changes in the Data Architecture (or to identify constraints on the Application Architecture about to be designed).
If the impact is significant, it may be appropriate to drop into a short iteration of the Application Architecture at this point.
Identify any constraints on the Technology Architecture about to be designed, refining the proposed Data Architecture only if necessary.
6.3.8 Finalize the Data Architecture
- Select standards for each of the building blocks, re-using as much as possible from the reference models selected from the Architecture Repository
- Fully document each building block
- Conduct a final cross-check of overall architecture against business requirements; document the rationale for building block decisions in the architecture document
- Document the final requirements traceability report
- Document the final mapping of the architecture within the Architecture Repository; from the selected building blocks, identify those that might be re-used, and publish via the Architecture Repository
- Finalize all the work products, such as gap analysis
6.3.9 Create/Update the Architecture Definition Document
Document the rationale for building block decisions in the Architecture Definition Document.
Prepare the Data Architecture sections of the Architecture Definition Document, comprising some or all of:
- Business data model
- Logical data model
- Data management process model
- Data Entity/Business Function matrix
- Data interoperability requirements (e.g., XML schema, security policies)
If appropriate, use reports and/or graphics generated by modeling tools to demonstrate key views of the architecture. Route the document for review by relevant stakeholders, and incorporate feedback.
6.4 Outputs
The outputs of Phase C (Data Architecture) may include, but are not restricted to:
- Refined and updated versions of the Architecture Vision phase deliverables, where applicable:
  - Statement of Architecture Work (see the TOGAF Standard — Architecture Content), updated if necessary
  - Validated data principles (see the TOGAF Standard — ADM Techniques), or new data principles (if generated here)
- Draft Architecture Definition Document (see the TOGAF Standard — Architecture Content), including:
  - Baseline Data Architecture, Approved, if appropriate
  - Target Data Architecture, Approved, including:
    - Business data model
    - Logical data model
    - Data management process models
    - Data Entity/Business Function matrix
  - Views corresponding to the selected viewpoints addressing key stakeholder concerns
- Draft Architecture Requirements Specification (see the TOGAF Standard — Architecture Content), including such Data Architecture requirements as:
  - Gap analysis results
  - Data interoperability requirements
  - Relevant technical requirements that will apply to this evolution of the architecture development cycle
  - Constraints on the Technology Architecture about to be designed
  - Updated business requirements, if appropriate
  - Updated application requirements, if appropriate
- Data Architecture components of an Architecture Roadmap (see the TOGAF Standard — Architecture Content)
The TOGAF Standard — Architecture Content contains a detailed description of architectural artifacts which might be produced in this phase.
6.5 Approach
6.5.1 Data Structure
A Data Architecture should be able to handle:
- Data at rest — data in stores
- Data in motion — data in transactions or services/APIs
- Data in use — data at the border of the application (e.g., GUI)
- Open data — data that the organization provides for public usage and which it is voluntarily or legally required to provide
Alternative ways of working with these types of data will be added.
Data Architecture is created by using three metamodel entities: data entity, logical data component, and physical data component.
Data entities can be used to create conceptual data models to help the IT developers understand the concepts they will be dealing with. Often the entity relationship models also contain some requirements on the relations (e.g., a customer can only have one address).
Logical data components can be used to create logical data models. Often it is important for the IT area to have a clear view of all data that is used in the IT environment. The logical data model is often used as a requirement on the data stored in applications (at rest), data moved between applications (in motion), or data at the user interface of applications (data in use).
Physical data components are clusters of logical data components that have been implemented by some earlier project (links to, for example, XML message, database schemas) or requirements for new implementation projects.
All three metamodel entities can be used in data exchange models for data passed between/into/out of IS services, logical application components, or physical application components.
All data entities can have quality attributes for specific situations.
6.5.2 Key Considerations for Data Architecture
6.5.2.1 Data Management
When an enterprise has chosen to undertake large-scale architectural transformation, it is important to understand and address data management issues. A structured and comprehensive approach to data management enables the effective use of data to capitalize on its competitive advantages.
Considerations include:
- A clear definition of which application components in the landscape will serve as the system of record or reference for enterprise master data
- Will there be an enterprise-wide standard that all application components, including software packages, need to adopt? (In the main, packages can be prescriptive about the data models and may not be flexible.)
- Clearly understand how data entities are utilized by business capabilities, business functions, processes, and business and application services
- Clearly understand how and where enterprise data entities are created, stored, transported, and reported
- What is the level and complexity of data transformations required to support the information exchange needs between applications?
- What will be the requirement for software in supporting data integration with the enterprise's customers and suppliers (e.g., use of Extract, Transform, Load (ETL) tools during data migration, data profiling tools to evaluate data quality, etc.)?
More guidance on data management can be found in the TOGAF® Series Guide: Information Architecture: Customer Master Data Management.
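The first consideration in the list above, a clear system of record for enterprise master data, lends itself to a simple consistency check. The sketch below is one possible approach under stated assumptions: the entity list, the ownership claims, and the `system_of_record_issues` helper are all hypothetical.

```python
# Hypothetical declarations: which application components claim to be the
# system of record for each master data entity.
claims = [
    ("Customer", "CRM"),
    ("Product", "PIM"),
    ("Customer", "Billing"),  # conflicting claim
]
master_entities = ["Customer", "Product", "Supplier"]


def system_of_record_issues(master_entities, claims):
    """Return entities that do not have exactly one system of record."""
    owners = {entity: [] for entity in master_entities}
    for entity, app in claims:
        owners.setdefault(entity, []).append(app)
    return {entity: apps for entity, apps in owners.items() if len(apps) != 1}


issues = system_of_record_issues(master_entities, claims)
for entity, apps in issues.items():
    print(entity, "->", apps or "no system of record defined")
```

In this example "Customer" is flagged because two components claim it and "Supplier" because none does; how such conflicts are resolved is a governance decision rather than something the check can settle.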
6.5.2.2 Data Migration
When an existing application is replaced, there will be a critical need to migrate data (master, transactional, and reference) to the new application. The Data Architecture should identify data migration requirements and also provide indicators as to the level of transformation, weeding, and cleansing that will be required to present data in a format that meets the requirements and constraints of the target application. The objective is that the target application has quality data when it is populated. Another key consideration is to ensure that an enterprise-wide common data definition is established to support the transformation.
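To make the terms transformation, weeding, and cleansing concrete, the following minimal sketch applies all three to a batch of source records. The field names, the obsolescence rule, and the `migrate` helper are assumptions for illustration; a real migration would typically rely on dedicated ETL and data profiling tooling.

```python
def migrate(records, field_map, is_obsolete):
    """Return (migrated, weeded) lists for a batch of source records."""
    migrated, weeded = [], []
    for record in records:
        if is_obsolete(record):  # weeding: data the target should not hold
            weeded.append(record)
            continue
        target = {}
        for source_field, target_field in field_map.items():
            value = record.get(source_field)
            if isinstance(value, str):  # cleansing: trim stray whitespace
                value = value.strip()
            # transformation: rename the field to the common data definition
            target[target_field] = value
        migrated.append(target)
    return migrated, weeded


# Illustrative source data and mapping; the names are hypothetical.
source = [
    {"cust_nm": "  Ada Lovelace ", "stat": "active"},
    {"cust_nm": "Old Account", "stat": "closed"},
]
field_map = {"cust_nm": "customer_name", "stat": "status"}

migrated, weeded = migrate(source, field_map, lambda r: r["stat"] == "closed")
print("Migrated:", migrated)
print("Weeded out:", len(weeded))
```

Counting how many records are weeded or altered by cleansing gives a rough indicator of the level of effort that the Data Architecture is asked to signal.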
6.5.2.3 Data Governance
Data governance considerations ensure that the enterprise has the necessary dimensions in place to enable the transformation, as follows:
- Structure: this dimension pertains to whether the enterprise has the necessary organizational structure and the standards bodies to manage data entity aspects of the transformation
- Management System: here enterprises should have the necessary management system and data-related programs to manage the governance aspects of data entities throughout their lifecycle
- People: this dimension addresses what data-related skills and roles the enterprise requires for the transformation
If the enterprise lacks such resources and skills, the enterprise should consider either acquiring those critical skills or training existing internal resources to meet the requirements through a well-defined learning program.
6.5.3 Architecture Repository
As part of this phase, the architecture team will need to consider what relevant Data Architecture resources are available in the organization's Architecture Repository (see the TOGAF Standard — Architecture Content); in particular, generic data models relevant to the organization's industry "vertical" sector.
TOGAF is a registered trademark of The Open Group