Feedback on DDI 4 V4

Feedback on DDI 4
1 Reviewing the Approach:
1. Does it have reusable components to the extent needed?
DDI 4 has a significant number of reusable components, and the balance between size and number
of reusable components seems good.
One possible counterexample is the Date type, which can act as either a Date or a DateRange; this dual role makes the model more difficult to understand.
2. We have some abstractions of things, like collection and other patterns. Is this the way to
go?
Abstraction is definitely a step in the right direction.
There are several advantages to capturing abstract theoretical patterns (such as Set and Tree) in the
modelling process:
• Consistency of modelling.
• Simplicity and clarity in the transformation from model to physical form.
• Opportunity for software applications using the physical form to write reusable code to
  match the abstract theoretical patterns in the data.
• Decreased coupling, leading to an anticipated reduction in the amount of future rework and
  breaking changes, resulting from the stability of theoretical models and patterns.
This last point in particular (abstract theoretical patterns) can return significant benefits in the long
term, especially when the effort of users of the standard is considered. These abstract types can
then be instantiated in concrete types that match the vocabulary of the target audience.
We recommend continuing to increase use of abstract types, especially where they correspond to
well established theoretical patterns. This should make the model cleaner and simpler, with minimal
duplication. (Any tendency towards excessive use of abstractions would probably show clearly in
increasing complexity of the model.)
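To illustrate the reuse we have in mind, here is a minimal sketch in Python. It is our own illustration, not part of DDI: the class names TreeNode, ClassificationItem and InstrumentSection are hypothetical. The traversal is written once against the abstract Tree pattern, and concrete types named in the audience's vocabulary inherit it unchanged.

```python
from dataclasses import dataclass, field
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


@dataclass
class TreeNode(Generic[T]):
    """Abstract Tree pattern: a payload plus an ordered list of children."""
    payload: T
    children: List["TreeNode[T]"] = field(default_factory=list)

    def walk(self) -> Iterator[T]:
        """Depth-first, pre-order traversal, written once for every tree."""
        yield self.payload
        for child in self.children:
            yield from child.walk()


# Hypothetical concrete types: trivial instantiations of the abstract pattern.
class ClassificationItem(TreeNode[str]):
    pass


class InstrumentSection(TreeNode[str]):
    pass
```

An application that understands TreeNode can process either concrete type without extra code, which is the reduction in coupling described above.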
There is also a danger, though, of “overuse” of reuse, or of mixing structures/patterns within the one
reusable class. There should be considerable emphasis on the agility and light coupling within
models, while mixing patterns might result in coupling that could cause problems in the future.
Instead of having a pattern that does two functions internally, using a flag to indicate the difference,
we could instead use a pattern that uses abstraction to remove the distinction, and then two trivial
overrides to re-introduce it.
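As a hedged sketch of this idea, applied to the Date example mentioned earlier, the following Python uses hypothetical names (TemporalExtent, SingleDate, DateRange) that are ours and not DDI's. The abstract type removes the distinction by giving every extent a start and an end, and the two small subclasses re-introduce it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date


class TemporalExtent(ABC):
    """Abstraction: every extent has a start and an end, so no flag is needed."""

    @property
    @abstractmethod
    def start(self) -> date: ...

    @property
    @abstractmethod
    def end(self) -> date: ...

    def contains(self, day: date) -> bool:
        # Shared behaviour is written once against the abstraction.
        return self.start <= day <= self.end


@dataclass(frozen=True)
class SingleDate(TemporalExtent):
    """Trivial override: start and end are the same day."""
    day: date

    @property
    def start(self) -> date:
        return self.day

    @property
    def end(self) -> date:
        return self.day


@dataclass(frozen=True)
class DateRange(TemporalExtent):
    """Trivial override: start and end are stored separately."""
    first: date
    last: date

    @property
    def start(self) -> date:
        return self.first

    @property
    def end(self) -> date:
        return self.last
```

Code that consumes a TemporalExtent never has to test which of the two forms it has been given.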
The abstract layer is also expected to aid in mapping between different standards.
3. Is there a balance between real life and a clean model in a technical sense?
Given the origins of DDI in XML Schema, it appears that the “real life” aspect (i.e. concrete classes)
predominates. However, DDI’s initial efforts at abstraction, such as Collections and Tree Node
structures, are an excellent start on an appropriate path to redress this. The clean model should
come from increased use, and reuse, of abstract patterns.
4. Are the classes modelling something in real life? Is it clear what a class is and what it relates
to in the real world?
Within one of the target user domains – statistics – terminology is converging towards a standard, as
captured in GSIM. DDI’s efforts to adopt GSIM terminology in DDI 4 will introduce a clear linkage
between DDI 4 and the real-life world of statistics. ABS recognises signs of this in DDI 4, and
encourages further alignment.
As part of our review, we downloaded recent models of DDI 4 from the website, and imported them
into Sparx EA. This made it possible to generate diagrams for each class, as required. These diagrams
made the meaning of each class easier to understand, and reflect the general value of a modelling
approach.
We tested DDI 4 on several modellers who were new to both GSIM and DDI, and found only a few
classes whose meaning eluded them. We expect that this effect will reduce over time as the model
matures.
The Capture class was one example of this. Some reviewers felt that there was a lack of clarity about
whether the name was intended as a verb or a noun, and that its inheritance chain of classes varied
between some that represented a task, and some that represented a definition or description of a
task, which might be considered to be metadata for a task.
5. Reaction to the cascade model of describing variables.
There is a strong similarity to GSIM, but some of the differences may benefit from further
information/explanation. The concept of cascading models itself seems a valid approach.
2 Reviewing the Model
6. Are the set of properties complete?
For the provided documentation, they seemed complete. It is likely that ABS would discover needs
for more attributes if it attempted to apply DDI 4 in earnest, as it is currently doing with DDI 3.2.
From manual inspection, no reports of missing properties were received.
When exploring the DDI 4 models downloaded from the website, some incomplete properties were
discovered, where properties had been declared, but without data types provided. (This may be the
result of ABS downloading a more modern version, rather than the one that was groomed for official
review.)
7. Are they under-specified, too simple? Over-specified, too complex? Can you provide a sense
of the percent of coverage we are providing in terms of what is needed to describe a class?
ABS recognises three levels of model – conceptual, logical, and physical. In the past, DDI has been
expressed in XSD, and so qualified as a physical model. With DDI 4, we note that multiple physical
forms are available for download, including XSD, and so the physical form is still available. The DDI
model and documentation is clearly more detailed than a conceptual model, and so we assume it is
targeted as a logical model, by our terminology. As such it seems to be at an appropriate level of
detail.
As the physical forms were automatically generated from the logical DDI model, that model clearly
must contain sufficient detail to be precise, and so cannot have been under-specified. Some
reviewers felt that it would be good to know the modelling conventions that are used to drive the
conversions to physical. Perhaps these could be published as part of the standard.
Some could argue that DDI is over-specified for a logical model, on the grounds that there is
sufficient information to drive the generation of the physical forms, but this has to be balanced
against the value of a model driven approach, in which the physical form is automatically generated
from the logical. In such situations, the logical model must employ either modelling standards or
hints to the generation code on occasion, to achieve the benefits of automation. It is clear that DDI
has adopted the model-driven approach, and ABS supports that fully. Again, it would be clearer if
the modelling conventions or hinting techniques were published along with the model. This would
enable ABS to decide whether some details are superfluous, or present to inform the auto-generation code.
Some reviewers also thought that there might be value in DDI including a conceptual layer model
(particularly if/where it differs from GSIM), to help with introduction of its concepts. If a conceptual
model were to be produced early in the process, then readers of the model would have a clear
understanding of where DDI is heading, and would be more able to comment meaningfully on
matters of scope. Comparability between DDI and other standards would also be easier at a
conceptual layer.
8. Do the properties and relationships look correct?
ABS downloaded models from the DDI website and imported them into Sparx EA, and found that most
of the relationships were reversed from the expected direction, and most inheritance relationships were
duplicated. This is more likely to be a technical compatibility issue than an actual modelling
problem.
9. Are there too many levels in terms of relationships?
We assume this refers to the number of levels of inheritance of relationship types.
The number of levels seemed manageable. If it is found in the future that the number of levels is
confusing readers, then that could be clarified by having a separate diagram for the inheritance
structure.
As mentioned before, ABS’s investigations indicate that overuse of coupling in a model is
undesirable, and inheritance is a strong form of coupling. If there is a desire to reduce the number
of levels of relationship inheritance, then one approach would be to have a stereotype of
<<relationship>>, and to give all relationships that stereotype, rather than including them in an
inheritance tree. The code that auto-generates the physical forms would then use the stereotype to
trigger similar structures for each relationship.
This approach has the potential to simplify the model, reduce coupling and reduce levels of
inheritance (although we don’t know how this would work with Drupal or the physical model
generation code).
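The stereotype idea can be sketched roughly as follows. This is a hypothetical Python illustration only: ModelElement, generate_physical and the XML-like output are invented for the example and do not describe the actual DDI generation code.

```python
from dataclasses import dataclass


@dataclass
class ModelElement:
    """One element of the logical model (hypothetical structure)."""
    name: str
    stereotype: str = ""  # a flat tag such as "relationship", not a base class
    source: str = ""
    target: str = ""


def generate_physical(element: ModelElement) -> str:
    """Emit an illustrative XML-like fragment; the output format is invented."""
    if element.stereotype == "relationship":
        # Every relationship gets the same structure, triggered by the tag alone.
        return (f'<relationship name="{element.name}" '
                f'from="{element.source}" to="{element.target}"/>')
    return f'<class name="{element.name}"/>'
```

In this sketch the common treatment of relationships lives in the generator, keyed on the stereotype, so the relationships themselves need no shared ancestor in the model.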
Another issue that may be relevant at this point is the decision that must often be made when
implementing a number of subtypes from a base type, between:
• defining trivial overrides to produce multiple subtypes, and
• having a type field with an enumerated list of subtypes.
The enumeration approach certainly seems more compact, but can result in a “breaking change” if
one of the subtypes becomes non-trivial. The override/inheritance approach allows a smoother
transition from trivial to non-trivial overrides. It also allows subtypes to be further split into sub-subtypes without rework. Thus ABS recommends using inheritance rather than subtype enumeration
in these cases.
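The two options can be compared in a short Python sketch. All names here (NodeKind, TaggedNode, Node, CategoryNode, CodeNode) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


# Option 1: a type field with an enumerated list of subtypes.
class NodeKind(Enum):
    CODE = "code"
    CATEGORY = "category"


@dataclass
class TaggedNode:
    label: str
    kind: NodeKind  # giving one kind its own property means reshaping this class


# Option 2: trivial overrides of a base type.
@dataclass
class Node:
    label: str


class CategoryNode(Node):
    pass  # still a trivial override


@dataclass
class CodeNode(Node):
    # Once trivial, now non-trivial: it gains a property of its own
    # without any change to Node or CategoryNode.
    value: str = ""
```

Under option 2, existing users of Node and CategoryNode are unaffected when CodeNode grows, which is the smoother transition described above.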
3 Functional Views
10. Are the right classes included in each Functional View given its described coverage?
ABS received no comments on this topic from its reviewers.
11. In the future when we publish other views we intend to publish in this format but with use
cases, narratives, how to navigate the contents, etc. We want to provide information to help
people think about their cases and decide which Functional View applies and how to use it.
What would be useful to have?
We support the concept of matching Views to Use Cases. This is consistent with the TOGAF concept
of Viewpoint, from the point of view of a Use Case Actor. However, there may be multiple classes of
Actors for a use case, so there may be multiple view-points; hence multiple views per use case.
An analysis of Actors and Use Cases should yield the relevant viewpoints needed to guide this work.
Ideally, readers should be able to identify their organisation’s work with relevant Actors and Use
Cases. Perhaps Actors could be documented in a way to encourage this. Once they have identified
with one or more actors and use cases, it should be relatively simple to trace through to relevant
views.
12. Although the current Functional Views are very limited, do you see yourself using them?
Why or why not? What could be done to make Functional Views better?
To date, ABS is using DDI 3.2 as a physical model for use within its Metadata Registry and Repository.
To do so, it needs to map GSIM concepts through its own logical model to DDI 3.2. Any
documentation for DDI 4 that aids in identifying these mappings on a case-by-case basis is likely to
be of great benefit to ABS in the future, and so we hope the use of Functional Views is expanded.
Perhaps further Views could be based on identified actors and use cases for DDI 4?
13. If you don't plan on using this (a DDI structure to manage the metadata found in the current
Functional Views), what would you be using? Let us know what we should be looking at in
terms of what we cover.
As mentioned above, ABS expects the use of Functional Views to be of great benefit in
understanding DDI 4. While Views may need their own metadata, and a class to keep it in, we see
them as more of a human readable artefact than machine readable. As indicated in the previous
questions, it is the ease of human navigation to the correct view(s) that will determine their
usefulness.
We note that both DDI format and views fall under the general category of ontology. There is a
simple OWL/RDF standard called Simple Knowledge Organisation System – SKOS – which deals with
the documentation and relationships between ontologies. It might be worth adopting concepts
from SKOS, or simply using it in its entirety.
14. In DDI-C we had top level elements (codeBook or DDIInstance) which contained a consistent
set of information. XML and RDF handle this in different ways. This topic is still under
discussion within the Moving Forward Project. Is there a set of information that is needed
for all or most instances of a Functional View?
ABS recommends the following, as a base set:
• Viewpoint information: description, classes of actors, use cases.
• A reference to the version of DDI used by the view.
• Most of the usual Dublin Core fields would be applicable.
• SKOS information, possibly re-modelled in XSD.
4 Reviewing the Documentation
15. The documentation files are a work in progress. What information do you need that is not
there?
Most reviewers were able to dive straight into the meat of the review, but some without previous
DDI experience had some trouble. For them, there was a shortage of information about the scope and
purpose of DDI, its history, etc. This likely stems from an assumption of DDI experience.
Perhaps their situation was sufficiently unusual that this problem does not really need addressing.
16. What is “too much information”?
There is certainly a lot of valuable information in DDI. The level of detail needs to be sufficient to
drive the auto-generation of the physical models. This determines the amount of information
needed, and so the problem becomes one of managing the presentation of the information in
manageable chunks, rather than trimming it down.
The View approach will yield benefits, as it enables a more “human readable” experience.
Another technique found to be of great value is the use of clickable web presentation of the model,
as is done on the DDI wiki website. Is it possible to further emphasise and enhance this capability,
perhaps as a way of presenting the information in more manageable chunks?
Some techniques which we are considering internally to ABS, and which might be useful for DDI,
include:
• Use of a conceptual model, to introduce concepts with limited detail.
• Use of theoretical abstracts, such as sets, trees, etc. that allow readers to link to pre-existing
  knowledge, or to provide the opportunity to research more detail outside of the presented
  material.
• Defined, documented, consistent modelling styles and techniques.
• Dynamic generation of diagrams as required. Sparx EA has a menu item (right click, Insert
  Related Elements…) that allows the automated addition of related classes onto a diagram.
  Thus it is a simple matter to create a new diagram for the context of a class, when needed.
  This ability allows the experienced user to investigate the model beyond the provided
  information.
• Like DDI’s website, ABS is developing a clickable version of our model.
There was a general impression that the document could be made more succinct or a more compact
format might be appropriate, especially if planning to print on paper.
17. How could we make it easier to use?
ABS sees presentation of models as an exercise in managing complexity. Specialist interactive tools
such as a clickable model, e.g. the DDI wiki web site, are more likely to solve the problem than
normal paper-style documents.
One reviewer suggested the concept of How To recipe books, as many developers find this approach to
coding to be successful. Perhaps this is another View?
Perhaps some examples based on actors or use cases could provide further illustration, in a number of
different presentation modes (views, XML, RDF examples?).
18. To understand?
It is relevant to consider the intended audience. Hypothetically, to what extent can we expect them
to already understand the fundamentals of data structures and modelling? DDI has already taken a
great step forward by enabling the generation of multiple physical forms from the one model. This
will allow readers to work in any of the physical data languages that they are familiar with.
An opportunity may lie in the use of RDF triples. If the triples that define the RDF form of DDI could
be expressed in non-technical language, some readers may appreciate such a lexical approach.
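A rough sketch of what we mean is given below, in Python. The "ex:" terms are made-up examples rather than real DDI vocabulary, the wording of each phrase is our own assumption, and only the three RDFS predicates are standard.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Plain-language phrases for a few standard RDFS predicates (wording is ours).
PHRASES = {
    "rdfs:subClassOf": "is a kind of",
    "rdfs:domain": "is a property of",
    "rdfs:range": "takes values of type",
}


def local_name(term: str) -> str:
    """Drop the namespace prefix, e.g. 'ex:CodeList' becomes 'CodeList'."""
    return term.split(":", 1)[-1]


def verbalise(triples: Iterable[Triple]) -> List[str]:
    """Render each (subject, predicate, object) triple as a short sentence."""
    sentences = []
    for subject, predicate, obj in triples:
        phrase = PHRASES.get(predicate, local_name(predicate))
        sentences.append(f"{local_name(subject)} {phrase} {local_name(obj)}.")
    return sentences


# Hypothetical triples, for illustration only.
EXAMPLE_TRIPLES = [
    ("ex:CodeList", "rdfs:subClassOf", "ex:Collection"),
    ("ex:label", "rdfs:domain", "ex:CodeList"),
]
```

Calling verbalise(EXAMPLE_TRIPLES) yields sentences such as "CodeList is a kind of Collection.", which a reader without RDF experience may find easier to follow.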
A definition of all modelling conventions used within the diagrams might help. Having a fixed set of
them should also improve consistency.
5 Other Observations
Some reviewers commented on the overall concept of the model:
• It was unclear to us whether DDI 4 would be a simple re-implementation of version 3.2 in a
  modelling environment, or whether it was also to be a logical upgrade. DDI 4 seems heavily
  based on version 3.2. The benefits of a model-based approach are starting to show already;
  perhaps continuing with a direct conversion first, without enhancement, would be a good
  step forward?
• The model seems to be focussing on after-the-fact documentation, rather than on designing
  processes at the same time and documenting up-front. For example, DDI 4 seems to start at
  Question, missing the earlier parts of the statistical lifecycle, such as Concepts (this could be
  due to the partial review?).
• The model is showing signs of its heritage from XML Schema, rather than being a clean new model.
  It is likely this effect will reduce over time though, with the further influence of having a
  model basis.
• One reviewer felt that the model is trying to meet too many scenarios, resulting in
  design-by-committee with lots of point solutions, rather than an integrated whole. A solution
  to this would be to make better use of abstraction, rather than having many variations in the
  concrete. This effect is highly likely to reduce over time due to the influence of the visual
  aspects of UML modelling, which will help to expose duplication.
• DDI seems to contain re-definitions of other standards, such as Dublin Core and
  XSD:dateTime. Perhaps it would be better to reference these, rather than re-define them.
  Reduction of the volume of content will aid in navigation and understanding of the model. If
  other standards are needed, then placing them in separate models (but perhaps on the
  same website), with links, would be cleaner than making them part of the DDI standard.
  Decoupling from other standards is always desirable for agility reasons.
  One stand-out example is the DDI development of the process model. Are there other
  existing process standards, such as BPMN/BPEL, that might be referenced and used in
  conjunction with DDI?
• Within ABS, we are coming to believe that the best solutions lie in the standard UML metamodelling (level 2 and 3) capabilities, as implemented in Sparx EA. We have considered
  suggesting the applicability of these concepts to DDI, but wonder if Drupal is capable of this.