Feedback on DDI 4

1 Reviewing the Approach

1. Does it have reusable components to the extent needed?

DDI 4 has a significant number of reusable components, and the balance between the size and the number of reusable components seems good. One possible counterexample is the Date type, which can act as either a Date or a DateRange; this dual role makes the model more difficult to understand.

2. We have some abstractions of things, like collection and other patterns. Is this the way to go?

Abstraction is definitely a step in the right direction. There are several advantages to capturing abstract theoretical patterns (such as Set and Tree) in the modelling process:

- Consistency of modelling.
- Simplicity and clarity in the transformation from model to physical form.
- Opportunity for software applications using the physical form to write reusable code that matches the abstract theoretical patterns in the data.
- Decreased coupling, leading to an anticipated reduction in future rework and breaking changes, resulting from the stability of theoretical models and patterns.

The last point in particular, the stability of abstract theoretical patterns, can return significant benefits in the long term, especially when the effort of users of the standard is considered. These abstract types can then be instantiated in concrete types that match the vocabulary of the target audience.

We recommend continuing to increase the use of abstract types, especially where they correspond to well-established theoretical patterns. This should make the model cleaner and simpler, with minimal duplication. (Any tendency towards excessive use of abstractions would probably show clearly in increasing complexity of the model.)

There is also a danger, though, of “overuse” of reuse, or of mixing structures or patterns within the one reusable class. There should be considerable emphasis on agility and light coupling within models, whereas mixing patterns might result in coupling that could cause problems in the future.
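The mixing concern can be made concrete with a minimal sketch in Python. All class names here are hypothetical and are not taken from the DDI 4 model: the first class switches between two roles on a flag, while the second group uses an abstract type with two trivial concrete subtypes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Mixed pattern (hypothetical): one class plays two roles, switched by a flag.
@dataclass
class FlaggedDate:
    start: date
    end: Optional[date] = None
    is_range: bool = False

    def covers(self, day: date) -> bool:
        # Every consumer of this class has to know about the flag.
        if self.is_range:
            return self.start <= day <= self.end
        return day == self.start


# Abstraction (hypothetical): one shared contract, two trivial subtypes.
class TemporalValue(ABC):
    @abstractmethod
    def covers(self, day: date) -> bool:
        """Return True if the given day falls within this value."""


@dataclass
class SingleDate(TemporalValue):
    value: date

    def covers(self, day: date) -> bool:
        return day == self.value


@dataclass
class DateRange(TemporalValue):
    start: date
    end: date

    def covers(self, day: date) -> bool:
        return self.start <= day <= self.end
```

In the second form, code written against TemporalValue does not need to inspect a flag, and either subtype can later grow non-trivial behaviour without touching the other.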
Instead of having a pattern that performs two functions internally, using a flag to indicate the difference, we could use a pattern that uses abstraction to remove the distinction, and then two trivial overrides to re-introduce it.

The abstract layer is also expected to aid in mapping between different standards.

3. Is there a balance between real life and a clean model in a technical sense?

Given the origins of DDI in XML Schema, it appears that the “real life” aspect (i.e. concrete classes) predominates. However, DDI’s initial efforts at abstraction, such as the Collection and Tree Node structures, are an excellent start on an appropriate path to redress this. The clean model should come from increased use, and reuse, of abstract patterns.

4. Are the classes modelling something in real life? Is it clear what a class is and what it relates to in the real world?

Within one of the target user domains, statistics, terminology is converging towards a standard, as captured in GSIM. DDI’s efforts to adopt GSIM terminology in DDI 4 will introduce a clear linkage between DDI 4 and the real-life world of statistics. ABS recognises signs of this in DDI 4, and encourages further alignment.

As part of our review, we downloaded recent models of DDI 4 from the website and imported them into Sparx EA. This made it possible to generate diagrams for each class, as required. These diagrams made the meaning of each class easier to understand, and reflect the general value of a modelling approach.

We tested DDI 4 on several modellers who were new to both GSIM and DDI, and found only a few classes whose meaning eluded them. We expect that this effect will reduce over time as the model matures. The Capture class was one example of this.
Some reviewers felt that there was a lack of clarity about whether the name was intended as a verb or a noun. They also felt that its inheritance chain varied between classes that represented a task and classes that represented a definition or description of a task, which might be considered metadata for a task.

5. Reaction to the cascade model of describing variables.

There is a strong similarity to GSIM, but some of the differences may benefit from further information or explanation. The concept of cascading models itself seems a valid approach.

2 Reviewing the Model

6. Is the set of properties complete?

For the provided documentation, they seemed complete. It is likely that ABS would discover needs for more attributes if it attempted to apply DDI 4 in earnest, as it is currently doing with DDI 3.2.

From manual inspection, no reports of missing properties were received. When exploring the DDI 4 models downloaded from the website, some incomplete properties were discovered, where properties had been declared but no data types had been provided. (This may be the result of ABS downloading a more recent version, rather than the one that was groomed for official review.)

7. Are they under-specified, too simple? Over-specified, too complex? Can you provide a sense of the percent of coverage we are providing in terms of what is needed to describe a class?

ABS recognises three levels of model: conceptual, logical, and physical. In the past, DDI has been expressed in XSD, and so qualified as a physical model. With DDI 4, we note that multiple physical forms are available for download, including XSD, and so the physical form is still available. The DDI model and documentation are clearly more detailed than a conceptual model, and so we assume the model is targeted as a logical model, in our terminology. As such it seems to be at an appropriate level of detail.
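The observation under question 6, about properties declared without data types, matters for any logical model that is used to drive generation. The following is a minimal sketch in Python of that dependency. The model representation, the function names and the example class are all hypothetical; this is not the DDI generation code, only an illustration of why a generator cannot emit a physical form for an untyped property.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Property:
    name: str
    # None models a property that was declared without a data type.
    data_type: Optional[str] = None


@dataclass
class LogicalClass:
    name: str
    properties: List[Property] = field(default_factory=list)


def untyped_properties(cls: LogicalClass) -> List[str]:
    """Names of properties that cannot be mapped to a physical type."""
    return [p.name for p in cls.properties if p.data_type is None]


def to_xsd(cls: LogicalClass) -> str:
    """Render an XSD complexType; refuse if any property lacks a data type."""
    missing = untyped_properties(cls)
    if missing:
        raise ValueError(f"{cls.name}: no data type for {', '.join(missing)}")
    elements = "\n".join(
        f'      <xs:element name="{p.name}" type="{p.data_type}"/>'
        for p in cls.properties
    )
    return (
        f'  <xs:complexType name="{cls.name}">\n'
        f"    <xs:sequence>\n{elements}\n    </xs:sequence>\n"
        f"  </xs:complexType>"
    )
```

A check such as untyped_properties could be run over a downloaded model before review, to separate genuinely incomplete properties from artefacts of the download.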
As the physical forms were automatically generated from the logical DDI model, that model clearly must contain sufficient detail to be precise, and so cannot have been under-specified. Some reviewers felt that it would be good to know the modelling conventions that are used to drive the conversions to physical. Perhaps these could be published as part of the standard.

Some could argue that DDI is over-specified for a logical model, on the grounds that there is sufficient information to drive the generation of the physical forms. This has to be balanced against the value of a model-driven approach, in which the physical form is automatically generated from the logical. In such situations, the logical model must on occasion employ either modelling standards or hints to the generation code, to achieve the benefits of automation. It is clear that DDI has adopted the model-driven approach, and ABS supports that fully. Again, it would be clearer if the modelling conventions or hinting techniques were published along with the model. This would enable ABS to decide whether some details are superfluous, or present to inform the auto-generation code.

Some reviewers also thought that there might be value in DDI including a conceptual layer model (particularly if or where it differs from GSIM), to help with the introduction of its concepts. If a conceptual model were produced early in the process, readers of the model would have a clear understanding of where DDI is heading, and would be better able to comment meaningfully on matters of scope. Comparison between DDI and other standards would also be easier at a conceptual layer.

8. Do the properties and relationships look correct?

ABS downloaded models from the DDI website and imported them into Sparx EA, and found that most of the relationships were reversed from the expected direction, and most inheritance relationships were duplicated.
This is more likely to be a technical compatibility issue than an actual modelling problem.

9. Are there too many levels in terms of relationships?

We assume this refers to the number of levels of inheritance of relationship types. The number of levels seemed manageable. If it is found in the future that the number of levels is confusing readers, this could be clarified by having a separate diagram for the inheritance structure.

As mentioned before, ABS’s investigations indicate that overuse of coupling in a model is undesirable, and inheritance is a strong form of coupling. If there is a desire to reduce the number of levels of relationship inheritance, one approach would be to have a stereotype of <<relationship>>, and to give all relationships that stereotype, rather than including them in an inheritance tree. The code that auto-generates the physical forms would then use the stereotype to trigger similar structures for each relationship. This approach has the potential to simplify the model, reduce coupling and reduce levels of inheritance (although we don’t know how this would work with Drupal or the physical model generation code).

Another issue that may be relevant at this point is the decision that must often be made, when implementing a number of subtypes from a base type, between:

- defining trivial overrides to produce multiple subtypes, and
- having a type field with an enumerated list of subtypes.

The enumeration approach certainly seems more compact, but can result in a “breaking change” if one of the subtypes becomes non-trivial. The override/inheritance approach allows a smoother transition from trivial to non-trivial overrides. It also allows subtypes to be further split into sub-subtypes without rework. Thus ABS recommends using inheritance rather than subtype enumeration in these cases.

3 Functional Views

10. Are the right classes included in each Functional View given its described coverage?
ABS received no comments on this topic from its reviewers.

11. In the future when we publish other views we intend to publish in this format but with use cases, narratives, how to navigate the contents, etc. We want to provide information to help people think about their cases and decide which Functional View applies and how to use it. What would be useful to have?

We support the concept of matching Views to Use Cases. This is consistent with the TOGAF concept of Viewpoint, from the point of view of a Use Case Actor. However, there may be multiple classes of Actors for a use case, so there may be multiple viewpoints, and hence multiple views per use case. An analysis of Actors and Use Cases should yield the relevant viewpoints needed to guide this work.

Ideally, readers should be able to identify their organisation’s work with relevant Actors and Use Cases. Perhaps Actors could be documented in a way that encourages this. Once readers have identified with one or more actors and use cases, it should be relatively simple to trace through to relevant views.

12. Although the current Functional Views are very limited, do you see yourself using them? Why or why not? What could be done to make Functional Views better?

To date, ABS is using DDI 3.2 as a physical model within its Metadata Registry and Repository. To do so, it needs to map GSIM concepts through its own logical model to DDI 3.2. Any documentation for DDI 4 that aids in identifying these mappings on a case-by-case basis is likely to be of great benefit to ABS in the future, and so we hope the use of Functional Views is expanded. Perhaps further Views could be based on identified actors and use cases for DDI 4?

13. If you don't plan on using this (a DDI structure to manage the metadata found in the current Functional Views), what would you be using? Let us know what we should be looking at in terms of what we cover.
As mentioned above, ABS expects the use of Functional Views to be of great benefit in understanding DDI 4. While Views may need their own metadata, and a class to keep it in, we see them as more of a human-readable artefact than a machine-readable one. As indicated in the previous questions, it is the ease of human navigation to the correct view(s) that will determine their usefulness.

We note that both the DDI format and views fall under the general category of ontology. There is a simple OWL/RDF standard called the Simple Knowledge Organization System (SKOS), which deals with the documentation of, and relationships between, ontologies. It might be worth adopting concepts from SKOS, or simply using it in its entirety.

14. In DDI-C we had top level elements (codeBook or DDIInstance) which contained a consistent set of information. XML and RDF handle this in different ways. This topic is still under discussion within the Moving Forward Project. Is there a set of information that is needed for all or most instances of a Functional View?

ABS recommends the following as a base set:

- Viewpoint information: description, classes of actors, use cases.
- A reference to the version of DDI used by the view.
- Most of the usual Dublin Core fields, which would be applicable.
- SKOS information, possibly re-modelled in XSD.

4 Reviewing the Documentation

15. The documentation files are a work in progress. What information do you need that is not there?

Most reviewers were able to dive straight into the meat of the review, but some without previous DDI experience had some trouble. For them, there was a shortage of information about the scope and purpose of DDI, its history, and so on, likely stemming from an assumption of DDI experience. Perhaps their situation was sufficiently unusual that this problem does not really need addressing.

16. What is “too much information”?

There is certainly a lot of valuable information in DDI.
The level of detail needs to be sufficient to drive the auto-generation of the physical models. This determines the amount of information needed, and so the problem becomes one of presenting the information in manageable chunks, rather than trimming it down. The View approach will yield benefits, as it enables a more “human readable” experience.

Another technique found to be of great value is the clickable web presentation of the model, as is done on the DDI wiki website. Is it possible to further emphasise and enhance this capability, perhaps as a way of presenting the information in more manageable chunks?

Some techniques which we are considering internally at ABS, and which might be useful for DDI, include:

- Use of a conceptual model, to introduce concepts with limited detail.
- Use of theoretical abstracts, such as sets and trees, that allow readers to link to pre-existing knowledge, or that provide the opportunity to research more detail outside of the presented material.
- Defined, documented, consistent modelling styles and techniques.
- Dynamic generation of diagrams as required. Sparx EA has a menu item (right click, Insert Related Elements…) that allows the automated addition of related classes onto a diagram. Thus it is a simple matter to create a new diagram for the context of a class, when needed. This ability allows the experienced user to investigate the model beyond the provided information.
- A clickable version of the model. As DDI has done on its website, ABS is developing a clickable version of its own model.

There was a general impression that the document could be made more succinct, or that a more compact format might be appropriate, especially if it is to be printed on paper.

17. How could we make it easier to use?

ABS sees the presentation of models as an exercise in managing complexity. Specialist interactive tools such as a clickable model, e.g. the DDI wiki website, are more likely to solve the problem than normal paper-style documents.
One reviewer suggested the concept of “How To” recipe books, as many developers find this approach to coding to be successful. Perhaps this is another View? Perhaps some examples based on actors or use cases could provide further illustration, in a number of different presentation modes (views, XML, RDF examples?).

18. To understand?

It is relevant to consider the intended audience. To what extent can we expect them to already understand the fundamentals of data structures and modelling?

DDI has already taken a great step forward by enabling the generation of multiple physical forms from the one model. This will allow readers to work in any of the physical data languages that they are familiar with.

An opportunity may lie in the use of RDF triples. If the triples that define the RDF form of DDI could be expressed in non-technical language, some readers may appreciate such a lexical approach.

A definition of all modelling conventions used within the diagrams might help. Having a fixed set of them should also improve consistency.

5 Other Observations

It was unclear to us whether DDI 4 would be a simple re-implementation of version 3.2 in a modelling environment, or whether it was also to be a logical upgrade. DDI 4 seems heavily based on version 3.2. The benefits of a model-based approach are starting to show already. Perhaps continuing with a direct conversion first, without enhancement, would be a good step forward?

Some felt that the overall concept of the model:

- Seems to be focussing on after-the-fact documentation, rather than on designing processes at the same time and documenting up-front. For example, DDI 4 seems to start at Question, missing the earlier parts of the statistical lifecycle, such as Concepts (this could be due to the partial review).
- Is showing signs of its heritage from XML Schema, rather than being a clean new model. It is likely that this effect will reduce over time, though, with the further influence of having a model basis.
One reviewer felt that the model is trying to meet too many scenarios, and that this is resulting in design-by-committee with lots of point solutions, rather than an integrated whole. A solution to this would be to make better use of abstraction, rather than having many variations in the concrete. This effect is highly likely to reduce over time due to the influence of the visual aspects of UML modelling, which will help to expose duplication.

DDI seems to contain re-definitions of other standards, such as Dublin Core and XSD:dateTime. Perhaps it would be better to reference these, rather than re-define them. Reducing the volume of content will aid in navigation and understanding of the model. If other standards are needed, then placing them in separate models (but perhaps on the same website), with links, would be cleaner than making them part of the DDI standard. Decoupling from other standards is always desirable for agility reasons.

One stand-out example is the DDI development of the process model. Are there other existing process standards, such as BPMN/BPEL, that might be referenced and used in conjunction with DDI?

Within ABS, we are coming to believe that the best solutions lie in the standard UML metamodelling (level 2 and 3) capabilities, as implemented in Sparx EA. We have considered suggesting the applicability of these concepts to DDI, but wonder if Drupal is capable of this.