Assessment Item J2 – Scanner and parser design – CS431

Skill being assessed: Ability of the student to use results from formal language theory and algorithmic principles to inform design decisions in compiler construction.

Program outcome to which this skill is mapped: (j) An ability to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices.

Performance Assessment

Abstract: Designing a compiler is a complex endeavor that benefits from many areas of computer science, including formal language theory and algorithmic principles. For example, the techniques used in the two initial stages of a compiler have a solid mathematical foundation. In fact, scanners and parsers are fully based on formal models of computation called regular expressions (or regex's) and context-free grammars (or CFG's), respectively. As you know, each regex or CFG represents a language (or set of strings/token sequences) that it generates. However, these models have different computational powers. For example, some languages can be generated by one or more CFG's but cannot be generated by any regex. Furthermore, every language that is generated by a regex can also be generated by at least one CFG. In short, CFG's are a more powerful computational model than regex's, since the former can do everything that the latter can do, and more. Similarly, there are several interesting sub-classes of CFG's, such as LL(1) and LR(1) grammars, with different computational powers. For us, the most relevant result of formal language theory is that regex's, LL(1) grammars, LR(1) grammars, and general CFG's each have more computational power than their predecessor in the list. Since CFG's are the most powerful, why not use them for both scanning and parsing? Why even separate these two stages?
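As a concrete aside on the difference in power described above, the language of balanced parentheses is a standard example of a language that a CFG can generate but no regex can. The following is only a minimal sketch, not part of the assignment: the names TOKEN, scan, and balanced are hypothetical, and the grammar S -> ( S ) S | epsilon is chosen purely for illustration. A regex describes the individual tokens, while a recursive-descent recognizer with one token of lookahead checks the nesting.

```python
import re

# Hypothetical token pattern: a regex is enough to describe the tokens
# "(" and ")" and to skip the whitespace that precedes them.
TOKEN = re.compile(r"\s*([()])")

def scan(text):
    """Split text into a list of '(' and ')' tokens in one left-to-right pass."""
    tokens, pos = [], 0
    text = text.rstrip()  # trailing whitespace carries no token
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character near offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def balanced(tokens):
    """Recognizer for the illustrative grammar S -> '(' S ')' S | epsilon."""
    def parse_s(i):
        # One token of lookahead picks the production: '(' means the first
        # alternative, anything else means epsilon. The trailing S of the
        # first alternative is handled by looping instead of recursing.
        while i < len(tokens) and tokens[i] == "(":
            i = parse_s(i + 1)  # the nested S
            if i >= len(tokens) or tokens[i] != ")":
                raise ValueError("missing ')'")
            i += 1  # consume the matching ')'
        return i

    try:
        # Accept only if the whole token sequence was consumed.
        return parse_s(0) == len(tokens)
    except ValueError:
        return False
```

For instance, balanced(scan("(()) ()")) evaluates to True, while balanced(scan("(()")) evaluates to False. Each token is examined a constant number of times, so the recognizer does work proportional to the number of tokens; very deep nesting would exceed Python's default recursion limit, which this sketch does not attempt to handle.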
The reason, of course, is that there exist many trade-offs that apply to compiler design. First, there are several stakeholders to consider, including the programming language designer, the compiler writer, the other system programmers, the application programmer, and the end user of the compiled applications. Second, each stakeholder may focus on different and often conflicting features such as run time, memory requirements, maintenance, ease of use, user friendliness of error messages, etc. Third, several of these features are relevant to most of the software systems involved, namely the compiler itself, but also the source code it takes as input, the operating system it relies on, and the executable code it outputs. Finally, consider the following facts:

Fact 1: Regex's can be used to build scanners that run in O(n) time, where n is the total number of characters in the source program file.
Fact 2: Parsers that can handle any CFG run in O(n^3) time, where n is the number of tokens produced by the scanner.
Fact 3: Parsers that can handle LL(1) or LR(1) grammars run in O(n) time, where n is the number of tokens in the output of the scanner, but LR(1) parsers are harder to implement.

Now, use the foregoing trade-offs and facts to write a well-structured and grammatically correct essay that answers the following questions as precisely as possible:

1. Why is scanning typically separated from parsing? You must state and justify exactly three DISTINCT advantages of this common practice in compiler design. An alternative approach would be to use grammars for both scanning and parsing, which could then be combined into a single phase. But then, why even use regex's?

2. If you were to write your own parser-generating tool (like yacc or JavaCC), would you support LR(1) grammars or only LL(1) grammars? Remember to justify your answer fully and to consider the relevant stakeholders and tradeoffs. There is no one correct answer.
Any choice is acceptable as long as it is backed by a detailed consideration of pros and cons of each alternative in relation to your goals for this tool. You must state these goals as part of your answer.

Make sure to clearly separate your two answers. For each one, you must use precise phrasing and justify each claim with a cogent argument. Your essay must be on the order of, and no longer than, two single-spaced pages. If you consult and use sources to write your essay, you must include full references to these sources at the end of your essay. However, references are not included in the page count.

Rubric for Evaluation

Criterion: Scanning versus parsing
Exemplary: Student clearly stated three distinct advantages to differentiating the scanning and parsing phases. All three advantages are extremely well justified based (partly) on the given trade-offs and facts.
Satisfactory: Student stated three distinct advantages to differentiating the scanning and parsing phases. All three advantages are reasonably well justified based (partly) on the given trade-offs and facts.
Marginal: Student stated three advantages but two of them are mostly different formulations of the same one or the justification for some of them is lacking in clarity or detail.
Deficient: Student did not articulate more than one advantage, or they did not use the given trade-offs and facts, or the poor quality of the write-up makes it hard to identify the purported advantages or to understand their justification.

Criterion: LL(1) grammars versus LR(1) grammars
Exemplary: Student made a clear choice of which type of grammar to support and justified that choice superbly by weighing their own goals, the pros and cons of the alternatives, and thus coming to a fully justified conclusion.
Satisfactory: Student made a clear choice of which type of grammar to support and justified that choice reasonably well by weighing their own goals, the pros and cons of the alternatives, and coming to a sufficiently justified conclusion.
Marginal: Student made a clear choice of which type of grammar to support but this choice is based on a somewhat limited consideration of the pros and cons of the alternatives, or student did not state their goals in clear enough language.
Deficient: Student did not make a clear choice, or did not state their goals at all, or did not justify their choice with a logical argument based on the pros and cons of the alternatives and their goals.