IndexConvert User Guide
For Release 0.9.2.3
Issue 0.9.2.3.a, February 2016
© B Campbell 2015-16

Introduction
IndexConvert is a Microsoft Word macro that helps convert an index from a text or word processor file into a form that can be imported into an indexing program, spreadsheet or database. It automates many of the most complex steps.

The concept was first presented in a five-minute Lightning Talk at the 2015 Joint SI/SfEP Conference in York. A manual process had been developed and used successfully, but manually converting an index to a tab-delimited file is tedious, labour-intensive and likely to introduce new faults. Work on the macro began about two weeks before the conference, and a prototype was available at the conference itself. Processing was still immature at that stage and has advanced considerably since.

IndexConvert uses the process and labels developed for the manual method, but performs the find and replace operations automatically under user control. Export is available for Cindex, Macrex, Sky Index, spreadsheets and databases.

Please read the Disclaimer section before using the software.

Contents
Introduction
Contents
IndexConvert
  What it is
  General Approach
  Installation
  Run IndexConvert
    Trust Settings
  Configuration
  Before Starting on a New Index
  User Interface
  Ctrl-Break
  Process Commonality
  Preprocess
  Label Headings, Set Out
  Label Headings, Run-on
  Label locators
    Locator Options
    Roman Numerals
    Cross References
    Locator Labels
    Exceptions
  Audit
  Audit Report
  Concatenate
    Unicode Characters
  Label Styles
  Export
  Exit
  Saving
  Licence Key and Metrics
  Metrics
Layout Resulting from IndexConvert Processing
Labels used by IndexConvert
  Headings and Locators
  Styles
  Error Codes
Import Index to Target Program
  Cindex
    Import
    Remove Labels
    Adjust Styles
Macrex
  Unicode.ini
  Macrex Import
Spreadsheet
Database
SKY Index
Case Studies
  Test Index
  Legal Index
Disclaimer
IndexConvert Copying, Licence Keys, and Redistribution
Acknowledgements
Enquiries

Tables
Table 1 Navigation and Scrolling
Table 3 Layout Resulting from Processing
Table 4 Content Labels
Table 5 Style Labels
Table 6 Error Codes
Table 7 Cindex Find and Replace Patterns to Recover Fonts from Markup
Table 8 Macrex Unicode Table - Observations

Figures
Figure 1 Templates and Add-ins
Figure 2 Security Warning
Figure 3 Microsoft Office Trusted Locations
Figure 4 Configuration
Figure 5 Index Structure Dialog
Figure 6 User Interface
Figure 7 Example Dialog Box
Figure 8 Preprocess Error Message
Figure 9 Label Headings, Levels 1-3
Figure 10 Label Headings, Levels 4-6
Figure 11 Label Headings, Levels 7-9 and 10+
Figure 12 Label Headings, Indents – Auto
Figure 13 Locator [ I ] Selected, Showing Roman Numerals Checked
Figure 14 Locators [ i ] Selected Showing Optional Cross Reference Matrix
Figure 15 Locators [ i ] Selected Showing Optional Locator Labels
Figure 16 Metrics
Figure 17 Cindex Replace Showing Recovery of Italic Font from Markup
Figure 18 Select Macrex Unicode file
Figure 19 Sky Index Translation File not found
Figure 20 Read Sky Index Translation File
Figure 21 Sky Index Replace
Figure 22 Sky Index Style Recovery

IndexConvert

What it is
IndexConvert is a Microsoft Word macro that helps convert an index from a text or word processor file into a form that can be imported into an indexing program, spreadsheet or database. It automates many of the most complex steps. Cindex was chosen as the launch target because that is the indexing program I use.

IndexConvert can also convert many other structured files into a form that can be imported into a spreadsheet or database. A little more pre-processing is required, but if this can be done with Word find and replace it is fairly quick. If the data can be structured to look like a back-of-the-book index, it can be converted to a spreadsheet or database format.

The macro requires a ‘desktop’ version of Microsoft Word; it will not run in the online version of Office 365.

Advanced programming, pattern matching and find and replace techniques are used to maximize quality and minimize user intervention. The user does not need to be an advanced Word user, but does need to understand how their indexing program treats entries, and needs to take care when responding to the judgements offered by IndexConvert.

General Approach
An interactive approach is used to make this sophisticated macro as simple as possible to operate.
The user is asked to respond to initial judgements made by the macro, based on multiple searches of the index for predefined patterns. Typical questions are ‘Is this a level 1 heading?’ and ‘Is this a first locator?’

The first stage is preprocessing. This removes multiple paragraph marks, manual line breaks and single-letter group headings. An end-of-file marker is inserted to support termination of processing; it is removed automatically later. Preprocessing must be repeated until no further adjustments are required.

Index entries are labelled with distinctive labels that are unlikely to occur naturally. These include $H1_ to $H9_ for headings, $L_ for locators and a series of error codes beginning $Er.

Headings are labelled by a process based on several style and indentation parameters, which allows heading levels to be identified accurately. Each heading level is labelled separately, and the process for a level can be run several times to label headings at that level that have been formatted differently. An alternative process based only on the indentation of headings and subheadings is also available. When the indentation process works it is very successful, but it depends on the indentation being extremely reliable.

Because labelling is rapid, the first 10 matching entries are labelled and the user is then asked whether to continue. Another 10 can be labelled, or the quantity can be doubled. Each time the requested number of entries has been labelled, the next batch size can be doubled again or reverted to 10.

Labelling of locators uses a similar process. Numeric locators are identified, and see/see also cross references are labelled as locators if they are run-on from headings. Options exist to define additional cross references, such as voir, and locator prefix labels for chapters, figures, plates, tables and so on.

When heading and locator labelling is complete, an audit process checks whether the labelling is consistent and identifies entries that may have been labelled incorrectly. The user needs to adjust the index at this stage to eliminate all error codes. An audit report can be created containing only the entries with $Er error labels.

Concatenation follows. For a tab-delimited export, such as for Cindex or Sky Index, each lowest-level subheading is prefixed by all of its parent headings, separated by tab characters. For a Macrex export, the characters “_,” are inserted instead of the higher-level headings.

Styles are labelled after concatenation. The styles labelled are bold, italic, underline, subscript and superscript.

Installation
The macro is contained in a file named IndexConvert.docm, which is installed as a Word add-in. Select Office Button>Word Options>Add-Ins>Word Add-Ins>Go, then select Add, as shown in Figure 1, and add IndexConvert.docm.

Figure 1 Templates and Add-ins

Run IndexConvert
To run IndexConvert, the file containing the index must be saved as a .docm (macro-enabled) file. This ensures that the advanced features used by the macro function correctly. On opening the file, a security warning may be seen, as in Figure 2.

Figure 2 Security Warning

Select Options and allow macros to run.

The macro can be assigned to a custom key combination using Office Button>Word Options>Customise>Customise. Select a key combination and assign it to the macro.

Trust Settings
The trust settings can be adjusted so that files in the chosen directory and its subdirectories are trusted. Use Word Options>Trust Center>Trust Center Settings (Figure 3).

Figure 3 Microsoft Office Trusted Locations

Select the View ribbon and Macros. In the dialog box that appears, select IndexConvert and Run. Once the location is trusted, the security warning no longer appears.

Configuration
When the macro is run for the first time, a configuration screen appears (Figure 4). You must enter the licence key, your name, your email address and, if appropriate, your company name. Choose the export appropriate to you. If your details have been entered correctly, the macro will run on the next attempt.

Figure 4 Configuration

Before Starting on a New Index
Remove the content of all headers and footers in your document to prevent it being exported in the final index text file. Check both odd and even page headers and footers.

Remove the title and any introductory comments from the index to prevent them being labelled as headings. If there are complex or numeric group headings, remove these too. Preprocess removes single-letter group headings and most blank lines automatically, but for a new index, removing at least some of these manually will increase your familiarity with the index and may reveal unexpected features that need to be addressed.

Inspect the index and understand its structure. How many heading levels are there? Are there run-on subheadings? How are locators distinguished from the text?

IndexConvert requires subheadings to be set out for identification and concatenation to work correctly. Run-back subheadings followed by additional subheadings, as below, should be made set-out.

heading, subheading 1
    subheading 2
    subheading 3

needs to be altered to:

heading
    subheading 1
    subheading 2
    subheading 3

When Preprocess is run for the first time, a reminder dialog box similar to Figure 5 appears.

Figure 5 Index Structure Dialog

The dialog shows the result of a Word search for ([a-z]{3})(, )([a-z]{3}) with wildcards selected. This searches for three lower-case letters a-z (group \1), followed by a comma and space (group \2), followed by another three lower-case letters (group \3). The replace string \1^13^t^t\3 is preloaded by IndexConvert but not used automatically. It replaces group 2 with a new paragraph followed by two tabs. If suitable, modify the find and replace strings as necessary.

If the index was originally produced manually, the alterations needed to create a pure set-out structure can be quite complex. The user still needs to be wary and check that the index is well formed before proceeding to label headings.

User Interface
The main user interface is shown in Figure 6, and the individual command buttons are described below. The [ i ] buttons supply further information and, where appropriate, additional configuration settings. Begin with Preprocess and work down the column, finishing with Export.

Figure 6 User Interface

Ctrl-Break
If you make a wrong choice and processing is already under way, Ctrl-Break will halt the macro.

Process Commonality
Preprocess, heading labelling, locator labelling and audit are the same for all exports. Audit report, concatenate, label styles and export differ according to the selected export.

Preprocess
Preprocess removes double paragraph marks, manual line breaks and group heading letters. Locators beginning on a new line can be joined to the previous line.

Preprocess is controlled by the dialog box shown in Figure 7. A similar dialog box is used throughout processing. The control buttons at the bottom of the dialog box are the main process control buttons. The small buttons clustered to the right support navigation, screen updates and zoom, as listed in Table 1.

Figure 7 Example Dialog Box

Table 1 Navigation and Scrolling
Button   Action
+        Increase zoom
-        Decrease zoom
T        Top of file
U        Up 5 lines (depends on zoom)
R        Return to original position
D        Down 5 lines (depends on zoom)
B        Bottom of file
S        Control screen updates

Scrolling (screen updates) can be turned off during processing using the S button.

Run Preprocess as many times as necessary until zero alterations are proposed.
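The Preprocess clean-up and the optional run-back split can be illustrated outside Word. The Python below is a minimal sketch, not IndexConvert's own macro code. It assumes the index has been saved as plain text with one entry per line, treats Word manual line breaks (character 11) as ordinary line breaks, and uses a hypothetical end-of-file marker $EOF_ because the guide does not name the real one. The helper names preprocess and split_run_back are also hypothetical. Joining locators that start on a new line is not attempted. The regular expression is a direct translation of the Word wildcard search and the preloaded replace string \1^13^t^t\3 described under Before Starting on a New Index.

```python
import re

# Hypothetical end-of-file marker: the guide does not name the real one.
EOF_MARKER = "$EOF_"

# Word wildcard search ([a-z]{3})(, )([a-z]{3}) with replace string \1^13^t^t\3.
# In Python, ^13 (paragraph mark) becomes "\n" and each ^t becomes "\t".
RUN_BACK = re.compile(r"([a-z]{3})(, )([a-z]{3})")


def split_run_back(text: str) -> str:
    """Replace the comma and space (group 2) with a line break and two tabs."""
    return RUN_BACK.sub(r"\1\n\t\t\3", text)


def preprocess(text: str) -> tuple[str, int]:
    """Return the cleaned text and the number of alterations made."""
    # Assumption: manual line breaks (Word ^l, character 11) become
    # ordinary line breaks; the guide only says they are removed.
    changes = text.count("\x0b")
    text = text.replace("\x0b", "\n")
    kept = []
    for line in text.split("\n"):
        stripped = line.strip()
        if stripped == EOF_MARKER:
            continue  # marker left by an earlier pass; appended again below
        if not stripped:
            changes += 1  # blank line left by a doubled paragraph mark
            continue
        if len(stripped) == 1 and stripped.isalpha():
            changes += 1  # single-letter group heading such as "B"
            continue
        kept.append(line)
    return "\n".join(kept + [EOF_MARKER]), changes
```

Running preprocess a second time on its own output reports zero alterations, which mirrors the advice to repeat Preprocess until nothing more is proposed. For example, split_run_back("heading, subheading 1") returns "heading" and "subheading 1" on separate lines, the second indented by two tabs.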
If you encounter the message shown in Figure 8, the original index file may have come from a text file or an Apple Mac environment. If it keeps recurring, try resaving the file (as a .docm), closing it and reopening it. Then rerun Preprocess.

Figure 8 Preprocess Error Message

Turning off screen updates with the S button speeds processing until the next user interaction is required, although in some circumstances the display is less reassuring than when scrolling is allowed. During operations that take considerable time, if screen updates have not already been turned off, a dialog box appears after about 7 seconds and allows S to be selected.

Label Headings, Set Out
There are two principal approaches to labelling headings. The first involves an automatic inspection of clues about the text formatting associated with each entry. Each heading level is processed independently: begin by selecting Level 1, then proceed through each heading level in turn.

Headings 1 through 9 are processed by selecting the appropriate button (Figures 9 to 11). Lower-level headings can be processed by entering the level as a number and selecting the 10+ button. This is more likely to be required when exporting complex data to a spreadsheet.

When labelling headings, entries are labelled in groups of 10, with the option to double the number at each user intervention. This makes recovery simple if the results are unsatisfactory.

Figure 9 Label Headings, Levels 1-3
Figure 10 Label Headings, Levels 4-6
Figure 11 Label Headings, Levels 7-9 and 10+

The alternative is the Auto option, which makes use of incrementing indents (Figure 12). Where indents are used consistently in the index, this is the fastest and most reliable method for labelling headings. In general, however, processing individual levels is likely to be faster and more reliable.

Figure 12 Label Headings, Indents – Auto

The result of labelling headings is shown below.

$H1_heading
$H2_subheading 1
$H2_subheading 2
$H2_subheading 3

Label Headings, Run-on
Run-on subheadings are not handled by this macro. Where run-on subheadings are clearly identified (by colons, semicolons and so on), Word find and replace can be used to replace the separators with tab characters or spaces followed by the appropriate subheading label, for instance $H2_. Processing can then proceed as normal.

Label locators
Arabic numeral locators and, optionally, Roman numeral locators up to l (50) are labelled. Only the first locator in each entry is labelled, based on punctuation. Two passes are performed; the second determines whether the punctuation was ambiguous. A second set of punctuation that could be mistaken for the start of the first locator is labelled $Er01_. These errors must be fixed before concatenation can proceed.

Locator Options
Selecting the [ i ] button opens a dialog box for processing Roman numeral locators, additional cross references and locator prefixes.

Roman Numerals
Roman numeral locator processing can be selected here (Figure 13). There is a risk that headings containing what appear to be Roman numeral locators will be labelled, for example Malcolm X.

WARNING: The search for Roman numerals will significantly slow locator processing. Be patient; it may appear as if processing has stopped. If Roman numerals are not used, or are unusual, do not select this option. Roman numerals are unselected in the default configuration at installation.

Figure 13 Locator [ I ] Selected, Showing Roman Numerals Checked

Cross References
Additional cross-reference terms such as voir can be added in the grid. The large panel to the right is a scratch pad that can be used for transferring a list of terms from another document; it is only a storage area and is not read during processing. Values entered here are retained between sessions. Unicode fonts are supported.
Only the first word of a set of cross references needs to be entered: see (a default value) covers see, see also, see under and so on.

Figure 14 Locators [ i ] Selected Showing Optional Cross Reference Matrix

Locator Labels
Additional locator (prefix) labels can be used for chapters, diagrams, figures, plates and so on (Figure 15). A scratch pad is available in the panel on the right. This dialog is used for prefix labels only; no equivalent suffix dialog is needed. If spaces follow the prefix, they must be included in the grid. Values are retained between sessions.

Figure 15 Locators [ i ] Selected Showing Optional Locator Labels

The example below shows the result of locator labelling.

$H1_heading
$H2_subheading $L_1
$H2_subheading $L_2
$H2_subheading $L_3

Exceptions
Label Locators is intended for back-of-the-book indexes. There are certain indexes for which Label Locators should not be run, including legal indexes where the locators appear as a set-out column. For this type of index, any see cross references should be labelled manually with $L_, which can be done with Word find and replace. The tab character is important. During the Audit stage, do not allow the locator audit to run if most entries do not contain labelled locators.

Audit
Audit checks the consistency of the index labelling; manual correction is required afterwards. Several error types cannot be detected. For instance, if a level 2 heading has been labelled as level 1, no error is reported unless a level 3 heading follows it.

Do not run Audit until all heading levels have been processed, because it inserts error labels into the index which then need to be removed. If no locators are found, the locator audit is not performed. Audit should normally be run only once for an index.
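The hierarchy part of the audit can be pictured as a single pass that remembers the level of the previous heading. The Python below is a minimal sketch under stated assumptions, not IndexConvert's implementation: audit_hierarchy is a hypothetical name, entries are assumed to be one labelled heading per line, and the $ErHnn_ label is assumed to be placed at the start of the offending entry (the guide does not say where it goes). A heading that jumps more than one level deeper than the previous heading gets the code for the first missing level, following the $ErH01_ to $ErH08_ codes in Table 5.

```python
import re

HEADING = re.compile(r"^\$H(\d+)_")


def audit_hierarchy(lines: list[str]) -> list[str]:
    """Prefix entries whose parent heading level is missing with $ErHnn_."""
    audited = []
    previous_level = 0  # nothing seen yet, so the first heading must be level 1
    for line in lines:
        match = HEADING.match(line)
        if match is None:
            audited.append(line)  # unlabelled lines are left for other checks
            continue
        level = int(match.group(1))
        if level > previous_level + 1:
            missing = previous_level + 1  # first level absent from the hierarchy
            line = f"$ErH{missing:02d}_" + line
        audited.append(line)
        previous_level = level
    return audited
```

This sketch also shows the limitation noted above: for $H1_a followed by a mislabelled $H1_b, nothing is reported, and only a following $H3_c triggers $ErH02_.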
Correcting errors may involve removing locator labels completely, for instance to force a see cross reference into the locator field. Error processing is addressed later in this guide.

Audit Report
The audit report contains all the entries labelled with an $Er label. It is useful for reviewing the status of the index and also when support is required. If you are unsure how to proceed and do not want to risk making the wrong changes to a large index, error correction can be tried out on the audit report first; the alterations then need to be made in the main index.

A special audit report is created following Macrex and Sky Index concatenation. See the Macrex and Sky Index sections.

WARNING: The audit report uses the clipboard. Do not perform any copy/paste operations in other applications while the audit report is being created.

Concatenate
Concatenate works through the audited heading hierarchies, concatenating level 1, level 2 and so on. Tabs are inserted between levels for Cindex, Sky Index, spreadsheet and database export; commas are inserted for Macrex export.

During concatenation the user is asked at intervals of about 10 seconds whether screen updates should be turned off (option S in the dialog). When S is selected, processing proceeds faster and the dialog no longer appears.

The end-of-file label is removed at the end of concatenation. If processing is interrupted for any reason, concatenation can be restarted: the index is scanned for the point where processing stopped, and processing continues from there. Successful continuation requires that the first unprocessed entry is a level 1 entry.

Unicode Characters
For Sky Index, if a translation table is present, has been exported and is selected during Concatenate, any characters defined in the translation table are replaced by their text keyboard entry codes.
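The translation step amounts to a per-character lookup. The Python below is a minimal sketch, not IndexConvert's code, and several details are assumptions: the function name translate and the sample code [e-acute] are hypothetical (real codes come from the Sky Index translation file), any non-ASCII character missing from the table is treated as one that should be there, and the {$Ernnnn} label, using the decimal value as described for missing characters, is placed immediately before the first occurrence with no zero padding.

```python
def translate(text: str, table: dict[str, str]) -> str:
    """Apply a character translation table, flagging unlisted non-ASCII characters."""
    flagged: set[str] = set()
    pieces = []
    for ch in text:
        if ch in table:
            pieces.append(table[ch])  # known character: substitute its entry code
        elif ord(ch) > 127 and ch not in flagged:
            flagged.add(ch)  # label the first occurrence only
            pieces.append(f"{{$Er{ord(ch)}}}{ch}")  # decimal value of the character
        else:
            pieces.append(ch)
    return "".join(pieces)
```

With the hypothetical table {"é": "[e-acute]"}, the text café, naïve, naïf becomes caf[e-acute], na{$Er239}ïve, naïf: ï (decimal 239) is labelled once and its second occurrence is left alone.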
If a character is encountered that should be in the Translation table then the first occurrence is labeled {$Ernnnn} where nnnn is the decimal value of the Unicode character. The character, its decimal value and its hex value are included automatically in an Audit report. The process is similar for Macrex except that the file referenced is the Unicode.ini file. WARNING: Concatenation uses the clipboard. Do not perform any copy/paste operations in other applications while concatenation is in progress. Label Styles Bold, italic, underline, subscript and superscript strings are labeled. Export This exports the concatenated index to a text file. The file name will be the same as the original name used but with a different extension depending on the export format chosen. © B Campbell 2015-16 IndexConvert User Guide The table below shows the extensions and encoding for the different exports. You are advised not to use the Word File>Save command as you will miss out on any error checking included in the exit command and may choose an inappropriate encoding which will result in your indexing program being unable to read the file correctly. Export format Extension Encoding Cindex txt UTF-8 Unicode Spreadsheet txt UTF-8 Unicode Macrex mbk Windows ANSI Sky Index txt Windows ANSI Database txt UTF-8 Unicode Issue 0.9.2.3.a February 2016 When IndexConvert is not running, the index can be saved using the Word File>Save or File>Save As command. It is recommended that the file is saved with a different name at each stage of the process in case rework is required. Licence Key and Metrics Licence keys are supplied to match user needs. They contain information controlling the number of indexes and the average number of entries that can be processed in a year. They also contain an expiry date. The values encoded by the license key are displayed by the Metrics dialog. Metrics are reset with the first index processed each January. 
Metrics

The Metrics dialog shows usage against the limits allowed by the licence key.

Figure 16 Metrics

Exit

This exits IndexConvert and allows the user to review the document. IndexConvert is designed to allow frequent starting and stopping as processing continues.

Layout Resulting from IndexConvert Processing

The table below summarizes the layout that results from processing by IndexConvert.

Table 2 Layout Resulting from Processing

Cindex: Fields are separated by tabs. Where more than one field exists in a record, the last field is expected to be a locator. The $H and $L labels are used to identify the fields. They are retained in the exported file to give extra assurance that all fields end up in the right place when imported to Cindex.

Spreadsheet: Exports a 'flat' spreadsheet file. Locators, if present, all appear in the same column. Empty cells to the left of the locator column are padded with appropriate $H labels.

Macrex MBK: Headings and locators are separated by commas. Where headings contain commas, these are enclosed in braces {,} (soft commas). Unicode characters and Greek characters may be defined as expanded codes, for example [alpha] and a{[v]}. These are defined in a Unicode table supplied with Macrex. IndexConvert uses the Unicode table to expand the codes during the concatenation process.

Sky Index: Fields are separated by tabs. The $H and $L labels are used to identify the fields. They are retained in the exported file to give extra assurance that all fields end up in the right place when imported to Sky Index.

Database: Fields are separated by tabs. The $H and $L labels are used to identify the fields. They are retained in the exported file to give extra assurance that all fields end up in the right place. Database can be selected as an alternative format for Cindex.
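The per-format layouts in Table 2 and the extensions and encodings in the export table can be sketched in Python as follows. This is an illustration, not IndexConvert's implementation: the function names are hypothetical, "Windows ANSI" is assumed to mean code page 1252, UTF-8 is written without a byte order mark, Windows line endings are assumed, and the database split on ", " is deliberately naive (a legal locator such as "r 12.2(4), Sch 1 Form 18" would be split in two).

```python
from pathlib import Path

# Extension and encoding per export format, taken from the export table.
# "Windows ANSI" is assumed here to mean code page 1252 (Western European).
EXPORTS = {
    "cindex": ("txt", "utf-8"),
    "spreadsheet": ("txt", "utf-8"),
    "macrex": ("mbk", "cp1252"),
    "sky": ("txt", "cp1252"),
    "database": ("txt", "utf-8"),
}


def spreadsheet_rows(records, depth):
    """Pad missing heading cells with "$H_" so every locator shares one column."""
    rows = []
    for record in records:
        *headings, last = record.split("\t")
        if last.startswith("$L_"):
            headings += ["$H_"] * (depth - len(headings))
            rows.append("\t".join(headings + [last]))
        else:
            rows.append(record)  # no locator field: leave unchanged
    return rows


def database_rows(records):
    """Expand each record into one row per locator (naive split on ", ")."""
    rows = []
    for record in records:
        *headings, last = record.split("\t")
        if last.startswith("$L_"):
            for locator in last[len("$L_"):].split(", "):
                rows.append("\t".join(headings + ["$L_" + locator]))
        else:
            rows.append(record)
    return rows


def export(records, source_path, fmt):
    """Write records next to the source file, renamed with the format's extension."""
    ext, encoding = EXPORTS[fmt]
    target = Path(source_path).with_suffix("." + ext)
    # errors="replace" writes "?" for any character the code page cannot hold;
    # newline="\r\n" gives Windows line endings (an assumption).
    with open(target, "w", encoding=encoding, errors="replace", newline="\r\n") as f:
        for record in records:
            f.write(record + "\n")
    return target
```

Writing a record containing π with the Sky Index settings produces a "?" in its place, which matches the behaviour described for characters outside the ANSI set.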
Labels used by IndexConvert

Headings and Locators

Index content is labeled to ensure that headings and locators are correctly identified. A common approach is used for the initial labeling of headings and locators, independent of the final export format.

Table 3 Content Labels
Text        Label
Headings    $H1_ to $H9_
Locators    $L_

Styles

Styles are labeled for Cindex, Sky Index, spreadsheet and database export using labels similar to those used for headings and locators. For Macrex export the Macrex labels are used.

Table 4 Style Labels
Text           Cindex, Spreadsheet, Sky Index, Database    Macrex
Bold           $BA_bold$BZ_                                \Bold\
Italic         $IA_italic$IZ_                              ^italic^
Underline      $UA_underline$UZ_                           {[UL]}underline{[ul]}
Superscript    $+A_superscript$+Z_                         {[S]}superscript{[s]}
Subscript      $-A_subscript$-Z_                           {[U]}subscript{[u]}
Small Caps     $sA_SMALLCAPS$sZ_                           {[A]}SMALL CAPS{[a]}

Error Codes

All error codes begin with $Er. This pattern can be used to find them and to delete or alter them once they have been addressed. Some adjustments may be quite complex, and a good knowledge of the operation of the target indexing program is required to make the correct adjustment.

Table 5 Error Codes
Where the required action differs, it is given separately for Cindex, Sky Index, spreadsheet and database export and for Macrex export.

$ErH01_
  Hierarchy error. No level 1 heading label. This indicates that a higher level heading label is missing in the hierarchy.
$ErH02_
  No level 2 heading label. This could be a serious error. It can be repaired either by restarting the labeling process or by using find/replace to correct the errors.
$ErH03_
  No level 3 heading label.
$ErH08_
  No level 8 heading label.
$ErL00_
  Punctuation error or unexpected locator layout means no locator has been labeled for this entry.
  Every entry is labeled with either a $L_ locator label or a $ErL00_ locator error label. This forces the user to review every $ErL00_ label.
  Cindex, Sky Index, Spreadsheet, Database: Where the locator has been mislabeled, correct it and remove the $ErL00_ label, including the leading tab character. For subheadings beginning with see or see also, remove the $ErL00_ label and leading tab, then insert {Tab}$L_ to cause the subheading to be treated as a locator. For headings and subheadings without locators, remove the $ErL00_ label. These entries become redundant during concatenation and are automatically removed.
  Macrex: For headings and subheadings without locators, remove the $ErL00_ label. If there is a number in the heading, enclose it in curly brackets {…}.
$ErL01_
  Locator ambiguity. Two parts of an entry contain punctuation that is recognized as a locator. A $ErL01_ label is added to the end of the entry.
  Cindex, Sky Index, Spreadsheet, Database: Ensure that the first locator is labeled {Tab}$L_.
  Macrex: Ensure that the first locator is labeled {Tab}$L_ or with a stopper ~!~.
$ErL02_
  Macrex export only. A locator marker $L_ and a stopper ~|~ are both present in an entry.
  Macrex: Ensure that the first locator is labeled $L_ or with a stopper ~!~.
$ErL03_
  Unexpected spaces in the locator string. The first locator may have been incorrectly labeled.
  Cindex, Sky Index, Spreadsheet, Database: Check the first locator labeling. Move it if necessary, with its associated tab character, and remove the $ErL03_ label.
  Macrex: As for Cindex.
$ErL04_
  Incorrect number of tabs in concatenated entry.
  Cindex, Sky Index, Spreadsheet, Database: Adjust as appropriate.
  Macrex: N/A.
{$Ernnnn}
  Macrex only. An unknown Unicode value. Change the character if possible, or update the Unicode table.
{$Er}
  Macrex only. A right to left character. Not currently supported by IndexConvert. No current solution.
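Because every error label begins with $Er, a script can list flagged entries in much the same way as the audit report. The sketch below is a hypothetical Python equivalent, not part of IndexConvert. It also includes a simple check in the spirit of $ErL04_. Taking the most common tab count as the correct one is an assumption that suits padded spreadsheet or database output better than Cindex output, where records legitimately have different numbers of fields.

```python
import re
from collections import Counter

# Every IndexConvert error label starts with "$Er". The optional tail picks
# out the $ErHnn_ / $ErLnn_ forms and the decimal value in {$Ernnnn}.
ERROR_LABEL = re.compile(r"\$Er(?:[HL]\d\d_|\d+)?")


def audit(lines):
    """Return (line number, line) pairs carrying error labels, plus label counts."""
    flagged = [(n, line) for n, line in enumerate(lines, start=1)
               if ERROR_LABEL.search(line)]
    counts = Counter(m.group() for _, line in flagged
                     for m in ERROR_LABEL.finditer(line))
    return flagged, counts


def flag_tab_errors(records, expected=None):
    """Append " $ErL04_" to records whose tab count is irregular.

    Without an explicit `expected` count, the most common count is taken
    as the regular structure (a heuristic). The label follows a space so
    that adding it does not change the record's tab count.
    """
    if not records:
        return []
    counts = [r.count("\t") for r in records]
    if expected is None:
        expected = Counter(counts).most_common(1)[0][0]
    return [r if c == expected else r + " $ErL04_"
            for r, c in zip(records, counts)]
```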
Import Index to Target Program

Cindex

This section summarizes the recommended process for importing an IndexConvert generated file into Cindex and post-processing it ready for updating the index. The process is similar for all programs but is only detailed for Cindex.

Import

Create a new index file and import the file.

Remove Labels

The following process is recommended. It removes labels only from the fields where they are expected, so that any misplaced labels can be found rapidly using a global search.

Remove $H1_ labels from main headings.
Remove $H2_ labels from subheadings.
Process any further subheading levels similarly.
Remove $L_ labels from locators.

Adjust Styles

Use find and replace to restore bold, italic, underline, small caps, subscript and superscript to their original styles. The many Word underline styles are all labeled the same way and are recovered as a single underline style.

Figure 17 shows an approach using regular expressions that removes the italic markup and reinstates the original style. It uses three groups in the search window. The first contains \$IA_ and the third contains \$IZ_. The \ is needed because $ has a special significance in regular expressions, as does + in the superscript labels. Instead of attempting to build a complex expression for the second group, it is much easier to say that a certain character does NOT appear, hence [^!]*. ! is an exclamation mark; choose a character that suits your index.

Finally, check that all styles have been adjusted; searching for A_ will do this. Then look for any remaining labels. These may have ended up in the wrong fields and will require review and correction as necessary.

Figure 17 Cindex Replace Showing Recovery of Italic Font from Markup

Table 6 Cindex Find and Replace Patterns to Recover Fonts from Markup
Style          Find [1]                     Replace
Italic         (\$IA_)([^!]*)(\$IZ_)        \2, Attribute I
Bold           (\$BA_)([^!]*)(\$BZ_)        \2, Attribute B
Small Caps     (\$sA_)([^!]*)(\$sZ_)        \2, Attribute S
Underline      (\$UA_)([^!]*)(\$UZ_)        \2, Attribute U
Subscript      (\$-A_)([^!]*)(\$-Z_)        \2, Attribute Sb
Superscript    (\$\+A_)([^!]*)(\$\+Z_)      \2, Attribute Sp

[1] The Find string can be copied into the Cindex find field. Then you just need to adjust the attribute for each find/replace operation.

Macrex

Unicode characters are not supported directly by the Macrex mbk file. Instead they are replaced by printer replacement codes defined in a Unicode table. Expansion of Unicode characters to printer replacement codes takes place during concatenation. An audit report is created automatically which lists all the printer replacement codes in the Unicode file and also those characters appearing in the index but not found in the Unicode table. The user may need to adjust the Unicode table to obtain the desired result.

The $H and $L labels added during earlier stages of processing are replaced during concatenation to give a format suitable for Macrex import. Heading levels are separated by commas, commas within headings are enclosed in braces {,}, and locators are separated from headings by commas. Where the Unicode table defines printer replacement codes, these replace the Unicode characters during concatenation. The first time an unrecognized character is encountered it is labeled {$Ernnnn}, where nnnn is the decimal code value of the Unicode character.
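Expansion against a Unicode table (Macrex) or Translation table (Sky Index), with the first occurrence of an unlisted character labeled, can be sketched in Python as below. This is a hypothetical illustration, not IndexConvert's code. It assumes that the table has already been loaded into a Python dict (the Unicode.ini and exported Translation file formats are not described here), that a character "should be" in the table when it cannot be written in Windows ANSI (taken as code page 1252), and that the decimal value in the label is not zero-padded. The example mappings are also hypothetical.

```python
def fits_ansi(ch):
    """True if ch can be written in Windows ANSI (assumed: code page 1252)."""
    try:
        ch.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


def expand(text, table, unknown):
    """Replace characters found in `table` and label first unknown ones.

    `table` maps a character to its replacement code, such as a printer
    replacement code or a Sky Index keyboard code. `unknown` is a dict shared
    across calls; it records each unlisted non-ANSI character with its decimal
    and hex value, ready for an audit report.
    """
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
        elif not fits_ansi(ch) and ch not in unknown:
            unknown[ch] = (ord(ch), f"{ord(ch):04X}")
            out.append(f"{{$Er{ord(ch)}}}{ch}")  # label the first occurrence only
        else:
            out.append(ch)
    return "".join(out)
```

Sharing one `unknown` dict across the whole index is what limits the label to the first occurrence. Later occurrences pass through unchanged, and the dict doubles as the internal copy written to the audit report.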
The code is added to the internal copy of the Unicode table, which is written to the audit report to support diagnosis.

Right to left characters are not currently processed fully by IndexConvert. Every occurrence is labeled {$Er}. All $Er codes must be removed before export.

Braces are not added to locator strings where Macrex expects non-numeric characters to be enclosed in braces. The user will need to address these either during Macrex editing or by editing the mbk file prior to import.

Styles are labeled \Bold\, ^italic^, {[S]}superscript{[s]}, {[U]}subscript{[u]}, {[UL]}underline{[ul]} and {[A]}SMALL CAPS{[a]}. These are interpreted by Macrex.

Nested braces may be present in the IndexConvert export file where complex combinations of diacritics and styles are present. Many adjustments have been included in the software to try to prevent their occurrence.

Unicode.ini

The Unicode.ini file is the default definition of the Unicode characters available and their expanded counterparts. IndexConvert uses the Unicode file to convert any Unicode characters to printer replacement codes recognized by Macrex. The Unicode file is selected during configuration using the Macrex Unicode File button. A typical dialog box is shown in Figure 18.

Figure 18 Select Macrex Unicode File

The interaction of IndexConvert, the index being exported and the Unicode file can be very complex. The table below includes observations that may help, but individual users will have to use their own knowledge and experience to make appropriate choices.

Table 7 Macrex Unicode Table - Observations
Unicode:         0x7b, 0x7d
Source:          Default Unicode table
Observation:     Certain curly brackets placed by IndexConvert are replaced by the Unicode values {[(]} and {[)]}.
Recommendation:  Either remove these entries from the Unicode table or edit the file after export.

Macrex Import

Use the Load from Backup option.
Macrex parses the backup file as it is being read. The number of alterations required will depend on the attention paid to IndexConvert processing through the various stages. It will be more complex if a large number of Unicode characters are not defined in the Unicode table.

Spreadsheet

For spreadsheet export, empty headings are padded with $H_ labels to ensure that locators always end up in the locators column. If the IndexConvert guidelines are followed, import to a spreadsheet should be straightforward.

Heading ($Hn_) and locator ($L_) labels are retained for quality purposes and will need to be removed.

Locators are not split into separate fields, and the file is not expanded to provide one record per locator.

Recovery of styles using standard find and replace may not be possible; suitable macros may be required.

Database

Database export is similar to spreadsheet export except that each entry contains one locator only. This export is compatible with spreadsheets, databases and Cindex.

Heading ($Hn_) and locator ($L_) labels are retained for quality purposes and will need to be removed.

Recovery of styles using standard find and replace may not be possible; suitable macros may be required.

Sky Index

There is an option to use an exported copy of the Translation table to expand Unicode characters to their keyboard text equivalents. For example, Á will expand to [A`] if that entry exists in the Translation table. Use Options>Translation Manager in Sky Index to reach the Translation table, and select Export to generate a file that is readable by IndexConvert.

Sky Index export is an ANSI file. Checking against the Translation file is optional: one may not exist, and if one does exist the user has the option of not using it. The dialog boxes that appear when Concatenation is selected are shown below.
The first appears if a Translation file has not been selected at the Configuration stage or one does not exist.

Figure 19 Sky Index Translation File Not Found

The second appears if a Translation file has been found.

Figure 20 Read Sky Index Translation File

When export takes place, Unicode characters outside the ANSI set and without an entry in the Sky Index Translation file appear in the exported file as question marks (?).

Heading ($Hn_) and locator ($L_) labels are retained for quality purposes and will need to be removed.

Style labels are exported and need to be recovered using Sky Index find/replace.

Figure 21 Sky Index Replace

Figure 21 shows the Sky Index Replace dialog set to restore the italic style. The Find strings in the table below require Pattern Matching to be selected.

Style          Find [2]                  Replace
Italic         {$IA_}{*}{$IZ_}           {2}, Italic
Bold           {$BA_}{*}{$BZ_}           {2}, Bold
Underline      {$UA_}{*}{$UZ_}           {2}, Underline
Small Caps     Not supported
Subscript      {$-A_}{*}{$-Z_}           {2}, Subscript
Superscript    {$+A_}{*}{$+Z_}           {2}, Superscript

Figure 22 Sky Index Style Recovery

[2] The Find string can be copied into the Sky Index find field. Then you just need to adjust the attribute for each find/replace operation.

Case Studies

Test Index

The index below contains content from a number of scientific textbooks and some made-up cases. It is one of the indexes used for testing IndexConvert. Comments are inserted in brackets (…).

and Symbols (symbol heading inserted by Cindex - remove)
2:4:6-Tribromoaniline, 269
2:4:6-Tribromophenol, 245
(Empty line is removed by Preprocess)
A (Single letter group headings removed by Preprocess [3])
Abdullah, Saud, K.
see Saud
Abraham, 16, 27n, 39n, 45n, 86, 152n
  and Islam, 170, 182, 185
ac Current Gain hfe, 35 (subscript – Label styles adds $- labels)
Acceleration, relativistic transformation of, 243 (subheadings [4])
  Newton’s Laws of, 244
B
Berzelius, 333, xii, xiii, xiv, xxiv (Roman numeral locators: Label locators [ i ] [5])
Beta (β), 29, 30, 48, 55, 181 (β Unicode: Macrex and Sky Index [6])
Biasing, 23, 25, 27
  definition, p95 (page prefix [7])
  single stage, 96
  classes, 97
  general circuit, 98

[3] Single letter group headings and empty lines are removed by Preprocess. Leading comments and empty lines should be removed. Content should be removed from headers and footers, as it will otherwise be saved in the final text file.
[4] In some indexes the comma indicates a subheading. Such entries should be modified to become set-out subheadings so that concatenation works correctly.
[5] Select Label locators [ i ] and check Roman numerals for them to be identified as locators. Currently only i to xxx are identified.
[6] For Macrex, Unicode characters are included in the Unicode.ini file. A default file is supplied with Macrex; it can either be added to, or copied and the copy added to. For Sky Index, the Translation table contains translations for special text strings and Unicode characters, and it can be exported. In both cases the file to be used is defined using the Configuration dialog.
[7] Page prefixes can be entered using Label locators [ i ]. p, ch and Chap may all be used. If there is a space between the prefix and the number, the space must be included in the definition.
  three stage direct coupled, 105
  diode, 107
  nonlinear, 106
  thermistor, 106
  two stage direct coupled, 102, 218, 219
Boundary conditions (no locator, picked up by Audit [8])
  on A, 125 (Bold – Label styles)
  on B and H, 36
  on D and E, 16
C
cafés, 166–167
D
Dipole magnetic (see Magnetic dipole) (Locator ‘(see’ not labeled [9])
E
E-field
  calculation of, 21, 22
  due to accelerated charge, 77, 243, 270
K
KP, 174
  function of temperature, 174
  statistical calculation, 637
L
Large signal definitions and test circuits hFE, VCE(SAT), VBE, hIE, 466
M
m- Nitrobenzoic acid, 261
N
Negative feedback. see feedback
Newton’s Laws, 244

[8] During Audit, $Er error codes are inserted into the index where attention is required. The error labels inserted, and how they should be processed, depend on the target export.
[9] This heading does not contain a recognisable locator. This is picked up by Audit.

O
'Oxo' reaction, 77
P
p Orbital, 285
T
Test 1 Heading, 100, 1000, cc, i, ii, xx (Subheadings [10])
  Test 2 heading, 100, 1000, cc, i, ii, xx
    Test c heading, 100, 1000, cc, i, ii, xx
      test 4 heading, 100, 1000, cc, i, ii, xx
        test e heading, 100, 1000, cc, i, ii, xx
          test 6 heading, 100, 1000, cc, i, ii, xx
            test 7 heading, 100, 1000, cc, i, ii, xx
              test H heading, 100, 1000, cc, ii, xx
                test j heading, 100, 1000, cc, ii, xx
                  test k heading, 100, 1000, cc, ii, xx
W
Wöhler, 173, 325, xiii, xiv, xv
Y
"Y" parameters, 50
  see also "Z" parameters
Z
"Z" parameters, 49, 50, 51, 56
  see also "Y" parameters (Cross reference [11])
Π
π-Orbital, 285

[10] Headings level 1 to 10. These are automatically processed using Auto indent, or they can be labeled using Level 1 to 9, and Level 10+ for level 10.
[11] By default IndexConvert expects see and see also cross references. Additional cross reference terms can be defined using Label locators [ i ].
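The locator rules illustrated by the test index can be approximated in Python: Roman numerals i to xxx only, literal page prefixes, see and see also cross references, and the $L_, $ErL00_ and $ErL01_ outcomes from Table 5. This is a simplified, hypothetical sketch rather than IndexConvert's own rule set. It handles only run-in entries whose locators follow ", " separators, treats a cross reference as one introduced by ". see", and recognises only the locator forms listed in its comments.

```python
import re


def roman(n):
    """Lower-case Roman numeral for 1 <= n <= 39."""
    units = ["", "i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix"]
    return "x" * (n // 10) + units[n % 10]


ROMAN = {roman(n) for n in range(1, 31)}      # i to xxx only, as in the guide
NUMBER = re.compile(r"\d+(?:[–-]\d+)?n?")     # 16, 27n, 166–167


def is_locator(part, prefixes=("p", "ch", "Chap")):
    """True if part looks like a locator.

    Prefixes match literally, so "p" accepts "p95" but not "p 95"
    unless "p " is also defined.
    """
    if part in ROMAN or NUMBER.fullmatch(part):
        return True
    return any(part.startswith(p) and NUMBER.fullmatch(part[len(p):]) is not None
               for p in prefixes)


def label_entry(entry, xrefs=("see", "see also")):
    """Label one run-in entry with $L_, $ErL00_ or $ErL01_ (sketch)."""
    # Cross references: longest term first, so "see also" wins over "see".
    for term in sorted(xrefs, key=len, reverse=True):
        head, sep, tail = entry.partition(". " + term + " ")
        if sep:
            return f"{head}.\t$L_{term} {tail}"
    parts = entry.split(", ")
    start = len(parts)
    # Walk back over the trailing run of locator-like parts; parts[0]
    # always stays in the heading.
    while start > 1 and is_locator(parts[start - 1]):
        start -= 1
    if start == len(parts):
        return entry + "\t$ErL00_"               # no locator recognised
    labelled = ", ".join(parts[:start]) + "\t$L_" + ", ".join(parts[start:])
    if any(is_locator(p) for p in parts[:start]):
        labelled += "\t$ErL01_"                  # locator-like text earlier too
    return labelled
```

Run against the test headings, "cc" is not recognised because only i to xxx are accepted, so the earlier "100" and "1000" make the entry ambiguous and it receives $ErL01_. "Dipole magnetic (see Magnetic dipole)" receives $ErL00_, as footnote 9 describes.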
Legal Index

Locators for this index are complex references to legal procedures. Label headings labels the locators as subheadings. There were five levels of heading in this index. Label locators should NOT be run. Because the locators are always the last field in the export file, they are treated as locators by Cindex (the target export). Error checking will tell you if the structure following concatenation is not regular and needs attention: $ErL04_ labels are inserted into the entries with the wrong number of tabs (columns).

ABN — see Australian Business Number (ABN)
ACN — see Australian Company Number (ACN)
Acquisition of shares and securities
  — see also Takeovers Panel
  — see also Takeovers Panel Rules for Proceedings
  application for summons for appearance in relation to registration of transfer of interests
    ACT Sch 6 r 12.2(1) – (3)
    Cth r 12.2(1) – (3)
    NSW r 12.2(1) – (3)
    NT r 12.2(1) – (3)
    Qld Sch 1A r 12.2(1) – (3)
    SA r 12.2(1) – (3)
    Tas see Cth
    Vic r 12.2(1) – (3)
    WA r 12.2(1) – (3)
  generally
    ACT Sch 6 rr 12.1 – 12.3
    Cth rr 12.1 – 12.3
    NSW r 12.2(1) – (3)
    NT r 12.2(1) – (3)
    Qld Sch 1A rr 12.1 – 12.3
    SA rr 12.1 – 12.3
    Tas see Cth
    Vic rr 12.1 – 12.3
    WA rr 12.2 – 12.3
  issue of summons for appearance in relation to registration of transfer of interests, form of
    ACT Sch 6 r 12.2(3), Form 18
    Cth r 12.2(4), Sch 1 Form 18
    NSW r 12.2(4), Sch 1 Form 18
    NT r 12.2(4), Sch 1 Form 18
    Qld Sch 1A r 12.2(4), Form 18
    SA r 12.2(4), Sch 1 Form 18
    Tas see Cth
    Vic r 12.2(4), Sch 1 Form 18
    WA r 12.2(4), Sch 1 Form 18

Disclaimer

The success of IndexConvert depends on an understanding of the structure of the index, an understanding of how IndexConvert operates and an understanding of the indexing software being used.
The process supported automates many of the tedious operations involved, but full automation is not possible.

This software is provided 'as is' without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of fitness for a purpose. The software and its documentation could include technical or other mistakes, inaccuracies or typographical errors. In no event shall the author of this software be liable to you or any third parties for any special, punitive, incidental, indirect or consequential damages of any kind, or any damages whatsoever, including, without limitation, those resulting from loss of use, data or profits, whether or not the author has been advised of the possibility of such damages, and on any theory of liability, arising out of or in connection with the use of this software. The use of the software is at your own discretion and risk and with agreement that you will be solely responsible for any damage to your computer system or loss of data that results from such activities. No advice or information, whether oral or written, obtained by you from the author shall create any warranty for the software.

The user is responsible for ensuring that any material processed by IndexConvert is done in a manner that does not infringe on the rights of any copyright owner.

IndexConvert Copying, Licence Keys, and Redistribution

You may make copies of IndexConvert for archiving. You may keep copies on multiple computers. One computer should be the principal computer for the purpose of index conversion; any others should be for backup and familiarization only.

You may not redistribute IndexConvert. Users should always download IndexConvert from www.indexbase.co.uk to be sure the latest version is used.

Any licence keys supplied are for use by the end user only and are not transferable. They may not be redistributed. From time to time special purpose licence keys will be supplied.
These are generally for special maintenance purposes and are not to be redistributed.

Acknowledgements

Connie and David Tyler introduced me to indexing and thence to indexers and the various societies and members around the world.

Lucy Ridout of SfEP gave me the opportunity of a Lightning Talk at the 2015 joint SI/SfEP conference in York. This provided the impetus to embark on this complex project.

Frances Lennie of Indexing Research provided encouragement during the development of IndexConvert and made other indexers aware of what I was trying to achieve.

Drusilla and Hilary Calvert provided information about the operation of Macrex. Kamm Schreiner of Sky Software and David Ream of Leverage Technologies both supplied information and advice.

A number of professional indexers from around the world have enabled IndexConvert to be tested on a range of index formats and have supplied user feedback.

Enquiries

If you have queries about this software or require advice about its use, please email enquiries@indexbase.co.uk