Evaluation of Automation Systems
張裕幸
Source: F.W. Lancaster and Beth Sandore, Technology and Management in Library and Information Services, Chapter 14: Evaluation of Automated Systems, 1997, pp. 196-225.
• Evaluation of the performance of an automated system can provide several useful types of management information: (1) whether new or updated systems meet contract requirements; (2) whether the system is living up to the performance and output standards of its user community; (3) the point at which a new system or system refinements are needed; (4) possible future resource consumption.
• Note: item (4) takes hidden future costs into account, which is consistent with the concept of TCO.
• A computer system can be evaluated according to different types of criteria: ease of use, cost, reliability, integrability, and so on.
• In her survey of 54 major research libraries in North America, Johnson (1991) discovered that ease of use by patrons was a major consideration in the selection of a new system, ranking above cost and, perhaps surprisingly, above ease of use by staff. She found that these same libraries considered improvement of user services to be the major objective of automation, and improved service to users to be its major accomplishment.
• Peters (1988) identifies three types of systems evaluation: (1) functional, to determine whether a system's features meet the library's needs; (2) economic, to determine the affordability of a system; and (3) performance, to reveal whether the system capacity can meet present or anticipated future demands.
• Selection criteria and the degree to which each was considered:

Criteria                                                Value: 1      2      3
Ease of use by patrons                                      77.8   14.8    7.4
Availability of application modules and subsystems          77.8   18.5    7.4
Completeness of modules and subsystems                      68.5   22.2    9.3
Cost of system                                              68.5   29.6    1.9
Cost of hardware                                            61.1   33.3    5.6
Need for local programming staff                            59.3   29.6   11.1
Service reputation of vendor                                53.7   37.0    9.3
Ease of use by staff                                        48.1   51.9    0.0
Comparable installed sites                                  44.5   40.7   14.8
Previous experience with vendor                             25.9   29.6   44.5
Training and documentation provided                         22.2   66.7   11.1
Bias against vendor                                          5.6   25.9   68.5

Key: 1 = Seriously considered; 2 = Considered to some extent; 3 = Not considered at all.

• There are many possible ways to categorize approaches to the evaluation of automated systems. For the purposes of this chapter, two major approaches are identified:
  • Evaluation without user involvement, or with less than full user involvement (user-free).
  • Evaluation with full user involvement (user-involved).
• Note: the TAM model belongs to the second of these two approaches.
• User-Free Evaluation: This category of evaluation focuses on system features rather than on how these are exploited by a particular group of users. It can be used for system selection, for acceptance evaluation, and for decisions about enhancing or replacing a system.
• One useful tool in the selection of systems is a checklist, used to determine the features present in a particular system or, more particularly, to compare the characteristics of two or more systems.
• A point value may be assigned to each feature, and a differential weighting scheme may be established to place emphasis on features that are considered more important than others. In other cases, features are assigned an equal rating of 1 or 0. System scores can be derived from the grids, with subtotals to indicate system strengths in particular areas, and total scores to indicate overall performance.
• Sample checklist (systems A through L in columns; system names legible in the column headings include Geac, NOTIS, PALS, and DRA; "NA" means "not applicable"; individual cell entries are not recoverable):
  1. Is there adequate logon instruction (i.e., an explanation of which terminal types are supported)?
  2. Are the contents and coverage of the OPAC clearly explained?
  3. Are the key equivalencies explained for the remote user's keyboard?
  4. Is there adequate logoff instruction?
  5. Is the screen display always clean (i.e., no garbage characters)?
  6. (a) Is remote access unrestricted in terms of time of day?
     (b) Does the system tell the user if there is a time limit to remote sessions?
     (c) Does the system give a warning message of automatic logoff if there is no user input?
  7. Does the remote user have access to the same OPAC as those who use dedicated terminals in the library?
  8. Does the system indicate where the remote user can get additional help?

  System:              A  B  C  D  E  F  G  H  I  J  K  L
  Score (maximum 10):  6  5  4  7  5  5  6  6  6  8  8  7

• The checklist method of evaluation is useful for several reasons. In a single-system review, it helps one to arrive at a list of desirable features and to identify the strengths and weaknesses of a particular system. In a multiple-system review, a comparative checklist can help to verify the existence of features across systems and thus to identify comparative strengths and weaknesses.
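The weighted scoring described above, with subtotals per area and an overall total, can be sketched in a few lines of Python. This is a minimal illustration rather than the scheme used in any of the cited studies: the feature names, categories, and weights below are hypothetical, and treating a "not applicable" answer as contributing nothing is an assumption.

```python
def score_system(checklist, answers):
    """Return (subtotals by category, total score) for one system.

    checklist: list of (feature, category, weight) tuples.
    answers:   dict mapping feature -> True, False, or None (None = NA).
    """
    # Start every category at zero so weak areas still appear in the result.
    subtotals = {category: 0.0 for _, category, _ in checklist}
    for feature, category, weight in checklist:
        if answers.get(feature):  # False and None (NA) both add nothing
            subtotals[category] += weight
    return subtotals, sum(subtotals.values())


# Hypothetical features and weights, for illustration only.
CHECKLIST = [
    ("logon instructions", "user assistance", 1.0),
    ("logoff instructions", "user assistance", 1.0),
    ("clean screen display", "screen display", 2.0),  # weighted as more important
    ("unrestricted remote access hours", "remote access", 1.0),
]

SYSTEM_A = {
    "logon instructions": True,
    "logoff instructions": False,
    "clean screen display": True,
    "unrestricted remote access hours": None,  # not applicable
}

subtotals, total = score_system(CHECKLIST, SYSTEM_A)
print(subtotals, total)
```

Setting every weight to 1.0 gives the equal 1-or-0 rating mentioned above; running the same checklist against several answer sets gives the comparative grid.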
• The use of a checklist ensures that the same questions about system features are posed consistently across systems.
• Cherry et al. (1994) employed a checklist to survey features in the OPACs of twelve Canadian academic libraries. Data on each system were collected twice, by two different researchers, and the two datasets were checked a third time against the systems to resolve any disagreements. One hundred seventy features were included in the checklist, grouped into ten functional categories: 1. Database characteristics; 2. Operational control; 3. Searching; 4. Subject search aids; 5. Access points; 6. Screen display; 7. Output control; 8. Commands; 9. User assistance; 10. OPAC usability via remote access.
• Acceptance testing, or benchmarking, is a process often used by libraries to verify that a new or upgraded system meets the contract requirements. The conditions of acceptance in a contract often indicate clearly what type of performance is expected, and at what level, so that it can be determined whether a system works in the manner agreed upon in the contract.
• At times, public or staff users identify problems or suggest system changes designed to refine its operation or its interaction with users. The feedback that prompts these changes can come from word of mouth, from written comments, or from the periodic review of performance logs generated by the system.
• Stress tests are commonly used to test implementations of new features.
• Capacity planning is another important element in overall evaluation. By tracking the size of the database and estimating its growth rate, projections can be made about when to increase capacity, and whether this increase in size will degrade or otherwise affect response time and other performance factors. Precise capacity planning is difficult because it involves projection and prediction based on numerous complex performance factors.
• User-Involved Evaluation: For over twenty years, a growing body of research based on information science and cognitive psychology has sought a better understanding of how users interact with systems, and of how the results of that interaction can be evaluated. One practical goal of this type of research is to collect and analyze information that can be fed back into better system design.
• The interaction between the user and the system can be studied for a number of purposes. The studies discussed in this chapter are carried out to learn more about how a system is used and to improve its performance. Many methods are applicable. Unobtrusive measures gather data while library patrons are actually using the system. Users may or may not be aware that their keystrokes, or other actions, are being recorded or observed. The methods are unobtrusive in the sense that users are not asked any questions and are not required to do anything they would not otherwise be doing, so their activity is assessed in a natural setting.
• Obtrusive measures are used primarily to obtain feedback on user preferences for various system features and their opinions on system performance.
• Examples of obtrusive measures include interviews and system tests carried out under a researcher's supervision.
• Data collected unobtrusively can be useful in revealing how specific system features are exploited and in identifying features that appear to be giving users significant problems. At least three approaches are applicable: review of transaction logs, direct observation of users operating at terminals, and video and/or audio taping of user performance.
• Transaction log analysis (TLA) has been defined as the "… study of electronically recorded interactions between online information retrieval systems and the persons who search for the information found in those systems" (Peters et al., 1993a). In the 1970s it came to be regarded as a tool for analyzing the interaction between users and online catalogs.
• Many TLA studies gather information on how frequently system features are used: choice of search type, use of help screens, how many hits users are willing to review, how often a search results in zero hits, the number and type of error messages that users receive, and so on.
• An annotated bibliography by Peters et al. (1993b) and a review article by Simpson (1989) are two excellent sources of further information about TLA.
• Despite all of its potential benefits, transaction log analysis does have limitations. In many systems with transaction log monitoring facilities, it is either difficult or impossible to delineate individual user searching sessions. TLA is also poorly suited to comparisons across systems, because it cannot present the same characteristics for each.
• Another problem is that of cost. A comprehensive monitoring module can add a significant overhead to the cost of operating the system.
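The kinds of counts that TLA studies gather, such as how often each search type or command is used and how often a search returns zero hits, can be sketched as follows. The record layout is an assumption made for illustration; real transaction logs vary from system to system, and none of the names below come from an actual product.

```python
from collections import Counter

# Hypothetical log records: (search type or command, number of hits).
# A hit count of None marks a non-search command such as "help".
LOG = [
    ("title", 12),
    ("subject", 0),
    ("author", 3),
    ("help", None),
    ("subject", 0),
    ("title", 1),
]


def summarize(log):
    """Aggregate command frequencies and the zero-hit rate for a log."""
    command_counts = Counter(command for command, _ in log)
    searches = [hits for _, hits in log if hits is not None]
    zero_hits = sum(1 for hits in searches if hits == 0)
    return {
        "command_counts": dict(command_counts),
        "searches": len(searches),
        "zero_hit_rate": zero_hits / len(searches) if searches else 0.0,
    }


summary = summarize(LOG)
print(summary)
```

Because the records carry no session identifier, the sketch works only in the aggregate, which mirrors the limitation noted above: many logs make it hard to separate one user's session from another's.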
• Beyond the cost of monitoring itself, library staff and managers may lack the time to analyze the data that transaction logs produce.
• Transaction log analysis collects data about system use in the aggregate and deals only with the quantitative: which commands are used and how often, which headings are consulted, how much time is spent per session, and so on. The most obvious example is the monitoring and analysis of use of a help command. Knowing what types of help users request, especially in a new system or one that has recently added new features, can be of great value in identifying problem areas that were not anticipated in the system design but may, in fact, be rather easy to correct.
• Although it is rarely acknowledged, direct observation is perhaps one of the most commonly employed techniques for collecting information about online system users. Critics often suggest that observation is an unscientific way of gathering only the information needed to support one's own views. The technique need not be flawed; it is the degree of consistency in what is observed, and at what intervals it is observed, that determines the reliability of the data collected.
• It is important to employ valid sampling techniques in conjunction with observation in order to obtain reliable data on which management decisions can be based. For example, if one wants to know how many times users have to wait in line to use terminals in the reference room, one obviously cannot rely solely on the observations of a single librarian who staffs the reference desk only fifteen hours per week, between 8 a.m. and 5 p.m., Monday through Friday. Direct observation can be useful, on its own or to supplement other methods, when appropriate sampling methods are employed and input is received from more than one observer.
• A number of studies have analyzed the results of video and/or audio taping of the speech and actions of users during search sessions. The technique has been used to examine whether the cognitive, affective, or attitudinal behavior of users affects their performance and the outcomes of searching. Methods like protocol analysis, which employ a predetermined framework for analyzing user comments (asking a user to "think aloud" while searching, then recording the resulting behavior and comments), can be used to classify and evaluate the relative effect of user decision-making and behavior on the success or failure of searching.
• Survey questionnaires enable the collection of data about user satisfaction with a system or specific aspects of it, searching preferences and attitudes, demographic data about users, and the level of skills or knowledge that users possess.
• The published literature includes several studies in which online questionnaires were used to record user attitudes and preferences for system features. The advantage of online questionnaires is the ability to collect critical incident data about a session immediately after it ends.
• In comparison with questionnaires and transaction log analysis, interviews can provide a more intimate view of the user's perspective on the system under examination.
• Another type of interview, the focus group interview, may be conducted with a small group and one or more interviewers, with video and/or audio taping, and/or with assistants transcribing notes and statements during the group discussion. Focus group interviews are regularly used in marketing research to gather information from a particular, pre-selected group of users about products or potential products.
• Note: group opinions can also be collected through online group chat, for example in research on topics related to virtual communities.
• Some online systems offer the option for users to send unsolicited comments to librarians or system designers. In some cases, system administrators post an e-mail address to encourage users to report bugs or anomalies they encounter while searching the system. The comment option is therefore a valuable tool in problem identification. Current systems commonly offer an option that enables users to send mail messages from within the system to system administrators or other staff who work closely with various aspects of the catalog.
• Limitations of Evaluation
• Although the reasons for performance evaluation are compelling, systems evaluation within libraries is the exception rather than the rule, for a number of reasons. Sometimes system resource usage is neither the concern nor the domain of the library, but rather of the campus or city administrative computing center. Where the management of the computing facilities is separated from the management of the library, it is more difficult to establish a cohesive picture of the factors that affect the system's performance, much less to correct these situations when performance problems arise.
• Also, online systems typically generate hundreds of statistical reports about system functions on a regular (usually monthly) basis. Often, information about system resource usage needs to be carefully analyzed and translated to a different format in order to be usable by library managers. To avoid this pitfall, system vendors have moved increasingly to report generation modules that libraries can customize.
• The analysis of system performance data requires both skill and a commitment to ongoing analysis. Not all librarians feel they have adequate training in the use of quantitative or qualitative analysis methods. Further, it is not always clear where in the library organization the responsibility for ongoing evaluation ought to rest, beyond the annual budgeting process for equipment, software, and online contractual services.
• The published literature reveals that this analysis is now being performed by many different people: systems, reference, collection development, technical services, and administrative librarians.