Performance analysis of different implementations of Proficiency Report generation (MUI-402)

Implemented and analyzed the performance of several solutions to the problem of proficiency report generation, based on the chat we had with Seth on 09/01/2011. The idea is to come up with solutions that use different mixes of database querying via SQL and procedural processing in C#, and to compare their performance. The analysis was done on databases populated with sample data containing different numbers of users and sessions (50, 100, and 600 users, with 10 sessions/exercise or 100 sessions/exercise). During the analysis, the individual times of three sets of operations were measured and recorded. These three sets of operations are performed between the moment the user clicks the "Generate Report" button and the moment the resulting table is displayed. They are defined as follows:

SET 1: All database querying operations performed by SQLite
SET 2: All procedural processing performed in C#/.NET to build a data table with proficiency information
SET 3: All processing performed in C#/.NET to build the displayed data grid with icons

Goals:
- Figuring out the relative time spent on each set of operations
- Figuring out the optimal solution

Solutions

Solution 1 (current implementation in the repository):

for each user {
    for each exercise {
        [SET 1] Run a SQL query to get pass/fail information for the sessions of this user and exercise, in chronological order
        [SET 2] Determine proficiency by processing the query result (count the consecutive and nonconsecutive passes)
    }
}
[SET 3] Prepare the displayed data grid

Solution 2 (see the C# sketch after the list of solutions):

for each user {
    [SET 1] Run a SQL query to get pass/fail information for the sessions of all exercises of this user, in chronological order
    for each exercise {
        [SET 2] Determine proficiency by processing the query result to count the consecutive and nonconsecutive passes for the exercise
    }
}
[SET 3] Prepare the displayed data grid

Solution 3:

for each exercise {
    [SET 1] Run a SQL query to get pass/fail information for the sessions of this exercise across all users, in chronological order
    for each user {
        [SET 2] Determine proficiency by processing the query result to count the consecutive and nonconsecutive passes for the user
    }
}
[SET 3] Prepare the displayed data grid

Solution 4:

[SET 1] Run a SQL query to get pass/fail information for the sessions of all exercises of all users, in chronological order
for each user {
    for each exercise {
        [SET 2] Determine proficiency by processing the query result to count the consecutive and nonconsecutive passes for the user and the exercise
    }
}
[SET 3] Prepare the displayed data grid
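To make the structure of Solution 2 concrete (one query per user, then in-memory counting per exercise), here is a minimal C# sketch. It is not the repository code: the Sessions table and its UserId, ExerciseId, Passed, and StartTime columns are assumed names, the generic ADO.NET interfaces stand in for the actual SQLite provider classes, and the proficiency rule that is applied to the resulting counts is intentionally left out.

using System;
using System.Collections.Generic;
using System.Data;

// Minimal sketch of Solution 2, not the actual implementation.
public sealed class PassCounts
{
    public int MaxConsecutivePasses; // longest run of consecutive passes
    public int TotalPasses;          // passes regardless of consecutiveness
}

public static class Solution2Sketch
{
    // SET 1 + SET 2 for a single user; called once per user by the outer loop.
    public static Dictionary<int, PassCounts> CountPassesForUser(IDbConnection connection, int userId)
    {
        // SET 1: one query returning the pass/fail flags of all sessions of this user,
        // grouped by exercise and ordered chronologically (assumed table/column names).
        IDbCommand command = connection.CreateCommand();
        command.CommandText =
            @"SELECT ExerciseId, Passed
              FROM Sessions
              WHERE UserId = @userId
              ORDER BY ExerciseId, StartTime";
        IDbDataParameter userParam = command.CreateParameter();
        userParam.ParameterName = "@userId";
        userParam.Value = userId;
        command.Parameters.Add(userParam);

        // Materialize the query result: exercise id -> chronologically ordered pass flags.
        var passHistory = new Dictionary<int, List<bool>>();
        using (IDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                int exerciseId = Convert.ToInt32(reader["ExerciseId"]);
                bool passed = Convert.ToBoolean(reader["Passed"]);
                List<bool> history;
                if (!passHistory.TryGetValue(exerciseId, out history))
                {
                    history = new List<bool>();
                    passHistory[exerciseId] = history;
                }
                history.Add(passed);
            }
        }

        // SET 2: for each exercise, count consecutive and nonconsecutive passes in memory.
        // The actual proficiency rule applied to these counts is not part of this sketch.
        var counts = new Dictionary<int, PassCounts>();
        foreach (KeyValuePair<int, List<bool>> entry in passHistory)
        {
            var result = new PassCounts();
            int run = 0;
            foreach (bool passed in entry.Value)
            {
                run = passed ? run + 1 : 0;
                if (passed) result.TotalPasses++;
                if (run > result.MaxConsecutivePasses) result.MaxConsecutivePasses = run;
            }
            counts[entry.Key] = result;
        }
        return counts;
    }
}

Solutions 1, 3, and 4 differ only in how much of this work is pushed into the SQL query loop versus the procedural counting loop; the counting logic itself is the same.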
Performances

All data sets contain 39 exercises; the columns give the number of users. Times are in seconds, with the percentage of the total time in parentheses.

Solution 1

10 sessions/exercise      50 users            100 users           600 users
total time (sec)          6.5                 12.16               70
SET 1 time                3.976  (61.2%)      8.271  (68.0%)      51.871  (74.1%)
SET 2 time                1.452  (22.3%)      2.905  (23.9%)      17.107  (24.4%)
SET 3 time                1.072  (16.5%)      0.984  (8.1%)       1.022   (1.5%)

100 sessions/exercise     50 users            100 users           600 users
total time (sec)          30.78               62.72               345.5
SET 1 time                28.170 (91.5%)      58.455 (93.2%)      325.500 (94.2%)
SET 2 time                1.585  (5.1%)       2.980  (4.8%)       17.778  (5.1%)
SET 3 time                1.025  (3.3%)       1.285  (2.0%)       2.222   (0.6%)

Solution 2

10 sessions/exercise      50 users            100 users           600 users
total time (sec)          3.13                5.25                27.63
SET 1 time                1.861  (59.5%)      3.824  (72.8%)      24.635  (89.2%)
SET 2 time                0.174  (5.6%)       0.340  (6.5%)       1.732   (6.3%)
SET 3 time                1.095  (35.0%)      1.086  (20.7%)      1.263   (4.6%)

100 sessions/exercise     50 users            100 users           600 users
total time (sec)          21.03               41.88               266
SET 1 time                19.156 (91.1%)      39.338 (93.9%)      254.454 (95.7%)
SET 2 time                0.736  (3.5%)       1.453  (3.5%)       10.143  (3.8%)
SET 3 time                1.138  (5.4%)       1.089  (2.6%)       1.403   (0.5%)

Solution 3

10 sessions/exercise      50 users            100 users           600 users
total time (sec)          2.47                4.9                 52.44
SET 1 time                1.241  (50.3%)      2.899  (59.2%)      17.980  (34.3%)
SET 2 time                0.278  (11.2%)      0.978  (20.0%)      33.270  (63.4%)
SET 3 time                0.951  (38.5%)      1.023  (20.9%)      1.190   (2.3%)

100 sessions/exercise     50 users            100 users           600 users
total time (sec)          17.12               36.28               393.42
SET 1 time                14.679 (85.7%)      29.791 (82.1%)      184.431 (46.9%)
SET 2 time                1.397  (8.2%)       5.325  (14.7%)      208.006 (52.9%)
SET 3 time                1.044  (6.1%)       1.164  (3.2%)       0.983   (0.2%)

Solution 4 (not measured on the remaining data sets; see Analysis)

10 sessions/exercise      50 users            100 users
total time (sec)          12.44               40.79
SET 1 time                1.261  (10.1%)      2.514  (6.2%)
SET 2 time                10.130 (81.4%)      37.007 (90.7%)
SET 3 time                1.049  (8.4%)       1.269  (3.1%)

100 sessions/exercise     50 users
total time (sec)          73
SET 1 time                13.133 (18.0%)
SET 2 time                58.867 (80.6%)
SET 3 time                1.000  (1.4%)
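The per-set times in the tables above were measured separately for each set of operations along the report-generation path. The sketch below shows one plausible way to accumulate such timings with System.Diagnostics.Stopwatch around a Solution 2-style loop; it is illustrative only, and QuerySessions, ComputeProficiencies, and BuildDataGrid are placeholder names rather than the actual report code.

using System;
using System.Collections.Generic;
using System.Diagnostics;

// Illustrative only: the methods below are placeholders, not the MUI-402 implementation.
public static class ReportTimingSketch
{
    public static void GenerateReportWithTimings(IEnumerable<int> userIds)
    {
        var set1 = new Stopwatch(); // SET 1: SQLite querying
        var set2 = new Stopwatch(); // SET 2: building the proficiency data table in C#
        var set3 = new Stopwatch(); // SET 3: building the displayed data grid with icons

        var rows = new List<object>();
        foreach (int userId in userIds)
        {
            set1.Start();
            object sessions = QuerySessions(userId);   // placeholder for the per-user query
            set1.Stop();

            set2.Start();
            rows.Add(ComputeProficiencies(sessions));  // placeholder for the pass counting
            set2.Stop();
        }

        set3.Start();
        BuildDataGrid(rows);                           // placeholder for the data grid build
        set3.Stop();

        Console.WriteLine("SET 1: {0:F3} s", set1.Elapsed.TotalSeconds);
        Console.WriteLine("SET 2: {0:F3} s", set2.Elapsed.TotalSeconds);
        Console.WriteLine("SET 3: {0:F3} s", set3.Elapsed.TotalSeconds);
    }

    static object QuerySessions(int userId) { return null; }
    static object ComputeProficiencies(object sessions) { return null; }
    static void BuildDataGrid(List<object> rows) { }
}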
Analysis

Notes:
- There are 39 scored exercises (i.e., the exercises in which a user can become proficient) in dV-Trainer, and the number of scored exercises is likely to remain a bounded, finite number.
- The number of users and the number of sessions performed in an exercise are unbounded.

Relative time spent on each set of operations

The fraction of the total time spent on each set of operations depends heavily on the solution. For example, the majority of the time in Solution 2 is spent on SET 1 operations: Solution 2 runs one database query (SET 1) per user in a loop, so the fraction of time spent on SET 1 grows as the number of users in a data set grows (see Solution 2's performance tables), while the fraction of time spent on SET 2 is bounded by the bounded number of exercises. Solution 4, on the other hand, spends the majority of its time on SET 2 operations. The actual time spent on SET 3 operations is independent of the solution for a given number of users and increases as the number of users increases (the users are the rows of the data grid). However, the time spent on SET 3 becomes insignificant compared to the other operations as the data set grows, both in the number of users and in the number of sessions per exercise.

Optimal solution

Solution 4 proved to be the slowest solution on the very first couple of data sets, so no further data was collected for it. Both Solutions 2 and 3 are faster than Solution 1 on the 50-user and 100-user data sets with 10 and 100 sessions per exercise. However, it also looks like the speed of Solutions 2 and 3 approaches that of Solution 1 as the number of users in a data set increases. Practically speaking, a data set of 600 users with 100 sessions per exercise can be assumed to be a reasonable upper bound on the data that can be generated over the lifetime of dV-Trainer 1.1. For data sets of varying sizes up to this bound, Solution 2 is the fastest solution.

P.S. I also looked into optimizing the SQL query used to get session pass/fail information. In particular, I looked for unnecessary table joins (as pointed out in MUI-438) but could not find any.