CA SBA REPORT Wong Hei Wai 7A (26)

SCHOOL-BASED ASSESSMENT REPORT [2011-2012]
Yan Chai Hospital Lim Por Yen Secondary School
7A Wong Hei Wai (26)
HONG KONG EXAMINATIONS AND ASSESSMENT AUTHORITY
HONG KONG ADVANCED LEVEL EXAMINATION
AS COMPUTER APPLICATIONS
[ DISCUSSION FORUM SYSTEM ]

~CONTENT~
1. Objective and Analysis
2. Design
3. Implementation
4. Testing and Evaluation
5. Conclusion and Discussion
6. Documentation

1. Objective and Analysis

1.1 Background

Database systems are widely used nowadays. A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality, in a way that supports processes requiring this information. A general-purpose database management system (DBMS) is typically a complex software system that meets many usage requirements, and the databases that it maintains are often large and complex. Databases are now used so widely that virtually every technology and product relies on databases and DBMSs for its development and commercialization. For example, companies use databases to hold large amounts of data, such as customer information, staff information and various statistics. Databases are equally important on the Internet, since many websites, such as public discussion forums, need them to store their data.

Public discussion forums are popular on the Internet, and there are thousands of them. Some of the most famous are Uwants.com, Little Soldier Forum and Hong Kong Fail Forum. A database is an essential part of a public discussion forum: it stores the user information and the forum posts. Each post carries essential information, such as the user identifier, the IP address and the time of the post.
This information should be stored for security and user behaviour analysis purposes.

In this SBA project, the database structure of a public discussion forum is to be designed. The system will generate the following statistics:
- posting statistics
- user statistics
- online traffic analysis
- posting habits analysis

1.2 Analysis

All public discussion forums serve the purpose of letting Internet users express themselves and discuss with others. The forums also offer similar functions built on the database system, such as online status statistics, user information search and news search. These are useful for administrators and users who want to look up information about the forum. During the summer vacation, I researched two discussion forums, the Uwants forum and the TVB forum, and compared their functions. Here are some examples.

1.2.1 Online and posting statistics

According to fig.1, a public discussion forum can show the number of users who are currently online and the record of the maximum number of users online at the same time. There is also a record of the total number of posts in the forum and the number of posts made on that day.

fig.1

1.2.2 User information

According to fig.2, users can look up their account information in the forum, such as the user ID, their identity group, the registration date and their last posting record.

fig.2

1.2.3 Posting record

The forum keeps records of the posts made by a specific member. To attract more users to register, some administrators are creative and give this record a new name, such as “accumulating marks” or “internet cash”, instead of the traditional one.

fig.3

1.2.4 Searching function

Users are allowed to use a searching function in a discussion forum.
There are many thousands of posts in a discussion forum, so if users want to read a specific piece of news, it is extremely difficult to go through the posts one by one. The main purpose of the search engine is therefore to let users find posts. Users can search by keywords of the post, user name, posting time, theme of the post and related region. Fig.4 is an example.

fig.4

1.2.5 Forum layout

The database system can divide all the posts into different categories. The following screenshot is from the Uwants forum. In it, the database system creates an index of the categories, such as News, Food, Travel, Comics, Sports, Fashion, etc. The index lets users find the information they are interested in conveniently and makes the layout of the forum more orderly.

fig.5

The screenshot below is from another forum, the TVB forum. It adopts a simpler layout structure, again using a database system. TVB is a company that offers different TV programmes, so it divides the information in the forum into Dramas, News, Life, Entertainment and so on.

fig.6

1.2.6 Voting function

Besides posting news, a discussion forum also allows users to vote in posts. A bar chart is created to show the number of votes for each option. The following screenshot shows a voting post in the TVB forum. The statistics showed 3919 votes in total, which means that this many users had read the post and voted.

Fig.7

Database software

There are different database software packages on the market nowadays, for example FoxPro, Microsoft Access, MySQL and Oracle. Although they are all well known and have many users, there are still some differences between them. The following table compares the data size limits and capabilities of these database software packages.
Database software          FoxPro      Access    MySQL       Oracle
Max DB size                Unlimited   2 GB      Unlimited   Unlimited
Max table size             2 GB        2 GB      256 TB      4 GB
Max char size              16 MB       255 B     64 KB       4000 B
Max number size            32 bits     32 bits   64 bits     126 bits
Max column name size       64          /         64          30
Merge join                 No          No        No          Yes
Windowing functions        No          No        No          Yes
Common table expressions   No          No        No          Yes

According to this comparison, Oracle is the only one of the four that supports merge joins, windowing functions and common table expressions, but it is also much more expensive. The Oracle Database is a product of Oracle Corporation and has often been chosen as the database system software in big companies. However, this SBA project does not need such expensive software, and the school already has Microsoft Access, so I decided to use Microsoft Access to design the database.

2. Design

The first thing we have to do is design the structure of the database system. We can use an Entity Relationship Diagram (ERD) to show the structure clearly. The ERD is shown below:

Fig.8

In the above ERD, the relations between entities are clearly shown. Rectangles represent entities, also called records. An entity is a representation of any composite information of a real object or an abstract object. Ovals represent attributes, also called fields. Rhombuses represent the relationship between two entities; it is unique and cannot be null. There may be more than one relationship between two (or more) entities. Cardinality information can be divided into two types: minimum cardinality and maximum cardinality. In the ERD, 1, 0 and M show the cardinality and existence of a relationship. 0 means that the existence of the entity in the relationship is optional. 1 means that the entity must exist in the relationship, at least once and at most once. M means that the entity can exist more than once in the relationship. There are three entities in total (members, news and category).
In the above ERD, users can read news, post news, reply to news or delete news. The five relationships are described below.

- One news item can be read by none or many users, and one user can read none or many news items, so the relationship of users reading news is Many to Many.
- One user can post none or many news items, but one news item can be posted by only one user, so the relationship of users posting news is One to Many.
- One news item can be replied to by none or many users, and one user can reply to none or many news items, so the relationship of users replying to news is Many to Many.
- One user can delete none or many news items, but one news item can be deleted by only one user, so the relationship of users deleting news is One to Many.
- One category can contain none or many news items, but one news item can belong to only one category, so the relationship between category and news is One to Many.

After finishing the ERD, we need to convert it into a database schema. The ERD is the result of data analysis, but it cannot directly form a table structure. A database schema is its structure described in a formal language supported by the database management system (DBMS) and refers to the organization of data to create a blueprint of how a database will be constructed.

When converting the ERD into a schema, we need to follow some rules. For a 1:1 cardinality relationship, all the attributes of the related entities are grouped into a single table. For a 1:M cardinality relationship, model each of the related entities in a separate table and post the primary key of the “one” side entity as a foreign key attribute to the table that represents the “many” side entity.
For an M:M cardinality relationship, model each of the related entities in a separate table, create a new table (referred to as the intersection table) and post the primary key of each entity type as an attribute in the new table. If the relationship has its own attributes, those attributes are stored in the intersection table too. The primary key of the intersection table is a composite key which includes the primary key of each entity type concerned.

Besides those rules, we also need to normalize the tables in the schema. Normalization is a database design technique based on analyzing relations among key and non-key attributes of database tables. The main purpose of normalization is to minimize data redundancy and anomalies. Three normal forms are applied here: the First Normal Form, the Second Normal Form and the Third Normal Form. A table in First Normal Form has no repeating fields. A table in Second Normal Form is in First Normal Form and exhibits no partial dependencies, i.e. every non-key attribute is fully functionally dependent on the primary key of the table. A table in Third Normal Form is in Second Normal Form and exhibits no transitive dependencies. The resulting schema is shown below:

Member (user_ID, user_name, email, sex, birthday, password, start_date, online)
News (news_ID, IP, user_name, date_time, category_name, user_ID)
Category (category_name, description, administrator, start_date)
Posting (user_ID, news_ID, date_time)
Reply (user_ID, news_ID, date_time)

3. Implementation

After designing the database, we need to create the tables in Microsoft Access. Press the function highlighted with a red circle above to create a new table. After clicking the button, we can see the following table for typing the field name, type and description.
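The schema and the conversion rules above can also be written out as table definitions. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for Access, so the field types differ from the Access ones, and the choice of key fields for News, Category, Posting and Reply is an assumption based on the rules above (the report only states the key of Member explicitly).

```python
import sqlite3

# Stand-in for the Access database: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
CREATE TABLE Member (
    user_ID    TEXT PRIMARY KEY NOT NULL,
    user_name  TEXT,
    email      TEXT,
    sex        TEXT,
    birthday   TEXT,
    password   TEXT,
    start_date TEXT,
    online     INTEGER NOT NULL DEFAULT 0   -- logical field: 1 = True, 0 = False
);
CREATE TABLE Category (
    category_name TEXT PRIMARY KEY NOT NULL,   -- assumed key field
    description   TEXT,
    administrator TEXT,
    start_date    TEXT
);
-- 1:M rule: the keys of the "one" side (Category, Member) are posted into News.
CREATE TABLE News (
    news_ID       TEXT PRIMARY KEY NOT NULL,   -- assumed key field
    IP            TEXT,
    user_name     TEXT,
    date_time     TEXT,
    category_name TEXT REFERENCES Category(category_name),
    user_ID       TEXT REFERENCES Member(user_ID)
);
-- M:M rule: intersection tables with a composite primary key (assumed).
CREATE TABLE Posting (
    user_ID   TEXT REFERENCES Member(user_ID),
    news_ID   TEXT REFERENCES News(news_ID),
    date_time TEXT,
    PRIMARY KEY (user_ID, news_ID)
);
CREATE TABLE Reply (
    user_ID   TEXT REFERENCES Member(user_ID),
    news_ID   TEXT REFERENCES News(news_ID),
    date_time TEXT,
    PRIMARY KEY (user_ID, news_ID)
);
""")

# List the tables that were created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

In Access the same structure is entered through the table design page described in this section rather than typed as statements.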
In this table, the field name is typed into the first cell. In the second cell, we choose the type of the field, for example character, memo, integer, date and so on. We need to choose a suitable field type for further processing. In the last cell, we can type a description of the field, but this is optional.

After creating all the fields in the table, we need to set the key field, which is the primary key of the table in the schema. To set the key field, right-click in the side bar of the design page and choose the first button, the key field; that field then becomes the key field of the whole table. The same steps are repeated for the other tables in the schema. More than one field of a single table can be set as a key field, which gives the table a composite primary key. After that, all the tables are shown on the main page of the database as follows. They are used for storing the data of the discussion forum.

fig.9

Two methods can be used to insert records into a table. The first method is typing all the records into the table with SQL or the Access insert function. That is quite inefficient, so I chose the other method, which imports the information from another source such as a spreadsheet. The steps for inserting records from another source are shown below.

First, you need a file which stores all your records in the same structure as the table. Let me use the table Member as an example.

Then you can use the import function in Access to insert all the data into the table. The steps are as follows.

First, click the File button at the top of Access and choose the option marked with a red circle. Another box will appear, in which we choose Insert. A window for the following steps will then appear.
Then we should choose the file that stores the records matching the table. In this case, we should choose student.xls. After choosing the right file, click the Insert(M) button to continue the process. Another window will then be shown, as follows:

In these two steps, we just need to click the next step button, because there is nothing to set in this part. After clicking next step in the following box, we need to choose the table into which the records are inserted.

In this part, we choose the table that will receive the records; in this example, it is the table Member. After choosing the table, click next step again. Lastly, we just need to click finish, and the copying of the records to the table is done. Here is the table with the copied records.

fig.10

This is the Member table, which contains all the records of the members: user ID, user name, email, sex, birthday, password, the date of starting to use the forum and the online status. I entered some made-up data; there are 15 members in the forum, each with different personal information.

fig.11

This is the News table, which contains all the records of news: news ID, user name, posting date and time, category name and user ID. There are 10 news items posted by 10 different users in the forum, and they belong to 5 categories.

fig.12

This is the Category table, which contains all the records of the existing categories: category name, description of the category, the administrator and the date the category was created.

fig.13
fig.14

These are the Posting table and the Reply table, which contain the user ID, the news ID and the date of posting or replying.
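The import from a spreadsheet described above can be sketched in code as well. This is an illustration only, not the Access import wizard: it assumes the spreadsheet has been saved as comma-separated text, uses Python's csv and sqlite3 modules as stand-ins, and the two sample rows (including the email addresses and dates) are made up for the example.

```python
import csv
import io
import sqlite3

# Made-up export of the spreadsheet; the column order matches the Member table.
sheet = io.StringIO(
    "user_ID,user_name,email,sex,birthday,password,start_date,online\n"
    "u0001,a,a@example.com,F,1994-01-01,pw1,2012-03-01,1\n"
    "u0002,b,b@example.com,M,1994-02-02,pw2,2012-03-02,0\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Member (
    user_ID TEXT PRIMARY KEY NOT NULL, user_name TEXT, email TEXT, sex TEXT,
    birthday TEXT, password TEXT, start_date TEXT, online INTEGER)""")

reader = csv.reader(sheet)
header = next(reader)               # first row holds the field names
rows = [tuple(r) for r in reader]   # remaining rows are the records

# One placeholder per column, then insert every record in a single call.
placeholders = ", ".join("?" for _ in header)
conn.executemany(f"INSERT INTO Member VALUES ({placeholders})", rows)

count = conn.execute("SELECT COUNT(*) FROM Member").fetchone()[0]
print(count, "records imported")
```

As with the wizard, this only works when the structure of the file is the same as the structure of the table.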
I assume that in the discussion forum there are 10 news items posted by 10 different users, but only 4 replies.

After copying all the records into the tables, I need to write some SQL to generate statistics and carry out analysis.

Posting Statistics

fig.15

The above SQL shows the number of postings of each member of the discussion forum. It uses the GROUP BY clause, which groups rows having common values into a smaller set of rows.

fig.16

The above is another SQL showing the posting record, which contains the date of posting and the category the news belongs to, together with the total number of news for that day and category.

User Statistics

fig.16

The above SQL shows the information of each member in the discussion forum. Not all the fields of the table Member are shown, because ordinary users have limited authority: they can only look up other users' ID, name, email, sex, birthday and registration date.

Online traffic analysis

fig.17

The above SQL shows how many members are online at a specific moment. The "Online" field in the table Member is a logical field, so the online status can only be True or False. If it is True, that member is online.

4. Testing and Evaluation

To test the database system, I run the SQL statements and check whether they work.

Testing for posting statistics:

After executing the posting statistics SQL, the above result came out. Under the assumed data, 10 members, from u0001 to u0010, have posted news. Each of them has posted 1 news item in the discussion forum.

The above is the second SQL, which searches the posting record, including the date of posting, the category of the news and the total number of news posted on the same day and in the same category.
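The GROUP BY statistics described above can be reproduced with a small sketch. It is not the SQL from the figures: it uses Python's sqlite3 module as a stand-in for Access, a cut-down News table, and assumed test data (the year 2012 and the category of each news item are made up; the report only states 10 news items, one per member and per day, two per category).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE News (news_ID TEXT PRIMARY KEY, user_ID TEXT, "
             "category_name TEXT, date_time TEXT)")

categories = ["sport", "food", "travel", "movie", "music"]
# Assumed data: news n0001..n0010 posted by u0001..u0010 on 21 to 30 March.
news = [(f"n{i:04d}", f"u{i:04d}", categories[(i - 1) % 5], f"2012-03-{20 + i}")
        for i in range(1, 11)]
conn.executemany("INSERT INTO News VALUES (?, ?, ?, ?)", news)

# Posting statistics: number of news items posted by each member.
per_member = conn.execute(
    "SELECT user_ID, COUNT(*) FROM News GROUP BY user_ID ORDER BY user_ID"
).fetchall()

# Posting record: number of news items for each day and category.
per_day_category = conn.execute(
    "SELECT date_time, category_name, COUNT(*) FROM News "
    "GROUP BY date_time, category_name ORDER BY date_time"
).fetchall()

# Number of news items in each category.
per_category = conn.execute(
    "SELECT category_name, COUNT(*) FROM News "
    "GROUP BY category_name ORDER BY category_name"
).fetchall()

for row in per_member:
    print(row)
print(per_category)
```

Each GROUP BY collapses the rows that share the grouped values into one row, and COUNT(*) gives the size of each group.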
In the assumed data, members post news on 10 days between 21st March and 30th March, 1 news item each day. The 10 news items fall into 5 different categories: sport, food, travel, movie and music. Each category has 2 news items, and all of them are posted on different days.

Testing for user statistics:

Since ordinary users of a discussion forum have limited authority, they can only read some of the information of other users, such as user_ID, user_name, email, sex, birthday and start_date. The password is hidden from the searchable area; it can only be checked by the administrator. In the assumed data, there are 15 users in total in this discussion forum. They have different names, which must be unique, as must the user IDs. For the convenience of testing the SQL, I simply set their names as a, b, c, etc. They have 15 different emails, and some are female and some are male. The 15 users have different birthdays; birthdays may repeat in some cases, but not in this data. They also registered on 15 different days.

Testing for online traffic analysis:

The above SQL is used for testing which members are online at a specific moment. If a member is online, the “online” field in the table Member becomes true; otherwise it stays false. In the assumed data, there are 8 members online: u0001, u0003, u0005, u0007, u0009, u0011, u0013 and u0015. The SQL shows the user ID and user name of the online members.

Testing for not null function:

In the table Member, the field user_ID is set as the primary key and also has the not null setting, which is highlighted with a red circle above. The not null setting requires every record to include this field, so the user_ID must not be empty. If users enter data without a user ID, the database system asks them to enter it until it is not null.
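The online check and the not null rule described above can be tried out in a short sketch. It uses Python's sqlite3 module as a stand-in for Access and a cut-down Member table; the 15 members follow the assumed data (names a, b, c, etc., odd-numbered members online), and the way the rejection is reported differs from the Access message box.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Member (
    user_ID   TEXT PRIMARY KEY NOT NULL,
    user_name TEXT,
    online    INTEGER NOT NULL DEFAULT 0)""")   # 1 = True, 0 = False

names = "abcdefghijklmno"  # 15 one-letter names
members = [(f"u{i:04d}", names[i - 1], i % 2) for i in range(1, 16)]
conn.executemany("INSERT INTO Member VALUES (?, ?, ?)", members)

# Online traffic: user ID and user name of every member who is online.
online = conn.execute(
    "SELECT user_ID, user_name FROM Member WHERE online = 1 ORDER BY user_ID"
).fetchall()
print(len(online), [uid for uid, _ in online])

# Not null test: a record without a user ID must be refused.
try:
    conn.execute("INSERT INTO Member (user_ID, user_name) VALUES (NULL, 'P')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("record without user_ID rejected:", rejected)
```

The second part fails on purpose: the database refuses the record instead of storing a member without a user ID.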
To test whether the not null setting works, I entered a fake user record named P without a user ID. Access then showed the box above, stating that the field “user_ID” cannot be null, and required me to enter a user ID.

Testing for deleting data:

Take the table Member as an example. To delete excess data, highlight all the excess records, then right-click and choose the second option, as marked in the above picture. Note that deleted records cannot be recovered, so this should be done very carefully.

Special cases:

There are also some unusual cases to think about. In reality, users of a discussion forum may forget their password and be unable to log in. The database system can offer a “Forget Password” function to users. When they forget their password, they can get it back by answering a Safety Question, whose question and answer were given at registration.

5. Conclusion and Discussion

This system meets the requirements in the objective. This project shows that a database is an essential means of storing and handling users’ data in a discussion forum. Reports can also be made from the tables afterwards, so a database is a useful tool for handling data and making reports.

In this project, I learned another database program, Access, in addition to FoxPro, which is taught in class, and I learned how to use the Access import function to insert records from other sources. I also learned how to generate statistics and carry out analysis by writing SQL. I learned how a database differs from a spreadsheet, and I can compare and contrast different database tools and programs. I know more about the database system of a public discussion forum.
By creating reports that summarize statistics about different data, I learned the process of making a report using tables and database functions. In this project, I also learned how to write a complete report to present my ideas to others.

However, I also met some problems while building this system. First, I did not know much about the database of a public discussion forum. I therefore researched several popular Hong Kong discussion forums and found out what characteristics they have. After preparing a PowerPoint presentation, I had a clearer direction on how to create the database system.

Before creating the database in Access, I needed to design an ER diagram. Mr Law has taught us how to draw ER diagrams, but I found it quite difficult to apply this to the discussion forum. Facing a large number of entities, fields and relations, I could hardly manage them properly. My classmates gave me a software package named SmartDraw; it is a very good tool for drawing diagrams, and I could easily edit the diagram whenever an adjustment was needed.

Besides, it is difficult to input data directly into the database, so I suggest inputting the data into a spreadsheet first and then importing it into the database table, which is more efficient.

To improve this database, we can offer a “Forget Password” function to forum users, since some of them may forget their password and be unable to log in. We can also create more statistics for users, so that they can know more details about the discussion forum.

6. Documentation

Wikipedia: http://en.wikipedia.org/wiki/Main_Page