A Service-oriented Architecture for New Media Services with Semantic Analysis Support



利用語意分析來支持一個以服務為導向的新媒體架構

Author

Yanping Zhao, Sanxing Cao, Jianyun Shang, Bowen Liu, Fang Liu

School of Management and Economics, Beijing Institute of Technology; School of Information Engineering, Communication University of China

Beijing, China; zhaoyp@bit.edu.cn; sanko333@gmail.com

Content Type

Conferences

This paper appears in: Management and Service Science, 2009. MASS '09.

International Conference on

Issue Date : 20-22 Sept. 2009

Speaker

Pei Mei Chen

Abstract: This paper designs a framework for a new media services platform in China. By analyzing the functions of the key modules, we establish an ontology-based semantic framework for describing and organizing heterogeneous new media resources: data, data flows, streams, and service rules derived from the user interaction context and node trust evaluation. We also establish an optimal model for the RSS content aggregation warehouse and the subscriber stubs warehouse. To monitor and analyze user interaction patterns and so optimize services on the platform, we put forward new service algorithms and test the results. Finally, we establish a top-level ontology description and a reasoning-logic metadata model that coordinates the whole system and supports the service system at the core of all warehouses and modules. The result could provide a reference model for new media services.

摘要:

這篇論文設計了一個用於新媒體服務平台的架構在中國。根據分析關鍵模組的功

能,我們設立一個描述本體論的語意架構和組織異質的新媒體資源的資料、資料

流、趨勢,並從使用者交互的上下文和節點信任評價檢測服務的規則。此外,我

們設立一個 RSS 內容總合倉庫和訂閱者存根倉庫的最理想模型。其目的為檢查和

分析在平台上的使用者互動模式達到理想的服務,我們提出了新的服務演算法,

並測試結果。最後,我們建立本體論頂層級的描述和推理邏輯的資料轉換模組,

以調節整個系統和支持服務系統在全部核心的倉庫和模組中。這個結果能提供一

個參照模組給新的媒體服務。

Keywords-ontology guided RSS content aggregation warehouse; subscribe stubs warehouse; clustering analysis; node trust evaluation

關鍵字:本體論引導 RSS 內容聚集倉庫;訂閱存根倉庫;集群分析;節點信任評

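Supplementary note:

Cluster analysis: groups similar samples together to form clusters. It works from the "distance" between samples, and the method requires no assumptions. Cluster analysis can be divided into hierarchical, nonhierarchical, and two-stage methods.

Source: http://www.mcu.edu.tw/department/management/stat/ch_web/etea/SPSS/Applied_Multivariate_Data_Analysis_ch10.pdf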

I. INTRODUCTION 序論

New media includes digital newspapers, digital broadcasting, short messages, mobile TV, IPTV, digital video, MP3, streaming media, and so on. In recent years it has attracted many people online and become more and more popular. How should it be organized and supplied as a service for people's daily lives and interests? How should it be transported so as to avoid large volumes of network stream traffic? How can it meet the particular needs of individual users? Many scholars, enterprises, organizations, and governments have focused their attention on these questions and are building an industrial chain for video/audio content delivery, video advertising, stream distribution player platforms, cross-platform channel services, user experience improvement, profit model analysis, video industry licensing, copyright protection for online works, and so on [1]. In China, the government is promoting services in these new areas to help people communicate more easily and enjoy both work and leisure time, with due care for intellectual property protection [2].

新媒體包括數位報紙、數位廣播、簡訊、行動電視、網路協定電視、數位視頻、

MP3 、串流媒體等。這些年來,它吸引了很多人上網,並得到越來越多的青睞。

當一個服務要提供給人們日常生活和具趣味性的,是該如何去組織它和供應它?

如何去傳輸它,以避免大量的網路流運輸?如何滿足特殊的個人目標的獨特需

求?許多學者、企業、組織和各國政府都集中在他們對此的注意,製作一個產業

鏈給予視頻 / 音頻內容傳送、視頻廣告、分散式播放平台、跨平台渠道服務、促

進使用者體驗、盈利模式分析、視頻行業許可證和線上作品的版權保護,等等

[1] 。在中國,政府是促進在這些新領域的服務,讓人們溝通更容易和享受工作

和空閒時間一起照顧到智慧財產權的保護。

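Supplementary note:

IPTV: Internet Protocol Television, a kind of broadband TV. IPTV is a system that uses a broadband network as the medium for delivering television content; it delivers digital television services to subscribers over the Internet Protocol on a broadband connection. Because it requires a network connection, IPTV service providers often also offer related services such as Internet access and IP telephony. IPTV is a form of digital television, so an ordinary television set needs a matching set-top box to receive channels, and providers therefore usually offer video-on-demand services to customers as well.

Streaming media: a technology and process in which a sequence of media data is compressed and sent over the network in segments, so that audio and video can be transmitted in real time for viewing. This technology lets data packets be sent like flowing water.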

In this paper, we design a framework for the new media service platform in China that provides new media information from distributed sources on trusted nodes and meets a variety of requirements.

在本論文中,我們在中國的新媒體服務平台試著去設計一個架構,為了提供新媒

體在信任節點上的分散式來源資訊和滿足各種要求。

This paper is organized as follows. Section 2 presents the architecture of the new media service system, which includes an ontology model for describing new media sources and the rules for intelligent services. Section 3 describes the particular modules and algorithms for improving service quality and evaluating trust dynamically. Section 4 gives initial test results for some modules, and Section 5 summarizes.

本論文組織如下,第 2 節是新媒體服務系統的結構,其中包括一個本體論模型用

於描述新媒體的來源和智能服務的規則,第 3 節是特殊的模組和演算法其用於服

務品質的改善和動態地信任評價,第 4 節是在同一模組下初步測試的結果和第 5

節的結論。

II. THE ARCHITECTURE OF THE NEW MEDIA SERVICE SYSTEM

新媒體服務系統的結構

Since new media involves many heterogeneous resources, a vast volume of video transport, and value-added services, we investigated the available technologies, platforms, and practically feasible plans, and arrived at the following design of our new media service framework and its key modules.

自新媒體涉及許多異質資源和龐大的視頻傳輸量和價值增值服務,我們調查技術

和平台,並實踐可行的計劃,和給予以下我們設計的新媒體服務架構及其關鍵模

組。

A. New Media Ontology Model 新媒體的本體論模型

Figure 1. The new media service ontology model 新媒體服務的本體論模型

[Figure not reproduced. Node labels recovered from the diagram, with no hierarchy implied: New Media; New Media Network; Mobile New Media; New television media; Other new media; Portal; SNS; Search engines; blog; podcast; RSS; network literature; network television; network radio; Digital TV; IPTV; Mobile TV; mobile TV/radio; SMS/MMS; Mobile publishing; Mobile newspaper; Tunnel media; chinanews; dianping; CNN; facebook; flickr; tudou; YouTube; fanfou; ppLive; bokee. A legend entry denotes that a node has a set of attributes.]

The essence of our approach is to use an ontology to model information and knowledge. An ontology is a shared understanding of some domain of interest, often conceived as a set of classes (concepts), relations, functions, axioms, and instances (Gruber, 1993) [3]. In the AI (Artificial Intelligence) community, an ontology is a knowledge representation method that encodes knowledge in well-defined formats that can be both understood by humans and processed by computers, making knowledge shareable and reusable. We therefore establish the following new media ontology model, not only for sharing knowledge but also for reasoning and for better semantic coordination of services.

我們的方法的本質是利用本體論模型的資訊和知識。本體論的項目提到一些感興

1993 年) [3] 。在 AI (人工智慧)社會本體論的知識代表方法,其編碼知識成良

好的定義架構,它能讓人類和電腦流程都能理解,並允許知識的共享和重複使

用。因此我們建立以下新媒體的本體論模型,不僅為了共享知識,同時也為推理

和提供最好的服務語意協調。

We use the Stanford Protégé ontology tools [4] for ontology modeling. We also collect many computer-mediated communication sources, such as web site communication (blog postings and forum postings), peer-to-peer communication, MSN archives, QQ archives, and PPLive, PPStream, UUSee, QQLive, and so on.

我們為了本體論模型使用史丹佛的 Protégé 本體論工具 [4] ,並且我們還收集許多

以電腦為介質的通信來源,例如:網站交流(部落格的文章和論壇的帖子)、點

對點的通信、 MSN 的記錄、 QQ 的記錄,和 PPLive( 網路電視 ) 、 PPStream( 網路電

視 -PPS) 、 UUSee( 網路電視 悠視網 ) 、 QQLive( 網路電視 騰訊視頻 ) 等。

B. The Software Architecture of the System 系統的軟體結構

To answer these questions about new media services, we develop a novel ontology-based new media information SOA (Service-Oriented Architecture) for the service system. The system is also a knowledge-based intelligent system supported by the RSS content aggregation warehouse and the subscriber stubs warehouse. It integrates disparate, heterogeneous data sources and deploys various statistical and knowledge-clustering methods to support fast collection of video information content, classification into channels, and improved real-time delivery and decision-making. Our approach extends existing analysis methods by accumulating knowledge dynamically, analyzing user behavior and service providers comprehensively, and reusing knowledge.

在新媒體服務要回答的問題,我們為了服務系統開發了一個新穎的以基礎本體論

的新媒體資訊 SOA (以服務為導向的架構)。這個系統也以知識為基礎的 RSS 內

容聚集成的倉庫和訂閱存根倉庫來支援智能系統。它集成了不同且異質的資料來

源和展開各種統計,知識群集方法用以支援快速視頻資訊內容的收集,渠道分類

和提高即時傳送和決策。在動態積累知識裡,我們的方法可以擴展現有的分析方

法,並全面地分析使用者的行為和服務提供者,還有知識的可再利用性。

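Figure 2. The new media service-oriented architecture 新媒體以服務為導向的架構

[Figure not reproduced. Labels recovered from the diagram, grouped in extraction order, so the assignment of some boxes to layers is uncertain: Display and service provider layer (RSS system; subscriber management; content player; subscriber authentication interface); Business layer (personalization; search; business management; content delivery network management; system management); Service layer (user behavior analysis; content distribution; payment management); Data layer (node trust evaluation; new media ontology knowledge base; content aggregation warehouse); Network layer (subscriber stubs warehouse; Next Generation Intelligent Node Overlap Network platform for new media).]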

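Figure 3. China Next Generation Internet (CNGI) core network model 中國下一代網際網路 CNGI 核心網路模型

[Figure not reproduced. Legend labels recovered from the diagram: P2P link; group; intelligent node of the Next Generation Intelligent Node Overlap Network platform (trust management server); ordinary Internet node.]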

Here, the Next Generation Intelligent Node Overlap Network platform (INON) [5] is the novel design from the demonstration project for the China Next Generation Internet (CNGI) core network model [6], shown in Fig. 3.

下一代智能節點重疊網路平台( INON ) [5] 是中國下一代網際網路 CNGI 核心網路

模組的新穎設計,顯示在圖 3 中。

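Supplementary note:

CNGI (China Next Generation Internet): launched in 2003. The goal of the core network construction was that, between 2003 and 2005, the operators would jointly build the CNGI backbone network using IPv6 technology (covering 39 core nodes in 20 cities), together with the domestic and international interconnection centers, and achieve a high-speed connection to the international next-generation Internet.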

In the following sections we will give some key modules and our novel designs.

在下面的章節中,我們將給予一些關鍵模組和新穎的設計。

III. PARTICULAR MODULES AND ALGORITHMS FOR SERVICE QUALITY IMPROVING 特別的模組和演算法用以服務品質的改善

We describe two key modules of the system in Fig. 2.

我們給予二個關鍵模組在圖 2 的系統中。

A. Semantic Themes Definition and Context Reasoning for RSS Resources

語意主題定義和內容推理的 RSS 資源

We extract themes from the RSS datasets. These themes cover almost the full scope of communication and new media content, and each theme can be divided into different aspects. The main features of each theme are selected, with the help of some special rules, to reason about the texts according to semantic knowledge. We put forward sentence-structure connection techniques and introduce a semantic sentence mechanism (SSM for short). Based on the three-system theory of complex sentences, the relation classes of complex sentences are divided into three parts: “causal”, “parallel”, and “transition”. Causal complex sentences include causal, deductive, hypothetical, conditional, purposive, and similar relations. Parallel complex sentences include parallel, linking, progressive, choice, and similar relations. Transition complex sentences include compromise, hypothesis, and similar relations. A complex sentence usually contains a conjunction word, which is the marker used for classifying it. The conjunction word not only connects the sub-sentences but also indicates the logical relation between them. In terms of position, the conjunction word usually comes before the subject and predicate. However, because of the complexity of complex sentences and the variety of Chinese, complex sentences with no conjunction word do exist.

我們從 RSS 資料庫提取主題。這些主題幾乎構成通信和新媒體內容的所有範圍。

所有主題可分為不同的面向。在每個主題的主要特點是選擇一些輔助的特殊規則

以推論根據語意知識的本文。我們提出的句子結構連接技術,並引入語意句子機

制(簡稱 SSM )

分:“因果”、“並行”和“過渡“。因果的複雜句包括:因果關係、推斷、假

設、條件、目的等。並行的複雜句包括:並行、連接、漸進的、選擇等。過渡的

複雜句包括:妥協、假設等。複雜句通常包含連接詞,這是分類複雜的句子的象

徵。連接詞不僅連接子句子,也表明子句子之間的邏輯關係。在它的位置,連接

詞通常位於主語和述語提前上。然而,因為複雜句和各種中文的複雜性,複雜句

有沒有連接詞的存在。
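The conjunction-based classification described above can be illustrated with a minimal Python sketch. The marker lists and the function name `classify_complex_sentence` are assumptions made for illustration only; the paper does not list its conjunction words, and the real SSM rules are richer than a marker count. Sentences with no listed conjunction are reported as "unmarked", matching the remark that such sentences exist.

```python
# Hypothetical marker lists (not from the paper): a few common Chinese
# conjunctions assigned to the three relation classes named in the text.
RELATION_MARKERS = {
    "causal": ["因為", "所以", "如果", "只要", "為了"],
    "parallel": ["並且", "而且", "或者", "然後"],
    "transition": ["但是", "然而", "雖然", "即使"],
}


def classify_complex_sentence(sentence: str) -> str:
    """Return the relation class whose markers occur most often.

    Returns "unmarked" when the sentence contains none of the listed
    conjunction words.
    """
    counts = {
        relation: sum(sentence.count(marker) for marker in markers)
        for relation, markers in RELATION_MARKERS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unmarked"


if __name__ == "__main__":
    for text in ["因為下雨,所以比賽取消", "他很努力,但是沒有成功", "今天下雨"]:
        print(text, "->", classify_complex_sentence(text))
```

A plain substring count is the simplest possible design; a production system would work on segmented words so that a marker embedded inside a longer word is not counted.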

1) Analysis of the HNC sentence model: Chuanjiang Miao [7] divided the objects that information describes into four parts: the properties of objects, the interactions between objects, the description and classification of human activity, and logic activity.

1) HNC 句模型分析:苗傳江描述物件的資訊分為四個部分:物件的屬性、物件之

間的互動、描述人類活動的類別和邏輯活動。

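Supplementary note:

Chuanjiang Miao (苗傳江): Institute of Language Information Processing, Beijing Language and Culture University. http://www.cdblp.cn/author/%E8%8B%97%E4%BC%A0%E6%B1%9F

HNC: Hierarchical Network of Concepts (概念階層網路) theory. http://www.hncnlp.com/Abs/absEmcj.htm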

2) Analysis of passive sentences: The passive sentences discussed here are sentences whose predicate is a transitive verb and whose subject is what the predicate acts on. For example, given the sentence “A hit B” or “B is hit by A”, the predicate “hit” tells us that “A” carries out the action while “B” bears the action. This rule suggests that the sentence tends to say that “A” is stronger than “B”. Rules like this can help identify the semantic inclinations of the given texts. This kind of knowledge is written by experts, as in HNC (Hierarchy Network of Concepts) [7], and built by human experts and some sophisticated machine learning. It covers the Chinese ‘bei’ sentences and other passive sentences, such as those marked by ‘shou’, ‘ai’, ‘zao’, etc.

2) 被動句的分析:被動句是人們曾說的那些句子:他們使用述語作為及物動詞

和他們的主語是什麼述語的修飾。例如,給一個句子為“ A 打 B ”或“ B 被打

A ”,透過述語“打”,或“被打”我們可以得知“ A ”為執行該行動,而“ B ”

為承擔該行為。這條規則表明,這句話往往會說, 'A' 是比 'B' 強的。這樣的規則可

以幫助確定本文的語意傾向。這種知識是根據一些專家寫的,例如: HNC (概念

階層網路)和根據人類專家和一些精密的機器學習所建立的。它包括中文的

“被”句和其被動句。還有“受”、“挨”、“遭”等。
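The passive-sentence rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the input is assumed to be already segmented into tokens, the markers 被, 受, 挨, 遭 are our reading of the pinyin 'bei', 'shou', 'ai', 'zao', and the function name `extract_roles` and the two fixed word-order patterns are hypothetical simplifications.

```python
# Passive markers, assumed to correspond to 'bei', 'shou', 'ai', 'zao'.
PASSIVE_MARKERS = ("被", "受", "挨", "遭")


def extract_roles(tokens):
    """Identify who acts and who bears the action in a short clause.

    Handles two simplified patterns only:
      passive: patient, marker, agent, verb   e.g. ["B", "被", "A", "打"]
      active:  agent, verb, patient           e.g. ["A", "打", "B"]
    Returns a dict with keys "agent", "verb", "patient", or None when
    neither pattern applies.
    """
    for i, token in enumerate(tokens):
        if token in PASSIVE_MARKERS and i >= 1 and i + 2 < len(tokens):
            return {
                "agent": tokens[i + 1],
                "verb": tokens[i + 2],
                "patient": tokens[i - 1],
            }
    if len(tokens) == 3:
        return {"agent": tokens[0], "verb": tokens[1], "patient": tokens[2]}
    return None


if __name__ == "__main__":
    print(extract_roles(["A", "打", "B"]))
    print(extract_roles(["B", "被", "A", "打"]))
```

Both example clauses yield the same agent and patient, which is the point of the rule: the active and passive forms carry the same semantic inclination even though the surface word order differs.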

By integrating the information above, we extract candidate words using the structure of the sentences. To account for the impact of sentence structure, we introduce these rules into a multi-pattern matching algorithm. To improve both the classification accuracy for RSS and the speed, we bring forward the SSM (Semantic Sentence Mechanism) algorithm. It incorporates a fast Chinese semantic analysis algorithm and a variety of sentence structure rules studied by our group.

透過整合上述資訊,我們提取句子結構裡的候選字。根據句子結構的影響,在改

善 RSS 分類準確性的品質時,我們引入規則以多方面與演算法相配,以及為增加

其速度,我們提出 SSM (語意句機制)的算法。根據我們的群組,它引入了快速

中文語意分析演算法和各種句子結構的研究規則。

Then we run the SSM semantic analysis, with the aid of the open-source Chinese segmentation algorithm [8] from the ICT of the Chinese Academy of Sciences, to judge which theme each message, or the metadata of each video episode, in the RSS corpus belongs to. We can thus assign every article or metadata item in the new media RSS content to a specific theme and subgroup. In this way we can analyze and classify complex and varied streams or communications into separate groups or channels, which is the idea behind our design for improving the original RSS feed.

然後,我們執行語意分析 SSM ,並借助從中國科學研究院來的開放原始碼中國分

割演算法,在 RSS 集裡判斷什麼主題的每則訊息或某一視頻的資料轉換。因此,

在新媒體 RSS 內容的所有文章或資料轉換中,我們可以選擇一個特殊的主題和一

個分組。所以我們可以分析和分類所有複雜的、各種的趨勢或通信,以成為各群

組或渠道,這表明我們的設計思想,以改善原始的 RSS 摘要。

B. Dynamic Evaluation Model for the Node Trusts 動態節點信任評價模型

Multimedia applications are mainly transmitted through the P2P network on INON, which aims to mitigate the following weaknesses of the original P2P networks [6]:

在 INON (智能節點重疊網路平台)上,之前的多媒體應用主要是透過 P2P 網路

來傳輸,而多媒體應用能發掘出原始 P2P 之缺點:

由於多媒體應用主要是透過 P2P 網路上的 INON (智能節點重疊網路平台)

主要是啟發不適宜的原始 P2P 之缺點:

• The disorderly expansion of P2P networks and the lack of uniform supervision and standards.

• The complexity of content in P2P networks and the varied behavior of users make analysis and management hard for vendors.

• The chaotic nature of peers’ identities gives rise to a crisis of trust and multiple security problems.

• P2P 網路無序的擴張,缺乏統一的監督和標準。

• 在 P2P 網路的內容複雜性,為了供應商以使用者的各種行為作出分析或嚴格管理。

• 混亂同等本體的種類產生信任危機和多個安全問題。

To achieve our goal, we design a dynamic trust evaluation algorithm. Here is the idea.

The peer’s trust level measurement: suppose there are N customer peers and M server peers in the network. The customer peers’ trusting level at time $t_n$ is determined by their evaluations of the server peers. Customer peers analyze and grade the server peers at $t_n$.

為了實現我們的目標,我們設計一個動態的信任評價演算法。這裡的想法。同等的信任等級測量:假設在網路上有 N 個同等客戶和 M 個同等伺服器。根據他們的同等伺服器的評價,同等客戶的信任等級在時間 $t_n$ 是確定的。在 $t_n$ 下,同等客戶的分析和同等階級的伺服器。

The expectation of the trusted value of a server peer $S_k$, $k = 1, 2, \ldots, M$, at time $t_n$ is the average of the customer peers’ evaluations of the server:

同等伺服器 $S_k$ 的信任評價值,$k = 1, 2, \ldots, M$,在時間 $t_n$ 是同等客戶對伺服器的評價的平均值:

$$E\big(S_k(t_n)\big) = \frac{1}{N} \sum_{i=1}^{N} S_i(t_n)$$

where $E(S_k(t_n))$ is written for this expectation and $S_i(t_n)$ indicates customer peer $i$’s evaluation of the trusted level of server peer $S_k$. We set the rule: if $|S_i(t_n) - E(S_k(t_n))| > \delta$, discard $S_i(t_n)$ and calculate the expectation of the server peer $S_k$ again. We also calculate the trusted value $\mathrm{TrustC}_i(t_n)$ of each customer peer $i$, $i = 1, 2, \ldots, N$, at time $t_n$:

$S_i(t_n)$ 表示同等客戶 $i$ 的評價在同等伺服器 $S_k$ 的信任等級。而我們設置的規則:如果 $|S_i(t_n) - E(S_k(t_n))| > \delta$,再次計算同等伺服器 $S_k$ 的期望。並且計算同等客戶 $i$ 的信任值 $\mathrm{TrustC}_i(t_n)$,在時間 $t_n$ 下 $i = 1, 2, \ldots, N$。

$$\mathrm{TrustC}_i(t_n) = \begin{cases} \mathrm{TrustC}_i(t_{n-1}), & |S_i(t_n) - E(S_k(t_n))| > \delta \\ \mathrm{TrustC}_i(t_{n-1}) + 1, & \text{otherwise} \end{cases}$$

where $\mathrm{TrustC}_i(t_n)$ denotes the customer peer’s trusted level at $t_n$ and $\mathrm{TrustC}_i(t_{n-1})$ denotes the customer peer’s trusted level at $t_{n-1}$. First set the initial trusted level of customer peers to 1. In the following evaluation process, if the difference between this user’s evaluation of server $k$ and the expectation for $k$ is larger than $\delta$, this user’s evaluation of server $k$ is deemed useless and discarded; at the same time, this user’s trusted level remains the same as before. Otherwise, the evaluation is deemed useful and kept, and the trusted level of the customer is increased by 1.

在 $t_n$ 下,$\mathrm{TrustC}_i(t_n)$ 表示同等客戶的信任等級,而在 $t_{n-1}$ 下,$\mathrm{TrustC}_i(t_{n-1})$ 表示同等客戶的信任等級。首先設定初始的同等客戶信任等級為 1。在下面評價的過程中,如果使用者評價之間的差異,以伺服器 k 和 k 的期望是比 δ 大,那麼該使用者的評價伺服器 k 將被視為無用的和被丟棄;同時,該使用者的信任等級將仍然像以前一樣。除此之外,評價將被視為有益的和長期保有,和客戶的信任等級將增加 1。

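Server peer $k$’s trusted level at $t_n$ is defined as follows:

同等伺服器 $k$ 的信任等級在 $t_n$ 下的定義如下:

$$\mathrm{TrustS}_k(t_n) = \frac{\sum_{i \in A_k} \mathrm{TrustC}_i(t_n)\, S_i(t_n)}{\sum_{i \in A_k} \mathrm{TrustC}_i(t_n)}$$

Here $\mathrm{TrustS}_k(t_n)$ is written for server peer $k$’s trusted level, $\mathrm{TrustC}_i(t_n)$ denotes the trusting level of customer peer $i$ at $t_n$, and $A_k$ is written for the set of remaining active peers, that is, those whose evaluations of server $k$ were not discarded. Using $\mathrm{TrustC}_i(t_n)$ as the weight, we calculate the trusted level of server peer $k$ from the evaluations of the remaining active peers.

在 $t_n$ 下的 $\mathrm{TrustC}_i(t_n)$ 表示同等客戶 $i$ 的信任等級。使用 $\mathrm{TrustC}_i(t_n)$ 當作權重單位,在主動的同等評價基礎上,計算同等伺服器 $k$ 的信任等級。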

The details of the trust evaluation model are demonstrated in the following simple simulation experiment. Suppose 2 server peers exchange data with 5 customer peers. Let δ = 3, M = 2, and N = 5.

信任評價模型的細節顯示在如下的簡單模擬實驗中。假定 2 個同等伺服器與 5

個同等客戶交換資料。讓 δ 等於 3 , M 等於 2 和 N 等於 5 。

TABLE I. EXPERIMENT RESULTS OF TRUST EVALUATION MODEL

表一 信任評估模型的實驗結果

For the first server peer, the 5 customers have trusted values 1, 3, 6, 4, 7 at time $t_n$ and give server 1 the evaluations 2, 7, 5, 5, 10, so the initial expectation is 5.8. The evaluations of customers 1 and 5 are deleted as outliers because their absolute deviations, 3.8 and 4.2, are greater than δ = 3. The trusted values of customers 2, 3, and 4 increase by 1 and the others remain unchanged. We then get the modified expectation of server 1 as 5.46. A similar calculation is done for server 2.

第一個同等伺服器的信任值,在時間 t n

下我們有 5 個客戶的信任值 1 、 3 、 6 、 4 、

7 ,和在伺服器 1 時客戶信任值為 2 、 7 、 5 、 5 、 10 ,還有初始期望值為 5.8

,刪

除了抽象且偏差的二個比 δ=3 大的離群值 3.8

和 4.2

,並給予 2 、 3 、 4 客戶新的

信任值增加 1 和其他的保持不變,我們得到修改伺服器 1 的期望為 5.46

。類似的

計算也用於伺服器 2 上。
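The worked example for server 1 can be reproduced with a short Python sketch. It is a reading of the prose under stated assumptions, not the authors' code: the function name `evaluate_server` is hypothetical, and the previous trusted values 1, 2, 5, 3, 7 are an assumption chosen so that the values after the update match the 1, 3, 6, 4, 7 quoted at time $t_n$. Under that reading, the modified expectation 5.46 is the trust-weighted average over the peers whose evaluations are kept.

```python
def evaluate_server(evaluations, prior_trust, delta):
    """One round of the dynamic trust evaluation for a single server peer.

    evaluations: each customer peer's evaluation of the server at t_n
    prior_trust: each customer peer's trusted level at t_(n-1)
    delta:       largest accepted deviation from the initial expectation

    Returns (initial expectation, updated trusted levels, weighted
    expectation over kept evaluations, or None if all were discarded).
    """
    expectation = sum(evaluations) / len(evaluations)
    # An evaluation is discarded when it deviates by more than delta.
    kept = [abs(score - expectation) <= delta for score in evaluations]
    # Peers with a kept evaluation gain 1; the others stay unchanged.
    trust = [t + 1 if keep else t for t, keep in zip(prior_trust, kept)]
    weight_sum = sum(t for t, keep in zip(trust, kept) if keep)
    if weight_sum == 0:
        return expectation, trust, None
    weighted = sum(
        t * score for t, score, keep in zip(trust, evaluations, kept) if keep
    ) / weight_sum
    return expectation, trust, weighted


if __name__ == "__main__":
    # prior_trust is an assumption; evaluations and delta come from the text.
    result = evaluate_server([2, 7, 5, 5, 10], [1, 2, 5, 3, 7], 3)
    print(result)
```

With these inputs the kept evaluations are 7, 5, 5 with weights 3, 6, 4, giving 71/13, which rounds to the 5.46 reported in the text.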

By evaluating the peers' trusting and trusted levels, we can accurately evaluate the peers' reliability and contribution, then choose the useful peers, ignore the redundant peers, and improve the network's efficiency and quality.

根據同等信任評價和信任等級,我們能準確地評價同等的可靠性和貢獻度;然後

選擇有用的同等,忽略過多的同等,以提高網路的效率和品質。

IV. INITIAL TESTS AND RESULTS 最初的測試與結果

We tested our methods under the Windows XP operating system. The CPU speed is 3.2 GHz, the memory is 512 MB, and the hard disk capacity is 250 GB.

我們在作業系統 XP 下測試我們的方法。 CPU 的速度為 3.2GHz

、記憶體為 512M ,

而硬碟的容量是 250G 。

The testing results are encouraging. From the RSS site http://www.feedss.com we obtained test articles from the finance and political channels: for each channel we followed the links and sub-links of each topic and collected the feed articles. We then inspected the articles manually, picking out those that really belong to the channel and putting those that do not into separate directories. Next, we used our SSM algorithm to classify each manually checked directory and let the system filter the articles in each directory, channel by channel: first finance versus non-finance, then political versus non-political. We collected 83 finance articles and 36 political ones and manually divided them into 4 directories as true or false for each theme. We chose these two themes because the financial one is somewhat easier semantically, while the political one is much more difficult since it involves many names, actors, positions, titles, countries, and so on. If these factors are neglected, the results become very confused and many articles are put into the wrong channels. Therefore, we put many semantic connecting factors into our algorithm and obtained the following results.

測試結果是令人鼓舞的。我們從 RSS 網站 http://www.feedss.com

的財金和政治頻

道的測驗文章,那是從每個頻道取得連結和每個主題的子連結,和收集他們的 feed 文章,並使用手動的方式檢測以撿起真正屬於該頻道的文章,而且把那些不

屬於該頻道放在分別的目錄中。然後我們使用我們的 SSM 演算法進行每個手動

發現的目錄分類和讓系統從每個目錄過濾掉他們。例如:首先為財金或非財金,

下一個為政治或非政治,讓頻道為一對一的情況。我們收集了 83 篇財金的文章

和 36 篇政治的文章,和手動地以是或否來分 4 個主題。我們選擇這兩個主題,

因為在語意的範圍裡財金是很容易的,而政治就非常的困難,因為它有許多人

名、參與者、職位、職稱和國家等。如果這些要素被忽視的話,結果將會非常混

亂,並導致許多文章誤導頻道。因此,我們投入許多語意連接要素至我們的演算

法中,並得到以下的結果。

Figure 4. the input interface 輸入的介面

Here the number of features is set to 1000, the first-sentence position weight is 2, the selected word types are verb/noun, the weight for prepositional phrases is 1.2, and so on.

選擇 1000 個特點,首句權重為 2 ,選擇詞性為動詞/名詞,介詞短語句為 1.2

等等 … 。

This produced the following results.

並提出了以下結果。

Figure 5. the initial testing of the classification improvement 改善分類的初步測試

In Figure 5, SSM is our algorithm, SVM is the Support Vector Machine, and KNN is the K-nearest neighbor algorithm. The vertical axis is the accuracy (correct rate) for the articles in each manually checked directory. For example, in the economics directory, our SSM gets the highest accuracy, about 0.80, while SVM gets 0.75 and KNN gets 0.78; in the non-economics directory, SSM gets 0.43, SVM gets 0.29, and KNN gets 0.09.

在圖 5 中, SSM 是我們的算法, SVM 是支援向量機制, KNN 是 K 近鄰演算法。

們的 SSM 獲得最高準確度約為 0.80

, SVM 是 0.75

和 KNN 為 0.78

,而非經濟的

目錄中, SSM 是 0.43

, SVM 是 0.29

,和 KNN 是 0.09

In total, of the 83 economics articles, the RSS feed has a correct rate of only 0.578313; a fraction of 0.421687 is not correctly placed in the given channels. When our SSM algorithm is applied to the 83 articles, 71 are classified correctly, for a reported correct rate of 0.916667. This shows that our improvement is significant.

在總數中,經濟在這裡為 83 篇文章,在給定的頻道裡, RSS feed 只有 0.578313

的正確率,而有 0.421687

的非正確性。當使用我們的 SSM 演算法在這 83 篇文章,

將有 71 個分類正確,這意味著正確率為 0.916667

。它表明我們的改善是有效的。
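The correct rates quoted above can be checked with a few lines of Python. The count of 48 correctly placed feed articles is an assumption, chosen because 48/83 reproduces the reported 0.578313; the text itself gives only the rate. Note that 71/83 evaluates to about 0.855, while the reported 0.916667 equals 33/36, which matches the size of the political set; the text does not say which denominator was intended.

```python
TOTAL_ECONOMICS = 83   # from the text
RSS_CORRECT = 48       # assumption: 48/83 reproduces the reported 0.578313
SSM_CORRECT = 71       # from the text

rss_rate = RSS_CORRECT / TOTAL_ECONOMICS
ssm_rate = SSM_CORRECT / TOTAL_ECONOMICS

print(f"RSS feed correct rate:   {rss_rate:.6f}")      # 0.578313
print(f"RSS feed incorrect rate: {1 - rss_rate:.6f}")  # 0.421687
print(f"SSM rate as 71/83:       {ssm_rate:.6f}")      # 0.855422
print(f"33/36 for comparison:    {33 / 36:.6f}")       # 0.916667
```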

V. CONCLUSIONS AND FUTURE WORK 結論與未來的工作

On the basis of INON (Next Generation Intelligent Node Overlap Network platform) and the Chinese national media service system platform, we have constructed a framework for the new media service system and realized optimal channel integration and fast content classification. We have designed dynamic algorithms for the performance optimization of the P2P download and trust evaluation modules. In the near future, these modules could easily be combined with a payment mechanism to provide and ensure better services that are personalized, trusted, and supported by user behavior analysis.

在 INON (下一代智能節點重疊網路平台)和中國國家媒體服務系統平台基礎上,

我們建構了一個為了新媒體服務系統的架構,實現了最理想的頻道整合和快速內

容分類。我們設計了動態演算法以實現最理想的 P2P 下載和信任評價模組。這些

模組可以很容易地與支付機制來提供和確保個人化、支援使用者行為分析來相結

合,並相信在不久的將來能有更好的服務。
