Abstract — E-commerce websites host large collections of images accompanied by annotations, comments, and other information, forming heterogeneous image-rich information networks (HINs), which pose new challenges for querying and searching. Because HINs are typed and semi-structured and follow a network schema, they can model real-world systems more faithfully than homogeneous networks. In such networks, image retrieval, recommendation, and similarity measurement are the key tasks. To achieve high accuracy, both link-based and content-based image similarities are considered as similarity measures. We propose the HMok-SimRank algorithm to compute link-based similarity in weighted HINs, while content-based image similarity is estimated from image content features. We then propose Integrated Weighted Similarity Learning (IWSL), an algorithm that accounts for both link and content information by exploiting the network structure and mutually reinforcing link-similarity computation and feature-weight learning. Both local and global feature learning methods are designed. Image retrieval additionally draws on image mining, image segmentation, image categorization, tag annotation, and collaborative filtering. As a result, for a given user query the system generates recommendations that are both visually and semantically relevant.
Keywords — Link-based similarity, content-based similarity, heterogeneous image-rich information network, image retrieval.
I. INTRODUCTION
Figure 1. Information network for Flickr, connecting images, user tags, and groups
However, because objects are connected and inter-dependent, searching a large network effectively for a given user query is a challenge. Similarity search, which returns the objects most similar to a queried object, serves as a basic function for semantic search in heterogeneous networks.
Most data or informational objects are interconnected or interact with each other, forming complex networks. However, current research usually assumes that these networks are homogeneous, where nodes are objects of the same entity type and links are relationships of the same relation type.
In contrast, most real-world networks are heterogeneous, with nodes and relations of different types. Treating all nodes as the same type, as homogeneous information networks do, may miss important semantic information, while treating every node as a distinct type may lose valuable schema-level information. Thus, typed, semi-structured heterogeneous network modelling can capture the essential semantics of the real world, and such networks are ubiquitous. Heterogeneous information networks pose new challenges for querying and searching intelligently and efficiently: given the enormous size and complexity of a large network, a user is often interested in only a small portion of the objects and links, namely those most relevant to the query.
Figure 2. Product search and Recommendation system architecture
The proposed system is a novel product recommendation system for e-commerce that finds both visually and semantically relevant products modelled in an image-rich information network. In this architecture, the first layer contains the product data warehouse, which includes product images and related product information. The second layer performs meta-information extraction, in which link-based tag similarity and group similarity are computed, and image feature extraction, in which image similarity is estimated from image content features such as color histogram, edge histogram, color correlogram, CEDD, GIST, texture features, Gabor features, shape, and SIFT. The third layer builds a weighted heterogeneous image-rich information network. The fourth layer compares similarities, and the next layer performs information network analysis to find relevant results for a query. A user-friendly interface is a key part of the system; it interacts with users, responds to their requests, and collects feedback.
II. PRELIMINARIES AND NOTATIONS
We model a weighted heterogeneous image-rich information network as a graph G = (V, E, W) with vertices/nodes V, edges E, and edge weights W, where V = {V_q}, q = 1,…,Q, Q is the number of types of heterogeneous nodes, and |V_q| is the number of nodes of type q. Every image has a D-dimensional content feature F ∈ ℝ^D, which is either a single type of feature or a grouping of multiple types of features. An edge l_ij ∈ E is added between nodes i ∈ V and j ∈ V when they are linked together; the weight of this link is denoted by ϖ_ij.
We consider a heterogeneous network graph with three types of nodes (Q = 3): images V_I, groups V_G, and tags V_T.
Taking the Flickr network as an example, there is an undirected link between an image node e ∈ V_I and a tag node t ∈ V_T if e is annotated with t, and there is also an undirected link between e and a group node g ∈ V_G if e belongs to group g. There is no link between nodes of the same type [1].
III. LINK-BASED SIMILARITY
SimRank is one of the most popular link-based algorithms for evaluating similarity between nodes in information networks. It computes node similarity based on the idea that "two nodes are similar if they are linked by similar nodes in the network." The intuition behind SimRank is that, in many domains, similar objects are referenced by similar objects. More precisely, objects a and b are considered similar if they are pointed to by objects c and d, respectively, and c and d are themselves similar. The base case is that objects are maximally similar to themselves.
In a basic homogeneous network, SimRank defines the similarity score between two objects o and o' as:

S(o, o') = \frac{B}{|N(o)| \, |N(o')|} \sum_{a \in N(o)} \sum_{b \in N(o')} S(a, b) ...(1)

where B is a constant between 0 and 1, i.e., B ∈ [0, 1], N(o) is the set of nodes that link to object o, and |N(o)| is the cardinality of N(o). There are two special cases:

Case 1: S(o, o') = 1 if o = o'.
Case 2: S(o, o') = 0 if N(o) = ∅ or N(o') = ∅.
Space complexity is the amount of memory needed to perform the computation. For a network G of n nodes, the memory space required by SimRank to store the similarity pairs is O(n²). Time complexity relates computing time to the amount of input. Denoting by P the time spent to evaluate Eq. (1) for one pair, the time complexity is O(n²P) for a single iteration. Since SimRank is computed iteratively, the total time complexity is O(In²P) for I iterations. The original SimRank is too computationally expensive to be used in large-scale networks.
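As a concrete illustration of Equation (1), a minimal SimRank sketch is given below; the dictionary-based graph representation, the damping constant B = 0.8, and the fixed iteration count are illustrative assumptions rather than part of the original formulation.

```python
def simrank(neighbors, B=0.8, iterations=5):
    """Basic SimRank (Eq. 1) over a homogeneous graph.

    neighbors: dict mapping each node to the list of nodes that link to it.
    Returns S[a][b], the similarity score of every node pair.
    """
    nodes = list(neighbors)
    # Initialization: S(o, o) = 1, S(o, o') = 0 otherwise.
    S = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new_S = {}
        for a in nodes:
            new_S[a] = {}
            for b in nodes:
                if a == b:
                    new_S[a][b] = 1.0          # Case 1: identical objects
                elif not neighbors[a] or not neighbors[b]:
                    new_S[a][b] = 0.0          # Case 2: an empty neighborhood
                else:
                    total = sum(S[u][v] for u in neighbors[a] for v in neighbors[b])
                    new_S[a][b] = B * total / (len(neighbors[a]) * len(neighbors[b]))
        S = new_S
    return S

# Tiny example: two images sharing a tag become similar after one iteration.
# g = {"img1": ["tagA"], "img2": ["tagA"], "tagA": ["img1", "img2"]}
# simrank(g)["img1"]["img2"]  # 0.8
```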
1. Mok-SimRank for Fast Computation
Several algorithms have been proposed for more efficient SimRank computation. One strategy is to initialize each object's top k similar objects as candidates and focus the computation on these candidates (k ≪ n), which reduces both the space and the time complexity. We denote this method k-SimRank. We choose this strategy because it is well suited to large-scale image retrieval, where most images are not similar to the query and there is no need to estimate their similarity at all.
The time complexity of P in k-SimRank is O(|N(o)| |N(o')| log(k)), where log(k) is the cost of deciding whether a neighbor of o' is among the top-k candidates of a neighbor of o. To make k-SimRank even more computationally efficient, we describe an approach called minimum-order k-SimRank (Mok-SimRank). Denote by K(c) the set of c's top k similar candidates. Between N(o) and N(o'), denote by N_big the neighborhood with the larger cardinality and by N_small the smaller one. The basic idea of Mok-SimRank is to begin with N_small and, for every c ∈ N_small, compute the scores considering two cases:

Case 1 (k < |N_big|): for each d ∈ K(c), check whether d ∈ N_big; if TRUE, return the score, otherwise return zero.
Case 2 (k ≥ |N_big|): for each d ∈ N_big, check whether d ∈ K(c); if TRUE, return the score, otherwise return zero.

The time complexity of P is then always reduced to the minimum combination P_min:
P_{min} = \begin{cases} O(|N_{small}| \, k \log |N_{big}|) & \text{if } k < |N_{big}| \\ O(|N_{small}| \, |N_{big}| \log k) & \text{if } k \ge |N_{big}| \end{cases} ...(2)
This is the optimal combination that achieves the minimum cost by automatically choosing the minimum computation order. Table 1 summarizes the time and space complexity of the link-based similarity algorithms in a homogeneous network of n nodes.
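The following sketch illustrates the minimum-order candidate check behind Equation (2); the function and parameter names are invented for illustration, and set-based membership tests (O(1) on average) stand in for the sorted-list log(k) lookups described above.

```python
def pair_contribution(N_o, N_op, K, S, k):
    """Minimum-order accumulation of S(c, d) over top-k candidates (the idea of Eq. 2).

    N_o, N_op: neighbor lists of the two objects being compared.
    K: dict mapping a node c to the set of its top-k similar candidates K(c).
    S: dict of dicts with current similarity scores (missing pairs count as 0).
    """
    # Always iterate over the smaller neighborhood first.
    N_small, N_big = sorted((N_o, N_op), key=len)
    big_set = set(N_big)
    total = 0.0
    for c in N_small:
        candidates = K.get(c, set())
        if k < len(N_big):
            # Case 1: scan c's k candidates and test membership in N_big.
            hits = (d for d in candidates if d in big_set)
        else:
            # Case 2: scan N_big and test membership in K(c).
            hits = (d for d in N_big if d in candidates)
        total += sum(S.get(c, {}).get(d, 0.0) for d in hits)
    return total
```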
2. HMok-SimRank for Weighted Heterogeneous Networks
Mok-SimRank can be extended to work on a weighted heterogeneous information network, which contains multiple types of nodes. To explain, we take the image-rich information network from Flickr as an example. Similar images are likely to link to similar groups and tags, so we define the link-based semantic similarity between images e ∈ V_I and e' ∈ V_I as follows:
S^{m+1}(e, e') = \alpha_I S_G^m(e, e') + \beta_I S_T^m(e, e') ...(3)

where

S_G^m(e, e') = \frac{B_I^G}{\Omega(N_G(e)) \, \Omega(N_G(e'))} \sum_{a \in N_G(e)} \sum_{b \in N_G(e')} \psi_{ab}^{ee'} \, S^m(a, b) ...(4)

S_T^m(e, e') = \frac{B_I^T}{\Omega(N_T(e)) \, \Omega(N_T(e'))} \sum_{a \in N_T(e)} \sum_{b \in N_T(e')} \psi_{ab}^{ee'} \, S^m(a, b) ...(5)
where N_G(e) is the set of groups image e links to, N_T(e) is the set of tags image e links to, and α_I and β_I are the weights of the link-based similarity for groups and tags, respectively. We set both to 0.5 in the experiments to treat them as equally important. B_I^G and B_I^T are the damping factors. Ω(N_G(e)) is the sum of the weights of the links between image e and the nodes in N_G(e):

\Omega(N_G(e)) = \sum_{a \in N_G(e)} \varpi_{ea} ...(6)
ψ_ab^ee' is the importance/contribution of S(a, b) to S(e, e'), taking the link weighting into account, and is defined as the product of the weights of the two links l_ea and l_e'b:

\psi_{ab}^{ee'} = \varpi_{ea} \cdot \varpi_{e'b} ...(7)

The weight ϖ can be set manually or automatically. In the simplest case all weights are set to 1; the network then essentially becomes unweighted and all links are treated as equally important. In real applications, however, links can be of unequal importance. Taking Amazon as an example, the tag frequency represents the number of users who consider a tag relevant to the product, so we can use the tag frequency as the weight ϖ_ea for the link between product image e and tag a.
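As a small illustration of Equations (6) and (7), the sketch below computes Ω and ψ from a link-weight dictionary; the Amazon-style tag-frequency weights in the example are invented values.

```python
def omega(weights, e, neighbors_of_e):
    """Omega(N(e)) in Eq. (6): sum of the weights of the links between e and its neighbors."""
    return sum(weights[(e, a)] for a in neighbors_of_e)

def psi(weights, e, a, e_prime, b):
    """psi in Eq. (7): product of the weights of the links (e, a) and (e', b)."""
    return weights[(e, a)] * weights[(e_prime, b)]

# Illustrative tag-frequency weights:
# w = {("img1", "leather"): 12, ("img2", "leather"): 7}
# omega(w, "img1", ["leather"])                  # 12
# psi(w, "img1", "leather", "img2", "leather")   # 84
```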
Similarly, we can define and compute the link-based group and tag similarity.
The group similarity is computed via the similarity of the images and tags the groups link to. For each pair of groups g ∈ V_G and g' ∈ V_G:

S^{m+1}(g, g') = \alpha_G S_I^m(g, g') + \beta_G S_T^m(g, g') ...(8)

where

S_I^m(g, g') = \frac{B_G^I}{\Omega(N_I(g)) \, \Omega(N_I(g'))} \sum_{a \in N_I(g)} \sum_{b \in N_I(g')} \psi_{ab}^{gg'} \, S^m(a, b) ...(9)

S_T^m(g, g') = \frac{B_G^T}{\Omega(N_T(g)) \, \Omega(N_T(g'))} \sum_{a \in N_T(g)} \sum_{b \in N_T(g')} \psi_{ab}^{gg'} \, S^m(a, b) ...(10)

where N_I(g) is the set of images of group g and N_T(g) is the set of tags of group g. The meaning and setting of the parameters α_G, β_G, B_G^I, and B_G^T are similar to those in (3), (4), and (5).
The tag similarity is calculated via the similarity of the images and groups they link to. For each pair of tags t and t’,
S^{m+1}(t, t') = \alpha_T S_I^m(t, t') + \beta_T S_G^m(t, t') ...(11)

where

S_I^m(t, t') = \frac{B_T^I}{\Omega(N_I(t)) \, \Omega(N_I(t'))} \sum_{a \in N_I(t)} \sum_{b \in N_I(t')} \psi_{ab}^{tt'} \, S^m(a, b) ...(12)

S_G^m(t, t') = \frac{B_T^G}{\Omega(N_G(t)) \, \Omega(N_G(t'))} \sum_{a \in N_G(t)} \sum_{b \in N_G(t')} \psi_{ab}^{tt'} \, S^m(a, b) ...(13)

where N_I(t) is the set of images tagged with t and N_G(t) is the set of groups tagged with t. The meaning and setting of the parameters α_T, β_T, B_T^I, and B_T^G are similar to those in (3), (4), and (5).
The similarity score for any pair of nodes of the same type is initialized as 0 for different nodes and 1 for identical nodes. We do not consider similarity between nodes of different types. The similarity scores in (4), (5), (9), (10), (12), and (13) can be computed efficiently using the idea introduced in Mok-SimRank, so we can still achieve high efficiency for heterogeneous networks. Note that the computations of the similarity scores in (3), (8), and (11) are mutually dependent.
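To make the weighted update concrete, the sketch below implements the general form shared by Equations (4), (5), (9), (10), (12), and (13), and combines the group and tag contributions as in Equation (3); the dictionary-based data layout, the helper names, and the default damping value of 0.8 are assumptions made for this sketch. The group and tag updates of Equations (8) and (11) follow the same pattern with the roles of the node types exchanged.

```python
def weighted_sim_update(pairs, links, weights, S_prev, damping=0.8):
    """One weighted update of the form of Eq. (4)/(5)/(9)/(10)/(12)/(13).

    pairs:   list of (e, e') node pairs of one type.
    links:   dict node -> list of its neighbors of the other type.
    weights: dict (node, neighbor) -> link weight (the Omega terms are derived from it).
    S_prev:  dict of dicts with the previous iteration's similarities of the neighbor type.
    """
    S_new = {}
    for e, ep in pairs:
        Ne, Nep = links.get(e, []), links.get(ep, [])
        if not Ne or not Nep:
            S_new[(e, ep)] = 0.0
            continue
        omega_e = sum(weights[(e, a)] for a in Ne)
        omega_ep = sum(weights[(ep, b)] for b in Nep)
        total = sum(weights[(e, a)] * weights[(ep, b)] *
                    S_prev.get(a, {}).get(b, 1.0 if a == b else 0.0)
                    for a in Ne for b in Nep)
        S_new[(e, ep)] = damping * total / (omega_e * omega_ep)
    return S_new

def image_similarity_update(pairs, group_links, tag_links, weights,
                            S_groups, S_tags, alpha=0.5, beta=0.5):
    """Eq. (3): alpha_I * group contribution + beta_I * tag contribution."""
    from_groups = weighted_sim_update(pairs, group_links, weights, S_groups)
    from_tags = weighted_sim_update(pairs, tag_links, weights, S_tags)
    return {p: alpha * from_groups[p] + beta * from_tags[p] for p in pairs}
```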
In general, basic SimRank, K-SimRank, and Mok-SimRank can all be extended to compute link-based similarity in (weighted) heterogeneous networks; we call the resulting algorithms HSimRank, HK-SimRank, and HMok-SimRank to distinguish them from the homogeneous case.
TABLE 1
Complexity of the Link-Based Similarity Algorithms

In a homogeneous network:
Algorithm | Time Complexity | Space Complexity
SimRank | O(In²P) | O(n²)
K-SimRank | O(InkP) | O(nk)
Mok-SimRank | O(InkP_min) | O(nk)

In a weighted heterogeneous network:
Algorithm | Time Complexity | Space Complexity
HSimRank | O(I Σ_{i=1}^{p} m_i² P) | O(Σ_{i=1}^{p} m_i²)
HK-SimRank | O(I Σ_{i=1}^{p} m_i k P) | O(Σ_{i=1}^{p} m_i k) = O(nk)
HMok-SimRank | O(I Σ_{i=1}^{p} m_i k P_min) | O(Σ_{i=1}^{p} m_i k) = O(nk)
Table 1 summarizes the time and space complexity of the link-based similarity algorithms in a (weighted) heterogeneous network of p types of nodes, with m_i (i ∈ {1,…,p}) nodes of type i. Note that the total number of nodes in the network is n = Σ_{i=1}^{p} m_i.
IV. CONTENT-BASED IMAGE SIMILARITY

Image similarity can be estimated from image content features such as color histogram, edge histogram, color correlogram, CEDD, GIST, texture features, Gabor features, shape, and SIFT.

1. Content Metric for Similarity

We adopt the χ² test statistic distance as the content metric, which is used to evaluate image similarity in this paper.
There are two reasons why we choose χ². First, it has shown performance as good as, and sometimes better than, the cosine, L1, and L2 measures for image similarity in our experiments with human judgment; it has also been used in image retrieval and obtains the best accuracy as a kernel for SVM-based image classification. Second, its sum-of-squares form makes it convenient for the gradient-descent-based optimization used in our algorithm, as described in the next section.
V. INTEGRATING LINK AND CONTENT SIMILARITIES
We represent an image as a point in a D-dimensional feature space, using either a single type of feature or a combination of multiple types of features. As long as the integrated feature space has a fixed number of dimensions, our approach remains applicable.
We normalize the feature F ∈ ℝ^D, where D is the number of dimensions of the feature space, to unit length: every component f_d, the value of F on dimension d (d = 1,…,D), is divided by the sum of the values over all dimensions:

f_d = \frac{f_d^{original}}{\sum_{d=1}^{D} f_d^{original}} ...(14)
The χ² test statistic distance between two feature vectors F_i and F_j is defined as:

\chi_{ij} = \chi(F_i, F_j) = \sum_{d=1}^{D} c_{ij}^d ...(15)

\chi_{ij} = \frac{1}{2} \sum_{d=1}^{D} \frac{(f_i^d - f_j^d)^2}{f_i^d + f_j^d} ...(16)
When feature vectors are normalized to unit length, the 𝜒 2 test statistic distance varies from 0 to 1, with 0 indicating the most similar and 1 indicating the most different.
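A minimal sketch of Equations (14)-(16) follows; the small epsilon guarding against zero denominators is an implementation detail added here, not part of the formulas.

```python
import numpy as np

def normalize(F):
    """Eq. (14): scale a feature vector so that its components sum to 1."""
    F = np.asarray(F, dtype=float)
    return F / F.sum()

def chi2_distance(Fi, Fj, eps=1e-12):
    """Eq. (15)-(16): chi-square test statistic distance of two normalized vectors."""
    Fi, Fj = normalize(Fi), normalize(Fj)
    return 0.5 * float(np.sum((Fi - Fj) ** 2 / (Fi + Fj + eps)))

# chi2_distance([2, 1, 1], [1, 1, 2])  # ~0.083; 0 means most similar, 1 most different
```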
2. Weighted Content Similarity
Existing studies on metric learning have empirically and theoretically shown that instead of treating each dimension of the feature vector equally, a learned weighted metric can significantly improve the performance for tasks such as image retrieval, classification, and clustering [1]. This is because the feature dimensions are usually not equally important for measuring the similarity between images. We can obtain better result by putting more weights on a subset of features, which are more relevant to the semantic meaning of the images [9].
Based on the χ² test statistic distance and a D-dimensional feature weighting vector W = (w_1, w_2, …, w_D), we define the weighted content similarity c_ij^w between images i and j as follows:

c_{ij}^w = 1 - \frac{1}{2} \sum_{d=1}^{D} \frac{(w_d f_i^d - w_d f_j^d)^2}{w_d f_i^d + w_d f_j^d} ...(17)

c_{ij}^w = 1 - \sum_{d=1}^{D} w_d \, c_{ij}^d ...(18)
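Following Equations (17) and (18), the per-dimension χ² terms c_ij^d can simply be reweighted by the learned vector W, as in this sketch (the function name and epsilon guard are assumptions of the example):

```python
import numpy as np

def weighted_content_similarity(Fi, Fj, W, eps=1e-12):
    """Eq. (17)/(18): c_ij^w = 1 - sum_d w_d * c_ij^d."""
    Fi, Fj, W = (np.asarray(x, dtype=float) for x in (Fi, Fj, W))
    Fi, Fj = Fi / Fi.sum(), Fj / Fj.sum()            # unit-length normalization, Eq. (14)
    c_d = 0.5 * (Fi - Fj) ** 2 / (Fi + Fj + eps)     # per-dimension chi-square terms c_ij^d
    return 1.0 - float(np.sum(W * c_d))

# With uniform weights W = [1, ..., 1], this reduces to 1 minus the chi-square distance.
```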
Using content similarity alone may lead to unsatisfactory results. Directly using link information based solely on human annotations may also be unsatisfactory if the annotation is wrong, too general, or incomplete. In addition, if an image does not link to any object in the information network, an approach based only on link information cannot work at all. The traditional remedy is to combine content and link information to achieve more robust performance. In this section, we describe a novel way to integrate these two types of information.
1. Learning Feature Weights
To build a bridge between content and semantics, we learn a weighting vector W ∈ ℝ^D for the feature space that forces the weighted content-based similarity c_ij^w to be consistent with the semantic link-based similarity S_ij. We consider two types of approaches: global feature learning (GFL) and local feature learning (LFL).
2. Global Feature Learning
If the tags (or groups) of an image are incomplete (none or very few) and thus cannot fully describe its semantic meaning, the link-based similarity becomes unreliable. To address this, we introduce the confidence (or importance) of S_ij and define it as a function of the number of annotations (both tags and groups) linked to the image. After global feature learning, the image similarity is updated as a combination of content-based and link-based similarity:
S(i, j) = (1 - \mu) \, c_{ij}^W + \mu \, (p^* S_{ij} + h^*) ...(19)

where the parameter μ ∈ [0, 1] controls the weight of the link-based similarity in the combination, and p^* and h^* are the learned parameters (the global analogues of those in (20) below). μ can be set manually based on user preference, or set automatically to the value that estimates the confidence score of the link-based similarity.
3. Local Feature Learning
The problem with global feature learning is that a single global feature weighting for all images may be too general. Different images may belong to different semantic topics and thus need different weightings to capture their specific important features [1]. Therefore, we can perform LFL to find a specific feature weighting for each image. After local metric learning, based on the learned local weights, the similarity between two images i and j can be computed in two ways: symmetric and asymmetric.
The asymmetric similarity S^A(i, j) is defined as

S^A(i, j) = (1 - \mu) \, c_{ij}^{W_i} + \mu \, (p_i^* S_{ij} + h_i^*) ...(20)

where (W_i, p_i, h_i) are the learned parameters for image i. The symmetric similarity S^S(i, j) is defined as

S^S(i, j) = \frac{S^A(i, j) + S^A(j, i)}{2} ...(21)
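A small sketch of Equations (20) and (21), assuming the per-image parameters (W_i, p_i, h_i) have already been learned and that c_w_ij is the content similarity computed with image i's weights W_i; Equation (19) is the same combination with a single global parameter set.

```python
def asymmetric_similarity(c_w_ij, s_link_ij, p_i, h_i, mu=0.5):
    """Eq. (20): combine image i's weighted content similarity with the link similarity."""
    return (1.0 - mu) * c_w_ij + mu * (p_i * s_link_ij + h_i)

def symmetric_similarity(sA_ij, sA_ji):
    """Eq. (21): average of the two asymmetric scores."""
    return 0.5 * (sA_ij + sA_ji)
```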
4. Integration Algorithm
We present a novel algorithm to integrate link-based and content-based similarities. A basic approach has two stages: first, run HMok-SimRank to compute the link-based similarities; second, perform feature learning (either GFL or LFL) with respect to the link-based similarity to update the feature weights, and then update the node similarities based on the new content similarity [1].
Tag and group similarities should also be updated in this stage, which requires integration between content and link information. The idea is that, after feature learning, we update the image similarity by combining the link similarity with the weighted content similarity. Based on the new image similarity, we update the group and tag similarities; with the new tag and group similarities, we compute new link-based image similarities and learn new weights. The image/tag/group similarities are thus mutually updated and iterated until the process converges or a stopping criterion is satisfied. We call this approach Integrated Weighted Similarity Learning (IWSL), where "weighted" refers both to the weighted content features and to the fact that the algorithm works on a weighted heterogeneous network.
Algorithm: Integrated Weighted Similarity Learning (IWSL)
Input: G, the image-rich information network.
1. Construct a kd-tree over the image features;
2. Find the top k similar candidates of each object;
3. Initialize the similarity scores;
4. Iterate {
5. Compute the link-based similarity for image pairs via HMok-SimRank;
6. Perform feature learning using either global or local feature learning;
7. Search for new top k similar image candidates based on the new similarity weighting (optional);
8. Update the image similarities by Equation (19) (global) or Equations (20)-(21) (local);
9. Compute the link-based similarity for all group and tag pairs via HMok-SimRank;
10. } until convergence or a stopping criterion is satisfied.
VI. EXPERIMENTS
1. Data set
We conduct experiments on an Amazon dataset. The Amazon dataset is created by downloading product images and related metadata, such as category, tags, and title, via the Amazon API. For image feature extraction, we extract CEDD, a compact descriptor that considers both color and edge features. In addition, when images are uploaded into our system, image segmentation is performed on each image before it is stored. For tag pre-processing, we convert all tags to lower case, remove tags consisting only of number characters, and remove stop-words such as "the", "you", and "me". The remaining tags are stemmed using the Porter stemming algorithm, a linguistic normalization in which the variant forms of a word are reduced to a common form.
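A minimal sketch of this tag pre-processing pipeline is shown below; the stop-word list is a tiny illustrative subset, and NLTK's PorterStemmer is one possible implementation choice (assumed to be installed).

```python
from nltk.stem import PorterStemmer  # assumes the nltk package is available

STOP_WORDS = {"the", "you", "me", "a", "an", "and", "of"}  # illustrative subset
stemmer = PorterStemmer()

def preprocess_tags(tags):
    """Lower-case, drop numeric-only tags and stop-words, then Porter-stem the rest."""
    cleaned = []
    for tag in tags:
        tag = tag.lower().strip()
        if not tag or tag.isdigit():   # remove tags containing only number characters
            continue
        if tag in STOP_WORDS:          # remove stop-words
            continue
        cleaned.append(stemmer.stem(tag))
    return cleaned

# preprocess_tags(["Watches", "2015", "the", "Leather"])  -> ["watch", "leather"]
```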
2. Experimental Setting
Experiments were conducted on a PC with an Intel(R) Core(TM) i3-3110M CPU @ 2.40 GHz and 4 GB RAM, running Windows 8.1 Pro.
3. Relevance Performance
The Mean Average Precision (MAP) is used to measure the retrieval performance of the algorithms. The final MAP score for each algorithm is calculated as the mean of the average precision over all query images. The results show that link-based similarity performs better than content-based similarity. The IWSL algorithm further improves performance by introducing a new way of integrating content and link information via mutual reinforcement with feature learning. IWSL_L achieves better results than IWSL_G because it performs local feature learning, which finds a specific and better feature weighting for each image, whereas global feature learning finds one general feature weighting for all images.
Figure 3. Comparison based on link, content, local, and global learning
4. Product Search and Recommendation System for E-Commerce
In the product search and recommendation system, when users search for and click on a product in a web browser, we can recommend both visually and semantically relevant products, with the relevance score computed by our IWSL algorithm. Moreover, for products (e.g., jewellery, watches, bags, glasses, and clothes) that depend heavily on visual appearance to attract consumers, we are able to generate both visually and semantically relevant recommendations without any co-viewed or co-bought information. This leads to a better general product search service that enables users to compare products and make the best choice.
Figure 4. Result for Link based similarity with its retrieval performance in milliseconds
Figure 5. Result for Content based similarity with its retrieval performance in milliseconds
Figure 6. Result for Global and Local feature learning with its retrieval performance in milliseconds
Figure 7. Result for IWSL algorithm with its retrieval performance in milliseconds
Fig. 3 gives the comparative graph of link-based and content-based similarity together with local and global feature learning. Figs. 4 to 7 give the retrieval performance of link-based similarity, content-based similarity, and local and global feature learning; since local feature learning gives better results than global, it is the variant used in the IWSL algorithm. In Fig. 7, both visually and semantically relevant products are recommended, with the relevance score computed by our IWSL algorithm.
VII. CONCLUSION
This paper presents an efficient way of finding similar objects (such as photos and products) by modelling e-commerce websites as image-rich information networks. The HMok-SimRank algorithm efficiently estimates the weighted link-based semantic similarity in weighted heterogeneous image-rich information networks.
This method is much faster than heterogeneous SimRank and K-SimRank. We propose the IWSL algorithm as a novel way of integrating link information with feature-weight learning for similarity/relevance computation in weighted heterogeneous image-rich information networks. The network structure also benefits the task of image mining.
Global and local feature learning approaches are compared for learning a weighting vector that captures the more important feature subspace and narrows the semantic gap. We observed that local feature learning achieves better results than global feature learning; therefore, local feature learning is used in IWSL to update the new image similarity.
We conducted experiments on Amazon networks. The results show that our algorithm achieves better performance than other traditional approaches. We have implemented a new product search and recommendation system that finds both visually coherent and semantically pertinent products based on the proposed algorithm.
VIII. REFERENCES
[1] Xin Jin, Jiebo Luo, Jie Yu, Gang Wang, Jiawei Han, and Dhiraj Joshi, "Reinforced Similarity Integration in Image-Rich Information Networks," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 2, February 2013.
[2] Hongyan Liu, Jun He, Dan Zhu, Charles X. Ling, and Xiaoyong Du, "Measuring Similarity Based on Link Information: A Comparative Study," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 12, December 2013.
[3] Jiawei Han, Yizhou Sun, Xifeng Yan, and Philip S. Yu, "Mining Knowledge from Data: An Information Network Analysis Approach," Proc. 2012 IEEE 28th International Conference on Data Engineering.
[4] Fahim Irfan Alam, Muhammad Iqbal Hasan Chowdhury, Md. Reza Rabbani, and Fateha Khanam Bappee, "An Optimized Image Segmentation Algorithm," 2013 IEEE.
[5] Amel Ksibi, Mouna Dammak, Anis Ben Ammar, Mahmoud Mejdoub, and Chokri Ben Amar, "Flickr-based Semantic Context to Refine Automatic Photo Annotation," 2012 IEEE.
[6] Mohammed Mahmudur Rahman, "Contextual Recommendation System," 2013 IEEE.
[7] J. Tang, G.-J. Qi, H. Li, and T.-S. Chua, "Image Annotation by Graph-Based Inference with Integrated Multiple/Single Instance Representations," IEEE Transactions on Multimedia, vol. 12, no. 2, pp. 131-141, February 2010.
[8] Yizhou Sun and Jiawei Han, "Meta-Path-Based Search and Mining in Heterogeneous Information Networks," Tsinghua Science and Technology, ISSN 1007-0214, vol. 18, no. 4, pp. 329-338, August 2013.
[9] Xin Jin, Jiebo Luo, Jie Yu, Gang Wang, Jiawei Han, and Dhiraj Joshi, "iRIN: Image Retrieval in Image-Rich Information Networks," Proc. 19th International Conference on World Wide Web (WWW '10), pp. 1261-1264, 2010.
[10] Xiao Yu, Xiang Ren, Quanquan Gu, Yizhou Sun, and Jiawei Han, "Collaborative Filtering with Entity Similarity Regularization in Heterogeneous Information Networks," 2013.