google opinions project proposal

GOOGLE OPINIONS PROJECT
PROPOSAL
Prepared for
Professor Betsy Schlobohm
Prepared by
David Urbina
Khairun-nisa Hassanali
Michael Fashola
Scott Larson
April 28, 2009
Memorandum
DATE: March 25, 2009
TO: Scott Larson – Project Manager, Google Think Team
FROM: Marissa Mayer - VP Search Products and User Experience
SUBJECT: Update to Google Search Engine
Recent advances in information and communication technology mean that our search engine,
developed in the early days of the World Wide Web, is in need of a major review and update. We
have seen an increase in the number of blogs, twittering, social networking sites and other
opinion-themed sites like yelp.com, jdpower.com, and rottentomatoes.com, which signals a paradigm
shift in content and distribution [1]. No longer are opinions two-dimensional images that exist
only in the mind of the conceiver; through the instrumentality of the web, they are now potent
agents of change [2].
Google, as a leader in launching feature-rich applications, has set up a new group called Google
Think to undertake the task of enhancing the Google Search Engine in order to accommodate
this paradigm shift and develop a contemporary search engine that will provide decision-making
solutions to consumers.
I am happy to invite your team to undertake this project, with the responsibility of designing
and developing a comprehensive search engine that will mine blogs and opinion editorials and
provide themed searches that will ultimately appeal to users.
We wish you the best in this assignment.
Cheers,
Marissa Mayer
Memorandum
DATE: March 26, 2009
TO: Marissa Mayer – Vice President, Search Products and User Experience
FROM: Scott Larson – Project Manager, Google Think Team
SUBJECT: Acceptance of Offer
Your memo empowering our group to create an update to the Google search engine was
received with enthusiasm.
Our appointment to undertake this project reflects the high level of confidence you place in us
to develop a ground-breaking application that will re-launch Google as a leader in search engine
development. Please rest assured that we will approach this assignment with vigor.
My team will be conducting surveys and may request information that will aid the successful
completion of this project. This project will involve the use of products and technologies
developed by other teams at Google. We will therefore need the cooperation of all these
teams. Kindly inform us if there is to be any change in the scope of this project.
Cheers,
Scott Larson
Memorandum
DATE: April 10, 2009
TO: Marissa Mayer – Vice President, Search Products and User Experience
FROM: Scott Larson – Project Manager, Google Think Team
SUBJECT: Updates to Google Search Engine
We have conducted an assessment of the current search engine in line with your request and
have identified viable options for enhancing the Google search engine as we know it today.
We are in agreement with the need to update the Google search engine capabilities. While
Google’s share of searches has increased year-on-year [3], it is still unable to meet one of the
primary needs of today’s internet users: searching for opinions.
We propose Google Opinions, an extension of the Google Search Engine, which provides an
end-to-end solution to users searching for opinions. Google Opinions will also provide users with
advanced search capabilities that will enable them to further pin down the source of the opinions
and display statistics on the opinions retrieved by the system.
The general objective of the proposed plan is to make Google more responsive to the evolving
internet culture [1], [2] and launch new capabilities that will put Google at the frontline for years
to come. A critical success factor is for Google Opinions to have cross-over appeal, so that it can
be of use to all strata of society. This better positioning will lead to an improvement in revenue
streams.
We gratefully acknowledge the kind assistance of Vinton Cerf and his team in
providing market intelligence reports and new markets product themes. These came in handy
while analyzing the costs and benefits of the proposed option.
Working on the project proposal has broadened our insights. We envisage that this is just one step
in a series of concerted efforts to make Google a leader in her field, and we will be glad to play a
role in helping realize Google’s short- and long-term goals.
Cheers,
Scott Larson.
Table of Contents
Memorandum
Memorandum
Memorandum
Table of Contents
List of Illustrations
List of Tables
Executive Summary
Google Opinions
    Introduction
    Current Situation
Project Plan
    The solution
    Objectives
    Major and Minor Steps
    Deliverables and outcomes
Qualifications
    The Google Think Group
    The People
Costs and Benefits
Conclusion and Recommendations
Appendix A. Project Plan
    Google Opinions System
    Google Opinions Project Timeline
    Google Opinions Main Page
Appendix B. Costs and Revenue
    Costs
    Revenue
Appendix C. Google Opinions Glossary
Appendix D. Financial Statements
    Google Inc.: Income Statement
    Google Inc.: Balance Sheet
    Google Inc.: Cash Flow Statement
Appendix E. Resumes
References
List of Illustrations
FIGURE A-1: GOOGLE OPINIONS SYSTEM DIAGRAM
FIGURE A-2: GOOGLE OPINIONS PROJECT TIME LINE
FIGURE A-3: GOOGLE OPINIONS USER INTERFACE
FIGURE B-1: COST PROFILE
FIGURE B-2: PHASE-WISE COST
FIGURE B-3: COST PER MODULE
FIGURE B-4: COST BENEFIT ANALYSIS
List of Tables
TABLE 1: MAJOR AND MINOR STEPS
TABLE 2: PROJECT DELIVERABLES AND OUTCOMES
TABLE 3: SOFTWARE COST
TABLE 4: PHASE-WISE COST
TABLE 5: COST PER MODULE
TABLE 6: INCOME STATEMENT
TABLE 7: BALANCE SHEET
TABLE 8: CASH FLOW STATEMENT
Executive Summary
Google needs to once again re-position herself as a leader in search engine development and
offer greater value to users. Opinion-themed searches will enable us to seize
the initiative and open new vistas in previously uncharted territory.
We propose Google Opinions, an extension of the Google Search Engine, to provide
opinion-themed searches and appeal to a broad spectrum of users with varied needs, such as
consumers, employers, students and businessmen [1] – [3]. We envisage Google Opinions
being used to provide opinions in a recruitment decision, product ratings in a purchasing
decision, and support for other individual and corporate decisions arising from the current
environment of a vast array of products and information overload.
The Google Opinions project is proposed to start on June 1, 2009 and end on May 31, 2010.
This project will involve a team of thirty highly qualified personnel, with extensive experience in
information retrieval, sentiment mining and software development.
The financial and non-financial benefits from the Google Opinions project far outweigh the cost.
The Google Opinions project will cost $10,909,357. The increase in revenue from Google
Opinions is envisaged to be above $4,000,000 per annum.
Implementing the Google Opinions project will lead to the publication of path-breaking papers on
sentiment mining. Further, our competitors such as Microsoft [4] – [6] are working on similar
technology, and it is therefore imperative that we get Google Opinions onto the market before
they do.
Google Opinions
Introduction
A widely valued but rarely provided service is that of opinions. Customers are always in need of
advice about their interests, frequently referring to reviews written by professionals and other
customers [1, 2]. However, while these opinions may be useful, they are far from exhaustive
and do not give a view of the wide variety of opinions available.
Google Opinions attempts to solve this problem by providing a search engine designed
specifically for retrieving opinions.
Current Situation
As of April 2009, search engines do not have the capability of performing opinion-based
searches. The World Wide Web abounds with opinions on blogs, newspapers, review sites
and social network sites [1] – [3]. Given time and effort, a user can use a standard search
engine to research these websites and analyze the few opinions they find. However, collecting
information in this manner is time-consuming, and the quality of the opinions retrieved is
insufficient for most consumer and business needs.
The cause of this problem is the lack of an efficient means to collect and retrieve opinions from
websites [1] – [3]. Collecting and analyzing the large quantity and variety of opinions
available on the Web is beyond what a single user can practically do or is willing to do.
The Google Search Engine was not designed with a view to retrieving opinions from websites.
If this problem is not solved, both users and businesses will continue to suffer from a lack of
online consensus and views on particular products, positions, and ideals [1] – [3], [7], [8].
Google’s major competitor Microsoft has filed patents on similar technology [4] – [6]. Google
needs to bring out a system that solves this problem before its competitors do, giving it an edge
in the market for opinion-based retrieval. Not doing so would lead to a loss of revenue as people
move to the competitors’ search engines. Further, Google would miss out on other benefits,
such as the publication of path-breaking research papers resulting from this project.
Project Plan
The solution
In order to meet the demand for opinion-based searches [1] – [3], we propose to develop
Google Opinions [7], [8], an extension of the Google Search Engine. Google Opinions will
enable a user to perform an opinion-based search that retrieves both positive and negative
opinions on a particular subject.
Google Opinions will also provide the user with advanced search features, such as specifying
the source of opinions and how negative or positive the opinions retrieved by Google
Opinions should be. Further, using Thought Stats, users can view statistics and graphs on the
opinions retrieved for a search subject. Thought Stats also includes i-Util, which gives a
measure of satisfaction derived from opinions on products and services. Please refer to
Appendix A for a detailed explanation of the Google Opinions system along with the system
diagram and proposed user interface. Please refer to Appendix C for an explanation of the
terms used in the Google Opinions project.
Objectives
Google Opinions must meet the following objectives:
1. Provide an end-to-end solution that allows users to:
    a. Search for opinions on a subject [7], [8].
    b. View links to articles that contain opinions on these subjects.
    c. View representative sentences containing these opinions [7], [8].
2. Provide a user interface that allows users to:
    a. Provide the search words on which they want an opinion.
    b. Use advanced search options to:
        i. Specify the polarity (degree of positivity or negativity) of the opinions
           retrieved by Google Opinions [7], [8].
        ii. Specify the sources Google Opinions should use for retrieving opinions.
        iii. Specify the time frame within which the opinions are expressed.
    c. Get help on using Google Opinions.
3. Provide the user with statistics and graphs on the opinions retrieved by Google
   Opinions.
4. Display advertisements related to the search words.
5. Allow for easy integration with other Google components such as Google AdWords [9] and
   Google AdSense [10].
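The advanced search options in objective 2 can be pictured as a small request object. The sketch below is only an illustration: the class name OpinionQuery, its field names, and the polarity scale from -1.0 to +1.0 are assumptions made for this example and are not specified anywhere in the proposal.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class OpinionQuery:
    """Hypothetical request object for the search options in objective 2."""

    search_words: str
    # Assumed polarity scale: -1.0 (most negative) to +1.0 (most positive).
    min_polarity: float = -1.0
    max_polarity: float = 1.0
    # Source names to restrict the search to; an empty list means "any source".
    sources: List[str] = field(default_factory=list)
    # Optional time frame within which the opinions were expressed.
    published_after: Optional[date] = None
    published_before: Optional[date] = None

    def __post_init__(self) -> None:
        if not self.search_words.strip():
            raise ValueError("search_words must not be empty")
        if not -1.0 <= self.min_polarity <= self.max_polarity <= 1.0:
            raise ValueError("expected -1.0 <= min_polarity <= max_polarity <= 1.0")

    def accepts(self, polarity: float, source: str, published: date) -> bool:
        """Return True if an opinion with these attributes passes every filter."""
        if not self.min_polarity <= polarity <= self.max_polarity:
            return False
        if self.sources and source not in self.sources:
            return False
        if self.published_after and published < self.published_after:
            return False
        if self.published_before and published > self.published_before:
            return False
        return True


# Example: only non-negative opinions from two sources, expressed during 2009.
query = OpinionQuery(
    search_words="digital camera",
    min_polarity=0.0,
    sources=["yelp.com", "rottentomatoes.com"],
    published_after=date(2009, 1, 1),
    published_before=date(2009, 12, 31),
)
print(query.accepts(0.6, "yelp.com", date(2009, 4, 10)))   # positive opinion, allowed source
print(query.accepts(-0.4, "yelp.com", date(2009, 4, 10)))  # negative opinion is filtered out
```

Keeping the three filters (polarity, source, time frame) in one object mirrors items 2.b.i to 2.b.iii; a real interface might well expose them differently.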
Major and Minor Steps
The table below gives the major and minor steps of the Google Opinions project along with the
timelines. The Google Opinions project is expected to start on June 1, 2009 and end on May 31,
2010. Please refer to Figure A-2 in Appendix A for the Google Opinions project timeline.
Major Step/Component | Minor Step | Expected Start Date | Expected Completion Date
Pre-Requirements | Testing and installing software | June 1, 2009 | June 30, 2009
User Interface | Requirements | July 1, 2009 | July 31, 2009
User Interface | Design | August 1, 2009 | August 31, 2009
User Interface | Coding | September 1, 2009 | September 30, 2009
User Interface | Unit Testing | October 1, 2009 | October 31, 2009
Go-Op-Crawler | Requirements | July 1, 2009 | July 31, 2009
Go-Op-Crawler | Design | August 1, 2009 | August 31, 2009
Go-Op-Crawler | Coding | September 1, 2009 | October 31, 2009
Go-Op-Crawler | Unit Testing | November 1, 2009 | November 30, 2009
Pre-Processor | Requirements | July 1, 2009 | July 31, 2009
Pre-Processor | Design | August 1, 2009 | August 31, 2009
Pre-Processor | Coding | September 1, 2009 | September 30, 2009
Pre-Processor | Unit Testing | October 1, 2009 | October 31, 2009
Go-Top-Generator | Requirements | July 1, 2009 | July 31, 2009
Go-Top-Generator | Design | August 1, 2009 | August 31, 2009
Go-Top-Generator | Coding | September 1, 2009 | October 31, 2009
Go-Top-Generator | Unit Testing | November 1, 2009 | November 30, 2009
Go-Op-Selector | Requirements | July 1, 2009 | July 31, 2009
Go-Op-Selector | Design | August 1, 2009 | August 31, 2009
Go-Op-Selector | Coding | September 1, 2009 | October 31, 2009
Go-Op-Selector | Unit Testing | November 1, 2009 | November 30, 2009
Go-Summarizer | Requirements | July 1, 2009 | July 31, 2009
Go-Summarizer | Design | August 1, 2009 | August 31, 2009
Go-Summarizer | Coding | September 1, 2009 | October 31, 2009
Go-Summarizer | Unit Testing | November 1, 2009 | November 30, 2009
Thought Stats | Requirements | July 1, 2009 | July 31, 2009
Thought Stats | Design | August 1, 2009 | August 31, 2009
Thought Stats | Coding | September 1, 2009 | October 31, 2009
Thought Stats | Unit Testing | November 1, 2009 | November 30, 2009
Integration | All components | December 1, 2009 | December 31, 2009
System Integration Testing [11] | Testing and bug fixing | January 1, 2010 | March 31, 2010
Product Quality Testing [12] | Testing and bug fixing | April 1, 2010 | May 31, 2010
Table 1: Major and Minor Steps
Deliverables and outcomes
Table 2 gives the deliverables and outcomes for the Google Opinions Project. These
deliverables are due once the component is completed. Please refer to Table 1 for the
completion dates of each component.
Component | Deliverables | Outcome
User Interface | Source Code [13]; Executable [14]; Deployment Manual [15]; User Manual; Software Requirements Specification [16] | A user interface which enables users to search for opinions. The user interface will provide for advanced search options and a Help section.
Go-Op-Crawler | Source Code [13]; Executable [14]; Deployment Manual [15]; Software Requirements Specification [16]; Unit Testing Set Kit | A crawler that collects opinion-laden web pages based on the search words.
Preprocessor | Source Code [13]; Executable [14]; Deployment Manual [15]; Software Requirements Specification [16]; Unit Testing Set Kit | A program that takes web pages collected by the Go-Op-Crawler and produces plain text.
Go-Top-Generator | Source Code [13]; Executable [14]; Deployment Manual [15]; Software Requirements Specification [16]; Unit Testing Set Kit; Topic Aspect Database | A module that takes as input text and gives as output the topics discussed in the text.
Go-Op-Selector | Source Code; Executable; Deployment Manual; Software Requirements Specification; Unit Testing Set Kit | A module that takes as input a set of topics and text and gives as output opinion-laden sentences within the text on the given topics.
Go-Summarizer | Taxonomic Sentiment Database; Source Code [13]; Executable [14]; Deployment Manual [15]; Software Requirements Specification [16]; Unit Testing Set Kit | A module that takes a set of sentences as input and selects sentences that represent a summary of the input sentences.
Thought Stats | Source Code [13]; Executable [14]; Deployment Manual [15]; Software Requirements Specification [16]; Unit Testing Set Kit | A module that generates statistics and graphs based on the data collected by other modules.
Table 2: Project Deliverables and Outcomes
Other general deliverables this project will produce are:
• System Architecture Document [17]
• Integration Testing Kit
Google Opinions will be the final deliverable, providing an end-to-end solution that enables
users to search for opinions. Google Opinions will be ready for deployment on May 31, 2010.
Qualifications
The Google Think Group
The Google Think group was founded three months ago to harness the potential market in
opinion extraction and mining. The Google Think team consists of software engineers who are
experienced in software development, sentiment mining, information retrieval and web based
applications. Please refer to Appendix E for the resumes of the Google Think team members.
The People
Scott Larson – Manager
Scott is close to completing his Master’s Degree in Software Engineering at the University of
Texas at Dallas. He is experienced in leading groups, including time management, resource
management, work distribution, and project integration. Scott will be managing the Google
Opinions project.
Khairun-nisa Hassanali – Lead Researcher and Developer
Khairun-nisa is a highly educated Researcher of Natural Language Processing at the University
of Texas at Dallas. She is currently pursuing a PhD in Computer Science with a research focus
on Sentiment Mining. Khairun-nisa will be leading all research and development on the Google
Opinions project.
David Urbina – Lead Architect
David has extensive experience as a Solutions Architect and Developer. He is a Graduate
Student at the University of Texas at Dallas with a 4.0 GPA, majoring in Software Engineering.
David will be leading the Requirements Engineering, architecture, and design of the project.
Michael Fashola – Lead Tester and Manager of Marketing and Finance
Michael is a Graduate Student at the University of Texas at Dallas. He is experienced in
software testing and financial auditing. Michael will be leading the Testing and Validation phase
of the Google Opinions project, as well as managing the finances and marketing of the Google
Opinions project.
Costs and Benefits
Our market research indicated that while Google has maintained her market share despite
growing competition [18] – [21], opportunities for growth have been limited. This is supported by
financial reports for the preceding four quarters, as shown in Appendix D. A key ingredient to
jump-start another era of prolific growth is being able to offer quick and efficient search for
opinions [1] – [3]. Google Opinions will fulfill this need. None of our competitors, such as
Microsoft [22] and Yahoo [23], has this capability in its search engine.
As shown in Appendix B, the implementation of the Google Opinions project will cost an
estimated $10,909,357. We believe that this investment in Google Opinions will enable
Google to take over a niche part of the market. We expect annual revenue increases of over
$4,000,000 from Google Opinions and expect to break even within three years of implementing
this product. This increase in revenue will come from Google AdWords [9] and AdSense [10], as
businesses will want to purchase Google Opinions keywords related to their activities.
A person looking for a positive or negative review of a product or service may also be looking to
purchase it. Therefore, based on the reviews, they are likely to click on the
business that sells this product or service. Further, Thought Stats is also a useful tool for
businesses to track people’s opinions on their products and services. We intend to
commercialize Thought Stats four years from now as a way of giving businesses access to
otherwise copyrighted materials, with the expected revenues further boosting our growth
prospects.
Google Opinions can be integrated with Google Site Search [22] at no extra cost. Businesses such
as newspaper sites and review sites would be interested in opinion-based searches in addition
to a general search.
But the advantages of our plan go beyond simple costs and revenues:
First, this will be a major research breakthrough and will lead to the publication of
path-breaking papers by Google in the field of sentiment mining.
Second, we will lose a portion of our market share if we do not get Google Opinions onto the
market. Our major competitor Microsoft is also working on similar technology and has filed a
patent for the same [4] – [6]. If we do not get our product out into the market first, we will lose a
portion of our customers to Microsoft. This will lead to a loss of revenue.
Third, this technology can be used in future in the Google suite of products, such as Orkut, at
little cost.
Conclusion and Recommendations
Google Opinions will capture the market as the first search engine to provide opinion-based
searches, thereby providing Google with increased revenues. As a result of its fast
performance and reliability, we expect it to do better than similar products released in future
by Google’s competitors.
If our proposal is approved, we anticipate a commencement date of June 1, 2009, with
completion no later than May 31, 2010. In order to proceed, we suggest recruiting 26 employees
before May 15, 2009, and purchasing the required hardware and software no later than May 10,
2009. The software should be installed on all machines no later than May 15, 2009. This will
allow the Google Opinions project to proceed as planned and give it maximum opportunity to be
tested before it is deployed.
Appendix A. Project Plan
Google Opinions System
The Google Opinions system uses the Go-Compare algorithm and Thoughts Discovery
paradigm. The Google Opinions system consists of the following major steps:
The figure below gives the system diagram for the Google Opinions project.
[Figure: system diagram (node A0, Google Opinions) with five stages: A1 Web Crawling
(Go-Op-Crawler), A2 Preprocessing (Preprocessor), A3 Topic Generation (Go-Top-Generator),
A4 Polarity Scanning (Go-Op-Selector), A5 Summarizing (Go-Summarizer). Labels on the diagram:
Newspapers, Web Portals, Blogs, Raw Data, Cleaned up Data, List of Aspects, Input Search Text,
Articles related to Search Text, Polarity Thesaurus/Dictionary, Opinion-Laden Statements,
Polarity Rate, Summary.]
Figure A-1: Google Opinions System Diagram
• Web Crawling: In order to perform an opinion-based search, Google Opinions will
  require access to opinion-laden data. The Go-Op-Crawler [7], [8] component will crawl
  the web for articles related to the search words given by the user. This data will be
  stored on servers and will be updated on a daily basis. This data will be grouped
  according to the time the data was published on the web.
• Preprocessing: The data collected by the Go-Op-Crawler will be in HTML or XML
  format. The algorithms used by the Google Opinions system require plain text along with
  statistics on the occurrences of words within the input text [7], [8]. The Pre-Processor
  component will take in raw data in HTML/XML form and process it to give plain text
  along with statistics on the occurrences of the words in the text.
• Topic Generation: The Go-Top-Generator will find the main topics in a given text. The
  text will then be tagged with the topics generated by the Go-Top-Generator. The
  Go-Op-Selector component will be given as inputs those texts with topics related to the
  search words.
• Polarity Scanning: The Google Opinions system is interested only in retrieving opinions
  on the search words entered by the user. The Go-Op-Selector searches for sentences
  that contain opinions [7], [8]. This is done by looking for words (adjectives, adverbs or
  verbs) that have a positive or negative connotation. Examples of such words would be
  “delicious” and “awful”.
• Summarizing: The Go-Summarizer component will select representative sentences
  from the set of sentences containing opinions [7], [8]. These sentences, along with a link
  to the document, will be shown to the user by Google Opinions.
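The preprocessing, polarity scanning and summarizing steps above can be illustrated with a toy sketch. It is not the Go-Compare algorithm or any actual Google component: the four-word lexicon, the function names, and the rule of ranking sentences by absolute score are all assumptions chosen to keep the example small, and topic generation is replaced by a simple check that the search word appears in the sentence.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Toy polarity lexicon; the words and weights are illustrative assumptions.
POLARITY: Dict[str, int] = {"delicious": 1, "great": 1, "awful": -1, "slow": -1}


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page and drops the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def preprocess(html: str) -> Tuple[str, Counter]:
    """Preprocessing: HTML in, plain text plus word-occurrence counts out."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # normalize whitespace
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return text, counts


def select_opinions(text: str, search_word: str) -> List[Tuple[str, int]]:
    """Polarity scanning: keep sentences that mention the search word and
    contain at least one lexicon word; the score is the sum of the weights."""
    scored = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        hits = [POLARITY[w] for w in words if w in POLARITY]
        if search_word.lower() in words and hits:
            scored.append((sentence, sum(hits)))
    return scored


def summarize(scored: List[Tuple[str, int]], limit: int = 2) -> List[Tuple[str, int]]:
    """Summarizing: return the sentences with the strongest polarity."""
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)[:limit]


page = ("<html><body><p>The pizza was delicious. The service was awful and slow.</p>"
        "<p>We ordered pizza twice.</p></body></html>")
text, counts = preprocess(page)
print(counts["pizza"])
print(summarize(select_opinions(text, "pizza")))
print(summarize(select_opinions(text, "service")))
```

A production system would need a far larger lexicon, handling of negation ("not delicious"), and a real topic model, none of which this sketch attempts.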
Google Opinions Project Timeline
The figure below gives the timeline on which each task will be completed. The Google Opinions
project is expected to start on June 1, 2009 and end on May 31, 2010.
ID | Task Name | Start | Finish | Duration
1 | User Interface | 6/1/2009 | 10/30/2009 | 110d
2 | Go-Op-Crawler | 6/1/2009 | 11/30/2009 | 131d
3 | Preprocessor | 6/1/2009 | 10/30/2009 | 110d
4 | Go-Top-Generator | 6/1/2009 | 11/30/2009 | 131d
5 | Go-Op-Selector | 6/1/2009 | 11/30/2009 | 131d
6 | Go-Summarizer | 6/1/2009 | 11/30/2009 | 131d
7 | Integration | 12/1/2009 | 12/31/2009 | 23d
8 | Integration Testing | 1/1/2010 | 4/1/2010 | 65d
9 | Quality Testing | 4/2/2010 | 6/1/2010 | 43d
[Gantt chart: task bars plotted by month, June 2009 through May 2010.]
Figure A-2: Google Opinions Project Time Line
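The durations in the timeline appear to be working-day counts: counting Monday to Friday from the start date to the finish date, both inclusive and ignoring public holidays, reproduces each figure. That reading is an inference from the numbers rather than something the proposal states, and the helper name working_days is made up for this sketch.

```python
from datetime import date, timedelta


def working_days(start: date, finish: date) -> int:
    """Count Monday-Friday days from start to finish, both inclusive.
    Public holidays are deliberately ignored (an assumption)."""
    days = 0
    current = start
    while current <= finish:
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            days += 1
        current += timedelta(days=1)
    return days


# Start and finish dates as listed in the project timeline.
tasks = {
    "User Interface": (date(2009, 6, 1), date(2009, 10, 30)),
    "Go-Op-Crawler": (date(2009, 6, 1), date(2009, 11, 30)),
    "Integration": (date(2009, 12, 1), date(2009, 12, 31)),
    "Integration Testing": (date(2010, 1, 1), date(2010, 4, 1)),
    "Quality Testing": (date(2010, 4, 2), date(2010, 6, 1)),
}

for name, (start, finish) in tasks.items():
    print(f"{name}: {working_days(start, finish)}d")
```

Under these assumptions the five tasks come out at 110, 131, 23, 65 and 43 working days, matching the Duration column.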
Google Opinions Main Page
Figure A-3 gives the proposed user interface for the Google Opinions project [25].
Figure A-3: Google Opinions User Interface
Appendix B. Costs and Revenue
Costs
The Google Opinions project will cost a total of $10,909,357. These costs are divided as follows:
• Personnel costs - $3,000,000 [26]
    o The Google Think team consists of 30 members who will all be working on the
      Google Opinions project. This consists of 16 software engineers with 1-2 years of
      experience, 10 software engineers with 4-6 years of experience, 1 project
      manager, 1 lead researcher, 1 testing manager and 1 lead architect. These
      personnel will take part in all the phases of the project, which include
      requirements collection, development and testing.
• Hardware costs - $7,000,000
    o The hardware requirements for the Google Opinions project consist of a total of
      2000 Intel Xeon Processors [27]. The project will require 1220 computing
      processors, 30 workstations and 750 data servers. This will be written off over a
      5-year period in line with best accounting practices.
• Software costs - $909,357
    o The table below gives a breakdown of the software costs of the Google Opinions
      project:
Software | Number of Licenses/Copies | Total Cost in US Dollars
Microsoft Office Ultimate 2007 [28] | 30 | 15,000
Microsoft Office Project Ultimate 2007 [29] | 5 | 8,000
Microsoft Vista Business Operating System [30] | 100 | 150,000
Oracle Data Mining Database [31] | 5 | 115,000
OriginPro 8 Data Analysis and Graphing Software [32] | 10 | 18,750
Red Hat Enterprise Linux Advanced Platform [33] | 700 | 105,000
Rational Rose [34] – [37] | 10 | 50,000
Rational PurifyPlus [38] | 10 | 25,000
X-Manager Enterprise 3 [39] | 30 | 10,857
Table 3: Software Cost
    o We will require Python, Java, Perl, C, C++, Postgres and MySQL software to be
      installed on all servers and workstations. This software is available free of cost.
    o The following software has been developed by other groups in Google and will
      be enhanced to be used in the Google Opinions project:
        • Go-Op-Crawler
        • Pre-processing software
        • Go-Summarizer
Additionally, a team of 10 software engineers will be required to maintain the Google Opinions
product. We therefore envision a cost of $100,000 per annum in maintenance costs.
Figure B-1: Cost Profile shows the proportion of the costs:
Figure B-1: Cost Profile
The phase-wise cost is given in Table 4 and Figure B-2.
| Phase | Cost in US Dollars |
| --- | --- |
| Pre-Requirements | 909,113 |
| Requirements | 909,113 |
| Design | 909,113 |
| Development | 1,818,227 |
| Unit Testing | 909,113 |
| Integration | 909,113 |
| System Integration | 909,113 |
| System Integration Testing | 1,818,226 |
| Product Quality Testing | 1,818,226 |
Table 4: Phase-wise Cost
Figure B-2: Phase-wise Cost
The cost per module is given in the table below:
| Module | Cost in US Dollars |
| --- | --- |
| Go-Op-Crawler | 1,090,936 |
| Pre-processor | 1,090,936 |
| Go-Top-Generator | 2,181,871 |
| Go-Op-Selector | 2,181,871 |
| Go-Summarizer | 2,181,871 |
| Thought Stats | 1,090,936 |
| User Interface | 1,090,936 |
Table 5: Cost per Module
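As a quick consistency check, the category, phase and module figures quoted in this appendix can be totalled. The sketch below is illustrative only: the figures are copied from the text and from Tables 4 and 5, while the variable names and the `percentage_shares` helper are our own assumptions and not part of any project tooling.

```python
# Cost figures in US dollars, copied from this appendix.
CATEGORY_COSTS = {"Personnel": 3_000_000, "Hardware": 7_000_000, "Software": 909_357}

PHASE_COSTS = {
    "Pre-Requirements": 909_113,
    "Requirements": 909_113,
    "Design": 909_113,
    "Development": 1_818_227,
    "Unit Testing": 909_113,
    "Integration": 909_113,
    "System Integration": 909_113,
    "System Integration Testing": 1_818_226,
    "Product Quality Testing": 1_818_226,
}

MODULE_COSTS = {
    "Go-Op-Crawler": 1_090_936,
    "Pre-processor": 1_090_936,
    "Go-Top-Generator": 2_181_871,
    "Go-Op-Selector": 2_181_871,
    "Go-Summarizer": 2_181_871,
    "Thought Stats": 1_090_936,
    "User Interface": 1_090_936,
}


def percentage_shares(costs):
    """Return each item's share of the total in percent, to one decimal place."""
    total = sum(costs.values())
    return {name: round(100 * cost / total, 1) for name, cost in costs.items()}


# Total project cost implied by the three cost categories.
TOTAL_COST = sum(CATEGORY_COSTS.values())

if __name__ == "__main__":
    print(f"Total project cost: ${TOTAL_COST:,}")
    print("Phase costs match total:", sum(PHASE_COSTS.values()) == TOTAL_COST)
    print("Module costs match total:", sum(MODULE_COSTS.values()) == TOTAL_COST)
    print(percentage_shares(CATEGORY_COSTS))
```

The phase-wise and module-wise breakdowns are two views of the same total, so both should sum to the combined personnel, hardware and software cost.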
The chart below shows the division of the costs per module:
Figure B-3: Cost per Module
Revenue
The majority of Google’s revenue comes from advertising on Google’s website and its partners’
websites. We surveyed 20 people to see if they would like to use Google Opinions. All
responded that they would use Google Opinions for product reviews, movie reviews, service
reviews, school projects and performing background checks. This survey suggests that Google
Opinions would be well received in the market.
Our advertising revenues from posting advertisements on Google Opinions would rise, since
stores, service providers, product manufacturers and education providers would want their
product, store or service listed when a user searches for an opinion on them. We therefore
predict an increase in the revenues from Google AdWords and AdSense as a result of Google
Opinions, as set out below [32]:
AdWords –
1,000 new opinion-themed words for auction at an average cost-per-click of $0.38 and 10,000
visits per word per annum translate to $3,800,000 in Year 1 (1,000 * 0.38 * 10,000).
All things being equal, we expect a growth rate of 5% in traffic year-on-year thereby yielding the
following figures for subsequent years:
Year 2: ($3,800,000 * 1.05) = $3,990,000
Year 3: ($3,990,000 * 1.05) = $4,189,500
Year 4: ($4,189,500 * 1.05) = $4,398,975
Year 5: ($4,398,975 * 1.05) = $4,618,924
AdSense –
The domino-effect of opinion-themed search will see partner websites receiving increased traffic
due to clients curious about the advisability of a particular product or service relative to close
substitutes. We estimate an increase in net revenue of $300,000 [33] from this medium in Year
1 with year-on-year increases of 10% as the web-space opens up, and more and more people
have access to the Internet. The table below shows projected increase in net revenue from
AdSense over a 5-year period:
Year 1: $300,000
Year 2: ($300,000 * 1.10) = $330,000
Year 3: ($330,000 * 1.10) = $363,000
Year 4: ($363,000 * 1.10) = $399,300
Year 5: ($399,300 * 1.10) = $439,230
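The AdWords and AdSense projections above both apply a fixed year-on-year growth rate to a Year 1 figure. A minimal sketch of that arithmetic follows; the function name `project_revenue` and the use of Python's `decimal` module are our own choices for illustration, and the inputs are the estimates stated in this section.

```python
from decimal import Decimal, ROUND_HALF_UP


def project_revenue(first_year, growth_rate, years=5):
    """Compound a Year 1 figure at a fixed annual growth rate.

    Values are rounded to whole dollars only when reported.
    """
    revenue = Decimal(str(first_year))
    factor = Decimal("1") + Decimal(str(growth_rate))
    projection = []
    for _ in range(years):
        projection.append(int(revenue.quantize(Decimal("1"), rounding=ROUND_HALF_UP)))
        revenue *= factor
    return projection


# AdWords Year 1: 1,000 words * $0.38 per click * 10,000 visits.
ADWORDS_YEAR_1 = 1000 * Decimal("0.38") * 10000
ADWORDS = project_revenue(ADWORDS_YEAR_1, "0.05")

# AdSense Year 1 estimate of $300,000, growing 10% per year.
ADSENSE = project_revenue("300000", "0.10")

if __name__ == "__main__":
    for year, (adwords, adsense) in enumerate(zip(ADWORDS, ADSENSE), start=1):
        print(f"Year {year}: AdWords ${adwords:,}  AdSense ${adsense:,}")
```

Changing the growth rates or the Year 1 inputs shows how sensitive the five-year figures are to these assumptions.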
Thought Stats –
A survey on Thought Stats will be conducted while it runs as a free pilot in the first and
second years, to enable organizations to gauge its effect on their bottom line. Survey
responses will be used to revamp the service before commercialization begins 2 years after
deployment. With Fortune 100 companies all paying for license keys at a rate of $5,000 per
annum, we expect a substantial increase in revenue from this channel, as the fee compares
favorably with the amount paid for such professional services when individually procured.
Google Opinions can be offered as an extension to Google Site Search at no extra cost.
Websites such as product websites, newspaper sites and review sites would be interested in
having an opinion-based search capability, since most of their users would be interested in
opinions. Based on the market survey we conducted, we believe our estimates are conservative
and that substantial positive differences could be expected.
This translates to the chart below showing the break-even point towards the end of the second
year.
Figure B-4: Cost Benefit Analysis
Appendix C. Google Opinions Glossary
Term: Go-Compare
Definition: Go-Compare is an algorithm in the project that will aid consumers in
decision-making by offering an analysis of all products of interest, and providing
alternatives. No longer will decisions be made with inadequate information.
Word Origin: Go-Compare originates from the words “Go” for “Google” and “compare”. The
verb “compare” means to relate two or more items and draw out the key differences and
similarities between them.
Word History: Comparing is a key issue in decision-making. To everything, there are
alternatives. Man is a rational being; given a set of competing options, he always chooses what
suits his interests at any given time. Those choices are an expression of his likes and dislikes.
Selecting from two or more alternatives requires that adequate information about the
alternatives and their consequences be made available. Go-Compare strives to make this
information available to aid decision-making.
Negation: Go-Compare is not a technology or application but an algorithm that feeds on the
divergent opinions existing on the web for a particular product or service, and close substitutes.
It assumes consumers are rational. Go-Compare is not an information store. Different products
of the same type may be measured using different indices. Go-Compare attempts to synthesize
these indices and offer a total picture that will enable consumers to make the right choice.
Division into Parts: Go-Compare will have to relate like with like. Product genre will be mined,
and listed prices of the products will also be compared. To make this possible, a geno-crawler
will consider the characteristics and attributes of products in order to categorize them
appropriately according to strict terms. Then a comparator will generate the products’
features and i-Util ratings. If no rating is available, it generates one.
Term: Go-Op-Crawler
Definition: This term denotes a web crawler specifically applied in a Thoughts Discovery
process: a computer program that browses relevant and specific Internet sources
in a methodical and automated manner. These sources provide a window to access opinions
about general or particular topics [1][40].
Word Origin: The word Go-Op-Crawler originates from the words “Go” for “Google”, “Op” for
“opinions” and “crawler”.
Division into Parts: The Go-Op-Crawler consists of two principal parts: the web crawler itself,
which Google already uses to browse the Internet and which is part of other Google
products such as Google Scholar, Google Images and Google Code; and a classifier, which
categorizes Internet sources by relevance, type of site and topics of interest. In order to
categorize the Internet sources by relevance, this classifier uses the Google PageRank
technology and Hypertext-Matching Analysis [41], [42].
Similarities and Differences: Like other web crawlers, a Go-Op-Crawler browses web
pages by following links. It has a parallel multi-threaded architecture, which facilitates the
processing of huge volumes of data [41]. A Go-Op-Crawler differs from other web crawlers
because it selectively follows only links that have been categorized by relevance, type of
site and topics of interest using PageRank and Hypertext-Matching Analysis.
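The selective behaviour described for the Go-Op-Crawler can be sketched as a breadth-first crawl that only enqueues links accepted by a classifier. This is a simplified, single-threaded illustration under stated assumptions: `selective_crawl`, the toy link graph, the `.example` URLs and the set-based relevance check are all hypothetical stand-ins, and the real classifier (PageRank and Hypertext-Matching Analysis) is not modelled here.

```python
from collections import deque


def selective_crawl(seed, get_links, is_relevant, max_pages=100):
    """Breadth-first crawl that follows only links the classifier accepts."""
    visited = []
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            # Skip links already queued and links the classifier rejects.
            if link not in seen and is_relevant(link):
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory link graph standing in for the web.
TOY_LINKS = {
    "blog.example/a": ["reviews.example/b", "shop.example/c"],
    "reviews.example/b": ["blog.example/a", "reviews.example/d"],
    "shop.example/c": ["reviews.example/e"],
    "reviews.example/d": [],
    "reviews.example/e": [],
}

# Hypothetical classifier output: pages judged to carry opinions.
RELEVANT = {"blog.example/a", "reviews.example/b", "reviews.example/d", "reviews.example/e"}

CRAWLED = selective_crawl(
    "blog.example/a",
    get_links=lambda url: TOY_LINKS.get(url, []),
    is_relevant=lambda url: url in RELEVANT,
)
```

In this toy graph the page `reviews.example/e` is never visited, because the only path to it runs through a page the classifier rejected. That trade-off between focus and coverage is inherent in a selective crawler.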
Term: Go-Op-Selector
Definition: Go-Op-Selector is the opinion selector component in the Google Opinions project. It
is responsible for selecting the opinion-laden statements on the topics generated by the
Go-Top-Generator component.
Word Origin: Go-Op-Selector originates from the words “Go” for “Google”, “Op” for “opinion”
and “selector”.
Division into Parts: The Go-Op-Selector consists of the input, output and processor modules
[42], [43]. The input module takes as input the topics generated by the Go-Top-Generator
component and the text from which opinions are to be extracted. The processor module
finds sentences that contain either the topic or synonyms of the topic. It looks for sentences that
have adjectives or adverbs within the vicinity of the topic word (or its synonym). These words
generally denote polarity (a negative or positive connotation). The output module passes the
selected opinion-laden statements to the Go-Summarizer.
Analogy: The Go-Op-Selector is like a search engine that searches for a topic in a collection of
opinion-laden text [43].
Examples: If a user types “George Bush” into the Google Opinions search bar, the
Go-Top-Generator would generate “George Bush” as a topic. The statement “I hate George Bush”
would be selected by the Go-Op-Selector because it contains “hate”, an opinion-laden word, in
the vicinity of “George Bush”.
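The selection step can be sketched as a proximity check between the topic and opinion-laden words. The sketch below is an assumption-laden illustration, not the component's actual design: it replaces part-of-speech tagging with a tiny hypothetical lexicon (`OPINION_WORDS`), ignores synonyms, and uses an arbitrary window of three tokens on either side of the topic.

```python
import re

# Hypothetical mini-lexicon of opinion-laden words. A fuller system would use a
# part-of-speech tagger and a much larger polarity lexicon.
OPINION_WORDS = {"hate", "hates", "love", "loves", "great", "terrible", "awful", "excellent"}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def has_opinion_near_topic(tokens, topic_tokens, window):
    """True if an opinion word occurs within `window` tokens of the topic phrase."""
    if not topic_tokens:
        return False
    n = len(topic_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == topic_tokens:
            before = tokens[max(0, start - window):start]
            after = tokens[start + n:start + n + window]
            if OPINION_WORDS.intersection(before + after):
                return True
    return False


def select_opinion_sentences(sentences, topics, window=3):
    """Keep the sentences that mention a topic with an opinion word nearby."""
    topic_token_lists = [tokenize(topic) for topic in topics]
    return [
        sentence
        for sentence in sentences
        if any(has_opinion_near_topic(tokenize(sentence), topic, window)
               for topic in topic_token_lists)
    ]


SAMPLE = [
    "I hate George Bush",
    "George Bush visited Texas yesterday",
    "The weather is great today",
]
SELECTED = select_opinion_sentences(SAMPLE, ["George Bush"])
```

Only the first sample sentence is kept: the second mentions the topic without an opinion word, and the third contains an opinion word but not the topic.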
Term: Go-Summarizer
Definition: The Go-Summarizer is the summarizer component in the Google Opinions project. It
is responsible for generating a summary of the opinions contained in the statements selected by
the Go-Op-Selector component.
Word Origin: Go-Summarizer originates from the words “Go” for “Google” and “summarizer”.
Division into Parts: The Go-Summarizer [44] consists of an input, output and processor
component. The input component takes as input the opinion-laden sentences selected by the
Go-Op-Selector. It passes these sentences to the processor component. The processor
component is responsible for selecting sentences that accurately represent all the opinions in
the input sentences. This component ensures that no two sentences that express the same
opinion are selected. The output component displays the sentences selected in the summary.
Analogy: The Go-Summarizer component is similar to the automatic summarizers available in
the market [45]. The Go-Summarizer component is an extractive summary generator (it
generates a summary by selecting representative sentences) and not an abstractive one
(which would rewrite the sentences).
Examples: If the Go-Op-Selector gives the statements “I hate George Bush”, “I think George
Bush has done great work”, “Everyone hates George Bush” as input to the Go-Summarizer, the
Go-Summarizer will select “Everyone hates George Bush” and “George Bush has done great
work” as representative summary sentences.
Term: Go-Top-Generator
Definition: The Go-Top-Generator is the topic generator component in the Google Opinions
project. It is responsible for selecting the main topics talked about in an article.
Word Origin: Go-Top-Generator originates from the words “Go” for “Google”, “Top” for “topic”
and “generator”.
Division into Parts: The Go-Top-Generator consists of a Syntactic Analysis component,
Statistical Analyzer component and a Domain component [43]. The Syntactic Analysis
component looks for syntactic patterns in the input text. These patterns are commonly found in
sentences that talk about a topic. The Statistical Analyzer component will calculate the number
of occurrences of words in the input text. Words that are not stop words (commonly occurring
words such as prepositions) and occur frequently are likely to be topics. The Domain
component is responsible for looking for different topics that are specific to the domain of the
text. For example, a text related to restaurant reviews would talk about customer service.
Examples: Given a sentence “I love HP laptops”, the Go-Top-Generator will select “HP laptops”
as the topic.
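The Statistical Analyzer step described above, counting frequent words that are not stop words, can be sketched as follows. This covers only that one component, and the stop-word list, the function name `candidate_topics` and the sample text are illustrative assumptions; the Syntactic Analysis and Domain components are not modelled.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "to"}


def candidate_topics(text, top_n=3):
    """Return the most frequent non-stop-words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


REVIEW = (
    "The laptop is fast. I love the laptop and the battery of the laptop lasts. "
    "The battery charges quickly."
)
TOPICS = candidate_topics(REVIEW, top_n=2)
```

On its own, frequency counting would not recover a multi-word topic such as “HP laptops” from a single sentence, which is why the description pairs it with syntactic patterns and domain knowledge.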
Term: Google Opinions
Definition: Google Opinions is an extension of the Google Search Engine [25] which searches
for opinions on a particular subject.
Word Origin: Originates from the company name “Google” and the word “opinions”, following
other Google categories such as “Google Images”, “Google News”, and “Google Maps”.
Division into Parts: Google Opinions consists of the following parts: Go-Op-Crawler (web
crawler), Go-Summarizer (summarization component), a search engine, and a statistics section.
The search engine works similarly to Google’s other search categories in that it reports sites
from the web related to the opinion typed into the search bar, as well as providing options to
limit or enhance one’s search. The statistics section is explained in the Thought Stats definition.
Analogy: Google Opinions can be thought of as a more technical variation of the Google
Search available on the Google Main Page, with the web sites limited to opinions instead of
pure key-word findings.
Similarities and Differences: The similarities between a standard Google search and a Google
Opinions search include a report of web pages containing either positive or negative opinions
on the search criteria and a page allowing refinement of the search options. Differences include
a marked rating next to each web site indicating whether its opinions are positive or negative,
and an extra section for statistics.
Examples: A user who types “George Bush” into the Google Opinions search bar will have
access to articles that talk about George Bush in a positive or negative manner.
Term: i-Util Meter
Definition: An i-Util meter is a computer program that collates and measures the satisfaction
users or consumers derive from a product. It uses the ratings awarded to the product to arrive at
a weighted average.
Word Origin: i-Util Meter originates from the words “i” for “internet”, “Util” for “utility” and “meter”.
Word History: Measuring satisfaction across a broad spectrum has always been a challenge.
People have different needs and expectations from a given item. Satisfaction has therefore
become a social question, so we look to the social sciences for answers. A util is a hypothetical
unit of measurement of utility that economists commonly use to present hypothetical
information about utility and consumer demand theory. The util was developed as a convenient
way to illustrate and discuss concepts such as total utility, marginal utility, and the law of
diminishing marginal utility. However, because utility is not a measurable characteristic, the
util does not represent an actual unit of measurement, such as inches or pounds.
Division into parts: An i-Util Meter is made up of two principal parts: the i-Util itself that is a
numerical representation of derived benefits according to an individual user, and the meter that
aggregates scores and generates an average.
Similarities and Differences: The i-Util Meter is a virtual measurement of the perceived
satisfaction that end-users derive from a product. It is aggregated among multiple users and
makes allowances for special events such as promos that can suddenly spike consumer
appreciation. It is similar to a utility meter [46] that measures consumption levels of a utility. The
difference stems from their levels of alignment with the customer. The i-Util Meter measures
satisfaction with a product or service, whereas utility meters measure consumption patterns
and trends with particular emphasis on charging a fee on consumption.
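The aggregation performed by the meter can be illustrated with a weighted average. The sketch is only a guess at the shape of the calculation: the text does not say how weights are chosen, so the weights below (for example, the number of reviewers behind each rating) and the name `i_util_score` are hypothetical.

```python
def i_util_score(weighted_ratings):
    """Weighted average of (rating, weight) pairs."""
    pairs = list(weighted_ratings)
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        raise ValueError("at least one rating with a positive weight is required")
    return sum(rating * weight for rating, weight in pairs) / total_weight


# Hypothetical example: a 4.0 rating backed by 10 reviewers and a 2.0 rating
# backed by 30 reviewers.
SCORE = i_util_score([(4.0, 10), (2.0, 30)])
```

Here the result is 2.5 rather than the plain mean of 3.0, because the lower rating carries three times the weight. Allowances for special events such as promos could be modelled by lowering the weights of ratings collected during those periods.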
Term: Thought Stats
Definition: Thought Stats is a component of the Google Opinions project which collects
statistics on opinions on a topic and displays the same as tables, graphs and charts.
Word Origin: Originates from the group name “Google Think” and the abbreviation “stats” for
“statistics”.
Division into Parts: Thought Stats consists of multiple portions for gathering either
opinion-specific or comparative-opinion statistics on various products, ideas, views, positions,
and any other physical or abstract article for which an opinion can be formed. Both numerical
and graphical content can be obtained for summarizing a wide variety of opinions.
Analogy: Thought Stats is similar to Google Analytics [47], with the data being the number of
opinions instead of the number of website hits.
Examples:

A Pie Chart showing the division of positive and negative opinions about a product,

A Bar Graph displaying the comparative percentage of positive and negative opinions
about 5 different objects,

A Line Graph tracing the percentage of positive or negative opinions towards a particular view
over a period of time,

A Table showing opinion statistics about a set of products, such as number of opinions,
percentage of opinions, increase or decrease in number and quality of opinions, rate of
increase of number and quality of opinions, and product information referenced as
influencing these opinions.
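The figures behind a chart such as the pie chart in the first example can be computed by tallying opinion labels. The sketch below assumes each opinion has already been labelled “positive” or “negative” by an earlier component; the function name `opinion_breakdown` and the sample labels are hypothetical.

```python
from collections import Counter


def opinion_breakdown(labels):
    """Return the percentage of opinions per label, to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * count / total, 1) for label, count in counts.items()}


# Hypothetical labelled opinions about one product.
BREAKDOWN = opinion_breakdown(["positive", "positive", "negative", "positive"])
```

The resulting percentages can be passed to any charting tool to draw the pie chart or bar graph, and repeating the tally per time period gives the points for a line graph.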
Term: Thoughts Discovery
Definition: Thoughts Discovery is the process of automatically searching large volumes of data
for patterns that can be considered opinions or particular thoughts about general or specific
topics.
Word Origin: Thoughts Discovery originates from the specialization of a branch of Data Mining
called Knowledge Discovery [48], [49].
Division into Parts: Thoughts Discovery is based on the following computer science
disciplines: Artificial Intelligence, Data Mining, Statistics, Machine Learning, and Pattern
Matching and Recognition. The application of the first three, as a set, has been the focus of
all data mining investigations. Now, with state-of-the-art advances in Pattern Matching and
Recognition applied to the existing data mining techniques, the new Thoughts Discovery
investigation area has emerged.
Negation: Thoughts Discovery is not a technology but an amalgamation of disciplines. It cannot
be applied as a physical or logical tool, but can be applied as a methodological paradigm to
guide the development of tools.
Appendix D. Financial Statements
Google Inc.: Income Statement
The table below gives the income statement for the previous five quarters [16].
| In Millions of USD (except for per share items) | 3 months ending 2009-03-31 | 3 months ending 2008-12-31 | 3 months ending 2008-09-30 | 3 months ending 2008-06-30 | 3 months ending 2008-03-31 |
| --- | --- | --- | --- | --- | --- |
| Revenue | 5,508.99 | 5,700.90 | 5,541.39 | 5,367.21 | 5,186.04 |
| Other Revenue, Total | - | - | - | - | - |
| Total Revenue | 5,508.99 | 5,700.90 | 5,541.39 | 5,367.21 | 5,186.04 |
| Cost of Revenue, Total | 2,101.50 | 2,190.01 | 2,173.39 | 2,147.57 | 2,110.54 |
| Gross Profit | 3,407.49 | 3,510.90 | 3,368.00 | 3,219.64 | 3,075.51 |
| Selling/General/Admin. Expenses, Total | 882.25 | 917.35 | 1,015.87 | 959.46 | 856.20 |
| Research & Development | 641.64 | 733.34 | 704.57 | 682.21 | 673.07 |
| Depreciation/Amortization | - | - | - | - | - |
| Interest Expense (Income) - Net Operating | - | - | - | - | - |
| Unusual Expense (Income) | - | 1,094.76 | - | - | - |
| Other Operating Expenses, Total | - | - | - | - | - |
| Total Operating Expense | 3,625.40 | 4,935.46 | 3,893.83 | 3,789.25 | 3,639.81 |
| Operating Income | 1,883.59 | 765.45 | 1,647.57 | 1,577.96 | 1,546.23 |
| Interest Income (Expense), Net Non-Operating | 6.21 | - | - | - | - |
| Gain (Loss) on Sale of Assets | - | - | - | - | - |
| Other, Net | - | 1.41 | 0.75 | 5.32 | -2.96 |
| Income Before Tax | 1,889.80 | 835.35 | 1,668.78 | 1,635.89 | 1,713.58 |
| Income After Tax | 1,422.83 | 382.44 | 1,289.94 | 1,247.39 | 1,307.09 |
| Minority Interest | - | - | - | - | - |
| Equity In Affiliates | - | - | - | - | - |
| Net Income Before Extra. Items | 1,422.83 | 382.44 | 1,289.94 | 1,247.39 | 1,307.09 |
| Accounting Change | - | - | - | - | - |
| Discontinued Operations | - | - | - | - | - |
| Extraordinary Item | - | - | - | - | - |
| Net Income | 1,422.83 | 382.44 | 1,289.94 | 1,247.39 | 1,307.09 |
| Preferred Dividends | - | - | - | - | - |
| Income Available to Common Excl. Extra Items | 1,422.83 | 382.44 | 1,289.94 | 1,247.39 | 1,307.09 |
| Income Available to Common Incl. Extra Items | 1,422.83 | 382.44 | 1,289.94 | 1,247.39 | 1,307.09 |
| Basic Weighted Average Shares | - | - | - | - | - |
| Basic EPS Excluding Extraordinary Items | - | - | - | - | - |
| Basic EPS Including Extraordinary Items | - | - | - | - | - |
| Dilution Adjustment | - | 0.00 | 0.00 | 0.00 | 0.00 |
| Diluted Weighted Average Shares | 317.22 | 316.86 | 317.78 | 318.02 | 317.39 |
| Diluted EPS Excluding Extraordinary Items | 4.49 | 1.21 | 4.06 | 3.92 | 4.12 |
| Diluted EPS Including Extraordinary Items | - | - | - | - | - |
| Dividends per Share - Common Stock Primary Issue | - | 0.00 | 0.00 | 0.00 | 0.00 |
| Gross Dividends - Common Stock | - | - | - | - | - |
| Net Income after Stock Based Comp. Expense | - | - | - | - | - |
| Basic EPS after Stock Based Comp. Expense | - | - | - | - | - |
| Diluted EPS after Stock Based Comp. Expense | - | - | - | - | - |
| Depreciation, Supplemental | - | - | - | - | - |
| Total Special Items | - | - | - | - | - |
| Normalized Income Before Taxes | - | - | - | - | - |
| Effect of Special Items on Income Taxes | - | - | - | - | - |
| Income Taxes Ex. Impact of Special Items | - | - | - | - | - |
| Normalized Income After Taxes | - | - | - | - | - |
| Normalized Income Avail to Common | - | - | - | - | - |
| Basic Normalized EPS | - | - | - | - | - |
| Diluted Normalized EPS | 4.49 | 2.79 | 4.06 | 3.92 | 7.31 |
Table 6: Income Statement
Google Inc.: Balance Sheet
The table below gives the balance sheet for the previous five quarters [16].
| In Millions of USD (except for per share items) | As of 2009-03-31 | As of 2008-12-31 | As of 2008-09-30 | As of 2008-06-30 | As of 2008-03-31 |
| --- | --- | --- | --- | --- | --- |
| Cash & Equivalents | 10,426.29 | 8,656.67 | 8,370.47 | 7,363.54 | 6,519.75 |
| Short Term Investments | 7,358.64 | 7,189.10 | 6,042.14 | 5,370.13 | 5,614.76 |
| Cash and Short Term Investments | 17,784.93 | 15,845.77 | 14,412.61 | 12,733.67 | 12,134.51 |
| Accounts Receivable - Trade, Net | 2,543.11 | 2,642.19 | 2,541.49 | 2,641.90 | 2,560.91 |
| Receivables - Other | - | - | - | - | - |
| Total Receivables, Net | 2,543.11 | 2,642.19 | 2,541.49 | 2,641.90 | 2,560.91 |
| Total Inventory | - | - | - | - | - |
| Prepaid Expenses | 1,317.86 | 1,404.11 | 897.35 | 846.87 | 697.79 |
| Other Current Assets, Total | 434.90 | 286.11 | 111.40 | 94.40 | 71.72 |
| Total Current Assets | 22,080.80 | 20,178.18 | 17,962.85 | 16,316.84 | 15,464.93 |
| Property/Plant/Equipment, Total - Gross | - | 7,576.34 | 7,325.79 | 7,013.00 | 6,430.85 |
| Goodwill, Net | 4,830.31 | 4,839.85 | 4,821.65 | 4,853.81 | 4,791.40 |
| Intangibles, Net | 910.34 | 996.69 | 1,047.72 | 1,138.99 | 1,203.97 |
| Long Term Investments | 101.00 | 85.16 | 1,100.90 | 1,067.52 | 1,056.97 |
| Other Long Term Assets, Total | 468.46 | 433.85 | 660.70 | 664.92 | 345.99 |
| Total Assets | 33,513.03 | 31,767.58 | 30,806.97 | 29,179.79 | 27,604.98 |
| Accounts Payable | 196.22 | 178.00 | 240.71 | 439.28 | 358.12 |
| Accrued Expenses | 1,452.90 | 1,824.45 | 1,683.23 | 1,565.69 | 1,730.86 |
| Notes Payable/Short Term Debt | - | 0.00 | 0.00 | 0.00 | 0.00 |
| Current Port. of LT Debt/Capital Leases | - | - | - | - | - |
| Other Current Liabilities, Total | 534.74 | 299.63 | 300.34 | 340.55 | 371.50 |
| Total Current Liabilities | 2,183.85 | 2,302.09 | 2,224.28 | 2,345.51 | 2,460.47 |
| Long Term Debt | - | - | - | - | - |
| Capital Lease Obligations | - | - | - | - | - |
| Total Long Term Debt | - | 0.00 | 0.00 | 0.00 | 0.00 |
| Total Debt | - | 0.00 | 0.00 | 0.00 | 0.00 |
| Deferred Income Tax | 0.00 | 12.52 | 20.42 | 22.20 | - |
| Minority Interest | - | - | - | - | - |
| Other Liabilities, Total | 1,481.08 | 1,214.11 | 1,087.41 | 899.06 | 806.97 |
| Total Liabilities | 3,664.93 | 3,528.71 | 3,332.11 | 3,266.77 | 3,267.45 |
| Redeemable Preferred Stock, Total | - | - | - | - | - |
| Preferred Stock - Non Redeemable, Net | - | - | - | - | - |
| Common Stock, Total | 0.32 | 0.32 | 0.32 | 0.31 | 0.31 |
| Additional Paid-In Capital | 14,694.50 | 14,450.34 | 14,194.20 | 13,904.27 | 13,561.95 |
| Retained Earnings (Accumulated Deficit) | 14,984.46 | 13,561.63 | 13,179.19 | 11,889.25 | 10,641.86 |
| Treasury Stock - Common | - | - | - | - | - |
| Other Equity, Total | 168.82 | 226.58 | 101.16 | 119.18 | 133.41 |
| Total Equity | 29,848.10 | 28,238.86 | 27,474.86 | 25,913.01 | 24,337.53 |
| Total Liabilities & Shareholders' Equity | 33,513.03 | 31,767.58 | 30,806.97 | 29,179.79 | 27,604.98 |
| Shares Outs - Common Stock Primary Issue | - | - | - | - | - |
| Total Common Shares Outstanding | 315.70 | 312.92 | 314.59 | 314.25 | 313.50 |
Table 7: Balance Sheet
Google Inc.: Cash Flow Statement
The table below gives the cash flow statement for the previous five quarters [16].
| In Millions of USD (except for per share items) | 3 months ending 2009-03-31 | 12 months ending 2008-12-31 | 9 months ending 2008-09-30 | 6 months ending 2008-06-30 | 3 months ending 2008-03-31 |
| --- | --- | --- | --- | --- | --- |
| Net Income/Starting Line | 1,422.83 | 4,226.86 | 3,844.42 | 2,554.48 | 1,307.09 |
| Depreciation/Depletion | 321.13 | 1,212.24 | 898.76 | 589.28 | 280.56 |
| Amortization | 82.09 | 287.65 | 215.62 | 138.85 | 55.96 |
| Deferred Taxes | -12.85 | -224.65 | -124.60 | -105.89 | -38.21 |
| Non-Cash Items | 224.23 | 2,023.53 | 704.34 | 433.92 | 184.78 |
| Changes in Working Capital | 212.08 | 327.23 | 192.03 | -65.04 | -10.72 |
| Cash from Operating Activities | 2,249.51 | 7,852.86 | 5,730.56 | 3,545.60 | 1,779.45 |
| Capital Expenditures | -262.75 | -2,358.46 | -1,990.62 | -1,539.11 | -841.60 |
| Other Investing Cash Flow Items, Total | -156.08 | -2,960.96 | -1,511.97 | -826.51 | -565.40 |
| Cash from Investing Activities | -418.83 | -5,319.42 | -3,502.58 | -2,365.63 | -1,406.99 |
| Financing Cash Flow Items | 31.84 | 159.09 | 114.77 | 94.98 | 51.10 |
| Total Cash Dividends Paid | - | - | - | - | - |
| Issuance (Retirement) of Stock, Net | -36.74 | -71.52 | -38.25 | -22.75 | -22.45 |
| Issuance (Retirement) of Debt, Net | - | - | - | - | - |
| Cash from Financing Activities | -4.89 | 87.57 | 76.52 | 72.23 | 28.66 |
| Foreign Exchange Effects | -56.17 | -45.92 | -15.62 | 29.74 | 37.05 |
| Net Change in Cash | 1,769.62 | 2,575.08 | 2,288.88 | 1,281.94 | 438.16 |
| Cash Interest Paid, Supplemental | - | 1.56 | 1.29 | 0.95 | 0.39 |
| Cash Taxes Paid, Supplemental | - | 1,223.98 | 743.44 | 378.55 | 12.09 |
Table 8: Cash Flow Statement
Appendix E. Resumes
This appendix contains the resumes of Scott Larson, Khairun-nisa Hassanali, David Urbina and
Michael Fashola.

Scott Larson will serve as the Project Manager for the Google Opinions project.

Khairun-nisa Hassanali will serve as the Lead Researcher for the Google Opinions
project.

David Urbina will serve as the Lead Architect for the Google Opinions project.

Michael Fashola will serve as the Testing Manager and Manager of Finance and
Marketing for the Google Opinions project.
Khairun-nisa Hassanali
Home: 214-281-8888
Email: khairunnisa.hassanali@gmail.com
www.utdallas.edu/~khassanali
2400 Waterview Parkway, #418
Richardson, TX 75080
OBJECTIVE
Secure a Software Engineering position in an innovative team with a passion for quality
EDUCATION
The University of Texas – Dallas, PhD in Computer Science, anticipated 2011
G.P.A 3.888/4.0 Major: Computer Science
Research Focus: Natural Language Processing, Opinion Mining
Bangalore University - Bangalore, India, MCA, Dec 2001
Percentage 87.78/100
Major: Computer Applications
Bangalore University, - Bangalore, India, Bachelors of Science (B.Sc.), May 1998
Percentage 67.38/100
Major: Computer Science, Mathematics, Statistics
PROGRAMMING
LANGUAGES
C
C++
Java
Python
Visual Basic
SKILLS
Dynamic, self-motivated, customer-oriented, team player and a quick learner
6 years experience in developing SIP Servers and User Agent Toolkits and application
software
MEMBERSHIPS
IEEE, Student Member, January 2009 – Current
AWARDS AND
HONORS
Recipient of the Graduate Student Scholarship, University of Texas at Dallas
Team award, Flextronics Software Systems for improving the performance of SIP Server
PROFESSIONAL EXPERIENCE
Teaching Assistant
The University of Texas at Dallas
Sep. 2008 – Present
Assist professors in tutoring students and grading assignments of courses including Natural Language Processing.
Research Assistant
The University of Texas at Dallas
Sep. 2007 – May 2008
Conducted research on automatic classification of political blogs, named entity recognition, opinion mining and
detection of sarcasm in written text. Implemented prototypes using Python, Java and C on UNIX environment.
Technical Leader
Flextronics Software Systems, Bangalore, India
May 2003 – Feb. 2007
Led a team of 4 software engineers in developing SIP (Session Initiation Protocol) Server Frameworks and User
Agent Toolkits. Responsibilities included design, testing, reviewing and customer support. Received a team award
for improving the performance of the SIP Server Frameworks 5.9. Used C++ and UNIX.
Trainee Software Engineer
Icope Technologies Pvt. Ltd, Bangalore, India
Feb. 2003 – May 2003
Developed J-Theseus, a Customer Relationship Management package using Java, JSP, JDBC and MS SQL Server.
Software Developer
Lakhani General Suppliers, Mombasa, Kenya
Feb. 2002 – Nov. 2002
Single handedly developed and deployed an Inventory Control system using Visual Basic and MS SQL Server.
REFERENCES AVAILABLE UPON REQUEST
David Urbina
2400 Waterview Parkway, Apt. 524
Richardson, Texas 75080
Home: 972-233-1659
Email: david.urbina@acm.org
OBJECTIVES
Work in an innovative company thereby orienting my professional career in the Software
Architecture and Software Requirements knowledge areas.
EDUCATION
The University of Texas – Dallas Master in Computer Science, anticipated December
2010
G.P.A.: 4.0/4.0
Major: Software Engineering
Courses: Advanced Requirements Engineering, Object-Oriented Analysis and Design
Simón Bolívar University – Venezuela
Computer Engineering, 2006
Average: 4.25/5.0
Courses: Database Systems I, II, III; Operating Systems I, II, III; Information Systems I,
II, III
PROGRAMMING
LANGUAGES
J2SE
C#
TL-SQL
SOFTWARE
Visual Studio 2008, .NET Framework 3.5, Spring Framework 2.5, Eclipse, Tomcat,
WebSphere Community Edition, Oracle Database Server 10g, Microsoft SQL Server
2005, Windows XP, Linux
CERTIFICATIONS
J2SE Sun Certified Programmer
SKILLS
5 years of experience working in team environments
4 years of experience in object-oriented analysis/design/development
3 years of experience using UML modeling language
1 year of experience assisting students in computer science laboratory
LANGUAGES
Written and Oral fluency in Spanish and English
MEMBERSHIPS
ACM, student membership, 2005 - current
AWARDS AND
HONORS
Honorable Mention for the research “Security and Portals for SUMA grid”
PROFESSIONAL EXPERIENCE
Solutions Architect
DBAccess, Inc.
January 2008 – December 2008
Principal decision-maker for design in two large-scale projects. Demonstrated skills in designing
software-intensive system architectures, communicating ideas and training co-workers. Promoter and co-organizer
of the Quality Architecture Evaluation Service in this company and first Quality Architecture Evaluator of the
SCADA system of PDVSA, S.A., one of the largest oil companies in the world.
Solutions Developer
DBAccess, Inc.
January 2006 – December 2007
Worked on 3 projects for clients across the globe. Demonstrated skills in accomplishing goals on schedule and
learning new technologies and methodologies. Continually collaborated with the design team, contributing ideas to
improve the architecture design. Quickly assigned to the highly respected Software Architecture Unit of the company.
INTERESTS
Technical Reading, Working in Open Source projects
REFERENCES AVAILABLE UPON REQUEST
Michael Fashola
Home: 214-575-0104
Cell:
469-693-8687
mof081000@utdallas.edu
9637 Forest Lane
Dallas, Tx 75243
OBJECTIVE
Multi-disciplinary Software architect with an eye for value adding and impacting
positively on the bottom-line through sales strategy
EDUCATION
The University of Texas at Dallas, M.S. in Software Engineering
anticipated May 2010
Hours Completed:
9 out of 24 for degree
EdExcel, UK
B-TEC/HND Software Engineering 2008
University of Ilorin, Ilorin, NIGERIA
B.Sc in FINANCE
1999
PROGRAMMING
LANGUAGES
Java
C
C++
C#
Python
PERL
SKILLS
8 years experience working in team environments
5 years experience developing and testing enterprise applications
2 years experience in optimization and synthesis of high performance systems
LANGUAGES
English
MEMBERSHIPS
IEEE, student membership, 2009 – current
ACM, student membership, 2009 – current
Institute of Chartered Accountants of Nigeria, 1997 – current
Chartered Institute of Bankers of Nigeria, 2001 – current
PROFESSIONAL EXPERIENCE
Head, Asset & Liability TEAM OCEANIC BANK INTERNATIONAL PLC
January 2001 – present
• Asset and Liability monitoring and management
• Testing, recommendation and deployment of new software applications for financial system governance
• Strategically building the Bank's assets and liabilities in line with Central Bank regulations
Audit Officer
OFFICE OF AUDITOR GENERAL FOR LOCAL GOVERNMENT Aug 1999 – Feb. 2000
• Public Sector Accounting
• Value for money audit
Audit Officer
ARUNA BAWA & Co. (Chartered Accountants)
March 1999 – July 1999
• Consultancy services
REFERENCES AVAILABLE ON REQUEST
Scott Larson
Home: (972) 495-6171
Cell: (214) 738-7986
E-mail: s_larson323@yahoo.com
2713 Chariot Ln.
Garland, Tx 75044
Objective
To contribute to the Software Development Industry with knowledge of Programming and
Software Engineering, and increase my knowledge of corporate practice
Education
The University of Texas at Dallas M.S. – Software Engineering
Anticipated Fall 2009
GPA: 3.46 / 4.00 Software Engineering
Courses:
Requirements Engineering
Software Architecture and Design
Software Testing and Verification
Database Design
Texas Christian University
B.A. – Music Spring 2007
GPA: 3.29 / 4.00 Music Composition
Programming Languages
Java
JavaScript
C++
ASP
C
Visual Basic
SQL
HTML
Software
MS Office
MS Visio
Adobe Photoshop
SQL Server
Visual C++
Codewarrior
Studio Factory
Skills
6 years experience programming Java
5 years Organizational experience
4 years experience Problem Solving
4 years Analytical experience
3 years Communication [academic environment]
1 year Leadership [academic environment]
Languages
English (native), German (some)
Memberships
IEEE - current
Academic Projects at The University of Texas – Dallas
Title: Software Testing
Brief Description: 6-person group project, black-box testing and documentation (leader)
Semester: Spring 2009
Title: Proposal
Brief Description: 4-person group project developing a business-level project proposal (leader)
Semester: Spring 2009
Title: Architecture and Design
Brief Description: 3-person group project designing and implementing an online search engine (leader)
Semester: Fall 2008
REFERENCES AVAILABLE UPON REQUEST
References
[1] R. Kumar, J. Novak, P. Raghavan, and A. Tomkins, "On the bursty evolution of
blogspace," in WWW '03: Proceedings of the twelfth international conference on World
Wide Web. ACM Press, 2003, pp. 568-576. [Online]. Available:
http://dx.doi.org/10.1145/775152.775233
[2] K. Dave, S. Lawrence, and D. M. Pennock, "Mining the peanut gallery: opinion extraction
and semantic classification of product reviews," in WWW '03: Proceedings of the twelfth
international conference on World Wide Web. ACM Press, 2003, pp. 519-528. [Online].
Available: http://dx.doi.org/10.1145/775152.775226
[3] A. N. Langville and C. D. Meyer, Google's PageRank and Beyond: The Science of Search
Engine Rankings. Princeton University Press, July 2006.
[4] B. Williams and J. Jacobs, "Exploring the use of blogs as learning spaces in the higher
education sector," Australasian Journal of Educational Technology, vol. 20(2), pp. 232-247,
2004. [Online]. Available: http://www.jeremybwilliams.net/AJETpaper.pdf
[5] Ian A. McAllister, Christoph R. Ponath, Ling Bao, Steven J. Hanks, Microsoft Corporation.
“Extraction and summarization of information.” US 2007/0282867 A1, May 30, 2006
[6] Simon H. Corston-Oliver, Anthony Aue, Eric K. Ringger, Michael Gamon, Microsoft
Corporation. “System for processing sentiment-bearing text.” US 2006/0200342 A1,
Apr. 14, 2005
[7] I. Titov and R. McDonald, "A joint model of text and aspect ratings for sentiment
summarization," in Proceedings of ACL-08: HLT. Columbus, Ohio: Association for
Computational Linguistics, June 2008, pp. 308-316. [Online]. Available:
http://www.aclweb.org/anthology-new/P/P08/P08-1036.bib
[8] E. Spertus, M. Sahami, and O. Buyukkokten, "Evaluating similarity measures: a
large-scale study in the orkut social network," in KDD '05: Proceeding of the eleventh ACM
SIGKDD international conference on Knowledge discovery in data mining. New York,
NY, USA: ACM Press, 2005, pp. 678-684. [Online]. Available:
http://dx.doi.org/10.1145/1081870.1081956
[9] “Google AdWords: Promote Your Business with Google.” Internet:
https://www.google.com/accounts/ServiceLogin?service=adwords&cd=null&hl=enUS&ltmpl=regionala&passive=true&ifr=false&alwf=true&continue=https%3A%2F%2Fadwords.google.com%2Fselect%2Fgaiaauth%3Fapt%3DNone%26ugl%3Dtrue&sourceid=awo&subid=ww-en-et-ads-newawhptest2, [Apr. 22, 2009]
[10] “Google AdSense: Maximize revenue from your online content.”
Internet: https://www.google.com/adsense/login/en_US/?sourceid=aso&subid=na-en-habk&utm_medium=ha&utm_term=google%20adsense&gsessionid=RH1NF7ML6p-FQWsMkHEYg, [Apr. 27, 2009]
[11] Wikipedia, The Free Encyclopedia. “Integration testing” Internet:
http://en.wikipedia.org/wiki/Integration_testing, [Apr. 24, 2009]
[12] Wikipedia, The Free Encyclopedia. “System testing” Internet:
http://en.wikipedia.org/wiki/System_testing, [Apr. 24, 2009]
[13] Wikipedia, The Free Encyclopedia. “Source code” Internet:
http://en.wikipedia.org/wiki/Source_code, [Apr. 24, 2009]
[14] Wikipedia, The Free Encyclopedia. “Executable” Internet:
http://en.wikipedia.org/wiki/Executable, [Apr. 24, 2009]
[15] Wikipedia, The Free Encyclopedia. “Software deployment” Internet:
http://en.wikipedia.org/wiki/Software_deployment, [Apr. 24, 2009]
[16] Wikipedia, The Free Encyclopedia. “Software Requirements Specification” Internet:
http://en.wikipedia.org/wiki/Software_Requirements_Specification, [Apr. 24, 2009]
[17] Kruchten, P., “Architectural Blueprints: The 4+1 View Model of Software Architecture”, IEEE Software, pp. 42-50, November 1995.
[18] “Which Google Products Make Money?” Internet: http://blogoscoped.com/archive/2009-01-07-n84.html, [Apr. 27, 2009]
[19] “Google Financial Statements.” Internet:
http://www.google.com/finance?fstype=bi&cid=694653
[20] Google. “Financial Release.” Internet:
http://investor.google.com/releases/2008Q4_google_earnings.html, [Apr. 22, 2009]
[21] Google. “Financial Release.” Internet:
http://investor.google.com/releases/2009Q1_google_earnings.html, [Apr. 22, 2009]
[22] “Live Search.” Internet: http://www.live.com/, [Apr. 22, 2009]
[23] “Yahoo Search.” Internet: http://tools.search.yahoo.com/about/forsearchers.html?p, [Apr.
22, 2009]
[24] “Google Site Search: Power your Website Search with Google.” Internet:
http://www.google.com/sitesearch/, [Apr. 22, 2009]
[25] “Google Search Engine”, Internet: http://www.google.com/, [Apr. 14, 2009]
[26] “Google Salaries.” Internet: http://www.glassdoor.com/Salaries/Google-Salaries-E9079.htm, [Apr. 27, 2009]
[27] Amazon.com. “Intel BX80582X7460 6-Core Xeon X7460 Processor.”
Internet: http://www.amazon.com/exec/obidos/ASIN/B001CH9B9Q/ref=nosim/6684177-20,
[Apr. 22, 2009]
[28] “Microsoft Office Professional 2007 - Full Version”, Internet:
http://www.qvc.com/qic/qvcapp.aspx/view.2/app.detail/params.aol_refer.false.tpl.detail.msn_refer.false.item.E176568.ref.GBA?cm_ven=GOOGLEBASE&cm_cat=Electronics&cm_pla=Software&cm_ite=E176568, [Apr. 22, 2009]
[29] “Microsoft Office Project Professional 2007 - PC - CD-ROM – English”,
Internet: http://www.google.com/products/catalog?q=MS+Project&hl=en&cid=8675170509842985112&sa=title#ps-sellers, [Apr. 22, 2009]
[30] “Microsoft Windows Vista Business w/SP1.” Internet:
http://www.google.com/products/catalog?hl=en&rls=com.microsoft:*:IESearchBox&q=cost+of+microsoft+vista+business&um=1&ie=UTF8&cid=982345806236338989&ei=tFL2SZLBJeKrtgej943tDw&sa=X&oi=product_catalog_result&resnum=1&ct=result#ps-sellers, [Apr. 27, 2009]
[31] Oracle. “Data Mining.” Internet:
http://oraclestore.oracle.com/OA_HTML/ibeCCtpSctDspRte.jsp?section=11223&sitex=10021:22372:US, [Apr. 22, 2009]
[32] OriginLab Data Analysis and Graphing Software. “Products.” Internet:
http://www.originlab.com/index.aspx?s=8&lm=88&pid=941, [Apr. 22, 2009]
[33] “Red Hat Store.” Internet:
https://www.redhat.com/wapps/store/catalog.html;jsessionid=83T9Bh5ijdD6C2s95AHYgQ**.4b748952, [Apr. 27, 2009]
[34] “IBM Rational Purify for Linux and UNIX – Unix.”
Internet: http://www.google.com/products?q=rational+purify+price&hl=en, [Apr. 22, 2009]
[35] “IBM Rational Rose Developer for UNIX – Unix.”
Internet: http://www.google.com/products/catalog?q=rational+rose+price&hl=en&cid=6305737877623743092&sa=title#ps-sellers, [Apr. 22, 2009]
[36] “IBM Rational Rose Technical Developer - PC, Unix.”
Internet: http://www.google.com/products/catalog?q=rational+rose+price&hl=en&cid=15467682535907615653&sa=title#ps-sellers, [Apr. 22, 2009]
[37] “IBM Rational Rose Data Modeler – PC.”
Internet: http://www.google.com/products/catalog?q=rational+rose+price&hl=en&cid=17416741335320358674&sa=title#ps-sellers, [Apr. 22, 2009]
[38] “IBM Rational PurifyPlus Enterprise Edition - PC, Unix.”
Internet: http://www.google.com/products/catalog?hl=en&q=rational+purifyplus+price&cid=224150831539729924&sa=title#ps-sellers, [Apr. 22, 2009]
[39] NetSarang Computer. “XManager Enterprise.” Internet:
http://sales.netsarang.com/e_sales/online_store.html?open=xme#xme, [Apr. 22, 2009]
[40] Kobayashi, M. and Takeda, K. “Information retrieval on the web”, ACM Computing
Surveys, New York, NY, June 2000.
[41] Brin, S.; Page, L., "The Anatomy of a Large-Scale Hypertextual Web Search Engine",
Seventh International World-Wide Web Conference, Brisbane, Australia, April 1998.
[42] Wikipedia, The Free Encyclopedia. “PageRank.”
Internet: http://en.wikipedia.org/wiki/PageRank, [Apr. 9, 2009]
[43] G. Mishne, "Multiple ranking strategies for opinion retrieval in blogs," 2006 TREC Blog
Track, 2006. [Online]. Available: http://staff.science.uva.nl/~gilad/pubs/trec06-blogret.pdf
[44] K. Lerman, S. Blair-Goldensohn, and R. McDonald. "Sentiment summarization: Evaluating
and learning user preferences," in 12th Conference of the European Chapter of the
Association for Computational Linguistics (EACL-09), 2009.
[45] Wikipedia, The Free Encyclopedia. “Automatic Summarization.” Internet:
http://en.wikipedia.org/wiki/Automatic_summarization, [Apr. 9, 2009].
[46] “Amos WebGLOSS arama.” Internet: http://www.amosweb.com/cgi-bin/awb_nav.pl?s=gls&c=dsp&k=util, [Apr. 14, 2009].
[47] “Google Analytics”, Internet: http://www.google.com/analytics/, [Apr. 14, 2009].
[48] Wright, P., “Knowledge Discovery In Databases: Tools and Techniques”, ACM
Crossroads, Winter 1998.
[49] Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P., “From Data Mining to Knowledge Discovery in
Databases”.
[50] Wikipedia, The Free Encyclopedia. “Web Crawler.” Internet:
http://en.wikipedia.org/wiki/Web_crawler, [Apr. 9, 2009]
[51] M. Steyvers and T. Griffiths, Probabilistic Topic Models. Lawrence Erlbaum Associates,
2007.
[52] Wikipedia, The Free Encyclopedia. “IBM WebFountain.” Internet:
http://en.wikipedia.org/wiki/WebFountain, [Apr. 20, 2009]
[53] Roland Piquepaille's Technology Trends. "IBM's WebFountain of Knowledge." Internet:
http://radio.weblogs.com/0105910/2004/03/01.html, [Apr. 20, 2009]
[54] Qingliang Miao, Qiudan Li and Ruwei Dai. “AMAZING: A sentiment mining and retrieval
system.” In Expert Systems with Applications: An International Journal, v.36 n.3, p.7192-7198, April, 2009