Uploaded by Nick Konstantinou

HAR file analysis system

The purpose of this work is to build a system for
collecting HAR data from a population of users, in
order to provide some basic analysis for each user
individually, as well as general analyses of the
internet infrastructure in an area. The system has
two types of users: Administrator and User.
The user connects to the system via a desktop
computer and has the following features:
1) Registration in the system. The user registers
and accesses the system by choosing a username and
password and providing an email address. The
password must be at least 8 characters long and
contain at least one capital letter, one number
and one symbol (e.g. # $ * & @).
2) Upload data. The user selects a HAR file
from their computer. The file is processed locally
to delete sensitive data, and then the user has two
options: a) upload it to the system or b) save
the edited file locally.
If the file is uploaded to the system, the data
must be processed further (on the server) so that
the desired data is stored in an appropriate
format. In addition, the IP address of the user
uploading the file should be analyzed to
automatically discover the user's connectivity
provider, and this information should be saved in
the database along with the uploaded data.
3) Profile management. The user can change
the username / password and see basic
statistics for the uploaded data (last upload date,
number of records).
4) Data visualization. The user can see on a
map the locations of the IPs to which they have
sent HTTP requests. Specifically, a heatmap is
displayed on the map that shows the distribution
of the number of records related to HTML, PHP,
ASP and JSP web objects (or bare domains, without
a path).
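The password rule from feature 1 can be sketched as a small check. This is only an illustration: the function name is hypothetical, and it assumes that "symbol" means any ASCII punctuation character, which the specification does not state.

```python
import string


def is_valid_password(password: str) -> bool:
    """Return True if the password satisfies the registration rule."""
    has_upper = any(c.isupper() for c in password)
    has_digit = any(c.isdigit() for c in password)
    # Assumption: a "symbol" is any ASCII punctuation character.
    has_symbol = any(c in string.punctuation for c in password)
    return len(password) >= 8 and has_upper and has_digit and has_symbol
```

The same rule would normally be enforced on the server as well, since a client-side check alone can be bypassed.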
The Administrator accesses the system from a
desktop computer, using an appropriate
username / password. After logging in, the
administrator has the following capabilities:
1. Display of Basic Information. The
administrator sees relevant information on one
page, in tables and / or graphs showing:
a. The number of registered users
b. The number of entries in the database per
request type (method)
c. The number of entries in the database by
response code (status)
d. The number of unique domains that exist in
the database
e. The number of unique connectivity providers
in the database
f. The average age of the web objects at the
time they were retrieved, per CONTENT-TYPE
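As a rough illustration of the counts in items b to d, the sketch below computes them over a list of HAR entries held in memory. In the real system these would more likely be database queries; the function name is hypothetical, and the field names (request.method, request.url, response.status) follow the HAR 1.2 format.

```python
from collections import Counter
from urllib.parse import urlsplit


def basic_stats(entries):
    """Count entries per method and status, and count unique domains."""
    methods = Counter(e["request"]["method"] for e in entries)
    statuses = Counter(e["response"]["status"] for e in entries)
    # The hostname part of each request URL is treated as the domain.
    domains = {urlsplit(e["request"]["url"]).hostname for e in entries}
    return methods, statuses, len(domains)
```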
2. Analysis of response times to requests
(entries objects, timings field).
A configurable chart is displayed showing the
average response time of requests (Y axis) per
hour of the day [0-24] (X axis). The chart data
can be filtered as follows:
a. Web object type (select one or more, or all)
b. Day of the week (Monday - Sunday or all)
c. HTTP method type on request (select one or
more, or all)
d. Connectivity provider (e.g. "Wind", "Cosmote"
or all)
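The hourly averages behind this chart could be computed roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the "response time" is taken to be the wait value of the HAR timings field (the specification does not say which timing to use), and the hour is read from startedDateTime as recorded, without time zone conversion.

```python
from collections import defaultdict
from datetime import datetime


def average_wait_by_hour(entries):
    """Map hour of day (0-23) to the mean timings.wait in milliseconds."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for entry in entries:
        wait = entry.get("timings", {}).get("wait")
        # Skip missing or negative values (HAR uses -1 for timings
        # that do not apply).
        if wait is None or wait < 0:
            continue
        # Older Python versions do not accept a trailing "Z".
        started = entry["startedDateTime"].replace("Z", "+00:00")
        hour = datetime.fromisoformat(started).hour
        totals[hour] += wait
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(counts)}
```

The filters listed above (object type, day of the week, method, provider) could be applied to the entries before they are passed to this function.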
3. HTTP header analysis (header objects). The
administrator sees on a page appropriate
information, in tables and / or graphs, showing
data related to the use of caches. More
specifically:
a. Histogram of the TTL distribution of web objects
in responses, by CONTENT-TYPE
(select one or more CONTENT-TYPE or all).
The TTL is the value of the max-age directive (if
it exists); otherwise it is calculated from the
expires value (if any) and the modification date
of the web object. The histogram has 10 buckets,
and the width of each bucket is calculated
dynamically according to the retrieved values.
b. Percentage of max-stale and min-fresh
directives over the total number of requests
per CONTENT-TYPE (select one or more
CONTENT-TYPE or all).
c. Percentage of cacheability directives (public,
private, no-cache, no-store) over the total number
of responses per CONTENT-TYPE (select one or
more CONTENT-TYPE or all).
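The TTL rule and the 10-bucket histogram of item a could be sketched as below. Both function names are hypothetical. The sketch assumes that, when max-age is absent, the TTL is the difference between the Expires and Last-Modified headers, which is one reading of the specification, and that bucket width is (max - min) / 10 over the retrieved values.

```python
import re
from email.utils import parsedate_to_datetime


def ttl_seconds(headers):
    """Return the TTL in seconds for HAR response headers, or None."""
    values = {h["name"].lower(): h["value"] for h in headers}
    cache_control = values.get("cache-control", "")
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control, re.I)
    if match:
        return float(match.group(1))
    try:
        # Assumption: without max-age, TTL = Expires - Last-Modified.
        expires = parsedate_to_datetime(values["expires"])
        modified = parsedate_to_datetime(values["last-modified"])
        return (expires - modified).total_seconds()
    except (KeyError, TypeError, ValueError):
        return None  # header missing or date not parseable


def histogram(values, buckets=10):
    """Count values into equal-width buckets spanning [min, max]."""
    low, high = min(values), max(values)
    # Fall back to width 1.0 if all values are identical.
    width = (high - low) / buckets or 1.0
    counts = [0] * buckets
    for v in values:
        # The maximum value is placed in the last bucket.
        index = min(int((v - low) / width), buckets - 1)
        counts[index] += 1
    return low, width, counts
```

Responses for which ttl_seconds returns None would simply be left out of the histogram input.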
All the above graphs / tables can be filtered by
connectivity provider (e.g. "Wind", "Cosmote"
or all).