McFarmGraph: a web-based monitoring tool for McFarm jobs

UTA HEP/Computing/0026: McFarmGraph______________________________________________ 1
Sankalp Jain, Aditya Nishandar,
Drew Meyer, Jae Yu, Mark Sosebee,
Prashant Bhamidipati, Heunu Kim, Karthik Gopalratnam, Vijay Murthi,
Parag Mhashilkar
Abstract
McFarmGraph is a web-based graphical user interface used to monitor McFarm jobs on
Linux farms. This document is intended to be a comprehensive description of the design
of McFarmGraph. For installation and administration, please refer to the McFarmGraph
administration guide.
1. INTRODUCTION:
McFarmGraph consists of two parts: the front-end CGI scripts, which display the status
of the various jobs in a graphical format, and the back-end daemon, which brings the
status files from the various farms over to the web server.
Status Daemon (McFarmGraph_daemon):
McFarm periodically (e.g. every four hours) outputs a status file on each farm
where it is running. This flat file summarizes the status of the various jobs
running on the farm. The update period is controlled by a parameter that
can be set by the farmer (see the McFarm documentation). The purpose of the
daemon is to fetch the status file from each remote farm that is running production
with McFarm. The daemon periodically uses Globus services (gsiftp) to transfer
the status files and stores them locally. The daemon then triggers the XML
generator, which converts the flat files to XML format. Fig 1a illustrates
the control flow of the McFarmGraph daemon.
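The transfer step can be sketched as follows. This is a minimal Python illustration, not the daemon's actual code: it only builds a command line for globus-url-copy (the Globus client for gsiftp URLs) and leaves execution to a caller-supplied function. The host name, remote path and the dictionary keys used for a farm are assumptions made for the example.

```python
def gsiftp_copy_command(host, remote_path, local_path):
    """Build (but do not run) a globus-url-copy command for one status file."""
    if not remote_path.startswith("/") or not local_path.startswith("/"):
        raise ValueError("both paths must be absolute")
    source = f"gsiftp://{host}{remote_path}"
    destination = f"file://{local_path}"  # absolute path gives file:///...
    return ["globus-url-copy", source, destination]


def transfer_all(farms, run):
    """Run one transfer per farm; return the names of farms whose transfer failed.

    `run` takes a command list and returns an exit code, so a real caller could
    pass a thin wrapper around subprocess.run while tests pass a stub.
    """
    failed = []
    for farm in farms:
        command = gsiftp_copy_command(farm["host"], farm["remote"], farm["local"])
        if run(command) != 0:
            failed.append(farm["name"])
    return failed


# Hypothetical farm description; none of these values come from a real setup.
farms = [{
    "name": "OU-HEP",
    "host": "farm.example.org",
    "remote": "/path/to/status/mcp14",
    "local": "/tmp/OU-HEP/mcp14/mcp14",
}]
commands = []
failed = transfer_all(farms, run=lambda cmd: commands.append(cmd) or 0)
```

Passing the runner in as a parameter keeps the sketch testable without a grid proxy or network access.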
Graphical User Interfaces:
This is a set of CGI scripts written in Perl, together with some applets written in Java.
The scripts interpret the XML representation of the status files transferred
from the remote farms and present it to the user in a graphical format that can
be accessed from the web.
Request Structure:
Each McFarm request consists of a set of jobs that are grouped together
under a request id of the form ReqXXXX. An individual job name consists of the
request id followed by a descriptor string and a number that is unique within the
group. For example, “Req6279-zh-zmumu+hbb-03219094626”,
“Req6279-zh-zmumu+hbb-03219094710”, “Req6279-zh-zmumu+hbb-03219094921”, … all
belong to the group with request id Req6279. The “%Done” attribute displayed
on the web page for a particular request id is the average of the %Done attributes of
the individual jobs within that group. Likewise, the figures displayed in the PieChart
represent the percentage of jobs in the group that are in a particular phase.
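The grouping and averaging described above can be sketched in a few lines. This is an illustrative Python sketch rather than the tool's Perl code. The regular expression encodes the job-name layout as an assumption, and the %Done values and the phase names other than D0GSTAR are invented for the example.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: "Req<digits>-<descriptor>-<number>".
JOB_NAME = re.compile(r"^(Req\d+)-(.+)-(\d+)$")


def request_id(job_name):
    """Return the request id prefix (e.g. 'Req6279') of a job name."""
    match = JOB_NAME.match(job_name)
    if match is None:
        raise ValueError(f"unrecognized job name: {job_name!r}")
    return match.group(1)


def summarize(jobs):
    """jobs: iterable of (job_name, percent_done, phase) tuples.

    Returns {request_id: {"percent_done": mean of the jobs' %Done,
                          "phases": {phase: percentage of jobs in it}}}.
    """
    groups = defaultdict(list)
    for name, done, phase in jobs:
        groups[request_id(name)].append((done, phase))
    summary = {}
    for req, members in groups.items():
        counts = Counter(phase for _, phase in members)
        summary[req] = {
            "percent_done": sum(done for done, _ in members) / len(members),
            "phases": {p: 100.0 * n / len(members) for p, n in counts.items()},
        }
    return summary


# Illustrative values only.
jobs = [
    ("Req6279-zh-zmumu+hbb-03219094626", 100.0, "done"),
    ("Req6279-zh-zmumu+hbb-03219094710", 50.0, "D0GSTAR"),
    ("Req6279-zh-zmumu+hbb-03219094921", 0.0, "D0GSTAR"),
]
result = summarize(jobs)
```

The "phases" mapping is the quantity a pie chart per request would plot.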
2. STATUS DAEMON DESIGN:
Scalability and simplicity are the pivotal issues that influence the design of the daemon as
well as the CGI scripts. Fig 1 shows the directory structure in which the job status files
from the remote farms are stored.
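[Figure: directory tree rooted at /home/mcfarm/McFarmGraph_New/. Top-level entries: the
farm directories SWIFT-HEP, CSE-HEP, OU-HEP and LTU-HEP; conf (containing daemon.conf);
log (containing daemon.log); tmp; and README. Farm directories contain mcpxx
subdirectories (mcp10, mcp11 and mcp14 are shown), each holding a status file of the
same name.]
Fig 1. McFarmGraph Directory Structure on hepfm007.uta.edu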
The McFarmGraph job information, the configuration file and the log file are placed in
the directory structure shown above. Whenever a new farm is added, the daemon
automatically creates a directory corresponding to that farm (e.g. SWIFT-HEP for the Swift
farm). The mcpxx subdirectories are created according to the mcp versions on a
particular farm (e.g. CSE-HEP has mcp13 and mcp14, whereas OU-HEP has only mcp14).
The status files and their XML representations are stored in these mcpxx directories.
Each mcpxx directory will typically contain mcpxx (the flat file), mcpxx_arch (the XML
representation of the archived job information) and mcpxx.xml (the XML representation of
queued and live jobs).
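As a small illustration of this naming scheme, the Python sketch below derives the three file paths for one farm and mcp version. The function name and dictionary keys are hypothetical; the base directory and file names are the ones given in the text.

```python
from pathlib import PurePosixPath

BASE = PurePosixPath("/home/mcfarm/McFarmGraph_New")


def version_files(farm, mcp_version, base=BASE):
    """Return the paths of the files kept for one farm and mcp version."""
    directory = base / farm / mcp_version
    return {
        "status": directory / mcp_version,                  # flat status file
        "archived_xml": directory / f"{mcp_version}_arch",  # archived jobs
        "live_xml": directory / f"{mcp_version}.xml",       # queued and live jobs
    }


paths = version_files("OU-HEP", "mcp14")
```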
The conf and log directories contain the McFarmGraph_daemon configuration file and
log files, respectively. Farms are added through the configuration file. The
tmp directory is used as scratch space while the daemon is running; it stores the process
id of the running daemon as well as some temporary files (e.g. ls.txt).
Fig 1a and 1b illustrate the control flow in the McFarmGraph_daemon.
[Flowchart, summarized from the box labels of the figure:
1. Start; read the configuration file.
2. Branch on the command-line argument (start, stop, check, or no argument).
   - No argument: print usage and exit.
   - check: check if the daemon is up, print its state (“daemon is running”) and exit.
   - start: check if the daemon is up. If it is running, print “daemon is running” and
     exit. If not, fork a child process and separate it from the parent, then invoke the
     main() subroutine. Sleep for the specified time (UPDATE_INTERVAL); when the sleep
     interval is over, invoke main() again.
   - stop: check if the daemon is up. If it is running, read the daemon’s process id,
     issue a kill with the daemon’s pid, flush the logs and exit.]
Fig 1a. McFarmGraph Daemon Flowchart
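The start/stop/check decision in Fig 1a can be sketched as below. This is a Python illustration for POSIX systems, not the daemon's own code. The pid-file handling, the step names returned by plan() and the "report-not-running" outcome are assumptions; the figure itself only shows the branches summarized above.

```python
import os


def read_pid(pid_file):
    """Return the pid stored in pid_file, or None if it is missing or invalid."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Probe a pid with signal 0, which tests for existence without killing."""
    if pid is None:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def plan(action, pid_file):
    """Map a command-line action to the next step taken in the flowchart."""
    running = is_running(read_pid(pid_file))
    if action == "start":
        return "report-running" if running else "fork-and-run-main"
    if action == "stop":
        return "kill-and-flush-logs" if running else "report-not-running"
    if action == "check":
        return "report-running" if running else "report-not-running"
    return "print-usage"
```

Separating the decision (plan) from the side effects (forking, killing, printing) keeps the control flow easy to check against the figure.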
Main Subroutine:
main ()
1. Redirect the output stream to the log file.
2. Read the configuration file for the farm variables by invoking the initialize()
   subroutine.
3. Check for configuration errors. If there are errors, print “Configuration Errors”
   and exit.
4. Create Farm objects corresponding to NUMBER_OF_FARMS.
5. For each farm, call the farm_mkdir() method to create farm-specific directories if
   not already present.
6. For each farm, call the getFiles() method to retrieve the job status files.
7. When all the files are retrieved, start the XML generator.
Fig 1b. main() subroutine in the McFarmGraph Daemon
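The sequence in Fig 1b can be sketched as follows. This Python sketch mirrors the order of the steps only: the configuration is a hypothetical dictionary rather than the real daemon.conf format, the transfer and XML generation are passed in as stand-in functions, and log redirection is omitted.

```python
import os
import tempfile


class Farm:
    """One remote farm and the mcp versions it runs (hypothetical structure)."""

    def __init__(self, name, mcp_versions, base_dir):
        self.name = name
        self.mcp_versions = list(mcp_versions)
        self.base_dir = base_dir

    def farm_mkdir(self):
        # Create <base>/<farm>/<mcpxx> directories if not already present.
        for version in self.mcp_versions:
            os.makedirs(os.path.join(self.base_dir, self.name, version), exist_ok=True)

    def get_files(self, fetch):
        # fetch(farm_name, version) stands in for the gsiftp transfer.
        for version in self.mcp_versions:
            fetch(self.name, version)


def check_config(config):
    """Return a list of configuration error messages (empty if none)."""
    errors = []
    farms = config.get("farms", {})
    if config.get("NUMBER_OF_FARMS") != len(farms):
        errors.append("NUMBER_OF_FARMS does not match the farms listed")
    for name, versions in farms.items():
        if not versions:
            errors.append(f"farm {name} lists no mcp versions")
    return errors


def main(config, base_dir, fetch, generate_xml):
    if check_config(config):
        print("Configuration Errors")
        return 1
    farms = [Farm(n, v, base_dir) for n, v in config["farms"].items()]
    for farm in farms:
        farm.farm_mkdir()
    for farm in farms:
        farm.get_files(fetch)
    generate_xml(base_dir)  # runs only after every farm has been processed
    return 0


config = {"NUMBER_OF_FARMS": 2,
          "farms": {"CSE-HEP": ["mcp13", "mcp14"], "OU-HEP": ["mcp14"]}}
calls = []
with tempfile.TemporaryDirectory() as base_dir:
    status = main(config, base_dir,
                  fetch=lambda farm, version: calls.append((farm, version)),
                  generate_xml=lambda base: calls.append(("xml", base)))
    created = sorted(os.listdir(base_dir))
```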
3. XML GENERATOR:
In earlier versions of McFarmGraph a lot of computation was done while the client (browser)
was waiting. This processing was not yet a bottleneck, but its cost would have grown as
the flat files grew. To avoid this, the bulk of the processing is now
done offline and the results are stored in XML files. Generating a page is now
simply a matter of reading the data from the XML file and emitting the HTML code.
The XML data is generated by two scripts: a wrapper that, for each
file pulled over from the various farms, calls a subroutine (in xmlgen.pm) which generates the
XML data for that status file. The diagram below shows the flow chart for the
subroutine.
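Flow chart of XML generator (summarized from the box labels of the figure):
1. START: read the file path of the file to operate on and create file paths for both
   XML files.
2. Create a temporary sorted file from the status file.
3. Read a line from the sorted file.
4. If EOF has not been reached, check whether the RequestId changes. If it does not,
   accumulate the job info and return to step 3. If it does, calculate %Done for the
   group just completed: if %Done = 100, accumulate the archived job info; otherwise
   write the info to the live jobs XML file.
5. At EOF, write the archived job info to the arch XML file, delete the sorted file
   and EXIT.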
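The sort-then-group pass in the flow chart can be sketched as below. This is an illustrative Python sketch, not xmlgen.pm. The status-file line format (job name, %Done, phase, separated by whitespace), the XML element and attribute names, and the request id Req6280 are all assumptions. The sketch also sorts in memory instead of writing a temporary sorted file.

```python
import xml.etree.ElementTree as ET
from itertools import groupby


def parse_line(line):
    """Split one assumed status line into (request id, job name, %Done, phase)."""
    name, done, phase = line.split()
    return name.split("-", 1)[0], name, float(done), phase


def generate(status_lines):
    """Return (live, archived) XML trees built from the status lines."""
    live = ET.Element("requests")
    archived = ET.Element("requests")
    # Sorting puts all jobs of one request on adjacent lines, so a change of
    # request id marks the end of a group.
    records = sorted(parse_line(line) for line in status_lines if line.strip())
    for req_id, group in groupby(records, key=lambda record: record[0]):
        jobs = list(group)
        percent = sum(job[2] for job in jobs) / len(jobs)
        target = archived if percent == 100 else live
        request = ET.SubElement(target, "request", id=req_id, done=f"{percent:.1f}")
        for _, name, done, phase in jobs:
            ET.SubElement(request, "job", name=name, done=f"{done:.1f}", phase=phase)
    return live, archived


lines = [
    "Req6280-example-0002 0 queued",
    "Req6279-zh-zmumu+hbb-03219094626 100 done",
    "Req6280-example-0001 50 D0GSTAR",
    "Req6279-zh-zmumu+hbb-03219094710 100 done",
]
live, archived = generate(lines)
live_xml = ET.tostring(live, encoding="unicode")
```

A request lands in the archived tree only when its average %Done is exactly 100, matching the "%Done = 100?" test in the flow chart.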
4. CGI and PERL Scripts
The following scripts generate the various Web pages:
1. filter.cgi
2. applet.pm
3. filemani.pm
4. generalpage.pm
5. html.pm
6. jobpage.pm
All of these scripts are written in Perl and are located under /usr/local/apache2/cgi-bin
on hepfm000.uta.edu. Apart from these scripts there is the Java applet code, which is in
the file PieChart.java under /usr/public_html/job_status/applet on hepfm000.uta.edu. All the
images that are used in the web pages, and the “style.css” file, are also under
/usr/public_html/job_status on hepfm000.uta.edu.
Functions of various scripts
filter.cgi: All the requests from the browser are directed to filter.cgi along with a set of
parameters. This script then invokes subroutines in the other files depending on the
parameters.
applet.pm: This script generates all the applet-specific HTML code.
filemani.pm: This script consists of a single subroutine whose function is explained below.
generalpage.pm: This script generates the bulk of the Req. Desc. page.
jobpage.pm: This script generates the Job Desc. page.
html.pm: This script prints most of the HTML code for all scripts.
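The dispatching role of filter.cgi can be sketched with a handler table. This is a Python illustration of the idea, not the Perl script. The page selectors main, genpage and jobpage are the ones that appear in the request diagrams of this section; the query parameter names (page, farm, mcp, arch, req, status) and the handler functions are hypothetical.

```python
from urllib.parse import parse_qs


def farm_request_ids(params):
    return f"Farm Request Ids: {params['farm']}"


def farm_request_desc(params):
    kind = "archived" if "arch" in params else "live"
    return f"Farm Request Desc: {params['farm']} {params['mcp']} ({kind} jobs)"


def job_desc(params):
    status = params.get("status", "all")
    return f"Job Desc: {params['req']} on {params['farm']} {params['mcp']} (status: {status})"


# One entry per page type; adding a page means adding one handler here.
HANDLERS = {
    "main": farm_request_ids,
    "genpage": farm_request_desc,
    "jobpage": job_desc,
}


def dispatch(query_string):
    """Pick the handler named by the 'page' parameter and run it."""
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    handler = HANDLERS.get(params.get("page"))
    if handler is None:
        return "Unknown page"
    return handler(params)
```

For example, dispatch("page=main&farm=OU-HEP") selects the handler for the “Farm Request Ids” page.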
Generation of Web Pages
McFarmGraph generates most of the pages dynamically using CGI. The only static page
is the “index.html” page, which is stored under /usr/public_html/job_status on
hepfm000.uta.edu. To add a new farm, this page has to be modified (refer to the
installation guide for more details). For the other pages there are 3 cases:
1. “Farm Request Ids” Page Request
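Generation of “Farm Request Ids” page
[Diagram: the browser, on the Job Status index.html page, sends a request to filter.cgi
with the parameters “main” and the farm name. The diagram shows calls to
printHTMLHeader, printHTMLfooter, printHTMLCell and printCellLink in html.pm and to
readDir in filemani.pm; the HTML code they return is sent back to the browser as the
web page. The legend distinguishes requests and replies between browser and CGI script,
calls to functions, and function returns.]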
In this case the parameters passed to filter.cgi include “main” and the farm name.
The filter.cgi script calls a function in html.pm to print the header and then calls the
readDir function, which reads the directory for the requested farm and creates a link for
each mcp version available on that farm.
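What readDir does can be sketched as follows: list the mcp version directories of one farm and emit one link per version. This Python sketch is illustrative only. The link target and its query parameter names are assumptions, and the example builds a throwaway directory tree rather than reading the real one.

```python
import os
import tempfile
from html import escape
from urllib.parse import urlencode


def read_dir(base_dir, farm):
    """Return one HTML link per mcp version directory found for the farm."""
    farm_dir = os.path.join(base_dir, farm)
    versions = sorted(
        entry for entry in os.listdir(farm_dir)
        if os.path.isdir(os.path.join(farm_dir, entry))
    )
    links = []
    for version in versions:
        query = urlencode({"page": "genpage", "farm": farm, "mcp": version})
        # escape() turns "&" into "&amp;" so the URL is valid inside an attribute.
        links.append(f'<a href="filter.cgi?{escape(query)}">{escape(version)}</a>')
    return links


with tempfile.TemporaryDirectory() as base_dir:
    for version in ("mcp14", "mcp13"):
        os.makedirs(os.path.join(base_dir, "CSE-HEP", version))
    links = read_dir(base_dir, "CSE-HEP")
```

Because the links are derived from the directory listing, a new mcp version appears on the page as soon as its directory exists.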
2. “Farm Request Desc.” Page Request
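Generation of “Farm Request Desc.” page
[Diagram: the browser, on the Farm request ids page, sends a request to filter.cgi with
the parameters genpage, mcp version, farm name and an optional arch parameter. The
diagram shows calls to generalPage in generalpage.pm, to printHTMLHeader,
printHTMLCell, printCellLink and printHTMLfooter in html.pm, and to printApplet in
applet.pm; the HTML code they return is sent back to the browser as the web page. The
legend distinguishes requests and replies between browser and CGI script, calls to
functions, and function returns.]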
The page generated here contains either all the “live jobs” or all the
“archived jobs” on this farm for the requested mcp version, depending on whether the
last parameter is present. The filter.cgi script calls the generalPage subroutine in
generalpage.pm, which does the rest of the processing. generalPage calls various
subroutines in html.pm and also printApplet in applet.pm, which embeds the applet into
the generated HTML code.
The PHASES column on the archived page lists all the phases that the request has gone
through.
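3. “Job Desc.” Page Request
Generation of “Job Desc.” page
[Diagram: the browser, on the Farm request desc. page, sends a request to filter.cgi
with the parameters jobpage, mcp version, Req desc., farm name and status. The diagram
shows calls to jobPage in jobpage.pm and to printHTMLHeader, printHTMLfooter and
printHTMLCell in html.pm; the HTML code they return is sent back to the browser as the
web page. The legend distinguishes requests and replies between browser and CGI script,
calls to functions, and function returns.]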
The page generated here lists all the jobs in a particular group, identified by the request
id passed on by the browser. If a status parameter is present in the request (as it is when
the applet link is clicked), the page lists the details of only those jobs in the group whose
current status is the one requested. For example, it might contain all the jobs whose status is
“D0GSTAR”.
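The selection rule for this page amounts to a filter on request id with an optional filter on status. A minimal Python sketch, assuming each job is a dictionary with hypothetical "name" and "status" keys and illustrative status values:

```python
def jobs_for_request(jobs, request_id, status=None):
    """Return the jobs of one request, optionally restricted to one status."""
    selected = [job for job in jobs if job["name"].startswith(request_id + "-")]
    if status is not None:
        selected = [job for job in selected if job["status"] == status]
    return selected


jobs = [
    {"name": "Req6279-zh-zmumu+hbb-03219094626", "status": "D0GSTAR"},
    {"name": "Req6279-zh-zmumu+hbb-03219094710", "status": "queued"},
    {"name": "Req62790-other-0001", "status": "D0GSTAR"},
]
all_jobs = jobs_for_request(jobs, "Req6279")
d0gstar_jobs = jobs_for_request(jobs, "Req6279", status="D0GSTAR")
```

Matching on the request id plus the following hyphen keeps Req6279 from also matching a longer id such as Req62790.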
5. FUTURE WORK:
The performance of the McFarmGraph tool can certainly be improved. Some possible
future work is highlighted below. We would like to reiterate that these are only
suggestions; no study of their feasibility or likely benefit has been done.

Exploring alternatives to CGI: When the number of farms being monitored increases,
the CGI scripts could become a performance bottleneck. Java servlets might be
able to solve this issue.

Expiration of proxies: During the development of McFarmGraph, it was
observed that the proxies expire, which disables the retrieval of status files from
the remote sites. A modification to the daemon could solve this problem.

Java applets load slowly: For every row in the job status page a new applet is
loaded and executed on the client-side Java Virtual Machine. Mechanisms for
caching the byte code and for sharing a single instance of the applet would reduce
the loading time.

McFarmGraph status updates: Currently McFarmGraph pulls complete status files, and the
majority of the information they contain has already been pulled over before.
It would be more efficient to transfer information only about those requests that
either have jobs still running on the farm or have
finished since the last update.
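The selection rule suggested in the last item can be sketched as below. This Python sketch only illustrates the rule and is not an existing McFarmGraph feature; it assumes that each update is summarized as a mapping from request id to %Done, and the request ids other than Req6279 are invented.

```python
def requests_to_transfer(previous, current):
    """Select requests that are still running or finished since the last update.

    previous, current: {request_id: percent_done} at the last and present update.
    """
    selected = []
    for request_id, done in current.items():
        still_running = done < 100
        # A request absent from the previous update counts as not yet finished.
        finished_since_last = done == 100 and previous.get(request_id, 0) < 100
        if still_running or finished_since_last:
            selected.append(request_id)
    return sorted(selected)


previous = {"Req6279": 100, "Req6280": 80, "Req6281": 40}
current = {"Req6279": 100, "Req6280": 100, "Req6281": 60, "Req6282": 0}
changed = requests_to_transfer(previous, current)
```

In this example Req6279 is skipped because it was already complete at the previous update.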