Working Groups List

Research Computing and Cyberinfrastructure Governance
Working Groups (May 13, 2015)
1. Data Center
a. Provide input into the specs for the PSU Data Center; develop policies related to colocation of servers and other equipment to be used for research at the Data Center.
b. Consider development of a set of access rights/policies for users, and responsibilities for
data center personnel.
c. Help draft SLAs (service level agreements). Detail a set of services that will be provided
and charges for each. Purchase of equipment? Rental of storage space? Rental or
purchase of CPU time? Access to web servers for making data publicly available? Colocation of equipment to be relocated to the data center by departments, individuals, or
research groups?
d. Who will be allowed access? Can individual faculty buy in? Research Groups?
Departments? Institutes? Will PIs be allowed to supply their own equipment? Will PIs
(or others) have to buy data center equipment? If someone buys in, how can local tech
support reps, or PIs, have access (either physical or remotely) to their equipment?
What maintenance will be supplied by the data center? What guarantees of backups?
e. Given the importance of data and archiving, will this be a prime location for the storage
and backup of big data? Will there be archiving facilities? Will there be separate
segments of the data center where restricted data may be stored? Data that is export
controlled? Data with PII? Will access be via fast pipe (e.g. the research network)?
2. Software
a. Work with the research guru or appropriate representative to catalogue research
software on campus, identify where we could combine licenses efficiently, identify what
new software needs might exist, and determine how to publicize (or distribute) research
software efficiently.
b. Once data is collected from units on current usage of different packages, work on the
cataloging of licenses, and make recommendations for what should be licensed
centrally, vs. locally (e.g. at a department level), vs. being purchased by just a few
researchers as individual licenses. Discuss at what point the
cost/efficiency/coordination tradeoff makes it worthwhile to centralize licensing.
c. Prepare a document showing current (distributed) costs, what the central cost would
be, and the savings. Identify what elements of license agreements need to be tracked.
Obtain relevant details of license agreements affecting research software, e.g. how
broadly can a site license be scaled up, is monitoring of licenses necessary, what
restrictions are there on use (e.g. on one workstation, on a work machine and a home
machine or laptop, only within the U.S. [export controls], and so on).
d. Perhaps consider policies on software distribution and local installation of software and
extensions/updates. Some faculty have administrative rights and can install software,
some cannot. This is a frustration for faculty concerned with productivity, especially in
time-crunch situations.
e. Key participants will include the group developing a new ITS software cataloguing
initiative (Mairéad Martin, ITS). Outreach will need to go to or through all administrative
and/or IT units in Colleges and Institutes, and perhaps to central purchasing if this seems
like a valuable way to identify past software purchases.
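The license-agreement elements discussed above (scope of licensing, seat counts, cost, home-use and export-control restrictions, renewal dates) could be captured in a simple structured record to support the catalog and the cost-comparison document. A minimal Python sketch follows; the field names and the savings calculation are illustrative assumptions for discussion, not an adopted PSU schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Scope(Enum):
    """Where the license is held -- mirrors the central/local/individual split above."""
    CENTRAL = "central"
    DEPARTMENT = "department"
    INDIVIDUAL = "individual"


@dataclass
class LicenseRecord:
    """Illustrative catalog entry for one research-software license (hypothetical fields)."""
    package: str
    vendor: str
    scope: Scope
    seats: int                       # licensed seat count, if seat-limited
    annual_cost: float               # current cost at this scope (USD)
    renewal: date                    # next renewal date, for tracking
    home_use_allowed: bool = False   # may it run on a home machine or laptop?
    export_controlled: bool = False  # restricted to use within the U.S.?
    notes: str = ""


def central_savings(records: list[LicenseRecord], central_quote: float) -> float:
    """Current distributed spend minus a quoted central site-license cost."""
    return sum(r.annual_cost for r in records) - central_quote
```

For example, two departmental licenses for the same package could be compared against a hypothetical central site-license quote to show the savings figure called for in 2c.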
3. High-performance computing
a. Examine HPC in comparison to other universities; look at costs of HPC. Is HPC serving
the faculty well, and what gaps or opportunities are there? Assess high-performance
clusters around the University; are there advantages to seeking consolidation? Assess
whether PSU should engage in a major effort in this area (e.g. should we try to get a
supercomputer? If so, what would it take? Would such an effort have to be legislature-funded
the way it is in some states? Could it be grant-funded? Would a PA University
consortium be feasible?). How can HPC at UP interact with researchers at Hershey (e.g.
folks working on genomics)? Should we seek to position PSU at the forefront of the Big
Ten in computing power? Do we need to?
b. What are the differentiated HPC requirements of scientific workflows that are compute-intensive, data-intensive, or both compute- and data-intensive? How can they be
reflected in the features of the hardware and software environments that are needed to
serve these classes of workloads?
c. Key participants will be faculty working with HPC both through the ICS-ACI and in
individual clusters, and perhaps using facilities outside PSU; and ICS-ACI personnel.
4. IT/HR Job Classification and Compensation
a. Consider issues of IT job classification, compensation, and other HR issues. How do we
get and keep the best IT people at Penn State? How do units avoid training, and then
losing, IT personnel we want to retain? Examine losses of IT staff to competitors like
Dell, Apple, Google. Examine compensation across units, and competition across units.
If PSU prohibits internal competitive bidding, why does it seem to occur, and what are
the implications (is there a systemic drain from some units to others, and if so is this a
problem, and what do we do?)? Are there systematic problematic compensation
patterns across units or job descriptions, for example where a sys admin I is routinely
paid more in unit X than the same job in unit Y? Given competition in this area with the
outside world, how can appropriate flexibility be built into hiring CI/IT staff, or is this
unnecessary?
b. Key participants in this discussion will be representatives from HR, along with IT group
managers responsible for recruitment and retention.
c. The Provost and VPR have indicated willingness to entertain proposals for improved
“career tracks” for IT colleagues, for training, for trading parts of jobs to gain
experience, and for facilitating graduate study toward an MA or Ph.D. How do we
institutionalize this?
5. Research Network and Data Classification Policies.
a. What are the parameters and plans for access to the new Research Network? What
segments of campus will be connected? What are the costs for “last mile” connections
(to an individual office, to a building, within a building), and who should pay for that?
How will researchers access the new fast network, how can this be made as transparent
as possible to facilitate research? How does the research network relate to the data
center and access to it, and to ICS-ACI? How easy will it be for faculty to gain access?
For what uses is the new network appropriate, and how can such use be ensured
while facilitating research?
b. Data classification has direct impacts on what can be transported in the research
network; it has other implications too, such as access rights, cloud storage, treatment of
data by Identity Finder, machine security, and in other areas. Do current proposed
classification policies have enough nuance to cover the varying kinds of research data
we have? Not all research data is sensitive, not all is restricted, and some restrictions
are more important than others. In some cases research data needs to be kept secure,
in other cases security is not critical (e.g. if it is all generated from public sources). In all
cases, barriers to appropriate data use should be reduced. What is a researcher-driven
(along with liability-driven) set of data classifications, and what types of restrictions and
policies (if any) should be placed on use of/access to those types of data?
c. The ITS networking group will be a key connection in this discussion.
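To make the classification discussion in 5b concrete, a strawman sketch of how classification tiers might map to handling rules follows. The tier names and policy fields are assumptions offered for discussion, not existing PSU policy or an official classification scheme.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    """Illustrative tiers -- not PSU's official classification levels."""
    PUBLIC = 1             # e.g. generated entirely from public sources
    INTERNAL = 2
    RESTRICTED = 3         # e.g. data containing PII
    EXPORT_CONTROLLED = 4


@dataclass(frozen=True)
class HandlingPolicy:
    """What a given tier permits; the fields are assumptions for discussion."""
    cloud_storage_ok: bool
    research_network_ok: bool
    encryption_required: bool


# A strawman mapping from tier to policy, to make the trade-offs concrete.
POLICIES = {
    DataClass.PUBLIC: HandlingPolicy(True, True, False),
    DataClass.INTERNAL: HandlingPolicy(True, True, False),
    DataClass.RESTRICTED: HandlingPolicy(False, True, True),
    DataClass.EXPORT_CONTROLLED: HandlingPolicy(False, False, True),
}


def may_transport(data_class: DataClass) -> bool:
    """May data of this class traverse the research network under this strawman?"""
    return POLICIES[data_class].research_network_ok
```

A table of this shape would let the working group debate each cell (should restricted data ride the research network at all?) rather than the abstract question of whether the policy has "enough nuance."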
6. Data, Data Governance, Data Preservation, Data Dissemination, Data Security, Managing the
Scientific Data Life-Cycle
a. We've gotten a lot of suggestions and issues related to data. Does this need to be split
into multiple working groups? One suggested split was to separate 1) Data
Classification, 2) Data Security, 3) Data Archiving/Data Life-cycle.
b. The working group will deal with developing and disseminating policies and
technologies for data preservation and dissemination. Consider services during active
research and later for longer-term dissemination, archival, curation, etc. Storage is one
key component; speedy access is another; some conversation needs to be around
software and tools for access to and manipulation of scientific data including the roles of
the academic units, research, Libraries and ITS.
c. There are many different specific aspects of data storage that tie in. Space, cost,
archiving, long-term/short-term, publicizing/archiving for public use, internal/external
(cloud), local/centralized… Should these all be in one committee or working group? Or
(for instance) are discussions of “big data” so distinct from discussions of public
replication data sets that these should be handled by different working groups?
d. Data security/data compliance (for discussion: should we pull out a separate working
group on “Security, Data Protection and Compliance”?).
i. What data storage solutions are appropriate for different types of data, e.g.
data that are or are not de-identified, restricted, classified, or public? As case
studies, the Network on Child Protection and the Clearinghouse on Military
Families (both in SSRI) are having major difficulties with risk management
around being able to get basic work accomplished; part of the problem seems
to be about protected data storage.
ii. How do we ensure the security of electronic medical records while not cutting
off appropriate access? Penn State's Clinical and Translational Sciences Institute
has invested heavily in making such data accessible, but there are some unique
and complex security issues.
iii. How do we ensure compliance by individual faculty with important and
appropriate efforts to protect data?
iv. Is Identity Finder a good security solution? Is it effective? Does it crash a lot of
machines? Does it waste a lot of time? It certainly generates a lot of
complaints. What is the gain vs. cost of this type of intrusive solution? Are
there others? What are appropriate exception policies to automatic running of
Identity Finder (as case studies, we hear of Identity Finder running in the
background, draining laptop batteries, leading to system shutdown in the
middle of a presentation)? Is it in fact suggested or actually
mandated/required?
e. Location/centralization of services (or not): Should we let a thousand storage solutions
bloom? What can PSU provide and what needs to go elsewhere? What size data can
the library handle? What size data can the data center handle? Should Penn State have
a long-term archiving solution, or public dissemination solution (e.g. PSU websites)?
What happens if a faculty member leaves, or retires? Should data be archived in
perpetuity, or what restrictions should be placed on this?
f. How should the Library’s ScholarSphere and the Data Commons efforts be integrated
into efforts to manage data? Library resources are in use by only some segments of campus.
What should these be used for? How can they be integrated into long-term or big data
storage needs?
g. Cloud vs. (very) local vs. centralized storage solutions. Coordinate (or develop) policies
about cloud computing / cloud storage. There does not seem to be a coordinated
answer on whether faculty (via grants or local funds) can purchase access to (e.g.) the
Amazon cloud services. What issues (security, cost, network transfer, privacy, export
controls) exist with use of such services? What services provide what level of protection
(e.g. Amazon has a military-grade service; if this is accessible, then security should not
be an issue)? Are cloud services a good long-term storage solution? A solution for big
data? What are the limits and best uses of the cloud? How can we get purchasing and
risk management to recognize the appropriate use of, and approve spending on, cloud
services, when this is well-justified?
h. Big Data and access to big data. Big data raises issues distinct from simply needing
faster computers.
i. Some key participants include the Library's Research Data Working Group, the Library’s
Digital Preservation Strategies Team, risk management, and the ITS security office.