Syllabus - Brandeis University

Brandeis University
Division of Graduate Professional Studies
Rabb School of Continuing Studies

Course Syllabus

Please read this document carefully. Pay special attention to the marked sections: they contain information that you will need from the very beginning, some of it even before the class starts.

I. Course Information
1. Introduction to Probability and Statistics
2. RBIF-0103-G1
3. 01/21/2015 - 04/25/2015
4. Distance Learning Course Week: Wednesday through Tuesday
5. Instructor, contact info:
Michael B. Partensky, PhD
Please contact me via email:
moshep@brandeis.edu, partensky@gmail.com
To avoid delays, please send your mail to both addresses if you want to contact me before
01/21/15. Later, please use the Brandeis address.
6. Virtual office hours: Sunday, 11 am – 1 pm (EST) [occasional changes are possible]
7. Document Overview
This syllabus contains all relevant information about the course: its objectives and outcomes, the grading criteria,
the texts and other materials of instruction, and the weekly topics, outcomes, descriptions of assignments, and due
dates. Consider this your roadmap for the course. Please read through the syllabus carefully and feel free to share
any questions that you may have. Please print a copy of this syllabus for reference.
8. Course Description



Purpose and content. The course builds a foundation for the “probabilistic thinking” method, with applications
to real-life problems in bioinformatics, bio- and medical statistics, computational biology and
biophysics, and data analysis. The topics cover random numbers, discrete and continuous random variables,
elements of Combinatorics, conditional probability, Bayes' formula, Markov chains, the Binomial, Poisson and
Normal distributions, entropy and information, the Monte Carlo method, the central limit theorem, confidence
intervals and hypothesis testing, correlations, nonlinear regression and maximum likelihood. We will also learn
some basics of the Mathematica programming language and will use it for computational probabilistic
experiments.

Prerequisites. Solid knowledge of basic algebra, geometry and trigonometry would be very helpful for your
success. If you are not fluent in basic math, please reserve more time for your weekly studies. Some familiarity
with introductory calculus (functions, derivatives, integrals) is preferable, but not required. The lectures will
provide you with the necessary background in calculus as needed.
Catching up with Math. In the first week of the class an introductory math quiz will be offered. It is meant to help
you refresh your math background and allocate adequate time and effort for your weekly studies. The test
will cover the areas of basic and more advanced Math directly related to the class. The test is not graded on its
content, but it is required (the grade is 100 if you took it and 0 otherwise). Based on the outcome, you will be advised to refresh
some of the materials if necessary. Mathematica (see section 9.3), an excellent educational and research
software used intensively throughout the course, will also help in refreshing your Math skills. It is strongly
advisable to start practicing Mathematica without delay.
9. Instruction Materials
9.1. Semi-Required Texts (mostly for the individual studies)
1. M.R. Spiegel, J.J. Schiller and R.A. Srinivasan, Schaum's Outline of Probability
and Statistics, Schaum’s Outline Series, McGraw-Hill, 3rd ed. (2009), ISBN: 9780071544252
2. E. Don, Schaum's Outline of Mathematica, Schaum’s Outline Series, McGraw-Hill, 2nd ed. (2009),
ISBN: 9780071608282
3. C.M. Grinstead and J.L. Snell, Introduction to Probability, Am. Math. Soc., 2nd ed. (1997), ISBN: 9780821894149
(this book can also be downloaded from the web for free; please send a thank-you note to the authors)
9.2 Recommended Text(s)
4. D.J. Bennett, Randomness, Harvard University Press, Cambridge (1999), ISBN: 978-0674107465
Enjoyable supplementary reading. A lot of insights, paradoxes, peculiarities.
5. S. Wolfram, Mathematica (9th edition): the reference source. (It is included in e-format in the
standard Mathematica distribution.)
6. W.J. Ewens and G.R. Grant, Statistical Methods in Bioinformatics: An Introduction, Springer, 2nd ed.
(2005), ISBN-13: 978-0387400822
(will be used only occasionally, but could also be handy in your future study of bioinformatics)
7. R. Durrett, Probability: Theory and Examples (Cambridge Series in Statistical and Probabilistic
Mathematics), CUP (2010), ISBN-13: 978-0521765398
8. N.N. Taleb, Fooled by Randomness, Random House, 2nd ed. (2008), ISBN-13: 978-1400067930
[Contains a lot of insights and cute examples]
9. W.W. Hines et al., Probability and Statistics in Engineering, Wiley, 4th ed. (2009), ISBN: 978-0471240877
10. R. Durbin, S.R. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis:
Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, reprint edition (1999),
ISBN: 978-0521629713 (the comment from #6 is also applicable here)
9.3 Required Software

Mathematica 10. We will be using Mathematica for the experiments with randomness. In addition Mathematica
will help you to refresh some of the math required for the course. You will be able to purchase a student version
of Mathematica 10 (which is fully functional) at a significant discount. To get an additional 15% discount please
enter the promotion code PD1637 at checkout from the Wolfram Web Store at store.wolfram.com (If asked,
please enter my name. This feature is provided to the members of the Wolfram Faculty Program).
Mathematica is an extremely powerful and elegant tool, and I am sure that some of you will find it very useful in
your future work.
It is very important to get Mathematica ASAP, prior to the first class session.
You can even complete a few introductory assignments (including watching two videos)
before the class starts. This can greatly boost your progress in the class.
9.4 On-line Course Content
This course will be conducted completely online using Brandeis’ LATTE site, available at
http://latte.brandeis.edu. The site contains the course syllabus, assignments, our discussion forums,
links/resources to course-related professional organizations and sites, and weekly checklists, objectives,
outcomes, topic notes, self-tests, and discussion questions. Access information is emailed to enrolled
participants before the start of the course.
10. Overall Course Outcomes
The course is designed to teach the probabilistic way of thinking. It provides a thorough background in the basics of
probability theory and statistics, the major pillars of bioinformatics and biostatistics. We will utilize the multidisciplinary approach by using the examples and examining the ideas from various fields, from statistical physics and
computer modeling of proteins to the probabilistic aspects of evolution and biological data analysis. The class will
strongly benefit from using Mathematica, the most advanced “computer aided thinking tool” which helps in
understanding the major concepts of P&S, developing algorithms and running random experiments.
Course Outcome, followed by Assignment / Assessment

1. Apply the elements of set theory to the analysis of complex events and biological sequences.
   Assignment / Assessment: Lect. 2, 3; HW 2, 3
2. Use Combinatorics for the analysis of various random selection problems, derivation of major probability
   distributions and grasping some major combinatorial problems of sequence analysis.
   Assignment / Assessment: Lect. 3, 4; HW 3, 4; Lect. 10; HW 10. In addition, various Combinatorial concepts are
   quite evenly distributed over the course, as one of the foundations of Probabilistic Thinking.
3. Apply Binomial, Poisson, geometric, hypergeometric, negative binomial, Normal, exponential and other
   probability distributions to the analysis of probabilities, sampling errors, sequence similarity.
   Assignment / Assessment: Lect. 4, 5, 10, 12; HW 4-6, 10-12
4. Recognize and analyze phenomena described by conditional probability. Use the Bayes formula to analyze
   prior probabilities given the outcomes.
   Assignment / Assessment: Lect. 6, 7; HW 6, 7
5. Apply non-linear regression (NLR) to data modeling; develop Mathematica-based applications of NLR for
   solving some real-life problems.
   Assignment / Assessment: Lect. 11; HW 11
6. Apply the concept of Maximum likelihood to the experimental data analysis.
   Assignment / Assessment: Lect. 11; HW 11
7. Analyze some archetypical paradoxes of probability (‘Monty Hall’, ‘prisoner’s dilemma’, second daughter)
   for the guidance in solving complex real-life statistical problems.
   Assignment / Assessment: Lect. 2, 7; multiple Q&A forum discussions
8. Apply the measures of central tendency (mean, variance, etc.) for the statistical estimates.
   Assignment / Assessment: Lect. 10; HW 10
9. Analyze and simulate with Mathematica various Markov models as a foundation of the major algorithms of
   sequence analysis (HMM, Blast, etc.).
   Assignment / Assessment: Lect. 7, 8; HW 8
10. Use the central limit theorem for the analysis of sampling errors and confidence interval.
   Assignment / Assessment: Lect. 12; HW 12; Test preparation problems
11. Apply the hypothesis testing technique to the analysis of statistical data.
   Assignment / Assessment: Lect. 12; HW 12; Test preparation problems
12. Use relation between entropy and probability, and Boltzmann statistics as fundamental concepts behind the
   protein dynamics and energetics. Elucidate relation between entropy, disorder, and information.
   Assignment / Assessment: Lect. 13; Q&A forum discussions
13. Formulate basic principles underlying the Monte Carlo and Molecular dynamics modeling of molecular
   biological systems.
   Assignment / Assessment: Lect. 5, 13 (+ videos of MC simulations)
14. Analyze and describe some statistical problems of genetics (Hardy-Weinberg law, probabilities of
   genetically inherited diseases, applications of Bayesian statistics).
   Assignment / Assessment: Lect. 7; HW 7
15. Actively participate in the team work: problem solving in groups.
   Assignment / Assessment: Weeks 2-13
16. Use Mathematica as the programming, visualization and presentation environment.
   Assignment / Assessment: Weeks 1-5: intense introduction to Mathematica; practical applications of
   Mathematica are evenly distributed between the classes
Upon completion of the course, students will be able to:

Use general principles of P&S in preparation for future work in bioinformatics:
- Use the operational definition of probability to estimate the empiric probabilities for random events and
  biological sequences
- Apply the elements of set theory to the analysis of complex events
- Use Combinatorics for the analysis of various random selection problems, derivation of major probability
  distributions and grasping some major combinatorial problems of sequence analysis
- Apply Binomial, Poisson, Normal, geometric, hypergeometric and negative binomial distributions to the
  analysis of probabilities, sampling errors, sequence similarity
- Recognize and analyze phenomena described by conditional probability
- Use the Bayes’ formula to analyze prior probabilities given the outcomes
- Apply non-linear regression (NLR) to data modeling; develop Mathematica-based NLR applications for some
  practical examples
- Apply the concept of Maximum likelihood to the experimental data analysis
- Analyze some archetypical paradoxes of probability (prisoner’s dilemma, Buffon's needle, etc.) for the guidance
  in the analysis of complex real-life statistical problems
- Apply the measures of central tendency (mean, variance, etc.) for the statistical estimates
- Analyze and simulate with Mathematica various Markov and random walk models for better understanding of
  the major algorithms of sequence analysis (HMM, Blast, etc.)
- Use the central limit theorem for the analysis of sampling errors and confidence interval
- Apply the hypothesis testing technique to the analysis of statistical data
- Use the ROC curves approach to the test design

Apply probabilistic methods and concepts to the analysis of biological systems on different levels:
- Use relation between entropy and probability, and Boltzmann statistics as fundamental concepts behind the
  protein dynamics and energetics
- Formulate basic principles underlying the Monte Carlo and Molecular dynamics modeling of molecular
  biological systems
- Analyze the probabilistic basis of Mendelian genetics, distribution of alleles, Hardy-Weinberg (HW) theorem

Participate in team research work involving numerical statistical analysis and modeling, and communicate its
results to colleagues; make presentations on various statistical topics:
- Team work in the class
- Use Mathematica as the programming, visualization and presentation environment
11. General Grading Criteria
The course grade will be based on homework (50%), tests (20%), student’s activity in class (30%). In addition, students
can earn extra credits for various extra activities. This can be done, for instance, by completing the optional
assignments offered in most of the lectures, making short presentations (papers + computer experiments), etc.
12. Assignments and Tests: Description, Structure and Grading
13.1 Participation/Attendance
All students are expected to participate regularly. The activities
(forum discussions, group activities, reading and Home Work assignments) should be spread evenly
over the week.
13.2 Communication, correspondence. All the emails related to this class will be sent to your
Brandeis email account. However, almost everyone has and uses a primary personal account. For this
reason it is extremely important to set up forwarding from your Brandeis account to the primary
account.

It is your responsibility to make sure that all the messages from the instructor and from the school are received on
time. At the beginning of the class I will ask you to send me confirmations to make sure that everyone is tuned in.
13.3 Home assignments (content, early submission options, and grading).
General
Every week, a homework assignment will be offered. It typically includes a required part and an extra-credit
part. The deadline for the submission is Tuesday, 11:30 pm. Late assignments are not accepted
(graded F). In such cases, a make-up can be offered. However, it is highly recommended to submit on
time because the class is quite intense and working on additional assignments can jeopardize your
progress.
All the submissions should be done via LATTE.
Submission options.
Usually, you will be offered to choose one of two options:
a) Submitting once (single file submission) for final grading. The only deadline in this
case is Tuesday 11:30 pm.
b) Submitting more than once (multiple file submission). As explained further, this
option is also named the “Early submission” (ES), and involves two deadlines [for
the first submission (see weekly assignments), and for the final submission,
Tuesday, 11:30 pm].

The “Early Submission” (ES) elaborated
ES implies “multiple file submission”, where the originally submitted assignment can be
improved and resubmitted.
Anyone who chooses this option must submit early, usually by 14:00 on the Sunday preceding the class
(unless stated otherwise for a particular week). If the original submission is not perfect, it will be
returned to you with the initial grade (we designate it G(1) ), with the score assigned to each of the
problems, and with some questions and hints helping you to find and fix the errors. Then you are
given an opportunity to resubmit and improve your grade.
The initial submission must be complete: you should provide solutions to all the required problems.
The first grade G(1) is the starting point, and all the further grades depend on it. At the end, after
the resubmission(s), your grade cannot be less than G(1), but you also (except for some rare
occasions) cannot get 100% (assuming G(1) was less than 100%).
Each submission numbered n (n= 1, 2, 3…) is initially graded based purely on its quality. We name
this the “unbiased” grade G(n). The “real” grade for each submission is defined as
$$G_{\mathrm{real}}(n) = \frac{1}{2}\bigl(G(n) + G(1)\bigr) \qquad (1)$$
For instance, if the first percentage grade is G(1) = 60 and the second grade (first resubmission) is
G(2) = 100, then the final grade is (100 + 60)/2 = 80%. This approach should motivate you (in addition to submitting
early) to earn as high a starting grade as possible.
The individual problems are graded on a scale of 0 to 1. In each submission the total percentage grade
G(n) is obtained as the total of the scores for the individual problems divided by the total number of
the problems, times 100.
The real grades for the individual problems are calculated for each submission in the spirit of rule (1).
For example, if the (unbiased) grade for a particular problem changes as {0.6, 0.7, 1.0} in the course
of three consecutive submissions, then the “real” grade for this problem is {0.6, 0.65, 0.8}, and
the final grade is 0.8. Usually, there is only one resubmission, n = 2. In some cases (especially if the
work was submitted earlier during the week, say before Sunday), a second and, occasionally, even
a third resubmission (n = 3, 4) will be allowed.
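
The early-submission arithmetic above can be checked with a few lines of code. The sketch below is written in Python purely for illustration (the course itself uses Mathematica), and the function names `real_grades` and `percent_grade` are hypothetical, not part of any course material. It assumes only what the text states: each "real" grade is the mean of the current unbiased grade and the first one, and the total grade is the mean problem score times 100.

```python
def real_grades(unbiased):
    """Apply rule (1): the n-th real grade is the mean of G(n) and G(1).

    `unbiased` lists the unbiased grades G(1), G(2), ... in submission order.
    """
    first = unbiased[0]
    return [(g + first) / 2 for g in unbiased]


def percent_grade(problem_scores):
    """Total percentage grade: mean of the per-problem scores (each 0 to 1) times 100."""
    return 100 * sum(problem_scores) / len(problem_scores)


if __name__ == "__main__":
    # Whole-assignment example from the text: G(1) = 60, G(2) = 100.
    print(real_grades([60, 100]))
    # Per-problem example from the text: unbiased scores over three submissions.
    print(real_grades([0.6, 0.7, 1.0]))
```

The last element of the returned list is the final grade, so the first call reproduces the 80% of the whole-assignment example, and the second reproduces the per-problem sequence up to floating-point rounding.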

Please use the same file for all the (re)submissions of a current week!
For each resubmission, please create a separate subsection after each solution being fixed, named the
“Solution n” with n = 2 (for the first resubmission) or n=3 (for the second). We will learn how to
format Mathematica files (including the sectioning) during the first two weeks.

All my comments and your solutions (the current one and all the previous) must stay in the file
unchanged. Do not delete your previous solutions and my comments!

The name of the file should contain your name and the submission number (n). It is usually derived
from the original name of the file (posted by the instructor) by adding “_YourName_n”. For
example, suppose the assignment of the 5th week was named HW5.nb. Then HW5_SamClemens_2.nb is
the second (2) submission of this assignment by Samuel Clemens. Similarly, HW7_JaneWang_1.nb
is Jane’s first submission of HW7.
The discussions that follow the early submission lead to a better and deeper
understanding of the course material and improve your overall performance.
Naturally, submissions made after the ES deadline (but before the final deadline) are graded only once.
There is neither a penalty for not using the ES option, nor a reward for submitting early (except for
the opportunity to resubmit and fix the errors).
13.4 Self-tests. Some assignments will be accompanied by the self-tests containing the problems similar
to those from the Home Assignment. These are offered solely for your practice and benefit, and do not
have to be submitted. All the self-test problems can be discussed on the open forum
13.5 Class presentations
Concepts reviewed in the class or related to those could be enriched
through the (optional) students’ presentations (typically, the short papers including the examples with
Mathematica). This activity is entirely voluntary. It is graded as a participation assignment (see the
grading policy). I will suggest a few topics, but quite often students contribute their own ideas and
topics, and share P&S- related experiences from their work (I remember remarkable presentations
about the Bayesian networks, and on a Stock Market analysis) or even their hobbies (the Mathematica
model of the “Texas Hold'em” was one such example). Please indicate your interest in making a
presentation as early as possible. The “presentations” will be added to the reading materials, and
everyone will be encouraged to read and discuss them on the forums. The starting threads for the
discussions can be created by the presenters.
13.6 Groups and related activities. The class will be divided into several groups, usually three
students per group. Certain assignments will be offered for group work, and the answers will be
graded as “class activity” or the HW, depending on the type of assignment.
Short (30 min) tests will also be offered during some weeks, either for the whole class, or separately
for the group work.
An important component of your class activity is the participation in the weekly discussion forums.
Your posts (responses, questions etc.) will be evaluated based on substantiality of their content.
Instead of elaborating our understanding of “substantiality”, it is easier to give examples of non-substantive posts:
“Hi, John. It’s a wonderful idea! I was thinking along the same line! Cheers. Mike”.
“Ann, I liked your solution but could not understand the last part. Could you please explain it”.
They both are valid and useful responses. The first one is a kind encouragement, while the second
contains a question and invites for further discussion. In other words, they do not contribute to the
grade, but they both are valuable and important.
All kinds of responses are welcome and important, even if they are not graded as substantive. Besides,
there is no sharp boundary between the substantive and other responses. For example a simple inquiry
can be considered substantive if it triggers a valuable discussion, and it should be graded accordingly.
Quite often the HW assignments will be offered on a group basis. In such cases the detailed
instructions will be provided.
Note: In general, the early submission policy is not applied to the group assignments. Instead, the
group members are allowed to discuss the solutions on the group forum, in the process of composing
jointly a submitted document. The instructor can participate in these discussions and provide hints and
advice if necessary.
13.7 Mid-term and Final tests will be offered in the 8th and 13th weeks, respectively. They include 5-6
problems each. The specific instructions will be provided.
13.8 Online Participation

There are four major types of forum activity
(1) Responses to the original questions posted by the instructor (Q&A forum(s)).
This includes questions related to the HW assignment and Lecture materials. Answering a certain
number of these questions will be required. Each student gains complete access to such forums only
after having responded to the first question posted by the instructor. Usually, there will be up to three
required questions. The specific instructions will be provided weekly.
(2) Participation in the discussion at Q&A forum(s)
After answering the required questions, you will be able to access these forums and participate in
discussion. This is also a valuable component of your participation.
(3) Participation in the open discussion forum(s) (ODF), where a student can ask and
respond to any class-related questions (except for the HW assignment).

Comment: Homework problems are the exception: they must be solved individually. The only
HW-related questions allowed for discussion are the questions posted by the instructor (see type (1) above).
Otherwise, the discussion of HW assignments is prohibited. However, you can ask me HW-related
questions (mostly related to the understanding of the problems rather than their solutions) at private
forums.
(4) Group discussion forums
These forums will be created for various group activities.
Detailed participation assignments will be posted weekly. Here is an example (we presume that
the class week starts on Wednesday):
“(a) By Friday Night (22:00 EST) post two original (required) responses on Q&A forum(s).
(b) Not later than 12:00 (EST) on Monday post two (at least) replies to the posts of other participants
(and/or submit your own substantive questions or comments) on Q&A forum, and at least one post on
ODF.
The posts must be submitted on at least three different days of the online course week. For example,
you can post your answers to Q&A on Thursday and Friday, reply to Q&A on Saturday, and
participate in ODF during the week".
Online participation is very important. It contributes 30% to the total grade, and it is a very effective
learning tool. You will soon realize that the aforementioned requirements are not “abusive”, and
discussing your questions with others is rewarding and enjoyable. Most of you will easily surpass the
required level of participation.
13.9 Participation Evaluation
First, we introduce two types of student responses:
Type 1 (T1): responses to the original questions posted by the instructor at the Q&A forum;
Type 2 (T2): participation in the Q&A discussion (after answering the required questions) and in the ODF discussion.
Points may be earned for original responses and substantive replies based on the following criteria:

Type 1

90-100 pts (Very Thoughtful)
- Discussion is substantive and relates to key principles.
- The answers are complete and well explained.
- The Math and coding part (if present) is correct.
- Provides examples demonstrating application of principles.
- Is submitted according to the deadlines in the course schedule.
- Language is clear, concise, and easy to understand. Uses terminology appropriately and is logically organized.

79-89 pts (Thoughtful)
- Makes reference to key principles, but is not well developed or integrated in the response.
- The answers are not complete, and not well explained.
- The Math/coding part (if present) is on the right track, with some errors.
- Offers some examples, but they are not sufficiently illustrative and not well integrated in the response.
- Submitted according to the deadlines in the course schedule.
- Is adequately written, but may use some terms incorrectly; may need to be read two or more times to be understood.

68-78 pts (Somewhat Thoughtful)
- Contains no reference to key principles; if key principles are present, there is no evidence the learner
  understood principles, or key principles are not integrated into the response.
- The Math/coding part (if present) contains errors.
- Does not offer examples, or the examples are too trivial.
- Response is not submitted by the due date.
- Poorly written; terms are used incorrectly; cannot comprehend learner’s ideas after repeated readings.
Type 2

90-100 pts (Very Thoughtful)
- Is substantially related to and reinforces the unit overview, text, and/or supplementary readings.
- Responds to the ideas and concerns of other learners.
- Math/coding (if present) is correct and clearly explained.
- Is characterized by three to four of the following criteria:
  o Thought-provoking
  o Supportive
  o Challenging
  o Reflective
- Is submitted according to deadlines in the course schedule.
- Language is clear, concise, and easy to understand; uses terminology appropriately and is well organized.

79-89 pts (Thoughtful)
- Contains references to unit overview, text, and/or supplemental readings, but references are not well
  integrated in the response.
- Response is peripherally related to the ideas and concerns of other learners.
- Math/coding (if present) contains some minor errors or is not explained clearly.
- Is characterized by one or two of the following criteria:
  o Thought-provoking
  o Supportive
  o Challenging
  o Reflective
- Submitted according to deadlines in the course schedule.
- Adequately written, but may use some terms incorrectly; may need to be read two or more times to be understood.

68-78 pts (Somewhat Thoughtful)
- Contains no reference to key principles; if key principles are present, there is no evidence learner
  understood principles, or key principles are not integrated into the response.
- Math/coding (if present) contains errors.
- Response is unrelated to the ideas and concerns of other learners.
- Response is not thought-provoking, supportive, challenging, or reflective.
- Response is not submitted by the due date.
- Is poorly written; terms are used incorrectly; instructor cannot comprehend learner’s ideas after repeated readings.
The total participation grade is calculated as a weighted average. The responses belonging to T1 directly test your
understanding of the lecture material and of the HW assignment. For this reason they are sometimes assigned a higher
weight.
For example, the grade X1 for the type 1 responses can be assigned the weight p = 0.6. Then the grade X2 for
contributions of the second type has the weight q = 0.4.
The grade X1 itself is the average of the percentage grades for all the required responses to the instructor’s questions at
the Q&A forum. The grade X2 for T2 is the average of the grades for the corresponding contributions.
In all cases, if the number of responses exceeds the required number of responses, Nreq, the best Nreq responses will be
chosen. For example, if the required number for T1 is Nreq = 3 and the actual number of responses is 5, then only the three
best responses will be counted towards the grade (we call their grades x1, x2 and x3), and the total grade for T1
becomes X1 = (x1 + x2 + x3)/3. The same holds for T2 responses.
The final grade is calculated as
X_total = p × X1 + q × X2
Consider the example: X1 = 70, X2 = 90, p = 0.6, and q = 0.4. Then X_total = 0.6 × 70 + 0.4 × 90 = 78.

The final grade is shifted towards X1, demonstrating the role of the weights.
Note: This “weighting” approach is not strict. For example, if your contribution belonging to the T2 category is original,
thought-provoking and demonstrates your deep understanding of the subject, its contribution to the grade will be
enhanced.
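
The participation-grade calculation described above can be sketched in a few lines. The example below is in Python purely for illustration (the course itself uses Mathematica). The function names `best_average` and `participation_total` and the sample response grades are hypothetical. The syllabus does not say how the grade is computed when fewer than Nreq responses are posted, so this sketch simply assumes that at least Nreq responses exist and raises an error otherwise. It also ignores the non-strict enhancement mentioned in the note.

```python
def best_average(grades, n_req):
    """Average of the n_req best grades among the posted responses."""
    if len(grades) < n_req:
        # Assumption: the handling of missing responses is not specified in the syllabus.
        raise ValueError("fewer responses than required")
    best = sorted(grades, reverse=True)[:n_req]
    return sum(best) / n_req


def participation_total(t1_grades, t2_grades, n_req_t1, n_req_t2, p=0.6, q=0.4):
    """Weighted participation grade: p * X1 + q * X2."""
    x1 = best_average(t1_grades, n_req_t1)  # grade X1 for Type 1 responses
    x2 = best_average(t2_grades, n_req_t2)  # grade X2 for Type 2 responses
    return p * x1 + q * x2


if __name__ == "__main__":
    # Five T1 responses with three required: only 80, 70 and 60 count, so X1 = 70.
    # Two T2 responses with two required: X2 = 90.
    print(participation_total([70, 60, 80, 50, 40], [90, 90], n_req_t1=3, n_req_t2=2))
```

With these made-up inputs the sketch reduces to the worked example in the text (X1 = 70, X2 = 90, p = 0.6, q = 0.4).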
II. Weekly Information
1. Course schedule and class topics
Some minor changes in the topics, and in their distribution, are possible.
Week 1
Starting date (for the week of the class): 01/22/14
Topic: 1st Introduction to Mathematica. Mathematical Background for P&S.
Comments: You are encouraged to watch the suggested videos even before the class starts.

Week 2
Starting date: 01/29/14
Topic: 2nd Introduction to Mathematica. First random simulation with Mathematica. First introduction to
Probability (P). Early history of P: on the shoulders of giants. Laws of chance: are they possible?
Conundrums and Paradoxes of probability.
Comments: Please volunteer and select the topics for the 1st presentations (for weeks 4 or 5)*

Week 3
Starting date: 02/05/14
Topic: Random experiments, sample space and random events. Introduction to the set theory. Axioms of
Probability. Frequency definition of Probability. Random variables. Probability function and Cumulative
Distribution Function. Counting Probabilities: Multiplication Rule. 3rd Introduction to Mathematica.

Week 4
Starting date: 02/12/14
Topic: Counting probabilities (continued). Elements of Combinatorics. Permutations, combinations, binomial
coefficients. Bernoulli trials and related Probability Distributions: Binomial, Geometric, Negative binomial
distribution. Some applications in Sequence Analysis. 4th Introduction to Mathematica: Random numbers,
chance experiments with Mathematica: Matrices in M.

Week 5
Starting date: 02/19/14
Topic: Multinomial, Hypergeometric and Poisson distributions. Applications, problem solving. Chance
experiments with Mathematica: Monte-Carlo Integration.
Comments: Please select a topic for the presentation, weeks 9-11. You can use my suggestions or pick your
own. The presentation is graded as class participation. If you decide not to present, do not worry. This is
a completely voluntary activity.

Week 6
Starting date: 02/26/14
Topic: Conditional probability. Independence, Global independence. Total probability Rule. Simulations with
Mathematica.

Week 7
Starting date: 03/05/14
Topic: Bayes formula and related “paradoxes”. Two-stage experiments. Hardy-Weinberg theorem. Markov Chain:
recursive treatment. Practice for the test.

Week 8
Starting date: 03/12/14
Topic: Mid-term test (it may be distributed between weeks 8 and 9). Markov chain: Matrices-based treatment.
Applications to Bioinformatics: CpG islands. Random walks.

Week 9
Starting date: 03/19/14
Topic: Integrals with Mathematica. Continuous random variables. Distribution function, CDF. Important
distributions and densities: Uniform, Exponential, Gamma, Normal, Chi-Square. Relations between Binomial,
Poisson and Normal distributions. Practice.

Week 10
Starting date: 03/26/14
Topic: Practice with the continuous distributions. Mean, Variance and other estimators (moments) for discrete
and continuous random variables. Sums of random variables. Some applications of Probability Distributions in
bioinformatics.

Week 11
Starting date: 04/02/14
Topic: Joint distributions, marginal distribution. Introduction to data modeling: (1) Maximum Likelihood;
(2) Linear and Non-linear regression. Real life examples with Mathematica.

Week 12
Starting date: 04/09/14
Topic: Distributions of sums of random variables. Laws of large numbers. Central limit theorem. Confidence
Interval. Hypothesis testing (1). Random samples, and sampling distributions.

Week 13
Starting date: 04/16/14
Topic: Probability and Entropy. Boltzmann distribution. Monte Carlo and Molecular dynamic simulation of
biomolecules. Final Test. Review.
*The link to the suggested topics can be found on the “Lecture Materials” page (LATTE), but you are also encouraged to suggest your own topics.
2. Weekly assignments
Each week there is a homework (HW) assignment, typically of 5-9 problems (one of them is usually an extra-credit
problem). The assignments are in Mathematica notebook format, and Mathematica is used both for solving the
problems and for formatting the submitted document. Some assignments include random experiments with Mathematica.
The early submission policy is described in section 10 (1). The latest submission time is Tuesday, 11.30 pm. The
assignments will also include “participation tasks”, especially forum activities.
3. Weekly outcomes
1
At the end of week 1, students will:
• Refresh the main mathematical concepts/tools used in the class, including some elements of algebra, sums, products, integrals.
• Write first Mathematica-based programs using functions, tables, random number generator, plots.
2
At the end of week 2, students will be able to:
• Describe the major sources of Probability Theory
• Describe some archetypical paradoxes of Probability
• Apply some basic analytical and visualization tools of Mathematica
• Run simple random simulations with Mathematica
3
At the end of week 3, students will be able to:
• Describe the sample spaces of various random experiments.
• Analyze simple and complex events in terms of set theory.
• Apply set theory to the classification of amino acids.
• Use the frequency-based definition of probability for the analysis of probabilities of different random events.
• Describe random phenomena in terms of Random Variables, Probability Function and Cumulative Distribution Function.
• Apply the Multiplication Rule to counting the outcomes of sequential experiments.
4
At the end of week 4, students will be able to:
• Use Combinatorics for the analysis of various random selection problems, derivation of major probability distributions and grasping some major combinatorial problems of sequence analysis.
• Run simple statistical simulations in Mathematica.
• Recognize the Bernoulli trials process, and the related discrete probability distributions (DPDs): Binomial, Geometric, and Negative Binomial.
• Describe some sequence analysis problems in terms of discrete probability distributions.
• Perform basic operations on matrices using Mathematica.
5
At the end of week 5, students will be able to:
• Perform numeric computations using the Monte Carlo approach
• Formulate the basic principles underlying the Monte Carlo approach to computer modeling
• Apply Poisson, Hypergeometric and Multinomial DPDs to the analysis of random events
• Recognize differences between sampling with and without replacement.
6
At the end of week 6, students will be able to:
• Apply various statistical distributions to the analysis of random events
• Investigate properties of the statistical distributions using the Mathematica-based algorithms
• Analyze the properties of related events in terms of conditional probability.
• Investigate the pairwise and global independence of the events.
7
At the end of week 7, students will be able to:
• Apply Bayes’ formula to the analysis of posterior probabilities
• Using Bayes’ approach, analyze the reliability of tests based on the two types of errors and prevalence.
• Apply the concepts of Specificity and Sensitivity to the analysis of medical tests.
• Analyze multi-step random experiments
• Derive the Hardy-Weinberg theorem
• Describe the general properties of Markov chains
• Apply the recursive approach to the analysis of Markov chains
8
At the end of week 8, students will be able to:
• Apply matrices to the analysis of Markov chains (MC)
• Apply MC to bioinformatics (detecting the CpG islands)
• Simulate random walks with Mathematica.
9
At the end of week 9, students will be able to:
• Describe the continuous probability distributions in terms of pdf and cdf
• Apply Mathematica to the analytical and numerical computations of the continuous probabilities
• Recognize and use some important continuous distributions: Uniform, Exponential, Gamma, and Normal.
• Recognize and apply joint and marginal distributions
10
At the end of week 10, students will be able to:
• Compute the measures of central tendency and dispersion (expectation, variance, etc.) and apply them to the analysis of statistical properties
• Analyze the statistical properties of the sums of random variables using Mathematica-based simulations
• Apply the law of large numbers to the analysis of the asymptotic behavior of the sums.
11
At the end of week 11, students will be able to:
• Apply the Maximum Likelihood approach to the modeling of statistical data
• Use Linear and Non-linear regression for data modeling and analysis of correlations
• Use Mathematica-based statistical algorithms (NonlinearFit, Anova, LinearRegression) for the data analysis
• Develop Mathematica-based NLR tools for some practical applications
12
At the end of week 12, students will be able to:
• Use the central limit theorem, and explain the special role played in statistics by the normal distribution.
• Explain the concepts of “confidence interval”, “statistical significance” and “p-values”, and their role in statistics.
• Apply these concepts to hypothesis testing.
13
At the end of week 13, students will be able to:
• Describe the relation between probability and entropy (Boltzmann’s formula).
• Describe the Boltzmann distribution and explain its use for derivation of equilibrium statistical properties (examples of gases).
• Explain physical principles behind Monte Carlo simulation of biological systems.
4. Weekly Reading assignments.
The lecture materials are mostly self-contained. Additional reading assignments will be posted.
III. Course Policies and Procedures
Late Policies
Homework assignments must be submitted prior to the class, no later than 11.30 pm on Tuesday.
Those who do not submit their assignments on time will have to take a make-up test. The class is quite intense,
so it is in your best interest to complete your assignments on time.
Grading Standards
Work expectations
Students are responsible for exploring each week's materials and submitting required work by the due dates. On
average, a student can expect to spend approximately 9-12 hours per week reading and completing assignments
(more specific recommendations will be made individually during week 2). This presumes that a
student’s educational background satisfies the prerequisites; otherwise, more effort will be required. The
assignments will be posted at the beginning of each week (Wednesday morning).

Grades are not given but are earned. Students are graded on demonstration of knowledge or competence, rather
than on effort alone. Each student is expected to maintain high standards of honesty and ethical behavior.
How points and percentages equate to grades
Percentage    Character grade
98-100        A+
94-97         A
90-93         A-
85-89         B+
80-84         B
75-79         B-
70-74         C+
65-69         C
60-64         C-
50-59         D
0-49          F
Extra credit: adds up to 10% of the base grade
Attention: Sage converts both A+ and A into 4.0. I will still use A+ as a token of my appreciation for a job done far above the
required level. In a practical sense, the extra “+” can be used to improve your other grades if needed.
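
As an illustration only, the percentage bands above can be written as a small lookup. This is a sketch in Python (not the Mathematica used in the course), and the names `GRADE_BANDS` and `letter_grade` are hypothetical. It assumes that a fractional score falls into the band whose lower bound it has reached (so 93.5 maps to the 90-93 band), which the table does not state. Extra credit is not modeled, because the syllabus does not specify how it is combined with the base grade.

```python
# Lower bound of each percentage band from the grading table, highest first.
# Assumption: a score belongs to the first band whose lower bound it reaches.
GRADE_BANDS = [
    (98, "A+"), (94, "A"), (90, "A-"),
    (85, "B+"), (80, "B"), (75, "B-"),
    (70, "C+"), (65, "C"), (60, "C-"),
    (50, "D"), (0, "F"),
]


def letter_grade(percent):
    """Return the character grade for a percentage score between 0 and 100."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for lower_bound, grade in GRADE_BANDS:
        if percent >= lower_bound:
            return grade


if __name__ == "__main__":
    for score in (99, 91, 78, 64, 42):
        print(score, letter_grade(score))
```

The final grade recorded by the instructor remains the authoritative one; the lookup only restates the table.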
Feedback
Feedback will be provided on assignments and exams within 2-3 days of receipt. Responses to the forum posts
will be provided at least 4 times per week.
Confidentiality
We can draw on the wealth of examples from our organizations in class discussions and
in our written work. However, it is imperative that we not share information that is
confidential, privileged, or proprietary in nature. We must be mindful of any contracts we
have agreed to with our companies. In addition, we should respect our fellow classmates
and work under the assumption that what is discussed here (as it pertains to the
workings of particular organizations) stays within the confines of the classroom.
For your awareness, members of the University's technical staff have access to all course sites to aid in course
setup and technical troubleshooting. Program Chairs and a small number of Graduate Professional Studies
(GPS) staff have access to all GPS courses for oversight purposes. Students enrolled in GPS courses can
expect that individuals other than their fellow classmates and the course instructor(s) may visit their course
for various purposes. Their intentions are to aid in technical troubleshooting and to ensure that quality course
delivery standards are met. Strict confidentiality of student information is maintained.
Class Schedule
Week 1     01/22 - 01/28
Week 2     01/29 - 02/04
Week 3     02/05 - 02/11
Week 4     02/12 - 02/18
Week 5     02/19 - 02/25
Week 6     02/26 - 03/04
Week 7     03/05 - 03/11
Week 8     03/12 - 03/18
Week 9     03/19 - 03/25
Week 10    03/26 - 04/01
Week 11    04/02 - 04/08
Week 12    04/09 - 04/15
Week 13    04/16 - 04/22
IV. University and Division of Continuing Studies Standards
Please review the policies and procedures of Continuing Studies, found at
http://www.brandeis.edu/gps/students/studentresources/policiesprocedures/index.html. Among them, we would like to
highlight the following.
Learning Disabilities
If you are a student with a documented disability on record at Brandeis University and wish to have a reasonable
accommodation made for you in this course, please contact me immediately.
Academic Honesty and Student Integrity
Academic honesty and student integrity are of fundamental importance at Brandeis University and we want students to
understand this clearly at the start of the term. As stated in the Brandeis Rights and Responsibilities handbook, “Every
member of the University Community is expected to maintain the highest standards of academic honesty. A student
shall not receive credit for work that is not the product of the student’s own effort. A student's name on any written
exercise constitutes a statement that the work is the result of the student's own thought and study, stated in the student's
own words, and produced without the assistance of others, except in quotes, footnotes or references with appropriate
acknowledgement of the source.” In particular, students must be aware that material (including ideas, phrases,
sentences, etc.) taken from the Internet and other sources MUST be appropriately cited if quoted, and footnoted in any
written work turned in for this, or any, Brandeis class. Also, students will not be allowed to collaborate on work except
by the specific permission of the instructor. Failure to cite resources properly may result in a referral being made to the
Office of Student Development and Judicial Education. The outcome of this action may involve academic and
disciplinary sanctions, which could include (but are not limited to) such penalties as receiving no credit for the
assignment in question, receiving no credit for the related course, or suspension or dismissal from the University.
Further information regarding academic integrity may be found in the following publications: "In Pursuit of Excellence
- A Guide to Academic Integrity for the Brandeis Community", "(Students') Rights and
Responsibilities Handbook" AND "Continuing Studies Student Handbook". You should read these publications, which
can all be accessed from the Continuing Studies Web site. A student who is in doubt about standards of academic
honesty (regarding plagiarism, multiple submissions of written work, unacknowledged or unauthorized collaborative
effort, false citation or false data) should consult either the course instructor or other staff of the Rabb School for
Continuing Studies.
University Caveat
The above schedule, content, and procedures in this course are subject to change in the event of extenuating
circumstances.