The Stats side of things - Department of Mathematics and Statistics

Nairanjana Dasgupta (also called Jan), Professor, Department of Mathematics and Statistics, WSU.

My theoretical interests are simultaneous testing, binary data, and multivariate data. If you are running a very large number of tests and are worried about Type I error on a large scale, I am the person you want to talk to. I will outline four areas that I am currently working on.

Washington is associated with apples. I have been working with the Tree Fruit Research Commission for the last 5 years, modeling various aspects of apples. An apple grows from a bud and goes through about 8 stages before it becomes a fruit. One relevant question the growers ask: given weather information and prior data, can we build a model that predicts what stage an apple will be at a particular time? Statistically, this becomes a question of modeling the pattern while accounting for issues such as autocorrelation, missing data, and the isotonic nature of the data. It is a fun project if you like to get your hands dirty with large data sets and solve a REAL problem for this state!

Modeling growth: can we predict the SIZE an apple will be at harvest from prior data and weather information at the current time point? Statistically, we need to deal with non-linear curves and try to establish growth patterns. Let's talk about how we worked on this.

About 70% of apple production in the United States takes place in the State of Washington alone, with $2.18 billion worth of apples produced in 2013 (WSDA, 2015).

Issue: a considerable time gap exists between ordering crates of defined sizes and the delivery of those crates by the manufacturers/suppliers. Because of this time lag, farmers must judge correctly which crate sizes to order well ahead of the apple harvest date. A decision tool that helps apple growers and marketers judge the size range of apples at harvest is therefore warranted.
The goal is to develop a statistical model that can be used for making management decisions on apple marketing. Specifically, the goal is a model that can predict the size of apples at harvest from production data. With such a model, farmers can input the production parameters in advance, and the model will yield the average apple size for a particular harvest date.

Steps:
• Model each growth stage
• Use the models above to estimate the model for the harvest stage
• Select a few competing models
• Predict a test sample
• Select the final model

The model will be developed with data from commercial apple growers in Washington State. Data are available for different apple varieties and different years (2010–2014) across various locations in the state. For this project, we use data only from the Cripps Pink apple variety. Data are obtained from 11 locations: East Wenatchee, Lewis Delay, Auvil Chelan, Auvil Brays, Sun Orondo CO, Sunrise, Olmstead Wapato, Ines Kon Pass, Gwen Ballard, Prossor, and Finley.

Suppressing the index for individual observations, the regression model can be expressed as: t = {40, 50, …, 180}, i = {10, 20, …}. We chose 180 days after full bloom (DAFB) as the target date for which apple size has to be predicted. Each estimated regression first determines the appropriate predictors for its dependent variable; finally, the model for Mean180 is estimated. Predictors that seem appropriate for the Mean180 model are then sequentially replaced by their own predictors, as identified in the previous regressions.
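The "select a few competing models, predict a test sample, select the final model" steps might look roughly like the sketch below. It is only an illustration under stated assumptions: the data are synthetic (not the growers' data), the helper names `fit_ols`, `predict_ols`, and `rmsfe` are hypothetical, ordinary least squares stands in for whatever regression was actually fitted, and the two candidate predictor sets are made up for the example. Each candidate is scored by its root mean squared forecast error on held-out observations, reported both in size units and as a percentage of the average harvest size.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X (hypothetical helper)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def rmsfe(y_true, y_pred):
    """Root mean squared forecast error on a test sample."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

# Synthetic stand-ins for mean apple size at 40, 130 and 180 DAFB (assumed values).
rng = np.random.default_rng(0)
n = 200
mean40 = rng.normal(1.0, 0.1, n)
mean130 = 2.0 + 0.8 * (mean40 - 1.0) + rng.normal(0.0, 0.05, n)
mean180 = 0.5 + 1.2 * mean130 + rng.normal(0.0, 0.05, n)

# Two competing predictor sets for Mean180 (illustrative only).
candidates = {
    "early + late size": np.column_stack([mean40, mean130]),
    "early size only": np.column_stack([mean40]),
}

# Fit on the first 150 observations, forecast the remaining 50.
train, test = slice(0, 150), slice(150, n)
results = {}
for name, X in candidates.items():
    coef = fit_ols(X[train], mean180[train])
    results[name] = rmsfe(mean180[test], predict_ols(coef, X[test]))

# Keep the candidate with the smallest forecast error.
best = min(results, key=results.get)
avg_size = float(mean180[test].mean())
for name, err in results.items():
    print(f"{name}: RMSFE = {err:.4f} ({100 * err / avg_size:.2f}% of average size)")
print("selected:", best)
```

Dividing the forecast error by the average harvest size puts models on a scale growers can read directly: an error of a few percent of a typical apple is easier to act on than a raw figure in inches.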
Predicting Mean180

[Figure: four panels (m180_1, m180_2, m180_3, m180_4) plotting mean180 against fitted values, with both axes running from 2.4 to 3.2.]

Particulars                         Model 1                      Model 2
Predictors                          FB, DAFB, Mean40, Mean130    FB, DAFB, Mean50, Mean60, Mean100
RMFSE, inch                         0.0576                       0.3684
RMFSE, % of average harvest size    1.993%                       12.75%

I do spend a significant amount of time modeling genes. Currently I am working on a project where we are trying to establish early-stage markers for hepatocellular carcinoma (HCC), a type of liver cancer believed to occur after chronic liver disease. The idea is to follow people who have chronic hepatitis B or C and see which genes have changed (methylation). Methylation is a reversible process, so if we find the groups of genes that are methylated early on, maybe we can stop the cancer early. The end product of the research is a therapy that might replace chemo (which doesn’t work for liver cancers anyway).

My part is identifying the specific genes, out of the entire genome, by following their expression across the stages of liver disease, and establishing pathways for the genes and proteins involved in the process.

We had 4 groups:
• T: Cancer Tumor Group (people with active HCC)
• C: Chronic Group (people with active Hep B or Hep C)
• A: Asymptomatic Group (people who have had Hep B/C but have had a liver transplant)
• B: normal healthy people

We wanted to see the effect of methylation, or its absence, on specific genes that are known to be oncogenes or tumor suppressor genes. I will share some pictures.

In any testing, false positives are always a risk.
But in larger studies (fMRI brain imaging, genomic and proteomic studies), where thousands of tests are conducted simultaneously, false positives (FPs) are more than a nuisance: they can have drastic consequences, including financial ones. One cannot eliminate FPs, but one can reduce the probability of their occurrence. One option people often use is to look at the top k genes or pixels (ordered by some criterion). One question I am investigating is the expected amount of misclassification when using these top-k tables. How do we decide how to pick the top k? What does k depend upon?

$$
\begin{aligned}
P\left(u_{(k)} > v_{(N-k-r)}\right)
&= \int_{-\infty}^{\infty} P\left(u_{(k)} > v \mid v\right) g(v)\,dv \\
&= \int_{-\infty}^{\infty} \int_{v}^{\infty} h(u)\,g(v)\,du\,dv \\
&= \int_{-\infty}^{\infty} \left\{ \int_{v}^{\infty} k\,[\Phi(u)]^{k-1}\phi(u)\,du \right\} g(v)\,dv \\
&= \int_{-\infty}^{\infty} \left\{ \int_{v}^{\infty} k\,[\Phi(u)]^{k-1}\phi(u)\,du \right\}
   \frac{(N-k)!}{(N-k-r-1)!\,r!}\,[\Phi(v)]^{N-k-r-1}\,(1-\Phi(v))^{r}\,\phi(v)\,dv
\end{aligned}
$$

I have also worked with:
• an anthropologist interested in comparing the hunting patterns of two tribes
• a plant pathologist studying potato blight disease (the cause of the Irish Famine)
• an ecologist looking at heavy metal contamination of WA lakes

I work on many interesting (at least to me) real problems. For some of these I see immediate applications; for others I have to wait. But each problem has its own story, and that makes it interesting to be able to contribute something to the real world.

Please feel free to ask me questions and email me: dasgupta@wsu.edu
