Introduction to Program R

ปราณี นิลกรณ์

R is a language and software package for statistical computing and graphics. It is free, open-source software, developed from the S language (Bell Labs) by Robert Gentleman and Ross Ihaka of the University of Auckland, New Zealand, in 1995 (B.E. 2538). R suits both writing your own programs and use as a ready-made package: it offers a large collection of statistical functions, and contributors keep adding more.

Everything about R can be found at http://www.R-project.org

The R system has two main parts:
1. Base system: the R language software plus other frequently needed components
2. User contributed add-on packages

Where can I get R?
• Download it from www.r-project.org or http://CRAN.R-project.org and install the base package.
• For a menu-driven interface, also install Rcmdr.

R provides facilities for:
• Data and memory management
• Array and matrix computation
• Statistical data analysis
• Graphics
• Programming

RGui (GUI: graphical user interface) consists of:
• The R Console window, for entering commands and showing results
• The R Graph window, for displaying plots
• The Script window, for writing and editing commands and programs

• R has contributed packages for statistical computation and data analysis that can be downloaded and used quickly and conveniently. Packages also exist for newer analysis techniques beyond classical statistics, such as data/text mining.
• Statisticians who research and develop new statistical methods commonly publish them as R packages.

Working with R generally means either:
• Typing R commands at the command line interface, or
• Loading a file that already contains R commands (a script file) and running it

This is slow, but it has advantages:
• The steps of the analysis are recorded and can be saved as a file for each job.
• When an error occurs, you can tell at which step it happened.
• If the analysis has many steps, the commands can be re-run without clicking through everything again.

Getting help on a function:
> ?t.test
or
> help(t.test)

Advantages of R
• Fast and free.
• State of the art: statistical researchers provide their methods as R packages. SPSS and SAS are years behind R!
• 2nd only to MATLAB for graphics.
• Mx, WinBugs, and other programs use or will use R.
• Active user community.
• Excellent for simulation, programming, computer-intensive analyses, etc.
• Forces you to think about your analysis.
• Interfaces with database storage software (SQL).

Disadvantages of R
• Not user friendly at the start: steep learning curve, minimal GUI.
• No commercial support; figuring out correct methods or how to use a function on your own can be frustrating.
• Easy to make mistakes and not know.
• Working with large datasets is limited by RAM.
• Data prep and cleaning can be messier and more mistake prone in R vs. SPSS or SAS.
• Some users complain about hostility on the R listserve.

R compared with commercial packages (e.g. SPSS, SAS)
• R: Many different datasets (and other "objects") available at the same time.
  Commercial: One dataset available at a given time.
• R: Datasets can be of any dimension.
  Commercial: Datasets are rectangular.
• R: Functions can be modified.
  Commercial: Functions are proprietary.
• R: Experience is interactive: you program until you get exactly what you want.
  Commercial: Experience is passive: you choose an analysis and they give you everything they think you need.
• R: One-stop shopping: almost every analytical tool you can think of is available.
  Commercial: Tend to have limited scope, forcing you to learn additional programs; extra options cost more and/or require you to learn a different language (e.g., SPSS Macros).
• R: Free and will continue to exist. Nothing can make it go away, and its price will never increase.
  Commercial: They cost money. There is no guarantee they will continue to exist, but if they do, you can bet that their prices will always increase.

Variables
> a = 49                          # numeric
> sqrt(a)
[1] 7
> a = "The dog ate my homework"   # character string
> sub("dog","cat",a)
[1] "The cat ate my homework"
> a = (1+1==3)                    # logical
> a
[1] FALSE

> a = c(7,5,1)                    # a vector of three numbers
> a[2]
[1] 5

List: an ordered collection of data of arbitrary types.
> doe = list(name="john", age=28, married=FALSE)
> doe$name
[1] "john"
> doe$age
[1] 28

Data frame: represents the typical data table that researchers come up with, like a spreadsheet. It is a rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.

Example:
       localisation  tumorsize  progress
XX348  proximal            6.3  FALSE
XX234  distal              8.0  TRUE
XX987  proximal           10.0  FALSE

> x <- c(1,3,2,10,5); y <- 1:5   # creation of 2 vectors
> x
[1]  1  3  2 10  5
> x+y
[1]  2  5  5 14 10
> x*y
[1]  1  6  6 40 25
> x/y
[1] 1.0000000 1.5000000 0.6666667 2.5000000 1.0000000
> x^y
[1]     1     9     8 10000  3125
> sum(x)      # sum of elements in x
[1] 21
> cumsum(x)   # cumulative sum vector
[1]  1  4  6 16 21

> # 20 random numbers between 0 and 20, rounded to one decimal
> x = round(runif(20,0,20), digits=1)
> x
 [1] 10.0  1.6  2.5 15.2  3.1 12.6 19.4  6.1
 [9]  9.2 10.9  9.5 14.1 14.3 14.3 12.8
[16] 15.9  0.1 13.1  8.5  8.7
> min(x)
[1] 0.1
> max(x)
[1] 19.4
> median(x)   # median
[1] 10.45
> mean(x)     # mean
[1] 10.095
> var(x)      # variance
[1] 27.43734
> sd(x)       # standard deviation
[1] 5.238067
> sqrt(var(x))
[1] 5.238067
> length(x)
[1] 20
> round(x)
 [1] 10  2  2 15  3 13 19  6  9 11 10 14 14 14 13 16  0 13  8  9
> cor(x,sin(x/20))   # correlation
[1] 0.997286
> quantile(x)        # quantiles
   0%   25%   50%   75%  100%
 0.10  7.90 10.45 14.15 19.40

• Sample tests
  ◦ Checking normality
    • Kolmogorov-Smirnov test

> # generate 500 observations from the uniform (0,1) distribution
> F500 <- runif(500); a <- c(mean(F500), sd(F500))
> qqnorm(F500)   # normal probability plot
> qqline(F500)   # a normal sample would fall near the straight line
> ks.test(F500, "pnorm", mean=a[1], sd=a[2])

        One-sample Kolmogorov-Smirnov test

data:  F500
D = 0.0655, p-value = 0.02742
alternative hypothesis: two.sided
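The data frame example shown earlier (localisation, tumorsize, progress) comes without the commands that create it. The block below is a minimal sketch of one way to build and query that table, assuming only the three rows shown in the example. The object name `tumors` is hypothetical and does not come from the original slides.

```r
# Build the example table as a data frame.
# The name `tumors` is an assumption made for illustration only.
tumors <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(FALSE, TRUE, FALSE),
  row.names    = c("XX348", "XX234", "XX987")
)

# Show the structure: each column keeps its own type.
str(tumors)

# Columns are accessed with $, in the same way as list elements.
mean(tumors$tumorsize)

# Rows can be selected by a logical condition ...
tumors[tumors$progress, ]

# ... or by row name, optionally together with a column name.
tumors["XX234", "tumorsize"]
```

Passing `row.names` to `data.frame()` keeps the sample identifiers as row labels rather than as a data column, which matches the layout of the example table.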
