METU Computer Engineering
Ceng574
Assignment 1: Dataset selection and presentation
Seeds Data Set

Student: 1714443 Alperen Eroğlu
Date: 22.10.2012
Instructor: Prof. Dr. Volkan ATALAY

Outline
1. Name
2. Origin
3. Short description
4. Dimension of feature vector (number of attributes)
5. Number of classes or groups
6. Number of samples (objects)
7. Explanation of each attribute: label, explanation, type, min, max, mean, std. deviation
8. Goal
9. Any previous work on this dataset

Name and Origin
Seeds Data Set
Source:
Małgorzata Charytanowicz, Jerzy Niewczas
Institute of Mathematics and Computer Science, The John Paul II Catholic University of Lublin, Konstantynów 1 H, PL 20-708 Lublin, Poland
e-mail: {mchmat,jniewczas}@kul.lublin.pl
Piotr Kulczycki, Piotr A. Kowalski, Szymon Lukasik, Slawomir Zak
Department of Automatic Control and Information Technology, Cracow University of Technology, Warszawska 24, PL 31-155 Cracow, Poland
and Systems Research Institute, Polish Academy of Sciences, Newelska 6, PL 01-447 Warsaw, Poland
e-mail: {kulczycki,pakowal,slukasik,slzak}@ibspan.waw.pl

Short Description
● Measurements of geometrical properties of kernels belonging to three different varieties of wheat: Kama, Rosa and Canadian.
● A soft X-ray technique and the GRAINS package were used to construct all seven real-valued attributes.
● The data set can be used for classification and cluster analysis tasks.

Dimension of feature vector
● The number of attributes in this dataset is 7.
● Seven geometric parameters of wheat kernels were measured.
● All of these parameters are real-valued and continuous.

Number of classes and samples
● The number of classes in this dataset is 3, one for each wheat variety: Kama, Rosa and Canadian.
● The number of instances in this dataset is 210.
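The figures on the slides above (7 attributes, 3 classes, 210 instances) can be checked once the data is loaded. The following is only a minimal sketch, assuming the file stores one kernel per row as seven whitespace-separated real values followed by an integer class label, and assuming the label coding 1 = Kama, 2 = Rosa, 3 = Canadian. Neither the file layout nor the label coding is stated in the slides, and the sample rows are made up for illustration.

```python
from collections import Counter

# Assumed label coding (not stated in the slides): 1 = Kama, 2 = Rosa, 3 = Canadian.
VARIETIES = {1: "Kama", 2: "Rosa", 3: "Canadian"}
N_FEATURES = 7  # dimension of the feature vector, as given on the slides


def parse_seeds(text):
    """Parse whitespace-separated rows: 7 real features followed by a class label."""
    features, labels = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != N_FEATURES + 1:
            raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(fields)}")
        features.append([float(v) for v in fields[:N_FEATURES]])
        labels.append(int(fields[N_FEATURES]))
    return features, labels


def summarize(features, labels):
    """Report sample count, feature dimension, class count and per-class counts."""
    counts = Counter(VARIETIES[label] for label in labels)
    return {
        "n_samples": len(features),
        "n_features": N_FEATURES,
        "n_classes": len(counts),
        "per_class": dict(counts),
    }


# Made-up rows for illustration only; they are NOT taken from the real file.
SAMPLE = """\
15.0 14.5 0.87 5.6 3.3 2.2 5.2 1
18.5 16.2 0.89 6.1 3.7 3.5 6.0 2
11.8 13.2 0.85 5.2 2.8 4.9 5.0 3
"""

features, labels = parse_seeds(SAMPLE)
info = summarize(features, labels)
print(info)
```

On the full dataset, the same summary would be expected to report 210 samples and 3 classes if the slide figures and the assumed file layout both hold.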

Explanation of each attribute (1)
1. area A
2. perimeter P
3. compactness C = 4*pi*A/P^2
4. length of kernel
5. width of kernel
6. asymmetry coefficient
7. length of kernel groove

Explanation of each attribute (2)
Label  Type  Min     Max     Mean        Std. dev.
1      real  10.59   21.18   14.8475238  2.90276331
2      real  12.41   17.25   14.5592857  1.30284559
3      real  0.8081  0.9183  0.87099857  0.02357309
4      real  4.899   6.675   5.62853333  0.44200731
5      real  2.63    4.033   3.25860476  0.37681405
6      real  0.7651  8.456   3.70020095  1.49997296
7      real  4.519   6.55    5.408071    0.23603043

Goal
The goal is to predict which of the three wheat varieties (Kama, Rosa or Canadian) a kernel belongs to.

Previous Works
● Relevant papers: M. Charytanowicz, J. Niewczas, P. Kulczycki, P.A. Kowalski, S. Lukasik, S. Zak, 'A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images', in: Information Technologies in Biomedicine, Ewa Pietka, Jacek Kawa (eds.), Springer-Verlag, Berlin-Heidelberg, 2010, pp. 15-24.
● Citation request: Contributors gratefully acknowledge support of their work by the Institute of Agrophysics of the Polish Academy of Sciences in Lublin.

References
● http://archive.ics.uci.edu/ml/datasets/seeds
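
The compactness definition from the attribute list, C = 4*pi*A/P^2, and the min, max, mean and std. dev. columns of the attribute table can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, the input values are made up, and the use of the population standard deviation is an assumption, since the slides do not say which form was used.

```python
import math
import statistics


def compactness(area, perimeter):
    """C = 4*pi*A / P^2; equals 1 for a circle and is smaller for any other shape."""
    return 4.0 * math.pi * area / perimeter ** 2


def attribute_stats(values):
    """Return min, max, mean and standard deviation of one attribute.

    Assumption: population standard deviation (pstdev). Swap in
    statistics.stdev for the sample standard deviation if that is intended.
    """
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }


# Sanity check of the formula: a circle of radius r has A = pi*r^2 and
# P = 2*pi*r, so its compactness is exactly 1.
r = 2.0
circle_c = compactness(math.pi * r ** 2, 2.0 * math.pi * r)

# Made-up groove lengths for illustration only; NOT taken from the dataset.
groove_lengths = [5.0, 5.2, 6.0]
stats = attribute_stats(groove_lengths)

# A mean must always lie between the minimum and the maximum, which makes
# this a cheap consistency check for any row of a summary table.
assert stats["min"] <= stats["mean"] <= stats["max"]
print(circle_c, stats)
```

Applying `attribute_stats` to each of the seven columns of the real data would give one table row per attribute.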