Dealing Effectively With Data
Section 1 Introduction

Introduction

What Happened to the Library?

The fact that you’re taking this course on the SUNY Learning Network, independent of any physical classroom space or library, has probably already clued you in that I have some reservations about the exact title of this course. Still, I will ask you to visit a library as part of your assignments. We hope that Feinberg Library is a friendly place for studying or getting together with friends to work on an assignment. We also have around 300,000 books and 1,400 magazine and journal subscriptions.

Yes, you can find lots of newspaper articles touting the “virtual library” or the “library without walls.” Ever tried to read an entire book or journal article online? I must admit I usually make a paper copy of magazine articles that I find online. If you think I’m wrong about this, Feinberg Library gives you access to a couple of online book services: NetLibrary and Books 24/7. Give them a try. Yes, you can get a book this way at any time of day or night. However, try reading it this way on a slow Internet connection or at the beach. I just don’t see the physical library building disappearing anytime soon.

Also, don’t forget about the people who work at Feinberg and other libraries. I know it’s unlikely that anybody is going to write a librarian role for Julia Roberts or Brad Pitt in the next romantic thriller. OK, I can accept that. On the other hand, most of the library staff don’t live for telling people to “shush” or quiet down. In fact, we completely redid our Reference room a couple of years ago to make it comfortable for small groups to do research together at the computers. The reference librarians and all members of the library staff are here to help you with your research. Don’t be afraid to ask us for help or just stop by to chat. Enough of my soapbox...
Change is the Only Constant

What are the challenges for you during this class and 10 to 20 years down the road? I know you’ve heard this one before, but it bears repeating for this class: the one constant you will encounter when working with computing and information environments is change. Web pages certainly do not stay the same. You’ll be lucky if the structure of the “library research resources” pages we’ll be using for a large part of the course doesn’t change during this semester. (Yes, we are planning another major overhaul for next fall.) Mainstream software products also change. My first word processor was WordPerfect. I think it’s still around, but you’d be hard pressed to find anyone who owns a current version. However, one of these days Bill Gates is going to lose a government lawsuit and have to open up the personal computing software field a bit.

Databases? We keep adding new ones. The InfoTrac that you used on CD-ROM in high school is now called IAC Searchbank. Its search interface may look a little different, but it’s the same product. All of our databases continually undergo minor tweaks to the search interface or to the way the results are displayed to you. Fortunately, once you understand how databases are set up, you will be able to adapt to these changes easily. We’ll talk about this in much more detail in chapter three.

A Brief History of Computing and the World Wide Web

As we start this course, keep in mind that the history of the computer and the World Wide Web is extremely recent. The precursors to computers, code-breaking machines built during World War II, date back only to the 1940s. The ENIAC, often considered the first real general-purpose electronic computer, was not completed until 1945. It filled a large room with vacuum tubes and weighed about 30 tons, all for what is, by today’s standards, very modest computing power. How did scientists feed information into these computers for processing?
Today we’re accustomed to typing into our word processors or typing in HTML, Java, or Visual Basic programs. Only about 20 years ago, the main way to enter information into one of these huge computers (huge in size, not in processing power) was to use punch cards. You’d type your BASIC program on a special typewriter-like machine (a keypunch) that punched holes in the cards to represent each command, and each line had to be on a separate card. For example, the following program prints the word HELLO 10 times:

10 REM PRINT HELLO 10 TIMES
20 FOR X=1 TO 10
30 PRINT "HELLO"
40 NEXT X
50 END

This simple program would have required five punch cards, one per line, which then had to be fed through a special punch card reader into the computer.

Let me give you some idea of how much computing power has increased since as recently as the 1960s. In 1965 Gordon Moore, who went on to co-found Intel (as in your Pentium processor), made an observation that has since become known as Moore’s law. The number of transistors per integrated circuit is a rough measure of a chip’s complexity and, indirectly, of its computing power. Moore predicted that this number would double every year and that this growth would continue at least until 1975. In 1975 he revised the pace to a doubling about every two years, and the often-quoted figure of 18 months came later. Some version of this steady doubling still holds up well today. Here’s an illustration from Intel’s own web site showing increases in computing power. I’m sure you recognize some of these computing models from your own experience. My first computer when I started here at PSU was a 286. The computer I have now is a Pentium III. It pretty much blows anything I could do with the 286 out of the water, and I’m sure many of you have computer setups a lot more powerful than mine!

So How Does This Internet-Web Fit into the Picture?

Sorry Al, you didn’t invent the Internet.
However, I must credit former Vice President Al Gore with working hard to popularize the Internet for the average person and with sponsoring legislation that helped the Internet grow. Like Moore’s law, the Internet actually dates from the 1960s. It began as ARPANET, a network funded by the Defense Department’s Advanced Research Projects Agency (ARPA, later renamed DARPA). It was used primarily to transfer information (files and data) between scientists and government researchers. It did three main things: Telnet (logging in directly to another computer), FTP (File Transfer Protocol), and a very rudimentary form of email. Today, in addition to these three things, it also carries HTTP (Hypertext Transfer Protocol), which is what we use to bring up web pages. Do I expect you to remember ARPANET or what FTP stands for? No. I just want to make sure you understand the difference between the World Wide Web and the Internet. The World Wide Web is really a subset of the Internet. Email, for example, is an Internet service that is not part of the Web.

There’s another feature of Internet history that you need to know about to understand today’s Internet. The D in DARPA stands for Defense, as in the U.S. Department of Defense. You’ll recall that the 1960s and 1970s were the height of the Cold War, with events such as the Cuban missile crisis and the nuclear ICBM buildup. Russian missiles were considered a real threat, and we got to do “duck and cover” drills in school. (I doubt this would have done much good against a nuclear bomb, but I just want to give you some sense of the time.) Because of this, the Internet was set up to be redundant. What do I mean by this? On the Internet, information gets passed through various intermediate points on the way to its final destination. Those points can be changed along the way if a particular point (location) is knocked out, for example by a nuclear attack. This is called routing, and it actually came in very useful during the recent attack on New York City.
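To make the idea of routing around a damaged point concrete, here is a minimal sketch in Python. It models a tiny hypothetical network as a list of direct links. The node names are made up for illustration. A breadth-first search finds a path from a campus to a web server, and then finds a new path when one hub is marked as down. This only illustrates the idea of redundancy. Real Internet routers use dedicated routing protocols such as OSPF and BGP and do not work exactly this way.

```python
from collections import deque


def find_route(links, start, goal, down=frozenset()):
    """Breadth-first search for a path from start to goal that skips any node in `down`.

    Returns the path as a list of node names, or None if no working path exists.
    """
    if start in down or goal in down:
        return None
    previous = {start: None}  # remembers which node we came from to reach each node
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through `previous` to rebuild the path, then reverse it.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, []):
            if neighbor not in previous and neighbor not in down:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical network: each node lists the nodes it is directly linked to.
links = {
    "Campus": ["NYC-hub", "Albany-hub"],
    "NYC-hub": ["Campus", "Backbone"],
    "Albany-hub": ["Campus", "Backbone"],
    "Backbone": ["NYC-hub", "Albany-hub", "Web-server"],
    "Web-server": ["Backbone"],
}

print(find_route(links, "Campus", "Web-server"))                    # normal route
print(find_route(links, "Campus", "Web-server", down={"NYC-hub"}))  # route around a failed hub
```

With every point working, the first call goes through the New York hub. With that hub marked as down, the second call finds a path through the Albany hub instead. If every possible path is cut, the function returns None.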
Our SUNY campuses in and near the city were able to keep their Internet access because traffic was routed through other points of the system that were not affected. Ultimately, for you as an Internet user, this redundancy has one major consequence: the Internet, and the information you pull off the World Wide Web today, has no central hierarchy. It has no governing board or authority that decides what information is available or who (person, organization, government) has the right to put information on the Internet. As a result, you need to be extremely careful to figure out who exactly is the source of the data you’re pulling off the web.

Just how much stuff is there really on the World Wide Web part of the Internet?

Interestingly enough, the World Wide Web did not yet exist when I was in college. Tim Berners-Lee created the Web between 1989 and 1991, but most people trace its popular arrival to around 1993, when Mosaic appeared. Mosaic was a graphical web browser and a precursor to Netscape and Internet Explorer (IE). Around the same time, many people browsed Internet information with Gopher, a text-based (no graphics) menu system. So besides having no central authority, the World Wide Web that we take for granted when we use Netscape or IE is less than 10 years old. Hopefully this will go a long way toward explaining the problems with Internet search engines when we get to Module 7.

Let’s take a quick look at the growth of the World Wide Web from Hobbes’ Internet Timeline, which is considered one of the standard sources for tracking the WWW. As you can see, in 1993 there were only 130 WWW sites, most of them run by government agencies or universities. Remember that there is a difference between a web site and a web page. Each document that you view as part of this SLN course is an individual web page. The entire course is a web site.
Hobbes’ timeline shows close to 40 million web sites by the end of 2001, and each of these sites can contain thousands of pages. Interestingly, many of these web sites are less than three years old. No wonder you can’t find anything on the WWW. It has grown so quickly that search engine software can’t keep up!
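To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It uses the figures above: about 130 sites in 1993 and close to 40 million by the end of 2001. For illustration only, it assumes steady exponential growth over a span of about 8.5 years (mid-1993 to the end of 2001). Real growth was not this smooth. For comparison, it also computes how much something that doubles every 18 months (the popular Moore’s law figure) would grow over the same span.

```python
import math


def doubling_time_months(start_count, end_count, years):
    """Months per doubling implied by smooth exponential growth from start_count to end_count."""
    doublings = math.log2(end_count / start_count)
    return years * 12 / doublings


def growth_factor(years, doubling_months):
    """How many times larger something gets after `years` if it doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


# Figures from the text: about 130 web sites in 1993, close to 40 million by the end of 2001.
# ASSUMPTION: the span is taken to be about 8.5 years (mid-1993 to the end of 2001).
SPAN_YEARS = 8.5
web_months = doubling_time_months(130, 40_000_000, SPAN_YEARS)
moore_factor = growth_factor(SPAN_YEARS, 18)

print(f"Web sites: one doubling about every {web_months:.1f} months")
print(f"18-month doubling over the same span: about {moore_factor:.0f} times larger")
```

Under these assumptions, the number of web sites doubles roughly every five to six months. An 18-month doubling gives only about a fifty-fold increase over the same span.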