Evolution of the Internet as an Expanding Platform for Global Research
Most IT specialists and consultants use the Internet for research, but how many are using it to its full capacity? We have come to rely on Google, or one of the other leading search engines, to supply us with the information we require, but how do we access the information Google’s spiders can’t find? And, more importantly, how do we keep one step ahead of our competition in the race for up-to-date information?
We all realize that the Internet provides access to a wealth of information on a vast number of topics, but we have to be aware that the quality of information is only as good as its source. Information is compiled from a widely diverse bank of contributors. Academic theses, corporate white papers, and simple individual opinions all have different contexts and are variously reliable. While it is tempting to think of the Internet as a huge library where all the holdings have been catalogued according to consistent criteria and with consistent language, this is simply not the case. The cataloguing is inconsistent, the amount of material is vast and increasingly difficult to control, and much of it lacks credibility.
The Internet has vast potential for technical research. Scientists, engineers and students increasingly use and adapt the Internet to collaborate with colleagues throughout the world. Networked information and data form the backbone of modern research. The next-generation Internet is continually evolving out of this collaborative process, and we need to be aware of access issues that limit opportunities for global collaboration. Technical, economic and legal factors all affect Internet-enabled collaborative research.
EXPLORE THE DEEP WEB
It is important that we understand how to access the deep, or invisible, web. The deep web contains content that is not normally found by search engines such as Google or Lycos. The standard search programs, or spiders, that read web pages and identify content cannot, and often will not, query databases whose content is not published as static web pages. A Google search, for example, may identify that a university library database exists, but it will not search the content of that database. That we must do ourselves. A treasure trove of information exists in the deep web, and we can adapt our search strategies to find it. The directory at www.invisible-web.net lists high-quality databases of particular interest to global researchers.
UNDERSTANDING WEB SEARCH ENGINES
Most search engines search Internet sites for keywords that correspond to the keywords the user has entered. A web search engine consists of three components:
1. Spider: A program that traverses the web from link to link. It identifies meta-tags that have been placed into websites for identification purposes. It also reads web pages and collects and indexes the content. Different spiders have different capabilities and use different analytic algorithms.
2. Index: A database that contains a copy of the information that the spider has selected and gathered from each web page.
3. Search Engine Mechanism: Software that enables the user to query the index and that usually returns results in relevancy-ranked order. Relevancy is often based on how many of the user's search terms a page matches, and on how many times each search term is repeated in the data that the spider has indexed for that page.
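The three components can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a description of how any real search engine works: the "web" is a hypothetical in-memory dictionary (TOY_WEB), the function names spider, build_index and search are invented for illustration, and relevancy is reduced to the two signals described above (how many query terms a page matches, then how often those terms occur).

```python
from collections import Counter, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("lumberjacks in british columbia fell trees",
                  ["b.example", "c.example"]),
    "b.example": ("forestry contractors and lumberjacks hire lumberjacks",
                  ["a.example"]),
    "c.example": ("weather forecasting in canada", []),
}


def spider(start_url, web):
    """Spider: follow links breadth-first and collect the text of each page."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Index: store, for every page, how often each term occurs."""
    return {url: Counter(text.lower().split()) for url, text in pages.items()}


def search(index, query):
    """Search mechanism: rank pages by matched terms, then by term frequency."""
    terms = query.lower().split()
    results = []
    for url, counts in index.items():
        matched = sum(1 for term in terms if counts[term] > 0)
        if matched == 0:
            continue  # page contains none of the query terms
        occurrences = sum(counts[term] for term in terms)
        results.append((matched, occurrences, url))
    # Most matched terms first, then most occurrences, then URL for stability.
    results.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return [url for _, _, url in results]


index = build_index(spider("a.example", TOY_WEB))
print(search(index, "lumberjacks columbia"))
```

In this toy data, a page that matches both query terms ranks ahead of a page that repeats only one of them, which mirrors the ordering of the two relevancy signals in the sketch. Real engines combine many more signals.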
HOW TO FORMULATE YOUR QUERIES
There are three key steps to specifying a computer database search:
1. Identify key concepts: Break the topic down into its component concepts. For example, if you wanted information on lumberjacks working in British Columbia, Canada, you might use: LUMBERJACKS BC CANADA
2. List alternative keywords for each concept: Identify different ways of saying the same thing for each concept. Some concepts may be specific enough to need only one keyword, while others may have a number of alternative search terms:
a. LUMBERJACKS
i. Lumberjacks
ii. Tree Fellers
iii. Forestry Contractors
b. BC
i. BC
ii. British Columbia
iii. Western Canada
c. CANADA
3. Specify the logical relationship between keywords: The relationship between words is usually expressed in Boolean logic. You specify relationships by using any of three logical operators: AND, OR and NOT. In addition, the plus sign (+) may be used instead of AND.
Hence: LUMBERJACKS+BC+CANADA
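The three steps above can be sketched in Python as a small query builder. This is an illustration under assumptions rather than the syntax of any particular search engine: the function name build_query is hypothetical, and the use of parentheses for grouping and double quotes for multi-word phrases is a common convention that not every search tool supports. Alternatives for one concept are joined with OR, and the concepts themselves are joined with AND.

```python
def build_query(concepts):
    """Combine alternative keywords with OR and separate concepts with AND.

    `concepts` is a list of lists; each inner list holds the alternative
    keywords for one concept. Quoting and parentheses are assumed syntax.
    """
    groups = []
    for alternatives in concepts:
        # Quote multi-word phrases so they are treated as a single term.
        quoted = [f'"{a}"' if " " in a else a for a in alternatives]
        group = " OR ".join(quoted)
        if len(quoted) > 1:
            group = f"({group})"  # keep the OR alternatives together
        groups.append(group)
    return " AND ".join(groups)


# The concepts and alternative keywords from the lumberjack example.
concepts = [
    ["lumberjacks", "tree fellers", "forestry contractors"],
    ["BC", "British Columbia", "Western Canada"],
    ["Canada"],
]

print(build_query(concepts))
# (lumberjacks OR "tree fellers" OR "forestry contractors") AND (BC OR "British Columbia" OR "Western Canada") AND Canada
```

Check the help pages of the database or search engine you are using before relying on this form, since operator spelling and grouping rules differ between tools.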
COLLABORATION IN GLOBAL RESEARCH
The USA and the European Union are collaborating on a research agenda for global access to a large scientific database in biology, physics and other scientific disciplines. The research community is currently assembling large volumes of data for this database. At the same time, new technologies are being developed to store, access and extract the information that this increasing collaboration continues to compile.
The Internet itself, and the supporting technologies that are evolving around this global endeavour, will require international collaboration if they are to succeed. Effective global data communications will need high bandwidth. We will have to resolve access rights to complex information, and we will have to develop high-fidelity scientific modeling so that we can actually find complex data.
WORLD-WIDE NEXT GENERATION INTERNET TO BE ESTABLISHED FOR RESEARCH AND EDUCATION
As the Internet grows larger and more crowded, governments, scientists and universities need new ways to send large volumes of information quickly. Internet2 and the Next Generation Internet (NGI) are two projects that address the need to move lots of data fast.
Internet2 is sponsored by high-tech companies and universities, while the Next Generation Internet is a government project. Both projects hope to develop new, faster technologies to enhance research and communication, and both projects will eventually improve the current commercial Internet.
Internet2 in the US, CANARIE in Canada, and the NREN Consortium in Europe are key leaders in advanced networking. They recently announced the formation of the Global Terabit Research Network (GTRN: www.indiana.edu/~gtrn). The goal of this international partnership is to establish a true world-wide next generation Internet that will connect national and multinational high-speed research and education networks. Participation of the Asia Pacific and other regions is expected soon.
Douglas Van Houweling, the President and CEO of Internet2, makes the scope and goals of the new Internet clear. He writes:
“The scientific community is now truly international in just about all fields, and many vitally rely on the integration of computation, data, instruments and arrays of sensors that enable e-science. The GTRN will provide a framework in which the advanced networking community can collectively manage and provision the global-scale, high-performance, persistent infrastructure required by the research and education communities.”
Fernando Liello, the Chairman of the European NREN Consortium, also shows the leadership role that the scientific community is playing in the evolution of more effective e-collaboration. He writes:
“The GTRN will provide the connectivity and advanced Internet services needed by major multinational scientific collaborations in areas such as high-energy physics, radio and optical astronomy, weather forecasting and climatology, biological sciences and earth sciences.”
Clearly, both business and the individual consumer will profit from the lead that the scientific community has taken in evolving more efficient ways to move, catalogue and find information.
© 2005 Global Reach Publishing Inc.