Informationsplattform Open Access: OA-Search engines

Open Access search engines

Search engines of relevance to Open Access

Traditionally, literature searches and literature procurement have been two separate procedures. Thanks to Open Access (OA), this is no longer the case. In addition to its well-known advantages, Open Access makes thematic searches more effective because direct access to retrieved documents frequently yields information which enables searchers to optimise or adjust their search strategies. This calls for search engines which can find and index OA documents. Because appropriate metadata are available for many OA documents, bibliographic searches are another important field of application for such search engines. The present web page offers an overview of interdisciplinary search engines suitable for searching for Open Access documents.

BASE is a multi-disciplinary academic search engine for freely-accessible documents. Operated by Bielefeld University Library, BASE focuses mainly on document servers. 

Content:

19 million documents from some 1100 sources (as of October 2009); a large proportion of the sources are OAI-compatible and are indexed in accordance with the OAI-PMH standard (Open Archives Initiative - Protocol for Metadata Harvesting); a small proportion of the sources are accounted for by scientific and scholarly organisations and websites and by digitised media of Bielefeld University Library. 
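To illustrate what harvesting in accordance with OAI-PMH involves, the following minimal Python sketch builds the request URLs a harvester might send to a repository. The verb ListRecords and the arguments metadataPrefix, set and resumptionToken are defined by the protocol. The base URL and the function name are hypothetical placeholders, and the sketch does not describe how BASE itself is implemented.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it is sent with the verb only
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint, for illustration only
BASE_URL = "https://repository.example.org/oai"

# First request: ask for all records in unqualified Dublin Core
print(list_records_url(BASE_URL))
# Follow-up request: continue a partial list with the token from the last response
print(list_records_url(BASE_URL, resumption_token="token123"))
```

A harvester would repeat the follow-up request until the repository returns no further resumption token.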

Content ranges from articles, books, theses and dissertations, papers and reviews to audio recordings, images, videos, primary data and musical scores.

Indexing:

Indexes all metadata and the full texts from some 40 sources.

Search options:

Basic search: Google-style search via a single search box. Advanced search with differentiated bibliographic search options (seven search aspects in all) and drop-down menus to limit the search to certain document/media types, regions and source types. Search-term expansion via an automatic search for other word forms (plural, genitive), multilingual search or inclusion of related search terms (Eurovoc thesaurus). Search terms can be truncated (by using an asterisk as a placeholder for characters) or joined with logical operators. There is a dynamically generated drop-down menu for the refinement of search results.
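How truncation with an asterisk and the joining of terms with logical operators behave can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the asterisk is taken to stand for any run of word characters, and the function names are hypothetical. It is not BASE's actual query parser.

```python
import re

def term_to_regex(term):
    """Turn a search term with '*' placeholders into a whole-word pattern."""
    parts = [re.escape(part) for part in term.split("*")]
    # Each '*' becomes "any number of word characters"
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)

def matches_all(text, terms):
    """Logical AND: every term must occur in the text."""
    return all(term_to_regex(t).search(text) is not None for t in terms)

def matches_any(text, terms):
    """Logical OR: at least one term must occur in the text."""
    return any(term_to_regex(t).search(text) is not None for t in terms)

title = "Digital libraries and open repositories"
print(matches_all(title, ["librar*", "repositor*"]))  # truncated terms joined with AND
print(matches_any(title, ["library", "archive"]))     # untruncated terms joined with OR
```

In this model the truncated term "librar*" finds both "library" and "libraries", whereas the untruncated term "library" finds only that exact word form.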

Results display and additional functionalities:

As an alternative to ranking results according to relevance, other sorting options are offered (for example, sorting by year of publication). BASE also offers detailed descriptions of search results and a cross-searching link to Google Scholar in order to find citing documents or other versions.

Assessment:

Can be recommended both for thematic and bibliographic searches. Advantages: user friendly; efficient retrieval functionalities suitable for metadata searches; comparatively high bibliographic reliability. Because of the sources it covers, BASE can retrieve a large number of documents from the Invisible Web. A comparatively high proportion of the results feature links to full texts.

Google Scholar is a Google search engine that specialises in scholarly and scientific literature. Its particular potential from an Open Access point of view lies in the fact that freely-accessible scholarly resources in the results list are marked with an identifying symbol.

Although, officially, Google Scholar still has beta status, it has already proven itself to be fully effective in practice.

Content:

No exact details are available from Google. The results of full-text searches are roughly comparable to those of Scirus.

The lion's share of the documents is accounted for by journal articles. Other document types include conference papers, preprints, postprints, reports, theses and dissertations, term papers, metadata from indexing/abstracting services, books (Google Books) and citations.

OAI-compatible repositories are less comprehensively indexed than, for example, in OAIster or OpenDOAR Search. However, Google Scholar does index a considerable number of web documents and postprints located, for example, on personal or institutional websites and in non-OAI-compatible repositories.

Identification of freely-accessible documents in the results list:

Freely-accessible documents in the results list (with a link from the title to the full text) are flagged with a green triangle located to the left of the title. If there is a freely-accessible version of a restricted-access document in the results list, the link to the full text (server name) is displayed to the right of the title. The proportion of results containing a link to the full text ranges from 25% to 60%, depending on the topic.

Indexing:

Full-text indexing of the documents/document descriptions.

Search options:

Besides the familiar Google keyword and phrase-search options, the advanced search offers additional bibliographic search fields for author, source (especially journal) and publication period. However, the automatic identification of these bibliographic data is not reliable and is sometimes faulty. The ranking system is similar to that used by the main Google search engine. Citation frequency is a major ranking criterion.

Results display and additional functionalities:

Multiple versions of one publication are grouped. The "Cited by" feature lists documents in the Google Scholar database that have cited documents in the group.
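The idea of grouping multiple versions of one publication can be illustrated with a small Python sketch. Google does not publish how its grouping works, so the approach shown here (matching on a normalised title) is purely an assumption for illustration, as are the record fields and URLs.

```python
import re
from collections import defaultdict

def title_key(title):
    """Normalise a title: lower-case, strip punctuation, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())

def group_versions(records):
    """Group record URLs under the normalised title they share."""
    groups = defaultdict(list)
    for record in records:
        groups[title_key(record["title"])].append(record["url"])
    return dict(groups)

# Hypothetical records: a publisher's version and a repository postprint
records = [
    {"title": "Finding Open Access Articles",
     "url": "https://publisher.example.com/article/1"},
    {"title": "Finding open-access articles.",
     "url": "https://repository.example.org/postprint/1"},
]
for key, urls in group_versions(records).items():
    print(key, urls)
```

Both records fall into one group because their titles differ only in capitalisation and punctuation. A production system would need further signals, such as authors and year, to avoid merging distinct works with identical titles.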

Assessment:

Particularly suitable for initial thematic research. Because full texts are indexed and a wide range of sources is covered, a comparatively large number of freely-accessible documents are retrieved. Hence, the documents featured on the initial results pages are correspondingly relevant. Additional searches for the latest research are not really necessary because both freely-accessible and restricted-access documents are featured.

Because of the unreliable identification of bibliographic data and the absence of a chronological sorting option, Google Scholar cannot really be recommended for bibliographic searches.

OAIster is a comprehensive metadata catalogue of the documents in OAI-compatible repositories. From the beginning of 2002 to the end of October 2009 it was the benchmark OAI search engine project of the University of Michigan. In October 2009 OAIster's records transitioned to the WorldCat database, which is operated by the semi-commercial library service OCLC.

Content:

As of mid-October 2009, the OAIster database contained 23 million documents from approximately 1150 digital sources, which are indexed in accordance with the OAI-PMH standard (Open Archives Initiative - Protocol for Metadata Harvesting). Resources range from articles, books and audio recordings to images, movies and data collections.

Indexing:

Metadata indexing.

OAN-Suche (Open Access Network Search) is a search interface for documents in DINI-certified repositories. It is being developed within the framework of Open-Access-Netzwerk (Open Access Network), a project funded by the German Research Foundation (DFG). Although still at the alpha stage, the first version of the interface, which is limited to the basic search and browsing functions, can already be viewed and used. The underlying system is implemented as a distributed application on the computers of the project partners. OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting) is used to export the metadata for the search from the repositories; the metadata are linked in turn to the full texts stored at diverse locations.
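How exported metadata are linked to full texts stored elsewhere can be sketched as follows. The Python example parses a small, invented OAI-PMH response in the standard oai_dc format and extracts the title and the dc:identifier link from each record. The sample record, the repository address and the function name are hypothetical, and the sketch makes no claim about how the Open Access Network system is actually built.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response containing a single record
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2009-10-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>https://repository.example.org/1234/thesis.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def parse_records(xml_text):
    """Return the OAI identifier, title and full-text links of each record."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        results.append({
            "oai_id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
            "links": [e.text for e in record.iterfind(".//dc:identifier", NS)],
        })
    return results

for entry in parse_records(SAMPLE):
    print(entry["title"], entry["links"])
```

The search index needs to store only these metadata; the full text stays on the repository's server and is reached via the extracted link.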

Content:

Documents from 33 DINI-certified repositories (as of October 2009).

Indexing:

Metadata indexing.

Search options:

Single search box. The query can consist of one or several consecutive words from the title, of the name of the author or of the text of a keyword (not, however, of the text of an abstract). The search can be limited to groups or sub-groups of the Dewey Decimal Classification.
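Limiting a search to a group or sub-group of the Dewey Decimal Classification amounts to a prefix comparison on the class number. The Python sketch below assumes, for illustration only, that a "group" corresponds to the first digit of the notation (for example 5 for the 500s) and a "sub-group" to the first two digits (for example 51 for the 510s). The record fields and function names are hypothetical.

```python
def in_ddc_group(ddc_code, group_prefix):
    """True if a DDC number such as '516.3' falls under the given prefix."""
    # Use the part before the decimal point, padded to three digits
    digits = ddc_code.split(".")[0].zfill(3)
    return digits.startswith(group_prefix)

def limit_to_group(records, group_prefix):
    """Keep only the records classified under the given group or sub-group."""
    return [r for r in records if in_ddc_group(r["ddc"], group_prefix)]

# Hypothetical search results with DDC numbers
results = [
    {"title": "Algebraic geometry notes", "ddc": "516.3"},
    {"title": "Organic chemistry primer", "ddc": "547"},
    {"title": "History of printing", "ddc": "686.2"},
]
print([r["title"] for r in limit_to_group(results, "5")])   # group 500
print([r["title"] for r in limit_to_group(results, "51")])  # sub-group 510
```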

Results display and additional functionalities:

The list of results is sorted by repository. In addition to the title and the name of the author, each search result features the assigned keywords, the document address, the repository and the language (flag symbol). A long version of each search result comprising all metadata (including the abstract) can be selected. 

In the medium term, two value-added services (usage statistics and citation analysis) developed within the framework of the same project will be integrated into the search interface.

Open J-Gate is a search engine which indexes articles in OA journals. It is sponsored by Informatics India Ltd.

Content:

As of October 2009, Open J-Gate had indexed over 300 000 journal articles from 6050 freely-accessible journals, of which approximately 3500 were peer-reviewed periodicals (Open Access journals in the strict sense of the word).

Indexing:

Both full texts and metadata are indexed.

Search options:

Basic and advanced search options, both of which can be limited to peer-reviewed or professional and industry journals. The advanced search offers three specifiable search fields and can be refined via a drop-down menu with two-tiered lists of subject categories. In addition, journals can be browsed by title, publisher or subject.

Results display and additional functionalities:

The initial results display is a chronologically sorted list of titles. The display field of an individual search result can be expanded into a detailed description including an abstract. Full-text access is achieved via a drop-down menu featuring the data types available. 

Assessment:

This search engine is especially suitable for finding articles in OA journals. OA journal articles can also be searched for via the relatively simple "Find articles" search function of the Directory of Open Access Journals (DOAJ). Articles in OA journals are also indexed by Scirus and Google Scholar. While Scirus indexes only a selection of articles in OA journals, users of Google Scholar can search for OA journal articles by source only.

OpenDOAR Search is suitable for initial thematic searches. The fact that it has access to a large number of quality-controlled repositories and that full texts are indexed is an advantage. However, the way in which search results are obtained is not transparent. Several tests have revealed that, for individual repositories, the number of documents retrieved for specific queries in Google Custom Search is considerably lower than that achieved via a Google Web Search, although, in principle, the number of hits should be the same. Nonetheless, comparative testing of the retrieval performance of OpenDOAR Search and other OA search engines shows that OpenDOAR Search is suitable for thematic searches. For example, in November 2009 the query "lorch meisterlin" yielded 7 hits (full texts) in OpenDOAR Search, 7 hits (2 full texts and 5 citations) in Google Scholar, and no hits in the other search services (OAIster was not searchable at the time).

Content:

24-29 million documents from 1500 repositories (as of October 2009). 

Indexing:

Full-text indexing of the documents via Google Custom Search; the metadata are not indexed even if they are available in OAI-compatible form.

Search options:

Single-field Google Custom Search with no additional drop-down menus or input fields.

Results display and additional functionalities:

Google-style results display. In addition to the link to the full text, three further links are provided where applicable: "Cited by", "Related documents" and "All versions".

Assessment:

Well suited for thematic searches in the extensive document collections of quality-controlled repositories.

Scientific Commons is a search engine project of the Institute for Media and Communication Management (mcm) at the University of St. Gallen. It aims to provide the most comprehensive and freely available access to scientific knowledge on the Internet. The service still has beta status.

Content:

31 million documents (as of October 2009). The search engine indexes some 1150 repositories (document servers with an OAI interface) and also covers web pages with bibliographies that contain metadata. Such pages can also be registered with Scientific Commons. Documents include articles, theses and dissertations, reports and other scholarly and scientific texts.

Indexing:

Full-text and metadata indexing; large range of file types up to 3 MB in size.

Search options:

Single search box. No advanced search option available. In practice, the expansion of search terms by including word variants or via truncation frequently leads to unwanted results. For example, the query "internal relativity" yields results relating to "international relative prices". The list of results is reloaded when scrolled. The primary ranking criterion is word frequency.
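Why automatic expansion can produce such results is easy to reproduce with a deliberately crude model. The Python sketch below assumes that every word is reduced to its first six characters before matching. This is an illustrative assumption, not the algorithm actually used by Scientific Commons, and the function names are hypothetical.

```python
def crude_stem(word, length=6):
    """Reduce a word to a fixed-length prefix (a deliberately crude 'stem')."""
    return word.lower()[:length]

def exact_match(query, title):
    """Every query word must occur unchanged in the title."""
    return set(query.lower().split()) <= set(title.lower().split())

def expanded_match(query, title):
    """Every query word must share its crude stem with some title word."""
    query_stems = {crude_stem(w) for w in query.split()}
    title_stems = {crude_stem(w) for w in title.split()}
    return query_stems <= title_stems

query = "internal relativity"
title = "international relative prices"
print(exact_match(query, title))     # no word of the query occurs in the title
print(expanded_match(query, title))  # "intern" and "relati" occur in both
```

Under this model the exact comparison rejects the title, while the expanded comparison accepts it because "internal" and "international" share the prefix "intern", and "relativity" and "relative" share the prefix "relati".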

Results display and additional functionalities:

The links in the results do not lead directly to the full text but rather to an abstract of the document and the publication details, which include a link to the full text. Results can be sorted by relevance or by year (although the latter option frequently gives rise to outliers in the results list). The use of the language criterion (English or German) is the only way to successfully refine the results.

Assessment:

The advantage of having access to an extremely large collection of documents is diminished somewhat by the relatively bulky list of results. The display options are partially faulty. The additional information on the results can be viewed only with suitable system configurations/browsers. The useful "Publication List Details" with lists of authors and co-authors are no longer available. They have apparently been removed to make way for advertisements.

Scirus is a science-specific search engine sponsored by Elsevier. It covers a diverse range of Internet sources.

Content:

50 million documents and 450 million web pages (as of October 2009).

Subject range: main focus is on the health, life and physical sciences; social sciences also covered.

Resources indexed:

  • Publications, especially journal articles, from 15 publishers/aggregators, of which Elsevier is the only major publisher, and from the OA publishers BioMed Central, PubMed Central, Hindawi Publishing and Project Euclid
  • Documents from 20 repositories including such well-known archives as arXiv.org, NDLTD, PsyDok and RePEc
  • "The rest of the scientific web" (Scirus), including institutional web pages, science-related commercial pages and scientists' homepages.

Restriction of the search results to Open Access documents:

In the advanced search, users have the option to restrict the search to digital archives. Depending on the topic, the well-known Open Access publishers (see above) can be included in the search.

Indexing:

Full-text and metadata indexing of publishers' versions and documents in repositories.  

Search options:

Basic search and advanced search with differentiated search options: two input fields which can be joined in different ways; a choice of 8 search fields. Search terms can be truncated (insertion of a symbol as a placeholder) and joined by logical operators. The search can be restricted to groups of sources (see above) or individual sources, to a certain publication period, document type or subject area. Dynamically generated drop-down menus can be used to filter or refine the search results.

Results display and additional functionalities:

As an alternative to ranking by relevance, the results can be sorted by date. A link to "similar results" is offered for all results. If a result (with URL display) is from a source in the "rest of the scientific web" (see above) the results display can be limited to documents with the same server address.

Assessment:

A high number of results. Scirus stands out by virtue of the fact that it comprehensively indexes scientifically relevant web pages, many of which are text documents. As of October 2009, some 27 million OA documents were available via the indexed repositories and OA publishers (BioMed Central etc.). Because it indexes MEDLINE/PubMed, Scirus can be used for medical literature searches. Depending on the number and size of the relevant sources, it can also be used for initial research in other subject areas, such as physics. The many search options offered deserve special mention.

Sources

Pieper, Dirk & Wolf, Sebastian (2009). Wissenschaftliche Dokumente in Suchmaschinen (Scholarly and scientific documents in search engines). In: Handbuch Internet-Suchmaschinen (Handbook of Internet Search Engines). Ed. Dirk Lewandowski. Heidelberg, pp. 356-374.

Norris, Michael et al. (2008). Finding open access articles using Google, Google Scholar, OAIster and OpenDOAR. Online Information Review 32 (6), pp. 709-715.

Search Open Access Repositories. The Library at UCD.

Wissenschaftliche Suchmaschinen im Vergleich (Scientific search engines: a comparison). University of Zurich, Geographical Institute, Library.

Content mentor

Edited/compiled by:

Wolfgang Binder, formerly of the University of Bielefeld, former project collaborator, information platform open-access.net