Informationsplattform Open Access: OA search engines

Open Access search engines

Search engines of relevance to Open Access

Traditionally, literature searches and literature procurement have been two separate procedures. Thanks to Open Access (OA), this is no longer the case. In addition to its well-known advantages, Open Access makes thematic searches more effective because direct access to retrieved documents frequently yields information which enables searchers to optimise or adjust their search strategies. This calls for search engines which can find and index OA documents. Because appropriate metadata are available for many OA documents, bibliographic searches are another important field of application for such search engines. The present web page offers an overview of interdisciplinary search engines suitable for searching for Open Access documents.

BASE is a multi-disciplinary academic search engine for freely-accessible documents. Operated by Bielefeld University Library, BASE focuses mainly on document servers. 

Content:

Over 23 million documents from more than 1500 sources (as of April 2010); a large proportion of the sources are OAI-compatible and are indexed in accordance with the OAI-PMH standard (Open Archives Initiative - Protocol for Metadata Harvesting); a small proportion of the sources are accounted for by scientific and scholarly organisations and websites and by digitised media of Bielefeld University Library. 

Content ranges from articles, books, theses and dissertations, papers and reviews to audio recordings, images, videos, primary data and musical scores.

Indexing:

Indexes all metadata and the full texts from some 40 sources.

Search options:

Basic search: Google-style search via a single search box. Advanced search with differentiated bibliographic search options (seven search aspects in all) and drop-down menus to limit the search to certain document/media types, regions and source types. Search-term expansion via an automatic search for other word forms (plural, genitive), multilingual search or inclusion of related search terms (Eurovoc thesaurus). Search terms can be truncated (by using an asterisk as a placeholder for characters) or joined with logical operators. There is a dynamically generated drop-down menu for the refinement of search results.
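How truncation and logical operators work together can be illustrated with a small sketch. The following Python fragment is not BASE's implementation; the function names and the sample titles are assumptions made purely for illustration. It treats an asterisk as a placeholder for any run of characters and joins two terms with AND or OR.

```python
import fnmatch

def term_matches(term, text):
    """Return True if any word in text matches term; '*' stands for any characters."""
    words = text.lower().split()
    # fnmatchcase also interprets '?' and '[...]'; only '*' is used in this sketch
    return any(fnmatch.fnmatchcase(word, term.lower()) for word in words)

def search(titles, left, right=None, operator="AND"):
    """Filter titles by one term, or by two terms joined with AND / OR."""
    hits = []
    for title in titles:
        a = term_matches(left, title)
        if right is None:
            keep = a
        else:
            b = term_matches(right, title)
            keep = (a and b) if operator == "AND" else (a or b)
        if keep:
            hits.append(title)
    return hits

# Hypothetical sample data, not taken from any real index
TITLES = [
    "Open access repositories in Germany",
    "Repository software compared",
    "Access to primary data",
]

print(search(TITLES, "repositor*"))                  # truncated term
print(search(TITLES, "repositor*", "access", "AND"))  # both terms required
```

The truncated term "repositor*" covers both "repositories" and "repository", which is the effect the asterisk is meant to achieve in a bibliographic search.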

Results display and additional functionalities:

As an alternative to ranking results according to relevance, other sorting options are offered (for example, sorting by year of publication). BASE also offers detailed descriptions of search results and a cross-searching link to Google Scholar in order to find citing documents or other versions.

Assessment:

Can be recommended both for thematic and bibliographic searches. Advantages: user friendly; efficient retrieval functionalities suitable for metadata searches; comparatively high bibliographic reliability. Because of the sources it covers, BASE can retrieve a large number of documents from the Invisible Web. A comparatively high proportion of the results feature links to full texts.

Google Scholar is a Google search engine that specialises in scholarly and scientific literature. Its particular potential from an Open Access point of view lies in the fact that freely-accessible scholarly resources in the results list are marked with an identifying symbol.

Although, officially, Google Scholar still has beta status, it has already proven itself to be fully effective in practice.

Content:

No exact details are available from Google. The results of full-text searches are roughly comparable to those of Scirus.

The lion's share of the documents is accounted for by journal articles. Other document types include conference papers, preprints, postprints, reports, theses and dissertations, term papers, metadata from indexing/abstracting services, books (Google Books) and citations.

OAI-compatible repositories are less comprehensively indexed than, for example, in OAIster or OpenDOAR Search. However, Google Scholar does index a considerable number of web documents and postprints located, for example, on personal or institutional websites and in non-OAI-compatible repositories.

Identification of freely-accessible documents in the results list:

Freely-accessible documents in the results list (with a link from the title to the full text) are flagged with a green triangle located to the left of the title. If there is a freely-accessible version of a restricted-access document in the results list, the link to the full text (server name) is displayed to the right of the title. The proportion of results containing a link to the full text ranges from 25% to 60%, depending on the topic.

Indexing:

Full-text indexing of the documents/document descriptions.

Search options:

Besides the familiar Google keyword and phrase-search options, the advanced search offers additional bibliographic search fields for author, source (especially journal) and publication period. However, the automatic identification of this bibliographic information is not reliable and is sometimes faulty. The ranking system is similar to that used by the main Google search engine. Citation frequency is a major ranking criterion.

Results display and additional functionalities:

Multiple versions of one publication are grouped. The "Cited by" feature lists documents in the Google Scholar database that have cited documents in the group.

Assessment:

Particularly suitable for initial thematic research. Because full texts are indexed and the range of indexed sources is wide, a comparatively large number of freely accessible documents are retrieved. Hence, the documents featured on the initial results pages are correspondingly relevant. Additional searches for the latest research are not really necessary because both freely-accessible and restricted-access documents are featured.

Because of the said shortcomings and the absence of a chronological sorting option, Google Scholar cannot really be recommended for bibliographic searches.

OAIster is a comprehensive metadata catalogue of documents in OAI-PMH-compliant repositories and digital libraries worldwide. Since February 2010, it has been freely accessible as a WorldCat database at OCLC. OAIster began in 2002 as a project of the University of Michigan Library to test the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) in practice. Until 2009, the retrieval service was hosted by the University of Michigan and became known throughout the world as a reference service for Open Archive collections. After a cooperation phase with OCLC in 2009, OAIster was accessible only to OCLC subscribers for several months.

Content:

As of April 2010, OAIster contained over 23 million records representing digital resources from 1100 mainly OAI-PMH-compliant sources. These resources include journal articles, research papers, dissertations, postprints, preprints, reports, audio files, image files, movies, and data collections. They are either born digital or in digitised form.

Indexing:

Metadata indexing.

Search:

Users can choose between a single-field basic search on the home page and an advanced search. The advanced search has three input fields, each of which offers a choice of 14 search aspects (including subject). Users have the option to limit the results by publication date, language ("English", "Non-English"), and resource type. The single-field basic search interface features an inoperative pull-down menu. The search terms entered can also be directly combined with a total of 20 coded search aspects (Expert Search). On the search results pages, users are given the option to refine the search by author and year.

Results display and additional functionalities:

Users can opt to have the list of results sorted according to relevance, author, title, or date ("Oldest First" or "Newest First"). In addition to bibliographic details (title, author, publisher), the brief view of each hit features two constant fields – for the database and the "confirmation of ownership" by WorldCat libraries. The "publisher" field often contains unclear dates of publication comprising two identical or two different years. Here, as in the linked full view of the metadata record, there are some defects in the mapping of the metadata. The form used to present the full metadata, which include an abstract and identifiers, is confusing and oversized. It is overloaded with typical library fields and functionalities such as copy/holding information and user lists. However, the links to citation and download information from citebase.org are useful. The interface is available in six different languages. The default setting depends on the user.

Assessment:

Can be recommended for searching for documents in repositories, especially OAI-PMH-compliant sources.

Public access was interrupted for several months, and the streamlining of the records interface by cutting out dispensable options and fields is still pending (as of April 2010) and well overdue. OCLC has therefore not completely lived up to expectations with regard to the continuity and maintenance of this well-established service. The transition to OCLC has brought some functional improvements. However, there have been some changes for the worse with regard to the consistency and visual presentation of the metadata. In the meantime, OAIster has to share its leading role as a qualified and comprehensive reference service for documents in repositories with the Bielefeld Academic Search Engine (BASE).

 

OAN-Search (OAN-Suche) is a search interface for documents in DINI-certified repositories. It is being developed within the framework of OA-Network (OA-Netzwerk), a project funded by the German Research Foundation (DFG). Although still at the alpha stage, the first version of the interface, which is limited to the basic search and browsing functions, can already be viewed and used. The underlying system is implemented as a distributed application on the computers of the project partners. The metadata used for the search are harvested from the repositories via the OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting); the metadata are in turn linked to the full texts stored at diverse locations.
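The harvesting step described above can be sketched as follows. This is a minimal illustration using only the Python standard library and is not the OA-Network implementation; the endpoint address, the function names and the sample response are assumptions. The "ListRecords" verb, the "oai_dc" metadata prefix and the XML namespaces are those defined by OAI-PMH and Dublin Core. The sketch builds a request URL and extracts the title, authors and full-text links from a response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query

def parse_records(xml_text):
    """Extract title, creators and document links from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        records.append({
            "title": record.findtext(".//" + DC_NS + "title"),
            "creators": [e.text for e in record.iter(DC_NS + "creator")],
            # dc:identifier typically carries the address of the document
            "links": [e.text for e in record.iter(DC_NS + "identifier")],
        })
    return records

# Hypothetical response from a hypothetical repository (repo.example.org)
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>https://repo.example.org/files/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repo.example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

The sketch parses a stored response instead of contacting a server, so it runs without network access. A real harvester would also have to follow the resumption tokens that OAI-PMH uses to page through large result sets.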

Content:

Documents from 33 DINI-certified repositories (as of October 2009).

Indexing:

Metadata indexing.

Search options:

Single search box. The query can consist of one or several consecutive words from the title, of the name of the author or of the text of a keyword (not, however, of the text of an abstract). The search can be limited to groups or sub-groups of the Dewey Decimal Classification.
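Limiting a search to a Dewey Decimal Classification group can be pictured as a prefix filter on the class notation. The following sketch is an assumption about how such a limit might work and is not OAN-Search's code; the record structure, the "ddc" field name and the function names are hypothetical. It assumes that a group corresponds to a main class such as 500 and a sub-group to a division such as 510.

```python
def ddc_prefix(group):
    """Map a three-digit DDC group to the prefix shared by its members.

    '500' (a main class) -> '5', '510' (a division) -> '51', '516' -> '516'.
    """
    if len(group) != 3 or not group.isdigit():
        raise ValueError("expected a three-digit DDC notation")
    if group.endswith("00"):
        return group[0]
    if group.endswith("0"):
        return group[:2]
    return group

def limit_to_group(records, group):
    """Keep only records whose DDC notation falls within the given group."""
    prefix = ddc_prefix(group)
    return [r for r in records if r["ddc"].startswith(prefix)]

# Hypothetical records; the 'ddc' field name is an assumption
RECORDS = [
    {"title": "Lecture notes on topology", "ddc": "514"},
    {"title": "Introduction to quantum optics", "ddc": "535"},
    {"title": "Digital libraries in practice", "ddc": "025"},
]

print([r["title"] for r in limit_to_group(RECORDS, "500")])  # whole main class
print([r["title"] for r in limit_to_group(RECORDS, "510")])  # one division only
```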

Results display and additional functionalities:

The list of results is sorted by repository. In addition to the title and the name of the author, each search result features the assigned keywords, the document address, the repository and the language (flag symbol). A long version of each search result comprising all metadata (including the abstract) can be selected. 

In the medium term, two value-added services (usage statistics and citation analysis) developed within the framework of the same project will be integrated into the search interface.

Open J-Gate is a search engine which indexes articles in OA journals. It is sponsored by Informatics India Ltd.

Content:

As of October 2009, Open J-Gate had indexed over 300 000 journal articles from 6050 freely-accessible journals, of which approximately 3500 were peer-reviewed periodicals (Open Access journals in the strict sense of the word).

Indexing:

Both full texts and metadata are indexed.

Search options:

Basic and advanced search options, both of which can be limited to peer-reviewed or professional and industry journals. The advanced search offers three specifiable search fields and can be refined via a drop-down menu with two-tiered lists of subject categories. In addition, journals can be browsed by title, publisher or subject.

Results display and additional functionalities:

The initial results display is a chronologically sorted list of titles. The display field of an individual search result can be expanded into a detailed description including an abstract. Full-text access is achieved via a drop-down menu featuring the data types available. 

Assessment:

This search engine is especially suitable for finding articles in OA journals. OA journal articles can also be searched for via the relatively simple "Find articles" search function of the Directory of Open Access Journals (DOAJ). Articles in OA journals are also indexed by Scirus and Google Scholar. While Scirus indexes only a selection of articles in OA journals, users of Google Scholar can search for OA journal articles by source only.

OpenDOAR Search is suitable for initial thematic searches. The fact that it has access to a large number of quality-controlled repositories and that full texts are indexed is an advantage. However, the way in which search results are obtained is not transparent. Several tests have revealed that, for individual repositories, the number of documents retrieved for specific queries in Google Custom Search is considerably lower than that achieved via a Google Web Search, although, in principle, the number of hits should be the same. Nonetheless, comparative testing of the retrieval performance of OpenDOAR Search and other OA search engines shows that OpenDOAR Search is suitable for thematic searches. For example, in November 2009 the query "lorch meisterlin" yielded 7 hits (full texts) in OpenDOAR, 7 (2 full texts and 5 citations) in Google Scholar, and no hits in the other search services (OAIster was not searchable at the time).

Content:

24-29 million documents from 1500 repositories (as of October 2009). 

Indexing:

Full-text indexing of the documents via Google Custom Search; the metadata are not indexed even if they are available in OAI-compatible form.

Search options:

Single-field Google Custom Search with no additional drop-down menus or input fields.

Results display and additional functionalities:

Google-style results display. In addition to the link to the full text, three further links are provided where applicable: "Cited by", "Related documents" and "All versions".

Assessment:

Well suited for thematic searches in the extensive document collections of quality-controlled repositories.

ScienceGate is a scholarly search engine for Open Access publications. It currently covers four large databases of scientific and scholarly content and numerous German repositories (as of March 2010). It is an initiative of the Swiss company Point Software AG and has been in operation since February 2010.

Content:

ScienceGate currently holds 1.8 million documents from Open Access journals and 1400 universities. Its internal database contains over 26 million data records. The databases or sub-databases 'PubMed Central', 'ArXiv.org', 'CiteSeer', and the DOAJ articles (identifier: 'oai:doaj-articles…') account for almost 90% of the documents (as of March 2010).

Indexing:

ScienceGate indexes metadata.

Search options:

There is a single-line search box with a choice of 12 search fields. The search can be thematically restricted to one of 16 topical areas. If several search terms are entered, a closest-match search is conducted. Results are ordered by relevance, starting with documents in which all the search terms occur, followed by those featuring fewer search terms. The phrase-search option is activated in the usual way by enclosing several search terms in quotation marks.
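The ordering principle of a closest-match search can be sketched in a few lines. Only the principle (documents containing all search terms first, then those containing fewer) is taken from the description above. How ScienceGate actually computes relevance is not documented here, so the function name, the scoring by number of matching terms and the sample documents are assumptions.

```python
def closest_match(documents, query):
    """Rank documents by how many distinct query terms they contain.

    Documents containing all terms come first; documents containing
    none of the terms are dropped.
    """
    terms = set(query.lower().split())
    scored = []
    for doc in documents:
        words = set(doc.lower().split())
        score = len(terms & words)  # number of query terms found in the document
        if score > 0:
            scored.append((score, doc))
    # The sort is stable, so documents with equal scores keep their input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

# Hypothetical sample documents
DOCS = [
    "open access publishing",
    "access control systems",
    "open source publishing models",
    "protein folding",
]

print(closest_match(DOCS, "open access publishing"))
```

In this example the document containing all three terms is listed first, followed by the one containing two terms and then the one containing a single term; the document with no matching term is omitted.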

Results display and additional functionalities:

There is a single-line results list with ten hyperlinked titles (without authors) per page. Detailed metadata including an abstract are available for each result in a cursor-activated pop-up window. However, these metadata cannot be printed out, and in some cases the complete metadata are not visible in the pop-up window.

Assessment:

ScienceGate is a lean, attractively designed search engine. However, its comprehensiveness, and the manner in which results are presented, require optimisation and further development. Important subject-based repositories are not yet covered; nor is the worldwide range of institutional and university repositories. As a result, ScienceGate users must reckon with a correspondingly limited results list.

Scientific Commons is a search engine project of the Institute for Media and Communication Management (mcm) at the University of St. Gallen. It aims to provide the most comprehensive and freely available access to scientific knowledge on the Internet. The service still has beta status.

Content:

31 million documents (as of October 2009). The search engine indexes some 1150 repositories (document servers with an OAI interface) and also covers web pages with bibliographies that contain metadata. Such pages can also be registered with Scientific Commons. Documents include articles, theses and dissertations, reports and other scholarly and scientific texts.

Indexing:

Full-text and metadata indexing; large range of file types up to 3 MB in size.

Search options:

Single search box. No advanced search option available. In practice, the expansion of search terms by including word variants or via truncation frequently leads to unwanted results. For example, the query "internal relativity" yields results relating to "international relative prices". The list of results is reloaded when scrolled. The primary ranking criterion is word frequency.

Results display and additional functionalities:

The links in the results do not lead directly to the full text but rather to an abstract of the document and the publication details, which include a link to the full text. Results can be sorted by relevance or by year (although the latter option frequently gives rise to outliers in the results list). The use of the language criterion (English or German) is the only way to successfully refine the results.

Assessment:

The advantage of having access to an extremely large collection of documents is diminished somewhat by the relatively bulky list of results. The display options are partially faulty. The additional information on the results can be viewed only with suitable system configurations/browsers. The useful "Publication List Details" with lists of authors and co-authors are no longer available. They have obviously been removed to make way for advertisements.

Scirus is a science-specific search engine sponsored by Elsevier. It covers a diverse range of Internet sources.

Content:

50 million documents and 450 million web pages (as of October 2009).

Subject range: main focus is on the health, life and physical sciences; social sciences also covered.

Resources indexed:

  • Publications, especially journal articles, from 15 publishers/aggregators, Elsevier being the only major publisher among them, and from the OA publishers BioMed Central, PubMed Central, Hindawi Publishing and Project Euclid
  • Documents from 20 repositories including such well-known archives as ArXiv.org, NDLTD, PsyDok and RePEc
  • "The rest of the scientific web" (Scirus) including institutional web pages and science-related commercial pages, scientists' homepages.

Restriction of the search results to Open Access documents:

In the advanced search, users have the option to restrict the search to digital archives. Depending on the topic, the well-known Open Access publishers (see above) can be included in the search.

Indexing:

Full-text and metadata indexing of publishers' versions and documents in repositories.  

Search options:

Basic search and advanced search with differentiated search options: two input fields which can be joined in different ways; a choice of 8 search fields. Search terms can be truncated (insertion of symbol as placeholder) and joined by logical operators. The search can be restricted to groups of sources (see above) or individual sources, to a certain publication period, document type or subject area. Dynamically generated drop-down menus can be used to filter or refine the search results.

Results display and additional functionalities:

As an alternative to ranking by relevance, the results can be sorted by date. A link to "similar results" is offered for all results. If a result (with URL display) is from a source in the "rest of the scientific web" (see above) the results display can be limited to documents with the same server address.

Assessment:

A high number of results. Scirus stands out by virtue of the fact that it comprehensively indexes scientifically relevant web pages, many of which are text documents. As of October 2009, some 27 million OA documents were available via the indexed repositories and OA publishers (BioMed Central etc). Because it indexes MEDLINE / PubMed, Scirus can be used for medical literature-research purposes. Depending on the number and size of the relevant sources, it can also be used for initial research in other subject areas, such as physics. The many search options offered deserve special mention. 

Sources

Pieper, Dirk & Wolf, Sebastian (2009). Wissenschaftliche Dokumente in Suchmaschinen. (Scholarly and scientific documents in search engines) In: Handbuch Internet-Suchmaschinen (Handbook of Internet Search Engines). Ed. Dirk Lewandowski. Heidelberg 2009, pp. 356-374.

Norris, Michael et al. (2008). Finding open access articles using Google, Google Scholar, OAIster and OpenDOAR. Online Information Review 32 (6), pp. 709-715.

Search Open Access Repositories. The Library at UCD.

Wissenschaftliche Suchmaschinen im Vergleich (Scientific search engines: a comparison) University of Zurich. Geographical Institute. Library.

Content mentor

Edited/compiled by:

Wolfgang Binder, formerly of the University of Bielefeld, former project collaborator, information platform open-access.net