Web Corpus Construction - Felix Bildhauer PDF Download

$16.99

The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. This approach has several advantages: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, who can linguistically post-process and query it with the tools of their choice. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups, including boilerplate removal and removal of duplicated content. Linguistic processing, and the problems that the different kinds of noise in web corpora cause for it, are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora).

Author: Schäfer, Roland
Author: Bildhauer, Felix
Publisher: Morgan & Claypool Publishers
Illustration: n
Language: ENG
Title: Web Corpus Construction
Pages: 145 (Encrypted EPUB)
On Sale: 2013-07-01
SKU-13/ISBN: 9781608459834
Category: Computers : Natural Language Processing
Category: Language Arts & Disciplines : Linguistics - General
Category: Computers : Data Modeling & Design
