Information and Communications Technology (ICT) gives us multipurpose weapons. Tools that are used to monitor drug trafficking can become excellent additions to the methods of the historian. They might help with a problem that every historian faces, and indeed every person in everyday life. We think: what is this person talking about? Does he live in the same universe as I do? Why does he see things that I cannot see? What kind of information processing has he used to construct his universe?
As historians we often face the same bewilderment. One problem I am working on is the development of opinions and sentiments about drug use and drug regulation among the general public over the past century. Any drug history we open will contain its usual share of quotes about public unrest, and about horror stories being spread or debunked. It is always possible (and I have been guilty of this myself) to quote some newspaper article that supports any position. But to what extent is this evidence representative of public opinion as a whole? Can ICT help us process the information that the public had access to, and so construct a more representative image?
For instance, in my own country, the Netherlands, a law has prohibited the non-medical use of opiates and cocaine since 1920. However, there is not much evidence that anyone before that time was concerned about a drug problem in the country. In fact, the Netherlands was one of the world’s main suppliers of those drugs at the time. Coca was produced in the colony of Indonesia and exported to the rest of the world; legal cocaine production was concentrated in Amsterdam; and in Indonesia the colonial government actually held the monopoly on the sale of smoking opium to the population, producing the drug in a state-owned factory and distributing it through concessions. How did the Dutch public react to the changes in the drug laws, and to the role of the Dutch state in both joining the international war on drugs and producing and selling its own drugs?
On the eve of the Second World War the detective comic Dick Bos would paint a picture of vicious and mysterious Chinese crime syndicates flooding the country with their drugs, and of patients with mental disorders and weak nerves falling prey to cocaine pushers. But to what extent, and when, did these images take hold of the Dutch imagination, for example through popular media?
One way of studying this problem is to survey the popular newspapers of the time. Until now, historians have used methods like sample studies and backbone studies to make a representative selection of this material. But these methods have their drawbacks. They are very labor intensive, often involving large groups of students doing the research (and hoping they are not missing something). Making selections from the data always involves a certain amount of subjectivity. One has to rely on indices compiled by others, and scanning textual content by hand works better fresh in the morning than after your lunchtime nap.
Computers don’t have those problems and luckily, more and more archival material is becoming available in digital format. The Royal Library of the Netherlands in The Hague is digitizing a significant part of the newspapers in its collection published between 1618 and 1995. This includes not only Dutch newspapers in the strict sense, but also newspapers from the former colonies in Indonesia and the Caribbean. A search engine enables advanced and full-text search of the documents. This creates a magnificent source for the historian, even when we account for Optical Character Recognition errors.
Digital sources are obviously the historian’s source of the future. But that historian has a serious problem when confronted with, as in the Royal Library database, ultimately nine million newspaper pages. Even when he works on a subject such as drugs in the interwar years, where the number of pages that mention drugs is ‘only’ in the tens of thousands, he is practically unable to study all these pages by himself. The search engine gives him many hits, but no method for using them all as data.
It was therefore with some excitement that I became involved last year with the development of an open-source application in which search functions will be combined with a set of web services for processing textual content of documents, so-called text mining. The project, in which historians of Utrecht University work together with computer scientists of the University of Amsterdam, is called WAHSP: Web Application for Historical Sentiment Mining in Public Media.
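To give a rough idea of what the very simplest form of such text mining could look like, here is a minimal Python sketch. It is not WAHSP code and does not use the Royal Library's search interface; the function name `mentions_per_year`, the list of search terms, and the input format (pairs of year and page text) are all assumptions made for illustration. It computes, for each year, the share of newspaper pages that mention at least one drug term, which turns a pile of search hits into a series that can be compared over time.

```python
import re
from collections import Counter

# Hypothetical search terms; period spellings and synonyms would need to be added.
DRUG_TERMS = ("opium", "cocaine", "morfine")


def mentions_per_year(pages, terms=DRUG_TERMS):
    """Return {year: share of pages that mention at least one term}.

    `pages` is an iterable of (year, text) pairs; this input format is an
    assumption, not the format of any real newspaper database.
    """
    # Match any of the terms as a whole word, ignoring case.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    total = Counter()  # number of pages seen per year
    hits = Counter()   # number of pages with at least one match per year
    for year, text in pages:
        total[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {year: hits[year] / total[year] for year in sorted(total)}


if __name__ == "__main__":
    sample = [
        (1919, "Opium in Batavia"),
        (1919, "Weather report"),
        (1921, "cocaine smuggling"),
        (1921, "more Cocaine news"),
    ]
    print(mentions_per_year(sample))
```

Dividing by the total number of pages per year matters, because the number of digitized pages differs from year to year; raw hit counts alone would mostly reflect the size of the archive. Whole-word matching is a simplification: it misses compound words and OCR misreadings, and it says nothing yet about sentiment.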
Text and sentiment mining is already indispensable in market and intelligence research, but not so common in the historical world – partly because the large-scale digitization of historical data has yet to pick up pace, and partly because of a traditionalism often found among historians. Commercial software is available and has been adapted to the historian’s purposes by Utrecht researcher José de Kruif, who is also involved with WAHSP and moderates a weblog on humanities ICT.
The goal of WAHSP is to go one step further and develop freely available and user-friendly software. If there are more people out there working on or interested in such new digital tools for historians, do not hesitate to let us know. In future posts I hope to show how their use can further our insight into drug history, and more specifically into the early history of drug regulation in the Netherlands. Text mining tools will extend the amount of information we can use in constructing and comparing our universes.