One last language post for me to end this semester, as well as my blogging on this site. We discussed the adjective in this sentence in a writing group yesterday:
Such dataset is rather scarce especially in large scale.
Part of the problem we found is that the dataset itself isn’t scarce – it’s the data that are scarce within the dataset. Computational scientists describe datasets as rich or poor. (Data is or data are? Read on: the answer is at the end of this post!)
The choices we came up with were: scarce, sparse, scanty, and scant. They’re all close in meaning, and they have quite strong negative connotations; that is, they suggest that there isn’t enough data to do the research you want. Scarce suggests that a resource is hard to find or get (food or water, for example, could become scarce). Sparse is generally used for things you can count, and is quite concrete in meaning. The Oxford Learner’s Thesaurus gives: “only present in small amounts or numbers and often spread over a large area”; for instance, trees are sparse if there are only a few in a big area. Scanty is not really appropriate here: you can use it to mean insufficient (“scanty evidence”), but it more commonly means (women’s) clothing that is too revealing! The related adjective scant could work: the Collins COBUILD says one use of this word is to show “there is not as much of something as there should be” (so, it implies a judgment or evaluation).
Some examples from the corpus (there were no examples of “scanty data” in the academic section):
- Here, too, definitive sources of data are scarce
- assessment of the environmental and health problems associated with pesticides is incomplete because data are scarce
- Infant mortality data is scarce before the introduction of hospital birth
- no association was detected between lamivudine discontinuation and liver-related death, but data are sparse
- Data are sparse for most species and ecosystem types
- Finally, although the relevant data are sparse, an emerging pattern in our understanding
- Scant available data suggest that some siblings are vulnerable to poor adjustment outcomes
- Even though the data are scant and not entirely reliable, apparently
- only scant data exist on student attitudes, academic achievement, or …
From these few examples, I would suggest that scarce and scant mean there aren’t really enough data to work with; sparse suggests there’s not much data, so you can’t draw strong conclusions.
Our solution was in fact to use limited because it’s less negative and suggests there’s still enough data in the dataset to begin work on. (The Oxford Learner’s Thesaurus would have pointed us to that word, too, if I’d thought to look!)
The data is or the data are?
On a related note, do we say the data is or the data are? This is a little tricky because data should be plural (the Latin plural of datum, meaning one data point), but it doesn’t look like a plural in English (there’s no -s on the end). Therefore, in speech and casual writing, most people treat it as a singular (mass) noun (the data is), but in careful, academic writing, the plural is probably safer (the data are).
Corpus data support(s) this advice. The numbers in the table refer to the number of uses of “the data” + singular/plural verb per million words:
| The data + | Spoken | Magazine | Newspaper | Academic |
|---|---|---|---|---|
| Singular | 1.03 | 1.04 | 0.55 | 3.03 |
| Plural | 0.21 | 0.98 | 0.60 | 7.55 |
Source: Corpus of Contemporary American English
As you can see, plural verbs occur more than twice as often as singular verbs after the data in academic writing, but the singular is about five times more common than the plural in speech (although the phrase itself is obviously not frequent in conversation!). Magazines and newspapers use both forms about equally, which fits my hypothesis, since they represent middle registers between spoken and academic English.
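If you'd like to check the ratios behind those comparisons yourself, here is a minimal Python sketch. The numbers are copied straight from the table above; the names `rates` and `plural_to_singular` are just labels I made up for illustration, not part of any corpus tool.

```python
# Uses of "the data" + verb per million words, copied from the table above.
rates = {
    "Spoken": {"singular": 1.03, "plural": 0.21},
    "Magazine": {"singular": 1.04, "plural": 0.98},
    "Newspaper": {"singular": 0.55, "plural": 0.60},
    "Academic": {"singular": 3.03, "plural": 7.55},
}


def plural_to_singular(register):
    """Return how many plural uses there are for each singular use."""
    row = rates[register]
    return row["plural"] / row["singular"]


for register in rates:
    print(f"{register}: {plural_to_singular(register):.2f}")
# Spoken: 0.20
# Magazine: 0.94
# Newspaper: 1.09
# Academic: 2.49
```

A ratio above 1 means the plural is more frequent in that register; a ratio near 1 (magazines and newspapers) means the two forms are used about equally.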
I just ran across this site because I was trying to discern the differences among sparse, scant, and scarce. I encounter these terms frequently in my scholarly reading (psychology and sleep medicine). Yours is an interesting take, but I want to look further to see what others say. Thanks!
Joe Buckhalt
Auburn University