Last week we had a workshop on zoon. It was an excellent day and highlighted various very useful avenues for improvement. I think one of the major benefits was clarifying what zoon is useful for, what our ‘competition’ is (and how we significantly differ from other projects) and the major things that need to exist for people to adopt zoon. I will talk about these later.
Bugs and problems
We (inevitably) found plenty of bugs. Some of these were fixed on the day and some are still to be fixed.
I have moved many of them from the Google Docs and my notes to GitHub issues. Feel free to add more.
The major source of bugs was reading users’ own data in. Notably, this led to a number of downstream errors, e.g. data being read in wrongly but not causing an error until the model module. So one major fix needed is for zoon to check after each module that the output is correct. This will aid debugging by isolating the problem much more quickly. It also highlighted a need for zoon to save the ‘progress so far’. When downloading large datasets, for example, it is annoying to have to redownload the dataset because of a typo in a module name later on.
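To make the two ideas above concrete, here is a minimal sketch in plain R. Nothing in it is zoon code: checkOccurrence, runChecked and the assumption that an occurrence module returns a data frame with numeric longitude, latitude and value columns are all hypothetical, just to show the shape of ‘check after each module, keep what has already finished’.

```r
# Hypothetical check (not part of zoon): an occurrence module is assumed to
# return a data frame with numeric longitude, latitude and value columns.
checkOccurrence <- function(df, moduleName) {
  required <- c("longitude", "latitude", "value")
  if (!is.data.frame(df)) {
    stop("Module '", moduleName, "' did not return a data frame", call. = FALSE)
  }
  absent <- setdiff(required, names(df))
  if (length(absent) > 0) {
    stop("Module '", moduleName, "' output is missing column(s): ",
         paste(absent, collapse = ", "), call. = FALSE)
  }
  notNumeric <- required[!vapply(df[required], is.numeric, logical(1))]
  if (length(notNumeric) > 0) {
    stop("Module '", moduleName, "' returned non-numeric column(s): ",
         paste(notNumeric, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# Hypothetical runner: each step is a function of the previous step's output.
# Output is checked straight after each step, and everything that finished
# before a failure is returned rather than thrown away.
runChecked <- function(steps, checks = list()) {
  done <- list()
  input <- NULL
  for (name in names(steps)) {
    result <- tryCatch({
      out <- steps[[name]](input)
      if (!is.null(checks[[name]])) checks[[name]](out, name)
      out
    }, error = function(e) e)
    if (inherits(result, "error")) {
      message("Stopped at '", name, "': ", conditionMessage(result))
      break
    }
    done[[name]] <- result
    input <- result
  }
  done
}

# Toy steps: the occurrence step stands in for a slow download, and the
# model step fails in the way a mistyped module name would.
steps <- list(
  occurrence = function(input) {
    data.frame(longitude = c(-1.2, 0.5), latitude = c(51.3, 52.1), value = c(1, 0))
  },
  model = function(input) stop("no module called 'LogisticRegresion'")
)
done <- runChecked(steps, checks = list(occurrence = checkOccurrence))
```

Here done still holds the occurrence data after the model step fails, so only the broken step would need to be rerun. Data read in wrongly (say, value as text) would be reported at the occurrence step itself rather than surfacing in the model module.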
In general, reading data in needs plenty of work to make it very easy and robust. I could do with a better idea of how people would like to put their data in (CSV, tab-delimited, Excel, rasters, anything else commonly used?). We also talked about reading in local R objects.
And while zoon will only use longitude, latitude and value (0, 1 or abundance mostly), I imagine people often have spreadsheets with more columns than this that they would like to use. So some attempt at guessing which columns are the correct ones would be sensible. Letting users indicate which columns they wish to use is also vital.
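As a rough illustration of what guessing with a user override could look like, here is a sketch in plain R. The function guessColumns, its columns argument and the list of header patterns are all made up for this example and are not how zoon does (or will necessarily do) it.

```r
# Hypothetical helper (not part of zoon): pick out longitude, latitude and
# value columns from a wider data frame. Users can name columns explicitly
# through `columns`; anything they leave out is guessed from the header.
guessColumns <- function(df, columns = NULL) {
  # Assumed header patterns; a real list would need input from users.
  patterns <- c(longitude = "^(lon|long|longitude|x)$",
                latitude  = "^(lat|latitude|y)$",
                value     = "^(value|presence|occurrence|abundance|count)$")
  found <- vapply(names(patterns), function(role) {
    # An explicit choice from the user always wins over guessing.
    if (!is.null(columns) && !is.na(columns[role])) {
      return(unname(columns[role]))
    }
    hits <- grep(patterns[[role]], names(df), ignore.case = TRUE, value = TRUE)
    if (length(hits) != 1) {
      stop("Could not uniquely guess the ", role, " column (",
           length(hits), " candidates); please specify it", call. = FALSE)
    }
    hits
  }, character(1))
  if (!all(found %in% names(df))) {
    stop("Column(s) not found in the data: ",
         paste(setdiff(found, names(df)), collapse = ", "), call. = FALSE)
  }
  out <- df[, found, drop = FALSE]
  names(out) <- names(found)
  out
}

# A spreadsheet-like data frame with extra columns and a non-obvious value column.
survey <- data.frame(site = c("A", "B"), Lon = c(-1.2, 0.5), Lat = c(51.3, 52.1),
                     n_seen = c(3, 0), recorder = c("TL", "NG"))
occ <- guessColumns(survey, columns = c(value = "n_seen"))
```

Refusing to guess when there are zero or several candidates is deliberate in this sketch: a wrong silent guess is exactly the kind of error that only shows up several modules later.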
One major alteration that was suggested at the workshop, and will be implemented in the coming weeks, is to change the syntax of the main workflow function from this
w <- workflow(occurMod = 'UKAnopheles', covarMod = 'UKAir', procMod = ModuleOptions('Crossvalidate', k=2), modelMod = 'LogisticRegression', outputMod = 'PrintMap')
to this
w <- workflow(occurrence = UKAnopheles, covariate = UKAir, process = Crossvalidate(k=2), model = LogisticRegression, output = PrintMap)
There are three changes here:
– A change in argument names
– Removing the requirement for quotation marks
– Passing module options as arguments to a function named after the module (the k=2 here) rather than using ModuleOptions
The third change is more difficult and has some technical challenges that Nick has mostly figured out.
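For anyone curious what the challenge looks like, here is a small sketch of one way unquoted module names and their options can be captured in R without the modules existing as real functions. This is not Nick’s solution and not zoon code; sketchWorkflow and parseModuleExpr are hypothetical names, and the approach (match.call plus evaluating only the option values) is just one possibility.

```r
# Hypothetical parser (not zoon code): turn an unevaluated argument such as
# UKAir or Crossvalidate(k = 2) into a module name plus a list of options.
parseModuleExpr <- function(expr, env) {
  if (is.name(expr)) {
    # A bare name: a module with no options.
    list(module = as.character(expr), options = list())
  } else if (is.call(expr)) {
    # A call: the function name is the module, the arguments are its options.
    # Only the option values are evaluated, in the caller's environment.
    list(module = as.character(expr[[1]]),
         options = lapply(as.list(expr)[-1], eval, envir = env))
  } else {
    stop("Expected a module name or a call such as Crossvalidate(k = 2)",
         call. = FALSE)
  }
}

# Hypothetical stand-in for workflow(): it never evaluates its arguments,
# it only inspects how they were written.
sketchWorkflow <- function(occurrence, covariate, process, model, output) {
  env <- parent.frame()
  exprs <- as.list(match.call())[-1]
  lapply(exprs, parseModuleExpr, env = env)
}

w <- sketchWorkflow(occurrence = UKAnopheles, covariate = UKAir,
                    process = Crossvalidate(k = 2),
                    model = LogisticRegression, output = PrintMap)
```

After this, w$process holds the module name "Crossvalidate" and the options list(k = 2), even though no function called Crossvalidate has been defined. The real implementation has more to worry about (lists of modules, chained modules, helpful errors for misspelt names), which is where the difficulty lies.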
Why is zoon different
We had some really profitable discussions about the target audience of zoon and what would be required for people to start using zoon. The big packages/software that are ostensibly in the same area are Biomod and GUI maxent. However, zoon is quite different from these, being in large part a wrapper around other packages.
It was noted that for people to start using zoon, it needs to be as easy as possible while providing things that other packages don’t. However, rather than just being easy, it should be more usable, i.e. it should suit the task better. The task of zoon is to do good, reproducible distribution modelling (not ‘get a map that vaguely looks feasible’). What a GUI could or should provide for zoon is an ongoing discussion. But instead of being just an easier way to perform a basic (and perhaps ill-advised) analysis, perhaps it should be an easy way to explore the functionality of zoon (i.e. the modules repository). And if a GUI is used to run whole workflows, it should always output a code version of the workflow so that it is reproducible, and preferably make the code immediately available to the GUI user as an entry point into the command line use of the package.
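As a sketch of the ‘always output a code version’ idea: if a GUI stored the user’s choices as a simple list of module names and options, turning that into a line of R is short. The list format and the workflowCode function below are assumptions made up for this example, and the generated call follows the proposed new syntax rather than anything that exists yet.

```r
# Hypothetical GUI state (assumed format): one entry per workflow stage,
# each with a module name and a list of options.
spec <- list(
  occurrence = list(module = "UKAnopheles", options = list()),
  covariate  = list(module = "UKAir", options = list()),
  process    = list(module = "Crossvalidate", options = list(k = 2)),
  model      = list(module = "LogisticRegression", options = list()),
  output     = list(module = "PrintMap", options = list())
)

# Hypothetical helper: build the workflow call as an R language object and
# deparse it, so the text shown to the user is valid R by construction.
workflowCode <- function(spec) {
  args <- lapply(spec, function(m) {
    if (length(m$options) == 0) {
      as.name(m$module)
    } else {
      as.call(c(as.name(m$module), m$options))
    }
  })
  call <- as.call(c(as.name("workflow"), args))
  paste(deparse(call), collapse = "\n")
}

code <- workflowCode(spec)
cat("w <- ", code, "\n", sep = "")
```

Building the call as a language object and deparsing it, rather than pasting strings together, avoids quoting and escaping mistakes in the code handed to the user.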
In terms of what zoon offers, there are two strains. Firstly, as a wrapper for many other packages, we should be able to provide a very wide range of models very easily. Furthermore, we aim to have very comprehensive functionality at every stage of an analysis, again in part by wrapping other packages rather than writing complex methods from scratch.
The second, perhaps more interesting, strain is that zoon should enable different types of analyses from those other packages support. For example, comparing a number of different models, easily combining data from a number of different sources, or quickly providing and combining effective and diverse output from your models. These are tasks that are a level above what most packages provide; in other words, most current packages would provide one step in these analyses. Instead of having to write lengthy R scripts that in their complexity lose their reproducibility, zoon should allow these higher-level analyses to be run quickly, simply and reproducibly.
Finally, I think there are some interesting differences between zoon, biomod and maxent. The major difference is that zoon is aiming to be a community. A small group of developers can never keep up with a large, burgeoning and rapidly moving field. However, a community can. I think this makes a huge difference (although it comes with its own set of issues).
It was also noted that zoon used to be spelt zoön to clarify the spelling. So this is my proposal for Zoön artwork.