End-of-test questions don't make much sense anymore now that the tests are chained. They should probably be moved into "end of study test". But ideally, in my opinion, we should have a test with those questions, and thus add the different question types. What is your input on it, @sbibauw?
How should we compute the score? As a study can have multiple types of tests, how should we count each of them and then aggregate them? I can imagine these cases:
What if one test has 60 questions and another has 10? Does it sum up to 70 questions, or is each test weighted 50%?
What about typing tests? How do we count their correctness? And how do we aggregate them?
How do we count gapfill? Does it have to be exactly the expected input? Or is there some threshold?
Good questions. The general answer is that it doesn't matter much for now. This is just quick feedback for the participant, NOT the main computation we really care about.
Score should be:
averaged over items (if group A has 60 items, and group B 10 items, the total is computed over 70; no weighting, no averaging at group-level)
typing tests are not graded, simply ignored
gapfill can be graded as 0/1 (at item level), 1 = exactly correct, 0 = anything other than the correct response
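To make the three rules above concrete, here is a minimal sketch in Python. It is not the project's actual code: the `Item` class, the `kind` labels ("choice", "gapfill", "typing") and the function names are all hypothetical, chosen only for illustration. It pools every graded item across tests, skips typing items, and grades everything else by exact match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    # Hypothetical structure; field names are assumptions, not the real schema.
    kind: str                       # e.g. "choice", "gapfill", "typing"
    response: str                   # what the participant answered
    expected: Optional[str] = None  # correct answer (unused for typing)


def grade_item(item: Item) -> Optional[int]:
    """Return 1 or 0 for a graded item, or None if the item is not graded."""
    if item.kind == "typing":
        return None  # typing tests are simply ignored
    # Exact match only: no trimming, no case folding, no similarity threshold.
    return 1 if item.response == item.expected else 0


def study_score(tests: list[list[Item]]) -> Optional[float]:
    """Average over all graded items, pooled across tests (no per-test weighting)."""
    grades = []
    for test in tests:
        for item in test:
            grade = grade_item(item)
            if grade is not None:
                grades.append(grade)
    if not grades:
        return None  # nothing to grade, e.g. a study with only typing tests
    return sum(grades) / len(grades)
```

With 45 of 60 correct in group A and 5 of 10 correct in group B, this gives 50/70 (about 0.71), whereas averaging at group level would give (0.75 + 0.5) / 2 = 0.625.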