The essence of our project is that we have observed an increase in the quality of work produced in a wiki (where our students work together) compared with their normal essays. However, we would like to quantify and qualify this rather than rely on impressions. Our simple initial approach was to compare the two sources by hand using defined indicators, such as evidence of critical thinking, use of primary sources, etc. These indicators can be tested by checking for the presence of certain words or word patterns.
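To make the idea of checking for words or word patterns concrete, here is a minimal sketch in Python. The indicator names and the patterns in INDICATORS are hypothetical placeholders, since we have not yet fixed our word lists, and count_indicators is a name chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical indicator patterns; the project's real word lists are not yet defined.
INDICATORS = {
    "critical_thinking": [r"\bhowever\b", r"\bon the other hand\b", r"\bthis suggests\b"],
    "primary_sources": [r"\bprimary source", r"\bmanuscript"],
}


def count_indicators(text, indicators=INDICATORS):
    """Return, for each indicator, the total number of pattern matches in text."""
    counts = Counter()
    for name, patterns in indicators.items():
        for pattern in patterns:
            # Case-insensitive search so sentence-initial words are also counted.
            counts[name] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return dict(counts)


if __name__ == "__main__":
    sample = "However, the manuscript is a primary source. This suggests a later date."
    print(count_indicators(sample))
```

Counts like these would only be a first approximation of an indicator; they say nothing about whether the word is used well.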
During the first month of the project I collected and cleaned up the essay and wiki data for development and testing purposes. Because the essays and wikis contain many Arabic words, Iqbal is creating an Arabic-English lexicon, which can be used in our NLP system at a later stage.
I also started looking at related work on automated essay marking and identified a number of essay marking systems that are rule-based, statistical or both. Next, I will have a closer look at NLP systems such as ETS I and BETSY to see what has been done previously and whether we can benefit from the existing systems.
I'm also looking at annotation tools we can use for creating training data for our system.
Here are our project poster and the slide for our talk:
http://tinyurl.com/yh7v9t5
http://tinyurl.com/yekh4cv
In order to develop our rule-based NLP system we will start by running the essay and wiki data through the LT-TTT2 pipeline (http://www.ltg.ed.ac.uk/software/lt-ttt2), an XML-based software package for shallow linguistic processing of text developed at the University of Edinburgh.
The TTT2 tool turns text into XML format, tokenizes the text, and performs sentence splitting, chunking and rule-based named entity recognition. Third-party tools perform part-of-speech tagging and lemmatizing.
The results of the part-of-speech tagging process let us recognize the types of words in our data, e.g. nouns or adjectives, which play an important role in assessing quality against our list of criteria. For example, a sentence in an encyclopedic-style wiki entry would feature a succession of adjectives, e.g. “luscious, ripe fruit found in paradise”.
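As a rough illustration of how part-of-speech output could feed this criterion, the sketch below looks for successions of adjectives in a list of (token, tag) pairs. It assumes Penn Treebank-style tags (JJ, JJR, JJS for adjectives) and a hand-tagged input; the function name adjective_runs and the input format are our own assumptions, not the output format of LT-TTT2.

```python
# Assumed tag conventions (Penn Treebank style); not taken from LT-TTT2 output.
ADJECTIVE_PREFIX = "JJ"
CONNECTOR_TAGS = {",", "CC"}  # commas and coordinating conjunctions


def adjective_runs(tagged_tokens, min_length=2):
    """Return lists of adjectives that occur in succession.

    Commas and conjunctions between adjectives do not break a run.
    """
    runs, current = [], []
    for token, tag in tagged_tokens:
        if tag.startswith(ADJECTIVE_PREFIX):
            current.append(token)
        elif tag in CONNECTOR_TAGS and current:
            continue  # "luscious, ripe" still counts as one run
        else:
            if len(current) >= min_length:
                runs.append(current)
            current = []
    if len(current) >= min_length:
        runs.append(current)
    return runs


if __name__ == "__main__":
    sentence = [("luscious", "JJ"), (",", ","), ("ripe", "JJ"), ("fruit", "NN"),
                ("found", "VBN"), ("in", "IN"), ("paradise", "NN")]
    print(adjective_runs(sentence))
```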
Shallow parsing or chunking gives us an insight into what the constituents (noun groups, verbs, verb groups, etc.) of a sentence are. The structure of the sentences is also one of our criteria for assessing quality. Chunking doesn’t specify the internal structure of the constituents or their role in the main sentence, though. This means that full parsing might be necessary at a later stage if the information given by chunking isn’t sufficient to determine quality.
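To show what chunk-level information looks like, here is a deliberately simplified sketch that groups determiners, adjectives and nouns into noun groups. It is not how the LT-TTT2 chunker works; the tag set (Penn Treebank style), the (token, tag) input format and the name noun_groups are all assumptions made for illustration.

```python
# Assumed Penn Treebank-style tags that may appear inside a simple noun group.
NOUN_GROUP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}


def noun_groups(tagged_tokens):
    """Group consecutive determiner/adjective/noun tokens into flat noun groups.

    A group is kept only if it contains at least one noun. The result is flat:
    it says nothing about internal structure or the group's role in the sentence.
    """
    groups, current = [], []

    def flush():
        if any(tag.startswith("NN") for _, tag in current):
            groups.append(" ".join(token for token, _ in current))
        current.clear()

    for token, tag in tagged_tokens:
        if tag in NOUN_GROUP_TAGS:
            current.append((token, tag))
        else:
            flush()
    flush()
    return groups


if __name__ == "__main__":
    sentence = [("The", "DT"), ("ripe", "JJ"), ("fruit", "NN"), ("is", "VBZ"),
                ("found", "VBN"), ("in", "IN"), ("paradise", "NN")]
    print(noun_groups(sentence))
```

The flat output illustrates the limitation mentioned above: the groups are found, but their relation to the verb is not.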
By default, the NER module is trained on English data only. To perform named-entity recognition correctly, we will therefore incorporate the lexicon of Arabic expressions we identified at the start of the project into the NER module, so that the Arabic words are picked up and identified correctly. NER is important for identifying the use of names, locations, references and such.
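The lexicon lookup itself can be pictured as a simple gazetteer match. The sketch below is an assumption about how such a lookup might work in plain Python, not the LT-TTT2 mechanism; the entries in ARABIC_LEXICON are illustrative placeholders rather than our actual lexicon, and find_lexicon_terms is a hypothetical name.

```python
import re

# Illustrative placeholder entries only; the project's lexicon is not reproduced here.
ARABIC_LEXICON = {"hadith", "tafsir", "sharia", "ahl al-kitab"}


def find_lexicon_terms(text, lexicon=ARABIC_LEXICON):
    """Return (term, start, end) for every lexicon entry found in text."""
    if not lexicon:
        return []
    # Try longer entries first so multi-word terms win over their parts.
    ordered = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(0).lower(), m.start(), m.end()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "The hadith and its tafsir mention the Ahl al-Kitab."
    for term, start, end in find_lexicon_terms(sample):
        print(term, start, end)
```

Matching on transliterated forms like this would miss spelling variants, which is one reason the lexicon needs to be built carefully.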
The following post shows the full list of essay quality criteria we have developed.
1.) References
a. Emphasis on primary sources is valuable
b. Web references < peer reviewed academic journal references
c. Introductory texts, class notes, and class readings < book references
d. Wikipedia and basic sites like Encyclopedia Britannica are less valuable
e. Number of unique book references in footnotes and bibliography is valuable
f. Quality of referencing using social science conventions is valuable
g. Many quotations and extensive quotations are less valuable
2.) Terminology
a. Use of higher-order and unique words in text such as: framework, criticism, illuminates, Orientalism, etc. (word list to be created)
b. Variety in referencing is valuable
c. Number and level of foreign (Arabic) terminology (word list to be created)
d. Number of unique proper nouns is valuable
e. Word repetition is less valuable
3.) Grammar
a. Run-on sentences are less valuable
b. Dependent clauses and passive voice that explain, using commas and semicolons, are valuable
Three papers are used as examples: excellent, average, and poor marks from the 2009-2010 class.
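Some of the Terminology and Grammar criteria above lend themselves to simple counting. The sketch below is one possible starting point, not our final method: word_stats reports vocabulary variety and the most repeated words, and long_sentences flags sentences over a word limit as a crude proxy for run-on sentences. Both function names and the max_words threshold are assumptions made for illustration.

```python
import re
from collections import Counter


def word_stats(text, top_n=3):
    """Count tokens and unique words, and list the most repeated words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    ratio = len(counts) / len(words) if words else 0.0
    return {
        "tokens": len(words),
        "unique": len(counts),
        "type_token_ratio": ratio,  # closer to 1.0 means less repetition
        "most_repeated": counts.most_common(top_n),
    }


def long_sentences(text, max_words=40):
    """Return sentences longer than max_words (a rough run-on heuristic)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


if __name__ == "__main__":
    sample = "The text cites the text. The author cites sources."
    print(word_stats(sample, top_n=1))
    print(long_sentences("One two three four. Five six.", max_words=3))
```

Sentence length alone cannot tell a run-on sentence from a well-built long sentence with dependent clauses, so this would need to be combined with chunking or parsing output.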
Status Update
This project started off as an exploration of possible applications for Informatics techniques to e-learning in the School of Divinity. As a concrete example, we wanted to look at whether we could derive anything interesting from trying to automatically compare student work in the form of traditional essays, with collaborative wikis. But this was always intended as a starting point, and we were hoping to find some interesting potential applications that might be suitable for future collaboration.
On the pedagogical side, we started with the observation that the quality of the student wiki work appeared to be better than that of the individual essays. But it has been very interesting to try and derive explicit criteria for the "quality" of the work. Interesting questions have been raised about the different styles and approaches when using wikis, as opposed to traditional essays - and it isn't clear how easily these can be compared.
We have identified a number of criteria, and we are currently refining these, attempting to clarify their importance, and to see how they might be identified automatically.
On the technical side, we have been dealing with the pre-processing (attempting to handle embedded Arabic phrases!) and researching potential tools and approaches. We are still hoping to demonstrate the automation of some criteria, but it is clear that a significant practical tool is beyond our resources at this stage.
However, as we had hoped, the collaboration has highlighted a few promising directions for the future. In particular, we have become very interested in the possibility of building a tool which would provide automatic formative feedback on student writing. This would be extremely valuable in practical terms. It appears that there has been enough work in this area to demonstrate the feasibility (e.g. [1]), at least in a more scientific subject. We would be very interested in exploring the issues of creating a practical tool in the humanities context.
So, we now have a much clearer focus of where the collaboration might go next. For the remainder of the project, we want to explore the specific technical and pedagogical issues of this particular application. And we now have a final aim of creating a report on this work, together with a clear proposal for further funding. With this in mind, we are hoping to broaden the collaboration a little to include people with appropriate experience, so that we understand the context and related work properly.
It is still rather too early to be able to disseminate results, but we are planning to submit a short presentation to e-Assessment Scotland [2], and we have submitted a proposal to the Online Education Conference [3].
[1] http://www.informaworld.com/smpp/content~db=all~content=a725291193~tab=c...
[2] http://www.rsc-ne-scotland.org.uk/eas/
[3] http://www.online-educa.com/
Question 10
Submitted by jlee on Fri, 11/20/2009 - 16:48.
It probably means, what kind of resources do you need -- how much money? And perhaps help in other ways, like contacting the right kinds of people to advise on techniques etc.