
September 18 to 19, 2011
Report on the Presentation at the Japanese Society of Social Psychology:
"Attempt to measure teamwork skills using mutual assessment by office workers"
1. Outline
We made an oral presentation on the validity of measuring teamwork skills and on understanding team structure at the 52nd Congress of the Japanese Society of Social Psychology, held on September 18 to 19, 2011, at Nagoya University. We also attended a session on responses to Internet surveys and gained insight into the benefits and challenges of such surveys, which will be quite useful for our future research.
2. Content of my presentation
Presenters:
Shinkichi Sugimori, CRET Researcher / Rie Tateishi, CRET Researcher / Atsushi Furuya, CRET Researcher / Ayaka Mori, CRET Researcher / Atsushi Aikawa, Ph.D., CRET Board of Directors
[Issues and Objectives]
The importance of teamwork has increasingly been emphasized by the OECD and the Ministry of Economy, Trade and Industry (METI) in Japan, through development of "skills to effectively formulate interpersonal relationships within a diverse group" and "abilities to work as a team."
In our research, we focused on mutual assessment (self-assessment and other-assessment) by office workers as one of the objective indicators of teamwork, and examined its relationship with the existing "teamwork skills scale."
The objectives of our research were:
1. To confirm the validity of the "teamwork skills scale"
2. To understand the teamwork structure by examining the relationship between mutual assessment and the existing teamwork skills scale, and by comparing self-assessment and other-assessment in mutual assessment.
[Methodology]
The "teamwork skills scale" (72 items) and mutual assessment, jointly developed with companies which collaborated in research, were used. The mutual assessment consisted of 20 items, including two perspectives of awareness of cooperation and execution skills of tasks.
A total of 112 people took part in the survey. The "teamwork skills scale" survey was conducted on the Internet in August 2010, and the mutual assessment survey was conducted using a questionnaire in November 2010.
[Outcome and Observations]
1. On examining the validity of the "teamwork skills scale"
There was a moderate or stronger correlation between the self-assessment in the mutual assessment (awareness of cooperation and execution skills of tasks) and four subscales of the "teamwork skills scale": "communication," "backup," "monitoring," and "leadership." Between other-assessment in the mutual assessment and two subscales of the teamwork skills scale, "communication" and "backup," there was a weak yet significant positive correlation. However, there was no correlation between self-assessment and other-assessment. The validity of the "teamwork skills scale" was thus confirmed to a certain extent by the observed correlation between the scale and mutual assessment, which is one of the objective indicators.
2. Attempt to understand team structure
There was a positive correlation between the value obtained by subtracting other-assessment from self-assessment and four subscales of the "teamwork skills scale": "communication," "backup," "monitoring," and "leadership." This means that the higher the self-assessment relative to other-assessment (self-enhancement), the higher the self-assessment on the teamwork skills scale. Conversely, the higher the other-assessment relative to self-assessment (self-criticism), the lower the self-assessment on the teamwork skills scale.
We also examined the degree of consistency in other-assessment toward the members (using variance as an indicator) and explored its possible relationship with shared perspectives within the team. The outcome indicated a negative correlation between the variance in other-assessment and the teamwork skills subscale "team orientation" for three items of the mutual assessment: "there is rich communication among employees," "gratitude is orally expressed," and "objectives of work are understood and efforts are made to accomplish them." These results show that the more the other-assessment varies on these three items, the lower the assessors' own self-assessment of "team orientation" skills.
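The two indices described in this section, the self-minus-other difference and the variance of other-assessment, can be illustrated with a minimal Python sketch. All ratings, member labels, and function names below are invented for illustration and are not the study data; how the ratings were grouped when computing the variance is also an assumption.

```python
from math import sqrt
from statistics import mean, pvariance

# Hypothetical ratings on one mutual-assessment item (1-5 scale).
# Invented for illustration; these are not the study data.
self_rating = {"A": 4, "B": 3, "C": 5}
other_ratings = {
    "A": [3, 3, 4],  # ratings that colleagues gave to member A
    "B": [4, 4, 4],
    "C": [2, 5, 3],
}

def discrepancy(member):
    """Self-assessment minus the mean other-assessment for one member."""
    return self_rating[member] - mean(other_ratings[member])

def other_variance(member):
    """Population variance of the other-assessments for one member."""
    return pvariance(other_ratings[member])

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical subscale scores (e.g. "communication") for the same members.
subscale_score = {"A": 3.8, "B": 3.1, "C": 4.5}

members = sorted(self_rating)
r = pearson([discrepancy(m) for m in members],
            [subscale_score[m] for m in members])
print(r)
```

A positive `r` here would correspond to the pattern reported above, in which members who rate themselves higher than their colleagues rate them also score higher on the self-report scale.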
[Q&A]
Q1. What is the average number of people who took the other-assessment in the mutual assessment?
A. Six on average.
Q2. What do you measure in the mutual assessment of teamwork? Tangible behaviors or potential skills?
A. Basically, tangible teamwork skills are measured, since responses are based on tangible behaviors. However, responses may also reflect potential skills that those behaviors bring to mind. We did not attempt to separate the two, so the potential-skill component is blended into the other-assessment.
Q3. Why is there a weak correlation between self-assessment and other-assessment?
A. Even though the same items are used in the assessments, we could not find a correlation between the two. It is assumed that there may be different criteria between self- and other-assessment. We have not yet explored this area and consider it necessary to pursue our research in the future.
(Atsushi Furuya, CRET Researcher)
August 31 to September 4, 2011
Report on the Presentation at the 11th European Conference on Psychological Assessment:
"Measuring Teamwork Competencies Implicitly Through Projective Responses to Computer Graphics"
1. Outline
We made an oral presentation on "Measuring Teamwork Competencies Implicitly Through Projective Responses to Computer Graphics (CG)" at the 11th European Conference on Psychological Assessment held at the University of Latvia from August 31 to September 4, 2011.
2. Content of my presentation
Presenters:
Shinkichi Sugimori, CRET Researcher / Rie Tateishi, CRET Researcher / Atsushi Furuya, CRET Researcher / Ayaka Mori, CRET Researcher / Atsushi Aikawa, Ph.D., CRET Board of Directors
[Introduction]
The importance of teamwork, an ability to develop interpersonal relationships among diverse groups, has been highlighted recently by the OECD and METI in Japan. The Teamwork Competencies Scale has already been developed. However, because this scale relies on verbal items, respondents can easily guess what is being measured, and their answers therefore tend to be affected by social desirability. Based on this observation, we used CG as a medium that makes the measurement objective less obvious to respondents, and developed a test aimed at implicitly measuring teamwork competencies. The research consists of the following two separate parts:
Study 1 Subject: university students
・Creation of stimulus CG; collection and categorization of projective responses to the CG.
・Identification of significant associations between the Teamwork Competencies Scale and projective responses to the CG.
→Selection of stimulus CG associated with individual teamwork competencies and creation of response alternatives based on the projective responses.
Study 2 Subject: working population
・Collection of projective responses to the stimulus CG selected in Study 1.
・Examination of the associations between the Teamwork Competencies Scale and projective responses to the CG.
→Examination of the feasibility of measuring teamwork competencies using CG tests.
Study 1
[Method]
・Materials
We used 6 stimulus CG, consisting of two targets (fish and people) x three situations (solo, in a group, outside a group) (Figures 1 through 6), and the Teamwork Competencies Scale, which consists of 5 subscales (Figure 7) and has 72 items in total.
・Procedure
In two university lectures, 139 students were shown the 6 CG one by one and asked to freely describe their responses to the question: "How do you think the yellow fish (person) feels?" The Teamwork Competencies Scale was distributed during the lecture and collected at the next scheduled lecture.
Fig. 1) a solo fish
Fig. 2) a fish in a group
Fig. 3) a fish outside a group
Fig. 4) a solo person
Fig. 5) a person in a group
Fig. 6) a person outside a group
Fig. 7) 5 subscales of the Teamwork Competencies Scale
[Results and Discussion]
For each subscale of the Teamwork Competencies Scale, respondents were divided into three groups (high, middle, low). The projective responses to the stimulus CG were divided into 6 categories consisting of two orientations (group/individual) x three evaluations (positive, negative and neutral).
We found significant associations between projective responses to the CG of a fish outside a group and teamwork competencies in the high or low groups of four subscales: communication, team-orientation, backup and leadership. We also found significant associations between the projective responses to the CG of a fish and a person outside a group and teamwork competencies in the middle group of the "monitoring" subscale. However, we could not find associations with teamwork competencies for the other CG.
We concluded that the CG of a fish or a person outside a group is effective for distinguishing the high and low groups of individual teamwork competencies. Thus, in Study 2, we explored the feasibility of measuring teamwork competencies with CG tests, using 12 response alternatives (two representative projective responses drawn from each of the 6 categories) and carefully selected CG of a fish outside a group.
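The report does not name the statistical test behind these "significant associations." One common choice for a table of competency group by response category is Pearson's chi-square test of independence; the sketch below assumes that test. The counts are invented, and the response categories are collapsed to two orientations (group/individual) to keep the example small.

```python
from math import exp

def chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and response were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

# Rows: high / middle / low group on one subscale (hypothetical).
# Columns: group-oriented / individual-oriented responses (hypothetical).
counts = [
    [30, 10],
    [20, 20],
    [10, 30],
]

stat, df = chi_square(counts)
# For df == 2 the chi-square upper tail probability is exactly exp(-x / 2).
p_value = exp(-stat / 2)
print(stat, df, p_value)
```

With the full 3 x 6 table described above, the degrees of freedom would be 10, and the p-value would need a general chi-square distribution function rather than the closed form used here.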
Study 2
[Method]
We focused on the CG of a fish outside a group and a person outside a group, as they showed a strong correlation with the Teamwork Competencies Scale in Study 1.
・Materials
We used a total of 11 CGs (the 6 CGs from Study 1 plus 5 new CGs of a fish or person outside a group; e.g., Figures 8 through 12), 12 response alternatives for each CG, and the Teamwork Competencies Scale.
・Procedure
Using the Internet, 112 employees (from a food film company with nationwide branches and offices) were tested. The Teamwork Competencies Scale (Sep. 2010) and the "CG test" (Nov. 2010) were administered.
Fig. 8) a fish outside a group (chasing)
Fig. 9) a person outside a group (chasing)
Fig. 10) a fish outside a group (following)
Fig. 11) a fish outside a group (circulating)
Fig. 12) a fish outside a group (moving in a wave form)
[Results and Discussion]
The analysis of the associations between the 5 subscales of the Teamwork Competencies Scale and the response alternatives for the stimulus CGs showed stronger associations than those observed in Study 1.
Significant associations were observed when the newly created CGs of a fish outside a group (chasing) and a fish outside a group (following) were shown. When the fish outside a group (chasing) CG was shown, we found associations between the response alternatives and teamwork competencies in the high and low groups of all subscales of the Teamwork Competencies Scale. When the fish outside a group (following) CG was shown, we found significant associations between the response alternatives and teamwork competencies in the high or low groups of four subscales: communication, backup, monitoring and leadership. These results suggest that teamwork competencies may be measurable using CGs of a fish or a person chasing or following a group.
However, when the CG of a fish outside a group was shown, we found fewer associations between the response alternatives and teamwork competencies than in Study 1.
On the other hand, stronger associations were observed between the response alternatives and the Teamwork Competencies Scale when the CG of a solo fish, a solo person, or a person outside or inside a group was shown, even though no significant associations were observed for these CGs in Study 1. These differences may be explained by differences in the respondents (university students versus working people), among other reasons. We therefore need to collect more data on the working population to investigate the reasons for these differences further.
[Conclusion]
We suggested the possibility of measuring individual teamwork competencies by using projective responses to the CGs, as we found a strong association between the Teamwork Competencies Scale and response alternatives to the CGs.
In further research, we will develop a scoring system for the CG test according to the strength of the associations between the projective responses and teamwork competencies, and check the validity of the CG test.
(Atsushi Furuya, CRET Researcher)
July 21st, 2011
Report on the presentation at the International Meeting of the Psychometric Society (IMPS 2011):
"A Multi-dimensional Continuous Item Response Model for Probability Testing"
We made a presentation at the poster session of The International Meeting of the Psychometric Society (IMPS 2011) held at the Hong Kong Institute of Education.
In this presentation, a multi-dimensional continuous item response model for Probability Testing (PT) was proposed. Moreover, the matrix of information function, a method of estimating item parameters and a method of estimating the subject's vector of latent traits were introduced.
The proposed multi-dimensional continuous model can be applied not only to PT scores but also to other continuous scores (for instance, short-answer or essay items scored on a continuous scale).
Since the domains measured in a multi-dimensional test may be dependent on each other, it is possible to draw more information from all items of the test than from a single subtest. In particular, a larger gain in accuracy can be expected for item types that require reading a long question text before answering (for instance, reading tests, knowledge application tests, etc.).
However, with PT, examinees need to assign each response option their subjective probability of its being correct, as an expression of partial knowledge. Examinees must not only understand the scoring rules but also respond appropriately. Thus, studies on examinee training are needed.
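To make the idea of a PT response and its scoring rule concrete, here is a minimal Python sketch. The report does not state which scoring rule was used; the quadratic (Brier-type) rule below is only one well-known example, and the function name and the probabilities are hypothetical.

```python
def quadratic_score(probs, correct):
    """Quadratic scoring rule: 2 * p_correct - sum of squared probabilities.

    probs   -- subjective probabilities for each option (must sum to 1)
    correct -- index of the correct option
    The score ranges from -1 (certain and wrong) to 1 (certain and right).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return 2 * probs[correct] - sum(p * p for p in probs)

# An examinee with partial knowledge spreads probability over the options.
partial = quadratic_score([0.7, 0.2, 0.1], correct=0)
certain_right = quadratic_score([1.0, 0.0, 0.0], correct=0)
certain_wrong = quadratic_score([0.0, 1.0, 0.0], correct=0)
print(partial, certain_right, certain_wrong)
```

Under a rule of this kind, reporting one's honest probabilities maximizes the expected score, which is one reason examinees need to understand the rule before responding.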
(Yipping Zhang, Ph. D., CRET Researcher)
July 21st, 2011
Report on the presentation at the International Meeting of the Psychometric Society (IMPS 2011): "Setting a Target Test Information Function for Assembly of IRT-Based Classification Tests"
This report describes the poster presentation entitled "Setting a Target Test Information Function for Assembly of IRT-Based Classification Tests," which was given at the International Meeting of the Psychometric Society (IMPS 2011) held in the Hong Kong Institute of Education, Hong Kong. More than 50 posters were presented in the session, and researchers in psychometrics engaged in active discussions with each other.
Test assembly means selecting an optimal set of items from an item pool so that the resulting test meets a certain criterion in terms of measurement accuracy. The measurement accuracy of a test based on item response theory (IRT) is represented by the test information function (TIF), and one needs to specify a target TIF for the test assembly. On the one hand, specification of the target TIF requires consideration of several factors, such as a desired level of measurement accuracy (i.e., standard error of estimation of ability) at each ability level, overall characteristics of items in the item pool, and simulation results. On the other hand, it would be useful if there were a systematic method to derive a target TIF from a small number of "conditions." The current study focused on the assembly of classification (i.e., pass/fail) tests, and proposed a method which numerically computes an optimal target TIF when the following two values are given: (a) an acceptable misclassification rate, which is the theoretical probability that an examinee with a true ability in the pass (or fail) level will be erroneously judged as fail (or pass), and (b) a threshold on the ability scale for pass/fail classifications.
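As a concrete illustration of how a TIF relates to the standard error of ability estimation, the following Python sketch assumes a two-parameter logistic (2PL) model. The report only says "IRT," so the choice of model, the function names, and the item parameters are all assumptions made for illustration.

```python
from math import exp, sqrt

def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta.

    a -- discrimination, b -- difficulty (both hypothetical here).
    """
    p = 1 / (1 + exp(-a * (theta - b)))
    return a * a * p * (1 - p)

def tif(theta, items):
    """Test information function: the sum of the item informations."""
    return sum(item_information(theta, a, b) for a, b in items)

def standard_error(theta, items):
    """Asymptotic standard error of the ability estimate at theta."""
    return 1 / sqrt(tif(theta, items))

# Four identical hypothetical items, each most informative at theta = 0.
items = [(2.0, 0.0)] * 4
print(tif(0.0, items), standard_error(0.0, items))
```

Because the standard error is the reciprocal square root of the TIF, quadrupling the information at a given ability level halves the standard error there.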
The problem was formulated in the framework of "statistical decision theory" in order to define an optimal TIF. Given a certain decision rule and loss function, the "risk function" for the pass/fail classification is considered as the misclassification rate conditional on ability. Its expectation with respect to the population distribution of ability is called the "Bayes risk" and in this case is equivalent to the overall misclassification rate. It must be noted that computation of these misclassification rates requires the TIF to be known. In usual decision theory one is concerned with looking for the best decision rule which minimizes the Bayes risk, whereas in the current problem we are interested in deriving a target TIF which keeps the Bayes risk below a certain value given the fixed decision rule.
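The relationship between the risk function and the Bayes risk can be sketched numerically. The Python code below is an illustration under stated assumptions, not the method of the presentation: it assumes that the ability estimate is normally distributed around the true ability with variance 1/TIF, that an examinee passes when the estimate is at or above the threshold, that the loss is 0 or 1, and that ability follows a standard normal distribution in the population. All function names are hypothetical.

```python
from math import erf, exp, pi, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def risk(theta, threshold, info):
    """Conditional misclassification rate at true ability theta.

    info -- a function returning the TIF value at a given ability.
    """
    return normal_cdf(-abs(theta - threshold) * sqrt(info(theta)))

def bayes_risk(threshold, info, lo=-6.0, hi=6.0, steps=2400):
    """Overall misclassification rate: risk averaged over N(0, 1) ability.

    Uses the midpoint rule on [lo, hi]; mass outside that range is negligible.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * h
        total += risk(theta, threshold, info) * normal_pdf(theta)
    return total * h

# Compare two hypothetical flat TIFs with the threshold at the population mean.
low_info = bayes_risk(0.0, lambda theta: 3.0)
high_info = bayes_risk(0.0, lambda theta: 16.0)
print(low_info, high_info)
```

The sketch runs in the usual direction, from a known TIF to a misclassification rate. The problem described above is the inverse one: finding a TIF that keeps this overall rate below a given bound.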
Computation of the Bayes risk involves complicated integration, and it is even harder to optimize the TIF contained in the Bayes risk. More manageable is the risk function (i.e., the conditional misclassification rate). If one assumes a certain functional form for the risk function, then the resulting Bayes risk has an analytical form. Then, the risk function can be uniquely determined if the threshold and the upper limit of the overall misclassification rate are given, and then it in turn uniquely determines the target TIF. TIFs obtained in this manner were plotted with varying values for the threshold and the overall misclassification rate.
The results indicated that the target TIF became larger as (a) the threshold became closer to the population mean of ability and/or (b) the overall misclassification rate became smaller. Consider the case in which the overall misclassification rate is less than 10%. When the difference between the threshold and the population mean was 1, the maximum value of the obtained target TIF was as small as 3. However, when the difference was zero, the maximum value of the target TIF jumped to around 16. This illustrates an advantage of the proposed method: one can obtain a standard for the target TIF by specifying only two values, the threshold and the overall misclassification rate.
Counterintuitive, but interesting, results were also obtained. Since measurement (or classification) accuracy of examinees whose true ability is close to the threshold is inevitably low, it seems reasonable to set the TIF high, especially near the threshold. However, the results indicated the opposite; the obtained target TIFs all had a sudden fall at and near the threshold. This means that theory tells us to "give up that low accuracy."
These results depend on the specific functional form assumed for the risk function. Other possible forms for this, together with other decision rules and loss functions, should be further considered, and the effects of these different configurations should be investigated.
(Kentaro Kato, Ph.D., CRET Researcher)
July 16th, 2011
Report on the presentation given at the 22nd Meeting of the Japanese Society for Curriculum Studies (JSCS)
I participated in the 22nd Meeting of the Japanese Society for Curriculum Studies (JSCS) held at Hokkaido University from July 16th to 17th, 2011 and gave a presentation at the independent research session. This report is the outline of my presentation.
First, I illustrated with a chart the transition of media and learning from the 1970s to the present in relation to educational philosophy. This transition shows a pendulum-like swing in educational philosophy. It is not unique to Japan; researchers in other countries have reported it as a global trend.
Next, I discussed digital textbooks, which first appeared in this decade and have recently been attracting growing attention, from the following perspectives:
1. Meeting on the computerization of school education
- The Ministry of Education, Culture, Sports, Science and Technology (MEXT)
2. Trends of schools in the future
- The Ministry of Internal Affairs and Communications (MIC)
3. Private trends
4. Digital textbook terminology
5. Joint use of digital and paper textbooks
6. Authorization of textbooks
7. Budgeting
8. Teacher training and support
9. Information terminals
10. Other open issues
There were questions from the floor concerning the relationship between the vision on the digitalization of education and education philosophy; and how the differences between instruction and construction relate to media.
During the symposium, the topic I found most interesting was the methodology of curriculum management. It is similar to instructional design in educational technology, with the PDCA cycle as its basis. I found a difference in the framework of research methodology between the JSCS and the Japan Society for Educational Technology (JSET): in the former, the ideal philosophy comes first, whereas in the latter, the need for application in education comes first.
I spent two cool summer days with drizzling rain in Sapporo.
(Kanji Akahori, Ph.D., CRET Board of Directors)
March 8th, 2011
Report on the presentation at the 17th Annual Convention of The Association for Natural Language Processing:
"Development and evaluation of the scoring support tool for closed constructed-responses"
I made a presentation at the poster session of the 17th Annual Convention of The Association for Natural Language Processing held at Toyohashi University of Technology from March 8th to 10th, 2011.
The closed constructed-response format is often used in school exams. Compared with multiple-choice items, it requires more human resources and time for scoring. Moreover, inconsistency among scorers becomes an issue when constructed responses are scored for large-scale tests. To minimize scoring inconsistency and improve efficiency when scoring large volumes of responses, I believe computerized scoring and scoring support for constructed responses are quite useful.
In my research, I developed a tool that makes scoring support easy to operate. In the presentation, I introduced the tool's functions and showed an example of applying one of them (verification against correct answers using BLEU) to trial test data. The open-source R language was used to develop the tool. R runs on all major operating systems (Windows, Mac OS X, Linux, etc.), which makes it possible to develop highly portable tools. In addition, a GUI toolkit and various analytical methods are available as add-on packages for R, and I used them to improve development efficiency.
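The idea of verifying a response against a correct answer with BLEU can be sketched as follows. This is a simplified illustration in Python rather than the R tool described here: it uses one reference answer, n-grams up to length 2, and whitespace tokenization, all of which are assumptions. Japanese responses would first need word segmentation.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against a single reference answer."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precision_sum += log(matched / total)
    # The brevity penalty lowers the score of answers shorter than the reference.
    if len(cand) >= len(ref):
        penalty = 1.0
    else:
        penalty = exp(1 - len(ref) / len(cand))
    return penalty * exp(log_precision_sum / max_n)

model_answer = "the cat sat"
print(bleu("the cat sat", model_answer))
print(bleu("the cat", model_answer))
print(bleu("dog", model_answer))
```

A score near 1 marks a response that closely matches the model answer, so such responses could be pre-sorted for a human scorer to confirm.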
Many people came to hear my presentation. Since R is not necessarily a common language in this field, I introduced its functionalities to those who were not familiar with it. Experts of natural language processing gave me many valuable comments on analytical methods of data. I found them particularly beneficial for furthering my research.
The tool is still far from complete. Its usability and its usefulness for actual scoring have yet to be verified. I intend to conduct evaluation experiments to check the usability of the tool and to apply it to various constructed-response data. Once these verifications are done and new results are obtained, I will present them at academic society and research meetings.
(Koji Nakajima, CRET Researcher)
December 18th, 2010
Report on the presentation at the Japan Society for Educational Technology (JSET):
"The comparison of paper and digital media for manga text"
I participated in a meeting held by the Japan Society for Educational Technology on “FD utilizing ICT and the partnership between senior high school/higher education and the general public.” It was held at the Dannoharu Campus of Oita University. I made a presentation on “The comparison of paper and digital media for manga text.”
I am planning to develop software that can recognize the allotment of frames (panel layout) in manga, with the aim of using educational manga as digital text. My presentation was based on a survey conducted as a preparatory stage for software development. It investigated whether there was any difference between digital text with allotted frames and paper manga text. The subjects were university students, divided into two groups: one group (20 students) was given paper manga text on the history of motion pictures, and the other group (20 students) was given digital text based on PowerPoint files that display one frame at a time on screen each time the Enter key is pressed. In this experiment, we compared learning outcomes, reading time, general impressions, etc.
We made the following hypotheses prior to the experiment:
- 1. It will take less time to read paper manga.
- 2. The digital text group, which takes a longer time to read each frame on a screen, will do better in the test which examines the knowledge of motion picture history.
- 3. For questions on characters that appear in manga, there will be no major differences between the two groups.
- 4. The subjects will feel that paper manga are easier to read.
The experiment results were as follows:
- 1. There was a significant difference (at the 1% level).
The digital group took 1.5 times longer to read the manga than the paper group.
It is assumed that pressing a key to move to the next frame took longer than turning a page.
- 2. There was a minor difference between the two groups on the test outcome for film history. The digital group scored slightly higher with 22.8 out of 30 points, whereas the paper group scored 20.6. The difference can be explained as being due to digital text requiring a longer time to read each frame than paper text.
- 3. The result was contrary to our hypothesis.
The digital group scored significantly higher (at the 5% level): 4.85 points for the digital group versus 3.70 points for the paper group. We believe this is also due to the longer time taken to read each frame with digital text.
- 4. As for the ease of reading, we concluded that there was no difference, as we found no significant differences between the two groups in the answers to the questionnaire Q.40, “Do you think PowerPoint manga text is easier to read than paper manga?”
In conclusion, I believe that there is a value in developing a function to cut & divide each frame of manga, because we can expect educational effects from the slow and careful reading necessitated by digitalized manga.
Many comments after the presentation suggested that a comparison should be made between two types of digitalized text, with and without frame allotment, instead of between the paper group and the frame-allotted digital group. I responded that this would be done in the next experiment.
As for other presentations, I found “Attempt of sharing notes between students using an iPod and a photo-sharing application,” presented by Mr. Yuuki Mori and Dr. Shigeto Ozawa, particularly interesting. One’s notebook is projected by students and accumulated as a portfolio, then reviewed using an iPod.
(Toshihiko Takeuchi, CRET Researcher)
October 23rd, 2010
Report on the presentation at the Japan Society for Educational Technology (JSET):
"Research on the relationship between annotation volume during reading and memory"
I made an oral presentation at the Research Conference of JSET held at Ibaraki University on Oct. 23rd, 2010. It concerned research, ongoing since 2009, on annotations in paper-based testing. My presentation focused on a simple question: the relationship between annotation volume during reading and academic achievement.
We use various types of annotations while reading, such as underlining, highlighting with a marker pen, and putting original marks on the page. As has long been pointed out, the same practice is observed when answering reading questions in tests. This is possible with a paper-based test, but not with a computer-based test (CBT). This is not because annotations are prohibited, but because CBTs are not well equipped with annotation capabilities. If annotations can affect test outcomes, CBTs need an interface that enables annotation. Annotations during learning have been shown to be effective under certain conditions. However, within a limited time frame such as testing, their effectiveness is doubtful. My presentation focused on the correlation between annotation volume and memory.
The results showed no correlation between annotation volume and either memory or reading ability for Japanese passages. One might simply assume that the recalled text would match the annotated sections. However, this was not necessarily the case: little relationship was observed between the recalled sections and the annotated sections. Students who regularly annotate were not significantly affected when annotations were prohibited. We can conclude that those who annotate in paper tests are not greatly affected by CBTs without annotation capabilities; their test outcomes are not influenced. My assumption was that if annotations affect test outcomes, CBTs should be equipped with annotation capabilities. However, a series of studies did not support this assumption.
The Research Conference was held immediately after the Annual Conference and therefore had fewer presentations than usual. It was held concurrently in two rooms. The main room was attended by only a few people, and the other room was quite popular with a series of research presentations on manga (cartoons). During the Research Conference, many presentations were made on electronic textbooks. An in-depth discussion was held on the future of education and the learning environment. I was much inspired and received hints for my future research.
(Masayoshi Yanagisawa, CRET Researcher)
September 18 - 20, 2010
Report on the Presentation at the 26th Annual Conference of the Japan Society for Educational Technology:
"Comparative analysis on tests using 4 types of media; a digital pen, a tablet PC, a PC, and a pencil & paper"
I made a presentation at the poster session of the 26th Annual Conference of the Japan Society for Educational Technology (JSET) held at Kinjo Gakuin University. This is the report and my observation on the presentation.
Computer-based testing (CBT) is expected to become prevalent in diverse settings. Compared with conventional paper & pencil tests, CBT allows problems to be presented and answered in different ways: problems may be presented on a computer screen rather than printed on paper, and answers may be entered with a keyboard, a touch pen, or a digital pen that recognizes letters written on paper. It is therefore important to study the effects of these differences by comparing conventional paper & pencil testing with the different media and tools that can potentially be used for CBT.
The objective of this research is to compare 4 different types of media that can be used for testing: a digital pen, a tablet PC, a PC, and a pencil & paper. The subjects were 96 university students, divided into four groups balanced in size and gender. They took the PISA 2003 reading test. The digital pen system manufactured by Hitachi Corporation and the Anoto digital pen were used for the study. Test outcomes were compared among the 4 groups.
The reading test consisted of multiple-choice and free response questions. The multiple-choice questions did not show any difference based on the type of media used. On the other hand, differences were observed among groups for the free response questions. The groups which used the digital pen and the pencil & paper scored higher than the groups who used the tablet PC and the PC.
The free response questions included questions asking students to give their logical opinions and questions instructing them to extract a relevant section from a long passage and discuss it. Differences were observed in the latter type. Specifically, the group which answered questions presented on paper by writing on a separate answer sheet with a digital pen, and the group which used a pencil & paper, scored higher. On the other hand, the tablet PC group and the PC group, which had to scroll through a long passage and questions on screen in order to answer, showed a tendency to score lower. It is assumed that the different answering methods influenced the test scores.
For some free response questions, both the digital pen group which answered on paper, and the pencil & paper group, showed a significant difference in terms of greater number of characters used, resulting in longer answers, compared with the other two groups.
Many researchers who were interested in media comparison research participated in my poster presentation and have given recognition to the significance of this study, and expressed their hope to see a detailed study and analysis to be continued in the future. I intend to leverage this experience in my future research activities.
(Yuuki Kato, Ph.D., CRET Researcher)
|
September 18 - 20, 2010
Report on the Presentation at the 26th Annual Conference of the Japan Society for Educational Technology:
"Automatic recognition of short written answers using machine learning"
A poster presentation was given at the 26th Annual Conference of the Japan Society for Educational Technology held at Kinjo-Gakuin University from September 18th to 20th, 2010.
The outcomes of recent international assessments of academic ability such as PISA triggered strong concerns about "linguistic ability" in Japan. These concerns will in turn raise the importance of exams with constructed-response questions, which require more time and work for grading. Meanwhile, the spread of ICT now makes it possible to administer tests with constructed-response questions by computer. If answers are stored in digital form, as on web pages and blogs, they can be processed for statistical analysis and data mining. Automatic grading and computer-assisted grading are considered quite effective in this situation; they would reduce workload and increase efficiency in grading a large volume of constructed responses.
In this study, the scores of short constructed responses were predicted by machine learning. Part of a collection of manually graded answers was used as a training sample; these answers were decomposed into words, and the computer learned the relationship between the pattern of word occurrences and the corresponding score. The learned relationship was then applied to the remainder of the answers to predict their scores. The accuracy of this method was examined by comparing the computer-predicted scores with the manually graded scores. The R language was used to conduct the entire procedure.
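The procedure described above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and the report does not name the learning algorithm, so the simple word-overlap rule below is only an assumed stand-in. The function names, the whitespace tokenizer, and the toy answers are all hypothetical.

```python
from collections import Counter

def tokenize(answer):
    """Decompose an answer into lowercase words (whitespace split; an assumption)."""
    return answer.lower().split()

def train(graded_answers):
    """Accumulate word counts per score from (answer, score) pairs."""
    counts = {}
    for answer, score in graded_answers:
        counts.setdefault(score, Counter()).update(tokenize(answer))
    return counts

def predict(counts, answer):
    """Predict the score whose training words overlap most with the answer."""
    words = tokenize(answer)

    def overlap(score):
        total = sum(counts[score].values())
        return sum(counts[score][w] for w in words) / total

    return max(sorted(counts), key=overlap)

def accuracy(counts, held_out):
    """Share of held-out answers whose predicted score matches the manual score."""
    hits = sum(predict(counts, answer) == score for answer, score in held_out)
    return hits / len(held_out)

# Hypothetical toy data: manually graded answers split into training and held-out parts.
training = [
    ("the author argues that reading builds vocabulary", 2),
    ("reading builds vocabulary", 2),
    ("i do not know", 0),
    ("no idea", 0),
]
held_out = [("reading builds a large vocabulary", 2), ("i have no idea", 0)]

model = train(training)
print(accuracy(model, held_out))
```

Japanese answers would need a morphological analyzer instead of a whitespace split, and a practical system would use a proper classifier; the sketch only shows the train, predict, and compare flow.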
The following points were highlighted in the presentation. First, even though the overall accuracy was not yet sufficient for practical application, it improved for some answers when majority votes over multiple predictions were used. Second, accuracy could possibly be improved by taking scoring rubrics into account. Finally, using the R language for the entire analysis had an advantage over conventional methods, which required a dedicated program for each process, such as decomposition of answers into words, machine learning, examination of accuracy, and graphical display.
Due to time and venue constraints, I was able to have discussions with only a limited number of people. However, some of them pointed out issues that I had never considered, which was a great opportunity for improving this study. As a CRET researcher, I intend to continue to collect relevant information and engage in research exchange at various academic societies and study groups, as well as to present my research outcomes at those opportunities.
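The majority-vote idea mentioned in the first point can be sketched as follows. This is an illustrative Python sketch, not the procedure used in the study; the function name and the tie-breaking rule (the lowest score wins a tie) are assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent predicted score; ties go to the lowest score (assumed rule)."""
    counts = Counter(predictions)
    best = max(counts.values())
    return min(score for score, n in counts.items() if n == best)

# Three hypothetical predictions for the same answer, e.g. from different training samples.
print(majority_vote([2, 2, 1]))
```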
(Koji Nakajima, CRET Researcher)
|
September 17 - 18, 2010
Report on the Presentation at the 51st Convention of the Japanese Society of Social Psychology:
"Development and Study on the Validity of the ‘Measurement Scale of Teamwork Ability of Individuals’"
I participated in, and gave an oral presentation at the 51st Convention of the Japanese Society of Social Psychology, held at Hiroshima University from September 17th to 18th, 2010. Hiroshima was hot with the end-of-summer heat.
The presentation was on the development of a scale to measure the individual teamwork ability of university students and on the study of its validity.
I emphasized that no measurement scale existed for individuals, even though scales had been developed for the teamwork ability of a team as a whole.
By referring to the preceding research, I identified five factors which comprise individual teamwork ability: communication ability, team orientation, backup action, monitoring adjustment, and leadership ability.
Then, items were collected for each scale, and the validity of the scale was examined using the α coefficient and structural equation modeling. As a result, validity was confirmed from three aspects: the content aspect, the structural aspect, and the generalizability aspect of construct validity. Based on this outcome, I am convinced that the scale developed in my research proved valid. Moreover, measurement equivalence between genders was also confirmed. These outcomes, I believe, enhance the usefulness of the scale.
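As an illustration of the α coefficient, the sketch below computes Cronbach's α, assuming that this is the coefficient meant here. The function name and the toy scores are hypothetical and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, each holding that respondent's k item scores."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of scores per item
    item_variance_sum = sum(variance(column) for column in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical scores of three respondents on two perfectly consistent items.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

A value close to 1 indicates that the items of a subscale vary together across respondents.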
During Q & A, comments were made on the equivalence between genders, the practical application and usefulness of the scale, the response bias, etc.
On the usefulness of the scale, I responded that it can provide useful information in interventions aimed at developing teamwork ability and when organizing a team. Concerning the response bias, I responded that the research on measuring individual teamwork ability using CG, which is currently under examination, should also help clarify the bias.
My presentation was given orally using PowerPoint. It was attended by about 20 people and I received questions on my presentation from the floor. It was my first academic presentation, and I am glad that it went well.
(Masahiro Takamoto, CRET Researcher)
|
September 17 - 18, 2010
Report on the Presentation at the 51st Convention of the Japanese Society of Social Psychology:
"A Preliminary Attempt on the Measurement of Teamwork Ability using Computer Graphics (CG)"
I attended and gave my poster presentation at the 51st Convention of the Japanese Society of Social Psychology held from September 17th to 18th, 2010 at Hiroshima University. My presentation was attended by around 13 people.
My presentation reported the outcome of preliminary research undertaken to develop a CG test that measures teamwork ability.
Specifically, it examined the relationship between free descriptions of reactions to 45-second video images of fish and people, and individual teamwork ability (Takamoto, Aikawa & Sugimori, 2010).
The research was conducted with university students. Responses to questions on individual teamwork ability were obtained. Six types of CG stimulus images (a single fish/person, a fish/person outside the group, a fish/person inside the group) were each shown for 45 seconds, and students were asked to describe freely what the designated fish or person might be feeling.
The relationship between the subordinate factors of the individual teamwork ability scale (communication ability, team orientation, backup action, monitoring adjustment, leadership ability) and the appearance rates of the categorized free descriptions of the CG shown was analyzed with a chi-square test. A relationship was found between the subordinate factors other than monitoring adjustment, including backup action, and the appearance rates of free descriptions of the CG shown (a fish outside the group, a fish inside the group, a person inside the group). This outcome suggests the possibility of measuring the teamwork ability of individuals using CG.
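The kind of chi-square test mentioned above can be sketched in Python as follows. The contingency table is entirely hypothetical (for example, high and low scorers on one subordinate factor against whether a description category appeared); it is not data from the study, and the function name is an assumption.

```python
def chi_square_statistic(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: rows = high / low scorers, columns = category appeared / did not appear.
statistic, dof = chi_square_statistic([[10, 20], [20, 10]])
print(statistic, dof)
```

The statistic is then compared with the chi-square distribution for the given degrees of freedom to judge whether the two classifications are related.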
The questions I received concerned the validity of the CG, the method of creating the CG, etc. On the validity of the CG, I responded that future research should be conducted with company employees, together with an objective assessment of individual teamwork ability within the company, in order to examine the validity of the CG. On the method of creating the CG, I demonstrated it using CG software.
Many people showed an interest in my poster presentation because it was a new attempt, and also because I showed the CG images that were used in the research. The 45 minutes originally allocated for my presentation was not enough, so I continued for as long as the schedule allowed.
I must admit that I was quite nervous since it was my first academic presentation. Nevertheless, the poster presentation gave me an invaluable opportunity to develop new perspectives, and receive various comments and advice.
(Rie Tateishi, CRET Researcher)
|
September 6th, 2010
Report on the Presentation at the Japanese Federation of Statistical Science Association 2010:
"New Generic Skills Required of Workers in the 21st Century-Assessment and Development of those Skills"
CRET and Benesse Corporation engaged in joint research from 2008 to 2009. This consisted of preliminary research on a statistical and mathematical processing test for university students, followed by the development of training materials. I presented the outcome of this joint research at the planning session entitled "cross-sectional human resources development" at the 2010 Convention of the Japanese Federation of Statistical Science Association. Approximately 120 researchers and educators participated in the session. This report covers the content of my presentation with a focus on the weaknesses of university students in the area of statistical and mathematical processing, and the composition of the training materials.
One of the essential skills needed in the 21st century is problem-solving capability, the capacity to effectively solve different types of problems. We first defined the ability to solve problems objectively based on numerical data, and developed an assessment method geared to university students. This is a context-based test modeled after the problem-solving process developed by CensusAtSchool in New Zealand. It follows the five problem-solving steps of the PPDAC (Problem, Plan, Data, Analysis, Conclusion) cycle.
Through preliminary research with 200 university students, we identified the weaknesses of students whose grades range from low to middle. Some of the weaknesses we found are: a lack of ability to express "the growth rate of revenue over the previous term" using numerical expressions; difficulty grasping the meaning of the symbols explained in the questions; and an inability to accurately interpret numerical values that they had calculated correctly. In short, they are not equipped with the ability to connect words and numbers.
Based on this finding, we developed training materials. They follow the problem-solving process in the same way as the framework of the test, and consist of three processes: 1. Identify the problem and the context; 2. Analyze the problem by "confirming the position + 3 viewpoints"; 3. Judgment. We used situations that the students are familiar with. When we conducted a monitor survey among about 80 students, we received positive feedback, such as "became aware of the importance of data analysis and statistics", "felt that it was useful for job hunting", "came to recognize the meaning of the things I was doing unconsciously", etc.
The last comment suggests the importance of lecturing on the problem-solving process intentionally and explicitly, rather than the implicit teaching often observed in university classrooms in Japan.
(Chie Hoshi, CRET Researcher)
|
August 29th, 2010
Report on the Presentation at the 52nd Annual Meeting of the Japanese Association of Educational Psychology:
"A study on the subject vocabulary of Japanese Language and Math"
The Assessment and Analysis of Educational Testing of CRET has been conducting research and development on multiple-choice vocabulary tests on different subjects. Part of the research outcome was shared through an oral presentation entitled "A study on the subject vocabulary of Japanese Language and Math" given at the 52nd Annual Meeting of the Japanese Association of Educational Psychology.
About 20 people listened to the presentation. Most of them were university researchers.
Please refer to the PDF materials for details.
(Yiping Zhang, Ph.D., CRET Researcher)
|
August 27 - 29, 2010
Report on the Presentation at the 52nd Annual Meeting of Japanese Association of Educational Psychology:
"Examining the validity of Shyness IAT"
I gave an oral presentation at the 52nd Annual Meeting of the Japanese Association of Educational Psychology held at Waseda University (Waseda Campus) from August 27th to 29th, 2010.
The Annual Meeting had changed its style of presentation. All presentations were orally given in small rooms without any poster presentations. There were numerous sessions. No large room was used because there were no poster presentations. It was quite a refreshing change and I enjoyed it.
My presentation was on the validity of the Japanese version of the shyness IAT (Implicit Association Test), which we have developed and have been studying for three years. It is similar to what was presented at the International Congress of Applied Psychology (ICAP) in July, 2010. The purpose of the presentation in September was to disseminate information further throughout Japan and overseas.
The research results from last year were accepted by the research journal "Japanese Journal of Psychology", published by the Japanese Psychological Association in September, 2010. It is an important recognition of our research. However, we were well aware that this research had to be furthered by increasing the number of samples to show that the outcome is robust. This was set as our future agenda. The presentation reported here, I believe, has fulfilled this agenda because sufficient samples were secured and a robust outcome was attained.
My presentation reported research showing that explicit shyness measured by a questionnaire and implicit shyness measured by the IAT respectively predicted the degree of controlled shy behaviors perceived by others (actively appealing one’s presence, etc.) and of non-controlled shy behaviors (blushing in front of people, etc.). This confirmed last year’s research outcome. Thus, I believe the validity of the shyness IAT was confirmed.
The time given for the oral presentation was 20 minutes, including 5 minutes for Q & A. Compared with a poster presentation, there was less time for explanations. However, I am grateful for the questions and comments given to me.
Questions included "What stimulus words did you use in the IAT?" and "How did you select those words?", as well as questions on the models we assumed for the covariance structure analysis. I deeply appreciate the wide range of questions, from procedural matters to the substance of the analysis. It was an invaluable experience for me.
With this presentation, the purpose of our attempt to measure shyness with IAT was attained. From now on, I would like to pursue new challenges.
(Tsutomu Fujii, CRET Researcher)
|
August 30th, 2010
Report on the presentation at the Japan Association for Research on Testing (JART) 8th annual meeting:
"The Framework of problem-solving abilities as the required skills for the 21st century and international assessment"
CRET has been engaged in continuous research and development of the abilities and competences required in the 21st century. The research outcome was presented at the JART 8th annual meeting, entitled "Measurement of Skills", in one of its planning sessions on the theme of "Assessment and learning of problem solving." Approximately 70 researchers and professionals in the education field attended the session. The following is a report on the presentation made, focusing on the formulation process*1 of DeSeCo*2 key competencies.
With the advance of technology and globalization, routine work is rapidly being replaced by computers. As a result, more sophisticated competencies are required in the areas of work where people remain engaged, such as the ability to exercise flexible judgment depending on the circumstances. Discussion of such changes in required competencies began in the 1980s in the international community.
The OECD established three key competencies*3 identified as appropriate competencies in various areas such as economy, politics, society, family and individuals. They are used as the underlying principle applied to OECD international surveys such as PISA.
The DeSeCo projects were promoted by Dr. Heinz Gilomen from SFSO*4, Switzerland; Dr. Eugene Owen from NCES*5, U.S.A.; Dr. Barry McGaw and Dr. Andreas Schleicher from the OECD; and Dr. Scott Murray from Statistics Canada, who joined the team in the year 2000. In the first half of the period since the inception of the projects, existing OECD reports were reviewed and expert opinions were collected from multidisciplinary fields involving educators, economists and psychologists. In October 1999, the outcome of the review was presented at the 1st DeSeCo International Conference held in Neuchatel, Switzerland. From the year 2000, opinions from OECD member states were invited. Twelve members*6 submitted reports, which were presented at the 2nd DeSeCo International Conference held in Geneva, Switzerland, followed by the final report compiled in the year 2002. There are 4 official reports*7 presented by the DeSeCo projects.
Following the presentation of key competencies by DeSeCo, the EU, the U.S.A., Japan and other countries have proposed their respective frameworks of key competencies. We plan to examine the methods of assessment and development of key competencies by accurately defining them as the abilities*8 to draw on relevant knowledge and skills depending on circumstances, transcending the frameworks of specific expertise and disciplines.
*1 source: DeSeCo Background paper, Revised (2001)
*2 THE DEFINITION AND SELECTION OF KEY COMPETENCIES
*3 Using Tools Interactively, Interacting in Heterogeneous Groups, Acting Autonomously
*4 Swiss Federal Statistical Office
*5 National Center for Education Statistics
*6 Austria, Belgium, Denmark, Finland, France, Germany, the Netherlands, New Zealand, Norway, Sweden, Switzerland, and the US
*7 Projects on Competencies in the OECD Context: Analysis of Theoretical and Conceptual Foundations (1999), Comments on the DeSeCo Expert Opinions (1999), Definition and Selection of Key Competencies (2000), Defining and Selecting Key Competencies (2001)
*8 source: THE DEFINITION AND SELECTION OF KEY COMPETENCIES Executive Summary
(Chie Hoshi, CRET Researcher)
|
August 30 - 31, 2010
Report on the presentation at the 6th National Convention of the Japan Association for Developmental Education (JADE):
"From the perspective of University Teaching Practitioners"
The Japan Association for Developmental Education (JADE) National Convention was held at the Shonan Institute of Technology on August 30 & 31, 2010. The chairman of the convention executive committee was Mr. Ryuichi Mizumachi, a CRET researcher. Another CRET researcher, Mr. Masashi Misono, also worked as an active member of the committee.
I was one of the designated presenters at the main symposium held in the main hall from 14:20 to 16:20. The theme of discussion was "Liaison exam questions between high school and university and the university education which ensures sufficient scholastic abilities." It was an extremely interesting topic for me as a CRET researcher.
The presenters were as follows (from the Table of Contents of the Convention):
1. The problems of "liaison"
By Dr. Toshihiro Kawamoto, the President of Japan Achievement Society
2. Turning point of Japanese-style liaison between high school and university
"What the Liaison exam between high school and university can offer"
By Dr. Takao Sasaki, specially-appointed professor at Hokkaido University
3. Designated discussion:
"From the perspective of University Teaching Practitioners"
By Dr. Kanji Akahori, Professor Emeritus of Tokyo Institute of Technology,
Professor of Hakuoh University
The Liaison exam between high school and university proposed by Dr. Sasaki is highly acclaimed and is now drawing much attention as a possible replacement for the National Center Test. The conventional test is referenced to the group of examinees, whereas the Liaison exam is standardized. Many participants showed interest in this aspect.
I posed questions from the perspective of what should be expected of the Liaison exam. These expectations should include establishing study habits, acquiring learning skills, and education that guides students to set objectives, in order to guarantee the minimum standard to be met for graduation. There were many responses from the participants. Publishing the record of the discussion in the academic journal of JADE is being considered. At the reception that followed, I received questions from several academics.
(Kanji Akahori, Ph.D., CRET Board of Directors)
|
July 30 - August 4, 2010
Report on the presentation at the 92nd National Arithmetic/Mathematics Education Research Convention:
"Research on developmental teaching materials to enhance interest of mathematics <VI>"
An oral presentation was made at the 92nd National Arithmetic/Mathematics Education Research Convention, held from July 30 to August 4, 2010 in Niigata. This is joint research with Dr. Akihiro Yoshida, lecturer at Musashi University and at the Junior & Senior High School at Komaba, University of Tsukuba.
This research aims to develop a teaching methodology and its applications that use "Cavalieri’s principle" and "homothetic ratio" to obtain the area of the region bounded by a parabola and a line (mensuration), instead of the conventional method using the definite integral taught in Math II at high school. We believe students can understand this method more intuitively. In our presentation, we introduced the method of mensuration and its generalization.
Research on mensuration dates back to Archimedes’ mensuration by division. This method, also called the "method of exhaustion", obtains the area of a region by approximating it with triangles or polygons (expressed as the sum of an infinite series). When Newton and Leibniz established differential and integral calculus in the 17th century, mensuration based on the "definite integral", which uses the inverse relationship between differentiation and integration (the fundamental theorem of calculus), became the popular method.
On the other hand, the "Cavalieri’s principle" was discovered by Cavalieri from Italy in the early 17th century. It is a general principle concerning area and volume. On volume, for instance, this principle states that "the volume of two solids which always have an equal area of cross section is equal." Using this principle, the volume of a sphere can be easily obtained.
The method we proposed first uses Cavalieri’s principle to compare a general region bounded by a parabola with the most basic region (called the unit region) within that region. Then two unit regions are combined and Cavalieri’s principle is applied once again to obtain the area of the unit region as well as of the general region. Thus, the area can be obtained without referring to the concept of "infinity" required by the method of exhaustion and the definite integral.
Also, by using "even and odd functions", the "binomial theorem", and "mathematical induction", all taught in high school mathematics classes, the proposed method can be extended beyond the parabola to cases where the curve is expressed by a polynomial of non-negative integer degree.
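For orientation, the well-known closed form that any such method must reproduce can be written down. The symbols are introduced here only for illustration and do not come from the presentation: it is assumed that the parabola y = ax^2 + bx + c and the line y = mx + n meet at x = α and x = β with α < β. The first line gives the vertical cross-section length of the enclosed region, and the second gives its area S.

```latex
\begin{aligned}
\ell(x) &= \bigl|(ax^{2} + bx + c) - (mx + n)\bigr| = |a|\,(x - \alpha)(\beta - x),
  \qquad \alpha \le x \le \beta, \\
S &= \frac{|a|}{6}\,(\beta - \alpha)^{3}.
\end{aligned}
```

Because the cross-section length depends only on |a|, α, and β, Cavalieri’s principle implies that the area is the same whatever the slope of the line, which is why a comparison with a simpler reference region is possible.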
Our presentation took place in the room where basic and independent research themes at a senior high school sectional meeting were being discussed. There were about 30 participants, most of whom were senior high school teachers and graduate school students. During the Q & A session, Fermat’s preceding research was introduced. I am now examining its content with my co-researcher. Moreover, we are continuing research on the math teaching ideas which apply the method we have proposed, in addition to our proposition of the mensuration based on a different way of thinking.
(Daisuke Abe, CRET Researcher)
|
July 11 - 16, 2010
Report on Presentation at the International Congress of Applied Psychology (ICAP2010):
"An attempt to measure implicit and explicit shyness: Using Implicit Association Test (IAT)"
We participated in the International Congress of Applied Psychology, ICAP2010, held in Melbourne, Australia, from July 11th to 16th, and gave a presentation using electronic posters.
Melbourne has tram services throughout the city and they provided us with a convenient means of transportation. The International Congress was held at a large conference hall. It was an impressive venue compared with other academic congresses often held at university campuses in Japan. The congress staff offered warm hospitality and were very accommodating. They tolerated my English and were very responsive to the questions and requests of the participants. I was touched.
Our presentation was prepared by uploading PowerPoint slides to the ICAP server. It was a different approach from the poster presentations generally made. During the Congress, participants were able to view the slides of the presentation they wished to read from the computers installed at the venue.
There was neither a presentation time schedule nor paper posters. It was a different experience from the poster presentations we were used to. We received no questions or comments on our presentation, and the same was true of the other presentations shown at the same time. Little communication seemed to take place around the electronic posters as a whole.
We presented additional testing that we conducted to measure shyness using the Implicit Association Test (IAT), following the previous testing by Fujii, Sugimori, and Aikawa in 2009. New participants (different university students) were used. According to last year’s research, shyness measured by a questionnaire and shyness measured by the IAT predicted different behaviors. Shyness measured by a questionnaire predicted one’s controllable behaviors (e.g., appealing one’s existence to others). On the other hand, shyness measured by the IAT predicted one’s uncontrollable behaviors (e.g., blushing in front of others). The dependent variable was the other-evaluation made by the test participants’ friends. We believed that if our research with new participants reproduced an outcome similar to that of the previous research, the validity of the IAT as a measurement of shyness would be enhanced.
The participants were university students from different universities from the previous experiment. In experiment 1 (participants: 91 undergraduate and graduate students), along with an invitation to participate in the new experiment, the explicit shyness of the participants and social desirability were measured using a questionnaire. In experiment 2 (Participants: 50 undergraduate and graduate students), implicit shyness was measured using the IAT, and three sets of envelopes per participant (an other-evaluation questionnaire was enclosed) were provided and an assessment by three friends was requested.
A factor analysis of the other-evaluation scale was conducted. Four factors were extracted: sociability, gaining praise, denial avoidance, and interpersonal tension. When the factors were categorized according to controllability, they divided into controllable behaviors (sociability, gaining praise, denial avoidance) and behaviors that are difficult to control (interpersonal tension). Next, a covariance structure analysis was conducted with the explicit and implicit shyness of the participants as independent variables and the four factors of the other-evaluation as dependent variables. The outcome was similar to that of the previous research. Specifically, explicit shyness measured by the questionnaire predicted the degree of controllable actions (3 factors), whereas implicit shyness measured by the IAT predicted actions that are difficult to control (1 factor).
Moreover, explicit shyness and the scale which measured social desirability showed a correlation, and a possibility of distortion in response was suggested.
On the other hand, there was no correlation between the IAT and social desirability. This is an outcome which supports the effectiveness of the IAT.
Aspects which can be measured only through implicit measurement were indicated. It means that implicit measurement is useful for measuring shyness in a multi-dimensional manner.
(Tsutomu Fujii, CRET Researcher)
|
July 8th 2010
Report on a presentation at the International Meeting of the Psychometric Society (IMPS 2010):
"An Item Response Model for Probability Testing"
This report describes the oral presentation entitled "An Item Response Model for Probability Testing," which was given at the International Meeting of the Psychometric Society (IMPS 2010) held at the University of Georgia in Athens, Georgia. Approximately 30 people, most of whom were researchers in psychometrics, participated in the session.
Probability testing is a response method for multiple-choice test items. In probability testing, for each response option, an examinee gives his/her subjective probability that the option is correct, unlike the usual "choose-one-alternative" method. It is expected that probability testing enables an estimation of the target ability with better accuracy than the "choose-one-alternative" method. It is because the former takes into account a finer difference in the probability rating attached to the correct response option, while the latter is only concerned with whether the response is correct or incorrect.
In this study, an item response model that can be applied to response data from probability testing (the PT model) was proposed, and its performance was evaluated using simulations and real data.
The proposed PT model can be regarded as a probability testing version of the two-parameter logistic model (2PLM) in the item response theory (IRT). Familiar item characteristics in IRT, such as the item response function (discrimination and difficulty) and the item information function, can be expressed in exactly the same manner in the PT model. In terms of the item information function, which represents the estimation accuracy of the ability parameter, it is theoretically derived that the PT model yields information at least several times as large as 2PLM.
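For reference, the standard 2PLM on which the PT model is said to build can be written as follows. This is the textbook form with the scaling constant omitted, not the PT model itself, whose equations are not given in this report. The symbols θ (ability), a_j (discrimination of item j), and b_j (difficulty of item j) follow the usual IRT notation and are assumptions as far as this report is concerned.

```latex
P_j(\theta) = \frac{1}{1 + \exp\{-a_j(\theta - b_j)\}}, \qquad
I_j(\theta) = a_j^{2}\, P_j(\theta)\,\bigl(1 - P_j(\theta)\bigr)
```

Here P_j is the item response function and I_j is the item information function; a larger I_j means that the item estimates the ability parameter more accurately near θ.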
In the simulation study, replications of hypothetical data with 20 items and 500 respondents were generated under the PT model and used to examine parameter recovery and estimation accuracy of the PT model. Performance of the PT model was also compared to cases in the 2PLM with equivalent item parameters. The results indicated that in the PT model the parameters were estimated with practically sufficient accuracy (measured by the mean squared error), and it was substantially better than the 2PLM in terms of both item and ability parameters. The fact that the estimation accuracy of the ability parameter in the PT model was much better than the 2PLM showed the effectiveness of the PT model.
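The shape of such a simulation can be sketched in Python. Since the PT model is not specified in this report, the sketch generates data only under the 2PLM, and the parameter distributions, function names, and random seed are assumptions chosen for illustration. Parameter estimation itself is not shown; the mean squared error function only indicates how recovered values would be compared with the true ones.

```python
import math
import random

def simulate_2plm(abilities, items, rng):
    """Return a respondents x items matrix of 0/1 responses generated under the 2PLM."""
    data = []
    for theta in abilities:
        row = []
        for a, b in items:
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))  # probability of a correct response
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data

def mean_squared_error(true_values, estimates):
    """Average squared difference between true and estimated parameter values."""
    return sum((t - e) ** 2 for t, e in zip(true_values, estimates)) / len(true_values)

rng = random.Random(0)  # fixed seed so that the replication is reproducible
abilities = [rng.gauss(0.0, 1.0) for _ in range(500)]  # 500 respondents (assumed N(0, 1))
items = [(rng.uniform(0.5, 2.0), rng.gauss(0.0, 1.0)) for _ in range(20)]  # 20 (a, b) pairs
responses = simulate_2plm(abilities, items, rng)
print(len(responses), len(responses[0]))
```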
In the study with real data, the probability ratings given by 414 high school students to 20 English test items were analyzed. In this particular data set, more than 60% of the ratings were either exactly 0% or 100%, and this degraded estimation under the PT model. In situations like this, it would be better to use an IRT model after transforming the probability ratings into dichotomous data.
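One way to turn probability ratings into dichotomous data can be sketched as follows. The report does not state which transformation was intended, so the rule used here is purely an assumption: an item counts as correct only when the examinee gives the correct option a strictly higher rating than every other option. The function name is hypothetical.

```python
def dichotomize(ratings, correct_index):
    """Return 1 if the correct option alone receives the highest rating, else 0 (assumed rule)."""
    top = max(ratings)
    is_unique_top = ratings.count(top) == 1
    return 1 if ratings[correct_index] == top and is_unique_top else 0

# Hypothetical ratings for a four-option item whose correct option is the first one.
print(dichotomize([0.7, 0.1, 0.1, 0.1], 0))
```

Other rules, such as a fixed threshold on the rating of the correct option, would be equally possible; the choice affects how ties and hedged answers are scored.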
Probability testing requires training of examinees so that they can express their beliefs in the form of probability. In the discussion that followed the presentation, a few participants asked about how to train examinees or what population of examinees would be appropriate for probability testing, while some others expressed very positive reactions to the idea and usefulness of the PT model. The practicality of probability testing heavily depends on trainability and the background knowledge of examinees, so we will continue to work on these issues.
(Kentaro Kato, Ph.D., CRET Researcher)
|
July 3 - 4, 2010
Report on the presentation at the 21st Assembly of the Japanese Society for Curriculum Studies:
"Comparative Survey on study motivations and mathematics ability of junior high school students in Japan and in China"
The re-analysis outcome of the comparative survey on study motivations and mathematics ability of junior high school students in Japan and in China was presented at the 21st Assembly of the Japanese Society for Curriculum Studies held at Saga University on July 3rd and 4th, 2010.
This is joint research by Kanji Akahori and Liu Yun Long (Mitsubishi Chemical Corporation), conducted several years ago. The two researchers reviewed and reanalyzed the research data for the following reasons:
-This research was little known as a measurement of international academic ability in China.
-The outcome far exceeded our expectations.
-The test results suggested the existence of a general cultural gap, arising from differences between the two cultures in the perceptions of guardians and students as well as in the curriculum.
I want to explain the background to this study first. China has not been an active participant in international academic ability surveys. However, the outcome of international academic ability surveys in Singapore, Taiwan and Hong Kong led us to believe that children of Chinese descent perform rather well. Also considering the fierce competition for entrance examinations, it was anticipated that Chinese children might outperform Japanese children. On the other hand, due to the wide economic disparity between urban and rural regions of China, we did not expect a high performance among children in rural communities.
The outcome of our assessment showed the best performance in rural regions in China, followed by urban regions in China, and Japanese students. The details are described in the presentation summary. The causes of this surprising result had to be analyzed. We implemented a questionnaire for both Chinese students and their guardians, with the purpose of investigating the relationship with the grades. At the same time, face-to-face interviews with Chinese students and teachers were conducted.
Our findings showed a gap between what we had assumed and the reality. We had envisioned a grim image of competition for entrance examinations. However, in our surveys, this image gave way to one of students progressing positively towards their hopes and dreams. This gap seemed to reflect the difference in the momentum of politics and economy in Japan and China today. Moreover, the fact that students in rural districts in China outperformed students in urban districts in China and in Japan was quite shocking. This outcome can be interpreted as a gap in culture and awareness having a greater influence on academic ability than economy, contrary to the OECD analyses and other research materials.
The challenging issue here is the reliability of the outcome. We note that 829 students were actually tested. China is a vast nation and there may be major inter-regional influences. I intend to pursue our joint research and further analyze the aspects relative to teaching and guidance methods as well as education principles. The photo shows the survey site.
Many questions were entertained at the Assembly. After the presentation, several researchers proposed engaging in joint research with us.
(Kanji Akahori, Ph.D., CRET Board of Directors)
|
June 27 - July 1, 2010
ED-MEDIA2010 Presentation Report:
"CBT and Paper Testing in an Examination of Regular Expression, Using University Students as Research Subjects"
This presentation was made at ED-MEDIA 2010, held in Toronto, Canada from June 27th to July 1st, 2010. This is joint research by Toshihiko Takeuchi and Tomohiro Wakui (Ibaraki University).
The objective of this research is to compare CBT (Computer Based Testing) and paper testing.
We first developed the following hypothesis: in a learning field such as regular expressions, CBT allows respondents to simulate their responses. As a result, CBT leads to far better results than paper testing, CBT test takers can accurately anticipate their test results, the rate of blank answers rises, and respondents have a better impression of their experience after taking a CBT.
In order to verify this hypothesis, we gave a one-hour lesson on regular expressions to 40 students, from freshmen to seniors, from various universities. Immediately after the lesson, they were divided into two groups of 20 students each. One group took a CBT and the other group took a paper test consisting of the same 12 questions. Before and after the test, the students in each group were requested to fill out a questionnaire on their attributes and their level of confidence towards each question.
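To illustrate what "simulating a response" might mean for a regular expression question, here is a minimal sketch in which a candidate pattern is tried against sample strings before it is submitted. The helper `try_pattern`, the sample strings, and the pattern are hypothetical; the actual CBT system used in the experiment is not described at this level of detail.

```python
import re


def try_pattern(pattern, samples):
    """Try a candidate pattern on each sample string.

    Returns a dict mapping each sample to True/False (whole-string match),
    or None when the pattern itself is not a valid regular expression.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return None
    return {s: compiled.fullmatch(s) is not None for s in samples}


# Hypothetical question: match dates written as YYYY-MM-DD.
samples = ["2010-06-27", "2010-6-27", "20100627"]
result = try_pattern(r"\d{4}-\d{2}-\d{2}", samples)
```

A paper test offers no such feedback loop, which is the difference the hypothesis relies on.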
Test procedures are as follows:
1. Lesson on regular expression (60 minutes)
2. Pre-test questionnaire (5 minutes)
3. Break (10 minutes)
4. Examination on regular expression (20 minutes)
5. Post-test questionnaire (5 minutes)
In the regular expression lesson given before the examination, software nearly comparable to the CBT system was used. Figure 1 shows the system used for CBT.
The result of the experiment was as follows:
The full score is 12 points. The CBT group (5.85 points) scored slightly higher than the paper test group (5.05 points), but with no significant difference between the two groups. As for the mean difference between one's expected score and the actual score, the CBT group (1.50) and the paper test group (1.80) showed no significant difference. There was, however, a significant difference in the number of answers left blank between the CBT group (4.58 questions) and the paper test group (0.08 questions).
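The report does not state which significance test was used or give the standard deviations. As a sketch only, a two-group comparison of this kind could be carried out with Welch's t-test from summary statistics, as below. The group means and sizes come from the text above; the standard deviation is a hypothetical placeholder, so the resulting statistic says nothing about the actual study.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Means (5.85, 5.05) and group sizes (20, 20) are from the report.
# The standard deviation is HYPOTHETICAL; the report does not give one.
HYPOTHETICAL_SD = 2.5
t_stat, df = welch_t(5.85, HYPOTHETICAL_SD, 20, 5.05, HYPOTHETICAL_SD, 20)
```

The statistic would then be compared against a t distribution with `df` degrees of freedom to judge significance.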
Our findings indicated the following tendency of CBT takers for an examination of regular expression:
-Test takers have a tendency to “keep challenging the questions until they are solved and answer them only when they are confident of their answers.”
-Test takers tend to become perfectionists and spend much time on each question. As a result, they tend to spend too much time at the beginning and run out of time before answering all the questions. Their examination results end up similar to those of paper test takers.
During the Q & A session, there was a comment:
“Researchers stated that the ‘CBT group ran out of time because they were more devoted to answering each question.’ However, the fact that the paper test group showed a lower rate of blank answers can be interpreted as evidence that the paper test group tried harder than the CBT group.” Based on our observation of actual testing and without any significant difference in the accuracy rate in total between the two groups, our impression was that the paper test group was simply filling in the blank for each question, for the sake of doing so.
(Toshihiko Takeuchi, CRET Researcher)
|
May 15th 2010
Report on the presentation at the Japan Society for Educational Technology:
"Study of the influence of modern character images appearing in test questions and giving oral encouragement to the test takers"
We have experimented on the effects of modern character images uttering encouraging words to the test takers at the beginning and in the middle of tests. The results were orally presented at a research conference of the Japan Society for Educational Technology held on the Asahikawa Campus of the Hokkaido University of Education.
Most of the research concerning testing is centered on the study of the cognitive aspect. However, it is empirically shown that the sentiment of the examinee also affects test results.
Many examinees tend to feel nervous or anxious while taking a test. If there is an opportunity to make them relax and ease their tension, some positive impact on the test results and their attitude towards test taking may be expected.
In our research, we have used character images which appear in the middle of questions and speak to the examinee. Their effects on testing were studied.
This is a follow-up research to the CRET research conducted in 2009 by Drs. Yuuki Kato, Shogo Kato, and Kanji Akahori as the Phase I study on the effects of character images giving encouragement.
In our research as the Phase II, we have repeated the method used in the previous research.
At the beginning and in the middle of the paper-based test, encouraging words coming from human characters were shown. We have examined whether this has helped ease the tension of the test takers and led to test taking in more relaxed manner, or whether this has had any negative impact. We have also studied if the encouragement by the character images has any impact on the actual implementation of the task.
The following are the differences between the research of Phase I and Phase II.
In the Phase I research, we set up three conditions with encouraging words corresponding to each of the three conditions, and compared them to each other. We used three characters to give encouraging words. One was a male junior high school student, the second was a female junior high school student, and the third was a pencil character. The first and second characters, which were human characters, seemed outdated and were seldom seen in teaching materials commercially available today.
In the Phase II research, we used modern and trendy animated characters called "moe" in Japanese.
In the Phase I research, examinees, who were university students, seemed to feel exposed to images of junior high school students because the characters were junior high school students. In the Phase II research, we utilized a mature teacher-like character. Conditions were increased by adding "no character" and "no encouraging words".
We have analyzed the effects of each condition on the test results and the attitude of examinees. The result indicated that the encouragements given by the characters, in particular, have effectively influenced the way examinees tackled the problems.
Compared with the Phase I research, the use of modern characters suggested the possibility of exerting a more favorable influence. As for the test results, there was no statistically significant difference. We have concluded that the encouragement by the characters helped students reduce the amount of wasted time and improved their attentiveness with more persistence during test taking.
Our presentation invited many comments from the participants of the research conference. They are summarized as follows:
1. Since the use of characters giving encouragement had more impact on the way examinees carried on with their test taking, perhaps this method is considered more useful when used in various learning activities.
2. Does the content of the test make a difference in the impact of the characters or not?
3. The same type of research should be done with CBT as well.
4. It is not possible to conclude that the examinees were more relaxed as a result of the encouragement by the characters.
We intend to explore the open issues by taking account of these valuable comments.
(Yuuki Kato, Ph.D., CRET Researcher)
|
May 15th 2010
Report on the presentation at the Japan Society for Educational Technology:
"Comparison of CBT and paper testing for regular expressions given to university students"
An experiment was conducted to compare CBT and paper testing for regular expressions among university students and the result was orally presented at a research conference of the Japan Society for Educational Technology held at the Asahikawa Campus of the Hokkaido University of Education.
The objective of this research was to make a comparison between Computer Based Testing (CBT) and paper testing.
In a field of learning such as regular expressions, respondents are able to simulate their answers in CBT. Therefore, the following assumptions were made for CBT.
1. Test results can greatly improve compared against paper testing.
2. Respondents are able to accurately predict their test results.
3. The number of blank answers will increase.
4. Respondents' impression after the test will improve.
In order to verify these assumptions, we conducted a class on regular expressions for students from various universities. Forty students ranging from freshmen to seniors took a one-hour classroom lecture, immediately followed by testing. They were divided into two groups with twenty students in each. One group was given CBT and the other a paper test, each consisting of the same 12 questions. Before and after taking the test, each student was asked to respond to a questionnaire concerning their confidence for each question and their attributes.
The result of the experiment showed no significant difference in the test, which had a perfect score of 12 points. The CBT group scored 5.85 points, which was slightly higher than 5.05 points by the paper test group. As for the mean value of the difference between the expected and the actual score, there was no significant difference, with 1.50 points for CBT and 1.80 points for paper test group.
A significant difference was shown in the rate of blank answers, with 4.58 items in the CBT group and 0.08 in the paper test group.
We have concluded that in CBT on regular expressions, respondents tend to spend much time in their attempt to answer each question, answer the questions only when they are confident, and leave the rest blank. In other words, they are inclined to perfectionism, spend too much time on the questions in the beginning, and tend to run out of time before answering the rest of the questions. As a result, their test results tended not to differ much from those of the paper test group.
In the Q & A session, there was a comment suggesting that the focus should be on a test which can be simulated, rather than on a comparison between CBT and a paper test.
(Toshihiko Takeuchi, CRET Researcher)
|
October 10-12, 2009
Report on the JSSP 2009/JGDA 2009 (SP50) Presentation:
"The Concept of Teamwork: Working toward Measurement of Teamwork Capabilities"
A presentation on the development of the scale to measure teamwork capabilities was given at JSSP2009/JGDA2009(SP50).
A standardized scale to measure individual teamwork capabilities does not exist in Japan, even though several scales have been developed in other countries. There are three reasons which explain why relying on the translation of foreign research is not sufficient for the successful development of a teamwork scale for domestic use.
1. A cultural gap may exist in the concept of teamwork.
2. Depending on different conditions, such as team goals, activities, and members, a team member may exhibit different qualities and contribute a different quantity of teamwork.
3. Teamwork is a complicated phenomenon and around seven subscales are usually combined to measure teamwork in foreign countries.
It is therefore fair to assume that the most reasonable combination of factors does vary, depending on the team goals, members, culture, etc.
We have surveyed the connotation/meaning of the concept in order to find out what exactly it meant when Japanese university students, and workers in the U.S. and Japan, positively or negatively assessed teamwork.
Based on its outcome and the existing related scale, 30 scale items which were deemed to reasonably measure teamwork (with high face validity) were selected. Japanese and English versions were created.
A survey was conducted among Japanese and American workers to explore how each item is considered to effectively measure the degree of individual teamwork abilities ("Face Validity Survey").
The result of the connotation survey showed that Japanese university students frequently picked "cooperativeness" and "common goals". Japanese workers selected "fun", "ability to listen", "vigor", "openness", "accommodating", "laughter", "entrusting", "high" and other cheerful and enjoyable images associated with a work environment. On the other hand, in the U.S., there was a tendency to select words to fulfill the team mission such as "communication", "knowledge", "sharing", "leader", and "decision-making".
Based on this observation, the Development of Testing Items Group at CRET has picked representative items from the existing scale items and integrated them into the newly developed items. Then, face validity was examined. Next, the gender and cultural gap between Japan and the U.S. was studied.
An active Q&A session followed the presentation. Participants expressed interest in future development and suggested that "severity", "punishment", and "compliance", which did not appear in the connotation survey, were also important items.
(Shinkichi Sugimori, Ph.D., CRET Researcher)
|
October 10-12, 2009
Report on the JSSP 2009/JGDA 2009 (SP50) Presentation:
"Assessment of Shyness using IAT (2)"
This is a report on the poster presentation given at the JSSP 2009/JGDA 2009 (SP50) on shyness assessment, using the Implicit Association Test (IAT). A fruitful discussion with many exchanges of opinions followed the presentation.
Compared to other methods which measure individual differences in implicit cognition, the IAT has been recognized as a test with superior reliability and stability. Since its advent, an increasing number of studies have used IAT methodology. Asendorpf, Banse, & Mücke (2002) validated the double dissociation between implicit and explicit personality self-concept in the case of shyness. In their model, the IAT predicts spontaneous shy behavior (tense posture, etc.), whereas the explicit ratings predict controlled shy behavior (length of speech), resulting in double dissociation.
Research in which implicit association measures predict often-uncontrollable behavior, as represented by observed changes in task performance (e.g., Egloff & Schmukle, 2002; Fujii & Uebuchi, 2009), shows that explicit measures can be regarded as predicting self-controllable behavior, whereas implicit measures predict uncontrollable behavior.
Considering the above research outcome, Fujii, Sugimori, & Aikawa (2008) developed an IAT that measures shyness by referencing the Asendorpf et al. (2002) model, and implemented the test. The outcome was used to study the relationship between implicit shyness and social desirability. Additional data from Fujii and others was collected and used for re-analysis. The result is reported as Study 1. In Study 2, other-assessments by friends and acquaintances of the Study 1 participants were used to examine what the shyness IAT assesses.
(Tsutomu Fujii, CRET Researcher)
|
October 27th, 2009
Report on E-Learn 2009 Presentation:
CBT R&D Report on "the Use of Animation to Display Math Problems in a Computer-Based Test"
In late October 2009, we presented our research on the use of animation to display math problems in the computer-based testing interface, at the international conference, E-Learn (World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009) held in Vancouver, Canada.
This research was initiated to explore alternative methods to measure math competence without relying on the examinees' reading literacy. We created four types of math questions (solid figures, velocity, equation, pattern locus) in two versions: one version as animation and the other as written texts. The experiment was conducted involving 20 university students. Our major thrust of the experiment was to measure how well and how fast the examinees were able to understand the questions, rather than the accuracy of answers to the questions.
Based on the questionnaire assessment, there was a significant difference in examinees' impression that the animation version seemed easier. However, based on the rubric assessment on understanding of questions, as well as the time required for comprehension of questions, there was no significant difference between the two versions. At E-Learn 2009, we have presented our conclusion by suggesting, as a possibility for math examinations, the use of animation instead of written questions.
Three individuals posed questions and comments vis-a-vis our presentation, confirming the details of the experiment and analytical methods, as well as a comment that it would be a useful method of presenting questions to children with learning difficulties. Advice and positive feedback were received, such as ways to improve the item bank through animation, and its application to other science and math courses.
This experiment may have depended on the characteristics of questions and the examinees, and therefore further verification is to take place.
(Masahiro Yachi, CRET Researcher)
|
October 24th, 2009
The influence of images of characters who speak encouraging words on examinees.
Computer Based Tests (CBTs) are expected to be introduced in various scenes. CBTs may have fewer constraints than Paper Based Tests (PBTs) in terms of displaying the questions, and the methods of answering. Images and animation can also be used. Research was conducted as the first stage of examining how the use of character images may or may not influence examinees. By inserting human characters who speak encouraging words at the beginning or in the middle of a test, what kind of effects could be expected? Would it ease the tension of the examinees and help them relax, or might negative effects be expected instead? Would it affect the process of answering the questions? The study was carried out using paper-based experiments.
Questions were created that were simple but rather monotonous, and could be irritating and time-consuming to answer. Images of characters were utilized at the beginning and in the middle of the tests. They spoke words meant to cheer up the test takers. Four conditions were used to differentiate the way words were spoken:
1. The character of a boy spoke encouraging words.
2. The character of a girl spoke encouraging words.
3. An illustration of a pencil spoke encouraging words.
4. Encouraging words were heard without character images.
Results showed that using the human character images (1 and 2) negatively affected the performance and mood of test takers.
Many comments were received after our report was presented at the Japan Society, which can be summarized as follows:
1. There are a variety of character images that can be used. Therefore, a continued study should be made using different character images.
2. The experiment used images of a boy and a girl. It may be interesting to compare the feedback based on how each gender responds to the character of their own gender or the opposite gender.
3. The character images used looked like junior high school students. The examinees were university students. They may not have liked encouragement coming from younger characters.
4. The test questions used were of a simple nature. If difficult questions were used, perhaps the results of the experiment might have been different.
Based on the comments received, we intend to continue with our research.
(Shogo Kato, Yuuki Kato, CRET Researchers)
|
October 24th, 2009
Research on the influence of "with" or "without" annotations on simple tests.
Students generally use annotations while taking tests, such as underlining and marking key words, or scribbling in the margin of test sheets. It is possible with paper-based tests, but usually not with CBTs. If annotation gives positive effects on the test performance, CBTs need to be equipped with a comparable interface in order to avoid differences between CBTs and PBTs, caused by the ability or inability to annotate. If, however, there is no difference in performance, with or without annotation, there is no need to make an enhancement to CBTs with an interface for annotation.
This research was conducted in a simple test to study the effect of annotation by comparing the results between two groups: one allowed to use annotation and the other not allowed to use it. The relationship between the examinees' annotation activities in daily life and the control of annotation activities was also investigated.
To our surprise, the result showed that the use of annotation made little significant difference. Annotation proved effective under the circumstances where annotation was absolutely advantageous. However, in general, there was almost no difference. When the test involved difficult reading or when annotation methods were not formalized, in some cases there was the possibility of annotation interfering with the answers. There are many open issues to be explored, such as the relationship between the difficulty of questions and the effects of annotation, whether the right way to annotate needs to be taught, and the verification of the effect of such teaching.
In our presentation, we have also introduced the preceding research result. It showed that underlining while reading a book is effective, only when the text is quite difficult and there is sufficient time to do so. There was some feedback from the floor that it was surprising to hear that the effect of annotation was not very apparent. Most of the feedback consisted mainly of asking for the details of the research. There was not much discussion. We sensed that they had high expectations of further analysis. Our research did not include the analysis of the relationship between daily annotation activities and the test results of the university students who have participated in our experiment. We intend to announce the result in the future.
(Masanori Yanagisawa, CRET Researcher)
|
September 2nd, 2009
Report entitled "Methodology of Research on the University Teaching and its Application on the Freshmen Education (the First Year Experience at Universities and Colleges)" presented at the 5th National Convention on the Japan Association for Developmental Education.
Dr. Kanji Akahori, a member of the CRET board of directors, gave the presentation at the 5th National Convention on the Japan Association for Developmental Education, held at Chitose Institute of Science and Technology from September 1st to 2nd, 2009.
His report was based on the findings of the Design Based Research applied to analyze classroom teaching, design and assess the freshmen education at the universities and colleges.
Here is the abstract of the presentation:
[Summary]
Research methodology vis-a-vis university classroom teaching cannot be easily constructed in general. I have reviewed several research methods and decided to use the Design Based Research 2003, abbreviated as DBR, as the pragmatic and demonstrative approach. The report is on the findings based on the application of DBR on freshmen education.
Since the direct application of the DBR method involved difficulties, I have decided to use the revised version which was developed as a method to improve university classroom teaching. Using the revised DBR, an analysis, design and assessment of classroom teaching was made.
As part of the freshmen education, I teach a subject named freshmen seminar. The content of the course consists of the common freshmen curriculum, flavored with newly developed content through the ingenuity of the faculty members.
The outline of the freshmen education consists of basic courses such as orientation, self-introduction and a campus tour. I have added specific content such as writing, newspaper reading, educational counseling, basic education terminology, note-taking, and logical writing into the freshmen courses to complement each other. The content is comparable to the seminar offered by the Department of Education.
The DBR and the revised DBR enable us to repeat the designing process while assessing each program, and explain it using past findings in addition to generating new methodologies as findings.
In this research, I shall report on the findings gained by applying the revised DBR.
(Dr. Kanji Akahori, a member of the CRET board of directors)
|
August 5, 2009
Report entitled "Research on Vocabulary Competency for school subjects and PISA-type Knowledge Application Competency"
presented at the 37th Convention of the Behaviormetric Society of Japan
A research group in the Assessment and Analysis of Educational Testing section has been engaged in the research and development of multiple-choice testing to measure the mastery of vocabulary for key school subjects. This is part of a more general project which is intended to develop methods to predict mastery of school subjects in a simpler manner.
In 2008, the research group developed vocabulary tests for four key subjects (Japanese, arithmetic/mathematics, science, and social studies) and a PISA-type knowledge application competency test. A preliminary survey was conducted targeting fifth graders and eighth graders. The result was presented at the 37th Convention of the Behaviormetric Society of Japan.
Many scholars from universities attended the session. One of the questions asked was on the recommended teaching method for students who scored low on vocabulary tests. Another question was whether the unidimensionality of each vocabulary test had been verified by subject area.
(Yiping Zhang, Ph.D., CRET Researcher)
|
March 11th, 2009
The CBT Interface Research Development Report on the Intensive Reading Comprehension Exam: ED-MEDIA 2008 Presentation Report
 Scene of the Experiment
 System Display
 Presentation at ED-MEDIA
At the Research Department on Advancement of Testing Technology, we are currently researching a new test interface as a part of a motherboard development project for next-generation examinations.
In 2008, we researched and developed a test interface for an intensive reading comprehension exam that is written vertically. After conducting an experiment with 22 university students as test subjects, a presentation was given in the Short Paper (New Development) section at the international ED-MEDIA conference.
When vertically-written Japanese is displayed on a computer screen, which is often the case for intensive reading comprehension exams, the entire examination text, using this new interface, can be viewed in a single screenshot. The new interface can also display the same number of letters per vertical line as exams printed on paper. Moreover, this interface can be used to display a variety of printing formats commonly used in various media, including the 41-letter (per vertical line) format in the case of the new pocket paperback, 20 letters in the case of manuscript paper, and 13 letters in the case of newsprint. The display can be changed by zooming and scrolling with the mouse, giving the examinee the ability to view the entire exam in one glance. This has enabled us to create a test environment similar to that of an exam printed on paper.
Although the focus of this presentation was on the development of the interface for the purpose of reading or writing Japanese, it was seen as something other languages could benefit from. Three researchers from English-speaking countries asked us questions and made suggestions. We exchanged opinions about the fact that students examined on printed materials scored higher. We also discussed the algorithm used in enlarging font sizes as well as its implementation methods; the discussion was useful for future research. We will continue to make suggestions for new methods of presenting examination questions, as well as researching the effects that writing long answers on a computer has on examinees. We also hope to further our research on how to deal with the issue of showing the intermediate steps taken by the examinee when solving a mathematical problem. Finally, we hope to create a test environment that links the above to an automatic marking system.
(Masahiro Yachi, CRET Researcher)
|
|