In search of searchable Han texts of primary sources for pre-20th-century Vietnam
Hieu Phung minhhieu at msn.com
Wed Apr 15 14:58:06 PDT 2015
Hello everybody,
I am writing my PhD dissertation on human perceptions of the environment in Vietnam from 1450 to 1850, and I am therefore trying to find, build, or contribute to searchable Han texts of primary sources for the study of pre-20th-century Vietnam.
Although I have learned about the following two things, I have so far only been inputting text in a "primitive" way (by typing it) on a "primitive" platform (the Chinese Wikisource page).
1- There are people in Vietnam who have produced a searchable text of the Dai Viet su ky toan thu (The Complete Book of Dai Viet), but that searchable text has not yet been made available to everyone, myself included. Part of the text (covering the history from the beginning to the early years of the Hong Duc era, i.e. the 1470s) is available thanks to contributions to the Chinese Wikisource (https://zh.wikisource.org/wiki/大越史記全書). You can search for "大越史記全書."
2- "Chinese" people have developed very good OCR (Optical Character Recognition) technology, and there is a very good community that promotes it to build searchable data for Chinese texts: http://ctext.org/instructions/ocr. After working a little with the OCR texts available there, I find the platform easy to use. But I do not yet know how to bring this technology into Han-Nom studies.
So, if anyone is interested in this work, I would welcome your advice on the following: 1) whether there is a better platform than the Chinese Wikisource for building this kind of data. I am thinking of inputting some Han texts of the Phu bien tap luc on the Vietnamese Wikisource (Wikisource tieng Viet) page, but maybe that is not a very good idea.
2) When we work with Han texts, it would be great if we shared the searchable texts (even short passages) somewhere online so that other people do not need to type them again. I am aware that the idea is simple, but a more complete body of data would emerge over time while we wait for OCR to develop. N.B.: Many Vietnamese Han texts are manuscripts rather than printed books (as most Chinese texts are), which poses additional challenges for OCR technology.
Kind regards,
Hieu Phung
Anh-Minh Do caligarn at gmail.com
Thu Apr 16 00:26:58 PDT 2015
Hi Phung,
This is an area that interests me very much. My father, James Do Ba Phuoc, who
was also a VSGer up until his passing earlier this year, was very focused
on the digitization of chu Nom.
My colleague Tomo and I are working with the National Library of Vietnam
to further the OCR work my father was doing. The Nom Foundation is
particularly keen on this project. We are exploring the use of Google's
Tesseract software to do so.
Please get in touch with me to continue the conversation. I'd love to chat.
Anh-Minh Do
Editor, Tech In Asia, Vietnam
Thompson, C. M. thompsonc2 at southernct.edu
Thu Apr 16 14:08:32 PDT 2015
Dear Anh-Minh,
I just wanted to say that although I did not know your father very well, I respected his work tremendously. I worked with him productively for several years on the Vietnamese Nom Preservation Foundation, and I am very sorry to hear that he has passed away.
My condolences to your family.
Sincere Regards
Michele
Michele Thompson
Professor of Southeast Asian History
Dept. of History
Southern Connecticut State Univ.
Dien Nguyen nguyendien519 at gmail.com
Thu Apr 16 17:11:01 PDT 2015
Dear Michele,
James Do passed away on 10 Jan 2015 in Saigon.
Diễn Đàn published the following obituaries and articles about his
contribution to the encoding of chữ Việt and chữ Nôm in Unicode:
In memory of Đỗ Bá Phước (James Đỗ)
We have received news that Đỗ Bá Phước (James Do) passed away on the morning of
10 January 2015 in Ho Chi Minh City, at the age of 63.
He studied mathematics but worked in information technology, and he was one of the
researchers who collaborated to bring Vietnamese script (chữ Việt) and chữ Nôm into
Unicode(*): Đỗ Bá Phước, Ngô Thanh Nhàn, Ngô Trung Việt, Nguyễn
Hoàng, Nguyễn Quang Hồng.
http://www.diendan.org/nhung-con-nguoi/do-ba-phuoc-1952-2015
In memory of Đỗ Bá Phước (1952-2015)
http://www.diendan.org/nhung-con-nguoi/henri-edward...-va-james-do
Remembering Phước
http://www.diendan.org/nhung-con-nguoi/nho-phuoc
Đỗ Bá Phước: a gentle friend through the years of a lifetime
http://www.diendan.org/nhung-con-nguoi/do-ba-phuoc-2013-nguoi-ban-hien
See also:
James Do (1952-2015)
James Do (Đỗ Bá Phước) was active from the early days of Unicode
when he worked to shape the encoding of both the Latin-based Quốc ngữ
script now used in Vietnam, as well as the traditional Hán (Literary
Chinese) and Chữ Nôm. He brought together Vietnamese experts in Hán
and Chữ Nôm to facilitate Vietnamese participation in the IRG. He
worked tirelessly in Vietnam and overseas to promote the adoption of
Unicode. He co-founded the Vietnamese Nôm Preservation Foundation as
part of his long-term interest in making the largely untranslated
corpus of traditional Vietnamese literature in Hán and Nôm, and the
cultural legacy it contains, available to students around the world in
digital form. James was also interested in the sustainable development
of Vietnam through improved education, to which end he helped found
the Pacific Links Foundation. James moved from California back to
Vietnam in 2007 to work as CTO of InfoNam Inc. until his passing on
January 10, 2015.
http://www.unicode.org/consortium/memoriam.html
Nguyễn Điền
Independent Researcher
Canberra
Anh-Minh Do caligarn at gmail.com
Thu Apr 16 19:11:33 PDT 2015
Thank you all for your kind words.
I do think it's interesting to note that if you use a Mac, and you type in
Vietnamese, you may notice that there is a strange icon where a flag is
supposed to be. This is an icon of the Temple of Literature (you can see it
below, to the right of the Bluetooth icon). My father worked closely with Apple
and suggested this very icon for them to use. In other words, every Mac
computer has Vietnam's Temple of Literature in it. :)
[image: Inline image 1]
Cheers,
Minh
anhminhdo.com
Phone & Whatsapp: +84988477612
Skype: caligarn
Twitter: @hellominhdo
Margaret B. Bodemer mbodemer at calpoly.edu
Thu Apr 16 20:43:09 PDT 2015
Hello Minh,
Thank you for mentioning the Temple of Literature icon. I always notice it on my computer when I switch to Vietnamese but didn't realize how it came about. What a great idea of your father's!
Best,
Margaret B. Bodemer, Ph.D.
California Polytechnic State University
San Luis Obispo, CA, U.S.A.
Anh-Minh Do caligarn at gmail.com
Thu Apr 16 21:35:24 PDT 2015
He suggested a number of icons to Apple, and most of them were designs of
the Temple of Literature. The main reason he chose that symbol for
Vietnam was the sensitive political tensions that we all know about.
You can talk to Lee Collins, who worked directly with my father, to
hear the fuller story. Please ping me personally, and I'll ping Lee.
Cheers,
Minh
Thompson, C. M. thompsonc2 at southernct.edu
Fri Apr 17 10:11:43 PDT 2015
Dear Dien,
Thanks very much for all of this information.
cheers
Michele