In search of searchable Han texts of primary sources for pre-20th century Vietnam

Hieu Phung minhhieu at msn.com

Wed Apr 15 14:58:06 PDT 2015

Hello everybody,

I am writing my PhD dissertation on human perceptions of the environment in Vietnam from 1450 to 1850 and am therefore trying to find, build, or contribute to searchable Han texts of primary sources for the study of pre-20th century Vietnam.

Although I have learned about the following two things, so far I have only entered text in a "primitive" way (by typing it) on a "primitive" platform (the Chinese Wikisource page).

1- There are people in Vietnam who have produced a searchable text of the Dai Viet su ky toan thu (The Complete Book of Dai Viet), but it has not yet been made available to everyone, myself included. Part of the text (covering the history from the beginning to the early years of the Hong Duc era, i.e. the 1470s) is available thanks to contributors to the Chinese Wikisource (https://zh.wikisource.org/wiki/大越史記全書). You can search for "大越史記全書."

2- "Chinese" people have developed a very good OCR (Optical Character Recognition) technology, and there is a very good community that promotes this technology to build searchable data for Chinese texts; see http://ctext.org/instructions/ocr. I found this platform friendly to use after working a little with the OCR texts available there. But I do not yet have any idea how to bring this technology into Han-Nom studies.

So, if anyone is interested in this activity, I hope you can advise on the following: 1) whether there is a better platform than the Chinese Wikisource for building this kind of data. I am thinking of entering some Han text of the Phu bien tap luc, but on the Vietnamese Wikisource (Wikisource tieng Viet) page, although maybe that is not a very good idea.

2) When we work with Han texts, it would be great if we shared the searchable text (even a short passage) somewhere online so that other people do not need to type it again. I am aware that the idea is simple, but a more complete body of data would emerge over time while we wait for OCR to develop. N.B.: Many Vietnamese Han texts are manuscripts rather than printed editions (as Chinese texts usually are), which poses additional challenges for OCR technology.

Kind regards,

Hieu Phung

Anh-Minh Do caligarn at gmail.com

Thu Apr 16 00:26:58 PDT 2015

Hi Phung,

This is an area that interests me very much. My father, James Do Ba Phuoc, who was also a VSGer up until his passing earlier this year, was very focused on the digitization of chu Nom.

My colleague Tomo and I are in the process of working with the national library in Vietnam to further the work my father was doing in terms of OCR. This is a project that the Nom Foundation is specifically quite keen on. We are exploring the use of Google's Tesseract software to do so.

Please ping me to continue the conversation. I'd love to chat.

Anh-Minh Do

Editor, Tech In Asia, Vietnam

Thompson, C. M. thompsonc2 at southernct.edu

Thu Apr 16 14:08:32 PDT 2015

Dear Anh-Minh,

I just wanted to say that although I did not know your father very well, I respected his work tremendously. I worked with him productively for several years on the Vietnamese Nom Preservation Foundation, and I am very sorry to hear that he has passed away.

My condolences to your family.

Sincere Regards

Michele

Michele Thompson

Professor of Southeast Asian History

Dept. of History

Southern Connecticut State Univ.

Dien Nguyen nguyendien519 at gmail.com

Thu Apr 16 17:11:01 PDT 2015

Dear Michele,

James Do passed away on 10 Jan 2015 in Saigon.

Diễn Đàn published the following obituaries and articles about his contribution to the encoding of chữ Việt and chữ Nôm in Unicode:

In Memory of Đỗ Bá Phước (James Đỗ) [Tưởng nhớ Anh Đỗ Bá Phước (James Đỗ)]

We have received word that Đỗ Bá Phước (James Do) passed away on the morning of 10 January 2015 in Ho Chi Minh City, at the age of 63.

He studied mathematics but worked in information technology, and he was one of the researchers who collaborated to bring the Vietnamese script and chữ Nôm into the Unicode character set(*): Đỗ Bá Phước, Ngô Thanh Nhàn, Ngô Trung Việt, Nguyễn Hoàng, Nguyễn Quang Hồng.

http://www.diendan.org/nhung-con-nguoi/do-ba-phuoc-1952-2015

In Memory of Đỗ Bá Phước (1952-2015) [Tưởng nhớ Đỗ Bá Phước (1952-2015)]

http://www.diendan.org/nhung-con-nguoi/henri-edward...-va-james-do

Remembering Phước [Nhớ Phước]

http://www.diendan.org/nhung-con-nguoi/nho-phuoc

Đỗ Bá Phước: A Gentle Friend Through the Years of Life [Đỗ Bá Phước – người bạn hiền theo năm tháng cuộc đời]

http://www.diendan.org/nhung-con-nguoi/do-ba-phuoc-2013-nguoi-ban-hien

See also:

James Do (1952-2015)

James Do (Đỗ Bá Phước) was active from the early days of Unicode when he worked to shape the encoding of both the Latin-based Quốc ngữ script now used in Vietnam, as well as the traditional Hán (Literary Chinese) and Chữ Nôm. He brought together Vietnamese experts in Hán and Chữ Nôm to facilitate Vietnamese participation in the IRG. He worked tirelessly in Vietnam and overseas to promote the adoption of Unicode. He co-founded the Vietnamese Nôm Preservation Foundation as part of his long-term interest in making the largely untranslated corpus of traditional Vietnamese literature in Hán and Nôm, and the cultural legacy it contains, available to students around the world in digital form. James was also interested in the sustainable development of Vietnam through improved education, to which end he helped found the Pacific Links Foundation. James moved from California back to Vietnam in 2007 to work as CTO of InfoNam Inc. until his passing on January 10, 2015.

http://www.unicode.org/consortium/memoriam.html

Nguyễn Điền

Independent Researcher

Canberra

Anh-Minh Do caligarn at gmail.com

Thu Apr 16 19:11:33 PDT 2015

Thank you all for your kind words.

I do think it's interesting to note that if you use a Mac and you type in Vietnamese, you may notice that there is a strange icon where a flag is supposed to be. This is an icon of the Temple of Literature (you can see it below, to the right of the Bluetooth icon). My father worked closely with Apple and suggested this very icon for them to use. In other words, every Mac computer has Vietnam's Temple of Literature in it. :)

[image: Inline image 1]

Cheers,

Minh

anhminhtrando at gmail.com

anhminhdo.com

Phone & Whatsapp: +84988477612

Skype: caligarn

Twitter: @hellominhdo

Margaret B. Bodemer mbodemer at calpoly.edu

Thu Apr 16 20:43:09 PDT 2015

Hello Minh,

Thank you for mentioning the Temple of Literature icon. I always notice it on my computer when I switch to Vietnamese but didn't realize how it came about. What a great idea of your father's!

Best,

Margaret B. Bodemer, Ph.D.

California Polytechnic State University

San Luis Obispo, CA, U.S.A.

Anh-Minh Do caligarn at gmail.com

Thu Apr 16 21:35:24 PDT 2015

He suggested a number of icons to Apple, and most of them were designs of the Temple of Literature. The main reason he chose that symbol for Vietnam was the sensitive political tensions that we all know about. You can actually talk to Lee Collins, who worked directly with my father, to hear the fuller story. Please ping me personally, and I'll ping Lee.

Cheers,

Minh


Thompson, C. M. thompsonc2 at southernct.edu

Fri Apr 17 10:11:43 PDT 2015

Dear Dien,

Thanks very much for all of this information.

cheers

Michele
