
(Repost) Converting Chinese Characters to Pinyin with HanziToPinyin

This article is reposted from: http://blog.csdn.net/zhangphil/article/details/47164665

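Android itself ships with a class for converting Chinese characters to pinyin: HanziToPinyin.java. The stock Android Contacts app uses HanziToPinyin.java to group and organize Chinese contact names. Converting Chinese characters to pinyin is essential in some applications. Take contact grouping: if an address book holds several contacts with the surname 张 (ZHANG), all of them belong in the "Z" group. Likewise, social apps such as WeChat and QQ, and any feature that groups or sorts contacts or friends, need to convert Chinese characters to pinyin and then sort and group by the first letter.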
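The grouping described above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the Contacts app's implementation: the class name ContactGrouper, the methods initialOf and groupByInitial, and the "#" bucket for names that do not start with A-Z are all hypothetical, and the pinyin strings are supplied by the caller rather than produced by HanziToPinyin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the Android source.
class ContactGrouper {

    // Returns the index letter for a pinyin string: its first character in
    // upper case if that is A-Z, otherwise '#' (an assumed catch-all bucket).
    static char initialOf(String pinyin) {
        if (pinyin == null || pinyin.isEmpty()) {
            return '#';
        }
        char first = Character.toUpperCase(pinyin.charAt(0));
        return (first >= 'A' && first <= 'Z') ? first : '#';
    }

    // Groups display names by the index letter of their pinyin.
    // Keys of the input map are display names, values are their pinyin.
    static Map<Character, List<String>> groupByInitial(Map<String, String> nameToPinyin) {
        // TreeMap keeps the groups sorted by index letter.
        Map<Character, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> e : nameToPinyin.entrySet()) {
            char key = initialOf(e.getValue());
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }
}
```

In an Android project the pinyin values would come from a helper such as the getPinYin method shown later in this article.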
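HanziToPinyin.java is not a public class. It is a private helper that Google uses internally to implement the Android contacts feature, so it cannot be called like an ordinary Android SDK API. That is not a problem: the file can be copied out and placed in your own project, then used directly.
The HanziToPinyin.java source file is located in Google's official contacts code, at: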

packages/providers/ContactsProvider/src/com/android/providers/contacts/HanziToPinyin.java

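Projects containing this HanziToPinyin.java file can also be found online. However, using the class as-is does not work properly. It fails with the following error: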

"There is no Chinese collator, HanziToPinyin is disabled"

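The code that reports this error is in the following method of HanziToPinyin.java:
public static HanziToPinyin getInstance();
The cause is that some customized (non-stock) Android systems define the Chinese locale differently, so locale[i].equals(Locale.CHINA) in the original code returns false for every entry. No Chinese locale is recognized, and all of the conversion code that follows is disabled.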
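To see why the check can fail, note that Locale.equals compares language, country, and variant together, so a bare zh entry never equals Locale.CHINA (zh_CN). The following is a minimal sketch in plain Java. The class LocaleCheckDemo, the method containsExactly, and the sample locale list are hypothetical stand-ins for what Collator.getAvailableLocales() might return on such a device.

```java
import java.util.Locale;

// Hypothetical demo class; not part of HanziToPinyin.java.
class LocaleCheckDemo {

    // Mirrors the original check: true only if some entry equals the target exactly.
    static boolean containsExactly(Locale[] available, Locale target) {
        for (Locale l : available) {
            if (l.equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed list: Chinese is present only as the bare language "zh".
        Locale[] available = { Locale.US, Locale.JAPAN, Locale.CHINESE };

        // Locale.CHINA is zh_CN, so the strict comparison finds no match.
        System.out.println(containsExactly(available, Locale.CHINA));   // false
        // Locale.CHINESE is the bare "zh" locale, the same value as new Locale("zh").
        System.out.println(containsExactly(available, Locale.CHINESE)); // true
    }
}
```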

Fixing the problem (solution)

I relaxed the condition by adding the following code:
final Locale chinaAddition = new Locale("zh");
and then added chinaAddition to the check as an additional alternative:

if (locale[i].equals(Locale.CHINA) || locale[i].equals(chinaAddition)) {
  …
}

Below is the complete getInstance() method after my change:

public static HanziToPinyin getInstance() {
  synchronized (HanziToPinyin.class) {
    if (sInstance != null) {
      return sInstance;
    }
    // Check if zh_CN collation data is available
    final Locale locale[] = Collator.getAvailableLocales();

    // Added code (enhancement): also accept the bare "zh" locale.
    final Locale chinaAddition = new Locale("zh");

    for (int i = 0; i < locale.length; i++) {
      if (locale[i].equals(Locale.CHINA)
          || locale[i].equals(chinaAddition)) {
        // Do self validation just once.
        if (DEBUG) {
          Log.d(TAG, "Self validation. Result: "
              + doSelfValidation());
        }
        sInstance = new HanziToPinyin(true);
        return sInstance;
      }
    }
    Log.w(TAG,
        "There is no Chinese collator, HanziToPinyin is disabled");
    sInstance = new HanziToPinyin(false);
    return sInstance;
  }
}
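A more lenient variant, which is not part of the original post, is to compare only the language code, so that entries such as zh_TW are accepted as well. This is a sketch only: the names ChineseLocaleMatcher and hasChinese are hypothetical, and whether the pinyin tables still match the collation order on a device that offers only such locales is not verified here.

```java
import java.util.Locale;

// Hypothetical alternative to the equals-based check; an assumption, not the author's fix.
class ChineseLocaleMatcher {

    // True if any available locale has the language "zh", regardless of
    // country, script, or variant.
    static boolean hasChinese(Locale[] available) {
        // Take the language code from Locale.CHINESE instead of hard-coding a string.
        String zh = Locale.CHINESE.getLanguage();
        for (Locale l : available) {
            if (zh.equals(l.getLanguage())) {
                return true;
            }
        }
        return false;
    }
}
```

Inside getInstance() the loop could then be replaced by a single call such as hasChinese(Collator.getAvailableLocales()).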

With this improvement, the full source of HanziToPinyin.java is as follows (it can be copied into your own project and used directly):

/*
 * Copyright (C) 2011 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package zhangphil.hanyupinyin;

import android.text.TextUtils;
import android.util.Log;

import java.text.Collator;
import java.util.ArrayList;
import java.util.Locale;

/**
 * An object to convert Chinese character to its corresponding pinyin string.
 * For characters with multiple possible pinyin string, only one is selected
 * according to collator. Polyphone is not supported in this implementation.
 * This class is implemented to achieve the best runtime performance and minimum
 * runtime resources with tolerable sacrifice of accuracy. This implementation
 * highly depends on zh_CN ICU collation data and must be always synchronized
 * with ICU.
 *
 * Currently this file is aligned to zh.txt in ICU 4.6 (from the Android 4.2 source)
 */
public class HanziToPinyin {
  private static final String TAG = "HanziToPinyin";

  // Turn on this flag when we want to check internal data structure.
  private static final boolean DEBUG = false;

  /**
   * Unihans array.
   *
   * Each unihans is the first one within same pinyin when collator is zh_CN.
   */
  public static final char[] UNIHANS = { '\u963f', '\u54ce', '\u5b89',
      '\u80ae', '\u51f9', '\u516b', '\u6300', '\u6273', '\u90a6',
      '\u52f9', '\u9642', '\u5954', '\u4f3b', '\u5c44', '\u8fb9',
      '\u706c', '\u618b', '\u6c43', '\u51ab', '\u7676', '\u5cec',
      '\u5693', '\u5072', '\u53c2', '\u4ed3', '\u64a1', '\u518a',
      '\u5d7e', '\u66fd', '\u66fe', '\u5c64', '\u53c9', '\u8286',
      '\u8fbf', '\u4f25', '\u6284', '\u8f66', '\u62bb', '\u6c88',
      '\u6c89', '\u9637', '\u5403', '\u5145', '\u62bd', '\u51fa',
      '\u6b3b', '\u63e3', '\u5ddb', '\u5205', '\u5439', '\u65fe',
      '\u9034', '\u5472', '\u5306', '\u51d1', '\u7c97', '\u6c46',
      '\u5d14', '\u90a8', '\u6413', '\u5491', '\u5446', '\u4e39',
      '\u5f53', '\u5200', '\u561a', '\u6265', '\u706f', '\u6c10',
      '\u55f2', '\u7538', '\u5201', '\u7239', '\u4e01', '\u4e1f',
      '\u4e1c', '\u543a', '\u53be', '\u8011', '\u8968', '\u5428',
      '\u591a', '\u59b8', '\u8bf6', '\u5940', '\u97a5', '\u513f',
      '\u53d1', '\u5e06', '\u531a', '\u98de', '\u5206', '\u4e30',
      '\u8985', '\u4ecf', '\u7d11', '\u4f15', '\u65ee', '\u4f85',
      '\u7518', '\u5188', '\u768b', '\u6208', '\u7ed9', '\u6839',
      '\u522f', '\u5de5', '\u52fe', '\u4f30', '\u74dc', '\u4e56',
      '\u5173', '\u5149', '\u5f52', '\u4e28', '\u5459', '\u54c8',
      '\u548d', '\u4f44', '\u592f', '\u8320', '\u8bc3', '\u9ed2',
      '\u62eb', '\u4ea8', '\u5677', '\u53ff', '\u9f41', '\u4e6f',
      '\u82b1', '\u6000', '\u72bf', '\u5ddf', '\u7070', '\u660f',
      '\u5419', '\u4e0c', '\u52a0', '\u620b', '\u6c5f', '\u827d',
      '\u9636', '\u5dfe', '\u5755', '\u5182', '\u4e29', '\u51e5',
      '\u59e2', '\u5658', '\u519b', '\u5494', '\u5f00', '\u520a',
      '\u5ffc', '\u5c3b', '\u533c', '\u808e', '\u52a5', '\u7a7a',
      '\u62a0', '\u625d', '\u5938', '\u84af', '\u5bbd', '\u5321',
      '\u4e8f', '\u5764', '\u6269', '\u5783', '\u6765', '\u5170',
      '\u5577', '\u635e', '\u808b', '\u52d2', '\u5d1a', '\u5215',
      '\u4fe9', '\u5941', '\u826f', '\u64a9', '\u5217', '\u62ce',
      '\u5222', '\u6e9c', '\u56d6', '\u9f99', '\u779c', '\u565c',
      '\u5a08', '\u7567', '\u62a1', '\u7f57', '\u5463', '\u5988',
      '\u57cb', '\u5ada', '\u7264', '\u732b', '\u4e48', '\u5445',
      '\u95e8', '\u753f', '\u54aa', '\u5b80', '\u55b5', '\u4e5c',
      '\u6c11', '\u540d', '\u8c2c', '\u6478', '\u54de', '\u6bea',
      '\u55ef', '\u62cf', '\u8149', '\u56e1', '\u56d4', '\u5b6c',
      '\u7592', '\u5a1e', '\u6041', '\u80fd', '\u59ae', '\u62c8',
      '\u5b22', '\u9e1f', '\u634f', '\u56dc', '\u5b81', '\u599e',
      '\u519c', '\u7fba', '\u5974', '\u597b', '\u759f', '\u9ec1',
      '\u90cd', '\u5594', '\u8bb4', '\u5991', '\u62cd', '\u7705',
      '\u4e53', '\u629b', '\u5478', '\u55b7', '\u5309', '\u4e15',
      '\u56e8', '\u527d', '\u6c15', '\u59d8', '\u4e52', '\u948b',
      '\u5256', '\u4ec6', '\u4e03', '\u6390', '\u5343', '\u545b',
      '\u6084', '\u767f', '\u4eb2', '\u72c5', '\u828e', '\u4e18',
      '\u533a', '\u5cd1', '\u7f3a', '\u590b', '\u5465', '\u7a63',
      '\u5a06', '\u60f9', '\u4eba', '\u6254', '\u65e5', '\u8338',
      '\u53b9', '\u909a', '\u633c', '\u5827', '\u5a51', '\u77a4',
      '\u637c', '\u4ee8', '\u6be2', '\u4e09', '\u6852', '\u63bb',
      '\u95aa', '\u68ee', '\u50e7', '\u6740', '\u7b5b', '\u5c71',
      '\u4f24', '\u5f30', '\u5962', '\u7533', '\u8398', '\u6552',
      '\u5347', '\u5c38', '\u53ce', '\u4e66', '\u5237', '\u8870',
      '\u95e9', '\u53cc', '\u8c01', '\u542e', '\u8bf4', '\u53b6',
      '\u5fea', '\u635c', '\u82cf', '\u72fb', '\u590a', '\u5b59',
      '\u5506', '\u4ed6', '\u56fc', '\u574d', '\u6c64', '\u5932',
      '\u5fd1', '\u71a5', '\u5254', '\u5929', '\u65eb', '\u5e16',
      '\u5385', '\u56f2', '\u5077', '\u51f8', '\u6e4d', '\u63a8',
      '\u541e', '\u4e47', '\u7a75', '\u6b6a', '\u5f2f', '\u5c23',
      '\u5371', '\u6637', '\u7fc1', '\u631d', '\u4e4c', '\u5915',
      '\u8672', '\u4eda', '\u4e61', '\u7071', '\u4e9b', '\u5fc3',
      '\u661f', '\u51f6', '\u4f11', '\u5401', '\u5405', '\u524a',
      '\u5743', '\u4e2b', '\u6079', '\u592e', '\u5e7a', '\u503b',
      '\u4e00', '\u56d9', '\u5e94', '\u54df', '\u4f63', '\u4f18',
      '\u625c', '\u56e6', '\u66f0', '\u6655', '\u7b60', '\u7b7c',
      '\u5e00', '\u707d', '\u5142', '\u5328', '\u50ae', '\u5219',
      '\u8d3c', '\u600e', '\u5897', '\u624e', '\u635a', '\u6cbe',
      '\u5f20', '\u957f', '\u9577', '\u4f4b', '\u8707', '\u8d1e',
      '\u4e89', '\u4e4b', '\u5cd9', '\u5ea2', '\u4e2d', '\u5dde',
      '\u6731', '\u6293', '\u62fd', '\u4e13', '\u5986', '\u96b9',
      '\u5b92', '\u5353', '\u4e72', '\u5b97', '\u90b9', '\u79df',
      '\u94bb', '\u539c', '\u5c0a', '\u6628', '\u5159', '\u9fc3',
      '\u9fc4', };

  /**
   * Pinyin array.
   *
   * Each pinyin is corresponding to unihans of same offset in the unihans
   * array.
   */
  public static final byte[][] PINYINS = { { 65, 0, 0, 0, 0, 0 },
      { 65, 73, 0, 0, 0, 0 }, { 65, 78, 0, 0, 0, 0 },
      { 65, 78, 71, 0, 0, 0 }, { 65, 79, 0, 0, 0, 0 },
      { 66, 65, 0, 0, 0, 0 }, { 66, 65, 73, 0, 0, 0 },
      { 66, 65, 78, 0, 0, 0 }, { 66, 65, 78, 71, 0, 0 },
      { 66, 65, 79, 0, 0, 0 }, { 66, 69, 73, 0, 0, 0 },
      { 66, 69, 78, 0, 0, 0 }, { 66, 69, 78, 71, 0, 0 },
      { 66, 73, 0, 0, 0, 0 }, { 66, 73, 65, 78, 0, 0 },
      { 66, 73, 65, 79, 0, 0 }, { 66, 73, 69, 0, 0, 0 },
      { 66, 73, 78, 0, 0, 0 }, { 66, 73, 78, 71, 0, 0 },
      { 66, 79, 0, 0, 0, 0 }, { 66, 85, 0, 0, 0, 0 },
      { 67, 65, 0, 0, 0, 0 }, { 67, 65, 73, 0, 0, 0 },
      { 67, 65, 78, 0, 0, 0 }, { 67, 65, 78, 71, 0, 0 },
      { 67, 65, 79, 0, 0, 0 }, { 67, 69, 0, 0, 0, 0 },
      { 67, 69, 78, 0, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
      { 90, 69, 78, 71, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
      { 67, 72, 65, 0, 0, 0 }, { 67, 72, 65, 73, 0, 0 },
      { 67, 72, 65, 78, 0, 0 }, { 67, 72, 65, 78, 71, 0 },
      { 67, 72, 65, 79, 0, 0 }, { 67, 72, 69, 0, 0, 0 },
      { 67, 72, 69, 78, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
      { 67, 72, 69, 78, 0, 0 }, { 67, 72, 69, 78, 71, 0 },
      { 67, 72, 73, 0, 0, 0 }, { 67, 72, 79, 78, 71, 0 },
      { 67, 72, 79, 85, 0, 0 }, { 67, 72, 85, 0, 0, 0 },
      { 67, 72, 85, 65, 0, 0 }, { 67, 72, 85, 65, 73, 0 },
      { 67, 72, 85, 65, 78, 0 }, { 67, 72, 85, 65, 78, 71 },
      { 67, 72, 85, 73, 0, 0 }, { 67, 72, 85, 78, 0, 0 },
      { 67, 72, 85, 79, 0, 0 }, { 67, 73, 0, 0, 0, 0 },
      { 67, 79, 78, 71, 0, 0 }, { 67, 79, 85, 0, 0, 0 },
      { 67, 85, 0, 0, 0, 0 }, { 67, 85, 65, 78, 0, 0 },
      { 67, 85, 73, 0, 0, 0 }, { 67, 85, 78, 0, 0, 0 },
      { 67, 85, 79, 0, 0, 0 }, { 68, 65, 0, 0, 0, 0 },
      { 68, 65, 73, 0, 0, 0 }, { 68, 65, 78, 0, 0, 0 },
      { 68, 65, 78, 71, 0, 0 }, { 68, 65, 79, 0, 0, 0 },
      { 68, 69, 0, 0, 0, 0 }, { 68, 69, 78, 0, 0, 0 },
      { 68, 69, 78, 71, 0, 0 }, { 68, 73, 0, 0, 0, 0 },
      { 68, 73, 65, 0, 0, 0 }, { 68, 73, 65, 78, 0, 0 },
      { 68, 73, 65, 79, 0, 0 }, { 68, 73, 69, 0, 0, 0 },
      { 68, 73, 78, 71, 0, 0 }, { 68, 73, 85, 0, 0, 0 },
      { 68, 79, 78, 71, 0, 0 }, { 68, 79, 85, 0, 0, 0 },
      { 68, 85, 0, 0, 0, 0 }, { 68, 85, 65, 78, 0, 0 },
      { 68, 85, 73, 0, 0, 0 }, { 68, 85, 78, 0, 0, 0 },
      { 68, 85, 79, 0, 0, 0 }, { 69, 0, 0, 0, 0, 0 },
      { 69, 73, 0, 0, 0, 0 }, { 69, 78, 0, 0, 0, 0 },
      { 69, 78, 71, 0, 0, 0 }, { 69, 82, 0, 0, 0, 0 },
      { 70, 65, 0, 0, 0, 0 }, { 70, 65, 78, 0, 0, 0 },
      { 70, 65, 78, 71, 0, 0 }, { 70, 69, 73, 0, 0, 0 },
      { 70, 69, 78, 0, 0, 0 }, { 70, 69, 78, 71, 0, 0 },
      { 70, 73, 65, 79, 0, 0 }, { 70, 79, 0, 0, 0, 0 },
      { 70, 79, 85, 0, 0, 0 }, { 70, 85, 0, 0, 0, 0 },
      { 71, 65, 0, 0, 0, 0 }, { 71, 65, 73, 0, 0, 0 },
      { 71, 65, 78, 0, 0, 0 }, { 71, 65, 78, 71, 0, 0 },
      { 71, 65, 79, 0, 0, 0 }, { 71, 69, 0, 0, 0, 0 },
      { 71, 69, 73, 0, 0, 0 }, { 71, 69, 78, 0, 0, 0 },
      { 71, 69, 78, 71, 0, 0 }, { 71, 79, 78, 71, 0, 0 },
      { 71, 79, 85, 0, 0, 0 }, { 71, 85, 0, 0, 0, 0 },
      { 71, 85, 65, 0, 0, 0 }, { 71, 85, 65, 73, 0, 0 },
      { 71, 85, 65, 78, 0, 0 }, { 71, 85, 65, 78, 71, 0 },
      { 71, 85, 73, 0, 0, 0 }, { 71, 85, 78, 0, 0, 0 },
      { 71, 85, 79, 0, 0, 0 }, { 72, 65, 0, 0, 0, 0 },
      { 72, 65, 73, 0, 0, 0 }, { 72, 65, 78, 0, 0, 0 },
      { 72, 65, 78, 71, 0, 0 }, { 72, 65, 79, 0, 0, 0 },
      { 72, 69, 0, 0, 0, 0 }, { 72, 69, 73, 0, 0, 0 },
      { 72, 69, 78, 0, 0, 0 }, { 72, 69, 78, 71, 0, 0 },
      { 72, 77, 0, 0, 0, 0 }, { 72, 79, 78, 71, 0, 0 },
      { 72, 79, 85, 0, 0, 0 }, { 72, 85, 0, 0, 0, 0 },
      { 72, 85, 65, 0, 0, 0 }, { 72, 85, 65, 73, 0, 0 },
      { 72, 85, 65, 78, 0, 0 }, { 72, 85, 65, 78, 71, 0 },
      { 72, 85, 73, 0, 0, 0 }, { 72, 85, 78, 0, 0, 0 },
      { 72, 85, 79, 0, 0, 0 }, { 74, 73, 0, 0, 0, 0 },
      { 74, 73, 65, 0, 0, 0 }, { 74, 73, 65, 78, 0, 0 },
      { 74, 73, 65, 78, 71, 0 }, { 74, 73, 65, 79, 0, 0 },
      { 74, 73, 69, 0, 0, 0 }, { 74, 73, 78, 0, 0, 0 },
      { 74, 73, 78, 71, 0, 0 }, { 74, 73, 79, 78, 71, 0 },
      { 74, 73, 85, 0, 0, 0 }, { 74, 85, 0, 0, 0, 0 },
      { 74, 85, 65, 78, 0, 0 }, { 74, 85, 69, 0, 0, 0 },
      { 74, 85, 78, 0, 0, 0 }, { 75, 65, 0, 0, 0, 0 },
      { 75, 65, 73, 0, 0, 0 }, { 75, 65, 78, 0, 0, 0 },
      { 75, 65, 78, 71, 0, 0 }, { 75, 65, 79, 0, 0, 0 },
      { 75, 69, 0, 0, 0, 0 }, { 75, 69, 78, 0, 0, 0 },
      { 75, 69, 78, 71, 0, 0 }, { 75, 79, 78, 71, 0, 0 },
      { 75, 79, 85, 0, 0, 0 }, { 75, 85, 0, 0, 0, 0 },
      { 75, 85, 65, 0, 0, 0 }, { 75, 85, 65, 73, 0, 0 },
      { 75, 85, 65, 78, 0, 0 }, { 75, 85, 65, 78, 71, 0 },
      { 75, 85, 73, 0, 0, 0 }, { 75, 85, 78, 0, 0, 0 },
      { 75, 85, 79, 0, 0, 0 }, { 76, 65, 0, 0, 0, 0 },
      { 76, 65, 73, 0, 0, 0 }, { 76, 65, 78, 0, 0, 0 },
      { 76, 65, 78, 71, 0, 0 }, { 76, 65, 79, 0, 0, 0 },
      { 76, 69, 0, 0, 0, 0 }, { 76, 69, 73, 0, 0, 0 },
      { 76, 69, 78, 71, 0, 0 }, { 76, 73, 0, 0, 0, 0 },
      { 76, 73, 65, 0, 0, 0 }, { 76, 73, 65, 78, 0, 0 },
      { 76, 73, 65, 78, 71, 0 }, { 76, 73, 65, 79, 0, 0 },
      { 76, 73, 69, 0, 0, 0 }, { 76, 73, 78, 0, 0, 0 },
      { 76, 73, 78, 71, 0, 0 }, { 76, 73, 85, 0, 0, 0 },
      { 76, 79, 0, 0, 0, 0 }, { 76, 79, 78, 71, 0, 0 },
      { 76, 79, 85, 0, 0, 0 }, { 76, 85, 0, 0, 0, 0 },
      { 76, 85, 65, 78, 0, 0 }, { 76, 85, 69, 0, 0, 0 },
      { 76, 85, 78, 0, 0, 0 }, { 76, 85, 79, 0, 0, 0 },
      { 77, 0, 0, 0, 0, 0 }, { 77, 65, 0, 0, 0, 0 },
      { 77, 65, 73, 0, 0, 0 }, { 77, 65, 78, 0, 0, 0 },
      { 77, 65, 78, 71, 0, 0 }, { 77, 65, 79, 0, 0, 0 },
      { 77, 69, 0, 0, 0, 0 }, { 77, 69, 73, 0, 0, 0 },
      { 77, 69, 78, 0, 0, 0 }, { 77, 69, 78, 71, 0, 0 },
      { 77, 73, 0, 0, 0, 0 }, { 77, 73, 65, 78, 0, 0 },
      { 77, 73, 65, 79, 0, 0 }, { 77, 73, 69, 0, 0, 0 },
      { 77, 73, 78, 0, 0, 0 }, { 77, 73, 78, 71, 0, 0 },
      { 77, 73, 85, 0, 0, 0 }, { 77, 79, 0, 0, 0, 0 },
      { 77, 79, 85, 0, 0, 0 }, { 77, 85, 0, 0, 0, 0 },
      { 78, 0, 0, 0, 0, 0 }, { 78, 65, 0, 0, 0, 0 },
      { 78, 65, 73, 0, 0, 0 }, { 78, 65, 78, 0, 0, 0 },
      { 78, 65, 78, 71, 0, 0 }, { 78, 65, 79, 0, 0, 0 },
      { 78, 69, 0, 0, 0, 0 }, { 78, 69, 73, 0, 0, 0 },
      { 78, 69, 78, 0, 0, 0 }, { 78, 69, 78, 71, 0, 0 },
      { 78, 73, 0, 0, 0, 0 }, { 78, 73, 65, 78, 0, 0 },
      { 78, 73, 65, 78, 71, 0 }, { 78, 73, 65, 79, 0, 0 },
      { 78, 73, 69, 0, 0, 0 }, { 78, 73, 78, 0, 0, 0 },
      { 78, 73, 78, 71, 0, 0 }, { 78, 73, 85, 0, 0, 0 },
      { 78, 79, 78, 71, 0, 0 }, { 78, 79, 85, 0, 0, 0 },
      { 78, 85, 0, 0, 0, 0 }, { 78, 85, 65, 78, 0, 0 },
      { 78, 85, 69, 0, 0, 0 }, { 78, 85, 78, 0, 0, 0 },
      { 78, 85, 79, 0, 0, 0 }, { 79, 0, 0, 0, 0, 0 },
      { 79, 85, 0, 0, 0, 0 }, { 80, 65, 0, 0, 0, 0 },
      { 80, 65, 73, 0, 0, 0 }, { 80, 65, 78, 0, 0, 0 },
      { 80, 65, 78, 71, 0, 0 }, { 80, 65, 79, 0, 0, 0 },
      { 80, 69, 73, 0, 0, 0 }, { 80, 69, 78, 0, 0, 0 },
      { 80, 69, 78, 71, 0, 0 }, { 80, 73, 0, 0, 0, 0 },
      { 80, 73, 65, 78, 0, 0 }, { 80, 73, 65, 79, 0, 0 },
      { 80, 73, 69, 0, 0, 0 }, { 80, 73, 78, 0, 0, 0 },
      { 80, 73, 78, 71, 0, 0 }, { 80, 79, 0, 0, 0, 0 },
      { 80, 79, 85, 0, 0, 0 }, { 80, 85, 0, 0, 0, 0 },
      { 81, 73, 0, 0, 0, 0 }, { 81, 73, 65, 0, 0, 0 },
      { 81, 73, 65, 78, 0, 0 }, { 81, 73, 65, 78, 71, 0 },
      { 81, 73, 65, 79, 0, 0 }, { 81, 73, 69, 0, 0, 0 },
      { 81, 73, 78, 0, 0, 0 }, { 81, 73, 78, 71, 0, 0 },
      { 81, 73, 79, 78, 71, 0 }, { 81, 73, 85, 0, 0, 0 },
      { 81, 85, 0, 0, 0, 0 }, { 81, 85, 65, 78, 0, 0 },
      { 81, 85, 69, 0, 0, 0 }, { 81, 85, 78, 0, 0, 0 },
      { 82, 65, 78, 0, 0, 0 }, { 82, 65, 78, 71, 0, 0 },
      { 82, 65, 79, 0, 0, 0 }, { 82, 69, 0, 0, 0, 0 },
      { 82, 69, 78, 0, 0, 0 }, { 82, 69, 78, 71, 0, 0 },
      { 82, 73, 0, 0, 0, 0 }, { 82, 79, 78, 71, 0, 0 },
      { 82, 79, 85, 0, 0, 0 }, { 82, 85, 0, 0, 0, 0 },
      { 82, 85, 65, 0, 0, 0 }, { 82, 85, 65, 78, 0, 0 },
      { 82, 85, 73, 0, 0, 0 }, { 82, 85, 78, 0, 0, 0 },
      { 82, 85, 79, 0, 0, 0 }, { 83, 65, 0, 0, 0, 0 },
      { 83, 65, 73, 0, 0, 0 }, { 83, 65, 78, 0, 0, 0 },
      { 83, 65, 78, 71, 0, 0 }, { 83, 65, 79, 0, 0, 0 },
      { 83, 69, 0, 0, 0, 0 }, { 83, 69, 78, 0, 0, 0 },
      { 83, 69, 78, 71, 0, 0 }, { 83, 72, 65, 0, 0, 0 },
      { 83, 72, 65, 73, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
      { 83, 72, 65, 78, 71, 0 }, { 83, 72, 65, 79, 0, 0 },
      { 83, 72, 69, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
      { 88, 73, 78, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
      { 83, 72, 69, 78, 71, 0 }, { 83, 72, 73, 0, 0, 0 },
      { 83, 72, 79, 85, 0, 0 }, { 83, 72, 85, 0, 0, 0 },
      { 83, 72, 85, 65, 0, 0 }, { 83, 72, 85, 65, 73, 0 },
      { 83, 72, 85, 65, 78, 0 }, { 83, 72, 85, 65, 78, 71 },
      { 83, 72, 85, 73, 0, 0 }, { 83, 72, 85, 78, 0, 0 },
      { 83, 72, 85, 79, 0, 0 }, { 83, 73, 0, 0, 0, 0 },
      { 83, 79, 78, 71, 0, 0 }, { 83, 79, 85, 0, 0, 0 },
      { 83, 85, 0, 0, 0, 0 }, { 83, 85, 65, 78, 0, 0 },
      { 83, 85, 73, 0, 0, 0 }, { 83, 85, 78, 0, 0, 0 },
      { 83, 85, 79, 0, 0, 0 }, { 84, 65, 0, 0, 0, 0 },
      { 84, 65, 73, 0, 0, 0 }, { 84, 65, 78, 0, 0, 0 },
      { 84, 65, 78, 71, 0, 0 }, { 84, 65, 79, 0, 0, 0 },
      { 84, 69, 0, 0, 0, 0 }, { 84, 69, 78, 71, 0, 0 },
      { 84, 73, 0, 0, 0, 0 }, { 84, 73, 65, 78, 0, 0 },
      { 84, 73, 65, 79, 0, 0 }, { 84, 73, 69, 0, 0, 0 },
      { 84, 73, 78, 71, 0, 0 }, { 84, 79, 78, 71, 0, 0 },
      { 84, 79, 85, 0, 0, 0 }, { 84, 85, 0, 0, 0, 0 },
      { 84, 85, 65, 78, 0, 0 }, { 84, 85, 73, 0, 0, 0 },
      { 84, 85, 78, 0, 0, 0 }, { 84, 85, 79, 0, 0, 0 },
      { 87, 65, 0, 0, 0, 0 }, { 87, 65, 73, 0, 0, 0 },
      { 87, 65, 78, 0, 0, 0 }, { 87, 65, 78, 71, 0, 0 },
      { 87, 69, 73, 0, 0, 0 }, { 87, 69, 78, 0, 0, 0 },
      { 87, 69, 78, 71, 0, 0 }, { 87, 79, 0, 0, 0, 0 },
      { 87, 85, 0, 0, 0, 0 }, { 88, 73, 0, 0, 0, 0 },
      { 88, 73, 65, 0, 0, 0 }, { 88, 73, 65, 78, 0, 0 },
      { 88, 73, 65, 78, 71, 0 }, { 88, 73, 65, 79, 0, 0 },
      { 88, 73, 69, 0, 0, 0 }, { 88, 73, 78, 0, 0, 0 },
      { 88, 73, 78, 71, 0, 0 }, { 88, 73, 79, 78, 71, 0 },
      { 88, 73, 85, 0, 0, 0 }, { 88, 85, 0, 0, 0, 0 },
      { 88, 85, 65, 78, 0, 0 }, { 88, 85, 69, 0, 0, 0 },
      { 88, 85, 78, 0, 0, 0 }, { 89, 65, 0, 0, 0, 0 },
      { 89, 65, 78, 0, 0, 0 }, { 89, 65, 78, 71, 0, 0 },
      { 89, 65, 79, 0, 0, 0 }, { 89, 69, 0, 0, 0, 0 },
      { 89, 73, 0, 0, 0, 0 }, { 89, 73, 78, 0, 0, 0 },
      { 89, 73, 78, 71, 0, 0 }, { 89, 79, 0, 0, 0, 0 },
      { 89, 79, 78, 71, 0, 0 }, { 89, 79, 85, 0, 0, 0 },
      { 89, 85, 0, 0, 0, 0 }, { 89, 85, 65, 78, 0, 0 },
      { 89, 85, 69, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
      { 74, 85, 78, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
      { 90, 65, 0, 0, 0, 0 }, { 90, 65, 73, 0, 0, 0 },
      { 90, 65, 78, 0, 0, 0 }, { 90, 65, 78, 71, 0, 0 },
      { 90, 65, 79, 0, 0, 0 }, { 90, 69, 0, 0, 0, 0 },
      { 90, 69, 73, 0, 0, 0 }, { 90, 69, 78, 0, 0, 0 },
      { 90, 69, 78, 71, 0, 0 }, { 90, 72, 65, 0, 0, 0 },
      { 90, 72, 65, 73, 0, 0 }, { 90, 72, 65, 78, 0, 0 },
      { 90, 72, 65, 78, 71, 0 }, { 67, 72, 65, 78, 71, 0 },
      { 90, 72, 65, 78, 71, 0 }, { 90, 72, 65, 79, 0, 0 },
      { 90, 72, 69, 0, 0, 0 }, { 90, 72, 69, 78, 0, 0 },
      { 90, 72, 69, 78, 71, 0 }, { 90, 72, 73, 0, 0, 0 },
      { 83, 72, 73, 0, 0, 0 }, { 90, 72, 73, 0, 0, 0 },
      { 90, 72, 79, 78, 71, 0 }, { 90, 72, 79, 85, 0, 0 },
      { 90, 72, 85, 0, 0, 0 }, { 90, 72, 85, 65, 0, 0 },
      { 90, 72, 85, 65, 73, 0 }, { 90, 72, 85, 65, 78, 0 },
      { 90, 72, 85, 65, 78, 71 }, { 90, 72, 85, 73, 0, 0 },
      { 90, 72, 85, 78, 0, 0 }, { 90, 72, 85, 79, 0, 0 },
      { 90, 73, 0, 0, 0, 0 }, { 90, 79, 78, 71, 0, 0 },
      { 90, 79, 85, 0, 0, 0 }, { 90, 85, 0, 0, 0, 0 },
      { 90, 85, 65, 78, 0, 0 }, { 90, 85, 73, 0, 0, 0 },
      { 90, 85, 78, 0, 0, 0 }, { 90, 85, 79, 0, 0, 0 },
      { 0, 0, 0, 0, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
      { 0, 0, 0, 0, 0, 0 }, };

  /**
   * First and last Chinese character with known Pinyin according to zh
   * collation
   */
  private static final String FIRST_PINYIN_UNIHAN = "\u963F";
  private static final String LAST_PINYIN_UNIHAN =
      "\u9FFF";

  private static final Collator COLLATOR = Collator.getInstance(Locale.CHINA);

  private static HanziToPinyin sInstance;
  private final boolean mHasChinaCollator;

  public static class Token {
    /**
     * Separator between target string for each source char
     */
    public static final String SEPARATOR = " ";

    public static final int LATIN = 1;
    public static final int PINYIN = 2;
    public static final int UNKNOWN = 3;

    public Token() {
    }

    public Token(int type, String source, String target) {
      this.type = type;
      this.source = source;
      this.target = target;
    }

    /**
     * Type of this token, ASCII, PINYIN or UNKNOWN.
     */
    public int type;
    /**
     * Original string before translation.
     */
    public String source;
    /**
     * Translated string of source. For Han, target is corresponding Pinyin.
     * Otherwise target is original string in source.
     */
    public String target;
  }

  protected HanziToPinyin(boolean hasChinaCollator) {
    mHasChinaCollator = hasChinaCollator;
  }

  public static HanziToPinyin getInstance() {
    synchronized (HanziToPinyin.class) {
      if (sInstance != null) {
        return sInstance;
      }
      // Check if zh_CN collation data is available
      final Locale locale[] = Collator.getAvailableLocales();

      // Added code (enhancement): also accept the bare "zh" locale.
      final Locale chinaAddition = new Locale("zh");

      for (int i = 0; i < locale.length; i++) {
        if (locale[i].equals(Locale.CHINA)
            || locale[i].equals(chinaAddition)) {
          // Do self validation just once.
          if (DEBUG) {
            Log.d(TAG, "Self validation. Result: "
                + doSelfValidation());
          }
          sInstance = new HanziToPinyin(true);
          return sInstance;
        }
      }
      Log.w(TAG,
          "There is no Chinese collator, HanziToPinyin is disabled");
      sInstance = new HanziToPinyin(false);
      return sInstance;
    }
  }

  /**
   * Validate if our internal table has some wrong value.
   *
   * @return true when the table looks correct.
   */
  private static boolean doSelfValidation() {
    char lastChar = UNIHANS[0];
    String lastString = Character.toString(lastChar);
    for (char c : UNIHANS) {
      if (lastChar == c) {
        continue;
      }
      final String curString = Character.toString(c);
      int cmp = COLLATOR.compare(lastString, curString);
      if (cmp >= 0) {
        Log.e(TAG, "Internal error in Unihan table. "
            + "The last string \"" + lastString
            + "\" is greater than current string \"" + curString
            + "\".");
        return false;
      }
      lastString = curString;
    }
    return true;
  }

  private Token getToken(char character) {
    Token token = new Token();
    final String letter = Character.toString(character);
    token.source = letter;
    int offset = -1;
    int cmp;
    if (character < 256) {
      token.type = Token.LATIN;
      token.target = letter;
      return token;
    } else {
      cmp = COLLATOR.compare(letter, FIRST_PINYIN_UNIHAN);
      if (cmp < 0) {
        token.type = Token.UNKNOWN;
        token.target = letter;
        return token;
      } else if (cmp == 0) {
        token.type = Token.PINYIN;
        offset = 0;
      } else {
        cmp = COLLATOR.compare(letter, LAST_PINYIN_UNIHAN);
        if (cmp > 0) {
          token.type = Token.UNKNOWN;
          token.target = letter;
          return token;
        } else if (cmp == 0) {
          token.type = Token.PINYIN;
          offset = UNIHANS.length - 1;
        }
      }
    }

    token.type = Token.PINYIN;
    if (offset < 0) {
      int begin = 0;
      int end = UNIHANS.length - 1;
      while (begin <= end) {
        offset = (begin + end) / 2;
        final String unihan = Character.toString(UNIHANS[offset]);
        cmp = COLLATOR.compare(letter, unihan);
        if (cmp == 0) {
          break;
        } else if (cmp > 0) {
          begin = offset + 1;
        } else {
          end = offset - 1;
        }
      }
    }
    if (cmp < 0) {
      offset--;
    }
    StringBuilder pinyin = new StringBuilder();
    for (int j = 0; j < PINYINS[offset].length && PINYINS[offset][j] != 0; j++) {
      pinyin.append((char) PINYINS[offset][j]);
    }
    token.target = pinyin.toString();
    if (TextUtils.isEmpty(token.target)) {
      token.type = Token.UNKNOWN;
      token.target = token.source;
    }
    return token;
  }

  /**
   * Convert the input to an array of tokens. The sequence of ASCII or Unknown
   * characters without space will be put into a Token. One Hanzi character
   * which has pinyin will be treated as a Token. If there is no China
   * collator, the empty token array is returned.
   */
  public ArrayList<Token> get(final String input) {
    ArrayList<Token> tokens = new ArrayList<Token>();
    if (!mHasChinaCollator || TextUtils.isEmpty(input)) {
      // return empty tokens.
      return tokens;
    }
    final int inputLength = input.length();
    final StringBuilder sb = new StringBuilder();
    int tokenType = Token.LATIN;
    // Go through the input, create a new token when
    // a. Token type changed
    // b. Get the Pinyin of current character.
    // c. current character is space.
    for (int i = 0; i < inputLength; i++) {
      final char character = input.charAt(i);
      if (character == ' ') {
        if (sb.length() > 0) {
          addToken(sb, tokens, tokenType);
        }
      } else if (character < 256) {
        if (tokenType != Token.LATIN && sb.length() > 0) {
          addToken(sb, tokens, tokenType);
        }
        tokenType = Token.LATIN;
        sb.append(character);
      } else {
        Token t = getToken(character);
        if (t.type == Token.PINYIN) {
          if (sb.length() > 0) {
            addToken(sb, tokens, tokenType);
          }
          tokens.add(t);
          tokenType = Token.PINYIN;
        } else {
          if (tokenType != t.type && sb.length() > 0) {
            addToken(sb, tokens, tokenType);
          }
          tokenType = t.type;
          sb.append(character);
        }
      }
    }
    if (sb.length() > 0) {
      addToken(sb, tokens, tokenType);
    }
    return tokens;
  }

  private void addToken(final StringBuilder sb,
      final ArrayList<Token> tokens, final int tokenType) {
    String str = sb.toString();
    tokens.add(new Token(tokenType, str, str));
    sb.setLength(0);
  }
}
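The table lookup inside getToken() can be illustrated without Android or ICU. The sketch below is a simplified stand-in, not the class above: it compares plain char values instead of using the zh_CN Collator, the three-entry tables are made up for the example, and the search is written in the common "remember the last match" form rather than with the offset-- adjustment used in getToken(). The names FloorLookupDemo, floorIndex, decode, and lookup are hypothetical. It shows two ideas: a binary search for the last boundary that is not greater than the input, and decoding a zero-padded byte row into a string.

```java
// Hypothetical demo class; a simplified model of the lookup, not the original code.
class FloorLookupDemo {

    // Assumed sample data: sorted boundary keys and a parallel table of
    // zero-padded ASCII rows, laid out like PINYINS.
    static final char[] KEYS = { 'a', 'h', 'p' };
    static final byte[][] LABELS = {
        { 65, 0, 0, 0, 0, 0 },       // "A"
        { 72, 65, 0, 0, 0, 0 },      // "HA"
        { 80, 73, 78, 71, 0, 0 },    // "PING"
    };

    // Returns the index of the last key <= c, or -1 if c sorts before all keys.
    static int floorIndex(char[] keys, char c) {
        int begin = 0;
        int end = keys.length - 1;
        int result = -1;
        while (begin <= end) {
            int mid = (begin + end) >>> 1;
            if (keys[mid] <= c) {
                result = mid;     // candidate; keep looking for a later one
                begin = mid + 1;
            } else {
                end = mid - 1;
            }
        }
        return result;
    }

    // Turns a zero-padded byte row into a string, stopping at the first 0.
    static String decode(byte[] row) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < row.length && row[j] != 0; j++) {
            sb.append((char) row[j]);
        }
        return sb.toString();
    }

    // Looks up the label of the range that c falls into, or "" if there is none.
    static String lookup(char c) {
        int i = floorIndex(KEYS, c);
        return i < 0 ? "" : decode(LABELS[i]);
    }
}
```

With the sample tables, lookup('k') falls between 'h' and 'p' and so returns the label of 'h'. The same decoding applied to the row { 90, 72, 65, 78, 71, 0 } from PINYINS yields "ZHANG".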

Write a MainActivity.java to test the conversion of Chinese characters to pinyin:

package zhangphil.hanyupinyin;

import java.util.ArrayList;

import zhangphil.hanyupinyin.HanziToPinyin.Token;
import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);

    String s = "安卓";
    System.out.println("Hanzi to pinyin output: " + getPinYin(s));
  }

  // General-purpose helper: takes Chinese characters and returns their pinyin.
  public static String getPinYin(String hanzi) {
    ArrayList<Token> tokens = HanziToPinyin.getInstance().get(hanzi);
    StringBuilder sb = new StringBuilder();
    if (tokens != null && tokens.size() > 0) {
      for (Token token : tokens) {
        if (Token.PINYIN == token.type) {
          sb.append(token.target);
        } else {
          sb.append(token.source);
        }
      }
    }

    return sb.toString().toUpperCase();
  }
}

The output is shown in the figure:

[Figure: screenshot of the program output; original image at http://img.blog.csdn.net/20150731084108217]
