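When writing down English vocabulary, I wanted the English words and their Chinese translations to line up in separate columns, the same way some people like to align variable declarations in their code. I couldn't find an existing tool for it online, so I tried writing one myself. I expected it to be simple, but ran into quite a few problems along the way. First, the results: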
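Basic example
Before alignment: [screenshot not preserved]
After alignment: [screenshot not preserved]
A more creative example
Before alignment: [screenshot not preserved]
After alignment: [screenshot not preserved]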
Implementation
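The idea is straightforward: read the text file, split each line into blocks with a regular expression, find the widest block in each column, pad the narrower blocks with spaces, and write the result out.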
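The pipeline above can be sketched on an in-memory list of lines. This is a minimal sketch, not the author's code: the class name AlignSketch is hypothetical, it reuses the default delimiter from TextAlignFileUtil (two or more spaces, or a tab) and the 4-space gap, and it assumes every character is exactly one column wide, so CJK text is not handled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AlignSketch {
    // Same default delimiter as TextAlignFileUtil: two or more spaces, or a tab.
    static final String SPLIT_REGEX = "( {2,})|\t";
    // Spaces left after the widest block of each column (the original uses 4).
    static final int GAP = 4;

    // Split one line into trimmed, non-empty blocks.
    static String[] blocks(String line) {
        List<String> out = new ArrayList<>();
        for (String part : line.split(SPLIT_REGEX)) {
            String t = part.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out.toArray(new String[0]);
    }

    // Align all lines. Width is plain String.length(): this sketch assumes
    // every character occupies one column.
    static List<String> align(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        int columns = 0;
        for (String line : lines) {
            String[] b = blocks(line);
            rows.add(b);
            columns = Math.max(columns, b.length);
        }
        int[] widest = new int[columns];
        for (String[] b : rows) {
            if (b.length < 2) continue; // single-block lines do not set column widths
            for (int i = 0; i < b.length; i++) {
                widest[i] = Math.max(widest[i], b[i].length());
            }
        }
        List<String> result = new ArrayList<>();
        for (String[] b : rows) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < b.length; i++) {
                sb.append(b[i]);
                if (i < b.length - 1) { // the last block on a line is never padded
                    int pad = widest[i] - b[i].length() + GAP;
                    for (int k = 0; k < pad; k++) sb.append(' ');
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("apple\tfruit", "banana  yellow fruit", "kiwi\tsmall");
        align(in).forEach(System.out::println);
    }
}
```

In main, "banana" (6 characters) is the widest first block, so with a 4-space gap the second column starts at character index 10 on all three lines; the single space inside "yellow fruit" is kept because only runs of two or more spaces split. A line that yields a single block (a heading, say) is printed unchanged and does not widen any column, mirroring the blocks.length < 2 check in fillLongestBlockLengths.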
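It looked simple enough. I had a first version written in a little over an hour, ran it right away, and the output was a mess. I counted the characters in each padded block and the counts were identical, yet the columns still didn't line up. A quick search showed that the font was to blame, so I switched to a monospaced one. With the new font some lines lined up and others still didn't, and the culprit turned out to be Chinese: in a monospaced font a Chinese character usually occupies two columns, while a Latin letter occupies one. So I used a regular expression to decide whether each character is Chinese, and wrote my own method to compute a string's display length. Deciding what counts as a Chinese character took a long time, because punctuation and other symbols have to be included too, and since I wasn't familiar with the Unicode code point ranges, it took a lot of trial and error.

At last everything seemed to work. Then, after testing again and again, I suddenly noticed that the first line was one space short. What the heck? Debugging revealed a strange character, "\uFEFF", at the very beginning of the first line. What on earth was that? A search showed it was the Unicode byte order mark (BOM). Whatever it was, I simply stripped it out...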
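The author's stringLengthFitWidth matches each character against a regex. A swapped-in alternative is sketched below: iterate over code points and count those in the East Asian "wide" ranges as two columns. The ranges are modelled on Markus Kuhn's wcwidth() tables and are only an approximation (no emoji, combining marks or ambiguous-width characters); the class name DisplayWidth is hypothetical. It also gives the BOM (U+FEFF) zero width, so a stray BOM cannot shift the first line.

```java
public class DisplayWidth {
    // Code points treated as two columns wide. Ranges follow Markus Kuhn's
    // wcwidth() tables; this approximates Unicode East Asian Width and ignores
    // emoji, combining marks and ambiguous-width characters.
    static boolean isWide(int cp) {
        return (cp >= 0x1100 && cp <= 0x115F)                      // Hangul Jamo initial consonants
                || cp == 0x2329 || cp == 0x232A                    // angle brackets
                || (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F)  // CJK radicals through Yi
                || (cp >= 0xAC00 && cp <= 0xD7A3)                  // Hangul syllables
                || (cp >= 0xF900 && cp <= 0xFAFF)                  // CJK compatibility ideographs
                || (cp >= 0xFE10 && cp <= 0xFE19)                  // vertical forms
                || (cp >= 0xFE30 && cp <= 0xFE6F)                  // CJK compatibility and small forms
                || (cp >= 0xFF00 && cp <= 0xFF60)                  // fullwidth forms
                || (cp >= 0xFFE0 && cp <= 0xFFE6)                  // fullwidth signs
                || (cp >= 0x20000 && cp <= 0x2FFFD)                // CJK Extension B and later
                || (cp >= 0x30000 && cp <= 0x3FFFD);
    }

    // Display width in columns: wide code points count 2, the BOM (U+FEFF)
    // counts 0, everything else counts 1. Iterating by code point keeps a
    // surrogate pair (e.g. a CJK Extension B character) as one character.
    static int width(String s) {
        return s.codePoints()
                .filter(cp -> cp != 0xFEFF)
                .map(cp -> isWide(cp) ? 2 : 1)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(width("apple"));                    // 5
        System.out.println(width("\u82f9\u679c"));              // 4
        System.out.println(width("\u82f9\u679c\uff0capple"));   // 11
    }
}
```

The Chinese test strings in main are written as Java Unicode escapes, which the compiler translates before parsing, so the example compiles the same way whatever encoding the source file is saved in.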
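All in all I ran into every kind of problem, and the further I got, the less confident I felt; character-set issues are a real headache. On top of that, I didn't deal with encodings at all: FileReader and FileWriter use the platform's default charset, so the text file's encoding has to match the encoding the IDE runs the program with, or the output will be garbled. I left it at that. Below is the Java implementation.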
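Because FileReader decodes with the platform default charset, and Java's UTF-8 decoder does not drop a leading BOM, one way around both problems is to name the charset explicitly and strip the BOM while reading. A minimal sketch, assuming the input is known to be UTF-8; the class ReadWithCharset and its readLines method are hypothetical. With Files.newBufferedReader, bytes that are not valid in the given charset cause an IOException (MalformedInputException) instead of quietly producing garbled text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {
    private static final char BOM = '\uFEFF';

    // Reads every line of path with an explicit charset (rather than the
    // platform default used by FileReader) and drops a BOM at the start of
    // the first line.
    static List<String> readLines(Path path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // lines is still empty only while handling the first line
                if (lines.isEmpty() && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1);
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file that starts with a BOM, then read it back.
        Path tmp = Files.createTempFile("align", ".txt");
        try {
            Files.write(tmp, "\uFEFFword\t\u5355\u8bcd\n".getBytes(StandardCharsets.UTF_8));
            List<String> lines = readLines(tmp, StandardCharsets.UTF_8);
            System.out.println(lines.get(0).charAt(0) == BOM); // false
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Writing the output with Files.newBufferedWriter(path, charset) would make the other half of the round trip explicit as well.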
Source code
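Having read "Refactoring" and "Clean Code", I kept trying to write the code cleanly and make it easy to extend. After several rounds of revision I'm reasonably happy with it. There is certainly room for improvement, but it will do for now.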
App.java
package textalign;

import java.io.IOException;

/**
 * @author tingl
 * @version 2017/9/27
 */
public class App {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Default input file; a path passed on the command line takes precedence
        String filePath = "C:\\Users\\tingl\\Desktop\\Test2.txt";
        // The no-argument constructor splits on two or more spaces or a tab.
        // Uncomment the regex to split on commas, periods and spaces as well.
        TextAlign textAlign = new TextAlign(/*",|。|,|[.]|( {2,})|\t| +"*/);
        if (args.length > 0) {
            filePath = args[0];
        }
        try {
            textAlign.align(filePath);
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Elapsed time in milliseconds
        System.out.println(System.currentTimeMillis() - start);
    }
}
TextAlign.java
package textalign;

import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;

/**
 * @author tingl
 * @version 2017/9/27
 */
public class TextAlign {
    // Characters counted as two columns wide: CJK ideographs (U+4E00 to U+9FA5),
    // the range U+FE30 to U+FFA0 (CJK compatibility, small and fullwidth forms,
    // although it also contains some narrow characters such as halfwidth katakana)
    // and CJK symbols and punctuation (U+3000 to U+303F).
    private static final String CHINESE_CHARACTER = "[\u4e00-\u9fa5]|[\uFE30-\uFFA0]|[\u3000-\u303F]";
    private static final Pattern CHINESE_CHARACTER_PATTERN = Pattern.compile(CHINESE_CHARACTER);
    // Spaces left between the widest block of a column and the next column
    private static final int SEPARATE_SPACE_AMOUNT = 4;

    private final TextAlignFileUtil textAlignFileUtil;
    private List<String[]> textLines;
    // longestBlockLengths[i] is the display width of the widest block in column i
    private int[] longestBlockLengths;

    public TextAlign() {
        textAlignFileUtil = new TextAlignFileUtil();
    }

    public TextAlign(String splitRegex) {
        textAlignFileUtil = new TextAlignFileUtil(splitRegex);
    }

    public void align(String filePath) throws IOException {
        textLines = textAlignFileUtil.readToList(filePath);
        initLongestBlockLengths();
        fillTextLinesBySpaces();
        textAlignFileUtil.write();
    }

    private void initLongestBlockLengths() {
        int longestArrayLength = 0;
        for (String[] blocks : textLines) {
            if (blocks.length > longestArrayLength) {
                longestArrayLength = blocks.length;
            }
        }
        longestBlockLengths = new int[longestArrayLength];
        fillLongestBlockLengths();
    }

    private void fillLongestBlockLengths() {
        for (String[] blocks : textLines) {
            // Lines with a single block (e.g. headings) do not affect column widths
            if (blocks.length < 2) {
                continue;
            }
            for (int i = 0; i < blocks.length; i++) {
                int length = stringLengthFitWidth(blocks[i]);
                if (length > longestBlockLengths[i]) {
                    longestBlockLengths[i] = length;
                }
            }
        }
    }

    // Display width of s: characters matching CHINESE_CHARACTER count as
    // two columns, all other characters as one.
    private int stringLengthFitWidth(String s) {
        int length = 0;
        for (char c : s.toCharArray()) {
            length += CHINESE_CHARACTER_PATTERN.matcher(String.valueOf(c)).matches() ? 2 : 1;
        }
        return length;
    }

    private void fillTextLinesBySpaces() {
        for (String[] blocks : textLines) {
            // Pad every block except the last one on the line
            for (int i = 0; i < blocks.length - 1; i++) {
                String block = blocks[i];
                int spaceAmount = longestBlockLengths[i] - stringLengthFitWidth(block) + SEPARATE_SPACE_AMOUNT;
                blocks[i] = block + spaces(spaceAmount);
            }
        }
    }

    private String spaces(int spaceAmount) {
        StringBuilder spaces = new StringBuilder(spaceAmount);
        for (int i = 0; i < spaceAmount; i++) {
            spaces.append(' ');
        }
        return spaces.toString();
    }
}
TextAlignFileUtil.java
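package textalign;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * @author tingl
 * @version 2017/9/27
 */
class TextAlignFileUtil {
    private static final String FILENAME_POSTFIX = "_aligned";
    // Byte order mark (U+FEFF) that some editors write at the start of a file
    private static final String BOM = "\uFEFF";
    // Default delimiter: two or more spaces, or a tab
    private String splitRegex = "( {2,})|\t";
    private List<String[]> textLines;
    private String outPath;

    TextAlignFileUtil() {
    }

    TextAlignFileUtil(String splitRegex) {
        this.splitRegex = splitRegex;
    }

    List<String[]> readToList(String path) throws IOException {
        return readToList(new File(path));
    }

    private List<String[]> readToList(File file) throws IOException {
        getOutPath(file.getAbsolutePath());
        textLines = new ArrayList<>();
        // FileReader decodes with the platform default charset
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            boolean firstLine = true;
            while ((line = reader.readLine()) != null) {
                if (firstLine) {
                    // Strip the BOM before splitting, so it never ends up inside
                    // (or as) the first block; this also copes with empty files
                    // and an empty first line.
                    line = removeBomHead(line);
                    firstLine = false;
                }
                textLines.add(removeEmptyAndTrim(line.split(splitRegex)));
            }
        }
        return textLines;
    }

    // Output goes next to the source file as "name_aligned.ext"; if that file
    // already exists, "_aligned" is appended again until the name is free.
    private void getOutPath(String srcPath) {
        int dotPosition = srcPath.lastIndexOf('.');
        int separatorPosition = srcPath.lastIndexOf(File.separatorChar);
        if (dotPosition <= separatorPosition) {
            // The file name has no extension: append the postfix at the end
            outPath = srcPath + FILENAME_POSTFIX;
        } else {
            outPath = srcPath.substring(0, dotPosition) + FILENAME_POSTFIX + srcPath.substring(dotPosition);
        }
        if (new File(outPath).exists()) {
            getOutPath(outPath);
        }
    }

    // Trim every block and drop the empty ones (e.g. produced by leading delimiters)
    private String[] removeEmptyAndTrim(String[] src) {
        for (int i = 0; i < src.length; i++) {
            src[i] = src[i].trim();
        }
        List<String> dest = new ArrayList<>(Arrays.asList(src));
        dest.removeIf(String::isEmpty);
        return dest.toArray(new String[0]);
    }

    private String removeBomHead(String line) {
        return line.startsWith(BOM) ? line.substring(BOM.length()) : line;
    }

    void write() throws IOException {
        // FileWriter encodes with the platform default charset
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(outPath))) {
            for (String[] blocks : textLines) {
                writer.write(getLine(blocks));
                writer.newLine();
            }
        }
    }

    private String getLine(String[] blocks) {
        StringBuilder sb = new StringBuilder();
        for (String block : blocks) {
            sb.append(block);
        }
        return sb.toString();
    }
}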
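Original title: Aligning the segments of each line of text with one another, plus some small encoding issues
Keywords: encoding
*Disclaimer: The content above was collected from the web; copyright belongs to the original author. In case of infringement, please contact us at admin#shaoqun.com (replace # with @).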