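Tuesday, December 26, 2006

Reading and Writing Files with JavaScript

For security reasons, JavaScript normally cannot access the file system on your computer unless you explicitly grant it permission.

This post introduces two ways to access the file system from JavaScript:
  1. Using a JavaScript extension (for example, running the script from JavaScript Editor)
  2. Using Microsoft's ActiveX objects (Internet Explorer only)
ActiveX gives us a great deal of flexibility, but it comes with some restrictions:
  • You need a web page to run your JavaScript;
  • ActiveX objects are supported only by IE.
With JavaScript Editor, choose "Build / Execute" from the menu and load a script file written with the syntax described below.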

Example 1 (reading a file with JavaScript Editor)

  1. Run JavaScript Editor
  2. Copy and paste the code below
  3. Save it as FileRead.js
  4. Choose "Build / Execute" from the menu

Example 2 (listing the files in a directory with JavaScript Editor)

  1. Run JavaScript Editor
  2. Copy and paste the code below
  3. Save it as FolderExample.js
  4. Choose "Build / Execute" from the menu

Example 3 (listing all drive letters with ActiveX)

  1. Copy the code below
  2. Save the file as DriveList.htm
  3. View the page in the browser

Example 4 (writing a file with an ActiveX object)

References

Windows Script

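Wednesday, December 20, 2006

HOWTO use CRF++

Usage

Training and test file formats

Both the training file and the test file must be in a particular format for CRF++ to work properly. In general, each file consists of multiple tokens, and each token consists of multiple (but a fixed number of) columns. The definition of a token depends on the task, but in most cases a token simply corresponds to a word. Each token must be written on one line, with its columns separated by white space (spaces or tab characters). A sequence of tokens forms a sentence, and an empty line marks the boundary between sentences.

You can add as many columns as you like, but the number of columns must be the same for every token. In addition, the columns carry some semantics: for example, the first column is the 'word', the second column is the 'POS tag', the third is the 'sub-category of POS', and so on.

The last column is the true answer tag, which CRF++ uses for training.

Here is an example: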

He        PRP  B-NP
reckons   VBZ  B-VP
the       DT   B-NP
current   JJ   I-NP
account   NN   I-NP
deficit   NN   I-NP
will      MD   B-VP
narrow    VB   I-VP
to        TO   B-PP
only      RB   B-NP
#         #    I-NP
1.8       CD   I-NP
billion   CD   I-NP
in        IN   B-PP
September NNP  B-NP
.         .    O

He        PRP  B-NP
reckons   VBZ  B-VP
..

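Each token has three columns:

  • the word itself (e.g. reckons);
  • the part-of-speech tag of the word (e.g. VBZ);
  • the chunk (answer) tag, in IOB2 format.

The following data is invalid because the second and third tokens have only 2 columns (they have no POS column). The number of columns must be fixed.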

He        PRP  B-NP
reckons   B-VP
the       B-NP
current   JJ   I-NP
account   NN   I-NP
..
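
The fixed-column rule is easy to check before training. The following is only a minimal sketch in Python, not part of CRF++: the helper names `read_sentences` and `bad_rows` are made up for illustration, and the sketch assumes that the first token defines the expected column count.

```python
def read_sentences(lines):
    """Split CRF++-style lines into sentences (lists of token rows)."""
    sentences, current = [], []
    for line in lines:
        if not line.strip():              # an empty line ends the sentence
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(line.split())  # columns are separated by spaces or tabs
    if current:
        sentences.append(current)
    return sentences


def bad_rows(sentences):
    """Return (sentence index, token index) of rows whose column count
    differs from that of the very first token (an assumption of this sketch)."""
    if not sentences:
        return []
    expected = len(sentences[0][0])
    return [(i, j)
            for i, sent in enumerate(sentences)
            for j, row in enumerate(sent)
            if len(row) != expected]


# The invalid sample from the text: tokens 1 and 2 lack the POS column.
invalid = """He        PRP  B-NP
reckons   B-VP
the       B-NP
current   JJ   I-NP
account   NN   I-NP
""".splitlines()

print(bad_rows(read_sentences(invalid)))
```

Running such a check on the training and test files catches this kind of formatting problem before crf_learn is started.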

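Preparing the feature template

Because CRF++ is designed as a general-purpose tool, you must specify the feature templates in advance. The template file describes which features are used in training and testing.

Template basics and macros

Each line in the template file denotes one template. In each template, the special macro %x[row,col] specifies a token in the input data: row is the position relative to the current token, and col is the absolute position of the column.

Here is an example:

Input data: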

He        PRP  B-NP
reckons   VBZ  B-VP
the       DT   B-NP << current token
current   JJ   I-NP
account   NN   I-NP

template           expanded feature
%x[0,0]            the
%x[0,1]            DT
%x[-1,0]           reckons
%x[-2,1]           PRP
%x[0,0]/%x[0,1]    the/DT
ABC%x[0,0]123      ABCthe123
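
To make the macro expansion concrete, here is a small Python sketch that mimics it on the sample above. It is an illustration only, not CRF++'s own code: the function name `expand` is made up, and the placeholder returned for positions outside the sentence is an assumption of this sketch.

```python
import re

# Matches macros such as %x[0,0] or %x[-2,1]
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, sentence, pos):
    """Expand every %x[row,col] in template for the token at index pos."""
    def lookup(match):
        row = pos + int(match.group(1))   # row is relative to the current token
        col = int(match.group(2))         # col is an absolute column index
        if 0 <= row < len(sentence):
            return sentence[row][col]
        return "_OUT_OF_RANGE_"           # hypothetical placeholder, not CRF++'s marker
    return MACRO.sub(lookup, template)


sentence = [
    ["He", "PRP", "B-NP"],
    ["reckons", "VBZ", "B-VP"],
    ["the", "DT", "B-NP"],       # current token (index 2)
    ["current", "JJ", "I-NP"],
    ["account", "NN", "I-NP"],
]

for template in ["%x[0,0]", "%x[0,1]", "%x[-1,0]", "%x[-2,1]",
                 "%x[0,0]/%x[0,1]", "ABC%x[0,0]123"]:
    print(template, "->", expand(template, sentence, 2))
```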

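Template types

Note that there are two types of templates, and every template must have its type specified. The first character of a template indicates its type.

Unigram template: the first character is 'U'

This template describes unigram features. When you give a template "U01:%x[0,1]", CRF++ automatically generates a set of feature functions (func1 ... funcN):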

func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1  else return 0
....
funcXX = if (output = B-NP and feature="U01:NN") return 1  else return 0
funcXY = if (output = O and feature="U01:NN") return 1  else return 0
...

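The number of feature functions generated by a template is (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.

Bigram template: the first character is 'B'

This template describes bigram features. With it, a combination of the current output token and the previous output token (bigram) is generated automatically. This type of template generates a total of (L * L * N) distinct features, where L is the number of output classes and N is the number of unique features generated by the template. When the number of classes is large, this type of template produces a huge number of distinct features, which can make training and testing slow.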
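
As a worked example of the two formulas, the Python sketch below counts L and N for the template "U01:%x[0,1]" over the 16-token sample sentence shown earlier. The function names are made up for illustration, and the counts only describe this tiny sample, not a real corpus.

```python
def unigram_count(num_labels, num_strings):
    """Feature functions generated by one unigram template: L * N."""
    return num_labels * num_strings


def bigram_count(num_labels, num_strings):
    """Feature functions generated by one bigram template: L * L * N."""
    return num_labels * num_labels * num_strings


# (POS, chunk tag) pairs of the sample sentence "He reckons the current account ..."
rows = [("PRP", "B-NP"), ("VBZ", "B-VP"), ("DT", "B-NP"), ("JJ", "I-NP"),
        ("NN", "I-NP"), ("NN", "I-NP"), ("MD", "B-VP"), ("VB", "I-VP"),
        ("TO", "B-PP"), ("RB", "B-NP"), ("#", "I-NP"), ("CD", "I-NP"),
        ("CD", "I-NP"), ("IN", "B-PP"), ("NNP", "B-NP"), (".", "O")]

labels = {tag for _, tag in rows}              # output classes seen in the sample
strings = {"U01:" + pos for pos, _ in rows}    # unique expansions of U01:%x[0,1]

print("L =", len(labels), "N =", len(strings))
print("unigram:", unigram_count(len(labels), len(strings)))
print("bigram: ", bigram_count(len(labels), len(strings)))
```

The extra factor of L is what makes bigram templates expensive when the label set is large.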

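Identifiers for distinguishing relative positions

You need to put an identifier in a template when the relative positions of tokens must be distinguished.

In the following example, the macros "%x[-2,1]" and "%x[1,1]" are both replaced with "DT", but they refer to different "DT"s: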

The       DT  B-NP
pen       NN  I-NP
is        VB  B-VP << CURRENT TOKEN
a         DT  B-NP

To distinguish them, put a unique identifier (U01: or U02:) in each template:

U01:%x[-2,1]
U02:%x[1,1]

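In this case the two templates are treated as different ones, since they are expanded into different features, "U01:DT" and "U02:DT". You can use any identifier you like, but numbering them is a good way to manage the names, because the numbers map easily onto feature numbers.

If you want "bag-of-words" features, you do not need identifiers.

Example

Here is a template example for the CoNLL 2000 shared task and the Base-NP chunking task. Only one bigram template ('B') is used, which means that only the combination of the previous output token and the current token is used as a bigram feature. Empty lines and lines beginning with # are treated as comments: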

# Unigram
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-1,0]/%x[0,0]
U06:%x[0,0]/%x[1,0]

U10:%x[-2,1]
U11:%x[-1,1]
U12:%x[0,1]
U13:%x[1,1]
U14:%x[2,1]
U15:%x[-2,1]/%x[-1,1]
U16:%x[-1,1]/%x[0,1]
U17:%x[0,1]/%x[1,1]
U18:%x[1,1]/%x[2,1]

U20:%x[-2,1]/%x[-1,1]/%x[0,1]
U21:%x[-1,1]/%x[0,1]/%x[1,1]
U22:%x[0,1]/%x[1,1]/%x[2,1]

# Bigram
B

If you write:

B01:%x[0,0]

the actual number of features is:

(number of distinct values of x[0,0]) × pre_label × cur_label

If you write only:

B

the actual number of features is just:

pre_label × cur_label

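Training (encoding)

On the command line:

crf_learn template_file train_file model_file

template_file and train_file must be prepared in advance. crf_learn writes the trained model to the file given as model_file.

In general, crf_learn prints the following information to standard output. It also shows extra information for each LBFGS iteration.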

C:> crf_learn template_file train_file model_file

CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005 Taku Kudo, All rights reserved.

reading training data:
Done! 0.32 s

Number of sentences:          77
Number of features:           32856
Freq:                         1
eta:                          0.0001
C(sigma^2):                   10

iter=0 terr=0.7494725738 serr=1 obj=2082.968899 diff=1
iter=1 terr=0.1671940928 serr=0.8831168831 obj=1406.329356 diff=0.3248438053
iter=2 terr=0.1503164557 serr=0.8831168831 obj=626.9159973 diff=0.5542182244
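  • iter: number of iterations processed
  • terr: error rate with respect to tags (number of wrong tags / total number of tags)
  • serr: error rate with respect to sentences (number of wrong sentences / total number of sentences)
  • obj: current objective value. When this value converges to a fixed point, CRF++ stops iterating.
  • diff: relative difference from the previous objective value.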
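
If you want to plot or monitor these values, the log lines are easy to parse. The Python sketch below is only an illustration (the names `parse_iteration` and `relative_change` are made up). It also checks an observation about the sample log above: there, diff agrees with |obj_prev - obj| / obj_prev, which is treated here as an assumption about how diff is computed, not as documented behavior.

```python
def parse_iteration(line):
    """Parse a line such as 'iter=1 terr=0.16 serr=0.88 obj=1406.3 diff=0.32'."""
    fields = dict(part.split("=", 1) for part in line.split())
    return {
        "iter": int(fields["iter"]),
        "terr": float(fields["terr"]),
        "serr": float(fields["serr"]),
        "obj": float(fields["obj"]),
        "diff": float(fields["diff"]),
    }


def relative_change(prev_obj, obj):
    """Relative change of the objective between two iterations (assumed formula)."""
    return abs(prev_obj - obj) / prev_obj


log = [
    "iter=0 terr=0.7494725738 serr=1 obj=2082.968899 diff=1",
    "iter=1 terr=0.1671940928 serr=0.8831168831 obj=1406.329356 diff=0.3248438053",
    "iter=2 terr=0.1503164557 serr=0.8831168831 obj=626.9159973 diff=0.5542182244",
]

records = [parse_iteration(line) for line in log]
for prev, cur in zip(records, records[1:]):
    print(cur["iter"], cur["diff"], relative_change(prev["obj"], cur["obj"]))
```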

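Two major parameters control the training:

  • -c float: This option changes the hyper-parameter of the CRF. With a larger C value, CRF++ tends to overfit the given training corpus. The parameter trades off overfitting against underfitting, and it has a significant effect on the results.
  • -f NUM: This parameter sets the cut-off threshold for features. CRF++ uses only the features that occur no less than NUM times in the given training data. The default value is 1. When you apply CRF++ to large data, the number of unique features can become very large, and this option is useful in that case.

Since version 0.42, CRF++ supports parallel processing, which can speed up training on systems with two or more CPUs. To enable it, add the following parameter on the command line:

  • -p NUM: If you have multiple CPUs, multi-threading can make training faster. NUM is the number of threads.

Here is an example that uses the -f and -c parameters: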

% crf_learn -f 3 -c 1.5 template_file train_file model_file

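Testing (decoding)

On the command line:

crf_test -m model_file test_files ...

model_file is the file that crf_learn creates. You do not need to specify the template file at test time, because the model file carries the same template information. test_file is the test data to which you want to assign sequential tags; it must be in the same format as the training file.

Here is the output of crf_test: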

C:\ crf_test -m model test.data
Rockwell        NNP     B       B
International   NNP     I       I
Corp.   NNP     I       I
's      POS     B       B
Tulsa   NNP     I       I
unit    NN      I       I
..

The last column is the tag estimated by CRF++. If the third column holds the true answer tag, you can evaluate the accuracy by simply comparing the third and fourth columns.
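
The column comparison can be scripted. The following Python sketch computes token-level accuracy from crf_test output by comparing the last two columns of every token line. The function name `token_accuracy` is made up, and the sketch assumes plain output (no -v or -n header lines).

```python
def token_accuracy(lines):
    """Fraction of tokens whose last column (predicted tag) equals
    the column before it (answer tag)."""
    total = correct = 0
    for line in lines:
        cols = line.split()
        if len(cols) < 2:      # blank line between sentences, or a stray ".."
            continue
        total += 1
        if cols[-1] == cols[-2]:
            correct += 1
    return correct / total if total else 0.0


output = """Rockwell        NNP     B       B
International   NNP     I       I
Corp.   NNP     I       I
's      POS     B       B
Tulsa   NNP     I       I
unit    NN      I       I
""".splitlines()

print(token_accuracy(output))
```

For chunking tasks, a chunk-level F-measure is usually more informative than token accuracy; this sketch covers only the simple comparison described above.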

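verbose level

The -v option sets the verbose level. The default value is 0. By increasing the level, you can get additional information from CRF++.

level 1

You get the marginal probability of each tag (a confidence measure for each output tag) and the conditional probability of the output (a confidence measure for the entire output).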

C:\ crf_test -v1 -m model test.data| head
# 0.478113
Rockwell        NNP     B       B/0.992465
International   NNP     I       I/0.979089
Corp.   NNP     I       I/0.954883
's      POS     B       B/0.986396
Tulsa   NNP     I       I/0.991966
...

The first line, "# 0.478113", shows the conditional probability of the whole output, and each output tag carries its own probability, as in "B/0.992465".
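
These per-tag probabilities are handy for finding tokens the model is unsure about. Below is a minimal Python sketch that parses the -v1 output shown above. The function name `parse_v1` and the returned structure are assumptions of this sketch, not something CRF++ provides.

```python
def parse_v1(lines):
    """Parse 'crf_test -v1' output into a list of
    {'prob': float, 'tokens': [(word, tag, marginal), ...]} dictionaries."""
    sentences = []
    current = None
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and fields[0] == "#":
            # header line "# <probability of the whole output>"
            current = {"prob": float(fields[1]), "tokens": []}
            sentences.append(current)
        elif len(fields) >= 2 and "/" in fields[-1] and current is not None:
            tag, _, prob = fields[-1].rpartition("/")
            current["tokens"].append((fields[0], tag, float(prob)))
    return sentences


sample = """# 0.478113
Rockwell        NNP     B       B/0.992465
International   NNP     I       I/0.979089
Corp.   NNP     I       I/0.954883
's      POS     B       B/0.986396
Tulsa   NNP     I       I/0.991966
""".splitlines()

parsed = parse_v1(sample)
least_sure = min(parsed[0]["tokens"], key=lambda token: token[2])
print(parsed[0]["prob"], least_sure)
```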

level 2

The marginal probabilities of all the other candidates are also shown.

C:\ crf_test -v2 -m model test.data
# 0.478113
Rockwell        NNP     B       B/0.992465      B/0.992465      I/0.00144946    O/0.00608594
International   NNP     I       I/0.979089      B/0.0105273     I/0.979089      O/0.0103833
Corp.   NNP     I       I/0.954883      B/0.00477976    I/0.954883      O/0.040337
's      POS     B       B/0.986396      B/0.986396      I/0.00655976    O/0.00704426
Tulsa   NNP     I       I/0.991966      B/0.00787494    I/0.991966      O/0.00015949
unit    NN      I       I/0.996169      B/0.00283111    I/0.996169      O/0.000999975
..

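N-best outputs

With the -n option, you can obtain N-best results. In N-best output mode, CRF++ first emits one additional line of the form "# N prob", where N is the rank of the output, counting from 0, and prob is the conditional probability of that output.

Note that CRF++ may stop enumerating N-best results when it cannot find any more candidates. This typically happens when CRF++ is applied to a short sentence.

Here is an example of N-best results: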

C:\ crf_test -n 20 -m model test.data
# 0 0.478113
Rockwell        NNP     B       B
International   NNP     I       I
Corp.   NNP     I       I
's      POS     B       B
...

# 1 0.194335
Rockwell        NNP     B       B
International   NNP     I       I

Comparison of CRF++ 0.3 and 0.42

Server: iasl-64server1 (140.109.19.203), Windows 2003 Enterprise x64 Edition R2, 10 G RAM

Comparison                     CRF++ 0.3             CRF++ 0.42 (parallel) with -p=2
features                       14791014              14791014
Thread                         Default (1)           2
iteration                      155                   159
reading training data times    164.02 s              135.82 s
Memory usage                   2.82 G 2.85 G         3.04 G 3.21 G
CPU usage                      1 CPU                 2 CPU with 50-100% usage
Training time                  1838.89 s             1313.00 s
Testing result (uncertainty)   F MEASURE: 0.952      F MEASURE: 0.952
                               TOTAL NCHANGE: 6887   TOTAL NCHANGE: 6881

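Friday, November 17, 2006

Maximum Entropy Modeling Toolkit

bbME

My customized ME

  1. L-BFGS enabled
  2. 300 iterations by default
  3. File reading implemented with the .NET CLR library

Notes

Setting cutoff to 0 is the same as setting it to 1, because no event occurs 0 times.

Also, if the value is too large, say 2000, and no event in the training data occurs more than 2000 times, the program fails with an error message like this:

 IFLAG= -3
IMPROPER INPUT PARAMETERS (N OR M ARE NOT POSITIVE)

Usage

Train

Build a model file named model1 from train.txt with the number of iterations set to 30 (L-BFGS is enabled by default):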

MEMT.exe/maxent.exe train.txt -m model1 -i 30

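If you specify -b, the model is saved in binary form, which makes loading and saving faster than in text mode. The model format is detected automatically on loading, so you do not need -b then.

By default MEMT reads files with the mmap() system call; if reading fails, add --nommap to the arguments.

Specify -v to print verbose messages.

Predict

Run prediction and print the prediction accuracy: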

MEMT.exe/maxent -p  -m 

Run prediction and write the labels to the specified file:

MEMT.exe/maxent -p -m  -o  

Adding --detail prints a full description.

Other special uses

Run 10-fold cross-validation on train.txt and report the accuracy:

MEMT.exe/maxent.exe -n 10 train.txt
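
For readers unfamiliar with the procedure, the Python sketch below shows the general idea of k-fold cross-validation: split the events into k folds, hold out each fold in turn for testing, and train on the rest. It only illustrates the technique; how the toolkit itself partitions train.txt is not stated here, and the round-robin split and the name `k_fold_splits` are assumptions of this sketch.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) pairs; each item is in the test set exactly once."""
    folds = [items[i::k] for i in range(k)]   # round-robin assignment to k folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


events = list(range(25))                      # stand-ins for training events
for train, test in k_fold_splits(events, k=10):
    print(len(train), len(test))
```

The reported accuracy is then typically the average of the k per-fold accuracies.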

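How to compile

Compiling Maximum Entropy Modeling Toolkit for Python and C++

Before compiling, install Intel's Fortran Compiler (available for download from Intel). Once it is installed, the Visual Studio environment gains Fortran development support.

Using the integrated environment, compile src/lbfgs.f into a static library. Alternatively, open the installed Intel Fortran Compiler Environment and run on the command line: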

ifort  -c -O3 lbfgs.f

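This produces lbfgs.obj. Add that file directly to the project's file list (just drag it into the IDE).

Next, import all the C++ source code into a C++ project and fix the definitions of hash_map and hash_set (they fail to compile; changing their namespace to stdext is enough). Then add lib as an additional include directory (it contains the boost directory and other headers), and add Intel Fortran's lib directory as an additional library directory in the linker options. If the compiler cannot find Strings.h, define the HAVE_STRING_H preprocessor symbol.

Now build. If you run into errors such as "LIBCMTD.lib(crt0dat.obj) : error LNK2005: _exit already defined in MSVCRTD.lib(MSVCR71D.dll)", go to "C/C++ → Code Generation → Runtime Library" in the project options and change it to Multi-threaded Debug (/MTd), and under "Linker → Input" add LIBCMT.lib to the ignored specific libraries.

This is the resulting project file; see the build directory inside the archive. To enable L-BFGS support, also define the HAVE_FORTRAN preprocessor symbol.

CreateFile fails?

When compiling, I usually set .c files to compile as C code. That seems to make the call to CreateFile fail. If you hit this problem, switch that source file to "C/C++ → Advanced → Compile As C++ Code (/TP)".

Also, if you run into "cannot convert const char* to LPCWSTR", see the referenced article. Here is an example of the change: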

// add by hongjie for CA2W
#include <atlconv.h>

// ...
// modify by hongjie
//fh = CreateFile(file, access_mode, FILE_SHARE_READ, NULL,
fh = CreateFile(CA2W(file), access_mode, FILE_SHARE_READ, NULL,
OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL|FILE_FLAG_RANDOM_ACCESS,NULL);

Link Error: unresolved external symbol iob ?

Note that a library compiled this way is dynamically linked, so Intel Fortran must be installed on the target machine!

See the referenced article.

An excerpt:

Whatever your reason was for ignoring these libs, ignoring them is the reason you're getting linker errors. It looks like you're going to have to figure out why you are forced to ignore those libs, and correct it from there.

LNK2001s/LNK2019s are usually a result of mixing debug libs with release libs (or libs compiled with another version of Visual Studio). It can also be caused by static libraries not using the same runtime library (project properties -> c/c++ -> code generation -> runtime library -> should be /MDd for debug and /MD for release).

In this case, add the -MD option when compiling LBFGS, as follows:

ifort  -c -O3 /Qvc8  -MD lbfgs.f
/MD
use dynamically-loaded, multithread runtime
/MDs
use dynamically-loaded, single thread runtime

convert System::String to std::string

See the original article.

The method is as follows:

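#include <stdlib.h>   // wcstombs
#include <string>     // std::string
#include <vcclr.h>    // PtrToStringChars
using namespace System;
using std::string;

// Converts a managed String to a newly allocated char array.
// The caller owns target and must release it with delete[].
bool To_CharStar( String^ source, char*& target )
{
    pin_ptr<const wchar_t> wch = PtrToStringChars( source );
    int len = (( source->Length+1) * 2);
    target = new char[ len ];
    return wcstombs( target, wch, len ) != -1;
}

// Converts a managed String to a std::string.
bool To_string( String^ source, string &target )
{
    pin_ptr<const wchar_t> wch = PtrToStringChars( source );
    int len = (( source->Length+1) * 2);
    char *ch = new char[ len ];
    bool result = wcstombs( ch, wch, len ) != -1;
    if ( result )
        target = ch;   // copy only when the conversion succeeded
    delete[] ch;       // array form, to match new[]
    return result;
}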

convert std::string to System::String

System::String* std2gc(const std::string& s)
{
return new System::String(s.c_str());
}

Thursday, November 2, 2006

HOWTO Convert Char 2 LPCWSTR

Casting

char c[20]="abc";
LPCWSTR lc=(WCHAR *)c;

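But this loses information: the cast only reinterprets the char bytes as wide characters; it does not convert them.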
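
The effect of the cast can be reproduced outside C++. The Python sketch below is only an illustration: it takes the bytes of "abc" plus the terminating zero and reads them as little-endian UTF-16, which is roughly what code sees after a (WCHAR *) cast on Windows, and compares that with a real conversion.

```python
raw = b"abc\x00"                         # bytes of char c[] = "abc" with its terminator

# What a (WCHAR *) cast effectively does: every 2 bytes become one wide character.
reinterpreted = raw.decode("utf-16-le")

# What a real conversion does: every ANSI character becomes one wide character.
converted = raw[:-1].decode("ascii")

print(repr(reinterpreted))               # two characters, the first one is not 'a'
print(repr(converted))
```

The bytes 'a' and 'b' are fused into a single unrelated character, which is why a conversion routine is needed instead of a cast.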

Using ATL macros

See the referenced page.

Here are some examples:

#include <atlconv.h>
LPCWSTR lc = CA2W(c);

HANDLE fh = CreateFile(CA2W(file), access_mode, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,NULL);

USES_CONVERSION;
LPCWSTR lc = A2W(c);

LPCWSTR pw = T2W("Hello,world!"); // tchar -> wchar

LPCTSTR pt = W2T(L"Hello,world!"); // wchar -> tchar

Other references

Converting char* to CString

To convert a char* to a CString, besides direct assignment you can also use CString::Format. For example:

char chArray[] = "This is a test";
char * p = "This is a test";

LPSTR p = "This is a test";

or, in a build where Unicode is defined:

TCHAR * p = _T("This is a test");

LPTSTR p = _T("This is a test");
CString theString = chArray;
theString.Format(_T("%s"), chArray);
theString = p;

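Converting CString to char*

To convert a CString to the char* (LPSTR) type, the following three methods are commonly used:

Using a cast

For example:

CString theString( "This is a test" );
LPTSTR lpsz =(LPTSTR)(LPCTSTR)theString;

Note that the second argument of strcpy (or the Unicode/MBCS-portable _tcscpy) is const wchar_t* (Unicode) or const char* (ANSI); the compiler converts it automatically.

Using CString::GetBuffer

For example:

CString s(_T("This is a test "));
LPTSTR p = s.GetBuffer();

// add the code that uses p here
if(p != NULL) *p = _T('\0');
s.ReleaseBuffer();
// release the buffer promptly after use so that other CString member functions can be used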

Converting BSTR to char*

Using ConvertBSTRToString

For example:

#include <comutil.h>
#pragma comment(lib, "comsupp.lib")
int _tmain(int argc, _TCHAR* argv[])
{
BSTR bstrText = ::SysAllocString(L"Test");
char* lpszText2 = _com_util::ConvertBSTRToString(bstrText);
SysFreeString(bstrText); // free after use
delete[] lpszText2;
return 0;
}

Using the assignment operator overloads of _bstr_t

For example:

_bstr_t b = bstrText;
char* lpszText2 = b;

Converting char* to BSTR

Using API functions such as SysAllocString

For example:

BSTR bstrText = ::SysAllocString(L"Test");
BSTR bstrText = ::SysAllocStringLen(L"Test",4);
BSTR bstrText = ::SysAllocStringByteLen("Test",4);

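Using COleVariant or _variant_t

For example:

//COleVariant strVar("This is a test");
_variant_t strVar("This is a test");
BSTR bstrText = strVar.bstrVal;

Using _bstr_t

This is the simplest method. For example:

// note: the temporary _bstr_t frees its BSTR at the end of this statement;
// keep a named _bstr_t if the BSTR must stay valid afterwards
BSTR bstrText = _bstr_t("This is a test");

Using CComBSTR

For example:

// note: the same lifetime caveat applies to the temporary CComBSTR
BSTR bstrText = CComBSTR("This is a test");

CComBSTR bstr("This is a test");
BSTR bstrText = bstr.m_str;

Using ConvertStringToBSTR

For example: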

char* lpszText = "Test";
BSTR bstrText = _com_util::ConvertStringToBSTR(lpszText);

Converting CString to BSTR

This is usually done with CStringT::AllocSysString. For example:

CString str("This is a test");
BSTR bstrText = str.AllocSysString();
// …
SysFreeString(bstrText); // free after use

Converting BSTR to CString

This can generally be done as follows:

BSTR bstrText = ::SysAllocString(L"Test");
CStringA str;
str.Empty();
str = bstrText;

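Converting between ANSI, Unicode, and wide characters

You can use MultiByteToWideChar to convert ANSI characters to Unicode characters, or WideCharToMultiByte to convert Unicode characters to ANSI characters.

For string literals, you can also use "_T" to write a literal as a "generic" type string and "L" to write it as a Unicode string, and in a managed C++ environment you can use S to turn a literal into a String* object. For example: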

TCHAR tstr[] = _T("this is a test");
wchar_t wszStr[] = L"This is a test";
String* str = S"This is a test";

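It is even more convenient to use the conversion classes of ATL 7.0. ATL 7.0 refines the ATL 3.0 string conversion macros, adds many new ones, and provides corresponding classes, whose names take the form C<SourceType>2[C]<DestinationType>[EX].

Here the first C stands for "class", to distinguish them from the ATL 3.0 macros; the second C means constant; 2 means "to"; and EX means that a buffer of a given size is allocated. SourceType and DestinationType can be A, T, W, or OLE, meaning ANSI, "generic" type, Unicode, and OLE strings respectively. For example, CA2CT converts ANSI to a constant string of the generic type. Here are some examples: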

LPTSTR tstr= CA2TEX<16>("this is a test");
LPCTSTR tcstr= CA2CT("this is a test");
wchar_t wszStr[] = L"This is a test";
char* chstr = CW2A(wszStr);

References

Monday, April 10, 2006

Flex CRFs v.s. CRF++

Comparison table

                CRF++                                            FlexCRFs
Configuration   * Freq = 1                                       * f_rare_threshold = 1
                * C(sigma^2) = 1.00000                           * sigma_square = 1
                * Number of features: UC (Unigram for current    * Number of features: 1851
                  word)/U+BC (Unigram+Bigram for current word)   * Iteration = 128
                  = 7504/11280
                * Iteration = 40/43
Result          * UC-recall / precision / f-score                * recall / precision / f-score
                  = 0.6656 / 0.6535 / 0.6595                       = 0.6332 / 0.6362 / 0.6347
                * U+BC-recall / precision / f-score
                  = 0.6705 / 0.6582 / 0.6643

FlexCRFs V.S. CRF++

                                             CRF++                      FlexCRFs
                                                                        No edge features           With edge features
Feature numbers                              30,986                     34,848                     34,861
Training time                                690.10 seconds             20,370.0 seconds
Num. of iterations                           88                         46
Performance (recall / precision / f-score)   0.6948 / 0.7002 / 0.6975   0.5659 / 0.6033 / 0.5840   0.6642 / 0.6515 / 0.6578

Based on 86918 sentences, C(sigma^2) = 1.0

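Tuesday, March 14, 2006

Word Calculator

Change log

  • 2006/03/13: Read a file-name list and compute the f-score differences among the files listed in it.
  • 2006/03/10: Count the verbs in GTB and PB.
  • 2005/11/15: Added the irregular-inflection dictionary; results
  • 2005/11/11: Results

File-name list format

#Base
The file name on the next line is the baseline
#Lists
Each following line is a file name to compare

By default, the first #Base / #Lists pair is taken to be PB, and the following #Base / #Lists pair is taken to be GTB. Here is a complete example: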

#Base
pb-pasbio_:0:1:_test.svm.Result.Eval
#Lists
pb-pasbio_:0:1:_test.svm.*1*9*.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.13.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.15.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.16.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.*20*24*.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.*27*31*.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.*35*37*.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.41.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.45.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.51.svm.Result.Eval
#Base
gtb-supervised_:0:1:.svm.Result.Eval
#Lists
gtb-supervised_:0:1:.svm.*1*9*.svm.Result.Eval
gtb-supervised_:0:1:.svm.13.svm.Result.Eval
gtb-supervised_:0:1:.svm.15.svm.Result.Eval
gtb-supervised_:0:1:.svm.16.svm.Result.Eval
gtb-supervised_:0:1:.svm.*20*24*.svm.Result.Eval
gtb-supervised_:0:1:.svm.*27*31*.svm.Result.Eval
gtb-supervised_:0:1:.svm.*35*37*.svm.Result.Eval
gtb-supervised_:0:1:.svm.41.svm.Result.Eval
gtb-supervised_:0:1:.svm.45.svm.Result.Eval
gtb-supervised_:0:1:.svm.51.svm.Result.Eval

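In addition, the special characters ":" and "*" in the list file stand for the experiment sample numbers (samples 0 to 1 in the example above) and the feature numbers (1~9, 13, 15, 16, 20~24, 27~31, 35~37, 41, 45, and 51), respectively.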
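
The list format is simple enough to read with a few lines of code. The Python sketch below is not the Word Calculator's own (Java) implementation; the function name `parse_file_list` and the returned structure are made up for illustration. It only groups the lines under each #Base / #Lists pair and leaves the ":" and "*" characters untouched.

```python
def parse_file_list(lines):
    """Group a file-name list into [{'base': name, 'lists': [names...]}, ...]."""
    groups = []
    mode = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "#Base":
            groups.append({"base": None, "lists": []})   # a new group starts here
            mode = "base"
        elif line == "#Lists":
            if not groups:                               # tolerate a missing #Base
                groups.append({"base": None, "lists": []})
            mode = "lists"
        elif mode == "base":
            groups[-1]["base"] = line
        elif mode == "lists":
            groups[-1]["lists"].append(line)
    return groups


sample = """#Base
pb-pasbio_:0:1:_test.svm.Result.Eval
#Lists
pb-pasbio_:0:1:_test.svm.*1*9*.svm.Result.Eval
pb-pasbio_:0:1:_test.svm.13.svm.Result.Eval
#Base
gtb-supervised_:0:1:.svm.Result.Eval
#Lists
gtb-supervised_:0:1:.svm.*1*9*.svm.Result.Eval
""".splitlines()

groups = parse_file_list(sample)
print(groups[0]["base"], len(groups[0]["lists"]))   # first pair: PB
print(groups[1]["base"], len(groups[1]["lists"]))   # second pair: GTB
```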

Feature numbers

##### Feature: Predicate
1
##### Feature: Path 
2
##### Feature: Phrase Type
3
##### Feature: Position (Left / Right / ContainJust / Contain)
4
##### Feature: Voice
5
##### Feature: Head Word
6
##### Feature: Sub-categorization
7
##### Feature: Head POS
8
##### Feature: First Word
9
##### Feature: First Word's POS
10
##### Feature: Last Word
11
##### Feature: Last Word's POS
12
##### Feature: Contain First Word
13
##### Feature: Contain Last Word
14
##### Feature: Combination: Distance(predicate and phrase) and Predicate
15
##### Feature: Syntactic Frame of predicate/NP
16 17 18 19
##### Feature: PP Parent's Headword
20
##### Feature: Combination: Voice and Position
21
##### Feature: Combination: Predicate and Phrase Type
22
##### Feature: Combination: Predicate and Headword
23
##### Feature: Named Entity

##### Feature: Headword suffixes of length 2, 3, 4
24 25 26
##### Feature: Number of words in the phrase
27
##### Feature: Chunk distance: All/VP

##### Feature: Predicate's Verb Class
28
##### Feature: Predicate POS tag
29
##### Feature: Predicate Frequency: (f > 3) ? frequent : rare
30
##### Feature: Predicate's Context: BP (Chunk)

##### Feature: Predicate's Context: POS
31 32 33 34
##### Feature: Number of predicates
35
##### Feature: Level 
36
##### Feature: Context: Words +-2
37 38 39 40
##### Feature: Context: Context POS +-2
41 42 43 44
##### Feature: Context: Chunk Type

##### Feature: Parent's Features 
45 46 47 48 49 50
##### Feature: Siblings' Features
L : 51 52 53 54 55 56
R : 57 58 59 60 61 62

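Requirements

Sum up the counts of the verbs and nouns (in all their forms) in the files given on the command line.

The main purpose of this program is to recognize the different forms of English words automatically, treat words that have the same meaning but different forms (e.g. abolish, abolished, abolishes, abolishing) as the same word, and then list their combined totals.

Usage examples

After GTB and PB support was added:

java -jar wordsummer.jar -h

The command above prints the usage message: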

usage:
   java -jar tools.jar [option] [verb list filename] filename
available option:
   -Tfre:
           total frequence.
   -Vfre:
           verb frequence.
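-Tfre
Counts the occurrences (frequency) of the verbs found in all sentences
-Vfre
Requires the verb list filename argument (a list file containing verbs in their inflected and base forms); verb occurrences (frequency) are counted according to the verbs in that list file

Both options produce a result file named <original filename>.fre, which contains the number of occurrences of each verb.

Without any option, the results are produced from the given verb .fre file.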


Ant task

  • WordSummer
  • WordSummerTest
  • build
  • clean
  • help
  • init
java -jar wordsummer.jar [inputfile]

Or run the JUnit test program:

java -jar wordsummer.jar WordSummerTest

Format of the input file (a word and its count separated by a tab: word<TAB>count):

-competed 1
-expressing 1
-phosphorylated 1
-ribosylates 1
/activating 1
a 1
abbreviated 1
ablating 2
abolish 10
abolished 54
abolishes 16
abolishing 1

The output is:

-competed,1,-competed,1

-expressing,1,-expressing,1

-phosphorylated,1,-phosphorylated,1

-ribosylates,1,-ribosylates,1

/activating,1,/activating,1

a,1,a,1

abbreviated,1,abbreviated,1

ablating,2,ablating,2

abolish,81,abolish,10
abolish,81,abolished,54
abolish,81,abolishes,16
abolish,81,abolishing,1

That is, the result after WordSummer processing has the format:

word,total count,original word,count of the original word
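
The grouping behind this output format can be sketched in a few lines of Python. This is not the actual WordSummer code (which is written in Java and applies grammar rules): the function names are made up, and the `lemma_of` mapping is a hand-written stand-in for whatever the grammar rules and the irregular dictionary would produce.

```python
from collections import defaultdict


def parse_counts(lines):
    """Read 'word<whitespace>count' lines into a {word: count} dictionary."""
    counts = {}
    for line in lines:
        if line.strip():
            word, number = line.rsplit(None, 1)
            counts[word] = int(number)
    return counts


def summarize(counts, lemma_of):
    """Return 'word,total,original word,original count' rows, grouped by base form."""
    groups = defaultdict(list)
    for word, number in counts.items():
        groups[lemma_of.get(word, word)].append((word, number))
    rows = []
    for lemma in sorted(groups):
        total = sum(number for _, number in groups[lemma])
        for word, number in sorted(groups[lemma]):
            rows.append(f"{lemma},{total},{word},{number}")
    return rows


counts = parse_counts(["ablating 2", "abolish 10", "abolished 54",
                       "abolishes 16", "abolishing 1"])
# Hypothetical mapping from inflected form to base form
lemma_of = {"abolished": "abolish", "abolishes": "abolish", "abolishing": "abolish"}

for row in summarize(counts, lemma_of):
    print(row)
```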

The input data does not have to be sorted, for example:

activated 620
induced 589
expressed 503
has 492
been 478
found 422
suggest 391
increased 372
show 367
required 348
involved 330
inhibited 329
demonstrate 309
identified 308
shown 295
showed 290
binding 290
using 289
observed 276
associated 270
demonstrated 270
did 264
mediated 255
induce 233
including 228
containing 227

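About the program

Because I was in a hurry to hand it to Jacky for document analysis, the overall program structure is a bit messy, with a lot of duplication and some parts not fully implemented. Still, the results it produces are already acceptable, and it saves a lot of time!

Basically, the implementation follows the rules of English grammar. Roughly speaking, every word read in is modeled as an idv.hjd.util.Vocabulary object, and when the grammar rules determine that several words are inflected forms of one another, they are all put into an idv.hjd.util.VocabularyList object.

The program was developed mainly in Eclipse, and an Ant build file is also provided.

Irregular-inflection dictionary

The dictionary file in use is listed below: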

abide,abides,abiding,abided,abided
aby,abyes,abys,abying,abought,abought
aerify,aerifies,aerifying,aerified,aerified
air-dry,air-dries,air-drying,air-dried,air-dried
anglicise,anglicises,anglicising,anglicised,anglicised
anglicize,anglicizes,anglicizing,anglicized,anglicized
anglify,anglifies,anglifying,anglified,anglified
ante,antes,anteing,anted,anteed,anted,anteed
arc,arcs,arcing,arcking,arced,arcked,arced,arcked
argufy,argufies,argufying,argufied,argufied
arise,arises,arising,arisen,arose
awake,awakes,awaking,awoken,awaked
baa,baas,baaing,baaed,baaed
baby-sit,baby-sits,baby-sitting,baby-sat,baby-sat
backbite,backbites,backbiting,backbitten,backbit
back-pedal,back-pedals,back-pedaling,back-pedalling,back-pedaled,back-pedalled,back-pedaled,back-pedalled
backslide,backslides,backsliding,backslidden,backslid
ballyhoo,ballyhoos,ballyhooing,ballyhooed,ballyhooed
ballyrag,ballyrags,ballyragging,ballyragged,ballyragged
bar,bars,barring,barred,barred
barrel,barrels,barreling,barrelling,barreled,barrelled,barreled,barrelled
basify,basifies,basifying,basified,basified
bayonet,bayonets,bayoneting,bayonetting,bayoneted,bayonetted,bayoneted,bayonetted
bear,bears,bearing,borne,bore
beat,beats,beating,beaten,beat
become,becomes,becoming,become,became
bedevil,bedevils,bedeviling,bedevilling,bedeviled,bedevilled,bedeviled,bedevilled
bedim,bedims,bedimming,bedimmed,bedimmed
befall,befalls,befalling,befallen,befell
befog,befogs,befogging,befogged,befogged
beget,begets,begetting,begotten,begat,begot
begin,begins,beginning,begun,began
behold,beholds,beholding,beholden,beheld
bejewel,bejewels,bejeweling,bejewelling,bejeweled,bejewelled,bejeweled,bejewelled
belly-flop,belly-flops,belly-flopping,belly-flopped,belly-flopped
bename,benames,benaming,benamed,benamed
bend,bends,bending,bent,bent
beseech,beseeches,beseeching,beseeched,besought,beseeched,besought
beset,besets,besetting,beset,beset
bespeak,bespeaks,bespeaking,bespoken,bespoke
bespread,bespreads,bespreading,bespread,bespread
bestead,besteads,besteading,besteaded,besteaded
bestrew,bestrews,bestrewing,bestrewed,bestrewn,bestrewed
bestride,bestrides,bestriding,bestridden,bestrode
bet,bets,betting,bet,betted,bet,betted
betake,betakes,betaking,betaken,betook
bethink,bethinks,bethinking,bethought,bethought
bevel,bevels,beveling,bevelling,beveled,bevelled,beveled,bevelled
bias,biases,biasing,biassing,biased,biassed,biased,biassed
bid,bids,bidding,bid,bidden,bade,bid
bind,binds,binding,bound,bound
bird-dog,bird-dogs,bird-dogging,bird-dogged,bird-dogged
birdie,birdies,birdieing,birdied,birdied
bit,bits,bitting,bitted,bitted
bite,bites,biting,bitten,bit
blackberry,blackberries,blackberrying,blackberried,blackberried
blackleg,blacklegs,blacklegging,blacklegged,blacklegged
blat,blats,blatting,blatted,blatted
bleed,bleeds,bleeding,bled,bled
bless,blesses,blessing,blessed,blest,blessed,blest
blob,blobs,blobbing,blobbed,blobbed
blot,blots,blotting,blotted,blotted
blow,blows,blowing,blown,blew
blub,blubs,blubbing,blubbed,blubbed
blue,blues,blueing,bluing,blued,blued
blue-pencil,blue-pencills,blue-pencils,blue-penciling,blue-pencilling,blue-penciled,blue-pencilled,blue-penciled,blue-pencilled
blur,blurs,blurring,blurred,blurred
bob,bobs,bobbing,bobbed,bobbed
bog-down,bogs-down,bogging-down,bogged-down,bogged-down
boo,boos,booing,booed,booed
boogie,boogies,boogieing,boogied,boogied
boohoo,boohoos,boohooing,boohooed,boohooed
bottle-feed,bottle-feeds,bottle-feeding,bottle-fed,bottle-fed
break,breaks,breaking,broken,broke
breast-feed,breast-feeds,breast-feeding,breast-fed,breast-fed
breed,breeds,breeding,bred,bred
brei,breis,breiing,breid,breid
brevet,brevets,breveting,brevetting,breveted,brevetted,breveted,brevetted
bring,brings,bringing,brought,brought
broadcast,broadcasts,broadcasting,broadcast,broadcasted,broadcast,broadcasted
browbeat,browbeats,browbeating,browbeaten,browbeat
brutify,brutifies,brutifying,brutified,brutified
buckram,buckrams,buckraming,buckramed,buckramed
build,builds,building,built,built
bullwhip,bullwhips,bullwhipping,bullwhipped,bullwhipped
bullyrag,bullyrags,bullyragging,bullyragged,bullyragged
bunco,buncos,buncoing,buncoed,buncoed
bunko,bunkos,bunkoing,bunkoed,bunkoed
bur,burs,burring,burred,burred
burn,burns,burning,burned,burnt,burned,burnt
burst,bursts,bursting,burst,burst
bus,buses,busses,busing,bussing,bused,bussed,bused,bussed
bushel,bushels,busheling,bushelling,busheled,bushelled,busheled,bushelled
buss,busses,bussing,bussed,bussed
bust,busts,busting,bust,busted,bust,busted
buy,buys,buying,bought,bought
calque,calques,calquing,calqued,calqued
can,could
canal,canals,canaling,canalling,canaled,canalled,canaled,canalled
cancel,cancels,canceling,cancelling,canceled,cancelled,canceled,cancelled
canopy,canopies,canopying,canopied,canopied
carbonado,carbonados,carbonadoing,carbonadoed,carbonadoed
carburet,carburets,carbureting,carburetting,carbureted,carburetted,carbureted,carburetted
carol,carols,caroling,carolling,caroled,carolled,caroled,carolled
cast,casts,casting,cast,cast
catch,catches,catching,caught,caught
catnap,catnaps,catnapping,catnapped,catnapped
cavil,cavils,caviling,cavilling,caviled,cavilled,caviled,cavilled
cbel,cbels,cbeling,cbelling,cbeled,cbelled,cbeled,cbelled
channel,channels,channeling,channelling,channeled,channelled,channeled,channelled
char,chars,charring,charred,charred
chasse,chasses,chasseing,chassed,chassed
chide,chides,chiding,chid,chidden,chid,chided
chisel,chisels,chiseling,chiselling,chiseled,chiselled,chiseled,chiselled
chivy,chevies,chivies,chivvies,chevying,chivvying,chivying,chevied,chivied,chivvied,chevied,chivied,chivvied
chondrify,chondrifies,chondrifying,chondrified,chondrified
choose,chooses,choosing,chosen,chose
chop,chops,chopping,chopped,chopped
citify,citifies,citifying,citified,citified
clad,clads,cladding,clad,clad
cleave,cleaves,cleaving,cleaved,cleft,clove,cloven,cleaved,cleft
clepe,clepes,cleping,cleped,clept,yclept,cleped,clept,yclept
cling,clings,clinging,clung,clung
clog,clogs,clogging,clogged,clogged
clop,clops,clopping,clopped,clopped
clot,clots,clotting,clotted,clotted
cockneyfy,cockneyfies,cockneyfying,cockneyfied,cockneyfied
cod,cods,codding,codded,codded
cog,cogs,cogging,cogged,cogged
coif,coifs,coiffing,coiffed,coiffed
collogue,collogues,colloguing,collogued,collogued
colly,collies,collying,collied,collied
combat,combats,combating,combatting,combated,combatted,combated,combatted
come,comes,coming,come,came
complot,complots,complotting,complotted,complotted
con,cons,conning,conned,conned
coo,coos,cooing,cooed,cooed
cop,cops,copping,copped,copped
coquet,coquets,coquetting,coquetted,coquetted
cost,costs,costing,cost,cost
co-star,co-stars,co-starring,co-starred,co-starred
counsel,counsels,counseling,counselling,counseled,counselled,counseled,counselled
counterplot,counterplots,counterplotting,counterplotted,counterplotted
countersink,countersinks,countersinking,countersunk,countersank
creep,creeps,creeping,crept,crept
crossbreed,crossbreeds,crossbreeding,crossbred,crossbred
crosscut,crosscuts,crosscutting,crosscut,crosscut
cuckoo,cuckoos,cuckooing,cuckooed,cuckooed
cudgel,cudgels,cudgeling,cudgelling,cudgeled,cudgelled,cudgeled,cudgelled
cupel,cupels,cupeling,cupelling,cupeled,cupelled,cupeled,cupelled
curet,curets,curettes,curetting,curetted,curetted
curry,curries,currying,curried,curried
curse,curses,cursing,crust,cursed,crust,cursed
curvet,curvets,curveting,curvetting,curveted,curvetted,curveted,curvetted
cut,cuts,cutting,cut,cut
dag,dags,dagging,dagged,dagged
damnify,damnifies,damnifying,damnified,damnified
dandify,dandifies,dandifying,dandified,dandified
dap,daps,dapping,dapped,dapped
daresay,daresays,daresaying,daresaid,daresaid
deal,deals,dealing,dealt,dealt,delt
debar,debars,debarring,debarred,debarred
debus,debuses,debusses,debusing,debussing,debused,debussed,debused,debussed
decalcify,decalcifies,decalcifying,decalcified,decalcified
decontrol,decontrols,decontrolling,decontrolled,decontrolled
de-emphasize,de-emphasizes,de-emphasizing,de-emphasized,de-emphasized
deepfreeze,deep-freezes,deep-freezing,deep-freezed,deep-freezed
deep-fry,deep-fries,deep-frying,deep-fried,deep-fried
degas,degases,degasses,degassing,degassed,degassed
dehumidify,dehumidifies,dehumidifying,dehumidified,dehumidified
dele,deles,deleing,deled,deled
demit,demits,demitting,demitted,demitted
demulsify,demulsifies,demulsifying,demulsified,demulsified
denazify,denazifies,denazifying,denazified,denazified
denitrify,denitrifies,denitrifying,denitrified,denitrified
detoxify,detoxifies,detoxifying,detoxified,detoxified
devaluate,devaluates,devaluating,devaluated,devaluated
devil,devils,deviling,devilling,deviled,devilled,deviled,devilled
devitrify,devitrifies,devitrifying,devitrified,devitrified
diagram,diagrams,diagraming,diagramming,diagramed,diagrammed,diagramed,diagrammed
dial,dials,dialing,dialling,dialed,dialled,dialed,dialled
dib,dibs,dibbing,dibbed,dibbed
die,dies,dying,died,died
dig,digs,digging,dug,dug
dight,dights,dighting,dight,dighted,dight,dighted
dilly-dally,dilly-dallies,dilly-dallying,dilly-dallied,dilly-dallied
disannul,disannuls,disannulling,disannulled,disannulled
disbud,disbuds,disbudding,disbudded,disbudded
disembody,disembodies,disembodying,disembodied,disembodied
disembogue,disembogues,disemboguing,disembogued,disembogued
disembowel,disembowels,disemboweling,disembowelling,disemboweled,disembowelled,disemboweled,disembowelled
disenthral,disenthrall,disenthralls,disenthralling,disenthralled,disenthralled
dishevel,dishevels,disheveling,dishevelling,disheveled,dishevelled,disheveled,dishevelled
disinter,disinters,disinterring,disinterred,disinterred
ditto,dittoes,dittos,dittoing,dittoed,dittoed
dive,dives,diving,dived,dived,dove
do,does,doing,done,did
dog,dogs,dogging,dogged,dogged
don,dons,donning,donned,donned
dot,dots,dotting,dotted,dotted
double-tongue,double-tongues,double-tonguing,double-tongued,double-tongued
draw,draws,drawing,draws,drew
dream,dreams,dreaming,dreamed,dreamt,dreamed,dreamt
dree,drees,dreeing,dreed,dreed
drink,drinks,drinking,drunk,drank
drive,drives,driving,driven,drove
drivel,drivels,driveling,drivelling,driveled,drivelled,driveled,drivelled
drop,drops,dropping,dropped,dropped
duel,duels,dueling,duelling,dueled,duelled,dueled,duelled
dulcify,dulcifies,dulcifying,dulcified,dulcified
dup,dups,dupping,dupped,dupped
dwell,dwells,dwelling,dwelled,dwelt,dwelled,dwelt
dye,dyes,dyeing,dyed,dyed
eat,eats,eating,eaten,ate
embus,embuses,embusses,embusing,embussing,embused,embussed,embused,embussed
emcee,emcees,emceeing,emceed,emceed
empanel,empanels,empaneling,empanelling,empaneled,empanelled,empaneled,empanelled
enamel,enamels,enameling,enamelling,enameled,enamelled,enameled,enamelled
endue,endues,enduing,endued,endued
englut,engluts,englutting,englutted,englutted
ensue,ensues,ensuing,ensued,ensued
entrammel,entrammels,entrammelling,entrammelled,entrammelled
enwind,enwinds,enwinding,enwound,enwound
enwrap,enwraps,enwrapping,enwrapped,enwrapped
equal,equals,equaling,equalling,equaled,equalled,equaled,equalled
esterify,esterifies,esterifying,esterified,esterified
estop,estops,estopping,estopped,estopped
etherify,etherifies,etherifying,etherified,etherified
eye,eyes,eyeing,eying,eyed,eyed
facet,facets,faceting,facetting,faceted,facetted,faceted,facetted
fall,falls,falling,fallen,fell
featherbed,featherbeds,featherbedding,featherbedded,featherbedded
feed,feeds,feeding,fed,fed
feel,feels,feeling,felt,felt
fight,fights,fighting,fought,fought
filagree,filagrees,filagreeing,filagreed,filagreed
fill up,filled up
fin,fins,finning,finned,finned
find,finds,finding,found,found
fine-draw,fine-draws,fine-drawing,fine-drawn,fine-drew
fit,fits,fitting,fit,fitted,fit,fitted
flam,flams,flamming,flammed,flammed
flannel,flannels,flanneling,flannelling,flanneled,flannelled,flanneled,flannelled
flee,flees,fleeing,fled,fled
flimflam,flimflams,flimflamming,flimflammed,flimflammed
fling,flings,flinging,flung,flung
flip-flop,flip-flops,flip-flopping,flip-flopped,flip-flopped
floodlight,floodlights,floodlighting,floodlit,floodlit
flub,flubs,flubbing,flubbed,flubbed
flurry,flurries,flurrying,flurried,flurried
fly,flies,flying,flown,flew
flyblow,flyblows,flyblowing,flyblown,flyblew
fog,fogs,fogging,fogged,fogged
forbear,forbears,forbearing,forborne,forbore
forbid,forbad,forbids,forbidding,forbidden,forbad,forbade
force-feed,force-feeds,force-feeding,force-fed,force-fed
fordo,fordoes,fordoing,fordone,fordid
forecast,forecasts,forecasting,forecast,forecasted,forecast,forecasted
foredo,foredoes,foredoing,foredone,foredid
forego,foregoes,foregoing,foregone,forewent
foreknow,foreknows,foreknowing,foreknown,foreknew
forerun,foreruns,forerunning,forerun,foreran
foresee,foresees,foreseeing,foreseen,foresaw
forespeak,forespeaks,forespeaking,forespoken,forespoke
foretell,foretells,foretelling,foretold,foretold
forget,forgets,forgetting,forgotten,forgot
forgive,forgives,forgiving,forgiven,forgave
forgo,forgoes,forgoing,forgone,forwent
format,formats,formatting,formatted,formatted
forsake,forsakes,forsaking,forsaken,forsook
forspeak,forspeaks,forspeaking,forspoken,forspoke
forswear,forswears,forswearing,forsworn,forswore
frap,fraps,frapping,frapped,frapped
free,frees,freeing,freed,freed
freeze,freezes,freezing,frozen,froze
freeze-dry,freeze-dries,freeze-drying,freeze-dried,freeze-dried
frenchify,frenchifies,frenchifying,frenchified,frenchified
frig,frigs,frigging,frigged,frigged
frit,frits,fritting,fritted,fritted
frivol,frivols,frivoling,frivolling,frivoled,frivolled,frivoled,frivolled
fuel,fuels,fueling,fuelling,fueled,fuelled,fueled,fuelled
funnel,funnels,funneling,funnelling,funneled,funnelled,funneled,funnelled
fur,furs,furring,furred,furred
gad,gads,gadding,gadded,gadded
gainsay,gainsays,gainsaying,gainsaid,gainsaid
gam,gams,gamming,gammed,gammed
gambol,gambols,gamboling,gambolling,gamboled,gambolled,gamboled,gambolled
gan,gans,ganning,ganned,ganned
garnishee,garnishees,garnisheeing,garnisheed,garnisheed
gas,gases,gasses,gassing,gassed,gassed
geld,gelds,gelding,gelded,gelt,gelded,gelt
gen-up,gens-up,genning-up,genned-up,genned-up
get,gets,getting,got,gotten,got
get lost,gets lost,getting lost,gotten lost,got lost
get started,gets started,getting started,got started,got started
ghostwrite,ghostwrites,ghostwriting,ghostwritten,ghostwrote
gib,gibs,gibbing,gibbed,gibbed
giftwrap,giftwraps,giftwrapping,giftwrapped,giftwrapped
gild,gilds,gilding,gilded,gilt,gilded,gilt
gip,gips,gipping,gipped,gipped
gird,girds,girding,girded,girt,girded,girt
give,gives,giving,given,gave
glom,gloms,glomming,glommed,glommed
glue,glues,gluing,glued,glued
gnaw,gnaws,gnawing,gnawed,gnawn,gnawed
go,goes,going,gone,went
go deep,goes deep,going deep,gone deep,went deep
goose-step,goose-steps,goose-stepping,goose-stepped,goose-stepped
grab,grabs,grabbing,grabbed,grabbed
grave,graves,graving,graven,graved
gravel,gravels,graveling,gravelling,graveled,gravelled,graveled,gravelled
gree,grees,greeing,greed,greed
grind,grinds,grinding,ground,ground
grovel,grovels,groveling,grovelling,groveled,grovelled,groveled,grovelled
grow,grows,growing,grown,grew
gumshoe,gumshoes,gumshoeing,gumshoed,gumshoed
hallo,hallos,halloing,halloed,halloed
halloo,halloos,hallooing,hallooed,hallooed
halo,haloes,halos,haloing,haloed,haloed
hamstring,hamstrings,hamstringing,hamstringed,hamstrung,hamstringed,hamstrung
handfeed,handfeeds,handfeeding,handfed,handfed
handicap,handicaps,handicapping,handicapped,handicapped
hand-knit,hand-knits,hand-knitting,hand-knitted,hand-knitted
handsel,handsels,handselling,handselled,handselled
hang,hangs,hanging,hanged,hung,hanged,hung
hansel,hansels,hanseling,hanseled,hanseled
have,has,having,had,had
hatchel,hatchels,hatcheling,hatchelling,hatcheled,hatchelled,hatcheled,hatchelled
hear,hears,hearing,heard,heard
heave,heaves,heaving,heaved,hove,heaved,hove
hiccup,hiccups,hiccuping,hiccupping,hiccuped,hiccupped,hiccuped,hiccupped
hide,hides,hiding,hid,hidden,hid,hided
hie,hies,hieing,hying,hied,hied
high-hat,high-hats,high-hatting,high-hatted,high-hatted
hinny,hinnies,hinnying,hinnied,hinnied
hit,hits,hitting,hit,hit
hocus,hocuses,hocusing,hocussing,hocused,hocussed,hocused,hocussed
hocus-pocus,hocus-pocuses,hocus-pocusing,hocus-pocussing,hocus-pocused,hocus-pocussed,hocus-pocused,hocus-pocussed
hoe,hoes,hoeing,hoed,hoed
hold,holds,holding,held,held
honey,honeys,honeying,honeyed,honied,honeyed,honied
hop,hops,hopping,hopped,hopped
housel,housels,houseling,houselling,houseled,houselled,houseled,houselled
hovel,hovels,hoveling,hovelling,hoveled,hovelled,hoveled,hovelled
hurry,hurries,hurrying,hurried,hurried
hurt,hurts,hurting,hurt,hurt
hypertrophy,hypertrophies,hypertrophying,hypertrophied,hypertrophied
imbrue,imbrues,imbruing,imbrued,imbrued
imbue,imbues,imbuing,imbued,imbued
impanel,impanells,impanels,impaneling,impanelling,impaneled,impanelled,impaneled,impanelled
inbreed,inbreeds,inbreeding,inbred,inbred
indue,indues,induing,indued,indued
indwell,indwells,indwelling,indwelt,indwelt
initial,initials,initialing,initialling,initialed,initialled,initialed,initialled
inlay,inlays,inlaying,inlaid,inlaid
inlet,inlets,inletting,inlet,inlet
input,inputs,inputting,input,inputted,input,inputted
inset,insets,insetting,inset,insetted,inset,insetted
inspan,inspans,inspanning,inspanned,inspanned
instal,install,instals,installing,installed,installed
interbreed,interbreeds,interbreeding,interbred,interbred
intercrop,intercrops,intercropping,intercropped,intercropped
intercut,intercuts,intercutting,intercut,intercut
interlay,interlays,interlaying,interlaid,interlaid
intermit,intermits,intermitting,intermitted,intermitted
interplead,interpleads,interpleading,interpleaded,interpled,interpleaded,interpled
interstratify,interstratifies,interstratifying,interstratified,interstratified
interweave,interweaves,interweaving,interwoven,interweaved,interwove
intromit,intromits,intromitting,intromitted,intromitted
inweave,inweaves,inweaving,inwoven,inweaved
inwrap,inwraps,inwrapping,inwrapped,inwrapped
issue,issues,issuing,issued,issued
jar,jars,jarring,jarred,jarred
jell,jells,jelling,jelled,jelled
jellify,jellifies,jellifying,jellified,jellified
jerry-build,jerry-builds,jerry-building,jerry-built,jerry-built
jewel,jewels,jeweling,jewelling,jeweled,jewelled,jeweled,jewelled
jitterbug,jitterbugs,jitterbugging,jitterbugged,jitterbugged
job,jobs,jobbing,jobbed,jobbed
jog,jogs,jogging,jogged,jogged
jollify,jollifies,jollifying,jollified,jollified
jot,jots,jotting,jotted,jotted
joypop,joypops,joypopping,joypopped,joypopped
joy-ride,joy-rides,joy-riding,joy-ridden,joy-rode
jut,juts,jutting,jutted,jutted
keep,keeps,keeping,kept,kept
kennel,kennels,kenneling,kennelling,kenneled,kennelled,kenneled,kennelled
kernel,kernels,kerneling,kernelling,kerneled,kernelled,kerneled,kernelled
kidnap,kidnaps,kidnaping,kidnapping,kidnaped,kidnapped,kidnaped,kidnapped
knap,knaps,knapping,knapped,knapped
kneel,kneels,kneeling,kneeled,knelt,kneeled,knelt
knit,knits,knitting,knit,knitted,knit,knitted
knot,knots,knotting,knotted,knotted
know,knows,knowing,known,knew
KO,KO's,F,Knock Out,KO'ing,KO'd,KO'd
label,labels,labeling,labelling,labeled,labelled,labeled,labelled
lade,lades,lading,laden,laded
ladify,ladyfies,ladyfying,ladyfied,ladyfied
laicize,laicizes,laicizing,laicized,laicized
lallygag,lallygags,lallygagging,lallygagged,lallygagged
lam,lams,lamming,lammed,lammed
lasso,lassoes,lassos,lassoing,lassoed,lassoed
laurel,laurels,laureling,laurelling,laureled,laurelled,laureled,laurelled
lay,lays,laying,laid,laid
lead,leads,leading,led,led
lean,leans,leaning,leaned,leant,leaned,leant
leap,leaps,leaping,leaped,leapt,leaped,leapt
leapfrog,leapfrogs,leapfrogging,leapfrogged,leapfrogged
learn,learns,learning,learned,learnt,learned,learnt
leave,leaves,leaving,left,left
leave undone,leaves undone,leaving undone,left undone,left undone
legitimize,legitimizes,legitimizing,legitimized,legitimized
lend,lends,lending,lent,lent
let,lets,letting,let,let
level,levels,leveling,levelling,leveled,levelled,leveled,levelled
libel,libels,libeling,libelling,libeled,libelled,libeled,libelled
lie,lies,lying,lain,lay
light,lights,lighting,lighted,lit,lighted,lit
lignify,lignifies,lignifying,lignified,lignified
lip-read,lip-reads,lip-reading,lip-read,lip-read
liquify,liquifies,liquifying,liquified,liquified
lob,lobs,lobbing,lobbed,lobbed
log,logs,logging,logged,logged
lop,lops,lopping,lopped,lopped
lose,loses,losing,lost,lost
lot,lots,lotting,lotted,lotted
machine-gun,machine-guns,machine-gunning,machine-gunned,machine-gunned
make,makes,making,made,made
man,mans,manning,manned,manned
manumit,manumits,manumitting,manumitted,manumitted
mar,mars,marring,marred,marred
marcel,marcels,marcelling,marcelled,marcelled
marshal,marshals,marshaling,marshalling,marshaled,marshalled,marshaled,marshalled
marvel,marvels,marveling,marvelling,marveled,marvelled,marveled,marvelled
may,might
mean,means,meaning,meant,meant
medal,medals,medaling,medalling,medaled,medalled,medaled,medalled
meet,meets,meeting,met,met
melt,melts,melting,molten,melted
metal,metals,metaling,metalling,metaled,metalled,metaled,metalled
metrify,metrifies,metrifying,metrified,metrified
militate against,militates against,militating against,militated against,militated against
minify,minifies,minifying,minified,minified
misbecome,misbecomes,misbecoming,misbecome,misbecame
miscast,miscasts,miscasting,miscast,miscast
misconstrue,misconstrues,misconstruing,misconstrued,misconstrued
misdeal,misdeals,misdealing,misdealt,misdealt
misgive,misgives,misgiving,misgiven,misgave
mishear,mishears,mishearing,misheard,misheard
mishit,mishits,mishitting,mishit,mishit
mislay,mislays,mislaying,mislaid,mislaid
mislead,misleads,misleading,misled,misled
misplead,mispleads,mispleading,mispleaded,mispled,mispleaded,mispled
misread,misreads,misreading,misread,misread
misspell,misspells,misspelling,misspelt,misspelled,misspelt,misspelled
misspend,misspends,misspending,misspent,misspent
mistake,mistakes,mistaking,mistaken,mistook
misunderstand,misunderstands,misunderstanding,misunderstood,misunderstood
mob,mobs,mobbing,mobbed,mobbed
model,models,modeling,modelling,modeled,modelled,modeled,modelled
moonlight,moonlights,moonlighting,moonlighted,moonlighted
mow,mows,mowing,mowed,mown,mowed
mug,mugs,mugging,mugged,mugged
nickel,nickels,nickeling,nickelling,nickeled,nickelled,nickeled,nickelled
nidify,nidifies,nidifying,nidified,nidified
nid-nod,nid-nods,nid-nodding,nid-nodded,nid-nodded
niello,niellos,nielloing,nielloed,nielloed
nitrify,nitrifies,nitrifying,nitrified,nitrified
nonplus,nonpluses,nonplusses,nonplusing,nonplussing,nonplused,nonplussed,nonplused,nonplussed
objectify,objectifies,objectifying,objectified,objectified
offset,offsets,offsetting,offset,offset
opsonize,opsonizes,opsonizing,opsonized,opsonized
outbid,outbids,outbidding,outbidden,outbid,outbidden
outbreed,outbreeds,outbreeding,outbred,outbred
outdo,outdoes,outdoing,outdone,outdid
outfit,outfits,outfitting,outfitted,outfitted
outgas,outgasses,outgassing,outgassed,outgassed
outgeneral,outgenerals,outgeneraling,outgeneralling,outgeneraled,outgeneralled,outgeneraled,outgeneralled
outgo,outgoes,outgoing,outgone,outwent
outgrow,outgrows,outgrowing,outgrown,outgrew
outlay,outlays,outlaying,outlaid,outlaid
outman,outmans,outmanning,outmanned,outmanned
outride,outrides,outriding,outridden,outrode
outrun,outruns,outrunning,outrun,outran
outsell,outsells,outselling,outsold,outsold
outshine,outshines,outshining,outshined,outshone,outshined,outshone
outshoot,outshoots,outshooting,outshot,outshot
outspan,outspans,outspanning,outspanned,outspanned
outspread,outspreads,outspreading,outspread,outspread
outstand,outstands,outstanding,outstood,outstood
outthink,outthinks,outthinking,outthought,outthought
outwear,outwears,outwearing,outworn,outwore
overbear,overbears,overbearing,overborne,overbore
overbid,overbids,overbidding,overbidden,overbid
overblow,overblows,overblowing,overblown,overblew
overbuild,overbuilds,overbuilding,overbuilt,overbuilt
overcast,overcasts,overcasting,overcast,overcast
overcome,overcomes,overcoming,overcome,overcame
overcrop,overcrops,overcropping,overcropped,overcropped
overdo,overdoes,overdoing,overdone,overdid
overdraw,overdraws,overdrawing,overdrawn,overdrew
overdrive,overdrives,overdriving,overdriven,overdrove
overeat,overeats,overeating,overeaten,overate
overflow,overflows,overflowing,overflowed,overflowed
overfly,overflies,overflying,overflown,overflew
overgrow,overgrows,overgrowing,overgrown,overgrew
overhang,overhangs,overhanging,overhung,overhung
overhear,overhears,overhearing,overheard,overheard
overissue,overissues,overissuing,overissued,overissued
overlap,overlaps,overlapping,overlapped,overlapped
overlay,overlays,overlaying,overlaid,overlaid
overlie,overlies,overlying,overlain,overlay
overload,overloads,overloading,overladen,overloaded,overloaded
overman,overmans,overmanning,overmanned,overmanned
overpay,overpays,overpaying,overpaid,overpaid
override,overrides,overriding,overridden,overrode
overrun,overruns,overrunning,overrun,overran
oversee,oversees,overseeing,overseen,oversaw
oversell,oversells,overselling,oversold,oversold
overset,oversets,oversetting,overset,overset
overshoot,overshoots,overshooting,overshot,overshot
oversleep,oversleeps,oversleeping,overslept,overslept
overspend,overspends,overspending,overspent,overspent
overstep,oversteps,overstepping,overstepped,overstepped
overtake,overtakes,overtaking,overtaken,overtook
overthrow,overthrows,overthrowing,overthrown,overthrew
overtop,overtops,overtopping,overtopped,overtopped
overwind,overwinds,overwinding,overwound,overwound
overwrite,overwrites,overwriting,overwritten,overwrote
pandy,pandies,pandying,pandied,pandied
panel,panels,paneling,panelling,paneled,panelled,paneled,panelled
panic,panics,panicking,panicked,panicked
parallel,parallels,paralleling,parallelling,paralleled,parallelled,paralleled,parallelled
parcel,parcels,parceling,parcelling,parceled,parcelled,parceled,parcelled
parenthesize,parenthesizes,parenthesizing,parenthesized,parenthesized
partake,partakes,partaking,partaken,partook
pasquinade,pasquils,pasquinades,pasquilling,pasquinading,pasquilled,pasquinaded,pasquilled,pasquinaded
pay,pays,paying,paid,paid
pedal,pedals,pedaling,pedalling,pedaled,pedalled,pedaled,pedalled
pencil,pencils,penciling,pencilling,penciled,pencilled,penciled,pencilled
pettifog,pettifogs,pettifogging,pettifogged,pettifogged
phantasy,phantasies,phantasying,phantasied,phantasied
photocopy,photocopies,photocopying,photocopied,photocopied
photomap,photomaps,photomapping,photomapped,photomapped
photoset,photosets,photosetting,photoset,photoset
pie,pies,pieing,piing,pied,pied
pinch-hit,pinch-hits,pinch-hitting,pinch-hit,pinch-hit
pistol,pistols,pistoling,pistolling,pistoled,pistolled,pistoled,pistolled
pistol-whip,pistol-whips,pistol-whipping,pistol-whipped,pistol-whipped
pitapat,pitapats,pitapatting,pitapatted,pitapatted
plat,plats,platting,platted,platted
plead,pleads,pleading,pleaded,pled,pleaded,pled
plod,plods,plodding,plodded,plodded
plop,plops,plopping,plopped,plopped
plot,plots,plotting,plotted,plotted
pod,pods,podding,podded,podded
pommel,pommels,pommeling,pommelling,pommeled,pommelled,pommeled,pommelled
pop,pops,popping,popped,popped
pot,pots,potting,potted,potted
precancel,precancels,precanceling,precancelling,precanceled,precancelled,precanceled,precancelled
precast,precasts,precasting,precast,precast
preoccupy,preoccupies,preoccupying,preoccupied,preoccupied
prepay,prepays,prepaying,prepaid,prepaid
preset,presets,presetting,preset,preset
presignify,presignifies,presignifying,presignified,presignified
pretermit,pretermits,pretermitting,pretermitted,pretermitted
prod,prods,prodding,prodded,prodded
program,programmes,programs,programming,programed,programmed,programed,programmed
prologue,prologs,prologues,prologing,prologuing,prologed,prologued,prologed,prologued
proofread,proofreads,proofreading,proofread,proofread
prop,props,propping,propped,propped
prove,proves,proving,proven,proved
pummel,pummels,pummeling,pummelling,pummeled,pummelled,pummeled,pummelled
pursue,pursues,pursuing,pursued,pursued
put,puts,putting,put,put
quarrel,quarrels,quarreling,quarrelling,quarreled,quarrelled,quarreled,quarrelled
quick-freeze,quick-freezes,quick-freezing,quick-frozen,quick-froze
radio,radios,radioing,radioed,radioed
rappel,rappels,rappelling,rappelled,rappelled
ravel,ravels,raveling,ravelling,raveled,ravelled,raveled,ravelled
raz-cut,raz-cuts,raz-cutting,raz-cut,raz-cut
razee,razees,razeeing,razeed,razeed
read,reads,reading,read,read
reave,reaves,reaving,reaved,reaved
rebel,rebels,rebelling,rebelled,rebelled
rebind,rebinds,rebinding,rebound,rebound
rebuild,rebuilds,rebuilding,rebuilt,rebuilt
recast,recasts,recasting,recast,recast
recce,recces,recceing,recced,recceed,recced,recceed
recommit,recommits,recommitting,recommitted,recommitted
recopy,recopies,recopying,recopied,recopied
redd,redds,redding,redded,redded
redo,redoes,redoing,redone,redid
red-pencil,red-pencils,red-penciling,red-pencilling,red-penciled,red-pencilled,red-penciled,red-pencilled
refuel,refuels,refueling,refuelling,refueled,refuelled,refueled,refuelled
rehear,rehears,rehearing,reheard,reheard
relay,relays,relaying,relaid,relaid
rely,relies,relying,relied,relied
remake,remakes,remaking,remade,remade
rend,rends,rending,rent,rent
repay,repays,repaying,repaid,repaid
replevy,replevies,replevying,replevied,replevied
rerun,reruns,rerunning,rerun,reran
resell,resells,reselling,resold,resold
reset,resets,resetting,reset,reset
resit,resits,resitting,resat,resat
ret,rets,retting,retted,retted
retake,retakes,retaking,retaken,retook
retell,retells,retelling,retold,retold
rethink,rethinks,rethinking,rethought,rethought
retread,retreads,retreading,retreaded,retreaded
re-tread,re-treads,re-treading,re-trodden,re-trod
retrofit,retrofits,retrofitting,retrofitted,retrofitted
retry,retries,retrying,retried,retried
reunify,reunifies,reunifying,reunified,reunified
revalorize,revalorizes,revalorizing,revalorized,revalorized
revel,revels,reveling,revelling,reveled,revelled,reveled,revelled
revet,revets,revetting,revetted,revetted
rewind,rewinds,rewinding,rewound,rewound
rewrite,rewrites,rewriting,rewritten,rewrote
ricochet,ricochets,ricocheting,ricochetting,ricocheted,ricochetted,ricocheted,ricochetted
rid,rids,ridding,rid,ridded,rid,ridded
ride,rides,riding,ridden,rode
rigidify,rigidifies,rigidifying,rigidified,rigidified
ring,rings,ringing,rung,rang
rise,rises,rising,risen,rose
rival,rivals,rivaling,rivalling,rivaled,rivalled,rivaled,rivalled
rive,rives,riving,riven,rived
rob,robs,robbing,robbed,robbed
rot,rots,rotting,rotted,rotted
roughcast,roughcasts,roughcasting,roughcast,roughcast
rough-dry,rough-dries,rough-drying,rough-dried,rough-dried
rough-hew,rough-hews,rough-hewing,rough-hewed,rough-hewed
rowel,rowels,roweling,rowelling,roweled,rowelled,roweled,rowelled
rue,rues,ruing,rued,rued
ruggedize,ruggedizes,ruggedizing,ruggedized,ruggedized
run,runs,running,run,ran
saccharify,saccharifies,saccharifying,saccharified,saccharified
sandbag,sandbags,sandbagging,sandbagged,sandbagged
sand-cast,sand-casts,sand-casting,sand-cast,sand-cast
saponify,saponifies,saponifying,saponified,saponified
saute,sautes,sauteing,sauted,sauteed,sauted,sauteed
saw,saws,sawing,sawed,sawn,sawed
say,says,saying,said,said
scag,scags,scagging,scagged,scagged
scar,scars,scarring,scarred,scarred
scorify,scorifies,scorifying,scorified,scorified
scry,scries,scrying,scried,scried
scurry,scurries,scurrying,scurried,scurried
see,sees,seeing,seen,saw
seed,seeds,seeding,seeded,seeded
seek,seeks,seeking,sought,sought
sell,sells,selling,sold,sold
send,sends,sending,sent,sent
set,sets,setting,set,set
shake,shakes,shaking,shaken,shook
shall,should
sharecrop,sharecrops,sharecropping,sharecropped,sharecropped
shave,shaves,shaving,shaven,shaved
shear,shears,shearing,sheared,shorn,sheared
shed,sheds,shedding,shed,shed
shend,shends,shending,shent,shent
shikar,shikars,shikarring,shikarred,shikarred
shillyshally,shillyshallies,shillyshallying,shillyshallied,shillyshallied
shim,shims,shimming,shimmed,shimmed
shimmy,shimmies,shimmying,shimmied,shimmied
shine,shines,shining,shined,shone,shined,shone
shit,shits,shitting,shat,shit,shat,shit
shoe,shoes,shoeing,shod,shoed,shod,shoed
shoot,shoots,shooting,shot,shot
shop,shops,shopping,shopped,shopped
shot,shots,shotting,shotted,shotted
shovel,shovels,shoveling,shovelling,shoveled,shovelled,shoveled,shovelled
show,shows,showing,showed,shown,showed
shred,shreds,shredding,shred,shredded,shred,shredded
shrink,shrinks,shrinking,shrunk,shrunken,shrank,shrunk
shrink-wrap,shrink-wraps,shrink-wrapping,shrink-wrapped,shrink-wrapped
shrive,shrives,shriving,shriven,shrived
shrivel,shrivels,shriveling,shrivelling,shriveled,shrivelled,shriveled,shrivelled
shut,shuts,shutting,shut,shut
shy,shies,shying,shied,shied
sight-read,sight-reads,sight-reading,sight-read,sight-read
sightsee,sightsees,sightseeing,sightseen,sightsaw
signal,signals,signaling,signalling,signaled,signalled,signaled,signalled
silicify,silicifies,silicifying,silicified,silicified
sin,sins,sinning,sinned,sinned
sing,sings,singing,sung,sang
single-step,single-steps,single-stepping,single-stepped,single-stepped
sink,sinks,sinking,sunk,sunken,sank,sunk
sit,sits,sitting,sat,sat
skelly,skellies,skellying,skellied,skellied
sken,skens,skenning,skenned,skenned
sket,skets,sketting,sketted,sketted
skinny-dip,skinny-dips,skinny-dipping,skinny-dipped,skinny-dipped
skin-pop,skin-pops,skin-popping,skin-popped,skin-popped
sky,skies,skying,skied,skyed,skied,skyed
skydive,skydives,skydiving,skydived,skydived
slay,slays,slaying,slain,slew
sleep,sleeps,sleeping,slept,slept
slide,slides,sliding,slid,slidden,slid
sling,slings,slinging,slung,slung
slink,slinks,slinking,slunk,slunk
slit,slits,slitting,slit,slit
slog,slogs,slogging,slogged,slogged
slop,slops,slopping,slopped,slopped
slot,slots,slotting,slotted,slotted
slur,slurs,slurring,slurred,slurred
smell,smells,smelling,smelled,smelt,smelled,smelt
smite,smites,smiting,smitten,smote
sned,sneds,snedding,snedded,snedded
snivel,snivels,sniveling,snivelling,sniveled,snivelled,sniveled,snivelled
sob,sobs,sobbing,sobbed,sobbed
sod,sods,sodding,sodded,sodded
soft-pedal,soft-pedals,soft-pedaling,soft-pedalling,soft-pedaled,soft-pedalled,soft-pedaled,soft-pedalled
solemnify,solemnifies,solemnifying,solemnified,solemnified
soothsay,soothsays,soothsaying,soothsaid,soothsaid
sop,sops,sopping,sopped,sopped
sow,sows,sowing,sowed,sown,sowed
spancel,spancels,spanceling,spancelling,spanceled,spancelled,spanceled,spancelled
spar,spars,sparring,sparred,sparred
spat,spats,spatting,spatted,spatted
speak,speaks,speaking,spoken,spoke
speed,speeds,speeding,sped,speeded,sped,speeded
spell,spells,spelling,spelled,spelt,spelled,spelt
spellbind,spellbinds,spellbinding,spellbound,spellbound
spend,spends,spending,spent,spent
spill,spills,spilling,spilled,spilled
spin,spins,spinning,spun,span,spun
spin-dry,spin-dries,spin-drying,spin-dried,spin-dried
spiral,spirals,spiraling,spiralling,spiraled,spiralled,spiraled,spiralled
spit,spits,spitting,spat,spit,spat,spit
split,splits,splitting,split,split
spoil,spoils,spoiling,spoiled,spoilt,spoiled,spoilt
spoon-feed,spoon-feeds,spoon-feeding,spoon-fed,spoon-fed
spot,spots,spotting,spotted,spotted
spotlight,spotlights,spotlighting,spotlighted,spotlit,spotlighted,spotlit
spread,spreads,spreading,spread,spread
spring,springs,springing,sprung,sprang,sprung
spue,spues,spuing,spued,spued
spur,spurs,spurring,spurred,spurred
squat,squats,squatting,squatted,squatted
stall-feed,stall-feeds,stall-feeding,stall-fed,stall-fed
stand,stands,standing,stood,stood
star,stars,starring,starred,starred
steal,steals,stealing,stolen,stole
stellify,stellifies,stellifying,stellified,stellified
stencil,stencils,stenciling,stencilling,stenciled,stencilled,stenciled,stencilled
stick,sticks,sticking,stuck,stuck
sting,stings,stinging,stung,stung
stink,stinks,stinking,stunk,stank,stunk
strew,strews,strewing,strewed,strewn,strewed
stride,strides,striding,stridden,strode
strike,strikes,striking,struck,struck
string,strings,stringing,strung,strung
strive,strives,striving,strived,striven,strived,strove
strop,strops,stropping,stropped,stropped
stucco,stuccoes,stuccos,stuccoing,stuccoed,stuccoed
stultify,stultifies,stultifying,stultified,stultified
stum,stums,stumming,stummed,stummed
stymie,stymies,stymieing,stymying,stymied,stymied
sublet,sublets,subletting,sublet,sublet
subtotal,subtotals,subtotaling,subtotalling,subtotaled,subtotalled,subtotaled,subtotalled
sue,sues,suing,sued,sued
sulphuret,sulphurets,sulphureting,sulphuretting,sulphureted,sulphuretted,sulphureted,sulphuretted
swab,swabs,swabbing,swabbed,swabbed
swag,swags,swagging,swagged,swagged
swap,swaps,swops,swapping,swopping,swapped,swopped,swapped,swopped
swat,swats,swatting,swatted,swatted
swear,swears,swearing,sworn,swore
sweep,sweeps,sweeping,swept,swept
swell,swells,swelling,swelled,swollen,swelled
swim,swims,swimming,swum,swam
swing,swings,swinging,swung,swung
swivel,swivels,swiveling,swivelling,swiveled,swivelled,swiveled,swivelled
syllabicate,syllabicates,syllabicating,syllabicated,syllabicated
symbol,symbols,symboling,symbolling,symboled,symbolled,symboled,symbolled
take,takes,taking,taken,took
talc,talcs,talcing,talcking,talced,talcked,talced,talcked
tar,tars,tarring,tarred,tarred
tassel,tassels,tasseling,tasselling,tasseled,tasselled,tasseled,tasselled
tat,tats,tatting,tatted,tatted
tattoo,tattoos,tattooing,tattooed,tattooed
taxi,taxies,taxiing,taxying,taxied,taxied
teach,teaches,teaching,taught,taught
tear,tears,tearing,torn,tore
teasel,teasels,teaseling,teaselling,teaseled,teaselled,teaseled,teaselled
ted,teds,tedding,tedded,tedded
tee,tees,teeing,teed,teed
te-hee,te-hees,te-heeing,te-heed,te-heed
telecast,telecasts,telecasting,telecasted,telecasted
tell,tells,telling,told,told
tepefy,tepefies,tepefying,tepefied,tepefied
think,thinks,thinking,thought,thought
thrive,thrives,thriving,thrived,thriven,thrived,throve
throw,throws,throwing,thrown,threw
thrust,thrusts,thrusting,thrust,thrust
tinge,tinges,tingeing,tinging,tinged,tinged
tinsel,tinsels,tinseling,tinselling,tinseled,tinselled,tinseled,tinselled
tittup,tittups,tittuping,tittupping,tittuped,tittupped,tittuped,tittupped
tog,togs,togging,togged,togged
top,tops,topping,topped,topped
torrefy,torrefies,torrefying,torrefied,torrefied
tot,tots,totting,totted,totted
total,totals,totaling,totalling,totaled,totalled,totaled,totalled
towel,towels,toweling,towelling,toweled,towelled,toweled,towelled
trammel,trammels,trammeling,trammelling,trammeled,trammelled,trammeled,trammelled
transvalue,transvalues,transvaluing,transvalued,transvalued
traumatize,traumatizes,traumatizing,traumatized,traumatized
travel,travels,traveling,travelling,traveled,travelled,traveled,travelled
tread,treads,treading,trod,trodden,trod
trig,trigs,trigging,trigged,trigged
trot,trots,trotting,trotted,trotted
trowel,trowels,troweling,trowelling,troweled,trowelled,troweled,trowelled
true,trues,trueing,truing,trued,trued
tumefy,tumefies,tumefying,tumefied,tumefied
tun,tuns,tunning,tunned,tunned
tunnel,tunnels,tunneling,tunnelling,tunneled,tunnelled,tunneled,tunnelled
typecast,typecasts,typecasting,typecast,typecast
typeset,typesets,typesetting,typeset,typeset
typewrite,typewrites,typewriting,typewritten,typewrote
uglify,uglifies,uglifying,uglified,uglified
unbend,unbends,unbending,unbent,unbent
unbind,unbinds,unbinding,unbound,unbound
uncap,uncaps,uncapping,uncapped,uncapped
unclog,unclogs,unclogging,unclogged,unclogged
unclothe,unclothes,unclothing,unclothed,unclothed
underbid,underbids,underbidding,underbid,underbid
underbuy,underbuys,underbuying,underbought,underbought
undercut,undercuts,undercutting,undercut,undercut
underfeed,underfeeds,underfeeding,underfed,underfed
undergird,undergirds,undergirding,undergirded,undergirded
undergo,undergoes,undergoing,undergone,underwent
underlay,underlays,underlaying,underlaid,underlaid
underlet,underlets,underletting,underlet,underlet
underlie,underlies,underlying,underlain,underlay
underpay,underpays,underpaying,underpaid,underpaid
underpin,underpins,underpinning,underpinned,underpinned
underprop,underprops,underpropping,underpropped,underpropped
undersell,undersells,underselling,undersold,undersold
underset,undersets,undersetting,underset,underset
undershoot,undershoots,undershooting,undershot,undershot
understand,understands,understanding,understood,understood
undertake,undertakes,undertaking,undertaken,undertook
undervalue,undervalues,undervaluing,undervalued,undervalued
underwrite,underwrites,underwriting,underwritten,underwrote
undo,undoes,undoing,undone,undid
unfit,unfits,unfitting,unfitted,unfitted
unfreeze,unfreezes,unfreezing,unfrozen,unfroze
unkennel,unkennels,unkenneling,unkennelling,unkenneled,unkennelled,unkenneled,unkennelled
unknit,unknits,unknitting,unknit,unknitted,unknit,unknitted
unlay,unlays,unlaying,unlaid,unlaid
unlearn,unlearns,unlearning,unlearned,unlearned
unmake,unmakes,unmaking,unmade,unmade
unman,unmans,unmanning,unmanned,unmanned
unpeg,unpegs,unpegging,unpegged,unpegged
unpin,unpins,unpinning,unpinned,unpinned
unplug,unplugs,unplugging,unplugged,unplugged
unravel,unravels,unraveling,unravelling,unraveled,unravelled,unraveled,unravelled
unreeve,unreeves,unreeving,unreeved,unreeved
unrig,unrigs,unrigging,unrigged,unrigged
unsay,unsays,unsaying,unsaid,unsaid
unship,unships,unshipping,unshipped,unshipped
unsling,unslings,unslinging,unslung,unslung
unsnap,unsnaps,unsnapping,unsnapped,unsnapped
unspeak,unspeaks,unspeaking,unspoken,unspoke
unsteady,unsteadies,unsteadying,unsteadied,unsteadied
unstep,unsteps,unstepping,unstepped,unstepped
unstick,unsticks,unsticking,unstuck,unstuck
unstring,unstrings,unstringing,unstrung,unstrung
unswear,unswears,unswearing,unsworn,unswore
unteach,unteaches,unteaching,untaught,untaught
unthink,unthinks,unthinking,unthought,unthought
untread,untreads,untreading,untrodden,untrod
unwind,unwinds,unwinding,unwound,unwound
unwrap,unwraps,unwrapping,unwrapped,unwrapped
upbuild,upbuilds,upbuilding,upbuilt,upbuilt
upcast,upcasts,upcasting,upcast,upcast
upheave,upheaves,upheaving,upheaved,upheaved
uphold,upholds,upholding,upheld,upheld
uppercut,uppercuts,uppercutting,uppercut,uppercut
uprise,uprises,uprising,uprisen,uprose
upset,upsets,upsetting,upset,upset
upspring,upsprings,upspringing,upsprung,upsprang
upsweep,upsweeps,upsweeping,upswept,upswept
upswell,upswells,upswelling,upswollen,upswelled
upswing,upswings,upswinging,upswung,upswung
verbify,verbifies,verbifying,verbified,verbified
victual,victuals,victualing,victualling,victualed,victualled,victualed,victualled
vitriol,vitriols,vitrioling,vitriolling,vitrioled,vitriolled,vitrioled,vitriolled
vivify,vivifies,vivifying,vivified,vivified
wad,wads,wadding,wadded,wadded
waddy,waddies,waddying,waddied,waddied
wadset,wadsets,wadsetting,wadsetted,wadsetted
wake,wakes,waking,waked,woken,waked,woke
wan,wans,wanning,wanned,wanned
war,wars,warring,warred,warred
water-ski,water-skis,water-skiing,water-skied,water-skied
waylay,waylays,waylaying,waylaid,waylaid
wear,wears,wearing,worn,wore
weave,weaves,weaving,woven,weaved,wove
weep,weeps,weeping,wept,wept
wet,wets,wetting,wet,wetted,wet,wetted
whap,whaps,whapping,whapped,whapped
whir,whirs,whirring,whirred,whirred
whistle-stop,whistle-stops,whistle-stopping,whistle-stopped,whistle-stopped
whop,whops,whopping,whopped,whopped
wigwag,wigwags,wigwagging,wigwagged,wigwagged
will,would
win,wins,winning,won,won
wind,winds,winding,wound,wound
window-shop,window-shops,window-shopping,window-shopped,window-shopped
winterfeed,winterfeeds,winterfeeding,winterfed,winterfed
wiredraw,wiredraws,wiredrawing,wiredrawn,wiredrew
withdraw,withdraws,withdrawing,withdrawn,withdrew
withhold,withholds,withholding,withheld,withheld
withstand,withstands,withstanding,withstood,withstood
worry,worries,worrying,worried,worried
worship,worships,worshipping,worshipped,worshipped
wreak,wreaks,wreaking,wreaked,wrought,wreaked,wrought
wring,wrings,wringing,wrung,wrung
write,writes,writing,written,wrote
wry,wries,wrying,wried,wried
yak,yaks,yakking,yakked,yakked
yodel,yodels,yodeling,yodelling,yodeled,yodelled,yodeled,yodelled
zap,zaps,zapping,zapped,zapped
need,needed,needing,needs
invoke,invokes,invoked,invoked,invoking
involve,involves,involved,involving
isolate,isolates,isolated,isolating
act,acts,acted,acting
add,adds,adding,added
concern,concerned,concerns,concerning
exceed,exceeding,exceeded,exceeded
play,plays,played,playing
plate,plates,plated,plating
place,places,placing,placed
rearranges,rearranged,rearranging,rearrange
rest,resting,rests,rested
span,spans,spanned,spanning
stimulate,stimulated,stimulates,stimulating
activate,activates,activated,activating
believe,believes,believed,believing
belong,belongs,belonged,belonging
induce,induces,induced,inducing,induced
infer,inferred,inferring,infers
influence,influences,influenced,influencing
interrupt,interrupts,interrupted,interrupting
introduce,introduces,introduced,introducing
wash,washes,washed,washing
mimic,mimics,mimicked,mimicking
be,being,is,are,was,were,been
occur,occurs,occurred,occurring

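As long as an irregular verb does not change its entire form (as be, is, and are do), recognition can be improved simply by editing the rule file.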

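Thursday, March 9, 2006

Introduction to Regular Expressions

What Is a Regular Expression?

A regular expression (RE for short below) is commonly used to search a piece of text for the strings you need. It does so by matching the text against a pattern.

An RE is a pattern made up of characters. Data is compared against it to see whether it matches, and the result can then be processed further. For example, a rough pattern for an e-mail address can be written as: .+\@.+

An RE is also simply called a pattern, and it forms a small programming language of its own.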

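Example

Anyone familiar with the old DOS operating system knows that in directory-listing commands the '*' and '?' characters stand for zero or more arbitrary characters and for exactly one arbitrary character, respectively.

So with a pattern such as "text?.*", all of the following file names match:

  • textf.txt
  • text1.asp
  • text9.html

while the following file names do not:

  • text.txt
  • text.asp
  • text.html

This example illustrates how an RE works: a pattern describes a whole family of strings.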
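The wildcard example can be tried out with a short sketch. Python is used here purely for illustration (the article itself uses Perl), and its standard fnmatch module is assumed to behave like the DOS wildcards described above, with '?' matching exactly one character.

```python
from fnmatch import fnmatchcase

# File names taken from the two lists above.
names = ["textf.txt", "text1.asp", "text9.html",
         "text.txt", "text.asp", "text.html"]

def matching(pattern, candidates):
    """Return the candidates that match a shell-style wildcard pattern."""
    return [n for n in candidates if fnmatchcase(n, pattern)]

matched = matching("text?.*", names)
print(matched)
```

Only the first three names are kept, because '?' has to consume one character between "text" and the dot.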

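Why Use REs

Common uses of REs include:

  • Removing certain tags from an HTML file.
  • Checking whether an e-mail address is well formed.

Basically, the following RE operations can be applied to a string:

  • Test a pattern: search the string and check whether the pattern matches some substring; the result is true or false.
  • Extract a substring: search the string for a substring that matches the pattern and return it.
  • Replace a substring: search the string for substrings that match the pattern and replace them with another string.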
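The three operations can be sketched as follows. This is a Python illustration rather than the Perl used in the rest of the article; the sample string and the variable names are made up for the example.

```python
import re

text = "Contact <b>alice@example.com</b> today"   # made-up sample string

# 1. Test: does any substring fit the rough e-mail pattern .+@.+ ?
has_email = re.search(r".+@.+", text) is not None

# 2. Extract: take the run of characters around '@' that contains
#    no whitespace and no angle brackets.
found = re.search(r"[^\s<>]+@[^\s<>]+", text)
email = found.group(0) if found else None

# 3. Replace: remove the <b> and </b> tags.
stripped = re.sub(r"</?b>", "", text)

print(has_email, email, stripped)
```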

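Where REs Are Used

REs are often powerful tools for text processing. The first language most people associate with REs is Perl: REs are fundamental to Perl, which supports them natively. Other languages support REs either natively or through libraries:

  • VBScript (5.x and later): through the RegExp object.
  • JScript (version 5.x and later): also through the RegExp object.
  • C++: through the Regex++ library and the PCRE (Perl Compatible Regular Expressions) library.
  • Java: built-in support.
  • Microsoft .NET Framework: built-in support (through the System.Text.RegularExpressions namespace).
  • PHP: built-in Perl-compatible functions, or the POSIX extended RE functions.

The code in this article is based on Perl. Other languages differ slightly in syntax because of their design, but they are largely similar.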

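Using REs in Perl

This section briefly explains how to use REs in Perl.

Searching a String with a Pattern

expression =~ m/pattern/[switches]

Searches the string expression for a substring that matches 'pattern'. Substrings captured by parentheses in the pattern are stored in the variables $1, $2, $3, and so on. The "m" stands for "match".

Example

$test = "this is just one test";
$test =~ m/(o.e)/;

stores "one" in the variable $1.

Replacing Strings

expression =~ s/pattern/new text/[switches]

Searches the string expression for a substring that matches pattern and replaces it with new text. The "s" stands for "substitute".

Example

$test = "this is just one test";
$test =~ s/one/my/;

replaces one with my, so $test then holds the string "this is just my test".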
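For readers who want to run the two examples without Perl, here is a rough Python equivalent. It is only an analog: group(1) plays the role of $1, and count=1 mimics s/// without the g switch.

```python
import re

test = "this is just one test"

# Analog of m/(o.e)/: the first parenthesized group is returned by group(1).
m = re.search(r"(o.e)", test)
captured = m.group(1) if m else None

# Analog of s/one/my/: replace only the first match.
replaced = re.sub(r"one", "my", test, count=1)

print(captured)
print(replaced)
```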

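Basic Regular Expression Syntax

As with escape characters in C++, an RE metacharacter must be escaped with \ to be matched literally. For example, to match a left square bracket ([) we write \[ (the details depend on the programming language you use; this description is for Perl).

Important Metacharacters

Character  Description
\          Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n", '\n' matches a newline character, and '\\' matches "\".
.          Matches any character except the newline character (\n). To match newlines as well, add the s switch or use a class such as [\s\S]; inside [...] a '.' is a literal dot, so [.\n] matches only a dot or a newline.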

Character Classes

A character class is a group of one or more characters written inside [...]. For example, "B[iu]rma" matches Birma or Burma: a B followed by i or u, followed by rma.

In other words, a character class means: match any single character of that class.

There are also negated character classes, which match any single character that is not in the class. For example, '[^1-6]' matches any character except the digits 1 through 6.
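A small Python sketch of both kinds of class (the class syntax is the same as in Perl; the word "Barma" and the string "a1b7c3" are made-up test inputs):

```python
import re

# B followed by i or u, then rma.
burma = re.compile(r"B[iu]rma")
class_hits = [w for w in ["Birma", "Burma", "Barma"] if burma.fullmatch(w)]

# Negated class: every character that is not one of the digits 1 to 6.
others = re.findall(r"[^1-6]", "a1b7c3")

print(class_hits)
print(others)
```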

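Quantifiers

When we do not know in advance how many characters there will be, a quantifier specifies how many times a character may occur. For example, "Hel+o" means He followed by one or more l, followed by an o.

Character  Description
*          Zero or more times. For example, 'zo*' matches z or zoo; * is equivalent to '{0,}'.
+          One or more times. For example, 'zo+' matches zo and zoo, but not z; + is equivalent to '{1,}'.
?          Zero or one time. For example, 'do(es)?' matches do or does; ? is equivalent to '{0,1}'.
{n}        n is a non-negative integer; matches exactly n times. For example, 'o{2}' does not match the o in Bob, but matches the two o's in foo.
{n,}       n is a non-negative integer; matches at least n times. For example, 'o{2,}' does not match the o in Bob, but matches all the o's in fooooooood. 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}      m and n are non-negative integers with n less than or equal to m; matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in fooooood; 'o{0,1}' is equivalent to 'o?'. Do not put any whitespace between the comma and the numbers.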
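The table entries can be checked with a few lines of Python, whose quantifier syntax matches Perl's for these cases. The helper name matches and the word lists are assumptions made for this sketch.

```python
import re

def matches(pattern, word):
    """True if the whole word matches the pattern."""
    return re.fullmatch(pattern, word) is not None

star = [w for w in ["z", "zo", "zoo"] if matches(r"zo*", w)]
plus = [w for w in ["z", "zo", "zoo"] if matches(r"zo+", w)]
optional = [w for w in ["do", "does", "doe"] if matches(r"do(es)?", w)]

# {n,m}: at most three o's are consumed from the run in "fooooood".
first_run = re.search(r"o{1,3}", "fooooood").group(0)

# {n}: "Bob" has no two consecutive o's.
bob = re.search(r"o{2}", "Bob")

print(star, plus, optional, first_run, bob)
```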

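Greedy Matching

Note that '*' and '+' are greedy: they match as much as they can, and the longest possible match is preferred. For example:

$test = "hello out there, how are you";
$test =~ m/h.*o/;

means: find an h, followed by any number of arbitrary characters, ending with an o. You might expect this to match "hello", but it actually matches "hello out there, how are yo". Because the quantifier is greedy, it searches up to the last "o", which in this example is the "o" in you.

Append a "?" to the quantifier to request non-greedy ("ungreedy") matching explicitly. For example:

$test = "hello out there, how are you";
$test =~ m/h.*?o/;

finds "hello", because the pattern means: find an "h", followed by arbitrary characters, up to the first "o" encountered.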
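The difference is easy to observe side by side. The sketch below uses Python's re module, which treats .* and .*? the same way as Perl in this case.

```python
import re

test = "hello out there, how are you"

greedy = re.search(r"h.*o", test).group(0)    # runs to the last "o"
lazy = re.search(r"h.*?o", test).group(0)     # stops at the first "o"

print(greedy)
print(lazy)
```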

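Anchors

Start and End of Line

To test for the start or end of a line (or of a string), use the metacharacters ^ and $. For example, "^thing" matches a line that starts with "thing", and "thing$" matches a line that ends with "thing".

Word Boundaries

'\b' and '\B' test for a word boundary and a non-word boundary, respectively. The following pattern:

$test =~ m/out/;

matches "out" in the sentence "speak out loud", but it also matches the out inside "please don't shout at me". To avoid this, put a word-boundary anchor in front of the pattern:

$test =~ m/\bout/;

Now the pattern only finds an "out" that begins at a word boundary, not one in the middle of a word.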
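A quick check of the two patterns against both sentences, written in Python as an illustration (the \b anchor has the same meaning there):

```python
import re

sentences = ["speak out loud", "please don't shout at me"]

plain = [s for s in sentences if re.search(r"out", s)]
bounded = [s for s in sentences if re.search(r"\bout", s)]

print(plain)
print(bounded)
```

Note that \bout still matches words that merely begin with "out", such as "outside"; add a second \b after the word to require a whole-word match.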

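Alternation and Grouping

Alternation uses the '|' character to choose between two or more alternatives. Writing "(...|...|...)" in parentheses groups several alternatives together.

Parentheses also capture the substring they match so that it can be processed later; the captures are stored in Perl's built-in variables $1, $2, ... $9.

Here is an example:

$test = "I like apples a lot";
$test =~ m/like (apples|pines|bananas)/;

The match succeeds because "apples" is one of the three alternatives, so "like apples" is found. The parentheses also capture "apples" and store it in $1 as a backreference.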
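The same pattern in a Python sketch, where group(0) is the whole match and group(1) corresponds to $1. The second sentence, about cherries, is a made-up counterexample.

```python
import re

pattern = re.compile(r"like (apples|pines|bananas)")

m = pattern.search("I like apples a lot")
whole = m.group(0)
fruit = m.group(1)

# None of the three alternatives occurs here, so there is no match.
no_match = pattern.search("I like cherries a lot")

print(whole, fruit, no_match)
```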

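Backreferences, Lookahead Conditions, and Lookbehind Conditions

Backreferences

An important feature of REs is that a previously matched substring can be stored in a variable for later use. This is done by enclosing the relevant part of the pattern in parentheses. The captured strings are stored in Perl's built-in variables $1, $2, ... $9.

If you need parentheses for grouping but do not need to capture the substring, use "?:" to suppress the capture.

Examples:

$test = "Today is monday the 18th.";
$test =~ m/([0-9]+)th/;

stores "18" in $1, whereas with

$test = "Today is monday the 18th.";
$test =~ m/[0-9]+th/;

$1 holds nothing, because no parentheses are used.

$test = "Today is monday the 18th.";
$test =~ m/(?:[0-9]+)th/;

also stores nothing in $1, because the parentheses contain "?:". A capture can also be used in a substitution:

$test = "Today is monday the 18th.";
$test =~ s/ the ([0-9]+)th/, and the day is $1/;

leaves "Today is monday, and the day is 18." in $test.

You can also use backreferences inside the search pattern itself, written \1, \2, ... \9, to refer to substrings matched earlier. For example, the following removes repeated words:

$test = "the house is is big";
$test =~ s/\b(\S+)\b(\s+\1\b)+/$1/;

leaves "the house is big" in $test.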
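A Python sketch of capturing, non-capturing, and in-pattern backreferences. It is an analog of the Perl code above: in the replacement string Python writes \1 where Perl writes $1, and re.sub replaces every match unless a count is given.

```python
import re

# Capturing group: group(1) corresponds to $1.
captured = re.search(r"([0-9]+)th", "Today is monday the 18th.").group(1)

# Non-capturing group: the compiled pattern reports zero capturing groups.
group_count = re.compile(r"(?:[0-9]+)th").groups

# \1 inside the pattern refers back to the word captured by the first group.
deduped = re.sub(r"\b(\S+)\b(\s+\1\b)+", r"\1", "the house is is big")

print(captured, group_count, deduped)
```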

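Lookahead and Lookbehind Conditions

Sometimes we need matches such as "match this, but only if it is not followed by something else" or "match this, but only if it does not come after something". When only a single character is involved, a negated character class, [^...], is enough.

For more than one character, you need a lookahead or lookbehind condition. There are four types:

  • Positive lookahead condition '(?=re)': matches only if followed by the RE re.
  • Negative lookahead condition '(?!re)': matches only if not followed by the RE re.
  • Positive lookbehind condition '(?<=re)': matches only if preceded by the RE re.
  • Negative lookbehind condition '(?<!re)': matches only if not preceded by the RE re.

Examples:

$test = "HTML is a document description-language and not a programming-language";
$test =~ m/(?<=description-)language/;

matches the first "language" (description-language), because it is preceded by "description-", whereas

$test = "HTML is a document description-language and not a programming-language";
$test =~ m/(?<!description-)language/;

matches the second "language" (programming-language), because it is not preceded by "description-".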
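To see which occurrence each lookbehind selects, the match positions can be compared with the positions of the two occurrences of "language". The sketch is in Python, whose lookbehind syntax is the same for fixed-width text such as "description-".

```python
import re

test = "HTML is a document description-language and not a programming-language"

after_description = re.search(r"(?<=description-)language", test)
not_after_description = re.search(r"(?<!description-)language", test)

first = test.index("language")     # position inside description-language
second = test.rindex("language")   # position inside programming-language

print(after_description.start() == first)
print(not_after_description.start() == second)
```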

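More Examples

Here are some more practical examples.

Swap the first two words:

s/(\S+)(\s+)(\S+)/$3$2$1/

Find a name=value pair:

m/(\w+)\s*=\s*(.*?)\s*$/

The name is now in $1 and the value in $2.

Read a date in YYYY-MM-DD format:

m/(\d{4})-(\d\d)-(\d\d)/

$1 holds YYYY, $2 holds MM, and $3 holds DD.

Remove the leading path from a file name:

s/^.*\///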
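The four recipes can be exercised with made-up inputs. The sketch below is a Python analog of the Perl one-liners: every sample string is an assumption, and the / in the last pattern needs no escaping because Python patterns are not delimited by slashes.

```python
import re

# Swap the first two words (count=1 mimics s/// without the g switch).
swapped = re.sub(r"(\S+)(\s+)(\S+)", r"\3\2\1", "hello world again", count=1)

# name=value pair; surrounding whitespace is dropped from the value.
pair = re.search(r"(\w+)\s*=\s*(.*?)\s*$", "color = dark red   ")
name, value = pair.group(1), pair.group(2)

# Date in YYYY-MM-DD format.
date = re.search(r"(\d{4})-(\d\d)-(\d\d)", "posted on 2006-03-09")
year, month, day = date.groups()

# Remove the leading path: the greedy .* runs to the last slash.
basename = re.sub(r"^.*/", "", "/usr/local/bin/perl")

print(swapped, name, value, year, month, day, basename)
```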

Summary

Note that the same task can usually be expressed with several different patterns; some run faster, while others are easier to read.

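Thursday, February 9, 2006

Bio NER Tagger

Demo Page

Web Service

Not every kind of data created by a Web Service can be sent over the network, so take care when using Web Services. The following data types are commonly used in .NET and can be transmitted:
  • Primitive data types (bool, string, DateTime, int, and all numeric types)
  • Enumerations (types defined with enum)
  • Data described by an XML schema
  • DataSet objects (a DataSet is described in XML, so it can be transmitted)
  • Arrays of the types above

A method of a Web Service object must be declared with both of the following so that external programs can call it:

  • [WebMethod]
  • public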

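Calling (Starting) External Programs from .NET

Running an External Program from C#

System.Diagnostics.Process proc = new System.Diagnostics.Process();
proc.EnableRaisingEvents = false;
proc.StartInfo.FileName = "filename"; // name of the file to run
proc.Start();

Besides System.Diagnostics.Process, the traditional Shell method also works, but be aware of the risks:

Most command-prompt commands access system resources. When a Web application calls a command-line program, the call may fail because the Web application lacks the necessary permissions, and granting those permissions may open a security hole.

There are two ways to solve this:

  • Component: write the code that calls the command line as a separate component and add it to COM+; on Windows 2003 it can be configured to run as a Windows service. Then set the Web application's access rights on the COM+ component.
  • Web Service: give this Web service its own application pool, allow only local calls, and grant it a higher privilege for running system commands.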

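Redirecting Output

When a program starts, the operating system creates a corresponding process, which can be identified by its process ID and process name.

In .NET, the Process component can be used to start and terminate processes. Information about a process is available through properties such as:

  • NonpagedSystemMemorySize
  • PagedMemorySize
  • PagedSystemMemorySize
  • PeakPagedMemorySize
  • PeakVirtualMemorySize
  • PeakWorkingSet
  • PriorityClass
  • PrivateMemorySize
  • PrivilegedProcessorTime

The Process component provides both static and instance methods. Commonly used static methods:

  • EnterDebugMode: puts the process into debug mode
  • GetCurrentProcess: returns the currently running process
  • GetProcessById: returns the process with the specified process ID
  • GetProcesses: returns an array of all processes running on the machine
  • Start: starts a new process

Instance methods of the Process class:

  • Close: releases the resources used by the Process component
  • Kill: terminates the running process
  • CloseMainWindow: ends the process by sending a close message to its main window

Properties of the Process class:

  • Responding: gets the state of the process; true means it is still responding, false means it is not
  • StandardError: gets the standard error stream of the process as a StreamReader, so that once redirected it can be read like a file
  • StandardInput: gets the standard input stream of the process as a StreamWriter, so that once redirected, writing data to the process works like writing to a file
  • StandardOutput: like StandardError, but for reading the standard output of the process
  • StartInfo: the initialization parameters passed to the process before it starts; changing them after the process has started has no effect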

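To redirect the standard streams, first create a ProcessStartInfo instance, passing the name of the application to start as the constructor argument, then set a few properties, and finally assign it to the Process instance:

ProcessStartInfo psI = new ProcessStartInfo("cmd");

The psI.UseShellExecute property must be set to false before StandardInput and the other standard streams can be redirected. The following properties

psI.RedirectStandardInput
psI.RedirectStandardOutput
psI.RedirectStandardError

are then all set to true.

To keep the annoying command-prompt window from appearing, psI.CreateNoWindow is also set to true, so cmd shows no window. Finally, p.StartInfo is set to the ProcessStartInfo instance just created.

To work with p.StandardInput, p.StandardOutput, and p.StandardError, we obtain the stream objects (StreamReader and StreamWriter instances) and use them to read StandardOutput and StandardError and to write to StandardInput, just as we would read or write a file. Closing the p.StandardInput stream also causes the cmd process to terminate.

Finally, the contents of p.StandardOutput and p.StandardError are read into a text box.

Here is the complete example: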

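// Requires: using System.Diagnostics; and using System.IO;
// tbComm and textBox1 are text box controls on the form.
private void start()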
{
   Process p = new Process();
   StreamWriter input;
   StreamReader output;
   StreamReader err;

   ProcessStartInfo psI = new ProcessStartInfo("cmd");
   psI.UseShellExecute = false;
   psI.RedirectStandardInput = true;
   psI.RedirectStandardOutput = true;
   psI.RedirectStandardError = true;
   psI.CreateNoWindow = true;
   p.StartInfo = psI;

   p.Start();
   input = p.StandardInput;
   output = p.StandardOutput;
   err = p.StandardError;

   input.AutoFlush = true;
   if (tbComm.Text != "")
      input.WriteLine(tbComm.Text);
   else
      //execute default command
      input.WriteLine("dir \\");
   input.Close();

   textBox1.Text = output.ReadToEnd();
   textBox1.Text += err.ReadToEnd();
}
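The same redirect-and-capture idea can be sketched outside .NET. The Python version below is only an analog of the C# listing, not part of the Process API: it starts a child with all three standard streams redirected, sends it one line, and collects both outputs. The Python interpreter itself stands in for cmd so that the sketch runs on any platform, and that substitution is an assumption of the example. One difference worth noting is that reading standard output to the end before touching standard error, as the listing above does, can block when the child writes a large amount to standard error; communicate() drains both streams together.

```python
import subprocess
import sys

# Start a child process with stdin, stdout and stderr redirected to pipes.
# sys.executable (the running Python interpreter) is used in place of "cmd".
child = subprocess.Popen(
    [sys.executable, "-c", "print(input().upper())"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)

# communicate() writes the input, closes stdin, and reads stdout and stderr
# together until the child exits.
out, err = child.communicate("dir\n")
exit_code = child.returncode

print(out, err, exit_code)
```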

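C# Thread

The .NET Framework defines thread-related classes in the System.Threading namespace. The steps below show how to create a thread in C#.

Step 1: Create a System.Threading.Thread object

Creating a System.Threading.Thread object creates a managed thread in the .NET environment. The Thread constructor used here takes a ThreadStart delegate as its parameter. The ThreadStart delegate refers to a callback method that is invoked when the thread is started.

Step 2: Write the callback function

This method is the entry point of the new thread. It can be an instance method of a class or a static method.

For an instance method, an object of the class must be created before the ThreadStart delegate is created; for a static method, the method name alone is enough to initialize the delegate. Note also that the callback method must return void and take no parameters, because that is how the ThreadStart delegate is declared.

Step 3: Start the thread

Call the Start method of the Thread class to start the new thread. The call is asynchronous: it asks the operating system to start the thread and does not wait for the callback to run.

Example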

// Requires: using System; and using System.Threading;

// Callback method for the thread
public static void MyCallbackFunction()
{
    while (true)
    {
        System.Console.WriteLine("Hey!, My Thread Function Running");
        // ...
    }
}

public static void Main(String []args)
{
    // Create the thread object
    Thread MyThread = new Thread(new ThreadStart
        (MyCallbackFunction));

    MyThread.Start();
    // ...
}
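The three steps map onto most threading libraries. As a runnable illustration, here is a Python analog using the standard threading module; it is not the .NET API, the names my_callback and results are made up, and the loop is bounded so that the example terminates.

```python
import threading

results = []

def my_callback():
    """Entry point of the new thread: takes no arguments, returns nothing."""
    for i in range(3):          # bounded loop instead of while (true)
        results.append(i)

# Steps 1 and 2: create the thread object and hand it the callback.
my_thread = threading.Thread(target=my_callback)

# Step 3: start() does not wait; the callback runs on the new thread.
my_thread.start()

# Wait for the thread to finish before reading what it produced.
my_thread.join()
print(results)
```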

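Terminating a Thread

Calling the Abort method of a thread object terminates that thread. Abort raises a ThreadAbortException in the target thread, which then stops executing.

MyThread.Abort();

Suspending and Resuming a Thread

Use the Suspend method to pause a running thread, and call the Resume method from another thread to let it continue.

MyThread.Suspend(); // suspend the thread
MyThread.Resume();  // resume the thread

Thread States

A thread can be in one of the following states:

  • Unstarted: the thread has been created in the Common Language Runtime but has not been started
  • Running: entered after the thread's Start method is called
  • WaitSleepJoin: entered when the thread calls Wait, Sleep, or Join
  • Suspended: entered when the Suspend method is called
  • Stopped: the thread has ended, either normally or through Abort

The ThreadState property of Thread reports the thread's current state.

Thread Priority

The Priority property of the Thread class, whose type is ThreadPriority, sets the thread's priority. The possible values are Lowest, BelowNormal, Normal, AboveNormal, and Highest; the default is Normal.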

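Cross Compile

The geniatagger downloaded from GENIA is available only for Linux, so we use cygwin to build a version that runs on Windows.

This is simple: when installing cygwin, make sure to install g++ and gcc (vim is optional), then run gcc/g++ in the cygwin window to compile. Note that to run the compiled exe on another computer, the cygwin1.dll file must be copied along with it.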

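Tuesday, January 10, 2006

HOWTO: Compiling 64-bit Programs with VC

After installing VS 2005 (remember to select "X64 Compilers and Tools" under Visual C++ during setup), the Start menu folder Microsoft Visual Studio 2005 → Visual Studio Tools gains new entries such as "Visual Studio 2005 x64 Win64 Command Prompt".

The different versions of cl.exe (the Visual C++ compiler) are listed below: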

x86 on x86

Builds applications for x86 machines. This version of cl.exe runs as a 32-bit process on x86 machines, and also on 64-bit Windows operating systems (through WOW64).

Itanium on x86 (Itanium cross-compiler)

Builds applications for Itanium machines. This version of cl.exe runs as a 32-bit process on x86 machines, and also on 64-bit Windows operating systems (through WOW64).

x64 on x86 (x64 cross-compiler)

Builds applications for x64 systems. This version of cl.exe runs as a 32-bit process on x86 machines, and also on 64-bit Windows operating systems (through WOW64).

Itanium on Itanium

Builds applications for Itanium machines. This version of cl.exe runs as a native process on Itanium machines.

x64 on x64

Builds applications for x64 machines. This version of cl.exe runs as a native process on x64 machines.

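Vcvarsall.bat

The five compiler types described above can be selected by running vcvarsall.bat. By default this batch file is located at C:\Program Files\Microsoft Visual Studio 8\VC\Vcvarsall.bat.

If no argument is supplied, the batch file configures the environment for the x86, 32-bit compiler. The arguments accepted by vcvarsall.bat are:

Argument        Compiler        Host (including emulated)  Target architecture
x86 (default)   32-bit Native   x86, x64, Itanium          x86
x86_amd64       x64 Cross       x86, x64, Itanium          x64
x86_IPF         Itanium Cross   x86, x64, Itanium          Itanium
amd64           x64 Native      x64                        x64
IPF or itanium  Itanium Native  Itanium                    Itanium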

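Configuring a Project to Target a 64-bit Platform

  1. Open the Property Pages of the project you want to configure for 64-bit.
  2. Click the "Configuration Manager..." button to open the Configuration Manager dialog.
  3. Open the "Active Solution Platform" drop-down list and select the "<New...>" option to open the "New Solution Platform" dialog.
  4. In the "Type or select the new platform" drop-down list, select x64.
  5. Click "OK". The target platform selected in the previous step (x64) should now appear in the "Active Solution Platform" list.
  6. Close the Configuration Manager and Property Pages dialogs.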