{"id":1123,"date":"2020-08-16T13:17:13","date_gmt":"2020-08-16T06:17:13","guid":{"rendered":"http:\/\/www.miai.vn\/?p=1123"},"modified":"2020-08-16T13:17:13","modified_gmt":"2020-08-16T06:17:13","slug":"named-entity-recognition-nhan-dien-thuc-the-trong-cau-khi-xu-ly-ngon-ngu-tu-nhien","status":"publish","type":"post","link":"https:\/\/miai.vn\/?p=1123","title":{"rendered":"Named Entity Recognition &#8211; Recognizing entities in sentences in natural language processing"},"content":{"rendered":"\n<p>Happy new week, dear M\u00ec members! Today we are going to explore Named Entity Recognition &#8211; recognizing entities in sentences in natural language processing. This dish usually goes by the name NER.<\/p>\n\n\n\n<p>Posts on NLP (natural language processing) tend to be much more abstract and harder to follow than Computer Vision posts, so I will go really slowly and explain everything instant-noodle style so that anyone can follow along.<\/p>\n\n\n\n<p>I have written a few other posts on NLP; if you have not read them yet, you can find them <strong><a href=\"https:\/\/www.miai.vn\/category\/natural-language-processing\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a><\/strong>.<\/p>\n\n\n\n<p>Let&#8217;s go, folks!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 1 &#8211; What on earth is Named Entity Recognition (NER)?<\/h2>\n\n\n\n<p>OK, as mentioned above, we are learning about NER &#8211; recognizing the entities in a text.<\/p>\n\n\n\n<p>Still super abstract, right? Examples are coming right up. In short, today we solve the following problem:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Example 1: The input is a sentence such as &#8220;Today Peter is going to the US&#8221;. The model must output: Peter &#8211; Person name, US &#8211; Location.<\/li><li>Example 2: Given the input &#8220;The phone number of the company Apple is 091345678&#8221;, the model must output: Apple &#8211; Company name, 091345678 &#8211; Phone number.<\/li><\/ul>\n\n\n\n<p>So NER extracts the named entities from a sentence, that is, the entities that belong to categories we have already defined and named. 
In the examples above, these are the Person name entity, the Location entity, the Phone number entity, and so on.<\/p>\n\n\n\n<p>Take a look at the picture below to see what NER does: when a piece of text is fed in, the recognized entities are highlighted in blue and red:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/d2ueix13hy5h3i.cloudfront.net\/wp-content\/uploads\/2019\/06\/3.png\" alt=\"Named Entity Recognition\"\/><figcaption>Source: <a href=\"https:\/\/d2ueix13hy5h3i.cloudfront.net\/wp-content\/uploads\/2019\/06\/3.png\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">here<\/a><\/figcaption><\/figure>\n\n\n\n<p>By now you should have a rough idea of what today&#8217;s post is about. Let&#8217;s move on!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 2 &#8211; How to approach the problem<\/h2>\n\n\n\n<h5 class=\"wp-block-heading\">Input and output<\/h5>\n\n\n\n<p>To solve this problem, the model takes an input text sequence X of n words, numbered 1 to n, and predicts a vector y, also with n elements, holding the label of each word in X.<\/p>\n\n\n\n<p>Each label has the form P-Name, where:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>P is B (beginning), I (inside) or E (end), marking whether the word is at the beginning, inside or end of an entity in the sentence.<\/li><li>Name is the entity type, for example org &#8211; organization, per &#8211; person name, and so on.<\/li><\/ul>\n\n\n\n<p>Here is an example of labeling the words of a sentence:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Sentence X: John   Michael Wick   likes United Kingdom very much\nLabels:     B-per  I-per   E-per  O     B-loc  E-loc   O    O<\/code><\/pre>\n\n\n\n<p>From this we can see that:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>John is labeled B-per: it begins a person name.<\/li><li>Michael is I-per: it is inside a person name.<\/li><li>Wick is E-per: it ends the person name.<\/li><li>Similarly, United Kingdom is a loc (Location) entity.<\/li><li>likes, very and much do not need to be recognized, so they are labeled O (Outside).<\/li><\/ul>\n\n\n\n<p>If anything in this part is still unclear, feel free to post in the discussion Group: <strong><a href=\"https:\/\/facebook.com\/groups\/miaigroup\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/facebook.com\/groups\/miaigroup<\/a><\/strong> and we can talk it through.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">The Deep learning network<\/h5>\n\n\n\n<p>When the input is text, LSTM is the first thing that comes to mind. And indeed, in this post we use an LSTM, more precisely a Bidirectional LSTM, to build the neural network.<\/p>\n\n\n\n<p>If you want to dig deeper into LSTM, read <strong><a href=\"https:\/\/medium.com\/datadriveninvestor\/a-high-level-introduction-to-lstms-34f81bfa262d\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">this link<\/a><\/strong>; it goes a bit further into the theory.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/777\/1*5-zfD8A0hNbF8LN_DN8koQ.png\" alt=\"LSTM\"\/><figcaption>Source: <a href=\"https:\/\/miro.medium.com\/max\/777\/1*5-zfD8A0hNbF8LN_DN8koQ.png\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">here<\/a><\/figcaption><\/figure>\n\n\n\n<p>After extracting a feature vector for each word in the sentence, we use the Conditional Random Fields (CRF) algorithm to predict each word&#8217;s label, that is, whether the word is part of a Named Entity and of which type.<\/p>\n\n\n\n<p>Why use a CRF?<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Predicting the label of each word in isolation often does not work well. For example, looked at on its own, the word Apple in &#8220;I eat Apple&#8221; is identical to the word Apple in &#8220;Apple makes iPhone&#8221;, even though we clearly know they are different \ud83d\ude00<\/li><li>A CRF uses context, in particular the label of the previous word, when predicting the label of the current word, and it scores the tag sequence of the whole sentence at once. The label of a word therefore depends on the sentence it appears in, which makes more sense. For example, a CRF can learn that I-per is very unlikely to come right after O.<\/li><\/ul>\n\n\n\n<p>For those who want to learn more about CRF: <strong><a href=\"https:\/\/people.cs.umass.edu\/~mccallum\/papers\/crf-tutorial.pdf\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">here<\/a><\/strong>.<\/p>\n\n\n\n<p>That is all the theory for this post. Now let&#8217;s go through the implementation step by step.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 3 &#8211; Building and training the Named Entity Recognition model<\/h2>\n\n\n\n<h5 class=\"wp-block-heading\">Training data<\/h5>\n\n\n\n<p>You can train on any labeled dataset; pick the data that suits your own problem. 
To keep things quick, I use a dataset from Kaggle, which you can download <strong><a href=\"https:\/\/www.kaggle.com\/abhinavwalia95\/entity-annotated-corpus\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">here<\/a><\/strong>. I have also put it on GitHub for anyone who wants a faster download.<\/p>\n\n\n\n<p>This data is already labeled with the P-Name scheme from Part 2, except that only the B and I prefixes are used (there are no E tags). The entity types are:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>geo = Geographical Entity\norg = Organization\nper = Person\ngpe = Geopolitical Entity\ntim = Time indicator\nart = Artifact\neve = Event\nnat = Natural Phenomenon<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">Reading and preprocessing the data<\/h5>\n\n\n\n<p>To load the data, we use the Pandas library to read the file &#8216;ner_dataset.csv&#8217;.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\n\ndef load_data(filename='..\/ner_dataset.csv'):\n    # The file is read as Latin-1 (ISO-8859-1) instead of UTF-8\n    df = pd.read_csv(filename, encoding=\"ISO-8859-1\")\n    # \"Sentence #\" is only set on the first word of each sentence: forward-fill it\n    df = df.ffill()\n    return df<\/code><\/pre>\n\n\n\n<p>While reading, we also fill the empty (null) cells with df.ffill(), which copies the last non-null value downwards, so every word row gets the number of the sentence it belongs to.<\/p>\n\n\n\n<p>The loaded data has one row per word, with the word&#8217;s POS (Part of Speech) and its Tag label following the scheme described in Part 2.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/github.com\/thangnch\/photos\/blob\/master\/Screen%20Shot%202020-08-16%20at%2016.54.31.png?raw=true\" alt=\"Named Entity Recognition (NER) - 
Recognizing entities in sentences in natural language processing\"\/><\/figure>\n\n\n\n<p>We ignore the POS column here and only care about the other three columns. POS is still carried along in the tuples below, but it is not used later.<\/p>\n\n\n\n<p>Next, we group this dataframe by the Sentence # column to join the individual word rows back into sentences:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Turn each sentence group into a list of (word, POS, tag) tuples\nagg = lambda s: &#91;(w, p, t) for w, p, t in zip(s&#91;'Word'].values.tolist(),\n                                              s&#91;'POS'].values.tolist(),\n                                              s&#91;'Tag'].values.tolist())]\ngrouped = df.groupby(\"Sentence #\").apply(agg)\nsentences = &#91;s for s in grouped]<\/code><\/pre>\n\n\n\n<p>In my code, this snippet lives in a separate class (file cls_sentences.py) to keep the main program tidy. Here it is written as plain statements on the df returned by load_data.<\/p>\n\n\n\n<p>After this step, the list of sentences is stored in the variable sentences, and each sentence is a list of (word, pos, tag) tuples.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Encoding the data<\/h5>\n\n\n\n<p>A model obviously cannot process the data as plain text, so we build a vocabulary and encode the sentences into numeric vectors.<\/p>\n\n\n\n<p>First, we build the vocabulary and 4 dictionaries that map words to indices, tags to indices, and back:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Build the vocabulary of words and tags\nwords = list(df&#91;'Word'].unique())\ntags = list(df&#91;'Tag'].unique())\n\n# Word-to-index dict; indices 0 and 1 are reserved for Padding and Unknown\nword2idx = {w: i + 2 for i, w in enumerate(words)}\nword2idx&#91;\"UNK\"] = 1\nword2idx&#91;\"PAD\"] = 0\n\n# Tag-to-index dict; index 0 is reserved for the Padding tag\ntag2idx = {t: i + 1 for i, t in enumerate(tags)}\ntag2idx&#91;\"PAD\"] = 0\n\n# Reverse dicts: index-to-word and index-to-tag\nidx2word = {i: w for w, i in word2idx.items()}\nidx2tag = {i: t for t, i in tag2idx.items()}<\/code><\/pre>\n\n\n\n<p>Why do we add the two special words Unknown and Padding? Here is why:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Unknown handles words that are not in the vocabulary at prediction time: any word the model has never seen is mapped to this Unknown word.<\/li><li>Padding is the word we append to the end of every sentence shorter than a fixed length that we choose. You may be wondering why this is needed. 
The reason is simple: a model always needs inputs of a fixed length, while sentences come in different lengths, so we pad them to the same length before feeding them into the network. We usually pad every sentence to the length of the longest one.<\/li><\/ul>\n\n\n\n<p>With the dicts in place, we map the sentences and their tags to indices:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from keras.preprocessing.sequence import pad_sequences\n\n# max_len: the fixed sentence length, here the length of the longest sentence\nmax_len = max(len(s) for s in sentences)\n\n# Convert each sentence to a vector of word indices\nX = &#91;&#91;word2idx&#91;w&#91;0]] for w in s] for s in sentences]\n# Pad the sentences to max_len\nX = pad_sequences(maxlen=max_len, sequences=X, padding=\"post\", value=word2idx&#91;\"PAD\"])\n# Convert the tags to indices\ny = &#91;&#91;tag2idx&#91;w&#91;2]] for w in s] for s in sentences]\n# Pad the tag sequences to max_len as well\ny = pad_sequences(maxlen=max_len, sequences=y, padding=\"post\", value=tag2idx&#91;\"PAD\"])<\/code><\/pre>\n\n\n\n<p>After this step, the sentences and their tag vectors look like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># A sentence becomes a vector of word indices\n&#91;1 332 3300 760 87 3 22 300]\n# The tag vector holds the tag index of each word in the sentence\n&#91;2 3 3 15 15 2 2 2]<\/code><\/pre>\n\n\n\n<p>Before training, we need one more step: converting the tag indices to one-hot vectors.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from keras.utils import to_categorical\n\n# Convert y to one-hot; one extra class for the PAD tag\nnum_tag = df&#91;'Tag'].nunique()\ny = &#91;to_categorical(i, num_classes=num_tag + 1) for i in y]<\/code><\/pre>\n\n\n\n<p>And finally, we split the data into train and test sets:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from sklearn.model_selection import train_test_split\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)<\/code><\/pre>\n\n\n\n<p>The data is now ready, so let&#8217;s move on to the next step.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Building and training the model<\/h5>\n\n\n\n<p>The model is fairly simple and consists of just these layers:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Embedding: embeds the sentences. 
Specifically, it turns each word index into a fixed-size n-dimensional vector.<\/li><li>A Bidirectional LSTM with return_sequences=True, so that it outputs one vector per word.<\/li><li>A TimeDistributed Dense layer that applies the same Dense layer to the vector of each word at every time step.<\/li><li>A CRF layer on top.<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>from keras.models import Model\nfrom keras.layers import Input, LSTM, Embedding, Dense, TimeDistributed, Bidirectional\nfrom keras_contrib.layers import CRF\n\n# embedding (vector size) and hidden_size are hyperparameters set in the full source\ninput_layer = Input(shape=(max_len,))\nmodel = Embedding(input_dim=len(words) + 2, output_dim=embedding, input_length=max_len, mask_zero=False)(input_layer)\nmodel = Bidirectional(LSTM(units=hidden_size, return_sequences=True, recurrent_dropout=0.1))(model)\nmodel = TimeDistributed(Dense(hidden_size, activation=\"relu\"))(model)\ncrf = CRF(num_tag + 1)  # CRF layer; +1 for the PAD tag\nout = crf(model)  # output: predicted tags for every word<\/code><\/pre>\n\n\n\n<p>The model is trained with the CRF layer&#8217;s own loss function and accuracy metric.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>model = Model(input_layer, out)\nmodel.compile(optimizer=\"rmsprop\", loss=crf.loss_function, metrics=&#91;crf.accuracy])<\/code><\/pre>\n\n\n\n<p>In the full code I also use a few tricks to save the processed data to a file and to check whether the model has already been trained before starting training; read the source for the details. 
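<\/p>\n\n\n\n<p>To make the role of the CRF layer more concrete: at prediction time, a linear-chain CRF does not pick the best tag for each word separately. It uses the Viterbi algorithm to find the highest-scoring tag sequence for the whole sentence. Below is a minimal NumPy sketch of that decoding step; it is not the implementation inside the CRF layer used in the model. The helper viterbi_decode, the emission scores (standing in for what the BiLSTM and Dense layers would produce), the transition matrix and the three tags are all hypothetical values chosen for illustration.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\n\ndef viterbi_decode(emissions, transitions):\n    # emissions&#91;t, j]: score of tag j for word t (e.g. from the BiLSTM + Dense layers)\n    # transitions&#91;i, j]: score of tag i being followed by tag j\n    n_words, n_tags = emissions.shape\n    score = emissions&#91;0].copy()  # best path score ending in each tag at word 0\n    backpointers = np.zeros((n_words, n_tags), dtype=int)\n    for t in range(1, n_words):\n        # candidate&#91;i, j]: best path ending in tag i at word t-1, extended with tag j at word t\n        candidate = score&#91;:, None] + transitions + emissions&#91;t]&#91;None, :]\n        backpointers&#91;t] = candidate.argmax(axis=0)\n        score = candidate.max(axis=0)\n    # Follow the back-pointers from the best final tag\n    path = &#91;int(score.argmax())]\n    for t in range(n_words - 1, 0, -1):\n        path.append(int(backpointers&#91;t, path&#91;-1]]))\n    return path&#91;::-1]\n\n# Hypothetical scores for a 2-word sentence and 3 tags: 0 = O, 1 = B-per, 2 = I-per\ntags = &#91;\"O\", \"B-per\", \"I-per\"]\nemissions = np.array(&#91;&#91;1.0, 0.9, 0.0],   # word 1: O slightly ahead of B-per\n                      &#91;0.0, 0.0, 1.0]])  # word 2: I-per clearly ahead\ntransitions = np.array(&#91;&#91;0.0, 0.0, -5.0],  # O followed by I-per: heavily penalized\n                        &#91;0.0, -1.0, 1.0],  # B-per followed by I-per: encouraged\n                        &#91;0.0, 0.0, 0.5]])\n\ngreedy = &#91;tags&#91;j] for j in emissions.argmax(axis=1)]\nbest = &#91;tags&#91;j] for j in viterbi_decode(emissions, transitions)]\nprint(greedy)  # &#91;'O', 'I-per']\nprint(best)    # &#91;'B-per', 'I-per']<\/code><\/pre>\n\n\n\n<p>A per-word argmax gives O followed by I-per, which is not a valid label sequence. The large O-to-I-per penalty in this made-up transition matrix makes Viterbi return B-per, I-per instead. In the real model, the CRF layer learns its transition scores during training.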
<\/p>\n\n\n\n<p>Here I only show the main part: training the model, with a checkpoint that saves the best weights:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\nfrom keras.callbacks import ModelCheckpoint\n\n# Save the weights whenever the validation loss improves\ncheckpoint = ModelCheckpoint(filepath='model.hdf5',\n                             verbose=0,\n                             mode='auto',\n                             save_best_only=True,\n                             monitor='val_loss')\n# batch_size and epochs are set in the full source\nhistory = model.fit(X_train, np.array(y_train), batch_size=batch_size, epochs=epochs,\n                    validation_split=0.1, callbacks=&#91;checkpoint])<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Part 4 &#8211; Testing the Named Entity Recognition model<\/h2>\n\n\n\n<p>After training, let&#8217;s evaluate the model on the test set and see how it does:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from seqeval.metrics import f1_score\n\n# Predict on the whole test set\ny_pred = model.predict(X_test)\ny_pred = np.argmax(y_pred, axis=-1)\ny_test_true = np.argmax(y_test, -1)\n\n# Map indices back to tag names and compute the entity-level F1-score\ny_pred = &#91;&#91;idx2tag&#91;i] for i in row] for row in y_pred]\ny_test_true = &#91;&#91;idx2tag&#91;i] for i in row] for row in y_test_true]\nprint(\"F1-score is : {:.1%}\".format(f1_score(y_test_true, y_pred)))<\/code><\/pre>\n\n\n\n<p>Here the F1-score is 83.1%, which is quite decent.<\/p>\n\n\n\n<p>We can also test on a random sentence from the test set to inspect the result:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Test on a random sentence from the 
test set\nidx = np.random.randint(0, X_test.shape&#91;0])\n\np = model.predict(np.array(&#91;X_test&#91;idx]]))\np = np.argmax(p, axis=-1)\n# True tags of the same sentence\ntrue = np.argmax(y_test&#91;idx], -1)\n\nprint(\"Example #{}\".format(idx))\n\nprint(\"{:15}||{:5}||{}\".format(\"Word\", \"True\", \"Pred\"))\nprint(40 * \"*\")\nfor w, t, pred in zip(X_test&#91;idx], true, p&#91;0]):\n    if w != 0:  # skip padding positions\n        print(\"{:15}: {:5} {}\".format(idx2word&#91;w], idx2tag&#91;t], idx2tag&#91;pred]))<\/code><\/pre>\n\n\n\n<p>The printed result looks good and matches the true labels closely:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Example #6198\nWord           ||True ||Pred\n****************************************\nTickets        : O     O\nfor            : O     O\nthe            : O     O\nso-called      : O     O\n\"              : O     O\nFootball       : O     O\nfor            : O     O\nHope           : O     O\n\"              : O     O\nmatch          : O     O\nFebruary       : B-tim B-tim\n15             : I-tim I-tim\nin             : O     O\nBarcelona      : B-geo B-geo\nwill           : O     O\ncost           : O     O\nbetween        : O     O\n$              : O     O\n13             : O     O\n-              : O     O\n$              : O     O\n38             : O     O\n.              
: O     O<\/code><\/pre>\n\n\n\n<p>Notice that February 15 is correctly recognized as a time expression (B-tim, I-tim) and Barcelona as B-geo, that is, a Geographical Entity.<\/p>\n\n\n\n<p>You can download the full source code and data from my GitHub <strong><a href=\"https:\/\/github.com\/thangnch\/MIAI_NER\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">here<\/a><\/strong>!<\/p>\n\n\n\n<p>Okay, so now you know how to train your own NER (Named Entity Recognition) model that runs and performs well. 
If you get stuck anywhere, just post in the discussion Group: <a href=\"https:\/\/facebook.com\/groups\/miaigroup\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>https:\/\/facebook.com\/groups\/miaigroup<\/strong><\/a> and let&#8217;s talk it over.<\/p>\n\n\n\n<p>See you in the next posts!<\/p>\n\n\n\n<p>Good luck!<\/p>\n\n\n\n<p>Fanpage:&nbsp;<strong><a href=\"http:\/\/facebook.com\/miaiblog\" target=\"_blank\" rel=\"noreferrer noopener\">http:\/\/facebook.com\/miaiblog<\/a><\/strong><br>Discussion group:&nbsp;<a href=\"https:\/\/www.facebook.com\/groups\/miaigroup\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>https:\/\/www.facebook.com\/groups\/miaigroup<\/strong><\/a><br>Website:&nbsp;<a href=\"https:\/\/miai.vn\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>https:\/\/miai.vn\/<\/strong><\/a><br>Youtube:&nbsp;<strong><a href=\"http:\/\/bit.ly\/miaiyoutube\" target=\"_blank\" rel=\"noreferrer noopener\">http:\/\/bit.ly\/miaiyoutube<\/a><\/strong><\/p>\n\n\n\n<p>Many thanks to the excellent reference <a href=\"https:\/\/github.com\/Akshayc1\/named-entity-recognition\/blob\/master\/NER%20using%20Bidirectional%20LSTM%20-%20CRF%20.ipynb\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Happy new week, dear M\u00ec members! Today we are going to explore Named Entity Recognition 
&#8211; recognizing entities in sentences in natural language processing. This dish usually goes by the name NER. Posts on NLP (natural language [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[67],"tags":[63,374,68,375,376,69,377,378,32],"class_list":["post-1123","post","type-post","status-publish","format-standard","hentry","category-natural-language-processing","tag-lstm","tag-named-entity-recognition","tag-nature-language-processing","tag-ner","tag-nhan-dien-thuc-the-trong-cau","tag-nlp","tag-part-of-speech","tag-pos","tag-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Named Entity Recognition (NER) - Recognizing entities in sentences - M\u00ec AI<\/title>\n<meta name=\"description\" content=\"Today we will explore Named Entity Recognition (NER) - recognizing entities in sentences in natural language processing\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/miai.vn\/?p=1123\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Named Entity Recognition (NER) - Recognizing entities in sentences - M\u00ec AI\" \/>\n<meta property=\"og:description\" content=\"Today we will explore Named Entity Recognition (NER) - recognizing entities in sentences in 
natural language processing\" \/>\n<meta property=\"og:url\" content=\"https:\/\/miai.vn\/?p=1123\" \/>\n<meta property=\"og:site_name\" content=\"M\u00ec AI\" \/>\n<meta property=\"article:published_time\" content=\"2020-08-16T06:17:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/d2ueix13hy5h3i.cloudfront.net\/wp-content\/uploads\/2019\/06\/3.png\" \/>\n<meta name=\"author\" content=\"Ch\u1ee7 ti\u1ec7m M\u00ec\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ch\u1ee7 ti\u1ec7m M\u00ec\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123\"},\"author\":{\"name\":\"Ch\u1ee7 ti\u1ec7m M\u00ec\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#\\\/schema\\\/person\\\/cc8bc24bb90bd3f596add82f3a59948c\"},\"headline\":\"Named Entity Recognition &#8211; Recognizing entities in sentences in natural language processing\",\"datePublished\":\"2020-08-16T06:17:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123\"},\"wordCount\":2260,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/d2ueix13hy5h3i.cloudfront.net\\\/wp-content\\\/uploads\\\/2019\\\/06\\\/3.png\",\"keywords\":[\"LSTM\",\"Named Entity Recognition\",\"Nature Language Processing\",\"NER\",\"entity recognition in sentences\",\"NLP\",\"Part of Speech\",\"POS\",\"python\"],\"articleSection\":[\"Natural 
Language Processing\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/miai.vn\\\/?p=1123#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123\",\"url\":\"https:\\\/\\\/miai.vn\\\/?p=1123\",\"name\":\"Named Entity Recognition (NER) - Recognizing entities in sentences - M\u00ec AI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/d2ueix13hy5h3i.cloudfront.net\\\/wp-content\\\/uploads\\\/2019\\\/06\\\/3.png\",\"datePublished\":\"2020-08-16T06:17:13+00:00\",\"description\":\"Today we will explore Named Entity Recognition (NER) - recognizing entities in sentences in natural language processing\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/miai.vn\\\/?p=1123\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123#primaryimage\",\"url\":\"https:\\\/\\\/d2ueix13hy5h3i.cloudfront.net\\\/wp-content\\\/uploads\\\/2019\\\/06\\\/3.png\",\"contentUrl\":\"https:\\\/\\\/d2ueix13hy5h3i.cloudfront.net\\\/wp-content\\\/uploads\\\/2019\\\/06\\\/3.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=1123#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/miai.vn\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Named Entity Recognition &#8211; Recognizing entities in sentences in natural language processing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#website\",\"url\":\"https:\\\/\\\/miai.vn\\\/\",\"name\":\"M\u00ec AI\",\"description\":\"Learn AI the instant-noodle way!\",\"publisher\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/miai.vn\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#organization\",\"name\":\"M\u00ec AI\",\"url\":\"https:\\\/\\\/miai.vn\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/miai.vn\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/cropped-Logo_w_slogan.png\",\"contentUrl\":\"https:\\\/\\\/miai.vn\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/cropped-Logo_w_slogan.png\",\"width\":240,\"height\":193,\"caption\":\"M\u00ec AI\"},\"image\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#\\\/schema\\\/person\\\/cc8bc24bb90bd3f596add82f3a59948c\",\"name\":\"Ch\u1ee7 ti\u1ec7m M\u00ec\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g\",\"caption\":\"Ch\u1ee7 ti\u1ec7m 
M\u00ec\"},\"sameAs\":[\"https:\\\/\\\/miai.vn\"],\"url\":\"https:\\\/\\\/miai.vn\\\/?author=1\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Named Entity Recognition (NER) - Nh\u1eadn di\u1ec7n th\u1ef1c th\u1ec3 trong c\u00e2u - M\u00ec AI","description":"H\u00f4m nay ch\u00fang ta s\u1ebd c\u00f9ng \u0111i t\u00ecm hi\u1ec3u v\u1ec1 Named Entity Recognition(NER) - Nh\u1eadn di\u1ec7n th\u1ef1c th\u1ec3 trong c\u00e2u khi x\u1eed l\u00fd ng\u00f4n ng\u1eef t\u1ef1 nhi\u00ean nh\u00e9","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/miai.vn\/?p=1123","og_locale":"en_US","og_type":"article","og_title":"Named Entity Recognition (NER) - Nh\u1eadn di\u1ec7n th\u1ef1c th\u1ec3 trong c\u00e2u - M\u00ec AI","og_description":"H\u00f4m nay ch\u00fang ta s\u1ebd c\u00f9ng \u0111i t\u00ecm hi\u1ec3u v\u1ec1 Named Entity Recognition(NER) - Nh\u1eadn di\u1ec7n th\u1ef1c th\u1ec3 trong c\u00e2u khi x\u1eed l\u00fd ng\u00f4n ng\u1eef t\u1ef1 nhi\u00ean nh\u00e9","og_url":"https:\/\/miai.vn\/?p=1123","og_site_name":"M\u00ec AI","article_published_time":"2020-08-16T06:17:13+00:00","og_image":[{"url":"https:\/\/d2ueix13hy5h3i.cloudfront.net\/wp-content\/uploads\/2019\/06\/3.png","type":"","width":"","height":""}],"author":"Ch\u1ee7 ti\u1ec7m M\u00ec","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Ch\u1ee7 ti\u1ec7m M\u00ec","Est. 
reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/miai.vn\/?p=1123#article","isPartOf":{"@id":"https:\/\/miai.vn\/?p=1123"},"author":{"name":"Ch\u1ee7 ti\u1ec7m M\u00ec","@id":"https:\/\/miai.vn\/#\/schema\/person\/cc8bc24bb90bd3f596add82f3a59948c"},"headline":"Named Entity Recognition &#8211; Nh\u1eadn di\u1ec7n th\u1ef1c th\u1ec3 trong c\u00e2u khi x\u1eed l\u00fd ng\u00f4n ng\u1eef t\u1ef1 nhi\u00ean","datePublished":"2020-08-16T06:17:13+00:00","mainEntityOfPage":{"@id":"https:\/\/miai.vn\/?p=1123"},"wordCount":2260,"commentCount":4,"publisher":{"@id":"https:\/\/miai.vn\/#organization"},"image":{"@id":"https:\/\/miai.vn\/?p=1123#primaryimage"},"thumbnailUrl":"https:\/\/d2ueix13hy5h3i.cloudfront.net\/wp-content\/uploads\/2019\/06\/3.png","keywords":["LSTM","Named Entity Recognition","Nature Language Processing","NER","nh\u1eadn di\u1ec7n th\u1ef1c th\u1ec3 trong c\u00e2u","NLP","Part of Speech","POS","python"],"articleSection":["Natural Language Processing"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/miai.vn\/?p=1123#respond"]}]},{"@type":"WebPage","@id":"https:\/\/miai.vn\/?p=1123","url":"https:\/\/miai.vn\/?p=1123","name":"Named Entity Recognition (NER) - Nh\u1eadn di\u1ec7n th\u1ef1c th\u1ec3 trong c\u00e2u - M\u00ec AI","isPartOf":{"@id":"https:\/\/miai.vn\/#website"},"primaryImageOfPage":{"@id":"https:\/\/miai.vn\/?p=1123#primaryimage"},"image":{"@id":"https:\/\/miai.vn\/?p=1123#primaryimage"},"thumbnailUrl":"https:\/\/d2ueix13hy5h3i.cloudfront.net\/wp-content\/uploads\/2019\/06\/3.png","datePublished":"2020-08-16T06:17:13+00:00","description":"H\u00f4m nay ch\u00fang ta s\u1ebd c\u00f9ng \u0111i t\u00ecm hi\u1ec3u v\u1ec1 Named Entity Recognition(NER) - Nh\u1eadn di\u1ec7n th\u1ef1c th\u1ec3 trong c\u00e2u khi x\u1eed l\u00fd ng\u00f4n ng\u1eef t\u1ef1 nhi\u00ean 
nh\u00e9","breadcrumb":{"@id":"https:\/\/miai.vn\/?p=1123#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/miai.vn\/?p=1123"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/miai.vn\/?p=1123#primaryimage","url":"https:\/\/d2ueix13hy5h3i.cloudfront.net\/wp-content\/uploads\/2019\/06\/3.png","contentUrl":"https:\/\/d2ueix13hy5h3i.cloudfront.net\/wp-content\/uploads\/2019\/06\/3.png"},{"@type":"BreadcrumbList","@id":"https:\/\/miai.vn\/?p=1123#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/miai.vn\/"},{"@type":"ListItem","position":2,"name":"Named Entity Recognition &#8211; Nh\u1eadn di\u1ec7n th\u1ef1c th\u1ec3 trong c\u00e2u khi x\u1eed l\u00fd ng\u00f4n ng\u1eef t\u1ef1 nhi\u00ean"}]},{"@type":"WebSite","@id":"https:\/\/miai.vn\/#website","url":"https:\/\/miai.vn\/","name":"M\u00ec AI","description":"H\u1ecdc AI theo c\u00e1ch M\u00ec \u0103n li\u1ec1n!","publisher":{"@id":"https:\/\/miai.vn\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/miai.vn\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/miai.vn\/#organization","name":"M\u00ec AI","url":"https:\/\/miai.vn\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/miai.vn\/#\/schema\/logo\/image\/","url":"https:\/\/miai.vn\/wp-content\/uploads\/2026\/05\/cropped-Logo_w_slogan.png","contentUrl":"https:\/\/miai.vn\/wp-content\/uploads\/2026\/05\/cropped-Logo_w_slogan.png","width":240,"height":193,"caption":"M\u00ec AI"},"image":{"@id":"https:\/\/miai.vn\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/miai.vn\/#\/schema\/person\/cc8bc24bb90bd3f596add82f3a59948c","name":"Ch\u1ee7 ti\u1ec7m 
M\u00ec","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g","caption":"Ch\u1ee7 ti\u1ec7m M\u00ec"},"sameAs":["https:\/\/miai.vn"],"url":"https:\/\/miai.vn\/?author=1"}]}},"_links":{"self":[{"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/posts\/1123","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/miai.vn\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1123"}],"version-history":[{"count":0,"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/posts\/1123\/revisions"}],"wp:attachment":[{"href":"https:\/\/miai.vn\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1123"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/miai.vn\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1123"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/miai.vn\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1123"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}