{"id":864,"date":"2020-05-11T10:33:41","date_gmt":"2020-05-11T03:33:41","guid":{"rendered":"http:\/\/www.miai.vn\/?p=864"},"modified":"2020-05-11T10:33:41","modified_gmt":"2020-05-11T03:33:41","slug":"nlp-series-2-su-dung-gensim-word2vec-de-day-may-tinh-ngui-van-phan-loai-van-ban","status":"publish","type":"post","link":"https:\/\/miai.vn\/?p=864","title":{"rendered":"[NLP Series #2] Using Gensim Word2Vec to teach a computer to &#8220;sniff&#8221; text (text classification)"},"content":{"rendered":"\n<p>In this post we will use a new word embedding method, Word2Vec (through Gensim), to build feature vectors for words and train a model that lets the computer &#8220;sniff&#8221; text (text classification).<\/p>\n\n\n\n<p>This is the second post in the NLP series; the previous post showed how to use TF-IDF to represent text, <strong><a rel=\"noreferrer noopener\" href=\"https:\/\/www.miai.vn\/2020\/05\/04\/nlp-series-1-thu-lam-he-thong-danh-gia-san-pham-lazada\/\" target=\"_blank\">available here.<\/a><\/strong><\/p>\n\n\n\n<p>You probably read the folk joke &#8220;Ng\u1eedi v\u0103n&#8221; (sniffing an essay) as a kid, right? 
If you haven't, take a look at the image below, where I quote it:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true\" alt=\"\"\/><\/figure>\n\n\n\n<p>Joking aside, here is the task: given a sentence, the computer must tell whether it is economics, education, or health news (I only train these 3 categories as a sample).<\/p>\n\n\n\n<p>In this post you will learn the following techniques:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Preprocessing the input text: splitting sentences, removing punctuation,&#8230;<\/li><li>Training a Word2vec model and building embedding vectors for the sentences<\/li><li>Training an LSTM model to classify text<\/li><\/ul>\n\n\n\n<p>Advantages of the Word2vec model:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>It is fairly fast<\/li><li>It captures how similar words are, and the direction of that similarity is also stored in the model<\/li><\/ul>\n\n\n\n<p>If you want to dig deeper into this topic, see this <strong><a href=\"https:\/\/pathmind.com\/wiki\/word2vec\" target=\"_blank\" rel=\"noreferrer noopener\">link<\/a><\/strong>!<\/p>\n\n\n\n<p>Go ahead man!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 1 &#8211; Preparing the ingredients<\/h2>\n\n\n\n<p>First things first: let's get the source code so that everything is explicit. Clone it with the familiar git command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>git clone https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo<\/code><\/pre>\n\n\n\n<p>When it finishes you will see a new MiAI_Word2Vec_Demo folder, which holds the source code.<\/p>\n\n\n\n<p>Next, install the required libraries. Change into the MiAI_Word2Vec_Demo folder and run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install -r setup.txt<\/code><\/pre>\n\n\n\n<p>After a wall of text scrolls by, libraries such as tensorflow, keras, gensim&#8230; will be installed.<\/p>\n\n\n\n<p>OK! Let's set the source code aside for now. 
First, let's walk through the algorithm and the approach used in this post.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 2 &#8211; The approach<\/h2>\n\n\n\n<p>Every DL task has the two familiar phases, Train and Test. This one is no exception:<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Training phase<\/h5>\n\n\n\n<ul class=\"wp-block-list\"><li>Step 1: Prepare text data for 3 domains: Economics, Education, and Health. Look in the data folder and you will see those text files. This is only a small excerpt of the data for the example; the full text dataset is in the M\u00ec AI Library: <strong><a href=\"https:\/\/miai.vn\/thu-vien-mi-ai\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/miai.vn\/thu-vien-mi-ai <\/a><\/strong>(watch the video clip to see how to download it)<\/li><li>Step 2: Preprocess the input text and split it into individual sentences.<\/li><li>Step 3: Tokenize the text and train a Word2Vec model (from scratch, without pretrained weights) on the input data. 
The goal is to produce a feature vector for each word in our text.<\/li><li>Step 4: Feed the input data into a simple LSTM network for training, keeping in mind:<ul><li>Input: an Embedding layer whose weights are the weights of the Word2vec model we just trained. This layer turns the tokenized input text (put simply; strictly speaking, the input text with every word replaced by its word index) into an embedding matrix.<\/li><li>Output: the good old Softmax layer, with the 3 classes we need as output.<\/li><\/ul><\/li><\/ul>\n\n\n\n<h5 class=\"wp-block-heading\">Testing phase:<\/h5>\n\n\n\n<ul class=\"wp-block-list\"><li>To keep things simple, we evaluate directly on the test set to see how well our model predicts.<\/li><li>You can easily modify the source code to predict any sentence you like. <\/li><\/ul>\n\n\n\n<p>That's it for now. If anything is unclear, post in the Group: <strong><a rel=\"noreferrer noopener\" href=\"https:\/\/facebook.com\/groups\/miaigroup\" target=\"_blank\">https:\/\/facebook.com\/groups\/miaigroup<\/a><\/strong> to discuss and ask questions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 3 &#8211; Implementing the text classification algorithm<\/h2>\n\n\n\n<p>Now let's go through the main pieces of source code in the program. The steps in this part follow the flow described above, so make sure you understand those steps before starting.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/blog.francoismaillet.com\/wp-content\/uploads\/2015\/10\/map.png\" alt=\"text classification\"\/><\/figure>\n\n\n\n<h5 class=\"wp-block-heading\">Step 1. 
Load data from the text files in the data folder<\/h5>\n\n\n\n<p>Here we load the text, preprocess it, split it into sentences, and generate labels for the sentences, all in one go.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import re\nfrom os import listdir, sep\n\ndef preProcess(sentences):\n    # Drop punctuation and underscores, skipping empty sentences\n    text = &#91;re.sub(r'(&#91;^\\s\\w]|_)+', '', sentence) for sentence in sentences if sentence!='']\n    # Lowercase and trim surrounding whitespace\n    text = &#91;sentence.lower().strip() for sentence in text]\n    return text\n\ndef loadData(data_folder):\n    texts = &#91;]\n    labels = &#91;]\n    # Each sub-folder of data_folder is one category\n    for folder in listdir(data_folder):\n        if folder != \".DS_Store\":\n            print(\"Load cat: \",folder)\n            for file in listdir(data_folder + sep + folder):\n                if file!=\".DS_Store\":\n                    print(\"Load file: \", file)\n                    with open(data_folder + sep + folder + sep +  file, 'r', encoding=\"utf-8\") as f:\n                        all_of_it = f.read()\n                        # Naive sentence split on '.'\n                        sentences  = all_of_it.split('.')\n\n                        # Remove garbage\n                        sentences = preProcess(sentences)\n\n                        # The folder name is the label of every sentence in the file\n                        texts = texts + sentences\n                        label = &#91;folder for _ in sentences]\n                        labels = labels + label\n                        del all_of_it, sentences\n\n    return texts, labels<\/code><\/pre>\n\n\n\n<p>After these 2 functions we have a list of sentences together with a label saying which category each sentence belongs to. 
Moving on!<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Step 2. Tokenize the sentences<\/h5>\n\n\n\n<p>This step converts the sentences from a list of strings into a list of numbers. For example, it turns the sentence:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> &#91;'h\u00f4m nay ch\u00fang ta h\u1ecdc x\u1eed l\u00fd ng\u00f4n ng\u1eef t\u1ef1 nhi\u00ean']<\/code><\/pre>\n\n\n\n<p>into (just a dummy example, not the real output haha )<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91; 20 1 22 31 34 22 1 12 67 43 22 14...] <\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\nfrom keras.preprocessing.text import Tokenizer\nfrom keras.preprocessing.sequence import pad_sequences\n\ndef txtTokenizer(texts):\n    # cap the vocabulary used when converting text to sequences\n    tokenizer = Tokenizer(num_words=500)\n    # fit the tokenizer on our text\n    tokenizer.fit_on_texts(&#91;text.split() for text in texts])\n\n    # get all words that the tokenizer knows\n    word_index = tokenizer.word_index\n    return tokenizer, word_index\n\ntokenizer, word_index = txtTokenizer(texts)\n\n# put the tokens in a matrix\nX = tokenizer.texts_to_sequences(texts)\nX = pad_sequences(X)\n\n# prepare the labels\ny = pd.get_dummies(labels)<\/code><\/pre>\n\n\n\n<p>For the output labels, we use pandas' get_dummies function to convert them from the form [&#8216;Kinh t\u1ebf&#8217;,&#8217;Gi\u00e1o d\u1ee5c&#8217;,&#8217;Kinh t\u1ebf&#8217;] into one-hot vectors.<\/p>\n\n\n\n<p>Right, on to step 3!<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Step 3 &#8211; Training 
Word2vec<\/h5>\n\n\n\n<p>This part is quite simple because the Gensim library does the work. All we need is the following short snippet:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import gensim\n# gensim 3.x API (size\/iter became vector_size\/epochs in 4.0); each sentence must be a list of tokens\nword_model = gensim.models.Word2Vec(&#91;text.split() for text in texts], size=300, min_count=1, iter=10)\nword_model.save(data_folder + sep + \"word_model.save\")<\/code><\/pre>\n\n\n\n<p>Once training is done, we can check the result with the following command, which looks up the words most related to &#8216;c\u01a1m&#8217; (cooked rice):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>print(word_model.wv.most_similar('c\u01a1m'))<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;('n\u1ea5u', 0.7218972444534302), ('ch\u00e1o', 0.6976884603500366), ('n\u01b0\u1edbng', 0.6886948347091675), ('b\u00fan', 0.6797002553939819), ('ph\u1edf', 0.6455461978912354), ('ri\u00eau', 0.6107968091964722), ('n\u00e1t', 0.6087987422943115), ('m\u00ec', 0.607367992401123), ('x\u00e0o', 0.6070189476013184), ('r\u00e1n', 0.587298572063446)]<\/code><\/pre>\n\n\n\n<p>Haha, pretty accurate: &#8216;n\u1ea5u&#8217; (to cook), &#8216;ch\u00e1o&#8217; (rice porridge)&#8230; all closely tied to rice!<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Step 4 &#8211; Training the text classification model<\/h5>\n\n\n\n<p>Right, all the pieces are ready, so the rest is simple: put them together and feed them into our model for training. 
Here I use just a simple LSTM model, since it already works quite well.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from keras.models import Sequential\nfrom keras.layers import Embedding, LSTM, Dense\n\n# embedding_matrix holds the Word2vec vectors, one 300-d row per word index; building it is not shown in this post\nmodel = Sequential()\nmodel.add(Embedding(len(word_model.wv.vocab)+1,300,input_length=X.shape&#91;1],weights=&#91;embedding_matrix],trainable=False))\nmodel.add(LSTM(300,return_sequences=False))\nmodel.add(Dense(y.shape&#91;1],activation=\"softmax\"))<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/devopedia.org\/images\/article\/221\/4080.1570464995.png\" alt=\"text classification\"\/><\/figure>\n\n\n\n<p>The dataset is fairly large, so after just 1 epoch of training the accuracy and loss are already quite good. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Untitled.png?raw=true\" alt=\"text classification\"\/><\/figure>\n\n\n\n<p>Finally, let's evaluate on the test set:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>model.evaluate(X_test,y_test)<\/code><\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img decoding=\"async\" src=\"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Untitled1.png?raw=true\" alt=\"\"\/><\/figure><\/div>\n\n\n\n<p>The result is an <strong>accuracy of 0.77<\/strong> &#8211; not bad, folks!<\/p>\n\n\n\n<p>If I keep training for about 5 more epochs, the train accuracy climbs to <strong>0.95<\/strong>! Wow. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Untitled2.png?raw=true\" alt=\"text classify\"\/><\/figure>\n\n\n\n<p>And evaluating on the test set again also gives 0.95 accuracy. Very solid!<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Untitled3.png?raw=true\" alt=\"text classification\"\/><\/figure>\n\n\n\n<p>You can run train_model.py yourself to check the results, or modify it to predict a few samples from the test set :D. You can also add or remove data, or change the network architecture, to see how the results change!<\/p>\n\n\n\n<p>One more question: can the accuracy be improved further? The answer is yes! You can apply more preprocessing (dropping digits, handling misspelled words, words stuck together&#8230;) or swap in a tokenizer that suits Vietnamese better (spoiler: later posts will cover this too). <\/p>\n\n\n\n<p>For now, I will stop here. Today we learned quite a bit about Word2Vec. 
Upcoming posts will cover more topics such as Doc2Vec, Bert&#8230;. See you then!<\/p>\n\n\n\n<p><strong><em>Come join the M\u00ec AI community!<\/em><\/strong><\/p>\n\n\n\n<p>Fanpage:&nbsp;<a rel=\"noreferrer noopener\" href=\"http:\/\/facebook.com\/miaiblog\" target=\"_blank\">http:\/\/facebook.com\/miaiblog<\/a><br>Discussion group:&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/www.facebook.com\/groups\/miaigroup\" target=\"_blank\">https:\/\/www.facebook.com\/groups\/miaigroup<\/a><br>Website:&nbsp;<a href=\"https:\/\/miai.vn\/\">https:\/\/miai.vn\/<\/a><br>Youtube:&nbsp;<a rel=\"noreferrer noopener\" href=\"http:\/\/bit.ly\/miaiyoutube\" target=\"_blank\">http:\/\/bit.ly\/miaiyoutube<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post we will use a new word embedding method, Word2Vec (through Gensim), to build feature vectors for words and train a model that lets the computer &#8220;sniff&#8221; text (text classification). 
B\u00e0i n\u00e0y l\u00e0 b\u00e0i th\u1ee9 2 trong Series v\u1ec1 NLP, b\u00e0i tr\u01b0\u1edbc m\u00ecnh \u0111\u00e3 chia s\u1ebb c\u00e1ch [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[67],"tags":[266,267,68,69,268,269,64,270,264,265],"class_list":["post-864","post","type-post","status-publish","format-standard","hentry","category-natural-language-processing","tag-document-classify","tag-gensim","tag-nature-language-processing","tag-nlp","tag-phan-loai-van-ban","tag-text-classify","tag-word-embeding","tag-word2vec","tag-xu-ly-ngon-ngu","tag-xu-ly-ngon-ngu-tu-nhien"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Th\u1eed x\u00e2y d\u1ef1ng model &quot;ng\u1eedi&quot; (ph\u00e2n lo\u1ea1i) v\u0103n b\u1ea3n d\u00f9ng Word2Vec - M\u00ec AI<\/title>\n<meta name=\"description\" content=\"Trong b\u00e0i n\u00e0y ch\u00fang ta s\u1ebd d\u00f9ng Gensim Word2Vec \u0111\u1ec3 t\u1ea1o vector \u0111\u1eb7c tr\u01b0ng cho c\u00e1c t\u1eeb v\u00e0 train model &quot;ng\u1eedi v\u0103n&quot; (ph\u00e2n lo\u1ea1i v\u0103n b\u1ea3n) cho m\u00e1y t\u00ednh nh\u00e9.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/miai.vn\/?p=864\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Th\u1eed x\u00e2y d\u1ef1ng model &quot;ng\u1eedi&quot; (ph\u00e2n lo\u1ea1i) v\u0103n b\u1ea3n d\u00f9ng Word2Vec - M\u00ec AI\" \/>\n<meta property=\"og:description\" content=\"Trong b\u00e0i n\u00e0y ch\u00fang ta s\u1ebd d\u00f9ng Gensim Word2Vec \u0111\u1ec3 t\u1ea1o vector \u0111\u1eb7c tr\u01b0ng cho c\u00e1c t\u1eeb v\u00e0 train model 
&quot;ng\u1eedi v\u0103n&quot; (ph\u00e2n lo\u1ea1i v\u0103n b\u1ea3n) cho m\u00e1y t\u00ednh nh\u00e9.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/miai.vn\/?p=864\" \/>\n<meta property=\"og:site_name\" content=\"M\u00ec AI\" \/>\n<meta property=\"article:published_time\" content=\"2020-05-11T03:33:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true\" \/>\n<meta name=\"author\" content=\"Ch\u1ee7 ti\u1ec7m M\u00ec\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ch\u1ee7 ti\u1ec7m M\u00ec\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864\"},\"author\":{\"name\":\"Ch\u1ee7 ti\u1ec7m M\u00ec\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#\\\/schema\\\/person\\\/cc8bc24bb90bd3f596add82f3a59948c\"},\"headline\":\"[NLP Series #2] S\u1eed d\u1ee5ng Gensim Word2Vec \u0111\u1ec3 d\u1ea1y m\u00e1y t\u00ednh &#8220;ng\u1eedi&#8221; v\u0103n (ph\u00e2n lo\u1ea1i v\u0103n b\u1ea3n)\",\"datePublished\":\"2020-05-11T03:33:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864\"},\"wordCount\":1852,\"commentCount\":5,\"publisher\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/github.com\\\/thangnch\\\/MiAI_Word2Vec_Demo\\\/blob\\\/master\\\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true\",\"keywords\":[\"document classify\",\"gensim\",\"Nature Language 
Processing\",\"NLP\",\"ph\u00e2n lo\u1ea1i v\u0103n b\u1ea3n\",\"text classify\",\"word embeding\",\"word2vec\",\"x\u1eed l\u00fd ng\u00f4n ng\u1eef\",\"X\u1eed l\u00fd ng\u00f4n ng\u1eef t\u1ef1 nhi\u00ean\"],\"articleSection\":[\"Natural Language Processing\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/miai.vn\\\/?p=864#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864\",\"url\":\"https:\\\/\\\/miai.vn\\\/?p=864\",\"name\":\"Th\u1eed x\u00e2y d\u1ef1ng model \\\"ng\u1eedi\\\" (ph\u00e2n lo\u1ea1i) v\u0103n b\u1ea3n d\u00f9ng Word2Vec - M\u00ec AI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/github.com\\\/thangnch\\\/MiAI_Word2Vec_Demo\\\/blob\\\/master\\\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true\",\"datePublished\":\"2020-05-11T03:33:41+00:00\",\"description\":\"Trong b\u00e0i n\u00e0y ch\u00fang ta s\u1ebd d\u00f9ng Gensim Word2Vec \u0111\u1ec3 t\u1ea1o vector \u0111\u1eb7c tr\u01b0ng cho c\u00e1c t\u1eeb v\u00e0 train model \\\"ng\u1eedi v\u0103n\\\" (ph\u00e2n lo\u1ea1i v\u0103n b\u1ea3n) cho m\u00e1y t\u00ednh 
nh\u00e9.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/miai.vn\\\/?p=864\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864#primaryimage\",\"url\":\"https:\\\/\\\/github.com\\\/thangnch\\\/MiAI_Word2Vec_Demo\\\/blob\\\/master\\\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true\",\"contentUrl\":\"https:\\\/\\\/github.com\\\/thangnch\\\/MiAI_Word2Vec_Demo\\\/blob\\\/master\\\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/miai.vn\\\/?p=864#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/miai.vn\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[NLP Series #2] S\u1eed d\u1ee5ng Gensim Word2Vec \u0111\u1ec3 d\u1ea1y m\u00e1y t\u00ednh &#8220;ng\u1eedi&#8221; v\u0103n (ph\u00e2n lo\u1ea1i v\u0103n b\u1ea3n)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#website\",\"url\":\"https:\\\/\\\/miai.vn\\\/\",\"name\":\"M\u00ec AI\",\"description\":\"H\u1ecdc AI theo c\u00e1ch M\u00ec \u0103n li\u1ec1n!\",\"publisher\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/miai.vn\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#organization\",\"name\":\"M\u00ec 
AI\",\"url\":\"https:\\\/\\\/miai.vn\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/miai.vn\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/cropped-Logo_w_slogan.png\",\"contentUrl\":\"https:\\\/\\\/miai.vn\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/cropped-Logo_w_slogan.png\",\"width\":240,\"height\":193,\"caption\":\"M\u00ec AI\"},\"image\":{\"@id\":\"https:\\\/\\\/miai.vn\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/miai.vn\\\/#\\\/schema\\\/person\\\/cc8bc24bb90bd3f596add82f3a59948c\",\"name\":\"Ch\u1ee7 ti\u1ec7m M\u00ec\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g\",\"caption\":\"Ch\u1ee7 ti\u1ec7m M\u00ec\"},\"sameAs\":[\"https:\\\/\\\/miai.vn\"],\"url\":\"https:\\\/\\\/miai.vn\\\/?author=1\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Th\u1eed x\u00e2y d\u1ef1ng model \"ng\u1eedi\" (ph\u00e2n lo\u1ea1i) v\u0103n b\u1ea3n d\u00f9ng Word2Vec - M\u00ec AI","description":"Trong b\u00e0i n\u00e0y ch\u00fang ta s\u1ebd d\u00f9ng Gensim Word2Vec \u0111\u1ec3 t\u1ea1o vector \u0111\u1eb7c tr\u01b0ng cho c\u00e1c t\u1eeb v\u00e0 train model \"ng\u1eedi v\u0103n\" (ph\u00e2n lo\u1ea1i v\u0103n b\u1ea3n) cho m\u00e1y t\u00ednh nh\u00e9.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/miai.vn\/?p=864","og_locale":"en_US","og_type":"article","og_title":"Th\u1eed x\u00e2y d\u1ef1ng model \"ng\u1eedi\" (ph\u00e2n lo\u1ea1i) v\u0103n b\u1ea3n d\u00f9ng Word2Vec - M\u00ec AI","og_description":"Trong b\u00e0i n\u00e0y ch\u00fang ta s\u1ebd d\u00f9ng Gensim Word2Vec \u0111\u1ec3 t\u1ea1o vector \u0111\u1eb7c tr\u01b0ng cho c\u00e1c t\u1eeb v\u00e0 train model \"ng\u1eedi v\u0103n\" (ph\u00e2n lo\u1ea1i v\u0103n b\u1ea3n) cho m\u00e1y t\u00ednh nh\u00e9.","og_url":"https:\/\/miai.vn\/?p=864","og_site_name":"M\u00ec AI","article_published_time":"2020-05-11T03:33:41+00:00","og_image":[{"url":"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true","type":"","width":"","height":""}],"author":"Ch\u1ee7 ti\u1ec7m M\u00ec","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Ch\u1ee7 ti\u1ec7m M\u00ec","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/miai.vn\/?p=864#article","isPartOf":{"@id":"https:\/\/miai.vn\/?p=864"},"author":{"name":"Ch\u1ee7 ti\u1ec7m M\u00ec","@id":"https:\/\/miai.vn\/#\/schema\/person\/cc8bc24bb90bd3f596add82f3a59948c"},"headline":"[NLP Series #2] Using Gensim Word2Vec to teach a computer to &#8220;sniff&#8221; text (text classification)","datePublished":"2020-05-11T03:33:41+00:00","mainEntityOfPage":{"@id":"https:\/\/miai.vn\/?p=864"},"wordCount":1852,"commentCount":5,"publisher":{"@id":"https:\/\/miai.vn\/#organization"},"image":{"@id":"https:\/\/miai.vn\/?p=864#primaryimage"},"thumbnailUrl":"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true","keywords":["document classify","gensim","Nature Language Processing","NLP","ph\u00e2n lo\u1ea1i v\u0103n b\u1ea3n","text classify","word embeding","word2vec","x\u1eed l\u00fd ng\u00f4n ng\u1eef","X\u1eed l\u00fd ng\u00f4n ng\u1eef t\u1ef1 nhi\u00ean"],"articleSection":["Natural Language Processing"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/miai.vn\/?p=864#respond"]}]},{"@type":"WebPage","@id":"https:\/\/miai.vn\/?p=864","url":"https:\/\/miai.vn\/?p=864","name":"Trying to build a text \"sniffing\" (classification) model with Word2Vec - M\u00ec AI","isPartOf":{"@id":"https:\/\/miai.vn\/#website"},"primaryImageOfPage":{"@id":"https:\/\/miai.vn\/?p=864#primaryimage"},"image":{"@id":"https:\/\/miai.vn\/?p=864#primaryimage"},"thumbnailUrl":"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true","datePublished":"2020-05-11T03:33:41+00:00","description":"In this post we will use Gensim Word2Vec to create 
feature vectors for words and train a model that teaches the computer to \"sniff\" text (text classification).","breadcrumb":{"@id":"https:\/\/miai.vn\/?p=864#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/miai.vn\/?p=864"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/miai.vn\/?p=864#primaryimage","url":"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true","contentUrl":"https:\/\/github.com\/thangnch\/MiAI_Word2Vec_Demo\/blob\/master\/Screen%20Shot%202020-05-11%20at%2013.40.29.png?raw=true"},{"@type":"BreadcrumbList","@id":"https:\/\/miai.vn\/?p=864#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/miai.vn\/"},{"@type":"ListItem","position":2,"name":"[NLP Series #2] Using Gensim Word2Vec to teach a computer to &#8220;sniff&#8221; text (text classification)"}]},{"@type":"WebSite","@id":"https:\/\/miai.vn\/#website","url":"https:\/\/miai.vn\/","name":"M\u00ec AI","description":"Learn AI the instant-noodle way!","publisher":{"@id":"https:\/\/miai.vn\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/miai.vn\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/miai.vn\/#organization","name":"M\u00ec AI","url":"https:\/\/miai.vn\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/miai.vn\/#\/schema\/logo\/image\/","url":"https:\/\/miai.vn\/wp-content\/uploads\/2026\/05\/cropped-Logo_w_slogan.png","contentUrl":"https:\/\/miai.vn\/wp-content\/uploads\/2026\/05\/cropped-Logo_w_slogan.png","width":240,"height":193,"caption":"M\u00ec 
AI"},"image":{"@id":"https:\/\/miai.vn\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/miai.vn\/#\/schema\/person\/cc8bc24bb90bd3f596add82f3a59948c","name":"Ch\u1ee7 ti\u1ec7m M\u00ec","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b0b5124b0d2c0a8054d7127d2c236bdc3dc7a50e2d4e8728ab32eee5b122a8d1?s=96&d=mm&r=g","caption":"Ch\u1ee7 ti\u1ec7m M\u00ec"},"sameAs":["https:\/\/miai.vn"],"url":"https:\/\/miai.vn\/?author=1"}]}},"_links":{"self":[{"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/posts\/864","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/miai.vn\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=864"}],"version-history":[{"count":0,"href":"https:\/\/miai.vn\/index.php?rest_route=\/wp\/v2\/posts\/864\/revisions"}],"wp:attachment":[{"href":"https:\/\/miai.vn\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=864"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/miai.vn\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=864"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/miai.vn\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=864"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}