tnblog

HuggingFace: Using the Pipeline Tool (Study Notes)

2023/10/24 16:28
Category: HuggingFace

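Introduction to the pipeline tool


HuggingFace hosts a huge model hub, and some of its models are mature, classic models that already give fairly good predictions without any further training; this is what is often called Zero Shot Learning.
When using the pipeline tool, the caller only has to name the task type. The pipeline picks a suitable model automatically and returns predictions directly; if those predictions are already good enough for the caller, no further training is needed.
The pipeline API is concise and hides a great deal of complex code.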
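To make the "hidden code" a little more concrete, here is a toy sketch in plain Python. It is not the transformers source: a task name selects a handler, and each handler runs the three stages a pipeline wraps (preprocessing such as tokenization, the model's forward pass, and postprocessing into a readable result). `TASK_REGISTRY`, `toy_pipeline` and the word weights are all hypothetical stand-ins for a real tokenizer and model.

```python
# Toy sketch (NOT the transformers implementation): a task registry picks a
# handler, and each handler runs preprocess -> forward -> postprocess.
# All names and weights here are hypothetical.

def _preprocess(text):
    # stand-in for a tokenizer: lowercase and split on whitespace
    return text.lower().split()

def _forward(tokens):
    # stand-in for a model: sum of made-up per-word sentiment weights
    weights = {"love": 2.0, "like": 1.0, "hate": -2.0, "bad": -1.0}
    return sum(weights.get(t, 0.0) for t in tokens)

def _postprocess(raw_score):
    # turn the raw "model" output into a user-facing dict
    return {"label": "POSITIVE" if raw_score > 0 else "NEGATIVE"}

TASK_REGISTRY = {
    "sentiment-analysis": (_preprocess, _forward, _postprocess),
}

def toy_pipeline(task):
    if task not in TASK_REGISTRY:
        raise KeyError(f"unknown task: {task}")
    pre, fwd, post = TASK_REGISTRY[task]

    def run(text):
        # like the real classifier, return a list with one result per input
        return [post(fwd(pre(text)))]

    return run

classifier = toy_pipeline("sentiment-analysis")
print(classifier("I hate you")[0])  # {'label': 'NEGATIVE'}
print(classifier("I love you")[0])  # {'label': 'POSITIVE'}
```

The caller only ever sees `toy_pipeline(task)` and the returned callable, which is the same division of labour the real `pipeline()` offers.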

Using the pipeline tool

Common task demos

Text classification


Using the pipeline for a text classification task:

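    from transformers import pipeline

    # With no model argument, a default English sentiment model is downloaded
    classifier = pipeline("sentiment-analysis")

    # Each call returns a list of dicts; [0] is the result for the single input
    result = classifier("I hate you")[0]
    print(result)
    result = classifier("I love you")[0]
    print(result)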


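Pass the task type to pipeline() and it returns a classifier object that performs the prediction; calling that object on a sentence returns the prediction for it.
For example, classifying the sentiment of "I hate you" and "I love you" gives the following results: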


The first sentence is labeled NEGATIVE and the second POSITIVE, and both come with very high confidence scores.
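The score is the model's probability for the predicted label, not an accuracy measured on data. For a single-label classifier like this one, the pipeline converts the model's raw logits into probabilities with softmax and reports the top class. A minimal sketch of that final step, using made-up logits and an assumed `0 -> NEGATIVE, 1 -> POSITIVE` label mapping:

```python
import math

def softmax(logits):
    # subtract the max before exponentiating, for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits, id2label):
    # report the most probable class and its probability as "score"
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical logits for a two-class model; the label mapping is an assumption.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(to_prediction([3.0, -3.0], id2label))
print(to_prediction([-3.0, 3.0], id2label))
```

A large gap between the two logits is what produces scores close to 1.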

Reading comprehension (question answering)


Using the pipeline for a reading comprehension task:

    # Chapter 5 / reading comprehension
    from transformers import pipeline

    # Extractive QA: the answer is a span copied out of `context`
    question_answerer = pipeline("question-answering")
    context = r"""
    Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
    question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
    a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
    """
    result = question_answerer(
        question="What is extractive question answering?",
        context=context,
    )
    print(result)
    result = question_answerer(
        question="What is a good example of a question answering dataset?",
        context=context,
    )
    print(result)


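This code first calls pipeline() with "question-answering" as the argument to obtain the question_answerer object.
When calling question_answerer, pass a question and a context. The answer must appear in the context, because the model extracts it as a span of that text.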


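The first question asks "What is extractive question answering?", and the model answers with the span "the task of extracting an answer from a text given a question".
The second question asks "What is a good example of a question answering dataset?", and the model answers "SQuAD dataset".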
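Because the answer is extracted, the model does not write new text: it scores every token as a possible answer start and answer end, and the pipeline picks the best valid span. A simplified sketch of that selection step follows. The tokens and scores are invented, and `max_answer_len` here only mirrors the pipeline parameter of the same name (default 15). The real pipeline works with softmax probabilities and maps the chosen tokens back to character offsets, which is why its result also has `start` and `end` fields.

```python
# Simplified span selection for extractive QA: choose (start, end) that
# maximises start_score + end_score, with start <= end and a length cap.
# Tokens and scores are made up for illustration.

def best_span(start_scores, end_scores, max_answer_len=15):
    best = None
    for s, s_score in enumerate(start_scores):
        # only consider ends at or after the start, within the length cap
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            total = s_score + end_scores[e]
            if best is None or total > best[0]:
                best = (total, s, e)
    return best[1], best[2]

tokens = ["SQuAD", "is", "a", "question", "answering", "dataset"]
start_scores = [5.0, 0.1, 0.2, 0.3, 0.1, 0.0]
end_scores = [4.0, 0.0, 0.1, 0.2, 0.3, 1.0]
s, e = best_span(start_scores, end_scores)
print(" ".join(tokens[s:e + 1]))  # SQuAD
```

Adding scores here plays the role of multiplying start and end probabilities in the real pipeline.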

Fill-mask (cloze)


Using the pipeline for a fill-in-the-blank task:

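    # Chapter 5 / fill-mask
    from pprint import pprint
    from transformers import pipeline

    unmasker = pipeline("fill-mask")
    # <mask> marks the blank; it must match the model's mask token
    sentence = 'HuggingFace is creating a <mask> that the community uses to solve NLP tasks.'
    # By default the pipeline returns the 5 most likely fillers
    pprint(unmasker(sentence))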


Here sentence is a sentence in which one word has been replaced by the <mask> token, marking the blank the model should fill in. The output is:


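The sentence reads: "HuggingFace is creating a ___ that the community uses to solve NLP tasks."
The model returns five candidates, among them tool, framework, database and prototype.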
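Under the hood, the model produces a probability for every vocabulary token at the `<mask>` position; the pipeline keeps the top 5 (its default `top_k`) and substitutes each one back into the sentence. The mask string depends on the model's tokenizer: the default model here expects `<mask>`, while BERT-style models use `[MASK]`, and `unmasker.tokenizer.mask_token` tells you which one applies. The sketch below reproduces only the ranking-and-substitution step over a small, made-up candidate list (`fill_mask` and the probabilities are hypothetical):

```python
# Sketch of fill-mask post-processing: rank candidate words by probability,
# keep the top_k, and substitute each into the sentence.
# The candidate probabilities are invented, not real model output.

def fill_mask(sentence, candidate_probs, mask_token="<mask>", top_k=5):
    ranked = sorted(candidate_probs.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for word, prob in ranked[:top_k]:
        results.append({
            "score": prob,
            "token_str": word,
            # replace only the first occurrence of the mask token
            "sequence": sentence.replace(mask_token, word, 1),
        })
    return results

sentence = "HuggingFace is creating a <mask> that the community uses to solve NLP tasks."
probs = {"tool": 0.4, "library": 0.3, "website": 0.2, "game": 0.1}
for r in fill_mask(sentence, probs, top_k=3):
    print(r["token_str"], "->", r["sequence"])
```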

Text generation

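    # Chapter 5 / text generation
    from transformers import pipeline

    text_generator = pipeline("text-generation")
    # Continue the prompt; max_length caps the total length in tokens
    # (prompt included), and do_sample=False makes decoding deterministic
    text_generator("As far as I am concerned, I will",
                   max_length=50,
                   do_sample=False)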


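After creating the text_generator object, call it directly with the beginning of a sentence, and it continues writing from there. max_length=50 caps the total output at 50 tokens, counting the prompt (use max_new_tokens to limit only the continuation), and do_sample=False turns off random sampling so the output is deterministic. The result: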


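In translation, the generated text says roughly: "As far as I am concerned, I will be the first to admit that I don't like the idea of a 'free market'. I think the idea of a free market is a bit far-fetched. I think the idea."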
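With `do_sample=False` (and, assuming the default, a single beam), decoding is greedy: at each step the most probable next token is appended, so the same prompt always yields the same continuation. Below is a toy greedy decoder where a hand-written next-token table (hypothetical) stands in for the language model. As with `generate()`, `max_length` here counts the prompt tokens too.

```python
# Toy greedy decoding. The table below is a made-up stand-in for a model's
# next-token distribution; max_length counts prompt + generated tokens.

NEXT_TOKEN_PROBS = {
    "I": {"will": 0.6, "am": 0.4},
    "will": {"be": 0.7, "go": 0.3},
    "be": {"the": 0.55, "happy": 0.45},
    "the": {"first": 0.9, "last": 0.1},
    "first": {"to": 1.0},
    "to": {"admit": 1.0},
}

def greedy_generate(prompt_tokens, max_length):
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not dist:  # no known continuation: stop early
            break
        # greedy choice: always take the single most probable next token
        tokens.append(max(dist, key=dist.get))
    return tokens

print(" ".join(greedy_generate(["I", "will"], max_length=6)))  # I will be the first to
```

Sampling (`do_sample=True`) would instead draw the next token at random according to these probabilities, so repeated runs could differ.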

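Named entity recognition


Named entity recognition (NER) finds person names, place names, organization names and similar entities in a piece of text.
Using the pipeline for named entity recognition: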

    from transformers import pipeline

    ner_pipe = pipeline("ner")
    sequence = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
    therefore very close to the Manhattan Bridge which is visible from the window."""
    # By default, one prediction is printed per (sub)word token
    for entity in ner_pipe(sequence):
        print(entity)


As the output shows, every recognized token is tagged with an entity type (for example organization or location).
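By default the `ner` pipeline reports one prediction per token, so a name split into WordPiece pieces (`##...`) appears as several entries. Passing `aggregation_strategy="simple"` to `pipeline("ner", ...)` asks transformers to merge them into whole entities. The sketch below imitates that merging in plain Python on an invented token list shaped like the pipeline's output (`entity`, `index`, `word`). It is a simplification, not the library's algorithm, which also combines scores and tracks character offsets.

```python
# Merge adjacent token-level NER predictions into entity spans.
# The input tokens are invented for illustration.

def group_entities(tokens):
    groups = []
    last_index = None
    for tok in tokens:
        label = tok["entity"].split("-")[-1]  # "I-ORG" -> "ORG"
        word = tok["word"]
        continues = (
            groups
            and groups[-1]["entity_group"] == label
            and tok["index"] == last_index + 1  # must be the very next token
            and not tok["entity"].startswith("B-")  # B- starts a new entity
        )
        if continues:
            if word.startswith("##"):
                groups[-1]["word"] += word[2:]  # glue WordPiece continuation
            else:
                groups[-1]["word"] += " " + word
        else:
            groups.append({"entity_group": label, "word": word})
        last_index = tok["index"]
    return groups

tokens = [
    {"entity": "I-ORG", "index": 1, "word": "Hu"},
    {"entity": "I-ORG", "index": 2, "word": "##gging"},
    {"entity": "I-ORG", "index": 3, "word": "Face"},
    {"entity": "I-ORG", "index": 4, "word": "Inc"},
    {"entity": "I-LOC", "index": 11, "word": "New"},
    {"entity": "I-LOC", "index": 12, "word": "York"},
    {"entity": "I-LOC", "index": 13, "word": "City"},
    {"entity": "I-LOC", "index": 19, "word": "D"},
    {"entity": "I-LOC", "index": 20, "word": "##UM"},
    {"entity": "I-LOC", "index": 21, "word": "##BO"},
]
for g in group_entities(tokens):
    print(g)
```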

Text summarization


Put simply, summarization condenses a long text into a short one. The code:

    from transformers import pipeline

    summarizer = pipeline("summarization")
    ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
    A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
    Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
    In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
    Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
    2010 marriage license application, according to court documents.
    Prosecutors said the marriages were part of an immigration scam.
    On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
    After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
    Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
    All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
    Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
    Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
    The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
    Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
    Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
    If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.
    """
    # min_length / max_length bound the summary length in tokens;
    # do_sample=False gives deterministic decoding
    summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)


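Here ARTICLE holds the text to summarize, and min_length=30 / max_length=130 constrain the summary to between 30 and 130 tokens (tokens, not words).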
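The two limits are enforced during generation rather than by trimming the summary afterwards: roughly, the end-of-sequence token is forbidden until `min_length` tokens exist, and generation stops once `max_length` is reached. A toy illustration using greedy choice over made-up per-step scores; `constrained_greedy` and the `</s>` marker are hypothetical stand-ins for what happens inside `generate()`.

```python
import math

EOS = "</s>"  # hypothetical end-of-sequence marker

def constrained_greedy(step_scores, min_length, max_length):
    """Greedy pick per step from made-up scores, honouring min/max length."""
    out = []
    for scores in step_scores:
        if len(out) >= max_length:
            break  # hard cap reached
        scores = dict(scores)  # copy so the caller's data is untouched
        if len(out) < min_length:
            scores[EOS] = -math.inf  # ending is not allowed yet
        token = max(scores, key=scores.get)
        if token == EOS:
            break  # the "model" chose to stop
        out.append(token)
    return out

steps = [
    {"a": 1.0, EOS: 2.0},
    {"b": 1.0, EOS: 0.5},
    {"c": 0.1, EOS: 3.0},
    {"d": 1.0},
]
print(constrained_greedy(steps, min_length=0, max_length=10))  # []
print(constrained_greedy(steps, min_length=2, max_length=10))  # ['a', 'b']
print(constrained_greedy(steps, min_length=3, max_length=3))   # ['a', 'b', 'c']
```

Without `min_length`, the toy "model" would stop immediately; raising it forces a longer output even when ending looks more likely.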

Translation


Here we translate English into German.

    from transformers import pipeline

    # English -> German with the default translation model
    translator = pipeline("translation_en_to_de")
    sentence = "Hugging Face is a technology company based in New York and Paris"
    translator(sentence, max_length=40)

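By default this translation task runs on the t5-base model, which only supports translating English into German, French and Romanian; for other languages you need to swap in a different model.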
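T5 is a single text-to-text model that selects its task from a text prefix, which is how one checkpoint covers several language pairs. As far as I know, t5-base's configuration (`task_specific_params`) defines the prefixes below, and the translation pipeline prepends the matching one for you. A small sketch of that prefixing step; `T5_PREFIXES` and `build_t5_input` are hypothetical helpers, not transformers APIs.

```python
# Task prefixes as listed in t5-base's task_specific_params (to my knowledge);
# the helper below only builds the input string the model would receive.
T5_PREFIXES = {
    "translation_en_to_de": "translate English to German: ",
    "translation_en_to_fr": "translate English to French: ",
    "translation_en_to_ro": "translate English to Romanian: ",
    "summarization": "summarize: ",
}

def build_t5_input(task, text):
    if task not in T5_PREFIXES:
        # e.g. Chinese -> English: t5-base has no prefix, so use another model
        raise ValueError(f"t5-base has no built-in prefix for task {task!r}")
    return T5_PREFIXES[task] + text

print(build_t5_input("translation_en_to_de",
                     "Hugging Face is a technology company based in New York and Paris"))
```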

Swapping in a different model

Chinese-to-English translation with a different model

    from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
    # This model's tokenizer requires sentencepiece:
    # !pip install sentencepiece
    tokenizer = AutoTokenizer.from_pretrained("KennStack01/Helsinki-NLP-opus-mt-zh-en")
    model = AutoModelForSeq2SeqLM.from_pretrained("KennStack01/Helsinki-NLP-opus-mt-zh-en")
    # Pass the loaded model and tokenizer instead of relying on the default model
    translator = pipeline(task="translation_zh_to_en",
                          model=model,
                          tokenizer=tokenizer)
    sentence = "我叫萨拉,我住在伦敦。"
    translator(sentence, max_length=20)

English-to-Chinese translation with a different model

    from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
    # This model's tokenizer requires sentencepiece:
    # !pip install sentencepiece
    tokenizer = AutoTokenizer.from_pretrained("KennStack01/Helsinki-NLP-opus-mt-en-zh")
    model = AutoModelForSeq2SeqLM.from_pretrained("KennStack01/Helsinki-NLP-opus-mt-en-zh")
    # Pass the loaded model and tokenizer instead of relying on the default model
    translator = pipeline(task="translation_en_to_zh",
                          model=model,
                          tokenizer=tokenizer)
    sentence = "My name is Sarah, and I live in London."
    translator(sentence, max_length=20)
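Both examples follow the same pattern: a task name of the form `translation_<src>_to_<tgt>` plus a checkpoint trained for that language pair. The code above loads copies published under `KennStack01`; the original OPUS-MT checkpoints are published by the `Helsinki-NLP` organization under names such as `opus-mt-zh-en`. A hypothetical helper that derives such an id from the task name (it only builds a string; whether a given pair exists has to be checked on the Hub):

```python
import re

def parse_translation_task(task):
    """Split a task name like 'translation_zh_to_en' into ('zh', 'en')."""
    m = re.fullmatch(r"translation_([a-z]+)_to_([a-z]+)", task)
    if m is None:
        raise ValueError(f"not a translation task name: {task!r}")
    return m.group(1), m.group(2)

def opus_mt_model_id(task, org="Helsinki-NLP"):
    # follows the "opus-mt-<src>-<tgt>" naming convention; no network access
    src, tgt = parse_translation_task(task)
    return f"{org}/opus-mt-{src}-{tgt}"

print(opus_mt_model_id("translation_zh_to_en"))  # Helsinki-NLP/opus-mt-zh-en
print(opus_mt_model_id("translation_en_to_zh"))  # Helsinki-NLP/opus-mt-en-zh
```

A model id string like this can also be passed straight to `pipeline(task=..., model=...)`, in which case the matching tokenizer is loaded automatically.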


