Python Tutorial: Generating Fake Data with the Python Faker Library

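Python offers an open-source library called Faker that helps users generate fake datasets. With it we can produce random data for attributes such as name, age, and location. Faker supports a wide range of locales and languages, so the generated data can be tailored to a region.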

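Such fake data is useful for tuning, validating, and testing machine learning models, and for stress-testing them. It can be generated on demand in whatever shape we need. Faker data is also handy for training and practice, for example when performing various operations on different data types.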

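In this tutorial we will look at Faker and its functions and create a dataset of our own.

Let's get started with the Faker library.

Implementing the Faker Library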

Before using Faker, we need to install the library. This can be done with the pip installer from a command prompt or terminal, as shown below:

Syntax:

$ pip install faker

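Importing the Required Libraries

To explore Faker's features, we must import the Faker class from the faker package. We also import pandas, since we will perform some operations on the dataset later.

Syntax: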

from faker import Faker  
import pandas as pd

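Using Various Functions

Once the required libraries are imported, we can try out the functions Faker provides. To do so, we first create a Faker instance and assign it to a variable, as shown below:

Syntax: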

sample = Faker()

We will use some of the following functions:

Syntax:

sample.name()  
sample.date_of_birth()  
sample.address()  
sample.country()  
sample.email()

Let's consider an example that demonstrates how these functions work:

Example:

# importing the required libraries  
from faker import Faker  
import pandas as pd  
  
# defining the variable for Faker() module  
sample = Faker()  
  
# using some functions  
print("Your Name: ", sample.name())  
print("Your Date of Birth: ", sample.date_of_birth())  
print("Your Address: ", sample.address())  
print("Your Country: ", sample.country())  
print("Your E-mail Address: ", sample.email())

Output:

Your Name:  Teresa Hill
Your Date of Birth:  1950-03-12
Your Address:  430 Bauer Turnpike Suite 931
Annaton, OR 12319
Your Country:  Angola
Your E-mail Address:  christy34@yahoo.com

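Explanation:

In the example above, we imported the required libraries and created a Faker instance. We then used the name, date_of_birth, address, country, and email functions to generate some fake data. Because the generated data is random, each run of the code produces different values.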

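We can also generate data for different locales and languages by passing the desired locale codes to Faker. Consider the following example, which generates names in Hindi, French, and Japanese.

Example: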

# importing the required libraries  
from faker import Faker  
import pandas as pd  
  
# creating a Faker instance for Hindi, French, and Japanese  
# (ja_JP and fr_FR are the locale codes listed in Faker's documentation)  
sample = Faker(['hi_IN', 'fr_FR', 'ja_JP'])  
for n in range(10):  
    print(sample.name())

Output:

Thomas Schneider
?????? ??????
?? ??
Lucas Poulain
????? ??????
Aurélie Merle-Menard
?? ??
????????, ??????
Stéphane Lefebvre-Alves
????? ?????

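Explanation:

In the example above, we again imported the required modules and created a Faker instance, this time passing a list of locales as the argument. We then used a for loop to print ten names, each drawn from one of the specified locales. The question marks in the output most likely stand for Hindi and Japanese characters that were not rendered when the output was captured.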

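We can also generate our own text or sentences using functions such as text and sentence.

Let's consider the following example to see how these functions work.

Example: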

# importing the required libraries  
from faker import Faker  
import pandas as pd  
  
# defining the variable for Faker() module  
sample = Faker()  
  
# printing the text  
print("Text: ", sample.text())  
  
# printing the sentence  
print("Sentence: ", sample.sentence())

Output:

Text:  Size plant task we through score name. Whose learn drop ground.
Option entire some surface seek film involve. Billion body really common decade man. Worker foreign your then likely beat.
Sentence:  Project star plant she energy them leave.

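Explanation:

In the example above, we again imported the required modules and created a Faker instance. We then used the text and sentence functions to generate a block of text and a single sentence, and printed both.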

We can also define a word list of our own and have Faker build new fake sentences from only those words. Consider the following example.

Example:

# importing the required libraries  
from faker import Faker  
import pandas as pd  
  
# defining the variable for Faker() module  
sample = Faker()  
# list of words  
mywords = ['Cow', 'domestic', 'why', 'what', 'bird', 'parrot', 'is', 'animal', 'a', 'my']  
  
# printing the sentence  
print("Sentence: ", sample.sentence(ext_word_list = mywords))

Output:

Sentence:  Cow is domestic domestic domestic animal animal.

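Explanation:

In the example above, we again imported the required modules and created a Faker instance. We then defined a list of words and passed it to the sentence() function through the ext_word_list argument. As a result, the fake sentence is built only from words in that list, and a word may appear more than once.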

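In addition, Faker provides a profile function that generates a complete profile for a fake person, instead of generating the name, address, and other details separately. Let's consider the following example to see how this function behaves.

Example: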

# importing the required libraries  
from faker import Faker  
import pandas as pd  
  
# defining the variable for Faker() module  
sample = Faker()  
  
# generating the profile  
print("Complete Profile: ", sample.profile())

Output:

Complete Profile:  {'job': 'Minerals surveyor', 'company': 'Nichols and Sons', 'ssn': '715-16-7081', 'residence': '550 Moore Locks\nSouth Andrea, SD 94842', 'current_location': (Decimal('-78.730969'), Decimal('-151.109875')), 'blood_group': 'B+', 'website': ['https://www.smith-avila.com/', 'http://bennett-scott.com/', 'https://www.nguyen.com/'], 'username': 'joseph04', 'name': 'Toni Martin', 'sex': 'F', 'address': '29676 Mann Rapid\nWilkinsonbury, MN 35916', 'mail': 'jwallace@yahoo.com', 'birthdate': datetime.date(2016, 10, 1)}

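Explanation:

In the example above, we again imported the required modules and created a Faker instance. We then used the profile function to generate a fake profile, which is returned as a dictionary, and printed it.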

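Now let's use the Faker library to create a fake dataset.

Creating a Fake Dataset with the Faker Library

Having covered most of the functions, including profile in the previous section, let's generate a dataset of fake profiles for 20 randomly generated people. To store these profiles in a DataFrame, we will also use the pandas library.

Example: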

# importing the required libraries  
from faker import Faker  
import pandas as pd  
  
# defining the variable for Faker() module  
sample = Faker()  
  
# generating the profiles of 20 people  
mydata = [sample.profile() for n in range(20)]  
my_dframe = pd.DataFrame(mydata)  
  
print(my_dframe)

Output:

                                              job                    company  ...                       mail   birthdate
0                         Housing manager/officer                  Cross LLC  ...  robinsonroger@hotmail.com  1983-03-26
1                       Learning disability nurse            Bennett-Sellers  ...      fordjeffrey@gmail.com  1923-04-14
2                           Agricultural engineer                Patrick PLC  ...     franciskayla@gmail.com  1941-01-13
3              Research scientist (life sciences)    Coleman, Shaw and Owens  ...    tyleramanda@hotmail.com  1927-07-07
4                                   Haematologist           Jefferson-Bailey  ...     gomezbreanna@gmail.com  2001-06-06
5   Chartered legal executive (England and Wales)            Torres-Andersen  ...      xgonzalez@hotmail.com  1956-05-12
6                                    Statistician            Rodriguez-Chung  ...      millersarah@yahoo.com  1955-07-06
7                                Paediatric nurse  Simmons, Acosta and Gates  ...  melissacampos@hotmail.com  1984-02-29
8                             Dispensing optician                  Bauer Inc  ...     amandacook@hotmail.com  1935-03-30
9                  Equality and diversity officer  Martinez, Allen and Davis  ...     reedjennifer@gmail.com  2019-06-28
10                       Secondary school teacher  Greene, Gonzalez and Hill  ...    kellythomas@hotmail.com  1913-10-02
11                                   TEFL teacher             Smith and Sons  ...     justinsnyder@gmail.com  1989-06-17
12              Planning and development surveyor       Smith, Lee and Reyes  ...    sanderstony@hotmail.com  1905-09-05
13                               Product designer   Taylor, Davis and Wilson  ...       perezdavid@yahoo.com  1938-11-27
14                  Development worker, community              Carlson-Evans  ...          david26@gmail.com  1929-03-08
15                    Engineer, building services                 Pham Group  ...   valdezlaurie@hotmail.com  1984-12-31
16                       Therapist, horticultural          Anderson-Gonzalez  ...     georgecolton@yahoo.com  1929-03-16
17       Geographical information systems officer               Burke-Burton  ...          randy36@gmail.com  1997-06-12
18                                 Retail manager               Rivera-Lucas  ...            tcruz@yahoo.com  2016-03-20
19                       Therapeutic radiographer             Holloway Group  ...       oalexander@yahoo.com  2011-02-23

[20 rows x 13 columns]

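Explanation:

In the example above, we again imported the required libraries and created a Faker instance. We then built a list containing the profiles of 20 people, converted it into a DataFrame, and printed it. The resulting dataset has 13 columns holding attributes such as job, company, address, and email, and it can be used however we need.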
