Top 50 Most-Followed Twitter Accounts

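Data Analysis with R - This article is part of a series.

§ 1: Estimating Carbon Dioxide Emissions

§ 2: This article

§ 3: Deaths in US Prisons

§ 4: Breast Cancer Prediction Model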

Data source: Top 50 Most Followed Twitter Accounts

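Introduction

The dataset lists the 50 most-followed accounts on Twitter, with each follower total rounded to the nearest hundred thousand, along with each user's profession or activity. Follower totals and monthly ranking changes were last updated on May 12, 2022.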

Follower Ranking

We use this dataset to build a bar chart of the 50 most-followed Twitter accounts.

Import

The first step is to load the raw data into R.

library(tidyverse) # provides %>%, count() and ggplot() used in the later steps
data_csv <- read.csv("Top 50 Most Followed Twitter Accounts.csv")
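A note on column names: the plotting code in this article refers to Followers..millions. and Account.username, which suggests that the CSV headers contain spaces and parentheses. By default, read.csv() passes headers through make.names(), which replaces characters that are not valid in R names with dots. The sketch below illustrates this on a two-row inline sample. The header text is an assumption inferred from the mangled names, and the usernames and numbers are placeholders, not values from the dataset.

```r
# Hypothetical two-row sample: headers are assumed, values are placeholders.
sample_csv <- 'Account username,Followers (millions)
@example_a,12.3
@example_b,4.5'

# Default behaviour (check.names = TRUE): headers go through make.names().
mangled <- read.csv(text = sample_csv)
names(mangled)

# check.names = FALSE keeps the headers verbatim; such columns then have to
# be quoted with backticks, e.g. verbatim$`Followers (millions)`.
verbatim <- read.csv(text = sample_csv, check.names = FALSE)
names(verbatim)

# make.names() can also be called directly to predict the sanitized names.
make.names(c("Account username", "Followers (millions)"))
```

Keeping the default is the simpler choice here, since dotted names can be used in aes() without any quoting.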

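Visualization

Twitter usernames are unique, so they are a natural choice for the y-axis.
We reorder the accounts by follower count so that the chart lists them from most to fewest followers.
To make the comparison easier to read, we also map follower count to a gradient fill, with darker bars for more followers and lighter bars for fewer.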

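# Horizontal bars: one bar per account, sorted by follower count.
dataplot <- data_csv %>%
  ggplot(mapping = aes(
    x = Followers..millions.,
    y = reorder(Account.username, Followers..millions.),
    # Negating the log puts the largest accounts at the dark end of the default gradient.
    fill = -log(Followers..millions.)
  )) +
  geom_col() + # equivalent to geom_bar(stat = "identity")
  guides(fill = "none") + # the fill legend would only repeat the x-axis
  # Label each bar with its follower count.
  geom_text(mapping = aes(label = Followers..millions.))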

# Add axis labels, title, caption, and theme tweaks.
dataplot +
  labs(
    x = "Followers (Millions)",
    y = "Username",
    title = "Top 50 in Twitter",
    subtitle = "Information was last updated on May 12, 2022.",
    caption = "Data sources: https://www.kaggle.com/datasets/hassanshehzadk/top-50-most-followed-twitter-accounts?resource=download"
  ) +
  theme(
    plot.title = element_text(hjust = 0.4, size = 14), # horizontal title position
    panel.grid.minor = element_blank(), # hide minor grid lines
    text = element_text(family = "Hack Nerd Font"), # this font must be installed on the system
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1) # tilt x-axis tick labels
  )
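To see what reorder() and the negated log do before any plotting happens, the following sketch applies them to three made-up accounts (the usernames and follower counts are placeholders, not values from the dataset). reorder() returns a factor whose levels are sorted by the second argument in increasing order, and ggplot2 draws the first level at the bottom of a discrete y-axis, so the largest account ends up at the top. Assuming ggplot2's default continuous fill scale, which runs from dark at low values to light at high values, negating the log gives the largest account the smallest fill value and therefore the darkest bar.

```r
# Placeholder data: three hypothetical accounts.
toy <- data.frame(
  user = c("@a", "@b", "@c"),
  followers = c(50, 120, 80),
  stringsAsFactors = FALSE
)

# Levels are sorted by increasing follower count.
ordered_user <- reorder(toy$user, toy$followers)
levels(ordered_user)

# The value mapped to fill: the largest account gets the smallest number.
fill_value <- -log(toy$followers)
darkest_user <- toy$user[which.min(fill_value)]
darkest_user
```

Because the log is monotonic, it changes only how quickly the color fades across the ranking, not the order of the colors.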

Full code

library(tidyverse) # attaches ggplot2 and dplyr

# Import
data_csv <- read.csv("Top 50 Most Followed Twitter Accounts.csv")

# Plot: one horizontal bar per account, sorted by follower count.
dataplot <- data_csv %>%
  ggplot(mapping = aes(
    x = Followers..millions.,
    y = reorder(Account.username, Followers..millions.),
    # Negating the log puts the largest accounts at the dark end of the default gradient.
    fill = -log(Followers..millions.)
  )) +
  geom_col() + # equivalent to geom_bar(stat = "identity")
  guides(fill = "none") + # the fill legend would only repeat the x-axis
  # Label each bar with its follower count.
  geom_text(mapping = aes(label = Followers..millions.))

# Add axis labels, title, caption, and theme tweaks.
dataplot +
  labs(
    x = "Followers (Millions)",
    y = "Username",
    title = "Top 50 in Twitter",
    subtitle = "Information was last updated on May 12, 2022.",
    caption = "Data sources: https://www.kaggle.com/datasets/hassanshehzadk/top-50-most-followed-twitter-accounts?resource=download" # nolint
  ) +
  theme(
    plot.title = element_text(hjust = 0.4, size = 14), # horizontal title position
    panel.grid.minor = element_blank(), # hide minor grid lines
    text = element_text(family = "Hack Nerd Font"), # this font must be installed on the system
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1) # tilt x-axis tick labels
  )

Account Origin Analysis

We also want to know which countries the most-followed accounts belong to, so we draw a pie chart of the distribution.

Import

As before, the first step is to load the raw data into R.

library(tidyverse) # provides %>%, count() and ggplot() used in the later steps
data_csv <- read.csv("Top 50 Most Followed Twitter Accounts.csv")

Selection

Count how many times each country appears in the dataset.

# One row per country, with the number of accounts in a column named "count".
area_count <- data_csv %>%
  count(Country, name = "count")
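For readers less familiar with dplyr, count() is shorthand for grouping by a column and counting the rows in each group. A minimal sketch on made-up data (the country codes below are placeholders) shows the shape of the result and how sort = TRUE orders it from most to least frequent. The share column is an addition of this sketch and is not part of the original analysis.

```r
library(dplyr)

# Placeholder data: six hypothetical accounts from three countries.
toy <- data.frame(
  Country = c("X", "Y", "X", "Z", "X", "Y"),
  stringsAsFactors = FALSE
)

toy_count <- toy %>%
  count(Country, name = "count", sort = TRUE) %>% # most frequent first
  mutate(share = count / sum(count))              # fraction of all accounts

toy_count
```

A sorted table like this makes it easy to compare the numbers directly with the slices of the pie chart.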

Visualization

ggplot2 has no dedicated pie-chart geom, so we draw a single stacked bar with geom_bar and wrap it into a circle with coord_polar.

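# A single stacked bar (constant x) whose segments are the per-country counts.
dataplot <- area_count %>%
  ggplot(mapping = aes(
    x = 1,
    y = count,
    fill = Country
  ))

dataplot +
  geom_col() + # equivalent to geom_bar(stat = "identity")
  coord_polar(theta = "y") + # map bar height to angle, turning the bar into a pie
  scale_x_continuous(name = NULL, breaks = NULL) + # no axis titles or ticks
  scale_y_continuous(name = NULL, breaks = NULL) +
  scale_fill_viridis_d(option = "inferno")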
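How the pie comes about can be checked on a small example: with a constant x, the bar geom stacks all segments into one bar, and coord_polar(theta = "y") maps the stacked height to the angle. The sketch below uses made-up counts and also places each count in the middle of its slice with position_stack(vjust = 0.5). The labels, the default color palette, and theme_void() are choices made for this sketch and are not part of the original chart.

```r
library(ggplot2)

# Placeholder counts for three hypothetical countries.
toy_count <- data.frame(
  Country = c("X", "Y", "Z"),
  count = c(3, 2, 1)
)

pie <- ggplot(toy_count, aes(x = 1, y = count, fill = Country)) +
  geom_col(width = 1) +                      # one stacked bar
  geom_text(
    aes(label = count),
    position = position_stack(vjust = 0.5)   # center each label in its segment
  ) +
  coord_polar(theta = "y") +                 # stacked height becomes the angle
  theme_void()                               # drop axes and background

pie
```

Removing the coord_polar() line shows the underlying stacked bar, which is a quick way to debug the slice order and labels.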

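Conclusions

Twitter is a US-based platform and the United States has a large population, so it is no surprise that US accounts far outnumber those of any other country.
India, another populous country, comes second behind the United States, which is somewhat unexpected.
The remaining countries each account for roughly the same number of accounts.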

Full code

library(tidyverse) # attaches ggplot2 and dplyr

data_csv <- read.csv("Top 50 Most Followed Twitter Accounts.csv")

# One row per country, with the number of accounts in a column named "count".
area_count <- data_csv %>%
  count(Country, name = "count")

# A single stacked bar (constant x) whose segments are the per-country counts.
dataplot <- area_count %>%
  ggplot(mapping = aes(
    x = 1,
    y = count,
    fill = Country
  ))

dataplot +
  geom_col() + # equivalent to geom_bar(stat = "identity")
  coord_polar(theta = "y") + # map bar height to angle, turning the bar into a pie
  scale_x_continuous(name = NULL, breaks = NULL) + # no axis titles or ticks
  scale_y_continuous(name = NULL, breaks = NULL) +
  scale_fill_viridis_d(option = "inferno") +
  labs(
    fill = "Country",
    title = "Country of Account",
    subtitle = "Calculate the top 50 fan accounts on Twitter.\nInformation was last updated on May 12, 2022.", # nolint
    caption = "Data sources: https://www.kaggle.com/datasets/hassanshehzadk/top-50-most-followed-twitter-accounts?resource=download" # nolint
  ) +
  theme(
    plot.title = element_text(hjust = 0.6, size = 14), # horizontal title position
    plot.caption = element_text(hjust = 0.3) # horizontal caption position
  )
