The 50 Most-Followed Twitter Accounts

Data Analysis, R
Author: 按点下班
R Data Analysis: this article is part of a series.
§ 2: This article

Data source: Top 50 Most Followed Twitter Accounts

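Introduction

The dataset lists the 50 most-followed accounts on Twitter, with each follower total rounded to the nearest hundred thousand, along with each user's profession or activity. The account totals and the monthly changes in ranking were last updated on May 12, 2022.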

Follower Ranking

We use this dataset to chart the top 50 Twitter accounts by follower count.

Import

The first step is to read the raw data into R.

data_csv <- read.csv("Top 50 Most Followed Twitter Accounts.csv")
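The column names used in the plotting code, such as Followers..millions., look odd but are not typos. Here is a minimal sketch of where they most likely come from, assuming the CSV headers are "Account username" and "Followers (millions)" (inferred from the code in this post, not checked against the file). By default read.csv passes headers through make.names(), which replaces characters that are not valid in R names with dots. The sample rows are invented.

```r
# Assumed headers and invented rows; the real file may differ.
csv_text <- "Account username,Followers (millions)\n@example_a,100.5\n@example_b,90.1"

# Default behaviour: check.names = TRUE rewrites headers into valid R names.
mangled <- read.csv(text = csv_text)
names(mangled)

# check.names = FALSE keeps the headers exactly as written in the file.
original <- read.csv(text = csv_text, check.names = FALSE)
names(original)

# make.names() is the function doing the rewriting.
make.names("Followers (millions)")
```

With check.names = FALSE the columns would need backticks in later code, for example `Followers (millions)`, so keeping the default is a reasonable trade-off here.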

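Visualization

  • Because Twitter usernames are unique, the username is the natural choice for the y-axis.
  • We reorder the accounts by follower count so that the chart lists them from most to fewest followers.
  • To make the comparison easier to read, we map follower count to a gradient fill: darker bars have more followers, lighter bars have fewer.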
library(tidyverse) # attaches ggplot2, dplyr and the %>% pipe

# Horizontal bar chart: one bar per account, ordered by follower count.
dataplot <- data_csv %>%
  ggplot(mapping = aes(
    x = Followers..millions.,
    y = reorder(Account.username, Followers..millions.),
    fill = -log(Followers..millions.) # more followers -> lower value -> darker fill
  )) +
  geom_bar(stat = "identity") +
  guides(fill = "none") + # hide the legend for the fill gradient
  geom_text(mapping = aes(label = Followers..millions.))

# Add auxiliary information.
dataplot +
  labs(
    x = "Followers (Millions)",
    y = "Username",
    title = "Top 50 in Twitter",
    subtitle = "Information was last updated on May 12, 2022.",
    caption = "Data sources: https://www.kaggle.com/datasets/hassanshehzadk/top-50-most-followed-twitter-accounts?resource=download"
  ) +
  theme(
    plot.title = element_text(hjust = 0.4, size = 14), # title position
    panel.grid.minor = element_blank(), # hide minor grid lines
    text = element_text(family = "Hack Nerd Font"), # font family; assumes this font is installed
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
  )
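The ordering and fill choices in this chart can be checked on a small invented data frame without drawing anything. This is only a sketch in base R: the toy data frame and its column names are hypothetical stand-ins for the real CSV.

```r
# Hypothetical toy data standing in for the real CSV.
toy <- data.frame(
  username = c("@b", "@a", "@c"),
  followers = c(50, 120, 80)
)

# reorder() returns a factor whose levels are sorted by followers, ascending.
# On a y-axis the last level is drawn at the top, so the largest account leads.
ordered_names <- reorder(toy$username, toy$followers)
levels(ordered_names) # "@b" "@c" "@a"

# -log() flips the ordering: the largest account gets the smallest fill value.
# ggplot2's default continuous fill gradient runs from dark (low) to light (high),
# so the smallest value is drawn darkest.
fill_value <- -log(toy$followers)
darkest <- toy$username[which.min(fill_value)]
darkest # "@a"
```

The log also compresses the range of follower counts, so a few very large accounts do not wash out the differences among the rest.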

Full code
library(tidyverse) # attaches ggplot2, dplyr and the %>% pipe

# Import
data_csv <- read.csv("Top 50 Most Followed Twitter Accounts.csv")

# Plot: horizontal bars ordered by follower count, darker fill for more followers.
dataplot <- data_csv %>%
  ggplot(mapping = aes(
    x = Followers..millions.,
    y = reorder(Account.username, Followers..millions.),
    fill = -log(Followers..millions.)
  )) +
  geom_bar(stat = "identity") +
  guides(fill = "none") + # hide the legend for the fill gradient
  geom_text(mapping = aes(label = Followers..millions.))

# Add auxiliary information.
dataplot +
  labs(
    x = "Followers (Millions)",
    y = "Username",
    title = "Top 50 in Twitter",
    subtitle = "Information was last updated on May 12, 2022.",
    caption = "Data sources: https://www.kaggle.com/datasets/hassanshehzadk/top-50-most-followed-twitter-accounts?resource=download" # nolint
  ) +
  theme(
    plot.title = element_text(hjust = 0.4, size = 14), # title position
    panel.grid.minor = element_blank(), # hide minor grid lines
    text = element_text(family = "Hack Nerd Font"), # font family; assumes this font is installed
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
  )

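Account Country Analysis

We also want to know which countries the most-followed accounts belong to, so we draw a pie chart to examine the distribution.

Import

The first step is to read the raw data into R.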

data_csv <- read.csv("Top 50 Most Followed Twitter Accounts.csv")

Selection

Count how often each country appears in the dataset.

area_count <- data_csv %>%
  count(Country, name = "count")
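To make the counting step concrete, here is a small sketch on invented rows. The toy data frame is hypothetical and only mirrors the Country column used in this post. The sort = TRUE argument is an optional extra that puts the most frequent country first.

```r
library(dplyr)

# Hypothetical rows; the real dataset has 50 accounts.
toy <- data.frame(
  Account = c("@a", "@b", "@c", "@d", "@e"),
  Country = c("USA", "India", "USA", "UK", "USA")
)

# count() groups by Country and tallies the rows in each group.
# name = "count" names the tally column; sort = TRUE orders it descending.
country_count <- toy %>%
  count(Country, name = "count", sort = TRUE)

country_count
```

count() is shorthand for group_by(Country) followed by summarise(count = n()), so either form fits here.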

Visualization

  • Because ggplot2 has no built-in pie chart geom, we combine geom_bar with coord_polar to get the same effect.
# A single stacked bar at x = 1, one segment per country.
dataplot <- area_count %>%
  ggplot(mapping = aes(
    x = 1,
    y = count,
    fill = Country
  ))

dataplot +
  geom_bar(stat = "identity") +
  coord_polar(theta = "y") + # bend the stacked bar into a circle
  scale_x_continuous(name = NULL, breaks = NULL) + # drop axis titles and ticks
  scale_y_continuous(name = NULL, breaks = NULL) +
  scale_fill_viridis_d(option = "inferno")
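A pie chart without numbers makes the slices hard to compare, so one possible refinement is to label each slice with its share. The sketch below uses invented counts, not the real figures. The share and label columns and the use of theme_void() are illustrative choices and are not part of the original script.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts standing in for area_count.
toy_count <- data.frame(
  Country = c("USA", "India", "UK"),
  count = c(6, 3, 1)
)

# Add each country's share of the total and a printable percentage label.
toy_share <- toy_count %>%
  mutate(
    share = count / sum(count),
    label = paste0(round(share * 100), "%")
  )

# Same stacked-bar-in-polar-coordinates idea as above.
# position_stack(vjust = 0.5) centres each label inside its slice;
# group = Country keeps the label stacking order aligned with the bars.
pie <- ggplot(toy_share, aes(x = 1, y = count, fill = Country)) +
  geom_col() +
  geom_text(
    aes(label = label, group = Country),
    position = position_stack(vjust = 0.5)
  ) +
  coord_polar(theta = "y") +
  theme_void()
```

Printing pie draws the chart. theme_void() removes the axes and background, which serves the same purpose as setting breaks = NULL on both scales.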

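Conclusions

  1. Twitter is a US-based platform and the United States has a large population, so it is no surprise that far more of these accounts belong to the United States than to any other country.
  2. India, another populous country, comes second after the United States, which is somewhat unexpected.
  3. The counts for the remaining countries are roughly the same.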
Full code
library(tidyverse) # attaches ggplot2, dplyr and the %>% pipe

# Import
data_csv <- read.csv("Top 50 Most Followed Twitter Accounts.csv")

# Count how many accounts belong to each country.
area_count <- data_csv %>%
  count(Country, name = "count")

# Plot: a single stacked bar, bent into a circle by coord_polar().
dataplot <- area_count %>%
  ggplot(mapping = aes(
    x = 1,
    y = count,
    fill = Country
  ))

dataplot +
  geom_bar(stat = "identity") +
  coord_polar(theta = "y") +
  scale_x_continuous(name = NULL, breaks = NULL) +
  scale_y_continuous(name = NULL, breaks = NULL) +
  scale_fill_viridis_d(option = "inferno") +
  labs(
    # x/y titles omitted: the scales above set name = NULL, so none are drawn.
    fill = "Country",
    title = "Country of Account",
    subtitle = "Based on the 50 most-followed accounts on Twitter.\nInformation was last updated on May 12, 2022.", # nolint
    caption = "Data sources: https://www.kaggle.com/datasets/hassanshehzadk/top-50-most-followed-twitter-accounts?resource=download" # nolint
  ) +
  theme(
    plot.title = element_text(hjust = 0.6, size = 14), # title position
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
    plot.caption = element_text(hjust = 0.3)
  )