Reduce rows in data frame when computing variable?
Does anyone know how I can change this code so that I end up with a data frame that has only one row per year?
# Step 1. Weighted sum = Sum(sum of mean severity ratings * the repetition of each word near trauma)
df_word <- df_word %>% # compute the product of mean sum ratings (AV,A,V) * lemma repetitions
  mutate(AV.prod=(AV.Mean.Sum*repet),
         A.prod=(A.Mean.Sum*repet),
         V.prod=(V.Mean.Sum.R*repet))
df_word <- df_word %>% # group by year and sum for each group (AV,A,V) (numerator)
  group_by(year) %>%
  mutate(sumAVprod.word = sum(AV.prod),
         sumAprod.word = sum(A.prod),
         sumVprod.word = sum(V.prod)) %>% ungroup()
# Step 2. Standardize: Weighted average = sum of(repetition-weighted severity: AV,A,V) by lemma/ sum(repetitions by year)
df_word <- df_word %>% # sum repetitions by year (denominator)
  group_by(year) %>%
  mutate(sum_repet_word=sum(repet)) %>% ungroup()
df_word <- df_word %>% # compute standardization (for AV,A,V)
  mutate(sev_word=(sumAVprod.word/sum_repet_word),
         aro_word=(sumAprod.word/sum_repet_word),
         val_word=(sumVprod.word/sum_repet_word))
head(df_word) # verify values by manually calculating 1-3 rows of `sev_word`
current data frame (too many rows)
desired data frame
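One possible direction, sketched under assumptions rather than tested on the real data: `mutate()` keeps every input row and just repeats the yearly value on each of them, whereas `summarise()` after `group_by(year)` collapses each group to a single row. The toy `df_word` below is hypothetical (made-up values, only the column names come from the code above), and `df_year` is a name introduced here for illustration.

```r
library(dplyr)

# Hypothetical toy data; only the column names are taken from the question
df_word <- tibble(
  year         = c(2000, 2000, 2001, 2001),
  repet        = c(1, 3, 2, 2),
  AV.Mean.Sum  = c(2, 4, 1, 3),
  A.Mean.Sum   = c(1, 2, 3, 4),
  V.Mean.Sum.R = c(4, 2, 2, 0)
)

# Weighted average per year: sum(rating * repetitions) / sum(repetitions).
# summarise() returns one row per group, so the result has one row per year.
df_year <- df_word %>%
  group_by(year) %>%
  summarise(
    sev_word = sum(AV.Mean.Sum * repet) / sum(repet),
    aro_word = sum(A.Mean.Sum * repet) / sum(repet),
    val_word = sum(V.Mean.Sum.R * repet) / sum(repet),
    .groups  = "drop"  # assumes dplyr >= 1.0.0
  )

df_year
```

If the per-word rows in `df_word` are still needed elsewhere, an alternative that leaves the existing `mutate()` pipeline untouched would be to finish with `distinct(year, sev_word, aro_word, val_word)`, which keeps one row for each unique combination of those columns.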