I'm trying to make a trend graph (a single line, for simplicity's sake).

Given an input word like "M4M" and a data set file (CSV) like this:

1529972216.0,Seeking Black M4M
1529972047.0,Looking for car fun 
1529971885.0,armenian M4M

How can I visualize the trend of the given word? I want to chart the occurrence of the word over the time span, to be able to tell if the word/topic is declining or increasing in popularity.

(The data set is a CSV file containing, in field 1, the Unix epoch timestamp of each Craigslist post and, in field 2, the title of the post.)

I have R and gnuplot installed on my system (if that helps).
On any given day, there can be hundreds of Craigslist posts.


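gnuplot can do that. It's basically a histogram, and gnuplot has the option smooth frequency for this: rows with the same x value are grouped and their y values are summed. Here x is the timestamp rounded down to the start of its bin, and y is 1 if the keyword appears in the second column and 0 otherwise, so each bin ends up with the number of matching titles. Adapt the code to your needs.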

The code:

### count occurrence of a word
reset session

$Data <<EOD
1300000000.0,Seeking Green M4M
1300000000.0,Seeking Blue M4M
1310000000.0,Seeking Green M4M
1320000000.0,Seeking Red M4M
1330000000.0,Seeking Black M4M
1340000000.0,Looking for car fun 
1350000000.0,armenian M4M
1360000000.0,english M4M
1370000000.0,german M4M
1380000000.0,french M4M
1390000000.0,italian M4M
1390200000.0,greek M4M
1400000000.0,swiss M4M
1500000000.0,spanish M4M
EOD

set datafile separator ","
set xdata time
set timefmt "%s"
set format x "%Y"

Keyword ="M4M"
Binwidth = 3600.*24*7   # one week

plot $Data u (floor($1/Binwidth)*Binwidth):(strstrt(strcol(2),Keyword)>0) \
    smooth freq w lp pt 7 lc rgb "red" title Keyword
### end of code

The result:

[Figure: count of the keyword per weekly bin over time, plotted with linespoints]

Edit:

Actually, it might be misleading to plot the result with lines or linespoints (as above), because the connecting line suggests that the count between 2015 and 2017 is 1, which is not true. Plotting with boxes would suggest the same. These plot styles are only appropriate if there is a value in every bin (here: every week). You could set the value of all other weeks to zero, but the plot style that is "correct" in any case is with impulses.

[Figure: the same counts plotted with impulses]
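
For reference, a minimal sketch of the impulses variant. It assumes gnuplot 5 or later (for reset session and datablocks) and uses only a few rows of the sample data from the first script; the lw 2 and set yrange settings are my own additions for readability, not something the data requires.

```gnuplot
### count occurrence of a word, drawn with impulses
reset session

# a few rows of the sample data from the first script
$Data <<EOD
1300000000.0,Seeking Green M4M
1300000000.0,Seeking Blue M4M
1340000000.0,Looking for car fun
1400000000.0,swiss M4M
1500000000.0,spanish M4M
EOD

set datafile separator ","
set xdata time
set timefmt "%s"
set format x "%Y"
set yrange [0:*]        # assumption: start the y axis at zero

Keyword  = "M4M"
Binwidth = 3600.*24*7   # one week

# x: timestamp rounded down to the start of its week
# y: 1 if the title contains Keyword, else 0; smooth freq sums y per x
plot $Data u (floor($1/Binwidth)*Binwidth):(strstrt(strcol(2),Keyword)>0) \
    smooth freq w impulses lw 2 lc rgb "red" title Keyword
### end of code
```

Each impulse is a vertical line at one bin, so nothing is drawn between bins and empty weeks simply show no line. To plot a real file instead of the inline data, replace $Data with the quoted file name, for example "posts.csv" (a hypothetical name).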
