%pylab inline
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.kernel.zmq.pylab.backend_inline]. For more information, type 'help(pylab)'.
Nikolay Koldunov
koldunovn@gmail.com
In this notebook, I am going to show a simple example of an Apache access log analysis with pandas. It's my first experience with pandas, and I am sure there are better and more efficient ways to do some of the things shown here, so comments, suggestions and corrections of my English are very welcome. You can send me an email, or create a PR for this notebook at github.
We will need the apachelog module for parsing the log. We also have to know the format of the log, which is set in the Apache config. In my case I have no access to the Apache config, but the hoster provides a description of the format on its help page. Below is the format itself and a short description of every element:
format = r'%V %h %l %u %t \"%r\" %>s %b \"%i\" \"%{User-Agent}i\" %T'
Here (copied mostly from this SO post):
%V - the server name according to the UseCanonicalName setting
%h - remote host (ie the client IP)
%l - identity of the user determined by identd (not usually used since not reliable)
%u - user name determined by HTTP authentication
%t - time the server finished processing the request.
%r - request line from the client. ("GET / HTTP/1.0")
%>s - status code sent from the server to the client (200, 404 etc.)
%b - size of the response to the client (in bytes)
%i - Referer: the page that linked to this URL
%{User-Agent}i - the browser identification string
%T - Apache request time
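Under the hood, apachelog builds a regular expression from this format string. If the module is unavailable, a hand-written regex is a workable fallback; below is a minimal sketch for this particular format (the group names are my own choice, not the %-directives apachelog would use, and the User-Agent in the sample is shortened):

```python
import re

# Hand-written pattern for: %V %h %l %u %t "%r" %>s %b "%i" "%{User-Agent}i" %T
LOG_PATTERN = re.compile(
    r'(?P<vhost>\S+) (?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'(?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<rtime>\S+)'
)

line = ('koldunov.net 85.26.235.202 - - [16/Mar/2013:00:19:43 +0400] '
        '"GET /?p=364 HTTP/1.0" 200 65237 "http://koldunov.net/?p=364" '
        '"Mozilla/5.0 (Windows NT 5.1)" 0')

# groupdict() gives the same kind of field dictionary apachelog produces
fields = LOG_PATTERN.match(line).groupdict()
```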
import apachelog, sys
Set the format:
fformat = r'%V %h %l %u %t \"%r\" %>s %b \"%i\" \"%{User-Agent}i\" %T'
Create the parser:
p = apachelog.parser(fformat)
Sample string:
koldunov.net 85.26.235.202 - - [16/Mar/2013:00:19:43 +0400] "GET /?p=364 HTTP/1.0" 200 65237 "http://koldunov.net/?p=364" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11" 0
sample_string = 'koldunov.net 85.26.235.202 - - [16/Mar/2013:00:19:43 +0400] "GET /?p=364 HTTP/1.0" 200 65237 "http://koldunov.net/?p=364" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11" 0'
data = p.parse(sample_string)
data
{'%>s': '200', '%T': '0', '%V': 'koldunov.net', '%b': '65237', '%h': '85.26.235.202', '%i': 'http://koldunov.net/?p=364', '%l': '-', '%r': 'GET /?p=364 HTTP/1.0', '%t': '[16/Mar/2013:00:19:43 +0400]', '%u': '-', '%{User-Agent}i': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
Read the log file:
log = open('access_log_for_pandas').readlines()
Parse every line of it and create a list of dictionaries:
log_list = []
for line in log:
    try:
        data = p.parse(line)
    except:
        sys.stderr.write("Unable to parse %s" % line)
        continue  # skip lines that don't match the format
    data['%t'] = data['%t'][1:12]+' '+data['%t'][13:21]+' '+data['%t'][22:27]
    log_list.append(data)
We had to tweak the time format a bit, since otherwise pandas will not be able to parse it.
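To see what the slicing does, take the raw %t value from the sample line: the brackets are dropped and the first colon becomes a space, which yields a string that standard date parsers accept (shown here with the stdlib, just for illustration):

```python
from datetime import datetime

t = '[16/Mar/2013:00:19:43 +0400]'

# '[16/Mar/2013:00:19:43 +0400]' -> '16/Mar/2013 00:19:43 +0400'
tweaked = t[1:12] + ' ' + t[13:21] + ' ' + t[22:27]

# the reshaped string parses cleanly, timezone offset included
parsed = datetime.strptime(tweaked, '%d/%b/%Y %H:%M:%S %z')
```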
The parsing loop creates a list of dictionaries that can be transformed into a Data Frame:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame, Panel
df = DataFrame(log_list)
Show the first two lines of the Data Frame:
df[0:2]
%>s | %T | %V | %b | %h | %i | %l | %r | %t | %u | %{User-Agent}i | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 200 | 0 | www.oceanographers.ru | 26126 | 109.165.31.156 | - | - | GET /index.php?option=com_content&task=section... | 16/Mar/2013 08:00:25 +0400 | - | Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20... |
1 | 200 | 0 | www.oceanographers.ru | 10532 | 109.165.31.156 | http://www.oceanographers.ru/index.php?option=... | - | GET /templates/ja_procyon/css/template_css.css... | 16/Mar/2013 08:00:25 +0400 | - | Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20... |
We are not going to use all the data, so let's delete some of the columns:
del df['%T']; del df['%V']; del df['%i']; del df['%l']; del df['%u']; del df['%{User-Agent}i']
and rename columns to something that humans can understand:
df = df.rename(columns={'%>s': 'Status', '%b':'b',
'%h':'IP', '%r':'Request', '%t': 'Time'})
First five rows of resulting Data Frame:
df.head()
Status | b | IP | Request | Time | |
---|---|---|---|---|---|
0 | 200 | 26126 | 109.165.31.156 | GET /index.php?option=com_content&task=section... | 16/Mar/2013 08:00:25 +0400 |
1 | 200 | 10532 | 109.165.31.156 | GET /templates/ja_procyon/css/template_css.css... | 16/Mar/2013 08:00:25 +0400 |
2 | 200 | 1853 | 109.165.31.156 | GET /templates/ja_procyon/switcher.js HTTP/1.0 | 16/Mar/2013 08:00:25 +0400 |
3 | 200 | 37153 | 109.165.31.156 | GET /includes/js/overlib_mini.js HTTP/1.0 | 16/Mar/2013 08:00:25 +0400 |
4 | 200 | 3978 | 109.165.31.156 | GET /modules/ja_transmenu/transmenuh.css HTTP/1.0 | 16/Mar/2013 08:00:25 +0400 |
Convert Time column to datetime format and make an index out of it (pop will drop original Time column):
df.index = pd.to_datetime(df.pop('Time'))
The Status variable is a string type, so we have to change it to int:
df['Status'] = df['Status'].astype('int')
Some of the rows in the b column contain '-' string, so we can't convert them with astype:
df['b'][93]
'-'
We can apply a custom function to this column, that will convert all dashes to NaN, and the rest to floats, additionally converting from bytes to megabytes:
def dash2nan(x):
    if x == '-':
        x = np.nan
    else:
        x = float(x)/1048576.
    return x
df['b'] = df['b'].apply(dash2nan)
I am sure there is a more elegant way to do this :)
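One candidate, assuming a reasonably recent pandas, is to let replace and astype do the work; a sketch on a toy Series:

```python
import numpy as np
import pandas as pd

b = pd.Series(['65237', '-', '1048576'])

# '-' becomes NaN, everything else a float, then bytes -> megabytes
mb = b.replace('-', np.nan).astype(float) / 1048576.
```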
Our first, simplest plot: outgoing traffic from the website:
df['b'].plot()
<matplotlib.axes.AxesSubplot at 0xbf7574c>
Looks like somebody downloaded something big from the website around 9 a.m.
But actually the first thing you probably want to know is how many visits (hits, actually) your site gets, and how they are distributed in time. We resample the series of the variable b with 5-minute intervals and count the number of requests in every time span. In this case it doesn't matter which variable we use: the numbers just indicate how many times information from the website was requested.
df_s = df['b'].resample('5t', how='count')
df_s.plot()
<matplotlib.axes.AxesSubplot at 0xc14588c>
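(In pandas versions released after this notebook, the how= argument to resample was removed; the same count is spelled as a method call. A sketch on a toy series:)

```python
import pandas as pd

idx = pd.to_datetime(['2013-03-16 08:00', '2013-03-16 08:02',
                      '2013-03-16 08:07', '2013-03-16 08:09'])
s = pd.Series([1, 1, 1, 1], index=idx)

# modern equivalent of s.resample('5t', how='count')
hits = s.resample('5min').count()
```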
We can not only count the number of requests per time span, but also calculate the sum of the traffic for each one:
df_b = df['b'].resample('10t', how=['count','sum'])
df_b['count'].plot( color='r')
legend()
df_b['sum'].plot(secondary_y=True)
<matplotlib.axes.AxesSubplot at 0xc2d53ac>
On the plot you can see that the number of server requests does not always coincide with the amount of traffic; the correlation is actually not very high:
df_b.corr()
count | sum | |
---|---|---|
count | 1.000000 | 0.512629 |
sum | 0.512629 | 1.000000 |
We can have a closer look at the curious morning peak:
df_b['2013-03-16 6:00':'2013-03-16 10:00']['sum'].plot()
<matplotlib.axes.AxesSubplot at 0xc3f5dac>
Seems that this traffic spike was caused by a single request. Let's find out what this request looks like. Choose all requests with a response size larger than 20 Mb (remember, we converted bytes to Mb):
df[df['b']>20]
Status | b | IP | Request | |
---|---|---|---|---|
Time | ||||
2013-03-16 09:02:59 | 200 | 21.365701 | 77.50.248.20 | GET /books/Bondarenko.pdf HTTP/1.0 |
It was a .pdf file of a book (look at the Request field), and this explains the peak in outgoing traffic at 2013-03-16 09:02:59.
Clearly a 20 Mb response is large (for our website at least :)). But what is the typical size of the server's response? The histogram of response sizes (less than 20 Mb) looks like this:
cc = df[df['b']<20]
cc.b.hist(bins=10)
<matplotlib.axes.AxesSubplot at 0xc52374c>
So, most of the files are smaller than 0.5 Mb. In fact they are even smaller:
cc = df[df['b']<0.3]
cc.b.hist(bins=100)
<matplotlib.axes.AxesSubplot at 0xc5760ec>
Very small responses can be all kinds of stuff, but larger ones are probably some relatively big files. Let's have a look at these little spikes between 0.2 and 0.25 Mb:
cc = df[(df['b']>0.2)&(df['b']<0.25)]
cc.b.hist(bins=100)
<matplotlib.axes.AxesSubplot at 0xc57cacc>
We can now locate the size of the file more precisely and find out its name:
cc = df[(df['b']>0.220)&(df['b']<0.224)]
cc.head()
Status | b | IP | Request | |
---|---|---|---|---|
Time | ||||
2013-03-16 10:13:51 | 200 | 0.223101 | 178.238.18.194 | GET /mypict/itpocover.png HTTP/1.0 |
2013-03-16 11:02:12 | 200 | 0.223101 | 2.95.140.217 | GET /mypict/itpocover.png HTTP/1.0 |
2013-03-16 11:18:48 | 200 | 0.223101 | 213.87.138.145 | GET /mypict/itpocover.png HTTP/1.0 |
2013-03-16 11:33:36 | 200 | 0.223101 | 91.207.106.252 | GET /mypict/itpocover.png HTTP/1.0 |
2013-03-16 11:34:03 | 200 | 0.223101 | 212.3.137.202 | GET /mypict/itpocover.png HTTP/1.0 |
This is one of the image files from the front page of the website.
I would like to know how the different server responses are distributed in time. Let's first try a not very elegant approach: we create several variables, each holding the time series of Status values for one particular code, and then build a Data Frame out of these time series.
t_span = '2H'
df_404 = df['Status'][df['Status'] == 404].resample(t_span, how='count')
df_403 = df['Status'][df['Status'] == 403].resample(t_span, how='count')
df_301 = df['Status'][df['Status'] == 301].resample(t_span, how='count')
df_304 = df['Status'][df['Status'] == 304].resample(t_span, how='count')
df_200 = df['Status'][df['Status'] == 200].resample(t_span, how='count')
status_df = DataFrame({'Not Found':df_404, 'Forbidden':df_403, 'Moved Permanently':df_301, 'Not Modified':df_304, 'OK':df_200,})
status_df.head()
Forbidden | Moved Permanently | Not Found | Not Modified | OK | |
---|---|---|---|---|---|
Time | |||||
2013-03-16 08:00:00 | 22 | 17 | 3 | 4 | 375 |
2013-03-16 10:00:00 | 26 | 76 | 2 | 2 | 607 |
2013-03-16 12:00:00 | 45 | 83 | 2 | 23 | 780 |
2013-03-16 14:00:00 | 29 | 34 | 4 | 2 | 699 |
2013-03-16 16:00:00 | 27 | 94 | 5 | 3 | 886 |
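A more compact route to the same kind of table, in a current pandas, is to group by a time bin and the status code at once and unstack; a sketch on a toy datetime-indexed frame:

```python
import pandas as pd

idx = pd.to_datetime(['2013-03-16 08:05', '2013-03-16 08:10',
                      '2013-03-16 10:30', '2013-03-16 10:45'])
df = pd.DataFrame({'Status': [200, 404, 200, 200]}, index=idx)

# one row per 2-hour bin, one column per status code
table = (df.groupby([pd.Grouper(freq='2h'), 'Status'])
           .size().unstack(fill_value=0))
```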
We plot all values at once:
status_df.plot(figsize=(10, 3))
<matplotlib.axes.AxesSubplot at 0xcda1e6c>
Show only Error and Redirection status codes as a stacked plot:
status_df[['Not Found','Forbidden','Moved Permanently','Not Modified']].plot(kind='barh', stacked=True, figsize=(10, 7))
<matplotlib.axes.AxesSubplot at 0xce27e2c>
But there is a better way to get statistics about different groups of values. Here we group our data by Status:
grouped_status = df.groupby('Status')
The data are now rearranged into several groups, each corresponding to a certain value of Status (here only the first two rows of every group are shown):
grouped_status.head(2)
Status | b | IP | Request | ||
---|---|---|---|---|---|
Status | Time | ||||
200 | 2013-03-16 08:00:25 | 200 | 0.024916 | 109.165.31.156 | GET /index.php?option=com_content&task=section... |
2013-03-16 08:00:25 | 200 | 0.010044 | 109.165.31.156 | GET /templates/ja_procyon/css/template_css.css... | |
206 | 2013-03-16 11:42:53 | 206 | 0.031250 | 109.171.109.164 | GET /images/stories/pdf/0104_044.pdf HTTP/1.0 |
2013-03-16 11:42:53 | 206 | 0.224717 | 109.171.109.164 | GET /images/stories/pdf/0104_044.pdf HTTP/1.0 | |
301 | 2013-03-16 08:13:08 | 301 | 0.000232 | 38.100.21.63 | GET /forum HTTP/1.0 |
2013-03-16 08:13:09 | 301 | 0.000234 | 38.100.21.63 | GET /podcast HTTP/1.0 | |
302 | 2013-03-16 08:47:04 | 302 | NaN | 176.8.91.244 | POST /podcast/wp-comments-post.php HTTP/1.0 |
2013-03-16 11:27:40 | 302 | NaN | 78.30.212.183 | POST /podcast/wp-comments-post.php HTTP/1.0 | |
304 | 2013-03-16 08:19:41 | 304 | NaN | 178.154.206.250 | GET /images/stories/researchers/laplace.jpg HT... |
2013-03-16 08:33:14 | 304 | NaN | 178.154.206.250 | GET /images/stories/researchers/treshnikov.jpg... | |
400 | 2013-03-16 08:13:09 | 400 | 0.000999 | 38.100.21.63 | GET /.. HTTP/1.0 |
403 | 2013-03-16 08:04:23 | 403 | 0.001054 | 5.79.199.116 | GET /forum/download/file.php?avatar=156.gif HT... |
2013-03-16 08:04:23 | 403 | 0.001054 | 5.79.199.116 | GET /forum/download/file.php?avatar=216.jpg HT... | |
404 | 2013-03-16 08:04:25 | 404 | 0.001063 | 5.79.199.116 | GET /apple-touch-icon-precomposed.png HTTP/1.0 |
2013-03-16 08:04:26 | 404 | 0.001063 | 5.79.199.116 | GET /apple-touch-icon.png HTTP/1.0 |
We can count the number of values in each group:
grouped_status.size().plot(kind='bar')
<matplotlib.axes.AxesSubplot at 0xcfe76ac>
We can look at individual groups:
t_span = '30t'
grouped_status.get_group(301)['Status'].resample(t_span, how='count').plot(color='g', label='301')
legend()
grouped_status.get_group(200)['Status'].resample(t_span, how='count').plot(color='b', secondary_y=True, label='200')
<matplotlib.axes.AxesSubplot at 0xd0799ec>
Now I want to group by IPs and calculate how many requests we get from each of them:
ips = df.groupby('IP').size()
I don't want to see all of them, just the top 10:
ips.sort()
ips[-10:].plot(kind='barh')
<matplotlib.axes.AxesSubplot at 0xd13c0ec>
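(A note for readers on a current pandas: the in-place Series.sort() used above was later removed; the equivalent spelling is sort_values. A sketch on a toy series with made-up IPs:)

```python
import pandas as pd

ips = pd.Series({'1.2.3.4': 814, '5.6.7.8': 168, '9.10.11.12': 72})

# ascending sort, then take the largest entries from the tail,
# same as ips.sort() followed by ips[-2:]
top = ips.sort_values().tail(2)
```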
I just want a nice table with the top 10 IPs, so I convert them to a Data Frame :)
ips_fd = DataFrame({'Number of requests':ips[-10:]})
ips_fd = ips_fd.sort(columns='Number of requests', ascending=False)
ips_fd
Number of requests | |
---|---|
IP | |
66.249.73.137 | 814 |
178.154.206.250 | 168 |
85.114.191.235 | 165 |
173.199.115.187 | 127 |
85.142.104.227 | 126 |
178.70.210.234 | 120 |
85.142.104.228 | 111 |
95.118.89.119 | 93 |
217.69.133.68 | 74 |
92.255.161.124 | 72 |
We can also group by both IP and Status:
ips_status = df.groupby(['IP', 'Status']).size()
ips_status.sort()
ips_status[-20:].plot(kind='barh')
<matplotlib.axes.AxesSubplot at 0xd3425cc>
To find out where the visitors come from, we can map IP addresses to locations with the pygeoip module and the GeoLiteCity database. Import the module and initialise the database:
import pygeoip
gi = pygeoip.GeoIP('./GeoLiteCity.dat', pygeoip.MEMORY_CACHE)
This is how the pygeoip output looks (we test our top IP from the previous section):
ipcon = gi.record_by_addr('64.233.161.99')
ipcon
{'area_code': 650, 'city': 'Mountain View', 'continent': 'NA', 'country_code': 'US', 'country_code3': 'USA', 'country_name': 'United States', 'dma_code': 807, 'latitude': 37.41919999999999, 'longitude': -122.0574, 'metro_code': 'San Francisco, CA', 'postal_code': '94043', 'region_name': 'CA', 'time_zone': 'America/Los_Angeles'}
Guess who it is :) Looks like Google Bot is my main reader. That's rather sad :(
A loop creates a list of dictionaries with geolocation information for each IP address; we also add the number of requests from each IP:
ipcon = []
for iip in ips.index:
    rres = gi.record_by_addr(iip)
    # rres['ip'] = iip
    rres['Number'] = ips[iip]
    # delete some fields we don't need
    del rres['area_code']
    del rres['dma_code']
    del rres['metro_code']
    del rres['postal_code']
    del rres['region_name']
    del rres['time_zone']
    del rres['country_code']
    ipcon.append(rres)
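The run of del statements can be condensed to a dict comprehension that keeps only the fields we want, and it is also safer to guard against record_by_addr returning None for unknown addresses. A minimal sketch (the helper name is my own):

```python
KEEP = {'city', 'continent', 'country_code3', 'country_name',
        'latitude', 'longitude'}

def prune(rres, n_requests):
    """Keep only the fields we need and attach the request count."""
    if rres is None:  # pygeoip returns None for addresses it cannot resolve
        return None
    out = {k: v for k, v in rres.items() if k in KEEP}
    out['Number'] = n_requests
    return out

record = prune({'city': 'Mountain View', 'area_code': 650,
                'latitude': 37.42, 'longitude': -122.06}, 814)
```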
Create a Data Frame from this list, using the ips index:
reg = DataFrame(ipcon, index = ips.index)
reg.head()
Number | city | continent | country_code3 | country_name | latitude | longitude | |
---|---|---|---|---|---|---|---|
IP | |||||||
112.78.139.130 | 1 | Jakarta | AS | IDN | Indonesia | -6.1744 | 106.8294 |
116.25.249.140 | 1 | Shenzhen | AS | CHN | China | 22.5333 | 114.1333 |
117.199.193.19 | 1 | Agra | AS | IND | India | 27.1833 | 78.0167 |
117.27.175.158 | 1 | Zhangzhou | AS | CHN | China | 24.5133 | 117.6556 |
119.147.6.60 | 1 | Guangzhou | AS | CHN | China | 23.1167 | 113.2500 |
Group by Country:
country = reg.groupby('country_code3')
ff = country.Number.agg('sum').copy()
ff.sort()
ff[-10:].plot(kind='barh')
<matplotlib.axes.AxesSubplot at 0xd6c68cc>
The website is in Russian, so no surprises here.
Group by city:
city = reg.groupby('city')
ff = city.Number.agg('sum').copy()
ff.sort()
ff[-20:].plot(kind='barh', figsize=(5,8))
<matplotlib.axes.AxesSubplot at 0xd86fa2c>
The first 3 are kind of expected, but Tyumen, a Russian city located in the middle of Eurasia thousands of kilometers away from the nearest ocean coast, is a bit of a surprise (the website is about Oceanography). This is something to investigate later on.
We have latitudes and longitudes for the data, so it is natural to put them on a map:
from mpl_toolkits.basemap import Basemap
import matplotlib.cm as cm
m = Basemap(projection='robin',lon_0=0,resolution='c')
x, y = m(reg['longitude'],reg['latitude'])
figure(figsize=(15,15))
m.drawcoastlines(linewidth=0.25)
m.drawcountries(linewidth=0.25)
m.fillcontinents(color='coral',lake_color='aqua')
m.drawmapboundary(fill_color='white')
m.drawmeridians(np.arange(0,360,30))
m.drawparallels(np.arange(-90,90,30))
m.scatter(x,y,s=reg['Number']*3,c=reg['Number']/5,marker='o',zorder=4, cmap=cm.Paired,alpha=0.5)
<matplotlib.collections.PathCollection at 0x15facacc>
Have a closer look at Europe:
m = Basemap(projection='cyl',llcrnrlat=35,urcrnrlat=72,\
llcrnrlon=-10,urcrnrlon=50,resolution='l')
x, y = m(reg['longitude'],reg['latitude'])
figure(figsize=(15,15))
m.drawcoastlines(linewidth=0.25)
m.drawcountries(linewidth=0.25)
m.fillcontinents(color='white',lake_color='aqua')
m.drawmapboundary(fill_color='aqua')
m.drawmeridians(np.arange(0,360,30))
m.drawparallels(np.arange(-90,90,30))
m.scatter(x,y,s=reg['Number']*30,c=reg['Number'],marker='o',zorder=4, cmap=cm.gist_ncar ,alpha=0.3)
<matplotlib.collections.PathCollection at 0x169381ec>
Spammers are a huge problem, and it would be nice to filter them out. My website has very low user activity, so if somebody tries to POST something more than a couple of times it's already suspicious. First we have to find out which request method each request used. For this we analyse the Request column, which contains the request strings. The request method is always at the beginning of the request string, so it's trivial to extract:
def gmet(x):
    x = x.split()
    return x[0]
We apply this little function to the Request column and create a new Method column:
df['Method'] = df.Request.apply(gmet)
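The same column can be produced without a Python-level function via the vectorized .str accessor; a sketch on a toy Series:

```python
import pandas as pd

requests = pd.Series(['GET /?p=364 HTTP/1.0',
                      'POST /forum/ucp.php?mode=register HTTP/1.0'])

# first whitespace-separated token of each request line
methods = requests.str.split().str[0]
```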
Group by Method and IP:
met = df.groupby(['Method', 'IP']).size()
met.head()
Method  IP
GET     100.44.124.8       26
        108.171.252.242    23
        109.165.31.156     12
        109.171.109.164     4
        109.191.82.110     14
Select the POST counts (we take a copy so that the in-place sort doesn't affect the original):
post = met['POST'].copy()
post.sort()
Plot the top 10 IPs that use POST method:
post[-10:].plot(kind='barh')
<matplotlib.axes.AxesSubplot at 0x16ddaeec>
post[-5:]
IP
173.208.108.10    10
66.23.234.10      10
213.248.47.229    11
160.75.185.7      15
77.222.40.65      58
What is the winner doing?
df[df.IP =='77.222.40.65'][0:5]
Status | b | IP | Request | Method | |
---|---|---|---|---|---|
Time | |||||
2013-03-16 08:28:43 | 403 | 0.001054 | 77.222.40.65 | POST /podcast/wp-cron.php?doing_wp_cron=136340... | POST |
2013-03-16 08:45:08 | 403 | 0.001054 | 77.222.40.65 | POST /podcast/wp-cron.php?doing_wp_cron=136340... | POST |
2013-03-16 09:19:52 | 403 | 0.001054 | 77.222.40.65 | POST /podcast/wp-cron.php?doing_wp_cron=136341... | POST |
2013-03-16 09:36:26 | 403 | 0.001054 | 77.222.40.65 | POST /podcast/wp-cron.php?doing_wp_cron=136341... | POST |
2013-03-16 09:40:22 | 403 | 0.001054 | 77.222.40.65 | POST /podcast/wp-cron.php?doing_wp_cron=136341... | POST |
Making some Wordpress cron requests... This IP looks familiar... Oh, it's actually the IP of my own website: this is my Wordpress doing something. I have to investigate later whether I can disable this (it isn't working anyway, it gets a 403 error) :)
Number two:
df[df.IP =='160.75.185.7'][0:10]
Status | b | IP | Request | Method | |
---|---|---|---|---|---|
Time | |||||
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | GET /administrator/index.php HTTP/1.0 | GET |
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | GET /administrator/index.php HTTP/1.0 | GET |
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | GET /administrator/index.php HTTP/1.0 | GET |
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | GET /administrator/index.php HTTP/1.0 | GET |
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | GET /administrator/index.php HTTP/1.0 | GET |
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | POST /administrator/index.php HTTP/1.0 | POST |
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | POST /administrator/index.php HTTP/1.0 | POST |
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | GET /administrator/index.php HTTP/1.0 | GET |
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | POST /administrator/index.php HTTP/1.0 | POST |
2013-03-16 17:27:12 | 200 | 0.002224 | 160.75.185.7 | POST /administrator/index.php HTTP/1.0 | POST |
This is suspicious; looks like it tries to get to the website's admin area - candidate for blocking.
Number three:
df[df.IP =='213.248.47.229'][0:10]
Status | b | IP | Request | Method | |
---|---|---|---|---|---|
Time | |||||
2013-03-16 11:00:34 | 200 | 0.012539 | 213.248.47.229 | GET /forum/ucp.php?mode=register&sid=8203c4843... | GET |
2013-03-16 11:00:34 | 200 | 0.026559 | 213.248.47.229 | GET /forum/index.php HTTP/1.0 | GET |
2013-03-16 11:00:34 | 200 | 0.012011 | 213.248.47.229 | GET /forum/ucp.php?mode=register&sid=c78851b8f... | GET |
2013-03-16 11:00:39 | 200 | 0.016414 | 213.248.47.229 | POST /forum/ucp.php?mode=register&sid=12986e06... | POST |
2013-03-16 11:00:39 | 200 | 0.016414 | 213.248.47.229 | POST /forum/ucp.php?mode=register HTTP/1.0 | POST |
2013-03-16 11:00:44 | 200 | 0.015024 | 213.248.47.229 | POST /forum/ucp.php?mode=register HTTP/1.0 | POST |
2013-03-16 11:00:45 | 200 | 0.015028 | 213.248.47.229 | POST /forum/ucp.php?mode=register HTTP/1.0 | POST |
2013-03-16 11:00:49 | 200 | 0.015024 | 213.248.47.229 | POST /forum/ucp.php?mode=register HTTP/1.0 | POST |
2013-03-16 11:00:50 | 200 | 0.015028 | 213.248.47.229 | POST /forum/ucp.php?mode=register HTTP/1.0 | POST |
2013-03-16 11:00:55 | 200 | 0.015024 | 213.248.47.229 | POST /forum/ucp.php?mode=register HTTP/1.0 | POST |
This one tries to register on the forum every 5 seconds or so; clearly a forum spammer. Block.