Proxylizer/Concepts
Collected Log Data
Logs received from the MikroTik router Web Proxy server contain various information about web requests. From these logs Proxylizer stores the domain, the IP addresses, the event time, and whether the site was loaded from the proxy server cache. Everything in the URL after '?' is dropped and not stored in the database, because this information is too large and is not needed for monitoring Web Proxy request statistics.
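The dropping of everything after '?' can be illustrated with a minimal Python sketch. The function names `strip_query` and `request_domain` are hypothetical and are not part of Proxylizer; the sketch only shows the idea of discarding the query part and keeping the domain.

```python
from urllib.parse import urlsplit


def strip_query(url: str) -> str:
    """Return the URL with the first '?' and everything after it removed."""
    return url.split("?", 1)[0]


def request_domain(url: str) -> str:
    """Return the host name of a URL in lower case (empty string if none)."""
    return (urlsplit(url).hostname or "").lower()


url = "http://wiki.mikrotik.com/index.php?title=Proxylizer&action=edit"
print(strip_query(url))     # the part that would be kept
print(request_domain(url))  # the domain that would be stored
```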
The received domain is stored in a separate table from the logs and is divided into 3 parts: sub domain, domain and top domain. If a domain has more than 3 parts, for example abc.def.ghi.mikrotik.com, it is divided into 'abc.def.ghi' + 'mikrotik' + 'com'.
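The three-part split can be sketched in Python as follows. The function name `split_domain` is hypothetical, and the handling of names with fewer than 3 parts (empty strings for the missing parts) is an assumption, since it is not described here.

```python
def split_domain(host: str) -> tuple[str, str, str]:
    """Split a host name into (sub domain, domain, top domain).

    The last label is the top domain, the one before it is the domain,
    and everything in front of that is joined back into the sub domain.
    Assumption: missing parts are returned as empty strings.
    """
    labels = host.lower().strip(".").split(".")
    top = labels[-1]
    domain = labels[-2] if len(labels) >= 2 else ""
    sub = ".".join(labels[:-2])
    return sub, domain, top


print(split_domain("abc.def.ghi.mikrotik.com"))
print(split_domain("wiki.mikrotik.com"))
```

With this split, 'abc.def.ghi.mikrotik.com' yields 'abc.def.ghi', 'mikrotik' and 'com', matching the example above.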
Storing 1000 log records takes approximately 200KB of space in the database. You can see the database statistics in the Status section on the Proxylizer web page.
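The figure of roughly 200KB per 1000 records allows a rough size estimate. The sketch below is plain arithmetic on that figure; the function name and the record counts are illustrative assumptions, and real database sizes will vary.

```python
KB_PER_1000_RECORDS = 200  # approximate figure quoted above


def estimated_size_kb(record_count: int) -> float:
    """Rough database size in KB for the given number of log records."""
    return record_count * KB_PER_1000_RECORDS / 1000


# Example: a hypothetical site logging 50 000 requests per day
print(estimated_size_kb(50_000))       # per day, in KB
print(estimated_size_kb(50_000 * 30))  # per 30 days, in KB
```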
Reports
Report name
This field gives each report a user-defined identifier.
Report type
There are 2 report types: Domain and User statistics. The former is used to analyze domain usage across all web requests, the latter to view the web requests of each user.
Frequency
There are 4 frequencies available: once, daily, weekly and monthly. A "once" report differs from the others: it is designed for custom reports, and there are no restrictions on this type of report. As the name states, it is executed only once, while the others are periodic. Periodic reports are generated only when all data have been collected. For example, if data for March 4th are needed, they are available only after March 5th 00:00.
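The rule that data for a day are complete only once the next day has started can be sketched as follows. The function name `data_complete_at` is hypothetical, and the year 2009 is chosen only for illustration.

```python
from datetime import date, datetime, time, timedelta


def data_complete_at(last_day: date) -> datetime:
    """Earliest moment at which all data for `last_day` are collected:
    00:00 of the following day."""
    return datetime.combine(last_day + timedelta(days=1), time.min)


# Data for March 4th are complete at March 5th 00:00
print(data_complete_at(date(2009, 3, 4)))
```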
Recipient
All generated reports can be sent by e-mail. If "No recipient" is selected, the report is not sent to any e-mail address. Like all reports, however, it can still be viewed in the web interface, in the report section, under (history). Recipient addresses are selected from the IP User list, which can be edited on the IP users page.
Date interval
These fields set the date boundaries for the report.
- For a once report, all data in this period is selected;
- For periodic reports, a report is generated for each day (daily), each week (weekly) or each month (monthly) in this period.
These boundaries can be set in the past or in the future. For example, if today is February 15th, a daily report can be set from January 3rd to February 2nd to receive a report for each day in that period, or from March 5th to April 20th, in which case the reports are generated when those dates arrive.
Day interval
This field is available only for monthly reports. It sets the first and last day of each month whose data is included in the report. For example, if data for the first and then the second half of the month is needed, 2 reports must be created: one for the 1st to the 15th and another for the 16th to the 31st. For the end date, any value greater than the number of days in the actual month is treated as the last day of the month. For example, February has 28 or 29 days, so any value greater than that means the end of the month. For October, however, 29 means "do not select data for October 30 and 31".
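The end-of-month rule can be sketched with the standard `calendar` module. The function name `month_day_range` is hypothetical, and the sketch assumes the first day is a valid day of the month, since handling of an out-of-range start day is not described here.

```python
import calendar
from datetime import date


def month_day_range(year: int, month: int, first_day: int, last_day: int):
    """Return the (start, end) dates of a day interval within one month.

    An end day beyond the length of the month is treated as the last
    day of that month. Assumption: first_day exists in the month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    end_day = min(last_day, days_in_month)
    return date(year, month, first_day), date(year, month, end_day)


# "16th till 31st" in February 2009 ends on the 28th
print(month_day_range(2009, 2, 16, 31))
# An end day of 29 in October leaves out October 30 and 31
print(month_day_range(2009, 10, 1, 29))
```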
Weekdays
Use if filtering by day of the week is needed. For example, if only working days are needed, uncheck the weekend days.
Time interval
Use if filtering by time of day is needed. Multiple time intervals can be defined, for example, 09:00-12:00 and 13:00-17:00. Use the add and remove icons to add or remove intervals.
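The weekday and time interval filters can be pictured as a simple predicate over the request time. This is only a sketch: the function name `matches_filters` is hypothetical, and two points are assumptions, namely that the end of an interval is exclusive and that an empty interval list means "no time filter".

```python
from datetime import datetime, time


def matches_filters(ts: datetime, weekdays: set, intervals: list) -> bool:
    """Check a request time against weekday and time-of-day filters.

    weekdays:  allowed weekdays, Monday = 0 ... Sunday = 6
    intervals: list of (start, end) time pairs, end exclusive (assumption)
    """
    if ts.weekday() not in weekdays:
        return False
    if not intervals:  # assumption: no intervals means no time filtering
        return True
    t = ts.time()
    return any(start <= t < end for start, end in intervals)


working_days = {0, 1, 2, 3, 4}
office_hours = [(time(9, 0), time(12, 0)), (time(13, 0), time(17, 0))]

print(matches_filters(datetime(2009, 1, 19, 10, 30), working_days, office_hours))
print(matches_filters(datetime(2009, 1, 19, 12, 30), working_days, office_hours))
```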
IP
Use to filter by IP address. If "Show all" is selected, the report covers all IP addresses and displays the approximate time spent on the internet. Otherwise it shows the domains and time for the specific IP user.
Domain
Use to filter by domains and parts of domains. As explained previously, a domain is divided into 3 parts: sub domain, domain and top domain. Accordingly, use the first field to filter by sub domain, the second by domain and the third by top domain.
Top
Use if only the most frequently occurring data is needed. For example, select "View top 10" to view only the 10 most used domains.
Generate time
This field sets the time when the report must be generated. Its form depends on the frequency: for "once" it is a date and a time of day, for daily only a time of day, and so on. As mentioned previously in this section, a report can be generated only when all data are collected. For example, if a weekly report for working days is needed, it can be generated only after Saturday 00:00. The principle is: as soon as possible.
Time calculation
The time used for each domain or user is calculated by a simple algorithm: if a user makes at least one web request in a given minute, the user has used the internet for one minute. It makes no difference whether there are thousands of requests in that minute or just one. The number of requests is counted separately and shown in the "Hitcount" column. It is likewise irrelevant how many different requests are made at the same time.
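The algorithm can be sketched as counting distinct minutes per user (or per domain) alongside the raw request count. The function name `usage_summary` and the sample IP addresses are hypothetical, and treating "minute" as a calendar minute (seconds truncated) is an assumption about how the grouping is done.

```python
from collections import defaultdict
from datetime import datetime


def usage_summary(requests):
    """Summarize (key, timestamp) pairs as {key: (minutes_used, hitcount)}.

    key is a user IP or a domain. A minute counts as used if it holds
    at least one request, however many requests fall into it.
    """
    minutes = defaultdict(set)
    hits = defaultdict(int)
    for key, ts in requests:
        minutes[key].add(ts.replace(second=0, microsecond=0))
        hits[key] += 1
    return {key: (len(minutes[key]), hits[key]) for key in hits}


log = [
    ("10.0.0.5", datetime(2009, 1, 19, 9, 0, 1)),
    ("10.0.0.5", datetime(2009, 1, 19, 9, 0, 30)),
    ("10.0.0.5", datetime(2009, 1, 19, 9, 0, 59)),
    ("10.0.0.5", datetime(2009, 1, 19, 9, 2, 10)),
    ("10.0.0.7", datetime(2009, 1, 19, 9, 0, 5)),
]
print(usage_summary(log))
```

In this sample the first address made 4 requests spread over 2 distinct minutes, so it is credited with 2 minutes of use and a hit count of 4.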
Created Reports
Reports consist of two parts: 1. Report configuration - contains information about what data has to be collected, when the reports will be generated and who will receive them; 2. The generated report - contains the actual data based on configuration rules.
The configuration is created in the report section on the Proxylizer web page.
Reports are generated automatically by a background script which is run by the scheduler.
When a configuration is deleted, all generated reports for that configuration are deleted as well.
Report Generation
Report generation works as follows: every minute a script is started and checks whether any report must be generated. Reports are generated as soon as possible, that is, once all data are collected and the generation time has come. Because some reports take more than one minute to generate, there is a maximum number of simultaneously generating reports, which can be set in the config section. For multi-core processors this value should equal the core count to utilize the processor efficiently. If any problems or errors occur during report generation, an e-mail is sent. To diagnose the problem, look at the log files stored in the "/var/log/proxylizer" directory ("mail_send_log.log" is the default log file).
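One pass of such a per-minute check could look like the sketch below. It is not the actual Proxylizer script: the `PendingReport` class, the function name and the oldest-first ordering are all assumptions made for illustration. It only shows how the two conditions (data collected, generation time reached) combine with the limit on simultaneous reports.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PendingReport:
    name: str
    data_complete_at: datetime  # when all needed data are collected
    generate_at: datetime       # configured generate time


def select_reports_to_start(pending, now, running_count, max_simultaneous):
    """Pick the reports to start in this pass of the scheduler.

    A report is due when its data are complete and its generate time
    has come; no more than the free slots are started.
    """
    free_slots = max(0, max_simultaneous - running_count)
    due = [r for r in pending
           if r.data_complete_at <= now and r.generate_at <= now]
    due.sort(key=lambda r: r.generate_at)  # assumption: oldest first
    return due[:free_slots]


now = datetime(2009, 3, 5, 0, 1)
pending = [
    PendingReport("a", datetime(2009, 3, 5, 0, 0), datetime(2009, 3, 5, 0, 0)),
    PendingReport("b", datetime(2009, 3, 5, 0, 0), datetime(2009, 3, 5, 8, 0)),
    PendingReport("c", datetime(2009, 3, 4, 0, 0), datetime(2009, 3, 4, 6, 0)),
]
started = select_reports_to_start(pending, now, running_count=1, max_simultaneous=2)
print([r.name for r in started])
```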
Report History
The whole history of generated reports is available in the report section on the Proxylizer web page. Each report in the table has the (history) icon in the action column. Clicking it reveals the list of generated reports for this configuration. A particular report can then be opened by clicking the (View) icon.
Inactive Reports
If a report is not needed for a while, it can be deactivated instead of being deleted. This has the following advantages:
- The history is saved;
- The report can be activated later without creating a new configuration.
While a report is inactive, it is not generated. If it is activated afterwards, reports are generated again starting from the activation point. For example, if a daily report is inactive from 20 Jan 2009 to 25 Jan 2009, the next report generated after the one for 19 Jan 2009 will be the one for 26 Jan 2009; reports for the inactivity period are skipped.
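The skipping of reports during inactivity can be sketched for a daily report as follows. The function name `daily_report_days` is hypothetical, and modelling the inactivity period as an inclusive range of report days is a simplifying assumption.

```python
from datetime import date, timedelta


def daily_report_days(first, last, inactive_from, inactive_to):
    """List the days for which a daily report is generated, skipping
    the days inside the inactivity period (both ends inclusive)."""
    days = []
    day = first
    while day <= last:
        if not (inactive_from <= day <= inactive_to):
            days.append(day)
        day += timedelta(days=1)
    return days


days = daily_report_days(date(2009, 1, 18), date(2009, 1, 27),
                         date(2009, 1, 20), date(2009, 1, 25))
print([d.isoformat() for d in days])
```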
Reports can be activated/deactivated in the Reports section on the Proxylizer Web page.
Report Editing
Report configurations can be edited after they are created, with some restrictions:
- Reports with frequency "once" can be modified only before they are generated. Once the report has been processed, the configuration is read-only, because this is what "once" means;
- For periodic reports, only the fields that do not change the semantics of the report can be edited. For example, date interval, weekdays and generate time can be changed for a daily report, because they do not affect the contents of a daily report; they only specify when the report must be generated. Report type, frequency, time interval, IP address and domain are read-only, because changing these values leads to a completely different report.
IP Users
Each Web-proxy request has a source IP address: the address of the host that generated the web request. Reports can be filtered using this IP address.
Usually one static IP address corresponds to a specific staff member. To make filtering by IP address easy, Proxylizer can assign real persons to IP addresses. This can be done in the IP Users section on the Proxylizer web page.
IP users are used only to assign person names and e-mail addresses to IP addresses. IP users do not have access to the Proxylizer web page.
Each IP User (person) has the following attributes:
- IP - address of the person's computer;
- Name;
- E-mail - person's address to which report emails can be sent;
- Admin - when checked, this user is treated as administrator. Some Proxylizer functions are designed for admins, for example, carbon copy (CC) of email reports can be sent to all admin email addresses;
- User receives empty reports - when checked, this user receives email reports even when they are empty (for example, reports of weekend data when no web requests are generated). This is useful for administrators to identify report generation problems: the administrator knows that a report is always sent, so if it is not received, the cause can only be a system or infrastructure failure.
Database Statistics
Database (DB) statistics are shown in the Status section on the Proxylizer web page. The following statistics are available:
- HIT-MISS ratio: shows the web-proxy hit/miss ratio, that is, what part of all requests is found in the proxy's cache;
- Total domain count: total count of different domains stored in the database. Note that, for example, www.mikrotik.com and wiki.mikrotik.com are counted as two different domains;
- Total hit count: total count of requests logged by the web-proxy;
- Oldest record: the oldest request currently stored in the DB. This determines the oldest point in time for which data is available;
- Latest record: the most recent web request logged in the DB. If this field contains an old value (say, more than one hour ago), it means one of the following: either the clients are not using the web-proxy or requests are not being logged to the Proxylizer DB. This value can be used to detect system errors;
- Data base size: the current size of the database. This can be used to identify how much the DB grows daily, monthly etc.
As you may notice, these statistics are not loaded immediately after logging in to the web interface, but several seconds later. The reason is simple: calculating the statistics takes some time, so they are loaded in the background to avoid making the web user wait. Once the values are calculated, they are cached for the whole web session. To get up-to-date statistics later, use the Refresh button.
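The HIT-MISS ratio listed above, read as the part of all requests served from the proxy's cache, can be sketched as simple arithmetic. The function name and the counts are illustrative assumptions, and how Proxylizer actually formats the value is not shown here.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of all requests that were found in the proxy cache."""
    total = hits + misses
    return hits / total if total else 0.0


# 250 cached requests out of 1000 in total
print(hit_ratio(250, 750))
```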