Proxylizer/Concepts

Collected Log Data

Logs received from the MikroTik router's Web Proxy server contain various information about web requests. The information stored from these logs is the domain, the IP addresses, the event time, and whether the site was loaded from the proxy server cache. Everything in the URL after '?' is dropped and not stored in the database, because this information is too large and is unnecessary for monitoring Web Proxy request statistics.

The received domain is stored in a separate table from the logs and is divided into 3 parts: sub domain, domain and top domain. If a domain has more than 3 parts, for example abc.def.ghi.mikrotik.com, it is divided into 'abc.def.ghi' + 'mikrotik' + 'com'.
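The splitting rule can be illustrated with a minimal Python sketch. It is not Proxylizer's own code: the function names are hypothetical, and the handling of hosts with fewer than three parts (empty strings for the missing parts) is an assumption, since the text only describes the case of three or more parts.

```python
def strip_query(url: str) -> str:
    """Drop '?' and everything after it, as described for stored URLs."""
    return url.split("?", 1)[0]


def split_domain(host: str) -> tuple[str, str, str]:
    """Split a host name into (sub domain, domain, top domain).

    The last label is the top domain, the one before it is the domain,
    and everything in front of that is joined into the sub domain.
    Assumption: missing parts are returned as empty strings.
    """
    labels = host.lower().strip(".").split(".")
    top = labels[-1]
    domain = labels[-2] if len(labels) >= 2 else ""
    sub = ".".join(labels[:-2])
    return sub, domain, top


print(split_domain("abc.def.ghi.mikrotik.com"))  # ('abc.def.ghi', 'mikrotik', 'com')
print(strip_query("http://wiki.mikrotik.com/index.php?title=Proxylizer"))
```

Note that this simple rule treats a name such as example.co.uk as 'example' + 'co' + 'uk'. Whether Proxylizer treats such suffixes specially is not stated here.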

It takes approximately 200 KB of space to store 1000 log records in the database. Database statistics are shown in the Status section on the Proxylizer web page.
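For capacity planning, the figure above can be turned into a rough estimate. The sketch below only scales the approximate 200 KB per 1000 records linearly. The function name is hypothetical, and real growth depends on the data.

```python
KB_PER_1000_RECORDS = 200  # approximate figure quoted in the text


def estimated_size_kb(record_count: int) -> float:
    """Rough database size in KB for a given number of log records."""
    return record_count / 1000 * KB_PER_1000_RECORDS


# One million log records come to roughly 200 000 KB (about 200 MB).
print(estimated_size_kb(1_000_000))  # 200000.0
```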

Reports

  • Report name - A user-defined identifier for the report.
  • Report type - There are 2 report types: Domain statistics and User statistics. The main difference is that a Domain statistics report analyzes domain usage across all web requests, while a User statistics report shows the web requests of each user.
  • Frequency - There are 4 frequencies available: once, daily, weekly and monthly. A once report is meant for custom reports and has no restrictions. Daily, weekly and monthly reports are generated only when all data for the period have been collected. For example, if data for March 4th are needed, they are available only from March 5th 00:00.
  • Recipient - All generated reports can be sent by e-mail. If "No recipient" is selected, the reports are available in the web interface under (history).
  • Date interval - These fields set the date boundaries for the report. For a once report, all data in this period are selected; for a daily report, a report is produced for each day in the period, for weekly - each week, for monthly - each month. The boundaries can be set in the past or in the future. For example, if today is February 15th, a daily report can be set for January 3rd to February 2nd, producing a report for each day in that period, or for March 5th to April 20th, in which case the reports are generated as those dates arrive.
  • Day interval - This field exists only for the frequency "monthly" and selects the range of days of the month for which data are needed. For example, if data for the first and then the second half of the month are needed, 2 reports must be created: one for the 1st to the 15th, the other for the 16th to the 31st. For short months, if data up to the end of the month are needed, the second value must be 31, even for February and other short months. If the second value is 30 or less, the 31st of a longer month (for example, October 31st) is not selected.
  • Weekdays - Use to filter by day of the week. For example, if only working days are needed, uncheck the weekend days.
  • Time interval - Use to filter by time of day. Multiple time intervals are available; just click to add or remove one.
  • IP - Use to filter by IP address. If "Show all" is selected, the report covers all IP addresses and shows the approximate time each spent on the internet. Otherwise it shows the domains and time for one specific IP user.
  • Domain - Use to filter by domain or by parts of a domain. As explained previously, a domain is divided into 3 parts - sub domain, domain and top domain. Use the first field to filter by sub domain, the second by domain and the third by top domain.
  • Top - Use if only the top entries are needed. For example, select "View top 10" to view only the 10 most used domains.
  • Generate time - Sets the time when the report must be generated. It differs by frequency: for "once" it is a date and a time of day, for daily only a time of day, and so on. As mentioned previously in this section, a report can be generated only when all data are collected. For example, if a weekly report for working days is needed, it can be generated only from Saturday 00:00. The principle is "as fast as possible". There is currently one unresolved problem: if only data up to, for example, 15:00 are needed, they are still available only at 00:00 the next day.
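Two of the rules in this list, the day interval for short months and the earliest moment a report can be generated, can be sketched in Python. This is an illustration under assumptions, not Proxylizer's implementation: the function names are hypothetical, and clamping the day interval to the length of the month is inferred from the statement that the value 31 selects data up to the end of a short month.

```python
import calendar
from datetime import date, datetime, time, timedelta


def monthly_window(year: int, month: int, first_day: int, last_day: int):
    """Return the (start, end) dates selected by a monthly day interval.

    Assumption: values beyond the end of the month are clamped to its
    last day, so 31 means "until the end of the month".
    """
    days_in_month = calendar.monthrange(year, month)[1]
    start = date(year, month, min(first_day, days_in_month))
    end = date(year, month, min(last_day, days_in_month))
    return start, end


def earliest_generation(last_data_day: date) -> datetime:
    """Data for a day are complete at 00:00 of the following day."""
    return datetime.combine(last_data_day + timedelta(days=1), time.min)


print(monthly_window(2009, 2, 16, 31))        # second half of February ends on the 28th
print(earliest_generation(date(2009, 3, 4)))  # 2009-03-05 00:00:00
```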

Time calculation

The time used for each domain or user is calculated by a simple rule: if a user makes at least one web request within a minute, the user is counted as having used the internet for that minute. It makes no difference whether there are thousands of requests in that minute or just one. The number of requests is counted separately and shown in the "Hitcount" column.
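A minimal Python sketch of this rule follows. It assumes that "per minute" means calendar minutes (each request is assigned to the minute in which it occurred), which the text does not state explicitly. The function name is hypothetical.

```python
from datetime import datetime


def usage_stats(timestamps: list[datetime]) -> tuple[int, int]:
    """Return (minutes used, hitcount) for a list of request times.

    Each distinct calendar minute with at least one request counts as
    one minute of use; the hitcount is simply the number of requests.
    """
    active_minutes = {ts.replace(second=0, microsecond=0) for ts in timestamps}
    return len(active_minutes), len(timestamps)


requests = [
    datetime(2009, 1, 20, 10, 0, 5),
    datetime(2009, 1, 20, 10, 0, 40),  # same minute as the first request
    datetime(2009, 1, 20, 10, 2, 10),
]
print(usage_stats(requests))  # (2, 3): two minutes used, three hits
```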

Created Reports

Reports consist of two parts: 1. Report configuration - contains information about what data has to be collected, when the reports will be generated and who will receive them; 2. The generated report - contains the actual data based on configuration rules.

The configuration is created in the report section on the Proxylizer web page.

Reports are generated automatically by a background script which is run by the scheduler.

When a configuration is deleted, all generated reports for that configuration are deleted as well.

Report Generation

Report generation works as follows: every minute a script is started and checks whether any report must be generated. Reports are generated as fast as possible - as soon as all data are collected and the generation time has come. If some reports take more than one minute to generate, several reports can be generated simultaneously. This is configurable and is useful for utilizing multi-core processors efficiently. If there are problems or errors with report generation, an e-mail is sent; to diagnose the problem, look at the log file stored in "/var/log/proxylizer/mail_send_log.log" (in the standard configuration).
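The check that the script performs each minute can be sketched roughly as below. The PendingReport class and its field names are hypothetical and only model the two conditions named in the text (data collected, generation time reached). The real script and its data model may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PendingReport:
    # Hypothetical model of a report waiting to be generated.
    name: str
    data_complete_at: datetime  # moment when all data are collected
    generate_at: datetime       # configured "Generate time"


def due_reports(pending: list[PendingReport], now: datetime) -> list[PendingReport]:
    """Select reports whose data are complete and whose time has come."""
    return [
        r for r in pending
        if now >= r.data_complete_at and now >= r.generate_at
    ]


pending = [
    PendingReport("daily-4-march", datetime(2009, 3, 5, 0, 0), datetime(2009, 3, 5, 8, 0)),
    PendingReport("daily-5-march", datetime(2009, 3, 6, 0, 0), datetime(2009, 3, 6, 8, 0)),
]
# At 08:00 on March 5th only the report about March 4th is due.
print([r.name for r in due_reports(pending, datetime(2009, 3, 5, 8, 0))])
```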

Report History

The whole history of generated reports is available in the report section on the Proxylizer web page. Each report in the table has a (history) icon in the action column. Clicking on it reveals the list of generated reports for this configuration. A particular report can then be opened by clicking on the (View) icon.

Inactive Reports

If a report is temporarily not needed, it can be made inactive instead of being deleted. This has the following advantages:

  • The history is saved;
  • Report can be activated later without creating new configuration.

While a report is inactive, it is not generated. If it is activated afterwards, reports are generated again starting from the activation point. For example, if a daily report is inactive from 20 Jan 2009 to 25 Jan 2009, the next report generated after the one about 19 Jan 2009 is the one about 26 Jan 2009; reports about the inactivity period are skipped.
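The skipping behaviour in this example can be expressed as a short Python sketch. The function name and parameters are hypothetical. It only lists which days a daily report would cover when one inactivity period falls inside the date interval.

```python
from datetime import date, timedelta


def daily_report_dates(start: date, end: date,
                       inactive_from: date, inactive_to: date) -> list[date]:
    """Days covered by a daily report, skipping the inactive period."""
    days = []
    day = start
    while day <= end:
        if not (inactive_from <= day <= inactive_to):
            days.append(day)
        day += timedelta(days=1)
    return days


covered = daily_report_dates(date(2009, 1, 18), date(2009, 1, 27),
                             date(2009, 1, 20), date(2009, 1, 25))
print([d.day for d in covered])  # [18, 19, 26, 27]
```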

Reports can be activated/inactivated in the Reports section on the Proxylizer Web page.

Report Editing

Report configurations can be edited after creation, but some restrictions apply:

  • Reports with frequency "once" can be modified only before they are generated. When the report is processed, the configuration is read-only, because this is what "once" means;
  • For reports with other frequencies, only the fields that do not change the semantics of the report can be edited. For example, date interval, weekdays and generate time can be changed for a daily report, as none of them affects the contents of a daily report - they only specify when the report must be generated. Report type, frequency, time interval, IP address and domain are read-only, because changing these values leads to a completely different report.

IP Users

Each Web-proxy request has a source IP address - address of the host, which generated the web request. Reports can be filtered using this IP address.

Usually one static IP address corresponds to a specific staff member. To make the filtering by IP address easy, Proxylizer has the ability to assign real persons to IP addresses. This can be done in the IP Users section on the Proxylizer web page.

IP users are used only to assign person names and email addresses to IP addresses. IP users do not have access to Proxylizer web page.

Each IP User (person) has the following attributes:

  • IP - address of the person's computer;
  • Name;
  • E-mail - person's address to which report emails can be sent;
  • Admin - when checked, this user is treated as administrator. Some Proxylizer functions are designed for admins, for example, carbon copy (CC) of email reports can be sent to all admin email addresses;
  • User receives empty reports - when checked, this user receives email reports even when they are empty (for example, reports of weekend data when no web requests are generated). This is useful for administrators to identify report generation problems - the administrator is sure that report will be sent anyway, and when it is not received, it happens only because of some system or infrastructure failure.

Database Statistics

Database (DB) statistics are shown in the Status section on the Proxylizer web page. The following statistics are available:

  • HIT-MISS ratio: shows the web-proxy hit/miss ratio - what part of all requests are found in the proxy's cache;
  • Total domain count: total count of different domains stored in the database. Note that, for example, www.mikrotik.com and wiki.mikrotik.com are counted as two different domains;
  • Total hit count: total count of requests logged by the web-proxy;
  • Oldest record: the oldest request currently stored in the DB. This determines the oldest point in time for which data are available;
  • Latest record: the most recent web request logged in the DB. If this field contains an old value (say, more than one hour ago), either the clients are not using the web-proxy or requests are not being logged to the Proxylizer DB. This value can be used to detect system errors;
  • Database size: the current size of the database. This can be used to see how much the DB grows daily, monthly etc.
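As an illustration of the first statistic, the part of requests served from the cache can be computed as hits divided by all requests. This is a sketch based on the description above. The function name is hypothetical, and the exact formula Proxylizer uses for the displayed value is an assumption.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Part of all requests that were found in the proxy's cache."""
    total = hits + misses
    return hits / total if total else 0.0


# 250 cached responses out of 1000 requests give a ratio of 0.25.
print(hit_ratio(250, 750))  # 0.25
```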

As you may have noticed, these values are not loaded immediately after logging in to the web interface, but several seconds later. The reason is simple: calculating the statistics takes some time, so they are loaded in the background to avoid making the web user wait. Once the values are calculated, they are cached for the whole web session. To get the current statistics later, use the Refresh button.