Sunday, August 23, 2020

Parsing WebLogic Access Log with Perl

Why Perl?


Perl is a general-purpose programming language originally developed for text manipulation and now used for a wide range of tasks including system administration, web development, network programming, GUI development, and more.

Perl is easy, nearly unlimited, mostly fast, and kind of ugly.  However, its power and speed of coding can also be achieved with a combination of other tools:
  • shell or awk programming with
    • grep, cut, sort, and sed

Access Log in WebLogic


By default, WebLogic Server (WLS) keeps a log of all HTTP transactions in a text file. The file is named access.log and is located in the $DOMAIN_HOME/servers/Xxx/logs directory, where Xxx is the server name.

The log provides true timing information from WebLogic, in terms of how long each individual application request takes. This timing information can be important in troubleshooting a slow system.

For more details, read [2] or the more recent documentation on the official Oracle site.

Case Study


In this article, we will use the following sample access log entry for illustration:

2020-08-23      15:54:02        0.031   479     GET     /xx-contentyyyyyyy/api/v1/instances/bootstrap/artifacts/namespaces/content:catalog/attributes/system/skins/activetheme        404     "4cb1bd49-1deb-4b6f-84c5-1153f22e3739-0000000c" "1.4cb1bd49-1deb-4b6f-84c5-1153f22e3739-0000000c;kXKwo3hCQtRLGmjE0ZJOoOTLkKPOoLRKlSODoITT_G"  -       -


Notice that the fields above are separated by horizontal tabs (shown as ht in the character dump below), not spaces.

0000000   2   0   2   0   -   0   8   -   2   3  ht   1   5   :   5   4
0000020   :   0   2  ht   0   .   0   3   1  ht   4   7   9  ht   G   E
0000040   T  ht   /   x   x   -   c   o   n   t   e   n   t   y   y   y
0000060   y   y   y   y   /   a   p   i   /   v   1   /   i   n   s   t
0000100   a   n   c   e   s   /   b   o   o   t   s   t   r   a   p   /
0000120   a   r   t   i   f   a   c   t   s   /   n   a   m   e   s   p
0000140   a   c   e   s   /   c   o   n   t   e   n   t   :   c   a   t
0000160   a   l   o   g   /   a   t   t   r   i   b   u   t   e   s   /
0000200   s   y   s   t   e   m   /   s   k   i   n   s   /   a   c   t
0000220   i   v   e   t   h   e   m   e  ht   4   0   4  ht   "   4   c
0000240   b   1   b   d   4   9   -   1   d   e   b   -   4   b   6   f
0000260   -   8   4   c   5   -   1   1   5   3   f   2   2   e   3   7
0000300   3   9   -   0   0   0   0   0   0   0   c   "  ht      "   1   .
0000320   4   c   b   1   b   d   4   9   -   1   d   e   b   -   4   b
0000340   6   f   -   8   4   c   5   -   1   1   5   3   f   2   2   e
0000360   3   7   3   9   -   0   0   0   0   0   0   0   c   ;   k   X
0000400   K   w   o   3   h   C   Q   t   R   L   G   m   j   E   0   Z
0000420   J   O   o   O   T   L   k   K   P   O   o   L   R   K   l   S
0000440   O   D   o   I   T   T   _   G   "  ht   -  ht   -  nl 

Awk


Awk is a pattern scanning and processing language that is well suited to extracting or transforming text, such as producing formatted reports.  Read [4] for more details.

For a well-formatted WLS access.log, awk is handy for extracting fields such as:
  • cs-method — The request method, for example GET or POST. This field has type <name>, as defined in the W3C specification.
  • cs-uri — The full requested URI. This field has type <uri>, as defined in the W3C specification.
  • sc-status — Status code of the response, for example (404) indicating a "File not found" status. This field has type <integer>, as defined in the W3C specification.

 bash-4.2$ awk '{ print $5, $6, $7 }' sample.log  | grep "\s404" | sort -r | uniq -c
      1 GET /xx-contentyyyyyyy/api/v1/instances/bootstrap/artifacts/namespaces/content:catalog/attributes/users/18446744073709551615 404
      1 GET /xx-contentyyyyyyy/api/v1/instances/bootstrap/artifacts/namespaces/content:catalog/attributes/system/skins/activetheme 404
      1 GET /xx-contentyyyyyyy/api/v1/instances/bootstrap/artifacts/namespaces/content:catalog/attributes/maintenancemode 404
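
The status filter can also be done inside awk instead of piping to grep. A minimal sketch, assuming sc-status is the seventh tab-separated field as in the sample entry; the log entries and the temporary file are made up for illustration:

```shell
# Three made-up entries in the sample's field order.
log=$(mktemp)
printf '2020-08-23\t15:54:02\t0.031\t479\tGET\t/app/a\t404\t"id-1"\n'  > "$log"
printf '2020-08-23\t15:54:03\t0.012\t404\tGET\t/app/b\t200\t"id-2"\n' >> "$log"
printf '2020-08-23\t15:54:04\t0.020\t479\tGET\t/app/a\t404\t"id-3"\n' >> "$log"

# Split on tabs only and test the status field itself. The second entry has
# 404 in the bytes field but status 200, so it is skipped.
counts=$(awk -F'\t' '$7 == 404 { print $5, $6, $7 }' "$log" | sort | uniq -c)
echo "$counts"
rm -f "$log"
```

Testing the field directly avoids relying on how the printed line happens to look to grep.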


Perl


Perl is a powerful programming language thanks to its strong regular expression and string parsing abilities.  In Perl, you can use patterns to locate the parts of strings that you want to change with its "search and replace" operator.

Search and replace is performed using s/regex/replacement/modifiers. The replacement is a Perl double-quoted string that replaces whatever the regex matched in the string. If there is a match, s/// returns the number of substitutions made; otherwise it returns false.


bash-4.2$ perl -n -p -e 's/^[^A-Z]*([A-Z]+)\s([^\"]*)\s(\".*)/$1 $2/g' sample.log | grep "\s404" | sort -r | uniq -c

      1 GET /xx-contentyyyyyyy/api/v1/instances/bootstrap/artifacts/namespaces/content:catalog/attributes/users/18446744073709551615    404
      1 GET /xx-contentyyyyyyy/api/v1/instances/bootstrap/artifacts/namespaces/content:catalog/attributes/system/skins/activetheme      404
      1 GET /xx-contentyyyyyyy/api/v1/instances/bootstrap/artifacts/namespaces/content:catalog/attributes/maintenancemode       404


Note that the above Perl command is not optimal for this purpose (for instance, -p overrides -n, so -n is redundant).  It is meant to demonstrate as many of Perl's features as possible in one example.

In the above example, our substitution operator is:
s/^[^A-Z]*([A-Z]+)\s([^\"]*)\s(\".*)/$1 $2/g
or
regex: "^[^A-Z]*([A-Z]+)\s([^\"]*)\s(\".*)"
replacement: "$1 $2"
modifiers: "g"

where

  • regex
    • ^[^A-Z]* matches everything before the first uppercase letter: 2020-08-23      15:54:02        0.031   479     
    • The first capturing group ([A-Z]+), or $1, matches the request method: GET
    • The second capturing group ([^\"]*), or $2, matches everything up to the tab before the first double quote, that is, the URI, a tab, and the status code: /xx-contentyyyyyyy/api/v1/instances/bootstrap/artifacts/namespaces/content:catalog/attributes/system/skins/activetheme      404
    • The third capturing group (\".*), or $3, matches the rest of the line starting at the first double quote; it is discarded because the replacement does not use it
    • Each \s matches a single whitespace character (here, ht)
  • replacement
    • The whole line is replaced with "$1 $2", or
      • GET /xx-contentyyyyyyy/api/v1/instances/bootstrap/artifacts/namespaces/content:catalog/attributes/system/skins/activetheme      404
  • modifiers
    • The global modifier /g makes the substitution apply to every match in the string rather than only the first.  It is not needed in our example, because the regex is anchored with ^ and can match only once per line; it is included for illustration only.

You can read this Perl script example to learn more of its features.

Acknowledgement


The author would like to thank his co-worker Mohan Tadepalli for providing Perl examples and inspiring him to write this article.

References
