Xml and More: March 2016

Saturday, March 26, 2016

How-to: Installing Python 3.5.1 in Linux


Friday, March 25, 2016

How-to: When a Missing Python Module Error Was Thrown

When updating Python from 2 to 3, you may want to get familiar with the following topics first:

Can you install multiple Python versions in Linux?
What to do when a missing-module error is thrown
How the search path is used to locate modules in Python
The differences between Python 2 and 3^[1]
How to resolve a missing Python module

ImportError: No module named 'encodings'

Where is a specific Python module located?

Multiple Python Installations

In our system, we have both Python 2 and 3 installed under /usr/bin as such:

/usr/bin/python
/usr/bin/python3

To choose a specific version for your Python scripts, you can specify a shebang (or hashbang) line as follows:

#!/usr/bin/python3

Python2


$ python

Python 2.4.3 (#1, Feb 24 2012, 13:04:26)

[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2

Type "help", "copyright", "credits" or "license" for more information.

Python3


$ python3

Python 3.5.1 (default, Mar 24 2016, 20:01:47)

[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux

Type "help", "copyright", "credits" or "license" for more information.

In the rest of this article, we use Python 3.5.1 for illustration unless stated otherwise. Read the section "Python 2 vs 3" for an example of the differences between them.

Missing Python Module

A Python module is a file (e.g., with a suffix such as .py, .pyc, or .pyo):^[13,17]

Containing Python definitions and statements
Can be imported in a script or in an interactive instance of the interpreter

Imported only once per interpreter session

Simply for efficiency reasons
If you change your modules, you must restart the interpreter
If you only want to test one module interactively, you can also use importlib.reload().^[14]
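The import-once behavior and importlib.reload() can be sketched as follows. This is a minimal illustration, assuming Python 3.4 or later; the module name demo_mod and its VALUE attribute are hypothetical and are created in a temporary directory only for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Skip writing .pyc files so the demo always re-reads the source file.
sys.dont_write_bytecode = True

# Create a throwaway module (hypothetical name: demo_mod) on the search path.
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
module_path = os.path.join(tmpdir, "demo_mod.py")
with open(module_path, "w") as f:
    f.write("VALUE = 1\n")

importlib.invalidate_caches()
import demo_mod
first = demo_mod.VALUE

# Change the module on disk.
with open(module_path, "w") as f:
    f.write("VALUE = 2222\n")

# A second import statement is a no-op: the module is cached in sys.modules.
import demo_mod
cached = demo_mod.VALUE

# reload() re-executes the module's source in the existing module object.
importlib.reload(demo_mod)
reloaded = demo_mod.VALUE

print(first, cached, reloaded)
```

The second import leaves the old value in place; only the explicit reload picks up the edited file.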

Oftentimes, you may run into a missing Python module, reported as an ImportError like this:


$ python3.5

Fatal Python error: Py_Initialize: Unable to get the locale encoding

ImportError: No module named 'encodings'

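In such cases, the module search path (sys.path) does not include the directory that holds the missing library, and you need to fix it (for example, by correcting the PYTHONPATH or PYTHONHOME environment variables).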

sys.path

The sys.path variable stores a list of strings that specifies the search path for modules. It is initialized from these locations:

The directory containing the input script (or the current directory when no file is specified)
PYTHONPATH (a list of directory names)

With the same syntax as the shell variable PATH

The installation-dependent default

A program is free to modify this list for its own purposes. Only strings and bytes should be added to sys.path; all other data types are ignored during import. See also Module site — This describes how to use .pth files to extend sys.path.
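To see how PYTHONPATH feeds into sys.path, here is a small sketch that starts a child interpreter with PYTHONPATH set to a temporary directory and checks that the directory shows up in the child's search path. The directory is created only for the demonstration; nothing here is specific to the system described in this post.

```python
import os
import subprocess
import sys
import tempfile

# A throwaway directory standing in for an extra library location.
extra_dir = tempfile.mkdtemp()

# Copy the current environment and set PYTHONPATH for the child only.
env = dict(os.environ, PYTHONPATH=extra_dir)

# Ask a fresh interpreter to print its sys.path, one entry per line.
code = "import sys; print('\\n'.join(sys.path))"
output = subprocess.check_output(
    [sys.executable, "-c", code], env=env, universal_newlines=True
)
child_path = output.splitlines()

# True when the PYTHONPATH entry was picked up by the child interpreter.
found = any(
    entry and os.path.realpath(entry) == os.path.realpath(extra_dir)
    for entry in child_path
)
print(found)
```

A program can achieve a similar effect for its own process by appending a directory to sys.path before the import statement.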

Python 2 vs 3

In this section, we will show you how to display the value of sys.path from the command line without entering interactive mode. To do that, we use print. However, as shown below, it is invoked differently in the two versions: print is a statement in Python 2 and a built-in function in Python 3.

Python 2



$ python -c 'import sys; print "\n".join(sys.path)'

/usr/lib64/python24.zip

/usr/lib64/python2.4

/usr/lib64/python2.4/plat-linux2

/usr/lib64/python2.4/lib-tk

/usr/lib64/python2.4/lib-dynload

/usr/lib64/python2.4/site-packages

/usr/lib64/python2.4/site-packages/Numeric

/usr/lib64/python2.4/site-packages/PIL

/usr/lib64/python2.4/site-packages/gtk-2.0

/usr/lib/python2.4/site-packages

Python 3



$ python3 -c 'import sys; print("\n".join(sys.path))'



/usr/lib/python35.zip

/usr/lib/python3.5

/usr/lib/python3.5/plat-linux

/usr/lib/python3.5/lib-dynload

/scratch/perf/.local/lib/python3.5/site-packages

/usr/lib/python3.5/site-packages

Where Is a Python Module Located?

When a module, say, encodings is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches the directories listed in sys.path for a file named encodings.py or, as in this case, a package directory named encodings.

To find out where an imported module is located, you can use its attribute __file__.^[11] For example, module encodings is located under /usr/lib/python3.5 in our system:



$ python3

<snipped>

>>> import encodings

>>> print(encodings.__file__)

/usr/lib/python3.5/encodings/__init__.py
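A module's location can also be looked up without importing it first. The sketch below uses importlib.util.find_spec() (available since Python 3.4); the helper name locate_module is hypothetical, and the paths printed depend on your installation.

```python
import importlib.util


def locate_module(name):
    """Return where the import system would load `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the module cannot be found on the search path
    return spec.origin  # a file path, or a marker such as 'built-in'


for module_name in ("encodings", "json", "sys", "no_such_module_xyz"):
    print(module_name, "->", locate_module(module_name))
```

A result of None corresponds to the situation in which an import statement would raise ImportError.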

References

[1] What does “SyntaxError: Missing parentheses in call to 'print'” mean in Python?
    This error message means that you are attempting to use Python 3 to follow an example or run a program that uses the Python 2 print statement.
[2] Install / Update Python 3.5.0 at Linux machine (YouTube)
[3] Python 3.5.1
[4] Python Module
[5] upgrade Python to 2.7.2
[6] How can I troubleshoot Python “Could not find platform independent libraries”
[7] Py_Initialize: Unable to get the locale encoding in OpenSuse 12.3
[8] Environment Variables (Python)
[9] Python script header
[10] Standard modules (Python)
[11] How do I find the location of Python module sources?
[12] sys module — System-specific parameters and functions
[13] What do the python file extensions, .pyc .pyd .pyo stand for?
[14] How do I unload (reload) a Python module?
[15] Purpose of #!/usr/bin/python3 (important)
[16] shebang (or hashbang)
    Under Unix-like operating systems, when a script with a shebang is run as a program, the program loader parses the rest of the script's initial line as an interpreter directive; the specified interpreter program is run instead, passing to it as an argument the path that was initially used when attempting to run the script.
[17] Importing Python Modules

Wednesday, March 23, 2016

How-to: Can't locate IPTables/IPv4/IPQueue.pm

In this article, we will cover the following topics:

How to resolve Perl module missing issue
Know about CPAN (Comprehensive Perl Archive Network)
Learn how to configure the CPAN module (i.e., CPAN.pm)

Missing Perl Module

When a Perl script using IPTables::IPv4::IPQueue^[1] was executed:


BEGIN {
    push @INC, "/scratch/perf/.../perl/5.8.8/x86_64-linux-thread-multi";
}


use strict; 
use warnings; 
use IPTables::IPv4::IPQueue qw(:constants);

It threw the following error message:^[6]

Can't locate IPTables/IPv4/IPQueue.pm in @INC (@INC contains: ...)

A Perl module is a reusable package of code defined in a file; it plays a role similar to a class in OOP or a package in Java. Its source code is organized using namespaces, and its file structure mirrors the namespace structure. For instance, the files of IPTables::IPv4::IPQueue could be located in your file system somewhere like:

/usr/local/lib64/perl5/auto/IPTables/IPv4/IPQueue
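The mapping from a module name to the file that Perl looks for can be sketched in a few lines of Python. This is only an illustration of the naming rule behind the "Can't locate IPTables/IPv4/IPQueue.pm" message; the helper names and the two @INC directories used below are assumptions chosen for the example.

```python
def perl_module_to_relpath(module_name):
    """Turn 'A::B::C' into the relative path 'A/B/C.pm'."""
    return "/".join(module_name.split("::")) + ".pm"


def candidate_paths(module_name, inc_dirs):
    """List the files that would be tried, one per search directory."""
    relpath = perl_module_to_relpath(module_name)
    return [directory.rstrip("/") + "/" + relpath for directory in inc_dirs]


# Hypothetical @INC entries, for illustration only.
inc_dirs = ["/usr/local/lib64/perl5", "/usr/share/perl5"]

relpath = perl_module_to_relpath("IPTables::IPv4::IPQueue")
print(relpath)
for path in candidate_paths("IPTables::IPv4::IPQueue", inc_dirs):
    print(path)
```

If none of the candidate files exists, the module is missing from every directory in @INC and has to be installed or its directory added to @INC.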

To resolve the missing module issue, you need to install it by entering:


cpan[1]> force install IPTables::IPv4::IPQueue

But, before you do it, make sure you understand the following sections first.

CPAN (Comprehensive Perl Archive Network)

CPAN is a software repository of more than 150,000 modules written in the Perl programming language. The modules can be downloaded from metacpan.org and also from mirrored sites worldwide. The resources found on CPAN are easily accessible with the CPAN.pm module.

From metacpan.org home page, you can search for any Perl Module you need. For example, enter "IPTables::IPv4" in the search field. You will find the documentation for IPTables::IPv4 here.

CPAN Module (CPAN.pm)

To use the CPAN module, you start the CPAN shell, which provides an interactive mode, in one of two ways:


perl -MCPAN -e shell

--or--


cpan

Configuration Steps

CPAN.pm needs a number of settings before it can be used, so you will be prompted to configure them the first time you run it. After the configuration, don't forget to commit by entering:



cpan[19]> o conf commit

to make the configuration permanent. The configuration data is saved in the file below:

/usr/share/perl5/CPAN/Config.pm

Only one CPAN process can run at a time; this is enforced with the lock file below:

/root/.cpan/.lock

How to Connect to the Internet behind a Proxy

After the first-time configuration effort, you can still modify configured data by entering:



cpan[20]> o conf init

You will then be asked whether you would like to configure as much as possible automatically. To avoid going through all the configuration steps again, you can also specify which settings to configure. For example, if your server is behind a proxy server, you may run into the following issue:

As you did not allow me to connect to the internet you need to supply a valid CPAN URL now.

To work around this, you can configure a proxy for CPAN by entering:^[4,5]



cpan[21]> o conf init /proxy/

If you're accessing the net via proxies, you can specify them in the CPAN configuration or via environment variables. The variable in the $CPAN::Config takes precedence.

Your ftp_proxy? []

At the "Your http_proxy?" prompt, we entered the following:

http://146.xx.xx.29:80

and it works fine afterwards. Besides proxy configuration, you may also want to configure a urllist to specify which mirror(s) to use for downloading:



cpan[22]> o conf init urllist

The 235 registered sites around the world make up the N part of CPAN (the Network); you can find the full list here.

References

[1] IPTables::IPv4::IPQueue - Perl extension for libipq.
    /usr/include is where libipq.h should be located (see here for its usage case)
[2] CPAN Mirror Network
[3] Installing CPAN Perl Modules Revisited
[4] Using CPAN with a proxy failing after o conf init /proxy/
[5] lwp https requests via proxy
[6] How do I include a Perl module that's in a different directory?
[7] ipq: [Unknown error: Device or resource busy]
    Try rebooting your Linux box
    See Do I need to restart server after a linux kernel update?

Saturday, March 19, 2016

Linux: How to Read Large Text File—/var/log/messages

IaaS is the hardware and software that powers cloud services: servers, storage, networks, and operating systems. The Linux (or Windows) servers used in IaaS are increasingly powerful, and they also generate more log files.

We very often run into message files larger than 1 GB. These log files are plain text and can in principle be viewed with regular text editors. However, most text editors cannot handle files above a certain size.

In this article, we will cover how to read large message files (e.g., /var/log/messages) generated on Linux systems.

/var/log/messages

To debug issues in Cloud environments, it's essential for you to know where the log files are and what is contained in each log file. On Linux servers, over a dozen log files are located in /var/log directory. Here we only focus on one of them:

/var/log/messages^[7]

This log aims at storing "general system activity" messages.

There are several things that are logged in /var/log/messages including mail, cron, daemon, kern, auth, etc.
The severity of messages could be

[INFO]
[DEBUG]
[WARNING]
[ERR]
etc

Older message files are archived periodically with their name annotated with the date.

If your Linux system uses the rsyslogd utility, its configuration file is

/etc/rsyslog.conf

in which you can specify logging rules (i.e., selector + action). For example, you can log anything of level info or higher, except mail, cron, and private authentication messages:

*.info;mail.none;authpriv.none;cron.none /var/log/messages

and messages are logged into a file named /var/log/messages.
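To make the selector easier to read, here is a simplified Python sketch of how such a selector can be evaluated. It is not rsyslog's implementation: it assumes the classic facility.priority semantics (a priority means "this level or higher", and "none" excludes a facility) and ignores extensions such as "=", "!", and comma-separated facilities. The function name selector_matches is hypothetical.

```python
# Severity names from lowest to highest priority.
SEVERITIES = ["debug", "info", "notice", "warning", "err", "crit", "alert", "emerg"]


def selector_matches(selector, facility, severity):
    """Return True if a message with this facility/severity is selected."""
    matched = False
    for part in selector.split(";"):
        fac, _, prio = part.strip().partition(".")
        if fac != "*" and fac != facility:
            continue  # this part does not apply to the message's facility
        if prio == "none":
            matched = False  # the facility is excluded
        elif prio == "*":
            matched = True  # every severity is selected
        else:
            # Selected when the message is at least as severe as `prio`.
            matched = SEVERITIES.index(severity) >= SEVERITIES.index(prio)
    return matched


selector = "*.info;mail.none;authpriv.none;cron.none"
for facility, severity in [("kern", "warning"), ("mail", "err"), ("daemon", "debug")]:
    print(facility, severity, selector_matches(selector, facility, severity))
```

Later parts of the selector override earlier ones for the same facility, which is why mail.none cancels the match made by *.info.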

Limitations of Text Editors

Some editors limit the size of the text files they can open. For example, the following popular editors on Windows have these documented limitations:

Notepad^[3]

64 kilobytes (KB)

Wordpad^[4]

It is said to have no size limit, but the real problem is performance.
Depending on the version of Wordpad, some people say it can handle files of up to 20 MB without performance issues.

Textpad^[8]

It can handle file sizes up to the largest contiguous chunk of 32-bit virtual memory.

Solutions

Basically, there are two ways to deal with large text files:

Find a more capable text editor
Divide and conquer

If you search Google for "large text file", you will find many suggestions for large text file readers. Some editors may be able to open and read large text files. However, their performance (e.g., when searching for a pattern) can be slow.

On Linux systems, a good approach is "divide and conquer" using the split command, for example:

split -b1000m messages-20160315T2201 split-messages

After splitting, a good text editor such as Textpad will be able to read a file of 1000 MB easily.
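If the goal is only to find certain messages, another option is to scan the file one line at a time so that it never has to fit in memory. The sketch below is one way to do that in Python; the function name grep_large_file and the three sample log lines are invented for the demonstration.

```python
import os
import re
import tempfile


def grep_large_file(path, pattern):
    """Yield (line_number, line) for matching lines, reading lazily."""
    regex = re.compile(pattern)
    with open(path, "r", errors="replace") as handle:
        # Iterating over the file object reads one line at a time.
        for line_number, line in enumerate(handle, start=1):
            if regex.search(line):
                yield line_number, line.rstrip("\n")


# Small stand-in for /var/log/messages (sample lines are made up).
fd, demo_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "w") as handle:
    handle.write("Mar 15 22:01:01 host1 kernel: eth0 link up\n")
    handle.write("Mar 15 22:01:02 host1 sshd[123]: Failed password for invalid user\n")
    handle.write("Mar 15 22:01:03 host1 crond[456]: job started\n")

matches = list(grep_large_file(demo_path, r"sshd\[\d+\]"))
os.remove(demo_path)
print(matches)
```

Because the function is a generator, memory use stays flat regardless of the file size; only the matching lines are kept.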

References

Tuesday, March 8, 2016

Excel: Get Every Third Row with Formula: INDEX and ROWS*3

I have used TextPad to clean up data with bookmarks and macros, as described in [1].

The next task is to extract start time and end time from column A to calculate elapsed time of each individual event. Start time and End time are located in different rows:

Start Time: A1, A4, ..., A{N*3+1}
End Time: A3, A6, ..., A{N*3+3}

where N = 0 to 64.

This article has followed an excellent video describing how to achieve the task using INDEX and ROWS functions in Excel.

Formula Used

To retrieve the end time (rows 3, 6, ...), here is the formula I have defined in cell J1:

=INDEX($A$1:$A$195, ROWS($J$1:J1)*3)

To retrieve the start time (rows 1, 4, ...), the formula in cell K1 is defined as:

=INDEX($A$1:$A$195,ROWS($K$1:K1)*3-2)

Details of Formula

Dollar Sign ($)

When you copy J1 and paste it to J2, the formula changes from

=INDEX($A$1:$A$195, ROWS($J$1:J1)*3)

to

=INDEX($A$1:$A$195, ROWS($J$1:J2)*3)

The unprefixed J1 in the formula is a relative reference that points to the cell holding the formula. After being pasted to J2, it is automatically adjusted to J2. But notice that the following references:

$A$1
$A$195
$J$1

remain the same after being pasted into J2 because we have prefixed them with a dollar sign ($), which makes a reference absolute. For example, instead of A1, we have written $A$1.

INDEX Function

The INDEX function has two forms:

Array form

INDEX(array, row_num, [column_num])

Reference form

INDEX(reference, row_num, [column_num], [area_num])

Our formula uses the array form by specifying a range of cells as:

$A$1:$A$195

and row_num is defined using ROWS function.

ROWS Function

The ROWS function has the following syntax:

ROWS(array)

where array can be an array, an array formula, or a reference to a range of cells for which you want the number of rows.

Our formula defines array as a range of cells. For example, in cell J2, the range is J1:J2 and, in cell J3, the range is J1:J3. In other words, we just count the number of rows from the first cell in column J down to the current cell. Then we multiply that count by three to retrieve every third row from the array.
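The row arithmetic can be checked outside Excel. The Python sketch below mimics INDEX with a 1-based lookup and uses the cell labels A1 to A195 as stand-in values; the helper name index and the variable names are assumptions made for the illustration.

```python
# Stand-in for column A: the value in each cell is just its own label.
column_a = ["A{}".format(row) for row in range(1, 196)]


def index(array, row_num):
    """Mimic Excel's INDEX(array, row_num), which counts rows from 1."""
    return array[row_num - 1]


# ROWS($J$1:Jk) equals k, so the two formulas reduce to k*3 and k*3-2.
every_third = [index(column_a, k * 3) for k in range(1, 66)]
every_third_from_first = [index(column_a, k * 3 - 2) for k in range(1, 66)]

print(every_third[:3], every_third[-1])
print(every_third_from_first[:3], every_third_from_first[-1])
```

With 65 events of three rows each, k*3 walks through rows 3, 6, ..., 195 and k*3-2 walks through rows 1, 4, ..., 193, which is the same as the slices column_a[2::3] and column_a[0::3].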

References

Tuesday, March 1, 2016

SSH: How to Simplify Connection Using Configuration Files

ssh (SSH client) is a program for logging into a remote machine and for executing commands on a remote machine. ssh obtains configuration data from the following sources in the following order (for each parameter, the first obtained value will be used):

command-line options

If a configuration file is given on the command line (i.e., ssh -F configfile), the system-wide configuration file (/etc/ssh/ssh_config) will be ignored

user's configuration file

~/.ssh/config

system-wide configuration file

/etc/ssh/ssh_config

In this article, we will focus on specifying directives in ssh's configuration file (specifically, the user's configuration file).

Advantages of Using Configuration File

There are some advantages to using a configuration file to specify ssh directives:

Can use shorthand to avoid long keystrokes
Avoid mistakes

Especially when you have lots of parameters to be specified and/or some of them using non-standard connection values.

Can provide options in different scopes (per-host vs per-user)

User's Configuration File

Here are the sample contents from a user's configuration file (i.e., ~/.ssh/config):

Host dev
    HostName dev.example.com
    Port 22000
    User fooey
Host github.com
    IdentityFile ~/.ssh/github.key

Instead of specifying:

ssh fooey@dev.example.com -p 22000

now you can just use the shorthand "dev" and the options will be read from the configuration file:

ssh dev

An ssh session will normally prompt you for a password. However, you can also set up public/private keys for password-less logins.^[4]

Format of Configuration File

To get you started, here are the basics:^[3]

Section

Separated by "Host" specifications

A single ‘*’ as a pattern can be used to provide global defaults for all hosts

See here for more information on patterns

The matched host name is usually the one given on the command line

Comment

Empty lines and lines starting with ‘#’ are comments.

Keyword

Case-insensitive
Examples

Host, Match, etc.

Directive/Argument

Directive

Used to specify session details including:

Identity

Username

Bind address

[bind_address:]port:host:hostport

Address family

“any”, “inet” (use IPv4 only), or “inet6” (use IPv6 only)

Other options

ServerAliveInterval, ServerAliveCountMax, etc

See the directive reference here

Argument

Arguments are case-sensitive
Arguments may optionally be enclosed in double quotes (") in order to represent arguments containing spaces.
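The format rules above can be illustrated with a small Python sketch that parses a configuration text and resolves the options for a host. It is a simplification, not ssh's own parser: it only understands Host sections, uses fnmatch for patterns (which accepts slightly more than ssh does), and ignores Match blocks and negated patterns. The "Host *" section, the user name defaultuser, and the function names are assumptions added for the example.

```python
import fnmatch
import re

SAMPLE_CONFIG = """\
# Sample from this article, plus a hypothetical "Host *" section
Host dev
    HostName dev.example.com
    Port 22000
    User fooey
Host github.com
    IdentityFile ~/.ssh/github.key
Host *
    User defaultuser
    ServerAliveInterval 60
"""


def parse_ssh_config(text):
    """Return a list of (host_patterns, options) sections in file order."""
    current = (["*"], {})  # options before the first Host apply to all hosts
    sections = [current]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comments are skipped
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        keyword = parts[0].lower()  # keywords are case-insensitive
        argument = parts[1].strip() if len(parts) > 1 else ""
        if len(argument) >= 2 and argument[0] == argument[-1] == '"':
            argument = argument[1:-1]  # drop optional double quotes
        if keyword == "host":
            current = (argument.split(), {})
            sections.append(current)
        else:
            current[1].setdefault(keyword, argument)
    return sections


def lookup(sections, host):
    """Collect options for `host`; the first value found for a keyword wins."""
    result = {}
    for patterns, options in sections:
        if any(fnmatch.fnmatchcase(host, pattern) for pattern in patterns):
            for keyword, argument in options.items():
                result.setdefault(keyword, argument)
    return result


sections = parse_ssh_config(SAMPLE_CONFIG)
print(lookup(sections, "dev"))
print(lookup(sections, "github.com"))
```

Because the first obtained value is used, host-specific sections should come before the "Host *" defaults, as in the sample: "dev" keeps User fooey, while "github.com" falls back to the default user.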

/var/log/secure

Linux has an extensive set of log files under the /var/log directory.^[5] This directory is the central place where all applications and programs put their log files. Most log files are text files that can be viewed using a standard text editor.

/var/log/secure – This file contains all security related messages on the system. This includes authentication failures, possible break-in attempts, SSH logins, failed passwords, sshd logouts, invalid user accounts etc.

-rw------- 1 root root 3091237 Sep 14 11:54 secure
-rw------- 1 root root 2429153 Aug 18 01:50 secure-20130818
-rw------- 1 root root 4695728 Aug 25 03:29 secure-20130825
-rw------- 1 root root 12348973 Sep 1 02:24 secure-20130901
-rw------- 1 root root 7211819 Sep 8 01:22 secure-20130908

As shown above, old secure files are archived periodically with their name annotated with the date.

/var/log/messages – This file contains messages of various programs and services including the SSH server.^[6,7] Old message files are also archived periodically with their name annotated with the date.