Tuesday, June 21, 2016

Kafka: Knowing the Basics

When learning a new piece of software or a new system, it is best to start with a high-level view of it.

In this article, we will introduce you to the basics of Apache Kafka (written in Scala; does not use JMS). From here, you may continue to explore topics such as how to configure Kafka components and how to monitor Kafka performance metrics.


What is Kafka?


In the last few years, there has been significant growth in the adoption of Apache Kafka. Current users of Kafka include Uber, Twitter, Netflix, LinkedIn, Yahoo, Cisco, Goldman Sachs, etc.[12]

Kafka is a message bus that achieves
  • a high level of parallelism
It also decouples data producers from data consumers, which makes its architecture more flexible and adaptable to change.

Key Concepts


Kafka is a distributed messaging system providing fast, highly scalable and redundant messaging through a pub-sub model. It is organized around a few key terms:
  • Topics
  • Producers
  • Consumers
  • Messages
  • Brokers
Communication between all components of Kafka is done via a simple, high-performance binary API over TCP.
  • Kafka
    • Topics
      • Kafka maintains feeds of messages in categories called topics
      • All Kafka messages are organized into topics.
    • Cluster
      • As a distributed system, Kafka runs in a cluster.
      • Each node in the cluster is called a Kafka broker.
        • Each broker holds a number of partitions and each of these partitions can be either a leader or a replica for a topic.
        • Brokers load balance by partition
    • Clients
      • Producers
        • Send (or push) messages to a specific topic
      • Consumers
        • Read (or pull) messages from a specific topic

    Messages


    Each specific message in a Kafka cluster can be uniquely identified by a tuple consisting of the message’s
    • Topic
    • Partition
      • Topics are broken up into ordered commit logs called partitions
    • Offset (within the partition)
    Kafka offers 4 guarantees about data consistency and availability:[3]
    1. Messages sent to a topic partition will be appended to the commit log in the order they are sent,
    2. A single consumer instance will see messages in the order they appear in the log,
    3. A message is ‘committed’ when all in-sync replicas have applied it to their log, and
    4. Any committed message will not be lost, as long as at least one in-sync replica is alive.
    These guarantees hold as long as you are producing to one partition and consuming from one partition. All guarantees are off if you are reading from the same partition using two consumers or writing to the same partition using two producers. Finally, message ordering is preserved within each partition, but not across the entire topic.
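
To make the (topic, partition, offset) identifier and the per-partition ordering rule concrete, here is a minimal Python sketch. It is a toy in-memory model, not Kafka's client API; the class TopicLog, its methods, and the topic name "page-views" are hypothetical names chosen for illustration.

```python
class TopicLog:
    """Toy in-memory stand-in for a topic split into partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One append-only list per partition; the list index plays the role of the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        """Append a value and return its (topic, partition, offset) identifier."""
        log = self.partitions[partition]
        log.append(value)
        return (self.name, partition, len(log) - 1)

    def read(self, partition, offset):
        """Return every value in one partition, starting at the given offset."""
        return self.partitions[partition][offset:]


topic = TopicLog("page-views", num_partitions=2)
# Spread four messages over two partitions in round-robin fashion.
ids = [topic.append(i % 2, f"event-{i}") for i in range(4)]
print(ids[2])            # identifier of the third message sent
print(topic.read(0, 0))  # messages of partition 0, in append order
```

Within partition 0 the messages come back in the order they were appended, but nothing in the model relates the order of partition 0 to that of partition 1, which mirrors the "per partition, not per topic" ordering rule.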


    ZooKeeper


    Kafka servers require ZooKeeper. Brokers, producers, and consumers use ZooKeeper to manage and share state.

    However, the way ZooKeeper is used differs between Kafka v0.8 and v0.9.[11] So, the first thing you need to know is which version of Kafka you are referring to. To find out the version of Kafka, do:
    • cd $KAFKA_HOME
    • find ./libs/ -name 'kafka_*' | head -1 | grep -o 'kafka_.*'
    For example, if the above command line prints:
    • kafka_2.10-0.9.0.2.4.2.0-258-javadoc.jar

    It means that the following versions of products are installed:
    • Scala version
      • 2.10
    • Kafka version
      • 0.9.0.2.4.2.0-258
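
The same reading of the jar name can be scripted. The sketch below assumes the naming pattern kafka_<Scala version>-<Kafka version>[-classifier].jar seen in the example above; the function name parse_kafka_jar and the list of classifiers are illustrative assumptions, not part of Kafka.

```python
import re

# Assumed pattern: kafka_<scala>-<kafka>[-classifier].jar
# The classifier list (javadoc, sources, scaladoc, test) is a guess at common suffixes.
JAR_PATTERN = re.compile(
    r"kafka_(?P<scala>\d+\.\d+)-(?P<kafka>.+?)"
    r"(?:-(?:javadoc|sources|scaladoc|test))?\.jar$"
)


def parse_kafka_jar(filename):
    """Return (scala_version, kafka_version) extracted from a Kafka jar file name."""
    match = JAR_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"not a recognizable Kafka jar name: {filename}")
    return match.group("scala"), match.group("kafka")


scala_version, kafka_version = parse_kafka_jar("kafka_2.10-0.9.0.2.4.2.0-258-javadoc.jar")
print("Scala:", scala_version)
print("Kafka:", kafka_version)
```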

    In summary, Kafka uses ZooKeeper for the following:[5]
    1. Electing a controller
      • The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions.
      • When a node shuts down, it is the controller that tells other replicas to become partition leaders to replace the partition leaders on the node that is going away.
      • ZooKeeper is used to elect a controller, make sure there is only one, and elect a new one if it crashes.
    2. Cluster membership
      • Tells which brokers are alive and are still part of the cluster
    3. Topic configuration
      • Tells which topics exist, how many partitions each has, where the replicas are, who the preferred leader is, and what configuration overrides are set for each topic
    4. Quotas and ACLs (0.9.0)
      • Quotas: how much data each client is allowed to read and write
      • ACLs: who is allowed to read and write to which topic
    5. Consumer group state (old high-level consumer)
      • Tells which consumer groups exist, who their members are, and what the latest offset is that each group got from each partition.
      • This functionality is going away
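
The controller's job during a broker shutdown (item 1 above) can be pictured with a small Python sketch. This is a deliberate simplification, not Kafka's actual election logic: the dictionary layout, the function name reassign_leaders, and the rule "pick the first remaining in-sync replica" are all assumptions made for illustration.

```python
def reassign_leaders(partitions, failed_broker):
    """Return new partition state after one broker leaves the cluster.

    partitions maps a partition name to {"leader": broker_id, "in_sync": [broker_ids]}.
    """
    updated = {}
    for name, state in partitions.items():
        # The departing broker can no longer be counted as an in-sync replica.
        in_sync = [b for b in state["in_sync"] if b != failed_broker]
        leader = state["leader"]
        if leader == failed_broker:
            # Simplified choice: promote the first remaining in-sync replica, if any.
            leader = in_sync[0] if in_sync else None
        updated[name] = {"leader": leader, "in_sync": in_sync}
    return updated


cluster = {
    "orders-0": {"leader": 1, "in_sync": [1, 2, 3]},
    "orders-1": {"leader": 2, "in_sync": [2, 3]},
}
after = reassign_leaders(cluster, failed_broker=1)
print(after["orders-0"])  # leadership moved away from broker 1
print(after["orders-1"])  # unaffected partition keeps its leader
```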

    Diagram Credit

    • www.michael-noll.com

    References

    1. Apache Kafka
    2. Configuration of Kafka
    3. Kafka in a Nutshell
    4. Why do Kafka consumers connect to zookeeper, and producers get metadata from brokers
    5. What is the actual role of ZooKeeper in Kafka?
    6. How to choose the number of topics/partitions in a Kafka cluster?
    7. Message Hub Kafka Java API
    8. Introduction to Apache Kafka
    9. Kafka Controller Redesign
    10. Log4j Appender 
    11. Apache Kafka 0.8 Basic Training (Michael G. Noll, Verisign)
      • ZooKeeper
        • v0.8: used by brokers and consumers, but not by producers
        • v0.9: used by brokers only
          • Consumers will use special topics instead of ZooKeeper
          • Will substantially reduce the load on ZooKeeper for large deployments
      • G1 Tuning (JDK 7u51 or later; slide 66)
        • java -Xms4g -Xmx4g -XX:PermSize=48m -XX:MaxPermSize=48m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
          • Note that PermGen has been removed in JDK 8
    12. The value of Apache Kafka in Big Data ecosystem
    13. Kafka Security Specific Features (06/03/2014)
    14. Kafka FAQ
    15. Kafka Operations
    16. Kafka System Tools
    17. Kafka Replication Tools
    18. Monitoring Kafka performance metrics
    19. Collecting Kafka performance metrics
    20. Spark Streaming + Kafka Integration Guide (Spark 1.6.1)
    21. All Cloud-related articles on Xml and More

            Sunday, June 5, 2016

            How to Enable Software Collections (SCL) on RHEL

            A package is the basic unit used for the distribution, management, and update of software customized for a Linux system.  Collections of packages stored in different software repositories (e.g., YUM repositories) can be accessed locally or over a network connection.  Metadata are combined with the packages themselves to determine (and resolve, if possible) dependencies among the packages.

            In this article, we will look at the following entities/concepts in more detail:
            • RPM 
              • May refer to the .rpm file format, files in this format, software packaged in such files, and the package manager itself
              • RPM's dependency processing is based on knowing what capabilities are provided by a package and what capabilities a package requires.
              • It automatically determines which shared libraries a package requires
            • YUM 
              • One of several front ends to RPM, which ease the process of obtaining and installing RPMs from repositories and help in resolving their dependencies.
              • YUM Repository
                • YUM repository metadata is structured as a series of XML files to organize packages and their associated package groups for installation.
                • You need to set up Apache, nginx, or some other web server and point it at the base directory of the repository to make it available
            • Red Hat Software Collections
              • Is a prescribed set of content intended for use in Red Hat Enterprise Linux production environments
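
The capability-based dependency processing mentioned for RPM above can be sketched in a few lines of Python. This is only an illustration of the idea of matching "requires" against "provides"; the package names, the capability strings, and the function resolve are hypothetical, and real RPM/YUM resolution also handles versions, conflicts, and multiple providers.

```python
def resolve(target, packages):
    """Return the set of package names needed to install target.

    packages maps a package name to {"provides": [...], "requires": [...]}.
    """
    # Index: capability -> first package that provides it.
    providers = {}
    for name, meta in packages.items():
        for capability in meta["provides"]:
            providers.setdefault(capability, name)

    needed, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        for capability in packages[name]["requires"]:
            if capability not in providers:
                raise LookupError(f"nothing provides {capability}")
            stack.append(providers[capability])
    return needed


# Hypothetical repository contents.
repo = {
    "webapp": {"provides": ["webapp"], "requires": ["libexample.so.1", "runtime"]},
    "runtime": {"provides": ["runtime"], "requires": ["libexample.so.1"]},
    "libexample": {"provides": ["libexample.so.1"], "requires": []},
}
print(sorted(resolve("webapp", repo)))
```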

            RPM


            The RPM Package Manager (RPM) is a package management system that
            • Facilitates the distribution, management and update of software customized for a Linux system
              • Works with standard package management tools (e.g., Yum or PackageKit) to install, reinstall, remove, upgrade and verify RPM packages
            • Provides metadata to describe packages, installation instructions, and so on
              • Each RPM package includes metadata that describes the package's components, version, release, size, project URL, installation instructions, and so on
            • Separates source and binary packages
              • In source packages, you have the pristine sources along with any patches that were used, plus complete build instructions.
            • Allows you to use the database of installed packages to query and verify packages
            • Allows you to add your package to a Yum repository
            • Allows you to digitally sign your packages (e.g., using a GPG signing key)

            Red Hat Software Collections


            Red Hat Software Collections is a prescribed set of content intended for use in Red Hat Enterprise Linux production environments. Through Red Hat Software Collections, you can choose the runtime versions best suited for your projects, preserve application stability, and deploy your applications with confidence.

            Software Collections Functionality


            Software collections functionality (hereafter referred to simply as "Software Collections"), not to be confused with Red Hat Software Collections, has been available in earlier Red Hat Enterprise Linux distributions. Software Collections provide a structural definition, independent of the operating system, for applications or tools.

            With Software Collections, you can build and concurrently install multiple versions of the same software components on your system. Software Collections have no impact on the system versions of the packages installed by any of the conventional RPM package management utilities.

            To summarize, Software Collections have the following characteristics:
            • Do not overwrite system files
            • Are designed to avoid conflicts with system files
              • Software Collections make use of a special file system hierarchy to avoid possible conflicts between a single Software Collection and the base system installation.
            • Require no changes to the RPM package manager
            • Need only minor changes to the spec file
              • To convert a conventional package to a single Software Collection, you only need to make minor changes to the package spec file.
            • Allow you to build a conventional package and a Software Collection package with a single spec file
            • Uniquely name all included packages
            • Do not conflict with updated packages
            • Can depend on other Software Collections 
              • Because one Software Collection can depend on another, you can define multiple levels of dependencies.


            Enabling and Building Software Collections


            To enable support for Software Collections on your system so that you can enable and build Software Collections, you need to have installed the following packages:

            References

            1. Software Collections Guide (Redhat)
            2. 20 Linux YUM (Yellowdog Updater, Modified) Commands for Package Management
            3. PackageKit
            4. Red Hat Enterprise Linux 6 Deployment Guide 
            5. Red Hat Enterprise Linux 5 Deployment Guide.
            6. Learn Linux, 101: RPM and YUM package management
              • Yum allows automatic updates, package and dependency management, on RPM-based distributions.
            7. Creating a Local Yum Repository Using an ISO Image
              • Yum works with software repositories (collections of packages), which can be accessed locally or over a network connection.
            8. Configuring Yum and Yum Repositories (important)
            9. Converting a Conventional Spec File
            10. HOWTO: GPG sign and verify RPM packages and yum repositories
            11. The Software Collections (SCL) Repository (CentOS)
            12. Red Hat will provide PHP 5.4 for RHEL-6

            Tuesday, May 31, 2016

            PHP: Knowing the Basics

            PHP (recursive acronym for PHP: Hypertext Preprocessor) is mainly focused on server-side scripting, so you can do anything any other CGI program can do, such as collect form data, generate dynamic page content, or send and receive cookies. But PHP can do much more.[1]

            PHP has the following features:
            • PHP code may be embedded into HTML code
            • PHP code is usually processed by a PHP interpreter 
              • PHP interpreter is implemented as a module in the web server or as a Common Gateway Interface (CGI) executable. 
              • The web server combines the results of the interpreted and executed PHP code, which may be any type of data, including images, with the generated web page. 
            • PHP code may also be executed with a command-line interface (CLI) 
              • Which can be used to implement standalone graphical applications.

            Pros and Cons


            Based on an article posted in 2010, here are the pros and cons described by 8 experts: [2]

            Pros
            • Ubiquity and ease of use
              • With PHP, you have the freedom of choosing an operating system and a web server.
            • An excellent tool for disciplined developers
              • It stays close to its C roots while removing some of the unnecessary pain points like memory management, pointers and the compile cycle. 
              • The OOP implementation is simple, elegant and easier to read than its peers. 
              • The Java mantra of "complexity at any cost" is nowhere to be found; concise method names are used throughout. 
            • Good documentation 
            • Healthy PHP community
              • PHP Planet is a great resource for and from PHP community members

            Cons
            • One of PHP's biggest strengths is also one of its limitations
              • Code written by other people is hard to maintain
                • PHP is very flexible in general; there are no fewer than 30 ways to accomplish the same task.
                • Especially when code standards are not consistent and best practices aren't followed
              • Lots of bad PHP in the world
                • The low entry barrier means that there is a lot of bad PHP in the world
                • The readily available resources online can be great and terrible at the same time
                  • For example, junior developers can quickly pick up insecure PHP code from the Internet and adapt it in their projects
              • Too much choice (time can be wasted searching for quality code)
                • Developers have too much choice when it comes to selecting a library or framework to work with, and the information available is often biased and unreliable, so a lot of time can be wasted searching for quality.

            PHP Composer


            If you have ever written anything in PHP before, you have probably found that it feels like you have to keep re-inventing the wheel any time you want to do a common task such as user authentication, database management, or request routing. PHP now has a dozen or so mature frameworks[10,11] that have already solved all of these problems.  PHP Composer is a tool that makes it easier to cherry-pick the bits that you need from each framework.

            To summarize, PHP Composer is
            • A dependency manager for PHP
              • Helps you install packages on a project-by-project basis

            References

            1. What can PHP do?
            2. 8 Experts Break Down the Pros and Cons of Coding With PHP
            3. HipHop
            4. PhpUnit
            5. Joind.in
              • Is a good example of a PHP app
            6. StatusNet
              • Is an open-source microblogging platform
            7. PHPDoc
            8. Facebook
              • Best PHP application dealing with scalability
            9. What is PHP Composer?
            10. The Best PHP Framework for 2015: SitePoint Survey Results
            11. 10 PHP Frameworks For Developers – Best Of
            12. PHP Extension and Application Repository (PEAR)
              • Is a repository of PHP software code to promote the re-use of code that performs common functions (founded by Stig S. Bakken in 1999)
            13. PHPClasses.org
              • Is a service created in 1999 as a means of distributing freely available programming classes of objects written in PHP
            14. Software Collections (rh-php56)

            Friday, April 22, 2016

            Security Considerations: Enabling/Disabling Ping /Traceroute for Your Network

            Two Linux tools, ping and traceroute, are commonly used for monitoring network connections. However, because of cyber attacks, most systems have been hardened to disable them in the hope of staving off attacks and the surveillance of network mapping tools.

            In this article, we will demonstrate how to secure your network by using two approaches:
            • Use IP filters
              • Use filters to prevent attacks from crossing the firewall
            • Use NAT (Network Address Translation)
              • Define the name/address mappings for trusted nodes in your secure network

            Ping and Traceroute


            The main difference between Ping and Traceroute commands is that
            • Ping
              • Is a quick and easy way to tell you if the destination server is online and estimates how long it takes to send and receive data to the destination
            • Traceroute
              • Tells you the exact route you take to reach the server from your computer and how long each hop takes

            Ping

            Ping generates an ICMP echo request message and expects to receive an echo reply message in response. Echo request is a relatively safe message, but any ICMP message can be used by an outsider to gain some knowledge of your network or to attack your system directly. Also, like every protocol that you allow, ICMP messages can be used to overwhelm your systems in a denial-of-service attack, and ICMP ping responses can be used as a covert channel. Unlike higher-layer protocols such as TCP or UDP, ICMP has neither a source port nor a destination port, just the message type and code.[7]
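
To show what "just the message type and code" looks like on the wire, the following Python sketch builds the bytes of an ICMP echo request (type 8, code 0) and computes its checksum. It only constructs the message; it does not send anything. The function names and the identifier, sequence, and payload values are illustrative choices.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used in the ICMP header."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even number of bytes
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, identifier, sequence."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
print(packet.hex())
# A receiver validates the message by checksumming all of it; a valid message yields 0.
print(internet_checksum(packet))
```

Note that the header carries no port numbers at all, which is why filter rules for ICMP are written in terms of type and code.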

            Traceroute (or tracert on Windows)

            Traceroute is useful in allowing network administrators to track the path that an IP packet follows to reach its final destination. It works by sending UDP packets from one high port (port number > 1023) to another high port: it selects a free UDP port and starts sending packets to different high ports. If you see a series of UDP packets within this port range (i.e., 33434 - 33600), then it is probably indicative of traceroute.

            In order to discover the path, it plays some tricks with the TTL value of the packet (this field must be decremented by routers every time they forward the packet). First it sends a UDP packet with TTL=1, so the first router gets the packet, decrements the TTL field, and then discards the packet because the TTL reached 0. After discarding the packet, the router sends an ICMP TTL exceeded message to the sender, so the sender learns the address of the first hop. Then it uses a TTL value of two, and it gets the second router address. It keeps getting router addresses with TTL exceeded messages until the packet reaches the destination host.
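
The TTL trick can be simulated without touching the network. In the Python sketch below, a path is just a list of addresses whose last entry is the destination; the function names and the example addresses are hypothetical, and the "port unreachable" reply from the destination is modeled as the signal that the trace is complete.

```python
def send_probe(path, ttl):
    """Simulate one UDP probe; return (replying address, reply kind).

    path lists the routers in order, with the destination host as the last entry.
    It must contain at least one address.
    """
    for hop_number, address in enumerate(path, start=1):
        if hop_number == len(path):
            # The destination has nothing listening on the high port.
            return address, "port unreachable"  # ICMP type=3/code=3
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl == 0:
            return address, "ttl exceeded"      # ICMP type=11/code=0


def traceroute(path, max_hops=30):
    """Collect the address of every hop by sending probes with TTL = 1, 2, 3, ..."""
    hops = []
    for ttl in range(1, max_hops + 1):
        address, reply = send_probe(path, ttl)
        hops.append(address)
        if reply == "port unreachable":
            break
    return hops


route = ["10.0.0.1", "192.0.2.1", "203.0.113.9"]
print(traceroute(route))
```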

            IP Filters


            Firewall implementations normally use IP filters to control which packets are passed and which are blocked on each side.[1] The information a filter uses to decide whether to block or pass a packet is largely contained in the packet headers.

            Some of the filtering criteria are:
            • The source and destination IP address
            • The direction of flow
            • The IP protocol (ICMP, TCP, UDP or other protocols)
            • The interface where the packet is detected (secure or nonsecure)
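
A first-match filter over the criteria listed above might look like the following Python sketch. It is not the rule syntax of any particular firewall: the Rule and Packet classes, the field values ("inbound", "secure", and so on), and the default-deny fallback are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    action: str       # "permit" or "deny"
    source: str       # network in CIDR notation
    destination: str  # network in CIDR notation
    protocol: str     # "icmp", "tcp", "udp", or "any"
    direction: str    # "inbound", "outbound", or "both"
    interface: str    # "secure", "nonsecure", or "both"


@dataclass(frozen=True)
class Packet:
    source: str
    destination: str
    protocol: str
    direction: str
    interface: str


def matches(rule, packet):
    """True when every criterion of the rule accepts the packet."""
    return (
        ip_address(packet.source) in ip_network(rule.source)
        and ip_address(packet.destination) in ip_network(rule.destination)
        and rule.protocol in ("any", packet.protocol)
        and rule.direction in ("both", packet.direction)
        and rule.interface in ("both", packet.interface)
    )


def decide(rules, packet, default="deny"):
    """Apply the first matching rule; fall back to the default action."""
    for rule in rules:
        if matches(rule, packet):
            return rule.action
    return default


rules = [
    Rule("permit", "10.0.0.0/8", "0.0.0.0/0", "tcp", "outbound", "secure"),
    Rule("deny", "0.0.0.0/0", "0.0.0.0/0", "any", "both", "both"),
]
print(decide(rules, Packet("10.1.2.3", "198.51.100.7", "tcp", "outbound", "secure")))
print(decide(rules, Packet("198.51.100.7", "10.1.2.3", "icmp", "inbound", "nonsecure")))
```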

            Considerations of ICMP Filtering

            ICMP is a protocol designed to communicate errors and information between hosts that are processing IP datagrams. In other words, ICMP messages are the "control messages" for TCP/IP. There are many different types of ICMP messages. For example, type 8 ICMP messages are echo requests and type 0 ICMP messages are echo replies.

            The echo (a.k.a. echo request) message is used to check whether a host is up or down. When a host receives the request, it sends back an echo reply message. These messages are usually generated by a ping command, but may also be generated by a network management station that is polling the nodes of a network.

            The simplest approach to securing ICMP is to block all ICMP messages from crossing the firewall. However, the problem is that ICMP messages are the "control messages" for TCP/IP. If you block all incoming ICMP, then you may break some essential networking. So, you want to be more selective about which ICMP messages you allow.

            The absolute minimum ICMP traffic to allow is the packets dealing with TCP path MTU discovery. Fragmenting a stream is more efficient at the TCP layer than at the IP layer, so the TCP layer tries to discover when IP packets are being inadvertently fragmented. It does this by setting the "DF" (Don't Fragment) bit on all outgoing packets. When a router cannot forward a packet because it is too big, rather than fragmenting it, the router sends back a "fragmentation needed" ICMP packet (type=3/code=4). The TCP stack then starts sending smaller IP packets, segmenting the data at the TCP layer rather than allowing routers to fragment at the IP layer. Therefore, firewalls must be configured to allow incoming ICMP type=3, code=4 packets.
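
The path MTU discovery loop can be simulated as follows. The sketch assumes, following RFC 1191 rather than anything stated in this article, that the "fragmentation needed" reply carries the MTU of the link that rejected the packet; the function name and the example MTU values are illustrative.

```python
def discover_path_mtu(link_mtus, initial_size=1500):
    """Simulate a sender with DF set shrinking its packets until they fit every link.

    link_mtus lists the MTU of each link along the path, in order.
    Returns (final packet size, list of sizes that were tried).
    """
    size = initial_size
    attempts = [size]
    while True:
        # The first link that is too small drops the packet and reports back
        # with ICMP type=3/code=4 ("fragmentation needed").
        blocking_mtu = next((mtu for mtu in link_mtus if mtu < size), None)
        if blocking_mtu is None:
            return size, attempts  # the packet fits every link
        size = blocking_mtu        # assumed: the reply advertises the next-hop MTU
        attempts.append(size)


final_size, tried = discover_path_mtu([1500, 1400, 576, 1500])
print(final_size)
print(tried)
```

If the firewall dropped those type=3/code=4 replies, the sender in this model would never learn that its packets are too big, which is the breakage the paragraph above warns about.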

            How to Protect against Ping?


            To counter surveillance by ping commands, consider the following:[1]
            • Permit the outgoing echo request and incoming echo reply
            • Deny the incoming echo request and outgoing echo reply.
            You could consider enabling these settings for some key hosts, such as the router of your network provider.

            Another consideration is Destination Unreachable packets, in particular Host Unreachable (type=3/code=1). Allowing these to come in through your firewall lets connections time out faster, but they can also be used in a denial-of-service attack (by disconnecting clients from servers).

            How to Protect against Traceroute?


            As described above, traceroute involves several UDP packets flowing from the sender to the destination. The danger of exposing your traceroute service to a nonsecure network is that an attacker can use it to find out which hosts are the routers in your network. The tool manipulates the TTL field of a UDP packet in order to receive an ICMP TTL exceeded message in response. Blocking the outgoing TTL exceeded messages (type=11/code=0) will help you hide your network structure.

            To counter attacks that rely on the traceroute command, the simplest approach is to block TTL exceeded messages (type=11/code=0) from going from the secure network to the nonsecure network. In summary, here is what you want to do:[1]

            Traceroute from the Firewall:
            • This configuration can be safely permitted. To do this, you must send UDP packets to high ports and accept ICMP TTL exceeded (type=11/code=0) and port unreachable (type=3/code=3) messages.

            Traceroute from Internet to the Firewall:
            • Block the outgoing TTL exceeded messages (type=11/code=0)
            • In addition, you also want to block the outgoing ICMP port unreachable messages (type=3/code=3) because it would be useful to an attacker as a fast way to discover which services you are providing.
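
The ICMP recommendations made so far (path MTU discovery, ping, and traceroute) can be collected into one lookup table. The Python sketch below is only a summary device, not firewall configuration syntax; "inbound" is taken to mean from the nonsecure network toward the secure network, and the default-deny fallback is an assumption.

```python
# (ICMP type, ICMP code, direction) -> action
ICMP_POLICY = {
    (3, 4, "inbound"): "permit",    # fragmentation needed (path MTU discovery)
    (8, 0, "outbound"): "permit",   # our echo requests
    (0, 0, "inbound"): "permit",    # echo replies to our pings
    (8, 0, "inbound"): "deny",      # outsiders pinging us
    (0, 0, "outbound"): "deny",     # our replies to outsiders' pings
    (11, 0, "inbound"): "permit",   # TTL exceeded replies to our traceroute
    (3, 3, "inbound"): "permit",    # port unreachable replies to our traceroute
    (11, 0, "outbound"): "deny",    # would reveal our routers
    (3, 3, "outbound"): "deny",     # would reveal which services we offer
}


def icmp_action(icmp_type, code, direction, default="deny"):
    """Look up the action for an ICMP message; anything unlisted gets the default."""
    return ICMP_POLICY.get((icmp_type, code, direction), default)


print(icmp_action(8, 0, "inbound"))    # incoming ping
print(icmp_action(11, 0, "inbound"))   # reply to our own traceroute
print(icmp_action(5, 0, "inbound"))    # a type not discussed here falls to the default
```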

            To find out which high port is used by your traceroute service, do:

            # grep traceroute /etc/services
            traceroute 33434/tcp
            traceroute 33434/udp

            Securing Network by Using NAT


            Network address translation (NAT) can be utilized to secure your network by
            1. Providing access to nonsecure network name/address mappings for users in the secure network
            2. Hiding the secure network names and addresses from users outside the secure network
            3. Providing name/address mapping for resources that you want to reveal (usually servers and gateways)
            Here is how the address translation is done for each type of packet:
            • Outgoing IP packet
              • The source address is checked by the NAT configuration rules.
                • If a rule matches the source address, the address is translated to an official address from the predefined address pool.
            • Incoming packet
              • The destination address is checked if it is used by NAT.
                • When this is true the address is translated to the original unofficial address.

            Note that only TCP and UDP packets are translated by NAT; the ICMP protocol is not supported. For example, pinging a NAT address does not work, because ping uses the ICMP protocol.
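
The two translation directions described above can be sketched in Python as a pair of lookup tables. The class name Nat, the single internal network used as the matching rule, and the example addresses are assumptions for illustration; a real NAT also tracks ports, timeouts, and pool exhaustion.

```python
from ipaddress import ip_address, ip_network


class Nat:
    """Toy address translator with a pool of official addresses."""

    def __init__(self, internal_network, pool):
        self.internal = ip_network(internal_network)  # the rule for source addresses
        self.free = list(pool)                        # unused official addresses
        self.outbound = {}                            # unofficial -> official
        self.inbound = {}                             # official -> unofficial

    def translate_outgoing(self, source):
        """Map an internal source address to an official one if a rule matches."""
        if ip_address(source) not in self.internal:
            return source  # no rule matches: leave the address alone
        if source not in self.outbound:
            official = self.free.pop(0)  # raises IndexError once the pool is empty
            self.outbound[source] = official
            self.inbound[official] = source
        return self.outbound[source]

    def translate_incoming(self, destination):
        """Map an official destination address back to the original internal one."""
        return self.inbound.get(destination, destination)


nat = Nat("10.0.0.0/8", ["203.0.113.10", "203.0.113.11"])
official = nat.translate_outgoing("10.0.0.5")
print(official)
print(nat.translate_incoming(official))
```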

            Allowing Ping/Traceroute from Firewall to Internet


            Users will constantly ask for the ability to ping and traceroute machines on the Internet. Most firewall administrators will eventually give in to these demands. Nobody really needs to ping/traceroute, but they really want to.

            To protect your network, basically you will create filter rules that will route packets from a secure network to the Internet and back. NAT will take care of the address translation of the secure addresses. Normally, NAT translation will occur for the outgoing packet after the packet has gone through both packet filters (secure and non-secure). This means that you should never mention NAT addresses in the filter rules.

            References

            1. Protect and Survive Using IBM Firewall 3.1 for AIX (pdf)
            2. differences between ping and traceroute
            3. Default TCP Port
            4. Ping Blocking: How to do and how to break?
            5. tracert (Windows; traceroute in Linux)
            6. Firewall Forensics (What am I seeing?)
            7. ICMP Types and Codes
              • There are many different types of ICMP messages.
            8. ICMP (RFC 792)
            9. IP Fragmentation: How to Avoid It?
            10. Docker Container Networks: All Things Considered (Xml and More)

            Tuesday, April 19, 2016

            Docker: Btrfs Storage in Practice

            One of the ways Docker makes containerization so easy is by managing an overlay-style filesystem, allowing containers and images to incrementally change the filesystem layout of the image without requiring large copies of multiple images kicking around. This is a copy-on-write approach: parent layers are held read-only, and changes are reflected in the working layer.
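
The copy-on-write idea can be shown with a tiny Python model in which each layer is a dictionary of files on top of a read-only parent. The Layer class and the file paths are hypothetical, and the model ignores deletions and everything else a real storage driver has to handle.

```python
class Layer:
    """One filesystem layer; reads fall through to the parent, writes stay local."""

    def __init__(self, parent=None):
        self.parent = parent
        self.files = {}

    def read(self, path):
        layer = self
        while layer is not None:  # search this layer first, then its ancestors
            if path in layer.files:
                return layer.files[path]
            layer = layer.parent
        raise FileNotFoundError(path)

    def write(self, path, content):
        # Only the working layer changes; parent layers are never modified.
        self.files[path] = content


image = Layer()
image.write("/etc/motd", "hello")
image.write("/bin/tool", "binary")

container = Layer(parent=image)
container.write("/etc/motd", "changed")

print(container.read("/etc/motd"))  # the container sees its own copy
print(image.read("/etc/motd"))      # the image layer is untouched
print(container.read("/bin/tool"))  # unchanged files are shared, not copied
```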

            Docker supports several image/container layer storage drivers:[7]
            • aufs
            • btrfs
            • devicemapper
            • overlay
            • vfs
            • zfs 
            Your choice of storage driver can affect the performance of your containerized applications. So it’s important to understand the different storage driver options available and select the right one for your application.

            In this article, we will focus only on btrfs (B-tree file system) storage.

            Storage Driver and Backing Filesystem


            To begin with, let's study different storage-related entities that form a container:
            • Images
              • An image is a tagged hierarchy of read-only layers plus some metadata
              • docker images command can be used to list all images and report their virtual sizes
            • Image layers
              • Each successive layer (with a UUID tag) builds on top of the layer below it
              • Reusing layers can improve image build time[8]
                • Each Dockerfile instruction generates a new layer
                • You should put instructions least likely to change at the top of your Dockerfile to reuse layers as much as possible and try to make changes only at the bottom of your Dockerfile.
              • Image layers can be shared among images
              • Docker limits the number of layers to 127
                • Layers don’t come for free; depending on the storage driver used, there are some penalties to pay
                  • For example, in AUFS, each layer can introduce latency to container write performance on the first write to each file existing in the image layers stack, especially if the file is big and exists below many image layers.
              • docker history command can be used to list all layers of an image
            • Storage drivers
              • Docker has a pluggable storage driver architecture. 
                • This gives you the flexibility to “plug in” the storage driver that is best for your environment and use-case.
              • Each Docker storage driver is based on a Linux filesystem or volume manager
              • The Docker daemon can only run one storage driver, and all containers created by that daemon instance use the same storage driver.
              • Each storage driver is free to implement the management of image layers and the container layer in its own unique way. 
                • This means some storage drivers perform better than others in different circumstances.
                • See [7] to learn more on which storage driver you should choose

            Storage Driver and Backing Filesystem


            Which storage driver you use, in part, depends on the backing filesystem you plan to use for your Docker host’s local storage area. Some storage drivers can operate on top of different backing filesystems. However, other storage drivers require the backing filesystem to be the same as the storage driver. For example, the btrfs storage driver requires a btrfs backing filesystem. 

            The following table lists each storage driver and whether it must match the host’s backing file system or not:

            |Storage driver |Must match backing filesystem |
            |---------------|------------------------------|
            |overlay        |No                            |
            |aufs           |No                            |
            |btrfs          |Yes                           |
            |devicemapper   |No                            |
            |vfs*           |No                            |
            |zfs            |Yes                           |
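
The table can be turned into a small check. The Python sketch below encodes only the "must match" column; the function name is hypothetical, and a result of True means only that the matching requirement is satisfied, not that the driver is known to work on that filesystem.

```python
# Storage driver -> must the backing filesystem be the same as the driver?
MUST_MATCH = {
    "overlay": False,
    "aufs": False,
    "btrfs": True,
    "devicemapper": False,
    "vfs": False,
    "zfs": True,
}


def satisfies_match_requirement(driver, backing_fs):
    """Check a driver/backing filesystem pair against the table above."""
    if driver not in MUST_MATCH:
        raise ValueError(f"unknown storage driver: {driver}")
    return (not MUST_MATCH[driver]) or driver == backing_fs


print(satisfies_match_requirement("btrfs", "btrfs"))
print(satisfies_match_requirement("btrfs", "ext4"))
print(satisfies_match_requirement("overlay", "ext4"))
```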

            The btrfs Backend


            The backing filesystem refers to the filesystem that was used to create the Docker host’s local storage area under /var/lib/docker.  The btrfs (B-tree file system) backend requires /var/lib/docker to be on a btrfs filesystem and uses filesystem-level snapshotting to implement layers.

            # df -h
            Filesystem      Size  Used Avail Use% Mounted on
            /dev/xvdb4       11G  7.1G  2.5G  75% /var/lib/docker
            <snipped>
            
            # mount -l
            /dev/xvdb4 on /var/lib/docker type btrfs (rw)
            <snipped>
            


            You can find the layers of the images in the folder /var/lib/docker/btrfs/subvolumes.  Each layer is stored as a btrfs subvolume inside the folder  and start out as a snapshot of the parent subvolume (if any).

            The btrfs driver is very fast for docker build, but like devicemapper it does not share executable memory between devices. Mounting /var/lib/docker on a different filesystem than the rest of your system is recommended in order to limit the impact of filesystem corruption.

            You can set the storage driver by passing the --storage-driver= option to the docker command line, or by setting the option on the DOCKER_OPTS line in the /etc/default/docker file.  For example, to set the btrfs storage driver, do:
            # docker -d -s btrfs -g /mnt/btrfs_partition ...
            
            
              -s, --storage-driver=""  Storage driver to use
              -g, --graph=""           Path to use as the root of the Docker runtime.  
                                         Default is /var/lib/docker.
            

            To verify that the btrfs storage driver is in use on your Docker host, do:
            # docker info
            Containers: 1
            Images: 19
            Storage Driver: btrfs
            ...
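The same check can be scripted by reading the "Storage Driver" line from the `docker info` output. This is a minimal Python sketch that works on captured text; the function name `storage_driver` is hypothetical, and the sample is the output shown above.

```python
def storage_driver(docker_info_output):
    """Return the value of the 'Storage Driver' line, or None if it is missing."""
    for line in docker_info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Storage Driver":
            return value.strip()
    return None


# Sample text copied from the docker info output above.
sample = """Containers: 1
Images: 19
Storage Driver: btrfs
"""
print(storage_driver(sample))  # prints: btrfs
```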
            

            References

            1. Btrfs
            2. LVM dangers and caveats
            3. 20 Linux Server Hardening Security Tips
            4. ZFS Vs. BTRFS
            5. How to Use Different Docker Filesystem Backends
            6. Daemon storage-driver option
            7. Select a storage driver
            8. Optimizing Docker images for image size and build time
            9. Docker Filesystems: Understanding the btrfs Backend (Xml and More)
            10. Oracle WebLogic Server on Docker Containers (white paper)
            11. WebLogic on Docker (GitHub)
              • Sample Docker configurations to facilitate installation, configuration, and environment setup for DevOps users. This project includes quick start dockerfiles and samples for both WebLogic 12.1.3 and 12.2.1 based on Oracle Linux and Oracle JDK 8 (Server).

            Friday, April 15, 2016

            Oracle Process Cloud Service 16.2.1 Release

            What's New!
            • Document Workflow Process Apps
            • Business Indicators and Analytics Dashboard
            • REST Service Connectors
            • New Process Composer
            • New Data Association and Transformation Editor
            • Federated SSO
            • OAuth Authentication for REST API

            Document Workflow Process Apps
            Oracle Process Cloud Service introduces a new Document Start activity to create Document Workflow Process Applications specifically designed to process documents and document folders from Documents Cloud Service. Documents and folders used to initiate a Document Workflow Process Application are automatically associated with the process, giving participants easy access to the content within the context of their assigned tasks. The introduction of first-class Document Workflow capabilities makes it easy to create processes for document review and approval, transactional document management, and lightweight case management use cases.

            Business Indicators and Analytics Dashboard
            Business indicators enable you to capture and display business metrics specific to your applications. Use the Business Analytics Dashboard to plot and view charts, graphs and reports for business metrics captured as business indicators. Create charts that display business indicator values. In Composer, developers create business indicators for data objects whose metrics they want to capture and display as X axis, Y axis, and filter values. In Workspace, you select business or system indicators to plot them in charts, graphs and reports.

            REST Service Connectors
            Use the new REST Service Connector to invoke RESTful services directly from within a process flow. The Service Activity has been enhanced to support both SOAP and REST Service Connectors, providing a familiar approach to interacting with external data sources and systems. Using the REST Service Connector you can easily define the resources, operations and payloads needed to connect to a REST service, regardless of the description language used to define the service.

            New Process Composer
            The new Process Composer has a more flexible and intuitive user interface that can now be accessed from mobile tablets. The BPMN palette is easier to navigate, and the process Activities are easier to access. An improved process flow layout makes it easier to create professional, well-organized process maps. Select an Activity to choose from a contextual set of actions and properties to configure the Activity behavior. Changing the process sequence is more intuitive and provides contextual instructions on sequence flow alternatives.

            New Data Association and Transformation Editor
            The new Data Association editor greatly simplifies the task of mapping data between process Data Objects and Activities. Separate Input and Output associations allow for a more natural experience that makes it easy to map and review associations. Simply drag-and-drop Data Objects and Activity Payloads or use the intelligent auto-complete data entry. Even create new Data Objects right in the editor. Define reusable Transformations to associate different data types or reduce complex mappings. Define them once and apply wherever they’re needed in the application.

            Federated SSO
            Oracle Process Cloud Service now supports federated single sign-on (SSO) and authentication. Users who enter valid credentials are authenticated through their identity provider, such as Oracle Identity Federation (OIF) or Active Directory Federation Services (ADFS), using the Security Assertion Markup Language (SAML) protocol, and are then redirected to the Oracle Process Cloud Service Workspace or Composer home page.

            OAuth Authentication for REST API
            Oracle Process Cloud Service now accepts OAuth tokens as an alternative to basic authentication for our REST APIs.


            Learn More about Oracle Process Cloud Service at https://cloud.oracle.com/process

            Sunday, April 3, 2016

            Expect Scripts: How to Automate Your Tasks

            Task to be Automated

            The interactive session below shows psm, an Oracle Application Container Cloud Service command-line tool, prompting a user for the authentication and authorization information needed to sign in to an Oracle Cloud service and work on a specific identity domain.

            $ psm setup
            Username: weblogic1
            Password:
            Retype Password:
            Identity domain: myIdDomain
            Region [us]: http://anycloudserver.example.com:7103
            Output format [json]:

            In this article, we will demonstrate how to use Expect to automate the above task; we will not focus on the correctness of the information psm asks for.

            Expect


            Expect is an extension to the Tcl scripting language that "talks" to other interactive programs according to a script.[1] Following the script, Expect knows what can be expected from a program and what the correct response should be.

            It can be used to automate control of interactive applications such as telnet, ftp, passwd, fsck, rlogin, tip, ssh, and others, including psm. Expect uses pseudo terminals (Unix) or emulates a console (Windows), starts the target program, and then communicates with it, just as a human would, via the terminal or console interface. Finally, Tk, another Tcl extension, can be used to provide a GUI.


            Expect Script


            To automate the said task, we need to write an Expect script (i.e., psmSetup.exp) as shown below:

            psmSetup.exp
            #!/usr/bin/expect -f
            #exp_internal 1
            set argDomain [lindex $argv 0]
            spawn psm setup
            expect "Username: "
            send "weblogic\r"
            expect "Password: "
            send "welcome1\r"
            expect "Retype Password: "
            send "welcome1\r"
            expect "Identity domain: "
            send "$argDomain\r"
            expect "Region \\\[us\\\]: "
            send "http://anycloudserver.example.com:7103\r"
            expect "Output format \\\[json\\\]: "
            send "\r"
            expect "\r"
            spawn psm accs apps
            expect "\r"

            Expect scripts can have any file name suffix you like, though they generally have an .exp extension. Read [5] for more details.
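To make the expect/send rhythm concrete, here is a minimal sketch of the same idea in Python, using only the standard library. It is not how Expect is implemented: it talks to the child over plain pipes rather than a pseudo terminal, so it would not work with tools that insist on a real terminal (for example, for password prompts). The child program below is a hypothetical stand-in for an interactive tool such as psm, and the helper names `expect`, `send`, and `run_dialog` are assumptions chosen to mirror the script.

```python
import subprocess
import sys

# Hypothetical stand-in for an interactive tool: it prints a prompt,
# reads one line of input, and repeats for the next prompt.
CHILD = r'''
import sys
answers = []
for prompt in ("Username: ", "Identity domain: "):
    sys.stdout.write(prompt)
    sys.stdout.flush()
    answers.append(sys.stdin.readline().strip())
sys.stdout.write("done %s@%s\n" % (answers[0], answers[1]))
'''


def expect(proc, prompt):
    """Read the child's output byte by byte until it ends with prompt."""
    seen = b""
    target = prompt.encode()
    while not seen.endswith(target):
        chunk = proc.stdout.read(1)
        if not chunk:
            raise EOFError("child exited before printing %r" % prompt)
        seen += chunk
    return seen.decode()


def send(proc, text):
    """Write a response to the child's standard input."""
    proc.stdin.write(text.encode())
    proc.stdin.flush()


def run_dialog(domain):
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    expect(proc, "Username: ")
    send(proc, "weblogic\n")
    expect(proc, "Identity domain: ")
    send(proc, domain + "\n")
    # Whatever the child prints after the last prompt is its final result.
    result = proc.stdout.read().decode().strip()
    proc.stdin.close()
    proc.stdout.close()
    proc.wait()
    return result


print(run_dialog("myIdDomain"))
```

Each `expect` call blocks until the prompt has appeared, which is why the order of the expect/send pairs has to follow the order of the prompts.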

            How to Debug Expect Scripts?


            When you write an Expect script for the first time, it is easy to get lost and not get the result you expect. In this case, un-comment the following line in psmSetup.exp:

            #exp_internal 1

            Setting "exp_internal 1" at the beginning of an Expect script is similar to passing the -d flag (specified as -diag when using Expectk): it enables diagnostic output that primarily reports the internal activity of commands such as expect and interact. In addition, the strace command is useful for tracing statements, and the trace command is useful for tracing variable assignments.

            How to Pass Variables from Shell Script to Expect Script?


            If you write a Korn shell script such as:

            #!/bin/ksh
            ...
            
            ./psmSetup.exp $domain

            and would like to pass $domain from the shell script to the psmSetup.exp script, you can add the following line to the Expect script:

            set argDomain [lindex $argv 0]

            To reference the Expect variable (i.e., argDomain), prefix it with "$" as below:

            expect "Identity domain: "
            send "$argDomain\r"

            The psm dialog above is an exchange between a sender (i.e., the end user) and a receiver (i.e., psm):
            "Identity domain: " is the prompt you "expect" from psm, and "$argDomain\r" is the response you "send" back.
            Read [7] for more explanation.

            How to Escape Special Characters?


            In the below dialog,
            Region [us]: http://anycloudserver.example.com:7103
            
            psm will prompt the user for a response:

            Region [us]: 
            

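            which includes the special characters "[" and "]". Inside a double-quoted Tcl string, square brackets trigger command substitution, and in an Expect pattern they also delimit a character class, so a single backslash is not enough. Writing "\\\[" lets Tcl reduce the sequence to "\[", which the pattern matcher then treats as a literal "[". You therefore need "\\\" in front of both "[" and "]":[9-12]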
            expect "Region \\\[us\\\]: "
            send "http://anycloudserver.example.com:7103\r"
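The pattern-matching half of the problem can be seen outside Expect as well. The sketch below uses Python's `fnmatch` module, which also treats "[us]" as a character class; note that Python escapes a bracket as "[[]" rather than with a backslash, so this illustrates why escaping is needed, not the Tcl syntax itself.

```python
import fnmatch
import glob

prompt = "Region [us]: "

# Unescaped, "[us]" is a character class matching a single "u" or "s",
# so the pattern does not match the literal prompt text.
unescaped = "Region [us]: "
print(fnmatch.fnmatchcase(prompt, unescaped))        # False
print(fnmatch.fnmatchcase("Region u: ", unescaped))  # True

# glob.escape wraps each special character in brackets ("[" becomes "[[]"),
# which makes the pattern match the prompt literally.
escaped = glob.escape(prompt)
print(repr(escaped))                                 # 'Region [[]us]: '
print(fnmatch.fnmatchcase(prompt, escaped))          # True
```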

            References

            1. Expect User Command
            2. Tcl
            3. How to pass variables from shell script to expect script?
            4. How to write a script that accepts input from a file or from stdin?
            5. Using Expect Scripts
            6. Debugging Expect Programs
            7. Using Expect Scripts to Automate Tasks
            8. How to escape unusual/uniq characters from expect scripts?
            9. Passing '\' In Username To Expect
            10. Problem in expect script with password involving trailing backslash
            11. How to send escape characters through Expect
            12. How to escape unusual/uniq characters from expect scripts?
            13. Oracle Application Container Cloud Service
            14. Introduction to the Oracle VM Command Line Interface (CLI)
            15. All Cloud-related articles on Xml and More

            Saturday, March 26, 2016

            How-to: Installing Python 3.5.1 in Linux

            When installing new software in Linux, you could experience the following:
            • Expected
              • For example, if you have read this companion article (or watched this video), you would know that:
                • You can install multiple versions of Python on the same Linux server (but in different PYTHONHOME locations).[1]
                • There are differences between Python 2 and 3, so be careful when reading articles that refer to a different version of Python (2 vs 3).
            • Unexpected
              • Surprises always happen even with careful planning. For example, we have run into at least two issues:
                • /usr/bin/install: cannot change permissions of `/usr/local/lib': No such file or directory
                • The directory '/home/<usrname>/.cache/pip' or its parent directory is not owned by the current user

            In this article, we will cover the installation of Python 3.5.1 in Linux and how to resolve the issues encountered.

            Downloads


            You can click on this link to download Python 3.5.1 in Gzipped source tarball format. Go to that page and scroll down to the Files section.

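            |Version                |Operating System |Description |MD5 Sum                          |File Size |GPG |
            |-----------------------|-----------------|------------|---------------------------------|----------|----|
            |Gzipped source tarball |Source release   |            |be78e48cdfc1a7ad90efff146dce6cfe |20143759  |SIG |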

            If you want to download it from the command line, you can do:

            $ export http_proxy=http://www-proxy.us.xxx.com:80
            $ export https_proxy=http://www-proxy.us.xxx.com:80
            $ wget https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tgz
            $ tar -xvf Python-3.5.1.tgz
            $ cd Python-3.5.1
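Before unpacking, it can be worth comparing the tarball against the MD5 sum listed in the Files section above. The following is a minimal Python sketch; the helper names `md5_of_file` and `is_intact` are hypothetical, and the file is assumed to be in the current directory under the name used in the wget command.

```python
import hashlib

# MD5 sum listed for the gzipped source tarball in the Files section above.
EXPECTED_MD5 = "be78e48cdfc1a7ad90efff146dce6cfe"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected
```

A call such as `is_intact("Python-3.5.1.tgz")` should return True for an undamaged download. MD5 only guards against accidental corruption; the SIG column on the download page points to the GPG signature for stronger verification.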

            Configure and Install (Overview)


            In the Python-3.5.1 folder, there is a README file. Read it and follow the steps below:

            On Unix, Linux, BSD, OSX, and Cygwin:

            $ ./configure
            $ make
            $ make test
            $ sudo make install

            This will install Python as python3. Note that only make install needs to be run as the root user.

            Considerations of Configuration


            Before you execute the configure command, do plan in advance for your new PYTHONHOME.[1] This is especially important if you have:
            • Multiple versions of Python installed on the system, or
            • Some parts of file system are read-only

            Enter ./configure --help to learn how to customize installation directories in the configuration step.

            $ ./configure --help
            Installation directories:
              --prefix=PREFIX         install architecture-independent files in PREFIX
                                      [/usr/local]
              --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX
                                      [PREFIX]

            By default, `make install' will install all the files in
            `/usr/local/bin', `/usr/local/lib' etc.  You can specify
            an installation prefix other than `/usr/local' using `--prefix',
            for instance `--prefix=$HOME'.

            For better control, use the options below.

            Fine tuning of the installation directories:
              --bindir=DIR            user executables [EPREFIX/bin]
              --sbindir=DIR           system admin executables [EPREFIX/sbin]
              --libexecdir=DIR        program executables [EPREFIX/libexec]
              --sysconfdir=DIR        read-only single-machine data [PREFIX/etc]
              --sharedstatedir=DIR    modifiable architecture-independent data [PREFIX/com]
              --localstatedir=DIR     modifiable single-machine data [PREFIX/var]
              --libdir=DIR            object code libraries [EPREFIX/lib]
              --includedir=DIR        C header files [PREFIX/include]
              --oldincludedir=DIR     C header files for non-gcc [/usr/include]
              --datarootdir=DIR       read-only arch.-independent data root [PREFIX/share]
              --datadir=DIR           read-only architecture-independent data [DATAROOTDIR]
              --infodir=DIR           info documentation [DATAROOTDIR/info]
              --localedir=DIR         locale-dependent data [DATAROOTDIR/locale]
              --mandir=DIR            man documentation [DATAROOTDIR/man]
              --docdir=DIR            documentation root [DATAROOTDIR/doc/python]
              --htmldir=DIR           html documentation [DOCDIR]
              --dvidir=DIR            dvi documentation [DOCDIR]
              --pdfdir=DIR            pdf documentation [DOCDIR]
              --psdir=DIR             ps documentation [DOCDIR]

            In our system, both default installation directories /usr/local/bin and /usr/local/lib  are read-only. So, we need to configure it with different PREFIX and EPREFIX as follows:

            $ ./configure --prefix=/usr --exec-prefix=/usr
            creating Modules/Setup
            creating Modules/Setup.local
            creating Makefile

            After resolving all installation issues, you can find out where the final installation directories are by entering:[2]

            $ python3 -c 'import sys; print("\n".join(sys.path))'
            /usr/lib/python35.zip
            /usr/lib/python3.5
            /usr/lib/python3.5/plat-linux
            /usr/lib/python3.5/lib-dynload
            /home/<usrname>/.local/lib/python3.5/site-packages
            /usr/lib/python3.5/site-packages
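The interpreter can also report the PREFIX and EPREFIX it was built with, which is a quick way to confirm that the configure options took effect. A minimal sketch using the standard `sys` and `sysconfig` modules follows; the function name `install_layout` is hypothetical. Inside a virtual environment, `sys.prefix` points at the environment rather than the base installation.

```python
import sys
import sysconfig


def install_layout():
    """Collect the main installation directories of the running interpreter."""
    return {
        "prefix": sys.prefix,                # corresponds to --prefix
        "exec_prefix": sys.exec_prefix,      # corresponds to --exec-prefix
        "stdlib": sysconfig.get_path("stdlib"),
        "site-packages": sysconfig.get_path("purelib"),
    }


for name, path in install_layout().items():
    print("%-14s %s" % (name, path))
```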

            Potential Issues


            Without specifying PREFIX and/or EPREFIX, you might run into the following issues (read [2] for further help):

            $ python3
            Could not find platform independent libraries <prefix>
            Could not find platform dependent libraries <exec_prefix>
            Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
            Fatal Python error: Py_Initialize: Unable to get the locale encoding
            ImportError: No module named 'encodings'


            To resolve the other issue, shown below, try "sudo -H make install" as the message suggests (note that you may want to clean up first; see the next section):

            $ sudo make install
            The directory '/home/<usrname>/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
            The directory '/home/<usrname>/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
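The warning appears because, under plain sudo, the process runs as root while HOME still points at your own home directory, so the pip cache there belongs to a different user. The -H flag makes sudo set HOME to the target user's home. The ownership test itself can be sketched in Python as below; the function name `owned_by_current_user` is hypothetical, the check is Unix-only, and it looks at the given path only, whereas pip also considers parent directories.

```python
import os


def owned_by_current_user(path):
    """Return True if path exists and is owned by the user running this script."""
    try:
        return os.stat(path).st_uid == os.getuid()
    except FileNotFoundError:
        return False


# Default pip cache location referred to in the warning above.
cache_dir = os.path.expanduser("~/.cache/pip")
print(cache_dir, owned_by_current_user(cache_dir))
```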

            Cleanup and Retry


            If you have run into any issues, you can clean up your environment and retry with fixes. In your Python-3.5.1 folder, type the following:

            # make clean
            find . -depth -name '__pycache__' -exec rm -rf {} ';'
            find . -name '*.py[co]' -exec rm -f {} ';'
            find . -name '*.[oa]' -exec rm -f {} ';'
            find . -name '*.s[ol]' -exec rm -f {} ';'
            find . -name '*.so.[0-9]*.[0-9]*' -exec rm -f {} ';'
            find build -name 'fficonfig.h' -exec rm -f {} ';' || true
            find: build: No such file or directory
            find build -name '*.py' -exec rm -f {} ';' || true
            find: build: No such file or directory
            find build -name '*.py[co]' -exec rm -f {} ';' || true
            find: build: No such file or directory
            rm -f pybuilddir.txt
            rm -f Lib/lib2to3/*Grammar*.pickle
            rm -f Programs/_testembed Programs/_freeze_importlib
            rm -rf build

            References

            1. Environment Variables (Python)
              • sys.path
                • A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.
            2. How-to: When a Missing Python Module Error Was Thrown
            3. What does “SyntaxError: Missing parentheses in call to 'print'” mean in Python?
              • This error message means that you are attempting to use Python 3 to follow an example or run a program that uses the Python 2 print statement.
            4. Install / Update Python 3.5.0 at Linux machine. (Youtube)
            5. Python 3.5.1
            6. Python Module
            7. upgrade Python to 2.7.2
            8. How can I troubleshoot Python “Could not find platform independent libraries
            9. Py_Initialize: Unable to get the locale encoding in OpenSuse 12.3
            10. Python script header
            11. Standard modules (Python)
            12. How do I find the location of Python module sources?
            13. sys module — System-specific parameters and functions
            14. What do the python file extensions, .pyc .pyd .pyo stand for?
            15. How do I unload (reload) a Python module?
            16. Python Packaging User Guide
            17. Purpose of #!/usr/bin/python3 (important)