
Planet DB2 is an aggregator of blogs about the IBM DB2 database server. We combine and republish posts by bloggers around the world. Email us to have your blog included.

 

January 17, 2019


Craig Mullins

Making Plans for IBM Think 2019

I'm looking forward to once again attending IBM Think, IBM's flagship technology conference. I attended the inaugural Think conference and it was one of the highlights of the year (2017). This year IBM Think is being held in San Francisco February 12 through 15 at the Moscone Center and surrounding hotels. San Francisco is a wonderful location because it will give the conference more room to...

(Read more)
Jack Vamvas

DB2 TSAMP Clustering - Resource marked as OFFLINE

Question: I've installed the latest security vulnerability patching for the JDK on DB2 TSAMP clustering, and while running some tests (i.e. failover and failback) I'm getting a resource marked as OFFLINE.

I've investigated and fixed the root cause, and now I'm trying to reset the resource back to ONLINE. How can I reset the resource?

Answer: Without knowing anything about the root cause or the steps you took to fix it, the typical approach is to use the resetrsrc command. Here is an example of how you might reset the value. In this example the failover DB2 instance is currently marked as OFFLINE and must be reset back to ONLINE:

# resetrsrc -s "Name like 'my_resource' AND NodeNameList in {'my_node'}" IBM.Application

 

To get the values for my_resource and my_node, use the lssam command, which gives you details about the cluster status.

 

For more commands to manage the TSAMP DB2 cluster, see the TSAMP Cheat Sheet for DBA managing DB2 clustering.

 

January 15, 2019

Cognitive Class

Why every Data Scientist should know SQL

Still waiting…it’s been over an hour and still nothing. I watch the clock, get some tea, ruminate on the structure of dark matter….

I’m trying to work with course enrollment data in a relatively large database and format it in a nice dashboard, but processing this data takes far too long. Perhaps dark matter is to blame.

Let me back up.

Last year, I was tinkering with a Jupyter notebook to summarize course enrollment and completion stats for some of our database courses.

In fact, I started with a notebook that someone had originally created for another set of courses involving the same database. Why reinvent the wheel when a perfectly good notebook that does something similar to what you want already exists? After all, data science is a team sport.

I had made relatively minor updates to the notebook – just switched the course numbers that I wanted summaries for and clicked on Run All to execute all cells in the notebook.

I hadn’t really looked carefully at the code in the notebook before running it. But once the summarized results failed to materialize after a couple of hours, I knew I couldn’t blame things on dark matter anymore and would need to get my hands dirty with code.

So I grabbed another warm beverage and got ready to dig into the code in the notebook. But it only took scrolling to the cells that performed the database queries to recognize the problem.

SELECT * FROM ENROLLMENTS. I read it and then read it again. Aloud, the second time. It was like a eureka moment.

I was pleased that I was able to debug the problem so quickly, but was not too happy with the prospect of having to spend some time hacking the notebook to make it run faster. A lot faster.

I have over 25 years of experience working with databases so I knew fixing the database queries would be relatively quick. But much of the data analysis logic in the notebook involved Pandas dataframes.

I had only recently picked up some data science skills, and most of my data science capability involved the R programming language. All the data scientists I had been talking to recently were using Python. [So this was also a good opportunity for me to pick up some skills in Python and Data Analysis with Python].

But let me not digress further and get back to the problem with SELECT * FROM ENROLLMENTS.

Imagine you want to buy one item from an online retailer. Would you order all the millions of items in the retailer’s warehouse to get just the one you want and then discard or return the rest of the items? Can you imagine how long it would take to have the entire inventory shipped to you? Even if all of the contents managed to reach you somehow, would you even have enough capacity and resources in your house to receive and process the entire inventory?

But apparently that is exactly what a lot of data scientists actually do. They “order” all of the items in the data warehouse and then use tools like Pandas dataframes to sift out the data they need and discard the rest.

And that is exactly what the SQL query SELECT * FROM ENROLLMENTS in my example above does. The database I was accessing had millions of rows of course enrollment and completion data, and getting all of that data into a notebook would take considerable time. And with the constrained resources on my laptop, processing those millions of rows with Pandas dataframes would take even longer.

Shortly after this issue, I met with a Database Administrator (DBA) at one of the big banks. Their CEO was sold on the idea that data science could help transform the company, and data science teams had been cropping up all over the company in recent months, but that’s when his job had started to become “hell”.

DBAs run a tight ship. They tune the system and queries to the umpteenth degree so the database can hum along fine responding to predictable queries efficiently.

And then along comes a hotshot data scientist who runs a huge query like “SELECT * FROM ENROLLMENTS” against an operational database. The database slows to a crawl, and the company’s clients on the website start seeing database errors and timeouts. And the DBA responsible for the database gets called to the boss’s office.

I may have exaggerated a bit and fictionalized parts of the narrative, but unfortunately this sort of thing is quite common. But data scientists are not entirely to blame. Data Science itself has been evolving.

Data Science traditionally has been done on very small data sets. As a matter of fact, according to one consulting firm, over 80% of data science work is done on a laptop.

Small data sets are easy and fast to manipulate in memory, and Pandas is great for that. Data Scientists traditionally worked with CSV files (text files with comma-separated values) and did not have a connection to a database. A DBA would do a one-time database dump into a CSV and that was it.

We are in the age of Big Data and working with CSV files is simply not practical. Repeatedly generating CSV file extracts with more up to date data is even less practical. This means that Data Scientists need to learn to work with big data repositories like relational Data Warehouses, Hadoop, Spark, Cloud Object Storage etc.

The language of relational databases is SQL. And because of SQL’s ease of use, it is increasingly being adopted by other big data repositories.

In the case of my query – “SELECT * FROM ENROLLMENTS” – all I had to do was add a WHERE clause to filter the results to just the courses I was interested in, so the result set would include only a small subset of the millions of rows in the table.
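
For illustration only, here is a minimal sketch of that kind of fix; the column name COURSE_ID and the course codes are hypothetical placeholders, since the actual ENROLLMENTS schema isn't shown here:

-- Before: pulls every row in the table into the notebook
SELECT * FROM ENROLLMENTS

-- After: the database does the filtering and returns only the rows of interest
SELECT * FROM ENROLLMENTS WHERE COURSE_ID IN ('COURSE_A', 'COURSE_B')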

So that is one reason I feel knowledge of SQL is essential for today’s Data Scientists. Perhaps modern data scientists only need to learn a subset of SQL. They don’t need to learn transaction processing, but things like simple filtering and aggregation are a must.
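
As a rough sketch of the kind of aggregation I mean (again using the hypothetical COURSE_ID column), a per-course summary can be computed by the database rather than in a dataframe:

-- One summary row per course instead of millions of detail rows
SELECT COURSE_ID, COUNT(*) AS ENROLLMENT_COUNT
FROM ENROLLMENTS
GROUP BY COURSE_ID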

The impact of adding filtering to my SQL query in the Jupyter notebook was dramatic. The results were rendered in a couple of minutes instead of a couple of hours. And I don’t consider myself to be a genius.

And if I could tweak SQL in my data science experiment by so little and have such a huge impact on performance, I could surely help other Data Scientists (and some of those DBAs who are frustrated with newly minted data science yahoos like myself) work more efficiently with databases and SQL.

So shortly after these episodes, working with my colleagues Hima Vasudevan and Raul Chong, we launched the course Databases and SQL for Data Science on Coursera. It is an online self-study course that you can complete at your own pace.

This course introduces relational database concepts and helps you learn and apply knowledge of the SQL language. It also shows you how to perform SQL access in a data science environment like Jupyter notebooks.

The course requires no prior knowledge of databases, SQL, Python, or programming. It has four modules and each requires 2 – 4 hours of effort to complete. Topics covered include:

Module 1:
– Introduction to Databases
– How to Create a Database Instance on Cloud
– CREATE Table Statement
– SELECT Statement
– INSERT Statement
– UPDATE and DELETE Statements
– Optional: Relational Model Concepts

Module 2:
– Using String Patterns, Ranges
– Sorting Result Sets
– Grouping Result Sets
– Built-in Functions, Dates, Timestamps
– Sub-Queries and Nested Selects
– Working with Multiple Tables
– Optional: Relational Model Constraints

Module 3:
– How to access databases using Python
– Writing code by Using DB-API
– Connecting to a Database by Using ibm_db API
– Creating Tables, Loading Data, and Querying Data from Jupyter Notebooks
– Analyzing Data with SQL and Python
– Optional: INNER JOIN, LEFT, RIGHT OUTER JOIN

Module 4:
– Working with Real-world Data Sets
– Assignment: Analyzing Chicago Data Sets using SQL and Python

The emphasis in this course is on hands-on, practical learning. As such, you will work with real databases, real data science tools, and real-world datasets. You will create a database instance in the cloud. Through a series of hands-on labs, you will practice building and running SQL queries using cloud-based tools. You will also learn how to access databases from Jupyter notebooks by using SQL and Python.

Anyone can audit this course at no charge. If you want a certificate and access to the graded components of the course, there is currently a limited-time price of $39 USD. And if you are looking for a Professional Certificate in Data Science, this course is one of the 9 courses in the IBM Data Science Professional Certificate.

So if you are interested in learning SQL for Data Science, you can enroll now and audit for free.

NOTE: Portions of this post have been updated from the original version. In the process of publishing this blog post, I may have inadvertently hurt the emotions of a few Data Scientists and perhaps some DBAs, but certainly none were physically harmed. But seriously, it is not my intent to offend or stereotype any Data Scientist or DBA. So my sincere apologies to anyone who may have taken offence. The intent of this blog post is to highlight a real problem in data science, one that can be easily rectified with some knowledge of SQL, and I would be a lousy marketeer if I didn’t include a shameless plug for the IBM Data Science Professional Certificate on Coursera.

The post Why every Data Scientist should know SQL appeared first on Cognitive Class.


Ember Crooks

Db2’s Alternate Page Cleaning

The first professional technical presentation I ever gave was on March 31, 2003. I had been a DBA for just less than 2 years. It was at an event called “SDC West Geek Fest”. It was an internal IBM event within Global Services where the technical folks delivering services got
 

January 14, 2019


Data and Technology

Common Database Design Errors

Before we begin today’s blog post, wherein I explain some of the more common mistakes that rookies and non-database folks make (heck, even some database folks make mistakes), I first want to...

(Read more)
 

January 11, 2019


Triton Consulting

Six Reasons to Review your Database Availability – Part 5 HR and Labour Costs

From working with many customers to help keep their critical databases up and running we have come up with the top 6 reasons for putting Database Availability at the top of your priority list. In this blog we look at … Continue reading → The post Six Reasons to Review your Database...

(Read more)
 

January 10, 2019


Ember Crooks

Update on Ember’s Data Science Journey – January 10, 2019

Well, I knew it was going to be slow, and that is part of why my plan is a 20-year plan instead of just a short-term plan. The fall was particularly difficult due to a couple of issues at work and two conferences that I absolutely loved, but that took
 

January 08, 2019


Kim May

IBM 2019 Replication Updates – 4 in 45 Minutes – Featuring Kafka!

Join replication experts Frank Fillmore and Ed Lynch for this information-packed webinar with news and announcements about IBM’s replication solutions. Whether you are a current IBM replication user,...

(Read more)

Dave Beulke

5 Data Management Predictions for 2019

Hi everyone, I am back after taking a blogging break in 2018. It was nice seeing everyone during the year at data management and Db2 conferences, user group presentations, and various other functions. During 2018, I continued to enjoy the dynamic problem solving of challenging consulting...

(Read more)
 

January 05, 2019


Dave Beulke

Dave Beulke an IBM Champion for 2019

I was recently notified that I’ve been selected as an IBM Champion for 2019. This is my eleventh year as an IBM Champion. It is great to be included again with all the other IBM Champions. Check out all the deep technical experience and client successes of the IBM Champions here....

(Read more)

Dave Beulke

Five Data Management Predictions for 2018

Happy New Year! I hope everyone has their 2018 filled with good health, happiness and prosperity. Once again I am offering you my data management predictions for 2018. Thank you all for great support in 2017 and countless emails regarding my conference, user group presentations and blogs last...

(Read more)
 

January 04, 2019


Robert Catterall

A Case Study: Measuring the Impact of Db2 for z/OS Buffer Pool Changes

Not long ago, a Db2 for z/OS administrator contacted me with a request. His team had made some changes to the buffer pool configuration in their production Db2 environment (the overall size of the buffer pool configuration was increased to reduce read I/O activity, and greater use was made of large real storage page frames), and they were not sure as to how the performance impact of these changes could be measured. Could I assist with that analysis? Sure, I said. I asked the Db2 admin to use...

(Read more)
 

January 02, 2019


Frank Fillmore

SQL to Aid #Db2 for z/OS EBCDIC to Db2 for LUW Unicode Conversion #IBMAnalytics #IBMz

Dr. Sheldon Cooper of “The Big Bang Theory” regularly records a YouTube video called “Fun with Flags” so I’m going to call this blog post “Fun with SQL”. ...

(Read more)
 

January 01, 2019


Data and Technology

Happy New Year 2019

Just a quick post today to wish everybody out there a very Happy New Year! I hope you have started 2019 off with a bang and that the year is successful and enjoyable for one and all!

(Read more)
 

December 28, 2018


Kim May

2019 Events – Coming Soon!

With 2018 coming to a close I am focused on plans for 2019.  Our current plans are being driven by changes in IBM’s services delivery model and product enhancements to be available in 2019. ...

(Read more)

Robert Catterall

Db2 for z/OS Partitioned Table Spaces: How the Number of Partitions can Impact Performance

This past summer, I delivered a couple of presentations at a meeting of a regional Db2 users group in the USA. During the lunch break, I shared a table with a member of the Db2 for z/OS support team at a large financial institution. He told me an interesting story. At his site, the Db2 for z/OS team had decided that universal partition-by-range (PBR) would be a better table space type for a large table currently housed in a partition-by-growth table space (PBG). To prove out this...

(Read more)
 

December 24, 2018


Craig Mullins

Happy Holidays 2018

It is hard to believe that yet another year has come and gone (well, almost) and that the holiday season is once again upon us. And that means it is time to reflect on the past year -- including all that we have accomplished and what is yet to be done. And importantly, it is also time to wind down and relax with friends, family and loved ones.  A time to put down the work that consumes...

(Read more)
 

December 21, 2018


DB2Night Replays

The DB2Night Show #210: Db2 Performance Tuning- Indexes vs Prefetch

@mhubel Special Guest: Martin Hubel Owner, MHC IBM Gold Consultant Db2 LUW Performance Tuning: Indexes versus Prefetch! 100% of our audience learned something! Famous for putting his kids through college by Dropping Db2 Indexes, Martin covers a wide variety of Db2 Performance Tuning topics with focus on indexes, prefetch I/O, bufferpools, Db2 MDC and ITC tables, and much more (detailed below). Watch and learn from a top Db2 performance...

(Read more)
 

December 17, 2018


DB2Night Replays

The Db2Night Show #z92 - Distributed access to Db2 for z/OS: top things

Presented by: Cristian Molaro, IBM Champion and Gold Consultant. "The DB2Night Show #Z92: Distributed access to Db2 for z/OS: top things to know" Replays available in WMV and M4V formats! 98% of our studio audience learned something! Cristian covered all aspects of distributed access to Db2. A must see show! Watch the replay...

(Read more)

Craig Mullins

Dirty Reads... Done Dirt Cheap

Let's talk about dirty reads (with apologies to the AC/DC pun in the title of this blog post). Application programmers must understand how concurrency problems impact the access and modification of Db2 data. When one program attempts to read data that’s in the process of being changed by another, the DBMS must forbid access until the modification is complete to ensure data integrity. Most DBMS...

(Read more)
 

December 13, 2018


Data and Technology

FaunaDB: A multi-model, distributed database system with ACID consistency

Although relational, SQL database systems continue to dominate the DBMS market, modern database management has shifted to encompass additional types of database systems. This is exemplified in the...

(Read more)

Ember Crooks

The Autumn That Nearly Drowned Me

For more than 5 years I’ve managed to average at least one blog entry a week through fluctuating workloads and even through changing employers. And then came this autumn. The combination of a couple of large clients and going into holiday peak period for a couple of retail clients, collided
 

December 12, 2018

DB2 Programmers Guide

Long time no see

It's been a long time since I have published any new topics. I have been busy with my career and some other personal issues. But I am back and planning to be more active now.

In the coming days, I will add a lot of interesting topics on DB2.

And finally, some good news: along with this blog, I am planning to reach you in a new, interactive way.

A new YouTube channel, yeah... this will be really interesting.

More details soon...

Share your suggestions and thoughts for this new YouTube channel in the comments.
DB2 Programmers Guide

DB2 Image Copy

In a production environment, there are constant updates happening to your DB2 tables as a result of the functionality of various jobs and programs. These updates are nothing but the results of INSERT, UPDATE, and DELETE statements executed against your various DB2 tables.

What if I want to take a look at the data in the tables before today's batch pass ran? This may be for some analysis or understanding, or worse, something inadvertently went wrong and you want to recover the data back to where it was before the job ran. Is there a way? Wouldn't it be nice if someone could take a backup of my DB2 tables before the batch pass runs, so I could always go back and get the data where we started?

Well, it's already happening; the only thing is, as a programmer/developer, you may not know about it. It is done by your database administrator using a DB2 utility.

This process of taking full backups of your data objects is called a DB2 full image copy. It is achieved with the DB2 COPY utility.

You can make full image copies of a variety of data objects. Data objects include table spaces, table space partitions, data sets of nonpartitioned table spaces, index spaces, and index space partitions.

The following statement specifies that the COPY utility is to make a full image copy of the TABSPACE table space in database DATABASE.

COPY TABLESPACE DATABASE.TABSPACE

The COPY utility writes pages from the table space or index space to the output data sets.

So, if you want to take a look at the old data from before the batch pass, all you have to do is dump the image copy data set produced by the COPY utility into a normal DASD file on disk.

There is also something called an incremental image copy. As its name suggests, it does not take a complete copy; it copies only what has changed (through inserts, updates, and deletes) since the last copy.
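
As a rough sketch, assuming the same illustrative DATABASE.TABSPACE names as in the example above, an incremental image copy is requested by specifying FULL NO on the COPY control statement:

COPY TABLESPACE DATABASE.TABSPACE FULL NO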

Information about the copies taken by the COPY utility is recorded in the DB2 catalog table SYSIBM.SYSCOPY.
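
For example, a query along the following lines lists the image copies recorded for a table space; the column names are from SYSIBM.SYSCOPY as I recall them (ICTYPE 'F' indicates a full copy, 'I' an incremental copy), so verify them against your own catalog:

SELECT DBNAME, TSNAME, ICTYPE, DSNAME, TIMESTAMP
FROM SYSIBM.SYSCOPY
WHERE DBNAME = 'DATABASE' AND TSNAME = 'TABSPACE'
ORDER BY TIMESTAMP DESC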

More on what the JCL for a DB2 utility looks like and how it works will be covered in a DB2 utility post which I will be doing shortly.

 

December 11, 2018


Data and Technology

SQL Performance and Optimization

Just a quick post today to refer my readers to a series of blog posts that I recently made to the IDERA database community blog. This four-part blog series took a look into SQL performance and...

(Read more)

Triton Consulting

Six Reasons to Review your Database Availability – Part 4 Regulatory and Compliance Issues

From working with many customers to help keep their critical databases up and running we have come up with the top 6 reasons for putting Database Availability at the top of your priority list. In this blog we look at … Continue reading → The post Six Reasons to Review your Database...

(Read more)
 

December 05, 2018


Triton Consulting

Development is more Important than Production!

Some years ago, not too long after Y2K when DB2 V7 for z/OS was reasonably new, I recall a busy afternoon trying to identify the root cause of a slowdown within a recently amended critical application process in Production. New … Continue reading → The post Development is more...

(Read more)
 

November 30, 2018


Data and Technology

My Data Quotes – 2018

I am frequently approached by journalists and bloggers for my thoughts on the data-related news of the day… and I am usually happy to discuss data with anybody! Some of these discussions wind...

(Read more)

DB2Night Replays

The DB2Night Show #209: The Perfect Db2 Server and What's New Db2 V11.1.4.4

@klaas_brant Special Guest: Klaas Brant Owner, KBCE IBM Gold Consultant The PERFECT Db2 Server! Bonus: What's New in Db2 V11.1.4.4 100% of our audience learned something! Klaas Brant's presentation was amazing! From monitoring commands to configuration recommendations, Klaas covered it all! Watch our replay to learn MON_GET queries, OS commands, configuration settings, and best practices for making your Db2 servers run optimally! What's...

(Read more)

Robert Catterall

Db2 for z/OS Global Variables: What is a "Session"?

Has this ever happened to you? You're looking at the Db2 for z/OS SQL Reference (or in the Db2 for z/OS Knowledge Center on the Web), and you see something like this (and the quote below is from the description of the CREATE VARIABLE statement in the SQL Reference, with highlighting added by me): Global variables have a session scope. Although they are available for use to all sessions that are active at the current server, the value of the global variable is private for each...

(Read more)