Introduction

In programming, every value that you store in a variable has a type, also called a data type. Common types include strings (text), integers (whole numbers like 0, 1, 2, -1, -222), floats (real numbers like 0.22, -74.0), lists, dictionaries, and even types you define yourself. Data types that ship with a programming language are called built-in data types, and the ones that you create using classes are called user-defined data types.

In this post, we will look at the string data type and will dive into the others in later posts.

The string type (str in Python) is one of the most frequently used data types. A person’s name, an email’s body, a user’s comment on a blog post, or any other value that you would normally write out using alphanumeric characters is a string. Take a look at the example below:

name = "python"
print(type(name)) # prints <class 'str'>

# we can also use single quotes
name = 'python'
print(type(name)) # prints <class 'str'>

# we can also use triple double or single quotes
# for long strings with multiple lines
blog_post = """
This is a tutorial with multiple lines.
Please share this post.
"""
print(type(blog_post)) # prints <class 'str'>

We’ve created a variable called name with the value python. To assign a string to a variable, we need to wrap the text in double or single quotes. If you want multiple lines, you can use triple quotes as shown above. The value saved in name is the text inside the quotes. The quotes themselves are not part of the value; they are just Python’s syntax for creating string values.
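
As a quick check that the quotes are not part of the value, here is a minimal sketch (the variable names are made up for illustration) that compares the two quote styles and counts the characters that were actually stored:

```python
# Both quote styles create exactly the same string value.
double_quoted = "python"
single_quoted = 'python'
same_value = double_quoted == single_quoted
print(same_value)  # True

# The quotes are syntax only, so they are not counted as characters.
length = len(double_quoted)
print(length)  # 6
```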

:warning: Please remember to use quotes to create your string value. Otherwise Python will throw an error.

What do I mean by “create a string value”? A value that is assigned to a variable can come from many sources, such as user input, a database, or a hard-coded value. In this context, when I say create a value, I mean hard-coding it by typing the value in the code yourself, like the ones you’ve seen above. Take a look at the example below, where we ask the user for input and assign it to a variable.

name = input("What is your name? ")
print(name)
print(type(name))

If the user types Doe, running it looks like this:

What is your name? Doe
Doe
<class 'str'>

In this case, we did not create the value of the name variable ourselves but asked the user to enter a value using the input function. When the user enters a value, the input function creates a string value, which is then assigned to the variable name.
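
One consequence worth seeing in code: input always hands back a string, even when the user types digits. The sketch below uses a hard-coded stand-in (typed_age, a hypothetical name) for what input would return, so it runs without waiting for a user:

```python
# Stand-in for: typed_age = input("How old are you? ")
# Assumption for this demo: the user typed 42.
typed_age = "42"
print(type(typed_age))  # <class 'str'>

# To do arithmetic, convert the string to an integer first.
age_next_year = int(typed_age) + 1
print(age_next_year)  # 43
```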

Common pitfalls

Forgetting quotes

A common beginner mistake is to forget the quotes. Let’s say that we want to create a variable called name and assign samsung as its value. It’s very common to write something like this when you are just starting your programming journey.

name = samsung

Traceback (most recent call last):
  File "main.py", line 1, in <module>
    name = samsung
NameError: name 'samsung' is not defined

As the error says, name 'samsung' is not defined. This means that when you do not use quotes, Python thinks that samsung is a variable and tries to assign the value of samsung to the variable name. But we have not created a variable called samsung, so Python cannot find it and raises the error above. To fix that, we need to wrap the text samsung in quotes, like name = 'samsung' or name = "samsung".
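
To see that Python really does treat unquoted text as a variable name, here is a small sketch in which a variable called samsung does exist (its value, galaxy, is made up for this demo). Without quotes we get the value of that variable; with quotes we get the word itself:

```python
samsung = "galaxy"          # hypothetical variable, defined only for this demo

from_variable = samsung     # no quotes: Python looks up the variable samsung
from_literal = "samsung"    # quotes: Python creates the string samsung

print(from_variable)  # galaxy
print(from_literal)   # samsung
```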

Forgetting quotes II

Another common error I see happens when using the print function to display something like:

print(python programming is exciting)

File "main.py", line 1
  print(python programming is exciting)
                 ^
SyntaxError: invalid syntax

We wanted to print python programming is exciting, but again we forgot to put quotes around the entire sentence. This time the error is different. We haven’t covered anything about functions yet, so I will skip the explanation, but just keep in mind that if you see invalid syntax when using the print function, you probably forgot the quotes.

Unnecessary quotes

Another mistake I see frequently comes from the fact that everyone says to use quotes, so you use them even when you should not. For example:

brand = "samsung"
print("brand")

Here we wanted to display the value of the variable brand, but it prints the word brand instead of samsung. Do you know why? It’s because when we want to use a variable, we do not put quotes around it. We want to use the variable to display its value, so we should write print(brand) instead. Here is what happens when print(brand) is executed:

  • Python will see that it needs to print something.
  • What is that thing - a variable
  • What is the value of that variable - samsung (a string)
  • Print it

But in the example above, here is what happens when print("brand") is executed:

  • Python will see that it needs to print something
  • What is that thing - a string (not a variable even though they both have the words brand in it)
  • Print it

String Operations

There are many operations you can do to modify string values. Some of the common ones are finding and replacing, slicing (selecting only a portion of the entire string), changing to upper case, lower case etc.

:warning: Any operation on a string returns (gives you back) a new modified string. Don’t forget to assign the result of the operation to a new variable. Your original string will not be changed at all.
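
A minimal sketch of what that warning means in practice (the variable names are only for illustration):

```python
comment = "hello"

# upper() returns a new string; it does not modify comment.
comment.upper()
print(comment)  # hello  (unchanged, because the result was thrown away)

# Assign the result to a variable to keep it.
shouted = comment.upper()
print(shouted)  # HELLO
print(comment)  # hello  (the original is still untouched)
```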

Find

You can use the find method of a string to check whether an input string (the needle) occurs in your “main” string (the haystack). For example, let’s say you are trying to figure out whether a text contains the word idiot so that you can spot mean comments made by users. For that, you can use find, which returns the character position at which the needle first appears in the haystack. If it is not found, it returns -1.

comment = "You are an idiot"
print(comment.find("idiot")) # prints 11

It prints the number 11. Can you count each character in the comment, starting from 0 rather than 1 and including spaces, and see at which position the “i” of idiot appears? You’ll find that it is 11. So if find returns any number other than -1, the word(s) you were looking for are present.

Take a look at another example:

comment = "You are an idiot"
print(comment.find("Idiot")) # prints -1

In this case it prints -1. Why? Because “Idiot” is not the same as “idiot”, so Python does not treat them as equal. This means that string comparison is case-sensitive.

A complete example

comment = "You are an idiot"
index = comment.find("idiot")
if index >= 0:
	print("idiot word was found")
else:
	print("this comment is not mean")
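
Why compare with >= 0 instead of just writing if index? Here is a small sketch (the comment text is made up) where the word sits at the very start of the string, so find returns 0. It also shows Python’s in operator, which answers the same yes/no question when you don’t need the position:

```python
comment = "idiot is a mean word"
index = comment.find("idiot")
print(index)  # 0, because the word starts at the very first character

# Comparing with >= 0 (or != -1) treats position 0 as "found".
found = index >= 0
print(found)  # True

# Testing the index on its own would go wrong here: 0 counts as false.
print(bool(index))  # False

# When the position is not needed, the in operator gives the same answer.
print("idiot" in comment)  # True
```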

Replace

Let’s imagine you want to remove foul words from a comment. How would you do that?

comment = "You are an idiot"
print(comment.replace("idiot", "angel"))

It will print You are an angel. The replace method replaces every occurrence of the input word(s) in your main text with any word you like. In the example, we told Python to replace idiot with angel.
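
Two details are easy to miss, so here is a short sketch with made-up comments: replace changes every occurrence, not just the first one, and it matches the text anywhere, even inside a longer word.

```python
# Every occurrence is replaced.
comment = "idiot, idiot, idiot"
cleaned = comment.replace("idiot", "angel")
print(cleaned)  # angel, angel, angel

# replace does not know about word boundaries: it also matches inside words.
inside_word = "That was idiotic".replace("idiot", "angel")
print(inside_word)  # That was angelic
```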

As with find, replace is also case-sensitive. If you do

comment = "You are an idiot"
print(comment.replace("Idiot", "angel"))

it will still print You are an idiot because Idiot (with capital I) is not present in the comment.

Full example where we assign the “new” string to a new variable.

comment = "You are an idiot"
nice_comment = comment.replace("idiot", "angel")
print(nice_comment)

Changing cases

Sometimes we want to change the case of the letters in a string. As we saw in the examples above, idiot is not the same as Idiot or idioT. Usually we change the case of letters before searching or replacing, to avoid the headache of considering many variations of the same word in different cases. Commonly used methods to change the case are upper, lower, title and capitalize.

comment = "yoU aRe an iDioT"
print(comment)

lower_cased_comment = comment.lower()
print(lower_cased_comment)

upper_cased_comment = comment.upper()
print(upper_cased_comment)

title_cased_comment = comment.title()
print(title_cased_comment)

capitalized_comment = comment.capitalize()
print(capitalized_comment)

The code above produces the following output. Take a closer look at what each function did.

yoU aRe an iDioT
you are an idiot
YOU ARE AN IDIOT
You Are An Idiot
You are an idiot

Now let’s make a “fool-proof” :wink: profanity checker.

comment = "yoU aRe an iDioT"
foul_word = "idiot"
nice_word = "angel"
lower_cased_comment = comment.lower()
nice_comment = lower_cased_comment.replace(foul_word, nice_word)
print(nice_comment)

It prints you are an angel. Even though the comment spelled the word idiot in a very weird way, we were still able to replace it by converting the entire sentence to lowercase and then using replace. Also note that the final output is entirely in lowercase. This might not be desirable in every situation, since capitalization is quite important in English and we usually want to preserve it. There are ways to find and replace in a case-insensitive manner without butchering the text like this, using regular expressions.
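
For the curious, here is one possible sketch of the regular-expression approach using Python’s built-in re module. Treat it as an illustration under simple assumptions (a single foul word, plain substring matching) rather than a complete profanity filter:

```python
import re

comment = "yoU aRe an iDioT"
foul_word = "idiot"
nice_word = "angel"

# re.escape protects any characters in foul_word that are special in
# regular expressions; re.IGNORECASE makes the match ignore letter case.
nice_comment = re.sub(re.escape(foul_word), nice_word, comment, flags=re.IGNORECASE)
print(nice_comment)  # yoU aRe an angel
```

Notice that the rest of the comment keeps its original capitalization; only the matched word is swapped out.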

Slicing

You can also extract a subset from a string using slicing.

text = "012345"
first_five_characters = text[0:5]
print(first_five_characters)

The code above prints:

01234

To “slice” a string, use the notation [start_index:end_index]. In Python, the index (or position) starts from 0, so to extract the first 5 characters we use 0 as start_index, followed by a colon “:” and the end index. Note that the character at the end index is not included in the result. Translated to plain English, text[0:5] reads something like “Give me the characters starting from index 0 up to, but not including, index 5”.

There are other variations you can use while slicing. When we want to start from the beginning, i.e. index 0, we can omit the 0 and write text[:5]. Note that you still need to write the colon and the end index. This is exactly the same as text[0:5].
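
Here is a small sketch, using a different string so it stands on its own, showing that writing the start index and omitting it give the same result, and that the excluded end index makes the length easy to predict:

```python
word = "python"

with_start = word[0:3]      # start index written out
without_start = word[:3]    # start index omitted, so it defaults to 0

print(with_start)     # pyt
print(without_start)  # pyt

# The end index is excluded, so a slice has (end - start) characters.
middle = word[2:5]
print(middle)       # tho
print(len(middle))  # 3
```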

Another situation is when you want to start from some position other than 0 and select all the remaining characters; in that case you can omit the end index. For example, if you want to select everything from the second character until the end, you can do something like this:

text = "012345"
print(text[1:])

The code prints:

12345

Notice that the start index is 1 because index 0 means the first character and index 1 means the second character.

:confused: This might be confusing at first, but most major programming languages use “0-based indexing”, i.e. counting begins from 0.

Since we wanted all characters until the end, we don’t need to specify the end index. But don’t forget the colon! What exactly happens if you forget the colon, you ask? Well, it will give you the single character at that index. So if you just write print(text[1]), it will print the character in the second position, i.e. “1”.
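
The difference between the two forms can be sketched side by side. The last part goes slightly beyond what we covered above: a slice that starts past the end quietly gives an empty string, whereas a plain index past the end (such as text[10]) raises an IndexError.

```python
text = "012345"

one_character = text[1]    # no colon: a single character
rest_of_text = text[1:]    # with colon: everything from index 1 to the end
print(one_character)  # 1
print(rest_of_text)   # 12345

# A slice that starts past the end gives an empty string instead of an error.
past_the_end = text[10:]
print(repr(past_the_end))  # ''
```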

There is also another way of slicing, where you specify a negative index! Let’s say you want to extract the extension of a filename. If the filename is “photo.png”, we want to extract only the “png” part. For now, assume that all files have a 3-character extension like jpg or png, but that the file names can be any number of characters long, so you cannot know in advance where the extension begins. For example, with filename = "photo.png" we can extract “png” using filename[6:], but for “photo1.png” we need to use filename[7:].

We know that the extension is last 3 characters so we can use negative slicing!

filename = "photo.png"
print(filename[-3:])

filename = "myphoto.png"
print(filename[-3:])

In both cases, it will print “png” regardless of how long the name actually is. How does it work? Let’s break it down. When we write filename[-3:], we are telling Python to start slicing from the 3rd-last character and continue until the end. Makes sense? The syntax is still the same as above, [start_index:end_index], but this time we use a negative number to indicate that counting starts from the end.

To understand more about slicing take a look at this table. Assume that we have filename = "photo.png"

| slice | result | details |
| --- | --- | --- |
| filename[0] | p | first character (the character at index 0) |
| filename[0:2] | ph | first and second characters |
| filename[:2] | ph | same as above, with the start index omitted |
| filename[3:7] | to.p | 4th to 7th characters (the 7th character, not the character at index 7, which is the 8th character) |
| filename[3:] | to.png | 4th character till the end |
| filename[-1] | g | the last character |
| filename[-3:] | png | from the 3rd-last character till the end |
| filename[-5:-3] | o. | from the 5th-last character till the 4th-last character (since the end index is not inclusive, the 3rd-last character is not included) |
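
If you would like to convince yourself that the table is right, the rows can be checked with a short script. Each assert line stays silent when the slice gives the expected result and stops with an error otherwise:

```python
filename = "photo.png"

assert filename[0] == "p"
assert filename[0:2] == "ph"
assert filename[:2] == "ph"
assert filename[3:7] == "to.p"
assert filename[3:] == "to.png"
assert filename[-1] == "g"
assert filename[-3:] == "png"
assert filename[-5:-3] == "o."

rows_checked = 8
print("all", rows_checked, "rows match")
```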

:bulb: There are much better ways to parse file names and paths. Usually the os (specifically os.path) and pathlib modules are used; they can handle a wide variety of cases across multiple operating systems.
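
As a taste of those modules, here is a minimal sketch (the file names are made up). Unlike our slicing trick, it does not assume that the extension is exactly 3 characters long. Note that both tools keep the dot as part of the extension:

```python
import os.path
from pathlib import PurePath

path = PurePath("myphoto.png")
print(path.suffix)  # .png
print(path.stem)    # myphoto

# os.path.splitext splits a name into (everything before, extension).
root, extension = os.path.splitext("archive.jpeg")
print(root)       # archive
print(extension)  # .jpeg
```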

If you’ve understood the concept of slicing strings, then you already know how to slice lists in Python. Exactly the same concepts and syntax are used to slice lists, which we will go through in later posts.

Joining two strings

You can join, or concatenate, two strings to create a new one that contains both. For example:

first_name = "John"
last_name = "Doe"
full_name = first_name + last_name
print(full_name)

The code prints “JohnDoe”. You can use the + sign to concatenate two strings. If you look at the output, there is no space between the first and the last name. That’s not very nice. But there is an easy solution: you can “add” as many strings as you want.

first_name = "John"
last_name = "Doe"
full_name = first_name + " " + last_name
print(full_name)

It prints “John Doe”. Notice how we added a string containing a space between the other strings: first_name + " " + last_name. You can use any string variables or hard-coded strings when concatenating.
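
One thing to watch out for: + only joins strings with other strings. In the sketch below, age is a made-up integer, so it has to be converted with str first; writing first_name + age directly would raise a TypeError.

```python
first_name = "John"
age = 30  # an integer, made up for this example

# Convert the number to a string before concatenating.
sentence = first_name + " is " + str(age) + " years old"
print(sentence)  # John is 30 years old
```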

Exercise

We need to write code that generates a new name for a file so that we can keep it as a backup. If the filename is “photo.png”, it should create the new name “photo.backup.png”.

filename = "photo.png"
new_filename = ... # your code
print(new_filename)

That’s it for this post. There is a lot more to strings in Python than I’ve mentioned here. If you want to know about strings and their operations in detail, you can read the official documentation.
