Skip to main content

Unicode | What and Why?

We will discuss the following concerns in this article

  • What were the needs of character sets? (ASCII, Unicode etc.)
  • How ASCII has emerged?
  • What is  Unicode and Why is it?
  • Why 1 Bit = 8 Byte?
  • Why Java character takes Two Bytes?

In the early age of Computer Technology. The only Binary language was there(1,0). We, the Human, were not comforting with Binary. When they tried to write their name in binary it was time taking and can be a mesh. They needed a human language to use which is most often to use. This technology was emerging in the US. so they need the English language. They found Latin which contains 256 characters including 26 characters of English + German + French + special characters + numbers and others. Language is just a set of characters. So they assign a binary sequence of 8 bit for their every English alphabet's character as:
a => 01100001
=> 01100010
=> 01100011
=> 01000001 etc.

That's why 1 Byte = 8 Bit, as 2^8 = 256 characters.
That's why character data type of C, C++, and many technologies takes One byte.
This is how ASCII(American Standard Code for Information Interchange) created.

And by following this, Every country was making its own character set. As  KOI for Japanese, Big5 for Chinese, ISO-8859 for Europe's seven languages etc.


But with this, a problem arises. let's look at this:
"The character set is the fundamental raw material of any language and they are used to represent information. Like natural languages, computer language will also have a well-defined character set, which is useful to build the programs."
This means that Choosing a programming language means choosing a character set also.
So If we create a software using C language which supports ASCII character set then this software will understand English only, not Chinese or French etc. Softwares were language specific.

We humans have not one language. Every country has their own language. And every country wanted the software to understand their language too. It means that they need a character set which contains all languages of this world.  And Programming languages use that character set to make software.

So they collect the world's all languages and try to make a new character set. They became more than 256 characters so they assign 2 bytes for that character set. It means now this character set can hold 2^16 = 65536 character. But in world's all language's characters are more than 65536. To increase this size, they encode it first in Hexadecimal then convert in binary and generate a Unique Code called Unicode. This is how Unicode arises.

So languages which use Unicode, much have their character data type of 2 bytes.
for ex: Java, Ruby etc
That's why Java reserve 2 bytes for character data type. 

Comments

Popular posts from this blog

Why "F" and "L" suffix | (10.0F, 10L)

Let us take it this way, We will create their needs. So we will get why they are needed. Try to guess, which functions will be executed in the following program: public class MyClass {     public static void main(String args[]) {         MyClass obj = new MyClass();         obj.fun1(10);     }     void fun1(byte val){         System.out.println(val);     }     void fun1(int val){         System.out.println(val);     }     void fun1(float val){         System.out.println(val);     }     void fun1(long val){         System.out.println(val);     }     } It seems like every method is capable to run this program because 10 is still literal because It has no data type. Before Java, In previous technologies, this scenario gave an ambiguity error. But Java solves this problem by removing the concepts of literals. It means Java provide a data type immediately when these born. So here 10 is no more literal. Java provides Integer data type for it. So now it is of Integer t

only large files upload on S3 | Ruby On Rails

models/attachment.rb class Attachment < ApplicationRecord after_initialize :set_storage private def set_storage # larger that 5mb file would be upload on s3 if file . blob . byte_size > 5_000_000 Rails . application . config . active_storage . service = :amazon else Rails . application . config . active_storage . service = :local end end # end of private end

Typecasting | How is Long to Float Conversion possible?

We will take a brief description of Typecasting and will try to do focus on Log to Float Conversion. Typecasting: Assigning a value of one data type to another. When we assign a value of smaller data type to a bigger one. it is called Widening. Java did this conversion automatically as they are compatible. As shown in the following figure: One another kind of conversion, when automatic conversion not possible i.e. when they are not compatible is Shortening. It will be just opposite of above and diagram will be reversed. How is Long to Float Conversion possible? If we look carefully at the diagram, there is one conversion which looks questionable is Long(8 bytes) to Float(4 bytes) conversion. It looks like data lossy conversion. Actually, Type conversion does two things: Either change in range or change in behavior or both. Change in Range: short a = 3456 // this value can be varied within the range of -32768 to 32767 int b = a // now this value can be varied wi